2020-10-21


A simple KNN (k-nearest neighbours) classifier. The data has been uploaded.
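For reference, this is how the script classifies a test sample x. F is the set of eight numeric feature columns it reads. The k = 5 nearest training samples have distances d_1 ≤ … ≤ d_k and labels y_i ∈ {B, M}. Each neighbour votes with a distance-based weight, and the class with the larger total wins. Ties go to M.

$$
d(x, y) = \sqrt{\sum_{j \in F} \left(x_j - y_j\right)^2}
$$

$$
w_i = 1 - \frac{d_i}{\sum_{m=1}^{k} d_m}, \qquad
S(c) = \sum_{i \le k,\; y_i = c} w_i, \qquad
\hat{y} = \begin{cases} B & \text{if } S(B) > S(M) \\ M & \text{otherwise} \end{cases}
$$

The weights always sum to k - 1, so with k = 5 the total vote is 4.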

Data file: https://download.csdn.net/download/weixin_39731450/12998036

Code:

import csv
import random

# Load the data (forward slashes also work on Windows and avoid the invalid "\P" escape)
with open('prostate-cancer/Prostate_Cancer.csv', 'r', newline='') as file:
    read = csv.DictReader(file)
    datas = [row for row in read]

# Shuffle the data randomly
random.shuffle(datas)

# Hold out one third of the data as the test set
n = len(datas) // 3
test_set = datas[0:n]
train_set = datas[n:]

# Euclidean distance over the eight numeric features
def distance(d1, d2):
    res = 0
    for key in ("radius", "texture", "perimeter", "area", "smoothness",
                "compactness", "symmetry", "fractal_dimension"):
        res += (float(d1[key]) - float(d2[key])) ** 2
    return res ** 0.5

# KNN
k = 5  # use the k nearest neighbours

def KNN(data):
    # Distance from this sample to every sample in the training set
    res = [
        {"result": train['diagnosis_result'], "distance": distance(data, train)}
        for train in train_set
    ]
    # Sort by distance, nearest first
    res = sorted(res, key=lambda item: item['distance'])
    # Keep the k nearest
    res2 = res[0:k]
    # Distance-weighted vote
    result = {'B': 0, 'M': 0}
    # Total distance of the k neighbours (renamed from "sum" to avoid shadowing the built-in)
    total = sum(r['distance'] for r in res2)
    for r in res2:
        # Closer neighbours get larger weights; use equal weights if every distance is 0
        result[r['result']] += 1 - r['distance'] / total if total > 0 else 1
    if result['B'] > result['M']:
        return 'B'
    else:
        return 'M'

# Evaluate on the test set
correct = 0
for test in test_set:
    result_test = test['diagnosis_result']
    result_train = KNN(test)
    if result_test == result_train:
        correct += 1

print(correct)
print(len(test_set))
print("accuracy:%.2f" % (correct / len(test_set)))
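The script above needs the downloaded CSV. The version below is a self-contained sketch of the same distance-weighted vote, so you can check the logic without the file. It uses hypothetical toy rows that contain only two of the eight features. The names `euclidean`, `knn_predict` and `TOY_TRAIN` are illustrative and do not come from the original post.

```python
import math

# Hypothetical subset of the real feature columns, for illustration only
FEATURES = ("radius", "texture")

# Hypothetical toy training rows: a "B" cluster near (1, 1), an "M" cluster near (10, 10)
TOY_TRAIN = [
    {"radius": 1, "texture": 1, "diagnosis_result": "B"},
    {"radius": 1, "texture": 2, "diagnosis_result": "B"},
    {"radius": 2, "texture": 1, "diagnosis_result": "B"},
    {"radius": 10, "texture": 10, "diagnosis_result": "M"},
    {"radius": 11, "texture": 10, "diagnosis_result": "M"},
    {"radius": 10, "texture": 11, "diagnosis_result": "M"},
]


def euclidean(a, b, keys=FEATURES):
    """Euclidean distance between two rows over the given feature keys."""
    return math.sqrt(sum((float(a[key]) - float(b[key])) ** 2 for key in keys))


def knn_predict(sample, train_set, k=5, keys=FEATURES):
    """Classify `sample` with the post's weighting: w_i = 1 - d_i / sum(d)."""
    # (distance, label) pairs for the k nearest training rows
    nearest = sorted(
        ((euclidean(sample, row, keys), row["diagnosis_result"]) for row in train_set),
        key=lambda pair: pair[0],
    )[:k]
    total = sum(d for d, _ in nearest)
    votes = {"B": 0.0, "M": 0.0}
    for d, label in nearest:
        # Fall back to an equal vote when every neighbour is at distance 0
        votes[label] += 1 - d / total if total > 0 else 1.0
    # Ties go to "M", as in the original script
    return "B" if votes["B"] > votes["M"] else "M"


print(knn_predict({"radius": 1.5, "texture": 1.5}, TOY_TRAIN, k=3))  # → B
```

This weighting has one quirk. With k = 1, or any time only one neighbour has a non-zero distance and it is the only one selected, that neighbour's weight is 1 - d/d = 0. Both totals then stay at 0, and the tie rule always returns "M". Use k ≥ 2 with this scheme, or switch to a weight such as 1/d if you need k = 1.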

Result:

 
