1. Euclidean Distance
Euclidean distance is the straight-line distance between two points in space and is the most common distance metric. It can be computed as follows:

import math

def euclidean_distance(point1, point2):
    return math.sqrt(sum((p1 - p2) ** 2 for p1, p2 in zip(point1, point2)))

For example, the Euclidean distance between (1, 2) and (4, 6):

distance = euclidean_distance([1, 2], [4, 6])
print(distance)  # Output: 5.0
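On Python 3.8 and later, the standard library already provides this computation as math.dist, which can replace the hand-written helper:

```python
import math

# math.dist (Python 3.8+) computes the Euclidean distance between
# two points given as sequences of coordinates
print(math.dist([1, 2], [4, 6]))  # Output: 5.0
```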
2. Manhattan Distance
Manhattan distance (city-block distance) is the distance between two points measured along axis-aligned grid lines, like walking city blocks. It can be computed as follows:

def manhattan_distance(point1, point2):
    return sum(abs(p1 - p2) for p1, p2 in zip(point1, point2))

For example, the Manhattan distance between (1, 2) and (4, 6) is |1 - 4| + |2 - 6|:

distance = manhattan_distance([1, 2], [4, 6])
print(distance)  # Output: 7
3. Cosine Similarity
Cosine similarity measures how similar two vectors are via the cosine of the angle between them: 1 means they point in the same direction, 0 means they are orthogonal. It can be computed as follows:

def cosine_similarity(vector1, vector2):
    dot_product = sum(v1 * v2 for v1, v2 in zip(vector1, vector2))
    norm_v1 = math.sqrt(sum(v ** 2 for v in vector1))
    norm_v2 = math.sqrt(sum(v ** 2 for v in vector2))
    return dot_product / (norm_v1 * norm_v2)

For example, the cosine similarity between [1, 2] and [3, 4] is 11 / (sqrt(5) * 5):

similarity = cosine_similarity([1, 2], [3, 4])
print(similarity)  # Output: approximately 0.9839
4. Hamming Distance
Hamming distance counts the positions at which two equal-length strings hold different characters. It can be computed as follows:

def hamming_distance(string1, string2):
    return sum(c1 != c2 for c1, c2 in zip(string1, string2))

For example, the Hamming distance between "hello" and "hallo":

distance = hamming_distance("hello", "hallo")
print(distance)  # Output: 1
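For integers, the same idea can be expressed with bitwise XOR: differing bits become 1s, which are then counted. (The helper name below is our own, not part of the text above.)

```python
def hamming_distance_int(a, b):
    # XOR leaves a 1 bit exactly where the two numbers differ
    return bin(a ^ b).count("1")

print(hamming_distance_int(0b1011, 0b1001))  # Output: 1
```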
5. Jaccard Similarity Coefficient
The Jaccard similarity coefficient measures how similar two sets are: the size of their intersection divided by the size of their union. It can be computed as follows:

def jaccard_similarity(set1, set2):
    intersection = len(set1 & set2)
    union = len(set1 | set2)
    return intersection / union

For example, the Jaccard similarity between {1, 2, 3} and {2, 3, 4}:

similarity = jaccard_similarity({1, 2, 3}, {2, 3, 4})
print(similarity)  # Output: 0.5
6. K-Nearest Neighbors
K-nearest neighbors (KNN) is a simple but effective classification method. The idea: if most of a sample's K nearest neighbors in feature space belong to some class, the sample belongs to that class too.

from collections import Counter

def k_nearest_neighbors(data, target, k):
    # Each sample's last element is its class label; the rest are features.
    distances = []
    for sample in data:
        # zip() inside euclidean_distance truncates to the shorter input,
        # so the trailing label does not affect the distance
        dist = euclidean_distance(target, sample)
        distances.append((dist, sample))
    distances.sort()
    neighbors = distances[:k]
    # Majority vote over the labels of the k nearest neighbors
    return Counter(sample[-1] for _, sample in neighbors).most_common(1)[0][0]

For example, classifying the point [1, 3] against labeled samples (the label, 0 or 1, is each sample's last element):

data = [[1, 2, 0], [2, 3, 0], [3, 4, 1], [5, 6, 1], [7, 8, 1]]
target = [1, 3]
k = 3
print(k_nearest_neighbors(data, target, k))  # Output: 0
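One practical caveat: KNN compares raw distances, so a feature with a large numeric range can drown out the others. A common remedy is to scale every feature column to [0, 1] before measuring distances. A minimal sketch (the helper name min_max_scale is our own):

```python
def min_max_scale(rows):
    # Scale each column to [0, 1] independently
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi != lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# The second feature's range (hundreds) no longer dominates the first's
print(min_max_scale([[1, 200], [2, 400], [3, 600]]))
# Output: [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```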
7. Decision Tree
A decision tree is a tree-structured model for classification or regression. The idea: repeatedly split the dataset into subsets by comparing feature values, until some stopping condition is met.

def build_tree(data, features, target):
    # ... (tree-building code omitted)
    return tree

def predict(tree, x):
    # ... (prediction code omitted)
    return prediction

For example, classifying with a decision tree:

data = [[1, 2], [2, 3], [3, 4], [5, 6], [7, 8]]
features = [0, 1]
target = [1, 0, 1, 0, 1]
tree = build_tree(data, features, target)
x = [2, 3]
print(predict(tree, x))  # e.g. 0, depending on the omitted implementation
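Since the tree-building code is elided above, here is a minimal runnable sketch of the core idea using a one-level tree (a decision stump): try every feature/threshold split, keep the split with the fewest misclassifications, and predict the majority label on each side. The function names and the toy dataset below are our own.

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_stump(data, target):
    # Try every (feature, threshold) split; keep the one with the fewest errors
    best = None
    for f in range(len(data[0])):
        for threshold in sorted({row[f] for row in data}):
            left = [y for row, y in zip(data, target) if row[f] <= threshold]
            right = [y for row, y in zip(data, target) if row[f] > threshold]
            if not left or not right:
                continue
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, f, threshold, left_label, right_label = best
    return {"feature": f, "threshold": threshold,
            "left": left_label, "right": right_label}

def stump_predict(stump, x):
    side = "left" if x[stump["feature"]] <= stump["threshold"] else "right"
    return stump[side]

data = [[1, 2], [2, 3], [3, 4], [5, 6], [7, 8]]
target = [0, 0, 0, 1, 1]
stump = build_stump(data, target)
print(stump_predict(stump, [2, 3]))  # Output: 0
print(stump_predict(stump, [6, 7]))  # Output: 1
```

A real decision tree applies this split search recursively to each side until the subsets are pure or too small.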
8. Random Forest
A random forest is an ensemble method built from many decision trees. The idea: combine the predictions of multiple trees to improve accuracy and generalization.

def build_random_forest(data, features, target, n_trees):
    # A random forest is one collection of n_trees trees (each tree would
    # normally see a bootstrap sample of the data and a random feature subset)
    forest = []
    for _ in range(n_trees):
        tree = build_tree(data, features, target)
        forest.append(tree)
    return forest

def predict_random_forest(forest, x):
    # Majority vote over the individual trees' predictions
    predictions = [predict(tree, x) for tree in forest]
    return Counter(predictions).most_common(1)[0][0]

For example, classifying with a random forest:

data = [[1, 2], [2, 3], [3, 4], [5, 6], [7, 8]]
features = [0, 1]
target = [1, 0, 1, 0, 1]
forest = build_random_forest(data, features, target, 5)
x = [2, 3]
print(predict_random_forest(forest, x))  # e.g. 0, depending on the omitted build_tree
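What the sketch above leaves out is what makes the forest "random": each tree normally trains on a bootstrap sample, a same-size resample of the data drawn with replacement, plus a random subset of features at each split. A minimal illustration of bootstrap sampling (the function name is our own):

```python
import random

def bootstrap_sample(data, target):
    # Draw len(data) indices with replacement; duplicates are expected,
    # and on average about 63% of the original rows appear at least once
    idx = [random.randrange(len(data)) for _ in range(len(data))]
    return [data[i] for i in idx], [target[i] for i in idx]

random.seed(0)
data = [[1, 2], [2, 3], [3, 4], [5, 6], [7, 8]]
target = [1, 0, 1, 0, 1]
sample_x, sample_y = bootstrap_sample(data, target)
print(len(sample_x), len(sample_y))  # Output: 5 5
```

Because every tree sees a slightly different dataset, their errors decorrelate, which is why the majority vote outperforms any single tree.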
9. Support Vector Machine
A support vector machine (SVM) is a classifier based on the maximum-margin principle. The idea: find the hyperplane that separates the classes with the widest possible margin.

def build_svm(data, features, target):
    # ... (SVM training code omitted)
    return svm

def predict_svm(svm, x):
    # ... (prediction code omitted)
    return prediction

For example, classifying with an SVM:

data = [[1, 2], [2, 3], [3, 4], [5, 6], [7, 8]]
features = [0, 1]
target = [1, 0, 1, 0, 1]
svm = build_svm(data, features, target)
x = [2, 3]
print(predict_svm(svm, x))  # e.g. 0, depending on the omitted implementation
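Since the training code above is elided, here is a runnable sketch of one concrete approach: a linear SVM fit by sub-gradient descent on the hinge loss (in the spirit of the Pegasos algorithm). Labels must be +1/-1; the function names, hyperparameters, and toy data are all our own.

```python
def train_linear_svm(data, labels, lam=0.01, lr=0.1, epochs=200):
    # Minimize lam/2 * ||w||^2 + hinge loss by sub-gradient descent
    w = [0.0] * len(data[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):  # y must be +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point violates the margin: push the hyperplane
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:           # only the regularizer contributes
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict_linear_svm(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

data = [[1, 2], [2, 1], [6, 5], [7, 8]]
labels = [-1, -1, 1, 1]
w, b = train_linear_svm(data, labels)
print(predict_linear_svm(w, b, [1, 1]))  # Output: -1
print(predict_linear_svm(w, b, [7, 7]))  # Output: 1
```

Real SVM libraries solve the dual quadratic program and support kernels for non-linear boundaries; this sketch only covers the linear, separable case.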
10. Deep Learning
Deep learning uses multi-layer neural networks loosely modeled on the brain. The idea: stack layers that transform the raw input into progressively higher-level features, enabling complex classification and regression tasks.

def build_neural_network(data, features, target, layers):
    # ... (network construction code omitted)
    return neural_network

def predict_neural_network(neural_network, x):
    # ... (prediction code omitted)
    return prediction

For example, classifying with a neural network:

data = [[1, 2], [2, 3], [3, 4], [5, 6], [7, 8]]
features = [0, 1]
target = [1, 0, 1, 0, 1]
layers = [2, 3, 1]  # sizes of the input, hidden, and output layers
neural_network = build_neural_network(data, features, target, layers)
x = [2, 3]
print(predict_neural_network(neural_network, x))  # e.g. 0, depending on the omitted implementation
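As in the previous sections the network code is omitted, so here is a minimal runnable sketch: one hidden sigmoid layer trained by plain backpropagation on squared error, learning the AND function. All names, hyperparameters, and the toy data are our own; real deep learning uses far larger networks and a framework.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_tiny_net(data, target, hidden=3, lr=0.5, epochs=2000, seed=0):
    # One hidden layer, sigmoid activations, plain backpropagation
    rng = random.Random(seed)
    n_in = len(data[0])
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in zip(data, target):
            # Forward pass
            h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
                 for ws, b in zip(w1, b1)]
            out = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
            # Backward pass (squared-error loss, sigmoid derivatives)
            d_out = (out - y) * out * (1 - out)
            for j in range(hidden):
                d_h = d_out * w2[j] * h[j] * (1 - h[j])
                w2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    w1[j][i] -= lr * d_h * x[i]
                b1[j] -= lr * d_h
            b2 -= lr * d_out
    def predict_fn(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
             for ws, b in zip(w1, b1)]
        return sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return predict_fn

net = train_tiny_net([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 0, 1])
print(net([1, 1]) > 0.5)  # Output: True
print(net([0, 1]) < 0.5)  # Output: True
```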
That concludes this walkthrough of ten classic models and metrics; I hope it helps.