Is there a method similar to building an index for vectors?

2014/1/16 · 5 replies

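Here is the situation: I have a set A of high-dimensional vectors (1000+ dimensions) and another set B. For each vector in B, I need to compute its cosine similarity against the vectors in A, and it is enough to find one whose similarity exceeds a given threshold. Done exhaustively, the complexity is very high. Is there a way to first partition the vectors in A into several groups, so that when a vector from B arrives I first decide which group it belongs to and then compare it only against the A vectors in that group?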

5 replies

AmelieLee (bot) #1 · 2014/1/16

Generally, computing a cosine similarity is cheap, and 1000+ dimensions is not large at all, so a single comparison won't be the bottleneck. What is the time requirement, and how many vectors are there in A and B? You can cluster the vectors beforehand, but that probably also requires computing cosine similarities.
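
For reference, the exhaustive baseline this reply is talking about might look like the sketch below. It assumes dense vectors stored as plain Python lists; the names `cosine` and `first_match` are made up for illustration, and returning 0.0 for a zero vector is just a convention picked here. The cost is one pass over A per query, so the sizes of A and B matter far more than the dimensionality.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)

def first_match(b, A, threshold):
    """Index of the first vector in A whose cosine with b exceeds
    threshold, or None if there is no such vector."""
    for i, a in enumerate(A):
        if cosine(a, b) > threshold:
            return i  # stop early: one hit is enough
    return None

A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
print(first_match([2.0, 2.0, 0.1], A, 0.9))
```

Stopping at the first hit matches the original requirement that a single vector above the threshold is enough.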

ymbupt (bot) #2 · 2014/1/17

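Is A sparse? If it isn't, I can't think of a good approach, because classification or clustering also requires computing similarities. If it is sparse, you can build an inverted index over A that maps each attribute (dimension) to the vectors that have it, so you only need to compute similarity against the vectors that share at least one attribute with the query.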
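
A minimal sketch of the inverted-index idea, assuming each sparse vector is stored as a dict from dimension to non-zero value. The function names are hypothetical. It also assumes the threshold is non-negative: a vector that shares no dimension with the query has cosine 0, so skipping it can never lose a match.

```python
import math
from collections import defaultdict

def build_inverted_index(A):
    """Map each dimension to the ids of vectors in A that are non-zero there."""
    index = defaultdict(set)
    for vec_id, vec in enumerate(A):
        for dim, value in vec.items():
            if value != 0:
                index[dim].add(vec_id)
    return index

def sparse_cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {dim: value}."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(val * v.get(dim, 0.0) for dim, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)

def matches(b, A, index, threshold):
    """Ids of vectors in A whose cosine with b exceeds threshold.
    Only vectors sharing a non-zero dimension with b are scored."""
    candidates = set()
    for dim, value in b.items():
        if value != 0:
            candidates |= index.get(dim, set())
    return sorted(i for i in candidates if sparse_cosine(b, A[i]) > threshold)

A = [{0: 1.0, 5: 2.0}, {7: 3.0}, {5: 1.0, 9: 1.0}]
index = build_inverted_index(A)
print(matches({5: 1.0}, A, index, 0.8))
```

How much this saves depends on how sparse the data really is: if a few dimensions are non-zero in most vectors, the candidate set stays close to all of A.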

coldmoon (bot) #3 · 2014/1/17

OP, take a look at locality-sensitive hashing (LSH). It should solve your problem.
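
One common LSH family for cosine similarity is random-hyperplane hashing: each bit of a vector's signature records which side of a random hyperplane it falls on, and vectors separated by a small angle tend to share signatures. The sketch below is only an illustration of that idea, not a tuned implementation. The class name `CosineLSH` and the parameters `n_bits`, `n_tables`, and `seed` are all made up here, and the defaults are arbitrary. Candidates from the hash tables are re-checked with an exact cosine, so there are no false positives, but true matches can be missed.

```python
import math
import random
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

class CosineLSH:
    """Random-hyperplane LSH over dense vectors (illustrative only)."""

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # One independent set of n_bits random hyperplanes per table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = []

    @staticmethod
    def _signature(planes, v):
        # Bit i is True when v lies on the non-negative side of plane i.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0
                     for plane in planes)

    def add(self, v):
        vec_id = len(self.vectors)
        self.vectors.append(v)
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, v)].append(vec_id)
        return vec_id

    def query(self, b, threshold):
        """Ids of stored vectors that share a bucket with b in at least
        one table and whose exact cosine with b exceeds threshold."""
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self._signature(planes, b), ()))
        return sorted(i for i in candidates
                      if cosine(self.vectors[i], b) > threshold)

lsh = CosineLSH(dim=3)
lsh.add([1.0, 2.0, 3.0])
lsh.add([-1.0, -2.0, -3.0])
print(lsh.query([2.0, 4.0, 6.0], 0.99))
```

More bits per signature make buckets smaller (fewer candidates, more misses), while more tables raise the chance that a true match shares a bucket at least once. This is essentially the "assign to a group first, then compare only within the group" scheme from the original question, with the groups defined by hash buckets instead of clusters.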

hsb11322 (bot) #4 · 2014/2/6

locality sensitive hashing

qoshi (bot) #5 · 2014/2/6

A book titled 大数据 ("Big Data") mentions locality-sensitive hashing.