Clarification Needed: Differences Between Faiss IndexFlatIP Search Results and Cosine Similarity #3583
Replies: 4 comments 1 reply
-
Faiss L2-normalizes neither the query nor the database vectors. Have you done this normalization on the vectors yourself before adding them to the index and querying?
-
Yes, I have indeed applied L2 normalization to the vectors before adding them to the index and querying. Any insights into this would be greatly appreciated.
-
Hi @YoungjaeDev, as @algoriddle mentioned, you need to normalise all vectors before constructing the index and use the inner product metric. Could you provide a toy example where you reproduce the issue so this is actionable?
-
Hello here. I am currently testing the flat index with inner product in order to retrieve candidates with lower cosine distance. I also observed that they differ from those calculated with Here is a toy example to reproduce:
The indexes How do you explain such a difference?
-
Summary
Platform
OS: Ubuntu20
Faiss version: latest
Installed from: source build
Faiss compilation options:
Running on:
Interface:
Reproduction instructions
I am reaching out with a query regarding some inconsistencies I've encountered while using Faiss for indexing and search operations. My primary concern revolves around the discrepancy between results obtained from a search function on an IndexFlatIP index and those calculated using cosine similarity (both scipy and scikit-learn) on the same dataset. Values from the Faiss search have a higher matching rate. Specifically, there are occasional mismatches between the distance values (D) from cosine_similarity(np.expand_dims(face_embedding, axis=0), index_np) and those obtained from index.search. While these discrepancies aren't constant, they are noticeable in certain instances.
I am curious about how Faiss handles distance calculations and whether there is any additional preprocessing applied to feature vectors post L2-normalization within Faiss. Any clarification or additional information on this matter would be immensely helpful.
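One possible source of small, occasional mismatches is floating-point precision: Faiss stores and scores vectors in float32, while a reference computed on float64 arrays is carried out in float64. The NumPy-only sketch below uses hypothetical random data (not the face embeddings from this report) to show the size of the gap between the two. Whether this explains the mismatches seen here is an assumption, not something confirmed in this thread.

```python
import numpy as np

rng = np.random.default_rng(0)
xb = rng.standard_normal((1000, 128))  # float64 toy database (hypothetical)
xq = rng.standard_normal(128)          # float64 toy query (hypothetical)

def l2_normalize(x):
    """Scale each vector (last axis) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# float64 reference: cosine similarity as a dot product of unit vectors.
cos64 = l2_normalize(xb) @ l2_normalize(xq)

# Same computation in float32, the precision a Faiss flat index works in.
xb32 = l2_normalize(xb.astype(np.float32))
xq32 = l2_normalize(xq.astype(np.float32))
ip32 = xb32 @ xq32

max_abs_diff = np.abs(cos64 - ip32).max()
print("max |float64 - float32| score difference:", max_abs_diff)
```

Differences of this size are harmless for the scores themselves, but they can swap the order of candidates whose similarities are nearly tied, so comparing scores with a tolerance (for example `np.allclose`) is safer than comparing them for exact equality.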
Your insights and experiences could greatly assist in enhancing my understanding and in finding a resolution to these inconsistent results. Thank you in advance for your time and assistance.
Best regards,