Clarification Needed: Differences Between Faiss IndexFlatIP Search Results and Cosine Similarity #3583

YoungjaeDev · 2024-01-04T02:10:23Z

YoungjaeDev
Jan 4, 2024

Summary

Platform

OS: Ubuntu20

Faiss version: latest

Installed from: source build

Faiss compilation options:

Running on:

[v] CPU
[v] GPU

Interface:

C++
[v] Python

Reproduction instructions

I am reaching out about some inconsistencies I've encountered while using Faiss for indexing and search. My primary concern is the discrepancy between the results returned by search on an IndexFlatIP index and those calculated with cosine similarity (both SciPy and scikit-learn) on the same dataset. The values from the Faiss search have a higher matching rate.

Specifically, there are occasional mismatches between the distance values (D) from cosine_similarity(np.expand_dims(face_embedding, axis=0), index_np) and those obtained from index.search. While these discrepancies aren't constant, they are noticeable in certain instances.

I am curious about how Faiss handles distance calculations and whether there is any additional preprocessing applied to feature vectors post L2-normalization within Faiss. Any clarification or additional information on this matter would be immensely helpful.

Your insights and experiences could greatly assist in enhancing my understanding and in finding a resolution to these inconsistent results. Thank you in advance for your time and assistance.

Best regards,

algoriddle · 2024-01-08T14:37:52Z

algoriddle
Jan 8, 2024
Collaborator

Faiss does not L2-normalize either the query or the database vectors. Have you done this normalization on the vectors yourself before adding them to the index and querying?
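A minimal sketch of the normalization step discussed here, written in plain NumPy so it runs without Faiss. The helper name l2_normalize and the random data are illustrative assumptions, not something from this thread. It shows that the raw inner product only equals cosine similarity once both sides have been L2-normalized. Faiss also ships faiss.normalize_L2, which normalizes a float32 array in place.

```python
import numpy as np


def l2_normalize(x):
    """Return a float32 copy of x with every row scaled to unit L2 norm."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return x / norms


rng = np.random.default_rng(0)
database = rng.random((5, 64), dtype=np.float32)  # illustrative data
query = rng.random((1, 64), dtype=np.float32)

# Inner product on raw vectors: what an inner-product index would score
# if the vectors were added without normalization.
raw_ip = query @ database.T

# Inner product on normalized vectors: this is the cosine similarity.
cosine = l2_normalize(query) @ l2_normalize(database).T

# Reference cosine similarity computed directly from its definition.
reference = raw_ip / (
    np.linalg.norm(query, axis=1, keepdims=True) * np.linalg.norm(database, axis=1)
)

print("raw inner products:", raw_ip.round(3))
print("cosine similarities:", cosine.round(3))
```

If the scores coming back from the index look like the first printed row rather than the second, the vectors were most likely added or queried without normalization.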

0 replies

YoungjaeDev · 2024-01-08T14:45:24Z

YoungjaeDev
Jan 8, 2024
Author

> Faiss does not L2-normalize either the query or the database vectors. Have you done this normalization on the vectors yourself before adding them to the index and querying?

Yes, I have indeed applied L2 normalization to the vectors before adding them to the index and querying.
My confusion arises from the fact that while some results from the Faiss search align perfectly with those obtained from my own inner product calculations, there are occasional discrepancies. This led me to wonder if Faiss implements any additional steps or calculations that might account for these differences.

Any insights into this would be greatly appreciated.
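The thread does not identify the cause of these occasional mismatches. One possibility worth ruling out, offered here as an assumption rather than a diagnosis, is floating-point precision: Faiss stores and searches float32 vectors, while NumPy and scikit-learn compute in float64 when given float64 input. The sketch below uses plain NumPy with made-up data to show the size of difference that precision alone produces, and compares with a tolerance instead of exact equality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unit-norm vectors in float64, as NumPy produces them by default
database64 = rng.random((100, 1024))
database64 /= np.linalg.norm(database64, axis=1, keepdims=True)
query64 = rng.random((1, 1024))
query64 /= np.linalg.norm(query64, axis=1, keepdims=True)

# The same vectors rounded to float32, the precision Faiss works in
database32 = database64.astype(np.float32)
query32 = query64.astype(np.float32)

ip64 = query64 @ database64.T  # float64 inner products
ip32 = query32 @ database32.T  # float32 inner products

max_abs_diff = np.max(np.abs(ip64 - ip32))
print(f"largest float32 vs float64 difference: {max_abs_diff:.2e}")

# Compare with a tolerance rather than with exact equality
values_agree = np.allclose(ip32, ip64, atol=1e-5)
print("agree within tolerance:", values_agree)
```

Differences of this size are expected and harmless. Differences that are visible in the first few decimal places point to something else, such as a missing normalization or a comparison between similarities and distances.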

0 replies

mlomeli1 · 2024-06-19T14:57:41Z

mlomeli1
Jun 19, 2024
Collaborator

Hi @YoungjaeDev, as @algoriddle mentioned, you need to normalise all vectors before constructing the index and use the inner product metric. Could you provide a toy example that reproduces the issue so that this is actionable?

0 replies

ThibaultDef · 2025-07-11T09:36:31Z

ThibaultDef
Jul 11, 2025

Hello here 👋

I am currently testing the flat index with inner product in order to retrieve the candidates with the lowest cosine distance.
Like @YoungjaeDev, I have normalized the vectors so that faiss.IndexFlatIP effectively works with cosine similarities under the hood.

I also observed that the distances it returns differ from those calculated with cosine_similarity from scikit-learn.

Here is a toy example to reproduce the issue:

import faiss
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Simulate candidate embeddings (Faiss works with float32 vectors)
test_embedings = np.random.rand(2, 1024).astype(np.float32)
norms = np.linalg.norm(test_embedings, axis=1, keepdims=True)
norms[norms == 0] = 1  # avoid division by zero
normalized_test_embedings = test_embedings / norms

# Index the simulated candidate embeddings
index_for_test = faiss.IndexFlatIP(1024)
index_for_test.add(normalized_test_embedings)

# Generate an L2-normalized input embedding
input_embedding = np.random.rand(1, 1024).astype(np.float32)
input_embedding = input_embedding / np.linalg.norm(input_embedding)

# Retrieve inner-product scores and indexes using faiss.IndexFlatIP
test_distance_flat_index_ip, test_indexes_flat_index_ip = index_for_test.search(
    input_embedding, test_embedings.shape[0]
)

# Compute cosine similarities with scikit-learn, sorted in descending order
test_cosine_similarities = cosine_similarity(input_embedding, normalized_test_embedings)
sorted_indices = np.argsort(test_cosine_similarities, axis=1)
test_indexes_sklearn = sorted_indices[:, ::-1]
cosine_ordered = np.take_along_axis(test_cosine_similarities, test_indexes_sklearn, axis=1)

# Convert the similarities to cosine distances
test_distances_sklearn = 1 - cosine_ordered

The indexes test_indexes_flat_index_ip and test_indexes_sklearn are the same, which is fortunately expected. However, you will see that test_distance_flat_index_ip and test_distances_sklearn differ significantly from each other.

How do you explain such a difference?

1 reply

ThibaultDef Jul 11, 2025

I have just found the source of the discrepancy. It seems to be either an error or a lack of documentation in Faiss.

It turns out that, in my toy example above, test_distance_flat_index_ip and 1 - test_distances_sklearn are the same. This means that Faiss returns the cosine similarities (the inner products of the normalized vectors) and not the cosine distances, because 1 - test_distances_sklearn = cosine_ordered.
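The relationship found above can be sketched as follows. With unit-norm vectors, an inner-product search returns similarities sorted from highest to lowest, and a cosine distance is obtained as 1 minus that score. The helper top_k_inner_product and the random data are illustrative assumptions. The helper uses faiss.IndexFlatIP when Faiss is installed and otherwise falls back to an equivalent NumPy brute-force search, so the conversion can be checked either way.

```python
import numpy as np

try:
    import faiss  # optional; the NumPy fallback below is used if it is missing
except ImportError:
    faiss = None


def top_k_inner_product(database, queries, k):
    """Return (scores, ids) of the k largest inner products for each query."""
    if faiss is not None:
        index = faiss.IndexFlatIP(database.shape[1])
        index.add(database)
        return index.search(queries, k)
    # Brute-force fallback with the same output convention
    scores = queries @ database.T
    ids = np.argsort(-scores, axis=1)[:, :k]
    return np.take_along_axis(scores, ids, axis=1), ids


rng = np.random.default_rng(0)
database = rng.random((4, 32), dtype=np.float32)
database /= np.linalg.norm(database, axis=1, keepdims=True)
queries = rng.random((1, 32), dtype=np.float32)
queries /= np.linalg.norm(queries, axis=1, keepdims=True)

scores, ids = top_k_inner_product(database, queries, k=4)

# Scores are cosine similarities: higher means closer
cosine_distances = 1.0 - scores

# For unit vectors, squared L2 distance is also a function of the score:
# ||q - x||^2 = 2 - 2 * <q, x>
squared_l2 = 2.0 - 2.0 * scores

print("similarities:    ", scores.round(4))
print("cosine distances:", cosine_distances.round(4))
```

Keeping the similarity and the distance in separately named variables makes it harder to compare one against the other by accident.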
