CAGRA vs FAISS HNSW — concurrency vs QPS

Peak GPU QPS: 32,229 at concurrency 32

Peak CPU QPS: 10,005 at concurrency 16

Peak speedup: 3.2× (GPU vs CPU at peak)

Recall (approx): 93% (both indices, k=10)
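
The peak speedup figure follows directly from the two peak QPS values above. A minimal sketch of that arithmetic (the variable names are illustrative, not taken from the benchmark code):

```python
# Peak throughput figures as reported in the summary above.
peak_gpu_qps = 32_229  # CAGRA, reached at concurrency 32
peak_cpu_qps = 10_005  # FAISS HNSW, reached at concurrency 16

# Ratio of the two peaks. The peaks occur at different concurrency
# levels, so this compares best case against best case.
speedup = peak_gpu_qps / peak_cpu_qps
print(f"Peak speedup: {speedup:.1f}x")
```

Because each side is taken at its own best concurrency level, the ratio at any single shared concurrency level may differ from this value.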

CAGRA: NVIDIA L4 GPU (24 GB)

FAISS HNSW: AMD EPYC 7R13 (64 vCPU)

1M vectors · dim=1024 · k=10 · bs=1 · nn_descent build


Concurrency vs throughput chart comparing CAGRA GPU search and FAISS HNSW CPU search, showing QPS at batch size 1 across concurrency levels 1 to 128
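
For readers who want to reproduce this kind of curve, here is a rough sketch of how a concurrency sweep at batch size 1 might be structured. It is not the harness used for the numbers above: `fake_search` is a hypothetical stand-in for a real single-query index search, the power-of-two concurrency levels are an assumption (the chart only states a range of 1 to 128), and a thread pool is just one of several ways to generate concurrent load.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_search(query):
    """Hypothetical stand-in for one single-query (bs=1) index search."""
    time.sleep(0.001)  # simulated search latency, not a measured value
    return [0] * 10    # placeholder for k=10 neighbor ids


def measure_qps(search_fn, concurrency, n_queries=200):
    """Issue n_queries single-query searches from `concurrency` threads
    and return the achieved queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Each task is one query, so the batch size stays at 1.
        list(pool.map(search_fn, range(n_queries)))
    elapsed = time.perf_counter() - start
    return n_queries / elapsed


# Assumed sweep: powers of two from 1 to 128.
levels = [1, 2, 4, 8, 16, 32, 64, 128]
results = {c: measure_qps(fake_search, c) for c in levels}

for c, qps in results.items():
    print(f"concurrency={c:>3}  qps={qps:,.0f}")
```

With a real index, `fake_search` would be replaced by the actual search call, and the peak QPS is simply the largest value in `results`.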