Figure: Concurrency vs. throughput for CAGRA GPU search and FAISS HNSW CPU search, showing QPS at batch size 1 across concurrency levels 1 to 128.
Peak GPU QPS: 32,229 (at concurrency 32)
Peak CPU QPS: 10,005 (at concurrency 16)
Peak speedup: 3.2× (GPU vs. CPU at peak)
Recall (approx.): 93% (both indices, k=10)
GPU series: CAGRA on NVIDIA L4 GPU (24 GB)
CPU series: FAISS HNSW on AMD EPYC 7R13 (64 vCPU)
Setup: 1M vectors, dim=1024, k=10, batch size 1, nn_descent build
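
The reported peak speedup can be checked directly from the two peak QPS figures. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not taken from the benchmark code.

```python
# Peak throughput values as reported in the chart summary.
peak_gpu_qps = 32_229  # CAGRA on the L4 GPU, reached at concurrency 32
peak_cpu_qps = 10_005  # FAISS HNSW on the EPYC CPU, reached at concurrency 16


def peak_speedup(gpu_qps: float, cpu_qps: float) -> float:
    """Return the ratio of peak GPU throughput to peak CPU throughput."""
    if cpu_qps <= 0:
        raise ValueError("cpu_qps must be positive")
    return gpu_qps / cpu_qps


speedup = peak_speedup(peak_gpu_qps, peak_cpu_qps)
print(f"Peak speedup: {speedup:.1f}x")  # prints "Peak speedup: 3.2x"
```

Note that the two peaks occur at different concurrency levels (32 for the GPU, 16 for the CPU), so this ratio compares each system at its own best point rather than at a shared concurrency setting.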