Concurrency vs throughput chart comparing CAGRA GPU search and FAISS HNSW CPU search, showing QPS at batch size 1 across concurrency levels 1 to 128

Peak GPU QPS: 32,229 (at concurrency 32)
Peak CPU QPS: 10,005 (at concurrency 16)
Peak speedup: 3.2× (GPU vs CPU at peak)
Recall: approx. 93% (both indices, k=10)
CAGRA: NVIDIA L4 GPU (24 GB)
FAISS HNSW: AMD EPYC 7R13 (64 vCPU)
1M vectors · dim=1024 · k=10 · bs=1 · nn_descent build
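
The 3.2× speedup is consistent with dividing the two peak QPS values, which occur at different concurrency levels (32 for the GPU, 16 for the CPU). The following is a minimal sketch of that arithmetic. It assumes the speedup is the ratio of the two peaks rather than a comparison at the same concurrency, and the function name `peak_speedup` is illustrative, not taken from the benchmark code.

```python
def peak_speedup(gpu_peak_qps: float, cpu_peak_qps: float) -> float:
    """Return the ratio of peak GPU throughput to peak CPU throughput.

    Assumption: each peak is taken at its own best concurrency level,
    so the two values need not come from the same concurrency setting.
    """
    if cpu_peak_qps <= 0:
        raise ValueError("cpu_peak_qps must be positive")
    return gpu_peak_qps / cpu_peak_qps


# Peak values reported in the chart summary.
speedup = peak_speedup(32_229, 10_005)
print(f"{speedup:.1f}x")  # 3.2x
```

A same-concurrency comparison could give a different ratio, since the CPU index peaks at concurrency 16 while the GPU index peaks at 32.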