FAISS vs. The World: Why We Still Build on Facebook's Foundation

After evaluating every major vector search system, we chose to build on FAISS. Here's why.

David Gornshtein · December 2025 · 16 min read

I always struggle a bit when people ask why we built on FAISS instead of using one of the shiny new vector databases. After all, Pinecone exists. Milvus is mature. Weaviate has a beautiful API. Why would anyone in their right mind choose to work with a C++ library designed for research at Facebook?

The short answer is: performance. The long answer is... also performance, but with more nuance. Let me explain by telling you what happened when we benchmarked everything.

The Great Benchmark (2023)

When we started building our code intelligence platform, we needed vector search that could handle tens of millions of code embeddings with sub-10ms latency. This isn't an unusual requirement—it's table stakes for production. So we did what any responsible engineering team would do: we benchmarked everything.

The test was simple. 10 million 768-dimensional vectors (code embeddings from our training corpus). 1000 queries. K=10 nearest neighbors. Measured latency at p50, p95, and p99. Also measured recall against brute-force ground truth. Standard stuff.

Benchmark Results: 10M Vectors, 768d, K=10

| System | p50 Latency | p99 Latency | Recall@10 | RAM Usage |
|---|---|---|---|---|
| FAISS (IVF4096,PQ32) | 2.3ms | 8.1ms | 92.4% | 12GB |
| Milvus (IVF_PQ) | 5.8ms | 18.2ms | 91.8% | 14GB |
| Qdrant (HNSW) | 4.2ms | 12.6ms | 95.1% | 28GB |
| Weaviate (HNSW) | 5.1ms | 15.3ms | 94.7% | 32GB |
| Pinecone (p2 pod) | 8.4ms | 24.1ms | 93.2% | N/A (managed) |
| pgvector (HNSW) | 42.3ms | 89.7ms | 94.9% | 35GB |

Tests run on AMD EPYC 7543 (32 cores), 256GB RAM, NVMe SSD. All systems configured for comparable recall. FAISS used nprobe=64, others used recommended settings.
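
If you want to reproduce the shape of this test yourself, here's a minimal sketch of the FAISS side of the harness. It uses the same index string and nprobe as the table, but with synthetic data and scaled-down sizes so it runs on a laptop (the real run used 10M of our code embeddings, so absolute numbers won't match):

```python
import time
import numpy as np
import faiss

d, k = 768, 10
nb, nq = 200_000, 1000          # scaled down from the 10M-vector benchmark
xb = np.random.rand(nb, d).astype("float32")   # stand-in for real embeddings
xq = np.random.rand(nq, d).astype("float32")

index = faiss.index_factory(d, "IVF4096,PQ32")  # same index string as above
index.train(xb)                                 # IVF-PQ needs a training pass
index.add(xb)
faiss.extract_index_ivf(index).nprobe = 64      # the nprobe used in the table

lat = []
for q in xq:                    # single-query loop to measure per-query latency
    t0 = time.perf_counter()
    index.search(q[None, :], k)
    lat.append((time.perf_counter() - t0) * 1e3)
lat.sort()
print(f"p50={lat[nq // 2]:.2f}ms  p99={lat[int(nq * 0.99)]:.2f}ms")
```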

Look at those numbers. FAISS isn't just faster—it's significantly faster. The p99 latency difference means that under load, FAISS serves requests in 8ms while others spike to 15-25ms. For interactive applications, that's the difference between "snappy" and "sluggish."

And this is FAISS with the default OpenMP parallelism, the basic IVF-PQ index, and none of our optimizations. Raw, off-the-shelf FAISS still beats systems that have been tuned for production. That tells you something about the underlying engineering.

Why FAISS is Different: First Principles

Here's the thing about FAISS that most people don't appreciate: it wasn't built to be a product. It was built to be the fastest possible implementation of similarity search algorithms. No compromises for usability. No abstractions for extensibility. Just raw, brutal performance.

The FAISS team at Meta Research did things that product engineers wouldn't do. They wrote custom SIMD kernels in assembly. They optimized cache utilization to the byte level. They restructured algorithms to minimize memory bandwidth, because at scale, memory bandwidth is the bottleneck, not compute.

The SIMD Advantage

FAISS includes hand-tuned SIMD implementations for distance calculations. Not just "use AVX intrinsics" as an afterthought: the kernels are hand-arranged per instruction set (SSE, AVX2, AVX-512) so the instruction schedule keeps the vector units saturated on each architecture.

For L2 distance on AVX-512, FAISS processes 16 floats per instruction with near-perfect pipeline utilization. Most vector databases use generic implementations that achieve maybe 40% of theoretical throughput. That 2.5x difference compounds across millions of distance calculations.
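
You can see the effect from Python without touching C++. The sketch below pins FAISS to a single thread and compares its exact flat search against a generic per-query NumPy scan over the same data. It's a rough illustration rather than a rigorous microbenchmark (FAISS's flat path also benefits from blocked BLAS calls), but the gap is hard to miss:

```python
import time
import numpy as np
import faiss

d, nb, nq, k = 768, 100_000, 100, 10
xb = np.random.rand(nb, d).astype("float32")
xq = np.random.rand(nq, d).astype("float32")

faiss.omp_set_num_threads(1)    # single thread, to isolate the kernels
index = faiss.IndexFlatL2(d)    # exact search, vectorized distance kernels
index.add(xb)

t0 = time.perf_counter()
index.search(xq, k)
t_faiss = time.perf_counter() - t0

t0 = time.perf_counter()
for q in xq:                    # generic per-query NumPy scan
    d2 = ((xb - q) ** 2).sum(axis=1)
    np.argpartition(d2, k)[:k]
t_numpy = time.perf_counter() - t0
print(f"FAISS flat: {t_faiss:.3f}s   NumPy scan: {t_numpy:.3f}s")
```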

The Algorithm Zoo

FAISS doesn't commit to one indexing strategy. It supports a menagerie of algorithms, each optimized for different tradeoffs:

IVF (Inverted File)

Cluster vectors, search only relevant clusters. Classic partition-based approach. Best for disk-based search and large datasets.

+ Low memory, scalable
- Requires tuning nprobe

HNSW (Hierarchical NSW)

Graph-based navigation. Build a navigable small world graph, traverse it during search. Best for high recall requirements.

+ High recall, fast
- Memory intensive, slow updates

PQ (Product Quantization)

Compress vectors to a few bytes each. Search in compressed domain. Memory reduction up to 32x with minimal recall loss.

+ Extreme compression
- Training required, some recall loss

Composite Indexes

Combine strategies: IVF for partitioning, PQ for compression, HNSW for fast coarse quantization. FAISS lets you mix and match.

+ Flexible tradeoffs
- Complex tuning

Most vector databases pick one or two algorithms and optimize around them. FAISS gives you the full toolkit. Want IVF with PQ codes and HNSW coarse quantization? Build it. Want flat indexes for small datasets and switch to IVF at scale? Supported. This flexibility means you can always find the right tradeoff for your specific workload.
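
The index_factory string syntax is how you mix and match in practice. Here's a small sketch; the specific combinations below are illustrative choices, not our benchmark configurations:

```python
import numpy as np
import faiss

d = 768
xb = np.random.rand(50_000, d).astype("float32")

# Each factory string composes the building blocks described above.
for desc in [
    "Flat",                 # exact brute force, fine for small datasets
    "HNSW32,Flat",          # graph index: high recall, memory hungry
    "IVF1024,PQ32",         # partitioned + compressed: low memory
    "IVF1024_HNSW32,PQ32",  # HNSW as the coarse quantizer for IVF, PQ codes
]:
    index = faiss.index_factory(d, desc)
    if not index.is_trained:
        index.train(xb)     # IVF and PQ variants need a training pass
    index.add(xb)
    print(desc, "->", index.ntotal, "vectors indexed")
```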

The GPU Story: Where FAISS Really Shines

We haven't even talked about GPU search yet. FAISS has first-class GPU support—not "GPU acceleration" as a marketing checkbox, but actual optimized GPU kernels that can search billions of vectors in milliseconds.

The GPU implementation is remarkably sophisticated. It handles multi-GPU search with sharding. It has specialized kernels for different GPU generations. It can mix CPU and GPU computation for hybrid workloads. And crucially, it was written by people who actually understand GPU programming, not people who wrapped cuBLAS and called it done.

GPU vs CPU: 1B Vectors, 128d

| Configuration | Batch Size | Latency | Throughput |
|---|---|---|---|
| FAISS CPU (IVF4096,PQ64) | 1 | 12.3ms | 81 QPS |
| FAISS GPU (IVF4096,PQ64) - A100 | 1 | 0.8ms | 1,250 QPS |
| FAISS CPU (IVF4096,PQ64) | 100 | 89ms | 1,123 QPS |
| FAISS GPU (IVF4096,PQ64) - A100 | 100 | 3.2ms | 31,250 QPS |

31,000 queries per second on a billion vectors. On a single GPU. That's the kind of performance that lets you build interactive applications at scales that would be impossible otherwise.
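
Getting an index onto the GPU is a few lines from Python, assuming a faiss-gpu build and a visible CUDA device. A scaled-down sketch of the configuration above (the 1B-vector runs used much larger builds):

```python
import numpy as np
import faiss  # requires a faiss-gpu build and a CUDA device

d, k = 128, 10
nb, nq = 1_000_000, 10_000      # scaled down from the 1B-vector runs above
xb = np.random.rand(nb, d).astype("float32")
xq = np.random.rand(nq, d).astype("float32")

cpu_index = faiss.index_factory(d, "IVF4096,PQ64")  # same string as the table
cpu_index.train(xb)
cpu_index.add(xb)

# Clone the index onto GPU 0; index_cpu_to_all_gpus() shards across devices.
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)
faiss.GpuParameterSpace().set_index_parameter(gpu_index, "nprobe", 64)

D, I = gpu_index.search(xq, k)  # big batches are where the GPU pays off
```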

The Downsides: Why We Built FAISS Extended

I've been gushing about FAISS, but let me be honest about its problems. Because we spent significant engineering effort fixing them.

The Production Gap

  1. No Updates: FAISS indexes are effectively append-only. Updating a vector means deleting and re-adding, which for IVF means potentially moving it to a different partition. Worse, there's no efficient way to delete—just mark as deleted and hope you rebuild someday. (The sketch after this list shows the stock workaround.)
  2. OpenMP Limitations: FAISS uses OpenMP for parallelism, which is fine for batch operations but doesn't compose well. Try to combine FAISS with other parallel workloads and you get thread oversubscription.
  3. Memory-Only Focus: The on-disk support (OnDiskInvertedLists) is an afterthought. No caching strategies. No prefetching. No async I/O.
  4. No Metadata: FAISS stores vectors, not documents. Filter by metadata? Pre-filter your candidate set, then search. Post-filter results? Hope you retrieved enough candidates.
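
As promised in point 1, here's what the update workaround looks like against stock FAISS. A minimal sketch with toy sizes:

```python
import numpy as np
import faiss

d = 64
xb = np.random.rand(10_000, d).astype("float32")
ids = np.arange(10_000, dtype="int64")

index = faiss.index_factory(d, "IVF256,Flat")
index.train(xb)
index.add_with_ids(xb, ids)

# "Updating" vector 42 in stock FAISS: remove it, then re-add it. remove_ids
# scans the inverted lists (O(n)), and the re-added vector may land in a
# different partition, which is the limitation described in point 1.
index.remove_ids(faiss.IDSelectorBatch(np.array([42], dtype="int64")))
index.add_with_ids(np.random.rand(1, d).astype("float32"),
                   np.array([42], dtype="int64"))
print(index.ntotal)  # back to 10,000 vectors
```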

These limitations are why vector databases exist. Pinecone, Milvus, and friends solve these problems. They wrap similar algorithms in production-ready infrastructure. But—and this is the key insight—they often pay a 2-3x performance penalty in the process.

Our bet was different: what if we could add the production capabilities while keeping the FAISS performance? What if we could have our cake and eat it too?

FAISS Extended: Best of Both Worlds

That's what FAISS Extended is. We took the FAISS core—all those beautiful SIMD kernels and optimized algorithms—and built the production layer ourselves. Not a wrapper. Not an abstraction. Direct modifications to the FAISS codebase that add capabilities without sacrificing performance.

TBB Parallelism

We replaced OpenMP with Intel TBB for fine-grained parallelism. TBB's work-stealing scheduler handles load imbalance gracefully. Composable parallel regions let us nest FAISS search inside larger parallel workflows. The result: 2.8x better throughput under mixed workloads compared to stock FAISS.
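
The TBB layer itself is C++ and internal to FAISS Extended, but the stock-FAISS problem it solves is easy to reproduce from Python: nest FAISS calls inside an application thread pool and every search spawns its own OpenMP team. The usual mitigation, capping OpenMP globally, is exactly the kind of blunt instrument that TBB's composable scheduling makes unnecessary:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import faiss

d, k = 128, 10
xb = np.random.rand(100_000, d).astype("float32")
index = faiss.IndexFlatL2(d)
index.add(xb)

# Each search() call spawns an OpenMP team. Inside an 8-worker request pool,
# that means 8 * num_omp_threads runnable threads fighting for the same cores.
# The blunt fix with stock FAISS is to cap OpenMP before nesting:
faiss.omp_set_num_threads(1)

def handle_request(q):
    _, ids = index.search(q[None, :], k)
    return ids

queries = np.random.rand(64, d).astype("float32")
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(handle_request, queries))
```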

In-Place Updates and Deletes

Our OnDiskInvertedListsTBB class supports atomic updates. Update a vector by ID—it finds the right inverted list, modifies the entry in place, invalidates caches, done. Deletes work similarly with optional compaction. No more "rebuild the entire index" for routine operations.

Smart On-Disk Storage

Multi-layer caching with adaptive policies. Prefetching based on access patterns. Thread-safe I/O using pread() and pwrite(). The boring engineering that makes disk-based search actually fast. We went from 12ms p50 latency on disk to 4.8ms—competitive with in-memory search for most workloads.
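
OnDiskInvertedListsTBB itself is part of our codebase, so here's just the core primitive, sketched in Python: positioned reads via pread() take an explicit offset, so concurrent readers never race on a shared file cursor the way seek()+read() would. The file layout below is hypothetical bookkeeping for illustration, not FAISS's on-disk format:

```python
import os
import numpy as np

# Toy file standing in for serialized codes (hypothetical layout:
# fixed-size 32-byte codes, one inverted list starting at offset 0).
codes = np.random.randint(0, 256, size=(128, 32), dtype=np.uint8)
with open("lists.bin", "wb") as f:
    f.write(codes.tobytes())

def read_inverted_list(fd, offset, n_codes, code_size):
    # os.pread takes an explicit offset: no shared file cursor, so any
    # number of threads can issue reads on the same fd without locking.
    raw = os.pread(fd, n_codes * code_size, offset)
    return np.frombuffer(raw, dtype=np.uint8).reshape(n_codes, code_size)

fd = os.open("lists.bin", os.O_RDONLY)
assert (read_inverted_list(fd, 0, 128, 32) == codes).all()
os.close(fd)
```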

Sorted Inverted Lists

Our secret weapon. We store vectors sorted by distance to their cluster centroid. During search, we can early-terminate when remaining candidates can't beat our current best. For high-selectivity queries, this cuts search time by 40-60%. Pure algorithmic optimization, compatible with all other FAISS features.
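
The layout details are ours, but the early-termination idea is plain triangle inequality, and a sketch fits in a few lines. Assume one inverted list whose entries are sorted by distance r to the list centroid c; then d(q, x) >= |d(q, c) - r|, and once that lower bound exceeds the best distance found so far, nothing later in the list can win:

```python
import numpy as np

def search_sorted_list(q, centroid, entries, radii, best=np.inf):
    """Scan one inverted list sorted by ascending distance-to-centroid.

    entries: (n, d) vectors; radii: (n,) distances to centroid, ascending.
    Returns the best (smallest) distance seen, with early termination.
    """
    dqc = np.linalg.norm(q - centroid)
    for x, r in zip(entries, radii):
        # Triangle inequality: d(q, x) >= |dqc - r|. Radii are ascending,
        # so once r - dqc exceeds `best`, every remaining bound does too.
        if r - dqc > best:
            break  # early termination
        best = min(best, float(np.linalg.norm(q - x)))
    return best

# Toy usage: points clustered around a centroid, pre-sorted by radius.
rng = np.random.default_rng(0)
c = rng.random(32).astype("float32")
xs = (c + 0.1 * rng.standard_normal((1000, 32))).astype("float32")
radii = np.linalg.norm(xs - c, axis=1)
order = np.argsort(radii)
q = rng.random(32).astype("float32")
print(search_sorted_list(q, c, xs[order], radii[order]))
```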

Benchmark: FAISS Extended vs. The Competition

Same test as before. 10 million vectors, 768 dimensions, K=10 nearest neighbors. But now we include FAISS Extended with our optimizations.

Updated Benchmark: 10M Vectors, 768d, K=10

| System | p50 Latency | p99 Latency | Updates/sec | Notes |
|---|---|---|---|---|
| FAISS Extended (Ours) | 1.8ms | 5.2ms | 12,000 | TBB + Sorted Lists + Caching |
| FAISS (Stock) | 2.3ms | 8.1ms | N/A | No native updates |
| Milvus 2.x | 5.8ms | 18.2ms | 3,500 | Distributed support |
| Qdrant | 4.2ms | 12.6ms | 5,200 | Great filtering |
| Weaviate | 5.1ms | 15.3ms | 4,100 | Built-in embeddings |

FAISS Extended matches the best in class on latency AND supports 12,000 updates per second. That's the production capability of a modern vector database with the raw performance of FAISS. This is what happens when you add features instead of adding abstraction.

When to Use What

I'm not going to claim FAISS Extended is the right choice for everyone. Different problems require different solutions. Here's our honest assessment:

Use FAISS Extended When:

  • Raw latency is critical (under 5ms required)
  • You need GPU acceleration at scale
  • Custom index configurations are needed
  • You have C++ expertise on the team
  • On-prem deployment is required

Consider Alternatives When:

  • Managed service is strongly preferred
  • Multi-region replication is critical
  • Complex metadata filtering is primary use case
  • Team prefers Python/REST over C++
  • Under 1M vectors (pgvector is fine)

The Bottom Line

FAISS is still the foundation of high-performance vector search. The core algorithms and implementations are unmatched. What was missing was the production layer—updates, deletes, caching, composable parallelism.

We built FAISS Extended because we needed both: FAISS-level performance AND production-grade operations. Two years and a lot of C++ later, we have it. The benchmarks speak for themselves.

Is it more work than using a managed service? Yes. Is it worth it when you need the performance? Absolutely. That's the trade-off. We made our choice. Now you can make yours.

Ready to Try FAISS Extended?

FAISS Extended is available as part of our enterprise offering, with full source code access and support. We also offer consulting for teams migrating from other vector databases.

Written by David Gornshtein, CTO at WEB WISE HOUSE LTD. David benchmarks vector databases for fun, which tells you everything you need to know about his social life. He's available for consulting, technical arguments, and spirited debates about whether HNSW or IVF is better (the answer is "it depends," which means everyone loses).