The Birth of Vector Search: From Euclidean Dreams to FAISS Reality
How Facebook's internal needs birthed vector search, and the wild west before dedicated vector databases
Here's a weird thing about vector search: it was born not from some grand academic vision of semantic understanding, but from Facebook engineers staring at a very specific, very annoying problem. They had billions of photos. People kept uploading new ones. And somewhere in that haystack, they needed to find near-duplicates. Fast.
This is the story of how that narrow operational need sparked a revolution we're still riding today. It involves some brilliant math, a lot of engineering pragmatism, and the kind of "we'll figure it out as we go" energy that defines most important infrastructure.
The Problem: Billions of Vectors, No Good Answers

[Figure: the naive approach checks every vector against every query. At billion scale, that's... not going to work.]
Let me paint the picture. It's 2015, a few years after deep learning conquered ImageNet. Everyone realizes you can take an image, shove it through a neural network, and get a vector out the other end. That vector captures something about the image's "meaning"—similar images produce similar vectors. Beautiful.
Except now you have a billion of these vectors. And when a new image comes in, you need to find the most similar ones. Like, right now. Users are waiting.
The naive approach? Compare your query vector to all billion stored vectors, compute distances, sort, return top-k. This is called brute force or exact search. It's elegant. It's correct. And at billion scale, it takes approximately forever.
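To make that concrete, here is a minimal NumPy sketch of brute-force search. The array sizes are purely illustrative; the real problem is doing this against a billion vectors, thousands of times per second.

```python
import numpy as np

# Illustrative sizes; the real workload is a billion vectors, not a hundred thousand.
num_vectors, dim, k = 100_000, 512, 10
database = np.random.rand(num_vectors, dim).astype("float32")
query = np.random.rand(dim).astype("float32")

# Exact search: compute every distance, then take the k smallest.
# The cost is O(num_vectors * dim) per query, which is what stops scaling.
distances = np.linalg.norm(database - query, axis=1)
top_k = np.argsort(distances)[:k]
print(top_k, distances[top_k])
```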
"We were computing cosine similarities against billions of embeddings. Per query. The math worked perfectly. The electricity bill... less so."
This wasn't a research problem anymore. This was an engineering emergency. Facebook needed something that could search billions of vectors in milliseconds. And it needed it yesterday.
First Principles: Why Is This Even Hard?
Let's start from the absolute basics, because I think a lot of people skip this and then get confused later. Why can't we just... index vectors like we index text?
With text, you have discrete tokens. The word "cat" either appears in a document or it doesn't. You build an inverted index: for each word, store a list of documents containing it. When someone searches for "cat," you look up the list. Done. O(1) lookup.
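As a point of contrast, here is roughly what that looks like in code. The documents are made up, but the structure is the whole story: one dictionary lookup per query term, no scanning.

```python
from collections import defaultdict

# Toy corpus, purely for illustration.
docs = {1: "the cat sat", 2: "the dog ran", 3: "a cat and a dog"}

# Build: for each token, record which documents contain it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for token in text.split():
        inverted[token].add(doc_id)

# Query: a single dictionary lookup, no scanning of documents.
print(inverted["cat"])   # {1, 3}
```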
Vectors don't work like that. A 512-dimensional vector isn't discrete—it's a point floating in continuous space. "Similar" vectors aren't identical; they're close. But "close" depends on your distance metric, and there's no magical hash function that maps nearby points to the same bucket.

[Figure: in high dimensions, the ratio between the nearest and farthest points approaches 1. Everything looks equally far away.]
And here's the real kicker: the curse of dimensionality. In low dimensions—2D, 3D—we have intuitive spatial structures. Trees, grids, Voronoi diagrams. They work because nearby points stay nearby as you partition space.
In 512 dimensions? Those structures fall apart. As dimensionality grows, the ratio between the distance to your nearest neighbor and the distance to your farthest point approaches 1. Everything is roughly the same distance from everything else. Your spatial intuitions don't just bend—they shatter.
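You can watch this happen with a few lines of NumPy. The sketch below uses uniform random points, a simplifying assumption since real embeddings are not uniform, but the trend is the same: the nearest-to-farthest distance ratio creeps toward 1 as dimensionality grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ratio of nearest to farthest distance from a random query point,
# for uniform random data of increasing dimensionality.
for dim in (2, 16, 128, 512):
    points = rng.random((10_000, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    print(dim, round(dists.min() / dists.max(), 3))   # approaches 1.0 as dim grows
```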
This is why nearest neighbor search in high dimensions was considered an almost hopeless problem for decades. The theoretical lower bounds were discouraging. Every known algorithm hit exponential walls.
The Insight That Changed Everything: Approximate Is Fine
The breakthrough wasn't a new algorithm. It was a mindset shift.
What if we don't need exact nearest neighbors? What if "very close" is good enough? After all, when you're searching for similar photos, does it matter if you find the 3rd most similar instead of the 1st? Users can't tell the difference. The application works fine.
This is the core insight behind Approximate Nearest Neighbor (ANN) search: trade a tiny bit of accuracy for massive speed improvements. And I mean massive. We're talking 1000x faster. Sometimes 10000x.
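The usual way to quantify that trade is recall@k: what fraction of the true top-k neighbors did the approximate search actually return? A minimal sketch, where `approx_ids` and `exact_ids` are placeholders for whatever your ANN index and a brute-force pass produce:

```python
import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    """Fraction of the true top-k neighbors found by the approximate search.
    Both arrays have shape (num_queries, k) and contain vector ids."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_ids, exact_ids))
    return hits / exact_ids.size
```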
"The best algorithm is the one that gets you close enough, fast enough, cheap enough."
The Facebook team—later to become the creators of FAISS—took this philosophy and ran with it. They weren't building a research prototype. They were building production infrastructure. If 99% recall saved 99% of compute, that was an easy trade.
FAISS: The Swiss Army Knife That Started It All

[Figure: FAISS combined multiple techniques: IVF for partitioning, PQ for compression, and GPU acceleration for raw speed.]
FAISS (Facebook AI Similarity Search) was released in 2017, but its internals had been battle-tested on Facebook's production workloads for years. It wasn't one algorithm—it was a library of composable building blocks:
Inverted File Index (IVF): Divide and Conquer
The first trick: don't search everything. Cluster your vectors into groups (called cells or partitions). At query time, figure out which cluster the query belongs to, and only search vectors in that cluster (and maybe a few nearby clusters).
Think of it like a library. Instead of searching every book, you first go to the right section (fiction, non-fiction, science), then search within that section. The "section" is determined by training a clustering algorithm (usually k-means) on your data.
With 1000 clusters and 1 billion vectors, each cluster has about 1 million vectors. If you only search 10 clusters per query, you've reduced your workload by 100x. And because cluster centroids capture data distribution, you're unlikely to miss good matches.
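In FAISS this looks roughly like the sketch below. The data is random and the parameters (nlist, nprobe) are illustrative; in practice you tune them against your own recall and latency targets.

```python
import faiss
import numpy as np

d, nlist = 512, 1000                      # vector dimension, number of clusters
xb = np.random.rand(100_000, d).astype("float32")   # stand-in for your embeddings
xq = np.random.rand(10, d).astype("float32")        # a batch of queries

quantizer = faiss.IndexFlatL2(d)          # assigns vectors to their nearest centroid
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                           # runs k-means on the data to place centroids
index.add(xb)

index.nprobe = 10                         # search 10 of the 1000 cells per query
distances, ids = index.search(xq, 10)
```

The nprobe knob is the recall/speed dial: searching more cells finds more of the true neighbors but does more work per query.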
Product Quantization (PQ): Compression That Matters
The second trick: compress your vectors. A 512-dimensional float32 vector takes 2048 bytes. At a billion vectors, that's 2 terabytes just for storage. Plus, loading that much data for distance computation kills your memory bandwidth.
Product Quantization splits each vector into subvectors (say, 8 subvectors of 64 dimensions each), then quantizes each subvector to one of 256 centroids. Now your 2048-byte vector is just 8 bytes—one centroid ID per subvector. 256x compression.
Distance computation becomes table lookups: precompute distances from the query to all centroids, then sum up the appropriate table entries. Blazing fast. And because you trained the quantizer on real data, the approximation error is small for the distances you actually care about.
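Combined with the IVF layer above, a sketch might look like this. The split into m = 8 subvectors with 8-bit codes mirrors the numbers in the text, but both are tunable.

```python
import faiss
import numpy as np

d, nlist, m, nbits = 512, 1000, 8, 8      # 8 subvectors, 2^8 = 256 centroids each
xb = np.random.rand(100_000, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(xb)                           # learns the IVF centroids and the PQ codebooks
index.add(xb)                             # stores 8-byte codes instead of 2048-byte vectors

index.nprobe = 10
xq = np.random.rand(5, d).astype("float32")
distances, ids = index.search(xq, 10)     # distances are approximate, computed from the codes
```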
GPU Acceleration: Because CPUs Are Slow
FAISS was among the first to take GPU vector search seriously. The operations—matrix multiplications, parallel distance computations, top-k selection—are embarrassingly parallel. GPUs eat them for breakfast.
The FAISS GPU backend could handle 10,000+ queries per second on a single GPU. For Facebook's scale, this wasn't optional—it was essential. And it raised the bar for everyone else.
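Moving an index onto a GPU is a small amount of code, assuming the faiss-gpu build and a CUDA device are available; the data and index type here are illustrative.

```python
import faiss
import numpy as np

d = 512
xb = np.random.rand(100_000, d).astype("float32")
xq = np.random.rand(1_000, d).astype("float32")

cpu_index = faiss.IndexFlatL2(d)
cpu_index.add(xb)

res = faiss.StandardGpuResources()                      # GPU scratch memory and streams
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)   # copy the index to GPU 0
distances, ids = gpu_index.search(xq, 10)               # the whole batch runs on the GPU
```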
The Wild West Era: Before Vector Databases
Here's what people forget: from 2017 to about 2020, there were no "vector databases." There was FAISS. And there was rolling your own.
FAISS gave you the algorithms, but not the infrastructure. You want durability? Figure it out. Replication? Good luck. Dynamic inserts without reindexing? Ha. CRUD operations? What's that?

[Figure: a typical 2018 vector search setup. The "Scheduled Reindex" box was usually a cron job and a prayer.]
I remember the stack everyone cobbled together circa 2018 (roughly the workflow sketched after this list):
- Store embeddings in PostgreSQL (as float arrays) or S3 (as NumPy files)
- Build FAISS index offline, periodically (every night? every hour?)
- Serialize index to disk, ship to search servers
- Pray nothing crashes during the reload
- Handle the inevitable OOM errors when index grows too big
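For the curious, the rebuild-and-ship loop looked something like this sketch. The paths and parameters are hypothetical; the shape of the workflow is the point.

```python
import faiss
import numpy as np

# Hypothetical nightly rebuild job, circa 2018.
def rebuild_index(embeddings_path: str, index_path: str) -> None:
    xb = np.load(embeddings_path).astype("float32")   # embeddings dumped from the source of truth
    d = xb.shape[1]
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFPQ(quantizer, d, 1024, 8, 8)
    index.train(xb)
    index.add(xb)
    faiss.write_index(index, index_path)              # serialize, then ship the file to search servers

# On the serving side, each server reloads the whole file and hopes the reload goes smoothly.
def load_index(index_path: str) -> faiss.Index:
    return faiss.read_index(index_path)
```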
It was fragile. It was hacky. It worked, sort of, if you were careful. But it was obvious that something better was needed. The gap between "FAISS does fast search" and "I have a production-ready vector search system" was enormous.
The Euclidean Assumption (And When It Breaks)
Something that bothered me during this era—and still does—is how casually everyone assumed Euclidean distance was The Right Metric. Or cosine similarity, which amounts to Euclidean distance on vectors normalized to unit length.
The thing is, Euclidean distance makes a very specific assumption: your embedding space is flat. Differences in any direction are equally meaningful. Moving 0.1 in dimension 47 is exactly as significant as moving 0.1 in dimension 128.
But is that true for your embeddings? Neural networks don't optimize for clean Euclidean geometry. They optimize for whatever loss function you trained with. The learned space might be curved, warped, have weird density variations. Some dimensions might carry more semantic information than others.
In practice, Euclidean/cosine often works "well enough." The embeddings are trained to put similar things close together, and "close" according to the model tends to correlate with "close" according to Euclidean distance. But it's an approximation on top of an approximation.
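To make the earlier "cosine is Euclidean on normalized vectors" point concrete: for unit vectors, squared Euclidean distance equals 2 - 2·cos, so ranking neighbors by either metric gives the same order. A quick check:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=512), rng.normal(size=512)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)   # unit-normalize

cosine = a @ b
squared_l2 = np.sum((a - b) ** 2)

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b), so either metric ranks neighbors identically.
print(np.isclose(squared_l2, 2 - 2 * cosine))   # True
```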
"We act like we understand the geometry of embedding spaces. We really don't. We just know that distance-based search usually gives reasonable results."
This is one of those uncomfortable truths that practitioners learn to live with. Your metric is an approximation. Your index is an approximation. Your embeddings are approximations of meaning. Stack enough approximations and somehow it still works. But never forget that you're standing on a tower of heuristics.
What FAISS Got Right (And What It Left Behind)
Looking back, FAISS made several brilliant design decisions that enabled the entire field:
The Good
- Composable index types: IVF + PQ + OPQ + GPU could be mixed and matched (see the factory-string sketch after this list). Users could tune for their specific accuracy/speed/memory tradeoffs.
- GPU-first design: Taking GPUs seriously from day one set performance expectations.
- Open source: MIT license, active development, responsive maintainers. This wasn't an abandoned academic project.
- Production-tested: If it works at Facebook scale, it'll probably work for you.
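That composability is usually exercised through factory strings, where the index pipeline is spelled out as text. A minimal sketch; the strings here are illustrative, and the full grammar lives in the FAISS docs.

```python
import faiss

d = 512
# The factory string spells out the pipeline: IVF with 1024 cells, then 8-byte PQ codes.
index = faiss.index_factory(d, "IVF1024,PQ8")

# Swapping components is a one-line change, e.g. an uncompressed IVF index:
flat_index = faiss.index_factory(d, "IVF1024,Flat")
```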
What It Left for Others
- No database features: FAISS is an index, not a database. No ACID, no replication, no dynamic updates (for most index types).
- Single-machine focus: Distributed search? Build it yourself.
- Metadata filtering: What if you only want results from a specific category? FAISS searches first, filters after (sketched after this list). Inefficient for selective filters.
- Incremental updates: Most FAISS indexes must be rebuilt from scratch to add data. Fine for static datasets, painful for dynamic ones.
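The filtering gap in particular deserves a sketch. The common workaround was to over-fetch and post-filter, as in this hypothetical helper, where `categories` is whatever id-to-metadata mapping you maintain outside FAISS:

```python
def post_filter_search(index, xq, categories, wanted_category, k=10, overfetch=10):
    """Over-fetch from the index, then drop results whose metadata doesn't match.
    `categories` maps vector id -> category; it lives outside FAISS."""
    distances, ids = index.search(xq, k * overfetch)
    results = []
    for row_d, row_i in zip(distances, ids):
        keep = [(dist, i) for dist, i in zip(row_d, row_i)
                if i != -1 and categories[i] == wanted_category]
        results.append(keep[:k])   # may still come up short if the filter is very selective
    return results
```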
These gaps became the foundation for the vector database industry. Pinecone, Weaviate, Milvus, Qdrant—they all exist because FAISS solved the hard algorithmic problem but left the infrastructure problem unsolved.
The Legacy: Why This History Matters
If you're building with vector search today, you're living in the world FAISS created. The concepts—IVF partitioning, product quantization, approximate search with recall/speed tradeoffs—are universal. Every vector database uses variants of these ideas.
Understanding this history helps you make better decisions:
- When a vendor promises "exact search at scale," you know that's probably BS or very expensive. The fundamental limits haven't changed.
- When your recall drops at higher throughput, you understand why—you're searching fewer partitions.
- When someone says "just use FAISS," you know that means solving your own durability, distribution, and update problems.
- When you need metadata filtering, you understand why it's hard—the index structure wasn't designed for it.
Vector search went from "impossible at scale" to "trivially easy with managed services" in about eight years. That's remarkable. But the underlying physics—high-dimensional geometry, approximate algorithms, compute/accuracy tradeoffs—hasn't changed. Anyone telling you otherwise is selling something.
Coming Up Next
In the next post, we dive into the ANN revolution: how Locality Sensitive Hashing opened the door, why HNSW changed everything, and the philosophy behind "good enough" search that enabled billion-scale applications.
References & Further Reading
- Billion-scale similarity search with GPUs (Johnson et al., 2017 - The original FAISS paper)
- FAISS GitHub Repository (Still actively maintained)
- Product Quantization for Nearest Neighbor Search (Jégou et al., 2011 - The PQ paper)
- Curse of Dimensionality (Wikipedia - Good introduction to the problem)