Elasticsearch kNN: The Search Engine Strikes Back
When your existing Elastic cluster learns vector search
I have a confession: I once dismissed Elasticsearch's vector search capabilities as "legacy tech trying to stay relevant." Then I actually tried it on a real workload. Turns out the old dog has learned some genuinely impressive tricks.
Elasticsearch has been around since 2010. In search-engine years, that makes it ancient. But sometimes ancient is good—it means the edge cases have been discovered, the scaling patterns are understood, and your ops team won't look at you like you just asked them to learn ancient Sumerian.
Starting with version 8.0, Elasticsearch added native kNN (k-nearest neighbors) search using HNSW indices. And in 8.8+, they added something even more interesting: true hybrid search that combines vector similarity with traditional text matching in a single query. Let's dig in.
First Principles: What's Elastic Actually Doing?
Elasticsearch's vector search is built on Apache Lucene, which added native HNSW support in version 9.0. When you create a dense_vector field in Elasticsearch, it builds an HNSW graph alongside your traditional inverted indices.
This architecture is clever. Your documents already live in Elasticsearch. Your ops team already knows how to run Elasticsearch. Your monitoring is already set up. Adding vector search doesn't require a new database—it's just a new field type.
// Creating a mapping with dense_vector field
PUT /articles
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "content": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 1536,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}

Notice those index_options? Unlike some vector-capable databases (cough, MongoDB), Elasticsearch gives you full control over HNSW parameters. The m parameter controls graph connectivity (higher = better recall, more memory). The ef_construction parameter controls build quality (higher = better recall, slower indexing).
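Once the mapping exists, indexing and querying go through the same APIs your cluster already serves. Here's a minimal sketch using the official Python client; the localhost URL and the embed() helper (anything that returns a 1536-dimensional list of floats) are placeholder assumptions, not part of the Elasticsearch API.

# Sketch: index one document, then run a pure kNN query against it
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# embed() is a stand-in for your embedding model call
es.index(index="articles", id="1", document={
    "title": "Memory safety in the auth module",
    "content": "Buffer overflow in the token parser ...",
    "embedding": embed("Buffer overflow in the token parser ..."),
})

# Top-level kNN search: each shard examines num_candidates vectors,
# then the best k across shards come back as hits
resp = es.search(index="articles", knn={
    "field": "embedding",
    "query_vector": embed("memory safety issues in authentication"),
    "k": 10,
    "num_candidates": 100,
})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])

That num_candidates value is the main recall/latency dial at query time: higher values widen the HNSW search per shard at the cost of speed.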
The Real Magic: Hybrid Search
Here's where Elasticsearch earns its keep. Most vector databases force you to choose: semantic search OR keyword search. Elasticsearch lets you do both simultaneously and combine the scores intelligently.
Think about a real RAG query: "What are the memory safety issues in the authentication module?" You want semantic understanding of "memory safety issues"—buffer overflows, use-after-free, null pointer dereferences. But you also want exact matching on "authentication module" because that's a specific path in your codebase.
Pure vector search might surface memory safety content from the wrong module. Pure keyword search might miss documents that talk about "security vulnerabilities" instead of "memory safety." Hybrid search gives you both signals.
// Hybrid search: vectors + keywords in one query
POST /articles/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.12, -0.34, ...],
            "k": 10,
            "num_candidates": 100,
            "boost": 0.7
          }
        },
        {
          "match": {
            "content": {
              "query": "authentication module memory safety",
              "boost": 0.3
            }
          }
        }
      ]
    }
  }
}

That boost parameter is doing heavy lifting here. In my experience, a 70/30 or 60/40 split between vector and keyword scores works well for most code search use cases. But this is tunable per query: you can adjust the balance based on whether the user's query looks more semantic or more keyword-heavy.
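One way to implement that per-query balance is a cheap heuristic over the query string before you build the request. This is a sketch of the idea rather than a tuned classifier; the thresholds, the regex, and the build_hybrid_query() shape are illustrative assumptions that simply mirror the bool/should request above.

# Sketch: choose the vector/keyword boost split from crude query features
import re

def choose_boosts(query: str) -> tuple[float, float]:
    """Return (vector_boost, keyword_boost); quoted phrases, paths, and
    identifiers suggest exact matching, longer questions lean semantic."""
    looks_exact = (
        '"' in query
        or re.search(r"[/_.]\w", query) is not None  # paths, snake_case, dotted names
        or len(query.split()) <= 3
    )
    return (0.6, 0.4) if looks_exact else (0.7, 0.3)

def build_hybrid_query(query: str, query_vector: list[float]) -> dict:
    vector_boost, keyword_boost = choose_boosts(query)
    return {
        "bool": {
            "should": [
                {"knn": {"field": "embedding", "query_vector": query_vector,
                         "k": 10, "num_candidates": 100, "boost": vector_boost}},
                {"match": {"content": {"query": query, "boost": keyword_boost}}},
            ]
        }
    }

# Usage: es.search(index="articles", query=build_hybrid_query(text, vector))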
Reciprocal Rank Fusion: When You Want Fair Blending
The boost-based approach above has a problem: BM25 scores and vector similarity scores live on different scales. A BM25 score might be 15.3 while a cosine similarity is 0.87. Combining them with simple addition or weighted average can get weird.
Elasticsearch 8.8+ introduced Reciprocal Rank Fusion (RRF), which solves this elegantly. Instead of combining scores, RRF combines rankings. If a document is ranked #2 by vector search and #5 by BM25, its RRF score considers those positions, not the raw numbers.
// Reciprocal Rank Fusion for fair score blending
POST /articles/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.12, -0.34, ...],
            "k": 20,
            "num_candidates": 100
          }
        },
        {
          "standard": {
            "query": {
              "match": {
                "content": "authentication module memory safety"
              }
            }
          }
        }
      ],
      "rank_constant": 60,
      "rank_window_size": 100
    }
  }
}

The rank_constant (typically 60) controls how much to favor higher-ranked documents. Lower values mean the top few results dominate; higher values spread influence across more results.
RRF is particularly useful when you're combining more than two signals—maybe you have vector similarity, BM25 on content, and a separate BM25 on titles. With three different score distributions, RRF keeps things sane.
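The formula behind RRF is small enough to verify by hand: each retriever contributes 1 / (rank_constant + rank) for every document it returns, and the contributions are summed. Here's a client-side sketch of that fusion over three hypothetical ranked lists; Elasticsearch performs the equivalent calculation for you inside the rrf retriever.

# Sketch: client-side Reciprocal Rank Fusion over any number of ranked lists
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], rank_constant: int = 60) -> list[tuple[str, float]]:
    """Each list is doc IDs ordered best-first; returns fused (id, score) pairs."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Three signals: vector similarity, BM25 on content, BM25 on titles
fused = rrf_fuse([
    ["doc7", "doc2", "doc9"],   # kNN ranking
    ["doc2", "doc5", "doc7"],   # BM25 over content
    ["doc2", "doc7", "doc1"],   # BM25 over titles
])
print(fused[0])  # doc2 comes out on top: ranked 2nd, 1st, and 1st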
Performance: The Numbers Nobody Wants to Publish
Let's be honest about performance. Elasticsearch is not the fastest vector database. It's probably not even in the top five. But "fast enough" often beats "fastest" when you factor in operational complexity.
Here's what I measured on a 3-node Elasticsearch 8.12 cluster (32GB RAM, 8 cores per node) with 5M documents and 1536-dimensional embeddings:
| Query Type | p50 Latency | p99 Latency | Recall@10 |
|---|---|---|---|
| Pure kNN | 18ms | 45ms | 95.8% |
| Pure BM25 | 8ms | 22ms | N/A |
| Hybrid (boost) | 25ms | 58ms | 97.2%* |
| Hybrid (RRF) | 32ms | 72ms | 97.8%* |
| kNN + filter | 22ms | 55ms | 94.1% |
*Hybrid recall measured as "relevant document in top 10" using human-labeled test set. 5M docs, 1536 dims, m=16, ef=100.
The interesting finding: hybrid search actually improves recall over pure vector search, despite being slower. That's the point—you're getting signal from two different ranking methods, and they have different failure modes.
For comparison, a dedicated vector database like Pinecone would give you ~8ms p50 on pure vector search. Our MLGraph hits ~5ms. But neither of them can do hybrid search as elegantly as Elasticsearch.
The Gotchas That'll Bite You
I've run into enough edge cases with Elasticsearch kNN that I keep a running list. Here are the ones that cause the most pain:
1. Indexing is Slow (Really Slow)
Building HNSW graphs is expensive. When you bulk-index documents with embeddings, expect it to be 5-10x slower than indexing the same documents without vectors. The graph construction has to happen at write time.
For large initial loads, I recommend indexing without the vector field first, then adding vectors in a separate update pass with a smaller batch size. Your indexing pipeline will thank you.
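Here's what that two-pass load looks like with the Python client's bulk helper. It's a sketch under assumptions: get_docs() and embed() stand in for your own data source and embedding model, and the chunk sizes are starting points, not recommendations.

# Sketch: bulk-load text first, attach embeddings in a second, smaller-batch pass
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

# Pass 1: index the text fields only -- no HNSW graph construction yet
helpers.bulk(
    es,
    (
        {"_index": "articles", "_id": doc["id"],
         "_source": {"title": doc["title"], "content": doc["content"]}}
        for doc in get_docs()
    ),
    chunk_size=1000,
)

# Pass 2: partial updates that add the vector field, spreading out the
# expensive graph inserts
helpers.bulk(
    es,
    (
        {"_op_type": "update", "_index": "articles", "_id": doc["id"],
         "doc": {"embedding": embed(doc["content"])}}
        for doc in get_docs()
    ),
    chunk_size=100,
)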
2. Memory Requirements Scale with Vectors
HNSW graphs are memory-hungry. A rough sizing formula is num_vectors * (4 * dims + 12 * M) bytes for the float vectors plus their graph links, and that memory comes out of the filesystem cache rather than the JVM heap, so nodes need generous off-heap RAM. For 5M vectors at 1536 dimensions with M=16, you're looking at about 32GB just for the vector data. Plan your node sizing accordingly.
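To make the sizing arithmetic concrete, here's the back-of-the-envelope calculation as a function. It encodes the rough estimate above, not an exact accounting of Lucene's on-disk or in-memory layout.

# Sketch: rough off-heap memory estimate for float32 HNSW vector data
def hnsw_memory_estimate_gb(num_vectors: int, dims: int, m: int = 16) -> float:
    # ~4 bytes per float dimension plus ~12 * M bytes of graph links per vector
    return num_vectors * (4 * dims + 12 * m) / 1e9

# 5M docs, 1536-dim embeddings, m=16 -> roughly 31.7 GB ("about 32GB")
print(f"{hnsw_memory_estimate_gb(5_000_000, 1536):.1f} GB")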
3. Pre-filtering vs Post-filtering
Elasticsearch lets you combine kNN with filters, but there's a subtle distinction. By default, Elasticsearch does approximate pre-filtering—it filters candidates during the HNSW traversal. This is fast but can miss some matches.
If recall drops under a selective filter, keep the "filter" clause and raise "num_candidates" (say, to 500) so the traversal examines more matching candidates. But this gets expensive quickly on highly selective filters.
// kNN with pre-filtering (approximate but fast)
GET /articles/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [...],
    "k": 10,
    "num_candidates": 100,
    "filter": {
      "term": { "department": "engineering" }
    }
  }
}

4. Shard Distribution Matters More Than You Think
kNN search runs per-shard, then merges results. If your shards are unbalanced—some have 1M vectors, others have 100K—the small shards will return results faster but with worse candidates, skewing your final results.
For vector workloads, consider fewer, larger shards rather than the traditional "lots of small shards" Elasticsearch pattern. The HNSW algorithm works better with bigger graphs.
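In practice that means pinning number_of_shards when you create the index instead of inheriting a template tuned for logs. A sketch, assuming an index on the scale of the 5M-vector benchmark fits comfortably on a single primary shard:

# Sketch: a single large primary shard rather than many small ones
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="articles",
    settings={"number_of_shards": 1, "number_of_replicas": 1},
    mappings={
        "properties": {
            "title": {"type": "text"},
            "content": {"type": "text"},
            "embedding": {
                "type": "dense_vector",
                "dims": 1536,
                "index": True,
                "similarity": "cosine",
                "index_options": {"type": "hnsw", "m": 16, "ef_construction": 100},
            },
        }
    },
)

Replicas still help query throughput; it's the primary count that determines how many separate HNSW graphs your vectors are split across.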
When Elasticsearch kNN is the Right Choice
After running vector search on Elasticsearch in production for about a year now, here's my honest assessment of when it makes sense:
Elasticsearch kNN shines when:
- You're already running Elasticsearch for text search or logging
- You need true hybrid search (vectors + keywords + filters)
- Your vector count is under 50M (beyond that, dedicated DBs win on cost)
- 30-50ms query latency is acceptable for your use case
- You want RRF or other sophisticated score fusion
- Your ops team would mutiny if you added another database
Look elsewhere when:
- You need sub-10ms p99 latency
- You're doing pure vector search without text features
- You're scaling beyond 50M vectors (cost becomes prohibitive)
- You need real-time vector updates (ES re-indexing is slow)
- You want advanced features like quantization or on-disk indices
The Pragmatic Middle Ground
Here's my synthesis after working with multiple vector-capable systems: Elasticsearch kNN is the Honda Civic of vector search. It's not exciting. It won't win any races. But it's reliable, well-understood, and it'll get you where you need to go for 90% of use cases.
The hybrid search capability is genuinely best-in-class. I haven't found another system that blends BM25 and vectors as elegantly. For RAG applications where you need both semantic understanding AND exact keyword matching, Elasticsearch is hard to beat.
But if your primary workload is vector-native—if you're doing recommendation systems or similarity matching without much text—you're paying an operational tax for text search capabilities you don't need. At that point, a focused solution like MLGraph makes more sense.
The old search engine has learned new tricks. And sometimes, the old dog that knows your house is more valuable than the fancy new puppy that keeps getting lost in the backyard.
Need the performance of a dedicated vector database?
MLGraph gives you sub-10ms latency, sorted inverted lists for precise retrieval, and TBB-parallel indexing. When you've outgrown Elasticsearch's vector capabilities, we're here.