Centroid Allocation Strategies
Choose how centroids are distributed across nodes to optimize for simplicity, balance, or locality.

Comparison of allocation strategies with trade-offs
Allocation Strategies
Round-Robin Allocator
Simple, deterministic assignment: centroid N is assigned to node N % num_nodes.
Pros
- Predictable placement
- Zero coordination
- Even distribution
Cons
- No load awareness
- Node changes reshuffle most placements
- No locality
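A minimal Python sketch of the round-robin rule, assuming centroids carry integer ids starting at 0 and nodes are identified by their 0-based position in the node list. The helper names (`round_robin`, `placement`, `moved_on_resize`) are hypothetical. The last helper makes the reshuffle cost concrete: growing from 4 to 5 nodes moves 800 of 1,000 centroids, because a centroid keeps its node only when `N % 4 == N % 5`.

```python
def round_robin(centroid_id: int, num_nodes: int) -> int:
    """Centroid N goes to node N % num_nodes (0-based node index)."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return centroid_id % num_nodes


def placement(num_centroids: int, num_nodes: int) -> list[int]:
    """Node index for every centroid id 0..num_centroids-1."""
    return [round_robin(c, num_nodes) for c in range(num_centroids)]


def moved_on_resize(num_centroids: int, old_nodes: int, new_nodes: int) -> int:
    """Count centroids whose node changes when the cluster is resized."""
    before = placement(num_centroids, old_nodes)
    after = placement(num_centroids, new_nodes)
    return sum(1 for b, a in zip(before, after) if b != a)


if __name__ == "__main__":
    print(placement(10, 4))             # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
    print(moved_on_resize(1000, 4, 5))  # 800
```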
Load-Balanced Allocator
Tracks centroid and vector counts per node and assigns each new centroid to the least-loaded node.
Pros
- Handles skewed clusters
- Adapts to load
- Better resource usage
Cons
- Requires coordination
- Less predictable
- Rebalancing needed
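The sketch below illustrates the load-balanced idea in Python under stated assumptions: "least-loaded" means fewest vectors, with centroid count as a tiebreaker, and imbalance is measured as `(max - min) / mean` of per-node vector counts. Neither definition comes from the source; only the 0.2 threshold mirrors `rebalanceThreshold` from the configuration example. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeLoad:
    """Per-node counters tracked by the allocator."""
    name: str
    centroids: int = 0
    vectors: int = 0


@dataclass
class LoadBalancedAllocator:
    nodes: list[NodeLoad]
    # Mirrors "rebalanceThreshold": 0.2 from the configuration example.
    rebalance_threshold: float = 0.2
    assignment: dict[int, str] = field(default_factory=dict)

    def assign(self, centroid_id: int, num_vectors: int) -> str:
        """Place a centroid on the least-loaded node and update its counters."""
        # Assumption: fewest vectors wins, centroid count breaks ties,
        # and min() keeps the first node in list order on a full tie.
        target = min(self.nodes, key=lambda n: (n.vectors, n.centroids))
        target.centroids += 1
        target.vectors += num_vectors
        self.assignment[centroid_id] = target.name
        return target.name

    def imbalance(self) -> float:
        """Assumed metric: (max - min) / mean of per-node vector counts."""
        counts = [n.vectors for n in self.nodes]
        mean = sum(counts) / len(counts)
        return 0.0 if mean == 0 else (max(counts) - min(counts)) / mean

    def needs_rebalance(self) -> bool:
        return self.imbalance() > self.rebalance_threshold


if __name__ == "__main__":
    alloc = LoadBalancedAllocator([NodeLoad("node-1"), NodeLoad("node-2"), NodeLoad("node-3")])
    for cid, size in enumerate([500, 300, 200, 200, 100, 100, 100]):
        alloc.assign(cid, size)
    print([n.vectors for n in alloc.nodes])  # [500, 500, 500]
    print(alloc.needs_rebalance())           # False
```

With the skewed cluster sizes in the example, greedy placement ends with 500 vectors on each node, so no rebalance is triggered. The result depends on the order in which centroids arrive, which is one reason placement is less predictable than round-robin.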
Locality-Aware Allocator (Recommended)
Groups similar centroids onto the same node by clustering the centroid vectors (k-means by default).
Pros
- Reduces query fanout
- Better cache locality
- Zone-aware option
Cons
- Complex algorithm
- Requires recomputation
- May create hotspots
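One way to realize the grouping step, sketched in plain Python: run k-means over the centroid vectors with k defaulting to the node count (matching `"numGroups": null` meaning auto in the configuration), then place each group on one node. Farthest-point initialization, the fixed iteration count, the helper names, and the group-to-node mapping are assumptions for illustration. Zone awareness, hierarchical grouping, and balancing of group sizes are not shown; uneven groups are exactly how hotspots can arise.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns a group label per point."""
    # Deterministic farthest-point initialization (assumption; k-means++ is common).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: _dist2(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, g in zip(points, labels) if g == j]
            if members:  # keep the previous center if a group is empty
                centers[j] = _mean(members)
    return labels


def locality_aware(centroid_vectors, nodes, num_groups=None):
    """Map centroid id -> node so that similar centroids share a node."""
    k = num_groups or len(nodes)  # None means "auto" = number of nodes
    labels = kmeans(centroid_vectors, k)
    # Hypothetical mapping: group g lives on nodes[g % len(nodes)].
    return {cid: nodes[g % len(nodes)] for cid, g in enumerate(labels)}


if __name__ == "__main__":
    vectors = [[0, 0], [0.5, 0], [10, 10], [10, 9.5], [0, 10], [0.5, 10]]
    print(locality_aware(vectors, ["node-1", "node-2", "node-3"]))
    # {0: 'node-1', 1: 'node-1', 2: 'node-2', 3: 'node-2', 4: 'node-3', 5: 'node-3'}
```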
Configuration
```jsonc
{
  "allocation": {
    "strategy": "locality-aware", // round-robin, load-balanced, locality-aware
    "roundRobin": {
      // No additional config needed
    },
    "loadBalanced": {
      "rebalanceThreshold": 0.2, // Trigger rebalance at 20% imbalance
      "rebalanceInterval": "1h"
    },
    "localityAware": {
      "groupingMethod": "kmeans", // kmeans or hierarchical
      "numGroups": null, // Auto = num_nodes
      "zoneAware": true,
      "zones": {
        "us-east": ["node-1", "node-2"],
        "us-west": ["node-3", "node-4"]
      }
    }
  }
}
```
Query Fanout Impact
The allocation strategy directly affects query fanout, the number of nodes a query must contact. The table shows nodes touched per query on a 5-node cluster for different numbers of probed centroids (nprobe):
| Strategy | nprobe=8 | nprobe=32 | nprobe=128 |
|---|---|---|---|
| Round-robin | 5 nodes (100%) | 5 nodes (100%) | 5 nodes (100%) |
| Load-balanced | 4-5 nodes (80-100%) | 5 nodes (100%) | 5 nodes (100%) |
| Locality-aware | 1-2 nodes (20-40%) | 2-3 nodes (40-60%) | 4-5 nodes (80-100%) |
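
To make the table concrete, a hypothetical Python helper counts fanout as the number of distinct nodes owning at least one probed centroid. It assumes an idealized layout in which mutually similar centroids have consecutive ids, so a locality-aware placement can be approximated by contiguous blocks of ids per node; real probe sets depend on the query and the data, so this shows the direction of the effect rather than the table's exact figures.

```python
def fanout(probed_centroids, assignment):
    """Number of distinct nodes that own at least one probed centroid."""
    return len({assignment[c] for c in probed_centroids})


NUM_CENTROIDS, NUM_NODES = 40, 5

# Round-robin: centroid c lives on node c % NUM_NODES.
round_robin = {c: c % NUM_NODES for c in range(NUM_CENTROIDS)}
# Idealized locality (assumption): ids 0-7 are similar and share node 0, ids 8-15 node 1, ...
grouped = {c: c // (NUM_CENTROIDS // NUM_NODES) for c in range(NUM_CENTROIDS)}

if __name__ == "__main__":
    probe = range(8)  # hypothetical nprobe=8 result: the nearest centroids are ids 0..7
    print(fanout(probe, round_robin))       # 5
    print(fanout(probe, grouped))           # 1
    print(fanout(range(4, 12), grouped))    # 2 (probe straddles two groups)
```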
When to Use Each
- Round-robin: Development, uniform workloads, simple deployments
- Load-balanced: Skewed cluster sizes, heterogeneous nodes
- Locality-aware: Large clusters, multi-region, latency-sensitive