Request Lifecycle and Data Flow
Follow a search request from API gateway through routing, parallel execution, and result merging.

Request Flow
Complete request flow from client to response:
1. API Gateway: authentication, rate limiting, request validation
2. ServiceManager: parse the request, identify the target index, prepare execution
3. Centroid Router: compute query-to-centroid distances, select the top nprobe centroids
4. Node Selection: map centroids to nodes, prepare parallel requests
5. Parallel Execution: fan out to nodes; each node searches its local shards
6. Result Merging: collect responses, merge by distance, deduplicate
7. Response: serialize, add metadata, return to client
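Step 3 can be sketched as a brute-force scan over all centroids followed by a partial sort. This sketch assumes the centroids are stored in one flat, row-major float array and are compared by squared Euclidean distance. The names select_top_centroids and squared_l2 are hypothetical, and the actual router may use a different metric or an index over the centroids.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Squared Euclidean distance; skipping sqrt keeps the nearest-first order.
float squared_l2(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the indices of the nprobe centroids nearest to `query`, nearest
// first. Assumption: `centroids` is row-major with query.size() floats per row.
std::vector<std::size_t> select_top_centroids(const std::vector<float>& centroids,
                                              const std::vector<float>& query,
                                              std::size_t nprobe) {
    const std::size_t dim = query.size();
    assert(dim > 0 && centroids.size() % dim == 0);
    const std::size_t n = centroids.size() / dim;

    std::vector<float> dist(n);
    for (std::size_t c = 0; c < n; ++c) {
        dist[c] = squared_l2(centroids.data() + c * dim, query.data(), dim);
    }

    std::vector<std::size_t> ids(n);
    std::iota(ids.begin(), ids.end(), std::size_t{0});
    nprobe = std::min(nprobe, n);
    // partial_sort orders only the first nprobe slots: about n log(nprobe) comparisons.
    std::partial_sort(ids.begin(), ids.begin() + static_cast<std::ptrdiff_t>(nprobe), ids.end(),
                      [&dist](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });
    ids.resize(nprobe);
    return ids;
}
```

Dropping the square root is safe because it does not change the ordering. Using std::partial_sort avoids fully sorting every centroid when nprobe is much smaller than the centroid count.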
Key Components
ServiceManager
Coordinates request handling:
- Request parsing and validation
- Index resolution
- Timeout management
- Error handling
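In the ServiceManager, timeout management and error handling could take the form of a single per-request deadline shared by all fan-out calls. This is a sketch under the assumption that returning partial results is acceptable when a node is slow or fails. collect_until is a hypothetical helper and is not part of the documented code.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <optional>
#include <vector>

// Hypothetical helper: waits on every future against one shared request
// deadline. A node that has not answered by the deadline, or whose call
// threw, yields std::nullopt, so the caller can merge whatever did arrive.
template <typename T>
std::vector<std::optional<T>> collect_until(std::vector<std::future<T>>& futures,
                                            std::chrono::steady_clock::time_point deadline) {
    std::vector<std::optional<T>> out;
    out.reserve(futures.size());
    for (auto& f : futures) {
        assert(f.valid());
        // Once the deadline has passed, wait_until returns immediately, so the
        // total wait is bounded by the deadline rather than by the node count.
        if (f.wait_until(deadline) != std::future_status::ready) {
            out.emplace_back(std::nullopt);  // timed out (or deferred)
            continue;
        }
        try {
            out.emplace_back(f.get());
        } catch (...) {
            out.emplace_back(std::nullopt);  // node-side failure
        }
    }
    return out;
}
```

One caveat applies to this design. Futures returned by std::async block in their destructor until the task finishes. If the calls are launched that way, a timed-out call still holds up the request when the futures go out of scope. A thread pool or promise-based RPC callbacks avoid this.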
DistributedIndexManager
Manages distributed search:
- Centroid-to-node mapping
- Parallel request dispatch
- Failure handling
- Load balancing
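The DistributedIndexManager might combine centroid-to-node mapping with load balancing by keeping a replica list per centroid and rotating through it. This sketch assumes such a replica table exists and uses plain round-robin. CentroidPlacement, its assign method, and the NodeId/CentroidId aliases are hypothetical stand-ins for the real types.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using NodeId = std::uint32_t;      // hypothetical alias
using CentroidId = std::uint32_t;  // hypothetical alias

// Hypothetical placement table: replicas[c] lists the nodes that hold centroid c.
class CentroidPlacement {
public:
    explicit CentroidPlacement(std::vector<std::vector<NodeId>> replicas)
        : replicas_(std::move(replicas)) {}

    // Groups the selected centroids by serving node. A shared counter rotates
    // through each centroid's replicas (round-robin), spreading load across them.
    std::map<NodeId, std::vector<CentroidId>> assign(const std::vector<CentroidId>& centroids) {
        std::map<NodeId, std::vector<CentroidId>> plan;
        for (CentroidId c : centroids) {
            const std::vector<NodeId>& nodes = replicas_.at(c);
            assert(!nodes.empty());
            const std::size_t turn = next_.fetch_add(1, std::memory_order_relaxed);
            plan[nodes[turn % nodes.size()]].push_back(c);
        }
        return plan;
    }

private:
    std::vector<std::vector<NodeId>> replicas_;
    std::atomic<std::size_t> next_{0};
};
```

Failure handling could hook in at the replica choice, for example by skipping replicas currently marked unhealthy before picking one.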
ResultMerger
Combines distributed results:
- Priority queue for top-k
- Duplicate detection
- Distance normalization
- Metadata aggregation
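The ResultMerger's priority-queue approach can be sketched as deduplication followed by a bounded max-heap of size k. The sketch makes two assumptions. First, each node returns (id, distance) pairs. Second, those distances are already comparable across nodes, meaning normalization has happened upstream. Hit and merge_top_k are hypothetical names, and the layout of PartialResult shown here is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical result layout; distances assumed comparable across nodes.
struct Hit {
    std::uint64_t id;
    float distance;
};
using PartialResult = std::vector<Hit>;

// Merges per-node results into the global top-k (smallest distances first).
std::vector<Hit> merge_top_k(const std::vector<PartialResult>& partials, std::size_t k) {
    // 1. Deduplicate: a vector returned by several nodes keeps its best distance.
    std::unordered_map<std::uint64_t, float> best;
    for (const auto& part : partials) {
        for (const Hit& h : part) {
            auto [it, inserted] = best.try_emplace(h.id, h.distance);
            if (!inserted && h.distance < it->second) it->second = h.distance;
        }
    }
    // 2. Bounded max-heap: the top is the worst of the current k candidates.
    auto worse = [](const Hit& a, const Hit& b) { return a.distance < b.distance; };
    std::priority_queue<Hit, std::vector<Hit>, decltype(worse)> heap(worse);
    for (const auto& [id, dist] : best) {
        if (heap.size() < k) {
            heap.push({id, dist});
        } else if (k > 0 && dist < heap.top().distance) {
            heap.pop();
            heap.push({id, dist});
        }
    }
    // 3. The heap pops worst-first, so fill the output from the back.
    std::vector<Hit> out(heap.size());
    for (std::size_t i = out.size(); i-- > 0;) {
        out[i] = heap.top();
        heap.pop();
    }
    return out;
}
```

Capping the heap at k entries makes the merge cost O(m log k) for m unique candidates, instead of sorting every candidate returned by every node.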
LocalSearcher
Executes search on local shards:
- Multi-tier search
- Cache utilization
- SIMD distance computation
- Early termination
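Early termination in the LocalSearcher could follow the common early-abandoning technique for squared L2 distance. Partial sums only grow, so a candidate can be dropped as soon as its running sum exceeds the current best. The source does not confirm this technique; it is an assumption. l2_early_abandon, nearest, and the 16-dimension check interval are hypothetical. The inner loop is plain scalar code that a compiler may vectorize, not hand-written SIMD.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Squared L2 distance that gives up early: the running sum never decreases,
// so once it exceeds `threshold` (the current best) the candidate cannot win.
// Returns +infinity when abandoned.
float l2_early_abandon(const float* a, const float* b, std::size_t dim, float threshold) {
    constexpr std::size_t kBlock = 16;  // hypothetical check interval
    const float inf = std::numeric_limits<float>::infinity();
    float sum = 0.0f;
    std::size_t i = 0;
    for (; i + kBlock <= dim; i += kBlock) {
        float block = 0.0f;
        for (std::size_t j = 0; j < kBlock; ++j) {  // candidate for auto-vectorization
            const float d = a[i + j] - b[i + j];
            block += d * d;
        }
        sum += block;
        if (sum > threshold) return inf;  // one branch per block, not per dimension
    }
    for (; i < dim; ++i) {  // leftover dimensions that do not fill a block
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum > threshold ? inf : sum;
}

// k = 1 scan over a row-major shard of n vectors, tightening the threshold
// each time a closer vector is found.
std::size_t nearest(const float* shard, std::size_t n, const float* q, std::size_t dim) {
    assert(n > 0);
    std::size_t best_id = 0;
    float best = std::numeric_limits<float>::infinity();
    for (std::size_t r = 0; r < n; ++r) {
        const float d = l2_early_abandon(shard + r * dim, q, dim, best);
        if (d < best) {
            best = d;
            best_id = r;
        }
    }
    return best_id;
}
```

Compilers usually vectorize a floating-point reduction like the block loop only when allowed to reassociate additions (for example with -ffast-math). A production kernel would therefore more likely use explicit SIMD intrinsics.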
Code Example
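```cpp
#include <future>
#include <map>
#include <vector>

// Simplified request flow in ServiceManager
SearchResponse ServiceManager::search(const SearchRequest& req) {
    // 1. Resolve the target index and parse the query (validation omitted)
    auto index = get_index(req.index_name);
    auto query = parse_vector(req.vector);

    // 2. Find the nprobe centroids nearest to the query
    auto centroids = index->find_nearest_centroids(query, req.nprobe);

    // 3. Group centroids by the node that owns them
    std::map<NodeId, std::vector<CentroidId>> node_centroids;
    for (const auto& c : centroids) {
        auto node = allocator->get_node_for_centroid(c);
        node_centroids[node].push_back(c);
    }

    // 4. Fan out one search per node. std::launch::async forces a separate
    //    thread; the default policy may defer the call until get(), which would
    //    serialize the "parallel" phase. Init-captures are used because C++17
    //    lambdas cannot capture structured bindings directly.
    std::vector<std::future<PartialResult>> futures;
    for (const auto& [node, cids] : node_centroids) {
        futures.push_back(std::async(std::launch::async,
            [node = node, &cids = cids, &query, k = req.k] {
                return node->search(query, cids, k);
            }));
    }

    // 5. Collect partial results (get() rethrows any exception from a node)
    std::vector<PartialResult> results;
    results.reserve(futures.size());
    for (auto& f : futures) {
        results.push_back(f.get());
    }
    return merger->merge(results, req.k);
}
```
Latency Breakdown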
| Stage | P50 | P99 | Notes |
|---|---|---|---|
| Gateway | 0.5ms | 2ms | Auth + validation |
| Routing | 0.3ms | 1ms | Centroid lookup |
| Search (parallel) | 2ms | 10ms | Dominated by slowest node |
| Merge | 0.2ms | 1ms | Priority queue |
| Total | 3ms | 14ms | Sum of stage values; percentiles are not strictly additive |
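A simple model makes concrete the note that the parallel search stage is dominated by the slowest node. Assume node latencies are independent. Then the probability that at least one of n nodes exceeds its own P99 is 1 - 0.99^n. This is a back-of-the-envelope model, not a measurement from this system, and prob_any_slow is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Back-of-the-envelope tail model (assumes independent node latencies):
// probability that at least one of `fanout` nodes exceeds its own latency
// percentile p, e.g. p = 0.99 for P99.
double prob_any_slow(double p, int fanout) {
    assert(p >= 0.0 && p <= 1.0 && fanout >= 0);
    return 1.0 - std::pow(p, fanout);
}
```

Under this model, a 10-node fan-out has at least one node beyond its P99 on about 9.6% of queries. A 100-node fan-out does so on about 63% of queries. This is why the end-to-end P99 tracks the slowest node rather than the typical one.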