Request Lifecycle and Data Flow

Follow a search request from the API gateway through routing, parallel execution, and result merging.

Request Flow

1. API Gateway (~1ms): Authentication, rate limiting, request validation
2. ServiceManager (~0.5ms): Parse request, identify target index, prepare execution
3. Centroid Router (~0.2ms): Compute query-centroid distances, select top nprobe centroids
4. Node Selection (~0.1ms): Map centroids to nodes, prepare parallel requests
5. Parallel Execution (~2-10ms): Fan-out to nodes, each searches local shards
6. Result Merging (~0.3ms): Collect responses, merge by distance, deduplicate
7. Response: Serialize, add metadata, return to client

Total: 3-12ms
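
The Centroid Router step (compute query-centroid distances, keep the nprobe closest) can be sketched as follows. This is a minimal illustration, not the system's actual router: the names squared_l2 and select_centroids are hypothetical, centroids are assumed to be stored as plain float vectors, and squared Euclidean distance is assumed as the metric.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Squared Euclidean distance; assumes both vectors have the same dimension.
float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Hypothetical helper: returns the indices of the `nprobe` centroids closest
// to `query`, ordered from nearest to farthest.
std::vector<std::size_t> select_centroids(
    const std::vector<float>& query,
    const std::vector<std::vector<float>>& centroids,
    std::size_t nprobe) {
  std::vector<float> dist(centroids.size());
  for (std::size_t i = 0; i < centroids.size(); ++i) {
    dist[i] = squared_l2(query, centroids[i]);
  }

  std::vector<std::size_t> ids(centroids.size());
  std::iota(ids.begin(), ids.end(), std::size_t{0});

  // Clamp so a large nprobe simply returns every centroid.
  nprobe = std::min(nprobe, ids.size());

  // Only the first nprobe positions need to end up sorted.
  std::partial_sort(ids.begin(),
                    ids.begin() + static_cast<std::ptrdiff_t>(nprobe),
                    ids.end(),
                    [&](std::size_t x, std::size_t y) { return dist[x] < dist[y]; });
  ids.resize(nprobe);
  return ids;
}
```

Using std::partial_sort instead of a full sort keeps the cost close to linear in the number of centroids when nprobe is small, which fits the sub-millisecond budget quoted for this step.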

Key Components

ServiceManager

Coordinates request handling:

  • Request parsing and validation
  • Index resolution
  • Timeout management
  • Error handling

DistributedIndexManager

Manages distributed search:

  • Centroid-to-node mapping
  • Parallel request dispatch
  • Failure handling
  • Load balancing

ResultMerger

Combines distributed results:

  • Priority queue for top-k
  • Duplicate detection
  • Distance normalization
  • Metadata aggregation
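
The first two responsibilities (top-k via a priority queue, duplicate detection) might look roughly like the sketch below. The Hit struct and merge_top_k function are hypothetical names, not the ResultMerger API, and the sketch assumes that a vector returned by more than one node should be kept once with its smallest distance. Distance normalization and metadata aggregation are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical result record: a vector ID and its distance to the query.
struct Hit {
  std::uint64_t id;
  float distance;
};

// Merges per-node partial results into the k nearest unique hits,
// ordered from nearest to farthest.
std::vector<Hit> merge_top_k(const std::vector<std::vector<Hit>>& partials,
                             std::size_t k) {
  // Duplicate detection: keep the smallest distance seen for each ID.
  std::unordered_map<std::uint64_t, float> best;
  for (const auto& partial : partials) {
    for (const Hit& hit : partial) {
      auto [it, inserted] = best.try_emplace(hit.id, hit.distance);
      if (!inserted && hit.distance < it->second) it->second = hit.distance;
    }
  }

  // Max-heap on distance: top() is the worst hit among the current best k.
  auto closer = [](const Hit& a, const Hit& b) { return a.distance < b.distance; };
  std::priority_queue<Hit, std::vector<Hit>, decltype(closer)> heap(closer);
  for (const auto& [id, distance] : best) {
    if (heap.size() < k) {
      heap.push(Hit{id, distance});
    } else if (k > 0 && distance < heap.top().distance) {
      heap.pop();
      heap.push(Hit{id, distance});
    }
  }

  // The heap pops the farthest hit first, so fill the output back to front.
  std::vector<Hit> out(heap.size());
  for (std::size_t i = out.size(); i-- > 0;) {
    out[i] = heap.top();
    heap.pop();
  }
  return out;
}
```

Bounding the heap at k entries keeps memory proportional to k rather than to the total number of hits returned by all nodes.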

LocalSearcher

Executes search on local shards:

  • Multi-tier search
  • Cache utilization
  • SIMD distance computation
  • Early termination

Code Example

// Simplified request flow in ServiceManager
SearchResponse ServiceManager::search(const SearchRequest& req) {
  // 1. Validate and parse
  auto index = get_index(req.index_name);
  auto query = parse_vector(req.vector);

  // 2. Find relevant centroids
  auto centroids = index->find_nearest_centroids(query, req.nprobe);

  // 3. Map to nodes
  std::map<NodeId, std::vector<CentroidId>> node_centroids;
  for (auto& c : centroids) {
    auto node = allocator->get_node_for_centroid(c);
    node_centroids[node].push_back(c);
  }

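  // 4. Parallel execution: one task per node. std::launch::async forces a
  //    separate thread; the default policy may defer work until get().
  std::vector<std::future<PartialResult>> futures;
  for (auto& [node, cids] : node_centroids) {
    // Init-captures: structured bindings cannot be captured directly in C++17.
    futures.push_back(std::async(std::launch::async,
        [node = node, &cids = cids, &query, k = req.k] {
      return node->search(query, cids, k);
    }));
  }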

  // 5. Collect and merge
  std::vector<PartialResult> results;
  for (auto& f : futures) {
    results.push_back(f.get());
  }

  return merger->merge(results, req.k);
}

Latency Breakdown

Stage              P50     P99     Notes
Gateway            0.5ms   2ms     Auth + validation
Routing            0.3ms   1ms     Centroid lookup
Search (parallel)  2ms     10ms    Dominated by slowest node
Merge              0.2ms   1ms     Priority queue
Total              3ms     14ms
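
The totals in the table are the sums of the stage rows, with the parallel search stage counted once at the latency of its slowest node. A small model of that arithmetic, using the hypothetical function name request_latency_ms and made-up per-node timings:

```cpp
#include <algorithm>
#include <vector>

// Serial stages add up; the fan-out stage costs as much as its slowest node.
double request_latency_ms(double gateway_ms, double routing_ms,
                          const std::vector<double>& node_search_ms,
                          double merge_ms) {
  const double search_ms =
      node_search_ms.empty()
          ? 0.0
          : *std::max_element(node_search_ms.begin(), node_search_ms.end());
  return gateway_ms + routing_ms + search_ms + merge_ms;
}
```

For example, with the P50 values from the table and three nodes answering in 1.2ms, 2.0ms, and 1.5ms (illustrative numbers), request_latency_ms(0.5, 0.3, {1.2, 2.0, 1.5}, 0.2) gives roughly 3ms. The two faster nodes do not reduce the total.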