Cluster Architecture

Scale from 1 to 50+ nodes with automatic replication, sharding, and failover. Built for production-grade vector search with 99.9% availability guarantees.

Architecture Overview

MLGraph Cluster Architecture Diagram

MLGraph's cluster architecture distributes vector storage across multiple nodes, coordinated by a lightweight coordinator service. Each data node hosts FAISS indexes for assigned centroids, with automatic replication for fault tolerance.

Client Request Flow:
Client → DistributedIndexManager
→ Coordinator (cluster state)
→ CentroidClient (gRPC)
→ Data Node A (primary)
→ Data Node B (replica)
→ SingleCentroidManager (FAISS)
→ IDMap / IVF / OnDisk tiers
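
The flow above can also be read as code. The sketch below mirrors the names in the diagram (DistributedIndexManager, Coordinator, CentroidClient), but the methods and signatures are assumptions made for illustration, not the actual MLGraph client API:

# Conceptual sketch of the request path; names mirror the diagram above,
# but every method shown here is a hypothetical stand-in.
import numpy as np

class DistributedIndexManagerSketch:
    def __init__(self, coordinator, centroid_clients):
        self.coordinator = coordinator            # knows centroid -> node placement
        self.centroid_clients = centroid_clients  # node_id -> gRPC client wrapper

    def search(self, query: np.ndarray, k: int = 10):
        # 1. Ask the coordinator which centroids to probe and where they live.
        placements = self.coordinator.placements_for(query)  # [(centroid_id, node_id), ...]
        hits = []
        for centroid_id, node_id in placements:
            # 2. Fan out to the primary data node for each centroid over gRPC;
            #    the node searches its local FAISS tiers (IDMap / IVF / OnDisk).
            client = self.centroid_clients[node_id]
            hits.extend(client.search(centroid_id, query, k))
        # 3. Merge the partial (distance, vector_id) results into a global top-k.
        hits.sort(key=lambda hit: hit[0])
        return hits[:k]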

Coordinator Node

Maintains cluster state, handles leader election, manages replica placement and failover

Cluster membership management
Health monitoring and failover
Replica placement decisions
Distributed configuration

Data Nodes

Stores vector data in FAISS indexes with tiered storage (IDMap, IVF, OnDisk)

Vector storage and indexing
Search query execution
Replication (primary & replica)
Local health reporting

Replication & Sharding

Data Replication and Sharding Visualization

Why Sharding Matters for Vector Search

Unlike traditional databases where you can just throw more RAM at the problem, vector search has a dirty secret: FAISS indexes don't scale linearly with data size. The inverted file (IVF) algorithm used for approximate nearest neighbor search gets slower as indexes grow beyond a few hundred million vectors.

Sharding distributes vectors across multiple smaller indexes, each operating at peak efficiency. A 1 billion vector dataset split across 10 nodes (100M vectors each) searches faster than a single 1B vector index—even accounting for network overhead.
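
To make that concrete, here is a small, self-contained sketch of the scatter-gather pattern sharding enables: each shard searches its own small FAISS index and the caller merges the per-shard results into a global top-k. The shard count, ID ranges, and random data are toy values; MLGraph performs this fan-out over gRPC rather than in-process.

# Scatter-gather across shards: query every small index, merge the top-k.
import numpy as np
import faiss

dim, k, num_shards, per_shard = 64, 5, 4, 10_000
rng = np.random.default_rng(0)

# Build one small index per shard, preserving global vector IDs via IndexIDMap.
shards = []
for s in range(num_shards):
    ids = np.arange(s * per_shard, (s + 1) * per_shard, dtype=np.int64)
    vectors = rng.random((per_shard, dim), dtype=np.float32)
    index = faiss.IndexIDMap(faiss.IndexFlatL2(dim))
    index.add_with_ids(vectors, ids)
    shards.append(index)

# Query every shard, then merge partial results into the global nearest neighbors.
query = rng.random((1, dim), dtype=np.float32)
hits = []
for index in shards:
    distances, found_ids = index.search(query, k)
    hits.extend(zip(distances[0].tolist(), found_ids[0].tolist()))

print(sorted(hits)[:k])  # global top-k across all shards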

Basic Replication (n-way)

Each centroid replicated across n servers

Replication Factor: 2-3
Consistency: Strong
Network Overhead: Low (10-15%)

Sharding + Replication

Data partitioned across shards, each shard replicated

Replication Factor: 3-5
Consistency: Eventual
Network Overhead: Medium (15-20%)

Cross-Region Replication

Asynchronous replication to secondary regions

Replication Factor: 2+ regions
Consistency: Eventual (async)
Network Overhead: High (25-30%)
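
The practical difference between these modes is when a write is acknowledged. The sketch below is a simplified illustration of that trade-off, not MLGraph's replication code: strong consistency waits for every replica before acknowledging, while eventual consistency acknowledges after the primary write and lets replicas catch up in the background.

# Simplified illustration of the ack semantics; `primary` and `replicas` are
# hypothetical node handles with a put() method.
from concurrent.futures import ThreadPoolExecutor, wait

replication_pool = ThreadPoolExecutor(max_workers=8)

def replicate_write(primary, replicas, vector_id, vector, mode="strong"):
    primary.put(vector_id, vector)  # the primary always takes the write first
    futures = [replication_pool.submit(r.put, vector_id, vector) for r in replicas]
    if mode == "strong":
        wait(futures)  # strong: acknowledge only after all replicas confirm
    # eventual: acknowledge immediately; replicas apply the write asynchronously
    return "ack"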

Shard Placement Algorithm

The DistributedIndexManager uses consistent hashing to assign centroids to nodes (a minimal sketch of the hashing step follows the list below). When a new node joins the cluster:

  1. Coordinator calculates centroid assignments based on cluster topology
  2. Affected centroids are migrated to new node in background
  3. Replicas are created on other nodes for fault tolerance
  4. Health checker validates data consistency across replicas
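
A minimal sketch of the hashing step (illustrative only; the exact algorithm inside DistributedIndexManager is not shown here): nodes are placed on a hash ring with virtual nodes, each centroid maps to the next node clockwise, and adding a node moves only the centroids that fall into its new ring segments.

# Toy consistent-hash ring for centroid placement.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node_id)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node_id):
        # Each node appears `vnodes` times on the ring to smooth the load.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node_id}#{i}"), node_id))

    def node_for(self, centroid_id: int) -> str:
        # Walk clockwise from the centroid's hash to the next virtual node.
        h = _hash(f"centroid-{centroid_id}")
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-1", "node-2", "node-3"])
before = {c: ring.node_for(c) for c in range(1000)}
ring.add_node("node-4")  # a new node joins the cluster
moved = sum(before[c] != ring.node_for(c) for c in range(1000))
print(f"{moved} of 1000 centroids migrate to the new node")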

Consistency During Migration

Reads continue to be served from old locations during migration. Writes are buffered and replayed on new nodes. Zero downtime, zero data loss.
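
One way to picture that write path (purely illustrative, not MLGraph internals): writes that arrive while a centroid is moving keep landing on the old node, are also appended to a buffer, and are replayed on the new node once the bulk copy completes.

# Buffered writes during a centroid migration; node handles are hypothetical.
class MigrationBuffer:
    def __init__(self):
        self.migrating = False
        self.pending = []  # writes received while the bulk copy is in flight

    def write(self, old_node, vector_id, vector):
        old_node.put(vector_id, vector)  # the old location keeps serving traffic
        if self.migrating:
            self.pending.append((vector_id, vector))  # remember for replay

    def finish_migration(self, new_node):
        for vector_id, vector in self.pending:  # replay buffered writes in order
            new_node.put(vector_id, vector)
        self.pending.clear()
        self.migrating = False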

Failover & Recovery

Failover and Recovery Flow Diagram

Health Checking

The HealthChecker service periodically pings each data node with lightweight gRPC health checks (a sketch of the loop follows the list below):

  • gRPC health check every 5 seconds (configurable)
  • 3 consecutive failures trigger failover
  • Circuit breaker prevents cascading failures
  • Automatic recovery when node returns
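
A minimal sketch of that loop, assuming the data nodes expose the standard gRPC health-checking service (grpc.health.v1); the interval and failure threshold mirror the defaults above, and the failover callback is a placeholder:

# Health-check loop sketch; assumes the standard grpc.health.v1 service.
import time
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc

def watch_node(address, on_failover, interval_s=5, max_failures=3):
    stub = health_pb2_grpc.HealthStub(grpc.insecure_channel(address))
    failures = 0
    while True:
        try:
            resp = stub.Check(health_pb2.HealthCheckRequest(service=""), timeout=2)
            healthy = resp.status == health_pb2.HealthCheckResponse.SERVING
        except grpc.RpcError:
            healthy = False
        failures = 0 if healthy else failures + 1
        if failures >= max_failures:  # three consecutive misses trigger failover
            on_failover(address)
            failures = 0
        time.sleep(interval_s)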

Automatic Failover

When a primary replica fails, the coordinator promotes a healthy replica (a sketch of the promotion step follows the list below):

  • Detect failure (health check timeout)
  • Select best replica (lowest latency)
  • Update cluster state (atomic)
  • Notify clients (via gRPC metadata)
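
The promotion step might look roughly like this; the selection criterion (lowest observed latency) matches the list above, while the cluster-state object and its methods are hypothetical stand-ins:

# Failover promotion sketch; `cluster_state` and its methods are hypothetical.
def promote_replica(cluster_state, failed_node, centroid_id, notify_clients):
    replicas = cluster_state.healthy_replicas(centroid_id)
    if not replicas:
        raise RuntimeError(f"no healthy replica left for centroid {centroid_id}")
    # Pick the replica with the lowest recently observed latency.
    new_primary = min(replicas, key=lambda r: r.observed_latency_ms)
    # Atomically swap the primary in the shared cluster state (e.g. a CAS in etcd).
    cluster_state.set_primary(centroid_id, new_primary, expected=failed_node)
    # Tell clients about the new primary (e.g. via gRPC response metadata).
    notify_clients(centroid_id, new_primary)
    return new_primary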

Disaster Recovery

Beyond automatic failover, MLGraph supports point-in-time recovery from snapshots (a snapshot-job sketch follows the lists below):

Snapshotting

  • Periodic FAISS index snapshots
  • Metadata backup to key-value store
  • Store in object storage (S3, GCS)

Restoration

  • Full cluster restore from snapshot
  • Individual node recovery
  • Automated runbooks for common scenarios
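
As an illustration of the snapshotting side, a periodic job might serialize each FAISS index and push it to object storage. faiss.write_index is the standard FAISS serialization call; the bucket name, key layout, and boto3 upload are assumptions, not MLGraph's built-in tooling:

# Illustrative snapshot job: serialize a FAISS index and upload it to S3.
import time
import faiss
import boto3

def snapshot_index(index, centroid_id, bucket="mlgraph-snapshots"):
    local_path = f"/tmp/centroid-{centroid_id}.faiss"
    faiss.write_index(index, local_path)  # serialize the index to disk
    key = f"snapshots/{int(time.time())}/centroid-{centroid_id}.faiss"
    boto3.client("s3").upload_file(local_path, bucket, key)  # push to object storage
    return key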

Cluster Scaling

Cluster Scaling Architecture from 1 to 10 Nodes

Scaling Characteristics

MLGraph demonstrates excellent scaling efficiency up to 10 nodes, with near-linear throughput improvements and sub-linear overhead. The sweet spot for most use cases is 3-5 nodes, balancing performance, cost, and operational complexity.

3x load throughput (3 nodes)
4.2x search QPS (5 nodes)
90%+ scaling efficiency

1 Node

Development & small datasets

Load: 100K vectors/sec
Search: 1,800 QPS
Memory: 2 GB
Availability: None (single point of failure)

3 Nodes

Production starter (recommended)

Load: 300K vectors/sec
Search: 5,000 QPS
Memory: 700 MB/node
Availability: Survives 1 node failure

5 Nodes

High availability

Load: 450K vectors/sec
Search: 7,500 QPS
Memory: 400 MB/node
Availability: Survives 2 node failures

10+ Nodes

Large-scale deployments

Load: 800K+ vectors/sec
Search: 12,000+ QPS
Memory: 200 MB/node
Availability: Survives 3-4 node failures

Choosing Cluster Size

By Dataset Size

Small (<10M vectors): 1-3 nodes recommended
Medium (10M-100M): 3-5 nodes recommended
Large (100M-1B): 10-20 nodes recommended
Very Large (>1B): 50+ nodes recommended

By Performance Requirements

High Throughput (>10K QPS): 10-15 nodes optimal
Low Latency (<1ms): 3 nodes optimal (minimizes network hops)
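
The guidance above can be restated as a trivial helper; the thresholds simply mirror the table and should be tuned for real workloads:

# Restates the sizing guidance above; thresholds are the same rough cut-offs.
def recommended_nodes(num_vectors: int, target_qps: int = 0) -> str:
    if target_qps > 10_000:
        return "10-15 nodes (high-throughput tier)"
    if num_vectors < 10_000_000:
        return "1-3 nodes"
    if num_vectors < 100_000_000:
        return "3-5 nodes"
    if num_vectors < 1_000_000_000:
        return "10-20 nodes"
    return "50+ nodes"

print(recommended_nodes(250_000_000))  # -> 10-20 nodes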

Configuration

Coordinator Setup

# coordinator.yaml
cluster_id: "mlgraph-prod"
election_timeout_ms: 5000
heartbeat_interval_ms: 1000
replication_factor: 3
kvstore: "etcd://localhost:2379"

Data Node Config

# node.yaml
node_id: "node-1"
listen_addr: "0.0.0.0:50051"
coordinator: "localhost:50050"
storage_path: "/data/mlgraph"
cache_size_mb: 2048

Command-Line Tools

Start coordinator:
./mlgraph-coordinator --config coordinator.yaml
Start data node:
./centroid_server --port 50051 --coordinator localhost:50050
Check cluster status:
./mlgraph-cli cluster status
Drain node for maintenance:
./mlgraph-cli node drain node-1 --wait

Performance Tuning

Memory Optimization

Cache Size: Set cache_size_mb to 20-30% of available RAM
Tiered Storage: Hot data in IDMap, warm in IVF, cold on disk

Network Tuning

gRPC Settings: Increase max_concurrent_streams for high QPS
Bandwidth: 10 Gbps recommended for 10+ node clusters

Load Balancing

Consistent Hashing: Minimizes data movement when nodes join/leave
Centroid Balancing: Rebalances automatically to maintain even load
