Cluster Architecture

Scale from 1 to 50+ nodes with automatic replication, sharding, and failover. Built for production-grade vector search with 99.9% availability guarantees.

Architecture Overview

MLGraph Cluster Architecture Diagram

MLGraph's cluster architecture distributes vector storage across multiple nodes, coordinated by a lightweight coordinator service. Each data node hosts FAISS indexes for assigned centroids, with automatic replication for fault tolerance.

Client Request Flow:
Client → DistributedIndexManager
→ Coordinator (cluster state)
→ CentroidClient (gRPC)
→ Data Node A (primary)
→ Data Node B (replica)
→ SingleCentroidManager (FAISS)
→ IDMap / IVF / OnDisk tiers
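
The flow above can also be read as code. The sketch below mirrors the names in the diagram (DistributedIndexManager, Coordinator, CentroidClient), but the methods and signatures are assumptions made for illustration, not the actual MLGraph client API:

# Conceptual sketch of the request path; names mirror the diagram above,
# but every method shown here is a hypothetical stand-in.
import numpy as np

class DistributedIndexManagerSketch:
    def __init__(self, coordinator, centroid_clients):
        self.coordinator = coordinator            # knows centroid -> node placement
        self.centroid_clients = centroid_clients  # node_id -> gRPC client wrapper

    def search(self, query: np.ndarray, k: int = 10):
        # 1. Ask the coordinator which centroids to probe and where they live.
        placements = self.coordinator.placements_for(query)  # [(centroid_id, node_id), ...]
        hits = []
        for centroid_id, node_id in placements:
            # 2. Fan out to the primary data node for each centroid over gRPC;
            #    the node searches its local FAISS tiers (IDMap / IVF / OnDisk).
            client = self.centroid_clients[node_id]
            hits.extend(client.search(centroid_id, query, k))
        # 3. Merge the partial (distance, vector_id) results into a global top-k.
        hits.sort(key=lambda hit: hit[0])
        return hits[:k]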

Coordinator Node

Maintains cluster state, handles leader election, manages replica placement and failover

Cluster membership management
Health monitoring and failover
Replica placement decisions
Distributed configuration

Data Nodes

Stores vector data in FAISS indexes with tiered storage (IDMap, IVF, OnDisk)

Vector storage and indexing
Search query execution
Replication (primary & replica)
Local health reporting

Replication & Sharding

Data Replication and Sharding Visualization

Why Sharding Matters for Vector Search

Unlike traditional databases where you can just throw more RAM at the problem, vector search has a dirty secret: FAISS indexes don't scale linearly with data size. The inverted file (IVF) algorithm used for approximate nearest neighbor search gets slower as indexes grow beyond a few hundred million vectors.

Sharding distributes vectors across multiple smaller indexes, each operating at peak efficiency. A 1 billion vector dataset split across 10 nodes (100M vectors each) searches faster than a single 1B vector index—even accounting for network overhead.
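
To make that concrete, here is a small, self-contained sketch of the scatter-gather pattern sharding enables: each shard searches its own small FAISS index and the caller merges the per-shard results into a global top-k. The shard count, ID ranges, and random data are toy values; MLGraph performs this fan-out over gRPC rather than in-process.

# Scatter-gather across shards: query every small index, merge the top-k.
import numpy as np
import faiss

dim, k, num_shards, per_shard = 64, 5, 4, 10_000
rng = np.random.default_rng(0)

# Build one small index per shard, preserving global vector IDs via IndexIDMap.
shards = []
for s in range(num_shards):
    ids = np.arange(s * per_shard, (s + 1) * per_shard, dtype=np.int64)
    vectors = rng.random((per_shard, dim), dtype=np.float32)
    index = faiss.IndexIDMap(faiss.IndexFlatL2(dim))
    index.add_with_ids(vectors, ids)
    shards.append(index)

# Query every shard, then merge partial results into the global nearest neighbors.
query = rng.random((1, dim), dtype=np.float32)
hits = []
for index in shards:
    distances, found_ids = index.search(query, k)
    hits.extend(zip(distances[0].tolist(), found_ids[0].tolist()))

print(sorted(hits)[:k])  # global top-k across all shards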

Basic Replication (n-way)

Each centroid replicated across n servers

Replication Factor: 2-3
Consistency: Strong
Network Overhead: Low (10-15%)

Sharding + Replication

Data partitioned across shards, each shard replicated

Replication Factor: 3-5
Consistency: Eventual
Network Overhead: Medium (15-20%)

Cross-Region Replication

Asynchronous replication to secondary regions

Replication Factor: 2+ regions
Consistency: Eventual (async)
Network Overhead: High (25-30%)
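
The practical difference between these modes is when a write is acknowledged. The sketch below is a simplified illustration of that trade-off, not MLGraph's replication code: strong consistency waits for every replica before acknowledging, while eventual consistency acknowledges after the primary write and lets replicas catch up in the background.

# Simplified illustration of the ack semantics; `primary` and `replicas` are
# hypothetical node handles with a put() method.
from concurrent.futures import ThreadPoolExecutor, wait

replication_pool = ThreadPoolExecutor(max_workers=8)

def replicate_write(primary, replicas, vector_id, vector, mode="strong"):
    primary.put(vector_id, vector)  # the primary always takes the write first
    futures = [replication_pool.submit(r.put, vector_id, vector) for r in replicas]
    if mode == "strong":
        wait(futures)  # strong: acknowledge only after all replicas confirm
    # eventual: acknowledge immediately; replicas apply the write asynchronously
    return "ack"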

Shard Placement Algorithm

The DistributedIndexManager uses consistent hashing to assign centroids to nodes (a minimal sketch of the hashing step follows the list below). When a new node joins the cluster:

  1. Coordinator calculates centroid assignments based on cluster topology
  2. Affected centroids are migrated to new node in background
  3. Replicas are created on other nodes for fault tolerance
  4. Health checker validates data consistency across replicas
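
A minimal sketch of the hashing step (illustrative only; the exact algorithm inside DistributedIndexManager is not shown here): nodes are placed on a hash ring with virtual nodes, each centroid maps to the next node clockwise, and adding a node moves only the centroids that fall into its new ring segments.

# Toy consistent-hash ring for centroid placement.
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node_id)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node_id):
        # Each node appears `vnodes` times on the ring to smooth the load.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node_id}#{i}"), node_id))

    def node_for(self, centroid_id: int) -> str:
        # Walk clockwise from the centroid's hash to the next virtual node.
        h = _hash(f"centroid-{centroid_id}")
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-1", "node-2", "node-3"])
before = {c: ring.node_for(c) for c in range(1000)}
ring.add_node("node-4")  # a new node joins the cluster
moved = sum(before[c] != ring.node_for(c) for c in range(1000))
print(f"{moved} of 1000 centroids migrate to the new node")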

Consistency During Migration

Reads continue to be served from old locations during migration. Writes are buffered and replayed on new nodes. Zero downtime, zero data loss.
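
One way to picture that write path (purely illustrative, not MLGraph internals): writes that arrive while a centroid is moving keep landing on the old node, are also appended to a buffer, and are replayed on the new node once the bulk copy completes.

# Buffered writes during a centroid migration; node handles are hypothetical.
class MigrationBuffer:
    def __init__(self):
        self.migrating = False
        self.pending = []  # writes received while the bulk copy is in flight

    def write(self, old_node, vector_id, vector):
        old_node.put(vector_id, vector)  # the old location keeps serving traffic
        if self.migrating:
            self.pending.append((vector_id, vector))  # remember for replay

    def finish_migration(self, new_node):
        for vector_id, vector in self.pending:  # replay buffered writes in order
            new_node.put(vector_id, vector)
        self.pending.clear()
        self.migrating = False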

Failover & Recovery

Failover and Recovery Flow Diagram

Health Checking

The HealthChecker service periodically pings each data node with lightweight gRPC health checks (a sketch of the loop follows the list below):

  • gRPC health check every 5 seconds (configurable)
  • 3 consecutive failures trigger failover
  • Circuit breaker prevents cascading failures
  • Automatic recovery when node returns
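
A minimal sketch of that loop, assuming the data nodes expose the standard gRPC health-checking service (grpc.health.v1); the interval and failure threshold mirror the defaults above, and the failover callback is a placeholder:

# Health-check loop sketch; assumes the standard grpc.health.v1 service.
import time
import grpc
from grpc_health.v1 import health_pb2, health_pb2_grpc

def watch_node(address, on_failover, interval_s=5, max_failures=3):
    stub = health_pb2_grpc.HealthStub(grpc.insecure_channel(address))
    failures = 0
    while True:
        try:
            resp = stub.Check(health_pb2.HealthCheckRequest(service=""), timeout=2)
            healthy = resp.status == health_pb2.HealthCheckResponse.SERVING
        except grpc.RpcError:
            healthy = False
        failures = 0 if healthy else failures + 1
        if failures >= max_failures:  # three consecutive misses trigger failover
            on_failover(address)
            failures = 0
        time.sleep(interval_s)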

Automatic Failover

When a primary replica fails, the coordinator promotes a healthy replica (a sketch of the promotion step follows the list below):

  • Detect failure (health check timeout)
  • Select best replica (lowest latency)
  • Update cluster state (atomic)
  • Notify clients (via gRPC metadata)
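
The promotion step might look roughly like this; the selection criterion (lowest observed latency) matches the list above, while the cluster-state object and its methods are hypothetical stand-ins:

# Failover promotion sketch; `cluster_state` and its methods are hypothetical.
def promote_replica(cluster_state, failed_node, centroid_id, notify_clients):
    replicas = cluster_state.healthy_replicas(centroid_id)
    if not replicas:
        raise RuntimeError(f"no healthy replica left for centroid {centroid_id}")
    # Pick the replica with the lowest recently observed latency.
    new_primary = min(replicas, key=lambda r: r.observed_latency_ms)
    # Atomically swap the primary in the shared cluster state (e.g. a CAS in etcd).
    cluster_state.set_primary(centroid_id, new_primary, expected=failed_node)
    # Tell clients about the new primary (e.g. via gRPC response metadata).
    notify_clients(centroid_id, new_primary)
    return new_primary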

Disaster Recovery

Beyond automatic failover, MLGraph supports point-in-time recovery from snapshots (a snapshot-job sketch follows the lists below):

Snapshotting

  • Periodic FAISS index snapshots
  • Metadata backup to key-value store
  • Store in object storage (S3, GCS)

Restoration

  • Full cluster restore from snapshot
  • Individual node recovery
  • Automated runbooks for common scenarios
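
As an illustration of the snapshotting side, a periodic job might serialize each FAISS index and push it to object storage. faiss.write_index is the standard FAISS serialization call; the bucket name, key layout, and boto3 upload are assumptions, not MLGraph's built-in tooling:

# Illustrative snapshot job: serialize a FAISS index and upload it to S3.
import time
import faiss
import boto3

def snapshot_index(index, centroid_id, bucket="mlgraph-snapshots"):
    local_path = f"/tmp/centroid-{centroid_id}.faiss"
    faiss.write_index(index, local_path)  # serialize the index to disk
    key = f"snapshots/{int(time.time())}/centroid-{centroid_id}.faiss"
    boto3.client("s3").upload_file(local_path, bucket, key)  # push to object storage
    return key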

Cluster Scaling

Cluster Scaling Architecture from 1 to 10 Nodes

Scaling Characteristics

MLGraph demonstrates excellent scaling efficiency up to 10 nodes, with near-linear throughput improvements and sub-linear overhead. The sweet spot for most use cases is 3-5 nodes, balancing performance, cost, and operational complexity.

3x load throughput (3 nodes)
4.2x search QPS (5 nodes)
90%+ scaling efficiency

1 Node

Development & small datasets

Load: 100K vectors/sec
Search: 1,800 QPS
Memory: 2 GB
Availability: None (single point of failure)

3 Nodes

Production starter (recommended)

Load: 300K vectors/sec
Search: 5,000 QPS
Memory: 700 MB/node
Availability: Survives 1 node failure

5 Nodes

High availability

Load: 450K vectors/sec
Search: 7,500 QPS
Memory: 400 MB/node
Availability: Survives 2 node failures

10+ Nodes

Large-scale deployments

Load: 800K+ vectors/sec
Search: 12,000+ QPS
Memory: 200 MB/node
Availability: Survives 3-4 node failures

Choosing Cluster Size

By Dataset Size

Small (<10M vectors): 1-3 nodes recommended
Medium (10M-100M): 3-5 nodes recommended
Large (100M-1B): 10-20 nodes recommended
Very Large (>1B): 50+ nodes recommended

By Performance Requirements

High Throughput (>10K QPS): 10-15 nodes optimal
Low Latency (<1ms): 3 nodes optimal (minimizes network hops)
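
The guidance above can be restated as a trivial helper; the thresholds simply mirror the table and should be tuned for real workloads:

# Restates the sizing guidance above; thresholds are the same rough cut-offs.
def recommended_nodes(num_vectors: int, target_qps: int = 0) -> str:
    if target_qps > 10_000:
        return "10-15 nodes (high-throughput tier)"
    if num_vectors < 10_000_000:
        return "1-3 nodes"
    if num_vectors < 100_000_000:
        return "3-5 nodes"
    if num_vectors < 1_000_000_000:
        return "10-20 nodes"
    return "50+ nodes"

print(recommended_nodes(250_000_000))  # -> 10-20 nodes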

Configuration

Coordinator Setup

# coordinator.yaml
cluster_id: "mlgraph-prod"
election_timeout_ms: 5000
heartbeat_interval_ms: 1000
replication_factor: 3
kvstore: "etcd://localhost:2379"

Data Node Config

# node.yaml
node_id: "node-1"
listen_addr: "0.0.0.0:50051"
coordinator: "localhost:50050"
storage_path: "/data/mlgraph"
cache_size_mb: 2048

Command-Line Tools

Start coordinator:
./mlgraph-coordinator --config coordinator.yaml
Start data node:
./centroid_server --port 50051 --coordinator localhost:50050
Check cluster status:
./mlgraph-cli cluster status
Drain node for maintenance:
./mlgraph-cli node drain node-1 --wait

Performance Tuning

Memory Optimization

Cache Size: Set cache_size_mb to 20-30% of available RAM
Tiered Storage: Hot data in IDMap, warm in IVF, cold on disk

Network Tuning

gRPC Settings: Increase max_concurrent_streams for high QPS
Bandwidth: 10 Gbps recommended for 10+ node clusters

Load Balancing

Consistent Hashing: Minimizes data movement when nodes join/leave
Centroid Balancing: Rebalances automatically to maintain even load
