MLGraph
The SSD-First Distributed Database

Turbopuffer showed us the way, but we went off-road. A distributed vector database that runs on your SSDs, not someone else's cloud. Centroids in memory, vectors on disk, latency in microseconds. Scale to billions of vectors without scaling your AWS bill.

1B+ Vectors Supported
7,500 QPS @ 5 Nodes
0.2ms P50 Latency
65% Memory Savings
The Philosophy

Why SSD-First Matters

Here's the economics: RAM costs about $5/GB/month. SSD costs about $0.10/GB/month. That's a 50x difference. If you're storing a billion 128-dimensional float vectors, that's 512 GB of raw data. In RAM? $2,560/month. On SSD? $51/month. The math speaks for itself.
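If you want to check that arithmetic yourself, it fits in a throwaway snippet; the prices are the rough figures quoted above, not a quote from any provider.

ram_vs_ssd_math.cpp
// Throwaway sketch: the cost math from the paragraph above, nothing more.
#include <cstdio>

int main() {
  const double raw_gb     = 1e9 * 128 * 4 / 1e9;  // 1B vectors x 128 dims x 4 bytes = 512 GB
  const double ram_per_gb = 5.00;                 // ~$/GB/month for RAM (rough figure)
  const double ssd_per_gb = 0.10;                 // ~$/GB/month for SSD (rough figure)
  std::printf("RAM: $%.0f/mo  SSD: $%.2f/mo  ratio: %.0fx\n",
              raw_gb * ram_per_gb, raw_gb * ssd_per_gb, ram_per_gb / ssd_per_gb);
  // Prints: RAM: $2560/mo  SSD: $51.20/mo  ratio: 50x
  return 0;
}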

But wait, you say, disk is slow! True, but only if you're reading it wrong. The trick is knowing what to keep in memory. Centroids are tiny (256-1024 of them, each 512 bytes at 128D). They tell you which vectors to fetch. Keep centroids hot, let vectors stay cold, and you get the best of both worlds.

This is the Turbopuffer insight, and it's brilliant. We took it further: distributed centroids across nodes, mirror groups for replication, tiered sharding for different access patterns. Same philosophy, enterprise-grade implementation.
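Concretely, the hot/cold split looks something like the sketch below: phase one ranks a few hundred in-memory centroids (roughly 512 KB at 128D), phase two reads only the selected vector lists from SSD. The types and helper names here are illustrative placeholders, not MLGraph's actual API.

query_path_sketch.cpp
// Minimal sketch of the hot/cold split: phase 1 scans the in-memory centroid
// table, phase 2 reads only the selected vector lists from SSD.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

struct Centroid {
  std::uint32_t id;   // which on-disk posting list this centroid points to
  Vec point;          // 128-D float32, i.e. 512 bytes per centroid in RAM
};

static float l2(const Vec& a, const Vec& b) {
  float dist = 0.f;
  for (std::size_t i = 0; i < a.size(); ++i) { const float t = a[i] - b[i]; dist += t * t; }
  return dist;
}

// Phase 1 (RAM, microseconds): rank all centroids, keep the nprobe closest.
std::vector<std::uint32_t> nearest_centroids(const std::vector<Centroid>& centroids,
                                             const Vec& query, std::size_t nprobe) {
  std::vector<std::pair<float, std::uint32_t>> scored;
  scored.reserve(centroids.size());
  for (const auto& c : centroids) scored.emplace_back(l2(c.point, query), c.id);
  const std::size_t k = std::min(nprobe, scored.size());
  std::partial_sort(scored.begin(), scored.begin() + k, scored.end());
  std::vector<std::uint32_t> ids;
  ids.reserve(k);
  for (std::size_t i = 0; i < k; ++i) ids.push_back(scored[i].second);
  return ids;
}

// Phase 2 (SSD): a fetch of just those posting lists scores their vectors
// exactly. The bulk of the data never touches RAM.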

Architecture

Built for Scale

A multi-node, multi-centroid system designed for horizontal scaling.

Tiered Sharding

New vectors hit IDMap shards for instant availability. As they age, they graduate to IVF shards for efficient ANN search. Large indices move to OnDisk shards. Your data flows naturally to its optimal resting place.

Mirror Groups

Primary-replica replication with configurable consistency. SYNC waits for all replicas, SEMI_SYNC for at least one, ASYNC for fire-and-forget. Pick your trade-off between durability and latency.

Distributed Centroids

Centroids live in memory on each node. Vectors live on disk. Queries hit centroids first (microseconds), then fetch relevant vectors (still fast). Memory stays small, storage scales infinitely.

Smart Allocation

Round-robin for simplicity. Load-balanced for efficiency. Locality-aware for performance—grouping related centroids on the same node reduces query fanout.

Data Flow Architecture

Ingest (vectors arrive) → IDMap Shard (fast writes) → IVF and OnDisk Shards (scalable storage)
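A hedged sketch of that graduation policy is below. The tier names mirror the shard types above; the cutoffs are illustrative assumptions, not MLGraph's defaults.

tier_policy_sketch.cpp
// Hypothetical tier-graduation policy: fresh data sits in an IDMap shard for
// instant availability, graduates to an IVF shard once it is worth clustering,
// and spills to an OnDisk shard at scale. Thresholds are made up.
#include <cstddef>
#include <cstdint>

enum class ShardTier { IDMap, IVF, OnDisk };

struct ShardStats {
  std::size_t vector_count;    // vectors currently held by the shard
  std::int64_t age_seconds;    // time since the shard was opened for writes
};

ShardTier choose_tier(const ShardStats& s) {
  if (s.vector_count < 100'000 && s.age_seconds < 3'600) return ShardTier::IDMap;  // fast writes
  if (s.vector_count < 10'000'000) return ShardTier::IVF;                          // efficient ANN search
  return ShardTier::OnDisk;                                                        // scalable storage
}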
High Availability

Mirror Groups: Replication Done Right

Replication sounds simple until you think about it. What happens when the primary fails mid-write? What if a replica falls behind? How do you handle split-brain scenarios? We've thought about all of this so you don't have to.

A mirror group is a collection of servers maintaining identical copies of your index data. One primary handles writes; replicas serve reads. You choose the consistency level:

SYNC

Wait for all replicas. Maximum durability, higher latency. For when losing data is not an option.

SEMI_SYNC (Default)

Wait for at least one replica. Balanced trade-off. Good for most production workloads.

ASYNC

Fire and forget. Maximum speed, eventual consistency. For high-throughput scenarios.

mirror_group_config.cpp
MirrorGroupConfig config;
config.group_id = "production-vectors";
config.server_ids = {"node-1", "node-2", "node-3"};
config.replication_factor = 2;
config.primary_server_id = "node-1";
config.consistency_level = ConsistencyLevel::SEMI_SYNC;

// Automatic failover when primary goes down
// Load balancing across healthy replicas
// Replication lag monitoring with alerts
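The configuration above picks SEMI_SYNC. Under the hood, the three levels differ mainly in how many replica acknowledgements a write collects before returning; the helper below is a simplified illustration, not MLGraph's actual API.

consistency_acks_sketch.cpp
#include <cstddef>

enum class ConsistencyLevel { SYNC, SEMI_SYNC, ASYNC };

// How many replica acks a write waits for before reporting success.
std::size_t required_acks(ConsistencyLevel level, std::size_t num_replicas) {
  switch (level) {
    case ConsistencyLevel::SYNC:      return num_replicas;  // all replicas: maximum durability
    case ConsistencyLevel::SEMI_SYNC: return 1;             // at least one: the default trade-off
    case ConsistencyLevel::ASYNC:     return 0;             // fire and forget: maximum speed
  }
  return num_replicas;  // unreachable; keeps the compiler's return-path check quiet
}
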
Performance

Linear Scaling, Real Numbers

Measured on 128-dimensional vectors, 1M vectors per node, production hardware.

Configuration    | Vectors    | Load Rate | Search QPS | Memory
Single Server    | < 10M      | 100K/s    | 1,800      | 2 GB
3-Node Cluster   | 10M-100M   | 300K/s    | 5,000      | 700 MB/node
5-Node Cluster   | 100M-1B    | 450K/s    | 7,500      | 400 MB/node

Data Format Matters

For 1M vectors at 128 dimensions—choose wisely.

Format                  | Load Time | Throughput    | Size
Parquet (Recommended)   | 0.81s     | 1,232K vec/s  | 494 MB
CSV                     | 9.77s     | 102K vec/s    | 1,249 MB
JSON                    | 20.19s    | 50K vec/s     | 2,528 MB

Why Parquet?

Parquet is columnar storage with built-in compression. For vectors:

  • 12x faster than CSV loading
  • 60% smaller file sizes
  • Type-safe schema enforcement
  • Streaming reads for large files
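As a rough sketch of what a Parquet load path can look like with Apache Arrow's C++ reader: the file name and the read-everything-into-one-table approach are assumptions for illustration, and this is not MLGraph's bundled loader.

load_parquet_sketch.cpp
// Sketch only: read a Parquet file of vectors into an Arrow table.
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/reader.h>

#include <iostream>
#include <memory>
#include <string>

arrow::Status LoadVectors(const std::string& path) {
  // Open the file for random-access reads.
  ARROW_ASSIGN_OR_RAISE(auto infile, arrow::io::ReadableFile::Open(path));

  // Build a Parquet reader on Arrow's default memory pool.
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(
      parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));

  // Materialize the file as a columnar Arrow table (reading row group by
  // row group is also possible for files that don't fit in memory).
  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));

  std::cout << "rows: " << table->num_rows()
            << "  columns: " << table->num_columns() << std::endl;
  return arrow::Status::OK();
}

int main() {
  arrow::Status st = LoadVectors("vectors.parquet");
  if (!st.ok()) {
    std::cerr << st.ToString() << std::endl;
    return 1;
  }
  return 0;
}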
Allocation Strategies

Smart Centroid Placement

Where you put your centroids matters. Place related centroids on the same node, and queries that need multiple clusters only hit one server. Scatter them randomly, and every query fans out to every node.

1. Round-Robin Allocator

Simple and predictable. Centroid 0 → Node 0, Centroid 1 → Node 1, etc. Great for uniform workloads where all clusters are equally hot.

2. Load-Balanced Allocator

Tracks centroid count per server, allocates to least-loaded nodes. Prevents hot spots when clusters have uneven sizes.

3. Locality-Aware Allocator (Recommended)

Uses k-means to group similar centroids on the same node. Queries that span related clusters hit fewer servers. Supports geographic/zone awareness for cross-datacenter deployments. A sketch of this approach follows below.
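One way to build locality-aware placement is a small k-means over the centroids themselves, with k equal to the node count. The sketch below is illustrative only: the names are not MLGraph's allocator API, and it ignores per-node load balancing and zone constraints.

locality_aware_allocator_sketch.cpp
// Group similar centroids onto the same node by clustering the centroids
// with k-means, k = number of nodes. Related centroids end up co-located,
// so queries that span related clusters fan out to fewer servers.
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

static float l2(const Vec& a, const Vec& b) {
  float dist = 0.f;
  for (std::size_t i = 0; i < a.size(); ++i) { const float t = a[i] - b[i]; dist += t * t; }
  return dist;
}

// Returns a node id for every centroid: centroids[i] lives on node result[i].
std::vector<int> assign_locality_aware(const std::vector<Vec>& centroids,
                                       int num_nodes, int iterations = 10) {
  const std::size_t n = centroids.size();
  const std::size_t dim = centroids[0].size();

  // Seed one "anchor" per node with evenly spaced centroids.
  std::vector<Vec> anchors(num_nodes);
  for (int j = 0; j < num_nodes; ++j) anchors[j] = centroids[j * n / num_nodes];

  std::vector<int> owner(n, 0);
  for (int it = 0; it < iterations; ++it) {
    // Assignment step: each centroid goes to the node whose anchor is nearest.
    for (std::size_t i = 0; i < n; ++i) {
      float best = std::numeric_limits<float>::max();
      for (int j = 0; j < num_nodes; ++j) {
        const float d = l2(centroids[i], anchors[j]);
        if (d < best) { best = d; owner[i] = j; }
      }
    }
    // Update step: move each anchor to the mean of the centroids it owns.
    std::vector<Vec> sum(num_nodes, Vec(dim, 0.f));
    std::vector<int> count(num_nodes, 0);
    for (std::size_t i = 0; i < n; ++i) {
      for (std::size_t d = 0; d < dim; ++d) sum[owner[i]][d] += centroids[i][d];
      ++count[owner[i]];
    }
    for (int j = 0; j < num_nodes; ++j)
      if (count[j] > 0)
        for (std::size_t d = 0; d < dim; ++d) anchors[j][d] = sum[j][d] / count[j];
  }
  return owner;
}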

Ready to Scale Your Vector Search?

From single-node development to multi-datacenter production, MLGraph grows with your needs—and your SSD budget stays reasonable.

Request Demo