MLGraph
Turbopuffer showed us the way, but we went off-road. A distributed vector database that runs on your SSDs, not someone else's cloud. Centroids in memory, vectors on disk, latency in microseconds. Scale to billions of vectors without scaling your AWS bill.
Why SSD-First Matters
Here's the economics: RAM costs about $5/GB/month. SSD costs about $0.10/GB/month. That's a 50x difference. If you're storing a billion 128-dimensional float vectors, that's 512 GB of raw data. In RAM? $2,560/month. On SSD? $51/month. The math speaks for itself.
But wait, you say, disk is slow! True—if you're reading it wrong. The trick is knowing what to keep in memory. Centroids are tiny (256-1024 of them, each 512 bytes for 128D). They tell you which vectors to fetch. Keep centroids hot, let vectors stay cold, and you get the best of both worlds.
This is the Turbopuffer insight, and it's brilliant. We took it further: distributed centroids across nodes, mirror groups for replication, tiered sharding for different access patterns. Same philosophy, enterprise-grade implementation.
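If you want to sanity-check the arithmetic above, here's a quick sketch; the prices and centroid counts are the ballpark figures from this section, not quotes:

```cpp
#include <cstdio>

// Back-of-the-envelope sizing for the figures quoted above.
// Prices and counts are illustrative values from this section, not measurements.
int main() {
    const double num_vectors   = 1e9;    // one billion vectors
    const int    dims          = 128;
    const int    bytes_per_dim = 4;      // float32

    const double raw_gb = num_vectors * dims * bytes_per_dim / 1e9;  // ~512 GB

    const double ram_usd_per_gb_month = 5.00;
    const double ssd_usd_per_gb_month = 0.10;
    std::printf("raw data: %.0f GB\n", raw_gb);
    std::printf("RAM: $%.0f/month, SSD: $%.0f/month\n",
                raw_gb * ram_usd_per_gb_month, raw_gb * ssd_usd_per_gb_month);

    // What actually has to stay hot: the centroids.
    const int centroids = 1024;  // upper end of the 256-1024 range
    const double centroid_bytes = centroids * dims * bytes_per_dim;
    std::printf("hot centroid set: %.0f KB\n", centroid_bytes / 1024);  // ~512 KB
    return 0;
}
```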
Built for Scale
A multi-node, multi-centroid system designed for horizontal scaling.
Tiered Sharding
New vectors hit IDMap shards for instant availability. As they age, they graduate to IVF shards for efficient ANN search. Large indices move to OnDisk shards. Your data flows naturally to its optimal resting place.
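The graduation policy can be sketched roughly like this; the thresholds and the ShardTier/ShardStats names are illustrative, not MLGraph internals:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the tiering policy described above.
// Threshold values and type names are illustrative, not MLGraph's actual code.
enum class ShardTier { IDMap, IVF, OnDisk };

struct ShardStats {
    std::size_t  vector_count = 0;  // vectors currently in the shard
    std::int64_t age_seconds  = 0;  // time since the shard was opened
};

ShardTier ChooseTier(const ShardStats& s) {
    // Fresh, small shards stay in IDMap: brute-force search, instant availability.
    if (s.vector_count < 100'000 && s.age_seconds < 3'600)
        return ShardTier::IDMap;

    // Aged shards are re-indexed into IVF for efficient ANN search.
    if (s.vector_count < 10'000'000)
        return ShardTier::IVF;

    // Very large indices move fully to disk.
    return ShardTier::OnDisk;
}
```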
Mirror Groups
Primary-replica replication with configurable consistency. SYNC waits for all replicas, SEMI_SYNC for at least one, ASYNC for fire-and-forget. Pick your trade-off between durability and latency.
Distributed Centroids
Centroids live in memory on each node. Vectors live on disk. Queries hit centroids first (microseconds), then fetch relevant vectors (still fast). Memory stays small, storage scales infinitely.
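A minimal sketch of that two-phase path, assuming plain L2 distance and a caller-supplied callback standing in for the SSD read; the callback and names are illustrative, not the MLGraph API:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Sketch of the centroid-first query path. Centroids are resident in RAM;
// vector blocks are read from SSD only for the clusters actually probed.
// fetch_cluster stands in for the SSD read and is an assumption for illustration.

struct Candidate { std::size_t id; float dist; };

static float L2(const float* a, const float* b, std::size_t d) {
    float s = 0.f;
    for (std::size_t i = 0; i < d; ++i) { float t = a[i] - b[i]; s += t * t; }
    return s;
}

using ClusterFetch = std::function<
    std::vector<std::pair<std::size_t, std::vector<float>>>(std::size_t cluster_id)>;

std::vector<Candidate> Search(const std::vector<std::vector<float>>& centroids,  // in RAM
                              const std::vector<float>& query,
                              const ClusterFetch& fetch_cluster,                 // SSD read
                              std::size_t nprobe, std::size_t k) {
    const std::size_t d = query.size();

    // Phase 1: rank centroids in memory (a few hundred distance computations).
    std::vector<Candidate> order(centroids.size());
    for (std::size_t c = 0; c < centroids.size(); ++c)
        order[c] = {c, L2(centroids[c].data(), query.data(), d)};
    nprobe = std::min(nprobe, order.size());
    std::partial_sort(order.begin(), order.begin() + nprobe, order.end(),
                      [](const Candidate& a, const Candidate& b) { return a.dist < b.dist; });

    // Phase 2: pull only the nprobe closest clusters off disk and scan them.
    std::vector<Candidate> best;
    for (std::size_t i = 0; i < nprobe; ++i)
        for (const auto& [id, vec] : fetch_cluster(order[i].id))
            best.push_back({id, L2(vec.data(), query.data(), d)});

    const std::size_t topk = std::min(k, best.size());
    std::partial_sort(best.begin(), best.begin() + topk, best.end(),
                      [](const Candidate& a, const Candidate& b) { return a.dist < b.dist; });
    best.resize(topk);
    return best;
}
```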
Smart Allocation
Round-robin for simplicity. Load-balanced for efficiency. Locality-aware for performance—grouping related centroids on the same node reduces query fanout.
Data Flow Architecture
Mirror Groups: Replication Done Right
Replication sounds simple until you think about it. What happens when the primary fails mid-write? What if a replica falls behind? How do you handle split-brain scenarios? We've thought about all of this so you don't have to.
A mirror group is a collection of servers maintaining identical copies of your index data. One primary handles writes; replicas serve reads. You choose the consistency level:
- SYNC: Wait for all replicas. Maximum durability, higher latency. For when losing data is not an option.
- SEMI_SYNC: Wait for at least one replica. Balanced trade-off. Good for most production workloads.
- ASYNC: Fire and forget. Maximum speed, eventual consistency. For high-throughput scenarios.
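In other words, the only thing that changes is how many replica acknowledgements a write must collect before it is reported durable. A rough sketch (the RequiredAcks helper is hypothetical; the ConsistencyLevel names match the config example below):

```cpp
#include <cstddef>

// Hypothetical sketch: acknowledgements required before a write is reported
// durable, per consistency level. Not MLGraph's actual replication code.
enum class ConsistencyLevel { SYNC, SEMI_SYNC, ASYNC };

std::size_t RequiredAcks(ConsistencyLevel level, std::size_t replica_count) {
    switch (level) {
        case ConsistencyLevel::SYNC:      return replica_count;  // all replicas
        case ConsistencyLevel::SEMI_SYNC: return 1;              // at least one
        case ConsistencyLevel::ASYNC:     return 0;              // fire and forget
    }
    return replica_count;  // unreachable; keeps compilers happy
}
```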
Configuring a mirror group looks like this:

```cpp
MirrorGroupConfig config;
config.group_id = "production-vectors";
config.server_ids = {"node-1", "node-2", "node-3"};
config.replication_factor = 2;
config.primary_server_id = "node-1";
config.consistency_level = ConsistencyLevel::SEMI_SYNC;

// Automatic failover when primary goes down
// Load balancing across healthy replicas
// Replication lag monitoring with alerts
```

Linear Scaling, Real Numbers
Measured on 128-dimensional vectors, 1M vectors per node, production hardware.
| Configuration | Vectors | Load Rate | Search QPS | Memory |
|---|---|---|---|---|
| Single Server | < 10M | 100K/s | 1,800 | 2 GB |
| 3-Node Cluster | 10M-100M | 300K/s | 5,000 | 700 MB/node |
| 5-Node Cluster | 100M-1B | 450K/s | 7,500 | 400 MB/node |
Data Format Matters
For 1M vectors at 128 dimensions, the format you load from makes a real difference. Choose wisely.
Why Parquet?
Parquet is columnar storage with built-in compression. For vectors:
- 12x faster than CSV loading
- 60% smaller file sizes
- Type-safe schema enforcement
- Streaming reads for large files
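A streaming read can be sketched with Apache Arrow's C++ Parquet reader (Arrow is not part of MLGraph; the column names and FixedSizeList layout mentioned here are assumptions for illustration):

```cpp
#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/reader.h>

// Sketch of streaming vectors out of a Parquet file with Apache Arrow.
// The "embedding" column name and FixedSizeList<float> layout are assumptions,
// not a required MLGraph schema.
arrow::Status LoadVectors(const std::string& path) {
    std::shared_ptr<arrow::io::ReadableFile> file;
    ARROW_ASSIGN_OR_RAISE(file, arrow::io::ReadableFile::Open(path));

    std::unique_ptr<parquet::arrow::FileReader> reader;
    ARROW_RETURN_NOT_OK(
        parquet::arrow::OpenFile(file, arrow::default_memory_pool(), &reader));

    // Stream record batches instead of materializing the whole table,
    // so a multi-GB file never has to fit in memory at once.
    std::shared_ptr<arrow::RecordBatchReader> batches;
    ARROW_RETURN_NOT_OK(reader->GetRecordBatchReader(&batches));

    std::shared_ptr<arrow::RecordBatch> batch;
    while (true) {
        ARROW_RETURN_NOT_OK(batches->ReadNext(&batch));
        if (!batch) break;  // end of file
        // batch->column(...) now holds typed Arrow arrays (e.g. an "embedding"
        // column as FixedSizeList<float>); hand them to the index loader here.
    }
    return arrow::Status::OK();
}
```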
Smart Centroid Placement
Where you put your centroids matters. Place related centroids on the same node, and queries that need multiple clusters only hit one server. Scatter them randomly, and every query fans out to every node.
Round-Robin Allocator
Simple and predictable. Centroid 0 → Node 0, Centroid 1 → Node 1, etc. Great for uniform workloads where all clusters are equally hot.
Load-Balanced Allocator
Tracks centroid count per server, allocates to least-loaded nodes. Prevents hot spots when clusters have uneven sizes.
Locality-Aware Allocator
Uses k-means to group similar centroids on the same node. Queries that span related clusters hit fewer servers. Supports geographic/zone awareness for cross-datacenter deployments.
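The three strategies share one small decision: which server gets the next centroid. A rough sketch of the first two (class and method names here are illustrative, not MLGraph's actual API):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative sketch of the allocator strategies described above.
// Interface and class names are assumptions, not MLGraph's actual classes.
class CentroidAllocator {
public:
    virtual ~CentroidAllocator() = default;
    // Returns the server that should own the given centroid.
    virtual std::string Allocate(std::size_t centroid_id) = 0;
};

// Centroid 0 -> node 0, centroid 1 -> node 1, ... wrapping around.
class RoundRobinAllocator : public CentroidAllocator {
public:
    explicit RoundRobinAllocator(std::vector<std::string> servers)
        : servers_(std::move(servers)) {}
    std::string Allocate(std::size_t centroid_id) override {
        return servers_[centroid_id % servers_.size()];
    }
private:
    std::vector<std::string> servers_;
};

// Tracks how many centroids each server owns and picks the least loaded one.
class LoadBalancedAllocator : public CentroidAllocator {
public:
    explicit LoadBalancedAllocator(const std::vector<std::string>& servers) {
        for (const auto& s : servers) load_[s] = 0;
    }
    std::string Allocate(std::size_t /*centroid_id*/) override {
        auto least = load_.begin();
        for (auto it = load_.begin(); it != load_.end(); ++it)
            if (it->second < least->second) least = it;
        ++least->second;
        return least->first;
    }
private:
    std::unordered_map<std::string, std::size_t> load_;
};

// A locality-aware allocator would additionally cluster the centroids
// themselves (e.g. k-means over the centroid vectors) and map each group
// to a single server so related clusters co-locate and fanout stays low.
```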
Ready to Scale Your Vector Search?
From single-node development to multi-datacenter production, MLGraph grows with your needs—and your SSD budget stays reasonable.