MLGraph
Turbopuffer showed us the way, but we went off-road. A distributed vector database that runs on your SSDs, not someone else's cloud. Centroids in memory, vectors on disk, latency in microseconds. Scale to billions of vectors without scaling your AWS bill.
Why SSD-First Matters
Here's the economics: RAM costs about $5/GB/month. SSD costs about $0.10/GB/month. That's a 50x difference. If you're storing a billion 128-dimensional float vectors, that's 512 GB of raw data. In RAM? $2,560/month. On SSD? $51/month. The math speaks for itself.
But wait, you say, disk is slow! True—if you're reading it wrong. The trick is knowing what to keep in memory. Centroids are tiny (256-1024 of them, each 512 bytes for 128D). They tell you which vectors to fetch. Keep centroids hot, let vectors stay cold, and you get the best of both worlds.
This is the Turbopuffer insight, and it's brilliant. We took it further: distributed centroids across nodes, mirror groups for replication, tiered sharding for different access patterns. Same philosophy, enterprise-grade implementation.
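If you want to sanity-check the arithmetic above, here's a quick sketch; the prices and centroid counts are the ballpark figures from this section, not quotes:

```cpp
#include <cstdio>

// Back-of-the-envelope sizing for the figures quoted above.
// Prices and counts are illustrative values from this section, not measurements.
int main() {
    const double num_vectors   = 1e9;    // one billion vectors
    const int    dims          = 128;
    const int    bytes_per_dim = 4;      // float32

    const double raw_gb = num_vectors * dims * bytes_per_dim / 1e9;  // ~512 GB

    const double ram_usd_per_gb_month = 5.00;
    const double ssd_usd_per_gb_month = 0.10;
    std::printf("raw data: %.0f GB\n", raw_gb);
    std::printf("RAM: $%.0f/month, SSD: $%.0f/month\n",
                raw_gb * ram_usd_per_gb_month, raw_gb * ssd_usd_per_gb_month);

    // What actually has to stay hot: the centroids.
    const int centroids = 1024;  // upper end of the 256-1024 range
    const double centroid_bytes = centroids * dims * bytes_per_dim;
    std::printf("hot centroid set: %.0f KB\n", centroid_bytes / 1024);  // ~512 KB
    return 0;
}
```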
Built for Scale
A multi-node, multi-centroid system designed for horizontal scaling.
Tiered Sharding
New vectors hit IDMap shards for instant availability. As they age, they graduate to IVF shards for efficient ANN search. Large indices move to OnDisk shards. Your data flows naturally to its optimal resting place.
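The graduation policy can be sketched roughly like this; the thresholds and the ShardTier/ShardStats names are illustrative, not MLGraph internals:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the tiering policy described above.
// Threshold values and type names are illustrative, not MLGraph's actual code.
enum class ShardTier { IDMap, IVF, OnDisk };

struct ShardStats {
    std::size_t  vector_count = 0;  // vectors currently in the shard
    std::int64_t age_seconds  = 0;  // time since the shard was opened
};

ShardTier ChooseTier(const ShardStats& s) {
    // Fresh, small shards stay in IDMap: brute-force search, instant availability.
    if (s.vector_count < 100'000 && s.age_seconds < 3'600)
        return ShardTier::IDMap;

    // Aged shards are re-indexed into IVF for efficient ANN search.
    if (s.vector_count < 10'000'000)
        return ShardTier::IVF;

    // Very large indices move fully to disk.
    return ShardTier::OnDisk;
}
```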
Mirror Groups
Primary-replica replication with configurable consistency. SYNC waits for all replicas, SEMI_SYNC for at least one, ASYNC for fire-and-forget. Pick your trade-off between durability and latency.
Distributed Centroids
Centroids live in memory on each node. Vectors live on disk. Queries hit centroids first (microseconds), then fetch relevant vectors (still fast). Memory stays small, storage scales infinitely.
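A minimal sketch of that two-phase path, assuming plain L2 distance and a caller-supplied callback standing in for the SSD read; the callback and names are illustrative, not the MLGraph API:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Sketch of the centroid-first query path. Centroids are resident in RAM;
// vector blocks are read from SSD only for the clusters actually probed.
// fetch_cluster stands in for the SSD read and is an assumption for illustration.

struct Candidate { std::size_t id; float dist; };

static float L2(const float* a, const float* b, std::size_t d) {
    float s = 0.f;
    for (std::size_t i = 0; i < d; ++i) { float t = a[i] - b[i]; s += t * t; }
    return s;
}

using ClusterFetch = std::function<
    std::vector<std::pair<std::size_t, std::vector<float>>>(std::size_t cluster_id)>;

std::vector<Candidate> Search(const std::vector<std::vector<float>>& centroids,  // in RAM
                              const std::vector<float>& query,
                              const ClusterFetch& fetch_cluster,                 // SSD read
                              std::size_t nprobe, std::size_t k) {
    const std::size_t d = query.size();

    // Phase 1: rank centroids in memory (a few hundred distance computations).
    std::vector<Candidate> order(centroids.size());
    for (std::size_t c = 0; c < centroids.size(); ++c)
        order[c] = {c, L2(centroids[c].data(), query.data(), d)};
    nprobe = std::min(nprobe, order.size());
    std::partial_sort(order.begin(), order.begin() + nprobe, order.end(),
                      [](const Candidate& a, const Candidate& b) { return a.dist < b.dist; });

    // Phase 2: pull only the nprobe closest clusters off disk and scan them.
    std::vector<Candidate> best;
    for (std::size_t i = 0; i < nprobe; ++i)
        for (const auto& [id, vec] : fetch_cluster(order[i].id))
            best.push_back({id, L2(vec.data(), query.data(), d)});

    const std::size_t topk = std::min(k, best.size());
    std::partial_sort(best.begin(), best.begin() + topk, best.end(),
                      [](const Candidate& a, const Candidate& b) { return a.dist < b.dist; });
    best.resize(topk);
    return best;
}
```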
Smart Allocation
Round-robin for simplicity. Load-balanced for efficiency. Locality-aware for performance—grouping related centroids on the same node reduces query fanout.
Data Flow Architecture
Mirror Groups: Replication Done Right
Replication sounds simple until you think about it. What happens when the primary fails mid-write? What if a replica falls behind? How do you handle split-brain scenarios? We've thought about all of this so you don't have to.
A mirror group is a collection of servers maintaining identical copies of your index data. One primary handles writes; replicas serve reads. You choose the consistency level:
- SYNC: Wait for all replicas. Maximum durability, higher latency. For when losing data is not an option.
- SEMI_SYNC: Wait for at least one replica. Balanced trade-off. Good for most production workloads.
- ASYNC: Fire and forget. Maximum speed, eventual consistency. For high-throughput scenarios.
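In other words, the only thing that changes is how many replica acknowledgements a write must collect before it is reported durable. A rough sketch (the RequiredAcks helper is hypothetical; the ConsistencyLevel names match the config example below):

```cpp
#include <cstddef>

// Hypothetical sketch: acknowledgements required before a write is reported
// durable, per consistency level. Not MLGraph's actual replication code.
enum class ConsistencyLevel { SYNC, SEMI_SYNC, ASYNC };

std::size_t RequiredAcks(ConsistencyLevel level, std::size_t replica_count) {
    switch (level) {
        case ConsistencyLevel::SYNC:      return replica_count;  // all replicas
        case ConsistencyLevel::SEMI_SYNC: return 1;              // at least one
        case ConsistencyLevel::ASYNC:     return 0;              // fire and forget
    }
    return replica_count;  // unreachable; keeps compilers happy
}
```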
Configuring a mirror group looks like this:

```cpp
MirrorGroupConfig config;
config.group_id = "production-vectors";
config.server_ids = {"node-1", "node-2", "node-3"};
config.replication_factor = 2;
config.primary_server_id = "node-1";
config.consistency_level = ConsistencyLevel::SEMI_SYNC;

// Automatic failover when primary goes down
// Load balancing across healthy replicas
// Replication lag monitoring with alerts
```

Linear Scaling, Real Numbers
Measured on 128-dimensional vectors, 1M vectors per node, production hardware.
| Configuration | Vectors | Load Rate | Search QPS | Memory |
|---|---|---|---|---|
| Single Server | < 10M | 100K/s | 1,800 | 2 GB |
| 3-Node Cluster | 10M-100M | 300K/s | 5,000 | 700 MB/node |
| 5-Node Cluster | 100M-1B | 450K/s | 7,500 | 400 MB/node |
Data Format Matters
For 1M vectors at 128 dimensions, the format you load from makes a real difference. Choose wisely.
Why Parquet?
Parquet is columnar storage with built-in compression. For vectors:
- 12x faster than CSV loading
- 60% smaller file sizes
- Type-safe schema enforcement
- Streaming reads for large files
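A streaming read can be sketched with Apache Arrow's C++ Parquet reader (Arrow is not part of MLGraph; the column names and FixedSizeList layout mentioned here are assumptions for illustration):

```cpp
#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/reader.h>

// Sketch of streaming vectors out of a Parquet file with Apache Arrow.
// The "embedding" column name and FixedSizeList<float> layout are assumptions,
// not a required MLGraph schema.
arrow::Status LoadVectors(const std::string& path) {
    std::shared_ptr<arrow::io::ReadableFile> file;
    ARROW_ASSIGN_OR_RAISE(file, arrow::io::ReadableFile::Open(path));

    std::unique_ptr<parquet::arrow::FileReader> reader;
    ARROW_RETURN_NOT_OK(
        parquet::arrow::OpenFile(file, arrow::default_memory_pool(), &reader));

    // Stream record batches instead of materializing the whole table,
    // so a multi-GB file never has to fit in memory at once.
    std::shared_ptr<arrow::RecordBatchReader> batches;
    ARROW_RETURN_NOT_OK(reader->GetRecordBatchReader(&batches));

    std::shared_ptr<arrow::RecordBatch> batch;
    while (true) {
        ARROW_RETURN_NOT_OK(batches->ReadNext(&batch));
        if (!batch) break;  // end of file
        // batch->column(...) now holds typed Arrow arrays (e.g. an "embedding"
        // column as FixedSizeList<float>); hand them to the index loader here.
    }
    return arrow::Status::OK();
}
```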
Smart Centroid Placement
Where you put your centroids matters. Place related centroids on the same node, and queries that need multiple clusters only hit one server. Scatter them randomly, and every query fans out to every node.
Round-Robin Allocator
Simple and predictable. Centroid 0 → Node 0, Centroid 1 → Node 1, etc. Great for uniform workloads where all clusters are equally hot.
Load-Balanced Allocator
Tracks centroid count per server, allocates to least-loaded nodes. Prevents hot spots when clusters have uneven sizes.
Locality-Aware Allocator
Uses k-means to group similar centroids on the same node. Queries that span related clusters hit fewer servers. Supports geographic/zone awareness for cross-datacenter deployments.
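The three strategies share one small decision: which server gets the next centroid. A rough sketch of the first two (class and method names here are illustrative, not MLGraph's actual API):

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative sketch of the allocator strategies described above.
// Interface and class names are assumptions, not MLGraph's actual classes.
class CentroidAllocator {
public:
    virtual ~CentroidAllocator() = default;
    // Returns the server that should own the given centroid.
    virtual std::string Allocate(std::size_t centroid_id) = 0;
};

// Centroid 0 -> node 0, centroid 1 -> node 1, ... wrapping around.
class RoundRobinAllocator : public CentroidAllocator {
public:
    explicit RoundRobinAllocator(std::vector<std::string> servers)
        : servers_(std::move(servers)) {}
    std::string Allocate(std::size_t centroid_id) override {
        return servers_[centroid_id % servers_.size()];
    }
private:
    std::vector<std::string> servers_;
};

// Tracks how many centroids each server owns and picks the least loaded one.
class LoadBalancedAllocator : public CentroidAllocator {
public:
    explicit LoadBalancedAllocator(const std::vector<std::string>& servers) {
        for (const auto& s : servers) load_[s] = 0;
    }
    std::string Allocate(std::size_t /*centroid_id*/) override {
        auto least = load_.begin();
        for (auto it = load_.begin(); it != load_.end(); ++it)
            if (it->second < least->second) least = it;
        ++least->second;
        return least->first;
    }
private:
    std::unordered_map<std::string, std::size_t> load_;
};

// A locality-aware allocator would additionally cluster the centroids
// themselves (e.g. k-means over the centroid vectors) and map each group
// to a single server so related clusters co-locate and fanout stays low.
```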
Ready to Scale Your Vector Search?
From single-node development to multi-datacenter production, MLGraph grows with your needs—and your SSD budget stays reasonable.