Test Data Generation API
Generate synthetic vectors, training data, and ground-truth queries via API for automated testing and benchmarking.

[Figure: test data generation workflow with distribution options]
API Endpoints
POST /api/test-data/vectors
Generate random vectors with configurable distribution.
{
  "count": 100000,
  "dimensions": 128,
  "distribution": "gaussian",
  "normalize": true,
  "seed": 42
}

POST /api/test-data/clustered
Generate vectors clustered around centroids.
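As a rough picture of what these parameters control, the sketch below draws each vector as a centroid plus Gaussian noise. It is an illustration only, not the service's generator: treating `clusterSpread` as the per-dimension standard deviation of the noise is an assumption, `clusterSizeVariance` is left out because its exact meaning is not specified here, and `makeClustered`, `standardNormal`, and the injectable `rng` are hypothetical names.

```javascript
// Standard normal sample via the Box-Muller transform.
// rng must return numbers in [0, 1), like Math.random.
function standardNormal(rng) {
  const u1 = 1 - rng(); // in (0, 1], so Math.log never receives 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Assumption: clusterSpread is the standard deviation of the noise
// added to a centroid in every dimension.
function makeClustered({ count, dimensions, numClusters, clusterSpread }, rng = Math.random) {
  const centroids = Array.from({ length: numClusters }, () =>
    Array.from({ length: dimensions }, () => standardNormal(rng)));
  const vectors = [];
  const labels = [];
  for (let i = 0; i < count; i++) {
    const c = Math.floor(rng() * numClusters); // pick a cluster uniformly
    labels.push(c);
    vectors.push(centroids[c].map((x) => x + clusterSpread * standardNormal(rng)));
  }
  return { centroids, vectors, labels };
}

const sample = makeClustered({ count: 1000, dimensions: 8, numClusters: 4, clusterSpread: 0.1 });
console.log(sample.vectors.length); // 1000
```

A smaller `clusterSpread` produces tighter clusters under this reading. The request body for the endpoint itself looks like this: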
{
  "count": 100000,
  "dimensions": 128,
  "numClusters": 64,
  "clusterSpread": 0.1,
  "clusterSizeVariance": 0.3
}

POST /api/test-data/ground-truth
Generate queries with known nearest neighbors for evaluation.
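Ground truth here means the exact nearest neighbors of each query, which an approximate index can then be scored against. How the service computes it is not stated; the sketch below shows the usual brute-force approach, assuming squared Euclidean distance. `squaredDistance` and `exactNeighbors` are hypothetical helper names.

```javascript
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Returns the indices of the k vectors closest to the query, nearest first.
// Ties are broken by the lower index so the result is deterministic.
function exactNeighbors(vectors, query, k) {
  return vectors
    .map((v, index) => ({ index, dist: squaredDistance(v, query) }))
    .sort((x, y) => x.dist - y.dist || x.index - y.index)
    .slice(0, k)
    .map((entry) => entry.index);
}

const sampleVectors = [[0, 0], [1, 0], [0, 2], [5, 5]];
console.log(exactNeighbors(sampleVectors, [0.9, 0.1], 2)); // indices 1 and 0
```

Brute force costs one distance computation per (query, vector) pair, which is why ground truth is typically generated once and reused. The request body for the endpoint looks like this: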
{
  "indexName": "benchmark-index",
  "queryCount": 1000,
  "k": 100,
  "sampleFrom": "index"
}
The "sampleFrom" field accepts "index" or "new".

POST /api/test-data/benchmark-suite
Generate a complete benchmark suite with vectors, queries, and ground truth.
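A suite with several `k` values is usually scored with recall@k: the fraction of the true top-k neighbors that a search returns within its own top k. The sketch below assumes both the search results and the ground truth are arrays of neighbor IDs ordered nearest first; `recallAtK` and `meanRecall` are hypothetical helpers, not part of the API.

```javascript
// Recall@k for one query: overlap between the top-k retrieved IDs
// and the top-k true IDs, divided by the number of true IDs.
function recallAtK(retrieved, truth, k) {
  const expected = new Set(truth.slice(0, k));
  if (expected.size === 0) return 0;
  let hits = 0;
  for (const id of retrieved.slice(0, k)) {
    if (expected.has(id)) hits++;
  }
  return hits / expected.size;
}

// Mean recall over all queries for each cutoff, e.g. ks = [1, 10, 100].
function meanRecall(allRetrieved, allTruth, ks) {
  const report = {};
  for (const k of ks) {
    let total = 0;
    allRetrieved.forEach((retrieved, q) => {
      total += recallAtK(retrieved, allTruth[q], k);
    });
    report[k] = allRetrieved.length ? total / allRetrieved.length : 0;
  }
  return report;
}
```

Calling `meanRecall(results, groundTruth, [1, 10, 100])` would yield one averaged score per cutoff. The request body for generating such a suite looks like this: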
{
  "name": "recall-benchmark",
  "vectorCount": 1000000,
  "queryCount": 10000,
  "dimensions": 256,
  "distribution": "clustered",
  "k": [1, 10, 100]
}

Full Example
// Generate test vectors
const vectorsResponse = await fetch('/api/test-data/vectors', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    count: 100000,
    dimensions: 128,
    distribution: 'clustered',
    numClusters: 256,
    clusterSpread: 0.05,
    normalize: true,
    seed: 12345,
    format: 'stream' // Stream the response to avoid memory issues
  })
});

// For large datasets the response body is a ReadableStream
const reader = vectorsResponse.body.getReader();

// Alternatively, request a download URL for a generated file
const { downloadUrl } = await fetch('/api/test-data/vectors', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    count: 1000000,
    dimensions: 256,
    format: 'parquet',
    output: 'url'
  })
}).then(r => r.json());

// Download the file from the returned URL
const file = await fetch(downloadUrl);

Distribution Parameters
| Distribution | Parameters | Use Case |
|---|---|---|
| uniform | min, max | Baseline testing |
| gaussian | mean, stddev | Embedding simulation |
| clustered | numClusters, clusterSpread | IVF benchmarking |
| zipf | alpha | Realistic access patterns |
| adversarial | pattern | Edge case testing |
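To make the first two rows concrete, here is a minimal local sketch of seeded `uniform` and `gaussian` sampling with optional normalization. It is not the service's implementation, which is not documented here; the mulberry32-style generator, the Box-Muller transform, and the names `seededRng`, `gaussianSample`, and `sampleVector` are assumptions chosen for illustration.

```javascript
// Small seeded PRNG (mulberry32 style): the same seed yields the same sequence.
function seededRng(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// Gaussian sample via the Box-Muller transform.
function gaussianSample(rng, mean, stddev) {
  const u1 = 1 - rng(); // in (0, 1], so Math.log never receives 0
  const u2 = rng();
  return mean + stddev * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// spec mirrors the table: { distribution, min, max } or { distribution, mean, stddev }.
function sampleVector(rng, dimensions, spec) {
  const v = new Float64Array(dimensions);
  let sumSquares = 0;
  for (let i = 0; i < dimensions; i++) {
    v[i] = spec.distribution === 'uniform'
      ? spec.min + (spec.max - spec.min) * rng()
      : gaussianSample(rng, spec.mean, spec.stddev);
    sumSquares += v[i] * v[i];
  }
  if (spec.normalize && sumSquares > 0) {
    const norm = Math.sqrt(sumSquares);
    for (let i = 0; i < dimensions; i++) v[i] /= norm; // scale to unit length
  }
  return v;
}
```

With a fixed seed, for example `sampleVector(seededRng(42), 128, { distribution: 'gaussian', mean: 0, stddev: 1, normalize: true })`, repeated runs of this sketch produce the same vector, which is the property a `seed` parameter is normally there to provide.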
Streaming Ingest
For large datasets, generate and ingest directly without intermediate storage:
// Generate and stream directly to index
POST /api/test-data/generate-and-ingest
{
  "indexName": "benchmark-vectors",
  "count": 10000000,
  "dimensions": 128,
  "distribution": "clustered",
  "batchSize": 50000,
  "progressWebhook": "https://example.com/progress"
}

// Response
{
  "jobId": "gen-123",
  "status": "running",
  "progress": {
    "generated": 0,
    "ingested": 0,
    "total": 10000000
  }
}

// Poll for status or use webhook
GET /api/jobs/gen-123
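
Polling can be wrapped in a small helper. The sketch below assumes the job resource has the same shape as the response above. The terminal status names are not documented here, so the loop simply stops on any status other than `running`. `pollJob`, `progressPercent`, `intervalMs`, and the injectable `fetchImpl` are illustrative names, not part of the API.

```javascript
// Percentage of vectors ingested so far, based on the progress object.
function progressPercent(job) {
  const { ingested, total } = job.progress;
  return total > 0 ? Math.round((ingested / total) * 100) : 0;
}

// Polls the job endpoint until the status is no longer "running".
async function pollJob(jobId, { intervalMs = 5000, fetchImpl = fetch } = {}) {
  for (;;) {
    const response = await fetchImpl(`/api/jobs/${jobId}`);
    if (!response.ok) {
      throw new Error(`Status check failed: HTTP ${response.status}`);
    }
    const job = await response.json();
    console.log(`${job.status}: ${progressPercent(job)}% ingested`);
    if (job.status !== 'running') return job; // assumed to be a terminal state
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller would use it as `const job = await pollJob('gen-123')`. For very long jobs the `progressWebhook` avoids holding a polling loop open.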