
MLGraph Daemon Design

mlgraphd is a unified daemon that manages distributed vector indices, exposed through a gRPC API, a REST API, and a CLI.

Architecture Overview

mlgraphd
├── Interface Layer: gRPC API, REST API, CLI
├── Service Layer:   Index Manager, Train Manager, Search Engine
└── Storage Layer:   Local FS, S3, Azure, GCS, RocksDB

Command Line Interface

mlgraphd [options]

Options:
  --config PATH               Configuration file (default: /etc/mlgraph/mlgraphd.conf)
  --grpc-port PORT            gRPC service port (default: 50051)
  --rest-port PORT            REST API port (default: 8080)
  --admin-port PORT           Admin/metrics port (default: 9090)
  --data-dir PATH             Data directory (default: /var/lib/mlgraph)
  --log-level LEVEL           Log level (debug|info|warn|error)
  --enable-tls                Enable TLS for all services
  --tls-cert PATH             TLS certificate path
  --tls-key PATH              TLS key path
  --cluster-mode              Enable distributed mode
  --discovery-method METHOD   Service discovery (manual|consul|etcd|k8s)
  --daemon                    Run as daemon

Service Interfaces

gRPC API

Port: 50051

  • Index management
  • Training operations
  • Search operations
  • Cluster management

REST API

Port: 8080

  • /v1/indices
  • /v1/indices/:id/train
  • /v1/indices/:id/search
  • /v1/cluster/status
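The endpoints above can be exercised with any HTTP client. A minimal Python sketch that builds request URLs against the default REST port; the payload field names `vector` and `top_k` are assumptions for illustration, not a documented schema:

```python
import json

BASE = "http://localhost:8080"  # rest_port from the example config

def endpoint(path_template, **params):
    """Expand a path template like '/v1/indices/:id/search' into a full URL."""
    path = path_template
    for key, value in params.items():
        path = path.replace(":" + key, str(value))
    return BASE + path

# Hypothetical search request body (field names are assumptions).
body = json.dumps({"vector": [0.1, 0.2, 0.3], "top_k": 10})

url = endpoint("/v1/indices/:id/search", id="my-index")
print(url)  # http://localhost:8080/v1/indices/my-index/search
```

The resulting URL and body can then be posted with any HTTP library.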

CLI Tool

  • mlgraph index create
  • mlgraph train
  • mlgraph search
  • mlgraph cluster status

Storage Backends

  • Local FS
  • AWS S3
  • Azure Blob
  • Google Cloud Storage
  • RocksDB

Configuration Example

# /etc/mlgraph/mlgraphd.conf
daemon:
  grpc_port: 50051
  rest_port: 8080
  admin_port: 9090
  data_dir: /var/lib/mlgraph
  log_level: info

cluster:
  enabled: true
  discovery_method: consul
  consul_address: localhost:8500
  replication_factor: 3

storage:
  local:
    enabled: true
    path: /var/lib/mlgraph/indices
  s3:
    enabled: true
    bucket: mlgraph-indices
    region: us-east-1

performance:
  max_concurrent_operations: 100
  index_cache_size_mb: 4096
  query_timeout_ms: 30000

monitoring:
  prometheus:
    enabled: true
    endpoint: /metrics
  tracing:
    enabled: true
    jaeger_endpoint: localhost:6831
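With the Prometheus exporter enabled on the admin port, the matching scrape job on the Prometheus side might look like the following; the job name and target host are illustrative:

```yaml
scrape_configs:
  - job_name: mlgraphd            # illustrative name
    metrics_path: /metrics        # matches monitoring.prometheus.endpoint
    static_configs:
      - targets: ["localhost:9090"]   # admin/metrics port
```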

CLI Examples

# Index Management
mlgraph index create --name my-index --dimension 128 --centroids 1000
mlgraph index list
mlgraph index info my-index
mlgraph index delete my-index

# Training
mlgraph train --index my-index --file training-data.parquet
mlgraph train --index my-index --s3 s3://bucket/training-data.csv

# Search
mlgraph search --index my-index --vector "[0.1, 0.2, 0.3, ...]" --top-k 10
mlgraph search --index my-index --file query.npy --top-k 100 --nprobe 50
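The `--vector` flag above takes an inline bracketed list. A small Python helper for producing that string from a list of floats, assuming the CLI accepts a JSON-style float list as the examples suggest:

```python
import json

def vector_arg(values):
    """Render a float list as the bracketed string passed to --vector."""
    return json.dumps([float(v) for v in values])

# Build an argv list suitable for subprocess.run (no shell quoting issues).
query = [0.1, 0.2, 0.3]
cmd = ["mlgraph", "search",
       "--index", "my-index",
       "--vector", vector_arg(query),
       "--top-k", "10"]
print(vector_arg(query))  # [0.1, 0.2, 0.3]
```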

# Cluster Management
mlgraph cluster status
mlgraph cluster rebalance --index my-index
mlgraph cluster set-replication --index my-index --factor 3
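`--factor 3` (like `replication_factor: 3` in the config) means each index is held on three nodes. One common way to pick those nodes deterministically is rendezvous (highest-random-weight) hashing; the sketch below illustrates the idea and is not mlgraphd's actual placement algorithm:

```python
import hashlib

def replica_nodes(index_name, nodes, factor=3):
    """Pick `factor` nodes for an index via rendezvous hashing:
    score every (index, node) pair and keep the highest-scoring nodes."""
    def score(node):
        digest = hashlib.sha256(f"{index_name}|{node}".encode()).hexdigest()
        return int(digest, 16)
    return sorted(nodes, key=score, reverse=True)[:factor]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(replica_nodes("my-index", nodes))
```

Because each node's score depends only on the (index, node) pair, adding or removing one node moves only the replicas that involve that node.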

Kubernetes Deployment

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mlgraphd
spec:
  serviceName: mlgraphd
  replicas: 3
  selector:
    matchLabels:
      app: mlgraphd
  template:
    metadata:
      labels:
        app: mlgraphd
    spec:
      containers:
      - name: mlgraphd
        image: mlgraph/daemon:latest
        ports:
        - containerPort: 50051
        - containerPort: 8080
        - containerPort: 9090
        env:
        - name: MLGRAPH_CLUSTER_MODE
          value: "true"
        - name: MLGRAPH_DISCOVERY_METHOD
          value: "k8s"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mlgraph
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi
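With `serviceName: mlgraphd` and three replicas, each pod gets a stable DNS name of the form `mlgraphd-<ordinal>.mlgraphd.<namespace>.svc.cluster.local`, which is what k8s-based discovery can rely on. A small Python sketch generating the peer list; the `default` namespace is an assumption:

```python
def peer_addresses(name, replicas, namespace="default", port=50051):
    """Stable StatefulSet pod DNS names, one per ordinal."""
    return [
        f"{name}-{i}.{name}.{namespace}.svc.cluster.local:{port}"
        for i in range(replicas)
    ]

for addr in peer_addresses("mlgraphd", 3):
    print(addr)
```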