Engineering Blog

Technical deep dives from the WEB WISE HOUSE team. No marketing fluff—just honest takes on training infrastructure, vector databases, code search, and what it actually takes to build AI for C++ engineers.

The Latent Bridge: How Our 8 SLMs Talk Without Words
Architecture
Multi-Agent
Research

Why we ditched token-based inter-model communication for direct semantic vector channels. Inspired by recent research on cross-model latent bridges, our 8 specialist SLMs now share meaning at latent speed.

David Gornshtein · 12 min read
Read Article
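The mechanism this teaser hints at, handing a hidden-state vector from one specialist straight to another instead of decoding to tokens and re-embedding, can be sketched in a few lines. The dimensions and the single linear map below are hypothetical illustrations of the general idea, not the blog's actual bridge:

```python
import numpy as np

# Toy latent-bridge sketch: a learned linear map carries model A's final
# hidden state directly into model B's embedding space, skipping the
# decode-to-tokens / re-embed round trip entirely.
# All shapes here are made up for illustration.
rng = np.random.default_rng(0)
d_a, d_b = 512, 768                                      # hidden sizes of two specialists
bridge = rng.standard_normal((d_a, d_b)) / np.sqrt(d_a)  # would be learned in practice

h_a = rng.standard_normal(d_a)   # model A's final hidden state for some query
h_b_in = h_a @ bridge            # what model B consumes; no tokens were produced
```

In a real system the bridge would be trained jointly so model B can interpret model A's states; the sketch only shows why the channel is fast, since nothing is serialized to text in between.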
Nemotron Nano 3: The Holy Trinity of Efficiency (Mamba + MoE + GQA)
Architecture
Hybrid Models
NVIDIA

How NVIDIA combined Mamba-2 state spaces, sparse MoE, and GQA attention to create a 31.6B model that activates only 3.2B parameters. The architecture that inspired our SLM ensemble.

David Gornshtein · 15 min read
Read Article
Implementing Mamba 3 in Production: A Practitioner's Guide
Implementation
Mamba 3
Production

Everything we learned deploying Mamba 3 state-space models for C++ code generation: hardware requirements, Tensor Engine integration, memory efficiency, long-context handling, and real performance benchmarks.

David Gornshtein · 18 min read
Read Article
Mamba 3: The State Space Revolution (And Why We Use It)
State Space Models
Mamba 3
Architecture

A first-principles journey through Mamba 3's innovations: trapezoidal discretization, complex dynamics via RoPE, and MIMO formulation. From mathematical intuition to real-world performance.

David Gornshtein · 16 min read
Read Article
The 4-Bit Miracle: How NVFP4 Squeezes 16-Bit Intelligence into 4-Bit Memory
Quantization
NVFP4
Blackwell

NVFP4 achieves the impossible: 3.5x memory reduction with less than 1% accuracy loss. Learn how dual-level scaling enables running 8 specialist SLMs on a single GPU.

David Gornshtein · 12 min read
Read Article
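As a rough picture of what "dual-level scaling" buys, here is a toy round trip through a 4-bit E2M1 grid with one scale per block. This shows only the block-level scale; the real NVFP4 format additionally stores those block scales in FP8 under a per-tensor FP32 scale, so treat the details as an illustrative approximation rather than NVIDIA's implementation:

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 value (sign handled separately).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_fp4_roundtrip(x, block=16):
    """Quantize x to the E2M1 grid with one scale per block, then dequantize."""
    blocks = x.reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / E2M1[-1]  # map block max onto 6.0
    scales[scales == 0] = 1.0                                      # guard all-zero blocks
    normed = np.abs(blocks) / scales                               # magnitudes now in [0, 6]
    codes = np.abs(normed[..., None] - E2M1).argmin(axis=-1)       # nearest grid point
    deq = np.sign(blocks) * E2M1[codes] * scales                   # reconstruct
    return deq.reshape(x.shape)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
w_q = fake_fp4_roundtrip(w)
rel_err = np.abs(w - w_q).mean() / np.abs(w).mean()
```

Even this stripped-down version typically lands around ten percent mean relative error on Gaussian weights, which is why the fine-grained scales, not the 4-bit codes alone, do most of the work.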
Training at 4-Bit: The Research That Broke the Rules
Training
NVFP4
Mixed Precision

NVIDIA demonstrated the impossible: training a 12B model on 10 trillion tokens at 4-bit precision. How mixed-precision strategies and the Muon optimizer enable FP4 training for our SLM ensemble.

David Gornshtein · 15 min read
Read Article
Building Production SLMs with NVFP4: A Practical Guide
Production
NVFP4
Deployment

Hands-on guide to quantizing and deploying specialized language models with NVFP4. PTQ workflows, QAT strategies, and how we fit 8 models (56B params) in 28GB.

David Gornshtein · 18 min read
Read Article
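The 28GB figure in this teaser is straightforward weight arithmetic, checked below. The per-block scale line is an assumption based on NVFP4's published 16-element block size and adds a few GB that the headline number leaves out:

```python
# Back-of-envelope check of the "8 models (56B params) in 28GB" figure.
total_params = 56e9                       # 8 specialists totalling 56B parameters
weight_gb = total_params * 4 / 8 / 1e9    # 4 bits per weight = 0.5 bytes -> 28.0 GB

# Assumed overhead: one 8-bit scale per 16-element block (NVFP4-style),
# which the headline figure does not include.
scale_gb = total_params / 16 * 1 / 1e9    # -> 3.5 GB

print(f"weights: {weight_gb:.1f} GB, with block scales: {weight_gb + scale_gb:.1f} GB")
# prints: weights: 28.0 GB, with block scales: 31.5 GB
```

So the 28GB quote covers the packed weights themselves; scale metadata, activations, and KV/state caches still need headroom on top.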
The Magnificent Eight: Why We Built 8 Tiny Models Instead of 1 Big One
SLM Ensemble
Architecture
MoE

Specialist vs generalist models: why 8 models with 4B-8B parameters each (0.8B-1.6B active) outperform single 70B+ models for C++ engineering. MoE architecture deep dive.

David Gornshtein · 12 min read
Read Article
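The 0.8B-active-out-of-4B ratio in this teaser comes from MoE routing: per token, a router picks a few experts out of many, so most parameters sit idle. A minimal top-k routing sketch, where the expert counts and shapes are illustrative rather than the ensemble's real configuration:

```python
import numpy as np

# Minimal top-k MoE routing: each token is sent to k of E experts, so only
# ~k/E of expert parameters are active per forward pass. All sizes are toy.
rng = np.random.default_rng(0)
E, k, d = 8, 2, 16                                   # experts, experts per token, hidden dim
router_w = rng.standard_normal((d, E))
experts = [rng.standard_normal((d, d)) for _ in range(E)]

def moe_layer(x):
    logits = x @ router_w                            # (tokens, E) routing scores
    top = np.argsort(logits, axis=-1)[:, -k:]        # indices of the k best-scoring experts
    gate = np.take_along_axis(logits, top, axis=-1)
    gate = np.exp(gate) / np.exp(gate).sum(-1, keepdims=True)  # softmax over the winners
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                      # dispatch each token to its k experts
        for j in range(k):
            out[t] += gate[t, j] * (x[t] @ experts[top[t, j]])
    return out

tokens = rng.standard_normal((4, d))
y = moe_layer(tokens)   # only 2 of the 8 expert matrices are touched per token
```

With k=2 of E=8 here, 25% of expert weights are live per token; the teaser's 0.8B-of-4B figure corresponds to a ~20% activation ratio.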
Mamba Meets Transformers: Our Hybrid Architecture That Shouldn't Work But Does
Architecture
Mamba 3
Transformers

Combining Mamba 3 state-space models with Transformer Engine layers and regular attention. A technical deep dive into a hybrid architecture that defies conventional wisdom.

David Gornshtein · 14 min read
Read Article
The Honest Truth About Training AI on GB10: Our Grace Blackwell Journey
Training Infrastructure
GB10 / DGX Spark
Grace Blackwell

Why we build on NVIDIA GB10 (DGX Spark) clusters: 128GB of unified LPDDR5X memory, 1 PFLOP of FP4 performance, and the economics of desktop AI supercomputers vs. the cloud.

David Gornshtein · 12 min read
Read Article
From Academic Prototype to Production Beast: Our FAISS Journey
Technical Deep Dive
FAISS Extended

Facebook built FAISS for billion-vector experiments. We needed it to not lose customer data. Slight difference in priorities.

David Gornshtein · 12 min read
Read Article
A Million Steps Without Falling: What We Learned About AI Agent Orchestration
AI Agents
Orchestration

An agent with a 0.1% error rate sounds great until step 1,000, when you realize you're debugging garbage.

David Gornshtein · 14 min read
Read Article
AST + BM25 + Vectors: The Unholy Trinity of Code Search That Actually Works
Code Search
RAG

Why pure vector search fails for code, and how combining AST parsing, BM25, and multi-vector embeddings creates a code search system that doesn't hallucinate.

David Gornshtein · 15 min read
Read Article
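One standard way to merge the BM25 and vector rankings this title names is reciprocal rank fusion. Whether the article uses RRF specifically (or a learned combiner) is not stated here, so the sketch below, with made-up file names, shows only the generic technique:

```python
# Sketch of reciprocal rank fusion (RRF), a common way to merge ranked
# lists from lexical (BM25) and vector retrievers. The doc IDs are made up.

def rrf(rankings, k=60):
    """Combine ranked lists of doc IDs; higher fused score = better."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            # Documents ranked high by several retrievers accumulate score.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["vec.cc", "parser.cc", "ast.cc"]    # lexical ranking
vector_hits = ["ast.cc", "vec.cc", "index.cc"]   # embedding ranking
fused = rrf([bm25_hits, vector_hits])            # "vec.cc" wins: high in both lists
```

RRF needs no score normalization across retrievers, which is why it is a common default for hybrid search; AST-derived matches could enter as a third ranked list.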
When Your Vector Database Lies to You: The Uncomfortable Truth About RAG
Research
Embeddings

A deep dive into the theoretical limitations of embedding-based retrieval and why combining AST parsing, BM25, and multi-vector approaches is essential for robust code search.

David Gornshtein · 15 min read
Read Article
Why We Train Small Models That Actually Understand C++
SLM Ensemble
C++

GPT-4 can write hello world in any language. Our models know why your template metaprogramming is broken. Learn how 4B-8B specialist models outperform 70B+ generalists.

David Gornshtein · 15 min read
Read Article
The Algorithm Whisperer: Our Newest SLM Specialist
SLM Ensemble
Algorithms
Pseudocode

Meet Algorithm SLM (Algo-7B): our 8th specialist model trained on algorithm design, pseudocode interpretation, and complexity analysis. Why pseudocode understanding matters for C++ development.

David Gornshtein · 12 min read
Read Article

Want to learn more about our work?

Check out our product documentation or get in touch to discuss how we can help with your C++ AI needs.