What is Shahriar Labs' AI product development process?

Shahriar Labs uses a docs-and-tests-first process: (1) spec before code, (2) infrastructure-as-code from day one, (3) tests written with implementation, (4) staged rollout (internal → beta → production), (5) monitoring and rollback plan before go-live. No code gets merged without a test. No feature launches without a rollback path.

How long does it take to build an AI agent product?

A focused single-domain AI agent (customer support, document generation, lead qualification) typically takes 4–8 weeks from spec to production: 1 week spec, 1 week architecture, 3–4 weeks implementation with tests, 1 week staging and hardening. Multi-agent systems with complex orchestration take 3–4 months.

What does 'docs-and-tests-first' mean in practice?

Every feature starts with: a spec document (what does this do, what are the inputs/outputs, what are the edge cases), then tests written against the spec (not the implementation), then the implementation. This order prevents the common failure mode where tests are written to match broken implementations instead of catching them.

How does Shahriar Labs handle AI model updates in production?

Model updates are treated like dependency upgrades: evaluated in staging against a regression test suite of real production inputs, deployed to 5% of traffic first (canary), monitored for 48 hours, then rolled out fully. If a regression is detected, rollback is one config change.
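As a rough illustration of the canary step, the sketch below routes a fixed share of requests to a candidate model and rolls back by changing a single config value. It is a minimal sketch under stated assumptions, not our production router: `ModelRouting`, `pick_model`, and the model names are hypothetical, and hashing the request ID is just one way to get a stable split.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRouting:
    """Hypothetical routing config; field names are illustrative only."""
    stable_model: str
    candidate_model: str
    canary_percent: int  # 0 sends all traffic to the stable model (rollback)


def pick_model(request_id: str, cfg: ModelRouting) -> str:
    """Deterministically route a fixed share of requests to the candidate."""
    # Hash the request ID into one of 100 buckets so the same ID always
    # lands on the same model for a given config.
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < cfg.canary_percent:
        return cfg.candidate_model
    return cfg.stable_model


# Assumed model names, for illustration only.
canary = ModelRouting("model-v1", "model-v2", canary_percent=5)
rolled_back = ModelRouting("model-v1", "model-v2", canary_percent=0)

ids = [f"req-{i}" for i in range(10_000)]
share = sum(pick_model(r, canary) == "model-v2" for r in ids) / len(ids)
print(f"candidate share under canary config: {share:.1%}")
```

With this shape, the rollback is the switch from `canary` to `rolled_back`: no code path changes, only `canary_percent`.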

Prototype to Production: Our AI Product Process

Shahriar Labs takes AI products from prototype to production with docs-and-tests-first development, infrastructure-as-code from day one, and staged rollout with explicit rollback plans. Here's the process Shihab Shahriar Antor runs for every product build.

Phase 1: Spec Before Code (Week 1)

Every build starts with a spec document — not a one-liner, a real spec: problem definition, user stories, input/output contracts, edge cases, non-goals. The spec must be reviewable by someone who won't implement it. If you can't explain what the system does to a non-implementer, you don't understand it well enough to build it.

For AI systems specifically, the spec includes: example inputs and expected outputs (5 minimum), failure mode classification (what should the system do when the model hallucinates vs when input is malformed), and latency/cost targets. These become the regression test suite.
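One way to keep these spec items checkable is to write them down as data. The following is a minimal sketch, assuming a hypothetical `AgentSpec` structure; the field names, the policy strings, and the example agent are illustrative and not taken from a real Shahriar Labs spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class FailureMode(Enum):
    MALFORMED_INPUT = "malformed_input"
    MODEL_HALLUCINATION = "model_hallucination"


@dataclass(frozen=True)
class SpecExample:
    input_text: str
    expected_fields: Dict[str, type]  # output field name -> expected type


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical machine-readable spec; all names are assumptions."""
    name: str
    examples: List[SpecExample]
    failure_policy: Dict[FailureMode, str]
    max_latency_ms: int
    max_cost_usd: float

    def problems(self) -> List[str]:
        """List what is still missing before the spec is reviewable."""
        found = []
        if len(self.examples) < 5:
            found.append(f"need at least 5 examples, found {len(self.examples)}")
        for mode in FailureMode:
            if mode not in self.failure_policy:
                found.append(f"no policy for {mode.value}")
        if self.max_latency_ms <= 0 or self.max_cost_usd <= 0:
            found.append("latency and cost targets must be positive")
        return found


# An incomplete draft: too few examples, no hallucination policy.
draft = AgentSpec(
    name="support-triage",
    examples=[
        SpecExample("I was charged twice", {"category": str, "reply": str}),
        SpecExample("Cancel my plan", {"category": str, "reply": str}),
    ],
    failure_policy={FailureMode.MALFORMED_INPUT: "reject_with_message"},
    max_latency_ms=3000,
    max_cost_usd=0.02,
)

for problem in draft.problems():
    print(problem)
```

Because the examples live in a structure rather than in prose, the same list can later be fed directly into the end-to-end tests.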

Phase 2: Architecture + IaC (Week 1–2)

The architecture document defines the data flow diagram, component boundaries, external dependencies, and scaling model. Infrastructure-as-code (Terraform) is provisioned in staging before the first line of application code is written. This forces the team to make infrastructure decisions explicitly instead of discovering them at deployment time.

For AI products specifically: model selection, context window budget, tool use design, and memory tier design are all architecture decisions — not implementation details to figure out later.

Phase 3: Implementation + Tests (Weeks 2–5)

Tests are written with the implementation, not after: unit tests for pure functions, integration tests for tool calls and API boundaries, and end-to-end tests using the spec's example inputs. Every PR requires passing tests, with no exceptions for "moving fast." Two hours of discipline now can save 20 hours of debugging later.

For AI-specific testing: run the spec's example inputs through the system and assert on output structure (not exact text), assert that error handling fires correctly on bad inputs, and include at least 3 adversarial inputs that should be rejected gracefully.
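The checks above can be sketched as plain test functions of the kind a runner such as pytest would collect. This is an illustrative sketch only: `run_agent` is a hypothetical stand-in for the real agent call, and the JSON shape, field names, and rejection rules are assumptions made for the example.

```python
import json


def run_agent(user_input: str) -> str:
    """Stand-in for the real agent (hypothetical); returns a JSON string."""
    if not user_input.strip() or len(user_input) > 2000:
        return json.dumps({"status": "rejected", "reason": "invalid_input"})
    return json.dumps(
        {"status": "ok", "category": "billing", "reply": "Looking into it."}
    )


# Assumed output contract: field name -> required type.
REQUIRED_FIELDS = {"status": str, "category": str, "reply": str}


def check_structure(raw: str) -> list:
    """Return structural problems; an empty list means the shape is valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    return problems


def test_spec_examples_have_valid_structure():
    # Assert on shape and types, never on the exact reply text.
    for text in ["I was charged twice", "Where is my invoice?"]:
        assert check_structure(run_agent(text)) == []


def test_adversarial_inputs_are_rejected_gracefully():
    # Three adversarial inputs: empty, whitespace only, oversized.
    for text in ["", "   ", "x" * 5000]:
        data = json.loads(run_agent(text))
        assert data["status"] == "rejected"
```

Asserting on structure keeps the tests stable when the model rephrases a reply, while still failing loudly if a field disappears or changes type.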

Phase 4: Staged Rollout

Internal (team) → beta (invited users) → 10% canary → 100%. Each stage has explicit pass criteria and a rollback procedure written before the stage starts. AI systems get additional stages: shadow mode (the system runs in parallel with a human, and its outputs are compared but not shown to users), then semi-autonomous (system outputs are shown, but a human approves them), then fully autonomous.
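To make "explicit pass criteria" concrete, the sketch below encodes each stage's gate as data and decides whether to advance, hold, or roll back. It is a minimal sketch under assumptions: the stage names, the thresholds, and the choice to roll back exactly one stage are all hypothetical, and real criteria would come from the spec's targets.

```python
from dataclasses import dataclass

# Stage order from the rollout above; identifiers are illustrative.
STAGES = ["internal", "beta", "canary_10", "full"]


@dataclass(frozen=True)
class StageGate:
    max_error_rate: float  # pass criterion (assumed metric)
    min_requests: int      # minimum traffic before the gate is evaluated


# Hypothetical thresholds, written down before each stage starts.
GATES = {
    "internal": StageGate(max_error_rate=0.05, min_requests=50),
    "beta": StageGate(max_error_rate=0.02, min_requests=500),
    "canary_10": StageGate(max_error_rate=0.01, min_requests=5000),
    "full": StageGate(max_error_rate=0.01, min_requests=5000),
}


def next_stage(current: str, error_rate: float, requests: int) -> str:
    """Advance on pass, hold on too little data, step back on failure."""
    idx = STAGES.index(current)
    gate = GATES[current]
    if requests < gate.min_requests:
        return current  # not enough evidence yet: hold
    if error_rate > gate.max_error_rate:
        return STAGES[max(idx - 1, 0)]  # assumed rollback: one stage back
    return STAGES[min(idx + 1, len(STAGES) - 1)]


print(next_stage("beta", error_rate=0.01, requests=600))
print(next_stage("beta", error_rate=0.10, requests=600))
```

Keeping the gates in one table means the pass criteria can be reviewed alongside the rollback procedure before any stage begins.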

For services, see "hire Shahriar Labs." For AI agent architecture, see "production AI agent architecture."

Frequently Asked Questions

What is Shahriar Labs' development process? Docs-and-tests-first: spec → architecture + IaC → implementation with tests → staged rollout with rollback plan.
How long does it take to build an AI agent product? Single-domain agent: 4–8 weeks. Multi-agent system: 3–4 months.
What does docs-and-tests-first mean? Spec → tests against spec → implementation. Tests are never written to match a broken implementation.
How are model updates handled in production? Staging regression suite → 5% canary → 48-hour monitoring → full rollout. One config change to roll back.

Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. Building LetX, QuantumSketch, and open-source AI agent skills.
