How We Build AI That Actually Works: The Agency Stack
Beyond chatbots. How Shahriar Labs orchestrates multi-agent systems to solve complex engineering problems autonomously.
RAG retrieves text chunks; knowledge graphs model relationships; agent memory persists state. Combine all three correctly and you get AI agents that reason over real, current, structured context — not just surface-level similarity matching. Here's how to think about each layer and when to use which.
RAG works by embedding documents into a vector space and retrieving the top-k chunks that are most semantically similar to a query. It excels at finding relevant documentation passages, answering questions from a large corpus, and reducing hallucination by grounding responses in retrieved text.
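The retrieval step can be sketched in a few lines. This is a minimal illustration only: the three-dimensional vectors and chunk texts are made up, and a real pipeline would get its embeddings from a model and store them in a vector database rather than a Python list.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical chunks with toy embeddings (assumed for illustration).
CHUNKS = [
    ("How to reset your password", [0.9, 0.1, 0.0]),
    ("Billing and invoices", [0.1, 0.9, 0.1]),
    ("Account login troubleshooting", [0.8, 0.2, 0.1]),
]

query_vec = [1.0, 0.0, 0.0]  # stands in for an embedded user question
results = top_k(query_vec, CHUNKS, k=2)
retrieved_texts = [text for _, text in results]
```

The retrieved texts would then be placed in the prompt as grounding context. Note that ranking is purely by vector similarity, which is why the approach has no notion of relationships between chunks.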
RAG fails at: multi-hop reasoning (find the answer that requires linking three different documents), structured queries (give me all customers in Region A with ARR > $100K), and temporal reasoning (what changed between v1 and v2). For these, you need structured data or a knowledge graph, not cosine similarity.
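To make the structured-query case concrete, here is a small sketch using Python's built-in `sqlite3` module. The `customers` table, its columns, and the sample rows are all hypothetical; the point is only that an exact filter like "Region A with ARR > $100K" is a one-line query over structured data, while similarity search over text has no reliable way to apply it.

```python
import sqlite3

# In-memory database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT, arr_usd INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Acme", "A", 250000),
        ("Globex", "A", 80000),
        ("Initech", "B", 400000),
    ],
)

# Exact filter on region and a numeric threshold.
rows = conn.execute(
    "SELECT name FROM customers WHERE region = ? AND arr_usd > ? ORDER BY name",
    ("A", 100000),
).fetchall()
high_value_in_region_a = [name for (name,) in rows]
conn.close()
```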
A knowledge graph stores entities (Shahriar Labs, LetX, Shihab Shahriar Antor) and typed relationships (Shahriar Labs → founder → Shihab Shahriar Antor, Shahriar Labs → builds → LetX). Queries like "who builds products that compete with Overleaf?" require traversing these relationships — RAG can't answer them reliably from text alone.
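A toy version of that traversal, assuming the graph is held as a plain list of (subject, relation, object) triples. The first two triples come from the example above; the `competes_with` edge and the helper names are assumptions added so the two-hop query has something to traverse. A production system would more likely use a graph database and its query language.

```python
# (subject, relation, object) triples.
TRIPLES = [
    ("Shahriar Labs", "founder", "Shihab Shahriar Antor"),
    ("Shahriar Labs", "builds", "LetX"),
    ("LetX", "competes_with", "Overleaf"),  # hypothetical edge for illustration
]


def subjects(triples, relation, obj):
    """All subjects that have the given relation to obj."""
    return {s for s, r, o in triples if r == relation and o == obj}


def builders_of_competitors(triples, target):
    """Two-hop query: who builds products that compete with target?"""
    # Hop 1: products that compete with the target.
    products = subjects(triples, "competes_with", target)
    # Hop 2: organizations that build any of those products.
    builders = set()
    for product in products:
        builders |= subjects(triples, "builds", product)
    return builders


answer = builders_of_competitors(TRIPLES, "Overleaf")
```

Each hop is an exact match on a typed relationship, so the answer does not depend on any single document happening to mention both facts together.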
At Shahriar Labs, the Organization/Person/SoftwareApplication @graph in our schema.org markup is a lightweight public knowledge graph — it tells search engines and AI models exactly the entity relationships without requiring them to infer from text. Same principle, different scale.
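For readers unfamiliar with the pattern, the sketch below builds a small schema.org `@graph` as JSON-LD. It is not the actual markup from the Shahriar Labs site: the `example.com` base URL, the `@id` fragments, and the choice of the `founder`, `worksFor`, and `publisher` properties are assumptions made for illustration.

```python
import json

BASE = "https://example.com"  # placeholder, not the real site URL

ORG_ID = f"{BASE}/#organization"
PERSON_ID = f"{BASE}/#founder"
APP_ID = f"{BASE}/#letx"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORG_ID,
            "name": "Shahriar Labs",
            # Typed edge: Organization -> founder -> Person
            "founder": {"@id": PERSON_ID},
        },
        {
            "@type": "Person",
            "@id": PERSON_ID,
            "name": "Shihab Shahriar Antor",
            "worksFor": {"@id": ORG_ID},
        },
        {
            "@type": "SoftwareApplication",
            "@id": APP_ID,
            "name": "LetX",
            # Typed edge: SoftwareApplication -> publisher -> Organization
            "publisher": {"@id": ORG_ID},
        },
    ],
}

# This string is what would go inside a <script type="application/ld+json"> tag.
json_ld = json.dumps(graph, indent=2)
```

The `@id` references are what turn three separate records into a graph: each node points at another node by identifier instead of repeating its details.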
In-context memory: The current prompt window — task state, recent tool outputs, working notes. Ephemeral, highest retrieval speed, limited by context window.
Session memory: Redis or DynamoDB with TTL. User preferences, conversation history, task progress within a session. Survives across agent invocations within a workflow run.
Long-term memory: Vector DB (Qdrant, pgvector) plus a structured store. Past decisions, domain knowledge learned from interactions, user-specific knowledge bases. See the common-knowledge skill for an open-source implementation.
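The three tiers can be sketched as one small Python class. Everything here is a stand-in chosen so the example runs without external services: a list for the context window, a dict with expiry timestamps in place of Redis or DynamoDB TTLs, and a plain dict in place of the vector DB plus structured store. The class and method names are hypothetical.

```python
import time


class SessionMemory:
    """Dict-backed stand-in for a TTL store such as Redis or DynamoDB."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: behave as if it was never stored
            return default
        return value


class AgentMemory:
    """Combines in-context, session, and long-term tiers."""

    def __init__(self, session, max_context_items=3):
        self.context = []    # in-context: ephemeral working notes
        self.session = session
        self.long_term = {}  # stand-in for a vector DB plus structured store
        self.max_context_items = max_context_items

    def note(self, item):
        """Add a working note, keeping only the most recent items."""
        self.context.append(item)
        self.context = self.context[-self.max_context_items:]

    def recall(self, key):
        """Check session memory first, then fall back to long-term memory."""
        value = self.session.get(key)
        if value is not None:
            return value
        return self.long_term.get(key)
```

The lookup order in `recall` is one possible design choice: fresher session state wins while it is still alive, and long-term memory acts as the fallback once the TTL has expired.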
For production agent architecture, see our full guide on building production AI agents.
Written by Shihab Shahriar Antor — AI Engineer & Founder of Shahriar Labs. Building LetX, QuantumSketch, and open-source AI agent skills.