Inside the Capstone: Building a Production AI System

Every graduate of the AI Engineering Bootcamp has completed a capstone: an industry-scale production AI system, built by a team over the final twelve weeks of the program and defended before a review panel of faculty and industry engineers. It is a graduation requirement, and it is graded against a published rubric.

The capstone exists because the hardest problems in AI engineering do not appear in exercises. They appear when a retrieval pipeline meets messy documents, when an agent loops on a malformed tool call at 2 a.m., and when a token bill arrives at the end of the month.

What a capstone must include

The requirements mirror what a professional team would be accountable for in production:

A deployed, working system on AWS (ECS or Lambda) serving real traffic, with authentication, rate limiting, and semantic caching where appropriate.
A retrieval or agentic core: most teams build RAG pipelines over custom knowledge bases with Qdrant, or tool-using autonomous agents orchestrated with LangGraph.
Observability and evaluation: LLM tracing with LangSmith, automated evaluation suites, and token and cost monitoring with documented operating costs.
Security and guardrails: input validation, output filtering, and documented failure modes, including how the system handles infinite agent loops.
CI/CD: every capstone ships through an automated pipeline with tests and evaluation gates.
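
How a system bounds a runaway agent is one of the failure modes named above. A minimal sketch of one common approach, a step budget combined with a repeated-action check, might look like the following. The names `run_agent`, `StepResult`, and `AgentLoopError`, and the default limits, are hypothetical illustrations and not part of the capstone requirements or of any framework; orchestration frameworks generally provide their own step limits, which a real system would rely on as well.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class AgentLoopError(RuntimeError):
    """Raised when the agent exhausts its step budget or keeps repeating itself."""


@dataclass
class StepResult:
    action: str                   # e.g. tool name plus serialized arguments (hypothetical shape)
    done: bool                    # True once the agent has produced a final answer
    output: Optional[str] = None  # final answer text when done is True


def run_agent(
    step: Callable[[int], StepResult],
    max_steps: int = 8,     # assumed budget; tune per system
    max_repeats: int = 2,   # identical consecutive actions tolerated after the first
) -> str:
    """Drive an agent step function, stopping on loops or budget exhaustion."""
    last_action: Optional[str] = None
    repeats = 0
    for i in range(max_steps):
        result = step(i)
        if result.done:
            return result.output or ""
        if result.action == last_action:
            # Same action as the previous step: likely a malformed or stuck tool call.
            repeats += 1
            if repeats >= max_repeats:
                raise AgentLoopError(
                    f"action repeated {repeats + 1} times: {result.action}"
                )
        else:
            last_action, repeats = result.action, 0
    raise AgentLoopError(f"no final answer after {max_steps} steps")
```

Raising a dedicated exception, rather than returning a partial answer, lets the caller decide on the documented fallback (for example, a canned response and an alert) and makes the failure visible in traces.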

The three checkpoints

Architecture review, week 4. Teams submit a formal design document and defend their architecture before faculty: data flow, model selection, retrieval strategy, cost model, and failure handling. Teams revise and resubmit until the design passes.

Mid-project demonstration, week 8. A working end-to-end slice must be deployed and traced. This checkpoint exists to force integration early; teams that integrate late fail in predictable ways.

Final defense, week 12. A recorded technical presentation and live demonstration, followed by questioning from the panel. Panelists probe the decisions: Why hybrid search instead of pure vector similarity? Why HyDE or re-ranking here? What happens when the model provider degrades? What does a request cost?
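
The cost question reduces to simple arithmetic once token counts are available from tracing. The sketch below is illustrative only: the prices, token counts, and cache hit rate are made-up placeholders rather than any provider's real pricing, the names `Pricing`, `request_cost`, and `blended_cost` are hypothetical, and it assumes a semantic-cache hit costs nothing (ignoring the embedding lookup).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Price in dollars per one million tokens (placeholder values, not real pricing)."""
    input_per_million: float
    output_per_million: float


def request_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Dollar cost of a single model call, given traced token counts."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


def blended_cost(full_cost: float, cache_hit_rate: float) -> float:
    """Average cost per request when a fraction of requests is served from cache.

    Assumes a cache hit is free, which understates the true cost slightly.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return full_cost * (1.0 - cache_hit_rate)


# Hypothetical numbers: 4,000 prompt tokens (query plus retrieved context),
# 500 completion tokens, and a 30% semantic-cache hit rate.
pricing = Pricing(input_per_million=3.00, output_per_million=15.00)
per_call = request_cost(4_000, 500, pricing)
print(f"uncached request: ${per_call:.4f}")
print(f"average with cache: ${blended_cost(per_call, 0.30):.4f}")
```

With these placeholder figures an uncached request comes to about $0.0195, most of it driven by the retrieved context in the prompt, which is why retrieval strategy and cost model are reviewed together.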

Why teams, and why it is graded

Capstones are built in teams of three to four, with a sprint structure, pull-request review, and rotating responsibilities. Production systems are built by teams, and technical collaboration is an assessed learning outcome of the program.

The final grade appears on the graduate's official transcript, and the capstone record (design document, architecture review outcome, and defense result) is part of the credential we can verify for employers. When a graduate says "I built and operated a production AI system," that claim is documented.

The full capstone requirements are published in Section 08 of the Program Details.