Inference Engineering
Notes and book draft on inference architecture, serving, latency, throughput, and GPU/cloud economics.
Books
These books are works in progress and free to read. Each link opens the source Google Doc for reading and feedback.
I strongly believe that books in the AI era should be rewritten continuously, in the spirit of continual learning. These drafts are my attempt at continual-learning books: living technical references that stay open, improve over time, and invite feedback as the field changes.
Notes and book draft on inference architecture, serving, latency, throughput, and GPU/cloud economics.
Book draft on production agent systems, durable harnesses, memory, tool boundaries, evals, and governance.
Book draft on RAG, context engineering, knowledge graphs, semantic connectors, and enterprise data-to-AI patterns.
Book draft on security patterns for AI systems, agents, data access, tool use, and AI application risk.
Book draft on modern LLMs, reasoning models, multimodal systems, and inference-time compute.
Book draft on reinforcement learning concepts, environments, evaluation loops, and agent training patterns.
Book draft on AI-assisted coding, agentic development workflows, code review, and software delivery loops.