AI PERIODIC TABLE
The Building Blocks of Agentic AI
A reference framework organizing these building blocks by capability and maturity level.
A minimal intent spec capturing outcome, constraints, and success criteria. Task intents are compiled into prompts, plans, and tool calls that drive agent behavior. They serve as the contract between the human goal and the machine execution plan.
A compact vector representation of text, images, or other data used for similarity search, clustering, and retrieval routing. Embeddings convert semantic meaning into geometric proximity, enabling machines to find related content without keyword matching.
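A minimal sketch of how geometric proximity stands in for semantic similarity. The 3-dimensional vectors here are toy values chosen for illustration; real model embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d "embeddings" (hand-picked; real ones come from a model).
docs = {
    "cat":     [0.90, 0.10, 0.00],
    "kitten":  [0.85, 0.20, 0.05],
    "invoice": [0.00, 0.10, 0.95],
}

query = [0.88, 0.15, 0.00]  # a query about cats, as a vector
scores = {name: cosine_similarity(query, vec) for name, vec in docs.items()}
best = max(scores, key=scores.get)  # nearest neighbor wins, no keywords needed
```

Note that "cat" and "kitten" both score near 1.0 while "invoice" scores near 0, even though none of the strings were compared directly.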
A typed invocation of an external capability — API, function, shell command, or database query — with schema-validated inputs and outputs. Tool calls are how models take action in the real world, bridging language understanding with executable operations.
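A sketch of schema-validated invocation, assuming a hypothetical tool registry and a made-up `get_weather` tool. Model-produced arguments arrive as a JSON string and are type-checked before the underlying function runs.

```python
import json

# Hypothetical registry: each tool declares an input schema alongside its function.
TOOLS = {
    "get_weather": {
        "schema": {"city": str, "units": str},
        "fn": lambda city, units: {"city": city, "temp": 21, "units": units},
    }
}

def call_tool(name: str, raw_args: str) -> dict:
    """Validate model-produced arguments against the tool's schema, then invoke."""
    tool = TOOLS[name]
    args = json.loads(raw_args)
    for field, typ in tool["schema"].items():
        if not isinstance(args.get(field), typ):
            raise TypeError(f"{name}: field {field!r} must be {typ.__name__}")
    return tool["fn"](**args)

result = call_tool("get_weather", '{"city": "Oslo", "units": "celsius"}')

# Malformed arguments are rejected before any side effect happens.
try:
    call_tool("get_weather", '{"city": 42, "units": "celsius"}')
    bad_accepted = True
except TypeError:
    bad_accepted = False
```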
An atomic UI operation — click, type, drag, scroll, or hotkey — performed in a real or virtual desktop/web environment. UI actions let agents interact with software the same way humans do, enabling automation of any GUI application without dedicated APIs.
A reliable message/event envelope for handing off tasks, tool results, and agent-to-agent communications across processes. Message passing decouples producers from consumers, enabling async workflows, fan-out patterns, and resilient multi-agent coordination.
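The envelope idea can be sketched with an in-process queue; field names here are illustrative, and a production system would use a durable broker rather than `queue.Queue`.

```python
import queue
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Envelope:
    """A minimal message envelope; real schemas add routing, auth, and tracing."""
    topic: str
    payload: dict
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    ts: float = field(default_factory=time.time)

bus: queue.Queue[Envelope] = queue.Queue()

# Producer: an agent hands off a tool result without knowing who consumes it.
bus.put(Envelope(topic="tool.result", payload={"tool": "search", "hits": 3}))

# Consumer: another agent or process picks it up later, fully decoupled.
msg = bus.get()
```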
Machine-enforced policy constraints — allowed actions, data handling rules, and authorization checks — evaluated before and after agent actions. Policy rules act as guardrails that prevent agents from taking harmful or unauthorized steps, even when instructed to do so.
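A toy pre-action check showing the shape of such a guardrail; the rule names, actions, and paths are all made up for illustration.

```python
# Illustrative policy: an action allowlist plus blocked filesystem prefixes.
POLICY = {
    "allowed_actions": {"read_file", "search_web"},
    "blocked_paths": ("/etc/", "/root/"),
}

def check_action(action: str, target: str) -> tuple[bool, str]:
    """Evaluate a proposed agent action against policy BEFORE execution."""
    if action not in POLICY["allowed_actions"]:
        return False, f"action {action!r} not in allowlist"
    if any(target.startswith(p) for p in POLICY["blocked_paths"]):
        return False, f"target {target!r} touches a blocked path"
    return True, "ok"

ok, _ = check_action("read_file", "/home/user/notes.txt")
denied, reason = check_action("delete_file", "/home/user/notes.txt")
```

The key property is that the check runs regardless of what the agent was instructed to do: even a prompt-injected "delete everything" still has to pass the allowlist.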
A single timed unit of work inside a distributed trace, carrying attributes and events for debugging and performance measurement. Trace spans form a tree structure that shows exactly how a request flows through agents, tools, and models — essential for diagnosing failures in complex AI pipelines.
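A minimal span recorder, assuming nothing beyond the standard library; real tracing SDKs add context propagation across processes, but the parent/child tree forms the same way.

```python
import time
from contextlib import contextmanager

SPANS = []   # finished spans, appended as they close
_STACK = []  # current span ancestry within this thread

@contextmanager
def span(name: str, **attrs):
    """Record a timed unit of work with a parent pointer, forming a tree."""
    record = {
        "name": name,
        "parent": _STACK[-1]["name"] if _STACK else None,
        "attrs": attrs,
        "start": time.monotonic(),
    }
    _STACK.append(record)
    try:
        yield record
    finally:
        _STACK.pop()
        record["duration_s"] = time.monotonic() - record["start"]
        SPANS.append(record)

with span("handle_request"):
    with span("retrieve", index="docs"):
        pass
    with span("llm_call", model="toy"):
        pass
```

Child spans close before their parent, so reading `SPANS` back (or exporting it) reconstructs exactly how the request flowed through retrieval and the model call.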
A constrained output contract — JSON schema, function result schema, or typed response format — that makes agent responses machine-reliable. Structured outputs eliminate parsing guesswork and enable deterministic downstream processing, turning free-form LLM text into programmatic data.
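A sketch of enforcing such a contract on the consumer side, with a hypothetical sentiment-analysis output shape; anything that fails the contract is rejected rather than parsed by guesswork.

```python
import json

# Hypothetical output contract: field name -> required Python type.
EXPECTED = {"sentiment": str, "confidence": float}

def parse_structured(raw: str) -> dict:
    """Accept only responses matching the contract; fail loudly otherwise."""
    data = json.loads(raw)
    for key, typ in EXPECTED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r}: expected {typ.__name__}")
    return data

# A well-formed model response becomes programmatic data...
good = parse_structured('{"sentiment": "positive", "confidence": 0.93}')

# ...while free-form text fails instead of being silently misread.
try:
    parse_structured("The sentiment seems positive, roughly 93% sure.")
    malformed_accepted = True
except ValueError:  # json.JSONDecodeError is a ValueError subclass
    malformed_accepted = False
```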
A production index over embeddings enabling fast approximate nearest-neighbor search and hybrid retrieval. Vector indexes are the backbone of RAG systems, letting agents quickly find semantically relevant documents from millions of candidates in milliseconds.
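The interface can be sketched with an exact brute-force index; production systems replace the linear scan with approximate structures (HNSW graphs, IVF partitions) to hit millisecond latency at millions of vectors, but the add/search contract looks the same. The document IDs and 2-d vectors below are illustrative.

```python
import math

class FlatIndex:
    """Exact nearest-neighbor search over stored vectors.

    A stand-in for a real ANN index: same API, O(n) scan instead of
    an approximate graph or partition structure.
    """
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vec: list[float]) -> None:
        self.items.append((doc_id, vec))

    def search(self, query: list[float], k: int = 2) -> list[str]:
        ranked = sorted(self.items, key=lambda item: math.dist(query, item[1]))
        return [doc_id for doc_id, _ in ranked[:k]]

idx = FlatIndex()
idx.add("refund-policy", [0.1, 0.9])
idx.add("shipping-faq",  [0.2, 0.8])
idx.add("release-notes", [0.9, 0.1])

top = idx.search([0.12, 0.88], k=2)  # query vector near the policy docs
```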
The Model Context Protocol — a standard interface for exposing tools, resources, and prompts to AI models in a consistent, discoverable way. MCP provides a universal catalog of capabilities with schemas and transport, so any model can use any tool without custom integration code.
A UI layer designed for agent workflows — action previews, approval gates, state inspection, and human-in-the-loop controls. Agent UIs make autonomous systems transparent and steerable, letting operators observe, intervene, and course-correct in real time.
A structured agent-to-agent communication contract defining messages, handoffs, task offers, capabilities, and receipts. Agent protocols enable interoperability between heterogeneous agent systems, letting agents from different vendors collaborate on shared tasks.
Controls that reduce prompt injection, jailbreaks, data exfiltration, and unsafe tool execution. Prompt security encompasses input/output filters, content policies, sandboxing, and attestation layers that keep agents operating within intended boundaries.
A pipeline and export mechanism for traces and metrics — OTLP, collectors, and vendor backends. Trace export connects instrumented agent code to the observability platforms where engineers actually debug, alert, and analyze production behavior.
A runtime planner that decomposes high-level intent into executable steps, selects tools, and revises plans based on observations and failures. Planners implement the think-act-observe loop that gives agents their autonomy, deciding what to do next based on what has happened so far.
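The think-act-observe loop can be sketched as follows. Here a hand-written rule table stands in for the model that would normally choose the next step, and the tool calls are stubs; the loop structure is the point.

```python
# "Think": decide the next step from the goal and observations so far.
# In a real planner this is a model call; here it is a toy rule table.
def plan_next(goal: str, observations: list[str]) -> str:
    if not observations:
        return "search"
    if "found" in observations[-1]:
        return "summarize"
    return "done"

# "Act": stand-ins for real tool invocations.
def act(step: str) -> str:
    return {"search": "found 3 documents", "summarize": "summary written"}[step]

observations: list[str] = []
trace = []
while (step := plan_next("answer the question", observations)) != "done":
    obs = act(step)            # act
    observations.append(obs)   # observe
    trace.append((step, obs))  # each iteration revises the plan from results
```

Failure handling fits the same shape: a tool error becomes an observation, and the planner's next decision can retry, switch tools, or escalate.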
Retrieval over a knowledge graph combined with text, blending entity/relationship structure with passage grounding. GraphRAG answers questions that require multi-hop reasoning across documents, outperforming flat vector search on complex queries that span multiple facts.
A durable, resumable workflow runtime for long-running agent jobs with automatic retries, timers, idempotency, and checkpointing. Durable workflows survive process crashes and restarts, making it safe to run multi-hour agent tasks that interact with unreliable external services.
A runtime that lets agents operate a real desktop or web environment via perception and UI actions, not just API calls. Computer use enables agents to automate legacy applications, fill forms, navigate browsers, and interact with any software that lacks a programmatic interface.
The core runtime abstraction for building agents — state management, tool routing, retries, memory hooks, and multi-agent coordination. Agent frameworks provide the scaffolding so developers can focus on agent logic instead of plumbing, handling the complex lifecycle of plan-execute-observe loops.
An isolated execution environment that constrains side effects — filesystem access, network calls, credentials, and system resources — for untrusted code and tools. Sandboxes let agents execute generated code safely, preventing a single bad tool call from compromising the host system.
Runtime visibility into agent behavior — traces, logs, metrics, prompt/tool lineage, error clustering, and dashboards. Observability platforms purpose-built for AI show not just what happened, but why the model chose a particular path, how much it cost, and where quality degrades.
A production-ready task specification — runbook-style — that encodes goals, constraints, approval gates, and rollback guidance for autonomous operations. Spec handoffs bridge the gap between what a human wants and what an agent safely executes in production, including escalation paths when things go wrong.
A repeatable ingestion and transformation path that keeps knowledge fresh — ETL/ELT, change data capture, document sync, and schema evolution. Data pipelines feed agents with up-to-date information, ensuring RAG systems and knowledge bases reflect reality rather than stale snapshots.
The control and perception middleware that bridges AI models to embodied systems — robot OS, motion planning, sensor fusion, and actuator control. Robot middleware translates high-level agent commands into physical actions, handling the real-time constraints that software agents never face.
The operator interface for supervising production agents — approvals, escalations, replay, and incident response. Ops consoles give human operators a command center to monitor fleet-wide agent health, intervene when agents are stuck, and audit every decision an agent made.
A deterministic runner that executes agent workflows with checkpoints, retries, timeouts, and human gates. The harness provides production-grade reliability around non-deterministic AI components, ensuring that flaky model calls or tool failures don't silently corrupt long-running jobs.
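A sketch of one harness step combining retries with checkpointing, so a restart never repeats completed work. The step names and the flaky function are illustrative; a real harness persists `state` durably.

```python
def run_step(name: str, fn, state: dict, max_retries: int = 3):
    """Run a step with retries; skip it entirely if already checkpointed."""
    if name in state:             # idempotency: result was checkpointed earlier
        return state[name]
    last_err = None
    for _attempt in range(max_retries):
        try:
            result = fn()
            state[name] = result  # checkpoint on success
            return result
        except RuntimeError as err:
            last_err = err        # transient failure: try again
    raise RuntimeError(f"step {name!r} failed after {max_retries} tries") from last_err

# A tool that fails twice before succeeding, simulating a flaky service.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

state: dict = {}
first = run_step("fetch", flaky, state)   # succeeds on the 3rd attempt
second = run_step("fetch", flaky, state)  # served from checkpoint, no new call
```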
Identity and authorization controls for agents, tools, and data — RBAC, ABAC, OAuth scopes, and secrets handling. Access control ensures agents only reach the resources they need, following least-privilege principles so a compromised agent can't escalate beyond its scope.
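A minimal least-privilege check, with made-up agent roles and scope names; real deployments layer this on OAuth scopes or an RBAC/ABAC engine, but the shape of the decision is the same.

```python
# Illustrative grant table: agent identity -> the scopes it may use.
GRANTS = {
    "billing-agent": {"invoices:read", "invoices:write"},
    "support-agent": {"invoices:read", "tickets:write"},
}

def authorize(agent: str, scope: str) -> bool:
    """An agent reaches a resource only if its grant set contains the scope."""
    return scope in GRANTS.get(agent, set())

can_read = authorize("support-agent", "invoices:read")    # within scope
can_write = authorize("support-agent", "invoices:write")  # escalation blocked
```

An unknown or compromised identity falls through to an empty grant set, so the default is deny.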
Per-agent and per-tool cost accounting — tokens consumed, latency, GPU time, and API spend — with budgets and alerts. Cost metering prevents runaway expenses from autonomous agents, letting teams set hard spending limits and understand which workflows drive the most cost.
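A sketch of a hard token budget per agent; the per-token price here is a placeholder, not any provider's real rate.

```python
class Meter:
    """Track token spend against a hard budget; refuse work past the limit."""
    def __init__(self, budget_usd: float, usd_per_1k_tokens: float = 0.002):
        self.budget = budget_usd
        self.rate = usd_per_1k_tokens
        self.spent = 0.0

    def charge(self, tokens: int) -> None:
        cost = tokens / 1000 * self.rate
        if self.spent + cost > self.budget:
            raise RuntimeError("budget exceeded; halting agent")
        self.spent += cost

meter = Meter(budget_usd=0.01)  # one cent, for illustration
meter.charge(2000)              # $0.004
meter.charge(2000)              # $0.008 total
try:
    meter.charge(2000)          # would reach $0.012 > budget
    tripped = False
except RuntimeError:
    tripped = True              # runaway loop stopped before overspending
```

Checking *before* committing the spend means a runaway loop is halted with the budget intact, rather than discovered on the invoice.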
Test-case and rubric definition for agent evaluation — pass/fail criteria, graders, scenario specs, and golden datasets. Rubric authoring captures what 'good' looks like for non-deterministic systems, enabling automated scoring of agent outputs that can't be checked with simple assertions.
A simulated environment for training and evaluating agent behavior — games, embodied simulations, and UI task worlds. Sim worlds provide safe, reproducible arenas where agents can fail cheaply, learn from mistakes, and be benchmarked before touching production systems.
A benchmark harness that exposes interactive tasks — web browsing, desktop apps, codebases — with execution-based scoring. Gym arenas measure real agent capability by requiring agents to actually complete tasks, not just answer questions about them.
Adversarial testing of agent behavior and tool access — jailbreaks, prompt injection, privilege escalation, and misuse scenarios. Red teaming proactively finds failure modes before attackers do, stress-testing the safety boundaries that policy rules and sandboxes are supposed to enforce.
Continuous safety and quality regression tests that run on every change to prompts, tools, policies, or models. Regression suites catch silent degradation — when a model update breaks a previously working behavior or a policy change opens an unintended loophole.
Standardized evaluation suites and leaderboards used to compare agents and models on real tasks — coding, UI use, tool use, and reasoning. Benchmarks provide a shared language for measuring progress and catching regressions across the industry.
Short-lived working state used during a single run — scratchpad, active context, tool buffers, and intermediate results. Work memory is the agent's 'mental workspace' that persists only for the duration of one task execution.
Append-only event log of actions and observations enabling replay, audit, and learning-from-history. Episodic logs let agents and operators review exactly what happened during a run, and can be replayed for debugging or used as training data.
Persisted snapshot of agent state enabling resume after failure, long waits, or handoffs. Checkpoints are the foundation of durable execution — they let an agent pick up exactly where it left off after a crash or deliberate pause.
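A minimal save/resume cycle; a temp file stands in for the durable store a real system would use, and the state fields are illustrative.

```python
import json
import os
import tempfile

def save_checkpoint(path: str, state: dict) -> None:
    """Persist a snapshot of agent state."""
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path: str) -> dict:
    """Reload the last snapshot on restart."""
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "agent.ckpt")
save_checkpoint(path, {"step": 4, "pending": ["send_report"]})

# ...process crashes or is paused, then restarts here...

resumed = load_checkpoint(path)  # picks up at step 4, not from scratch
```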
Durable memory across sessions — user preferences, organizational knowledge, learned skills, and relationship history — with retrieval and consolidation. Long-term memory lets agents build persistent understanding of users and projects over weeks and months.
General-purpose large language model used for text understanding, generation, and tool reasoning. LLMs are the reasoning engines at the heart of every agent, converting instructions and context into decisions and actions.
Small language model optimized for latency, on-device deployment, or specialized domains. SLMs trade breadth for speed and cost, running on edge devices or handling focused tasks where a full LLM is overkill.
Models that natively handle text, images, audio, and video in one unified stream. Multimodal models let agents see screenshots, hear audio, and process documents with mixed content — critical for computer use, document understanding, and real-world interaction.
Models tuned for multi-step deliberation and planning with chain-of-thought reasoning, achieving improved reliability on hard math, code, and logic tasks. Reasoning models think before they answer, trading latency for accuracy on problems that require careful step-by-step analysis.
Language models using diffusion-style generation — iterative denoising — rather than autoregressive next-token decoding. Diffusion LLMs can generate all tokens in parallel, offering fundamentally different speed/quality tradeoffs and enabling novel editing and infilling workflows.
A model served or trained across many nodes and GPUs, often with decentralized or swarm compute. Distributed LLMs tackle the fundamental challenge that frontier models are too large for any single machine, requiring novel parallelism strategies.
Vision-Language-Action models that map perception and language to embodied actions for robotics. VLAs bridge the gap between AI reasoning and physical manipulation, letting robots understand natural language commands and execute them in the real world.
Scaling inference across GPUs and nodes for throughput, latency, and long-context workloads. Distributed inference engines handle the challenge of serving models to thousands of concurrent users while keeping response times low and costs manageable.
Scaling model training via data, model, and pipeline parallelism with sharded optimizers. Distributed training makes it possible to build frontier models by coordinating thousands of GPUs, with sophisticated strategies to minimize communication overhead.