AI PERIODIC TABLE
The Building Blocks of Agentic AI
A reference framework organizing these building blocks by capability and maturity level.
A minimal intent spec capturing outcome, constraints, and success criteria. Task intents are compiled into prompts, plans, and tool calls that drive agent behavior. They serve as the contract between the human goal and the machine execution plan.
A compact vector representation of text, images, or other data used for similarity search, clustering, and retrieval routing. Embeddings convert semantic meaning into geometric proximity, enabling machines to find related content without keyword matching.
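A minimal sketch of how geometric proximity stands in for semantic similarity. The 3-dimensional vectors here are toy values chosen for illustration; real model embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-d "embeddings" (hand-picked; real ones come from a model).
docs = {
    "cat":     [0.90, 0.10, 0.00],
    "kitten":  [0.85, 0.20, 0.05],
    "invoice": [0.00, 0.10, 0.95],
}

query = [0.88, 0.15, 0.00]  # a query about cats, as a vector
scores = {name: cosine_similarity(query, vec) for name, vec in docs.items()}
best = max(scores, key=scores.get)  # nearest neighbor wins, no keywords needed
```

Note that "cat" and "kitten" both score near 1.0 while "invoice" scores near 0, even though none of the strings were compared directly.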
A typed invocation of an external capability — API, function, shell command, or database query — with schema-validated inputs and outputs. Tool calls are how models take action in the real world, bridging language understanding with executable operations.
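A sketch of schema-validated invocation, assuming a hypothetical tool registry and a made-up `get_weather` tool. Model-produced arguments arrive as a JSON string and are type-checked before the underlying function runs.

```python
import json

# Hypothetical registry: each tool declares an input schema alongside its function.
TOOLS = {
    "get_weather": {
        "schema": {"city": str, "units": str},
        "fn": lambda city, units: {"city": city, "temp": 21, "units": units},
    }
}

def call_tool(name: str, raw_args: str) -> dict:
    """Validate model-produced arguments against the tool's schema, then invoke."""
    tool = TOOLS[name]
    args = json.loads(raw_args)
    for field, typ in tool["schema"].items():
        if not isinstance(args.get(field), typ):
            raise TypeError(f"{name}: field {field!r} must be {typ.__name__}")
    return tool["fn"](**args)

result = call_tool("get_weather", '{"city": "Oslo", "units": "celsius"}')

# Malformed arguments are rejected before any side effect happens.
try:
    call_tool("get_weather", '{"city": 42, "units": "celsius"}')
    bad_accepted = True
except TypeError:
    bad_accepted = False
```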
An atomic UI operation — click, type, drag, scroll, or hotkey — performed in a real or virtual desktop/web environment. UI actions let agents interact with software the same way humans do, enabling automation of any GUI application without dedicated APIs.
A reliable message/event envelope for handing off tasks, tool results, and agent-to-agent communications across processes. Message passing decouples producers from consumers, enabling async workflows, fan-out patterns, and resilient multi-agent coordination.
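The envelope idea can be sketched with an in-process queue; field names here are illustrative, and a production system would use a durable broker rather than `queue.Queue`.

```python
import queue
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Envelope:
    """A minimal message envelope; real schemas add routing, auth, and tracing."""
    topic: str
    payload: dict
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    ts: float = field(default_factory=time.time)

bus: queue.Queue[Envelope] = queue.Queue()

# Producer: an agent hands off a tool result without knowing who consumes it.
bus.put(Envelope(topic="tool.result", payload={"tool": "search", "hits": 3}))

# Consumer: another agent or process picks it up later, fully decoupled.
msg = bus.get()
```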
Machine-enforced policy constraints — allowed actions, data handling rules, and authorization checks — evaluated before and after agent actions. Policy rules act as guardrails that prevent agents from taking harmful or unauthorized steps, even when instructed to do so.
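A toy pre-action check showing the shape of such a guardrail; the rule names, actions, and paths are all made up for illustration.

```python
# Illustrative policy: an action allowlist plus blocked filesystem prefixes.
POLICY = {
    "allowed_actions": {"read_file", "search_web"},
    "blocked_paths": ("/etc/", "/root/"),
}

def check_action(action: str, target: str) -> tuple[bool, str]:
    """Evaluate a proposed agent action against policy BEFORE execution."""
    if action not in POLICY["allowed_actions"]:
        return False, f"action {action!r} not in allowlist"
    if any(target.startswith(p) for p in POLICY["blocked_paths"]):
        return False, f"target {target!r} touches a blocked path"
    return True, "ok"

ok, _ = check_action("read_file", "/home/user/notes.txt")
denied, reason = check_action("delete_file", "/home/user/notes.txt")
```

The key property is that the check runs regardless of what the agent was instructed to do: even a prompt-injected "delete everything" still has to pass the allowlist.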
A single timed unit of work inside a distributed trace, carrying attributes and events for debugging and performance measurement. Trace spans form a tree structure that shows exactly how a request flows through agents, tools, and models — essential for diagnosing failures in complex AI pipelines.
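A minimal span recorder, assuming nothing beyond the standard library; real tracing SDKs add context propagation across processes, but the parent/child tree forms the same way.

```python
import time
from contextlib import contextmanager

SPANS = []   # finished spans, appended as they close
_STACK = []  # current span ancestry within this thread

@contextmanager
def span(name: str, **attrs):
    """Record a timed unit of work with a parent pointer, forming a tree."""
    record = {
        "name": name,
        "parent": _STACK[-1]["name"] if _STACK else None,
        "attrs": attrs,
        "start": time.monotonic(),
    }
    _STACK.append(record)
    try:
        yield record
    finally:
        _STACK.pop()
        record["duration_s"] = time.monotonic() - record["start"]
        SPANS.append(record)

with span("handle_request"):
    with span("retrieve", index="docs"):
        pass
    with span("llm_call", model="toy"):
        pass
```

Child spans close before their parent, so reading `SPANS` back (or exporting it) reconstructs exactly how the request flowed through retrieval and the model call.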
A constrained output contract — JSON schema, function result schema, or typed response format — that makes agent responses machine-reliable. Structured outputs eliminate parsing guesswork and enable deterministic downstream processing, turning free-form LLM text into programmatic data.
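A sketch of enforcing such a contract on the consumer side, with a hypothetical sentiment-analysis output shape; anything that fails the contract is rejected rather than parsed by guesswork.

```python
import json

# Hypothetical output contract: field name -> required Python type.
EXPECTED = {"sentiment": str, "confidence": float}

def parse_structured(raw: str) -> dict:
    """Accept only responses matching the contract; fail loudly otherwise."""
    data = json.loads(raw)
    for key, typ in EXPECTED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r}: expected {typ.__name__}")
    return data

# A well-formed model response becomes programmatic data...
good = parse_structured('{"sentiment": "positive", "confidence": 0.93}')

# ...while free-form text fails instead of being silently misread.
try:
    parse_structured("The sentiment seems positive, roughly 93% sure.")
    malformed_accepted = True
except ValueError:  # json.JSONDecodeError is a ValueError subclass
    malformed_accepted = False
```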
A production index over embeddings enabling fast approximate nearest-neighbor search and hybrid retrieval. Vector indexes are the backbone of RAG systems, letting agents quickly find semantically relevant documents from millions of candidates in milliseconds.
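The interface can be sketched with an exact brute-force index; production systems replace the linear scan with approximate structures (HNSW graphs, IVF partitions) to hit millisecond latency at millions of vectors, but the add/search contract looks the same. The document IDs and 2-d vectors below are illustrative.

```python
import math

class FlatIndex:
    """Exact nearest-neighbor search over stored vectors.

    A stand-in for a real ANN index: same API, O(n) scan instead of
    an approximate graph or partition structure.
    """
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc_id: str, vec: list[float]) -> None:
        self.items.append((doc_id, vec))

    def search(self, query: list[float], k: int = 2) -> list[str]:
        ranked = sorted(self.items, key=lambda item: math.dist(query, item[1]))
        return [doc_id for doc_id, _ in ranked[:k]]

idx = FlatIndex()
idx.add("refund-policy", [0.1, 0.9])
idx.add("shipping-faq",  [0.2, 0.8])
idx.add("release-notes", [0.9, 0.1])

top = idx.search([0.12, 0.88], k=2)  # query vector near the policy docs
```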
The Model Context Protocol — a standard interface for exposing tools, resources, and prompts to AI models in a consistent, discoverable way. MCP provides a universal catalog of capabilities with schemas and transport, so any model can use any tool without custom integration code.
A UI layer designed for agent workflows — action previews, approval gates, state inspection, and human-in-the-loop controls. Agent UIs make autonomous systems transparent and steerable, letting operators observe, intervene, and course-correct in real time.
A structured agent-to-agent communication contract defining messages, handoffs, task offers, capabilities, and receipts. Agent protocols enable interoperability between heterogeneous agent systems, letting agents from different vendors collaborate on shared tasks.
Controls that reduce prompt injection, jailbreaks, data exfiltration, and unsafe tool execution. Prompt security encompasses input/output filters, content policies, sandboxing, and attestation layers that keep agents operating within intended boundaries.
A pipeline and export mechanism for traces and metrics — OTLP, collectors, and vendor backends. Trace export connects instrumented agent code to the observability platforms where engineers actually debug, alert, and analyze production behavior.
A runtime planner that decomposes high-level intent into executable steps, selects tools, and revises plans based on observations and failures. Planners implement the think-act-observe loop that gives agents their autonomy, deciding what to do next based on what has happened so far.
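The think-act-observe loop can be sketched as follows. Here a hand-written rule table stands in for the model that would normally choose the next step, and the tool calls are stubs; the loop structure is the point.

```python
# "Think": decide the next step from the goal and observations so far.
# In a real planner this is a model call; here it is a toy rule table.
def plan_next(goal: str, observations: list[str]) -> str:
    if not observations:
        return "search"
    if "found" in observations[-1]:
        return "summarize"
    return "done"

# "Act": stand-ins for real tool invocations.
def act(step: str) -> str:
    return {"search": "found 3 documents", "summarize": "summary written"}[step]

observations: list[str] = []
trace = []
while (step := plan_next("answer the question", observations)) != "done":
    obs = act(step)            # act
    observations.append(obs)   # observe
    trace.append((step, obs))  # each iteration revises the plan from results
```

Failure handling fits the same shape: a tool error becomes an observation, and the planner's next decision can retry, switch tools, or escalate.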
Retrieval over a knowledge graph combined with text, blending entity/relationship structure with passage grounding. GraphRAG answers questions that require multi-hop reasoning across documents, outperforming flat vector search on complex queries that span multiple facts.
A durable, resumable workflow runtime for long-running agent jobs with automatic retries, timers, idempotency, and checkpointing. Durable workflows survive process crashes and restarts, making it safe to run multi-hour agent tasks that interact with unreliable external services.
A runtime that lets agents operate a real desktop or web environment via perception and UI actions, not just API calls. Computer use enables agents to automate legacy applications, fill forms, navigate browsers, and interact with any software that lacks a programmatic interface.
The core runtime abstraction for building agents — state management, tool routing, retries, memory hooks, and multi-agent coordination. Agent frameworks provide the scaffolding so developers can focus on agent logic instead of plumbing, handling the complex lifecycle of plan-execute-observe loops.
An isolated execution environment that constrains side effects — filesystem access, network calls, credentials, and system resources — for untrusted code and tools. Sandboxes let agents execute generated code safely, preventing a single bad tool call from compromising the host system.
Runtime visibility into agent behavior — traces, logs, metrics, prompt/tool lineage, error clustering, and dashboards. Observability platforms purpose-built for AI show not just what happened, but why the model chose a particular path, how much it cost, and where quality degrades.
A production-ready task specification — runbook-style — that encodes goals, constraints, approval gates, and rollback guidance for autonomous operations. Spec handoffs bridge the gap between what a human wants and what an agent safely executes in production, including escalation paths when things go wrong.
A repeatable ingestion and transformation path that keeps knowledge fresh — ETL/ELT, change data capture, document sync, and schema evolution. Data pipelines feed agents with up-to-date information, ensuring RAG systems and knowledge bases reflect reality rather than stale snapshots.
The control and perception middleware that bridges AI models to embodied systems — robot OS, motion planning, sensor fusion, and actuator control. Robot middleware translates high-level agent commands into physical actions, handling the real-time constraints that software agents never face.
The operator interface for supervising production agents — approvals, escalations, replay, and incident response. Ops consoles give human operators a command center to monitor fleet-wide agent health, intervene when agents are stuck, and audit every decision an agent made.
A deterministic runner that executes agent workflows with checkpoints, retries, timeouts, and human gates. The harness provides production-grade reliability around non-deterministic AI components, ensuring that flaky model calls or tool failures don't silently corrupt long-running jobs.
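A sketch of one harness step combining retries with checkpointing, so a restart never repeats completed work. The step names and the flaky function are illustrative; a real harness persists `state` durably.

```python
def run_step(name: str, fn, state: dict, max_retries: int = 3):
    """Run a step with retries; skip it entirely if already checkpointed."""
    if name in state:             # idempotency: result was checkpointed earlier
        return state[name]
    last_err = None
    for _attempt in range(max_retries):
        try:
            result = fn()
            state[name] = result  # checkpoint on success
            return result
        except RuntimeError as err:
            last_err = err        # transient failure: try again
    raise RuntimeError(f"step {name!r} failed after {max_retries} tries") from last_err

# A tool that fails twice before succeeding, simulating a flaky service.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

state: dict = {}
first = run_step("fetch", flaky, state)   # succeeds on the 3rd attempt
second = run_step("fetch", flaky, state)  # served from checkpoint, no new call
```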
Identity and authorization controls for agents, tools, and data — RBAC, ABAC, OAuth scopes, and secrets handling. Access control ensures agents only reach the resources they need, following least-privilege principles so a compromised agent can't escalate beyond its scope.
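A minimal least-privilege check, with made-up agent roles and scope names; real deployments layer this on OAuth scopes or an RBAC/ABAC engine, but the shape of the decision is the same.

```python
# Illustrative grant table: agent identity -> the scopes it may use.
GRANTS = {
    "billing-agent": {"invoices:read", "invoices:write"},
    "support-agent": {"invoices:read", "tickets:write"},
}

def authorize(agent: str, scope: str) -> bool:
    """An agent reaches a resource only if its grant set contains the scope."""
    return scope in GRANTS.get(agent, set())

can_read = authorize("support-agent", "invoices:read")    # within scope
can_write = authorize("support-agent", "invoices:write")  # escalation blocked
```

An unknown or compromised identity falls through to an empty grant set, so the default is deny.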
Per-agent and per-tool cost accounting — tokens consumed, latency, GPU time, and API spend — with budgets and alerts. Cost metering prevents runaway expenses from autonomous agents, letting teams set hard spending limits and understand which workflows drive the most cost.
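A sketch of a hard token budget per agent; the per-token price here is a placeholder, not any provider's real rate.

```python
class Meter:
    """Track token spend against a hard budget; refuse work past the limit."""
    def __init__(self, budget_usd: float, usd_per_1k_tokens: float = 0.002):
        self.budget = budget_usd
        self.rate = usd_per_1k_tokens
        self.spent = 0.0

    def charge(self, tokens: int) -> None:
        cost = tokens / 1000 * self.rate
        if self.spent + cost > self.budget:
            raise RuntimeError("budget exceeded; halting agent")
        self.spent += cost

meter = Meter(budget_usd=0.01)  # one cent, for illustration
meter.charge(2000)              # $0.004
meter.charge(2000)              # $0.008 total
try:
    meter.charge(2000)          # would reach $0.012 > budget
    tripped = False
except RuntimeError:
    tripped = True              # runaway loop stopped before overspending
```

Checking *before* committing the spend means a runaway loop is halted with the budget intact, rather than discovered on the invoice.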
Test-case and rubric definition for agent evaluation — pass/fail criteria, graders, scenario specs, and golden datasets. Rubric authoring captures what 'good' looks like for non-deterministic systems, enabling automated scoring of agent outputs that can't be checked with simple assertions.
A simulated environment for training and evaluating agent behavior — games, embodied simulations, and UI task worlds. Sim worlds provide safe, reproducible arenas where agents can fail cheaply, learn from mistakes, and be benchmarked before touching production systems.
A benchmark harness that exposes interactive tasks — web browsing, desktop apps, codebases — with execution-based scoring. Gym arenas measure real agent capability by requiring agents to actually complete tasks, not just answer questions about them.
Adversarial testing of agent behavior and tool access — jailbreaks, prompt injection, privilege escalation, and misuse scenarios. Red teaming proactively finds failure modes before attackers do, stress-testing the safety boundaries that policy rules and sandboxes are supposed to enforce.
Continuous safety and quality regression tests that run on every change to prompts, tools, policies, or models. Regression suites catch silent degradation — when a model update breaks a previously working behavior or a policy change opens an unintended loophole.
Standardized evaluation suites and leaderboards used to compare agents and models on real tasks — coding, UI use, tool use, and reasoning. Benchmarks provide a shared language for measuring progress and catching regressions across the industry.
Short-lived working state used during a single run — scratchpad, active context, tool buffers, and intermediate results. Work memory is the agent's 'mental workspace' that persists only for the duration of one task execution.
Append-only event log of actions and observations enabling replay, audit, and learning-from-history. Episodic logs let agents and operators review exactly what happened during a run, and can be replayed for debugging or used as training data.
Persisted snapshot of agent state enabling resume after failure, long waits, or handoffs. Checkpoints are the foundation of durable execution — they let an agent pick up exactly where it left off after a crash or deliberate pause.
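A minimal save/resume cycle; a temp file stands in for the durable store a real system would use, and the state fields are illustrative.

```python
import json
import os
import tempfile

def save_checkpoint(path: str, state: dict) -> None:
    """Persist a snapshot of agent state."""
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path: str) -> dict:
    """Reload the last snapshot on restart."""
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "agent.ckpt")
save_checkpoint(path, {"step": 4, "pending": ["send_report"]})

# ...process crashes or is paused, then restarts here...

resumed = load_checkpoint(path)  # picks up at step 4, not from scratch
```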
Durable memory across sessions — user preferences, organizational knowledge, learned skills, and relationship history — with retrieval and consolidation. Long-term memory lets agents build persistent understanding of users and projects over weeks and months.
General-purpose large language model used for text understanding, generation, and tool reasoning. LLMs are the reasoning engines at the heart of every agent, converting instructions and context into decisions and actions.
Small language model optimized for latency, on-device deployment, or specialized domains. SLMs trade breadth for speed and cost, running on edge devices or handling focused tasks where a full LLM is overkill.
Models that natively handle text, images, audio, and video in one unified stream. Multimodal models let agents see screenshots, hear audio, and process documents with mixed content — critical for computer use, document understanding, and real-world interaction.
Models tuned for multi-step deliberation and planning with chain-of-thought reasoning, achieving improved reliability on hard math, code, and logic tasks. Reasoning models think before they answer, trading latency for accuracy on problems that require careful step-by-step analysis.
Language models using diffusion-style generation — iterative denoising — rather than autoregressive next-token decoding. Diffusion LLMs can generate all tokens in parallel, offering fundamentally different speed/quality tradeoffs and enabling novel editing and infilling workflows.
A model served or trained across many nodes and GPUs, often with decentralized or swarm compute. Distributed LLMs tackle the fundamental challenge that frontier models are too large for any single machine, requiring novel parallelism strategies.
Vision-Language-Action models that map perception and language to embodied actions for robotics. VLAs bridge the gap between AI reasoning and physical manipulation, letting robots understand natural language commands and execute them in the real world.
Scaling inference across GPUs and nodes for throughput, latency, and long-context workloads. Distributed inference engines handle the challenge of serving models to thousands of concurrent users while keeping response times low and costs manageable.
Scaling model training via data, model, and pipeline parallelism with sharded optimizers. Distributed training makes it possible to build frontier models by coordinating thousands of GPUs, with sophisticated strategies to minimize communication overhead.