Tutorial D6: MCP Tooling & Architecture Basics
Read-only tool servers, shared tooling, and security boundaries
✅ CORE MISSION OF THIS TUTORIAL
By the end of this tutorial, the reader will be able to:
- ✅ Explain MCP (Model Context Protocol) as a tool boundary between agents and plant systems
- ✅ Design a read-only tool server that exposes safe query-only capabilities
- ✅ Replace duplicated per-agent tool glue with shared tooling usable by multiple agents
- ✅ Implement basic security boundaries: allowlists, argument validation, audit logs, and caching
- ✅ Prepare for D6: Tool Abstraction & MCP Server Introduction (building a real server interface)
This tutorial focuses on architecture basics: the point is not more tools — it is bounded tools.
🌍 VENDOR-AGNOSTIC ENGINEERING NOTE
This tutorial uses:
- ▸ Alarm logs (CSV/JSON) from any SCADA/HMI historian
- ▸ PLC tag reads via OPC UA, ADS, S7, EtherNet/IP gateways (read-only)
- ▸ TwinCAT, Siemens TIA Portal, CODESYS, Allen-Bradley ecosystems
- ▸ Any plant data source that can be queried safely
We keep examples read-only and simulation-friendly. No live PLC writes in this tutorial.
1️⃣ WHY MCP EXISTS — TOOL CHAOS IN MULTI-AGENT SYSTEMS
Here’s a predictable failure mode when you build multiple industrial agents: every agent re-implements the same “tool glue”. One agent parses alarms one way, another agent parses alarms a different way, and you now have tool drift — inconsistent facts feeding “confident” recommendations.
In a packaging line scenario, you might have:
- ▸ A diagnostics agent reading alarm windows for “what happened in the last 15 minutes”
- ▸ A shift report agent summarizing the last hour and counting recurring faults
- ▸ A maintenance agent correlating tag reads with fault codes
Key Principle: Tools are part of your safety model.
If every agent ships its own tool code, you are scaling risk along with capability.
MCP (Model Context Protocol) is a practical answer: move tools out of agents, put them behind a dedicated server boundary, and make that boundary enforce policy: read-only, allowlisted, auditable, and shared.
BEFORE MCP: DUPLICATED TOOL GLUE
```mermaid
graph LR
A[Diagnostics Agent]:::purple --> B[Tool Glue Copy 1]:::pink
C[Shift Report Agent]:::purple --> D[Tool Glue Copy 2]:::pink
B --> E[Alarm Historian]:::cyan
D --> E[Alarm Historian]:::cyan
classDef cyan fill:#1a1a1e,stroke:#04d9ff,stroke-width:2px,color:#04d9ff;
classDef purple fill:#1a1a1e,stroke:#6366f1,stroke-width:2px,color:#6366f1;
classDef pink fill:#1a1a1e,stroke:#ff2bd6,stroke-width:2px,color:#ff2bd6;
```
The visible symptom is “inconsistent answers.” The underlying issue is architectural: duplicated tooling becomes duplicated truth.
2️⃣ MCP MENTAL MODEL — A TOOL SERVER AS A SAFETY BOUNDARY
The simplest way to understand MCP in industrial settings: agents should not “own” plant access. Agents should request facts; a tool server should decide whether those requests are permitted, how they are executed, and how they are logged.
AFTER MCP: SHARED TOOL SERVER BOUNDARY
```mermaid
graph LR
A[Diagnostics Agent]:::purple --> S[MCP Tool Server]:::purple
B[Shift Report Agent]:::purple --> S[MCP Tool Server]:::purple
S --> D[Alarm Historian]:::cyan
S --> T[Read Only Tag Access]:::cyan
S --> H[Audit Log]:::green
classDef cyan fill:#1a1a1e,stroke:#04d9ff,stroke-width:2px,color:#04d9ff;
classDef purple fill:#1a1a1e,stroke:#6366f1,stroke-width:2px,color:#6366f1;
classDef green fill:#1a1a1e,stroke:#00ff7f,stroke-width:2px,color:#00ff7f;
```
Starting with read-only is not a limitation — it’s a discipline. In most plants, “read facts + recommend actions” already delivers value without risking unintended writes.
- ▸ Read-only tools are easier to approve, audit, and sandbox.
- ▸ Shared tools prevent “truth drift” across agents.
- ▸ Boundaries make “what the agent can do” explicit and testable.
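In MCP itself, agents cross this boundary with JSON-RPC 2.0 messages. A client discovers tools with a `tools/list` request and invokes one with `tools/call`, passing the tool `name` and an `arguments` object. The sketch below only builds those request envelopes with the standard library. The tool name `read_alarm_counts` and its arguments are hypothetical, and transport, session initialization, and response handling are left out. Check the MCP specification for the exact message fields before relying on this shape.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC matches each response to its request by id

def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope (building only, nothing is sent)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# 1. Discover which tools the server is willing to expose.
list_req = jsonrpc_request("tools/list")

# 2. Ask for facts through the boundary; the server decides whether to allow it.
#    "read_alarm_counts" and its arguments are hypothetical.
call_req = jsonrpc_request(
    "tools/call",
    {"name": "read_alarm_counts", "arguments": {"line": "LINE-7"}},
)

print(json.dumps(list_req))
print(json.dumps(call_req))
```

The agent only names a tool and supplies arguments. Nothing in the message grants it plant access, which is what lets the server enforce policy.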
3️⃣ SECURITY BOUNDARIES — THE MINIMUM VIABLE GUARDRAILS
“Security boundary” here is not a compliance checkbox — it is practical containment. Your first guardrails should be boring and enforceable:
Minimum Guardrails (Start Here)
- ▸ Allowlist of tools (unknown tool = blocked)
- ▸ Read-only naming convention (no write_* tools)
- ▸ Argument validation (block suspicious keys)
- ▸ Audit log for every call (who/what/when/result)
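Argument validation can go further than blocking suspicious keys. You can declare which argument names and types each tool accepts and reject everything else. The sketch below assumes a hypothetical per-tool schema: a plain dict mapping argument name to Python type, loosely in the spirit of the JSON Schema `inputSchema` that MCP tool descriptions carry. A real server would more likely validate against JSON Schema with a dedicated library.

```python
# Hypothetical per-tool argument schemas: argument name -> required Python type.
ARG_SCHEMAS = {
    "read_alarm_counts": {"line": str},
    "read_tag": {"tag": str},
}

def validate_args(tool: str, args: dict) -> None:
    """Raise ValueError unless args match the tool's declared schema exactly."""
    schema = ARG_SCHEMAS.get(tool)
    if schema is None:
        raise ValueError(f"No schema for tool: {tool}")  # unknown tool = blocked
    unknown = set(args) - set(schema)
    if unknown:
        raise ValueError(f"Unexpected arguments: {sorted(unknown)}")
    missing = set(schema) - set(args)
    if missing:
        raise ValueError(f"Missing arguments: {sorted(missing)}")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")

validate_args("read_alarm_counts", {"line": "LINE-7"})  # passes silently
try:
    validate_args("read_alarm_counts", {"line": "LINE-7", "override": True})
except ValueError as e:
    print("blocked:", e)  # blocked: Unexpected arguments: ['override']
```

Checking for keys that start with `write` only catches names someone anticipated. An allowlist of argument names also catches the keys nobody thought to block.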
Guardrails You Add Next (D6+)
- ▸ AuthN/AuthZ (per-agent identities and per-tool permissions)
- ▸ Rate limits + cost caps + caching policies
- ▸ Network segmentation (tool server sits in the right zone)
- ▸ Tracing/observability for production (System Track)
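Rate limits are one of these next guardrails. One possible policy, which this tutorial's server does not implement, is a sliding-window limiter that caps how many calls each actor may make per time window. `SlidingWindowLimiter` is a hypothetical class written for this sketch. The clock is injected so the behavior can be tested without sleeping.

```python
import time
from collections import defaultdict, deque
from typing import Callable

class SlidingWindowLimiter:
    """Allow at most `max_calls` per actor within the last `window_s` seconds."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock                  # injectable for tests
        self._calls = defaultdict(deque)    # actor -> timestamps of allowed calls

    def allow(self, actor: str) -> bool:
        """Record and allow the call if the actor is under its limit."""
        now = self.clock()
        q = self._calls[actor]
        while q and now - q[0] >= self.window_s:   # forget calls outside the window
            q.popleft()
        if len(q) >= self.max_calls:
            return False
        q.append(now)
        return True

# Usage with a fake clock: 2 calls per 10 s for each actor.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=2, window_s=10.0, clock=lambda: fake_now[0])
print([limiter.allow("diagnostics_agent") for _ in range(3)])  # [True, True, False]
print(limiter.allow("shift_report_agent"))                     # True: separate budget
fake_now[0] = 10.0
print(limiter.allow("diagnostics_agent"))                      # True: window has slid
```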
Boundary mindset: agents ask, servers decide.
If the agent “can do anything,” you do not have a boundary — you have a hope.
Experiment 1 — Tool Duplication Drift (Before MCP)
See how two agents “agree to share tools” but still ship different logic — producing inconsistent facts.
SETUP CELL
Experiment 1A — Two agents, two alarm parsers
Create a realistic alarm log sample and two subtly different parsing functions.
```python
from datetime import datetime, timedelta

BASE = datetime(2025, 12, 24, 7, 0, 0)

def make_alarm(minutes: int, code: str, msg: str, line: str = "LINE-7"):
    return {
        "ts": (BASE + timedelta(minutes=minutes)).isoformat(timespec="seconds"),
        "line": line,
        "code": code,
        "msg": msg,
    }

ALARMS = [
    make_alarm(0, "PE_TIMEOUT", "Photo-eye blocked > 5s at infeed"),
    make_alarm(3, "VFD_OVERCURRENT", "Main conveyor VFD overcurrent trip"),
    make_alarm(6, "PE_TIMEOUT", "Photo-eye blocked > 5s at infeed"),
    make_alarm(9, "E_STOP", "E-stop circuit opened"),
    make_alarm(12, "PE_TIMEOUT", "Photo-eye blocked > 5s at infeed"),
]

# Agent 1: correct counting by full fault code
def agent1_parse_counts(alarms):
    counts = {}
    for a in alarms:
        counts[a["code"]] = counts.get(a["code"], 0) + 1
    return counts

# Agent 2: BUG, truncates the fault code at the first underscore
def agent2_parse_counts(alarms):
    counts = {}
    for a in alarms:
        short = a["code"].split("_")[0]
        counts[short] = counts.get(short, 0) + 1
    return counts
```

Explanation
- This is the most common "shared tooling" failure: copy/paste reuse without a single shared implementation.
- The bug is subtle: the counts still look plausible, but the keys change (PE vs PE_TIMEOUT).
Takeaway
If tools live inside agents, they will drift.
EXPERIMENT CELL
Experiment 1B — Same alarms, different “truth”
Run both parsers and compare the counts.
```python
print("Agent1 counts:", agent1_parse_counts(ALARMS))
print("Agent2 counts:", agent2_parse_counts(ALARMS))
```

Expected output
```text
Agent1 counts: {'PE_TIMEOUT': 3, 'VFD_OVERCURRENT': 1, 'E_STOP': 1}
Agent2 counts: {'PE': 3, 'VFD': 1, 'E': 1}
```

Explanation
- If these two agents feed separate dashboards, you will get arguments about what is actually recurring.
- In a tool-using LLM system, the model will confidently explain whichever facts you hand it.
- Typical LLM cost impact: the model may waste tokens re-checking or re-querying when counts don't align (~$0.05–$0.20/run depending on prompts).
Common mistake
Assuming "we use the same tools" because two repos have similar code.
Takeaway
Duplicated tooling creates inconsistent facts, not just duplicated effort.
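A golden-fixture contract test is one cheap way to catch tool drift before it reaches a dashboard. You run every implementation that claims to answer the same question on a fixed alarm sample and require identical results. The sketch below is self-contained: it redefines two small parsers that mirror Experiment 1A, and `find_drift` is a hypothetical helper name.

```python
from collections import Counter

# Golden fixture: a small, fixed alarm sample with known-correct counts.
FIXTURE = [
    {"line": "LINE-7", "code": "PE_TIMEOUT"},
    {"line": "LINE-7", "code": "VFD_OVERCURRENT"},
    {"line": "LINE-7", "code": "PE_TIMEOUT"},
]
EXPECTED = {"PE_TIMEOUT": 2, "VFD_OVERCURRENT": 1}

def counts_full_code(alarms):
    return dict(Counter(a["code"] for a in alarms))

def counts_truncated(alarms):  # mirrors the drifted copy from Experiment 1A
    return dict(Counter(a["code"].split("_")[0] for a in alarms))

def find_drift(impls: dict, alarms, expected) -> list:
    """Return names of implementations whose output differs from `expected`."""
    return [name for name, fn in impls.items() if fn(alarms) != expected]

drifted = find_drift(
    {"agent1": counts_full_code, "agent2": counts_truncated}, FIXTURE, EXPECTED
)
print("Drifted implementations:", drifted)  # Drifted implementations: ['agent2']
```

The test only tells you that the copies disagree. The fix is still a single shared implementation, which is what the tool server in Experiment 2 provides.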
CHECKPOINT CELL
Checkpoint — What broke (and what it costs)
Confirm the failure mode before introducing a tool server boundary.
Explanation
- Failure mode: two agents interpret the same plant history differently due to tool drift.
- Operational symptom: conflicting recommendations and wasted troubleshooting time.
- Expected learning time so far: ~15 minutes.
- Expected API cost if you integrate this into a live tool-using agent: ~$0.05–$0.20/run (mostly from "extra reasoning" over inconsistent facts).
Takeaway
Before MCP, the problem is not the model — it is uncontrolled tool implementation.
Experiment 2 — Read-Only Tool Server (MCP-style Boundary)
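Build a tiny "MCP-like" tool host with allowlisted tools, read-only enforcement, caching, and an audit trail. It runs in-process and does not implement the MCP protocol itself; the point is the boundary, not the transport.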
SETUP CELL
Experiment 2A — Define audit trail structure and server skeleton
Create data models for tracking tool calls and the server container.
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json

@dataclass
class ToolCall:
    """Records one tool invocation for audit purposes."""
    ts: str       # timestamp of call (UTC, ISO 8601)
    tool: str     # tool name
    args: dict    # arguments passed
    actor: str    # agent identifier
    ok: bool      # whether the call was allowed and succeeded
    reason: str   # e.g. "ok", "cache_hit", "unknown_tool", "write_blocked"

@dataclass
class ReadOnlyToolServer:
    """Minimal MCP-style tool host with read-only enforcement."""
    tools: dict = field(default_factory=dict)  # tool name → function
    audit: list = field(default_factory=list)  # ToolCall records
    cache: dict = field(default_factory=dict)  # cache key → result
```

Explanation
- ToolCall captures who called what, when, and whether it was allowed.
- The server has three core pieces: a tool registry, an audit log, and a cache.
- This structure makes the boundary explicit and testable.
Takeaway
Explicit data models turn tool access from implicit to auditable.
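The in-memory `audit` list disappears when the notebook restarts. This sketch assumes an append-only JSON Lines file is acceptable as a first persistent audit sink. It converts each record with `dataclasses.asdict` and writes one JSON object per line. The file name and the `append_audit`/`read_audit` helpers are hypothetical, and the dataclass is redefined so the block runs on its own.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict

@dataclass
class ToolCall:
    ts: str
    tool: str
    args: dict
    actor: str
    ok: bool
    reason: str

def append_audit(path: str, call: ToolCall) -> None:
    """Append one audit record as a single JSON line (never rewrites old lines)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(call), sort_keys=True) + "\n")

def read_audit(path: str) -> list:
    """Load all audit records back as dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical location in a temp dir so the example leaves no residue.
path = os.path.join(tempfile.mkdtemp(), "tool_audit.jsonl")
append_audit(path, ToolCall("2025-12-24T07:00:00+00:00", "read_alarm_counts",
                            {"line": "LINE-7"}, "diagnostics_agent", True, "ok"))
append_audit(path, ToolCall("2025-12-24T07:00:05+00:00", "write_plc_tag",
                            {"tag": "Conveyor.Start", "value": True},
                            "diagnostics_agent", False, "unknown_tool"))
print(len(read_audit(path)), "records")  # 2 records
```

Like the cache keys used below, this requires tool arguments to be JSON-serializable.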
SETUP CELL
Experiment 2A — Add tool registration with read-only enforcement
Implement register() and attach it to the server so the notebook stays runnable cell-by-cell.
```python
def register(self, name: str, fn):
    """Register a tool function. Reject write_* tools by name."""
    if name.startswith("write_"):
        raise ValueError("Read-only server forbids write_* tools")
    self.tools[name] = fn

def list_tools(self):
    """Return sorted list of registered tool names."""
    return sorted(self.tools.keys())

# Attach methods onto the class so later cells can use srv.register(...)
ReadOnlyToolServer.register = register
ReadOnlyToolServer.list_tools = list_tools
```

Explanation
- In a notebook-style tutorial, defining the function is not enough: we also attach it to ReadOnlyToolServer so the next cells can call it.
- The naming convention (write_*) is a simple but effective first filter.
- Rejecting at registration time means a write_* tool never enters the registry, so no agent can call it later.
- list_tools() gives agents visibility into what they can call.
Takeaway
Read-only starts at registration: block write tools before they enter the system.
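Blocking `write_*` is a denylist, so a tool registered as, say, `set_plc_tag` would still pass. A stricter variant inverts the convention: only names with an approved read-only prefix can register. This is sketched as an assumption, not the tutorial's rule. The prefixes `read_` and `list_` and the `PrefixAllowlistRegistry` class are illustrative.

```python
READ_ONLY_PREFIXES = ("read_", "list_")  # illustrative allowlist of name prefixes

class PrefixAllowlistRegistry:
    """Tool registry that accepts only names with an approved read-only prefix."""

    def __init__(self):
        self.tools = {}

    def register(self, name: str, fn) -> None:
        # str.startswith accepts a tuple: True if any prefix matches
        if not name.startswith(READ_ONLY_PREFIXES):
            raise ValueError(f"Tool name {name!r} lacks a read-only prefix")
        self.tools[name] = fn

reg = PrefixAllowlistRegistry()
reg.register("read_alarm_counts", lambda line: {})
for bad in ("write_plc_tag", "set_plc_tag"):
    try:
        reg.register(bad, lambda **kw: None)
    except ValueError as e:
        print("rejected:", e)
```

Naming is still only a convention: a function called `read_something` can write if its body does. Code review and read-only plant credentials remain the real enforcement.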
SETUP CELL
Experiment 2A — Implement deterministic cache keys
Hash tool name + args to create a stable cache key, then attach that helper to the class.
```python
def _cache_key(self, tool: str, args: dict) -> str:
    """Generate a deterministic cache key from tool name and arguments."""
    blob = json.dumps({"tool": tool, "args": args}, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

ReadOnlyToolServer._cache_key = _cache_key
```

Explanation
- Deterministic hashing ensures identical calls (same tool, same args) hit the cache.
- sort_keys=True makes {"line": "A", "window": 5} and {"window": 5, "line": "A"} produce the same key.
- Attaching the helper keeps the class definition incremental without breaking the next experiment cell.
- Caching reduces redundant tool reads and LLM token costs when agents re-ask the same question.
Takeaway
Deterministic cache keys turn repeated queries into cheap lookups.
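This server's cache never expires. That is fine for a fixed alarm sample, but against a live historian it would keep serving old counts. One possible caching policy assumes a fixed time-to-live per entry: store an expiry time with each value, and return a deep copy so one agent cannot mutate another agent's cached result. `TTLCache` here is a hypothetical class written for this sketch, not a library import.

```python
import copy
import time
from typing import Any, Callable, Optional

class TTLCache:
    """Key -> value cache whose entries expire `ttl_s` seconds after insertion."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        """Return a copy of the cached value, or None if missing or expired.
        (A tool result of None is indistinguishable from a miss in this sketch.)"""
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]          # stale: force a fresh read next time
            return None
        return copy.deepcopy(value)      # callers cannot mutate the cached copy

    def put(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl_s, copy.deepcopy(value))

fake_now = [0.0]
cache = TTLCache(ttl_s=30.0, clock=lambda: fake_now[0])
cache.put("alarm_counts:LINE-7", {"PE_TIMEOUT": 3})
hit = cache.get("alarm_counts:LINE-7")
hit["PE_TIMEOUT"] = 999                    # mutating the returned copy...
print(cache.get("alarm_counts:LINE-7"))    # ...leaves the cache intact: {'PE_TIMEOUT': 3}
fake_now[0] = 30.0
print(cache.get("alarm_counts:LINE-7"))    # None: entry expired, caller must re-read
```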
CORE CELL
Experiment 2A — Implement call() with security checks and caching
Enforce allowlist, read-only checks, caching, and audit logging, then attach call() to the class.
```python
def call(self, tool: str, args: dict, actor: str):
    """Execute a tool call with security boundaries and audit trail."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")

    # 1. Allowlist check: unknown tools are blocked
    if tool not in self.tools:
        self.audit.append(ToolCall(now, tool, args, actor, False, "unknown_tool"))
        raise KeyError(f"Unknown tool: {tool}")

    # 2. Read-only boundary: block write_* tools and suspicious argument keys
    if tool.startswith("write_") or any(k.startswith("write") for k in args.keys()):
        self.audit.append(ToolCall(now, tool, args, actor, False, "write_blocked"))
        raise PermissionError("Read-only boundary: write operations blocked")

    try:
        # 3. Cache check: return cached result if available
        key = self._cache_key(tool, args)
        if key in self.cache:
            self.audit.append(ToolCall(now, tool, args, actor, True, "cache_hit"))
            return self.cache[key]
        # 4. Execute tool
        out = self.tools[tool](**args)
    except Exception:
        # Non-serializable args or a failing tool are audited before re-raising
        self.audit.append(ToolCall(now, tool, args, actor, False, "error"))
        raise

    self.cache[key] = out
    self.audit.append(ToolCall(now, tool, args, actor, True, "ok"))
    return out

ReadOnlyToolServer.call = call
```

Explanation
- Four security layers: allowlist, read-only naming, argument inspection, and audit logging.
- As with register() and _cache_key(), we attach call() to the class so the tutorial remains runnable top-to-bottom.
- Caching reduces redundant work and helps control costs in LLM-integrated systems.
- Every call is logged, including blocked calls and calls that fail (reason "error"), so the audit trail has no silent gaps.
- Adding agents does not change the boundary logic: a new agent is just another actor value passing through the same checks.
Why this matters
Once tools are centralized, you can test, version, and secure them without editing every agent.
Takeaway
A call() method with explicit checks turns tool access into policy enforcement.
EXPERIMENT CELL
Experiment 2B — Register shared read-only tools and reuse across agents
Expose a shared alarm-count tool and show two agents getting the same answer (with caching).
```python
# Reuse ALARMS + agent1_parse_counts from Experiment 1
def tool_alarm_counts(line: str):
    filtered = [a for a in ALARMS if a["line"] == line]
    return agent1_parse_counts(filtered)

srv = ReadOnlyToolServer()
srv.register("read_alarm_counts", tool_alarm_counts)
print("Tools:", srv.list_tools())

agent_a = "diagnostics_agent"
agent_b = "shift_report_agent"

res1 = srv.call("read_alarm_counts", {"line": "LINE-7"}, actor=agent_a)
res2 = srv.call("read_alarm_counts", {"line": "LINE-7"}, actor=agent_b)  # cache hit
print("A counts:", res1)
print("B counts:", res2)
```

Expected output
```text
Tools: ['read_alarm_counts']
A counts: {'PE_TIMEOUT': 3, 'VFD_OVERCURRENT': 1, 'E_STOP': 1}
B counts: {'PE_TIMEOUT': 3, 'VFD_OVERCURRENT': 1, 'E_STOP': 1}
```

Explanation
- Two agents consume one shared tool implementation, so there is no "counts drift."
- The second call is a cache hit: a small but real cost-control lever once you add LLMs.
- Typical cost impact in real deployments: caching prevents redundant tool reads and reduces token-heavy "re-checking" behavior.
Takeaway
Shared tools eliminate drift and reduce redundant calls.
EXPERIMENT CELL
Experiment 2C — Block write attempts + inspect the audit trail
Demonstrate boundary enforcement and audit visibility.
```python
# Unknown tool (blocked by allowlist)
try:
    srv.call("write_plc_tag", {"tag": "Conveyor.Start", "value": True}, actor=agent_a)
except Exception as e:
    print("Write attempt blocked:", type(e).__name__, str(e))

# Suspicious arg (blocked by write key detection)
try:
    srv.call("read_alarm_counts", {"line": "LINE-7", "write_override": True}, actor=agent_b)
except Exception as e:
    print("Arg write blocked:", type(e).__name__, str(e))

ok = sum(1 for c in srv.audit if c.ok)
blocked = sum(1 for c in srv.audit if not c.ok)
cache_hits = sum(1 for c in srv.audit if c.reason == "cache_hit")
print(f"audit entries={len(srv.audit)} ok={ok} blocked={blocked} cache_hits={cache_hits}")
print("last 3 audit:")
for c in srv.audit[-3:]:
    print(c.tool, c.actor, c.ok, c.reason)
```

Expected output
```text
Write attempt blocked: KeyError 'Unknown tool: write_plc_tag'
Arg write blocked: PermissionError Read-only boundary: write operations blocked
audit entries=4 ok=2 blocked=2 cache_hits=1
last 3 audit:
read_alarm_counts shift_report_agent True cache_hit
write_plc_tag diagnostics_agent False unknown_tool
read_alarm_counts shift_report_agent False write_blocked
```

Explanation
- Two different "write-like" attempts were blocked: an unknown tool and suspicious write arguments.
- The audit trail shows who attempted what, and whether it was allowed.
- In production, this audit stream becomes your observability backbone (System Track), and later your governance surface (Architect Track).
- Typical LLM cost impact: clearer tool boundaries reduce "tool thrash" and retries (~$0.05–$0.30/run depending on the agent's prompting).
Common mistake
Letting the agent call arbitrary tools because “it’s only internal.”
Takeaway
Boundaries + audit logs turn tool access from implicit to testable.
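Once the audit trail exists, simple aggregation already answers useful questions, such as which agent keeps hitting the boundary. A minimal sketch uses `collections.Counter` over records with the `ToolCall` fields. The records are plain dicts here so the block runs on its own, and the sample data mirrors Experiment 2C. `blocked_by_actor` and `cache_hit_rate` are hypothetical helper names.

```python
from collections import Counter

# Sample audit records with the same fields as ToolCall (ts/args omitted for brevity).
AUDIT = [
    {"tool": "read_alarm_counts", "actor": "diagnostics_agent", "ok": True, "reason": "ok"},
    {"tool": "read_alarm_counts", "actor": "shift_report_agent", "ok": True, "reason": "cache_hit"},
    {"tool": "write_plc_tag", "actor": "diagnostics_agent", "ok": False, "reason": "unknown_tool"},
    {"tool": "read_alarm_counts", "actor": "shift_report_agent", "ok": False, "reason": "write_blocked"},
]

def blocked_by_actor(records) -> Counter:
    """Count denied or failed calls per (actor, reason) pair."""
    return Counter((r["actor"], r["reason"]) for r in records if not r["ok"])

def cache_hit_rate(records) -> float:
    """Fraction of successful calls that were served from cache."""
    allowed = [r for r in records if r["ok"]]
    if not allowed:
        return 0.0
    return sum(r["reason"] == "cache_hit" for r in allowed) / len(allowed)

for (actor, reason), n in sorted(blocked_by_actor(AUDIT).items()):
    print(f"{actor:20s} {reason:14s} {n}")
print("cache hit rate:", cache_hit_rate(AUDIT))  # cache hit rate: 0.5
```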
CHECKPOINT CELL
Checkpoint — MCP basics you should retain
Lock in the architecture before moving to RAG and richer tool stacks.
Explanation
- MCP concept: a standard way for agents to access tools through a dedicated server boundary.
- Read-only first: high value, lower risk, easier approval in OT environments.
- Shared tooling: one implementation, many agents, which means consistent facts and simpler maintenance.
- Security boundary basics: allowlists, argument validation, caching, and audit logs.
- Expected learning time for the full tutorial: ~55 minutes.
- Expected API cost for these experiments: $0.00 (pure Python). If integrated with a tool-using LLM agent, budget ~$0.10–$0.50 for a few runs while you tune prompts.
Takeaway
MCP is an architectural boundary: centralize tools, enforce policy, and share safely.
✅ KEY TAKEAWAYS
- ✅ Tool drift is a real failure mode: duplicated code becomes duplicated truth.
- ✅ A read-only tool server gives you a safe starting point with real industrial value.
- ✅ An allowlist + argument validation is the minimum viable security boundary for tools.
- ✅ Caching is not just performance — it is a cost and stability guardrail once LLMs are calling tools.
- ✅ Audit logs are the seed of production observability and future governance.
- ✅ This tutorial sets up the mental model you will need before building real MCP servers in D6.
🔜 NEXT TUTORIAL
D7 — RAG Foundations (LlamaIndex)
Add retrieval with citations so agents can quote manuals, SOPs, and historical fixes.