Tutorial 6 of 16
🟣 DEVELOPER TRACK • FOUNDATIONS INTERMEDIATE

Tutorial D6: MCP Tooling & Architecture Basics

Read-only tool servers, shared tooling, and security boundaries

✅ CORE MISSION OF THIS TUTORIAL

By the end of this tutorial, the reader will be able to:

  • Explain MCP (Model Context Protocol) as a tool boundary between agents and plant systems
  • Design a read-only tool server that exposes safe query-only capabilities
  • Replace duplicated per-agent tool glue with shared tooling usable by multiple agents
  • Implement basic security boundaries: allowlists, argument validation, audit logs, and caching
  • Prepare for the follow-up tutorial on Tool Abstraction & MCP Server Introduction (building a real server interface)

This tutorial focuses on architecture basics: the point is not more tools — it is bounded tools.

🌍 VENDOR-AGNOSTIC ENGINEERING NOTE

The patterns in this tutorial apply to:

  • Alarm logs (CSV/JSON) from any SCADA/HMI historian
  • PLC tag reads via OPC UA, ADS, S7, EtherNet/IP gateways (read-only)
  • TwinCAT, Siemens TIA Portal, CODESYS, Allen-Bradley ecosystems
  • Any plant data source that can be queried safely

We keep examples read-only and simulation-friendly. No live PLC writes in this tutorial.

1️⃣ WHY MCP EXISTS — TOOL CHAOS IN MULTI-AGENT SYSTEMS

Here’s a predictable failure mode when you build multiple industrial agents: every agent re-implements the same “tool glue”. One agent parses alarms one way, another agent parses alarms a different way, and you now have tool drift — inconsistent facts feeding “confident” recommendations.

In a packaging line scenario, you might have:

  • A diagnostics agent reading alarm windows for “what happened in the last 15 minutes”
  • A shift report agent summarizing the last hour and counting recurring faults
  • A maintenance agent correlating tag reads with fault codes

Key Principle: Tools are part of your safety model.
If every agent ships its own tool code, you are scaling risk along with capability.

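MCP (Model Context Protocol) is an open protocol for connecting LLM applications to external tools and data through dedicated servers, and it offers a practical answer: move tools out of agents, put them behind a server boundary, and make that server enforce policy (read-only, allowlisted, auditable, and shared).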
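On the wire, MCP is built on JSON-RPC 2.0: the agent side sends a tool name plus arguments, and the server decides what actually runs. The sketch below builds a `tools/call` request the way we understand the MCP specification to shape it (`params.name` and `params.arguments`); treat the exact field layout as an assumption to check against the spec revision you target, and `read_alarm_counts` as this tutorial's own example tool rather than anything MCP defines.

```python
import json

def make_tools_call(request_id: int, name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking an MCP-style server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        # The agent supplies only a tool name and arguments, never code or a connection.
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)

wire = make_tools_call(1, "read_alarm_counts", {"line": "LINE-7"})
print(wire)
```

Nothing in the request grants access by itself: whether read_alarm_counts exists, and whether this caller may use it, remains the server's decision.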

BEFORE MCP: DUPLICATED TOOL GLUE

graph LR
    A[Diagnostics Agent]:::purple --> B[Tool Glue Copy 1]:::pink
    C[Shift Report Agent]:::purple --> D[Tool Glue Copy 2]:::pink

    B --> E[Alarm Historian]:::cyan
    D --> E[Alarm Historian]:::cyan

    classDef cyan fill:#1a1a1e,stroke:#04d9ff,stroke-width:2px,color:#04d9ff;
    classDef purple fill:#1a1a1e,stroke:#6366f1,stroke-width:2px,color:#6366f1;
    classDef pink fill:#1a1a1e,stroke:#ff2bd6,stroke-width:2px,color:#ff2bd6;

The visible symptom is “inconsistent answers.” The underlying issue is architectural: duplicated tooling becomes duplicated truth.

2️⃣ MCP MENTAL MODEL — A TOOL SERVER AS A SAFETY BOUNDARY

The simplest way to understand MCP in industrial settings: agents should not “own” plant access. Agents should request facts; a tool server should decide whether those requests are permitted, how they are executed, and how they are logged.

AFTER MCP: SHARED TOOL SERVER BOUNDARY

graph LR
    A[Diagnostics Agent]:::purple --> S[MCP Tool Server]:::purple
    B[Shift Report Agent]:::purple --> S[MCP Tool Server]:::purple

    S --> D[Alarm Historian]:::cyan
    S --> T[Read Only Tag Access]:::cyan
    S --> H[Audit Log]:::green

    classDef cyan fill:#1a1a1e,stroke:#04d9ff,stroke-width:2px,color:#04d9ff;
    classDef purple fill:#1a1a1e,stroke:#6366f1,stroke-width:2px,color:#6366f1;
    classDef green fill:#1a1a1e,stroke:#00ff7f,stroke-width:2px,color:#00ff7f;

Starting with read-only is not a limitation — it’s a discipline. In most plants, “read facts + recommend actions” already delivers value without risking unintended writes.

  • Read-only tools are easier to approve, audit, and sandbox.
  • Shared tools prevent “truth drift” across agents.
  • Boundaries make “what the agent can do” explicit and testable.

3️⃣ SECURITY BOUNDARIES — THE MINIMUM VIABLE GUARDRAILS

“Security boundary” here is not a compliance checkbox — it is practical containment. Your first guardrails should be boring and enforceable:

Minimum Guardrails (Start Here)

  • Allowlist of tools (unknown tool = blocked)
  • Read-only naming convention (no write_*)
  • Argument validation (block suspicious keys)
  • Audit log for every call (who/what/when/result)

Guardrails You Add Next (Later Tutorials)

  • AuthN/AuthZ (per-agent identities and per-tool permissions)
  • Rate limits + cost caps + caching policies
  • Network segmentation (tool server sits in the right zone)
  • Tracing/observability for production (System Track)
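Per-agent permissions and rate limits can start as plain data plus a small class. The sketch below is an illustration under assumptions: the `PERMISSIONS` map, the sliding-window `RateLimiter`, and the reason strings are hypothetical, and actor identities are taken on trust here (real AuthN would verify them before this check runs).

```python
import time
from collections import defaultdict, deque

# Hypothetical per-agent permissions: which tools each actor may call.
PERMISSIONS = {
    "diagnostics_agent": {"read_alarm_counts", "read_tag"},
    "shift_report_agent": {"read_alarm_counts"},
}

class RateLimiter:
    """Sliding window: at most max_calls per window_s seconds, tracked per actor."""

    def __init__(self, max_calls: int, window_s: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock                # injectable so tests can control time
        self.calls = defaultdict(deque)   # actor -> timestamps of recent allowed calls

    def allow(self, actor: str) -> bool:
        now = self.clock()
        recent = self.calls[actor]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_calls:
            return False
        recent.append(now)
        return True

def authorize(actor: str, tool: str, limiter: RateLimiter):
    """Return (allowed, reason); permission is checked before rate budget is spent."""
    if tool not in PERMISSIONS.get(actor, set()):
        return False, "not_permitted"
    if not limiter.allow(actor):
        return False, "rate_limited"
    return True, "ok"

# Usage with a controllable clock: two calls pass, the third inside 60 s is refused.
sim_time = [0.0]
limiter = RateLimiter(max_calls=2, window_s=60.0, clock=lambda: sim_time[0])
print(authorize("shift_report_agent", "read_tag", limiter))           # (False, 'not_permitted')
print(authorize("shift_report_agent", "read_alarm_counts", limiter))  # (True, 'ok')
print(authorize("shift_report_agent", "read_alarm_counts", limiter))  # (True, 'ok')
print(authorize("shift_report_agent", "read_alarm_counts", limiter))  # (False, 'rate_limited')
```

Keeping permissions as data means the question "what can this agent do?" is answered by reading one dictionary, which is also what a reviewer or auditor would want to see.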

Boundary mindset: agents ask, servers decide.
If the agent “can do anything,” you do not have a boundary — you have a hope.

Experiment 1 — Tool Duplication Drift (Before MCP)

See how two agents “agree to share tools” but still ship different logic — producing inconsistent facts.

1

SETUP CELL

Experiment 1A — Two agents, two alarm parsers

Create a realistic alarm log sample and two subtly different parsing functions.

Python
from datetime import datetime, timedelta

BASE = datetime(2025, 12, 24, 7, 0, 0)

def make_alarm(minutes: int, code: str, msg: str, line: str = "LINE-7"):
    return {
        "ts": (BASE + timedelta(minutes=minutes)).isoformat(timespec="seconds"),
        "line": line,
        "code": code,
        "msg": msg,
    }

ALARMS = [
    make_alarm(0, "PE_TIMEOUT", "Photo-eye blocked > 5s at infeed"),
    make_alarm(3, "VFD_OVERCURRENT", "Main conveyor VFD overcurrent trip"),
    make_alarm(6, "PE_TIMEOUT", "Photo-eye blocked > 5s at infeed"),
    make_alarm(9, "E_STOP", "E-stop circuit opened"),
    make_alarm(12, "PE_TIMEOUT", "Photo-eye blocked > 5s at infeed"),
]

# Agent 1: correct counting by full fault code
def agent1_parse_counts(alarms):
    counts = {}
    for a in alarms:
        counts[a["code"]] = counts.get(a["code"], 0) + 1
    return counts

# Agent 2: BUG — truncates fault code at underscore
def agent2_parse_counts(alarms):
    counts = {}
    for a in alarms:
        short = a["code"].split("_")[0]
        counts[short] = counts.get(short, 0) + 1
    return counts

Explanation

  • This is a very common “shared tooling” failure: copy/paste reuse without a single shared implementation.
  • The bug is subtle: the counts still “look plausible,” but the keys change (PE vs PE_TIMEOUT), and any two codes that share a prefix would silently merge into one count.
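To see why the truncation is more than cosmetic, feed it two distinct fault codes that share a prefix. The extra code `PE_DIRTY` is hypothetical, added only to show the collision; the parsing logic is copied from `agent2_parse_counts` so the block runs on its own.

```python
def truncating_counts(alarms):
    """Same logic as agent2_parse_counts: keep only the text before the first underscore."""
    counts = {}
    for a in alarms:
        short = a["code"].split("_")[0]
        counts[short] = counts.get(short, 0) + 1
    return counts

# Two different photo-eye faults (PE_DIRTY is a hypothetical code) ...
pe_sample = [{"code": "PE_TIMEOUT"}, {"code": "PE_DIRTY"}, {"code": "PE_TIMEOUT"}]

# ... collapse into one bucket, so the actual recurring fault is no longer identifiable.
print(truncating_counts(pe_sample))  # {'PE': 3}
```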

Takeaway

If tools live inside agents, they will drift.

2

EXPERIMENT CELL

Experiment 1B — Same alarms, different “truth”

Run both parsers and compare the counts.

Python
print("Agent1 counts:", agent1_parse_counts(ALARMS))
print("Agent2 counts:", agent2_parse_counts(ALARMS))
Expected output
Agent1 counts: {'PE_TIMEOUT': 3, 'VFD_OVERCURRENT': 1, 'E_STOP': 1}
Agent2 counts: {'PE': 3, 'VFD': 1, 'E': 1}

Explanation

  • If these two agents feed separate dashboards, expect arguments about “what is actually recurring.”
  • In a tool-using LLM system, the model will confidently explain whichever facts you hand it.
  • Typical LLM cost impact: the model may spend extra tokens re-checking or re-querying when counts don’t align (roughly $0.05–$0.20/run, depending on prompts).

Common mistake

Assuming "we use the same tools" because two repos have similar code.

Takeaway

Duplicated tooling creates inconsistent facts, not just duplicated effort.

3

CHECKPOINT CELL

Checkpoint — What broke (and what it costs)

Confirm the failure mode before introducing a tool server boundary.

Explanation

  • Failure mode: two agents interpret the same plant history differently due to tool drift.
  • Operational symptom: conflicting recommendations and wasted troubleshooting time.
  • Expected learning time so far: ~15 minutes.
  • Expected API cost if you integrate this into a live tool-using agent: ~$0.05–$0.20/run (mostly from “extra reasoning” over inconsistent facts).
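One inexpensive way to confirm this failure mode in an existing codebase is a parity check: run every implementation of the “same” tool on the same input and flag any that disagree with a reference. The sketch below is self-contained; `find_drift` and the two parser copies are illustrative names that mirror Experiment 1.

```python
def counts_full(alarms):
    """Reference implementation: count by full fault code (as in agent1_parse_counts)."""
    counts = {}
    for a in alarms:
        counts[a["code"]] = counts.get(a["code"], 0) + 1
    return counts

def counts_truncated(alarms):
    """Drifted copy: count by the prefix before the first underscore (as in agent 2)."""
    counts = {}
    for a in alarms:
        short = a["code"].split("_")[0]
        counts[short] = counts.get(short, 0) + 1
    return counts

def find_drift(alarms, implementations: dict, reference: str) -> list:
    """Return the names of implementations whose output differs from the reference."""
    expected = implementations[reference](alarms)
    return [name for name, fn in implementations.items()
            if name != reference and fn(alarms) != expected]

sample_alarms = [{"code": c} for c in ["PE_TIMEOUT", "VFD_OVERCURRENT", "PE_TIMEOUT", "E_STOP"]]
impls = {"diagnostics_agent": counts_full, "shift_report_agent": counts_truncated}
print(find_drift(sample_alarms, impls, reference="diagnostics_agent"))  # ['shift_report_agent']
```

A parity check only catches drift that the chosen input exercises: with alarm codes that contain no underscore, both parsers would agree and the bug would stay hidden.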

Takeaway

Before MCP, the problem is not the model — it is uncontrolled tool implementation.

Experiment 2 — Read-Only Tool Server (MCP-style Boundary)

Build a tiny “MCP-like” tool host: allowlisted tools, read-only enforcement, caching, and an audit trail.

4

SETUP CELL

Experiment 2A — Define audit trail structure and server skeleton

Create data models for tracking tool calls and the server container.

Python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import json, hashlib

@dataclass
class ToolCall:
    """Records one tool invocation for audit purposes."""
    ts: str          # timestamp of call
    tool: str        # tool name
    args: dict       # arguments passed
    actor: str       # agent identifier
    ok: bool         # whether call succeeded
    reason: str      # outcome code, e.g. "ok", "cache_hit", "unknown_tool", "write_blocked"

@dataclass
class ReadOnlyToolServer:
    """Minimal MCP-style tool host with read-only enforcement."""
    tools: dict = field(default_factory=dict)    # tool name → function
    audit: list = field(default_factory=list)    # ToolCall records
    cache: dict = field(default_factory=dict)    # cache key → result

Explanation

  • ToolCall captures who called what, when, and whether it was allowed.
  • The server has three core pieces: a tool registry, an audit log, and a cache.
  • This structure makes the boundary explicit and testable.
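Because ToolCall is a dataclass, turning the audit trail into an append-friendly file format takes only a few lines. A minimal sketch, assuming JSON Lines (one JSON object per line) as the storage format; `AuditRecord` repeats ToolCall's fields under a different name so the block runs standalone without shadowing the tutorial's class, the timestamps are made up, and `to_jsonl` works on either dataclass.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class AuditRecord:
    """Same fields as the tutorial's ToolCall."""
    ts: str
    tool: str
    args: dict
    actor: str
    ok: bool
    reason: str

def to_jsonl(records) -> str:
    """Serialize dataclass audit records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)

records = [
    AuditRecord("2025-12-24T07:00:00+00:00", "read_alarm_counts",
                {"line": "LINE-7"}, "diagnostics_agent", True, "ok"),
    AuditRecord("2025-12-24T07:00:05+00:00", "write_plc_tag",
                {"tag": "Conveyor.Start", "value": True}, "diagnostics_agent", False, "unknown_tool"),
]
print(to_jsonl(records))
```

Appending each line to a file as it is produced keeps the log readable by standard tooling without a database.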

Takeaway

Explicit data models turn tool access from implicit to auditable.

5

SETUP CELL

Experiment 2A — Add tool registration with read-only enforcement

Implement register() and attach it to the server so the notebook stays runnable cell-by-cell.

Python
def register(self, name: str, fn):
    """Register a tool function. Reject write_* tools by name."""
    if name.startswith("write_"):
        raise ValueError("Read-only server forbids write_* tools")
    self.tools[name] = fn

def list_tools(self):
    """Return sorted list of registered tool names."""
    return sorted(self.tools.keys())

# Attach methods onto the class so later cells can use srv.register(...)
ReadOnlyToolServer.register = register
ReadOnlyToolServer.list_tools = list_tools

Explanation

  • In a notebook-style tutorial, defining the function is not enough: we also attach it to ReadOnlyToolServer so the next cells can call it.
  • The naming convention (write_*) is a simple first filter, but it is only a convention: a write tool named, say, set_conveyor_speed would pass it.
  • Rejecting at registration time means a write_* tool never enters the registry, so no later call can reach it.
  • list_tools() gives agents visibility into what they can call.
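Because the write_* check only inspects names, a stricter registry can also require each tool to be explicitly declared read-only. This is a sketch under assumptions: the `read_only_tool` decorator, the `read_only` attribute, and `StrictRegistry` are hypothetical names, and the declaration is still self-reported by the tool author, so it marks a review point rather than proving the tool is safe.

```python
def read_only_tool(fn):
    """Hypothetical marker: the tool author declares this function read-only."""
    fn.read_only = True
    return fn

class StrictRegistry:
    """Registry that applies the write_* name rule and also requires the read-only marker."""

    def __init__(self):
        self.tools = {}

    def register(self, name: str, fn):
        if name.startswith("write_"):
            raise ValueError("Read-only server forbids write_* tools")
        # Catches tools whose names look harmless but were never declared read-only.
        if not getattr(fn, "read_only", False):
            raise ValueError(f"{name} is not declared read-only")
        self.tools[name] = fn

@read_only_tool
def read_line_state(line: str):
    return {"line": line, "running": True}  # simulated value

def set_conveyor_speed(line: str, pct: float):  # a write hiding behind an innocent name
    raise RuntimeError("would write to the PLC")

reg = StrictRegistry()
reg.register("read_line_state", read_line_state)
try:
    reg.register("set_conveyor_speed", set_conveyor_speed)
except ValueError as exc:
    print("rejected:", exc)  # rejected: set_conveyor_speed is not declared read-only
```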

Takeaway

Read-only starts at registration: block write tools before they enter the system.

6

SETUP CELL

Experiment 2A — Implement deterministic cache keys

Hash tool name + args to create a stable cache key, then attach that helper to the class.

Python
def _cache_key(self, tool: str, args: dict) -> str:
    """Generate a deterministic cache key from tool name and arguments."""
    blob = json.dumps({"tool": tool, "args": args}, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

ReadOnlyToolServer._cache_key = _cache_key

Explanation

  • Deterministic hashing ensures identical calls (same tool, same args) hit the cache.
  • sort_keys=True makes {"line": "A", "window": 5} and {"window": 5, "line": "A"} produce the same key.
  • Attaching the helper keeps the class definition incremental without breaking the next experiment cell.
  • Caching reduces redundant tool reads and LLM token costs when agents re-ask the same question. This cache never expires, so it suits fixed historical queries; a query like “the last 15 minutes” needs an expiry policy.
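A common remedy for stale entries is a time-to-live (TTL) per cache entry, so a cached “last 15 minutes” answer stops being served once the window has moved. This is a minimal sketch, assuming a single process and a monotonic clock; the `TTLCache` class and its `ttl_s` parameter are hypothetical, and the key function reuses the same sorted-JSON SHA-256 idea as `_cache_key`.

```python
import hashlib
import json
import time

class TTLCache:
    """Cache whose entries expire ttl_s seconds after they were stored."""

    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock   # injectable so tests can control time
        self._data = {}      # key -> (stored_at, value)

    @staticmethod
    def key(tool: str, args: dict) -> str:
        blob = json.dumps({"tool": tool, "args": args}, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def get(self, key):
        """Return (hit, value); an expired entry is evicted and reported as a miss."""
        entry = self._data.get(key)
        if entry is None:
            return False, None
        stored_at, value = entry
        if self.clock() - stored_at >= self.ttl_s:
            del self._data[key]
            return False, None
        return True, value

    def put(self, key, value):
        self._data[key] = (self.clock(), value)

# Usage with a controllable clock and a 30-second TTL.
fake_now = [0.0]
ttl_cache = TTLCache(ttl_s=30.0, clock=lambda: fake_now[0])
entry_key = TTLCache.key("read_alarm_counts", {"line": "LINE-7"})
ttl_cache.put(entry_key, {"PE_TIMEOUT": 3})
fake_now[0] = 10.0
print(ttl_cache.get(entry_key))  # (True, {'PE_TIMEOUT': 3})
fake_now[0] = 30.0
print(ttl_cache.get(entry_key))  # (False, None)
```

Choosing the TTL is a plant decision rather than a code decision: it should be no longer than the staleness an operator would accept for that query.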

Takeaway

Deterministic cache keys turn repeated queries into cheap lookups.

7

CORE CELL

Experiment 2A — Implement call() with security checks and caching

Enforce allowlist, read-only checks, caching, and audit logging, then attach call() to the class.

Python
def call(self, tool: str, args: dict, actor: str):
    """Execute a tool call with security boundaries and audit trail."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")

    # 1. Allowlist check: unknown tools are blocked
    if tool not in self.tools:
        self.audit.append(ToolCall(now, tool, args, actor, False, "unknown_tool"))
        raise KeyError(f"Unknown tool: {tool}")

    # 2. Read-only boundary: block write_* tools and suspicious args
    #    (the write_* name check is defense in depth; register() already rejects such tools)
    if tool.startswith("write_") or any(k.startswith("write") for k in args.keys()):
        self.audit.append(ToolCall(now, tool, args, actor, False, "write_blocked"))
        raise PermissionError("Read-only boundary: write operations blocked")

    # 3. Cache check: return cached result if available
    key = self._cache_key(tool, args)
    if key in self.cache:
        self.audit.append(ToolCall(now, tool, args, actor, True, "cache_hit"))
        return self.cache[key]

    # 4. Execute tool; failures are audited too, then re-raised to the caller
    try:
        out = self.tools[tool](**args)
    except Exception:
        self.audit.append(ToolCall(now, tool, args, actor, False, "tool_error"))
        raise
    self.cache[key] = out
    self.audit.append(ToolCall(now, tool, args, actor, True, "ok"))
    return out

ReadOnlyToolServer.call = call

Explanation

  • Four security layers: allowlist, read-only naming, argument inspection, and audit logging.
  • As with register() and _cache_key(), we attach call() to the class so the tutorial remains runnable top-to-bottom.
  • Caching reduces redundant work and helps control costs in LLM-integrated systems.
  • Every blocked, cached, and successful call is logged for observability.
  • The boundary logic does not change as you go from 1 agent to 100 agents; in production, the in-memory audit list and cache would move to durable, bounded storage.
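The argument check in call() only looks for keys that start with “write”, so an unexpected key such as a hypothetical `override` would still reach the tool. A stricter sketch validates arguments against the tool's own signature using the standard library's `inspect.signature(...).bind(...)`, which raises `TypeError` for unknown or missing parameters. `validate_args` and `alarm_counts_stub` are illustrative names; in call(), such a check could sit between the read-only check and the cache lookup, audited with a reason of your choosing (for example a hypothetical "bad_args").

```python
import inspect

def validate_args(fn, args: dict) -> None:
    """Raise TypeError unless args bind cleanly to fn's declared parameters."""
    inspect.signature(fn).bind(**args)

def alarm_counts_stub(line: str):
    """Stand-in with the same signature as the tutorial's tool_alarm_counts."""
    return {"line": line}

validate_args(alarm_counts_stub, {"line": "LINE-7"})  # binds cleanly, no exception

for bad in ({"line": "LINE-7", "override": True}, {}):
    try:
        validate_args(alarm_counts_stub, bad)
    except TypeError as exc:
        print("rejected:", bad, "->", exc)
```

This only helps for tools with explicit parameter lists: a tool declared with **kwargs would bind any key and pass the check.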

Why this matters

Once tools are centralized, you can test, version, and secure them without editing every agent.

Takeaway

A call() method with explicit checks turns tool access into policy enforcement.

8

EXPERIMENT CELL

Experiment 2B — Register shared read-only tools and reuse across agents

Expose a shared alarm-count tool and show two agents getting the same answer (with caching).

Python
# Reuse ALARMS + agent1_parse_counts from Experiment 1

def tool_alarm_counts(line: str):
    filtered = [a for a in ALARMS if a["line"] == line]
    return agent1_parse_counts(filtered)

srv = ReadOnlyToolServer()
srv.register("read_alarm_counts", tool_alarm_counts)

print("Tools:", srv.list_tools())

agent_a = "diagnostics_agent"
agent_b = "shift_report_agent"

res1 = srv.call("read_alarm_counts", {"line": "LINE-7"}, actor=agent_a)
res2 = srv.call("read_alarm_counts", {"line": "LINE-7"}, actor=agent_b)  # cache hit

print("A counts:", res1)
print("B counts:", res2)
Expected output
Tools: ['read_alarm_counts']
A counts: {'PE_TIMEOUT': 3, 'VFD_OVERCURRENT': 1, 'E_STOP': 1}
B counts: {'PE_TIMEOUT': 3, 'VFD_OVERCURRENT': 1, 'E_STOP': 1}

Explanation

  • Two agents consume one shared tool implementation, so there is no “counts drift.”
  • The second call is a cache hit: a small but real cost-control lever once you add LLMs.
  • In real deployments, caching prevents redundant tool reads and reduces token-heavy “re-checking” behavior.

Takeaway

Shared tools eliminate drift and reduce redundant calls.

9

EXPERIMENT CELL

Experiment 2C — Block write attempts + inspect the audit trail

Demonstrate boundary enforcement and audit visibility.

Python
# Unknown tool (blocked by allowlist)
try:
    srv.call("write_plc_tag", {"tag": "Conveyor.Start", "value": True}, actor=agent_a)
except Exception as e:
    print("Write attempt blocked:", type(e).__name__, str(e))

# Suspicious arg (blocked by write key detection)
try:
    srv.call("read_alarm_counts", {"line": "LINE-7", "write_override": True}, actor=agent_b)
except Exception as e:
    print("Arg write blocked:", type(e).__name__, str(e))

ok = sum(1 for c in srv.audit if c.ok)
blocked = sum(1 for c in srv.audit if (not c.ok))
cache_hits = sum(1 for c in srv.audit if c.reason == "cache_hit")

print(f"audit entries={len(srv.audit)} ok={ok} blocked={blocked} cache_hits={cache_hits}")
print("last 3 audit:")
for c in srv.audit[-3:]:
    print(c.tool, c.actor, c.ok, c.reason)
Expected output
Write attempt blocked: KeyError 'Unknown tool: write_plc_tag'
Arg write blocked: PermissionError Read-only boundary: write operations blocked
audit entries=4 ok=2 blocked=2 cache_hits=1
last 3 audit:
read_alarm_counts shift_report_agent True cache_hit
write_plc_tag diagnostics_agent False unknown_tool
read_alarm_counts shift_report_agent False write_blocked

Explanation

  • Two different “write-like” attempts were blocked: an unknown tool and a suspicious write argument.
  • write_plc_tag was stopped by the allowlist before the write_* name check ran; that check is defense in depth for tools added without going through register().
  • The audit trail shows who attempted what, and whether it was allowed.
  • In production, this audit stream becomes your observability backbone (System Track), and later your governance surface (Architect Track).
  • Typical LLM cost impact: clearer tool boundaries reduce “tool thrash” and retries (roughly $0.05–$0.30/run, depending on the agent’s prompting).
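Once every call lands in the audit list, simple aggregation already answers operational questions such as “which agent keeps hitting the boundary?”. A minimal sketch with `collections.Counter`; the `Row` records are hypothetical stand-ins that reproduce the four audit entries from Experiments 2B and 2C, and `summarize` works on any objects with `actor`, `ok`, and `reason` attributes, including the tutorial's ToolCall.

```python
from collections import Counter, namedtuple

# Stand-in for ToolCall with only the fields this summary needs.
Row = namedtuple("Row", "tool actor ok reason")

def summarize(audit):
    """Count audit entries per (actor, reason), plus blocked calls per actor."""
    by_reason = Counter((c.actor, c.reason) for c in audit)
    blocked = Counter(c.actor for c in audit if not c.ok)
    return by_reason, blocked

sample_audit = [
    Row("read_alarm_counts", "diagnostics_agent", True, "ok"),
    Row("read_alarm_counts", "shift_report_agent", True, "cache_hit"),
    Row("write_plc_tag", "diagnostics_agent", False, "unknown_tool"),
    Row("read_alarm_counts", "shift_report_agent", False, "write_blocked"),
]
by_reason, blocked = summarize(sample_audit)
print(dict(blocked))  # {'diagnostics_agent': 1, 'shift_report_agent': 1}
```

With the tutorial's server, the same call would be `summarize(srv.audit)`.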

Common mistake

Letting the agent call arbitrary tools because “it’s only internal.”

Takeaway

Boundaries + audit logs turn tool access from implicit to testable.

10

CHECKPOINT CELL

Checkpoint — MCP basics you should retain

Lock in the architecture before moving to RAG and richer tool stacks.

Explanation

  • MCP concept: a standard way for agents to access tools through a dedicated server boundary.
  • Read-only first: high value, lower risk, easier approval in OT environments.
  • Shared tooling: one implementation, many agents, giving consistent facts and simpler maintenance.
  • Security boundary basics: allowlists, argument validation, caching, and audit logs.
  • Expected learning time for the full tutorial: ~55 minutes.
  • Expected API cost for these experiments: $0.00 (pure Python). If integrated with a tool-using LLM agent, budget ~$0.10–$0.50 for a few runs while you tune prompts.

Takeaway

MCP is an architectural boundary: centralize tools, enforce policy, and share safely.

✅ KEY TAKEAWAYS

  • Tool drift is a real failure mode: duplicated code becomes duplicated truth.
  • A read-only tool server gives you a safe starting point with real industrial value.
  • An allowlist + argument validation is the minimum viable security boundary for tools.
  • Caching is not just performance — it is a cost and stability guardrail once LLMs are calling tools.
  • Audit logs are the seed of production observability and future governance.
  • This tutorial sets up the mental model you will need before building real MCP servers in a later tutorial.

🔜 NEXT TUTORIAL

D7 — RAG Foundations (LlamaIndex)

Add retrieval with citations so agents can quote manuals, SOPs, and historical fixes.