Tutorial #7: Chain-of-Thought for Logic Review
Transparent Step-by-Step Reasoning
🎯 CORE MISSION OF THIS TUTORIAL
By the end of this tutorial, the reader will be able to:
- ✅ Understand what chain-of-thought (CoT) means in practical engineering terms
- ✅ Ask AI to explain its reasoning, not just give a verdict
- ✅ Use structured reasoning formats for PLC logic review
- ✅ Spot subtle logic issues more easily by making assumptions explicit
- ✅ Prepare for future agent behavior that is transparent and auditable
This tutorial focuses on logic review, not safety certification.
⚠️ SAFETY BOUNDARY REMINDER
This tutorial performs analysis only.
It must never be connected to:
- Live PLCs
- Production deployment pipelines
- Safety-rated controllers
- Motion or power systems
> All outputs are advisory-only and always require explicit human approval before any real-world action.
📋 VENDOR-AGNOSTIC ENGINEERING NOTE
This tutorial uses:
- ▸ Generic IEC 61131-3 Structured Text (ST)

The concepts apply equally to:
- ▸ TwinCAT, Siemens TIA Portal, CODESYS
- ▸ Allen-Bradley ST
- ▸ Any IEC-based runtime
No vendor-specific libraries, no runtime access, no PLC connections.
1️⃣ WHAT IS CHAIN-OF-THOUGHT IN ENGINEERING TERMS?
In this context:
Practical "chain-of-thought" = a structured rationale (assumptions + checks) that explains how a conclusion was reached.
❌ Without Chain-of-Thought

"The logic matches the specification."

Not auditable, hard to verify

✅ With Chain-of-Thought

Step 1: When StartButton is TRUE, MotorRunning is set to TRUE.
Step 2: When StopButton is TRUE, MotorRunning is set to FALSE.
Step 3: Therefore the logic behaves as specified.

Transparent, inspectable reasoning
Chain-of-thought does not make the model smarter by itself.
It makes parts of the model's rationale inspectable by you.
Benefits for PLC Engineers
🔹 See Where AI Might Be Wrong
Inspect each reasoning step
🔹 Challenge Assumptions
Correct flawed logic immediately
🔹 Reuse Reasoning Patterns
Apply across similar reviews
🔹 Build Trust Through Transparency
Not blind faith
2️⃣ REFERENCE SCENARIO – REVIEWING SIMPLE MOTOR LOGIC
We reuse a familiar IEC ST snippet:
IF StartButton THEN
    MotorRunning := TRUE;
END_IF;

IF StopButton THEN
    MotorRunning := FALSE;
END_IF;

Specification (Informal)
- ▸ When StartButton is TRUE → MotorRunning should become TRUE
- ▸ When StopButton is TRUE → MotorRunning should become FALSE
- ▸ When neither button is pressed → MotorRunning should keep its last state
We want the AI to:
- ▸ Explain whether the logic matches the spec
- ▸ Show intermediate reasoning steps
- ▸ Output a clear VERDICT
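Before involving the AI at all, the snippet can be cross-checked by direct simulation. Below is a minimal Python sketch (the `motor_scan` helper is illustrative, not part of any PLC toolchain) that mirrors the two independent IF statements and exercises each clause of the spec:

```python
# A minimal sketch: simulate one PLC scan of the reference ST logic.
# Two independent IFs -- when both buttons are TRUE, the second IF
# (StopButton) executes last, so stop wins.

def motor_scan(start_button: bool, stop_button: bool, motor_running: bool) -> bool:
    """Mirror of the two independent IF statements."""
    if start_button:
        motor_running = True
    if stop_button:
        motor_running = False
    return motor_running

# Exercise each clause of the informal specification.
cases = [
    (True,  False, False),  # start pressed         -> TRUE
    (False, True,  True),   # stop pressed          -> FALSE
    (False, False, True),   # no buttons            -> hold last state
    (True,  True,  True),   # both pressed          -> FALSE (stop wins here)
]
for start, stop, prev in cases:
    print(start, stop, prev, "->", motor_scan(start, stop, prev))
```

A quick table like this is exactly the kind of evidence you can compare against the AI's structured rationale.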
3️⃣ CONCEPT: VERDICT-ONLY VS CHAIN-OF-THOUGHT OUTPUT
Two modes of AI behavior:
❌ Verdict-Only Mode
graph LR
A[Code + Spec] --> B[AI Analysis]
B --> C[Verdict Only]
style A fill:#1a1a1e,stroke:#04d9ff,stroke-width:2px,color:#fff
style B fill:#1a1a1e,stroke:#9e4aff,stroke-width:2px,color:#fff
style C fill:#1a1a1e,stroke:#ff4fd8,stroke-width:2px,color:#ff4fd8

- ❌ Hard to verify
- ❌ Hard to challenge
- ❌ Black box decision
✅ Chain-of-Thought Mode
graph LR
A[Code + Spec] --> B[Assumptions]
B --> C[Step-by-Step]
C --> D[Verdict]
style A fill:#1a1a1e,stroke:#04d9ff,stroke-width:2px,color:#fff
style B fill:#1a1a1e,stroke:#9e4aff,stroke-width:2px,color:#fff
style C fill:#1a1a1e,stroke:#9e4aff,stroke-width:2px,color:#fff
style D fill:#1a1a1e,stroke:#00ff7f,stroke-width:2px,color:#00ff7f

- ✅ Lists assumptions
- ✅ Walks through conditions
- ✅ Compares to spec
- ✅ Clear verdict
Prefer requesting the structured rationale mode for logic review (but still validate with testing/simulation).
4️⃣ PRACTICAL EXPERIMENTS
🧪 Experiment 1: Verdict-Only vs Explicit Chain-of-Thought
Objective
See the difference between a shallow answer and a structured reasoning trace.
Python Code
from openai import OpenAI
client = OpenAI()
iec_code = """
IF StartButton THEN
MotorRunning := TRUE;
END_IF;
IF StopButton THEN
MotorRunning := FALSE;
END_IF;
"""
spec = """
Specification:
- When StartButton is TRUE, MotorRunning should become TRUE.
- When StopButton is TRUE, MotorRunning should become FALSE.
- When neither button is pressed, MotorRunning should retain its last state.
"""
# --- Part A: Verdict-only style prompt ---
prompt_verdict_only = f"""
You are a PLC logic reviewer.
Here is the code (IEC 61131-3 ST):
{iec_code}
{spec}
Question:
Does this logic match the specification? Answer briefly with YES or NO and one short sentence.
"""
response_verdict = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "user", "content": prompt_verdict_only}
]
)
print("=== VERDICT-ONLY ===")
print(response_verdict.choices[0].message.content)
# --- Part B: Structured rationale prompt ---
prompt_rationale = f"""
You are a PLC logic reviewer.
Here is the code (IEC 61131-3 ST):
{iec_code}
{spec}
Follow this format exactly:
ASSUMPTIONS:
- ...
CHECKS (brief, human-auditable):
1. ...
2. ...
3. ...
VERDICT:
- PASS or FAIL, with one short justification.
Do not provide hidden internal chain-of-thought. Provide only a concise rationale and checks that a human can audit.
"""
response_rationale = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "user", "content": prompt_rationale}
]
)
print("\n=== STRUCTURED RATIONALE ===")
print(response_rationale.choices[0].message.content)

Expected Output

=== VERDICT-ONLY ===
YES. The logic matches the specification.

=== STRUCTURED RATIONALE ===
ASSUMPTIONS:
- MotorRunning keeps its previous value unless explicitly set.
- StartButton and StopButton are momentary signals.
CHECKS (brief, human-auditable):
1. When StartButton is TRUE, MotorRunning is set to TRUE.
2. When StopButton is TRUE, MotorRunning is set to FALSE.
3. When both are FALSE, MotorRunning is not reassigned and keeps its last state.
VERDICT:
- PASS. The logic behaves according to the given specification.
Interpretation
- ▸ ❌ Verdict-only is not auditable
- ▸ ✅ Structured rationale exposes assumptions and checks
- ▸ ✅ You can now agree or disagree with concrete steps
- ▸ Cost/runtime vary by model, pricing, and system load
🧪 Experiment 2: Chain-of-Thought for Detecting a Logic Bug
Objective
Use structured reasoning to catch a subtle behavior issue.
Python Code
from openai import OpenAI
client = OpenAI()
# Deliberately faulty code with ELSIF
iec_faulty = """
IF StartButton THEN
MotorRunning := TRUE;
ELSIF StopButton THEN
MotorRunning := FALSE;
END_IF;
"""
spec = """
Specification:
- When StartButton is TRUE, MotorRunning should become TRUE.
- When StopButton is TRUE, MotorRunning should become FALSE.
- If both StartButton and StopButton are TRUE at the same time,
StopButton must take priority and MotorRunning should become FALSE.
- When neither button is pressed, MotorRunning should retain its last state.
"""
prompt_faulty_cot = f"""
You are a PLC logic reviewer.
Here is the code (IEC 61131-3 ST):
{iec_faulty}
{spec}
Follow this format exactly:
ASSUMPTIONS:
- ...
STEP_BY_STEP_REASONING:
1. ...
2. ...
3. ...
EDGE_CASE_ANALYSIS:
- Describe what happens when both StartButton and StopButton are TRUE.
- Compare this to the specification.
VERDICT:
- PASS or FAIL, with a short justification.
Do not provide hidden internal chain-of-thought. Provide only a concise rationale and checks that a human can audit.
"""
response_faulty = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "user", "content": prompt_faulty_cot}
]
)
print(response_faulty.choices[0].message.content)

Expected Output

ASSUMPTIONS:
- MotorRunning keeps its previous value unless assigned.
- StartButton and StopButton can be TRUE at the same time.
STEP_BY_STEP_REASONING:
1. When StartButton is TRUE and StopButton is FALSE, MotorRunning is set to TRUE.
2. When StartButton is FALSE and StopButton is TRUE, MotorRunning is set to FALSE.
3. When both are FALSE, MotorRunning holds its last state.
EDGE_CASE_ANALYSIS:
- When both StartButton and StopButton are TRUE, only the StartButton branch executes due to ELSIF.
- This sets MotorRunning to TRUE, while the specification requires StopButton to take priority and force MotorRunning to FALSE.
VERDICT:
- FAIL. The ELSIF structure prioritizes StartButton over StopButton, which violates the specification.
Interpretation
- ▸ ✅ Surfaces edge case behavior clearly
- ▸ ✅ Maps behavior back to the written specification
- ▸ ✅ Provides a clear, auditable FAIL verdict
- ▸ ✅ ELSIF bug caught through explicit reasoning
- ▸ Cost/runtime vary by model, pricing, and system load
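The ELSIF behavior can also be confirmed without any AI in the loop. A minimal Python sketch (the `faulty_scan` helper is illustrative) mirrors the faulty logic; Python's `elif` has the same skip-on-first-match semantics as ST's ELSIF:

```python
# A minimal sketch: reproduce the ELSIF priority bug in plain Python.
# elif mirrors ELSIF -- the StopButton branch never runs while
# StartButton is TRUE.

def faulty_scan(start_button: bool, stop_button: bool, motor_running: bool) -> bool:
    if start_button:
        motor_running = True
    elif stop_button:          # skipped whenever start_button is TRUE
        motor_running = False
    return motor_running

# Edge case from the spec: both buttons TRUE -> StopButton must win (FALSE).
result = faulty_scan(True, True, False)
print(result)  # True -- the motor keeps running, violating the spec
```

This is the kind of two-line test that turns an AI's FAIL verdict from a claim into a demonstrated fact.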
⚠️ THE INTEGRATION CHALLENGE
In production: reasoning traces need to be aggregated, searched, and checked for completeness. Free-form CoT text makes this hard to do reliably at scale.
Chain-of-thought provides transparency, but not processability:
Example: Programmatically checking if edge cases were analyzed
cot_output = """
ASSUMPTIONS:
- MotorRunning keeps its previous value unless assigned.
- StartButton and StopButton can be TRUE at the same time.
STEP_BY_STEP_REASONING:
1. When StartButton is TRUE...
2. When StartButton is FALSE and StopButton is TRUE...
3. When both are FALSE...
EDGE_CASE_ANALYSIS:
- When both buttons TRUE, only StartButton branch executes...
VERDICT:
- FAIL. The ELSIF structure prioritizes StartButton...
"""
# How do you automatically detect if edge case analysis was performed?
# How do you extract which specific edge cases were considered?
# How do you validate that all required reasoning steps are present?
# String parsing is fragile:
has_edge_analysis = "EDGE_CASE_ANALYSIS:" in cot_output # Brittle
verdict = "FAIL" if "FAIL" in cot_output else "PASS" # Unreliable
# Problem: Can't reliably build automation on top of prose

✅ What CoT Provides
- ✅ Human-readable reasoning
- ✅ Auditable step-by-step logic
- ✅ Transparent assumptions

⚠️ What CoT Doesn't Provide
- ⚠️ Machine-parseable fields
- ⚠️ Reliable extraction of verdicts
- ⚠️ Automated validation of completeness
- ⚠️ Multi-agent interoperability
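As a stopgap before full structured outputs, section headings can be extracted with a regex sketch like the one below (the `extract_section` helper is illustrative). It is sturdier than raw substring checks, but it still breaks the moment headings, ordering, or wording drift from run to run:

```python
import re

# A hedged sketch: pull named sections out of a free-form rationale.
# Still brittle -- heading names and order can vary between runs,
# which is exactly why structured JSON outputs are the better path.

cot_output = """\
ASSUMPTIONS:
- MotorRunning keeps its previous value unless assigned.
EDGE_CASE_ANALYSIS:
- When both buttons TRUE, only StartButton branch executes.
VERDICT:
- FAIL. The ELSIF structure prioritizes StartButton.
"""

def extract_section(text: str, name: str) -> str:
    # Capture everything between this heading and the next ALL_CAPS heading.
    match = re.search(rf"{name}:\n(.*?)(?=\n[A-Z_]+:|\Z)", text, re.DOTALL)
    return match.group(1).strip() if match else ""

verdict = extract_section(cot_output, "VERDICT")
print(verdict.startswith("- FAIL"))  # True for this sample, but not guaranteed
```

Note that a missing section silently returns an empty string, so even this version needs explicit completeness checks on top.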
Tutorial #9 covers schema-first design + field-level validation + audit logging → converting rationale into structured JSON with explicit fields for assumptions, edge cases, and verdicts. This makes reasoning checkable: you can validate completeness, extract specific findings, and aggregate across multiple analyses.
Important: Structured outputs solve format problems, not correctness problems. An agent can return perfectly valid JSON with wrong reasoning. Structured outputs make reasoning checkable (constraints, cross-checks, validators), not correct.
Note: In production, prefer brief rationale summaries over full reasoning transcripts. The audit trail is the structured output + evidence + validation results, not verbatim internal reasoning.
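To preview what that looks like, here is a minimal sketch of validating a structured review payload using only the standard library. The field names (`assumptions`, `edge_cases`, `verdict`) are illustrative assumptions, not the finalized Tutorial #9 schema:

```python
import json

# Hypothetical payload an agent might return (field names are
# illustrative assumptions, not a finalized schema).
raw = '{"assumptions": ["..."], "edge_cases": [], "verdict": "FAIL"}'

REQUIRED_FIELDS = {"assumptions", "edge_cases", "verdict"}

review = json.loads(raw)

# Completeness check: every required field must be present.
missing = REQUIRED_FIELDS - review.keys()
print(sorted(missing))  # [] -- all fields present

# Cross-checks become trivial: a FAIL verdict with no recorded
# edge cases is suspicious and can be flagged automatically.
suspicious = review["verdict"] == "FAIL" and not review["edge_cases"]
print(suspicious)  # True -- flag for human review
```

Checks like these validate format and completeness only; a human still judges whether the reasoning itself is right.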
🚫 EXPLICIT OPERATIONAL PROHIBITIONS
❌ Never Use Chain-of-Thought For:
- ❌ Treating chain-of-thought outputs as formal verification
- ❌ Using AI reasoning as a replacement for testing or simulation
- ❌ Letting AI approve or merge PLC code changes automatically
- ❌ Using this process for safety certification or compliance
Chain-of-thought is a review aid, not a formal method.
✅ KEY TAKEAWAYS
- ✅ Chain-of-thought = showing the reasoning, not just verdicts
- ✅ It makes AI outputs auditable, challengeable, and reusable
- ✅ You can catch subtle logic bugs by forcing edge case analysis
- ✅ The engineer remains the final decision-maker
- ✅ This pattern will later power transparent agent behavior in higher tracks
➡️ NEXT TUTORIAL
#8 – Building Your First Tool-Using Agent
Extend your skills by connecting reasoning to a single, strictly read-only tool in a controlled way.
🧠 ENGINEERING POSTURE
This tutorial enforced:
- ▸ Transparency over black-box answers
- ▸ Structured reasoning over intuition
- ▸ Human authority over all conclusions
- ▸ Advisory tooling over autonomous control