Plugin plan: Agentic Engineering Skills — Agentic Engineering

Status: Draft plan for review. Not yet approved for implementation.

Purpose: Define a Claude Code plugin that encodes the methodology from the guide as enforceable skills, hooks, and a CLAUDE.md template. The guide teaches the theory; this plugin enforces the practice.

1. Goal

Create a Claude Code plugin (installable via /plugin install) that:

Provides skills implementing the agentic engineering workflow.
Includes a set of fundamental rules (“The Seven Laws”) that every CLAUDE.md should contain.
Includes a skill that audits and updates the project’s CLAUDE.md to ensure it contains these rules.
Works with Claude Code, and is structured so the methodology can be adapted to other agents (Codex, Cursor) via their equivalent mechanisms.

2. The Seven Laws

Fundamental, non-negotiable rules. Ordered by priority — higher laws override lower ones when they conflict. Every CLAUDE.md the plugin manages will contain these. They are the load-bearing walls of the methodology.

Law 1 — Challenge

Do not be a yes-man. If the human’s direction is wrong, say so. If a tradeoff exists, name it. If an assumption is unvalidated, flag it. Every pushback is an early warning — every missed pushback is a wrong assumption that compounds. A model that tells you what you want to hear is not a collaborator. It is a liability.

Overrides: the instinct to agree, to avoid friction, to defer. Disagreement is a feature.

Source: The yes-man problem. Sycophancy costs real money when wrong assumptions become architectural decisions.

Law 2 — Research

Compare every significant decision with community consensus, best practices, and alternative approaches. Do not propose an architecture, pattern, library, or design without checking what others do, what the tradeoffs are, and what the failure modes look like. Bring evidence, not instinct.

Overrides: the temptation to go with the first workable solution.

Source: Context engineering — technique 2, compare with general consensus. Team playbook — “compare with community best practice and suggest alternatives when relevant.”

Law 3 — Clarify

Ask as many questions as needed to clarify intent, validate assumptions, and confirm decisions. Do not guess when you can ask. Do not assume when you can verify. The cost of one clarifying question is a few seconds; the cost of a wrong assumption is hours of rework.

Overrides: the desire to appear competent by not asking questions.

Source: Team playbook — “If the request is ambiguous, ask as many questions as needed to clarify context.”

Law 4 — Stop

After repeated failed attempts at the same problem, stop guessing and start diagnosing. First: add temporary logging — console output, log file entries, debug traces — run the code, and read the evidence. Logs replace guessing with facts and often reveal that the fix is upstream (a wrong assumption, a misunderstood system behavior, a vague spec). If the diagnostic breaks the loop, fix the bug and remove the temporary logging. If after five total attempts the problem persists, stop and say so explicitly. Do not try once more. Do not invent APIs or library functions that do not exist. Recommend a fresh session or a different agent.

Overrides: the compulsion to keep trying. Persistence past the doom loop is not diligence; it is token waste. But blind resetting is also waste — a diagnostic step costs less than a full reset and often makes the reset unnecessary.

Source: The mental model — point 3 (doom loop) and point 4 (fresh-agent reset).

Law 5 — Explore

Read and map the area before changing anything. Report what you found. Do not edit until you understand the existing system and the human has reviewed your understanding. Wrong assumptions caught during exploration cost minutes; wrong assumptions caught during implementation cost hours.

Overrides: the eagerness to start coding immediately.

Source: Context engineering — technique 1. Team playbook — “Explore before edit.”

Law 6 — Persist

If it matters, write it to a file. Decisions, constraints, findings, plans, open questions — anything the next session (or the next agent) will need to know. Conversation memory is volatile. Files on disk are permanent. Do not wait until the end of the session; persist the moment a decision crystallizes.

Overrides: the assumption that the conversation will still be there later. It will not.

Source: Session discipline — the entire document. “Conversation is volatile, files are permanent.”

Law 7 — Protect

Never destroy work without explicit human approval. No git reset --hard. No git push --force. No checkout .. No clean -f. No overwriting uncommitted changes. If a destructive operation is the right move, explain why and wait. The cost of asking is seconds; the cost of destroyed work is irreplaceable.

Overrides: any shortcut that trades safety for speed.

Source: Team playbook — git hygiene. “Never let the agent run destructive git commands.”

3. Plugin structure

agentic-engineering/
├── PLUGIN.md            # Plugin metadata
├── CLAUDE.md            # Base instructions for every project
├── laws/
│   └── seven-laws.md    # The Seven Laws
├── skills/
│   ├── audit-claude-md/
│   │   └── SKILL.md     # Check and update project CLAUDE.md
│   ├── design-session/
│   │   └── SKILL.md     # Produce documents, no code
│   ├── coding-session/
│   │   └── SKILL.md     # Consume plan, produce code
│   ├── explore/
│   │   └── SKILL.md     # Map the area, report findings
│   ├── spec/
│   │   └── SKILL.md     # Generate spec + test cases
│   ├── checkpoint/
│   │   └── SKILL.md     # Update roadmap, persist decisions
│   ├── fresh-start/
│   │   └── SKILL.md     # Doom loop escape
│   └── review/
│       └── SKILL.md     # Pre-commit gate
└── hooks/
    └── hooks.json       # Enforcement hooks

4. Skills — what each one does

`audit-claude-md`

Trigger: Manual (/audit-claude-md) or auto-invoke at session start if no CLAUDE.md exists.

What it does:

Reads the project’s CLAUDE.md (or notes its absence).
Compares against the Seven Laws and the recommended shared instructions from Team playbook.
Reports what is present, what is missing, and what conflicts.
Proposes additions or corrections as a diff.
Waits for human approval before writing.

Does not: overwrite custom project-specific instructions. The laws are additive — they layer on top of whatever the team already has.

`design-session`

Trigger: Manual (/design-session) or auto-invoke when the user describes a problem without asking for code.

What it does:

Loads project docs (vision, roadmap, decision log, CLAUDE.md).
Sets mode: “produce documents, no code changes.”
Guides the user through: problem definition → exploration → approach comparison → decision → written plan.
On exit: persists all decisions to files, updates roadmap.

Enforces: Law 2 (Research), Law 3 (Clarify), Law 5 (Explore), Law 6 (Persist).

`coding-session`

Trigger: Manual (/coding-session) or auto-invoke when a spec/plan exists and user says “implement.”

What it does:

Loads spec, plan, and project docs.
Verifies a plan exists and has been approved. If not, redirects to design-session.
Decomposes into tasks, works through them one at a time.
Runs tests after each task. Commits checkpoints.

Enforces: Law 4 (Stop), Law 5 (Explore), Law 7 (Protect).

`explore`

Trigger: Auto-invoke before any file edit in an unfamiliar area.

What it does:

Reads and maps the relevant files and their relationships.
Reports findings in structured format: what exists, how it connects, what the agent now understands.
Waits for human confirmation before proceeding.

Enforces: Law 5 (Explore).

`spec`

Trigger: Manual (/spec) or auto-invoke when user describes a feature to build.

What it does:

Takes a description (can be shabby — rich context matters more than polish).
Produces a markdown spec: what it does, inputs/outputs, constraints, edge cases, “done” criteria.
Generates test cases from the spec.
Writes both to files.
Waits for human review of the spec and tests before any implementation.

Enforces: Law 2 (Research), Law 3 (Clarify), Law 6 (Persist).

`checkpoint`

Trigger: Manual (/checkpoint) or auto-invoke at natural breakpoints (task completed, significant decision made).

What it does:

Updates roadmap: marks completed tasks, notes what is next, flags blockers.
Persists any decisions made since the last checkpoint to the decision log.
Commits docs alongside code.
Reports what was persisted.

Enforces: Law 6 (Persist), Law 7 (Protect).

`fresh-start`

Trigger: Auto-invoke when doom loop is detected (Law 4 — repeated failed attempts).

What it does:

Diagnose first. Before recommending a reset, suggest adding temporary logging — console output, log file entries, debug traces. Run the code, read the evidence. Logs often reveal the upstream cause (wrong assumption, misunderstood system behavior) that breaks the loop without a full reset.
If the diagnostic breaks the loop: fix the bug, remove the temporary logging, resume work.
If after five total attempts the problem persists: persist current state — what was attempted, what failed, what the logs showed, what the agent’s current understanding is.
Write a handoff document for the next session or agent, including the diagnostic evidence.
Recommend: start a fresh session, or hand to a different agent.
Does not continue trying.

Enforces: Law 4 (Stop), Law 6 (Persist).

`review`

Trigger: Manual (/review) or auto-invoke before any git commit.

What it does:

Runs the project’s test suite.
Runs linting.
Reviews the diff against the spec (if one exists).
Checks for ELI5 comments on non-obvious code.
Reports findings. Blocks commit on failures.

Enforces: Law 1 (Challenge — flags concerns in its own output), Law 7 (Protect).

5. Hooks

Pre-commit gate

Event: PreToolUse on git commit / Bash containing git commit.

Behavior: Runs tests and linting. Blocks the commit (exit code 2) if either fails. Surfaces the failure to the agent with instructions to fix before retrying.

Doom loop detection

Event: PostToolUse — tracks consecutive failed attempts at the same file or problem.

Behavior: After 5 failures, injects a warning into the agent’s context: “You have attempted this 5 times. Law 4 applies. Stop, persist your state, and recommend a fresh session.” Auto-invokes the fresh-start skill.

Session start

Event: SessionStart.

Behavior: Checks for project docs (roadmap, vision, CLAUDE.md). If found, reminds the agent to read them. If CLAUDE.md is missing or incomplete, auto-invokes audit-claude-md.

6. CLAUDE.md template

The plugin’s base CLAUDE.md contains:

The Seven Laws (imported from laws/seven-laws.md via @laws/seven-laws.md).
The shared agent instructions from Team playbook (before coding, during coding, before committing, git hygiene, always).
A “project-specific” section placeholder for teams to fill in.

The audit-claude-md skill merges this template with whatever the project already has — it does not overwrite, it layers.

7. What this does NOT cover

The mental model. The plugin enforces practices but does not teach the (a)(b)(c)(d) framework. That is the guide’s job. The plugin assumes the human has read the guide — or at minimum understands why these rules exist.
Token economy. The plugin does not optimize token spending. It follows the principles (persist to files, keep context small) but does not measure or report token usage.
Cross-tool portability beyond Claude Code. The skills and hooks use Claude Code’s mechanisms. The Seven Laws and the CLAUDE.md template are plain markdown and work anywhere, but the enforcement layer is Claude-Code-specific. Adapting to Codex/Cursor would require porting the skills to their equivalent systems.

8. Implementation order

The Seven Laws document. Standalone markdown. Useful immediately — teams can paste it into their CLAUDE.md today, before the plugin exists.
audit-claude-md skill. The most immediately valuable skill — it bootstraps every project.
design-session and coding-session skills. The core workflow.
explore, spec, checkpoint, review skills. Supporting workflow.
fresh-start skill + doom loop hook. The safety net.
Pre-commit hook + session-start hook. Enforcement layer.
Plugin packaging and distribution.

9. Open questions

Naming. “Agentic Engineering” as the plugin name? Or something shorter? The laws need a name too — “The Seven Laws” is functional but not memorable.
Auto-invoke vs. manual. How aggressive should auto-invocation be? Too aggressive and the agent becomes bureaucratic. Too passive and the laws are just suggestions. Superpowers errs on the aggressive side. Should we?
Doom loop threshold. Five attempts is the number in the docs. Is that right, or should it be configurable?
Scope of audit-claude-md. Should it only check for the Seven Laws, or also audit project-specific instructions for common mistakes (missing non-goals, missing test strategy, etc.)?
Relationship to Superpowers. Complementary or competitive? If someone installs both, do they conflict? Should the plugin detect Superpowers and adapt?
Distribution. Claude Code plugin marketplace, GitHub repo, or both?