feat(skills): backport adversarial UX optional skill

2026-04-22 10:36:30 -04:00
9 changed files with 232 additions and 274 deletions
--- a/hermes_cli/tools_config.py
+++ b/hermes_cli/tools_config.py
@@ -57,7 +57,7 @@ CONFIGURABLE_TOOLSETS = [
    ("moa",             "🧠 Mixture of Agents",         "mixture_of_agents"),
    ("tts",             "🔊 Text-to-Speech",            "text_to_speech"),
    ("skills",          "📚 Skills",                    "list, view, manage"),
-    ("todo",            "📋 Task Planning",             "todo, ultraplan"),
+    ("todo",            "📋 Task Planning",             "todo"),
    ("memory",          "💾 Memory",                    "persistent memory across sessions"),
    ("session_search",  "🔎 Session Search",            "search past conversations"),
    ("clarify",         "❓ Clarifying Questions",      "clarify"),
--- a/optional-skills/dogfood/adversarial-ux-test/SKILL.md
+++ b/optional-skills/dogfood/adversarial-ux-test/SKILL.md
@@ -0,0 +1,190 @@
+---
+name: adversarial-ux-test
+description: Roleplay the most difficult, tech-resistant user for your product. Browse the app as that persona, find every UX pain point, then filter complaints through a pragmatism layer to separate real problems from noise. Creates actionable tickets from genuine issues only.
+version: 1.0.0
+author: Omni @ Comelse
+license: MIT
+metadata:
+  hermes:
+    tags: [qa, ux, testing, adversarial, dogfood, personas, user-testing]
+    related_skills: [dogfood]
+---
+
+# Adversarial UX Test
+
+Roleplay the worst-case user for your product — the person who hates technology, doesn't want your software, and will find every reason to complain. Then filter their feedback through a pragmatism layer to separate real UX problems from "I hate computers" noise.
+
+Think of it as an automated "mom test" — but angry.
+
+## Why This Works
+
+Most QA finds bugs. This finds **friction**. A technically correct app can still be unusable for real humans. The adversarial persona catches:
+- Confusing terminology that makes sense to developers but not users
+- Too many steps to accomplish basic tasks
+- Missing onboarding or "aha moments"
+- Accessibility issues (font size, contrast, click targets)
+- Cold-start problems (empty states, no demo content)
+- Paywall/signup friction that kills conversion
+
+The **pragmatism filter** (Phase 3) is what makes this useful instead of just entertaining. Without it, you'd add a "print this page" button to every screen because Grandpa can't figure out PDFs.
+
+## How to Use
+
+Tell the agent:
+```
+"Run an adversarial UX test on [URL]"
+"Be a grumpy [persona type] and test [app name]"
+"Do an asshole user test on my staging site"
+```
+
+You can provide a persona or let the agent generate one based on your product's target audience.
+
+## Step 1: Define the Persona
+
+If no persona is provided, generate one by answering:
+
+1. **Who is the HARDEST user for this product?** (age 50+, non-technical role, decades of experience doing it "the old way")
+2. **What is their tech comfort level?** (the lower the better — WhatsApp-only, paper notebooks, wife set up their email)
+3. **What is the ONE thing they need to accomplish?** (their core job, not your feature list)
+4. **What would make them give up?** (too many clicks, jargon, slow, confusing)
+5. **How do they talk when frustrated?** (blunt, sweary, dismissive, sighing)
+
+### Good Persona Example
+> **"Big Mick" McAllister** — 58-year-old S&C coach. Uses WhatsApp and that's it. His "spreadsheet" is a paper notebook. "If I can't figure it out in 10 seconds I'm going back to my notebook." Needs to log session results for 25 players. Hates small text, jargon, and passwords.
+
+### Bad Persona Example
+> "A user who doesn't like the app" — too vague, no constraints, no voice.
+
+The persona must be **specific enough to stay in character** for 20 minutes of testing.
+
+## Step 2: Become the Asshole (Browse as the Persona)
+
+1. Read any available project docs for app context and URLs
+2. **Fully inhabit the persona** — their frustrations, limitations, goals
+3. Navigate to the app using browser tools
+4. **Attempt the persona's ACTUAL TASKS** (not a feature tour):
+   - Can they do what they came to do?
+   - How many clicks/screens to accomplish it?
+   - What confuses them?
+   - What makes them angry?
+   - Where do they get lost?
+   - What would make them give up and go back to their old way?
+
+5. Test these friction categories:
+   - **First impression** — would they even bother past the landing page?
+   - **Core workflow** — the ONE thing they need to do most often
+   - **Error recovery** — what happens when they do something wrong?
+   - **Readability** — text size, contrast, information density
+   - **Speed** — does it feel faster than their current method?
+   - **Terminology** — any jargon they wouldn't understand?
+   - **Navigation** — can they find their way back? do they know where they are?
+
+6. Take screenshots of every pain point
+7. Check browser console for JS errors on every page
+
+## Step 3: The Rant (Write Feedback in Character)
+
+Write the feedback AS THE PERSONA — in their voice, with their frustrations. This is not a bug report. This is a real human venting.
+
+```
+[PERSONA NAME]'s Review of [PRODUCT]
+
+Overall: [Would they keep using it? Yes/No/Maybe with conditions]
+
+THE GOOD (grudging admission):
+- [things even they have to admit work]
+
+THE BAD (legitimate UX issues):
+- [real problems that would stop them from using the product]
+
+THE UGLY (showstoppers):
+- [things that would make them uninstall/cancel immediately]
+
+SPECIFIC COMPLAINTS:
+1. [Page/feature]: "[quote in persona voice]" — [what happened, expected]
+2. ...
+
+VERDICT: "[one-line persona quote summarizing their experience]"
+```
+
+## Step 4: The Pragmatism Filter (Critical — Do Not Skip)
+
+Step OUT of the persona. Evaluate each complaint as a product person:
+
+- **RED: REAL UX BUG** — Any user would have this problem, not just grumpy ones. Fix it.
+- **YELLOW: VALID BUT LOW PRIORITY** — Real issue but only for extreme users. Note it.
+- **WHITE: PERSONA NOISE** — "I hate computers" talking, not a product problem. Skip it.
+- **GREEN: FEATURE REQUEST** — Good idea hidden in the complaint. Consider it.
+
+### Filter Criteria
+1. Would a 35-year-old competent-but-busy user have the same complaint? → RED
+2. Is this a genuine accessibility issue (font size, contrast, click targets)? → RED
+3. Is this "I want it to work like paper" resistance to digital? → WHITE
+4. Is this a real workflow inefficiency the persona stumbled on? → YELLOW or RED
+5. Would fixing this add complexity for the 80% who are fine? → WHITE
+6. Does the complaint reveal a missing onboarding moment? → GREEN
+
+**This filter is MANDATORY.** Never ship raw persona complaints as tickets.
+
+## Step 5: Create Tickets
+
+For **RED** and **GREEN** items only:
+- Clear, actionable title
+- Include the persona's verbatim quote (entertaining + memorable)
+- The real UX issue underneath (objective)
+- A suggested fix (actionable)
+- Tag/label: "ux-review"
+
+For **YELLOW** items: one catch-all ticket with all notes.
+
+**WHITE** items appear in the report only. No tickets.
+
+**Max 10 tickets per session** — focus on the worst issues.
+
+## Step 6: Report
+
+Deliver:
+1. The persona rant (Step 3) — entertaining and visceral
+2. The filtered assessment (Step 4) — pragmatic and actionable
+3. Tickets created (Step 5) — with links
+4. Screenshots of key issues
+
+## Tips
+
+- **One persona per session.** Don't mix perspectives.
+- **Stay in character during Steps 2-3.** Break character only at Step 4.
+- **Test the CORE WORKFLOW first.** Don't get distracted by settings pages.
+- **Empty states are gold.** New user experience reveals the most friction.
+- **The best findings are RED items the persona found accidentally** while trying to do something else.
+- **If the persona has zero complaints, your persona is too tech-savvy.** Make them older, less patient, more set in their ways.
+- **Run this before demos, launches, or after shipping a batch of features.**
+- **Register as a NEW user when possible.** Don't use pre-seeded admin accounts — the cold start experience is where most friction lives.
+- **Zero WHITE items is a signal, not a failure.** If the pragmatism filter finds no noise, your product has real UX problems, not just a grumpy persona.
+- **Check known issues in project docs AFTER the test.** If the persona found a bug that's already in the known issues list, that's actually the most damning finding — it means the team knew about it but never felt the user's pain.
+- **Subscription/paywall testing is critical.** Test with expired accounts, not just active ones. The "what happens when you can't pay" experience reveals whether the product respects users or holds their data hostage.
+- **Count the clicks to accomplish the persona's ONE task.** If it's more than 5, that's almost always a RED finding regardless of persona tech level.
+
+## Example Personas by Industry
+
+These are starting points — customize for your specific product:
+
+| Product Type | Persona | Age | Key Trait |
+|-------------|---------|-----|-----------|
+| CRM | Retirement home director | 68 | Filing cabinet is the current CRM |
+| Photography SaaS | Rural wedding photographer | 62 | Books clients by phone, invoices on paper |
+| AI/ML Tool | Department store buyer | 55 | Burned by 3 failed tech startups |
+| Fitness App | Old-school gym coach | 58 | Paper notebook, thick fingers, bad eyes |
+| Accounting | Family bakery owner | 64 | Shoebox of receipts, hates subscriptions |
+| E-commerce | Market stall vendor | 60 | Cash only, smartphone is for calls |
+| Healthcare | Senior GP | 63 | Dictates notes, nurse handles the computer |
+| Education | Veteran teacher | 57 | Chalk and talk, worksheets in ring binders |
+
+## Rules
+
+- Stay in character during Steps 2-3
+- Be genuinely mean but fair — find real problems, not manufactured ones
+- The pragmatism filter (Step 4) is **MANDATORY**
+- Screenshots required for every complaint
+- Max 10 tickets per session
+- Test on staging/deployed app, not local dev
+- One persona, one session, one report
--- a/tests/test_optional_adversarial_ux_skill_catalog.py
+++ b/tests/test_optional_adversarial_ux_skill_catalog.py
@@ -0,0 +1,25 @@
+from pathlib import Path
+
+from tools.skills_hub import OptionalSkillSource
+
+
+REPO_ROOT = Path(__file__).resolve().parents[1]
+
+
+def test_optional_skill_source_scans_adversarial_ux_test():
+    source = OptionalSkillSource()
+    metas = {meta.identifier: meta for meta in source._scan_all()}
+
+    assert "official/dogfood/adversarial-ux-test" in metas
+    assert metas["official/dogfood/adversarial-ux-test"].name == "adversarial-ux-test"
+    assert "tech-resistant user" in metas["official/dogfood/adversarial-ux-test"].description
+
+
+def test_optional_skill_catalog_docs_list_adversarial_ux_test():
+    optional_catalog = (REPO_ROOT / "website" / "docs" / "reference" / "optional-skills-catalog.md").read_text(encoding="utf-8")
+    bundled_catalog = (REPO_ROOT / "website" / "docs" / "reference" / "skills-catalog.md").read_text(encoding="utf-8")
+
+    assert "**adversarial-ux-test**" in optional_catalog
+    assert "official/dogfood/adversarial-ux-test" in optional_catalog
+    assert "`adversarial-ux-test`" in bundled_catalog
+    assert "dogfood/adversarial-ux-test" in bundled_catalog
--- a/tests/tools/test_registry.py
+++ b/tests/tools/test_registry.py
@@ -294,32 +294,22 @@ class TestBuiltinDiscovery:
            "tools.browser_tool",
            "tools.clarify_tool",
            "tools.code_execution_tool",
-            "tools.crisis_tool",
            "tools.cronjob_tools",
            "tools.delegate_tool",
            "tools.file_tools",
            "tools.homeassistant_tool",
            "tools.image_generation_tool",
-            "tools.local_inference_tool",
            "tools.memory_tool",
            "tools.mixture_of_agents_tool",
            "tools.process_registry",
            "tools.rl_training_tool",
-            "tools.scavenger_fixer",
            "tools.send_message_tool",
            "tools.session_search_tool",
            "tools.skill_manager_tool",
            "tools.skills_tool",
-            "tools.sovereign_router",
-            "tools.sovereign_scavenger",
-            "tools.sovereign_teleport",
-            "tools.static_analyzer",
-            "tools.symbolic_verify",
            "tools.terminal_tool",
            "tools.todo_tool",
            "tools.tts_tool",
-            "tools.ultraplan",
-            "tools.verify_tool",
            "tools.vision_tools",
            "tools.web_tools",
        }
--- a/tests/tools/test_ultraplan_tool.py
+++ b/tests/tools/test_ultraplan_tool.py
@@ -1,81 +0,0 @@
-import json
-from pathlib import Path
-
-from toolsets import resolve_toolset
-from tools.registry import registry
-
-
-def test_create_action_saves_markdown_and_json(tmp_path):
-    from tools.ultraplan import ultraplan_tool
-
-    result = json.loads(
-        ultraplan_tool(
-            action="create",
-            mission="Daily autonomous planning",
-            streams=[
-                {
-                    "id": "A",
-                    "name": "Backlog burn",
-                    "phases": [
-                        {"id": "A1", "name": "Triage", "artifact": "issue list"},
-                        {"id": "A2", "name": "Ship", "dependencies": ["A1"], "artifact": "PR"},
-                    ],
-                }
-            ],
-            base_dir=str(tmp_path),
-        )
-    )
-
-    assert result["success"] is True
-    assert Path(result["file_path"]).exists()
-    assert Path(result["json_path"]).exists()
-    assert "Work Streams" in Path(result["file_path"]).read_text(encoding="utf-8")
-
-
-def test_load_action_returns_saved_plan(tmp_path):
-    from tools.ultraplan import ultraplan_tool
-
-    created = json.loads(
-        ultraplan_tool(
-            action="create",
-            date="20260422",
-            mission="Mission from saved plan",
-            base_dir=str(tmp_path),
-        )
-    )
-    loaded = json.loads(
-        ultraplan_tool(
-            action="load",
-            date="20260422",
-            base_dir=str(tmp_path),
-        )
-    )
-
-    assert created["success"] is True
-    assert loaded["success"] is True
-    assert loaded["plan"]["mission"] == "Mission from saved plan"
-    assert loaded["file_path"].endswith("ultraplan_20260422.md")
-
-
-def test_cron_spec_returns_daily_schedule_and_prompt():
-    from tools.ultraplan import ultraplan_tool
-
-    result = json.loads(ultraplan_tool(action="cron_spec"))
-
-    assert result["success"] is True
-    assert result["schedule"] == "0 6 * * *"
-    assert "Ultraplan" in result["prompt"]
-    assert "ultraplan_YYYYMMDD.md" in result["prompt"]
-
-
-def test_registry_registers_ultraplan_tool():
-    import tools.ultraplan  # noqa: F401
-
-    entry = registry.get_entry("ultraplan")
-    assert entry is not None
-    assert entry.toolset == "todo"
-
-
-def test_default_toolsets_include_ultraplan():
-    assert "ultraplan" in resolve_toolset("todo")
-    assert "ultraplan" in resolve_toolset("hermes-cli")
--- a/tools/ultraplan.py
+++ b/tools/ultraplan.py
@@ -290,9 +290,6 @@ def load_ultraplan(date: str, base_dir: Path = None) -> Optional[Ultraplan]:
        return None


-DEFAULT_ULTRAPLAN_SCHEDULE = "0 6 * * *"
-
-
 def generate_daily_cron_prompt() -> str:
    """Generate the prompt for the daily ultraplan cron job."""
    return """Generate today's Ultraplan.
@@ -301,9 +298,9 @@ Steps:
 1. Check open Gitea issues assigned to you
 2. Check open PRs needing review
 3. Check fleet health status
-4. Decompose work into parallel streams with concrete phases and artifacts
-5. Use the ultraplan tool to save ~/.timmy/cron/ultraplan_YYYYMMDD.md and the matching JSON sidecar
-6. Optionally file a Gitea issue with the plan summary
+4. Decompose work into parallel streams
+5. Generate ultraplan_YYYYMMDD.md
+6. File Gitea issue with the plan

 Output format:
 - Mission statement
@@ -311,176 +308,3 @@ Output format:
 - Dependency map
 - Success metrics
 """
-
-
-def generate_daily_cron_job_spec(schedule: str = DEFAULT_ULTRAPLAN_SCHEDULE) -> Dict[str, str]:
-    """Return a reusable cron job spec for daily Ultraplan generation."""
-    return {
-        "name": "Daily Ultraplan",
-        "schedule": schedule,
-        "prompt": generate_daily_cron_prompt(),
-        "path_pattern": "~/.timmy/cron/ultraplan_YYYYMMDD.md",
-    }
-
-
-def _resolve_base_dir(base_dir: Optional[str | Path]) -> Path:
-    """Normalize the requested Ultraplan base directory."""
-    if base_dir is None:
-        return Path.home() / ".timmy" / "cron"
-    return Path(base_dir).expanduser()
-
-
-def ultraplan_tool(
-    action: str,
-    date: Optional[str] = None,
-    mission: str = "",
-    streams: Optional[List[Dict[str, Any]]] = None,
-    metrics: Optional[Dict[str, Any]] = None,
-    notes: str = "",
-    base_dir: Optional[str] = None,
-) -> str:
-    """Create/load Ultraplan artifacts and expose a daily cron spec."""
-    from tools.registry import tool_error, tool_result
-
-    action = (action or "").strip().lower()
-    resolved_base_dir = _resolve_base_dir(base_dir)
-
-    try:
-        if action == "create":
-            plan = create_ultraplan(date=date, mission=mission, streams=streams or [])
-            if metrics:
-                plan.metrics = metrics
-            if notes:
-                plan.notes = notes
-            md_path = save_ultraplan(plan, base_dir=resolved_base_dir)
-            json_path = resolved_base_dir / f"ultraplan_{plan.date}.json"
-            return tool_result(
-                success=True,
-                action="create",
-                date=plan.date,
-                file_path=str(md_path),
-                json_path=str(json_path),
-                plan=plan.to_dict(),
-            )
-
-        if action == "load":
-            plan_date = date or datetime.now().strftime("%Y%m%d")
-            plan = load_ultraplan(plan_date, base_dir=resolved_base_dir)
-            if plan is None:
-                return tool_error(
-                    f"No Ultraplan found for {plan_date}",
-                    success=False,
-                    action="load",
-                    date=plan_date,
-                )
-            return tool_result(
-                success=True,
-                action="load",
-                date=plan.date,
-                file_path=str(resolved_base_dir / f"ultraplan_{plan.date}.md"),
-                json_path=str(resolved_base_dir / f"ultraplan_{plan.date}.json"),
-                plan=plan.to_dict(),
-                markdown=plan.to_markdown(),
-            )
-
-        if action == "cron_spec":
-            spec = generate_daily_cron_job_spec()
-            return tool_result(success=True, action="cron_spec", **spec)
-
-        return tool_error(
-            f"Unknown Ultraplan action: {action}",
-            success=False,
-            action=action,
-        )
-    except Exception as e:
-        return tool_error(f"Ultraplan {action or 'tool'} failed: {e}", success=False, action=action)
-
-
-ULTRAPLAN_SCHEMA = {
-    "name": "ultraplan",
-    "description": (
-        "Create or load daily Ultraplan planning artifacts under ~/.timmy/cron/ and "
-        "return a reusable cron spec for autonomous planning. Use this when you want "
-        "a concrete markdown/json plan file with streams, phases, dependencies, and metrics."
-    ),
-    "parameters": {
-        "type": "object",
-        "properties": {
-            "action": {
-                "type": "string",
-                "enum": ["create", "load", "cron_spec"],
-                "description": "Operation to perform",
-            },
-            "date": {
-                "type": "string",
-                "description": "Plan date as YYYYMMDD. Defaults to today for create/load.",
-            },
-            "mission": {
-                "type": "string",
-                "description": "High-level mission statement for today's plan.",
-            },
-            "streams": {
-                "type": "array",
-                "description": "Optional work streams with phases/artifacts/dependencies for create.",
-                "items": {
-                    "type": "object",
-                    "properties": {
-                        "id": {"type": "string"},
-                        "name": {"type": "string"},
-                        "phases": {
-                            "type": "array",
-                            "items": {
-                                "type": "object",
-                                "properties": {
-                                    "id": {"type": "string"},
-                                    "name": {"type": "string"},
-                                    "description": {"type": "string"},
-                                    "artifact": {"type": "string"},
-                                    "dependencies": {
-                                        "type": "array",
-                                        "items": {"type": "string"},
-                                    },
-                                },
-                                "required": ["name"],
-                            },
-                        },
-                    },
-                    "required": ["name"],
-                },
-            },
-            "metrics": {
-                "type": "object",
-                "description": "Optional success metrics to store on the plan.",
-                "additionalProperties": True,
-            },
-            "notes": {
-                "type": "string",
-                "description": "Optional free-form notes appended to the saved plan.",
-            },
-            "base_dir": {
-                "type": "string",
-                "description": "Optional override for the Ultraplan storage directory.",
-            },
-        },
-        "required": ["action"],
-    },
-}
-
-
-from tools.registry import registry
-
-registry.register(
-    name="ultraplan",
-    toolset="todo",
-    schema=ULTRAPLAN_SCHEMA,
-    handler=lambda args, **_kw: ultraplan_tool(
-        action=args.get("action", ""),
-        date=args.get("date"),
-        mission=args.get("mission", ""),
-        streams=args.get("streams"),
-        metrics=args.get("metrics"),
-        notes=args.get("notes", ""),
-        base_dir=args.get("base_dir"),
-    ),
-    emoji="🗺️",
-)
--- a/toolsets.py
+++ b/toolsets.py
@@ -47,7 +47,7 @@ _HERMES_CORE_TOOLS = [
    # Text-to-speech
    "text_to_speech",
    # Planning & memory
-    "todo", "ultraplan", "memory",
+    "todo", "memory",
    # Session history search
    "session_search",
    # Clarifying questions
@@ -157,8 +157,8 @@ TOOLSETS = {
    },
    
    "todo": {
-        "description": "Task planning and tracking for multi-step work, including daily Ultraplan artifacts",
-        "tools": ["todo", "ultraplan"],
+        "description": "Task planning and tracking for multi-step work",
+        "tools": ["todo"],
        "includes": []
    },
    
--- a/website/docs/reference/optional-skills-catalog.md
+++ b/website/docs/reference/optional-skills-catalog.md
@@ -16,6 +16,7 @@ For example:

 ```bash
 hermes skills install official/blockchain/solana
+hermes skills install official/dogfood/adversarial-ux-test
 hermes skills install official/mlops/flash-attention
 ```

@@ -56,6 +57,12 @@ hermes skills uninstall <skill-name>
 | **blender-mcp** | Control Blender directly from Hermes via socket connection to the blender-mcp addon. Create 3D objects, materials, animations, and run arbitrary Blender Python (bpy) code. |
 | **meme-generation** | Generate real meme images by picking a template and overlaying text with Pillow. Produces actual `.png` meme files. |

+## Dogfood
+
+| Skill | Description |
+|-------|-------------|
+| **adversarial-ux-test** | Roleplay the most difficult, tech-resistant user for a product — browse in-persona, rant, then filter through a RED/YELLOW/WHITE/GREEN pragmatism layer so only real UX friction becomes tickets. |
+
 ## DevOps

 | Skill | Description |
--- a/website/docs/reference/skills-catalog.md
+++ b/website/docs/reference/skills-catalog.md
@@ -59,9 +59,12 @@ DevOps and infrastructure automation skills.

 ## dogfood

+Internal dogfooding and QA skills used to test Hermes Agent itself.
+
 | Skill | Description | Path |
 |-------|-------------|------|
 | `dogfood` | Systematic exploratory QA testing of web applications — find bugs, capture evidence, and generate structured reports. | `dogfood/dogfood` |
+| `adversarial-ux-test` | Roleplay the most difficult, tech-resistant user for a product — browse in-persona, rant, then filter through a RED/YELLOW/WHITE/GREEN pragmatism layer so only real UX friction becomes tickets. | `dogfood/adversarial-ux-test` |
 | `hermes-agent-setup` | Help users configure Hermes Agent — CLI usage, setup wizard, model/provider selection, tools, skills, voice/STT/TTS, gateway, and troubleshooting. | `dogfood/hermes-agent-setup` |

 ## email