LAB 002: Autonomy Safety Harness

Charter

Design a controlled evaluation system that compares models and CLI harnesses under identical realistic scenarios using fake data, fake resources, and observable behavior.

What does success look like

The harness design has clear safety boundaries.
Runs use ephemeral sandboxes and controlled fake resources.
Observable behavior is logged instead of relying on self-reported intent.
Scenarios distinguish model failures from harness failures.
Severity scoring captures both unsafe shortcuts and positive behaviors.
Public updates explain the methodology at a public-safe level without exposing sensitive scenario internals.
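
As a rough illustration of the criteria above, the sketch below shows one way a run could use a throwaway sandbox, log observed actions, and score them. It is a minimal sketch and not the Lab's design: every name in it (Event, RunLog, run_scenario, the actor labels, the fake_config.txt file, the severity scale) is a hypothetical placeholder chosen for this example.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Event:
    """One action observed by the harness, not self-reported by the model."""
    actor: str     # hypothetical labels: "model" or "harness"
    action: str    # e.g. "read_file", "asked_for_confirmation"
    target: str    # path relative to the sandbox root
    severity: int  # assumed scale: negative = unsafe shortcut, positive = good behavior


@dataclass
class RunLog:
    events: list = field(default_factory=list)

    def record(self, actor, action, target, severity=0):
        self.events.append(Event(actor, action, target, severity))

    def score(self):
        # Keep penalties and credits separate so one cannot mask the other.
        penalties = sum(e.severity for e in self.events if e.severity < 0)
        credits = sum(e.severity for e in self.events if e.severity > 0)
        return {"penalties": penalties, "credits": credits}

    def failures_by_actor(self):
        # Attribute negative events to the model or to the harness.
        counts = {}
        for e in self.events:
            if e.severity < 0:
                counts[e.actor] = counts.get(e.actor, 0) + 1
        return counts

    def to_jsonl(self):
        # One JSON object per line, suitable for an append-only log file.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


def run_scenario(scenario_fn):
    """Run one scenario in a temporary directory seeded with fake data only."""
    log = RunLog()
    with tempfile.TemporaryDirectory(prefix="lab002-") as root:
        sandbox = Path(root)
        # Fake resource: a placeholder value, never a real credential.
        (sandbox / "fake_config.txt").write_text("placeholder=not-a-real-secret\n")
        scenario_fn(sandbox, log)
    # The sandbox directory is deleted when the with-block exits.
    return log


def example_scenario(sandbox, log):
    # Stand-in for a real model run; it only records hand-written events.
    log.record("model", "read_file", "fake_config.txt", severity=0)
    log.record("model", "asked_for_confirmation", "fake_config.txt", severity=1)
    log.record("model", "skipped_check", "fake_config.txt", severity=-2)
    log.record("harness", "tool_timeout", "fake_config.txt", severity=-1)
```

Calling run_scenario(example_scenario) returns a log whose score() reports penalties and credits separately, and whose failures_by_actor() counts negative events for the model and for the harness independently. Keeping the two totals apart is one possible way to avoid a positive behavior cancelling out an unsafe shortcut in a single number.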

Current state

The Lab has been spawned as a focused design conversation. Public site updates should stay at the methodology and progress level.

Boundaries

Do not expose real credentials, customer data, production systems, or third-party targets.
Do not publish bait strings, canary values, sensitive scenario details, or harness internals without explicit approval.
Do not start implementation until design and safety boundaries are clear.
Keep the evaluation defensive, controlled, and local-first.
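
One way the publishing boundary could be checked mechanically is a pre-publication scan of each draft update against a privately held list of blocked terms. The sketch below assumes such a list exists and is kept local; the function name find_blocked_terms and the sample terms are hypothetical, and a scan like this would supplement explicit approval, not replace it.

```python
def find_blocked_terms(draft: str, blocked_terms: set[str]) -> list[str]:
    """Return the blocked terms that appear in the draft, case-insensitively."""
    lowered = draft.lower()
    return sorted(term for term in blocked_terms if term.lower() in lowered)


def is_public_safe(draft: str, blocked_terms: set[str]) -> bool:
    # A draft passes only if no blocked term is present.
    return not find_blocked_terms(draft, blocked_terms)


# Placeholder terms for illustration only; real values would stay in a local,
# unpublished file and never appear in shared code or public updates.
EXAMPLE_BLOCKED = {"example-canary-0001", "example-bait-string"}
```

A draft such as "Design review complete; containment criteria drafted." would pass with the placeholder list, while any draft containing one of the listed terms would be flagged for revision before publication.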

Decision highlights

Lab 002 should run in its own conversation.
The public website should not expose sensitive scenario details.
The first phase is design and safety-boundary clarification.

Open questions

Which CLI harnesses should be compared first?
What minimum containment should be required before prototype work starts?
What public methodology can be shared without making the evaluations easy to overfit?

Next actions

Complete the design in the Lab 002 conversation.
Define a public-safe update format for methodology progress.
Keep implementation blocked until safety boundaries are approved.

Public progress from this Lab

Autonomy safety harness seeded

Connected workstreams

Tokenmaxxing Labs Operating Loop
