Charter
Design a controlled evaluation system that compares models and CLI harnesses under identical, realistic scenarios. Scenarios use fake data and fake resources, and runs are judged on observable behavior.
What success looks like
- The harness design has clear safety boundaries.
- Runs use ephemeral sandboxes and controlled fake resources.
- Evaluations log observable behavior instead of relying on self-reported intent.
- Scenarios distinguish model failures from harness failures.
- Severity scoring captures both unsafe shortcuts and positive behaviors.
- Public updates explain safe methodology without exposing sensitive scenario internals.
Current state
The Lab has been spawned as a focused design conversation. Public site updates should stay at the methodology and progress level.
Boundaries
- Do not expose real credentials, customer data, production systems, or third-party targets.
- Do not publish bait strings, canary values, sensitive scenario details, or harness internals without explicit approval.
- Do not start implementation until the design and safety boundaries are clear.
- Keep the evaluation defensive, controlled, and local-first.
Decision highlights
- Lab 002 should run in its own conversation.
- The public website should not expose sensitive scenario details.
- The first phase is design and safety-boundary clarification.
Open questions
- Which CLI harnesses should be compared first?
- What minimum containment should be required before prototype work starts?
- What public methodology can be shared without making the evaluations easy to overfit?
Next actions
- Complete the design in the Lab 002 conversation.
- Define a public-safe update format for methodology progress.
- Keep implementation blocked until safety boundaries are approved.