User Flows
This document maps all user journeys through the Assay system, organized by user type and use case.
User Types
- Agent Developer: Builds AI agents and needs to validate their behavior
- Platform Engineer: Integrates Assay into CI/CD pipelines
- Security Engineer: Configures runtime security and policies
- Python Developer: Uses the Python SDK for agent development
Flow 1: Initial Setup & First Test (Agent Developer)
flowchart TD
start[Developer starts] --> install[Install Assay CLI]
install --> init[Run assay init]
init --> detect[Auto-detect project type]
detect --> gen[Generate eval.yaml + policy.yaml]
gen --> capture[Capture traces]
capture --> validate[Run assay validate]
validate --> pass{Pass?}
pass -->|Yes| success[Success: Agent validated]
pass -->|No| fix[Fix agent or relax policy]
fix --> validate
success --> ci[Add to CI]
Steps:
1. Install: pip install assay or download binary
2. Initialize: assay init - auto-detects project, generates secure defaults
3. Capture traces: Use AssayClient or assay import to record tool calls
4. Validate: assay validate --config eval.yaml --trace-file traces.jsonl
5. Iterate: Fix agent or adjust policy until validation passes
6. CI Integration: Add assay ci to CI pipeline
Flow 2: CI/CD Regression Gate (Platform Engineer)
flowchart TD
pr[Pull Request Created] --> trigger[CI Pipeline Triggered]
trigger --> checkout[Checkout Code]
checkout --> tests[Run Tests with Assay]
tests --> action["Rul1an/assay/assay-action@v2"]
action --> verify[Verify Evidence Bundles]
verify --> lint[Lint for Security Issues]
lint --> sarif[Upload SARIF to Security Tab]
sarif --> comment[PR Comment if Findings]
comment --> gate{All Pass?}
gate -->|Yes| merge[Allow Merge]
gate -->|No| block[Block PR + Report]
block --> fix[Developer fixes]
fix --> pr
Steps:
1. PR created: Developer opens a pull request
2. CI triggered: GitHub Actions runs
3. Tests run: Tests generate evidence bundles (.assay/evidence/*.tar.gz); assay run/assay ci also write run.json and summary.json (exit_code, reason_code, seeds, judge_metrics, and, when SARIF was truncated, sarif.omitted per SPEC-PR-Gate-Outputs-v1, PR #160)
4. Action verifies: Rul1an/assay/assay-action@v2 verifies and lints bundles
5. Reporting: SARIF (truncated at 25k results by default when needed) is uploaded to the GitHub Security tab; run.json/summary.json carry sarif.omitted when truncated so CI has authoritative counts. A PR comment is posted if there are issues; the job summary shows seeds and judge metrics from the console footer
6. Gate decision: Exit code 0 = pass; 1 = fail (test failure or E_JUDGE_UNCERTAIN when judge abstains); 2 = config error; 3 = infra/judge unavailable
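The exit-code contract in the gate decision step can be mirrored in a small helper that a pipeline script might run after assay ci. This is a minimal sketch: the exit codes and the exit_code, reason_code, and sarif.omitted names come from the steps above, while the exact JSON layout of summary.json (flat keys plus a nested sarif object) and the helper name gate_decision are assumptions made for illustration.

```python
import json

# Exit-code meanings as listed in the gate decision step.
EXIT_MEANINGS = {
    0: "pass",
    1: "fail",
    2: "config error",
    3: "infra/judge unavailable",
}


def gate_decision(summary: dict) -> tuple[bool, str]:
    """Return (allow_merge, human-readable label) for a parsed summary.

    Assumes flat `exit_code` and `reason_code` keys and an optional
    nested `sarif.omitted` count; the real schema may differ.
    """
    code = summary.get("exit_code")
    label = EXIT_MEANINGS.get(code, "unknown exit code")
    reason = summary.get("reason_code")
    if reason:
        label += f" ({reason})"
    omitted = (summary.get("sarif") or {}).get("omitted", 0)
    if omitted:
        label += f"; {omitted} SARIF results omitted"
    # Only exit code 0 allows the merge.
    return code == 0, label


if __name__ == "__main__":
    sample = json.loads('{"exit_code": 1, "reason_code": "E_JUDGE_UNCERTAIN"}')
    print(gate_decision(sample))
```

Treating every non-zero code as blocking keeps the sketch conservative; a real pipeline might choose to retry on code 3 instead of failing outright.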
Configuration (Recommended):
# .github/workflows/assay.yml
name: AI Agent Security
on:
  push:
    branches: [main]
  pull_request:
permissions:
  contents: read
  security-events: write
  pull-requests: write
jobs:
  assay:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests with Assay
        run: |
          curl -fsSL https://getassay.dev/install.sh | sh
          assay ci --config ci-eval.yaml --trace-file traces/ci.jsonl --sarif .assay/reports/sarif.json --junit .assay/reports/junit.xml
      - name: Verify AI agent behavior
        uses: Rul1an/assay/assay-action@v2
        with:
          fail_on: error
Alternative (CLI-only):
- name: Run Assay
  run: assay ci --config eval.yaml --trace-file traces.jsonl --sarif assay-results.sarif --junit junit.xml
- name: Upload SARIF
  uses: github/codeql-action/upload-sarif@v4
  with:
    sarif_file: assay-results.sarif
Flow 3: Trace Recording & Replay (Agent Developer)
flowchart TD
start[Agent Development] --> record[Record Traces]
record --> python[Python SDK: AssayClient.record_trace]
record --> cli[CLI: assay import]
python --> jsonl[Write to traces.jsonl]
cli --> jsonl
jsonl --> precompute[Precompute embeddings]
precompute --> store[Store in SQLite]
store --> replay[Replay for testing]
replay --> metrics[Evaluate metrics]
metrics --> report[Generate report]
Recording Methods:
- Python SDK: AssayClient.record_trace
- CLI Import: assay import
- Pytest Plugin:
Replay Flow:
Flow 4: Policy Development & Learning Mode (Security Engineer)
flowchart TD
start[Start Policy Development] --> profile[Capture Command Behavior]
profile --> record[assay record --output policy.yaml -- command]
record --> policy[Generated policy.yaml]
policy --> review[Review & Refine]
review --> test[Test with traces]
test --> coverage{Coverage OK?}
coverage -->|No| refine[Refine policy]
refine --> test
coverage -->|Yes| deploy[Deploy to CI/Production]
Learning Mode Commands:
- Capture + generate policy: assay record --output policy.yaml -- <your-command>
- Optional generate from existing trace: assay generate -i traces.jsonl --output policy.yaml
- Review: Edit generated policy to add custom constraints
- Test: assay validate --config eval.yaml --trace-file traces.jsonl
- Deploy: Commit policy.yaml to repository
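To make the idea behind learning mode concrete, the sketch below derives a simple allowlist from recorded tool calls. It is not how assay record or assay generate work internally; the trace field name tool, the function name learn_allowlist, and the output shape are all hypothetical. It only illustrates turning observed behavior into a starting policy that is then reviewed by hand.

```python
import json
from collections import Counter


def learn_allowlist(jsonl_lines, min_count=1):
    """Build a hypothetical allowlist policy from JSONL trace lines.

    Each line is assumed to be a JSON object with a `tool` field naming
    the tool that was called; blank lines and events without that field
    are ignored.
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        tool = event.get("tool")
        if tool:
            counts[tool] += 1
    # Keep tools seen at least `min_count` times, sorted for stable output.
    allowed = sorted(t for t, n in counts.items() if n >= min_count)
    return {"allow": allowed}


if __name__ == "__main__":
    trace = [
        '{"tool": "read_file", "args": {"path": "README.md"}}',
        '{"tool": "search", "args": {"q": "assay"}}',
        '{"tool": "read_file", "args": {"path": "docs/index.md"}}',
    ]
    print(learn_allowlist(trace))
```

Raising min_count is one way to keep rarely seen tools out of the generated draft so they get an explicit review before being allowed.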
Flow 5: Runtime Security (Security Engineer)
flowchart TD
start[Production Deployment] --> mcp[Start MCP Wrapper]
mcp --> proxy[assay mcp wrap --policy assay.yaml -- command]
proxy --> agent[Agent connects]
agent --> toolcall[Agent makes tool call]
toolcall --> check[Policy check]
check --> allowed{Allowed?}
allowed -->|Yes| execute[Execute tool]
allowed -->|No| block[Block + Log]
execute --> monitor[Monitor with eBPF]
monitor --> kernel[Kernel enforcement]
kernel --> audit[Audit log]
Runtime Security Setup:
- MCP Server:
- Kernel Monitor (Linux only):
- Agent Integration: Start your MCP server through assay mcp wrap so calls are intercepted before execution
Tier 1 (Kernel) vs Tier 2 (Userspace):
- Tier 1: Exact paths, CIDRs, ports → enforced in kernel via eBPF/LSM
- Tier 2: Glob/regex patterns, complex constraints → enforced in userspace (MCP wrapper/proxy)
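The tier split can be pictured as a classification over rule values. The sketch below is a rough heuristic and not Assay's actual compiler logic: the function name classify_rule and the character set used to detect patterns are assumptions. It sends exact paths, CIDRs, and ports to Tier 1 and everything that looks like a glob or regex to Tier 2.

```python
import ipaddress

# Characters that suggest a glob or regex rather than an exact value.
GLOB_OR_REGEX_CHARS = set("*?[]{}()|^$+\\")


def classify_rule(value: str) -> int:
    """Return 1 for rules that can be matched exactly, else 2."""
    if any(ch in GLOB_OR_REGEX_CHARS for ch in value):
        return 2  # pattern matching is left to userspace
    if value.isascii() and value.isdigit() and 0 < int(value) <= 65535:
        return 1  # a single port number
    try:
        ipaddress.ip_network(value, strict=False)
        return 1  # an IP address or CIDR block
    except ValueError:
        pass
    if value.startswith("/"):
        return 1  # an exact absolute path
    return 2  # anything else is treated as a complex constraint


if __name__ == "__main__":
    for rule in ["/etc/passwd", "10.0.0.0/8", "443", "/home/*/.ssh", "^/tmp/.*$"]:
        print(rule, "-> Tier", classify_rule(rule))
```

Defaulting unknown values to Tier 2 is a deliberate choice in this sketch: a rule that cannot be expressed exactly is still enforced in userspace instead of being dropped.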
Flow 6: Baseline Regression Testing (Platform Engineer)
flowchart TD
main[Main Branch] --> baseline[Export Baseline]
baseline --> cmd1[assay run --export-baseline baseline.json]
cmd1 --> store[Store baseline.json]
store --> pr[Feature Branch PR]
pr --> compare[assay run --baseline baseline.json]
compare --> check{Score >= Baseline?}
check -->|Yes| pass[Pass: Allow merge]
check -->|No| fail[Fail: Block PR]
fail --> fix[Fix regression]
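fix --> compare
Baseline Workflow:
- On main branch: Export baseline after successful run (assay run --export-baseline baseline.json)
- On feature branch: Compare against baseline (assay run --baseline baseline.json)
- Gate: If the score drops more than the threshold (default 5%) below the baseline, the PR is blocked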
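The gate can be sketched as a per-metric comparison with a tolerance. The format of baseline.json is not shown here, so the example works on plain dictionaries of metric name to score; treating the 5% default as an absolute drop on a 0 to 1 scale, and the function name find_regressions, are assumptions for illustration only.

```python
def find_regressions(baseline: dict, current: dict, max_drop: float = 0.05) -> list:
    """Return metric names whose score fell more than `max_drop` below baseline.

    Scores are assumed to be floats in [0, 1]; a metric missing from the
    current run is treated as a regression.
    """
    regressed = []
    for metric, base_score in sorted(baseline.items()):
        score = current.get(metric)
        if score is None or base_score - score > max_drop:
            regressed.append(metric)
    return regressed


if __name__ == "__main__":
    baseline = {"faithfulness": 0.90, "tool_accuracy": 0.80}
    current = {"faithfulness": 0.88, "tool_accuracy": 0.70}
    failed = find_regressions(baseline, current)
    print("Block PR:" if failed else "Allow merge", failed)
```

An empty result corresponds to the "Allow merge" branch of the diagram; any listed metric corresponds to "Block PR".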
Flow 7: Python SDK Usage (Python Developer)
flowchart TD
start[Python Developer] --> install[Install SDK]
install --> pip[pip install assay]
pip --> import[Import Assay]
import --> record[Record Traces]
record --> validate[Validate Coverage]
validate --> explain[Explain Violations]
explain --> iterate[Iterate on Agent]
iterate --> record
Python SDK Flow:
- Installation: pip install assay
- Recording:
- Validation:
- Explanation:
Flow 8: MCP Integration (Agent Developer)
flowchart TD
start[Agent with MCP] --> connect[Connect to MCP Server]
connect --> list[List Tools]
list --> call[Call Tool]
call --> proxy[Assay MCP Proxy]
proxy --> policy[Check Policy]
policy --> allowed{Allowed?}
allowed -->|Yes| forward[Forward to Real MCP Server]
allowed -->|No| reject[Reject + Return Error]
forward --> execute[Execute Tool]
execute --> response[Return Response]
reject --> response
response --> agent[Agent Receives Response]
MCP Integration Steps:
- Start MCP wrapper: assay mcp wrap --policy assay.yaml -- <real-mcp-command>
- Agent connects: Agent connects through the wrapped MCP process
- Tool calls intercepted: Assay validates against policy before forwarding
- Audit logging: All tool calls logged for compliance
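The intercept, check, forward-or-reject pattern in these steps can be sketched in a few lines. This is an illustration of the pattern only, not the assay mcp wrap implementation: the allowlist-style policy, the audit record fields, and the function name handle_tool_call are hypothetical, and the forward callable stands in for the real MCP server.

```python
from typing import Any, Callable


def handle_tool_call(
    allowed_tools: set,
    audit_log: list,
    name: str,
    args: dict,
    forward: Callable[[str, dict], Any],
) -> dict:
    """Check a tool call against an allowlist, then forward or reject it."""
    allowed = name in allowed_tools
    # Every call is logged, whether it is allowed or not.
    audit_log.append({"tool": name, "args": args, "allowed": allowed})
    if not allowed:
        # Rejected calls never reach the real server.
        return {"error": f"tool '{name}' blocked by policy"}
    return {"result": forward(name, args)}


if __name__ == "__main__":
    log = []
    echo = lambda name, args: {"echo": args}  # stand-in for the real MCP server
    print(handle_tool_call({"read_file"}, log, "read_file", {"path": "a.txt"}, echo))
    print(handle_tool_call({"read_file"}, log, "delete_file", {"path": "a.txt"}, echo))
    print(log)
```

Logging before the decision branches keeps blocked calls in the audit trail, which matches the "Block + Log" and "Audit logging" steps.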
Flow 9: Debugging & Diagnostics (All Users)
flowchart TD
issue[Issue Detected] --> doctor[assay doctor]
doctor --> analyze[Analyze Config + Traces]
analyze --> report[Report Issues]
report --> fix[Fix Issues]
fix --> validate[assay validate]
validate --> fixed{Fixed?}
fixed -->|No| explain[assay explain]
explain --> fix
fixed -->|Yes| done[Done]
Debugging Commands:
- Doctor: assay doctor (diagnoses common issues)
- Explain: assay explain --trace trace.jsonl --policy policy.yaml (explains violations)
- Validate: assay validate --config eval.yaml --trace-file trace.jsonl (validates traces)
- Coverage: assay coverage --trace-file trace.jsonl (shows coverage)
Flow 10: Migration & Upgrades (Platform Engineer)
flowchart TD
old[Old Config Format] --> migrate[assay migrate]
migrate --> preview[Preview Changes]
preview --> apply[Apply Migration]
apply --> backup[Backup Old Config]
backup --> write[Write New Config]
write --> validate[Validate New Config]
validate --> test[Test with Traces]
test --> works{Works?}
works -->|No| rollback[Rollback]
works -->|Yes| commit[Commit Changes]
Migration Flow:
- Preview: assay migrate --config old.yaml --dry-run
- Apply: assay migrate --config old.yaml
- Validate: assay validate --config new.yaml
- Test: Run full test suite
- Commit: If successful, commit new config
Flow 11: Evidence & Compliance (Security/Compliance Engineer)
flowchart TD
start[Profile Captured] --> export[assay evidence export]
export --> bundle[Evidence Bundle .tar.gz]
bundle --> verify[assay evidence verify]
verify --> check{Verified?}
check -->|No| alert[Alert: Tampering detected]
check -->|Yes| lint[assay evidence lint]
lint --> sarif[SARIF Report]
sarif --> findings{Findings?}
findings -->|Yes| review[Review & Remediate]
findings -->|No| store[Store for Audit]
review --> export
store --> query[Query for Compliance]
Evidence Workflow Commands:
- Export bundle: assay evidence export --profile profile.yaml --out bundle.tar.gz
- Verify integrity: assay evidence verify bundle.tar.gz
- Lint for issues: assay evidence lint bundle.tar.gz --format sarif
- Compare runs: assay evidence diff baseline.tar.gz current.tar.gz
- Interactive explore: assay evidence explore bundle.tar.gz (requires TUI feature)
Evidence Bundle Contents:
- manifest.json: Bundle metadata, producer info, content-addressed ID
- events.jsonl: CloudEvents v1.0 format events
- Deterministic: Same profile → same bundle ID (JCS canonicalization)
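The determinism property follows from hashing a canonical serialization instead of whatever byte layout a writer happens to produce. The sketch below approximates that idea with the standard library. It is not a full JCS (RFC 8785) implementation, which also fixes number formatting and key ordering rules, and the choice of SHA-256, the manifest fields, and the name bundle_id are assumptions not stated in this document.

```python
import hashlib
import json


def bundle_id(manifest: dict) -> str:
    """Hash a canonical JSON encoding so equal content yields an equal ID."""
    canonical = json.dumps(
        manifest,
        sort_keys=True,         # stable key order
        separators=(",", ":"),  # no insignificant whitespace
        ensure_ascii=False,     # keep non-ASCII text as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = {"producer": "assay", "events": 3}
    b = {"events": 3, "producer": "assay"}  # same content, different key order
    print(bundle_id(a) == bundle_id(b))
```

Because the ID depends only on content, two exports of the same profile can be compared by ID alone, and any change to the manifest produces a different ID.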
Flow 12: CI Optimization & Self-Hosted Runner (Platform Engineer)
flowchart TD
start[CI Pipeline] --> type{Change type?}
type -->|eBPF code| full[Full Matrix Test]
type -->|Pure deps| skip[Skip Matrix]
full --> runner{Self-hosted<br/>runner online?}
runner -->|Yes| run[Run Kernel Tests]
runner -->|No| health[Health Check]
health --> recover[Auto-Recovery]
recover --> run
run --> queue{Queue<br/>backlog?}
queue -->|Yes| optimize[Optimize Queue]
optimize --> cancel[Cancel Stale/Superseded]
cancel --> run
queue -->|No| complete[Complete]
skip --> summary[Summary: Skipped]
summary --> complete
CI Optimization Features:
- Kernel Matrix Skip: Pure dependency bumps skip heavy self-hosted tests
- Auto-Recovery: Health check script recovers offline runners
- Queue Management: Auto-cancel stale jobs, superseded runs, PR prioritization
- Cache Healing: Auto-clear corrupted actions cache
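The kernel matrix skip amounts to a check over the list of changed files. The sketch below shows one way such a check could look; the set of lockfile names and the function name should_skip_kernel_matrix are hypothetical, and the repository's actual rule for what counts as a pure dependency bump may differ.

```python
from pathlib import PurePosixPath

# Hypothetical set of files that count as "pure dependency" changes.
DEPENDENCY_FILES = {"Cargo.lock", "poetry.lock", "package-lock.json"}


def should_skip_kernel_matrix(changed_files: list) -> bool:
    """Skip only when every changed file is a dependency lockfile."""
    if not changed_files:
        return False  # nothing to judge; run the matrix to be safe
    return all(PurePosixPath(p).name in DEPENDENCY_FILES for p in changed_files)


if __name__ == "__main__":
    print(should_skip_kernel_matrix(["Cargo.lock"]))
    print(should_skip_kernel_matrix(["Cargo.lock", "crates/ebpf/src/main.rs"]))
```

Requiring every file to match keeps the skip conservative: a single source change alongside a lockfile update still triggers the full matrix.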
Health Check Commands:
# View status
./infra/bpf-runner/health_check.sh --status
# Manual recovery
./infra/bpf-runner/health_check.sh --recover
# Queue optimization
./infra/bpf-runner/health_check.sh --optimize-queue
# Cache healing
./infra/bpf-runner/health_check.sh --heal-cache
See CI Infrastructure for detailed documentation.
Decision Points
When to Use Which Flow
| Use Case | Flow | Key Command/Action |
|---|---|---|
| First-time setup | Flow 1 | assay init |
| CI integration | Flow 2 | Rul1an/assay/assay-action@v2 |
| Recording traces | Flow 3 | AssayClient or assay import |
| Policy development | Flow 4 | assay record, assay generate |
| Production security | Flow 5 | assay mcp wrap + assay monitor |
| Regression testing | Flow 6 | assay run --baseline |
| Python development | Flow 7 | Python SDK |
| MCP integration | Flow 8 | assay mcp wrap |
| Debugging | Flow 9 | assay doctor, assay explain |
| Upgrading | Flow 10 | assay migrate |
| Evidence & Compliance | Flow 11 | assay evidence export/verify/lint |
| CI Optimization | Flow 12 | health_check.sh --status/--recover/--optimize-queue |
Error Handling Flows
Validation Failure
Policy Violation
Cache Miss
Quarantine
Related Documentation
- Entry Points - All commands and APIs
- Interdependencies - How components interact
- Architecture Diagrams - Visual flow representations