Assay Roadmap 2026
Status sync (2026-03-15):
- Q1 DX/refactor convergence is closed on main (RFC-001/002/003/004).
- Evidence-as-a-product (ADR-025), protocol adapters (ADR-026), and MCP governance/enforcement (ADR-032 Wave24-Wave42) are materially implemented on main.
- Governance support ADRs ADR-027 through ADR-031 are implemented on main and should be read as delivered contracts, not pending proposals.
- BYOS truth (ADR-015): Phase 1 is complete on main: push, pull, list, store-status, .assay/store.yaml config, and provider quickstart docs (S3, B2, MinIO) are all shipped.
- Split refactor program is closed-loop through Wave7C Step3 on main (see plan, report, program review pack).
- Next repo-level priorities: release/changelog hygiene, then ADR-015 Phase 2 (GitHub Action integration) or the next bounded wave.
Strategic Focus: Agent Runtime Evidence & Control Plane. Core Value: Verifiable Evidence (Open Standard) + Governance Platform.
Executive Summary
Assay is the "Evidence Recorder" for agentic workflows. We create verifiable, machine-readable audit trails that integrate with existing security/observability stacks. Assay aims to become the standard evidence lint runtime for agentic CI, with open engine + baseline packs and strong CI/SARIF integration as the adoption motor.
Standards Alignment:
- CloudEvents v1.0 envelope: lingua franca for event routers and SIEM pipelines
- W3C Trace Context (traceparent): correlation with existing distributed tracing
- SARIF 2.1.0: GitHub Code Scanning integration with explicit automationDetails.id uniqueness/stability contract
- EU AI Act Article 12: record-keeping requirements make "evidence" commercially relevant; pack mappings are pinned to EUR-Lex text, and phased dates are treated as guidance
- OTel GenAI Semantic Conventions: vendor-agnostic observability bridge for LLM/agent workloads; conventions are evolving, so integrations are version-pinned with mapping tests
- ENISA / SBOM / SLSA: supply-chain assurance (SBOM, provenance, attestation) aligns with ENISA priorities; SLSA-aligned attestation per ADR-018
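To make the CloudEvents + Trace Context pairing concrete, here is a minimal sketch in Python. It is not Assay's implementation: the function names, the example source URN, and the random `id` are assumptions for illustration (Assay itself uses content-addressed IDs). Only the version 00 traceparent layout is handled.

```python
import re
import uuid

# W3C Trace Context header (version 00): version-traceid-parentid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return the traceparent fields as a dict, or None if malformed."""
    match = TRACEPARENT_RE.match(value)
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version "ff" and all-zero IDs are invalid per the W3C spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}


def make_envelope(event_type, source, data, traceparent):
    """Build a CloudEvents v1.0 envelope carrying trace correlation."""
    if parse_traceparent(traceparent) is None:
        raise ValueError("invalid traceparent")
    return {
        # Required CloudEvents context attributes.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),  # placeholder; not a content-addressed ID
        "source": source,
        "type": event_type,
        # Attribute from the CloudEvents distributed tracing extension.
        "traceparent": traceparent,
        "data": data,
    }
```

Rejecting a malformed traceparent at envelope construction keeps broken correlation data out of downstream SIEM and tracing pipelines.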
Strategic Positioning: Protocol-Agnostic Governance
The protocol landscape table below is a planning snapshot (hypothesis-driven) and is revisited as specs/programs evolve.
The agentic commerce/interop space is fragmenting (Jan 2026):
| Protocol | Owner | Focus |
|---|---|---|
| ACP (Agentic Commerce Protocol) | OpenAI/Stripe | Buyer/agent/business transactions |
| UCP (Universal Commerce Protocol) | Google/Shopify | Discover → buy → post-purchase journeys |
| AP2 (Agent Payments Protocol) | | Secure transactions + mandates |
| A2A (Agent2Agent) | | Agent discovery/capabilities/tasks |
| x402 | Community | Internet-native (crypto) agent payments |
Assay's moat: Protocol-agnostic evidence + governance layer.
"Regardless of protocol: verifiable evidence, policy enforcement, trust verification, SIEM/OTel-ready."
All these protocols converge on "tool calls + state transitions", which is exactly what Assay captures as trace-linked evidence.
2026 runtime-governance positioning: Recent fragmented-IPI experiments on main sharpen this moat. Assay's value is not "better regex"; it is deterministic governance on the tool bus. Wrap-only lexical enforcement is useful but brittle against multi-step leakage and tool-hopping. Sequence/state policies stay robust because they govern behavioral routes across sink labels and payload variants. The bounded claim remains important: these experiments demonstrate sink-call exfiltration control with audit-grade evidence and low decision overhead, not a universal solution to semantic hijacking or raw network egress.
Why This Matters
- Tool Signing becomes critical: "tool substitution" and "merchant tool spoofing" are real commerce risks
- Mandates/Intents need audit trails: AP2's authorization model requires provable evidence
- Agent Identity is enterprise-core: who/what authorized a transaction?
See Protocol Landscape Analysis for detailed research
Market Validation (Feb 2026)
The CI/CD-for-agents market is validating Assay's core assumptions:
- AAIF (Agentic AI Foundation): MCP, goose and AGENTS.md are under Linux Foundation governance (Dec 2025). MCP as a vendor-neutral standard reduces protocol fragmentation risk and supports Assay's MCP-first bet.
- GitHub "Continuous AI" (Feb 2026, evolving/preview signal): repo-agents with read-only default + "Safe Outputs", i.e. explicit contracts defining what agents may produce. This aligns with Assay's policy-as-code model.
- Policy-as-code as best practice: Multiple sources (V2Solutions, Skywork, Gartner) now list policy-as-code, least privilege, auditability and kill switches as enterprise requirements for agent deployment. Not a niche compliance need anymore.
- Fleet-of-small-agents pattern: The dominant deployment pattern is many small specialized agents, not one generalist. More agents = more policies = more Assay usage per repo.
- Gartner risk signal: Gartner predicts that more than 40% of agentic AI projects will be cancelled by the end of 2027 due to costs, unclear value, or inadequate risk controls. Governance tooling is a prerequisite, not a nice-to-have.
Competitive differentiation: Agent CI (eval-as-service), Langfuse/LangSmith (observability), Dagger (agentic runtime) cover adjacent layers. None offers deterministic replay, evidence bundles with integrity guarantees, or compliance packs. Assay's unique position: governance + audit, not observability or eval-as-service.
See RESEARCH-ci-cd-ai-agents-feb2026.md for detailed analysis
Current State: Evidence Contract v1 ✅ Complete
The Evidence Contract v1 is production-ready.
| Component | Status | Notes |
|---|---|---|
| assay-evidence crate | ✅ | Schema v1, JCS canonicalization, content-addressed IDs |
| Evidence pipeline | ✅ | ProfileCollector → Profile → EvidenceMapper → EvidenceEvent (OTel Collector pattern) |
| CLI commands | ✅ | export, verify, show, lint, diff, explore |
| OTel integration | ✅ | trace_parent, trace_state on all events |
Architecture Note: The current pipeline follows the OTel Collector pattern (native format emission → transformation layer → canonical export). This is the recommended SOTA approach per OpenTelemetry best practices. See ADR-008: Evidence Streaming for the decision to keep CloudEvents construction out of the hot path.
🎯 Immediate Next Steps (Q1 Close-out)
- v1 Contract Freeze: Publish versioning policy, deprecation rules, golden bundle fixtures
- Compatibility Tests: No new event types without schema + tests
- Docs Positioning: "Assay Evidence = CloudEvents + Trace Context + Deterministic Bundle"
CLI Surface: Two-Layer Positioning
To reduce "surface area tax" and improve adoption, CLI commands are positioned in two tiers:
Happy Path (Core Workflow)
assay run # Execute with policy enforcement
assay evidence export # Create verifiable bundle
assay evidence verify # Offline integrity check
assay evidence lint # Security/quality findings (SARIF)
assay evidence diff # Compare bundles
assay evidence explore # Interactive TUI viewer
Power Tools (Advanced/Experimental)
All other commands (quarantine, fix, demo, sim, discover, kill, mcp, etc.) are documented separately as advanced tooling.
Developer UX/DX Strategy (Feb 2026 Refresh)
Assay execution priorities are now explicitly evaluated against five developer-facing dimensions:
| Dimension | Why it matters for Assay | 2026 Direction |
|---|---|---|
| Time-to-first-signal | Teams adopt if first value arrives fast | Keep the golden path short (init -> trace -> run -> PR signal) |
| Quality-of-feedback | Red/green alone is not enough for agent systems | Keep reason codes, explainability, and rerun hints as first-class contracts |
| Workflow fit | Adoption follows existing surfaces | Prioritize CI/PR/SARIF/Security-tab integrations over new standalone UI |
| Trust & auditability | Security/compliance requires reproducibility | Preserve deterministic outputs, seeds, and evidence bundle integrity |
| Change resilience | MCP/tools/policies drift over time | Make drift visible with diff/explain flows before it becomes a blocking surprise |
Execution rule: if a proposal does not clearly improve at least one of these dimensions without raising cognitive load, it is deferred.
Deliberate Non-Plays
Based on competitive landscape analysis:
- Not observability: Langfuse, LangSmith, Arize do this better and it's a different market. Integrate via OTel where needed, don't build dashboards.
- Not eval-as-a-service: Agent CI and LangSmith do evals. Assay does policy enforcement + evidence. Overlap on PR-gates, but the value proposition is different.
- Not agent-building: Dagger, Zencoder build agents. Assay validates them. Complementary, not competitive.
- Not universal semantic-hijacking detection: LLM-as-judge or probabilistic semantic gates are not the core enforcement model. Assay stays deterministic and evidence-first.
- Not a full outbound egress firewall (yet): raw network containment belongs to OS/runtime isolation layers. Assay governs tool routes and records policy decisions; it does not replace platform egress controls as an MVP.
Q1 2026: Trust & Telemetry ✅ Complete
Objective: Establish Assay as the standard for agent auditability.
Evidence Core
- Schema v1 (assay.evidence.event.v1) definitions
- JCS (RFC 8785) canonicalization
- Content-addressed ID generation (sha256(canonical))
- CLI: export, verify, show
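The content-addressing idea, sha256 over a canonical serialization, can be sketched as follows. This is an approximation, not the assay-evidence crate: `json.dumps` with sorted keys only approximates JCS (RFC 8785 also fixes number formatting and sorts keys by UTF-16 code units), and the `sha256:` prefix on event IDs is an assumption borrowed from the bundle ID form used by the CLI.

```python
import hashlib
import json


def canonicalize(event):
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 bytes."""
    text = json.dumps(event, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def content_id(event):
    """Derive an ID from content only, so key order never changes the ID."""
    return "sha256:" + hashlib.sha256(canonicalize(event)).hexdigest()
```

Because the ID depends only on canonical bytes, two producers that serialize the same event with different key order still agree on its identity, which is what makes offline verification possible.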
Evidence DX (Lint/Diff/Explore)
- Linting: Rule registry, SARIF output with partialFingerprints, --fail-on threshold
- SARIF identity contract: stable, unique automationDetails.id per tool/run lineage for deterministic dedupe and traceability
- Diff: Semantic comparison (hosts, file access), baseline support
- Explore: TUI viewer with ANSI/control char sanitization (tui feature flag)
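A rough sketch of how lint findings might map onto SARIF 2.1.0 with a stable run identity and per-result fingerprints. The SARIF property names are from the standard; the finding shape, the fingerprint input, the `tool/lineage` ID format, and the `should_fail` helper are assumptions for illustration, not Assay's actual output contract.

```python
import hashlib

# Ordering used for a --fail-on style threshold (SARIF level names).
LEVELS = {"note": 0, "warning": 1, "error": 2}


def build_sarif(findings, tool_name, lineage):
    """Build a minimal SARIF 2.1.0 log from finding dicts."""
    results = []
    for finding in findings:
        # Hypothetical fingerprint basis: rule plus the event it fired on.
        basis = f"{finding['rule_id']}|{finding['event_id']}"
        digest = hashlib.sha256(basis.encode("utf-8")).hexdigest()
        results.append({
            "ruleId": finding["rule_id"],
            "level": finding["level"],
            "message": {"text": finding["message"]},
            "partialFingerprints": {"primaryLocationLineHash": digest},
        })
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            # Stable per tool/run lineage so repeated uploads dedupe.
            "automationDetails": {"id": f"{tool_name}/{lineage}"},
            "results": results,
        }],
    }


def should_fail(findings, fail_on):
    """True when any finding is at or above the threshold level."""
    threshold = LEVELS[fail_on]
    return any(LEVELS[f["level"]] >= threshold for f in findings)
```

Deriving fingerprints from content instead of run time keeps the same finding recognizable across runs.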
Hardening (Chaos/Differential Testing)
- IO chaos (intermittent failures, short reads, Interrupted/WouldBlock)
- Stream chaos (partial writes, truncation)
- Differential verification (reference parity, spec drift, platform matrix)
Telemetry
- OTel Trace/Span context on all events
- OTel trace ingest (assay trace ingest-otel)
- OTel export in test results
Q2 2026: Supply Chain Security
Objective: Launch compliance and signing features with zero infrastructure cost.
Strategy: BYOS-first (Bring Your Own Storage) per ADR-015. Users provide their own S3-compatible storage. Managed infrastructure deferred until PMF.
Prioritized Deliverables
| Priority | Item | Effort | Value | Status |
|---|---|---|---|---|
| P0 | GitHub Action v2 | Medium | High | ✅ Complete |
| P0 | Exit Codes (V2) | Low | High | ✅ Complete (v2.12.0) |
| P0 | Report IO Robustness (Warnings) | Low | High | ✅ Complete (v2.12.0) |
| P1 | BYOS CLI Commands | Low | High | ✅ Complete (Phase 1: push/pull/list/store-status, .assay/store.yaml config, provider docs) |
| P1 | Tool Signing (x-assay-sig) | Medium | High | ✅ Complete (v2.9.0) |
| P2 | Pack Engine (OSS) | Medium | High | ✅ Complete (v2.10.0) |
| P2 | EU AI Act Baseline Pack (OSS) | Low | High | ✅ Complete (v2.10.0) |
| P2 | Mandate/Intent Evidence | Medium | High | ✅ Complete (v2.11.0) |
| P1 | Judge Reliability (SOTA E7) | High | High | ✅ Complete (Audit Grade) |
| P1 | Progress N/M (E4.3) | Low | High | ✅ Complete (PR #164) |
| P2 | GitHub Action v2.1 | Low | Medium | ✅ Complete (PR #185) |
| P1 | Golden path (<30 min first signal) | Medium | High | ✅ Complete (PR #187, init --hello-trace --ci) |
| P1 | Drift-aware feedback (explain + policy/tool diffs) | Medium | High | ✅ Complete (generate --diff PR #177, explain PR #179) |
| P1 | CLI debt reduction (Wave A/B: typed errors, pipeline, config) | Medium | High | ✅ Delivered on main; Wave C remains explicitly data-gated |
| P1 | Starter packs (OSS) | Low | High | ✅ Complete (ADR-023) |
| P1 | Audit Kit (Manifest/Provenance) (ADR-025) | Low | High | ✅ Complete (I1 closed-loop) |
| P1 | Soak Testing & Pass^k (ADR-025) | Medium | High | ✅ Complete (I1 closed-loop) |
| P2 | Closure Score & Completeness (ADR-025) | Medium | High | ✅ Complete (I2/I3 closed-loop) |
| P2 | Sim Engine Hardening (limits + budget) | Low | Medium | Superseded by ADR-025 Soak |
| P3 | Sigstore Keyless (Enterprise) | Medium | Medium | Pending |
| Defer | Managed Evidence Store | High | Medium | Q3+ if demand |
| Defer | Dashboard | High | Medium | Q3+ |
See ADRs: ADR-011 (Signing), ADR-013 (EU AI Act), ADR-014 (Action), ADR-015 (BYOS), ADR-016 (Pack Taxonomy)
See Spec: SPEC-Tool-Signing-v1
See Debt: RFC-001 DX/UX Governance (historical governance RFC: Wave A/B delivered, Wave C remains performance-only and data-gated)
GitHub Action v2 ✅ Complete
Published to GitHub Marketplace: assay-ai-agent-security
Features:
- Zero-config evidence bundle discovery
- SARIF integration with GitHub Security tab
- PR comments (only when findings)
- Baseline comparison via cache
- Job Summary reports
A. BYOS CLI Commands ✅ Complete (Phase 1)
Per ADR-015, evidence storage uses user-provided S3-compatible buckets:
# CLI commands (--store optional when .assay/store.yaml is present)
assay evidence push bundle.tar.gz
assay evidence pull --bundle-id sha256:...
assay evidence list --run-id run_123
assay evidence store-status
- Generic S3 Client: Using object_store crate
- push command: Upload verified bundle with immutability-safe writes
- pull command: Download bundle by ID or run
- list command: List bundles with filtering, JSON/table/plain output
- store-status command: Connectivity/access/inventory diagnostics (JSON/table/plain)
- Conditional writes: If-None-Match: "*" for immutability
- Content-addressed keys: SHA-256 bundle_id as source of truth
- Structured config: .assay/store.yaml with frozen precedence chain
- Provider quickstart docs: AWS S3, Backblaze B2, MinIO
Supported backends: AWS S3, Backblaze B2, Wasabi, Cloudflare R2, MinIO, local filesystem (via S3-compatible endpoint or file://)
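The combination of content-addressed keys and conditional writes can be sketched with an in-memory stand-in for a bucket. Everything named here is hypothetical: `InMemoryStore`, the `bundles/` key layout, and hashing the raw archive bytes to obtain the bundle ID are assumptions, and `put_if_absent` merely imitates what an `If-None-Match: "*"` PUT does on an S3-compatible backend.

```python
import hashlib


class PreconditionFailed(Exception):
    """Raised when the target key already exists (HTTP 412 analogue)."""


class InMemoryStore:
    """Stand-in for a bucket that honours create-only writes."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, body):
        if key in self._objects:
            raise PreconditionFailed(key)
        self._objects[key] = body


def push_bundle(store, bundle_bytes):
    """Upload a bundle under its content hash; never overwrite."""
    bundle_id = "sha256:" + hashlib.sha256(bundle_bytes).hexdigest()
    key = f"bundles/{bundle_id}.tar.gz"  # hypothetical key layout
    try:
        store.put_if_absent(key, bundle_bytes)
        return bundle_id, True
    except PreconditionFailed:
        # Same content already stored: the push is an idempotent no-op.
        return bundle_id, False
```

Since the key is derived from the content, a rejected write can only mean the identical bundle is already present, so retries are safe and existing evidence cannot be silently replaced.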
B. Tool Signing ✅ Complete
Per SPEC-Tool-Signing-v1:
assay tool keygen --out ~/.assay/keys/ # Generate PKCS#8/SPKI keypair
assay tool sign tool.json --key priv.pem --out signed.json
assay tool verify signed.json --pubkey pub.pem # Exit: 0=ok, 2=unsigned, 3=untrusted, 4=invalid
- x-assay-sig field: Ed25519 + DSSE PAE encoding
- JCS canonicalization: RFC 8785 deterministic JSON
- key_id trust model: SHA-256 of SPKI bytes
- Trust policies: require_signed, trusted_key_ids
- Keyless (Enterprise): Sigstore Fulcio + Rekor integration
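The pieces above that do not need a crypto library can be sketched directly. `pae` follows the DSSE pre-authentication encoding; `key_id` hashes SPKI bytes as described, with hex output being an assumption. `verify_exit_code` is a hypothetical decision function: the exit codes come from the CLI comment above, but the check order, the treatment of unsigned tools when `require_signed` is off, and the injected `signature_valid` callback (standing in for Ed25519 verification) are assumptions, not SPEC-Tool-Signing-v1.

```python
import hashlib

EXIT_OK, EXIT_UNSIGNED, EXIT_UNTRUSTED, EXIT_INVALID = 0, 2, 3, 4


def pae(payload_type, payload):
    """DSSE PAE: 'DSSEv1' SP len(type) SP type SP len(body) SP body."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (
        len(type_bytes), type_bytes, len(payload), payload)


def key_id(spki_der):
    """Key identifier: SHA-256 over the public key's SPKI bytes (hex assumed)."""
    return hashlib.sha256(spki_der).hexdigest()


def verify_exit_code(tool, trusted_key_ids, require_signed, signature_valid):
    """Map a trust-policy evaluation onto the documented exit codes."""
    sig = tool.get("x-assay-sig")
    if sig is None:
        return EXIT_UNSIGNED if require_signed else EXIT_OK
    if sig.get("key_id") not in trusted_key_ids:
        return EXIT_UNTRUSTED
    return EXIT_OK if signature_valid(tool, sig) else EXIT_INVALID
```

Signing the PAE bytes instead of the raw payload binds the payload type into the signature, so a signature over one document type cannot be replayed as another.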
C. Mandate/Intent Evidence (P2) ✅ Complete (v2.11.0)
Full mandate evidence implementation per SPEC-Mandate-v1.0.5:
- Evidence types: Mandate content (intent/transaction), signed envelopes, lifecycle events
- Runtime enforcement: MandateStore (SQLite), 7-step Authorizer flow, revocation support
- CloudEvents: mandate.used, mandate.revoked, tool.decision lifecycle events
- CLI integration: --audit-log, --decision-log, --event-source flags
- Idempotent retries: Deterministic use_id + was_new flag for audit-log deduplication
- Revocation: Hard cutoff (no clock skew tolerance) per SPEC §7.6
See ADR-017 for architecture decision.
This strengthens EU AI Act Articles 12 & 14 compliance for commerce workflows.
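The idempotent-retry and hard-cutoff behaviour can be illustrated with a small in-memory sketch. It is not the SQLite-backed MandateStore: the class below, the `mandate_id:tool_call_id` derivation of `use_id`, integer timestamps, and treating `now >= revoked_at` as denied are all assumptions chosen to show the idea.

```python
import hashlib


class MandateStore:
    """In-memory illustration of deterministic use IDs and revocation."""

    def __init__(self):
        self._uses = {}        # use_id -> first-seen timestamp
        self._revoked_at = {}  # mandate_id -> revocation timestamp

    def revoke(self, mandate_id, at):
        self._revoked_at[mandate_id] = at

    def record_use(self, mandate_id, tool_call_id, now):
        revoked_at = self._revoked_at.get(mandate_id)
        # Hard cutoff: no grace window around the revocation time.
        if revoked_at is not None and now >= revoked_at:
            raise PermissionError("mandate revoked")
        # Same inputs always yield the same use_id, so retries collapse.
        basis = f"{mandate_id}:{tool_call_id}".encode("utf-8")
        use_id = hashlib.sha256(basis).hexdigest()
        was_new = use_id not in self._uses
        if was_new:
            self._uses[use_id] = now
        return use_id, was_new
```

A caller would emit a `mandate.used` event only when `was_new` is true, which keeps the audit log free of duplicates when a tool call is retried.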
D. Compliance Packs (P2): Semgrep Model
Per ADR-016, we follow the Semgrep open core pattern:
- Engine + Baseline packs = Open Source (Apache 2.0)
- Pro packs + Managed workflows = Enterprise (Commercial)
Pack Engine (OSS) ✅ Complete (v2.10.0)
assay evidence lint --pack eu-ai-act-baseline # Single pack
assay evidence lint --pack eu-ai-act-baseline,soc2 # Composition
assay evidence lint --pack ./custom-pack.yaml # Custom pack
- Pack loader: YAML schema with pack_kind (compliance/security/quality)
- Rule ID namespacing: {pack}@{version}:{rule_id} for collision handling
- Pack composition: --pack a,b with deterministic merge
- Version resolution: assay_min_version + evidence_schema_version
- Pack digest: SHA256 (JCS RFC 8785) for supply chain integrity
- SARIF output: Pack metadata in properties bags (GitHub Code Scanning compatible)
- Disclaimer enforcement: pack_kind == compliance requires disclaimer
- GitHub dedup: primaryLocationLineHash fingerprint
- Truncation: --max-results for SARIF size limits
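Rule ID namespacing and deterministic composition can be sketched as below. The `{pack}@{version}:{rule_id}` format is from the list above; the in-memory pack shape, the version strings in the usage, and sorting by pack name as the merge order are assumptions, since the actual merge rules live in SPEC-Pack-Engine-v1.

```python
def qualify(pack_name, version, rule_id):
    """Build the namespaced rule ID: {pack}@{version}:{rule_id}."""
    return f"{pack_name}@{version}:{rule_id}"


def compose(packs):
    """Merge packs into one rule table, independent of argument order."""
    merged = {}
    for pack in sorted(packs, key=lambda p: (p["name"], p["version"])):
        for rule in sorted(pack["rules"], key=lambda r: r["id"]):
            merged[qualify(pack["name"], pack["version"], rule["id"])] = rule
    return merged
```

Because every key carries its pack and version, two packs can both define a rule with the same short ID without colliding, and `--pack a,b` yields the same table as `--pack b,a`.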
EU AI Act Baseline Pack (OSS) ✅ Complete (v2.10.0)
Direct Article 12(1) + 12(2)(a)(b)(c) mapping:
| Rule ID | Article | Check | Status |
|---|---|---|---|
| EU12-001 | 12(1) | Evidence bundle contains automatically recorded events | ✅ |
| EU12-002 | 12(2)(c) | Events include lifecycle fields for operation monitoring | ✅ |
| EU12-003 | 12(2)(b) | Events include correlation IDs for post-market monitoring | ✅ |
| EU12-004 | 12(2)(a) | Events include fields for risk situation identification | ✅ |
See ADR-013 for detailed mapping and SPEC-Pack-Engine-v1 for implementation spec.
EU AI Act Pro Pack (Enterprise)
- Biometric-specific rules (Article 12(3))
- Retention policy validation
- Advanced risk scoring
- Org-specific exception workflows
- PDF audit report generation
Additional Packs (Future)
- Commerce Pack: Mandate/intent required, signed-tools required (enabled by v2.11.0 mandate support)
- SOC2 Baseline (OSS): Common Criteria mapping delivered (see ADR-022, pack in packs/open/soc2-baseline/)
- SOC2 Pro (Enterprise): assurance-depth mappings and workflow integrations
- Starter packs (OSS): CICD hygiene, minimal traceability (compatibility floor); see §F
- Pack Registry: Local packs in ~/.assay/packs/ (ADR-021, implemented in PR #287)
E. GitHub Action v2.1 ✅ Complete
Per ADR-018:
| Priority | Feature | Rationale |
|---|---|---|
| P1 | Compliance pack support | EU AI Act compliance story |
| P2 | BYOS push with OIDC | Zero-credential enterprise posture |
| P3 | Artifact attestation | Supply chain integrity |
| P4 | Coverage badge | Developer DX |
Key design decisions:
- Write operations (push, attest, badge) only on push to main (fork PR threat model)
- OIDC authentication per provider (explicit, not auto-detect)
- Attestations provide "SLSA-aligned provenance" (no specific level claims)
- Attestation lifecycle is staged: produce on push-to-main, verify in release/promote lanes, and keep fail-closed semantics scoped to release artifacts until stability is proven
- EU AI Act timeline is treated as phased guidance (legal mapping source remains EUR-Lex text and versioned pack mappings)
See ADR-018 for full specification.
F. Starter Packs (OSS) (P1) ✅ Complete
CICD-hygiene pack as compatibility floor for adoption: minimal traceability so teams get first value from assay evidence lint with minimal config. See ADR-023. Merged PR #289.
assay evidence lint --pack cicd-starter bundle.tar.gz
assay evidence lint --pack cicd-starter,eu-ai-act-baseline bundle.tar.gz
Status per step (codebase-check):
| Check | Status |
|---|---|
| cicd-starter or similar pack | ✅ Present (packs/open + vendored) |
| Pack in packs/open/ or BUILTIN_PACKS | ✅ cicd-starter in both |
| CICD-hygiene rules | ✅ CICD-001..004 in pack |
Scope:
- [x] Pack: cicd-starter (kind: quality), in packs/open/cicd-starter/
- [x] Default: cicd-starter when no --pack specified (PLG)
- [x] Rules: CICD-001 (event count); CICD-002 (assay.profile.started/.finished); CICD-003 (traceparent/tracestate/run_id); CICD-004 (build_id/version, info)
- [x] BUILTIN_PACKS + vendoring: packs/open vendored to crates/assay-evidence/packs/
- [x] Docs: README per ADR-023 Appendix A; pinned GH Action; --fail-on warning; Next steps (follow-up)
Design decisions:
- kind: quality (no disclaimer; distinct from compliance packs)
- Light rules only: reuse existing check types (event_count, event_pairs, event_field_present)
- Composable with eu-ai-act-baseline for teams graduating to compliance
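The three check types named above are simple enough to sketch over a list of event dicts. The names match the check types, but the exact semantics are assumptions: in particular, whether `event_field_present` means "any listed field on any event" or something stricter is not stated here, and the event shape with a `type` key is illustrative.

```python
def event_count(events, minimum=1):
    """Pass when the bundle holds at least `minimum` events."""
    return len(events) >= minimum


def event_pairs(events, start_type, end_type):
    """Pass when start and end events both occur, in equal numbers."""
    starts = sum(1 for e in events if e.get("type") == start_type)
    ends = sum(1 for e in events if e.get("type") == end_type)
    return starts > 0 and starts == ends


def event_field_present(events, fields):
    """Pass when at least one event carries at least one listed field."""
    return any(field in event for event in events for field in fields)
```

A CICD-002 style rule would then be `event_pairs(events, "assay.profile.started", "assay.profile.finished")`, and a CICD-003 style rule `event_field_present(events, ["traceparent", "tracestate", "run_id"])`.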
G. Reliability Surface & Soak (P1) [ADR-025]
Pivot from generic "simulation" to Policy Soak Testing as a reliability product. See ADR-025.
Scope (Iteration 1):
- [x] CLI: assay sim soak subcommand with pass^k semantics (pass_all, pass_rate)
- [x] Report: soak-report-v1 strict JSON schema (decision_policy, violations_by_rule)
- [x] Determinism: Seeded execution for reproducible reliability
- [x] Limits: Time budget and resource limits (inherited from ADR-024 work)
Design decisions:
- Pass condition: "Pass All K" is the gold standard for Agentic CI
- Evidence: The Soak Report itself is an artifact in the evidence bundle
- Step3 rollout status: informational nightly soak + informational readiness aggregation are active; no PR required-check impact in Step3
- Step4 rollout status: fail-closed enforcement is active in release lane only (policy v1 + readiness enforcement script); PR lanes remain unchanged
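The pass^k semantics can be made concrete with a short sketch. Only `decision_policy` and `violations_by_rule` are field names taken from the report description above; the other keys, the input shape (one list of violated rule IDs per run), and `min_pass_rate` are assumptions, not the soak-report-v1 schema.

```python
from collections import Counter


def soak_report(runs, decision_policy="pass_all", min_pass_rate=1.0):
    """Summarize K runs; each run is the list of rule IDs it violated."""
    if not runs:
        raise ValueError("need at least one run")
    k = len(runs)
    clean = sum(1 for violations in runs if not violations)
    pass_rate = clean / k
    if decision_policy == "pass_all":
        passed = clean == k
    elif decision_policy == "pass_rate":
        passed = pass_rate >= min_pass_rate
    else:
        raise ValueError(f"unknown decision_policy: {decision_policy}")
    counts = Counter(rule for violations in runs for rule in violations)
    return {
        "decision_policy": decision_policy,
        "k": k,
        "pass_rate": pass_rate,
        "passed": passed,
        "violations_by_rule": dict(sorted(counts.items())),
    }


def pass_hat_k(p, k):
    """Chance that k independent runs all pass, given per-run pass rate p."""
    return p ** k
```

The second helper shows why "Pass All K" is strict: an agent that passes 90% of single runs passes all of three independent runs only about 73% of the time.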
H. Audit Kit & Closure (P2) [ADR-025] ✅ Complete
Formalize "Evidence-as-a-Product" with provenance and replayability scores.
Scope (Iteration 1, 2, 3):
- [x] Manifest Extensions: x-assay.packs_applied and mappings for provenance (I2)
- [x] Completeness: Pack-relative signal gaps (required vs captured) (I2)
- [x] Closure Score: Replay-relative score (0.0-1.0) for hermetic replay readiness (I2)
- [x] OTEL Bridge: Export Assay events to OTLP/GenAI SemConv (Iteration 3)
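Pack-relative completeness (required vs captured signals) reduces to a set difference. The sketch below is an assumption about shape only: the signal names are made up, and the closure score formula is defined in ADR-025 and is not reproduced here.

```python
def completeness(required, captured):
    """Report which pack-required signals the bundle failed to capture."""
    required_set = set(required)
    missing = sorted(required_set - set(captured))
    if not required_set:
        ratio = 1.0  # nothing required means nothing can be missing
    else:
        ratio = (len(required_set) - len(missing)) / len(required_set)
    return {"ratio": ratio, "missing": missing}
```

Listing the missing signals by name, not just the ratio, is what turns the number into an actionable gap report.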
Q3 2026: Enterprise Scale (Growth)
Objective: Integration with the broader security ecosystem + agentic protocol support, led by route-governance primitives rather than broader surface area.
Route Governance Core (Highest Priority)
March 2026 experiment evidence changes the ordering inside Q3. Before expanding adapter breadth or managed surfaces, Assay should close the product gap between experiment-only route governance and reusable platform primitives.
| Priority | Capability | Why now | MVP boundary |
|---|---|---|---|
| P0 | Tool taxonomy as first-class classes | Second-sink and tool-hopping results show raw tool names are brittle. | Source/sink/store/exec class map, policies written on classes, evidence logs record matched class. |
| P0 | Session identity + state store contract | Cross-session decay shows route memory is required beyond a single run. | Explicit session key, replayable state store interface, TTL/window semantics, evidence export/import. |
| P1 | Coverage/completeness reports | Governance without visibility into unknown tools and untested routes creates blind spots. | JSON + SARIF informational report for tools seen vs declared, unknown tools, and route coverage gaps. |
| P1 | Machine-readable decision logs | Enterprises need debugging without log archaeology. | JSONL decision events with rule_id, reason_code, matched_fields, decision_path, policy_version, latency. |
| P1 | Replay with state snapshots | The product claim is verifiability, not just blocking. | Replay tool-call logs plus policy + state snapshot, with decision diffs on policy/state changes. |
| P2 | Hardening defaults for ingest/proxy | Inline governance inherits parser attack surface. | Default caps, fuzz/property tests, and consistent fail-closed semantics. |
| P2 | Evidence-first interop bridges | OTel/MCP interop is useful only if enforcement and evidence stay first-class. | Lossiness-accounted bridges, adapter metadata everywhere, no LLM-judge in core enforcement. |
These items outrank additional dashboard and growth-only work because the experiment ladder shows Assay wins on route/state governance, not on content filtering or surface-area expansion alone.
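A minimal sketch of what class-based route governance with session state and machine-readable decisions could look like. All names are hypothetical (the tool map, rule IDs, reason codes, and the single "no sink after source" rule); the decision fields echo the table above, but this is an illustration of the idea, not the planned design.

```python
import json

# Hypothetical tool-to-class map; a real taxonomy would be declared in policy.
TOOL_CLASSES = {
    "read_file": "source",
    "web_fetch": "source",
    "send_email": "sink",
    "http_post": "sink",
}


class RouteGovernor:
    """Deny sink-class calls once a source-class call ran in the session."""

    def __init__(self, class_map, policy_version):
        self.class_map = class_map
        self.policy_version = policy_version
        self.sessions = {}  # session key -> classes already allowed

    def decide(self, session_key, tool_name):
        tool_class = self.class_map.get(tool_name, "unknown")
        seen = self.sessions.setdefault(session_key, set())
        if tool_class == "unknown":
            decision, rule_id, reason = "deny", "R-UNKNOWN-TOOL", "tool_not_classified"
        elif tool_class == "sink" and "source" in seen:
            decision, rule_id, reason = "deny", "R-SOURCE-THEN-SINK", "sink_after_source"
        else:
            decision, rule_id, reason = "allow", "R-DEFAULT", "no_rule_matched"
        if decision == "allow":
            seen.add(tool_class)  # only allowed calls extend the route
        return {
            "session": session_key,
            "tool": tool_name,
            "matched_class": tool_class,
            "decision": decision,
            "rule_id": rule_id,
            "reason_code": reason,
            "policy_version": self.policy_version,
        }


def to_jsonl(decisions):
    """Serialize decisions as JSONL with stable key order."""
    return "\n".join(json.dumps(d, sort_keys=True) for d in decisions)
```

Because the rule is written on classes, swapping `send_email` for `http_post` after a read does not evade it, which is the tool-hopping case that name-based rules miss.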
A. Protocol Adapters (Adapter-First Strategy)
Lightweight adapters that map protocol-specific events to Assay's EvidenceEvent + policy hooks:
| Adapter | Protocol | Focus |
|---|---|---|
| assay-adapter-acp | Agentic Commerce Protocol | OpenAI/Stripe checkout flows |
| assay-adapter-ucp | Universal Commerce Protocol | Google/Shopify commerce journeys |
| assay-adapter-a2a | Agent2Agent | Agent discovery/tasks/messages |
- Adapter trait: Common interface for protocol → EvidenceEvent mapping
- ACP adapter: Tool calls, checkout events, payment intents (leverages v2.11.0 mandate support)
- UCP adapter: Discover/buy/post-purchase state transitions
- A2A adapter: Agent capabilities, task delegation, artifacts
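The adapter idea, one interface with one implementation per protocol, can be sketched in Python. This mirrors the concept only: the real adapters are Rust crates, and the payload fields (`event`, `checkout_id`), the `assay.adapter.*` event type naming, and the `unmapped_fields` lossiness record are hypothetical.

```python
from typing import Protocol


class Adapter(Protocol):
    """Common interface: protocol-specific payload in, evidence event out."""

    protocol: str

    def to_evidence(self, raw: dict) -> dict:
        ...


class AcpAdapter:
    """Illustrative adapter for a hypothetical ACP-style checkout payload."""

    protocol = "acp"
    MAPPED = ("event", "checkout_id")

    def to_evidence(self, raw: dict) -> dict:
        return {
            "type": f"assay.adapter.{self.protocol}.{raw['event']}",
            "subject": raw.get("checkout_id"),
            "adapter": {
                "protocol": self.protocol,
                # Account for lossiness: name what was not carried over.
                "unmapped_fields": sorted(set(raw) - set(self.MAPPED)),
            },
        }


def map_all(adapter: Adapter, raw_events):
    """Run any adapter over a batch of raw protocol events."""
    return [adapter.to_evidence(raw) for raw in raw_events]
```

Policy and lint code only ever see the evidence shape, so adding a protocol means adding an adapter, not touching enforcement.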
Status on main:
- assay-adapter-api, assay-adapter-acp, assay-adapter-a2a, and assay-adapter-ucp are merged in open core.
- ADR-026 stabilization through E4 is merged on main (metadata identity, lossiness preservation, host attachment policy, canonical digests, parser hardening).
- UCP now follows the same A/B/C rollout discipline as ACP and A2A: Step1 freeze, Step2 MVP + fixtures, Step3 closure docs.
Why adapters: The market is fragmenting (ACP vs UCP vs AP2 vs x402). Assay's value is protocol-agnostic governance, not protocol lock-in.
Enabled by v2.11.0: The mandate evidence module provides the foundation for AP2-style authorization tracking in these adapters.
AAIF governance note: MCP and A2A are now under the Agentic AI Foundation (Linux Foundation, Dec 2025). This reduces protocol fragmentation risk and makes adapter investments more durable.
B. Connectors
- SIEM: Splunk / Microsoft Sentinel export adapters
- CI/CD: GitHub Actions v2 (Rul1an/assay-action@v2) / GitLab CI integration
- GitHub App: Native policy drift detection in PRs
- GitLab CI: Native integration
- OTel GenAI: Align evidence export with OTel GenAI semantic conventions. The conventions are still experimental, but Pydantic AI already follows them; monitor for stability before building the bridge
C. Additional Compliance Packs
- SOC 2 Pack: Control mapping for Type II audits
- MCPTox: Regression testing against jailbreak/poisoning patterns
- Industry Packs: Healthcare (HIPAA), Finance (PCI-DSS)
D. Managed Evidence Store (Evaluate)
Only proceed if:
1. Users explicitly request managed hosting
2. Revenue model supports infrastructure costs
3. PMF is validated
If yes, implement per ADR-009 and ADR-010:
- [ ] Cloudflare Workers + R2: Non-SEC-compliant tier (lowest cost)
- [ ] Backblaze B2 Proxy: SEC 17a-4 compliant tier
- [ ] Pricing: Pass-through storage + margin
Q4 2026: Platform Features
Objective: Advanced capabilities for enterprise adoption.
A. Governance Dashboard (If Managed Store Exists)
- Policy Drift: Trend lines, anomaly detection
- Degradation Reports: Evidence health score
- Env Strictness Score: Compliance posture metrics
B. Assurance & Audit Readiness (If Managed Store Exists)
- Policy Exceptions: Waivers with expiry, owner, rationale; audit trail for compliance exceptions
- Auditor Portal: Read-only export of packs + results + fingerprints; "audit-ready bundles" for external auditors
C. Advanced Signing & Attestation
- Sigstore Keyless: Fulcio certificate + Rekor transparency log
- SCITT Integration: Transparency log for signed statements (IETF draft)
- Org Trust Policies: Managed identity verification
D. Identity & Authorization Stack
Enterprise identity for agentic workloads:
- SPIFFE/SPIRE: Workload identity for non-human actors
- FAPI 2.0 Profile: High-security OAuth for agent commerce APIs
- OpenID4VP/VCI: Verifiable credentials for mandate attestation
- OAuth 2.0 BCP (RFC 9700): DPoP sender-constrained tokens
Why: AP2/UCP mandate flows require provable authorization. FAPI/OpenID4VP are the emerging standards.
E. Managed Isolation (Future)
- Managed Runners: Cloud-hosted MicroVMs (Firecracker/gVisor)
- Zero-Infra: assay run --remote ... transparent offloading
Backlog / Deferred
Evidence Streaming Mode (Optional)
- Streaming Mode: Native events + async mapping for real-time OTel export and Evidence Store ingest
- EventSink trait with AggregatingProfileSink (default) and StreamingSink (feature-gated)
- assay evidence stream command (NDJSON to stdout/file)
- Backpressure handling, bounded memory
- See ADR-008 for design
Note: This is a product capability, not a refactoring item. The current ProfileCollector → EvidenceMapper pipeline is correct per OTel Collector pattern. Streaming mode adds an alternative path for real-time use cases without changing the default behavior.
Runtime Extensions (Epic G)
- ABI 6/7: Signal scoping (v6), Audit Logging (v7)
- Learn from Denials: Policy improvement from blocked requests
Hash Chains (Epic K)
- Tool Metadata Linking: Link tool definitions to policy snapshots
- Integrity Verification: Cryptographic tool-to-policy binding
HITL Implementation (Epic L)
- Decision Variant + Receipts: Human-in-the-loop tracking
- Guardrail Hooks: NeMo/Superagent integration
Pack Marketplace (Future)
- Partner packs: Third-party packs via marketplace (rev share model)
Foundation (Completed 2025)
The core execution and policy engine is stable and production-ready.
Core Engine
- Core Sandbox: CLI runner with Landlock isolation (v1-v4 ABI)
- Policy Engine v2: JSON Schema for argument validation
- Profiling: "Learning Mode" to generate policies from traces
- Enforcement: strict/fail-closed modes, environment scrubbing
- Tool Integrity (Phase 9): Tool metadata hashing and pinning
Runtime Security
- Runtime Monitor: eBPF/LSM kernel-level enforcement
- Policy Compilation: Tier 1 (kernel/LSM) and Tier 2 (userspace)
- MCP Server: Runtime policy enforcement proxy
Testing & Validation
- Trace Replay: Deterministic replay without LLM API calls
- Baseline Regression: Compare runs against historical baselines
- Agent Assertions: Sequence and structural expectations
- Quarantine: Flaky test management
Developer Experience
- Python SDK: AssayClient, Coverage, Explainer, pytest plugin
- Doctor: Diagnostic tool for common issues
- Explain: Human-readable violation explanations
- Coverage Analysis: Policy coverage calculation
- Auto-Fix: Agentic policy fixing with risk levels
- Demo: Demo environment generator
- Setup: Interactive system setup
Reporting & Integration
- Multiple Formats: Console, JSON, JUnit, SARIF
- OTel Integration: Trace ingest and export
- CI Integration: GitHub Actions / GitLab CI workflows
Advanced Features
- Attack Simulation: assay sim hardening/compliance testing
- MCP Discovery: Auto-discovery of MCP servers
- MCP Management: Kill/terminate MCP servers
- Experimental: MCP process wrapper (hidden command)
Open Core Philosophy
Assay follows the open core model (Semgrep pattern): engine + baseline packs are open source, managed workflows + pro packs are enterprise.
See ADR-016: Pack Taxonomy for formal definition.
Open Source (Apache 2.0)
Everything needed to create, verify, and analyze evidence locally:
| Category | Components |
|---|---|
| Evidence Contract | Schema v1, JCS canonicalization, content-addressed IDs, deterministic bundles |
| CLI Workflow | export, verify, lint, diff, explore, show |
| BYOS Storage | push, pull, list with S3/Azure/GCS/local backends |
| Basic Signing | Ed25519 local key signing and verification (v2.9.0) |
| Pack Engine | --pack loader, composition, SARIF output, digest verification (v2.10.0) |
| Baseline Packs | eu-ai-act-baseline (Article 12 mapping, v2.10.0), soc2-baseline (Common Criteria baseline, ADR-022) |
| Mandate Evidence | Mandate types, signing, runtime enforcement, CloudEvents lifecycle (v2.11.0) |
| Runtime Security | Policy engine, MCP proxy, eBPF/LSM monitor, mandate authorization |
| Developer Experience | Python SDK, pytest plugin, GitHub Action |
| Output Formats | SARIF, JUnit, JSON, console, NDJSON (audit/decision logs) |
Why open: Standards adoption requires broad accessibility. The evidence format and baseline compliance checks should become infrastructure, not a product moat.
Enterprise Features (Commercial)
Governance workflows and premium compliance for organizations:
| Category | Components |
|---|---|
| Identity & Access | SSO/SAML/SCIM, RBAC, teams, approval workflows |
| Pro Compliance Packs | eu-ai-act-pro (biometric rules, PDF reports), soc2-pro, industry packs (assurance depth + maintained mappings + auditor-friendly reporting) |
| Managed Workflows | Exception approvals, policy exceptions (waivers with expiry/owner/rationale), scheduled scans, compliance dashboards |
| Auditor Portal | Read-only export, audit-ready bundles, packs + results + fingerprints (when Managed Store exists) |
| Advanced Signing | Sigstore keyless, transparency log verification, org trust policies |
| Managed Storage | WORM retention, legal hold, compliance attestation |
| Integrations | SIEM connectors (Splunk/Sentinel), OTel pipeline templates |
| Fleet Management | Policy distribution, runtime agent management |
Principle: Gate workflow scale and org operations, not basic compliance checks. The "workflow moat" strategy: engine free, baseline free, managed workflows paid.