AdversAI — Autonomous AI Pentesting

the problem

Attackers run 24/7.
Defenders run once a quarter.

01
Manual pentests ship too late
A human engagement runs twice a year and covers a fraction of your attack surface. Attackers don't scope.
02
Scanners are loud and shallow
Commodity scanners read headers and call it a day. They miss CORS reflection, DKIM drift, LLM injection, and attack-surface-management (ASM) deltas.
03
LLM apps are a new attack surface
Jailbreaks, prompt injection, tool abuse, retrieval leaks. Most tools can't even see them, let alone test them.
04
Guardrails are mostly paper
CLI caps that don't cap. Scope flags that never reach the engine. Paper guardrails convert into production bills.

the stack · core

Five reasoning engines. One ReAct loop.

Every call routes through the same scope validator. Every finding routes through the same ReAct reasoning trace. Every trace is replayable on a network-blocked provider, deterministically.
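
A per-call scope check can be sketched roughly as follows. This is a minimal illustration, not the AdversAI validator: the names `ScopeValidator`, `in_scope`, `require`, and `OutOfScopeError` are hypothetical, and the matching rule (exact host, or a subdomain of an allowlisted domain) is an assumption.

```python
from urllib.parse import urlsplit


class OutOfScopeError(Exception):
    """Raised when a call targets a host outside the allowlist."""


class ScopeValidator:
    # Hypothetical helper for illustration only.
    def __init__(self, allowed_domains):
        self.allowed = {d.lower().rstrip(".") for d in allowed_domains}

    def in_scope(self, target: str) -> bool:
        # Accept bare hosts ("acme.example") as well as full URLs.
        host = urlsplit(target if "//" in target else f"//{target}").hostname
        if not host:
            return False
        host = host.lower().rstrip(".")
        # Require a dot boundary so "evilacme.example" does not match "acme.example".
        return any(host == d or host.endswith("." + d) for d in self.allowed)

    def require(self, target: str) -> str:
        """Return the target unchanged, or raise if it is out of scope."""
        if not self.in_scope(target):
            raise OutOfScopeError(target)
        return target
```

The point of funnelling every engine through one `require` call is that a scope flag cannot be accepted at the CLI and then silently ignored further down.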

the stack · v3.1 deep extraction

Eight new engines. Evidence, not warnings.

Tier 1 runs pre-ReAct on every scan. Each engine pulls actual extracted content from the target — bundle contents, buckets, schemas, historical infra — and writes the evidence to disk next to the finding.

tier 1 · 01
JS Bundle Miner
Scans webpack/vite bundles for 20+ secret classes (AWS, Stripe, GitHub, Supabase, Firebase, JWTs).
hoppa.global — 715KB bundle beautified, regex-scanned, zero leaks
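
For a sense of what regex-scanning a bundle involves, here is a small sketch covering four secret classes. The patterns are widely used heuristics, not AdversAI's rule set, and `SECRET_PATTERNS` and `scan_bundle` are hypothetical names.

```python
import re

# Heuristic patterns for a few secret classes (illustrative, not exhaustive).
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[0-9a-zA-Z]{24,}\b"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
}


def scan_bundle(text: str):
    """Return (secret_class, matched_text) pairs found in bundle text."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits
```

Pattern matches are candidates, not confirmed leaks. A real pipeline would still need to weed out placeholders and test fixtures.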
tier 1 · 02
Source Map Reconstructor
Rebuilds full source code when .js.map files ship to prod — 14 URL patterns probed per target.
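
Reconstruction works because a version 3 source map can embed the original files: `sources` lists the paths and the optional `sourcesContent` array carries the matching file bodies. A minimal sketch, assuming the map has already been downloaded; `reconstruct_sources` is a hypothetical name.

```python
import json
import posixpath


def reconstruct_sources(map_text: str) -> dict:
    """Map each recoverable source path to its original content."""
    data = json.loads(map_text)
    sources = data.get("sources", [])
    contents = data.get("sourcesContent") or []
    recovered = {}
    for path, content in zip(sources, contents):
        if content is None:  # path listed, but content not embedded
            continue
        # Drop scheme prefixes such as "webpack://".
        clean = path.split("://", 1)[-1]
        # Anchor at "/" so "../" segments cannot climb out of the output tree.
        clean = posixpath.normpath("/" + clean).lstrip("/")
        recovered[clean] = content
    return recovered
```

If `sourcesContent` is absent, the map still reveals file paths but not file contents.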
tier 1 · 03
GraphQL Introspector
Dumps complete schema, detects exposed playgrounds (GraphiQL / Apollo), writes SDL + JSON.
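
Schema dumping relies on the standard GraphQL introspection fields (`__schema`, `types`). The sketch below builds a deliberately small introspection request and reads type names back from a response. The helper names are hypothetical, and a real dump would request far more fields than this.

```python
import json

# Minimal introspection query; full tooling usually asks for fields, args, and directives too.
INTROSPECTION_QUERY = "{ __schema { queryType { name } types { name kind } } }"


def build_request_body() -> bytes:
    """JSON body for an HTTP POST to a GraphQL endpoint."""
    return json.dumps({"query": INTROSPECTION_QUERY}).encode("utf-8")


def user_defined_types(response: dict) -> list:
    """Type names from an introspection response, skipping built-in '__' types."""
    schema = (response.get("data") or {}).get("__schema")
    if not schema:
        return []  # introspection disabled, or the server returned errors
    return sorted(t["name"] for t in schema["types"] if not t["name"].startswith("__"))
```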
tier 1 · 04
.well-known Enumerator
Harvests robots.txt, OIDC config, security.txt, sitemap — parses Disallow paths for sensitive surface.
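
Parsing Disallow paths is simple enough to show in full. A rough sketch, assuming the robots.txt body has already been fetched; `disallowed_paths` is a hypothetical name and user-agent grouping is ignored on purpose.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Collect unique Disallow values in order of appearance."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value and value not in paths:  # an empty Disallow means "allow all"
                paths.append(value)
    return paths
```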
tier 1 · 05
Subdomain Deep Scanner
Certificate transparency + DNS brute + amass passive → full historical subdomain inventory, live-host classified.
hoppa.global — staging. + test. surfaced as HIGH
tier 1 · 06
Bucket Enumerator
Finds exposed S3 / GCS / Azure buckets by brand pattern, parses XML listings, captures up to 100 keys.
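
An exposed S3-style bucket answers a listing request with XML in which each object appears as a `Key` element. The sketch below pulls keys out of such a listing and stops at the cap of 100 mentioned above. `listed_keys` is a hypothetical name, and the listing body is assumed to be already fetched.

```python
import xml.etree.ElementTree as ET

MAX_KEYS = 100  # cap taken from the engine description


def listed_keys(xml_text: str, limit: int = MAX_KEYS) -> list:
    """Return up to `limit` object keys from a bucket listing."""
    # For untrusted XML, a hardened parser such as defusedxml is worth considering.
    root = ET.fromstring(xml_text)
    keys = []
    for elem in root.iter():
        # Namespaced tags look like "{namespace}Key"; compare on the local name.
        if elem.tag.rsplit("}", 1)[-1] == "Key" and elem.text:
            keys.append(elem.text)
            if len(keys) >= limit:
                break
    return keys
```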
tier 1 · 07
Full Port Scanner
nmap top-100 with service detection. Risky ports (MSSQL / Postgres / Redis / Mongo / K8s API) → CRITICAL.
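
The port-to-severity rule can be sketched as a lookup over default service ports. The port numbers below are the conventional defaults for the services named above. `RISKY_PORTS` and `grade_open_ports` are hypothetical names, and the input is assumed to be a list of open ports already parsed from scanner output.

```python
# Conventional default ports for the services listed in the description.
RISKY_PORTS = {
    1433: "MSSQL",
    5432: "Postgres",
    6379: "Redis",
    27017: "MongoDB",
    6443: "Kubernetes API",
}


def grade_open_ports(open_ports):
    """Return a CRITICAL finding for every open port on the risky list."""
    findings = []
    for port in sorted(set(open_ports)):
        service = RISKY_PORTS.get(port)
        if service:
            findings.append({"port": port, "service": service, "severity": "CRITICAL"})
    return findings
```

A port-number lookup is only a first pass. Service detection is what confirms that the listener really is the database it appears to be.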
tier 1 · 08
DNS History Miner
SecurityTrails + VirusTotal + DNSDumpster → every IP the target ever used → finds forgotten legacy infra.

Dogfooded on hoppa.global — 24s wall-clock, 21 findings, 2 HIGH (staging & test subdomains surfaced). 458 tests green, 75% engine coverage. Version 3.1.0.

jurisdiction

Offensive security, licensed in-region.

AdversAI operates under a UAE federal offensive-security authorization. For GCC clients — federal, financial, critical-infra — that means engagements stay jurisdictionally clean: license in-region, execution in-region, evidence in-region.

01
UAE-licensed offensive security
Operating under a live UAE federal information-security license. Engagement letters signed in-region, offensive work authorized by national framework — not a re-sold US license or a self-declared statement.
02
GCC-approved. Federal-scope authorized.
Scope approved for Gulf Cooperation Council entities, including federal agencies and regulated sectors (finance, telco, critical infra). Cleared to test production systems, not just staging sandboxes.
03
Data sovereignty — findings stay in-region
All scan artefacts, extracted secrets, replay caches, and reasoning traces live on Tailscale-private infra inside the UAE. No third-party SaaS ingestion. No cross-border transfer of your evidence.

license number available on request under NDA

the ultrareview

Every claim audited.
Every guardrail honest.

Three parallel adversarial agents audit every major commit — Security, Correctness, Quality — with grep-verified findings and file/line citations. When we find a paper guardrail in our own code, we publish the fix before the changelog.

passed · 2026-04-17

why it matters

Security

Hunts for injection, auth bypass, and dead-plumbing risk in our own code before it reaches yours.

No eval, exec, unsafe YAML, or shell=True anywhere in the codebase.

CSP with strict-dynamic + 144-bit CSPRNG nonce per request; unsafe-eval dropped entirely.
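
A 144-bit nonce is 18 bytes from a CSPRNG. A minimal sketch of per-request nonce generation and the matching header value follows. The function names are hypothetical, and the real policy presumably carries more directives than the single `script-src` shown here.

```python
import base64
import secrets


def new_nonce() -> str:
    # 18 bytes = 144 bits; 18 is divisible by 3, so base64 yields 24 chars with no padding.
    return base64.b64encode(secrets.token_bytes(18)).decode("ascii")


def csp_header(nonce: str) -> str:
    """Content-Security-Policy value with a per-request nonce and no unsafe-eval."""
    return f"script-src 'nonce-{nonce}' 'strict-dynamic'"
```

The nonce only helps if the same value is also written into the `nonce` attribute of each trusted script tag in that response. A nonce that is generated but never consumed is a paper guardrail.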

_block_network patches socket/getaddrinfo/gethostbyname — replay mode proven hermetic.
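
The idea behind patching those three entry points can be sketched as a context manager. This is an illustration of the technique, not the project's `_block_network`; `block_network` and `NetworkBlockedError` are hypothetical names.

```python
import socket
from contextlib import contextmanager


class NetworkBlockedError(RuntimeError):
    """Raised when code attempts network access while blocked."""


def _refuse(*args, **kwargs):
    raise NetworkBlockedError("network access is disabled in replay mode")


@contextmanager
def block_network():
    """Temporarily replace socket creation and DNS lookups with a refusing stub."""
    saved = (socket.socket, socket.getaddrinfo, socket.gethostbyname)
    socket.socket = _refuse
    socket.getaddrinfo = _refuse
    socket.gethostbyname = _refuse
    try:
        yield
    finally:
        # Always restore the originals, even if the body raised.
        socket.socket, socket.getaddrinfo, socket.gethostbyname = saved
```

One caveat of this approach: code that ran `from socket import socket` before the patch keeps its own reference to the original, so a hermeticity claim needs a test that actually attempts a connection.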

why it matters

"A pentest platform that can be tricked into exfiltrating its own customer data is worthless."

AdversAI is built by operators who've watched automated pentest tools ship paper guardrails — CLI flags that look like caps but never reach the engine, CSP nonces that never get consumed, test suites that monkey-patch the very signature the production code breaks. Every time we catch one of those in our own code, we publish it: the finding, the file, the line, the fix.
— adversai operating doctrine

what you get back

Findings ship as Markdown, JSON, and SARIF.

Reproduction steps, remediation, CVSS, replayable trace. Drop straight into PRs, Jira, or your SIEM.
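
For readers unfamiliar with SARIF, the sketch below shows the minimal shape of a SARIF 2.1.0 log built from a list of findings. The input field names (`rule_id`, `severity`, `title`) and the severity-to-level mapping are assumptions for illustration, not AdversAI's report schema.

```python
import json

# Assumed mapping from finding severity to SARIF result levels.
LEVELS = {"CRITICAL": "error", "HIGH": "error", "MEDIUM": "warning"}


def to_sarif(findings, tool_name="AdversAI"):
    """Build a minimal SARIF 2.1.0 log as a plain dict."""
    results = [
        {
            "ruleId": f["rule_id"],
            "level": LEVELS.get(f["severity"], "note"),
            "message": {"text": f["title"]},
        }
        for f in findings
    ]
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


if __name__ == "__main__":
    demo = [{"rule_id": "subdomain-exposed", "severity": "HIGH", "title": "staging subdomain is live"}]
    print(json.dumps(to_sarif(demo), indent=2))
```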

live demo

Real scans. Real output. No mockup.

These are verbatim traces from production scans, anonymized. Tap an engine to watch it run.

passive recon on acme.example

adversai · osint.trace · hermetic · replay mode

the landscape

What the field actually looks like.

We read the competitors' READMEs, ran their demos, and ported their best ideas (Shannon-style reasoning, PentAGI's autonomy arc). Here's an honest grid. Where a rival is better, we say so.

Capability	AdversAI v3.1	Shannon Lite	Pentest Swarm AI	Strix	Artemis	METATRON	PentAGI	PyRIT	garak
Deep content extraction	Yes (8 engines)	No	No	Partial	Partial	No	No	No	No
Secret extraction from bundles	Yes (20+ classes)	No	No	Partial	No	No	No	No	No
OSINT engine	Yes	No (white-box)	Yes	Yes	Yes	Partial	Yes	No	No
Network engine	Yes	Yes	Yes	Yes	Yes	Yes	Yes	No	No
Reasoning loop	Yes (ReAct)	Yes (2-stage)	Yes (ReAct)	Yes (agentic)	Partial	Yes (multi-step)	Yes	No	No
Autonomous exploit	Yes (playbook)	Yes (PoC req'd)	Yes	Yes	No	Partial	Yes	No	No
LLM red-team	Yes	No	No	No	No	No	No	Yes	Yes
Scope guardrails	Per-call validator	Implicit (source)	Yes	Yes	Yes (CERT-PL)	Yes	Yes	N/A	N/A
Black-box (no source)	Yes	No	Yes	Yes	Yes	Yes	Yes	Yes	Yes
Replayable trace	Yes	No	No	No	Partial	No	No	Partial	Partial
Licensed / in-region	UAE federal	No	No	No	PL national	No	No	No	No
Providers	4 + Bedrock	Claude only	Claude only	Multi	N/A	N/A	Multi	Multi	Multi

Sources: Shannon Lite (GitHub KeygraphHQ/shannon), Pentest Swarm AI (Armur-Ai/Pentest-Swarm-AI), Strix (usestrix), Artemis (CERT-PL/Artemis), METATRON (agentic-pentest), PentAGI (vxcontrol/pentagi), PyRIT (Microsoft), garak (NVIDIA). Verified 2026-04-18.

Our differentiator: UAE federal license + deep content extraction (20+ secret classes, source-map reconstruction, historical-infra enumeration) + evidence-grade reports (Markdown, JSON, SARIF) with replayable reasoning traces.

unified scoring

One number.
Grounded in five.

Every finding is graded by engine-local severity, then fused into a single composite score weighted by attacker cost, exploitability, and blast radius.

0–39: Hardened. Minor posture notes only.
40–74: Findings exist. Patch path is clear.
75–100: Active exposure. Mitigate before it costs you.
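
To make the fusion concrete, here is one way a weighted composite and the three bands could be computed. The weights, the 0-to-1 input scale, and the inversion of attacker cost (a cheaper attack scores higher) are all assumptions; the page does not publish the actual formula.

```python
def finding_score(attacker_cost, exploitability, blast_radius, weights=(0.3, 0.4, 0.3)):
    """Fuse three factors in [0, 1] into a 0-100 score. Weights are hypothetical."""
    w_cost, w_exp, w_blast = weights
    # Low attacker cost means higher risk, so that factor is inverted.
    raw = w_cost * (1.0 - attacker_cost) + w_exp * exploitability + w_blast * blast_radius
    return round(100 * raw / (w_cost + w_exp + w_blast))


def band(score: int) -> str:
    """Map a composite score onto the three bands listed above."""
    if score <= 39:
        return "Hardened"
    if score <= 74:
        return "Findings exist"
    return "Active exposure"
```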

pricing

Start free. Grow when you need to.

The Solo tier ships the full kit — you're trusted to run responsibly. Paid tiers add the glue teams actually pay for: shared scope, audit logs, SSO, incident paths.

Tier 2 authenticated testing and Tier 3 full red team engagements require a signed engagement letter — contact us.

Solo
$0 · self-hosted
One operator. Full kit.
- All 5 engines, unrestricted
- CLI + JSON/Markdown reports
- Bring your own LLM (Claude, OpenAI, Ollama, LM Studio)
- Replay cache + non-mocked test harness
- Community Discord
Get the code
Team (most picked)
$499 per month · 5 seats
Shared scope, shared reports.
- Everything in Solo
- Shared scope registry + audit log
- Playbook library + ASM watcher
- HackerOne / Bugcrowd scope sync
- Priority Slack support, business-hours SLA
Start trial
Enterprise
Custom · annual
SOC-aligned. On your infra.
- Everything in Team
- SSO, SCIM, RBAC
- Private deployment · AWS Bedrock preview
- Dedicated scope-review engineer
- 24/7 incident path · named TAM
Book a walkthrough
Research
Free* · reviewed per request
Academics, CERTs, non-profits.
- Same kit as Team
- Publish findings — we'll coordinate disclosure
- Citation in quarterly security notes
- * Subject to scope review
Apply

start

Break in first.
Ship it cleaner.

Pull AdversAI today. Point it at an allowlisted target. See a trace in your terminal before your coffee lands.

Get the code · Re-watch the demo

· no card · no lock-in · MIT for the core ·

We break in
before they do.

Five reasoning engines, one ReAct loop: OSINT, Network, Cortex, Phantom, LLM Red-Team.
