Does “XBOW AI Hacker” Deserve the Hype?
Last week, I wrote an article about how AI hacker agents could reduce the need for human penetration testers. Then, as if on cue, people started sending me the same piece of news: an AI company called XBOW claims to have built exactly that product, an AI hacker, and it has reached the top of a HackerOne leaderboard.
Is this a breakthrough or just marketing hype? As someone who has spent years doing penetration testing and previously worked at HackerOne as a triage analyst, I want to share my thoughts.
What Can We Already Solve Without AI?
Automated application security testing is nothing new. DAST (Dynamic Application Security Testing) tools like Burp Suite and Invicti (the merger of Acunetix and Netsparker) have been around for 20 years and are continuously improving. These tools use a crawling engine to traverse the application, gathering requests, endpoints, and parameters. Armed with massive payload lists, they bombard the target and inspect each response (or the rendered page) for signs of a vulnerability. They are particularly good at finding injection issues like SQL Injection, Command Injection, XSS, and SSRF, because the methods for detecting them are deterministic.
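To make the "bombard and check" loop concrete, here is a minimal sketch of the core idea behind reflected-XSS detection in a DAST tool. Everything here is illustrative: `fetch` is a stand-in for a real HTTP client, and the mock target simply echoes the query back, standing in for a vulnerable app.

```python
# Minimal sketch of the core DAST loop: inject payloads into a
# parameter and flag any response that reflects them unencoded.
# `fetch` and the payload list are illustrative assumptions.

XSS_PAYLOADS = ['<script>alert(1)</script>', '"><img src=x onerror=alert(1)>']

def fetch(url: str) -> str:
    """Stand-in for an HTTP GET; this mock app echoes the query back."""
    query = url.split("q=", 1)[1] if "q=" in url else ""
    return f"<html><body>You searched for: {query}</body></html>"

def scan_param(base_url: str, param: str) -> list[str]:
    """Return payloads that come back unencoded (likely reflected XSS)."""
    findings = []
    for payload in XSS_PAYLOADS:
        body = fetch(f"{base_url}?{param}={payload}")
        if payload in body:  # deterministic check: exact reflection
            findings.append(payload)
    return findings

print(scan_param("https://example.test/search", "q"))  # both payloads reflect
```

This is exactly why scanners do well here: the check is mechanical, and a hit is unambiguous.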
However, their scope is limited. Tools like Burp and Invicti don't handle subdomain enumeration, and they won't detect misconfigurations in cloud setups or exposed API endpoints. That's where open-source tools shine: subfinder can enumerate subdomains, while nuclei can find exposed endpoints, secrets, vulnerabilities, and configuration mistakes. Both are widely used by bug bounty hunters.
What Remains Unsolved Without Human Input?
Most vulnerabilities lie behind authentication, and configuring scanners to stay authenticated is not straightforward. They usually struggle with complex login flows, session handling, and dynamic content. Sometimes the only reliable approach is to crawl the application manually and let the scanner consume the captured requests.
More importantly, certain vulnerabilities simply can't be detected without an understanding of business logic. Take IDOR (Insecure Direct Object Reference) issues as an example. If a receipt ID is guessable and someone can view another user's data by incrementing the number, that's a critical issue. But it's not something DAST tools can reason about. They don't "understand" context. They also struggle with logic flaws like abusing discount mechanisms, manipulating shopping cart prices, race conditions, or exploiting misconfigured OAuth flows.
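The IDOR check a human performs can be sketched in a few lines. This is a hypothetical illustration: `api_get` and the in-memory `RECEIPTS` table stand in for a real API and its server-side state, and the mock endpoint deliberately never checks ownership.

```python
# Hypothetical sketch of a manual IDOR probe: take a receipt ID you
# own, increment/decrement it, and flag any cross-user reads.
# `api_get` and RECEIPTS are illustrative stand-ins, not a real API.

RECEIPTS = {100: "alice", 101: "bob", 102: "alice"}  # mock server-side state

def api_get(receipt_id: int, session_user: str) -> dict:
    """Stand-in for GET /receipts/<id>; this mock never checks ownership."""
    owner = RECEIPTS.get(receipt_id)
    return {"status": 200, "owner": owner} if owner else {"status": 404}

def probe_idor(known_id: int, session_user: str, window: int = 2) -> list[int]:
    """Walk neighbouring IDs and report receipts belonging to other users."""
    leaks = []
    for rid in range(known_id - window, known_id + window + 1):
        resp = api_get(rid, session_user)
        if resp["status"] == 200 and resp["owner"] != session_user:
            leaks.append(rid)  # e.g. alice reading bob's receipt: an IDOR
    return leaks

print(probe_idor(100, "alice"))  # → [101]
```

The mechanical part (incrementing an ID) is trivial; the hard part, which scanners lack, is knowing that a receipt is user-scoped data in the first place and that ID 101 belonging to someone else is a finding rather than normal behavior.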
So What Does XBOW Actually Solve?
Given the AI hype, you'd expect XBOW to tackle the hard problems: logic bugs, authentication complexities, or deep contextual understanding. But based on their own blog post, XBOW seems to be solving problems that are already solved. They highlight vulnerabilities like Remote Code Execution, SQL Injection, XXE, Path Traversal, SSRF, and XSS. These are serious vulnerabilities, but they are also the bread and butter of conventional DAST tools.
Unless they're outperforming existing tools in terms of signal-to-noise ratio (i.e., fewer false positives, better accuracy), there's nothing fundamentally new here. They also don't mention authentication at all, so we don't know how well they handle it.
They also mention a scoring system that ranks subdomains based on various factors: target attractiveness, WAF presence, HTTP status code, authentication forms, number of endpoints, etc. That's cool, but it sounds a lot like what tools such as httpx, gau, whatweb, and nuclei are already doing. If XBOW can do all of this flawlessly, I'd call it a good product. But if it stutters or produces incorrect results, that's hard to accept, because the same thing can be achieved for free, and it would be hard to justify the "AI hacker" label.
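For intuition, a toy version of such a subdomain-ranking heuristic might look like the following. The signals come from the factors listed above, but the field names and weights are my own illustrative assumptions, not XBOW's actual scoring model.

```python
# Toy subdomain scorer: weight recon signals (status code, login forms,
# endpoint count, WAF presence) into a single attractiveness score.
# All weights and field names are illustrative assumptions.

def score_subdomain(info: dict) -> int:
    score = 0
    if info.get("status") == 200:
        score += 2                                   # live app beats errors/redirects
    if info.get("has_login_form"):
        score += 3                                   # auth surface is attractive
    score += min(info.get("endpoints", 0), 20) // 5  # capped endpoint bonus
    if info.get("waf"):
        score -= 2                                   # a WAF raises exploitation cost
    return score

hosts = [
    {"host": "admin.example.test", "status": 200, "has_login_form": True,
     "endpoints": 40, "waf": False},
    {"host": "cdn.example.test", "status": 403, "has_login_form": False,
     "endpoints": 2, "waf": True},
]
ranked = sorted(hosts, key=score_subdomain, reverse=True)
print([h["host"] for h in ranked])  # → ['admin.example.test', 'cdn.example.test']
```

A pipeline of httpx (status codes, titles), whatweb (tech fingerprints), and nuclei (exposures) can feed exactly these signals today, which is why the scoring idea alone doesn't feel novel.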
About That HackerOne Leaderboard Claim
I want to underline this for everyone: XBOW topped the VDP (Vulnerability Disclosure Program) leaderboard, not the bug bounty leaderboard. That distinction matters. Most good hackers don't spend time on VDPs, so there's far less competition in the VDP space.
That said, topping the VDP leaderboard is still not an easy thing to do. If XBOW managed to find valid bugs across multiple programs using “just their software”, that’s impressive. But we don’t know how much of it was automated vs. human-assisted.
They also didn't share the vulnerability reports, so we don't know whether these were low-hanging fruit or sophisticated issues.
Final Thoughts
As I wrote in my earlier article, I believe AI agents will reduce the demand for human testers over time. But is XBOW that turning point? We don't know. We need independent validation. If the tool can consistently outperform traditional scanners (without getting super expensive), then it deserves real recognition. Until then, I'm keeping my expectations in check.