AI Agents, Bot Auth, and TLS Identity
Two things changed quietly in 2024–25. First, AI scrapers became the majority of non-human web traffic. Second, the web finally got a serious mechanism for AI bots to prove who they are: Web Bot Auth, built on HTTP Message Signatures (RFC 9421). This guide explains what's happening, what's at stake for site operators, and how to verify the bots claiming to crawl you.
The Problem: User-Agent Strings Lie
Until 2024, the only thing a polite AI crawler could do to identify itself was set a
User-Agent header (GPTBot, PerplexityBot,
ClaudeBot, Google-Extended) and hope you trusted it. The
impolite ones just claimed to be Chrome. Operators who wanted to block AI training
scrapers had three options, all bad:
- Trust the user-agent (easily spoofed).
- Maintain reverse-DNS verification scripts per vendor (high maintenance).
- Block all unidentified bots and accept the false positives.
The asymmetry became intolerable. Cloudflare reported in 2024 that AI training and inference bots accounted for a double-digit percentage of all bot traffic to sites on its network, with both impersonation and robots.txt-ignoring behaviour well documented. The IETF picked up the problem, and the result is Web Bot Auth.
Web Bot Auth, in One Paragraph
A bot operator publishes a directory file at a well-known HTTPS URL
listing the public keys it uses to sign requests. When the bot fetches your page, it
adds a Signature-Input and Signature header per
RFC 9421 (HTTP Message Signatures), plus a
Signature-Agent header pointing to the directory. Your server fetches the
directory, verifies the signature using the listed key, and now cryptographically
knows the request really came from the claimed bot. No DNS games, no IP allow-lists,
no user-agent guessing.
The Standards Stack
| Component | Standard | Status |
|---|---|---|
| Message signatures | RFC 9421 — HTTP Message Signatures | Proposed Standard (Feb 2024) |
| Web Bot Auth profile | draft-cloudflare-web-bot-auth | IETF draft, in production at Cloudflare |
| Signature suites | Ed25519, ECDSA P-256, RSA-PSS (RFC 9421) | Stable |
| Discovery | /.well-known/http-message-signatures-directory | Convention, evolving |
What a Signed Request Looks Like
GET /article.html HTTP/2
Host: example.com
User-Agent: PerplexityBot/1.0 (+https://perplexity.ai/bot)
Accept: text/html
Signature-Agent: "https://perplexity.ai"
Signature-Input: sig1=("@authority" "@method" "@target-uri" "signature-agent");
created=1747396800;
keyid="ed25519-2026-01";
alg="ed25519";
expires=1747396830;
nonce="b3JpZ2luYWwtcmVxdWVzdC1ub25jZQ"
Signature: sig1=:MEUCIQDx5...truncated...==:
The server's verification flow:
- Read Signature-Agent → https://perplexity.ai
- Fetch https://perplexity.ai/.well-known/http-message-signatures-directory over HTTPS
- Look up keyid=ed25519-2026-01 in the directory's JWKS
- Reconstruct the signature base from the @authority, @method, @target-uri, and signature-agent components
- Verify the Ed25519 signature against that key
- Check created and expires are within tolerance
- If everything matches, this request really came from Perplexity
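The signature base the server reconstructs in step 4 is just a line-per-component string, ending with the signature parameters. A minimal sketch, using the component values from the example request above (the helper name is illustrative, not a library API):

```python
# Sketch of RFC 9421 signature-base reconstruction.
# Component values come from the example request above.

def signature_base(components: list, params: str) -> str:
    """Serialise covered components plus the @signature-params line."""
    lines = [f'"{name}": {value}' for name, value in components]
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines)

# Parameters exactly as they appear in Signature-Input, minus the "sig1=" label.
params = ('("@authority" "@method" "@target-uri" "signature-agent")'
          ';created=1747396800;keyid="ed25519-2026-01";alg="ed25519"'
          ';expires=1747396830;nonce="b3JpZ2luYWwtcmVxdWVzdC1ub25jZQ"')

base = signature_base([
    ("@authority", "example.com"),
    ("@method", "GET"),
    ("@target-uri", "https://example.com/article.html"),
    ("signature-agent", '"https://perplexity.ai"'),
], params)

# The value in the Signature header is the Ed25519 signature over the
# UTF-8 encoding of `base`; verification recomputes `base` and checks it.
```

Because the parameters line is covered by the signature itself, an attacker cannot strip or alter created, expires, or the nonce without invalidating the signature.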
The trust anchor is the directory: anyone who can serve content at perplexity.ai can sign as Perplexity. This is why the directory URL must be HTTPS, and why the bot operator's TLS posture is now part of the trust model.
The Identity Hierarchy
Web Bot Auth gives you a three-level identity for any bot:
- Level 0: Anonymous. User-agent only. No verification. Treat as untrusted by default.
- Level 1: Reverse-DNS-verified. The bot's source IP reverse-resolves to a hostname controlled by the claimed operator. This is what Google has used for Googlebot for years. Better than nothing, but DNS-poisoning attacks and shared-hosting hazards exist.
- Level 2: Signature-verified. The request is signed by a key published in the operator's directory, served over TLS. Cryptographic proof of identity per request.
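In enforcement terms, the three levels map naturally onto three default treatments. A hypothetical dispatch (the level numbers follow the hierarchy above; the action names are illustrative, not from any spec):

```python
# Hypothetical per-request bot policy keyed on verification level.
# Actions are illustrative defaults, not a standard.

def bot_policy(level: int) -> str:
    if level >= 2:                    # signature verified against the directory
        return "allow"                # apply per-operator policy with confidence
    if level == 1:                    # reverse-DNS matched the claimed operator
        return "allow-rate-limited"   # trust, but cap the crawl rate
    return "challenge"                # anonymous: untrusted by default
```

The useful property is that the policy degrades gracefully: a bot that has not yet adopted Web Bot Auth still gets Level 1 treatment, not an outright block.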
What Site Operators Should Do
Today
- Audit your bot traffic. Log the Signature-Agent header where present. Most production WAFs (Cloudflare, Fastly, Akamai) now expose it.
- Decide on an AI policy and encode it in robots.txt and llms.txt. Trustworthy AI bots will honour both.
- Use the signature when blocking. "Block PerplexityBot" is unreliable if you trust the user-agent; verifying the signature first makes the block enforceable.
- Whitelist with confidence. Some operators want to allow specific AI bots while blocking unidentified scrapers. Web Bot Auth is the first mechanism that makes this safe.
If you operate a bot
- Generate an Ed25519 keypair (or ECDSA P-256).
- Publish the public key at https://your-host/.well-known/http-message-signatures-directory in JWKS format, served over a properly configured HTTPS endpoint.
- Sign each request with the standard components: @authority, @method, @target-uri, signature-agent, created, expires.
- Rotate keys regularly (180 days is sensible) and keep two keys active during rotation.
- Treat the directory's TLS cert as the root of your bot's identity. Monitor it independently; don't let it expire.
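The directory document itself is small. A sketch of building the JWKS for a fresh Ed25519 key (the raw key bytes below are a placeholder, not a real key; the draft may also expect additional members, so treat this as the minimal shape):

```python
import base64
import json

def b64url(raw: bytes) -> str:
    """Base64url without padding, as JWK requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# Placeholder 32-byte public key; in practice this is the Ed25519
# public key from your keypair-generation step.
pub = bytes(range(32))

directory = {
    "keys": [{
        "kty": "OKP",        # octet key pair, per RFC 8037
        "crv": "Ed25519",
        "x": b64url(pub),    # raw public key, base64url-encoded
        "kid": "ed25519-2026-01",
    }]
}

# Serve this JSON at /.well-known/http-message-signatures-directory,
# ideally with a Cache-Control header so verifiers don't hammer it.
doc = json.dumps(directory)
```

During rotation, the keys array simply carries both the old and new entries until the old kid is retired.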
The TLS Connection Underneath
Web Bot Auth is built on top of TLS, and that's deliberate. Two parts of your TLS configuration now affect bot trust:
TLS handshake fingerprinting (JA3 / JA4)
Even before Web Bot Auth, operators were identifying clients by their TLS handshake characteristics — cipher order, extensions, supported groups. JA3 (legacy MD5) and its successor JA4 produce a stable fingerprint of the ClientHello. A real Chrome and a Go HTTP client produce different fingerprints regardless of user-agent.
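Classic JA3 is simple enough to sketch: join five ClientHello fields (decimal values, comma-separated fields, dash-separated lists) and MD5 the result. The field values below are illustrative, not a real browser's hello:

```python
import hashlib

def ja3(version: int, ciphers: list, extensions: list,
        groups: list, point_formats: list) -> str:
    """Classic JA3: MD5 over 'version,ciphers,extensions,groups,formats'."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, groups)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Illustrative values only. Real implementations strip GREASE values
# before hashing, and JA4 replaces the raw MD5 with a structured string.
fp = ja3(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
```

The same inputs always produce the same 32-hex-character fingerprint, which is exactly why a Go HTTP client cannot pass for Chrome by changing its user-agent alone.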
AI scraping libraries (Playwright in headless mode, curl-impersonate,
Puppeteer) can mimic browser fingerprints. Web Bot Auth is the answer to that arms
race: don't try to distinguish "real Chrome" from "Chrome-mimicking scraper" at the
TLS layer when you can have the operator sign cryptographically at the HTTP layer.
Your TLS posture matters more than ever
If your site can't negotiate TLS 1.3 with modern AEAD ciphers, a portion of agentic clients will downgrade or fail. Worse, an attacker who can MITM your TLS can intercept signed bot requests and replay them — the signature binds to host and method but not to the underlying TLS session. TLS hygiene is foundational; don't treat bot auth as separable.
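Since replay inside the validity window is the residual risk, verifiers should enforce the created/expires window tightly and remember nonces for its duration. A sketch, with an illustrative skew tolerance and an in-memory cache standing in for shared storage:

```python
import time

SEEN_NONCES = {}   # nonce -> expiry time; use shared storage in production
MAX_SKEW = 30      # seconds of tolerated clock skew (illustrative)

def fresh(created: int, expires: int, nonce: str, now=None) -> bool:
    """Accept a signature only once, and only inside its stated window."""
    now = time.time() if now is None else now
    if not (created - MAX_SKEW <= now <= expires + MAX_SKEW):
        return False   # outside the signer's validity window
    if nonce in SEEN_NONCES and SEEN_NONCES[nonce] > now:
        return False   # replay: same nonce seen inside its window
    SEEN_NONCES[nonce] = expires + MAX_SKEW
    return True
```

The cache only needs to hold each nonce until its window closes, so memory stays bounded even at crawl-scale request rates.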
Securing AI-Exposed APIs and MCP Servers
The same primitives matter when you expose an API for AI agents to call. Whether it's a public LLM endpoint, an internal MCP (Model Context Protocol) server, or a tool-use API, the TLS layer is doing more work than ever:
Practical hardening
- TLS 1.3 only, AEAD ciphers, ECDHE. No exceptions for "legacy clients" — LLM tooling is universally modern.
- mTLS for high-value endpoints. If the agent is calling your private API, client certificates are still the cleanest auth method. Combined with short-lived ACME-issued client certs, the rotation problem goes away.
- HTTP Message Signatures on requests and responses. The agent verifies it's really talking to you; you verify it's really the agent.
- Strict Content-Security-Policy and Cross-Origin-Resource-Policy on any HTML you serve to agentic browsers — they're still browsers, with the same XSS attack surface.
- Watch CT logs for your API hostname. If a phishing operation obtains a certificate for api-yourservice.com to fool agents into sending tokens, you want to know within minutes.
The Dark Side: AI-Generated Phishing and Free DV Certs
The same automation that made HTTPS universal also made it free for attackers. Generative
AI lowers the cost of producing convincing phishing pages to near-zero, and Let's
Encrypt happily issues a valid DV certificate for
banking-account-verify.com in 30 seconds. The padlock has never meant
less, and users still trust it.
Defensive moves that still work:
- Brand-aware CT monitoring. Watch CT logs for new certificates containing your brand name or homoglyphs.
- BIMI for email. Verified logos in mailbox UIs help users distinguish real from fake.
- Strict DMARC at p=reject. AI-generated phishing still relies on spoofed email; DMARC kills it at the protocol layer.
- Hardware-token MFA. The only auth factor that survives an AI-generated phishing page that captures everything else.
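The core of brand-aware CT monitoring is a homoglyph-tolerant match on each newly logged hostname: NFKD-normalise, strip combining marks, then look for the brand string. Real tooling layers Unicode confusables tables and edit distance on top; this sketch is only the accent-folding core:

```python
import unicodedata

def matches_brand(hostname: str, brand: str) -> bool:
    """Crude homoglyph-aware check: fold accents, then substring-match."""
    folded = unicodedata.normalize("NFKD", hostname)
    ascii_only = folded.encode("ascii", "ignore").decode().lower()
    return brand.lower() in ascii_only

# Catches accented lookalikes like páypal-login.com; cross-script
# confusables (e.g. Cyrillic 'а') need a confusables table such as
# the Unicode TR39 data on top of this.
```

Run it over a CT-log feed and the hit list becomes your triage queue for takedown requests.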
What's Next
The shape of the next two years:
- Web Bot Auth moves from draft to RFC, probably 2026–27. Standardised verification libraries appear as nginx modules and Caddy plugins.
- Major mailbox providers adopt RFC 9421 for email API auth, replacing shared-secret SMTP submission for high-volume senders.
- Browser-side agent identity — signed user-agent claims from real users, distinct from bot signatures — becomes the next interesting problem.
- The PQC migration for signature algorithms arrives just in time to apply to all of this. Expect ML-DSA-65 keys in bot directories alongside Ed25519 within two years.