AI Agents, Bot Auth, and TLS Identity

Two things changed quietly in 2024–25. First, AI scrapers became the majority of non-human web traffic. Second, the web finally got a serious mechanism for AI bots to prove who they are: Web Bot Auth, built on HTTP Message Signatures (RFC 9421). This guide explains what's happening, what's at stake for site operators, and how to verify the bots claiming to crawl you.

The Problem: User-Agent Strings Lie

Until 2024, the only thing a polite AI crawler could do to identify itself was set a User-Agent header (GPTBot, PerplexityBot, ClaudeBot, Google-Extended) and hope you trusted it. The impolite ones just claimed to be Chrome. Operators who wanted to block AI training scrapers had three options, all bad:

  • Trust the user-agent (easily spoofed).
  • Maintain reverse-DNS verification scripts per vendor (high maintenance).
  • Block all unidentified bots and accept the false positives.

That asymmetry became untenable. Cloudflare reported in 2024 that AI training and inference bots accounted for a double-digit percentage of all bot traffic to sites on its network, with both user-agent impersonation and robots.txt-ignoring behaviour well documented. The IETF picked up the problem, and the result is Web Bot Auth.

Web Bot Auth, in One Paragraph

A bot operator publishes a directory file at a well-known HTTPS URL listing the public keys it uses to sign requests. When the bot fetches your page, it adds a Signature-Input and Signature header per RFC 9421 (HTTP Message Signatures), plus a Signature-Agent header pointing to the directory. Your server fetches the directory, verifies the signature using the listed key, and now cryptographically knows the request really came from the claimed bot. No DNS games, no IP allow-lists, no user-agent guessing.

The Standards Stack

  • Message signatures: RFC 9421 (HTTP Message Signatures). Status: Proposed Standard, February 2024.
  • Web Bot Auth profile: draft-cloudflare-web-bot-auth. Status: IETF draft, in production at Cloudflare.
  • Signature suites: Ed25519, ECDSA P-256, RSA-PSS (per RFC 9421). Status: stable.
  • Discovery: /.well-known/http-message-signatures-directory. Status: convention, still evolving.
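
The directory itself is a small JSON document, essentially a JWK Set, served over HTTPS at the well-known path. The example below is purely illustrative: the key ID and key material are placeholders, and the exact media type and required members are defined by the draft.

{
  "keys": [
    {
      "kty": "OKP",
      "crv": "Ed25519",
      "kid": "ed25519-2026-01",
      "x": "<base64url-encoded raw Ed25519 public key>"
    }
  ]
}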

What a Signed Request Looks Like

GET /article.html HTTP/2
Host: example.com
User-Agent: PerplexityBot/1.0 (+https://perplexity.ai/bot)
Accept: text/html
Signature-Agent: "https://perplexity.ai"
Signature-Input: sig1=("@authority" "@method" "@target-uri" "signature-agent");
                  created=1747396800;
                  keyid="ed25519-2026-01";
                  alg="ed25519";
                  expires=1747396830;
                  nonce="b3JpZ2luYWwtcmVxdWVzdC1ub25jZQ"
Signature: sig1=:MEUCIQDx5...truncated...==:

The server's verification flow (a Python sketch follows the list):

  1. Read the Signature-Agent header: https://perplexity.ai
  2. Fetch https://perplexity.ai/.well-known/http-message-signatures-directory over HTTPS
  3. Look up keyid=ed25519-2026-01 in the directory's JWKS
  4. Reconstruct the signature base from the @authority, @method, @target-uri, and signature-agent components
  5. Verify the Ed25519 signature against that key
  6. Check created and expires are within tolerance
  7. If everything matches, this request really came from Perplexity
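
A minimal verification sketch in Python, assuming the directory JWKS has already been fetched and cached. It uses the pyca/cryptography package; the function name and arguments are illustrative, and a production verifier must parse the Signature-Input structured field, preserve the exact component ordering and parameter serialisation the signer used, and track nonces for replay protection rather than hard-coding anything.

import base64
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def b64url_decode(value: str) -> bytes:
    # JWK fields use unpadded base64url.
    return base64.urlsafe_b64decode(value + "=" * (-len(value) % 4))


def verify_bot_request(jwks: dict, keyid: str, signature_b64: str,
                       components: dict, signature_params: str,
                       created: int, expires: int, skew: int = 60) -> bool:
    # 1. Reject stale or not-yet-valid signatures (with a small clock-skew allowance).
    now = int(time.time())
    if now < created - skew or now > expires + skew:
        return False

    # 2. Find the advertised key in the operator's directory.
    jwk = next((k for k in jwks["keys"]
                if k.get("kid") == keyid and k.get("crv") == "Ed25519"), None)
    if jwk is None:
        return False
    public_key = Ed25519PublicKey.from_public_bytes(b64url_decode(jwk["x"]))

    # 3. Rebuild the signature base: one line per covered component,
    #    then the "@signature-params" line, exactly as the signer produced it.
    lines = [f'"{name}": {value}' for name, value in components.items()]
    lines.append(f'"@signature-params": {signature_params}')
    signature_base = "\n".join(lines).encode("ascii")

    # 4. The Signature header carries standard base64 between the colons.
    try:
        public_key.verify(base64.b64decode(signature_b64), signature_base)
    except InvalidSignature:
        return False
    return True

For the request above, components would map "@authority" to example.com, "@method" to GET, "@target-uri" to https://example.com/article.html, and "signature-agent" to the Signature-Agent field value as sent (quotes included), while signature_params would be the exact parenthesised component list and parameters copied from Signature-Input.
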
Why HTTPS for the directory matters. The whole trust chain bottoms out at the TLS certificate on the directory's host. If an attacker can serve a fake directory under perplexity.ai, they can sign as Perplexity. This is why the directory URL must be HTTPS, and why the bot operator's TLS posture is now part of the trust model.

The Identity Hierarchy

Web Bot Auth gives you a three-level identity for any bot:

  • Level 0: Anonymous. User-agent only. No verification. Treat as untrusted by default.
  • Level 1: Reverse-DNS-verified. The bot's source IP reverse-resolves to a hostname controlled by the claimed operator. This is what Google has used for Googlebot for years. Better than nothing, but DNS-poisoning attacks and shared-hosting hazards exist.
  • Level 2: Signature-verified. The request is signed by a key published in the operator's directory, served over TLS. Cryptographic proof of identity per request.

What Site Operators Should Do

Today

  1. Audit your bot traffic. Log the Signature-Agent header where present. Most production WAFs (Cloudflare, Fastly, Akamai) now expose it.
  2. Decide on an AI policy and encode it in robots.txt and llms.txt (a robots.txt example follows this list). Trustworthy AI bots will honour both.
  3. Use the signature when blocking. "Block PerplexityBot" is unreliable if you trust the user-agent; verifying the signature first makes the block enforceable.
  4. Whitelist with confidence. Some operators want to allow specific AI bots while blocking unidentified scrapers. Web Bot Auth is the first mechanism that makes this safe.
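
For example, a robots.txt that opts out of AI training crawlers while leaving everything else alone might look like the following; the bot tokens are the ones the operators publish, and the label is only meaningful once the signature (or reverse DNS) checks out.

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: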

If you operate a bot

  1. Generate an Ed25519 keypair (or ECDSA P-256).
  2. Publish the public key at https://your-host/.well-known/http-message-signatures-directory in JWKS format, served over a properly configured HTTPS endpoint (a sketch follows this list).
  3. Sign each request over the standard components (@authority, @method, @target-uri, signature-agent) and include created and expires parameters on every signature.
  4. Rotate keys regularly (180 days is sensible) and keep two keys active during rotation.
  5. Treat the directory's TLS cert as the root of your bot's identity. Monitor it independently; don't let it expire.
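
A sketch of steps 1 and 2 using the pyca/cryptography package; the key ID is a placeholder, and how you store the private half is up to your signing infrastructure.

import base64
import json

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Generate the keypair; the private half stays with your request signer.
private_key = Ed25519PrivateKey.generate()
public_raw = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.Raw,
    format=serialization.PublicFormat.Raw,
)

# Emit the JWKS to serve at /.well-known/http-message-signatures-directory.
directory = {
    "keys": [{
        "kty": "OKP",
        "crv": "Ed25519",
        "kid": "ed25519-2026-01",  # placeholder key ID; date-stamping helps rotation
        "x": base64.urlsafe_b64encode(public_raw).rstrip(b"=").decode("ascii"),
    }]
}
print(json.dumps(directory, indent=2))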

The TLS Connection Underneath

Web Bot Auth is built on top of TLS, and that's deliberate. Two parts of your TLS configuration now affect bot trust:

TLS handshake fingerprinting (JA3 / JA4)

Even before Web Bot Auth, operators were identifying clients by their TLS handshake characteristics: cipher order, extensions, supported groups. JA3 (the legacy, MD5-based scheme) and its successor JA4 produce a stable fingerprint of the ClientHello. A real Chrome and a Go HTTP client produce different fingerprints regardless of user-agent.

AI scraping libraries (Playwright in headless mode, curl-impersonate, Puppeteer) can mimic browser fingerprints. Web Bot Auth is the answer to that arms race: don't try to distinguish "real Chrome" from "Chrome-mimicking scraper" at the TLS layer when you can have the operator sign cryptographically at the HTTP layer.

Your TLS posture matters more than ever

If your site can't negotiate TLS 1.3 with modern AEAD ciphers, a portion of agentic clients will downgrade or fail. Worse, an attacker who can MITM your TLS can intercept signed bot requests and replay them within the signature's validity window, because the signature binds to host and method but not to the underlying TLS session. TLS hygiene is foundational; don't treat bot auth as separable.

Securing AI-Exposed APIs and MCP Servers

The same primitives matter when you expose an API for AI agents to call. Whether it's a public LLM endpoint, an internal MCP (Model Context Protocol) server, or a tool-use API, the TLS layer is doing more work than ever.

Practical hardening

  • TLS 1.3 only, AEAD ciphers, ECDHE. No exceptions for "legacy clients" — LLM tooling is universally modern.
  • mTLS for high-value endpoints. If the agent is calling your private API, client certificates are still the cleanest auth method (a minimal context sketch follows this list). Combined with short-lived ACME-issued client certs, the rotation problem goes away.
  • HTTP Message Signatures on requests and responses. The agent verifies it's really talking to you; you verify it's really the agent.
  • Strict Content-Security-Policy and Cross-Origin-Resource-Policy on any HTML you serve to agentic browsers — they're still browsers, with the same XSS attack surface.
  • Watch CT logs for your API hostname. If a phishing operation obtains a certificate for api-yourservice.com to fool agents into sending tokens, you want to know within minutes.
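
A minimal sketch of the first two points using Python's standard ssl module; the certificate paths are placeholders, and the resulting context is handed to whichever server framework fronts the API.

import ssl


def build_agent_api_context(cert_file: str, key_file: str,
                            client_ca_file: str) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # TLS 1.3 only: every TLS 1.3 cipher suite is AEAD and key exchange is ephemeral.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    # mTLS: require a client certificate chaining to the CA you trust for agents.
    ctx.load_verify_locations(cafile=client_ca_file)
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx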

The Dark Side: AI-Generated Phishing and Free DV Certs

The same automation that made HTTPS universal also made it free for attackers. Generative AI lowers the cost of producing convincing phishing pages to near-zero, and Let's Encrypt happily issues a valid DV certificate for banking-account-verify.com in 30 seconds. The padlock has never meant less, and users still trust it.

Defensive moves that still work:

  • Brand-aware CT monitoring. Watch CT logs for new certificates containing your brand name or homoglyphs (a polling sketch follows this list).
  • BIMI for email. Verified logos in mailbox UIs help users distinguish real from fake.
  • Strict DMARC at p=reject. Much AI-generated phishing still arrives via spoofed sender domains; DMARC at enforcement kills that spoofing at the protocol layer.
  • Hardware-token MFA. The only auth factor that survives an AI-generated phishing page that captures everything else.
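
A rough polling sketch against crt.sh's unofficial JSON interface; the brand string is a placeholder, the field names are those crt.sh returns today and may change, and a production monitor should consume CT logs directly or use a dedicated monitoring service.

import requests

BRAND = "yourservice"  # placeholder brand string to watch for

resp = requests.get(
    "https://crt.sh/",
    params={"q": f"%{BRAND}%", "output": "json"},
    timeout=30,
)
resp.raise_for_status()

for entry in resp.json():
    # name_value holds the certificate's names, newline-separated.
    for name in entry.get("name_value", "").splitlines():
        if BRAND in name and not name.endswith("yourservice.com"):
            print(f"suspicious certificate: {name} (logged {entry.get('entry_timestamp')})")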

What's Next

The shape of the next two years:

  • Web Bot Auth moves from draft to RFC, probably 2026–27. Standardised verification libraries appear in nginx-modules and Caddy plugins.
  • Major mailbox providers adopt RFC 9421 for email API auth, replacing shared-secret SMTP submission for high-volume senders.
  • Browser-side agent identity — signed user-agent claims from real users, distinct from bot signatures — becomes the next interesting problem.
  • The PQC migration for signature algorithms arrives just in time to apply to all of this. Expect ML-DSA-65 keys in bot directories alongside Ed25519 within two years.
