What a Competitor's robots.txt Quietly Leaks

The smallest file on a competitor's domain is a hand-written list of the paths they specifically don't want you to see — which is exactly why it's worth reading.

June 12, 2026
5 min read

Most websites serve a plain-text file at /robots.txt, for example competitor.com/robots.txt. It's an instruction sheet for crawlers: a list of paths the site asks bots not to fetch. It exists for SEO and crawl-budget hygiene, written for Googlebot, not for you. Strictly speaking it governs crawling rather than indexing, and compliance is voluntary. But because it's usually a hand-maintained list of the URLs a company wants crawlers to stay away from, it doubles as a directory of the things they'd rather nobody stumble onto. It's one of the smallest, most-ignored files on the domain, and it's been public the whole time.

The signal is in the Disallow: lines. Each one names a path the company chose to keep crawlers away from, and people rarely bother to hide paths that don't exist. Read top to bottom, a robots.txt is a list of directories a competitor has all but confirmed are live, then asked the internet to look away from.

Disallow lines name things that exist

Almost nobody writes a Disallow: rule for a page that was never there. So most entries are an admission: this path is live, and we don't want it crawled. The exceptions are boilerplate rules a CMS ships by default and leftovers for paths that have since been retired. The mundane ones (/cart, /account, /search) tell you nothing. The interesting ones name a directory the company has stood up but isn't ready to advertise.

Entries like Disallow: /beta/, Disallow: /labs/, and Disallow: /preview/ usually mark feature areas in flight. Disallow: /staging/, or a staging.competitor.com hostname turning up in a Sitemap: URL or a comment, points at a pre-production environment that may be reachable on the public internet. (Disallow: rules themselves only take paths on the host that serves the file, so a staging subdomain has its own robots.txt worth checking.) Disallow: /docs/v2/ while their live docs sit at /docs/ suggests a documentation rewrite, and often a product rewrite, is underway. The same logic that makes a sitemap leak unannounced pages works in reverse here: the sitemap lists what they want found, robots.txt lists what they don't, and the gap between the two is where the unshipped work lives.
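To make the reading step concrete, here is a minimal sketch that pulls the Disallow: paths out of a robots.txt body and drops the routine ones. The function names, the MUNDANE_PREFIXES list, and the sample file are all illustrative assumptions, not part of any tool mentioned in this post.

```python
# Hypothetical list of routine paths to ignore; tune it per competitor.
MUNDANE_PREFIXES = ("/cart", "/account", "/search")


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return each distinct Disallow path in file order, comments stripped."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "disallow":
            continue
        path = value.strip()
        if path and path not in paths:  # an empty Disallow blocks nothing
            paths.append(path)
    return paths


def interesting_paths(robots_txt: str) -> list[str]:
    """Filter out the routine entries that say nothing about the roadmap."""
    return [
        p for p in disallowed_paths(robots_txt)
        if not p.lower().startswith(MUNDANE_PREFIXES)
    ]


# Made-up sample file for illustration only.
sample = """\
User-agent: *
Disallow: /cart
Disallow: /account
Disallow: /beta/
Disallow: /docs/v2/   # new docs
Disallow:
"""
print(interesting_paths(sample))  # ['/beta/', '/docs/v2/']
```

The sketch ignores which User-agent group a rule belongs to, which is usually fine for a first read since the question is only which paths are named at all.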

New Disallow entries are dated roadmap

A robots.txt changes rarely, which is exactly what makes a change meaningful. When a new Disallow: /ai/ or Disallow: /workflows/ line appears that wasn't there last month, the likeliest explanation is that an engineer added it because that path now exists and isn't ready for search yet. You're seeing the directory created before the feature is announced.

This is the same engineering-honest lead time you get from new endpoints showing up in their docs, but earlier and cruder — the path often appears in robots.txt before there's any docs page to find. A burst of new disallowed directories in a short window is a strong tell that a launch or a pivot is being staged behind the scenes.
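Spotting a new entry comes down to comparing two snapshots of the file. The sketch below assumes you already have last month's and this month's contents as strings; the function names and both sample snapshots are hypothetical.

```python
def disallow_set(robots_txt: str) -> set[str]:
    """Collect every non-empty Disallow path, ignoring comments."""
    paths = set()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.add(value.strip())
    return paths


def diff_disallows(old_txt: str, new_txt: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) Disallow paths between two snapshots."""
    old, new = disallow_set(old_txt), disallow_set(new_txt)
    return sorted(new - old), sorted(old - new)


# Made-up snapshots for illustration only.
last_month = "User-agent: *\nDisallow: /cart\nDisallow: /beta/\n"
this_month = (
    "User-agent: *\nDisallow: /cart\nDisallow: /beta/\n"
    "Disallow: /ai/\nDisallow: /workflows/\n"
)

added, removed = diff_disallows(last_month, this_month)
print("added:", added)      # added: ['/ai/', '/workflows/']
print("removed:", removed)  # removed: []
```

Comparing sets rather than raw lines means a reordered file produces no noise, at the cost of not noticing when a rule moves from one User-agent group to another.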

AI-crawler rules reveal their stance

Since the LLM-scraping wave, robots.txt has picked up a new class of entries: rules targeting AI crawlers by name. User-agent: GPTBot, User-agent: ClaudeBot, User-agent: CCBot, User-agent: Google-Extended, each followed by Disallow: / or a carve-out.

How a competitor handles these is a small but real strategic disclosure. Blocking every AI crawler says they see their content as a defensible asset they don't want feeding models. Explicitly allowing them — or leaving everything open — often signals a company betting on LLM-surface visibility as a distribution channel. A change here, from open to blocked or the reverse, is a content-strategy decision you can read the week they make it.
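One way to read that stance mechanically is to group the rules by User-agent and label each AI crawler. This is a rough sketch: the four labels, the function names, and the sample file are assumptions made for illustration, and the parsing covers only the User-agent, Allow, and Disallow fields.

```python
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "CCBot", "Google-Extended")


def parse_groups(robots_txt: str) -> dict[str, list[tuple[str, str]]]:
    """Map each lowercased user-agent token to its (field, path) rules."""
    groups: dict[str, list[tuple[str, str]]] = {}
    current_agents: list[str] = []
    in_rules = False
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:  # a rule line closed the previous group
                current_agents, in_rules = [], False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def ai_stance(robots_txt: str) -> dict[str, str]:
    """Label each AI crawler by how its own group treats the site root."""
    groups = parse_groups(robots_txt)
    stance = {}
    for bot in AI_CRAWLERS:
        rules = groups.get(bot.lower())
        if rules is None:
            stance[bot] = "not named"  # falls back to the * group, if any
        elif ("disallow", "/") in rules:
            has_allow = any(f == "allow" and p for f, p in rules)
            stance[bot] = "blocked with carve-outs" if has_allow else "blocked"
        elif any(f == "disallow" and p for f, p in rules):
            stance[bot] = "partially blocked"
        else:
            stance[bot] = "explicitly allowed"
    return stance


# Made-up sample file for illustration only.
sample = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
Allow: /blog/

User-agent: *
Disallow: /cart
"""
for bot, label in ai_stance(sample).items():
    print(f"{bot}: {label}")
```

Running the same function on two snapshots and comparing the resulting dictionaries shows an open-to-blocked flip directly.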

The Sitemap line and the comment lines

Two more things hide in the file. The Sitemap: directive, often at the bottom but valid anywhere in the file, links to their sitemap or sitemap index. Sometimes there are several, including ones not linked anywhere else on the site. That's the thread to pull for the full unannounced-pages sweep.

And occasionally you'll find comments — lines starting with #. Engineers leave notes to themselves: # block the old marketing site, # temp, remove after migration, # disallow internal tools. These are unguarded asides written for an audience of one, and they sometimes hand you the why behind a disallowed path for free.
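Both of these are easy to pull out in one pass. The sketch below is a simplified illustration: the function name and the sample file are hypothetical, and it treats everything after a # as a comment.

```python
def sitemaps_and_comments(robots_txt: str) -> tuple[list[str], list[str]]:
    """Return (sitemap URLs, comment texts) in the order they appear."""
    sitemaps, comments = [], []
    for raw_line in robots_txt.splitlines():
        body, hash_mark, comment = raw_line.partition("#")
        if hash_mark and comment.strip():
            comments.append(comment.strip())
        # Split on the first colon only, so the URL's own "https:" survives.
        field, sep, value = body.partition(":")
        if sep and field.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps, comments


# Made-up sample file for illustration only.
sample = """\
# block the old marketing site
User-agent: *
Disallow: /old/
Disallow: /tools/  # disallow internal tools
Sitemap: https://competitor.com/sitemap_index.xml
Sitemap: https://competitor.com/sitemap-docs.xml
"""

sitemaps, comments = sitemaps_and_comments(sample)
print(sitemaps)
print(comments)
```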

How Seeto handles this

A robots.txt is the kind of file no one re-reads — you might check it once when you're poking at a competitor's SEO, then never again, so a new Disallow: /beta/ line can sit there for months unnoticed. The meaningful moments are single-line diffs: one new disallowed directory, an AI-crawler rule flipping from allow to block, a fresh staging reference. Seeto treats robots.txt as a monitored surface, so a new or removed Disallow: line surfaces as a discrete change event on the same cadence as the pricing and docs pages. It doesn't tell you what /workflows/ is going to become — reading that implication is still your job. It just makes sure the one-line change reaches you the week it's added, instead of the next time you happen to type the URL.

The two-minute version

For each of your top three competitors, once a month:

  1. Open competitor.com/robots.txt and read every Disallow: line, ignoring the obvious ones (/cart, /account) and noting any path that names a feature, environment, or directory you didn't know existed — /beta/, /staging/, /labs/, /v2/.
  2. Save the file's current contents and compare next month. Any new disallowed directory, any AI-crawler rule that changed, or any Sitemap: line you hadn't followed is a roadmap or strategy signal you're reading straight from the source, before they announce it.
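The two steps above can be sketched as a small save-and-compare routine. Everything here is an assumption made for illustration: the function names, the one-file-per-day naming scheme, and the choice of a whole-file line diff so that Sitemap: and AI-crawler changes show up alongside new Disallow: lines.

```python
import difflib
import urllib.request
from datetime import date
from pathlib import Path


def fetch_robots(host: str) -> str:
    """Download https://<host>/robots.txt as text (makes a network request)."""
    url = f"https://{host}/robots.txt"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def save_and_compare(host: str, text: str, folder: Path, day: date) -> list[str]:
    """Store this snapshot and return lines added or removed since the last one."""
    folder.mkdir(parents=True, exist_ok=True)
    new_file = folder / f"{host}_{day.isoformat()}.txt"
    # ISO dates in the file name sort chronologically.
    previous = [p for p in sorted(folder.glob(f"{host}_*.txt")) if p != new_file]
    old_text = previous[-1].read_text(encoding="utf-8") if previous else None
    new_file.write_text(text, encoding="utf-8")
    if old_text is None:
        return []  # first snapshot, nothing to compare against
    diff = difflib.ndiff(old_text.splitlines(), text.splitlines())
    # Keep only "+ " (added) and "- " (removed) lines; drop context and hints.
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

A monthly run might then look like `save_and_compare("competitor.com", fetch_robots("competitor.com"), Path("snapshots"), date.today())`, where the snapshots folder is a hypothetical location. An empty list means nothing changed since the previous saved copy.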
