From 40aa9552969bdb71f10d315df726e779a33b3da7 Mon Sep 17 00:00:00 2001
From: TaprootFreak <142087526+TaprootFreak@users.noreply.github.com>
Date: Thu, 4 Jun 2026 19:23:00 +0200
Subject: [PATCH] feat: add robots.txt allowing search engines and AI crawlers

Add a version-controlled robots.txt served from the VuePress public dir
(copied to the published site root) that explicitly allows search engines
and AI agents to crawl, index, and learn from the public documentation.
It grants all content signals (search, ai-input, ai-train) and lists the
major AI crawlers individually in addition to the wildcard group, since
some honor only their own named record.
---
 src/.vuepress/public/robots.txt | 38 +++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 src/.vuepress/public/robots.txt

diff --git a/src/.vuepress/public/robots.txt b/src/.vuepress/public/robots.txt
new file mode 100644
index 0000000..6fd0392
--- /dev/null
+++ b/src/.vuepress/public/robots.txt
@@ -0,0 +1,38 @@
+# robots.txt — DFX documentation (docs.dfx.swiss)
+#
+# Public documentation. We explicitly WANT both search engines and AI agents to
+# crawl, index, and learn from this content. This file is version-controlled in
+# this repository and is the authoritative crawl policy for this site.
+#
+# Content signals: all uses are granted — search, AI input / retrieval-augmented
+# generation, and AI training. We deliberately do NOT signal ai-train=no.
+
+User-agent: *
+Allow: /
+Content-Signal: search=yes, ai-input=yes, ai-train=yes
+
+# Major AI crawlers are explicitly welcome. Some honor only their own named
+# record, so each is listed in addition to the wildcard group above.
+User-agent: ClaudeBot
+Allow: /
+
+User-agent: GPTBot
+Allow: /
+
+User-agent: Google-Extended
+Allow: /
+
+User-agent: CCBot
+Allow: /
+
+User-agent: Bytespider
+Allow: /
+
+User-agent: Amazonbot
+Allow: /
+
+User-agent: Applebot-Extended
+Allow: /
+
+User-agent: meta-externalagent
+Allow: /