From 40aa9552969bdb71f10d315df726e779a33b3da7 Mon Sep 17 00:00:00 2001
From: TaprootFreak <142087526+TaprootFreak@users.noreply.github.com>
Date: Thu, 4 Jun 2026 19:23:00 +0200
Subject: [PATCH] feat: add robots.txt allowing search engines and AI crawlers

Add a version-controlled robots.txt served from the VuePress public dir
(copied to the published site root) that explicitly allows search engines
and AI agents to crawl, index, and learn from the public documentation.

It grants all content signals (search, ai-input, ai-train) in the wildcard
group and lists the major AI crawlers individually as well, since some
crawlers honor only their own named record.
---
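Reviewer note (not part of the commit message): the allow rules in this
patch can be sanity-checked locally with Python's standard-library
urllib.robotparser. This is a minimal sketch, not part of the change. It
assumes the rules are pasted inline rather than fetched from the deployed
site, the helper name allowed_agents and the agent "SomeOtherBot" are
hypothetical, and the parser only evaluates Allow/Disallow rules, so the
Content-Signal line is ignored and is not verified here.

```python
from urllib.robotparser import RobotFileParser

# Inline copy of the rules added by this patch (comment lines omitted).
ROBOTS_TXT = """\
User-agent: *
Allow: /
Content-Signal: search=yes, ai-input=yes, ai-train=yes

User-agent: ClaudeBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

User-agent: Amazonbot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: meta-externalagent
Allow: /
"""

# Crawlers that get their own named record in the patch.
NAMED_AGENTS = [
    "ClaudeBot",
    "GPTBot",
    "Google-Extended",
    "CCBot",
    "Bytespider",
    "Amazonbot",
    "Applebot-Extended",
    "meta-externalagent",
]


def allowed_agents(robots_txt, url, agents):
    """Return {agent: bool} saying whether each agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines; unknown directives are skipped.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # "SomeOtherBot" is a made-up agent that should fall back to the
    # wildcard group.
    agents = NAMED_AGENTS + ["SomeOtherBot"]
    for agent, ok in allowed_agents(
        ROBOTS_TXT, "https://docs.dfx.swiss/", agents
    ).items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Each named crawler is matched against its own record, while any other
agent falls back to the wildcard group. Checking whether a given crawler
reads Content-Signal would need a separate test, since this parser does
not model that directive.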
 src/.vuepress/public/robots.txt | 38 +++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)
 create mode 100644 src/.vuepress/public/robots.txt

diff --git a/src/.vuepress/public/robots.txt b/src/.vuepress/public/robots.txt
new file mode 100644
index 0000000..6fd0392
--- /dev/null
+++ b/src/.vuepress/public/robots.txt
@@ -0,0 +1,38 @@
+# robots.txt — DFX documentation (docs.dfx.swiss)
+#
+# Public documentation. We explicitly WANT both search engines and AI agents to
+# crawl, index, and learn from this content. This file is version-controlled in
+# this repository and is the authoritative crawl policy for this site.
+#
+# Content signals: all uses are granted — search, AI input / retrieval-augmented
+# generation, and AI training. We deliberately do NOT signal ai-train=no.
+
+User-agent: *
+Allow: /
+Content-Signal: search=yes, ai-input=yes, ai-train=yes
+
+# Major AI crawlers are explicitly welcome. Some honor only their own named
+# record, so each is listed in addition to the wildcard group above.
+User-agent: ClaudeBot
+Allow: /
+
+User-agent: GPTBot
+Allow: /
+
+User-agent: Google-Extended
+Allow: /
+
+User-agent: CCBot
+Allow: /
+
+User-agent: Bytespider
+Allow: /
+
+User-agent: Amazonbot
+Allow: /
+
+User-agent: Applebot-Extended
+Allow: /
+
+User-agent: meta-externalagent
+Allow: /