AI Scrape Protect

Description

AI Scrape Protect is a WordPress plugin designed to protect your website from scraping for AI training purposes. It achieves this by adding opt-out instructions to the robots.txt file for the most common AI scraping bots and including meta tags to control how your content is used.

Version 5.0 introduces a full settings page with granular per-bot control, categorised bot management, AJAX-powered toggles, and support for custom user agents.

Features

  • Adds User-agent and Disallow rules to your robots.txt file to block a comprehensive list of AI scraping bots.
  • Generates compact robots.txt output: all blocked bots are grouped under a single Disallow: / directive.
  • Adds meta tags to the HTML <head> to give AI bots additional instructions.
  • Separate toggles for content meta tags (noai, nosummary, DisallowAITraining) and image meta tags (noimageai).
  • Settings page with five categorised tabs: Search Engines, AI Training, AI Search & Answers, General Indexing, Custom Bots.
  • Group toggle per category to block or allow all bots in that category at once.
  • Individual per-bot toggle with description and company information for each bot.
  • Custom bots tab: add and remove your own user agent strings.
  • Global enable/disable toggle and separate toggles for robots.txt and meta tags.
  • "Apply default settings" button to reset all options.
  • Admin bar icon linking to the settings page (visible to administrators only). Icon reflects the current enabled state.
  • Physical robots.txt detection notice: warns you if a file in your root directory overrides WordPress’s virtual robots.txt.
  • Fully AJAX-powered settings page: changes are saved instantly without page reloads.
  • Extensible via the aisp_user_agents filter for developers.
  • Multisite note: settings are stored per site using get_option/update_option. Network-wide configuration is not supported.

Note: robots.txt and meta tag opt-out instructions are voluntary signals, and not every bot respects them. This plugin discourages scraping; it is not a guaranteed technical block.
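
The compact grouped format mentioned in the feature list can be illustrated with a short Python sketch. This is not the plugin's code (the plugin is written in PHP); the function name build_robots_rules is hypothetical, and the bot names are taken from the changelog purely as sample input.

```python
def build_robots_rules(blocked_agents):
    """Return one robots.txt group that blocks every agent in blocked_agents.

    Hypothetical helper for illustration only; it mirrors the idea of
    listing all blocked bots above a single shared "Disallow: /" rule.
    """
    # One User-agent line per bot, with no blank lines in between,
    # so that all of them belong to the same group.
    lines = [f"User-agent: {agent}" for agent in blocked_agents]
    if lines:
        # A single rule closes the group and applies to every agent above.
        lines.append("Disallow: /")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_robots_rules(["ClaudeBot", "Google-Extended"]))
```

Grouping keeps the file short: adding another bot costs one extra User-agent line rather than a new User-agent/Disallow pair.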

Screenshots

  • Settings page: Global toggles for enabling robots.txt rules and meta tags.
  • Bot management tabs: Per-category tabs with individual bot toggles and descriptions.
  • Custom bots tab: Add and remove your own user agent strings.
  • robots.txt output: Compact grouped format with all blocked bots under a single Disallow directive.

Installation

  1. Upload the ai-scrape-protect folder to the /wp-content/plugins/ directory.
  2. Activate the plugin through the 'Plugins' menu in WordPress.
  3. Go to Settings > AI Scrape Protect to configure your preferences.

Frequently Asked Questions

How does this plugin protect my site from AI scraping?

The plugin adds specific User-agent entries to your robots.txt file to instruct common AI scraping bots not to crawl your site. It also adds meta tags in the HTML <head> to signal AI bots that your content should not be used for training or summarisation.
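
As a rough illustration of the meta tag side, the Python sketch below builds one combined tag per group, using the directive names listed under Features. The exact markup the plugin emits is an assumption here (the changelog only says each group outputs one combined robots meta tag), and build_meta_tags is a hypothetical name.

```python
from html import escape

# Directive names as listed in the plugin description.
CONTENT_DIRECTIVES = ["noai", "nosummary", "DisallowAITraining"]
IMAGE_DIRECTIVES = ["noimageai"]


def build_meta_tags(content_enabled=True, images_enabled=True):
    """Return a list of <meta> tags, one per enabled directive group.

    Assumption: each group is emitted as a single name="robots" tag with a
    comma-separated content value. The real plugin output may differ.
    """
    tags = []
    groups = (
        (content_enabled, CONTENT_DIRECTIVES),
        (images_enabled, IMAGE_DIRECTIVES),
    )
    for enabled, directives in groups:
        if enabled:
            value = escape(", ".join(directives), quote=True)
            tags.append(f'<meta name="robots" content="{value}">')
    return tags


if __name__ == "__main__":
    for tag in build_meta_tags():
        print(tag)
```

The two boolean parameters stand in for the separate content and image toggles on the settings page.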

Will this completely stop AI scraping of my site?

No. robots.txt and meta tag directives are voluntary signals. Most reputable AI companies respect them, but malicious or poorly behaved scrapers may not. This plugin is a measure to discourage scraping, not a technical enforcement mechanism.
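
To see what a well-behaved crawler would conclude from such rules, you can feed them to a standard robots.txt parser. The sketch below uses Python's built-in urllib.robotparser; the sample rules, the helper name is_allowed, and the example.com URL are illustrative assumptions, not output captured from the plugin. It only models compliant bots: a scraper that ignores robots.txt is unaffected.

```python
from urllib.robotparser import RobotFileParser

# Sample rules in the grouped style described above (illustrative only).
SAMPLE_ROBOTS = """\
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_text, user_agent, url="https://example.com/"):
    """Return True if a compliant bot with this user agent may fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ClaudeBot", "Google-Extended", "Googlebot"):
        print(agent, is_allowed(SAMPLE_ROBOTS, agent))
```

In this sample, the two listed agents are refused while Googlebot, which is not listed, remains allowed.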

I have a physical robots.txt file in my WordPress root. Will the plugin work?

No. When a physical robots.txt file exists, WordPress’s virtual robots.txt is bypassed entirely, and this plugin’s modifications will not be applied. The plugin will display a warning on its settings page when a physical file is detected. Remove or rename the physical file to allow the plugin to function.
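
If you want to check for a physical file yourself, for example from a deployment script, the test amounts to looking for robots.txt in the site root. The following Python sketch is a stand-alone approximation, not the plugin's detection logic; has_physical_robots_txt is a hypothetical name and the site root path is whatever directory holds your WordPress install.

```python
from pathlib import Path


def has_physical_robots_txt(site_root):
    """Return True if a real robots.txt file exists directly in site_root.

    A physical file is served by the web server itself, so WordPress never
    gets the chance to generate its virtual robots.txt.
    """
    return (Path(site_root) / "robots.txt").is_file()


if __name__ == "__main__":
    import sys

    # Pass the WordPress root as the first argument; defaults to the
    # current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    if has_physical_robots_txt(root):
        print("Physical robots.txt found: plugin rules will not be served.")
    else:
        print("No physical robots.txt: the virtual file is in effect.")
```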

Can I add bots that are not in the list?

Yes. Go to Settings > AI Scrape Protect and open the Custom Bots tab. Enter the exact user agent token as it appears in your server logs.

What is the difference between AI Training bots and AI Search & Answers bots?

AI Training bots collect your content to train language models. AI Search & Answers bots retrieve content to generate answers for users of AI search tools like ChatGPT Search or Perplexity. Blocking AI Search bots may prevent your site from appearing in AI-generated answers. Blocking AI Training bots does not affect your visibility in regular search engines.

What is the difference between Search Engines and the other categories?

Search engine bots such as Googlebot and Bingbot are allowed by default. Disabling one will add it to the blocked list and remove it from the explicit allow block in robots.txt. Only do this if you intentionally want to remove your site from that search engine.

Can I allow some bots while blocking others?

Yes. Use the individual toggles on the settings page to control each bot separately.

What does the aisp_user_agents filter do?

Developers can use this filter to add or modify bots programmatically from a theme or another plugin, without editing the plugin files directly.

Does this plugin work on Multisite?

Settings are stored per site using standard WordPress options. Network-wide configuration is not supported.

I deactivated the plugin. What happens to my settings?

Your settings are preserved in the database. Reactivating the plugin will restore your previous configuration. If you want to remove all plugin data, use the "Apply default settings" button before deactivating, or delete the options manually from the database.

Reviews

2025-12-01
The plugin works great. The only thing I miss is a simple interface where I can select, using checkboxes, the bots I want to block or allow in the robots.txt file. I believe this would be extremely helpful and would represent a significant quality improvement for the plugin, Dan. For example, I currently have Bingbot blocked, and I would like to be able to fix this without having to modify the code or the plugin itself. 4 stars for now! 5 if you do something with that! See u and good job
2025-06-30
Really a super handy plugin. For me personally, I’d like to be able to manage the settings. So I’m looking forward to the next version. Or maybe interaction with the wikipedia entry List_of_bots.
2025-02-11
For me, it’s important that the content that I worked so hard on isn’t used without my permission. This plugin helps me keep AI scraper bots off my website. I tested several of them and it works! Also, it is really easy to manage and looks like it gets updated very regularly, which is essential at the pace that things are moving. Long story short, I’ve installed it on both my websites. Very happy.
Read all 4 reviews

Contributors & Developers

"AI Scrape Protect" is open source software. The following people have contributed to this plugin.

Contributors

Changelog

5.0

  • Complete rewrite with modular file structure (includes/class-robots.php, class-meta-tags.php, class-admin-bar.php, class-settings-page.php, class-ajax.php, bots.php).
  • Added full settings page under Settings > AI Scrape Protect.
  • Added AJAX-powered toggles: all settings save instantly without page reloads.
  • Added five bot categories: Search Engines, AI Training, AI Search & Answers, General Indexing, Custom Bots.
  • Added individual per-bot toggles with descriptions and company information.
  • Added group toggle per category.
  • Added Custom Bots tab for user-defined user agent strings.
  • Added global enable/disable toggle and separate toggles for robots.txt and meta tags.
  • Added "Apply default settings" button.
  • Added physical robots.txt detection notice.
  • Added admin bar icon linking to settings page (administrators only). Icon reflects current enabled state.
  • Switched robots.txt output to compact grouped format (all bots under a single Disallow: /).
  • Separated meta tags into content group (noai, nosummary, DisallowAITraining) and image group (noimageai).
  • Removed duplicate meta name="robots" tags; each group now outputs one combined tag.
  • Removed deprecated bot entries: anthropic-ai, Claude-Web (replaced by ClaudeBot), BardBot (Google Bard is now Gemini), NeevaBot (Neeva shut down in 2023).
  • Added new bots: Claude-SearchBot, Claude-User, Google-Agent, Google-Extended, GoogleOther, GPTBot-Preview, ClaudeResearchBot, FirecrawlAgent, iAskAI-Crawler, Perplexity-User.
  • Replaced Always Allowed category with Search Engines: all search engine bots are now individually toggleable, not locked. Includes Googlebot variants and Bingbot.
  • Updated Requires PHP to 8.0.
  • Tested up to WordPress 7.0.
  • All links updated to codesurf.eu.
  • Extensible via aisp_user_agents filter.

4.6

  • Added six new AI scraping bots to the block list: 360Spider, ChatGLM-Spider, Sogou, Baiduspider, ErnieBot, DeepseekBot.

4.5

  • Removed DuckDuckBot from the blocklist.
  • Specifically allowed Googlebot, Googlebot-Image, Googlebot-News, Google-PageSpeed, Google-Site-Verification and Lighthouse.

4.4

  • Improved compatibility with other plugins modifying robots.txt by no longer injecting a redundant "User-agent: *" directive.

4.3

  • Increased priority for the robots.txt modification.
  • Updated modify_robots_txt to add all User-agent rules in a single loop.

4.2

  • Added four new AI training bots: Claude-User, DataForSeoBot, GoogleOther, TurnitinBot.

4.1

  • Added two new AI bots: Amazon-AI and AnthropicBot.

4.0

  • Project links moved from uisce.eu to codesurf.eu. No functional changes.

3.1

  • Added new AI bots: DuckDuckBot, OpenAIContentCrawler, YandexBot, NeevaBot, AIMatrixCrawler.
  • Improved admin bar icon.

3.0

  • Added admin bar icon functionality.
  • Updated meta tags for improved AI scraping protection.

2.4

  • Updated meta tags for improved AI scraping protection.