How to Create a Correct Robots.txt for SEO
Robots.txt is a file that tells search engine crawlers which pages they may and may not crawl. It is important for crawl budget optimization and for keeping crawlers out of areas you do not want crawled.
What is Robots.txt?
Location: yoursite.com/robots.txt
Purpose: Guide search engine crawlers
Format: Plain text file
Standard: Robots Exclusion Protocol
Basic Syntax
# Comment line
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
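The rules above can be checked programmatically. A minimal sketch using Python's standard `urllib.robotparser`; the file contents are inlined here instead of being fetched from a live site:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt rules, inlined as lines
rules = """\
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a generic crawler ("*") may fetch each URL
print(parser.can_fetch("*", "https://yoursite.com/private/page"))  # False
print(parser.can_fetch("*", "https://yoursite.com/public/page"))   # True
```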
Directives
| Directive | Function |
|---|---|
| User-agent | Target crawler |
| Disallow | Block path |
| Allow | Permit path (overrides Disallow) |
| Sitemap | Sitemap location |
| Crawl-delay | Wait between requests (ignored by Googlebot) |
Robots.txt Examples
Basic (Allow All)
User-agent: *
Disallow:
Sitemap: https://yoursite.com/sitemap.xml
Block Specific Folder
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Sitemap: https://yoursite.com/sitemap.xml
Block All Crawlers
User-agent: *
Disallow: /
Different Rules per Bot
User-agent: Googlebot
Disallow: /nogoogle/
User-agent: Bingbot
Disallow: /nobing/
User-agent: *
Disallow: /private/
WordPress Standard
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Sitemap: https://yoursite.com/sitemap_index.xml
E-commerce
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Sitemap: https://yoursite.com/sitemap.xml
Pattern Matching
Wildcards
# Block all PDF files
User-agent: *
Disallow: /*.pdf$
# Block all query strings
Disallow: /*?
# Block specific parameter
Disallow: /*?ref=
End of URL ($)
# Only block .pdf files
Disallow: /*.pdf$
# This blocks /file.pdf
# But allows /file.pdf/page
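Note that the `*` and `$` wildcards are Google extensions, not part of the original Robots Exclusion Protocol (Python's `urllib.robotparser`, for example, treats them literally). A small sketch of how Google-style matching works; `rule_matches` is an illustrative helper, not a real API:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check whether a robots.txt rule with * and $ wildcards matches a URL path.
    Per Google's documented matching: * matches any character sequence,
    a trailing $ anchors the pattern to the end of the path."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

print(rule_matches("/*.pdf$", "/file.pdf"))       # True
print(rule_matches("/*.pdf$", "/file.pdf/page"))  # False
print(rule_matches("/*?ref=", "/page?ref=abc"))   # True
```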
Common Mistakes
❌ Blocking CSS/JS (hurts rendering)
❌ Blocking images (hurts image SEO)
❌ Typos in syntax
❌ Wrong file location
❌ Using noindex in robots.txt (doesn't work)
❌ Blocking sitemap
Correct Approach
# Allow CSS and JS for rendering
User-agent: *
Allow: /wp-includes/js/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Disallow: /wp-admin/
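When Allow and Disallow rules conflict, as in the WordPress example above, Google resolves them by specificity: the longest matching pattern wins, and on a tie Allow wins. A simplified sketch of that precedence (prefix matching only, no wildcards; `is_allowed` is an illustrative helper, not a real API):

```python
def is_allowed(rules, path):
    """rules: list of (directive, pattern) tuples, e.g. ("disallow", "/wp-admin/").
    The longest matching pattern wins; on equal length, Allow beats Disallow.
    No matching rule means the path is allowed."""
    best = None  # (pattern_length, is_allow)
    for directive, pattern in rules:
        if path.startswith(pattern):              # simplified prefix match
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:  # longer pattern, then Allow, wins
                best = candidate
    return best is None or best[1]

rules = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(rules, "/wp-admin/options.php"))     # False
```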
Testing Robots.txt
Google Search Console
- Settings > robots.txt report (the standalone robots.txt Tester has been retired)
- Enter URL to test
- Check if blocked/allowed
Screaming Frog
- Configuration > robots.txt
- Test custom robots.txt
- See blocked URLs
Manual Check
curl https://yoursite.com/robots.txt
Robots.txt vs Noindex
Robots.txt:
- Controls crawling
- Doesn't prevent indexing
- File-based
Noindex:
- Controls indexing
- Page still crawled
- Meta tag/header
Best Practice:
- Use robots.txt for crawl efficiency
- Use noindex to prevent indexing
- Don't block pages you want noindexed
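To actually prevent indexing, the page itself must carry a noindex signal and remain crawlable so Google can see it, either as a meta tag in the HTML head or as an HTTP response header:

```
<!-- In the page's <head> -->
<meta name="robots" content="noindex">
```

```
# Or as an HTTP response header (works for non-HTML files like PDFs)
X-Robots-Tag: noindex
```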
Important Notes
Blocking Doesn’t Mean Private
Warning:
Robots.txt is public
Anyone can read it
Not a security measure
For sensitive content:
- Password protection
- Server-side auth
- Not just robots.txt
Blocked Pages Can Still Index
If page has backlinks:
- URL may still appear in search
- Just without description
- Shows "blocked by robots.txt"
To truly prevent indexing:
- Use noindex tag
- Don't block crawling
- Let Google see the noindex
Best Practices Checklist
✓ Place at root domain
✓ Include sitemap location
✓ Test before deploying
✓ Don't block important resources
✓ Use for crawl efficiency
✓ Regular review and update
✗ Don't rely for security
✗ Don't block then expect noindex
Conclusion
Robots.txt is a tool for controlling crawling, not indexing or security. Use it wisely to optimize crawl budget and guide crawlers toward your important content.
Post link: https://www.tirinfo.com/cara-membuat-robots-txt-seo/
Editor: Hendra Wijaya
Publisher: Tirinfo
Read: 2 minutes
Updated: 7 January 2026