How to Create a Correct Robots.txt for SEO
Robots.txt is a file that tells search engine crawlers which pages they may and may not crawl. It is important for crawl budget optimization and for keeping crawlers out of areas you do not want crawled.
What is Robots.txt?
Location: yoursite.com/robots.txt
Purpose: Guide search engine crawlers
Format: Plain text file
Standard: Robots Exclusion Protocol
Basic Syntax
# Comment line
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
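The rules above can be checked programmatically. A minimal sketch using Python's standard `urllib.robotparser`; the file contents are inlined here instead of being fetched from a live site:

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt rules, inlined as lines
rules = """\
User-agent: *
Disallow: /private/
Allow: /public/
Sitemap: https://yoursite.com/sitemap.xml
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a generic crawler ("*") may fetch each URL
print(parser.can_fetch("*", "https://yoursite.com/private/page"))  # False
print(parser.can_fetch("*", "https://yoursite.com/public/page"))   # True
```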
Directives
| Directive | Function |
|---|---|
| User-agent | Target crawler |
| Disallow | Block path |
| Allow | Permit path (overrides Disallow) |
| Sitemap | Sitemap location |
| Crawl-delay | Wait between requests (ignored by Googlebot) |
Robots.txt Examples
Basic (Allow All)
User-agent: *
Disallow:
Sitemap: https://yoursite.com/sitemap.xml
Block Specific Folder
User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /tmp/
Sitemap: https://yoursite.com/sitemap.xml
Block All Crawlers
User-agent: *
Disallow: /
Different Rules per Bot
User-agent: Googlebot
Disallow: /nogoogle/
User-agent: Bingbot
Disallow: /nobing/
User-agent: *
Disallow: /private/
WordPress Standard
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Sitemap: https://yoursite.com/sitemap_index.xml
E-commerce
User-agent: *
Disallow: /cart/
Disallow: /checkout/
Disallow: /account/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=
Sitemap: https://yoursite.com/sitemap.xml
Pattern Matching
Wildcards
# Block all PDF files
User-agent: *
Disallow: /*.pdf$
# Block all query strings
Disallow: /*?
# Block specific parameter
Disallow: /*?ref=
End of URL ($)
# Only block .pdf files
Disallow: /*.pdf$
# This blocks /file.pdf
# But allows /file.pdf/page
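Note that the `*` and `$` wildcards are Google extensions, not part of the original Robots Exclusion Protocol (Python's `urllib.robotparser`, for example, treats them literally). A small sketch of how Google-style matching works; `rule_matches` is an illustrative helper, not a real API:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Check whether a robots.txt rule with * and $ wildcards matches a URL path.
    Per Google's documented matching: * matches any character sequence,
    a trailing $ anchors the pattern to the end of the path."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

print(rule_matches("/*.pdf$", "/file.pdf"))       # True
print(rule_matches("/*.pdf$", "/file.pdf/page"))  # False
print(rule_matches("/*?ref=", "/page?ref=abc"))   # True
```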
Common Mistakes
❌ Blocking CSS/JS (hurts rendering)
❌ Blocking images (hurts image SEO)
❌ Typos in syntax
❌ Wrong file location
❌ Using noindex in robots.txt (doesn't work)
❌ Blocking sitemap
Correct Approach
# Allow CSS and JS for rendering
User-agent: *
Allow: /wp-includes/js/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/
Disallow: /wp-admin/
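When Allow and Disallow rules conflict, as in the WordPress example above, Google resolves them by specificity: the longest matching pattern wins, and on a tie Allow wins. A simplified sketch of that precedence (prefix matching only, no wildcards; `is_allowed` is an illustrative helper, not a real API):

```python
def is_allowed(rules, path):
    """rules: list of (directive, pattern) tuples, e.g. ("disallow", "/wp-admin/").
    The longest matching pattern wins; on equal length, Allow beats Disallow.
    No matching rule means the path is allowed."""
    best = None  # (pattern_length, is_allow)
    for directive, pattern in rules:
        if path.startswith(pattern):              # simplified prefix match
            candidate = (len(pattern), directive == "allow")
            if best is None or candidate > best:  # longer pattern, then Allow, wins
                best = candidate
    return best is None or best[1]

rules = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
]
print(is_allowed(rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(rules, "/wp-admin/options.php"))     # False
```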
Testing Robots.txt
Google Search Console
- Settings > robots.txt report (the standalone robots.txt Tester has been retired)
- Enter URL to test
- Check if blocked/allowed
Screaming Frog
- Configuration > robots.txt
- Test custom robots.txt
- See blocked URLs
Manual Check
curl https://yoursite.com/robots.txt
Robots.txt vs Noindex
Robots.txt:
- Controls crawling
- Doesn't prevent indexing
- File-based
Noindex:
- Controls indexing
- Page still crawled
- Meta tag/header
Best Practice:
- Use robots.txt for crawl efficiency
- Use noindex to prevent indexing
- Don't block pages you want noindexed
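To actually prevent indexing, the page itself must carry a noindex signal and remain crawlable so Google can see it, either as a meta tag in the HTML head or as an HTTP response header:

```
<!-- In the page's <head> -->
<meta name="robots" content="noindex">
```

```
# Or as an HTTP response header (works for non-HTML files like PDFs)
X-Robots-Tag: noindex
```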
Important Notes
Blocking Doesn’t Mean Private
Warning:
Robots.txt is public
Anyone can read it
Not a security measure
For sensitive content:
- Password protection
- Server-side auth
- Not just robots.txt
Blocked Pages Can Still Index
If page has backlinks:
- URL may still appear in search
- Just without description
- Shows "blocked by robots.txt"
To truly prevent indexing:
- Use noindex tag
- Don't block crawling
- Let Google see the noindex
Best Practices Checklist
✓ Place at root domain
✓ Include sitemap location
✓ Test before deploying
✓ Don't block important resources
✓ Use for crawl efficiency
✓ Regular review and update
✗ Don't rely for security
✗ Don't block then expect noindex
Conclusion
Robots.txt is a tool for controlling crawling, not indexing or security. Use it wisely to optimize crawl budget and guide crawlers toward your important content.
Post link: https://www.tirinfo.com/cara-membuat-robots-txt-seo/
Editor: Hendra Wijaya
Publisher: Tirinfo
Read: 2 minutes
Updated: 7 January 2026