Getting a GitHub Blog Indexed by Google — Search Console Registration

This post summarizes how to expose GitHub blog posts to Google Search.

[01] What is Google Search Console?

A free Google service for registering a website with Google Search and monitoring how its pages perform in search results.

For blog posts to appear in Google Search, Google’s crawler has to read the site (crawl it).
You can register and monitor your site at Google Search Console (a Google account is required).

[02] Registering with Google Search Console

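Log in with your Google account, open the Google Search Console site, and click Start now.
A GitHub blog already has its own URL (e.g., https://xxx.github.io), so choose the URL prefix property type and enter the blog URL. (The Domain type requires adding a DNS record, which is not possible for a github.io address.)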
Next, verify ownership to prove that the URL is yours, using the HTML file method.
Download the HTML file offered in the popup (e.g., google675xxxx.html), copy it into the root of the blog repository, and push it:
git add *
git commit -m "xxx"
git push
After the push, give GitHub Pages a moment to deploy the file, then click Verify; a popup confirms that ownership has been verified.
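Google requests the verification file from the site root (for example, https://xxx.github.io/google675xxxx.html), so the file has to sit directly in the repository root rather than in a subfolder such as _posts. A minimal local check before pushing could look like the Python sketch below. It assumes a user site at https://xxx.github.io whose repository root maps to the site root; the helper names are hypothetical, and google675xxxx.html stands for whatever file name Search Console issues.

```python
from pathlib import Path


def verification_url(blog_url: str, filename: str) -> str:
    """Hypothetical helper: the URL at which Google should find the verification file."""
    return f"{blog_url.rstrip('/')}/{filename}"


def verification_file_ready(repo_root: str, filename: str) -> bool:
    """Hypothetical helper: True only if the file sits directly in the repository root."""
    return (Path(repo_root) / filename).is_file()


if __name__ == "__main__":
    # Placeholder values from this post; substitute your own blog URL and file name.
    name = "google675xxxx.html"
    print(verification_url("https://xxx.github.io/", name))
    print(verification_file_ready(".", name))
```

Run from the repository root, a False result means the file is missing or was placed in the wrong folder.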

[03] Generating and Submitting sitemap.xml

Verifying ownership tells Google that the blog exists, but the crawler still needs to know which pages the site contains in order to find and index them.
sitemap.xml is a file that lists the site's page URLs so that search engines can discover them and show them to users.
There are two ways to generate sitemap.xml:
- Write it manually
- Generate it via Jekyll’s jekyll-sitemap plugin
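To illustrate the first option: a hand-written sitemap.xml only needs a urlset element in the sitemaps.org namespace, with one url/loc entry per page. The Python sketch below builds such a document from a list of URLs; build_sitemap is a hypothetical helper and the xxx.github.io URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace required on <urlset> by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Hypothetical helper: return a minimal sitemap.xml document listing each URL."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # <loc> is the only required child; ElementTree escapes characters such as &.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(["https://xxx.github.io/", "https://xxx.github.io/about/"]))
```

Saving the output as sitemap.xml in the repository root would serve it at https://xxx.github.io/sitemap.xml. The catch with writing it by hand is that the list has to be updated every time a post is added, which the plugin does automatically on each build.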
The Minimal Mistakes theme used by this blog supports the jekyll-sitemap plugin. To enable it:
- Confirm jekyll-sitemap is listed under plugins in _config.yml.
- Add gem 'jekyll-sitemap' to the Gemfile.
- Run bundle install in the terminal.
- After pushing to the Git repository, open https://xxx.github.io/sitemap.xml in a browser to see the generated list of page URLs.
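The same check can be scripted. The Python sketch below reads a sitemap and returns the URLs in its loc elements, i.e. the page addresses Google will read from it after submission. sitemap_urls and fetch_sitemap_urls are hypothetical helper names, and https://xxx.github.io/sitemap.xml is the placeholder address used in this post.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Prefix mapping for the namespace declared on <urlset> by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data) -> list[str]:
    """Return the page URLs listed in a sitemap.xml document (str or bytes)."""
    root = ET.fromstring(xml_data)
    locs = root.findall("sm:url/sm:loc", NS)
    # Skip empty <loc/> elements and trim surrounding whitespace.
    return [loc.text.strip() for loc in locs if loc.text and loc.text.strip()]


def fetch_sitemap_urls(sitemap_url: str) -> list[str]:
    """Download a live sitemap and list its URLs (requires network access)."""
    with urlopen(sitemap_url, timeout=10) as response:
        return sitemap_urls(response.read())
```

With network access, calling fetch_sitemap_urls("https://xxx.github.io/sitemap.xml") after substituting the real address should list every post the plugin picked up; a post missing from the list will not be announced to Google through the sitemap.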
In Google Search Console, go to Index → Sitemaps.
Enter sitemap.xml under “Add a new sitemap” and submit.

[04] Generating and Applying robots.txt

robots.txt defines the rules that web crawlers are asked to follow when crawling the site.
It specifies which paths of the site crawlers may or may not access.
Create robots.txt in the root of the blog repository with the following content:

# User-agent: the crawler the rules apply to (* matches every crawler)
# - Each search engine has its own crawler, e.g. Googlebot for Google and Yeti for Naver.
# Allow: paths crawlers may access (/ covers every path on the site)
# Sitemap: location of the sitemap
# - The sitemap lists the pages on the site.

User-agent: *
Allow: /

Sitemap: https://eona1301.github.io/sitemap.xml
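Python's standard library includes a robots.txt parser, urllib.robotparser, which offers one way to sanity-check these rules before pushing. The sketch below parses the file above locally (no network access) and asks whether Googlebot and Yeti may fetch a post. The post path /posts/example/ and the Disallow: /private/ variant are hypothetical, added only to show how a rule restricts access. Python's parser applies the first matching rule in file order, whereas Google uses the most specific match, so the variant lists Disallow before Allow to get the same answer under both.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from this post, without the explanatory comments.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://eona1301.github.io/sitemap.xml
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# "User-agent: *" covers every crawler, so Googlebot and Yeti get the same answer.
for agent in ("Googlebot", "Yeti"):
    print(agent, rules.can_fetch(agent, "https://eona1301.github.io/posts/example/"))

# The Sitemap line is reported separately from the crawl rules.
print(rules.site_maps())

# Hypothetical variant: block one folder while allowing everything else.
restricted = load_rules("User-agent: *\nDisallow: /private/\nAllow: /\n")
print(restricted.can_fetch("Googlebot", "https://eona1301.github.io/private/page.html"))
print(restricted.can_fetch("Googlebot", "https://eona1301.github.io/posts/example/"))
```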

cmaven