Getting a GitHub Blog Indexed by Google — Search Console Registration
This post summarizes how to expose GitHub blog posts to Google Search.
[01] What is Google Search Console?
Google Search Console is a service for registering a website with Google's search engine so it can be discovered, and for monitoring how the site appears in search results. Discovery itself happens through web crawling.
- For blog posts to appear in Google Search, Google’s crawler has to read the site (crawl it).
- You can register and monitor your site at Google Search Console (a Google account is required).
[02] Registering with Google Search Console
- After logging in with your Google account, visit the site and click Start Now.
- A GitHub blog already has a URL, so enter the blog URL under URL prefix.
- Complete Ownership Verification to prove the URL is yours.
  - Copy the HTML file shown in the popup (e.g., google675xxxx.html) into the root of the blog repository and push it.
  - git add *, git commit -m xxx, git push
- After the file is uploaded, wait briefly and the ownership-confirmed popup will appear.
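
Before pushing, it can help to confirm that the verification file really sits in the repository root. The following Python sketch is only an illustration: the helper names are hypothetical, and it assumes the downloaded file keeps its google*.html name and that its body begins with "google-site-verification:", which is how these files typically look.

```python
from pathlib import Path


def find_verification_files(repo_root: str) -> list[str]:
    """Return names of google*.html files located directly in repo_root."""
    root = Path(repo_root)
    # glob() without "**" only matches files in the root, not in subfolders.
    return sorted(p.name for p in root.glob("google*.html") if p.is_file())


def looks_like_verification_file(path: str) -> bool:
    """Assumption: the file body starts with 'google-site-verification:'."""
    text = Path(path).read_text(encoding="utf-8").strip()
    return text.startswith("google-site-verification:")


if __name__ == "__main__":
    # "." stands for the local checkout of the blog repository.
    for name in find_verification_files("."):
        print(name, looks_like_verification_file(name))
```

If the script prints nothing, no matching file was found in the root, and the file most likely needs to be moved there before pushing.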

[03] Generating and Submitting sitemap.xml
- Ownership verification lets Google know the blog exists, but the crawler still needs structured data to read and serve information from the site.
- sitemap.xml lists every page on the website so search engines can surface them to users.
- There are two ways to generate sitemap.xml:
  - Write it manually
  - Generate it via Jekyll's jekyll-sitemap plugin
- The Minimal Mistakes theme used by this blog can generate it through the jekyll-sitemap plugin.
  - Confirm jekyll-sitemap is listed under plugins in _config.yml.
  - Add gem 'jekyll-sitemap' to the Gemfile.
  - Run bundle install in the terminal.
  - After pushing to the Git repository, open https://xxx.github.io/sitemap.xml to confirm that the list of page URLs is displayed.
- In Google Search Console, go to Index → Sitemaps.
- Enter sitemap.xml under "Add a new sitemap" and submit.
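
To check what a generated sitemap.xml actually contains before submitting it, the URLs can be pulled out with a few lines of Python. This is a minimal sketch, not part of the original setup: the function name and the sample document (with the placeholder host example.github.io) are made up for illustration, and it assumes the file uses the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value found under <url> in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; a real file would be the one served at /sitemap.xml.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.github.io/</loc></url>
  <url><loc>https://example.github.io/posts/hello/</loc></url>
</urlset>
"""

for url in sitemap_urls(SAMPLE):
    print(url)
```

An empty result usually means the plugin did not run or the posts were excluded from the build.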

[04] Generating and Applying robots.txt
- robots.txt defines the rules a web crawler is expected to follow when crawling the site.
- It controls which parts of the site may be crawled.
- Create robots.txt in the root of the blog repository with the following content:
# User-agent: the crawler the rules apply to (* matches every crawler)
# - Google's crawler is Googlebot, Naver's is Yeti, and so on; each engine has its own crawler.
# Allow: paths to allow crawling (/ permits everything under /)
# Sitemap: location of the sitemap
# - The sitemap is a directory of pages on the site.
User-agent: *
Allow: /
Sitemap: https://eona1301.github.io/sitemap.xml
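
The rules above can be sanity-checked locally with Python's standard urllib.robotparser module. This is a rough sketch rather than an official validation step: the load_rules helper and the /some-post/ path are hypothetical, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the robots.txt shown above, without the comment lines.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Sitemap: https://eona1301.github.io/sitemap.xml
"""


def load_rules(robots_text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# /some-post/ is a made-up path used only to exercise the Allow rule.
print(rules.can_fetch("Googlebot", "https://eona1301.github.io/some-post/"))  # True
print(rules.site_maps())  # ['https://eona1301.github.io/sitemap.xml']
```

Because the only group is User-agent: *, any crawler name passed to can_fetch falls back to the same Allow: / rule.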