:bulb: This post summarizes how to expose GitHub blog posts to Google Search.

[01] What is Google Search Console?

Google Search Console is a service for registering a website with Google’s search engine so it can be discovered, and for monitoring how the site appears in search results. Google finds pages through web crawling.

  • For blog posts to appear in Google Search, Google’s crawler has to read the site (crawl it).
  • You can register and monitor your site at Google Search Console (a Google account is required).

[02] Registering with Google Search Console

  • Log in with your Google account, open the Google Search Console site, and click Start Now.

    image 1

  • A GitHub blog already has a URL, so enter the blog URL under URL prefix.

    image 2

  • Complete Ownership Verification to prove the URL is yours.

    image 3

  • Copy the HTML file shown in the popup (e.g., google675xxxx.html) into the root of the blog repository, then commit and push it.
  • For example: git add *, git commit -m xxx, git push

    image 20

  • Once the file has been pushed, wait a moment and the ownership-verified popup appears.

    image 4
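
As a rough sanity check on the verification step above, the HTML file has to end up reachable directly under the site root. The sketch below only builds that URL. The helper name verification_url is hypothetical, and xxx.github.io and google675xxxx.html are the placeholders used in this post, not real values.

```python
from urllib.parse import urljoin


def verification_url(blog_url: str, filename: str) -> str:
    """Return the URL where the verification file should be served.

    Hypothetical helper: assumes a user site such as xxx.github.io, where
    the repository root is served at the root of the site.
    """
    # Normalize to exactly one trailing slash so urljoin appends the file
    # name instead of replacing the last path segment.
    base = blog_url.rstrip("/") + "/"
    return urljoin(base, filename)


# Placeholder values from this post; substitute your own URL and file name.
print(verification_url("https://xxx.github.io", "google675xxxx.html"))
```

Opening the printed URL in a browser after the push is a quick way to confirm the file was published before retrying verification.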

[03] Generating and Submitting sitemap.xml

  • Ownership verification lets Google know the blog exists, but the crawler still needs structured data to read and serve information from the site.
  • sitemap.xml lists every page on the website so search engines can surface them to users.
  • There are two ways to generate sitemap.xml:
    • Write it manually
    • Generate it via Jekyll’s jekyll-sitemap plugin
  • The Minimal Mistakes theme used by this blog can generate it through the jekyll-sitemap plugin.
    • Confirm jekyll-sitemap is listed under plugins in _config.yml.

      image 12

    • Add gem 'jekyll-sitemap' to the Gemfile.

      image 5

    • Run bundle install in the terminal.

      image 7

    • After pushing to the Git repository, open https://xxx.github.io/sitemap.xml; the generated list looks like this:

      image 11

  • In Google Search Console, go to Index → Sitemaps.

    image 13

  • Enter sitemap.xml under “Add a new sitemap” and submit.

    image 14

    image 15
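
For the manual route mentioned in this section, a sitemap is a small XML file in the sitemaps.org format: a urlset element holding one url/loc entry per page. The following is a minimal sketch, not what jekyll-sitemap does internally. build_sitemap and list_locations are hypothetical helpers, and the two page URLs are made up on the xxx.github.io placeholder.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
    # Note: this returns the XML body only, without an XML declaration line.
    return ET.tostring(urlset, encoding="unicode")


def list_locations(sitemap_xml):
    """Parse sitemap XML and return the page URLs it lists."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text for loc in root.iter(f"{{{SITEMAP_NS}}}loc")]


# Made-up example pages; a real sitemap would list the blog's actual posts.
pages = ["https://xxx.github.io/", "https://xxx.github.io/example-post/"]
xml_text = build_sitemap(pages)
print(xml_text)
print(list_locations(xml_text))
```

The same list_locations helper could be pointed at the text of a published sitemap.xml to see which pages the plugin included.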

[04] Generating and Applying robots.txt

  • robots.txt states the rules that web crawlers are asked to follow when crawling the site; well-behaved crawlers such as Googlebot honor them.
  • It controls which parts of the site may be crawled.
  • Create robots.txt in the root of the blog repository with the following content:
# User-agent: the crawler the rules apply to (* matches every crawler)
# - Each search engine has its own crawler: Google's is Googlebot, Naver's is Yeti, and so on.
# Allow: paths that may be crawled (/ permits everything under the site root)
# Sitemap: location of the sitemap
# - The sitemap is a directory of the pages on the site.

User-agent: *
Allow: /

Sitemap: https://eona1301.github.io/sitemap.xml
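
One way to check how these rules are interpreted before pushing is Python's standard urllib.robotparser module. This is a sketch that parses the rules offline from a string copied from the file above. The some-post/ path is a made-up example, and real crawlers may apply their own parsing details.

```python
from urllib.robotparser import RobotFileParser

# The directives from the robots.txt above, without the comment lines.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://eona1301.github.io/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ask whether specific crawlers may fetch specific URLs under these rules.
# The post path below is a hypothetical example.
googlebot_ok = parser.can_fetch("Googlebot", "https://eona1301.github.io/some-post/")
yeti_ok = parser.can_fetch("Yeti", "https://eona1301.github.io/")
print(googlebot_ok, yeti_ok)

# Sitemap URLs declared in the file (site_maps() requires Python 3.8+).
sitemaps = parser.site_maps()
print(sitemaps)
```

Because the User-agent line uses *, the same Allow rule applies to every crawler name passed to can_fetch.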