@kentgigger

How To Create A Dynamic Sitemap With Next.js



Next.js is one of the more popular React production frameworks out there.

You might not have heard about it, but there is something called a sitemap: an XML file that lists the URLs on your website that you want search engines to crawl. Search engines LOVE sitemaps. If you go to Google and search for site:yoursite.com, you will see a list of the pages on your website that Google has indexed. How do these pages get into this list, though? A sitemap is one of the best ways to help search engines find them. So how can we use Next.js to create a dynamic sitemap? In this tutorial, I will be walking through how I've made my Next.js sitemap and how you can too. Let's dive right in:

First, you need to have a Next.js site [1] already set up, or this won't work. And you will need to have some pages [2] or posts you've already created. Do all the npm run dev stuff and let's get going!


Static vs. Dynamic Sitemaps

A Static Sitemap

If your site is not updated often, a static sitemap might be all you need. This is a raw .xml file, written by hand, that defines the content of your site.

Here's a simple static XML sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
      <loc>https://yoursite.com/home</loc>
      <lastmod>2022-05-12</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.7</priority>
  </url>
</urlset>

As the number of pages on your site increases, you might want to generate your sitemap dynamically.

Okay, let's break down the tags in this simple static sitemap:

<urlset>

Required, encapsulates the file and references the current protocol standard through its xmlns attribute.

<url>

Required, the parent tag for each URL entry. The rest are children of this tag.

<loc>

Required, the full URL of the page. It must begin with the protocol (for example, http or https) and end with a trailing slash if your web server requires it. It must be less than 2,048 characters.

<lastmod>

Optional, this field shows the date and time of the last modification made to the page. The date should be written in the W3C Datetime [3] format, which lets you omit the time portion (YYYY-MM-DD) if you desire.
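If you generate lastmod values in JavaScript, here's a minimal sketch of producing the date-only W3C Datetime form. The helper name toW3CDate is hypothetical, and because it relies on toISOString(), it returns the UTC calendar date, which can differ from your local date near midnight.

```javascript
// Hypothetical helper: format a Date as a date-only W3C Datetime value (YYYY-MM-DD).
// toISOString() always returns UTC, e.g. "2022-05-12T15:30:00.000Z",
// so the first 10 characters are the UTC calendar date.
function toW3CDate(date) {
  return date.toISOString().slice(0, 10)
}

console.log(toW3CDate(new Date(Date.UTC(2022, 4, 12, 15, 30)))) // prints 2022-05-12
```

The full toISOString() timestamp is also valid W3C Datetime, so you can use it unchanged when you want the time portion included.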

<changefreq>

Optional, how often the page will change. This value provides general information to search engines and may not accurately reflect how often they crawl the page.

The valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

Use always to describe pages that change each time they are accessed, and never for URLs that have been archived.

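If your script builds these tags, a small guard keeps a typo like "montly" out of the generated file. Here's a minimal sketch; CHANGEFREQ_VALUES and changefreqTag are hypothetical names, not part of any library.

```javascript
// The seven changefreq values allowed by the sitemaps.org protocol
const CHANGEFREQ_VALUES = new Set([
  'always',
  'hourly',
  'daily',
  'weekly',
  'monthly',
  'yearly',
  'never',
])

// Hypothetical helper: return a <changefreq> element, or throw on a value
// outside the protocol so a typo never reaches the generated sitemap
function changefreqTag(value) {
  if (!CHANGEFREQ_VALUES.has(value)) {
    throw new Error(`Invalid changefreq: ${value}`)
  }
  return `<changefreq>${value}</changefreq>`
}

console.log(changefreqTag('monthly')) // prints <changefreq>monthly</changefreq>
```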
<priority>

Optional, this number tells search engines how much this URL is prioritized relative to others on your site. A higher number means that it's a more important page. Valid values range from 0.0 to 1.0.

I've always used the priority tag; however, Google [4] has said that it ignores the priority value in sitemaps.

The default priority of a page is 0.5. While it's true that you can give your pages a high priority, this will not affect their positions in the search engines' results.
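If you set priority from code, here's a minimal sketch of a range check, assuming priorities are passed as numbers. priorityTag is a hypothetical helper that defaults to 0.5 and throws on anything outside 0.0 to 1.0.

```javascript
// Hypothetical helper: return a <priority> element for a number between
// 0.0 and 1.0, defaulting to 0.5 (the protocol's default priority)
function priorityTag(value = 0.5) {
  if (typeof value !== 'number' || Number.isNaN(value) || value < 0 || value > 1) {
    throw new RangeError(`Priority must be between 0.0 and 1.0, got ${value}`)
  }
  return `<priority>${value}</priority>`
}

console.log(priorityTag(0.7)) // prints <priority>0.7</priority>
console.log(priorityTag()) // prints <priority>0.5</priority>
```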

All this can be found in greater detail at sitemaps.org [5]. It's full of XML examples and explains how to write a static sitemap.

A Dynamic Sitemap

If your site is frequently updated, you should create a dynamic sitemap. I have a handful of static pages because I use MD files instead of a CMS like WordPress on my blog, so I don't have to worry about content changing after build time. If all your content and pages are local in your project, you can make a sitemap.xml file at build time by running a script.

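My site uses globby [6] to traverse the file system and return all my routes, so the first thing you'll need to do is install it as a dev dependency. The script below loads it with require(), and globby 12 and later are ESM-only, so pin version 11. The script also uses gray-matter to read each post's front matter; install it too if your project doesn't already have it.

npm i -D globby@11 gray-matter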

Then create the script file:

scripts/generate-sitemap.js

Next, update the build script in your package.json file:

"build": "next build && node ./scripts/generate-sitemap",

This runs the script after next build completes. Next time you build your website, look at the freshly minted public/sitemap.xml.

You can also make a postbuild script if you'd like to view the sitemap locally, but I'll have to get into that in another post.

Here's what goes in scripts/generate-sitemap.js:

const fs = require('fs')
// globby v11 can be loaded with require(); v12+ is ESM-only
const globby = require('globby')
// Parses the front matter (title, date, draft, ...) at the top of MD/MDX files
const matter = require('gray-matter')
// Site-wide metadata; only siteUrl is used below
const siteMetadata = require('../data/siteMetadata')

// Fetch all routes based on patterns.
// Your pages folder structure might be different, so customize the code below to match your needs.
;(async () => {
  const pages = await globby([
    'pages/*.js', // Routes inside /pages
    'pages/*.tsx', // Routes inside /pages
    'data/blog/**/*.mdx', // All MDX files inside my /data/blog
    'data/blog/**/*.md', // All MD files inside my /data/blog
    '!pages/_*.js', // Ignore Next.js files such as _app and _document
    '!pages/_*.tsx', // Ignore Next.js files such as _app and _document
    '!pages/**/\\[*.{js,tsx}', // Ignore dynamic routes, e.g. /pages/[slug].tsx
    '!pages/api', // Ignore API routes
    // 'public/tags/**/*.xml', // Ignore Tags
  ])

  // Build the entire sitemap string. The XML declaration must be the very
  // first thing in the file, so the template literal starts with <?xml directly.
  const sitemap = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${pages
  .map((page) => {
    // Exclude drafts from the sitemap (join('') below turns undefined into '')
    if (/\.mdx?$/.test(page) && fs.existsSync(page)) {
      const source = fs.readFileSync(page, 'utf8')
      const fm = matter(source)
      if (fm.data.draft) {
        return
      }
    }
    // Remove the 404 page and the catch-all blog route from the sitemap
    if (page.includes('pages/404.') || page.includes('pages/blog/[...slug].')) {
      return
    }
    // Remove non-route parts of the file name
    const path = page
      .replace('pages/', '/')
      .replace('data/blog', '/blog')
      // .replace('public/', '/')  // Ignore Tags
      .replace(/\.(js|tsx|mdx|md)$/, '')
      .replace('/feed.xml', '')
    // Remove the word index from the route
    const route = path === '/index' ? '' : path
    // Build the url portion of sitemap.xml (every page gets the build time as lastmod)
    return `  <url>
    <loc>${siteMetadata.siteUrl}${route}</loc>
    <lastmod>${new Date().toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>
`
  })
  .join('')}</urlset>
`
  // eslint-disable-next-line no-sync
  fs.writeFileSync('public/sitemap.xml', sitemap)
})()

Also, don't forget to add public/sitemap.xml to your .gitignore because it is a generated file.
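The path-to-route mapping is the part of the script you're most likely to customize. Here's a minimal sketch of the same idea pulled out into a standalone function, so you can check it without touching the file system. The name fileToRoute is hypothetical, and it assumes the same pages/ and data/blog/ layout as the script.

```javascript
// Hypothetical standalone version of the script's path-to-route mapping,
// assuming the same pages/ and data/blog/ folder layout
function fileToRoute(page) {
  const path = page
    .replace('pages/', '/')
    .replace('data/blog', '/blog')
    .replace(/\.(js|tsx|mdx|md)$/, '') // strip the file extension
  // pages/index.js (or .tsx) becomes the site root
  return path === '/index' ? '' : path
}

console.log(fileToRoute('pages/about.js')) // prints /about
console.log(fileToRoute('data/blog/my-first-post.mdx')) // prints /blog/my-first-post
```

Each route is then appended to siteMetadata.siteUrl to form the <loc> value.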

One of the issues I ran into while building my sitemap was that all my draft MD pages were also showing up in my sitemap. I had to add this little piece to keep those files from showing up in the sitemap.

// Exclude drafts from the sitemap
if (/\.mdx?$/.test(page) && fs.existsSync(page)) {
  const source = fs.readFileSync(page, 'utf8')
  const fm = matter(source)
  if (fm.data.draft) {
    return
  }
}
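If you're curious what that draft check boils down to, here's a deliberately simplified sketch that swaps gray-matter for a regular expression. The isDraft name is hypothetical, and it only recognizes a literal draft: true line inside a leading --- block, so stick with gray-matter for real YAML parsing.

```javascript
// Hypothetical, simplified stand-in for gray-matter: it only looks for a
// "draft: true" line inside the leading --- front matter block
function isDraft(source) {
  const match = source.match(/^---\r?\n([\s\S]*?)\r?\n---/)
  if (!match) return false // no front matter, so not a draft
  return /^draft:\s*true\s*$/m.test(match[1])
}

const post = ['---', 'title: Hello', 'draft: true', '---', 'Body text'].join('\n')
console.log(isDraft(post)) // prints true
```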

CMS Content or External Content

When you use a CMS, a sitemap generated at build time goes stale: once you publish new content after the build, the sitemap is outdated, unless you never update your CMS with new content.

If you use an external data source, like a CMS or a homegrown database, to create your sitemap, you'll need to make an API request before generating it. The implementation will vary depending on the data source and what it offers, but the idea is the same.

Make a new file such as /pages/api/sitemap.js.

Sitemap JS File Example

// pages/api/sitemap.js
// Fetch data from a CMS. This URL is a placeholder for your CMS's posts endpoint.
const CMS_DATA_URL = 'https://yoursite.com/posts'

// Wrap the entries in a urlset; the XML declaration must come first
const createSitemap = (posts) => `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
${posts
  .map(({ id }) => {
    // Build url portion of sitemap.xml
    return `  <url>
    <loc>${CMS_DATA_URL}/${id}</loc>
    <lastmod>${new Date().toISOString()}</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>
`
  })
  .join('')}</urlset>
`

// Files in pages/api export a request handler, not a React component
export default async function handler(req, res) {
  // Next.js provides fetch on the server
  const request = await fetch(CMS_DATA_URL)
  const posts = await request.json()
  // Ensure the response is XML encoded
  res.setHeader('Content-Type', 'text/xml')
  res.write(createSitemap(posts))
  res.end()
}

That gives the sitemap a route of /api/sitemap (from the file pages/api/sitemap.js). Sitemaps typically live at the root, at /sitemap.xml, but we can use the Next.js configuration file, next.config.js, to rewrite the /sitemap.xml URL to its real route of /api/sitemap. Keep in mind this is my pages folder structure. Yours could be different.

Next.js Config file

In the Next.js config file add the following.

// next.config.js

module.exports = {
  rewrites: async () => [
    {
      source: '/sitemap.xml',
      // Routes in pages/api are served at /api/..., without the pages prefix
      destination: '/api/sitemap',
    },
  ],
}

And that's it! Do a dance party, because you have created a Next.js sitemap. Depending on how your site is built, you may have to publish your site to see the sitemap.

If you are using a CMS, I would recommend fetching data using GraphQL, which I'll go into more in another post.

Please make sure you encode your sitemap file in UTF-8 and escape URLs appropriately [7], or Google will not like your sitemap. A single sitemap may contain at most 50,000 URLs and must not exceed 50MB uncompressed, so larger sites need to break theirs up into smaller sitemaps. If you have more than 50,000 URLs, use a sitemap index file [8] to list all your sitemaps, and submit that one file rather than submitting them one by one.
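Here's a minimal sketch of both ideas, assuming you already have a flat array of absolute URLs. escapeXml entity-escapes the five characters Google lists (&, ', ", >, <), and buildSitemaps splits the URLs into files of at most 50,000 entries plus a sitemap index pointing at them. The function names and the sitemap-N.xml file names are hypothetical; the sketch does not check the 50MB size limit, and non-ASCII characters in URLs should be percent-encoded (for example with encodeURI) before escaping.

```javascript
// Protocol limit: at most 50,000 URLs per sitemap file
const MAX_URLS_PER_SITEMAP = 50000

// Entity-escape the characters that must be escaped in sitemap XML
function escapeXml(value) {
  return value
    .replace(/&/g, '&amp;') // must run first so later entities aren't double-escaped
    .replace(/'/g, '&apos;')
    .replace(/"/g, '&quot;')
    .replace(/>/g, '&gt;')
    .replace(/</g, '&lt;')
}

// Build one <urlset> document from a list of absolute URLs
function buildUrlset(urls) {
  const entries = urls
    .map((url) => `  <url>\n    <loc>${escapeXml(url)}</loc>\n  </url>\n`)
    .join('')
  return `<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${entries}</urlset>\n`
}

// Split the URLs into chunks and return { fileName: xml } for each chunk,
// plus a sitemap index (sitemap.xml) that lists every chunk
function buildSitemaps(siteUrl, urls, maxUrls = MAX_URLS_PER_SITEMAP) {
  const files = {}
  const indexEntries = []
  for (let i = 0; i < urls.length; i += maxUrls) {
    const name = `sitemap-${i / maxUrls + 1}.xml`
    files[name] = buildUrlset(urls.slice(i, i + maxUrls))
    indexEntries.push(`  <sitemap>\n    <loc>${escapeXml(`${siteUrl}/${name}`)}</loc>\n  </sitemap>\n`)
  }
  files['sitemap.xml'] = `<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n${indexEntries.join('')}</sitemapindex>\n`
  return files
}

const urls = ['https://yoursite.com/', 'https://yoursite.com/blog?tag=a&b']
const files = buildSitemaps('https://yoursite.com', urls, 1)
console.log(Object.keys(files).join(', ')) // prints sitemap-1.xml, sitemap-2.xml, sitemap.xml
```

You could then write each entry of the returned object to public/ (for example with fs.writeFileSync) and submit only sitemap.xml, the index.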

Please keep in mind that server side rendering has its own set of challenges when it comes to being SEO friendly. This is one of the reasons I use MD files and keep static pages on my site.

Next.js Sitemap Example

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/home</loc>
    <lastmod>2022-05-12</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>

Robots.txt For Search Engines

Adding a robots.txt file to your website can help improve its search engine optimization. A robots.txt file is like an online "No Trespassing" sign that tells search engines which areas are off-limits and which roads they should take instead.

Create a robots.txt file and add it to your public folder.

Example Robots.txt Files

Basic Robots.txt

User-agent: *
Allow: /

# Host
Host: https://yoursite.com/

# Sitemaps
Sitemap: https://yoursite.com/sitemap.xml

It will tell search engines they are welcome to explore your website to their heart's content.

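If you want to keep crawlers out of any pages, add them to a "Disallow" line. For example, we don't want search engines crawling our API route, /api/sitemap, because it serves the same XML as /sitemap.xml. We only want /sitemap.xml to be picked up; if the API route were indexed too, it could look like two sitemaps, and we don't want that. (Strictly speaking, Disallow stops crawling rather than indexing, so a blocked URL can still appear in results if other pages link to it.)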

Robots.txt with Disallow

User-agent: *
Allow: /
Disallow: /api/*

# Host
Host: https://yoursite.com/

# Sitemaps
Sitemap: https://yoursite.com/sitemap.xml
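If you'd rather not hardcode your domain, one option is to generate robots.txt from the same build script that writes the sitemap. Here's a minimal sketch; buildRobotsTxt is a hypothetical helper, and it leaves out the Host line, which is a nonstandard directive that Google ignores.

```javascript
// Hypothetical helper: build robots.txt text from the site URL and a list of
// paths to keep crawlers out of. The Sitemap line must be an absolute URL.
function buildRobotsTxt(siteUrl, disallow = []) {
  const lines = ['User-agent: *', 'Allow: /']
  for (const path of disallow) {
    lines.push(`Disallow: ${path}`)
  }
  lines.push('', '# Sitemaps', `Sitemap: ${siteUrl}/sitemap.xml`)
  return lines.join('\n') + '\n'
}

console.log(buildRobotsTxt('https://yoursite.com', ['/api/']))
```

In scripts/generate-sitemap.js you could write it out with something like fs.writeFileSync('public/robots.txt', buildRobotsTxt(siteMetadata.siteUrl, ['/api/'])), and add public/robots.txt to your .gitignore as well.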

You can view your robots.txt file by visiting yoursite.com/robots.txt.

The End

Next.js is very SEO friendly, and a sitemap is a vital component of any website. In this tutorial, we went over building a sitemap generator that creates an entry for every page on your website. This helps search engines discover and crawl all your pages, which can help them show up on search engine result pages (SERPs) and draw more users to your website.

If your website grows and you continue to write page after page, you'll want your Next.js sitemap to work without you thinking about it. Generating the sitemap dynamically picks up all your pages and posts, removes the manual human part, and keeps you from mapping every page by hand. As time goes on, you'll also want to get into structured data (JSON-LD using the Schema.org vocabulary) to help bots quickly understand your site. That's a post for another time, too.

I hope you learned something. I know I did.

Footnotes

  1. https://nextjs.org/docs

  2. https://nextjs.org/docs/basic-features/pages

  3. https://www.w3.org/TR/NOTE-datetime

  4. https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap

  5. https://www.sitemaps.org/protocol.html

  6. https://www.npmjs.com/package/globby

  7. https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap#sitemap-encoding-guidelines

  8. https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps