Minimal Head Part 1: Single Page

In this multi-part series, I'm going to explore how far I can progress, building a static site generator (SSG) for a headless CMS using a restrained toolset and minimal dependencies.

Introduction

We developers have a tendency to over-complicate things. How often have you been talking to a developer colleague, or listening at a conference, and found that the thing the other person is talking about is completely over-engineered and more complex than it needs to be?

"To host my simple 3 page static HTML site I created a proper node container and run it in Kubernetes using AKS. I've got HPA enabled so the deployment will automatically scale with load, and the nodepools are also set to auto-scale!"

(this is an attempt at hyperbole, not an actual conversation I've had)

I've often been intrigued by how much one can achieve when working in a very constrained environment. It's a testament to the fact that we often don't need all that added complexity. JavaScript is a great example of this. Look at all the things people have written in JavaScript! Yet the language itself is quite minimal. It lacks the kind of standard library we have with other languages (Java, C#, even C). That hasn't stopped JavaScript from becoming one of the most widely used languages today.

So I got to thinking about how minimally I could build a static site generator as the head application when using a headless CMS like Sitecore Content Hub ONE. I've used existing frameworks like Eleventy and Nest.js to build head applications against Content Hub ONE, but I wanted to pare it right back.

The Toolset

So what are the bare essentials when building against Content Hub ONE? Firstly, we need to be able to make HTTP requests against the GraphQL endpoint so we can query for our structured content. Then we need some way to extract the raw field values from the JSON response to the GraphQL query. Lastly, we need to insert those raw field values into an HTML template to produce the resulting HTML file.

The toolset I settled on is:

  • curl - to make the HTTP requests to the GraphQL endpoint.
  • jq - to extract raw field values from the JSON response of the GraphQL endpoint.
  • Bash - to orchestrate the whole process and perform raw field value substitution into HTML templates, and generate the output files.

I also want this to execute under Linux, which is prevalent in SaaS CI/CD build pipeline services such as GitHub Actions and Bitbucket Pipelines.
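Since the whole approach leans on these three tools being present on the build agent, a quick check at the top of the script can be helpful. This is only a sketch: the missing_tools function name is my own invention for illustration, not part of the script in this post.

```shell
# Hypothetical helper: print the name of every given tool that is not on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Check for the tools this series relies on.
missing=$(missing_tools curl jq bash)
if [ -n "$missing" ]; then
    echo "Missing required tools: $missing" >&2
fi
```

In a real script I'd probably follow the error message with exit 1 so the pipeline stops before attempting any requests.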

The Content Model

For the first part of this series, I'm going to generate pages for a blog, so I've created a content type in Content Hub ONE to model a very basic blog post.

Field Name   | Type          | Purpose
-------------|---------------|---------------------------------------------------
Title        | Short Text    | The title of the blog post
Content      | Rich text     | The main content of the blog post
Publish Date | Date and time | The timestamp for when the blog post is published
Category     | Select        | A simple taxonomy to segment blog posts

Here's the content type in Content Hub ONE.

[Image: Blog post content type in Content Hub ONE]

With the content type defined, I'll create some test content.

[Image: A simple blog post in Content Hub ONE]

Querying for Content

Now that I've got some sample content published, I can use the GraphQL IDE to write a GraphQL query to retrieve the blog post by its ID. Content Hub ONE publishes content to Sitecore Experience Edge, and that's where I'll be querying for the published content.

This is the resulting GraphQL query:

query getBlogPost($id: String!) {
  blogPost(id: $id) {
    id
    title
    content
  }
}

I've parameterized the query so I can specify any blog post by ID. If I execute the query passing W-ogYlX0z0W79idySURpiA as the id variable, the JSON response is as follows:

{
  "data": {
    "blogPost": {
      "id": "W-ogYlX0z0W79idySURpiA",
      "title": "Younger and more vulnerable years",
      "content": {
        "type": "doc",
        "content": [
          {
            "type": "paragraph",
            "content": [
              {
                "type": "text",
                "text": "In my younger and more vulnerable years my father gave me some advice that I’ve been turning over in my mind ever since."
              }
            ]
          },
          {
            "type": "paragraph",
            "content": [
              {
                "type": "text",
                "text": "“Whenever you feel like criticizing anyone,” he told me, “just remember that all the people in this world haven’t had the advantages that you’ve had.”"
              }
            ]
          },
          {
            "type": "paragraph",
            "content": [
              {
                "type": "text",
                "text": "He didn’t say any more, but we’ve always been unusually communicative in a reserved way, and I understood that he meant a great deal more than that."
              }
            ]
          }
        ]
      }
    }
  }
}

To execute the GraphQL query, I need to wrap it in a JSON payload, along with the variables, and POST that payload to the GraphQL endpoint. JSON strings can't contain literal line breaks, so the query is collapsed onto a single line:

{
    "query": "query getBlogPost($id: String!) { blogPost(id: $id) { id title content } }",
    "variables": {
        "id":"W-ogYlX0z0W79idySURpiA"
    }
}
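Writing that payload by hand gets fiddly once the query or the variables contain quotes. As a sketch of one alternative, jq can build the payload and handle the JSON escaping itself. The query, post_id and payload variable names here are just for illustration.

```shell
# The GraphQL query, kept in single quotes so the shell leaves $id alone.
query='query getBlogPost($id: String!) { blogPost(id: $id) { id title content } }'
post_id='W-ogYlX0z0W79idySURpiA'

# -n: don't read any input, --arg: pass shell values in as jq string variables.
payload=$(jq -n -c \
    --arg query "$query" \
    --arg id "$post_id" \
    '{query: $query, variables: {id: $id}}')

printf '%s\n' "$payload"
```

The resulting string could then be handed to curl with -d "$payload" instead of a hand-written JSON literal.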

I'll use curl to execute this web request and then store the result in a variable named page_data. Note this command uses an environment variable named API_TOKEN which must be set to the X-GQL-Token used to access Experience Edge. It also uses the --fail-with-body option, which requires curl 7.76.0 or later. I have another blog post on Using the Experience Edge GraphQL API which details how to call the API.

# Get page data from CH ONE
page_data=$(\
    curl -X POST https://edge.sitecorecloud.io/api/graphql/v1 \
        -H "Content-Type: application/json" \
        -H "X-GQL-Token: ${API_TOKEN}" \
        -d '{"query":"query getBlogPost($id: String!) {blogPost(id: $id) { id, title, content }}",
          "variables":{"id":"W-ogYlX0z0W79idySURpiA"}}' \
        --fail-with-body
)
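The --fail-with-body option only catches HTTP-level failures. GraphQL servers conventionally report query problems in an errors array inside the response body, so it can be worth checking the body as well. The following is a minimal sketch using canned responses rather than a live call. The check_response name is hypothetical, and I'm assuming a missing blog post comes back as a null blogPost.

```shell
# Hypothetical helper: succeed only if the response has no "errors" entry
# and contains a non-null blogPost.
check_response() {
    printf '%s\n' "$1" \
        | jq -e '(has("errors") | not) and (.data.blogPost != null)' > /dev/null
}

# Canned responses for illustration only.
ok='{"data":{"blogPost":{"id":"abc","title":"Hello","content":null}}}'
bad='{"errors":[{"message":"Unauthorized"}],"data":null}'

if check_response "$ok"; then echo "ok response accepted"; fi
if ! check_response "$bad"; then echo "bad response rejected" >&2; fi
```

The -e option makes jq set its exit status from the result of the filter, which is what lets the function be used directly in an if statement.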

Extracting Field Values

With the JSON response from the GraphQL query stored in the page_data variable, I can pipe it to jq to extract the individual fields.

echo "${page_data}" | jq -r -c '.data.blogPost | .id, .title, .content'

This command will output the fields to stdout, with each field on a new line. To read each into a variable for use later on, I'll use the read command and a here-string to pass the output of jq along.

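# Extract individual fields
# read -r keeps backslashes in the JSON intact; the quotes preserve the line breaks between fields
{
    read -r id
    read -r title
    read -r content
} <<< "$(echo "${page_data}" | jq -r -c '.data.blogPost | .id, .title, .content')"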
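One detail to be aware of when reading lines of JSON with read is how it treats backslashes. Here's a small sketch of the difference between read and read -r, using a made-up line of JSON and a here-doc as the input.

```shell
# A line of compact JSON containing escaped quotes, like jq -c might emit.
line='{"text":"She said \"hi\""}'

# Plain read treats each backslash as an escape character and drops it.
read plain << EOF
$line
EOF

# read -r keeps the line exactly as it was written.
read -r raw << EOF
$line
EOF

printf 'plain: %s\n' "$plain"
printf 'raw:   %s\n' "$raw"
```

The plain variable ends up holding invalid JSON because the escaped quotes have lost their backslashes, while raw is still something jq can parse.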

Although the id and title fields are simple text, the content field is in ProseMirror format, and needs to be converted to HTML before using it. You can see from the GraphQL response JSON above that the ProseMirror format is a tree of objects, each with a type and optional sub-content, which is the same object structure again. This tree structure is most easily processed using recursion, so I'll define a recursive function which will process the ProseMirror content and generate HTML.

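function transform_prosemirror {
    # $1 is a single ProseMirror node serialized as compact JSON
    local type
    type=$(echo "$1" | jq -r '.type')

    case $type in

        paragraph)
            echo "<p>"
            process_content "$1"
            echo "</p>"
            ;;

        text)
            # Emit the raw text of the node
            echo "$1" | jq -r '.text'
            ;;

        *)
            # Any other node type (such as doc): just process its children
            process_content "$1"
            ;;
    esac
}

function process_content {
    # Emit each child node as one line of compact JSON.
    # The trailing ? tolerates nodes which have no content array.
    echo "$1" | jq -c '.content[]?' | while read -r item
    do
        transform_prosemirror "$item"
    done
}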

In the transform_prosemirror function, I'm extracting the type of the node, then using a case statement to process that node, calling process_content to process all sub-content. process_content will process each sub-content node in turn, calling back to transform_prosemirror for each, creating the recursive call.
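The same recursion can also be expressed inside jq itself, which avoids starting a new jq process for every node. This is an alternative sketch and not the approach used in this post. The to_html and render names are my own, and I've assumed it's desirable to HTML-escape text nodes using jq's @html filter.

```shell
# Hypothetical alternative: render a ProseMirror document read from stdin.
to_html() {
    jq -r '
        # Recursively render one node and all of its children.
        def render:
            if .type == "text" then (.text | @html)
            elif .type == "paragraph" then
                "<p>" + ([.content[]? | render] | join("")) + "</p>"
            else
                [.content[]? | render] | join("\n")
            end;
        render
    '
}

# Made-up sample document for illustration.
sample='{"type":"doc","content":[{"type":"paragraph","content":[{"type":"text","text":"Tom & Jerry"}]},{"type":"paragraph","content":[{"type":"text","text":"Second paragraph"}]}]}'

html=$(printf '%s\n' "$sample" | to_html)
printf '%s\n' "$html"
```

The trade-off is that the template logic moves out of Bash and into a jq program, which may or may not suit the spirit of keeping everything in the shell.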

With the function defined, I can use it to convert the content field:

# Convert fields
content=$(transform_prosemirror "$content")

Substitute Fields into the HTML Template

With each field stored in its own shell variable, I can substitute them into the HTML template of the file I'll be generating. I'll use a here-doc for the HTML template, because Bash natively expands any parameters used inside it. I'll also redirect this output directly to a file.

# Write file
cat << END > page.html
<!DOCTYPE html>
<html>
    <head>
        <title>${title}</title>
    </head>
    <body>
        <h1>${title}</h1>
        <p>ID: ${id}</p>
        <main>
            ${content}
        </main>
    </body>
</html>
END
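The here-doc inserts field values exactly as they are, so a title containing characters like & or < would end up as invalid markup. As a rough sketch, plain text fields could be escaped with sed before substitution. The html_escape function and the safe_title variable are hypothetical names, and the title is a made-up value.

```shell
# Hypothetical helper: escape the characters that are special in HTML.
# The & rule must run first so the other replacements aren't escaped twice.
html_escape() {
    printf '%s' "$1" | sed \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g'
}

title='Fish & Chips <Deluxe>'
safe_title=$(html_escape "$title")

# Same here-doc technique as above, written to stdout instead of a file.
cat << END
<title>${safe_title}</title>
END
```

This only applies to plain text fields. The content variable already holds generated HTML, so escaping it would break the markup.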

Conclusion

And there we have it, the simplest head application for my headless CMS. This example could have been a little simpler if I'd only used basic text and didn't have to worry about converting the ProseMirror content, but this is a bit closer to a real-world example.

This example only generates a single file so far, but in the following posts of this series I'll iterate over the script to cover more real-world cases, including making use of the extra fields of the blog post which I didn't include here (Publish Date and Category).

The script is also available on GitHub at https://github.com/codeflood/minimal-head/blob/main/1-single-page/generate.sh if you'd like to see the whole thing together. I've also added a GitHub workflow which builds this single page website, to prove that this minimal head application can run on a SaaS CI/CD build pipeline.
