Szymon Kaliski

Building a Static Site Generator

I built my own static site generator to generate this page (and all the other pages on this website) in the first half of 2024.

This is the fourth technology that I've used here, and I finally decided to take the whole thing into my own hands. The previous version of this site was built on top of Gatsby, and I don't really remember my reasoning for picking it, other than wanting to migrate off of Tumblr, which I had moved to from WordPress over a decade ago. There's nothing wrong with any of those; it's just that I kept on making custom plugins and hacking around what Gatsby exposes, and it got to a point where the whole thing felt both very brittle and very complicated. On a somewhat philosophical level, I also don't think I need any frontend JavaScript for some text and an occasional image.

High-Level

The way the generator works is the most straightforward approach I could come up with. There's an input directory, for example:

├── assets/
│   └── photo.png
├── writings/
│   └── 2024-07-01-static-site-generator/
│       ├── index.md
│       └── miniature.png
├── feed.xml.ts
├── style.css
└── index.tsx

I recursively traverse it, process every file I find, and output the same structure:

├── assets/
│   └── photo.png
├── writings/
│   └── 2024-07-01-static-site-generator/
│       ├── index.html
│       └── miniature.png
├── feed.xml
├── style.css
└── index.html

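Most of the behavior can be inferred from the difference between the two: markdown and .tsx pages are rendered to .html, feed.xml.ts produces feed.xml, and everything else keeps its path.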

This way my deploy script is just an rsync to a server.
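As a rough sketch of that input-to-output mapping, here is one way the path rules could look. The function name `outputPath` and the exact rules are assumptions read off the two directory listings above, not the generator's actual code.

```typescript
// Hypothetical path rules, inferred only from the example trees above.
function outputPath(inputPath: string): string {
  // Markdown and TSX pages are both rendered to HTML.
  if (inputPath.endsWith(".md")) {
    return inputPath.slice(0, -".md".length) + ".html";
  }
  if (inputPath.endsWith(".tsx")) {
    return inputPath.slice(0, -".tsx".length) + ".html";
  }
  // A script such as feed.xml.ts produces the file named by dropping ".ts".
  if (inputPath.endsWith(".ts")) {
    return inputPath.slice(0, -".ts".length);
  }
  // Everything else (images, stylesheets) keeps its path.
  return inputPath;
}

const inputs = [
  "assets/photo.png",
  "writings/2024-07-01-static-site-generator/index.md",
  "writings/2024-07-01-static-site-generator/miniature.png",
  "feed.xml.ts",
  "style.css",
  "index.tsx",
];

const outputs = inputs.map(outputPath);
```

Keeping the mapping a pure function of the path means the output tree can be computed, and checked, without touching the filesystem.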

The "fun", as always, is in the details:

Normalizing Markdown

Most of my content (projects, articles, newsletter) is written in markdown. Notes are also in markdown, but with a slightly different syntax, as they are generated from a subset of my personal wiki. For example, in the website-first markdown I keep a YAML header with metadata, and when I link between things, I use URLs instead of relative file paths.

Instead of building a generic system to handle these differences, I just have a different code-path for when the markdown file is inside the notes/ subdirectory or not. This is the biggest upside of a totally custom, one-off solution: I don't have to change how I write my personal notes, or figure out how to plug into someone else's plugin system; I just write some custom code when I need it.
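A minimal sketch of that branch, assuming paths are relative to the input directory and use forward slashes. The names `MarkdownFlavor` and `markdownFlavor` are made up for illustration.

```typescript
// Hypothetical names; the only rule taken from the text is "inside notes/ or not".
type MarkdownFlavor = "wiki-note" | "website";

function markdownFlavor(relativePath: string): MarkdownFlavor {
  // Only files under the top-level notes/ directory use the wiki syntax.
  return relativePath.startsWith("notes/") ? "wiki-note" : "website";
}
```

The caller can then pick the matching parsing code for each file, with no plugin system in between.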

Backlinking

If you clicked around my website, you probably noticed the backlinks section at the bottom of most pages, for example here. To generate this section, when I parse the markdown, I find all of the local links and store them together with other metadata about that page. I also grab the AST subtree of the nearest "meaningful" paragraph containing each link.

When rendering the final pages, which happens after the input directory has been fully scanned, I can find the items that mention the page currently being processed by matching every other page's outgoing links against the current slug.
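The lookup itself can be sketched in a few lines. The `PageMeta` shape and the slugs below are hypothetical, and the real metadata also carries the paragraph subtree, which is left out here.

```typescript
// Hypothetical metadata shape; the real one stores more than this.
interface PageMeta {
  slug: string;
  title: string;
  links: string[]; // slugs of local pages this page links to
}

// Find every other page whose outgoing links include the current slug.
function backlinksFor(currentSlug: string, pages: PageMeta[]): PageMeta[] {
  return pages.filter(
    (page) => page.slug !== currentSlug && page.links.indexOf(currentSlug) !== -1
  );
}

const examplePages: PageMeta[] = [
  { slug: "projects/a", title: "Project A", links: [] },
  { slug: "writings/b", title: "Article B", links: ["projects/a"] },
  { slug: "writings/c", title: "Article C", links: ["writings/b"] },
];

const backlinksToA = backlinksFor("projects/a", examplePages);
```

This is a linear scan per page, which is likely fine at the scale of a personal website; a larger site could build an inverted index once instead.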

Because I also keep the "meaningful" paragraph, I can generate Text Fragments links, which are built into modern browsers, and highlight the backlinked sections! To try it out, go here, and click on, for example, the "Protoboard" backlink: you'll end up in the middle of that article, with the relevant paragraph highlighted.
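Here is a sketch of building such a link from a paragraph's plain text, using the `#:~:text=start,end` form of a text fragment. Choosing the first and last few words as the range is an assumption of this sketch, as are the function names and the example URL.

```typescript
// Percent-encode text for a text directive. encodeURIComponent already escapes
// "," and "&"; "-" is escaped by hand because it has a special meaning
// in the directive syntax.
function encodeFragmentText(text: string): string {
  return encodeURIComponent(text).replace(/-/g, "%2D");
}

// Build a URL that asks the browser to scroll to and highlight the paragraph.
// Short paragraphs are matched whole; longer ones by their first and last words.
function textFragmentUrl(pageUrl: string, paragraph: string, edgeWords = 3): string {
  const words = paragraph.trim().split(/\s+/);
  if (words.length <= edgeWords * 2) {
    return `${pageUrl}#:~:text=${encodeFragmentText(words.join(" "))}`;
  }
  const start = words.slice(0, edgeWords).join(" ");
  const end = words.slice(-edgeWords).join(" ");
  return `${pageUrl}#:~:text=${encodeFragmentText(start)},${encodeFragmentText(end)}`;
}

const exampleUrl = textFragmentUrl("/example/", "a b c d e f g h", 2);
```

Browsers that don't support text fragments ignore the directive and simply open the page, so the link degrades gracefully.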

Transclusions

I often publish smaller demos as a part of my newsletter, and they don't end up on the projects page, which feels like where you'd look for, well, projects.

I could just copy-paste the relevant text between these two places, but that would be too simple — instead I've implemented a way to transclude markdown sections using custom markdown directives.

To do this, I use remark-directive to parse a custom :::transclusion{slug, header} directive: slug points to the page to transclude from, and header to the header of the section to copy over.

When I find that directive in the markdown that I'm parsing, I have to transplant a subtree of the other page's markdown AST, so I have to already have it in memory. Hence the website generation is done in three passes:

  1. Initial scan of the input directory, and reading all the items into memory, including initial markdown parsing, but not replacing the transclusions yet.
  2. Process the transclusions, and grab all the links from markdown files for backlinking. I do this after transcluding, so the links from that material are also included.
  3. Iterate over all the items again, this time generating the final output files.
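The three passes can be sketched on a toy in-memory model. Everything here is a simplification for illustration: the `ContentNode`, `Section` and `Item` types stand in for a real markdown AST, the function names are made up, and nested transclusions are not handled.

```typescript
// Simplified stand-ins for markdown AST nodes; a real AST is much richer.
type ContentNode =
  | { kind: "text"; value: string; links: string[] }
  | { kind: "transclusion"; slug: string; header: string };

interface Section { header: string; nodes: ContentNode[] }
interface Item { slug: string; sections: Section[]; links: string[] }

// Pass 1 (stand-in): index every parsed item by slug, transclusions untouched.
function scan(parsed: Item[]): Record<string, Item> {
  const items: Record<string, Item> = {};
  parsed.forEach((item) => { items[item.slug] = item; });
  return items;
}

// Pass 2: swap each transclusion for the referenced section's nodes, then
// collect links, so links inside transcluded material are included too.
function resolve(items: Record<string, Item>): void {
  Object.keys(items).forEach((slug) => {
    const item = items[slug];
    item.links = [];
    item.sections.forEach((section) => {
      const resolved: ContentNode[] = [];
      section.nodes.forEach((node) => {
        if (node.kind === "text") {
          resolved.push(node);
          return;
        }
        const source = items[node.slug];
        const target = source
          ? source.sections.filter((s) => s.header === node.header)[0]
          : undefined;
        if (!target) {
          throw new Error(`Missing section "${node.header}" in "${node.slug}"`);
        }
        target.nodes.forEach((n) => resolved.push(n));
      });
      section.nodes = resolved;
      resolved.forEach((node) => {
        if (node.kind === "text") {
          node.links.forEach((link) => item.links.push(link));
        }
      });
    });
  });
}

// Pass 3: render every item to an output path (no HTML escaping in this sketch).
function render(items: Record<string, Item>): Record<string, string> {
  const output: Record<string, string> = {};
  Object.keys(items).forEach((slug) => {
    output[`${slug}/index.html`] = items[slug].sections
      .map((section) => {
        const body = section.nodes
          .map((node) => (node.kind === "text" ? `<p>${node.value}</p>` : ""))
          .join("");
        return `<h2>${section.header}</h2>${body}`;
      })
      .join("");
  });
  return output;
}

const items = scan([
  {
    slug: "newsletter/demo",
    links: [],
    sections: [{
      header: "Demo",
      nodes: [{ kind: "text", value: "A small demo.", links: ["projects/other"] }],
    }],
  },
  {
    slug: "projects",
    links: [],
    sections: [{
      header: "Demos",
      nodes: [{ kind: "transclusion", slug: "newsletter/demo", header: "Demo" }],
    }],
  },
]);
resolve(items);
const site = render(items);
```

In the example, the projects page ends up with both the transcluded paragraph and its link, which is why link collection has to wait until the transclusions are resolved.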

For a demo of transclusions, check out the Liunon project.

Time-Tracking

The previous version of this website had a separate page with some public-facing stats, just for fun.

This page was using the data from my time-tracking system, and was implemented in a pretty hacky way — I had a small API server which would collect the Google Calendar events, and post-process them for rendering. The client code would query that server and render the charts — the JSON blob was huge, and the client had to do the work every time the page was opened.

I decided to make the stats a bit less real-time: instead of updating every 10 minutes or so, they now update whenever the static site generator runs. With that, I also reworked this section a bit, which you can now see on the main page.

It feels really nice to not have to run any client-side JavaScript to generate this. Since the code runs "on the server" (on my machine, when the website is generated), I can simply import parts of my time-tracking library and call whatever functions I want. These ultimately access my filesystem synchronously, parse some JSON files, and calculate values for the charts, which get embedded as static <svg> elements inside the HTML file.
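To illustrate the last step, here is a minimal sketch of turning already-computed values into a static SVG string at build time. The function `barChartSvg`, the bar-chart style and the sample numbers are assumptions; the actual charts and the time-tracking library's API are not shown in this article.

```typescript
// Render non-negative values as a simple bar chart, returned as an SVG string
// that can be embedded directly into the generated HTML.
function barChartSvg(values: number[], width = 200, height = 40): string {
  // Avoid dividing by zero when the list is empty or all values are zero.
  const max = Math.max(...values, 0) || 1;
  const barWidth = values.length > 0 ? width / values.length : 0;
  const bars = values.map((value, i) => {
    const barHeight = (value / max) * height;
    // SVG's y axis points down, so bars are offset from the top.
    return `<rect x="${i * barWidth}" y="${height - barHeight}" width="${barWidth}" height="${barHeight}" />`;
  });
  return `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${width} ${height}">${bars.join("")}</svg>`;
}

// Hypothetical sample data, for example hours tracked on two days.
const chart = barChartSvg([1, 2], 100, 10);
```

Because the result is just a string, the page template can interpolate it like any other markup, and the browser never has to see the underlying data.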

Conclusions

In most cases it's probably better to grab something off-the-shelf, or even just write HTML manually, and move on with your life. I'm not sure if this was the uncommon case, but at least I can check another one of the nerdy "achievements" off my list.

The project took more time than I would have liked, mainly because I had to deal with a lot of idiosyncrasies around migrating from the previous system, and at the same time, I was fixing various styling issues that I accumulated over the years, and adding a couple of features. Did I have to do it all at once? Probably not, but here we are.

Overall, I'm quite happy with this setup, and I also hope I won't have to do this again.

Backlinks

  1. 2024-07-01: Generating this Website Statically, the Hard Way

1116 words, published on 2024-07-01.