Schemescape

Development log of a life-long coder

A Common Lisp static site generator, because why not?

In a recent post about speeding up md2blog (the Deno/TypeScript-based static site generator I created for this site), I gloated about suppressing the urge to build yet another static site generator:

I have successfully staved off the urge ... to create yet another static site generator by instead making md2blog ... just fast enough ... that it seems pointless to bother improving upon its performance.

This is the inevitable follow-up post where I describe the new static site generator I ended up building, this time using Common Lisp.

Motivation

I'll discuss differentiators next, but my personal motivation for creating this static site generator is some combination of the following:

In other words: for fun.

Differentiators

Fun is fun, but am I just reinventing the wheel? Hopefully not. I fully expect no one else will use this static site generator (assuming I even complete it), but I think it does have a unique combination of features, including:

Architecture

Pipeline

For maximum flexibility, this static site generator is based on a generic processing pipeline, represented as a directed acyclic graph of processing nodes.

Here's a description of a blog pipeline, with one bullet per node:

In image form (note: the pipeline definition in code is shown later):

Blog pipeline diagram

Node types

So far, this sounds like Metalsmith's declarative JSON-based "plugin chain". Here's the twist:

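There are two main node types: aggregate nodes, which combine multiple items, and transform nodes, which process each item in isolation.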

For example, here are two hypothetical nodes: an aggregate node that combines metadata from two Markdown posts ("post1.md" and "post2.md") into a single index ("index.json"), and a transform node that converts Markdown to HTML -- note that the transform node processes each item in isolation.

Aggregate node diagram

Transform node diagram
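To make the distinction concrete, here is a minimal sketch of the two hypothetical nodes described above. It is not taken from the actual SSG code: items are assumed to be plain (path . content) conses, and every function name here (run-transform, run-aggregate, markdown-to-html-stub, build-index) is made up for illustration. The Markdown "converter" is only a stand-in that wraps content in a paragraph tag.

```lisp
;;; Hypothetical sketch: items are (path . content) conses, for illustration only.

(defun run-transform (function items)
  "Call FUNCTION on each item in isolation; each call returns a list of outputs (1:N)."
  (loop for item in items
        append (funcall function item)))

(defun run-aggregate (function items)
  "Call FUNCTION once on the whole list of ITEMS, returning a list of outputs."
  (funcall function items))

(defun replace-extension (path extension)
  "Return PATH with everything after its last dot replaced by EXTENSION."
  (concatenate 'string
               (subseq path 0 (position #\. path :from-end t))
               "."
               extension))

(defun markdown-to-html-stub (item)
  "Stand-in for a real Markdown converter: wraps the content in a paragraph tag."
  (list (cons (replace-extension (car item) "html")
              (format nil "<p>~a</p>" (cdr item)))))

(defun build-index (items)
  "Combine all ITEMS into a single index.json item listing the input paths."
  (list (cons "index.json"
              (format nil "[~{\"~a\"~^, ~}]" (mapcar #'car items)))))

(defparameter *example-posts*
  '(("post1.md" . "Hello") ("post2.md" . "World")))

;; Transform: one call per post, so a change to post1.md never touches post2.md
(run-transform #'markdown-to-html-stub *example-posts*)

;; Aggregate: one call that sees every post at once
(run-aggregate #'build-index *example-posts*)
```

The only real difference between the two runners is who owns the loop: a transform node's function never sees more than one item, which is what makes per-item incremental rebuilds possible.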

Explicitly expressing 1:N transform nodes is the primary innovation (although I'm sure--at least I hope--this has been done before, somewhere). Here are the benefits:

I experimented with a similar approach using GNU Make in the past, but besides being slow (because it spun up a new process for each input), it was also cumbersome: it required hand-crafted patterns and kludges to detect zombie files that should be deleted.

Under the hood

Internally, the processing pipeline actually operates over "changes". For example, if a file gets added or modified in the source directory, an :update event is propagated down the pipeline; if a file is deleted, a :delete event is sent. There are two additional node types that operate directly upon changes:

Each node maintains a snapshot of its inputs and outputs and only runs when its input actually changes (item contents and/or metadata). Transform nodes also maintain a map of inputs to outputs (to handle deletions, implicit or explicit).
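Here is a rough sketch of how a transform node could use a snapshot and an input-to-output map to skip unchanged items and to emit deletions. This is an assumption-laden illustration, not the real implementation: the struct, its slots, and handle-change are hypothetical names, changes are modeled as simple lists, and only item contents are compared (metadata is left out).

```lisp
;;; Hypothetical sketch of a transform node that only reprocesses changed items.

(defstruct transform-node
  fn                                          ; item -> list of (path . content)
  (snapshot (make-hash-table :test 'equal))   ; input path -> last seen content
  (outputs (make-hash-table :test 'equal)))   ; input path -> list of output paths

(defun handle-change (node event path &optional content)
  "Process one change and return a list of (event path [content]) for child nodes."
  (let ((snapshot (transform-node-snapshot node))
        (outputs (transform-node-outputs node)))
    (ecase event
      (:update
       (multiple-value-bind (previous foundp) (gethash path snapshot)
         (if (and foundp (equal previous content))
             nil                               ; input unchanged: do not run
             (let* ((results (funcall (transform-node-fn node)
                                      (cons path content)))
                    (new-paths (mapcar #'car results))
                    (stale (set-difference (gethash path outputs) new-paths
                                           :test #'equal)))
               (setf (gethash path snapshot) content
                     (gethash path outputs) new-paths)
               (append (loop for (out-path . out-content) in results
                             collect (list :update out-path out-content))
                       ;; Implicit deletion: outputs this input no longer produces
                       (loop for out-path in stale
                             collect (list :delete out-path)))))))
      (:delete
       ;; Explicit deletion: remove everything this input previously produced
       (let ((old-paths (gethash path outputs)))
         (remhash path snapshot)
         (remhash path outputs)
         (loop for out-path in old-paths
               collect (list :delete out-path)))))))
```

With this shape, sending the same :update twice produces no downstream changes the second time, and a :delete for an input turns into :delete events for exactly the outputs recorded for it.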

Node types example

Here's the blog pipeline from before with each node type in parentheses:

Note that when a single Markdown post is updated, the transform nodes only need to process the updated item(s) (if any).

Item representation

Items are represented by three pieces of information:

Templates/HTML representation

I hate most static site generators because I hate the template languages they use. Especially the one Hugo uses. Sometimes, it's just the verbose syntax for inserting a value that I dislike. Other times, it's the bespoke conditional/loop syntax that I grudgingly have to learn.

A corollary to Greenspun's tenth rule seems appropriate:

Any sufficiently complicated HTML template language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.

The obvious solution is to simply embrace Common Lisp's list processing. Here's an example of the list format I'm using:

(:p "Here is a "
    (:a :href "https://log.schemescape.com/" "link!"))

Rendered to HTML:

<p>Here is a <a href="https://log.schemescape.com/">link!</a></p>

So that's the "list" part.

The "processing" part is literally just Common Lisp code. No weird syntax, just a standardized language that's been kicking around for decades. Although I haven't implemented it yet, this should also make validating relative links at build-time trivial since I only need to walk lists (something Lisp does with ease).
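Since link validation isn't implemented yet, the following is only a guess at what "walking lists" might look like. It assumes the list format shown earlier (a tag keyword, then keyword/value attribute pairs, then children), and the names collect-hrefs and broken-relative-links are hypothetical. Treating any href without "://" as relative is a deliberate simplification.

```lisp
;;; Hypothetical sketch: find links in list-based HTML by walking the lists.

(defun collect-hrefs (node)
  "Return a list of all :href values found anywhere inside NODE."
  (when (consp node)
    (let ((remaining (rest node))
          (hrefs '()))
      ;; Leading keyword/value pairs after the tag name are attributes
      (loop while (keywordp (first remaining))
            do (when (eq (first remaining) :href)
                 (push (second remaining) hrefs))
               (setf remaining (cddr remaining)))
      ;; Everything after the attributes is a child (string or nested element)
      (append (nreverse hrefs)
              (loop for child in remaining
                    append (collect-hrefs child))))))

(defun broken-relative-links (node known-paths)
  "Return hrefs in NODE that look relative but are not in KNOWN-PATHS.
Simplification: any href containing :// is treated as absolute and skipped."
  (remove-if (lambda (href)
               (or (search "://" href)
                   (member href known-paths :test #'string=)))
             (collect-hrefs node)))

(defparameter *example-page*
  '(:div
    (:p "Here is a " (:a :href "https://log.schemescape.com/" "link!"))
    (:a :href "posts/a.html" "A")
    (:a :href "posts/missing.html" "B")))

;; Only posts/a.html exists, so posts/missing.html is reported
(broken-relative-links *example-page* '("posts/a.html"))
```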

For the record, I did not use CL-WHO because it doesn't escape strings by default, and I didn't use Spinneret because it has ~20 dependencies.
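For a sense of how little code the list format needs, here is a small dependency-free sketch of a renderer that escapes every string by default. It is not the renderer from the actual code: escape-html, render-html, and render-to-string are illustrative names, and it assumes the same shape as the example above (tag keyword, keyword/value attributes, then children). Void elements such as br or img are not handled.

```lisp
;;; Hypothetical sketch: render list-based HTML, escaping all strings by default.

(defun escape-html (string)
  "Return STRING with ampersands, angle brackets, and double quotes as entities."
  (with-output-to-string (out)
    (loop for char across string
          do (case char
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (t (write-char char out))))))

(defun render-html (node &optional (out *standard-output*))
  "Write NODE (a string or a list-based element) to OUT as HTML."
  (cond
    ((stringp node)
     (write-string (escape-html node) out))
    ((consp node)
     (let ((tag (string-downcase (symbol-name (first node))))
           (remaining (rest node)))
       (format out "<~a" tag)
       ;; Leading keyword/value pairs are attributes
       (loop while (keywordp (first remaining))
             do (format out " ~a=\"~a\""
                        (string-downcase (symbol-name (first remaining)))
                        (escape-html (second remaining)))
                (setf remaining (cddr remaining)))
       (write-string ">" out)
       ;; The rest are children, rendered recursively
       (dolist (child remaining)
         (render-html child out))
       (format out "</~a>" tag)))))

(defun render-to-string (node)
  "Render NODE and return the resulting HTML as a string."
  (with-output-to-string (out)
    (render-html node out)))

(render-to-string
 '(:p "Here is a "
   (:a :href "https://log.schemescape.com/" "link!")))
```

Because strings only ever reach the output through escape-html, there is no way to forget to escape a value, which is the default CL-WHO lacks.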

Current status

Here's what I've implemented so far:

There's quite a bit of work remaining:

Performance

Given that the implementation is incomplete, I don't want to read too much into its performance. For my site, I expect it to be fast because updating a single blog post will only require rebuilding the edited post, possibly the index/archive/post index pages, and possibly the Atom feed (roughly a 30x reduction in the number of files being written for my smallish site).

On my netbook where I finally got md2blog live rebuilds down to 200ms, the prototype blog could complete an incremental rebuild for a single post update in 80ms--and that's with a slow (and brittle) Markdown processor that needs to be replaced.

Code

Currently, the code is a complete mess. It's all one big file with a million TODOs and at least one gratuitous macro. It's a work-in-progress, and code cleanup isn't even in my top ten concerns right now.

Honestly, I don't even want to share the code because it's so ugly, but since you can easily find it, I'll just save you the trouble:

https://github.com/jaredkrinke/cl-stuff/blob/main/ssg/ssg.lisp

Pipeline example

Here is an example of the previous blog pipeline expressed in code:

(defparameter *pipeline*
  '((source :children (front-matter))
    (front-matter :children (markdown
                             index-posts))
    (markdown :children (template-posts))
    (template-posts :children (lhtml))
    (index-posts :children (template-indexes))
    (template-indexes :children (lhtml))
    (lhtml :children (destination))
    (destination)))

The first symbol in each list is the name of a node class. Arcs/arrows can be added via either :children or :parents. I prefer to use :children because it seems more intuitive to think of the way items flow through the pipeline (source -> front-matter -> markdown -> template-posts -> lhtml -> destination).
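As an illustration of how such a definition could be consumed, the sketch below turns :children and :parents options into a flat edge list and then orders the nodes so each one comes after everything that feeds into it. This is a guess at one workable approach, not the code from ssg.lisp; pipeline-edges and topological-order are hypothetical names, and the ordering loop is a simple quadratic version of a topological sort.

```lisp
;;; Hypothetical sketch: derive edges and a processing order from a pipeline spec.

(defun pipeline-edges (pipeline)
  "Return (parent . child) conses from each node's :children and :parents options."
  (loop for (name . options) in pipeline
        append (loop for child in (getf options :children)
                     collect (cons name child))
        append (loop for parent in (getf options :parents)
                     collect (cons parent name))))

(defun topological-order (pipeline)
  "Return node names ordered so that every node appears after all of its parents."
  (let ((edges (pipeline-edges pipeline))
        (pending (mapcar #'first pipeline))
        (ordered '()))
    (loop while pending
          do (let ((ready (find-if
                           (lambda (name)
                             ;; Ready when no still-pending node feeds into it
                             (notany (lambda (edge)
                                       (and (eq (cdr edge) name)
                                            (member (car edge) pending)))
                                     edges))
                           pending)))
               (unless ready
                 (error "Pipeline contains a cycle"))
               (push ready ordered)
               (setf pending (remove ready pending))))
    (nreverse ordered)))

;; The same kind of graph can be written with :parents, in any order
(topological-order
 '((destination :parents (lhtml))
   (lhtml :parents (markdown))
   (markdown :parents (source))
   (source)))
```

Normalizing both options into one edge list means the rest of the code doesn't need to care which direction the author of the pipeline preferred.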

Name

So what is this new static site generator called? Well, that's also not in my top ten concerns right now. The Common Lisp package is just named SSG as a placeholder. Hopefully I'll think of a catchy name eventually.

The end

And apologies for creating yet another static site generator. At least I didn't create a new front-end framework for JavaScript!