Writing a code viewer in GNU make

I wrote a code viewer in GNU make called CHUM and was so excited about it I popped off a quick post (below). However, I realized I wanted to write about why a static code viewer, and especially why in GNU make of all things ... thus the second heading, below. I sure hope I won't need more than one level after it!

Viewer (original post)

Power is back on after Francine. I have to go to work tomorrow and it's 17 minutes after midnight, oops. However, I do have news! Head over to my code page to see the fruits of my cursèd Makefile code page thing, CHUM. That's what I finished today!

OK, Why in GNU make ???

The question I anticipate most about this software (indeed, the question I have asked myself the most since writing it) is: why make? CHUM isn't just a Makefile that maps inputs to outputs and builds what needs to be built. It's an actual program, with an embedded awk(1) script, HTML templates, and even a configuration file. What gives? Why not write this in sh(1) or idk, literally anything else?

1. SCHWA, or the First Attempt

The core of CHUM is a big embedded awk script that originally lived as SCHWA, or the Source Code Htmlizer Written in Awk. SCHWA works pretty well---I mean I'm still (mostly) using it in CHUM---but awk(1) has really bad argument handling options. I listed the invocation I was using to generate SCHWA's own schwa page in its README, but here it is again so you don't have to follow a link:

    awk -f schwa.awk \
    -vreadmefilter='pandoc -thtml5' \
    -vtemplate=template.html \
    -vstatic=style.css \
    -vclone='<a href="https://git.acdw.net/schwa">https://git.acdw.net/schwa</a>' \
    -vdesc="Source Code Htmlizer" \
    *

Clearly, this is not a good command line to have to remember every time I make a change to some codebase. So I thought to myself, "I should write a wrapper to take command-line arguments or read a config file in the repo or something." I originally was going to use sh(1) but it has its own problems, notably being a pain in the ass to embed awk scripts in.

(Of course the ironic thing is that make(1) isn't really easier to embed things in, but more on that later.)
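For the curious, the wrapper idea could look something like the sketch below. It's only a sketch: the key=value config format and the function names config_args and run_awk are made up for illustration, and none of this is SCHWA's or CHUM's real interface.

```shell
#!/bin/sh
# Sketch only: the "key=value" config format and both function names are
# assumptions for illustration, not CHUM's or SCHWA's real interface.

# Emit one "-vkey=value" argument per line for every key=value line in
# the config file, skipping blank lines and # comments.
config_args() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            ''|'#'*) ;;
            *=*) printf '%s\n' "-v$line" ;;
        esac
    done < "$1"
}

# run_awk CONFIG SCRIPT [FILE...]: run SCRIPT with variables from CONFIG.
run_awk() {
    conf=$1 script=$2
    shift 2
    old_ifs=$IFS
    # Split on newlines only, so values may contain spaces.
    IFS='
'
    set -f    # no globbing while the argument list is rebuilt
    set -- $(config_args "$conf") -f "$script" "$@"
    set +f
    IFS=$old_ifs
    awk "$@"
}
```

With a config file holding lines like desc=Source Code Htmlizer, something like `run_awk schwa.conf schwa.awk *` would stand in for the long invocation above (schwa.conf being a hypothetical name). Note that awk still processes backslash escapes in -v values.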

2. Using provided tools

A short digression is apropos here, I think: Why am I writing this in awk(1) and sh(1) and make(1) at all?

Something I've always admired about the gemini project is its ethos of using provided tooling for as much as possible. No need to reinvent the wheel: TLS exists and is fine, having a line-based protocol makes clients simple to code with off-the-shelf libraries, etc. etc. Ever since first stumbling into this space, that ethos has fascinated me.

One reason I like using sh(1) and awk(1) and make(1) is that these three tools are present in every POSIX operating system. What's more, POSIX sh(1) and awk(1) are really pretty ok, especially for the kind of work in this project: at the bottom of it, it's just manipulating text from one format to another.

POSIX make(1) is another story, sadly. POSIX standardizes make(1) to be barely usable, even in "normal" situations. Luckily, GNU Make is pretty well-supported, and is either installed by default or is the default make(1) implementation on most Linux distributions [citation needed]. BSD is its own kettle of fish, but (A) I don't use BSD (sorry not sorry!) and (B) you can install GNU Make on BSD ... I'm pretty sure?

Anyway, I like using these tools and other POSIX-specified utilities because they're pretty likely to be installed wherever with minimal faff. And I just think they're neat!

3. (GNU) Make is actually ... ok for this

My first thought, to wrap schwa.awk in a shell script, is a good strat! I would've gone with it except for the aforementioned annoyance of embedding awk in shell. Basically the choices were one of these:

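    # 1. One big single-quoted string
    awk 'BEGIN {
        var = val
        print("whatever")
    }
    ...
    '

    # 2. A here-doc captured into a variable
    prog="$(cat <<'EOF'
    BEGIN {
        var = val
        print("whatever")
    }
    ...
    EOF
    )"
    process() { awk "$prog"; }

    # 3. A here-doc written out to a temporary file
    cat >/tmp/prog <<'EOF'
    BEGIN {
        var = val
        print("whatever")
    }
    ...
    EOF
    process() { awk -f /tmp/prog; }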

None of these are really great, though I admit it's really for aesthetic reasons. Still, those are reasons and I'm writing this for fun, so I started thinking about other options.

I remembered this blog post I'd read about making a shell/make polyglot script: How to "make" a shell script. I'm not sure what the author uses this for, but it helped me remember about define variables in GNU Make. These are like here-docs but don't require weird quoting or subshells---so they seemed like a good solution.

Turns out, they're ok! I easily embedded the awk script into the makefile, plus the HTML template and style---negating the need for extra files cluttering up the repo.
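Here's a minimal sketch of the define trick, not lifted from CHUM itself. The names PROG, hello, and embed_demo are made up, and the makefile is fed to GNU make on stdin with -f - just to keep the example in one piece. The awk program sits in a define block, gets exported to the environment, and the recipe hands it to awk from there, so make never has to re-quote it.

```shell
#!/bin/sh
# Sketch: a makefile that carries its own awk program in a define block.
# PROG, hello and embed_demo are hypothetical names; needs GNU make.
embed_demo() {
    make -s -f - hello <<'EOF'
define PROG
BEGIN {
    print "hello from embedded awk"
}
endef
# Export the variable so the recipe's shell can read it as $PROG.
export PROG

# "target: ; recipe" keeps the recipe on one line, so no tab is needed.
hello: ; @awk "$$PROG"
EOF
}
```

Calling embed_demo runs the embedded program. The double quotes inside the awk code survive because the program travels through the environment rather than being pasted into the recipe text.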

On top of that, make(1) allows you to include other makefiles, which means I could set a lot of variables for defaults, then include a config file from the project directory to set them per-project.
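The defaults-then-include pattern can be sketched like this. The variable names TITLE and DESC, the file name config.mk, and the function include_demo are all assumptions for the example; CHUM's real config may look different.

```shell
#!/bin/sh
# Sketch: defaults first, then an optional per-project include.
# TITLE, DESC, config.mk and include_demo are hypothetical names.
# Usage: include_demo PROJECT_DIR
include_demo() {
    make -s -C "$1" -f - show <<'EOF'
# Defaults that apply when the project says nothing.
TITLE = untitled
DESC = no description

# "-include" pulls in the file if it exists and stays quiet if it doesn't.
# Anything it assigns comes later, so it overrides the defaults above.
-include config.mk

show: ; @echo "$(TITLE): $(DESC)"
EOF
}
```

In a directory with no config.mk this prints the defaults; drop in a config.mk containing `TITLE = chum` and only the title changes.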

Finally, thanks to make's -C option I banged out CHUMS, a program that builds an index of a bunch of repos, in maybe 20 minutes. Overall I'm pretty happy with this solution and am planning to use it from here on out for my code repos.
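To give a flavor of what -C buys you, here is a rough sketch that visits every directory under a parent and asks make, run from inside each one, for a title. It is not how CHUMS is actually written; index_demo, TITLE, and config.mk are invented for the example.

```shell
#!/bin/sh
# Sketch: run the same makefile inside each repo with "make -C".
# index_demo, TITLE and config.mk are hypothetical names; needs GNU make.
# Usage: index_demo PARENT_DIR
index_demo() {
    for repo in "$1"/*/; do
        [ -d "$repo" ] || continue    # glob matched nothing
        make -s -C "$repo" -f - title <<'EOF'
TITLE = untitled
-include config.mk
# CURDIR is the directory make switched into via -C.
title: ; @echo "$(notdir $(CURDIR)): $(TITLE)"
EOF
    done
}
```

Each repo contributes one "name: title" line, which is most of what an index needs.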

4. Where to go from here

There are still plenty of places to improve CHUM. For one thing, awk field variables and make variables both use $, meaning I have to escape awk variables with $$ -- and besides, there are plenty of places where I've already made my embedded awk script simpler by just using make variables directly. Going further in this direction will yield riches, I think.
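A small sketch of both halves of that: inside the define block, $(LABEL) is filled in by make while $$2 reaches awk as the field variable $2. LABEL, PROG, demo, and escape_demo are made-up names, not CHUM's.

```shell
#!/bin/sh
# Sketch: make variables and awk field variables in one embedded script.
# LABEL, PROG, demo and escape_demo are hypothetical names; needs GNU make.
escape_demo() {
    make -s -f - demo <<'EOF'
LABEL = second field

# make expands $(LABEL) itself; $$2 becomes a literal $2 for awk.
define PROG
{ print "$(LABEL): " $$2 }
endef
export PROG

demo: ; @echo "alpha beta" | awk "$$PROG"
EOF
}
```

By the time awk sees it, the program reads `{ print "second field: " $2 }`.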

The way CHUM is currently written, I'm also missing out on incremental builds and parallel processing. Porting more of the awk functionality to make will also allow me to take more advantage of make's features -- and since I'm committed to using only GNU make, I can target a richer featureset than I have been used to.
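As a sketch of where that could go (this is not something CHUM does today), a pattern rule gives every source file its own target, so make only rebuilds pages whose source changed. The out/ directory, the .txt extension, the sed one-liner standing in for the real htmlizer, and the name pages_demo are all placeholders.

```shell
#!/bin/sh
# Sketch: one HTML target per source file via a pattern rule.
# pages_demo, out/ and the sed "htmlizer" are placeholders; needs
# GNU make 3.82 or later for .RECIPEPREFIX.
# Usage: pages_demo PROJECT_DIR
pages_demo() {
    make -s -C "$1" -f - <<'EOF'
# Use ">" instead of a tab to introduce recipe lines.
.RECIPEPREFIX = >

SRCS  := $(wildcard *.txt)
PAGES := $(SRCS:%.txt=out/%.html)

all: $(PAGES)

# Rebuilt only when the matching .txt is newer than the page.
out/%.html: %.txt
> @mkdir -p out
> @sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' $< > $@
> @echo "built $@"
EOF
}
```

Run it twice on the same directory and the second run builds nothing. Because each page is an independent target, passing -j to make would also let the pages build in parallel.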

Anyway I don't know how to finish this so uh yay cool neat project! Fun times :)