Stop. Writing. Shell. Scripts. (a rant)

2022-05-17

(warning - this is basically unedited ranting into a text editor. Here there be dragons!)

Shell scripts are the worst kind of problem, where everybody seems to have just accepted that the problem is there and not worth dealing with. Shell scripts have been normalised as an acceptable part of life on a linux or unix system, but they shouldn't be. Shell scripts are an unacceptably awful form of programming. While many people contend that a good programmer can write a good program in any language, and I mostly agree, shell scripts work so hard against you in writing anything approaching a good program.

And I know some people will respond in a few very predictable ways:

To the first, I respond that even if this was never the intention, important things are what I see shell scripts used for more than anything else - things like installers, system administration tools, and so on. It's not uncommon, in my experience, to encounter shell scripts that ask for root access. To the second, I think that the unix philosophy is flawed, and one of the points of sharing this rant is to highlight some of its flaws. I'll get to the third later. The fourth is correct, if a little rude. So with that in mind, on to the complaining!

Everything as strings is a bad idea, actually

Generally, in a shell script, you program by working with and manipulating strings. You open a file as a string, extract information from it with string processing tools, and output a string. In fact, there aren't (really) any other datatypes. Strings represent numbers, strings represent tables of data, strings represent the contents of files, etc. This is often pointed to as an example of the simplicity of the unix environment - instead of bothering with all of these meaningless distinctions and confusions caused by many data types, everything only has to deal with one type of data!

There is a serious problem here though: strings aren't one type of data. This might seem trivial, but putting encoding issues to the side (they are a problem in any language), when you use strings for all kinds of data it can be difficult for a program to determine what data it's working with. Different programs also expect completely different formats, so a lot of shell scripting is about transforming a string from the output format of one program into the input format of another, using string processing methods that don't understand the contents of what they are modifying - just regex or awk field splitting. This makes it very easy to run a program on malformed data. Sometimes programs complain; often they just continue and produce malformed output.

An example from my real life: I wrote a shell script that would rename mp3 files based on the tags they had, in order to make my music library have a consistent filename format. It used the program "id3v2" to get the tags from the file, used sed and awk to find the data it needed, then renamed the file based on that. This script worked great until one day it started deleting some albums I ran it on. Turns out that sometimes, the tag I was searching for wasn't attached to an mp3, so it was failing to convert and rename the file, but then deleted the original file.

In most other languages, the program would have terminated when the tag wasn't found and said there's a missing key in the tags structure. The shell script just powered on through, assuming a song title of an empty string. This problem comes up again and again in shell scripts whenever structured data is being processed - as the shell script does not understand the structure of the data, it resorts to the brute force string manipulation method. This makes dealing with any structured data in a shell script a pain, because you have to either devise a string format for representing the structured data, or "parse" it using shell tools, which is prone to error.
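To make the contrast concrete, here is a minimal Python sketch of the renaming step. It is not the script I actually wrote: the function names and the "artist"/"title" keys are made up for illustration, and it assumes the tags have already been read into a plain dict by something else. The point is only that a missing tag stops the program before any file is touched.

```python
from pathlib import Path


def target_name(tags: dict[str, str]) -> str:
    """Build the new filename from a tags dict (hypothetical keys)."""
    # Indexing with [] raises KeyError when a tag is absent, so a missing
    # title can never silently turn into an empty string.
    artist = tags["artist"]
    title = tags["title"]
    if not artist.strip() or not title.strip():
        raise ValueError("artist and title tags must not be empty")
    return f"{artist} - {title}.mp3"


def rename_by_tags(path: Path, tags: dict[str, str]) -> Path:
    """Rename the file at `path`; the file is untouched if the tags are bad."""
    # target_name() runs first, so any exception happens before the rename.
    new_path = path.with_name(target_name(tags))
    path.rename(new_path)
    return new_path
```

Calling target_name({"artist": "Somebody"}) raises a KeyError for "title" instead of carrying on with a blank title, and the original file is never at risk.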

Another approach would be to use tools like jq, which *do* understand the data they're processing. But at that point you have to learn the jq query language (essentially a programming language in its own right), and every user is required to install additional software to make your script work.

Learning 10 programming languages to use one

The shell does almost nothing on its own, instead relying entirely on external programs. This is often given as an example of unix modularity, but really it just creates an awkward (heh, awk-ward) environment where writing a program is less about knowing a language and how to solve the problem you have, and more about knowing how to use many different tools strung together, each of which ranges from having a set of arcane command-line flags through to an entire programming language of its own. Take, for example, awk. awk (as well as having a name that sounds like a noise that your stomach makes when you're ill) is a very useful tool for string processing, especially for processing data in a tabular form. It's very rare to see a shell script that doesn't invoke awk at some point, and for good reason - awk is super handy. But to be able to make use of awk, you need to learn the syntax and semantics of the awk language, which are entirely different to those of the shell. This is also true for other tasks. Want to process json files? Use jq, which is practically its own programming language too! This extends as far as seeing perl programs embedded inside shell scripts for particularly tricky bits of string processing, which is literally using another general-purpose programming language inside your program to perform some task. It adds to the cognitive load of trying to learn shell scripting or trying to understand any given shell script, as it likely isn't written in shell - it's written in shell, awk, sed, jq, and the arcane flags that make up the syntax of many of the core utilities (which is inconsistent, even between two POSIX-standardised programs).

Error handling? what's that?

Shell scripts don't handle errors very well, preferring to generally just steamroll ahead with whatever task they have to do. A perfect example is trying to run a program that doesn't exist. Most programming languages will stop you by saying that you are trying to call a function that is undeclared - not a shell script! It'll just move right ahead, and only break by the time it's already executed another line that required the data you got out of that program in order to work. This is catchable, but generally only found after the script has been run and caused some kind of problem.

This can be solved by writing a big check at the start of your shell script for every single program that it relies on, and wrapping every call to every program in an if to check the return code. But nobody does that, leaving shell scripts as a flimsy pile of unchecked errors most of the time.
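For comparison, the "check everything" discipline is only a few lines in Python. This is a rough sketch rather than a recommended library: require_programs and run_checked are names I made up, while shutil.which and subprocess.run(check=True) are standard library calls.

```python
import shutil
import subprocess


def require_programs(names):
    """Fail up front if any external program in `names` is not on PATH."""
    missing = [name for name in names if shutil.which(name) is None]
    if missing:
        raise RuntimeError("missing required programs: " + ", ".join(missing))


def run_checked(argv):
    """Run a command and return its stdout, raising on any failure."""
    # check=True raises subprocess.CalledProcessError on a non-zero exit
    # status; a program that does not exist raises FileNotFoundError.
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout
```

Calling require_programs(["id3v2", "ffmpeg"]) once at the top of a program covers the "big check", and every later run_checked call stops the program on failure without an if around it.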

The times when everything isn't a string

I said earlier that everything in shell scripts was a string. This is usually true, except sometimes it's not! For example, when iterating over files in a directory or a list of items, strings are split into individual items implicitly. How useful! Except this implicit item splitting happens in other places as well, such as capturing output from a command with $(), which you often won't want. The way around this is to add "IFS=" on a line on its own at the top of your program, setting the field-splitting separator to nothing. But this also stops loops from working, and if you are processing strings with awk you don't want the shell doing field separation. This would probably all be acceptable if split strings were a data structure you could do useful stuff with, like take an index, but you can't. Great!
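Here is a small Python sketch of the difference between a real list and a string that gets re-split on whitespace. The split() call is only a loose analogy for the shell's field splitting, not a model of it, and the filenames are invented.

```python
def flatten_and_resplit(items):
    """Join items into one string, then split on whitespace again."""
    # Roughly what happens when a list lives inside a single string:
    # the original item boundaries are lost for good.
    return " ".join(items).split()


files = ["01 Intro.mp3", "02 Second Song.mp3"]

# A real list keeps its items intact and can be indexed.
second_file = files[1]  # "02 Second Song.mp3"

# The flattened version has turned two filenames into five fragments.
fragments = flatten_and_resplit(files)
```

With a real list the spaces inside filenames are just data, so there is no separator to configure.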

Variants and versions

I'm going to keep this short, but there are lots of different shells, all at various levels of POSIX compatibility. It's easy to accidentally use a shell feature that only exists on your platform and not on others, and only find out after it breaks in somebody else's setup. This can be avoided using shellcheck or sticking only to POSIX shell, but then you miss out on useful features. So you have two shitty options - use the standard dialect, which is missing important tools, or use your system's dialect, which means your script might not run properly in other places. Pick your poison, I guess.

Environment dependence

Regarding things not working properly in other places, shell scripts are so environment dependent it's ridiculous. Short of putting a whole ton of checks at the top of your file for the capabilities of various different programs on each system, you can't tell exactly what a system will have. POSIX is meant to solve this but only covers the core utilities - good luck if somebody else has another version of ffmpeg with a slightly different output format that breaks your script! Additionally, even coreutils can't be relied upon to be POSIX-compatible: GNU stuff requires an environment variable (POSIXLY_CORRECT) to tell it to conform to the POSIX standard.

On environment variables, it's very possible that a user has set an alias for a program, or made some other environmental change for their interactive environment. Unless the user is very careful, those changes will affect scripts too, in a way that is likely to break them. Oops!
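One way to reduce this in a Python program is to hand child processes an explicit environment instead of whatever the user happens to have. This is only a sketch under assumptions: clean_env and run_isolated are hypothetical helpers, and the PATH value is a guess at a typical Linux layout that would need adjusting for other systems.

```python
import os
import subprocess


def clean_env(keep=("HOME",)):
    """Build a small, predictable environment for child processes."""
    # Assumed defaults: a fixed PATH and the C locale, so tool output does
    # not vary with the user's language settings.
    env = {"PATH": "/usr/bin:/bin", "LC_ALL": "C"}
    for name in keep:
        if name in os.environ:
            env[name] = os.environ[name]
    return env


def run_isolated(argv):
    """Run argv without a shell, using only the clean environment."""
    # Passing a list (and no shell=True) means no shell is involved at all,
    # so shell aliases and functions cannot change what gets run.
    result = subprocess.run(
        argv, env=clean_env(), check=True, capture_output=True, text=True
    )
    return result.stdout
```

It doesn't make the external tools themselves any more consistent between systems, but at least the caller's interactive setup stops leaking in.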

Ugly hacks

some other miscellaneous ugly things in shell I want to point out:

On Shellcheck

Shellcheck is a great tool and an absolute must-have if you are somehow, hopefully only ever by force, having to write or maintain shell scripts. But it's a patch at best, and no amount of fancy static analysis could understand every possible runtime bug that the shell script way of working almost necessarily causes.

(in)conclusion

Stop writing shell scripts. The supposed portability is a lie and an illusion. You'd be better off writing your script in a language like ruby or python - they are almost as widespread, much more robust, and will have fewer unexpected behaviors that could lead to bad consequences. The effort required to write shell scripts that work around the fundamental brokenness of the system isn't worth it. Better that users install ruby than have to install specific programs that your shell script depends on. Shell scripts are maybe acceptable for programs that only run on one machine, in one environment. But they can still blow up in your face. Use ANYTHING else.