Batch file renaming


I often need to rename a large group of files, perhaps numbering in the thousands, usually when cleaning up data in preparation for running a machine learning process. Many CLI tools and file browsers have this functionality built in, and Emacs, unsurprisingly, does as well.

Problem: rename 1000 files with a regular expression.

First, I should point out that specialized tools for this exist; check out the GNU Rename Utils package. The technique described here is a more ad hoc method in which each step is performed manually.

I assume we are not changing the directory tree structure, only the file names. The regular expression does not really matter; in fact, you can just edit the file names by hand in vim. But as an example, let's say each file name has a human-readable description, an 8-digit hexadecimal ID, and a date and time. We want to remove the description and put the date first.

mv bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg \
   2022-04-15_19450102_6a828bc0.jpg ;
  

The regular expression to do this will look something like this:

    s/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg$/\2_\1.jpg/
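
Before touching any files, the substitution can be tried on a single name. This is a minimal sketch that assumes GNU sed (for the \+ operator in a basic regular expression); the function name reorder_name is hypothetical and exists only for this illustration.

```shell
# reorder_name is a hypothetical helper, not part of any tool.
# It reads file names on stdin and prints the rewritten names.
# Assumes GNU sed, which accepts \+ in basic regular expressions.
reorder_name() {
  sed -e 's/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg$/\2_\1.jpg/'
}

printf '%s\n' 'bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg' | reorder_name
# prints: 2022-04-15_19450102_6a828bc0.jpg
```

Names that do not match the pattern pass through unchanged, which makes a mistake in the expression easy to spot.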
  

In this article, I will demonstrate:

  1. a CLI shell technique I use to rename files without specialized tools, using only a text editor such as vim, and ordinary shell tools like sed.

  2. how to do this with Emacs, specifically dired.

Renaming many files with ordinary shell tools

This technique involves generating a shell script with thousands of lines of code, where each line contains a single call to the mv command. We visually inspect the script to ensure there are no errors, then execute it.

  1. Perform a preliminary check to make sure there are no special shell characters, especially the single quote (apostrophe), in any of the file names. If only a few files are affected, we can rename them by hand:

    ls images-dir/ | grep -E "[']";
          

    If there is no output, we can continue.

  2. List the directory contents, pipe them through the sed command, and redirect the output to a file called rename-images.sh. The regular expression is written so that the replacement constructs the mv command. In sed, the replacement expression can use the & symbol to insert the entire matched portion of the pattern:

    ls images-dir/ | \
      sed -e 's/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg$/mv -nv '\'\&\'\ \''\2_\1.jpg'\'\;/ \
          >./rename-images.sh ;
          

    Each line of code is a call to mv -nv:

    • the -v switch enables verbose mode, which reports what file was renamed to what.

    • the -n switch prevents existing files from being overwritten. This is important because our rename action may accidentally rename 2 different files to the same name.

  3. Inspect the content of the file rename-images.sh with less. If there appear to be errors, correct the regular expression and regenerate the file:

    mv -nv 'bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg' '2022-04-15_19450102_6a828bc0.jpg';
    mv -nv 'people-eating-dinner_e8e910f1_2022-04-15_19450104.jpg' '2022-04-15_19450104_e8e910f1.jpg';
    mv -nv 'two-owls-in-a-tree_10023aa0_2022-04-15_19450107.jpg' '2022-04-15_19450107_10023aa0.jpg';
    mv -nv 'apple-orange-banana_42aa0af3_2022-04-15_19450108.jpg' '2022-04-15_19450108_42aa0af3.jpg';
    ....
          

    Scroll down to the bottom and make sure there is nothing obviously wrong. If the file appears to be correct, we are ready to execute.

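  4. As an additional check, we can select the 3rd and 4th fields of each line of code, which are the source and target name of each file, and pipe them to the sort | uniq -d command, which will report any duplicate file names or rename cycles. The trailing semicolon is stripped from the target name so that a target identical to some other file's source name is also reported. This check assumes the file names contain no whitespace:

    while read -r first second third fourth; do \
        echo "${third}"; \
        echo "${fourth%;}"; \
      done <./rename-images.sh | sort | uniq -d;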
        

    If there is no output, this script is OK to run.

  5. Execute the script in the directory containing the files:

    (cd images-dir/ && sh ../rename-images.sh; )
        

    Since we used the -v option to mv, this script will report every single rename action; if we want, we can redirect this output to a log.

    Be careful to run the script only once. After it executes, every subsequent run will fail, because the files it refers to no longer exist under their old names (they were all renamed).
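
The five steps above can be rehearsed end to end in a scratch directory before touching real data. The following is a minimal sketch, assuming GNU sed and an mv that supports -n and -v. The two sample names are taken from the listing in step 3, and the log file name rename-images.log is an assumption introduced here.

```shell
#!/bin/sh
# Rehearsal of the five steps in a scratch directory.
# Assumes GNU sed and an mv that supports -n and -v.
set -eu

work=$(mktemp -d)
mkdir "$work/images-dir"
cd "$work"
touch images-dir/bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg \
      images-dir/two-owls-in-a-tree_10023aa0_2022-04-15_19450107.jpg

# Step 1: stop if any name contains an apostrophe.
if ls images-dir/ | grep -q "[']"; then
  echo "apostrophe found, fix these names by hand" >&2
  exit 1
fi

# Step 2: generate one mv command per file.
ls images-dir/ |
  sed -e 's/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg$/mv -nv '\'\&\'\ \''\2_\1.jpg'\'\;/ \
  >./rename-images.sh

# Step 3 (visual inspection with less) is skipped in this rehearsal.

# Step 4: collect every name that appears more than once.
# The trailing ";" is stripped so sources and targets compare equal.
dups=$(while read -r _mv _flags src dst; do
         echo "$src"
         echo "${dst%;}"
       done <./rename-images.sh | sort | uniq -d)
[ -z "$dups" ] || { echo "duplicates: $dups" >&2; exit 1; }

# Step 5: run the script once, keeping the verbose output as a log.
(cd images-dir/ && sh ../rename-images.sh) >./rename-images.log
ls images-dir/
```

If the final listing shows only names in the new date-first form, the same commands can be pointed at the real directory.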

Regular expressions are optional

Of course, you need not use a regular expression to actually change the file names. You can simply insert the mv -nv commands some other way, for example:

ls images-dir/ | \
  while IFS= read -r filename; \
    do echo "mv -nv '${filename}' '${filename}';"; \
  done >./rename-images.sh

Then use your ordinary editor (e.g. vim) to make incremental updates to the rename-images.sh script; you will have undo available if you make mistakes. You can still use the duplicate name check in step 4. On each line, leave the mv -nv command and the original file name untouched and edit only the second name.
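
After editing by hand, it may be worth confirming that the original names in the script were not changed by accident. The following is a minimal sketch of such a check; the helper name check_sources is hypothetical, and it assumes one mv -nv 'old' 'new'; command per line and no whitespace in the file names.

```shell
# check_sources is a hypothetical helper: it prints every source name in
# SCRIPT (the third field of each line) that does not exist in DIR.
# Assumes one "mv -nv 'old' 'new';" command per line and no whitespace
# in the file names.
check_sources() {
  script=$1
  dir=$2
  while read -r _mv _flags src _dst; do
    name=${src#\'}      # strip the leading quote
    name=${name%\'}     # strip the trailing quote
    [ -e "$dir/$name" ] || printf 'missing: %s\n' "$name"
  done <"$script"
}

# Usage: no output means every original name is still intact.
# check_sources ./rename-images.sh images-dir
```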

Using dired in Emacs

  1. C-x d runs dired, which prompts you for a directory in which to work and then opens a dired buffer by calling the ls command on the path you selected.

  2. C-x C-q — in Emacs, any buffer can be toggled between an editable mode and a view (read-only) mode with this key binding, and dired directory buffers are no exception. Dired buffers are read-only by default (allowing changes only when using i to insert subdirectory listings), so in a dired buffer C-x C-q switches from read-only to writable dired (wdired) mode.

  3. C-M-% — runs the query-replace-regexp command. This will prompt us to enter an expression twice, first for the query regular expression, then for the replacement pattern:

    ^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]+\)[.]jpg$
    \2_\1.jpg
          
  4. y — At first, Emacs will prompt you before replacing each string. If you entered the regular expression correctly, the whole file name will be highlighted, and you can press y to replace it with the replacement pattern. Check that the new file name looks correct.

  5. ! — After pressing y a few times to ensure your regular expression is working as you expected, you can press ! to tell Emacs to replace all of the rest of the filenames by this regular expression pattern without prompting you for each item.

  6. M-v — scroll back up through the dired buffer, visually inspect all of the file names to be sure there is nothing obviously wrong.

  7. C-c C-c — this tells Emacs to execute all of the file renaming actions to make the directory on the file system look like the dired buffer. Emacs automatically checks that you have not renamed 2 different files to the same name, or that you are not overwriting an existing file by accident. After renaming in the file system is complete, the wdired-mode is canceled and you are returned to read-only mode.

Regular expressions are optional

As with the CLI technique and vim, Emacs provides you with the full range of editing tools to allow you to rename your files.

Conclusion

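As you can see, the technique to rename files using the sed command in the CLI is almost identical to what Emacs does, except that Emacs needs no intermediate script and checks for duplicate names and accidental overwrites automatically.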

It may not be quite as nice as a graphical batch file renamer, but it works, and it is built in to Emacs dired, ready to be used at any time.


Emacs for Professionals

This article is part of my Emacs for Professionals series, in which I explain in a few paragraphs how I perform a specific common task using Emacs in ways that people already familiar with command line tools and Linux shell scripting can quickly understand.