Batch file renaming
I often need to rename a large group of files, sometimes numbering in the thousands, typically when cleaning up data in preparation for running a machine learning process. Many CLI tools and file browsers have this functionality built in, and unsurprisingly so does Emacs.
Problem: rename 1000 files with a regular expression.
First I should point out that specialized tools to do this exist; check out the GNU Rename Utils package. The technique described here is a more ad-hoc method in which each step is performed manually.
I assume we are not changing the directory tree structure, only the file names. The regular expression itself does not really matter; in fact, you can just edit the file names by hand in vim.
But just as an example, let's say each filename has a date and time,
an 8-digit hexadecimal ID, and a human readable description. We want
to remove the description, and put the date first.
mv bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg \
   2022-04-15_19450102_6a828bc0.jpg ;
The sed substitution to do this will look something like this:
s/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg$/\2_\1.jpg/
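Before pointing an expression like this at a thousand files, it can be tried on a single name. The following is a minimal sketch, assuming GNU sed (the `\+` operator is a GNU extension to basic regular expressions). The variable names `sample` and `renamed` are only illustrative, and the bracket expression is written with the hyphen first so that it is taken literally.

```shell
# Sample name taken from the example above.
sample='bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg'

# Group 1 captures the 8-digit hex ID, group 2 the date and time;
# the replacement swaps them and drops the description.
renamed=$(printf '%s\n' "$sample" |
  sed -e 's/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg$/\2_\1.jpg/')

printf '%s\n' "$renamed"
```

If the printed name is unchanged, the pattern did not match and needs adjusting before going any further.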
In this article, I will demonstrate:
- a CLI shell technique I use to rename files without specialized tools, using only a text editor such as vim and ordinary shell tools like sed.
- how to do this with Emacs, specifically dired.
Renaming many files with ordinary shell tools
This technique involves generating a shell script with thousands of lines of code, where each line contains a single call to the mv command. We visually inspect the script to ensure there are no errors, then execute the script.
Perform a preliminary check to make sure there are no special shell characters, especially the single quote (apostrophe), in any of the file names. If there aren't too many such files, we can rename them by hand:
ls images-dir/ | grep -E "[']";
If there is no output, we can continue.
List the directory contents, pipe them through the sed command, and redirect the output to a file called rename-images.sh. The regular expression is defined so that the replacement constructs the mv command. When using sed, the replacement expression can use the & symbol to insert the entire matched portion of the pattern:
ls images-dir/ | \
  sed -e 's/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg$/mv -nv '\'\&\'\ \''\2_\1.jpg'\'\;/ \
  >./rename-images.sh ;
Each line of code is a call to mv -nv:
- the -v switch enables verbose mode, which reports which file was renamed to what.
- the -n switch prevents existing files from being overwritten. This is important because our rename action may accidentally rename two different files to the same name.
Inspect the content of the file rename-images.sh with less. If there appear to be errors, make corrections to the regular expression and regenerate the file:
mv -nv 'bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg' '2022-04-15_19450102_6a828bc0.jpg';
mv -nv 'people-eating-dinner_e8e910f1_2022-04-15_19450104.jpg' '2022-04-15_19450104_e8e910f1.jpg';
mv -nv 'two-owls-in-a-tree_10023aa0_2022-04-15_19450107.jpg' '2022-04-15_19450107_10023aa0.jpg';
mv -nv 'apple-orange-banana_42aa0af3_2022-04-15_19450108.jpg' '2022-04-15_19450108_42aa0af3.jpg';
....
Scroll down to the bottom, make sure there is nothing obviously wrong. If this file appears to be correct, we are ready to execute.
As an additional check, we can extract the 3rd and 4th words of each line of code, which are the source and target name of each file, and pipe them to the sort | uniq -d command, which will report any duplicate file names or rename cycles. This assumes the file names contain no spaces:
while read -r first second third fourth; do
  echo "${third}"
  echo "${fourth%;}"   # drop the trailing ";" so targets compare equal to sources
done <./rename-images.sh | sort | uniq -d;
If there is no output, this script is OK to run.
Execute the script in the directory containing the files:
(cd images-dir/ && sh ../rename-images.sh; )
Since we used the -v option to mv, this script will report every single file rename action; if we want to, we can redirect this output to a log.
Be careful that the script is only run once. Any later run will fail, because all the files that used to exist no longer exist (they were all renamed).
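The five steps above can be rehearsed in a throwaway directory before touching real data. The sketch below assumes GNU sed and an mv that accepts -n and -v. The scratch directory, the three sample file names, and rename.log are made up for illustration. It writes the sed script in double quotes, which is equivalent to the quoting used earlier, and it adds sed's -n option and p flag so that names that do not match the pattern are left out of the script instead of being copied into it unchanged.

```shell
set -e

workdir=$(mktemp -d)     # scratch area (hypothetical; remove it by hand afterwards)
mkdir "$workdir/images-dir"
cd "$workdir"

# Three empty stand-in files that follow the naming scheme from the article.
touch images-dir/bunch-of-bananas_6a828bc0_2022-04-15_19450102.jpg \
      images-dir/people-eating-dinner_e8e910f1_2022-04-15_19450104.jpg \
      images-dir/two-owls-in-a-tree_10023aa0_2022-04-15_19450107.jpg

# Step 1: count the names that contain a single quote.
quoted=$(ls images-dir/ | grep -c "[']" || true)

# Step 2: generate the script; only lines that matched are printed.
ls images-dir/ |
  sed -n -e "s/^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]\+\)[.]jpg\$/mv -nv '&' '\2_\1.jpg';/p" \
  >./rename-images.sh

# Step 4: list every source and target name; uniq -d prints names seen twice.
dups=$(while read -r _mv _opts src dst; do
         printf '%s\n' "$src" "${dst%;}"
       done <./rename-images.sh | sort | uniq -d)

# Step 5: execute only if both checks came back clean, and keep a log.
if [ "$quoted" -eq 0 ] && [ -z "$dups" ]; then
  (cd images-dir/ && sh ../rename-images.sh) >./rename.log 2>&1
fi

ls images-dir/
```

Step 3, the visual inspection with less, is left out because it is interactive. The file rename.log holds whatever mv -v reported during the run.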
Regular expressions are optional
Of course, you need not use a regular expression to actually change the file names. You can simply generate the mv -nv commands some other way, for example:
ls images-dir/ | \
while read -r filename; do
  echo "mv -nv '${filename}' '${filename}';"
done >./rename-images.sh
and then use your ordinary editor (e.g. vim) to make incremental updates to your rename-images.sh script, with access to undo functionality if you make mistakes. You can still use the duplicate name check in step 4. Just keep the mv -nv command and the original file name intact on each line, and edit only the new name.
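To illustrate why the duplicate name check is still worth running after hand edits, here is a small sketch in which two files were accidentally given the same new name. The three script lines and the variable names are hypothetical. The loop strips the trailing semicolon from each target, so that a target can also be compared with a source name.

```shell
# A hand-edited script in which the first two targets collide.
script=$(mktemp)
cat >"$script" <<'EOF'
mv -nv 'a_00000001_2022-04-15_19450102.jpg' 'first.jpg';
mv -nv 'b_00000002_2022-04-15_19450104.jpg' 'first.jpg';
mv -nv 'c_00000003_2022-04-15_19450107.jpg' 'third.jpg';
EOF

# Print every source and target name, one per line, and let
# uniq -d report any name that occurs more than once.
collisions=$(while read -r _mv _opts src dst; do
               printf '%s\n' "$src" "${dst%;}"
             done <"$script" | sort | uniq -d)

printf 'colliding names: %s\n' "$collisions"
rm -f "$script"
```

Any name reported here should be fixed in the editor before the script is executed, since mv -n would otherwise silently skip the second rename.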
Using dired in Emacs
C-x d: dired prompts you for a directory in which to work, and then opens a dired buffer by calling the ls command on the path you selected.
C-x C-q: in Emacs, any buffer can be toggled between an editable mode and a view (read-only) mode with this key binding, and dired directory buffers are no exception. dired buffers are read-only by default (allowing changes only when using i to insert subdirectory listings), so in a dired buffer C-x C-q switches from read-only to "writable dired" (wdired) mode.
C-M-%: runs the query-replace-regexp command. This prompts us to enter an expression twice, first the regular expression to search for, then the replacement pattern:
^.*_\([[:xdigit:]]\{8\}\)_\([-_[:digit:]]+\)[.]jpg$
\2_\1.jpg
y: at first, Emacs will prompt you before replacing each string. If you entered the regular expression correctly, the whole file name will be highlighted, and you can press y to replace the file name with the replacement pattern. Check that the new file name looks correct.
!: after pressing y a few times to make sure your regular expression is working as you expected, you can press ! to tell Emacs to replace all of the remaining file names without prompting you for each item.
M-v: scroll back up through the dired buffer and visually inspect all of the file names to be sure there is nothing obviously wrong.
C-c C-c: this tells Emacs to execute all of the file renaming actions, making the directory on the file system look like the dired buffer. Emacs automatically checks that you have not renamed two different files to the same name, and that you are not overwriting an existing file by accident. After renaming in the file system is complete, wdired-mode is canceled and you are returned to read-only mode.
Regular expressions are optional
As with the CLI technique and vim, Emacs provides you with the full range of editing tools to allow you to rename your files, for example keyboard macros to do repetitive edits. You can also undo any mistakes you make with C-x u or C-/.
Conclusion
As you can see, the technique to rename files using the sed command in the CLI is almost identical to what Emacs does, except:
- No temporary files such as rename-images.sh are created; rather, Emacs buffers the renaming script in memory.
- A single key binding for each action is provided, which is faster than entering whole commands in the CLI shell.
- Duplicate filename checks are run automatically before committing changes to the filesystem.
- The actual mv -nv commands and original file names are hidden, allowing you to perform edits more naturally on only the new file names. This makes file names line up more nicely and is conducive to rectangular/columnar editing.
As a reminder, in complete fairness, the aforementioned GNU Rename Utils software does the exact same thing as wdired, except it runs an editor in an isolated process, rather than as a Lisp program in the Emacs runtime.
It may not be quite as nice as a graphical batch file renamer, but it works, and it is built in to Emacs dired, ready to be used at any time.
Emacs for Professionals
This article is part of my Emacs for Professionals series, in which I explain in a few paragraphs how I perform a specific common task using Emacs in ways that people already familiar with command line tools and Linux shell scripting can quickly understand.