Klaus on Tilde Town

Project 128: how much space do you really need?

I've started a personal project that will revolutionize the way I treat my files and get me to rethink the whole backup and storage strategy of my digital life. I'm calling it Project 128, and the goal is to fit every digital file that I own in 128 GB of space (not including the backups).

The 128 GB limit is somewhat arbitrary, but I deliberately chose a limit lower than the capacity of most commercial HDDs nowadays because I wanted to force myself to think more about which files I choose to keep and how important they are to me. Besides, buying 128 GB of storage as a USB drive or MicroSD card is cheap enough nowadays, so that helps.

So why, in a world where new data is constantly being produced faster than anyone (and anything) can process, and storage tends to become progressively cheaper, am I artificially lowering my storage limits to such a ludicrously small number?

My answer is that most of the data we own today is not important, just nice to have, and that our really crucial data can be made much smaller than we think is possible.

I have already written about my bone to pick with file bloat: just like software, user files have been growing larger without adding value. High-definition pictures are 5MB each - does that much definition really make a difference at all? A full-length movie is now 2 to 3GB apiece - does "quality" really justify such a humongous size? The examples go on, but that's not the point of this article.

Besides, as a positive side effect, having everything backed up neatly in a small space makes trying out and switching between Linux distributions (distrohopping) much easier. We can think of it as the digital equivalent of travelling light between several hotels on a road trip.

My approach to fitting everything under 128GB (or less, if possible!) will follow three major steps:

  1. Defining what's mandatory and what's optional
  2. Reducing the size of everything as far as possible
  3. Storing and backing up

These are better explained below:

Define what's important

This is perhaps the most important step in the project: separating what absolutely must be kept and stored from what's optional or nice to have is the basis of a sane storage and backup policy. Of course, given the capacity, we'd rather store everything we own, but the amount of stuff that really matters in the end might actually be significantly smaller.

Stop and think for a moment: if a disaster were to happen and you could only save a small number of files from your hard drive, which ones would you save? Chances are it's much, much less than the total stuff you own. Chances also are that it's not the stuff you interact with every day either, but rather files that are pretty much immutable and that have a value of specific importance to you.

Maybe it's your tax declaration statements from five or ten years ago that you must retain for bookkeeping, or the pictures you took on a vacation in a beautiful place last year. Or the pictures of loved ones at a family reunion. Or your PGP and SSH keys. Each one of these has a specific reason to be important to you.

On the other hand, the data you interact with on a daily basis is likely changing all the time, or is likely to have at least a few different sources, like a download from the internet. Code, images, web pages, documents: with so much content being produced and shared around nowadays, you're likely to find the same thing available from many different sources. Which means that if you lose your copy, it wouldn't be hard to get it back again.

In my personal implementation of Project 128, I'll be using the following criteria to decide what's important among my files, evaluated as an "OR" condition:

Although these guidelines are sort of strict, they do make it clear that most of the stuff you deal with daily is not irreplaceable, and if you lose it, the impact is minimal. Not so with the stuff that meets the criteria above, so that's the core of Project 128.

Following these guidelines, I estimate there's between 20 and 30GB of these core files among my stuff, so that still leaves me with plenty of room to store the rest of the "non-essential" stuff. Unfortunately, that non-essential stuff is much, much larger.

Reduce the size when possible

Theoretically, anything that is non-essential can be obtained back from the internet somehow, but in practice it's much better and safer to have it locally. Since we have already set the important stuff aside, it's harder to grade the non-essential stuff in terms of importance, so another strategy is necessary.

Luckily, almost all sorts of media files can be reduced in size without losing value. This is great news, since media is likely the largest space hog on your hard drive.

The pictures you've been taking in bulk since you got your first smartphone are likely to be much larger in resolution than they need to be. Your videos are likely way larger than they need to be (b-but it's 4K!) and a big reduction will not affect the experience if all you do is watch them on your laptop or phone. Music is usually small enough in size, but the Opus format (usually stored as .opus files) compresses somewhat better than the popular MP3.

You can reduce images with the convert command that comes with ImageMagick (on ImageMagick 7 the same command is spelled magick). Reducing an image with it is as simple as running the following command:

convert original_image.jpg -resize 50% small_image.jpg 

Where small_image.jpg now has half of original_image.jpg's width and height - essentially a quarter of the original pixel count.

This resolution trick is pretty useful: every time you shrink the sides of an image by some factor, the number of pixels shrinks by the square of that factor. Halving the sides leaves 1/4 of the original pixels, a 1/3 reduction leaves 1/9th, and a 1/4 reduction leaves 1/16th. The file size follows roughly the same curve. I have reduced pictures from my phone to 1/16th of the size before, and the "loss" in quality is negligible.
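To make the arithmetic concrete, here is a minimal sketch of the square rule. The function name shrink_estimate and the 4000x3000 photo are made up for illustration, and the result is only about pixel counts: the size on disk depends on how well each picture compresses.

```shell
#!/bin/sh
# Hypothetical helper: print the new dimensions and the share of pixels kept
# when both sides of an image are divided by the same whole number.
shrink_estimate() {
    width=$1
    height=$2
    divisor=$3
    echo "$((width / divisor))x$((height / divisor)), 1/$((divisor * divisor)) of the pixels"
}

# A made-up 12-megapixel photo, shrunk to 1/2 and to 1/4 of its sides.
shrink_estimate 4000 3000 2   # prints: 2000x1500, 1/4 of the pixels
shrink_estimate 4000 3000 4   # prints: 1000x750, 1/16 of the pixels
```

A divisor of 4 corresponds to passing -resize 25% to convert.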

Video can also be reduced in a similar way through the ffmpeg program. The general syntax is:

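ffmpeg -i videofile.mp4 -vf scale=<final width>:-2 output.mp4

This way, ffmpeg resizes the video to the desired final width in pixels and automatically calculates the height to maintain the aspect ratio. The -2 asks for a height that is divisible by 2, which encoders such as libx264 require (with -1 the height can come out odd and the encode fails). And just like with images, the pixel count drops with the square of the reduction in width, although the final file size also depends on the encoder settings. Hence you can usually make your videos much smaller without losing quality significantly.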
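For a whole folder of videos, the same command can be wrapped in a loop. This is only a sketch under a few assumptions: the function name shrink_videos, the DRY_RUN switch and the 1280-pixel default width are mine, and the height is given as -2 so that it always comes out even.

```shell
#!/bin/sh
# Hypothetical helper: shrink every .mp4 in a folder to a target width.
# Set DRY_RUN=1 to print the ffmpeg commands instead of running them.
shrink_videos() {
    src_dir=$1
    out_dir=$2
    width=${3:-1280}                 # assumed default width in pixels
    for f in "$src_dir"/*.mp4; do
        [ -e "$f" ] || continue      # no matches: skip the literal pattern
        out="$out_dir/$(basename "$f")"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ffmpeg -i $f -vf scale=$width:-2 $out"
        else
            mkdir -p "$out_dir"
            ffmpeg -i "$f" -vf "scale=$width:-2" "$out"
        fi
    done
}
```

Running it once with DRY_RUN=1 shows what would happen before any file is written. The originals are left untouched, so you can compare the results before deleting anything.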

Unfortunately, I do not know of a way to reduce or compress MP3 audio that is as simple as the ones described above. I'm OK with that, though, since audio is generally pretty small, and my collection of it is not very large. But if you know how to "compress" audio in a similar way, please let me know on Mastodon.

Storing and backing up everything

By following the steps above to trim the unneeded size of your media files, you will have reduced your required storage space significantly. My hope is to be able to fit everything under the 128GB mark after performing these reductions. The next step is to start storing and backing up everything.
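A quick way to see how far a folder is from the limit is to compare the output of du against the budget. The sketch below assumes decimal gigabytes (1 GB = 1,000,000,000 bytes, the way drive vendors count), and the function name fits_budget is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a folder fits a size budget.
# Usage: fits_budget <folder> [limit in GB, default 128]
fits_budget() {
    dir=$1
    limit_gb=${2:-128}
    used_kib=$(du -sk "$dir" | cut -f1)          # disk usage in KiB
    limit_kib=$((limit_gb * 1000000000 / 1024))  # budget converted to KiB
    echo "using $used_kib KiB of $limit_kib KiB"
    [ "$used_kib" -le "$limit_kib" ]             # exit status 0 if it fits
}
```

Because the function returns a normal exit status, it can be used as fits_budget "$HOME" || echo "over budget" in a script that runs before each backup.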

There is a simple and easy to remember backup strategy that goes by the name of 3-2-1 backup. In summary, it states that you should have at least 3 copies of your data on 2 different media, with 1 copy kept off-site (physically away from the place you usually work with the data). It might not be perfect or best suited for enterprise-grade backups, but it's enough for the threat model described here.

Keeping the local backups on different media is an easy and cheap task given the low prices of "small" storage lately. I can buy a 128GB SD card or USB pendrive for quite cheap, with the other medium being a 320GB external HDD or even a smaller 160GB external SSD - both still much cheaper than their TB-large counterparts.

My personal requirement is that the storage medium itself must be encrypted, and thankfully that's easy to do through graphical applications that manage storage volumes. The Gnome Disk Utility, for example, makes the entire process of formatting and encrypting external storage very easy. There's also GParted and other programs that can do that as well.

The encryption requirement not only protects it from unwanted access, but also is the best option in case the drive gets corrupted and I have to discard it safely without the risk of anyone accessing my files. Once all my media are encrypted, I can start backing up everything.
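The local part of the 3-2-1 scheme can be as plain as copying one folder onto each mounted medium. This is a rough sketch, not a full backup tool: the function name backup_to_all, the project128 subfolder and the mount points in the usage note are all made up, and it uses cp -a for simplicity (rsync -a would avoid re-copying unchanged files).

```shell
#!/bin/sh
# Hypothetical helper: copy one source folder onto every destination given.
# Usage: backup_to_all <source folder> <destination> [<destination> ...]
backup_to_all() {
    src=$1
    shift
    for dest in "$@"; do
        if [ ! -d "$dest" ]; then
            echo "skipping $dest: not mounted or not a directory" >&2
            continue
        fi
        # "$src/." copies the folder's contents, hidden files included,
        # and -a preserves timestamps and permissions.
        mkdir -p "$dest/project128" &&
            cp -a "$src/." "$dest/project128/" &&
            echo "copied to $dest/project128"
    done
}
```

With an unlocked USB stick and external disk, a call could look like backup_to_all "$HOME/files" /media/usb /media/hdd. The off-site copy is a separate step.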

For the third, off-site copy, things get a little tricky: I can either get another USB drive, back it up regularly and keep it in another place I have easy access to, like the office, or use the dreaded cloud-based storage. Privacy-conscious users are rightly afraid of storing personal files with third-party providers, but I think there are some mitigation techniques that could still be used.

One option is to encrypt everything before sending it to the dreaded cloud. Encrypt files individually via gpg locally, and only then send these encrypted versions off to untrusted storage. Although this is a good solution to protect the contents of files, it does little to protect the metadata, chiefly the name and ownership of the files. So even if you encrypt your homemade_porn_movie.mp4 file as a homemade_porn_movie.mp4.gpg file, an adversary could still infer interesting information from it even though the content is protected.

You could work around this by packing all the files you want to encrypt into one large archive (maybe versioning it somehow) and encrypting that archive instead, but the granularity and flexibility of accessing that data becomes much lower. This might be a significant trade-off depending on your use case.

Another option is to use a storage service that encrypts the media in a way that only you can decrypt it, usually via a password. This is the method that some privacy-conscious email providers like Protonmail, Confidesk or Tutanota advertise for their services, and it is followed by at least one storage provider: MEGA. If you trust MEGA's promises of keeping your content encrypted, you don't even need to encrypt the content before sending it off - the storage medium is already encrypted. And you get 15GB free.

However, as with all "cloud" services, if you don't own the machines where the data is stored, you don't control it. And there is no technical way to prevent the host from intercepting the password (MitM) or even adding a backdoor (like it was done with Tutanota in 2020) to decrypt your content silently. You can accept this risk, but you cannot ignore it.

Conclusion

Fitting your entire life's worth of information within 128 GB of space is a revolutionary project, in every sense from re-organizing your files and practicing a sort of digital minimalism to re-thinking your backup strategy.

Is reducing the amount of stuff always the right way to go? I'm not sure, but I'm pretty sure it will help me decide better what's important, and derive a better, more conscious way to store and back up my stuff. I should write back in a couple of months reporting my findings, but in the meantime, I'm pretty excited about trying this out again.


What do you think about reducing everything that you own to just 128 GB of space? Could you go lower than that? How would you perform your backups in that case? Let me know on Mastodon!


This post is number 4 of my #100DaysToOffload project. Follow my progress through Mastodon!


Last updated on 02/10/21