My file organization and backup workflow
2024.05.28

I have been using Rclone to back up my files for a few years now, and have developed a workflow of sorts for it. I'll discuss the ways I organize my files and how that allows me to automatically create verifiable and "portable" backups with Rclone.

My setup definitely won't fit most people's use cases, but I hope this will give you some inspiration on how to incorporate Rclone into your own workflow. It's an extremely useful tool that can automate a surprising number of cloud storage and backup-related tasks.

General rules

In order to make the most out of Rclone and its capability for automation, I try to stick to the following general rules when organizing my files:

1.

All files will be categorized under one of six top-level folders: Audio, Documents, Literature, Pictures, Software and Videos. These may vary widely depending on your files, but the important part is that every file is categorized under a small set of generic folders.

The top-level folders can also be verbs if it makes more sense to you, e.g. Listen, Write, Read, Watch, Program and Study.

2.

All files will be placed in subfolders under the top-level folders, e.g. Audio/Podcasts, Audio/Music, Videos/Movies, Videos/TV Shows.

Files should never be placed directly under a top-level folder, as there is always a subcategory they can be placed under. Keeping files at depth 2 or deeper in the directory tree will also help with automation later on.

3.

Files placed under the previously mentioned subfolders need to follow a consistent file naming convention and folder structure, but the convention only needs to hold within each respective subfolder.

For example, files placed in Audio/Music could be categorized further in subfolders "Album Artist/Album Name", whereas files placed in Videos/Movies wouldn't need to be placed in subfolders, instead named as "Movie Title (Year)".

The most important part is that each subfolder's files should be placed at a consistent folder depth. For example, files in Audio/Music will all be placed at depth 4: Audio/Music/Album Artist/Album Name/## Track Title.mp3, whereas files placed in Videos/Movies will all be placed at depth 2: Videos/Movies/Movie Title (Year).mkv.

I wrote a simple Bash script to help keep track of this. The script prints out each depth at which files are found in a given directory.
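The script itself isn't included here, but a minimal sketch of the idea could look something like the following, assuming GNU find and awk. A file's depth is counted as the number of directories between the given directory and the file, so depth 0 means a file sitting directly inside the directory itself:

#!/usr/bin/env bash
# file-depths: print how many files exist at each directory depth
# below the given directory. Depth 0 = directly inside the directory.
set -euo pipefail

dir="${1:?usage: $0 DIRECTORY}"
dir="${dir%/}"
base=$(awk -F/ '{print NF}' <<< "$dir")

find "$dir" -type f -print | awk -F/ -v base="$base" '
    { depths[NF - base - 1]++ }
    END { for (d in depths) printf "%d\t%d files\n", d, depths[d] }
' | sort -n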

Cataloging file depths

When files are organized following the aforementioned rules, some interesting automation becomes possible. Let's say that each of the subfolders and their common file depths are cataloged in a TSV file, like so:

2	Audio/Music
0	Literature/Books
0	Videos/Movies
2	Videos/TV Shows

The first field specifies a consistent depth for files relative to the given subfolder. This means that Audio/Music's common subfolders Album Artist/Album Name make its files reside at depth 2, whereas files placed directly under Literature/Books are at depth 0.

With this TSV file, quite a lot of automation becomes possible. The following sections contain some examples of this.
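As a trivial illustration, a helper script can read each subfolder and its depth from the catalog with a plain while-read loop. This is just a sketch, and depths.tsv is a placeholder name for wherever the catalog is stored:

#!/usr/bin/env bash
# Read the depth catalog: each line is "<depth><TAB><subfolder>".
while IFS=$'\t' read -r depth subfolder; do
    printf '%s: files at depth %d\n' "$subfolder" "$depth"
done < depths.tsv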

Creating checksums

In order to create backups, more specifically verifiable backups, we need checksums for everything. This is something backup tools like Borg and Restic will always create for you. However, something I dislike about these tools is that they impose their own ways of browsing and managing the resulting backups: they each have their own commands for listing, verifying, mounting and restoring files in a backup.

Having become quite familiar with Rclone over the past few years, I wanted to achieve something similar to these backup tools entirely within Rclone. As it turns out, this is entirely possible when using the --backup-dir option.

To create backups with Rclone, checksum files need to be created first. Unlike traditional backup tools, Rclone won't create these for you automatically. This is where the previously mentioned TSV file comes in handy: since the depth recorded for each subfolder guarantees that no files sit above that depth, we can create checksum files at that depth covering everything beneath them.

In order to automate this, I wrote a Bash script called new-md5, which can recursively create MD5 files at specified depths. Instead of creating one large MD5 file directly under Audio/Music, new-md5 can create a separate MD5 file for each album at depth 2. The benefit of this over one large MD5 file is that renaming and moving directories around remains easy, since each checksum file travels with its directory.

Given that every subfolder is specified in the TSV file with its correct file depth, automating checksum creation becomes a trivial task with new-md5 or a similar helper script. See new-md5's documentation for some use cases.
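new-md5's actual interface isn't shown here, but as a rough stand-in, here is what an equivalent plain find + md5sum loop over the catalog could look like. It reads the same depths.tsv placeholder as above, and the .verify.md5 file name matches the verification command used later in this post; everything else is an assumption:

#!/usr/bin/env bash
# For every subfolder in the catalog, create one checksum file per
# directory at the cataloged depth, covering all files beneath it.
# This is a stand-in sketch, not new-md5 itself.
set -euo pipefail

while IFS=$'\t' read -r depth subfolder; do
    find "$subfolder" -mindepth "$depth" -maxdepth "$depth" -type d |
    while IFS= read -r dir; do
        (
            cd "$dir"
            # Hash everything below this directory; strip the leading ./
            # so the paths match what rclone md5sum prints.
            find . -type f ! -name '.verify.md5' -exec md5sum {} + |
                sed 's|\./||' > .verify.md5
        )
    done
done < depths.tsv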

Backups

Since every directory now has MD5 files, backing up can be done in a more "portable" fashion than some traditional backup tools allow for. By this I mean that the files can simply be copied or synced as-is to new drives, without needing to generate new checksums for the resulting backup. And since new-md5 doesn't update hashes in existing checksum files by default, there is no danger of files changing unnoticed between backups.

Rclone ties this all together. All files can be placed in a default remote, whether they are in the cloud or on a disk, encrypted or not. Creating backups can then be automated by defining remotes named backup_*, having a helper script cycle through each backup remote with rclone listremotes | grep "^backup_", and creating the backups with Rclone's --backup-dir, optionally along with --suffix and --suffix-keep-extension.

Since --backup-dir shouldn't be in the same path as the top-level folders, it's a good idea to make every remote's root include a couple of meta-folders: one for the files the given remote contains, and one for deleted files, which --backup-dir manages for you. I tend to name these simply 0 and 1 to keep paths short.

With this setup, a backup can be created with the following command:
rclone sync default:/0 backup_remote:/0 --backup-dir backup_remote:/1
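Cycling through every backup remote can then be wrapped in a short helper script, roughly like the sketch below. The date-based --suffix is just an example of how the moved files could be tagged:

#!/usr/bin/env bash
# Sync the default remote to every remote named backup_*, moving
# deleted and overwritten files into each remote's /1 meta-folder.
set -euo pipefail

for remote in $(rclone listremotes | grep '^backup_'); do
    rclone sync "default:/0" "${remote}/0" \
        --backup-dir "${remote}/1" \
        --suffix "-$(date +%Y%m%d)" --suffix-keep-extension
done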

Verifying the backups and the default remote is also trivial with Rclone. All MD5 files in a remote can be found with
rclone lsf remote_name:/0 -R --files-only --include '*.md5'
and subsequently checked in a loop with
rclone md5sum remote_name:/0/path/to/md5 -C remote_name:/0/path/to/md5/.verify.md5
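Put together, a verification helper could look roughly like this. This is a sketch, assuming every checksum file is named .verify.md5 and sits in the directory whose files it covers:

#!/usr/bin/env bash
# Verify every checksum file found in a remote, e.g. "default" or
# "backup_remote", given as the first argument.
set -euo pipefail

remote="${1:?usage: $0 REMOTE_NAME}"

rclone lsf "${remote}:/0" -R --files-only --include '*.md5' |
while IFS= read -r md5file; do
    dir=$(dirname "$md5file")
    # Report failures but keep checking the remaining directories.
    rclone md5sum "${remote}:/0/${dir}" -C "${remote}:/0/${md5file}" \
        || echo "FAILED: ${dir}" >&2
done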

Although this might not be the most space-efficient method for creating backups, I've personally been very happy with it. Since everything revolves around Rclone's remotes, this method is extremely malleable. Everything can be done locally on hard drives, but it can easily be scaled up to include any cloud storage that Rclone supports, not to mention encryption, compression or combining remotes with the union remote.

Following the 3-2-1 rule when creating backups this way is also easy. The default remote can be a local drive, with one of the backup remotes pointing to another local drive and another to cloud storage.