I have been using Rclone to back up my files for a few years now, and have developed a workflow of sorts around it. I'll discuss the ways in which I organize my files and how that allows me to automatically create verifiable and "portable" backups with Rclone.
My setup definitely won't fit most people's use cases, but I hope this will give you some inspiration on how to incorporate Rclone into your own workflow. It's an extremely useful tool that can automate a surprising number of cloud storage and backup-related tasks.
In order to make the most out of Rclone and its capability for
automation, I try to stick to the following general rules when
organizing my files:
1. All files will be categorized under one of six top-level folders: Audio, Documents, Literature, Pictures, Software and Videos. These may vary widely depending on your files, but in general it's important to categorize every file under a small set of generic folders. The top-level folders can also be verbs if that makes more sense to you, e.g. Listen, Write, Read, Watch, Program and Study.
2. All files will be placed in subfolders under the top-level folders, e.g. Audio/Podcasts, Audio/Music, Videos/Movies, Videos/TV Shows. Files should never be placed directly under a top-level folder, as there is always a subcategory they can be placed under. Keeping files at depth 2 or deeper in the directory tree will also help with automation later on.
3. Files placed under the previously mentioned subfolders need to follow a consistent file naming convention and folder structure, but only within their respective subfolders. For example, files placed in Audio/Music could be categorized further in "Album Artist/Album Name" subfolders, whereas files placed in Videos/Movies wouldn't need to be placed in subfolders and would instead be named "Movie Title (Year)". The most important part is that each subfolder's files are placed at a consistent folder depth. For example, files in Audio/Music will all be placed at depth 4: Audio/Music/Album Artist/Album Name/## Track Title.mp3, whereas files placed in Videos/Movies will all be placed at depth 2: Videos/Movies/Movie Title (Year).mkv.
I wrote a simple Bash script to help keep track of this. The script prints out each depth at which files are found for a given directory.
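A simplified sketch of the same idea (not the script itself) could look like this: print every depth, relative to the given directory, at which regular files are found, along with a count of files per depth.
#!/bin/bash
# Simplified sketch: print each depth, relative to the given directory,
# at which regular files are found, with a count of files per depth.
dir="${1:-.}"
cd "$dir" || exit 1
find . -type f | awk -F/ '{ print NF - 2 }' | sort -n | uniq -c |
    awk '{ printf "depth %d: %d file(s)\n", $2, $1 }'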
When files are organized following the aforementioned rules, some interesting automation will become possible. Let's say that each of the subfolders and their common file depths are cataloged in a TSV-file, like so:
2	Audio/Music
0	Literature/Books
0	Videos/Movies
2	Videos/TV Shows
The first field specifies a consistent depth for files relative to the given subfolder. This means that Audio/Music's common subfolders, Album Artist/Album Name, make files reside at depth 2, whereas files placed directly under Literature/Books are at depth 0.
With this TSV-file quite a lot of automation has now become possible. The
following sections will contain some examples of this.
In order to create backups, more specifically verifiable backups, we
need to have checksums for everything. This is something backup tools like
Borg and
Restic will always create for you.
However, something I dislike about these tools is how the resulting backups are then browsed and managed: each has its own commands for listing, verifying, mounting and restoring files in a backup.
Having become quite familiar with Rclone over the past few years, I wanted
to achieve something similar to these backup tools entirely within Rclone.
Turns out, it's entirely possible to create backups with Rclone when using it with the --backup-dir option.
To create backups with Rclone, checksum files need to be created. Unlike
traditional backup tools, Rclone won't automatically create these for
you. This is where the previously mentioned TSV-file comes in handy. Since
the file depth specified for each subfolder means that no files should be
placed above that given depth, we can create checksums for all files under
that depth.
In order to automate this, I wrote a Bash script called new-md5, which can recursively create MD5 files at specified depths. Instead of creating one large MD5 file directly under Audio/Music, new-md5 can create a separate MD5 file for each album at depth 2. The benefit of this over large MD5 files is that renaming and moving directories around remains easy.
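As a rough illustration of the idea (a simplified sketch rather than new-md5 itself, assuming GNU find and xargs), a per-directory checksum pass at a given depth could be done like this; the make-md5s.sh name is made up here, and the .verify.md5 file name simply matches the one used in the verification example further down.
# Simplified sketch: for every directory at the given depth below a base
# directory, write an MD5 file listing the files underneath it.
# Usage (hypothetical): ./make-md5s.sh "Audio/Music" 2
base="$1"; depth="$2"
find "$base" -mindepth "$depth" -maxdepth "$depth" -type d -print0 |
while IFS= read -r -d '' dir; do
    ( cd "$dir" &&
      find . -type f ! -name '*.md5' -printf '%P\n' |
      xargs -r -d '\n' md5sum > .verify.md5 )
done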
Given that every subfolder is specified in the TSV-file with its correct file depth, automating checksum creation becomes a trivial task with new-md5 or a similar helper script. See new-md5's documentation for some use cases.
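For example, a small wrapper along these lines could walk the catalog and create checksums for every subfolder at its recorded depth. It assumes the catalog is saved as depths.tsv and reuses the hypothetical make-md5s.sh sketch from above; new-md5's actual interface is covered in its documentation.
# Read each "depth <TAB> subfolder" line and create MD5 files for it.
while IFS=$'\t' read -r depth dir; do
    ./make-md5s.sh "$dir" "$depth"
done < depths.tsv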
Since every directory now has MD5 files, backing up can be done in a more "portable" fashion than some traditional backup tools allow for. By this I mean that the files can simply be copied or synced as-is to new drives without a need to generate new checksums for the resulting backup. Since new-md5 doesn't update hashes in the generated checksum files by default, there is no danger of files being changed without notice between multiple new backups.
Rclone ties this all together. All files can be placed in a default remote, whether they are in the cloud or on a disk, encrypted or not. Creating backups can then be automated by creating remotes named backup_* and having a helper script cycle through each of them with rclone listremotes | grep "^backup_", creating a backup with Rclone's --backup-dir, optionally along with --suffix and --suffix-keep-extension.
Since --backup-dir shouldn't be in the same path as the top-level folders, it's a good idea to make every remote's root include a couple of meta-folders: one for the files the given remote contains and one for deleted files, which --backup-dir handles for you. I tend to name these simply 0 and 1 to keep paths short.
With this setup, a backup can be created with the following command:
rclone sync default:/0 backup_remote:/0 --backup-dir backup_remote:/1
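A helper script cycling through every backup remote can then be as simple as the following sketch; the date-based suffix value is just one possible choice.
# Sync the default remote to every remote named backup_*, moving
# deleted or overwritten files into that remote's meta-folder 1.
for remote in $(rclone listremotes | grep "^backup_"); do
    rclone sync default:/0 "${remote}/0" \
        --backup-dir "${remote}/1" \
        --suffix "-$(date +%F)" --suffix-keep-extension
done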
Verifying the backups and the default remote is also trivial with Rclone. All MD5 files in a remote can be found with
rclone lsf remote_name:/0 -R --files-only --include '*.md5'
and subsequently checked in a loop with
rclone md5sum remote_name:/0/path/to/md5 -C remote_name:/0/path/to/md5/.verify.md5
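Put together, the verification loop could look something like this sketch, which assumes each MD5 file lists the files of the directory it sits in, as in the .verify.md5 example above.
# Check every MD5 file on a remote against the directory containing it.
remote="remote_name"
rclone lsf "$remote:/0" -R --files-only --include '*.md5' |
while IFS= read -r md5file; do
    dir=$(dirname "$md5file")
    rclone md5sum "$remote:/0/$dir" -C "$remote:/0/$md5file"
done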
Although this might not be the most efficient or space-saving method for
creating backups, I've personally been very happy with it. Since everything
revolves around Rclone's remotes, this method is extremely malleable.
Everything can be done locally on hard drives, but can easily be scaled up to
include any cloud storage that Rclone supports, not to mention
encryption,
compression
or combining remotes with the
union remote.
Following the
3-2-1 rule
when creating backups this way is also easy. The default remote can be
a local drive, with one of the backup remotes pointing to another local drive
and another to cloud storage.
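To give an idea of what that could look like in practice, here is a sketch of an rclone.conf for such a 3-2-1 layout; the remote names, paths and the crypt-over-S3 combination are illustrative assumptions rather than my exact configuration.
# Primary copy on a local drive
[default]
type = alias
remote = /mnt/primary

# Second copy on another local drive
[backup_drive]
type = alias
remote = /mnt/external

# Third copy in the cloud, wrapped in a crypt remote for encryption
[backup_cloud_base]
type = s3
provider = AWS
# credentials omitted

[backup_cloud]
type = crypt
remote = backup_cloud_base:my-backup-bucket
password = ...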