Off Topic: A Better Sync Option?

Just a quick thing...

This website is made available free of charge and without adverts, pop-ups, mailing lists, or subscriptions. However, if you can throw a few pounds my way to help out with the running costs and the time spent researching, writing, and advocating for active travel then your support is gratefully received! Thank you!

Buy Me A Coffee

Update 14 May 2024: Thanks to everyone who’s commented, either in reply here or on social media. I’ve had suggestions about changing the parameters of rsync, and others telling me about Syncthing. I found this latter option quite compelling: it seems to meet my requirements, with the added advantage that I don’t need to run it manually after finishing work on my Lightroom library. I’m therefore trialling it, with initially positive impressions. Thanks again for everyone’s help!

Original article follows below.

A rare off-topic piece to explain a technical situation and ask for opinions on better alternatives.

The situation now

I use Apple computers as my daily drivers (a Mac Mini M2 and a MacBook Pro M1), and an old Windows computer as a NAS machine (which also serves Steam game streaming, which is why I’m not using Linux or a proper, off-the-shelf NAS).

I keep my Adobe Lightroom photo library (all Lightroom files plus original photographs) on an encrypted external SSD to allow me to manage it on whichever machine I’m using. This works well, but on its own it would be vulnerable to drive failure, damage, loss etc.

To mitigate those risks with a level of backup, when I am finished with the library I run a script that copies all changed files to a mirror stored on a spinning hard drive on the Windows machine, made available over the local network as an SMB share. Effectively, this is a one-way sync: changes will never be made to the remote copy, so I neither need nor want to pull data the other way.

As an aside (not relevant to the problem at hand) – that mirror is in turn synchronised to an off-site storage provider.

To run the mirror from the local external SSD to the network storage drive, I use rsync, called via a bash script which I run in the macOS Terminal. That manual aspect is fine. While it does introduce an element of risk should I forget to run it, I’m diligent about this and don’t mind manually calling the script as part of my workflow.

My script is below. The intent is to copy over only the changes since the last time the script was run, as an incremental backup. So, new and altered files get copied across (overwriting existing files in the case of the latter), and deleted files are removed from the mirror.

#!/usr/bin/env bash
rsync -auv --delete --exclude={".*","*Previews.lrdata","._*"} --progress "$SOURCE" "$TARGET"

# -a = archive mode (recurse and preserve times, permissions, symlinks, etc.)
# -u = update only (skip files that are newer on the receiver)
# -v = increase verbosity
# --delete = remove files from the receiver that no longer exist on the sender
#            (in modern rsync this defaults to --delete-during: deletions happen during the transfer, not before)
# --exclude = exclude files matching the given patterns
# --progress = show progress during transfer

(The path on the remote computer is mounted in Finder before running this script)

The problem

While the above script works for maintaining that essential backup, it is painfully slow despite it being (I believe) incremental and not creating a new full copy. We’re talking hours here. I’m running the script as I write; it’s been going for about 75 minutes and it’s only scanned through about 20% of the files. On that basis, it’s going to take up to seven hours to finish, unless it speeds up.

Update at 20:28 – The script finally finished running after 10 hours and 8 minutes. Clearly ridiculous!

I’m confident it should be possible to do this much more quickly. I’m sure that when I ran a Windows machine as my desktop, a similar script was very fast. If I recall correctly, that script used robocopy – but that’s a Windows-only option, so not something I can return to.

I currently have over 92,000 files – a mix of photographs, sidecar files (XMP), and Lightroom catalogue files, so a variety of file sizes from kilobytes through to hundreds of megabytes. The script appears to examine each of these files, presumably to check whether it has changed, before deciding whether to copy it. So, the more files, the longer the script takes to execute.

While I can set it to run and walk away, this essentially means I’m only realistically going to run the script when the drive is connected to the Mac Mini, which runs off mains power and has the added benefit of an ethernet connection. But still, it’s a machine left running for hours when I’d rather shut it down; it may well mean leaving the computer on overnight.

It’s not practical to run this script as it stands on the laptop, which primarily runs on battery power and uses Wi-Fi for networking. It would again mean leaving the machine active when I might prefer to shut it down – for the night, say – and it would be an inconvenience to leave the SSD plugged in while it runs.

In either case, it ties up the drive on the machine it’s connected to, which could be an inconvenience.

The question

Given the situation and problem then, what can I do better? Is there something wrong with my rsync command which I could change for improved performance without affecting the outcome?

Is there a better option that will achieve the same results (an exact, incremental, on demand, one-way mirror that maintains file attributes) but does so more quickly? If so, I’d prefer something free and open source but am not opposed to closed software nor paying a reasonable one-off cost for something reliable, trusted, and well-reviewed (certainly not a subscription though).
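For what it’s worth, one rsync-native route that usually removes the per-file network round-trips is to talk to an rsync daemon on the receiving machine instead of going through an SMB mount: the daemon scans its own disk locally, so only the file list and the changed data cross the network. This is only a sketch under the assumption that an rsync daemon can be run on the Windows box at all (e.g. under WSL); the module name and paths below are made up for illustration.

```
# rsyncd.conf on the receiving machine (hypothetical module name and path)
[lightroom-mirror]
    path = /mnt/d/LightroomMirror
    read only = false
    use chroot = true

# Then, on the Mac, sync to the daemon instead of to the SMB mount:
#   rsync -au --delete "$SOURCE" rsync://nas-host/lightroom-mirror/
```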

If you’re able to shed some light, do please leave a comment, or get in touch on social media – but no “go back to Windows” or “use Linux” comments please. The main setup is not going to change!

Thank you!