
Backup script for imd files


jimbomahoney

Master

Posts: 83

Joined: Wed Feb 27, 2019 11:21 am

Post Thu Aug 22, 2019 8:31 am

Backup script for imd files

Hi all,

I'm not sure what methods you are all using to back up the data from the Helios / CyTOF, but here is the method I have settled on. I hope it's useful, or that others can suggest better / faster methods.

I'm using 7zip and (optionally) some additional codecs. The codecs aren't strictly necessary, but give me about a 10% speed increase when using BROTLI vs. LZMA2 with the standard 7zip install.

EDIT - Thanks to Samuel for helping with tidying the code and removing the need to add 7Zip to the Environment variables!

EDIT 2 - Here's a Github link

Here's the batch file:

  Code:
@ECHO OFF

REM Set the path of 7z (quoted so that stray trailing spaces are not included)
SET "ZIPPER=C:\Program Files\7-Zip\7z.exe"

REM Check the 7z is correctly installed
If Not Exist "%ZIPPER%" (
Echo Error: Zipper is not found as "%ZIPPER%"!
Goto END
)

REM Tune the zipping options (BROTLI Requires additional codecs)
REM 7z options have been tested and selected for optimal speed
Set ZIPOPTIONS=-mm=BROTLI -mx=2 -mmt24
REM -mm=BROTLI - use the BROTLI codec. If not using additional codecs, use LZMA2, which is slightly slower (~10%), but slightly better compression (~4%).
REM -mx=2 - compression quality (higher = better compression, but slower). Use -mx=0 if using LZMA2.
REM -mmt24 - use 24 threads; set as the number of virtual cores on your machine

REM Root folder to back up goes here:
SET "WORKDIR=E:\User_Data\"

REM Check if workdir exists
IF NOT EXIST "%WORKDIR%" (
Echo Error: working directory does not exist; check "%WORKDIR%"!
Goto END
)

REM change to working directory
CD /D "%WORKDIR%"

REM Loop across files recursively
REM Zip all IMD files in the WORKDIR directory (and subdirs)
REM but only if they haven't already been zipped
REM %%f = full path of each IMD file
REM %%~dpnf = drive, path and filename (excluding extension)
FOR /R %%f in (*.imd) DO (
Echo Processing %%f
IF NOT EXIST "%%~dpnf.7z" (
"%ZIPPER%" a %ZIPOPTIONS% "%%~dpnf.7z" "%%f"
)
)
Echo Compression finished.


:END




Here's what it does:

1) Change to the directory I want to compress.
2) Find all the IMD files in that directory (and subdirectories).
3) See if they have already been compressed.
4) If they haven't, it will create new, compressed files.
5) Each IMD is compressed into its own identically-named 7z file (lossless compression).
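
For anyone who prefers Python to batch, steps 2) and 3) above can be sketched roughly as follows. This is only an illustration and not part of the batch file: the function name `pending_imd_files` is made up, and the extension is compared case-insensitively to mimic how Windows matches `*.imd`.

```python
from pathlib import Path


def pending_imd_files(root):
    """Return IMD files under root that have no same-named .7z next to them."""
    root = Path(root)
    # Walk the folder and all subfolders, keeping only *.imd files
    # (case-insensitive, as on Windows).
    candidates = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".imd"
    )
    # Skip any file whose archive already exists in the same folder.
    return [p for p in candidates if not p.with_suffix(".7z").exists()]
```

Calling `pending_imd_files(r"E:\User_Data")` would then give the list of files that still need compressing.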


After much experimentation, this was as fast as I could get it. e.g. 55 GB IMD file compressed in ~40 seconds to about 900 MB.

-mm=BROTLI - this will use the BROTLI codec
-mx=2 - this is the compression quality (higher = better compression, but slower)
-mmt24 - this will use 24 threads. I experimented with values from 1 to 256 and found that, on our Helios machine (24 virtual cores), this was the best value for speed.
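
If you would rather not hard-code the thread count, it can be read from the machine. Here is a minimal Python sketch that builds the same three switches. `zip_options` is a hypothetical helper, and `os.cpu_count()` reports logical (virtual) cores, which is a sensible starting point but not guaranteed to be the fastest setting on every machine.

```python
import os


def zip_options(codec="BROTLI", level=2, threads=None):
    """Build the 7z switches described above as a list of arguments."""
    if threads is None:
        # Logical core count; falls back to 1 if it cannot be determined.
        threads = os.cpu_count() or 1
    return [f"-mm={codec}", f"-mx={level}", f"-mmt{threads}"]
```

For example, `zip_options(threads=24)` gives `["-mm=BROTLI", "-mx=2", "-mmt24"]`, matching the batch file.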

I then back up the 7z files to a data server (using more code in the batch file to execute an rsync backup over SSH).
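
The rsync step is not shown above, so here is a rough Python sketch of what such a call could look like. It is an assumption rather than the actual command: it presumes an `rsync` binary is available on the Windows machine (for example via Cygwin), and the source path, destination and key file are placeholders. The include / exclude filters copy only the .7z files while keeping the folder layout.

```python
def rsync_command(source_dir, destination, ssh_key=None):
    """Build an rsync command that copies only .7z files, keeping the folder layout."""
    cmd = [
        "rsync",
        "-av",                 # archive mode, verbose
        "--prune-empty-dirs",  # skip folders that end up with nothing to copy
        "--include=*/",        # descend into every subfolder
        "--include=*.7z",      # copy the compressed archives
        "--exclude=*",         # ignore everything else (e.g. the raw IMD files)
    ]
    if ssh_key is not None:
        # Optional: use a specific SSH key (hypothetical path supplied by the caller).
        cmd += ["-e", f"ssh -i {ssh_key}"]
    # A trailing slash on the source copies its contents, not the folder itself.
    cmd += [source_dir.rstrip("/") + "/", destination]
    return cmd
```

Something like `subprocess.run(rsync_command("/cygdrive/e/User_Data", "backup@dataserver:/backups/helios/"), check=True)` would then run it. Both of those values are made up for the example.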

Hope this helps someone and I'd welcome improvements or suggestions of what methods you use!
Last edited by jimbomahoney on Thu Aug 22, 2019 2:21 pm, edited 4 times in total.

jimbomahoney

Master

Posts: 83

Joined: Wed Feb 27, 2019 11:21 am

Post Thu Aug 22, 2019 8:36 am

Re: Backup script for imd files

Sorry - a few typos in there!

It's of course a 55 GB file compressed to ~900 MB, not a 5 GB! That would be pathetic!

:D

Jahangir

Master

Posts: 52

Joined: Sun Oct 29, 2017 6:34 pm

Post Thu Aug 22, 2019 2:00 pm

Re: Backup script for imd files

Hi,

I am much less tech savvy than you are, unfortunately, so I use WinRAR to compress my IMD files. On average it compresses them to about 2-3% of the original file size, which is quite good in my opinion. The only downside is that it takes a long time. However, compressing the IMD files is the last thing I do once I'm finished using the Helios, so I usually start the compression and leave the computer on overnight (although it only takes 20+ minutes, depending on file size and the number of files being compressed in one go).

Best,

Jahangir

BjornZ

Contributor

Posts: 43

Joined: Fri Jul 10, 2015 1:04 am

Post Thu Aug 22, 2019 4:21 pm

Re: Backup script for imd files

I wrote a script a few years ago that's in use on a handful of CyTOFs and is similar to yours, available here: https://github.com/nolanlab/cytof-backer-upper. It uses 7zip (but with 7z, not brotli) and is designed to upload data to Google Cloud Storage, but can be pointed at other backup location types (external hard drives, NFS mounts, etc.). It also backs up the FCS files but does not compress them since they're mostly incompressible. We set it up as a scheduled task that runs every night.
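
For readers who have not set up a nightly scheduled task before, Windows' `schtasks` tool can register one from the command line. The Python sketch below only builds that command. The task name, script path and start time are placeholders, and this is not necessarily how the task was configured for the script linked above.

```python
from datetime import datetime


def nightly_task_command(task_name, script_path, start_time="02:00"):
    """Build a schtasks command that runs script_path once a day at start_time."""
    # Validate the time and normalise it to 24-hour HH:MM; raises ValueError if invalid.
    start = datetime.strptime(start_time, "%H:%M").strftime("%H:%M")
    return [
        "schtasks", "/Create",
        "/SC", "DAILY",       # run every day
        "/TN", task_name,     # name shown in Task Scheduler
        "/TR", script_path,   # program or batch file to run
        "/ST", start,         # start time
    ]
```

For example, `nightly_task_command("IMD backup", r"C:\Scripts\backup_imd.bat")` (hypothetical names) returns an argument list that can be passed to `subprocess.run` on the instrument PC.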

mleipold

Guru

Posts: 5796

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Thu Aug 22, 2019 5:00 pm

Re: Backup script for imd files

Hi all,

Here at Stanford HIMC, we just do a 7zip compression (default parameters) of the IMD+FCS+whatever files on a month-by-month basis. Our Helios E: drive is large enough that we don't run out of space; with CyTOFv1, I think I was doing it every 2 weeks (with WinRAR, at the time). After compression, we move it to both an external HD and to a Google Drive folder. For a month's worth of data, it often takes more than 24hr to compress, but 7zip allows you to pause compression if it's not done yet and you need to start running samples. And, compressing over the weekend takes care of it in any case.
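
As a rough illustration of the month-by-month idea, files could be grouped by the month they were last modified before each group is handed to 7zip. This Python sketch rests on assumptions: the description above does not say how a month's files are selected, so modification time is a guess, and `group_by_month` is a made-up name.

```python
from collections import defaultdict
from datetime import datetime
from pathlib import Path


def group_by_month(root):
    """Map 'YYYY-MM' to the files under root last modified in that month."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Use the file's modification time (local time) as its month key.
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            groups[modified.strftime("%Y-%m")].append(path)
    return dict(groups)
```

Each key (e.g. "2019-07") could then become the name of one monthly archive.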

I think CyTOF2's are where you need this the most: IMO, the CyTOF2 E: drive is WAAAAAAAAAAAAAY too small, to where you can entirely fill it in a single day if you save IMDs. That's where overnight compression and data backup is really important (if you're not saving IMDs, it's less of an issue, but we have always saved all IMDs).


Mike

jimbomahoney

Master

Posts: 83

Joined: Wed Feb 27, 2019 11:21 am

Post Fri Aug 23, 2019 7:09 am

Re: Backup script for imd files

BjornZ wrote:I wrote a script a few years ago that's in use on a handful of CyTOFs and is similar to yours, available here: https://github.com/nolanlab/cytof-backer-upper. It uses 7zip (but with 7z, not brotli) and is designed to upload data to Google Cloud Storage, but can be pointed at other backup location types (external hard drives, NFS mounts, etc.). It also backs up the FCS files but does not compress them since they're mostly incompressible. We set it up as a scheduled task that runs every night.


Thanks - I've linked to yours from my GitHub repo in case people stumble upon mine and want something different / better.

sgranjeaud

Master

Posts: 123

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Mon Aug 26, 2019 3:42 pm

Re: Backup script for imd files

A publication about compression of FCS files

Lossless Compression of Cytometric Data
A. Bras and V. van der Velden, Cytometry Part A
https://doi.org/10.1002/cyto.a.23879

The best compression ratios are about 0.5, which means that the compressed file size is half that of the FCS file.
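
To put that 0.5 next to the IMD figure quoted earlier in this thread (a 55 GB IMD file down to about 900 MB), here is the arithmetic as a small Python sketch. It assumes decimal units (1 GB = 1000 MB), and the function name is made up for illustration.

```python
def compression_ratio(compressed_size, original_size):
    """Compressed size divided by original size (smaller is better)."""
    return compressed_size / original_size


fcs_ratio = compression_ratio(1, 2)         # best case for FCS: half the size
imd_ratio = compression_ratio(900, 55_000)  # 900 MB from 55 GB (55,000 MB)

print(f"FCS best case: {fcs_ratio:.3f}")  # prints 0.500
print(f"IMD example:   {imd_ratio:.3f}")  # prints 0.016
```

So on these numbers the IMD example compresses roughly 30 times further than the best FCS case.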

jimbomahoney

Master

Posts: 83

Joined: Wed Feb 27, 2019 11:21 am

Post Tue Aug 27, 2019 7:29 am

Re: Backup script for imd files

Thanks!

That's interesting!

7Zip uses LZMA by default and my script enables the LZMA2 version, which can make better use of multiple threads. Brotli seems to be almost as good, but slightly faster.
