Modifying channels from the FCS files

avinash1

Participant

Posts: 16

Joined: Fri Jun 16, 2017 1:59 pm

Post Thu Sep 26, 2019 9:41 pm

Modifying channels from the FCS files

Hi CyTOFers! I have removed 5 extra, unused, conflicting channels from FCS files that were run across multiple time points, and re-exported the files from Premessa. Interestingly, the file size has dropped by half but the event counts remain the same. Just curious: is this expected? Is this normal? Why has the file size halved?

Cheers
Avi
<<

dtelad11

Master

Posts: 129

Joined: Mon Oct 31, 2016 6:26 pm

Post Thu Sep 26, 2019 10:01 pm

Re: Modifying channels from the FCS files

FCS files that come from the CyTOF have an additional section that is used by the Fluidigm software for normalization. Personally, I fondly refer to it as the "super-secret-Fluidigm-section". If you manipulate the file outside of the Fluidigm software (via FlowJo, Cytobank, R, etc.) that section will be discarded, cutting the file size by half.
<<

avinash1

Participant

Posts: 16

Joined: Fri Jun 16, 2017 1:59 pm

Post Thu Sep 26, 2019 10:05 pm

Re: Modifying channels from the FCS files

Thanks El-ad. Just wanted to confirm that this will not affect the data in any way?

Avi
<<

dtelad11

Master

Posts: 129

Joined: Mon Oct 31, 2016 6:26 pm

Post Thu Sep 26, 2019 10:28 pm

Re: Modifying channels from the FCS files

Make sure to normalize with the Fluidigm software ahead of any downstream analysis. Assuming you've done that, you're safe.
<<

avinash1

Participant

Posts: 16

Joined: Fri Jun 16, 2017 1:59 pm

Post Thu Sep 26, 2019 10:37 pm

Re: Modifying channels from the FCS files

Yes, I did that originally when generating the data from the machine.

Avi
<<

vtosevski

Contributor

Posts: 44

Joined: Wed Nov 20, 2013 12:50 pm

Location: Zurich, Switzerland

Post Sun Sep 29, 2019 8:23 pm

Re: Modifying channels from the FCS files

Hi Avi and El-ad,

I don't think it's the additional section. I looked for that mythical section a long time ago, as I too was told it was there (the randomised and non-randomised matrix). To me it made sense that it should be there, but I was never able to see it myself. El-ad, have you seen it? :)

Mike Jiang from the RG lab at Fred Hutch told me how to find a second matrix (if it's there), and it wasn't there. Instead, he suggested the size difference most likely has to do with the number of bits per data point (see the old thread here: https://support.bioconductor.org/p/109258/).

I just realized I never closed that thread but if memory serves me well, I followed their advice and could confirm it to be the case.

Vinko
<<

dtelad11

Master

Posts: 129

Joined: Mon Oct 31, 2016 6:26 pm

Post Sun Sep 29, 2019 10:46 pm

Re: Modifying channels from the FCS files

Vinko, I'm confused by your reply -- there is nothing "mythical" about this section. Open the FCS file with a hex editor and scroll to the end of the file, and you will see XML tags. Additionally, check out the header: it defines a "user-defined OTHER segment" following the other segments such as TEXT and DATA, as per the FCS standard. The existence and utility of that segment were confirmed by Fluidigm reps and by FlowJo personnel.
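For anyone who would rather not use a hex editor, here is a minimal Python sketch of the header check described above. It assumes the standard FCS 3.x HEADER layout (version string in bytes 0-5, then 8-byte space-padded ASCII offsets for TEXT, DATA and ANALYSIS, with any OTHER segment offsets following in pairs from byte 58). The function name `parse_fcs_header` is just an illustrative choice, and this has not been checked against an actual Fluidigm file.

```python
def parse_fcs_header(raw: bytes) -> dict:
    """Read segment offsets from the fixed-width HEADER of an FCS 3.x file."""

    def field(start: int) -> int:
        # Each offset is an 8-byte, space-padded ASCII integer.
        text = raw[start:start + 8].decode("ascii").strip()
        return int(text) if text else 0

    header = {
        "version": raw[0:6].decode("ascii"),
        "text": (field(10), field(18)),
        "data": (field(26), field(34)),
        "analysis": (field(42), field(50)),
        "other": [],
    }
    # OTHER segment offsets, if present, follow as (begin, end) pairs
    # from byte 58 up to the start of the TEXT segment.
    limit = min(header["text"][0], len(raw))
    pos = 58
    while pos + 16 <= limit:
        if not raw[pos:pos + 16].strip():
            break  # only padding left, no further OTHER segments declared
        header["other"].append((field(pos), field(pos + 8)))
        pos += 16
    return header
```

To try it, read the first few hundred bytes of the file in binary mode and pass them in. A non-empty `"other"` list would mean the header declares at least one OTHER segment.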
<<

ChrisCiccolella

Participant

Posts: 3

Joined: Tue Nov 27, 2018 9:05 pm

Post Sun Sep 29, 2019 11:02 pm

Re: Modifying channels from the FCS files

My vote is also with the additional section. The FCS spec does allow for 64 bit encoding (typical is 32 bit) but I don't think I've ever seen it implemented despite intentionally looking for examples. This is with good reason since it offers an unnecessary degree of numeric precision while doubling the amount of storage needed to encode the data. Why do that? On the other hand, I have indeed seen extra stuff pasted to the end of Fluidigm FCS files.

Anyway, there is no sense in speculating because it's easy to determine the answer:

Start by opening your FCS file with a text editor or hex editor.

To find the bits per value, look at the $PnB keyword for each channel within the TEXT segment at the beginning of the file. E.g., $P4B gives the bits per value of the fourth channel. This will normally be 32, but if it's 64, then that would explain the doubling in size. The loss of half the data size would then come from reading in the data as 64 bit and writing it out again as 32 bit, which most software is likely hard-coded to do.

You can also look at the $DATATYPE keyword. If this has a value of F (for "float") then you should have 32 bit encoding. A value of D (for "double") means 64 bit encoding. Besides F I think I have only ever seen I (for "integer"). If I recall correctly, the YETI used integer encoding in its very early days. I doubt it does this still?
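If you prefer a script to a text editor, the two keyword checks above can be sketched in Python roughly as follows. This is a simplified reading of the TEXT segment: it assumes the first character of the segment is the delimiter and that no value contains an escaped (doubled) delimiter, which a full parser would need to handle. The function names are my own, not part of any library.

```python
def parse_text_segment(segment: bytes) -> dict:
    """Split a TEXT segment into a {KEYWORD: value} dictionary."""
    text = segment.decode("latin-1")
    delimiter = text[0]  # the first character of TEXT is the delimiter
    tokens = text[1:].split(delimiter)
    # Tokens alternate: keyword, value, keyword, value, ...
    return {key.upper(): value for key, value in zip(tokens[0::2], tokens[1::2])}


def describe_encoding(keywords: dict) -> tuple:
    """Return ($DATATYPE, sorted distinct $PnB values over all channels)."""
    n_channels = int(keywords["$PAR"])
    bits = {int(keywords[f"$P{i}B"]) for i in range(1, n_channels + 1)}
    return keywords["$DATATYPE"].strip(), sorted(bits)


def read_text_segment(path: str) -> bytes:
    """Pull the TEXT segment out of a file using the HEADER offsets."""
    with open(path, "rb") as handle:
        header = handle.read(58)
        start = int(header[10:18].decode("ascii"))
        end = int(header[18:26].decode("ascii"))
        handle.seek(start)
        return handle.read(end - start + 1)  # offsets are inclusive
```

Calling `describe_encoding(parse_text_segment(read_text_segment(path)))` on a file should give something like `("F", [32])` for 32-bit floats, or `("D", [64])` if the data really were stored as doubles.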

So now about the extra section and how to parse it:

The simplest way to infer its existence is to look at the $BEGINDATA and $ENDDATA keywords in the TEXT segment. Subtract the former from the latter and you have the number of bytes of data that are encoded in the file. Since data should be the primary share of the file size, comparing this value to the size of the file on disk gives a quick indication of whether or not there is extra stuff in the file. For example, a Fluidigm file I have on my computer has a disk size of 257 MB. When I calculate the theoretical data size I get (84803681-3682) bytes = 84.8 MB. If I read the file into R and write it back out again, sure enough, the written FCS file is 84.8 MB.
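The arithmetic above is easy to script. Here is a small Python sketch using the numbers from my example file. Two assumptions to flag: the byte offsets are inclusive, so strictly the data size is the difference plus one, and the 257,000,000 figure is only my rounded "257 MB" disk size, not an exact byte count. The helper names are made up for illustration.

```python
def declared_data_bytes(begin_data: int, end_data: int) -> int:
    """Size of the DATA segment implied by $BEGINDATA and $ENDDATA."""
    # Both offsets are inclusive, hence the +1.
    return end_data - begin_data + 1


def expected_data_bytes(n_events: int, bits_per_channel: list) -> int:
    """Size implied by $TOT and the $PnB values, as a cross-check."""
    return n_events * sum(bits_per_channel) // 8


def bytes_after_data(file_size: int, end_data: int) -> int:
    """Bytes on disk that lie beyond the declared end of DATA."""
    return file_size - (end_data + 1)


data_size = declared_data_bytes(3682, 84803681)
print(data_size)  # 84800000, i.e. 84.8 MB

# Rounded file size, for illustration only.
print(bytes_after_data(257_000_000, 84803681))
```

If `expected_data_bytes` (from $TOT and the $PnB values) agrees with `declared_data_bytes`, the DATA segment is fully accounted for, and whatever `bytes_after_data` reports has to be something else.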

The FCS spec also allows for other information to be officially encoded in the FCS file. The byte locations of this information should be given by other keywords such as $BEGINANALYSIS, $ENDANALYSIS, and $NEXTDATA. $NEXTDATA, as your linked thread points out, would be used for another data matrix but this is not used in the example file I'm looking at. So, there is no immediate official explanation for the rest of the information encoded past the stated end of the data in this file.

Another observation to make is that sometimes there is a bunch of XML at the very end of the file. I think this is known as the "XML tail". So, there is clearly precedent for Fluidigm appending things to the end of FCS files. This little bit of XML doesn't explain the large increase in file size, though.
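A quick way to check for that XML without scrolling through a hex editor is to scan the last few kilobytes of the file for closing tags. The Python sketch below does only that: it is a crude pattern match, not an XML parser, and the 4096-byte window and the function name are arbitrary choices of mine.

```python
import re

# Matches closing tags such as </Tag> or </ns:Tag> in raw bytes.
CLOSING_TAG = re.compile(rb"</([A-Za-z_][\w.:-]*)>")


def closing_tags_in_tail(raw: bytes, tail_size: int = 4096) -> list:
    """Return the names of XML closing tags found near the end of `raw`."""
    tail = raw[-tail_size:]
    return [name.decode("ascii") for name in CLOSING_TAG.findall(tail)]
```

Pass in the file contents (or just its final chunk) read in binary mode. An empty list suggests there is no XML tail, while a list of tag names tells you roughly what was appended.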

So how do we parse it? The first step is to know ahead of time exactly what the information is and how it's encoded. Then you would have to write a routine that reads in the correct bytes and processes them to the correct data structure. Not sure if this information is public or could be offered by a Fluidigm rep or anyone who knows. Obviously it's liable to change at some point.

@El-ad, I don't see any other byte pointer keywords in the FCS file I'm looking at right now, but that would help pull out the correct bytes for this information.
<<

vtosevski

Contributor

Posts: 44

Joined: Wed Nov 20, 2013 12:50 pm

Location: Zurich, Switzerland

Post Mon Sep 30, 2019 8:27 pm

Re: Modifying channels from the FCS files

Hi both,

@El-ad, I didn't mean anything bad by "mythical". I don't look at FCS files with hex editors and have never seen the XML tags and user-defined OTHER segment you mention. It felt intuitive that an additional matrix should be in there (as CyTOF software can go back and forth between the randomized and non-randomized matrix), but I never managed to see it myself, which is why I used the word "mythical". I am not a native speaker of English, so if that word has a "heavier" meaning than the one I intended, my bad.

@Chris - thanks for this exhaustive reply. I took some files as they come out of the instrument and checked: they are indeed encoded with 32-bit precision (and $DATATYPE is "F"). $NEXTDATA is 0, so the additional matrix it is, in Fluidigm's own way, I suppose!

Vinko