Data pre-processing
Forum rules
Please be as geeky as possible. Reference, reference, reference.
Also, please note that this is a mixed bag of math-gurus and mathematically challenged, so choose your words wisely

How do I pre-process CyTOF data before clustering? I have FCS files that were gated for live, singlet leukocytes. After that, do I normalize the data so that the samples have similar expression patterns? If so, what are the different methods? Also, I see from the cytofkit paper that the values are first transformed. Why are they transformed, and what are the pros and cons of each transformation?
Re: Data pre-processing
Hi Sara,
The typical order of operations is a bit different than what you describe:
1. Acquisition of Raw data (hot off the machine, no additional processing)
2. Normalization. Either MATLAB/Finck style, or Fluidigm-style.
3. Debarcoding, if applicable.
4. Gating down to Live Intact Singlets (removal of debris, Beads, dead cells, remaining Cell-Cell and Bead-Cell doublets).
5. Depending on your analysis pipeline, often export of new FCS files from the gated Live Intact Singlets (if using Cytobank or a FlowJo plug-in, you can usually specify the population to cluster without having to re-export).
6. Clustering.
So, Normalize before gating down to Live Intact Singlets.
I'm not sure what you mean by "normalize the data so that the samples have similar expression patterns": could you clarify?
Mike
Re: Data pre-processing
Dear Mike,
Is it possible to add a normalization step in which the data are normalized based on certain markers?
I've heard that some CyTOF users run the same frozen sample each time they run cells, so that they can normalize their data based on markers instead of beads. Do you know how to perform this type of normalization?
Pauline
Re: Data pre-processing
Thanks, Mike. The data I have has already been bead-normalized and gated for singlet leukocytes. The signal for a few of the samples is generally lower, most likely from a technical artifact. I was wondering if the samples need to be brought onto the same scale, as is done for RNA-seq.
Also, I see from the cytofkit paper that some people transform CyTOF data using different methods (inverse hyperbolic sine, automatic logicle transformation) before clustering. I looked into the papers that describe these transformations and they were designed for flow cytometry. Should the CyTOF values be transformed? Are these transformations appropriate, even though they were designed for flow cytometry? Do flow cytometry and mass cytometry markers have similar distributions?
Re: Data pre-processing
Hi all,
Pauline: yes, many people run control samples (PBMCs, or, ideally, control bone marrow if you're running bone marrow, etc) on each plate or as a spike-in/barcode if doing barcoding. Unfortunately, to my knowledge, there hasn't been any published method for using those *directly* as batch-normalizers.
Here are some Cytoforum threads, mentioning some work by Jeff Hokanson or Sofie van Gassen that is unfortunately still unreleased:
viewtopic.php?f=3&t=954&p=2841&hilit=hokanson#p2841
viewtopic.php?f=1&t=328&p=2399&hilit=hokanson#p2399
Sara: arcsinh is a common transformation for CyTOF data. Most flow cytometry transformations can be applied to CyTOF data. Most flow analysis software like FlowJo or Cytobank apply these transformations automatically to CyTOF data, for visualization purposes at the very least. Whether CyTOF data "should" be transformed (and which is "better") is, to me, a philosophical question.
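For reference, here is a minimal sketch of what the arcsinh transformation does, in plain Python. The cofactor of 5 is an assumption on my part (a commonly used value for CyTOF data, not something this thread fixes), and `arcsinh_transform` is just an illustrative name, not a function from any package:

```python
import math

def arcsinh_transform(values, cofactor=5.0):
    """Return asinh(x / cofactor) for each raw intensity x.

    cofactor=5 is an assumed default (commonly used for CyTOF data);
    adjust it to match your own pipeline.
    """
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(x / cofactor) for x in values]

# Made-up raw intensities, purely for illustration.
raw = [0.0, 5.0, 100.0, 10000.0]
transformed = arcsinh_transform(raw)
```

Zero stays at zero, values around the cofactor are treated roughly linearly, and large values are compressed roughly logarithmically.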
I'm not sure what you mean by "Do flow cytometry and mass cytometry markers have similar distributions?". There are multiple published cases where flow data and CyTOF data are compared head-to-head, and the population frequencies (Freq. of Parent and the like) do agree. If you're looking at the histogram level, then yes, the shapes are highly similar. The main difference is that CyTOF data has a hard edge at zero (i.e., there's no negative-number section to the distribution/marker-negative peak).
I'm also not clear on why, given that "the signal for a few of the samples is generally lower", you assume it's a technical artifact. Was there a control sample included on all plates that *also* had lower signal on a specific plate? If not, it's not clear to me that you could exclude the possibility that the lower signal is due to biological variation among donors, or at least to a technical issue like freeze/thaw (especially cell-viability) artifacts, rather than a day-of staining-based or machine-based artifact.
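As a rough sketch of that control-sample check: compare the control's median signal for one marker across plates and flag any plate where it is clearly lower. The function name, the 0.7 cutoff, and the numbers below are all hypothetical, chosen only to illustrate the idea:

```python
from statistics import median

def flag_low_control_plates(control_medians, threshold=0.7):
    """Flag plates whose control-sample signal is low relative to the rest.

    control_medians: dict mapping plate name -> median intensity of one
    marker in the shared control sample on that plate.
    threshold: hypothetical cutoff; a plate is flagged when its control
    median is below threshold * (median across all plates).
    """
    reference = median(control_medians.values())
    return sorted(
        plate for plate, value in control_medians.items()
        if value < threshold * reference
    )

# Made-up numbers, purely for illustration.
control = {"plate1": 120.0, "plate2": 115.0, "plate3": 60.0, "plate4": 125.0}
low_plates = flag_low_control_plates(control)
```

If the plates flagged here are the same ones where your samples look dim, a staining-day or machine effect becomes more plausible; if the control looks normal everywhere, the explanation is more likely in the samples themselves.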
Mike
Re: Data pre-processing
Thank you! I really appreciate you taking the time to answer my questions! There was a control sample processed with every sample. I will see if I can get that data and see if it also has lower signal.
Re: Data pre-processing
Thank you, Mike!
Re: Data pre-processing
Hi Mike,
Data in Cytobank is automatically arcsinh-transformed. However, when the data is exported as an FCS file and read into R, the transformation has to be applied again, right?
After reading the FCS files I see minRange = 0 for every channel, so that would explain something...
best
Julia
Re: Data pre-processing
Hi Julia,
If your data is Mass Cytometry, then the lowest value (aka minRange) for every channel is 0.
If the maxRange is also 0 then you are in trouble.
I sincerely think that questions about using R and its packages should be addressed to the Bioconductor support forum, with an appropriate tag such as flowCore in your case.
I think that Cytoforum is more concerned with questions about the way the data are processed by the packages, the meaning of the output, the algorithms, their limits...
Best,
Samuel
Re: Data pre-processing
Hi Julia,
A quick way to check -- if the values in your FCS files range from 0 up to 10,000 or so, then the data is untransformed (i.e. no arcsinh). If they range from 0 to 5 or so, then that's arcsinh transformed data.
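That range check could be sketched in Python roughly as follows. The cutoff of 15 is an assumed value sitting between the two ranges described above, and `looks_arcsinh_transformed` is a made-up helper name, so treat it as a heuristic rather than a rule:

```python
def looks_arcsinh_transformed(values, max_transformed=15.0):
    """Guess whether intensities are already arcsinh-transformed.

    max_transformed is an assumed cutoff: transformed data tops out at a
    small value, while raw counts typically run into the thousands.
    """
    if not values:
        raise ValueError("need at least one value")
    return max(values) <= max_transformed

# Made-up values, purely for illustration.
raw_like = [0.0, 12.0, 850.0, 9600.0]
transformed_like = [0.0, 1.6, 3.2, 4.9]
raw_guess = looks_arcsinh_transformed(raw_like)
transformed_guess = looks_arcsinh_transformed(transformed_like)
```

A very dim channel can have a small maximum even when untransformed, so it is safer to run a check like this across all channels rather than on a single one.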
Tom