
Data pre-processing

<<

selitsky

Participant

Posts: 3

Joined: Fri Feb 09, 2018 8:24 pm

Post Thu Apr 26, 2018 12:28 pm

Data pre-processing

How do I pre-process CyTOF data before clustering? I have FCS files that were gated down to live, singlet leukocytes. After that, do I normalize the data so that the samples have similar expression patterns? If so, what are the different methods? Also, I see from the Cytofkit paper that the values are first transformed. Why are they transformed, and what are the pros and cons of each transformation?
<<

mleipold

Guru

Posts: 5792

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Thu Apr 26, 2018 2:48 pm

Re: Data pre-processing

Hi Sara,

The typical order of operations is a bit different than what you describe:

1. Acquisition of Raw data (hot off the machine, no additional processing)
2. Normalization. Either MATLAB/Finck style, or Fluidigm-style.
3. Debarcoding, if applicable.
4. Gating down to Live Intact Singlets (removal of debris, Beads, dead cells, remaining Cell-Cell and Bead-Cell doublets).
5. Depending on your analysis pipeline, often export of new FCS files from the gated Live Intact Singlets (if using Cytobank or a FlowJo plug-in, you can usually specify the population to cluster without having to re-export).
6. Clustering.

So, Normalize before gating down to Live Intact Singlets.

I'm not sure what you mean by "normalize the data so that the samples have similar expression patterns": could you clarify?


Mike
<<

PaulineM

Participant

Posts: 14

Joined: Thu Apr 12, 2018 1:05 pm

Post Fri Apr 27, 2018 9:10 am

Re: Data pre-processing

Dear Mike,

Is it possible to add a normalization step in which the data are normalized based on certain markers?

I’ve heard that some CyTOF users run the same frozen sample each time they run cells, so that they can normalize their data based on markers instead of beads. Do you know how to perform this type of normalization?

Pauline
<<

selitsky

Participant

Posts: 3

Joined: Fri Feb 09, 2018 8:24 pm

Post Fri Apr 27, 2018 12:43 pm

Re: Data pre-processing

Thanks, Mike. The data I have has already been bead-normalized and gated down to singlet leukocytes. The signal for a few of the samples is generally lower, most likely from a technical artifact. I was wondering if they need to be brought into the same space, similarly to RNA-seq.

Also, I see from the cytofkit paper that some people transform CyTOF data using different methods (inverse hyperbolic sine, automatic logicle transformation) before clustering. I looked into the papers that describe these transformations and they were designed for flow cytometry. Should the CyTOF values be transformed? Are these transformations appropriate, even though they were designed for flow cytometry? Do flow cytometry and mass cytometry markers have similar distributions?
<<

mleipold

Guru

Posts: 5792

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Apr 27, 2018 3:00 pm

Re: Data pre-processing

Hi all,

Pauline: yes, many people run control samples (PBMCs, or, ideally, control bone marrow if you're running bone marrow, etc) on each plate or as a spike-in/barcode if doing barcoding. Unfortunately, to my knowledge, there hasn't been any published method for using those *directly* as batch-normalizers.

Here are some Cytoforum threads, mentioning some work by Jeff Hokanson or Sofie van Gassen that is unfortunately still unreleased:
viewtopic.php?f=3&t=954&p=2841&hilit=hokanson#p2841
viewtopic.php?f=1&t=328&p=2399&hilit=hokanson#p2399


Sara: arcsinh is a common transformation for CyTOF data. Most flow cytometry transformations can be applied to CyTOF data. Most flow analysis software like FlowJo or Cytobank apply these transformations automatically to CyTOF data, for visualization purposes at the very least. Whether CyTOF data "should" be transformed (and which is "better") is, to me, a philosophical question.
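
As a rough illustration of the arcsinh transformation mentioned above, here is a minimal Python sketch. The cofactor of 5 is a commonly used convention for CyTOF data, but it is an assumption here and not something stated in this thread; the function name `arcsinh_transform` and the sample values are hypothetical.

```python
import math

def arcsinh_transform(values, cofactor=5.0):
    """Apply arcsinh(x / cofactor) to each raw intensity.

    cofactor=5 is an assumed, commonly used choice for CyTOF data.
    """
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(v / cofactor) for v in values]

# Hypothetical raw intensities for one marker (CyTOF values are >= 0).
raw = [0.0, 1.0, 5.0, 100.0, 10000.0]
transformed = arcsinh_transform(raw)
```

The transform is roughly linear near zero and roughly logarithmic for large values, and it maps 0 to 0, so the hard edge at zero in CyTOF data is preserved.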

I'm not sure what you mean by "Do flow cytometry and mass cytometry markers have similar distributions?". There are multiple published cases where Flow data and CyTOF data are compared head-to-head, and the Freq Parent and such do agree. If you're looking at a histogram level, then yes, the shapes are highly similar. The main difference is that CyTOF data has a hard edge at Zero (ie, there's no negative number section to the distribution/marker-negative peak).

I'm also not clear on why, given that "the signal for a few of the samples is generally lower", you assume it's a technical artifact. Was there a control sample included on all plates that *also* had lower signal on a specific plate? If not, it's not clear to me that you could exclude the possibility that the lower signal is due to biological variation among donors, or to a sample-handling issue like freeze/thaw (especially cell-viability) artifacts, rather than a day-of staining-based or machine-based artifact.
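
The control-sample check described above could be sketched as follows. This is only a diagnostic under stated assumptions, not a batch-normalization method: the function names, the plate labels, the intensities, and the 0.5 cut-off are all hypothetical.

```python
from statistics import median

def control_median_by_batch(control_values):
    """Return the median intensity of the control sample for each batch."""
    return {batch: median(vals) for batch, vals in control_values.items()}

def flag_low_batches(medians, fraction=0.5):
    """Flag batches whose control median is below `fraction` of the
    median across all batches.

    fraction=0.5 is an arbitrary, assumed cut-off.
    """
    overall = median(medians.values())
    return sorted(b for b, m in medians.items() if m < fraction * overall)

# Hypothetical control-sample intensities for one marker, keyed by plate.
control = {
    "plate1": [100.0, 120.0, 110.0],
    "plate2": [105.0, 115.0, 125.0],
    "plate3": [30.0, 40.0, 35.0],
}
medians = control_median_by_batch(control)
low = flag_low_batches(medians)
```

If the control sample is also low on the same plate as the low-signal samples, a staining-based or machine-based artifact becomes more plausible; if the control looks normal, donor variation or sample handling remains a candidate.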


Mike
<<

selitsky

Participant

Posts: 3

Joined: Fri Feb 09, 2018 8:24 pm

Post Fri Apr 27, 2018 6:23 pm

Re: Data pre-processing

Thank you! I really appreciate you taking the time to answer my questions! There was a control sample processed with every sample. I will see if I can get that data and see if it also has lower signal.
<<

PaulineM

Participant

Posts: 14

Joined: Thu Apr 12, 2018 1:05 pm

Post Sun Apr 29, 2018 3:19 pm

Re: Data pre-processing

Thank you, Mike!
<<

juliam

Participant

Posts: 18

Joined: Wed Oct 07, 2020 8:13 am

Post Mon Nov 02, 2020 5:33 pm

Re: Data pre-processing

Hi Mike,
Data in Cytobank is automatically arcsinh-transformed. However, when the data is exported as an FCS file and read into R, the transformation has to be applied again, right?
I see minRange = 0 for every channel after reading the FCS files, so that would explain something...

best
Julia
<<

sgranjeaud

Master

Posts: 123

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Mon Nov 02, 2020 7:50 pm

Re: Data pre-processing

Hi Julia,

If your data is Mass Cytometry, then the lowest value (aka minRange) for every channel is 0.
If the maxRange is also 0 then you are in trouble.

I sincerely think that questions about using R and its packages should be addressed to the Bioconductor forum with an appropriate tag, such as flowCore in your case.

I think that Cytoforum is more concerned with questions about the way the data are processed by the packages, the meaning of the output, the algorithms, their limits...

Best,
Samuel
<<

tomash

Contributor

Posts: 25

Joined: Sun Oct 19, 2014 10:15 pm

Post Tue Nov 03, 2020 7:04 am

Re: Data pre-processing

Hi Julia,

A quick way to check -- if the values in your FCS files range from 0 up to 10,000 or so, then the data is untransformed (i.e. no arcsinh). If they range from 0 to 5 or so, then that's arcsinh transformed data.
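
That quick check could be sketched in Python as below. The function name and the threshold of 15 are assumptions chosen only to separate the two ranges described above; the example values are hypothetical.

```python
def looks_arcsinh_transformed(values, threshold=15.0):
    """Heuristic guess: True if the largest value is small enough to be
    on an arcsinh scale.

    threshold=15 is an arbitrary, assumed cut-off between "0 to 5 or so"
    and "0 up to 10,000 or so"; adjust it for your data.
    """
    if not values:
        raise ValueError("no values to inspect")
    return max(values) <= threshold

raw_like = [0.0, 12.5, 830.0, 9500.0]      # hypothetical raw intensities
transformed_like = [0.0, 1.6, 3.2, 4.9]    # hypothetical transformed values
```

A channel with almost no signal can have a small maximum even when untransformed, so it is safer to look at several well-stained channels before drawing a conclusion.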

Tom
