
Negative values in CyTOF

<<

CRStevens

Contributor

Posts: 31

Joined: Thu Jul 17, 2014 5:07 pm

Post Thu Feb 19, 2015 3:05 pm

Re: Negative values in CyTOF

Hi All,

This is a very interesting topic indeed. I was wondering if anyone would like to comment on a slight spin-off to this. Does anyone have any experience trying to look at basal expression of markers? How do these very small expression levels hold up under the randomization of the data? For instance, could you tell the difference between the basal levels of a pSTAT target in a healthy donor and in one with a chronic disease?

Our thinking is that, with the broader negative peaks that the randomization gives us, it would be more difficult to see these differences. Do you think changing the randomization, or not randomizing at all, would help with this resolution? Thoughts?

-Chad
<<

richardellis

Participant

Posts: 13

Joined: Mon Dec 09, 2013 11:03 am

Post Thu Feb 19, 2015 5:31 pm

Re: Negative values in CyTOF

Thanks Erin for weighing in with that useful advice.

Chad, the randomisation and transformation (and they really do go hand in hand with plots for conventional gating) are important for the axis on which you measure them. You would have a single "bin" covering approx 0-78 on the first pixel if you used a linear axis of say 128 pixels to cover a data range of 10,000 points. Moving to log, bi-ex, ArcSinh obviously drives down the density of discrete values represented at the low end of the axis, but doesn't change the data - and those data treatments allow us to visualise populations.
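To put numbers on the binning argument, here is a minimal sketch. The 128 pixels and the 10,000 range come from the example above; the arcsinh cofactor of 5 and the helper names are assumptions made only for illustration.

```python
import math

N_PIXELS = 128        # axis width from the example above
DATA_RANGE = 10_000   # data range from the example above
COFACTOR = 5.0        # assumed arcsinh cofactor, not stated in the thread


def linear_pixel(x: float) -> int:
    """Pixel index of x on a linear axis spanning 0..DATA_RANGE."""
    return min(N_PIXELS - 1, int(x * N_PIXELS / DATA_RANGE))


def arcsinh_pixel(x: float) -> int:
    """Pixel index of x on an arcsinh axis spanning the same range."""
    top = math.asinh(DATA_RANGE / COFACTOR)
    return min(N_PIXELS - 1, int(math.asinh(x / COFACTOR) / top * N_PIXELS))


bin_width = DATA_RANGE / N_PIXELS
print(f"linear bin width: {bin_width:.3f} counts")   # 78.125

# Integer counts that share the first pixel of the linear axis
first_linear_bin = [c for c in range(200) if linear_pixel(c) == 0]
print(first_linear_bin[0], first_linear_bin[-1])      # 0 78

# The same counts on the arcsinh axis: spread over many pixels, with
# empty pixels between neighbouring integers at the low end
arcsinh_pixels = sorted({arcsinh_pixel(c) for c in first_linear_bin})
print(len(arcsinh_pixels), "distinct pixels;",
      "count 0 ->", arcsinh_pixel(0), ", count 1 ->", arcsinh_pixel(1))
```

The gaps between the pixels occupied by 0, 1, 2 and so on are the picket fence that the randomisation is meant to fill in; the underlying counts are unchanged by the choice of axis.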

I found that one software package I tried (I think it was FlowJo of the Mac lineage, around version 8 or 9) added its own randomisation, presumably to deal with picket-fencing artefacts. That was more alarming because it was being applied to already randomised data.

Look at the .txt files you are generating as the intermediary between raw and .fcs - that is why we also supply these to our users in case they are interested. They don't have the randomisation applied, so you just get integer values. When you then convert using the default settings, each integer is split, for example at zero, only between -1 and 1. If you have dim signals at, say, 10, positive at 100 and bright at 1000, will randomisation over that range make a difference? In fact I would be alarmed if a clustering algorithm found values within that range significantly different from each other, even around zero, where you will have, to paraphrase Erin, some events 'below measurable', or events which we know from experience pick up a count or two by coincidence, from no-longer-cell-associated particles in the water entering the ion cloud at the same time. And it is unlikely that you are worrying about the randomisation that happens at 1000!
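As a rough illustration of how small that jitter is at 10, 100 and 1000, the sketch below assumes a uniform jitter of up to one count either side of the integer (as described above) and an arcsinh cofactor of 5. Both are assumptions for illustration, and the actual distribution used by the acquisition software may differ.

```python
import math
import random

COFACTOR = 5.0     # assumed arcsinh cofactor, not stated in the thread
HALF_WIDTH = 1.0   # assumed jitter of up to one count either side


def randomise(count: int, rng: random.Random) -> float:
    """Spread an integer count uniformly over (count - 1, count + 1)."""
    return count + rng.uniform(-HALF_WIDTH, HALF_WIDTH)


def arcsinh_spread(count: int) -> float:
    """Width of the jitter interval after the arcsinh transform."""
    lo = math.asinh((count - HALF_WIDTH) / COFACTOR)
    hi = math.asinh((count + HALF_WIDTH) / COFACTOR)
    return hi - lo


rng = random.Random(42)
jittered = [randomise(10, rng) for _ in range(1000)]
print(f"jittered 10s stay within {min(jittered):.2f} .. {max(jittered):.2f}")

spreads = {c: arcsinh_spread(c) for c in (0, 10, 100, 1000)}
for count, spread in spreads.items():
    print(f"count {count:>4}: jitter spans {spread:.4f} arcsinh units")
```

Under these assumptions the jitter is widest around zero on the transformed axis and shrinks steadily as the count grows, which matches the point that nobody worries about it at 1000.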

As Erin postulated, too much weight on zero might cause a problem for some analyses, and similarly negative values may have to be treated for some calculations to work. But the randomisation plays a valuable role in conventional gating and, as I mentioned, I would be disappointed if such small variation affected clustering. I realise I haven't quite answered your question about basal level - maybe someone else can post some results to illustrate that case. But essentially I find it difficult to believe that such randomisation can give you a "broad peak" when it only splits an integer value as far as the next integer value - and that will also be reflected in the transformation applied to that axis of the plot. I don't think we can reliably resolve populations between any two neighbouring integer values, if for no other reason than that, fundamentally, there is some background metal in the tube.
Guy's Hospital flow core, London
<<

Ofir

Master

Posts: 75

Joined: Thu Nov 07, 2013 12:46 pm

Location: US, CA

Post Thu Feb 19, 2015 9:18 pm

Re: Negative values in CyTOF

Hey gang,
Here is a brief comment from Vladimir on this issue:
Factually, the description is correct.
It is probably also true that many researchers employ our algorithm by default.
Fundamentally, counted ions are integers, which cannot be directly presented on a Log or Asinh scale using the same number of bins for every decade.
The "zero" events might also be quite detrimental to a general clustering algorithm. "Overemphasising" means that small but interesting clusters will not be detected due to the overwhelming importance of some dominant events. It is true for any dominant cluster.
<<

petterbrodin

Participant

Posts: 11

Joined: Thu Nov 28, 2013 7:32 am

Post Thu Feb 19, 2015 10:30 pm

Re: Negative values in CyTOF

Hello all and thanks for this important discussion!
Randomization is certainly good for visualization of data in biaxial plots. For high-dimensional analyses it poses a problem though. When performing clustering (SPADE) or dimensionality reduction (PCA or tSNE) or anything else, all values should first be standardized (column-wise) to z-scores or something equivalent. This is important to prevent markers with a high dynamic range from arbitrarily dominating the clustering or dimensionality reduction. Such standardization is automatic in ACCENSE when loading input files and possibly also built into some of the other available tools, I'm not sure. The problem is that randomization of zero-entries can negatively impact this necessary data standardization, and I would therefore recommend that any high-dimensional analysis requiring standardization is performed on non-randomized values.
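A minimal sketch of the concern, using a toy marker that is zero in every cell. The uniform jitter of up to one count either side of the integer is an assumption, and returning zeros for a column with no spread is only one possible convention; this is not how ACCENSE or any other tool is known to do it.

```python
import random
import statistics


def zscore(column):
    """Column-wise z-score; a column with no spread is returned as zeros."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    if sd == 0:
        return [0.0 for _ in column]
    return [(v - mean) / sd for v in column]


rng = random.Random(0)
raw_zeros = [0] * 1000                                   # marker with no signal
jittered = [v + rng.uniform(-1.0, 1.0) for v in raw_zeros]  # assumed jitter

z_raw = zscore(raw_zeros)
z_jittered = zscore(jittered)

print("spread of raw zeros after z-score:     ", statistics.pstdev(z_raw))
print("spread of jittered zeros before z-score:", round(statistics.pstdev(jittered), 3))
print("spread of jittered zeros after z-score: ", round(statistics.pstdev(z_jittered), 3))
```

The integer zeros stay flat after standardization, whereas the jittered zeros are rescaled to unit spread and so carry as much weight as a genuinely variable marker.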
/Petter
<<

ErinSimonds

Master

Posts: 50

Joined: Tue May 13, 2014 8:04 pm

Post Thu Feb 19, 2015 11:41 pm

Re: Negative values in CyTOF

Interesting point, Petter -- that's a good example of where the integer values can be helpful.

SPADE doesn't do Z-score normalization, and viSNE doesn't either (unless Z-score is built into the bh-tSNE implementation from van der Maaten, but I don't think it is.) PCA doesn't do any normalization by default either. For these algorithms, the randomized zero-values don't hurt, and probably help in the case of viSNE. SPADE won't really care one way or the other, since it uses distances and annealing to assemble clusters. PCA also won't care, I don't think, except in very extreme cases (i.e. all cells have zero values on all channels, and the only remaining data is the randomized values.)

So maybe there is no one-size-fits-all answer ... it depends on what you plan to do with the data:

    1) If your high-dimensional algorithm of choice performs Z-score normalization, like ACCENSE, you should turn randomization off
    2) If your high-dimensional algorithm of choice does not perform Z-score normalization, like viSNE/SPADE/PCA, you should leave randomization on
    3) If you are visualizing the data in 2D plots, you should leave randomization on

Agreed? Any corrections to this proposal? I think this forum is as good a place as any to come to consensus on this issue!
<<

petterbrodin

Participant

Posts: 11

Joined: Thu Nov 28, 2013 7:32 am

Post Fri Feb 20, 2015 5:21 am

Re: Negative values in CyTOF

Yes, in principle, Erin, but anyone who wants all markers to contribute equally to their high-dimensional analysis would first have to standardize their data, irrespective of the algorithm applied. For example, when attempting to find patterns in data or clustering cells with similar phenotypes, you typically don't want markers with the highest absolute expression value (abundant protein or good ab probe) to dominate over others with lower absolute values...
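A toy sketch of that dominance effect, with made-up values for one bright and one dim marker and plain Euclidean distance standing in for whatever metric a given tool uses:

```python
import math
import statistics


def zscore(column):
    """Column-wise z-score (population standard deviation)."""
    mean = statistics.fmean(column)
    sd = statistics.pstdev(column)
    return [(v - mean) / sd for v in column]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical cells as (bright marker, dim marker); both markers are
# equally bimodal, only their absolute scale differs
cells = [(100.0, 1.0), (100.0, 10.0), (1000.0, 1.0), (1000.0, 10.0)]

# Raw distances from cell 0 to a cell differing only in the dim marker,
# and to a cell differing only in the bright marker
raw_dim = euclidean(cells[0], cells[1])
raw_bright = euclidean(cells[0], cells[2])
print(raw_dim, raw_bright)        # 9.0 900.0

# Standardize each marker (column) and repeat
bright_z = zscore([c[0] for c in cells])
dim_z = zscore([c[1] for c in cells])
cells_z = list(zip(bright_z, dim_z))

std_dim = euclidean(cells_z[0], cells_z[1])
std_bright = euclidean(cells_z[0], cells_z[2])
print(std_dim, std_bright)        # 2.0 2.0
```

Before standardization the bright marker accounts for almost all of the distance between cells; afterwards both markers separate their populations by the same amount.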

Petter
<<

mleipold

Guru

Posts: 3098

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Feb 20, 2015 4:19 pm

Re: Negative values in CyTOF

Hi Petter,

Could you explain a bit further why you think Z-score normalization is important? I'm not so knowledgeable about statistics in general. If I understand Z-score normalization correctly, you're adjusting for the fact that some markers have higher signal/spread (e.g., CD45RA) than others (e.g., PD-1).

If so, how do you avoid the issue of the dim marker spread being poisoned by low events that are just background, rather than "true"?

I would think you'd have to set some kind of threshold to say "anything below this value X, consider as background" and which would therefore be ignored somehow, or given a value of 0 or some very low number, before you would do that normalization.


Mike
<<

petterbrodin

Participant

Posts: 11

Joined: Thu Nov 28, 2013 7:32 am

Post Sun Feb 22, 2015 7:52 pm

Re: Negative values in CyTOF

Yes Mike, you are bringing up an important point. It is absolutely true that if a marker's expression is uniformly low and effectively not relevant (i.e. noise), then such signals will be inflated by the standardization procedure. For any analysis, manual or automated, with or without standardization, it is always essential to first identify such markers and exclude them from the analysis. Only markers with reliable and sufficiently variable signals can be informative and should be included.

With respect to the thresholding, yes, I think if you can find a reliable threshold for "true" positive signal using a fluorescence-minus-one (FMO) control, stim/unstim comparison or some other measure, then it would make sense to only include signals above this value in standardization and downstream analysis. The rest should be set to zero or to NA values. This brings me back to my initial argument for not using randomized zero-entries in standardization and high-dimensional analysis, since these will obscure the "real" signal of interest. The same is true for the background signal.
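A minimal sketch of that thresholding step. The threshold of 5 counts, the toy values and the helper names are hypothetical; in practice the threshold would come from a control as described above.

```python
import statistics


def apply_threshold(values, threshold, fill=0.0):
    """Keep values at or above threshold; replace the rest with fill.

    Pass fill=None to mark sub-threshold values as NA instead of zero.
    """
    return [v if v >= threshold else fill for v in values]


def zscore_ignoring_na(values):
    """Z-score using only non-NA entries; NA (None) entries stay None."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    sd = statistics.pstdev(present)
    if sd == 0:
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - mean) / sd for v in values]


counts = [0, 1, 3, 8, 20, 50]   # toy integer counts for one marker
THRESHOLD = 5                   # hypothetical background cut-off

as_zero = apply_threshold(counts, THRESHOLD)            # [0.0, 0.0, 0.0, 8, 20, 50]
as_na = apply_threshold(counts, THRESHOLD, fill=None)   # [None, None, None, 8, 20, 50]

print(zscore_ignoring_na(as_zero))
print(zscore_ignoring_na(as_na))
```

The two fills are not equivalent: zeros still enter the mean and spread used for standardization, whereas NA values are left out of it entirely, so the choice changes the resulting z-scores.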

With that said, I think everyone can agree that a marker like PD-1 (low maximum signal) can be just as informative as CD45RA (high maximum signal) when it comes to defining a cellular phenotype by ACCENSE, SPADE, viSNE or PCA or any other tool. The key is not absolute expression values, but that expression is variable between cells. However, without the standardization of PD-1 and CD45RA expression values prior to the analysis, these two markers will not contribute equally to the output of the high-dimensional analysis.
<<

anitamkant

Contributor

Posts: 49

Joined: Mon Nov 18, 2013 6:30 am

Post Fri Feb 27, 2015 6:59 pm

Re: Negative values in CyTOF

Hello Everybody,
Thanks for sharing your views on this important topic.
Fluidigm is in the process of updating the white paper on signal processing with details on the randomization.
Please stay tuned.
Thanks
<<

rmelchio

Participant

Posts: 3

Joined: Fri Feb 13, 2015 10:08 am

Post Thu Mar 26, 2015 10:24 am

Re: Negative values in CyTOF

Another quick question on the subject. When comparing MFI across samples do you set values lower than 0 to 0 or keep the original randomly scattered values?

Thanks in advance