
get geometric mean back for cytof data?


xwxdazhong

Participant

Posts: 3

Joined: Tue Oct 18, 2016 1:11 pm

Post Thu Mar 07, 2019 12:04 am

get geometric mean back for cytof data?

I have been working with CyTOF data and PhenoGraph for a while. The tools I use are Cytobank and cytofkit, and so far the best way I have found to represent the marker expression level of a given cluster is the median.
However, I noticed that some PhenoGraph clusters exhibit a bimodal distribution or a long tail toward high values on some channels, which in my opinion makes the median a biased representation.
So I started thinking about the geometric mean (GM), which we use a lot in FACS data analysis. However, the GM has a problem with zeros: if the data contain even one zero, the geometric mean is 0 no matter how many positive values there are. There are several ways to deal with zeros in GM analysis; one is to add 1 or 0.001 to all values, zero and non-zero alike, but I am not sure such a transformation is valid for CyTOF data.
I would be very grateful if anyone could provide some insight on this. Thanks a lot!
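To make the zero problem concrete, here is a minimal Python sketch (not from any of the tools mentioned above) that computes the geometric mean in log space and then applies the "add a constant to every value" workaround. The intensity values are hypothetical and chosen only for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean computed in log space: exp(mean(log(x))).

    A single zero makes the product zero, so the result is 0.0.
    Negative values are rejected because log(x) is undefined for them.
    """
    if any(v < 0 for v in values):
        raise ValueError("geometric mean is undefined for negative values")
    if any(v == 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def shifted_geometric_mean(values, offset):
    """The 'add a constant to every value' workaround: GM of (x + offset)."""
    return geometric_mean([v + offset for v in values])

# Hypothetical intensities for one channel in one cluster (illustrative only).
intensities = [0.0, 12.0, 15.0, 20.0, 300.0]

print(geometric_mean(intensities))                 # one zero forces 0.0
print(shifted_geometric_mean(intensities, 1.0))    # roughly 16.7
print(shifted_geometric_mean(intensities, 0.001))  # roughly 4.0
```

On this toy data the two offsets give answers that differ roughly fourfold, so the "geometric mean" you report depends heavily on an arbitrary choice.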

Wen

BjornZ

Contributor

Posts: 43

Joined: Fri Jul 10, 2015 1:04 am

Post Thu Mar 07, 2019 3:22 am

Re: get geometric mean back for cytof data?

I generally avoid geomean for flow and CyTOF data because of that issue with zero (and, for flow, negatives). All of the ways I've seen to try to deal with 0s and negatives are hacks that shouldn't be relied upon:

  • If you add 1, that's similar to removing all values that were 0, because each one becomes a factor of 1 in the product. ("Similar" because you still count them when taking the n-th root.)
  • If you replace the zeros with a tiny value (~0.0001), your result will probably be much smaller than you expect because you're multiplying by a lot of tiny values.
  • If you add some large value to all the numbers, calculate the geomean, and then subtract that same value, the result will be biased inconsistently across the domain; see https://math.stackexchange.com/a/2544339/562206 for a full explanation.
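The three workarounds above can be reproduced in a few lines. This is a minimal Python sketch on hypothetical data (the values and the helper names `geomean` and `shift_and_subtract` are illustrative, not from any cytometry package); it shows the tiny-value replacement collapsing the result, and the shift-then-subtract result changing with the arbitrary shift and drifting toward the arithmetic mean as the shift grows.

```python
import math

def geomean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be > 0")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical single-channel data with zeros (illustrative only).
data = [0.0, 0.0, 0.0, 5.0, 8.0, 40.0, 200.0]

# Hack 1: add 1 to every value. Former zeros become 1, so log(1) = 0 adds
# nothing to the log-sum, but those events still count toward n.
add_one = geomean([v + 1 for v in data])

# Hack 2: replace zeros with a tiny value. log(1e-4) is about -9.2,
# so each former zero drags the result down hard.
tiny = geomean([v if v > 0 else 1e-4 for v in data])

# Hack 3: add a constant c, take the geomean, then subtract c again.
def shift_and_subtract(values, c):
    return geomean([v + c for v in values]) - c

arith_mean = sum(data) / len(data)
for c in (1, 10, 100, 10_000):
    print(c, round(shift_and_subtract(data, c), 2))
print("arithmetic mean:", round(arith_mean, 2))
print("add 1:", round(add_one, 2), "tiny:", round(tiny, 4))
```

Because the derivative of GM(x + c) with respect to c is GM/HM, which is at least 1, the shifted-and-subtracted value can only grow with c; there is no "correct" shift to pick.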

This came up again recently on the Cytometry list; see https://lists.purdue.edu/pipermail/cyto ... 53633.html for Mario Roederer's comment and then David Novo's response, with which I very much agree. The two images attached are the same file viewed in FlowJo with the scale range changed from [-6, 6] to [-10, 10]. FlowJo reports a geomean of 5.92 for the former and 9.94 for the latter; there is no statistical basis for this.

In any case, geomean won't help describe a bimodal distribution.

Enough ranting; what can you do instead?
  • If you need to *compare* two or more distributions, look at tests specifically designed to do so (such as Kolmogorov-Smirnov https://en.wikipedia.org/wiki/Kolmogoro ... irnov_test).
  • Consider using "percent positive" -- percentage of events that are greater than some threshold. Sometimes people set that threshold arbitrarily, but ideally it's set based on a control.
  • Consider using trimmed mean. This removes (an adjustable amount of) outliers and leaves you with an estimate of the central tendency. https://en.wikipedia.org/wiki/Truncated_mean lists its upsides and downsides.
  • Continue using mean or median if appropriate for your question at hand.
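For reference, here is a minimal pure-Python sketch of three of the alternatives listed above: percent positive, a trimmed mean, and the two-sample Kolmogorov-Smirnov D statistic. It is written without dependencies for clarity; in practice `scipy.stats.ks_2samp` (which also gives a p-value) and `scipy.stats.trim_mean` cover the last two. The cluster values and the threshold of 10 are hypothetical.

```python
import bisect
import statistics

def percent_positive(values, threshold):
    """Percentage of events strictly above `threshold` (ideally set from a control)."""
    return 100.0 * sum(v > threshold for v in values) / len(values)

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the events from EACH end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(values)
    k = int(len(xs) * proportion)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D: the largest vertical gap between
    the two empirical CDFs, checked at every observed value."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Hypothetical clusters (illustrative only): A is unimodal, B is bimodal.
cluster_a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
cluster_b = [0, 1, 2, 3, 4, 50, 51, 52, 53, 54]

print("median B:", statistics.median(cluster_b))
print("10% trimmed mean B:", trimmed_mean(cluster_b, 0.1))
print("% positive B (> 10):", percent_positive(cluster_b, 10))
print("KS D(A, B):", ks_statistic(cluster_a, cluster_b))
```

On this toy data the median and the 10% trimmed mean of cluster B both come out at 27.0, a value that sits in the empty gap between the two modes, while percent positive (50%) does reflect the split.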

Hope that helps! Might be able to provide more specific advice if you share the question you're trying to answer with your dataset.
Attachments
g2.png
g1.png

sgranjeaud

Master

Posts: 125

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Thu Mar 07, 2019 8:56 am

Re: get geometric mean back for cytof data?

IMO, the most important point in your question is that the distribution may be bimodal. That suggests the cluster should be refined.
I don't see a problem with using the median, as it puts less weight on extreme values than the arithmetic mean does. So I think the median is a good indicator of central tendency.
Could you share some screenshots of the distributions?
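A quick way to see the point about extreme values is to add a short high tail to otherwise tight data. This minimal Python sketch uses hypothetical intensities for one marker in one cluster.

```python
import statistics

# Hypothetical events for one marker in one cluster (illustrative only):
# most events sit near 10, and two events form a long tail to high values.
core = [8, 9, 10, 10, 11, 12]
events = core + [150, 400]

print("without tail: mean", statistics.mean(core), "median", statistics.median(core))
print("with tail:    mean", statistics.mean(events), "median", statistics.median(events))
```

Two tail events move the mean from 10 to 76.25, while the median only moves from 10.0 to 10.5.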

dtelad11

Master

Posts: 129

Joined: Mon Oct 31, 2016 6:26 pm

Post Thu Mar 07, 2019 1:20 pm

Re: get geometric mean back for cytof data?

Zach and Samuel above raise an important point, in my opinion. We could have a discussion about the math behind medians and geometric means. However, the real question is: what are you looking to visualize here? Every visualization has its advantages and disadvantages. Heat maps are a great way to summarize a large number of data points into a single plot, but they lose the distribution. If you're looking for higher resolution, you might want to consider joy plots -- I attached an example to this post.
Attachments
astrolabe_diagnostics_assignment_joy_plot.jpg
Joy plot of labeled cell subsets

xwxdazhong

Participant

Posts: 3

Joined: Tue Oct 18, 2016 1:11 pm

Post Tue Mar 12, 2019 3:56 pm

Re: get geometric mean back for cytof data?

Thanks a lot for all the replies. I liked the emails linked in BjornZ's reply, and I noticed this statement in Mario Roederer's email:
"FlowJo implemented a version of Geometric Mean which also computes to the central tendency in any scaled data — i.e., it represents the central tendency of the data even if there are negative or zero values. Think of it as a “graphical” mean. It works very well."

I have used FlowJo a lot for FACS analysis, including its geometric mean (GM) function, but I did not know about this special "GM" until now. To confirm it, I loaded a CSV file with some data including zero values into FlowJo, ran the GM analysis on that channel, and got a GMFI value other than 0. Therefore, I think it is worth trying to load PhenoGraph data into FlowJo to get the GMFI as an alternative to the median.
I know it may be premature before I know the exact formula behind FlowJo's GM function, but it could be a potential way to build a heatmap.

In response to dtelad11: I think a heatmap is still a good way to present data. It does ignore the distribution, but ignoring the distribution on purpose, in a principled way, seems necessary if I want a clean, simple figure. Of course, a clean, simple heatmap that shows both color and a 'mini distribution' would always be the best option.

Thanks again!
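A cluster-by-marker heatmap is just one summary number per (cluster, marker) cell. Below is a minimal Python sketch of building that matrix from per-event data, assuming events are available as (cluster_id, {marker: intensity}) pairs; the function name `summary_matrix`, the marker names, and the values are hypothetical. The `stat` argument is where the median could be swapped for another summary (for example a trimmed mean); it does not reproduce FlowJo's GM, whose formula is not given here. Plotting is left to whatever heatmap tool you already use.

```python
import statistics
from collections import defaultdict

def summary_matrix(events, markers, stat=statistics.median):
    """Collapse single-cell events into one value per (cluster, marker).

    events:  list of (cluster_id, {marker_name: intensity}) pairs
    markers: marker names, in the column order wanted for the heatmap
    stat:    summary function applied to each cluster/marker column
    Returns {cluster_id: [stat value for each marker]} with clusters sorted.
    """
    by_cluster = defaultdict(list)
    for cluster_id, expression in events:
        by_cluster[cluster_id].append(expression)
    return {
        cluster_id: [stat([row[m] for row in rows]) for m in markers]
        for cluster_id, rows in sorted(by_cluster.items())
    }

# Hypothetical events and marker names (illustrative only).
events = [
    (1, {"CD3": 10.0, "CD19": 0.0}),
    (1, {"CD3": 12.0, "CD19": 1.0}),
    (1, {"CD3": 14.0, "CD19": 0.0}),
    (2, {"CD3": 0.0, "CD19": 30.0}),
    (2, {"CD3": 2.0, "CD19": 50.0}),
]
markers = ["CD3", "CD19"]

print(summary_matrix(events, markers))                        # medians
print(summary_matrix(events, markers, stat=statistics.mean))  # swap the statistic
```

Whatever statistic fills the cells, each cell still hides its distribution, which is why pairing the heatmap with per-cluster histograms can help.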

Wen

BjornZ

Contributor

Posts: 43

Joined: Fri Jul 10, 2015 1:04 am

Post Tue Mar 12, 2019 4:08 pm

Re: get geometric mean back for cytof data?

Hi Wen,

Sorry I was unclear, but I was actually discouraging use of FlowJo's "geometric mean," as it's not statistically valid and still doesn't help your scenario of bimodal distributions. It's their own definition and you can trivially change its value in ways that shouldn't be possible. I'd look instead at the alternatives proposed in my post and the others.

+1 for El-Ad's suggestion of "joy plots" (histogram overlays); but I want to mention that you usually see them with the conditions you want to compare overlaid in a single column as opposed to a grid (greatly helps with visualization).

Best,
Zach

xwxdazhong

Participant

Posts: 3

Joined: Tue Oct 18, 2016 1:11 pm

Post Thu Mar 14, 2019 9:58 pm

Re: get geometric mean back for cytof data?

Hi Zach,
Thank you for your reply. You made a clear point that the GM is not statistically valid, and I understand your concern. What I care about more is whether the GM is more valid than the median in CyTOF analysis; after all, in my opinion a heatmap is still necessary for CyTOF analysis. So my question becomes: setting aside joy plots, which combine mini histograms, would a GM heatmap be better (or more robust?) than a median heatmap?

Wen

meeklu

Participant

Posts: 5

Joined: Tue May 18, 2021 11:49 am

Post Wed May 19, 2021 9:47 am

Re: get geometric mean back for cytof data?

Hi Wen,

I recently stumbled across your post and was wondering whether you have solved this issue.
Did you publish a paper with the described dataset that showed a bimodal distribution?

Best,
Melissa
