Gating output-addl analysis-issue of dependent pops

Posts: 2095

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Mon Jun 10, 2019 8:30 pm


Hi all,

I have a general topic to throw out for discussion, especially to the stats and informatics people.

When we perform experiments for people as part of the service center, we process the samples, run the samples, do preliminary analysis (standard FlowJo gating, especially as QC), and then give all the files back to the customer.

Often, we're asked to output a data table of Freq. of Parent and/or Count from the FlowJo gating hierarchy. Some people use this as a quick first pass, but others use it as a major part of their data for statistical analysis.

Unfortunately, we occasionally have people who want to use that FlowJo table without giving thought to the gating hierarchy, or to which populations are dependent and which are independent. For example, we often do CD3 +/- as a simple split gate in a histogram. Clearly, in this case, the CD3+ and CD3- populations are interdependent and sum to 100%.

In some other cases, we may do a bivariate like CD4 vs CD8, and gate the single-pos. In this case, CD4+ and CD8+ often do *not* sum to 100% (so, not *completely* interdependent, but not sure I'd call them completely independent either since they come from the same parent gate).

Obviously, this is an issue of gating hierarchies, as a single cell will be a member of several gates as you go down the hierarchy. This is in contrast to many types of clustering, where a single cell can be a part of one and only one cluster and therefore there are none of these redundancies or interdependencies.

Since this is something that comes up, I thought I'd throw this out as a discussion topic to the community: what do you do in these cases? Is there *one* answer? Or, is this where the informatics/stats people need to work particularly closely with the bench scientists to decide what goes and what stays in the analysis?

In the above CD3 split gate example, are *both* CD3pos *and* CD3neg something to include in your modeling, or should you use one or the other but not both?
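One way to make the split-gate question concrete is to look at what happens when both children go into the same linear model. The sketch below uses made-up Freq. of Parent values (not data from any real experiment) and NumPy only; all variable names are illustrative. Because CD3neg is exactly 100 minus CD3pos, a design matrix with an intercept and both columns has no more rank than one with a single column.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Freq. of Parent values (percent) for 8 samples.
cd3_pos = rng.uniform(40.0, 80.0, size=8)
cd3_neg = 100.0 - cd3_pos  # split gate: the two children sum to 100%

# Design matrices with an intercept: both populations vs. only one.
X_both = np.column_stack([np.ones(8), cd3_pos, cd3_neg])
X_one = np.column_stack([np.ones(8), cd3_pos])

rank_both = np.linalg.matrix_rank(X_both)
rank_one = np.linalg.matrix_rank(X_one)
corr = np.corrcoef(cd3_pos, cd3_neg)[0, 1]

print("rank with both columns:", rank_both)
print("rank with one column:  ", rank_one)
print("correlation CD3pos vs CD3neg:", round(corr, 6))
```

In a linear model of this kind, the second column adds no information, which argues for keeping one of the two populations rather than both.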

Also, you could imagine a case where, say, total CD4+ Count or (CD4+ percent of total CD3+) may not change significantly between Case and Control, but some subpopulation (%Naive, or %Th2) would. In other cases, total CD4+ Count or (CD4+ percent of total CD3+) may change, but within that altered CD4+ fraction, the subfractions of %Naive or %Th2 might not.
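The second scenario can be worked through with a few numbers. This is a sketch with invented counts for a hypothetical CD3+ -> CD4+ -> Naive hierarchy; the `freq` helper is not a FlowJo function, just an illustration of how the same Count column yields different answers depending on which ancestor is used as the denominator.

```python
# Hypothetical event counts from a gating hierarchy: CD3+ -> CD4+ -> Naive.
control = {"CD3": 50_000, "CD4": 30_000, "Naive": 12_000}
case = {"CD3": 50_000, "CD4": 20_000, "Naive": 8_000}


def freq(sample, child, ancestor):
    """Percent of `ancestor` events that fall in `child` (illustrative helper)."""
    return 100.0 * sample[child] / sample[ancestor]


for name, sample in (("control", control), ("case", case)):
    print(
        name,
        "CD4 % of CD3:", freq(sample, "CD4", "CD3"),
        "| Naive % of CD4:", freq(sample, "Naive", "CD4"),
        "| Naive % of CD3:", freq(sample, "Naive", "CD3"),
    )
```

Here CD4+ as a percent of CD3+ drops from 60 to 40, %Naive within CD4+ stays at 40 in both samples, and Naive as a percent of CD3+ drops from 24 to 16. Which of these differences is reported depends entirely on the chosen denominator.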





Posts: 19

Joined: Sat Nov 01, 2014 7:07 pm

Post Tue Jun 11, 2019 2:43 pm

Re: Gating output-addl analysis-issue of dependent pops

IANAL, I mean, I am not a statistician and would love to hear from a real one here.

I asked a question along the same lines to Holden M. at Flowtex this year when he showed some PLS analysis of CyTOF cell-proportion data that were fed raw into a model. With PLS, these concerns are even more substantial than with simple comparisons, since PLS relies on the collinearity structure of the data, and compositional data contain artificial collinearity.
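The artificial collinearity in compositional data can be illustrated with a small simulation. This is a sketch on simulated data, assuming three sibling populations whose absolute counts are generated independently; the lognormal parameters and the seed are arbitrary choices. Converting the counts to proportions of their row total (closure) is enough to produce negative correlations that were not present in the counts.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated, independent absolute counts for three sibling populations
# (rows = samples). Parameters are arbitrary, for illustration only.
counts = rng.lognormal(mean=8.0, sigma=0.3, size=(2000, 3))

# Closure: express each population as a fraction of the row total.
props = counts / counts.sum(axis=1, keepdims=True)

corr_counts = np.corrcoef(counts, rowvar=False)
corr_props = np.corrcoef(props, rowvar=False)

# Compare the pairwise (off-diagonal) correlations before and after closure.
off_diag = ~np.eye(3, dtype=bool)
print("counts:     ", np.round(corr_counts[off_diag], 2))
print("proportions:", np.round(corr_props[off_diag], 2))
```

With three exchangeable parts, the correlations between proportions come out near -0.5 even though the underlying counts are essentially uncorrelated.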

I believe this problem is very well recognized and has been worked out by the world outside the cytometry field :lol: since there are tons of compositional data in nature.

I was taught to apply an isometric log-ratio (ILR) transformation to percentage/proportion data and then use the transformed values in downstream analysis. However, for single comparisons (if you only care about one population, which is never the case in the real world of multi-parameter analysis, I guess :)) it is not necessary for most common statistical tests.
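For readers who have not seen the isometric log-ratio transformation, here is a minimal NumPy sketch. It assumes a Helmert-type orthonormal basis (one of many valid choices), strictly positive parts (zeros would need a replacement strategy before taking logs), and a composition that includes every sibling so the parts sum to the whole parent gate. The function names and the example values are illustrative, not taken from any particular package.

```python
import numpy as np


def closure(x):
    """Rescale each row so its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log parts minus the row mean of the logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)


def ilr_basis(d):
    """Helmert-type orthonormal basis, shape (d - 1, d); rows sum to zero."""
    basis = np.zeros((d - 1, d))
    for i in range(1, d):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))
    return basis


def ilr(x):
    """Map D-part compositions to D - 1 unconstrained coordinates."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ ilr_basis(x.shape[-1]).T


# Hypothetical fractions of a CD3+ parent: CD4+, CD8+, remaining T cells.
props = np.array([
    [0.60, 0.30, 0.10],
    [0.45, 0.40, 0.15],
    [0.70, 0.20, 0.10],
])
coords = ilr(props)
print(coords)
```

Three parts become two coordinates, so the sum-to-100% redundancy is removed. For a two-part split such as CD3+/CD3-, this construction reduces to a single coordinate, ln(p / (1 - p)) / sqrt(2), which is a scaled logit.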

To answer your question directly: I tend not to hand tables of raw stats to our collaborators, for this specific reason. When I am wearing a core director hat, I try to push users towards getting their own FlowJo seat. However, I am in the flow core, and my users have way more independence than a CyTOF core user whose samples are processed for them.

