
Cluster identification




Posts: 3

Joined: Wed Nov 21, 2018 4:08 pm

Post Sun Jul 21, 2019 5:34 pm

Cluster identification

Hi all

Does anyone know of any packages out there that try to make the horrible task of cluster ID easier? Are there any packages that can provide a "best guess" of what a cell subset is, based on its marker expression profile?

Kind regards



Posts: 2156

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Mon Jul 22, 2019 3:44 pm

Re: Cluster identification

Hi Juan,

Some of it depends on what you mean by "best guess". In some cases, you need to gate/identify a population and then the software applies that info to the rest of the sample(s). In other cases, the program has been trained on prior data (even RNA-seq) and that model is applied to new samples.
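To make the first case concrete, here is a minimal sketch of transferring labels from manually gated populations to new cells. It uses a deliberately simple nearest-centroid rule (Euclidean distance to each gated population's mean marker vector). That rule is an assumption for illustration, not how any of the tools listed below work. The marker names, intensities, and the functions `centroids` and `assign` are all hypothetical, and intensities are assumed to be already transformed (e.g. arcsinh).

```python
import math
from statistics import mean
from typing import Dict, List

# A cell is a mapping of marker name -> transformed intensity (e.g. arcsinh-scaled).
Cell = Dict[str, float]


def centroids(gated: Dict[str, List[Cell]], markers: List[str]) -> Dict[str, List[float]]:
    """Mean marker vector of each manually gated reference population."""
    return {
        label: [mean([cell[m] for cell in cells]) for m in markers]
        for label, cells in gated.items()
    }


def assign(cell: Cell, cents: Dict[str, List[float]], markers: List[str]) -> str:
    """Label a new cell with the gated population whose centroid is closest (Euclidean)."""
    vec = [cell[m] for m in markers]
    return min(cents, key=lambda label: math.dist(vec, cents[label]))


# Hypothetical gated reference data and one new cell to label.
markers = ["CD3", "CD4", "CD8", "CD19"]
gated = {
    "CD4 T": [
        {"CD3": 4.0, "CD4": 3.5, "CD8": 0.2, "CD19": 0.1},
        {"CD3": 3.8, "CD4": 3.9, "CD8": 0.1, "CD19": 0.0},
    ],
    "B cell": [{"CD3": 0.1, "CD4": 0.3, "CD8": 0.0, "CD19": 3.7}],
}
cents = centroids(gated, markers)
print(assign({"CD3": 3.6, "CD4": 3.0, "CD8": 0.4, "CD19": 0.2}, cents, markers))  # CD4 T
```

Real tools replace the nearest-centroid step with far more robust models; the point here is only the "gate once, apply to the rest" shape of the workflow.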

Here's a (not exhaustive) list of some methods to get you started:
Scaffold/Statistical Scaffold: https://doi.org/10.1016/j.cell.2016.12.022
cytometree: https://doi.org/10.1002/cyto.a.23601
ACDC: https://doi.org/10.1093/bioinformatics/btx054
DeepCyTOF: https://doi.org/10.1093/bioinformatics/btx448
xCell: https://doi.org/10.1186/s13059-017-1349-1
HiPPO and PANDA: https://peerj.com/preprints/2188/
Mondrian process: https://arxiv.org/abs/1711.07673 (also related: https://doi.org/10.1101/414904)

To be honest: I haven't really integrated any of these into my workflows. I still mainly look at heatmaps and such to help with cell type assignment. Part of that is my relatively low level of competence in getting some of the packages/code to work really well, and part of it is simply the time it takes to evaluate the various options and decide which is best for my own use cases.
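For anyone doing the same heatmap-based assignment: the numbers behind a median heatmap are just per-cluster marker medians. The sketch below computes them and then makes a crude "best guess" by checking each cluster against hand-written +/- marker rules. The rule table, the single positivity cutoff of 1.0, the example values, and the function names are hypothetical choices for illustration; real panels usually need per-marker cutoffs.

```python
from statistics import median
from typing import Dict, List

Cell = Dict[str, float]  # marker name -> transformed intensity


def cluster_medians(cells: List[Cell], clusters: List[int], markers: List[str]) -> Dict[int, Dict[str, float]]:
    """Median of each marker within each cluster: the values a median heatmap shows."""
    by_cluster: Dict[int, List[Cell]] = {}
    for cell, cluster_id in zip(cells, clusters):
        by_cluster.setdefault(cluster_id, []).append(cell)
    return {
        cluster_id: {m: median([cell[m] for cell in members]) for m in markers}
        for cluster_id, members in sorted(by_cluster.items())
    }


def best_guess(profile: Dict[str, float], rules: Dict[str, Dict[str, bool]], cutoff: float = 1.0) -> str:
    """Return the cell type whose +/- marker rule the cluster profile matches best.

    A marker counts as positive when its median exceeds `cutoff` (a single,
    hypothetical threshold). Ties go to the rule listed first.
    """
    def fraction_matched(rule: Dict[str, bool]) -> float:
        hits = sum((profile[m] > cutoff) == positive for m, positive in rule.items())
        return hits / len(rule)

    return max(rules, key=lambda name: fraction_matched(rules[name]))


# Hypothetical example: three cells in two clusters, and a toy rule table.
markers = ["CD3", "CD4", "CD8", "CD19"]
cells = [
    {"CD3": 4.1, "CD4": 3.2, "CD8": 0.1, "CD19": 0.0},
    {"CD3": 3.9, "CD4": 3.6, "CD8": 0.3, "CD19": 0.1},
    {"CD3": 0.2, "CD4": 0.4, "CD8": 0.0, "CD19": 3.8},
]
rules = {
    "CD4 T": {"CD3": True, "CD4": True, "CD8": False, "CD19": False},
    "CD8 T": {"CD3": True, "CD4": False, "CD8": True},
    "B cell": {"CD3": False, "CD19": True},
}
medians = cluster_medians(cells, [1, 1, 2], markers)
for cluster_id, profile in medians.items():
    print(cluster_id, best_guess(profile, rules))
# 1 CD4 T
# 2 B cell
```

Scoring by the fraction of matched rules keeps short rules (like the two-marker B cell rule) comparable to longer ones, at the cost of ignoring how far above or below the cutoff each marker sits.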




Posts: 112

Joined: Mon Oct 31, 2016 6:26 pm

Post Mon Jul 22, 2019 3:48 pm

Re: Cluster identification

In addition to the packages Mike suggested, you might want to look into flowCL:


With that said, as Mike pointed out, the standard protocol seems to be to manually assign labels based on eyeballing biaxial plots, heat maps, or t-SNE/UMAP maps. There are several companies that offer this as a fee-for-service, including us (Astrolabe Diagnostics, el-ad@astrolabediagnostics.com), Cytapex (http://cytapex.com/), and AltraBio (https://www.altrabio.com/).



Posts: 3

Joined: Wed Nov 21, 2018 4:08 pm

Post Tue Jul 23, 2019 12:27 am

Re: Cluster identification

Hi Mike and David

Thank you so much for your help. Really useful. I'll try the packages suggested.




Posts: 4

Joined: Mon Jul 13, 2015 9:07 pm

Post Thu Jul 25, 2019 1:46 am

Re: Cluster identification

Hi all,

Interesting thread, thanks for linking tools. I'd also like to highlight Marker Enrichment Modeling (MEM), a tool my group developed for learning cell identity.

The approach in MEM goes beyond traditional heatmaps of median intensity (or staring at t-SNE plots) in a couple of ways: 1) the value assessed is an enrichment score that is platform independent (we show a comparison of fluorescence and mass cytometry in which the same enrichment scores are obtained), and 2) MEM also creates an ordered text label that is a "compressed digest" of the apparent identity / special features of that subset (and is generally quicker to read than a heatmap).
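As far as we can tell from the Nature Methods paper cited below, the per-marker score compares a population's median (MAG) and interquartile range (IQR) against a reference, roughly MEM = |MAG_pop - MAG_ref| + IQR_ref / IQR_pop - 1, negated when MAG_pop < MAG_ref, and then rescaled so the largest magnitude in the dataset is 10. The sketch below implements that reading only; treat it as an approximation and check it against the paper and the GitHub code before relying on it. The `iqr_floor` value of 0.5, the example intensities, and all function names are assumptions.

```python
from statistics import median, quantiles
from typing import Dict, List, Tuple


def mag_iqr(values: List[float], iqr_floor: float) -> Tuple[float, float]:
    """Median (MAG) and interquartile range of already-transformed intensities.

    The IQR is floored at `iqr_floor` (an assumed value) so a very tight
    population cannot make the IQR ratio below blow up.
    """
    q1, _, q3 = quantiles(values, n=4)
    return median(values), max(q3 - q1, iqr_floor)


def raw_mem(pop: List[float], ref: List[float], iqr_floor: float = 0.5) -> float:
    """Unscaled enrichment of one marker in a population relative to a reference."""
    mag_pop, iqr_pop = mag_iqr(pop, iqr_floor)
    mag_ref, iqr_ref = mag_iqr(ref, iqr_floor)
    score = abs(mag_pop - mag_ref) + iqr_ref / iqr_pop - 1
    # Direction comes from the median: lower than the reference means excluded.
    return -score if mag_pop < mag_ref else score


def scale_to_ten(raw: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Rescale population -> marker -> raw score so the largest |score| becomes 10."""
    top = max(abs(v) for scores in raw.values() for v in scores.values()) or 1.0  # avoid 0/0
    return {pop: {m: 10 * v / top for m, v in scores.items()} for pop, scores in raw.items()}


# Hypothetical arcsinh-transformed CD4 values for one cluster and for the reference.
cd4_cluster = [4.0, 4.2, 3.8, 4.1, 3.9]
cd4_reference = [0.5, 1.0, 2.0, 3.0, 0.0]
print(raw_mem(cd4_cluster, cd4_reference))  # 6.5
```

The IQR term is what rewards a population for being tighter than the reference, so two subsets with the same median can still get different scores.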

MEM code and examples on GitHub here:

This is from a class we teach on cell identification approaches, and it has a lot more than the minimum needed for MEM, including multiple R Markdown files, an install script for MEM, UMAP, t-SNE, and FlowSOM, and a few small FCS files. I believe it's about 20 MB in total, with 3 sets of example FCS files from publications.

This version of MEM produces a few different ways of viewing the data, including:
- Heatmaps of the MEM enrichment scores, median expression, and interquartile ranges for all populations
- A human- and machine-readable text "MEM label", as in:
"1 : UP CD4+4 CD3+4 • DN CD16-9 CD8-7 CD11c-5 HLA-DR-5 CD69-3"
This is for CD4 T cells. The scores go from +10 (max enriched) to 0 (no difference) to -10 (max excluded). (For those who have followed MEM: there is now also a "reference-less" version of MEM that can be tested in the examples on Github.)
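The label format itself is easy to reproduce once you have scores on the -10 to +10 scale. A minimal sketch, assuming scores are rounded to integers, that markers below a hypothetical cutoff of |3| are dropped, and that UP and DN markers are each ordered by magnitude with ties kept in input order. The input scores are invented so that the output matches the CD4 T cell label above, and `mem_label` is a hypothetical name, not a function from the MEM package.

```python
from typing import Dict


def mem_label(pop_id: str, scores: Dict[str, float], min_abs: int = 3) -> str:
    """Build an 'UP ... • DN ...' label from per-marker scores on the -10..+10 scale.

    Scores are rounded to integers (Python's round() uses round-half-to-even);
    markers with |score| < min_abs (a hypothetical cutoff) are left out.
    """
    kept = {m: round(s) for m, s in scores.items() if abs(round(s)) >= min_abs}
    # sorted() is stable, so markers with equal scores keep their input order.
    up = sorted((m for m in kept if kept[m] > 0), key=lambda m: -kept[m])
    dn = sorted((m for m in kept if kept[m] < 0), key=lambda m: kept[m])
    up_text = " ".join(f"{m}+{kept[m]}" for m in up)
    dn_text = " ".join(f"{m}{kept[m]}" for m in dn)  # negative ints already print their "-"
    return f"{pop_id} : UP {up_text} • DN {dn_text}"


# Invented scores chosen to reproduce the CD4 T cell label above.
scores = {"CD4": 4.2, "CD3": 3.9, "CD16": -8.6, "CD8": -7.1,
          "CD11c": -5.0, "HLA-DR": -4.8, "CD69": -3.4, "CD19": 0.4}
print(mem_label("1", scores))
# 1 : UP CD4+4 CD3+4 • DN CD16-9 CD8-7 CD11c-5 HLA-DR-5 CD69-3
```

Because the label is plain text with a fixed layout, it can be parsed back into marker/score pairs, which is what makes it machine readable as well as human readable.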

This code also shows examples of comparing MEM labels generated from FlowSOM analysis of UMAP vs. FlowSOM analysis of t-SNE vs. expert gates (i.e., do we get the same phenotype and MEM label if we analyze 3 different ways?).
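One simple way to put a number on "do we get the same label" is the root-mean-square deviation (RMSD) between two score vectors over the markers they share, where 0 means identical labels. This is a sketch under that assumption, not necessarily the comparison the GitHub examples implement; `label_rmsd` is a hypothetical name and the scores below are made up.

```python
import math
from typing import Dict


def label_rmsd(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Root-mean-square deviation between two MEM score vectors over shared markers."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("the two labels share no markers")
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared) / len(shared))


# Made-up rounded scores for the "same" population found three different ways.
labels = {
    "FlowSOM on UMAP": {"CD4": 4, "CD3": 4, "CD8": -7, "CD16": -9},
    "FlowSOM on t-SNE": {"CD4": 4, "CD3": 3, "CD8": -6, "CD16": -9},
    "expert gate": {"CD4": 4, "CD3": 4, "CD8": -7, "CD16": -9},
}
names = list(labels)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(f"{first} vs {second}: RMSD = {label_rmsd(labels[first], labels[second]):.2f}")
```

Restricting to shared markers keeps the comparison defined when two analyses used slightly different panels, but it also means a marker missing from one label silently drops out of the score.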

The original publication for MEM is:
Diggins et al., Nature Methods 2017
Characterizing cell subsets using marker enrichment modeling

There is also a protocol walking through a few examples:
Diggins et al., Current Protocols in Cytometry 2018
Generating Quantitative Cell Identity Labels with Marker Enrichment Modeling (MEM)

You can see uses of MEM and comparisons of MEM labels in recent publications, including:
Greenplate et al., Cancer Immunology Research 2019
Computational Immune Monitoring Reveals Abnormal Double-Negative T Cells Present across Human Tumor Types

Feel free to ping me here or by email if anyone has questions or needs help getting it working. There's a new "try your data" script we're developing if you want to test it out on your files.


-Jonathan Irish @ Vanderbilt
