

Forum rules
Please be as geeky as possible. Reference, reference, reference.
Also, please note that this is a mixed bag of math-gurus and mathematically challenged, so choose your words wisely :-)



Posts: 23

Joined: Thu Oct 31, 2013 7:58 pm

Post Fri Nov 08, 2013 11:33 pm


The earliest integrative analysis approach for CyTOF data (also applicable to any flow cytometry data) is SPADE (Spanning-tree Progression Analysis of Density-normalized Events) [1], invented in the Nolan lab. It has been used in several published CyTOF papers [2-4]. SPADE is available as part of DVS Cytobank (https://dvs.cytobank.org), which has a simple, intuitive user interface. SPADE can also be run as a Cytoscape plug-in, available from the Nolan lab website (https://github.com/nolanlab/spade/wiki). This implementation is better for large data sets, but is a bit trickier to install and run.

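SPADE is a clustering algorithm. Events are first down-sampled in a density-dependent way (events in crowded regions are removed preferentially), so that low-density populations are not swamped by abundant ones and can still form their own clusters. The down-sampled events are clustered, every original event is then assigned to its nearest cluster, and the algorithm displays the relatedness of the clusters as a minimum spanning tree. This "tree" can then be colored by expression level of any given marker, or by fold-change of any marker over control.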
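To make the down-sampling step a bit more concrete, here is a minimal pure-Python sketch. It loosely follows the keep-probability rule as I understand it from [1]: drop events below an outlier density, keep everything up to a target density, and keep denser events with probability target/density. The function and parameter names are made up for illustration, it uses plain Euclidean distance and a brute-force neighbor count, and it is not the code used by Cytobank or the Cytoscape plug-in. The clustering and tree-building steps are left out.

```python
import math
import random


def local_density(events, radius):
    """For each event, count the other events within `radius` (Euclidean, brute force)."""
    counts = []
    for i, p in enumerate(events):
        counts.append(sum(1 for j, q in enumerate(events)
                          if i != j and math.dist(p, q) <= radius))
    return counts


def density_downsample(events, radius, outlier_density, target_density, seed=0):
    """Density-dependent down-sampling (hypothetical names and parameters).

    density <= outlier_density                  -> dropped as a likely outlier
    outlier_density < density <= target_density -> always kept (sparse regions)
    density > target_density                    -> kept with prob target_density / density
    """
    rng = random.Random(seed)
    kept = []
    for event, density in zip(events, local_density(events, radius)):
        if density <= outlier_density:
            continue
        if density <= target_density or rng.random() < target_density / density:
            kept.append(event)
    return kept


# Invented 2-marker data: 200 events in a dense population, 10 in a rare one.
data_rng = random.Random(42)
dense = [(data_rng.gauss(0, 0.2), data_rng.gauss(0, 0.2)) for _ in range(200)]
rare = [(5 + data_rng.gauss(0, 0.2), 5 + data_rng.gauss(0, 0.2)) for _ in range(10)]

kept = density_downsample(dense + rare, radius=0.5, outlier_density=0, target_density=10)
kept_rare = [e for e in kept if e[0] > 2.5]
print(f"dense kept: {len(kept) - len(kept_rare)}/200, rare kept: {len(kept_rare)}/10")
```

In this toy run the rare events sit in a sparse region, so they are typically all kept, while the dense population is thinned to a small fraction. Because the keep decision is random, a different `seed` keeps a different subset, which is one reason repeated runs give somewhat different trees.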

One caveat of SPADE is that its results are highly dependent on the settings (initial file gating, down-sampling percentage, target cluster number, and markers used for clustering). Even with the same settings, multiple SPADE runs will give somewhat different trees, because the down-sampling is random. Tips include:
-Pre-gate your files to at least eliminate debris, doublets, and dead cells.
-Use only a minimal set of markers for clustering, and make sure they are well-resolved. Clustering on very dim markers is dangerous.
-Perform multiple SPADE runs to gain confidence in the analysis.
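The pre-gating tip amounts to a simple filter applied before clustering. This is a hypothetical sketch, assuming a DNA intercalator channel, an event-length parameter, and a cisplatin viability stain; the channel names (`DNA`, `Event_length`, `Cisplatin`) and all cut-offs are placeholders on an assumed arcsinh-like scale. In practice you would draw these gates on biaxial plots in FlowJo or Cytobank rather than hard-code numbers.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """A 1-D rectangular gate on one channel (hypothetical thresholds)."""
    channel: str
    low: float
    high: float

    def passes(self, event):
        return self.low <= event[self.channel] <= self.high


def pre_gate(events, gates):
    """Keep only events that fall inside every gate."""
    return [e for e in events if all(g.passes(e) for g in gates)]


# Placeholder channels and cut-offs; set real ones from your own plots.
GATES = [
    Gate("DNA", 4.0, 6.0),         # too low: debris; too high: doublets
    Gate("Event_length", 10, 30),  # very long events: likely doublets
    Gate("Cisplatin", 0.0, 3.0),   # high cisplatin uptake: dead cells
]

events = [
    {"DNA": 5.0, "Event_length": 20, "Cisplatin": 1.0},  # live singlet
    {"DNA": 1.0, "Event_length": 12, "Cisplatin": 0.5},  # debris
    {"DNA": 7.5, "Event_length": 40, "Cisplatin": 1.2},  # doublet
    {"DNA": 5.2, "Event_length": 18, "Cisplatin": 4.5},  # dead cell
]
clean = pre_gate(events, GATES)
print(len(clean))  # 1
```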


1. Qiu, P., Simonds, E. F., Bendall, S. C., Gibbs, K. D., Bruggner, R. V., Linderman, M. D., et al. (2011). Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE. Nature Biotechnology, 29(10), 886–891. doi:10.1038/nbt.1991

2. Bendall, S. C., Simonds, E. F., Qiu, P., Amir, E. A. D., Krutzik, P. O., Finck, R., et al. (2011). Single-Cell Mass Cytometry of Differential Immune and Drug Responses Across a Human Hematopoietic Continuum. Science, 332(6030), 687–696. doi:10.1126/science.1198704

3. Bodenmiller, B., Zunder, E. R., Finck, R., Chen, T. J., Savig, E. S., Bruggner, R. V., et al. (2012). Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators. Nature Biotechnology, 30(9), 857–866. doi:10.1038/nbt.2317

4. Horowitz, A., Strauss-Albee, D. M., Leipold, M., Kubo, J., Nemat-Gorgani, N., Dogan, O. C., et al. (2013). Genetic and environmental determinants of human NK cell diversity revealed by mass cytometry. Science Translational Medicine, 5(208), 208ra145. doi:10.1126/scitranslmed.3006702



Posts: 17

Joined: Tue Nov 26, 2013 1:55 pm

Post Thu Sep 18, 2014 2:29 pm


Hi All,

We would like to use SPADE for our analysis and have generated lots of trees using a combination of node numbers and downsampling settings...now the hard part seems to be choosing the 'best' tree.

If anyone can comment on the basic criteria they use to decide whether a tree is good or not that would be really helpful!

Also, in response to the original cytoforum post about SPADE, we would like to know why it is 'dangerous' to cluster using markers with low signal intensity? It seems logical to cluster using all markers so that the process is completely unsupervised/unbiased.





Posts: 23

Joined: Thu Oct 31, 2013 7:58 pm

Post Thu Sep 18, 2014 4:19 pm


The various permutations of SPADE trees that are created from multiple runs, with or without settings changes, should be evaluated for consensus. In other words, separation of major branches should be similar among most of them, even if the spatial arrangement varies. I wouldn't pick a tree that looks completely different from all the other runs. I *would* look for clean separation of lineages, and if there are particular cells you're interested in, say NK cells, I would look for good clustering of those, and separation of sub-lineages.
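One way to put a number on "consensus" between runs is to compare which events end up together, rather than comparing node numbers (which are arbitrary labels that can change from run to run). The sketch below computes the plain Rand index between two runs' per-event node assignments. It assumes you can export a node label for the same events from each run, and that you subsample events first, since it compares every pair. The Rand index is my choice here, not something SPADE reports; the adjusted Rand index (e.g. scikit-learn's `adjusted_rand_score`) additionally corrects for chance agreement.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of event pairs on which two clusterings agree.

    A pair agrees if both runs put the two events in the same node, or both
    put them in different nodes. Node IDs are never compared directly.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same events")
    if len(labels_a) < 2:
        raise ValueError("need at least two events")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


run1 = [1, 1, 2, 2, 3, 3]
run2 = [28, 28, 5, 5, 9, 9]  # same grouping, different node numbers
run3 = [1, 1, 1, 2, 2, 2]    # different grouping
print(rand_index(run1, run2))            # 1.0
print(round(rand_index(run1, run3), 3))  # 0.667
```

A score of 1.0 means the two runs partition the events identically even if every node number differs; lower scores mean more events changed company between runs.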

The reason not to use all markers in clustering, in totally non-mathematical terms, is that SPADE gets confused. It will attempt to separate clusters based on differences that are mostly noise, if given channels that contain mostly noise and little signal...and in the process may miss differences based on real signal in the "good" channels. If someone wants to say this more scientifically, I will not be offended. :)
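A slightly more scientific version of "SPADE gets confused": clustering compares events by a distance computed across all clustering channels, so every noise-only channel adds spread to both within-population and between-population distances, and the one real difference gets diluted. The toy example below shows this with two made-up populations that differ in a single well-resolved marker. It uses Euclidean distance and simulated data, not SPADE's own clustering, so treat it as an illustration of the principle only.

```python
import math
import random
from itertools import combinations


def make_events(n_per_pop, n_noise, rng):
    """Two invented populations that differ only in channel 0.

    Channel 0 is the 'good' marker (A near 0, B near 4, roughly arcsinh units);
    channels 1..n_noise are pure noise with the same distribution in both.
    """
    events, labels = [], []
    for label, centre in (("A", 0.0), ("B", 4.0)):
        for _ in range(n_per_pop):
            good = rng.gauss(centre, 0.3)
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            events.append([good] + noise)
            labels.append(label)
    return events, labels


def separation_ratio(events, labels):
    """Mean between-population distance divided by mean within-population distance."""
    within, between = [], []
    for i, j in combinations(range(len(events)), 2):
        d = math.dist(events[i], events[j])
        (within if labels[i] == labels[j] else between).append(d)
    return (sum(between) / len(between)) / (sum(within) / len(within))


rng = random.Random(0)
ratios = {}
for n_noise in (0, 5, 20):
    ev, lab = make_events(30, n_noise, rng)
    ratios[n_noise] = separation_ratio(ev, lab)
    print(n_noise, "noise channels -> separation ratio", round(ratios[n_noise], 2))
```

With no noise channels the two populations are far apart relative to their internal spread; as noise channels are added the ratio falls toward 1, meaning the populations become hard to tell apart by distance alone even though the good marker is unchanged.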




Posts: 4300

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Thu Sep 18, 2014 4:23 pm


Hi David,

Some things to keep in mind with SPADE:

1. In order to have identical trees, you have to have all the files in the same SPADE run. Even the same files run multiple times using the same input parameters will give slightly different trees each run. In other words, Node 28 in Run #1 might be a monocyte node, whereas in Run #2, Node 28 might be a B cell node. Currently, SPADE does not allow you to build one tree and then feed additional files through that "template" tree.

2. Keep the number of cells in each file in mind when you choose the target node number. If you have only a small number of cells in each file, you don't want to choose too large a node number. At the very least, this will give you a large number of nodes that contain only 1 cell, which is seldom useful.

3. Once I have built a SPADE tree, I usually go in and look for some of the rarer cell populations that may not be well-described by my panel. For instance, PBMCs often *do* contain small numbers of basophils, mDCs, and pDCs. My panel is more T-cell focused, which basically makes these cells Lin- for most of the 33 markers in my panel (CD123, HLA-DR, and CD11c are about the only ones that help with the clustering). So these populations are infrequent *and* not well-described, and therefore they are often lumped together into a single node. Or, at best, basophils are in one node and the DCs in another, even though CD123 and CD11c should separate mDCs from pDCs. You can investigate this by clicking on a node and looking at the bivariate plot next to the SPADE tree. If the tree does a good job of separating those sorts of populations, then it is usually a tree I can have confidence in.

If it doesn't separate those well, I move on to another rare or ill-described population, such as plasmablasts, TCRgd+ cells, CD85j+ CD8+ cells, or CXCR5+ T cells, and repeat the bivariate QC.

4. The danger of using low-intensity markers in your tree-building is that they can give you clusters without meaning. There isn't a way to set a clustering threshold, like "any signal below arcsinh=1, count as zero". So that parameter's influence on the tree-building is driven as much by noise/background as by "true-but-dim" signal.

I wouldn't say that you can *never* use such markers. Just that you have to be *extremely* careful in their interpretation, especially if the frequency of the positive cells is low.

For example, PD-1 signal is often somewhat dim, but the pos/neg populations can usually be resolved in FlowJo/Cytobank. ICOS, on the other hand, is super-dim and usually rare, so it might not add much to the tree-building. At the very least, you would have to be careful interpreting the nodes.

As a check, B cells are biologically negative for ICOS (they make ICOS-L): if the intensity of the ICOS signal on your T cells is similar to the (background) intensity on the B cells, then I would argue that your T cell signal is not interpretable.

I would note: this sort of background-driven clustering matters most for these low-and-rare markers. However, it is also one of the reasons why you need to gate out debris, doublets, and dead cells as much as possible, to keep their background signals from "poisoning" the tree. Similarly, if one of your markers is screwy in your files, then you need to take similar care in interpretation. For instance, you could have high background in a channel in your file(s) because an antibody was used above its optimal titer, or because you hadn't washed the cells enough that day. This raises the background level for all cells, and therefore can affect how that marker impacts your tree. As one example, CD45RA staining is often streakier than many antibodies, so you could wind up with a tree where the CD45RA signal on monocytes or memory T cells is much higher than it should be for at least one file.
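For anyone who wants to try the checks in point 4 outside SPADE, here is a minimal sketch. It applies the usual arcsinh transform (a cofactor of 5 is the common convention for CyTOF data), shows the kind of "count anything below arcsinh=1 as zero" rule that SPADE itself does not offer, and compares a marker's median on the population of interest against a biologically negative reference population, as in the ICOS on T cells vs. B cells example. All function names, thresholds, and the `min_gap` cut-off are hypothetical, and the counts in the example are invented.

```python
import math
from statistics import median

COFACTOR = 5.0  # common arcsinh cofactor for CyTOF data


def arcsinh_transform(raw_counts, cofactor=COFACTOR):
    """Transform raw ion counts to the arcsinh scale."""
    return [math.asinh(x / cofactor) for x in raw_counts]


def zero_below(values, threshold=1.0):
    """Hypothetical rule: set transformed values below `threshold` to 0.
    SPADE has no such option; shown only to make the idea concrete."""
    return [v if v >= threshold else 0.0 for v in values]


def clears_background(test_raw, negative_raw, min_gap=0.5):
    """Crude check: is the median arcsinh signal on the test population at least
    `min_gap` above the median on a biologically negative reference population?
    `min_gap` is an arbitrary, hypothetical cut-off."""
    gap = median(arcsinh_transform(test_raw)) - median(arcsinh_transform(negative_raw))
    return gap >= min_gap


# Invented ICOS counts: T-cell signal about the same as B-cell background.
icos_t_dim = [3, 4, 6, 2, 5]
icos_b_background = [3, 5, 4, 2, 6]
# Invented, clearly brighter T-cell staining for comparison.
icos_t_bright = [40, 55, 60, 35, 70]

print(zero_below(arcsinh_transform([0, 2, 10, 100])))
print(clears_background(icos_t_dim, icos_b_background))     # False
print(clears_background(icos_t_bright, icos_b_background))  # True
```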

