
2020-Liu et al-Front Cell Dev Biol

<<

mleipold

Guru

Posts: 2156

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Mon May 18, 2020 4:48 pm

2020-Liu et al-Front Cell Dev Biol

"Recent Advances in Computer-Assisted Algorithms for Cell Subtype Identification of Cytometry Data"
Liu P, Liu S, Fang Y, Xue X, Zou J, Tseng G, Konnikova L
Front Cell Dev Biol. 2020, 8, 234
https://doi.org/10.3389/fcell.2020.00234

"To evaluate and compare the performance of the various unsupervised and supervised clustering tools, we applied these algorithms to a public dataset [Fluidigm_Maxpar Direct Immune Profiling Assay_201325_Gating Example_v1.0 (Public)] downloaded from Cytobank (Kotecha et al., 2010) for a total of 32 unsupervised and 6 supervised/semi-supervised clustering tools. This dataset included CyTOF data on 42 human peripheral blood mononuclear cells (PBMCs) samples, where we randomly chose one PBMC sample (HulmmProfiling_S1_PBMC_1) and applied it to all the various methods. After processing the data to filter out beads, dead cells and doublets, there were 184,968 cells that remained in the dataset. "

- appears to be https://premium.cytobank.org/cytobank/e ... nts/221569
<<

mleipold

Guru

Posts: 2156

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Mon May 18, 2020 8:32 pm

Re: 2020-Liu et al-Front Cell Dev Biol

I'm a bit confused by this in the section "A Practical Application":

"For those methods requiring a training dataset such as DeepCyTOF, CyTOF linear classifer and flowLearn, we set aside half of the cells in the dataset as the “training” set and used the remaining cells in the dataset as the “validation” dataset."

If I understood their dataset statement correctly, it consists of a single file of ~185K events. So they would be taking a single file, splitting it in half, and then using the first half as "training" data and the second half (of the same file) as "validation" data?
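
To make the question concrete, here is a minimal sketch of that kind of within-file split. The paper's quoted sentence does not say whether the halves were drawn randomly or sequentially, so the random shuffle, the seed, and the function name `split_half` are all assumptions for illustration, not the authors' code.

```python
import random

def split_half(n_events, seed=0):
    """Randomly split event indices 0..n_events-1 into two disjoint halves.

    Hypothetical helper: the random split is an assumption, since the
    paper does not state how the halves were chosen.
    """
    rng = random.Random(seed)
    indices = list(range(n_events))
    rng.shuffle(indices)
    half = n_events // 2
    # Sort only so the index lists are easy to read and reuse.
    return sorted(indices[:half]), sorted(indices[half:])

# 184,968 is the post-filtering cell count quoted from the paper.
train_idx, valid_idx = split_half(184_968)
print(len(train_idx), len(valid_idx))
```

Either way the split is drawn, both halves come from the same sample and the same acquisition, so the "validation" half is not an independent file.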
<<

dtelad11

Master

Posts: 112

Joined: Mon Oct 31, 2016 6:26 pm

Post Mon May 18, 2020 8:38 pm

Re: 2020-Liu et al-Front Cell Dev Biol

If I understand the paper correctly then yes, that is what they have done.
<<

sgranjeaud

Master

Posts: 79

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Mon May 18, 2020 11:15 pm

Re: 2020-Liu et al-Front Cell Dev Biol

Interesting work. I hope that the authors will upload the FCS files and the clustering results to the Cytobank community website.

Phenograph clustering seems visually correct. FlowSOM was reduced to 20 clusters, but I feel that a higher value (same as Phenograph?) would produce better results, as some populations are currently not detected. SPADE performs quite well, both visually and judging by its scores.

In my humble opinion, tuning has to be done even when the reference is supposedly known. I think that the default settings might make some algorithms perform poorly. But if an algorithm performs well, it should be (re)considered.

This is an interesting comparison for selecting approaches that could fit the current challenges. I feel that 185k cells is now standard for one sample in mass cytometry. The current challenge is working with dozens of samples. Scaling the most promising methods to millions of cells has not been published, to my knowledge. optSNE solves the visualization of millions of cells, but which approach scales up and addresses the clustering of millions of cells?

Typo: I don't think that (Ferrer-Font et al., 2019) should be cited as the original paper for Phenograph.
<<

dtelad11

Master

Posts: 112

Joined: Mon Oct 31, 2016 6:26 pm

Post Mon May 18, 2020 11:38 pm

Re: 2020-Liu et al-Front Cell Dev Biol

> FlowSOM was reduced to 20 clusters, but I feel that a higher value (same as Phenograph?) would produce better results, as some populations are currently not detected.

Agreed. FlowSOM was designed around over-clustering (see the original paper and Duò et al., F1000Res. 2018 Jul 26; the latter is a scRNAseq paper but still relevant in my opinion).

> optSNE solves the visualization of millions of cells, but which approach scales up and addresses the clustering of millions of cells?

Numerous papers have used meta-clustering successfully to address the problem of clustering a large number of cells.
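
For readers unfamiliar with the idea, here is a toy sketch of over-clustering followed by meta-clustering. It is not FlowSOM: plain k-means stands in for the first step and a simple centroid-merging loop stands in for the meta-clustering step. The synthetic 2D data, the function names, and all parameter values are assumptions chosen only to keep the example self-contained.

```python
import random
from math import dist

def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))

def over_cluster(points, k, n_iter=10):
    """Plain k-means with evenly spaced initial centroids (a simplification)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it was
                centroids[j] = mean_point(members)
    return centroids, [nearest(p, centroids) for p in points]

def meta_cluster(centroids, n_meta):
    """Merge the closest groups of centroids until n_meta groups remain."""
    groups = [[i] for i in range(len(centroids))]

    def center(group):
        return mean_point([centroids[i] for i in group])

    while len(groups) > n_meta:
        pairs = [(a, b) for a in range(len(groups))
                 for b in range(a + 1, len(groups))]
        a, b = min(pairs, key=lambda ab: dist(center(groups[ab[0]]),
                                              center(groups[ab[1]])))
        groups[a].extend(groups.pop(b))  # b > a, so index a stays valid
    # Map each original cluster index to its meta-cluster index.
    return {i: m for m, group in enumerate(groups) for i in group}

# Synthetic stand-in for cytometry events: three well-separated 2D blobs.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
          for cx, cy in blob_centers for _ in range(100)]

centroids, labels = over_cluster(points, k=9)   # deliberately too many clusters
meta = meta_cluster(centroids, n_meta=3)        # merge them back down
meta_labels = [meta[lab] for lab in labels]
print(len(set(labels)), "clusters merged into", len(set(meta_labels)), "meta-clusters")
```

The expensive per-event work happens only in the first step; the merging step operates on a handful of centroids, which is why this pattern is attractive for large event counts.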
