Update of Rphenograph and Rtsne

<<

sgranjeaud

Master

Posts: 79

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Thu Jun 11, 2020 5:18 pm

Hi,

I hope everybody is safe and has started to work again.

I have finished implementing some improvements in Rphenograph; the features are listed below.
For the geeks who want to test and report problems, the code is available at
https://github.com/i-cyto/Rphenograph

I implemented the really delicious optsne in Rtsne some weeks ago and I am waiting for the approval of the author/maintainer of Rtsne.
For the geeks who want to test and report problems, the code is available at
https://github.com/SamGG/Rtsne

Please note that I am not a maestro of C coding or OpenMP, so use these packages at your own risk.
Installation commands are at the end of this message.
I will integrate both packages in a new release of cytofkit by the end of June.

On my laptop, for a dataset of 300 k datapoints and 11 dimensions (part of the flow18 dataset from Belkina et al.), using 30 nearest neighbors:
optsne is computed in 960 sec, similar to the Python implementation.
Rphenograph is computed in 210 sec, similar to PARC (195 sec), whereas the original Phenograph implementations take 440 sec in R and 1096 sec in Python. The original Python Phenograph carries out 32 Louvain iterations, but as far as I know this information is not available in the igraph implementation of Louvain. PARC carries out 5 iterations of the Leiden algorithm by default. When Python Phenograph is set up to carry out 5 iterations of Leiden, it takes nearly the same amount of time as PARC. I leave you the pleasure of checking which result you prefer.
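
To put the clustering timings quoted above side by side, here is a small Python sketch that expresses each one relative to the updated Rphenograph. The numbers are the ones reported in this post; the dictionary keys are only labels chosen for illustration.

```python
# Clustering timings in seconds, as reported in the post (300 k points, 30 NN).
timings_sec = {
    "Rphenograph (updated)": 210,
    "PARC": 195,
    "Phenograph (original R)": 440,
    "Phenograph (original Python)": 1096,
}

baseline = timings_sec["Rphenograph (updated)"]

# Ratio > 1 means slower than the updated Rphenograph, < 1 means faster.
ratios = {name: round(sec / baseline, 2) for name, sec in timings_sec.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```

These ratios come from a single laptop run, so they are only indicative.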

Note also that Jan Stuchly wrote an Rphenograph implementation based on the Annoy library in which the whole process is fully parallelized, which is not the case for mine.
https://github.com/stuchly/Rphenoannoy

I thank Etienne, Tom, Josef Spidlen, Chris Ciccolella, James Melville and Jan Stuchly for our exchanges, and I thank my employer, Inserm.

Rphenograph new features
* S. Thomas Kelly added pruning and graph clustering methods. This was reworked to call igraph functions for simplifying the graph.
* Etienne K. Becht added approximate HNSW nearest neighbors for speed. The RcppHNSW package is as fast as Python on 1 core, and multi-core support is currently under development.
* Etienne K. Becht noticed that some points are not reported because they don't share any neighbors with their neighbors. This is now integrated in the C code.
* Louvain is the default graph clustering method. Any clustering function of the (r)igraph package can be specified. Leiden is not yet available.
* S. Granjeaud improved the Jaccard_coefficient function by pre-sorting nearest-neighbor indices in the C code. The computation now takes only a few seconds for a dataset of 300 k datapoints and 30 nearest neighbors.
* The original implementation of the Jaccard coefficient removes the two cells themselves when looking at the intersection of their neighbors. You can decide to keep them.
* A parameter allows reporting only some of the k NN instead of all of them. If k is set to 30, the Jaccard coefficient is still computed on the 30 NN, but only the 10 nearest could be reported, for example. This lowers the number of edges in the graph, which speeds up the clustering and allows a finer clustering.
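
For readers who prefer code to words, the Jaccard-related items above can be sketched in a few lines of Python. This is only an illustration of the idea, not the C code of the package: the function names, the `n_report` and `include_self` arguments, and the toy kNN table are all hypothetical, and the sketch assumes each kNN row lists neighbors nearest first and does not contain the point itself.

```python
def shared_count(a, b):
    """Count values common to two ascending-sorted lists with one linear merge."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def jaccard_edges(knn, n_report=None, include_self=False):
    """Return (i, j, weight) edges with weight = |N(i) & N(j)| / |N(i) | N(j)|.

    knn[i] is the neighbor list of point i (nearest first, i itself excluded).
    include_self=True adds each point to its own neighbor set (assumed meaning
    of "keeping the two cells"). n_report limits the edges emitted per point to
    its n_report nearest neighbors; the sets still use all k neighbors.
    """
    # Pre-sort every neighbor set once so each intersection is a linear merge.
    neighbor_lists = []
    for i, row in enumerate(knn):
        members = set(row)
        if include_self:
            members.add(i)
        else:
            members.discard(i)
        neighbor_lists.append(sorted(members))

    edges = []
    for i, row in enumerate(knn):
        reported = row if n_report is None else row[:n_report]
        for j in reported:
            if j == i:
                continue
            shared = shared_count(neighbor_lists[i], neighbor_lists[j])
            if shared > 0:  # pairs sharing nothing produce no edge
                union = len(neighbor_lists[i]) + len(neighbor_lists[j]) - shared
                edges.append((i, j, shared / union))
    return edges


def points_without_edges(n_points, edges):
    """List points that appear in no edge, i.e. would be missing from the graph."""
    seen = set()
    for i, j, _ in edges:
        seen.add(i)
        seen.add(j)
    return [p for p in range(n_points) if p not in seen]


# Toy example with k = 2: two tight triplets and a point 6 between them.
knn = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4], [0, 3]]
edges = jaccard_edges(knn)
print(len(edges), points_without_edges(len(knn), edges))
```

In this toy table, point 6 shares no neighbor with either of its neighbors, so it gets no edge at all, which is the situation described in the third item. Passing `n_report=1` keeps one edge per point while the weights are still computed from both neighbors.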

Windows installations
To ease Windows installation and avoid the need to install Rtools, I released Windows binaries for 64-bit installations of R (3.6.x, 4.0.x):
install.packages("https://github.com/i-cyto/Rphenograph/releases/download/Rphenograph_0.99.1.9003/Rphenograph_0.99.1.9003.zip", repos = NULL, type = "win.binary")
install.packages("https://github.com/SamGG/Rtsne/releases/download/v0.15.0.9001/Rtsne_0.15.0.9001.zip", repos = NULL, type = "win.binary")

Linux/Mac installations
The following commands should work. If not, open an issue on GitHub. I don't have access to such machines, so my help will be limited.
devtools::install_github("i-cyto/Rphenograph")
devtools::install_github("SamGG/Rtsne")
<<

vtosevski

Contributor

Posts: 44

Joined: Wed Nov 20, 2013 12:50 pm

Location: Zurich, Switzerland

Post Wed Jun 24, 2020 8:02 am

Re: Update of Rphenograph and Rtsne

Hi Sam,

Thanks for sharing this information; we will certainly try some of the improvements and report feedback. I also thank your employer for supporting you in this work, and your peers and colleagues who shared some of the burden that comes with it.

Seeing how wide your experience is, across different platforms and implementations, I wanted to ask how the graph-based clustering methods implemented in the scRNAseq domain compare to the work you outlined here. Are they "better", the same, "worse"? I once ran a clustering of CyTOF data within the Seurat framework, just as a toy example, and of course it is possible. I am not sure, however, that it is faster... The potential benefit I see is more active development (perhaps?). What are your thoughts?

Best,
Vinko
<<

sgranjeaud

Post Wed Jun 24, 2020 9:24 am

Re: Update of Rphenograph and Rtsne

Hi Vinko,

Thanks for your feedback. My experience does not go very far.

I know Seurat includes graph-based clustering, but I have never tested it. The clustering algorithm used is the smart local moving (SLM) algorithm for large-scale modularity-based community detection, but SLM's authors now recommend using the Leiden algorithm. Nevertheless, it might be tempting to test SLM for speed and accuracy.
https://github.com/satijalab/seurat/blo ... imizer.cpp
http://www.ludowaltman.nl/slm/

IMHO, tools developed in the field of scRNAseq address a lower number of cells than those in the field of cytometry, so their scaling up to millions of cells must be tested. This is an affordable task.

Answering the question of the accuracy of the resulting clustering requires a ground truth, which is more difficult to find. There are the classical test files (as used by Weber 2016), but I feel they are less challenging than the data currently produced. So I encourage experimenters to release their data with annotation in order to improve analytical tools. And I thank Mike for caring about that.
viewtopic.php?f=1&t=1937&p=5080
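
When an annotated ground truth is available, one way to score a clustering against it is the adjusted Rand index (ARI). The sketch below is a plain-Python illustration under that assumption; ARI is chosen here only as an example and is not necessarily the metric used in the comparison cited above, and the function name is hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to label names); values near 0 mean
    agreement no better than chance; negative values are possible.
    """
    if len(truth) != len(predicted):
        raise ValueError("both labelings must cover the same cells")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: nothing to disagree on

    # Pairs of cells grouped together in both, in truth only, in predicted only.
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = together_truth * together_pred / total_pairs
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (together_both - expected) / (maximum - expected)


# Same partition with swapped label names, then a partition that cuts across it.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the index depends only on which cells are grouped together, cluster numbers from Phenograph, PARC or Seurat do not need to be matched to the annotation labels beforehand.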

If you are looking for speed, have a look at PARC (Python code based on Leiden) and the recently advertised FastPG (based on parallel Louvain)
viewtopic.php?f=10&t=1591&p=4394
viewtopic.php?f=10&t=1951&p=5096

For a much better answer than mine, M. Robinson, Y. Saeys or R. Gottardo should be contacted, because their groups really work in both fields.

Best,
Samuel
<<

dtelad11

Master

Posts: 107

Joined: Mon Oct 31, 2016 6:26 pm

Post Wed Jun 24, 2020 12:43 pm

Re: Update of Rphenograph and Rtsne

Seurat's clustering is conceptually similar to Phenograph. I have run it on CyTOF data in the past (100,000 cells from a single donor PBMC) and got comparable results.
<<

sgranjeaud

Post Wed Jun 24, 2020 5:24 pm

Re: Update of Rphenograph and Rtsne

Dear El-ad,
Thanks a lot for sharing your experience.
A colleague also confirmed this.
