automated clustering for large data sets (ACCENSE)


bluejek128

Participant

Posts: 13

Joined: Tue Feb 03, 2015 4:21 pm

Location: NYC

Post Thu Feb 05, 2015 9:36 pm

automated clustering for large data sets (ACCENSE)

Hi All

I'm writing to ask about applying ACCENSE to large data sets (>100,000 points) obtained by CyTOF. Running ACCENSE on downsampled data of 20,000 points yielded a reasonable number of clusters, but it produced over 240 clusters when applied to my complete data set of approx. 350,000 points. Since it is critical for us to run many data points in order to identify rare cell types, etc., does anyone know how to effectively apply ACCENSE to automatically cluster large data sets? The supporting materials of the Chakraborty et al. paper mention "projecting additional points in the t-SNE map", but I didn't quite get how to apply that.

Thanks!

Joel
<<

petterbrodin

Participant

Posts: 11

Joined: Thu Nov 28, 2013 7:32 am

Post Fri Feb 06, 2015 4:57 am

Re: automated clustering for large data sets (ACCENSE)

Hello Joel.
As you know, the ability of ACCENSE to identify clusters depends on the separation between those clusters. This in turn depends on the markers included in the analysis and on how variable their expression is across the cells you analyze.
I suggest the following:
1) Lower the significance level of the k-means clustering if you haven't already, as this will reduce the number of clusters found.
2) If you're still not finding the clusters you expect, then resolution is the issue and you need to go over the markers included in your analysis. Remove any marker that is invariable, as these provide only noise and no signal, and pre-gate, if you haven't already, to exclude cell populations you are not interested in.
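A minimal sketch of how point 2 could be checked before running ACCENSE, assuming the data are available as a cells x markers NumPy array. The function name, the marker names, the arcsinh cofactor of 5 and the `min_sd` cutoff are illustrative assumptions, not ACCENSE settings; the cutoff in particular should be chosen by looking at your own data.

```python
import numpy as np

def drop_invariant_markers(X, names, cofactor=5.0, min_sd=0.1):
    """Keep only markers whose arcsinh-scaled standard deviation exceeds min_sd.

    X is a cells x markers array of raw intensities. The cofactor and min_sd
    defaults are assumptions for illustration, not values taken from ACCENSE.
    """
    Y = np.arcsinh(np.asarray(X, dtype=float) / cofactor)
    sd = Y.std(axis=0)              # spread of each marker across all cells
    keep = sd > min_sd              # boolean mask over markers
    kept_names = [n for n, k in zip(names, keep) if k]
    return Y[:, keep], kept_names

# Synthetic example: two variable markers and one nearly constant one.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.gamma(2.0, 20.0, 1000),
    rng.gamma(2.0, 50.0, 1000),
    3.0 + rng.normal(0.0, 0.01, 1000),
])
Y, kept = drop_invariant_markers(X, ["CD3", "CD19", "flat"])
print(kept)
```

A marker flagged this way is only a candidate for removal: a marker can have low overall variance and still separate a rare population, so it is worth looking at its histogram before dropping it.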

The down- and upsampling strategy you refer to from the paper was used only to get around the upper limit (~25,000 cells) of the original t-SNE implementation. That limit is no longer an issue with the standalone ACCENSE GUI, which runs Barnes-Hut SNE.
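For readers who still want to cluster a subset and then label the remaining cells, here is a generic sketch of one simple way to do it: assign every cell to the nearest cluster centroid in marker space. This is an assumption-laden illustration, not the projection procedure from the paper's supporting materials and not something ACCENSE does; the function name and chunk size are hypothetical.

```python
import numpy as np

def assign_to_nearest_centroid(X_full, X_sub, labels_sub, chunk=5000):
    """Label each row of X_full with the cluster whose subset centroid is closest."""
    labels_sub = np.asarray(labels_sub)
    cluster_ids = np.unique(labels_sub)
    centroids = np.vstack([X_sub[labels_sub == c].mean(axis=0) for c in cluster_ids])
    out = np.empty(len(X_full), dtype=cluster_ids.dtype)
    # Work in chunks so the cells x clusters distance matrix stays small.
    for start in range(0, len(X_full), chunk):
        block = X_full[start:start + chunk]
        d2 = ((block[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = cluster_ids[d2.argmin(axis=1)]
    return out

# Synthetic example: two well-separated populations of 500 cells each.
rng = np.random.default_rng(1)
X_full = np.vstack([rng.normal(0, 1, (500, 3)), rng.normal(10, 1, (500, 3))])
idx = rng.choice(len(X_full), size=100, replace=False)
X_sub = X_full[idx]
labels_sub = (X_sub[:, 0] > 5).astype(int)   # stand-in for labels obtained on the subset
labels_full = assign_to_nearest_centroid(X_full, X_sub, labels_sub)
print(np.bincount(labels_full))
```

Note that a rare population missing from the subset can never get its own label this way, which is the main argument for clustering the full data set when rare cell types matter.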

Best/Petter
<<

mleipold

Guru

Posts: 5796

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Feb 06, 2015 4:11 pm

Re: automated clustering for large data sets (ACCENSE)

Hi Joel,

I'm not sure that I understand your question.

You said: "Running ACCENSE on downsampled data of 20,000 points yielded a reasonable number of clusters, but produced over 240 clusters when applied to my complete data set of approx. 350,000 points."

When you clustered on 20,000 points, you got a different number of clusters than when clustered on 350,000. What do you mean by "reasonable"? I would assume that you got fewer clusters with 20K than with 350K; are you saying that you think the smaller number is more correct/"reasonable" than the larger number?


Mike
<<

bluejek128

Participant

Posts: 13

Joined: Tue Feb 03, 2015 4:21 pm

Location: NYC

Post Fri Feb 06, 2015 9:08 pm

Re: automated clustering for large data sets (ACCENSE)

Thanks for your prompt replies:

Petter: Thanks for the tips; I will play around with the markers.

Mike: As Petter mentioned, running ACCENSE results in different numbers of clusters for the same data set depending on how you set the significance value. For our purposes, having many clusters would be nice, but we also don’t want so many clusters that the differences in functional characteristics become insignificant in the clinical context (i.e., having 3,000 clusters for 6,000 cells would have little practical value for us).

Please let me know if you have any additional comments or advice.

Joel
<<

mleipold

Guru

Posts: 5796

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Feb 06, 2015 9:36 pm

Re: automated clustering for large data sets (ACCENSE)

Hi Joel,

I agree, having 3000 clusters for 6000 cells isn't useful. But 240 clusters for 350K cells seems very reasonable to me.

As an example: in some of my testing of different algorithms, I run the same FCS file through the given algorithm, playing with different settings. In this case, it's a PBMC sample of about 270K cells, with 33 markers. There are a couple out of the 33 that aren't terribly useful, but for consistency, I use all 33 in the clustering.

For reasonable settings (I can give the details if needed):
SPADE: 241-590 clusters, with 10-96 cells in the smallest cluster
SWIFT: 1787-2099 clusters, with 34 cells in the smallest cluster
ACCENSE, P=0.01: 1297 clusters, with 8 cells in the smallest cluster
ACCENSE, P=0.000001: 539 clusters, with 27 cells in the smallest cluster

While the numbers do vary, they're all in the same order of magnitude, of mid to high hundreds of clusters. And remember, these are very different algorithms.


So, I don't think 240 clusters for 350K cells is at all unreasonable. And while I did see a difference in my ACCENSE results, it was only a 2-3 fold change in cluster number for a 10,000-fold (1e4) difference in P value.
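As a quick sanity check on the figures in this thread, the arithmetic can be sketched as follows. The cell counts are the approximate values quoted above ("about 270K", "approx. 350,000"), so the results are rough, and the helper name is just for illustration.

```python
import math

def cells_per_cluster(n_cells, n_clusters):
    """Average number of cells per cluster."""
    return n_cells / n_clusters

joel_worst_case = cells_per_cluster(6_000, 3_000)    # the "not useful" scenario
joel_full_set = cells_per_cluster(350_000, 240)      # ACCENSE on the full data set
pbmc_p_1e2 = cells_per_cluster(270_000, 1297)        # ACCENSE at P=0.01
pbmc_p_1e6 = cells_per_cluster(270_000, 539)         # ACCENSE at P=0.000001

# Change in cluster count versus change in P value between the two ACCENSE runs.
fold_clusters = 1297 / 539
orders_of_magnitude_p = math.log10(0.01) - math.log10(0.000001)

print(f"{joel_worst_case:.0f} vs {joel_full_set:.0f} cells per cluster")
print(f"{pbmc_p_1e2:.0f} and {pbmc_p_1e6:.0f} cells per cluster in the PBMC runs")
print(f"{fold_clusters:.1f}-fold fewer clusters over {orders_of_magnitude_p:.0f} orders of magnitude in P")
```

On these numbers, 240 clusters for 350K cells averages well over a thousand cells per cluster, which is nowhere near the two-cells-per-cluster scenario.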


Mike
