
Phenograph - choice of settings input in to cytofkit?


jamesaries

Participant

Posts: 14

Joined: Thu Sep 22, 2016 2:49 pm

Post Tue Jun 12, 2018 2:01 pm

Phenograph - choice of settings input in to cytofkit?

Hi everyone,

Many thanks for your recent posts on PhenoGraph, Erin. I had a few specific questions about the input settings.

Please see the attached files: cytofkit settings.png (a screen grab of the settings I entered) and cytof forum upload.png (a combined tSNE of all 20 files).

I am currently looking at T cell subsets, NK cells and iNKT cells (the latter probably my rarest population), and attempting to identify immunodominant features that are more common in one outcome than another. I have several different cell subsets, the main ones being CD3, CD4, CD8, iNKT, gamma-delta T and NK (CD16 and CD56) cells. Within CD4 and CD8, I am looking at memory/naive and exhausted cells, as well as transcription factors (T-bet, GATA3, etc.) and chemokine receptor expression (CCR5, CCR7 and CCR9). I would guess there are 11 to 12 subsets within CD4 and CD8 separately, although clearly it depends on how you define them. In total there are 22 markers, so would a conservative estimate of the number of immune cell subsets be approximately 30?

When I used PhenoGraph (through Cytofkit), I asked it to select 10,000 cells from each of 20 FCS files; it produced tSNE plots and generated 29 clusters (with the default settings, I think). I grouped all the samples by outcome (9 in outcome A and 11 in outcome B), creating a concatenated tSNE for each outcome. I exported the clusters and compared the abundance of each cluster between the two groups. Please note that the attached file is the combined tSNE for all 20 files; when separated, they actually look quite similar.
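Taking an equal number of events from every file keeps each sample's weight in the pooled clustering the same. A minimal stdlib-Python sketch of that per-file subsampling step, assuming the events are already loaded into plain lists; the file names and event IDs are hypothetical, and this is not Cytofkit's own code:

```python
import random

def subsample_per_file(files, n_per_file, seed=42):
    """Draw up to n_per_file events from each file, without replacement.

    files: dict mapping a file name to a list of events (any objects).
    Files with fewer than n_per_file events contribute all of their events.
    Returns (file_name, event) tuples so every event keeps its sample label.
    """
    rng = random.Random(seed)  # fixed seed -> the same subsample every run
    merged = []
    for name in sorted(files):  # sorted so the draw order is deterministic
        events = files[name]
        k = min(n_per_file, len(events))
        for event in rng.sample(events, k):
            merged.append((name, event))
    return merged

# Toy input: made-up event IDs standing in for CD45+ events in each FCS file.
files = {
    "patient01.fcs": list(range(25_000)),
    "patient02.fcs": list(range(10_250)),
    "patient03.fcs": list(range(8_000)),
}
merged = subsample_per_file(files, n_per_file=10_000)

counts = {}
for name, _ in merged:
    counts[name] = counts.get(name, 0) + 1
print(counts)
```

Capping at the smallest file's size, as described above, means every file contributes exactly the same number of events.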

Not surprisingly, my tSNE plots look ‘busy’, and I wonder if having more samples might make it very difficult to do this analysis in future.

I just wondered if anyone had any thoughts on how I could optimise this.

For example,
1. Is 10000 cells okay? (I picked this as my smallest file had just over 10000 cells - CD45+ defined events).
2. Should I define a different number of clusters beforehand?
3. Should I try a different seed?
4. Is it worth downsampling before (as discussed on the forum)?
5. In your experience, will it identify low-frequency cells? (Surely with only 10000 cells, iNKT cells [0.1% of PBMCs] are going to be difficult to find.)

Thanks so much,

I enjoy using Phenograph a lot but want to make sure I'm getting the most out of it.
BW
James
Attachments
cytof forum upload.png
cytofkit settings.PNG

ErinSimonds

Master

Posts: 50

Joined: Tue May 13, 2014 8:04 pm

Post Fri Jun 15, 2018 11:33 pm

Re: Phenograph - choice of settings input in to cytofkit?

Hi James,

Thanks for sharing a plot of your data -- it definitely helps for everyone to see what you're seeing. You've asked some great questions that address issues that I believe many others have run into. I'll try to tackle each of the points/questions you raised, in order:

Phenograph ... generated 29 clusters (default I think)

It looks like you used the default settings for PhenoGraph (k=30) and got 29 clusters. These clusters seem reasonable based on your data -- they are not segmenting the tSNE map into absurd or uninformative regions. However, these clusters might be missing some smaller populations by lumping them into larger clusters. This seems to be happening with Cluster 2, which has several well-resolved islands in tSNE space that were aggregated into a single cluster. You can try lower values for the Rphenograph_k parameter, which will generally produce more clusters. I would try k=25, 20, 15, and 10. I wouldn't go lower than 10.
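For readers less familiar with what k controls: PhenoGraph builds a k-nearest-neighbour graph, weights each edge by the Jaccard overlap of the two cells' neighbourhoods, and then partitions that graph with Louvain community detection. The stdlib-Python sketch below covers only the graph-building step, on two made-up "islands" of cells in a 2-marker space. It is not the Rphenograph code, and counting each cell as part of its own neighbourhood is an assumption of this sketch. It shows that a larger k adds edges between well-separated islands, which gives the community-detection step more opportunity to merge them.

```python
import math

def knn_sets(points, k):
    """Brute-force k nearest neighbours (Euclidean) for every point.

    Assumption of this sketch: each point's set also contains the point itself.
    """
    sets = []
    for i, p in enumerate(points):
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        sets.append({i} | {j for _, j in ranked[:k]})
    return sets

def jaccard_knn_graph(points, k):
    """Edges i-j for each j in i's kNN, weighted by Jaccard overlap of neighbourhoods."""
    nbrs = knn_sets(points, k)
    edges = {}
    for i, ni in enumerate(nbrs):
        for j in ni - {i}:
            a, b = min(i, j), max(i, j)
            edges[(a, b)] = len(ni & nbrs[j]) / len(ni | nbrs[j])
    return edges

# Two hypothetical islands of 4 cells each (indices 0-3 and 4-7).
cells = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11)]

def cross_island_edges(edges):
    """Count edges that connect the two islands."""
    return sum(1 for a, b in edges if (a < 4) != (b < 4))

for k in (3, 5):
    g = jaccard_knn_graph(cells, k)
    print(f"k={k}: {len(g)} edges, {cross_island_edges(g)} between islands")
```

With k=3 every neighbourhood stays inside its own island; with k=5 each cell has to borrow neighbours from the other island, so edges appear between them.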

the attached file is the combined tSNE for all 20 files - when separated they actually look quite similar.

That's a good sign and indicates your staining, data collection and normalization were consistent. You don't want to see batch effects where each patient falls in a different part of the plot.

my tSNE plots look ‘busy’

The plot you shared doesn't look busy at all to me, but maybe I'm just numb to these plots by now ;) Increasing the tSNE iterations to 5000 may tighten up some of the more spread-out islands, but I think you're getting good separation here. I'd be very happy with a plot like this!

I wonder if having more samples might make it very difficult to do this analysis in future

Now this is the real challenge. If you double the number of cells per sample, or if you try this pipeline on twice as many files, it's going to take much longer to run the analysis. The plot will probably also be more "busy". In that case, you'll have to do more iterations to make the plot "tighter", which increases the computation time even further. It's a vicious cycle ;)

This is the reality of using tSNE for big datasets -- it struggles with a large number of points. Newer algorithms like UMAP show promise for visualizing larger numbers of cells, but these aren't built into Cytofkit yet.
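A back-of-the-envelope illustration of that vicious cycle, assuming a Barnes-Hut-style t-SNE (as in the Rtsne package) whose per-iteration cost grows roughly as N log N, and a hypothetical reference run of 200,000 cells for 1,000 iterations. These are relative costs under that assumption, not measured run times:

```python
import math

def relative_tsne_cost(n_cells, iterations, ref_cells=200_000, ref_iterations=1_000):
    """Rough cost of a t-SNE run relative to a reference run.

    Assumes per-iteration cost ~ N log N (Barnes-Hut approximation) and that
    total cost grows linearly with the number of iterations. Ignores constant
    factors, memory use and the neighbour/affinity precomputation.
    """
    per_iteration = (n_cells * math.log(n_cells)) / (ref_cells * math.log(ref_cells))
    return per_iteration * iterations / ref_iterations

print(round(relative_tsne_cost(400_000, 1_000), 2))  # twice the cells
print(round(relative_tsne_cost(400_000, 5_000), 2))  # twice the cells and 5,000 iterations
```

Doubling the cells alone costs slightly more than 2x; doubling the cells and going to 5,000 iterations to tighten the islands lands at roughly 10x.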

You can skip tSNE altogether and just run clustering (e.g. PhenoGraph or FlowSOM_meta) on the data. Unfortunately, there appears to be a bug in Cytofkit_GUI where it will run tSNE even if you deselect it, so I run the functions individually from an R script instead. Then you can monitor changes in cluster occupancy between your samples.
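Once clustering has run, "cluster occupancy" is just the fraction of each sample's cells that falls in each cluster. A minimal stdlib-Python sketch, assuming you already have one (sample, cluster) pair per cell, for example from an exported table; the sample names and cluster labels below are made up:

```python
from collections import Counter, defaultdict

def cluster_frequencies(labels):
    """Per-sample cluster frequencies from (sample, cluster) pairs.

    Returns {sample: {cluster: fraction of that sample's cells}}, with a
    0.0 entry for clusters that a sample has no cells in.
    """
    labels = list(labels)  # allow any iterable; it is read twice below
    counts = defaultdict(Counter)
    for sample, cluster in labels:
        counts[sample][cluster] += 1
    all_clusters = sorted({cluster for _, cluster in labels})
    freqs = {}
    for sample in sorted(counts):
        total = sum(counts[sample].values())
        freqs[sample] = {cl: counts[sample][cl] / total for cl in all_clusters}
    return freqs

# Hypothetical per-cell labels for two samples.
labels = [("s1", 1), ("s1", 1), ("s1", 2), ("s1", 3),
          ("s2", 1), ("s2", 2), ("s2", 2), ("s2", 2)]
freqs = cluster_frequencies(labels)
print(freqs)
```

Keeping the explicit zeros matters: a cluster that is absent from some samples is often exactly the kind of difference you are looking for.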

Since you're looking for populations (clusters) that differ between outcomes, you should probably try out some tools that are built for this purpose: Cydar, Statistical SCAFFOLD, or CITRUS are good options. tSNE is great for interactive data exploration, but it's not well suited to A vs. B comparisons in large datasets.
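To make the A vs. B idea concrete for a single cluster, here is a deliberately simple stand-in: a two-sided permutation test on the difference in mean cluster frequency between two outcome groups. This is not the statistics used by Cydar, Statistical SCAFFOLD or CITRUS, and the frequencies below are invented. With ~30 clusters you would also need a multiple-testing correction.

```python
import math
import random

def group_mean(values):
    # math.fsum is exactly rounded, so the mean does not depend on element order
    return math.fsum(values) / len(values)

def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    observed = abs(group_mean(group_a) - group_mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign outcome labels
        if abs(group_mean(pooled[:n_a]) - group_mean(pooled[n_a:])) >= observed:
            hits += 1
    # +1 in numerator and denominator so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical frequencies of one cluster: 9 outcome-A and 11 outcome-B patients.
outcome_a = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.012, 0.011, 0.010]
outcome_b = [0.050, 0.048, 0.052, 0.049, 0.051, 0.047,
             0.053, 0.050, 0.049, 0.051, 0.048]
print(permutation_pvalue(outcome_a, outcome_b))
```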

Is 10000 cells okay? (I picked this as my smallest file had just over 10000 cells - CD45+ defined events).

I think this is a useful number to get a sense of the landscape, the markers that are working well (and those that aren't), and if you need to do any clean-up gates. However, it sounds like you're interested in a cell type that's present at 1/1000 frequency, so 10,000 cells per sample will be underpowered to monitor changes in that population.
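A rough way to see why: under simple random sampling (an idealisation that ignores gating and acquisition effects), the number of iNKT cells in a subsample is binomially distributed. A minimal Python sketch of the expected count and its sampling noise:

```python
import math

def expected_rare_cells(n_cells, frequency):
    """Expected count, binomial SD and coefficient of variation of a rare
    population when n_cells are drawn at random from a sample in which the
    population has the given true frequency."""
    expected = n_cells * frequency
    sd = math.sqrt(n_cells * frequency * (1 - frequency))
    return expected, sd, sd / expected

for n in (10_000, 100_000):
    e, sd, cv = expected_rare_cells(n, 0.001)
    print(f"{n} cells: ~{e:.0f} iNKT expected, SD {sd:.1f}, CV {cv:.0%}")
```

At 10,000 cells that is about 10 iNKT cells per sample with a ~32% coefficient of variation from sampling alone; 100,000 cells brings it down to ~10%.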

Should I define a different number of clusters beforehand?

You can't set the number of clusters with PhenoGraph. You can do this with FlowSOM_meta, however. I'm not sure what you mean by "beforehand" in this context, but one general tip is to clean up the data as much as you can before running clustering/tSNE.

Should I try a different seed?

I don't do this, personally. It doesn't have a big effect if the data is robust, and you're looking for changes/trends that are robust.

Is it worth downsampling before (as discussed on the forum)?

If your goal is to look at iNKT cells, then you may want to make them a greater proportion of the data. Do you really need T cells, B cells and monocytes in the same analysis? Can you analyze the iNKT cells separately?

In your experience, will it identify low-frequency cells? (Surely with only 10000 cells, iNKT cells [0.1% of PBMCs] are going to be difficult to find.)

Let's say you run clustering on a dataset of n=20 samples * 10,000 cells each = 200,000 cells total. If they each have exactly 0.1% iNKT cells, then you've given only 200 iNKT cells to the clustering/tSNE algorithms. Is that enough to define a cluster? Sometimes it is. If the cells have a very unique phenotype (e.g. endothelial cells), then a rare but consistent group of cells can form its own cluster. However, if they are pretty similar to other cell types in your dataset (in terms of the markers that you measured and revealed to the clustering algorithm), then I wouldn't be surprised if they got merged into a similar cluster. In some cases, PhenoGraph will merge clusters, but tSNE keeps them separate. This is what's happening for Cluster 2 in the example dataset you provided. The safer route would be to enrich the iNKT cells by removing the more abundant cell types (T cells, B cells, monocytes). You can always measure those in a different clustering run.
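The enrichment arithmetic, as a small sketch using the numbers above (200 iNKT cells out of 200,000). The 90% removal is a hypothetical figure, and the sketch assumes the removal gate keeps every iNKT cell. Removal changes the iNKT share of the data, not the 200-cell count; getting more iNKT cells into the analysis also requires drawing more events from the gated population.

```python
def fraction_after_removal(total_cells, rare_cells, removed_cells):
    """Share of a rare population after removing removed_cells other events.

    Assumes the removal gate discards none of the rare cells.
    """
    if removed_cells > total_cells - rare_cells:
        raise ValueError("cannot remove more cells than lie outside the rare population")
    return rare_cells / (total_cells - removed_cells)

# 20 samples x 10,000 cells at 0.1% iNKT = 200 iNKT cells out of 200,000.
before = fraction_after_removal(200_000, 200, 0)
after = fraction_after_removal(200_000, 200, 180_000)  # hypothetical: gate out 90% of events
print(f"{before:.2%} -> {after:.2%}")
```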

I think this is off to a great start. It certainly looks like you have "structure" in your data. Best of luck with your comparison!

nkhanbham

Master

Posts: 53

Joined: Wed Feb 25, 2015 3:03 pm

Post Tue Jun 19, 2018 3:00 pm

Re: Phenograph - choice of settings input in to cytofkit?

Can I just ask which transformation settings in the Cytofkit GUI are recommended for flow cytometry data files?
Naeem

sgranjeaud

Master

Posts: 123

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Tue Jun 19, 2018 7:25 pm

Re: Phenograph - choice of settings input in to cytofkit?

Hi,

First, some comments on Erin's answer. Combining my own opinion with the work of colleagues leads me to think that coarsely gating the population of interest, or removing populations you don't care about for your current question, will help a lot in answering that question. Sometimes tSNE shows populations that PhenoGraph doesn't see, but the reverse is also true. There is no general rule IMHO; see the examples below from Mair et al., 2016.

Concerning the transformation, asinh(x/150) usually does a good job for data acquired on an LSR II, i.e. 18-bit data. Some datasets need the cofactor increased from 150 up to 500. Nevertheless, if the negative population is not at zero but slightly shifted, e.g. asinh(intensity/150) = 1, i.e. the zero intensity is ~180, then the asinh transform will probably not give a correct result. In such cases, auto_logicle will perform better because it takes the zero shift into account.
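The numbers in that example can be checked directly. A minimal Python sketch of asinh(x/cofactor): the intensity that maps to 1 on the asinh(x/150) scale is 150·sinh(1), about 176, which is the "~180" above. The negative events and the baseline subtraction are hypothetical; subtracting a baseline is only an illustration of why a shifted zero matters, not what auto_logicle does internally.

```python
import math

def asinh_transform(x, cofactor=150.0):
    """Arcsinh transform asinh(x / cofactor), as commonly applied to cytometry intensities."""
    return math.asinh(x / cofactor)

# Intensity that lands at 1.0 on the asinh(x/150) scale.
zero_shift = 150 * math.sinh(1)
print(round(zero_shift))

# Hypothetical negative (unstained) events whose true zero sits near 180.
negatives = [130, 160, 180, 200, 230]
raw = [round(asinh_transform(v), 2) for v in negatives]

# Illustration only: subtract an assumed baseline before transforming.
baseline = 180  # hypothetical estimate of the shifted zero
shifted = [round(asinh_transform(v - baseline), 2) for v in negatives]
print(raw)      # negatives centred near 1, not 0
print(shifted)  # negatives centred on 0
```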

Best.
Attachments
2018-06-19_211944.png
1st case
2018-06-19_211909.png
2nd case

ErinSimonds

Master

Posts: 50

Joined: Tue May 13, 2014 8:04 pm

Post Tue Jun 19, 2018 7:34 pm

Re: Phenograph - choice of settings input in to cytofkit?

Good points, Sam. Thanks for recommending the Mair et al paper (https://www.ncbi.nlm.nih.gov/pubmed/26548301), it's a useful resource for these questions.

Just wanted to clarify one point -- the GUI for Cytofkit doesn't currently allow users to choose the asinh cofactor. It has the following options:

- autoLgcl
- cytofAsinh [i.e. asinh(data / 5)]
- Fixedlogicle [allows user to define parameters w, t, m, a]
- none

...so as you suggested, Naeem would do well to go with autoLgcl for digital optical cytometry data.
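To see why asinh(data / 5), a cofactor commonly used for CyTOF counts, is a poor fit for fluorescence data, compare how it spreads a hypothetical negative population spanning -100 to +100 fluorescence units with a cofactor of 150 (the value suggested above for LSR II data):

```python
import math

def asinh_scale(x, cofactor):
    """asinh(x / cofactor) for a single intensity value."""
    return math.asinh(x / cofactor)

# Hypothetical spread of a negative population, in fluorescence units.
noise = [-100, -50, 0, 50, 100]
for cofactor in (5, 150):
    print(cofactor, [round(asinh_scale(v, cofactor), 2) for v in noise])
```

With cofactor 5 the negative cloud is stretched across several asinh units and can look like separate populations; with 150 it stays compact near zero.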

sgranjeaud

Master

Posts: 123

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Tue Jun 19, 2018 8:24 pm

Re: Phenograph - choice of settings input in to cytofkit?

Fortunately, you are here to correct me. I confess I didn't compare any of the transformations in Cytofkit. Thanks.

albertapaul

Participant

Posts: 7

Joined: Tue Oct 30, 2018 4:30 am

Post Thu Nov 01, 2018 3:35 am

Re: Phenograph - choice of settings input in to cytofkit?

Hi All,

I want to use PhenoGraph for spectral flow data (Aurora platform). Some background on the data and my interests:
I am looking at human PBMCs cultured under different conditions, with a panel that has 9 cytokines, CD4 subset markers and activation markers, together with a cell tracer dye (CellTrace Violet). I want to compare populations between stimulation conditions for PBMCs from one subject, and later compare the effects of the different stimuli across subjects.
1) Any ideas about the transformation method? I can get viSNE plots to work well, but sometimes I have to play with the asinh cofactor (e.g. when comparing APC vs. Alexa Fluor 647 positive subsets, which can be run together on this platform).
2) In addition, spectral flow data are sometimes well into the negative range, especially with the autofluorescence subtraction that I use for stimulated and cultured PBMCs. How should I handle that?
3) With bi-exponential plots, populations of less than 0.5% are sometimes detected but are not visible in viSNE. Would this method be suitable at all?
I first exported the CD45RO_CellTrace Violet low population (divided) from FCS Express.
4) How many cells should I use?
see
