FlowJO transformation before clustering

LouisS

Post Mon Aug 03, 2020 8:57 pm

Hi,
I am wondering what kind of arcsinh transformation FlowJo applies before it runs clustering, UMAP, etc. I have used cytofkit and the clustering in Bioconductor packages, and there you know how the data is transformed.
In FlowJo it is a bit hard to understand. Do Phenograph and FlowSOM transform the data that is loaded into the R script? Most people transform with an arcsinh cofactor of 5. I can do that in R, but then I never get the same values that FlowJo presents to me in the plots. Is anyone more familiar with this? Thanks! Best, Louis
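
For reference, the arcsinh transform with cofactor 5 mentioned above is usually written y = asinh(x / c) with c = 5. A minimal standard-library sketch; the function name `arcsinh_transform` is our own and is not taken from cytofkit or any Bioconductor package:

```python
import math

def arcsinh_transform(values, cofactor=5.0):
    """Apply y = asinh(x / cofactor) to each value.

    Close to zero this is roughly linear (asinh(u) ~ u); for large
    values it grows like a logarithm (asinh(u) ~ ln(2u)), so zeros and
    small negative values are handled without special-casing.
    """
    return [math.asinh(x / cofactor) for x in values]

print(arcsinh_transform([-5.0, 0.0, 5.0, 5000.0]))
```

The cofactor sets where the transform switches from its roughly linear region to its roughly logarithmic one, so changing it changes how much the low end is compressed.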

taylori

Post Tue Aug 04, 2020 3:16 pm

Greetings Louis,
Thank you for the excellent post.
Phenograph, FlowSOM, UMAP and most other algorithms that you might run from within FlowJo use whatever transform and scaling settings you have applied within FlowJo at the time you launch a given calculation. This means that you, as the researcher, have a lot of control over how those algorithms "see" the data fed into them, which lets you fine-tune the results.
Please feel free to reach out to us if you have any further questions.
Best Regards,
Ian Taylor
Director, Product Innovation
BD Life Sciences - Informatics
Toll Free (US Only): 800-366-6045
541-201-0022 (Ext. 13)
flowjo@bd.com

tomash

Post Fri Aug 14, 2020 12:50 am

To expand on Ian's helpful comments a little: FlowJo uses the 'channel values' of the data. Essentially, the exact way you are plotting/visualising the data (i.e. with a biexponential/logicle transformation, etc.) is turned into a linear set of values by binning the data into 1024 'bins' (in FJ10), running from the low end of the plot axis to the high end. The tSNE/UMAP/whatever results are generated from these channel values and are then attached to the _original_ data, so you never see the channel values themselves. The benefit of this approach within FlowJo is that you can set the level of low-end compression exactly as your data needs. With CyTOF this is less important, as the channels behave fairly similarly, but in flow data it can vary wildly depending on the level of spread. You can also export the data in this format (CSV channel values) if you want to use it with other clustering tools instead of performing an arcsinh transformation. The channel values do lose some information, but in my experience this doesn't seem to have any real impact.
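
The binning described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not FlowJo's implementation: the input values are assumed to be already on the display (biexponential/logicle) scale, `display_min`/`display_max` stand for the transformed ends of the plotted axis, channels are numbered 0 to 1023, and out-of-range events are clipped to the end channels. FlowJo's own rounding and edge handling may differ, and `to_channel_values` is a hypothetical name.

```python
def to_channel_values(transformed, display_min, display_max, n_bins=1024):
    """Map display-transformed values onto n_bins equal-width channels.

    Assumptions (illustrative only): the bins split the plotted range
    [display_min, display_max] uniformly, channels are numbered
    0..n_bins-1, and values outside the plotted range are clipped.
    """
    width = (display_max - display_min) / n_bins
    channels = []
    for v in transformed:
        idx = int((v - display_min) // width)
        channels.append(min(max(idx, 0), n_bins - 1))  # clip to the axis ends
    return channels

print(to_channel_values([0.0, 2.0, 4.0, 5.0, -1.0], display_min=0.0, display_max=4.0))
# [0, 512, 1023, 1023, 0]
```

Every display value that falls within the same bin width collapses onto one integer, which is where the information loss comes from; how much it matters depends on how wide a bin is relative to the differences between populations.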

We have a page explaining some of this here: https://wiki.centenary.org.au/display/S ... in+Spectre

LouisS

Post Fri Aug 14, 2020 9:43 am

Thanks for your answers and the website! Very helpful! I had always wondered what these channel values were :) Best, Louis

tomash

Post Fri Aug 21, 2020 4:04 am

We've got a page on it here if you'd like some comparisons (https://wiki.centenary.org.au/x/w8yACQ) along with some export instructions (https://wiki.centenary.org.au/x/1eRfCQ).

chasman

Post Wed Nov 04, 2020 3:32 pm

Thanks to tomash and Ian for your replies here.

I wonder if you have more detail about the binning algorithm. Is the *range* of the data split into 1024 uniformly sized bins, or are the actual data *values* split into 1024 bins with an equal number of values in each bin? The former would essentially be scaling with discretization, while the latter would be transforming the data to ranks.
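
The two readings of 'binning' in this question give quite different results. A small standard-library sketch contrasting them; both function names are hypothetical, and the equal-count version is effectively a rank transform:

```python
def equal_width_bins(values, lo, hi, n_bins):
    """Split [lo, hi] into n_bins equal-width bins (scaling + discretization)."""
    width = (hi - lo) / n_bins
    return [min(max(int((v - lo) // width), 0), n_bins - 1) for v in values]

def equal_count_bins(values, n_bins):
    """Give each bin (roughly) the same number of events, i.e. bin by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

data = [1.0, 1.1, 1.2, 1.3, 9.0, 10.0]
print(equal_width_bins(data, lo=0.0, hi=10.0, n_bins=5))  # [0, 0, 0, 0, 4, 4]
print(equal_count_bins(data, n_bins=3))                   # [0, 0, 1, 1, 2, 2]
```

With equal-width bins the four dim values share a bin and the two bright ones sit at the top, so relative distances survive; with equal-count bins the dim values are spread over two bins while the gap to the bright events shrinks to a single bin step.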
Is the binning done after combining data from different samples? And is it done within each marker or across all markers?

I am trying to reconcile the differences I am seeing between a FlowSOM analysis that I ran in R and a FlowSOM analysis that my collaborator ran in FlowJo on the same dataset. I have been running it on the biexponential-transformed data, because I didn't know about the 1024-channel binning process until now. The cluster expression patterns are fairly similar between my cluster-mean heatmap and hers, but there are some differences that make me wonder whether it is more than just a matter of scaling and a different random seed. I'd like to implement the binning to see whether my results become more similar to my collaborator's.

Thank you!

tomash

Post Fri Nov 06, 2020 12:33 am

Hi Chasman,

It is indeed the *range* that is split into 1024 uniformly sized bins (as opposed to bins containing equal numbers of cells). The range, in this case, is simply the minimum to maximum values that are _plotted_ by FlowJo (e.g. for CyTOF data, I think the defaults would be something like -10^1 to 2x10^4, or close to it). It is performed on all markers that are chosen for whatever function is being used (e.g. tSNE etc. in FlowJo, or when exporting data as CSV channel values). The helpful thing is that if you have already optimised the axis settings for each channel (especially the extent of compression of the low-end values), then that adjustment is captured in the channel values for each marker individually. The binning is done regardless of whether samples are combined or not, because the actual data doesn't play a role in determining the binning; only the plotted range does.

In terms of the differences between your FlowSOM results from FlowJo and R, are these differences in the fundamental structure of the results, or just different cluster ID numbers for the same groups of cells? The latter is quite common and would be expected: as you have said, the simple fact that the values are going to be very different between the two datasets (e.g. one is 0 - 1024, and one is ~0 - 5), together with the stochastic nature of clustering runs, means that the cluster assignments might look quite different. It's possible that the actual cell groupings are comparable, just with different labels (e.g. T cells were metacluster 1 in FlowJo but metacluster 5 in R). However, if there are serious structural differences between the R and FlowJo FlowSOM runs, there could be a few causes:

1. Because the level of low-end compression will be different between the channel-value and arcsinh-transformed data, you may have different levels of background signal in each channel, which will affect the FlowSOM grid and clustering results.

2. Even if we assume that the compression of low-end values was similar in the channel-value and arcsinh-transformed data, there is still a difference. If the plot max in FlowJo was 5x10^3, for example, then any values _above_ 5x10^3 will simply be converted to 1024. With an arcsinh transformation there is no high-value capping, so some high expression values stay proportionally high after transformation.

3. The default settings used for each FlowSOM run may be different. The grid size determines how many first-level clusters are generated, and this can affect things, especially if you have more subtle, low-frequency populations or are trying to capture phenotypic 'landscapes' more than distinct populations. Are you specifying a target number of metaclusters in each, or are you letting FlowSOM choose for you?
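
Point 1 can be illustrated with the arcsinh cofactor, which (roughly like FlowJo's low-end compression settings) controls how strongly values near zero are squeezed together. In this sketch the background values are made up purely for illustration, and `transformed_spread` is a hypothetical helper; a larger cofactor maps the same background spread onto a much narrower transformed range, so how much background variation the clustering "sees" depends on this choice.

```python
import math

def transformed_spread(values, cofactor):
    """Width of the interval the values occupy after asinh(x / cofactor)."""
    t = [math.asinh(v / cofactor) for v in values]
    return max(t) - min(t)

background = [-40.0, -10.0, 0.0, 10.0, 40.0]  # illustrative values, not real data
for c in (5, 150):
    print(c, round(transformed_spread(background, c), 3))
```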

There is an obvious question here as well: given that the binning reduces the 'resolution' of the data (i.e. a range of 0 - 10,000 is compressed to 0 - 1024), does the quality of clustering suffer? I have not formally benchmarked this, but in my experience both perform pretty well in clustering, and I haven't noticed substantial differences in how well populations are resolved. I'm very willing to be proven wrong on that, but so far so good. In some cases I've found that the channel values performed better, but this is probably because of the capping of the high-end values, which can be done to the arcsinh-transformed values as well.
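
Capping arcsinh-transformed values in the same spirit as FlowJo's plot maximum takes only a few lines. A hedged sketch: `plot_max` is a raw-scale ceiling you would choose yourself (for example, to mirror the axis maximum used in FlowJo); it is not read from any FlowJo file, and `cap_arcsinh` is a hypothetical helper.

```python
import math

def cap_arcsinh(values, cofactor=5.0, plot_max=5e3):
    """asinh(x / cofactor), with anything above plot_max pinned to the top."""
    ceiling = math.asinh(plot_max / cofactor)  # transformed value of the cap
    return [min(math.asinh(x / cofactor), ceiling) for x in values]

capped = cap_arcsinh([100.0, 5e3, 2e4])
print(capped)
```

A per-marker percentile (e.g. the 99.9th) is another way people choose the ceiling; either way, events above it no longer stretch the scale for that marker.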

chasman

Post Fri Nov 06, 2020 8:57 pm

Tomash,

Thank you very much for your helpful and thoughtful response.

Our data is from a gated subset of innate immune cell types (excluding T and B cells), and the 7 markers that I have capture both lineage and activation/maturation. So, the phenotypic landscape is continuous.

I am making my comparison on the basis of the cluster median heatmap that my collaborator shared with me. I ran the entire FlowSOM analysis with a few different random seeds (starting before creation of the SOM grid). I used the default grid size (10x10) and requested the same number of metaclusters that my collaborator did.

I attached an image of some of the results. In my opinion, the cluster patterns are very similar between my result and the FlowJo result, although there are some differences in the relative levels of some markers in some clusters. I see variation between my own different initializations as well. For example, in one result a rare population shows up that was missed at the same number of clusters in the other results. So I think it's likely that the differences between my result and the FlowJo result could be explained by random chance, but I want to make sure I've considered every reason; your response has been helpful for that!
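
If per-cell cluster assignments from both runs can be obtained, one label-independent way to put a number on "very similar" (between seeds, or between R and FlowJo) is the adjusted Rand index (ARI). It is 1.0 for identical partitions no matter which numbers are used as cluster labels. A standard-library sketch, assuming both label lists refer to the same cells in the same order; scikit-learn's `adjusted_rand_score` computes the same quantity if you would rather use a library.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same cells")
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))  # contingency-table counts
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in each
        return 1.0
    return (sum_ij - expected) / (max_index - expected)

# Same grouping, different metacluster numbers
print(adjusted_rand_index([1, 1, 2, 2, 3, 3], [5, 5, 1, 1, 7, 7]))  # 1.0
```

Comparing the ARI between two of your own seeds with the ARI between your run and the FlowJo run gives a rough sense of whether the FlowJo differences exceed ordinary run-to-run variation.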

Instead of using arcsinh transformation, I have been using the biexponential transformation parameters that were exported from FlowJo in the FCS files. I received a FlowJo workspace ACS file, which included an XML workspace specification and all of the individual FCS files. I have been using flowWorkspace::flowjo_to_gatingset to read the data, which automatically performs the biexponential transformation using the parameters in the files. By default the channel range is 256, but I have updated that to 1024. I think this must just be a scaling parameter. The parameters otherwise look like this: $P8D|BIEX,262144.047111,4.418540,0.000000,-100.000000

Do you know whether performing the biexponential transformation using the parameters in the FCS files would give the same result as the channel values exported by FlowJo? I'm not sure I have enough information in those four parameters to cap the high-end values, so perhaps not.
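
One empirical way to answer this, once the collaborator's CSV channel values are available, is to line up the same events from both pipelines and compare them marker by marker. A standard-library sketch, assuming both lists hold the same events in the same order for one marker and are already on the same 0 to 1024 scale; `compare_channels` and the one-channel tolerance are illustrative choices.

```python
def compare_channels(mine, flowjo, tolerance=1.0):
    """Summarise agreement between two per-event value lists for one marker.

    Returns the largest absolute difference and the fraction of events
    whose values agree within `tolerance` channels.
    """
    if len(mine) != len(flowjo):
        raise ValueError("both lists must contain the same events")
    diffs = [abs(a - b) for a, b in zip(mine, flowjo)]
    within = sum(d <= tolerance for d in diffs) / len(diffs)
    return max(diffs), within

print(compare_channels([10.0, 500.2, 1023.0], [10.0, 500.0, 1023.0]))
```

If the two agree everywhere except at the top of the scale, that would point to high-end capping rather than to a different transform.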

I am asking my collaborator if she can find any RData files that were produced by FlowJo during the FlowSOM analysis so that I can hopefully compare the FlowSOM objects and data values themselves.

Thank you again!
Deborah
Attachment: flowjo_vs_r_flowsom.pdf (Comparison of FlowSOM cluster heatmaps, 244.58 KiB)
