FlowJO transformation before clustering

PostPosted: Mon Aug 03, 2020 8:57 pm
by LouisS
I am wondering what kind of arcsinh transformation FlowJo applies before it runs clustering, UMAP, etc. I have used cytofkit and the clustering in Bioconductor packages, and there you know how the data is transformed.
In FlowJo it is a bit harder to understand. Do Phenograph and FlowSOM transform the data that is loaded into the R script? Most people transform with arcsinh and a cofactor of 5. I can do that in R, but then I never get the same values that FlowJo shows me in the plots. Is anyone more familiar with this? Thanks! Best, Louis
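
For readers unfamiliar with the transform mentioned above, here is a minimal sketch of an arcsinh transformation with a cofactor of 5. It is written in plain Python rather than R so that it runs without extra packages; the function name and the sample values are made up for illustration and are not taken from cytofkit or FlowJo.

```python
import math

def arcsinh_transform(values, cofactor=5.0):
    """Return asinh(x / cofactor) for each raw intensity x."""
    return [math.asinh(x / cofactor) for x in values]

# Hypothetical raw intensities, chosen only to show the shape of the transform.
raw = [-10.0, 0.0, 5.0, 100.0, 10000.0]
transformed = arcsinh_transform(raw)

for x, t in zip(raw, transformed):
    print(f"{x:>10.1f} -> {t:.3f}")
```

The result is close to linear near zero and close to logarithmic for large values, and the cofactor sets where that changeover happens.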

Re: FlowJO transformation before clustering

PostPosted: Tue Aug 04, 2020 3:16 pm
by taylori
Greetings Louis,
Thank you for the excellent post.
Phenograph, FlowSOM, UMAP and most other algorithms that you might want to run from within FlowJo will use whatever transform and scaling settings you have applied within FlowJo at the time you launch a given calculation. This means you, as the researcher, have a lot of control over how those algorithms "see" the data fed into them, allowing you to fine-tune the results.
Please feel free to reach out to us if you have any further questions going forward.
Best Regards,
Ian Taylor
Director, Product Innovation
BD Life Sciences - Informatics
Toll Free (US Only): 800-366-6045
541-201-0022 (Ext. 13)

Re: FlowJO transformation before clustering

PostPosted: Fri Aug 14, 2020 12:50 am
by tomash
To expand on Ian's helpful comments a little -- FlowJo utilises the 'channel values' of the data. Essentially the exact way you are plotting/visualising the data (i.e. with a bi-exponential/logicle transformation etc) is turned into a linear set of data by binning the data into 1024 'bins' (in FJ10), reaching from the near side of the plot to the far side. When the tSNE/UMAP/whatever results are generated on the basis of these channel values, they are then attached to the _original_ data, so you never see the channel values. The benefit of this approach within FlowJo is that you can set up the level of low-end compression exactly as your data needs. With CyTOF it's not as important, as the channels will behave fairly similarly, but in flow data this might vary wildly depending on the level of spread. You can also export the data in this format (CSV channel values) if you want to use this for other clustering tools, rather than performing an arcsinh transformation -- while the channel values do lose some information, it doesn't seem to really impact anything from my experience.

We have a page explaining some of this here: ... in+Spectre

Re: FlowJO transformation before clustering

PostPosted: Fri Aug 14, 2020 9:43 am
by LouisS
Thanks for your answers and the Website! Very helpful! I was always wondering what these channel values are :) Best, Louis

Re: FlowJO transformation before clustering

PostPosted: Fri Aug 21, 2020 4:04 am
by tomash
We've got a page on it here if you'd like some comparisons, along with some export instructions.

Re: FlowJO transformation before clustering

PostPosted: Wed Nov 04, 2020 3:32 pm
by chasman
Thanks to tomash and Ian for your replies here.

I wonder if you have more detail about the binning algorithm. Is the *range* of the data split into 1024 uniformly ranged bins, or are the actual data *values* split into 1024 bins with equal number of values in each bin? The former would be essentially a scaling with discretization while the latter would be transforming the data to ranks.
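
To make the two options in this question concrete, here is a small sketch contrasting equal-width binning of the range with equal-frequency (rank-style) binning of the values. Both function names are hypothetical, and for simplicity the range here is taken from the data itself; this only illustrates the distinction being asked about and is not a description of what FlowJo does.

```python
def equal_width_bins(values, n_bins):
    """Split the min-to-max range into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    """Assign bins by rank so each bin holds about the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins

values = [1.0, 2.0, 3.0, 4.0, 100.0]
by_range = equal_width_bins(values, 4)
by_rank = equal_frequency_bins(values, 4)
print(by_range)
print(by_rank)
```

With one large outlier, the equal-width version pushes the four small values into the same bin, while the rank version spreads them out and discards the distances between them.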
Is the binning done after combining data from different samples? And is it done within each marker or across all markers?

I am trying to reconcile the differences that I am seeing between a FlowSOM analysis that I have run in R with a FlowSOM analysis that my collaborator ran in FlowJo, on the same dataset. I have been running it on the biexponential transformed data, because I didn't know about the 1024-channel binning process until now. The cluster expression patterns are fairly similar between my cluster mean heatmap and hers, but there are some differences that make me wonder if it's not just a matter of scaling and a different random seed. I'd like to implement the binning to see if my results are more similar to my collaborator's.

Thank you!

Re: FlowJO transformation before clustering

PostPosted: Fri Nov 06, 2020 12:33 am
by tomash
Hi Chasman,

It is indeed the *range* that is split into 1024 uniformly ranged bins (as opposed to the bins containing equivalent numbers of cells). The range, in this case, is simply the minimum to maximum values that are _plotted_ by FlowJo (e.g. for CyTOF data, I think defaults would be something like -10^1 to 2x10^4, or close to it). It is performed on all markers that are chosen for whatever function is being used (e.g. tSNE etc in FlowJo, or when exporting data as CSV channel values). The helpful thing is that if you have already optimised the axis settings for each channel (especially the extent of compression of the low-end values), then that adjustment is captured in the channel values for each marker individually. The binning is done regardless of whether samples are combined or not -- the actual data doesn't play a role in determining the binning, only the plotted range does.
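
As a rough illustration of the binning described above, the sketch below maps a value to one of 1024 channels based on where it falls between the plotted axis minimum and maximum on the display scale. Several things here are assumptions: an arcsinh function stands in for the display scale (it is not FlowJo's biexponential), the function names are made up, and the output runs from 0 to 1023, which may not match FlowJo's exact numbering.

```python
import math

def display_scale(x, cofactor=5.0):
    # Stand-in for the axis transform; an assumption, NOT FlowJo's biexponential.
    return math.asinh(x / cofactor)

def to_channel(value, axis_min, axis_max, n_channels=1024, scale=display_scale):
    """Map a raw value to a channel using only the plotted range, not the data."""
    lo, hi = scale(axis_min), scale(axis_max)
    position = (scale(value) - lo) / (hi - lo)   # fraction of the way along the axis
    position = max(0.0, min(1.0, position))      # off-scale values pile up at the edges
    return min(int(position * n_channels), n_channels - 1)

# Axis limits taken from the approximate CyTOF defaults mentioned above.
axis_min, axis_max = -10.0, 2e4
channels = [to_channel(v, axis_min, axis_max) for v in (-50.0, 0.0, 100.0, 2e4, 5e4)]
print(channels)
```

Because the bin edges depend only on the axis limits and the scale, the same value gets the same channel whether it comes from one sample or from a concatenated file.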

In terms of the differences between your FlowSOM results from FlowJo and R, are these differences in the fundamental structure of the results, or just different cluster ID numbers for different groups of cells? The latter is quite common and would be expected -- as you have said, the simple fact that the values are going to be very different between the two datasets (e.g. one is 0 - 1024, and one is ~0 - 5), and the stochastic nature of clustering runs, mean that the cluster assignments might look quite different. It's possible that the actual cell groupings could be comparable, just with different labels (e.g. T cells in FlowJo were metacluster 1, but metacluster 5 in R, etc). However, if there are serious structural differences between the R and FlowJo runs with FlowSOM, there could be a few causes:

1. Because the level of low-end compression will be different between the channel value and arcsinh transformed data, you may have different levels of background signal in each channel, which will affect the FlowSOM grid and clustering results.

2. If we assume that the compression of low-end values was similar in the channel value and arcsinh transformed data, there is still a difference. If the plot max in FlowJo was 5x10^3, for example, then any values _above_ 5x10^3 will simply be converted to 1024. In arcsinh transformation, there is no high-value data capping, so you may have some high expression values that would remain proportionally high after arcsinh transformation.

3. The default settings being used for each FlowSOM run may be different. The grid size determines how many first level clusters are generated, and this can impact things, especially if you have more subtle low frequency populations, or are trying to capture phenotypic 'landscapes' more than distinct populations. Are you specifying a target number of metaclusters to be generated in each, or are you letting FlowSOM choose for you?
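
On the earlier point about cluster ID numbers differing between runs: one way to check whether two runs group cells similarly despite different labels is to cross-tabulate the assignments. The sketch below assumes both runs cover the same cells in the same order; the function name and the label vectors are invented for illustration and are not output from FlowSOM or FlowJo.

```python
from collections import Counter

def best_matches(labels_a, labels_b):
    """For each cluster in run A, find the run-B cluster that shares the most cells."""
    pair_counts = Counter(zip(labels_a, labels_b))
    size_a = Counter(labels_a)
    matches = {}
    for a in size_a:
        candidates = [(b, n) for (a2, b), n in pair_counts.items() if a2 == a]
        b, shared = max(candidates, key=lambda item: item[1])
        matches[a] = (b, shared / size_a[a])  # (matched label, fraction of A's cells)
    return matches

# Hypothetical metacluster labels for the same eight cells in two runs.
run_r = [5, 5, 5, 2, 2, 7, 7, 7]
run_fj = [1, 1, 1, 3, 3, 4, 4, 3]
matches = best_matches(run_r, run_fj)
print(matches)
```

Fractions near 1 for every cluster suggest the same groupings under different names; low fractions point to a structural difference worth investigating.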

There is an obvious question here as well: given that the binning reduces the 'resolution' of the data (i.e. a range of 0 - 10,000 is compressed to 0 - 1024), does the quality of clustering suffer? I will say that I have not formally benchmarked this, but my experience is that both perform pretty well in clustering, and I've not noticed substantial differences in how well populations are resolved. I'm very willing to be proven wrong on that, but so far so good. In some cases I've found the channel values have performed better, but this is probably because of the capping of the high-end values, which is something that can be done to the arcsinh transformed values as well.
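
A minimal sketch of what capping high-end values on arcsinh transformed data could look like is shown below. The cap of 5x10^3 is borrowed from the plot max example given earlier, the cofactor of 5 is the commonly used value mentioned at the top of the thread, and the function name is hypothetical.

```python
import math

def arcsinh_capped(values, cofactor=5.0, raw_max=5e3):
    """Arcsinh-transform values, then cap anything above the transformed raw_max."""
    cap = math.asinh(raw_max / cofactor)
    return [min(math.asinh(x / cofactor), cap) for x in values]

# Hypothetical raw intensities; the last two exceed the cap.
raw = [0.0, 100.0, 5e3, 2e4, 1e5]
capped = arcsinh_capped(raw)
print([round(v, 3) for v in capped])
```

Since arcsinh is monotonic, capping after the transform gives the same result as clipping the raw values at `raw_max` first.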

Re: FlowJO transformation before clustering

PostPosted: Fri Nov 06, 2020 8:57 pm
by chasman

Thank you very much for your helpful and thoughtful response.

Our data is from a gated subset of innate immune cell types (excluding T and B cells), and the 7 markers that I have capture both lineage and activation/maturation. So, the phenotypic landscape is continuous.

I am making my comparison on the basis of the cluster median heatmap that my collaborator shared with me. I ran the entire FlowSOM analysis with a few different random seeds (starting before creation of the SOM grid). I used the default grid size (10x10) and requested the same number of metaclusters that my collaborator did.

I attached an image of some of the results. In my opinion, the cluster patterns are very similar between my result and the FlowJo result, although there are some differences in relative levels of some markers in some clusters. I see variation between my own different initializations as well. For example, in one result we have a rare population show up that was missed at the same number of clusters in the other results. So, I think it's likely that the differences between my result and the FlowJo result could be explained by random chance, but I want to make sure I've considered every reason - your response has been helpful for that!

Instead of using arcsinh transformation, I have been using the biexponential transformation parameters that were exported from FlowJo in the FCS files. I received a FlowJo workspace ACS file, which included an XML workspace specification and all of the individual FCS files. I have been using flowWorkspace::flowjo_to_gatingset to read the data, which automatically performs the biexponential transformation using the parameters in the files. By default the channel range is 256, but I have updated that to 1024. I think this must just be a scaling parameter. The parameters otherwise look like this: $P8D|BIEX,262144.047111,4.418540,0.000000,-100.000000

Do you know if performing the biexponential transformation using the parameters in the FCS files would give the same result as the channel values exported by FlowJo? I'm not sure if I have enough info in those four parameters to do a capping of high end values, so perhaps not.

I am asking my collaborator if she can find any RData files that were produced by FlowJo during the FlowSOM analysis so that I can hopefully compare the FlowSOM objects and data values themselves.

Thank you again!