
Concatenating a large number of samples


loukias

Participant

Posts: 4

Joined: Tue Oct 31, 2023 2:05 pm

Post Wed Dec 06, 2023 11:50 am

Concatenating a large number of samples

Hi All!
I was not able to find a topic on this, so I thought I would ask the community. I am new to analysing CyTOF data and am not entirely sure how to proceed with my data set, particularly how I should concatenate the files to run a tSNE analysis as a starting point.
I have 3 groups of 17 mice each. Each group was treated with a different drug. I want to compare the effect the different drugs have on the cells. Originally I thought I could maybe concatenate each group separately so I can distinguish the different groups, then concatenate the concatenated files so I can analyse all at the same time. Is this even a valid way of doing this kind of analysis?
My thought process is essentially that if I just go ahead and concatenate all 51 files in one go I will not be able to separate the different treatment groups.
Or is this a situation where I need keywords for every parameter I want to compare (treatment group etc.), and should just concatenate everything once?
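A minimal sketch of the single-concatenation idea, in Python for illustration only: every event is tagged with the ID of the sample it came from as it is merged, and a separate per-sample metadata table is then used to split the merged events by treatment group or by sex. The sample IDs, drug labels, sex values and marker numbers below are hypothetical, and this is not how any particular tool stores its data; it only shows that no group information is lost as long as the tag travels with each event.

```python
# Hypothetical per-sample metadata: sample ID -> treatment group and sex.
METADATA = {
    "group1_mouse1": {"group": "drug_A", "sex": "F"},
    "group1_mouse2": {"group": "drug_A", "sex": "M"},
    "group2_mouse1": {"group": "drug_B", "sex": "F"},
}


def concatenate(samples):
    """Merge per-sample event lists into one list, tagging each event with its sample ID."""
    merged = []
    for sample_id, events in samples.items():
        for event in events:
            row = dict(event)  # copy so the input events are left untouched
            row["sample_id"] = sample_id
            merged.append(row)
    return merged


def split_by(merged, metadata, key):
    """Recover subsets of the merged events using any metadata column (e.g. 'group', 'sex')."""
    subsets = {}
    for row in merged:
        label = metadata[row["sample_id"]][key]
        subsets.setdefault(label, []).append(row)
    return subsets


# Usage with made-up marker values (one dict per event).
samples = {
    "group1_mouse1": [{"CD3": 1.2}, {"CD3": 0.4}],
    "group1_mouse2": [{"CD3": 2.1}],
    "group2_mouse1": [{"CD3": 0.9}, {"CD3": 1.7}],
}
merged = concatenate(samples)
by_group = split_by(merged, METADATA, "group")
by_sex = split_by(merged, METADATA, "sex")
print(len(merged), len(by_group["drug_A"]), len(by_sex["F"]))  # prints: 5 3 4
```

The same lookup works for any per-sample attribute you add to the metadata table, which is why one concatenation of all 51 files does not prevent comparisons between groups.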

Thank you

MCOlivier

Contributor

Posts: 47

Joined: Mon Oct 05, 2015 9:48 am

Location: European Genomic Institute for Diabetes, University of Lille, Institut Pasteur de Lille, France

Post Wed Dec 06, 2023 1:28 pm

Re: Concatenating a large number of samples

Hi Lou,

The thing is that, depending on the package/software you are using, you can trace the individual files and identify them by their names, like group1_mouse1, group1_mouse2, group2_mouse1, group2_mouse2... For dimension reduction, a subset of events is downsampled from every input file, and the dimension reduction is then performed on those downsampled events as a single bulk of cells (as if they were concatenated). So, if you use a package/software that allows you to group individuals after dimension reduction, you can pool your samples without any need for concatenation, which keeps the individual data untouched (and that is best).
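The pooling step described above can be sketched as follows, assuming file names follow the group1_mouse1 pattern used as an example in the post and using an arbitrary, hypothetical number of events per file. Taking the same number of events from every file keeps any single mouse from dominating the pooled set used for dimension reduction, while the file, group and mouse labels stay attached to each event so the cells can be regrouped afterwards. Real packages do this internally, and their exact sampling rules may differ.

```python
import random
import re

# Assumed naming convention, e.g. "group1_mouse1.fcs".
NAME_PATTERN = re.compile(r"^(group\d+)_(mouse\d+)")


def parse_name(filename):
    """Return (group, mouse) parsed from a file name such as 'group2_mouse5.fcs'."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return match.group(1), match.group(2)


def downsample_per_file(files, n_per_file, seed=42):
    """Draw up to n_per_file events from every file and pool them, keeping their labels."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    pooled = []
    for filename, events in files.items():
        group, mouse = parse_name(filename)
        k = min(n_per_file, len(events))  # small files contribute everything they have
        for event in rng.sample(events, k):
            pooled.append({"file": filename, "group": group, "mouse": mouse, "event": event})
    return pooled


# Usage with placeholder events (integers stand in for per-cell measurements).
files = {
    "group1_mouse1.fcs": list(range(1000)),
    "group2_mouse1.fcs": list(range(50)),
}
pooled = downsample_per_file(files, n_per_file=200)
print(len(pooled))  # prints: 250
```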

As an example, if you use cytofkitlab or cytofkit, you can group your samples after dimension reduction and clustering in the Shiny app interface, getting grouped as well as individual data. Downstream statistical power is conserved, because clusters are quantified per group with n = the number of individual samples.
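To make the point about statistical power concrete: after clustering the pooled cells, the quantity usually compared between groups is each mouse's percentage of cells in a cluster, so every group keeps n = number of mice rather than n = number of cells. A minimal sketch with hypothetical sample names, cluster IDs and group labels (it does not reproduce cytofkitlab's own output format):

```python
from collections import Counter, defaultdict


def cluster_frequencies(labels_by_sample):
    """Percentage of each sample's cells that fall in each cluster."""
    freqs = {}
    for sample, labels in labels_by_sample.items():
        counts = Counter(labels)
        total = len(labels)
        freqs[sample] = {cluster: 100 * n / total for cluster, n in counts.items()}
    return freqs


def group_values(freqs, sample_group, cluster):
    """One value per sample (mouse) for the given cluster, collected by group."""
    out = defaultdict(list)
    for sample, f in freqs.items():
        # A sample with no cells in the cluster contributes 0 %.
        out[sample_group[sample]].append(f.get(cluster, 0.0))
    return dict(out)


# Usage: cluster labels for every cell of three hypothetical mice.
labels_by_sample = {
    "mouse1": [0, 0, 1, 1],
    "mouse2": [0, 1, 1, 1],
    "mouse3": [1, 1, 1, 1],
}
sample_group = {"mouse1": "drug_A", "mouse2": "drug_A", "mouse3": "drug_B"}
freqs = cluster_frequencies(labels_by_sample)
print(group_values(freqs, sample_group, cluster=0))  # prints: {'drug_A': [50.0, 25.0], 'drug_B': [0.0]}
```

The per-group lists are what a statistical test between treatments would then take as input, one value per mouse.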

Hope this helps ;-)
Regards
Olivier

loukias

Participant

Posts: 4

Joined: Tue Oct 31, 2023 2:05 pm

Post Wed Dec 06, 2023 2:55 pm

Re: Concatenating a large number of samples

MCOlivier wrote:So, if you use a package/software that allow you to group individuals after dimension reduction, you can totally pool your samples without not any need for concatenation (keeping individual data untouched, what is best).


Hi Olivier, thank you for your reply!
I have limited bioinformatics knowledge, so I will start analysing in FlowJo for now. From what I have seen so far, to compare data in FlowJo the dimension reduction needs to be run on all samples as one concatenated file.
What software or package could I use to group after dimension reduction?

I have not been able to use cytofkit in R at all. Do I need to run an older version of R to be able to use it?

Ideally, I would not group individuals, as I also want to look at things like sex-specific variation in the data, but in FlowJo I have not really seen larger numbers of samples being analysed like this.

MCOlivier

Contributor

Posts: 47

Joined: Mon Oct 05, 2015 9:48 am

Location: European Genomic Institute for Diabetes, University of Lille, Institut Pasteur de Lille, France

Post Wed Dec 06, 2023 3:02 pm

Re: Concatenating a large number of samples

back ;-)

Sorry, but I do not like the FlowJo plugins at all, so I simply don't use them.
You cannot run dimension reduction (t-SNE or UMAP) independently on different samples and compare them, because the embeddings will always differ from one run to another. You must run all your samples at once. If FlowJo really needs one single concatenated file, it's like paleolithic... (sorry to any FlowJo fans ;-)

Try CytofKitLab here :
https://github.com/i-cyto/cytofkitlab

It's CytofKit improved by @Samuel Granjeaud, with an easy GUI to start with "click and play" buttons. This GUI is really convenient for beginners (even if it lacks some of the tunable SNE/UMAP parameters). Be careful with the type of transformation you apply to your data, as it depends on the cytometry technology you are using (i.e. arcsinh for mass and logicle for flow).
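For reference, the arcsinh transformation mentioned above is asinh(x / cofactor). The sketch below uses a cofactor of 5, a common default for mass cytometry data; treat that value as an assumption to check against your own panel and software. The logicle transform used for flow data is more involved and is not shown here.

```python
import math


def arcsinh_transform(values, cofactor=5.0):
    """Apply asinh(x / cofactor) to each raw intensity.

    Near zero this is roughly linear (about x / cofactor); for large values it
    grows like log(2 * x / cofactor); it is also defined for negative values.
    The default cofactor of 5 is an assumed, commonly used value for CyTOF.
    """
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(v / cofactor) for v in values]


# Usage: raw intensities for one channel (made-up numbers).
raw = [0.0, 5.0, 50.0, 500.0]
print([round(v, 3) for v in arcsinh_transform(raw)])
```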

You just need to follow the installation instructions on the GitHub page, then load the package in your library, then launch the GUI.

Enjoy ;-)

Olivier

CRStevens

Master

Posts: 61

Joined: Thu Jul 17, 2014 5:07 pm

Post Wed Dec 06, 2023 3:20 pm

Re: Concatenating a large number of samples

I agree it is important to do your dimensionality reduction on all samples together. I would also say that doing this type of work in FlowJo is probably more complicated than using some of the other analysis tools that are out there right now. If R isn't your cup of tea, try other platforms like OMIQ or Tercen, etc. I know Tercen has a free public version that allows you to do quite a bit of work without paying a license fee. I believe it is based on the amount of storage you use.

These platforms also allow you to add your metadata to your analysis, so you can visualize your tSNE afterwards by group/sex/age etc.
Connecting this metadata is a really important factor, and one that third-party vendors have started to incorporate into their platforms.

These platforms are bridging the gap between biologists and bioinformaticians. A decade ago I forced myself to learn R out of necessity, given the lack of analysis tools available, but with so many new tools out there it's not as necessary anymore.

-Chad

loukias

Participant

Posts: 4

Joined: Tue Oct 31, 2023 2:05 pm

Post Wed Dec 06, 2023 3:31 pm

Re: Concatenating a large number of samples

Thank you for all of your tips! I will look up those other platforms too and give it a go. Thank you for the cytofkit link as well!

Lou

tomash

Contributor

Posts: 25

Joined: Sun Oct 19, 2014 10:15 pm

Post Thu Dec 07, 2023 11:54 pm

Re: Concatenating a large number of samples

Hi Lou,

Excellent questions. I would like to add a few thoughts to help frame your approach here.

1. As mentioned by others, you definitely want to merge _all_ your samples together into a single analysis. Think of it this way: you want the clustering algorithm to run on a single dataset, which may (or may not) be comprised of multiple samples/groups etc. It's almost like you are blinding the algorithm to the presence of multiple samples, and then unmixing them after the analysis is complete. Depending on the software you use, it may require you to explicitly merge these together before the analysis, or it might do this 'under the hood', so check the instructions carefully.

2. Try not to think of this analysis as 'tSNE analysis' -- tools like tSNE and UMAP are helpful for visualising things with single-cell resolution, but they a) don't actually do the quantification or statistical work and b) have limitations to do with interpretations etc. The algorithms doing the heavy lifting are typically clustering or perhaps classification algorithms. This is also important for scale -- to get a reasonable statistical analysis, you want as many cells per sample as possible. Clustering algorithms like FlowSOM can take tens of millions of cells no problem, but tSNE will struggle above ~100K. Other options like Opt-SNE, FIt-SNE, or UMAP will handle larger numbers of cells, but all suffer from the same fundamental problem that they won't scale well. In this case it doesn't matter too much -- you can run the full analysis using clustering, and just visualise a subset of cells using tSNE/UMAP etc.

3. We detail such a workflow in our analysis toolkit Spectre. It is an R package and requires interacting with code instead of a point-and-click interface, but it's designed to be user friendly for wet-lab scientists, so feel free to give it a try. The key feature that may be most helpful for you is that it is designed explicitly for rapid processing of very large datasets, aided by the data.table framework in R. We also detail how to replicate the design of our workflow (i.e. merge samples --> cluster --> downsample --> tSNE/UMAP/whatever) in other programs like FlowJo. You can use the same strategy we outline in other programs too, including CytofKitLab.
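The order of operations in points 2 and 3 (merge samples, cluster every cell, then downsample only for the tSNE/UMAP plot) can be sketched in plain Python. This is not Spectre's code: a tiny k-means with farthest-first initialisation stands in for FlowSOM purely to keep the example self-contained, and the sample names and 2-D "events" are hypothetical. The embedding step itself is not shown; the plotted subset would be what gets passed to tSNE/UMAP.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=20):
    """Plain k-means (a stand-in for FlowSOM) with deterministic farthest-first init."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(sqdist(p, c) for c in centroids)))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each cell goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its cells.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def analyse(samples, k, n_plot_per_sample, seed=0):
    """Merge all samples, cluster every cell, then downsample per sample for plotting."""
    # 1. Merge: one pooled list of (sample_id, event) pairs.
    merged = [(sid, ev) for sid, events in samples.items() for ev in events]
    # 2. Cluster all cells together, blind to which sample they came from.
    labels = kmeans([ev for _, ev in merged], k)
    # 3. Downsample per sample for the tSNE/UMAP plot; labels travel with each cell.
    rng = random.Random(seed)
    by_sample = {}
    for (sid, ev), lab in zip(merged, labels):
        by_sample.setdefault(sid, []).append((ev, lab))
    plot_subset = []
    for sid, rows in by_sample.items():
        for ev, lab in rng.sample(rows, min(n_plot_per_sample, len(rows))):
            plot_subset.append((sid, ev, lab))
    return labels, plot_subset


# Usage: two hypothetical samples, each with cells from two well-separated populations.
a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)]
b = [(10.0, 10.0), (10.2, 9.9), (9.8, 10.1)]
labels, subset = analyse({"s1": a + b, "s2": b + a}, k=2, n_plot_per_sample=2)
print(len(labels), len(subset))  # prints: 12 4
```

Because every cell gets a cluster label before any downsampling, cluster frequencies can still be computed from all cells, and the plotted subset simply inherits those labels for colouring.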

https://immunedynamics.io/spectre/

Check out the getting started tutorials, and then the 'simple discovery' workflow is the one to use for the kind of analysis you have described (R and FlowJo versions). The demo dataset is CNS cells from a group of 8 mice (4x mock and 4x virus infected).

If you need any help, feel free to reach out -- you can reply here, or we have a discussion board https://github.com/ImmuneDynamics/Spectre/discussions, or you can email us directly.

Good luck

Tom

loukias

Participant

Posts: 4

Joined: Tue Oct 31, 2023 2:05 pm

Post Tue Dec 12, 2023 12:58 pm

Re: Concatenating a large number of samples

Hi Tom,
Thank you for the tips!! I will have a look at Spectre as well and see how I get along!
I was originally considering tSNE to complement FlowSOM analysis as a starting point, so thank you for pointing out some of the issues with it! I will definitely keep that in mind, and I will check out UMAP too. :)
