CyTOF/IMC/MIBI dataset reuse

mleipold

Guru

Posts: 7177

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Thu Apr 25, 2024 3:53 pm

Hi all,

I posted this on my LinkedIn earlier this week, but thought that the broader Cytoforum audience might be interested too.

Since we just passed 4000 CyTOF/IMC/MIBI-TOF papers, I thought I'd go back and look at how often datasets are reused.


This is a tricky number to calculate: for example, the Weber and Robinson 2016 algorithm comparison paper is often cited, but they didn't generate any new data in their FR-FCM-ZZPH accession: that's a compilation of several different datasets (including 2015-Levine et al-Cell and 2016-Samusik et al-Nat Methods).

Whenever possible, I go back to original papers. So, in this example, I wouldn't count 2016-Weber and Robinson, but would instead count 2015-Levine et al-Cell and 2016-Samusik et al-Nat Methods.
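The counting rule above can be sketched in a few lines of Python. This is only an illustration, not the actual tally: the `COMPILATIONS` table and the `count_reuse` helper are hypothetical names, the table lists only the two source datasets named above (the real accession bundles more), and counting each original dataset at most once per reusing paper is an assumption.

```python
from collections import Counter

# Hypothetical lookup: compilation paper -> original datasets it bundles.
# Only the two datasets named in the post are listed; the list is partial.
COMPILATIONS = {
    "2016-Weber and Robinson": [
        "2015-Levine et al-Cell",
        "2016-Samusik et al-Nat Methods",
    ],
}

def count_reuse(papers):
    """Tally reuse per original dataset.

    `papers` is a list of lists: the dataset sources each reusing paper cites.
    Compilations are replaced by their original datasets, and each original
    is counted at most once per paper (an assumption of this sketch).
    """
    counts = Counter()
    for sources in papers:
        originals = set()
        for name in sources:
            # Fall back to the name itself when it is not a known compilation.
            originals.update(COMPILATIONS.get(name, [name]))
        counts.update(originals)
    return counts

# Made-up example papers, for illustration only.
papers = [
    ["2016-Weber and Robinson"],
    ["2015-Levine et al-Cell"],
    ["2016-Weber and Robinson", "2016-Samusik et al-Nat Methods"],
]
reuse = count_reuse(papers)
```

With these made-up inputs, the compilation itself never appears in the tally; its reuse is credited to the original papers.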


That aside, by my count, CyTOF/IMC/MIBI-TOF datasets have been reused a total of 856 times, spread across more than 300 distinct datasets.

1. The 5 most-reused datasets account for 25% of all reuse. These are 3 suspension datasets, 1 IMC, and 1 MIBI-TOF.
* These are also the primary datasets being used for algorithm development, which is limiting and, in my opinion, troubling.

2. The top 10 account for 1/3 of all reuse, and are also the only datasets reused 10 or more times.

3. Only 33 datasets (~9% of all datasets) are used 5+ times, and account for 51% of all reuse.

Looking at it another way, there's a really long tail of ~180 datasets that are reused only once.
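For anyone who wants to run the same kind of summary on their own tally, here is a minimal Python sketch of the arithmetic behind the statements above (share of reuse held by the top k datasets, and the size of the reused-once tail). The function names and the `toy_counts` list are made up for illustration; they are not the real per-dataset counts behind the figures in this post.

```python
def top_k_share(counts, k):
    """Fraction of all reuse events accounted for by the k most-reused datasets."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0

def reused_exactly(counts, n):
    """Number of datasets reused exactly n times (n=1 gives the long tail)."""
    return sum(1 for c in counts if c == n)

# Toy per-dataset reuse counts (hypothetical, chosen to sum to 100).
toy_counts = [40, 30, 10, 5, 5, 3, 2, 1, 1, 1, 1, 1]

share_top2 = top_k_share(toy_counts, 2)   # share held by the 2 most-reused
share_top5 = top_k_share(toy_counts, 5)   # share held by the 5 most-reused
tail = reused_exactly(toy_counts, 1)      # datasets reused only once
```

Feeding in a real list of per-dataset reuse counts in place of `toy_counts` would give the same kind of top-5, top-10, and long-tail numbers quoted above.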


Mike
Attachments
Datset reuse-count and sum-042324.png (231.78 KiB)
mleipold

Post Thu Apr 25, 2024 6:55 pm

Re: CyTOF/IMC/MIBI dataset reuse

So, a related question: What is limiting dataset reuse?

Put differently, if you're a researcher and generate CyTOF/IMC/MIBI data, do you try to search for an independent dataset that might support your findings? Or if you're an algorithm developer, how do you decide which dataset(s) (beyond Levine or Samusik! or even Jackson and Keren for imaging!) to use to test or benchmark your algorithm?

If you don't search for independent data, why not? Is it too hard to find relevant independent datasets? Or are the ones you find not useful (too different from your panel, N too small, or some confounder like a strong batch effect that makes re-analysis too difficult)?


Given how much data reuse there is in sequencing, I'm really curious why that field has adopted reuse so thoroughly, while cytometry (CyTOF or flow) hasn't to the same extent.
