CyTOF/IMC/MIBI dataset reuse
Hi all,
I posted this on my LinkedIn earlier this week, but thought that the broader Cytoforum audience might be interested too.
Since we just passed 4000 CyTOF/IMC/MIBI-TOF papers, I thought I'd go back and look at how often datasets are reused.
This is a tricky number to calculate. For example, the Weber and Robinson 2016 algorithm comparison paper is often cited, but its FR-FCM-ZZPH accession contains no new data; it is a compilation of several earlier datasets (including 2015-Levine et al-Cell and 2016-Samusik et al-Nat Methods).
Whenever possible, I go back to original papers. So, in this example, I wouldn't count 2016-Weber and Robinson, but would instead count 2015-Levine et al-Cell and 2016-Samusik et al-Nat Methods.
That aside, by my count, CyTOF/IMC/MIBI-TOF datasets have been reused a total of 856 times, spread across more than 300 different datasets.
1. The top 5 datasets reused account for 25% of all reuse. These are 3 suspension datasets, 1 IMC, and 1 MIBI-TOF.
* These are also the primary datasets used for algorithm development, which is limiting and, in my opinion, troubling.
2. The top 10 account for 1/3 of all reuse, and are also the only datasets reused 10 or more times.
3. Only 33 datasets (~9% of all datasets) are used 5+ times, and account for 51% of all reuse.
Looking at it another way, there's a really long tail of ~180 datasets that are reused only once.
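For anyone who wants to run the same kind of tally on their own list, here is a minimal sketch of the arithmetic behind the numbers above (top-N share of reuse, and how many datasets fall in a given reuse range). The function names are hypothetical and the counts are made-up placeholders, not the real tallies; it assumes you already have one reuse count per original dataset.

```python
def top_k_share(reuse_counts, k):
    """Fraction of all reuse events accounted for by the k most-reused datasets."""
    ordered = sorted(reuse_counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


def count_with_reuse(reuse_counts, at_least=1, at_most=None):
    """Number of datasets whose reuse count falls in [at_least, at_most]."""
    return sum(
        1 for c in reuse_counts
        if c >= at_least and (at_most is None or c <= at_most)
    )


# Placeholder counts (NOT the real tallies): one entry per original dataset.
toy_counts = [10, 5, 3, 1, 1]

print(f"top-1 share of reuse: {top_k_share(toy_counts, 1):.0%}")
print(f"top-2 share of reuse: {top_k_share(toy_counts, 2):.0%}")
print(f"datasets reused exactly once: {count_with_reuse(toy_counts, 1, 1)}")
print(f"datasets reused 5+ times: {count_with_reuse(toy_counts, 5)}")
```

Counting against the original paper rather than a compilation accession (as in the Weber and Robinson example) has to happen before this step, when the per-dataset counts are built.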
Mike