Data analysis development: wishlists and datasets?
On another thread (viewtopic.php?f=3&t=823), Adeeb made a comment about having some datasets in preparation to help out with the informatics problem of doublet discrimination/exclusion. This was partly in response to El-ad's comments (Wed Oct 25, 2017 3:32 pm) about not having quite the right dataset to do some of this modeling.
It got me thinking a bit about datasets, and what various labs might have "lying around" that would be helpful in things like this. While FlowRepository and ImmPort (among others) are great, I'm sure just about every lab has some data that would be useful to *someone* doing algorithm development, benchmarking, etc., even if it's not necessarily publishable by itself.
For example, the HIMC runs a healthy control sample on almost every plate of samples we process, so we have replication data that we haven't really done a ton with, aside from using it as a qualitative QC check that the assay performed properly. But it could potentially be useful to a data scientist looking at day/week/month/year variation across in-house reagent lots, across instruments (one specific instrument vs. another, different instrument models, etc.), and so on.
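To make that concrete, here is a rough sketch of the kind of first-pass look someone could take at per-plate control data: the coefficient of variation (CV) of one population frequency, grouped by instrument. Everything in it is hypothetical and only for illustration: the record layout, the field names (`instrument`, `run_date`, `pct_cd4_t`), the instrument labels, and the numbers are made up and are not HIMC data.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: one healthy-control run per plate.
# Field names and values are invented for illustration only.
control_runs = [
    {"instrument": "CyTOF_A", "run_date": "2017-09-04", "pct_cd4_t": 30.0},
    {"instrument": "CyTOF_A", "run_date": "2017-09-18", "pct_cd4_t": 32.0},
    {"instrument": "CyTOF_A", "run_date": "2017-10-02", "pct_cd4_t": 34.0},
    {"instrument": "CyTOF_B", "run_date": "2017-09-05", "pct_cd4_t": 28.0},
    {"instrument": "CyTOF_B", "run_date": "2017-09-19", "pct_cd4_t": 36.0},
    {"instrument": "CyTOF_B", "run_date": "2017-10-03", "pct_cd4_t": 32.0},
]


def cv_by_group(runs, group_key, value_key):
    """Return n, mean, and percent CV of value_key for each group_key value."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run[group_key]].append(run[value_key])

    summary = {}
    for group, values in grouped.items():
        # Sample SD needs at least two runs; CV is undefined for a zero mean.
        if len(values) < 2 or mean(values) == 0:
            continue
        group_mean = mean(values)
        summary[group] = {
            "n": len(values),
            "mean": group_mean,
            "cv_pct": 100.0 * stdev(values) / group_mean,
        }
    return summary


if __name__ == "__main__":
    for instrument, stats in sorted(cv_by_group(
            control_runs, "instrument", "pct_cd4_t").items()):
        print(f"{instrument}: n={stats['n']} "
              f"mean={stats['mean']:.1f}% CV={stats['cv_pct']:.2f}%")
```

The same function could be pointed at a different grouping field (a reagent lot or a calendar month, say) if the records carried one. It is only a descriptive summary, not a batch-correction method.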
On the flip side, I've also encountered a few instances where a data scientist was surprised by a problem or analysis need that I raised: I don't think the bench scientists are necessarily communicating *that* back to the data people.
So, two questions that I hope will spark a lot of discussion:
1) Data scientists, bioinformaticians, and algorithm developers (and combinations thereof): for the problems/issues you see in CyTOF data analysis, what datasets do you need (or at least want) from the bench scientists in order to better tackle them? In other words, what's your wishlist?
2) Bench researchers: what do you see as some still-unmet needs in data analysis? I'm thinking more about unsolved problems: things like A) the clean doublet discrimination mentioned above; B) detection and hopefully elimination of sample carryover; C) batch correction (both for barcoded samples *and* non-barcoded samples with a plate control). In other words, what's your wishlist?
I'm mainly thinking of datasets that have already been acquired but haven't (yet) been made publicly available, more so than datasets created from scratch. But creating new ones would also be an option for the field as a whole, and probably a good chance for new research collaborations!
Mike