Hi all,
Like Lisa, I get this question all the time. There definitely are some algorithms I prefer, or steer people toward as a "first pass"... like Tom, FlowSOM is usually my own first pass nowadays.
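Since FlowSOM keeps coming up as a first pass (for Tom and for me), here's roughly what a minimal first-pass run looks like in R, for anyone who hasn't tried it. To be clear, this is just a sketch and not the one "right" way to run it: the file name and the marker-column indices are placeholders you'd swap for your own panel, and the compensation/transformation settings depend entirely on how your data were acquired.
[code]
library(FlowSOM)

# Placeholder: marker channels to cluster on (depends entirely on your panel)
marker_cols <- 9:20

# Read one FCS file; transform/scale settings depend on your preprocessing
fsom <- ReadInput("my_sample.fcs", compensate = FALSE,
                  transform = TRUE, toTransform = marker_cols, scale = TRUE)

# Build the self-organizing map and the minimal spanning tree on those markers
fsom <- BuildSOM(fsom, colsToUse = marker_cols)
fsom <- BuildMST(fsom)

# Metacluster the SOM nodes down to a manageable number of populations
meta <- MetaClustering(fsom$map$codes,
                       method = "metaClustering_consensus", max = 15)

# Quick visual sanity check, plus per-cell metacluster assignments
PlotStars(fsom)
cell_meta <- meta[fsom$map$mapping[, 1]]
table(cell_meta)
[/code]
From there I'd go back and forth between the plots and the metacluster frequencies before trusting anything.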
I definitely agree that not every algorithm is going to be appropriate for every study/dataset. This *is* part of the reason I recently posted about compiling a database of publicly available CyTOF datasets (viewtopic.php?f=3&t=1047): not only so algorithm developers have a variety of things to use to try to "break" (or at least strongly test) their algorithms, but also as a resource for people generating data, so they can look at other datasets that might be "similar" enough (panel, cell type, staining conditions, cell rarity, etc.), see what's been used successfully, and use that as a place to *start*. Obviously, newer algorithms will have come out in the meantime, but I find a lot of my users have analysis paralysis of "where do I even start?!"
Regarding the benchmarking studies: FlowCAP and some of the other efforts seem to use a lot of the same datasets, and even a lot of the newer algorithm papers reuse them. This is good in some ways, as it gives a way to relate the different results. However, I think it is also a limitation: as good as, say, the Bendall et al. Science paper dataset is, it's not the only type of experiment that people are doing. Therefore, if your experiment is designed differently from that type of study, the current benchmarking articles may not give you much to go on.
Let me be clear: I don't think every algorithm is good for every study. And I also don't think it's bad to *demonstrate* both the strengths *and* weaknesses of new algorithms; I personally love it when authors give examples of the limitations of their approaches, whether in the paper or at least on the GitHub page for the code!
That also leads me into another topic: documentation. I think algorithm developers (of new and old code alike) can do a better job of documenting how to use their software. One example of this is the original documentation for Citrus on GitHub: the paper was published in PNAS in July 2014, but the "Getting Started" GitHub wiki didn't explicitly state that Citrus requires 8 or more samples in each experimental group in order to work as expected until Nov 2014.
I'm not meaning to single out the Citrus developers here: a lot of algorithm documentation (I would personally say most) suffers in this respect. From the developers' POV, much of this may be obvious from the way they wrote their code. However, from the end-user POV, it makes it much harder for us to quickly rule out algorithms for which our dataset (N, cell rarity, etc.) isn't appropriate.
Or, put another way: we users don't want to misuse your algorithms any more than you *want* them to be misused.
So, Lisa, Tom, and others: are there particular newer or more troublesome datasets that you would especially recommend authors keep in mind for future benchmarking studies? Developers, are there datasets you would like to *see*, so you can *try* to break your algorithms on them (viewtopic.php?f=3&t=874&p=2538)?
Ones I might recommend are some of the datasets from Amir Horowitz or Catherine Blish, where there's a highly cell-type-focused panel that additionally has some rare cells in it... I think that's the kind of data a number of algorithms will have trouble with (particularly those that strongly downsample or use frequency thresholds), but it would be relevant to people trying to find things at the rarity level of, say, tetramer-positive cells.
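Just to make the downsampling/rarity point concrete, here's a rough back-of-envelope in R; all the numbers are made up purely for illustration, not taken from any particular dataset:
[code]
# Back-of-envelope: how many rare cells survive heavy downsampling?
# All numbers below are invented for illustration only.
total_events   <- 500000   # events in the original file
rare_frequency <- 0.0005   # e.g., a tetramer-positive-like population at 0.05%
downsample_to  <- 10000    # events kept by an algorithm that downsamples aggressively

total_events  * rare_frequency   # ~250 rare cells in the full file
downsample_to * rare_frequency   # ~5 rare cells left for the algorithm to find
[/code]
With only a handful of those events left, it's not surprising when a clustering approach merges them into a neighboring population or a frequency threshold drops them entirely.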
Mike