Post Tue Nov 05, 2024 7:05 pm

2024-Wang et al-preprint

"Generalized cell phenotyping for spatial proteomics with language-informed vision models"
Xuefei (Julie) Wang, Rohit Dilip, Yuval Bussi, Caitlin Brown, Elora Pradhan, Yashvardhan Jain, Kevin Yu, Shenyi Li, Martin Abt, Katy Borner, Leeat Keren, Yisong Yue, Ross Barnowski, David Ashley Van Valen
bioRxiv, posted November 03, 2024
https://doi.org/10.1101/2024.11.02.621624

- will update when/if this is peer-reviewed and published


- Honestly, it's difficult to determine exactly which IMC and MIBI-TOF data are being reused here: MIBI-TOF reference 1 is 2019-Keren et al-Sci Adv (a commonly reused dataset), while IMC reference 3 is 2014-Giesen et al-Nat Methods

- "To create a dataset that captured the diversity of marker panels, cellular morphologies, tissue heterogeneity, and technical artifacts present in the field, we first compiled data from published sources 19,27–40, as well as unpublished data deposited in the HuBMAP data portal."

- "The resulting dataset, Expanded TissueNet, consists of 10.5 million cells, spanning six imaging platforms: Imaging Mass Cytometry (IMC) 3, CO-Detection by indEXing (CODEX) 2, Multiplex Ion Beam Imaging (MIBI) 1, Iterative Bleaching Extends Multiplexity (IBEX) 5, MICS (MACSima Imaging Cyclic Staining) 4, and Multiplexed immunofluorescence (MxIF) with Cell DIVE™ technology (Leica Microsystems, Wetzlar, Germany), with the majority of data coming from the first three."

- "Instruction for downloading the pretrained model weights and a subset of ExpandedTissueNet that includes all data sourced from public datasets (5.2 million cells) is available at https://vanvalenlab.github.io/deepcell-types. The remaining datasets were made available to our lab before their publication to improve model performance. These are available upon reasonable request and will be made publicly available upon publication of the corresponding manuscripts."