Median transformation?
Posted: Mon Feb 19, 2024 4:25 pm
Hello,
I'm new to CyTOF but have a lot of experience analyzing single-cell RNA data.
In single-cell RNA analysis, it is common to summarize the expression of a gene in a cell type using the mean (or sum) of expression across cells. However, in CyTOF, I've noticed that everyone uses the median.
Looking at the distribution of markers in my CyTOF data, it's clear to me that the data generally have a very long right tail (even after arcsinh(x/5) transformation). So, my guess is that people use the median because they are worried that the mean will be very sensitive to points in that tail. However, what is less clear to me is whether these points in the right tail are really "outliers", or whether they are just cells with high expression being correctly measured. Is there a reason to believe that these cells are technical artifacts? Or, is there a biological reason to care less about cells with very high expression relative to the rest of the population (which would cause one to prefer the median to the mean)?
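To make the concern concrete, here is a minimal sketch in Python on simulated data (not my real data). The log-normal parameters, the 5% "bright" fraction, and the helper name `summarize_marker` are all assumptions chosen for illustration; only the arcsinh(x/5) transformation comes from what I described above.

```python
import numpy as np


def summarize_marker(counts, cofactor=5.0):
    """Return the mean and median of arcsinh(counts / cofactor)."""
    transformed = np.arcsinh(np.asarray(counts, dtype=float) / cofactor)
    return {
        "mean": float(transformed.mean()),
        "median": float(np.median(transformed)),
    }


# Simulated marker intensities (hypothetical values, not measured data).
rng = np.random.default_rng(0)
bulk = rng.lognormal(mean=1.0, sigma=0.5, size=9500)   # most cells, low signal
bright = rng.lognormal(mean=5.0, sigma=0.5, size=500)  # 5% of cells, high signal

with_tail = summarize_marker(np.concatenate([bulk, bright]))
without_tail = summarize_marker(bulk)

# How far does each summary move when the bright cells are included?
mean_shift = with_tail["mean"] - without_tail["mean"]
median_shift = with_tail["median"] - without_tail["median"]

print("with tail:   ", with_tail)
print("without tail:", without_tail)
print("mean shift:", mean_shift, "median shift:", median_shift)
```

In this toy setup the mean moves much more than the median when the bright cells are included. That shows the sensitivity I am describing, but it does not tell me whether such cells should be treated as artifacts or as real signal.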
Thanks for your help!