
2018-Cosma-Cytometry A


mleipold

Guru

Posts: 5796

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Mar 02, 2018 7:35 pm

2018-Cosma-Cytometry A

"Universal cell type identifier based on number theory"
Antonia Cosma
Cytometry A, 2018,
DOI: 10.1002/cyto.a.23346

mleipold

Guru

Posts: 5796

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Mar 02, 2018 8:15 pm

Re: 2018-Cosma-Cytometry A

Hi all,

We can move this discussion over to Data Analysis if we want, but I think it might be useful to keep the initial discussion tied to the paper.

I think this is an interesting idea, giving a cell type a "number" based on the prime numbers assigned to the CD markers it expresses. This could allow faster searches for similarity with cells found in other papers, without completely limiting it to only exactly the same markers.
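To make the encoding concrete, here's a minimal Python sketch (the marker-to-prime assignments here are invented for illustration; the paper's Supporting Information Table 1 defines the real mapping). Because the identifier is a product of distinct primes, "phenotype A's markers are a subset of phenotype B's" becomes a divisibility test, and the shared markers fall out of a gcd:

```python
from math import gcd

# Hypothetical prime assignments -- the paper's Supporting Information
# Table 1 defines the actual marker-to-prime mapping.
PRIMES = {"CD3": 19, "CD4": 23, "CD45RA": 151, "CD62L": 277, "CD127": 983}

def unn(markers):
    """Universal number: multiply the primes of the expressed markers."""
    n = 1
    for m in markers:
        n *= PRIMES[m]
    return n

naive_cd4 = unn(["CD3", "CD4", "CD45RA", "CD62L", "CD127"])
cd4_t = unn(["CD3", "CD4"])

# Subset search: A's markers are contained in B's exactly when
# unn(A) divides unn(B) (unique factorization guarantees this).
print(naive_cd4 % cd4_t == 0)             # True

# Markers shared by two phenotypes correspond to their gcd.
print(gcd(cd4_t, unn(["CD4", "CD127"])))  # 23, i.e. just CD4
```

The divisibility test is what allows matching against phenotypes from other papers without requiring exactly the same marker set.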

The paper does expand on ways to accommodate "what if the marker I'm using doesn't have a CD number?" and "what if I have mid/low rather than pos/neg?"


I guess my only real comment is that it potentially has lower "resolution" for signal intensity than Jonathan Irish's Marker Enrichment Modeling (DOI: 10.1002/cpcy.34; DOI: 10.1038/nmeth.4149).


Thoughts? Especially from computational people who would be integrating such searches into workflows....

Mike

bc2zbUVA

Contributor

Posts: 22

Joined: Thu Nov 19, 2015 4:23 pm

Post Fri Mar 02, 2018 8:58 pm

Re: 2018-Cosma-Cytometry A

Very elegant approach, thanks for posting this. I don't have a huge issue with the loss in resolution, as the potential gain in the number of markers stored at once seems like it would be worthwhile. I haven't run into a search issue yet that hasn't been solvable by more elegant coding, but I will keep this approach in mind should it ever happen.

I'd love to see it applied to scRNA-seq, where we run into these sorts of issues a lot more frequently. Currently, I'm encoding immunophenotypes using the phenocodes approach from the flowType R package, but that varies from experiment to experiment. The thing I appreciate about this approach is that every ID that would be annotated could be derived regardless of the panel.

My biggest concern would be how this handles extremely large phenotypes, though apparently we can get up to 20,000 markers easily enough:

Of note, the 20,244th prime is a six-digit number, still falling in the range of the elliptic curve factorization method and the software package described above.


However, storing those digits is going to become a huge drain on memory at some point.

To demonstrate the capacity to handle big integers, I multiplied all the primes associated to the 401 CD markers shown in Supporting Information Table 1 to obtain a UNN of 1,177 digits


Storing thousands of digits per cell is going to be extremely memory hungry. By contrast, with the phenocode method, a 401-parameter signature requires only 401 digits. Again, the largest issue is search time, and I would be interested to see search comparisons of a phenocode vs. the UNN approach laid out here.
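For a rough sense of that gap, here's a sketch that approximately rebuilds the paper's 1,177-digit figure. It uses the first 401 primes as a stand-in for the actual Table 1 assignments, so the digit count comes out slightly lower than the paper's, but the order of magnitude is the point:

```python
def first_primes(count):
    """First `count` primes by simple trial division (fine at this scale)."""
    primes = []
    n = 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

# Phenocode: one digit per marker, so a 401-marker panel is 401 digits.
phenocode_digits = 401

# UNN: one prime per expressed marker, multiplied together. With the
# first 401 primes, the all-positive phenotype runs past 1,000 digits.
unn = 1
for p in first_primes(401):
    unn *= p
print(len(str(unn)))
```

So the worst-case UNN is roughly three times longer than the equivalent phenocode, and unlike the phenocode its length varies per phenotype.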

mleipold

Guru

Posts: 5796

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Mar 02, 2018 9:23 pm

Re: 2018-Cosma-Cytometry A

I would also like to mention some practical benchwork/experimental issues: anything that would cause an experimental artifact would potentially affect this coding. For example, freeze-thaw, fixation, methanol-perm, certain types of stimulation, etc are known to affect expression levels of certain markers.

In my experience, though, most of these usually wind up as negatives (eg, False Negative where antibody doesn't bind because you messed up its epitope, or True Negative, like CD4 downregulation upon certain strong stims). Therefore, if I'm understanding Cosma's plan, you just wouldn't multiply in that associated prime, and it would give you the same product as just not including the marker in the first place.

However, there are definitely cases of False Positives, which would potentially be more problematic. One example would be the increased CD14/CD16 binding to negatives from a pre-mixed cocktail that my coauthors and I saw in the Multicenter paper (https://doi.org/10.1016/j.jim.2017.11.008).
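To sketch both cases (prime assignments invented for illustration): a false negative just means the marker's prime is never multiplied in, which is indistinguishable from the marker never having been in the panel, while a false positive injects an extra prime and yields the UNN of a genuinely different phenotype:

```python
# Hypothetical assignments for illustration only.
PRIMES = {"CD3": 19, "CD4": 23, "CD14": 43, "CD127": 983}

def unn(markers):
    n = 1
    for m in markers:
        n *= PRIMES[m]
    return n

true_phenotype = unn(["CD3", "CD4", "CD127"])

# False negative (e.g. CD4 down-regulated after a strong stim): CD4's
# prime is simply absent, same as a panel that never measured CD4.
false_neg = unn(["CD3", "CD127"])
print(true_phenotype % false_neg == 0)  # True: still a searchable subset

# False positive (e.g. artifactual CD14 binding): an extra prime lands
# in the product and the UNN now encodes a different phenotype outright.
false_pos = unn(["CD3", "CD4", "CD14", "CD127"])
print(false_pos == true_phenotype)      # False
print(false_pos % PRIMES["CD14"] == 0)  # True: CD14 looks "expressed"
```

The asymmetry is the point: false negatives degrade gracefully into subsets, false positives do not.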


I'm not saying there aren't ways to deal with this, but better to keep it in mind from the beginning.....

Mike

cytoboy

Participant

Posts: 1

Joined: Thu Mar 08, 2018 9:07 am

Post Fri Mar 09, 2018 2:05 pm

Re: 2018-Cosma-Cytometry A

Finally, with a similar principle a more detailed definition could be achieved for population defined as expressing a marker at a “medium”/ “low” level by simply cubing the prime


This statement is wrong. You cannot proceed this way to obtain a more detailed definition of cell phenotypes.

Here is an issue based on the author example, if CD127 (983) had 3 levels of definition:

19*23*151*277*983*983*983 = 17361958221158712

but:

17361958221158712 = 2*2*2*3*7*977*3023*34991029

This demonstrates that the PPS system cannot handle one of the most complicated problems in defining cell populations: cell phenotypes are not defined in terms of the absence or presence of cell markers, but rather in terms of gradients of marker expression. It is quite scary that the editor and the reviewers did not notice that issue…

sgranjeaud

Master

Posts: 123

Joined: Wed Dec 21, 2016 9:22 pm

Location: Marseille, France

Post Fri Mar 09, 2018 2:35 pm

Re: 2018-Cosma-Cytometry A

Hi,
I discussed handling levels with Antonio. If I remember correctly, this is possible using powers of the prime numbers.
I will ask him to register on this forum and answer your questions.
Nevertheless, I think the levels should not be too numerous; otherwise the assignment will become very subjective.
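A sketch of the powers idea (hypothetical prime assignments and a made-up 3-step level scale): the exponent of each marker's prime carries the level, and unique factorization lets you read it back by repeated division:

```python
# Hypothetical mapping and level scale (1 = low, 2 = mid, 3 = high).
PRIMES = {"CD3": 19, "CD4": 23, "CD127": 983}

def encode(levels):
    """Multiply each marker's prime raised to its expression level."""
    n = 1
    for marker, level in levels.items():
        n *= PRIMES[marker] ** level
    return n

def level_of(n, marker):
    """Recover a level by counting how many times the prime divides n."""
    p, level = PRIMES[marker], 0
    while n % p == 0:
        n //= p
        level += 1
    return level

code = encode({"CD3": 3, "CD4": 2, "CD127": 1})
print(level_of(code, "CD4"))    # 2
print(level_of(code, "CD127"))  # 1
```

Because the primes are distinct, the exponents never interfere with each other, which is exactly the fundamental theorem of arithmetic at work.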
Cheers.

antonio

Contributor

Posts: 20

Joined: Wed Dec 04, 2013 3:05 pm

Post Fri Mar 09, 2018 2:41 pm

Re: 2018-Cosma-Cytometry A

Dear Cytoboy,

Some simple math.
If you multiply odd numbers, you get an odd number as a result, so your first calculation is simply wrong from the start.
I advise you to redo the calculation correctly and you will see that the PPS works.

Always calculate twice before claiming something is wrong!

Regards,

Antonio Cosma

PS: I would like to add that, apart from the calculation error, you are challenging the fundamental theorem of arithmetic, and it would be really scary if in more than 2,000 years nobody had noticed this problem.

laustr

Participant

Posts: 2

Joined: Thu Feb 22, 2018 6:26 pm

Post Fri Mar 09, 2018 4:05 pm

Re: 2018-Cosma-Cytometry A

bc2zbUVA wrote:
Storing thousands of digits per cell is going to be extremely memory hungry.


Yes, I guess, but wouldn't this be solved by using a dictionary? It's not like every cell has a different cell type...
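A sketch of what I mean (UNN values made up for illustration): intern each distinct UNN once in a lookup table and store only a small index per cell, so the big integers are paid for once per cell type rather than once per cell:

```python
# Hypothetical per-cell UNNs; real ones could run to many digits.
unns = [18677, 437, 18677, 437, 437, 966289]

table = {}       # UNN -> index, each distinct value stored once
per_cell = []    # one small integer per cell instead of a big integer
for n in unns:
    per_cell.append(table.setdefault(n, len(table)))

print(per_cell)    # [0, 1, 0, 1, 1, 2]
print(len(table))  # 3 distinct phenotypes stored once
```

With millions of cells but only hundreds of distinct phenotypes, the memory cost of the big integers becomes negligible.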

Sturla

mleipold

Guru

Posts: 5796

Joined: Fri Nov 01, 2013 5:30 pm

Location: Stanford HIMC, CA, USA

Post Fri Mar 09, 2018 4:44 pm

Re: 2018-Cosma-Cytometry A

Hi all,

First of all: let's please keep comments friendly.

Second: I think there might be a computational problem that's raising its head here. A lot of computer programs limit the number of significant figures involved in calculations.

For example, Excel truncates somewhere around 15 significant figures:
http://precisioncalc.com/what_is_xlprecision.html
https://stackoverflow.com/questions/344 ... flow-error

You can see this in my attached Excel example:
Screen Shot 2018-03-09 at 8.35.50 AM.png


Everything is going OK until the final multiplication by 983; then you get a case where a number ending in "1" is multiplied by a number ending in "3" and the result ends in "0".......which I think we can all agree is incorrect. Doing it on paper by hand, I come up with a number ending "...,158,713" rather than "...,158,700".


Assuming this is what's causing the issue today: this would be something that would have to be taken into CAREFUL account when writing programs to do these computations.....numerical truncation errors would make this method completely useless.
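For what it's worth, languages with arbitrary-precision integers sidestep this entirely. A quick Python sketch: the exact product ends in 3 (a product of odd primes must be odd), while the same calculation pushed through 64-bit floats cannot even represent an odd integer of this size, reproducing the kind of truncation seen in the Excel example:

```python
factors = [19, 23, 151, 277, 983, 983, 983]

# Python integers are arbitrary precision, so the product is exact.
exact = 1
for f in factors:
    exact *= f
print(exact)  # 17361958221158713 -- ends in 3, as it must

# 64-bit floats carry only ~15-16 significant digits; above 2**53 the
# representable integers are spaced 2 apart, so the odd true product
# cannot survive a floating-point calculation.
approx = 1.0
for f in factors:
    approx *= f
print(int(approx) == exact)  # False
```

So the method itself is fine mathematically; any implementation just has to use a big-integer type rather than floats or spreadsheet cells.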


Mike

bc2zbUVA

Contributor

Posts: 22

Joined: Thu Nov 19, 2015 4:23 pm

Post Sat Mar 10, 2018 4:22 am

Re: 2018-Cosma-Cytometry A

laustr wrote:
bc2zbUVA wrote:
Storing thousands of digits per cell is going to be extremely memory hungry.

Yes, I guess, but wouldn't this be solved by using a dictionary? It's not like every cell has a different cell type...

Sturla


Well, if using a dictionary, the question becomes which dictionary is more efficient: the one where every immunophenotype is encoded by a vector of equal length, or one where every immunophenotype is encoded by a vector of variable length?

I've been to a few lectures where they optimized compression of genomic-variant lookups by sorting the variants based on frequency, and I could see an argument for a similar approach with the primes: the more frequently a marker is used in phenotyping, the smaller its assigned prime should be. However, this is beyond my expertise in data structures. I'll have to grab the table from the paper and use it to create dictionaries for some of my older analyses.
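That frequency idea would look something like this (usage counts invented for illustration): sort markers by how often they appear across panels and hand out primes smallest-first, which keeps the most common products, and therefore their digit counts, as small as possible:

```python
from collections import Counter

def first_primes(count):
    """First `count` primes by simple trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

# Hypothetical usage counts across past panels.
usage = Counter({"CD45": 150, "CD3": 120, "CD4": 90, "CD161": 5})

# The most frequently used marker gets the smallest prime.
ordered = [marker for marker, _ in usage.most_common()]
assignment = dict(zip(ordered, first_primes(len(ordered))))
print(assignment)  # {'CD45': 2, 'CD3': 3, 'CD4': 5, 'CD161': 7}
```

The trade-off, of course, is that a frequency-derived assignment is corpus-dependent, whereas the paper's fixed Table 1 mapping is what makes the IDs derivable regardless of the panel.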