Order of operations for implementing PeacoQC?
Hi all,
Some of my users have recently started incorporating PeacoQC into their workflows, and a question came up:
At exactly what point in the cleaning/analysis workflow is it best to apply it?
One user had a Normalized, Concatenated, Debarcoded file from some collaborators that had a gap (a partial clog or something similar causing low/no events). She uploaded it to Cytobank and tried to remove that gap with PeacoQC there, but couldn't; Cytobank support suggested she go back to the original individual file for that segment and try to remove the gap there.
In this case, she doesn't have access to that file. But is that really the intended order of operations, to do it immediately on the Raw file?
In other words, is it meant to go like this (see the rough sketch after the list):
1. Acquire sample (Raw file)
2. PeacoQC
3. Normalize
4. Concatenate
5. Debarcode
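Just to make that concrete, here is roughly what I picture step 2 looking like if it's done in R with the Bioconductor PeacoQC package before any normalization. This is only a sketch: the file name, channel selection, and arcsinh cofactor are placeholders, and I'm assuming the GoodCells element of the returned list is the per-event selection (that's how I read the documentation).

```r
library(flowCore)
library(PeacoQC)

# Raw, per-acquisition file straight off the instrument (placeholder name).
ff_raw <- read.FCS("raw_acquisition_001.fcs", truncate_max_range = FALSE)

# Mass channels to monitor (adjust to your panel); Time stays in the frame.
channels <- grep("Di$", colnames(ff_raw))

# Margin removal first, as in the paper's preprocessing.
ff_nomargin <- RemoveMargins(ff_raw, channels = channels)

# PeacoQC expects transformed data; arcsinh with cofactor 5 is the usual CyTOF choice.
ff_t <- transform(ff_nomargin,
                  transformList(colnames(ff_nomargin)[channels],
                                arcsinhTransform(a = 0, b = 1/5, c = 0)))

res <- PeacoQC(ff_t, channels = channels, determine_good_cells = "all",
               save_fcs = FALSE, plot = TRUE, output_directory = "PeacoQC_raw")

# Carry the selection back to the untransformed events and save those, so
# normalization / concatenation / debarcoding still see raw-scale values.
write.FCS(ff_nomargin[res$GoodCells, ], "raw_acquisition_001_QC.fcs")
```

The cleaned file would then go into Normalize / Concatenate / Debarcode as steps 3-5.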
I reread the PeacoQC paper and didn't find an explicit statement; the only relevant part I could find was "Note that all the files used in this work were first put through a preprocessing pipeline that included the removal of margin events, compensation and transformation where necessary.", which could happen upstream even of Normalization. However, I did see this response on the GitHub issue tracker (https://github.com/saeyslab/PeacoQC/issues/11): "I would recommend to first perform PeacoQC and then apply normalization on the files."
A lot of CyTOF users instead do (sketched in code after the list):
1. Acquire sample (Raw file)
2. Fluidigm/SBio norm
3. Concatenate
4. Debarcode
(Steps 2-4 are often done while cleaning the instrument at the end of the run, so the timing is convenient.)
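For comparison, here is a rough in-R approximation of steps 2-4 using CATALYST rather than the on-instrument software, just to make the order concrete. The file paths are placeholders, the example uses CATALYST's bundled sample_key where you'd substitute your own barcoding scheme, and the concatenation is folded into the initial read; I'm glossing over the arguments people tune in practice.

```r
library(CATALYST)

# Read all raw acquisitions of the barcoded pool into one object
# (this effectively covers the concatenation step).
fcs_files <- list.files("raw_fcs", pattern = "\\.fcs$", full.names = TRUE)
sce <- prepData(fcs_files)

# Bead normalization, the in-R analogue of the Fluidigm/SBio normalizer
# ("dvs" refers to the standard DVS/EQ bead masses).
norm <- normCytof(sce, beads = "dvs", k = 500, remove_beads = TRUE)
sce <- norm$data

# Debarcode; CATALYST::sample_key is its example 6-choose-3 scheme,
# so you'd substitute your own key here.
sce <- assignPrelim(sce, bc_key = CATALYST::sample_key)
sce <- applyCutoffs(estCutoffs(sce))
```

In that picture, the open question is simply whether a PeacoQC call belongs before prepData (per raw file) or somewhere after normCytof.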
As was discussed recently in several CYTO2023 workshops, I realize that pipeline step order can vary depending on use case (or data quality). And PeacoQC will of course *execute* on CCT files ("To evaluate algorithm running times, we artificially created a set of files ranging from 1000 to 3,000,000 cells, by concatenating file 010 of the flowCAPIV data five times and uniformly sampling the relevant number of cells. This approach ensures that the amount of quality issues to detect stays similar for all files in this evaluation."), but whether that's optimal for anomaly removal isn't clear.
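And running it on such a file looks essentially the same as running it on a raw one, which is part of why the question keeps coming up. Again a sketch, with placeholder names:

```r
library(flowCore)
library(PeacoQC)

# An already Normalized / Concatenated / Debarcoded file (placeholder name).
ff <- read.FCS("sample01_normalized_concat_debarcoded.fcs",
               truncate_max_range = FALSE)
channels <- grep("Di$", colnames(ff))
ff <- transform(ff, transformList(colnames(ff)[channels],
                                  arcsinhTransform(a = 0, b = 1/5, c = 0)))

# This executes fine and produces a QC report for the file as a whole;
# whether cleaning at this stage is as good as cleaning each original
# acquisition before normalization is exactly the open question.
res <- PeacoQC(ff, channels = channels, determine_good_cells = "all",
               save_fcs = TRUE, plot = TRUE, output_directory = "PeacoQC_NCD")
```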
Comments/feedback on the "ideal" step order here would be appreciated, especially since changing it would mean a lot of people reworking their on-CyTOF-computer workflow. It would also change things if you want to use PeacoQC on Cytobank, since you're generally uploading already-Normalized, Concatenated, Debarcoded files there.
Mike