
Batch alignment for whole blood studies

Posted: Fri Jan 22, 2021 2:52 am
by tomash
Hi everyone,
I would like to initiate a discussion concerning the use of reference control samples for batch alignment in whole blood studies.

When performing cytometry analysis on samples stained in multiple batches, batch effects can occur. A number of approaches exist for performing ‘batch alignment’, which seeks to remove the technical (batch-derived) variance. A few of these (such as CytoNorm & CytofBatchAdjust) use ‘reference’ samples for this purpose. The classic example here would be a large study using cryopreserved PBMCs: samples are frozen down when they arrive at the lab, and then thawed/stained/run together in batches. In addition, a single donor may provide a large number of aliquots of cryopreserved PBMCs to serve as reference controls, so that an aliquot can be thawed/stained/acquired alongside each batch of samples. Because the reference control aliquots are all biologically identical, any differences between them are due to the thawing/staining/acquisition that occur in each batch. In this way, the relative changes in expression patterns/ranges can be modelled and adjusted, and the same model can be applied to the samples, removing batch-derived variance while preserving biological variance. There are a few key requirements for reference controls, such as the need to span the full range of data that is to be aligned (e.g. if the reference control cells don’t express a certain marker at high enough levels, that marker can’t be aligned in the actual samples). There is a much longer discussion to be had about the considerations here, but this is enough information for the discussion below.
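To make the reference-control idea concrete, here is a minimal per-channel quantile-alignment sketch in Python/NumPy. This is not CytoNorm or CytofBatchAdjust (which do considerably more, e.g. per-cluster spline alignment); the function names and toy numbers are mine, purely for illustration:

```python
import numpy as np

def fit_quantile_map(ref_batch, ref_target, n_q=101):
    """Fit a per-channel monotone mapping from one batch's reference
    control onto a target (e.g. pooled) reference distribution."""
    qs = np.linspace(0.001, 0.999, n_q)
    src = np.quantile(ref_batch, qs)   # quantiles in this batch's reference
    dst = np.quantile(ref_target, qs)  # matching quantiles in the target
    return src, dst

def apply_quantile_map(values, src, dst):
    """Transform sample values with the mapping learned on the reference."""
    return np.interp(values, src, dst)

# Toy example: two "batches" of the same reference donor, where batch 2
# has a technical intensity shift; align batch 2's samples onto batch 1.
rng = np.random.default_rng(0)
ref1 = rng.normal(5.0, 1.0, 5000)           # reference control, batch 1
ref2 = ref1 * 1.3 + 0.5                     # same cells, batch-shifted
src, dst = fit_quantile_map(ref2, ref1)

sample2 = rng.normal(6.0, 1.2, 5000) * 1.3 + 0.5  # a batch-2 sample
aligned = apply_quantile_map(sample2, src, dst)   # technical shift removed
```

Note that `np.interp` clips sample values falling outside the reference quantile range, which is exactly the "reference must span the full range" requirement above: signal brighter than anything in the reference control cannot be aligned properly.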

In a number of studies, whole blood samples may be prepared, stained, and run as they arrive at the lab (i.e. not fixed or cryopreserved to be processed in batches at a later stage). This essentially means each sample (or couple of samples) is a separate batch. This largely precludes the inclusion of reference controls, as, in theory, the same donor would have to provide fresh whole blood on each occasion a new sample arrives.

In some studies, an aliquot of cryopreserved PBMCs (from a single donor) may be thawed and stained/run alongside each sample as a form of qualitative control (i.e. it would reveal any major differences between batches due to staining conditions -- ab concentration, time, temperature, antibody lot, missing antibodies etc). It is feasible that such a control could serve as a reference control for alignment purposes. This would not be an ideal solution by any means, as the reference and actual samples are treated differently, leading to differences in marker expression patterns and population proportions (and the exclusion of whole cell types), as well as any biological changes induced by the isolation/cryopreservation process. However, differences due to shifts in staining conditions (as above) could still be modelled (for the markers and cell types that are common to both the reference PBMC sample and the whole blood samples), and this might be sufficient to adjust for staining artefacts from day to day.

There are other approaches that have been published previously, including aligning samples to each other (which assumes that most major phenotyping markers are stably expressed between patients), though this approach really only works to get samples ‘similar’ to each other to simplify gating/clustering, and is sensitive to changes in some key markers (e.g. CD3 might be downregulated due to stimulation etc).

I have a few questions for the community, and I would be very interested in a variety of responses:
1. In a whole-blood staining situation as described above, what would be your preferred method of dealing with batch effects? I have deliberately excluded any discussion of careful attention to ensuring reproducible staining procedures above, but if your answer is ‘we just stain really carefully and consistently’ then I would love to hear that as well.
2. For whole blood, do you fix the samples (using smartube buffer etc) so that they can be stained and run in batches? Does the advantage of fixing/preserving the samples to run them together in batches outweigh the potential impact on marker expression?
3. For those of you who work with fresh whole blood samples, do you ever use a cryopreserved PBMC sample as a type of qualitative control? The approach described above somewhat assumes that these reference PBMC samples might be commonly collected anyway.
4. Has anyone ever tried performing batch alignment using an approach like this?
5. Do you find it easier/better/preferable to just accept the batch differences, and rely on adjusting the gates in each sample to accommodate these differences?

Thanks for your consideration, and I’m looking forward to hearing different perspectives!


Re: Batch alignment for whole blood studies

Posted: Fri Jan 22, 2021 4:29 pm
by mleipold
Hi Tom,

Here are some thoughts I had:

1. What is the reason for running the samples immediately? Put differently, is there a reason that you couldn't do an initial staining step for surface markers, and *then* fix and freeze (PFA, or SmartTube, Cytodelics, etc *after* surface staining to address fixation-staining effects)? That way, you could "save up" several samples to perm, stain any potential intracellular markers and Ir, and then run (or barcode before intracellular, if you wanted further consistency).
- for example, it may not be worth it to "save up" samples if you only get 1 sample every 6 months or something.
- especially if you make frozen cocktail aliquots for the surface and/or intracellular, you can improve consistency that way.

For a big NIH study we're doing with Mt Sinai, that's the basic protocol: WB drawn, immediately stained with the Fluidigm MDIPA Lyo surface panel (but see Adeeb's caveat about the donor-effects and MDIPA in WB samples: ), and then SmartTube buffer added and the sample is then frozen. We then thaw, wash, barcode, pool, do intracellular, Ir+PFA, and refreeze as FBS/DMSO aliquots for later running.

2. What are you trying to control for? As you know, different controls have different purposes.
- I agree, ideally you would have a standardized control sample. For maximum alignment (sort of absolute reference), this would need to be an identical sample for every run/plate. A Prestained control (in this case, multiple aliquots of the same WB donor stained with the frozen cocktail, fixed, and frozen) might work best. This is similar to the Prestained control that I had in my Multicenter paper, which allowed us to demonstrate that the Site-Stained background increase in a couple markers was an artifact of the 4C storage of the combined cocktail.

- However, if it's Prestained, that wouldn't tell you if your frozen cocktail (or some other reagent) has gone off on the day of staining. If you want to know whether the cocktails are working properly, then replicate PBMCs may be sufficient (with your caveats about PBMCs missing Grans and some other WB populations). Or, multiple aliquots of the same WB donor just fixed/frozen (if fixation doesn't affect your staining much...Vericells might also suffice), that you could just add as another sample or barcode.

- if you want to know whether the machine is functioning properly, then replicate PBMCs may be sufficient.

3. Regarding data analysis: I think it depends on what you're planning to do with the data. If you're going to be handgating, then yes, nudging the gates as necessary to account for batch effects (as well as potential biology!) will address a lot of the issues even without formal algorithmic batch correction.

However, if you're going to do some sort of algorithmic analysis, significant batch effects will be problematic. You don't want extra populations being created simply by artifacts (often increased background) relative to more well-behaved samples. Exactly where the transition point lies between "some batch effects, but they don't appear to be enough to affect the algorithm much" and "wow, this plate has a bunch of plate-specific clusters!" depends, I think, on the algorithm, and I certainly do not have a clear cutoff for you.


Re: Batch alignment for whole blood studies

Posted: Fri Jan 07, 2022 5:49 am
Here is how my lab has addressed batch effects without a reference sample in the past, across many samples: unperturbed and stimulated, with both controls and patient samples.

"Raw phospho-signal values were adjusted to allow comparison of phosphorylation across the five control CyTOF experiments run on five experiment dates as follows. The median baseline (unstimulated) phosphorylation values for each signaling pathway-cell type-experiment date combination was calculated. Adjustment factors were next calculated to align every experiment date’s values to the bootstrapped mean values for each signaling pathway-cell type combination. Following this adjustment, all controls had the same baseline values for each signaling pathway-cell type combination.

The same adjustment factors calculated for signaling pathway-cell type combination in each control was applied to post-stimulus signaling values. This allowed comparison of post-stimulation phospho-signaling fold changes from baseline across the five experiment dates." (in the supplement)
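Assuming the adjustment factors are additive offsets on (transformed) signal values, the quoted procedure can be sketched with pandas as follows. The column names and toy numbers are mine, not from the quoted supplement, and a plain mean of the per-date baselines stands in for the bootstrapped mean:

```python
import pandas as pd

# Toy data: transformed phospho-signal per cell-type/pathway/date/condition.
df = pd.DataFrame({
    "date":     ["d1", "d1", "d2", "d2", "d1", "d2"],
    "celltype": ["T",  "T",  "T",  "T",  "B",  "B"],
    "pathway":  ["pSTAT1"] * 6,
    "stim":     ["unstim", "IFNa", "unstim", "IFNa", "unstim", "unstim"],
    "value":    [1.0, 3.0, 1.4, 3.4, 0.8, 1.1],
})

keys = ["celltype", "pathway"]

# Median baseline (unstimulated) value per pathway-celltype-date combination.
base = (df[df["stim"] == "unstim"]
        .groupby(keys + ["date"])["value"].median()
        .rename("baseline").reset_index())

# Target: mean of the per-date baselines (stand-in for the bootstrapped mean).
target = base.groupby(keys)["baseline"].mean().rename("target").reset_index()

# Additive adjustment factor per date, aligning each date to the target.
adj = base.merge(target, on=keys)
adj["offset"] = adj["target"] - adj["baseline"]

# Apply the same per-date offsets to all values, including post-stimulus ones.
out = df.merge(adj[keys + ["date", "offset"]], on=keys + ["date"])
out["adjusted"] = out["value"] + out["offset"]
```

After this, every date has the same baseline per pathway-celltype combination, and because the stimulated values shift by the same per-date offset, post-stimulation changes from baseline remain comparable across dates.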