CITE-seq Unpacked: A Comprehensive Guide to Cellular Indexing of Transcriptomes and Epitopes by Sequencing

In the rapidly evolving field of single-cell genomics, CITE-seq stands out as a powerful approach for measuring both the transcriptome and surface protein epitopes within the same cell. By combining RNA sequencing with antibody-derived tags, CITE-seq offers a multi-omics view that enhances cell-type identification, functional annotation, and the discovery of subtle cellular states. This guide explains what CITE-seq is, how the workflow operates, and how researchers can plan, execute, and analyse CITE-seq experiments to achieve robust, publication-ready results.

CITE-seq: What is it and why does it matter?

The term CITE-seq, or Cellular Indexing of Transcriptomes and Epitopes by Sequencing, describes a method that jointly profiles gene expression and surface protein expression at single-cell resolution. Unlike traditional approaches that rely on either transcriptomics or proteomics alone, CITE-seq provides a direct, integrated readout of RNA and surface proteins from the same cell. This dual modality improves cell-type resolution, helps distinguish closely related states, and supports clearer interpretation of immune and developmental cell landscapes.

Key advantages of CITE-seq

  • Simultaneous RNA and protein data from the same cell
  • Improved cell-type discrimination, especially for immune and stem cell populations
  • Better annotation of cell states when surface markers are informative
  • Compatibility with established single-cell workflows and downstream analyses

For many labs, the appeal of CITE-seq lies in “more information per cell” without sacrificing throughput. The approach is compatible with droplet-based or well-based single-cell workflows and integrates with popular analysis ecosystems such as Seurat and Scanpy. In discussions of CITE-seq, you may also encounter related terms and variants, all describing a family of methods that use antibody-derived tags to quantify surface proteins alongside RNA.

How CITE-seq works: the core workflow explained

At its heart, the CITE-seq workflow uses antibody-derived tags (ADTs) attached to antibodies that bind specific cell-surface proteins. When cells are processed for single-cell sequencing, both mRNA and the ADTs are captured and sequenced, yielding two orthogonal data streams per cell: a gene expression profile and a surface protein fingerprint.

Step-by-step overview

  1. Antibody labelling with ADTs: Cells are stained with a panel of antibodies, each conjugated to a unique DNA oligonucleotide barcode. The oligos are designed to be captured and read during standard single-cell library preparation.
  2. Single-cell partitioning: The labelled cells are partitioned into droplets (or wells) so that each cell’s transcriptome and ADTs are isolated together for downstream capture.
  3. mRNA capture and cDNA synthesis: mRNA from each cell is captured and converted into complementary DNA (cDNA) as in typical single-cell RNA sequencing workflows.
  4. ADT capture and library preparation: The DNA barcodes attached to antibodies are captured alongside the mRNA, and a separate ADT library is prepared so that the tags can be sequenced in parallel with the cDNA.
  5. Sequencing and data production: Libraries are sequenced to generate readouts for both gene expression (RNA counts) and surface protein markers (ADT counts) for each cell.

Data readouts and integration

The resulting data set contains two principal components per cell: a transcriptome profile and a surface-protein profile. In some protocols, a third aspect can be included, such as a sample tag or multiplexing barcode. The integration of these data streams enables refined clustering, improved marker discovery, and better understanding of functional states across cell populations.
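
As a minimal sketch of this per-cell structure, the Python below bundles RNA counts, ADT counts, and an optional sample tag under a shared cell barcode. The class and function names (`CellProfile`, `pair_modalities`), the barcodes, and the counts are all hypothetical and chosen for illustration; real pipelines typically store these as sparse matrices rather than dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CellProfile:
    """Paired readouts for one cell barcode (illustrative structure only)."""
    barcode: str
    rna_counts: Dict[str, int] = field(default_factory=dict)  # gene -> UMI count
    adt_counts: Dict[str, int] = field(default_factory=dict)  # antibody -> UMI count
    sample_tag: Optional[str] = None                          # optional multiplexing label


def pair_modalities(rna, adt, tags=None):
    """Keep only barcodes seen in both modalities and bundle their counts."""
    tags = tags or {}
    shared = sorted(set(rna) & set(adt))
    return [CellProfile(bc, rna[bc], adt[bc], tags.get(bc)) for bc in shared]


# Toy input: three barcodes with RNA counts, two of which also have ADT counts.
rna = {
    "AAAC": {"CD3E": 4, "MS4A1": 0},
    "AAAG": {"CD3E": 0, "MS4A1": 7},
    "AAAT": {"CD3E": 1, "MS4A1": 0},
}
adt = {
    "AAAC": {"CD3": 210, "CD19": 3},
    "AAAG": {"CD3": 5, "CD19": 180},
}
cells = pair_modalities(rna, adt, tags={"AAAC": "donor1"})
```

Barcode `AAAT` is dropped here because it has no ADT record, which mirrors the requirement that both modalities map to the same cells before joint analysis.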

It is important to recognise that the number of ADTs that can be measured is limited by practical factors such as panel design, antibody availability, and sequencing depth. Thoughtful panel construction and pilot experiments are essential to optimise signal-to-noise and ensure reliable interpretation of protein measurements alongside transcriptomes.
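
To make the link between panel size and sequencing depth concrete, here is a rough read-budget calculation in Python. It assumes, purely for illustration, that ADT reads per cell scale linearly with the number of antibodies; the function name `read_budget` and every number in the example are placeholders, not recommended targets.

```python
def read_budget(n_cells, rna_reads_per_cell, panel_size, adt_reads_per_antibody):
    """Estimate reads needed per library and the ADT share of the total."""
    rna_total = n_cells * rna_reads_per_cell
    adt_per_cell = panel_size * adt_reads_per_antibody
    adt_total = n_cells * adt_per_cell
    return {
        "rna_reads": rna_total,
        "adt_reads_per_cell": adt_per_cell,
        "adt_reads": adt_total,
        "adt_fraction": adt_total / (rna_total + adt_total),
    }


# Placeholder inputs: 5,000 cells, a 50-antibody panel.
budget = read_budget(
    n_cells=5_000,
    rna_reads_per_cell=20_000,
    panel_size=50,
    adt_reads_per_antibody=100,
)
print(budget)
```

Under these assumed inputs the ADT library takes one fifth of the total reads; doubling the panel size doubles the ADT read requirement, which is why larger panels need to be weighed against sequencing cost.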

Designing a CITE-seq experiment: planning and panel design

Effective CITE-seq experiments begin with careful planning. The design phase covers antibody panel construction, controls, experimental scale, and sequencing strategy. This section outlines practical considerations to help you design a robust CITE-seq study.

Panel design: choosing surface markers wisely

  • Target a balanced panel: Include markers that define major cell types and markers that discriminate subpopulations of interest.
  • Consider biology and reference markers: Include markers linked to known biology (e.g., activation states) and, where available, stably expressed reference markers that can help anchor normalisation.
  • Avoid cross-reactivity: Select antibodies with high specificity and well-characterised performance in the chosen species and tissue.
  • Confirm conjugation compatibility: Ensure antibodies are compatible with the ADT conjugation chemistry used in your protocol.

Controls and quality assurance

  • Isotype control antibodies to monitor non-specific binding
  • Negative controls (for example, unstained cells or empty droplets) to gauge background ADT counts
  • Replicates and cell viability measures to ensure data reliability

Sequencing depth and multiplexing

Decide on sequencing depth per modality, balancing transcriptome coverage with accurate ADT quantification. Consider multiplexing strategies to increase throughput and reduce batch effects, while ensuring demultiplexing accuracy remains high.
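
One way to think about demultiplexing accuracy is sketched below, assuming a hashtag-oligo style of multiplexing (the text above does not specify a strategy). The function `assign_sample` and its thresholds are hypothetical; dedicated demultiplexing tools use more careful statistical models than this top-two comparison.

```python
def assign_sample(hto_counts, min_count=10, min_ratio=3.0):
    """Assign a cell to a hashtag by comparing its two highest hashtag counts.

    Returns the hashtag name, "doublet", or "negative".
    Thresholds are illustrative assumptions, not recommended values.
    Expects counts for at least two hashtags.
    """
    ranked = sorted(hto_counts.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top), (_, second) = ranked[0], ranked[1]
    if top < min_count:
        return "negative"   # no hashtag clearly detected
    if second >= min_count and top < min_ratio * second:
        return "doublet"    # two hashtags at comparable levels
    return top_name


singlet = assign_sample({"HTO1": 250, "HTO2": 4, "HTO3": 2})
doublet = assign_sample({"HTO1": 120, "HTO2": 95, "HTO3": 1})
negative = assign_sample({"HTO1": 3, "HTO2": 1, "HTO3": 0})
```

The fraction of cells called "doublet" or "negative" gives a quick, if crude, indication of whether hashtag staining and sequencing depth were sufficient.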

CITE-seq data types and analysis: turning raw reads into insights

Data analysis in CITE-seq blends standard single-cell RNA-seq workflows with dedicated handling of ADT counts. Several software ecosystems have integrated CITE-seq capabilities, enabling streamlined processing, normalisation, clustering, and multi-omics interpretation.

Pre-processing: from raw data to clean matrices

  • RNA data: perform typical QC (mitochondrial gene content, detected features per cell), normalisation, and feature selection
  • ADT data: treat as a separate modality, often subject to different normalisation due to distinct distribution characteristics
  • Link multi-omic data: map RNA and ADT data to the same cells, preparing for joint analysis
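
The RNA-side QC step can be sketched as follows. This assumes human gene symbols, where mitochondrial genes carry the `MT-` prefix; the function names and the default thresholds are hypothetical and should be tuned per tissue and protocol.

```python
def rna_qc(gene_counts, mito_prefix="MT-"):
    """Compute basic per-cell QC metrics from a gene -> count mapping."""
    total = sum(gene_counts.values())
    mito = sum(c for g, c in gene_counts.items() if g.startswith(mito_prefix))
    n_genes = sum(1 for c in gene_counts.values() if c > 0)
    return {
        "total_counts": total,
        "n_genes": n_genes,
        "mito_fraction": mito / total if total else 0.0,
    }


def passes_qc(metrics, min_genes=200, max_mito_fraction=0.2):
    """Apply illustrative thresholds; both cut-offs are assumptions."""
    return (metrics["n_genes"] >= min_genes
            and metrics["mito_fraction"] <= max_mito_fraction)


# Toy cell with only five genes, so a small min_genes is used in the check.
cell = {"CD3E": 5, "MT-CO1": 2, "MT-ND1": 1, "ACTB": 12, "MS4A1": 0}
metrics = rna_qc(cell)
keep = passes_qc(metrics, min_genes=3)
```

Cells that pass RNA QC would then be intersected with the barcodes present in the ADT matrix, so that both modalities describe the same set of cells.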

Normalisation strategies for CITE-seq

Because ADT counts can differ markedly from RNA counts, separate normalisation pipelines are usually employed. RNA data commonly uses log-normalisation or more sophisticated methods, while ADT data may benefit from centred log ratio (CLR) transformations or negative-binomial modelling. Integrated methods, such as totalVI or weighted nearest neighbour (WNN) analysis in Seurat, help harmonise modalities and improve clustering fidelity.
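
A centred log ratio transformation can be written in a few lines. The sketch below applies one common variant, log(count + pseudocount) minus the mean of those logs, across the antibodies within a single cell. Tools differ in the exact formula and in whether the ratio is taken within each cell or across cells for each antibody, so treat this as an illustration of the idea rather than a reproduction of any particular package.

```python
import math


def clr(adt_counts, pseudocount=1.0):
    """Centred log ratio of one cell's ADT counts (antibody -> count)."""
    logs = {name: math.log(count + pseudocount)
            for name, count in adt_counts.items()}
    mean_log = sum(logs.values()) / len(logs)
    # Subtracting the mean log centres the values around zero.
    return {name: value - mean_log for name, value in logs.items()}


cell_adt = {"CD3": 210, "CD4": 150, "CD19": 3, "IgG1_isotype": 2}
normalised = clr(cell_adt)
```

After the transformation the values for a cell sum to zero, so strongly stained markers come out positive and background-level markers negative, regardless of the cell's total ADT count.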

Clustering and cell-type annotation

  • Use joint embeddings to cluster cells based on combined RNA and ADT information
  • Leverage known marker panels to annotate cell types, while remaining open to novel or transitional states
  • Assess the stability of clusters across samples and batches with proper controls
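
Marker-based annotation from protein data can be illustrated with a simple rule table. The sketch below assumes normalised ADT values (for example, CLR-transformed) and a single hypothetical threshold shared by all markers; in practice thresholds are usually set per antibody, and the rules shown are toy examples rather than a validated gating scheme.

```python
def annotate(protein_values, rules, threshold=1.0):
    """Return the first label whose positive and negative markers all match.

    A marker missing from the cell is treated as not expressed.
    """
    missing = float("-inf")
    for label, rule in rules.items():
        pos_ok = all(protein_values.get(m, missing) > threshold for m in rule["pos"])
        neg_ok = all(protein_values.get(m, missing) <= threshold for m in rule["neg"])
        if pos_ok and neg_ok:
            return label
    return "unassigned"  # leaves room for novel or transitional states


# Hypothetical rule table for two lineages.
rules = {
    "CD4 T cell": {"pos": ["CD3", "CD4"], "neg": ["CD19"]},
    "B cell": {"pos": ["CD19"], "neg": ["CD3"]},
}

label_a = annotate({"CD3": 2.1, "CD4": 1.8, "CD19": -0.5}, rules)
label_b = annotate({"CD3": -0.3, "CD19": 2.4}, rules)
label_c = annotate({"CD3": 0.2, "CD19": 0.1}, rules)
```

Keeping an explicit "unassigned" outcome avoids forcing every cell into a known type, and such cells can then be examined using the clusters from the joint embedding.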

Statistical modelling and downstream insights

Advanced analyses may employ probabilistic models that jointly model RNA and ADT data, enabling more precise cell-type demarcation and pathway inference. Tools in the Seurat and scVI ecosystems offer tutorials and workflows for CITE-seq data, including integration with external reference datasets.

Comparing CITE-seq with related multi-omics approaches

Several methods share a similar goal of multi-omics profiling at the single-cell level. Understanding how CITE-seq compares with these approaches helps researchers choose the right tool for their questions.

REAP-seq and related antibody-derived tag methods

REAP-seq, like CITE-seq, uses antibody-derived tags to quantify surface proteins alongside transcriptomes. Differences mainly lie in the chemistry of ADT conjugation, library preparation specifics, and software ecosystems. The core principle—dual readouts from the same cell—remains a common thread.

Multi-omic alternatives and multi-omics integration

Other strategies aim to broaden the scope beyond surface proteins, incorporating chromatin accessibility or intracellular markers. Techniques such as SHARE-seq or sci-CAR combine chromatin accessibility with transcriptomes, whereas CITE-seq focuses on protein epitopes at the cell surface. Integrative analyses across modalities are an active area of method development.

Practical considerations: turning theory into high-quality data

While the concept is straightforward, successful CITE-seq experiments depend on practical execution. The following points summarise actionable tips to improve data quality and reproducibility.

Antibody panel validation and titration

  • Verify antibody specificity in the relevant tissue
  • Perform titration experiments to optimise signal-to-noise
  • Include appropriate controls to detect non-specific binding
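
A titration series can be compared with a simple separation score. The sketch below borrows the stain index used in flow cytometry (difference of medians divided by twice the standard deviation of the negative population) and applies it to ADT counts, which is an assumption made here for illustration. The dilution labels and counts are invented toy data.

```python
import statistics


def separation_index(positive_counts, negative_counts):
    """Stain-index-style score: higher means cleaner signal over background."""
    spread = statistics.stdev(negative_counts)
    if spread == 0:
        return float("inf")
    gap = statistics.median(positive_counts) - statistics.median(negative_counts)
    return gap / (2 * spread)


def best_dilution(titration):
    """Pick the dilution with the highest separation index."""
    return max(titration, key=lambda d: separation_index(*titration[d]))


# dilution -> (counts in a known positive population, counts in a known negative one)
titration = {
    "1:50": ([400, 420, 390, 410], [60, 80, 40, 70]),
    "1:200": ([300, 310, 290, 305], [10, 12, 8, 11]),
    "1:800": ([60, 70, 55, 65], [5, 6, 4, 5]),
}
choice = best_dilution(titration)
```

In this toy series the most concentrated stain gives the highest raw signal but also the noisiest background, so an intermediate dilution scores best, which is the kind of trade-off titration is meant to reveal.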

Staining protocol and sample handling

  • Follow validated staining procedures to preserve cell integrity
  • Minimise time between tissue dissociation and staining to reduce artefacts
  • Maintain consistent temperatures and buffers to preserve epitopes

Quality metrics and troubleshooting

  • Monitor doublet rates, as droplet-based methods can capture two cells together
  • Assess mitochondrial read proportions and gene detection thresholds
  • Check ADT count distributions per antibody and per cell; saturated signal or an abrupt drop-off in counts can indicate staining problems or low-quality samples
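
One simple way to inspect ADT count distributions is to flag cells whose total ADT count sits far from the bulk of the sample. The sketch below uses the median absolute deviation (MAD) on log-transformed totals; the function name, the cut-off of three MADs, and the toy counts are assumptions for illustration.

```python
import math
import statistics


def flag_adt_outliers(total_adt_per_cell, n_mads=3.0):
    """Label each cell "low", "high", or "ok" by its log10 total ADT count."""
    logs = {bc: math.log10(total + 1) for bc, total in total_adt_per_cell.items()}
    med = statistics.median(logs.values())
    mad = statistics.median(abs(v - med) for v in logs.values())
    flags = {}
    for bc, value in logs.items():
        if mad and value < med - n_mads * mad:
            flags[bc] = "low"
        elif mad and value > med + n_mads * mad:
            flags[bc] = "high"
        else:
            flags[bc] = "ok"
    return flags


totals = {"c1": 900, "c2": 1100, "c3": 1000, "c4": 950,
          "c5": 1050, "c6": 12, "c7": 60000}
flags = flag_adt_outliers(totals)
```

Cells flagged "low" may reflect poor staining or damaged cells, while "high" totals may reflect multiplets or antibody aggregates; both are worth inspecting before removal rather than being filtered blindly.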

Best practices: ensuring robust interpretation of CITE-seq data

Adopting best practices across experimental design, data processing, and reporting will maximise the reliability and impact of CITE-seq studies. Below are recommended guidelines that align with community standards and recent methodological advances.

Documentation and reproducibility

  • Maintain detailed records of panel composition, antibody lots, and library preparations
  • Share analysis pipelines, parameter choices, and versioned software to facilitate replication
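
A machine-readable run record is one way to keep this information together. The sketch below uses Python dataclasses serialised to JSON; the field names are a suggested layout rather than a community standard, and every value shown (clone, barcode, lot, versions) is a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class AntibodyEntry:
    target: str
    clone: str
    barcode: str
    lot: str
    dilution: str


@dataclass
class RunRecord:
    experiment_id: str
    panel: List[AntibodyEntry] = field(default_factory=list)
    software_versions: Dict[str, str] = field(default_factory=dict)
    parameters: Dict[str, float] = field(default_factory=dict)

    def to_json(self):
        """Serialise the whole record, including nested panel entries."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = RunRecord(
    experiment_id="EXP-PLACEHOLDER",
    panel=[AntibodyEntry("CD3", "CLONE-A", "ACGTACGTACGTACG", "LOT-0000", "1:200")],
    software_versions={"analysis_tool": "x.y.z"},
    parameters={"max_mito_fraction": 0.2},
)
serialised = record.to_json()
```

Storing such a file alongside the count matrices and under version control makes it easier to trace which antibody lots and parameter choices produced a given result.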

Validation with orthogonal data

Where possible, corroborate findings with independent measurements such as flow cytometry or imaging-based protein quantification. Cross-validation strengthens inference about cell states and marker associations.

Ethical and regulatory considerations

Ensure compliant sample handling, data privacy, and ethical approvals for human tissues, where applicable. Document consent and sample provenance alongside experimental metadata.

Future directions: what lies ahead for CITE-seq and multi-omics

The field of single-cell multi-omics is evolving rapidly, with ongoing innovations designed to expand the capabilities of CITE-seq and related technologies. Researchers can expect improvements in panel density, sensitivity, and integration with complementary modalities.

Higher-dimensional antibody panels

Advances in antibody design and conjugation chemistry may enable larger ADT panels without compromising signal quality. More target epitopes could allow finer dissection of cell states and activation patterns.

Deeper integration with computational tools

As multi-omics datasets grow, new algorithms for joint modelling, data imputation, and interpretable visualisations will emerge. Methods that provide intuitive embedding visualisations and biologically explainable results will be particularly valuable for translating data into insights.

Clinical and translational applications

In clinical research, CITE-seq can aid in characterising tumour microenvironments, monitoring immune responses, and identifying biomarkers of treatment response. Standardising workflows and robust validation will support broader adoption in translational studies.

Glossary and quick references

To help readers quickly orient themselves, here are concise definitions of key terms frequently encountered in CITE-seq discussions.

  • CITE-seq: Cellular Indexing of Transcriptomes and Epitopes by Sequencing; a method to measure RNA and surface proteins in single cells.
  • ADTs: Antibody-Derived Tags; DNA barcodes attached to antibodies that quantify surface epitopes in CITE-seq.
  • Single-cell RNA sequencing (scRNA-seq): A technology that profiles gene expression at the level of individual cells.
  • totalVI: A probabilistic model for jointly analysing RNA and protein data from multi-omics single-cell experiments.
  • REAP-seq: A related approach using antibody-derived tags to measure surface proteins with RNA transcripts.
  • Batch effects: Unwanted systematic differences between samples that can confound biological signals.

Final reflections: integrating CITE-seq into your research toolkit

CITE-seq represents a robust and versatile approach to single-cell multi-omics, merging the depth of transcriptomics with the actionable specificity of surface proteins. When planned thoughtfully, executed with careful controls, and analysed with appropriate models, CITE-seq can reveal nuanced cellular landscapes that might remain hidden in single-modality studies. Whether you are mapping immune cell hierarchies, exploring developmental trajectories, or interrogating tumour ecosystems, CITE-seq offers a compelling route to richer biological insight while remaining compatible with familiar analytical workflows.

As multi-omics continues to mature, the role of CITE-seq in the broader landscape of single-cell biology is likely to strengthen. Researchers who stay current with panel design best practices, robust normalisation strategies, and validated data analysis pipelines will be well positioned to translate complex data into meaningful discoveries.