Scientific Report

Common variable immunodeficiency (CVID, ORPHA: #1572) is a primary antibody deficiency (PAD) characterized by low serum levels of IgG, IgA and/or IgM and normal or decreased B cell numbers, leading to recurrent infections, particularly of the respiratory and gastrointestinal tracts. The pathogenesis of CVID remains largely unresolved: only ~25% of CVID cases can be explained by monogenic defects (e.g. in ICOS, TNFRSF13B, CTLA4, NFKB1, CECR1), and the causal genetic lesion remains elusive in the majority of cases. In some cases, CVID is preceded by selective IgA deficiency (IgAD), which can be asymptomatic or associated with an increased frequency of infections, allergies, and autoimmunity. Both IgAD and CVID patients show increased susceptibility to recurrent respiratory and gastrointestinal infections. A shared genetic basis for IgAD and CVID has been suggested, and various studies have reported progression from IgAD to CVID (Aghamohammadi et al, Int Arch Allergy Immunol, 2008; Haimila et al, Genes Immun, 2009), which has also been observed in numerous unpublished cases.

Whether CVID has a monogenetic basis remained controversial for decades, as disease symptoms typically manifest only in late adolescence, unlike in prototypical monogenetic primary immunodeficiencies. The debate lasted until 2003, when Dr. Grimbacher (member of our consortium) discovered that CVID can be caused by biallelic loss-of-function mutations in ICOS, a co-stimulatory molecule expressed by follicular helper T cells (TFH cells) (Grimbacher et al, Nat Immunol, 2003). However, autosomal-dominant mutations (which are more difficult to detect by Sanger sequencing) were not identified in this patient population until 2005, when Dr. Grimbacher’s team and Dr. Hammarström’s team (associate member of our consortium) discovered heterozygous mutations in TNFRSF13B, the gene encoding TACI (Salzer et al, Nat Genet, 2005; Pan-Hammarstrom et al, Nat Genet, 2007). Next-generation sequencing later revealed mutations in PIK3CD, PIK3R1 (Deau et al, J Clin Invest, 2014), CTLA4, NFKB1 and NFKB2 (Schubert et al, Nat Med, 2014; Fliegauf et al, Am J Hum Genet, 2015). Nevertheless, incomplete disease penetrance suggests that additional, as yet undefined mechanisms (e.g. polygenic, epigenetic, and environmental contributors) shape the clinical phenotype of CVID. Moreover, the phenotypic expressivity of the disease varies considerably. Despite technological improvements, the majority of CVID patients (an estimated 75%) still lack a molecular diagnosis from genetic sequencing, reinforcing the notion that undefined molecular mechanisms are at play in CVID.

The team of Dr. Ballestar (consortium coordinator) identified DNA methylation defects associated with CVID, providing concrete proof-of-principle for an epigenetic dimension of the disease (Rodríguez-Cortez et al, Nat Commun, 2015). This work provided the first evidence of epigenetic dysregulation in PADs and has spurred interest in epigenetic dysregulation across the broader field of primary immunodeficiency research. Consistent with this, epigenetic mechanisms such as DNA methylation have an established role in B cell differentiation and biology (Kulis et al, Nat Genet, 2015).

DNA methylation and other epigenetic marks, including histone post-translational modifications, together with their effects on chromatin accessibility, establish and maintain the transcriptional activity of genes. These mechanisms are particularly relevant during cell differentiation. For example, a member of the consortium (Dr. Bock) recently mapped the DNA methylation dynamics of human hematopoietic differentiation as part of the International Human Epigenome Consortium (Farlik et al, Cell Stem Cell, 2016). The development and maturation of the immune system are governed by successive epigenetic changes, including changes in DNA methylation and chromatin accessibility, which provide an alternative route to cellular dysregulation that appears highly relevant for immune-related diseases such as CVID.

The i-PAD project seeks to systematically test the hypothesis that a large proportion of CVID (and IgAD) patients who lack a clear monogenetic defect are characterized by abnormal molecular signatures, which can be determined by a combination of epigenome, transcriptome and/or proteome profiling. If this hypothesis holds, the proposed research will fundamentally advance our biological understanding of PADs and our ability to diagnose them molecularly, and it will provide proof-of-concept for the role of epigenetic and gene-regulatory changes in an important class of rare diseases.

Technology is now, for the first time, mature enough to systematically assess the role of the epigenome, transcriptome and proteome in PAD patient samples. Multi-omics technologies have been established and optimized for low input amounts of FACS-purified clinical samples, enabling high-resolution molecular dissection of heterogeneous disease samples. A member of the consortium (Dr. Bock) has driven the optimization of epigenome mapping technology for low-input and single-cell applications, including highly optimized protocols for DNA methylation (Farlik et al, Cell Rep, 2015), histone modifications (Schmidl et al, Nat Methods, 2015) and chromatin accessibility (Rendeiro et al, Nat Commun, 2016). His lab is also highly proficient in low-input and single-cell RNA-seq (Ji et al, EMBO Reports, 2016; Mass et al, Science, 2016) as well as single-cell CRISPR functional screening (Nat Methods, 2017). Moreover, Dr. Bock is a fully trained bioinformatician experienced in multi-omics data processing and integrative analysis.

Genetic information, epigenetic regulation, and gene expression ultimately result in specific proteomic signatures. In the i-PAD project, we therefore also investigate this step, linking the functional consequences of dysregulated epigenetic regulation in CVID to the proteomic signatures of cell populations. Dr. Geiger (an associated member of this consortium) and colleagues recently analyzed the proteome dynamics of T cell activation in as few as 50,000 cells, obtaining data on up to 10,000 proteins (Geiger et al, Cell, 2016). This resource provides an important reference for identifying dysregulated proteins in T cells from CVID patients that may result from upstream epigenetic dysregulation. Quantitative proteomics has revealed the network architecture of immune cells (Rieckmann, Geiger et al, Nat Immunol, 2017) and can thereby help to understand PAD-specific defects. Going one step further, the i-PAD project includes proteomic profiling of T cell populations to assess the functional consequences of dysregulated epigenetic regulation in CVID.

The combined expertise and direction of the consortium members will provide the know-how, access to patient samples, technologies, and conceptual tools needed to fill the critical gap in knowledge of the mechanisms underlying CVID, which will ultimately improve the clinical management of patients.

Aim and scientific approach

The Integrative Multi-Omics Analysis of Primary Antibody Deficiency (i-PAD) project seeks to systematically test the hypothesis that a large proportion of CVID (and IgAD) patients who lack a clear monogenetic defect are characterized by abnormal molecular signatures. These signatures will be determined by a combination of epigenome, transcriptome and/or proteome profiling, followed by integrative analysis.

Objectives of i-PAD:

1. Obtain the epigenomic, transcriptomic, and proteomic signature of relevant immune cell types from individuals with defined and selected monogenetic traits within the CVID spectrum (including IgAD).

2. Analyze and integrate the obtained datasets to identify cellular pathways dysregulated in CVID.

3. Stratify patients according to their omics signature to advance personalized medicine for CVID.

4. Verify and correct the above-identified dysregulated pathways in vitro.

Results and achievements

To achieve the goals of the i-PAD project, we designed a work plan that relies on extensive interaction between omics data production and its integration from both a systems biology and a clinical immunology perspective. This requires close collaboration between clinicians, molecular biologists, and bioinformaticians within the consortium. The overall strategy consists of generating and integrating genetic, epigenomic, transcriptomic and proteomic datasets to associate specific profiles with distinct clinical features of CVID patients, advancing disease understanding and laying the groundwork for future omics-based diagnostics of PADs. The project is organized into nine work packages (WPs): WP1 and WP2 deal with management, coordination, ethical aspects, and harmonization of methods to ensure data comparability within the project. WP3-WP6 are devoted to specific omics methods and data production, each led by a PI experienced in the field. WP7 focuses on data integration and patient stratification. WP8 focuses on validation and functional dissection of dysregulated pathways in vitro. Finally, WP9 is dedicated to dissemination within the clinical and research communities and to communication with patient associations.

The specific results obtained within each WP, as well as their leaders, are described below:

WP1. Management and coordination (IGTP, CCI).

This WP deals with the coordination of tasks at the individual sites/institutes, the setup of servers for sharing files among partners, and the organization of periodic teleconferences (TCs) and face-to-face (F2F) meetings to report progress and track milestones. The project General Coordinator (GC) is Dr. Esteban Ballestar (IGTP), who leads the administrative, scientific and operational management of the project together with Dr. Bodo Grimbacher (co-general coordinator, co-GC). They are assisted by the IGTP Research Grants Management Department.

Task 1.1 General coordination (IGTP, CCI).

The GC coordinates communication among partners to ensure that the consortium works as a whole and that information flows efficiently across WPs according to the general Work Plan. The GC has monitored the daily progress and quality of the technical work, assessed the fulfillment of objectives, checked the execution of milestones and deliverables, and assisted project members in achieving their goals.

Results obtained within task 1.1 include:

- Preparation of the Consortium Agreement (CA) and Material Transfer Agreement (MTA). These documents have been prepared and are in the final signature stage, after which they will be submitted to the JTC2018 office.

- Organization of bimonthly TCs for the entire consortium and one face-to-face kick-off meeting in Barcelona (April 2019). The second face-to-face meeting, scheduled as a 2-day meeting on March 18 and 19, 2020 in Potsdam, had to be canceled due to the SARS-CoV-2 pandemic. Instead, it was held successfully as a half-day videoconference.

Task 1.2 Registry, Biobank, and Database coordination (CCI, LBI, KI).

Dr. Grimbacher has ensured the creation of suitable platforms for the Registry, Biobank and Omics Database; details are described in WP2. To maximize the clinical value of the research, a joint CCI and LBI team has developed tools for data management in the Registry and Omics Database, as well as guidelines that will be used to train all researchers in data entry and use. Members of this joint team meet regularly to ensure that the information in the Registry and Database is used correctly.

Task 1.3 Administrative coordination (IGTP).

The GC has been assisted by the IGTP Research Grants Management Department in the administrative tasks required for the project. This Department monitors the proper execution of the project, including ethical approvals, which are specifically managed through WP2 (Task 2.1). It also assists with the financial and scientific reports according to the guidelines. Moreover, the GC has been assisted by the IGTP Pre-Award Department in preparing and concluding the CA and MTA in a timely manner.

WP2. Sample collection, biobanking, and methods harmonization (CCI, LBI)

Task 2.1. Ethical permissions (CCI).

The Freiburg ethics committee approved our application with an effective date of 06.06.2019. In July 2019, the consent form under which patient samples are collected was approved (ETK-Nr. 354/19). The i-PAD ethics vote is registered under application number (Antrag-Nr.) EK-Freiburg 76/19. This ethics approval serves as an umbrella covering informed consent, sample storage, use of samples in the project, shipment of samples internationally and/or to pharmaceutical companies, and patient recall.

Task 2.2. Implementation of a Registry (CCI).

- The CCI, in collaboration with members of Christoph Bock’s laboratory (LBI), has created a web-based registry platform containing, for all participants, information on sample name (original and submitted), cell type, cell number, cell treatment, disease group, replicate number, sex, gender, date of birth, etc., together with a detailed instruction sheet for the correct use of the platform. Each collaborator has a dedicated sheet for entering information specific to data production with their method.
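To illustrate the kind of record such a registry holds, the following Python sketch models one sample entry with the fields listed above. The class name, field names and the controlled vocabularies for cell type and treatment are hypothetical (derived from the harmonized protocol in Task 2.3, not from the registry itself); the point is that validating entries at input time keeps data contributed by different sites comparable.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical controlled vocabularies; the real registry values may differ.
CELL_TYPES = {"monocyte", "naive_B", "naive_CD4_T"}
TREATMENTS = {"unstimulated", "CD40L+IL-21", "CD3/CD28"}


@dataclass(frozen=True)
class RegistryEntry:
    """One sample row in a (hypothetical) i-PAD registry sheet."""
    original_name: str
    submitted_name: str
    cell_type: str
    cell_number: int
    treatment: str
    disease_group: str
    replicate: int
    sex: str
    gender: str
    date_of_birth: date

    def __post_init__(self) -> None:
        # Reject values outside the controlled vocabularies so that
        # entries from different sites stay comparable.
        if self.cell_type not in CELL_TYPES:
            raise ValueError(f"unknown cell type: {self.cell_type!r}")
        if self.treatment not in TREATMENTS:
            raise ValueError(f"unknown treatment: {self.treatment!r}")
        if self.cell_number <= 0:
            raise ValueError("cell_number must be positive")
        if self.replicate < 1:
            raise ValueError("replicate numbers start at 1")
        if self.date_of_birth > date.today():
            raise ValueError("date_of_birth lies in the future")
```

A malformed row (for example an unknown cell type) fails at creation time rather than surfacing later during data integration.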

- In addition, the consortium is using a database specifically designed to combine clinical and laboratory information (extracted from clinical databases, such as the electronic patient chart of the University Hospital Freiburg, or curated from research articles), complete genetic and genomic information from gDNA sequencing, microbiome data from intestinal 16S rRNA sequencing, and functional data from experimental research. The database uses a universal, structured and hierarchical terminology inspired by various biological ontologies, including terms from well-known ontologies such as the Gene Ontology (GO), the Human Phenotype Ontology (HP), the Cell Ontology (CL), the Disease Ontology (DO), the Online Mendelian Inheritance in Man ontology (OMIM), the International Classification of Diseases (ICD-10), the Medical Subject Headings (MeSH), and the National Cancer Institute Thesaurus (NCIT), among others. It also draws on external genomic resources and annotations, and it was designed and is currently maintained by Dr. A. Caballero and Dr. M. Proietti.
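The practical benefit of a hierarchical terminology is that a query for a broad term also retrieves patients annotated with more specific terms beneath it. The following Python sketch illustrates this with a tiny, hand-written hierarchy; the term names and parent links are illustrative assumptions and are not taken from the database or from any official ontology release.

```python
from typing import Dict, List, Set

# Illustrative child -> parents edges (a directed acyclic graph);
# a real ontology-based terminology is far larger.
PARENTS: Dict[str, List[str]] = {
    "recurrent respiratory infections": ["recurrent infections"],
    "recurrent gastrointestinal infections": ["recurrent infections"],
    "recurrent infections": ["abnormality of the immune system"],
    "decreased serum IgA": ["abnormal immunoglobulin level"],
    "abnormal immunoglobulin level": ["abnormality of the immune system"],
    "abnormality of the immune system": [],
}


def ancestors(term: str) -> Set[str]:
    """All broader terms reachable from `term`, excluding the term itself."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def matches(annotations: Set[str], query: str) -> bool:
    """True if any annotation equals the query term or lies beneath it."""
    return any(term == query or query in ancestors(term) for term in annotations)
```

For example, a patient annotated only with "recurrent respiratory infections" is still found by a query for "recurrent infections".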

To capture the clinical and laboratory data of the patients, we currently use a database that serves as an electronic patient chart and research database, provided by the Section of Immunodeficiency headed by Prof. K. Warnatz at the Department of Rheumatology and Clinical Immunology of the University Hospital Freiburg. Prof. Warnatz recently applied for, and was granted, status as an additional honorary member of the i-PAD consortium.

However, we are currently developing a disease-specific, stand-alone research database for i-PAD patients. This effort is currently called the CTLA4-database project; although CTLA4 deficiency is only a prototype disease within our patient cohort, the database structure being developed for CTLA4 patients will also be used for the other patients included in i-PAD. The CTLA4 database currently holds more than 200 mutation carriers, and a publication on this cohort is in preparation.

Task 2.3. Methods Harmonization (CCI).

It is essential that the methods for cell sorting, as well as for sample preparation and storage for the downstream assays (genetics, gene expression, DNA methylation, chromatin accessibility and proteomics), are standardized at the beginning of the project. This ensures homogeneous data quality across the entire cohort and optimal performance of the different assays. The following protocols have been tested and harmonized:

- Pilot experiments established 72 ml of EDTA whole blood as sufficient for all omics approaches and unstimulated monocytes as a sufficient readout for CVID patients; the protocol was then distributed to all collaborators and discussed in detail. All collected material is processed as follows. Whenever possible, 72 ml of EDTA whole blood is collected from each participating individual. Two EDTA tubes are centrifuged at 1000 rpm for 2 min at room temperature, and two 0.5 ml plasma aliquots are each transferred to a 2 ml cryotube and stored at -80°C. Two further EDTA tubes are set aside, without PBMC isolation, for genomic DNA isolation and whole-exome sequencing. PBMCs are isolated from separate tubes by Ficoll density gradient centrifugation, and cell numbers are determined using a Neubauer counting chamber. After counting, an aliquot of 100,000 PBMCs is used for flow cytometric measurement of monocytes (CD14, CD16, CD80 and HLA-DR), B cells (CD19, CD20, CD21, CD27, CD38, IgM and IgD), B cell activation (CD69, CD80, CD86, CD95, HLA-DR) and T cells (CD3, CD4, CD8, CD25, CD45Ra, CD69, CD127 and CCR7), and 200,000 PBMCs are frozen in 90% FCS and 10% DMSO in a Mr. Frosty cooler and stored in liquid nitrogen for future applications. The remaining PBMCs are stained for monocytes (CD4-, CD14+), naive B cells (CD14-, CD4-, CD27-, CD19+, IgD+) and naive CD4+ T cells (CD14-, CD4+, CD45Ra+) and sorted by fluorescence-activated cell sorting (FACS). Naive B cells are stimulated for 24 h at 37°C and 5% CO2 with CD40L (provided by Prof. Pascal Schneider, Department of Biochemistry, Lausanne, Switzerland) and IL-21 (Miltenyi). Naive CD4+ T cells are stimulated for 48 h at 37°C and 5% CO2 with anti-CD3 and anti-CD28 (Invitrogen; plate-bound stimulation, incubated overnight). After stimulation, lymphocytes are counted with a Neubauer chamber and distributed according to their cell numbers and the requirements of each omics approach.
Cells are allocated by priority: first, RNA-seq/RRBS with 50,000 cells (RNA and DNA are isolated from the same cells using the Qiagen AllPrep DNA/RNA Micro Kit); second, ATAC-seq with 50,000 cells; and third, if cell counts exceed 150,000, proteomics with 50,000 cells (maximum 200,000 cells). An aliquot of 1,000 cells is used to verify successful activation by flow cytometry. Monocyte numbers are determined during sorting, and monocytes are distributed across the omics approaches in the same way as lymphocytes. Before shipment to Vienna for further processing, ATAC-seq samples undergo the initial transposition step: cells are centrifuged at 500 g for 5 min at 4°C, the supernatant is discarded, and the pellet is resuspended in lysis buffer (TD buffer, transposase, nuclease-free H2O, digitonin and protease inhibitor cocktail) and incubated for 30 min at 37°C, followed by DNA purification with the DNA Clean & Concentrator kit (Zymo Research). Cells for RNA-seq/RRBS are lysed in RLT Plus buffer supplemented with β-mercaptoethanol (1:100) and stored at -80°C until DNA and RNA isolation according to the Qiagen protocol. Cells for proteomics are washed three times with PBS and snap-frozen in liquid nitrogen.
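The priority-based distribution of cells described above can be expressed as a small allocation function. This is a minimal Python sketch under stated assumptions, not the consortium's procedure: it assumes that the 1,000-cell activation aliquot is set aside first (for stimulated lymphocytes only), that an assay is assigned only if its full 50,000-cell quota is available, and that proteomics, when the count exceeds 150,000, receives the remaining cells up to the 200,000 maximum. The function and constant names are hypothetical.

```python
from typing import Dict

FLOW_ALIQUOT = 1_000          # activation check by flow cytometry
RNA_RRBS_CELLS = 50_000       # priority 1: RNA-seq/RRBS from one lysate
ATAC_CELLS = 50_000           # priority 2: ATAC-seq
PROTEOMICS_THRESHOLD = 150_000
PROTEOMICS_MAX = 200_000


def allocate_cells(total: int, flow_check: bool = True) -> Dict[str, int]:
    """Split a counted cell population across assays by priority (sketch)."""
    if total < 0:
        raise ValueError("cell count cannot be negative")
    alloc = {"flow": 0, "rna_rrbs": 0, "atac": 0, "proteomics": 0}
    remaining = total
    # Assumption: the activation aliquot is taken before the omics assays.
    if flow_check and remaining >= FLOW_ALIQUOT:
        alloc["flow"] = FLOW_ALIQUOT
        remaining -= FLOW_ALIQUOT
    # Assumption: an assay is assigned only if its full quota is available.
    for assay, quota in (("rna_rrbs", RNA_RRBS_CELLS), ("atac", ATAC_CELLS)):
        if remaining >= quota:
            alloc[assay] = quota
            remaining -= quota
    # Proteomics only above the threshold; it then receives the leftover
    # cells (more than 50,000 at this point), capped at the maximum.
    if total - alloc["flow"] > PROTEOMICS_THRESHOLD:
        alloc["proteomics"] = min(remaining, PROTEOMICS_MAX)
        remaining -= alloc["proteomics"]
    alloc["unallocated"] = remaining
    return alloc
```

For example, `allocate_cells(160_000, flow_check=False)` (a monocyte sample without activation check) assigns 50,000 cells each to RNA-seq/RRBS and ATAC-seq and the remaining 60,000 to proteomics, whereas a stimulated sample with 120,000 cells receives no proteomics allocation.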

This protocol will be implemented at the other centers recruiting patients into i-PAD, such as Karolinska, and, we hope, in the future also in Vienna and at Hannover, a possible new member of the consortium.

RRBS and ATAC-seq: As part of the pilot, RRBS and ATAC-seq samples were generated from two healthy donors to confirm consistent sample quality. RRBS pilot samples had adequate CpG site coverage, and ATAC-seq pilot samples had an adequate fraction of reads in peaks; these are key assay-specific quality metrics. Moreover, correlations between ATAC-seq pilot samples showed the expected clustering by cell type and stimulation condition.
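The fraction of reads in peaks (FRiP), one of the ATAC-seq quality metrics mentioned above, is the share of sequenced reads that overlap a called peak. The following minimal Python sketch computes it from in-memory intervals; it assumes 0-based, half-open coordinates and non-overlapping (already merged) peaks, and the function name is hypothetical. In an actual pipeline the reads would come from aligned BAM files and the peaks from MACS2 calls, and no pass/fail threshold is implied here.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def fraction_of_reads_in_peaks(reads: Iterable[Interval], peaks: Iterable[Interval]) -> float:
    """Share of reads overlapping at least one peak (peaks must not overlap)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    starts: Dict[str, List[int]] = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts[chrom] = [s for s, _ in ivs]

    total = in_peaks = 0
    for chrom, r_start, r_end in reads:
        total += 1
        ivs = by_chrom.get(chrom)
        if not ivs:
            continue
        # Last peak starting before the read ends; with sorted, non-overlapping
        # peaks it is the only candidate that can still overlap the read.
        i = bisect_left(starts[chrom], r_end) - 1
        if i >= 0 and ivs[i][1] > r_start:
            in_peaks += 1
    return in_peaks / total if total else 0.0
```

Sorting the peaks once and using binary search keeps the per-read check logarithmic in the number of peaks, which matters for the tens of millions of reads in a typical ATAC-seq library.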

Task 2.4. Establishment of a biobank (CCI).

The consent form has been approved by the Freiburg ethics committee and contains detailed information for each study participant. Sample handling protocols containing detailed information on the necessary equipment, consumables and storage of processed material have been distributed among collaborators. There will be two types of experimental protocols:

1. A sample is drawn for a specific experiment within i-PAD. This requires a signed patient consent for i-PAD studies. The sample is processed immediately without involving the Freiburg CCI-FREEZE biobank. These experiments may, however, require intermediate storage of processed samples, for example the collection of gDNA samples before bulk shipment to the sequencing facility. The storage of these samples is not considered biobanking and therefore takes place under the approved SOPs and protocols of the respective research group.
2. A sample is drawn with the intention of freezing it for as yet undetermined experimental use. This sample collection requires a biobank consent from the patient. In Freiburg, this procedure takes place under the approved protocols of the CCI-FREEZE biobank Freiburg (www.uniklinik-freiburg.de/freeze-biobank.html).

WP3. Genomics and Transcriptomics (CCI).

Tasks 3.1-6.2. Generation of omics datasets.

To date, samples from 94 individuals have been collected for this study in Freiburg (Table 1). Samples from 92 of these individuals have been sent to the corresponding collaborators for processing/sequencing (Table 2).

Table 1. Summary of individuals collected

Table 2. Summary of samples sent out to collaborators

Task 3.2. Generation and analysis of gene expression data

A total of 270 RNA-seq samples have been sent to Novogene for library preparation and sequencing. Initial library preparation failed for 5 samples with low RNA concentration. Novogene therefore optimized its library preparation protocol and was able to process 262 of our 270 samples; library preparation failed for three further samples. The consortium will receive a hard drive with the data of the 252 RNA-seq datasets by the end of March 2020.

WP4. DNA methylation profiling (IGTP, LBI)

This WP focuses on the generation of DNA methylation profiles using a genome-scale bisulfite sequencing method with single-CpG and single-allele resolution, specifically reduced representation bisulfite sequencing (RRBS) for the aforementioned cell types and samples.

Task 4.1. DNA methylation data generation.

DNA methylation profiles were obtained using a custom-optimized RRBS protocol. To date, 271 RRBS samples have been produced, of which 65 have completed sequencing and initial data processing (Table X). Seven samples dropped out before sequencing due to experimental failure: they did not pass intermediate quality control, owing either to insufficient DNA input (as indicated by high qPCR Cq values) or to excessive DNA fragmentation (as indicated by Agilent Bioanalyzer profiles). Unaligned reads were quality-controlled using FastQC (Andrews, 2010) and aligned to hg38 using BSMAP (Xi et al, 2009). Further preprocessing and preliminary analysis were conducted using RnBeads (Müller et al, 2019), an end-to-end pipeline for analyzing DNA methylation data.

Preliminary analysis of the first 65 sequenced samples shows sufficiently high CpG site coverage across samples and the expected cell-type specificity, indicating good RRBS library quality. Experimental processing of the remaining 199 samples may be delayed due to the COVID-19 pandemic.
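CpG site coverage, cited above as a quality indicator, can be summarized per sample as the number of CpGs covered by at least a minimum number of reads, together with the number of CpGs that pass this threshold in every sample (the sites available for cross-sample comparison). The following Python sketch assumes coverage is already available as a mapping from a CpG identifier (for example "chr1:10468") to read depth; the function names and the default threshold of 5 reads are illustrative assumptions, not the parameters used with RnBeads in this project.

```python
from typing import Dict, Mapping, Set


def covered_sites(coverage: Mapping[str, int], min_reads: int) -> Set[str]:
    """CpG identifiers covered by at least `min_reads` reads in one sample."""
    return {site for site, depth in coverage.items() if depth >= min_reads}


def coverage_report(samples: Mapping[str, Mapping[str, int]], min_reads: int = 5) -> Dict[str, object]:
    """Per-sample count of covered CpGs plus the count covered in all samples."""
    per_sample = {name: covered_sites(cov, min_reads) for name, cov in samples.items()}
    # Sites passing the threshold in every sample can be compared directly.
    shared = set.intersection(*per_sample.values()) if per_sample else set()
    return {
        "covered_per_sample": {name: len(sites) for name, sites in per_sample.items()},
        "covered_in_all": len(shared),
    }
```

A sample whose covered-CpG count falls far below that of its peers, or a small shared set, would flag a library for closer inspection.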

WP5. Chromatin accessibility data generation (LBI, CCI)

This WP focuses on the generation of chromatin accessibility profiles using the Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) for the aforementioned cell types and samples.

Task 5.1. Generation of high-resolution chromatin accessibility profiles

Chromatin accessibility profiles were obtained using the ATAC-seq protocol, with experiments conducted jointly by the CCI and LBI-Vienna teams. Of the 267 ATAC-seq samples processed to date, 151 have been sequenced (Figure 2A). Six samples dropped out before sequencing due to experimental failure in sample preparation, either because of low DNA input, identified by high qPCR Cq values, or because of poor Agilent Bioanalyzer profiles indicating poor tagmentation. Unaligned reads were quality-controlled using FastQC (Andrews, 2010). Reads were then trimmed with skewer (Jiang et al, 2014), aligned to hg38 with Bowtie2 (Langmead et al, 2012), and filtered with sambamba (Tarasov et al, 2015). Peaks were called using MACS2 (Zhang et al, 2008), and a consensus peak set was generated by taking the union of extended summits across all samples. After manual inspection of genome browser tracks, six further samples were excluded because of high levels of background reads, which indicate poor sample quality. The remaining 145 samples met the thresholds for quality control metrics (e.g. alignment rate, fraction of reads in peaks) and showed the expected cell-type specificity (Figure ). Because the final dataset will be generated in successive batches, we plan to enable correction of experimental batch effects by ensuring that the biological groups of interest (e.g. disease groups) overlap across experimental batches.
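The consensus peak set described above, the union of extended summits across all samples, can be sketched as follows. This minimal Python illustration is not the consortium's pipeline: the function name and the ±250 bp flank are assumptions, summits are given as (chromosome, position) pairs pooled from all samples' MACS2 summit calls, and overlapping or touching windows are merged into non-overlapping regions (start positions are clipped at 0).

```python
from typing import Dict, Iterable, List, Tuple


def consensus_peaks(summits: Iterable[Tuple[str, int]], flank: int = 250) -> List[Tuple[str, int, int]]:
    """Merge summit-centred windows pooled from all samples into one peak set."""
    windows: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, pos in summits:
        # Extend each summit by `flank` bp on both sides.
        windows.setdefault(chrom, []).append((max(0, pos - flank), pos + flank))

    merged: List[Tuple[str, int, int]] = []
    for chrom in sorted(windows):
        ivs = sorted(windows[chrom])
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end:   # overlapping or touching: extend the region
                cur_end = max(cur_end, end)
            else:                  # gap: close the current region, open a new one
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged
```

For example, summits at chr1:1000 and chr1:1300 yield windows that overlap and collapse into a single region, chr1:750-1550. In practice the same merge step is commonly performed with `bedtools merge` on a sorted BED file of extended summits.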

WP6. Proteomics (IRB, CCI).

Using high-resolution mass spectrometry, this WP is focused on the generation of deep proteomes of activated naïve CD4 T cells for 125 patient samples according to well-established protocols described by Geiger et al. (Cell 2016).

Task 6.1. Generation of mass spectrometry data.

To support sample processing at IRB, Neftali Ramirez (a member of the CCI team) joined the IRB lab (R. Geiger) for two weeks. During shipment, the dry ice evaporated before delivery because the package was held at Swiss customs. Therefore, only 5 samples were initially processed and measured to determine the degree of protein degradation. Preliminary analysis, however, showed sufficient sample quality and usable data when compared with previous datasets of monocytes and lymphocytes from the Geiger lab.

To date, 107 samples have been processed and will be measured continuously over the coming weeks and months, while 127 samples still need to be processed.

WP7. Data integration & clinical analysis of molecular markers (LBI, CCI).

To be undertaken once the individual omics datasets are complete, as specified in the i-PAD proposal (from month 22).

WP8. Validation and functional dissection of dysregulated pathways in vitro (IGTP, CCI, LBI).

To be undertaken once the individual omics datasets are complete and the tasks in WP7 have been carried out, as specified in the i-PAD proposal (from month 21).

WP9. Dissemination and communication activities (IGTP, CCI).

A Communication Plan (CP) has been defined.

- We expect several publications, both on the generation and analysis of the individual datasets and from the integrated analysis of the multi-omics datasets. We have defined the leadership of the different sub-studies (transcriptomics, methylomics, chromatin accessibility, proteomics) within the project to optimize the outputs, reach the largest possible audience, and maximize the benefit for clinicians wishing to apply the results to CVID (and other PAD) patients.

- The group leaders, students, and postdocs share their results in annual seminars, as well as in sessions open to the general public organized by their host institutions and/or by other institutions. One example is the participation of i-PAD consortium members in the AKKI meeting in Potsdam, which was scheduled for March 2020 and has been postponed because of the COVID-19 outbreak. This conference will attract immunologists from all areas relevant to PADs and will contribute to fruitful future collaborations.

- Moreover, the group leaders are going to present the results of their project at an intermediate and a final status symposium organized by E-Rare.