
1 Introduction

The immune system, comprising the innate and adaptive arms as well as overlapping components, plays a pivotal role in defense against viral and bacterial infections, in immune homeostasis, and in cancer surveillance. Within the immune system, T lymphocytes are crucial for adaptive immune responses and are activated upon recognition of peptides displayed by human leukocyte antigen class I (HLA-I) or class II (HLA-II) molecules on the surface of antigen-presenting cells (APCs). T lymphocytes express the T cell receptor (TCR), which recognizes specific peptides that have been processed and presented in combination with an HLA molecule. There are two major subtypes of T lymphocytes: CD8+ cytotoxic T cells (CTLs) and CD4+ helper T cells. CTLs recognize peptides in the context of HLA-I molecules, while CD4+ helper T cells recognize peptides associated with HLA-II molecules. The functional activity of these two subsets of T cells is therefore said to be restricted by HLA-I and HLA-II molecules, respectively.

It is known that CTLs play a major role in killing tumor cells [1, 2] and controlling viral or bacterial infections [3–7], while CD4+ T cells are required for priming and expansion of naive CD8+ T cells as well as secondary expansion of CD8+ memory T cells [8–12]. It might therefore be of critical importance to incorporate both HLA-I- and -II-restricted epitopes in peptide-based vaccines to obtain participation of both CD4+ and CD8+ T cells for generation of strong and long-lasting immunity.

Thus, identification of new antigenic peptides, derived from infectious agents or tumor antigens, that can bind to HLA-I or HLA-II molecules in exchange for the self-peptides normally occupying the HLA-binding site (see below) is important for developing new effective vaccines capable of activating the cellular arm of the immune response. However, a major barrier to the development of peptide-based vaccines with maximum population coverage is that the restricting HLA genes are extremely polymorphic, which results in a vast diversity of peptide-binding HLA specificities and a low population coverage for any given peptide–HLA specificity. As of April 2013, 7,089 HLA-I alleles and 2,065 HLA-II alleles had been reported (http://hla.alleles.org), and these numbers are expected to increase further. To reduce this complexity, one option is to group the thousands of different HLA molecules into a small number of clusters, the so-called HLA supertypes: groups of HLA alleles with largely overlapping peptide-binding specificities. In this chapter, we discuss the state-of-the-art classification of HLA-I and HLA-II supertypes and its application in the development of peptide-based vaccines.

2 HLA-I Molecule and Assembly of HLA-I Peptide Complex

The major histocompatibility complex class I (MHC-I) antigens are referred to as human leukocyte antigens class I (HLA-A, -B, and -C) in humans and as H-2 class I antigens (K, D, and L) in mice. HLA-I antigens consist of three non-covalently associated components: a 45 kDa glycosylated heavy chain (HC), a 12 kDa light chain (beta 2 microglobulin, β2m), and a short self-peptide of 8–10 amino acids (AA). The heavy chain of HLA-I consists of about 340 AA residues, including a cytoplasmic region (about 30 AA residues), a transmembrane region (about 40 AA residues), and an extracellular region composed of three immunoglobulin-like domains (α1, α2, and α3), each consisting of approximately 90 AA. The α1 and α2 domains consist of two segmented alpha helices forming the walls and eight antiparallel β strands forming the floor; together they form the peptide-binding groove and contain the positions contributing to the binding pockets for the peptide and to contacts with the T cell receptor. The binding groove is divided into six distinct pockets (A–F) based on chemical and physical characteristics; the most important pockets for peptide binding are the B and F pockets. The groove is the site where the self (or foreign antigen-derived) peptide (8–10 AA) binds to the polymorphic parts of the HC and is presented to peptide-specific CTLs for scrutiny. The membrane-proximal α3 domain of the HC contains a binding site for the co-receptor CD8 [13], which is expressed by CTLs and enhances their killing of virus-infected cells and cancer cells. β2m associates with the extracellular region of the HLA-I heavy chain through non-covalent interactions with the α2 and α3 domains [14]. β2m is essential for the correct conformation of the peptide-binding groove of the heavy chain and stabilizes the HLA-I peptide complex on the cell surface.
Thus, β2m indirectly participates in the antigen presentation to specific T-cell receptors of CTL [15–17].

The assembly of the HLA-I peptide complex occurs in the endoplasmic reticulum (ER). Initially, the HLA-I HC associates with the chaperone calnexin (CNX), which initiates early folding and disulfide bond formation within the HC. The newly synthesized HLA-I HC then associates with β2m to form a heterodimer. This heterodimer is rapidly recruited into the peptide-loading complex (PLC), which consists of the transporter associated with antigen processing (TAP) and the chaperones tapasin, calreticulin (CRT), and ERp57. The HLA-I HC/β2m heterodimer is now ready for peptide loading. Peptides, both self- and pathogen-derived, are predominantly generated in the cytosol by the proteasome, which degrades cytosolic proteins into short peptides, although a proteasome-independent peptide produced directly by insulin-degrading enzyme has recently been documented [18]. The peptides are then transported into the ER by TAP, a heterodimer of TAP1 and TAP2. They are further trimmed by the ER aminopeptidases ERAP1 and ERAP2 (ERAAP in mice) to 8–10 AA, a length appropriate for HLA-I binding. Once HLA-I HC/β2m dimers, physically associated with the PLC, bind high-affinity peptides, the fully assembled HLA-I peptide complexes are released from the PLC and transported via the Golgi apparatus to the cell surface, where the peptides are presented by HLA-I to CTLs for scrutiny (see reviews [19, 20] for details).

3 HLA-II Molecule and Antigen-Presenting Pathway

The HLA-II molecule consists of two chains, α and β (each with two domains: α1 and α2, β1 and β2), and a self-peptide of 13–25 AA located in a cleft formed by the α1 and β1 domains. Classical HLA-II molecules include HLA-DR, HLA-DQ, and HLA-DP and are expressed mostly on the membrane of professional antigen-presenting cells, where they present processed extracellular antigenic peptides to CD4+ T cells. In contrast to the antigen-binding groove of the HLA-I molecule, which is closed at each end, the antigen-binding groove of HLA-II molecules is open at both ends and allows longer peptides (13–25 AA) to be loaded [21, 22]. During synthesis of HLA-II molecules in the ER, the α and β chains are produced and associate with an invariant chain, which stabilizes the HLA-II molecule and prevents it from binding intracellular peptides from the endogenous pathway. The invariant chain directs transport of HLA-II from the ER through the Golgi complex, followed by fusion with late endosomes, which contain peptides derived from endocytosed, degraded proteins (self or foreign). The invariant chain is then cleaved by cathepsins, leaving a small fragment known as CLIP, which occupies the peptide-binding groove of the HLA-II molecule. HLA-DM facilitates CLIP removal and makes the peptide-binding groove available for peptide loading, after which the HLA-II peptide complex migrates to the cell surface to be scrutinized by CD4+ T cells [23].

4 Classification of Supertypes

4.1 HLA-I Supertypes

The concept of supertypes was first introduced by Alessandro Sette’s group in 1995 [24, 25]. An HLA supertype is defined as a group of HLA molecules with similar peptide-binding features; ideally, a peptide that binds to one allele within a supertype would also bind to all other alleles in that supertype. In practice, only a few of the peptides that bind to one allele in a supertype bind to all the other alleles within it. To date, many methods have been used to define HLA-I supertypes, including structural similarities, shared peptide-binding motifs, and identification of cross-reacting peptides [26–29]. Based on motifs derived from binding data or sequencing of endogenously bound peptides, along with simple structural analyses, Sette and Sidney [30] defined nine supertypes (HLA-A1, -A2, -A3, -A24, -B7, -B27, -B44, -B58, -B62), which were reported to cover most of the HLA-A and -B polymorphisms. Subsequently, Ole Lund’s group [26] constructed hidden Markov models (HMMs) [31] for HLA-I molecules using a Gibbs sampling procedure [32] and defined a similarity measure between the resulting sequence motifs. By using this similarity to cluster alleles into supertypes, they defined three new HLA-I supertypes (HLA-A26, -B8, and -B39) in addition to the nine supertypes described previously by Alessandro Sette’s group [30]; this analysis was based on about 100 HLA-I peptide interactions. In the past few years, a large amount of binding data has been generated; MHC-binding motif information is readily accessible (http://www.iedb.org), and MHC sequence data are available in the IMGT database (the international ImMunoGeneTics information system: http://www.imgt.org).
In 2008, Alessandro Sette’s group analyzed the updated list of alleles available through IMGT using a simple approach largely based on compilation of published motifs, binding data, and analyses of shared repertoires of binding peptides, in combination with clustering based on the primary sequence of the B and F peptide-binding pockets [29]. They provided updated supertype assignments, with new assignments for about 1,000 different HLA-I alleles, roughly a tenfold increase in the number of alleles compared with their original classification from 1999 [30]. In the updated HLA-I classification, about 80 % of the 945 alleles examined were classified into one of the nine supertypes identified previously [30], and the analysis did not suggest the existence of any novel supertypes. However, some alleles have specificities spanning two different supertypes: nine alleles share features of both the A01 and A03 supertypes, and another ten alleles have a specificity overlapping the A01 and A24 supertypes [29]. In addition, some alleles could not be assigned to any currently known supertype on the basis of the criteria mentioned above; these unclassified alleles remain to be addressed.
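
The idea of grouping alleles by shared B- and F-pocket preferences can be illustrated with a minimal sketch. It is not the procedure used in [26] or [29]: the allele names, the anchor residues, the similarity measure (Jaccard overlap of tolerated residues, taking the weaker of the two pockets), and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Toy anchor preferences (hypothetical, for illustration only): residues
# tolerated at peptide position 2 (B pocket) and at the C-terminus (F pocket).
TOY_MOTIFS = {
    "allele_a": {"B": set("LMV"), "F": set("LVI")},
    "allele_b": {"B": set("LMI"), "F": set("LV")},
    "allele_c": {"B": set("P"), "F": set("LFM")},
    "allele_d": {"B": set("P"), "F": set("LF")},
    "allele_e": {"B": set("ED"), "F": set("FWY")},
}


def jaccard(x, y):
    """Size of the intersection divided by size of the union."""
    return len(x & y) / len(x | y)


def motif_similarity(m1, m2):
    """Weaker of the two pocket overlaps: both anchors must be compatible."""
    return min(jaccard(m1["B"], m2["B"]), jaccard(m1["F"], m2["F"]))


def cluster_alleles(motifs, threshold=0.5):
    """Single-linkage grouping: link any pair at or above the threshold."""
    parent = {allele: allele for allele in motifs}

    def find(allele):
        # Follow parent links to the representative of the group.
        while parent[allele] != allele:
            parent[allele] = parent[parent[allele]]
            allele = parent[allele]
        return allele

    for a, b in combinations(motifs, 2):
        if motif_similarity(motifs[a], motifs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for allele in motifs:
        groups.setdefault(find(allele), []).append(allele)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in cluster_alleles(TOY_MOTIFS):
        print(group)
```

With these toy motifs, alleles a and b fall into one group, c and d into another, and e remains unassigned, which mirrors the situation described above in which some alleles cannot be placed in any supertype.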

In summary, the updated HLA-I classification described by Alessandro Sette’s group [29] agrees with the classifications obtained by other groups using other approaches [26, 33, 34], including Ole Lund’s group; it is now widely accepted and has been used for the development of peptide-based vaccines [29, 35, 36].

4.2 HLA-II Supertypes

The structures of HLA-I and HLA-II molecules differ fundamentally, which leads to very different binding characteristics. The peptide-binding groove is closed at both ends in an HLA-I molecule, whereas that of HLA-II molecules is open at both ends, which allows the binding of longer peptides (13–25 AA residues) than HLA-I molecules can accommodate. A deeper understanding of the polymorphism of HLA-II molecules will contribute significantly to HLA-II-binding peptide prediction and to the classification of supertypes.

In contrast to HLA-I supertypes, HLA-II supertypes have been less intensively studied, although a few studies have been reported [26, 37–41]. One important reason is that fewer peptide-binding data are available for HLA-II molecules than for HLA-I molecules, owing to the complexity of the HLA-II structure. Nevertheless, studies have suggested that many DR molecules [26, 37, 38] and many DP molecules [40, 42] can be grouped into supertypes. In 1998, Ou et al. [38] grouped HLA-DR molecules into seven different functional supertypes on the basis of their ability to bind and present antigenic peptides to T cells and their association with susceptibility or resistance to disease. In 2002, Castelli et al. [40] defined an HLA-DP4 supertype and supported the existence of three main binding supertypes among HLA-DP molecules. In 2005, Doytchinova et al. [37] applied a combined bioinformatics approach, using both protein sequence and structural data, to 2,225 HLA-II molecules to detect similarities in their peptide-binding sites and thereby define HLA-II supertypes. They defined 12 HLA-II supertypes: five DR (DR1, DR3, DR4, DR5, and DR9), three DQ (DQ1, DQ2, and DQ3), and four DP (DPw1, DPw2, DPw4, and DPw6). In 2011, Greenbaum et al. [41] determined the binding capacity of a large panel of non-redundant peptides for a set of 27 common HLA-DR, -DQ, and -DP molecules. The measured binding data were then used to define class II supertypes on the basis of shared binding repertoires, and seven different supertypes (main DR, DR4, DRB3, main DQ, DQ7, main DP, and DP2) were defined. Subsequently, a motif-based supertype classification [27] of the same 27 HLA-II proteins [41] also yielded seven supertypes.
All the molecules belonging to the DP genetic locus (DPB1*0101, DPB1*0201, DPB1*0401, DPB1*0402, DPB1*0501, and DPB1*1401) were grouped into a single supertype; DQ proteins were grouped into two different supertypes, each containing three molecules: (DQB1*0301, DQB1*0302, DQB1*0401) and (DQB1*0201, DQB1*0501, DQB1*0602). The motif-based classification of the DR proteins is less clear-cut than that of the other loci. The HLA-DR molecules can be grouped into four supertypes: (DRB1*0401, DRB1*0405, DRB1*0802, DRB1*1101), (DRB3*0101, DRB3*0202), (DRB1*0301, DRB1*1302), and a fourth containing the remaining DR proteins. Functional and motif-based clustering of the 27 HLA-II molecules thus revealed proteins sharing both functional and structural properties, which supports the concept of HLA-II supertypes.
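
As a rough illustration of what "shared binding repertoire" means in the functional classification, the sketch below computes the overlap between the sets of peptides bound by two molecules. The molecule and peptide names, the IC50 values, and the 1,000 nM binder cutoff are assumptions made for this example; they are not data or parameters taken from [41].

```python
# Toy IC50 values in nM (hypothetical); lower values mean stronger binding.
TOY_IC50 = {
    "mol_1": {"pep1": 50, "pep2": 300, "pep3": 20000, "pep4": 800},
    "mol_2": {"pep1": 120, "pep2": 900, "pep3": 15000, "pep4": 5000},
    "mol_3": {"pep1": 30000, "pep2": 25000, "pep3": 40, "pep4": 12000},
}


def binders(ic50_by_peptide, cutoff_nm=1000):
    """Peptides whose measured IC50 is at or below the assumed cutoff."""
    return {pep for pep, ic50 in ic50_by_peptide.items() if ic50 <= cutoff_nm}


def repertoire_overlap(mol_a, mol_b, data=TOY_IC50, cutoff_nm=1000):
    """Shared binders divided by all binders of either molecule."""
    rep_a = binders(data[mol_a], cutoff_nm)
    rep_b = binders(data[mol_b], cutoff_nm)
    union = rep_a | rep_b
    return len(rep_a & rep_b) / len(union) if union else 0.0


if __name__ == "__main__":
    print(repertoire_overlap("mol_1", "mol_2"))  # two of three binders shared
    print(repertoire_overlap("mol_1", "mol_3"))  # no shared binders
```

Molecules with a high pairwise overlap would be candidates for the same supertype, whereas a molecule such as mol_3 in this toy panel would form its own group.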

5 HLA Supertypes and Vaccines

To date, one of the major drawbacks of a peptide-based vaccine strategy is that the restricting HLA genes are extremely polymorphic, which results in a vast diversity of peptide-binding HLA specificities and a low population coverage for any given peptide–HLA specificity. To increase population coverage, one might include defined epitopes for each HLA-I allele; however, this would lead to a vaccine comprising hundreds of peptides. As mentioned above, one way to reduce this complexity is to group HLA molecules into HLA supertypes, each comprising HLA alleles with largely overlapping peptide-binding specificities [24, 25, 30]. Ideally, this means that a peptide that binds to one allele within a supertype has a high probability of binding to other allelic members of the same supertype. The concept of HLA supertypes has been successfully applied to characterize and identify T cell epitopes from a variety of pathogens, including measles-mumps-rubella, SARS, EBV, HIV, HCV, HBV, HPV, influenza, LCMV, Lassa virus, F. tularensis, and vaccinia, as well as from cancer antigens [29].
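
The effect of supertypes on population coverage can be made concrete with a back-of-the-envelope calculation. The sketch below assumes Hardy-Weinberg proportions within a locus and independence between loci (that is, it ignores linkage disequilibrium); the allele frequencies are hypothetical and the function names are introduced here only for illustration.

```python
def phenotype_frequency(allele_freqs):
    """Fraction of individuals carrying at least one covered allele at one locus.

    Assumes Hardy-Weinberg proportions: an individual is uncovered only if
    both chromosomes carry an allele outside the covered set.
    """
    total = sum(allele_freqs)
    if not 0.0 <= total <= 1.0 + 1e-9:
        raise ValueError("allele frequencies at a locus must sum to at most 1")
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(per_locus_coverages):
    """Coverage across loci, assuming the loci are independent."""
    uncovered = 1.0
    for coverage in per_locus_coverages:
        uncovered *= 1.0 - coverage
    return 1.0 - uncovered


if __name__ == "__main__":
    # Hypothetical frequencies of the alleles covered by the chosen epitopes.
    locus_a = phenotype_frequency([0.2, 0.1])
    locus_b = phenotype_frequency([0.5])
    print(round(locus_a, 4), round(locus_b, 4))
    print(round(combined_coverage([locus_a, locus_b]), 4))
```

Because a supertype-restricted epitope counts toward every allele in its supertype, the summed allele frequency per epitope grows, and the same coverage can be reached with far fewer peptides than an allele-by-allele design would need.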

HLA supertypes have been utilized as a component in several approaches and algorithms designed for predicting peptide candidates [43–48]. The technology behind “reverse immunology” is developing rapidly as a means of identifying T cell epitopes from tumor antigens and infectious microorganisms [44–51]. During the SARS epidemic in 2003, the SARS genome was identified in a matter of weeks, and a complete CTL epitope scan, just barely possible at that time, was completed a few months later [43]. “Reverse immunology” has now reached the stage where genome-, pathogen-, and HLA-wide scanning for HLA-binding antigenic epitopes is feasible at a scale and speed that make it possible to exploit genome information as fast as it can be generated. Importantly, a large-scale dataset of measured HLA-II-binding affinities covering 26 allelic variants, including a total of 44,541 affinity measurements for HLA-DR alleles as well as 11 HLA-DP and -DQ molecules [52], is available as training data for generating prediction tools based on several machine learning algorithms. Computer-based algorithms of the kind used to predict peptides binding to HLA-I molecules are now also being developed for HLA-II-restricted peptide epitopes, a development that is of pivotal importance for understanding the immune response and its effect on host–pathogen interactions [32, 52–55]. These tools are expected to speed the identification of novel peptides restricted by HLA-I and HLA-II supertypes for use in vaccines against infectious agents as well as tumors. In this respect, individual peptides harboring both HLA-I and HLA-II binding potential [46–48, 56] might be of particular importance.
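
The scanning step of "reverse immunology" can be sketched in its simplest form as a sliding window over a protein sequence combined with a supertype anchor filter. This is a deliberately naive stand-in for the prediction algorithms cited above [43–48], which use trained models rather than a two-position motif; the protein sequence and the anchor residues below are hypothetical.

```python
def scan_for_candidates(protein, p2_allowed, cterm_allowed, lengths=(8, 9, 10)):
    """Return (start, peptide) pairs whose anchors fit a supertype motif.

    Every window of 8-10 residues is enumerated; a window is kept when its
    second residue (B-pocket anchor) and its last residue (F-pocket anchor)
    are both in the allowed sets.
    """
    hits = []
    for length in lengths:
        for start in range(len(protein) - length + 1):
            peptide = protein[start:start + length]
            if peptide[1] in p2_allowed and peptide[-1] in cterm_allowed:
                hits.append((start, peptide))
    return hits


if __name__ == "__main__":
    toy_protein = "GSLAAAAAAVKK"  # hypothetical sequence
    # Hypothetical motif: L or M at position 2, V at the C-terminus.
    for start, peptide in scan_for_candidates(toy_protein, set("LM"), set("V")):
        print(start, peptide)
```

In a real pipeline the anchor filter would be replaced by a binding predictor, and candidates would be ranked per supertype before experimental validation.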

In conclusion, classification of HLA supertypes reduces the complexity of HLA polymorphism and has a significant impact on the development of peptide-based vaccines with maximum population coverage. Since CD4+ T cells are required for priming of naive CD8+ T cells as well as expansion of CD8+ memory T cells [8–12], it is of critical importance to incorporate both HLA-I and HLA-II supertype-restricted epitopes in such vaccines to obtain participation of both CD4+ and CD8+ T cells for the generation of strong and long-lasting immunity.