A Combined Computer-Aided Approach to Drive the Identification of Potential Epitopes in Protein Therapeutics

Background: The identification of fragment sequences, or motifs, within a therapeutic protein that may elicit an immune response when processed by T-cells can be provided by computer-aided approaches. Immunogenicity is a significant problem associated with protein therapeutics and should be investigated in the early stage of protein-based drug development to avoid treatment resistance and potentially life-threatening immune responses. Purpose: To provide a combined computer-aided protocol for investigating the immunogenic profile of a recombinant Kunitz-type inhibitor, which has been reported as promising antitumor agent by our research group. Methods: The combination of databases searching (IEDB and SYFPEITHI) and molecular docking simulations was exploited, herein. This combined protocol has allowed the identification of potential epitopes before in vitro/in vivo evaluation. Predictors of human proteasome cleavage transport and major histocompatibility complex (MHC) binding were considered as overall score assigning the corresponding intrinsic potential of being a T cell epitope to each fragment sequence. The peptides or motifs better classified in the two databases were docked into the three-dimensional (3D) structure of MHC (class I and II) complex to verify the calculated binding affinity. The binding interactions regarding the molecular recognition process by T-cells were also exploited through the MHC:ligand:T-cell complexes. Results: Regarding the Kunitz-type sequence, four motifs were identified as potentially epitopes for MHC-I and three motifs were found for MHC-II. But, those motifs were classified as moderately immunogenic. Final remarks: The combined computer-aided protocol has significantly reduced the number of potential epitopes to be considered for further analysis and could be useful to identify immunogenic fragments (high, moderate and low) in protein pharmaceutics before in vitro/in vivo experimentation. __________________________________________________________________________________________


INTRODUCTION
Protein therapeutics (enzymes, toxins, monoclonal antibodies) have been considered as significantly promising for improving human health offering as advantages more specificity and less toxicity when compared to small compounds. These called proteinbased drugs are mostly safe and nontoxic, whether natural or recombinant, but depending on the dose and administration frequency (number of doses administered), they often induce harmful anti-drug antibodies (ADA), which may compromise the drug's efficiency and provide serious adverse events related to the cross-reactivity with autologous proteins (endogenous counterpart). ADA may change not only the drug's pharmacokinetic profile, but also its pharmacodynamic response interfering or neutralizing the biological effect (1,2). Thus, the immunogenicity of protein-based drug candidates must be considered, primarily in the early stage of drug development process, to avoid both the treatment resistance and potentially life-threatening immune responses.
The immune response mechanisms regarding protein therapeutics can be characterized by the (i) activation of the classical immune system by "foreign" proteins, similar the immune response elicited against pathogens or vaccines (involving Tcells, B-cells, and innate immune system); and (ii) alteration of B-and/or T-cell tolerance, such as the response elicited to autologous self-proteins in certain autoimmune diseases. The two mechanisms overlap but are slightly distinct from one another. Human immune responses to autologous proteins might also involve overcoming the regulation of adaptive immune responses by regulatory T-cells (Treg), though (1,3). _________________________________________ Concerning the classic immune pathway, the production of anti-therapeutic protein responses is the top of a series of events, which eventually lead to B-cell activation and antibody secretion (4,5).
The T-cell response to a protein therapeutic antigen (antigenic fragment/peptide) generally depends on (i) the binding of T-cell epitopes to the major histocompatibility complex (MHC), (ii) the presentation of the MHC:epitope complex at the cell surface, and (iii) the recognition of the MHC:epitope complex by either an effector, or Treg. The experimental binding affinity of T-cell epitopes to MHC, or human leucocyte antigen (HLA), can be evaluated through MHC or HLA binding assays (6,7), and the findings can be retrieved from databases used for immunogenicity prediction, for instance.
MHC class I molecules are expressed by all nucleated cells whereas class II molecules are primarily expressed by professional antigenpresenting cells (APC), such as dendritic cells (DC), macrophages, and B-cells (8,9). MHC class I and II are similar in function presenting peptide fragments (epitopes) at the cell surface to be recognized by CD8+ and CD4+ T-cells, respectively. Those peptides are from different sources, though, generally intracellular for MHC class I and exogenous for MHC class II, obtained via different pathways. Interestingly, there is a link between the two pathways (MHC-I and MHC-II), termed crosspresentation, whereby exogenous antigens (previously phagocytosed) can be also presented by MHC-I molecules. Therefore, the T-cell epitopes presentation by both, MHC-I and MHC-II, contributes to the initiation of an immune response (8,10,11).
The fragment sequences within a therapeutic protein that, when processed by T lymphocytes may elicit an immune response, can be predicted by applying computer-aided approaches. Regarding protein-based drugs, it has been reported that the primary amino acid sequence itself can be a strong determinant concerning the protein's immunogenic potential (binding affinity by MHC or HLA and presentation of T-cell epitopes) (12). Of note, there are databases available and widely used, such as IEDB (Immune Epitope Database) (13) and SYFPEITHI (14), loaded with information regarding the binding affinity by MHC or HLA and providing tools for mapping potentially T-cells epitopes (1). The databases allow the prediction of epitopes considering different application types, including tumor cell lines. However, this kind of prediction has been reported mostly regarding vaccine candidates, and it is often used aside the experimental assays (15)(16)(17)(18), instead of as a driven procedure before experimentation.
Herein, a recombinant Kunitz-type inhibitor (19,20), which has been reported as a promising antitumor agent by our research group (21), was investigated regarding its immunogenic profile combining the search in two databases (13,14) and molecular docking simulations to identify potentially epitopes before in vitro/in vivo evaluation. Kunitztype domains are ubiquitous in nature, acting as both serine protease inhibitors and toxins in animal venoms. Structurally, they consist of between 50 and 70 amino acid residues, adopting a conserved fold generally with two antiparallel β-sheets and one or two helical regions, which are stabilized with three disulphide bridges. They are involved in several physiological processes, such as inflammation, blood coagulation, and fibrinolysis (22,23).

Input data
The computer-aided approach was applied to a recombinant Kunitz-type inhibitor deposited in GenBank, code AAT68575.1, by Batista, I.F.C. and Chudzinski-Tavassi, A.M. (submission date: 02-MAR-2004) (24). The primary sequence contains 129 amino acid residues, where the residues 1-21 correspond to the signal peptide. Then, the input data was the primary (linear) amino acid sequence (FASTA format) of the recombinant protein, discarding the signal peptide.
Two free-accessed databases, SYFPEITHI (14) and IEDB (13), were chosen to perform the immunogenicity prediction regarding the number of reports and number of citations (25)(26)(27). Both databases have information on binding affinity of epitopes by the HLA complex (experimental binding affinity data) (15). In this step, we also informed the MHC source species of interest to perform the analysis.

Selection of alleles
Two MHC properties make difficult to avoid the immune response: (i) MHC is polygenic, meaning it contains several different MHC-I and MHC-II genes (every individual possesses a set of MHC molecules with different ranges of peptide-binding specificities); and, (ii) MHC is highly polymorphic, meaning there are multiple variants of each gene within the population.
Size of sequence fragments (peptides/epitopes) Regarding the size of fragments generated, the core of residues in the T-cell epitope sequence that mainly define the binding affinity and stability to MHC pockets is limited in length to 9 or 10 amino acids (aa). Generally, epitopes presented by MHC-I have 9 aa residues in length, and those presented by MHC-II may range from 12 to 20 aa residues (15). In this study, we have considered sequence fragments containing up to 15 aa residues for MHC-II.
Due to the variability concerning the interaction points into the MHC-II binding site (less specific), the predictability findings have a confidence limit of 50%. The MHC-I prediction, on the other hand, has 80% confidence limit since the core, containing nine residues, has been considered providing more specific interactions (14,30,13).
Of note, the prediction of T-cell epitopes based on fragment sequences (peptides) is computationally feasible when sufficient information on a set of sequence fragments regarding the MHC (I or II) binding affinity is available. The IEDB database (30,13), for instance, may provide reliable background findings for developing T-cell epitope prediction tools.
Sequence fragments classification and cut-off values When using IEDB database (30,13), the prediction method can be chosen by the user. Then, regarding the MHC-I epitopes prediction, the NetMHCpan method (33) based on artificial neural network (ANN) was chosen to classify the sequence fragments, since there was not any corresponding predictor to the Kunitz-type protein investigated. That method provides the binding affinity prediction expressed as IC50 values (nM). A binding affinity (IC50) threshold of 500 nM, for instance, identifies peptide binders recognized by T-cells and could be considered to select peptides (6). Paul et al. (2013) (34) have showed that absolute binding affinity threshold correlates better with immunogenicity, and for getting even better correlation, MHC-specific thresholds should be used. Then, herein, we considered the specific IC50 cut-off values for the 38 most common HLA-A and HLA-B alleles, representative of the nine major subtypes found in the IEDB database (13) to classify the fragment sequences.
The stabilized matrix method, SMM-align (35), is mostly recommended to perform the MHC-II epitopes prediction and was selected to score the sequence fragments. The MHC-II binding site is open at both ends making the correct alignment of a peptide in the binding cavity a crucial part, especially regarding the identification of the core of residues as an MHC-II binding motif. The method predicts quantitative peptide:MHC binding affinity values, meaning it calculates the IC50 value (nM) for each epitope, making it ideally suited to rationally discover promising epitopes.
When using the SYFPEITHI database (14), the potential epitopes are sorted by evaluating the position of each amino acid and its chemistry properties experimentally determined (lipophilicity, basicity, and so on). That information is converted to a score using a proper algorithm, implemented in the database. The algorithm also computes a maximum score value for each allele based on available information from natural binders. For example, the maximum score value that a certain peptide could present considering the allele HLA-A*02 would be 36 (14). Cut-off values are not reported for this kind of score calculation. Then, in order to select the peptides better classified, we have considered not only the sequence fragments presenting score values higher than 20, but also if they would appear in multiple alleles.
MHC class I: intracellular processing prediction IEDB also performs the intracellular processing prediction regarding the MHC class I pathway. Predictors of human proteasome cleavage, binding affinity by the transporter associated with antigen presentation (TAP) protein, and antigen presentation by MHC-I at cell surface, were used to produce an overall score for each peptide regarding its intrinsic potential of being a T-cell epitope.
In this step, we selected the immune proteasome type to make the prediction. The predictions are based on in vitro proteasomal digest of the enolase and casein proteins as described by Tenzer et al. (2005) (36).
The TAP score estimates effective -log(IC50) values of a peptide or its N-terminal prolonged precursors regarding the binding to TAP. It has been shown that high affinity of a certain peptide translates into high transport rates (30,13). The prediction of antigen presentation by MHC-I, however, is quite similar to the MHC-I binding prediction supra-mentioned.

Molecular Docking Simulations
The molecular docking method provides a binding affinity estimative regarding the ligandreceptor/protein complex through a scoring or energy function. Larger ligands having many rotatable bonds (freedom degrees), for instance, demand significantly computational timeconsuming. Then, a rigid docking procedure can be considered to speed up the calculation procedure. Several fragment peptides, or motifs, may have immunogenic potential and could interact with MHC molecules inducing immune responses. There is an urgent need to find faster predicting protocols that may allow to identify and classify the most immunogenic fragments before carrying out the in vitro and in vivo experiments, which are also expensive and time-consuming (37).
The findings from both databases were compared, and the fragment sequences (peptides) classified as promising T-cell epitopes for MHC-I and MHC-II were considered to perform molecular docking simulations, allowing the assessment to the three-dimensional (3D) structural information in the molecular binding recognition process by MHC and T-cell, as well.
The presence of the T-cell receptor (TCR) in the MHC-ligand complex was crucial for selecting the 3D structure to perform molecular docking simulations. It was considered 3D structures having resolution under 3.0 Å (X-ray diffraction method).
The Cartesian coordinates of the HLA-A*02 complex (MHC-I:ligand:Tcell), deposited in the Protein Data Bank (38) (PDB), PBD ID 1OGA (resolution at 1.40 Å) (39), were retrieved and used as reference/template to perform the molecular docking simulations for the MHC class I. The ligand was used as reference to build up the 3D molecular models of the fragment sequences previously selected as promising epitopes (database search).
Based on data reported from the crystallographic structure of the complex (39), three calculation conditions were considered regarding the number of water molecules participating into the molecular interaction process, as follows: (i) complex MHC-I:ligand and two water molecules establishing interactions only with the ligand; (ii) complex MHC-I:ligand and five water molecules, where the additional three molecules participate in interactions into the HLA-A active site; and, (ii) complex MHC-I:ligand:TCR lymphocyte and ten water molecules, being five molecules corresponding to those from the second calculation condition, and five more related to interactions established between the TCR lymphocyte protein chains and the rest of the complex (ligand and HLA-A active site). Of note, the last calculation condition is likely more representative in terms of T-cell recognition process.
For the molecular docking simulations of MHC class II, the Cartesian coordinates of the HLA-DRB1 complex (MHC-II:ligand:TCR) deposited in the Protein Data Bank (38) (PDB), code PBD ID 1FYT (resolution at 2.6 Å) (40), were retrieved and used as reference/template. The ligand contains 13 aa residues, and the interactions established between the complex and water molecules have been also reported (40). The ligand was used as reference to build up the 3D molecular models of the sequences previously selected as promising epitopes. Although those peptides have size of 15 aa residues, the core of residues (generally containing 9 aa) is indeed the motif responsible for establishing the interactions into the HLA-DRB1 active site.
Molecular docking simulations were carried out using CLC Drug Discovery Workbench software (version 2.4, Qiagen Aarhus A/S, 2014) (41). Regarding the molecular modelling method used, it classifies promising ligands according to the complex binding energy values, which are related to the ligands' calculated binding affinity (complex formation). Furthermore, the intermolecular interactions and interatomic distances can be also exploited using this approach.
Re-docking procedure using the original PDB ligand was performed to establish the optimum conditions for computing the peptides binding affinity values. The simulation protocol was the following: (i) 1,000 iterations; (ii) rigid approach, because the number of rotatable bonds; and (iii) Nelder-Mead simplex method (42), implemented in the package, for function minimization. Of note, not only the root-mean square deviation (RMSD) values (43), regarding the atomic position differences between the reference ligand and each peptide, but also the score function using the PLANTSPLP method (44), were considered as docking evaluation criteria.
A particularly ligand binding mode into the protein binding pocket can be connected to a score value. The score, herein, mimics the potential energy change when the target protein and ligand come together, meaning that a very negative score corresponds to a strong binding whereas a less negative, or even positive, score value corresponds to a weak or non-existing binding. The total score value comprises the following types of contribution: hydrogen bond score, metal interaction score, steric interaction score, and ligand conformation penalty score. Concerning the last contribution, it scores the complementarity between the binding site and ligand by rewarding and punishing different types of heavy atom contacts having inter-atom distance less than 5.5 Å.
Regardless the method limitations, it can be considered as a good alternative to access the 3D structural information related to the molecular recognition process for rationally driving the selection of potentially promising epitopes.

MHC class I ligand identification
Initially, the databases divided the primary amino acid sequence of the Kunitz-type inhibitor in all possible fragments containing nine amino residues in size, which is the preferred size for MCH-I presentation (15), giving approximately 1,200 peptides (fragment sequences or motifs or potential epitopes).
Regarding the findings from SYFPEITHI database (14), eleven peptides were selected considering the twelve investigated alleles for MHC-I, according to the criteria previously described in the Methods section. The score values and corresponding distribution of the peptides in each allele are presented in Figure 1S (Supporting Information). Concerning Figure 1S, the fragments p5, p57, and p67, were identified as potential ligands for seven of the twelve alleles investigated. Six fragments (p5, p86, p71, p81, p69 and p57) were pointed out as MHC-I ligands for the most frequent allele in the population (HLA-A*02). Furthermore, the fragments p5 and p86 have showed score above 20 and were selected for the next stage of the study. The same kind of evaluation was performed to the other fragments. After that, the findings obtained from SYFPEITHI database (14) were compared to those from IEDB (13).
The IEDB database (13) considers specific IC50 cut-off values for each allele (29). The position of the first aa residue corresponding to each fragment better classified as well as the predicted IC50 (nM) and IC50 (nM) cut-off values are listed in S1 Table  (Supporting Information).
Regarding S1 Table, after applying the IC50 cutoff, there were not any distinct sequence fragments identified as good ligands for MHC-I considering the three most frequent alleles in population (HLA-A*02:01, HLA-A*24:02, and HLA-A*01:01). The sequence fragments (peptides) p1, p32, p34, and p81 were found in multiples alleles suggesting they would have a significant potential to be presented at least by two different HLA complexes to TCR lymphocytes ( Figure 2S; Supporting Information).
Despite the differences concerning the frequency of the epitopes in each allele, none of the fragment sequences was excluded in this step since they have presented IC50 predicted values bellow the specified cutoff value (29). Therefore, the comparison of the motifs identified from the two databases was made and six fragment sequences were common to both databases, as showed in Table  1.

MHC class I intracellular processing prediction
Predictors of human proteasome cleavage and binding affinity by TAP protein were considered in association with the prediction of fragment sequences as potential epitopes to be presented by MHC-I at cell surface. An overall score for each peptide (fragment sequence or epitope or motif) was generated concerning the intrinsic potential of each peptide being a T-cell epitope. Thus, the combined score has increased the chance of finding promising immunogenic sequences (15). The different fragment sequences found in this step were also considered to perform molecular docking simulations.
The binding affinity of certain peptides (sequence fragments or motifs) by MHC-I, to be further presented at cell surface, involves an intracellular processing where firstly an exogenous protein would be internalized by phagocytosis or endocytosis; then, it would be cleaved by the proteasome generating smaller fragments/peptides having different sizes. Those peptides may bind to the TAP protein to be transported to the endoplasmic reticulum (ER). In the ER, depending on the binding affinity, the peptides can be recognized by MHC-I, which will be responsible for presenting them at the cell surface (membrane) (10,45).  The analysis was made considering the alleles HLA-A*02:01, HLA-A*24:02, HLA-A*01:01, and HLA-A*03:03, which cover 70% of the population frequency (13,30). Higher score values suggest that the fragments generated in the intracellular processing would be likely presented via MHC-I pathway (29). The intracellular processing prediction score combines the proteasome cleavage, TAP transport, and MHC binding predictions (13,15). The score values ranged from -0.30 to 2.38 regarding 390 fragment sequences generated. A cutoff score value of 1.0 was set to reduce the number of fragments to be considered in the subsequent steps. Therefore, twenty-nine fragments were selected regarding all alleles investigated, and they are listed in Table 2. The fragment sequences also found in the MHC-I binding procedure (previous step), using both databases, are pointed out with asterisk: one (*) for those peptides found in IEDB (13) and two (**) for those found in SYFPEITHI (14).
Twelve fragment sequences presented score values higher than 1.50 (in bold; Table 2) and among them the peptides p5, p57, p59, and p67, were also found in the MHC-I binding step carried out in SYFPEITHI (**) database (14), and the peptide p72 was also found in the previous step performed in the IEDB (*) database (13). The sequences identified as good MHC-I ligands were compared to the fragments found in this step (MHC-I intracellular processing prediction) before being considered to follow to the next step, molecular docking simulations.
A summary regarding the steps considered for searching potentially promising MHC-I ligands using the tools available in the two databases is presented in Figure 1. Concerning the large number of fragments retrieved from the databases it is quite important adopting selection filters to rationally reduce the number of fragment sequences to be considered in molecular docking simulations.

MHC class I molecular docking simulations
After the findings comparison from both databases, twenty-four fragment sequences (peptides or epitopes or motifs) (see Figure 1) classified as promising T-cell epitopes for MHC-I were considered to perform molecular docking simulations (41).
Then, in this step, it was possible to assess the 3D structural information regarding the molecular binding recognition process. The selection of potentially promising epitopes has become more rational and reliable, since the interactions involved in whole molecular system (HLA-A*02:ligand:TCR), PDB ID 1OGA (39), were taken into account. The calculation conditions were previously described in the Methods section, and the findings for the first and second conditions are shown in Supporting Information (Table 2S and 3S). Of note, the molecular docking conditions can be directly compared to the epitopes prediction findings from the two databases since the interactions involved in the complex HLA-A*02:ligand can also be assessed (calculation condition 1 and 2). Therefore, eighteen of twenty-four peptides selected in both databases have presented favourable score energy values (negative values) as well as lower RMSD values, regarding the HLA-A*02:ligand complex formation. Concerning the calculation conditions, the findings from molecular docking were mostly in agreement with those obtained from the databases (MHC-I binding prediction), but reducing the number of promising sequences to be considered to further evaluation.
The third calculation condition includes the water molecules establishing molecular interaction with the whole system: the protein chains of the TCR lymphocyte, the HLA-A*02 binding site and the ligand (39). This condition is the most representative of the molecular system investigated regarding the stimulation of an immune response. Not only the ligand binding affinity by MHC-I is important in the molecular recognition process, but also the ligand presentation by MCH-I to the TCR lymphocyte. Concerning that, the databases have not enough information to predict the interactions involved in the entire complex HLA-A*02:ligand:TCR lymphocyte. The findings regarding the third calculation condition are shown in Table 3. Among the twenty-four peptides investigated only four fragment sequences (p1, p100, p69, p86) presented favourable score energy values. Of note, their RMSD values were lower than 1 Å. Based on these findings, they could be considered as potentially immunogenic.
LaFuente and Reche (2009) (11) have reported that the residues at terminal positions in the ligands' sequences seem to directly establish interactions with MHC. Otherwise, the residues at fourth, eighth, and sixth positions are related to the recognition process by the TCR lymphocyte. The peptides which presented negative energy score values had the same interaction mode previously reported by LaFuente and Reche (2009) (11). The peptide p1 [MANSKAVCN], for instance, was the best ranked by the docking score function (Table 3; -60.56 kcal/mol) and has showed molecular interactions with the following residues in the HLA-A*02 binding site: Tyr7, Tyr59, Tyr84, Thr143, Lys146, Tyr159, and Tyr171. Those residues are involved in antigen presentation (46). Furthermore, interactions were also observed with the amino acid residues Gln52, Gln96, Ser95, and Ser99 of the TCR lymphocyte protein chain (Figure 2).
The peptides p100 [EEDADSGEI], p69 [SPKLICFKV], and p86 [IMKKNLTGI] presented additional molecular interactions with four, three, and five amino acid residues, respectively, conserved in the HLA-A*02 binding site (46). The main difference seems to be related to the interactions established with the amino acid residues of the TCR lymphocyte chain, which could explain the different score energy values as well as the RMSD variation (accommodation into the binding site) in comparison to the best ranked peptide (p1). Regarding the molecular interactions with TCR, the fourth residue of peptide p1 interacts with the residues Ser95, Gln96, Gln52, and Ser99, while its sixth residue interacts with the Ser99 residue of TCR. For peptide p100 (-21.19 kcal/mol), the fourth residue only interacts with the Gln52 residue of TCR, while the sixth residue interacts with the residues Gln52 and Ser99. Concerning the peptides p69 (-18.27 kcal/mol) and p86 (-5.95 kcal/mol), the interactions with TCR are established only by the amino acid residues at the fourth position (Gly97 and Gln52 to p69; Gln96 to p86).
The 3D representation regarding the molecular interactions established by peptide p1 in the complex HLA-A*02:p1:TCR lymphocyte is shown in Figure 3 (A-C), using Discovery Studio Visualizer software. 44 The molecular docking of p1 in the HLA-A*02 binding site is presented in Figure 3A, and the solvent accessibility surface around the ligand p1 in Figure 3B. The interactions involved in the entire complex (antigen recognition process), including the TCR lymphocyte, are shown in Figure 3C. The MHC-I binding site exposes p1 to be recognized by the lymphocyte, and it can be observed that the terminal portions of ligand p1 establish interactions with aa residues from MHC-I whereas the central portion of p1 interacts with the T lymphocyte chain.

MHC class II ligand identification
The two databases cut the primary sequence of the protein in all possible fragments containing fifteen amino acid residues in size (preferred size for MHC-II; ranging from 12 to 20 residues) resulting in approximately 450 fragments. The alleles chosen in both databases to perform the MHC-II ligand prediction were HLA-DRB1*15:01, HLA-DRB1*07:01, HLA-DRB1*03:01, and HLA-DRB1*11:01. Of note, the allele DRB1*09:01 was not available in the SYFPEITHI database (14).
Regarding the findings from SYFPEITHI database (14), thirteen peptides (fragment sequences or motifs) presented score above 20. The peptide p80 has appeared in three alleles and the peptides p4 and p69 have appeared in two alleles. The score values and corresponding distribution of the peptides in each allele are presented in Figure 3S (Supporting Information section). Concerning the IEDB database (13), the predicted binding affinity of each peptide via MCH-II pathway is provided through IC50 values (nM). According to IEDB 2005-2015® (30), when peptides present IC50 values below 50 nM they are classified as having high binding affinity; when they have IC50 values between 50 and 500 nM they are considered as having moderate (intermediate) binding affinity; and when they present IC50 values higher than 500 nM they are discriminated as having low binding affinity by MHC-II (48). Twelve peptides were identified as having moderate binding affinity by MHC-II (IC50 values between 50 and 500 nM) and seven from twelve appeared in the allele HLA-DRB*11:01 ( Figure  4S; Supporting Information section). Five peptides from the same sequence region (76 to 93) of the Kunitz-type inhibitor protein have identified as good MHC-II ligands sharing a common sequence around the moiety DIMKK (see Table 4). Even though the sequences were not the same, the DIMKK region could be considered as a promising epitope. Table 4 shows the fragment sequences better classified as ligands for MHC-II using the two databases.
A summary regarding the steps considered for searching potentially promising MHC-II ligands using the tools available in the two databases is presented in Figure 4, pointing out the filters adopted to rationally reduce the number of fragment sequences to be considered in each next step of analysis.
MHC class II molecular docking simulations MHC-II molecules are highly polymorphic, and that feature may contribute to degenerate their binding site, which is less specific (more open groove) allowing the binding of multiple fragment sequences (peptides) to multiple MHC-II subtypes (49). Therefore, concerning the promiscuity related to the interaction regions into the MHC-II binding site, the confidence limit regarding the prediction of potential epitopes by the database used to carry out the analysis is only 50% (13,14,30). In this regard, the 3D structural information concerning the molecular system MHC-II:ligan:TCR lymphocyte can play a role to rationally select the epitopes more promising to be good ligands of MHC-II and, then, recognized by TCR lymphocyte.
Molecular docking simulations were performed constraining flexible bonds in the primary chain of each peptide. Of note, the size of peptides and related conformational flexibility have impact directly on the time-consumed in calculation as well as in findings reliability (50,51). Molecular docking approach, however, is indeed a relatively fast procedure and can provide new insights regarding the molecular system investigated.    The nineteen fragment sequences selected from the two databases were, firstly, docked into the MHC-II binding site (PDB ID 1FYT; resolution at 2.6 Å) (40) without considering TCR lymphocyte. The peptides were sorted according to their calculated binding affinity (energy score value related to the MHC-II:ligand complex formation) and accommodation/orientation in the binding site (RMSD values). The energy scoring function and RMSD values are listed in Table 5.
Ten peptides (p79, p78, p19, p74, p80, p77, p10, p22, p20, p76) have showed favourable (negative) energy score and RMSD values lower than 1 Å. The negative values obtained for total energy score indicate the formation of the HLA-DRB1:ligand complexes. Those peptides, consequently, would be more likely presented to the TCR lymphocyte. However, in order to be more certain, molecular docking simulations were also run considering the whole complex, HLA-DRB1:ligand:TCR lymphocyte, and the findings are shown in Table 6.
The peptides p2, p4, p61, and p62 presented RMSD values higher than 20 Å (43), indicating they did not fit well into the complex binding site. The steric hindrance regarding the residues' side chains in the ligands and in the binding site was responsible for that lack of accommodation. Interestingly, when the entire complex (MHC-II:ligand:TCR lymphocyte) was considered in docking simulations, only three peptides have showed favorable total energy score (negative values): p74 (-147.54 kcal/mol), p79 (-87.39 kcal/mol), and p78 (-21.21 kcal/mol), reinforcing these peptides as potentially immunogenic epitopes. The 3D representation regarding the molecular interactions established by the peptide p74 (more negative energy value; RMSD = 0.05 Å) in the complex HLA-DRB1:p74:TCR is shown in Figure 5, using Discovery Studio Visualizer software (47).
The peptide p74 [CFKVQDYWILNDIMK] establishes hydrogen bond interactions with the following residues into the HLA-DR1 binding site: Gln9, Phe51, Ser53, Asp57, Trp61, Asn62, Asn69, Arg71, Lys75, Arg76, His81, Asn82, and Val85 (the interatomic distances are lower than 3 Å). Furthermore, hydrogen bond interactions are also established with the Glu94 and Gly98 residues in the TCR lymphocyte binding site.   The major TCR lymphocyte amino acid residues involved in the molecular recognition process (Glu30, Gly98, Ser96) interact with ligands at position 11 (40). Regarding position 11, the peptide p74 has established interaction only with the residue Gly98. Besides that, hydrogen bonding interaction at position 3 was observed involving the Glu94 residue from the TCR binding site. The interactions established between p74 and HLA-DR1:TCR lymphocyte are shown in Figure 6.
The peptide p69, which presented unfavorable total energy score (600.33 kcal/mol, more positive value; RMSD value = 1.99 Å), interacts with the Ser96 residue at position 11 and Gly98 at position 10. It also established interactions at position 8 with the Ser95, Glu102, and Asn99 residues into the TCR lymphocyte binding site. The intermolecular interactions established by the peptides p74 and p69 into the HLA-DR1 and TCR lymphocyte binding sites, concerning the aa residues and water molecules, are summarized in Table 7. Of note, the peptide p69 establishes intermolecular interactions with only three residues into the HLA-DR1 protein (Ser53, Arg71 and Tyr60), in contrast with the thirteen residues involved in the binding of p74 (more favorable total energy score).

FINAL REMARKS
The combined computer-aided protocol described herein has significantly reduced the number of fragments or motifs to be considered as potentially promising epitopes concerning the Kunitz-type inhibitor used as example. Also, the fragment sequences were classified as having moderate immunogenic activity. Four potentially promising epitopes (p1, p100, p69 and p86) were found for MHC-I and three (p74, p79 and p78) for MHC-II. It is noteworthy that the three epitopes predicted for MHC-II using the combined computer-aided approach were among the sequences experimentally assayed and classified as potential epitopes (preliminary data; not published). However, more experimental assays should be conducted to fully validate the combined computer-aided protocol. Since the protocol comprises the 3D structural information related not only to the MCH:ligand complex, but also to the MHC:ligand:TCR lymphocyte complex, the prediction power from this combined approach seems to be improved. The assessment of ligand-lymphocyte interaction may allow to correlate the MHC ligand's binding affinity as well as the ligand's chance to be presented to TCR (immunogenic activity). Then, regarding the early stage of protein-based drugs development, for instance, the combined computer-approach could be applied to drive the selection of more promising epitopes to be synthesized and experimentally evaluated, as well as to find better constructions (through molecular modifications) derived to the original protein sequence to avoid or reduce the appearance of ADA.

CONFLICT OF INTEREST
The authors declare that they have no conflict of interest.