Bacterial signal peptides: structure, optimization, and applications

Bacterial signal peptides are N-terminal tags that direct proteins for export through one of various transport pathways. These signal peptides are highly important as they are the key determinants of transport, ensuring that the correct protein arrives at the correct pathway. While these peptides consist of three domains with well conserved biochemical properties, there remains a large amount of diversity between the signal sequences for different proteins, transport pathways, and bacterial species. Recent advancements have allowed for the prediction and manipulation of signal sequences to optimize protein export efficiency. This knowledge can then be exploited in the field of recombinant protein production wherein bacterial species are used to produce and secrete proteins of interest. By fusing the protein with an optimized signal peptide, the yield or rate of export can be improved. This review focuses on signal peptides for two primary transport pathways (Sec and Tat) in Escherichia coli specifically, with an emphasis on applications and the production of recombinant proteins.

Keywords: signal sequence, signal peptide, recombinant proteins, Sec, Tat
For bacterial proteins to be transported to different cellular compartments or secreted from the cell, they must navigate through various transport pathways (Green & Mecsas, 2016). Faithful targeting to these pathways is paramount as it ensures that proteins arrive in the right place at the right time. With more than a dozen secretion systems and a vast exportome, strict organization must be maintained for bacteria to remain viable and functional. This protein transport specificity is provided by short N-terminal tags dubbed "signal peptides" (Emr et al., 1978; Freudl, 2018). These signal peptides have a tripartite structure with conserved motifs and are typically 16-30 amino acids in length (Freudl, 2018; Peng et al., 2019). Distinct biochemical properties of each domain or motif allow the protein to be transported by one specific pathway while avoiding others. Once the signal peptide has directed the protein to the correct transport pathway, it is cleaved off by signal peptidases (Peng et al., 2019). This review will outline common features of signal peptides for two main transport pathways, Sec and Tat, in Escherichia coli (E. coli). The general secretion (Sec) and twin-arginine translocation (Tat) pathways are the primary transport systems in bacteria, with Sec predicted to transport over 90% of all secreted proteins (Georgiou & Segatori, 2005; Green & Mecsas, 2016). In brief, both pathways are composed of multiple protein components and involve transport across the cytoplasmic (inner) membrane (Green & Mecsas, 2016). However, while Sec only transports proteins in an unfolded state, Tat has the unique ability to transport folded proteins.
In addition to discussing key components of signal peptides, this review will also explore practical applications. For example, signal peptides can be exploited in the bacterial production of recombinant proteins (Freudl, 2018). This is highly relevant to many industries such as biopharmaceuticals, food production, and scientific research. By expanding upon the knowledge surrounding signal peptides and optimizing their sequence for maximum efficiency, the yield and purity of recombinant proteins can be improved.
Signal sequences exhibit some variation, both within and between bacterial species. This variation allows for certain proteins to be targeted to different transport pathways for export, thus arriving at different final destinations (Freudl, 2018; Green & Mecsas, 2016). Despite this diversity, there are three typical domains: a positively charged amino-terminal region (n-region), a central hydrophobic region (h-region), and a polar carboxyl-terminal region (c-region) (Freudl, 2018; Peng et al., 2019) (Figure 1). It is within this c-region that the cleavage site for signal peptidases resides, with an A-X-A motif where A is alanine and X is any amino acid (Rusch & Kendall, 2007) (Figure 1). Due to recurring motifs, bioinformatics tools may be used to predict putative signal sequences in silico (Peng et al., 2019). For example, SignalP (Petersen et al., 2011), Phobius (Käll et al., 2007), and PrediSi (Hiller et al., 2004) can be used to predict signal peptides for the Sec pathway, while TatP (Bendtsen et al., 2005), Tatfind (Rose et al., 2002), and PRED-TAT (Bagos et al., 2010) may be used for the Tat pathway. The following subsections will outline common motifs seen in signal peptide sequences, with a focus on E. coli. Additionally, differences in the signal peptide that distinguish Sec from Tat transport will be highlighted.
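To make the idea of motif-based detection concrete, the following is a minimal sketch that scans the C-terminal end of a candidate signal peptide for the A-X-A pattern. It is not the algorithm used by SignalP, Phobius, or PrediSi, which rely on trained statistical models. The function name `find_axa_sites`, the 8-residue search window, and the toy sequence are illustrative assumptions rather than values taken from the literature.

```python
import re

# A-X-A: alanine, any residue, alanine.
# The lookahead makes matches zero-width so overlapping motifs are all reported.
AXA = re.compile(r"(?=(A[A-Z]A))")


def find_axa_sites(peptide: str, window: int = 8) -> list[int]:
    """Return 0-based start positions of A-X-A motifs found within
    the last `window` residues of `peptide` (an assumed stand-in for the c-region)."""
    peptide = peptide.upper()
    start = max(0, len(peptide) - window)
    return [m.start() for m in AXA.finditer(peptide, start)]


# Hypothetical toy signal peptide (not a real protein sequence).
toy_peptide = "MKKTLLALSLLAAVFSSASALA"
print(find_axa_sites(toy_peptide))  # -> [17, 19]
```

A pattern match of this kind only flags candidate cleavage sites; it says nothing about whether the upstream n- and h-regions have the properties required for export.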

Sec-specific signal peptides
Signal sequences which target proteins to the Sec pathway in E. coli follow the general tripartite structure as outlined previously (Rusch & Kendall, 2007) (Figure 1). As evidenced in Table 1, the positively charged n-region typically contains multiple basic amino acid residues such as lysine (K) and arginine (R). Next, the longer h-region consists primarily of hydrophobic amino acid residues (Table 1). Finally, the short c-region contains some polar uncharged amino acid residues such as serine (S) and threonine (T), as well as the signal peptide cleavage motif (A-X-A) (Lüke et al., 2009; Palmer & Berks, 2012). While the biochemical properties within each domain are well conserved, the individual amino acids themselves exhibit large variation between different proteins (Table 1). Because the amino acid substitutions are often within the same biochemical "family" (e.g., hydrophobic alanine vs. leucine), this diversity is well tolerated and does not affect the overall function of the signal peptide. However, these slight changes can lead to differences in export efficiency, as discussed further in subsection 2.3.
Figure 1. Schematic of the general tripartite structure of signal peptides. Both Sec- and Tat-specific signal peptides possess a positively charged n-region, a hydrophobic h-region, and a polar c-region. This polar c-region also contains the A-X-A motif, which is a signal peptidase cleavage site as indicated by the scissors. Tat-specific signal peptides also contain a twin-arginine motif (S/T-R-R-X-F-L-K) at the border between the n- and h-regions.
[...] (Crane & Randall, 2017; Molloy et al., 2000) and signal peptide sequences were collected from the UniProt database (UniProt Consortium, 2019), then a ClustalW MSA was performed via MEGA software (Kumar et al., 2018). Colour key for biochemical properties: yellow = hydrophobic; blue = basic; red = acidic; green = hydroxyl, amine; pink = glycine; indigo = proline; brown = sulfhydryl; aqua = histidine.

Tat-specific signal peptides
Like Sec-specific signal peptides, Tat-specific signal peptides follow the three-domain pattern (Freudl, 2018) (Figure 1). However, there are a few differences that distinguish Tat signal sequences from those destined for the Sec pathway, as depicted in Table 2. Most notable is the presence of the conserved Tat-specific motif at the boundary between the n- and h-regions: S/T-R-R-X-F-L-K, where X is usually a polar amino acid (Freudl, 2018). The twin arginines (R-R) are invariant in all Tat signal peptides, as this motif plays a key role in the binding of the Tat substrate to the export system (Cristóbal et al., 1999; Palmer & Berks, 2012) (Table 2). Additionally, the +2 and +3 amino acid residues (with respect to the twin arginines) are consistently hydrophobic, as evidenced in Table 2 (Brink et al., 1998; Cristóbal et al., 1999). Another distinction is that Tat signal peptides tend to be less hydrophobic than Sec signal peptides, particularly within the h-region (Cristóbal et al., 1999). This is in part due to comparatively more glycine (G) and fewer leucine (L) residues in the h-region of Tat signal peptides (Table 2). Additionally, Tat signal peptides are longer than those for the Sec pathway, at average lengths of 38 and 24 amino acids respectively (Cristóbal et al., 1999; Freudl, 2018). A longer n-region in Tat signal peptides contributes to this increased length. The n-region also contains more negatively charged residues than is seen in Sec signal peptides (Table 2). This may be attributed to the compensatory ability of the large positive charge from the twin-arginine motif (Cristóbal et al., 1999). Finally, Tat signal peptides tend to contain a few positively charged amino acid residues (R, K) within the c-region, termed the "Sec-avoidance" signal (Bogsch et al., 1997; Freudl, 2018) (Table 2).
It is important to note that while the twin-arginine motif is ubiquitous among Tat-specific signal peptides, it alone is insufficient to direct proteins to the Tat pathway (Bogsch et al., 1997; Chaddock et al., 1995; Palmer & Berks, 2012). Instead, a combination of these key distinguishing features is required.
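The twin-arginine consensus described in this subsection can be expressed as a simple pattern. The sketch below is illustrative only and is not how TatP, Tatfind, or PRED-TAT operate. The "relaxed" pattern (twin arginines followed by any residue and then two hydrophobic residues at +2 and +3), the residue set treated as hydrophobic, the function name `classify_tat_motif`, and the toy sequences are all assumptions made for this example.

```python
import re

# Consensus from the text: S/T-R-R-X-F-L-K.
STRICT_TAT = re.compile(r"[ST]RR[A-Z]FLK")

# Assumed set of hydrophobic residues for this sketch.
HYDROPHOBIC = "AVILMFW"

# Hypothetical relaxed pattern: R-R, any residue (+1), hydrophobic at +2 and +3.
RELAXED_TAT = re.compile(rf"RR[A-Z][{HYDROPHOBIC}][{HYDROPHOBIC}]")


def classify_tat_motif(peptide: str) -> str:
    """Return 'consensus', 'relaxed', or 'none' depending on which pattern matches."""
    peptide = peptide.upper()
    if STRICT_TAT.search(peptide):
        return "consensus"
    if RELAXED_TAT.search(peptide):
        return "relaxed"
    return "none"


# Hypothetical toy sequences (not real proteins).
for seq in ("MNDKSRRDFLKAGA", "MNDKSRRRFLAQLG", "MKKTLLALSLLA"):
    print(seq, classify_tat_motif(seq))
```

Consistent with the point above, a motif hit should be read as a necessary but not sufficient indicator: h-region hydrophobicity, n-region length, and c-region charge would also need to be considered before inferring Tat targeting.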

Signal sequence optimization
Specific modifications to the signal sequence can be made to optimize the export of a given protein. For example, a higher charge-to-length ratio in the n-region typically leads to higher secretion efficiency in the Sec pathway (Peng et al., 2019). This was confirmed experimentally in E. coli by substituting positively charged residues with either neutral or negatively charged residues in the n-region, thereby reducing the positive charge (Inouye et al., 1982; Nesmeyanova et al., 1997). As a result of these substitutions, the rate of transport decreased and the accumulation of protein aggregates in the cytoplasm increased. Conversely, increasing the net charge of the n-region generally improved export efficiency as demonstrated by Ismail et al. (2011) in E. coli, although this may be protein-specific. This effect is likely due to the electrostatic interactions that are present between the positively charged n-region and negatively charged residues found near the binding groove of the Sec translocase motor protein (SecA) (Chou & Gierasch, 2005; Gelis et al., 2007; Low et al., 2013). Altogether, increasing the charge-to-length ratio of the n-region generally increases transport efficiency for Sec substrates. Manipulating the charge of the n-region may also influence transport efficiency via the Tat pathway (Freudl, 2018). Notably, Li et al. (2006) found that charge distribution in the n-region was more significant than net charge when determining Tat-specific protein secretion rates. This further supports the notion that while there may be general trends, signal peptide optimization is still unique to a given signal sequence and protein.
[...] (Kumar et al., 2018). Colour key for biochemical properties: yellow = hydrophobic; blue = basic; red = acidic; green = hydroxyl, amine; pink = glycine; indigo = proline; brown = sulfhydryl; aqua = histidine.
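As a rough illustration of the charge-to-length ratio discussed above, the sketch below counts basic (K, R) and acidic (D, E) residues in an n-region and divides the net charge by the region's length. This is a simplification under stated assumptions: histidine is treated as neutral, the N-terminal amino group is ignored, and the n-region boundaries are supplied by the user. The function names and toy sequences are hypothetical.

```python
POSITIVE = set("KR")  # basic residues counted as +1
NEGATIVE = set("DE")  # acidic residues counted as -1


def net_charge(region: str) -> int:
    """Net charge from side chains only (assumes His is neutral, termini ignored)."""
    region = region.upper()
    return sum(r in POSITIVE for r in region) - sum(r in NEGATIVE for r in region)


def charge_to_length(n_region: str) -> float:
    """Net charge of the n-region divided by its length."""
    if not n_region:
        raise ValueError("n-region must contain at least one residue")
    return net_charge(n_region) / len(n_region)


# Hypothetical n-regions: an original and two substitution variants.
for label, seq in (("original", "MKKT"), ("K3E variant", "MKET"), ("K2E/K3D variant", "MEDT")):
    print(label, seq, charge_to_length(seq))
```

Because Li et al. (2006) found charge distribution to matter more than net charge for Tat substrates, a single ratio of this kind is at best a first-pass descriptor.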
The h-region can also be modified, as the degree of hydrophobicity appears to be pivotal when determining export efficiency. For example, increasing the hydrophobicity of this region can increase efficiency of export via the Sec pathway, as demonstrated by Chen et al. (1996) in E. coli. This is likely because the h-region adopts an α-helical conformation, which is able to form hydrophobic interactions with the SecA motor component of the Sec transport system (Chou & Gierasch, 2005; Gelis et al., 2007; Mori et al., 1997). By increasing the hydrophobicity, and thereby strengthening this interaction, the efficiency of transport may increase as SecA is more readily able to recognize Sec-specific signal peptides (Chou & Gierasch, 2005; Low et al., 2013). Besides affecting the levels of secreted protein, manipulating the hydrophobicity of the h-region can also change the targeted transport pathway. Natural Tat substrates can be redirected to the Sec pathway by increasing the hydrophobicity of the signal peptide (Cristóbal et al., 1999). This aligns with evidence that Sec signal peptides tend to be comparatively more hydrophobic, as discussed in subsection 2.2. Additionally, further increasing the hydrophobicity of a signal peptide destined for the SecB pathway can redirect it to the SRP-dependent pathway, which is an alternate branch of the Sec transport pathway (Bowers et al., 2003; Low et al., 2013). This is due to SRP preferentially binding longer, more hydrophobic α-helices (Low et al., 2013). In general, by manipulating the hydrophobicity of the h-region, both export efficiency and the targeted transport pathway can be altered.
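One way to put a number on h-region hydrophobicity is to average a per-residue hydropathy scale over the region. The sketch below uses the Kyte-Doolittle scale, which is an assumption of this example; the studies cited above may have used other measures. The function names, the toy h-region, and the serine-to-leucine substitution are hypothetical and chosen only to show how a single polar-to-hydrophobic change shifts the average.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(region: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues in the region."""
    if not region:
        raise ValueError("region must contain at least one residue")
    return sum(KYTE_DOOLITTLE[r] for r in region.upper()) / len(region)


def substitute(region: str, position: int, new_residue: str) -> str:
    """Return a copy of `region` with the residue at 0-based `position` replaced."""
    return region[:position] + new_residue + region[position + 1:]


# Hypothetical h-region and a single polar-to-hydrophobic substitution (S -> L).
wild_type = "LLALSLLA"
mutant = substitute(wild_type, 4, "L")
print(wild_type, round(mean_hydropathy(wild_type), 3))
print(mutant, round(mean_hydropathy(mutant), 3))
```

A higher average does not guarantee better export: as noted above, sufficiently hydrophobic signal peptides may be rerouted from Tat to Sec, or from the SecB branch to the SRP-dependent branch.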
Recent efforts aim to streamline signal peptide optimization. For instance, the generation and screening of "signal peptide libraries" have been performed for a few bacterial species to predict the optimal signal sequence (Brockmeier et al., 2006; Degering et al., 2010; Peng et al., 2019). However, it is important to note that while a given signal peptide may be optimal for one specific protein, it may perform poorly if fused to a different protein. As such, there is no universally optimal signal peptide. These signal peptide libraries and other publicly available databases may be beneficial as initial screening tools, and their utility will increase as they include more species or substrates (Goudenège et al., 2010; Low et al., 2013).
Overall, targeted modifications to the signal peptide can alter the protein's export efficiency. This has beneficial implications for the design and production of recombinant proteins in bacteria, as discussed in the following section.
Recombinant proteins such as growth factors and antibodies can be produced in bacteria for use in research, food production, or the pharmaceutical industry (Freudl, 2018). Bacterial systems boast many advantages such as low cost, high yield, short production time, and relatively easy genetic manipulation (Terpe, 2006). Despite drawbacks, such as differing codon biases and a lack of post-translational modification, bacterial systems remain a popular choice for the production of recombinant proteins. Notably, 30% of protein-based pharmaceuticals were produced by E. coli in 2009 (Ferrer-Miralles et al., 2009). To produce a desired recombinant protein in bacteria, a cleavable signal peptide can be fused at the N-terminus to allow for export out of the cytoplasm (Terpe, 2006). The specific sequence can be modified to optimize export efficiency, while still bearing in mind the limitations of the endogenous transport machinery (Freudl, 2018).
While recombinant proteins can be expressed and collected from the cytoplasm directly, there are many advantages to exporting the protein to the periplasm prior to collection via the incorporation of a signal peptide (Choi et al., 2000; Guerrero Montero et al., 2019). For example, as the periplasm is the only oxidizing compartment of the cell, it is the only location wherein disulfide bond formation will occur (Guerrero Montero et al., 2019; Pooley et al., 1996). As a result, any recombinant protein with disulfide bonds must be exported to the periplasm to exhibit proper tertiary structure. Additionally, exporting to the periplasm allows for easier downstream processing, as there are fewer contaminants such as DNA in the periplasm when compared to the cytoplasm (Balasundaram et al., 2009; Guerrero Montero et al., 2019). Furthermore, protein aggregation can be prevented by exporting the recombinant protein out of the cytoplasm (Guerrero Montero et al., 2019). This is significant as cytoplasmic aggregates can be toxic to the bacterial cell, in addition to making downstream processing more difficult. Finally, fewer proteases are present in the periplasm, leading to less turnover of the recombinant protein (Gottesman, 1996; Mergulhão et al., 2005). On the whole, exporting the desired protein to the periplasm via a carefully designed signal peptide is a key step in recombinant protein production.
Currently, many different recombinant proteins are being produced in bacterial systems and exported via the Sec or Tat pathway. For example, human growth hormone (Guerrero Montero et al., 2019), interferon α2b (Alanen et al., 2015), and human antibody fragments (Alanen et al., 2015) can be exported via Tat. Meanwhile, the Sec pathway has been utilized for the export of insulin-like growth factor 1 (Joly et al., 1998), human epidermal growth factor (Wong & Sutherland, 1993), parathyroid hormone (Wong & Sutherland, 1993), and alkaline phosphatase (Choi et al., 2000). In either case, the specific targeting of the recombinant protein to an export pathway is achieved by fusing a signal peptide to the N-terminal region. In the following subsections, the production of two key biopharmaceuticals in E. coli will be discussed in further depth.

Human growth hormone
One specific example of a biopharmaceutical which can be produced and exported by bacteria is human growth hormone (hGH) (Guerrero Montero et al., 2019). hGH can aid in the treatment of hypopituitarism, obesity, and burn/wound healing (Isaksson et al., 1985). Bacterial systems are a good candidate for producing hGH as it has few disulfide bonds and no glycosylation (Guerrero Montero et al., 2019; Ultsch et al., 1994). However, hGH adopts a complex, folded conformation prior to transport out of the cytoplasm. As such, it is unable to be secreted by the Sec pathway, and the Tat pathway must be used instead. Guerrero Montero et al. (2019) demonstrated this by fusing the TorA Tat signal peptide to hGH (TorA-hGH). This fusion protein was expressed in E. coli "TatExpress" strains, which have elevated levels of Tat proteins to circumvent the slower transport typically associated with the Tat pathway. Mature hGH was found to be abundant in the periplasm, with proper disulfide bond formation and correct cleavage of the TorA signal peptide. Additionally, the yield was quite high at 2.39-5.4 g/L of culture, which far exceeds the yield expectations for bacterial recombinant proteins (typically 0.5-0.8 g/L) (Georgiou & Segatori, 2005). This is likely due to the use of the "TatExpress" strain, allowing for increased export of hGH to the periplasm. The increased export capacity also aids in reducing insoluble hGH aggregates in the cytoplasm, which is a prevalent issue in this field (Guerrero Montero et al., 2019; Patra et al., 2000). Overall, bacterial secretion of TorA-hGH via the Tat pathway appears to be a viable option for the mass production of recombinant hGH.

Full-length monoclonal antibodies
Another specific example of exploiting bacterial signal peptides is the production and secretion of human monoclonal antibodies in E. coli (Zhou et al., 2016). Monoclonal antibodies are incredibly valuable to the pharmaceutical industry, as they can be used to treat a wide variety of human diseases such as breast cancer, lung cancer, rheumatoid arthritis, ulcerative colitis, and psoriasis (Spadiut et al., 2014). However, secretion efficiency remains a limitation for monoclonal antibody production in bacteria.
In an attempt to circumvent this drawback, Zhou et al. (2016) tested four Sec-specific signal peptides (STII, DsbA, PhoA, MalE) fused to the N-terminus of the heavy chain (HC) antibody component. Of these, the signal peptide from DsbA was deemed to be the most efficient. Zhou et al. (2016) speculated that this was due to the high hydrophobicity of the h-region when compared to the other signal peptides tested. Accordingly, they attempted to further improve secretion efficiency by increasing the hydrophobicity of the h-region for two signal peptides (STII, DsbA). In doing so, Zhou et al. (2016) found that the proportion of mature, secreted HC could be doubled with only a single amino acid substitution in the h-region, wherein a polar amino acid residue was mutated to a more hydrophobic one. Altogether, this demonstrates that the optimization principles discussed in subsection 2.3 can be applied to recombinant protein production in bacterial systems.
The expanding knowledge of signal peptides and their key features has proven to be incredibly valuable when applied to recombinant protein production in bacterial systems. In the future, natural signal peptides could be modified to improve secretion efficiency, such as increasing the charge of the n-region or increasing the length and hydrophobicity of the h-region (Low et al., 2013). While there have already been examples of this in the literature (as discussed in subsection 2.3), the optimization of signal peptides is an ongoing process due to their protein- and species-specific nature. As such, each signal peptide must be optimized anew when considering a novel protein or substrate. These optimization efforts may be further aided by the expansion of current signal peptide libraries to include a wider variety of bacterial species and proteins, as well as the development of high-throughput signal peptide screening methods (Goudenège et al., 2010). Another future direction may involve designing signal peptides de novo, as demonstrated by Mhiri et al. (2000) in Streptomyces. After comparing the most efficient natural signal peptides, they designed a synthetic signal peptide which then outperformed the natural signal peptide up to six-fold. This remarkable proof of concept opens the door for designing synthetic signal peptides. Finally, much is still unknown about the specific interactions between portions of the signal peptide and the transport machinery at the residue level (Blaudeck et al., 2003; Gelis et al., 2007). As such, improving our understanding of the structural and functional roles of each motif may further advance attempts to optimize natural signal sequences or design them de novo.
The importance of bacterial signal peptides cannot be overstated, given that they appear to be the sole determinant of transport pathway targeting (Bogsch et al., 1997). Through modifying pre-existing signal peptides or designing them de novo, transport efficiency can be optimized (Low et al., 2013). At the same time, identifying conserved regions, such as the twin-arginine motif or the A-X-A signal peptidase site, provides boundaries for regions which should not be perturbed. These optimized signal sequences can then be exploited for the production of desired proteins, such as human hormones or antibodies, in bacterial systems (Freudl, 2018). Overall, the expanding knowledge of signal peptides is a promising step forward when applied to the field of recombinant protein production.
This review paper was originally written for Dr. Tracy Raivio's Genetics 415 class (Current Topics in Bacterial Genetics) at the University of Alberta.