Abbreviations: SP: Signal peptide, SPase: Signal peptidase, ER: endoplasmic reticulum, LHCGR: Human luteinizing hormone receptor, CADA: Cyclotriazadisulfonamide, hCD4: Human CD4, aa: amino acid, PMF: proton motive force, VSV-G: vesicular stomatitis virus G-protein, MBP: maltose-binding protein, ALP: Alkaline phosphatase, Cel-CD: cellulase catalytic domain, RBP: ribose-binding protein, GM-CSF: granulocyte-macrophage colony-stimulating factor, BEVS: baculovirus expression vector system, LCMV: lymphocytic choriomeningitis virus, TorA: TMAO reductase, OPH: Organophosphorus hydrolase, Csn: chitosanase, pGH: porcine growth hormone, IFN2: interferon alpha 2, PTHrP: parathyroid hormone-related protein, TIR: translation initiation region, TOM: translocase of outer membrane, TIM: translocase of inner membrane, MPP: mitochondrial processing peptidase, MIP: mitochondrial intermediate peptidase, SPP: stromal processing peptidase, TPP: thylakoid processing peptidase, SPC: signal peptidase complex, NK: natural killer cell, JEV-E: pseudotyped Japanese encephalitis virus envelope, PSM: phenol-soluble modulin, CaM: calmodulin, ECP: eosinophil cationic protein, TGF-α: transforming growth factor-α, HAS: human serum albumin, DNP: Dynamic nuclear polarization, MAS: magic angle spinning, NMR: nuclear magnetic resonance.
Introduction
A great majority of secretory proteins in all domains of life carry a short peptide at their N-termini, called the signal peptide (SP). SPs act as zip codes that mark both the secretion pathway and the target location of a protein. In addition to protein targeting, a number of critical functions, some of them independent of the passenger protein, have been attributed to SPs. They have come in handy in diverse fields, from recombinant protein production to disease diagnosis and vaccination. Of special importance, they have been shown to be a promising tool in biotherapeutic production. Various computational and experimental studies have been carried out to elucidate SP features, and a number of methods and tools have been devised for exploring them. This review surveys the SP literature from the structural and functional points of view and covers the applications of signal peptides.
Applications and importance of signal peptides
The demand for biotherapeutics and recombinant proteins is increasing steadily. Hence, several prokaryotic and eukaryotic hosts have been widely adopted for cytoplasmic expression of recombinant proteins. However, there are several obstacles in the large-scale production of recombinant proteins, among which inclusion body formation and degradation by proteases are the most important. In addition, endogenous proteins may interfere with the folding of a recombinant secretory protein. These factors, together with the complicated downstream purification process, reduce protein yield (5). Moreover, the yield of a recombinant protein is related not only to the expression level but also to translocation efficiency. Two major components, the secretory machinery and the SP, determine the capability and efficiency of translocation. Translocation efficiency can be increased by using alternative SPs from heterologous species (6-9).
The use of SPs has been shown to increase protein production to commercially significant levels (10-12). SPs are also important beyond the production of recombinant proteins. About 50% of the proteins identified among the extracellular proteins of Bacillus subtilis contain typical SPs (13). More than 90% of the secretory proteins in Escherichia coli are SP-dependent (14). A significant number of eukaryotic secretory proteins, amounting to more than 20% of the mouse and human proteomes, possess SPs as well (15). The considerable proportion of SP-bearing proteins in various taxa underscores the importance of studying them. Additionally, a large number of human diseases are caused by mutations in SPs. A comprehensive literature survey of SP-associated diseases revealed that 26 diseases were attributed to SP mutations and impairment in 21 human proteins (16). For example, a single mutation in the hydrophobic region of the preproparathyroid hormone SP impairs hormone secretion, causing autosomal familial isolated hypoparathyroidism (17). Another example is a mutation in the SP of preproinsulin that is associated with the onset of diabetes (18). A new SP variant of the human luteinizing hormone receptor (LHCGR) affects receptor biogenesis and thereby triggers Leydig cell hypoplasia (19). Moreover, SPs are intriguing drug targets. For instance, cyclotriazadisulfonamide (CADA) down-modulates human CD4 (hCD4) in a highly selective manner by interfering with its SP. In fact, CADA disturbs the completion of SP inversion to a hairpin-looped structure and thus inhibits SP cleavage; consequently, less CD4 polypeptide is expressed on the surface of thymocytes. As a result, cells are protected from HIV and SIV infection, whereas the immune functions attributed to CD4+ T-cells are maintained (20).
The success of gene therapies depends strongly on high transfection efficiency in the target organ, which can be supported by increasing the secretion rate and serum levels of the therapeutic molecule through the selection of optimized SPs (21). SPs can also be exploited as diagnostic biomarkers in a number of diseases. For instance, levels of signal peptide-CUB-epidermal growth factor domain-containing protein 1 herald pulmonary embolism (22).
Besides their fundamental roles in industry and therapeutics, SPs are used in laboratory applications. In this context, a technique called “signal-exon trap” uses SPs to identify secretory or membrane proteins on the genomic scale (23). Interestingly, SPs are crucial for the selective activity of a laboratory selection marker: the sucrose intolerance conferred by sacB, a gene used as a selection marker in E. coli, appeared only when translocation was SP-dependent (24). Taken together, SPs are ubiquitous in diverse fields. In addition to the abovementioned roles, SPs have other functions and applications, including controlling the rate of protein secretion, determining the protein folding state, affecting downstream trans-membrane behavior and N-terminal glycosylation, acting as nuclear localization signals, playing a role in viral and bacterial infectivity, and serving as potential vaccine candidates. Roles and applications of SPs are summarized in Table 1.
Secretory systems in prokaryotes and eukaryotes
Several different secretory pathways have evolved among organisms: 1) to keep up with the secretory needs of rapidly growing organisms such as bacteria and yeasts, 2) to offset the low rate of protein synthesis versus the high rate of secretion, and 3) to be able to secrete proteins with different characteristics. Additionally, the type of secretory pathway, which is determined by the features of the SP, affects protein localization in the cell as well as post- and co-translational modification of the protein (25-28). Therefore, a general view of the different secretory pathways enables us to optimize protein secretion by choosing the most appropriate SPs or by manipulating them with respect to the attributes of the secretory protein. The different types of secretory pathways, grouped by N-terminal signal, together with the related organisms and their specific features, are summarized in Table 2. In this review, only the general secretory pathways are discussed.
Sec pathway
Introduction of the Sec pathway
The Sec pathway, or post-translational translocation, is a pathway in which proteins remain unfolded so that they can be efficiently recognized and translocated by the secretory machinery. In other words, proteins that undergo post-translational modifications are secreted via the SecB pathway. The Sec pathway, which seems to be conserved in all domains of life, is classified as either SecB-dependent or SecA2-dependent. The SecB pathway is considered the prevailing pathway in all classes of bacteria, whereas SecA2 is regarded as specific to Gram-positive bacteria (29). More than 90% of E. coli secretory proteins (14) follow the Sec pathway; however, this pathway is absent or of trivial importance in most Gram-positive bacteria (30). In eukaryotes, small secretory proteins, including yeast proteins and organellar proteins, opt for the Sec pathway (31). Notably, the Sec pathway is preferred in biotechnological processes for three reasons: 1) it has a higher capacity for protein production than other secretory pathways such as Tat, 2) contact between the pre-protein and intracellular proteases is minimal in this pathway, as the pre-proteins are substantially coupled to the secretory components, and 3) SecB-dependent proteins are directed to the periplasm or out of the cell (32, 33).
The Sec pathway components and functionality
The Sec machinery consists of the SecB (or DnaK), SecA, SecY, SecE, SecG, and SecDF proteins (14). Pre-proteins bind to SecB, which maintains them in a transport-competent state. In fact, SecB acts as a chaperone that prevents protein folding before secretion (34); subsequently, the pre-protein is guided to SecA. Once SecA is bound to the pre-protein and ATP, it is inserted into the cell membrane. In the cell membrane, the SecA/pre-protein complex forms a translocation complex with SecY/SecE/SecG (35). SecY and SecE form a channel that conducts the protein out of the cell, and SecG activates translocation. Translocation is driven by the energy released from ATP and by the proton gradient (33). Finally, the mature protein is released by SecD (or the SecDF complex) as the signal peptidase (SPase) cleaves the SP (36) (Figure 1). Cleavage by SPase takes place in most of the aforesaid pathways. SecA and SecY are so critical in the Sec system that advantageous mutations in them can compensate for defects in the SP (14).
SRP pathway
Introduction of the SRP pathway
The SRP pathway, or co-translational translocation, operates while the protein is still being translated, which prevents any form of cytoplasmic modification. This pathway seems to be present in all domains of life (30). Proteins translocated across the endoplasmic reticulum (ER) membrane utilize the SRP pathway, particularly those with more than 100 amino acids (aas) (37). It was hypothesized that the SRP-dependent mechanism dominates in eukaryotes. However, accumulating data suggest that some eukaryotic proteins are translocated via SRP-independent mechanisms (38).
The SRP pathway components and functionality
The SRP pathway consists of the SRP complex, the FtsY receptor in bacteria or the SRP receptor in eukaryotes, and the translocon complex. The SRP complex is a ribonucleoprotein consisting of a 7S RNA and six protein subunits (SRP9, SRP14, SRP19, SRP54, SRP68, and SRP72) in eukaryotes, and of a 4.5S RNA and the Ffh protein in prokaryotes (39). The archaeal SRP pathway is intermediate between the bacterial and eukaryotic SRP systems. Recognition of the SP by SRP begins in the ribosomal tunnel or directly after the nascent polypeptide chain emerges from the ribosome. In this way, protein folding starts at a very early stage (40, 41). The SP-SRP complex is targeted to the same translocon as in the Sec pathway via FtsY or the SRP receptor. In eukaryotes, the ER membrane-spanning translocon, composed of Sec61, Sec62, and Sec63, directs proteins across the ER membrane (42). In addition to the Sec61 complex, some proteins require two additional membrane proteins, called translocating chain associated membrane protein (TRAM) and translocon-associated protein (TRAP), for translocation (43, 44). These accessory components are crucial in mammalian cells, whereas in yeast, Sec61 is sufficient for protein translocation (Figure 1).
Tat pathway
Introduction of Tat pathway
The twin-arginine (twin-Arg) transport system, Tat pathway for short, is a Sec-independent pathway that translocates proteins in a mature state. The Tat pathway takes its name from SPs carrying the twin-Arg motif (45). It is found in bacteria, archaea, plant chloroplasts, and a few plant mitochondria, with a higher prevalence in archaea and Gram-positive bacteria (30, 46). Notably, the low secretion efficiency of heterologous proteins in Bacillus species is attributed to specific properties of the Sec pathway, which is the route used in the production of commercially important recombinant proteins. As a consequence, it has been suggested to take advantage of Sec-independent pathways, including the Tat pathway, in Bacillus species (46). Whereas proteins with slow folding rates are translocated via the Sec pathway, rapidly folding proteins favor the Tat pathway. Additionally, multimeric complexes such as enzyme complexes or periplasmic proteins in E. coli that are bound to cofactors (e.g. flavin and iron-sulfur clusters) are secreted via the Tat pathway (47).
Tat pathway components and functionality
The Tat secretory system generally consists of three membrane proteins called TatA, TatB, and TatC. Nevertheless, the types and numbers of translocase components depend on the species (Gram-positive, Gram-negative, or archaea). TatA and TatC are the minimal prerequisites of the Tat pathway (30). The mature protein-SP complex is recognized by the TatBC complex, which recruits TatA; TatA then translocates the protein either through a conducting channel or by destabilizing the cell membrane (48, 49). In fact, the proton motive force (PMF) is sufficient to initiate protein translocation (50). The Tat pathway is therefore evolutionarily advantageous, because the only energy needed for protein translocation is the PMF (51). Accordingly, the prevalent secretion pathway in Corynebacterium glutamicum is the Tat pathway, although the majority of its SPs have been identified as Sec-type (52). Moreover, oligomerization of the Tat translocon partly depends on the PMF (53). It was revealed that TatB is responsible for recognition of the twin-Arg motif of an SP, while TatC is in close contact with the hydrophobic region of the SP (Figure 1).
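As a minimal sketch of how the twin-Arg motif mentioned above could be screened for in silico, the following Python snippet reports every pair of consecutive arginines (RR) near the N-terminus of a sequence. The function name find_twin_arg, the 40-residue search window, and the example sequence are assumptions made for illustration only; a bare RR pair is a much weaker criterion than the motif recognized by the Tat machinery, so a hit should be read as a candidate, not a prediction.

```python
import re


def find_twin_arg(sequence: str, window: int = 40) -> list[int]:
    """Return 0-based start indices of 'RR' pairs in the first `window` residues.

    `window` is an arbitrary illustrative cutoff, not a value from the review.
    """
    n_terminal = sequence[:window].upper()
    # A zero-width lookahead reports overlapping pairs, e.g. both pairs in 'RRR'.
    return [match.start() for match in re.finditer(r"(?=RR)", n_terminal)]


# Made-up sequence used only to exercise the function; not a real Tat SP.
hypothetical_sp = "MSRRQFLKAA"
print(find_twin_arg(hypothetical_sp))
```

Restricting the search to an N-terminal window keeps arginine pairs in the mature protein from being reported, at the cost of missing motifs in unusually long SPs.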
Signal peptide structure
A typical SP has 25-30 residues (54). Longer SPs (up to ~140 residues) are usually found in eukaryotes; however, they have also been observed in viral proteins and bacterial autotransporters (55, 56). Long eukaryotic SPs are mainly organelle-targeting sequences. Longer SPs mostly remain stable after protein maturation and add extra functions to protein targeting (55). Tat SPs are longer than Sec or SRP SPs, with an average length of 36 aa (57). However, SPs can be as short as 16 aa, as in the SPs of albumin and vesicular stomatitis virus G-protein (VSV-G) (58). The general structure of an SP is composed of three main parts: 1) the N-region, the positively charged domain, 2) the H-region, the hydrophobic core, and 3) the C-region, which contains the cleavage site (54) (Figure 2). Although all SPs share this general structure, they differ remarkably in their aa compositions. High variation in SP composition is responsible for their high capacity for protein translocation. It was assumed that SPs tolerate a wide range of mutations and are capable of secretion in evolutionarily distant heterologous hosts (59, 60). However, this belief was later debated (61). SP variations alter translocation efficiency, cleavage sites, and even post-cleavage events. To address the different positions in the SP structure, positions before the cleavage site are called P1 (position -1), P2 (position -2), and so on, and positions after the cleavage site are called P’1 (position +1), P’2 (position +2), and so on. Several studies have addressed the elucidation of SP composition as well as the optimization of SP structure. In the following sections, the major aspects of the SP regions are discussed and summarized in Table 3.
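The position numbering described above can be sketched in a few lines of Python. This is an illustrative helper, not a published tool: the function name label_positions, its sp_length parameter (the number of residues before the cleavage site), and the example sequence are all assumptions. The code uses a plain apostrophe for the prime in P'1.

```python
def label_positions(preprotein: str, sp_length: int) -> list[tuple[str, str, int]]:
    """Label each residue relative to the cleavage site.

    Residues before the cleavage site are labeled P1, P2, ... counting
    backwards from the site (positions -1, -2, ...). Residues after it are
    labeled P'1, P'2, ... (positions +1, +2, ...).
    """
    if not 0 < sp_length < len(preprotein):
        raise ValueError("sp_length must fall inside the sequence")
    labels = []
    for index, residue in enumerate(preprotein):
        if index < sp_length:
            offset = sp_length - index      # 1 for the last SP residue
            labels.append((residue, f"P{offset}", -offset))
        else:
            offset = index - sp_length + 1  # 1 for the first mature residue
            labels.append((residue, f"P'{offset}", offset))
    return labels


# Made-up pre-protein with an assumed 17-residue SP; not a real sequence.
example = "MKKLLAALLLLAASAQAADE"
for residue, label, position in label_positions(example, sp_length=17)[15:19]:
    print(residue, label, position)
```

There is no position 0 in this scheme: the numbering jumps from -1 (P1) directly to +1 (P'1), which is why the two branches compute their offsets separately.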