HK1114405B - Collagen producing plants and methods of generating and using same - Google Patents

Info

Publication number
HK1114405B
Authority
HK
Hong Kong
Application number
HK08104257.2A
Other languages
Chinese (zh)
Other versions
HK1114405A1 (en)
Inventor
Oded Shoseyov
Hanan Stein
Original Assignee
Collplant Ltd.
Application filed by Collplant Ltd.
Priority claimed from PCT/IL2005/001045 (WO2006035442A2)
Publication of HK1114405A1
Publication of HK1114405B

Description

Collagen-producing plants and methods of making and using same
Field and background of the invention
The present invention relates to collagen-producing plants and methods of making and using the same. More specifically, the present invention relates to a novel method for generating plants capable of producing high levels of hydroxylated collagen chains that can form native triple-helical type I collagen fibers.
Collagen is the major structural protein responsible for the structural integrity of vertebrates and many other multicellular organisms. Type I collagen represents the prototypical fibrillar collagen and is the predominant collagen type in most tissues.
Type I collagen is the predominant collagen component in bone and tendons, and is found in large quantities in the skin, aorta and lungs. Type I collagen fibers provide great tensile strength and limited extensibility. The most abundant molecular form of type I collagen is a heterotrimer of two different alpha chains, two α1(I) chains and one α2(I) chain, i.e., [α1(I)]₂α2(I) (Inkinen, 2003). All fibrillar collagen molecules comprise three polypeptide chains built up from repeated Gly-X-Y triplets, where X and Y can be any amino acid but are usually the imino acids proline and hydroxyproline.
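A Python sketch can make the Gly-X-Y repeat concrete. It splits a helical-domain sequence (one-letter amino acid code) into consecutive triplets, checks that each triplet begins with glycine, and reports how often proline occupies the X and Y positions. It assumes the input begins exactly at a triplet boundary and is the unmodified translated sequence, so hydroxyproline does not appear; the short test sequence is invented for illustration and is not taken from the sequences described in this document.

```python
def gly_x_y_triplets(helix: str) -> list[str]:
    """Split a helical-domain sequence into consecutive Gly-X-Y triplets.

    Assumes the sequence starts at a triplet boundary; a trailing partial
    triplet is rejected rather than silently dropped.
    """
    if not helix or len(helix) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    return [helix[i:i + 3] for i in range(0, len(helix), 3)]


def repeat_summary(helix: str) -> dict[str, float]:
    """Summarize glycine periodicity and proline occupancy of X and Y."""
    triplets = gly_x_y_triplets(helix.upper())
    n = len(triplets)
    return {
        "triplets": n,
        # 1.0 for a perfect Gly-X-Y repeat.
        "gly_fraction": sum(t[0] == "G" for t in triplets) / n,
        "pro_at_x": sum(t[1] == "P" for t in triplets) / n,
        "pro_at_y": sum(t[2] == "P" for t in triplets) / n,
    }


# Invented example of four triplets: GPP GAP GEP GPA
print(repeat_summary("GPPGAPGEPGPA"))
# {'triplets': 4, 'gly_fraction': 1.0, 'pro_at_x': 0.5, 'pro_at_y': 0.75}
```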
Fibrillar collagen is synthesized as a precursor, procollagen, which carries globular N- and C-terminal extension propeptides. The biosynthesis of procollagen is a complex process involving a number of different post-translational modifications, including proline and lysine hydroxylation, N-linked and O-linked glycosylation, and the formation of intra- and inter-chain disulfide bonds. The enzymes that perform these modifications act in a concerted manner to ensure the folding and assembly of correctly aligned, thermostable triple-helical molecules.
Each procollagen molecule is assembled within the rough endoplasmic reticulum from three polypeptide chains. Proline and lysine residues within the Gly-X-Y repeat region are hydroxylated as the polypeptide chains are co-translationally translocated across the endoplasmic reticulum membrane. Once a chain has been fully translocated into the endoplasmic reticulum, its C-propeptide folds. The three pro-α chains then associate via their C-propeptides to form a trimeric molecule, creating a nucleation site at the C-terminal end of the Gly-X-Y repeat region that ensures proper alignment of the chains. The Gly-X-Y region then folds in the C-to-N direction to form a triple helix.
The temporal relationship between polypeptide chain modification and triple helix formation is critical: hydroxylation of proline residues is essential for the stability of the triple helix at body temperature, yet once formed, the triple helix no longer serves as a substrate for the hydroxylating enzymes. The C-propeptide (and to a lesser extent the N-propeptide) keeps procollagen soluble during its passage through the cell (Bulleid et al, 2000). After or during secretion of procollagen molecules into the extracellular matrix, the propeptides are removed by the procollagen N- and C-proteases, triggering spontaneous self-assembly of the collagen molecules into fibrils (Hulmes, 2002). Removal of the propeptides by the procollagen N- and C-proteases reduces the solubility of procollagen more than 10,000-fold and is necessary and sufficient to initiate the self-assembly of collagen into fibrils. Critical to this assembly process are short non-triple-helical peptides, called telopeptides, at the ends of the triple-helical domain, which ensure correct registration of the collagen molecules within the fibrillar structure and lower the critical concentration for self-assembly (Bulleid et al, 2000). In nature, the stability of the triple-helical structure of collagen requires hydroxylation of proline by prolyl-4-hydroxylase (P4H) to form hydroxyproline residues within the collagen chain.
Plants expressing collagen chains are known in the art; see, e.g., U.S. Pat. No. 6,617,431; Merle et al, 2002; Ruggiero et al, 2000. Although plants are capable of synthesizing hydroxyproline-containing proteins, the prolyl hydroxylase responsible for hydroxyproline synthesis in plant cells has relatively loose substrate sequence specificity compared with mammalian P4H. Producing collagen that contains hydroxyproline only at the Y position of the Gly-X-Y triplet therefore requires co-expression of collagen and a P4H gene in plants (Olsen et al, 2003).
Attempts to produce human collagen relying on the hydroxylation machinery naturally present in plants have yielded collagen that is poorly hydroxylated on proline (Merle et al, 2002). Such collagen melts, i.e., loses its triple-helical structure, at temperatures below 30 °C. Co-expression of collagen and prolyl hydroxylase yields stable hydroxylated collagen that is biologically relevant for use at body temperature (Merle et al, 2002).
Lysyl hydroxylases (LH, EC 1.14.11.4), galactosyltransferases (EC 2.4.1.50) and glucosyltransferases (EC 2.4.1.66) are enzymes involved in the post-translational modification of collagen. They sequentially modify lysyl residues at specific positions to hydroxylysyl, galactosyl hydroxylysyl and glucosylgalactosyl hydroxylysyl residues. These structures are characteristic of collagen and are essential for its functional activity (Wang et al, 2002). A single human enzyme, lysyl hydroxylase 3 (LH3), can catalyze all three consecutive steps in the formation of hydroxylysine-linked carbohydrates (Wang et al, 2002).
The hydroxylysine content of human collagen expressed in tobacco is only about 2% of that found in bovine collagen (0.04% versus 1.88% of residues). This suggests that endogenous plant lysyl hydroxylase cannot adequately hydroxylate lysine residues in collagen.
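The relative hydroxylysine level follows directly from the two residue percentages quoted above. The short Python calculation below shows that the tobacco-expressed material contains roughly 2% of the bovine level.

```python
# Hydroxylysine content in % of total residues, as quoted in the text.
hyl_tobacco = 0.04  # human collagen expressed in tobacco
hyl_bovine = 1.88   # bovine collagen

relative_percent = hyl_tobacco / hyl_bovine * 100
print(f"{relative_percent:.1f}% of the bovine level")  # 2.1% of the bovine level
```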
While reducing the present invention to practice, the present inventors have uncovered that efficient hydroxylation of collagen chains relies on sequestering the collagen chains together with enzymes capable of correctly modifying such polypeptides.
Summary of the Invention
According to one aspect of the present invention, there is provided a method of producing collagen in a plant or isolated plant cell, comprising expressing at least one type of collagen alpha chain and exogenous P4H in a plant or isolated plant cell in a manner that enables accumulation of the at least one type of collagen alpha chain and exogenous P4H in a subcellular compartment devoid of endogenous P4H activity, thereby producing collagen in the plant.
According to a further aspect of the present invention, there is provided
According to still further features in preferred embodiments of the invention described below, the method further comprises expressing exogenous LH3 in a subcellular compartment devoid of endogenous P4H activity.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain comprises a signal peptide for targeting to an apoplast or a vacuole.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain lacks an ER targeting sequence or a retention sequence.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain is expressed in a DNA-containing organelle of the plant.
According to still further features in the described preferred embodiments the exogenous P4H includes a signal peptide for targeting to an apoplast or a vacuole.
According to still further features in the described preferred embodiments the exogenous P4H lacks an ER targeting or retention sequence.
According to still further features in the described preferred embodiments the exogenous P4H is expressed in a DNA-containing organelle of the plant.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain is an alpha 1 chain.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain is an alpha 2 chain.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain includes a C-terminal and/or N-terminal propeptide.
According to still further features in the described preferred embodiments the plant is selected from the group consisting of tobacco, corn, alfalfa, rice, potato, soybean, tomato, wheat, barley, canola and cotton.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain or exogenous P4H is expressed only in a part of the plant.
According to still further features in the described preferred embodiments the part of the plant is a leaf, a seed, a root, a tuber, or a stem.
According to still further features in the described preferred embodiments the exogenous P4H is capable of specifically hydroxylating the Y position of the Gly-X-Y triplet of at least one type of collagen alpha chain.
According to still further features in the described preferred embodiments the exogenous P4H is human P4H.
According to still further features in the described preferred embodiments the plant is subjected to a stress condition.
According to still further features in the described preferred embodiments the stress condition is selected from drought, salinity, injury, cold and injection of stress inducing compounds.
According to another aspect of the present invention, there is provided a genetically modified plant or isolated plant cell capable of accumulating collagen alpha chain with a hydroxylation pattern identical to that produced when collagen alpha chain is expressed in human cells.
According to a further aspect of the present invention there is provided a genetically modified plant or isolated plant cell capable of accumulating collagen alpha chains in a subcellular compartment devoid of endogenous P4H activity.
According to still further features in the described preferred embodiments the genetically modified plant further comprises exogenous P4H.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain comprises a signal peptide for targeting to an apoplast or a vacuole.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain lacks an ER targeting sequence or a retention sequence.
According to still further features in the described preferred embodiments at least one type of collagen alpha chain is expressed in a DNA-containing organelle of the plant.
According to still further features in the described preferred embodiments the exogenous P4H includes a signal peptide for targeting to an apoplast or a vacuole.
According to still further features in the described preferred embodiments the exogenous P4H lacks an ER targeting or retention sequence.
According to still further features in the described preferred embodiments the exogenous P4H is expressed in a DNA-containing organelle of the plant.
According to still further features in the described preferred embodiments the collagen alpha chain is an alpha 1 chain.
According to still further features in the described preferred embodiments the collagen alpha chain is an alpha 2 chain.
According to still further features in the described preferred embodiments the collagen alpha chain includes a C-terminal and/or N-terminal propeptide.
According to a further aspect of the present invention, there is provided a plant system comprising a first genetically modified plant capable of accumulating a collagen alpha 1 chain, and a second genetically modified plant capable of accumulating a collagen alpha 2 chain.
According to a further aspect of the present invention, there is provided a plant system comprising a first genetically modified plant capable of accumulating collagen alpha 1 chain and collagen alpha 2 chain, and a second genetically modified plant capable of accumulating P4H.
According to still further features in the described preferred embodiments at least one of the first genetically modified plant and the second genetically modified plant further comprises exogenous P4H.
According to still another aspect of the present invention, there is provided a method for producing fibrillar collagen, comprising: (a) expressing a collagen alpha 1 chain in a first plant; (b) expressing a collagen alpha 2 chain in a second plant, wherein expression in the first plant and the second plant is configured such that each of the collagen alpha 1 chain and the collagen alpha 2 chain is capable of accumulating in a subcellular compartment devoid of endogenous P4H activity; and (c) crossing the first plant with the second plant and selecting progeny expressing the collagen alpha 1 chain and the collagen alpha 2 chain, thereby producing fibrillar collagen.
According to still further features in the described preferred embodiments the method further comprises expressing an exogenous P4H in each of the first plant and the second plant.
According to still further features in the described preferred embodiments each of the collagen alpha 1 chain and the collagen alpha 2 chain includes a signal peptide for targeting to an apoplast or a vacuole.
According to still further features in the described preferred embodiments each of the collagen alpha 1 chain and the collagen alpha 2 chain lacks an ER targeting sequence or a retention sequence.
According to still further features in the described preferred embodiments steps (a) and (b) are effected via expression in a DNA-containing organelle of the plant.
According to still further features in the described preferred embodiments the exogenous P4H includes a signal peptide for targeting to an apoplast or a vacuole.
According to still further features in the described preferred embodiments the exogenous P4H lacks an ER targeting or retention sequence.
According to still further features in the described preferred embodiments the exogenous P4H is expressed in a DNA-containing organelle of the plant.
According to still further features in the described preferred embodiments each of the collagen alpha 1 chain and the collagen alpha 2 chain includes a C-terminal and/or N-terminal propeptide.
According to still further features in the described preferred embodiments the exogenous P4H is capable of specifically hydroxylating the Y position of the Gly-X-Y triplet of at least one type of collagen alpha chain.
According to still further features in the described preferred embodiments the exogenous P4H is human P4H.
According to still further features in the described preferred embodiments the first plant and the second plant are subjected to a stress condition.
According to still further features in the described preferred embodiments the stress condition is selected from drought, salinity, injury, heavy metal toxicity and cold stress.
According to still another aspect of the present invention, there is provided a method for producing fibrillar collagen, comprising: (a) expressing in a first plant a collagen alpha 1 chain and a collagen alpha 2 chain, wherein the expression in the first plant is configured such that each of the collagen alpha 1 chain and the collagen alpha 2 chain is capable of accumulating in a subcellular compartment devoid of endogenous P4H activity; (b) expressing in a second plant an exogenous P4H capable of accumulating in a subcellular compartment devoid of endogenous P4H activity; and (c) crossing the first plant with the second plant and selecting progeny expressing the collagen alpha 1 chain, the collagen alpha 2 chain and P4H, thereby producing fibrillar collagen.
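Both crossing methods above depend on progeny inheriting transgenes from both parents. The text does not specify zygosity or insertion number, so the Python sketch below is only an illustration under stated assumptions: each transgene is a single insertion at its own unlinked locus, and the labels `alpha1`, `alpha2` and `P4H` are hypothetical. It enumerates gametes and returns the expected fraction of F1 progeny carrying every required transgene.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype: dict[str, int]) -> list[frozenset]:
    """Enumerate equally likely gametes of a plant.

    genotype maps a transgene label to its copy number at a single locus:
    0 (absent), 1 (hemizygous) or 2 (homozygous). Loci are assumed to be
    unlinked, so they assort independently.
    """
    # Each locus yields two equally likely alleles: the transgene or nothing.
    alleles = [[name if i < copies else None for i in range(2)]
               for name, copies in genotype.items()]
    return [frozenset(a for a in combo if a) for combo in product(*alleles)]


def fraction_with(parent_a: dict[str, int], parent_b: dict[str, int],
                  required: set[str]) -> Fraction:
    """Expected fraction of F1 progeny carrying every transgene in required."""
    ga, gb = gametes(parent_a), gametes(parent_b)
    hits = sum(required <= (x | y) for x in ga for y in gb)
    return Fraction(hits, len(ga) * len(gb))


# First method: alpha1 plant x alpha2 plant, each hemizygous for one insertion.
print(fraction_with({"alpha1": 1}, {"alpha2": 1}, {"alpha1", "alpha2"}))  # 1/4
# Second method: alpha1 + alpha2 plant x P4H plant, all hemizygous.
print(fraction_with({"alpha1": 1, "alpha2": 1}, {"P4H": 1},
                    {"alpha1", "alpha2", "P4H"}))  # 1/8
```

With homozygous parents (copy number 2), both calls return 1, meaning every F1 plant carries all required transgenes under these assumptions.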
According to a further aspect of the present invention there is provided a nucleic acid construct comprising a polynucleotide encoding human P4H placed under the transcriptional control of a promoter functional in plant cells.
According to still further features in the described preferred embodiments the promoter is selected from the group consisting of a CaMV35S promoter, a ubiquitin promoter, a rbcS promoter, and a SVBV promoter.
According to a further aspect of the present invention there is provided a genetically modified plant or isolated plant cell capable of expressing collagen alpha 1 chain, collagen alpha 2 chain, P4H, LH3 and protease C and/or protease N.
According to still further features in the described preferred embodiments the collagen alpha 1 chain and the collagen alpha 2 chain are each capable of accumulating in a subcellular compartment devoid of endogenous plant P4H activity.
According to a further aspect of the present invention there is provided a genetically modified plant or isolated plant cell capable of accumulating collagen having temperature stability characteristics which are the same as those of mammalian collagen.
According to still further features in the described preferred embodiments the collagen is type I collagen.
According to still further features in the described preferred embodiments the mammalian collagen is human collagen.
According to yet another aspect of the present invention, there is provided a collagen coding sequence optimized for expression in a plant.
According to still further features in the described preferred embodiments the collagen-encoding sequence is as set forth in SEQ ID NO: 1.
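The text does not describe how the coding sequence was optimized. One common, simple approach is "one amino acid, one codon" back-translation, in which every residue is encoded by the codon most frequently used in the host. The Python sketch below implements only that approach and checks the result against the standard genetic code. The codon-usage values in the usage line are placeholders, not actual tobacco data, and practical optimization tools also weigh GC content, mRNA secondary structure and repeats.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TTT, TTC, TTA, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def back_translate(protein: str, usage: dict[str, float]) -> str:
    """Encode each residue with its most frequently used synonymous codon.

    usage maps codons to host usage frequencies; codons absent from usage
    count as 0, and ties keep the first codon in table order.
    """
    best: dict[str, str] = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    protein = protein.upper()
    dna = "".join(best[aa] for aa in protein)
    if translate(dna) != protein:  # sanity check: the encoded protein is unchanged
        raise RuntimeError("back-translation changed the protein sequence")
    return dna


# Placeholder usage values for illustration only, NOT measured tobacco frequencies.
usage = {"GGT": 25.0, "GGA": 24.0, "CCA": 20.0, "CCT": 18.0}
print(back_translate("GPP", usage))  # GGTCCACCA
```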
The present invention successfully addresses the shortcomings of the presently known configurations by providing correctly hydroxylated collagen chains that can assemble into collagen with properties similar to those of human collagen.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. In case of conflict, the present patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Brief Description of Drawings
The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
In the drawings:
FIGS. 1a-d illustrate the construction of various expression cassettes and vectors for transforming test plants. All coding sequences synthesized as part of this study were optimized for expression in tobacco.
FIG. 2 illustrates various co-transformation methods. Each expression cassette is represented by a short name of the coding sequence. The coding sequences are detailed in table 1. Each co-transformation was performed with 2 pBINPLUS binary vectors. Each rectangle represents a single pBINPLUS vector carrying 1, 2 or 3 expression cassettes. Promoters and terminators are described in detail in example 1.
FIG. 3 is a multiplex PCR screen of transformants showing plants positive for either collagen alpha 1 chain (324bp fragment) or collagen alpha 2 chain (537bp fragment) or both.
FIG. 4 is a Western blot analysis of transgenic plants produced by co-transformations #2, #3 and #4.
Total soluble proteins were extracted from tobacco co-transformants #2, #3 and #4 and detected with anti-collagen I antibody (#AB745 from Chemicon Inc.). The size marker is #SM0671 from Fermentas Inc. W.t. is wild-type tobacco. Positive collagen bands are visible in plants that are PCR positive for type I collagen alpha 1, alpha 2 or both. A positive control band of 500 ng of type I collagen from human placenta (#CC050 from Chemicon Inc., extracted from human placenta by pepsin digestion) corresponds to about 0.3% of the total soluble protein (about 150 μg) in the transgenic plant sample. The larger band of about 140 kDa in the human collagen sample is procollagen retaining its C-propeptide, as detected by an antibody against the carboxy-terminal propeptide of type I collagen (#MAB1913 from Chemicon Inc.). The smaller band of about 120 kDa in the human collagen sample is collagen without the propeptide. Because of their unusual amino acid composition, proline-rich proteins (including collagen) consistently migrate on polyacrylamide gels as bands of higher apparent molecular weight than expected. Thus, a collagen chain of about 95 kDa lacking the propeptide migrates as a band of about 120 kDa.
FIG. 5 is a Western blot analysis of transgenic plants generated from co-transformation #8, carrying an apoplast signal translationally fused to the collagen chain. Total soluble protein was extracted from transgenic tobacco leaves and detected with anti-collagen I antibody (#AB745 from Chemicon Inc.). Positive collagen alpha 2 bands were visible in plant 8-141. Type I collagen from human placenta (#CC050 from Chemicon Inc.) served as a control.
FIGS. 6a-b illustrate collagen triple-helix assembly and thermostability, as demonstrated by heat treatment followed by trypsin or pepsin digestion. In FIG. 6a, total soluble proteins from tobacco lines 2-9 (expressing only collagen α1, without P4H) and 3-5 (expressing collagen α1 and α2 and the human P4H α and β subunits) were heat treated (38 °C or 43 °C for 15 minutes), digested with trypsin (room temperature, 20 minutes) and detected in a Western blot procedure with anti-collagen I antibody. The positive control was 500 ng of human collagen I plus total soluble protein of wild-type tobacco. In FIG. 6b, total soluble protein was extracted from transgenic tobacco line 13-6 (expressing the collagen I α1 and α2 chains, indicated by arrows, the human P4H α and β subunits and human LH3), heat treated (33 °C, 38 °C or 42 °C for 20 minutes), immediately cooled on ice to prevent reassembly of the triple helix, and incubated with pepsin at room temperature (about 22 °C) for 30 minutes, followed by detection with anti-collagen I antibody (#AB745 from Chemicon Inc.) in a standard Western blot procedure. The positive control was a 50 ng human collagen I sample (#CC050 from Chemicon Inc., extracted from human placenta by pepsin digestion) added to total soluble protein extracted from wild-type tobacco.
FIG. 7 illustrates Northern blot analysis performed on wild-type tobacco. The blot was probed with tobacco P4H cDNA.
FIG. 8 is a Western blot analysis of transgenic plants produced by co-transformations #2, #3 and #13. Total soluble protein was extracted from the tobacco co-transformants and detected with anti-human P4H α and β and anti-collagen I antibodies.
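Reading the multiplex PCR screen of FIG. 3 amounts to matching observed band sizes against the two diagnostic amplicons (324 bp for the collagen alpha 1 chain and 537 bp for the alpha 2 chain). The Python sketch below assumes band sizes have already been estimated from the gel; the ±5% sizing tolerance and the example band sizes are hypothetical, not values from the source.

```python
# Diagnostic amplicon sizes (bp) given for the multiplex PCR screen.
AMPLICONS = {"collagen alpha 1": 324, "collagen alpha 2": 537}


def call_transgenes(band_sizes_bp: list[float], tolerance: float = 0.05) -> set[str]:
    """Return the transgenes whose diagnostic amplicon matches an observed band.

    tolerance is a hypothetical relative sizing error allowed for gel estimates.
    """
    detected = set()
    for name, expected in AMPLICONS.items():
        if any(abs(size - expected) <= tolerance * expected for size in band_sizes_bp):
            detected.add(name)
    return detected


# Illustrative band sizes only.
print(sorted(call_transgenes([330, 540])))  # ['collagen alpha 1', 'collagen alpha 2']
print(sorted(call_transgenes([530])))       # ['collagen alpha 2']
print(sorted(call_transgenes([])))          # []
```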
Description of the preferred embodiments
The present invention relates to plants that express and accumulate collagen, which can be used to produce collagen and collagen fibers that exhibit the characteristics of mammalian collagen.
The principles and operation of the present invention may be better understood with reference to the drawings and the accompanying description.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details set forth in the following description or exemplified by the examples. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
Plants that produce collagen are known in the art. Although such plants can be used to produce collagen chains as well as collagen, such chains are incorrectly hydroxylated, and their self-assembly, whether in planta or not, therefore results in collagen that is inherently unstable.
While reducing the present invention to practice, the present inventors devised a plant expression approach that ensures correct hydroxylation of collagen chains and thus enables the production in plants of collagen that closely mimics the characteristics of human type I collagen (e.g., temperature stability).
Thus, according to one aspect of the present invention, there is provided a genetically modified plant capable of expressing at least one type of collagen alpha chain and accumulating it in a subcellular compartment devoid of endogenous P4H activity.
As used herein, the phrase "genetically modified plant" refers to any lower (e.g., moss) or higher (vascular) plant or tissue or isolated cell thereof (e.g., cell suspension) that is stably or transiently transformed with an exogenous polynucleotide sequence. Examples of plants include tobacco, corn, alfalfa, rice, potato, soybean, tomato, wheat, barley, canola, cotton, carrot, and lower plants such as moss.
As used herein, the phrase "collagen chain" refers to a collagen subunit, e.g., the α1 or α2 chain of a collagen fiber, preferably a type I fiber. As used herein, the term "collagen" refers to an assembled collagen trimer, which in the case of type I collagen includes two α1 chains and one α2 chain. Collagen fibrils are collagen molecules that lack the terminal C- and N-propeptides.
As used herein, the phrase "subcellular compartment lacking endogenous P4H activity" refers to any compartmentalized region of a cell that does not include plant P4H or enzymes having plant-like P4H activity. Examples of such subcellular compartments include the vacuole, apoplast and cytoplasm, and organelles such as chloroplasts, mitochondria, and the like.
Any type of collagen chain can be expressed by the genetically modified plants of the invention. Examples include fibrillar collagens (types I, II, III, V and XI), network-forming collagens (types IV, VIII and X), collagens associated with fibril surfaces (types IX, XII and XIV), collagens occurring as transmembrane proteins (types XIII and XVII), and collagens forming 11 nm periodic beaded filaments (type VI). For further description, see Hulmes, 2002.
Preferably, the expressed collagen chains are the α1 and/or α2 chains of type I collagen. The expressed collagen alpha chain may be encoded by any polynucleotide sequence derived from any mammal. Preferably, the sequences encoding the collagen alpha chains are human and are as set forth in SEQ ID NOs: 1 and 4.
Generally, the alpha collagen chains expressed in plants may or may not include their terminal propeptides (i.e., propeptide C and propeptide N).
Ruggiero et al (2000) indicate that plant proteolytic activity processes procollagen differently from the normal processing in humans: propeptide C is removed by plant proteolytic activity, although the cleavage site is unknown. Cleavage of propeptide C may occur on the procollagen polypeptide before trimer assembly (association of the three C-propeptides is necessary to initiate trimer assembly).
Cleavage of the N-propeptide by plant proteolytic activity occurs in mature plants but not in plantlets. Such cleavage removes 2 of the 17 amino acids of the N-terminal peptide.
The C-propeptide (and to a lesser extent the N-propeptide) keeps procollagen soluble during its passage through animal cells (Bulleid et al, 2000) and is expected to have a similar effect in plant cells. After or during secretion of procollagen molecules into the extracellular matrix, the propeptides are removed by the procollagen N- and C-proteases, triggering spontaneous self-assembly of the collagen molecules into fibrils (Hulmes, 2002). Removal of the propeptides by the procollagen N- and C-proteases reduces the solubility of procollagen more than 10,000-fold and is necessary and sufficient to initiate the self-assembly of collagen into fibrils. Critical to this assembly process are short non-triple-helical peptides, called telopeptides, at the ends of the triple-helical domain, which ensure correct registration of the collagen molecules within the fibrillar structure and lower the critical concentration for self-assembly (Bulleid et al, 2000). The prior art describes cleavage of the propeptides by pepsin during collagen production (Bulleid et al, 2000). However, pepsin destroys the telopeptides, and pepsin-extracted collagen is therefore unable to form an ordered fibrillar structure (Bulleid et al, 2000).
Protein disulfide isomerase (PDI), which forms the β subunit of human P4H, was shown to bind to the C-propeptide prior to trimer assembly and therefore also acts as a molecular chaperone during chain assembly (Ruggiero et al, 2000). The use of human procollagen I N-protease and C-protease expressed in different plants may produce collagen that more closely resembles native human collagen and can form an ordered fibrillar structure.
In cases where the N-propeptide, the C-propeptide or both are included in the expressed collagen chain, the genetically modified plants of the invention may also express the respective protease(s) (i.e., C, N or both). Polynucleotide sequences encoding such proteases are set forth in SEQ ID NO: 18 (protease C) and SEQ ID NO: 20 (protease N). Such proteases may be expressed so that they accumulate in the same subcellular compartment as the collagen chains.
Accumulation of expressed collagen chains in subcellular compartments lacking endogenous P4H activity can be achieved via any of several methods.
For example, the expressed collagen chains may include a signal sequence for targeting the expressed protein to a subcellular compartment such as the apoplast or an organelle (e.g., a chloroplast). Examples of suitable signal sequences include the chloroplast transit peptide (amino acids 1-57 of Swiss-Prot accession P07689) and the mitochondrial transit peptide (amino acids 1-28 of Swiss-Prot accession P46643). Additional examples of suitable signal sequences, and guidelines for using such signal sequences to express collagen chains in plant cells, are provided in the examples section below.
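At the protein-sequence level, translationally fusing a transit peptide to a collagen chain is a concatenation; the main pitfall is converting the 1-based, inclusive residue ranges quoted above into slice indices. The Python sketch below is illustrative only: the function name `fuse_transit_peptide` is hypothetical, and because the precursor sequences are not reproduced in the text, the usage example uses placeholder strings. Whether the cargo's own initiating methionine or any linker residues are retained is a design decision the text does not address, so the sketch appends the cargo unchanged.

```python
def fuse_transit_peptide(precursor: str, tp_last_residue: int, cargo: str) -> str:
    """Prepend a transit peptide, taken from a precursor protein, to a cargo chain.

    precursor: full precursor sequence in one-letter code.
    tp_last_residue: last residue of the transit peptide, 1-based and inclusive
        (57 for P07689 or 28 for P46643, per the text).
    cargo: the collagen chain to be targeted.
    """
    if not 0 < tp_last_residue <= len(precursor):
        raise ValueError("transit peptide range lies outside the precursor")
    # 1-based residues 1..tp_last_residue correspond to Python slice [0:tp_last_residue].
    return precursor[:tp_last_residue] + cargo


# Placeholder sequences only; these are NOT the real P07689 or collagen sequences.
precursor = "M" + "A" * 56 + "KLLL"  # 61 residues
fusion = fuse_transit_peptide(precursor, 57, "GPPGPPGPP")
print(len(fusion))  # 66
```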
Alternatively, the sequence of the collagen chains may be modified so as to alter their subcellular localization when expressed in plants.
As mentioned above, the plant ER contains P4H, which does not correctly hydroxylate collagen chains. The collagen alpha chain naturally includes an ER targeting sequence that directs the expressed collagen into the ER, where it is post-translationally modified (including incorrect hydroxylation). Thus, removal of the ER targeting sequence will result in cytoplasmic accumulation of collagen chains lacking post-translational modifications (including any hydroxylation).
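Removing the ER targeting sequence amounts to deleting the N-terminal signal peptide while keeping a start codon. A minimal Python sketch at the protein level, assuming the signal peptide is the first `signal_len` residues; that length is a caller-supplied assumption because the text gives no value, and the sequence in the usage line is a placeholder.

```python
def remove_er_signal(protein: str, signal_len: int) -> str:
    """Return a chain lacking its N-terminal ER signal peptide.

    signal_len is the length of the signal peptide, taken by the caller from
    the annotation of the chain being expressed (not a value from the text).
    A methionine is re-added because the native initiator Met is part of the
    removed signal peptide.
    """
    if not 0 < signal_len < len(protein):
        raise ValueError("signal peptide length out of range")
    mature = protein[signal_len:]
    return mature if mature.startswith("M") else "M" + mature


# Placeholder sequence: a 6-residue "signal" followed by a short Gly-X-Y stretch.
print(remove_er_signal("MSSSSSGPPGPP", 6))  # MGPPGPP
```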
Example 1 of the examples section below describes the production of collagen sequences lacking ER targeting sequences.
Still alternatively, the collagen chains may be expressed and accumulated in DNA-containing organelles such as chloroplasts or mitochondria. Further description of chloroplast expression is provided below.
As mentioned above, hydroxylation of the alpha chains is necessary for stable assembly of type I collagen. Because the alpha chains expressed by the genetically modified plants of the invention accumulate in compartments lacking endogenous P4H activity, such chains must be isolated from the plants, plant tissues or cells and hydroxylated in vitro. Such hydroxylation can be achieved by the methods described by Turpeenniemi-Hujanen and Myllylä (Concomitant hydroxylation of proline and lysine residues in collagen using purified enzymes in vitro. Biochim Biophys Acta. 1984 Jul 16; 800(1): 59-65).
Although such in vitro hydroxylation can produce correctly hydroxylated collagen chains, its implementation is difficult and costly.
To overcome this limitation of in vitro hydroxylation, the genetically modified plants of the invention preferably also co-express P4H, which enables correct hydroxylation of the collagen alpha chains (i.e., hydroxylation only of proline residues at the Y position of the Gly-X-Y triplet). P4H is an enzyme composed of two subunits, α and β. Both are required to form an active enzyme, and the β subunit also has a chaperone function.
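The Y-position rule can be illustrated with a short sketch. The function below scans a triple-helical sequence in one-letter code, triplet by triplet, and reports the prolines that sit at the Y position of a Gly-X-Y repeat, i.e. the residues a correctly acting P4H is expected to hydroxylate. It assumes that the input starts, in frame, on the Gly of the first triplet; the function name and the toy sequence are hypothetical, and real P4H substrate recognition depends on more than this simple pattern.

```python
def y_position_prolines(helical_domain: str) -> list[int]:
    """Return 0-based indices of prolines at the Y position of Gly-X-Y triplets.

    Assumes the sequence is in one-letter code and starts, in frame, on the
    Gly of the first triplet. Incomplete trailing triplets are ignored.
    """
    seq = helical_domain.upper()
    sites = []
    # Step through complete triplets only.
    for start in range(0, len(seq) - 2, 3):
        gly, _x, y = seq[start:start + 3]
        # Count a proline only if the triplet begins with Gly (a Gly-X-Y repeat)
        # and the proline occupies the third (Y) position.
        if gly == "G" and y == "P":
            sites.append(start + 2)
    return sites


# Toy example: GPP and GAP carry a Y-position proline, GEK does not,
# and the X-position proline in GPP is not reported.
print(y_position_prolines("GPPGAPGEK"))  # [2, 5]
```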
The P4H expressed by the genetically modified plants of the invention is preferably human P4H, encoded by nucleotide sequences such as SEQ ID NOs: 12 and 14. In addition, P4H mutants or P4H homologues exhibiting enhanced substrate specificity may also be used.
An example of a suitable P4H homologue is the Arabidopsis oxidoreductase identified by NCBI accession number NP_179363. A pairwise comparison of this protein with the human P4H α subunit, performed by the present inventors, showed that its functional domains share the highest homology of any known plant P4H homologue.
Because P4H needs to co-accumulate with the expressed collagen chains, its coding sequence is preferably modified accordingly (e.g., by adding a signal sequence, or by deleting sequences that would otherwise direct it to the ER).
In mammalian cells, collagen is also modified by lysyl hydroxylase, galactosyltransferase and glucosyltransferase. These enzymes sequentially convert lysyl residues at specific positions to hydroxylysyl, galactosyl hydroxylysyl and glucosyl galactosyl hydroxylysyl residues. A single human enzyme, lysyl hydroxylase 3 (LH3), can catalyze all three consecutive steps in the formation of hydroxylysine-linked carbohydrates.
Thus, the genetically modified plants of the invention preferably also express mammalian LH3. A sequence encoding LH3, such as SEQ ID NO: 22, may be used for this purpose.
The collagen chains and modifying enzymes described above may be expressed from stably integrated or transiently expressed nucleic acid constructs comprising polynucleotide sequences that encode the alpha chains and/or modifying enzymes (e.g., P4H and LH3) under the transcriptional control of plant-functional promoters. Such nucleic acid constructs (also referred to herein as expression constructs) can be configured for expression throughout the whole plant, in defined plant tissues or defined plant cells, or at defined developmental stages of the plant. Such constructs may also include selectable markers (e.g., antibiotic resistance), enhancer elements and a bacterial origin of replication.
It will be appreciated that constructs comprising two expressible inserts (e.g., two types of alpha chain, or an alpha chain and P4H) preferably include a separate promoter for each insert. Alternatively, such constructs may express a single chimeric transcript comprising both inserts from a single promoter; in this case, the chimeric transcript includes an IRES sequence between the two inserts so that the downstream insert can be translated.
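The single-promoter layout can be checked mechanically. The sketch below models a transcript as an ordered list of hypothetical `Element` records and flags any downstream coding sequence that has no IRES between it and the preceding coding sequence. It rests on a simplified assumption (the first CDS is translated by cap-dependent scanning, every later CDS needs an IRES); the class, labels and the example element names are illustrative only, not the constructs used in the examples.

```python
from dataclasses import dataclass


@dataclass
class Element:
    kind: str  # hypothetical labels: "promoter", "cds", "ires" or "terminator"
    name: str


def untranslatable_cds(transcript: list[Element]) -> list[str]:
    """Names of coding sequences with no translation start route in one transcript.

    Simplified model: the first CDS is translated by cap-dependent scanning,
    and every later CDS needs an IRES between it and the preceding CDS.
    """
    problems = []
    seen_cds = False
    ires_since_last_cds = False
    for element in transcript:
        if element.kind == "ires":
            ires_since_last_cds = True
        elif element.kind == "cds":
            if seen_cds and not ires_since_last_cds:
                problems.append(element.name)
            seen_cds = True
            ires_since_last_cds = False
    return problems


bicistronic = [
    Element("promoter", "35S"),
    Element("cds", "Col1"),
    Element("ires", "IRES"),
    Element("cds", "P4H beta"),
    Element("terminator", "NOS"),
]
print(untranslatable_cds(bicistronic))  # []
print(untranslatable_cds([e for e in bicistronic if e.kind != "ires"]))  # ['P4H beta']
```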
The constructs of the invention may employ a wide variety of plant-functional promoters and enhancers, which may be tissue-specific, developmental stage-specific, constitutive or inducible; examples are provided below.
As used herein and in the claims section that follows, the phrase "plant promoter" or "promoter" includes promoters that can direct gene expression in plant cells, including DNA-containing organelles. Such promoters may be derived from plant, bacterial, viral, fungal or animal sources. Such promoters may be constitutive, i.e., capable of directing high levels of gene expression in a plurality of plant tissues; tissue-specific, i.e., capable of directing gene expression in one or more specific plant tissues; inducible, i.e., capable of directing gene expression in response to a stimulus; or chimeric, i.e., formed from portions of at least two different promoters.
Thus, the plant promoter used may be a constitutive promoter, a tissue-specific promoter, an inducible promoter or a chimeric promoter.
Examples of constitutive plant promoters include, but are not limited to, the CaMV 35S and CaMV 19S promoters, the FMV 34S promoter, the sugarcane bacilliform badnavirus promoter, the CsVMV promoter, the Arabidopsis ACT2/ACT8 actin promoter, the Arabidopsis ubiquitin UBQ1 promoter, the barley leaf thionin BTH6 promoter and the rice actin promoter.
Examples of tissue-specific promoters include, but are not limited to, the phaseolin storage protein promoter, the DLEC promoter, the PHS promoter, the zein storage protein promoter, the lupin conglutin gamma promoter from soybean, the AT2S1 gene promoter, the ACT11 actin promoter from Arabidopsis thaliana, the napA promoter from Brassica napus and the potato patatin gene promoter.
Inducible promoters are promoters induced by specific stimuli, such as stress conditions (including, for example, light, temperature, chemicals, drought, high salinity and osmotic shock), oxidative conditions or pathogen attack. They include, but are not limited to, the light-inducible promoter derived from the pea rbcS gene and the promoter from the alfalfa rbcS gene; the DRE, MYC and MYB promoters, which are activated by drought; the INT, INPS, prxEa, Ha hsp17.7G4 and RD21 promoters, which are activated by high salinity and osmotic stress; and the hsr203J and str246C promoters, which are activated by pathogen stress.
Preferably, the promoter used in the present invention is a strong constitutive promoter, such that overexpression of the construct insert is achieved after plant transformation.
It is understood that any of the construct types used in the present invention can be co-transformed into the same plant, using the same or a different selectable marker for each construct type. Alternatively, a first construct type can be introduced into a first plant and a second construct type into a second, isogenic plant, after which the resulting transgenic plants are crossed and double-transformant progeny are selected. Further selfing of such progeny can be used to generate lines homozygous for both constructs.
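The expected yield of the crossing and selfing steps follows from Mendelian segregation, as in the sketch below. It assumes that each construct sits at a single, unlinked locus, that each primary transformant is hemizygous, and that there is no selection bias; the function and constant names are hypothetical. Under these assumptions about 1/4 of the F1 progeny carry both constructs, and about 1/16 of the progeny from selfing a double transformant are homozygous for both.

```python
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1: tuple[int, int], parent2: tuple[int, int]) -> dict[int, Fraction]:
    """Probability of 0, 1 or 2 copies of one transgene locus among progeny of a cross.

    Each parent is given as its two alleles at the locus: 1 = insert, 0 = no insert.
    The four gamete combinations are equally likely (Mendelian segregation).
    """
    dist: dict[int, Fraction] = {}
    for a, b in product(parent1, parent2):
        dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 4)
    return dist


def p_carrier(dist: dict[int, Fraction]) -> Fraction:
    return sum((p for copies, p in dist.items() if copies >= 1), Fraction(0))


def p_homozygous(dist: dict[int, Fraction]) -> Fraction:
    return dist.get(2, Fraction(0))


HEMIZYGOUS, NULL = (1, 0), (0, 0)

# Cross: plant 1 carries construct A only, plant 2 carries construct B only.
p_double = p_carrier(offspring_distribution(HEMIZYGOUS, NULL)) * p_carrier(
    offspring_distribution(NULL, HEMIZYGOUS)
)
# Selfing a double transformant hemizygous for A and B
# (loci assumed unlinked, so the two probabilities multiply).
p_homozygous_both = p_homozygous(offspring_distribution(HEMIZYGOUS, HEMIZYGOUS)) ** 2

print(p_double, p_homozygous_both)  # 1/4 1/16
```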
Various methods exist for introducing nucleic acid constructs into both monocotyledonous and dicotyledonous plants (Potrykus, I., Annu. Rev. Plant Physiol. Plant Mol. Biol. (1991) 42: 205-225; Shimamoto et al., Nature (1989) 338: 274-276). Such methods rely either on stable integration of the nucleic acid construct, or portions thereof, into the plant genome, or on transient expression of the nucleic acid construct, in which case these sequences are not inherited by the plant's progeny.
In addition, several methods exist by which the nucleic acid construct can be introduced directly into the DNA of DNA-containing organelles such as chloroplasts.
There are 2 basic methods to achieve stable genomic integration of exogenous sequences, such as those included within the nucleic acid constructs of the invention, into the plant genome:
(i) Agrobacterium-mediated gene transfer: Klee et al. (1987) Annu. Rev. Plant Physiol. 38: 467-; Klee and Rogers, Cell Culture and Somatic Cell Genetics of Plants, Vol. 6, Molecular Biology of Plant Nuclear Genes, eds. Schell, J., and Vasil, L.K., Academic Publishers, San Diego, Calif. (1989) pp. 2-25; Gatenby, Plant Biotechnology, eds. Kung, S. and Arntzen, C.J., Butterworth Publishers, Boston, Mass. (1989) pp. 93-112.
(ii) Direct DNA uptake: Paszkowski et al., Cell Culture and Somatic Cell Genetics of Plants, Vol. 6, Molecular Biology of Plant Nuclear Genes, eds. Schell, J., and Vasil, L.K., Academic Publishers, San Diego, Calif. (1989) pp. 52-68; including methods for direct uptake of DNA into protoplasts, Toriyama, K. et al. (1988) Bio/Technology 6: 1072-1074; DNA uptake induced by brief electric shock of plant cells, Zhang et al., Plant Cell Rep. (1988) 7: 379-384; Fromm et al., Nature (1986) 319: 791-793; DNA injection into plant cells or tissues by particle bombardment, Klein et al., Bio/Technology (1988) 6: 559-563; McCabe et al., Bio/Technology (1988) 6: 923-; Sanford, Physiol. Plant. (1990) 79: 206-209; by use of micropipette systems, Neuhaus et al., Theor. Appl. Genet. (1987) 75: 30-36; Neuhaus and Spangenberg, Physiol. Plant. (1990) 79: 213-217; or by direct incubation of DNA with germinating pollen, DeWet et al., Experimental Manipulation of Ovule Tissue, eds. Chapman, G.P., Mantell, S.H. and Daniels, W., Longman, London (1985) pp. 197-; and Ohta, Proc. Natl. Acad. Sci. USA (1986) 83: 715-719.
The Agrobacterium system involves the use of plasmid vectors that contain defined DNA segments that integrate into the plant genomic DNA. Methods of inoculating the plant tissue vary depending on the plant species and the Agrobacterium delivery system. A widely used approach is the leaf disc procedure, which can be performed with any tissue explant that provides a good source for initiating whole-plant differentiation (Horsch et al., Plant Molecular Biology Manual A5, Kluwer Academic Publishers, Dordrecht (1988) pp. 1-9). A supplementary approach employs the Agrobacterium delivery system in combination with vacuum infiltration. The Agrobacterium system is especially viable for the production of transgenic dicotyledonous plants.
There are various methods for directly transferring DNA into plant cells. In electroporation, protoplasts are briefly exposed to a strong electric field. In microinjection, DNA is mechanically injected directly into cells using a tiny micropipette. In microparticle bombardment, DNA is adsorbed on microparticles such as magnesium sulfate crystals, tungsten particles, or gold particles, and the microparticles are physically accelerated into cells or plant tissue.
Following transformation, the plants are propagated. The most common method of plant propagation is by seed. Regeneration by seed propagation, however, has the drawback that the crop lacks uniformity because of heterozygosity, since seeds are produced by plants according to the genetic variance governed by Mendelian rules. Basically, each seed is genetically different and each will grow with its own specific traits. It is therefore preferable to produce the transformed plant such that the regenerated plant has the same traits and characteristics as the parent transgenic plant. Accordingly, the transformed plant is preferably regenerated by micropropagation, which provides rapid, consistent reproduction of the transformed plants.
Transient expression methods that can be used to transiently express the isolated nucleic acids included in the nucleic acid constructs of the invention include, but are not limited to, microinjection and bombardment as described above (but under conditions that favor transient expression), and virus-mediated expression, in which a packaged or unpackaged recombinant viral vector containing the nucleic acid construct is used to infect plant tissues or cells, so that the recombinant virus propagating therein expresses the non-viral nucleic acid sequence.
Viruses that have been shown to be useful for the transformation of plant hosts include CaMV, TMV and BV. Plant transformation using plant viruses is described in U.S. Pat. No. 4,855,237 (BGV), EP-A 67,553 (TMV), Japanese Published Application No. 63-14693 (TMV), EPA 194,809 (BV) and EPA 278,667 (BV); and in Gluzman, Y. et al., Communications in Molecular Biology: Viral Vectors, Cold Spring Harbor Laboratory, New York, pp. 172-189 (1988). Pseudovirus particles for expressing foreign DNA in many hosts, including plants, are described in WO 87/06261.
The construction of plant RNA viruses for introducing and expressing non-viral foreign nucleic acid sequences in plants is demonstrated by the above references and by Dawson, W.O. et al., Virology (1989) 172: 285-292; Takamatsu et al., EMBO J. (1987) 6: 307-311; French et al., Science (1986) 231: 1294-1297; and Takamatsu et al., FEBS Letters (1990) 269: 73-76.
When the virus is a DNA virus, suitable modifications can be made to the virus itself. Alternatively, the virus can first be cloned into a bacterial plasmid for ease of constructing the desired viral vector carrying the foreign DNA. The virus can then be excised from the plasmid. If the virus is a DNA virus, a bacterial origin of replication can be attached to the viral DNA, which is then replicated by the bacteria. Transcription and translation of this DNA produce the coat protein, which encapsidates the viral DNA. If the virus is an RNA virus, it is generally cloned as a cDNA and inserted into a plasmid. The plasmid is then used to make all of the constructs. The RNA virus is then produced by transcribing the viral sequence of the plasmid and translating the viral genes to produce the coat protein(s) that encapsidate the viral RNA.
The construction of plant RNA viruses for introducing and expressing in plants non-viral exogenous nucleic acid sequences, such as those included in the constructs of the present invention, is demonstrated by the references cited above as well as by U.S. Pat. No. 5,316,931.
In one embodiment, a plant viral nucleic acid is provided in which the native coat protein coding sequence has been deleted from the viral nucleic acid, and a non-native plant viral coat protein coding sequence and a non-native promoter, preferably the subgenomic promoter of the non-native coat protein coding sequence, have been inserted; the inserted sequences are capable of expression in the plant host, of packaging the recombinant plant viral nucleic acid, and of ensuring systemic infection of the host by the recombinant plant viral nucleic acid. Alternatively, the coat protein gene may be inactivated by inserting a non-native nucleic acid sequence within it, such that a protein is produced. The recombinant plant viral nucleic acid may contain one or more additional non-native subgenomic promoters. Each non-native subgenomic promoter is capable of transcribing or expressing adjacent genes or nucleic acid sequences in the plant host and is incapable of recombining with the other subgenomic promoters or with the native subgenomic promoters. Non-native (foreign) nucleic acid sequences may be inserted adjacent to the native plant viral subgenomic promoter or, if more than one nucleic acid sequence is included, adjacent to both the native and the non-native plant viral subgenomic promoters. The non-native nucleic acid sequences are transcribed or expressed in the host plant under the control of the subgenomic promoters to produce the desired products.
In a second embodiment, a recombinant plant viral nucleic acid is provided as in the first embodiment, except that the native coat protein coding sequence is placed adjacent to one of the non-native coat protein subgenomic promoters instead of a non-native coat protein coding sequence.
In a third embodiment, a recombinant plant viral nucleic acid is provided in which the gene for the native coat protein is adjacent to its subgenomic promoter, and one or more non-native subgenomic promoters have been inserted into the viral nucleic acid. The inserted non-native subgenomic promoters are capable of transcribing or expressing adjacent genes in the plant host and are incapable of recombining with each other and with native subgenomic promoters. The inserted non-native nucleic acid sequence may be adjacent to a non-native subgenomic plant viral promoter such that the sequence is transcribed or expressed in the host plant under the control of the subgenomic promoter to produce the desired product.
In a fourth embodiment, a recombinant plant viral nucleic acid is provided as in the third embodiment except that the native coat protein coding sequence is replaced with a non-native coat protein coding sequence.
The viral vectors are encapsidated by the coat proteins encoded by the recombinant plant viral nucleic acid to produce a recombinant plant virus. The recombinant plant viral nucleic acid or recombinant plant virus is used to infect appropriate host plants. The recombinant plant viral nucleic acid is capable of replicating in the host, spreading systemically in the host, and transcribing or expressing foreign genes (isolated nucleic acids) in the host to produce the desired protein.
Techniques for introducing exogenous nucleic acid sequences into the chloroplast genome are known. These techniques involve the following procedures. First, plant cells are chemically treated to reduce the number of chloroplasts per cell to about one. Then, the exogenous nucleic acid is introduced into the cells by particle bombardment, with the aim of introducing at least one exogenous nucleic acid molecule into the chloroplasts. The exogenous nucleic acid is selected such that it integrates into the chloroplast genome by homologous recombination, which is readily effected by enzymes inherent to the chloroplast. To this end, the exogenous nucleic acid includes, in addition to the gene of interest, at least one nucleic acid stretch derived from the chloroplast genome. The exogenous nucleic acid also includes a selectable marker, which serves, through sequential selection procedures, to ensure that all or substantially all of the copies of the chloroplast genome following such selection include the exogenous nucleic acid. Further details relating to this technique are found in U.S. Pat. Nos. 4,945,050 and 5,693,507, which are incorporated herein by reference. A polypeptide can thus be produced by the protein expression system of the chloroplast and become integrated into the chloroplast inner membrane.
The above transformation methods can be used to produce collagen chains and/or modifying enzymes, as well as assembled collagen (with or without propeptides), in plants of any species, or in plant tissues or isolated plant cells derived therefrom.
Preferred plants are those capable of accumulating large amounts of the collagen chains, collagen and/or processing enzymes described herein. Such plants may also be selected for their resistance to stress conditions and for the ease with which the expressed components or assembled collagen can be extracted. Examples of preferred plants include tobacco, corn, alfalfa, rice, potato, soybean, tomato, wheat, barley, canola and cotton.
Collagen fibers are widely used in the food and cosmetics industries. Although plant-expressed collagen fiber components (alpha chains) and modifying enzymes are useful for the industrial synthesis of collagen, producing intact collagen in plants is preferred because it is simpler and more cost-effective.
Several approaches can be used to produce type I collagen in plants. For example, collagen α1 chains can be isolated from plants expressing collagen α1 and P4H (and optionally LH3) and mixed with collagen α2 chains isolated from plants expressing collagen α2 and P4H (and optionally LH3 and protease C and/or N). Because collagen α1 chains spontaneously assemble into homotrimeric triple helices, such homotrimers may need to be denatured before mixing and then renatured together with the collagen α2 chains.
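Because the predominant type I collagen molecule is [α1(I)]2α2(I), the mixing step implies a 2:1 molar ratio of α1 to α2 chains. The helper below is a hypothetical sketch that converts this ratio into masses; the molecular masses are user-supplied inputs, and the values in the example are placeholders for illustration, not figures from this document.

```python
def alpha2_mass_for_alpha1(alpha1_mg: float, alpha1_kda: float, alpha2_kda: float) -> float:
    """Mass of alpha2(I) chain (mg) giving a 2:1 alpha1:alpha2 molar ratio.

    Uses 1 kDa = 1 mg/umol, so mass (mg) / molecular mass (kDa) = amount (umol).
    Molecular masses are user-supplied inputs, not values from this document.
    """
    if min(alpha1_mg, alpha1_kda, alpha2_kda) <= 0:
        raise ValueError("masses and molecular masses must be positive")
    alpha1_umol = alpha1_mg / alpha1_kda
    alpha2_umol = alpha1_umol / 2  # [alpha1(I)]2 alpha2(I): one alpha2 per two alpha1
    return alpha2_umol * alpha2_kda


# Placeholder inputs: 10 mg alpha1 chain; molecular masses chosen for illustration only.
print(round(alpha2_mass_for_alpha1(10.0, 100.0, 90.0), 3))  # 4.5
```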
Preferably, a first plant expressing collagen alpha 1 and P4H (and optionally LH3 and proteases C and/or N) may be crossed with a second (and preferably isogenic) plant expressing collagen alpha 2, or alternatively, a first plant expressing 2 alpha chains may be crossed with a second plant expressing P4H and optionally LH3 and proteases C and/or N.
It should be noted that although the above-described plant breeding method uses 2 individually transformed plants, a method using 3 or more individually transformed plants each expressing 1 or 2 components may also be employed.
Various plant breeding techniques are known to those of ordinary skill in the art and therefore further description of such techniques is not provided herein.
Although plant breeding methods are preferred, it should be noted that a single plant expressing collagen alpha 1 and 2, P4H and LH3 (and optionally protease C and/or N) can be produced via several transformation events, each of which is designed to introduce yet another expressible component into the cell. In this case, the stability of each transformation event can be verified using specific selection markers.
In any event, transformation and plant breeding methods can be used to generate any plant expressing any number of components. Presently preferred are plants expressing collagen alpha 1 and 2 chains, P4H, LH3 and at least one protease (e.g., protease C and/or N). As will be further described in the examples section below, such plants accumulate collagen that exhibits stability at temperatures up to 42 ℃.
Progeny resulting from breeding, or alternatively, multiply transformed plants, may be selected by confirming the presence of exogenous mRNA and/or polypeptide using nucleic acid or protein probes (e.g., antibodies). The latter method is preferred because it enables localization of the expressed polypeptide components (by, for example, probing fractionated plant extracts) and thus also demonstrates potential for correct processing and assembly. Examples of suitable probes are provided in the examples section below.
Once progeny expressing collagen are identified, such plants are further grown under conditions that maximize expression of the collagen chains and modifying enzymes.
Because accumulation of free proline may promote overproduction of proline-rich proteins, including the collagen chains expressed by the genetically modified plants of the invention, preferred growth conditions are those that increase free proline accumulation in the cultivated plants.
Free proline accumulates in a variety of plants in response to a wide range of environmental stresses, including water deprivation, salinization, low and high temperatures, pathogen infection, heavy metal toxicity, anaerobiosis, nutrient deficiency, atmospheric pollution and UV irradiation (Hare and Cress, 1997).
Free proline accumulation may also be induced by treating plants or soil with compounds such as abscisic acid (ABA) or with stress-inducing compounds such as copper salts, paraquat and salicylic acid, among others.
Thus, collagen-expressing progeny can be grown under different stress conditions (e.g., NaCl concentrations ranging from 50 mM to 250 mM). To further enhance collagen production, the effect of various stress conditions on collagen expression will be examined and optimized with respect to plant viability, biomass and collagen accumulation.
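Preparing a salt-stress series is simple arithmetic: grams of NaCl = concentration (mol/l) × 58.44 g/mol × volume (l). The sketch below tabulates the amounts for 1 l of solution in hypothetical 50 mM steps across the 50-250 mM range; how the salt is applied (irrigation water, growth medium) is not specified in the text, and the step size is an assumption.

```python
NACL_G_PER_MOL = 58.44  # molar mass of sodium chloride


def nacl_grams(conc_mm: float, volume_l: float) -> float:
    """Grams of NaCl needed for `volume_l` litres of solution at `conc_mm` millimolar."""
    return conc_mm / 1000 * NACL_G_PER_MOL * volume_l


# Hypothetical 50 mM steps across the 50-250 mM range.
for conc in range(50, 251, 50):
    print(f"{conc} mM: {nacl_grams(conc, 1.0):.2f} g NaCl per litre")
```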
Plant tissues or cells are preferably harvested at maturity, and collagen fibers are isolated using well-known extraction methods, one of which is described below.
Leaves of transgenic plants were ground to a powder under liquid nitrogen, and the homogenate was extracted in 0.5 M acetic acid containing 0.2 M NaCl at 4 ℃ for 60 hours. Insoluble material was removed by centrifugation. The supernatant containing the recombinant collagen was salt-fractionated at 0.4 M and 0.7 M NaCl. The 0.7 M NaCl precipitate, containing recombinant heterotrimeric collagen, was dissolved in and dialyzed against 0.1 M acetic acid, and stored at -20 ℃ (according to Ruggiero et al., 2000).
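The salt fractionation steps (0.2 M to 0.4 M, then to 0.7 M NaCl) require knowing how much concentrated salt to add. The sketch below solves the mass balance for the volume of stock to add, assuming volumes are additive; the 5 M stock, the 100 ml starting volume and the function name are hypothetical, and preparing the stock in the same acetic acid solution (so the acid is not diluted) is likewise an assumption.

```python
def stock_volume_to_add(volume_ml: float, current_m: float, target_m: float, stock_m: float) -> float:
    """Volume of concentrated NaCl stock (ml) that raises a solution to `target_m`.

    Mass balance, assuming volumes are additive:
        volume*current + v*stock = (volume + v)*target
        =>  v = volume*(target - current) / (stock - target)
    """
    if not current_m < target_m < stock_m:
        raise ValueError("require current < target < stock concentration")
    return volume_ml * (target_m - current_m) / (stock_m - target_m)


# Hypothetical: 100 ml of clarified extract in 0.2 M NaCl and a 5 M NaCl stock.
volume = 100.0
step1 = stock_volume_to_add(volume, 0.2, 0.4, 5.0)  # first cut at 0.4 M
volume += step1  # treat the supernatant after the 0.4 M cut as the same volume (approximation)
step2 = stock_volume_to_add(volume, 0.4, 0.7, 5.0)  # second cut at 0.7 M
print(f"add {step1:.2f} ml, then {step2:.2f} ml of stock")
```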
Other objects, advantages and novel features of the present invention will become apparent to one of ordinary skill in the art upon examination of the following examples, which are not intended to be limiting. In addition, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.
Examples
Reference is now made to the following examples, which together with the above description, illustrate the invention in a non-limiting manner.
Generally, the nomenclature used herein and the laboratory procedures utilized in the present invention include molecular, biochemical, microbiological and recombinant DNA techniques. Such techniques are thoroughly explained in the literature. See, e.g., "Molecular Cloning: A Laboratory Manual" Sambrook et al. (1989); "Current Protocols in Molecular Biology" Vols. I-III, Ausubel, F.M., ed. (1994); Ausubel et al., "Current Protocols in Molecular Biology", John Wiley and Sons, Baltimore, Maryland (1989); Perbal, "A Practical Guide to Molecular Cloning", John Wiley & Sons, New York (1988); Watson et al., "Recombinant DNA", Scientific American Books, New York; Birren et al. (eds), "Genome Analysis: A Laboratory Manual Series", Vols. 1-4, Cold Spring Harbor Laboratory Press, New York (1998); U.S. Pat. Nos. 4,666,828; 4,683,202; 4,801,531; 5,192,659 and 5,272,057; "Cell Biology: A Laboratory Handbook", Vols. I-III, Celis, J.E., ed. (1994); "Current Protocols in Immunology" Vols. I-III, Coligan, J.E., ed. (1994); Stites et al. (eds), "Basic and Clinical Immunology" (8th Edition), Appleton & Lange, Norwalk, CT (1994); Mishell and Shiigi (eds), "Selected Methods in Cellular Immunology", W.H. Freeman and Co., New York (1980). Useful immunoassays are extensively described in the patent and scientific literature; see, e.g., U.S. Pat. Nos. 3,791,932; 3,839,153; 3,850,752; 3,850,578; 3,853,987; 3,867,517; 3,879,262; 3,901,654; 3,935,074; 3,984,533; 3,996,345; 4,034,074; 4,098,876; 4,879,219; 5,011,771 and 5,281,521; "Oligonucleotide Synthesis" Gait, M.J., ed. (1984); "Nucleic Acid Hybridization" Hames, B.D., and Higgins, S.J., eds. (1985); "Transcription and Translation" Hames, B.D., and Higgins, S.J., eds. (1984); "Animal Cell Culture" Freshney, R.I., ed. (1986); "Immobilized Cells and Enzymes" IRL Press (1986); "A Practical Guide to Molecular Cloning" Perbal, B. (1984); "Methods in Enzymology" Vols. 1-317, Academic Press; "PCR Protocols: A Guide to Methods and Applications", Academic Press, San Diego, CA (1990); Marshak et al., "Strategies for Protein Purification and Characterization: A Laboratory Course Manual", CSHL Press (1996); all of which are incorporated herein by reference as if fully set forth herein. Other general references are provided throughout this document. The procedures therein are believed to be well known in the art and are provided for the convenience of the reader. All the information contained therein is incorporated herein by reference.
Example 1
Constructs and transformation protocols
The construction of the expression cassettes and vectors used in this work is illustrated in FIGS. 1a-d. All coding sequences in this work were optimized for expression in tobacco and chemically synthesized together with the desired flanking regions (SEQ ID NOs: 1, 4, 7, 12, 14, 16, 18, 20, 22).
FIG. 1a: Synthetic genes encoding Col1 and Col2, fused to a vacuolar or apoplast signal (encoded by SEQ ID NO: 7) or without a signal (SEQ ID NOs: 1, 4), were cloned into an expression cassette consisting of the chrysanthemum rbcS1 promoter and 5' UTR (SEQ ID NO: 10) and the chrysanthemum rbcS1 3' UTR and terminator (SEQ ID NO: 11). The complete expression cassette was cloned into the multiple cloning site of the pBINPLUS plant transformation vector (van Engelen et al., 1995, Transgenic Res 4: 288-290).
FIG. 1b: Synthetic genes encoding P4H β human, P4H α human and P4H plant (SEQ ID NOs: 12, 14 and 16), fused to the vacuolar or apoplast signal (encoded by SEQ ID NO: 7) or without a signal, were cloned into an expression cassette consisting of the CaMV 35S promoter and TMV ω sequence carried by vector pJD330, and the Agrobacterium nopaline synthase (NOS) terminator (Galili et al., 1987, Nucleic Acids Res 15: 3257-3273). The complete expression cassette was cloned into the multiple cloning site of the pBINPLUS vector carrying either the Col1 or the Col2 expression cassette.
FIG. 1c: Synthetic genes encoding protease C and protease N (SEQ ID NOs: 18, 20), fused to the vacuolar or apoplast signal (encoded by SEQ ID NO: 7), were cloned into an expression cassette consisting of the chrysanthemum rbcS1 promoter and 5' UTR (SEQ ID NO: 10) and the chrysanthemum rbcS1 3' UTR and terminator (SEQ ID NO: 11). The complete expression cassette was cloned into the multiple cloning site of the pBINPLUS plant transformation vector.
FIG. 1d: The synthetic gene encoding LH3 (SEQ ID NO: 22), fused to the vacuolar or apoplast signal (encoded by SEQ ID NO: 7) or without a signal, placed between the Strawberry Vein Banding Virus (SVBV) promoter (NCBI accession No. AF331666, REGION: 623..950, version AF331666.1, GI: 13345788) and the Agrobacterium octopine synthase (OCS) terminator (NCBI accession No. Z37515, REGION: 1344..1538, version Z37515.1, GI: 886843), was cloned into the multiple cloning site of the pBINPLUS vector carrying the Col1 and P4H β expression cassettes.
FIG. 2 illustrates the co-transformation scheme used to introduce the expression cassettes described in FIG. 1 into host plants. Each cassette insert is represented by the short name of its coding sequence. The coding sequences and related SEQ ID NOs are described in Table 1. Each co-transformation was performed with two pBINPLUS binary vectors. Each rectangle represents a single pBINPLUS vector carrying 1, 2 or 3 expression cassettes. The promoters and terminators are detailed in FIG. 1.
Example 2
Plant collagen expression
The synthetic polynucleotide sequences encoding proteins listed in table 1 below were designed and optimized for expression in tobacco plants.
TABLE 1: List of expressed proteins
Protein | SwissProt accession | Amino acids | Splice isoform | Elements omitted | Short name | Protein SEQ ID NO | Encoded by SEQ ID NO
Collagen α1(I) chain [precursor] | P02452 | 1442 | One form | ER signal | Col1 | 3 | 1
Collagen α2(I) chain [precursor] | P08123, with 2 changes: D549A and N249I | 1342 | One form | ER signal | Col2 | 6 | 4
Prolyl 4-hydroxylase β subunit | P07237 | 487 | One form | ER signal, KDEL | P4H β human | 13 | 12
Prolyl 4-hydroxylase α1 subunit | P13674 | 517 | P13674-1 | ER signal | P4H α human | 15 | 14
Prolyl 4-hydroxylase, plant | Not in SwissProt; NCBI gi: 15227885 | 252 | One form | Predicted mitochondrial signal (amino acids 1-39) | P4H plant | 17 | 16
Procollagen C-proteinase | P13497 | 866 | P13497-1 (BMP1-3) | ER signal, propeptide | Protease C | 19 | 18
Procollagen I N-protease | O95450 | 958 | O95450-1 (LpNPI) | ER signal, propeptide | Protease N | 21 | 20
Lysyl hydroxylase 3 | O60568 | 714 | One form | ER signal | LH3 | 23 | 22
Signal peptide
(i) Vacuolar signal sequence of the barley thiol protease aleurain precursor gene (NCBI accession No. P05167, GI: 113603):
MAHARVLLLALAVLATAAVAVASSSSFADSNPIRPVTDRAASTLA (SEQ ID NO: 24).
(ii) Apoplast signal of the Arabidopsis thaliana endo-1,4-β-glucanase (Cel1, NCBI accession No. CAA67156.1, GI: 2440033): SEQ ID NO: 9, encoded by SEQ ID NO: 7.
Construction of plasmids
Plant expression vectors were constructed as taught in example 1, and the composition of each constructed expression vector was confirmed by restriction enzyme analysis and sequencing.
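Restriction analysis of this kind compares the observed band sizes with those predicted from the vector sequence. The sketch below predicts fragment lengths for one enzyme, using EcoRI (G^AATTC) as an example; the text does not name the enzymes used, so the enzyme choice, the toy sequence and the function name are hypothetical. It handles linear or circular sequences but, as noted in the comments, it ignores sites that span the origin of a circular sequence.

```python
def digest_fragments(seq: str, site: str, cut_offset: int, circular: bool = False) -> list[int]:
    """Predicted fragment lengths (bp) after cutting at every occurrence of `site`.

    `cut_offset` is the cut position within the recognition site on the top strand
    (e.g. 1 for EcoRI, G^AATTC). Only the top strand is searched, which suffices
    for palindromic sites; sites spanning the origin of a circular sequence are missed.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    if not cuts:
        return [len(seq)]
    if circular:
        inner = [b - a for a, b in zip(cuts, cuts[1:])]
        return inner + [len(seq) - cuts[-1] + cuts[0]]  # piece spanning the origin
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "AAAGAATTCAAAAGAATTCAA"  # hypothetical 21 bp sequence with two EcoRI sites
print(digest_fragments(toy, "GAATTC", 1))  # [4, 10, 7]
print(digest_fragments(toy, "GAATTC", 1, circular=True))  # [10, 11]
```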
Expression vectors comprising the following expression cassettes were constructed:
1. Collagen alpha 1
2. Collagen alpha 1+ human P4H beta subunit
3. Collagen alpha 1+ human P4H beta subunit + human LH3
4. Collagen alpha 2
5. Collagen alpha 2+ human P4H alpha subunit
6. Collagen alpha 2+ Arabidopsis thaliana P4H
7. Human P4H beta subunit + human LH3
8. Human P4H alpha subunit
Each of the above coding sequences was translationally fused to a vacuolar or an apoplast transit peptide or, where cytoplasmic accumulation was intended, lacked any transit peptide sequence.
Plant transformation and PCR screening
Tobacco plants (Nicotiana tabacum, Samsun NN) were transformed with the above expression vectors according to the co-transformation scheme illustrated in FIG. 2.
The resulting transgenic plants were screened by multiplex PCR using 4 primers designed to amplify a 324 bp fragment of collagen α1 and a 537 bp fragment of collagen α2 (Table 2). FIG. 3 shows the results of the multiplex PCR screen.
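One way to picture how the multiplex readout maps to genotype is the small Python sketch below, which calls each plant Col1-positive, Col2-positive or both from the band sizes read off a gel. The ±15 bp tolerance, the input format and the function name are illustrative assumptions; the original text reports the screen only as the gel shown in FIG. 3.

```python
# Hypothetical scoring helper: the tolerance and the list-of-band-sizes input
# are illustrative assumptions, not part of the original protocol.
EXPECTED_AMPLICONS_BP = {"Col1": 324, "Col2": 537}


def call_collagen_transgenes(band_sizes_bp, tolerance_bp=15):
    """Return the collagen transgenes whose expected amplicon matches an observed band."""
    return {
        name
        for name, expected in EXPECTED_AMPLICONS_BP.items()
        if any(abs(size - expected) <= tolerance_bp for size in band_sizes_bp)
    }


if __name__ == "__main__":
    # A lane with bands near both expected sizes is scored as carrying both chains.
    print(sorted(call_collagen_transgenes([322, 541])))
```

Because the two products differ by about 200 bp, a generous tolerance still separates them cleanly.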
TABLE 2 - Primers for multiplex PCR amplification of a 324 bp fragment of collagen α1 and a 537 bp fragment of collagen α2
Col1 forward primer (24-mer): 5′-ATCACCAGGAGAACAGGGACCATC-3′ (SEQ ID NO: 25)
Col1 reverse primer (29-mer): 5′-TCCACTTCCAAATCTCTATCCCTAACAAC-3′ (SEQ ID NO: 26)
Col2 forward primer (23-mer): 5′-AGGCATTAGAGGCGATAAGGGAG-3′ (SEQ ID NO: 27)
Col2 reverse primer (27-mer): 5′-TCAATCCAATAATAGCCACTTGACCAC-3′ (SEQ ID NO: 28)
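As a consistency check that is not part of the original protocol, the Col1 primer pair (SEQ ID NOs: 25 and 26) can be located on SEQ ID NO: 1. The Python sketch below uses nucleotides 3601-3960 of SEQ ID NO: 1, copied from the sequence listing, finds the forward primer and the reverse complement of the reverse primer, and reports the product size, which should match the 324 bp stated above. The Col2 pair cannot be checked the same way here because SEQ ID NO: 4 is not reproduced in this section; the helper names are hypothetical.

```python
# Complement map for unambiguous DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def amplicon_length(template: str, forward: str, reverse: str) -> int:
    """Size of the product from the 5' end of the forward primer to the reverse primer site."""
    template = "".join(template.split()).upper()
    start = template.find(forward.upper())
    if start < 0:
        raise ValueError("forward primer not found on template")
    # The reverse primer anneals to the top strand as its reverse complement.
    rc_site = template.find(reverse_complement(reverse), start)
    if rc_site < 0:
        raise ValueError("reverse primer binding site not found downstream")
    return rc_site + len(reverse) - start


# SEQ ID NO: 1, nucleotides 3601-3960, copied from the sequence listing.
TEMPLATE_3601_3960 = """
ggtcctcctg gcccacctgg atcaccagga gaacagggac catctggtgc ttccggccca
gctggtccaa gaggacctcc aggatcagct ggtgcacctg gaaaagatgg tcttaacggt
ctcccaggac caatcggccc tccaggacct agaggaagaa caggagatgc tggccctgtt
ggccctccag gacctcctgg tccaccaggt ccacctggtc ctccatcagc tggattcgat
ttttcatttc ttccacagcc accacaagag aaagctcacg atggcggcag atattaccgt
gctgatgatg ctaacgttgt tagggataga gatttggaag tggatacaac tttgaaatcc
"""

COL1_FORWARD = "ATCACCAGGAGAACAGGGACCATC"        # SEQ ID NO: 25
COL1_REVERSE = "TCCACTTCCAAATCTCTATCCCTAACAAC"   # SEQ ID NO: 26

if __name__ == "__main__":
    print(amplicon_length(TEMPLATE_3601_3960, COL1_FORWARD, COL1_REVERSE))
```

On this template the forward primer starts at nucleotide 3621 and the reverse primer site ends at nucleotide 3944 of SEQ ID NO: 1.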
Example 3
Detection of human collagen in transgenic tobacco plants
Total soluble protein was extracted from tobacco transformants 2, 3 and 4 by grinding 500 mg of leaves in 0.5 ml of 50 mM Tris-HCl pH 7.5 containing "Complete" protease inhibitor cocktail (product #1836145, Roche Diagnostics GmbH; 1 tablet per 50 ml buffer). The crude extract was mixed with 250 μl of 4× sample loading buffer containing 10% β-mercaptoethanol and 8% SDS; the sample was boiled for 7 minutes and centrifuged at 13,000 rpm for 8 minutes. 20 μl of the supernatant was loaded onto a 10% polyacrylamide gel and analyzed by standard western blotting with anti-collagen I (denatured) antibody (#AB745, Chemicon Inc.) (FIG. 4). W.t. denotes wild-type tobacco. Collagen-positive bands were seen in plants that were PCR-positive for type I collagen alpha 1, alpha 2 or both. The positive control band, 500 ng of type I collagen from human placenta (#CC050, Chemicon Inc.), corresponds to about 0.3% of the total soluble protein (about 150 μg) in the samples from the transgenic plants.
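The 0.3% figure follows directly from the amounts involved: 500 ng of control collagen relative to roughly 150 μg of total soluble protein. A minimal Python sketch of that arithmetic, with hypothetical helper names, is below; the same conversion shows that the ~1% level mentioned for vacuole-targeted collagen would correspond to roughly 1.5 μg of collagen in a sample of the same total.

```python
def percent_of_total(part_ng: float, total_ug: float) -> float:
    """Express a protein amount (ng) as a percentage of a total amount (ug)."""
    return 100.0 * part_ng / (total_ug * 1000.0)


def ng_at_percent(percent: float, total_ug: float) -> float:
    """Protein amount (ng) that makes up `percent` of a total amount (ug)."""
    return percent / 100.0 * total_ug * 1000.0


if __name__ == "__main__":
    print(round(percent_of_total(500, 150), 2))  # 500 ng control vs ~150 ug TSP
    print(ng_at_percent(1.0, 150))               # ~1% of the same total, in ng
```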
When collagen was targeted to the vacuole, plants accumulating collagen of the expected molecular weight at up to ~1% of total soluble protein were detected (FIG. 4). Full-length collagen was also successfully targeted to the apoplast (FIG. 5). Plants expressing collagen in the cytoplasm (i.e., without a targeting peptide) did not accumulate detectable levels of collagen, indicating that subcellular targeting is critical for successful collagen accumulation in plants.
Furthermore, in contrast to the studies of Ruggiero et al. (2000) and Merle et al. (2002), which reported significant proteolysis resulting in collagen lacking the N-propeptide, the present method yielded full-length collagen, containing both the C-propeptide and the N-propeptide, that accumulated at high levels in subcellular compartments.
The present data also show that generating 2 separate plant lines, each expressing a different collagen chain type, is advantageous: it enables selection of plants expressing the optimal level of each chain type, which can subsequently be crossed to obtain the desired collagen-producing plant.
The collagen produced by the plants of the invention retains the native propeptides and is therefore expected to migrate as a larger protein than the human control collagen, whose propeptides were removed by proteolysis during purification. The calculated molecular weights of the collagen alpha 1 and alpha 2 chains, without hydroxylation or glycosylation, are: Col1, 136 kDa with propeptides and 95 kDa without; Col2, 127 kDa with propeptides and 92 kDa without.
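Calculated molecular weights of this kind are polypeptide masses computed from sequence, with no modifications. A minimal Python sketch of such a calculation is shown below, assuming average (not monoisotopic) residue masses from a commonly used table and ignoring all post-translational modifications. The authors' exact mass table is not stated, so values from this sketch may differ slightly from the quoted figures; the function name is hypothetical.

```python
# Average residue masses in daltons (free amino acid minus one H2O).
# Hydroxylation, glycosylation and other modifications are ignored.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER_MASS = 18.0153  # one H2O for the free N- and C-termini


def average_mass_da(protein: str) -> float:
    """Unmodified average mass of a linear polypeptide in one-letter code."""
    residues = protein.upper().rstrip("*")  # drop a trailing stop symbol if present
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


if __name__ == "__main__":
    # Example: the 45-residue vacuolar signal peptide (SEQ ID NO: 24), in kDa.
    print(round(average_mass_da("MAHARVLLLALAVLATAAVAVASSSSFADSNPIRPVTDRAASTLA") / 1000, 2))
```

Dividing by 1000 converts daltons to kilodaltons, the unit used in the text.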
As can be seen in FIG. 4, the Col1 band in transformants 3-5 and 3-49 appeared larger than the Col1 band in the other plants. This suggests that the collagen chains in these plants were proline-hydroxylated by the human prolyl 4-hydroxylase (P4H) holoenzyme, composed of alpha and beta subunits, which is co-expressed in these plants and targeted to the same subcellular compartment (e.g., the vacuole) as the human collagen chains.
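Prolyl 4-hydroxylase acts on proline in the Y position of the Gly-X-Y repeat, and each resulting 4-hydroxyproline adds one oxygen atom (about 16 Da). The Python sketch below counts such candidate sites in a stretch that is in Gly-X-Y register and gives the mass that would be added if all of them were hydroxylated. It is a simplified model: it assumes full hydroxylation, ignores sequence-context effects on enzyme activity, and does not model gel mobility; the function names are hypothetical.

```python
HYP_MASS_SHIFT_DA = 15.9994  # one oxygen atom added per Pro -> 4-Hyp conversion


def y_position_prolines(helical_region: str) -> int:
    """Count Pro residues in the Y position of Gly-X-Y triplets.

    Assumes the string starts at a Gly that is in register with the repeat;
    a trailing incomplete triplet is ignored.
    """
    seq = helical_region.upper()
    return sum(
        1
        for i in range(0, len(seq) - 2, 3)
        if seq[i] == "G" and seq[i + 2] == "P"
    )


def max_hydroxylation_shift_da(helical_region: str) -> float:
    """Mass added if every Y-position Pro were converted to 4-hydroxyproline."""
    return y_position_prolines(helical_region) * HYP_MASS_SHIFT_DA


if __name__ == "__main__":
    fragment = "GPPGAPGPA"  # triplets: G-P-P, G-A-P, G-P-A
    print(y_position_prolines(fragment), max_hydroxylation_shift_da(fragment))
```

Only the first two triplets of the example have Pro in the Y position, so 2 candidate sites are counted.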
Example 4
Collagen triple helix assembly and thermostability in transgenic plants
Triple-helix assembly and helix thermostability in the transgenic plants were tested by heat-treating total crude protein extracts and then digesting them with trypsin or pepsin (FIG. 6a-b). A correctly folded triple helix resists digestion by these proteases, whereas unfolded chains and non-helical regions are degraded.
In the first experiment, total soluble proteins were extracted from tobacco plants 2-9 (expressing collagen α1 only, without P4H) and 3-5 (expressing collagen α1 and α2 together with P4H) as follows: 500 mg of leaves were ground in 0.5 ml of 50 mM Tris-HCl pH 7.5 and centrifuged at 13,000 rpm for 10 minutes, and the supernatant was collected. 50 μl of the supernatant was heat-treated (at 33 ℃ or 43 ℃ for 15 minutes) and then immediately placed on ice. Tryptic digestion was started by adding 6 μl of 1 mg/ml trypsin in 50 mM Tris-HCl pH 7.5 to each sample, and the samples were incubated at room temperature (about 22 ℃) for 20 minutes. Digestion was stopped by adding 20 μl of 4× sample loading buffer containing 10% β-mercaptoethanol and 8% SDS; the samples were boiled for 7 minutes and centrifuged at 13,000 rpm for 7 minutes. 50 μl of the supernatant was loaded onto a 10% polyacrylamide gel and analyzed by standard western blotting with anti-collagen I antibody (#AB745, Chemicon Inc.). The positive control was ~500 ng of human collagen I (#CC050, Chemicon Inc., extracted from human placenta by pepsin digestion) added to 50 μl of total soluble protein extracted from wild-type tobacco.
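For reference, the nominal trypsin concentration during digestion follows from simple dilution: 6 μl of a 1 mg/ml stock in a 56 μl reaction. The short Python sketch below does this C1V1 = C2V2 arithmetic, assuming additive volumes; it is an illustration with hypothetical names, not part of the original protocol.

```python
def final_concentration(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """C_final = C_stock * V_added / V_total (volumes assumed additive)."""
    return stock_conc * added_ul / total_ul


SAMPLE_UL = 50.0               # heat-treated extract
TRYPSIN_UL = 6.0               # volume of trypsin stock added
TRYPSIN_STOCK_MG_PER_ML = 1.0  # trypsin stock concentration

trypsin_mg_per_ml = final_concentration(
    TRYPSIN_STOCK_MG_PER_ML, TRYPSIN_UL, SAMPLE_UL + TRYPSIN_UL
)

if __name__ == "__main__":
    print(f"{trypsin_mg_per_ml * 1000:.0f} ug/ml trypsin in {SAMPLE_UL + TRYPSIN_UL:.0f} ul")
```

This gives roughly 107 μg/ml trypsin, i.e. 6 μg of enzyme per 50 μl of extract.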
As shown in FIG. 6a, the collagen triple helix formed in plant #3-5, as well as the control human collagen, was resistant to denaturation at 33 ℃. In contrast, collagen formed by plant #2-9 was denatured at 33 ℃. This difference in thermostability suggests successful triple helix assembly and post-translational proline hydroxylation in transformant #3-5, which expresses collagen alpha 1 and collagen alpha 2 as well as P4H beta and alpha subunits.
The 2 bands in transformant #2-9 may represent dimers or trimers that remained stable after boiling for 7 minutes with SDS and β-mercaptoethanol. Similar bands are seen for human collagen (upper panel) and transformant #3-5. A possible explanation is covalent cross-linking between 2 polypeptide chains in different triple helices, formed after oxidative deamination of 2 lysine residues by lysyl oxidase.
In a second experiment, total soluble protein was extracted from transgenic tobacco plant 13-6 (expressing the collagen I α1 and α2 chains, indicated by arrows in FIG. 6b, together with the human P4H α and β subunits and human LH3) as follows: 500 mg of leaves were ground in 0.5 ml of 100 mM Tris-HCl pH 7.5 and 300 mM NaCl and centrifuged at 10,000 rpm for 7 minutes, and the supernatant was collected. 50 μl of the supernatant was heat-treated (at 33 ℃, 38 ℃ or 42 ℃ for 20 minutes) and then immediately placed on ice. Pepsin digestion was started by adding 4.5 μl of 0.1 M HCl in 10 mM acetic acid and 4 μl of 2.5 mg/ml pepsin to each sample, and the samples were incubated at room temperature (about 22 ℃) for 30 minutes. Digestion was stopped by adding 5 μl of unbuffered 1 M Tris. Each sample was then mixed with 22 μl of 4× sample loading buffer containing 10% β-mercaptoethanol and 8% SDS, boiled for 7 minutes and centrifuged at 13,000 rpm for 7 minutes. 40 μl of the supernatant was loaded onto a 10% polyacrylamide gel and analyzed by standard western blotting with anti-collagen I antibody (#AB745, Chemicon Inc.). The positive control was human collagen I (#CC050, Chemicon Inc., extracted from human placenta by pepsin digestion) added to 50 ng of total soluble protein extracted from wild-type tobacco.
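The pepsin step combines two additions, so the nominal final concentrations depend on the total reaction volume (50 + 4.5 + 4 = 58.5 μl). The Python sketch below computes them from the volumes and stock concentrations given above, assuming additive volumes. It reports the nominal HCl concentration only and makes no claim about the resulting pH, which also depends on the 100 mM Tris in the extraction buffer; the reagent labels and function name are hypothetical.

```python
def reaction_composition(base_ul, additions):
    """Nominal final concentrations after adding reagents to a sample.

    `additions` maps a reagent label to (volume_ul, stock_concentration);
    volumes are assumed additive and units follow each stock concentration.
    """
    total_ul = base_ul + sum(volume for volume, _ in additions.values())
    final = {
        name: stock * volume / total_ul
        for name, (volume, stock) in additions.items()
    }
    return total_ul, final


total_ul, final = reaction_composition(
    50.0,  # heat-treated extract
    {
        "HCl (M)": (4.5, 0.1),
        "pepsin (mg/ml)": (4.0, 2.5),
    },
)

if __name__ == "__main__":
    print(total_ul)
    for name, conc in final.items():
        print(f"{name}: {conc:.4f}")
```

With these inputs, pepsin is about 0.17 mg/ml and HCl about 7.7 mM (nominal) in the digestion mix.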
As illustrated in FIG. 6b, the collagen triple helix formed in plant #13-6 was resistant to denaturation at 42 ℃. Cleavage of the propeptides is first visible at 33 ℃ and becomes progressively more pronounced as the temperature rises to 38 ℃ and then to 42 ℃. The cleaved triple-helical domain migrated on the gel similarly to pepsin-treated human collagen. The human collagen used in this experiment was extracted from human placenta by pepsin digestion and therefore lacks the propeptides and part of the telopeptides.
Example 5
Plant P4H expression
Induction of native plant P4H
The tobacco P4H cDNA was cloned and used as a probe to identify conditions and treatments that induce expression of endogenous P4H. Northern blot analysis (FIG. 7) shows that P4H is expressed at relatively high levels in the shoot apex and at low levels in leaves. P4H levels in leaves were markedly induced 4 hours after abrasion treatment (labeled "wounding" in the figures). Similar results were obtained with other stress conditions (not shown).
Detection of P4H alpha and beta subunits and collagen alpha 1 and alpha 2 chains in transgenic tobacco plants
Human P4H alpha and beta subunits and the collagen type I alpha 1 and alpha 2 chains were detected in transgenic tobacco plants using anti-human P4H alpha subunit antibody (#63-163, ICN Biomedicals Inc.), anti-human P4H beta subunit antibody (#MAB2701, Chemicon Inc.) and anti-collagen I antibody (#AB745, Chemicon Inc.), respectively. Western blot results obtained with these antibodies are shown in FIG. 8.
Bands corresponding to P4H α, P4H β and the collagen I α1 and α2 chains were detected in plant 13-6 (which was also transformed with human LH3). The calculated molecular weights of P4H α and β, including the vacuolar signal peptide, are 65.5 kDa and 53.4 kDa, respectively. The calculated molecular weights of the propeptide-containing collagen α1 and α2 chains, without hydroxylation or glycosylation, are 136 kDa and 127 kDa, respectively.
It is to be understood that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
While the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications and GenBank accession numbers mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application or GenBank accession number was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.
References
(Other references cited in the description)
1. Bulleid NJ, John DC, Kadler KE. Recombinant expression systems for the production of collagen. Biochem Soc Trans. 2000;28(4):350-3. Review. PMID: 10961917.
2. Hare PD, Cress WA. Metabolic implications of stress-induced proline accumulation in plants. Plant Growth Regulation. 1997;21:79-102.
3. Hieta R, Myllyharju J. Cloning and characterization of a low molecular weight prolyl 4-hydroxylase from Arabidopsis thaliana. Effective hydroxylation of proline-rich, collagen-like, and hypoxia-inducible transcription factor alpha-like peptides. J Biol Chem. 2002 Jun 28;277(26):23965-71. Epub 2002 Apr 25. PMID: 11976332.
4. Hulmes DJ. Building collagen molecules, fibrils, and suprafibrillar structures. J Struct Biol. 2002 Jan-Feb;137(1-2):2-10. Review. PMID: 12064927.
5. Inkinen K. Connective tissue formation in wound healing. An experimental study. Academic Dissertation, September 2003. University of Helsinki, Faculty of Science, Department of Biosciences, Division of Biochemistry (ISBN 952-10-1313-3). http://ethesis.helsinki.fi/iulkaisut/mat/bioti/vk/inkinen/
6. Merle C, Perret S, Lacour T, Jonval V, Hudaverdian S, Garrone R, Ruggiero F, Theisen M. Hydroxylated human homotrimeric collagen I in Agrobacterium tumefaciens-mediated transient expression and in transgenic tobacco plant. FEBS Lett. 2002 Mar 27;515(1-3):114-8. PMID: 11943205.
7. Olsen D, Yang C, Bodo M, Chang R, Leigh S, Baez J, Carmichael D, Perala M, Hamalainen ER, Jarvinen M, Polarek J. Recombinant collagen and gelatin for drug delivery. Adv Drug Deliv Rev. 2003 Nov 28;55(12):1547-67. PMID: 14623401.
8. Ruggiero F, Exposito JY, Bournat P, Gruber V, Perret S, Comte J, Olagnier B, Garrone R, Theisen M. Triple helix assembly and processing of human collagen produced in transgenic tobacco plants. FEBS Lett. 2000 Mar 3;469(1):132-6. PMID: 10708770.
9. Tanaka M, Sato K, Uchida T. Plant prolyl hydroxylase recognizes poly(L-proline) II helix. J Biol Chem. 1981 Nov 25;256(22):11397-400. PMID: 6271746.
10. Wang C, Luosujarvi H, Heikkinen J, Risteli M, Uitto L, Myllyla R. The third activity for lysyl hydroxylase 3: galactosylation of hydroxylysyl residues in collagens in vitro. Matrix Biol. 2002 Nov;21(7):559-66. PMID: 12475640.
Sequence listing
<110>CollPlant Ltd.
<120> collagen-producing plants and methods of making and using the same
<130>CPCH0761218P
<160>28
<170>PatentIn version 3.3
<210>1
<211>4662
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence of the coding and flanking regions of the vacuolar signal sequence of the barley thiol protease aleurain precursor gene fused to the human collagen alpha 1(I) chain
<400>1
gcgatgcatg taatgtcatg agccacatga tccaatggcc acaggaacgt aagaatgtag 60
atagatttga ttttgtccgt tagatagcaa acaacattat aaaaggtgtg tatcaatacg 120
aactaattca ctcattggat tcatagaagt ccattcctcc taagtatcta aaccatggct 180
cacgctcgtg ttctcctcct cgctctcgct gttttggcaa cagctgctgt ggctgtggct 240
tctagttctt cttttgctga ttcaaaccct attagacctg ttactgatag agcagcttcc 300
actttggctc aattgcaaga ggagggccag gttgagggcc aagatgagga tatccctcca 360
attacatgcg tgcaaaatgg cttgcgttac cacgataggg atgtgtggaa acctgaacct 420
tgtcgtatct gtgtgtgtga taacggcaag gtgctctgcg atgatgttat ctgcgatgag 480
acaaaaaatt gccctggcgc tgaagttcct gagggcgagt gttgccctgt gtgccctgat 540
ggttccgagt ccccaactga tcaggaaact actggcgtgg agggcccaaa aggagatact 600
ggtccacgtg gtcctagggg tccagcaggt cctccaggta gagatggtat tccaggccag 660
cctggattgc caggaccacc aggcccacct ggcccaccag gacctcctgg tcttggtgga 720
aatttcgctc cacaactctc ttatggctat gatgagaagt caacaggtgg tatttccgtt 780
ccaggtccta tgggaccatc cggaccaaga ggtctcccag gtcctccagg tgctcctgga 840
cctcaaggct ttcaaggacc tccaggcgaa ccaggagaac caggcgcttc tggaccaatg 900
ggcccaaggg gaccacctgg cccaccagga aaaaatggcg atgatggcga agctggaaag 960
cctggtcgtc ctggagagag aggtcctcct ggcccacagg gtgcaagagg cttgccagga 1020
actgctggct tgcctggaat gaagggacat aggggcttct ccggcctcga tggcgctaag 1080
ggtgatgctg gccctgctgg accaaagggc gagccaggtt cccctggaga aaacggtgct 1140
cctggacaaa tgggtcctcg tggacttcca ggagaaaggg gtcgtccagg cgctccagga 1200
ccagcaggtg ctaggggaaa cgatggtgca acaggcgctg ctggccctcc tggcccaact 1260
ggtcctgctg gccctccagg attcccaggc gcagttggag ctaaaggaga agcaggacca 1320
cagggcccta ggggttctga aggacctcag ggtgttagag gtgaaccagg tcctccaggc 1380
ccagctggag cagctggtcc agcaggaaat ccaggtgctg atggtcaacc tggagctaag 1440
ggcgctaatg gcgcaccagg tatcgcaggc gcaccaggtt ttcctggcgc tagaggccca 1500
agtggtcctc aaggaccagg tggaccacca ggtccaaaag gcaattctgg cgaacctggc 1560
gctccaggtt ctaaaggaga tactggtgct aaaggcgaac caggacctgt tggtgttcag 1620
ggtcctcctg gtcctgctgg agaagaagga aaaagaggtg ctcgtggaga accaggacca 1680
actggacttc ctggacctcc tggtgaacgt ggcggacctg gctcaagggg tttccctgga 1740
gctgatggag tggcaggtcc aaaaggccct gctggagaga gaggttcacc aggtccagct 1800
ggtcctaagg gctcccctgg tgaagcaggt agaccaggcg aagcaggatt gccaggcgca 1860
aagggattga caggctctcc tggtagtcct ggcccagatg gaaaaacagg cccaccaggt 1920
ccagcaggac aagatggacg tccaggccca ccaggtcctc ctggagcaag gggacaagct 1980
ggcgttatgg gttttccagg acctaaaggt gctgctggag agccaggaaa ggcaggtgaa 2040
agaggagttc ctggtccacc aggagcagtg ggtcctgctg gcaaagatgg tgaagctgga 2100
gcacagggcc ctccaggccc tgctggccca gctggcgaac gtggagaaca aggcccagct 2160
ggtagtccag gatttcaagg attgcctggc cctgctggcc ctccaggaga agcaggaaaa 2220
cctggagaac aaggagttcc tggtgatttg ggagcacctg gaccttcagg agcacgtggt 2280
gaaagaggct tccctggcga gaggggtgtt caaggtccac caggtccagc aggacctaga 2340
ggtgctaatg gcgctcctgg caacgatgga gcaaaaggtg atgctggtgc tcctggcgca 2400
cctggaagtc agggtgctcc tggattgcaa ggaatgcctg gagagagggg tgctgctggc 2460
ttgccaggcc caaagggcga taggggtgat gctggaccaa aaggtgctga tggatcccca 2520
ggaaaagatg gagttcgtgg tcttactggc ccaatcggac ctccaggccc tgctggcgct 2580
ccaggtgata agggcgaaag tggcccaagt ggacctgctg gacctactgg tgctagaggt 2640
gcacctggtg ataggggtga acctggacca cctggtccag ctggttttgc tggtcctcct 2700
ggagctgatg gacaacctgg cgcaaagggt gaaccaggtg atgctggcgc aaagggagat 2760
gctggtccac ctggacctgc tggtccagca ggcccccctg ggccaatcgg taatgttgga 2820
gcaccaggtg ctaagggagc taggggttcc gctggtccac ctggagcaac aggatttcca 2880
ggcgctgctg gtagagttgg cccaccaggc ccatccggaa acgcaggccc tcctggtcct 2940
ccaggtcctg ctggcaagga gggtggcaaa ggaccaaggg gcgaaactgg ccctgctggt 3000
agacctggcg aagttggccc tcctggacca ccaggtccag caggagaaaa aggttcccca 3060
ggagctgatg gcccagctgg tgctccagga actccaggcc ctcaaggtat tgctggacag 3120
agaggcgttg tgggactccc tggtcaaagg ggagagagag gatttccagg cttgccagga 3180
cctagtggag aacctggaaa acaaggccca tcaggcgcta gtggagagcg tggacctcct 3240
ggccctatgg gacctcctgg attggctggc ccacctggcg aatcaggtcg tgaaggcgca 3300
ccaggcgcag aaggatcacc tggaagagat ggatcccctg gtgctaaagg cgatcgtgga 3360
gaaactggtc cagcaggccc accaggcgca ccaggtgcac ctggcgctcc aggacctgtg 3420
ggaccagctg gaaaatccgg agataggggc gagacaggcc cagcaggacc agctggacct 3480
gttggccctg ctggcgctcg tggaccagca ggacctcaag gaccaagggg agataaggga 3540
gaaacaggcg aacaaggcga taggggcatt aagggtcata ggggttttag tggcctccag 3600
ggtcctcctg gcccacctgg atcaccagga gaacagggac catctggtgc ttccggccca 3660
gctggtccaa gaggacctcc aggatcagct ggtgcacctg gaaaagatgg tcttaacggt 3720
ctcccaggac caatcggccc tccaggacct agaggaagaa caggagatgc tggccctgtt 3780
ggccctccag gacctcctgg tccaccaggt ccacctggtc ctccatcagc tggattcgat 3840
ttttcatttc ttccacagcc accacaagag aaagctcacg atggcggcag atattaccgt 3900
gctgatgatg ctaacgttgt tagggataga gatttggaag tggatacaac tttgaaatcc 3960
ctctcccagc aaattgaaaa cattagatct ccagaaggtt cacgtaaaaa cccagctaga 4020
acatgtcgtg atttgaaaat gtgtcactcc gattggaaaa gtggtgaata ctggattgat 4080
ccaaatcagg gctgtaatct cgatgctatc aaagttttct gtaacatgga aacaggcgaa 4140
acatgcgttt atcctactca accttccgtg gctcagaaaa attggtacat ctcaaaaaat 4200
cctaaagata agaggcacgt ttggttcggt gaaagtatga ctgatggatt tcaatttgag 4260
tacggcggtc aaggtagtga tccagctgat gtggctattc aactcacatt tttgcgtctt 4320
atgtccacag aggcatcaca aaacatcact taccactgca aaaacagtgt ggcttatatg 4380
gatcaacaaa caggaaacct taagaaggct cttcttttga agggctcaaa cgagattgag 4440
attagagcag agggcaactc aaggtttact tattcagtta ctgttgatgg ctgcacttca 4500
catactggcg cttggggtaa aacagttatc gagtataaga ctacaaaaac atcaagactc 4560
ccaatcattg atgttgctcc tctcgatgtt ggcgctcctg atcaagagtt cggttttgat 4620
gtgggcccag tttgtttcct ctaatgagct cgcggccgca tc 4662
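In the listing above, each line holds up to 60 residues in blocks of 10, followed by the running position of the last residue on that line (4662 for the final line, matching <211>4662). A small Python sketch that rebuilds a sequence from such lines and checks those counters is shown below; it is an illustrative parser written for this layout, not an official PatentIn tool, and its name is hypothetical.

```python
def parse_numbered_lines(lines):
    """Rebuild a sequence from listing lines of the form '<blocks...> <position>'.

    Each line's trailing integer is checked against the running length.
    """
    blocks_seen = []
    length = 0
    for line in lines:
        *blocks, position = line.split()
        for block in blocks:
            blocks_seen.append(block)
            length += len(block)
        if length != int(position):
            raise ValueError(f"expected position {position}, got {length}")
    return "".join(blocks_seen)


if __name__ == "__main__":
    first_lines = [
        "gcgatgcatg taatgtcatg agccacatga tccaatggcc acaggaacgt aagaatgtag 60",
        "atagatttga ttttgtccgt tagatagcaa acaacattat aaaaggtgtg tatcaatacg 120",
    ]
    print(len(parse_numbered_lines(first_lines)))
```

Feeding all lines of a record through the parser would flag any line whose counter disagrees with the residues actually present.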
<210>2
<211>4662
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence of the coding and flanking regions of the vacuolar signal sequence of the barley thiol protease aleurain precursor gene fused to the human collagen alpha 1(I) chain
<220>
<221>CDS
<222>(175)..(4644)
<400>2
gcgatgcatg taatgtcatg agccacatga tccaatggcc acaggaacgt aagaatgtag 60
atagatttga ttttgtccgt tagatagcaa acaacattat aaaaggtgtg tatcaatacg 120
aactaattca ctcattggat tcatagaagt ccattcctcc taagtatcta aacc atg 177
Met
1
gct cac gct cgt gtt ctc ctc ctc gct ctc gct gtt ttg gca aca gct 225
Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr Ala
5 10 15
gct gtg gct gtg gct tct agt tct tct ttt gct gat tca aac cct att 273
Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro Ile
20 25 30
aga cct gtt act gat aga gca gct tcc act ttg gct caa ttg caa gag 321
Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Gln Glu
35 40 45
gag ggc cag gtt gag ggc caa gat gag gat atc cct cca att aca tgc 369
Glu Gly Gln Val Glu Gly Gln Asp Glu Asp Ile Pro Pro Ile Thr Cys
50 55 60 65
gtg caa aat ggc ttg cgt tac cac gat agg gat gtg tgg aaa cct gaa 417
Val Gln Asn Gly Leu Arg Tyr His Asp Arg Asp Val Trp Lys Pro Glu
70 75 80
cct tgt cgt atc tgt gtg tgt gat aac ggc aag gtg ctc tgc gat gat 465
Pro Cys Arg Ile Cys Val Cys Asp Asn Gly Lys Val Leu Cys Asp Asp
85 90 95
gtt atc tgc gat gag aca aaa aat tgc cct ggc gct gaa gtt cct gag 513
Val Ile Cys Asp Glu Thr Lys Asn Cys Pro Gly Ala Glu Val Pro Glu
100 105 110
ggc gag tgt tgc cct gtg tgc cct gat ggt tcc gag tcc cca act gat 561
Gly Glu Cys Cys Pro Val Cys Pro Asp Gly Ser Glu Ser Pro Thr Asp
115 120 125
cag gaa act act ggc gtg gag ggc cca aaa gga gat act ggt cca cgt 609
Gln Glu Thr Thr Gly Val Glu Gly Pro Lys Gly Asp Thr Gly Pro Arg
130 135 140 145
ggt cct agg ggt cca gca ggt cct cca ggt aga gat ggt att cca ggc 657
Gly Pro Arg Gly Pro Ala Gly Pro Pro Gly Arg Asp Gly Ile Pro Gly
150 155 160
cag cct gga ttg cca gga cca cca ggc cca cct ggc cca cca gga cct 705
Gln Pro Gly Leu Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro
165 170 175
cct ggt ctt ggt gga aat ttc gct cca caa ctc tct tat ggc tat gat 753
Pro Gly Leu Gly Gly Asn Phe Ala Pro Gln Leu Ser Tyr Gly Tyr Asp
180 185 190
gag aag tca aca ggt ggt att tcc gtt cca ggt cct atg gga cca tcc 801
Glu Lys Ser Thr Gly Gly Ile Ser Val Pro Gly Pro Met Gly Pro Ser
195 200 205
gga cca aga ggt ctc cca ggt cct cca ggt gct cct gga cct caa ggc 849
Gly Pro Arg Gly Leu Pro Gly Pro Pro Gly Ala Pro Gly Pro Gln Gly
210 215 220 225
ttt caa gga cct cca ggc gaa cca gga gaa cca ggc gct tct gga cca 897
Phe Gln Gly Pro Pro Gly Glu Pro Gly Glu Pro Gly Ala Ser Gly Pro
230 235 240
atg ggc cca agg gga cca cct ggc cca cca gga aaa aat ggc gat gat 945
Met Gly Pro Arg Gly Pro Pro Gly Pro Pro Gly Lys Asn Gly Asp Asp
245 250 255
ggc gaa gct gga aag cct ggt cgt cct gga gag aga ggt cct cct ggc 993
Gly Glu Ala Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Pro Pro Gly
260 265 270
cca cag ggt gca aga ggc ttg cca gga act gct ggc ttg cct gga atg 1041
Pro Gln Gly Ala Arg Gly Leu Pro Gly Thr Ala Gly Leu Pro Gly Met
275 280 285
aag gga cat agg ggc ttc tcc ggc ctc gat ggc gct aag ggt gat gct 1089
Lys Gly His Arg Gly Phe Ser Gly Leu Asp Gly Ala Lys Gly Asp Ala
290 295 300 305
ggc cct gct gga cca aag ggc gag cca ggt tcc cct gga gaa aac ggt 1137
Gly Pro Ala Gly Pro Lys Gly Glu Pro Gly Ser Pro Gly Glu Asn Gly
310 315 320
gct cct gga caa atg ggt cct cgt gga ctt cca gga gaa agg ggt cgt 1185
Ala Pro Gly Gln Met Gly Pro Arg Gly Leu Pro Gly Glu Arg Gly Arg
325 330 335
cca ggc gct cca gga cca gca ggt gct agg gga aac gat ggt gca aca 1233
Pro Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Asn Asp Gly Ala Thr
340 345 350
ggc gct gct ggc cct cct ggc cca act ggt cct gct ggc cct cca gga 1281
Gly Ala Ala Gly Pro Pro Gly Pro Thr Gly Pro Ala Gly Pro Pro Gly
355 360 365
ttc cca ggc gca gtt gga gct aaa gga gaa gca gga cca cag ggc cct 1329
Phe Pro Gly Ala Val Gly Ala Lys Gly Glu Ala Gly Pro Gln Gly Pro
370 375 380 385
agg ggt tct gaa gga cct cag ggt gtt aga ggt gaa cca ggt cct cca 1377
Arg Gly Ser Glu Gly Pro Gln Gly Val Arg Gly Glu Pro Gly Pro Pro
390 395 400
ggc cca gct gga gca gct ggt cca gca gga aat cca ggt gct gat ggt 1425
Gly Pro Ala Gly Ala Ala Gly Pro Ala Gly Asn Pro Gly Ala Asp Gly
405 410 415
caa cct gga gct aag ggc gct aat ggc gca cca ggt atc gca ggc gca 1473
Gln Pro Gly Ala Lys Gly Ala Asn Gly Ala Pro Gly Ile Ala Gly Ala
420 425 430
cca ggt ttt cct ggc gct aga ggc cca agt ggt cct caa gga cca ggt 1521
Pro Gly Phe Pro Gly Ala Arg Gly Pro Ser Gly Pro Gln Gly Pro Gly
435 440 445
gga cca cca ggt cca aaa ggc aat tct ggc gaa cct ggc gct cca ggt 1569
Gly Pro Pro Gly Pro Lys Gly Asn Ser Gly Glu Pro Gly Ala Pro Gly
450 455 460 465
tct aaa gga gat act ggt gct aaa ggc gaa cca gga cct gtt ggt gtt 1617
Ser Lys Gly Asp Thr Gly Ala Lys Gly Glu Pro Gly Pro Val Gly Val
470 475 480
cag ggt cct cct ggt cct gct gga gaa gaa gga aaa aga ggt gct cgt 1665
Gln Gly Pro Pro Gly Pro Ala Gly Glu Glu Gly Lys Arg Gly Ala Arg
485 490 495
gga gaa cca gga cca act gga ctt cct gga cct cct ggt gaa cgt ggc 1713
Gly Glu Pro Gly Pro Thr Gly Leu Pro Gly Pro Pro Gly Glu Arg Gly
500 505 510
gga cct ggc tca agg ggt ttc cct gga gct gat gga gtg gca ggt cca 1761
Gly Pro Gly Ser Arg Gly Phe Pro Gly Ala Asp Gly Val Ala Gly Pro
515 520 525
aaa ggc cct gct gga gag aga ggt tca cca ggt cca gct ggt cct aag 1809
Lys Gly Pro Ala Gly Glu Arg Gly Ser Pro Gly Pro Ala Gly Pro Lys
530 535 540 545
ggc tcc cct ggt gaa gca ggt aga cca ggc gaa gca gga ttg cca ggc 1857
Gly Ser Pro Gly Glu Ala Gly Arg Pro Gly Glu Ala Gly Leu Pro Gly
550 555 560
gca aag gga ttg aca ggc tct cct ggt agt cct ggc cca gat gga aaa 1905
Ala Lys Gly Leu Thr Gly Ser Pro Gly Ser Pro Gly Pro Asp Gly Lys
565 570 575
aca ggc cca cca ggt cca gca gga caa gat gga cgt cca ggc cca cca 1953
Thr Gly Pro Pro Gly Pro Ala Gly Gln Asp Gly Arg Pro Gly Pro Pro
580 585 590
ggt cct cct gga gca agg gga caa gct ggc gtt atg ggt ttt cca gga 2001
Gly Pro Pro Gly Ala Arg Gly Gln Ala Gly Val Met Gly Phe Pro Gly
595 600 605
cct aaa ggt gct gct gga gag cca gga aag gca ggt gaa aga gga gtt 2049
Pro Lys Gly Ala Ala Gly Glu Pro Gly Lys Ala Gly Glu Arg Gly Val
610 615 620 625
cct ggt cca cca gga gca gtg ggt cct gct ggc aaa gat ggt gaa gct 2097
Pro Gly Pro Pro Gly Ala Val Gly Pro Ala Gly Lys Asp Gly Glu Ala
630 635 640
gga gca cag ggc cct cca ggc cct gct ggc cca gct ggc gaa cgt gga 2145
Gly Ala Gln Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Glu Arg Gly
645 650 655
gaa caa ggc cca gct ggt agt cca gga ttt caa gga ttg cct ggc cct 2193
Glu Gln Gly Pro Ala Gly Ser Pro Gly Phe Gln Gly Leu Pro Gly Pro
660 665 670
gct ggc cct cca gga gaa gca gga aaa cct gga gaa caa gga gtt cct 2241
Ala Gly Pro Pro Gly Glu Ala Gly Lys Pro Gly Glu Gln Gly Val Pro
675 680 685
ggt gat ttg gga gca cct gga cct tca gga gca cgt ggt gaa aga ggc 2289
Gly Asp Leu Gly Ala Pro Gly Pro Ser Gly Ala Arg Gly Glu Arg Gly
690 695 700 705
ttc cct ggc gag agg ggt gtt caa ggt cca cca ggt cca gca gga cct 2337
Phe Pro Gly Glu Arg Gly Val Gln Gly Pro Pro Gly Pro Ala Gly Pro
710 715 720
aga ggt gct aat ggc gct cct ggc aac gat gga gca aaa ggt gat gct 2385
Arg Gly Ala Asn Gly Ala Pro Gly Asn Asp Gly Ala Lys Gly Asp Ala
725 730 735
ggt gct cct ggc gca cct gga agt cag ggt gct cct gga ttg caa gga 2433
Gly Ala Pro Gly Ala Pro Gly Ser Gln Gly Ala Pro Gly Leu Gln Gly
740 745 750
atg cct gga gag agg ggt gct gct ggc ttg cca ggc cca aag ggc gat 2481
Met Pro Gly Glu Arg Gly Ala Ala Gly Leu Pro Gly Pro Lys Gly Asp
755 760 765
agg ggt gat gct gga cca aaa ggt gct gat gga tcc cca gga aaa gat 2529
Arg Gly Asp Ala Gly Pro Lys Gly Ala Asp Gly Ser Pro Gly Lys Asp
770 775 780 785
gga gtt cgt ggt ctt act ggc cca atc gga cct cca ggc cct gct ggc 2577
Gly Val Arg Gly Leu Thr Gly Pro Ile Gly Pro Pro Gly Pro Ala Gly
790 795 800
gct cca ggt gat aag ggc gaa agt ggc cca agt gga cct gct gga cct 2625
Ala Pro Gly Asp Lys Gly Glu Ser Gly Pro Ser Gly Pro Ala Gly Pro
805 810 815
act ggt gct aga ggt gca cct ggt gat agg ggt gaa cct gga cca cct 2673
Thr Gly Ala Arg Gly Ala Pro Gly Asp Arg Gly Glu Pro Gly Pro Pro
820 825 830
ggt cca gct ggt ttt gct ggt cct cct gga gct gat gga caa cct ggc 2721
Gly Pro Ala Gly Phe Ala Gly Pro Pro Gly Ala Asp Gly Gln Pro Gly
835 840 845
gca aag ggt gaa cca ggt gat gct ggc gca aag gga gat gct ggt cca 2769
Ala Lys Gly Glu Pro Gly Asp Ala Gly Ala Lys Gly Asp Ala Gly Pro
850 855 860 865
cct gga cct gct ggt cca gca ggc ccc cct ggg cca atc ggt aat gtt 2817
Pro Gly Pro Ala Gly Pro Ala Gly Pro Pro Gly Pro Ile Gly Asn Val
870 875 880
gga gca cca ggt gct aag gga gct agg ggt tcc gct ggt cca cct gga 2865
Gly Ala Pro Gly Ala Lys Gly Ala Arg Gly Ser Ala Gly Pro Pro Gly
885 890 895
gca aca gga ttt cca ggc gct gct ggt aga gtt ggc cca cca ggc cca 2913
Ala Thr Gly Phe Pro Gly Ala Ala Gly Arg Val Gly Pro Pro Gly Pro
900 905 910
tcc gga aac gca ggc cct cct ggt cct cca ggt cct gct ggc aag gag 2961
Ser Gly Asn Ala Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys Glu
915 920 925
ggt ggc aaa gga cca agg ggc gaa act ggc cct gct ggt aga cct ggc 3009
Gly Gly Lys Gly Pro Arg Gly Glu Thr Gly Pro Ala Gly Arg Pro Gly
930 935 940 945
gaa gtt ggc cct cct gga cca cca ggt cca gca gga gaa aaa ggt tcc 3057
Glu Val Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Glu Lys Gly Ser
950 955 960
cca gga gct gat ggc cca gct ggt gct cca gga act cca ggc cct caa 3105
Pro Gly Ala Asp Gly Pro Ala Gly Ala Pro Gly Thr Pro Gly Pro Gln
965 970 975
ggt att gct gga cag aga ggc gtt gtg gga ctc cct ggt caa agg gga 3153
Gly Ile Ala Gly Gln Arg Gly Val Val Gly Leu Pro Gly Gln Arg Gly
980 985 990
gag aga gga ttt cca ggc ttg cca gga cct agt gga gaa cct gga aaa 3201
Glu Arg Gly Phe Pro Gly Leu Pro Gly Pro Ser Gly Glu Pro Gly Lys
995 1000 1005
caa ggc cca tca ggc gct agt gga gag cgt gga cct cct ggc cct 3246
Gln Gly Pro Ser Gly Ala Ser Gly Glu Arg Gly Pro Pro Gly Pro
1010 1015 1020
atg gga cct cct gga ttg gct ggc cca cct ggc gaa tca ggt cgt 3291
Met Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly Arg
1025 1030 1035
gaa ggc gca cca ggc gca gaa gga tca cct gga aga gat gga tcc 3336
Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly Arg Asp Gly Ser
1040 1045 1050
cct ggt gct aaa ggc gat cgt gga gaa act ggt cca gca ggc cca 3381
Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro
1055 1060 1065
cca ggc gca cca ggt gca cct ggc gct cca gga cct gtg gga cca 3426
Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly Pro
1070 1075 1080
gct gga aaa tcc gga gat agg ggc gag aca ggc cca gca gga cca 3471
Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly Pro
1085 1090 1095
gct gga cct gtt ggc cct gct ggc gct cgt gga cca gca gga cct 3516
Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro
1100 1105 1110
caa gga cca agg gga gat aag gga gaa aca ggc gaa caa ggc gat 3561
Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly Asp
1115 1120 1125
agg ggc att aag ggt cat agg ggt ttt agt ggc ctc cag ggt cct 3606
Arg Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly Pro
1130 1135 1140
cct ggc cca cct gga tca cca gga gaa cag gga cca tct ggt gct 3651
Pro Gly Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly Ala
1145 1150 1155
tcc ggc cca gct ggt cca aga gga cct cca gga tca gct ggt gca 3696
Ser Gly Pro Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly Ala
1160 1165 1170
cct gga aaa gat ggt ctt aac ggt ctc cca gga cca atc ggc cct 3741
Pro Gly Lys Asp Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly Pro
1175 1180 1185
cca gga cct aga gga aga aca gga gat gct ggc cct gtt ggc cct 3786
Pro Gly Pro Arg Gly Arg Thr Gly Asp Ala Gly Pro Val Gly Pro
1190 1195 1200
cca gga cct cct ggt cca cca ggt cca cct ggt cct cca tca gct 3831
Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Ser Ala
1205 1210 1215
gga ttc gat ttt tca ttt ctt cca cag cca cca caa gag aaa gct 3876
Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro Gln Glu Lys Ala
1220 1225 1230
cac gat ggc ggc aga tat tac cgt gct gat gat gct aac gtt gtt 3921
His Asp Gly Gly Arg Tyr Tyr Arg Ala Asp Asp Ala Asn Val Val
1235 1240 1245
agg gat aga gat ttg gaa gtg gat aca act ttg aaa tcc ctc tcc 3966
Arg Asp Arg Asp Leu Glu Val Asp Thr Thr Leu Lys Ser Leu Ser
1250 1255 1260
cag caa att gaa aac att aga tct cca gaa ggt tca cgt aaa aac 4011
Gln Gln Ile Glu Asn Ile Arg Ser Pro Glu Gly Ser Arg Lys Asn
1265 1270 1275
cca gct aga aca tgt cgt gat ttg aaa atg tgt cac tcc gat tgg 4056
Pro Ala Arg Thr Cys Arg Asp Leu Lys Met Cys His Ser Asp Trp
1280 1285 1290
aaa agt ggt gaa tac tgg att gat cca aat cag ggc tgt aat ctc 4101
Lys Ser Gly Glu Tyr Trp Ile Asp Pro Asn Gln Gly Cys Asn Leu
1295 1300 1305
gat gct atc aaa gtt ttc tgt aac atg gaa aca ggc gaa aca tgc 4146
Asp Ala Ile Lys Val Phe Cys Asn Met Glu Thr Gly Glu Thr Cys
1310 1315 1320
gtt tat cct act caa cct tcc gtg gct cag aaa aat tgg tac atc 4191
Val Tyr Pro Thr Gln Pro Ser Val Ala Gln Lys Asn Trp Tyr Ile
1325 1330 1335
tca aaa aat cct aaa gat aag agg cac gtt tgg ttc ggt gaa agt 4236
Ser Lys Asn Pro Lys Asp Lys Arg His Val Trp Phe Gly Glu Ser
1340 1345 1350
atg act gat gga ttt caa ttt gag tac ggc ggt caa ggt agt gat 4281
Met Thr Asp Gly Phe Gln Phe Glu Tyr Gly Gly Gln Gly Ser Asp
1355 1360 1365
cca gct gat gtg gct att caa ctc aca ttt ttg cgt ctt atg tcc 4326
Pro Ala Asp Val Ala Ile Gln Leu Thr Phe Leu Arg Leu Met Ser
1370 1375 1380
aca gag gca tca caa aac atc act tac cac tgc aaa aac agt gtg 4371
Thr Glu Ala Ser Gln Asn Ile Thr Tyr His Cys Lys Asn Ser Val
1385 1390 1395
gct tat atg gat caa caa aca gga aac ctt aag aag gct ctt ctt 4416
Ala Tyr Met Asp Gln Gln Thr Gly Asn Leu Lys Lys Ala Leu Leu
1400 1405 1410
ttg aag ggc tca aac gag att gag att aga gca gag ggc aac tca 4461
Leu Lys Gly Ser Asn Glu Ile Glu Ile Arg Ala Glu Gly Asn Ser
1415 1420 1425
agg ttt act tat tca gtt act gtt gat ggc tgc act tca cat act 4506
Arg Phe Thr Tyr Ser Val Thr Val Asp Gly Cys Thr Ser His Thr
1430 1435 1440
ggc gct tgg ggt aaa aca gtt atc gag tat aag act aca aaa aca 4551
Gly Ala Trp Gly Lys Thr Val Ile Glu Tyr Lys Thr Thr Lys Thr
1445 1450 1455
tca aga ctc cca atc att gat gtt gct cct ctc gat gtt ggc gct 4596
Ser Arg Leu Pro Ile Ile Asp Val Ala Pro Leu Asp Val Gly Ala
1460 1465 1470
cct gat caa gag ttc ggt ttt gat gtg ggc cca gtt tgt ttc ctc 4641
Pro Asp Gln Glu Phe Gly Phe Asp Val Gly Pro Val Cys Phe Leu
1475 1480 1485
taa tgagctcgcg gccgcatc 4662
<210>3
<211>1489
<212>PRT
<213> Artificial sequence
<220>
<223> vacuolar signal sequence and flanking regions of barley thiol protease aleurain precursor gene fused to human collagen alpha 1(I) chain
Synthetic sequences of coding and flanking regions
<400>3
Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Gln
35 40 45
Glu Glu Gly Gln Val Glu Gly Gln Asp Glu Asp Ile Pro Pro Ile Thr
50 55 60
Cys Val Gln Asn Gly Leu Arg Tyr His Asp Arg Asp Val Trp Lys Pro
65 70 75 80
Glu Pro Cys Arg Ile Cys Val Cys Asp Asn Gly Lys Val Leu Cys Asp
85 90 95
Asp Val Ile Cys Asp Glu Thr Lys Asn Cys Pro Gly Ala Glu Val Pro
100 105 110
Glu Gly Glu Cys Cys Pro Val Cys Pro Asp Gly Ser Glu Ser Pro Thr
115 120 125
Asp Gln Glu Thr Thr Gly Val Glu Gly Pro Lys Gly Asp Thr Gly Pro
130 135 140
Arg Gly Pro Arg Gly Pro Ala Gly Pro Pro Gly Arg Asp Gly Ile Pro
145 150 155 160
Gly Gln Pro Gly Leu Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly
165 170 175
Pro Pro Gly Leu Gly Gly Asn Phe Ala Pro Gln Leu Ser Tyr Gly Tyr
180 185 190
Asp Glu Lys Ser Thr Gly Gly Ile Ser Val Pro Gly Pro Met Gly Pro
195 200 205
Ser Gly Pro Arg Gly Leu Pro Gly Pro Pro Gly Ala Pro Gly Pro Gln
210 215 220
Gly Phe Gln Gly Pro Pro Gly Glu Pro Gly Glu Pro Gly Ala Ser Gly
225 230 235 240
Pro Met Gly Pro Arg Gly Pro Pro Gly Pro Pro Gly Lys Asn Gly Asp
245 250 255
Asp Gly Glu Ala Gly Lys Pro Gly Arg Pro Gly Glu Arg Gly Pro Pro
260 265 270
Gly Pro Gln Gly Ala Arg Gly Leu Pro Gly Thr Ala Gly Leu Pro Gly
275 280 285
Met Lys Gly His Arg Gly Phe Ser Gly Leu Asp Gly Ala Lys Gly Asp
290 295 300
Ala Gly Pro Ala Gly Pro Lys Gly Glu Pro Gly Ser Pro Gly Glu Asn
305 310 315 320
Gly Ala Pro Gly Gln Met Gly Pro Arg Gly Leu Pro Gly Glu Arg Gly
325 330 335
Arg Pro Gly Ala Pro Gly Pro Ala Gly Ala Arg Gly Asn Asp Gly Ala
340 345 350
Thr Gly Ala Ala Gly Pro Pro Gly Pro Thr Gly Pro Ala Gly Pro Pro
355 360 365
Gly Phe Pro Gly Ala Val Gly Ala Lys Gly Glu Ala Gly Pro Gln Gly
370 375 380
Pro Arg Gly Ser Glu Gly Pro Gln Gly Val Arg Gly Glu Pro Gly Pro
385 390 395 400
Pro Gly Pro Ala Gly Ala Ala Gly Pro Ala Gly Asn Pro Gly Ala Asp
405 410 415
Gly Gln Pro Gly Ala Lys Gly Ala Asn Gly Ala Pro Gly Ile Ala Gly
420 425 430
Ala Pro Gly Phe Pro Gly Ala Arg Gly Pro Ser Gly Pro Gln Gly Pro
435 440 445
Gly Gly Pro Pro Gly Pro Lys Gly Asn Ser Gly Glu Pro Gly Ala Pro
450 455 460
Gly Ser Lys Gly Asp Thr Gly Ala Lys Gly Glu Pro Gly Pro Val Gly
465 470 475 480
Val Gln Gly Pro Pro Gly Pro Ala Gly Glu Glu Gly Lys Arg Gly Ala
485 490 495
Arg Gly Glu Pro Gly Pro Thr Gly Leu Pro Gly Pro Pro Gly Glu Arg
500 505 510
Gly Gly Pro Gly Ser Arg Gly Phe Pro Gly Ala Asp Gly Val Ala Gly
515 520 525
Pro Lys Gly Pro Ala Gly Glu Arg Gly Ser Pro Gly Pro Ala Gly Pro
530 535 540
Lys Gly Ser Pro Gly Glu Ala Gly Arg Pro Gly Glu Ala Gly Leu Pro
545 550 555 560
Gly Ala Lys Gly Leu Thr Gly Ser Pro Gly Ser Pro Gly Pro Asp Gly
565 570 575
Lys Thr Gly Pro Pro Gly Pro Ala Gly Gln Asp Gly Arg Pro Gly Pro
580 585 590
Pro Gly Pro Pro Gly Ala Arg Gly Gln Ala Gly Val Met Gly Phe Pro
595 600 605
Gly Pro Lys Gly Ala Ala Gly Glu Pro Gly Lys Ala Gly Glu Arg Gly
610 615 620
Val Pro Gly Pro Pro Gly Ala Val Gly Pro Ala Gly Lys Asp Gly Glu
625 630 635 640
Ala Gly Ala Gln Gly Pro Pro Gly Pro Ala Gly Pro Ala Gly Glu Arg
645 650 655
Gly Glu Gln Gly Pro Ala Gly Ser Pro Gly Phe Gln Gly Leu Pro Gly
660 665 670
Pro Ala Gly Pro Pro Gly Glu Ala Gly Lys Pro Gly Glu Gln Gly Val
675 680 685
Pro Gly Asp Leu Gly Ala Pro Gly Pro Ser Gly Ala Arg Gly Glu Arg
690 695 700
Gly Phe Pro Gly Glu Arg Gly Val Gln Gly Pro Pro Gly Pro Ala Gly
705 710 715 720
Pro Arg Gly Ala Asn Gly Ala Pro Gly Asn Asp Gly Ala Lys Gly Asp
725 730 735
Ala Gly Ala Pro Gly Ala Pro Gly Ser Gln Gly Ala Pro Gly Leu Gln
740 745 750
Gly Met Pro Gly Glu Arg Gly Ala Ala Gly Leu Pro Gly Pro Lys Gly
755 760 765
Asp Arg Gly Asp Ala Gly Pro Lys Gly Ala Asp Gly Ser Pro Gly Lys
770 775 780
Asp Gly Val Arg Gly Leu Thr Gly Pro Ile Gly Pro Pro Gly Pro Ala
785 790 795 800
Gly Ala Pro Gly Asp Lys Gly Glu Ser Gly Pro Ser Gly Pro Ala Gly
805 810 815
Pro Thr Gly Ala Arg Gly Ala Pro Gly Asp Arg Gly Glu Pro Gly Pro
820 825 830
Pro Gly Pro Ala Gly Phe Ala Gly Pro Pro Gly Ala Asp Gly Gln Pro
835 840 845
Gly Ala Lys Gly Glu Pro Gly Asp Ala Gly Ala Lys Gly Asp Ala Gly
850 855 860
Pro Pro Gly Pro Ala Gly Pro Ala Gly Pro Pro Gly Pro Ile Gly Asn
865 870 875 880
Val Gly Ala Pro Gly Ala Lys Gly Ala Arg Gly Ser Ala Gly Pro Pro
885 890 895
Gly Ala Thr Gly Phe Pro Gly Ala Ala Gly Arg Val Gly Pro Pro Gly
900 905 910
Pro Ser Gly Asn Ala Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Lys
915 920 925
Glu Gly Gly Lys Gly Pro Arg Gly Glu Thr Gly Pro Ala Gly Arg Pro
930 935 940
Gly Glu Val Gly Pro Pro Gly Pro Pro Gly Pro Ala Gly Glu Lys Gly
945 950 955 960
Ser Pro Gly Ala Asp Gly Pro Ala Gly Ala Pro Gly Thr Pro Gly Pro
965 970 975
Gln Gly Ile Ala Gly Gln Arg Gly Val Val Gly Leu Pro Gly Gln Arg
980 985 990
Gly Glu Arg Gly Phe Pro Gly Leu Pro Gly Pro Ser Gly Glu Pro Gly
995 1000 1005
Lys Gln Gly Pro Ser Gly Ala Ser Gly Glu Arg Gly Pro Pro Gly
1010 1015 1020
Pro Met Gly Pro Pro Gly Leu Ala Gly Pro Pro Gly Glu Ser Gly
1025 1030 1035
Arg Glu Gly Ala Pro Gly Ala Glu Gly Ser Pro Gly Arg Asp Gly
1040 1045 1050
Ser Pro Gly Ala Lys Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly
1055 1060 1065
Pro Pro Gly Ala Pro Gly Ala Pro Gly Ala Pro Gly Pro Val Gly
1070 1075 1080
Pro Ala Gly Lys Ser Gly Asp Arg Gly Glu Thr Gly Pro Ala Gly
1085 1090 1095
Pro Ala Gly Pro Val Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly
1100 1105 1110
Pro Gln Gly Pro Arg Gly Asp Lys Gly Glu Thr Gly Glu Gln Gly
1115 1120 1125
Asp Arg Gly Ile Lys Gly His Arg Gly Phe Ser Gly Leu Gln Gly
1130 1135 1140
Pro Pro Gly Pro Pro Gly Ser Pro Gly Glu Gln Gly Pro Ser Gly
1145 1150 1155
Ala Ser Gly Pro Ala Gly Pro Arg Gly Pro Pro Gly Ser Ala Gly
1160 1165 1170
Ala Pro Gly Lys Asp Gly Leu Asn Gly Leu Pro Gly Pro Ile Gly
1175 1180 1185
Pro Pro Gly Pro Arg Gly Arg Thr Gly Asp Ala Gly Pro Val Gly
1190 1195 1200
Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Ser
1205 1210 1215
Ala Gly Phe Asp Phe Ser Phe Leu Pro Gln Pro Pro Gln Glu Lys
1220 1225 1230
Ala His Asp Gly Gly Arg Tyr Tyr Arg Ala Asp Asp Ala Asn Val
1235 1240 1245
Val Arg Asp Arg Asp Leu Glu Val Asp Thr Thr Leu Lys Ser Leu
1250 1255 1260
Ser Gln Gln Ile Glu Asn Ile Arg Ser Pro Glu Gly Ser Arg Lys
1265 1270 1275
Asn Pro Ala Arg Thr Cys Arg Asp Leu Lys Met Cys His Ser Asp
1280 1285 1290
Trp Lys Ser Gly Glu Tyr Trp Ile Asp Pro Asn Gln Gly Cys Asn
1295 1300 1305
Leu Asp Ala Ile Lys Val Phe Cys Asn Met Glu Thr Gly Glu Thr
1310 1315 1320
Cys Val Tyr Pro Thr Gln Pro Ser Val Ala Gln Lys Asn Trp Tyr
1325 1330 1335
Ile Ser Lys Asn Pro Lys Asp Lys Arg His Val Trp Phe Gly Glu
1340 1345 1350
Ser Met Thr Asp Gly Phe Gln Phe Glu Tyr Gly Gly Gln Gly Ser
1355 1360 1365
Asp Pro Ala Asp Val Ala Ile Gln Leu Thr Phe Leu Arg Leu Met
1370 1375 1380
Ser Thr Glu Ala Ser Gln Asn Ile Thr Tyr His Cys Lys Asn Ser
1385 1390 1395
Val Ala Tyr Met Asp Gln Gln Thr Gly Asn Leu Lys Lys Ala Leu
1400 1405 1410
Leu Leu Lys Gly Ser Asn Glu Ile Glu Ile Arg Ala Glu Gly Asn
1415 1420 1425
Ser Arg Phe Thr Tyr Ser Val Thr Val Asp Gly Cys Thr Ser His
1430 1435 1440
Thr Gly Ala Trp Gly Lys Thr Val Ile Glu Tyr Lys Thr Thr Lys
1445 1450 1455
Thr Ser Arg Leu Pro Ile Ile Asp Val Ala Pro Leu Asp Val Gly
1460 1465 1470
Ala Pro Asp Gln Glu Phe Gly Phe Asp Val Gly Pro Val Cys Phe
1475 1480 1485
Leu
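The protein entries in this listing use three-letter residue codes. The following is a minimal sketch, not part of the original listing, that converts a stretch of three-letter codes to one-letter form and checks for the Gly-X-Y periodicity characteristic of the collagen triple-helical domain. The function names are hypothetical. The sample stretch is copied from SEQ ID NO:3, residues 161 to 176.

```python
# Standard three-letter to one-letter amino acid codes (20 canonical residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert whitespace-separated three-letter codes (e.g. 'Gly Pro Pro') to one-letter form."""
    return "".join(THREE_TO_ONE[code] for code in three_letter.split())


def is_gly_x_y(seq: str) -> bool:
    """Return True if every third residue, starting at index 0, is glycine."""
    return all(seq[i] == "G" for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # SEQ ID NO:3, residues 161-176 (hypothetical usage example).
    stretch = "Gly Gln Pro Gly Leu Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly"
    one = to_one_letter(stretch)
    print(one)              # GQPGLPGPPGPPGPPG
    print(is_gly_x_y(one))  # True
```

The check assumes the stretch is already in frame with a glycine at its first position. Telopeptide and propeptide regions, such as the N-terminal signal and C-terminal propeptide of SEQ ID NO:3, are not expected to follow this pattern.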
<210>4
<211>4362
<212>DNA
<213> Artificial sequence
<220>
<223> vacuolar signal sequence of barley thiol protease aleurain precursor gene fused with human collagen alpha 2(I) chain
Synthetic sequences of coding and flanking regions
<400>4
gcgatgcatg taatgtcatg agccacatga tccaatggcc acaggaacgt aagaatgtag 60
atagatttga ttttgtccgt tagatagcaa acaacattat aaaaggtgtg tatcaatacg 120
aactaattca ctcattggat tcatagaagt ccattcctcc taagtatcta aaccatggct 180
cacgctcgtg ttctcctcct cgctctcgct gttttggcaa cagctgctgt ggctgtggct 240
tcaagttcta gttttgctga ttccaaccca attcgtccag ttactgatag agcagcttcc 300
actttggctc aattgcttca agaagaaact gtgaggaagg gccctgctgg cgataggggc 360
cctaggggcg aaaggggtcc accaggacct ccaggcaggg atggcgaaga tggtccaact 420
ggccctcctg gacctcctgg ccctccaggg ccacccggct tgggcggaaa cttcgcagct 480
caatacgatg gcaagggtgt tggtcttggt cctggtccta tgggcttgat gggacctaga 540
ggcccacctg gtgctgctgg tgctcctgga ccacagggtt ttcagggacc agctggcgag 600
ccaggagagc caggccaaac aggaccagct ggtgcaaggg gacctgctgg acctcctgga 660
aaagctggtg aagatggtca cccaggcaaa ccaggacgtc ctggcgaaag aggtgttgtt 720
ggaccacaag gcgctagggg atttccaggt acacctggat tgccaggttt taagggcatt 780
cgtggtcata acggcctcga tggattgaag ggacagcctg gcgcacctgg cgttaagggt 840
gaacctggag caccaggtga aaacggtact cctggccaga ctggtgcaag aggactccca 900
ggtgaaaggg gtagagttgg tgctcctgga cctgctggag ctaggggtag tgatggtagt 960
gttggtcctg tgggccctgc tggtccaatc ggttccgctg gcccacctgg attcccaggc 1020
gctccaggac ctaaaggaga aatcggtgct gtgggtaacg caggtcctac tggtccagca 1080
ggtcctcgtg gagaagtggg attgccagga ctttctggtc cagtgggccc tccaggcaac 1140
cctggagcta acggcttgac aggagctaaa ggcgcagcag gactccctgg agtggctggc 1200
gcaccaggat tgcctggtcc aaggggtatc ccaggccctg ttggcgcagc tggagctact 1260
ggtgcacgtg gacttgttgg cgaaccaggc cctgctggat caaaaggcga gtctggaaat 1320
aagggagaac ctggttctgc tggacctcaa ggtcctcctg gaccttctgg agaagaagga 1380
aaaaggggac caaatggcga ggctggatca gcaggtccac caggaccacc tggacttcgt 1440
ggatcccctg gtagtagagg acttccaggc gctgatggta gagcaggcgt tatgggacca 1500
ccaggaagta gaggagcatc cggtccagca ggagttaggg gtcctaacgg agatgctggt 1560
agaccaggtg aaccaggtct tatgggccca aggggcctcc caggtagtcc aggaaatatc 1620
ggccctgctg gaaaagaagg ccctgttgga cttccaggta ttgatggacg tcctggccct 1680
attggcccag caggtgcaag aggagaacct ggcaatattg gatttccagg accaaagggt 1740
ccaacaggcg atcctggaaa aaatggagat aagggtcatg ctggattggc aggcgcaagg 1800
ggcgctcctg gtccagatgg aaacaacggc gcacagggtc cacctggccc tcagggtgtt 1860
caaggcggaa aaggcgaaca aggcccagct ggaccaccag gctttcaagg cttgccagga 1920
ccaagtggtc cagcaggtga agttggcaag ccaggcgagc gtggacttca tggcgagttt 1980
ggactccctg gaccagcagg accaaggggt gaaagaggcc ctcctggaga gagtggcgct 2040
gctggaccaa caggcccaat cggtagtaga ggtcctagtg gacctccagg cccagatgga 2100
aataagggtg aaccaggagt tgtgggcgct gttggaacag ctggtccttc aggaccatca 2160
ggactcccag gcgagagagg cgctgctggc attcctggag gaaaaggtga aaaaggcgaa 2220
cctggcctcc gtggcgaaat cggaaatcct ggacgtgatg gtgctcgtgg tgcacacggc 2280
gctgtgggcg ctccaggccc tgctggtgct actggtgata gaggagaggc tggcgcagct 2340
ggcccagcag gtcctgctgg cccaaggggt agtcctggtg aaagaggcga agttggacct 2400
gctggcccta acggctttgc tggccctgct ggagcagcag gtcaacctgg cgctaaaggt 2460
gaaaggggcg gaaagggccc aaaaggtgaa aatggcgttg tgggaccaac tggtccagtg 2520
ggcgcagctg gacctgctgg tccaaatgga ccaccaggac cagcaggtag tagaggagat 2580
ggtggacctc caggaatgac aggttttcca ggtgctgctg gtagaacagg acctcctggt 2640
cctagtggta tttctggtcc accaggacca ccaggtcctg ctggaaaaga aggattgagg 2700
ggtccacgtg gtgatcaagg accagtgggc agaactggtg aagttggcgc agtgggacca 2760
cctggttttg ctggagaaaa gggcccttct ggagaggcag gaacagctgg tcctcctggt 2820
acacctggac ctcaaggact tttgggtgca cctggtattc tcggattgcc aggaagtagg 2880
ggcgaacgtg gacttcctgg cgtggcagga gcagttggag aacctggccc tctcggaatc 2940
gcaggcccac caggcgcaag aggaccacca ggagctgttg gatcaccagg cgtgaatggt 3000
gcacctggcg aggctggtcg tgatggaaac ccaggaaatg atggcccacc aggaagagat 3060
ggtcaacctg gacacaaagg cgagaggggc tacccaggaa atattggccc agttggtgct 3120
gctggcgcac caggcccaca cggtccagtt ggaccagcag gaaaacacgg taatcgtggc 3180
gaaacaggcc cttcaggccc agtgggacct gctggtgctg ttggcccaag aggaccatct 3240
ggacctcaag gcattagagg cgataaggga gagcctggcg aaaaaggacc tagaggcttg 3300
cctggtttta aaggacacaa cggtctccaa ggacttccag gtatcgctgg tcatcatgga 3360
gatcagggtg ctcctggatc agtgggtcca gcaggtccta gaggcccagc aggcccttcc 3420
ggtccagcag gaaaggatgg acgtactggc caccctggaa ctgtgggccc tgctggaatt 3480
agaggtcctc aaggtcatca gggccctgct ggccctccag gtccaccagg tcctccaggc 3540
ccaccaggag tttcaggtgg tggttacgat tttggttacg atggtgattt ttaccgtgct 3600
gatcaaccta gaagtgctcc ttctctccgt cctaaagatt atgaagttga tgctactttg 3660
aaatcactta acaaccagat tgagactctt ctcacacctg agggatcaag aaagaatcca 3720
gcacgtacat gccgtgatct cagacttagt cacccagagt ggtcaagtgg ctattattgg 3780
attgatccta atcagggttg tacaatggag gctatcaaag tttactgtga ttttccaact 3840
ggagagacat gtattagggc acaacctgag aacattccag ctaaaaattg gtatcgttcc 3900
tctaaagata agaaacatgt ttggctcgga gagactatta acgctggttc tcagttcgag 3960
tataatgttg agggcgttac ttctaaagag atggcaactc agctcgcttt tatgagattg 4020
ctcgctaact acgcatccca aaacatcact tatcactgca aaaattccat tgcatatatg 4080
gatgaggaga caggaaattt gaagaaagca gttattctcc aaggtagtaa cgatgttgag 4140
cttgtggctg agggaaatag tagattcact tacacagttt tggtggatgg atgctcaaag 4200
aaaactaatg agtggggcaa gacaatcatt gagtacaaga caaataagcc ttctaggctc 4260
ccatttctcg atattgcacc tcttgatatc ggaggagctg atcacgagtt ttttgttgat 4320
atcggacctg tttgttttaa gtaatgagct cgcggccgca tc 4362
<210>5
<211>4362
<212>DNA
<213> Artificial sequence
<220>
<223> vacuolar signal sequence and flanking regions of barley thiol protease aleurain precursor gene fused to human collagen alpha 2(I) chain
Synthetic sequences of coding and flanking regions
<220>
<221>CDS
<222>(175)..(4344)
<400>5
gcgatgcatg taatgtcatg agccacatga tccaatggcc acaggaacgt aagaatgtag 60
atagatttga ttttgtccgt tagatagcaa acaacattat aaaaggtgtg tatcaatacg 120
aactaattca ctcattggat tcatagaagt ccattcctcc taagtatcta aacc atg 177
Met
1
gct cac gct cgt gtt ctc ctc ctc gct ctc gct gtt ttg gca aca gct 225
Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr Ala
5 10 15
gct gtg gct gtg gct tca agt tct agt ttt gct gat tcc aac cca att 273
Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro Ile
20 25 30
cgt cca gtt act gat aga gca gct tcc act ttg gct caa ttg ctt caa 321
Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Leu Gln
35 40 45
gaa gaa act gtg agg aag ggc cct gct ggc gat agg ggc cct agg ggc 369
Glu Glu Thr Val Arg Lys Gly Pro Ala Gly Asp Arg Gly Pro Arg Gly
50 55 60 65
gaa agg ggt cca cca gga cct cca ggc agg gat ggc gaa gat ggt cca 417
Glu Arg Gly Pro Pro Gly Pro Pro Gly Arg Asp Gly Glu Asp Gly Pro
70 75 80
act ggc cct cct gga cct cct ggc cct cca ggg cca ccc ggc ttg ggc 465
Thr Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Leu Gly
85 90 95
gga aac ttc gca gct caa tac gat ggc aag ggt gtt ggt ctt ggt cct 513
Gly Asn Phe Ala Ala Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly Pro
100 105 110
ggt cct atg ggc ttg atg gga cct aga ggc cca cct ggt gct gct ggt 561
Gly Pro Met Gly Leu Met Gly Pro Arg Gly Pro Pro Gly Ala Ala Gly
115 120 125
gct cct gga cca cag ggt ttt cag gga cca gct ggc gag cca gga gag 609
Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly Glu
130 135 140 145
cca ggc caa aca gga cca gct ggt gca agg gga cct gct gga cct cct 657
Pro Gly Gln Thr Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro Pro
150 155 160
gga aaa gct ggt gaa gat ggt cac cca ggc aaa cca gga cgt cct ggc 705
Gly Lys Ala Gly Glu Asp Gly His Pro Gly Lys Pro Gly Arg Pro Gly
165 170 175
gaa aga ggt gtt gtt gga cca caa ggc gct agg gga ttt cca ggt aca 753
Glu Arg Gly Val Val Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly Thr
180 185 190
cct gga ttg cca ggt ttt aag ggc att cgt ggt cat aac ggc ctc gat 801
Pro Gly Leu Pro Gly Phe Lys Gly Ile Arg Gly His Asn Gly Leu Asp
195 200 205
gga ttg aag gga cag cct ggc gca cct ggc gtt aag ggt gaa cct gga 849
Gly Leu Lys Gly Gln Pro Gly Ala Pro Gly Val Lys Gly Glu Pro Gly
210 215 220 225
gca cca ggt gaa aac ggt act cct ggc cag act ggt gca aga gga ctc 897
Ala Pro Gly Glu Asn Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly Leu
230 235 240
cca ggt gaa agg ggt aga gtt ggt gct cct gga cct gct gga gct agg 945
Pro Gly Glu Arg Gly Arg Val Gly Ala Pro Gly Pro Ala Gly Ala Arg
245 250 255
ggt agt gat ggt agt gtt ggt cct gtg ggc cct gct ggt cca atc ggt 993
Gly Ser Asp Gly Ser Val Gly Pro Val Gly Pro Ala Gly Pro Ile Gly
260 265 270
tcc gct ggc cca cct gga ttc cca ggc gct cca gga cct aaa gga gaa 1041
Ser Ala Gly Pro Pro Gly Phe Pro Gly Ala Pro Gly Pro Lys Gly Glu
275 280 285
atc ggt gct gtg ggt aac gca ggt cct act ggt cca gca ggt cct cgt 1089
Ile Gly Ala Val Gly Asn Ala Gly Pro Thr Gly Pro Ala Gly Pro Arg
290 295 300 305
gga gaa gtg gga ttg cca gga ctt tct ggt cca gtg ggc cct cca ggc 1137
Gly Glu Val Gly Leu Pro Gly Leu Ser Gly Pro Val Gly Pro Pro Gly
310 315 320
aac cct gga gct aac ggc ttg aca gga gct aaa ggc gca gca gga ctc 1185
Asn Pro Gly Ala Asn Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly Leu
325 330 335
cct gga gtg gct ggc gca cca gga ttg cct ggt cca agg ggt atc cca 1233
Pro Gly Val Ala Gly Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile Pro
340 345 350
ggc cct gtt ggc gca gct gga gct act ggt gca cgt gga ctt gtt ggc 1281
Gly Pro Val Gly Ala Ala Gly Ala Thr Gly Ala Arg Gly Leu Val Gly
355 360 365
gaa cca ggc cct gct gga tca aaa ggc gag tct gga aat aag gga gaa 1329
Glu Pro Gly Pro Ala Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly Glu
370 375 380 385
cct ggt tct gct gga cct caa ggt cct cct gga cct tct gga gaa gaa 1377
Pro Gly Ser Ala Gly Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu Glu
390 395 400
gga aaa agg gga cca aat ggc gag gct gga tca gca ggt cca cca gga 1425
Gly Lys Arg Gly Pro Asn Gly Glu Ala Gly Ser Ala Gly Pro Pro Gly
405 410 415
cca cct gga ctt cgt gga tcc cct ggt agt aga gga ctt cca ggc gct 1473
Pro Pro Gly Leu Arg Gly Ser Pro Gly Ser Arg Gly Leu Pro Gly Ala
420 425 430
gat ggt aga gca ggc gtt atg gga cca cca gga agt aga gga gca tcc 1521
Asp Gly Arg Ala Gly Val Met Gly Pro Pro Gly Ser Arg Gly Ala Ser
435 440 445
ggt cca gca gga gtt agg ggt cct aac gga gat gct ggt aga cca ggt 1569
Gly Pro Ala Gly Val Arg Gly Pro Asn Gly Asp Ala Gly Arg Pro Gly
450 455 460 465
gaa cca ggt ctt atg ggc cca agg ggc ctc cca ggt agt cca gga aat 1617
Glu Pro Gly Leu Met Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly Asn
470 475 480
atc ggc cct gct gga aaa gaa ggc cct gtt gga ctt cca ggt att gat 1665
Ile Gly Pro Ala Gly Lys Glu Gly Pro Val Gly Leu Pro Gly Ile Asp
485 490 495
gga cgt cct ggc cct att ggc cca gca ggt gca aga gga gaa cct ggc 1713
Gly Arg Pro Gly Pro Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro Gly
500 505 510
aat att gga ttt cca gga cca aag ggt cca aca ggc gat cct gga aaa 1761
Asn Ile Gly Phe Pro Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly Lys
515 520 525
aat gga gat aag ggt cat gct gga ttg gca ggc gca agg ggc gct cct 1809
Asn Gly Asp Lys Gly His Ala Gly Leu Ala Gly Ala Arg Gly Ala Pro
530 535 540 545
ggt cca gat gga aac aac ggc gca cag ggt cca cct ggc cct cag ggt 1857
Gly Pro Asp Gly Asn Asn Gly Ala Gln Gly Pro Pro Gly Pro Gln Gly
550 555 560
gtt caa ggc gga aaa ggc gaa caa ggc cca gct gga cca cca ggc ttt 1905
Val Gln Gly Gly Lys Gly Glu Gln Gly Pro Ala Gly Pro Pro Gly Phe
565 570 575
caa ggc ttg cca gga cca agt ggt cca gca ggt gaa gtt ggc aag cca 1953
Gln Gly Leu Pro Gly Pro Ser Gly Pro Ala Gly Glu Val Gly Lys Pro
580 585 590
ggc gag cgt gga ctt cat ggc gag ttt gga ctc cct gga cca gca gga 2001
Gly Glu Arg Gly Leu His Gly Glu Phe Gly Leu Pro Gly Pro Ala Gly
595 600 605
cca agg ggt gaa aga ggc cct cct gga gag agt ggc gct gct gga cca 2049
Pro Arg Gly Glu Arg Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly Pro
610 615 620 625
aca ggc cca atc ggt agt aga ggt cct agt gga cct cca ggc cca gat 2097
Thr Gly Pro Ile Gly Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro Asp
630 635 640
gga aat aag ggt gaa cca gga gtt gtg ggc gct gtt gga aca gct ggt 2145
Gly Asn Lys Gly Glu Pro Gly Val Val Gly Ala Val Gly Thr Ala Gly
645 650 655
cct tca gga cca tca gga ctc cca ggc gag aga ggc gct gct ggc att 2193
Pro Ser Gly Pro Ser Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly Ile
660 665 670
cct gga gga aaa ggt gaa aaa ggc gaa cct ggc ctc cgt ggc gaa atc 2241
Pro Gly Gly Lys Gly Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu Ile
675 680 685
gga aat cct gga cgt gat ggt gct cgt ggt gca cac ggc gct gtg ggc 2289
Gly Asn Pro Gly Arg Asp Gly Ala Arg Gly Ala His Gly Ala Val Gly
690 695 700 705
gct cca ggc cct gct ggt gct act ggt gat aga gga gag gct ggc gca 2337
Ala Pro Gly Pro Ala Gly Ala Thr Gly Asp Arg Gly Glu Ala Gly Ala
710 715 720
gct ggc cca gca ggt cct gct ggc cca agg ggt agt cct ggt gaa aga 2385
Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly Ser Pro Gly Glu Arg
725 730 735
ggc gaa gtt gga cct gct ggc cct aac ggc ttt gct ggc cct gct gga 2433
Gly Glu Val Gly Pro Ala Gly Pro Asn Gly Phe Ala Gly Pro Ala Gly
740 745 750
gca gca ggt caa cct ggc gct aaa ggt gaa agg ggc gga aag ggc cca 2481
Ala Ala Gly Gln Pro Gly Ala Lys Gly Glu Arg Gly Gly Lys Gly Pro
755 760 765
aaa ggt gaa aat ggc gtt gtg gga cca act ggt cca gtg ggc gca gct 2529
Lys Gly Glu Asn Gly Val Val Gly Pro Thr Gly Pro Val Gly Ala Ala
770 775 780 785
gga cct gct ggt cca aat gga cca cca gga cca gca ggt agt aga gga 2577
Gly Pro Ala Gly Pro Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg Gly
790 795 800
gat ggt gga cct cca gga atg aca ggt ttt cca ggt gct gct ggt aga 2625
Asp Gly Gly Pro Pro Gly Met Thr Gly Phe Pro Gly Ala Ala Gly Arg
805 810 815
aca gga cct cct ggt cct agt ggt att tct ggt cca cca gga cca cca 2673
Thr Gly Pro Pro Gly Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro Pro
820 825 830
ggt cct gct gga aaa gaa gga ttg agg ggt cca cgt ggt gat caa gga 2721
Gly Pro Ala Gly Lys Glu Gly Leu Arg Gly Pro Arg Gly Asp Gln Gly
835 840 845
cca gtg ggc aga act ggt gaa gtt ggc gca gtg gga cca cct ggt ttt 2769
Pro Val Gly Arg Thr Gly Glu Val Gly Ala Val Gly Pro Pro Gly Phe
850 855 860 865
gct gga gaa aag ggc cct tct gga gag gca gga aca gct ggt cct cct 2817
Ala Gly Glu Lys Gly Pro Ser Gly Glu Ala Gly Thr Ala Gly Pro Pro
870 875 880
ggt aca cct gga cct caa gga ctt ttg ggt gca cct ggt att ctc gga 2865
Gly Thr Pro Gly Pro Gln Gly Leu Leu Gly Ala Pro Gly Ile Leu Gly
885 890 895
ttg cca gga agt agg ggc gaa cgt gga ctt cct ggc gtg gca gga gca 2913
Leu Pro Gly Ser Arg Gly Glu Arg Gly Leu Pro Gly Val Ala Gly Ala
900 905 910
gtt gga gaa cct ggc cct ctc gga atc gca ggc cca cca ggc gca aga 2961
Val Gly Glu Pro Gly Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala Arg
915 920 925
gga cca cca gga gct gtt gga tca cca ggc gtg aat ggt gca cct ggc 3009
Gly Pro Pro Gly Ala Val Gly Ser Pro Gly Val Asn Gly Ala Pro Gly
930 935 940 945
gag gct ggt cgt gat gga aac cca gga aat gat ggc cca cca gga aga 3057
Glu Ala Gly Arg Asp Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly Arg
950 955 960
gat ggt caa cct gga cac aaa ggc gag agg ggc tac cca gga aat att 3105
Asp Gly Gln Pro Gly His Lys Gly Glu Arg Gly Tyr Pro Gly Asn Ile
965 970 975
ggc cca gtt ggt gct gct ggc gca cca ggc cca cac ggt cca gtt gga 3153
Gly Pro Val Gly Ala Ala Gly Ala Pro Gly Pro His Gly Pro Val Gly
980 985 990
cca gca gga aaa cac ggt aat cgt ggc gaa aca ggc cct tca ggc cca 3201
Pro Ala Gly Lys His Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly Pro
995 1000 1005
gtg gga cct gct ggt gct gtt ggc cca aga gga cca tct gga cct 3246
Val Gly Pro Ala Gly Ala Val Gly Pro Arg Gly Pro Ser Gly Pro
1010 1015 1020
caa ggc att aga ggc gat aag gga gag cct ggc gaa aaa gga cct 3291
Gln Gly Ile Arg Gly Asp Lys Gly Glu Pro Gly Glu Lys Gly Pro
1025 1030 1035
aga ggc ttg cct ggt ttt aaa gga cac aac ggt ctc caa gga ctt 3336
Arg Gly Leu Pro Gly Phe Lys Gly His Asn Gly Leu Gln Gly Leu
1040 1045 1050
cca ggt atc gct ggt cat cat gga gat cag ggt gct cct gga tca 3381
Pro Gly Ile Ala Gly His His Gly Asp Gln Gly Ala Pro Gly Ser
1055 1060 1065
gtg ggt cca gca ggt cct aga ggc cca gca ggc cct tcc ggt cca 3426
Val Gly Pro Ala Gly Pro Arg Gly Pro Ala Gly Pro Ser Gly Pro
1070 1075 1080
gca gga aag gat gga cgt act ggc cac cct gga act gtg ggc cct 3471
Ala Gly Lys Asp Gly Arg Thr Gly His Pro Gly Thr Val Gly Pro
1085 1090 1095
gct gga att aga ggt cct caa ggt cat cag ggc cct gct ggc cct 3516
Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly Pro Ala Gly Pro
1100 1105 1110
cca ggt cca cca ggt cct cca ggc cca cca gga gtt tca ggt ggt 3561
Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val Ser Gly Gly
1115 1120 1125
ggt tac gat ttt ggt tac gat ggt gat ttt tac cgt gct gat caa 3606
Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala Asp Gln
1130 1135 1140
cct aga agt gct cct tct ctc cgt cct aaa gat tat gaa gtt gat 3651
Pro Arg Ser Ala Pro Ser Leu Arg Pro Lys Asp Tyr Glu Val Asp
1145 1150 1155
gct act ttg aaa tca ctt aac aac cag att gag act ctt ctc aca 3696
Ala Thr Leu Lys Ser Leu Asn Asn Gln Ile Glu Thr Leu Leu Thr
1160 1165 1170
cct gag gga tca aga aag aat cca gca cgt aca tgc cgt gat ctc 3741
Pro Glu Gly Ser Arg Lys Asn Pro Ala Arg Thr Cys Arg Asp Leu
1175 1180 1185
aga ctt agt cac cca gag tgg tca agt ggc tat tat tgg att gat 3786
Arg Leu Ser His Pro Glu Trp Ser Ser Gly Tyr Tyr Trp Ile Asp
1190 1195 1200
cct aat cag ggt tgt aca atg gag gct atc aaa gtt tac tgt gat 3831
Pro Asn Gln Gly Cys Thr Met Glu Ala Ile Lys Val Tyr Cys Asp
1205 1210 1215
ttt cca act gga gag aca tgt att agg gca caa cct gag aac att 3876
Phe Pro Thr Gly Glu Thr Cys Ile Arg Ala Gln Pro Glu Asn Ile
1220 1225 1230
cca gct aaa aat tgg tat cgt tcc tct aaa gat aag aaa cat gtt 3921
Pro Ala Lys Asn Trp Tyr Arg Ser Ser Lys Asp Lys Lys His Val
1235 1240 1245
tgg ctc gga gag act att aac gct ggt tct cag ttc gag tat aat 3966
Trp Leu Gly Glu Thr Ile Asn Ala Gly Ser Gln Phe Glu Tyr Asn
1250 1255 1260
gtt gag ggc gtt act tct aaa gag atg gca act cag ctc gct ttt 4011
Val Glu Gly Val Thr Ser Lys Glu Met Ala Thr Gln Leu Ala Phe
1265 1270 1275
atg aga ttg ctc gct aac tac gca tcc caa aac atc act tat cac 4056
Met Arg Leu Leu Ala Asn Tyr Ala Ser Gln Asn Ile Thr Tyr His
1280 1285 1290
tgc aaa aat tcc att gca tat atg gat gag gag aca gga aat ttg 4101
Cys Lys Asn Ser Ile Ala Tyr Met Asp Glu Glu Thr Gly Asn Leu
1295 1300 1305
aag aaa gca gtt att ctc caa ggt agt aac gat gtt gag ctt gtg 4146
Lys Lys Ala Val Ile Leu Gln Gly Ser Asn Asp Val Glu Leu Val
1310 1315 1320
gct gag gga aat agt aga ttc act tac aca gtt ttg gtg gat gga 4191
Ala Glu Gly Asn Ser Arg Phe Thr Tyr Thr Val Leu Val Asp Gly
1325 1330 1335
tgc tca aag aaa act aat gag tgg ggc aag aca atc att gag tac 4236
Cys Ser Lys Lys Thr Asn Glu Trp Gly Lys Thr Ile Ile Glu Tyr
1340 1345 1350
aag aca aat aag cct tct agg ctc cca ttt ctc gat att gca cct 4281
Lys Thr Asn Lys Pro Ser Arg Leu Pro Phe Leu Asp Ile Ala Pro
1355 1360 1365
ctt gat atc gga gga gct gat cac gag ttt ttt gtt gat atc gga 4326
Leu Asp Ile Gly Gly Ala Asp His Glu Phe Phe Val Asp Ile Gly
1370 1375 1380
cct gtt tgt ttt aag taa tgagctcgcg gccgcatc 4362
Pro Val Cys Phe Lys
1385
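The CDS feature of SEQ ID NO:5 spans nucleotides 175 to 4344 and includes the TAA stop codon. The sketch below is not part of the original listing. It checks that this span is a whole number of codons and that, once the stop codon is dropped, it yields the 1389 residues declared for SEQ ID NO:6. It also translates the first and last codons of the CDS with the standard genetic code (NCBI table 1). The helper names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def cds_protein_length(start: int, end: int) -> int:
    """Residues encoded by a 1-based, inclusive CDS span that ends with a stop codon."""
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"CDS span of {span} nt is not a whole number of codons")
    return span // 3 - 1  # drop the stop codon


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(cds_protein_length(175, 4344))         # 1389
    print(translate("atg gct cac gct cgt gtt"))  # MAHARV
    print(translate("cct gtt tgt ttt aag taa"))  # PVCFK*
```

The two translated fragments correspond to the first six and the last six codons of the CDS above. They match "Met Ala His Ala Arg Val" at the start and "Pro Val Cys Phe Lys" followed by the stop codon at the end.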
<210>6
<211>1389
<212>PRT
<213> Artificial sequence
<220>
<223> vacuolar signal sequence of barley thiol protease aleurain precursor gene fused with human collagen alpha 2(I) chain
Synthetic sequences of coding and flanking regions
<400>6
Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Leu
35 40 45
Gln Glu Glu Thr Val Arg Lys Gly Pro Ala Gly Asp Arg Gly Pro Arg
50 55 60
Gly Glu Arg Gly Pro Pro Gly Pro Pro Gly Arg Asp Gly Glu Asp Gly
65 70 75 80
Pro Thr Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Leu
85 90 95
Gly Gly Asn Phe Ala Ala Gln Tyr Asp Gly Lys Gly Val Gly Leu Gly
100 105 110
Pro Gly Pro Met Gly Leu Met Gly Pro Arg Gly Pro Pro Gly Ala Ala
115 120 125
Gly Ala Pro Gly Pro Gln Gly Phe Gln Gly Pro Ala Gly Glu Pro Gly
130 135 140
Glu Pro Gly Gln Thr Gly Pro Ala Gly Ala Arg Gly Pro Ala Gly Pro
145 150 155 160
Pro Gly Lys Ala Gly Glu Asp Gly His Pro Gly Lys Pro Gly Arg Pro
165 170 175
Gly Glu Arg Gly Val Val Gly Pro Gln Gly Ala Arg Gly Phe Pro Gly
180 185 190
Thr Pro Gly Leu Pro Gly Phe Lys Gly Ile Arg Gly His Asn Gly Leu
195 200 205
Asp Gly Leu Lys Gly Gln Pro Gly Ala Pro Gly Val Lys Gly Glu Pro
210 215 220
Gly Ala Pro Gly Glu Asn Gly Thr Pro Gly Gln Thr Gly Ala Arg Gly
225 230 235 240
Leu Pro Gly Glu Arg Gly Arg Val Gly Ala Pro Gly Pro Ala Gly Ala
245 250 255
Arg Gly Ser Asp Gly Ser Val Gly Pro Val Gly Pro Ala Gly Pro Ile
260 265 270
Gly Ser Ala Gly Pro Pro Gly Phe Pro Gly Ala Pro Gly Pro Lys Gly
275 280 285
Glu Ile Gly Ala Val Gly Asn Ala Gly Pro Thr Gly Pro Ala Gly Pro
290 295 300
Arg Gly Glu Val Gly Leu Pro Gly Leu Ser Gly Pro Val Gly Pro Pro
305 310 315 320
Gly Asn Pro Gly Ala Asn Gly Leu Thr Gly Ala Lys Gly Ala Ala Gly
325 330 335
Leu Pro Gly Val Ala Gly Ala Pro Gly Leu Pro Gly Pro Arg Gly Ile
340 345 350
Pro Gly Pro Val Gly Ala Ala Gly Ala Thr Gly Ala Arg Gly Leu Val
355 360 365
Gly Glu Pro Gly Pro Ala Gly Ser Lys Gly Glu Ser Gly Asn Lys Gly
370 375 380
Glu Pro Gly Ser Ala Gly Pro Gln Gly Pro Pro Gly Pro Ser Gly Glu
385 390 395 400
Glu Gly Lys Arg Gly Pro Asn Gly Glu Ala Gly Ser Ala Gly Pro Pro
405 410 415
Gly Pro Pro Gly Leu Arg Gly Ser Pro Gly Ser Arg Gly Leu Pro Gly
420 425 430
Ala Asp Gly Arg Ala Gly Val Met Gly Pro Pro Gly Ser Arg Gly Ala
435 440 445
Ser Gly Pro Ala Gly Val Arg Gly Pro Asn Gly Asp Ala Gly Arg Pro
450 455 460
Gly Glu Pro Gly Leu Met Gly Pro Arg Gly Leu Pro Gly Ser Pro Gly
465 470 475 480
Asn Ile Gly Pro Ala Gly Lys Glu Gly Pro Val Gly Leu Pro Gly Ile
485 490 495
Asp Gly Arg Pro Gly Pro Ile Gly Pro Ala Gly Ala Arg Gly Glu Pro
500 505 510
Gly Asn Ile Gly Phe Pro Gly Pro Lys Gly Pro Thr Gly Asp Pro Gly
515 520 525
Lys Asn Gly Asp Lys Gly His Ala Gly Leu Ala Gly Ala Arg Gly Ala
530 535 540
Pro Gly Pro Asp Gly Asn Asn Gly Ala Gln Gly Pro Pro Gly Pro Gln
545 550 555 560
Gly Val Gln Gly Gly Lys Gly Glu Gln Gly Pro Ala Gly Pro Pro Gly
565 570 575
Phe Gln Gly Leu Pro Gly Pro Ser Gly Pro Ala Gly Glu Val Gly Lys
580 585 590
Pro Gly Glu Arg Gly Leu His Gly Glu Phe Gly Leu Pro Gly Pro Ala
595 600 605
Gly Pro Arg Gly Glu Arg Gly Pro Pro Gly Glu Ser Gly Ala Ala Gly
610 615 620
Pro Thr Gly Pro Ile Gly Ser Arg Gly Pro Ser Gly Pro Pro Gly Pro
625 630 635 640
Asp Gly Asn Lys Gly Glu Pro Gly Val Val Gly Ala Val Gly Thr Ala
645 650 655
Gly Pro Ser Gly Pro Ser Gly Leu Pro Gly Glu Arg Gly Ala Ala Gly
660 665 670
Ile Pro Gly Gly Lys Gly Glu Lys Gly Glu Pro Gly Leu Arg Gly Glu
675 680 685
Ile Gly Asn Pro Gly Arg Asp Gly Ala Arg Gly Ala His Gly Ala Val
690 695 700
Gly Ala Pro Gly Pro Ala Gly Ala Thr Gly Asp Arg Gly Glu Ala Gly
705 710 715 720
Ala Ala Gly Pro Ala Gly Pro Ala Gly Pro Arg Gly Ser Pro Gly Glu
725 730 735
Arg Gly Glu Val Gly Pro Ala Gly Pro Asn Gly Phe Ala Gly Pro Ala
740 745 750
Gly Ala Ala Gly Gln Pro Gly Ala Lys Gly Glu Arg Gly Gly Lys Gly
755 760 765
Pro Lys Gly Glu Asn Gly Val Val Gly Pro Thr Gly Pro Val Gly Ala
770 775 780
Ala Gly Pro Ala Gly Pro Asn Gly Pro Pro Gly Pro Ala Gly Ser Arg
785 790 795 800
Gly Asp Gly Gly Pro Pro Gly Met Thr Gly Phe Pro Gly Ala Ala Gly
805 810 815
Arg Thr Gly Pro Pro Gly Pro Ser Gly Ile Ser Gly Pro Pro Gly Pro
820 825 830
Pro Gly Pro Ala Gly Lys Glu Gly Leu Arg Gly Pro Arg Gly Asp Gln
835 840 845
Gly Pro Val Gly Arg Thr Gly Glu Val Gly Ala Val Gly Pro Pro Gly
850 855 860
Phe Ala Gly Glu Lys Gly Pro Ser Gly Glu Ala Gly Thr Ala Gly Pro
865 870 875 880
Pro Gly Thr Pro Gly Pro Gln Gly Leu Leu Gly Ala Pro Gly Ile Leu
885 890 895
Gly Leu Pro Gly Ser Arg Gly Glu Arg Gly Leu Pro Gly Val Ala Gly
900 905 910
Ala Val Gly Glu Pro Gly Pro Leu Gly Ile Ala Gly Pro Pro Gly Ala
915 920 925
Arg Gly Pro Pro Gly Ala Val Gly Ser Pro Gly Val Asn Gly Ala Pro
930 935 940
Gly Glu Ala Gly Arg Asp Gly Asn Pro Gly Asn Asp Gly Pro Pro Gly
945 950 955 960
Arg Asp Gly Gln Pro Gly His Lys Gly Glu Arg Gly Tyr Pro Gly Asn
965 970 975
Ile Gly Pro Val Gly Ala Ala Gly Ala Pro Gly Pro His Gly Pro Val
980 985 990
Gly Pro Ala Gly Lys His Gly Asn Arg Gly Glu Thr Gly Pro Ser Gly
995 1000 1005
Pro Val Gly Pro Ala Gly Ala Val Gly Pro Arg Gly Pro Ser Gly
1010 1015 1020
Pro Gln Gly Ile Arg Gly Asp Lys Gly Glu Pro Gly Glu Lys Gly
1025 1030 1035
Pro Arg Gly Leu Pro Gly Phe Lys Gly His Asn Gly Leu Gln Gly
1040 1045 1050
Leu Pro Gly Ile Ala Gly His His Gly Asp Gln Gly Ala Pro Gly
1055 1060 1065
Ser Val Gly Pro Ala Gly Pro Arg Gly Pro Ala Gly Pro Ser Gly
1070 1075 1080
Pro Ala Gly Lys Asp Gly Arg Thr Gly His Pro Gly Thr Val Gly
1085 1090 1095
Pro Ala Gly Ile Arg Gly Pro Gln Gly His Gln Gly Pro Ala Gly
1100 1105 1110
Pro Pro Gly Pro Pro Gly Pro Pro Gly Pro Pro Gly Val Ser Gly
1115 1120 1125
Gly Gly Tyr Asp Phe Gly Tyr Asp Gly Asp Phe Tyr Arg Ala Asp
1130 1135 1140
Gln Pro Arg Ser Ala Pro Ser Leu Arg Pro Lys Asp Tyr Glu Val
1145 1150 1155
Asp Ala Thr Leu Lys Ser Leu Asn Asn Gln Ile Glu Thr Leu Leu
1160 1165 1170
Thr Pro Glu Gly Ser Arg Lys Asn Pro Ala Arg Thr Cys Arg Asp
1175 1180 1185
Leu Arg Leu Ser His Pro Glu Trp Ser Ser Gly Tyr Tyr Trp Ile
1190 1195 1200
Asp Pro Asn Gln Gly Cys Thr Met Glu Ala Ile Lys Val Tyr Cys
1205 1210 1215
Asp Phe Pro Thr Gly Glu Thr Cys Ile Arg Ala Gln Pro Glu Asn
1220 1225 1230
Ile Pro Ala Lys Asn Trp Tyr Arg Ser Ser Lys Asp Lys Lys His
1235 1240 1245
Val Trp Leu Gly Glu Thr Ile Asn Ala Gly Ser Gln Phe Glu Tyr
1250 1255 1260
Asn Val Glu Gly Val Thr Ser Lys Glu Met Ala Thr Gln Leu Ala
1265 1270 1275
Phe Met Arg Leu Leu Ala Asn Tyr Ala Ser Gln Asn Ile Thr Tyr
1280 1285 1290
His Cys Lys Asn Ser Ile Ala Tyr Met Asp Glu Glu Thr Gly Asn
1295 1300 1305
Leu Lys Lys Ala Val Ile Leu Gln Gly Ser Asn Asp Val Glu Leu
1310 1315 1320
Val Ala Glu Gly Asn Ser Arg Phe Thr Tyr Thr Val Leu Val Asp
1325 1330 1335
Gly Cys Ser Lys Lys Thr Asn Glu Trp Gly Lys Thr Ile Ile Glu
1340 1345 1350
Tyr Lys Thr Asn Lys Pro Ser Arg Leu Pro Phe Leu Asp Ile Ala
1355 1360 1365
Pro Leu Asp Ile Gly Gly Ala Asp His Glu Phe Phe Val Asp Ile
1370 1375 1380
Gly Pro Val Cys Phe Lys
1385
<210>7
<211>127
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence containing the coding region and flanking region of the apoplast signal of Arabidopsis thaliana endo-1,4-beta-glucanase
<400>7
gccatggcta ggaagtcttt gattttccca gtgattcttc ttgctgtgct tcttttctct 60
ccacctattt actctgctgg acacgattat agggatgctc ttaggaagtc atctatggct 120
caattgc 127
<210>8
<211>127
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence of the apoplast signal and flanking region of Arabidopsis thaliana endo-1,4-beta-glucanase
<220>
<221>CDS
<222>(10)..(120)
<400>8
gccatggct agg aag tct ttg att ttc cca gtg att ctt ctt gct gtg ctt 51
Arg Lys Ser Leu Ile Phe Pro Val Ile Leu Leu Ala Val Leu
1 5 10
ctt ttc tct cca cct att tac tct gct gga cac gat tat agg gat gct 99
Leu Phe Ser Pro Pro Ile Tyr Ser Ala Gly His Asp Tyr Arg Asp Ala
15 20 25 30
ctt agg aag tca tct atg gct caattgc 127
Leu Arg Lys Ser Ser Met Ala
35
<210>9
<211>37
<212>PRT
<213> Artificial sequence
<220>
<223> Synthetic sequence of the apoplast signal and flanking region of Arabidopsis thaliana endo-1,4-beta-glucanase
<400>9
Arg Lys Ser Leu Ile Phe Pro Val Ile Leu Leu Ala Val Leu Leu Phe
1 5 10 15
Ser Pro Pro Ile Tyr Ser Ala Gly His Asp Tyr Arg Asp Ala Leu Arg
20 25 30
Lys Ser Ser Met Ala
35
<210>10
<211>1037
<212>DNA
<213> Artificial sequence
<220>
<223> Chrysanthemum rbcS1 promoter and 5' UTR
<400>10
aaatggcgcg ccaagcttag acaaacaccc cttgttatac aaagaatttc gctttacaaa 60
atcaaattcg agaaaataat atatgcacta aataagatca ttcggatcca atctaaccaa 120
ttacgatacg ctttgggtac acttgatttt tgtttcagta gttacatata tcttgtttta 180
tatgctatct ttaaggatct tcactcaaag actatttgtt gatgttcttg atggggctcg 240
gaagatttga tatgatacac tctaatcttt aggagatacc agccaggatt atattcagta 300
agacaatcaa attttacgtg ttcaaactcg ttatcttttc atttaatgga tgagccagaa 360
tctctataga atgattgcaa tcgagaatat gttcggccga tatccctttg ttggcttcaa 420
tattctacat atcacacaag aatcgaccgt attgtaccct ctttccataa aggaacacac 480
agtatgcaga tgcttttttc ccacatgcag taacataggt attcaaaaat ggctaaaaga 540
agttggataa caaattgaca actatttcca tttctgttat ataaatttca caacacacaa 600
aagcccgtaa tcaagagtct gcccatgtac gaaataactt ctattatttg gtattgggcc 660
taagcccagc tcagagtacg tgggggtacc acatatagga aggtaacaaa atactgcaag 720
atagccccat aacgtaccag cctctcctta ccacgaagag ataagatata agacccaccc 780
tgccacgtgt cacatcgtca tggtggttaa tgataaggga ttacatcctt ctatgtttgt 840
ggacatgatg catgtaatgt catgagccac atgatccaat ggccacagga acgtaagaat 900
gtagatagat ttgattttgt ccgttagata gcaaacaaca ttataaaagg tgtgtatcaa 960
tacgaactaa ttcactcatt ggattcatag aagtccattc ctcctaagta tctaaacata 1020
tgcaattgtc gactaaa 1037
<210>11
<211>975
<212>DNA
<213> Artificial sequence
<220>
<223> Chrysanthemum rbcS1 3' UTR and terminator
<400>11
aaaaggatcc gcggccgcat aagttttact atttaccaag acttttgaat attaaccttc 60
ttgtaacgag tcggttaaat ttgattgttt agggttttgt attatttttt tttggtcttt 120
taattcatca ctttaattcc ctaattgtct gttcatttcg ttgtttgttt ccggatcgat 180
aatgaaatgt aagagatatc atatataaat aataaattgt cgtttcatat ttgcaatctt 240
tttttacaaa cctttaatta attgtatgta tgacattttc ttcttgttat attaggggga 300
aataatgtta aataaaagta caaaataaac tacagtacat cgtactgaat aaattaccta 360
gccaaaaagt acacctttcc atatacttcc tacatgaagg cattttcaac attttcaaat 420
aaggaatgct acaaccgcat aataacatcc acaaattttt ttataaaata acatgtcaga 480
cagtgattga aagattttat tatagtttcg ttatcttctt ttctcattaa gcgaatcact 540
acctaacacg tcattttgtg aaatattttt tgaatgtttt tatatagttg tagcattcct 600
cttttcaaat tagggtttgt ttgagatagc atttcagccg gttcatacaa cttaaaagca 660
tactctaatg ctggaaaaaa gactaaaaaa tcttgtaagt tagcgcagaa tattgaccca 720
aattatatac acacatgacc ccatatagag actaattaca cttttaacca ctaataatta 780
ttactgtatt ataacatcta ctaattaaac ttgtgagttt ttgctagaat tattatcata 840
tatactaaaa ggcaggaacg caaacattgc cccggtactg tagcaactac ggtagacgca 900
ttaattgtct atagtggacg cattaattaa ccaaaaccgc ctctttcccc ttcttcttga 960
agcttgagct ctttt 975
<210>12
<211>1633
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence containing the coding region and flanking regions of the vacuolar signal sequence of the barley thiol protease aleurain precursor, fused with the human prolyl 4-hydroxylase beta subunit
<400>12
ctcgagtaaa ccatggctca tgctagggtt ttgcttttgg ctcttgctgt tcttgctact 60
gctgctgttg ctgtggcttc ttcttcatct ttcgctgatt ctaacccaat taggccagtg 120
actgatagag ctgcttctac tcttgctcaa ttggtcgaca tggatgctcc agaagaggag 180
gatcacgttc ttgtgcttag gaagtctaac ttcgctgaag ctcttgctgc tcacaagtac 240
cttcttgtgg agttttatgc tccttggtgc ggacattgca aagctcttgc tccagagtat 300
gctaaggctg ctggaaagtt gaaggctgag ggatctgaaa ttaggcttgc taaagtggat 360
gctactgagg agtctgatct tgctcaacag tacggagtta ggggataccc aactattaag 420
ttcttcagga acggagatac tgcttctcca aaggagtata ctgctggaag ggaggctgat 480
gatattgtga actggcttaa gaagagaact ggaccagctg ctactactct tccagatgga 540
gctgctgctg aatctcttgt ggagtcatct gaggtggcag tgattggatt cttcaaggat 600
gtggagtctg attctgctaa gcagttcctt caagctgctg aggctattga tgatattcca 660
ttcggaatta cttctaactc tgatgtgttc tctaagtacc agcttgataa ggatggagtg 720
gtgcttttca agaaattcga tgagggaagg aacaatttcg agggagaggt gacaaaggag 780
aaccttcttg atttcattaa gcacaaccag cttccacttg tgattgagtt cactgagcag 840
actgctccaa agattttcgg aggagagatt aagactcaca ttcttctttt ccttccaaag 900
tctgtgtctg attacgatgg aaagttgtct aacttcaaga ctgctgctga gtctttcaag 960
ggaaagattc ttttcatttt cattgattct gatcacactg ataaccagag gattcttgag 1020
ttcttcggac ttaagaagga agagtgccca gctgttaggc ttattactct tgaggaggag 1080
atgactaagt acaagccaga gtctgaagaa cttactgctg agaggattac tgagttctgc 1140
cacagattcc ttgagggaaa gattaagcca caccttatgt ctcaagagct tccagaggat 1200
tgggataagc agccagttaa ggtgttggtg ggtaaaaact tcgaggatgt ggctttcgat 1260
gagaagaaga acgtgttcgt ggagttctac gcaccttggt gtggtcactg taagcagctt 1320
gctccaattt gggataagtt gggagagact tacaaggatc acgagaacat tgtgattgct 1380
aagatggatt ctactgctaa cgaggtggag gctgttaagg ttcactcttt cccaactttg 1440
aagttcttcc cagcttctgc tgataggact gtgattgatt acaacggaga aaggactctt 1500
gatggattca agaagttcct tgagtctgga ggacaagatg gagctggaga tgatgatgat 1560
cttgaggatt tggaagaagc tgaggagcca gatatggagg aggatgatga tcagaaggct 1620
gtgtgatgag ctc 1633
<210>13
<211>537
<212>PRT
<213> Artificial sequence
<220>
<223> Synthetic sequence of the vacuolar signal sequence and flanking regions of the barley thiol protease aleurain precursor, fused with the human prolyl 4-hydroxylase beta subunit
<400>13
Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Val
35 40 45
Asp Met Asp Ala Pro Glu Glu Glu Asp His Val Leu Val Leu Arg Lys
50 55 60
Ser Asn Phe Ala Glu Ala Leu Ala Ala His Lys Tyr Leu Leu Val Glu
65 70 75 80
Phe Tyr Ala Pro Trp Cys Gly His Cys Lys Ala Leu Ala Pro Glu Tyr
85 90 95
Ala Lys Ala Ala Gly Lys Leu Lys Ala Glu Gly Ser Glu Ile Arg Leu
100 105 110
Ala Lys Val Asp Ala Thr Glu Glu Ser Asp Leu Ala Gln Gln Tyr Gly
115 120 125
Val Arg Gly Tyr Pro Thr Ile Lys Phe Phe Arg Asn Gly Asp Thr Ala
130 135 140
Ser Pro Lys Glu Tyr Thr Ala Gly Arg Glu Ala Asp Asp Ile Val Asn
145 150 155 160
Trp Leu Lys Lys Arg Thr Gly Pro Ala Ala Thr Thr Leu Pro Asp Gly
165 170 175
Ala Ala Ala Glu Ser Leu Val Glu Ser Ser Glu Val Ala Val Ile Gly
180 185 190
Phe Phe Lys Asp Val Glu Ser Asp Ser Ala Lys Gln Phe Leu Gln Ala
195 200 205
Ala Glu Ala Ile Asp Asp Ile Pro Phe Gly Ile Thr Ser Asn Ser Asp
210 215 220
Val Phe Ser Lys Tyr Gln Leu Asp Lys Asp Gly Val Val Leu Phe Lys
225 230 235 240
Lys Phe Asp Glu Gly Arg Asn Asn Phe Glu Gly Glu Val Thr Lys Glu
245 250 255
Asn Leu Leu Asp Phe Ile Lys His Asn Gln Leu Pro Leu Val Ile Glu
260 265 270
Phe Thr Glu Gln Thr Ala Pro Lys Ile Phe Gly Gly Glu Ile Lys Thr
275 280 285
His Ile Leu Leu Phe Leu Pro Lys Ser Val Ser Asp Tyr Asp Gly Lys
290 295 300
Leu Ser Asn Phe Lys Thr Ala Ala Glu Ser Phe Lys Gly Lys Ile Leu
305 310 315 320
Phe Ile Phe Ile Asp Ser Asp His Thr Asp Asn Gln Arg Ile Leu Glu
325 330 335
Phe Phe Gly Leu Lys Lys Glu Glu Cys Pro Ala Val Arg Leu Ile Thr
340 345 350
Leu Glu Glu Glu Met Thr Lys Tyr Lys Pro Glu Ser Glu Glu Leu Thr
355 360 365
Ala Glu Arg Ile Thr Glu Phe Cys His Arg Phe Leu Glu Gly Lys Ile
370 375 380
Lys Pro His Leu Met Ser Gln Glu Leu Pro Glu Asp Trp Asp Lys Gln
385 390 395 400
Pro Val Lys Val Leu Val Gly Lys Asn Phe Glu Asp Val Ala Phe Asp
405 410 415
Glu Lys Lys Asn Val Phe Val Glu Phe Tyr Ala Pro Trp Cys Gly His
420 425 430
Cys Lys Gln Leu Ala Pro Ile Trp Asp Lys Leu Gly Glu Thr Tyr Lys
435 440 445
Asp His Glu Asn Ile Val Ile Ala Lys Met Asp Ser Thr Ala Asn Glu
450 455 460
Val Glu Ala Val Lys Val His Ser Phe Pro Thr Leu Lys Phe Phe Pro
465 470 475 480
Ala Ser Ala Asp Arg Thr Val Ile Asp Tyr Asn Gly Glu Arg Thr Leu
485 490 495
Asp Gly Phe Lys Lys Phe Leu Glu Ser Gly Gly Gln Asp Gly Ala Gly
500 505 510
Asp Asp Asp Asp Leu Glu Asp Leu Glu Glu Ala Glu Glu Pro Asp Met
515 520 525
Glu Glu Asp Asp Asp Gln Lys Ala Val
530 535
<210>14
<211>1723
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence containing the coding region and flanking regions of the vacuolar signal sequence of the barley thiol protease aleurain precursor, fused with the human prolyl 4-hydroxylase alpha-1 subunit
<400>14
ctcgagtaaa ccatggctca tgctagggtt ttgcttttgg ctcttgctgt tcttgctact 60
gctgctgttg ctgtggcttc ttcttcatct ttcgctgatt ctaacccaat taggccagtg 120
actgatagag ctgcttctac tcttgctcaa ttggtcgaca tgcacccagg attcttcact 180
tctattggac agatgactga tcttattcac actgagaagg atcttgtgac ttctcttaag 240
gattacatta aggctgagga ggataagttg gagcagatta agaagtgggc tgagaagttg 300
gataggctta cttctactgc tacaaaagat ccagagggat tcgttggtca tccagtgaac 360
gctttcaagt tgatgaagag gcttaacact gagtggagtg agcttgagaa ccttgtgctt 420
aaggatatgt ctgatggatt catttctaac cttactattc agaggcagta cttcccaaat 480
gatgaggatc aagtgggagc tgctaaggct cttcttaggc ttcaggatac ttacaacctt 540
gatactgata caatttctaa gggaaacctt ccaggagtta agcacaagtc tttccttact 600
gctgaggatt gcttcgagct tggaaaggtt gcatacactg aggctgatta ctaccacact 660
gagctttgga tggaacaagc tcttaggcaa cttgatgagg gagagatttc tactattgat 720
aaggtgtcag tgcttgatta cctttcttac gctgtgtacc agcagggtga tcttgataag 780
gctcttttgc ttactaagaa gttgcttgag cttgatccag aacatcagag ggctaacgga 840
aaccttaagt acttcgagta cattatggct aaggaaaagg atgtgaacaa gtctgcttct 900
gatgatcagt ctgatcaaaa gactactcca aagaagaagg gagtggctgt tgattatctt 960
cctgagaggc agaagtatga gatgttgtgt aggggagagg gtattaagat gactccaagg 1020
aggcagaaga agttgttctg caggtatcac gatggaaaca ggaacccaaa gttcattctt 1080
gctccagcta agcaagaaga tgagtgggat aagccaagga ttattaggtt ccacgatatt 1140
atttctgatg ctgagattga gattgtgaag gatcttgcta agccaagact taggagggct 1200
actatttcta accctattac tggtgatctt gagactgtgc actacaggat ttctaagtct 1260
gcttggcttt ctggatacga gaacccagtg gtgtctagga ttaacatgag gattcaggat 1320
cttactggac ttgatgtgtc tactgctgag gagcttcaag ttgctaacta cggagttgga 1380
ggacaatatg agccacactt cgatttcgct aggaaggatg agccagatgc ttttaaggag 1440
cttggaactg gaaacaggat tgctacttgg cttttctaca tgtctgatgt ttctgctgga 1500
ggagctactg ttttcccaga agtgggagct tctgtttggc caaagaaggg aactgctgtg 1560
ttctggtaca accttttcgc ttctggagag ggagattact ctactaggca tgctgcttgc 1620
ccagttcttg ttggaaacaa gtgggtgtca aacaagtggc ttcatgagag gggacaagag 1680
tttagaaggc catgcactct ttctgagctt gagtgatgag ctc 1723
<210>15
<211>567
<212>PRT
<213> Artificial sequence
<220>
<223> Synthetic sequence of the vacuolar signal sequence and flanking regions of the barley thiol protease aleurain precursor, fused with the human prolyl 4-hydroxylase alpha-1 subunit
<400>15
Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Val
35 40 45
Asp Met His Pro Gly Phe Phe Thr Ser Ile Gly Gln Met Thr Asp Leu
50 55 60
Ile His Thr Glu Lys Asp Leu Val Thr Ser Leu Lys Asp Tyr Ile Lys
65 70 75 80
Ala Glu Glu Asp Lys Leu Glu Gln Ile Lys Lys Trp Ala Glu Lys Leu
85 90 95
Asp Arg Leu Thr Ser Thr Ala Thr Lys Asp Pro Glu Gly Phe Val Gly
100 105 110
His Pro Val Asn Ala Phe Lys Leu Met Lys Arg Leu Asn Thr Glu Trp
115 120 125
Ser Glu Leu Glu Asn Leu Val Leu Lys Asp Met Ser Asp Gly Phe Ile
130 135 140
Ser Asn Leu Thr Ile Gln Arg Gln Tyr Phe Pro Asn Asp Glu Asp Gln
145 150 155 160
Val Gly Ala Ala Lys Ala Leu Leu Arg Leu Gln Asp Thr Tyr Asn Leu
165 170 175
Asp Thr Asp Thr Ile Ser Lys Gly Asn Leu Pro Gly Val Lys His Lys
180 185 190
Ser Phe Leu Thr Ala Glu Asp Cys Phe Glu Leu Gly Lys Val Ala Tyr
195 200 205
Thr Glu Ala Asp Tyr Tyr His Thr Glu Leu Trp Met Glu Gln Ala Leu
210 215 220
Arg Gln Leu Asp Glu Gly Glu Ile Ser Thr Ile Asp Lys Val Ser Val
225 230 235 240
Leu Asp Tyr Leu Ser Tyr Ala Val Tyr Gln Gln Gly Asp Leu Asp Lys
245 250 255
Ala Leu Leu Leu Thr Lys Lys Leu Leu Glu Leu Asp Pro Glu His Gln
260 265 270
Arg Ala Asn Gly Asn Leu Lys Tyr Phe Glu Tyr Ile Met Ala Lys Glu
275 280 285
Lys Asp Val Asn Lys Ser Ala Ser Asp Asp Gln Ser Asp Gln Lys Thr
290 295 300
Thr Pro Lys Lys Lys Gly Val Ala Val Asp Tyr Leu Pro Glu Arg Gln
305 310 315 320
Lys Tyr Glu Met Leu Cys Arg Gly Glu Gly Ile Lys Met Thr Pro Arg
325 330 335
Arg Gln Lys Lys Leu Phe Cys Arg Tyr His Asp Gly Asn Arg Asn Pro
340 345 350
Lys Phe Ile Leu Ala Pro Ala Lys Gln Glu Asp Glu Trp Asp Lys Pro
355 360 365
Arg Ile Ile Arg Phe His Asp Ile Ile Ser Asp Ala Glu Ile Glu Ile
370 375 380
Val Lys Asp Leu Ala Lys Pro Arg Leu Arg Arg Ala Thr Ile Ser Asn
385 390 395 400
Pro Ile Thr Gly Asp Leu Glu Thr Val His Tyr Arg Ile Ser Lys Ser
405 410 415
Ala Trp Leu Ser Gly Tyr Glu Asn Pro Val Val Ser Arg Ile Asn Met
420 425 430
Arg Ile Gln Asp Leu Thr Gly Leu Asp Val Ser Thr Ala Glu Glu Leu
435 440 445
Gln Val Ala Asn Tyr Gly Val Gly Gly Gln Tyr Glu Pro His Phe Asp
450 455 460
Phe Ala Arg Lys Asp Glu Pro Asp Ala Phe Lys Glu Leu Gly Thr Gly
465 470 475 480
Asn Arg Ile Ala Thr Trp Leu Phe Tyr Met Ser Asp Val Ser Ala Gly
485 490 495
Gly Ala Thr Val Phe Pro Glu Val Gly Ala Ser Val Trp Pro Lys Lys
500 505 510
Gly Thr Ala Val Phe Trp Tyr Asn Leu Phe Ala Ser Gly Glu Gly Asp
515 520 525
Tyr Ser Thr Arg His Ala Ala Cys Pro Val Leu Val Gly Asn Lys Trp
530 535 540
Val Ser Asn Lys Trp Leu His Glu Arg Gly Gln Glu Phe Arg Arg Pro
545 550 555 560
Cys Thr Leu Ser Glu Leu Glu
565
<210>16
<211>928
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence containing the coding region and flanking regions of the vacuolar signal sequence of the barley thiol protease aleurain precursor, fused with a plant prolyl 4-hydroxylase
<400>16
ctcgagtaaa ccatggctca tgctagggtt ttgcttttgg ctcttgctgt tcttgctact 60
gctgctgttg ctgtggcttc ttcttcatct ttcgctgatt ctaacccaat taggccagtg 120
actgatagag ctgcttctac tcttgctcaa ttggtcgaca tgcttggtat tctttctctt 180
ccaaacgcta acaggaactc ttctaagact aacgatctta ctaacattgt gaggaagtct 240
gagacttctt ctggagatga ggagggaaat ggagaaagat gggtggaagt gatttcttgg 300
gagccaaggg ctgttgttta ccacaacttc cttactaatg aggagtgcga gcaccttatt 360
tctcttgcta agccatctat ggtgaagtct actgtggtgg atgagaaaac tggaggatct 420
aaggattcaa gagtgaggac ttcatctggt actttcctta ggaggggaca tgatgaagtt 480
gtggaagtta ttgagaagag gatttctgat ttcactttca ttccagtgga gaacggagaa 540
ggacttcaag ttcttcacta ccaagtggga caaaagtacg agccacacta cgattacttc 600
cttgatgagt tcaacactaa gaacggagga cagaggattg ctactgtgct tatgtacctt 660
tctgatgtgg atgatggagg agagactgtt tttccagctg ctaggggaaa catttctgct 720
gttccttggt ggaacgagct ttctaagtgt ggaaaggagg gactttctgt gcttccaaag 780
aaaagggatg ctcttctttt ctggaacatg aggccagatg cttctcttga tccatcttct 840
cttcatggag gatgcccagt tgttaaggga aacaagtggt catctactaa gtggttccac 900
gtgcacgagt tcaaggtgta atgagctc 928
<210>17
<211>302
<212>PRT
<213> Artificial sequence
<220>
<223> Synthetic sequence of the vacuolar signal sequence and flanking regions of the barley thiol protease aleurain precursor, fused with a plant prolyl 4-hydroxylase
<400>17
Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Val
35 40 45
Asp Met Leu Gly Ile Leu Ser Leu Pro Asn Ala Asn Arg Asn Ser Ser
50 55 60
Lys Thr Asn Asp Leu Thr Asn Ile Val Arg Lys Ser Glu Thr Ser Ser
65 70 75 80
Gly Asp Glu Glu Gly Asn Gly Glu Arg Trp Val Glu Val Ile Ser Trp
85 90 95
Glu Pro Arg Ala Val Val Tyr His Asn Phe Leu Thr Asn Glu Glu Cys
100 105 110
Glu His Leu Ile Ser Leu Ala Lys Pro Ser Met Val Lys Ser Thr Val
115 120 125
Val Asp Glu Lys Thr Gly Gly Ser Lys Asp Ser Arg Val Arg Thr Ser
130 135 140
Ser Gly Thr Phe Leu Arg Arg Gly His Asp Glu Val Val Glu Val Ile
145 150 155 160
Glu Lys Arg Ile Ser Asp Phe Thr Phe Ile Pro Val Glu Asn Gly Glu
165 170 175
Gly Leu Gln Val Leu His Tyr Gln Val Gly Gln Lys Tyr Glu Pro His
180 185 190
Tyr Asp Tyr Phe Leu Asp Glu Phe Asn Thr Lys Asn Gly Gly Gln Arg
195 200 205
Ile Ala Thr Val Leu Met Tyr Leu Ser Asp Val Asp Asp Gly Gly Glu
210 215 220
Thr Val Phe Pro Ala Ala Arg Gly Asn Ile Ser Ala Val Pro Trp Trp
225 230 235 240
Asn Glu Leu Ser Lys Cys Gly Lys Glu Gly Leu Ser Val Leu Pro Lys
245 250 255
Lys Arg Asp Ala Leu Leu Phe Trp Asn Met Arg Pro Asp Ala Ser Leu
260 265 270
Asp Pro Ser Ser Leu His Gly Gly Cys Pro Val Val Lys Gly Asn Lys
275 280 285
Trp Ser Ser Thr Lys Trp Phe His Val His Glu Phe Lys Val
290 295 300
<210>18
<211>2689
<212>DNA
<213> Artificial sequence
<220>
<223> synthetic sequence containing the human procollagen C-proteinase coding region and flanking regions
<400>18
agatctatcg atgcatgcca tggtaccgcg ccatggctca attggctgca acatcaaggc 60
ctgaaagagt ttggccagat ggtgttattc ctttcgttat tggtggaaac tttactggat 120
ctcagagagc agtttttaga caagctatga gacattggga aaagcacact tgtgtgacat 180
tccttgaaag gactgatgaa gattcttata ttgtgttcac ataccgtcca tgtggatgct 240
gctcatatgt tggtagaagg ggaggaggtc cacaagcaat ttctattgga aaaaactgcg 300
ataagttcgg aattgtggtg catgaattgg gacatgttgt tggtttctgg cacgaacaca 360
caaggccaga tagggatagg cacgtgtcta ttgtgaggga aaacattcag ccaggtcaag 420
agtacaattt tcttaagatg gaacctcaag aggtggaatc tctcggagag acttacgact 480
tcgactccat catgcactac gcaaggaata ctttcagcag gggcatcttc ttggatacca 540
ttgtgcctaa gtacgaggtg aacggcgtta agccacctat tggtcaaagg actaggctct 600
ctaagggtga tattgcacag gctaggaagc tctacaaatg tccagcatgc ggagaaactc 660
ttcaggattc cactggcaac ttctcatctc cagagtaccc aaacggatac tctgctcata 720
tgcactgtgt ttggaggatc tcagtgactc ctggagagaa gatcatcctc aacttcactt 780
ccctcgatct ctatcgttct aggctctgtt ggtacgacta tgtggaagtg agagatggct 840
tctggagaaa ggctccactt agaggaaggt tctgcggatc taaacttcct gagccaatcg 900
tgtctactga ttccagattg tgggtggagt tcaggtcctc ttctaattgg gttggcaagg 960
gcttttttgc tgtgtacgag gctatttgtg gcggcgacgt gaaaaaggac tacggacata 1020
ttcaaagtcc aaattaccca gatgattacc gtccttcaaa agtgtgtatt tggaggattc 1080
aagtgagtga gggtttccat gttggattga cattccaatc tttcgaaatt gagagacacg 1140
attcatgcgc atacgattat ttggaagtga gagatggaca ctctgaatct tctacactta 1200
ttggaaggta ctgcggttat gagaaacctg atgatattaa gtctacttct agtaggttgt 1260
ggcttaaatt tgtgtcagat ggttctatta acaaggctgg tttcgcagtg aacttcttca 1320
aggaagtgga tgaatgctca agacctaaca gaggaggatg tgagcaaaga tgccttaaca 1380
ctttgggaag ttacaagtgt tcttgcgatc ctggatacga gttggctcct gataagagaa 1440
gatgcgaagc tgcttgcggt ggttttttga caaaattgaa cggatctatt acttctcctg 1500
gatggccaaa agagtaccca cctaataaga attgcatttg gcagcttgtt gcacctactc 1560
agtaccgtat ttcattgcaa ttcgattttt tcgagactga gggtaatgat gtgtgcaagt 1620
acgatttcgt ggaagtgaga tcaggtctta ctgctgatag taaattgcac ggaaagttct 1680
gcggatctga aaaaccagaa gtgattacat cacagtacaa caatatgagg gtggagttca 1740
aatctgataa tactgtttct aaaaaaggtt ttaaggcaca tttcttttct gataaggacg 1800
agtgctctaa agataatggt ggttgccagc aggattgcgt gaacacattc ggttcatatg 1860
agtgccaatg ccgtagtgga tttgttcttc acgataacaa acatgattgc aaagaggcag 1920
gttgcgatca caaggtgaca tctacttcag gtactatcac atctccaaac tggcctgata 1980
agtatccttc aaaaaaagaa tgtacatggg caatttcttc tacaccaggt catagggtta 2040
agttgacatt catggagatg gatattgaga gtcaaccaga gtgcgcttat gatcatcttg 2100
aggtgttcga tggaagggat gctaaggctc ctgttcttgg tagattctgt ggtagtaaaa 2160
agccagaacc agtgcttgca acaggatcta ggatgttcct tagattctac tctgataact 2220
cagttcagag gaaaggattc caagctagtc acgcaactga atgcggtgga caagttagag 2280
cagatgttaa gactaaggat ctttactcac acgcacagtt cggagataac aactaccctg 2340
gaggagttga ttgcgagtgg gttattgtgg ctgaagaggg atacggagtt gagcttgttt 2400
tccagacatt cgaggtggag gaggaaactg attgcggtta cgattatatg gaactttttg 2460
atggatacga tagtactgct ccaagacttg gaaggtattg tggtagtggt ccaccagaag 2520
aggtgtactc agctggagat agtgttcttg ttaagttcca cagtgatgat acaattacta 2580
agaagggatt ccatcttaga tatacttcaa ctaagtttca ggatactctt cattctagga 2640
agtaatgagc tcgcggccgc atccaagctt ctgcagacgc gtcgacgtc 2689
<210>19
<211>870
<212>PRT
<213> Artificial sequence
<220>
<223> synthetic sequence containing human procollagen C-proteinase and flanking regions
<400>19
Met Ala Gln Leu Ala Ala Thr Ser Arg Pro Glu Arg Val Trp Pro Asp
1 5 10 15
Gly Val Ile Pro Phe Val Ile Gly Gly Asn Phe Thr Gly Ser Gln Arg
20 25 30
Ala Val Phe Arg Gln Ala Met Arg His Trp Glu Lys His Thr Cys Val
35 40 45
Thr Phe Leu Glu Arg Thr Asp Glu Asp Ser Tyr Ile Val Phe Thr Tyr
50 55 60
Arg Pro Cys Gly Cys Cys Ser Tyr Val Gly Arg Arg Gly Gly Gly Pro
65 70 75 80
Gln Ala Ile Ser Ile Gly Lys Asn Cys Asp Lys Phe Gly Ile Val Val
85 90 95
His Glu Leu Gly His Val Val Gly Phe Trp His Glu His Thr Arg Pro
100 105 110
Asp Arg Asp Arg His Val Ser Ile Val Arg Glu Asn Ile Gln Pro Gly
115 120 125
Gln Glu Tyr Asn Phe Leu Lys Met Glu Pro Gln Glu Val Glu Ser Leu
130 135 140
Gly Glu Thr Tyr Asp Phe Asp Ser Ile Met His Tyr Ala Arg Asn Thr
145 150 155 160
Phe Ser Arg Gly Ile Phe Leu Asp Thr Ile Val Pro Lys Tyr Glu Val
165 170 175
Asn Gly Val Lys Pro Pro Ile Gly Gln Arg Thr Arg Leu Ser Lys Gly
180 185 190
Asp Ile Ala Gln Ala Arg Lys Leu Tyr Lys Cys Pro Ala Cys Gly Glu
195 200 205
Thr Leu Gln Asp Ser Thr Gly Asn Phe Ser Ser Pro Glu Tyr Pro Asn
210 215 220
Gly Tyr Ser Ala His Met His Cys Val Trp Arg Ile Ser Val Thr Pro
225 230 235 240
Gly Glu Lys Ile Ile Leu Asn Phe Thr Ser Leu Asp Leu Tyr Arg Ser
245 250 255
Arg Leu Cys Trp Tyr Asp Tyr Val Glu Val Arg Asp Gly Phe Trp Arg
260 265 270
Lys Ala Pro Leu Arg Gly Arg Phe Cys Gly Ser Lys Leu Pro Glu Pro
275 280 285
Ile Val Ser Thr Asp Ser Arg Leu Trp Val Glu Phe Arg Ser Ser Ser
290 295 300
Asn Trp Val Gly Lys Gly Phe Phe Ala Val Tyr Glu Ala Ile Cys Gly
305 310 315 320
Gly Asp Val Lys Lys Asp Tyr Gly His Ile Gln Ser Pro Asn Tyr Pro
325 330 335
Asp Asp Tyr Arg Pro Ser Lys Val Cys Ile Trp Arg Ile Gln Val Ser
340 345 350
Glu Gly Phe His Val Gly Leu Thr Phe Gln Ser Phe Glu Ile Glu Arg
355 360 365
His Asp Ser Cys Ala Tyr Asp Tyr Leu Glu Val Arg Asp Gly His Ser
370 375 380
Glu Ser Ser Thr Leu Ile Gly Arg Tyr Cys Gly Tyr Glu Lys Pro Asp
385 390 395 400
Asp Ile Lys Ser Thr Ser Ser Arg Leu Trp Leu Lys Phe Val Ser Asp
405 410 415
Gly Ser Ile Asn Lys Ala Gly Phe Ala Val Asn Phe Phe Lys Glu Val
420 425 430
Asp Glu Cys Ser Arg Pro Asn Arg Gly Gly Cys Glu Gln Arg Cys Leu
435 440 445
Asn Thr Leu Gly Ser Tyr Lys Cys Ser Cys Asp Pro Gly Tyr Glu Leu
450 455 460
Ala Pro Asp Lys Arg Arg Cys Glu Ala Ala Cys Gly Gly Phe Leu Thr
465 470 475 480
Lys Leu Asn Gly Ser Ile Thr Ser Pro Gly Trp Pro Lys Glu Tyr Pro
485 490 495
Pro Asn Lys Asn Cys Ile Trp Gln Leu Val Ala Pro Thr Gln Tyr Arg
500 505 510
Ile Ser Leu Gln Phe Asp Phe Phe Glu Thr Glu Gly Asn Asp Val Cys
515 520 525
Lys Tyr Asp Phe Val Glu Val Arg Ser Gly Leu Thr Ala Asp Ser Lys
530 535 540
Leu His Gly Lys Phe Cys Gly Ser Glu Lys Pro Glu Val Ile Thr Ser
545 550 555 560
Gln Tyr Asn Asn Met Arg Val Glu Phe Lys Ser Asp Asn Thr Val Ser
565 570 575
Lys Lys Gly Phe Lys Ala His Phe Phe Ser Asp Lys Asp Glu Cys Ser
580 585 590
Lys Asp Asn Gly Gly Cys Gln Gln Asp Cys Val Asn Thr Phe Gly Ser
595 600 605
Tyr Glu Cys Gln Cys Arg Ser Gly Phe Val Leu His Asp Asn Lys His
610 615 620
Asp Cys Lys Glu Ala Gly Cys Asp His Lys Val Thr Ser Thr Ser Gly
625 630 635 640
Thr Ile Thr Ser Pro Asn Trp Pro Asp Lys Tyr Pro Ser Lys Lys Glu
645 650 655
Cys Thr Trp Ala Ile Ser Ser Thr Pro Gly His Arg Val Lys Leu Thr
660 665 670
Phe Met Glu Met Asp Ile Glu Ser Gln Pro Glu Cys Ala Tyr Asp His
675 680 685
Leu Glu Val Phe Asp Gly Arg Asp Ala Lys Ala Pro Val Leu Gly Arg
690 695 700
Phe Cys Gly Ser Lys Lys Pro Glu Pro Val Leu Ala Thr Gly Ser Arg
705 710 715 720
Met Phe Leu Arg Phe Tyr Ser Asp Asn Ser Val Gln Arg Lys Gly Phe
725 730 735
Gln Ala Ser His Ala Thr Glu Cys Gly Gly Gln Val Arg Ala Asp Val
740 745 750
Lys Thr Lys Asp Leu Tyr Ser His Ala Gln Phe Gly Asp Asn Asn Tyr
755 760 765
Pro Gly Gly Val Asp Cys Glu Trp Val Ile Val Ala Glu Glu Gly Tyr
770 775 780
Gly Val Glu Leu Val Phe Gln Thr Phe Glu Val Glu Glu Glu Thr Asp
785 790 795 800
Cys Gly Tyr Asp Tyr Met Glu Leu Phe Asp Gly Tyr Asp Ser Thr Ala
805 810 815
Pro Arg Leu Gly Arg Tyr Cys Gly Ser Gly Pro Pro Glu Glu Val Tyr
820 825 830
Ser Ala Gly Asp Ser Val Leu Val Lys Phe His Ser Asp Asp Thr Ile
835 840 845
Thr Lys Lys Gly Phe His Leu Arg Tyr Thr Ser Thr Lys Phe Gln Asp
850 855 860
Thr Leu His Ser Arg Lys
865 870
<210>20
<211>2912
<212>DNA
<213> Artificial sequence
<220>
<223> synthetic sequence containing human procollagen I N-protease coding region and flanking region
<400>20
gcgccatggc tcaattgagg agaagggcta ggagacacgc agctgatgat gattacaaca 60
ttgaagtttt gcttggtgtt gatgatagtg tggtgcaatt ccacggaaaa gagcatgttc 120
agaaatatct tttgacactt atgaatattg tgaacgaaat ctaccatgat gagtctttgg 180
gagcacacat taacgtggtt cttgtgagga ttattcttct ttcatacggt aaatctatgt 240
cacttattga gattggaaac ccttctcagt ctcttgagaa tgtgtgcaga tgggcatacc 300
ttcaacagaa gcctgatact ggacacgatg agtatcacga tcacgctatt ttccttacaa 360
ggcaggattt cggtccaagt ggaatgcaag gatatgctcc tgttactggt atgtgccacc 420
ctgttaggtc ttgtacactt aaccacgagg atggtttttc atctgctttc gtggtggctc 480
atgagacagg tcatgttttg ggaatggaac atgatggaca gggtaataga tgtggagatg 540
aagtgagact tggttcaatt atggctcctc ttgttcaagc tgcttttcat aggttccact 600
ggagtaggtg ttcacagcaa gagttgagta gataccttca ttcttacgat tgcttgcttg 660
atgatccatt tgctcatgat tggccagctt tgcctcaact tcctggattg cactactcta 720
tgaacgagca gtgcagattt gatttcggtc ttggttacat gatgtgcaca gctttcagga 780
ctttcgatcc atgcaaacag ttgtggtgtt cacacccaga taacccatat ttctgtaaaa 840
caaaaaaagg tccaccactt gatggtacta tgtgcgcacc tggaaagcac tgcttcaagg 900
gacactgcat ttggcttact cctgatattc ttaaaaggga tggatcatgg ggagcttggt 960
ctccattcgg aagttgctca agaacttgcg gaacaggtgt taagtttaga actaggcagt 1020
gcgataatcc acaccctgct aatggtggta gaacttgctc tggacttgct tacgattttc 1080
agttgtgttc taggcaagat tgccctgata gtcttgctga ttttagagaa gagcaatgta 1140
gacagtggga tctttacttt gagcacggcg acgctcagca ccactggctt ccacacgagc 1200
atagagatgc aaaagaaagg tgtcaccttt attgcgagag tagagagact ggagaggtgg 1260
tgtcaatgaa gagaatggtg cacgatggta caaggtgttc ttataaggat gcattctctt 1320
tgtgtgtgag gggagattgc aggaaagtgg gttgtgatgg agtgattgga tctagtaagc 1380
aagaagataa gtgcggagtg tgcggaggag ataactctca ttgcaaggtt gtgaaaggaa 1440
cttttacaag atcaccaaaa aaacacggtt acattaagat gttcgaaatt cctgctggag 1500
caaggcattt gcttattcag gaagtggatg caacatctca ccacttggca gtgaaaaacc 1560
ttgagactgg aaaattcatt ttgaacgagg agaacgatgt tgatgcatct agtaagactt 1620
tcattgcaat gggtgttgaa tgggagtata gggatgagga tggaagggaa acacttcaaa 1680
caatgggtcc tcttcatgga acaattactg tgttggtgat tccagtggga gatacaaggg 1740
tgtcattgac atacaagtat atgattcacg aggatagtct taacgttgat gataacaacg 1800
ttttggaaga agattctgtg gtttacgagt gggctcttaa gaaatggtca ccttgctcta 1860
agccatgtgg tggaggaagt cagttcacta agtatggttg taggaggagg cttgatcata 1920
agatggttca taggggattt tgcgcagcac ttagtaagcc aaaggcaatt aggagggctt 1980
gtaaccctca agaatgctca caaccagttt gggtgacagg agagtgggag ccatgttcac 2040
aaacatgcgg aagaactgga atgcaagtta gatcagttag atgcattcaa cctcttcatg 2100
ataacactac aagaagtgtg cacgcaaaac actgtaacga tgctaggcca gagagtagaa 2160
gagcttgctc tagggaactt tgccctggta gatggagggc aggaccttgg agtcagtgct 2220
ctgtgacatg tggaaacggt actcaggaaa gacctgttcc atgtagaact gctgatgata 2280
gtttcggaat ttgtcaggag gaaaggccag aaacagctag gacttgtaga cttggacctt 2340
gtcctaggaa tatttctgat cctagtaaaa aatcatacgt ggtgcaatgg ttgagtaggc 2400
cagatccaga ttcaccaatt aggaagattt cttcaaaagg acactgccag ggtgataaga 2460
gtattttctg cagaatggaa gttcttagta ggtactgttc tattccaggt tataacaaac 2520
tttcttgtaa gagttgcaac ttgtataaca atcttactaa cgtggagggt agaattgaac 2580
ctccaccagg aaagcacaac gatattgatg tgtttatgcc tactcttcct gtgccaacag 2640
ttgcaatgga agttagacct tctccatcta ctccacttga ggtgccactt aatgcatcaa 2700
gtactaacgc tactgaggat cacccagaga ctaacgcagt tgatgagcct tataagattc 2760
acggacttga ggatgaggtt cagccaccaa accttattcc taggaggcca agtccttacg 2820
aaaaaactag aaatcagagg attcaggagc ttattgatga gatgaggaaa aaggagatgc 2880
ttggaaagtt ctaatgagct cgcggccgca tc 2912
<210>21
<211>962
<212>PRT
<213> Artificial sequence
<220>
<223> synthetic sequence containing human procollagen I N-protease and flanking region
<400>21
Met Ala Gln Leu Arg Arg Arg Ala Arg Arg His Ala Ala Asp Asp Asp
1 5 10 15
Tyr Asn Ile Glu Val Leu Leu Gly Val Asp Asp Ser Val Val Gln Phe
20 25 30
His Gly Lys Glu His Val Gln Lys Tyr Leu Leu Thr Leu Met Asn Ile
35 40 45
Val Asn Glu Ile Tyr His Asp Glu Ser Leu Gly Ala His Ile Asn Val
50 55 60
Val Leu Val Arg Ile Ile Leu Leu Ser Tyr Gly Lys Ser Met Ser Leu
65 70 75 80
Ile Glu Ile Gly Asn Pro Ser Gln Ser Leu Glu Asn Val Cys Arg Trp
85 90 95
Ala Tyr Leu Gln Gln Lys Pro Asp Thr Gly His Asp Glu Tyr His Asp
100 105 110
His Ala Ile Phe Leu Thr Arg Gln Asp Phe Gly Pro Ser Gly Met Gln
115 120 125
Gly Tyr Ala Pro Val Thr Gly Met Cys His Pro Val Arg Ser Cys Thr
130 135 140
Leu Asn His Glu Asp Gly Phe Ser Ser Ala Phe Val Val Ala His Glu
145 150 155 160
Thr Gly His Val Leu Gly Met Glu His Asp Gly Gln Gly Asn Arg Cys
165 170 175
Gly Asp Glu Val Arg Leu Gly Ser Ile Met Ala Pro Leu Val Gln Ala
180 185 190
Ala Phe His Arg Phe His Trp Ser Arg Cys Ser Gln Gln Glu Leu Ser
195 200 205
Arg Tyr Leu His Ser Tyr Asp Cys Leu Leu Asp Asp Pro Phe Ala His
210 215 220
Asp Trp Pro Ala Leu Pro Gln Leu Pro Gly Leu His Tyr Ser Met Asn
225 230 235 240
Glu Gln Cys Arg Phe Asp Phe Gly Leu Gly Tyr Met Met Cys Thr Ala
245 250 255
Phe Arg Thr Phe Asp Pro Cys Lys Gln Leu Trp Cys Ser His Pro Asp
260 265 270
Asn Pro Tyr Phe Cys Lys Thr Lys Lys Gly Pro Pro Leu Asp Gly Thr
275 280 285
Met Cys Ala Pro Gly Lys His Cys Phe Lys Gly His Cys Ile Trp Leu
290 295 300
Thr Pro Asp Ile Leu Lys Arg Asp Gly Ser Trp Gly Ala Trp Ser Pro
305 310 315 320
Phe Gly Ser Cys Ser Arg Thr Cys Gly Thr Gly Val Lys Phe Arg Thr
325 330 335
Arg Gln Cys Asp Asn Pro His Pro Ala Asn Gly Gly Arg Thr Cys Ser
340 345 350
Gly Leu Ala Tyr Asp Phe Gln Leu Cys Ser Arg Gln Asp Cys Pro Asp
355 360 365
Ser Leu Ala Asp Phe Arg Glu Glu Gln Cys Arg Gln Trp Asp Leu Tyr
370 375 380
Phe Glu His Gly Asp Ala Gln His His Trp Leu Pro His Glu His Arg
385 390 395 400
Asp Ala Lys Glu Arg Cys His Leu Tyr Cys Glu Ser Arg Glu Thr Gly
405 410 415
Glu Val Val Ser Met Lys Arg Met Val His Asp Gly Thr Arg Cys Ser
420 425 430
Tyr Lys Asp Ala Phe Ser Leu Cys Val Arg Gly Asp Cys Arg Lys Val
435 440 445
Gly Cys Asp Gly Val Ile Gly Ser Ser Lys Gln Glu Asp Lys Cys Gly
450 455 460
Val Cys Gly Gly Asp Asn Ser His Cys Lys Val Val Lys Gly Thr Phe
465 470 475 480
Thr Arg Ser Pro Lys Lys His Gly Tyr Ile Lys Met Phe Glu Ile Pro
485 490 495
Ala Gly Ala Arg His Leu Leu Ile Gln Glu Val Asp Ala Thr Ser His
500 505 510
His Leu Ala Val Lys Asn Leu Glu Thr Gly Lys Phe Ile Leu Asn Glu
515 520 525
Glu Asn Asp Val Asp Ala Ser Ser Lys Thr Phe Ile Ala Met Gly Val
530 535 540
Glu Trp Glu Tyr Arg Asp Glu Asp Gly Arg Glu Thr Leu Gln Thr Met
545 550 555 560
Gly Pro Leu His Gly Thr Ile Thr Val Leu Val Ile Pro Val Gly Asp
565 570 575
Thr Arg Val Ser Leu Thr Tyr Lys Tyr Met Ile His Glu Asp Ser Leu
580 585 590
Asn Val Asp Asp Asn Asn Val Leu Glu Glu Asp Ser Val Val Tyr Glu
595 600 605
Trp Ala Leu Lys Lys Trp Ser Pro Cys Ser Lys Pro Cys Gly Gly Gly
610 615 620
Ser Gln Phe Thr Lys Tyr Gly Cys Arg Arg Arg Leu Asp His Lys Met
625 630 635 640
Val His Arg Gly Phe Cys Ala Ala Leu Ser Lys Pro Lys Ala Ile Arg
645 650 655
Arg Ala Cys Asn Pro Gln Glu Cys Ser Gln Pro Val Trp Val Thr Gly
660 665 670
Glu Trp Glu Pro Cys Ser Gln Thr Cys Gly Arg Thr Gly Met Gln Val
675 680 685
Arg Ser Val Arg Cys Ile Gln Pro Leu His Asp Asn Thr Thr Arg Ser
690 695 700
Val His Ala Lys His Cys Asn Asp Ala Arg Pro Glu Ser Arg Arg Ala
705 710 715 720
Cys Ser Arg Glu Leu Cys Pro Gly Arg Trp Arg Ala Gly Pro Trp Ser
725 730 735
Gln Cys Ser Val Thr Cys Gly Asn Gly Thr Gln Glu Arg Pro Val Pro
740 745 750
Cys Arg Thr Ala Asp Asp Ser Phe Gly Ile Cys Gln Glu Glu Arg Pro
755 760 765
Glu Thr Ala Arg Thr Cys Arg Leu Gly Pro Cys Pro Arg Asn Ile Ser
770 775 780
Asp Pro Ser Lys Lys Ser Tyr Val Val Gln Trp Leu Ser Arg Pro Asp
785 790 795 800
Pro Asp Ser Pro Ile Arg Lys Ile Ser Ser Lys Gly His Cys Gln Gly
805 810 815
Asp Lys Ser Ile Phe Cys Arg Met Glu Val Leu Ser Arg Tyr Cys Ser
820 825 830
Ile Pro Gly Tyr Asn Lys Leu Ser Cys Lys Ser Cys Asn Leu Tyr Asn
835 840 845
Asn Leu Thr Asn Val Glu Gly Arg Ile Glu Pro Pro Pro Gly Lys His
850 855 860
Asn Asp Ile Asp Val Phe Met Pro Thr Leu Pro Val Pro Thr Val Ala
865 870 875 880
Met Glu Val Arg Pro Ser Pro Ser Thr Pro Leu Glu Val Pro Leu Asn
885 890 895
Ala Ser Ser Thr Asn Ala Thr Glu Asp His Pro Glu Thr Asn Ala Val
900 905 910
Asp Glu Pro Tyr Lys Ile His Gly Leu Glu Asp Glu Val Gln Pro Pro
915 920 925
Asn Leu Ile Pro Arg Arg Pro Ser Pro Tyr Glu Lys Thr Arg Asn Gln
930 935 940
Arg Ile Gln Glu Leu Ile Asp Glu Met Arg Lys Lys Glu Met Leu Gly
945 950 955 960
Lys Phe
<210>22
<211>2888
<212>DNA
<213> Artificial sequence
<220>
<223> Synthetic sequence of the coding and flanking regions of the vacuolar signal sequence of the barley thiol protease aleurain precursor gene fused to human lysyl hydroxylase 3
<400>22
gcgaattcgc tagctatcac tgaaaagaca gcaagacaat ggtgtctcga tgcaccagaa 60
ccacatcttt gcagcagatg tgaagcagcc agagtggtcc acaagacgca ctcagaaaag 120
gcatcttcta ccgacacaga aaaagacaac cacagctcat catccaacat gtagactgtc 180
gttatgcgtc ggctgaagat aagactgacc ccaggccagc actaaagaag aaataatgca 240
agtggtccta gctccacttt agctttaata attatgtttc attattattc tctgcttttg 300
ctctctatat aaagagcttg tattttcatt tgaaggcaga ggcgaacaca cacacagaac 360
ctccctgctt acaaaccaga tcttaaacca tggctcacgc tagggttttg cttcttgctc 420
ttgctgttct tgctactgct gctgttgctg tggcttcttc aagttctttc gctgattcta 480
acccaattag gccagtgact gatagagctg cttctactct tgctcaattg agatctatgt 540
ctgatagacc aaggggaagg gatccagtta atccagagaa gttgcttgtg attactgtgg 600
ctactgctga gactgaagga taccttagat tccttaggag tgctgagttc ttcaactaca 660
ctgtgaggac tcttggactt ggagaagaat ggaggggagg agatgttgct agaactgttg 720
gaggaggaca gaaagtgaga tggcttaaga aagagatgga gaagtacgct gatagggagg 780
atatgattat tatgttcgtg gattcttacg atgtgattct tgctggatct ccaactgagc 840
ttttgaagaa attcgttcag tctggatcta ggcttctttt ctctgctgag tctttttgtt 900
ggccagaatg gggacttgct gagcaatatc cagaagtggg aactggaaag agattcctta 960
actctggagg attcattgga ttcgctacta ctattcacca gattgtgagg cagtggaagt 1020
acaaggatga cgatgatgat cagcttttct acactaggct ttaccttgat ccaggactta 1080
gggagaagtt gtctcttaac cttgatcaca agtctaggat tttccagaac cttaacggtg 1140
ctcttgatga ggttgtgctt aagttcgata ggaacagagt gaggattagg aacgtggctt 1200
acgatactct tcctattgtg gtgcatggaa acggaccaac aaaactccag cttaactacc 1260
ttggaaacta cgttccaaac ggatggactc cagaaggagg atgtggattc tgcaatcagg 1320
ataggagaac tcttccagga ggacaaccac caccaagagt tttccttgct gtgttcgttg 1380
aacagccaac tccattcctt ccaagattcc ttcagaggct tcttcttttg gattacccac 1440
cagatagggt gacacttttc cttcacaaca acgaggtttt ccacgagcca cacattgctg 1500
attcttggcc acagcttcag gatcatttct ctgctgtgaa gttggttggt ccagaagaag 1560
ctctttctcc aggagaagct agggatatgg ctatggattt gtgcaggcag gatccagagt 1620
gcgagttcta cttctctctt gatgctgatg ctgtgcttac taaccttcag actcttagga 1680
ttcttattga ggagaacagg aaagtgattg ctccaatgct ttctaggcac ggaaagttgt 1740
ggtctaattt ctggggtgct ctttctcctg atgagtacta cgctagatca gaggactacg 1800
tggagcttgt tcagagaaag agagtgggag tttggaacgt tccttatatt tctcaggctt 1860
acgtgattag gggagatact cttaggatgg agcttccaca gagggatgtt ttctctggat 1920
ctgatactga tccagatatg gctttctgca agtctttcag ggataaggga attttccttc 1980
acctttctaa ccagcatgag ttcggaagat tgcttgctac ttcaagatac gatactgagc 2040
accttcatcc tgatctttgg cagattttcg ataacccagt ggattggaag gagcagtaca 2100
ttcacgagaa ctactctagg gctcttgaag gagaaggaat tgtggagcaa ccatgcccag 2160
atgtttactg gttcccactt ctttctgagc aaatgtgcga tgagcttgtt gctgagatgg 2220
agcattacgg acaatggagt ggaggtagac atgaggattc taggcttgct ggaggatacg 2280
agaacgttcc aactgtggat attcacatga agcaagtggg atacgaggat caatggcttc 2340
agcttcttag gacttatgtg ggaccaatga ctgagtctct tttcccagga taccacacta 2400
aggctagggc tgttatgaac ttcgttgtga ggtatcgtcc agatgagcaa ccatctctta 2460
ggccacacca cgattcttct actttcactc ttaacgtggc tcttaaccac aagggacttg 2520
attatgaggg aggaggatgc cgtttcctta gatacgattg cgtgatttct tcaccaagaa 2580
agggatgggc tcttcttcat ccaggaaggc ttactcatta ccacgaggga cttccaacta 2640
cttggggaac tagatatatt atggtgtctt tcgtggatcc atgactgctt taatgagata 2700
tgcgagacgc ctatgatcgc atgatatttg ctttcaattc tgttgtgcac gttgtaaaaa 2760
acctgagcat gtgtagctca gatccttacc gccggtttcg gttcattcta atgaatatat 2820
cacccgttac tatcgtattt ttatgaataa tattctccgt tcaatttact gattgtccag 2880
aattcgcg 2888
<210>23
<211>764
<212>PRT
<213> Artificial sequence
<220>
<223> Vacuolar signal sequence of the barley thiol protease aleurain precursor gene fused to human lysyl hydroxylase 3, with synthetic flanking-region sequences
<400>23
Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala Gln Leu Arg
35 40 45
Ser Met Ser Asp Arg Pro Arg Gly Arg Asp Pro Val Asn Pro Glu Lys
50 55 60
Leu Leu Val Ile Thr Val Ala Thr Ala Glu Thr Glu Gly Tyr Leu Arg
65 70 75 80
Phe Leu Arg Ser Ala Glu Phe Phe Asn Tyr Thr Val Arg Thr Leu Gly
85 90 95
Leu Gly Glu Glu Trp Arg Gly Gly Asp Val Ala Arg Thr Val Gly Gly
100 105 110
Gly Gln Lys Val Arg Trp Leu Lys Lys Glu Met Glu Lys Tyr Ala Asp
115 120 125
Arg Glu Asp Met Ile Ile Met Phe Val Asp Ser Tyr Asp Val Ile Leu
130 135 140
Ala Gly Ser Pro Thr Glu Leu Leu Lys Lys Phe Val Gln Ser Gly Ser
145 150 155 160
Arg Leu Leu Phe Ser Ala Glu Ser Phe Cys Trp Pro Glu Trp Gly Leu
165 170 175
Ala Glu Gln Tyr Pro Glu Val Gly Thr Gly Lys Arg Phe Leu Asn Ser
180 185 190
Gly Gly Phe Ile Gly Phe Ala Thr Thr Ile His Gln Ile Val Arg Gln
195 200 205
Trp Lys Tyr Lys Asp Asp Asp Asp Asp Gln Leu Phe Tyr Thr Arg Leu
210 215 220
Tyr Leu Asp Pro Gly Leu Arg Glu Lys Leu Ser Leu Asn Leu Asp His
225 230 235 240
Lys Ser Arg Ile Phe Gln Asn Leu Asn Gly Ala Leu Asp Glu Val Val
245 250 255
Leu Lys Phe Asp Arg Asn Arg Val Arg Ile Arg Asn Val Ala Tyr Asp
260 265 270
Thr Leu Pro Ile Val Val His Gly Asn Gly Pro Thr Lys Leu Gln Leu
275 280 285
Asn Tyr Leu Gly Asn Tyr Val Pro Asn Gly Trp Thr Pro Glu Gly Gly
290 295 300
Cys Gly Phe Cys Asn Gln Asp Arg Arg Thr Leu Pro Gly Gly Gln Pro
305 310 315 320
Pro Pro Arg Val Phe Leu Ala Val Phe Val Glu Gln Pro Thr Pro Phe
325 330 335
Leu Pro Arg Phe Leu Gln Arg Leu Leu Leu Leu Asp Tyr Pro Pro Asp
340 345 350
Arg Val Thr Leu Phe Leu His Asn Asn Glu Val Phe His Glu Pro His
355 360 365
Ile Ala Asp Ser Trp Pro Gln Leu Gln Asp His Phe Ser Ala Val Lys
370 375 380
Leu Val Gly Pro Glu Glu Ala Leu Ser Pro Gly Glu Ala Arg Asp Met
385 390 395 400
Ala Met Asp Leu Cys Arg Gln Asp Pro Glu Cys Glu Phe Tyr Phe Ser
405 410 415
Leu Asp Ala Asp Ala Val Leu Thr Asn Leu Gln Thr Leu Arg Ile Leu
420 425 430
Ile Glu Glu Asn Arg Lys Val Ile Ala Pro Met Leu Ser Arg His Gly
435 440 445
Lys Leu Trp Ser Asn Phe Trp Gly Ala Leu Ser Pro Asp Glu Tyr Tyr
450 455 460
Ala Arg Ser Glu Asp Tyr Val Glu Leu Val Gln Arg Lys Arg Val Gly
465 470 475 480
Val Trp Asn Val Pro Tyr Ile Ser Gln Ala Tyr Val Ile Arg Gly Asp
485 490 495
Thr Leu Arg Met Glu Leu Pro Gln Arg Asp Val Phe Ser Gly Ser Asp
500 505 510
Thr Asp Pro Asp Met Ala Phe Cys Lys Ser Phe Arg Asp Lys Gly Ile
515 520 525
Phe Leu His Leu Ser Asn Gln His Glu Phe Gly Arg Leu Leu Ala Thr
530 535 540
Ser Arg Tyr Asp Thr Glu His Leu His Pro Asp Leu Trp Gln Ile Phe
545 550 555 560
Asp Asn Pro Val Asp Trp Lys Glu Gln Tyr Ile His Glu Asn Tyr Ser
565 570 575
Arg Ala Leu Glu Gly Glu Gly Ile Val Glu Gln Pro Cys Pro Asp Val
580 585 590
Tyr Trp Phe Pro Leu Leu Ser Glu Gln Met Cys Asp Glu Leu Val Ala
595 600 605
Glu Met Glu His Tyr Gly Gln Trp Ser Gly Gly Arg His Glu Asp Ser
610 615 620
Arg Leu Ala Gly Gly Tyr Glu Asn Val Pro Thr Val Asp Ile His Met
625 630 635 640
Lys Gln Val Gly Tyr Glu Asp Gln Trp Leu Gln Leu Leu Arg Thr Tyr
645 650 655
Val Gly Pro Met Thr Glu Ser Leu Phe Pro Gly Tyr His Thr Lys Ala
660 665 670
Arg Ala Val Met Asn Phe Val Val Arg Tyr Arg Pro Asp Glu Gln Pro
675 680 685
Ser Leu Arg Pro His His Asp Ser Ser Thr Phe Thr Leu Asn Val Ala
690 695 700
Leu Asn His Lys Gly Leu Asp Tyr Glu Gly Gly Gly Cys Arg Phe Leu
705 710 715 720
Arg Tyr Asp Cys Val Ile Ser Ser Pro Arg Lys Gly Trp Ala Leu Leu
725 730 735
His Pro Gly Arg Leu Thr His Tyr His Glu Gly Leu Pro Thr Thr Trp
740 745 750
Gly Thr Arg Tyr Ile Met Val Ser Phe Val Asp Pro
755 760
<210>24
<211>45
<212>PRT
<213> Artificial sequence
<220>
<223> Vacuolar signal sequence of the barley thiol protease aleurain precursor gene
<400>24
Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala
35 40 45
<210>25
<211>24
<212>DNA
<213> Artificial sequence
<220>
<223> Single-stranded DNA oligonucleotide
<400>25
atcaccagga gaacagggac catc 24
<210>26
<211>29
<212>DNA
<213> Artificial sequence
<220>
<223> Single-stranded DNA oligonucleotide
<400>26
tccacttcca aatctctatc cctaacaac 29
<210>27
<211>23
<212>DNA
<213> Artificial sequence
<220>
<223> Single-stranded DNA oligonucleotide
<400>27
aggcattaga ggcgataagg gag 23
<210>28
<211>27
<212>DNA
<213> Artificial sequence
<220>
<223> Single-stranded DNA oligonucleotide
<400>28
tcaatccaat aatagccact tgaccac 27
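By way of illustration only, the relationship between the listed records can be checked computationally. SEQ ID NO: 22 (DNA) and SEQ ID NO: 23 (protein) describe the same fusion of the aleurain vacuolar signal and human LH3, and SEQ ID NO: 24 is that 45-residue signal. The minimal Python sketch below assumes the standard genetic code and takes the reading frame to begin at the ATG at nucleotides 390-392 of SEQ ID NO: 22. It translates the first 135 nucleotides of that frame, copied from the listing with its 10-nucleotide spacing, and compares the result with SEQ ID NO: 24. The helper names `translate` and `listing_to_one_letter` are hypothetical and are not part of the sequence listing.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), enumerated in TCAG order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    dna = "".join(dna.split()).lower()  # drop the listing's 10-nt spacing
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def listing_to_one_letter(text: str) -> str:
    """Convert three-letter residues to one-letter codes, skipping position numbers."""
    return "".join(THREE_TO_ONE[tok] for tok in text.split() if tok in THREE_TO_ONE)


# Nucleotides 390-524 of SEQ ID NO: 22 (45 codons from the first ATG).
SEQ22_ORF_START = (
    "atggctcacgc tagggttttg cttcttgctc ttgctgttct tgctactgct "
    "gctgttgctg tggcttcttc aagttctttc gctgattcta acccaattag "
    "gccagtgact gatagagctg cttctactct tgct"
)

# SEQ ID NO: 24 exactly as listed, including the residue-position lines.
SEQ24 = """Met Ala His Ala Arg Val Leu Leu Leu Ala Leu Ala Val Leu Ala Thr
1 5 10 15
Ala Ala Val Ala Val Ala Ser Ser Ser Ser Phe Ala Asp Ser Asn Pro
20 25 30
Ile Arg Pro Val Thr Asp Arg Ala Ala Ser Thr Leu Ala
35 40 45"""

signal = translate(SEQ22_ORF_START)
expected = listing_to_one_letter(SEQ24)
print(signal)             # MAHARVLLLALAVLATAAVAVASSSSFADSNPIRPVTDRAASTLA
print(signal == expected)  # True
```

The same `translate` helper could be applied to the full reading frame of SEQ ID NO: 22 and compared with SEQ ID NO: 23 in the same way.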

Claims (20)

1. A method of producing collagen in a plant or isolated plant cell comprising targeting at least one type of collagen alpha chain and a mammalian prolyl-4-hydroxylase (P4H) to the vacuole of the plant or isolated plant cell, thereby allowing the at least one type of the collagen alpha chain to be hydroxylated by the mammalian P4H and not by endogenous P4H, thereby producing collagen in the plant or isolated plant cell.
2. The method of claim 1, further comprising expressing an exogenous lysyl hydroxylase 3 (LH3) in said vacuole, which lacks endogenous P4H activity.
3. The method of claim 1, wherein said at least one type of said collagen alpha chain comprises a signal peptide for targeting vacuoles.
4. The method of claim 1, wherein said at least one type of said collagen alpha chain lacks an ER targeting sequence or a retention sequence.
5. The method of claim 1, wherein said at least one type of said collagen alpha chain is expressed in a DNA-containing organelle of the plant.
6. The method of claim 1, wherein said exogenous mammalian P4H comprises a signal peptide for targeting the vacuole.
7. The method of claim 1, wherein said exogenous mammalian P4H lacks an ER targeting or retention sequence.
8. The method of claim 1, wherein said exogenous mammalian P4H is expressed in a DNA-containing organelle of the plant.
9. The method of claim 1, wherein said at least one type of said collagen alpha chain is an alpha 1 chain.
10. The method of claim 1, wherein said at least one type of said collagen alpha chain is an alpha 2 chain.
11. The method of claim 1, wherein said at least one type of said collagen alpha chain comprises a C-terminal and/or N-terminal propeptide.
12. The method of claim 1, wherein the plant is selected from the group consisting of tobacco, corn, alfalfa, rice, potato, soybean, tomato, wheat, barley, canola, carrot, and cotton.
13. The method of claim 1, wherein said at least one type of said collagen alpha chain or said exogenous P4H is expressed only in a portion of said plant.
14. The method of claim 13, wherein said portion of said plant is a leaf, seed, root, tuber, or stem.
15. The method of claim 1, wherein said exogenous mammalian P4H is capable of specifically hydroxylating the Y position of the Gly-X-Y triplet of said at least one type of said collagen alpha chain.
16. The method of claim 14, wherein said exogenous mammalian P4H is human P4H.
17. The method of claim 1, wherein the plant is subjected to stress conditions.
18. The method of claim 17, wherein the stress condition is selected from the group consisting of drought, salinity, injury, cold, and spraying a stress-inducing compound.
19. A nucleic acid construct comprising a polynucleotide encoding human P4H placed under the transcriptional control of a promoter functional in a plant cell, said human P4H comprising a signal peptide for targeting a vacuole.
20. The nucleic acid construct of claim 19, wherein the promoter is selected from the group consisting of CaMV35S promoter, ubiquitin promoter, rbcS promoter, and SVBV promoter.
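Claim 15 refers to hydroxylation at the Y position of the Gly-X-Y triplet. As a simplified, hypothetical illustration (not the claimed method), the sketch below walks a one-letter collagen-like sequence in Gly-X-Y register, assuming the triplet frame starts at index `start`, and reports the Y-position prolines, which are the residues that P4H converts to 4-hydroxyproline. The function name `y_position_prolines` and the example fragment are illustrative only, and the sketch ignores factors that affect real P4H activity, such as the conformation of the chain.

```python
def y_position_prolines(seq: str, start: int = 0) -> list[int]:
    """Return 0-based indices of Pro residues at the Y position of Gly-X-Y triplets.

    `seq` is a one-letter protein sequence; `start` is the index of the first
    Gly that sets the triplet register (an assumption of this sketch).
    """
    sites = []
    # Step through complete triplets only: i, i + 1 and i + 2 must all be in range.
    for i in range(start, len(seq) - 2, 3):
        if seq[i] == "G" and seq[i + 2] == "P":
            sites.append(i + 2)  # Y position of this Gly-X-Y triplet
    return sites


# Hypothetical collagen-like fragment made of the triplets GPP, GAP, GPA, GLP.
fragment = "GPPGAPGPAGLP"
print(y_position_prolines(fragment))  # [2, 5, 11]
```

The X-position prolines at indices 1 and 7 are deliberately not reported, since the claim concerns only the Y position.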
HK08104257.2A 2004-09-29 2005-09-28 Collagen producing plants and methods of generating and using same HK1114405B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US61371904P 2004-09-29 2004-09-29
US60/613,719 2004-09-29
PCT/IL2005/001045 WO2006035442A2 (en) 2004-09-29 2005-09-28 Collagen producing plants and methods of generating and using same

Publications (2)

Publication Number Publication Date
HK1114405A1 (en) 2008-10-31
HK1114405B (en) 2012-10-05
