AU595486B2

AU595486B2 - A synthetic signal sequence for the transport of proteins in expression systems

Info

Publication number: AU595486B2
Application number: AU48333/85A
Authority: AU
Inventors: Joachim Engels; Michael Leineweber; Eugen Uhlmann; Waldemar Wetekam
Original assignee: Hoechst AG
Current assignee: Hoechst AG
Priority date: 1984-10-06
Filing date: 1985-10-04
Publication date: 1990-04-05
Anticipated expiration: 2005-10-04
Also published as: GR852405B; EP0177827B1; IE852440L; DK453285A; ES547600A0; HU197355B; JPS6188883A; PH30819A; DE3587660D1; DE3436818A1; DK175511B1; DK453285D0; ATE97445T1; IE63262B1; EP0177827A3; EP0177827A2; NZ213717A; PT81253B; IL76573A; PT81253A

Abstract

The DNA of a natural signal sequence is modified by incorporation of cleavage sites for endonucleases and is thus made suitable for incorporation in any desired vectors by the building block principle. The vectors modified in this way then bring about transport of the encoded protein out of the cytoplasm.

Description

COMMONWEALTH OF AUSTRALIA
PATENTS ACT 1952-69
COMPLETE SPECIFICATION (ORIGINAL)

Name of Applicant: HOECHST AKTIENGESELLSCHAFT
Address of Applicant: 45 Bruningstrasse, D-6230 Frankfurt/Main, Federal Republic of Germany
Actual Inventors: JOACHIM ENGELS, MICHAEL LEINEWEBER, EUGEN UHLMANN and WALDEMAR WETEKAM
Address for Service: EDWD. WATERS & SONS, 50 QUEEN STREET, MELBOURNE, AUSTRALIA, 3000.

Complete Specification for the invention entitled: A SYNTHETIC SIGNAL SEQUENCE FOR THE TRANSPORT OF PROTEINS IN EXPRESSION SYSTEMS

The following statement is a full description of this invention, including the best method of performing it known to us:

HOECHST AKTIENGESELLSCHAFT HOE 84/F 238 Dr.KL/mu

A synthetic signal sequence for the transport of proteins in expression systems

In the cell, proteins are synthesized on the ribosomes, which are located in the cytoplasm. Proteins which are transported out of the cytoplasm carry at the amino terminal end a relatively short peptide chain which is eliminated enzymatically on passage through the cytoplasmic membrane, whereupon the mature protein is produced.

This short peptide sequence is called a "signal peptide" or a presequence or leader sequence.

The signal sequence located at the amino terminal end has already been characterized for a large number of secretory proteins. In general, it is composed of a hydrophobic region of about 10 to 20 amino acids, which is called the core and to whose amino terminal end a short peptide sequence (the pre-core) is bonded, this usually having one positively charged amino acid (or several). Between the carboxy terminal end of the hydrophobic region and the amino terminal end of the mature transported protein there is a short peptide sequence (the post-core) which contains the splice site and ensures that the spatial arrangement is favorable.
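The pre-core/core layout described above can be made visible with a simple hydropathy scan. The following is only a sketch: it uses the Kyte-Doolittle hydropathy scale, which is a standard published scale and not part of this specification, and a window width of 7 chosen arbitrarily for illustration. The input is the alkaline phosphatase presequence given later in this specification, written in one-letter code.

```python
# Kyte-Doolittle hydropathy values (standard published scale; an assumption
# brought in for illustration, not taken from this specification).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Presequence of E. coli alkaline phosphatase as listed in this specification.
PRESEQUENCE = "MKQSTIALALLPLLFTPVTKA"


def window_means(seq, width=7):
    """Mean hydropathy of every window of the given width."""
    return [
        sum(KD[aa] for aa in seq[i:i + width]) / width
        for i in range(len(seq) - width + 1)
    ]


def most_hydrophobic_window(seq, width=7):
    """Return (1-based start, window sequence, mean hydropathy)."""
    means = window_means(seq, width)
    start = max(range(len(means)), key=means.__getitem__)
    return start + 1, seq[start:start + width], means[start]


def positive_residues(seq):
    """1-based positions of Lys and Arg residues."""
    return [i + 1 for i, aa in enumerate(seq) if aa in "KR"]
```

Calling `most_hydrophobic_window(PRESEQUENCE)` locates the hydrophobic core, and `positive_residues(PRESEQUENCE)` lists the positively charged residues that flank it.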

It is known, from U.S. Patent 4,411,994, to couple the gene for a protein which is to be expressed with a bacterial gene which codes for an extracellular or periplasmic carrier protein in order thus to bring about the transport of the desired protein out of the cytoplasm. It is necessary for this process to isolate a bacterial gene, which is intrinsic to the host, for a periplasmic, outer membrane protein or an extracellular protein. This gene is then cut with a restriction enzyme, the gene for the protein which is to be transported is inserted into the cut which has been produced, and the host cell is transformed with a vector which contains the fusion gene thus formed.

The isolation of the natural gene and its characterization for the selection of suitable cleavage sites is extremely complex. This complexity is avoided according to the invention by making use of a synthetic signal sequence.

Thus the invention relates to a synthetic signal sequence for the transport of proteins in expression systems, which comprises DNA essentially corresponding to a natural signal sequence but having one or more cleavage sites for endonucleases which are not present in the natural DNA. Further aspects of the invention and preferred embodiments are presented below or are set out in the patent claims.

The DNA should "essentially" correspond to that of a natural signal sequence. This is to be understood to mean that the expressed signal peptide is substantially or completely identical to the natural signal peptide; in the latter case, therefore, the only difference existing at the DNA level is that the synthetic DNA has at least one cleavage site that the natural DNA sequence does not contain. This incorporation of the cleavage site according to the invention thus means that there is a more or less extensive difference from the natural sequence, it being necessary under certain circumstances to have recourse to codons which are known to be less preferred by the particular host organism. However, surprisingly, this is not associated with any expression disadvantage.
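The idea of adding a cleavage site without changing the encoded peptide can be illustrated with a single silent codon change. In the sketch below, the "before" codons are a hypothetical starting point chosen for illustration (they are not asserted to be the natural sequence), while the "after" codons are those found at triplets 16 to 18 of DNA sequence I. Only the four codons needed here are included in the lookup table.

```python
# Minimal codon lookup covering only the codons used in this example.
CODONS = {"ACC": "Thr", "CCT": "Pro", "CCG": "Pro", "GTG": "Val"}

HPA_II_SITE = "CCGG"  # recognition sequence of Hpa II


def translate(dna):
    """Translate a DNA string whose length is a multiple of three."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]


before = "ACCCCTGTG"  # hypothetical starting codons: Thr-Pro-Val
after = "ACCCCGGTG"   # triplets 16-18 of DNA sequence I: Thr-Pro-Val

same_peptide = translate(before) == translate(after)
site_created = HPA_II_SITE in after and HPA_II_SITE not in before
```

Both `same_peptide` and `site_created` evaluate to true for this pair: the third base of the proline codon changes, the peptide does not, and a new Hpa II recognition sequence spans the junction of the proline and valine codons.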

On the contrary, the specific "making to measure" of the synthetic gene is associated with so many advantages that any disadvantage owing to the use of "unnatural" codons is, in general, overcompensated by far. In fact, it has emerged that replacement of the start codon GTG, which occurs in the gene for alkaline phosphatase in E. coli, by ATG leads to a great increase in expression. A particular advantage of the invention is that the host cell has to produce less ballast protein because the gene which is to be expressed can be directly linked to the 3' end of the synthetic DNA signal sequence. Furthermore, advantages accrue in so far as it is possible in the construction of the synthetic DNA to provide DNA sequences, which protrude at the ends, for certain restriction recognition sites which allow cloning of this sequence and, in the case of disparate recognition sites, permit defined incorporation into a cloning vector. This makes possible incorporation into any desired vectors by the "modular construction principle".

Internal recognition sites for restriction enzymes permit any desired homologous or heterologous genes to be coupled on in the correct reading frame. It is also possible via these internal cleavage sites to introduce in a straightforward manner modifications in the DNA of the signal sequences, which lead to presequences which do not occur in nature.

These internal cleavage sites are advantageously placed in the regions upstream and downstream of the hydrophobic region, in particular in the post-core region, it being possible to modify the splice site and/or its adjacent region. Of course, it is also possible to modify the core region in a manner known per se.

Taking known rules into account (von Heijne, J. Mol. Biol. 173 (1984) 243-251), it is possible, via suitable cleavage sites in the gene section which codes for the carboxy terminal part of the prepeptide, to plan the signal peptidase splice site in such a manner that there is expression not of a fusion protein but directly of the desired, generally eukaryotic, peptide in its natural form. In general, genes of natural origin do not allow processing of this type.

Suitable signal sequences are in principle all signal sequences known from the literature (Watson, Nucleic Acids Res. 12 (1984), 5145-5164), modifications thereof and "idealized" signal sequences derived therefrom (Perlman and H.O. Halvorson, J. Mol. Biol. 167 (1983), 391-409).

Preferred host organisms are E. coli, Streptomyces, Staphylococcus species, such as S. aureus, Bacillus species, such as B. subtilis, B. amyloliquefaciens, B. cereus or B. licheniformis, Pseudomonas, Saccharomyces, Spodoptera frugiperda and cell lines of higher organisms, such as plant or animal cells.

In principle, it is possible to obtain by transport expression all those proteins of prokaryotic or eukaryotic origin which can pass through the membrane. However, peptide products which are of pharmaceutical significance, such as hormones, lymphokines, interferons, blood-coagulation factors and vaccines, which in nature are also coded as peptides with an amino-terminal presequence, are preferred. However, in the prokaryotic host organisms this eukaryotic presequence is not, as a rule, eliminated by the signal peptidases intrinsic to the host.

In E. coli, the genes for the periplasmic and outer membrane proteins are suitable for transport expression, the former directing the product into the periplasm whereas the latter tend to direct it onto the outer membrane.

The example which is given is the DNA signal sequence of the periplasmic protein alkaline phosphatase, which is very readily expressed in E. coli, but there is no intention to restrict the invention to this.

The presequence including the first twenty amino acids of alkaline phosphatase of E. coli is shown below:

 1  Met-Lys-Gln-Ser-Thr-Ile-Ala-Leu-Ala-Leu-
11  Leu-Pro-Leu-Leu-Phe-Thr-Pro-Val-Thr-Lys-
21  Ala-Arg-Thr-Pro-Glu-Met-Pro-Val-Leu-Glu-
31  Asn-Arg-Ala-Ala-Gln-Gly-Asn-Ile-Thr-Ala-
41  Pro

(preferred splice site of the signal peptidase: between Ala 21 and Arg 22)

It has emerged that up to about 40, usually about [number illegible], additional amino acids of the mature protein suffice for correct processing. However, in many cases fewer additional amino acids also suffice, for example about [number illegible], advantageously about 5. Since a shorter protein chain means less stress on the protein biosynthesis system of the host cell, an advantageous embodiment of the invention is set out in DNA sequence I (appendix), which codes for the presequence of alkaline phosphatase and an additional 5 amino acids of the mature protein. Apart from a few triplet modifications, namely those which introduce unique restriction enzyme cleavage sites and replace the start codon GTG by ATG, DNA sequence I corresponds to the natural sequence for alkaline phosphatase. At the ends of the coding strand are located protruding DNA sequences corresponding to the restriction endonuclease EcoR I, which permit incorporation into conventional cloning vectors, for example the commercially available plasmids such as pBR 322, pUC 8 or pUC 12. In addition, a number of other unique cleavage sites for restriction enzymes have been incorporated within the gene of DNA sequence I, and these, on the one hand, make it possible to couple heterologous genes onto the correct site and in the desired reading frame and, on the other hand, permit modifications to be carried out:

Restriction enzyme    Cut after nucleotide No. (in the coding strand)
Sau 3 A               19
Pvu I                 22
Hpa II                54  (present in the natural gene)
Nci I                 54  (present in the natural gene)
Alu I                 66
Hph I                 68
Ava II                [number illegible]

Of course, it is also possible to construct the protruding sequences in such a manner that they correspond to different restriction enzymes, and this then permits incorporation into suitable vectors in a defined orientation. In this context, the expert will give consideration to whether the complexity associated with the construction of the gene and its specific incorporation is more important than the additional work of selection associated with incorporation in both orientations when the protruding ends are identical.
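The cut positions listed for DNA sequence I can be cross-checked against its coding strand with a small search routine. This is a sketch under stated assumptions: the coding strand is transcribed from the appendix, the recognition sequences and cut offsets are the generally known ones for these enzymes (they are not given in this specification), and only the coding strand is searched, which matters for the non-palindromic Hph I site.

```python
import re

# Coding strand of DNA sequence I (84 nucleotides), transcribed from the appendix.
SEQ_I = (
    "AATTC"
    "ATGAAACAAAGCACGATCGCACTG"
    "GCACTCTTACCGTTACTGTTTACCCCG"
    "GTGACAAAAGCTCGGACCCCAGAAATG"
    "G"
)

# name: (recognition sequence as a regex, nucleotides from site start to the cut).
# These values are general knowledge about the enzymes, not taken from the text.
ENZYMES = {
    "Sau 3 A": ("GATC", 0),
    "Pvu I": ("CGATCG", 4),
    "Hpa II": ("CCGG", 1),
    "Nci I": ("CC[CG]GG", 2),
    "Alu I": ("AGCT", 2),
    "Hph I": ("GGTGA", 13),  # cuts 8 nucleotides past the 5-base site
    "Ava II": ("GG[AT]CC", 1),
}


def cut_positions(seq):
    """Map each enzyme to the list of 'cut after nucleotide No.' values."""
    result = {}
    for name, (pattern, offset) in ENZYMES.items():
        # A lookahead is used so that overlapping sites are not missed.
        hits = re.finditer("(?=(" + pattern + "))", seq)
        result[name] = [m.start() + offset for m in hits]
    return result
```

`cut_positions(SEQ_I)` returns one position per enzyme, which is consistent with the statement that these sites are unique within DNA sequence I. It also yields a position for Ava II, whose entry is not legible in the table.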

DNA sequence I can be constructed of 6 oligonucleotides 26 to 31 bases in length by first synthesizing them chemically and then linking them enzymatically via sticky ends of 6 nucleotides. Incorporation of the synthetic gene into cloning vectors, for example into the commercially available plasmids mentioned, is carried out in a manner known per se.
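One way to picture the six-oligonucleotide construction is to split the duplex so that the junctions in the two strands are staggered by 6 nucleotides. The sketch below assumes that the coding strand is divided after nucleotides 29 and 56 (Ia covers nucleotides 1 to 29 according to Example 1) and that the opposite strand is divided 6 nucleotides further along. The second assumption is an inference, not a statement of the specification, although it does reproduce the stated length range of 26 to 31 bases.

```python
# Coding strand of DNA sequence I (84 nucleotides), transcribed from the appendix.
CODING = (
    "AATTC"
    "ATGAAACAAAGCACGATCGCACTG"
    "GCACTCTTACCGTTACTGTTTACCCCG"
    "GTGACAAAAGCTCGGACCCCAGAAATG"
    "G"
)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_duplex(coding, top_cuts=(29, 56), stagger=6, overhang="AATT"):
    """Split a duplex with EcoR I-type 5' overhangs into six oligonucleotides.

    The stagger value and the resulting lower-strand boundaries are assumptions.
    Returns (upper oligos, lower oligos), all written 5'->3'.
    """
    top = [coding[a:b] for a, b in zip((0,) + top_cuts, top_cuts + (len(coding),))]
    # The lower strand pairs with the coding strand from nucleotide 5 onward
    # and carries its own 4-nucleotide overhang at its 5' end.
    bounds = (len(overhang),) + tuple(c + stagger for c in top_cuts) + (len(coding),)
    bottom = [revcomp(coding[a:b]) for a, b in zip(bounds, bounds[1:])]
    bottom[-1] = overhang + bottom[-1]
    return top, bottom


upper, lower = split_duplex(CODING)
upper_lengths = [len(o) for o in upper]  # Ia, Ic, Ie
lower_lengths = [len(o) for o in lower]  # Ib, Id, If (assumed assignment)
```

With these assumptions every oligonucleotide falls between 26 and 31 bases, and each junction in one strand is bridged by 6 paired nucleotides of the other.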

As an example for the expression of a eukaryotic gene in E. coli using a presequence according to the invention, the synthesis of monkey proinsulin is described below: a DNA sequence is constructed in which the DNA sequence I, followed by the proinsulin gene (Wetekam et al., Gene 19 (1982) 179-183), is located on a connecting recognition site for EcoR I and downstream of a chemically synthesized regulation region, composed of a bacterial promoter, a lac operator and a ribosomal binding site (German Patent Application P 34 30 683.8), and 6 to 14 nucleotides away from the ribosomal binding site. The expressed proinsulin fusion peptide contains an additional 9 amino acids on the amino terminal end, and these can be eliminated enzymatically or chemically.

The incorporation of the synthetic gene into pUC 8 and the construction of expression plasmids which contain the eukaryotic genes coupled to DNA sequence I are carried out in a manner known per se. In this context, reference may be made to the textbook by Maniatis (Molecular Cloning, Maniatis et al., Cold Spring Harbor, 1982). The transformation of the hybrid plasmids thus obtained into suitable host organisms, advantageously E. coli, is likewise known per se and is described in detail in the abovementioned textbook. The isolation of the expressed proteins and their purification is likewise described.

In the examples which follow, some more embodiments of the invention are specifically illustrated, from which the large number of possible modifications (and combinations) is evident to the expert. Unless otherwise specified, percentage data in these examples relate to weight.

Examples

1. Chemical synthesis of a single-stranded oligonucleotide

The synthesis of the structural units of the gene is illustrated by the example of structural unit Ia of the gene, which comprises nucleotides 1 to 29 of the coding strand. The nucleoside at the 3' end, in the present case therefore guanosine (nucleotide No. 29), is covalently bonded via the 3'-hydroxy group, by known methods (Gait et al., Nucleic Acids Res. 8 (1980) 1081-1096), to silica gel (FRACTOSIL, supplied by Merck).

For this purpose, first the silica gel is reacted with 3-triethoxysilylpropylamine with elimination of ethanol and formation of a Si-O-Si bond. The guanosine is reacted as the N2-isobutyryl-3'-O-succinoyl-5'-dimethoxytrityl ether with the modified carrier in the presence of para-nitrophenol and N,N'-dicyclohexylcarbodiimide, the free carboxy group of the succinoyl group acylating the amino radical of the propylamine group.

In the synthetic steps which follow, the base component is used as the monomethyl ester of the tritylnucleoside-3'-phosphorous acid dialkylamide or chloride, the adenine being in the form of the N6-benzoyl compound, the cytosine being in the form of the N4-benzoyl compound, the guanine being in the form of the N2-isobutyryl compound, and the thymine, which contains no amino group, being without a protective group.

[amount illegible] mg of the polymeric carrier containing 2 µmol of bound guanosine are treated successively with the following agents:

a) nitromethane
b) saturated zinc bromide solution in nitromethane containing 1% water
c) methanol
d) tetrahydrofuran
e) acetonitrile
f) 40 µmol of the appropriate nucleoside phosphite and 200 µmol of tetrazole in 0.5 ml of anhydrous acetonitrile (5 minutes)
g) 20% acetic anhydride in tetrahydrofuran containing lutidine and 10% dimethylaminopyridine (2 minutes)
h) tetrahydrofuran
i) tetrahydrofuran containing 20% water and 40% lutidine
j) 3% iodine in collidine/water/tetrahydrofuran in the ratio by volume 5:4:1
k) tetrahydrofuran and
l) methanol.

In this context, the term "phosphite" is to be understood to be the monomethyl ester of the deoxyribose-3'-monophosphorous acid, the third valency being saturated by chloride or a tertiary amino group, for example a morpholino radical. The yields in each synthetic step can be determined after the detritylation reaction in each case by spectrophotometry, measuring the absorption of the dimethoxytrityl cation at a wavelength of 496 nm.
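The spectrophotometric yield determination described above lends itself to a short calculation: the ratio of successive trityl absorbances estimates the yield of each coupling, and the overall yield of full-length product is the product of the stepwise yields. The absorbance readings and the 97% stepwise yield used below are hypothetical values for illustration only; the specification reports no such figures.

```python
def stepwise_yields(absorbances):
    """Yield of each coupling as the ratio of successive trityl absorbances."""
    return [later / earlier for earlier, later in zip(absorbances, absorbances[1:])]


def overall_yield(step_yield, n_couplings):
    """Fraction of chains that have undergone every coupling successfully."""
    return step_yield ** n_couplings


# Hypothetical readings after three successive detritylations.
readings = [1.00, 0.98, 0.95]
yields = stepwise_yields(readings)

# Structural unit Ia has 29 nucleotides; the first is bound to the carrier,
# so 28 coupling steps follow. The 0.97 stepwise yield is an assumption.
full_length_fraction = overall_yield(0.97, 28)
```

Under the assumed 97% stepwise yield, somewhat less than half of the carrier-bound chains would reach full length, which illustrates why the crude product is purified afterwards.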

When the synthesis of the oligonucleotide is complete, the methyl phosphate protective groups on the oligomer are eliminated using p-thiocresol and triethylamine. The oligonucleotide is then removed from the solid carrier by treatment with ammonia for 3 hours. Treatment of the oligomers with concentrated ammonia for 2 to 3 days quantitatively eliminates the amino protective groups on the bases. The crude product thus obtained is purified by high-pressure liquid chromatography (HPLC) or by polyacrylamide gel electrophoresis.

The other structural units Ib to If of the gene are synthesized entirely correspondingly, their nucleotide sequences being evident from DNA sequence II (appendix).

2. Enzymatic linkage of the single-stranded oligonucleotides to give DNA sequence I

The terminal oligonucleotides Ia and If are not phosphorylated. This prevents oligomerization via the protruding ends. For the phosphorylation of oligonucleotides Ib, Ic, Id and Ie, in each case 1 nmol of these compounds is treated with 5 nmol of adenosine triphosphate and 4 units of T4 polynucleotide kinase in 20 µl of 50 mM tris.HCl buffer (pH [value illegible]; 10 mM magnesium chloride and 10 mM dithiothreitol (DTT)) at 37°C for 30 minutes. The enzyme is inactivated by heating at 95°C for 5 minutes. The oligonucleotides Ia to If are then combined and hybridized to give the double strand by heating them in a 20 mM KCl solution and then slowly (over the course of 2 hours) cooling to 16°C. The ligation to give the DNA fragment according to DNA sequence I is carried out by reaction in 40 µl of 50 mM tris.HCl buffer (20 mM magnesium chloride and 10 mM DTT) using 100 units of T4 DNA ligase, at 15°C over the course of 18 hours.
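The amounts quoted for the kinase reaction translate into concentrations by simple arithmetic, sketched here. The helper name is introduced only for this illustration, and the volume is taken to be 20 microlitres as read from the text.

```python
def micromolar(nmol, microlitres):
    """Concentration in micromolar: nmol per microlitre equals mmol per litre."""
    return nmol * 1000.0 / microlitres


KINASE_VOLUME_UL = 20.0  # reaction volume in microlitres, as read from the text

oligo_uM = micromolar(1.0, KINASE_VOLUME_UL)  # 1 nmol of oligonucleotide
atp_uM = micromolar(5.0, KINASE_VOLUME_UL)    # 5 nmol of adenosine triphosphate
atp_excess = atp_uM / oligo_uM                # molar excess of ATP over 5' ends
```

The oligonucleotide is thus present at 50 micromolar and ATP at 250 micromolar, a fivefold molar excess over the 5' ends to be phosphorylated.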

The purification of the gene fragment is carried out by gel electrophoresis on a 10% polyacrylamide gel (without addition of urea, 20 x 40 cm, 1 mm thick), the marker substance used being ΦX 174 DNA (supplied by BRL) cut with Hinf I, or pBR 322 cut with Hae III.

3. Incorporation of the gene fragment in pUC 8

The commercially available plasmid pUC 8 is opened in a known manner and in accordance with the manufacturer's data using the restriction endonuclease EcoR I. The digestion mixture is fractionated by electrophoresis on a polyacrylamide gel in a known manner, and the DNA is visualized by staining with ethidium bromide or by radioactive labeling ("nick translation" method of Maniatis, loc. cit.). The plasmid band is then cut out of the acrylamide gel and separated from the polyacrylamide by electrophoresis.

4. Incorporation of DNA sequence I into an expression plasmid

The expression plasmid pWI 6 having the information for monkey proinsulin is constructed as follows:

10 µg of the plasmid pBR 322 are digested with the restriction endonucleases EcoR I and Pvu II and then the EcoR I cleavage site is filled in a fill-in reaction using Klenow polymerase. Following fractionation by gel electrophoresis in a 5% polyacrylamide gel, the plasmid fragment of length 2293 Bp can be obtained by electroelution (Figure 1).

The monkey preproinsulin DNA integrated in the plasmid pBR 322 (Wetekam et al., Gene 19 (1982) 179-183) is isolated by digestion using the restriction endonucleases Hind III and Mst I (as a fragment of about 1250 Bp) and recloned into the plasmid pUC 9 as follows: the plasmid pUC 9 is cleaved with the enzyme Bam HI, the cleavage site is filled in a standard fill-in reaction using Klenow polymerase ("large fragment"), subsequent cleavage with the restriction enzyme Hind III is carried out, and the DNA is separated from the other DNA fragments by gel electrophoresis in a 5% polyacrylamide gel. The isolated insulin DNA fragment of length about 1250 Bp is integrated into the opened plasmid.

To remove the untranslated region and the presequence, the pUC 9 plasmid thus modified is digested with Hae III, and the fragment of length 143 Bp is digested with Bal 31 under limiting enzyme conditions to eliminate the last two nucleotides from the presequence. This results in the first codon on the amino terminal end being TTT, which represents phenylalanine as the first amino acid of the B chain.

An adaptor which is specific for Eco RI is now ligated onto this fragment in a blunt-end ligation reaction:

a) 5' AAT TAT GAA TTC GCA ATG 3'
   3'      TA CTT AAG CGT TAC 5'
   (Eco RI)

b) 5' AAT TAT GAA TTC GCA AGA 3'
   3'      TA CTT AAG CGT TCT 5'
   (Eco RI)

In order to prevent polymerization of the adaptors they are used unphosphorylated in the ligation reaction (this being indicated in the figures by Eco RI", in the same way as recognition sequences inactivated by, for example, filling in). The adaptor a) has a codon for methionine at the end, and the adaptor b) has a codon for arginine.

Thus, the gene product obtained by variant a) is amenable to removal of the bacterial contribution by cleavage with cyanogen bromide, whereas variant b) allows trypsin cleavage.
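The link between the adaptor variant and the cleavage method follows from the last codon of each adaptor. The sketch below checks that the two strands of each adaptor are complementary, reads the final codon, and maps it to the reagent named in the text. The codon lookup is deliberately limited to the two codons needed, and the function names are introduced only for this illustration.

```python
# Upper strands (5'->3') and lower strands (written 3'->5', as in the text).
UPPER = {"a": "AATTATGAATTCGCAATG", "b": "AATTATGAATTCGCAAGA"}
LOWER = {"a": "TACTTAAGCGTTAC", "b": "TACTTAAGCGTTCT"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the two codons that terminate the adaptors are needed here.
LAST_CODON = {"ATG": "Met", "AGA": "Arg"}
REAGENT = {"Met": "cyanogen bromide", "Arg": "trypsin"}


def strands_pair(variant, overhang=4):
    """True if the lower strand complements the upper strand past the overhang."""
    return UPPER[variant][overhang:].translate(_COMPLEMENT) == LOWER[variant]


def cleavage_reagent(variant):
    """Reagent that cleaves after the residue encoded by the adaptor's last codon."""
    return REAGENT[LAST_CODON[UPPER[variant][-3:]]]
```

`cleavage_reagent("a")` gives cyanogen bromide and `cleavage_reagent("b")` gives trypsin, in agreement with the two variants described. Both adaptors are 18 nucleotides long in the upper strand and contain the sequence GAATTC.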

The ligation product is digested with Mbo II. After fractionation by gel electrophoresis, a DNA fragment of length 79 Bp having the information for amino acids Nos. 1 to 21 of the B chain is obtained.

The gene for the remaining information for the proinsulin molecule (including a G-C sequence from the cloning and 21 Bp from the pBR 322 connected to the stop codon) is obtained from the pUC 9 plasmid having the complete information for monkey preproinsulin by digestion with Mbo II/Sma I and isolation of a DNA fragment of length about 240 Bp. The correct ligation product of length about 320 Bp (including the adaptor of 18 Bp) is obtained by ligation of the two proinsulin fragments. This proinsulin DNA fragment thus constructed can now be ligated together with a regulation region via the Eco RI negative cleavage site.

Figure 2 shows the entire reaction sequence, where A, B and C denote the DNA for the particular peptide chains of the proinsulin molecule, Ad denotes the (dephosphorylated) adaptor (a or b) and Pre denotes the DNA for the presequence of monkey preproinsulin.

A chemically synthesized regulation region composed of a recognition sequence for Bam HI, the lac operator (O), a bacterial promoter (P) and a ribosomal binding site (RB), and having an ATG start codon, 6 to 14 nucleotides away from the RB and having a connected recognition sequence for Eco RI (Figure 3) is ligated, via the common Eco RI overlapping region, with the proinsulin gene fragment obtained according to the previous example. It is advantageous to choose the following synthetic regulation region (DNA sequence IIa from Table 2, corresponding to German Patent Application P 34 30 683.8):

(Bam HI)
5'  [upper strand illegible]
3'  GATTTATTTAAGAACTGTAAAAAATTT

P
    TAATTTGGTATAATGTGTGGAATTGTGAGCG
    ATTAAACCATATTACACACCTTAACACTCGC

O
    GAATAACAATTTCACAGAGGATCTAG      3'
    CTTATTGTTAAAGTGTCTCCTAGATCTTAA  5'
                      RB    (Eco RI)

The other synthetic regulation regions specified in Table 2 can be used likewise. However, it is also possible to choose a natural or derived (Perlman et al., loc. cit.) signal sequence known from the literature.
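The two strands listed for the regulation region can be checked against each other by complementation. The sketch below does this for the two segments whose upper and lower strands are both legible and, as a derived value only, computes the upper strand that would pair with the first lower-strand segment. The segment names are introduced here for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand):
    """Base-by-base complement, keeping the written order."""
    return strand.translate(_COMPLEMENT)


# Strands as printed: upper strands 5'->3', lower strands 3'->5'.
LOWER_1 = "GATTTATTTAAGAACTGTAAAAAATTT"
UPPER_2 = "TAATTTGGTATAATGTGTGGAATTGTGAGCG"
LOWER_2 = "ATTAAACCATATTACACACCTTAACACTCGC"
UPPER_3 = "GAATAACAATTTCACAGAGGATCTAG"
LOWER_3 = "CTTATTGTTAAAGTGTCTCCTAGATCTTAA"


def paired_and_overhang(upper, lower):
    """Return (strands are complementary, unpaired remainder of the lower strand)."""
    n = len(upper)
    return complement(upper) == lower[:n], lower[n:]


# Derived by complementation; this upper strand is not legible in the text itself.
derived_upper_1 = complement(LOWER_1)
```

The second segment pairs completely, and the third leaves a four-nucleotide unpaired end on the lower strand, as expected for the Eco RI overlapping region. The derived upper strand of the first segment agrees with the start of the coding strand given in Table 2.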

TABLE 2

Synthetic regulation region (coding strand):

5' GGATCCTAAATAAATTCTTGACATTT(1)TTAA(2)TAATTTGGTATAATGT(3)T(4)GAATTG(5)GAGCG(6)T(7)ACAATT(8)C(9)C(10)G(11)G(12)T(13)TA(14)TT(15) (ATG) 3'

 1 = T or G
 2 = A or C
 3 = G or C
 4 = G or A
 5 = T or C
 6 = GA or GAA
 7 = A or C
 8 = T or direct bond
 9 = A or TAGA
10 = TTTAAA, AAGCTT or AAGCTA
11 = AG or GA
12 = A or G
13 = C or T
14 = GAA or AGC
15 = C or direct bond

DNA sequences IIa-h:

        13    14     15
IIa     C     GAA    C
IIb     C     GAA    C
IIc     C     GAA    C
IId     C     GAA    C
IIe     C     GAA    C
IIf     C     GAA    C
IIg     C     GAA    C
IIh     T     AGC

Entries for the remaining variables, as far as legible: 3 = G; 4 = G; 9 = A; 6: GAA, C, GA; 10: A, TTTAAA, AACGCTT, AAGCTT, AAGCTA.

Following double digestion with Sma I/Bam HI and a fill-in reaction of the Bam HI cleavage site with the Klenow fragment, the ligation product (about 420 Bp) is isolated by gel electrophoresis.

The fragment thus obtained can then, by a blunt-end ligation, be ligated into the pBR 322 part-plasmid of Figure 1 (Figure [number illegible]). The hybrid plasmid pWI 6 is obtained.

After transformation into the E. coli strain HB 101 and selection on ampicillin plates, the plasmid DNA of individual clones was tested for the integration of a 420 Bp fragment having the regulation region and the proinsulin gene shortened by Bal 31. In order to demonstrate the correct shortening of the proinsulin gene by Bal 31 (Figure [number illegible]), the plasmids having the integrated proinsulin gene fragment were sequenced starting from the Eco RI cleavage site. Of 60 sequenced clones, three had the desired shortening by two nucleotides (Figure 4).

1 µg of the plasmid pWI 6 is cut with the restriction enzyme Eco RI and then ligated together in the presence of 30 ng of DNA sequence I, at 16°C in 6 hours. After transformation into E. coli HB 101, plasmids are isolated from individual clones and tested for integration of DNA sequence I by means of restriction enzyme analysis. 7% of the clones contained the plasmid pWI 6 with integrated DNA sequence I.
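The masses used in this ligation can be converted to molar amounts to see how vector and insert compare. This is a rough sketch: it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common approximation, not a figure from this specification) and estimates the size of pWI 6 as the 2293 Bp pBR 322 fragment plus the ligation product of about 420 Bp.

```python
AVERAGE_BP_MASS = 650.0  # g/mol per base pair; a common approximation (assumption)


def picomoles(nanograms, base_pairs):
    """Approximate amount of double-stranded DNA in picomoles."""
    return nanograms * 1000.0 / (base_pairs * AVERAGE_BP_MASS)


# Size estimate for pWI 6 from the fragment lengths given in the text.
VECTOR_BP = 2293 + 420
INSERT_BP = 84  # length of DNA sequence I

vector_pmol = picomoles(1000.0, VECTOR_BP)  # 1 microgram of pWI 6
insert_pmol = picomoles(30.0, INSERT_BP)    # 30 ng of DNA sequence I
insert_to_vector = insert_pmol / vector_pmol
```

On these assumptions both components are present at roughly half a picomole, that is, in an approximately equimolar ratio.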

The direction of this integration reaction can be unambiguously determined by standard methods of restriction enzyme analysis via double digestion with Hind III/Pvu I. The plasmid pWI 6 having a DNA sequence I integrated in the correct direction of reading to the proinsulin gene is shown as pWIP 1 in Figure [number illegible]. This plasmid can then be transformed into various E. coli strains in order to test the synthetic capacity of the individual strains.

The expression of the presequence-proinsulin gene fusion in E. coli is determined as follows: 1 ml of a bacterial culture induced with IPTG (isopropyl β-D-thiogalactopyranoside) is stopped using PMSF (phenylmethylsulfonyl fluoride) in a final concentration of 5 x 10⁻⁴ M at an optical density OD600 of 1.0 and at an induction time of 1 hour, cooled in ice and spun down.

The cell sediment is then washed in 1 ml of buffer (10 mM tris.HCl, pH 7.6; 40 mM NaCl), spun down and resuspended in 200 µl of buffer (20% sucrose; 20 mM tris.HCl, pH [value illegible]; 2 mM EDTA), incubated at room temperature for 10 minutes, spun down and immediately resuspended in 500 µl of double-distilled H2O. After incubation in ice for 10 minutes, the shock-lysed bacteria are spun down and the supernatant is frozen. The proinsulin content of this supernatant is tested by a standard insulin RIA (Amersham).

The bacterial sediment is resuspended once more in 200 µl of lysozyme buffer (20% sucrose; 2 mg/ml lysozyme; 20 mM tris.HCl, pH 8.0; 2 mM EDTA), incubated in ice for 30 minutes, sonicated 3 x 10 seconds and then spun down.

The supernatant resulting from this is tested for the content of proinsulin ("plasma fraction") in a radioimmunoassay.

Individual bacterial clones which contain the plasmid pWIP 1 were examined for their synthetic capacity and their ability to transport the proinsulin-presequence product. It was possible to demonstrate that all the bacterial clones, as expected, transported about 90% of the produced proinsulin into the periplasmic space. About 10% of the RIA activity of proinsulin was still found in the plasma fraction.
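The transported fraction reported above is the share of total RIA activity found in the osmotic-shock supernatant. The sketch below shows the calculation with hypothetical readings in arbitrary units; the specification gives only the resulting percentages, not the underlying measurements.

```python
def periplasmic_fraction(shock_supernatant, plasma_fraction):
    """Share of total RIA activity recovered in the osmotic-shock supernatant."""
    total = shock_supernatant + plasma_fraction
    return shock_supernatant / total


# Hypothetical RIA readings (arbitrary units), chosen to mirror the reported split.
example = periplasmic_fraction(shock_supernatant=9.0, plasma_fraction=1.0)
```

With these illustrative readings `example` is 0.9, corresponding to about 90% of the proinsulin in the periplasmic space and about 10% in the plasma fraction.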

DNA sequence I

Triplet No.   Amino acid   Nucleotide No.   Coding strand   Non-coding strand
                           1-5              AA TTC               G
 1            Met          6-8              ATG             TAC
 2            Lys          9-11             AAA             TTT
 3            Gln          12-14            CAA             GTT
 4            Ser          15-17            AGC             TCG
 5            Thr          18-20            ACG             TGC
 6            Ile          21-23            ATC             TAG
 7            Ala          24-26            GCA             CGT
 8            Leu          27-29            CTG             GAC
 9            Ala          30-32            GCA             CGT
10            Leu          33-35            CTC             GAG
11            Leu          36-38            TTA             AAT
12            Pro          39-41            CCG             GGC
13            Leu          42-44            TTA             AAT
14            Leu          45-47            CTG             GAC
15            Phe          48-50            TTT             AAA
16            Thr          51-53            ACC             TGG
17            Pro          54-56            CCG             GGC
18            Val          57-59            GTG             CAC
19            Thr          60-62            ACA             TGT
20            Lys          63-65            AAA             TTT
21            Ala          66-68            GCT             CGA
22            Arg          69-71            CGG             GCC
23            Thr          72-74            ACC             TGG
24            Pro          75-77            CCA             GGT
25            Glu          78-80            GAA             CTT
26            Met          81-83            ATG             TAC
                           84               G               CTT AA

DNA sequence II:

(upper strand: oligonucleotides Ia, Ic, Ie; lower strand: oligonucleotides Ib, Id, If)

(Eco RI)
Ia  5'  AA TTC ATG AAA CAA AGC ACG ATC GCA CTG
    3'       G TAC TTT GTT TCG TGC TAG CGT GAC

Ic      GCA CTC TTA CCG TTA CTG TTT ACC CCG
        CGT GAG AAT GGC AAT GAC AAA TGG GGC

Ie      GTG ACA AAA GCT CGG ACC CCA GAA ATG G       3'
        CAC TGT TTT CGA GCC TGG GGT CTT TAC CTT AA  5'
                                            (Eco RI)
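As a cross-check on the appendix, the coding strand of DNA sequence I can be translated and compared with the presequence plus the first five amino acids of the mature protein. The sketch below builds the standard genetic code from its conventional compact representation. The coding strand is transcribed from the appendix, and the reading frame is assumed to start at the ATG at nucleotide 6.

```python
# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Coding strand of DNA sequence I (84 nucleotides), transcribed from the appendix.
SEQ_I = (
    "AATTC"
    "ATGAAACAAAGCACGATCGCACTG"
    "GCACTCTTACCGTTACTGTTTACCCCG"
    "GTGACAAAAGCTCGGACCCCAGAAATG"
    "G"
)


def translate(dna):
    """Translate complete codons of a DNA string into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 6 to 83 hold the 26 triplets of DNA sequence I.
peptide = translate(SEQ_I[5:83])
presequence, mature_start = peptide[:21], peptide[21:]
```

The translation gives the 21-residue presequence ending in Ala, followed by Arg-Thr-Pro-Glu-Met, the first five residues of the mature protein.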

Claims

1. Synthetic DNA, coding for a signal sequence effecting the transport of a protein through the cytoplasmic membrane of a transformed host cell, which DNA encodes the amino acid sequence of a natural signal sequence but contains at least 1 cleavage site for an endonuclease which is not present in the natural DNA, and wherein the said DNA further encodes 5 to 40 of the amino terminal amino acids of the structural gene pertaining to the said natural signal sequence.

2. A DNA as claimed in Claim 1, wherein the cleavage sites are located upstream and/or downstream of the DNA encoding the hydrophobic region.

3. A DNA as claimed in Claim 1, which encodes the natural signal sequence and the first 5 to 40 amino acids of alkaline phosphatase of E. coli.

4. DNA of the formula I (as hereinbefore defined).

5. A process for the expression of eukaryotic, prokaryotic or viral proteins in prokaryotic or eukaryotic host cells, which comprises coupling the gene for the protein which is to be transported onto a DNA sequence as claimed in Claim 1, incorporating this fused gene into a vector, and transforming therewith a host cell which transports the expressed protein out of the cytoplasm.

6. A process as claimed in Claim 5, wherein the synthetic DNA sequence encodes a signal protein which is intrinsic to the host.

7. Hybrid vector, comprising a DNA sequence as claimed in Claim 1.

8. Hybrid vector, which is a plasmid containing the DNA sequence I (as hereinbefore defined) inserted in an EcoRI cleavage site.

9. Host organism, transformed with a hybrid vector as claimed in Claim 7 or 8.

10. Host organism as claimed in Claim 9 which is E. coli.

DATED this 5th day of January, 1990
HOECHST AKTIENGESELLSCHAFT
WATERMARK PATENT & TRADEMARK ATTORNEYS, "The Atrium", 290 Burwood Road, Hawthorn, Victoria, 3122, AUSTRALIA