WO1997013853A2

WO1997013853A2 - Protein detection

Info

Publication number: WO1997013853A2
Application number: PCT/EP1996/004510
Authority: WO
Inventors: Henriëtte Catharina VAN DEN BROECK; Leendert Hendrik De Graaff; Jacob Visser; Albert Johannes Joseph Van Ooyen
Original assignee: Gist Brocades BV
Current assignee: DSM Delft BV
Priority date: 1995-10-13
Filing date: 1996-10-14
Publication date: 1997-04-17
Anticipated expiration: 1998-04-13
Also published as: AU7294396A; JPH10510720A; EP0796328A2; WO1997013853A3

Abstract

The present invention provides a method for identifying a DNA fragment encoding a protein of interest, wherein a cDNA library in the form of bacterial host cells transformed with DNA obtainable from a eukaryotic organism capable of producing said protein is screened for expression of said protein using a test which is indicative of its presence, characterised in that screening of the host cells is performed after the transforming DNA has become part of a plasmid. The cDNA library is preferably from a eukaryotic organism. The eukaryotic organism is preferably a fungus or a yeast cell, more preferably an Aspergillus species, and most preferably a member of the Aspergillus niger group. The enzymes screened for are preferably enzymes which show cellulase, endoxylanase or arabinoxylan degrading activity.

Description

PROTEIN DETECTION

FIELD OF INVENTION

The present invention relates to a method for identifying a DNA sequence encoding a protein of interest.

BACKGROUND

It is one of the great challenges of modern biotechnology to find new enzymes to perform certain functions. Enzymes find many applications in areas such as the food industry, detergents, starch conversion and animal feed, and as processing aids in various industries. In addition, enzymatic conversions are an alternative to chemical conversions for various tasks, such as the synthesis of some pharmaceutical compounds. In general, enzymes are becoming increasingly popular as cost-efficient, specific and environment-friendly catalysts. Thus, one of the great challenges of the enzyme industry is to find new enzymatic activities that expand the range of enzyme applications, or to find enzymes better suited to specific applications.

The quest for novel enzymes has traditionally been performed by exploring nature's diversity. Most enzymes now on the market are of microbial origin. Microbes such as bacteria, yeasts and fungi secrete enzymes with certain activities, and screening micro-organisms from various ecological niches has yielded enzymes that are applied today. This screening method, however, has several disadvantages. For example, micro-organisms found in nature usually secrete a whole array of enzymes, so the enzyme needed may represent only a small percentage of the enzymes synthesized. In addition, the micro-organism may synthesize unwanted compounds such as toxins or undesired enzymes.

Recombinant DNA technology is applied to clone genes of interest, which are subsequently introduced into suitable hosts. Traditional methods of cloning genes are tedious, as they involve purification of the encoded protein (e.g. an enzyme), partial amino acid sequencing and subsequent molecular cloning of the gene of interest. After cloning, the gene of interest can be expressed in a host suitable for large-scale fermentation. In addition to this traditional method of cloning enzymes, other methods have also been employed. Genes in genomic DNA from a specific organism can be identified by cloning in E. coli, followed by screening for a certain activity. A large number of enzymes, including cellulases and xylanases, have been identified in this way (Sashihara N et al. (1984) J. Bacteriol. 158:503-506; Fukumori F et al. (1989) Gene 76:289-298; Sakka K et al. (1989) Agric. Biol. Chem. 53:905-910; Zappe H et al. (1987) Appl. Microbiol. Biotechnol. 27:57-63; Sakka K et al. (1990) Agric. Biol. Chem. 54:337-342; Srivastava R et al. (1991) FEMS Microbiol. Lett. 78:201-206; Shendye A and Rao M (1993) FEMS Microbiol. Lett. 108:297-302). However, the method is limited to micro-organisms whose DNA can be expressed directly in E. coli. Genes from unrelated organisms, e.g. eukaryotes such as fungi and yeasts, are usually not expressed, because their regulatory sequences do not function in E. coli or because they contain introns.

To overcome some of these problems, cDNAs have been expressed in E. coli phages, giving rise to functional proteins as detected by antibodies or enzymatic activities. The use of antibodies (see for instance EP 0 506 190 and pending application EP 94202442.3), however, requires the availability of pure enzymes, whereas the enzymatic activities in phage plaques are extremely low (Karlovsky & Wolf (1993) Methods Mol. Cell. Biol. 4:40-45), making this approach unsuitable for quick detection of enzymatic activities.

A third method involves cDNA expression in yeast (WO 93/11249; Dalbøge & Heldt-Hansen (1994) Mol. Gen. Genet. 243:253-260). However, this method is inefficient because of the low percentage of colonies that produce an active enzyme: 2.5 x 10⁵ to 5 x 10⁵ clones have to be screened in this procedure. Another disadvantage is that the method requires the transfer of genetic material from E. coli into yeast before screening can take place, and then transfer from yeast back into E. coli for the rest of the cloning procedure. In addition, the levels of enzymatic activity are low.

SUMMARY OF THE INVENTION

The present invention provides a method for identifying a DNA fragment encoding a protein of interest, wherein a cDNA library in the form of bacterial host cells transformed with cDNA obtainable from a eukaryotic organism capable of producing said protein is screened for expression of said protein using a test which is indicative of its presence, characterised in that screening of the host cells is performed after the transforming DNA has become part of a plasmid. The cDNA library is preferably from a eukaryotic organism. The eukaryotic organism is preferably a fungus or a yeast cell, more preferably an Aspergillus species, and most preferably a member of the Aspergillus niger group. The enzymes screened for are preferably enzymes which show cellulase, endoxylanase or arabinoxylan degrading activity.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method for identifying a DNA fragment encoding a protein of interest. The method comprises screening a cDNA library, in the form of bacterial host cells transformed with DNA obtainable from a eukaryotic organism capable of producing the protein of interest, for expression of this protein. For this purpose a test is used which is indicative of the presence of the protein. Screening of the host cells is performed after the transforming DNA has become part of a plasmid.

In contrast to existing methods for the expression cloning of eukaryotic genes in microorganisms, this method is quick, straightforward and very efficient. Surprisingly high numbers of positive clones are found by screening 10²-10⁴ clones, rather than the 10⁵-10⁶ clones required by existing methods.

Another surprising advantage of the method is that, in contrast to most existing colony screening methods, there is no need to introduce an extra lysis step in which the contents of the bacteria, including possible gene products of interest, are released. The whole method according to the invention may be carried out in prokaryotic cells; there is no need to transfer genetic material into a eukaryotic organism at any point in the procedure.

The cDNA library is prepared from the mRNA of a eukaryotic organism of interest, preferably a fungus. Examples of fungi of particular interest are those of the genus Aspergillus, more specifically Aspergillus niger. In view of recent changes in the nomenclature of black Aspergilli, the term Aspergillus niger is herein defined as including all (black) Aspergilli that can be found in the Aspergillus niger Group as defined by Raper & Fennell (1965, In: The Genus Aspergillus, The Williams & Wilkins Company, Baltimore, pp. 293-344).

The mRNA which is used to prepare cDNA may be constitutively present or may be induced. The method according to the invention is so effective that a large percentage of colonies will produce the protein of interest. Therefore, in contrast to existing methods, this method can also be used for proteins whose mRNA levels are low.

The library is constructed in a vector. This vector may be any vehicle suitable for the transfer and/or expression of genetic material, such as plasmids and phage vectors.

If the library is constructed in a phage vector, this is preferably a ZAP™ vector or a derivative thereof, more preferably λ Uni-ZAP™ XR. The primary library obtained in this way is amplified using a suitable host cell, preferably E. coli, more preferably E. coli XL1-Blue MRF'. The phages are then converted into phagemids, preferably by superinfection with a filamentous helper phage and an E. coli host strain such as E. coli SOLR. In this way a double-stranded phagemid is created. Herein, the term 'phagemid' refers to a phage genome which has been converted into a plasmid. Since all plaques are converted into phagemids prior to screening, clones of interest with low mRNA levels are not mistakenly rejected as negative for activity. Hence, it will be clear to the skilled person that screening at the plaque stage is superfluous.

In the identification method according to the invention, the DNA of interest is identified by the expression of the protein it encodes, rather than by going through the tedious tasks of purifying the protein, amino acid sequencing and molecular cloning of the gene. One technique of expression cloning is a plate assay, wherein a cDNA library is plated onto a medium whose composition enables screening for positive colonies. Screening for the protein of interest does not require pure protein, only the availability of a suitable assay for the detection of activity. This may be a standard assay; for example, cellulase activity may be detected using an overlay containing carboxymethyl cellulose (CMC), followed by visualisation by staining with Congo Red. Similarly, arabinoxylan degrading enzymes can be detected by the precipitate they form in plates containing oat spelt xylan. Numerous suitable plate assays have been described in the literature, including the detection of amylases by clearing zones in starch using iodine, the detection of polygalacturonases by clearing zones in pectin using quaternary ammonium ions, etc. However, any suitable kind of assay which allows the identification of DNA by expression of the protein it encodes may be used in the method according to the invention. Alternatively, a purpose-made assay may be developed for the detection of protein activity, for example where a suitable standard assay is not available.

Examples of enzymes which may be detected using the method of this invention include amylases, arabinoxylan degrading enzymes, catalases, cellulases, galactanases, lipases, oxidases, pectinases, phosphatases, proteases and xylanases. Identification of clones of interest is possible without introducing an extra lysis step. The invention therefore provides a quick, efficient and straightforward method for the expression cloning of eukaryotic genes in prokaryotic microorganisms.

The present invention also provides isolated nucleic acid fragments which can be identified using the method of the invention. The nucleic acid fragments of the invention may comprise DNA or RNA, preferably DNA. The nucleic acid fragments have undergone one or more isolation steps, which may be carried out by any suitable means known in the art.

Preferred DNA fragments of the invention include those whose sequences are given in SEQ ID No. 1, 3, 5, 7 and 9. The nucleic acid fragments of the invention are not, however, limited to these preferred fragments. Rather, the invention also encompasses related nucleic acid fragments encoding polypeptides having some or all of the activity of the polypeptides encoded by the DNA sequences of SEQ ID No. 1, 3, 5, 7 and 9; i.e. sequences of the invention related to SEQ ID No. 1 or 3 encode polypeptides having (at least) CMCase activity; sequences of the invention related to SEQ ID No. 5 or 7 encode polypeptides having (at least) xylanase activity; and sequences of the invention related to SEQ ID No. 9 encode polypeptides having (at least) arabinoxylan degrading activity.

Therefore, the invention also provides variant isolated nucleic acid fragments, preferably DNA fragments, having a high degree of sequence identity with the nucleic acid sequences of SEQ ID No. 1, 3, 5, 7 or 9. Thus they are typically substantially homologous to SEQ ID No. 1, 3, 5, 7 or 9. Typically, such variant fragments have at least 80% sequence identity with SEQ ID No. 1, 3, 5, 7 or 9.
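The text does not say how sequence identity is to be calculated. As a minimal sketch, assuming the two sequences have already been aligned (equal length, gaps written as '-') and taking identity as identical non-gap positions divided by alignment length (one common convention among several), the 80% threshold for nucleic acids, and likewise the 70% threshold stated for polypeptides, can be checked as follows. The function names and the example variant are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Identical non-gap positions count as matches; the denominator is the full
    alignment length, so gaps and mismatches both lower the score. This is an
    assumed convention, not one defined in the patent.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(a: str, b: str, threshold_percent: float) -> bool:
    """True if identity reaches the threshold (e.g. 80 for DNA, 70 for protein)."""
    return percent_identity(a, b) >= threshold_percent


if __name__ == "__main__":
    reference = "ATGAAGCTCCCCGTGTCACT"  # first 20 nt of the SEQ ID No. 1 coding sequence
    variant = "ATGAAACTGCCGGTGTCACT"    # hypothetical variant with 3 substitutions
    print(percent_identity(reference, variant))              # 85.0
    print(meets_identity_threshold(reference, variant, 80))  # True
```

In practice the alignment itself would come from a separate global or local alignment step; identity figures depend on that step and on how gaps are counted, so the convention should be stated alongside any reported percentage.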

Similarly, such variant fragments may differ from SEQ ID No. 1, 3, 5, 7 or 9 by the deletion, substitution or insertion of one or more amino acids, as long as the deletion, substitution or insertion does not abolish the activity of the encoded polypeptide. Thus, the encoded polypeptide retains some or all of the activity of the polypeptide of SEQ ID No. 2, 4, 6, 8 or 10.

The invention also encompasses degenerate variants of SEQ ID No. 1, 3, 5, 7 and 9, i.e. variants encoding the polypeptides of SEQ ID No. 2, 4, 6, 8 or 10 but having a different nucleic acid sequence.
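Degenerate variants exist because several codons specify the same amino acid. The sketch below assumes the standard genetic code (the text does not name a codon table) and translates the first six codons of SEQ ID No. 1 alongside a hypothetical synonymous variant; both give the same peptide. It also derives the protein length from a 1-based CDS location such as those in the sequence listing (57..776 for SEQ ID No. 1), assuming the CDS includes its stop codon. The helper names are illustrative only.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in T, C, A, G order
# (first position varies slowest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate in frame 1, stopping at the first stop codon.

    Any incomplete trailing codon is ignored; characters other than
    A, C, G and T raise KeyError.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def protein_length_from_cds(start: int, end: int) -> int:
    """Residues encoded by a 1-based, inclusive CDS that includes its stop codon."""
    n_nt = end - start + 1
    if n_nt <= 3 or n_nt % 3:
        raise ValueError("CDS length must be a multiple of 3 and include a stop codon")
    return n_nt // 3 - 1


if __name__ == "__main__":
    original = "ATGAAGCTCCCCGTGTCA"  # first six codons of SEQ ID No. 1
    variant = "ATGAAACTGCCGGTTAGC"   # hypothetical degenerate variant
    print(translate(original), translate(variant))  # MKLPVS MKLPVS
    print(protein_length_from_cds(57, 776))         # 239
```

The CDS arithmetic is consistent with the listing: 57..776 spans 720 nucleotides, i.e. 240 codons including the TAG stop, which matches the 239 amino acids given for SEQ ID No. 2.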

Variant nucleic acid fragments of the invention may be obtained from any organism, although they are preferably obtained from fungi or yeasts, more preferably fungi of the genus Aspergillus, and most preferably A. niger var. niger, A. niger var. tubigensis or A. aculeatus.

Nucleic acids which are identified using the method according to the invention may be used for the construction of recombinant nucleic acids. Typically, these are in the form of recombinant nucleic acid vectors. Those of skill in the art will be able to prepare suitable vectors for (over)production in host cells of interest. Preferred host cells include bacterial (e.g. E. coli), yeast (e.g. K. lactis) and fungal (e.g. Aspergillus) cells.

A vector typically comprises one or more origins of replication so that it can be replicated in a host cell, such as a bacterial, yeast or fungal cell (this enables constructs to be replicated and manipulated, for example in E. coli, by standard techniques of molecular biology). A vector, especially an expression vector, also typically comprises at least the following elements, usually in a 5' to 3' arrangement: a promoter for directing expression of the nucleic acid sequence and optionally a regulator of the promoter, a transcription start site, a translational start codon, and a nucleic acid sequence of the invention.

The vector may also contain one or more selectable marker genes, for example one or more antibiotic resistance genes. Such marker genes allow identification of transformants. Optionally, the construct may also comprise an enhancer for the promoter. The vector may also comprise a polyadenylation signal, typically 3' to the nucleic acid encoding the functional polypeptide. The vector may also comprise a transcriptional terminator 3' to the sequence encoding the polypeptide of interest. The vector may also comprise one or more introns or other non-coding sequences, for example 3' to the sequence encoding the polypeptide of the invention.

In a typical vector, the nucleic acid sequence of the invention is operably linked to a promoter capable of expressing the sequence. "Operably linked" refers to a juxtaposition wherein the promoter and the nucleic acid sequence encoding the polypeptide or protein are in a relationship permitting the coding sequence to be expressed under the control of the promoter. Thus, there may be elements such as 5' non-coding sequence between the promoter and coding sequence. Such sequences can be included in the vector if they enhance or do not impair the correct control of the coding sequence by the promoter.

The invention also provides polypeptides encoded by the nucleic acids of the invention. These polypeptides, which may be partly, substantially or completely isolated, have cellulase, xylanase or arabinoxylan degrading activity, as described above. Preferred polypeptides of the invention include those whose sequences are given in SEQ ID No. 2, 4, 6, 8 or 10. However, the invention is not limited to these sequences; rather, it encompasses all polypeptides encoded by the nucleic acid fragments of the invention that have some or all of the activity, typically substantially the activity, of the polypeptide of SEQ ID No. 2, 4, 6, 8 or 10. Thus, variant polypeptides of the invention that are related to the polypeptide of SEQ ID No. 2 or 4 have cellulase activity; variant polypeptides related to the polypeptide of SEQ ID No. 6 or 8 have xylanase activity; and variant polypeptides related to the polypeptide of SEQ ID No. 10 have arabinoxylan degrading activity.

In particular, the invention provides variant polypeptides having sequences related to those of SEQ ID No. 2, 4, 6, 8 or 10. Typically, such variants have a high degree of sequence identity with SEQ ID No. 2, 4, 6, 8 or 10, for example at least 70% identity; thus, they are typically substantially homologous to the polypeptides of SEQ ID No. 2, 4, 6, 8 or 10. Similarly, variant polypeptides of the invention may differ from SEQ ID No. 2, 4, 6, 8 or 10 by the deletion, insertion or substitution of one or more amino acids, as long as the deletion, insertion or substitution does not abolish the activity of the polypeptide, as defined above.

Polypeptides of the invention may be produced by culturing suitable host cells under conditions that permit the expression of polypeptides of the invention from the recombinant nucleic acids of the invention and, optionally, recovering the polypeptide thus produced. The polypeptide may be recovered by any suitable means known in the art.

EXPERIMENTAL

Standard recombinant DNA techniques, such as bacterial growth, DNA isolation, hybridisation, restriction enzyme digestion and DNA sequencing, were carried out according to Sambrook et al. (1989), Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York.

EXAMPLES

Example I Construction of an Aspergillus niger cDNA expression library in E. coli

Example 1.1 Induction and isolation of mRNA

A. niger N400 cultures were grown for 69 and 81 h, respectively, as described in EP-A-0 463 706 but without yeast extract and with 2% of a crude wheat arabinoxylan fraction instead of oat spelt xylan, after which the mycelium was harvested by filtration and washed with sterile saline. The mycelium was subsequently frozen in liquid nitrogen and powdered using a Microdismembrator (Braun). Total RNA was isolated from the mycelial powder in accordance with the guanidinium thiocyanate/CsCl protocol described in Sambrook et al. (1989), except that the RNA was centrifuged twice through a CsCl gradient. Poly A⁺ mRNA was isolated from 5 mg of total RNA by oligo(dT)-cellulose chromatography (Aviv and Leder, 1972; Sambrook et al., 1989) with the following modifications: SDS was omitted from all solutions and the loading buffer was supplemented with 9% (v/v) dimethylsulfoxide.

Example 1.2 Construction of the cDNA library

cDNA was synthesized from 7 μg poly A⁺ mRNA and ligated into bacteriophage λ Uni-ZAP XR using the ZAP™-cDNA synthesis kit (Stratagene) according to the manufacturer's instructions. After ligation of the cDNA into Uni-ZAP XR vector arms, the phage DNA was packaged using Packagene™ extracts (Promega) according to the manufacturer's instructions. Ligation of 120 ng cDNA into 1.2 μg vector arms and subsequent packaging of the reaction mixture resulted in a primary library of 3.5 x 10⁴ recombinant phages. This primary library was amplified using E. coli XL1-Blue MRF', titrated and stored at 4°C.

Example 1.3 Conversion of phages into phagemids

Phages were propagated by plating them in NZYCM top agarose containing 0.7% agarose on 85 mm diameter NZYCM (1.5% agar) plates as described by Maniatis et al. (1982, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, p. 64), using E. coli BB4 as plating bacteria. After overnight incubation at 37°C, confluent plates were obtained, from which the phages were eluted by adding 5 ml SM buffer and storing the plate for 2 h at 4°C with intermittent shaking. After collection of the supernatant, the bacteria were removed from the solution by centrifugation at 4,000 x g at 4°C for 10 min. Chloroform (0.3%) was added to the supernatant and the number of plaque-forming units (pfu) was determined. The phage stock contained approximately 10¹⁰ pfu/ml.
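The pfu count quoted above is normally obtained by plating a dilution series and applying titre (pfu/ml) = plaques / (dilution x volume plated). A minimal sketch of that calculation follows; the plaque count, dilution and plated volume in the example are hypothetical, chosen only to land near the approximately 10¹⁰ pfu/ml reported for the stock, and the helper names are illustrative.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Titre of the undiluted stock from one countable plate.

    dilution: fraction of stock in the plated sample (1e-7 for a 10^-7 dilution).
    volume_plated_ml: volume of that dilution mixed with the plating bacteria.
    """
    if plaques < 0 or dilution <= 0 or volume_plated_ml <= 0:
        raise ValueError("invalid plaque count, dilution or volume")
    return plaques / (dilution * volume_plated_ml)


def volume_for_pfu(target_pfu: float, titer: float) -> float:
    """Volume (ml) of stock that contains target_pfu plaque-forming units."""
    if target_pfu <= 0 or titer <= 0:
        raise ValueError("target and titre must be positive")
    return target_pfu / titer


if __name__ == "__main__":
    # Hypothetical count: 100 plaques from 0.1 ml of a 10^-7 dilution.
    titer = titer_pfu_per_ml(plaques=100, dilution=1e-7, volume_plated_ml=0.1)
    print(f"{titer:.1e} pfu/ml")  # 1.0e+10 pfu/ml
```

Counting a plate with tens to a few hundred plaques keeps the estimate statistically reasonable while avoiding overlapping plaques; averaging replicate plates of the same dilution is a common refinement.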

The recombinant Uni-ZAP XR clones containing A. niger cDNA were converted to Bluescript phagemids by superinfection with the filamentous helper phage ExAssist™ and E. coli strain SOLR, both of which are included in the cDNA synthesis kit from Stratagene, according to the manufacturer's instructions. For long-term storage, a glycerol stock containing about 100 colonies per μl of suspension was stored at -80°C.

Example II Construction of an Aspergillus aculeatus cDNA expression library in E. coli

An A. aculeatus CBS 101.43 expression library was constructed in a similar way to the A. niger library of Example I. A. aculeatus CBS 101.43 was grown in minimal medium containing Timberlake trace elements, 0.1% yeast extract and 1% soy okara for periods of 40, 52, 64 and 76 h at 30°C. Minimal medium contains, per litre, 6 g NaNO₃, 1.5 g KH₂PO₄, 0.5 g KCl and 0.5 g MgSO₄. Total RNA and poly A⁺ RNA were isolated from mycelium harvested at 40 and 52 h as described in Example I. About 150,000 primary plaques from the cDNA library were amplified. Phages were stored at a concentration of 2.2 x 10⁷ pfu/ml and converted to phagemids as described in Example I. Plasmids were isolated from 24 random clones; all contained inserts ranging in size from 0.6 to 2.0 kb.

Example III Screening of a plasmid cDNA library for cellulase-producing colonies

The screening procedure is modified from Wood et al. (Methods in Enzymology 160, 59-74).

Plates contain 20 ml 2 x TY, 0.2% CMC (Sigma C-4888), 1.5% agar and 100 μg ampicillin per ml. Cells are plated in a 5 ml overlay containing about 200 colonies per plate. The overlay is kept at 50°C and contains 2 x TY, 0.2% CMC, 0.75% agar and 100 μg ampicillin per ml. Plates are then covered with 5 ml of 0.5% agarose, 0.2% CMC and 100 μg ampicillin per ml, kept at 50°C.
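The layer compositions above translate directly into weighed amounts per plate. A minimal sketch, assuming the percentages are weight per volume (grams per 100 ml) and leaving out 2 x TY because its composition is not given here; the helper names and the 60-plate batch size are hypothetical.

```python
def grams_from_percent_wv(percent_wv: float, volume_ml: float) -> float:
    """% (w/v) is grams per 100 ml."""
    return percent_wv * volume_ml / 100.0


def mg_from_ug_per_ml(conc_ug_per_ml: float, volume_ml: float) -> float:
    """Milligrams of a component dosed in micrograms per ml."""
    return conc_ug_per_ml * volume_ml / 1000.0


# Per-plate layers for the CMC assay: volume (ml) and solids in % (w/v).
# 2 x TY is omitted because its composition is not specified in the text.
LAYERS = {
    "bottom": (20.0, {"CMC": 0.2, "agar": 1.5}),
    "cell overlay": (5.0, {"CMC": 0.2, "agar": 0.75}),
    "top": (5.0, {"CMC": 0.2, "agarose": 0.5}),
}
AMPICILLIN_UG_PER_ML = 100.0


def layer_requirements(volume_ml: float, solids: dict) -> dict:
    """Grams of each solid, plus ampicillin in milligrams, for one layer."""
    needs = {name: grams_from_percent_wv(p, volume_ml) for name, p in solids.items()}
    needs["ampicillin_mg"] = mg_from_ug_per_ml(AMPICILLIN_UG_PER_ML, volume_ml)
    return needs


if __name__ == "__main__":
    n_plates = 60  # hypothetical batch size
    for layer, (volume, solids) in LAYERS.items():
        per_plate = layer_requirements(volume, solids)
        batch = {k: round(v * n_plates, 4) for k, v in per_plate.items()}
        print(layer, batch)  # solids in g, ampicillin in mg
```

Ampicillin is heat-labile, which is presumably why the overlays are held at 50°C rather than hotter before pouring; it is normally added after the agar solution has cooled.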

Plates are dried and incubated for 48 h at 37°C. Next, 5 ml 0.1% Congo Red (Aldrich no. C8,445.3) is poured onto the plates. After staining for 1-2 h, plates are destained with 5 ml 5 M NaCl for 0.5-1 h.

About 12,000 colonies from the A. niger cDNA library (Example I) were plated. Screening on CMC resulted in 89 colonies giving a halo after staining with Congo Red. Colonies were subdivided into 3 classes with a large, intermediate and small halo. From each class 3 colonies were grown, plasmids isolated and cDNAs sequenced. All contained a full-length cDNA copy. The plasmids fell into two separate classes, and a colony from each class was deposited at the CBS, Baarn, the Netherlands. A colony giving a small halo was deposited on 3 August 1995 and designated CBS 589.95. A colony giving a large halo was deposited on 21 September 1995 and designated CBS 662.95. The DNA sequences of the inserts are shown in SEQ ID No. 1 and 3, respectively, together with the amino acid sequences they encode.

Example IV Screening of a plasmid cDNA library for colonies producing arabinoxylan degrading activity

Example IV.1 Screening for endo-xylanase activity

The method is similar to that used for cellulase-producing colonies in Example III. The bottom layer contains 2 x TY, 1.5% agar and 100 μg ampicillin per ml. Cells are plated out in an overlay containing 2 x TY, 0.2% oat spelt xylan (Sigma X-0627), 0.75% agar and 100 μg ampicillin per ml. The top layer contains 0.5% agarose, 0.1% RBB-xylan (Sigma M501) and 100 μg ampicillin per ml in 25 mM phosphate buffer, pH 7.4.

About 2000 colonies from the A. niger cDNA library obtained as in Example I were plated, and 46 colonies gave a halo. After rescreening on 2 x TY plates with a 0.3% RBB-xylan top layer, 24 colonies gave a halo once more. These were analysed by restriction enzyme digestion and fell into two classes. From each class 6 colonies were partially sequenced. A clone of each cDNA type was deposited at the CBS on 3 August 1995 under the numbers CBS 590.95 and CBS 591.95. The DNA sequences of the inserts are shown in SEQ ID No. 5 and 7, respectively, together with the amino acid sequences they encode.
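The plate counts in Examples III and IV can be expressed as hit frequencies and set against the 10²-10⁴ clones that the description says need to be screened. The sketch below computes the observed frequency, the number of plates at about 200 colonies per plate, and, assuming each clone is independently positive with the observed frequency (the same model as the Clarke-Carbon library-coverage formula), the number of clones to screen for a chosen probability of finding at least one positive. The function names are illustrative, and the model ignores redundancy between clones.

```python
import math


def hit_frequency(positives: int, screened: int) -> float:
    """Fraction of screened colonies that scored positive."""
    if screened <= 0 or not 0 <= positives <= screened:
        raise ValueError("need 0 <= positives <= screened and screened > 0")
    return positives / screened


def plates_needed(colonies: int, colonies_per_plate: int = 200) -> int:
    """Plates required at a given plating density (about 200 in Example III)."""
    return math.ceil(colonies / colonies_per_plate)


def clones_to_screen(frequency: float, probability: float = 0.99) -> int:
    """Smallest N with 1 - (1 - frequency)**N >= probability."""
    if not 0 < frequency < 1 or not 0 < probability < 1:
        raise ValueError("frequency and probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - probability) / math.log(1 - frequency))


if __name__ == "__main__":
    screens = {
        "cellulase, Example III": (89, 12_000),                   # about 12,000 plated
        "endo-xylanase, Example IV.1 (confirmed)": (24, 2_000),  # about 2000 plated
    }
    for name, (positives, screened) in screens.items():
        f = hit_frequency(positives, screened)
        print(f"{name}: {f:.2%} positive, {plates_needed(screened)} plates, "
              f"~{clones_to_screen(f)} clones for a 99% chance of one hit")
```

Under this model a few hundred colonies suffice to find a cellulase or xylanase clone with high probability, which is consistent with the range the description gives.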

Example IV.2 Screening on oat spelt xylan

The plasmid library was also plated on minimal medium plates containing 2% oat spelt xylan. After two days at 37°C some colonies gave halos, indicating the production of endo-xylanase. Other colonies gave rise to precipitation rings around the colonies. Five of the latter colonies were partially sequenced. They were similar to the arabinoxylan degrading cDNA whose isolation is described in pending application EP 94202442.3. A colony containing an A. niger axdA cDNA was deposited at the CBS on 3 August 1995 under CBS 592.95. The DNA sequence of the insert is as shown in SEQ ID No. 9, together with the amino acid sequence it encodes.

The Examples show that the system described can be used for quick, straightforward and efficient expression cloning of eukaryotic DNA in prokaryotic organisms.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: Gist-brocades B.V

(B) STREET: Wateringseweg 1

(C) CITY: Delft

(E) COUNTRY: The Netherlands

(F) POSTAL CODE (ZIP) : 2611 XT

(ii) TITLE OF INVENTION: Enzyme detection

(iii) NUMBER OF SEQUENCES: 10

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: Patentin Release #1.0, Version #1.25 (EPO)

(v) CURRENT APPLICATION DATA: APPLICATION NUMBER

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1017 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Aspergillus niger

(B) STRAIN: N400

(C) INDIVIDUAL ISOLATE: CBS120.49

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 57..776

(D) OTHER INFORMATION: /product= "Cellulase"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GAATTCGGCA CGAGCGAATT TCCCTTGATT GCCGCTCCTC CGCTCTAACG CCCAAC 56

ATG AAG CTC CCC GTG TCA CTT GCT ATG CTT GCG GCC ACC GCC ATG GGC 104 Met Lys Leu Pro Val Ser Leu Ala Met Leu Ala Ala Thr Ala Met Gly 1 5 10 15

CAG ACG ATG TGC TCT CAA TAT GAC AGT GCC TCG AGC CCC CCG TAT TCA 152 Gln Thr Met Cys Ser Gln Tyr Asp Ser Ala Ser Ser Pro Pro Tyr Ser 20 25 30

GTG AAC CAG AAC CTC TGG GGC GAG TAC CAA GGC ACC GGC AGC CAG TGT 200 Val Asn Gln Asn Leu Trp Gly Glu Tyr Gln Gly Thr Gly Ser Gln Cys 35 40 45

GTA TAT GTC GAC AAA CTC TCC AGC AGT GGT GCA TCC TGG CAC ACC GAA 248 Val Tyr Val Asp Lys Leu Ser Ser Ser Gly Ala Ser Trp His Thr Glu 50 55 60

TGG ACC TGG AGC GGT GGT GAG GGA ACA GTG AAA AGC TAC TCT AAC TCT 296 Trp Thr Trp Ser Gly Gly Glu Gly Thr Val Lys Ser Tyr Ser Asn Ser 65 70 75 80

GGC GTT ACA TTT AAC AAG AAG CTC GTG AGT GAT GTA TCA AGC ATC CCC 344 Gly Val Thr Phe Asn Lys Lys Leu Val Ser Asp Val Ser Ser Ile Pro 85 90 95

ACC TCG GTG GAA TGG AAG CAG GAC AAC ACC AAC GTC AAC GCC GAT GTC 392 Thr Ser Val Glu Trp Lys Gln Asp Asn Thr Asn Val Asn Ala Asp Val 100 105 110

GCG TAT GAT CTT TTC ACC GCG GCG AAT GTG GAC CAT GCC ACT TCT AGC 440 Ala Tyr Asp Leu Phe Thr Ala Ala Asn Val Asp His Ala Thr Ser Ser 115 120 125

GGC GAC TAT GAA CTG ATG ATT TGG CTT GCC CGC TAC GGC AAC ATC CAG 488 Gly Asp Tyr Glu Leu Met Ile Trp Leu Ala Arg Tyr Gly Asn Ile Gln 130 135 140

CCC ATT GGC AAG CAA ATT GCC ACG GCC ACA GTG GGA GGC AAG TCC TGG 536 Pro Ile Gly Lys Gln Ile Ala Thr Ala Thr Val Gly Gly Lys Ser Trp 145 150 155 160

GAG GTG TGG TAT GGC AGC ACC ACC CAG GCC GGT GCG GAG CAG AGG ACA 584 Glu Val Trp Tyr Gly Ser Thr Thr Gln Ala Gly Ala Glu Gln Arg Thr 165 170 175

TAC AGC TTC GTG TCA GAA AGC CCT ATC AAC TCA TAC AGT GGG GAC ATC 632 Tyr Ser Phe Val Ser Glu Ser Pro Ile Asn Ser Tyr Ser Gly Asp Ile 180 185 190

AAT GCA TTT TTC AGC TAT CTC ACT CAG AAC CAA GGC TTT CCC GCC AGC 680 Asn Ala Phe Phe Ser Tyr Leu Thr Gln Asn Gln Gly Phe Pro Ala Ser 195 200 205

TCT CAG TAC TTG ATC AAT CTG CAG TTT GGA ACT GAG GCG TTC ACC GGG 728 Ser Gln Tyr Leu Ile Asn Leu Gln Phe Gly Thr Glu Ala Phe Thr Gly 210 215 220

GGC CCG GCA ACC TTC ACG GTT GAC AAC TGG ACC GCC AGT GTC AAC TAGGGTTCTA 783 Gly Pro Ala Thr Phe Thr Val Asp Asn Trp Thr Ala Ser Val Asn 225 230 235

GAAGTAGCCT TTGAGGCAGA ATCTGGGTAA ATTGACTCCA GCTCGGGAGA ATGATAGCTT 843

GTTTCTTCGT TCTGGAACGT TGGGCGTGTG AGAGCTAAAA AGTCGTACCC ACTCTGATTG 903

GAAAGACTTA TTCAACATTG GTCCTTCCCT TCTGTTGGGC AAGGCATAGT TAGTGATTAG 963

ACAAGTCAAG GTCATGGTGG ATCCCTTGTA AAAAAAAAAA AAAAAAAACT CGAG 1017

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 239 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Lys Leu Pro Val Ser Leu Ala Met Leu Ala Ala Thr Ala Met Gly 1 5 10 15

Gln Thr Met Cys Ser Gln Tyr Asp Ser Ala Ser Ser Pro Pro Tyr Ser 20 25 30

Val Asn Gln Asn Leu Trp Gly Glu Tyr Gln Gly Thr Gly Ser Gln Cys 35 40 45

Val Tyr Val Asp Lys Leu Ser Ser Ser Gly Ala Ser Trp His Thr Glu 50 55 60

Trp Thr Trp Ser Gly Gly Glu Gly Thr Val Lys Ser Tyr Ser Asn Ser 65 70 75 80

Gly Val Thr Phe Asn Lys Lys Leu Val Ser Asp Val Ser Ser Ile Pro 85 90 95

Thr Ser Val Glu Trp Lys Gln Asp Asn Thr Asn Val Asn Ala Asp Val 100 105 110

Ala Tyr Asp Leu Phe Thr Ala Ala Asn Val Asp His Ala Thr Ser Ser 115 120 125

Gly Asp Tyr Glu Leu Met Ile Trp Leu Ala Arg Tyr Gly Asn Ile Gln 130 135 140

Pro Ile Gly Lys Gln Ile Ala Thr Ala Thr Val Gly Gly Lys Ser Trp 145 150 155 160

Glu Val Trp Tyr Gly Ser Thr Thr Gln Ala Gly Ala Glu Gln Arg Thr 165 170 175

Tyr Ser Phe Val Ser Glu Ser Pro Ile Asn Ser Tyr Ser Gly Asp Ile 180 185 190

Asn Ala Phe Phe Ser Tyr Leu Thr Gln Asn Gln Gly Phe Pro Ala Ser 195 200 205

Ser Gln Tyr Leu Ile Asn Leu Gln Phe Gly Thr Glu Ala Phe Thr Gly 210 215 220

Gly Pro Ala Thr Phe Thr Val Asp Asn Trp Thr Ala Ser Val Asn 225 230 235

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1198 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Aspergillus niger

(B) STRAIN: N400

(C) INDIVIDUAL ISOLATE: CBS120.49

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 32..1027

(D) OTHER INFORMATION: /product= "Cellulase"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

GAATTCGGCA CGAGATCGAG CAGTCGTAGC G ATG AAG TTT CAG AGC ACT TTG 52

Met Lys Phe Gln Ser Thr Leu 1 5

CTT CTT GCC GCC GCG GCT GGT TCC GCG TTG GCT GTG CCT CAT GGC TCC 100 Leu Leu Ala Ala Ala Ala Gly Ser Ala Leu Ala Val Pro His Gly Ser 10 15 20

GGA CAT AAG AAG AGG GCG TCT GTG TTT GAA TGG TTC GGA TCG AAC GAG 148 Gly His Lys Lys Arg Ala Ser Val Phe Glu Trp Phe Gly Ser Asn Glu 25 30 35

TCT GGT GCT GAA TTT GGG ACC AAT ATC CCA GGC GTC TGG GGA ACC GAC 196 Ser Gly Ala Glu Phe Gly Thr Asn Ile Pro Gly Val Trp Gly Thr Asp 40 45 50 55

TAC ATC TTC CCC GAC CCC TCG ACC ATC TCT ACG TTG ATT GGC AAG GGA 244 Tyr Ile Phe Pro Asp Pro Ser Thr Ile Ser Thr Leu Ile Gly Lys Gly 60 65 70

ATG AAC TTC TTC CGC GTC CAG TTC ATG ATG GAG AGG TTG CTT CCT GAC 292 Met Asn Phe Phe Arg Val Gln Phe Met Met Glu Arg Leu Leu Pro Asp 75 80 85

TCG ATG ACT GGT TCA TAC GAC GAG GAG TAT CTG GCC AAC TTG ACG ACT 340 Ser Met Thr Gly Ser Tyr Asp Glu Glu Tyr Leu Ala Asn Leu Thr Thr 90 95 100

GTG GTG AAA GCG GTC ACG GAT GGA GGC GCG CAT GCG CTC ATC GAC CCT 388 Val Val Lys Ala Val Thr Asp Gly Gly Ala His Ala Leu Ile Asp Pro 105 110 115

CAT AAC TAT GGC AGA TAC AAC GGG GAG ATC ATC TCC AGT ACA TCG GAT 436 His Asn Tyr Gly Arg Tyr Asn Gly Glu Ile Ile Ser Ser Thr Ser Asp 120 125 130 135

TTC CAG ACT TTC TGG CAG AAT CTG GCG GGC CAG TAC AAA GAT AAC GAC 484 Phe Gln Thr Phe Trp Gln Asn Leu Ala Gly Gln Tyr Lys Asp Asn Asp 140 145 150

TTG GTC ATG TTT GAT ACC AAC AAC GAA TAC TAC GAC ATG GAC CAG GAT 532 Leu Val Met Phe Asp Thr Asn Asn Glu Tyr Tyr Asp Met Asp Gln Asp 155 160 165

CTC GTG CTG AAT CTC AAC CAA GCA GCC ATT AAC GGC ATC CGC GCT GCA 580 Leu Val Leu Asn Leu Asn Gln Ala Ala Ile Asn Gly Ile Arg Ala Ala 170 175 180

GGT GCA AGC CAG TAC ATT TTC GTC GAA GGC AAC TCC TGG ACC GGA GCT 628 Gly Ala Ser Gln Tyr Ile Phe Val Glu Gly Asn Ser Trp Thr Gly Ala 185 190 195

TGG ACA TGG GTC GAT GTC AAC GAT AAT ATG AAG AAT TTG ACC GAC CCA 676 Trp Thr Trp Val Asp Val Asn Asp Asn Met Lys Asn Leu Thr Asp Pro 200 205 210 215

GAA GAC AAG ATC GTC TAT GAA ATG CAC CAG TAC CTA GAC TCC GAC GGT 724 Glu Asp Lys Ile Val Tyr Glu Met His Gln Tyr Leu Asp Ser Asp Gly 220 225 230

TCC GGC ACT TCG GAG ACC TGT GTC TCC GGG ACA ATC GGA AAG GAG CGG 772 Ser Gly Thr Ser Glu Thr Cys Val Ser Gly Thr Ile Gly Lys Glu Arg 235 240 245

ATC ACT GAT GCT ACA CAG TGG CTC AAG GAC AAT AAG AAG GTC GGC TTC 820 Ile Thr Asp Ala Thr Gln Trp Leu Lys Asp Asn Lys Lys Val Gly Phe 250 255 260

ATC GGC GAA TAT GCC GGG GGG TCC AAT GAT GTG TGT CGG AGT GCC GTG 868 Ile Gly Glu Tyr Ala Gly Gly Ser Asn Asp Val Cys Arg Ser Ala Val 265 270 275

TCC GGG ATG CTA GAG TAC ATG GCG AAC AAC ACC GAC GTA TGG AAG GGT 916 Ser Gly Met Leu Glu Tyr Met Ala Asn Asn Thr Asp Val Trp Lys Gly 280 285 290 295

GCG TCG TGG TGG GCA GCC GGG CCA TGG TGG GGA GAC TAC ATT TTC AGC 964 Ala Ser Trp Trp Ala Ala Gly Pro Trp Trp Gly Asp Tyr Ile Phe Ser 300 305 310

CTG GAG CCC CCA GAT GGA ACT GCT TAC ACG GGT ATG CTG GAT ATC CTG 1012 Leu Glu Pro Pro Asp Gly Thr Ala Tyr Thr Gly Met Leu Asp Ile Leu 315 320 325

GAG ACG TAT CTC TGAGAACTGG GTGGGGTCGC AGATGCGGTG CGTCGGAGAA 1064

Glu Thr Tyr Leu 330

CTATACGGAG TTTCTTATCA GAGTGGACGG TGGTGGTACA GAGAGGCGTA CTAGAATGAA 1124

TTAGTGGCAG CGCACTGACT GACGTCACAA GACATTGCTT TTTTTGTGAA AAAAAAAAAA 1184

AAAAAAAACT CGAG 1198
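The three-letter residue names printed under each codon row follow the standard genetic code, so the annotation can be cross-checked mechanically. The Python sketch below is illustrative only: `CODON_TABLE`, `ONE_TO_THREE` and `translate` are hypothetical helpers, not part of the sequence listing, and the table assumes the standard nuclear code (NCBI translation table 1).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), assumed here. Codons are
# enumerated with the first base varying slowest, in T, C, A, G order;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter residue names as used in the listing.
# "Stop" is a label chosen for this sketch.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(codon_row: str) -> list[str]:
    """Translate an in-frame codon row such as 'ATG AAG TTT' to three-letter names."""
    seq = "".join(codon_row.split()).upper()  # drop the spaces used for layout
    if len(seq) % 3 != 0:
        raise ValueError(f"{len(seq)} nt is not a whole number of codons")
    return [ONE_TO_THREE[CODON_TABLE[seq[i:i + 3]]] for i in range(0, len(seq), 3)]


# First codon row of the SEQ ID NO:3 CDS (nucleotides 32..52).
print(" ".join(translate("ATG AAG TTT CAG AGC ACT TTG")))
# Last four codons plus the stop codon (nucleotides 1013..1027).
print(" ".join(translate("GAG ACG TAT CTC TGA")))
```

Run as written, this prints `Met Lys Phe Gln Ser Thr Leu` and `Glu Thr Tyr Leu Stop`, i.e. residues 1 to 7 and 328 to 331 of SEQ ID NO:4 followed by the stop codon. Applied row by row, a check of this kind flags any residue name in the annotation that does not correspond to its codon, and it works unchanged on the codon rows of SEQ ID NO:5, 7 and 9.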

(2) INFORMATION FOR SEQ ID NO:4 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 331 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4 :

Met Lys Phe Gln Ser Thr Leu Leu Leu Ala Ala Ala Ala Gly Ser Ala 1 5 10 15

Leu Ala Val Pro His Gly Ser Gly His Lys Lys Arg Ala Ser Val Phe 20 25 30

Glu Trp Phe Gly Ser Asn Glu Ser Gly Ala Glu Phe Gly Thr Asn Ile 35 40 45

Pro Gly Val Trp Gly Thr Asp Tyr Ile Phe Pro Asp Pro Ser Thr Ile 50 55 60

Ser Thr Leu Ile Gly Lys Gly Met Asn Phe Phe Arg Val Gln Phe Met 65 70 75 80

Met Glu Arg Leu Leu Pro Asp Ser Met Thr Gly Ser Tyr Asp Glu Glu 85 90 95

Tyr Leu Ala Asn Leu Thr Thr Val Val Lys Ala Val Thr Asp Gly Gly 100 105 110

Ala His Ala Leu Ile Asp Pro His Asn Tyr Gly Arg Tyr Asn Gly Glu 115 120 125

Ile Ile Ser Ser Thr Ser Asp Phe Gln Thr Phe Trp Gln Asn Leu Ala 130 135 140

Gly Gln Tyr Lys Asp Asn Asp Leu Val Met Phe Asp Thr Asn Asn Glu 145 150 155 160

Tyr Tyr Asp Met Asp Gln Asp Leu Val Leu Asn Leu Asn Gln Ala Ala 165 170 175

Ile Asn Gly Ile Arg Ala Ala Gly Ala Ser Gln Tyr Ile Phe Val Glu 180 185 190

Gly Asn Ser Trp Thr Gly Ala Trp Thr Trp Val Asp Val Asn Asp Asn 195 200 205

Met Lys Asn Leu Thr Asp Pro Glu Asp Lys Ile Val Tyr Glu Met His 210 215 220

Gln Tyr Leu Asp Ser Asp Gly Ser Gly Thr Ser Glu Thr Cys Val Ser 225 230 235 240

Gly Thr Ile Gly Lys Glu Arg Ile Thr Asp Ala Thr Gln Trp Leu Lys 245 250 255

Asp Asn Lys Lys Val Gly Phe Ile Gly Glu Tyr Ala Gly Gly Ser Asn 260 265 270

Asp Val Cys Arg Ser Ala Val Ser Gly Met Leu Glu Tyr Met Ala Asn 275 280 285

Asn Thr Asp Val Trp Lys Gly Ala Ser Trp Trp Ala Ala Gly Pro Trp 290 295 300

Trp Gly Asp Tyr Ile Phe Ser Leu Glu Pro Pro Asp Gly Thr Ala Tyr 305 310 315 320

Thr Gly Met Leu Asp Ile Leu Glu Thr Tyr Leu 325 330

(2) INFORMATION FOR SEQ ID NO:5 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 851 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Aspergillus niger

(B) STRAIN: N400

(C) INDIVIDUAL ISOLATE: CBS120.49

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 30..707

(D) OTHER INFORMATION: /product= "Xylanase"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5 :

GAATTCGGCA CGAGCATCAA TCAACAACC ATG CTC ACC AAG AAC CTT CTC CTC 53

Met Leu Thr Lys Asn Leu Leu Leu 1 5

TGC TTT GCC GCG GCT AAG GCT GCT CTG GCT GTT CCC CAC GAC TCT GTG 101 Cys Phe Ala Ala Ala Lys Ala Ala Leu Ala Val Pro His Asp Ser Val 10 15 20

GCC CAG CGT TCG GAT GCC TTG CAC ATG CTC TCT GAG CGC TCG ACC CCG 149 Ala Gln Arg Ser Asp Ala Leu His Met Leu Ser Glu Arg Ser Thr Pro 25 30 35 40

AGC TCG ACC GGC GAG AAC AAC GGC TTC TAC TAC TCC TTC TGG ACC GAC 197 Ser Ser Thr Gly Glu Asn Asn Gly Phe Tyr Tyr Ser Phe Trp Thr Asp 45 50 55

GGC GGT GGA GAC GTG ACC TAC ACC AAC GGA GAT GCT GGT GCC TAC ACT 245 Gly Gly Gly Asp Val Thr Tyr Thr Asn Gly Asp Ala Gly Ala Tyr Thr 60 65 70

GTT GAG TGG TCC AAC GTG GGC AAC TTT GTC GGT GGA AAG GGC TGG AAC 293 Val Glu Trp Ser Asn Val Gly Asn Phe Val Gly Gly Lys Gly Trp Asn 75 80 85

CCC GGA AGT GCG CAG GAC ATC ACC TAC AGC GGC ACC TTC ACC CCT AGC 341 Pro Gly Ser Ala Gln Asp Ile Thr Tyr Ser Gly Thr Phe Thr Pro Ser 90 95 100

GGC AAC GGC TAC CTC TCC GTC TAT GGC TGG ACC ACT GAC CCT CTG ATC 389 Gly Asn Gly Tyr Leu Ser Val Tyr Gly Trp Thr Thr Asp Pro Leu Ile 105 110 115 120

GAG TAC TAC ATC GTC GAG TCC TAC GGC GAC TAC AAC CCC GGC AGT GGA 437 Glu Tyr Tyr Ile Val Glu Ser Tyr Gly Asp Tyr Asn Pro Gly Ser Gly 125 130 135

GGC ACG TAC AAG GGC ACC GTC ACC TCG GAC GGA TCC GTT TAC GAT ATC 485 Gly Thr Tyr Lys Gly Thr Val Thr Ser Asp Gly Ser Val Tyr Asp Ile 140 145 150

TAC ACG GCT ACC CGT ACC AAT GCT GCT TCC ATT CAG GGA ACC GCT ACC 533 Tyr Thr Ala Thr Arg Thr Asn Ala Ala Ser Ile Gln Gly Thr Ala Thr 155 160 165

TTC ACT CAG TAC TGG TCC GTT CGC CAG AAC AAG AGA GTT GGC GGA ACC 581 Phe Thr Gln Tyr Trp Ser Val Arg Gln Asn Lys Arg Val Gly Gly Thr 170 175 180

GTT ACC ACC TCC AAC CAC TTC AAT GCT TGG GCT AAG CTG GGA ATG AAC 629 Val Thr Thr Ser Asn His Phe Asn Ala Trp Ala Lys Leu Gly Met Asn 185 190 195 200

CTG GGT ACT CAC AAC TAC CAG ATC GTG GCT ACC GAG GGT TAC CAG AGC 677 Leu Gly Thr His Asn Tyr Gln Ile Val Ala Thr Glu Gly Tyr Gln Ser 205 210 215

AGT GGA TCT TCG TCC ATC ACT GTT CGG TAAGCGGTGG AAGTGTGGAT 724

Ser Gly Ser Ser Ser Ile Thr Val Arg 220 225

TGAACGATTG TGCATGTAAT TACTGAGCAG TCGTATGATA TGTGAAACAG GTAGTTGTTT 784

GGTACCAATG TACTGGTCAT TTGGAGTGAA AAAAAAAAAA AAAAAACTCG AGGGGGGGCC 844

CGGTACC 851

(2) INFORMATION FOR SEQ ID NO:6 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 225 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6 :

Met Leu Thr Lys Asn Leu Leu Leu Cys Phe Ala Ala Ala Lys Ala Ala 1 5 10 15

Leu Ala Val Pro His Asp Ser Val Ala Gln Arg Ser Asp Ala Leu His 20 25 30

Met Leu Ser Glu Arg Ser Thr Pro Ser Ser Thr Gly Glu Asn Asn Gly 35 40 45

Phe Tyr Tyr Ser Phe Trp Thr Asp Gly Gly Gly Asp Val Thr Tyr Thr 50 55 60

Asn Gly Asp Ala Gly Ala Tyr Thr Val Glu Trp Ser Asn Val Gly Asn 65 70 75 80

Phe Val Gly Gly Lys Gly Trp Asn Pro Gly Ser Ala Gln Asp Ile Thr 85 90 95

Tyr Ser Gly Thr Phe Thr Pro Ser Gly Asn Gly Tyr Leu Ser Val Tyr 100 105 110

Gly Trp Thr Thr Asp Pro Leu Ile Glu Tyr Tyr Ile Val Glu Ser Tyr 115 120 125

Gly Asp Tyr Asn Pro Gly Ser Gly Gly Thr Tyr Lys Gly Thr Val Thr 130 135 140

Ser Asp Gly Ser Val Tyr Asp Ile Tyr Thr Ala Thr Arg Thr Asn Ala 145 150 155 160

Ala Ser Ile Gln Gly Thr Ala Thr Phe Thr Gln Tyr Trp Ser Val Arg 165 170 175

Gln Asn Lys Arg Val Gly Gly Thr Val Thr Thr Ser Asn His Phe Asn 180 185 190

Ala Trp Ala Lys Leu Gly Met Asn Leu Gly Thr His Asn Tyr Gln Ile 195 200 205

Val Ala Thr Glu Gly Tyr Gln Ser Ser Gly Ser Ser Ser Ile Thr Val 210 215 220

Arg 225

(2) INFORMATION FOR SEQ ID NO:7 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1233 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Aspergillus niger

(B) STRAIN: N400

(C) INDIVIDUAL ISOLATE: CBS120.49

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 41..1024

(D) OTHER INFORMATION: /product= "Xylanase"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7 :

GAATTCGGCA CGAGAGAAGG TATATCGTTT CTTGCCCAAC ATG GTT CAG ATC AAG 55

Met Val Gln Ile Lys 1 5

GTA GCT GCA CTG GCG ATG CTT TTC GCT AGC CAG GTA CTT TCT GAG CCC 103 Val Ala Ala Leu Ala Met Leu Phe Ala Ser Gln Val Leu Ser Glu Pro 10 15 20

ATT GAA CCC CGT CAG GCT TCA GTG AGT ATT GAT ACC AAA TTC AAG GCT 151 Ile Glu Pro Arg Gln Ala Ser Val Ser Ile Asp Thr Lys Phe Lys Ala 25 30 35

CAC GGG AAG AAA TAT CTT GGA AAC ATT GGT GAT CAG TAC ACC TTG ACC 199 His Gly Lys Lys Tyr Leu Gly Asn Ile Gly Asp Gln Tyr Thr Leu Thr 40 45 50

AAG AAC TCG AAG ACT CCG GCC ATT ATC AAG GCC GAT TTT GGC GCG TTG 247 Lys Asn Ser Lys Thr Pro Ala Ile Ile Lys Ala Asp Phe Gly Ala Leu 55 60 65

ACT CCA GAG AAC AGC ATG AAG TGG GAT GCT ACT GAA CCC AGC CGT GGA 295 Thr Pro Glu Asn Ser Met Lys Trp Asp Ala Thr Glu Pro Ser Arg Gly 70 75 80 85

CAG TTC TCT TTC TCA GGA TCG GAC TAC CTG GTC AAC TTT GCC CAG TCT 343 Gln Phe Ser Phe Ser Gly Ser Asp Tyr Leu Val Asn Phe Ala Gln Ser 90 95 100

AAC AAC AAG CTG ATC CGC GGA CAT ACT CTC GTG TGG CAC TCG CAG CTC 391 Asn Asn Lys Leu Ile Arg Gly His Thr Leu Val Trp His Ser Gln Leu 105 110 115

CCC TCC TGG GTC CAA TCC ATC ACG GAC AAG AAT ACA CTG ATC GAA GTC 439 Pro Ser Trp Val Gln Ser Ile Thr Asp Lys Asn Thr Leu Ile Glu Val 120 125 130

ATG AAG AAT CAC ATC ACC ACA GTG ATG CAA CAC TAT AAG GGC AAG ATT 487 Met Lys Asn His Ile Thr Thr Val Met Gln His Tyr Lys Gly Lys Ile 135 140 145

TAT GCC TGG GAT GTT GTC AAT GAA ATC TTC AAC GAA GAC GGC TCC CTA 535 Tyr Ala Trp Asp Val Val Asn Glu Ile Phe Asn Glu Asp Gly Ser Leu 150 155 160 165

CGC GAC AGC GTC TTT TAC AAG GTC ATC GGC GAG GAC TAC GTG CGG ATC 583 Arg Asp Ser Val Phe Tyr Lys Val Ile Gly Glu Asp Tyr Val Arg Ile 170 175 180

GCC TTC GAG ACT GCT CGG GCT GCA GAT CCC AAT GCA AAG CTC TAC ATC 631 Ala Phe Glu Thr Ala Arg Ala Ala Asp Pro Asn Ala Lys Leu Tyr Ile 185 190 195

AAT GAT TAC AAC CTG GAT TCC GCC TCC TAC CCT AAA TTG ACC GGC ATG 679 Asn Asp Tyr Asn Leu Asp Ser Ala Ser Tyr Pro Lys Leu Thr Gly Met 200 205 210

GTT AGC CAT GTC AAG AAG TGG ATC GCA GCT GGC ATC CCT ATC GAT GGA 727 Val Ser His Val Lys Lys Trp Ile Ala Ala Gly Ile Pro Ile Asp Gly 215 220 225

ATC GGT TCC CAA ACC CAC TTG AGC GCT GGT GGA GGT GCT GGA ATT TCT 775 Ile Gly Ser Gln Thr His Leu Ser Ala Gly Gly Gly Ala Gly Ile Ser 230 235 240 245

GGA GCT CTC AAT GCT CTC GCA GGT GCC GGC ACC AAG GAG ATT GCT GTC 823 Gly Ala Leu Asn Ala Leu Ala Gly Ala Gly Thr Lys Glu Ile Ala Val 250 255 260

ACC GAG CTT GAC ATC GCT GGC GCC AGC TCG ACC GAC TAC GTG GAG GTC 871 Thr Glu Leu Asp Ile Ala Gly Ala Ser Ser Thr Asp Tyr Val Glu Val 265 270 275

GTC GAA GCC TGC CTG AAC CAG CCC AAG TGT ATC GGT ATC ACC GTT TGG 919 Val Glu Ala Cys Leu Asn Gln Pro Lys Cys Ile Gly Ile Thr Val Trp 280 285 290

GGA GTT GCT GAC CCG GAT TCC TGG CGC TCC AGC TCC ACT CCT CTG CTG 967 Gly Val Ala Asp Pro Asp Ser Trp Arg Ser Ser Ser Thr Pro Leu Leu 295 300 305

TTC GAC AGC AAC TAC AAC CCG AAG CCT GCA TAC ACT GCT ATC GCA AAT 1015 Phe Asp Ser Asn Tyr Asn Pro Lys Pro Ala Tyr Thr Ala Ile Ala Asn 310 315 320 325

GCT CTC TAGATCCTGC AACCACTCCG ATCGGATTTC CGGGAAGGCA TAGCCTTATG 1071 Ala Leu

AGGTAGGGGA CGCTCTGTTG CCTGCTGCAA CTTTGACTCT GCCATATCTG CCCATAGCAA 1131

AGGGTTGTAT TTTTTTTTCC TGTACTTCTT CCACTTTTGG CCATTACAAT CGTTTCATTT 1191

CCAAAAAAAA AAAAAAAAAA ACTCGAGGGG GGGCCCGGTA CC 1233

(2) INFORMATION FOR SEQ ID NO: 8 :

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 327 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8 :

Met Val Gln Ile Lys Val Ala Ala Leu Ala Met Leu Phe Ala Ser Gln 1 5 10 15

Val Leu Ser Glu Pro Ile Glu Pro Arg Gln Ala Ser Val Ser Ile Asp 20 25 30

Thr Lys Phe Lys Ala His Gly Lys Lys Tyr Leu Gly Asn Ile Gly Asp 35 40 45

Gln Tyr Thr Leu Thr Lys Asn Ser Lys Thr Pro Ala Ile Ile Lys Ala 50 55 60

Asp Phe Gly Ala Leu Thr Pro Glu Asn Ser Met Lys Trp Asp Ala Thr 65 70 75 80

Glu Pro Ser Arg Gly Gln Phe Ser Phe Ser Gly Ser Asp Tyr Leu Val 85 90 95

Asn Phe Ala Gln Ser Asn Asn Lys Leu Ile Arg Gly His Thr Leu Val 100 105 110

Trp His Ser Gln Leu Pro Ser Trp Val Gln Ser Ile Thr Asp Lys Asn 115 120 125

Thr Leu Ile Glu Val Met Lys Asn His Ile Thr Thr Val Met Gln His 130 135 140

Tyr Lys Gly Lys Ile Tyr Ala Trp Asp Val Val Asn Glu Ile Phe Asn 145 150 155 160

Glu Asp Gly Ser Leu Arg Asp Ser Val Phe Tyr Lys Val Ile Gly Glu 165 170 175

Asp Tyr Val Arg Ile Ala Phe Glu Thr Ala Arg Ala Ala Asp Pro Asn 180 185 190

Ala Lys Leu Tyr Ile Asn Asp Tyr Asn Leu Asp Ser Ala Ser Tyr Pro 195 200 205

Lys Leu Thr Gly Met Val Ser His Val Lys Lys Trp Ile Ala Ala Gly 210 215 220

Ile Pro Ile Asp Gly Ile Gly Ser Gln Thr His Leu Ser Ala Gly Gly 225 230 235 240

Gly Ala Gly Ile Ser Gly Ala Leu Asn Ala Leu Ala Gly Ala Gly Thr 245 250 255

Lys Glu Ile Ala Val Thr Glu Leu Asp Ile Ala Gly Ala Ser Ser Thr 260 265 270

Asp Tyr Val Glu Val Val Glu Ala Cys Leu Asn Gln Pro Lys Cys Ile 275 280 285

Gly Ile Thr Val Trp Gly Val Ala Asp Pro Asp Ser Trp Arg Ser Ser 290 295 300

Ser Thr Pro Leu Leu Phe Asp Ser Asn Tyr Asn Pro Lys Pro Ala Tyr 305 310 315 320

Thr Ala Ile Ala Asn Ala Leu 325

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1276 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Aspergillus niger

(B) STRAIN: N400

(C) INDIVIDUAL ISOLATE: CBS120.49

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 72..1070

(D) OTHER INFORMATION: /product= "Arabinoxylan Degrading activity"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9 :

GAATTCGGCA CGAGACGATC CCAACCATTG ATCTCTTTTG TTTGTTCCTC AGCGGATAAA 60

GTCATACGAA A ATG AAA TTC CTC AAA GCC AAG GGT AGC TTG CTG TCG TCT 110 Met Lys Phe Leu Lys Ala Lys Gly Ser Leu Leu Ser Ser 1 5 10

GGC ATA TAC CTC ATT GCA TTG GCC CCC TTT GTC AAC GCA AAA TGC GCT 158 Gly Ile Tyr Leu Ile Ala Leu Ala Pro Phe Val Asn Ala Lys Cys Ala 15 20 25

CTT CCG TCG ACA TAT AGT TGG ACT TCG ACC GAT GCT CTC GCC ACC CCA 206 Leu Pro Ser Thr Tyr Ser Trp Thr Ser Thr Asp Ala Leu Ala Thr Pro 30 35 40 45

AAG TCC GGA TGG ACT GCA CTC AAG GAC TTC ACC GAT GTC GTC TCT AAC 254 Lys Ser Gly Trp Thr Ala Leu Lys Asp Phe Thr Asp Val Val Ser Asn 50 55 60

GGC AAA CAT ATT GTC TAT GCG TCC ACT ACC GAC ACA CAG GGA AAT TAC 302 Gly Lys His Ile Val Tyr Ala Ser Thr Thr Asp Thr Gln Gly Asn Tyr 65 70 75

GGC TCC ATG GGC TTT GGC GCC TTT TCG GAC TGG TCG GAC ATG GCA TCC 350 Gly Ser Met Gly Phe Gly Ala Phe Ser Asp Trp Ser Asp Met Ala Ser 80 85 90

GCT AGT CAA ACG GCC ACA AGC TTC AGC GCC GTA GCT CCA ACC TTG TTC 398 Ala Ser Gln Thr Ala Thr Ser Phe Ser Ala Val Ala Pro Thr Leu Phe 95 100 105

TAC TTC CAG CCA AAG AGT ATC TGG GTT CTG GCC TAC CAA TGG GGC TCC 446 Tyr Phe Gln Pro Lys Ser Ile Trp Val Leu Ala Tyr Gln Trp Gly Ser 110 115 120 125

AGC ACT TTC ACC TAC CGC ACC TCT CAA GAT CCC ACC AAT GTC AAC GGC 494 Ser Thr Phe Thr Tyr Arg Thr Ser Gln Asp Pro Thr Asn Val Asn Gly 130 135 140

TGG TCA TCC GAG CAA GCT CTT TTC ACG GGC AAA ATC AGC GGC TCA AGT 542 Trp Ser Ser Glu Gln Ala Leu Phe Thr Gly Lys Ile Ser Gly Ser Ser 145 150 155

ACC GGT GCC ATT GAT CAG ACT GTG ATT GGT GAT GAT ACG AAT ATG TAT 590 Thr Gly Ala Ile Asp Gln Thr Val Ile Gly Asp Asp Thr Asn Met Tyr 160 165 170

CTT TTC TTT GCC GGC GAC AAT GGC AAG ATC TAC CGA TCC AGC ATG TCT 638 Leu Phe Phe Ala Gly Asp Asn Gly Lys Ile Tyr Arg Ser Ser Met Ser 175 180 185

ATC AAT GAC TTC CCC GGA AGC TTC GGC AGC CAG TAC GAG GAG ATC CTC 686 Ile Asn Asp Phe Pro Gly Ser Phe Gly Ser Gln Tyr Glu Glu Ile Leu 190 195 200 205

AGC GGC GCG ACC AAC GAT TTG TTC GAG GCG GTC CAA GTG TAC ACC GTC 734 Ser Gly Ala Thr Asn Asp Leu Phe Glu Ala Val Gln Val Tyr Thr Val 210 215 220

GAC GGC GGC GAG GGT GAC AGC AAG TAC CTC ATG ATC GTC GAG GCG ATC 782 Asp Gly Gly Glu Gly Asp Ser Lys Tyr Leu Met Ile Val Glu Ala Ile 225 230 235

GGT TCC ACC GGA CAT CGT TAT TTC CGC TCC TTC ACG GCC AGC AGT CTC 830 Gly Ser Thr Gly His Arg Tyr Phe Arg Ser Phe Thr Ala Ser Ser Leu 240 245 250

GGC GGA GAG TGG ACA GCC CAG GCG GCA AGT GAA GAT CAA CCC TTC GCG 878 Gly Gly Glu Trp Thr Ala Gln Ala Ala Ser Glu Asp Gln Pro Phe Ala 255 260 265

GGC AAA GCC AAC AGT GGC GCC ACC TGG ACC GAC GAC ATC AGT CAT GGT 926 Gly Lys Ala Asn Ser Gly Ala Thr Trp Thr Asp Asp Ile Ser His Gly 270 275 280 285

GAC TTG GTT CGC AAC AAC CCT GAT CAA ACC ATG ACG GTC GAT CCT TGC 974 Asp Leu Val Arg Asn Asn Pro Asp Gln Thr Met Thr Val Asp Pro Cys 290 295 300

AAC CTC CAG CTT CTC TAC CAG GGC CAT GAC CCC AAC AGC AAT AGT GAC 1022 Asn Leu Gln Leu Leu Tyr Gln Gly His Asp Pro Asn Ser Asn Ser Asp 305 310 315

TAC AAC CTC TTG CCC TGG AAG CCA GGA GTT CTT ACC TTG AAG CAG TGAAAGGCTT 1077 Tyr Asn Leu Leu Pro Trp Lys Pro Gly Val Leu Thr Leu Lys Gln 320 325 330

ATCATTTGGT TGCAGACCGG GGTTTTCTTC CCCTTCCTTG AGTAGTATTG TTGGTGGAAG 1137

ACAGCGGGAT GGGGAGTGAA TACTATCTTG GGCTCAATTG AGGTGGAATC CTGTCAGACT 1197

GTGTACATAG GCTACATGCG AATGATTTGG TTTATTCACA AAAAAAAAAA AAAAAAAACT 1257

CGAGGGGGGG CCCGGTACC 1276

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 332 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Met Lys Phe Leu Lys Ala Lys Gly Ser Leu Leu Ser Ser Gly Ile Tyr 1 5 10 15

Leu Ile Ala Leu Ala Pro Phe Val Asn Ala Lys Cys Ala Leu Pro Ser 20 25 30

Thr Tyr Ser Trp Thr Ser Thr Asp Ala Leu Ala Thr Pro Lys Ser Gly 35 40 45

Trp Thr Ala Leu Lys Asp Phe Thr Asp Val Val Ser Asn Gly Lys His 50 55 60

Ile Val Tyr Ala Ser Thr Thr Asp Thr Gln Gly Asn Tyr Gly Ser Met 65 70 75 80

Gly Phe Gly Ala Phe Ser Asp Trp Ser Asp Met Ala Ser Ala Ser Gln 85 90 95

Thr Ala Thr Ser Phe Ser Ala Val Ala Pro Thr Leu Phe Tyr Phe Gln 100 105 110

Pro Lys Ser Ile Trp Val Leu Ala Tyr Gln Trp Gly Ser Ser Thr Phe 115 120 125

Thr Tyr Arg Thr Ser Gln Asp Pro Thr Asn Val Asn Gly Trp Ser Ser 130 135 140

Glu Gln Ala Leu Phe Thr Gly Lys Ile Ser Gly Ser Ser Thr Gly Ala 145 150 155 160

Ile Asp Gln Thr Val Ile Gly Asp Asp Thr Asn Met Tyr Leu Phe Phe 165 170 175

Ala Gly Asp Asn Gly Lys Ile Tyr Arg Ser Ser Met Ser Ile Asn Asp 180 185 190

Phe Pro Gly Ser Phe Gly Ser Gln Tyr Glu Glu Ile Leu Ser Gly Ala 195 200 205

Thr Asn Asp Leu Phe Glu Ala Val Gln Val Tyr Thr Val Asp Gly Gly 210 215 220

Glu Gly Asp Ser Lys Tyr Leu Met Ile Val Glu Ala Ile Gly Ser Thr 225 230 235 240

Gly His Arg Tyr Phe Arg Ser Phe Thr Ala Ser Ser Leu Gly Gly Glu 245 250 255

Trp Thr Ala Gln Ala Ala Ser Glu Asp Gln Pro Phe Ala Gly Lys Ala 260 265 270

Asn Ser Gly Ala Thr Trp Thr Asp Asp Ile Ser His Gly Asp Leu Val 275 280 285

Arg Asn Asn Pro Asp Gln Thr Met Thr Val Asp Pro Cys Asn Leu Gln 290 295 300

Leu Leu Tyr Gln Gly His Asp Pro Asn Ser Asn Ser Asp Tyr Asn Leu 305 310 315 320

Leu Pro Trp Lys Pro Gly Val Leu Thr Leu Lys Gln 325 330
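Each nucleotide entry gives its CDS as an inclusive, 1-based interval, and each protein entry gives a length in amino acids. Assuming the interval includes the stop codon (the listing shows TGA, TAA, TAG and TGA at the ends of the four intervals), the codon count should exceed the residue count by exactly one. A minimal Python sketch of that consistency check, with the figures copied from the entries above; `codons_in_cds` and `check` are hypothetical helper names:

```python
# (CDS start, CDS end) in nucleotides, 1-based and inclusive, and the listed
# protein length in amino acids, copied from the sequence listing.
ENTRIES = {
    "SEQ ID NO:3 / NO:4": ((32, 1027), 331),
    "SEQ ID NO:5 / NO:6": ((30, 707), 225),
    "SEQ ID NO:7 / NO:8": ((41, 1024), 327),
    "SEQ ID NO:9 / NO:10": ((72, 1070), 332),
}


def codons_in_cds(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive CDS interval."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS {start}..{end} spans {length} nt, not a multiple of 3")
    return length // 3


def check(entries: dict) -> dict:
    """Return {name: (codons, listed_residues, consistent)} for each entry."""
    results = {}
    for name, ((start, end), residues) in entries.items():
        codons = codons_in_cds(start, end)
        # Assumption: the interval includes the stop codon, which encodes no residue.
        results[name] = (codons, residues, codons - 1 == residues)
    return results


for name, (codons, residues, ok) in check(ENTRIES).items():
    print(f"{name}: {codons} codons, {residues} residues listed, consistent={ok}")
```

All four entries come out consistent; for SEQ ID NO:3, for example, 1027 - 32 + 1 = 996 nt, or 332 codons, one more than the 331 residues listed for SEQ ID NO:4.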

Claims

1. A method for identifying a DNA fragment encoding a protein of interest, comprising screening a cDNA library in the form of bacterial host cells transformed with DNA obtainable from a eukaryotic organism capable of producing said protein for expression of said protein using a test which is indicative of the presence of said protein, which screening is performed after the transforming DNA has become part of a plasmid.

2. A method according to claim 1 wherein said eukaryotic organism is a fungus or a yeast cell.

3. A method according to claim 2 wherein said fungus is an Aspergillus species, preferably A. niger.

4. A method according to any one of claims 1 to 3 wherein the protein is an enzyme, preferably one which has cellulase activity, arabinoxylan degrading activity or xylanase activity.

5. An isolated nucleic acid fragment encoding a polypeptide having cellulase activity, which fragment comprises the sequence shown in SEQ ID No. 1 or 3, or a sequence having at least 90% sequence identity to the sequence shown in SEQ ID No. 1 or 3.

6. An isolated nucleic acid fragment encoding a polypeptide having xylanase activity, which fragment comprises the sequence shown in SEQ ID No. 5 or 7, or a sequence having at least 90% sequence identity to the sequence shown in SEQ ID No. 5 or 7.

7. An isolated nucleic acid fragment encoding a polypeptide with arabinoxylan degrading activity which fragment comprises the sequence shown in SEQ ID No. 9, or a sequence having at least 90% identity to the sequence shown in SEQ ID No. 9.

8. An isolated nucleic acid fragment which is a variant of the fragment of claim 5 which differs therefrom by the substitution, deletion or insertion of one or more amino acids, and encodes a polypeptide having cellulase activity.

9. An isolated nucleic acid fragment which is a variant of the fragment of claim 6 which differs therefrom by the substitution, deletion or insertion of one or more amino acids, and encodes a polypeptide having xylanase activity.

10. An isolated nucleic acid fragment which is a variant of the fragment of claim 7 which differs therefrom by the substitution, deletion or insertion of one or more amino acids, and encodes a polypeptide having arabinoxylan degrading activity.

11. A recombinant nucleic acid comprising a nucleic acid fragment as defined in any one of claims 5 to 10.

12. A polypeptide encoded by the nucleic acid of any one of claims 5 to 10, or a variant thereof which differs by a deletion, substitution or insertion of one or more amino acids and which still has substantially the same activity, in substantially pure form.

13. A recombinant host cell which harbours a nucleic acid fragment identified by the method of any one of claims 1 to 4, or a fragment as defined in any one of claims 8 to 10, or a recombinant nucleic acid as defined in claim 11.

14. A method for producing a polypeptide as defined in claim 12 which comprises culturing a host cell as defined in claim 13 under conditions that permit the expression of the polypeptide and recovering the polypeptide.
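Claims 5 to 7 turn on "at least 90% sequence identity" to a listed sequence, but the claims do not state how identity is computed. The sketch below illustrates one common convention only: identities counted over a pairwise alignment that has already been produced (gaps written as '-'), divided by the number of alignment columns. The function names are hypothetical, and the alignment algorithm, its scoring parameters and the choice of denominator are all assumptions that can change the resulting percentage.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences, with '-' marking gaps.

    Denominator (an assumption): every alignment column except those that
    are gaps in both sequences. Other conventions divide by the length of
    the shorter or of the reference sequence instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no columns")
    # Double-gap columns were removed above, so a == b implies a residue match.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if the pair reaches the identity threshold (90% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Nucleotides 32..52 of SEQ ID NO:3 versus a hypothetical variant carrying two
# synonymous substitutions (AAG -> AAA, CAG -> CAA).
reference = "ATGAAGTTTCAGAGCACTTTG"
variant = "ATGAAATTTCAAAGCACTTTG"
print(round(percent_identity(reference, variant), 1))  # 90.5
print(meets_threshold(reference, variant))  # True
```

Here 19 of 21 positions agree, giving about 90.5%. A real comparison would use a full-length alignment from a global aligner, and because the claims leave that method open, the same pair of sequences can score differently under different conventions.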