AU4716200A

AU4716200A - Haemophilus adhesion proteins

Info

Publication number: AU4716200A
Application number: AU47162/00A
Authority: AU
Inventors: Stephen J. Barenkamp; Joseph W. St. Geme
Original assignee: St Louis University; Washington University in St Louis WUSTL
Current assignee: St Louis University; Washington University in St Louis WUSTL
Priority date: 1995-03-24
Filing date: 2000-07-12
Publication date: 2000-09-14
Anticipated expiration: 2016-03-22
Also published as: AU754024B2

Description

Regulation 3.2

AUSTRALIA

Patents Act 1990

COMPLETE SPECIFICATION

DIVISIONAL APPLICATION

(ORIGINAL)

IP Australia. Documents received on 12 JUL 2000.

Name of Applicant: Washington University AND St. Louis University

Actual Inventor(s): Joseph W. St. Geme and Stephen J. Barenkamp

Address for Service: DAVIES COLLISON CAVE, Patent Attorneys, 1 Little Collins Street, Melbourne, Victoria 3000.

Invention Title: "Haemophilus adhesion proteins"

The following statement is a full description of this invention, including the best method of performing it known to us:

HAEMOPHILUS ADHESION PROTEINS

The U.S. Government has certain rights in this invention pursuant to grant numbers AI-21707 and HD-29687 from National Institutes of Health.

FIELD OF THE INVENTION

The invention relates to novel Haemophilus adhesion proteins, nucleic acids, and antibodies.

BACKGROUND OF THE INVENTION

Most bacterial diseases begin with colonization of a particular mucosal surface (Beachey et al., 1981, J. Infect. Dis. 143:325-345). Successful colonization requires that an organism overcome mechanical cleansing of the mucosal surface and evade the local immune response. The process of colonization is dependent upon specialized microbial factors that promote binding to host cells (Hultgren et al., 1993, Cell 73:887-901). In some cases the colonizing organism will subsequently enter (invade) these cells and survive intracellularly (Falkow, 1991, Cell 65:1099-1102).

Haemophilus influenzae is a common commensal organism of the human respiratory tract (Kuklinska and Kilian, 1984, Eur. J. Clin. Microbiol. 3:249-252). It is the most common cause of bacterial meningitis and a leading cause of other invasive (bacteraemic) diseases. In addition, this organism is responsible for a sizeable fraction of acute and chronic otitis media, sinusitis, bronchitis, and pneumonia.

Haemophilus influenzae is a human-specific organism that normally resides in the human nasopharynx and must colonize this site in order to avoid extinction. This microbe has a number of surface structures capable of promoting attachment to host cells (Guerina et al., 1982, J. Infect. Dis. 146:564; Pichichero et al., 1982, Lancet ii:960-962; St. Geme et al., 1993, Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879). In addition, H. influenzae has acquired the capacity to enter and survive within these cells (Forsgren et al., 1994, Infect. Immun. 62:673-679; St. Geme and Falkow, 1990, Infect. Immun. 58:4036-4044; St. Geme and Falkow, 1991, Infect. Immun. 59:1325-1333, Infect. Immun. 59:3366-3371). As a result, this bacterium is an important cause of both localized respiratory tract and systemic disease (Turk, 1984, J. Med. Microbiol. 18:1-16). Nonencapsulated, non-typable strains account for the majority of local disease (Turk, 1984, supra); in contrast, serotype b strains, which express a capsule composed of a polymer of ribose and ribitol-5-phosphate (PRP), are responsible for over 95% of cases of H. influenzae systemic disease (Turk, 1982, Clinical importance of Haemophilus influenzae, p. 3-9. In S.H. Sell and P.F. Wright, Haemophilus influenzae epidemiology, immunology, and prevention of disease. Elsevier/North-Holland Publishing Co., New York).

The initial step in the pathogenesis of disease due to H. influenzae involves colonization of the upper respiratory mucosa (Murphy et al., 1987, J. Infect. Dis. 5:723-731). Colonization with a particular strain may persist for weeks to months, and most individuals remain asymptomatic throughout this period (Spinola et al., 1986, J. Infect. Dis. 154:100-109). However, in certain circumstances colonization will be followed by contiguous spread within the respiratory tract, resulting in local disease in the middle ear, the sinuses, the conjunctiva, or the lungs. Alternatively, on occasion bacteria will penetrate the nasopharyngeal epithelial barrier and enter the bloodstream.

In vitro observations and animal studies suggest that bacterial surface appendages called pili (or fimbriae) play an important role in H. influenzae colonization. In 1982 two groups reported a correlation between piliation and increased attachment to human oropharyngeal epithelial cells and erythrocytes (Guerina et al., supra; Pichichero et al., supra). Other investigators have demonstrated that anti-pilus antibodies block in vitro attachment by piliated H. influenzae (Forney et al., 1992, J. Infect. Dis. 165:464-470; van Alphen et al., 1988, Infect. Immun. 56:1800-1806). Recently Weber et al. insertionally inactivated the pilus structural gene in an H. influenzae type b strain and thereby eliminated expression of pili; the resulting mutant exhibited a reduced capacity for colonization of year-old monkeys (Weber et al., 1991, Infect. Immun. 59:4724-4728).

A number of reports suggest that nonpilus factors also facilitate Haemophilus colonization. Using the human nasopharyngeal organ culture model, Farley et al. (1986, J. Infect. Dis. 161:274-280) and Loeb et al. (1988, Infect. Immun. 49:484-489) noted that nonpiliated type b strains were capable of mucosal attachment. Read and coworkers made similar observations upon examining nontypable strains in a model that employs nasal turbinate tissue in organ culture (1991, J. Infect. Dis. 163:549-558). In the monkey colonization study by Weber et al. (1991, supra), nonpiliated organisms retained a capacity for colonization, though at reduced densities; moreover, among monkeys originally infected with the piliated strain, virtually all organisms recovered from the nasopharynx were nonpiliated. All of these observations are consistent with the finding that nasopharyngeal isolates from children colonized with H. influenzae are frequently nonpiliated (Mason et al., 1985, Infect. Immun. 49:98-103; Brinton et al., 1989, Pediatr. Infect. Dis. J. 8:554-561).

Previous studies have shown that H. influenzae are capable of entering (invading) cultured human epithelial cells via a pili-independent mechanism (St. Geme and Falkow, 1990, supra; St. Geme and Falkow, 1991, supra). Although H. influenzae is not generally considered an intracellular parasite, a recent report suggests that these in vitro findings may have an in vivo correlate (Forsgren et al., 1994, supra). Forsgren and coworkers examined adenoids from 10 children who had their adenoids removed because of longstanding secretory otitis media or adenoidal hypertrophy. In all 10 cases there were viable intracellular H. influenzae. Electron microscopy demonstrated that these organisms were concentrated in the reticular crypt epithelium and in macrophage-like cells in the subepithelial layer of tissue. One possibility is that bacterial entry into host cells provides a mechanism for evasion of the local immune response, thereby allowing persistence in the respiratory tract.

Thus, a vaccine for the therapeutic and prophylactic treatment of Haemophilus infection is desirable. Accordingly, it is an object of the present invention to provide for recombinant Haemophilus Adherence (HA) proteins and variants thereof, and to produce useful quantities of these HA proteins using recombinant DNA techniques.

It is a further object of the invention to provide recombinant nucleic acids encoding HA proteins, and expression vectors and host cells containing the nucleic acid encoding the HA protein.

An additional object of the invention is to provide monoclonal antibodies for the diagnosis of Haemophilus infection.

A further object of the invention is to provide methods for producing the HA proteins, and a vaccine comprising the HA proteins of the present invention.

Methods for the therapeutic and prophylactic treatment of Haemophilus infection are also provided.

SUMMARY OF THE INVENTION

In accordance with the foregoing objects, the present invention provides recombinant HA proteins, and isolated or recombinant nucleic acids which encode the HA proteins of the present invention. Also provided are expression vectors which comprise DNA encoding a HA protein operably linked to transcriptional and translational regulatory DNA, and host cells which contain the expression vectors.

The invention also provides methods for producing HA proteins which comprise culturing a host cell transformed with an expression vector and causing expression of the nucleic acid encoding the HA protein to produce a recombinant HA protein.

The invention also includes vaccines for Haemophilus influenzae infection comprising an HA protein for prophylactic or therapeutic use in generating an immune response in a patient. Methods of treating or preventing Haemophilus influenzae infection comprise administering a vaccine.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A, 1B, and 1C depict the nucleic acid sequence of HA1.

Figure 2 depicts the amino acid sequence of HA1.

Figures 3A, 3B, 3C, 3D, 3E, 3F and 3G depict the nucleic acid sequence and amino acid sequence of HA2.

Figure 4 shows the schematic alignment of HA1 and HA2. Regions of sequence similarity are indicated by shaded, striped, and open bars, corresponding to N-terminal domains, internal domains, and C-terminal domains, respectively. The solid circles represent a conserved Walker box ATP-binding motif (GINV SGKT). Numbers above the bars refer to amino acid residue positions in the full-length proteins. Numbers in parentheses below the HA2 bars represent percent similarity/percent identity between these domains and the corresponding HA1 domains. The regions of HA2 defined by amino acid residues 51 to 173, 609 to 846, and 1292 to 1475 show minimal similarity to amino acids 51 to 220 of HA1.

Figure 5 depicts the homology between the N-terminal amino acid sequences of HA1 and HA2. Single letter abbreviations are used for the amino acids. A line indicates identity between the residues, and two dots indicate conservative changes, i.e. similarity between residues.

Figure 6 depicts the restriction maps of phage 11-17 and plasmid pT7-7 subclones.

Figure 7 depicts the restriction map of pDC400 and derivatives. pDC400 contains a 9.1 kb insert from strain C54 cloned into pUC19. Vector sequences are represented by hatched boxes. Letters above the top horizontal line indicate restriction enzyme sites: Bg, BglII; E, EcoRI; H, HindIII; P, PstI; S, SalI; Ss, SstI; X, XbaI. The heavy horizontal line with arrow represents the location of the hsf locus within pDC400 and the direction of transcription. The striated horizontal line represents the 3.3 kb intragenic fragment used as a probe for Southern analysis. The plasmid pDC602, which is not shown, contains the same insert as pDC601, but in the opposite orientation.

Figure 8 shows the identification of plasmid-encoded proteins using the bacteriophage T7 expression system. Bacteria were radiolabelled with trans-[35S]-label, and whole cell lysates were resolved on a SDS-polyacrylamide gel. Proteins were visualized by autoradiography. Lane 1, E. coli BL21(DE3)/pT7-7 uninduced; lane 2, BL21(DE3)/pT7-7 induced; lane 3, BL21(DE3)/pDC602 uninduced; lane 4, BL21(DE3)/pDC602 induced; lane 5, BL21(DE3)/pDC601 uninduced; lane 6, BL21(DE3)/pDC601 induced. The plasmids pDC602 and pDC601 are derivatives of pT7-7 that contain the 8.3 kb XbaI fragment from pDC400 in opposite orientations. The asterisk indicates the overexpressed protein in BL21(DE3)/pDC601.

Figure 9 depicts the Southern analysis of chromosomal DNA from H. influenzae strains C54 and 11, probing with HA2 versus HA1. DNA fragments were separated on a 0.7% agarose gel and transferred bidirectionally to nitrocellulose membranes prior to probing with either HA1 or HA2. Lane 1, C54 chromosomal DNA digested with BglI; lane 2, C54 chromosomal DNA digested with ClaI; lane 3, C54 chromosomal DNA digested with PstI; lane 4, 11 chromosomal DNA digested with BglII; lane 5, 11 chromosomal DNA digested with ClaI; lane 6, 11 chromosomal DNA digested with XbaI. A. Hybridization with the 3.3 kb PstI-BglII intragenic fragment of HA2 from strain C54. B. Hybridization with the 1.6 kb StyI-SspI intragenic fragment of HA1 from strain 11.

Figure 10 depicts the comparison of cellular binding specificities of E. coli harboring HA2 versus HA1. Adherence was measured after incubating bacteria with eucaryotic cell monolayers for 30 minutes as described and was calculated by dividing the number of adherent colony forming units by the number of inoculated colony forming units (St. Geme et al., 1993). Values are the mean ± SEM of measurements made in triplicate from representative experiments. The plasmid pDC601 contains the HA2 gene from H. influenzae strain C54, while pHMW8-5 contains the HA1 gene from nontypable H. influenzae strain 11. Both pDC601 and pHMW8-5 were prepared using pT7-7 as the cloning vector.

Figure 11 depicts the comparison of the N-terminal extremities of HA2, HMW1, HMW2, AIDA-I, Tsh, and SepA. The N-terminal sequence of HA2 is aligned with those of HA1 (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press.), HMW1 and HMW2 (Barenkamp, S.J., and E. Leininger. 1992. Cloning, expression, and DNA sequence analysis of genes encoding nontypeable Haemophilus influenzae high molecular weight surface-exposed proteins related to filamentous hemagglutinin of Bordetella pertussis. Infect. Immun. 60:1302-1313.), AIDA-I (Benz, I., and M.A. Schmidt. 1992. AIDA-I, the adhesin involved in diffuse adherence of the diarrhoeagenic Escherichia coli strain 2787 (O126:H27), is synthesized via a precursor molecule. Mol. Microbiol. 6:1539-1546.), Tsh (Provence, D., and R. Curtiss III. 1994. Isolation and characterization of a gene involved in hemagglutination by an avian pathogenic Escherichia coli strain. Infect. Immun. 62:1369-1380.), and SepA (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion and involvement in tissue invasion. Mol. Microbiol. 17:123-135.). A consensus sequence is shown on the lower line.

Figure 12 depicts the Southern analysis of chromosomal DNA from epidemiologically distinct strains of H. influenzae type b. Chromosomal DNA was digested with BglI, separated on a 0.7% agarose gel, transferred to nitrocellulose, and probed with the 3.3 kb PstI-BglII intragenic fragment of hsf from strain C54. Lane 1, strain C54; lane 2, strain 1081; lane 3, strain 1065; lane 4, strain 1058; lane 5, strain 1060; lane 6, strain 1053; lane 7, strain 1063; lane 8, strain 1069; lane 9, strain 1070; lane 10, strain 1076; lane 11, strain 1084.

Figure 13 depicts the Southern analysis of chromosomal DNA from non-type b encapsulated strains of H. influenzae. Chromosomal DNA was digested with BglI, separated on a 0.7% agarose gel, transferred to nitrocellulose, and probed with the 3.3 kb PstI-BglII intragenic fragment of hsf from strain C54. Lane 1, SM4 (type ); lane 2, SM72 (type ); lane 3, SM6 (type ); lane 4, Rd (type ); lane 5, SM7 (type ); lane 6, 142 (type ); lane 7, 327 (type ); lane 8, 351 (type ); lane 9, 134 (type f); lane 10, 219 (type ); lane 11, 346 (type ); lane 12, 503 (type f).

Figures 14A and 14B are the nucleic acid sequence of HA3.

Figure 15 is the amino acid sequence of HA3.

Figures 16A and 16B depict the homology between the amino acid sequences of HA1 and HA3. Single letter abbreviations are used for the amino acids. A line indicates identity between the residues, and two dots indicate conservative changes, i.e. similarity between residues.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides novel Haemophilus Adhesion (HA) proteins. In a preferred embodiment, the HA proteins are from Haemophilus strains, and in the preferred embodiment, from Haemophilus influenzae. In particular, H. influenzae encapsulated type b strains are used to clone the HA proteins of the invention. However, using the techniques outlined below, HA proteins from other Haemophilus influenzae strains, or from other bacterial species such as Neisseria spp. or Bordetella spp., may also be obtained.

Three HA proteins, HA1, HA2 and HA3, are depicted in Figures 2, 3 and 15, respectively. HA2 is associated with the formation of surface fibrils, which are involved in adhesion to various host cells. HA1 has also been implicated in adhesion to a similar set of host cells. When the HA1 or HA2 nucleic acid is expressed in a non-adherent strain of E. coli as described below, the E. coli acquire the ability to adhere to human host cells. It should be noted that in the literature, HA1 is referred to as hia (H. influenzae adherence) and HA2 is referred to as hsf (Haemophilus surface fibrils).

A HA protein may be identified in several ways. A HA nucleic acid or HA protein is initially identified by substantial nucleic acid and/or amino acid sequence homology to the sequences shown in Figures 1, 2, 3, 14 or 15. Such homology can be based upon the overall nucleic acid or amino acid sequence or portions thereof.

As used herein, a protein is a "HA protein" if the overall homology of the protein sequence to the amino acid sequence shown in Figure 2 and/or Figure 3 and/or Figure 15 is preferably greater than about 45 to 50%, more preferably greater than about 65% and most preferably greater than 80%. In some embodiments the homology will be as high as about 90 to 95 or 98%. That is, a protein that has at least 50% homology (or greater) to one, two or all three of the amino acid sequences of HA1, HA2 and HA3 is considered a HA protein. This homology will be determined using standard techniques known in the art, such as the Best Fit sequence program described by Devereux et al., Nucl. Acid Res. 12:387-395 (1984) or the BLASTX program (Altschul et al., J. Mol. Biol. 215:403-410 (1990)). The alignment may include the introduction of gaps in the sequences to be aligned. As noted below, in the comparison of proteins of different lengths, such as HA1 and HA3 with HA2, the homology is determined on the basis of the length of the shorter sequence.
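The tiered cut-offs in the preceding paragraph can be sketched as a simple lookup. This is a minimal illustration only: the function name `ha_homology_tier` and the tier labels are hypothetical, the lowest cut-off uses 45% as the lower bound of the "about 45 to 50%" range, and the percent homology is assumed to have been computed beforehand by an alignment program such as Best Fit or BLASTX.

```python
def ha_homology_tier(percent_homology: float) -> str:
    """Map an overall percent homology onto the preference tiers in the text.

    The cut-offs (45, 65, 80) come from the passage above; the tier labels
    are hypothetical names used only for this illustration.
    """
    if not 0.0 <= percent_homology <= 100.0:
        raise ValueError("percent homology must lie between 0 and 100")
    if percent_homology > 80.0:
        return "most preferred"
    if percent_homology > 65.0:
        return "more preferred"
    if percent_homology > 45.0:
        return "preferred"
    return "below overall-homology threshold"


if __name__ == "__main__":
    # 72% is the overall HA1/HA2 identity figure quoted later in the text.
    print(ha_homology_tier(72.0))
```

A protein falling below the overall threshold may still qualify under the N-terminal or C-terminal region criteria described in the following paragraphs, so a lookup of this kind would only be a first screen.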

In a preferred embodiment, a HA protein is defined as having significant homology to either the N-terminal region or the C-terminal region, or both, of the HA1, HA2 and HA3 proteins depicted in Figures 4, 5 and 15. The N-terminal region of about amino acids is virtually identical as between HA1 and HA3 (98% homology), and as between either HA1 or HA3 and HA2 is 74%. As shown in Figure 11, the first 24 amino acids of the N-terminus of HA1 and HA2 have limited homology to several other proteins, but this homology is 50% or less. Thus, a HA protein may be defined as having homology to the N-terminal region of at least about preferably at least about 70%, and most preferably at least about 80%, with homology as high as 90 or 95% especially preferred. Similarly, the C-terminal region of at least about 75, preferably 100 and most preferably 125 amino acid residues is also highly homologous and can be used to identify a HA protein. As shown in Figure 16, the homology between the C-terminal 120 or so amino acids of HA1 and HA3 is about 98%, and as between either HA1 or HA3 and HA2 is also about 98%. Thus homology at the C-terminus is a particularly useful way of identifying a HA protein. Accordingly, a HA protein can be defined as having homology to the C-terminal region of at least about 60%, preferably at least about and most preferably at least about 80%, with homology as high as 90 or especially preferred. In a preferred embodiment, the HA protein has homology to both the N- and C-terminal regions.

In addition, a HA protein may be identified as containing at least one stretch of amino acid homology found at least in the HA1 and HA2 proteins as depicted in Figure 4. HA2 contains three separate stretches of amino acids (174 to 608, 847 to 1291, and 1476 to 1914, respectively) that show significant homology to the region of HA1 defined by amino acids 221 to 658.

The HA proteins of the present invention have limited homology to the high molecular weight protein-1 (HMW1) of H. influenzae, as well as the AIDA-I adhesin of E. coli. For the HMW1 protein, this homology is greatest between residues 540 of the HA1 protein and residues 1100 to about 1550 of HMW1, with homology in this overlap region. For the AIDA-I protein, there is a roughly homology between the first 30 amino acids of AIDA-I and HA1, and the overall homology between the proteins is roughly 22%.

In addition, the HA1, HA2 and HA3 proteins of the present invention have homology to each other, as shown in Figures 4, 5 and 16. As between HA1 and HA2, the homology is 81% similarity and 72% identity overall. HA3 and HA1 are 51% identical and 65% similar. Thus, for the purposes of the invention, HA1, HA2 and HA3 are all HA proteins.

An "HA1" protein is defined by substantial homology to the sequence shown in Figure 2. This homology is preferably greater than about 60%, more preferably greater than about 70% and most preferably greater than 80%. In preferred embodiments the homology will be as high as about 90 to 95 or 98%. Similarly, an "HA2" protein may be defined by the same substantial homology to the sequence shown in Figure 3, and a "HA3" protein is defined with reference to Figure 15, as defined above.

In addition, for sequences which contain either more or fewer amino acids than the proteins shown in Figures 2, 3 and 15, it is understood that the percentage of homology will be determined based on the number of homologous amino acids in relation to the total number of amino acids. Thus, for example, homology of sequences shorter than that shown in Figures 2, 3 and 15, as discussed below, will be determined using the number of amino acids in the shorter sequence.
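The shorter-sequence rule above can be illustrated with a small calculation. The following is a sketch under stated assumptions, not the Best Fit or BLASTX procedure: the two sequences are assumed to be already aligned and padded with "-" gap characters, only identical residues are counted (conservative substitutions, which the text treats as similarity, are ignored), and the function name `percent_identity` and the example peptides are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, relative to the
    ungapped length of the shorter sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (not a gap).
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Ungapped lengths of each sequence; the shorter one is the denominator.
    len_a = sum(1 for x in aligned_a if x != gap)
    len_b = sum(1 for y in aligned_b if y != gap)
    shorter = min(len_a, len_b)
    if shorter == 0:
        raise ValueError("at least one sequence contains no residues")
    return 100.0 * matches / shorter


if __name__ == "__main__":
    # Hypothetical 10-residue and 8-residue peptides, aligned with one gap run.
    print(percent_identity("MNKIYRLKFS", "MNRI--LKWS"))
```

In this made-up example six columns match and the shorter peptide has eight residues, so the result is 75% rather than the 60% that would be obtained by dividing by the longer length.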

HA proteins of the present invention may be shorter than the amino acid sequences shown in Figures 2, 3 and 15. Thus, in a preferred embodiment, included within the definition of HA proteins are portions or fragments of the sequence shown in Figures 2, 3 and 15. Generally, the HA protein fragments may range in size from about 7 amino acids to about 800 amino acids, with from about 15 to about 700 amino acids being preferred, and from about 100 to about 650 amino acids also preferred. Particularly preferred fragments are sequences unique to HA; these sequences have particular use in cloning HA proteins from other organisms, to generate antibodies specific to HA proteins, or for particular use as a vaccine.

Unique sequences are easily identified by those skilled in the art after examination of the HA protein sequence and comparison to other proteins; for example, by examination of the sequence alignment shown in Figures 5 and 16. Preferred unique sequences include the N-terminal region of the HA1, HA2 and HA3 sequences, comprising roughly 50 amino acids, and the C-terminal 120 amino acids, depicted in Figures 2, 3 and 15. HA protein fragments which are included within the definition of a HA protein include N- or C-terminal truncations and deletions which still allow the protein to be biologically active; for example, which still allow adherence, as described below. In addition, when the HA protein is to be used to generate antibodies, for example as a vaccine, the HA protein must share at least one epitope or determinant with the sequences shown in Figures 2, 3 and 15. In a preferred embodiment the epitope is unique to the HA protein; that is, antibodies generated to a unique epitope exhibit little or no cross-reactivity with other proteins. However, cross-reactivity with other proteins does not preclude such epitopes or antibodies for immunogenic or diagnostic uses. By "epitope" or "determinant" herein is meant a portion of a protein which will generate and/or bind an antibody. Thus, in most instances, antibodies made to a smaller HA protein will be able to bind to the full length protein.

In some embodiments, the fragments of the HA protein used to generate antibodies are small; thus, they may be used as haptens and coupled to protein carriers to generate antibodies, as is known in the art.

In addition, sequences longer than those shown in Figures 2, 3 and 15 are also included within the definition of HA proteins.

Preferably, the antibodies are generated to a portion of the HA protein which is exposed at the outer membrane, i.e. surface exposed. The amino-terminal portions of HA1, HA2 and HA3 are believed to be externally exposed proteins.

The HA proteins may also be identified as associated with bacterial adhesion. Thus, deletions of the HA proteins from the naturally occurring microorganism such as Haemophilus species result in a decrease or absence of binding ability. In some embodiments, the expression of the HA proteins in non-adherent bacteria such as E. coli results in the ability of the organism to bind to cells.

In the case of the nucleic acid, the overall homology of the nucleic acid sequence is commensurate with amino acid homology but takes into account the degeneracy in the genetic code and codon bias of different organisms. Accordingly, the nucleic acid sequence homology may be either lower or higher than that of the protein sequence. Thus the homology of the nucleic acid sequence as compared to the nucleic acid sequences of Figures 1, 3 and 14 is preferably greater than about 15 more preferably greater than about 60% and most preferably greater than In some embodiments the homology will be as high as about 90 to 95 or 98%.

As outlined for the protein sequences, a preferred embodiment utilizes HA nucleic acids with substantial homology to the unique N-terminal and C-terminal regions of the HA1, HA2 and HA3 sequences.

In one embodiment, the nucleic acid homology is determined through hybridization studies. Thus, for example, nucleic acids which hybridize under high stringency to all or part of the nucleic acid sequences shown in Figures 1, 3 and 14 are considered HA protein genes. High stringency conditions include, but are not limited to, washes with 0.1X SSC at 65°C for 2 hours.

The HA proteins and nucleic acids of the present invention are preferably recombinant. As used herein, "nucleic acid" may refer to either DNA or RNA, or molecules which contain both deoxy- and ribonucleotides. The nucleic acids include genomic DNA, cDNA and oligonucleotides including sense and anti-sense nucleic acids. Specifically included within the definition of nucleic acid are anti-sense nucleic acids. An anti-sense nucleic acid will hybridize to the corresponding noncoding strand of the nucleic acid sequences shown in Figures 1, 3 and 14, but may contain ribonucleotides as well as deoxyribonucleotides. Generally, anti-sense nucleic acids function to prevent expression of mRNA, such that a HA protein is not made, or made at reduced levels. The nucleic acid may be double stranded, single stranded, or contain portions of both double stranded or single stranded sequence. By the term "recombinant nucleic acid" herein is meant nucleic acid, originally formed in vitro by the manipulation of nucleic acid by endonucleases, in a form not normally found in nature. Thus an isolated HA protein gene, in a linear form, or an expression vector formed in vitro by ligating DNA molecules that are not normally joined, are both considered recombinant for the purposes of this invention; i.e. the HA nucleic acid is joined to other than the naturally occurring Haemophilus chromosome in which it is normally found. It is understood that once a recombinant nucleic acid is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, i.e. using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purposes of the invention.

Similarly, a "recombinant protein" is a protein made using recombinant techniques, i.e. through the expression of a recombinant nucleic acid as depicted above. A recombinant protein is distinguished from naturally occurring protein by at least one or more characteristics. For example, the protein may be isolated away from some or all of the proteins and compounds with which it is normally associated in its wild type host, or found in the absence of the host cells themselves. Thus, the protein may be partially or substantially purified. The definition includes the production of a HA protein from one organism in a different organism or host cell. Alternatively, the protein may be made at a significantly higher concentration than is normally seen, through the use of an inducible promoter or high expression promoter, such that the protein is made at increased concentration levels. Alternatively, the protein may be in a form not normally found in nature, as in the addition of an epitope tag or amino acid substitutions, insertions and deletions. Furthermore, although not normally considered "recombinant", proteins or portions of proteins which are synthesized chemically, using the sequence information of Figures 2, 3 and 15, are considered recombinant herein as well.

Also included within the definition of HA protein are HA proteins from other organisms, which are cloned and expressed as outlined below.

In the case of anti-sense nucleic acids, an anti-sense nucleic acid is defined as one which will hybridize to all or part of the corresponding non-coding sequence of the sequences shown in Figures 1, 3 and 14. Generally, the hybridization conditions used for the determination of anti-sense hybridization will be high stringency conditions, such as 0.1X SSC at 65°C.
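As general background to the hybridization language above, a nucleic acid that base-pairs perfectly with a given DNA strand is that strand's reverse complement. The sketch below shows only this base-pairing relationship; it says nothing about stringency or about which strand of Figures 1, 3 and 14 is intended. The function name `reverse_complement` and the nine-base example are hypothetical and are not taken from the figures.

```python
# Translation table for Watson-Crick pairing of unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the strand that would hybridize perfectly to `dna`,
    written 5' to 3'."""
    unexpected = set(dna) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return dna.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Hypothetical target strand, for illustration only.
    print(reverse_complement("ATGAACAAA"))
```

For an RNA anti-sense molecule, the same pairing applies with U in place of T.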

Once the HA protein nucleic acid is identified, it can be cloned and, if necessary, its constituent parts recombined to form the entire HA protein nucleic acid. Once isolated from its natural source, contained within a plasmid or other vector or excised therefrom as a linear nucleic acid segment, the recombinant HA protein nucleic acid can be further used as a probe to identify and isolate other HA protein nucleic acids. It can also be used as a "precursor" nucleic acid to make modified or variant HA protein nucleic acids and proteins.

Using the nucleic acids of the present invention which encode HA protein, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome.

Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the HA protein. "Operably linked" in this context means that the transcriptional and translational regulatory DNA is positioned relative to the coding sequence of the HA protein in such a manner that transcription is initiated. Generally, this will mean that the promoter and transcriptional initiation or start sequences are positioned 5' to the HA protein coding region. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used to express the HA protein; for example, transcriptional and translational regulatory nucleic acid sequences from Bacillus will be used to express the HA protein in Bacillus. Numerous types of appropriate expression vectors, and suitable regulatory sequences, are known in the art for a variety of host cells.

In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, leader or signal sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The promoters may be either naturally occurring promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention.

In addition, the expression vector may comprise additional elements. For example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a procaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.

In addition, in a preferred embodiment, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used.

The HA proteins of the present invention are produced by culturing a host cell transformed with an expression vector containing nucleic acid encoding an HA protein, under the appropriate conditions to induce or cause expression of the HA protein. The conditions appropriate for HA protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation. For example, the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction. In addition, in some embodiments, the timing of the harvest is important. For example, the baculoviral systems used in insect cell expression are lytic viruses, and thus harvest time selection can be crucial for product yield.

Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and HeLa cells, and immortalized mammalian myeloid and lymphoid cell lines.

In a preferred embodiment, HA proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art.

A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of the coding sequence of HA protein into mRNA. A bacterial promoter has a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose and maltose, and sequences derived from biosynthetic enzymes such as tryptophan. Promoters from bacteriophage may also be used and are known in the art. In addition, synthetic promoters and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription.

In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.
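The geometry of the ribosome binding site described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the SD element is approximated here as any purine-rich run of 3-9 nucleotides (real SD sequences resemble AGGAGG but vary), and the function name and example sequence are hypothetical rather than taken from the figures.

```python
import re

# Assumption: approximate the SD element as a run of 3-9 purines (A or G).
SD_PATTERN = re.compile(r"[AG]{3,9}")


def find_sd_candidates(seq, min_gap=3, max_gap=11):
    """Return (sd_start, sd_seq, atg_pos) tuples.

    A candidate is a purine-rich run whose 3' end lies min_gap..max_gap
    nucleotides upstream of an ATG initiation codon.
    """
    seq = seq.upper()
    atg_positions = [m.start() for m in re.finditer("ATG", seq)]
    hits = []
    for sd in SD_PATTERN.finditer(seq):
        for atg in atg_positions:
            gap = atg - sd.end()  # nucleotides between the SD run and ATG
            if min_gap <= gap <= max_gap:
                hits.append((sd.start(), sd.group(), atg))
    return hits


# Hypothetical example: AGGAGG followed by a 6-nt spacer and an ATG.
example = "TTTAGGAGGTTTTTTATGAAA"
print(find_sd_candidates(example))
```

In practice the spacing window and the motif definition would be tuned to the host organism; the values here simply mirror the ranges stated in the text.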

The expression vector may also include a signal peptide sequence that provides for secretion of the HA protein in bacteria. The signal sequence typically encodes a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from the cell, as is well known in the art. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria).

The bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.

These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptococcus lividans, among others.

The bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others.

In one embodiment, HA proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in particular, baculovirus-based expression vectors, are well known in the art. Briefly, baculovirus is a very large DNA virus which produces its coat protein at very high levels. Due to the size of the baculoviral genome, exogenous genes must be placed in the viral genome by recombination. Accordingly, the components of the expression system include: a transfer vector, usually a bacterial plasmid, which contains both a fragment of the baculovirus genome and a convenient restriction site for insertion of the HA protein; a wild type baculovirus with a sequence homologous to the baculovirus-specific fragment in the transfer vector (this allows for the homologous recombination of the heterologous gene into the baculovirus genome); and appropriate insect host cells and growth media.

Mammalian expression systems are also known in the art and are used in one embodiment. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding sequence for HA protein into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element, typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation. Of particular use as mammalian promoters are the promoters from mammalian viral genes, since the viral genes are often highly expressed and have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter, and herpes simplex virus promoter.

Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific post-translational cleavage and polyadenylation. Examples of transcription terminator and polyadenylation signals include those derived from [...].

The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. Techniques include dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei.

In a preferred embodiment, HA protein is produced in yeast cells. Yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guillerimondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which confers resistance to tunicamycin; the G418 resistance gene, which confers resistance to G418; and the CUP1 gene, which allows yeast to grow in the presence of copper ions.

A recombinant HA protein may be expressed intracellularly or secreted.

Also included within the definition of HA proteins of the present invention are amino acid sequence variants. These variants fall into one or more of three classes: substitutional, insertional or deletional variants. These variants ordinarily are prepared by site specific mutagenesis of nucleotides in the DNA encoding the HA protein, using cassette mutagenesis or other techniques well known in the art, to produce DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture as outlined above. However, variant HA protein fragments having up to about 100-150 residues may be prepared by in vitro synthesis using established techniques. Amino acid sequence variants are characterized by the predetermined nature of the variation, a feature that sets them apart from naturally occurring allelic or interspecies variation of the HA protein amino acid sequence. The variants typically exhibit the same qualitative biological activity as the naturally occurring analogue, although variants can also be selected which have modified characteristics as will be more fully outlined below.

While the site or region for introducing an amino acid sequence variation is predetermined, the mutation per se need not be predetermined. For example, in order to optimize the performance of a mutation at a given site, random mutagenesis may be conducted at the target codon or region and the expressed HA protein variants screened for the optimal combination of desired activity. Techniques for making substitution mutations at predetermined sites in DNA having a known sequence are well known, for example, M13 primer mutagenesis. Screening of the mutants is done using assays of HA protein activities; for example, mutated HA genes are placed in HA deletion strains and tested for HA activity, as disclosed herein. The creation of deletion strains, given a gene sequence, is known in the art. For example, nucleic acid encoding the variants may be expressed in an adhesion deficient strain, and the adhesion and infectivity of the variant Haemophilus influenzae evaluated. For example, as outlined below, the variants may be expressed in the E. coli DH5α non-adherent strain, and the transformed E. coli strain evaluated for adherence using Chang conjunctival cells.

Amino acid substitutions are typically of single residues; insertions usually will be on the order of from about 1 to 20 amino acids, although considerably larger insertions may be tolerated. Deletions range from about 1 to 30 residues, although in some cases deletions may be much larger, as for example when one of the domains of the HA protein is deleted.

Substitutions, deletions, insertions or any combination thereof may be used to arrive at a final derivative. Generally these changes are done on a few amino acids to minimize the alteration of the molecule. However, larger changes may be tolerated in certain circumstances.

When small alterations in the characteristics of the HA protein are desired, substitutions are generally made in accordance with the following chart:

Chart I

Original Residue    Exemplary Substitutions
Ala                 Ser
Arg                 Lys
Asn                 Gln, His
Asp                 Glu
Cys                 Ser
Gln                 Asn
Glu                 Asp
Gly                 Pro
His                 Asn, Gln
Ile                 Leu, Val
Leu                 Ile, Val
Lys                 Arg, Gln, Glu
Met                 Leu, Ile
Phe                 Met, Leu, Tyr
Ser                 Thr
Thr                 Ser
Trp                 Tyr
Tyr                 Trp, Phe
Val                 Ile, Leu

Substantial changes in function or immunological identity are made by selecting substitutions that are less conservative than those shown in Chart I. For example, substitutions may be made which more significantly affect: the structure of the polypeptide backbone in the area of the alteration, for example the alpha-helical or beta-sheet structure; the charge or hydrophobicity of the molecule at the target site; or the bulk of the side chain. The substitutions which in general are expected to produce the greatest changes in the polypeptide's properties are those in which (a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a cysteine or proline is substituted for (or by) any other residue; (c) a residue having an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (or by) an electronegative residue, e.g. glutamyl or aspartyl; or (d) a residue having a bulky side chain, e.g. phenylalanine, is substituted for (or by) one not having a side chain, e.g. glycine.
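As an illustration only, Chart I can be encoded as a lookup table so that a proposed substitution can be checked against the exemplary list. The following Python sketch assumes the residue pairings as read from the chart; the names CHART_I and is_exemplary_substitution are hypothetical and are not part of the specification.

```python
# Chart I encoded as a mapping from original residue to its exemplary
# substitutions (three-letter codes). Transcribed from the chart; the
# mapping is directional and is not assumed to be symmetric.
CHART_I = {
    "Ala": {"Ser"},
    "Arg": {"Lys"},
    "Asn": {"Gln", "His"},
    "Asp": {"Glu"},
    "Cys": {"Ser"},
    "Gln": {"Asn"},
    "Glu": {"Asp"},
    "Gly": {"Pro"},
    "His": {"Asn", "Gln"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"Met", "Leu", "Tyr"},
    "Ser": {"Thr"},
    "Thr": {"Ser"},
    "Trp": {"Tyr"},
    "Tyr": {"Trp", "Phe"},
    "Val": {"Ile", "Leu"},
}


def is_exemplary_substitution(original, replacement):
    """True if replacing `original` with `replacement` appears in Chart I."""
    return replacement in CHART_I.get(original, set())


print(is_exemplary_substitution("Lys", "Arg"))  # listed in the chart
print(is_exemplary_substitution("Gly", "Phe"))  # not listed
```

A substitution that is absent from the table is not necessarily disallowed; it simply falls into the "less conservative" category discussed in the text.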

The variants typically exhibit the same qualitative biological activity and will elicit the same immune response as the naturally-occurring analogue, although variants also are selected to modify the characteristics of the polypeptide as needed. Alternatively, the variant may be designed such that the biological activity of the HA protein is altered. For example, the Walker box ATP-binding motif may be altered or eliminated.

In a preferred embodiment, the HA protein is purified or isolated after expression.

HA proteins may be isolated or purified in a variety of ways known to those skilled in the art depending on what other components are present in the sample. Standard purification methods include electrophoretic, molecular, immunological and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, and chromatofocusing. For example, the HA protein may be purified using a standard anti-HA antibody column.

Ultrafiltration and diafiltration techniques, in conjunction with protein concentration, are also useful. For general guidance in suitable purification techniques, see Scopes, R., Protein Purification, Springer-Verlag, NY (1982). The degree of purification necessary will vary depending on the use of the HA protein. In some instances no purification will be necessary.

Once expressed and purified if necessary, the HA proteins are useful in a number of applications.

For example, the HA proteins can be coupled, using standard technology, to affinity chromatography columns. These columns may then be used to purify antibodies from samples obtained from animals or patients exposed to the Haemophilus influenzae organism. The purified antibodies may then be used as outlined below.

Additionally, the HA proteins are useful to make antibodies to HA proteins. These antibodies find use in a number of applications. The antibodies are used to diagnose the presence of a Haemophilus influenzae infection in a sample or patient. In a preferred embodiment the antibodies are used to detect the presence of nontypable Haemophilus influenzae (NTHI), although typable H. influenzae infections are also detected using the antibodies.

This diagnosis will be done using techniques well known in the art; for example, samples such as blood or tissue samples may be obtained from a patient and tested for reactivity with the antibodies, for example using standard techniques such as ELISA. In a preferred embodiment, monoclonal antibodies are generated to the HA protein, using techniques well known in the art. As outlined above, the antibodies may be generated to the full length HA protein, or a portion of the HA protein.

Antibodies generated to HA proteins may also be used in passive immunization treatments, as is known in the art.

Antibodies generated to unique sequences of HA proteins may also be used to screen expression libraries from other organisms to find, and subsequently clone, HA nucleic acids from other organisms.

In one embodiment, the antibodies may be directly or indirectly labelled. By "labelled" herein is meant a compound that has at least one element, isotope or chemical compound attached to enable the detection of the compound. In general, labels fall into three classes: a) isotopic labels, which may be radioactive or heavy isotopes; b) immune labels, which may be antibodies or antigens; and c) colored or fluorescent dyes. The labels may be incorporated into the compound at any position. Thus, for example, the HA protein antibody may be labelled for detection, or a secondary antibody to the HA protein antibody may be created and labelled.

In one embodiment, the antibodies generated to the HA proteins of the present invention are used to purify or separate HA proteins or the Haemophilus influenzae organism from a sample. Thus, for example, antibodies generated to HA proteins which will bind to the Haemophilus influenzae organism may be coupled, using standard technology, to affinity chromatography columns. These columns can be used to pull out the Haemophilus organism from environmental or tissue samples.

In a preferred embodiment, the HA proteins of the present invention are used as vaccines for the prophylactic or therapeutic treatment of a Haemophilus influenzae infection in a patient. By "vaccine" or "immunogenic compositions" herein is meant an antigen or compound which elicits an immune response in an animal or patient. The vaccine may be administered prophylactically, for example to a patient never previously exposed to the antigen, such that subsequent infection by the Haemophilus influenzae organism is prevented. Alternatively, the vaccine may be administered therapeutically to a patient previously exposed or infected by the Haemophilus influenzae organism. While infection cannot be prevented, in this case an immune response is generated which allows the patient's immune system to more effectively combat the infection. Thus, for example, there may be a decrease or lessening of the symptoms associated with infection.

A "patient" for the purposes of the present invention includes both humans and other animals and organisms. Thus the methods are applicable to both human therapy and veterinary applications.

The administration of the HA protein as a vaccine is done in a variety of ways. Generally, the HA proteins can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby therapeutically effective amounts of the HA protein are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation are well known in the art. Such compositions will contain an effective amount of the HA protein together with a suitable amount of vehicle in order to prepare pharmaceutically acceptable compositions for effective administration to the host. The composition may include salts, buffers, carrier proteins such as serum albumin, targeting molecules to localize the HA protein at the appropriate site or tissue within the organism, and other molecules. The composition may include adjuvants as well.

In one embodiment, the vaccine is administered as a single dose; that is, one dose is adequate to induce a sufficient immune response to prophylactically or therapeutically treat a Haemophilus influenzae infection. In alternate embodiments, the vaccine is administered as several doses over a period of time, as a primary vaccination and "booster" vaccinations.

By "therapeutically effective amounts" herein is meant an amount of the HA protein which is sufficient to induce an immune response. This amount may be different depending on whether prophylactic or therapeutic treatment is desired. Generally, this ranges from about 0.001 mg to about 1 gm, with a preferred range of about 0.05 to about 0.5 gm. These amounts may be adjusted if adjuvants are used.

The following examples serve to more fully describe the manner of using the above-described invention, as well as to set forth the best modes contemplated for carrying out various aspects of the invention. It is understood that these examples in no way serve to limit the true scope of this invention, but rather are presented for illustrative purposes. All references cited herein are specifically incorporated by reference.

EXAMPLE 1

Cloning of HA1

Many protocols are substantially the same as those outlined in St. Geme et al., Mol. Microbiol. 15(1):77-85 (1995).

Bacterial strains, plasmids, and phages.

Nontypable H. influenzae strain 11 was the clinical isolate chosen as a prototypic HMW1/HMW2-non-expressing strain, although a variety of encapsulated typable strains can be used to clone the protein using the sequences of the figures. The organism was isolated in pure culture from the middle ear fluid of a child with acute otitis media. The strain was identified as H. influenzae by standard methods and was classified as nontypable by its failure to agglutinate with a panel of typing antisera for H. influenzae types a to f (Burroughs Wellcome Co., Research Triangle Park, N.C.) and failure to show lines of precipitation with these antisera in counterimmunoelectrophoresis assays. Strain 11 adheres efficiently to Chang conjunctival cells in vitro, at levels comparable to those previously demonstrated for NTHI strains expressing HMW1/HMW2-like proteins (data not shown). Convalescent serum from the child infected with this strain demonstrated an antibody response directed predominantly against surface-exposed high molecular weight proteins with molecular weights greater than 100 kDa.

M13mp18 and M13mp19 were obtained from New England BioLabs, Inc. (Beverly, Mass.). pT7-7 was the kind gift of Stanley Tabor. This vector contains the T7 RNA polymerase promoter φ10, a ribosome-binding site, and the translational start site for the T7 gene 10 protein upstream from a multiple cloning site.

Molecular cloning and plasmid subcloning.

The recombinant phage containing the HA1 gene was isolated and characterized using methods similar to those described previously. In brief, chromosomal DNA from strain 11 was prepared and Sau3A partial restriction digests of the DNA were prepared and fractionated on 0.7% agarose gels. Fractions containing DNA fragments in the 9- to 20-kbp range were pooled, and a library was prepared by ligation into λEMBL3 arms. Ligation mixtures were packaged in vitro with Gigapack (Stratagene) and plate-amplified in a P2 lysogen of E. coli LE392.
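The logic of a partial digest followed by size selection can be sketched in Python. This is a simplified model, not the laboratory procedure: it assumes Sau3A cuts immediately 5' of each GATC site, treats the ends of the input sequence as fragment ends, and enumerates every fragment that could arise when only some sites are cut. The function names are hypothetical, and the toy example uses a window of a few bases in place of the 9 to 20 kbp window used for the library.

```python
def sau3a_sites(seq):
    """Return 0-based cut positions, assuming cleavage 5' of each GATC."""
    seq = seq.upper()
    sites = []
    pos = seq.find("GATC")
    while pos != -1:
        sites.append(pos)
        pos = seq.find("GATC", pos + 1)
    return sites


def partial_digest_fragments(seq, min_len, max_len):
    """List (start, end) of fragments possible in a partial digest.

    In a partial digest any subset of sites may be cut, so a fragment can
    span from any cut position to any later cut position. Only fragments
    with min_len <= length <= max_len are kept (the size-selection step).
    """
    cuts = [0] + sau3a_sites(seq) + [len(seq)]
    cuts = sorted(set(cuts))
    fragments = []
    for i, start in enumerate(cuts):
        for end in cuts[i + 1:]:
            if min_len <= end - start <= max_len:
                fragments.append((start, end))
    return fragments


# Toy example with two GATC sites (at positions 2 and 10).
toy = "AAGATCAAAAGATCAA"
print(partial_digest_fragments(toy, 8, 14))
```

For a real genome the same call would use min_len=9000 and max_len=20000, matching the pooled gel fractions described above.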

Lambda plaque immunological screening was performed as described by Maniatis et al., Molecular Cloning: A Laboratory Manual, 2d Ed. (1989), Cold Spring Harbor Press. For plasmid subcloning studies, DNA from recombinant phage was subcloned into the T7 expression plasmid pT7-7. Standard methods were used for manipulation of cloned DNA as described by Maniatis et al. (supra).

Plasmid pHMW8-3 was generated by isolating an 11 kbp XbaI fragment from purified DNA from recombinant phage clone 11-17 and ligating into XbaI cut pT7-7. Plasmid pHMW8-4 was generated by isolating a 10 kbp BamHI-ClaI fragment and ligating into BamHI-ClaI cut pT7-7.

Plasmid pHMW8-5 was generated by digesting plasmid pHMW8-3 DNA with ClaI, isolating the larger fragment and religating. Plasmid pHMW8-6 was generated by digesting pHMW8-4 with SpeI, which cuts at a unique site within the HA1 gene, blunt-ending the resulting fragment, and inserting a kanamycin resistance cassette into the SpeI site. Plasmid pHMW8-7 was generated by digesting pHMW8-3 with NruI and HindIII, isolating the fragment containing pT7-7, blunt-ending and religating. The plasmid restriction maps are shown in Figure 6.

DNA sequence analysis.

DNA sequence analysis was performed by the dideoxy method with the U.S. Biochemicals Sequenase kit as suggested by the manufacturer. [35S]dATP was purchased from New England Nuclear (Boston, Mass.). Data were analyzed with Compugene software and the Genetics Computer Group program from the University of Wisconsin on a Digital VAX 8530 computer. Several 21-mer oligonucleotide primers were generated as necessary to complete the sequence.

Adherence assays.

Adherence assays were done with Chang epithelial cells [Wong-Kilbourne derivative, clone 1-5c-4 (human conjunctiva), ATCC CCL20.2], which were seeded into wells of 24-well tissue culture plates, as described (St. Geme III et al., Infect. Immun. 58:4036 (1990)). Bacteria were inoculated into broth and allowed to grow to a density of approximately 2 x 10^9 colony-forming units per ml. Approximately 2 x 10^7 colony-forming units were inoculated onto epithelial cell monolayers, and plates were gently centrifuged at 165 x g for 5 min to facilitate contact between bacteria and the epithelial surface. After incubation for 30 min at 37°C in 5% CO2, monolayers were rinsed five times with phosphate buffered saline (PBS) to remove nonadherent organisms and were treated with trypsin-EDTA (0.05% EDTA) in PBS to release them from the plastic support. Well contents were agitated, and dilutions were plated on solid medium to yield the number of adherent bacteria per monolayer. Percent adherence was calculated by dividing the number of adherent colony-forming units per monolayer by the number of inoculated colony-forming units.
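The percent adherence calculation described above, together with a mean and standard error over replicate wells, can be written out as a short Python sketch. The function names and the replicate values are hypothetical and are used only to show the arithmetic; they are not data from the experiments.

```python
from math import sqrt
from statistics import mean, stdev


def percent_adherence(adherent_cfu, inoculated_cfu):
    """Adherent CFU per monolayer divided by inoculated CFU, as a percent."""
    return 100.0 * adherent_cfu / inoculated_cfu


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical triplicate wells, each inoculated with 2 x 10^7 CFU.
inoculum = 2e7
adherent_counts = [8.0e6, 8.4e6, 8.8e6]
percents = [percent_adherence(a, inoculum) for a in adherent_counts]
avg, sem = mean_and_sem(percents)
print(f"{avg:.1f} +/- {sem:.1f} %")
```

The sample standard deviation (n - 1 denominator) is used for the SEM here; that choice is an assumption, since the text does not state which estimator was applied.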

Isolation and characterization of recombinant phage expressing the strain 11 high molecular weight adhesion protein.

The nontypable Haemophilus influenzae strain 11 chromosomal DNA library was screened immunologically with convalescent serum from the child infected with strain 11. Immunoreactive clones were screened by Western blot for expression of high molecular weight proteins with apparent molecular weights > 100 kDa, and two different classes of recombinant clones were recovered. A single clone designated 11-17 was recovered which expressed the HA1 protein. The recombinant protein expressed by this clone had an apparent molecular weight of greater than 200 kDa.

Transformation into E. coli

Plasmids were introduced into the DH5α strain of E. coli (Maniatis, supra), which is a non-adherent strain, using electroporation (Dower et al., Nucl. Acids Res. 16:6127 (1988)). The results are shown in Table 1.

Table 1

Strain              Adherence*
DH5α(pHMW8-4)       43.3 ± [...]
DH5α(pHMW8-5)       41.3 ± 3.3%
DH5α(pHMW8-6)       0.6 ± 0.3%
DH5α(pHMW8-7)       [...]
DH5α(pT7-7)         0.4 ± 0.1%

*Adherence was measured in a 30 minute assay and was calculated by dividing the number of adherent bacteria by the number of inoculated bacteria. Values are the mean ± SEM of measurements made in triplicate from a representative experiment.

In addition, a monoclonal antibody made by standard procedures, directed against the strain 11 protein, recognized proteins in 57 of 60 epidemiologically-unrelated NTHI. However, Southern analysis using the gene indicated that roughly only [...] of the tested strains actually hybridized to the gene (data not shown).

EXAMPLE 2

Cloning of HA2

In a recent study we examined a series of H. influenzae type b isolates by transmission electron microscopy and visualized short, thin surface fibrils distinct from pili (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.). In that study, the large genetic locus involved in the expression of these appendages was isolated.

Bacterial strains and plasmids

H. influenzae strain C54 is a type b strain that has been described previously (Pichichero, P. Anderson, M. Loeb, and D.H. Smith. 1982. Do pili play a role in pathogenicity of Haemophilus influenzae type b? Lancet. ii:960-962.). Strain C54-Tn400.23 is a mutant that contains a mini-Tn10 kan element in the hsf locus and demonstrates minimal in vitro adherence (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.). Strains 1053, 1058, 1060, 1063, 1065, 1069, 1070, 1076, 1081, and 1084 are H. influenzae type b isolates generously provided by J. Musser (Baylor University, Houston, Texas) (Musser et al., 1990. Global genetic structure and molecular epidemiology of encapsulated Haemophilus influenzae. Rev. Infect. Dis. 12:75-111.). H. influenzae strains SM4 (type [...]), SM6 (type [...]), SM7 (type [...]), and SM72 (type c) are type strains obtained from R. Facklam at the Centers for Disease Control (Atlanta, Georgia). Strains 142, 327, and 351 are H. influenzae type e isolates, and strains 134, 219, 256, and 501 are H. influenzae type f isolates obtained from H. Kayhty (Finnish National Public Health Institute, Helsinki). Strain Rd (type d) and the 15 nontypable isolates examined by Southern analysis have been described previously (Alexander et al., J. Exp. Med. 83:345-359 (1951); Barenkamp et al., Infect. Immun. 60:1302-1313 (1992)).

E. coli DH5α is a nonadherent laboratory strain that was originally obtained from Gibco BRL. E. coli strain BL21(DE3) was a gift from F.W. Studier and contains a single copy of the T7 RNA polymerase gene under the control of the lac regulatory system (Studier, F.W., and B.A. Moffatt. 1986. Use of bacteriophage T7 RNA polymerase to direct high-level expression of cloned genes. J. Mol. Biol. 189:113-130.).

Plasmid pT7-7 was provided by S. Tabor and contains the T7 RNA polymerase promoter φ10, a ribosome-binding site, and the translational start site for the T7 gene 10 protein upstream from a multiple cloning site (Tabor, S., and C.C. Richardson. 1985. A bacteriophage T7 RNA polymerase/promoter system for controlled exclusive expression of specific genes. Proc. Natl. Acad. Sci. USA. 82:1074-1078.). pUC19 is a high-copy-number plasmid that has been previously described (Yanisch-Perron et al., Gene 33:103-119 (1985)). pDC400 is a pUC19 derivative that harbors the H. influenzae strain C54 surface fibril locus and is sufficient to promote in vitro adherence by laboratory strains of E. coli (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.). pHMW8-5 is a pT7-7 derivative that contains the H. influenzae strain 11 hia locus and also promotes adherence by nonadherent laboratory strains of E. coli (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press.). pHMW8-6 contains the H. influenzae hia locus interrupted by a kanamycin cassette (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press.). pUC4K served as the source of the kanamycin-resistance gene that was used as a probe in Southern analysis (Vieira, J., and J. Messing. 1982. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene. 19:259-268.).

Culture conditions

H. influenzae strains were grown on chocolate agar supplemented with 1% IsoVitaleX, on brain heart infusion agar supplemented with hemin and NAD (BHI-DB agar), or in brain heart infusion broth supplemented with hemin and NAD (BHIs) (Anderson, R.B. Johnston, Jr., and D.H. Smith. 1972. Human serum activity against Haemophilus influenzae type b. J. Clin. Invest. 51:31-38.). These strains were stored at -80°C in brain heart infusion broth with 25% glycerol. E. coli strains were grown on Luria-Bertani (LB) agar or in LB broth and were stored in LB broth with 50% glycerol. For H. influenzae, kanamycin was used at a concentration of 25 µg/ml. Antibiotic concentrations for E. coli included the following: ampicillin or carbenicillin, 100 µg/ml, and kanamycin, 50 µg/ml.

Induction of plasmid-encoded proteins

To identify plasmid-encoded proteins, the bacteriophage T7 expression vector pT7-7 was employed and the relevant pT7-7 derivatives were transformed into E. coli BL21(DE3). Activation of the T7 promoter was achieved by inducing expression of T7 RNA polymerase with isopropyl-β-D-thiogalactopyranoside (IPTG; final concentration, 1 mM). After induction for 30 minutes at 37°C, rifampicin was added to a final concentration of 200 µg/ml. Thirty minutes later, 1 ml of culture was pulsed with 50 µCi of Trans-[35S]-Label (ICN, Irvine, Calif.) for 5 minutes. Bacteria were harvested, and whole cell lysates were resuspended in Laemmli buffer for analysis by sodium dodecyl sulfate-polyacrylamide gel electrophoresis on acrylamide gels (Laemmli, U.K. 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature (London) 227:680-685.).

Autoradiography was performed with Kodak XAR-5 film.

Recombinant DNA methods

DNA ligations, restriction endonuclease digestions, and gel electrophoresis were performed according to standard techniques (Sambrook, J., E.F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.). Plasmids were introduced into E. coli strains by either chemical transformation or electroporation, as described (Dower, W.J., J.F. Miller, and C.W. Ragsdale. 1988. High efficiency transformation of E. coli by high voltage electroporation. Nucleic Acids Res. 16:6127-6145; Sambrook, J., E.F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.). Transformation in H. influenzae was performed using the MIV method of Herriott et al. (Herriott, R.M., E.M. Meyer, and M. Vogt. 1970. Defined nongrowth media for stage II competence in Haemophilus influenzae. J. Bacteriol. 101:517-524.).

Adherence assays

Adherence assays were performed with tissue culture cells which were seeded into wells of 24-well tissue culture plates as previously described (St. Geme et al., Infect. Immun. 58:4036-4044 (1990)). Adherence was measured after incubating bacteria with epithelial monolayers for 30 minutes as described (St. Geme, J.W., III, S. Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins of nontypable Haemophilus influenzae mediate attachment to human epithelial cells. Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879.). Tissue culture cells included Chang epithelial cells (Wong-Kilbourne derivative, clone 1-5c-4 (human conjunctiva)) (ATCC CCL 20.2), KB cells (human oral epidermoid carcinoma) (ATCC CCL 17), HEp-2 cells (human laryngeal epidermoid carcinoma) (ATCC CCL 23), A549 cells (human lung carcinoma) (ATCC CCL 185), Intestine 407 cells (human embryonic intestine) (ATCC CCL), HeLa cells (human cervical epitheloid carcinoma) (ATCC CCL), ME-180 cells (human cervical epidermoid carcinoma) (ATCC HTB 33), HEC-1B cells (human endometrium) (ATCC HTB 113), and CHO-K1 cells (Chinese hamster ovary) (ATCC CCL 61). Chang, KB, Intestine 407, HeLa, and HEC-1B cells were maintained in modified Eagle medium with Earle's salts and non-essential amino acids. HEp-2 cells were maintained in Dulbecco's modified Eagle medium, A549 cells and CHO-K1 cells in F12 medium (Ham), and ME-180 cells in medium. All media were supplemented with 10% heat-inactivated fetal bovine serum.

Southern analysis

Southern blotting was performed using high stringency conditions as previously described (St. Geme, J.W., III, and S. Falkow. 1991. Loss of capsule expression by Haemophilus influenzae type b results in enhanced adherence to and invasion of human cells. Infect. Immun. 59:1325-1333.).

Microscopy

Samples of epithelial cells with associated bacteria were stained with Giemsa stain and examined by light microscopy as described (St. Geme, J.W., III, and S. Falkow. 1990. Haemophilus influenzae adheres to and enters cultured human epithelial cells. Infect. Immun. 58:4036-4044.).

For negative-staining electron microscopy, bacteria were stained with 0.5% aqueous uranyl acetate (St. Geme, J.W., III, and S. Falkow. 1991. Loss of capsule expression by Haemophilus influenzae type b results in enhanced adherence to and invasion of human cells. Infect. Immun. 59:1325-1333.) and examined using a Zeiss microscope.

A previous study indicated that laboratory E. coli strains harboring the plasmid pDC400 were capable of efficient attachment to cultured human epithelial cells (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.). Subcloning studies and transposon mutagenesis indicated that the relevant coding region of pDC400 was present within an 8.3 kb XbaI fragment (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.) (Figure). To confirm this conclusion, in the present study this XbaI fragment was subcloned into pT7-7, generating plasmids designated pDC601 and pDC602, which contained the insert in opposite orientations (Figure). As predicted, expression of these plasmids in E. coli DH5α was associated with a capacity for high level in vitro attachment (Table 1).

Table 1. Adherence to Chang conjunctival cells.

Strain             Adherence (% inoculum)a
DH5α/pT7-7         0.4 ± 0.1
DH5α/pDC400        25.3 ± 1.2
DH5α/pDC601        54.3
DH5α/pDC602        55.5 ± 4.3
C54b-p-            98.7
C54-HA1::kanb      1.5 ± 0.2
C54-Tn400.23c      3.3 ± 0.4

a Adherence was measured in a 30 minute assay and was calculated by dividing the number of adherent bacteria by the number of inoculated bacteria. Values are the mean ± SEM of measurements made in triplicate from representative experiments.
b Strain C54-HA1::kan was constructed by transforming C54b-p- with linearized pHMW8-6, which contains the HA1 gene with an intragenic kanamycin cassette.
c Strain C54-Tn400.23 contains a mini-Tn10 kan element in the hsf locus (St. Geme et al., Mol. Microbiol. 15:77-85 (1995)).
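The calculation described in footnote a (adherent bacteria divided by inoculated bacteria, reported as mean ± SEM of triplicates) can be sketched as follows. This is only an illustration: the colony counts are made-up placeholder values, not data from the experiments, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def percent_adherence(adherent_cfu, inoculated_cfu):
    """Adherent bacteria as a percentage of the inoculum."""
    return 100.0 * adherent_cfu / inoculated_cfu


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical triplicate counts: (adherent CFU, inoculated CFU).
replicates = [(5.0e6, 1.0e7), (5.5e6, 1.0e7), (6.0e6, 1.0e7)]
percents = [percent_adherence(a, i) for a, i in replicates]
avg, sem = mean_and_sem(percents)
print(f"{avg:.1f} ± {sem:.1f} % of inoculum")
```

The SEM here uses the sample standard deviation; whether the original measurements used that convention is an assumption.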

To determine the direction of transcription and identify plasmid-encoded proteins, pDC601 and pDC602 were subsequently introduced into E. coli BL21(DE3), producing BL21(DE3)/pDC601 and BL21(DE3)/pDC602, respectively. As a negative control, pT7-7 was also transformed into BL21(DE3). The T7 promoter in these three strains was induced with IPTG, and induced proteins were detected using Trans-[35S]-Label. As shown in Figure 8, induction of BL21(DE3)/pDC601 resulted in expression of a large protein over 200 kDa in size along with several slightly smaller proteins, which presumably represent degradation products. In contrast, when BL21(DE3)/pDC602 and BL21(DE3)/pT7-7 were induced, there was no expression of these proteins. This experiment indicated that the genetic material contained in the 8.3 kb XbaI fragment is transcribed from left to right as shown in Figure 7 and suggested that a single long open reading frame may be present.

Nucleotide sequencing

Nucleotide sequence was determined using a Sequenase kit and double-stranded plasmid template. DNA fragments were subcloned into pUC19 and sequenced along both strands by primer walking. DNA sequence analysis was performed using the Genetics Computer Group (GCG) software package from the University of Wisconsin (Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.). Sequence similarity searches were carried out using the BLAST program of the National Center for Biotechnology Information (Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.).

Sequencing of the 8.3 kb XbaI fragment revealed a 7059 bp gene, which is designated for literature purposes as hsf, for Haemophilus surface fibrils, and is referred to herein as HA2. This gene encodes a 2353-amino acid polypeptide, referred to as Hsf or HA2, with a calculated molecular mass of 243.8 kDa, which is similar in size to the observed protein species detected after induction of BL21(DE3)/pDC601. The HA2 gene has a GC content of 42.8%, somewhat greater than the published estimate of 38-39% for the whole genome (Fleischmann et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512; Kilian, M. 1976. A taxonomic study of the genus Haemophilus, with proposal of a new species. J. Gen. Microbiol. 93:9-62.).

A putative ribosomal binding site with the sequence AAGGTA begins 13 base pairs upstream of the presumed initiation codon. A sequence similar to a rho-independent transcription terminator is present beginning 20 nucleotides beyond the stop codon and contains interrupted inverted repeats with the potential for forming a hairpin structure containing a loop of two bases and a stem of 11 bases. Of note, a string of 29 thymines spans the region from 149 to 121 nucleotides upstream of HA2.
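The arithmetic behind the gene length and the GC content figure can be checked with a short sketch. The coordinates 163..7221 are the CDS location given for SEQ ID NO:3 in the sequence listing; the function names are hypothetical, and the 30-nucleotide test string is simply the opening of SEQ ID NO:1, used to exercise the function rather than to reproduce the 42.8% value.

```python
def cds_length(start, end):
    """Length in bp of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


length_bp = cds_length(163, 7221)  # CDS location listed for SEQ ID NO:3
codons = length_bp // 3            # number of complete triplets
print(length_bp, codons)

# First 30 nucleotides of SEQ ID NO:1, used only as a small test input.
print(gc_content("ATGAACAAAATTTTTAACGTTATTTGGAAT"))
```

The 7059 bp span divides evenly into 2353 triplets, which agrees with the polypeptide length stated above.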

Homology to HA1

The nontypable H. influenzae nonpilus protein HA1 (called Hia in the literature) promotes attachment to cultured human epithelial cells as outlined above. Comparison of the predicted amino acid sequence of HA2 and the sequence of HA1 revealed 81% similarity and 72% identity overall. As depicted in Figure 5, the two sequences are highly conserved at their N-terminal and C-terminal ends, and both contain a Walker box nucleotide-binding motif. Interestingly, HA1 is encoded by a 3.2 kb gene and is only 115 kDa. In this context, it is noteworthy that three separate stretches of HA2 (corresponding to amino acids 174 to 608, 847 to 1291, and 1476 to 1914, respectively) show significant homology to the region of HA1 defined by amino acids 221 to 658 (Figure). Table 2 summarizes the level of similarity and identity between these three stretches of HA2 and one another. The suggestion is that the larger size of HA2 may relate in part to the presence of a repeated domain which is present in single copy in HA1.

Table 2. Percent similarity and percent identity between HA2 repeats.

Percent Similarity/Percent Identity
HA2 174-608a HA2 847-1291a HA2 1476-1914a
HA2 174-608 65/53 76/60 70/56
HA2 847-1291
HA2 1476-1914

a Numbers correspond to amino acid residue positions in the full-length HA2 (Hsf) protein.
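For readers unfamiliar with how percent identity and percent similarity are derived from an alignment, the following is a minimal sketch. It is not the GCG procedure used in the study: the grouping of "similar" residues, the decision to skip gapped columns, and the second test sequence are all assumptions made for illustration.

```python
# Hypothetical groups of residues treated as "similar" (an assumption,
# not the scoring scheme of the GCG package).
SIMILAR_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                  set("DE"), set("ST"), set("NQ"), set("AG")]


def identity_and_similarity(a, b):
    """Percent identity and similarity over ungapped alignment columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(x in g and y in g for g in SIMILAR_GROUPS):
            similar += 1
    return 100.0 * identical / compared, 100.0 * similar / compared


# First ten residues of SEQ ID NO:2 against a made-up variant.
ident, simil = identity_and_similarity("MNKIFNVIWN", "MNKLFNVLWD")
print(ident, simil)
```

Because identical positions are counted as similar, similarity is always at least as large as identity, which matches the pattern of the values reported in Table 2.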

To evaluate whether HA1 and HA2 are alleles of the same locus, a series of Southern blots were performed. Samples of chromosomal DNA from strains C54 and 11 were subjected to digestion with BglI, ClaI, and either PstI or XbaI. Resulting DNA fragments were separated by agarose electrophoresis and transferred bidirectionally to nitrocellulose membranes. One membrane was probed with a 3.3 kb internal fragment of the HA2 gene (Figure) and the other membrane was probed with a 1.6 kb intragenic fragment of the HA1 gene. As shown in Figure 9, both probes recognized exactly the same chromosomal fragments.

To obtain additional evidence that the HA2 and HA1 genes are homologs, the inactivation of HA2 by transformation of H. influenzae strain C54b-p- with insertionally inactivated HA1 was attempted. The plasmid pHMW8-6 (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press.), which contains the HA1 gene with an intragenic kanamycin cassette, was linearized with NdeI and introduced into competent C54. Southern hybridization confirmed insertion of the kanamycin cassette into HA2 (not shown). Furthermore, examination of the C54 mutant by negative staining transmission electron microscopy revealed the loss of surface fibrils (not shown). Consistent with these findings, the mutant strain demonstrated minimal attachment to Chang conjunctival cells (Table 1).

In additional experiments, the cellular binding specificities conferred by the HA2 and HA1 proteins were compared. As shown in Figure 10, DH5α/pDC601 (expressing HA2) demonstrated high level attachment to Chang cells, KB cells, HeLa cells, and Intestine 407 cells, moderate level attachment to HEp-2 cells, and minimal attachment to HEC-1B cells, ME-180 cells, and CHO-K1 cells. DH5α harboring the plasmid expressing HA1 showed virtually the same pattern of attachment.

Giemsa staining and subsequent examination by light microscopy confirmed these viable count adherence assay results.

Homology to other bacterial extracellular proteins

A protein sequence similarity search was performed with the HA2 predicted amino acid sequence using the BLAST network service of the National Center for Biotechnology Information (Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.). This search revealed low-level sequence similarity to a series of other bacterial adherence factors, including HMW1 and HMW2 (the proteins previously identified as being important adhesins in HA1-deficient nontypable H. influenzae strains; St. Geme, J.W., III, S. Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins of nontypable Haemophilus influenzae mediate attachment to human epithelial cells. Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879.), AIDA-I (an adhesion protein expressed by some diarrheagenic E. coli strains; Benz, I., and M.A. Schmidt. 1992. AIDA-I, the adhesin involved in diffuse adherence of the diarrhoeagenic Escherichia coli strain 2787 (O126:H27), is synthesized via a precursor molecule. Mol. Microbiol. 6:1539-1546.), and Tsh (a hemagglutinin produced by an avian pathogenic E. coli strain; Provence, D., and R. Curtiss III. 1994. Isolation and characterization of a gene involved in hemagglutination by an avian pathogenic Escherichia coli strain. Infect. Immun. 62:1369-1380.). In addition, HA2 showed homology to SepA, a Shigella flexneri secreted protein that appears to play a role in tissue invasion (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion and involvement in tissue invasion. Mol. Microbiol. 17:123-135.). Alignment of HA2 with HMW1, HMW2, AIDA-I, Tsh, and SepA revealed a highly conserved N-terminal domain (Figure 11). In AIDA-I, Tsh, and SepA, this N-terminal extremity precedes a typical procaryotic signal sequence (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion and involvement in tissue invasion. Mol. Microbiol. 17:123-135.). Similarly, in HA2 this conserved domain precedes a 26 amino acid segment that is characterized by a positively charged region, followed by a string of hydrophobic residues, and then alanine-glutamine-alanine.

Presence of an HA2 homolog in other encapsulated and nonencapsulated strains

Previous work demonstrated that an HA2 homolog is present in H. influenzae type b strains M42 and Eagan (St. Geme, J.W., III, and D. Cutter. 1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.). To define the extent to which the HA2 locus is shared by other type b strains, a panel of evolutionarily diverse type b isolates was examined by Southern analysis. Among these strains were six belonging to phylogenic division I and four belonging to phylogenic division II (Musser, J.M., J.S. Kroll, E.R. Moxon, and R.K. Selander. 1988. Evolutionary genetics of the encapsulated strains of Haemophilus influenzae. Proc. Natl. Acad. Sci. U.S.A. 85:7758-7762.). Chromosomal DNA was digested with BglI and then probed with the intragenic 3.3 kb fragment of the HA2 gene. As shown in Figure 12, all 10 strains showed hybridization. The universal presence of this locus among H. influenzae type b strains raised the question of its prevalence in non-type b encapsulated H. influenzae. Southern analysis of a series of type a, c, d, e, and f isolates again demonstrated a homolog in all cases (Figure 13).

Recently, Fleischmann et al. (Fleischmann, R.D., et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496-512.) reported the genome sequence of H. influenzae strain Rd, which was one of the two serotype d strains examined by Southern analysis. In accord with the Southern blotting results, search of the Rd genome revealed an open reading frame with striking sequence similarity to HA2. The Rd gene is 894 nucleotides in length and is predicted to encode a protein of 298 amino acids. Overall, the Rd locus is 70% identical to the C54 HA2 gene, and the Rd derived amino acid sequence is 62% identical and similar to C54 HA2. Interestingly, the Rd open reading frame appears to be truncated due to a "premature" stop codon.
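A truncated reading frame of this kind can be recognized by scanning for the first in-frame stop codon. The sketch below illustrates the idea on a made-up 15-nucleotide sequence; it does not use the Rd sequence, and the function names are hypothetical. It also checks the stated relationship between 894 nucleotides and 298 codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(seq, start=0):
    """0-based index of the first in-frame stop codon, or None."""
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def sense_codons_before_stop(seq, start=0):
    """Number of codons read from `start` before the first stop."""
    stop = first_in_frame_stop(seq, start)
    end = stop if stop is not None else len(seq)
    return (end - start) // 3


# Made-up example: ATG AAC AAA TAA GTT (stop at index 9).
toy = "ATGAACAAATAAGTT"
print(first_in_frame_stop(toy), sense_codons_before_stop(toy))

# 894 nucleotides correspond to 298 triplets, as stated for the Rd gene.
print(894 // 3)
```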

Previous experiments revealed that 13 of 15 nontypable strains lacking an HMW1/HMW2-related protein had evidence of an HA1 homolog (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press.). Consistent with the demonstration that HA2 and HA1 are homologous, Southern analysis of these 15 strains, probing with the 3.3 kb fragment of hsf, demonstrated hybridization in 12 of the same 13 (not shown).

Chromosomal location of the HA2 locus

In earlier work, the HA1 locus in nontypable strain 11 was found to be flanked upstream by an open reading frame with significant homology to E. coli exoribonuclease II (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol., in press.). Similarly, the HA2 locus in strain C54 is flanked on the 5' side by an open reading frame with similarity to E. coli exoribonuclease II. This gene terminates 357 base pairs before the HA2 start codon and encodes a protein with a predicted amino acid sequence that is 61% similar and 33% identical at its C-terminal end to exoribonuclease II. Of note, the Rd HA2 homolog is also flanked upstream by the exoribonuclease II locus.

EXAMPLE 3

Cloning of HA3

Recombinant phage containing the nontypable Haemophilus strain 32 HA3 gene were isolated and characterized using methods modified slightly from those described previously (Barenkamp and St. Geme, Molecular Microbiology 1996, in press). In brief, chromosomal DNA from strain 32 was prepared by a modification of the method of Marmur (Marmur, 1961). Sau3A partial restriction digests of the DNA were prepared and fractionated on 0.7% agarose gels. Fractions containing DNA fragments in the 9- to 20-kbp range were pooled, and a library was prepared by ligation into λEMBL3 arms. Ligation mixtures were packaged in vitro with Gigapack® (Stratagene, La Jolla, CA) and plate amplified in a P2 lysogen of E. coli LE392.

Lambda plaque screening was performed using a mixture of three PCR products derived from strain 32 chromosomal DNA. These PCR products were amplified using primer pairs previously shown to amplify DNA segments at the 5' end of the strain 11 HA1 gene. The primers were as follows:

Primer designation   Strand     Sequence
44P                  positive   CCG TGC TTG CCC AAC ACG CTT
64P                  positive   GCT GCC ACC TTG CAC AAC AAC
93G-2                positive   CTT TCA ATG CCA GAA AGT AGG
18T-1                negative   CTT CAA CCG TTG CGG ACA ACA

Each of the positive strand primers was used with the single negative strand primer to generate the three fragments used for probing the library.
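The pairing scheme (each positive strand primer combined with the single negative strand primer) and the positive-strand site that the negative strand primer anneals to can be sketched as follows. The primer sequences are those listed above; the dictionary layout and function names are hypothetical, and primer sequences are assumed to be written 5' to 3'.

```python
# Primers as listed in the text: name -> (strand, sequence).
PRIMERS = {
    "44P": ("positive", "CCG TGC TTG CCC AAC ACG CTT"),
    "64P": ("positive", "GCT GCC ACC TTG CAC AAC AAC"),
    "93G-2": ("positive", "CTT TCA ATG CCA GAA AGT AGG"),
    "18T-1": ("negative", "CTT CAA CCG TTG CGG ACA ACA"),
}


def clean(seq):
    """Remove the triplet spacing and normalize case."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return clean(seq).translate(str.maketrans("ACGT", "TGCA"))[::-1]


negative = [n for n, (strand, _) in PRIMERS.items() if strand == "negative"]
pairs = [(n, negative[0]) for n, (strand, _) in PRIMERS.items()
         if strand == "positive"]
print(pairs)

# Positive-strand sequence matched by the negative strand primer.
print(reverse_complement(PRIMERS["18T-1"][1]))
```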

The PCR products generated from strain 11 and strain 32 chromosomal DNA were identical in size, suggesting that the nucleotide sequences of these chromosomal regions were similar in the two strains. Plaque screening was performed using standard methodology (Berger and Kimmel, 1987) at high stringency: final wash conditions were 65°C for 1 hour in buffer containing 2X SSC and 1% SDS. Positive plaques were identified by autoradiography and plaque purified, and phage DNA was purified by standard methods. The same primer pairs used to generate the screening probes were then used to localize the HA3 gene by amplifying various restriction fragments derived from the phage DNA. Once localized, the strain 32 HA3 gene and flanking DNA were sequenced using standard methods.

In order to construct strain 32 isogenic Haemophilus influenzae mutants deficient in expression of the HA3 gene, bacteria were made competent using the MIV method (Herriott et al., 1970) and were transformed with linearized pHMW8-6, selecting for kanamycin resistance. Allelic exchange was confirmed by Southern analysis. The mutants that no longer expressed HA3 exhibited a marked decrease in binding to Chang epithelial cells, using the methods outlined above (data not shown).

Expression in non-adherent strains of E. coli did not result in adherence, although it has not been confirmed that the protein was actually expressed.

SEQUENCE LISTING

GENERAL INFORMATION:
(i) APPLICANT: Washington University
(ii) TITLE OF INVENTION: HAEMOPHILUS ADHESION PROTEINS
(iii) NUMBER OF SEQUENCES: 19
(iv) CORRESPONDENCE ADDRESS:
ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert
STREET: Four Embarcadero Center, Suite 3400
CITY: San Francisco
STATE: California
COUNTRY: United States
ZIP: 94111-4187
(v) COMPUTER READABLE FORM:
MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release Version #1.30
(vi) CURRENT APPLICATION DATA:
APPLICATION NUMBER: UNKNOWN
FILING DATE: 22-MAR-1996
CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
APPLICATION NUMBER: US 08/409,995
FILING DATE: 24-MAR-1995
(viii) ATTORNEY/AGENT INFORMATION:
NAME: Silva, Robin M.
REGISTRATION NUMBER: 38,304
REFERENCE/DOCKET NUMBER: FP61053-1/RFT/RMS
(ix) TELECOMMUNICATION INFORMATION:
TELEPHONE: (415) 781-1989
TELEFAX: (415) 398-3249
TELEX: 910 277299

INFORMATION FOR SEQ ID NO:1:
(i) SEQUENCE CHARACTERISTICS:
LENGTH: 3294 base pairs
TYPE: nucleic acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
ATGAACAAAA TTTTTAACGT TATTTGGAAT

GAACTCACTC

ACCCTGTTGT

GCTTATGGCG

GTTCAAGAGG

TTGGTGGAGG

TCTAGCAAJLA

TTGTTTGA-AG

ATTACCTTTG

ACGATTGGCG

ACAACTGATG

CACTTGAATG

A.TTGACGGAG

AATGCGGGTT

GCACCCACAC

CCGCAACGGT

ATGCGAATTT

CTTATAAAGG

ACA;LTACTGC

ACGGCACAAG

GCAAAGGCGG

CTTTAGCGAA

GTGGTGCTGC

GCTTGAAGTT

GTATTGGTTC

GAGATCAAAG

GGAATATCA.P

CAAATGCGCC

TGAGGCGAAC

TAATTTCACT

TTTATTAAAT

GGCGACCGTA

GAACGAGAAA

TGTGCAGGTT

AGACCTTGGT


GTTGTGACTC

TCCGCCACCG

AACAATACTC

AATAATTCGA

CTAAATGAAA

GGCAATTTGC

AGCCAACAAG

ACTTCCACCT

GTGAAAACTG


AAACTTGGGT TGTCGTATCT TGCAGGTGCT ACA.ACAACAC

TGGCGGTTGC

CTGTTACGAA

TAGCAGATGC

AAAATGCGAG

GTAAATTGGG

TCAAACATGC

CTGAAAACGG

CGACTGTGAG

CGAAAGTGAA

CTA.ATGGCGA

TGGGTTCTCC

CAAGTATCA;

CAACTGGTC)

GTGCGGATA(

AAGTTAAAA!

;GAA.AAGCTAk 3 AAGGcAAAG

CGTATTGGCA

TAAGTTGAAG

AGAAAA.ACAA

TGATAA.ACTG

CTGGGTATTG

GGATGAAGTG

CAAACACACC

TGATACCTTA

TGTAACTAGT

TACTACGGTT

TGCTACTCAT

GGATGTCTTG

k ATCAGAAAAT

-AGAGACCACG

r CGGTGCGAAG PL CAAAGAGACA G CTTAGTGACT

CGCTAAAGAT

AACCTTGACA

TACGCATTAC

GGGTGTTAAA

CGATACTGTT

AGAAAACGGT

AGACGGTAAG

GCTGCGGGTG

GACACGCTTG

ACTCGTGCAG

GCTGGCTCAA

GAG2TTCTTGA

AAGAGAACCC

TTATTTACTC

GATGCAGACC

120 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 1560 1620 1680 GTCGATTTTG TTCATACTTA

ACTGTTACTG

ACTTCTGTTA

TAGATAGCAA

TCAAAGAAAA

AATAAAGTTG ATGGTGCTAA CGCGACTG;A GCGAAAGATG

TGATTGACGC

AATGGTCAAA ATGGCGACTT GGTAATGG-IA CAACTGCGAC GCGAAAGTTG GCGACGGCTT CTTACTGTGA ATGATGGTAA TCAACTGACG AGAAGAA-ATT AGCTGGACTA

CALACTGCTGC

CAAGALAGTTA AAGCGGGCGA CAAGAGGGTG CGAACTTTAC ACTTTAGGTA CA.GGAAATAA kGTGAATAAG

CGCAACTG-T

TGTAACTAAT

AAAACTAGAT

GAACGCTAAT

GGTTACAGCA

TGAGGCGGAC

TAAAGTAPACC

TTATTCACTG

TGGTGCGAAP

ACTGGTTGGA GAATTAAAAC AACCGATGCT GCATCAGGCA CAAATGTAAC

CTTTGCTAGT

GGCACCGATG GTATTACCGT

TAAGTATGAT

GGCGATAAA.A TCGCTGCAGA

TACGACCGCA

AATCCGAAAG GTAAAGTGGC

TGATGTTGCT

AAGGTTTALG TA.ACAGCCTT

AAACAGTCTA

GGTGGTACGC TTGATGGAAA

TGCAAGTGAG

TTTAAAGCAG GCA.AGAACTT

AAAAGTGAAA

CAAGATGCTT TAACAGGCTT

AACGAGCATT

ACTGAAATCA ACAAAGACGG

CTTAACCATC

ACACCAGCAA ATGGTGCGGG

TG

ATTAGTGCGG GCGGTCAGTC

GG

GCGAATTTCG ATCCGCTGAC

TA

TATAAAGGCT TGACCAATTT

GG

GACAATACCG CCGCAACCGT

GG

AAAACCACAG GCGGCTCAAC

GG

TTCAAAAGCG

GCAACGGTATC

ACTTTTGAAT TGGCTAAAGG

TC

AATGGAAAGG AAACGAGCCT

GC

GACTTAACA-A CAGGTCAGCC T2 GATAMAGGTG GCAAAGTCGT

T'

GGTTCTGGCT ATGTAACAGG

T.

CTTGGCTTGG CTGATGAAGC

T

TCTGCTGGTA CAACGGAAAT

T

AATACCAAAG TGAGCGCGGC ACAACCTTTG TGAAAACCGA

T

AACGGTA.AGA

AAATCACTAA

GCTGACGGTA

CGGCTGATATC

AAGAAAGTTG

TGAAAGACAA

GATAAJAACCA

AAGGCGAAGT

AGCCTTGATC

CAAATGATCA

GGCGATATTT

CTGCCACTTC

AAAGGGGTAA

CAA.ACCTTGC

GGCAAACGTG

CAGATGCAGG

ACTATGCCAG

GTA.AATCAAT

TTAGCTATCG

GGGTATCAAG

ACA.ACCAATA

GTCAAGGTAA

CAAATAAT

TTAAA-AAC

GCTCCGCC

ATGAAAAA

,GCGATTTG

AATATCAC

XATGTTTCC

;AAGTGGTT

TTAAAGTT

%XATTAAAA

rCTGTAACG

AACCAAGTC

GATGCGAA)

GTAA.ATGC(

JkCGGTGGAJ

'GTGGAATTI

~GTTGTCAA

;ACCAAAGA

:GATGGCAA

GAGCAATG

ATcAAAAGC

CACCGATG(

TGGACAAG

TACAGCAAI

GGTTGC-TA

AATTTCCG

AACAGGCG

GCAAACACCA TCAGCGTAAC

CAA.AGACGGC

GTTGTGAGCG GACTGAAGAA

ATTTGGTGAT

GACAACTTAA 'CGAAACAAAA

TGACGATGCC

GGTACAGACA AGCAAACTCC

AGTTGTTGCC

CGCGGCTTGG GCTGGGTCAT

TTCTGCGGAC

GATCAAGT7C GGAATGCGAA

CGAAGTGAAA

GGTAAAACGG TCAACGGTAG

GCGTGA.AATT

AAATCGA.MG A.ATTTACCGT

CAAAGAAACC

GGCGATAAAT ATTACAGCAA

AGAGGATATT

GATGGCAATA CAGTTGCTGC

GAA.ATATCAA

GATAATACTG AAGCTACCAT

AALCCAACAA-A

GCAGATGCGA TTGCGAA-ATC

AGGCTTTGAG

CGGGCGTTTG ATGATAAGAC

AAAAGCCTTA

CACGATAAAG TCCGTTTTGC

TAATGGTTTA

~AGCACCGATG CAAACGGCGA

TAAAGTGACC

G CCTTTAACGC AA.ATCTACAA

TACCGATGCA

AGATGGGCAAA CTA.AATGGTA

TGAACTGAAT

AGTTACCCTCG GTAACGTGGA

TTCAGACGGC

.TGGTATCACG CCAAAGCTGA

CGGTACTGCG

,T AAAGTTTCTA CCGATGAAA.A

ACACGTTGTC

;T AAAGGTGTCG TGATTGACAA

TGTGGCTAAT

:G ATTAACGGA.A GTCAGTTGTA

TGCTGTGGCA

TG AATAATCTTG AGGGCAAAGT

GAATAAAGTG

~T GCATTAGCGG CTTCACAGTT

ACCACAAGCC

TT GCGGGAAGTA GTTATCAAGG

TCAAAATGGT

AT AATGGCAALAG TGATTATTCG

CTTGTCAGGC

TT GCAGCAGGTG TTGGTTACCA

GTGG

1740 1800 1860 1920 1980 2040 2100 2160 2220 2280 2340 2400 2460 2520 2580 2640 2700 276 0 2820 2880 2940 3000 3060 3120 3180 3240 3294 9 o INFORMATION FOR SEQ ID NO:2: SEQUENCE

CHARACTERISTICS:

LENGTH: 1098 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: Met Asn Lys lie Phe Asn Val Ile Trp Asn Va 1 5 10 Val Val Val Ser Glu Leu Thr Arg Thr His T 20 25 Thr Val Ala Val Ala Val Leu Ala Thr Leu L< 35 40 Ala Asn Asn Asn Thr Pro Val Thr Asn Lys Li 55 Ala Asn Phe Asn Phe Thr Asn Asn Ser Ile A 65 70 7 Val Gin Glu Ala Tyr Lys Gly Leu Leu Asn L 90 Ser Asp Lys Leu Leu Val Glu Asp Asn Thr A 100 105 Leu Arg Lys Leu Gly Trp Val Leu Ser Ser I 115 120 Glu Lys Ser Gin Gin Val Lys His Ala Asp 130 135 Lys Gly Gly Val Gin Val Thr Ser Thr Ser 145 150 Ile Thr Phe Ala Leu Ala Lys Asp Leu Gly 165 170 Ser Asp Thr Leu Thr Ile Gly Gly Gly Ala 180 185 Thr Pro Lys Val Asn Val Thr Ser Thr Thr 195 200 Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr 210 215 Ile Gly Ser Thr Leu Thr Asp Thr Leu Val 225 230 il Val Th ir Lys Cy eu Ser Al eu Lys A] la Asp A 5 eu Asn G la Ala T Lys Asn G 1 Glu Val I 140 Glu Asn 155 Val Lys Ala Ala Asp Gly Thr Val 220 Gly Ser 235 r Gl 's Al .a Th La T) La G] lu L hr V 1 ly T Leu P Gly I Thr Gly Leu 205 His Pro n a Ir r Lu ys al 1C hi *h< Ly Al 19 Ly Li

A

Thr Tr Ser Al Val G1 Gly As Lys G] 8 Asn A: Gly A SArg A e Glu G s His I a Thr 175 .a Thr '0 ,s Phe eu Asn la Thr p a u Ln 0 La sn sn ;ly hr 160 Jal Thr Ala Gly His 240 Ile Asp Gly Gly Lys Asp Val Leu 260 Ser Thr Thr Gl) 275 Thr Val Glu Ph~ 290 Asp Ser Lys GlI 305 Thr Ser Val II Asn Lys Glu Th 34 Asp Giu Gly Ly 355 Asn Lys Thr G1 370 Gly Asp Phe A 385 Gly Asn Gly T Val Lys Tyr A 4 Lvs Ile Ala A 435 Ala Asn Asn I 450 Lys Lys Leu 465 Ser Trp Thr Asn Ala Ser Ala Gly Lys 515 Ser Leu Gin 530 Asp 245 Asn Gin i Let u Asi e Ly 32! r As 0 s Gi .y Tr La T~ hr T 4 sp A 20 la A ?ro I Jal Thr Glu 500 Asn Asp Gin Ala Ser Se SGl~ 31 s Gi 5 n Ly y Le p Ai ir V 3, hr A 05 la L .sp I .ys rhr.

Thr 485 Gin Leu Ala

U

0 u

S

uu a.

9 1 4

A

4

A

c Ser Thr Gly Trp Glu Asn 280 Ala Asp 295 Lys ArS Lys Asi Val As] Val Th 36 Ile Ly 375 i Ala Se 0 a Thr V~ s Val G ir Thr A 4 ly Lys V 455 la Lys 70 la Ala 4 ;iu Val .ys Val Leu Thr 535 52 His Tyr Thr Arg 250 Asn Ile Lys Gly 265 Val Asp Phe Val Thr Giu Thr Thz Thr Glu Val Lya 315 p Gly Lys Leu Ph4 330 p Gly Ala Asn Al 345 r Ala Lys Asp Va 0 's Thr Thr Asp Al 38 !r Gly Thr Asn Vz 395 al Thr Asn Gly T] 410 iy Asp Gly Leu L 425 la Leu Thr Val A 40 ral Ala Asp Val 31y Leu Val Thr 475 Glu Ala Asp Gly 490 Lys Ala Gly Asp 505 Lys Gin Giu Gly 520 Glv Leu Thr Ser e a 1 y

G

I

Ala Ala Ser 255 Val Lys Ala 270 His Thr Tyr 285 Thr Val Thi Ile Gly Alz Thr Gly Ly 33 Thr Giu As 350 Ile Asp Al 365 a Asn Gly G1 1 Thr Phe A r Asp Gly I 4 *5 Leu Asp G 430 in Asp Gly L 445 La Ser Thr Z la Leu Asn Fly Thr Leu ys Val Thr 510 Ala Asn Phe 525 Ile Thr Leu 540 a

S

L

1 4 Ile Gly Asp Val Lys 320 Ala Ala i Val n Asn a Ser 400 e Thr y Asp (s Asn sp Glu er Leu 480 isp Gly ?he Lys Thr Tyr Gly Thr 0..0 0 *0 0 0 Gly 545 Thr Thr Ser Ser Thr 625 Asp Ile Val Val Ala 705 Asn Lys Asn Va1 Val 785 Leu Thr Asn 2 Pro 2 Lys Gly I Ala 610 Asn Asn Ser Arg Ser 690 Lvs G lv Glu Thr Thr 770 Thr Gly Lys ksn kla ksp eu 595 ksp Leu rhr Ala Asn 675 Gly Gly Lys Asp Va1 755 Asp Gl Let Al~ Gly Asn C Gly 3 580 Lys I Asn I Asp Ala Asp 660 Ala Lys Glu Glu Ile 740 Ala Asn Asn i Ala a Leu la L 5 ;iy A :le S .ys P .eu 1 1lu I 6 kla J 645 Lys Asn Thr Val Thr 725 Asp Ala Thr Gin Asp 805 Ser ,ys 50 la er ,he hr ,ys ;30 Lhr rhr lu Val Val 710 Ser Leu Lys Gi.

Va 79' Gl Al Thr Gly Ala Gly Lys 615 Gly Val Thr Val Asn 695 Lys Leu Thr Tyr i Ala 775 L Al~ u Ali a Gi'

G

Ala A Gly G 5 Asp P 600 Gin A Thr A Gly Gly C Lys 680 Gly Ser Val Thr Gin 760 Thr I Asp a Asp y Thr *sn ly 85 la LSn sp tsp ;ly j65 Phe Arg Asn Lys Glv 745 Asp Ile Ala Al~ Th: 5 Asn P 570 Gin S Asn I Asp I Lys Leu 650 Ser Lys Arg Glu Val 730 Gin Lys Thr Ile i Lys 810 r Glu 55 la A er V !he P sp A 6 31n 1 335 krg C rhr C Ser Glu Phe 715 Gly Pro Gly As Ala 795 Arg Ile *sn 'al Sp la :hr ;ly flu fly Ile 700 Thr Asp Lys Gi Lye 78( Ly Al Va lu Ile Asn Lys Asp Gly L Thr I Lys A 5 Pro L 605 Tyr I Pro N Leu C Tyr I Asn 685 Thr Val I Lys Leu Lys 765 sGly s Ser a Phe 1 Asn eu le sn ieu lys Tal ;ly iis 670 fly Phe Lys Tyr Lys 750 Va1 Sex G1 As Al 83 Thr I 5 Ser 'v 575 Val V Thr E Gly I Val 2 Trp 655 Asp Ile Glu Glu Tyr 735 Asp Val Gly Phe R Asp 815 a His 0 le ral Tal ;er ,eu kla lal 3In Psn Leu Thr 720 Ser Giv.

Ser Tyr Glu 800 Lys Asp 820 825 Lys Val Arg 835 Phe Ala Asn Gly Leu 840 Asn Thr Lys Val Ser 845 Ala Ala Thr
Val Glu Ser Thr Asp Ala Asn Gly Asp Lys Val Thr Thr Thr Phe Val
850 855 860
Lys Thr Asp Val Glu Leu Pro Leu Thr Gln Ile Tyr Asn Thr Asp Ala
865 870 875 880
Asn Gly Lys Lys Ile Thr Lys Val Val Lys Asp Gly Gln Thr Lys Trp
885 890 895
Tyr Glu Leu Asn Ala Asp Gly Thr Ala Asp Met Thr Lys Glu Val Thr
900 905 910
Leu Gly Asn Val Asp Ser Asp Gly Lys Lys Val Val Lys Asp Asn Asp
915 920 925
Gly Lys Trp Tyr His Ala Lys Ala Asp Gly Thr Ala Asp Lys Thr Lys
930 935 940
Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys His Val Val
945 950 955 960
Ser Leu Asp Pro Asn Asp Gln Ser Lys Gly Lys Gly Val Val Ile Asp
965 970 975
Asn Val Ala Asn Gly Asp Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn
980 985 990
Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly
995 1000 1005
Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala
1010 1015 1020
Asp Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala
1025 1030 1035 1040
Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser Ser Tyr Gln
1045 1050 1055
Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly
1060 1065 1070
Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr
1075 1080 1085
Gly Val Ala Ala Gly Val Gly Tyr Gln Trp
1090 1095

INFORMATION FOR SEQ ID NO:3:

SEQUENCE CHARACTERISTICS:

LENGTH: 7291 base pairs
TYPE: nucleic acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: DNA (genomic)
(ix) FEATURE:
NAME/KEY: CDS
LOCATION: 163..7221
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TTTNTTTTTC TTATTTTTTT TTTTTTTTTT TTTTTTTTTT TTGAGGCTAA ACTTTTNGNA

AAATATCACT TTTTTATTCT CCAMATATAG AATAGAATAC GCACGATTTC

ACTAAGAAAA

GTATATTTAT CATTAATTTT ATTAAATATA AGGTAAATAA AA ATG AAC AAA ATT
Met Asn Lys Ile
1
TTT AAC GTT ATT TGG AAT GTT ATG ACT CAA ACT TGG GTT GTC GTA TCT
Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp Val Val Val Ser

GAA CTC ACT CGC Glu Leu Thr Arg GCC GTA TTG GCG Ala Val Leu Ala 40 GAT GAA GAT GAA Asp Glu Asp Glu 55

ACC

Thr CAC ACC AAA CGC His Thr Lys Arg TCC GCA ACC GTG Ser Ala Thr Val GAG ACC Glu Thr ACA CTG TTG TTT GCA Thr Leu Leu Phe Ala 45 GAG TTA GAC CCC GTA Glu Leu Asp Pro Val ACG GTT CAG GCG Thr Val Gin Ala AAT GCT ACC AsnAla Thr CCC GTG TTG Pro Val Leu 222 270 318 366 414 GTA CGC ACT Val Arg Thr AG-- TTC Ser Phe CAT TCC GAT His Ser Asp AAA GAA GGC ACG GGA Lys Glu Gly Thr Gly 75 GMA AAA GAA GTT ACA GAA Glu Lys Glu Val Thr Glu s0

AAT

Asn TCA AAT TGG GGA Ser Asn Trp Gly

ATA

Ile 90 TAT TTC GAC AAT Tyr Phe Asp Asn

MA

Lys 95 GGA GTA CTA A Gly Val Leu Lys GCC 4 62 Al a 100 GGA GCA ATC ACC Gly Ala Ile Thr

CTC

Leu 105 AAA GCC GGC GAC MAC CTG AAA ATC AMA Lys Ala Gly Asp Asn Leu Lys Ile Lys 110 Gin Asn 115 510 ACC GAT GMA Thr Asp Glu

AGC

Ser 120 ACC MAT GCC AGT Thr Asn Ala Ser TTC ACC TAC TCG Phe Thr Tyr Ser CTG AAA AMA 558 Leu Lys Lys 130 GAC CTC ACA GAT CTG ACC Asp Leu Thr Asp Leu Thr 135 GCA MAC GGC GAT MAA GTT Ala Asn Gly Asp Lys Val 150 AGT GTT GCA Ser Val Ala 140 ACT GA MAA TTA TCG TTT GGC Thr Giu Lys Leu Ser Phe Gly 606 GAT ATT ACC AGT GAT Asp Ile Thr Ser Asp 155 GCA MAT GGC TTG MA Ala Asn Gly Leu Lys 160 TTG GCG AAA Leu Ala Lys 165 ACA GGT AAC GGA AAT GTT CAT Thr Gly Asn Gly Asn Val His 170

TTG

Leu 175 AAT GGT TTG GAT TCA Asn Gly Leu Asp Ser 180 ACT TTG CCT GAT GCG GTA ACG AAT ACA Thr Leu Pro Asp Ala Val 185 Thr Asn Thr GGT GTG TTA AGT Gly Val Leu Ser 190 AGA GCT GCA ACT Arg Ala Ala Thr TTT ACA CCT Phe Thr Pro GTT TTA AAT Val Leu Asn 215

AAT

Asn 200 GAT GTT GAA AAA Asp Val Glu Lys

ACA

Thr 205 TCA TCA AGT Ser Ser Ser 195 GTT AAA GAT Val Lys Asp 210 GCT GGA GGT Ala Gly Gly 750 798 846 GCA GGT TGG AAC Ala Gly Trp Asn

ATT

Ile 220 AAA GGT GCT AAA Lys Gly Ala LyS 0S 9 4 0* 9 9 9 9**6 AAT GTT Asn Val 230 GAG AGT GTT GAT Glu Ser Val Asp

TTA

Leu 235 GTG TCC GCT TAT Val Ser Ala Tyr

AAT

As 240 AAT GTT GAA TTT Asn Val Glu Phe 894 942

ATT

Ile 245 ACA GGC GAT AAA Thr Gly Asp Lys ACG CTT GAT GTT Thr Leu Asp Val

GTA

Val 255 TTA ACA GCT AAA Leu Thr Ala Lys

GAA

Glu 260 AAC GGT AAA ACA Asn Gly Lys Thr GAA GTG AAA TTC Glu Val Lys Phe

ACA

Thr 270 CCG AAA ACC Pro Lys Thr AAA GAA AAA Lys Glu Lys AAT AAA GTT Asn Lys Val 295

GAC

Asp 280

ACA

Thr GGT AAG TTA TTT Gly Lys Leu Phe AGT AAC ACG GCG Ser Asn Thr Ala 300 ACT GGA AAA GAG Thr Gly Lys Glu 285 ACT GAT AAT ACA Thr Asp Asn Thr

AAT

Asn

GAT

Asp 305 TCT GTT ATC Ser Val Ile 275 AAC GAC ACA Asn Asp Thr 290 GAG GGT AAT Glu Gly Asn 990 1038 1086 1134 1182 1230 4.

GGC TTA Gly Leu 310 GTC ACT GCA AAA Val Thr Ala Lys

GCT

Ala 315 GTG ATT GAT GCT Val Ile Asp Ala

GTG

Val 320 AAC AAG GCT GGT Asn Lys Ala Gly

TGG

Trp 325 AGA GTT AAA ACA Arg Val Lys Thr ACT GCT AAT GGT Thr Ala Asn Gly

CAA

Gin 335 AAT GGC GAC TTC Asn Giy Asp Phe

GCA

Ala 340 ACT GTT GCG TCA Thr Val Ala Ser

GGC

Gly 345 ACA AAT GTA ACC Thr Asn Val Thr

TTT

Phe 350 GAA AGT GGC GAT Glu Ser Gly Asp GGT ACA Gly Thr 355 ACA GCG TCA Thr Ala Ser

GTA

Val 360 ACT AAA GAT ACT Thr Lys Asp Thr

AAC

-Asn 365 GGC AAT GGC ATC Gly Asn Gly Ile ACT OTT AAG Thr Val Lys 370 OAT AAA AAA Asp Lys Lys 1278 1326 TAC GAC GCG AAA GTT GGC GAC GGC TTG AAA TTT OAT AGC Tyr Asp Ala Lys Val Oly Asp Gly Leu Lys Phe Asp Ser 375 380 385 ATC GTT Ile Val 390 OCA GAT ACG ACC Ala Asp Thr Thr

GCA

Ala 395 CTT ACT GTG ACA GGT GGT AAG OTA GCT Leu Thr Val Thr Gly Gly Lys Val Ala 400 1374 1422

GAA

Glu 405 ATT GCT AAA GAA Ile Ala Lys Glu

GAT

Asp 410 GAC AAG AAA AAA CTT GTT AAT GCA GGC Asp Lys Lys Lys Leu Val Asn Ala Gly 415

GAT

Asp 420 TTG GTA ACA GCT TTA Leu Vai Thr Ala Leu 425 GGT AAT CTA AGT TGG Gly Asn Leu Ser Trp 430 AAA GCA AAA OCT Lys Ala Lys Ala GAG GCT Glu Ala 435 1470 GAT ACT GAT Asp Thr Asp GCA GGC GAA Ala Gly Glu 455

GGT

Gly 440 GCG CTT GAG GGG Ala Leu Glu Gly

ATT

Ile 445 TCA AAA GAC CAA Ser Lys Asp Gin GAA GTC AAA Glu Val Lys 450 AAA GTG AAA Lys Val Lys 1518 1566 ACG GTA ACC TTT Thr Val Thr Phe

AAA

Lys 460 GCO GGC AAG AAC Ala Gly Lys Asn CAG GAT Gln Asp 470 GGT GCG AAC TTT Gly Ala Asn Phe TAT TCA CTG CAA Tvr Ser Leu Gln

OAT

Asp 480 OCT TTA ACG GGT Ala Leu Thr Gly

TTA

Leu 485 ACG AGC ATT ACT Thr Ser Ile Thr

TTA

Leu 490 GGT GOT ACA ACT Gly Gly Thr Thr

AAT

Asn 495 GGC GGA AAT OAT Gly Gly Asn Asp

GCG

Ala 500 AAA ACC GTC ATC Lys Thr Val le

AAC

Asn 505 AAA GAC GOT TTA Lys Asp Oly Leu

ACC

Thr 510 ATC ACO CCA GCA Ile Thr Pro Ala GGT AAT Oly Asn 515 GGC GOT ACO Gly Gly Thr AAA GCA GOT Lys Ala Gly 535 GGT ACA AAC ACC Gly Thr Asn Thr

ATC

Ile 525 AGC GTA ACC Ser Val Thr AAT AAA OCT ATT ACT AAT OTT GCG AGT Asn Lys Ala Ile Thr Asn Val Ala Ser AAA OAT GOC ATT Lys Asp Gly Ile 530 GGT TTA AGA OCT Oly Leu Arg Ala 545 OCA ACT OAT TTA Ala Thr Asp Leu 1614 1662 1710 1758 1806 1854 1902 1950 1998 TAT GAC Tyr Asp 550 OAT GCG AAT TTT Asp Ala Asn Phe OTT TTA AAT AAC Val Leu As Asn

AAT

Asn 565 AGA CAC OTT GAA Arg His Val Glu

GAT

Asp 570 OCT TAT AAA GGT Ala Tyr Lys Gly

TTA

Leu 575 TTA AAT CTA AAT Leu Asn Leu Asn

GAA

Olu 580 AAA AAT OCA AAT Lys Asn Ala Asn

AAA

Lys 585 CAA CCO TTG GTG Gin Pro Leu Val

ACT

Thr 590 GAC AOC ACG GCG Asp Ser Thr Ala GCG ACT Ala Thr 595 OTA GOC OAT Val Gly Asp

TTA

Leu 600 CGT AAA TTG GOT TGG GTA OTA TCA ACC Arg Lys Leu Gly Trp Va Val Ser Thr 605 AAA AAC GGT Lys Asn Oly 610 ACG AAA GAA GAA AGC AAT Thr Lys Giu Glu Ser Asn 615 ACC GGA GCC GGT GCT GCT Thr Gly Ala Gly Ala Ala CAA GTT Gin Val 620 AAA CAA GCT Lys Gin Ala GAT GAA GTC CTC TTT Asp Glu Vai Leu Phe 625

ACG

Thr 635 GTT ACT TCC AAA Val Thr Ser Lys TCT GAA Ser.Glu 640 AAC GGT AAA Asn Gly Lys

CAT

His 645 ACG ATT ACC GTT Thr Ile Thr Val

AGT

Ser 650 GTG GCT GAA ACT Val Ala Giu Thr

AAA

Lys 655 GCG GAT TGC GGT Ala Asp Cys Gly

CTT

Leu 660 GAA AAA GAT GGC Glu Lys Asp Gly

GAT

Asp 665 ACT ATT AAG CTC Thr Ile Lys Leu

AAA

Lys 670 GTG GAT AAT CAA Val Asp Asn Gin AAC ACT Asn Thr 675 a a GAT AAT GTT Asp Asn Yal

TTA

Leu 680 ACT GTT GGT AAT AAT GGT ACT GCT Thr Vai Giy Asn Asn Giy Thr Ala

GTC

Val GGC TTT GAA ACT GTT AAA ACT GGA GCG ACT GAT GCA GAT Gly Phe Giu Thr Val Lys Thr Gly Ala Thr Asp Aia Asp 695 700 705 ACT AAA GGT Thr Lys Gly 690 CGC GGT AAA Arg Giy Lys AAA GTC GCA Lys Val Ala 2046 2094 2142 2190 2238 2286 2334 2382 2430 GTA ACT Val Thr 710 GTA AAA GAT GCT Val Lys Asp Ala

ACT

Thr 715 GCT AAT GAC GCT Ala Asn Asp Ala

AAG

Lys

ACT

Thr 725 GTA AAA GAT GTT Val Lys Asp Val

GCA

Ala 730 ACC GCA ATT AAT Thr Ala Ile Asn

AGT

Ser 735 GCG GCG ACT TTT Ala Ala Thr Phe

GTG

Val 740 AAA ACA GAG AAT Lys Thr Glu Asn

TTA

Leu 745 ACT ACC TCT ATT Thr Thr Ser Ile

GAT

Asp 750 GAA GAT AAT CCT Glu Asp Asn Pro ACA GAT Thr Asp 755 AAC GGC AAA Asn Gly Lys GCA GGT AAA Ala Gly Lys 775

GAT

Asp 760 GAC GCA CTT AAA Asp Ala Leu Lys

GCG

Ala 765 GGC GAT ACC TTA Gly Asp Thr Leu ACC TTT AAA Thr Phe Lys 770 ATT ACT TTT Ile Thr Phe 2478 2526 AAC CTG AAA GTT Asn Leu Lys Val

AAA

Lys 780 CGT GAT GGA AAA Arg Asp Gly Lys GAC TTG Asp Leu 790 GCG AAA AAC CTT Ala Lys Asn Leu

GAG

Glu 795 GTG AAA ACT GCG Val Lys Thr Ala

AAA

Lys 800 GTG AGT GAT ACT Val Ser Asp Thr 2574

TTA

Leu 805 ACO ATT GGC GGG Thr Ile Gly Gly

AAT

Asf 810 ACA CCT ACA GGT GGC ACT ACT GCG ACG Thr Pro Thr Gly Gly Thr Thr Ala Thr 815

CCA

Pro 820 2622 2670 AAA GTG AAT ATT Lys Val Asn Ile ACT AGC ACG GCT GAT GGT TTG AAT TTT GCA AAA GAA Thr Ser Thr Ala Asp Giy Leu Asn Phe Ala Lys Glu 825 830 835 ACA GCC GAT Thr Ala Asp ACA ACT TTA Thr Thr Leu 855 GCC TCG Ala Ser 840 GGT TCT AAG Gly Ser Lys AAT GTT TAT TTG Asn Val Tyr Leu 845 GGA GCG AAG TCT Gly Ala Lys Ser AAA GGT ATT GCG Lys Gly Ile Ala 850 ACT GAG CCA AGC Thr Glu Pro Ser

TCA

Ser 865 CAC GTT GAT His Val Asp 2718 2766 2814 TTA AAT GTG GAT GCG ACG Leu Asn Val Asp Ala Thr 870

AAA

Lys 875 AAA TCC AAT GCA Lys Ser Asn Ala

GCA

Ala 880 AGT ATT GAA GAT Ser Ile Glu Asp

GTA

Val 885 TTG CGC GCA GGT Leu Arg Ala Gly

TGG

Trp 890 AAT ATT CAA GGT Asn Ile Gin Gly GGT AAT AAT GTT Gly Asn Asn Val

GAT

Asp 900 2862 a.

TAT GTA GCG ACG Tyr Val Ala Thr

TAT

Tyr 905 GAC ACA GTA AAC Asp Thr Val Asn

TTT

Phe 910 ACC GAT GAC Thr Asp Asp ACA ACA ACG Thr Thr Thr GTT AAA ATC Val Lys Ile 935

GTA

Val 920 ACC GTA ACC CAA Thr Val Thr Gln

AAA

Lys 925 GCA GAT GGC AAA Ala Asp Gly Lys AGC ACA GGT Ser Thr Gly 915 GGT GCT GAC Gly Ala Asp 930 AAC GGC AAA Asn Gly Lys 2910 2958 3006 GGT GCG AAA ACT Gly Ala Lys Thr

TCT

Ser 940 GTT ATC AAA GAC Val Ile Lys Asp a CTG TTT Leu Phe 950 ACA GGC AAA GAC Thr Gly Lys Asp AAA GAT GCG AAT Lys Asp Ala Asn

AAT

Asn 960 GGT GCA ACC GTT Gly Ala Thr Val

AGT

Ser 965 GAA GAT GAT GGC Glu Asp Asp Gly GAC ACC GGC Asp Thr Gly ACA GGC Thr Gly 975 TTA GTT ACT GCA Leu Val Thr Ala

AAA

Lys 980 ACT GTG ATT GAT Thr Val Ile Asp

GCA

Ala 985 GTA AAT AAA AGC Val Asn Lys Ser

GGT

Gly 990 TGG AGG GTA ACC Trp Arg Val Thr GGT GAG Gly Glu 995 3054 3102 3150 3198 3246 GGC GCG ACT Gly Ala Thr GCC GAA Ala Glu 1000 ACC GGT GCA Thr Gly Ala ACC GCC Thr Ala 1005 GTG AAT GCG GGT AAC GCT Val Asn Ala Gly Asn Ala 1010 TTC AAA AAC GGC AAT GCG Phe Lys Asn Gly Asn Ala 1025 GAA ACC GTT ACA TCA Glu Thr Val Thr Ser 1015 GGC ACG AGC GTG AAC Gly Thr Ser Val Asn 1020 ACC ACA GCG ACC GTA AGC AAA GAT AAT GGC AAC ATC AAT GTC AAA TAC Thr Thr Ala Thr Val Ser Lys Asp Asn Gly Asn Ile Asn Val Lys Tyr 1030 1035 1040 GAT GTA AAT GTT GGT GAC GGC TTG AAG ATT GGC GAT GAC AAA AAA ATC Asp Val Asn Val Gly Asp Gly Leu Lys 11e Gly Asp Asp Lys Lys 1045 1050 1055 1060 3294 3342 OTT GCA GAC ACG ACC ACA CTT ACT GTA ACA GOT GOT AAG GTG TCT GTT Val Ala Asp Thr Thr Thr Leu Thr Val Thr Gly Gly Lys Val Ser Val 1065 1070 1075 3390 CCT GCT GOT OCT AAT AGT GTT AAT AAC AAT Pro Ala Gly Ala Asn Ser Val Asn Asri Asn 1080 1085 AAG AAA CTT GTT AAT GCA Lys Lys Leu Val Asn Ala 1090 3438 GAG GGT TTA GCC ACT Glu Cly Leu Ala Thr 1095 GCT TTA AAC AAC Ala Leu Asn Asn 1100 CTA AGC TOG ACG GCA AAA GCC Leu Ser Trp Thr Ala Lys Ala 1105 GGC GAA ACC GAC CAA GAA GTC Oly Glu Thr Asp Gin Glu Val 1120 3486 GAT AAA TfAT Asp Lys Tyr 1110 AAA GCA GCC Lys Ala Gly 1125 GCA OAT GGC Ala Asp Gly GAO TCA GAG 01u Ser Olu 1115 3534 GAC AAA OTA ACC TTT AAA Asp Lys Val Thr Phe Lys 1130 GCA GGC AAO AMC TTA AAA Ala Gly Lys Asn Leu Lys 1135

GTG

Val 1140 3582 AAA CAG TCT GMA AAA GAC TTT ACT TAT TCA Lys Gin Ser Glu Lys Asp Phe Thr Tyr Ser 1145 115( GOC TTA ACG Gly Leu Thr AGC ATT ACT Ser Ile Thr 1160 TTA GOT GOT ACA.

Leu Gly Gly Thr 1165 MAA GAC 0CC TTA Lys Asp Gly Leu 1180 CTG CAA GAC ACT TTA ACA Leu Gin -Asp Thr Leu Thr 1155 OCT AAT GGC AGA AAT OAT Ala Asn Oly Arg Asn Asp 1170 ACC ATC ACO CTG OCA AAT Thr Ile Thr Leu Ala Asn 1185 ACO OGA ACC OTC ATC AAC Thr Gly Thr Val Ile Asn i175 3630 3678 3726 3774 3 822-.

3 87 0 GOT GCT OCO OCA GGC ACA OAT OCO TCT AAC GOA AAC ACC ATC ACT OTA Gly Ala Ala Ala Giy Thr Asp Ala Ser Asn Gly Asn Thr Ile Ser Val 1190 1195 1200 ACC AMA GAC Thr Lys Asp 1205 ACT OCT TTA Ser Ala Leu CAA OAT AMA Oln Asp Lys GCC ATT ACT GC Gly Ile Ser Ala 1210 MAA ACC TAT A Lys Thr Tyr Lys 1225 GOT AAT AAA GA.A ATT Oly Asn Lys Glu Ilie 1215 OAT ACT CAA AAC ACTI Asp Thr Gin Asn Thr 1230 ACC MAT OTT A.O Thr Asn Val Lys 1220 OCA OAT GMA ACA Ala Asp Oiu Thr 1235 GAG TTC CAC CCC CCC OTT AAA A.C OCA MAT GM GTT GAO Giu Phe His Ala Ala Val Lys Asn Ala Asn Giu Val Oiu 1240 1245 1250 TTC GTO GOT MAA MAC GOT OCA ACC OTO TCT OCA AMA ACT OAT MAC MAC Phe Val Cly Lys Asn Giy Ala Thr Val Ser Ala Lys Thr Asp Asn Asn 1255 1260 1265 GGA AAM CAT ACT GTA ACO ATIT OAT OTT OCA GMA CCC MAA OTT OCT OAT Oly Lys His Thr Val Thr Ile Asp Val Ala Glu Ala Lys Val Cly Asp 12'70 1275 1280 3918 3966 4014 GGT CTT Gly Leu 1285 GAA AAA GAT ACT GAC Glu Lys Asp Thr Asp 1290 GGC AAG ATT AAA CTC AAA GTA Gly Lys Ile Lys Leu Lys Val 1295 GAT AAT Asp Asn 1300 GCA TCC Ala Ser 1315 4062 ACA GAT GGG AAT Thr Asp Gly Asn AAT CTA Asn Leu 1305 TTA ACC GTT Leu Thr Val GAT GCA Asp Ala 1310 ACA AAA GGT Thr Lys Gly 4110 4158 GTT GCC AAG Val Ala Lys GGC GAG TTT AAT GCC Gly Glu Phe Asn Ala 1320 GTA ACA Val Thr 1325 *at CAA GGC ACA AAT Gin Gly Thr Asn 1335 AAT GGT GCA ACT Asn Gly Ala Thr 1350 GGC GAC GTT GCT Gly Asp Val Ala 1365 GAA AAT GAC GAC Glu Asn Asp Asp GCC AAT GAG Ala Asn Glu CGC GGT AAA Arg Gly Lys 1340 ACA GAT GCA ACT ACA GCC Thr Asp Ala Thr Thr Ala 1330 GTG GTT GTC AAG GGT TCA Val Val Val Lys Gly Ser 1345 AAA AAA GTG GCA ACT GTT Lys Lys Val Ala Thr Val 1360 GCT ACC GAA ACT GAC AAG Ala Thr Glu Thr Asp Lys 1355 AAA GCG ATT AAC GAC GCA GCA ACT Lys Ala Ile Asn Asp Ala Ala Thr 1370 1375 AGT GCT ACG ATT GAT GAT AGC CCA Ser Ala Thr Ile Asp Asp Ser Pro 1385 1390 4206 TTC GTG AAA Phe Val Lys

GTG

Val 1380 ACA GAT GAT GGC Thr Asp Asp Gly 1395 4254 4302 4350 4398 4446 GCA AAT GAT Ala Asn Asp GCT CTC Ala Leu 1400 AAA GCA GGC Lys Ala Gly GAC ACC Asp Thr 1405 TTG ACC TTA AAA GCG GGT Leu Thr Leu Lys Ala Gly 1410 AAT ATT ACT TTT GCC CTT Asn Ile Thr Phe Ala Leu 1425 AAA AAC TTA AAA GTT AAA CGT Lys Asn Leu Lys Val Lys Arg 1415 GAT GGT AAA Asp Gly Lys 1420 GCG AAC GAC CTT AGT GTA AAA AGC GCA ACC GTT AGC GAT AAA TTA TCG Ala Asn Asp Leu Ser Val Lys Ser Ala Thr Val Ser Asp Lys Leu Ser 1430 1435 1440 CTT GGT ACA AAC GGC AAT AAA GTC AAT ATC ACA AGC GAC ACC AAA GGC Leu Gly Thr Asn Gly Asn Lys Val Asn Ile Thr Ser Asp Thr Lys Gly 1445 1450 1455 1460 4494 4542 TTG AAC TTC Leu Asn Phe TTA AAT GGC Leu Asn Gly GCT AAA GAT Ala Lys Asp 1465 ATT GCT TCA Ile Ala Ser 1480 AGT AAG ACA GGC GAT GAT GCT AAT ATT CAC Ser Lys Thr Gly Asp Asp Ala Asn Ile His 1470 1475 4590 ACT TTA ACT GAT ACA TTG Thr Leu Thr Asp Thr Leu 1485 TTA AAT AGT GGT Leu Asn Ser Gly 1490 AAC GAG AAA AAA Asn Glu Lys Lys 1505 4638 GCG ACA ACC AAT TTA GGT GGT Ala Thr Thr Asn Leu Gly Gly 1495 AAT GGT ATT ACT GAT Asn Gly Ile Thr Asp 1500 4686 CGC GCG GCC AGC GTT AAA GAT GTC TTG AAT C GGT TGG Arg Ala Ala Ser Val Lys Asp Val Leu Asn Ala Gly Trp 1510 1515 1.520 AAT GTT CGT Asn Val Arg 4734 GGT CTT Gly Val 1525 AAA CCG GCA TCT GCA Lys Pro Ala Ser Ala 1530 AAT AAT CAA GTG GAG Asn Asn Gin Val Glu 1535 AAT ATC GAC TTT Asn Ile Asp Phe 1540 4782 GTA GCA ACC Val Ala Thr ACG AGT GTA Thr Ser Val TAC GAC ACA Tyr Asp Thr 1545 GTG GAC TTT GTT ACT Val Asp Phe Val Ser 1550 GCA GAT AMA Gly Asp Lys CAC ACC Asp Thr 1555 4830 4878 ACT GTT Thr Val 1560 GAA AGT A Giu Ser Lys CAT AAT Asp Asn 1565 GGC AAC AGA Gly Lys Arg ACC CMA GTT Thr Glu Val 1570 00..

0.0.

AAA ATC CGT GC AAC ACT TCT Lys Ile Gly Ala Lys Thr Ser 1575 GTT ATC AMA GAC Val Ile Lys Asp 1580 CAC AAC GCC AAA CTC His Asn Gly Lys Leu 1585 4926 TTT ACA GGC AAA GAG CTG AAG GAT GCT AAC AAT AAT GGC GTA ACT GTT Phe Thr Gly Lys Clu Leu Lys Asp Ala Asn Asn Asn Cly Val Thr Val 1590 1595 1600 ACC GAA ACC GAC CCC AMA GAC GAG GCT AAT GGT TTA CTG ACT GCA AAA Thr Ciu Thr Asp Cly Lys Asp Giu Gly Asn Cly Leu Val Thr.Ala Lys 1605 1610 1615 1620 GCT GTC ATT GAT GCC GTG AAT AAC GCT GCT TGG AGA GTT MAA ACA ACA Ala Val Ile Asp Ala Val Asn Lys Ala Cly Trp Arg Val Lys Thr Thr 1625 1630 1635 GGT GC-T AAT CGT CAC AAT CAT CAC TTC GCA ACT GTT C TCA GGC ACA Cly Ala Asn Gly Gin Asn Asp Asp Phe Ala Thr Val Ala Ser Gly Thr 1640 1645 1650 AAT GTA ACC TTT CCT CAT GGT MAT CCC ACA ACT CCC GMA GTA ACT AMA Asn Val Thr Phe Ala Asp Cly Asn Gly Thr Thr Ala Ciu Vai Thr Lys 1655 1660 1665 CCA MAC CAC CCT ACT ATT ACT GTT MAA TAC MAT CTT MAA GTG GCT CAT Ala Asn Asp Cly Ser Ile Thr Val Lys Tyr Asn Val Lys Val Ala Asp 1670 1675 1680 CCC TTA AMA CTA GAC CCC CAT AAM ATC GTT GCA GAC ACG ACC GTA CTT Gly Leu Lys Leu Asp Cly Asp Lys Ile Val Ala Asp Thr Thr Val Leu 1685 1690 1695 1700 4974 5022 5070 5118 5166 5214 5262 ACT CTG CCA Thr Val Ala AMA TTT GTT Lys Phe Val CAT GGT MAA GTT ACA GCT CCC AMT MAT GC Asp Gly Lys Val Thr Ala Pro Asn Asn Gly 1705 1710 CAT CCA ACT GGT TTA CC CAT CC TTA MAT Asp Ala Ser Cly Leu Ala Asp Ala Leu Asn 1720 1725 CAT GGT AAC Asp Cly Lys 1715 MAA TTA AGC Lys Leu Ser 1730 5310 5358 TGG ACG GCA ACT GCT GGT AAA GAA GGC Trp Thr Ala Thr Ala Gly Lys Glu Gly 1735 1740 ACT GGT GAA GTT GAT CCT GCA Thr Gly Glu Val Asp Pro Ala 1745 5406 AAT TCA GCA Asn Ser Ala 1750 GGG CAA GAA Gly Gin Glu GTC AAA Val Lys 1755 GCG GGC GAC Ala Gly Asp AAA GTA Lys Val 1760 ACC TTT AAA Thr Phe Lys 5454 GCC GGC GAC AAC CTG Ala Gly Asp As Leu 1765 AAA ATC Lys Ile 1770 AAA CAA AGC Lys Gin Ser GGC AAA Gly Lys 1775 GAC TTT ACC Asp Phe Thr

TAC

Tyr 1780 5502 TCG CTG AAA AAA Ser Leu Lys Lys GAG CTG Glu Leu 1785 AAA GAC CTG Lys Asp Leu ACC AGC Thr Ser 1790 GTA GAG TTC Val Glu Phe AAA GAC Lys Asp 1795 5550 *ee.

GCA AAC GGC Ala Asn Gly GGT ACA Gly Thr 1800 GGC AGT GA.M Gly Ser Glu AGC ACC Ser Thr 1805 AAG ATT ACC Lys Ile Thr AAA GAC GGC Lys Asp Gly 1810 TTG ACC ATT ACG CCG Leu Thr Ile Thr Pro 1815 GCA AAC GGT GCG Ala Asn Gly Ala 1820 GGT GCG GCA Gly Ala Ala GGT GCA AAC ACT Gly Ala Asn Thr 1825 5598 5646 5694 GCA AAC ACC ATT AGC GTA ACC AAA GAT GGC ATT AGC GCG GGT AAT AAA Ala Asn Thr Ile Ser Val Thr Lys Asp Gly Ile Ser Ala Gly Asn Lys 1830 1835 1840 GCA GTT ACA AAC Ala Val Thr Asn 1845 GTT GTG AGC GGA Val Val Ser Gly 1850 CTG AAG AAA TTT GGT Leu Lys Lys Phe Gly 1855 GAT GGT CAT Asp Gly His 1860 5742 5790 ACG TTG GCA AAT GGC ACT GTT GCT GAT Thr Leu Ala Asn Gly Thr Val Ala Asp 1865 TTT GAA Phe Glu 1870 AAG CAT TAT GAC AAT Lys His Tyr Asp Asn 1875 GGC GCG GAT AAT AAT Gly Ala Asp Asn Asn 1890 GCC TAT AAA G Ala Tyr Lys A CCG ACT GTT Pro Thr Val I 1895 ,AC TTG ~sp Leu L880 ACC AAT TTG Thr Asn Leu GAT GAA AAA Asp Glu Lys 1885 5838 5886 GCC GAC M.T ACC Ala Asp Asn Thr GCT GCA Ala Ala 1900 ACC GTG GGC Thr Val Gly GAT TTG CGC GGC Asp Leu Arg Gly 1905 TTG GGC TGG GTC Leu Gly Trp Val 1910 GAA TAC AAC GCG Glu Tyr Asn Ala 1925 ATT TCT GCG GAC AAA ACC ACA Ile Ser Ala Asp Lys Thr Thr 1915 GGC GAA Gly Glu 1920 CCC AAT CAG Pro Asn Gin 5934 5982 CAA GTG CGT AAC GCC Gin Val Arg Asn-Ala 1930 AAT GAA GTG AAA TTC AAG Asn Gu Val Lys Phe Lys 1935

AGC

Ser 1940 GGC AAC GGT ATC AAT GTT TCC GGT AAA ACA TTG AAC GGT ACG Gly Asn Gly Ile Asn Val Ser Gly Lys Thr Leu Asn Gly Thr 1945 1950 CGC GTG Arg Val 1955 6030 ATT ACC TTT GAA TTG GCT AAA Ile Thr Phe Glu Leu Ala Lys 1960 GGC GAA GTG Gly Glu Val 1965 OTT AAA TCG AAT GAA TTT Val Lys Ser Asn Glu Phe 1970 AAC TTG OTT AAA GTT GGC Asn Leu Val Lys Val Gly 1985 6078 6126 ACC OTT AAG AAT GCC Thr Val Lys Asn Ala 1975 GAT GGT TCG GAA ACG Asp Gly Ser Glu Thr 1980 GAT ATG TAT Asp Met Tyr 1990 ATG ACA GGT Met Thr Gly 2005 TAC AGC AAA Tyr Ser Lys GAG OAT ATT Glu Asp Ile 1995 GAC CCG GCA ACC Asp Pro Ala Thr 2000 AGT AAA CCO Ser Lys Pro 6174 AAA ACT GAA AAA TAT Lys Thr Glu Lys Tyr 2010 AAO GTT GAA AAC Lys Val Glu Asn 2015 GGC AAA GTC Gly Lys Val

GTT

Val 2020 6222 TCT OCT AAC GGC Ser Ala Asn Oly AGC AAG Ser Lys 2025 ACC GAA OTT ACC CTA Thr Glu Val Thr Leu 2030 ACC AAC AAA Thr Asn Lys GGT TCC Gly Ser 2035 6270 GGC TAT GTA ACA GGT AAC CAA Gly Tyr Val Thr Gly Asn Gin 2040 GTG GCT OAT Val Ala Asp 2045 OCG ATT GCG Ala Ile Ala AAA TCA GGC Lys Ser Gly 2050 TTT GAG CTT GOT TTG Phe Glu Leu Gly Leu 2055 OCT GAT GCG GCA Ala Asp Ala Ala 2060 GAA OCT GAA Glu Ala Glu AAA 0CC TTT GCA Lys Ala Phe Ala 2065 OAA AGC OCA Glu Ser Ala 2070 AAA GAC AAO Lys Asp Lys CAA TTG TCT AAA OAT Gin Leu Ser Lys Asp 2075 AAA GCG Lys Ala 2080 GAA ACT GTA Glu Thr Val 6318 6366 6414 6462 6510 AAT 0CC Asn Ala 2085 CAC OAT AAA His Asp Lys OTC CGT Val Arg 2090 TTT OCT AAT Phe Ala Asf OGT TTA Oly Leu 2095 AAT ACC AAA Asn Thr Lys

GTG

Val 2100 AGC OCG OCA ACO Ser Ala Ala Thr GTG OAA Val Glu 2105 AGC ACT OAT Ser Thr Asp OCA AAC Ala Asn 2110 GGC OAT AAA Gly Asp Lys GTG ACC Val Thr 2115 ACA ACC TTT Thr Thr Phe GTG AAA ACC Val Lys Thr 2120 OAT GTG GAA TTG CCT Asp Val Glu Leu Pro 2125 AAT AAO ATC OTT AAA Asn Lys Ile Val Lys 2140 TTA ACO CAA ATC TAC Leu Thr Gin Ile Tyr 2130 AAA OCT GAC GOA AAA Lys Ala Asp Oly Lys 2145 6558 AAT ACC GAT OCA AAC GGT Asn Thr Asp Ala Asn Oly 2135 TGG TAT GAA CTO Trp Tyr Glu Leu 2150 CTT GOT AAC GTG Leu Gly Asn Val 2165 AAT OCT OAT GOT ACO Asn Ala Asp Gly-Thr 2155 OAT OCA AAC GOT AAO Asp Ala Asn Oly Lys 2170 GCG AGT AAC AAA GAA GTG ACA Ala Ser Asn Lys lu Val Thr 2160 AAA TT GTG AAA TA ACC

AA

Lys Val Val Lys Val Thr lu 2175 2180 6606 6654 6702 AAT GGT GCG GAT AAG TGG Asn Gly Ala Asp Lys Trp 2185 AAA ACC AAA GGC GAA GTG Lys Thr Lys Gly Giu Val 2200 CAC GTT GTC CGC CTT GAT His Val Val Arg Leu Asp 2215 TAT TAC ACC AAT GCT Tyr Tyr Thr Asn Ala 2190 GAC GGT OCT GCG GAT Asp Gly Ala Ala Asp 2195 6750 AGC AAT GAT AAA Ser Asn Asp Lys 2205 GTT-TCT ACC Val Ser Thr GAT GAA AA.A Asp Giu Lys 2210 6798 6846 CCG AAC AAT Pro Asn Asn 2220 CAA TCG AAC Gin Ser Asn GGC AAA GGC GTG Gly Lys Gly Val 2225 GTC ATT GAC AAT GTG GCT Val Ile Asp Asn Val. Ala 2230 AAT GGC Asn Gly 2235 GAA ATT TCT Giu Ile Ser GCC ACT TCC ACC GAT Ala Thr Ser Thr Asp 2240 GCG ATT Ala Ile 2245 AAC GGA AGT Asn Gly Ser CAG TTG Gin Leu 2250 TAT GCC GTG Tyr Ala Val GCA AAA Ala Lys 2255 GGG GTA ACA Gly Val Thr

AAC

Asn 2260 6894 6942 6990 7038 CTT GCT GGA CAA Leu Ala Giy Gin GTG AAT AAT Val Asn Asn 2265 CTT GAG GGC AAA GTG Leu Giu Gly Lys Val 22'70 AAT AAA GTG GGC Asn Lys Vai Gly 2275 GCT TCA CAG TTA Ala Ser Gin Leu 2290 AAA CGT OCA Lys Arg Ala GAT GCA Asp Ala 2280 GGT ACA GCA Gly Thr Ala AGT GCA TTA GCG Ser Ala Leu Ala 2285 CCA CAM GCC ACT ATG Pro Gin Ala Thr Met 2295 AGT TAT CAA GGT CAA Ser Tyr GIn Gly Gin 2310 CCA GGT AAA TCA ATG GTT Pro Gly Lys Ser Met Val 2300 GCT ATT GCG GGA AGT Ala Ile Ala Gly Ser 2305 7086 7134 AAT GGT TTA OCT ATC GGG GTA TCA AGA ATT TCC Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser 2315 2320 GAT AAT Asp Asfl 2325 GGC AAA GTG Gly Lys Val ATT ATT CGC TTG Ile Ile Arg Leu 2330 TCA GGC ACA ACC AAT AGT Ser Gly Thr Thr Asn Ser 2335 Gin 2340 7182 GGT AAA ACA GGC Gly Lys Thr Gly GTT GCA OCA GGT OTT Val Ala Ala Oly Val 2345 GGT TAC CAG TGG TAAAGTTTOG Gly Tyr Gin Trp 2350 7231 ATTATCTCTC TTAAAAAGCG GCATTTGCCG CTTTTTTTAT GGGTGGCTAT

TATGTATCGT 7291

INFORMATION FOR SEQ ID NO:4:

SEQUENCE CHARACTERISTICS:
LENGTH: 2353 amino acids
TYPE: amino acid
TOPOLOGY: linear
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DE Met Asn Lys Ile Phe As 1 5 Val Val Val Ser Giu Le Thr Val Glu Thr Ala Va Ala Asn Ala Thr Asp GI Ala Pro Val Leu Ser P1 Glu Val Thr Giu Asn SE Val Leu Lys Ala Gly A: 100 Ile Lys Gin Asn Thr A 115 Ser Leu Lys Lys Asp L 130 Leu Ser Phe Gly Ala A 145 1 Asn Gly Leu Lys Leu 165 Glv Leu Asp Ser Thr I 180 Ser Ser Ser Ser Phe 195 Thr Val Lys Asp Val 210 Thr Ala Gly Gly Asn 225 Asn Val Glu Phe lie 245 Thr Ala Lys Glu Asn 260 Thr Ser Val Ile Lys 275 66 SCRIPTION: SEQ ID NO n Val Ile Trp Asn Va 10 u Thr Arg Thr His Th 25 .1 Leu Ala Thr Leu Le 40 .u Asp Glu Glu Leu As 55 ie His Ser Asp Lys GJ 00 ir Asn Trp Gly Ile T 90 la Ile Thr Leu Lys A 105 sp Giu Ser Thr Asn A 120 eu Thr Asp Leu Thr S 135 .sn Gly Asp Lys Val A .50 1 la Lys Thr Gly Asn C 170 .eu Pro Asp Ala Val 185 rhr Pro Asn Asp Val 200 Leu Asn Ala Gly Trp 215 Val Glu Ser Val Asp 230 Thr Gly Asp Lys Asn 250 Gly Lys Thr Thr Glu 265 Glu Lys Asp Gly Lys 280 :4: 1 Met Th .r Lys Ar u Phe Al 4 ip Pro Va LU Gly Th 75 rr Phe As la Gly A la Ser Si er Val A 140 sp le T .55 ;ly Asn V Thr Asn T Glu Lys Asn Ile 220 Leu Vai 235 Thr Leu Val Lys Leu Phe r g a 1 r er la hr 'al rh 20 Ly Se As P1 2 2 Gin Thr Tr Ala Ser Al Thr Vai G1 Val Arg T Gly Giu L Asn Lys G: Asn Leu L 110 Phe Thr T Thr Glu L Ser Asp

I

His Leu 1 175 r Glv Val 190 r Arg Ala s Gly Ala r Ala Tyr ;p Val Val 255 -ie Thr Pro 270 hr Gly Lys p a nn Ir (s ly ys yr ,ys la sn Leu Al a Lys Asn 240 Leu Lys Glu 67 Asn Asn Asp Thr Asn Lys Val Thr Ser Asn Thr Ala Thr Asp Asn Thr 290 295 300 Asp Glu Gly Asn Gly Leu Val Thr Ala Lys Ala Val Ile Asp Ala Val 305 310 315 320 Asn Lys Ala Gly Trp Arg Val Lys Thr Thr Thr Ala Asn Gly Gin Asn 325 330 335 Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Glu Ser 340 345 350 Gly Asp Gly Thr Thr Ala Ser Val Thr Lys Asp Thr Asn Gly Asn Gly 355 360 365 Ile Thr Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Phe Asp 370 375 380 S* Ser Asp Lys Lys Ile Val Ala Asp Thr Thr Ala Leu Thr Val Thr Gly 385 390 395 400 Gly Lys Val Ala Glu Ile Ala Lys Glu Asp Asp Lys Lys Lys Leu Val 405 410 415 Asn Ala Gly Asp Leu Val Thr Ala Leu Gly Asn Leu Ser Trp Lys Ala 420 425 430 Lys Ala Glu Ala Asp Thr Asp Gly Ala Leu Glu Gly Ile Ser Lys Asp 435 440 445 Gin Glu Val Lys Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn 450 455 460 Leu Lys Val Lys Gin Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gin Asp S465 470 475 480 o Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Thr Asn Gly 485 490 495 Gly Asn Asp Ala Lys Thr Val Ile Asn Lys Asp Gly Leu Thr Ile Thr 500 505 510 Pro Ala Gly Asn Gly Gly Thr Thr Gly Thr Asn Thr Ile Ser Val Thr 515 520 525 Lys Asp Gly Ile Lys Ala Gly Asn Lys Ala Ile Thr Asn Val Ala Ser 530 535 540 Gly Leu Arg Ala Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser 545 550 555 560 Ala Thr Asp Leu Asn Arg His Val Glu Asp Ala Tyr Lys Gly Leu Leu 565 570 575 Asn Leu Asn Glu Lys Asn Ala Asn Lys Gin Pro Leu Val Thr Asp Ser 580 585 590 Thr Ala Ala Thr Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser 595

S

Thr 1 6 Glu 1 625 Glu I Asp Asn Val Asp 705 Lys Ala Asn Leu Asn 785 Val Thr Phe Lys ,ys i10 Pal ksn :ys 31n rhr 590 Drg Lys Thr Pro Thr "770 Ile Ser Ala Alz Gl Asn Leu Gly Gly Asn 675 Lys Gly Val Phe Thr 755 Phe Thr Asi Tku Ly 83 I Z1 Gly Th Phe Th Lys Hi 64 Leu Gi 660 Thr As Glv GI Lys Va Ala Ti 7: Val L 740 Asp A Lys A Phe A Thr L 8 Pro L 820 s Glu '1 5 e Ala r r s u yy r Z5

YS

sn la Lys Gly 630 Thr Lys Asn Phe Thr 710 Va1 Thr Glu 615 Ala Ile Asp Val Glu 695 Val Lys Glu 600 Glu Gly Thr Gly Leu 680 Thr Lys Asp Asn Ser Ala Va1 Asp 665 Thr Val Asp Val Leu 745 Asn G Ala 'T 6 Ser V 650 Thr I Val C Lys I Ala Ala 730 Thr Ala Lys Leu i Asn 810 r Ser 5 r Gly u Pro 605 in hr 35 'al :le ;ly hr rhr 715 rhr Thr Leu Val Val I '620 Val Ala Lys Asn Gly 700 Ala Ala Ser Lys Lys -ys rhr G1u Leu Asn 685 Ala Asn Ile Ile Ala 765 Arc Gin Ser Thr Lys 670 Gly Thr Asp Asn Asp 750 Gly Asp kla Lys Lys 655 Val Thr Asp Ala Ser 735 Glu Asp Gl Asp Ser 640 Ala Asp Ala Ala Asp 720 Ala Asp Thr t Lys Gly Gly Lys Lys Asp 760 Asn 775 sp ,eu 05 ,ys 'hr [hr Leu 790 Thr Val Ala Thr Ala Ile Asn Asp Leu Lys Giy Ile Ala 840 Thr Asp Leu Asr Gil Th~ 82! Se:

GI

Glu 795 Thr Thr Ser Ser Val Pro Ala Lys Ala Lys Thr Asp As 845 Gly 780 Thr Gly Gly 830 Val Ala Ala Gly 815 Leu Tyr Lys Lys 800 Thr Asn Leu Ser 855 860 Ser 865 Ser His Ile Va1 Glu Asp Asp Leu Va1 885 Asn 870 Leu Val Arg Asp Ala Ala Gly Thr Trp 890 Lys 875 Asn Lys Ile Asn Gly Ala Ala 880 Asn Gly 895 69 Asn Asn Val Asp Tyr Val Ala Thr Tyr Asp Thr Val Asn Phe Thr Asp 900 905 910 Asp Ser Thr Gly Thr Thr Thr Val Thr Val Thr Gin Lys Ala Asp Gly 915 920 925 Lys Gly Ala Asp Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp 930 935 940 His Asn Gly Lys Leu Phe Thr Gly Lys Asp Leu Lys Asp Ala Asn Asn 945 950 955 960 Gly Ala Thr Val Ser Glu Asp Asp Gly Lys Asp Thr Gly Thr Gly Leu 965 970 975 Val Thr Ala Lys Thr Val Ile Asp Ala Val Asn Lys Ser Gly Trp Arg *980 985 990 Val Thr Gly Glu Gly Ala Thr Ala Glu Thr Gly Ala Thr Ala Val Asn 995 1000 1005 Ala Gly Asn Ala Glu Thr Val Thr Ser Gly Thr Ser Val Asn Phe Lys 1010 1015 1020 Asn Gly Asn Ala Thr Thr Ala Thr Val Ser Lys Asp Asn Gly Asn Ile :1025 1030 1035 1040 Asn Val Lys Tyr Asp Val Asn Val Gly Asp Gly Leu Lys Ile Gly Asp 1045 1050 1055 Asp Lys Lys Ile Val Ala Asp Thr Thr Thr Leu Thr Val Thr Gly Gly 1060 1065 1070 Lys Val Ser Val Pro Ala Gly Ala Asn Ser Val Asn Asn Asn Lys Lys 1075 0801085 Leu Val Asn Ala Glu Gly Leu Ala Thr Ala Leu Asn Asn Leu Ser Trp 1090 1095 1100 Thr Ala Lys Ala Asp Lys Tyr Ala Asp Gly Glu Ser Glu Gly Glu Thr 1105 1110 1115 1120 Asp Gin Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys Ala Gly Lys 1125 1130 1135 Asn Leu Lys Val Lys Gin Ser Glu Lys Asp Phe Thr Tyr Ser Leu Gin 1140 1145 1150 Asp Thr Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Ala Asn 1155 1160 1165 Gly Arg Asn Asp Thr Gly Thr Val Ile Asn Lys Asp Gly Leu Thr Ile 1170 1175 1180 Thr Leu Ala Asn Gly Ala Ala Ala Gly Thr Asp Ala Ser Asn Gly Asn 1185 1190 1195 Thr Ile Ser Val Thr Lys Asp Gly Ile Ser Ala Gly Asn Lys Glu lle 1205 1210 1215 Thr Asn Val Lys Ser Ala Leu Lys Thr Tyr Lys Asp Thr Gin Asn Thr 1220 1225 1230 Ala Asp Glu Thr Gin Asp Lys Glu Phe His Ala 
Ala Val Lys Asn Ala 1235 1240 1245 Asn Glu Val Glu Phe Val Gly Lys Asn Gly Ala Thr Val Ser Ala Lys 1250 1255 1260 Thr Asp Asn Asn Gly Lys His Thr Val Thr Ile Asp Val Ala Glu Ala 1265 1270 1275 1280 Lys Val Gly Asp Gly Leu Glu Lys Asp Thr Asp Gly Lys Ile Lys Leu 1285 1290 1295 Lys Val Asp Asn Thr Asp Gly Asn Asn Leu Leu Thr Val Asp Ala Thr So, 1300 1305 1310 Lys Gly Ala Ser Val Ala Lys Gly Glu Phe Asn Ala Val Thr Thr Asp 1315 1320 1325 Ala Thr Thr Ala Gin Gly Thr Asn Ala Asn Glu Arg Gly Lys Val Val 1330 1335 1340 Val Lys Gly Ser Asn Gly Ala Thr Ala Thr Glu Thr Asp Lys Lys Lys 1345 1350 1355 1360 r al Ala Thr Val Gly Asp Val Ala Lys Ala Ile Asn Asp Ala Ala Thr 1365 1370 Phe Val Lys Val Glu Asn Asp Asp Ser Ala Thr Ile Asp Asp Ser Pro 1380 1385 1390 Thr Asp Asp Gly Ala Asn Asp Ala Leu Lys Ala Gly Asp Thr Leu Thr 1395 1400 1405 Leu Lys Ala Gly Lys Asn Leu Lys Val Lys Arg Asp Gly Lys Asn Ile 1410 1415 1420 Thr Phe Ala Leu Ala Asn Asp Leu Ser Val Lys Ser Ala Thr Val Ser 1425 1430 1435 1440 Asp Lys Leu Ser Leu Gly Thr Asn Gly Asn Lys Val Asn Ile Thr Ser 1445 1450 1455 Asp Thr Lys Gly Leu Asn Phe Ala Lys Asp Ser Lys Thr Gly Asp Asp 1460 1465 1470 Ala Asn Ile His Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu 1475 1480 1485 Leu Asn Ser Gly Ala Thr Thr Asn Leu Gly Gly Asn Gly Ile Thr Asp 1490 1495 1500 71 Asn Glu Lys Lys Arg Ala Ala Ser Val Lys Asp Val Leu Asn Ala Gly 1505 1510 1515 1520 Trp Asn Val Arg Gly Val Lys Pro Ala Ser Ala Asn Asn Gin Val Glu 1525 1530 1535 Asn Ile Asp Phe Val Ala Thr Tyr Asp Thr Val Asp Phe Val Ser Gly 1540 1545 1550 Asp Lys Asp Thr Thr Ser Val Thr Val Glu Ser Lys Asp Asn Gly Lys 1555 1560 1565 Arg Thr Glu Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp His 1570 1575 1580 Asn Gly Lys Leu Phe Thr Gly Lys Glu Leu Lys Asp Ala Asn Asn Asn .1585 1590 1595 1600 Gly Val Thr Val Thr Glu Thr Asp Gly Lys Asp Glu Gly Asn Gly Leu 1605 1610 1615 Val Thr Ala Lys Ala Val Ile Asp Ala Val Asn Lys Ala Gly Trp Arg 1620 1625 1630 Val Lys Thr Thr Gly Ala Asn Gly Gin Asn 
Asp Asp Phe Ala Thr Val 1635 1640 1645 Ala Ser Gly Thr Asn Val Thr Phe Ala Asp Gly Asn Gly Thr Thr Ala 1650 1655 1660 Glu Val Thr Lys Ala Asn Asp Gly Ser Ile Thr Val Lys Tyr Asn Val 1665 1670 1675 1680 .i Lys Val Ala Asp Gly Leu Lys Leu Asp Gly Asp Lys Ile Val Ala Asp 1685 1690 1695 Thr Thr Val Leu Thr Val Ala Asp Gly Lys Val Thr Ala Pro Asn Asn 1700 1705 1710 Gly Asp Gly Lys Lys Phe Val Asp Ala Ser Gly Leu Ala Asp Ala Leu 1715 1720 1725 Asn Lvs Leu Ser Trp Thr Ala Thr Ala Gly Lys Glu Gly Thr Gly Glu 1730 1735 1740 Val Asp Pro Ala Asn Ser Ala Gly Gin Glu Val Lys Ala Gly Asp Lys 1745 1750 1755 1760 Val Thr Phe Lys Ala Gly Asp Asn Leu Lys Ile Lys Gin Ser Gly Lys 1765 1770 1775 Asp Phe Thr Tyr Ser Leu Lys Lys Glu Leu Lys Asp Leu Thr Ser Val 1780 1785 1790 Glu Phe Lys Asp Ala Asn Gly Gly Thr Gly Ser Glu Ser Thr Lys Ile 1795 1800 1805 72 Thr Lys Asp Gly Leu Thr Ilie Thr Pro Ala Asn Gly Ala Gly Ala Ala 1810 1815 1820 Gly Ala Asn Thr Ala Asn Thr Ile Ser Val Thr Lys Asp Gly Ile Ser 1825 1830 1835 1840 Ala Gly Asn Lys Ala Val Thr Asn Val Val Ser Gly Leu Lys Lys Phe 1845 1850 1855 Gly Asp Gly His Thr Leu Ala Asn Gly Thr Val. 
Ala Asp Phe Glu Lys 1860 1865 1870 His TIyr Asp Asn Ala Tyr Lys Asp Leu Thr Asn Leu Asp Glu Lys Gly 1875 1880 1885 Ala Asp Asn Asn Pro Thr Val Ala Asp Asn Thr Ala Ala Thr Val Gly .:1890 1895 1900 Asp Leu Arg Gly Leu Gly Trp, Val Ile Ser Ala Asp Lys Thr Thr Gly 1905 1910 1915 1920 Glu Pro Asn Gin Giu Tyr Asn Ala Gin Val Arg Asn Ala Asri Giu Val 1925 1930 1935 Lys Phe Lys Ser Gly Asn Gly Ile Asn Val Ser Gly Lys Thr Leu Asn 1940 1945 1950 Gly Thr Arg Val Ile Thr Phe Glu Leu Ala Lys Gly Glu Val Val Lys 1955 1960 1965 Ser Asn Glu Phe Thr Val Lys Asn Ala Asp Giy Ser Giu Thr Asn Leu 197 0 1975 1980 Val Lys Vai Gly Asp Met Tyr Tyr Ser Lys Giu Asp Ile Asp Pro Ala :1985 1990 1995 2000 Thr Ser Lvs Pro Met Thr Gly Lys Thr Glu Lys Tyr Lys Val Glu Asn 2005 2010 2015 Giy Lys Val Val Ser Ala Asn Gly Ser Lys Thr Glu Val Thr Leu Thr 2020 2025 2030 Asn Lys Gly Ser Gly Tyr Val Thr Gly Asn Gin Val Ala Asp Ala Ile 2035 2040 2045 Ala Lys Ser Gly Phe Glu Leu Gly Leu Ala Asp Ala Ala Glu Ala Giu 2050 2055 2060 Lys Ala Phe Ala Giu Ser Ala Lys Asp Lys Gin Leu Ser Lys Asp Lys 2065 2070 2075 2080 Ala Glu Thr Val Asn Ala His Asp Lys Val Arg Phe Ala Asn Gly Leu 2085 2090 2095 Asn Thr Lys Val Ser Ala Ala Thr Val Glu Ser Thr Asp Ala Asn Gly 2100 2105 2110 73 Thr Asp Lys Val Thr Thr Thr Phe Val Lys 2115 2120 Asp Val Giu Leu Pro Leu 2125 Thr Gin Ile Tyr Asn Thr Asp Ala Asfl Gly Asn Lys 2130 2135 2140 Ala Asp Gly Lys Trp Tyr Glu Leu Asn Ala Asp Gly 2145 2150 2155 Lys Glu Val Thr Leu Gly Asn Val Asp Ala Asn Gly 2165 2170 Lys Val Thr Glu Asn Gly Ala Asp Lys Trp Tyr Tyr 2180 2185 Gly Ala Ala Asp Lys Thr Lys Gly Glu Val Ser Asn 2195 2200 Thr Asp Glu Lys His Val Val Arg Leu Asp Pro Asn 2210 2215 2220 Gly Lys Gly Val Val Ile Asp Asn Val Ala Asn Gly 2225 2230 2235 Thr Ser Thr Asp Ala Ile Asn Gly Ser Gin Leu Tyr 2245 2250 Gly Val Thr Asn Leu Ala Gly Gin Val Asn Asn Leu 2260 2265 Asn Lys Val Gly Lys Arg Ala Asp Ala Gly Thr Ala 2275 2280 Ala Ser Gin Leu Pro Gin Ala Thr Met Pro Gly Lys 2290 2295 230 Ile Ala Gly 
Ser Ser Tyr Gin Gly Gin Asn Gly Leu 2305 2310 2315 Ser Arg Ile Ser Asp Asn Gly Lys Val Ilie Ile Arg Ile Val Lys Lys Thr Ala Ser Asn 2160 Lys Lys Val Val 2175 Thr Asn Ala Asp 2190 Asp Lys Val Ser 2205 Asn Gin Ser Asn Glu Ile Ser Ala 2240 Ala Val Ala Lys 2255 Glu Gly Lys Val 2270 Ser Ala Leu Ala 2285 Ser Met Val Ala 0 Ile Ser Gly Val 2320 Gly Thr 2335 2325 2330 Thr Trp Asn Ser Gin Gly Lys Thr Gly Val Ala Ala Gly Val 2340 2345 Gly Tyr Gin 2350 INFORNATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 658 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) (xi) Met Val Thr Ala Ala Val Ser LeL Gl1 Ly 14 11 Se Tt

I

2 MOLECULE TYPE: protein SEQUENCE DESCRIPTION: SEQ ID As Lys Ile Phe Asn Val Ile Trp Asn Val 5 10 Val Val Ser Glu Leu Thr Arg Thr His Thl 25 Val Ala Val Ala Val Leu Ala Thr Leu Lel 40 Asn Asn Asn Thr Pro Val Thr Asn Lys Le 55 Asn Phe Asn Phe Thr Asn Asn Ser Ile Al 70 75 Gin Glu Ala Tyr Lys Gly Leu Leu Asn Le 90 Asp Lys Leu Leu Val Glu Asp Asn Thr A] 100 105 Arg Lys Leu Gly Trp Val Leu Ser Ser L~ 115 120 a Lys Ser Gin Gln Val Lys His Ala Asp G 130 135 s Gly Gly Val Gln Val Thr Ser Thr Ser G 5 150 1 Thr Phe Ala Leu Ala Lys Asp Leu Gly 165 170 r Asp Thr Leu Thr Ile Gly Gly Gly Ala I 180 185 r Pro Lys Val Asn Val Thr Ser Thr Thr 195 200 (s Asp Ala Ala Gly Ala Asn Gly Asp Thr 210 215 le Gly Ser Thr Leu Thr Asp Thr Leu Val 25 230 le Asp Gly Gly Asp Gin Ser Thr His Tyr 245 250 ys Asp Val Leu Asn Ala Gly Trp Asn Ile 260 265 Val Thr Glr r Lys Cys Al u Ser Ala Th u Lys Ala Ty a Asp Ala G1 u Asn Giu L) a Ala Thr V 1: (S Asn Gly TI 125 iu Val Leu P 140 iu Asn Gly L 55 ral Lys Thr ia Ala Gly Rsp Gly Leu 205 Thr Val His 220 Gly Ser Pro 235 Thr Arg Ala Lys Gly Val 1 a r r

'S

LC

11 L1 hI he yy 11 ki 19 Le

L

A

P

Thr Trp Ser Ala Val Glu Gly Asp Lys Gin Asn Ala Gly Asn Arg Asn e Glu Gly s His Thr 160 a Thr Val 175 .a Thr Thr 00 's Phe Ala eu Asn Gly la Thr His 240 la Ser Ile 255 ,ys Ala Gly 270

S

Ser Thr Asp 305 Thr Asn Asp Asn Gly 385 Gly Val Lv! Al Ly 46 Se As

S

G

T

Thr Th 27 Val G1 290 Ser Ly Ser Va Lys G1 Glu Gl 35 Lys T1 370 Asp P) Asn G; Lys T s lie A 4 a Asn P 450 s Lys 1 5 r Trp ;n Ala La Gly er Leu 530 iy Asn 45 hr Pro r Gi 5 u Ph s G1 1 11 ,u T1 34 y L 55 ir G 'ie A ly T yr A 4 la 35 rsn I .eu rhr Ser Lys 515 Gin Asn Ala y ee uu ee rr I0 ly la hr ,20 ?re Va Th Gl As

A

G

A

Gin Sei Leu Sei Asn GI' 31 Lys G1' 325 As Ly Gly Le Trp Ar Thr Vc 3 Thr A 405 Ala L i Asp T o Lys G 1 Thr 4 .r Thr 2 485 .u Gin 40 n Leu sp Ala ly Ala sn Gly 565 r

Y

0 u

S

u 00 La hi G1' Li

L

5

A

Glu Asr 28( Ala Asi 295 Lys Ar Lys As Val As Val Tb 36 lie Ly 375 Ala SE Thr V Val G r Thr A 4 y Lys V 455 a Lys C 0 a Ala .u Val es Val eu Thr 535 ys Thr 50 ,la Gly Va 2 Th: g Th p Gl p Gi 34 r Al 0 s5 Tk ~r G al TI ly A 4 la L 40 'al ;ly I ;lu Lys Lys 520 Gly Glu Ala

L

r y y

Y

5 ay rr Ly hI S1 ee Al;

A]

5(

G:

L

Asp Phe Val Glu Thr Th~ 30 Glu Val Ly 315 Lys Leu Ph 330 Ala Asn Al Lys Asp Va Thr Asp Al 38 Thr Asn Vi 395 Asn Giy TI 410 G Gly Leu L 1 Thr Val A a Asp Val A 4 u Val Thr 1 475 a Asp Gly 490 a Gly Asp )5 Ln Glu Gly eu Thr Ser le Asn Lys 555 Lsn Asn Ala 570 His 285 r Thr 3 s Ile e Thr a Thr .1 lie 365 .a Asn i Thr hr Asr ys Le sn As 44 la Se ~la Le Gly T1 Lys V Ala A, 5 Ile T 540 Asp G Asn 7 Thr Tyr Asp Val Thr Val Gly Ala LyE 32C Gly Lys Al; 335 Glu Asp Al 350 Asp Ala Va Gly Gln As Phe Ala Se 40 Gly Ile Tk 415 i Asp Gly A 430 p Gly Lys A r Thr Asp G .u Asn Ser L 4 ir Leu Asp C 495 3, Thr Phe 510 sn Phe Thr hr Leu Gly ;ly Leu Thr rhr Ile Ser 575 a 1 n rr rr sn lu *eu ;ly ,ys Tyr Thr lie 560 Val 0* 76 Thr Lys Asp Gly Ile Ser Ala Gly Gly Gin Ser Va 580 585 Ser Gly Leu Lys Lys Phe Gly Asp Ala Asn Phe As 595 600 Ser Ala Asp Asn Leu Thr Lys Gin Asn Asp Asp Al 610 615 62 Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Gin Th 625 630 635 Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg G] 645 650 Ile Ser INFORMATION FOR SEQ ID NO:6: SEQUENCE

CHARACTERISTICS:

LENGTH: 607 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val r 1 5 10 Val Val Val Ser Glu Leu Thr Arg Thr His Thr I 25 Arg Gly Asp Pro Val Leu Ala Thr Leu Leu Phe 40 Asn Ala Thr Asp Glu Asp Glu Glu Leu Asp Pro 55 Pro Val Leu Ser Phe His Ser Asp Lys Glu Gly.

70 Val Thr Glu Asn Ser Asn Trp Gly Ile Tyr Phe 90 Leu Lys Ala Gly Ala Ile Thr Leu Lys Ala Gly 100 105 Lys Gin Xaa Thr Asp Glu Xaa Thr Asn Ala Ser 115 120 Leu Lys Lys Asp Leu Thr Asp Leu Thr Ser Val 130 135 1 p a .0 ir ly Lys Asn 590 Pro Leu 605 Tyr Lys Pro Val Leu Gly Val Thr Gly Val Trp 655 Val Ser Leu Ala 640 Val let .ys Ala Val Thr As As] Se Al 14 Thr Gin T 1 Arg Leu Thr Val Val Arg Gly Glu p Asn Lys p Asn Leu 110 r Phe Thr 125 .a Thr Glu .0 hr T irg Gln rhr Lys Gly Lys Tyr Lys rp Asn Ala Ala Glu Val Xaa Ser Leu a a Ser Ph 145 Gly Le Leu As Ser Se Val L) 21 Ala G 225 Val G! Ala L Ser V Asn A 2 Glu G 305 Lys Asp I Asp Thr Asp 385 Lys Ala Ala e Ly lu

YS

al sp 90

IN,

lx Va 37 Va

G]

G:

Gly Ala Asi Lys Leu Al 16 Ser Thr Le 180 Ser Phe Th 195 Asp Val Le Gly Asn Va Phe Ile T1 24 Glu As X 260 Ile Lys G: 275 Thr Asn L Asn Gly L Gly Trp P 3 e Ala Thr 340 y Thr Thr 355 1 Lys Tyr 0 's Lys Ile Ii Ala Glu Ly Asp Leu 420 lu Ala Asp 435 n a 5

U

r

U

ii rr L5 ia 11.

re 12 Ia

V

I

4

V

I1 Gly As] 150 Lys Th Pro As Pro As Asn Al 21 Glu Se 230 Gly A! Lys Ti Lys A s Val T 2 u Val T 310 g Val 1 5 .1 Ala .a Ser sp Ala al Ala 390 le Ala 05 'al Thr 'hr Asp p Lys r G1l p Al n As 20 a Gi .5 ,r Va .p Ly hr T1 sp G1 21 hr S 95 hr A .ys T 3er G Val1 Lys 375 Asp Lys Ala Gly a

F

0 e

I

3

I

T

77 Val Asp Ile 15! Asn Gly As! 170 Val Thr Asl 185 Val Glu Ly Trp Asf Il Asp Leu Va 23 Asn Thr Le 250 r Glu Val L) 265 y Lys Leu P1 0 r Asn Thr A .a Lys Ala V 3 ir Thr Thr A 330 Ly Thr Asn X 345 hr Lys Asp 60 al. Gly Asp I Ihr Thr Ala ;lu Asp Asp 410 Ueu Gly Asn 425 Ala Leu Glu 440 Thi i Va n Th s Th e Ly 22 .1 Se 5 u As 's P1 ae T la T 3 al I 15 la A Tal T rhr I 1ly Leu 395 Lys Leu Gly 1 r r

S

0 r

OC

le Ir hi 0( 1( hh Le 38

S

I

Ser Asp Ala His Leu Asl 17 Gly Val Le' 190 Arg Ala Al 205 Gly Ala Ly Ala Tyr As Val Val Le Thr Pro L 270 Gly Lys G: 28.5 Asp Asn T 4 Asp Ala V n Gly Gin A 3 r Phe Glu S 350 n Gly Asn 365 ~u Lys Phe ar Val Thr ys Lys Leu er Trp Lys 430 le Ser Lys 445

S

U

u a

G

V

4 5 (s 11L hi a

SS

33 ee 31

G:

V

4

P

Asn 160 Gly Ser Thr Thr Asn 240 Thr Thr i Asn r Asp 1 Asn 320 n Gly r Glv .y Ile 3p Ser ly Gly 400 al Asn la Ls ksp Gin 8 Glu Val Lys Ala Gly Giu Thr Val Thr Phe Lys Ala Gly Lys Asn Leu 450 455 460 Lys Val Lys Gin Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gin Asp Ala 465 470 475 480 Leu Thr Gly Leu Thr Ser Ilie Thr Leu Gly Gly Thr Thr Asn Gly Gly 485 490 495 Asn Asp Ala Lys Thr Val Ilie Asn Lys Asp Gly Leu Thr Ile Thr Pro 500 505 510 Ala Gly Asn Gly Gly Thr Thr Gly Thr Asn. Thr Ile Ser Val Thr Lys 515 520 525 Asp Gly Ile Lys Ala Gly Asn Lys Ala Ilie Thr Asn Val Ala Ser Gly :530 535 540 Leu Arg Ala Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser Ala 545 550 555 560 *Thr Asp Leu Asn Arg His Val Giu Asp Ala Tyr Lys Gly Leu Leu Asfl 565 570 575 *Leu Asn Giu Lys Asn Ala Asn Lys Gin Pro Leu Val Thr Asp Ser Thr 580 585 590 Ala Ala Thr Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser 595 600 605 INFORMATION FOR SEQ ID NO:7: SEQUENCE CHARACTERISTICS: LENGTH: 24 amino acids TYPE: amino acid STR.ANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID NO:?: Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp 1 5 10 Val Val Val Ser Glu Leu Thr Arg INFORMATION FOR SEQ ID NO:8: Ci) SEQUENCE

CHARACTERISTICS:

LENGTH: 24 amino acids
TYPE: amino acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1               5                   10
Val Val Val Ser Glu Leu Thr Arg

INFORMATION FOR SEQ ID NO:9:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 amino acids
TYPE: amino acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:
Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1               5                   10
Val Ala Val Ser Glu Leu Ala Arg

INFORMATION FOR SEQ ID NO:10:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 amino acids
TYPE: amino acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:
Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1               5                   10
Val Ala Val Ser Glu Leu Ala Arg

INFORMATION FOR SEQ ID NO:11:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 amino acids
TYPE: amino acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:
Met Asn Lys Ala Tyr Ser Ile Ile Trp Ser His Ser Arg Gln Ala Trp
1               5                   10
Ile Val Ala Ser Glu Leu Ala Arg

INFORMATION FOR SEQ ID NO:12:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 amino acids
TYPE: amino acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:
Met Asn Arg Ile Tyr Ser Leu Arg Tyr Ser Ala Val Ala Arg Gly Phe
1               5                   10
Ile Ala Val Ser Glu Phe Ala Arg

INFORMATION FOR SEQ ID NO:13:
SEQUENCE CHARACTERISTICS:
LENGTH: 24 amino acids
TYPE: amino acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:
Met Asn Lys Ile Tyr Tyr Leu Lys Tyr Cys His Ile Thr Lys Ser Leu
1               5                   10
Ile Ala Val Ser Glu Leu Ala Arg

INFORMATION FOR SEQ ID NO:14:
SEQUENCE CHARACTERISTICS:
LENGTH: 2037 base pairs
TYPE: nucleic acid
STRANDEDNESS: unknown
TOPOLOGY: unknown
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:
ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT

GAACTCACTC

ACCCTGTTGT

GAGTTAGAAC

ACTGGAGAAiC

GCAGTAGGAL

GGCA.ATGACT

GAAAATTAT

TTGAAP.T'rGG

ATTGCTTCGA

ATTGATGCGG

AATATCCAAG

GCACCCACAC

CCGCAACGGT

CCGTACAACG

AAGAGGGAAC

GCAGCACAAT

TCACCTACTC

CGTTTGGCGC

CGAAALACAGG

CTTTGACCGA

TTAATTATCA

GCAATGGAA


*5*S :AAATGCGCC T rCAGGCGAAT G

:TCTGTTTTA

kIACAGAGGTA :ACCTTCAAA C OCTGAAAAAA C AAACGGCAAT3

TAACGGAAAT

rACGCTTGCC

TCGCGCTGCA

CAATGTCGAT

TGTGAGCGTT

CTTGCCGGTT

TTACAAAGCC

GGCGAAAACC

TGTTGCAGAC

AGACAAACAG

CGGCGGCAAG

TGGCGAGTTG

TTCGGTACAG

AGGTTTGGTT

CCGCCACCG T ;CTACCGATG LGGTGGAGCT TI LTAAATTTGA A ;CCGGCGACA A

;AGCTGAAAA;

.A.AGTTGATA I 3GTCAAAACA C 3GTGGCACAA kGCGTACAAG

TTTGTCCGTA

ACGGCTGATA

CAATATGTTA

AAAGATGACG

AAAGTGAAAT

GGCACGGAAG

GTTACGTTGA

GCAACTCAAA

TTGAAAATTA

GTTGGCGATG

GAGGCTTCTG

GTCGGCAGCG

GTCAATGGCG CGAATGCCA.A *SeS


GTCCGTGTGG

GTGAAAGTGG

AAAGTCGAA

AA7CCGGTGA

A.AGCAATTAA

AATGGCGGTA

TY!TAAATTTA

ACTTTTACGC

AAAGGTGCAA

ATGTAACAGG

GCAATGAGTA

TGGCGAGCT

AA.ATTAGCAA

AAGCCTTGCA

CAGATA6ACGA

AATCTAGCGA

CGAAAAAAGG

ATACAACTGA

GGCAGTTGC C( AA.ACGAAGA T~ CAA6ATCCGC T.

~CACAGATTC A: LCCTGAAAALT C CCTGACCAG T ~TACCAGTGA T ;TA.ATGTTCA C :AGGACACGT Tr kTGTGTTAA.A C CTTACGACAC C

CGGCTCACAA

CGGAAGACGG C GTTCGGCGGA I

TGGTATCGGC

ACACCGATGC

GCACGAGCAA

CTTTAAGCAA

GCGCGACCGG

ATGGCAAGGC

AA.TTGGTTGA

GCGAGCTTGA

AAGCCGGCGA

ATGAATTGAC

CAAGCACGAA

GTATTGGCA

GATGAAGAA

AAGGAAGGC

TCAGGAAAT

AAACAAAGC

GTTGAAACT

GCAA6ATGGC

TTAAACGGT

GACACCAAC

AGCGGTTGG

:GTGGACTTT

LZAAGACAACT

AAAACCGTT

~ATGAATCAA

kAGCGGTACA

GGTCAGCTTT

TGCTTATGCC

TGGTTTGAAT

CGATACGGTT

TTCAATTTCA

AAGCC-TGA-AC

TGGTACATCC

CAATCTGAAG

GGGCGTGAAG

GATTACCAAA

1.20 180 240 300 360 420 480 540 600 660 720 780 840 900 960 1020 1080 1140 1200 1260 1320 1380 1440 1500 AAACTGGGTT GGAAAGTAGG GGTTGAGAAA

AAGGAAACTT

GTCAAACAAG

AGCGTGGAGT

TAGTGAAGTC GGGCGATAAA GTAACTTTGA AGGGCACAAA CTTCACTTAC GCGCTCAAAG TTAAAGACAC GGCGAATGGT..GCAAACGGTG 82 GACGGCTTGA CCATTACGCT GGCAAACGGT GCGAATGGTG CC AAGATTAAAG TTGCTTCGGA CGGCATTAGC GCGGGTAATA A-I GCAGGCGAAA TTTCTGCCAC TTCCACCGAT GCGATTAACG GJ GCAAAAGGGG TAACAAACCT TGCTGGACAA GTGAATAATC T GTGGGCAAAC GTGCAGATGC AGGTACTGCA AGTGCATTAG C( GCCACTATGC CAGGTAAATC AATGGTTTCT ATTGCGGGAA G GGTTTAGCTA TCGGGGTATC AAGAPLTTTCC GATAATGGCA GGCACAACCA ATAGTCA-AGG TAAAACAGGIC GTTGCAGCAG G' INFORMATION FOR SEQ ID SEQUENCE CHARACTERISTICS: LENGTH: 679 amino acids TYPE: amino acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: protein (xi) SEQUENCE DESCRIPTION: SEQ ID Met Asn Lys Ile Phe Asn Val Ilie Trp, Asn 1 5 10 ValiVal Val Ser Giu Leu Thr Arg Thr His 25 Thr Val Ala Val Ala Val Leu Ala Thr Leu 35 40 Ala Asn Ala Thr Asp Giu Asn Giu Asp Asp

;ACGGTGAC

~GCAGTTAA

%AGCCAGTT

rGAGGGCAA

GGCTTCACA

rAGTTATCA

PLGTGATTAT

TGTTGGTTA

TGATGCCGAC

AAACGTCGCG

GTATGCCGTG

AGTGAATAAA

GTTACCACAA

AGGTCAAAAT

TCGCTTGTCT

CCAGTGG

1620 1680 1740 1800 1860 1920 1980 2037 Val Val. Thr Gin Thr Trp is Thr Lys Cys Leu Ser Ala Glu Giu Giu Thr Val Gin Leu Giu Pro Val.

Thr so Gin Gly 55 Arg Ser Val Leux Arg Trp 70 Giu Gin Giu Gly Thr Thr Gly Asn Ala Val Gly Ser 100 Lys Ser Phe Glu Val 90 Ser Thr 105 75 Ile Ser Ala Lys Giu Gly Asn Leu Asn Thr Asp Ala Gly Ser Ser Ile Thr Phe Lys 110 Asp Asn Leu Lys Ile Lys Gin Ser Gly Asn Asp Phe Thr Tyr Ser Leu 115 120 125 Lys Lys Glu Leu Lys Asn Leu Thr Ser Val Giu Thr Glu Lys Leu Ser 130 135 140 83 Phe Gly Ala Asn Gly Asn Lys Val Asp Ile Thr Ser Asp Ala Asn Gly 145 150 155 160 a a a a a Leu His Thr Ala Asn 225 Val Lys Va1 Lys Gly 305 Asn AI a Leu Lys Leu rhr Ala 210 Gly As Lys Thr Ala 290 Glu Pro Val Sex Leu Asn Gly 195 Ser Asn Gly Thr Glu 275 Lys Leu Val Ser Thr Ala Gly 180 His Val Asn Ala Thr 260 Asp Asp Ala Lys Phe 340 Ser Lys 165 Ile 2 Val Gin Val Asn 245 Val Gly Asp Lys Ile 325 Lys Asn rhr la ksp Asp Asp 230 Ala Arg Lys Gly Thr 310 Ser Gin Ali Gly A Ser T Thr 2 Val 1 215 Phe N Asn Val Thr' Ser 295 Lys Asn Leu Tyr Leu 375 u Leu 0 s Gly sn hr Lsn !00 eu Tal lal ksp Val 280 Ala Val Val Lys Ali 36( Se: Ly Se Gly A 1 Leu T 185 Ile A Asn S Arg T Ser Val 1 265 Val Asp Lys Ala Ala 345 Asn r Asn s Ile r Val sn 70 hr ,sp er 'hr Pal !50 rhr Lys Mlet Leu Asp 330 Leu Gl Gi Se:

GI:

Gly C Asp 'I Ala Gly Tyr 235 Thr Gly Val Asn Val 315 Gly Gin Gly Leu r Ala 395 n Val ;In Ehr lal rrp 220 ksp Ala Leu Gly Gin 300 Ser Thr Asp Thi Asi 38 Th

G'

Asn S Leu A 1 Asn 'T 205 Asn I Thr N Asp IJ Pro I Asn 285 Lys Ala Glu Lys Asp 365 1 Phe 0 r Gly y Asp er la yr :le Pal hr Ial 270 3lu lal Ser Asp Gin 350 Asn Lys As As~ Asn V 175 Gly G His Gin Asp I Ala I 255 Gin Tyr Glu Gly Thr 335 Val Asp Phe Thr Gly !al ;ly ~rg ;ly ?he iis ryr Tyr Asn Thr 320 Asp Thr Gly Lys Val 400 Lys 355 Gly Ser 385 Thr Lys 370 Ser Phe Ala Asp Thr Thr Gly Pro Gin Glu Lys Thi Lei 39' Ly 405 410 415 Ala Ser Ser Glu Ile Leu 435 Ser 420 Val Lys Glu Gly Ser Ala Leu Asn Asn 440 Thr Thr 425 Lys Leu Glu Gly Gly Trp Leu Lys 445 Va1 430 Val Glu Gly Ala Val 84 Asp Gly S. S

S

Giu Lys 450 Val Lys 465 Val Lys Thr Gly Gly Ala Asn Gly 530 Ala Ser 545 Ala Gly Leu Tyr Asn Leu Thr Ala 610 Gly Lys 625 Gly Leu Ile Arg Ala Gly Val Ser Gin Val Ser 515 Ala Asp Glu Ala Giu 595 Ser Ser Ala Leu Val1 Gly Gly Glu Lys 500 Thr Asn Gly Ile Val1 580 Gly Al a Met Ile Ser 660 Gly Ser Asp Gly 485 Ser Lys Gly I le Ser 565 Ala Lys Leu Val1 Gly 645 Gly Tyr Gly Lys 470 Thr Val Ile Ala Ser 550 Al a Lys Val1 Al a Ser 630 Val1 Thr Gin Giu 455 Val Asn Glu Thr Thr 535 Ala Thr Gly Asn Ala 615 Ile Ser Thr Trp Leu Thr Phe Phe Lys 520 Val Gly Ser Val Lys 600 Ser Ala Arg Asn Leu Thr Lys 505 Asp Thr Asn Thr Thr 585 Val Gin Gly Ile Ser 665 Lys Ala 475 Tyr Ala 490 Asp Thr Gly Leu Asp Ala Lys Ala 555 Asp) Ala 570 Asn Leu Gly Lys Leu Pro Ser Ser 635 Ser Asp 650 Gin Gly Gly Leu Ala Thr Asp 540 Val1 Ile Ala Arg Gin 620 Tyr Asn Lys Asp Asn Lys Asp Asn Gly 510 Ile Thr 525 Lys Ile Lys Asn Asn Gly Gly Gin 590 Ala Asp 605 Ala Thr Gin Gly Gly Lys Thr Gly 670 Leu Lys 480 Glu Leu 495 Ala Asn Leu Ala Lys Val Val Ala 560 Ser Gin 575 Val Asn Ala Gly Met Pro Gin Asn 640 Val Ile 655 Val Ala Thr Ser Lys Giu Thr Leu 460 675 INFORMATION FOR SEQ ID NO:16: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs CB) TYPE: nucleic acid STRAMDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genornic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: CCGTGCTTGC CCAACACGCT T 21 INFORMATION FOR SEQ ID NO:17: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: GCTGCCACCT TGCACAACAA C 21 INFORMATION FOR SEQ ID NO:18: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA (genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: CTTCAATGC CAGAAAGTAG G 21 INFORMATION FOR SEQ ID NO:19: SEQUENCE CHARACTERISTICS: LENGTH: 21 base pairs TYPE: nucleic acid STRANDEDNESS: unknown TOPOLOGY: unknown (ii) MOLECULE TYPE: DNA 
(genomic) (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: CTTCAACCGT TGCGGACAAC A 21

Claims

1. A method of treating a patient in need thereof comprising: administering to said patient an immunogenic composition comprising a pharmaceutically acceptable carrier and a recombinant Haemophilus adhesion protein having greater than 50% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

2. The method according to claim 1, wherein said recombinant Haemophilus adhesion protein has a sequence having greater than 60% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

3. The method according to claim 2 wherein the recombinant Haemophilus adhesion protein has the sequence shown in Figure 3 (SEQ ID NO:4).

4. The method according to claim 2 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 15 (SEQ ID NO:15).

5. The method according to claim 2 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 2 (SEQ ID NO:2).

6. The method according to claim 1, wherein said immunogenic composition is administered prophylactically such that subsequent Haemophilus infection is prevented.

7. The method according to claim 1, wherein said immunogenic composition is administered therapeutically to a patient previously exposed or infected by Haemophilus.

8. The method according to claim 1, wherein said immunogenic composition is administered as a single dose.

9. The method according to claim 1, wherein said immunogenic composition is administered in several doses over a period of time.

10. A method of preventing Haemophilus infection comprising: administering to said patient a therapeutically effective amount of a composition comprising a pharmaceutically acceptable carrier and a recombinant Haemophilus adhesion protein having greater than 50% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

11. The method according to claim 10, wherein said recombinant Haemophilus adhesion protein has a sequence having greater than 60% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

12. The method according to claim 11 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 3 (SEQ ID NO:4).

13. The method according to claim 11 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 15 (SEQ ID NO:15).

14. The method according to claim 11 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 2 (SEQ ID NO:2).

15. The method according to claim 10, wherein said immunogenic composition is administered as a single dose.

16. The method according to claim 10, wherein said immunogenic composition is administered in several doses over a period of time.

17. Use of a recombinant Haemophilus adhesion protein having greater than 50% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15) in the manufacture of an immunogenic composition for prophylactic or therapeutic use in generating an immune response.

18. Use according to claim 17, wherein said recombinant Haemophilus adhesion protein has a sequence having greater than 60% homology to the sequence shown in Figure 2 (SEQ ID NO:2), Figure 3 (SEQ ID NO:4) or Figure 15 (SEQ ID NO:15).

19. Use according to claim 18 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 3 (SEQ ID NO:4).

20. Use according to claim 18 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 15 (SEQ ID NO:15).

21. Use according to claim 18 wherein said recombinant Haemophilus adhesion protein has the sequence shown in Figure 2 (SEQ ID NO:2).

22. Use according to claim 17, wherein said immunogenic composition is adapted to be administered prophylactically such that subsequent Haemophilus infection is prevented.

23. Use according to claim 17, wherein said immunogenic composition is adapted to be administered therapeutically to a patient previously exposed or infected by Haemophilus.

24. Use according to claim 17, wherein said immunogenic composition is adapted to be administered as a single dose.

25. Use according to claim 17, wherein said immunogenic composition is adapted to be administered in several doses over a period of time.

Dated this 12th day of July 2000.

Washington University AND St. Louis University

By their Patent Attorneys, Davies Collison Cave
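
The claims above set percentage thresholds ("greater than 50% homology", "greater than 60% homology") against listed sequences. As an illustration only, the following minimal sketch computes a simple ungapped percent identity between the two 24-residue sequences given in the sequence listing as SEQ ID NO:7 and SEQ ID NO:8. It assumes equal-length sequences and position-by-position comparison; it is not the homology determination used in the specification, and the function names `percent_identity` and `differing_positions` are hypothetical helpers introduced here.

```python
# Residue strings copied from the sequence listing (three-letter codes).
SEQ_ID_7 = ("Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp "
            "Val Val Val Ser Glu Leu Thr Arg")
SEQ_ID_8 = ("Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp "
            "Val Val Val Ser Glu Leu Thr Arg")


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length residue strings.

    Assumption: no alignment is performed; residues are compared
    position by position, so both inputs must have the same length.
    """
    ra, rb = a.split(), b.split()
    if not ra or len(ra) != len(rb):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(ra, rb))
    return 100.0 * matches / len(ra)


def differing_positions(a: str, b: str) -> list:
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    pairs = zip(a.split(), b.split())
    return [(i, x, y) for i, (x, y) in enumerate(pairs, start=1) if x != y]


if __name__ == "__main__":
    print(f"identity: {percent_identity(SEQ_ID_7, SEQ_ID_8):.1f}%")
    print("mismatches:", differing_positions(SEQ_ID_7, SEQ_ID_8))
```

For sequences of unequal length, such as the full-length proteins of SEQ ID NO:2, SEQ ID NO:4 and SEQ ID NO:15, a gapped alignment would be needed before any percentage could be computed; this sketch deliberately does not attempt that.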