WO2004053098A2

WO2004053098A2 - Base, a new cancer gene, and uses thereof

Info

Publication number: WO2004053098A2
Application number: PCT/US2003/039476
Authority: WO
Inventors: Ira H. Pastan; Kristi A. Egland; James J. Vincent; Byungkook Lee; Robert Strausberg
Original assignee: US Department of Health and Human Services
Current assignee: US Department of Health and Human Services
Priority date: 2002-12-10
Filing date: 2003-12-10
Publication date: 2004-06-24
Anticipated expiration: 2005-06-10
Also published as: AU2003296508A1; WO2004053098A3; AU2003296508A8

Abstract

The invention relates to the discovery of a new gene, termed 'BASE,' which is expressed in some 25% of breast cancers and in salivary glands. BASE is expressed in two alternatively spliced forms: a 19.5 kD, 179 amino acid secreted protein called 'base1,' and an 8.4 kD, 79 amino acid non-secreted protein called 'base2.' The invention provides antibodies to base1 and to base2. Antibodies to the proteins can be used to detect the presence of base1 or base2 in a sample, thereby detecting the presence of a BASE-expressing breast cancer. Antibodies to base2 attached to a therapeutic agent can direct the agent to base2-expressing cells. Base1 and base2, immunogenic fragments of the proteins, and analogs of the proteins can be used to raise immune responses to BASE-expressing cancer cells. The invention further provides uses of the proteins in the manufacture of medicaments, and methods of using antibodies to the proteins, attached to therapeutic molecules, to inhibit the growth of cancer cells expressing BASE.

Description

PATENT APPLICATION

BASE, A NEW CANCER GENE, AND USES THEREOF

CROSS-REFERENCES TO RELATED APPLICATIONS

[0001] This application claims priority from U.S. Provisional Patent Application No. 60/432,531, filed December 10, 2002. The contents of this application are incorporated herein by reference.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH AND DEVELOPMENT

[0002] NOT APPLICABLE

REFERENCE TO A "SEQUENCE LISTING," A TABLE, OR A COMPUTER PROGRAM LISTING APPENDIX SUBMITTED ON A COMPACT DISK.

[0003] NOT APPLICABLE

BACKGROUND OF THE INVENTION

[0004] Breast cancer is a serious problem worldwide. In the United States, for example, one in eight women will develop breast cancer during her lifetime. Recently, tumor-specific immunotherapies such as the anti-ErbB2 monoclonal antibody Herceptin® (trastuzumab, which targets human epidermal growth factor receptor 2) have shown clinical efficacy in the treatment of metastatic breast cancers that overexpress ErbB2 (Bange et al., Nat. Med., 7, 548-552 (2001); Shak, S., Semin. Oncol., 26, 71-77 (1999)). Because only 25 to 30% of human breast cancers overexpress ErbB2 (Slamon et al., Science, 235, 177-182 (1987); Slamon et al., Science, 244, 707-712 (1989)), there is a great need to identify additional breast tumor-specific immunotherapy targets. One limitation is the availability of unique protein targets that are present on cancer cells but are not expressed in normal essential tissues, such as brain, liver, or kidney.

BRIEF SUMMARY OF THE INVENTION

[0005] The present invention identifies a new gene expressed in breast cancers. The gene undergoes alternative splicing and is expressed as either of two polypeptides.
In one group of embodiments, therefore, the invention provides isolated polypeptides comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2. In some of these embodiments, the polypeptide has at least 98% sequence identity to SEQ ID NO:1. In some other embodiments, the polypeptide has at least 95% sequence identity to SEQ ID NO:2. In some preferred embodiments, the polypeptide has the sequence of SEQ ID NO:1, while in others it has the sequence of SEQ ID NO:2.
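The fragment embodiments recited above (at least 8 contiguous amino acids drawn from amino acids 167-179 of SEQ ID NO:1) define a finite set of candidate peptides: the 13-residue region yields 6 + 5 + 4 + 3 + 2 + 1 = 21 fragments of length 8 to 13. The Python sketch below enumerates them. It assumes 1-based, inclusive residue numbering, as used in the text; because SEQ ID NO:1 is not reproduced in this section, the input sequence is a placeholder, and the function name `contiguous_fragments` is hypothetical.

```python
def contiguous_fragments(seq: str, start: int, end: int, min_len: int = 8) -> list[str]:
    """Return every contiguous fragment of at least min_len residues that lies
    entirely within residues start..end (1-based, inclusive) of seq."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("region lies outside the sequence")
    region = seq[start - 1:end]  # 1-based inclusive -> Python slice
    fragments = []
    for length in range(min_len, len(region) + 1):
        for offset in range(len(region) - length + 1):
            fragments.append(region[offset:offset + length])
    return fragments


if __name__ == "__main__":
    # Placeholder only: 166 filler residues followed by an arbitrary 13-residue tail.
    # Substitute the actual SEQ ID NO:1 (179 residues) for real use.
    placeholder = "A" * 166 + "CDEFGHIKLMNPQ"
    fragments = contiguous_fragments(placeholder, 167, 179)
    print(len(fragments))  # -> 21
```

Whether a given fragment is actually immunogenic is a separate, experimental question; the enumeration only lists the sequences that fall within the recited region.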

[0006] In another group of embodiments, the invention provides compositions comprising one of the polypeptides described above and a pharmaceutically acceptable carrier.

[0007] In yet another important group of embodiments, the invention provides isolated, recombinant nucleic acid molecules comprising a nucleotide sequence encoding a polypeptide selected from the group consisting of a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2. In some preferred embodiments, the recombinant nucleic acid encodes a polypeptide comprising the sequence of SEQ ID NO:1 or a polypeptide comprising the sequence of SEQ ID NO:2. In other embodiments, the nucleic acid encodes a polypeptide comprising an immunogenic fragment of SEQ ID NO:1 comprising at least 8 contiguous amino acids from amino acids 167-179 of SEQ ID NO:1, or an immunogenic fragment of SEQ ID NO:2 comprising at least 8 contiguous amino acids of SEQ ID NO:2.
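Each amino acid is encoded by one three-nucleotide codon, so nucleic acid embodiments that encode amino acids 167-179 of SEQ ID NO:1 correspond to a 39-nucleotide stretch of the coding sequence. The Python sketch below converts residue positions into coding-sequence positions. It assumes numbering starts at the first nucleotide of the ATG start codon; the numbering of SEQ ID NO:3 may differ if that sequence includes untranslated flanking sequence. The helper names `codon_span` and `coding_length` are hypothetical.

```python
def codon_span(first_residue: int, last_residue: int) -> tuple[int, int]:
    """Map residues first..last (1-based, inclusive) of a protein onto the
    nucleotide positions (1-based, inclusive) of its coding sequence,
    counting from the first nucleotide of the start codon."""
    if not (1 <= first_residue <= last_residue):
        raise ValueError("invalid residue range")
    return 3 * (first_residue - 1) + 1, 3 * last_residue


def coding_length(n_residues: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode n_residues, optionally plus a stop codon."""
    return 3 * n_residues + (3 if include_stop else 0)


if __name__ == "__main__":
    print(codon_span(167, 179))  # -> (499, 537)
    print(coding_length(179))    # -> 540
```

Under these assumptions, residues 167-179 map to nucleotides 499-537 of the coding sequence, and encoding all 179 residues of base1 plus a stop codon takes 540 nucleotides.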

[0008] The invention further provides host cells comprising an expression vector comprising a promoter operatively linked to a nucleic acid sequence encoding a polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express a polypeptide of SEQ ID NO:1 or SEQ ID NO:2.

[0009] In another set of embodiments, the invention further provides the use of an isolated polypeptide comprising an amino acid sequence selected from the group consisting of a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2, for the manufacture of a medicament to activate T lymphocytes against cells which express SEQ ID NO:1 or SEQ ID NO:2. In some embodiments, the use is of a polypeptide that comprises at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1 or of a polypeptide that comprises at least 8 contiguous amino acids of SEQ ID NO:2. In others, the polypeptide either has at least 95% sequence identity to SEQ ID NO:1 and, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express a polypeptide of SEQ ID NO:1, or has at least 90% sequence identity to SEQ ID NO:2 and, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express a polypeptide of SEQ ID NO:2.
In preferred embodiments, the cells expressing SEQ ID NO:1 or SEQ ID NO:2 are breast cancer cells.
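The identity thresholds in these embodiments translate into a maximum number of residue differences. This section does not state how sequence identity is computed, so the Python sketch below assumes the simplest case: two equal-length sequences compared position by position, without gaps. Practical comparisons would normally rely on a sequence alignment (for example, with BLAST). The function names `percent_identity` and `min_identical` are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length sequences compared position by
    position, without gaps (an assumed, simplified definition)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def min_identical(length: int, percent: int) -> int:
    """Smallest number of identical positions that reaches at least `percent`
    identity over `length` positions (integer ceiling, no float rounding)."""
    return (percent * length + 99) // 100


if __name__ == "__main__":
    for pct in (95, 98):
        needed = min_identical(179, pct)
        # threshold, identical residues required, maximum substitutions allowed
        print(pct, needed, 179 - needed)
```

For a 179-residue polypeptide such as SEQ ID NO:1, `min_identical(179, 95)` is 171, so at most 8 ungapped substitutions keep a variant at or above 95% identity; at 98%, 176 identical residues are required, allowing at most 3. Integer ceiling division is used so that floating-point rounding cannot shift a borderline case.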

[0010] The invention further provides the use of an isolated, recombinant nucleic acid molecule comprising a nucleotide sequence encoding a polypeptide comprising an amino acid sequence selected from the group consisting of a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2, for the manufacture of a medicament to activate T lymphocytes against cells which express SEQ ID NO:1 or SEQ ID NO:2. In preferred embodiments, the cells expressing SEQ ID NO:1 or SEQ ID NO:2 are breast cancer cells. In some embodiments, the nucleic acid molecule encodes a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1; in others, it can encode a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2.

[0011] In yet another set of embodiments, the invention provides methods of activating T lymphocytes against cells expressing SEQ ID NO:1 or SEQ ID NO:2. The methods comprise administering to a subject a composition, which composition comprises an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2. In some embodiments, the method comprises administering to the subject SEQ ID NO:1, or an immunogenic fragment thereof, while in others it comprises administering to the subject SEQ ID NO:2, or an immunogenic fragment thereof. In some particular embodiments, the composition comprises a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, and in some others it comprises a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2. In preferred embodiments, the composition is administered to a subject with breast cancer.
The method can further comprise co-administering to the subject an immune adjuvant selected from non-specific immune adjuvants, subcellular microbial products and fractions, haptens, immunogenic proteins, immunomodulators, interferons, thymic hormones, and colony stimulating factors.

[0012] The invention further provides a method of activating T lymphocytes against cancer cells expressing SEQ ID NO:1 or SEQ ID NO:2, the method comprising contacting T cells with an antigen presenting cell pulsed or transfected with a polypeptide comprising an epitope of an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2.

[0013] Moreover, the invention provides methods of activating T lymphocytes against cancer cells expressing SEQ ID NO:1 or SEQ ID NO:2 by administering a nucleic acid sequence encoding a polypeptide comprising an epitope of an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2. Preferably, the nucleic acid is operably linked to a promoter. The nucleic acid can be in an expression vector, which expression vector is in an autologous recombinant cell.

[0014] The invention further provides methods of sensitizing CD8+ cells in vitro against cells expressing SEQ ID NO:1 or SEQ ID NO:2. The methods comprise contacting said cells with a composition, which composition comprises an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2. The CD8+ cells can be tumor-infiltrating cells.

[0015] Additionally, the invention provides methods for determining whether a subject has a SEQ ID NO:1- or SEQ ID NO:2-expressing cancer, comprising taking a sample from said subject from a site other than the salivary glands, and determining whether a cell in said sample contains a nucleic acid transcript encoding SEQ ID NO:1 or SEQ ID NO:2, or detecting a polypeptide of SEQ ID NO:1 or SEQ ID NO:2, whereby detection of the transcript or of SEQ ID NO:1 or SEQ ID NO:2 in said sample indicates that the subject has a SEQ ID NO:1- or SEQ ID NO:2-expressing cancer. In some embodiments of the methods, the nucleic acid transcript is detected, while in others, the method comprises detecting a polypeptide of SEQ ID NO:1 or SEQ ID NO:2. The method can alternatively comprise contacting RNA from the sample with a nucleic acid probe that specifically hybridizes under stringent hybridization conditions to a nucleic acid transcript encoding SEQ ID NO:1 or SEQ ID NO:2, and detecting hybridization. The sample can be, for example, selected from the group consisting of blood and urine.
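The diagnostic logic of this method reduces to a simple rule: a positive transcript or polypeptide result counts only when the sample comes from a site other than the salivary glands, the only normal tissue known to express BASE. The Python sketch below encodes that rule. The `SampleResult` record, its field names, and the site strings are hypothetical, and a positive result is an indication to be weighed with the sampling site, histology, and other routine diagnostic criteria rather than a diagnosis on its own.

```python
from dataclasses import dataclass

# Hypothetical site labels for the one normal tissue known to express BASE.
SALIVARY_SITES = {"salivary gland", "salivary glands"}


@dataclass
class SampleResult:
    site: str                  # anatomical source, e.g. "blood", "urine", "breast"
    transcript_detected: bool  # transcript encoding SEQ ID NO:1 or SEQ ID NO:2 found
    protein_detected: bool     # polypeptide of SEQ ID NO:1 or SEQ ID NO:2 found


def indicates_base_cancer(result: SampleResult) -> bool:
    """Return True when a non-salivary sample is positive for BASE by
    either transcript or polypeptide detection."""
    if result.site.strip().lower() in SALIVARY_SITES:
        # Salivary gland samples fall outside the method: BASE is expressed there normally.
        raise ValueError("salivary gland samples cannot be interpreted by this rule")
    return result.transcript_detected or result.protein_detected


if __name__ == "__main__":
    print(indicates_base_cancer(SampleResult("blood", False, True)))   # -> True
    print(indicates_base_cancer(SampleResult("urine", False, False)))  # -> False
```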

[0016] The invention further provides antibodies to base2. In particular, the invention provides antibodies that specifically bind to an epitope of a polypeptide selected from the group consisting of a base2 protein (SEQ ID NO:2), an immunogenic fragment thereof, a polypeptide with at least 90% sequence identity to base2 and which is specifically recognized by an antibody which specifically recognizes base2, and a polypeptide which has at least 90% sequence identity with base2 and which, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express base2. In preferred embodiments, the antibody binds to an epitope of base2 (SEQ ID NO:2). The antibody can further be attached to a therapeutic moiety or to a detectable label. In some embodiments, the therapeutic moiety is a cytotoxin. In preferred embodiments, the cytotoxin is selected from the group consisting of ricin A, abrin, ribotoxin, ribonuclease, saporin, calicheamicin, diphtheria toxin or a subunit thereof, Pseudomonas exotoxin or a cytotoxic portion thereof, a mutated Pseudomonas exotoxin or a cytotoxic portion thereof, botulinum toxins A through F, pokeweed antiviral toxin or a cytotoxic fragment thereof, and bryodin 1 or a cytotoxic fragment thereof. In more preferred embodiments, the cytotoxin is a Pseudomonas exotoxin or a cytotoxic fragment thereof.

[0017] In yet another group of embodiments, the invention provides methods of inhibiting the growth of a cancer cell expressing base2 (SEQ ID NO: 2) on its exterior surface, comprising contacting the cell with an immunoconjugate comprising a therapeutic moiety and a targeting moiety, the targeting moiety comprising an antibody which specifically binds to an epitope of base2, wherein said binding permits the therapeutic moiety to inhibit the growth of the cell. The therapeutic moiety can be, for example, a drug, cytotoxin, liposome loaded with a drug, or a radioisotope.

[0018] The invention further provides the use of an antibody that specifically binds to an epitope of a polypeptide selected from the group consisting of a base2 protein (SEQ ID NO:2), an immunogenic fragment thereof, a polypeptide with at least 90% sequence identity to base2 and which is specifically recognized by an antibody which specifically recognizes base2, and a polypeptide which has at least 90% sequence identity with base2 and which, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express base2, for the manufacture of a medicament for a base2-expressing cancer. The antibody can be attached to a therapeutic moiety. In some embodiments, the therapeutic moiety is a cytotoxin or a radioisotope.

[0019] The invention further provides kits for detecting a SEQ ID NO:1- or SEQ ID NO:2-expressing cancer. In some embodiments, the kit comprises a container and an antibody which specifically binds to SEQ ID NO:2 or to amino acids 167-179 of SEQ ID NO:1. In other embodiments, the kit comprises a container and a nucleic acid which hybridizes under stringent conditions to a nucleic acid encoding SEQ ID NO:2 or to a nucleic acid encoding amino acids 167-179 of SEQ ID NO:1.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] Figure 1. Alignment of the BASE cDNA sequence with the human genome. The top black line represents DNA sequence from human chromosome 20. The numbers above the thick black line indicate the location of the sequence on chromosome 20, and the numbers below the thick black line are relative locations in base pairs (GoldenPath, June 2002). The black boxes represent exons from the full-length sequence of the BASE cDNA clone (KAE08h07). The gray boxes represent the exons from the single 5' sequencing reaction for the MAPcL cDNA clones listed on the left. Connecting thin black and gray lines represent introns. Locations of primers used to amplify BASE are indicated as arrows labeled F (forward) and R (reverse).

[0021] Figure 2 (Figure 2 comprises panels (a) and (b)). Expression of BASE in the MAPcL cell lines and breast cancers. Figure 2a. Expression of BASE in the MAPcL cell lines. Expression levels were determined by RT-PCR using membrane-associated polyribosomal RNA isolated from the library cell lines as a template for cDNA synthesis. PCR was performed using primers to BASE. The PCR products were analyzed on a 1.5% agarose gel with ethidium bromide staining as follows: lane 1, MCF7; lane 2, SK-BR-3; lane 3, ZR-75-1; lane 4, MDA-MB-231; lane 5, hTERT-HME1; lane 6, LNCaP; lane 7, pKAE08h07 (BASE); and lane 8, no template. Separate PCR reactions were done using transferrin receptor primers, which amplify a 615-bp fragment, to verify the quality of the generated cDNA. The DNA ladder in bp is indicated on the right. Figure 2b. Expression of BASE in breast cancer samples. RT-PCR analysis was performed using 11 breast cancer total RNA samples as templates for cDNA synthesis. The BASE PCR primers, shown in Figure 1, amplify a 464-bp fragment. The PCR products were analyzed on a 1.5% agarose gel with ethidium bromide staining as follows: lanes 1-11, breast tumors; and lane 12, pKAE08h07 (BASE). PCR reactions using actin primers were performed separately and produced a 640-bp product. The DNA ladder in bp is indicated on the right.
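The controls described for Figure 2 suggest a straightforward way to read each sample: a BASE call is meaningful only if the separate control reaction (the 615-bp transferrin receptor product in Figure 2a, or the 640-bp actin product in Figure 2b) confirms that the cDNA is usable, and the sample is then scored by the presence of the 464-bp BASE product. The Python sketch below illustrates that reading. The band sizes come from the text; the ±5% sizing tolerance, the constant names, and the function names are assumptions, and the sketch does not describe how the figure was actually scored.

```python
BASE_AMPLICON_BP = 464   # BASE product from the primers shown in Figure 1
ACTIN_AMPLICON_BP = 640  # actin control (Figures 2b and 3b)
TFR_AMPLICON_BP = 615    # transferrin receptor control (Figure 2a)


def has_band(sizes_bp: list[float], expected_bp: float, tolerance: float = 0.05) -> bool:
    """True if any observed band lies within +/- tolerance (a fraction) of
    expected_bp. The 5% default is an assumed gel-sizing tolerance."""
    return any(abs(size - expected_bp) <= tolerance * expected_bp for size in sizes_bp)


def call_sample(base_bands: list[float], control_bands: list[float],
                control_bp: float = ACTIN_AMPLICON_BP) -> str:
    """Score one sample from its separate BASE and control PCR reactions."""
    if not has_band(control_bands, control_bp):
        return "uninterpretable"  # control failed: cDNA quality not confirmed
    return "BASE+" if has_band(base_bands, BASE_AMPLICON_BP) else "BASE-"


if __name__ == "__main__":
    # Hypothetical band sizes (bp) read from a gel; not data from Figure 2.
    print(call_sample([464], [640]))                           # -> BASE+
    print(call_sample([], [612], control_bp=TFR_AMPLICON_BP))  # -> BASE-
    print(call_sample([464], []))                              # -> uninterpretable
```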

[0022] Figure 3 (Figure 3 comprises panels a-c). Analysis of BASE expression in normal tissues and transcript size. Figure 3a. Expression of BASE in normal tissues. A human MTE™ array containing 61 tissue-specific mRNA samples was hybridized with the cDNA insert of BASE (KAE08h07). The array is as follows: A1, whole brain; B1, cerebral cortex; C1, frontal lobe; D1, parietal lobe; E1, occipital lobe; F1, temporal lobe; G1, p.g. of cerebral cortex; H1, pons; A2, cerebellum left; B2, cerebellum right; C2, corpus callosum; D2, amygdala; E2, caudate nucleus; F2, hippocampus; G2, medulla oblongata; H2, putamen; A3, substantia nigra; B3, nucleus accumbens; C3, thalamus; D3, pituitary gland; E3, spinal cord; A4, heart; B4, aorta; C4, atrium left; D4, atrium right; E4, ventricle left; F4, ventricle right; G4, interventricular septum; H4, apex of the heart; A5, esophagus; B5, stomach; C5, duodenum; D5, jejunum; E5, ileum; F5, ileocecum; G5, appendix; H5, colon ascending; A6, colon transverse; B6, colon descending; C6, rectum; A7, kidney; B7, skeletal muscle; C7, spleen; D7, thymus; E7, peripheral blood leukocytes; F7, lymph node; G7, bone marrow; H7, trachea; A8, lung; B8, placenta; C8, bladder; D8, uterus; E8, prostate; F8, testis; G8, ovary; A9, liver; B9, pancreas; C9, adrenal gland; D9, thyroid gland; E9, salivary gland; and F9, mammary gland. Figure 3b. RT-PCR analysis of BASE expression in 24 normal tissues. PCR reactions were performed using a rapid-scan gene expression panel containing cDNA samples from 24 different normal tissues as follows: 1, brain; 2, heart; 3, kidney; 4, spleen; 5, liver; 6, colon; 7, lung; 8, small intestine; 9, muscle; 10, stomach; 11, testis; 12, placenta; 13, salivary gland; 14, thyroid; 15, adrenal gland; 16, pancreas; 17, ovary; 18, uterus; 19, prostate; 20, skin; 21, peripheral blood lymphocyte; 22, bone marrow; 23, fetal brain; 24, fetal liver; and 25, pKAE08h07. PCR primers for BASE are shown in Figure 1.
These primers are located in different exons and amplify a 464-bp BASE fragment. As a positive control, pKAE08h07 (BASE) was used as a template for the PCR reaction (lane 25). PCR reactions using actin primers were performed separately and produced a 640-bp product. The PCR products were analyzed on a 1.5% agarose gel with ethidium bromide staining. Figure 3c. Northern blot analysis of BASE transcripts. Each lane contains poly(A) RNA (2 μg) from salivary gland (lane 1) and ZR-75-1 cells (lane 2). The membrane was probed with the 1.4-kb cDNA insert of KAE08h07. The membrane was then stripped and reprobed with a β-actin probe to verify equal loading (lower panel). RNA size markers in kilobases are indicated on the right.

[0023] Figure 4. This Figure sets forth the amino acid sequence (SEQ ID NO:1) and nucleic acid sequence (SEQ ID NO:3) of the base1 protein.

[0024] Figure 5. This Figure sets forth the amino acid sequence (SEQ ID NO:2) and nucleic acid sequence (SEQ ID NO:4) of the base2 protein.

DETAILED DESCRIPTION OF THE INVENTION

INTRODUCTION

[0025] The present invention concerns the discovery of a new human gene and the use of proteins expressed from the gene for the diagnosis and therapy of breast cancer. The gene has been given the name BASE (for breast cancer and salivary gland expression). The BASE gene is located on chromosome 20, from nucleotides 31604140 to 31621138 (see Figure 1).
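As a worked check on the size of the locus, and assuming the coordinates are 1-based and inclusive (as positions are displayed in the UCSC GoldenPath browser cited for Figure 1), the genomic span of BASE is as follows; with half-open coordinates the result would be one base pair shorter.

```latex
\[
L = 31\,621\,138 - 31\,604\,140 + 1 = 16\,999\ \text{bp} \approx 17\ \text{kb}
\]
```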

[0026] Two proteins are expressed from the BASE gene by alternative splicing. The first, called "base1" (SEQ ID NO:1), is a 19.5 kD protein of 179 amino acids. The second, called "base2" (SEQ ID NO:2), is an 8.4 kD protein of 73 amino acids. The nucleic acid sequences encoding base1 and base2 are set forth as SEQ ID NO:3 and SEQ ID NO:4, respectively. The amino acid sequence (SEQ ID NO:1) of base1 and the nucleic acid sequence encoding it (SEQ ID NO:3) are shown in Figure 4. The amino acid sequence (SEQ ID NO:2) of base2 and the nucleic acid sequence encoding it (SEQ ID NO:4) are set forth in Figure 5.

[0027] BASE expression is found in about 25% of breast cancers, about the same percentage of breast cancers that overexpress human epidermal growth factor receptor 2 ("HER2"), the target of the commercially available therapeutic Herceptin®. Since only some 25% of breast cancers overexpress HER2, the discovery of another marker present on a large subset of breast cancers is quite important.

[0028] Two proteins are expressed from BASE. The protein designated "base1" is a secreted protein. It is therefore expected to be present in body fluids, such as serum. Detection of base1 in the blood or urine is indicative of a BASE-expressing breast cancer.

[0029] Base2 is not secreted, and thus cannot readily be detected in body fluids. It can, however, be used as a diagnostic indicator of a BASE-expressing breast cancer in samples containing BASE-expressing tumor cells, such as tissue biopsies from tumor sites.

[0030] In addition to diagnosing the presence of a BASE-expressing cancer, detection of BASE expression can be used to monitor or to stage the cancer. Continued expression of BASE following surgery, for example, is indicative that the cancer metastasized before the surgery or that not all the tumor was removed, while continued expression after chemotherapy indicates that not all the cancer has been eradicated by the therapy.

Quantitation of base1 levels in a patient's blood or urine permits a determination as to whether the amount of BASE-expressing tissue is increasing or decreasing over time, permitting an estimate of whether a patient's disease is progressing.
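One way to turn serial base1 measurements into the increasing-or-decreasing estimate described above is to fit a least-squares slope of concentration against time. The Python sketch below does this. The use of ordinary least squares, the time and concentration units, the `flat_band` threshold (which would in practice be set from the assay's measurement variability), and the function names are all assumptions rather than part of the described method; the clinical interpretation of any trend remains with the practitioner.

```python
def trend_slope(days: list[float], levels: list[float]) -> float:
    """Ordinary least-squares slope of marker level versus time
    (level units per day)."""
    n = len(days)
    if n != len(levels) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(days) / n
    mean_y = sum(levels) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("measurements must be taken at different times")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, levels))
    return sxy / sxx


def classify_trend(slope: float, flat_band: float) -> str:
    """Label the trend; slopes within +/- flat_band count as stable.
    flat_band is a hypothetical, assay-specific threshold."""
    if slope > flat_band:
        return "increasing"
    if slope < -flat_band:
        return "decreasing"
    return "stable"


if __name__ == "__main__":
    # Hypothetical serial serum base1 measurements (arbitrary units), not patient data.
    days = [0, 30, 60, 90]
    levels = [12.0, 10.5, 8.0, 6.5]
    slope = trend_slope(days, levels)
    print(round(slope, 4), classify_trend(slope, flat_band=0.01))
```

A slope is used here rather than a comparison of the first and last values so that a single noisy measurement has less influence on the call.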

[0031] Detection is typically performed by immunoassays, using antibodies to base1 or to base2, or to both. Detection may also be performed by disrupting cells and testing for mRNA encoding base1 or base2. Detection of BASE expression in cell samples may be accomplished by any convenient technique known in the art, such as northern blotting or RT-PCR. Since the only normal tissue known to express BASE is the salivary gland, if a northern blot on a sample from any other location in the body shows detectable amounts of BASE nucleic acids, the practitioner can assume the presence of a BASE-expressing cancer in the sampled tissue. The diagnosis can be confirmed by knowledge of the site from which the sample was taken, histologic and morphologic features of the cells, and other routine diagnostic criteria.

[0032] Base1 and base2, immunogenic fragments of base1 or base2, nucleic acids encoding base1 or base2, or nucleic acids encoding immunogenic fragments thereof can also be used ex vivo to activate cytotoxic T lymphocytes ("CTLs") derived from a subject so that, when infused into the subject, the CTLs attack cells of BASE-expressing cancers. For example, antigen presenting cells can be pulsed or "loaded" with base1 or base2 immunogenic fragments, transfected with nucleic acids encoding such fragments, or differentiated from stem cells transduced with such nucleic acids, and then placed in contact with CTLs to activate them against cells expressing base1, base2, or both.

[0033] Base1, base2, immunogenic fragments of base1 or base2, nucleic acids encoding these proteins, or immunogenic fragments thereof, can be administered to a subject, typically in a pharmaceutically acceptable carrier, to raise or heighten an immune response to a BASE-expressing cancer. Such compositions can be administered therapeutically to individuals who have been diagnosed as suffering from a BASE-expressing cancer. In preferred embodiments, the protein or immunogenic fragments thereof are of base1 or base2 and the cancer is a breast cancer.

[0034] Base2 is believed to be membrane-associated. Thus, antibodies that recognize base2 can be used to target effector molecules to cells expressing base2 on the exterior surface of the cell. For example, a single-chain construct comprising the variable regions of an immunoglobulin heavy chain, a light chain, or both can be coupled to an effector molecule, such as a detectable label, to form an immunoconjugate. The immunoconjugate can then be used to detect the presence of base2-expressing cells in a sample. In some embodiments, the immunoconjugate is used in vitro to detect the presence of base2-expressing cells in a sample biopsied from a patient. The presence of base2 in a cell sample taken from a site other than the salivary glands is indicative of the presence of a BASE-expressing cancer. The immunoconjugate can also be used in vitro on a culture of cells to confirm, for example, that base2-expressing cells have been purged from the culture.

[0035] In other embodiments, the effector molecule of the immunoconjugate is a therapeutic agent, such as an anticancer drug, a cytotoxin, or a radioisotope, which is targeted to the cancer cells by the antibody portion of the immunoconjugate. In a preferred group of such embodiments, the effector molecule targeted by the anti-base2 antibodies is a toxin. The toxin may be, for example, a radioisotope or a chemical toxin. Suitable toxins are described in more detail below. In particularly preferred embodiments, the toxin is a Pseudomonas exotoxin A ("PE"), mutated to reduce or eliminate non-specific binding of the toxin, or a cytotoxic fragment thereof. It should be noted that the only normal tissue found to express BASE in significant amounts is the salivary gland. Persons of skill in the art will recognize that the salivary glands are not essential to maintaining the life of the patient, and any effect on the salivary glands due to administration of an anti-base2 immunotoxin will typically be outweighed by the therapeutic benefit to the patient of the immunotoxin's effect on the BASE-expressing cancer.

[0036] The sections below discuss various features of BASE, base1, and base2. The text continues with definitions used in this disclosure, followed by a discussion of the selection of immunogenic fragments of base1 and base2, the administration of base1 and base2 to subjects, the formation of antibodies against base1 and base2, detection of BASE transcripts and proteins, and compositions of the BASE proteins or nucleic acids in pharmaceutically acceptable carriers.

DEFINITIONS

[0037] Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide a general definition of many of the terms used in this invention: Singleton et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY (2d ed. 1994); THE CAMBRIDGE DICTIONARY OF SCIENCE AND TECHNOLOGY (Walker ed., 1988); THE GLOSSARY OF GENETICS, 5TH ED., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Margham, THE HARPER COLLINS DICTIONARY OF BIOLOGY (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.

[0038] As used herein, "BASE" (breast and salivary gland expression) refers to a gene discovered to be expressed in human breast cancer cells and in salivary glands. The BASE gene is located on chromosome 20, from nucleotides 31604140 to 31621138. BASE is expressed as two alternatively spliced proteins.

[0039] "base1" and "base2" denote the two alternatively spliced proteins expressed from BASE. "base1" (amino acid sequence SEQ ID NO:1; the nucleic acid sequence encoding base1 is SEQ ID NO:3) is a 19.5 kD protein of 179 amino acids. "base2" (amino acid sequence SEQ ID NO:2; the nucleic acid sequence encoding base2 is SEQ ID NO:4) is an 8.4 kD protein of 73 amino acids.

[0040] As used herein, an "immunogenic fragment" of SEQ ID NO:1 or of SEQ ID NO:2 refers to a portion of SEQ ID NO:1 or SEQ ID NO:2, respectively, which, when presented by a cell in the context of a molecule of the Major Histocompatibility Complex, can, in a T-cell activation assay, activate a T-lymphocyte against a cell expressing SEQ ID NO:1 or SEQ ID NO:2. Typically, such fragments are 8 to 12 contiguous amino acids of SEQ ID NO:1 or SEQ ID NO:2 in length, although longer fragments may also be used.
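To illustrate the size of the candidate pool, the following Python sketch enumerates every contiguous window of 8 to 12 residues in a protein sequence; a sequence of n residues has n - k + 1 windows of length k. The demo sequence is hypothetical (it is not SEQ ID NO:1 or SEQ ID NO:2), and the function name `candidate_fragments` is illustrative. Enumeration only lists candidates; whether a given fragment is immunogenic still has to be established in a T-cell activation assay as defined above.

```python
from typing import Iterator, Tuple


def candidate_fragments(seq: str, min_len: int = 8, max_len: int = 12) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every contiguous window of min_len..max_len residues.

    start is 1-based to match conventional residue numbering.
    """
    seq = seq.upper()
    for length in range(min_len, max_len + 1):
        # A sequence of n residues has n - length + 1 windows of this length.
        for start in range(len(seq) - length + 1):
            yield start + 1, seq[start:start + length]


# Hypothetical 20-residue sequence used only for illustration.
demo = "MKLLVLAVLLSGCSQAFEEL"
frags = list(candidate_fragments(demo))
print(len(frags))   # 13 + 12 + 11 + 10 + 9 = 55 windows
print(frags[0])     # (1, 'MKLLVLAV')
```

For a 179-residue protein the same count is 172 + 171 + 170 + 169 + 168 = 850 windows, which is why candidates are usually narrowed further (for example by MHC binding motifs) before testing.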

[0041] In the context of comparing one polypeptide to another, "sequence identity" is determined by comparing the sequence of base1 or base2, as the reference sequence, to a test sequence. Typically, the two sequences are aligned for maximal or optimal alignment.

[0042] "Attached," in relation to a chimeric molecule comprised of a targeting molecule (such as an antibody) and an effector molecule (such as a label or cytotoxin), means that the targeting molecule and the effector molecule are linked by a covalent bond. The two molecules can be attached by, for example, chemical conjugation. Where both molecules are proteins, they can be recombinantly expressed as a fusion protein in which the two molecules are linked by a peptide bond.

[0043] A "ligand" is a compound that specifically binds to a target molecule.

[0044] A "receptor" is a compound that specifically binds to a ligand.

[0045] "Cytotoxic T lymphocytes" ("CTLs") are important in the immune response to tumor cells. CTLs recognize peptide epitopes in the context of HLA class I molecules that are expressed on the surface of almost all nucleated cells.

[0046] Tumor-specific helper T lymphocytes ("HTLs") are also known to be important for maintaining effective antitumor immunity. Their role in antitumor immunity has been demonstrated in animal models in which these cells not only serve to provide help for induction of CTL and antibody responses, but also provide effector functions, which are mediated by direct cell contact and also by secretion of lymphokines (e.g., IFNγ and TNF-α).

[0047] "Antibody" refers to a polypeptide ligand comprising at least a light chain or heavy chain immunoglobulin variable region which specifically recognizes and binds an epitope (e.g., an antigen). For convenience of reference, the term "antibody" as used herein encompasses intact immunoglobulins and the variants and portions of them known in the art, such as Fab' fragments, F(ab')₂ fragments, single chain Fv proteins ("scFv"), and disulfide stabilized Fv proteins ("dsFv"). An scFv protein is a fusion protein in which a light chain variable region of an immunoglobulin and a heavy chain variable region of an immunoglobulin are bound by a linker, usually a short peptide such as Gly₄Ser. The term also includes genetically engineered forms such as chimeric antibodies (e.g., humanized murine antibodies) and heteroconjugate antibodies (e.g., bispecific antibodies). See also Pierce Catalog and Handbook, 1994-1995 (Pierce Chemical Co., Rockford, IL); Kuby, J., Immunology, 3rd Ed., W.H. Freeman & Co., New York (1997).

[0048] An antibody immunologically reactive with a particular antigen can be generated by recombinant methods such as selection of libraries of recombinant antibodies in phage or similar vectors (see, e.g., Huse et al., Science 246:1275-1281 (1989); Ward et al., Nature 341:544-546 (1989); and Vaughan et al., Nature Biotech. 14:309-314 (1996)), or by immunizing an animal with the antigen or with DNA encoding the antigen.

[0049] "Epitope" or "antigenic determinant" refers to a site on an antigen to which B and/or T cells respond. Epitopes can be formed both from contiguous amino acids and from noncontiguous amino acids juxtaposed by tertiary folding of a protein. Epitopes formed from contiguous amino acids are typically retained on exposure to denaturing solvents, whereas epitopes formed by tertiary folding are typically lost on treatment with denaturing solvents. An epitope typically includes at least 3, and more usually at least 5 or 8-10, amino acids in a unique spatial conformation. Methods of determining spatial conformation of epitopes include, for example, x-ray crystallography and 2-dimensional nuclear magnetic resonance. See, e.g., Epitope Mapping Protocols in METHODS IN MOLECULAR BIOLOGY, Vol. 66, Glenn E. Morris, Ed. (1996).

[0050] A ligand or a receptor "specifically binds to" a compound analyte when the ligand or receptor functions in a binding reaction which is determinative of the presence of the analyte in a sample of heterogeneous compounds. Thus, the ligand or receptor binds preferentially to a particular analyte and does not bind in a significant amount to other compounds present in the sample. For example, a polynucleotide specifically binds to an analyte polynucleotide comprising a complementary sequence and an antibody specifically binds under immunoassay conditions to an antigen analyte bearing an epitope against which the antibody was raised.

[0051] "Immunoassay" refers to a method of detecting an analyte in a sample in which specificity for the analyte is conferred by the specific binding between an antibody and a ligand. This includes detecting an antibody analyte through specific binding between the antibody and a ligand. See Harlow and Lane (1988) ANTIBODIES, A LABORATORY MANUAL, Cold Spring Harbor Publications, New York, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity.

[0052] "Vaccine" refers to an agent or composition containing an agent effective to confer a therapeutic degree of immunity on an organism while causing only very low levels of morbidity or mortality. Methods of making vaccines are, of course, useful in the study of the immune system and in preventing and treating animal or human disease.

[0053] An "immunogenic amount" is an amount effective to elicit an immune response in a subject.

[0054] A "targeting moiety" is the portion of an immunoconjugate intended to target the immunoconjugate to a cell of interest. Typically, the targeting moiety is an antibody, an scFv, a dsFv, an Fab, or an F(ab')₂.

[0055] A "detectable label" means, with respect to an immunoconjugate, a portion of the immunoconjugate which has a property rendering its presence detectable. For example, the immunoconjugate may be labeled with a radioactive isotope which permits cells in which the immunoconjugate is present to be detected in immunohistochemical assays.

[0056] The term "effector moiety" means the portion of an immunoconjugate intended to have an effect on a cell targeted by the targeting moiety or to identify the presence of the immunoconjugate. Thus, the effector moiety can be, for example, a therapeutic moiety, a cytotoxin, a radiolabel, or a fluorescent label.

[0057] The term "immunoconjugate" includes reference to a covalent linkage of an effector molecule to an antibody. Where the effector molecule is a toxin, the immunoconjugate can more precisely be referred to as an "immunotoxin."

[0058] The terms "effective amount," "amount effective to," and "therapeutically effective amount" include reference to a dosage of a therapeutic agent sufficient to produce a desired result, such as inhibiting cell protein synthesis by at least 50%, or killing the cell.

[0059] The term "contacting" includes reference to placement in direct physical association.

[0060] An "expression plasmid" comprises a nucleotide sequence encoding a molecule of interest, which is operably linked to a promoter.

[0061] As used herein, the term "anti-BASE," in reference to an antibody, includes reference to an antibody which is generated against SEQ ID NO:1 or SEQ ID NO:2. In a preferred embodiment, the antibody is generated against SEQ ID NO:1 or SEQ ID NO:2 synthesized by a non-primate mammal after introduction into the animal of cDNA which encodes a human protein of SEQ ID NO:1 or SEQ ID NO:2. In a more preferred embodiment, the antibody is a monoclonal antibody.

[0062] "Polypeptide" refers to a polymer composed of amino acid residues (including related naturally occurring structural variants and synthetic non-naturally occurring analogs thereof) linked via peptide bonds, related naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof. Synthetic polypeptides can be synthesized, for example, using an automated polypeptide synthesizer. The term "protein" typically refers to large polypeptides. The term "peptide" typically refers to short polypeptides.

[0063] Conventional notation is used herein to portray polypeptide sequences: the left-hand end of a polypeptide sequence is the amino-terminus; the right-hand end of a polypeptide sequence is the carboxyl-terminus.

[0064] "Fusion protein" refers to a polypeptide formed by the joining of two or more polypeptides through a peptide bond formed by the amino terminus of one polypeptide and the carboxyl terminus of the other polypeptide. A fusion protein is typically expressed as a single polypeptide from a nucleic acid sequence encoding the single contiguous fusion protein. However, a fusion protein can also be formed by the chemical coupling of the constituent polypeptides.

[0065] "Conservative substitution" refers to the substitution in a polypeptide of an amino acid with a functionally similar amino acid. The following six groups each contain amino acids that are conservative substitutions for one another:

1) Alanine (A), Serine (S), Threonine (T);

2) Aspartic acid (D), Glutamic acid (E);

3) Asparagine (N), Glutamine (Q);

4) Arginine (R), Lysine (K);

5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and

6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).
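The six groups can be encoded directly as sets. The Python sketch below, with hypothetical function names and a made-up pair of sequences, flags each difference between two ungapped, equal-length sequences as conservative or not under this scheme. Residues that belong to none of the groups (G, P, C, H) never count as conservative here.

```python
from typing import List, Tuple

# The six groups listed above; residues within one group are treated as
# conservative substitutions for one another.
CONSERVATIVE_GROUPS = [
    set("AST"),
    set("DE"),
    set("NQ"),
    set("RK"),
    set("ILMV"),
    set("FYW"),
]


def is_conservative(a: str, b: str) -> bool:
    """True if replacing residue a with residue b stays within one group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return True
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(ref: str, variant: str) -> List[Tuple[int, str, str, bool]]:
    """List (1-based position, ref residue, variant residue, conservative?) for each difference."""
    if len(ref) != len(variant):
        raise ValueError("sequences must be the same length (no gaps)")
    return [
        (pos, a, b, is_conservative(a, b))
        for pos, (a, b) in enumerate(zip(ref, variant), start=1)
        if a != b
    ]


print(classify_substitutions("KDLAF", "RELGF"))
# [(1, 'K', 'R', True), (2, 'D', 'E', True), (4, 'A', 'G', False)]
```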

See also Creighton, PROTEINS, W.H. Freeman and Company, New York (1984).

[0066] Two proteins are "homologs" of each other if they exist in different species, are derived from a common genetic ancestor and share at least 10% amino acid sequence identity.

[0067] "Substantially pure" or "isolated" means an object species is the predominant species present (i.e., on a molar basis, more abundant than any other individual macromolecular species in the composition), and a substantially purified fraction is a composition wherein the object species comprises at least about 50% (on a molar basis) of all macromolecular species present. Generally, a substantially pure composition means that about 80% to 90% or more of the macromolecular species present in the composition is the purified species of interest. The object species is purified to essential homogeneity (contaminant species cannot be detected in the composition by conventional detection methods) if the composition consists essentially of a single macromolecular species. Solvent species, small molecules (<500 Daltons), stabilizers (e.g., BSA), and elemental ion species are not considered macromolecular species for purposes of this definition.

[0068] "Nucleic acid" refers to a polymer composed of nucleotide units (ribonucleotides, deoxyribonucleotides, related naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof) linked via phosphodiester bonds, related naturally occurring structural variants, and synthetic non-naturally occurring analogs thereof. Thus, the term includes nucleotide polymers in which the nucleotides and the linkages between them include non-naturally occurring synthetic analogs, such as, for example and without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2'-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), and the like. Such polynucleotides can be synthesized, for example, using an automated DNA synthesizer. The term "oligonucleotide" typically refers to short polynucleotides, generally no greater than about 50 nucleotides. It will be understood that when a nucleotide sequence is represented by a DNA sequence (i.e., A, T, G, C), this also includes an RNA sequence (i.e., A, U, G, C) in which "U" replaces "T."

[0069] Conventional notation is used herein to describe nucleotide sequences: the left-hand end of a single-stranded nucleotide sequence is the 5'-end; the left-hand direction of a double-stranded nucleotide sequence is referred to as the 5'-direction. The direction of 5' to 3' addition of nucleotides to nascent RNA transcripts is referred to as the transcription direction. The DNA strand having the same sequence as an mRNA is referred to as the "coding strand"; sequences on the DNA strand having the same sequence as an mRNA transcribed from that DNA and which are located 5' to the 5'-end of the RNA transcript are referred to as "upstream sequences"; sequences on the DNA strand having the same sequence as the RNA and which are 3' to the 3' end of the coding RNA transcript are referred to as "downstream sequences."
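These conventions can be made concrete with a short Python sketch using a hypothetical 12-nucleotide coding strand: the template (non-coding) strand is the reverse complement of the coding strand, and the mRNA has the coding-strand sequence with U in place of T, all written 5' to 3'. The function names are illustrative.

```python
# Base-pairing map for DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, also written 5' -> 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """mRNA (5' -> 3') has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


coding = "ATGGCTTCAGGA"               # hypothetical coding strand, 5' -> 3'
template = reverse_complement(coding)  # strand actually read by RNA polymerase
mrna = transcribe(coding)
print(template)  # TCCTGAAGCCAT
print(mrna)      # AUGGCUUCAGGA
```

Because the template strand is read 3' to 5', the transcript is identical to the coding strand apart from the T to U substitution, which is why sequence listings normally give only the coding strand.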

[0070] "cDNA" refers to a DNA that is complementary or identical to an mRNA, in either single stranded or double stranded form.

[0071] "Encoding" refers to the inherent property of specific sequences of nucleotides in a polynucleotide, such as a gene, a cDNA, or an mRNA, to serve as templates for synthesis of other polymers and macromolecules in biological processes having either a defined sequence of nucleotides (i.e., rRNA, tRNA and mRNA) or a defined sequence of amino acids and the biological properties resulting therefrom. Thus, a gene encodes a protein if transcription and translation of mRNA produced by that gene produces the protein in a cell or other biological system. Both the coding strand, the nucleotide sequence of which is identical to the mRNA sequence and is usually provided in sequence listings, and non-coding strand, used as the template for transcription, of a gene or cDNA can be referred to as encoding the protein or other product of that gene or cDNA. Unless otherwise specified, a "nucleotide sequence encoding an amino acid sequence" includes all nucleotide sequences that are degenerate versions of each other and that encode the same amino acid sequence. Nucleotide sequences that encode proteins and RNA may include introns.
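The point about degenerate sequences can be checked against the standard genetic code. The Python sketch below builds the 64-codon table from the conventional TCAG ordering and translates two different (hypothetical) coding sequences that encode the same tetrapeptide. Helper names are illustrative, and only the standard nuclear code is modeled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding: str) -> str:
    """Translate a coding sequence (DNA or RNA, 5' -> 3') codon by codon."""
    coding = coding.upper().replace("U", "T")
    usable = len(coding) - len(coding) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


seq_a = "ATGGCTTCAGGA"  # hypothetical
seq_b = "ATGGCAAGCGGT"  # hypothetical, differs at four positions
print(translate(seq_a), translate(seq_b))  # MASG MASG
```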

[0072] "Recombinant nucleic acid" refers to a nucleic acid having nucleotide sequences that are not naturally joined together. This includes nucleic acid vectors comprising an amplified or assembled nucleic acid which can be used to transform a suitable host cell. A host cell that comprises the recombinant nucleic acid is referred to as a "recombinant host cell." A gene carried by the recombinant nucleic acid can then be expressed in the recombinant host cell to produce, e.g., a "recombinant polypeptide." A recombinant nucleic acid may serve a non-coding function (e.g., promoter, origin of replication, ribosome-binding site, etc.) as well.

[0073] "Expression control sequence" refers to a nucleotide sequence in a polynucleotide that regulates the expression (transcription and/or translation) of a nucleotide sequence operatively linked thereto. "Operatively linked" refers to a functional relationship between two parts in which the activity of one part (e.g., the ability to regulate transcription) results in an action on the other part (e.g., transcription of the sequence). Expression control sequences can include, for example and without limitation, sequences of promoters (e.g., inducible or constitutive), enhancers, transcription terminators, a start codon (i.e., ATG), splicing signals for introns, and stop codons.

[0074] "Expression cassette" refers to a recombinant nucleic acid construct comprising an expression control sequence operatively linked to an expressible nucleotide sequence. An expression cassette generally comprises sufficient cis-acting elements for expression; other elements for expression can be supplied by the host cell or in vitro expression system.

[0075] "Expression vector" refers to a vector comprising an expression cassette. Expression vectors include all those known in the art, such as cosmids, plasmids (e.g., naked or contained in liposomes) and viruses that incorporate the expression cassette.

[0076] A first sequence is an "antisense sequence" with respect to a second sequence if a polynucleotide whose sequence is the first sequence specifically hybridizes with a polynucleotide whose sequence is the second sequence.

[0077] Terms used to describe sequence relationships between two or more nucleotide sequences or amino acid sequences include "reference sequence," "selected from," "comparison window," "identical," "percentage of sequence identity," "substantially identical," "complementary," and "substantially complementary."

[0078] For sequence comparison of nucleic acid sequences, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters are used. Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds., 1995 supplement)).
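For illustration, a minimal Python sketch of the Needleman & Wunsch global alignment cited above, followed by a percent-identity calculation. The scoring (match +1, mismatch -1, linear gap penalty -2) is an assumption chosen for readability; it is not the default parameter set of GAP, BESTFIT, or any other program named here, and it uses no substitution matrix. Percent identity is taken as identical aligned positions divided by the ungapped reference length, which is one of several conventions in use.

```python
from typing import Tuple


def needleman_wunsch(ref: str, test: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_ref, aligned_test, score)."""
    n, m = len(ref), len(test)

    def pair(i: int, j: int) -> int:
        return match if ref[i - 1] == test[j - 1] else mismatch

    # score[i][j] = best score for aligning ref[:i] with test[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # align two residues
                              score[i - 1][j] + gap,             # gap in test
                              score[i][j - 1] + gap)             # gap in ref

    # Trace back from the bottom-right cell, preferring diagonal moves.
    aligned_ref, aligned_test = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            aligned_ref.append(ref[i - 1])
            aligned_test.append(test[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_ref.append(ref[i - 1])
            aligned_test.append("-")
            i -= 1
        else:
            aligned_ref.append("-")
            aligned_test.append(test[j - 1])
            j -= 1
    return "".join(reversed(aligned_ref)), "".join(reversed(aligned_test)), score[n][m]


def percent_identity(aligned_ref: str, aligned_test: str) -> float:
    """Identical aligned positions divided by the ungapped reference length."""
    identical = sum(a == b and a != "-" for a, b in zip(aligned_ref, aligned_test))
    return 100.0 * identical / len(aligned_ref.replace("-", ""))


ref_aln, test_aln, s = needleman_wunsch("MKTAYIAK", "MKTYIAK")  # hypothetical peptides
print(ref_aln)   # MKTAYIAK
print(test_aln)  # MKT-YIAK
print(s, percent_identity(ref_aln, test_aln))  # 5 87.5
```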

[0079] One example of a useful algorithm is PILEUP. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, J. Mol. Evol. 25:351-360 (1987). The method used is similar to the method described by Higgins & Sharp, CABIOS 5:151-153 (1989). Using PILEUP, a reference sequence is compared to other test sequences to determine the percent sequence identity relationship using the following parameters: default gap weight (3.00), default gap length weight (0.10), and weighted end gaps. PILEUP can be obtained from the GCG sequence analysis software package, e.g., version 7.0 (Devereux et al., Nuc. Acids Res. 12:387-395 (1984)).

[0080] Other examples of algorithms suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (which can be found on the World Wide Web at ncbi.nlm.nih.gov). The BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, alignments (B) of 50, an expectation (E) of 10, M=5, N=-4, and a comparison of both strands. The BLASTP program (for amino acid sequences) uses as defaults a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).

[0081] "Stringent hybridization conditions" refers to 50% formamide, 5 x SSC and 1% SDS incubated at 42° C or 5 x SSC and 1% SDS incubated at 65° C, with a wash in 0.2 x SSC and 0.1% SDS at 65° C.
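The SSC strengths in this definition scale linearly with the "x" factor. The Python sketch below assumes the conventional 1 x SSC composition of 0.15 M NaCl and 0.015 M sodium citrate, and preparation from a 20 x stock; neither value is stated in this definition, and the helper names are hypothetical.

```python
NACL_MM_PER_X = 150.0     # mM NaCl in 1 x SSC (conventional composition, assumed)
CITRATE_MM_PER_X = 15.0   # mM sodium citrate in 1 x SSC (assumed)


def ssc_composition(strength_x: float) -> dict:
    """NaCl and sodium citrate concentrations (mM) for an n x SSC solution."""
    return {"NaCl_mM": NACL_MM_PER_X * strength_x,
            "citrate_mM": CITRATE_MM_PER_X * strength_x}


def stock_volume_ml(final_x: float, final_ml: float, stock_x: float = 20.0) -> float:
    """Stock volume needed, from C1 * V1 = C2 * V2 solved for V1."""
    return final_x * final_ml / stock_x


for strength in (5.0, 0.2):  # hybridization and wash strengths named above
    print(f"{strength} x SSC:", ssc_composition(strength))
print("20 x stock for 500 mL of 0.2 x SSC:", stock_volume_ml(0.2, 500.0), "mL")
```

The 25-fold drop in salt between the 5 x SSC hybridization and the 0.2 x SSC wash is what makes the wash stringent: lower ionic strength destabilizes imperfectly matched duplexes.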

[0082] "Naturally-occurring" as applied to an object refers to the fact that the object can be found in nature. For example, an amino acid or nucleotide sequence that is present in an organism (including viruses) that can be isolated from a source in nature and which has not been intentionally modified by man in the laboratory is naturally-occurring.

[0083] "Linker" refers to a molecule that joins two other molecules, either covalently, or through ionic, van der Waals or hydrogen bonds, e.g., a nucleic acid molecule that hybridizes to one complementary sequence at the 5' end and to another complementary sequence at the 3' end, thus joining two non-complementary sequences.

[0084] "Pharmaceutical composition" refers to a composition suitable for pharmaceutical use in a mammal. A pharmaceutical composition comprises a pharmacologically effective amount of an active agent and a pharmaceutically acceptable carrier.

[0085] "Pharmacologically effective amount" refers to an amount of an agent effective to produce the intended pharmacological result.

[0086] "Pharmaceutically acceptable carrier" refers to any of the standard pharmaceutical carriers, buffers, and excipients, such as a phosphate buffered saline solution, 5% aqueous solution of dextrose, and emulsions, such as an oil/water or water/oil emulsion, and various types of wetting agents and/or adjuvants. Suitable pharmaceutical carriers and formulations are described in REMINGTON'S PHARMACEUTICAL SCIENCES, 19th Ed. (Mack Publishing Co., Easton, 1995). Preferred pharmaceutical carriers depend upon the intended mode of administration of the active agent. Typical modes of administration include enteral (e.g., oral) or parenteral (e.g., subcutaneous, intramuscular, intravenous or intraperitoneal injection; or topical, transdermal, or transmucosal administration). A "pharmaceutically acceptable salt" is a salt that can be formulated into a compound for pharmaceutical use including, e.g., metal salts (sodium, potassium, magnesium, calcium, etc.) and salts of ammonia or organic amines.

[0087] A "subject" of diagnosis or treatment is a human or non-human mammal.

[0088] "Administration" of a composition refers to introducing the composition into the subject by a chosen route of administration. For example, if the chosen route is intravenous, the composition is administered by introducing the composition into a vein of the subject.

[0089] "Treatment" refers to prophylactic treatment or therapeutic treatment.

[0090] A "prophylactic" treatment is a treatment administered to a subject who does not exhibit signs of a disease or exhibits only early signs for the purpose of decreasing the risk of developing pathology.

[0091] A "therapeutic" treatment is a treatment administered to a subject who exhibits signs of pathology for the purpose of diminishing or eliminating those signs.

[0092] "Diagnostic" means identifying the presence or nature of a pathologic condition. Diagnostic methods differ in their sensitivity and specificity. The "sensitivity" of a diagnostic assay is the percentage of diseased individuals who test positive (percent of true positives). The "specificity" of a diagnostic assay is 1 minus the false positive rate, where the false positive rate is defined as the proportion of those without the disease who test positive. While a particular diagnostic method may not provide a definitive diagnosis of a condition, it suffices if the method provides a positive indication that aids in diagnosis.
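The two definitions reduce to simple ratios. The Python sketch below computes them from a 2 x 2 table of test outcomes; the counts are hypothetical and are not data from this application, and the function names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of diseased individuals who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """1 minus the false positive rate (fraction of disease-free individuals testing positive)."""
    false_positive_rate = false_pos / (false_pos + true_neg)
    return 1.0 - false_positive_rate


# Hypothetical counts for illustration only:
# 40 diseased individuals, 30 of whom test positive;
# 60 disease-free individuals, 3 of whom test positive.
sens = sensitivity(true_pos=30, false_neg=10)
spec = specificity(true_neg=57, false_pos=3)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")  # sensitivity = 75%, specificity = 95%
```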

[0093] "Prognostic" means predicting the probable development (e.g., severity) of a pathologic condition.

PROTEINS SYNTHESIZED FROM BASE

[0094] This invention provides isolated, recombinant proteins synthesized from BASE. Two proteins are expressed from BASE. Base1 is a 19.5 kD protein of 179 amino acids (amino acid sequence SEQ ID NO:1, encoding nucleotide sequence SEQ ID NO:3, both of which are set forth in Figure 4). Base2 is an 8.49 kD protein of 73 amino acids (amino acid sequence SEQ ID NO:2, encoding nucleotide sequence SEQ ID NO:4, both of which are set forth in Figure 5). Because of the degeneracy of the genetic code, persons of skill will recognize that numerous other nucleotide sequences could encode the same amino acid sequences.

[0095] In certain embodiments, this invention provides polypeptides comprising an epitope of at least 5, and in some embodiments at least 15, consecutive amino acids from base1 or from base2. Such polypeptides bind to antibodies raised against full-length base1 or base2, respectively. In some embodiments, anti-base1 antibodies bind epitopes within amino acids 167-179 of SEQ ID NO:1.

[0096] In other embodiments, this invention provides fusion proteins comprising a first and a second polypeptide moiety, in which one of the moieties comprises an amino acid sequence of at least 5 amino acids identifying an epitope of base1 or base2. In one embodiment, the BASE moiety is all or substantially all of base1 or base2. The other moiety can be, e.g., an immunogenic protein. Such fusions are also useful to evoke an immune response against base1 or base2, respectively.

[0097] In other embodiments, this invention provides base1-like peptides ("base1 analogs") whose amino acid sequences are at least 95% identical to base1 (although they may have 96%, 97%, 98%, or even 99% sequence identity to base1) and which are specifically bound by antibodies that specifically bind to base1. In preferred embodiments, this invention provides base1-like peptides (also sometimes referred to herein as "base1-analogs") whose amino acid sequences are at least 95% identical to base1 (although they may have 96, 97, 98, or even 99% sequence identity to base1) and which activate T-lymphocytes against cells that express base1.

[0098] Similarly, in some embodiments, this invention provides base2-like peptides ("base2 analogs") whose amino acid sequences are at least 90% identical to base2 (although they may have 91%, 92%, 93%, 94%, 95%, or even higher sequence identity to base2) and which are specifically bound by antibodies that specifically bind to base2. In preferred embodiments, this invention provides base2-like peptides (also sometimes referred to herein as "base2-analogs") whose amino acid sequences are at least 90% identical to base2 (although they may have 91, 92, 93, 94, 95, 96, 97, 98 or even 99% sequence identity to base2) and which activate T-lymphocytes against cells that express base2.
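Assuming identity is counted over the full, ungapped length of the reference sequence (an assumption; paragraph [0041] leaves the alignment method open), each identity threshold corresponds to a maximum number of differing residues. The Python sketch below uses the lengths given in paragraph [0039] (179 amino acids for base1, 73 for base2) and integer arithmetic to avoid floating-point rounding; the helper name is hypothetical.

```python
def max_differences(length: int, min_identity_pct: int) -> int:
    """Largest d such that 100 * (length - d) / length >= min_identity_pct."""
    # Smallest whole number of identical residues meeting the threshold (integer ceiling).
    min_identical = -(-min_identity_pct * length // 100)
    return length - min_identical


for name, length, thresholds in (("base1", 179, (95, 96, 97, 98, 99)),
                                 ("base2", 73, (90, 95, 99))):
    for pct in thresholds:
        print(f"{name} ({length} aa) at >= {pct}%: up to {max_differences(length, pct)} differences")
```

For example, 95% identity to a 179-residue reference allows up to 8 differences (171/179 is about 95.5%, while 170/179 is about 94.97%), and 90% identity to a 73-residue reference allows up to 7.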

[0099] In another embodiment, the polypeptide comprises an epitope that binds an MHC molecule, e.g., an HLA molecule or a DR molecule. These molecules bind polypeptides having the correct anchor amino acids separated by about eight or nine amino acids. These peptides can be identified by inspection of the amino acid sequence of base1 or base2 and by knowledge of the MHC binding motifs, well known in the art.
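As a simplified sketch of inspecting a sequence for MHC binding motifs, the Python code below scans every 9-residue window for one commonly cited class I motif: the HLA-A*02:01 pattern with leucine or methionine at position 2 and valine or leucine at the C-terminal position 9. The motif sets, window length, function name, and demo sequence are all assumptions for illustration; practical epitope selection would use fuller motif definitions or binding-prediction tools, followed by experimental confirmation.

```python
from typing import FrozenSet, List, Tuple


def scan_class_i_motif(seq: str,
                       p2: FrozenSet[str] = frozenset("LM"),
                       p_last: FrozenSet[str] = frozenset("VL"),
                       length: int = 9) -> List[Tuple[int, str]]:
    """Return (1-based start, peptide) for windows whose position 2 and last position match the anchors."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - length + 1):
        peptide = seq[start:start + length]
        if peptide[1] in p2 and peptide[-1] in p_last:
            hits.append((start + 1, peptide))
    return hits


demo = "AASLYNTVATLKK"  # hypothetical sequence, not SEQ ID NO:1 or SEQ ID NO:2
print(scan_class_i_motif(demo))  # [(3, 'SLYNTVATL')]
```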

[0100] Base1, base2, immunogenic fragments of these proteins, and base1- and base2-analogs can be synthesized recombinantly. Immunogenic fragments of base1 and base2 and the full-length proteins can also be chemically synthesized by standard methods. If desired, polypeptides can also be chemically synthesized by emerging technologies, such as the one described in W. Lu et al., Federation of European Biochemical Societies Letters 429:31-35 (1998).

BASE NUCLEIC ACIDS

[0101] In one aspect, this invention provides isolated, recombinant nucleic acid molecules comprising nucleotide sequences encoding the base1 and base2 proteins, respectively (see, e.g., Figures 4 and 5). The nucleic acids are useful for expressing base1 and base2, which can then be used, for example, to raise antibodies for diagnostic purposes. The practitioner can use these sequences to prepare PCR primers for isolating nucleotide sequences of the invention. The sequences encoding base1 and base2 can be modified to engineer nucleic acids encoding related molecules of this invention using well-known techniques.
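The following Python sketch, with hypothetical names and a toy 24-nucleotide target, shows how a primer pair is read off the ends of a known sequence: the forward primer copies the 5' end of the coding strand, and the reverse primer is the reverse complement of the 3' end. Primer length is a parameter (6 nt here only to keep the example short; PCR primers are usually considerably longer), and the sketch ignores melting temperature, secondary structure, and specificity checks that practical primer design requires.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Opposite strand, written 5' -> 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def primer_pair(target: str, length: int = 20) -> Tuple[str, str]:
    """Forward and reverse primers flanking the whole target, both written 5' -> 3'."""
    target = target.upper()
    if len(target) < 2 * length:
        raise ValueError("target is shorter than two primer lengths")
    forward = target[:length]                       # same sense as the coding strand
    reverse = reverse_complement(target[-length:])  # anneals to the coding strand
    return forward, reverse


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    return 100.0 * sum(base in "GC" for base in primer.upper()) / len(primer)


template = "ATGGCTTCAGGATTTCCCAAAGGG"  # hypothetical target, not SEQ ID NO:3 or SEQ ID NO:4
fwd, rev = primer_pair(template, length=6)
print(fwd, rev, gc_percent(fwd), gc_percent(rev))  # ATGGCT CCCTTT 50.0 50.0
```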

[0102] A nucleic acid comprising sequences of the invention can be cloned or amplified by in vitro methods, such as the polymerase chain reaction (PCR), the ligase chain reaction (LCR), the transcription-based amplification system (TAS), the self-sustained sequence replication system (3SR) and the Qβ replicase amplification system (QB). For example, a polynucleotide encoding the base1 or the base2 protein can be isolated by polymerase chain reaction of cDNA using primers based on the DNA sequence of the molecule.

[0103] A wide variety of cloning and in vitro amplification methodologies are well known to persons skilled in the art. PCR methods are described in, for example, U.S. Pat. No. 4,683,195; Mullis et al. (1987) Cold Spring Harbor Symp. Quant. Biol. 51:263; and Erlich, ed., PCR TECHNOLOGY (Stockton Press, NY, 1989). Polynucleotides also can be isolated by screening genomic or cDNA libraries with probes selected from the sequences of the desired polynucleotide under stringent hybridization conditions.

[0104] Engineered versions of the nucleic acids can be made by site-specific mutagenesis of other polynucleotides encoding the proteins, or by random mutagenesis caused by increasing the error rate of PCR of the original polynucleotide with 0.1 mM MnCl₂ and unbalanced nucleotide concentrations.

A. Expression vectors

[0105] The invention also provides expression vectors for expressing base1 and base2. Expression vectors can be adapted for function in prokaryotes or eukaryotes by including appropriate promoters, replication sequences, markers, etc. for transcription and translation of mRNA. The construction of expression vectors and the expression of genes in transfected cells involve molecular cloning techniques well known in the art. See, e.g., Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989), and CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, F.M. Ausubel et al., eds. (Current Protocols, Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.) ("Ausubel"). Useful promoters for such purposes include a metallothionein promoter, a constitutive adenovirus major late promoter, a dexamethasone-inducible MMTV promoter, an SV40 promoter, an MRP pol III promoter, a constitutive MPSV promoter, a tetracycline-inducible CMV promoter (such as the human immediate-early CMV promoter), and a constitutive CMV promoter. A plasmid useful for gene therapy can comprise other functional elements, such as selectable markers, identification regions, and other genes.

[0106] The choice of expression vector depends on its intended use. Such expression vectors must, of course, contain expression and replication signals compatible with the host cell. Expression vectors useful for expressing bioactive conjugates include viral vectors, such as retroviruses, adenoviruses, and adeno-associated viruses, as well as plasmid vectors, cosmids, and the like. Viral and plasmid vectors are preferred for transfecting mammalian cells. The expression vector pcDNA3 (Invitrogen, San Diego, CA), in which the expression control sequence comprises the CMV promoter, provides good rates of transfection and expression. Adeno-associated viral vectors are useful in the gene therapy methods of this invention.

[0107] A variety of means are available for delivering polynucleotides to cells including, for example, direct uptake of the molecule by a cell from solution, facilitated uptake through lipofection (e.g., liposomes or immunoliposomes), particle-mediated transfection, and intracellular expression from an expression cassette having an expression control sequence operably linked to a nucleotide sequence that encodes the inhibitory polynucleotide. See also U.S. Patent 5,272,065 (Inouye et al); METHODS IN ENZYMOLOGY, vol. 185, Academic Press, Inc., San Diego, CA (D.V. Goeddel, ed.) (1990) or M. Krieger, GENE TRANSFER AND EXPRESSION - A LABORATORY MANUAL, Stockton Press, New York, NY, (1990). Recombinant DNA expression plasmids can also be used to prepare the polynucleotides of the invention for delivery by means other than by gene therapy, although it may be more economical to make short oligonucleotides by in vitro chemical synthesis.

[0108] The construct can also contain a tag to simplify isolation of the protein. For example, a polyhistidine tag of, e.g., six histidine residues, can be incorporated at the amino terminal end of the protein. The polyhistidine tag allows convenient isolation of the protein in a single step by nickel-chelate chromatography.

B. Recombinant cells

[0109] The invention also provides recombinant cells comprising an expression vector for expression of the nucleotide sequences of this invention ("host cells"). Host cells can be selected for high levels of expression in order to purify the protein. The cells can be prokaryotic cells, such as E. coli, or eukaryotic cells. Useful eukaryotic cells include yeast and mammalian cells. The cell can be, e.g., a recombinant cell in culture or a cell in vivo.

[0110] Cells expressing base1 and base2 are useful for active or passive immunization of subjects against cells expressing these polypeptides. In certain embodiments, the cells are bacterial cells. In one version of active immunization, the recombinant cells are cells, such as antigen presenting cells, that can present the polypeptides in association with HLA molecules. In this case, it is preferable to use "autologous cells," that is, cells derived from the subject, because such cells are MHC compatible. The base1- or base2-encoding nucleotide sequence should be placed under the control of a constitutive promoter in such cells, because one goal is to express the polypeptides at high density on the cell surface.

METHODS OF ELICITING A CELL-MEDIATED IMMUNE RESPONSE AGAINST CELLS EXPRESSING BASE

[0111] BASE is expressed by cells of some 25% of the breast cancers examined. Therefore, BASE can serve both as a target of intervention for inhibiting the growth of the cells of these cancers that express BASE and as a marker for cancer cells that have metastasized from these cancers. This invention provides methods of treating these cancers with immunotherapy. The methods involve immunizing a subject against base1, base2, or both, thereby eliciting a cell-mediated immune response against cells expressing these proteins. Immunization can be active or passive. In active immunization, the immune response is elicited in the subject in vivo. In passive immunization, cytotoxic T (Tc) cells activated against the polypeptide are cultured in vitro and administered to the subject. Such methods may be expected to destroy healthy salivary gland tissue that expresses BASE. However, the salivary glands are not an essential organ, and their loss must be weighed against the risk that the subject will die of the cancer.

[0112] The immunizing agent can be full-length base1 or base2, a peptide comprising an antigenic determinant of base1 or base2 (e.g., an immunogenic fragment of base1), or a protein or peptide that is substantially identical to base1 or base2. In preferred embodiments, the immunizing agent is full-length base1, an immunogenic fragment thereof, or a protein or peptide that is substantially identical to base1 (that is, one that has 90% or more sequence identity to base1, and preferably about 95%, 96%, 97%, 98%, or more sequence identity). When one is attempting to elicit a cell-mediated immune response against BASE, preferred peptides comprising antigenic determinants are those bearing a binding motif for an HLA molecule of the subject. These motifs are well known in the art. For example, HLA-A2 is a common allele in the human population. The binding motif for this molecule includes peptides of 9 or 10 amino acids having leucine or methionine in the second position and valine or leucine in the last position.
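As an illustration only, the simplified HLA-A2 motif stated above (9- or 10-amino acid peptides with L or M at position 2 and V or L at the last position) can be applied to a protein sequence with a sliding window. This is a sketch of motif inspection, not a binding predictor; the function name and toy sequence are hypothetical, and any candidates it returns would still need to be tested for HLA binding.

```python
def hla_a2_candidates(protein: str,
                      lengths: tuple[int, ...] = (9, 10)) -> list[tuple[int, str]]:
    """Return (1-based start, peptide) for each window matching the simplified
    HLA-A2 motif: L or M at position 2 and V or L at the last position."""
    protein = protein.upper()
    hits = []
    for n in lengths:
        for start in range(len(protein) - n + 1):
            pep = protein[start:start + n]
            if pep[1] in "LM" and pep[-1] in "VL":
                hits.append((start + 1, pep))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not base1 or base2.
    for start, pep in hla_a2_candidates("MKLLVAGLAAAVSLLAYMLPETV"):
        print(start, pep)
```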

[0113] From the polypeptide sequences of base1 and base2, one can identify amino acid sequences bearing motifs for any particular HLA molecule. Peptides comprising these motifs can be prepared by any of the typical methods (e.g., recombinantly or chemically). Because base1 and base2 are self proteins, the preferred amino acid sequences bearing HLA binding motifs are those corresponding to subdominant or cryptic epitopes. Such epitopes can be identified by their comparatively lower binding affinity for the HLA molecule, relative to other epitopes in the molecule or to other molecules that bind that HLA molecule.

[0114] Polypeptides that comprise an amino acid sequence from base1 or base2 which, in turn, comprises an HLA binding motif are also useful for eliciting an immune response. This is in part because the cell processes such proteins into peptides that carry a base1 or base2 epitope and can bind to the HLA molecule.

[0115] In some preferred embodiments, the HLA molecule is HLA-A2. HLA-A2 is the most common HLA class I molecule in most of the world's population; for example, it is present in about 45% of the North American population. Peptides that bind to HLA-A2 typically are 9 to 10 amino acids in length. While the central residues of these peptides (residues 4-8, and residue 9 if the peptide is a 10-amino acid peptide) cannot be varied without some effect on binding or on induction of CTL activity, some variation is tolerated at residues 1-3, at the N-terminal end, and at the last residue, at the C-terminal end. That is, the residues at positions 1, 2, and 3 and at the last position (position 9 or 10, depending on the length of the peptide) have been found to permit certain types of variation. In general, position 1 can be any amino acid. Some literature indicates, however, that substituting tyrosine (Y) at position 1 results in a peptide with better binding to HLA-A2. Thus, Y is a preferred substitution at position 1 in peptides of the invention. It is also considered in the art that position 2 can be selected from the group consisting of L, M, A, I, V, and T, with L and M being preferred and L being particularly preferred. If position 3 is occupied by a hydrophobic residue and that residue is substituted, it is preferably replaced by another hydrophobic residue, such as W or Y. Thus, using the sequences of base1 and base2, peptides can be designed which

[0116] A complex of an HLA molecule and a peptidic antigen acts as the ligand recognized by HLA-restricted T cells (Buus, S. et al., Cell 47:1071 (1986); Babbitt, B. P. et al., Nature 317:359 (1985); Townsend, A. and Bodmer, H., Annu. Rev. Immunol. 7:601 (1989); Germain, R. N., Annu. Rev. Immunol. 11:403 (1993)). Through the study of single amino acid substituted antigen analogs and the sequencing of endogenously bound, naturally processed peptides, critical residues that correspond to motifs required for specific binding to HLA antigen molecules have been identified (see, e.g., Southwood, et al., J. Immunol. 160:3363 (1998); Rammensee, et al., Immunogenetics 41:178 (1995); Rammensee et al., Sette, A. and Sidney, J. Curr. Opin. Immunol. 10:478 (1998); Engelhard, V. H., Curr. Opin. Immunol. 6:13 (1994); Sette, A. and Grey, H. M., Curr. Opin. Immunol. 4:79 (1992)).

[0117] Furthermore, x-ray crystallographic analysis of HLA-peptide complexes has revealed pockets within the peptide binding cleft of HLA molecules which accommodate, in an allele-specific mode, residues borne by peptide ligands; these residues in turn determine the HLA binding capacity of the peptides in which they are present. (See, e.g., Madden, D.R. Annu. Rev. Immunol. 13:587, 1995; Smith, et al., Immunity 4:203, 1996; Fremont et al., Immunity 8:305, 1998; Stern et al., Structure 2:245, 1994; Jones, E.Y. Curr. Opin. Immunol. 9:75, 1997; Brown, J. H. et al., Nature 364:33, 1993.)

[0118] Accordingly, the definition of class I and class II allele-specific HLA binding motifs, or of class I or class II supermotifs, allows identification of regions within base1 or base2 that have the potential to bind particular HLA molecules.

[0119] Molecules with high levels of sequence identity to base1 or base2 are also useful for eliciting an immune response. Such molecules can be recognized as "foreign" by the immune system, yet generate antibodies or CTLs that cross-react with base1 or base2. Analogs of base1 whose amino acid sequences are at least 90% identical to base1 (although they may have 91%, 92%, 93%, 94%, 95%, or even higher sequence identity to base1) and which are specifically bound by antibodies that specifically bind to base1 may be used. Also useful in this regard are base1 analogs, that is, peptides whose amino acid sequences are at least 90% identical to base1 (although they may have 91%, 92%, 93%, 94%, 95%, or even higher sequence identity to base1) and which activate T lymphocytes against cells that express base1. Similarly, analogs of base2 whose amino acid sequences are at least 90% identical to base2 (although they may have 91%, 92%, 93%, 94%, 95%, or even higher sequence identity to base2) and which are specifically bound by antibodies that specifically bind to base2 may be used. Also useful in this regard are base2 analogs, that is, peptides whose amino acid sequences are at least 90% identical to base2 (although they may have 91%, 92%, 93%, 94%, 95%, or even higher sequence identity to base2) and which activate T lymphocytes against cells that express base2.
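A minimal Python sketch of the 90% sequence-identity criterion follows, for illustration only. It assumes the two sequences are already aligned and of equal length with no gaps; real comparisons would normally be made from a gapped alignment produced by an alignment program, which this sketch does not perform. The function names and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_candidate_analog(native: str, analog: str, threshold: float = 90.0) -> bool:
    """True if the analog meets the minimum identity threshold (90% by default)."""
    return percent_identity(native, analog) >= threshold


if __name__ == "__main__":
    native = "ACDEFGHIKL"  # hypothetical 10-residue reference
    analog = "ACDEFGHIKV"  # one substitution -> 90% identity
    print(percent_identity(native, analog), is_candidate_analog(native, analog))
```

Identity alone does not establish usefulness; as the paragraph above states, an analog must also be bound by antibodies to, or activate T lymphocytes against, the native protein.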

[0120] Another molecule that is substantially homologous to a base1 or base2 antigenic determinant can be made by modifying the sequence of a natural base1 or base2 epitope so that it binds the HLA molecule with greater affinity.
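For illustration only, the HLA-A2 analog design of paragraph [0115] can be sketched as a function that enumerates candidate analogs of a native 9- or 10-mer by changing only the permissive anchor positions: tyrosine at position 1 and/or leucine at position 2. Whether any such analog actually binds with greater affinity is an assumption that must be checked experimentally, for example by the in vitro binding assays of paragraph [0131]. The function name and example peptide are hypothetical.

```python
def a2_anchor_analogs(peptide: str) -> list[str]:
    """Candidate HLA-A2 analogs of a native 9- or 10-mer, changing only
    position 1 (to Y) and/or position 2 (to L); other residues are kept."""
    peptide = peptide.upper()
    if len(peptide) not in (9, 10):
        raise ValueError("HLA-A2 peptides are typically 9 or 10 residues long")
    analogs = set()
    for p1 in (peptide[0], "Y"):
        for p2 in (peptide[1], "L"):
            candidate = p1 + p2 + peptide[2:]
            if candidate != peptide:  # skip the unmodified native peptide
                analogs.add(candidate)
    return sorted(analogs)


if __name__ == "__main__":
    # Hypothetical native peptide, not a sequence from base1 or base2.
    for analog in a2_anchor_analogs("AMAAAAAAV"):
        print(analog)
```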

[0121] One method of identifying genes encoding antigenic determinants is as follows. Tumor-infiltrating lymphocytes (TILs) from a subject with metastatic cancer are grown and tested for the ability to recognize the autologous cancer in vitro. These TILs are administered to the subject to identify those that cause tumor regression. The TILs are then used to screen expression libraries for genes that express epitopes recognized by the TILs, and subjects are immunized with these genes. Alternatively, lymphocytes are sensitized in vitro against antigens encoded by these genes, and the sensitized lymphocytes are adoptively transferred into subjects and tested for their ability to cause tumor regression. Rosenberg, et al., Immunol. Today 18:175 (1997).

[0122] The application of these molecules is now described. These methods are also described in Rosenberg et al., supra, and Restifo et al., Oncology 11:50 (1999).

[0123] One method of invoking an immune response involves immunizing the subject with a polypeptide comprising an antigenic determinant from base1 or base2, either alone or, more preferably, combined with an adjuvant, such as Freund's incomplete adjuvant, lipids or liposomes, gp96, Hsp70, or Hsp90. The polypeptide can be base1 or base2, an antigenic fragment of base1 or base2, a fusion protein comprising the antigenic determinant, or a peptide comprising a sequence substantially identical to such an antigenic determinant.

[0124] Another method involves pulsing or "loading" a polypeptide comprising an epitope from base1 or base2 onto antigen presenting cells ("APCs"), transfecting the APCs with nucleic acids encoding such an epitope, or transducing a stem or progenitor cell with such a nucleic acid and differentiating the cells into APCs expressing the epitope. The APCs are then placed into contact with T lymphocytes in vitro, and the T lymphocytes are then administered to the subject. Alternatively, the APCs are themselves administered to the subject to activate endogenous T lymphocytes.

[0125] In another method, a nucleic acid sequence encoding a polypeptide comprising an antigenic determinant from base1 or base2 in an expression cassette is administered to the subject. The nucleic acid optionally also can encode cytokines (e.g., IL-2), a costimulatory molecule, or other genes that enhance the immune response. In some embodiments, the nucleic acid is administered to the subject as naked DNA by, e.g., biolistic injection into a body tissue, such as skin or muscle. Such methods have been shown to stimulate a cell-mediated response against cells that express the encoded polypeptide. Alternatively, the nucleic acid sequence is administered in a virus in which part of the viral genome has been replaced or augmented with the desired nucleic acid sequence. The virus can be, for example, adenovirus, adeno-associated virus, fowlpox virus, or vaccinia virus. Upon infection, the infected cells express the base1 or base2 polypeptide and display the antigenic determinant on the cell surface in combination with the HLA molecule that binds peptides having the same motif as the antigenic determinant. These cells then stimulate the activation of CTLs that recognize the presented antigen, resulting in destruction of cancer cells that also bear the determinant.

[0126] In another method, recombinant bacteria that express the epitope, such as Bacillus Calmette-Guerin (BCG), Salmonella or Listeria, optionally also encoding cytokines, costimulatory molecules or other genes to enhance the immune response, are administered to the subject.

[0127] In yet another method, cells expressing the antigen are administered to the subject. These include, for example, dendritic cells pulsed with base1 or base2 epitopes, and cells transfected with nucleic acids encoding polypeptides comprising base1 or base2 antigenic determinants, along with HLA and B7 genes. The multiple transfection results in the production of several components necessary for presenting the antigenic determinant on the cell surface. In one embodiment, the transfected nucleic acid encodes a fusion protein in which the polypeptide bearing the antigenic determinant is fused to an HLA molecule (usually through a linker) so as to improve binding of the peptide to the HLA molecule. In one embodiment, the cell is an antigen presenting cell. Preferably, the cells are eukaryotic cells, more preferably mammalian cells, more preferably human cells, and most preferably autologous human cells derived from the subject.

[0128] In another method, antigen presenting cells ("APCs") are pulsed or co-incubated with peptides comprising an epitope from base1 or base2 in vitro. These cells are used to sensitize CD8+ cells, such as tumor infiltrating lymphocytes ("TILs") from breast cancer tumors or peripheral blood lymphocytes ("PBLs"). The TILs or PBLs preferably are from the subject; at a minimum, however, they should be restricted to MHC class I molecules of the HLA types that the subject possesses. The sensitized cells are then administered to the subject.

[0129] In a supplemental method, any of these immunotherapies is augmented by administering a cytokine, such as IL-2, IL-3, IL-6, IL-10, IL-12, IL-15, GM-CSF, or an interferon.

[0130] In addition to the methods for evaluating immunogenicity of peptides set forth above, immunogenicity can also be evaluated by examining primary T cell cultures from normal individuals (see, e.g., Wentworth, P. A. et al., Mol. Immunol. 32:603, 1995; Celis, E. et al., Proc. Natl. Acad. Sci. USA 91:2105, 1994; Tsai, V. et al., J. Immunol. 158:1796, 1997; Kawashima, I. et al., Human Immunol. 59:1, 1998); by immunizing HLA transgenic mice (see, e.g., Wentworth, P. A. et al., J. Immunol. 26:97, 1996; Wentworth, P. A. et al., Int. Immunol. 8:651, 1996; Alexander, J. et al., J. Immunol. 159:4753, 1997); and by demonstrating recall T cell responses from patients who have been effectively vaccinated or who have a tumor (see, e.g., Rehermann, B. et al., J. Exp. Med. 181:1047, 1995; Doolan, D. L. et al., Immunity 7:97, 1997; Bertoni, R. et al., J. Clin. Invest. 100:503, 1997; Threlkeld, S. C. et al., J. Immunol. 159:1648, 1997; Diepolder, H. M. et al., J. Virol. 71:6011, 1997).

[0131] In choosing CTL-inducing peptides of interest for vaccine compositions, peptides with higher binding affinity for class I HLA molecules are generally preferable. Peptide binding is assessed by testing the ability of a candidate peptide to bind to a purified HLA molecule in vitro.

[0132] To ensure that a base1 or base2 analog, when used as an immunogen, actually elicits a CTL response to base1 or base2 in vivo (or, in the case of class II epitopes, elicits helper T cells that cross-react with the native peptides), the analog may be used to immunize T cells in vitro from individuals of the appropriate HLA allele. The capacity of the immunized cells to lyse target cells sensitized with base1 or base2 is then evaluated.

[0133] More generally, peptides from base1 or base2, or immunogenic fragments or analogs thereof (each a "peptide of the invention"), can be synthesized and tested for their ability to bind to HLA proteins and to activate HTL or CTL responses, or both.

[0134] Conventional assays utilized to detect T cell responses include proliferation assays, lymphokine secretion assays, direct cytotoxicity assays, and limiting dilution assays. For example, antigen-presenting cells that have been incubated with a peptide can be assayed for the ability to induce CTL responses in responder cell populations.

[0135] Peripheral blood mononuclear cells (PBMCs) may be used as the responder cell source of CTL precursors. The appropriate antigen-presenting cells are incubated with peptide, and the peptide-loaded antigen-presenting cells are then incubated with the responder cell population under optimized culture conditions. Positive CTL activation can be determined by assaying the culture for the presence of CTLs that kill radio-labeled target cells, including both targets pulsed with the specific peptide and target cells expressing endogenously processed forms of the antigen from which the peptide sequence was derived.
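When the radio-labeled targets are read out as label released into the culture supernatant, killing is conventionally expressed as percent specific lysis, which corrects for spontaneous release (targets alone) and maximum release (targets fully lysed, e.g., with detergent). The sketch below applies that standard formula for illustration; the count values and function name are hypothetical.

```python
def percent_specific_lysis(experimental: float, spontaneous: float,
                           maximum: float) -> float:
    """Percent specific lysis from release-assay counts:
    100 * (experimental - spontaneous) / (maximum - spontaneous).

    experimental: counts released with effector cells present
    spontaneous:  counts released by target cells alone
    maximum:      counts released when targets are fully lysed
    """
    if maximum <= spontaneous:
        raise ValueError("maximum release must exceed spontaneous release")
    return 100.0 * (experimental - spontaneous) / (maximum - spontaneous)


if __name__ == "__main__":
    # Hypothetical counts per minute for peptide-pulsed and unpulsed targets.
    pulsed = percent_specific_lysis(1500, 500, 2500)
    unpulsed = percent_specific_lysis(600, 500, 2500)
    print(f"pulsed: {pulsed:.1f}%  unpulsed: {unpulsed:.1f}%")
```

Comparing pulsed with unpulsed (or irrelevant-peptide) targets in this way distinguishes antigen-specific killing from background lysis.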

[0136] One method that allows direct quantification of antigen-specific T cells is staining with fluorescein-labeled HLA tetrameric complexes (Altman et al., Proc. Natl. Acad. Sci. USA 90:10330 (1993); Altman et al., Science 274:94 (1996)). Alternatively, intracellular lymphokine staining, interferon-γ release assays, or ELISPOT assays can be used to evaluate T-cell responses.
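ELISPOT results are commonly reported as spot-forming cells (SFC) per 10^6 input cells after subtracting the background counted in wells without peptide. The following sketch shows that arithmetic, for illustration only, under those assumptions; the replicate counts, cell numbers, and function name are hypothetical.

```python
def sfc_per_million(test_wells: list[int], control_wells: list[int],
                    cells_per_well: int) -> float:
    """Background-subtracted spot-forming cells per 10^6 input cells.

    test_wells:    spot counts from replicate wells with peptide
    control_wells: spot counts from replicate wells without peptide
    """
    mean_test = sum(test_wells) / len(test_wells)
    mean_control = sum(control_wells) / len(control_wells)
    net = max(0.0, mean_test - mean_control)  # negative differences count as zero
    return net * 1_000_000 / cells_per_well


if __name__ == "__main__":
    # Hypothetical counts from 2 x 10^5 PBMCs per well.
    print(sfc_per_million([60, 64], [10, 14], 200_000))
```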

[0137] HTL activation may be assessed using techniques known in the art, such as T cell proliferation and secretion of lymphokines, e.g., IL-2 (see, e.g., Alexander et al., Immunity 1:751-761 (1994)).

ANTIBODIES AGAINST BASE

[0138] The anti-base1 (SEQ ID NO:1) or anti-base2 (SEQ ID NO:2) antibodies generated in the present invention can be linked to effector molecules (EM) through the EM carboxyl terminus, the EM amino terminus, an interior amino acid residue of the EM such as cysteine, or any combination thereof. Similarly, the EM can be linked directly to the heavy, light, Fc (constant region), or framework regions of the antibody. Linkage can occur through the antibody's amino or carboxyl termini, or through an interior amino acid residue. Further, multiple EM molecules (e.g., any one of from 2-10) can be linked to an anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody, and/or multiple antibodies (e.g., any one of from 2-5) can be linked to an EM. The antibodies used in a multivalent immunoconjugate composition of the present invention can be directed to the same or different epitopes of SEQ ID NO:1 or SEQ ID NO:2. In preferred forms, the effector molecule is a detectable label.

[0139] In preferred embodiments of the present invention, the anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody is a monoclonal antibody. Recombinant antibodies, such as an scFv or a disulfide-stabilized Fv antibody, may also be used. Fv antibodies are typically about 25 kDa and contain a complete antigen-binding site with 3 CDRs per heavy chain and per light chain. If the V_H and V_L chains are expressed non-contiguously, the chains of the Fv antibody are typically held together by noncovalent interactions. However, these chains tend to dissociate upon dilution, so methods have been developed to crosslink the chains through glutaraldehyde, intermolecular disulfides, or a peptide linker.

[0140] In some embodiments, the antibody is a single chain Fv (scFv). The V_H and V_L regions of an scFv antibody form a single chain that folds to create an antigen binding site similar to that found in two-chain antibodies. Once folded, noncovalent interactions stabilize the single chain antibody. In a more preferred embodiment, the scFv is recombinantly produced. One of skill will realize that conservative variants of the antibodies of the instant invention can be made. Such conservative variants employed in scFv fragments will retain the critical amino acid residues necessary for correct folding and stabilization between the V_H and V_L regions.

[0141] In some embodiments of the present invention, the scFv antibody is directly linked to the EM through the light chain. However, scFv antibodies can also be linked to the EM via their amino or carboxyl terminus.

[0142] While the V_H and V_L regions of some antibody embodiments can be directly joined together, one of skill will appreciate that the regions may be separated by a peptide linker consisting of one or more amino acids. Peptide linkers and their use are well known in the art. See, e.g., Huston, et al., Proc. Nat'l Acad. Sci. USA 85:5879 (1988); Bird, et al., Science 242:423 (1988); Glockshuber, et al., Biochemistry 29:1362 (1990); U.S. Patent No. 4,946,778; U.S. Patent No. 5,132,405; and Stemmer, et al., Biotechniques 14:256-265 (1993), all incorporated herein by reference. Generally, the peptide linker will have no specific biological activity other than to join the regions or to preserve some minimum distance or other spatial relationship between them. However, the constituent amino acids of the peptide linker may be selected to influence some property of the molecule, such as the folding, net charge, or hydrophobicity. Single chain Fv (scFv) antibodies optionally include a peptide linker of no more than 50 amino acids, generally no more than 40 amino acids, preferably no more than 30 amino acids, and more preferably no more than 20 amino acids in length. In some embodiments, the peptide linker is a concatemer of the sequence Gly-Gly-Gly-Ser, preferably comprising 2, 3, 4, 5, or 6 such sequences. However, it is to be appreciated that some amino acid substitutions within the linker can be made. For example, a valine can be substituted for a glycine.
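By way of illustration only, the linker rules above can be expressed as a short sketch that builds a concatemer of the Gly-Gly-Gly-Ser unit and joins hypothetical V_H and V_L sequences in a V_H-linker-V_L order. The V_H-to-V_L orientation, the default of 4 repeats, and the 20-residue cap (the "more preferably" limit) are assumptions of the sketch; a caller could pass max_linker=30 to allow the 6-repeat (24-residue) linker. The function names and sequences are hypothetical.

```python
UNIT = "GGGS"  # one Gly-Gly-Gly-Ser repeat, in one-letter code


def gly_ser_linker(repeats: int) -> str:
    """Concatemer of 2 to 6 Gly-Gly-Gly-Ser repeats, as described in the text."""
    if not 2 <= repeats <= 6:
        raise ValueError("the text describes 2 to 6 repeats")
    return UNIT * repeats


def assemble_scfv(vh: str, vl: str, repeats: int = 4, max_linker: int = 20) -> str:
    """Join V_H, linker, and V_L into one single-chain protein sequence.

    The V_H-linker-V_L orientation is an assumption; V_L-linker-V_H is also used.
    """
    linker = gly_ser_linker(repeats)
    if len(linker) > max_linker:
        raise ValueError(f"{len(linker)}-residue linker exceeds {max_linker} residues")
    return vh + linker + vl


if __name__ == "__main__":
    # Hypothetical truncated V-region fragments, for illustration only.
    print(assemble_scfv("EVQLVESGG", "DIQMTQSPS", repeats=3))
```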

A. Antibody Production

[0143] Methods of producing polyclonal antibodies are known to those of skill in the art. In brief, an immunogen, preferably isolated SEQ ID NO:1 or SEQ ID NO:2 or an immunogenic fragment bearing SEQ ID NO:1 or SEQ ID NO:2 epitopes, is mixed with an adjuvant, and animals are immunized with the mixture. When appropriately high titers of antibody to the immunogen are obtained, blood is collected from the animal and antisera are prepared. If desired, the antisera are further fractionated to enrich for antibodies reactive to the polypeptide. See, e.g., Coligan, CURRENT PROTOCOLS IN IMMUNOLOGY, Wiley/Greene, NY (1991); and Harlow & Lane, supra, which are incorporated herein by reference.

[0144] A number of immunogens can be used to produce antibodies that specifically bind SEQ ID NO:1 or SEQ ID NO:2. Full-length SEQ ID NO:1 or SEQ ID NO:2 is a suitable immunogen. Typically, the immunogen of interest is a peptide of at least about 3 amino acids; more typically, the peptide is at least 5 amino acids in length, preferably at least 10 amino acids, and more preferably at least 15 amino acids. The peptides can be coupled to a carrier protein (e.g., as a fusion protein) or recombinantly expressed in an immunization vector. Antigenic determinants on peptides to which antibodies bind are typically 3 to 10 amino acids in length. Naturally occurring polypeptides can also be used, in either pure or impure form.
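For illustration only, one common way to prepare peptide immunogens of a chosen minimum length is to tile the protein into overlapping windows. The sketch below assumes 15-residue peptides (the "more preferably" length above) offset by 5 residues, with a final window added so the C-terminus is covered; the offset, function name, and toy sequence are assumptions, not part of the described method.

```python
def overlapping_peptides(protein: str, length: int = 15, step: int = 5) -> list[str]:
    """Tile a protein into peptides of `length` residues starting every `step`
    residues, adding a final window so the C-terminal residues are covered."""
    if length >= len(protein):
        return [protein]
    starts = list(range(0, len(protein) - length + 1, step))
    if starts[-1] != len(protein) - length:
        starts.append(len(protein) - length)
    return [protein[s:s + length] for s in starts]


if __name__ == "__main__":
    # Toy sequence of the 20 standard amino acids, not base1 or base2.
    for pep in overlapping_peptides("ACDEFGHIKLMNPQRSTVWY"):
        print(pep)
```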

[0145] Monoclonal antibodies may be obtained by various techniques familiar to those skilled in the art. Techniques for preparing such monoclonal antibodies are described in, e.g., Stites, et al. (eds.), BASIC AND CLINICAL IMMUNOLOGY (4TH ED.), Lange Medical Publications, Los Altos, CA, and references cited therein; Harlow & Lane, supra; Goding, MONOCLONAL ANTIBODIES: PRINCIPLES AND PRACTICE (2D ED.), Academic Press, New York, NY (1986); Kohler & Milstein, Nature 256:495-497 (1975); and, in particular, Chowdhury, P.S., et al., Mol. Immunol. 34:9 (1997), which discusses one method of generating monoclonal antibodies.

[0146] It is preferred that monoclonal antibodies be made by immunizing an animal with the target antigen or with a nucleic acid sequence that encodes the desired immunogen, such as SEQ ID NO:1 or SEQ ID NO:2. Immunization with non-replicating transcription units that encode a heterologous protein elicits antigen-specific immune responses. After translation into the foreign protein, the protein is processed and presented to the immune system like other cellular proteins. Because it is foreign, an immune response is mounted against the protein and the peptide epitopes derived from it (Donnelly, et al., J. Immunol. Methods 176:145-152 (1994); and Boyer, et al., J. Med. Primatol. 25:242-250 (1996)). This technique has two significant advantages over protein-based immunization. First, it does not require purification of the protein, which is time-consuming at best and, for many membrane proteins, very difficult. Second, because the immunogen is synthesized in a mammalian host, it undergoes proper post-translational modifications and folds into its native structure.

[0147] To immunize with DNA encoding SEQ ID NO:1 or SEQ ID NO:2, cDNA encoding SEQ ID NO:1 or SEQ ID NO:2 is introduced into a plasmid so that transcription of the coding sequence is under the control of a promoter, such as the CMV promoter. The plasmid is then injected into an animal, e.g., subcutaneously, intradermally, or intraperitoneally. As a result, the cDNA is transcribed in the animal into mRNA, SEQ ID NO:1 or SEQ ID NO:2 is translated from the mRNA, and the translated protein undergoes proper post-translational modifications and is expressed. The animal raises antibodies to SEQ ID NO:1 or SEQ ID NO:2, and the serum is monitored for antibody titer.

[0148] Optionally, in addition to the coding region and regulatory elements, the plasmid carries an ampicillin resistance (Amp) gene. The Amp gene is known to have immunostimulatory sequences for Th1 responses necessary for increased antibody production (Sato, et al., Science 273:352-354 (1996)).

[0149] As described above, in some embodiments, the monoclonal antibody can be an scFv. Methods of making scFv antibodies have been described. See Huse, et al., supra; Ward, et al., Nature 341:544-546 (1989); and Vaughan, et al., supra. In brief, mRNA from B cells is isolated and cDNA is prepared. The cDNA is amplified by well-known techniques, such as PCR, with primers specific for the variable regions of immunoglobulin heavy and light chains. The PCR products are purified by, for example, agarose gel electrophoresis, and the nucleic acid sequences are joined. If a linker peptide is desired, nucleic acid sequences that encode the peptide are inserted between the heavy and light chain nucleic acid sequences. The sequences can be joined by techniques known in the art, such as blunt-end ligation, insertion of restriction sites at the ends of the PCR products, or splicing by overlap extension (Chowdhury, et al., Mol. Immunol. 34:9 (1997)). After amplification, the nucleic acid that encodes the scFv is inserted into a vector, again by techniques well known in the art. Preferably, the vector is capable of replicating in prokaryotes and of being expressed in both eukaryotes and prokaryotes.

[0150] scFv can be selected from a phage display library. The procedure described above for synthesizing scFv is followed. After amplification by PCR, the scFv nucleic acid sequences are fused in frame with gene III (gIII), which encodes the minor surface protein gIIIp of the filamentous phage (Marks, et al., J. Biol. Chem. 267:16007-16010 (1992); Marks, et al., Behring Inst. Mitt. 91:6-12 (1992); and Brinkmann, et al., J. Immunol. Methods 182:41-50 (1995)). The phage express the resulting fusion protein on their surface. Because the proteins on the phage surface are functional, phage bearing antibodies that bind SEQ ID NO:1 or SEQ ID NO:2 can be separated from non-binding or lower-affinity phage by panning or antigen affinity chromatography (McCafferty, et al., Nature 348:552-554 (1990)).

[0151] In a preferred embodiment, scFv that specifically bind to SEQ ID NO:1 or SEQ ID NO:2 are found by panning. Panning is done by coating a solid surface with SEQ ID NO:1 or SEQ ID NO:2 and incubating the phage on the surface for a suitable time under suitable conditions. The unbound phage are washed off the solid surface and the bound phage are eluted. Finding the antibody with the highest affinity is dictated by the efficiency of the selection process and depends on the number of clones that can be screened and the stringency with which it is done. Typically, higher stringency corresponds to more selective panning. If the conditions are too stringent, however, the phage will not bind. After one round of panning, the phage that bind to the SEQ ID NO:1- or SEQ ID NO:2-coated plates are expanded in E. coli and subjected to another round of panning. In this way, an enrichment of 2000-fold occurs in 3 rounds of panning. Thus, even when enrichment in each round is low, multiple rounds of panning will lead to the isolation of rare phage and of the genetic material they contain, which encodes the sequence of the highest affinity antibody. The physical link between genotype and phenotype provided by phage display makes it possible to test every member of a cDNA library for binding to antigen, even with large libraries of clones.
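The 2000-fold figure can be unpacked with simple arithmetic. The sketch below assumes, purely for illustration, that every round enriches binders over non-binders by the same factor; under that assumption, 2000-fold over 3 rounds corresponds to roughly 12.6-fold per round (the cube root of 2000). The function names and the starting frequency of 1 binder in 10^6 clones are hypothetical.

```python
def cumulative_enrichment(per_round: float, rounds: int) -> float:
    """Overall enrichment after `rounds` rounds, assuming every round
    enriches binders over non-binders by the same factor."""
    return per_round ** rounds


def per_round_enrichment(total: float, rounds: int) -> float:
    """Per-round factor needed to reach `total` enrichment in `rounds` rounds."""
    return total ** (1.0 / rounds)


def binder_fraction(initial_fraction: float, enrichment: float) -> float:
    """Fraction of binding phage in the pool after the binder:non-binder
    ratio has been multiplied by `enrichment`."""
    odds = initial_fraction / (1.0 - initial_fraction) * enrichment
    return odds / (1.0 + odds)


if __name__ == "__main__":
    factor = per_round_enrichment(2000, 3)
    print(f"per-round enrichment for 2000-fold in 3 rounds: {factor:.1f}")
    # Hypothetical library in which 1 clone in 10^6 binds the antigen.
    for n in range(4):
        frac = binder_fraction(1e-6, cumulative_enrichment(factor, n))
        print(f"after {n} round(s): binder fraction {frac:.2e}")
```

Working with odds rather than raw fractions keeps the model consistent once binders become a large share of the pool, where a fraction cannot simply keep multiplying.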

B. Binding Affinity of Antibodies

[0152] Binding affinity for a target antigen is typically measured or determined by standard antibody-antigen assays, such as competitive assays, saturation assays, or immunoassays such as ELISA or RIA.

[0153] Such assays can be used to determine the dissociation constant of the antibody. The dissociation constant (K_D) is an inverse measure of the affinity of an antibody for an antigen: the lower the K_D, the higher the affinity. Specificity of binding between an antibody and an antigen exists if the dissociation constant (K_D = 1/K, where K is the affinity constant) of the antibody is < 1 μM, preferably < 100 nM, and most preferably < 0.1 nM. Antibody molecules will typically have a K_D in the lower ranges. K_D = [Ab][Ag]/[Ab-Ag], where [Ab] is the concentration at equilibrium of the free antibody, [Ag] is the concentration at equilibrium of the free antigen, and [Ab-Ag] is the concentration at equilibrium of the antibody-antigen complex. Typically, the binding interactions between antigen and antibody include reversible noncovalent associations such as electrostatic attraction, van der Waals forces, and hydrogen bonds.
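As a worked illustration of the K_D relationship above, the following sketch computes K_D from hypothetical equilibrium concentrations, estimates the fraction of antibody sites occupied under a simple 1:1 binding model, and places a K_D into the tiers recited in the text. The 1:1 model, all numeric inputs, and the function names are assumptions for illustration only.

```python
def dissociation_constant(ab_free: float, ag_free: float, complex_conc: float) -> float:
    """K_D = [Ab][Ag]/[Ab-Ag] from equilibrium molar concentrations."""
    return ab_free * ag_free / complex_conc


def fraction_bound(ag_free: float, kd: float) -> float:
    """Fraction of antibody sites occupied at free antigen concentration
    ag_free, assuming simple 1:1 binding: [Ag] / ([Ag] + K_D)."""
    return ag_free / (ag_free + kd)


def affinity_tier(kd: float) -> str:
    """Place a K_D (in molar) into the tiers recited in the text."""
    if kd < 1e-10:   # < 0.1 nM
        return "most preferred"
    if kd < 1e-7:    # < 100 nM
        return "preferred"
    if kd < 1e-6:    # < 1 uM
        return "specific"
    return "below the specificity threshold"


if __name__ == "__main__":
    # Hypothetical equilibrium concentrations (M).
    kd = dissociation_constant(ab_free=2e-9, ag_free=3e-9, complex_conc=1.2e-7)
    print(f"K_D = {kd:.1e} M ({affinity_tier(kd)})")
    print(f"occupancy when [Ag] = K_D: {fraction_bound(kd, kd):.2f}")
```

Note that when the free antigen concentration equals K_D, half of the antibody sites are occupied, which is a convenient way to read a K_D off a saturation curve.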

C. Immunoassays

[0154] The antibodies can be detected and/or quantified using any of a number of well recognized immunological binding assays (see, e.g., U.S. Patents 4,366,241; 4,376,110; 4,517,288; and 4,837,168). For a review of general immunoassays, see also METHODS IN CELL BIOLOGY, VOL. 37, Asai, ed., Academic Press, Inc., New York (1993); and BASIC AND CLINICAL IMMUNOLOGY, 7TH EDITION, Stites & Terr, eds. (1991). Immunological binding assays (or immunoassays) typically utilize a ligand (e.g., SEQ ID NO:1 or SEQ ID NO:2) to specifically bind to and often immobilize an antibody. The antibodies employed in immunoassays of the present invention are discussed in greater detail supra.

[0155] Immunoassays also often utilize a labeling agent to specifically bind to and label the binding complex formed by the ligand and the antibody. The labeling agent may itself be one of the moieties comprising the antibody/analyte complex, i.e., the anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody. Alternatively, the labeling agent may be a third moiety, such as another antibody, that specifically binds to the antibody/SEQ ID NO:1 or SEQ ID NO:2 protein complex.

[0156] In one aspect, a competitive assay is contemplated wherein the labeling agent is a second anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody bearing a label. The two antibodies then compete for binding to the immobilized SEQ ID NO:1 or SEQ ID NO:2. Alternatively, in a non-competitive format, the anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody lacks a label, and a second antibody, which is specific to antibodies of the species from which the anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody is derived (e.g., murine) and which binds the anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody, is labeled.

[0157] Other proteins capable of specifically binding immunoglobulin constant regions, such as Protein A or Protein G, may also be used as the labeling agent. These proteins are normal constituents of the cell walls of staphylococcal (Protein A) and streptococcal (Protein G) bacteria. They exhibit a strong non-immunogenic reactivity with immunoglobulin constant regions from a variety of species (see, generally, Kronvall, et al., J. Immunol. 111:1401-1406 (1973); and Akerstrom, et al., J. Immunol. 135:2589-2592 (1985)).

[0158] Throughout the assays, incubation and/or washing steps may be required after each combination of reagents. Incubation steps can vary from about 5 seconds to several hours, preferably from about 5 minutes to about 24 hours. However, the incubation time will depend upon the assay format, antibody, volume of solution, concentrations, and the like. Usually, the assays will be carried out at ambient temperature, although they can be conducted over a range of temperatures, such as 10°C to 40°C.

[0159] While the details of the immunoassays of the present invention may vary with the particular format employed, the method of detecting anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibodies in a sample containing the antibodies generally comprises the steps of contacting the sample with an antibody that specifically reacts, under immunologically reactive conditions, with the SEQ ID NO:1 or SEQ ID NO:2/antibody complex.

METHODS OF DETECTING CELLS THAT EXPRESS BASE

[0160] In another aspect, this invention provides methods of detecting cells that express BASE. The methods involve detecting either a BASE transcript or polypeptide. Such methods of detection are useful in the detection of BASE-expressing cancers. In particular, breast cancer cells can be distinguished from other cells by the expression of BASE.

[0161] Tissue samples can be selected from any likely site of primary or metastatic cancer, including the breast, and distal sites such as the lymph nodes and other organs. Persons of skill in the art are aware that men, as well as women, suffer from breast cancer. Breast cancer in men is relatively rare, representing only about 1% of all breast cancer cases. Because it is uncommon, however, it is frequently diagnosed at a later stage, which affects the chances for survival. Accordingly, improved diagnosis of breast cancer in men is desirable.

[0162] In one method, a biopsy is performed on the subject and the collected tissue is tested in vitro. Typically, the cells are disrupted by lysing, sonic disruption, osmotic pressure, freezing and thawing, enzymatic treatment, or other means routine in the art to render the proteins of the nucleus accessible without denaturing them. The cellular contents (or the nuclear contents, if the contents have been fractionated) are then contacted, for example, with an anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody. Any immune complexes that result indicate the presence of a BASE protein in the sample. To facilitate such detection, the antibody can be labeled, for example with a radiolabel. Alternatively, the antibody can be coupled to an effector molecule that is labeled. In another method, the cells can be detected in vivo using typical imaging systems. For example, the method can involve the administration to a subject of a labeled composition capable of reaching the cell. The localization of the label is then determined by any of the known methods for detecting the label. Any conventional method of diagnostic imaging can be used. For example, paramagnetic labels can be used for MRI.

[0163] Detection of BASE proteins or transcripts can not only be diagnostic for a BASE-expressing cancer, but can also aid the practitioner in staging the disease. For example, an increase in BASE protein or transcript levels in samples from a patient over time indicates that the patient's disease has progressed, while a decrease may indicate that therapy is proving effective in controlling or reducing the patient's tumor load.

A. Detection of BASE and BASE proteins

[0164] BASE and BASE proteins can be identified by any methods known in the art. In one embodiment, the methods involve detecting a polypeptide with a ligand that specifically recognizes the polypeptide (e.g., an immunoassay). Antibodies are particularly useful for specific detection of SEQ ID NO:1 or SEQ ID NO:2. A variety of antibody-based detection methods are known in the art. These include, for example, radioimmunoassay, sandwich immunoassays (including ELISA), immunofluorescence assays, Western blot, affinity chromatography (affinity ligand bound to a solid phase), and in situ detection with labeled antibodies. Another method for detecting SEQ ID NO:1 or SEQ ID NO:2 involves identifying the respective polypeptide according to its mass through, for example, gel electrophoresis, mass spectrometry or HPLC. Subject samples can be taken from any number of appropriate sources, such as peritoneal fluid, blood or a blood product (e.g., serum), urine, tissue biopsy (e.g., lymph node tissue), etc.

[0165] The SEQ ID NO: 1 or SEQ ID NO:2 proteins can be detected in cells in vitro, in samples from biopsy and in vivo using imaging systems described above.

B. Detection of transcript encoding BASE

[0166] Cells that express BASE transcript can be detected by contacting the sample with a nucleic acid probe that specifically hybridizes with the transcript, and detecting hybridization. This includes, for example, methods of in situ hybridization, in which a labeled probe is contacted with the sample and hybridization is detected by detecting the attached label. However, the amount of transcript present in the sample can be small. Therefore, other methods employ amplification, such as RT-PCR. In these methods, probes are selected that function as amplification primers which specifically amplify the BASE sequences from mRNA. The amplified sequences are then detected using typical methods.
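Probe design rests on base pairing: a perfectly matched antisense probe is the reverse complement of its target window in the transcript, and a few mismatches reduce, but need not abolish, hybridization. The sketch below scans a transcript for the window an antisense probe pairs with best and reports the number of mismatches. It assumes sequences written 5' to 3' in DNA letters (T in place of U); the function names and sequences are short hypothetical examples, not BASE sequences.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def best_binding_site(probe: str, transcript: str) -> tuple[int, int]:
    """Find where an antisense probe pairs best with a transcript.

    A perfectly matched probe is the reverse complement of its target
    window, so each window is compared with reverse_complement(probe).
    Returns (start index of the best window, mismatch count), or
    (-1, len(probe) + 1) if the transcript is shorter than the probe.
    """
    target = reverse_complement(probe)
    n = len(target)
    best_start, best_mm = -1, n + 1
    for start in range(len(transcript) - n + 1):
        window = transcript[start:start + n].upper()
        mm = sum(a != b for a, b in zip(target, window))
        if mm < best_mm:
            best_start, best_mm = start, mm
    return best_start, best_mm


if __name__ == "__main__":
    transcript = "GGATCCATGAAACCCGGGTTT"  # hypothetical transcript fragment
    print(best_binding_site("GGGTTTCAT", transcript))  # perfectly matched probe
    print(best_binding_site("GGGTATCAT", transcript))  # probe with one mismatch
```

In practice, whether a probe with a given number of mismatches still hybridizes depends on its length and on the stringency of the hybridization conditions, which this mismatch count does not model.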

[0167] The probes are selected to specifically hybridize with BASE transcripts. Generally, complementary probes are used. However, probes need not be exactly complementary if they have sufficient sequence homology and length to hybridize under stringent conditions.

PRODUCTION OF IMMUNOCONJUGATES

[0168] Immunoconjugates include, but are not limited to, molecules in which there is a covalent linkage of a therapeutic agent to a targeting molecule, such as an antibody. A therapeutic agent is an agent with a particular biological activity directed against a particular target molecule or a cell bearing a target molecule. One of skill in the art will appreciate that therapeutic agents may include various drugs such as vinblastine, daunomycin and the like, cytotoxins such as native or modified Pseudomonas exotoxin or Diphtheria toxin, encapsulating agents (e.g., liposomes) which themselves contain pharmacological compositions, radioactive agents such as ¹²⁵I, ³²P, ¹⁴C, ³H and ³⁵S, and other labels, target moieties and ligands.

[0169] The choice of a particular therapeutic agent depends on the particular target molecule or cell and the biological effect it is desired to evoke. Thus, for example, the therapeutic agent may be a cytotoxin which is used to bring about the death of a particular target cell. Conversely, where it is merely desired to invoke a non-lethal biological response, the therapeutic agent may be conjugated to a non-lethal pharmacological agent or a liposome containing a non-lethal pharmacological agent.

[0170] With the therapeutic agents and antibodies herein provided, one of skill can readily construct a variety of clones containing functionally equivalent nucleic acids, such as nucleic acids which differ in sequence but which encode the same EM or antibody sequence. Thus, the present invention provides nucleic acids encoding antibodies and conjugates and fusion proteins thereof. In some embodiments, the anti-base1 antibody binds specifically to a linear or a conformational epitope of amino acids 167-179 of SEQ ID NO:1.

A. Recombinant Methods

[0171] Nucleic acid sequences encoding the chimeric molecules of the present invention can be prepared by any suitable method including, for example, cloning of appropriate sequences or by direct chemical synthesis by methods such as the phosphotriester method of Narang, et al, Meth. Enzymol 68:90-99 (1979); the phosphodiester method of Brown, et al, Meth. Enzymol. 68:109-151 (1979); the diethylphosphoramidite method of Beaucage, et al, Tetra. Lett. 22:1859-1862 (1981); the solid phase phosphoramidite triester method described by Beaucage & Caruthers, Tetra. Letts. 22(20): 1859-1862 (1981), e.g., using an automated synthesizer as described in, for example, Needham-VanDevanter, et al. Nucl Acids Res. 12:6159-6168 (1984); and, the solid support method of U.S. Patent No. 4,458,066. Chemical synthesis produces a single stranded oligonucleotide. This may be converted into double stranded DNA by hybridization with a complementary sequence, or by polymerization with a DNA polymerase using the single strand as a template. One of skill would recognize that while chemical synthesis of DNA is limited to sequences of about 100 bases, longer sequences may be obtained by the ligation of shorter sequences.
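One way to picture building a sequence longer than the roughly 100-base synthesis limit is to split it into abutting top-strand oligonucleotides and design bottom-strand "splint" oligonucleotides that bridge each junction, so that neighboring pieces can be annealed and ligated. The sketch below is a hypothetical design helper under those assumptions: the 100-base segment length comes from the text, while the 10-base splint arm, the function names, and the test sequence are illustrative choices, and practical gene-synthesis designs vary.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(PAIRS[base] for base in reversed(seq.upper()))


def design_ligation_assembly(seq: str, max_len: int = 100, arm: int = 10):
    """Split `seq` into abutting top-strand oligos of at most `max_len` bases
    and design one bottom-strand splint per junction, covering `arm` bases on
    each side, to hold neighboring oligos together for ligation."""
    if max_len <= 0 or arm <= 0:
        raise ValueError("max_len and arm must be positive")
    seq = seq.upper()
    segments = [seq[i:i + max_len] for i in range(0, len(seq), max_len)]
    splints = []
    for junction in range(max_len, len(seq), max_len):
        span = seq[max(0, junction - arm):junction + arm]  # bases flanking the nick
        splints.append(reverse_complement(span))            # splint pairs with the top strand
    return segments, splints


if __name__ == "__main__":
    target = ("ACGT" * 63)[:250]  # hypothetical 250-base target
    segments, splints = design_ligation_assembly(target)
    print([len(s) for s in segments], [len(s) for s in splints])
```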

[0172] In a preferred embodiment, the nucleic acid sequences of this invention are prepared by cloning techniques. Examples of appropriate cloning and sequencing techniques, and instructions sufficient to direct persons of skill through many cloning exercises are found in Sambrook, et al., supra, Berger and Kimmel (eds.), supra, and Ausubel, supra. Product information from manufacturers of biological reagents and experimental equipment also provide useful information. Such manufacturers include the SIGMA chemical company (Saint Louis, MO), R&D systems (Minneapolis, MN), Pharmacia LKB Biotechnology (Piscataway, NJ), CLONTECH Laboratories, Inc. (Palo Alto, CA), Chem Genes Corp., Aldrich Chemical Company (Milwaukee, WI), Glen Research, Inc., GIBCO BRL Life Technologies, Inc. (Gaithersburg, MD), Fluka Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs, Switzerland), Invitrogen, San Diego, CA, and Applied Biosystems (Foster City, CA), as well as many other commercial sources known to one of skill.

[0173] Nucleic acids encoding native EM or anti-base1 or anti-base2 antibodies can be modified to form the EM, antibodies, or immunoconjugates of the present invention. Modification by site-directed mutagenesis is well known in the art. Nucleic acids encoding EM or anti-base1 or anti-base2 antibodies can be amplified by in vitro methods. Amplification methods include the polymerase chain reaction (PCR), the ligase chain reaction (LCR), the transcription-based amplification system (TAS), and the self-sustained sequence replication system (3SR). A wide variety of cloning methods, host cells, and in vitro amplification methodologies are well known to persons of skill.

[0174] In a preferred embodiment, immunoconjugates are prepared by inserting the cDNA which encodes an anti-base1 or anti-base2 scFv antibody into a vector which comprises the cDNA encoding the EM. The insertion is made so that the scFv and the EM are read in frame, that is, in one continuous polypeptide which contains a functional Fv region and a functional EM region. In a particularly preferred embodiment, cDNA encoding a diphtheria toxin fragment is ligated to a scFv so that the toxin is located at the carboxyl terminus of the scFv. In a most preferred embodiment, cDNA encoding PE is ligated to a scFv so that the toxin is located at the amino terminus of the scFv.
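Reading the scFv and EM "in frame" means that the junction must not shift the codon frame and that the upstream coding sequence must not carry a stop codon. A minimal check under those assumptions is sketched below; the function name and the toy sequences are hypothetical, and the parts are listed in the N-terminal to C-terminal order of the encoded fusion protein.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(*parts: str) -> str:
    """Join coding sequences 5'->3' (N- to C-terminal order of the fusion
    protein) and check that they form one continuous open reading frame.

    Every part except the last must be a whole number of codons, and no
    stop codon may occur before the final codon of the fused sequence.
    """
    if not parts:
        raise ValueError("at least one part is required")
    for part in parts[:-1]:
        if len(part) % 3:
            raise ValueError("an upstream part is not a whole number of codons; the frame would shift")
    fused = "".join(parts).upper()
    codons = [fused[i:i + 3] for i in range(0, len(fused) - 2, 3)]
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            raise ValueError(f"premature in-frame stop codon at codon {index}")
    return fused


if __name__ == "__main__":
    scfv = "ATGGCCAAA"   # hypothetical scFv coding fragment, stop codon removed
    toxin = "GGTGCATAA"  # hypothetical effector coding fragment ending in a stop
    print(fuse_in_frame(scfv, toxin))
```

Placing the toxin at the other terminus only changes the argument order; the same frame and stop-codon checks apply.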

[0175] Once the nucleic acids encoding an EM, anti-base1 or anti-base2 antibody, or an immunoconjugate of the present invention are isolated and cloned, one may express the desired protein in a recombinantly engineered cell such as bacteria, plant, yeast, insect and mammalian cells as discussed above in connection with the discussion of expression vectors encoding base1 or base2. It is expected that those of skill in the art are knowledgeable in the numerous expression systems available for expression of proteins including E. coli, other bacterial hosts, yeast, and various higher eukaryotic cells such as the COS, CHO, HeLa and myeloma cell lines. No attempt to describe in detail the various methods known for the expression of proteins in prokaryotes or eukaryotes will be made.

[0176] One of skill would recognize that modifications can be made to a nucleic acid encoding a polypeptide of the present invention (i.e., anti-base1 or anti-base2 antibody, PE, or an immunoconjugate formed from their combination) without diminishing its biological activity. Some modifications may be made to facilitate the cloning, expression, or incorporation of the targeting molecule into a fusion protein. Such modifications are well known to those of skill in the art and include, for example, termination codons, a methionine added at the amino terminus to provide an initiation site, additional amino acids placed on either terminus to create conveniently located restriction sites, or additional amino acids (such as poly His) to aid in purification steps.

[0177] In addition to recombinant methods, the immunoconjugates, EM, and antibodies of the present invention can also be constructed in whole or in part using standard peptide synthesis. Solid phase synthesis of the polypeptides of the present invention of less than about 50 amino acids in length may be accomplished by attaching the C-terminal amino acid of the sequence to an insoluble support followed by sequential addition of the remaining amino acids in the sequence. Techniques for solid phase synthesis are described by Barany & Merrifield, THE PEPTIDES: ANALYSIS, SYNTHESIS, BIOLOGY, VOL. 2: SPECIAL METHODS IN PEPTIDE SYNTHESIS, PART A, pp. 3-284; Merrifield, et al., J. Am. Chem. Soc. 85:2149-2156 (1963); and Stewart, et al., SOLID PHASE PEPTIDE SYNTHESIS, 2ND ED., Pierce Chem. Co., Rockford, Ill. (1984). Proteins of greater length may be synthesized by condensation of the amino and carboxyl termini of shorter fragments. Methods of forming peptide bonds by activation of a carboxyl terminal end (e.g., by the use of the coupling reagent N,N'-dicyclohexylcarbodiimide) are known to those of skill.

B. Purification

[0178] Once expressed, the recombinant immunoconjugates, antibodies, and/or effector molecules of the present invention can be purified according to standard procedures of the art, including ammonium sulfate precipitation, affinity columns, column chromatography, and the like (see, generally, R. Scopes, PROTEIN PURIFICATION, Springer-Verlag, N.Y. (1982)). Substantially pure compositions of at least about 90 to 95% homogeneity are preferred, and 98 to 99% or more homogeneity are most preferred for pharmaceutical uses. Once purified, partially or to homogeneity as desired, the polypeptides should be substantially free of endotoxin if they are to be used therapeutically.

[0179] Methods for the expression of single chain antibodies in bacteria such as E. coli, and for their refolding to an appropriate active form, have been described, are well known, and are applicable to the antibodies of this invention. See Buchner, et al., Anal. Biochem. 205:263-270 (1992); Pluckthun, Biotechnology 9:545 (1991); Huse, et al., Science 246:1275 (1989); and Ward, et al., Nature 341:544 (1989), all incorporated by reference herein.

[0180] Often, functional heterologous proteins from E. coli or other bacteria are isolated from inclusion bodies and require solubilization using strong denaturants, and subsequent refolding. During the solubilization step, as is well-known in the art, a reducing agent must be present to separate disulfide bonds. An exemplary buffer with a reducing agent is: 0.1 M Tris pH 8, 6 M guanidine, 2 mM EDTA, 0.3 M DTE (dithioerythritol). Reoxidation of the disulfide bonds can occur in the presence of low molecular weight thiol reagents in reduced and oxidized form, as described in Saxena, et al, Biochemistry 9: 5015-5021 (1970), incorporated by reference herein, and especially as described by Buchner, et al, supra.

[0181] Renaturation is typically accomplished by dilution (e.g., 100-fold) of the denatured and reduced protein into refolding buffer. An exemplary buffer is 0.1 M Tris, pH 8.0, 0.5 M L-arginine, 8 mM oxidized glutathione (GSSG), and 2 mM EDTA.
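The carry-over of denaturant and reductant into the refolding buffer can be estimated by simple mixing arithmetic. The sketch below assumes that a "100-fold" dilution means 1 volume of solubilized protein brought to 100 total volumes with refolding buffer, and uses the exemplary buffer compositions given in [0180] and [0181]; the function name is hypothetical.

```python
def dilute_into(sample: dict[str, float], buffer: dict[str, float], fold: float) -> dict[str, float]:
    """Final molar concentrations when 1 volume of `sample` is made up to
    `fold` total volumes with `buffer` (1 part sample + fold - 1 parts buffer)."""
    if fold < 1:
        raise ValueError("fold must be at least 1")
    names = set(sample) | set(buffer)
    return {
        name: (sample.get(name, 0.0) + buffer.get(name, 0.0) * (fold - 1)) / fold
        for name in names
    }


# Exemplary buffers from [0180] (solubilization) and [0181] (refolding), in mol/L.
SOLUBILIZATION = {"Tris": 0.1, "guanidine": 6.0, "EDTA": 0.002, "DTE": 0.3}
REFOLDING = {"Tris": 0.1, "L-arginine": 0.5, "GSSG": 0.008, "EDTA": 0.002}

if __name__ == "__main__":
    final = dilute_into(SOLUBILIZATION, REFOLDING, fold=100)
    for name in sorted(final):
        print(f"{name}: {final[name] * 1000:.2f} mM")
```

Under this assumption, the guanidine falls from 6 M to about 60 mM and the DTE from 0.3 M to about 3 mM, while the refolding components are only slightly diluted.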

[0182] As a modification to the two-chain antibody purification protocol, the heavy and light chain regions are separately solubilized and reduced and then combined in the refolding solution. A preferred yield is obtained when these two proteins are mixed in a molar ratio such that a 5-fold molar excess of one protein over the other is not exceeded. It is desirable to add excess oxidized glutathione or other oxidizing low molecular weight compounds to the refolding solution after the redox-shuffling is completed.
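Checking the 5-fold limit requires converting each chain preparation from mass to moles. A minimal sketch is shown below, assuming that the mass and molecular weight of each separately solubilized chain are known; the function names and the example masses and molecular weights are hypothetical.

```python
def nanomoles(mass_mg: float, mw_kda: float) -> float:
    """Convert a mass in mg to nmol for a protein of molecular weight mw_kda.
    1 mg of a 1 kDa protein is 1 umol, i.e. 1000 nmol."""
    return mass_mg / mw_kda * 1000.0


def molar_excess(nmol_a: float, nmol_b: float) -> float:
    """Fold molar excess of the more abundant component over the other."""
    return max(nmol_a, nmol_b) / min(nmol_a, nmol_b)


def within_limit(nmol_a: float, nmol_b: float, max_excess: float = 5.0) -> bool:
    """True if neither chain exceeds a `max_excess`-fold molar excess."""
    return molar_excess(nmol_a, nmol_b) <= max_excess


if __name__ == "__main__":
    heavy = nanomoles(mass_mg=10.0, mw_kda=25.0)  # hypothetical heavy-chain preparation
    light = nanomoles(mass_mg=3.0, mw_kda=24.0)   # hypothetical light-chain preparation
    print(f"{molar_excess(heavy, light):.1f}-fold excess; acceptable: {within_limit(heavy, light)}")
```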

PSEUDOMONAS EXOTOXIN AND OTHER TOXINS

[0183] Toxins can be employed with antibodies of the present invention to yield chimeric molecules, such as immunotoxins. Exemplary toxins include ricin, abrin, Diphtheria toxin and subunits thereof, ribotoxin, ribonuclease, saporin, and calicheamicin, as well as botulinum toxins A through F. These toxins are well known in the art and many are readily available from commercial sources (e.g., Sigma Chemical Company, St. Louis, MO).

Diphtheria toxin is isolated from Corynebacterium diphtheriae. Conjugating ribonucleases to targeting molecules for use as immunotoxins is discussed in, e.g., Suzuki et al., Nat. Biotech. 17:265-70 (1999). Exemplary ribotoxins such as α-sarcin and restrictocin are discussed in, e.g., Rathore et al., Gene 190:31-5 (1997) and Goyal and Batra, Biochem. J. 345 Pt 2:247-54 (2000). Calicheamicins were first isolated from Micromonospora echinospora and are members of the enediyne antitumor antibiotic family that cause double strand breaks in DNA that lead to apoptosis. See, e.g., Lee et al., J. Antibiot. 42:1070-87 (1989). The drug is the toxic moiety of an immunotoxin in clinical trials. See, e.g., Gillespie et al., Ann. Oncol. 11:735-41 (2000).

[0184] Ricin is the lectin RCA60 from Ricinus communis (Castor bean). The term also references toxic variants thereof. For example, see, U.S. Patent Nos. 5,079,163 and 4,689,401. Ricinus communis agglutinin (RCA) occurs in two forms designated RCA₆₀ and RCA₁₂₀ according to their molecular weights of approximately 65 and 120 kD, respectively (Nicholson & Blaustein, J. Biochim. Biophys. Acta 266:543 (1972)). The A chain is responsible for inactivating protein synthesis and killing cells. The B chain binds ricin to cell-surface galactose residues and facilitates transport of the A chain into the cytosol (Olsnes, et al, Nature 249:627-631 (1974) and U.S. Patent No. 3,060,165).

[0185] Abrin includes toxic lectins from Abrus precatorius. The toxic principles, abrin a, b, c, and d, have molecular weights of from about 63 to 67 kD and are composed of two disulfide-linked polypeptide chains, A and B. The A chain inhibits protein synthesis; the B chain (abrin-b) binds to D-galactose residues (see Funatsu, et al., Agr. Biol. Chem. 52:1095 (1988); and Olsnes, Methods Enzymol. 50:330-335 (1978)).

[0186] In preferred embodiments of the present invention, the toxin is Pseudomonas exotoxin (PE). The term "Pseudomonas exotoxin" as used herein refers to a full-length native (naturally occurring) PE or a PE that has been modified. Such modifications may include, but are not limited to, elimination of domain Ia, various amino acid deletions in domains Ib, II and III, single amino acid substitutions, and the addition of one or more sequences at the carboxyl terminus such as KDEL (SEQ ID NO:5) and REDL (SEQ ID NO:6). See Siegall, et al., J. Biol. Chem. 264:14256-14261 (1989). In a preferred embodiment, the cytotoxic fragment of PE retains at least 50%, preferably at least 75%, more preferably at least 90%, and most preferably at least 95% of the cytotoxicity of native PE. In a particularly preferred embodiment, the cytotoxic fragment is more toxic than native PE.

[0187] Native Pseudomonas exotoxin A ("PE") is an extremely active monomeric protein (molecular weight 66 kD), secreted by Pseudomonas aeruginosa, which inhibits protein synthesis in eukaryotic cells. The native PE sequence is provided in commonly assigned U.S. Patent No. 5,602,095, incorporated herein by reference. The mechanism of action is ADP-ribosylation, and thereby inactivation, of elongation factor 2 (EF-2). The exotoxin contains three structural domains that act in concert to cause cytotoxicity. Domain Ia (amino acids 1-252) mediates cell binding. Domain II (amino acids 253-364) is responsible for translocation into the cytosol, and domain III (amino acids 400-613) mediates ADP-ribosylation of elongation factor 2. The function of domain Ib (amino acids 365-399) remains undefined, although some or all of it, such as amino acids 365-380, can be deleted without loss of cytotoxicity. See Siegall, et al. (1989), supra.

[0188] PE molecules employed in the present invention include the native sequence, cytotoxic fragments of the native sequence, and conservatively modified variants of native PE and its cytotoxic fragments. Cytotoxic fragments of PE include those which are cytotoxic with or without subsequent proteolytic or other processing in the target cell (e.g., as a protein or pre-protein). Cytotoxic fragments of PE known in the art include PE40, PE38, and PE35.

[0189] In preferred embodiments, the PE has been modified to reduce or eliminate nonspecific cell binding, frequently by deleting domain Ia as taught in U.S. Patent 4,892,827, although this can also be achieved, for example, by mutating certain residues of domain Ia. U.S. Patent 5,512,658, for instance, discloses that a mutated PE in which domain Ia is present but in which the basic residues of domain Ia at positions 57, 246, 247, and 249 are replaced with acidic residues (glutamic acid, or "E") exhibits greatly diminished nonspecific cytotoxicity. This mutant form of PE is sometimes referred to as PE4E.

[0190] PE40 is a truncated derivative of PE as previously described in the art. See Pai, et al., Proc. Nat'l Acad. Sci. USA 88:3358-62 (1991); and Kondo, et al., J. Biol. Chem. 263:9470-9475 (1988). PE35 is a 35 kD carboxyl-terminal fragment of PE in which amino acid residues 1-279 have been deleted and the molecule commences with a methionine at position 280 followed by amino acids 281-364 and 381-613 of native PE. PE35 and PE40 are disclosed, for example, in U.S. Patents 5,602,095 and 4,892,827.

[0191] In some preferred embodiments, the cytotoxic fragment PE38 is employed. PE38 is a truncated PE pro-protein composed of amino acids 253-364 and 381-613 which is activated to its cytotoxic form upon processing within a cell (see e.g., U.S. Patent No. 5,608,039, and Pastan et al., Biochim. Biophys. Acta 1333:C1-C6 (1997)).
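The residue ranges given above for the PE domains ([0187]) and for the PE35 and PE38 fragments ([0190] and [0191]) can be cross-checked with a few lines of code. The sketch below is illustrative only: it counts how many residues each fragment retains from each native domain, and treats the methionine at position 280 of PE35 as one extra residue outside the domain tally. The dictionary and function names are hypothetical.

```python
# Native PE domain boundaries (inclusive residue numbers), as given in [0187].
PE_DOMAINS = {"Ia": (1, 252), "II": (253, 364), "Ib": (365, 399), "III": (400, 613)}

# Native-PE segments retained in each fragment, per [0190] and [0191].
FRAGMENTS = {
    "PE38": [(253, 364), (381, 613)],
    "PE35": [(281, 364), (381, 613)],
}
# Residues not taken from the listed native segments (PE35 starts with Met at position 280).
EXTRA_RESIDUES = {"PE38": 0, "PE35": 1}


def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of residues shared by two inclusive ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def fragment_length(name: str) -> int:
    """Total residues in the fragment, including any added residues."""
    segments = FRAGMENTS[name]
    return sum(end - start + 1 for start, end in segments) + EXTRA_RESIDUES[name]


def domain_content(name: str) -> dict[str, int]:
    """Residues of each native domain retained in the fragment (added Met excluded)."""
    return {
        domain: sum(overlap(segment, span) for segment in FRAGMENTS[name])
        for domain, span in PE_DOMAINS.items()
    }


if __name__ == "__main__":
    for name in FRAGMENTS:
        print(name, fragment_length(name), domain_content(name))
```

The tally shows that both fragments lack domain Ia entirely and keep only residues 381-399 of domain Ib, consistent with the deletion of amino acids 365-380 described in [0187].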

[0192] While in preferred embodiments the PE is PE4E, PE40, or PE38, any form of PE in which non-specific cytotoxicity has been eliminated or reduced to levels at which significant toxicity to non-targeted cells does not occur can be used in the immunotoxins of the present invention, so long as it remains capable of translocation and ADP-ribosylation of EF-2 in a targeted cell.

A. Conservatively Modified Variants of PE

[0193] Conservatively modified variants of PE or cytotoxic fragments thereof have at least 80% sequence similarity, preferably at least 85% sequence similarity, more preferably at least 90% sequence similarity, and most preferably at least 95% sequence similarity at the amino acid level, with the PE of interest, such as PE38.

[0194] The term "conservatively modified variants" applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refer to those nucleic acid sequences which encode identical or essentially identical amino acid sequences, or if the nucleic acid does not encode an amino acid sequence, to essentially identical nucleic acid sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given polypeptide. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are "silent variations," which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes every possible silent variation of the nucleic acid. One of skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide is implicit in each described sequence.
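The idea of silent variation can be made concrete by enumerating every codon combination for a short peptide and confirming that each one translates to the same sequence. The sketch below uses only a partial codon table (just enough for the example) and the hypothetical tripeptide Met-Ala-Lys; with one Met codon, four Ala codons, and two Lys codons it yields 1 × 4 × 2 = 8 silent variants. The function names are illustrative.

```python
from itertools import product

# Partial standard genetic code (RNA codons): only the amino acids used below.
SYNONYMOUS_CODONS = {
    "M": ["AUG"],                       # ordinarily the only Met codon
    "A": ["GCA", "GCC", "GCG", "GCU"],  # the four Ala codons named in the text
    "K": ["AAA", "AAG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMOUS_CODONS.items() for codon in codons}


def translate(rna: str) -> str:
    """Translate an RNA coding sequence using the partial table above."""
    if len(rna) % 3:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def silent_variants(peptide: str):
    """Yield every RNA sequence (from the partial table) encoding `peptide`."""
    for combo in product(*(SYNONYMOUS_CODONS[aa] for aa in peptide)):
        yield "".join(combo)


if __name__ == "__main__":
    variants = list(silent_variants("MAK"))
    print(len(variants), "variants, all encoding", {translate(v) for v in variants})
```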

[0195] As to amino acid sequences, one of skill will recognize that individual substitutions, deletions, or additions to a nucleic acid, peptide, polypeptide, or protein sequence that alter, add, or delete a single amino acid or a small percentage of amino acids in the encoded sequence constitute a "conservatively modified variant" where the alteration results in the substitution of an amino acid with a chemically similar amino acid.
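A percent-similarity score of the kind referred to in [0193] counts aligned positions that are either identical or conservative substitutions. The sketch below assumes already aligned, equal-length sequences and uses one commonly cited six-group scheme of chemically similar amino acids; the grouping, the function names, and the example sequences are assumptions rather than a definition taken from this document, and real comparisons would first require a proper sequence alignment.

```python
# One commonly cited grouping of chemically similar amino acids (an assumption here).
CONSERVATIVE_GROUPS = ["AST", "DE", "NQ", "RK", "ILMV", "FYW"]
GROUP_OF = {aa: i for i, group in enumerate(CONSERVATIVE_GROUPS) for aa in group}


def is_similar(a: str, b: str) -> bool:
    """True if residues are identical or fall in the same conservative group."""
    return a == b or (a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b))


def percent_similarity(reference: str, variant: str) -> float:
    """Percent of aligned positions that are identical or conservative.
    Both sequences must already be aligned and of equal length (no gaps)."""
    if len(reference) != len(variant) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(is_similar(a, b) for a, b in zip(reference.upper(), variant.upper()))
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    score = percent_similarity("ALDKGW", "AIEKGF")  # hypothetical aligned fragments
    print(f"{score:.1f}% similar; meets 80% threshold: {score >= 80.0}")
```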

B. Assaying for Cytotoxicity of PE

[0196] Pseudomonas exotoxins employed in the invention can be assayed for the desired level of cytotoxicity by assays well known to those of skill in the art. Thus, cytotoxic fragments of PE and conservatively modified variants of such fragments can be readily assayed for cytotoxicity. A large number of candidate PE molecules can be assayed simultaneously for cytotoxicity by methods well known in the art. For example, subgroups of the candidate molecules can be assayed for cytotoxicity. Positively reacting subgroups of the candidate molecules can be continually subdivided and reassayed until the desired cytotoxic fragment(s) are identified. Such methods allow rapid screening of large numbers of cytotoxic fragments or conservative variants of PE.

COMPOSITIONS WITH PHARMACEUTICALLY ACCEPTABLE CARRIERS

[0197] In another aspect, this invention provides compositions that comprise a pharmaceutically acceptable carrier and a composition of this invention.

[0198] In one group of embodiments, the composition comprises SEQ ID NO:1 or SEQ ID NO:2, an immunogenic fragment of one of these proteins (e.g., a polypeptide bearing an MHC binding motif), or a SEQ ID NO:1 or SEQ ID NO:2 analog, in an amount effective to elicit a cell-mediated immune response or a humoral response in a subject. Such compositions are useful as vaccines in the therapeutic methods of this invention and for preparing antibodies.

[0199] In another embodiment, the composition comprises a nucleic acid molecule comprising a nucleotide sequence encoding SEQ ID NO:1 or SEQ ID NO:2, an immunogenic fragment of one of these proteins, or a SEQ ID NO:1 or SEQ ID NO:2 analog, operably linked to a promoter. Expression of the nucleic acid in a subject is effective to elicit an immune response against cells expressing SEQ ID NO:1 or SEQ ID NO:2 in the subject. Such compositions are useful in the therapeutic methods of this invention.

[0200] In yet another group of embodiments, the composition may comprise a chimeric molecule comprising a targeting molecule and an effector molecule, such as a detectable label or other molecule to detect cells expressing SEQ ID NO:1 or SEQ ID NO:2, or a cytotoxin, such as PE. In some embodiments, the targeting molecule is an antibody. If the targeting molecule is one capable of binding specifically to a nucleic acid encoding SEQ ID NO:1 or SEQ ID NO:2 (such as a DNA binding protein which can bind specifically to DNA encoding SEQ ID NO:1 or SEQ ID NO:2), then the composition can be used to detect cells which express that nucleic acid.

[0201] The compositions can be prepared in unit dosage forms for administration to a subject. The amount and timing of administration are at the discretion of the treating physician to achieve the desired purposes.

[0202] The compositions for administration will commonly comprise a solution of the agent (e.g., a polypeptide of SEQ ID NO:1 or SEQ ID NO:2, or an immunogenic fragment thereof, where the intent is to raise an immune response, or an immunoconjugate of an anti-base1 or anti-base2 antibody and a detectable label, where the intent is to detect the presence of BASE-expressing cells) dissolved in a pharmaceutically acceptable carrier, preferably an aqueous carrier. A variety of aqueous carriers can be used, e.g., buffered saline and the like. These solutions are sterile and generally free of undesirable matter. These compositions may be sterilized by conventional, well known sterilization techniques. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents, toxicity adjusting agents and the like, for example, sodium acetate, sodium chloride, potassium chloride, calcium chloride, sodium lactate and the like. The concentration of fusion protein in these formulations can vary widely, and will be selected primarily based on fluid volumes, viscosities, body weight and the like in accordance with the particular mode of administration selected and the patient's needs.

[0203] Thus, a typical pharmaceutical immunotoxin composition of the present invention for intravenous administration would be about 0.1 to 10 mg per patient per day. Dosages from 0.1 up to about 100 mg per patient per day may be used. These compositions can be administered to inhibit the growth of cells of BASE-expressing cancers. In these applications, compositions are administered to a patient in an amount sufficient to inhibit growth of BASE-expressing cells. An amount adequate to accomplish this is defined as a "therapeutically effective dose." Amounts effective for this use will depend upon the severity of the disease and the general state of the patient's health. An effective amount of the compound is that which provides either subjective relief of a symptom(s) or an objectively identifiable improvement as noted by the clinician or other qualified observer. Depending on the practitioner's judgment, the compositions can be administered directly into a tumor, administered locally (for example, by catheter or cannula) to the area around a tumor, or administered parenterally. Decisions about the method of administration are routinely made by practitioners based on such criteria as the number, size, and location of the patient's metastases.
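The dosing figures in [0203], together with the statement in [0202] that concentration is chosen according to fluid volume, reduce to simple arithmetic. The Python sketch below only restates the ranges given in the text; the helper names and the 250 ml infusion volume in the usage lines are hypothetical choices for illustration, not values from the source.

```python
# Figures restated from the text: a typical intravenous immunotoxin dose of
# about 0.1-10 mg per patient per day, with doses up to about 100 mg/day possible.
TYPICAL_MIN_MG = 0.1
TYPICAL_MAX_MG = 10.0
EXTENDED_MAX_MG = 100.0


def classify_daily_dose(daily_dose_mg: float) -> str:
    """Place a daily dose relative to the ranges stated in the text."""
    if TYPICAL_MIN_MG <= daily_dose_mg <= TYPICAL_MAX_MG:
        return "typical"
    if TYPICAL_MAX_MG < daily_dose_mg <= EXTENDED_MAX_MG:
        return "extended"
    return "outside stated range"


def concentration_mg_per_ml(daily_dose_mg: float, infusion_volume_ml: float) -> float:
    """Concentration needed to deliver the dose in a given (hypothetical) volume."""
    if infusion_volume_ml <= 0:
        raise ValueError("infusion volume must be positive")
    return daily_dose_mg / infusion_volume_ml


if __name__ == "__main__":
    dose = 5.0      # mg/day, inside the typical range
    volume = 250.0  # ml, hypothetical infusion volume
    print(classify_daily_dose(dose), concentration_mg_per_ml(dose, volume))
```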

[0204] By contrast, compositions intended to raise an immune response will typically be administered to a secluded site, such as a body cavity or the lumen of an organ, rather than into the circulatory or lymph system. Preferably, the administration is intramuscular. Actual methods for preparing administrable compositions will be known or apparent to those skilled in the art and are described in more detail in such publications as REMINGTON'S PHARMACEUTICAL SCIENCE, 19TH ED., Mack Publishing Company, Easton, Pennsylvania (1995).

[0205] Single or multiple administrations of the compositions are given, with the dosage and frequency set as required and tolerated by the patient. In the case of compositions intended to raise an immune response, the composition should provide a sufficient quantity of the proteins of this invention to raise an immune response to BASE-expressing cells. Generally, the dose is sufficient to raise or to heighten a cellular immune response to BASE-expressing cells without producing unacceptable toxicity to the patient. In the case of immunotoxins or other immunoconjugates intended to have a therapeutic effect, the dosage may be administered once but may be readministered periodically until either a therapeutic result is achieved or side effects warrant discontinuation of therapy.

[0206] Controlled release parenteral formulations of the compositions of the present invention can be made as implants, oily injections, or as particulate systems. For a broad overview of protein delivery systems see, Banga, A.J., THERAPEUTIC PEPTIDES AND PROTEINS: FORMULATION, PROCESSING, AND DELIVERY SYSTEMS, Technomic Publishing Company, Inc., Lancaster, PA, (1995) incorporated herein by reference. Particulate systems include microspheres, microparticles, microcapsules, nanocapsules, nanospheres, and nanoparticles. Microcapsules contain the therapeutic protein as a central core. In microspheres the therapeutic is dispersed throughout the particle. Particles, microspheres, and microcapsules smaller than about 1 μm are generally referred to as nanoparticles, nanospheres, and nanocapsules, respectively. Capillaries have a diameter of approximately 5 μm so that only nanoparticles are administered intravenously. Microparticles are typically around 100 μm in diameter and are administered subcutaneously or intramuscularly. See, e.g., Kreuter, J., COLLOIDAL DRUG DELIVERY SYSTEMS, J. Kreuter, ed., Marcel Dekker, Inc., New York, NY, pp. 219-342 (1994); and Tice & Tabibi, TREATISE ON CONTROLLED DRUG DELIVERY, A. Kydonieus, ed., Marcel Dekker, Inc. New York, NY, pp. 315-339, (1992) both of which are incorporated herein by reference.
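The naming and size rules in [0206] can be written as a small lookup. This is a sketch under stated assumptions: the architecture labels ("core", "dispersed", "unspecified") and the function name are hypothetical, the 1 μm cutoff is the approximate boundary given in the text, and the route flag encodes only the text's statement that nanoparticles, not microparticles, are given intravenously.

```python
from typing import Tuple

NANO_CUTOFF_UM = 1.0  # "smaller than about 1 μm" -> nano- prefix, per the text

# Drug as a central core -> capsule; drug dispersed throughout -> sphere.
_BASE_NAMES = {"core": "capsule", "dispersed": "sphere", "unspecified": "particle"}


def classify_particle(diameter_um: float, architecture: str) -> Tuple[str, bool]:
    """Return (name, intravenous_candidate) following the size rules in the text."""
    if diameter_um <= 0:
        raise ValueError("diameter must be positive")
    if architecture not in _BASE_NAMES:
        raise ValueError(f"unknown architecture: {architecture!r}")
    is_nano = diameter_um < NANO_CUTOFF_UM
    name = ("nano" if is_nano else "micro") + _BASE_NAMES[architecture]
    # Capillaries are ~5 μm across, and the text states that only nanoparticles are
    # given intravenously; larger particles go subcutaneously or intramuscularly.
    return name, is_nano


for d, arch in [(0.2, "core"), (20.0, "dispersed"), (100.0, "unspecified")]:
    print(d, arch, classify_particle(d, arch))
```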

[0207] Polymers can be used for ion-controlled release of immunoconjugate compositions of the present invention. Various degradable and nondegradable polymeric matrices for use in controlled drug delivery are known in the art (Langer, R., Accounts Chem. Res. 26:537-542 (1993)). For example, the block copolymer poloxamer 407 exists as a viscous yet mobile liquid at low temperatures but forms a semisolid gel at body temperature. It has been shown to be an effective vehicle for formulation and sustained delivery of recombinant interleukin-2 and urease (Johnston, et al, Pharm. Res. 9:425-434 (1992); and Pec, et al, J. Parent. Sci. Tech. 44(2):58-65 (1990)). Alternatively, hydroxyapatite has been used as a microcarrier for controlled release of proteins (Ijntema, et al, Int. J. Pharm. 112:215-224 (1994)). In yet another aspect, liposomes are used for controlled release as well as drug targeting of the lipid-capsulated drug (Betageri, et al, LIPOSOME DRUG DELIVERY SYSTEMS, Technomic Publishing Co., Inc., Lancaster, PA (1993)). Numerous additional systems for controlled delivery of therapeutic proteins are known. See, e.g., U.S. Pat. Nos. 5,055,303; 5,188,837; 4,235,871; 4,501,728; 4,837,028; 4,957,735; 5,019,369; 5,514,670; 5,413,797; 5,268,164; 5,004,697; 4,902,505; 5,506,206; 5,271,961; 5,254,342; and 5,534,496, each of which is incorporated herein by reference.

DIAGNOSTIC KITS AND IN VITRO USES

[0208] In another embodiment, this invention provides kits for the detection of BASE-expressing cells, or of base1 (SEQ ID NO:1), base2 (SEQ ID NO:2) or an immunoreactive fragment thereof (collectively, a "BASE protein"), in a biological sample. A "biological sample" as used herein is a sample of biological tissue or fluid that contains a BASE protein. Such samples include, but are not limited to, tissue from biopsy, sputum, blood, blood cells (e.g., white cells), and urine. Biological samples also include sections of tissues, such as frozen sections taken for histological purposes.

[0209] Kits will typically comprise an anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody. In some embodiments, the anti-base1 antibody binds to an epitope formed by amino acids 167-179 of SEQ ID NO:1. The anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibody may be an Fv fragment, such as an scFv fragment.
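Residue ranges such as "amino acids 167-179" use 1-based, inclusive numbering, so for the 179-residue base1 protein the range covers the final 13 residues, that is, the C-terminus. The Python sketch below shows the index conversion; the helper name is hypothetical and the sequence is a placeholder of the right length, not the real SEQ ID NO:1.

```python
BASE1_LENGTH = 179  # residues in base1 (SEQ ID NO:1), per the text


def residues(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive numbering."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range {start}-{end} outside sequence of length {len(sequence)}")
    return sequence[start - 1:end]  # shift start to 0-based; slice end is exclusive


# Placeholder of the right length; NOT the real base1 sequence.
placeholder = "A" * (BASE1_LENGTH - 13) + "ACDEFGHIKLMNP"
epitope = residues(placeholder, 167, 179)
print(len(epitope), epitope == placeholder[-13:])
```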

[0210] In addition, the kits will typically include instructional materials disclosing means of use of an anti-base1 (SEQ ID NO:1) or anti-base2 (SEQ ID NO:2) antibody (e.g., for detection of breast cancer cells in a sample). The kits may also include additional components to facilitate the particular application for which the kit is designed. Thus, for example, the kit may additionally contain means of detecting the label (e.g., enzyme substrates for enzymatic labels, filter sets to detect fluorescent labels, appropriate secondary labels such as a sheep anti-mouse-HRP, or the like). The kits may additionally include buffers and other reagents routinely used for the practice of a particular method. Such kits and appropriate contents are well known to those of skill in the art.

[0211] In one embodiment of the present invention, the diagnostic kit comprises an immunoassay. As described above, although the details of the immunoassays of the present invention may vary with the particular format employed, the method of detecting base1 (SEQ ID NO:1) or base2 (SEQ ID NO:2) in a biological sample generally comprises the steps of contacting the biological sample with an antibody which specifically reacts, under immunologically reactive conditions, with SEQ ID NO:1, with SEQ ID NO:2, or with both. If the biological sample comprises cells, cells to be tested for BASE expression will typically be disrupted prior to contact with the antibody. Conveniently, disruption can be by sonication, although other methods known in the art may also be used so long as they do not denature the SEQ ID NO:1 or SEQ ID NO:2 polypeptides or interfere with antibody binding. The antibody is allowed to bind to SEQ ID NO:1, SEQ ID NO:2, or both under immunologically reactive conditions, and the presence of the bound antibody is detected directly or indirectly.

[0212] The antibodies provided herein will be especially useful as diagnostic agents and in in vitro assays to detect the presence of SEQ ID NO:1 or SEQ ID NO:2 in biological samples. For example, the antibodies made by the methods taught herein can be used as the targeting moieties of immunoconjugates in immunohistochemical assays to determine whether a sample contains cells expressing SEQ ID NO:1 or SEQ ID NO:2. If the sample is taken from a tissue of a patient that should not normally express SEQ ID NO:1 or SEQ ID NO:2, detection of either protein would indicate, for example, that a patient not previously known to have a cancer characterized by BASE-expressing cells has such a cancer or, for a patient under treatment for such a cancer, that the treatment has not yet eradicated it. Typically, the cancer is a breast cancer.

[0213] In another set of uses for the invention, immunotoxins targeted by anti-SEQ ID NO:1 or anti-SEQ ID NO:2 antibodies can be used to purge targeted cells from a population of cells in a culture. Thus, for example, cells cultured from a patient having a BASE-expressing cancer can be purged of cancer cells by contacting the culture with immunotoxins which target cells expressing SEQ ID NO:1 or SEQ ID NO:2.

EXAMPLES

Example 1

[0214] To discover new antigens as targets for immunotherapies of breast cancers and secreted proteins for use as diagnostic markers, we have taken a molecular approach to identify membrane and secreted proteins that are present in breast cancers but are not expressed in normal essential tissues. Secretory and integral membrane proteins are translated from mRNA on membrane-bound ribosomes associated with the endoplasmic reticulum. Isolation of the membrane-associated polyribosomal RNA produces an enriched population of transcripts that encode membrane and secretory proteins.

[0215] A high-quality cDNA library enriched for genes that encode membrane and secreted proteins was generated using membrane-associated RNA from six cell lines: four different breast cancer cell lines, one normal breast cell line, and a prostate cancer cell line. This cDNA library was subtracted with RNA from a pool of five libraries derived from liver, kidney, brain, lung and muscle to enrich for genes differentially expressed in breast and prostate cancer while removing or reducing ubiquitously expressed genes. Subtraction of the membrane-associated polyribosomal library with libraries from normal tissues has a two-fold consequence. First, it identifies known genes that are either activated or upregulated in breast or prostate cancer cells, which could facilitate identifying genes that play a role in carcinogenesis. Second, it identifies membrane proteins that are enriched in or specific for breast and prostate. In addition, secreted proteins identified in this way can be used in breast cancer diagnostic tests.

[0216] 943 random clones from the unsubtracted Membrane-Associated Polyribosomal cDNA Library (MAPcL) and 15,581 clones from the subtracted MAPcL were sequenced. The subtracted MAPcL was enriched for genes that encode membrane and secreted proteins and for numerous genes that are associated with or overexpressed in breast and prostate cancers. Of the 15,581 clones sequenced from the subtracted MAPcL, 10,506 mapped to known genes, 4074 mapped to UniGene clusters that are not associated with known genes, and 1001 consist of unknown sequences. Described here is a new breast cancer gene from the subtracted MAPcL designated BASE (breast cancer and salivary gland expression). BASE encodes a secreted protein that is expressed in breast cancer but is not expressed in the tissues used for subtraction, verifying the effectiveness of the method.

Example 2

Generation of a membrane-associated polyribosomal cDNA library

[0217] To generate a breast and prostate cancer cDNA library enriched with genes that encode membrane and secreted proteins, membrane-associated polyribosomal RNA was isolated from four breast cancer cell lines (MCF7, ZR-75-1, SK-BR-3 and MDA-MB-231), one telomerase-immortalized normal breast cell line (hTERT-HME1), and the PSA-producing prostate cancer cell line LNCaP. The addition of the prostate cancer cell line RNA served two purposes. It served as a test of the approach, as the library was expected to be enriched in PSA. It also provided an opportunity to discover new genes expressed in prostate cancers. It was previously shown using cDNA microarray analysis that there are two main subgroups of breast tumors based on their gene expression profiles: ER (estrogen receptor) positive and ER negative (Perou et al, Nature 406, 747-752 (2000)). In addition, overexpression of ErbB2 correlated with low levels of the ER. Because numerous breast cancer cell lines are available, four were chosen to represent the recognized range of phenotypic diversity of breast tumors. MCF7 and ZR-75-1 both express the ER and express ErbB2 at low levels. SK-BR-3 and MDA-MB-231 do not express the ER. SK-BR-3 contains gene amplifications of ErbB2, whereas MDA-MB-231 expresses ErbB2 at low levels. Membrane-associated polyribosomal RNA was isolated individually from the six cell lines, and the RNA was pooled. A cDNA library was generated from the pooled membrane-associated polyribosomal RNA as described in Methods. The library contains 2.01 × 10⁷ cfu total with an average insert size of 2 kb.

Subtraction of the initial MAPcL

[0218] To remove ubiquitously expressed genes and enrich for genes specifically expressed in breast and prostate cancers, the initial MAPcL was subtracted using biotinylated RNA generated from five normal libraries (brain, liver, lung, kidney, and skeletal muscle) as described herein. The efficiency of subtraction was determined by measuring the level of a housekeeping gene, eukaryotic translation elongation factor 1α1 (EEF1A1). EEF1A1 was reduced 85-fold in the subtracted library.

[0219] To determine which genes are represented in the initial and subtracted MAPcLs, one sequencing reaction was performed on the 5' end of each of 943 unsubtracted and 15,581 subtracted clones chosen at random. The sequences from the two libraries were compared to known genes in RefSeq, a public database of curated genes (Pruitt et al, Nucleic Acids Res., 29, 137-140 (2001)). The most abundant known genes in the initial and subtracted libraries are shown in Tables 1 and 2. Although the initial library was made from enriched membrane-associated polyribosomal RNA, it still contained cDNAs derived from highly expressed genes that encode soluble, housekeeping proteins, such as GAPD and EEF1A1. Subtraction of the initial library successfully removed these contaminating sequences: of the 15,581 sequenced clones, none encoded GAPD and only one encoded EEF1A1. Furthermore, the most abundant gene was PSA, a prostate-specific secreted protein highly expressed in the LNCaP cell line (Table 2).

Percentage of cDNA inserts representing the entire coding region

[0220] Because all sequencing reactions were performed from the 5' end, it is possible to determine what percentage of the cDNA inserts encode full-length transcripts of known genes. Using BLASTX to compare translated MAPcL sequences to the RefSeq protein database, it was determined that 30% of the MAPcL sequences contain the 5' end of the encoded proteins (Altschul et al, J. Mol. Biol, 215, 403-410 (1990)).
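The 30% figure in [0220] is a proportion over BLASTX hits. The text does not state the exact criterion, so the Python sketch below assumes one: a read "contains the 5' end" if its best alignment begins at residue 1 of the matched RefSeq protein (typically the initiating methionine). BLAST tabular output reports this position as `sstart`; the `BestHit` record, the function name and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BestHit:
    """Hypothetical record: best BLASTX hit of one MAPcL read against RefSeq protein."""
    query_id: str
    subject_start: int  # 1-based residue of the RefSeq protein where the alignment begins


def fraction_with_protein_start(hits: Iterable[BestHit], max_start: int = 1) -> float:
    """Fraction of reads whose alignment reaches the protein's N-terminus.

    A read counts if its alignment begins at or before residue `max_start`
    of the matched protein (default: residue 1).
    """
    hits = list(hits)
    if not hits:
        raise ValueError("no hits supplied")
    covered = sum(1 for hit in hits if hit.subject_start <= max_start)
    return covered / len(hits)


# Toy data: 3 of 10 reads begin at residue 1.
toy = [BestHit(f"read{i}", 1 if i < 3 else 40) for i in range(10)]
print(fraction_with_protein_start(toy))
```

Relaxing `max_start` to a few residues would tolerate alignments that stop just short of the start codon; which tolerance the original analysis used is not stated.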

Quantitation of genes encoding membrane and secreted proteins

[0221] To quantify the enrichment of clones encoding membrane and secreted proteins, MAPcL sequences representing known genes were assessed using the Gene Ontology Consortium (GO) database (The Gene Ontology Consortium, Genome Res., 11, 1425-1433 (2001)). According to the cellular location classification from the GO database, 49% of the known genes in the subtracted MAPcL encode membrane or secreted proteins. In contrast, only 14% of known genes encode membrane or secreted proteins from a control library derived from unfractionated mRNA from an adenocarcinoma breast tissue (see Methods). Cellular locations of the most abundant genes in the initial and subtracted libraries are listed in Tables 1 and 2.

Unknown sequences

[0222] The 15,581 sequences from clones of the subtracted MAPcL were classified as either known or unknown based on a BLAST analysis (Table 3) (Altschul et al, J. Mol. Biol, 215, 403-410 (1990)). Sequences were labeled as known if they aligned to a gene sequence in the RefSeq database; otherwise, they were labeled unknown. Of 15,581 MAPcL sequences, 10,506 sequences aligned with 3814 RefSeq genes. The remaining 5075 unknown sequences were divided into three groups: (1) 4074 sequences aligned with 2382 UniGene clusters which were not associated with known genes, (2) 354 sequences, representing 342 unique transcripts, overlapped with ESTs that were not part of any UniGene clusters, and (3) 647 sequences, representing 457 unique transcripts, did not overlap any known sequences (Boguski et al, Nat. Genet., 4, 332-333 (1993)). Numbers of sequences with each classification are given in Table 3.

[0223] The 5075 sequences from the subtracted MAPcL that are not associated with known genes were examined to narrow the search for genes encoding potential immunotherapy targets. Candidate sequences chosen for further study either align to EST sequences derived only from nonessential tissue libraries, have alternative splice forms different from those of ESTs derived from essential tissues, or do not align with any ESTs. The sequences were aligned to the human genome using BLAT from the GoldenPath project (December 2001 build), and the genomic region around these sequences was surveyed for evidence of gene structure based on other ESTs that were also aligned to the genome (Kent et al, Genome Res., 11, 1541-1548 (2001); Kent et al, Genome Res., 12, 996-1006 (2002); Kent, W.J., Genome Res. 12, 656-664 (2002)). MAPcL sequences that appeared to represent the 5' end of genes containing ESTs from excluded tissues were eliminated. In addition, all candidate MAPcL sequences contain a predicted open reading frame based on the sequence obtained from one reaction from the 5' end.
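The selection criteria in [0223] combine into one boolean rule: a sequence must carry a predicted ORF and must not be the 5' end of a gene with ESTs from excluded tissues, and then it qualifies if it has no ESTs, only nonessential-tissue ESTs, or a splice form not seen in essential tissues. The Python sketch below encodes that rule. The `MapclSequence` record and its fields are hypothetical, and the tissue set only lists the examples named elsewhere in the text (breast, ovary, testis).

```python
from dataclasses import dataclass
from typing import FrozenSet

# Examples of nonessential tissues named in the text; a real list would be longer.
NONESSENTIAL_TISSUES: FrozenSet[str] = frozenset({"breast", "ovary", "testis"})


@dataclass(frozen=True)
class MapclSequence:
    """Hypothetical summary of one MAPcL read after genome alignment."""
    name: str
    est_tissues: FrozenSet[str]       # source tissues of ESTs that align with the read
    novel_splice_form: bool           # splicing differs from ESTs of essential tissues
    is_5prime_of_excluded_gene: bool  # 5' end of a gene with ESTs from excluded tissues
    has_predicted_orf: bool           # ORF predicted from the single 5' read


def is_candidate(seq: MapclSequence,
                 nonessential: FrozenSet[str] = NONESSENTIAL_TISSUES) -> bool:
    """Apply the selection criteria described in [0223] to one sequence."""
    if not seq.has_predicted_orf or seq.is_5prime_of_excluded_gene:
        return False
    no_ests = not seq.est_tissues
    only_nonessential = bool(seq.est_tissues) and seq.est_tissues <= nonessential
    return no_ests or only_nonessential or seq.novel_splice_form


reads = [
    MapclSequence("no_est", frozenset(), False, False, True),
    MapclSequence("breast_only", frozenset({"breast", "testis"}), False, False, True),
    MapclSequence("liver", frozenset({"liver"}), False, False, True),
    MapclSequence("liver_new_splice", frozenset({"liver"}), True, False, True),
]
print([r.name for r in reads if is_candidate(r)])
```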

Characterization of BASE

[0224] A new sequence from the subtracted MAPcL that fits the above criteria was experimentally characterized and designated BASE (breast cancer and salivary gland expression). Figure 1 shows the cDNA sequence of BASE (kae08h07) aligned to chromosome 20. Initially, a single 5' sequencing reaction was performed, and the sequence was aligned with the human genome (Fig. 1). Completion of the full-length sequence (Fig. 1) shows that BASE has an open reading frame encoding a 19.5 kDa protein. Analysis of the amino acid sequence of BASE using the PSort program predicts it to be a secreted protein (Nakai et al, Trends Biochem. Sci., 24, 34-36 (1999)). Three additional MAPcL cDNA sequences align with the kae08h07 sequence; however, no ESTs in the dbEST database (Boguski et al, Nat. Genet., 4, 332-333 (1993)) align with BASE (Fig. 1). Since coverage of the EST database is incomplete (Zhang et al, Science, 276, 1268-1272 (1997)), expression specificity of BASE had to be verified experimentally.

[0225] Because membrane-associated RNAs derived from diverse cell lines were used to make the MAPcL, we determined which of the cell lines express BASE using reverse transcriptase (RT)-PCR analysis of the membrane-associated polyribosomal RNA (Fig. 2a). The specific primers used for PCR are located in separate exons of BASE and amplify a 464-bp fragment (Fig. 1). BASE was strongly expressed in the breast cancer cell line ZR-75-1 and expressed at low levels in SK-BR-3 and MCF7. No expression was detected in the normal breast cell line hTERT-HME1, and weak expression was observed in LNCaP (Fig. 2a). As a control for the quality of the generated cDNA, separate PCR reactions were performed using primers to the transferrin receptor (Fig. 2a).

[0226] For the potential proteins encoded by the MAPcL genes to be used as therapeutic targets or diagnostic markers, the genes must be expressed in breast cancers. To examine expression of BASE in breast cancers, an RT-PCR analysis was performed (Fig. 2b). Total RNA was isolated from 8 primary and 3 metastatic frozen breast cancer samples from patients, and the RNA was used as a template to generate cDNA. The BASE-specific primers used for PCR are shown in Figure 1. BASE was expressed in 5 primary breast cancer samples (Fig. 2b, lanes 2, 5, 7, 8, and 11) and one metastatic sample (Fig. 2b, lane 10). As a positive control for the PCR reactions, pKAE08h07 was used as a template (Fig. 2b, lane 12). Separate PCR reactions were performed using actin primers to verify the quality of the generated cDNA (Fig. 2b). Expression of BASE in breast cancers was confirmed using in situ hybridization analysis.

[0227] The expression profile of BASE was analyzed using a human Multiple Tissue Expression (MTE™) array containing mRNA from 61 different normal tissues. When the cDNA insert of the BASE clone pKAE08h07 was used as a probe, it reacted only with salivary gland mRNA (Fig. 3a, E9). Expression of BASE was not detected in the mammary gland mRNA sample (Fig. 3a, F9). Next, PCR analysis was performed using a rapid scan panel containing cDNA samples derived from 24 normal tissues (Fig. 3b). The rapid scan panel revealed an abundant 464-bp PCR product from cDNA derived from salivary gland (Fig. 3b, lane 13), confirming the dot blot result. To verify the quality of the cDNA templates, separate PCR reactions were performed using actin primers, and bands of equal intensity were observed (Fig. 3b).

[0228] To determine the transcript length of BASE and to verify that the cDNA clone represents the full-length transcript, Northern blot analysis was performed. Salivary gland and ZR-75-1 mRNA were probed with the 1.4 kb cDNA insert of kae08h07 (Fig. 3c). Two bands were observed: a more abundant transcript of approximately 2.3 kb and a less abundant transcript of approximately 1.7 kb. The insert size of the kae08h07 BASE clone is 1.4 kb, which corresponds to the 1.7 kb transcript once the polyA tail is added. The 2.3 kb transcript indicates that there may be alternatively spliced forms of BASE. This is consistent with the presence of two MAPcL clones, kae130b03 and kae117b07, which overlap kae08h07 but have an extended exon (Fig. 1). A strong actin band was observed with all the samples.

[0229] BASE is an example of a new breast cancer gene from the subtracted MAPcL that does not overlap any other sequences in the human database.

Example 3

[0230] This Example discusses the studies resulting in the present invention.

[0231] A molecular approach was used to identify new genes encoding membrane and secreted proteins expressed in breast cancers that have limited expression in normal tissues for use as immunotherapy targets and diagnostic markers. To increase the chances of finding new membrane and secreted proteins, membrane-associated polyribosomal mRNA, which encodes membrane and secreted proteins, was isolated from four breast cancer cell lines, one normal breast cell line and a prostate cancer cell line. A cDNA library was generated and subsequently subtracted with five different libraries made from normal tissues to reduce ubiquitously expressed genes and enrich for genes expressed in breast and prostate cancers. This approach was especially feasible for breast cancer because numerous cell lines are available with a diverse range of phenotypes.

[0232] To determine what genes are represented in the subtracted MAPcL, an unbiased method of randomly sequencing 15,581 clones was used. Forty-five percent of the non-redundant MAPcL sequences did not align with any known genes, as determined by BLAST against the RefSeq database (Table 3), indicating that this approach was very successful in identifying new genes. Of the 2,382 UniGene clusters represented by the MAPcL sequences not associated with known genes, 27 align with ESTs derived only from libraries made from nonessential tissues, such as breast, ovary and testis. Of the 342 unique sequences that align with non-UniGene ESTs, 50 sequences have alignment restricted to ESTs derived from nonessential tissues. Lastly, 457 unique sequences align with no ESTs, and consequently information about tissue specificity of expression is not available for these clones. Most of these sequences probably represent transcripts that have not previously been detected. However, this number may be an overestimate because the MAPcL clones have an average insert size of 1800 bp, while the sequences are 578 bp on average and are generated from the 5' end of the cDNA clones. While we have estimated that 30% of the MAPcL cDNA inserts contain the entire ORF of the encoded protein, UniGene EST sequences in the database most often consist of the 3' end of transcripts. Consequently, the 5' sequences from the full-length MAPcL clones may not overlap with the corresponding 3' EST clusters.

[0233] The studies reported herein uncovered the expression pattern of a new breast cancer gene, designated BASE. BASE does not overlap any ESTs in dbEST, and BASE was not expressed in the organs used for subtraction. Among the normal tissues examined, BASE was expressed only in salivary gland (Fig. 3a & b). Most importantly, although the gene was identified using tissue culture cell lines, BASE was also expressed in both primary and metastatic human breast cancers as determined by RT-PCR (Fig. 2b) and in situ hybridization. The PSort program predicts that BASE is a secreted protein (Nakai et al, Trends Biochem. Sci., 24, 34-36 (1999)). Results of a protein BLAST search indicate that BASE shares sequence similarity with Latherin, a 228 amino acid protein that is a major component of horse sweat and is responsible for rendering hydrophobic surfaces wettable by water (Beeley et al, Biochem. J, 235, 645-650 (1986)). BASE, a 179 amino acid protein, is 42% identical and 63% similar to the first 178 amino acids of Latherin.
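Percent identity, as quoted for BASE and Latherin in [0233], is computed over the columns of a pairwise alignment. The Python sketch below assumes the convention used in BLAST reports, where identities are counted over the full alignment length with gap columns in the denominator. The function name and the toy sequences are hypothetical, and "similarity" (BLAST's "positives", which also counts substitutions scoring positively in a matrix such as BLOSUM62) is deliberately not reproduced.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Gap columns count toward the denominator but never as matches.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# Toy alignment: 5 identical columns out of 7 (one mismatch, one gap).
print(round(percent_identity("ACDE-FG", "ACDQKFG"), 1))
```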

[0234] Subtraction of the initial library using biotinylated RNA derived from normal tissues enriched for transcripts that encode membrane and secreted proteins and genes that are upregulated in breast and prostate cancers (Table 2). Based on the cellular location of proteins encoded by the known genes in the library, 49% of the MAPcL clones encode membrane or secreted proteins. Furthermore, 12 of the 15 most abundant genes represented in the subtracted MAPcL encode either secreted or membrane proteins (Table 2). The most abundant gene from the subtracted MAPcL is kallikrein 3 (Table 2). The abundance of this gene is a good verification for subtraction of the library because it is expressed by the prostate cancer cell line LNCaP, encodes a secreted protein and has expression associated with prostate tissue (Table 2).

[0235] Some of the most abundant genes represented in the subtracted MAPcL are associated with or upregulated in breast cancer. The eighth most abundant gene from the subtracted library is ErbB2, which is a membrane protein. All of the breast cancer cell lines used to make the MAPcL express ErbB2, with SK-BR-3 containing gene amplification of ErbB2. Keratin proteins are used as markers for the detection of epithelial cells (Ronnov-Jessen et al, Physiol Rev., 76, 69-125 (1996)). The breast cancer cell lines MCF7, ZR-75-1, SK-BR-3 and MDA-MB-231 have been shown to produce large amounts of Keratin 8, Keratin 18 and Keratin 19 (Trask et al, Proc. Natl Acad. Sci. U.S.A., 87, 2319-2323 (1990)). Furthermore, expression of Keratin 8 and Keratin 18 is normally maintained in carcinomas, whereas expression of other keratin family members is frequently lost (Oshima et al, Cancer Metastasis Rev., 15, 445-471 (1996)). Keratin 18 is the third most abundant gene represented in the subtracted MAPcL (Table 2), and there are 20 MAPcL clones that encode Keratin 8 and 20 clones that encode Keratin 19. In addition, MUC1, which is upregulated about 10-fold in 90% of breast tumors (Hadden, J.W., Int. J. Immunopharmacol, 21, 79-101 (1999)), is encoded by 12 cDNA clones.

[0236] Finally, with over 3000 unknown transcripts and 49% of the MAPcL clones encoding membrane or secreted proteins, this library may contain numerous genes encoding therapeutic targets or diagnostic proteins for breast cancer.

Table 1 • Initial library list of the most abundant MAPcL genes

Count^a | Symbol | Name | Cellular location^b
9 | FN1 | fibronectin 1 | secreted
7 | GAPD | glyceraldehyde-3-phosphate dehydrogenase | cytoplasm
7 | KRT8 | keratin 8 | cytoskeletal
6 | EEF1A1 | eukaryotic translation elongation factor 1 α 1 | cytoplasm
6 | GRP58 | glucose regulated protein, 58kD | er^c
6 | KRT18 | keratin 18 | cytoskeletal
5 | SSR2 | signal sequence receptor, β | er^c
5 | TRA1 | tumor rejection antigen (gp96) 1 | membrane
4 | EIF4A2 | eukaryotic translation initiation factor 4A, isoform 2 | cytoplasm
4 | ANXA2 | annexin A2 | membrane
4 | RPL4 | ribosomal protein L4 | cytoplasm
4 | PPIB | peptidylprolyl isomerase B (cyclophilin B) | er^c
4 | CD151 | CD151 antigen | membrane
4 | P4HB | procollagen-proline, 2-oxoglutarate 4-dioxygenase, β | er^c
3 | DKFZP566C243 | DKFZP566C243 protein | mitochondria

^a Out of 943 sequences from the initial library.
^b Determined from GO, LocusLink and OMIM.
^c er, endoplasmic reticulum.

Table 2. Subtracted MAPcL list of the most abundant genes

Count^a | Symbol | Name | Cellular location^b
87 | KLK3 | kallikrein 3 (prostate specific antigen) | secreted
86 | SLC7A5 | solute carrier family 7 (cationic amino acid transporter, y+ system), member 5 | membrane
59 | KRT18 | keratin 18 | cytoskeletal
58 | ITGA3 | integrin, α 3 (α 3 subunit VLA-3 receptor) | membrane
56 | ITGB4 | integrin, β 4 | membrane
48 | LENG4 | leukocyte receptor cluster (LRC) member 4 | membrane
43 | LAMC2 | laminin, γ 2 | secreted
42 | ERBB2 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 | membrane
41 | KRT14 | keratin 14 | cytoskeletal
40 | SPINT2 | serine protease inhibitor, Kunitz type, 2 | membrane
40 | BSG | basigin (OK blood group) | membrane
35 | LGALS3BP | lectin, galactoside-binding, soluble, 3 binding protein | secreted
34 | KRT17 | keratin 17 | cytoskeletal
34 | SLC16A3 | solute carrier family 16 (monocarboxylic acid transporters), member 3 | membrane
30 | GRN | granulin | secreted

^a Out of 15,581 sequences from the subtracted library. ^b Determined from GO, LocusLink and OMIM.
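The counts in Table 2 can be turned into relative abundances with a few lines of code. The sketch below is illustrative only: the dictionary simply transcribes Table 2, and the helper name `ranked_abundance` is hypothetical. It ranks genes by clone count and expresses each as a percentage of the 15,581 sequences in the subtracted library, which also confirms that KRT18 is the third most abundant gene.

```python
# Clone counts transcribed from Table 2 (subtracted MAPcL).
TABLE2_COUNTS = {
    "KLK3": 87, "SLC7A5": 86, "KRT18": 59, "ITGA3": 58, "ITGB4": 56,
    "LENG4": 48, "LAMC2": 43, "ERBB2": 42, "KRT14": 41, "SPINT2": 40,
    "BSG": 40, "LGALS3BP": 35, "KRT17": 34, "SLC16A3": 34, "GRN": 30,
}
TOTAL_SEQUENCES = 15_581  # footnote a of Table 2


def ranked_abundance(counts, total):
    """Return (symbol, count, percent of all sequences), highest count first.

    sorted() is stable, so tied genes keep their Table 2 order.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(sym, n, 100.0 * n / total) for sym, n in ranked]


for rank, (sym, n, pct) in enumerate(ranked_abundance(TABLE2_COUNTS, TOTAL_SEQUENCES), start=1):
    print(f"{rank:>2} {sym:<9} {n:>3} {pct:.2f}%")
```

Together the 15 listed genes account for 733 of the 15,581 subtracted sequences (about 4.7%).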

Table 3. Classification of the subtracted MAPcL sequences

Category | No. of sequences | No. of genes represented | %^f
RefSeq genes^a | 10506 | 3814^d | 54.5
Non-RefSeq UniGene clusters^b | 4074 | 2382^d | 34.0
Non-UniGene, EST hits^c | 354 | 342^e | 5.0
No EST hits | 647 | 457^e | 6.5
Total | 15581 | 6995 | 100

^a Determined by BLAST against the RefSeq database. ^b Determined by BLAST against the NT and dbEST databases. ^c Determined by BLAST against the dbEST database. ^d Determined by counting unique RefSeq genes and UniGene clusters. ^e Determined by BLAST of the MAPcL sequences against themselves. ^f Percentage of total genes represented.
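The totals and percentage column of Table 3 follow directly from the per-category counts. The minimal sketch below (the `summarize` helper is hypothetical, and the counts are transcribed from the table) recomputes both. Recomputing from the gene counts gives 54.5, 34.1, 4.9 and 6.5, each within 0.1 of the printed values. The printed column appears to have been rounded so that it sums to exactly 100.

```python
# (number of sequences, number of genes represented) per category, from Table 3.
TABLE3 = {
    "RefSeq genes": (10506, 3814),
    "Non-RefSeq UniGene clusters": (4074, 2382),
    "Non-UniGene, EST hits": (354, 342),
    "No EST hits": (647, 457),
}


def summarize(table):
    """Column totals plus each category's share of all genes, rounded to 0.1%."""
    total_seqs = sum(seqs for seqs, _ in table.values())
    total_genes = sum(genes for _, genes in table.values())
    pct = {name: round(100.0 * genes / total_genes, 1) for name, (_, genes) in table.items()}
    return total_seqs, total_genes, pct


total_seqs, total_genes, pct = summarize(TABLE3)
print(total_seqs, total_genes)  # 15581 6995
for name, p in pct.items():
    print(f"{name:<28} {p:5.1f}%")
```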

Example 4

[0237] This Example sets out the methods used in the studies resulting in the present invention.

[0238] Primers. The following primers were used: KAE08h07-For (5'-CAAGCCCTTAATGATTTGACTC-3') (SEQ ID NO:7); KAE08h07-Rev (5'-AGGTTTCTCTCTATGTTTGCCAC-3') (SEQ ID NO:8); Transferrin-For (5'-CATTCTCTAACTTGTTTGGTGG-3') (SEQ ID NO:9); and Transferrin-Rev (5'-CCAGGTAAACAAGTCTACCG-3') (SEQ ID NO:10). These primers were synthesized by Lofstrand Labs Ltd. (Gaithersburg, Maryland). The primers Actin-For (5'-GCATGGGTCAGAAGGAT-3') (SEQ ID NO:11) and Actin-Rev (5'-CCAATGGTGATGACCTG-3') (SEQ ID NO:12) were purchased from OriGene Technologies (Rockville, Maryland).
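Basic properties of the listed primers can be checked quickly, for example against the 60°C annealing step used in the PCR protocol. The sketch below is not part of the described method. It swaps in the Wallace rule (Tm ≈ 2°C per A/T plus 4°C per G/C), a crude rule of thumb that is most appropriate for short oligonucleotides and tends to overestimate Tm for 17-23 nt primers. A nearest-neighbor model would be more accurate. The helper names are hypothetical.

```python
# Primer sequences (5'->3') as listed in paragraph [0238].
PRIMERS = {
    "KAE08h07-For": "CAAGCCCTTAATGATTTGACTC",
    "KAE08h07-Rev": "AGGTTTCTCTCTATGTTTGCCAC",
    "Transferrin-For": "CATTCTCTAACTTGTTTGGTGG",
    "Transferrin-Rev": "CCAGGTAAACAAGTCTACCG",
    "Actin-For": "GCATGGGTCAGAAGGAT",
    "Actin-Rev": "CCAATGGTGATGACCTG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degC by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(f"{name:<16} {len(seq):>2} nt  GC {gc_fraction(seq):.0%}  Wallace Tm ~{wallace_tm(seq)} C")
```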

[0239] Cell culture. The MCF7, SK-BR-3, ZR-75-1, MDA-MB-231 and LNCaP cell lines were maintained as recommended by ATCC. The hTERT-HME1 cell line (Clontech, Palo Alto, California) was maintained according to the manufacturer's instructions.

[0240] Isolation of membrane associated polyribosomal RNA. MCF7, SK-BR-3, ZR-75-1, MDA-MB-231, hTERT-HME1 and LNCaP cells (~1 x 10⁸ cells per prep) were individually treated with 50 μM cycloheximide (Sigma, St. Louis, Missouri) for 10 min at 37°C. The cells were washed twice with ice-cold phosphate-buffered saline and scraped from the dish into a 50 ml conical tube. The cells were centrifuged and resuspended at 1.25 x 10⁸ cells/ml in hypotonic buffer (10 mM KCl, 1.5 mM MgCl₂, 10 mM Tris-HCl pH 7.4 and 200 U/ml RNase inhibitor (Roche, Indianapolis, IN)). The cells were allowed to swell on ice for 10 min and were then ruptured in a Dounce homogenizer using pestle B (Kontes, Vineland, NJ). The membrane associated and cytosolic polyribosomes were separated by isopycnic centrifugation in a discontinuous sucrose density gradient at 90,000 x g for 15 hr at 4°C (Mechler, B.M., Methods Enzymol., 152, 241-248 (1987)). Total RNA was isolated from the membrane polyribosomal fraction using Trizol LS Reagent (Invitrogen Life Technologies, Carlsbad, CA), and its quality was verified using the Agilent 2100 Bioanalyzer. Individual preps of membrane associated polyribosomal RNA from each cell line were pooled as follows: 300 μg each from MCF7, SK-BR-3, ZR-75-1, MDA-MB-231 and hTERT-HME1, and 200 μg from LNCaP. Of the pooled RNA, 100 μg was saved for future analysis, and 1.6 mg was given to Invitrogen Life Technologies for library construction.
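The pooling arithmetic in paragraph [0240] can be checked directly. Five lines at 300 μg plus LNCaP at 200 μg gives 1.7 mg, and setting aside 100 μg leaves the 1.6 mg sent for library construction. The sketch below also estimates the per-prep hypotonic buffer volume. This assumes exactly 1 x 10⁸ cells, whereas the text gives the cell number only approximately. All variable names are illustrative.

```python
# Micrograms of membrane-associated polyribosomal RNA pooled per cell line ([0240]).
POOLED_UG = {
    "MCF7": 300, "SK-BR-3": 300, "ZR-75-1": 300,
    "MDA-MB-231": 300, "hTERT-HME1": 300, "LNCaP": 200,
}
SAVED_UG = 100  # retained for later analysis

total_ug = sum(POOLED_UG.values())
sent_ug = total_ug - SAVED_UG
share = {line: ug / total_ug for line, ug in POOLED_UG.items()}

# Assumption: exactly 1 x 10^8 cells per prep, resuspended at 1.25 x 10^8 cells/ml.
CELLS_PER_PREP = 1e8
CELLS_PER_ML = 1.25e8
resuspension_ml = CELLS_PER_PREP / CELLS_PER_ML

print(f"pooled: {total_ug} ug; sent for library construction: {sent_ug / 1000:.1f} mg")
print(f"LNCaP share of pool: {share['LNCaP']:.1%}")
print(f"hypotonic buffer per prep: ~{resuspension_ml:.1f} ml")
```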

[0241] Generation of the MAPcL. mRNA was isolated from the pooled total membrane associated polyribosomal RNA, and cDNA was generated by Invitrogen Life Technologies using an oligo(dT) primer. The cDNA fragments were cloned directionally into the EcoRV and NotI sites of pCMVSport6.0 (Invitrogen, San Diego, CA), which destroyed the EcoRV site. The library was electroporated into E. coli EMDH10B cells, and the titer of the library was determined. Twenty-three clones were picked at random to determine the average insert size of the library.

[0242] Subtraction of the MAPcL. 5 x 10° clones of the MAPcL were amplified 26,000-fold by Invitrogen using its semi-solid agarose procedure, which minimizes the clone bias that normally occurs during liquid amplification. A driver library was created by pooling Invitrogen Life Technologies' pre-made liver, brain, kidney, lung and skeletal muscle libraries in equimolar amounts. The amplified MAPcL was subtracted with the driver library (Li et al, Biotechniques, 16, 722-729 (1994)). The subtracted library contains 1.3 x 10⁷ cfu in total, with an average insert size of 1800 bp.

[0243] Sequencing of the subtracted and unsubtracted MAPcL. The 5' sequencing reactions were performed by Advanced Technology Center using the M13 reverse primer.

[0244] Human MTE™ array and Northern blot hybridization. The Human MTE™ Array was purchased from Clontech (part of BD Biosciences, Palo Alto, CA). The 1.4 kb BASE probe used for hybridization was generated by digesting the MAPcL clone pKAE08h07 with EcoRI and NotI and purifying the cDNA insert by agarose gel electrophoresis. The cDNA insert was labeled with ³²P by random primer extension (Lofstrand Labs Ltd., Gaithersburg, MD), and hybridization was performed as described previously (Essand et al, Proc. Natl Acad. Sci. U.S.A., 96, 9287-9292 (1999)). The membrane was exposed to film for 2 days.

[0245] For Northern blot hybridization, 2 μg of poly(A) RNA per lane was separated on a 1.25% agarose gel containing 2% formaldehyde. Salivary gland poly(A) RNA was purchased from Clontech, and ZR-75-1 poly(A) RNA was prepared using the FastTrack 2.0 mRNA isolation system from Invitrogen. The BASE probe was generated, and hybridization and washing were performed, as described above. The 0.24-9.5 kb RNA ladder was purchased from Invitrogen. The blot was exposed to film for 1 day.

[0246] Reverse transcription-PCR (RT-PCR) and Rapid-Scan gene expression panel analysis. Total RNA was isolated from frozen breast tumor samples acquired from the Cooperative Human Tissue Network, and from tissue culture cell lines, using the StrataPrep Total RNA Miniprep Kit (Stratagene, La Jolla, CA) according to the manufacturer's instructions. To generate single-stranded cDNA, total RNA (5 μg) was reverse transcribed with the First-Strand cDNA Synthesis Kit (Amersham Biosciences, Piscataway, NJ) using random hexamer priming according to the manufacturer's instructions. PCR reactions were performed using the following protocol: initial denaturation at 94°C for 3 min; 35 cycles of denaturation at 94°C for 1 min, annealing at 60°C for 1 min and elongation at 72°C for 1 min; and a final 5 min extension at 72°C. Similar PCR conditions were used with the Rapid-Scan gene expression panel (OriGene Technologies), except that elongation at 72°C was performed for 2 min.
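The two cycling programs above can be encoded as data, which makes the difference between them explicit and gives the total hold time of each run. The sketch below rests on two assumptions. First, the Rapid-Scan program differs from the standard one only in its 2 min elongation step. Second, ramp times between temperatures, which the text does not specify, are ignored. The `Step` class and `program_minutes` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    temp_c: float
    minutes: float


def program_minutes(initial, cycle_steps, cycles, final):
    """Total hold time in minutes; ramp times between temperatures are ignored."""
    return (sum(s.minutes for s in initial)
            + cycles * sum(s.minutes for s in cycle_steps)
            + sum(s.minutes for s in final))


initial = [Step("initial denaturation", 94, 3)]
final = [Step("final extension", 72, 5)]
standard_cycle = [Step("denature", 94, 1), Step("anneal", 60, 1), Step("extend", 72, 1)]
# Assumed: Rapid-Scan panel differs only in a 2 min elongation step.
rapid_scan_cycle = [Step("denature", 94, 1), Step("anneal", 60, 1), Step("extend", 72, 2)]

print(program_minutes(initial, standard_cycle, 35, final))    # 113
print(program_minutes(initial, rapid_scan_cycle, 35, final))  # 148
```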

[0247] Sequence identification. Sequences of clones from the MAPcL were initially identified by comparison to the NCBI RefSeq, GenBank and dbEST databases using BLAST (Pruitt et al, Nucleic Acids Res., 29, 137-140 (2001); Altschul et al, J. Mol. Biol., 215, 403-410 (1990)). Full-length clones were defined as MAPcL sequences that hit a RefSeq protein with 70% or greater identity, with the alignment starting at amino acid 1 of the RefSeq protein. Membrane and secreted proteins were identified using the GO classifications associated with RefSeq genes. The NIH_MGC_87 cDNA library from the NIH Mammalian Gene Collection was used as a control for membrane and secreted proteins (Strausberg et al, Science, 286, 455-457 (1999)). This library contains over 19,000 ESTs and was made from a cell line derived from breast adenocarcinoma tissue. MAPcL sequences representing unknown genes were classified by tissue expression using EST sequences from dbEST.
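The full-length criterion in paragraph [0247] is a simple predicate on a protein-level BLAST hit: at least 70% identity, and alignment starting at residue 1 of the RefSeq protein. The minimal sketch below applies that rule to hits read from BLAST+ tabular output (`-outfmt 6`). In that format the default column order places percent identity (`pident`) at index 2 and the subject start (`sstart`) at index 8. The original 2002 analysis predates BLAST+, so the input format, the `ProteinHit` record and both helper functions are assumptions used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    """Hypothetical summary of one hit of a MAPcL clone against a RefSeq protein."""
    clone_id: str
    refseq_id: str
    percent_identity: float  # identity over the aligned region
    subject_start: int       # 1-based first aligned residue of the RefSeq protein


def parse_outfmt6(line: str) -> ProteinHit:
    """Parse one line of BLAST+ -outfmt 6 (default 12 columns, tab-separated)."""
    f = line.rstrip("\n").split("\t")
    return ProteinHit(clone_id=f[0], refseq_id=f[1],
                      percent_identity=float(f[2]), subject_start=int(f[8]))


def is_full_length(hit: ProteinHit, min_identity: float = 70.0) -> bool:
    """Rule from [0247]: >= 70% identity and alignment beginning at amino acid 1."""
    return hit.percent_identity >= min_identity and hit.subject_start == 1
```

Example use, with a made-up clone and accession: `is_full_length(parse_outfmt6("clone1\tNP_000001\t85.0\t100\t15\t0\t10\t309\t1\t100\t1e-50\t200"))` returns `True`, because identity is 85% and the alignment starts at residue 1.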

[0248] URLs. UniGene, RefSeq, dbEST and GenBank sequence databases, the BLAST program, LocusLink, OMIM and the CGAP project can be accessed on the World Wide Web at ncbi.nlm.nih.gov. The GO database can be accessed on the Web at geneontology.org/. The NIH_MGC_87 cDNA library can be obtained on-line by entering "http://" followed by "mgc.nci.nih.gov/". The Goldenpath genome build and annotation databases can be accessed on-line by entering "http://" followed by "genome.ucsc.edu". All database versions except Goldenpath were taken as a snapshot from public releases available March 14, 2002. Genome sequences and annotations were taken from the December 2001 build of Goldenpath.

[0249] While specific examples have been provided, the above description is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.

[0250] All publications and patent documents cited herein are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication or patent document were so individually denoted. Citation of various references in this document is not an admission that any particular reference is considered to be "prior art" to the invention.

Claims

WHAT IS CLAIMED IS:

1. An isolated polypeptide comprising an amino acid sequence selected from the group consisting of SEQ ID NO:1, SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2.

2. An isolated polypeptide of claim 1, wherein the polypeptide has at least 98% sequence identity to SEQ ID NO:1.

3. An isolated polypeptide of claim 1, wherein the polypeptide has at least 95% sequence identity to SEQ ID NO:2.

4. An isolated polypeptide of claim 1, wherein the polypeptide has the sequence of SEQ ID NO:1.

5. An isolated polypeptide of claim 1, wherein the polypeptide has the sequence of SEQ ID NO:2.

6. An isolated polypeptide of claim 1, wherein the polypeptide comprises an immunogenic fragment of SEQ ID NO:1 comprising at least 8 contiguous amino acids from amino acids 167-179 of SEQ ID NO:1.

7. An isolated polypeptide of claim 1, wherein the polypeptide comprises an immunogenic fragment of SEQ ID NO:2 comprising at least 8 contiguous amino acids of SEQ ID NO:2.

8. A composition comprising a polypeptide of claim 1 and a pharmaceutically acceptable carrier.

9. A composition comprising a polypeptide of claim 2 and a pharmaceutically acceptable carrier.

10. A composition comprising a polypeptide of claim 3 and a pharmaceutically acceptable carrier.

11. A composition comprising a polypeptide of claim 4 and a pharmaceutically acceptable carrier.

12. A composition comprising a polypeptide of claim 5 and a pharmaceutically acceptable carrier.

13. A composition comprising a polypeptide of claim 6 and a pharmaceutically acceptable carrier.

14. A composition comprising a polypeptide of claim 7 and a pharmaceutically acceptable carrier.

15. An isolated, recombinant nucleic acid molecule comprising a nucleotide sequence encoding a polypeptide selected from the group consisting of a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2.

16. An isolated, recombinant nucleic acid molecule of claim 15, which encodes a polypeptide comprising the sequence of SEQ ID NO:1.

17. An isolated, recombinant nucleic acid molecule of claim 15, which encodes a polypeptide comprising the sequence of SEQ ID NO:2.

18. An isolated, recombinant nucleic acid molecule of claim 15, which encodes a polypeptide comprising an immunogenic fragment of SEQ ID NO:1 comprising at least 8 contiguous amino acids from amino acids 167-179 of SEQ ID NO:1.

19. An isolated, recombinant nucleic acid molecule of claim 15, which encodes a polypeptide comprising an immunogenic fragment of SEQ ID NO:2 comprising at least 8 contiguous amino acids of SEQ ID NO:2.

20. A host cell comprising an expression vector comprising a promoter operatively linked to a nucleotide sequence encoding a polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express a polypeptide of SEQ ID NO:1 or SEQ ID NO:2.

21. A use of an isolated polypeptide comprising an amino acid sequence selected from the group consisting of a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2, for the manufacture of a medicament to activate T lymphocytes against cells which express SEQ ID NO:1 or SEQ ID NO:2.

22. A use of claim 21, wherein said isolated polypeptide comprises at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1.

23. A use of claim 21, wherein the polypeptide comprises at least 8 contiguous amino acids of SEQ ID NO:2.

24. A use of claim 21, wherein the polypeptide has at least 95% sequence identity to SEQ ID NO:1 and which, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express a polypeptide of SEQ ID NO:1.

25. A use of claim 21, wherein the polypeptide has at least 90% sequence identity to SEQ ID NO:2 and which, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express a polypeptide of SEQ ID NO:2.

26. A use of claim 21, wherein the cells expressing SEQ ID NO:1 or SEQ ID NO:2 are breast cancer cells.

27. A use of an isolated, recombinant nucleic acid molecule comprising a nucleotide sequence encoding a polypeptide comprising an amino acid sequence selected from the group consisting of a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2, for the manufacture of a medicament to activate T lymphocytes against cells which express SEQ ID NO:1 or SEQ ID NO:2.

28. A use of claim 27, wherein the cells expressing SEQ ID NO:1 or SEQ ID NO:2 are breast cancer cells.

29. A use of claim 27, wherein the isolated, recombinant nucleic acid molecule encodes a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1.

30. A use of claim 27, wherein the isolated, recombinant nucleic acid molecule encodes a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2.

31. A method of activating T lymphocytes against cells expressing SEQ ID NO:1 or SEQ ID NO:2, the method comprising administering to a subject a composition, which composition comprises an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2.

32. A method of claim 31 comprising administering to the subject SEQ ID NO:1, or an immunogenic fragment thereof.

33. A method of claim 31 comprising administering to the subject SEQ ID NO:2, or an immunogenic fragment thereof.

34. A method of claim 31, wherein the polypeptide comprises a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1.

35. A method of claim 31, wherein the polypeptide comprises a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2.

36. A method of claim 31, wherein the composition is administered to a subject with breast cancer.

37. The method of claim 31, further comprising co-administering to the subject an immune adjuvant selected from non-specific immune adjuvants, subcellular microbial products and fractions, haptens, immunogenic proteins, immunomodulators, interferons, thymic hormones and colony stimulating factors.

38. A method of activating T lymphocytes against cancer cells expressing SEQ ID NO:1 or SEQ ID NO:2, the method comprising contacting T cells with an antigen presenting cell pulsed or transfected with a polypeptide comprising an epitope of an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2.

39. A method of activating T lymphocytes against cancer cells expressing SEQ ID NO:1 or SEQ ID NO:2, the method comprising administering a nucleic acid sequence encoding a polypeptide comprising an epitope of an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2.

40. A method of claim 39, wherein said nucleic acid is operably linked to a promoter.

41. A method of claim 39, wherein said nucleic acid is in an expression vector, which expression vector is in an autologous recombinant cell.

42. A method of sensitizing CD8+ cells in vitro against cells expressing SEQ ID NO:1 or SEQ ID NO:2, the method comprising contacting said cells with a composition, which composition comprises an isolated polypeptide selected from the group consisting of: a polypeptide of SEQ ID NO:1, a polypeptide of SEQ ID NO:2, a polypeptide of at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1, a polypeptide of at least 8 contiguous amino acids of SEQ ID NO:2, a polypeptide with 95% or greater sequence identity to SEQ ID NO:1 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:1, a polypeptide with 90% or greater sequence identity to SEQ ID NO:2 and which is specifically recognized by an antibody which specifically recognizes SEQ ID NO:2, a polypeptide which has at least 95% sequence identity with SEQ ID NO:1 which, when processed and presented in the context of Major Histocompatibility Complex ("MHC") molecules, activates T lymphocytes against cells which express SEQ ID NO:1, and a polypeptide which has at least 90% sequence identity with SEQ ID NO:2 which, when processed and presented in the context of MHC molecules, activates T lymphocytes against cells which express SEQ ID NO:2.

43. A method of claim 42, further wherein said CD8+ cells are tumor infiltrating cells.

44. A method for determining whether a subject has a SEQ ID NO:1- or SEQ ID NO:2-expressing cancer, comprising taking a sample from said subject from a site other than the salivary glands, and determining whether a cell in said sample contains a nucleic acid transcript encoding SEQ ID NO:1 or SEQ ID NO:2, or detecting a polypeptide of SEQ ID NO:1 or SEQ ID NO:2, whereby detection of the transcript or of SEQ ID NO:1 or SEQ ID NO:2 in said sample indicates that the subject has a SEQ ID NO:1- or SEQ ID NO:2-expressing cancer.

45. A method of claim 44, comprising detecting the transcript.

46. A method of claim 44, comprising detecting a polypeptide of SEQ ID NO:1 or SEQ ID NO:2.

47. A method of claim 44, comprising contacting RNA from the sample with a nucleic acid probe that specifically hybridizes to a nucleic acid transcript encoding SEQ ID NO:1 or SEQ ID NO:2 under stringent hybridization conditions, and detecting hybridization.

48. A method of claim 44, wherein said sample is selected from the group consisting of blood and urine.

49. An antibody that specifically binds to an epitope of a polypeptide selected from the group consisting of a base2 protein (SEQ ID NO:2), an immunogenic fragment thereof, a polypeptide with at least 90% sequence identity to base2 and which is specifically recognized by an antibody which specifically recognizes base2, and a polypeptide which has at least 90% sequence identity with base2 and which, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express base2.

50. An antibody of claim 49, wherein said polypeptide is base2 (SEQ ID NO:2).

51. An antibody of claim 49 attached to a therapeutic moiety or a detectable label.

52. An antibody of claim 51, wherein the therapeutic moiety is a cytotoxin.

53. An antibody of claim 52, wherein the cytotoxin is selected from the group consisting of ricin A, abrin, ribotoxin, ribonuclease, saporin, calicheamycin, diphtheria toxin or a subunit thereof, Pseudomonas exotoxin, a cytotoxic portion thereof, a mutated Pseudomonas exotoxin, a cytotoxic portion thereof, and botulinum toxins A through F, pokeweed antiviral toxin or a cytotoxic fragment thereof, and bryodin 1 or a cytotoxic fragment thereof.

54. An antibody of claim 52, wherein the cytotoxin is a Pseudomonas exotoxin or a cytotoxic fragment thereof.

55. A method of inhibiting the growth of a cancer cell expressing base2 (SEQ ID NO:2) on its exterior surface, comprising contacting the cell with an immunoconjugate comprising a therapeutic moiety and a targeting moiety, the targeting moiety comprising an antibody which specifically binds to an epitope of base2, wherein said binding permits the therapeutic moiety to inhibit the growth of the cell.

56. A method of claim 55, wherein the therapeutic moiety is a cytotoxin or a radioisotope.

57. A use of an antibody that specifically binds to an epitope of a polypeptide selected from the group consisting of a base2 protein (SEQ ID NO:2), an immunogenic fragment thereof, a polypeptide with at least 90% sequence identity to base2 and which is specifically recognized by an antibody which specifically recognizes base2, and a polypeptide which has at least 90% sequence identity with base2 and which, when processed and presented in the context of Major Histocompatibility Complex molecules, activates T lymphocytes against cells which express base2, for the manufacture of a medicament for a base2-expressing cancer.

58. A use of claim 57, further wherein said antibody is attached to a therapeutic moiety.

59. A use of claim 58, wherein the therapeutic moiety is a cytotoxin or a radioisotope.

60. A kit for detecting a SEQ ID NO:1- or SEQ ID NO:2-expressing cancer, said kit comprising a container and an antibody which specifically binds to SEQ ID NO:2 or to amino acids 167-179 of SEQ ID NO:1.

61. A kit for detecting the presence of a SEQ ID NO:1- or SEQ ID NO:2-expressing cancer, said kit comprising a container and a nucleic acid which hybridizes under stringent conditions to a nucleic acid encoding SEQ ID NO:2 or to a nucleic acid encoding amino acids 167-179 of SEQ ID NO:1.
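Several claims recite "at least 8 contiguous amino acids of amino acids 167-179 of SEQ ID NO:1" or "at least 8 contiguous amino acids of SEQ ID NO:2". The index-only sketch below counts how many residue ranges satisfy such a limitation. It uses the 179-residue length of base1 and the 79-residue length of base2 given in the Abstract. The sequences themselves are not reproduced in this section, so the sketch counts positional windows rather than distinct peptide sequences. The `windows` helper is hypothetical.

```python
def windows(start: int, end: int, min_len: int):
    """All (first, last) 1-based residue ranges of length >= min_len inside [start, end]."""
    return [(i, j)
            for i in range(start, end + 1)
            for j in range(i + min_len - 1, end + 1)]


base1_tail = windows(167, 179, 8)  # fragments within residues 167-179 of SEQ ID NO:1
base2_all = windows(1, 79, 8)      # fragments anywhere in the 79-residue SEQ ID NO:2

print(len(base1_tail), base1_tail[0], base1_tail[-1])
print(len(base2_all))
```

The 13-residue span 167-179 yields 21 such ranges, 6 of which are exactly 8 residues long. The 79-residue base2 protein yields 2,628.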