WO2009061214A1

WO2009061214A1 - Compositions and methods for modulating pigment production in plants

Info

Publication number: WO2009061214A1
Application number: PCT/NZ2008/000283
Authority: WO
Inventors: Andrew Charles Allan; Richard Victor Espley; Roger Paul Hellens; Kui Lin-Wang
Original assignee: Horticulture and Food Research Institute of New Zealand Ltd; New Zealand Institute for Plant and Food Research Ltd
Current assignee: Horticulture and Food Research Institute of New Zealand Ltd; New Zealand Institute for Bioeconomy Science Ltd
Priority date: 2007-11-05
Filing date: 2008-10-29
Publication date: 2009-05-14
Anticipated expiration: 2010-05-05
Also published as: CL2008003296A1; AR069201A1

Abstract

The invention relates to polynucleotides encoding novel transcription factors, and to the encoded transcription factors, that are capable of regulating anthocyanin production in plants. The invention also relates to constructs comprising the polynucleotides, and to host cells, plant cells and plants transformed with the polynucleotides, constructs and vectors. The invention also relates to methods of producing plants with altered anthocyanin production and to plants produced by the methods.

Description

COMPOSITIONS AND METHODS FOR MODULATING PIGMENT PRODUCTION IN PLANTS

TECHNICAL FIELD

The present invention is in the field of pigment development in plants.

BACKGROUND ART

The accumulation of anthocyanin pigments in fruit is an important determinant of fruit quality. These pigments provide essential cultivar differentiation for consumers and are implicated in the health attributes of fruit. Anthocyanins belong to the diverse group of ubiquitous secondary metabolites collectively known as flavonoids. In plants, flavonoids are believed to have a variety of functions, including defence and protection against light stress, and the pigmented anthocyanin compounds play an important physiological role as attractants in plant/animal interactions (Harborne and Grayer, 1994; Koes et al., 1994).

One of the most common anthocyanin pigments is cyanidin which, in the form of cyanidin 3-O-galactoside, is the pigment primarily responsible for red colouration in apple skin (Lancaster, 1992; Tsao et al., 2003). Some of the biosynthetic genes responsible have been determined (e.g. Hoffmann et al., 2006).

The avocado (Persea americana) is a tree native to Mexico and Central America, classified in the flowering plant family Lauraceae. An average avocado tree produces about 120 avocados annually. Commercial orchards produce an average of 7 tonnes per hectare each year, with some orchards achieving 20 tonnes per hectare.

In avocado (Persea americana cv. Hass), levels of anthocyanin pigment are rapidly elevated during ripening in the fruit skin and only rarely in the fruit cortex. Reddening of the avocado flesh is considered a disorder. Other varieties (e.g. cv. Fuerte) remain green skinned even when fully ripe. From studies in a diverse array of plant species, it is apparent that anthocyanin biosynthesis is controlled at the level of transcription.

The control of anthocyanin accumulation in avocado is a key question in understanding and manipulating colour in this species. Identification of the factors that exert this control would provide tools for modulating the extent and distribution of anthocyanin-derived pigmentation in fruit tissue. This would provide novel cultivars as well as tools for controlling disorders in avocado that involve anthocyanins.

Transcription factors may regulate expression of more than one gene in any given biosynthetic pathway and therefore can be useful tools for regulating production from such biosynthetic pathways. For example, the Arabidopsis gene PAP1, when overexpressed in transgenic Arabidopsis, led to up-regulation of a number of genes in the anthocyanin biosynthesis pathway from PAL to CHS and DFR (Borevitz et al., 2000; Tohge et al., 2005).

In order to manipulate anthocyanin production in avocado species it is advantageous to have available sequences derived from avocado species. Such sequences may be useful to alleviate public concerns about cross-species transformation in the genetic manipulation of anthocyanin production. In addition, if down-regulation of such an avocado sequence is sought, it may be necessary to transform the plant with a sequence that is identical, or at least highly similar, to the endogenous avocado sequence. Avocado sequences may also be useful to provide probes or primers for assessing expression of corresponding endogenous sequences in avocado species during marker-assisted breeding.

To the applicant's knowledge, there are currently no known transcription factors derived from avocado that can be used to regulate anthocyanin production in avocado or other plant species.

It is therefore an object of the invention to provide a transcription factor sequence from avocado which regulates anthocyanin production in avocado and other species, and/or at least to provide the public with a useful choice.

SUMMARY OF THE INVENTION

In the first aspect the invention provides an isolated polynucleotide comprising a sequence encoding a polypeptide with the amino acid sequence of SEQ ID NO:1 or a variant thereof, wherein the polypeptide or variant is an R2R3 MYB transcription factor that positively regulates anthocyanin production in a plant. Preferably the transcription factor positively regulates anthocyanin production.

In one embodiment the variant comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 1.

In one embodiment the polypeptide comprises the amino acid sequence of SEQ ID NO: 1.

In a further aspect the invention provides an isolated polynucleotide comprising a sequence encoding a polypeptide with the amino acid sequence of SEQ ID NO:1 or a variant thereof, wherein the polypeptide or variant thereof is an R2R3 MYB transcription factor that positively regulates the promoter of a gene in the anthocyanin biosynthetic pathway in a plant.

In one embodiment the gene in the anthocyanin biosynthetic pathway encodes dihydroflavonol 4-reductase (DFR).

In a further embodiment the promoter has at least 70% identity to the sequence of SEQ ID NO: 8.

In a further embodiment the promoter has the sequence of SEQ ID NO: 8.

In a further aspect the invention provides an isolated polynucleotide comprising the sequence of SEQ ID NO: 2 or a variant thereof, wherein the polynucleotide or variant encodes an R2R3 MYB transcription factor that regulates anthocyanin production in a plant.

Preferably the transcription factor positively regulates anthocyanin production.

In one embodiment the variant comprises a nucleic acid sequence with at least 70% identity to the sequence of SEQ ID NO: 2. In one embodiment the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 2.

In one embodiment the variant comprises a nucleic acid sequence with at least 70% identity to the sequence of SEQ ID NO: 3.

In one embodiment the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 3.

In a further aspect the invention provides an isolated polynucleotide comprising the sequence of SEQ ID NO: 2 or a variant thereof, wherein the polynucleotide or variant thereof encodes an R2R3 MYB transcription factor that regulates the promoter of a gene in the anthocyanin biosynthetic pathway in a plant.

Preferably the transcription factor positively regulates the promoter.

In one embodiment the variant comprises a nucleic acid sequence with at least 70% identity to the sequence of SEQ ID NO: 2.

In one embodiment the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 2.

In one embodiment the gene in the anthocyanin biosynthetic pathway encodes dihydroflavonol 4-reductase (DFR).

In a further embodiment the promoter has the sequence of SEQ ID NO: 8.

In a further aspect the invention provides an isolated polypeptide comprising: a) the amino acid sequence of SEQ ID NO: 1 or a variant thereof, wherein the polypeptide or variant thereof is an R2R3 MYB transcription factor that regulates anthocyanin production in a plant; or b) a fragment, of at least 5 amino acids in length, of the sequence of a), capable of performing the same function as the polypeptide in a).

Preferably the transcription factor positively regulates the promoter.

In a further aspect the invention provides an isolated polypeptide comprising: a) the amino acid sequence of SEQ ID NO: 1 or a variant thereof, wherein the polypeptide or variant thereof is an R2R3 MYB transcription factor that regulates the promoter of a gene in the anthocyanin biosynthetic pathway in a plant; or b) a fragment, of at least 5 amino acids in length, of the sequence of a), capable of performing the same function as the polypeptide in a).

Preferably the transcription factor positively regulates the promoter.

In a further embodiment the promoter has at least 70% identity to the sequence of SEQ ID NO: 8. In a further embodiment the promoter has the sequence of SEQ ID NO: 8.

In a further aspect the invention provides a polynucleotide encoding a polypeptide of the invention.

In a further aspect the invention provides an antibody raised against a polypeptide of the invention.

In a further aspect the invention provides a genetic construct comprising a polynucleotide of the invention.

In a further aspect the invention provides a vector comprising a polynucleotide of the invention.

In a further aspect the invention provides a vector comprising a genetic construct of the invention.

In a further aspect the invention provides a host cell genetically modified to express a polynucleotide of the invention.

In a further aspect the invention provides a host cell comprising a genetic construct of the invention.

In a further aspect the invention provides a host cell comprising a vector of the invention.

In a further aspect the invention provides a plant cell genetically modified to express a polynucleotide of the invention.

In a further aspect the invention provides a plant cell comprising the genetic construct of the invention.

In a further aspect the invention provides a plant which comprises the plant cell of the invention.

In a further aspect the invention provides a method for producing a polypeptide of the invention, the method comprising the step of culturing a host cell genetically modified to express a polynucleotide of the invention.

In one embodiment the host cell comprises a genetic construct of the invention.

In a further aspect the invention provides a method for producing a plant cell or plant with altered anthocyanin production, the method comprising the step of transformation of a plant cell or plant with a genetic construct including: a) at least one polynucleotide encoding a MYB polypeptide of the invention; b) at least one gene encoding a MYB polypeptide of the invention; c) at least one polynucleotide comprising a fragment, of at least 15 nucleotides in length, of the polynucleotide of a) or b); d) at least one polynucleotide comprising a complement, of at least 15 nucleotides in length, of the polynucleotide of c); or e) at least one polynucleotide capable of hybridising under stringent conditions to the polynucleotide of a) or a gene of b).

In one embodiment the method includes the additional step of transforming the plant with a construct designed to express a bHLH transcription factor, such that the bHLH transcription factor is co-expressed with the MYB polypeptide of the invention.

In a further embodiment the bHLH transcription factor comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 4 or 5.

In a further embodiment the bHLH transcription factor comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 4.

In a further embodiment the bHLH transcription factor comprises an amino acid sequence with the sequence of SEQ ID NO: 4.

In a further embodiment the bHLH transcription factor comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 5.

In a further embodiment the bHLH transcription factor comprises an amino acid sequence with the sequence of SEQ ID NO: 5.

Preferred combinations of transcription factors to be co-expressed in the method of the invention include PaMYB10 (SEQ ID NO: 1) with AtBHLH2 (SEQ ID NO: 4); and PaMYB10 (SEQ ID NO: 1) with MdBHLH3 (SEQ ID NO: 5). Use of variants of each co-expressed BHLH sequence is also included in the method of the invention.

In a further aspect the invention provides a plant produced by the method of the invention.

In a further aspect the invention provides a method for selecting a plant altered in anthocyanin production, the method comprising testing of a plant for: a) altered expression of a polynucleotide of the invention; or b) the presence of a polymorphism associated with altered expression or activity of a polynucleotide of the invention.

In one embodiment the method involves testing the plant for altered expression of a polynucleotide of the invention.

In one embodiment the method involves testing the plant for the presence of a polymorphism associated with altered expression or activity of a polynucleotide of the invention.

In a further aspect the invention provides a method for selecting a plant altered in anthocyanin production, the method comprising testing of a plant for altered expression of a polypeptide of the invention.

In a further aspect the invention provides a group or population of plants selected by the method of the invention.

In a further aspect the invention provides a method for selecting a plant cell or plant that has been transformed, the method comprising the steps of: a) transforming a plant cell or plant with a polynucleotide of the invention capable of regulating anthocyanin production in a plant; b) expressing the polynucleotide in the plant cell or plant; and c) selecting a plant cell or plant with increased anthocyanin pigmentation relative to other plant cells or plants, the increased anthocyanin pigmentation indicating that the plant cell or plant has been transformed.

The invention also provides a transformed plant selected by the method.

Preferably the transcription factors and variants of the invention, that are capable of regulating anthocyanin production in plants, are capable of regulating the production of the anthocyanins selected from the group including but not limited to: cyanidin-3-glucoside, cyanidin-3-O-rutinoside, cyanidin-3-glucoside and cyanidin-3-pentoside.

Preferably the plants or plant cells with altered production of anthocyanins, produced by or selected by the methods of the invention, are altered in production of anthocyanins selected from the group including but not limited to: cyanidin-3-glucoside, cyanidin-3-O-rutinoside, cyanidin-3-glucoside and cyanidin-3-pentoside.

The polynucleotides, polypeptides and variants, of the invention may be derived from any species or may be produced by recombinant or synthetic means.

In one embodiment the polynucleotide or variant, is derived from a plant species.

In a further embodiment the polynucleotide or variant, is derived from a gymnosperm plant species.

In a further embodiment the polynucleotide or variant, is derived from an angiosperm plant species.

In a further embodiment the polynucleotide or variant, is derived from a dicotyledonous plant species.

The polypeptides and polypeptide variants of the invention may be derived from any species, or may be produced by recombinant or synthetic means.

In one embodiment the polypeptides or variants of the invention are derived from plant species.

In a further embodiment the polypeptides or variants of the invention are derived from gymnosperm plant species.

In a further embodiment the polypeptides or variants of the invention are derived from angiosperm plant species.

In a further embodiment the polypeptides or variants of the invention are derived from dicotyledonous plant species.

In a further embodiment the polypeptides or variants of the invention are derived from monocotyledonous plant species.

The plant cells and plants of the invention, including those from which the polynucleotides, polypeptides and variants may be derived, may be from any species.

In one embodiment the plant cells and plants of the invention are from gymnosperm species.

In a further embodiment the plant cells and plants of the invention are from angiosperm species.

In a further embodiment the plant cells and plants of the invention are from dicotyledonous species.

In a further embodiment the plant cells and plants of the invention are from monocotyledonous species.

Preferred plant species include fruit plant species. Preferred plant species include those selected from a group consisting of the following genera: Malus, Persea, Pyrus, Prunus, Rubus, Rosa, Fragaria, Actinidia, Cydonia, Citrus, and Vaccinium.

Particularly preferred fruit plant species are: Malus domestica, Actinidia deliciosa, A. chinensis, A. eriantha, A. arguta and hybrids of the four Actinidia species, Prunus persica, Persea americana, Pyrus communis, Pyrus pyrifolia, Rubus idaeus, Rosa hybrida, and Fragaria x ananassa.

Preferred plants also include vegetable plant species selected from a group comprising but not limited to the following genera: Brassica, Lycopersicon and Solanum.

Particularly preferred vegetable plant species are: Lycopersicon esculentum and Solanum tuberosum.

Preferred plants also include crop plant species selected from a group comprising but not limited to the following genera: Glycine, Zea, Hordeum and Oryza.

Particularly preferred crop plant species include Glycine max, Zea mays and Oryza sativa.

Preferred plants also include those of the Lauraceae family.

Preferred Lauraceae genera include: Actinodaphne, Adenodaphne, Aiouea, Alseodaphne, Anaueria, Aniba, Apollonias, Aspidostemon, Beilschmiedia, Brassiodendron, Caryodaphnopsis, Cassytha, Chlorocardium, Cinnadenia, Cinnamomum, Clinostemon, Cryptocarya, Dahlgrenodendron, Dehaasia, Dicypellium, Dodecadenia, Endiandra, Endlicheria, Eusideroxylon, Gamanthera, Hexapora, Hypodaphnis, Iteadaphne, Kubitzkia, Laurus, Licaria, Lindera, Litsea, Machilus, Mezilaurus, Mocinnodaphne, Mutisiopersea, Nectandra, Neocinnamomum, Neolitsea, Nothaphoebe, Ocotea, Paraia, Parasassafras, Persea, Phoebe, Phyllostemonodaphne, Pleurothyrium, Potameia, Potoxylon, Povedadaphne, Ravensara clove nutmeg, Rhodostemonodaphne, Sassafras, Sextonia, Sinosassafras, Syndiclis, Triadodaphne, Umbellularia, Urbanodendron, Williamodendron, and Yushunia.

Preferred Lauraceae species include: Cinnamomum verum, Cinnamomum tamala, Cinnamomum loureiroi, Cinnamomum burmannii, Cinnamomum camphora, Cinnamomum cassia, Cinnamomum aromaticum, Laurus azorica, Laurus nobilis, Laurus novocanariensis, Lindera benzoin, Persea americana, Persea schiedeana, Persea indica, Persea lingue, Sassafras albidum, Sassafras tzumu, and Sassafras randaiense.

Particularly preferred Lauraceae genera include: Cinnamomum, Laurus, Lindera, Persea, and Sassafras.

Particularly preferred Lauraceae species include: Laurus nobilis, Cinnamomum camphora, Cinnamomum verum, Lindera benzoin, Persea americana, and Sassafras albidum.

A more particularly preferred Lauraceae genus is Persea.

Preferred Persea species include: Persea schiedeana, Persea indica, Persea lingue, and Persea americana.

A most particularly preferred Persea species is Persea americana. A preferred variety is Persea americana cv. Hass.

The term "plant" is intended to include a whole plant, any part of a plant, propagules and progeny of a plant.

The term "propagule" means any part of a plant that may be used in reproduction or propagation, either sexual or asexual, including seeds and cuttings.

DETAILED DESCRIPTION

In this specification where reference has been made to patent specifications, other external documents, or other sources of information, this is generally for the purpose of providing a context for discussing the features of the invention. Unless specifically stated otherwise, reference to such external documents is not to be construed as an admission that such documents, or such sources of information, in any jurisdiction, are prior art, or form part of the common general knowledge in the art.

The term "comprising" as used in this specification means "consisting at least in part of". When interpreting each statement in this specification that includes the term "comprising", features other than that or those prefaced by the term may also be present. Related terms such as "comprise" and "comprises" are to be interpreted in the same manner.

The term "positively regulates anthocyanin production" with respect to transcription factors means that when a plant expresses, or expresses increased levels of, the transcription factor, the result is an increase in anthocyanin production in the plant relative to a suitable control plant. The increased level of expression of the transcription factor may be brought about by genetic manipulation such as transformation with a polynucleotide or genetic construct of the invention. Alternatively the increased expression may be naturally occurring in selected plants from a population.

Suitable control plants include plants of the same species or variety that are not genetically modified to increase expression of the transcription factor, such as plants transformed with a control construct, for example an empty vector construct. Other control plants may include other members of the population from which plants with naturally occurring high expression of the transcription factor are selected.

Polynucleotides and fragments

The term "polynucleotide(s)," as used herein, means a single or double-stranded deoxyribonucleotide or ribonucleotide polymer of any length but preferably at least 15 nucleotides, and includes, as non-limiting examples, coding and non-coding sequences of a gene, sense and antisense sequences, complements, exons, introns, genomic DNA, cDNA, pre-mRNA, mRNA, rRNA, siRNA, miRNA, tRNA, ribozymes, recombinant polypeptides, isolated and purified naturally occurring DNA or RNA sequences, synthetic RNA and DNA sequences, nucleic acid probes, primers and fragments.

A "fragment" of a polynucleotide sequence provided herein is a subsequence of contiguous nucleotides that is capable of specific hybridization to a target of interest, e.g., a sequence that is at least 15 nucleotides in length. The fragments of the invention comprise 15 nucleotides, preferably at least 20 nucleotides, more preferably at least 30 nucleotides, more preferably at least 50 nucleotides and most preferably at least 60 contiguous nucleotides of a polynucleotide of the invention. A fragment of a polynucleotide sequence can be used in antisense, gene silencing, triple helix or ribozyme technology, or as a primer, a probe, included in a microarray, or used in polynucleotide-based selection methods of the invention.

The term "primer" refers to a short polynucleotide, usually having a free 3' OH group, that is hybridized to a template and used for priming polymerization of a polynucleotide complementary to the target.

The term "probe" refers to a short polynucleotide that is used to detect a polynucleotide sequence, that is complementary to the probe, in a hybridization-based assay. The probe may consist of a "fragment" of a polynucleotide as defined herein.

Polypeptides and fragments

The term "polypeptide", as used herein, encompasses amino acid chains of any length but preferably at least 5 amino acids, including full-length proteins, in which amino acid residues are linked by covalent peptide bonds. Polypeptides of the present invention may be purified natural products, or may be produced partially or wholly using recombinant or synthetic techniques. The term may refer to a polypeptide, an aggregate of a polypeptide such as a dimer or other multimer, a fusion polypeptide, a polypeptide fragment, a polypeptide variant, or derivative thereof.

A "fragment" of a polypeptide is a subsequence of the polypeptide that performs a function that is required for the biological activity and/or provides three dimensional structure of the polypeptide. The term may refer to a polypeptide, an aggregate of a polypeptide such as a dimer or other multimer, a fusion polypeptide, a polypeptide fragment, a polypeptide variant, or derivative thereof capable of performing the above enzymatic activity.

The term "isolated" as applied to the polynucleotide or polypeptide sequences disclosed herein is used to refer to sequences that are removed from their natural cellular environment. An isolated molecule may be obtained by any method or combination of methods including biochemical, recombinant, and synthetic techniques. The term "recombinant" refers to a polynucleotide sequence that is removed from sequences that surround it in its natural context and/or is recombined with sequences that are not present in its natural context.

A "recombinant" polypeptide sequence is produced by translation from a "recombinant" polynucleotide sequence.

The term "derived from" with respect to polynucleotides or polypeptides of the invention being derived from a particular genus or species, means that the polynucleotide or polypeptide has the same sequence as a polynucleotide or polypeptide found naturally in that genus or species. The polynucleotide or polypeptide, derived from a particular genus or species, may therefore be produced synthetically or recombinantly.

Variants

As used herein, the term "variant" refers to polynucleotide or polypeptide sequences different from the specifically identified sequences, wherein one or more nucleotides or amino acid residues is deleted, substituted, or added. Variants may be naturally occurring allelic variants, or non-naturally occurring variants. Variants may be from the same or from other species and may encompass homologues, paralogues and orthologues. In certain embodiments, variants of the inventive polynucleotides and polypeptides possess biological activities that are the same or similar to those of the inventive polynucleotides or polypeptides. The term "variant" with reference to polynucleotides and polypeptides encompasses all forms of polynucleotides and polypeptides as defined herein.

Polynucleotide variants

Variant polynucleotide sequences preferably exhibit at least 50%, more preferably at least 51%, more preferably at least 52%, more preferably at least 53%, more preferably at least 54%, more preferably at least 55%, more preferably at least 56%, more preferably at least 57%, more preferably at least 58%, more preferably at least 59%, more preferably at least 60%, more preferably at least 61%, more preferably at least 62%, more preferably at least 63%, more preferably at least 64%, more preferably at least 65%, more preferably at least 66%, more preferably at least 67%, more preferably at least 68%, more preferably at least 69%, more preferably at least 70%, more preferably at least 71%, more preferably at least 72%, more preferably at least 73%, more preferably at least 74%, more preferably at least 75%, more preferably at least 76%, more preferably at least 77%, more preferably at least 78%, more preferably at least 79%, more preferably at least 80%, more preferably at least 81%, more preferably at least 82%, more preferably at least 83%, more preferably at least 84%, more preferably at least 85%, more preferably at least 86%, more preferably at least 87%, more preferably at least 88%, more preferably at least 89%, more preferably at least 90%, more preferably at least 91%, more preferably at least 92%, more preferably at least 93%, more preferably at least 94%, more preferably at least 95%, more preferably at least 96%, more preferably at least 97%, more preferably at least 98%, and most preferably at least 99% identity to a sequence of the present invention. Identity is found over a comparison window of at least 20 nucleotide positions, preferably at least 50 nucleotide positions, more preferably at least 100 nucleotide positions, and most preferably over the entire length of a polynucleotide of the invention.
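As an illustration only, the percent-identity criterion over a comparison window can be sketched in Python as follows. The function names, the treatment of gap columns as mismatches, and the default 70% and 20-position values are assumptions chosen for the example; the sketch expects two sequences that have already been aligned and is not the bl2seq calculation itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned (equal length, '-' marks a gap).
    Assumption: a column containing a gap counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a: str, aligned_b: str,
                             min_identity: float = 70.0,
                             min_window: int = 20) -> bool:
    """True if the window is long enough and identity reaches the threshold."""
    if len(aligned_a) < min_window:
        return False
    return percent_identity(aligned_a, aligned_b) >= min_identity


if __name__ == "__main__":
    subject = "ACGT" * 5             # 20 positions
    candidate = "ACGT" * 4 + "TTTT"  # differs at three positions
    print(percent_identity(subject, candidate))
    print(meets_identity_threshold(subject, candidate))
```

Raising `min_identity` or `min_window` corresponds to the progressively more preferred thresholds listed above.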

Polynucleotide sequence identity can be determined in the following manner. The subject polynucleotide sequence is compared to a candidate polynucleotide sequence using BLASTN (from the BLAST suite of programs, version 2.2.5 [Nov 2002]) in bl2seq (Tatiana A. Tatusova, Thomas L. Madden (1999), "Blast 2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS Microbiol Lett. 174:247-250), which is publicly available from NCBI (ftp://ftp.ncbi.nih.gov/blast/). The default parameters of bl2seq are utilized except that filtering of low complexity parts should be turned off.

The identity of polynucleotide sequences may be examined using the following unix command line parameters:

bl2seq -i nucleotideseq1 -j nucleotideseq2 -F F -p blastn

The parameter -F F turns off filtering of low complexity sections. The parameter -p selects the appropriate algorithm for the pair of sequences. The bl2seq program reports sequence identity as both the number and percentage of identical nucleotides in a line "Identities = ".
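By way of a hedged example, the "Identities = " line of a saved report could be read with a short Python helper such as the one below. The helper name and the sample line are hypothetical, and the assumed layout of the line ("Identities = matches/length (percent%)") should be checked against the output of the BLAST version actually used.

```python
import re

# Assumed layout of the report line: "Identities = 412/500 (82%)".
IDENTITIES_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_identities(report: str):
    """Return (identical, aligned_length, percent) from the first
    'Identities = ' line found in the report text, or None if absent."""
    match = IDENTITIES_RE.search(report)
    if match is None:
        return None
    identical = int(match.group(1))
    aligned = int(match.group(2))
    if aligned == 0:
        return None
    # Recompute the percentage from the counts rather than trusting
    # the rounded figure printed in parentheses.
    return identical, aligned, 100.0 * identical / aligned


if __name__ == "__main__":
    # Hypothetical report line, made up for illustration.
    sample = " Identities = 412/500 (82%), Gaps = 3/500 (0%)"
    print(parse_identities(sample))
```

Recomputing the percentage from the two counts avoids rounding differences when comparing against a threshold such as 70%.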

Polynucleotide sequence identity may also be calculated over the entire length of the overlap between a candidate and subject polynucleotide sequences using global sequence alignment programs (e.g. Needleman, S. B. and Wunsch, C. D. (1970) J. Mol. Biol. 48, 443-453). A full implementation of the Needleman-Wunsch global alignment algorithm is found in the needle program in the EMBOSS package (Rice, P., Longden, I. and Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite, Trends in Genetics June 2000, vol 16, No 6. pp.276-277) which can be obtained from http://www.hgmp.mrc.ac.uk/Software/EMBOSS/. The European Bioinformatics Institute server also provides the facility to perform EMBOSS-needle global alignments between two sequences on line at http://www.ebi.ac.uk/emboss/align/.

Alternatively the GAP program may be used which computes an optimal global alignment of two sequences without penalizing terminal gaps. GAP is described in the following paper: Huang, X. (1994) On Global Sequence Alignment. Computer Applications in the Biosciences 10, 227-235.
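The global alignment approach referred to above can be illustrated with a minimal Needleman-Wunsch sketch in Python. It is a simplified teaching version and not the EMBOSS needle or GAP implementation: the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions for the example, whereas the programs named above use substitution matrices and their own gap penalty schemes.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not the defaults of any named program.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The two aligned strings returned have equal length, so identity over the entire length of the overlap can then be counted column by column.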

Polynucleotide variants of the present invention also encompass those which exhibit a similarity to one or more of the specifically identified sequences that is likely to preserve the functional equivalence of those sequences and which could not reasonably be expected to have occurred by random chance. Such sequence similarity with respect to polypeptides may be determined using the publicly available bl2seq program from the BLAST suite of programs (version 2.2.5 [Nov 2002]) from NCBI (ftp://ftp.ncbi.nih.gov/blast/).

The similarity of polynucleotide sequences may be examined using the following unix command line parameters:

bl2seq -i nucleotideseq1 -j nucleotideseq2 -F F -p tblastx

The parameter -F F turns off filtering of low complexity sections. The parameter -p selects the appropriate algorithm for the pair of sequences. This program finds regions of similarity between the sequences and for each such region reports an "E value" which is the expected number of times one could expect to see such a match by chance in a database of a fixed reference size containing random sequences. The size of this database is set by default in the bl2seq program. For small E values, much less than one, the E value is approximately the probability of such a random match.

Variant polynucleotide sequences preferably exhibit an E value of less than 1 x 10^-6, more preferably less than 1 x 10^-9, more preferably less than 1 x 10^-12, more preferably less than 1 x 10^-15, more preferably less than 1 x 10^-18, more preferably less than 1 x 10^-21, more preferably less than 1 x 10^-30, more preferably less than 1 x 10^-40, more preferably less than 1 x 10^-50, more preferably less than 1 x 10^-60, more preferably less than 1 x 10^-70, more preferably less than 1 x 10^-80, more preferably less than 1 x 10^-90 and most preferably less than 1 x 10^-100 when compared with any one of the specifically identified sequences.
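The statement that a small E value approximates the probability of a random match can be illustrated numerically. The sketch below assumes the Poisson relationship commonly used in BLAST statistics, P = 1 - exp(-E), which is not stated in this specification; the function names and the cutoff passed in the usage example are likewise illustrative assumptions.

```python
import math


def evalue_to_probability(e_value: float) -> float:
    """Probability of at least one chance match, assuming the number of
    chance matches is Poisson distributed with mean e_value."""
    if e_value < 0:
        raise ValueError("E value cannot be negative")
    # -expm1(-E) equals 1 - exp(-E) and stays accurate for very small E.
    return -math.expm1(-e_value)


def passes_cutoff(e_value: float, cutoff: float) -> bool:
    """True if the reported E value is below the chosen cutoff."""
    return e_value < cutoff


if __name__ == "__main__":
    for e in (1.0, 1e-3, 1e-9):
        print(e, evalue_to_probability(e))
    # Illustrative cutoff only.
    print(passes_cutoff(3e-25, 1e-21))
```

For E values much less than one the two quantities are practically indistinguishable, while for E values near one they diverge.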

Alternatively, variant polynucleotides of the present invention hybridize to the specified polynucleotide sequences, or complements thereof under stringent conditions.

The term "hybridize under stringent conditions", and grammatical equivalents thereof, refers to the ability of a polynucleotide molecule to hybridize to a target polynucleotide molecule (such as a target polynucleotide molecule immobilized on a DNA or RNA blot, such as a Southern blot or Northern blot) under defined conditions of temperature and salt concentration. The ability to hybridize under stringent hybridization conditions can be determined by initially hybridizing under less stringent conditions then increasing the stringency to the desired stringency.

With respect to polynucleotide molecules greater than about 100 bases in length, typical stringent hybridization conditions are no more than 25 to 30° C (for example, 10° C) below the melting temperature (Tm) of the native duplex (see generally, Sambrook et al, Eds, 1987, Molecular Cloning, A Laboratory Manual, 2nd Ed. Cold Spring Harbor Press; Ausubel et al, 1987, Current Protocols in Molecular Biology, Greene Publishing). Tm for polynucleotide molecules greater than about 100 bases can be calculated by the formula Tm = 81.5 + 0.41% (G + C) - log (Na+) (Sambrook et al., Eds, 1987, Molecular Cloning, A Laboratory Manual, 2nd Ed. Cold Spring Harbor Press; Bolton and McCarthy, 1962, PNAS 48:1390). Typical stringent conditions for polynucleotides of greater than 100 bases in length would be hybridization conditions such as prewashing in a solution of 6X SSC, 0.2% SDS; hybridizing at 65° C, 6X SSC, 0.2% SDS overnight; followed by two washes of 30 minutes each in 1X SSC, 0.1% SDS at 65° C and two washes of 30 minutes each in 0.2X SSC, 0.1% SDS at 65° C.
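By way of illustration only, the Tm expression recited in the preceding paragraph can be evaluated as in the following Python sketch. The sketch implements the expression as it is written in this specification, reads "log" as a base-10 logarithm and takes the sodium concentration in mol/L; both readings are assumptions. Published forms of this relationship weight the salt term differently, so the result should be checked against the cited references before use. The function names are hypothetical.

```python
import math


def tm_long_duplex(gc_percent: float, na_molar: float) -> float:
    """Tm (deg C) from the expression as written in the text:
    Tm = 81.5 + 0.41 * (%G+C) - log10([Na+]).

    Assumptions: base-10 logarithm, [Na+] in mol/L, duplex > ~100 bases.
    """
    if not 0.0 <= gc_percent <= 100.0:
        raise ValueError("gc_percent must be between 0 and 100")
    if na_molar <= 0.0:
        raise ValueError("na_molar must be positive")
    return 81.5 + 0.41 * gc_percent - math.log10(na_molar)


def stringent_window(tm: float, max_below: float = 30.0) -> tuple[float, float]:
    """Temperature range no more than `max_below` degrees under Tm."""
    return (tm - max_below, tm)


if __name__ == "__main__":
    tm = tm_long_duplex(gc_percent=50.0, na_molar=1.0)
    print(tm, stringent_window(tm))
```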

With respect to polynucleotide molecules having a length less than 100 bases, exemplary stringent hybridization conditions are 5 to 10° C below Tm. On average, the Tm of a polynucleotide molecule of length less than 100 bp is reduced by approximately (500/oligonucleotide length)° C.

With respect to the DNA mimics known as peptide nucleic acids (PNAs) (Nielsen et al., Science. 1991 Dec 6;254(5037):1497-500) Tm values are higher than those for DNA-DNA or DNA-RNA hybrids, and can be calculated using the formula described in Giesen et al., Nucleic Acids Res. 1998 Nov 1;26(21):5004-6. Exemplary stringent hybridization conditions for a DNA-PNA hybrid having a length less than 100 bases are 5 to 10° C below the Tm.

Variant polynucleotides of the present invention also encompass polynucleotides that differ from the sequences of the invention but that, as a consequence of the degeneracy of the genetic code, encode a polypeptide having similar activity to a polypeptide encoded by a polynucleotide of the present invention. A sequence alteration that does not change the amino acid sequence of the polypeptide is a "silent variation". Except for ATG (methionine) and TGG (tryptophan), other codons for the same amino acid may be changed by art recognized techniques, e.g., to optimize codon expression in a particular host organism.
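The notion of a silent variation can be illustrated with the following Python sketch, which assumes the standard genetic code and coding sequences supplied in frame as DNA. The helper names are illustrative and do not form part of the methods described herein.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence ('*' marks a stop codon)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def synonymous_codons(codon: str) -> list[str]:
    """All codons encoding the same amino acid as `codon`."""
    aa = CODON_TABLE[codon.upper()]
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def is_silent_variant(reference: str, variant: str) -> bool:
    """True if the sequences differ but encode identical polypeptides."""
    return (reference.upper() != variant.upper()
            and translate(reference) == translate(variant))


if __name__ == "__main__":
    print(synonymous_codons("ATG"), synonymous_codons("TGG"))
    print(is_silent_variant("ATGGCTTGG", "ATGGCCTGG"))
```

ATG and TGG each return a single codon, consistent with the exception noted above for methionine and tryptophan.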

Polynucleotide sequence alterations resulting in conservative substitutions of one or several amino acids in the encoded polypeptide sequence without significantly altering its biological activity are also included in the invention. A skilled artisan will be aware of methods for making phenotypically silent amino acid substitutions (see, e.g., Bowie et al, 1990, Science 247, 1306).

Variant polynucleotides due to silent variations and conservative substitutions in the encoded polypeptide sequence may be determined using the publicly available bl2seq program from the BLAST suite of programs (version 2.2.5 [Nov 2002]) from NCBI (ftp://ftp.ncbi.nih.gov/blast/) via the tblastx algorithm as previously described.

Polypeptide variants

The term "variant" with reference to polypeptides encompasses naturally occurring, recombinantly and synthetically produced polypeptides. Variant polypeptide sequences preferably exhibit at least 50%, more preferably at least 51%, more preferably at least 52%, more preferably at least 53%, more preferably at least 54%, more preferably at least 55%, more preferably at least 56%, more preferably at least 57%, more preferably at least 58%, more preferably at least 59%, more preferably at least 60%, more preferably at least 61%, more preferably at least 62%, more preferably at least 63%, more preferably at least 64%, more preferably at least 65%, more preferably at least 66%, more preferably at least 67%, more preferably at least 68%, more preferably at least 69%, more preferably at least 70%, more preferably at least 71%, more preferably at least 72%, more preferably at least 73%, more preferably at least 74%, more preferably at least 75%, more preferably at least 76%, more preferably at least 77%, more preferably at least 78%, more preferably at least 79%, more preferably at least 80%, more preferably at least 81%, more preferably at least 82%, more preferably at least 83%, more preferably at least 84%, more preferably at least 85%, more preferably at least 86%, more preferably at least 87%, more preferably at least 88%, more preferably at least 89%, more preferably at least 90%, more preferably at least 91%, more preferably at least 92%, more preferably at least 93%, more preferably at least 94%, more preferably at least 95%, more preferably at least 96%, more preferably at least 97%, more preferably at least 98%, and most preferably at least 99% identity to a sequence of the present invention. Identity is found over a comparison window of at least 20 amino acid positions, preferably at least 50 amino acid positions, more preferably at least 100 amino acid positions, and most preferably over the entire length of a polypeptide of the invention.
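As a simple illustration of percent identity over a comparison window, the following Python sketch scores two sequences that have already been aligned (for example by one of the alignment programs named in this specification). It assumes "-" as the gap character and counts only columns in which both sequences carry a residue; other conventions for the denominator exist, so this is one possible reading rather than the definition used by any particular program.

```python
def percent_identity(aligned_a: str, aligned_b: str,
                     min_window: int = 20, gap: str = "-") -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Raises ValueError if fewer than `min_window` positions are compared.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if len(columns) < min_window:
        raise ValueError("comparison window is shorter than min_window")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment; min_window lowered only so the short example runs.
    print(percent_identity("MKT-LV", "MKTALI", min_window=5))
```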

Polypeptide sequence identity can be determined in the following manner. The subject polypeptide sequence is compared to a candidate polypeptide sequence using BLASTP (from the BLAST suite of programs, version 2.2.5 [Nov 2002]) in bl2seq, which is publicly available from NCBI (ftp://ftp.ncbi.nih.gov/blast/). The default parameters of bl2seq are utilized except that filtering of low complexity regions should be turned off.

Polypeptide sequence identity may also be calculated over the entire length of the overlap between candidate and subject polypeptide sequences using global sequence alignment programs. EMBOSS-needle (available at http://www.ebi.ac.uk/emboss/align/) and GAP (Huang, X. (1994) On Global Sequence Alignment. Computer Applications in the Biosciences 10, 227-235.) as discussed above are also suitable global sequence alignment programs for calculating polypeptide sequence identity. A preferred method for calculating polypeptide sequence identity is based on aligning sequences to be compared using Clustal W (Thompson et al 1994, Nucleic Acids Res 22 (22) 4673-4680).

Polypeptide variants of the present invention also encompass those which exhibit a similarity to one or more of the specifically identified sequences that is likely to preserve the functional equivalence of those sequences and which could not reasonably be expected to have occurred by random chance. Such sequence similarity with respect to polypeptides may be determined using the publicly available bl2seq program from the BLAST suite of programs (version 2.2.5 [Nov 2002]) from NCBI (ftp://ftp.ncbi.nih.gov/blast/). The similarity of polypeptide sequences may be examined using the following unix command line parameters:

bl2seq -i peptideseq1 -j peptideseq2 -F F -p blastp

Variant polypeptide sequences preferably exhibit an E value of less than 1 x 10^-6, more preferably less than 1 x 10^-9, more preferably less than 1 x 10^-12, more preferably less than 1 x 10^-15, more preferably less than 1 x 10^-18, more preferably less than 1 x 10^-21, more preferably less than 1 x 10^-30, more preferably less than 1 x 10^-40, more preferably less than 1 x 10^-50, more preferably less than 1 x 10^-60, more preferably less than 1 x 10^-70, more preferably less than 1 x 10^-80, more preferably less than 1 x 10^-90 and most preferably 1 x 10^-100 when compared with any one of the specifically identified sequences.

The parameter -F F turns off filtering of low complexity sections. The parameter -p selects the appropriate algorithm for the pair of sequences. This program finds regions of similarity between the sequences and for each such region reports an "E value" which is the expected number of times one could expect to see such a match by chance in a database of a fixed reference size containing random sequences. For small E values, much less than one, this is approximately the probability of such a random match.
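The command line parameters described above can be assembled programmatically, for example as in the following Python sketch. It uses only the options recited in this specification (-i, -j, -F and -p). The program values "tblastx" and "blastp" appear in the command lines above; the remaining lower-case program names, the file names and the function name are assumptions made for illustration. The sketch only builds the argument list and does not run the program.

```python
import shlex

# "tblastx" and "blastp" are taken from the command lines in the text;
# the other lower-case names are assumed by analogy.
ALLOWED_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def build_bl2seq_command(seq1_path: str, seq2_path: str, program: str,
                         filter_low_complexity: bool = False) -> list[str]:
    """Return the bl2seq argument list for comparing two sequence files."""
    if program not in ALLOWED_PROGRAMS:
        raise ValueError(f"unsupported program: {program}")
    return [
        "bl2seq",
        "-i", seq1_path,                                # first sequence
        "-j", seq2_path,                                # second sequence
        "-F", "T" if filter_low_complexity else "F",    # low-complexity filter
        "-p", program,                                  # algorithm selection
    ]


if __name__ == "__main__":
    cmd = build_bl2seq_command("peptideseq1", "peptideseq2", "blastp")
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems if it is later passed to a process launcher.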

Conservative substitutions of one or several amino acids of a described polypeptide sequence without significantly altering its biological activity are also included in the invention. A skilled artisan will be aware of methods for making phenotypically silent amino acid substitutions (see, e.g., Bowie et al, 1990, Science 247, 1306).

Constructs, vectors and components thereof

The term "genetic construct" refers to a polynucleotide molecule, usually double-stranded DNA, which may have inserted into it another polynucleotide molecule (the insert polynucleotide molecule) such as, but not limited to, a cDNA molecule. A genetic construct may contain the necessary elements that permit transcribing the insert polynucleotide molecule, and, optionally, translating the transcript into a polypeptide. The insert polynucleotide molecule may be derived from the host cell, or may be derived from a different cell or organism and/or may be a recombinant polynucleotide. Once inside the host cell the genetic construct may become integrated in the host chromosomal DNA. The genetic construct may be linked to a vector.

The term "vector" refers to a polynucleotide molecule, usually double-stranded DNA, which is used to transport the genetic construct into a host cell. The vector may be capable of replication in at least one additional host system, such as E. coli.

The term "expression construct" refers to a genetic construct that includes the necessary elements that permit transcribing the insert polynucleotide molecule, and, optionally, translating the transcript into a polypeptide. An expression construct typically comprises in a 5' to 3' direction: a) a promoter functional in the host cell into which the construct will be transformed, b) the polynucleotide to be expressed, and c) a terminator functional in the host cell into which the construct will be transformed.

The term "coding region" or "open reading frame" (ORF) refers to the sense strand of a genomic DNA sequence or a cDNA sequence that is capable of producing a transcription product and/or a polypeptide under the control of appropriate regulatory sequences. The coding sequence is identified by the presence of a 5' translation start codon and a 3' translation stop codon. When inserted into a genetic construct, a "coding sequence" is capable of being expressed when it is operably linked to promoter and terminator sequences. "Operably-linked" means that the sequence to be expressed is placed under the control of regulatory elements that include promoters, tissue-specific regulatory elements, temporal regulatory elements, enhancers, repressors and terminators.
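The identification of a coding sequence by a 5' start codon and an in-frame 3' stop codon can be sketched as follows. This Python example assumes ATG as the start codon and TAA, TAG and TGA as stop codons (the standard genetic code), scans the given strand only, and ignores introns; it is an illustration, not a gene prediction method.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the given strand.

    Each ORF runs from an ATG to the first in-frame stop codon, inclusive;
    `end` is exclusive, so seq[start:end] is the ORF. `min_codons` counts
    codons before the stop. Nested ATGs yield overlapping ORFs.
    """
    seq = seq.upper()
    orfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in the frame set by this start codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


if __name__ == "__main__":
    dna = "CCATGGCTTAAGG"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```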

The term "noncoding region" refers to untranslated sequences that are upstream of the translational start site and downstream of the translational stop site. These sequences are also referred to respectively as the 5' UTR and the 3' UTR. These regions include elements required for transcription initiation and termination and for regulation of translation efficiency.

Terminators are sequences, which terminate transcription, and are found in the 3' untranslated ends of genes downstream of the translated sequence. Terminators are important determinants of mRNA stability and in some cases have been found to have spatial regulatory functions.

The term "promoter" refers to nontranscribed cis-regulatory elements upstream of the coding region that regulate gene transcription. Promoters comprise cis-initiator elements which specify the transcription initiation site and conserved boxes such as the TATA box, and motifs that are bound by transcription factors.

A "transgene" is a polynucleotide that is taken from one organism and introduced into a different organism by transformation. The transgene may be derived from the same species or from a different species as the species of the organism into which the transgene is introduced.

A "transgenic plant" refers to a plant which contains new genetic material as a result of genetic manipulation or transformation. The new genetic material may be derived from a plant of the same species as the resulting transgenic plant or from a different species.

An "inverted repeat" is a sequence that is repeated, where the second half of the repeat is in the complementary strand, e.g.,

(5')GATCTA TAGATC(3')

(3')CTAGAT ATCTAG(5')

Read-through transcription will produce a transcript that undergoes complementary base-pairing to form a hairpin structure provided that there is a 3-5 bp spacer between the repeated regions.
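The inverted repeat shown above can be checked computationally: the second half of the repeat is the reverse complement of the first. The following Python sketch illustrates this together with the 3-5 bp spacer condition; the function names and the spacer sequence in the usage example are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_inverted_repeat(left: str, right: str) -> bool:
    """True if `right` is the reverse complement of `left`."""
    return right.upper() == reverse_complement(left)


def can_form_hairpin(left: str, spacer: str, right: str,
                     min_spacer: int = 3, max_spacer: int = 5) -> bool:
    """Inverted repeat plus a spacer within the stated 3-5 bp range."""
    return (is_inverted_repeat(left, right)
            and min_spacer <= len(spacer) <= max_spacer)


if __name__ == "__main__":
    print(reverse_complement("GATCTA"))
    print(can_form_hairpin("GATCTA", "ACGT", "TAGATC"))
```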

The term "regulating anthocyanin production" is intended to include both increasing and decreasing anthocyanin production. Preferably the term refers to increasing anthocyanin production. Anthocyanins that may be regulated include but are not limited to cyanidin-3-glucoside, cyanidin-3-O-rutinoside, cyanidin-3-galactoside and cyanidin-3-pentoside.

The terms "to alter expression of" and "altered expression" of a polynucleotide or polypeptide of the invention, are intended to encompass the situation where genomic DNA corresponding to a polynucleotide of the invention is modified thus leading to altered expression of a polynucleotide or polypeptide of the invention. Modification of the genomic DNA may be through genetic transformation or other methods known in the art for inducing mutations. The "altered expression" can be related to an increase or decrease in the amount of messenger RNA and/or polypeptide produced and may also result in altered activity of a polypeptide due to alterations in the sequence of a polynucleotide and polypeptide produced.

The applicants have identified a polynucleotide cDNA sequence (SEQ ID NO: 2) and a polynucleotide genomic sequence (SEQ ID NO: 3) which encode a polypeptide (SEQ ID NO: 1) from the avocado species Persea americana cv. Hass. The polypeptide is a MYB R2R3 transcription factor that positively regulates anthocyanin production in plants. The invention provides fragments and variants of the sequences. The transcription factor also positively regulates the promoters of genes encoding enzymes in the anthocyanin biosynthetic pathway in plants. A summary of the relationship between the polynucleotides and polypeptides is found in Table 2 (Summary of Sequences).

The invention provides genetic constructs, vectors and plants comprising the polynucleotide sequences. The invention also provides plants comprising the genetic constructs and vectors of the invention.

The invention provides plants altered, relative to suitable control plants, in production of anthocyanin pigments. The invention provides both plants with increased and decreased production of anthocyanin pigments. The invention also provides methods for the production of such plants and methods for the selection of such plants.

Suitable control plants may include non-transformed plants of the same species and variety, or plants of the same species or variety transformed with a control construct, such as an empty vector construct.

Uses of the compositions of the invention include the production of fruit, or other plant parts, with increased levels of anthocyanin pigmentation, for example production of apples with red skin and/or red flesh.

The invention also provides methods for selecting transformed plant cells and plants by selecting plant cells and plants which have increased anthocyanin pigment, the increased anthocyanic pigment indicating that the plants are transformed to express a polynucleotide or polypeptide of the invention.

Methods for isolating or producing polynucleotides

The polynucleotide molecules of the invention can be isolated by using a variety of techniques known to those of ordinary skill in the art. By way of example, such polynucleotides can be isolated through use of the polymerase chain reaction (PCR) described in Mullis et al., Eds. 1994 The Polymerase Chain Reaction, Birkhauser, incorporated herein by reference. The polynucleotides of the invention can be amplified using primers, as defined herein, derived from the polynucleotide sequences of the invention.

Further methods for isolating polynucleotides of the invention include use of all, or portions of, the polynucleotides having the sequence set forth herein as hybridization probes. The technique of hybridizing labelled polynucleotide probes to polynucleotides immobilized on solid supports such as nitrocellulose filters or nylon membranes, can be used to screen the genomic or cDNA libraries. Exemplary hybridization and wash conditions are: hybridization for 20 hours at 65°C in 5.0 X SSC, 0.5% sodium dodecyl sulfate, 1 X Denhardt's solution; washing (three washes of twenty minutes each at 55°C) in 1.0 X SSC, 1% (w/v) sodium dodecyl sulfate, and optionally one wash (for twenty minutes) in 0.5 X SSC, 1% (w/v) sodium dodecyl sulfate, at 60°C. An optional further wash (for twenty minutes) can be conducted under conditions of 0.1 X SSC, 1% (w/v) sodium dodecyl sulfate, at 60°C.

The polynucleotide fragments of the invention may be produced by techniques well-known in the art such as restriction endonuclease digestion, oligonucleotide synthesis and PCR amplification.

A partial polynucleotide sequence may be used, in methods well-known in the art, to identify the corresponding full length polynucleotide sequence. Such methods include PCR-based methods, 5'RACE (Frohman MA, 1993, Methods Enzymol. 218: 340-56), hybridization-based methods and computer/database-based methods. Further, by way of example, inverse PCR permits acquisition of unknown sequences, flanking the polynucleotide sequences disclosed herein, starting with primers based on a known region (Triglia et al., 1988, Nucleic Acids Res 16, 8186, incorporated herein by reference). The method uses several restriction enzymes to generate a suitable fragment in the known region of a gene. The fragment is then circularized by intramolecular ligation and used as a PCR template. Divergent primers are designed from the known region. In order to physically assemble full-length clones, standard molecular biology approaches can be utilized (Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Press, 1987).

It may be beneficial, when producing a transgenic plant from a particular species, to transform such a plant with a sequence or sequences derived from that species. The benefit may be to alleviate public concerns regarding cross-species transformation in generating transgenic organisms. Additionally when down-regulation of a gene is the desired result, it may be necessary to utilise a sequence identical (or at least highly similar) to that in the plant, for which reduced expression is desired. For these reasons among others, it is desirable to be able to identify and isolate orthologues of a particular gene in several different plant species. Variants (including orthologues) may be identified by the methods described.

Methods for identifying variants

Physical methods

Variant polynucleotides may be identified using PCR-based methods (Mullis et al, Eds. 1994 The Polymerase Chain Reaction, Birkhauser). Typically, the polynucleotide sequence of a primer, useful to amplify variants of polynucleotide molecules of the invention by PCR, may be based on a sequence encoding a conserved region of the corresponding amino acid sequence.

Alternatively, library screening methods, well known to those skilled in the art, may be employed (Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Press, 1987). When identifying variants of the probe sequence, hybridization and/or wash stringency will typically be reduced relative to when exact sequence matches are sought.

Polypeptide variants may also be identified by physical methods, for example by screening expression libraries using antibodies raised against polypeptides of the invention (Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Press, 1987) or by identifying polypeptides from natural sources with the aid of such antibodies.

Computer based methods

The variant sequences of the invention, including both polynucleotide and polypeptide variants, may also be identified by computer-based methods well-known to those skilled in the art, using public domain sequence alignment algorithms and sequence similarity search tools to search sequence databases (public domain databases include Genbank, EMBL, Swiss-Prot, PIR and others). See, e.g., Nucleic Acids Res. 29: 1-10 and 11-16, 2001 for examples of online resources. Similarity searches retrieve and align target sequences for comparison with a sequence to be analyzed (i.e., a query sequence). Sequence comparison algorithms use scoring matrices to assign an overall score to each of the alignments.

An exemplary family of programs useful for identifying variants in sequence databases is the BLAST suite of programs (version 2.2.5 [Nov 2002]) including BLASTN, BLASTP, BLASTX, tBLASTN and tBLASTX, which are publicly available from (ftp://ftp.ncbi.nih.gov/blast/) or from the National Center for Biotechnology Information (NCBI), National Library of Medicine, Building 38A, Room 8N805, Bethesda, MD 20894 USA. The NCBI server also provides the facility to use the programs to screen a number of publicly available sequence databases. BLASTN compares a nucleotide query sequence against a nucleotide sequence database. BLASTP compares an amino acid query sequence against a protein sequence database. BLASTX compares a nucleotide query sequence translated in all reading frames against a protein sequence database. tBLASTN compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames. tBLASTX compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database. The BLAST programs may be used with default parameters or the parameters may be altered as required to refine the screen.
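The choice among the five programs described above follows directly from the type of the query and of the database. A minimal Python sketch of that selection logic is given below; the function name, the argument values "nucleotide" and "protein", and the translate_both flag are illustrative assumptions and do not correspond to any BLAST option.

```python
def choose_blast_program(query: str, database: str,
                         translate_both: bool = False) -> str:
    """Pick a BLAST program from the query and database sequence types.

    `query` and `database` are "nucleotide" or "protein". Set
    `translate_both` to compare six-frame translations of a nucleotide
    query against six-frame translations of a nucleotide database.
    """
    key = (query, database)
    if key == ("nucleotide", "nucleotide"):
        return "tBLASTX" if translate_both else "BLASTN"
    if translate_both:
        raise ValueError("translate_both requires nucleotide query and database")
    table = {
        ("protein", "protein"): "BLASTP",
        ("nucleotide", "protein"): "BLASTX",    # query translated
        ("protein", "nucleotide"): "tBLASTN",   # database translated
    }
    try:
        return table[key]
    except KeyError:
        raise ValueError(f"unknown sequence types: {key}") from None


if __name__ == "__main__":
    print(choose_blast_program("nucleotide", "protein"))
```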

The use of the BLAST family of algorithms, including BLASTN, BLASTP, and BLASTX, is described in the publication of Altschul et al., Nucleic Acids Res. 25: 3389-3402, 1997.

The "hits" to one or more database sequences by a queried sequence produced by BLASTN, BLASTP, BLASTX, tBLASTN, tBLASTX, or a similar algorithm, align and identify similar portions of sequences. The hits are arranged in order of the degree of similarity and the length of sequence overlap. Hits to a database sequence generally represent an overlap over only a fraction of the sequence length of the queried sequence.

The BLASTN, BLASTP, BLASTX, tBLASTN and tBLASTX algorithms also produce "Expect" values for alignments. The Expect value (E) indicates the number of hits one can "expect" to see by chance when searching a database of the same size containing random contiguous sequences. The Expect value is used as a significance threshold for determining whether the hit to a database indicates true similarity. For example, an E value of 0.1 assigned to a polynucleotide hit is interpreted as meaning that in a database of the size of the database screened, one might expect to see 0.1 matches over the aligned portion of the sequence with a similar score simply by chance. For sequences having an E value of 0.01 or less over aligned and matched portions, the probability of finding a match by chance in that database is 1% or less using the BLASTN, BLASTP, BLASTX, tBLASTN or tBLASTX algorithm.

Multiple sequence alignments of a group of related sequences can be carried out with CLUSTALW (Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680, http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/Top.html) or T-COFFEE (Cedric Notredame, Desmond G. Higgins, Jaap Heringa, T-Coffee: A novel method for fast and accurate multiple sequence alignment, J. Mol. Biol. (2000) 302: 205-217) or PILEUP, which uses progressive, pairwise alignments (Feng and Doolittle, 1987, J. Mol. Evol. 25, 351).

Pattern recognition software applications are available for finding motifs or signature sequences. For example, MEME (Multiple Em for Motif Elicitation) finds motifs and signature sequences in a set of sequences, and MAST (Motif Alignment and Search Tool) uses these motifs to identify similar or the same motifs in query sequences. The MAST results are provided as a series of alignments with appropriate statistical data and a visual overview of the motifs found. MEME and MAST were developed at the University of California, San Diego.

PROSITE (Bairoch and Bucher, 1994, Nucleic Acids Res. 22, 3583; Hofmann et al., 1999, Nucleic Acids Res. 27, 215) is a method of identifying the functions of uncharacterized proteins translated from genomic or cDNA sequences. The PROSITE database (www.expasy.org/prosite) contains biologically significant patterns and profiles and is designed so that it can be used with appropriate computational tools to assign a new sequence to a known family of proteins or to determine which known domain(s) are present in the sequence (Falquet et al., 2002, Nucleic Acids Res. 30, 235). Prosearch is a tool that can search SWISS-PROT and EMBL databases with a given sequence pattern or signature.
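To illustrate how a sequence pattern of the PROSITE type can be applied to a sequence, the following Python sketch converts a subset of the PROSITE pattern syntax into a regular expression. The supported subset (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) and (n,m) for repetition, and < and > as terminal anchors on a whole element) is an assumption of this sketch; it is not the PROSITE scanning software. The example pattern is the widely documented N-glycosylation site pattern and is unrelated to the sequences of the invention.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (2) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"cannot parse element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # excluded residues
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}")
    print(regex, bool(re.search(regex, "AANGSAA")))
```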

The function of a variant polynucleotide of the invention as encoding a transcription factor capable of regulating pigment production in a plant can be tested. Such transcription factors can be tested for the ability to regulate expression of known anthocyanin biosynthesis genes by methods known to those skilled in the art (e.g. Example 3) or can be tested for their capability to regulate pigment production by methods known to those skilled in the art (e.g. WO07/027105).

Methods for isolating polypeptides

The polypeptides of the invention, including variant polypeptides, may be prepared using peptide synthesis methods well known in the art such as direct peptide synthesis using solid phase techniques (e.g. Stewart et al., 1969, in Solid-Phase Peptide Synthesis, WH Freeman Co, San Francisco California), or automated synthesis, for example using an Applied Biosystems 431A Peptide Synthesizer (Foster City, California). Mutated forms of the polypeptides may also be produced during such syntheses.

The polypeptides and variant polypeptides of the invention may also be purified from natural sources using a variety of techniques that are well known in the art (e.g. Deutscher, 1990, Ed, Methods in Enzymology, Vol. 182, Guide to Protein Purification).

Alternatively the polypeptides and variant polypeptides of the invention may be expressed recombinantly in suitable host cells and separated from the cells as discussed below.

Methods for producing constructs and vectors

The genetic constructs of the present invention comprise one or more polynucleotide sequences of the invention and/or polynucleotides encoding polypeptides of the invention, and may be useful for transforming, for example, bacterial, fungal, insect, mammalian or plant organisms. The genetic constructs of the invention are intended to include expression constructs as herein defined.

Methods for producing and using genetic constructs and vectors are well known in the art and are described generally in Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Press, 1987; and Ausubel et al, Current Protocols in Molecular Biology, Greene Publishing, 1987.

Methods for producing host cells comprising polynucleotides, constructs or vectors

The invention provides a host cell which comprises a genetic construct or vector of the invention. Host cells may be derived from, for example, bacterial, fungal, insect, mammalian or plant organisms. Host cells comprising genetic constructs, such as expression constructs, of the invention are useful in methods well known in the art (e.g. Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Press, 1987; Ausubel et al, Current Protocols in Molecular Biology, Greene Publishing, 1987) for recombinant production of polypeptides of the invention. Such methods may involve the culture of host cells in an appropriate medium in conditions suitable for or conducive to expression of a polypeptide of the invention. The expressed recombinant polypeptide, which may optionally be secreted into the culture, may then be separated from the medium, host cells or culture medium by methods well known in the art (e.g. Deutscher, Ed, 1990, Methods in Enzymology, Vol 182, Guide to Protein Purification).

Methods for producing plant cells and plants comprising constructs and vectors

The invention further provides plant cells which comprise a genetic construct of the invention, and plant cells modified to alter expression of a polynucleotide or polypeptide of the invention. Plants comprising such cells also form an aspect of the invention.

Production of plants altered in pigment production may be achieved through methods of the invention. Such methods may involve the transformation of plant cells and plants, with a construct of the invention designed to alter expression of a polynucleotide or polypeptide capable of regulating pigment production in such plant cells and plants. Such methods also include the transformation of plant cells and plants with a combination of the construct of the invention and one or more other constructs designed to alter expression of one or more polynucleotides or polypeptides capable of regulating pigment production in such plant cells and plants.

Methods for transforming plant cells, plants and portions thereof with polynucleotides are described in Draper et al., 1988, Plant Genetic Transformation and Gene Expression. A Laboratory Manual. Blackwell Sci. Pub. Oxford, p. 365; Potrykus and Spangenburg, 1995, Gene Transfer to Plants. Springer-Verlag, Berlin; and Gelvin et al, 1993, Plant Molecular Biol. Manual. Kluwer Acad. Pub. Dordrecht. A review of transgenic plants, including transformation techniques, is provided in Galun and Breiman, 1997, Transgenic Plants. Imperial College Press, London.

Methods for genetic manipulation of plants

A number of plant transformation strategies are available (e.g. Birch, 1997, Ann Rev Plant Phys Plant Mol Biol, 48, 297). For example, strategies may be designed to increase expression of a polynucleotide/polypeptide in a plant cell, organ and/or at a particular developmental stage where/when it is normally expressed or to ectopically express a polynucleotide/polypeptide in a cell, tissue, organ and/or at a particular developmental stage which/when it is not normally expressed. The expressed polynucleotide/polypeptide may be derived from the plant species to be transformed or may be derived from a different plant species.

Transformation strategies may be designed to reduce expression of a polynucleotide/polypeptide in a plant cell, tissue, organ or at a particular developmental stage which/when it is normally expressed. Such strategies are known as gene silencing strategies.

Genetic constructs for expression of genes in transgenic plants typically include promoters for driving the expression of one or more cloned polynucleotides, terminators and selectable marker sequences to detect the presence of the genetic construct in the transformed plant.

The promoters suitable for use in the constructs of this invention are functional in a cell, tissue or organ of a monocot or dicot plant and include cell-, tissue- and organ-specific promoters, cell cycle specific promoters, temporal promoters, inducible promoters, constitutive promoters that are active in most plant tissues, and recombinant promoters. Choice of promoter will depend upon the desired temporal and spatial expression of the cloned polynucleotide. The promoters may be those normally associated with a transgene of interest, or promoters which are derived from genes of other plants, viruses, and plant pathogenic bacteria and fungi. Those skilled in the art will, without undue experimentation, be able to select promoters that are suitable for use in modifying and modulating plant traits using genetic constructs comprising the polynucleotide sequences of the invention. Examples of constitutive plant promoters include the CaMV 35S promoter, the nopaline synthase promoter, the octopine synthase promoter, and the Ubi 1 promoter from maize. Plant promoters which are active in specific tissues, or which respond to internal developmental signals or external abiotic or biotic stresses, are described in the scientific literature. Exemplary promoters are described, e.g., in WO 02/00894, which is herein incorporated by reference.

Exemplary terminators that are commonly used in plant transformation genetic constructs include, e.g., the cauliflower mosaic virus (CaMV) 35S terminator, the Agrobacterium tumefaciens nopaline synthase or octopine synthase terminators, the Zea mays zein gene terminator, the Oryza sativa ADP-glucose pyrophosphorylase terminator and the Solanum tuberosum PI-II terminator.

Selectable markers commonly used in plant transformation include the neomycin phosphotransferase II gene (NPT II), which confers kanamycin resistance; the aadA gene, which confers spectinomycin and streptomycin resistance; the phosphinothricin acetyl transferase gene (bar gene) for Ignite (AgrEvo) and Basta (Hoechst) resistance; and the hygromycin phosphotransferase gene (hpt) for hygromycin resistance.

Use of genetic constructs comprising reporter genes (coding sequences which express an activity that is foreign to the host, usually an enzymatic activity and/or a visible signal, e.g., luciferase, GUS, GFP), which may be used for promoter expression analysis in plants and plant tissues, is also contemplated. The reporter gene literature is reviewed in Herrera-Estrella et al., 1993, Nature 303, 209, and Schrott, 1995, In: Gene Transfer to Plants (Potrykus, T., Spangenberg, Eds), Springer-Verlag, Berlin, pp. 325-336.

Gene silencing strategies may be focused on the gene itself or on regulatory elements which affect expression of the encoded polypeptide. "Regulatory elements" is used herein in the widest possible sense and includes other genes which interact with the gene of interest.

Genetic constructs designed to decrease or silence the expression of a polynucleotide/polypeptide of the invention may include an antisense copy of a polynucleotide of the invention. In such constructs the polynucleotide is placed in an antisense orientation with respect to the promoter and terminator.

An "antisense" polynucleotide is obtained by inverting a polynucleotide or a segment of the polynucleotide so that the transcript produced will be complementary to the mRNA transcript of the gene, e.g.,

5'GATCTA 3' (coding strand)     3'CTAGAT 5' (antisense strand)

3'CUAGAU 5' mRNA     5'GAUCUCG 3' antisense RNA

Genetic constructs designed for gene silencing may also include an inverted repeat. An 'inverted repeat' is a sequence that is repeated where the second half of the repeat is in the complementary strand, e.g.,

5'-GATCTA TAGATC-3'

3'-CTAGAT ATCTAG-5'

The transcript formed may undergo complementary base pairing to form a hairpin structure. Usually a spacer of at least 3-5 bp between the repeated region is required to allow hairpin formation.
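The inverted repeat described above can be sketched in a few lines of Python. This is only an illustration: the function names and the four-base spacer are hypothetical and are not taken from the specification. The second arm is the reverse complement of the first, which is what allows the transcript to fold back on itself as a hairpin.

```python
# Illustrative sketch only; function names and the spacer are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_repeat(arm: str, spacer: str = "") -> str:
    """Build arm + spacer + reverse complement of arm, read 5' to 3'."""
    return arm.upper() + spacer.upper() + reverse_complement(arm)


def is_inverted_repeat(seq: str, arm_len: int) -> bool:
    """Check that the last arm_len bases are the reverse complement of the first."""
    seq = seq.upper()
    return reverse_complement(seq[:arm_len]) == seq[-arm_len:]


if __name__ == "__main__":
    # The arm from the text, without and with a hypothetical 4-base spacer.
    print(inverted_repeat("GATCTA"))
    print(inverted_repeat("GATCTA", spacer="AAAA"))
```

With the arm GATCTA and no spacer, the function reproduces the top strand shown in the example, GATCTA followed by TAGATC.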

Another silencing approach involves the use of a small antisense RNA targeted to the transcript, equivalent to an miRNA (Llave et al., 2002, Science 297, 2053). Use of such small antisense RNA corresponding to a polynucleotide of the invention is expressly contemplated.

The term genetic construct as used herein also includes small antisense RNAs and other such polynucleotides effecting gene silencing.

Transformation with an expression construct, as herein defined, may also result in gene silencing through a process known as sense suppression (e.g. Napoli et al., 1990, Plant Cell 2, 279; de Carvalho Niebel et al., 1995, Plant Cell 7, 347). In some cases sense suppression may involve over-expression of the whole or a partial coding sequence, but may also involve expression of a non-coding region of the gene, such as an intron or a 5' or 3' untranslated region (UTR). Chimeric partial sense constructs can be used to coordinately silence multiple genes (Abbott et al., 2002, Plant Physiol. 128(3): 844-53; Jones et al., 1998, Planta 204: 499-505). The use of such sense suppression strategies to silence the expression of a polynucleotide of the invention is also contemplated.

The polynucleotide inserts in genetic constructs designed for gene silencing may correspond to coding sequence and/or non-coding sequence, such as promoter and/or intron and/or 5' or 3' UTR sequence, of the corresponding gene. Other gene silencing strategies include dominant negative approaches and the use of ribozyme constructs (McIntyre, 1996, Transgenic Res, 5, 257).

Pre-transcriptional silencing may be brought about through mutation of the gene itself or its regulatory elements. Such mutations may include point mutations, frameshifts, insertions, deletions and substitutions.

The following are representative publications disclosing genetic transformation protocols that can be used to genetically transform the following plant species: rice (Alam et al., 1999, Plant Cell Rep. 18, 572); apple (Yao et al., 1995, Plant Cell Reports 14, 407-412); maize (US Patent Serial Nos. 5,177,010 and 5,981,840); wheat (Ortiz et al., 1996, Plant Cell Rep. 15, 877); tomato (US Patent Serial No. 5,159,135); potato (Kumar et al., 1996, Plant J. 9, 821); cassava (Li et al., 1996, Nat. Biotechnology 14, 736); lettuce (Michelmore et al., 1987, Plant Cell Rep. 6, 439); tobacco (Horsch et al., 1985, Science 227, 1229); cotton (US Patent Serial Nos. 5,846,797 and 5,004,863); grasses (US Patent Nos. 5,187,073 and 6,020,539); peppermint (Niu et al., 1998, Plant Cell Rep. 17, 165); citrus plants (Pena et al., 1995, Plant Sci. 104, 183); caraway (Krens et al., 1997, Plant Cell Rep. 17, 39); banana (US Patent Serial No. 5,792,935); soybean (US Patent Nos. 5,416,011; 5,569,834; 5,824,877; 5,563,04455 and 5,968,830); pineapple (US Patent Serial No. 5,952,543); poplar (US Patent No. 4,795,855); monocots in general (US Patent Nos. 5,591,616 and 6,037,522); brassica (US Patent Nos. 5,188,958; 5,463,174 and 5,750,871); cereals (US Patent No. 6,074,877); pear (Matsuda et al., 2005, Plant Cell Rep. 24(1):45-51); Prunus (Ramesh et al., 2006, Plant Cell Rep. 25(8):821-8; Song and Sink, 2005, Plant Cell Rep. 2006; 25(2):117-23; Gonzalez Padilla et al., 2003, Plant Cell Rep. 22(1):38-45); strawberry (Oosumi et al., 2006, Planta 223(6):1219-30; Folta et al., 2006, Planta, 2006 Apr 14; PMID: 16614818); rose (Li et al., 2003, Planta 218(2):226-32); Rubus (Graham et al., 1995, Methods Mol Biol. 44:129-33) and avocado (Cruz-Hernandez et al., 1998, Plant Cell Reports 17, 497-503). Transformation of other species is also contemplated by the invention. Suitable methods and protocols for transformation of other species are available in the scientific literature.

Several further methods known in the art may be employed to alter expression of a polynucleotide and/or polypeptide of the invention. Such methods include but are not limited to TILLING (Till et al., 2003, Methods Mol Biol, 236, 205), so-called "Deletagene" technology (Li et al., 2001, Plant Journal 27(3), 235) and the use of artificial transcription factors such as synthetic zinc finger transcription factors (e.g. Jouvenot et al., 2003, Gene Therapy 10, 513). Additionally, antibodies or fragments thereof, targeted to a particular polypeptide, may also be expressed in plants to modulate the activity of that polypeptide (Jobling et al., 2003, Nat. Biotechnol., 21(1), 35). Transposon tagging approaches may also be applied. Additionally, peptides interacting with a polypeptide of the invention may be identified through technologies such as phage display (Dyax Corporation). Such interacting peptides may be expressed in or applied to a plant to affect activity of a polypeptide of the invention. Use of each of the above approaches in alteration of expression of a polynucleotide and/or polypeptide of the invention is specifically contemplated.

Methods of selecting plants

Methods are also provided for selecting plants with altered pigment production. Such methods involve testing plants for altered expression of a polynucleotide or polypeptide of the invention. Such methods may be applied at a young age or early developmental stage, when the altered pigment production may not necessarily be visible, to accelerate breeding programs directed toward improving anthocyanin content.

The expression of a polynucleotide, such as a messenger RNA, is often used as an indicator of expression of a corresponding polypeptide. Exemplary methods for measuring the expression of a polynucleotide include but are not limited to Northern analysis, RT-PCR and dot-blot analysis (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Press, 1987). Polynucleotides or portions of the polynucleotides of the invention are thus useful as probes or primers, as herein defined, in methods for the identification of plants with altered levels of anthocyanin. The polynucleotides of the invention may be used as probes in hybridization experiments, or as primers in PCR based experiments, designed to identify such plants.

Alternatively, antibodies may be raised against polypeptides of the invention. Methods for raising and using antibodies are standard in the art (see for example: Antibodies, A Laboratory Manual, Harlow & Lane, Eds, Cold Spring Harbor Laboratory, 1998). Such antibodies may be used in methods to detect altered expression of polypeptides which modulate pigment production in plants. Such methods may include ELISA (Kemeny, 1991, A Practical Guide to ELISA, NY Pergamon Press) and Western analysis (Towbin & Gordon, 1994, J Immunol Methods, 72, 313).

These approaches for analysis of polynucleotide or polypeptide expression and the selection of plants with altered expression are useful in conventional breeding programs designed to produce varieties with altered pigment production.

Mapping applications

The methods of the invention encompass detection of polymorphisms and/or mutations associated with altered expression and/or activity of the genes corresponding to the polynucleotides of the invention. That is, the invention encompasses methods for detection of allelic variants with altered expression and/or activity of the genes corresponding to the polynucleotides of the invention.

Thus the invention is also related to the use of the polynucleotides of the invention as markers or to assist in a breeding program, as described for example in PCT publication US89/00709.

Using PCR, characterization of the gene present in a particular tissue or plant variety may be made by an analysis of the genotype of the tissue or variety. Thus particular alleles may be detected which have altered expression or activity of the genes corresponding to the polynucleotides of the invention.

For example, deletions and insertions can be detected by a change in size of the amplified product in comparison to the genotype of a reference sequence. Point mutations can be identified by hybridizing amplified DNA to radiolabeled RNA or alternatively, radiolabeled antisense DNA sequences. Perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures.
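As a minimal sketch of the size comparison described above, the following Python function classifies an amplified product relative to a reference amplicon. The function name, the tolerance parameter and the example sizes are hypothetical; the specification does not prescribe any particular sizing resolution.

```python
# Illustrative sketch only; names, tolerance and sizes are hypothetical.
def classify_amplicon(observed_bp: int, reference_bp: int, tolerance_bp: int = 0) -> str:
    """Compare an observed amplicon size with the reference genotype's size.

    tolerance_bp models the sizing resolution of the gel or instrument;
    differences within it are treated as no detectable change.
    """
    diff = observed_bp - reference_bp
    if abs(diff) <= tolerance_bp:
        return "no size change"
    if diff > 0:
        return f"insertion (+{diff} bp)"
    return f"deletion ({diff} bp)"


if __name__ == "__main__":
    for size in (500, 512, 480):
        print(size, classify_amplicon(size, reference_bp=500))
```

A point mutation leaves the product size unchanged, which is why the text turns to hybridization, RNase A digestion or melting temperature for that case.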

Sequence differences between a reference gene and genes having mutations also may be revealed by direct DNA sequencing. In addition, cloned DNA segments may be employed as probes to detect specific DNA segments. The sensitivity of such methods can be greatly enhanced by appropriate use of PCR or another amplification method. For example, a sequencing primer is used with double-stranded PCR product or a single-stranded template molecule generated by a modified PCR. The sequence determination is performed by conventional procedures with radiolabeled nucleotide or by automatic sequencing procedures with fluorescent tags.

Genetic typing of various varieties of plants based on DNA sequence differences may be achieved by detection of alteration in electrophoretic mobility of DNA fragments in gels, with or without denaturing agents. Small sequence deletions and insertions can be visualized by high resolution gel electrophoresis. DNA fragments of different sequences may be distinguished on denaturing formamide gradient gels in which the mobilities of different DNA fragments are retarded in the gel at different positions according to their specific melting or partial melting temperatures (see, e.g., Myers et al., Science, 230:1242 (1985)).

Sequence changes at specific locations also may be revealed by nuclease protection assays, such as RNase and S1 protection, or by the chemical cleavage method (e.g., Cotton et al., Proc. Natl. Acad. Sci. (USA), 85:4397-4401 (1985)).

Thus, the detection of a specific DNA sequence may be achieved by methods such as hybridization, RNase protection, chemical cleavage, direct DNA sequencing or the use of restriction enzymes, (e.g., restriction fragment length polymorphisms ("RFLP")) and Southern blotting of genomic DNA.

In addition to more conventional gel-electrophoresis and DNA sequencing, mutations also can be detected by in situ analysis.

A mutation may be ascertained, for example, by a DNA sequencing assay. Samples are processed by methods known in the art to capture the RNA. First strand cDNA is synthesized from the RNA samples by adding an oligonucleotide primer consisting of sequences that hybridize to a region on the mRNA. Reverse transcriptase and deoxynucleotides are added to allow synthesis of the first strand cDNA. Primer sequences are synthesized based on the DNA sequences of the polynucleotides of the invention. The primer sequence is generally comprised of at least 15 consecutive bases, and may contain at least 30 or even 50 consecutive bases. RT-PCR can also be used to detect mutations. It is particularly preferred to use RT-PCR in conjunction with automated detection systems, such as, for example, GeneScan. RNA or cDNA may also be used for the same purpose, by PCR or RT-PCR. As an example, PCR primers complementary to the polynucleotides of the invention can be used to identify and analyze mutations.

Deletions and insertions can also be detected by a change in size of the amplified product in comparison to the normal genotype. Point mutations can be identified by hybridizing amplified DNA to radiolabeled RNA or, alternatively, radiolabeled antisense DNA sequences. While perfectly matched sequences can be distinguished from mismatched duplexes by RNase A digestion or by differences in melting temperatures, point mutations are preferably identified by sequence analysis.

Plants

The plants of the invention may be grown and either selfed or crossed with a different plant strain, and the resulting hybrids with the desired phenotypic characteristics may be identified. Two or more generations may be grown to ensure that the subject phenotypic characteristics are stably maintained and inherited. Plants resulting from such standard breeding approaches also form an aspect of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be better understood with reference to the accompanying drawings in which:

Figure 1 shows an alignment of the avocado MYB10 protein sequence with other MYB sequences involved in anthocyanin production.

Figure 2 shows a bootstrap phylogenetic analysis of the avocado MYB10 (PaMYB10, black dot) protein sequence with other MYB sequences involved in anthocyanin production (red dots), and all other Arabidopsis R2R3 MYBs.

Figure 3 shows % identity of avocado MYB10 protein with other anthocyanin related MYB transcription factors.

Figure 4 shows avocado and other fruit MYB proteins in a phylogeny with MYB genes from other species. Avocado MYB10 (black dot: PaMYB10) clusters with other MYB10s, and genes from other species involved in regulating anthocyanin biosynthesis (red dots). Subgroup numbers are those described by Stracke et al. (2001) and are shown as a suffix after most MYB descriptors. Arabidopsis genes are identified by Arabidopsis unique identifiers. The accession numbers for genes, or translated products, of other species in the GenBank database are as follows: AmROSEA1 [ABB83826], AmROSEA2 [ABB83827], AmVENOSA [ABB83828], MdMYB10 [DQ267896], VvMYBA1 [AB242302], VvMYBA2 [AB097924], VlMYBA2 [AB073013], CaA [CAE75745], PhAN2 [AAF66727], LeANT1 [AAQ55181], GhMYB10 [CAD87010], PmMBF1 [AAA82943], ZmC1 [AAK81903], AtTT2 [Q9FJA2], MdMYB8 [DQ267899], AtGL1 [AAC97387], FaMYB1 [AAK84 64].

Figure 5 shows qPCR expression of PaMYB10 in leaves and in fruit flesh and skin. PaMYB10 is fruit specific and is expressed mainly in the skin (A). In leaves, even when reddening (B), PaMYB10 appears not to be responsible for foliage pigmentation.

Figure 6 shows qPCR expression of PaMYB10 in the fruit skin of darkening "Hass" and green-skinned "Fuerte". PaMYB10 is specific to pigmented skin and increases rapidly as Hass fruit mature and colour. Stages represent maturity states after picking, over a 7 day period.

Figure 7 shows the results of trans-activation assays where the avocado MYB (PaMYB10) gene was infiltrated into N. benthamiana leaves either with or without the apple (MdbHLH3) or Arabidopsis (AtbHLH2) bHLH genes. Trans-activation of the Arabidopsis DFR promoter (AtDFR) was measured using the AtDFR-LUC reporter cassette.

EXAMPLES

The invention will now be illustrated with reference to the following non-limiting examples.

Example 1: Isolation and characterisation of R2R3 MYB transcription factors of the invention.

Isolation of PaMYB10

Previous studies in other species have shown that subgroup 10 MYBs are involved in regulating plant pigmentation. However, within publicly available avocado EST databases (8,700 Persea americana nucleotide sequences as of August 2007), there were no MYB TFs showing high homology in BLAST searches to Arabidopsis PAP1 and subgroup 10 MYBs from other species.

The applicants selected fruit skin tissue just turning pink from cultivated avocado (Persea americana cv. Hass). The applicants predicted that this tissue/developmental state would contain transcription factors regulating anthocyanin production in avocado. Messenger RNA (mRNA) was isolated from this tissue by standard techniques, and subjected to PCR and to 3' and 5' RACE (GeneRacer, Invitrogen).

Degenerate primers (SEQ ID NO: 13, with a 32-fold degeneracy; see Table 1 below) were used, designed to the consensus DNA sequence of the R2R3 DNA binding domain based on the sequence of anthocyanin regulators in diverse species. In addition, 5' RACE primers (SEQ ID NO: 9 & 10) based on the R2R3 region of apple MdMYB10 were used. Numerous cDNAs encoding R2R3 MYB domains were obtained. The applicants identified one sequence, which they designated PaMYB10, as a candidate sequence encoding a transcription factor that modulates anthocyanin production. The complete sequence for PaMYB10 was compiled from overlapping fragments. Full length clones (both genomic and cDNA) were then isolated using gene specific primers (SEQ ID NO: 15 & 17) designed to the 5' and 3' UTR regions.
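The fold degeneracy of a degenerate primer is the number of distinct sequences in the mix, that is, the product of the number of bases allowed at each position. The sketch below computes it from IUPAC ambiguity codes. The example primer is hypothetical, chosen only because it contains five two-fold positions and therefore gives 32; it is not SEQ ID NO: 13.

```python
# Illustrative sketch only; the example primer is hypothetical.
from math import prod

# Number of bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Return the number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_DEGENERACY[base] for base in primer.upper())


if __name__ == "__main__":
    hypothetical_primer = "AARGGYGCRTGGACYAARGA"  # R, Y, R, Y, R: five 2-fold positions
    print(primer_degeneracy(hypothetical_primer))
```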

The translated amino acid sequence of PaMYB10 is shown in SEQ ID NO: 1.

The cDNA sequence of PaMYB10 is shown in SEQ ID NO: 2.

The genomic (gDNA) sequence of PaMYB10 is shown in SEQ ID NO: 3.

Table 1: Primers used for PaMYB10 PCR and real-time PCR

An alignment of the PaMYB10 amino acid sequence with those of other anthocyanin regulators was produced as follows. The PaMYB10 cDNA sequence was trimmed of vector, adapter and low quality sequence regions and uploaded to Vector NTI version 9.0.0 (Invitrogen). Full length sequences were aligned using Vector NTI Clustal W (opening=15, extension=0.3). The result is shown in Figure 1.

Phylogeny

Phylogenetic and molecular evolutionary analyses of the aligned sequences were conducted using MEGA version 3.1 (Kumar, et al, 2004) using minimum evolution phylogeny test and 1000 bootstrap replicates.

Results are shown in Figure 2.

PaMYB10 appears to be related to the Arabidopsis subgroup 10 MYBs, including PAP1. The sequences show only 58% identity over the entire protein but 80% amino acid identity over the R2R3 DNA binding domain. The PaMYB10 amino acid sequence was used to search the NCBI EST database using TBLASTX ("Nucleotide query - Translated db [tblastx]" is useful for identifying novel genes in error prone query sequence) (http://www.ncbi.nlm.nih.gov/blast/). The closest BLAST hit identified is Arabidopsis PAP2 (AT1G66390, AtMYB90). However, using calculated identity, Arabidopsis sequences were slightly less homologous at the protein level than Antirrhinum ROSEA2 and ROSEA1 (Schwinn et al., 2006) and other Rosaceous MYB10-like sequences (Lin-Wang et al., unpublished; Espley et al., 2007; Takos et al., 2006) (see Figure 3).

Percent sequence identity was calculated after aligning the sequences with several other MYB sequences involved in anthocyanin regulation using Clustal W (Thompson et al., 1994, Nucleic Acid Res 11(22): 4673-4680). Percent identity between the protein sequences is shown in Figure 3.
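A percent identity figure of this kind can be computed from a pairwise alignment as sketched below. This is an illustration under a stated assumption: identity is taken as identical residues divided by aligned columns in which neither sequence has a gap. Alignment packages differ in how they treat gaps and sequence length, and the specification does not state which convention was used, so the values in Figure 3 need not match this definition. The example sequences are hypothetical.

```python
# Illustrative sketch only; the gap convention and example sequences are assumptions.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments: 4 identical residues in 5 ungapped columns.
    print(percent_identity("MKR-LT", "MKRALS"))
```

Restricting the input to the columns of the R2R3 domain would give a domain-level figure, and using the full alignment a whole-protein figure, analogous to the two values quoted above.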

A bootstrapped circular phylogenetic tree, shown in Figure 4, was generated using MEGA version 3.1 (Kumar et al., 2004) and shows PaMYB10 and FvMYB10 in the same clade as Arabidopsis PAP1, PAP2, AtMYB113 and AtMYB114.

Example 2: In vivo expression of the R2R3 MYB transcription factors of the invention correlates strongly with anthocyanin levels in fruit.

Real time (qPCR) expression analysis

RNA was isolated (using a procedure adapted from Chang et al., 1993) from avocado pigmented fruit skin, pink fruit flesh and six types of leaves (colour ranging from dark green to pink, shown in Figure 5). First strand cDNA synthesis was carried out using oligo dT according to the manufacturer's instructions (Invitrogen).

The sequence of avocado actin was identified from HortResearch's proprietary EST database. Gene specific primers corresponding to the avocado PaMYB10 and actin (PaActin) sequences were designed using Vector NTI version 9.0.0 (Invitrogen) to a stringent set of criteria, enabling application of universal reaction conditions. To check reaction specificity, RT-PCR reactions were carried out according to the manufacturer's instructions (Platinum Taq, Invitrogen, Carlsbad, CA), with a thermal profile as follows: pre-incubation at 95°C for 5 minutes, followed by 35 cycles of 95°C (30 seconds), 60°C (30 seconds) and 72°C (30 seconds), with a final extension at 72°C for 5 minutes. The sequence of each primer pair and the relevant accession number are shown in Table 1. To eliminate amplification of gDNA contamination, both PaActin primers and PaMYB10 primers span introns.

qPCR DNA amplification and analysis was carried out using the LightCycler System (Roche LightCycler 1.5, Roche Diagnostics). All reactions were performed with the LightCycler FastStart SYBR Green Master Mix (Roche Diagnostics) following the manufacturer's method. Reactions were performed in triplicate using 2 μl 5x Master Mix, 0.5 μM each primer, 1 μl diluted cDNA and nuclease-free water (Roche Diagnostics) to a final volume of 10 μl. A negative water control was included in each run. Fluorescence was measured at the end of each annealing step. Amplification was followed by a melting curve analysis with continual fluorescence data acquisition during the 65°C to 95°C melt. The raw data were analysed with the LightCycler software version 4 and expression was normalised to Persea americana Actin (PaActin) to minimise variation in cDNA template levels, with the avocado dark green leaf sample acting as calibrator with a nominal value of 1. PaActin was selected for normalisation due to its consistent transcript level throughout fruit tissues and leaf, with the range of crossing threshold (Ct) values < 1 across experiments. For each gene a standard curve was generated with a cDNA serial dilution and the resultant PCR efficiency calculations (ranging between 1.717 and 1.897) were imported into relative expression data analysis. To ensure that the transcripts of single genes had been amplified, qPCR amplicons were sequenced and confirmed as the expected plant DNA sequences. Error bars shown in qPCR data represent technical replicates (means ± S.E. of 3 replicate qPCR reactions).
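The normalisation described above can be illustrated with a commonly used efficiency-corrected relative quantification formula. This is a sketch under assumptions: the specification states only that the LightCycler software was used with imported efficiencies, not the exact calculation, and the function name and all crossing-point values below are hypothetical. Efficiency is expressed as fold amplification per cycle, so 2.0 is perfect doubling and the reported values of 1.717 to 1.897 fall below that.

```python
# Illustrative sketch only; not the LightCycler software's documented algorithm.
def relative_expression(
    e_target: float,
    cp_target_sample: float,
    cp_target_calibrator: float,
    e_reference: float,
    cp_reference_sample: float,
    cp_reference_calibrator: float,
) -> float:
    """Efficiency-corrected expression of a target gene relative to a calibrator.

    The calibrator sample has a value of 1 by construction.
    """
    target_change = e_target ** (cp_target_calibrator - cp_target_sample)
    reference_change = e_reference ** (cp_reference_calibrator - cp_reference_sample)
    return target_change / reference_change


if __name__ == "__main__":
    # Hypothetical crossing points: target appears 5 cycles earlier than in
    # the calibrator, while the reference gene is unchanged.
    ratio = relative_expression(
        e_target=1.85, cp_target_sample=22.0, cp_target_calibrator=27.0,
        e_reference=1.80, cp_reference_sample=18.0, cp_reference_calibrator=18.0,
    )
    print(f"relative expression: {ratio:.1f}")
```

In this scheme the reference gene corresponds to PaActin and the calibrator to the dark green leaf sample.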

Results

qPCR analysis of the expression of PaMYB10 gene transcripts in the red leaves of avocado, and in fruit skin and flesh, revealed massive increases in the relative transcript levels of this transcription factor in the fruit tissues (Figure 5). Over a timecourse of pigmentation change (7 days of maturity off the tree, divided into 7 stages) PaMYB10 expression increased over 50 fold (Figure 6). This occurred only in the dark skinned "Hass" cultivar. Green skinned "Fuerte" showed no PaMYB10 expression (Figure 6). Only very low levels of the transcript are present in leaves, with a slight elevation when the leaf reddens. PaMYB10 therefore appears to be fruit specific, and drives pigmentation in the skin of "Hass". The visible pigment in blackening avocado skin is known to be anthocyanic (cyanidin 3-O-glucoside; Ashton et al., 2006).

Example 3: Induction of the promoter of an anthocyanin biosynthetic gene by over-expression of the R2R3 MYB transcription factors of the invention in plants.

Dual luciferase assay of transiently transformed tobacco leaves

The promoter sequence for Arabidopsis DFR (SEQ ID NO: 8) was inserted into the cloning site of pGreenII 0800-LUC (Hellens et al., 2005) and modified to introduce an NcoI site at the 3' end of the sequence, allowing the promoter to be cloned as a transcriptional fusion with the firefly luciferase gene (LUC). Thus, TFs that bind the promoter and increase the rate of transcription could be identified by an induced increase in luminescence activity. Arabidopsis DFR (TT3, AT5g42800) was isolated from genomic Arabidopsis DNA. In the same construct, a luciferase gene from Renilla (REN), under the control of a 35S promoter, provided an estimate of the extent of transient expression. Activity is expressed as a ratio of LUC to REN activity, so that where the interaction between a TF (+/- bHLH) and the promoter occurred, a significant increase in the LUC activity relative to REN would be observed. The promoter-LUC fusion in pGreenII 0800-LUC was used in transient transformation by mixing 100 μl of Agrobacterium strain GV3101 (MP90) transformed with the reporter cassette with two other Agrobacterium cultures (450 μl each) transformed with cassettes containing a MYB TF gene fused to the 35S promoter and a bHLH TF gene in either pART27 (Gleave, 1992) or pGreenII 62-SK binary vectors (Hellens et al., 2000).

Nicotiana benthamiana plants were grown under glasshouse conditions in full potting mix, using natural light with daylight extension to 16 h, until at least 3 leaves were available for infiltration with Agrobacterium. Plants were maintained in the glasshouse for the duration of the experiment. Agrobacterium was cultured on Lennox agar (Invitrogen) supplemented with selection antibiotics and incubated at 28°C. A 10 μl loop of confluent bacteria was resuspended in 10 ml of infiltration media (10 mM MgCl₂, 0.5 μM acetosyringone) to an OD600 of 0.2, and incubated at room temperature without shaking for 2 h before infiltration. Infiltrations were performed according to the methods of Voinnet et al. (2003). Approximately 150 μl of this Agrobacterium mixture was infiltrated at six points into a young leaf of N. benthamiana and transient expression was assayed 3 days after inoculation.

Firefly luciferase and renilla luciferase were assayed using the dual luciferase assay reagents (Promega, Madison, WI). Three days after inoculation, 2 cm leaf discs (6 technical replicates from each plant) were removed and ground in 500 μl of passive lysis buffer (PLB). Ten μl of a 1/100 dilution of this crude extract was assayed in 40 μl of luciferase assay buffer, and the chemiluminescence measured. 40 μl of Stop and Glow™ buffer was then added and a second chemiluminescence measurement made. Absolute relative luminescence units (RLU) were measured in a Turner 20/20 luminometer (Turner BioSystems, Sunnyvale, CA), with a 5 s delay and 15 s integrated measurement.
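The two luminescence readings per leaf disc are combined as a LUC/REN ratio, as stated above, and replicate ratios can then be summarised. The sketch below shows one way to do this, reporting a mean and standard error across replicate discs. The function names and the RLU values are hypothetical, and the specification does not state how replicates were summarised for this assay.

```python
# Illustrative sketch only; function names and RLU values are hypothetical.
from math import sqrt
from statistics import mean, stdev


def luc_ren_ratios(luc_rlu: list[float], ren_rlu: list[float]) -> list[float]:
    """Per-disc ratio of firefly (LUC) to Renilla (REN) luminescence."""
    if len(luc_rlu) != len(ren_rlu):
        raise ValueError("LUC and REN readings must be paired")
    return [luc / ren for luc, ren in zip(luc_rlu, ren_rlu)]


def mean_and_se(values: list[float]) -> tuple[float, float]:
    """Mean and standard error (sample standard deviation / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed")
    return mean(values), stdev(values) / sqrt(n)


if __name__ == "__main__":
    # Hypothetical readings from six leaf discs of one infiltrated plant.
    luc = [5200.0, 4800.0, 5500.0, 5100.0, 4900.0, 5300.0]
    ren = [1000.0, 950.0, 1100.0, 1020.0, 980.0, 1050.0]
    avg, se = mean_and_se(luc_ren_ratios(luc, ren))
    print(f"LUC/REN = {avg:.2f} +/- {se:.2f}")
```

Dividing by the REN signal corrects each disc for differences in infiltration and transient expression efficiency, so that only promoter activation changes the ratio.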

Transient luminescent assays of avocado transcription factor activity

The dual luciferase system has been demonstrated to provide a rapid method of transient gene expression analysis (Matsuo et al., 2001; Hellens et al., 2005). It requires no selectable marker and results can be quantified with a simple enzymatic assay. We used Nicotiana benthamiana to test the interaction of our candidate TFs with the promoter of an Arabidopsis anthocyanin biosynthesis gene, AtDFR (TT3, AT5g42800). This promoter is known to be regulated by the Arabidopsis PAP1 and PAP2 MYB TFs (Zimmermann et al., 2004; Tohge et al., 2005).

Since PaMYB10 has the amino acid residues that specify interaction with bHLHs (Grotewold et al., 2000; Zimmermann et al., 2004), transient assays were performed in the presence or absence of known bHLH cofactors. These bHLHs, apple MdbHLH3 (CN934367) and Arabidopsis AtbHLH2 (At1g63650), were of the IIIf clade of bHLH (Heim et al., 2003), which has been shown to be involved in the regulation of anthocyanin biosynthesis. Results from the promoter assay indicated a significant increase in activity when PaMYB10 was co-transformed with an apple or Arabidopsis bHLH (Figure 7). These results reflect previous work in a transient protoplast transfection system where an Arabidopsis DFR promoter:GUS fusion was activated by PAP1 only in the presence of a bHLH (Zimmermann et al., 2004).

Conclusion

A combination of bioinformatics, expression analysis and functional demonstration of the activation of the promoter of a gene in the anthocyanin biosynthetic pathway provides strong evidence that PaMYB10 is an important regulator of anthocyanin biosynthesis in avocado.

It is not the intention to limit the scope of the invention to the above mentioned examples only. As would be appreciated by a skilled person in the art, many variations are possible without departing from the scope of the invention.

References

Aharoni A., De Vos C.H., Wein M., Sun Z., Greco R., Kroon A., Mol J.N., O'Connell A.P. (2001) The strawberry FaMYB1 transcription factor suppresses anthocyanin and flavonol accumulation in transgenic tobacco. Plant J. 28, 319-32

Ashton OB, Wong M, McGhie TK, Vather R, Wang Y, Requejo-Jackman C, Ramankutty P, Woolf AB. (2006) Pigments in avocado tissue and oil. J Agric Food Chem. 54(26): 10151-10158

Baudry, A., Heim, M.A., Dubreucq, B., Caboche, M., Weisshaar, B. and Lepiniec, L. (2004) TT2, TT8, and TTG1 synergistically specify the expression of BANYULS and proanthocyanidin biosynthesis in Arabidopsis thaliana. Plant J. 39, 366-380

Borevitz, J.O., Xia, Y., Blount, J., Dixon, R.A. and Lamb, C. (2000) Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. Plant Cell, 12, 2383-2394

Boss, P.K., Davies, C. and Robinson, S.P. (1996) Expression of anthocyanin biosynthesis pathway genes in red and white grapes. Plant Mol. Biol. 32, 565-569

Boyer, J. and Liu, R. (2004) Apple phytochemicals and their health benefits. Nutrition J. 3, 5

Brooks, R.M. and Olmo, H.P. (1972) Register of New Fruit and Nut Varieties. University of California Press, London.

Brouillard, R. (1988) Flavonoids and flower colour. In JB Harborne, ed, The Flavonoids: Advances in Research since 1980. Chapman & Hall, London, pp 525-538

Chang, S., Puryear, J. and Cairney, J. (1993) A simple and efficient method for isolating RNA from pine trees. Plant Mol. Biol. Rep. 11, 113-116.

CIE. (1986) Colorimetry 2nd edn. Publication CIE No. 15.2, Central Bureau of the Commission Internationale de L'Eclairage, Vienna.

Cruz-Hernandez A, Witjaksono, Litz RE, Lim MG (1998) Agrobacterium tumefaciens-mediated transformation of embryogenic avocado cultures and regeneration of somatic embryos. Plant Cell Reports 17 (6-7): 497-503

Davies, K.M. and Schwinn, K.E. (2003) Transcriptional regulation of secondary metabolism. Funct. Plant Biol. 30, 913-925

De Jong, W.S., Eannetta, N.T., De Jong, D.M. and Bodis, M. (2004) Candidate gene analysis of anthocyanin pigmentation loci in the Solanaceae. Theor. Appl. Genet. 108, 423-432

de Vetten, N., Quattrocchio, F., Mol, J. and Koes, R. (1997) The an11 locus controlling flower pigmentation in petunia encodes a novel WD-repeat protein conserved in yeast, plants, and animals. Genes Dev. 11, 1422-1434

Dixon, R.A. and Steele, C.L. (1999) Flavonoids and isoflavonoids - a gold mine for metabolic engineering. Trends Plant Sci. 4, 394-400

Dong, Y.H., Mitra, D., Kootstra, A., Lister, C. and Lancaster, J. (1998) Postharvest stimulation of skin colour in Royal Gala apple. J. Am. Soc. Hortic. Sci. 120, 95-100

Espley R.V., Hellens R.P., Putterill J., Stevenson, D.E., Kutty-Amma, S., Allan A.C. (2007) Red Colouration in Apple Fruit is Due to the Activity of the MYB Transcription Factor, MdMYB10. The Plant Journal, 49(3), 414-427

Gleave, A. (1992) A versatile binary vector system with a T-DNA organisational structure conducive to efficient integration of cloned DNA into the plant genome. Plant Mol. Biol. 20, 1203-1207

Goff, S.A., Cone, K.C. and Chandler, V.L. (1992) Functional analysis of the transcriptional activator encoded by the maize B gene: evidence for a direct functional interaction between two classes of regulatory proteins. Genes Dev. 6, 864-875

Goodrich, J., Carpenter, R. and Coen, E.S. (1992) A common gene regulates pigmentation pattern in diverse plant species. Cell, 68, 955-964

Grotewold, E., Sainz, M.B., Tagliani, L., Hernandez, J.M., Bowen, B. and Chandler, V.L. (2000) Identification of the residues in the Myb domain of maize C1 that specify the interaction with the bHLH cofactor R. Proc. Natl Acad. Sci. USA, 97, 13579-13584

Harborne, J.B. (1967) Comparative biochemistry of the flavonoids. Academic Press, London

Harborne, J.B. and Grayer, R.J. (1994) Flavonoids and insects. In JB Harborne, ed, The Flavonoids: Advances in Research Since 1986. Chapman & Hall, London, pp 589-618

Heim, M.A., Jakoby, M., Werber, M., Martin, C., Bailey, P.C. and Weisshaar, B. (2003) The basic helix-loop-helix transcription factor family in plants: a genome-wide study of protein structure and functional diversity. Mol. Biol. Evol. 20, 735-747

Hellens, R.P., Edwards, E.A., Leyland, N.R., Bean, S. and Mullineaux, P.M. (2000) pGreen: a versatile and flexible binary Ti vector for Agrobacterium-mediated plant transformation. Plant Mol. Biol. 42, 819-832

Hellens, R.P., Allan, A.C., Friel, E.N., Bolitho, K., Grafton, K., Templeton, M.D., Karunairetnam, S. and Laing, W.A. (2005) Transient plant expression vectors for functional genomics, quantification of promoter activity and RNA silencing. Plant Methods, 1:13

Hernandez, J.M., Heine, G.F., Irani, N.G., Feller, A., Kim, M-G., Matulnik, T., Chandler, V.L. and Grotewold, E. (2004) Different mechanisms participate in the R-dependent activity of the R2R3 MYB transcription factor C1. J. Biol. Chem. 279, 48205-48213

Hoffmann, T., Kalinowski, G., and Schwab, W. (2006) RNAi-induced silencing of gene expression in strawberry fruit (Fragaria x ananassa) by agroinfiltration: a rapid assay for gene function analysis. The Plant Journal 48 (5), 818-826.

Holton, T.A. and Cornish, E.C. (1995) Genetics and biochemistry of anthocyanin biosynthesis. Plant Cell, 7, 1071-1083

Honda, C., Kotoda, N., Wada, M., Kondo, S., Kobayashi, S., Soejima, J., Zhang, Z., Tsuda, T. and Moriguchi, T. (2002) Anthocyanin biosynthetic genes are coordinately expressed during red coloration in apple skin. Plant Physiol. Biochem. 40, 955-962

Jin, H. and Martin, C. (1999) Multifunctionality and diversity within the plant MYB-gene family. Plant Mol. Biol. 41, 577-585

Kim, S-H., Lee, J-R., Hong, S-T., Yoo, Y-K., An, G. and Kim, S-R. (2003) Molecular cloning and analysis of anthocyanin biosynthesis genes preferentially expressed in apple skin. Plant Sci. 165, 403-413

Kobayashi, S., Ishimaru, M., Hiraoka, K. and Honda, C. (2002) Myb-related genes of the Kyoho grape (Vitis labruscana) regulate anthocyanin biosynthesis. Planta 215, 924-933

Kobayashi, S., Goto-Yamamoto, N. and Hirochika, H. (2004) Retrotransposon-induced mutations in grape skin colour. Science, 304, 982

Koes, R.E., Quattrocchio, F. and Mol, J.N.M. (1994) The flavonoid biosynthetic pathway in plants: Function and evolution. BioEssays, 16, 123-132

Kumar S, Tamura K, and Nei, M. (2004) MEGA3: Integrated software for molecular evolutionary genetics analysis and sequence alignment. Brief Bioinform. 5, 150-63.

Lancaster, J. (1992) Regulation of skin color in apples. Crit. Rev. Plant. Sci. 10, 487-502

Lister, C.E. and Lancaster, J.E. (1996) Developmental changes in enzymes of flavonoid biosynthesis in the skins of red and green apple cultivars. J. Sci. Food. Agric. 71, 313-320

Martin, C. and Paz-Ares, J. (1997) MYB transcription factors in plants. Trends in Genetics, 13, 67-73

Mathews H, Clendennen SK, Caldwell CG, Liu XL, Connors K, Matheis N, Schuster DK, Menasco DJ, Wagoner W, Lightner J, Wagner DR. (2003) Activation tagging in tomato identifies a transcriptional regulator of anthocyanin biosynthesis, modification, and transport. Plant Cell 15(8): 1689-703.

Matsuo, N., Minami, M., Maeda, T. and Hiratsuka, K. (2001) Dual luciferase assay for monitoring transient gene expression in higher plants. Plant Biotechnol. 18, 71-75

Mol, J., Grotewold, E. and Koes, R. (1998) How genes paint flowers and seeds. Trends Plant Sci. 3, 212-217

Mol, J. J., Schafer, E. and Weiss, D. (1996) Signal perception, transduction, and gene expression involved in anthocyanin biosynthesis. Crit. Rev. Plant. Sci. 15, 525-557

Montefiori, M., McGhie, T. K., Costa, G. and Ferguson, A. R. (2005) Pigments in the Fruit of Red-Fleshed Kiwifruit (Actinidia chinensis and Actinidia deliciosa). J. Agric. Food Chem. 53, 9526-9530.

Nesi, N., Debeaujon, I., Jond, C., Pelletier, G., Caboche, M. and Lepiniec, L. (2000) The TT8 gene encodes a Basic Helix-Loop-Helix domain protein required for expression of DFR and BAN genes in Arabidopsis siliques. Plant Cell, 12, 1863-1878

Newcomb, R.D., Crowhurst, R., Gleave, A.P., Rikkerink, E., Allan, A.C., Beuning, L., Bowen, J., Gera, E., Jamieson, K.R., Janssen, B., Laing, W.A., McArtney, S., Nain, B., Ross, G., Snowden, K., Souleyre, E., Walton, E., Yauk, Y-C. (2006) Analyses of Expressed Sequence Tags from Apple (Malus x domestica). Plant Physiol. 141, 147-167

Noda, K-I., Glover, B. J., Linstead, P. and Martin, C. (1994) Flower colour intensity depends on specialized cell shape controlled by a Myb-related transcription factor. Nature, 369, 661-664

Ramsay, N.A. and Glover, B.J. (2005) MYB-bHLH-WD40 protein complex and the evolution of cellular diversity. Trends Plant Sci. 10, 63-70

Saito, K. and Yamazaki, M. (2002) Biochemistry and molecular biology of the late-stage of biosynthesis of anthocyanin: lessons from Perilla frutescens as a model plant. New Phytol 155, 9-23

Stevenson, D.E., Wibisono, R., Jensen, D.J., Stanley, R.A., and Cooney, J.M. (2006) Direct acylation of flavonoid glycosides with phenolic acids catalysed by Candida antarctica lipase B (Novozym 435®). Enzyme Microb Technol, 39, 1236-1241

Stracke, R., Werber, M. and Weisshaar, B. (2001) The R2R3-MYB gene family in Arabidopsis thaliana. Curr. Opin. Plant Biol. 4, 447-456

Schwinn, K.E., Venail, J., Shang, Y., Mackay, S., Alm, V., Butelli, E., Oyama, R., Bailey, P., Davies, K.M. and Martin, C. (2006) A small family of MYB-regulatory genes controls floral pigmentation intensity and patterning in the genus Antirrhinum. Plant Cell 18, 831-851

Tohge, T., Nishiyama, Y., Hirai, M.Y., Yano, M., Nakajima, J., Awazuhara, M., Inoue, E., Takahashi, H., Goodenowe, D.B., Kitayama, M., Noji, M., Yamazaki, M. and Saito, K. (2005) Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis plants over-expressing an MYB transcription factor. Plant J. 42, 218-235

Takos AM, Jaffe FW, Jacob SR, Bogs J, Robinson SP, Walker AR. (2006) Light-Induced Expression of a MYB Gene Regulates Anthocyanin Biosynthesis in Red Apples. Plant Physiol. 142, 1216-1232

Tsao, R., Yang, R., Young, J.C. and Zhu, H. (2003) Polyphenolic profiles in eight apple cultivars using high-performance liquid chromatography (HPLC). J. Agric. Food Chem. 51, 6347-6353

Voinnet, O., Rivas, S., Mestre, P. and Baulcombe, D. (2003) An enhanced transient expression system in plants based on suppression of gene silencing by the p19 protein of tomato bushy stunt virus. Plant J. 33, 949-956

Walker, A.R., Davison, P.A., Bolognesi-Winfield, A.C., James, C.M., Srinivasan, N., Blundell, T.L., Esch, J.J., Marks, M.D. and Gray, J.C. (1999) The TRANSPARENT TESTA GLABRA 1 locus, which regulates trichome differentiation and anthocyanin biosynthesis in Arabidopsis, encodes a WD40 repeat protein. Plant Cell, 11, 1337-1350

Winkel-Shirley, B. (2001) Flavonoid biosynthesis. A colourful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol. 126, 485-493

Xie, D-Y., Sharma, S.B., Wright, E., Wang, Z-Y. and Dixon, R.A. (2006) Metabolic engineering of proanthocyanidins through co-expression of anthocyanidin reductase and the PAP1 MYB transcription factor. Plant J. 45, 895-907.

Yao, J-L., Cohen, D., Atkinson, R., Richardson, K. and Morris, B. (1995) Regeneration of transgenic plants from the commercial apple cultivar Royal Gala. Plant Cell Reports, 14, 407-412

Zhang, F., Gonzalez, A., Zhao, M., Payne, C.T., and Lloyd, A. (2003) A network of redundant bHLH proteins functions in all TTG1-dependent pathways of Arabidopsis. Development 130, 4859-4869

Zimmermann, I.M., Heim, M.A., Weisshaar, B. and Uhrig, J.F. (2004) Comprehensive identification of Arabidopsis thaliana MYB transcription factors interacting with R/B-like BHLH proteins. Plant J. 40, 22-34

Table 2: SUMMARY OF SEQUENCES

Claims

CLAIMS:

1. An isolated polynucleotide comprising a sequence encoding a polypeptide with the amino acid sequence of SEQ ID NO:1 or a variant thereof, wherein the variant is an R2R3 MYB transcription factor that positively regulates anthocyanin production in a plant.

2. The isolated polynucleotide of claim 1, wherein the variant comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 1.

3. The isolated polynucleotide of claim 1, encoding a polypeptide comprising the amino acid sequence of SEQ ID NO: 1.

4. An isolated polynucleotide comprising a sequence encoding a polypeptide with the amino acid sequence of SEQ ID NO:1 or a variant thereof, wherein the polypeptide or variant thereof is an R2R3 MYB transcription factor that positively regulates the promoter of a gene in the anthocyanin biosynthetic pathway in a plant.

5. The isolated polynucleotide of claim 4, wherein the variant comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 1.

6. The isolated polynucleotide of claim 4, wherein the polynucleotide encodes a polypeptide comprising the amino acid sequence of SEQ ID NO: 1.

7. The isolated polynucleotide of claim 4, wherein the gene in the anthocyanin biosynthetic pathway encodes dihydroflavonol 4-reductase (DFR).

8. The isolated polynucleotide of claim 4, wherein the promoter has at least 70% identity to the sequence of SEQ ID NO: 8.

9. The isolated polynucleotide of claim 4, wherein the promoter has the sequence of SEQ ID NO: 8.

10. An isolated polynucleotide comprising the sequence of SEQ ID NO: 2 or a variant thereof, wherein the polynucleotide or variant encodes an R2R3 MYB transcription factor that regulates anthocyanin production in a plant.

11. The isolated polynucleotide of claim 10, wherein the variant comprises a nucleic acid sequence with at least 70% identity to the sequence of SEQ ID NO: 2.

12. The isolated polynucleotide of claim 10, wherein the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 2.

13. The isolated polynucleotide of claim 10, wherein the variant comprises a nucleic acid sequence with at least 70% identity to the sequence of SEQ ID NO: 3.

14. The isolated polynucleotide of claim 10, wherein the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 3.

15. An isolated polynucleotide comprising the sequence of SEQ ID NO: 2 or a variant thereof, wherein the polynucleotide or variant thereof encodes an R2R3 MYB transcription factor that regulates the promoter of a gene in the anthocyanin biosynthetic pathway in a plant.

16. The isolated polynucleotide of claim 15, wherein the variant comprises a nucleic acid sequence with at least 70% identity to the sequence of SEQ ID NO: 2.

17. The isolated polynucleotide of claim 15, wherein the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 2.

18. The isolated polynucleotide of claim 15, wherein the variant comprises a nucleic acid sequence with at least 70% identity to the sequence of SEQ ID NO: 3.

19. The isolated polynucleotide of claim 15, wherein the polynucleotide comprises the nucleic acid sequence of SEQ ID NO: 3.

20. The isolated polynucleotide of claim 15, wherein the gene in the anthocyanin biosynthetic pathway encodes dihydroflavonol 4-reductase (DFR).

21. The isolated polynucleotide of claim 15, wherein the promoter has at least 70% identity to the sequence of SEQ ID NO: 8.

22. The isolated polynucleotide of claim 15, wherein the promoter has the sequence of SEQ ID NO: 8.

23. An isolated polypeptide comprising: a) the amino acid sequence of SEQ ID NO: 1 or a variant thereof, wherein the polypeptide or variant thereof is an R2R3 MYB transcription factor that regulates anthocyanin production in a plant; or b) a fragment, of at least 5 amino acids in length, of the sequence of a), capable of performing the same function as the polypeptide in a).

24. The isolated polypeptide of claim 23, wherein the variant comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 1.

25. The isolated polypeptide of claim 24, wherein the polypeptide comprises the amino acid sequence of SEQ ID NO: 1.

26. An isolated polypeptide comprising: a) the amino acid sequence of SEQ ID NO: 1 or a variant thereof, wherein the polypeptide or variant thereof is an R2R3 MYB transcription factor that regulates the promoter of a gene in the anthocyanin biosynthetic pathway in a plant; or b) a fragment, of at least 5 amino acids in length, of the sequence of a), capable of performing the same function as the polypeptide in a).

27. The isolated polypeptide of claim 26, wherein the variant comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 1.

28. The isolated polypeptide of claim 26, wherein the polypeptide comprises the amino acid sequence of SEQ ID NO: 1.

29. The isolated polypeptide of claim 26, wherein the gene in the anthocyanin biosynthetic pathway encodes dihydroflavonol 4-reductase (DFR).

30. The isolated polypeptide of claim 26, wherein the promoter has at least 70% identity to the sequence of SEQ ID NO: 8.

31. The isolated polypeptide of claim 26, wherein the promoter has the sequence of SEQ ID NO: 8.

32. A polynucleotide encoding a polypeptide of any one of claims 23 to 31.

33. An antibody raised against a polypeptide of any one of claims 23 to 31.

34. A genetic construct comprising a polynucleotide of any one of claims 1 to 22 and 32.

35. A host cell genetically modified to express a polynucleotide of any one of claims 1 to 22 and 32.

36. A host cell comprising a genetic construct of claim 34.

37. A plant cell or plant genetically modified to express a polynucleotide of any one of claims 1 to 22 and 32.

38. A plant cell or plant comprising a genetic construct of claim 34.

39. A method for producing a plant cell or plant with altered anthocyanin production, the method comprising the step of transformation of a plant cell or plant with a genetic construct including: a) at least one polynucleotide of any one of claims 1 to 22 and 32 encoding a MYB polypeptide; b) at least one polynucleotide, or gene, encoding a MYB polypeptide of any one of claims 23 to 31; c) at least one polynucleotide comprising a fragment, of at least 15 nucleotides in length, of the polynucleotide or gene of a) or b); d) at least one polynucleotide comprising a complement, of at least 15 nucleotides in length, of the polynucleotide of c); or e) at least one polynucleotide capable of hybridising under stringent conditions to the polynucleotide or gene of a) or b).

40. The method of claim 39 including the additional step of transforming the plant with a construct designed to express a bHLH transcription factor, such that the bHLH transcription factor is co-expressed with the MYB polypeptide.

41. The method of claim 40, wherein the bHLH transcription factor comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 4 or 5.

42. The method of claim 41, wherein the bHLH transcription factor comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 4.

43. The method of claim 41, wherein the bHLH transcription factor comprises an amino acid sequence with the sequence of SEQ ID NO: 4.

44. The method of claim 41, wherein the bHLH transcription factor comprises an amino acid sequence with at least 70% identity to the sequence of SEQ ID NO: 5.

45. The method of claim 41, wherein the bHLH transcription factor comprises an amino acid sequence with the sequence of SEQ ID NO: 5.

46. A plant produced by the method of any one of claims 39 to 45.

47. A method for selecting a plant altered in anthocyanin production, the method comprising testing of a plant for: a) altered expression of a polynucleotide of any one of claims 1 to 22 and 32; b) altered expression of a polypeptide of any one of claims 23 to 31; or c) the presence of a polymorphism associated with the altered expression in a) or b).

48. A group or population of plants selected by the method of claim 47.

49. A method for selecting a plant cell or plant that has been transformed, the method comprising the steps of: a) transforming a plant cell or plant with a polynucleotide of any one of claims 1 to 22 and 32 capable of regulating anthocyanin production in a plant; b) expressing the polynucleotide in the plant cell or plant; and c) selecting a plant cell or plant with increased anthocyanin pigmentation relative to other plant cells or plants, the increased anthocyanin pigmentation indicating that the plant cell or plant has been transformed.

50. A transformed plant selected by the method of claim 49.