HK1159183A

HK1159183A - Methods and compositions for improving the production of products in microorganisms

Info

Publication number: HK1159183A
Application number: HK11113440.6A
Authority: HK
Inventors: Jeffrey Blanchard; Susan Leschine; Elsa Petit; John Fabel; Matthias Schmalisch
Original assignee: University of Massachusetts; Qteros, Inc.
Priority date: 2008-07-28
Filing date: 2009-07-28
Publication date: 2012-07-27

Description

Methods and compositions for improving product production in microorganisms

Claim of Priority

The present application claims priority from U.S. provisional patent application No. 61/084,233, filed on July 28, 2008, U.S. provisional patent application No. 61/225,184 filed on 13 th.2009, and U.S. provisional patent application No. 61/228,922 filed on 27 th.2009, the entire contents of which are incorporated herein by reference.

Technical Field

The present invention relates to the fields of microbiology, molecular biology and biotechnology. More particularly, the present invention relates to methods and compositions for improving the production of products such as ethanol and hydrogen in microorganisms.

Background

It would be advantageous to develop methods and compositions for producing useful energy from renewable and sustainable biomass resources. Energy in the form of carbohydrates can be found in waste biomass and dedicated energy crops, for example, grains such as corn or wheat, or grasses such as switchgrass.

The challenge today is to develop a viable and economical strategy for converting carbohydrates into a usable energy form. Strategies to derive useful energy from carbohydrates include the production of ethanol and other alcohols, the conversion of carbohydrates to hydrogen, and the direct conversion of carbohydrates to electrical energy by fuel cells. Examples of strategies for deriving energy from biomass in the form of ethanol are described by DiPardo, Journal of Outlook for Biomass Ethanol Production and Demand (EIA Forecasts), 2002; Sheehan, Biotechnology Progress, 15:8179, 1999; Martin, Enzyme Microbes Technology, 31:274, 2002; Greer, BioCycle, 61-65, April 2005; Lynd, Microbiology and Molecular Biology Reviews, 66:3, 506-; and Lynd et al. in "Consolidated Bioprocessing of Cellulosic Biomass: An Update," Current Opinion in Biotechnology, 16:577-583, 2005.

Disclosure of Invention

The present application is based, in part, on the identification of Clostridium phytofermentans genes encoding products predicted to be involved in growth on substrates useful for the production of products such as fuels, e.g., ethanol and hydrogen. The genes identified herein may be expressed heterologously in other microorganisms to provide new or enhanced functions. Likewise, these genes may be expressed in C. phytofermentans.

Some embodiments include polynucleotides comprising an isolated nucleic acid encoding at least one hydrolase identified in C. phytofermentans. In such embodiments, the isolated nucleic acid can be selected from Table 6. In specific embodiments, the hydrolase is selected from the group consisting of Cphy3367, Cphy3368, Cphy0430, Cphy3854, Cphy0857, Cphy0694, and Cphy1929. The designation Cphy3367 represents the JGI number, which corresponds to the locus tag for C. phytofermentans recorded in GenBank at the U.S. National Center for Biotechnology Information (NCBI). In further embodiments, the polynucleotide may comprise a regulatory sequence operably linked to the isolated nucleic acid encoding the hydrolase.

Some embodiments include polynucleotides comprising an isolated nucleic acid encoding at least one ATP-binding cassette (ABC) transporter identified in C. phytofermentans. In these embodiments, the isolated nucleic acid can be selected from Table 7. In particular embodiments, the ABC transporter is selected from the group consisting of Cphy3854, Cphy3855, Cphy3857, Cphy3858, Cphy3859, Cphy3860, Cphy3861, and Cphy3862. In further embodiments, the polynucleotide may comprise a regulatory sequence operably linked to an isolated nucleic acid encoding an ABC transporter.

Some embodiments include polynucleotides comprising an isolated nucleic acid encoding at least one transcriptional regulator identified in C. phytofermentans. In these embodiments, the isolated nucleic acid can be selected from Table 8. In further embodiments, the polynucleotide may comprise a regulatory sequence operably linked to an isolated nucleic acid encoding a transcriptional regulator.

Some embodiments include a polynucleotide cassette comprising any combination of nucleic acids encoding a hydrolase, an ABC transporter, and a transcriptional regulator described herein. In one embodiment, the polynucleotide cassette may comprise an isolated nucleic acid encoding at least one hydrolase and an isolated nucleic acid encoding at least one ABC transporter. In another embodiment, a polynucleotide cassette may comprise an isolated nucleic acid encoding at least one hydrolase and an isolated nucleic acid encoding at least one transcriptional regulator. In another embodiment, a polynucleotide cassette may comprise an isolated nucleic acid encoding at least one ABC transporter and an isolated nucleic acid encoding at least one transcriptional regulator. In another embodiment, the polynucleotide cassette may comprise an isolated nucleic acid encoding at least one hydrolase and an isolated nucleic acid encoding at least one ABC transporter and an isolated nucleic acid encoding at least one transcriptional regulator.

Some embodiments include an expression cassette comprising any of the polynucleotides described herein and a regulatory sequence operably linked to the polynucleotide cassette.

Some embodiments include recombinant microorganisms comprising any of the polynucleotides, polynucleotide cassettes, and/or expression cassettes described herein. In a specific embodiment, the recombinant microorganism may be selected from the group consisting of Clostridium cellulovorans, Clostridium cellulolyticum, Clostridium thermocellum, Clostridium josui, Clostridium papyrosolvens, Clostridium celluloperonosicum, Clostridium cellulobial, Clostridium cellulobiopolycarpum, Clostridium hungatei, Clostridium cellulosi, Clostridium stercorarium, Clostridium cellulolyticum, Clostridium poretii, Clostridium cellulolyticum, Clostridium sporotrichum, Clostridium chaeta, Clostridium andreanicola, Clostridium thermophilum, Clostridium autodridium, Clostridium herboricum, Clostridium biovorans, Clostridium cellulolyticum, Escherichia coli, Clostridium thermobacter, Clostridium cellulolyticum, Clostridium thermolyticum, Clostridium thermobacter cellulolyticum, Clostridium cellulolyticum, Clostridium cellulolyticum, and Clostridium cellulolyticum, Clostridium cellulolyticum strain, Clostridium cellulolyticum strain, and Clostridium cellulolyticum strain.

Some embodiments include isolated hydrolase proteins identified in C. phytofermentans. In some embodiments, a method of producing ethanol is provided. These methods include culturing the microorganism, providing a substrate, and providing any of the isolated proteins described herein.

Some embodiments include an isolated polynucleotide cassette comprising one or more, two or more, or all three of: sequences encoding a Clostridium phytofermentans hydrolase, sequences encoding a C. phytofermentans ABC transporter, and sequences encoding a C. phytofermentans transcriptional regulator. In some embodiments, the hydrolase is selected from the group consisting of Cphy3368, Cphy3367, Cphy1799, Cphy1800, Cphy2105, Cphy1071, Cphy0430, Cphy1163, Cphy3854, Cphy1929, Cphy2108, Cphy3158, Cphy3207, Cphy3009, Cphy3010, Cphy2632, Cphy3586, Cphy0218, Cphy0220, Cphy1720, Cphy3160, Cphy2276, Cphy1714, Cphy0694, Cphy3202, Cphy3862, Cphy0858, Cphy1510, Cphy2128, Cphy1169, Cphy1888, Cphy2919, and Cphy1612. In some embodiments, the ABC transporter is selected from the group consisting of Cphy1529, Cphy1530, Cphy1531, Cphy3858, Cphy3859, Cphy3860, Cphy2569, Cphy2570, Cphy2571, Cphy2654, Cphy2655, Cphy2656, Cphy3588, Cphy3589, Cphy3590, Cphy3210, Cphy 9, Cphy3208, Cphy2274, Cphy2273, Cphy2272, Cphy2268, Cphy2267, Cphy2266, Cphy2265, Cphy2012, Cphy2011, Cphy2010, Cphy2009, Cphy1717, Cphy1716, Cphy1715, Cphy1451, Cphy1449, Cphy1448, Cphy1443, and Cphy1132.

Some embodiments include recombinant microorganisms comprising nucleic acids disclosed herein, such as one or more, two or more, or all three of: an exogenous nucleic acid encoding a Clostridium phytofermentans hydrolase, an exogenous nucleic acid encoding a C. phytofermentans ABC transporter, and an exogenous nucleic acid encoding a C. phytofermentans transcriptional regulator. In some embodiments, the hydrolase is selected from the group consisting of Cphy3368, Cphy3367, Cphy1799, Cphy1800, Cphy2105, Cphy1071, Cphy0430, Cphy1163, Cphy3854, Cphy1929, Cphy2108, Cphy3158, Cphy3207, Cphy3009, Cphy3010, Cphy2632, Cphy3586, Cphy0218, Cphy0220, Cphy1720, Cphy3160, Cphy2276, Cphy1714, Cphy0694, Cphy3202, Cphy3862, Cphy0858, Cphy1510, Cphy2128, Cphy1169, Cphy1888, Cphy2919, and Cphy1612. In some embodiments, the ABC transporter is selected from the group consisting of Cphy1529, Cphy1530, Cphy1531, Cphy3858, Cphy3859, Cphy3860, Cphy2569, Cphy2570, Cphy2571, Cphy2654, Cphy2655, Cphy2656, Cphy3588, Cphy3589, Cphy3590, Cphy3210, Cphy 9, Cphy3208, Cphy2274, Cphy2273, Cphy2272, Cphy2268, Cphy2267, Cphy2266, Cphy2265, Cphy2012, Cphy2011, Cphy2010, Cphy2009, Cphy1717, Cphy1716, Cphy1715, Cphy1451, Cphy1449, Cphy1448, Cphy1443, and Cphy1132. In some embodiments, the microorganism is selected from the group consisting of Clostridium cellulovorans, Clostridium cellulolyticum, Clostridium thermocellulosum, Clostridium jolyticum, Clostridium josui, Clostridium papyrifera, Clostridium cellulosimilis, Clostridium thermocoprinus, Clostridium cellulobiopolybacterium, Clostridium hugnathi, Clostridium polysilocosis, Clostridium polysaccharomyces, Clostridium porifer, Clostridium sporophyllum, Clostridium chartarum, Clostridium sporotrichum, Clostridium chartarum, Clostridium alderidium, Clostridium bionatriensis, Clostridium cellulolyticum, Escherichia coli, Thermococcus cellulose, Clostridium cellulolyticum, Clostridium thermolyticum, Clostridium cellulolyticum, Thermobacterium cellulolyticum, Clostridium cellulolyticum, Clostridium cellulolyticum, and Clostridium cellulolyticum.

Some embodiments include a method of producing ethanol, the method comprising culturing at least one recombinant microorganism described herein. Such embodiments may also include providing a substrate to the microorganism. In particular embodiments, the substrate may be selected from the group consisting of sawdust, wood flour, wood pulp, paper pulp waste, grasses such as switchgrass, biomass plants and agricultural crops such as crambe, seaweed, rice hulls, bagasse, jute, leaves, macroalgae material, microalgae material, grass clippings, corn stover, corn cobs, corn kernels, corn flour, distillers grains, and pectin. In particular embodiments, the substrate may be pectin.

Some embodiments include methods of processing a substrate of a hydrolase, comprising providing a microorganism that exogenously expresses a Clostridium phytofermentans hydrolase and supplying the microorganism with the substrate of the hydrolase such that the substrate is processed to form a product. In some embodiments, the microorganism exogenously expresses a Clostridium phytofermentans ATP-binding cassette (ABC) transporter that transports (e.g., imports or exports) the product.

Some embodiments include products for biofuel production, including lignocellulosic biomass and microorganisms capable of directly hydrolyzing and fermenting the biomass, wherein the microorganisms are modified to increase the activity of one or more cellulases (such as one or more cellulases disclosed herein, e.g., Cphy3367, Cphy3368, Cphy0218, Cphy3207, Cphy2058 and Cphy1163). In some embodiments, the microorganism is capable of directly fermenting five- and six-carbon sugars. In some embodiments, the microorganism is a bacterium, such as a species of the genus Clostridium, such as Clostridium phytofermentans. In some embodiments, the microorganism comprises one or more heterologous polynucleotides that increase the activity of one or more cellulases.

Some embodiments include products for biofuel production, including carbonaceous biomass and microorganisms capable of directly hydrolyzing and fermenting the biomass, wherein the microorganisms are modified to increase the activity of one or more cellulases (such as one or more cellulases disclosed herein, e.g., Cphy3367, Cphy3368, Cphy0218, Cphy3207, Cphy2058 and Cphy1163). In some embodiments, the microorganism is capable of producing a fermentation end product. In some embodiments, a substantial portion of the fermentation end product is ethanol. In some embodiments, the fermentation end product comprises lactic acid, acetic acid, and/or formic acid. In some embodiments, the microorganism is capable of taking up one or more complex carbohydrates. In some embodiments, the biomass has a higher concentration of oligomeric carbohydrates than monomeric carbohydrates.

Some embodiments include a method of producing a biofuel, comprising:

(a) contacting the carbonaceous biomass with a microorganism capable of directly hydrolyzing and fermenting the carbonaceous biomass, wherein the microorganism is modified to increase the activity of one or more cellulases (such as one or more cellulases disclosed herein, e.g., Cphy3367, Cphy3368, Cphy0218, Cphy3207, Cphy2058 and Cphy1163); and

(b) allowing sufficient time for the hydrolysis and fermentation to produce the biofuel. In some embodiments, the microorganism is capable of taking up one or more complex carbohydrates. In some embodiments, the biomass has a higher concentration of oligomeric carbohydrates than monomeric carbohydrates. In some embodiments, the hydrolysis results in a higher concentration of cellobiose and/or larger oligomers relative to the monomeric carbohydrates.

The headings used herein are for structural purposes only and are not to be construed as limiting the described subject matter in any way. All documents and similar materials cited in this application, including but not limited to patents, patent applications, articles, books, discussions and internet web pages, are specifically incorporated by reference herein in their entirety for any purpose. If a definition of a term in an incorporated reference differs from the definition provided in this application, the definition provided in this application controls. It should be appreciated that there is an implicit "about" preceding a measure discussed herein, such as temperature, concentration, and time, and thus slight and insubstantial deviations are included within the scope of this application. In this application, the singular includes the plural unless specifically stated otherwise. Likewise, the use of "including" is not meant to be limiting. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention. The terms "a" or "an," as used herein, refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. For example, "an element" means one element or more than one element.

Unless otherwise defined, technical and scientific terms used in connection with the inventions described herein shall have the meaning commonly understood by one of ordinary skill in the art. Furthermore, unless the context requires otherwise, singular terms shall include the plural and plural terms shall include the singular. Generally, the terms and techniques used in connection with cell and tissue culture, molecular biology, and protein and oligo- or polynucleotide chemistry and hybridization described herein are those known and commonly used in the art. For example, standard techniques are used for nucleic acid purification and preparation, chemical analysis, recombinant nucleic acid and oligonucleotide synthesis. Enzymatic reactions and purification techniques are performed according to the manufacturer's instructions or as commonly used in the art or as described herein. The techniques and procedures described herein are generally implemented according to conventional methods known in the art and as described in numerous general and more specific references that are cited and discussed throughout the present specification. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual (third edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. 2000). The terminology used herein and the laboratory procedures and techniques described herein are those known and commonly used in the art.

For use consistent with the embodiments provided herein, the following terms, unless otherwise noted, shall be understood to have the following meanings:

"Nucleotide" refers to a phosphate ester of a nucleoside, either as a monomeric unit or within a nucleic acid. "Nucleotide 5'-triphosphate" refers to a nucleotide with a triphosphate ester group at the 5' position, sometimes denoted "NTP", or "dNTP" and "ddNTP" to specifically indicate the structural characteristics of the ribose sugar. The triphosphate ester group may include sulfur substitutions for the various oxygens, e.g., α-thio-nucleotide 5'-triphosphates. For a review of nucleic acid chemistry, see: Shabarova, Z. and Bogdanov, A., Advanced Organic Chemistry of Nucleic Acids, VCH, New York, 1994.

The terms "nucleic acid" and "nucleic acid molecule" refer to natural nucleic acid sequences such as DNA (deoxyribonucleic acid) and RNA (ribonucleic acid), artificial nucleic acids, analogs thereof, or combinations thereof.

As used herein, the terms "polynucleotide" and "oligonucleotide" are used interchangeably to mean a single- or double-stranded polymer of nucleotide monomers (nucleic acids), including but not limited to 2'-deoxyribonucleotides (DNA) and ribonucleotides (RNA) linked by internucleotide phosphodiester linkages such as 3'-5' and 2'-5', inverted linkages such as 5'-5', branched structures, or nucleic acid analogs. Polynucleotides have associated counterions, such as H⁺, NH₄⁺, trialkylammonium, Mg²⁺, Na⁺, and the like. The polynucleotide may consist entirely of deoxyribonucleotides, entirely of ribonucleotides, or a mixture thereof. Polynucleotides may be composed of nucleobase and sugar analogs. Polynucleotides typically range in size from a few monomeric units (e.g., 5-40, when they are more commonly referred to in the art as oligonucleotides) to several thousand monomeric nucleotide units. Unless otherwise specified, whenever a polynucleotide sequence is presented, it is understood that the nucleotides are in 5' to 3' order from left to right, and that "A" represents deoxyadenosine, "C" represents deoxycytidine, "G" represents deoxyguanosine, and "T" represents thymidine.
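The 5' to 3' writing convention described above can be illustrated with a minimal Python sketch. It is only an illustration and not part of the described methods; the function name reverse_complement and the example sequence are assumptions introduced here. Because both strands of a duplex are written 5' to 3', the strand complementary to a given sequence is obtained by complementing each base and then reversing the order.

```python
# Illustrative sketch only: the function name and example sequence are
# assumptions, not taken from the specification.

# Watson-Crick complements for the four DNA letters defined in the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5' to 3'.

    The input is read 5' to 3' from left to right. Complementing each base
    gives the partner strand in 3' to 5' order, so the result is reversed
    to restore the 5' to 3' convention.
    """
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # prints GCAT
```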

As used herein, "fuel and/or other chemical" refers to compounds suitable as liquid or gaseous fuels, including, but not limited to, hydrocarbons, hydrogen, methane, hydroxyl compounds such as alcohols (e.g., ethanol, butanol, propanol, methanol, etc.), carbonyl compounds such as aldehydes and ketones (e.g., acetone, formaldehyde, 1-propanal, etc.), organic acids, organic acid derivatives such as esters (e.g., wax esters, glycerides, etc.), and other functional compounds including, but not limited to, 1,2-propanediol, 1,3-propanediol, lactic acid, formic acid, acetic acid, succinic acid, and pyruvic acid, which are produced by enzymes such as cellulases, polysaccharases, lipases, proteases, ligninases, and hemicellulases.

The term "plasmid" refers to a circular nucleic acid vector. Typically, plasmids contain an origin of replication such that many copies of the plasmid are produced in bacterial (or sometimes eukaryotic) cells without integration into the host cell DNA.

The term "construct" as used herein refers to a recombinant nucleotide sequence, typically a recombinant nucleic acid molecule, which is produced for the purpose of expressing a specific nucleotide sequence, or which is used in the construction of other recombinant nucleotide sequences. In general, a "construct" as used herein refers to a recombinant nucleic acid molecule.

An "expression cassette" refers to a set of polynucleotide elements that allow transcription of a polynucleotide in a host cell. Typically, the expression cassette includes a promoter and a heterologous or native polynucleotide sequence that is transcribed. The expression cassette or construct may likewise include, for example, transcription termination signals, polyadenylation signals, and enhancer elements.

By "expression vector" is meant a vector that allows expression of a polynucleotide in a cell. Expression of a polynucleotide includes transcriptional and/or post-transcriptional events. An "expression construct" is a vector into which a nucleotide sequence of interest is inserted in such a way that the nucleotide sequence of interest is placed in operable linkage to an expression sequence present in the expression vector.

An "operon" refers to a group of polynucleotide elements that produce messenger RNA (mRNA). Typically, the operon includes a promoter and one or more structural genes. Typically, an operon comprises one or more structural genes that are transcribed into a polycistronic mRNA (a single mRNA molecule encoding more than one protein). In some embodiments, the operon may also include an operator that regulates the activity of the structural genes of the operon.

The term "host cell" as used herein refers to a cell transformed using the methods and compositions of the present invention. In general, a host cell as used herein means a microbial cell into which a nucleic acid of interest has been introduced.

The term "transformation" as used herein refers to a permanent or transient genetic alteration, such as a permanent genetic alteration, induced in a cell following incorporation of non-host nucleic acid sequences.

The term "transformed cell" as used herein refers to a cell into which (or into an ancestor of which) a nucleic acid molecule encoding a gene product of interest (e.g., an RNA and/or protein) has been introduced by means of recombinant nucleic acid techniques.

The term "gene" as used herein refers to any and all discrete coding regions of a host genome, or regions that encode only functional RNAs (e.g., tRNAs, rRNAs, regulatory RNAs such as ribozymes) and includes related non-coding and regulatory regions. The term "gene" includes within its scope open reading frames encoding a specific polypeptide, introns, and adjacent 5' and 3' non-coding nucleotide sequences involved in the regulation of expression. In this regard, a gene may further comprise control signals such as promoters, enhancers and/or termination signals naturally associated with the given gene, or heterologous control signals. The gene sequence may be a cDNA or genomic nucleic acid or a fragment thereof. The gene may be introduced into a suitable vector for extrachromosomal maintenance or for integration into a host.

The terms "gene of interest," "nucleotide sequence of interest," "polynucleotide of interest," or "nucleic acid of interest" as used herein refer to any nucleotide or nucleic acid sequence that encodes a protein or other molecule that is desired to be expressed in a host cell, e.g., for production of a protein or other biological molecule (e.g., an RNA product) in a target cell. The nucleotide sequence of interest may be operably linked to other sequences that facilitate expression, such as a promoter.

The term "promoter" as used herein refers to a minimal nucleic acid sequence sufficient to direct transcription of a nucleic acid sequence to which it is operably linked. The term "inducible promoter" as used herein refers to a promoter that is transcriptionally active when bound to a transcriptional activator, which in turn is activated under specific conditions, such as the presence of a specific chemical signal or combination of chemical signals that affect the binding of the transcriptional activator to the inducible promoter and/or affect the function of the transcriptional activator itself.

The terms "operator," "control sequence," and "regulatory sequence" as used herein refer to a nucleic acid sequence that regulates the expression of an operably linked coding sequence in a particular host organism. Suitable control sequences for prokaryotes include, for example, promoters, optionally operator sequences, and ribosome binding sites.

"Operably linked" and the like mean a linkage in which the polynucleotide elements have a functional relationship. A nucleic acid sequence is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. For example, a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the coding sequence. In some embodiments, operably linked means that the nucleic acid sequences being linked are typically contiguous and, where it is necessary to join two protein coding regions, contiguous and in reading frame. A coding sequence is "operably linked" to another coding sequence when RNA polymerase will transcribe the two coding sequences into a single mRNA, which is then translated into a single polypeptide having amino acids derived from both coding sequences. The coding sequences need not be adjacent to each other, as long as the expressed sequences are ultimately processed to produce the desired protein.

By "operably linking" a promoter to a transcribable polynucleotide is meant placing the transcribable polynucleotide under the regulatory control of the promoter, which then controls transcription of the polynucleotide and optionally translation of the polynucleotide. In the construction of heterologous promoter/structural gene combinations, the promoter or variant thereof is typically placed at a distance from the transcription start site of the transcribable polynucleotide that is approximately the same as the distance between that promoter and the gene it controls in its natural setting, i.e., the gene from which the promoter is derived. As is known in the art, some variation in this distance can be accommodated without loss of function. Similarly, the typical placement of a regulatory sequence element, such as an operator or enhancer, relative to a transcribable polynucleotide under its control is defined by the placement of the element in its natural context, i.e., the gene from which it is derived.

"Culturing" means incubating a cell or organism under conditions in which the cell or organism is capable of performing some, if not all, biological processes. For example, the cell may be cultured for growth or propagation, or it may be non-growing but still capable of biological and/or biochemical processes such as replication, transcription, translation, and the like.

By "transgenic organism" is meant a non-human organism (e.g., a unicellular organism (e.g., a microorganism), a mammal, a non-mammal (e.g., a nematode or a Drosophila)) having a non-endogenous (i.e., heterologous) nucleic acid sequence present in a portion of its cells or stably integrated into its germline nucleic acid.

The term "biomass" as used herein refers to a mass of living or biological material and includes both natural and processed materials, as well as natural organic materials more broadly.

"recombinant" refers to synthetic or other in vitro manipulated polynucleotides (recombinant polynucleotides) and methods of using recombinant polynucleotides to produce gene products encoded by those polynucleotides in cells or other biological systems. For example, the cloned polynucleotide may be inserted into a suitable expression vector, such as a bacterial plasmid, which may be used to transform a suitable host cell. Host cells comprising recombinant polynucleotides are referred to as "recombinant host cells" or "recombinant bacteria". The gene is then expressed in a recombinant host cell to produce, for example, a "recombinant protein". In addition, the recombinant polynucleotide may provide non-coding functions, such as a promoter, an origin of replication, or a ribosome binding site.

The term "homologous recombination" refers to a recombination process between two nucleic acid molecules based on the similarity of the nucleic acid sequences. The term includes reciprocal recombination and non-reciprocal recombination (also known as gene conversion). Furthermore, the recombination can be the result of an equal or non-equal crossover event. Equal crossover occurs between two equivalent sequences or chromosomal regions, while non-equal crossover occurs between identical (or substantially identical) segments of non-equivalent sequences or chromosomal regions. Non-equal crossover typically results in gene duplication or deletion. Enzymes and mechanisms involved in homologous recombination are described in Watson et al., Molecular Biology of the Gene, pp. 313-327, The Benjamin/Cummings Publishing Co., 4th edition (1987).

The term "non-homologous or random integration" refers to any process in which a nucleic acid is integrated into the genome that does not involve homologous recombination. It appears to be a random process where integration can occur at any of a number of genomic locations.

"heterologous polynucleotide sequence" or "heterologous nucleic acid" is a relative term referring to a polynucleotide that is functionally related to another polynucleotide (e.g., a promoter sequence) in such a way that the two polynucleotide sequences are not placed in the same relationship as one another in nature. The heterologous polynucleotide sequence includes, e.g., a promoter operably linked to the heterologous nucleic acid, and a polynucleotide sequence comprising its native promoter, which is inserted into a heterologous vector for transformation into a recombinant host cell. Heterologous polynucleotide sequences are considered "exogenous" in that they are introduced into a host cell by transformation techniques. However, the heterologous polynucleotides may be from different sources or from the same source. Alteration of the sequence of the heterologous polynucleotide may occur, for example, by treating the polynucleotide with a restriction enzyme to produce a polynucleotide sequence operably linked to a regulatory element. Alterations may also occur, for example, by site-directed mutagenesis.

The term "endogenous expression" refers to the expression of a polynucleotide that is native to a host cell and naturally expressed in it.

"Capable of expression" refers to a host cell that provides a sufficient cellular environment for the expression of endogenous and/or exogenous polynucleotides.

The present application relates to U.S. provisional application No. 61/032,048 filed on day 27 of 2008, international application No. PCT/US2009/35597 filed on day 27 of 2009, U.S. application No. 12/419,211 filed on day 6 of 2009, U.S. provisional application No. 61/060,620 filed on day 11 of 2008, and U.S. application No. 12/483,118 filed on day 11 of 2009, each of which is incorporated herein by reference in its entirety for any purpose.

The figures, descriptions and examples below illustrate some specific embodiments of the invention in detail. Those skilled in the art will recognize that many variations and modifications are included within its scope. Thus, the description of some specific embodiments should not be considered as limiting the scope of the invention.

Drawings

FIG. 1 is a schematic diagram of a series of polynucleotide gene combination examples. R represents a transcription regulatory sequence; A, B and C represent sequences encoding ATP-binding cassette (ABC) transporters; GH represents a sequence encoding a glycoside hydrolase; S represents a signal sequence.

Fig. 2 is a schematic diagram of a series of embodiments of gene combinations in C. phytofermentans. Numbers represent the location of a particular sequence on the C. phytofermentans genome.

Figure 3 is a schematic diagram of the design of the C. phytofermentans microarray. The 24-base probes synthesized on the microarray are indicated by dashes. Boxes represent predicted open reading frames, such as protein coding regions. Eleven 24-base probes were used to measure the level of each open reading frame (ORF). The intergenic region is flanked on both sides of the DNA by 24-base probes, which are distinguished by a single DNA base.

FIG. 4 is a schematic of a method of measuring mRNA transcript range. The putative mRNA transcript includes non-coding regions that extend 5 'and 3' of the corresponding predicted ORF. The probes are indicated by dashes. In this example, three probes at the left end (5 ') of the ORF and two probes at the right end (3') of the ORF will indicate transcript ranges of the mRNA.

Fig. 5 is a diagram of the c.

Figure 6 is a graph showing GC content in a 1kb genomic fragment as a function of c. 6 genomic islands with GC content > 50% were numbered. These 6 regions constitute a total of 161 kb regions.

Fig. 7 is a neighbor-joining tree of C. phytofermentans and related taxa within the clostridia, based on 16S rRNA gene sequences. Cluster I includes pathogenic clostridia, cluster III includes cellulolytic clostridia, and cluster XIVa includes gut microbes and metagenomic sequences within the clostridia. The numbers at the nodes are the levels (percentage) of bootstrap support based on neighbor-joining analysis of 1000 resampled datasets. Bacillus subtilis was used as the outgroup. Bar, 4 nucleotide substitutions at each position.

FIG. 8 is a circular graph showing the values of the best matches (e values cut off at 0.01) of Clostridium phytofermentans ISDg CDSs in other sequenced Clostridium bacterial genomes.

FIGS. 9A and 9B are circular graphs showing comparison of the Glycoside Hydrolase (GH) -encoding gene (9A) and all genes (9B) in different organisms using BLASTP.

Figure 10 is a neighbor-joining tree showing the molecular phylogeny of glycoside hydrolase family GH9 domains.

Figure 11 is a neighbor-joining tree showing the molecular phylogeny of glycoside hydrolase family GH5 domains.

FIG. 12 is a schematic diagram showing examples of putative hydrolases. Some hydrolases may be extracellular or membrane bound. GH: glycoside hydrolase; CBM: carbohydrate-binding module.

Fig. 13 is a depiction of the uptake and metabolism of pentoses in C. phytofermentans.

Figure 14 is a depiction of the uptake and metabolism of fucose in C. phytofermentans.

Fig. 15 is a depiction of the uptake and metabolism of rhamnose in C. phytofermentans.

Fig. 16 is a depiction of the regulation, uptake and metabolism of laminarin in C. phytofermentans.

Fig. 17 is a depiction of the uptake and metabolism of cellobiose in C. phytofermentans.

FIG. 18 is a depiction of the pIMP-Cphy plasmid map.

FIG. 19 is a depiction of the plasmid map of pCphyP 3510-3367.
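
By way of non-limiting illustration of the kind of profile described for Figure 6, the following minimal sketch computes GC content over consecutive 1 kb windows and reports the windows whose GC content exceeds 50%. The function names and the toy sequence are hypothetical, and the sketch ignores a trailing partial window; the analysis actually used to prepare the figure is not specified here.

```python
from typing import Iterator, List, Tuple


def gc_windows(sequence: str, window: int = 1000) -> Iterator[Tuple[int, float]]:
    """Yield (start, GC percent) for consecutive, non-overlapping windows.

    A trailing partial window is ignored (an assumption of this sketch).
    """
    seq = sequence.upper()
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        yield start, 100.0 * gc / window


def high_gc_windows(sequence: str, window: int = 1000,
                    cutoff: float = 50.0) -> List[int]:
    """Return start positions of windows whose GC content exceeds the cutoff."""
    return [start for start, gc in gc_windows(sequence, window) if gc > cutoff]


# Hypothetical 3 kb toy sequence: an AT-only, a GC-only and a balanced window.
toy_sequence = "AT" * 500 + "GC" * 500 + "ACGT" * 250
```

Adjacent windows above the cutoff could then be merged into candidate genomic islands; the merging rule is left open here because the description does not state one.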

Detailed Description

The various embodiments disclosed herein generally focus on compositions and methods for making recombinant microorganisms that are capable of producing fuel when cultured in a variety of fermentation environments. In general, recombinant microorganisms are capable of efficiently and stably producing fuels, such as alcohols, and related compounds, such that high yields of the fuels can be obtained from relatively inexpensive natural biomass materials, such as cellulose. In some embodiments, recombinant microorganisms can efficiently and stably catalyze the conversion of inexpensive natural biomass materials, such as lignocellulose, to produce sugars and polysaccharides, and related compounds.

Currently, only a few technologies use recombinant organisms capable of producing fuels. Many of these technologies have problems that result in low fuel yields, high costs, and undesirable byproducts. For example, some known techniques use corn grain and other cereals as feedstock. However, competing feed and food demands on grain supplies and prices may ultimately limit the expansion of ethanol production from corn and other cereals. Other feedstock sources include lignocellulose, which can be converted to ethanol by saccharification and fermentation (Lynd, L.R., Cushman, J.H., Nichols, R.J., and Wyman, C.E. "Fuel ethanol from cellulosic biomass," Science 251, 1318-1323 (1991)). Since lignocellulose is a major component of biomass and the most abundant biological material on earth, fuels derived from lignocellulosic biomass are a renewable energy option with the potential to sustain the world's economy, energy supply and environment. However, conventional lignocellulosic ethanol production requires an expensive and complex multi-step process, including production and pretreatment of the lignocellulosic material, the use of exogenous saccharolytic enzymes to hydrolyze the polysaccharides present in the pretreated biomass, and separate fermentation of hexose and pentose sugars.

In one embodiment, the methods and compositions of the present invention comprise genetically modifying or making microorganisms to increase the enzymatic activity of one or more enzymes, including but not limited to cellulases. Examples of such modifications include modifying endogenous nucleic acid regulatory elements to increase expression of one or more enzymes (e.g., operably linking a gene encoding an enzyme of interest to a strong promoter), introducing additional copies of a nucleic acid molecule into the microorganism to provide enhanced activity of the enzyme, operably linking a gene encoding one or more enzymes to an inducible promoter, or a combination thereof.

The different microorganisms of the present invention may be modified to increase the activity of one or more cellulases or other enzymes associated with cellulose processing. Cellulases are generally classified by grouping enzymes into families with similar or identical activities, but not necessarily identical substrate specificity. One such classification is the CAZy system (CAZy stands for Carbohydrate-Active enZymes), which lists, for example, 115 different glycoside hydrolase (GH) families named GH1 to GH115. Each protein family typically has a corresponding enzymatic activity. The database includes enzymes with cellulase and hemicellulase activity. In addition, the entire annotated genome of Clostridium phytofermentans is available on the Internet at www.ncbi.nlm.nih.gov/sites/entrez.

Some embodiments described herein simplify the conventional multi-step process of lignocellulosic ethanol production by providing methods and compositions that allow lignocellulosic biomass to be fermented to ethanol in a single step. This is known as consolidated bioprocessing (CBP). Because CBP simplifies the overall conversion process, reducing costs and energy waste, it is predicted to be the only economically and environmentally sustainable cellulosic ethanol bioprocess.

In some embodiments, polynucleotides and expression cassettes for efficient fuel production systems are provided. The polynucleotides and expression cassettes can be used to prepare expression vectors for transforming microorganisms to render the transformed microorganisms capable of efficiently producing very large quantities of products, such as fuels.

In some embodiments, the metabolism of a microorganism can be improved by introducing and expressing various genes. According to some embodiments of the invention, the recombinant microorganism may use a gene from Clostridium phytofermentans (ISDg^T, American Type Culture Collection 700394^T) as a biocatalyst for promoting the conversion of, for example, cellulose to fuels such as ethanol and hydrogen.

In some embodiments, C. phytofermentans (American Type Culture Collection 700394^T) can be defined based on the cultured strain ISDg^T (Warnick et al., International Journal of Systematic and Evolutionary Microbiology, 52: 1155-60, 2002). The entire annotated genome of Clostridium phytofermentans is available on the Internet at www.ncbi.nlm.nih.gov/sites/entrez. Several embodiments described herein relate generally to systems, methods, and compositions for producing fuels and/or other useful organic products, involving strain ISDg^T and/or strains derived from strain ISDg^T, or any other strain of the species C. phytofermentans. The species can be defined using standard taxonomic criteria (Stackebrandt and Goebel, International Journal of Systematic Bacteriology, 44: 846-9, 1994): strains having 16S rRNA sequence homology values of 97% or higher compared with the type strain (ISDg^T) are considered to be C. phytofermentans. There is considerable evidence that microorganisms with DNA reassociation values of 70% or more also have at least 96% DNA sequence identity and share the phenotypic characteristics that define the species. Analysis of the genome sequence of C. phytofermentans strain ISDg^T indicates the presence of a large number of genes and genetic loci that may be involved in the mechanisms and pathways of plant polysaccharide fermentation, giving rise to the unusual fermentation properties of the microorganism. Based on the above taxonomic factors, all strains of the species c. The C. phytofermentans strain may be a naturally occurring isolate or a genetically improved strain.
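
By way of non-limiting illustration, the 97% 16S rRNA criterion can be sketched as a simple identity check. The sketch assumes the two sequences have already been aligned to equal length with gaps written as "-", and it skips gapped columns, which is only one of several possible conventions. The function names are hypothetical and are not part of the disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_16s_criterion(aligned_query: str, aligned_type_strain: str,
                        threshold: float = 97.0) -> bool:
    """True when 16S rRNA identity to the type strain is at or above the threshold."""
    return percent_identity(aligned_query, aligned_type_strain) >= threshold
```

A query that differs from the type-strain sequence at 2 of 100 aligned positions (98% identity) would meet the criterion, whereas one that differs at 4 of 100 positions (96%) would not.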

Different expression vectors can be introduced into the host microorganism so that the transformed microorganism can produce large amounts of fuel under different fermentation conditions. The recombinant microorganism can be modified so that the microorganism produces fuel stably in high yield when cultured on a medium comprising, for example, cellulose.

C. phytofermentans, alone or in combination with one or more other microorganisms, can ferment cellulosic biomass material into combustible biofuels such as ethanol, propanol and/or hydrogen on a large scale (see, e.g., U.S. patent application No. 2007/0178569; Warnick et al., Int J Syst Evol Microbiol (2002), 52: 1155-1160, each of which is incorporated herein by reference in its entirety).

The polynucleotides, expression cassettes, and expression vectors disclosed herein can be used with many different host microorganisms to produce fuels such as ethanol and hydrogen. For example, in addition to Clostridium phytofermentans, cellulose-hydrolyzing microorganisms can be used, such as Clostridium cellulovorans, Clostridium cellulolyticum, Clostridium thermocellulosum, Clostridium jolyticum, Clostridium josui, Clostridium papyrosolvens, Clostridium cellulobial, Clostridium thermonatum, Clostridium termitidis, Clostridium thermocopriae, Clostridium thermocelluloicum, Clostridium thermocellulolyticum, Clostridium saccharolyticum, Clostridium sporotrichum, Clostridium micropopuletii, Clostridium toxocellum, Clostridium borteorhicornum, and Clostridium borteorubium. Other microorganisms that can be used include, for example, saccharolytic microorganisms such as Thermoanaerobacterium thermosaccharolyticum and Thermoanaerobacterium saccharolyticum. Other potential hosts include other bacteria, yeast, algae, fungi, and eukaryotic cells.

In various embodiments, the disclosed polynucleotides, expression cassettes, and expression vectors can be used with c.

As will be appreciated by those skilled in the art, the production of recombinant organisms capable of producing fuels can be of great benefit, particularly for efficient, cost-effective and environmentally non-destructive fuel production.

Illustrative embodiments

The following description and examples describe in detail some embodiments of the invention. Those skilled in the art will recognize that its scope includes many variations and modifications. Therefore, the description of the preferred embodiments should not be taken as limiting the scope of the invention.

Various embodiments of the present invention provide the benefit of using recombinant microorganisms to produce fuels. Polynucleotides, expression cassettes, expression vectors and recombinant microorganisms optimized for fuel production are disclosed according to some embodiments of the invention.

Hydrolytic enzyme

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids encoding hydrolases identified in C. phytofermentans. Some embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids encoding hydrolases identified in C. phytofermentans. Advantages of using nucleic acids encoding hydrolases include an improved ability of the microorganisms to hydrolyze polymers such as polysaccharides and polypeptides.

Hydrolases may include enzymes that degrade polymers such as disaccharides, trisaccharides and polysaccharides, polypeptides and proteins. Polymers may also include, for example, cellulose, hemicellulose, pectin, lignin, and proteoglycans. Examples of enzymes and enzyme activities that degrade polysaccharides include, but are not limited to, Glycoside Hydrolases (GH), Glycosyltransferases (GT), Polysaccharide Lyases (PL), Carbohydrate Esterases (CE), and proteins containing Carbohydrate Binding Modules (CBM) (available on the Internet at cazy.org; Coutinho, P.M., and Henrissat, B. (1999) Carbohydrate-active enzymes: an integrated database approach. In "Recent Advances in Carbohydrate Bioengineering", H.J. Gilbert, G. Davies, B. Henrissat and B. Svensson eds., The Royal Society of Chemistry, Cambridge, pages 3-12).

In some embodiments, GH, GT, PL, CE and CBM may be separate enzymes with different activities. In other embodiments, GH, GT, PL, CE and CBM may be enzyme domains with specific catalytic activities. For example, an enzyme with multiple activities may have multiple enzyme domains, including, for example, GH, GT, PL, CE, and/or CBM catalytic domains.

O-glycosyl hydrolases are a widely distributed group of enzymes that hydrolyze the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases based on sequence similarity has led to the definition of 85 different families (PUBMED: 7624375; PUBMED: 8535779). This classification is available on the CAZy (Carbohydrate-Active enZymes) website. Because the fold of a protein is better conserved than its sequence, some of the families can be grouped in "clans".

Glycoside hydrolase family 9 includes enzymes with several known activities, such as endoglucanases and cellobiohydrolases. In C. phytofermentans, a typical GH9 cellulase is ABX43720.

Any hydrolase may be selected from the annotated genome of C. phytofermentans. Examples include one or more endoglucanases, chitinases, cellobiohydrolases, or endo-cellulases (at the reducing or non-reducing end).

In addition, microorganisms such as c. For heterologous expression, bacteria or yeast may be modified by recombinant techniques (e.g., Brat et al., Appl. Environ. Microbiol. 2009; 75: 2304-2311, disclosing the expression of xylose isomerase in Saccharomyces cerevisiae).

Other modifications may be made to increase the yield of end products (e.g., ethanol) in the recombinant microorganisms of the invention. For example, the host may further comprise additional heterologous DNA segments, the expression products of which are proteins involved in the transport of mono- and/or oligosaccharides into the recombinant host. Likewise, additional genes from the glycolytic pathway may be integrated into the host. In this way, the ethanol production rate can be increased.

One of the most significant and unexpected characteristics of the C. phytofermentans genome is the number and diversity of genes encoding carbohydrate-active enzymes. This diversity is unique among organisms related to C. phytofermentans. Table 1 compares the number and diversity of carbohydrate-active genes with those of other organisms.

Table 1: Number and diversity of carbohydrate-active genes

The C. phytofermentans genome includes a diverse array of GH, PL, CE and CBM genes with a wide range of putative functions, predicted using the methods described herein and methods known in the art. Tables 2 to 5 show examples of known activities of the GH, PL, CE and CBM family members predicted to be present in C. phytofermentans. Known activities are listed by activity and by the corresponding EC number assigned by the International Union of Biochemistry and Molecular Biology.

Table 2: Known activities of glycoside hydrolase family members

Table 3: Known activities of polysaccharide lyase family members

Table 4: Known activities of carbohydrate esterase family members

Table 5: Known activities of carbohydrate-binding module family members

Some embodiments include genes encoding the hydrolases shown in Table 6. JGI numbers represent NCBI locus tags in GenBank records.

Table 6: Predicted hydrolases in C. phytofermentans

JGI number	GH	GH modular structure
Cphy0191	GH43
Cphy0203	GH105
Cphy0218	GH31
Cphy0220	GH3
Cphy0288	GH88
Cphy0430	GH94
Cphy0530	GH2
Cphy0531	GH43
Cphy0607	GH20
Cphy0662	GH3
Cphy0666	GH106
Cphy0694	GH94
Cphy0699	GH3
Cphy0711	GH2
Cphy0769	GH4
Cphy0776	GH88
Cphy0857	GH94
Cphy0858	GH30
Cphy0874	GH95
Cphy0875	GH43
Cphy0934	GH88
Cphy1019	GH65
Cphy1071	GH26	CBM35-GH26-CBM3
Cphy1125	GH3
Cphy1163	GH5
Cphy1169	GH51
Cphy1308	GH87
Cphy1395	GH95
Cphy1435	GH19
Cphy1510	GH10
Cphy1596	GH3
Cphy1612
Cphy1640	GH12
Cphy1652	GH18
Cphy1688	GH^*
Cphy1711	GH28
Cphy1714	GH85	GH85-CBM32
Cphy1720	GH38
Cphy1750	GH105
Cphy1775	GH^*	SLH-GH^*-CBM32-CBM32
Cphy1799	GH18	CBM12-GH18
Cphy1800	GH18	GH18-CBM12
Cphy1815	GH18	GH18-LRR
Cphy1873	GH87	CBM35-CBM6-GH87
Cphy1874	GH65
Cphy1877	GH31
Cphy1882	GH87	GH87-SORT
Cphy1888
Cphy1919	GH105
Cphy1929	GH94
Cphy1934	GH13
Cphy1936	GH36
Cphy1937	GH1
Cphy1943	GH19	CBM5-GH19
Cphy2025	GH2
Cphy2028	GH43
Cphy2058	GH5
Cphy2105	GH11
Cphy2108	GH10	CBM22-GH10-SORT
Cphy2128	GH26	CBM35-GH26-X2-X2-CBM3
Cphy2190	GH29
Cphy2276	GH26	CBM35-GH26
Cphy2304	GH13	CBM41-CBM48-GH13-SORT
Cphy2331	GH13	CBM48-GH13
Cphy2332	GH3
Cphy2341	GH13
Cphy2342	GH13
Cphy2344	GH13
Cphy2349	GH77
Cphy2350	GH13
Cphy2567	GH28
Cphy2572	GH18
Cphy2632	GH43
Cphy2736	GH28
Cphy2848	GH4
Cphy2919
Cphy3009	GH3
Cphy3010	GH10
Cphy3011	GH43
Cphy3023	GH29
Cphy3028	GH29
Cphy3029	GH88
Cphy3056	GH36
Cphy3081	GH2
Cphy3109	GH25
Cphy3158	GH67
Cphy3160	GH2
Cphy3202	GH5	GH5-X2-CBM46-CBM2
Cphy3207	GH8
Cphy3217	GH28
Cphy3239	GH20
Cphy3310	GH28
Cphy3313	GH65
Cphy3314	GH65
Cphy3329	GH3
Cphy3367	GH9	GH9-CBM3-X2-X2-CBM3
Cphy3368	GH48	GH48-X2-CBM3
Cphy3388	GH16	GH16-CBM4-CBM4-CBM4-CBM4
Cphy3396	GH4
Cphy3398	GH43
Cphy3404	GH30
Cphy3466	GH73
Cphy3571	GH20
Cphy3586	GH53	GH53-CBM13
Cphy3618	GH43
Cphy3749	GH18
Cphy3785	GH31
Cphy3854	GH94
Cphy3862	GH10	GH10-GH10-CE15

In some embodiments, the polysaccharide-degrading enzyme comprises a cellulose-degrading enzyme, i.e., a cellulase. Some cellulases, including endo-cellulases (EC 3.2.1.4) and exo-cellulases (EC 3.2.1.91), hydrolyze beta-1,4-glucosidic bonds.

Examples of endo-cellulases predicted in C. phytofermentans include genes within the GH5 family, such as Cphy3368, Cphy1163 and Cphy2058; the GH8 family, such as Cphy3207; and the GH9 family, such as Cphy3367. Examples of exo-cellulases in C. phytofermentans include genes in the GH48 family, such as Cphy3368. Some exo-cellulases hydrolyze polysaccharides to produce oligosaccharides of 2 to 4 glucose units, yielding the cellodextrins cellobiose (a disaccharide), cellotriose (a trisaccharide), or cellotetraose (a tetrasaccharide). Members of the GH5, GH9, and GH48 families may have both endo- and exo-cellulase activity.

In some embodiments, the polysaccharide-degrading enzymes may include enzymes capable of degrading hemicellulose, i.e., hemicellulases (Leschine, S.B. in Handbook on Clostridia (ed. Dürre, P.) (CRC Press, Boca Raton, 2005)). Hemicellulose is a major constituent of plant biomass and comprises a mixture of pentoses and hexoses, for example, D-xylopyranose, L-arabinofuranose, D-mannopyranose, D-glucopyranose, D-galactopyranose, D-glucuronic acid and other sugars (Aspinall, G.O. The Biochemistry of Plants 473, 1980; Han, J.S. & Rowell, J.S. Paper and Composites from Agro-Based Resources 83, 1997). In some embodiments, the predicted hemicellulases identified in C. phytofermentans include enzymes that act on the linear backbone of hemicellulose, such as endo-beta-1,4-D-xylanases (EC 3.2.1.8), e.g., GH5, GH10, GH11, and GH43 family members; 1,4-β-D-xyloside xylohydrolases (EC 3.2.1.37), e.g., GH30, GH43 and GH3 family members; and β-mannanases (EC 3.2.1.78), e.g., members of the GH26 family (see Table 6).

In some embodiments, the predicted hemicellulases identified in C. phytofermentans include enzymes that act on side groups and substituents of hemicellulose, such as α-L-arabinofuranosidases (EC 3.2.1.55), e.g., GH3, GH43, and GH51 family members; α-xylosidases, e.g., GH31 family members; α-fucosidases (EC 3.2.1.51), e.g., GH95 and GH29 family members; galactosidases, e.g., GH1, GH2, GH4, GH36, and GH43 family members; and acetyl-xylan esterases (EC 3.1.1.72), e.g., CE2 and CE4 (see Table 6).
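
By way of non-limiting illustration, the activity-to-family assignments recited in the two preceding paragraphs can be held in a small lookup table. The dictionary below is transcribed from those paragraphs only; the names `HEMICELLULASE_FAMILIES`, `ec_number` and `activities_for_family` are hypothetical, and an EC number of `None` marks an activity for which no EC number is given above.

```python
from typing import Dict, List, Optional, Tuple

# activity name -> (EC number as stated in the text, or None; CAZy families)
HEMICELLULASE_FAMILIES: Dict[str, Tuple[Optional[str], List[str]]] = {
    "endo-beta-1,4-D-xylanase": ("3.2.1.8", ["GH5", "GH10", "GH11", "GH43"]),
    "1,4-beta-D-xyloside xylohydrolase": ("3.2.1.37", ["GH30", "GH43", "GH3"]),
    "beta-mannanase": ("3.2.1.78", ["GH26"]),
    "alpha-L-arabinofuranosidase": ("3.2.1.55", ["GH3", "GH43", "GH51"]),
    "alpha-xylosidase": (None, ["GH31"]),
    "alpha-fucosidase": ("3.2.1.51", ["GH95", "GH29"]),
    "galactosidase": (None, ["GH1", "GH2", "GH4", "GH36", "GH43"]),
    "acetyl-xylan esterase": ("3.1.1.72", ["CE2", "CE4"]),
}


def ec_number(activity: str) -> Optional[str]:
    """Return the EC number recorded for an activity, or None if none is given."""
    return HEMICELLULASE_FAMILIES[activity][0]


def activities_for_family(family: str) -> List[str]:
    """Return, in sorted order, every listed activity associated with a family."""
    return sorted(
        name
        for name, (_, families) in HEMICELLULASE_FAMILIES.items()
        if family in families
    )
```

Looking up GH43 returns four different activities, which illustrates why family membership alone suggests, but does not determine, the activity of a given predicted enzyme.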

In some embodiments, the polysaccharide-degrading enzyme may comprise an enzyme capable of degrading pectin, i.e., a pectinase. In plant cell walls, the cross-linked cellulose network can be embedded in a pectin matrix that is covalently linked to xyloglucans and some structural proteins. The pectin may comprise homogalacturonan (HG) or rhamnogalacturonan (RH).

In some embodiments, the pectinase identified in c. HG may consist of D-galacturonic acid (D-galA) units, which may be acetylated and methylated. Enzymes that hydrolyze HG include, for example, 1,4-alpha-D polygalacturonate lyases (EC 4.2.2.2), such as PL1, PL9, and PL11 family members; glucuronyl hydrolases, such as GH88 and GH105 family members; pectin acetylesterases, such as CE12 family members; and pectin methylesterases, such as CE8 family members (see Table 6).

In some embodiments, the pectinases identified in c. RH may have a backbone consisting of alternating 1,2-alpha-L-rhamnose (L-Rha) and 1,4-alpha-D-galacturonic acid residues (Lau, J.M., McNeil, M., Darvill, A.G. & Albersheim, P. Structure of the backbone of rhamnogalacturonan I, a pectic polysaccharide in the primary cell walls of plants. Carbohydr. Res. 137, 111 (1985)). The rhamnose residues of the backbone may carry galactan, arabinan or arabinogalactan side chains attached at C4. Enzymes that hydrolyze RH may include, for example, endo-rhamnogalacturonases, such as GH28 family members; and rhamnogalacturonan lyases, such as members of the PL11 family (see Table 6).

Some embodiments include enzymes that can hydrolyze starch. C. phytofermentans is capable of degrading starch and chitin (Warnick, T.A., Methe, B.A., and Leschine, S.B. Clostridium phytofermentans sp. nov., a cellulolytic mesophile from forest soil. Int. J. Syst. Evol. Microbiol. 52, 1155-1160 (2002); Leschine, S.B. in Handbook on Clostridia (ed. Dürre, P.) (CRC Press, Boca Raton, 2005); Reguera, G. & Leschine, S.B. Chitin degradation by cellulolytic anaerobes and facultative aerobes from soils and sediments. FEMS Microbiol. Lett. 204, 367-374 (2001)). Enzymes that hydrolyze starch include alpha-amylase, glucoamylase, beta-amylase, exo-alpha-1,4-glucanase, and pullulanase. Examples of predicted enzymes identified in C. phytofermentans that are involved in starch hydrolysis include members of the GH13 family (see Table 6).

In other embodiments, the hydrolytic enzyme may comprise an enzyme that hydrolyzes chitin. Examples of enzymes that can hydrolyze chitin include members of the GH18 and GH19 families. (see Table 6).

In some embodiments, the hydrolytic enzyme may comprise an enzyme that hydrolyzes lichenan, i.e., a lichenase, such as a GH16 family member, e.g., Cphy3388.

In some embodiments, the hydrolase may comprise a CBM family member. Without wishing to be bound by any theory, the function of the CBM domain is to localize the enzyme complex to a specific substrate. Examples of predicted CBM families identified in C. phytofermentans that can bind cellulose include CBM2, CBM3, CBM4, CBM6, and CBM46 family members. Examples of predicted CBM families identified in C. phytofermentans that can bind xylan include CBM2, CBM4, CBM6, CBM13, CBM22, CBM35, and CBM36 family members (see Table 6). In other embodiments, the CBM domain family member functions to stabilize the enzyme complex.
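
By way of non-limiting illustration, the modular structures listed in Table 6 can be split on the hyphen and screened for the cellulose-binding CBM families named above. The sketch below copies only a handful of rows from Table 6; the variable and function names are hypothetical.

```python
from typing import Dict, List

# CBM families described above as predicted to bind cellulose.
CELLULOSE_BINDING_CBMS = {"CBM2", "CBM3", "CBM4", "CBM6", "CBM46"}

# A few rows copied from Table 6 (locus tag -> GH modular structure).
MODULAR_STRUCTURE: Dict[str, str] = {
    "Cphy1071": "CBM35-GH26-CBM3",
    "Cphy2108": "CBM22-GH10-SORT",
    "Cphy3202": "GH5-X2-CBM46-CBM2",
    "Cphy3367": "GH9-CBM3-X2-X2-CBM3",
    "Cphy3368": "GH48-X2-CBM3",
    "Cphy3586": "GH53-CBM13",
}


def modules(structure: str) -> List[str]:
    """Split a modular structure string into its ordered modules."""
    return structure.split("-")


def with_cellulose_binding_module(table: Dict[str, str]) -> List[str]:
    """Return sorted locus tags whose structure includes a cellulose-binding CBM."""
    return sorted(
        tag
        for tag, structure in table.items()
        if CELLULOSE_BINDING_CBMS.intersection(modules(structure))
    )
```

Applied to the rows above, the screen keeps Cphy1071, Cphy3202, Cphy3367 and Cphy3368 and drops Cphy2108 (CBM22) and Cphy3586 (CBM13), whose CBM families are listed above only among the xylan-binding families.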

Some embodiments include polynucleotides encoding at least one predicted hydrolase identified in C. phytofermentans.

ATP-binding cassette transporters

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising the nucleic acids encoding ATP-binding cassette transporters (ABC transporters) identified in C. phytofermentans. Some embodiments relate to methods of producing fuel using these polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising the ABC transporter encoding nucleic acids identified in C. phytofermentans. Advantages of using nucleic acids encoding ABC transporters include increasing the ability of transformed organisms to transport compounds into the organism and use these compounds in biochemical pathways for producing fuels, thereby improving fuel production. Examples of such compounds include polymeric hydrolysates.

ABC transporters use ATP hydrolysis to transport a variety of substances across the plasma membrane. Such substances include sugars and amino acids. ABC transporters can be identified using the methods described herein and methods known in the art. ABC transporters include at least two types of domains: a transmembrane domain and a nucleotide (e.g., ATP) binding domain. Some ABC transporters also include a solute-binding domain that facilitates regulation of solute transport. These domains may be present on the same polypeptide chain or on multiple polypeptide chains. Some members of the ABC transporter family include the ABC_tran (pfam00005) domain. Further members of the ABC transporter family include 4 domains arranged as two symmetrical halves connected by a long charged region and a highly hydrophobic segment (Hyde et al, Nature, 346: 362-.

In further exemplary embodiments, polynucleotide cassettes, expression vectors and organisms comprising ABC transporters may comprise gene clusters identified in C. phytofermentans. Such gene clusters can be identified using the methods described herein and methods known in the art. In some embodiments, genes and gene clusters can be identified by the degree of identity between clusters of orthologous groups of proteins (COGs). Such genes and gene clusters may be included on the cassette or expressed together. Examples of predicted ABC transporters and ABC transporter sub-domains are shown in Table 7. The column "No." indicates a hypothetical cluster. ABC-transporter domains include a signal transduction domain.

Table 7: Predicted ABC transporters and other proteins/domains in C. phytofermentans

Some embodiments include the use of a nucleic acid encoding a predicted ABC transporter that transports any polymeric hydrolysate. Such hydrolysates may include monosaccharides, such as glucose, mannose, fucose, galactose, arabinose, rhamnose and xylose; disaccharides, such as trehalose, maltose, lactose, sucrose, cellobiose and xylobiose; and oligosaccharides, such as cellotriose, cellotetraose, xylotriose, xylotetraose, inulin, raffinose and melezitose.

Some embodiments include a predicted ABC transporter for transporting cellobiose, e.g., a predicted ABC transporter encoded by Cphy2464, Cphy2465, and Cphy2466.

Transcription regulatory factor

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids encoding transcription regulators identified in C. phytofermentans. Other embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids encoding transcription regulators identified in C. phytofermentans.

Transcriptional regulators identified in c. AraC regulators may include transcriptional activators of genes involved in carbon metabolism (Gallegos, M.T., et al., AraC/XylS family of transcriptional regulators. Microbiol. Mol. Biol. Rev. 61, 393-410 (1997)). PurR regulators include members of the lactose repressor family (Ramos, J.L., et al., The TetR family of transcriptional repressors. Microbiol. Mol. Biol. Rev. 69, 326-356 (2005)).

Some embodiments include the predicted transcriptional regulators shown in Table 8.

Table 8: Predicted transcriptional regulators in C. phytofermentans

Some embodiments include a predicted transcriptional regulator encoded by Cphy2467.

Combination of

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and organisms comprising more than one (e.g., two or more) genes identified in C. phytofermentans. Some embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising more than one (e.g., two or more) genes identified in C. phytofermentans.

The combination may include a polynucleotide cassette containing more than one gene identified in C. phytofermentans. In such embodiments, any of the genes described herein may be utilized in combination with any other gene described herein. For example, any nucleic acid encoding a hydrolase identified in C. phytofermentans may be utilized in combination with any nucleic acid encoding an ABC transporter identified in C. phytofermentans. In further embodiments, any nucleic acid encoding a hydrolase identified in C. phytofermentans may be utilized in combination with a nucleic acid encoding a cognate ABC transporter identified in C. phytofermentans, e.g., a nucleic acid encoding a xylanase in combination with a nucleic acid encoding a xylose transporter.

As used herein, cognate may refer to at least two genes associated with a particular biochemical pathway. For example, cognate can refer to at least two genes in which the product of the first gene is a substrate of the second gene, and so forth. Advantages of using cognate genes include the ability to produce recombinant organisms with the diverse activities encoded by the polynucleotide cassettes; e.g., an organism transformed with a polynucleotide cassette comprising a hydrolase and a cognate ABC transporter can hydrolyze the particular substrate polymer of the hydrolase and transport the hydrolysate into the cell via the cognate ABC transporter. One skilled in the art can identify examples of cognate genes described herein.
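
By way of non-limiting illustration, the product-to-substrate relationship described above can be expressed as a simple check. The `GeneProduct` record, the function names and the substrate and product labels are hypothetical simplifications introduced for this sketch; the xylanase and xylose transporter pairing follows the example given above.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class GeneProduct:
    """Simplified description of what a gene product acts on and yields."""
    name: str
    substrate: str
    product: str


def are_cognate(first: GeneProduct, second: GeneProduct) -> bool:
    """True when the product of the first gene is the substrate of the second."""
    return first.product == second.substrate


def cognate_pairs(genes: Iterable[GeneProduct]) -> List[Tuple[str, str]]:
    """List every ordered (first, second) pair of gene names that is cognate."""
    return [(a.name, b.name) for a, b in permutations(genes, 2) if are_cognate(a, b)]


# Illustrative labels only; real substrates and products are more varied.
xylanase = GeneProduct("xylanase", substrate="xylan", product="xylose")
xylose_transporter = GeneProduct(
    "xylose ABC transporter", substrate="xylose", product="intracellular xylose"
)
cellulase = GeneProduct("cellulase", substrate="cellulose", product="cellobiose")
```

With these three records, only the xylanase followed by the xylose transporter is reported as a cognate pair, because the cellulase product (cellobiose) is not the substrate of either other record.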

In other embodiments, any nucleic acid encoding a hydrolase identified in C. phytofermentans may be used in combination with any nucleic acid encoding a transcriptional regulator identified in C. phytofermentans. In further embodiments, any nucleic acid encoding a hydrolase identified in C. phytofermentans may be used in combination with a nucleic acid encoding a cognate transcriptional regulator identified in C. phytofermentans.

In particular embodiments, any nucleic acid encoding an ABC transporter identified in C. phytofermentans may be used in combination with any nucleic acid encoding a transcriptional regulator identified in C. phytofermentans. In a further embodiment, any nucleic acid encoding an ABC transporter identified in C. phytofermentans may be used in combination with a nucleic acid encoding a cognate transcriptional regulator identified in C. phytofermentans.

In some embodiments, any nucleic acid encoding a hydrolase identified in C. phytofermentans may be used in combination with any nucleic acid encoding an ABC transporter identified in C. phytofermentans, and any nucleic acid encoding a transcriptional regulator identified in C. phytofermentans. In further embodiments, any nucleic acid encoding a hydrolase identified in C. phytofermentans may be used in combination with any nucleic acid encoding a cognate ABC transporter identified in C. phytofermentans, and any nucleic acid encoding a cognate transcriptional regulator identified in C. phytofermentans.

In some embodiments, the combination may comprise the sequential use of more than one gene identified in C. phytofermentans. For example, an organism can be transformed with a polynucleotide comprising any of the genes described herein, followed by transformation with at least one different gene described herein.

An exemplary embodiment of a polynucleotide cassette comprising or consisting essentially of at least two genes is shown in FIG. 1. In one embodiment, the predicted hydrolase encoded by Cphy2276 may be combined with the predicted cognate ABC transporter domains encoded by Cphy2272, Cphy2273 and Cphy2274. In another embodiment, the predicted hydrolase encoded by Cphy3207 may be combined with the predicted cognate ABC transporter domains encoded by Cphy3210, Cphy3209 and Cphy3208, the predicted cognate transcriptional regulator encoded by Cphy3211, and the predicted cognate signal transduction protein encoded by Cphy3212. In another embodiment, the predicted ABC transporter domains encoded by Cphy0862, Cphy0861 and Cphy0860 may be combined with the predicted transcriptional regulator encoded by Cphy0864 and the predicted signal transduction protein encoded by Cphy0863. In another embodiment, the predicted ABC transporter domains encoded by Cphy2466, Cphy2465 and Cphy2464 may be combined with the predicted transcriptional regulator encoded by Cphy2467. In another embodiment, the predicted hydrolase encoded by Cphy1877 may be combined with the predicted transcriptional regulator encoded by Cphy1876.
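The exemplary combinations above can be written down as a small lookup structure, which may help when checking which roles a given cassette covers. The following is a minimal sketch only: the dictionary layout, the role names and the helper functions are illustrative assumptions and are not part of the described embodiments; only the locus tags are taken from the text.

```python
# Exemplary cassettes as listed in the text for FIG. 1.
# The role names used as keys are hypothetical labels chosen for illustration.
CASSETTES = [
    {"hydrolase": ["Cphy2276"],
     "abc_transporter": ["Cphy2272", "Cphy2273", "Cphy2274"]},
    {"hydrolase": ["Cphy3207"],
     "abc_transporter": ["Cphy3210", "Cphy3209", "Cphy3208"],
     "transcriptional_regulator": ["Cphy3211"],
     "signal_transduction": ["Cphy3212"]},
    {"abc_transporter": ["Cphy0862", "Cphy0861", "Cphy0860"],
     "transcriptional_regulator": ["Cphy0864"],
     "signal_transduction": ["Cphy0863"]},
    {"abc_transporter": ["Cphy2466", "Cphy2465", "Cphy2464"],
     "transcriptional_regulator": ["Cphy2467"]},
    {"hydrolase": ["Cphy1877"],
     "transcriptional_regulator": ["Cphy1876"]},
]


def cassette_genes(cassette):
    """Return every locus tag in a cassette, ordered by locus number."""
    tags = [tag for members in cassette.values() for tag in members]
    return sorted(tags, key=lambda tag: int(tag[4:]))


def cassettes_with(role):
    """Return the cassettes that contain at least one gene with the given role."""
    return [cassette for cassette in CASSETTES if role in cassette]
```

For instance, `cassettes_with("hydrolase")` selects the three combinations that include a predicted hydrolase.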

In further exemplary embodiments, polynucleotide cassettes, expression vectors and organisms comprising more than one gene may comprise the gene cluster identified in c. Such gene clusters can be identified using the methods described herein and methods known in the art. In some embodiments, genes and gene clusters can be identified by the degree of identity between adjacent clusters of proteins (COG). Such genes and gene clusters may be included on the cassette or expressed together. Examples of gene clusters identified in c.

Table 9: Gene clusters identified in C. phytofermentans

Enzymes involved in xylose digestion

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids identified in c. Other embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids identified in c.

As used herein, genes involved in xylose digestion include, for example, hydrolase genes that hydrolyze polymers to xylose, ABC transporter genes that transport xylose into cells, transcriptional regulator genes that regulate the genes encoding these hydrolases and/or ABC transporters, and genes involved in the fermentation of pentose sugars (e.g., xylose) to ethanol. Genes identified as upregulated when C. phytofermentans is grown on xylose include Cphy3419, Cphy1219, Cphy1585, Cphy1586 and Cphy1587 (see FIG. 13).

Although many species of Clostridium can degrade hemicellulose, many cannot ferment the pentose sugars produced by such hydrolysis. Notably, C. phytofermentans is able to hydrolyze hemicellulose to pentose sugars and ferment the pentose sugars to ethanol. C. phytofermentans can transport pentoses into cells as oligosaccharides or as monosaccharides. The C. phytofermentans genome comprises genes encoding enzymes for xylose digestion, including enzymes in the non-oxidative pentose phosphate pathway involved in the conversion of pentoses to hexoses. Consistent with the ability to ferment pentose sugars, expression data from cells grown on xylose show that transcripts for key enzymes in the pentose phosphate pathway, namely transaldolase (EC 2.2.1.2, Cphy0013) and transketolase (EC 2.2.1.1, Cphy0014), are among the most abundant. Glyceraldehyde-3-phosphate dehydrogenase (EC 1.2.1.12, Cphy2879), which links the pentose phosphate pathway to glycolysis and initiates the energy-harvesting steps of glycolysis, is strongly induced by xylan, cellulose, cellobiose and glucose. Other genes that are up-regulated during growth on xylan include Cphy2105, Cphy2106, Cphy2108, Cphy1510, Cphy3158, Cphy3009, Cphy3010, Cphy3419, Cphy1219, Cphy2632, Cphy3206, Cphy3207, Cphy3208, Cphy3209, Cphy3210, Cphy3211, Cphy3212, Cphy1448, Cphy1449, Cphy1450, Cphy1451, Cphy1132, Cphy1133, Cphy1134, Cphy 8, Cphy 1521529, Cphy1530, Cphy1531, and Cphy1532.

The fermentation of hexoses and pentoses ends with the reduction of acetyl-CoA to ethanol, catalyzed by enzymes including NAD(P)-dependent acetaldehyde dehydrogenase (Ald) and NAD-dependent alcohol dehydrogenase (Adh). The C. phytofermentans genome comprises putative genes encoding at least 7 Alds (PutA domain) and at least 6 Adhs, e.g., the putative protein encoded at Cphy3925 (comprising both Ald and Adh domains). Four Alds and three Adhs are encoded in three gene clusters: Cphy1173-1183, Cphy1411-1430 and Cphy2634-2650.

Genes involved in propanol production, ethanolamine and/or propylene glycol metabolism

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Some embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids identified in c.

C. phytofermentans contains proteinaceous microcompartments ("PMCs") not found in other bacteria that are similar biotechnology targets, such as c. These microcompartments have been observed by electron microscopy. Specific enzymes involved in the conversion of carbohydrates to ethanol are confined to these microcompartments, suggesting compartmentalization of specific pathways and greater metabolic efficiency (Conrado, R.J., Mansell, T.J., Varner, J.D. & DeLisa, M.P

Three loci in C. phytofermentans encode proteins confined to proteinaceous microcompartments. These microcompartments are similar to those involved in carbon dioxide fixation and in the utilization of ethanolamine and propylene glycol found in other organisms. Each locus includes enzymes for converting five-carbon sugars and alcohol dehydrogenase to primary ethanol.

Of the 7 Alds and 6 Adhs identified in C. phytofermentans, 4 Alds and 3 Adhs are confined to proteinaceous microcompartments. The Adhs confined to the microcompartments show sequence identity to Fe-Adh or Zn-Adh and are encoded by three gene clusters: Cphy1173-1183, Cphy1411-1430 and Cphy2634-2650.
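Cluster ranges such as Cphy1173-1183 are shorthand for a run of locus tags. As a minimal sketch, and assuming that the tags within a range are numbered consecutively with four digits (an assumption made for illustration; an actual genome annotation may skip numbers), a range can be expanded into individual tags as follows. The function name is hypothetical.

```python
import re


def expand_locus_range(spec):
    """Expand a range such as 'Cphy1173-1183' into a list of locus tags.

    Assumes consecutive four-digit numbering within the range, which is an
    illustrative simplification rather than a property of the annotation.
    """
    match = re.fullmatch(r"Cphy(\d{4})-(\d{4})", spec)
    if match is None:
        raise ValueError(f"not a locus range: {spec!r}")
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        raise ValueError(f"range runs backwards: {spec!r}")
    return [f"Cphy{number:04d}" for number in range(first, last + 1)]
```

Under that assumption, the three ranges named above expand to 11, 20 and 17 locus tags respectively.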

Many enzymes confined to the proteinaceous microcompartments may be involved in the fucose to propanol pathway, as well as in the metabolism of ethanolamine and propylene glycol. For example, the Cphy2634-2650 cluster comprises orthologues of genes involved in ethanolamine metabolism in Salmonella typhimurium, and the Cphy1411-1430 cluster comprises orthologues of Salmonella typhimurium genes encoding products involved in the function of the propylene glycol utilization operon.

Furthermore, the Cphy1173-1187 cluster contains genes homologous to microcompartment genes found in Roseburia inulinivorans (Scott, K.P., Martin, J.C., Campbell, G., Mayer, C.D. & Flint, H.J. Whole-genome transcription profiling reveals genes up-regulated by growth on fucose in the human gut bacterium "Roseburia inulinivorans". J. Bacteriol. 188, 4340-4349 (2006)) and genes encoding putative enzymes involved in fucose and rhamnose utilization (see FIGS. 14 and 15). Other genes identified as upregulated or otherwise predicted to be involved in fucose utilization include Cphy0578, Cphy0579, Cphy0580, Cphy0581, Cphy0582, Cphy0583, Cphy0584, Cphy1146, Cphy1147, Cphy1148 and Cphy1149 (FIG. 15).

Hydrogen production

Hydrogen can be produced by the fermentation of various sugars. In some embodiments, the polynucleotide may comprise a nucleic acid encoding a ferredoxin hydrogenase identified in c. Examples of genes encoding ferredoxin hydrogenases identified in C. phytofermentans include Cphy0087, Cphy0090, Cphy0092, Cphy2056, Cphy3805 and Cphy3798.

Multi-modular polysaccharide lyases

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Some embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Examples of genes encoding enzyme/protein domains involved in pectin hydrolysis include the gene at the Cphy1612 locus. The Cphy1612 locus encodes predicted PL1 and PL9 domains. PL1 includes pectate lyase (EC 4.2.2.2), exo-pectate lyase (EC 4.2.2.9) and pectin lyase (EC 4.2.2.10) domains. PL9 comprises pectate lyase (EC 4.2.2.2) and exo-polygalacturonate lyase (EC 4.2.2.9) domains.

Multimodular xylanases and esterases

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Other embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids identified in c. Examples of genes encoding enzyme/protein domains with xylanase and esterase activity include the gene at the Cphy3862 locus. The Cphy3862 locus includes three predicted domains, namely two GH10 domains and one CE15 domain, with the following activities: GH10 has xylanase (EC 3.2.1.8) and endo-1,3-xylanase (EC 3.2.1.32) activities, and CE15 has glucuronoyl esterase (EC 3.1.1.-) and 4-O-methyl-glucuronoyl esterase (EC 3.1.1.-) activities.

Utilization of laminarin

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Some embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids identified in c. Laminarin is a storage glucan (polysaccharide of glucose) found in brown algae. Examples of genes identified to be up-regulated upon growth on laminarin include Cphy0857, Cphy0858, Cphy0859, Cphy0860, Cphy0861, Cphy0862, Cphy0863, Cphy0864, Cphy0865 and Cphy3388 (see fig. 16).

Utilization of cellobiose

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Other embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors, and microorganisms comprising nucleic acids identified in c. Cellobiose is a disaccharide derived from the condensation of two glucose molecules linked by a β (1 → 4) bond. Examples of genes identified as upregulated upon growth on cellobiose include Cphy0430, Cphy2464, Cphy2465, Cphy2466 and Cphy2467 (see fig. 17).

Cellulose utilization

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Some embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Examples of genes identified to be up-regulated upon growth on cellulose or otherwise predicted to be involved in cellulose utilization include Cphy3367, Cphy3368, Cphy1163, Cphy3202, Cphy3160, Cphy0430, Cphy3854, Cphy3855, Cphy3857, Cphy3858, Cphy3859, Cphy3860, Cphy3861, Cphy3862, Cphy2569, Cphy2570, Cphy2571, Cphy2464, Cphy2465, Cphy2466, Cphy2467, Cphy 8, Cphy 9, Cphy 1521530, Cphy1531, and Cphy 1532.

Pectin utilization

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Other embodiments relate to methods of producing fuel using polynucleotides, polynucleotide cassettes, expression vectors and microorganisms comprising nucleic acids identified in c. Examples of genes identified to be upregulated upon growth on pectin include Cphy3585, Cphy3586, Cphy3587, Cphy3588, Cphy3589, Cphy3590, Cphy2262, Cphy2263, Cphy2264, Cphy2265, Cphy2266, Cphy2267, Cphy2268, Cphy2269, Cphy2272, Cphy2273, Cphy2274, Cphy2275, Cphy2276, Cphy2464, Cphy2465, Cphy2466, Cphy2467, Cphy1714, Cphy1715, Cphy1716, Cphy1717, Cphy1718, Cphy1719, Cphy 171hy 1719, Cphy3153, Cphy 31hy 3154, Cphy3155, Cphy2011, Cphy0219, Cphy 118hy, Cphy 118hy 1180, Cphy 118hy, Cphy 118hy 1180, Cphy1179, Cphy 1172, Cphy1179, Cphy 1172, Cphy1179, Cphy 1172, Cphy 118hy.

Genes up-regulated upon growth on pectin and predicted to be involved in the breakdown and transport of arabinogalactan side chains of rhamnogalacturonan-I include Cphy3585, Cphy3586, Cphy3587, Cphy3588, Cphy3589 and Cphy3590. Genes up-regulated upon growth on pectin and predicted to be involved in the breakdown and transport of rhamnogalacturonan-I or rhamnogalacturonan-II side chains include Cphy2262, Cphy2263, Cphy2264, Cphy2265, Cphy2266, Cphy2267, Cphy2268, Cphy2269, Cphy2272, Cphy2273, Cphy2274, Cphy2275, Cphy2276, Cphy1714, Cphy1715, Cphy1716, Cphy1717, Cphy1718, Cphy1719 and Cphy1720. Genes that are up-regulated upon growth on pectin and predicted to be involved in sugar transport include Cphy2464, Cphy2465, Cphy2466 and Cphy2467. Genes predicted to be involved in the breakdown and transport of polygalacturonic acid include Cphy0288, Cphy0289, Cphy0290, Cphy0291, Cphy0292 and Cphy0293. Genes predicted to be involved in rhamnogalacturonan cleavage and transport include Cphy0339, Cphy0340, Cphy0341, Cphy0342 and Cphy0343. Genes predicted to be involved in rhamnose transport and breakdown include Cphy0578, Cphy0579, Cphy0580, Cphy0581, Cphy0582, Cphy0583, Cphy0584, Cphy1146, Cphy1147, Cphy1148 and Cphy1149. Genes up-regulated upon growth on pectin and/or predicted to be involved in fucose transport and breakdown include Cphy3153, Cphy3154, Cphy3155, Cphy2010, Cphy2011 and Cphy2012. Genes up-regulated upon growth on pectin and/or predicted to be involved in fucose and rhamnose metabolism include Cphy1174, Cphy1175, Cphy1176, Cphy1177, Cphy1178, Cphy1179, Cphy1180, Cphy1181, Cphy1182, Cphy1183, Cphy1184, Cphy1185, Cphy1186 and Cphy1187.

Genes upregulated during growth on pectin and/or predicted to be involved in polygalacturonic acid utilization include Cphy2919, Cphy0288, Cphy0289, Cphy0290, Cphy0291, Cphy0292, Cphy0293, Cphy3308, Cphy3309, Cphy3310, Cphy3311, Cphy3312, Cphy3313, Cphy3314, Cphy 1885, Cphy3316, Cphy3317, Cphy1118, Cphy1119, Cphy1120, Cphy1121, Cphy1879, Cphy1880, Cphy1881, Cphy1882, Cphy3, Cphy2736, Cphy2737, Cphy2738, Cphy39, Cphy2740, Cphy2741, Cphy2742, and Cphy 2743.

Identification of nucleic acid sequences in C. phytofermentans

Some embodiments described herein relate to methods of identifying genes in c. These methods can include identifying a nucleic acid sequence comprising a coding sequence, a non-coding sequence, a regulatory sequence, an intergenic sequence, an operon, or a gene cluster. In some embodiments, the methods of identifying genes in c.

In some embodiments, a gene in C. phytofermentans can be identified by its similarity to another sequence. Similarity can be detected between polynucleotide sequences or polypeptide sequences. In some embodiments, the other sequence may be a sequence present in another organism. Examples of other organisms include different species of Clostridium, such as C. beijerinckii or C. acetobutylicum, or organisms of different genera, such as Bacillus subtilis.

In some embodiments, similarity may be measured as percent identity. Percent sequence identity is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In further embodiments, sequence identity may be the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. Typically, sequence identity and sequence similarity can be readily calculated by known methods, including but not limited to those described in: Computational Molecular Biology (Lesk, A.M., ed.) Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects (Smith, D.W., ed.) Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I (Griffin, A.M. and Griffin, H.G., eds.) Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology (von Heinje, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, NY (1991). Methods of determining sequence identity can be designed to give the best match between the sequences tested. Some methods of determining sequence identity and sequence similarity are codified in publicly available computer programs. Sequence alignments and percent identity calculations can be performed using the Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, Wis.). Multiple alignment of sequences can be performed using the Clustal method of alignment with default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10) (Higgins and Sharp (1989) CABIOS 5: 151-). Default parameters for pairwise alignments using the Clustal method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5.
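To make the notion of percent identity concrete, the following minimal sketch computes it for two sequences that have already been aligned (for example by one of the programs named above). It is not the Megalign or Clustal calculation: the function name is hypothetical, and the choice of denominator (all aligned columns except those where both sequences have a gap) is an assumption, since different programs normalize differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored. A column with a gap
    in only one sequence counts as a mismatch (an illustrative convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

A value computed this way can be compared against an identity threshold, bearing in mind that the result depends on the alignment and on the normalization chosen.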

In other embodiments, genes in C. phytofermentans can be identified by predicting their presence in a nucleic acid sequence and/or a putative translated polypeptide sequence using algorithms known in the art. For example, computer algorithms implemented in programs such as GeneMark™ (Besemer, J. and M. Borodovsky. 2005. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33: W451-4) and Glimmer (Delcher, A.L., K.A. Bratke, E.C. Powers and S.L. Salzberg. 2007. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23: 673-9) may be used.
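As a rough illustration of what predicting a coding sequence involves, the sketch below scans all six reading frames for open reading frames that run from an ATG start codon to the first in-frame stop codon. This is a deliberately naive stand-in and is not the method used by GeneMark or Glimmer, which rely on statistical models of coding sequence; the function names and the `min_codons` cutoff are assumptions made for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end, orf) tuples for simple ATG..stop ORFs.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand
    (the reverse complement for strand '-'). min_codons counts codons before
    the stop codon, including the start codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Candidate ORFs found this way would still need to be checked, for instance by similarity searches or domain annotation, before being treated as genes.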

In some embodiments, a nucleotide or amino acid sequence can be analyzed using a computer algorithm or a software program. In related embodiments, the sequence analysis software may be commercially available or independently developed. Examples of sequence analysis software include the GCG suite of programs (Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, Wis.), BLASTP, BLASTN, BLASTX (Altschul et al, J. Mol. Biol. 215: 403-.

In other embodiments, genes in c. For example, the National Center for Biotechnology Information (NCBI) Conserved Domain Database (CDD) draws on several databases, including NCBI-curated conserved domains, SMART (smart.embl-heidelberg.de/SMART), PFAM (available from the internet, sanger.ac.uk/Software/PFAM), and COGs (a systematic classification of proteins encoded in complete genomes).

In some embodiments, genes can be identified and the metabolic pathways of the putative proteins encoded by the genes can be predicted. In such embodiments, a metabolic pathway database may be used. An example is the Kyoto Encyclopedia of Genes and Genomes (KEGG), whose KEGG Automatic Annotation Server (available from the internet, genome.jp/kegg/kaas/) can provide functional annotation of an identified gene using BLAST comparisons against the KEGG GENES database.

Isolation of nucleic acid sequences from C. phytofermentans

Nucleic acid sequences can be cloned from the c. For example, recombinant DNA and molecular cloning techniques that can be used are described by: Sambrook, J., Fritsch, E.F. and Maniatis, T., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1989); Silhavy, T.J., Berman, M.L. and Enquist, L.W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (1984); and Ausubel, F.M. et al, Current Protocols in Molecular Biology, published by Greene Publishing Assoc. and Wiley-Interscience (1987). Furthermore, methods for isolating homologous or orthologous genes using sequence-dependent protocols are known in the art. Examples of sequence-dependent protocols include, but are not limited to, nucleic acid hybridization methods and DNA and RNA amplification methods, as exemplified by various uses of nucleic acid amplification technologies such as polymerase chain reaction (PCR; Mullis et al, U.S. Pat. No. 4,683,202), ligase chain reaction (LCR; Tabor, S. et al, Proc. Natl. Acad. Sci. USA 82, 1074, (1985)) or strand displacement amplification (SDA; Walker et al, Proc. Natl. Acad. Sci. U.S.A., 89, 392, (1992)).

Typically, in PCR-type amplification techniques, the primers have different sequences and are not complementary to each other. Depending on the desired test conditions, the primer sequences should be designed to provide efficient and faithful replication of the target nucleic acid. Methods of PCR primer design are conventional and known in the art (Thein and Wallace, "The use of oligonucleotide as specific hybridization probes in the Diagnosis of Genetic Disorders", in Human Genetic Diseases: A Practical Approach, K.E. Davis Ed., (1986) pages 33-50, IRL Press, Herndon, Va.; Rychlik, W. (1993) in White, B.A. (ed.), Methods in Molecular Biology, Vol. 15, pages 31-39, PCR Protocols: Current Methods and Applications, Humana Press, Inc., Totowa, N.J.).
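Two of the checks implied above, that the primers are not complementary to each other and that they anneal at similar temperatures, can be sketched in a few lines. This is a simplified illustration and not a primer design method from the cited references: the melting temperature uses the Wallace rule of thumb (2 °C per A or T, 4 °C per G or C), which is only a rough estimate for short oligonucleotides, and the function names and the four-base 3' window are assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def three_prime_complementary(primer_a, primer_b, window=4):
    """True if the last `window` bases of the two primers can base-pair.

    Both primers are written 5' to 3'. Antiparallel pairing of the 3' ends
    means the 3' end of one equals the reverse complement of the other's.
    """
    a_end = primer_a.upper()[-window:]
    b_end = primer_b.upper()[-window:]
    return a_end == reverse_complement(b_end)
```

A primer pair whose 3' ends are complementary is prone to forming primer dimers, so a `True` result from `three_prime_complementary` would usually prompt a redesign.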

In general, two short segments of the identified sequence can be used as primers in a PCR protocol to amplify longer nucleic acid fragments encoding homologous genes from DNA or RNA. The PCR may also be performed on a library of cloned nucleic acid fragments, wherein the sequence of one primer is derived from the identified nucleic acid sequence and the sequence of the other primer takes advantage of the 3' poly(A) tract of the mRNA precursor encoding the microbial gene. Alternatively, the second primer sequence may be based on a sequence derived from the cloning vector. For example, the RACE protocol (Frohman et al, PNAS USA 85: 8998 (1988)) provides a method of generating cDNA using PCR to amplify copies of the region between a single point in the transcript and the 3' or 5' end. Primers oriented in the 3' and 5' directions can be designed from the identified sequences. Specific 3' or 5' cDNA fragments can be isolated using commercially available 3' RACE or 5' RACE systems (BRL) (Ohara et al, PNAS USA 86: 5673 (1989); Loh et al, Science 243: 217 (1989)).

In some embodiments, the identified nucleic acid sequence may be isolated by screening a c. Examples of probes may include DNA probes labeled by techniques such as random primer DNA labeling, nick translation, or end labeling, and RNA probes produced by methods such as in vitro transcription systems. In addition, specific oligonucleotides can be designed and used to amplify partial or full-length instant sequences. The resulting amplification product can be labeled directly in the amplification reaction or after the amplification reaction, and can be used as a probe to isolate a full-length DNA fragment under appropriately stringent conditions.

In some embodiments, the isolated nucleic acid is cloned into a vector. Typically, the vector is able to replicate in the host microorganism. Various vectors are known, for example, phages, plasmids, viruses or hybrids thereof. The vector may function as a cloning vector or an expression vector in the host cell of choice. Typically, the vector comprises the isolated nucleic acid, a selectable marker, and sequences that allow for autonomous replication or chromosomal integration. Further embodiments may comprise promoter sequences, enhancers or termination sequences that drive expression of the isolated nucleic acid. In other embodiments, the vector may comprise sequences that allow sequences to be excised from the chromosomal DNA after the vector sequence has been integrated. Examples include loxP sequences or FRT sequences, which are recognized by Cre recombinase and FLP recombinase, respectively.

Polynucleotides, polynucleotide cassettes, expression cassettes and expression vectors

Some embodiments described herein relate to polynucleotides, polynucleotide cassettes, expression cassettes, and expression vectors for producing fuel or other products in recombinant microorganisms.

The polynucleotide cassette may comprise at least one polynucleotide of interest. In some embodiments, a polynucleotide cassette may comprise more than one polynucleotide of interest. For example, a polynucleotide cassette may comprise two or more, three or more, or any number of the genes and/or polynucleotides of interest described herein.

In some embodiments, the polynucleotide of interest may comprise one or more of the nucleic acids identified in c. In some embodiments, the polynucleotide of interest may have at least 50%, 55%, 60%, 65%, 70%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99%, or 100% identity to one or more genes identified in c. In other embodiments, the polynucleotide of interest may encode one or more proteins comprising conservative substitutions relative to the wild-type protein. In further embodiments, the polynucleotide of interest may encode one or more proteins comprising a substitution that alters the efficiency with which the protein produces a fuel. For example, a protein that is an enzyme may be modified to catalyze its reaction more efficiently.

As used herein, an expression cassette can be a polynucleotide of interest operably linked to a regulatory sequence, such as a promoter. Promoters suitable for the present invention include any promoter that expresses a polynucleotide of interest. In some embodiments, the promoter may be a promoter sequence identified in c. In some embodiments, the promoter may be a promoter sequence identified in a host organism. In some embodiments, the promoter may be an inducible promoter, e.g., a light-inducible promoter or a temperature-sensitive promoter. In other embodiments, the promoter may be a constitutive promoter. In some embodiments, the promoter may be selected based on the desired expression level of the polynucleotide of interest in the host microorganism. In some embodiments, the promoter may be placed at about the same distance from the heterologous transcription start site as it is in nature from the transcription start site. However, as is known in the art, some variation in this distance may be allowed without loss of promoter function. In other embodiments, the expression cassette may further comprise regulatory sequences such as enhancers and/or termination sequences.

Promoter elements may be selected and used in vectors such as pIMCphy. For example, the transcriptional regulatory sequence is operably linked to a gene of interest (e.g., in an expression construct). The promoter may be any series of DNA sequences that specifically interact with cellular transcription factors to regulate transcription of downstream genes. The choice of a particular promoter depends on the cell type used to express the protein of interest. Generally, useful transcriptional regulatory sequences are sequences from the host microorganism. In various embodiments, constitutive or inducible promoters are selected for use in the host cell. Depending on the host cell, hundreds of constitutive and inducible promoters are known and can be designed to function in the host cell.

In some examples, promoters widely used in recombinant technology, such as Escherichia coli lac and trp operon, tac promoter, phage pL promoter, phage T7 and SP6 promoter, beta-actin promoter, insulin promoter, baculovirus polyhedrin, and p10 promoter, may be utilized.

In other examples, constitutive promoters may be used. Non-limiting examples of constitutive promoters include int promoter of bacteriophage lambda, bla promoter of beta-lactamase gene sequence of pBR322, hydA or thlA in Clostridium, Streptomyces coelicolor hrdB or whiE, CAT promoter of chloramphenicol acetyltransferase gene sequence of pPR325, constitutive promoter blaZ of Staphylococcus, and the like.

The promoter used in the present invention may likewise be an inducible promoter that regulates the expression of downstream genes in a controlled manner, for example under particular cell culture conditions. Examples of inducible prokaryotic promoters include the major right and left promoters of bacteriophage, the trp, recA, lacZ, AraC and gal promoters of E. coli, the alpha-amylase (Ulmanen et al, J. Bacteriol. 162: 176-182, 1985) and sigma-D-specific promoters (Gilman et al, Gene 32: 11-20 (1984)), the promoters of the bacteriophages of Bacillus (Gryczan, In: The Molecular Biology of the Bacilli, Academic Press, Inc., NY (1982)), Streptomyces promoters (Ward et al, Mol. Gen. Genet. 203: 468-478, 1986), and the like. Typical prokaryotic promoters are reviewed by Glick (J. Ind. Microbiol. 1: 277-282, 1987), Cenatiempo (Biochimie 68: 505-516, 1986) and Gottesman (Ann. Rev. Genet. 18: 415-442, 1984).

Promoters that are constitutively active in certain culture environments may not be active under other conditions. For example, the promoter of the hydA gene from Clostridium acetobutylicum is known to regulate expression in response to environmental pH. Temperature-regulated promoters are likewise known and can be used. Thus, in some embodiments, pH-regulated or temperature-regulated promoters may be used with the expression constructs of the invention, depending on the desired host cell. Other pH-regulatable promoters are known, such as P170, which functions in lactic acid bacteria, as disclosed in U.S. patent application No. 2002-0137140.

In general, in order to efficiently express a desired gene/nucleotide sequence, various promoters such as the original promoter of the gene, the promoter of an antibiotic resistance gene (e.g., kanamycin resistance gene of Tn5, ampicillin resistance gene of pBR 322), the promoter of lambda phage, and any promoter that can function in a host cell can be used. For expression, other regulatory elements, such as Shine-Dalgarno (SD) sequences comprising natural and synthetic sequences operable in a host cell and transcription terminators (inverted repeats comprising any natural and synthetic sequences) operable in a host cell into which the coding sequences are to be introduced to provide the recombinant cells of the invention, can be used in conjunction with the promoters described above.

Examples of promoters that may be used with the products and methods of the invention include those disclosed in the following patent documents: US 2004/0171824, US 6,410,317, WO 2005/024019. Several promoter-operator systems, such as lac (D.V. Goeddel et al, "Expression in Escherichia coli of Chemically Synthesized Genes for Human Insulin," Proc. Nat. Acad. Sci. U.S.A., 76: 106-110 (1979)), trp (J.D. Windass et al, "The Construction of a Synthetic Escherichia coli Trp Promoter and Its Use in the Expression of a Synthetic Interferon Gene", Nucl. Acids. Res. 10: 6639-57 (1982)) and the lambda PL operon (R. Crowl et al, "Versatile Expression Vectors for High-Level Synthesis of Cloned Gene Products in Escherichia coli", Gene, 38: 31-38 (1985)), have been used in E. coli for the regulation of gene expression in recombinant cells. The corresponding repressors are the lac repressor, trpR and cI, respectively.

Repressors are protein molecules that bind specifically to particular operators. For example, the lac repressor molecule binds to the operator of the lac promoter-operator system, whereas the cro repressor binds to the operator of the lambda P_R promoter. Other combinations of repressors and operators are known in the art. See, e.g., J.D. Watson et al, Molecular Biology of the Gene, p. 373 (4th edition, 1987). The structure formed by the repressor and the operator blocks the productive interaction of the associated promoter with RNA polymerase, thereby preventing transcription. Other molecules, termed inducers, bind to the repressor and thereby prevent the repressor from binding to its operator. Thus, the suppression of protein expression by a repressor molecule can be reversed by reducing the concentration of the repressor or by neutralizing the repressor with an inducer.
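The repressor, operator and inducer logic described above reduces to a small truth table. The following is a purely qualitative sketch under the simplifying assumption that each component is either present or absent; it ignores concentrations, binding affinities and leaky expression, and the function name is hypothetical.

```python
def transcription_on(repressor_present, inducer_present):
    """Qualitative on/off model of a repressor-controlled promoter.

    The repressor blocks transcription only when it is present and not
    neutralized by an inducer.
    """
    repressor_bound_to_operator = repressor_present and not inducer_present
    return not repressor_bound_to_operator
```

In this model transcription is off only when the repressor is present without its inducer, which matches the two routes to derepression named in the text: lowering the repressor concentration or adding an inducer.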

Similar promoter-operator systems and inducers are known in other microorganisms. In yeast, the GAL10 and GAL1 promoters are repressed by extracellular glucose and activated by the addition of galactose (the inducer). The protein GAL80 is a repressor of the system, while GAL4 is a transcriptional activator. Binding of GAL80 to galactose prevents GAL80 from binding GAL4. GAL4 may then bind to an Upstream Activation Sequence (UAS) to activate transcription. See Y. Oshima, "Regulatory Circuits for Gene Expression: The Metabolism of Galactose and Phosphate," in The Molecular Biology of the Yeast Saccharomyces, Metabolism and Gene Expression, J.N. Strathern et al., eds. (1982).

Transcription under the control of the PHO5 promoter is inhibited by extracellular inorganic phosphate and is induced to high levels when phosphate is depleted. R.A. Kramer and N. Andersen, "Isolation of Yeast Genes with mRNA Levels Controlled by Phosphate Concentration," Proc. Nat. Acad. Sci. U.S.A., 77: 6451-6545 (1980). Many regulatory genes for PHO5 expression have been identified, including some that are involved in phosphate regulation.

MATα2 is a temperature-regulated promoter system in yeast. A repressor protein, operator and promoter sites have been identified in this system. Sledziewski et al., "Construction of Temperature-Regulated Yeast Promoters Using the MATα2 Repression System," Bio/Technology, 6: 411-16 (1988).

Another example of a repressor system in yeast is the CUP1 promoter, which can be induced by Cu²⁺ ions. The CUP1 promoter is regulated by metallothionein. Gorman et al., "Regulation of the Yeast Metallothionein Gene," Gene, 48: 13-22 (1986).

Similarly, to obtain the desired expression of one or more cellulases, higher copy number plasmids can be used in the products or methods of the invention. The construct may also be prepared by chromosomal integration of the desired gene. Chromosomal integration of foreign genes can offer several advantages over plasmid-based constructs, the latter having some limitations in commercial processes. Ethanologenic genes have been integrated into the E. coli B chromosome; see Ohta et al. (1991) Appl. Environ. Microbiol. 57: 893-9. Typically, this is accomplished by purification of a DNA fragment comprising (1) the desired gene upstream of an antibiotic resistance gene and (2) a homologous DNA fragment from the microorganism of interest. The DNA can be ligated into a circular form without a replicon and used for transformation. Thus, the gene of interest may be introduced into a heterologous host such as E. coli, and short random fragments may be isolated and operably linked to the gene of interest (e.g., a gene encoding a cellulase) to facilitate homologous recombination.

Expression vector

An expression vector can include any of the expression cassettes described herein, and typically includes all of the elements required for expression of one or more polynucleotides of interest in a host cell. In some embodiments, the polynucleotide of interest is introduced into a vector to produce a recombinant expression vector suitable for transformation of a host cell to produce a fuel in a recombinant microorganism. In other embodiments, the expression cassette may be introduced into a vector to produce a recombinant expression vector suitable for transformation of a host cell. In some embodiments, expression vectors are provided comprising one or more expression cassettes.

Expression vectors may replicate independently, or they may replicate by being inserted into the host cell genome. In some embodiments, the expression cassette may be homologously integrated into the host cell genome. In other embodiments, the gene may be non-homologously integrated into the host cell genome. In some embodiments, the expression cassette can be integrated into a desired locus by double homologous recombination.

In some embodiments, it is contemplated that the vector may be used in more than one host cell. For example, vectors can be used for cloning in E. coli and be compatible with clostridia or other gram-positive bacteria. E. coli and gram-positive plasmid origins of replication are known. Other elements of the vector may include, for example, a selectable marker such as kanamycin resistance or ampicillin resistance, which allows for the detection and/or selection of those cells transformed with the desired polynucleotide sequence. Typical clostridial shuttle vectors are described in Mauchline et al. (1999), Clostridia, in: Manual of Industrial Microbiology and Biotechnology, A.L. Demain and J.E. Davies, eds. (ASM Press), page 475-; and Heap et al., J. Microbiol. Methods, 78: 79-85 (2009).

In some embodiments, the expression vector may include one or more genes whose presence and/or expression renders the host cell tolerant to economically relevant concentrations of ethanol. For example, genes such as omrA, lmrA and lmrCD may be included in the expression vector. OmrA from Oenococcus oeni and its homologue LmrA from Lactococcus lactis have been shown to increase the relative resistance of tolC(-) E. coli by 100- to 10,000-fold (Bourdineaud et al., A bacterial gene homologous to ABC transporters protect Oenococcus oeni from ethanol and other stress factors in wine. Int. J. Food Microbiol. 2004 Apr 1; 92(1): 1-14). Therefore, combining omrA, lmrA and other homologues may be beneficial in increasing the ethanol tolerance of the host cell.

In some embodiments, the vectors provided herein may include one or more genomic nucleic acid fragments suitable for targeted integration into the genome of the host organism. The genomic nucleic acid fragment for targeted integration may be about 10 nucleotides to about 20000 nucleotides in length. In some embodiments, a genomic nucleic acid fragment for targeted integration may be about 1000 to about 10000 nucleotides in length. In other embodiments, the genomic nucleic acid fragment used for targeted integration is about 1 kb to about 2 kb in length. In some embodiments, a "contiguous" piece of nuclear genomic nucleic acid may be separated into flanking segments when a gene of interest is cloned into a non-coding region of the contiguous DNA. This allows the integration of the intervening nucleic acid region into the bacterial chromosome by double crossover recombination. In other embodiments, the flanking segments may include nucleic acid sequence segments of the nuclear genome that are not adjacent to each other. In some embodiments, the first flanking genomic nucleic acid segment is located about 0 to about 10000 base pairs from the second flanking genomic nucleic acid segment in the nuclear genome.

In some embodiments, the genomic nucleic acid fragments can be introduced into a vector to produce a backbone expression vector for targeted integration of any of the expression cassettes disclosed herein into the nuclear genome of a host organism. Any of a variety of methods known in the art for introducing nucleic acid sequences can be used. For example, nucleic acid fragments can be amplified from isolated genomic nucleic acid using suitable primers and PCR. The amplification products can then be introduced into any of a variety of suitable cloning vectors by, for example, ligation. Some useful vectors include, for example, but are not limited to: pGEM13z, pGEM-T and pGEM-T Easy (Promega, Madison, Wis.); pSTBlue-1 (EMD Chemicals Inc., San Diego, Calif.); and pcDNA3.1, pCR4-TOPO, pCR-TOPO-II, pCRBlunt-II-TOPO (Invitrogen, Carlsbad, Calif.). In some embodiments, at least one nucleic acid fragment from the nuclear genome is introduced into a vector. In other embodiments, two or more nucleic acid fragments from the nuclear genome are introduced into the vector. In some embodiments, the two nucleic acid fragments may be adjacent to each other in the vector. In some embodiments, the two nucleic acid fragments introduced into the vector can be separated by, for example, about 1 to 30 base pairs. In some embodiments, the sequence separating the two nucleic acid fragments may comprise at least one restriction endonuclease recognition site.

In various embodiments, regulatory sequences may be included in the vectors of the invention. In some embodiments, the regulatory sequence comprises a nucleic acid sequence for regulating the expression of a gene introduced into the nuclear genome (e.g., a gene of interest). In various embodiments, the regulatory sequences can be introduced into a backbone expression vector. For example, various regulatory sequences can be identified from the host microorganism genome. The regulatory sequence may comprise, for example, a promoter, enhancer, intron, exon, 5 'UTR, 3' UTR of a nuclear gene, or any portion of any of the foregoing. The regulatory sequences may be introduced into the desired vector using standard molecular biology techniques. In some embodiments, the vector comprises a cloning vector or a vector comprising a nucleic acid fragment for targeted integration.

In some embodiments, the nucleic acid sequence for regulating expression of a gene introduced into the nuclear genome can be introduced into the vector by PCR amplification of the 5 'UTR, 3' UTR, promoter and/or enhancer, or a portion thereof, of one or more genes. Primers flanking the amplified sequence are used to amplify the regulatory sequence using appropriate PCR cycling conditions. In some embodiments, the primers may include recognition sequences for any of a variety of restriction enzymes, thus introducing those recognition sequences into the PCR amplification products. The PCR product can be digested with appropriate restriction enzymes and introduced into the corresponding site of the vector.
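By way of illustration only, the addition of restriction enzyme recognition sequences to amplification primers can be sketched as follows. The function names, the annealing length, the two-base 5' clamp and the example template are assumptions made for this sketch and are not part of the disclosure; the EcoRI and BamHI recognition sequences are the commonly known ones.

```python
# Illustrative sketch only: names, default lengths and the template are hypothetical.
RESTRICTION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(template, start, end, fwd_site, rev_site,
                   anneal_len=20, clamp="GC"):
    """Build (forward, reverse) primers that amplify template[start:end].

    Each primer is: 5' clamp + restriction site + annealing region.
    The clamp is an assumed short spacer that helps the enzyme cut near
    the end of the PCR product.
    """
    region = template[start:end].upper()
    if len(region) < anneal_len:
        raise ValueError("target region is shorter than the annealing length")
    fwd_anneal = region[:anneal_len]
    rev_anneal = reverse_complement(region[-anneal_len:])
    forward = clamp + RESTRICTION_SITES[fwd_site] + fwd_anneal
    reverse = clamp + RESTRICTION_SITES[rev_site] + rev_anneal
    return forward, reverse


if __name__ == "__main__":
    demo_template = "ATGC" * 20  # hypothetical 80-nt regulatory region
    fwd, rev = design_primers(demo_template, 10, 70, "EcoRI", "BamHI", anneal_len=18)
    print(fwd)
    print(rev)
```

After digestion of the amplification product with the two enzymes, the fragment carries ends compatible with a vector cut at the corresponding sites, which fixes the orientation of the insert.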

In other embodiments, the expressed gene or genes may be integrated into the microbial genome using commercially available systems or similar methods. The applicability of these methods to Clostridium has been demonstrated, including the integration and expression of foreign genes in Clostridium cells (see, e.g., Heap et al. (2007) J. Microbiol. Methods 70: 452-464; Chen et al. (2007) Plasmid 58: 182-189).

Microbial hosts

Some embodiments relate to a microorganism comprising any of the polynucleotides, polynucleotide cassettes, expression cassettes, or expression vectors described herein. Host cells may include, but are not limited to: eukaryotic cells such as animal cells, insect cells, fungal cells and yeast and prokaryotic cells such as bacteria. In some embodiments, the host is c. In some embodiments, possible host microorganisms may include recombinant organisms.

In some embodiments, the recombinant microorganism may be a cellulolytic microorganism or a glycolytic microorganism. In some embodiments, the microorganism may be Clostridium cellulovorans, Clostridium cellulolyticum, Clostridium thermocellulosum, Clostridium jolyticum, Clostridium stercorarium, Clostridium themitmiis, Clostridium thermocoprirarium, Clostridium cellulolyticum, Clostridium thermocarboxylaceae, Clostridium cellulolytics, Clostridium polysaccharomyces, Clostridium poriferetii, Clostridium cellulolyticum, Clostridium sporotrichellum, Clostridium chattacharum, Clostridium alderidium alginate, Clostridium autobacterium, Clostridium thermophilium, Clostridium Thermobacterium biovorans, Clostridium cellulolyticum, Escherichia coli, Clostridium thermobacter acidolyticum, Clostridium thermococcus, Clostridium thermobacter, Clostridium thermobacter strain, Clostridium thermobacter, Clostridium thermococcus, Clostridium thermobacter, Clostridium thermobacter strain, Clostridium thermobacter thermococcus, Clostridium thermobacter strain, Clostridium thermobacter strain, Clostridium thermobacter thermofusus, Clostridium thermofuscus, Clostridium thermofusus, Clostridium thermofuscus, Clostridium thermofusobacterium, or Corynebacterium thermofusobacterium.

In some embodiments, the host microorganism may be selected from, for example, the broader category of gram-negative bacteria, such as Xanthomonas species, and gram-positive bacteria, including members of the genus Bacillus such as B. pumilus, B. subtilis, and B. coagulans; members of the genus Clostridium such as C. acetobutylicum, C. aerotolerans, C. thermocellum, C. thermohydrosulfuricum and C. thermosaccharolyticum; Cellulomonas species such as Cellulomonas uda; and Butyrivibrio fibrisolvens. In addition to E. coli, other enteric bacteria may be used, for example, members of the genera Erwinia, such as E. chrysanthemi, and Klebsiella, such as K. planticola and K. oxytoca. In some embodiments, the host microorganism may be Zymomonas mobilis. Similarly acceptable host organisms are yeasts, typified by species of Cryptococcus such as Cr. albidus, Monilia, Pichia stipitis and Pullularia pullulans, and Saccharomyces cerevisiae; and other oligosaccharide-metabolizing bacteria, including, but not limited to, Bacteroides succinogenes, Thermoanaerobacter species such as T. ethanolicus, Thermoanaerobium species such as T. brockii, Thermobacteroides species such as T. acetoethylicus, Ruminococcus species (e.g., R. flavefaciens), Thermomonospora species (e.g., T. fusca), and Acetivibrio species (e.g., A. cellulolyticus). In some embodiments, the host organism may be selected from, for example, algae such as Amphora, Anabaena, Ankistrodesmus, Botryococcus, Chaetoceros, Chlorella, Chlorococcum, Cyclotella, Cylindrotheca, Dunaliella, Euglena, Haematococcus, Isochrysis, Monoraphidium, Nannochloris, Nannochloropsis, Navicula, Nephrochloris, Nephroselmis, Nitzschia, Nodularia, Nostoc, Ochromonas, Oocystis, Oscillatoria, Pavlova, Phaeodactylum, Platymonas, Pleurochrysis, Porurura, Pseudomonas, Pyranococcus, Stichococcus, Thiocorus, and Thiachromobacter.
Literature concerning microorganisms that meet the subject criteria can be found, for example, in Biely, Trends in Biotech. 3: 286-90 (1985), in Robson et al., Enzyme Microb. Technol. 11: 626-44 (1989), and in Beguin, Ann. Rev. Microbiol. 44: 219-48 (1990), the entirety of each of which is incorporated herein by reference. Suitable transformation methodologies are available for each of these different types of hosts and are described in detail below. See also, e.g., Brat et al., Appl. Env. Microbiol. 2009; 75: 2304-2311, which discloses the expression of xylose isomerase in Saccharomyces cerevisiae.

In some embodiments, the host microorganism may be selected by, for example, its ability to produce the proteins necessary for transporting oligosaccharides into the cell and its intracellular levels of enzymes that metabolize those oligosaccharides. Examples of such microorganisms include enteric bacteria such as E. chrysanthemi and other Erwinia, and Klebsiella species that naturally produce β-xylosidase, such as K. oxytoca and K. planticola. Certain E. coli are also promising hosts because they transport and metabolize cellobiose, maltose and/or maltotriose. See, e.g., Hall et al., J. Bacteriol. 169: 2713-17 (1987).

In some embodiments, the host cell may be selected by screening to determine whether the tested microorganism transports and metabolizes the oligosaccharide. Such screening can be accomplished in a variety of ways. For example, microorganisms can be screened to determine which grow on a suitable oligosaccharide medium, the screening being designed to select for those microorganisms that do not merely transport monomers into the cell. See, e.g., Hall et al. (1987), supra. Alternatively, the microorganism may be assayed for a suitable intracellular enzyme activity, such as β-xylosidase activity. Potential host microorganisms can be further screened for ethanol tolerance, salt tolerance, and temperature tolerance. See Alterthum et al., Appl. Environ. Microbiol. 55: 1943-48 (1989); Beall et al., Biotechnol. & Bioeng. 38: 296-303 (1991).

In some embodiments, the host microorganism may exhibit one or more of the following characteristics: the ability to grow at an ethanol concentration of about 1%; the ability to tolerate salt levels of, for example, 0.3 molar, acetate levels of, for example, 0.2 molar, and temperatures of, for example, 40 ℃; and the ability to produce high levels of enzymes that are beneficial to the depolymerization of cellulose, hemicellulose, and pectin with minimal protease activity. In some embodiments, the host microorganism may also comprise a native xylanase or cellulase. In some embodiments, after introduction of an expression vector for fuel production, the host can produce ethanol from the various sugars tested with greater than, e.g., 90% of theoretical yield while retaining one or more of the useful characteristics described above.
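For illustration, the percentage of theoretical yield referred to above can be computed as in the following sketch. It assumes that the consumed sugar is expressed as glucose equivalents and that the stoichiometric maximum is two moles of ethanol per mole of glucose (about 0.511 g ethanol per g glucose); the function name and the example figures are hypothetical.

```python
# Illustrative sketch only: sugar is assumed to be expressed as glucose equivalents.
GLUCOSE_MW = 180.16  # g/mol
ETHANOL_MW = 46.07   # g/mol

# One mole of glucose can give at most two moles of ethanol (plus two CO2).
THEORETICAL_YIELD = 2 * ETHANOL_MW / GLUCOSE_MW  # about 0.511 g ethanol per g glucose


def percent_of_theoretical(ethanol_g: float, sugar_consumed_g: float) -> float:
    """Return the observed ethanol yield as a percentage of the theoretical maximum."""
    if sugar_consumed_g <= 0:
        raise ValueError("sugar consumed must be positive")
    observed_yield = ethanol_g / sugar_consumed_g
    return 100.0 * observed_yield / THEORETICAL_YIELD


if __name__ == "__main__":
    # Hypothetical fermentation: 46 g ethanol from 100 g glucose consumed.
    print(round(percent_of_theoretical(46.0, 100.0), 1))
```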

Host cell transformation

Some embodiments relate to methods of introducing any of the polynucleotides, polynucleotide cassettes, expression cassettes, and expression vectors described herein into a host microbial cell. Such embodiments thus produce recombinant microorganisms capable of producing fuel when cultured under a variety of fermentation conditions. Methods of transforming cells are known in the art and may include, for example, electroporation, lipofection, transfection, conjugation, chemical transformation, injection, particle gun bombardment, and magnetophoresis. Magnetophoresis uses micro-sized linear magnets, constructed using nanotechnology, to introduce nucleic acids into cells (Kuehnle et al., U.S. Pat. No. 6,706,394; 2004; Kuehnle et al., U.S. Pat. No. 5,516,670; 1996). In some embodiments, electrotransformation of methylated plasmids into C. phytofermentans can be accomplished according to the protocol developed by Mermelstein et al. (Bio/Technology 10: 190-195 (1992)). Further methods may include a combination of transformation methods. In other embodiments, positive transformants can be isolated on agar-solidified CGM supplemented with an appropriate antibiotic.

In various embodiments, the transformation methods can be combined with one or more methods of visualizing or quantifying the introduction of nucleic acids into one or more microorganisms. Furthermore, it is taught that this can be combined with the identification of any method that shows statistical differences from the unaltered phenotype, such as growth, fluorescence, carbon metabolism, isoprene flux or fatty acid content. The transformation method can likewise be combined with the visualization or quantification of the product resulting from the expression of the introduced nucleic acid.

Typically, vectors containing plasmid DNA may be methylated prior to transformation into C. phytofermentans to prevent digestion by clostridial endonucleases (Mermelstein and Papoutsakis, Appl. Environ. Microbiol. 59: 1077-1081 (1993)). In some embodiments, methylation may be achieved by the phi3TI methyltransferase. In a further embodiment, the plasmid DNA may be transformed into E. coli DH10β (Zhao et al., Appl. Environ. Microbiol. 69: 2831-41 (2003)) containing the vector pDHKM, which carries an active copy of the phi3TI methyltransferase gene.

Typically, C. phytofermentans strains can be grown anaerobically at 37 ℃ in Clostridial Growth Medium (CGM) supplemented with a suitable antibiotic such as 40 μg/ml erythromycin/chloramphenicol or 25 μg/ml thiamphenicol (Hartmanis and Gatenbeck, Appl. Environ. Microbiol. 47: 1277-83 (1984)). Furthermore, C. phytofermentans strains can be grown at 37 ℃ in a FORMA SCIENTIFIC™ anaerobic chamber (THERMO FORMA™, Marietta, Ohio) in 100 ml of CGM supplemented with appropriate antibiotics.

In other embodiments, C. phytofermentans may be cultured according to the technique of Hungate (Hungate, R.E. (1969). A roll tube method for cultivation of strict anaerobes. Methods Microbiol 3B, 117-). Medium GS-2C can be used for enrichment, isolation and routine culture of C. phytofermentans strains and is derived from the GS-2 medium of Johnson et al. (Johnson, E.A., Madia, A. & Demain, A.L. (1981). Chemically defined minimal medium for growth of the anaerobic cellulolytic thermophile Clostridium thermocellum. Appl Environ Microbiol 41, 1060-). GS-2C may contain the following: 6.0 g/l ball-milled cellulose (Leschine, S.B. & Canale-Parola, E. (1983). Mesophilic cellulolytic clostridia from freshwater environments. Appl Environ Microbiol 46, 728-); 6.0 g/l yeast extract; 2.1 g/l urea; 2.9 g/l K₂HPO₄; 1.5 g/l KH₂PO₄; 10.0 g/l MOPS; 3.0 g/l sodium citrate dihydrate; 2.0 g/l cysteine hydrochloride; 0.001 g/l resazurin; with the pH adjusted to 7.0. Broth cultures can be incubated at 30 ℃ under an O₂-free N₂ atmosphere. Cultures on plates of agar medium can be incubated at room temperature under an N₂/CO₂/H₂ (83:10:7) atmosphere in an anaerobic chamber (Coy Laboratory Products).
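As a purely illustrative aid, the GS-2C composition given above can be scaled to an arbitrary batch volume as sketched below. The concentrations are those recited in the text; the dictionary and function names are hypothetical, and pH adjustment and anaerobic handling are outside the scope of the sketch.

```python
# Illustrative sketch only: concentrations are taken from the text, names are hypothetical.
GS2C_G_PER_L = {
    "ball-milled cellulose": 6.0,
    "yeast extract": 6.0,
    "urea": 2.1,
    "K2HPO4": 2.9,
    "KH2PO4": 1.5,
    "MOPS": 10.0,
    "sodium citrate dihydrate": 3.0,
    "cysteine hydrochloride": 2.0,
    "resazurin": 0.001,
}


def batch_amounts(volume_l: float) -> dict:
    """Return grams of each GS-2C component needed for the given volume in liters."""
    if volume_l <= 0:
        raise ValueError("volume must be positive")
    return {name: round(conc * volume_l, 6) for name, conc in GS2C_G_PER_L.items()}


if __name__ == "__main__":
    # Example: amounts for a 0.5 l batch; pH is adjusted to 7.0 afterwards.
    for component, grams in batch_amounts(0.5).items():
        print(f"{component}: {grams} g")
```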

Growth, expression and fuel production

Some embodiments relate to the production of fuel using any of the recombinant microorganisms described herein. In some embodiments, one or more different recombinant microorganisms can be used in combination to produce a fuel. Such combinations may include more than one different type of recombinant microorganism in a single fermentation reaction. Other combinations may include one or more different types of recombinant microorganisms used in sequential steps of the process to produce fuel from biomass. In some embodiments, a single recombinant microorganism can be used to produce a fuel from biomass. In some embodiments, recombinant microorganisms can be used to catalyze the production of products such as sugars and polysaccharides from lignocellulose and other substrates.

In some embodiments, the recombinant microorganism can be cultured under conditions suitable for expression of the genes contained in the expression cassette and for fuel production. In some embodiments, the incubation conditions may vary depending on the host microorganism used. In some embodiments, the incubation conditions may vary depending on the type of regulatory element associated with the expression cassette. For example, a recombinant microorganism comprising an expression cassette containing an inducible promoter linked to a nucleic acid may require the addition of a specific agent to the culture medium for nucleic acid expression.

In other embodiments, the recombinant microorganism can be a strain of C. phytofermentans for efficiently fermenting a wide spectrum of materials into fuels, as described in co-pending U.S. Patent Application No. 2007/0178569 and U.S. Provisional Patent Application No. 61/032,048, filed February 28, 2008, both of which are expressly incorporated herein in their entirety. In some embodiments, the C. phytofermentans strain may be American Type Culture Collection 700394^T.

In some embodiments, a process for fermenting a substrate (e.g., lignocellulosic feedstock) can comprise: (1) providing a pre-treated biomass source material comprising a plant polysaccharide (wherein the pretreatment may be shearing, chopping, grinding, or the like); (2) inoculating the pretreated biomass-source material with a first culture comprising an anaerobic microorganism that hydrolyzes cellulose (as disclosed herein) in the presence of oxygen to produce an aerobic broth, wherein the anaerobic microorganism is capable of at least partially hydrolyzing the plant polysaccharide; and (3) fermenting the inoculated anaerobic broth until a portion of the plant polysaccharide has been converted to ethanol. In other embodiments, the method of fermenting a substrate comprises: (1) providing a pre-treated plant polysaccharide-containing biomass source material (wherein the pretreatment can be shearing, chopping, grinding, or the like); (2) inoculating the pretreated biomass source material with a first culture comprising an aerobic microorganism that hydrolyzes cellulose (such as the microorganisms disclosed herein) in the presence of oxygen to produce an aerobic broth, wherein the aerobic microorganism is capable of at least partially hydrolyzing the plant polysaccharide; (3) incubating the aerobic broth until the cellulose-hydrolyzing aerobic microorganism consumes at least a portion of the oxygen and hydrolyzes at least a portion of the plant polysaccharide, thereby converting the aerobic broth to an anaerobic broth comprising a hydrolysate comprising fermentable sugars; (4) inoculating the anaerobic broth with a second culture comprising an anaerobic microorganism (such as a microorganism disclosed herein) capable of converting fermentable sugars to ethanol; and (5) fermenting the inoculated anaerobic broth until a portion of the fermentable sugars have been converted to ethanol.

Fermentation efficiency can be measured in a variety of ways; for example, changes in efficiency can be measured by comparison to the wild-type organism. Likewise, the change in efficiency can be measured as the ratio of the production of fuel from a substrate (e.g., cellulose) per unit time between the recombinant organism and the wild-type organism. In some embodiments, the change in efficiency between a recombinant organism and a wild-type organism may be more than 1%, more than 5%, more than 10%, more than 15%, more than 20%, more than 25%, more than 30%, more than 35%, more than 40%, more than 45%, more than 50%, more than 55%, more than 60%, more than 65%, more than 70%, more than 75%, more than 80%, more than 85%, more than 90%, more than 95%, more than 100%, or more than 200%.
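A minimal sketch of the ratio-based comparison described above is given below for illustration. It assumes that fuel production is expressed as grams of product per hour and that the change in efficiency is reported as the percentage by which the recombinant rate exceeds the wild-type rate; the function names and example figures are hypothetical.

```python
# Illustrative sketch only: units (g product per hour) and names are assumptions.
def production_rate(product_g: float, hours: float) -> float:
    """Return fuel produced per unit time (g/h)."""
    if hours <= 0:
        raise ValueError("duration must be positive")
    return product_g / hours


def efficiency_change_percent(recombinant_rate: float, wild_type_rate: float) -> float:
    """Return the percent change of the recombinant rate relative to wild type."""
    if wild_type_rate <= 0:
        raise ValueError("wild-type rate must be positive")
    ratio = recombinant_rate / wild_type_rate
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical 24 h fermentations on the same amount of cellulose.
    recombinant = production_rate(12.0, 24.0)
    wild_type = production_rate(10.0, 24.0)
    print(round(efficiency_change_percent(recombinant, wild_type), 1))
```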

Various media for growing various microorganisms are known in the art. The growth medium may be minimal and/or defined, or complete and/or complex. The fermentable carbon source may include pretreated or non-pretreated feedstock comprising cellulosic, hemicellulosic, and/or lignocellulosic materials, such as sawdust, wood chips, wood pulp, paper pulp, pulp waste streams, grasses such as switchgrass, biomass plants, and agricultural crops, such as crambe, seaweed, rice hulls, bagasse, jute, leaves, grass clippings, corn stover, corn cobs, corn kernels, corn meal, distillers grain, and pectin.

Other nutrients may be present in the fermentation reaction including nitrogen containing compounds including amino acids, proteins, hydrolyzed proteins, ammonia, urea, nitrates, nitrites, soy derivatives, casein derivatives, milk powder, milk derivatives, whey, yeast extracts, hydrolyzed yeast, autolysed yeast, corn steep liquor, monosodium glutamate and/or other sources of fermentation nitrogen, vitamins and/or mineral supplements. In some embodiments, one or more additional lower molecular weight carbon sources may be added or present, such as glucose, sucrose, maltose, corn syrup, lactic acid, and the like. In some embodiments, one possible form of growth medium may be modified Luria-Bertani (LB) broth (with 10g Difco tryptone, 5g Difco yeast extract, and 5g sodium chloride per liter) as described in Miller J.H. (1992).

Increased fuel production may be observed after a host cell competent to produce fuel is transformed with an expression vector described herein and the resulting recombinant microorganism is grown under suitable conditions. Improved fuel production can be observed by standard methods known to those skilled in the art.

In some embodiments, growth and production of the recombinant microorganisms disclosed herein can be accomplished in conventional batch, fed-batch, or continuous fermentation. In some embodiments, it is desirable for some hosts to complete fermentation under oxygen-poor or anaerobic conditions. In other embodiments, fuel production may be achieved under conditions where oxygen levels are sufficient to allow aerobic biological growth; and, optionally, under conditions where a gas lift fermenter or equivalent is used. In some embodiments, the recombinant microorganism is grown using batch culture. In some embodiments, the recombinant microorganism is grown fermentatively using a bioreactor. In some embodiments, the growth medium in which the recombinant microorganism is grown is altered, thereby resulting in an increased level of fuel production. The number of medium changes may vary.

The pH of the fermentation may be high enough for host growth and fuel production. The pH of the fermentation broth may be adjusted using a neutralizing agent such as calcium carbonate or calcium hydroxide. The choice and combination of any of the above fermentation processes is highly dependent on the host strain and the processes used downstream.

In some embodiments, the organic solvent may be purified from c. In some embodiments, the organic solvent is purified by distillation. In an exemplary embodiment, about 96% ethanol may be distilled from the fermented mixture. In further embodiments, fuel grade ethanol, i.e., about 99-100% ethanol, may be obtained from about 96% ethanol by azeotropic distillation. Azeotropic distillation can be achieved by adding benzene to about 96% ethanol and then redistilling the mixture. Alternatively, approximately 96% ethanol may be passed through a molecular sieve to remove moisture.

In some embodiments, a method of producing a fuel may comprise culturing any of the microorganisms described herein and supplementing the culture medium with a protein expressed by a polynucleotide, polynucleotide cassette, expression vector comprising any of the nucleic acids encoding a predictor gene identified in c. In particular embodiments, the nucleic acid may encode a hydrolase. In some embodiments, the culture medium may be supplemented with the isolated protein.

The following examples are for illustrative purposes and are not intended to be limiting.

Examples

Example 1 identification of DNA sequences in C

Insert library construction, isolation and sequencing. Genomic DNA was sequenced using a conventional whole-genome shotgun strategy. Briefly, random 2-3 kb DNA fragments were isolated after mechanical shearing. These gel-extracted fragments were concentrated, end-repaired and cloned into pUC18. Double-ended plasmid sequencing reactions were performed using PE BigDye™ Terminator chemistry (Perkin Elmer), and sequencing ladders were resolved on a PE 3700 automated DNA sequencer. One round (x reads) of sequencing of the small-insert library was completed, yielding x-fold redundancy.
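For illustration only, the relationship between the number of reads and the fold redundancy of a shotgun library can be sketched as follows. The read count, mean read length and genome size used in the example are hypothetical placeholders and are not the values of the sequencing run described above, which are not stated in the text.

```python
# Illustrative sketch only: all numeric inputs below are hypothetical.
import math


def fold_redundancy(num_reads: int, mean_read_length_bp: float, genome_size_bp: int) -> float:
    """Return sequence redundancy (coverage) = total bases read / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return num_reads * mean_read_length_bp / genome_size_bp


def reads_needed(target_redundancy: float, mean_read_length_bp: float, genome_size_bp: int) -> int:
    """Return the number of reads required to reach the target redundancy."""
    if mean_read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_redundancy * genome_size_bp / mean_read_length_bp)


if __name__ == "__main__":
    # Hypothetical example: 80,000 reads of 600 bp on a 4.8 Mb genome.
    print(fold_redundancy(80_000, 600, 4_800_000))
    print(reads_needed(10, 600, 4_800_000))
```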

Sequence assembly and gap closure. Sequence traces were processed with Phred (refs. 43, 44) for base calling and assessment of data quality prior to assembly with Phrap (P. Green, University of Washington, Seattle, Washington, USA) and visualization with Consed (ref. 45).
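By way of illustration, the quality values assigned during base calling follow the standard Phred definition Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The sketch below converts between the two; the function names and the example quality values are hypothetical.

```python
# Illustrative sketch only: function names and example values are hypothetical.
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the base-call error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a base-call error probability to a Phred quality score."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -10 * math.log10(p)


def mean_error_rate(qualities) -> float:
    """Return the average per-base error probability of a read."""
    qualities = list(qualities)
    if not qualities:
        raise ValueError("no quality values given")
    return sum(phred_to_error_prob(q) for q in qualities) / len(qualities)


if __name__ == "__main__":
    # Q20 corresponds to one expected error in 100 bases, Q30 to one in 1000.
    print(phred_to_error_prob(20), phred_to_error_prob(30))
```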

Sequence analysis and annotation. Gene modeling was performed using the Critica (ref. 47), Glimmer (ref. 48) and Generation (compbio.ornl.gov/generation/index.shtml) modeling packages, the results were combined, and a protein Basic Local Alignment Search Tool (BLASTP) search of the translations against GenBank's non-redundant database (NR) was performed. The alignment of the N-terminus of each gene model relative to the best NR match was used to select a gene model. If no BLAST match was returned, the Critica model was retained. Gene models that overlapped each other by more than 10% of their length were flagged, and preference was given to genes with a BLAST match. In addition to BLASTP versus NR, the revised gene/protein set was searched against the KEGG GENES, InterPro (incorporating Pfam, TIGRFams, SmartHMM, PROSITE, PRINTS and ProDom), and Clusters of Orthologous Groups of proteins (COGs) databases. From these results, categorizations were developed using the KEGG and COG hierarchies. The initial criteria for automated functional assignment required a minimum of 50% residue identity over 80% of the match length for BLASTP alignments, plus concurring evidence from pattern or profile methods. Putative assignments were made for identities down to 30% over 80% of the length.
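The assignment thresholds described above can be illustrated by the following sketch. The function name, argument names and category labels are hypothetical; in particular, treating a hit with at least 50% identity but no concurring pattern or profile evidence as a putative assignment is an assumption of the sketch, not a statement of the procedure actually used.

```python
# Illustrative sketch only: names, labels and the handling of unsupported
# high-identity hits are assumptions.
def classify_assignment(percent_identity: float, aligned_length: int,
                        match_length: int, has_profile_support: bool) -> str:
    """Classify a BLASTP hit as 'automated', 'putative' or 'unassigned'."""
    if match_length <= 0:
        raise ValueError("match length must be positive")
    coverage = aligned_length / match_length
    if coverage < 0.80:
        return "unassigned"
    if percent_identity >= 50 and has_profile_support:
        return "automated"
    if percent_identity >= 30:
        return "putative"
    return "unassigned"


if __name__ == "__main__":
    # Hypothetical hits: (identity %, aligned length, match length, profile support)
    hits = [(62.0, 290, 300, True), (41.5, 250, 300, False), (55.0, 150, 300, True)]
    for hit in hits:
        print(hit, classify_assignment(*hit))
```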

Using BLASTP, each C. phytofermentans gene was searched against all genes from sequenced genomes, and the top alignment for each predicted protein was extracted. Theoretical subcellular localization and signal peptide cleavage sites were analyzed using PSORT. CAZy domains were annotated using CAZy (Carbohydrate-Active enZymes, www.cazy.org). Transporters were annotated using TransportDB (www.membranetransport.org). The complete sequence of C. phytofermentans (accession number NC_010001) became available in August 2007.

Example 2 Expression analysis of DNA sequences in C. phytofermentans

Microarray design. The custom C. phytofermentans Affymetrix microarray design (FIG. 3) allows measurement of the expression levels of all identified open reading frames (ORFs), evaluation of the 5' and 3' untranslated regions of mRNAs, operon determination, tRNA discovery, and discrimination between alternative gene models (which differ mainly in the choice of start codon).

Putative protein-coding sequences were identified using the GeneMark^TM (Besemer, J. and M. Borodovsky. 2005. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res 33:W451-4) and Glimmer (Delcher, A.L., K.A. Bratke, E.C. Powers and S.L. Salzberg. 2007. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23:673-9) predictors. The combination of these two predictions was used as the expression set. If the N-terminal regions of the two predicted proteins differed, the smaller of the two proteins was used for transcription analysis, but the extended region was represented by probes to help identify the true N-terminus. This array design included all proteins mentioned in the GenBank record as well as additional ORFs not found in the GenBank record. Standard Affymetrix array design protocols were followed to ensure that each probe was unique, to minimize cross-hybridization. The array design was implemented on an Affymetrix GeneChip^TM array, format 49-5241, with 11 μm features.

Cell culture growth and RNA isolation. 30 ℃ and 100% N₂C.phytofermentans were next cultured in GS2 medium supplemented with 0.3% (wt/vol) of one of 14 specific carbon sources (glucose, xylan, cellobiose, cellulose, D-arabinose, L-arabinose, fucose, galactose, laminarin, mannose, pectin, rhamnose, xylose or yeast extract), in test tubes or 500ml Erlenmeyer flasks. Growth was measured spectrophotometrically by monitoring the change in optical density at 660 nm.

RNA was purified from mid-exponential-phase cultures (OD₆₆₀ = 0.5). Samples of 1 ml were snap frozen by immersion in liquid nitrogen. Cells were harvested by centrifugation at 8000 rpm for 5 minutes at 4 °C, and total RNA was isolated using the Qiagen RNeasy^TM Mini Kit with RNase-free DNase I treatment. RNA concentration was measured from the absorbance at 260/280 nm using a Nanodrop^TM spectrophotometer.

Microarray processing. cDNA synthesis, array hybridization and imaging were performed at the genomics core facility of the University of Massachusetts Medical Center. Using the Affymetrix GeneChip^TM DNA labeling kit, 10 μg of total RNA from each sample was used as the template for synthesis of labeled cDNA. The labeled cDNA samples were hybridized to the Affymetrix GeneChip^TM arrays according to Affymetrix guidelines. The hybridized arrays were scanned with a GeneChip^TM Scanner 3000. The resulting raw spot image data files were processed into pivot, report and normalized probe intensity files using Microarray Suite version 5.0 (MAS 5.0). Expression values were calculated using a custom software package implementing the GCRMA method.

Data were assessed using tools provided by BioConductor (Gentleman, R.C., V.J. Carey, D.M. Bates, B. Bolstad, M. Dettling, S. Dudoit, B. Ellis, L. Gautier, Y. Ge, J. Gentry, K. Hornik, T. Hothorn, W. Huber, S. Iacus, R. Irizarry, F. Leisch, C. Li, M. Maechler, A.J. Rossini, G. Sawitzki, C. Smith, G. Smyth, L. Tierney, J.Y. Yang and J. Zhang. 2004. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80). No image artifacts due to microarray fabrication or processing were observed. Microarray background values of 34 (glucose), 32 (cellobiose), 30 (xylose) and 34 (cellulose) were within the typical range of 20-100 for average background values on Affymetrix arrays. Quality control checks indicated that the methods were suitable for use with C. phytofermentans.

Assessment of the extent of mRNA transcripts. To identify putative promoter sequences, the extent of each mRNA transcript was estimated to within ±24 bases. The expression levels of the intergenic region and of the intra-ORF regions of the adjacent ORFs were compared using specific probes (FIG. 4). A reading above 1000 a.u. indicates that a probe represents part of an expressed region. Conversely, a reading below 250 a.u. indicates that there is no specific hybridization between the probe and an expressed region. Probes representing putative expressed regions have a reading greater than:

mean(Gene 1) - stdev(Gene 1) or mean(Gene 2) - stdev(Gene 2)

To avoid errors from a single probe, readings from at least two consecutive probes were required to define an expressed region. However, since some probes may be inherently non-reactive, a single probe with a reading below the threshold was included in the mapped expressed region when the probes immediately upstream and downstream of the non-reactive probe met the criterion. This allows better delineation of the transcript extent of genes expressed at low levels and takes into account the expression of intergenic regions adjacent to expressed ORFs.
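The probe-level rules above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the software actually used: all function names are hypothetical, "or" in the threshold expression is read as "exceeds either gene's threshold", and a sub-threshold probe is bridged only when both of its immediate neighbours are above threshold.

```python
from statistics import mean, stdev

def expression_threshold(gene_probe_readings):
    """mean - stdev of the probe readings inside one flanking ORF."""
    return mean(gene_probe_readings) - stdev(gene_probe_readings)

def combined_threshold(gene1_readings, gene2_readings):
    """Assumption: a probe qualifies if it exceeds either gene's threshold."""
    return min(expression_threshold(gene1_readings),
               expression_threshold(gene2_readings))

def map_expressed_regions(readings, threshold):
    """Return (first_index, last_index) pairs of expressed probe runs."""
    positive = [r > threshold for r in readings]
    bridged = list(positive)
    # Bridge a single non-reactive probe flanked by positive probes on both sides.
    for i in range(1, len(positive) - 1):
        if not positive[i] and positive[i - 1] and positive[i + 1]:
            bridged[i] = True
    regions = []
    start = None
    # The trailing False closes a run that extends to the last probe.
    for i, flag in enumerate(bridged + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= 2:  # at least two consecutive probes are required
                regions.append((start, i - 1))
            start = None
    return regions
```

Isolated positive probes are dropped because a run of length one does not satisfy the two-probe requirement.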

Each probe was searched against the C. phytofermentans genome using BLAST to identify potential sources of cross-hybridization. For any match with an E-value less than 0.01, the array intensity of the probe corresponding to the BLAST match was examined. If any match showed an expression value higher than that of the probe in question, the probe was flagged as possibly affected by cross-hybridization. For each putative expressed region, the number of positive probes and the number of these positive probes flagged for possible cross-hybridization were reported. The transcript extent of each predicted glycoside hydrolase-related protein and putative alcohol dehydrogenase was reported.
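The cross-hybridization check can be illustrated with a short Python sketch. The data layout is an assumption made for illustration: BLAST matches are represented as (matched_probe_id, e_value) pairs and intensities as a dictionary keyed by probe identifier; the function name and the probe identifiers in the example are hypothetical.

```python
def flag_cross_hybridization(probe_id, intensities, blast_hits, max_evalue=0.01):
    """Return True if any sufficiently significant match is expressed more
    strongly than the probe itself.

    intensities: dict mapping probe id -> array reading (hypothetical layout)
    blast_hits:  iterable of (matched_probe_id, e_value) pairs for probe_id
    """
    own_reading = intensities[probe_id]
    for matched_id, e_value in blast_hits:
        if matched_id == probe_id:
            continue  # ignore the probe's match to its own target
        if e_value < max_evalue and intensities.get(matched_id, 0.0) > own_reading:
            return True
    return False

# Example with made-up readings: p2 matches p1 significantly and is brighter.
example_intensities = {"p1": 300.0, "p2": 1200.0, "p3": 100.0}
example_hits = [("p2", 1e-5), ("p3", 0.005)]
flagged = flag_cross_hybridization("p1", example_intensities, example_hits)
```

A flag of this kind only marks a probe for caution; it does not by itself show that cross-hybridization occurred.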

Genes corresponding to transcripts that were more than 4-fold differentially expressed when grown on D-arabinose compared to glucose are listed in table 10.

TABLE 10 Expression on D-arabinose

Differential expression (log₂)	JGI number	COG description
5.1	Cphy1174	Pyruvate formate-lyase
5.0	Cphy1175	Glycyl-radical enzyme activating protein family
4.9	Cphy1176	Microcompartments protein
6.1	Cphy1177	Class II aldolase/adducin family protein
5.9	Cphy1178	Aldehyde dehydrogenase
5.3	Cphy1179	Alcohol dehydrogenase zinc-binding domain protein
5.6	Cphy1180	Microcompartments protein
5.7	Cphy1181	Microcompartments protein
5.0	Cphy1182	Microcompartments protein
4.5	Cphy1183	Propanediol utilization protein
3.9	Cphy1184	Ethanolamine utilization protein EutN/carboxysome structural protein CcmL
3.4	Cphy1185	Respiratory-chain NADH dehydrogenase domain, 51 kDa subunit
3.9	Cphy3153	RbsD or FucU transport
4.4	Cphy3154	Carbohydrate kinase, FGGY
4.4	Cphy3155	L-fucose isomerase
2.6	Cphy3367	Cellulose 1,4-beta-cellobiosidase
2.8	Cphy3368	Cellulose 1,4-beta-cellobiosidase
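The tabulated values are log₂ ratios relative to growth on glucose, so a 4-fold difference corresponds to a log₂ value of 2. The Python sketch below shows the conversion; the function names and the example intensities are hypothetical, and because the tabulated values are rounded to one decimal place, entries near the cutoff cannot be reclassified reliably from the table alone.

```python
import math

def log2_fold_change(test_value, reference_value):
    """log2 of the expression ratio (test condition over glucose reference)."""
    return math.log2(test_value / reference_value)

def fold_change(log2_value):
    """Convert a tabulated log2 value back to a linear fold change."""
    return 2.0 ** log2_value

def exceeds_fold(log2_value, fold=4.0):
    """True if the magnitude of the change is greater than the given fold."""
    return abs(log2_value) > math.log2(fold)

# Made-up intensities: 800 a.u. on the test substrate versus 100 a.u. on glucose.
example_log2 = log2_fold_change(800.0, 100.0)
```

For instance, a tabulated value of 5.1 corresponds to roughly a 34-fold difference in expression.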

Genes corresponding to transcripts that were more than 4-fold differentially expressed when grown on L-arabinose compared to glucose are listed in table 11.

TABLE 11 expression on L-arabinose

Genes corresponding to transcripts that were differentially expressed by more than 4 fold when grown on cellobiose compared to glucose are listed in table 12.

TABLE 12 expression on Cellobiose

Differential expression (log₂)	JGI number	COG description
3.5	Cphy0430	Glycosyltransferase 36
2.1	Cphy1586	ABC transporter related
2.3	Cphy1587	Monosaccharide-transporting ATPase
2.0	Cphy2264	Glycosidase, PH1107-related
1.9	Cphy2265	Extracellular solute-binding protein, family 1
2.0	Cphy2266	Hypothetical protein
2.2	Cphy2267	Binding-protein-dependent transport systems, inner membrane component
2.3	Cphy2268	Binding-protein-dependent transport systems, inner membrane component
2.3	Cphy2269	Hypothetical protein
2.3	Cphy2270	-
2.0	Cphy2271	-
2.1	Cphy2272	Binding-protein-dependent transport systems, inner membrane component
2.2	Cphy2273	Binding-protein-dependent transport systems, inner membrane component
4.3	Cphy2464	Binding-protein-dependent transport systems, inner membrane component
3.8	Cphy2465	Binding-protein-dependent transport systems, inner membrane component
2.7	Cphy2466	Extracellular solute-binding protein, family 1
3.4	Cphy2467	Transcriptional regulator, LacI family

Genes corresponding to transcripts that were differentially expressed more than 4 fold when grown on cellulose compared to glucose are listed in table 13.

TABLE 13 expression on cellulose

Differential expression (log₂)	JGI number	COG description
3.4	Cphy0430	Glycosyltransferase 36
3.1	Cphy1071	Glycoside hydrolase family 26
3.0	Cphy1163	Cellulase
4.1	Cphy1529	Extracellular solute-binding protein, family 1
3.2	Cphy1530	Binding-protein-dependent transport systems, inner membrane component
3.6	Cphy1531	Binding-protein-dependent transport systems, inner membrane component
2.1	Cphy1586	ABC transporter related
2.3	Cphy1587	Monosaccharide-transporting ATPase
6.2	Cphy1799	Glycoside hydrolase family 18
6.2	Cphy1800	Glycoside hydrolase family 18
2.0	Cphy1929	Glycosyltransferase 36
3.2	Cphy2105	Endo-1,4-beta-xylanase
2.6	Cphy2263	Hypothetical protein
2.0	Cphy2264	Glycosidase, PH1107-related
2.9	Cphy2265	Extracellular solute-binding protein, family 1
2.0	Cphy2266	Hypothetical protein
2.2	Cphy2267	Binding-protein-dependent transport systems, inner membrane component
2.3	Cphy2268	Binding-protein-dependent transport systems, inner membrane component
2.3	Cphy2269	Hypothetical protein
3.0	Cphy2270	-
2.7	Cphy2271	-
2.1	Cphy2272	Binding-protein-dependent transport systems, inner membrane component
2.2	Cphy2273	Binding-protein-dependent transport systems, inner membrane component
2.3	Cphy2274	Extracellular solute-binding protein, family 1
4.0	Cphy2464	Binding-protein-dependent transport systems, inner membrane component
3.5	Cphy2465	Binding-protein-dependent transport systems, inner membrane component
2.5	Cphy2466	Extracellular solute-binding protein, family 1
3.0	Cphy2467	Transcriptional regulator, LacI family
2.9	Cphy2569	Extracellular solute-binding protein, family 1
3.9	Cphy2570	Binding-protein-dependent transport systems, inner membrane component
3.9	Cphy2571	Binding-protein-dependent transport systems, inner membrane component
2.0	Cphy3209	Binding-protein-dependent transport systems, inner membrane component
2.0	Cphy3210	Putative polysaccharide transport system substrate-binding protein
4.8	Cphy3367	Cellulose 1,4-beta-cellobiosidase
5.5	Cphy3368	Cellulose 1,4-beta-cellobiosidase
2.5	Cphy3854	Glycosyltransferase 36
3.1	Cphy3855	Phosphomannomutase
2.5	Cphy3858	Extracellular solute-binding protein, family 1
3.8	Cphy3859	Binding-protein-dependent transport systems, inner membrane component
3.7	Cphy3860	Binding-protein-dependent transport systems, inner membrane component
2.9	Cphy3861	Two-component transcriptional regulator, AraC family
2.3	Cphy3862	Endo-1,4-beta-xylanase

Genes corresponding to transcripts that were differentially expressed by more than 4 fold when grown on fucose compared to glucose are listed in table 14.

TABLE 14 expression on fucose

Differential expression (log₂)	JGI number	COG description
2.7	Cphy0580	ABC transporter related
2.6	Cphy0581	Monosaccharide-transporting ATPase
2.8	Cphy0582	Monosaccharide-transporting ATPase
2.2	Cphy0583	Putative carbohydrate ABC transporter, substrate-binding protein
2.2	Cphy0584	L-arabinose isomerase
2.8	Cphy1071	Glycoside hydrolase family 26
2.8	Cphy1163	Cellulase
5.6	Cphy1174	Pyruvate formate-lyase
6.2	Cphy1175	Glycyl-radical enzyme activating protein family
5.8	Cphy1176	Microcompartments protein
6.5	Cphy1177	Class II aldolase/adducin family protein
6.4	Cphy1178	Aldehyde dehydrogenase
6.3	Cphy1179	Alcohol dehydrogenase zinc-binding domain protein
6.4	Cphy1180	Microcompartments protein
6.4	Cphy1181	Microcompartments protein
6.0	Cphy1182	Microcompartments protein
5.9	Cphy1183	Propanediol utilization protein
5.4	Cphy1184	Ethanolamine utilization protein EutN/carboxysome structural protein CcmL
5.2	Cphy1185	Respiratory-chain NADH dehydrogenase domain, 51 kDa subunit
4.7	Cphy1186	Microcompartments protein
4.9	Cphy1799	Glycoside hydrolase family 18
5.2	Cphy1800	Glycoside hydrolase family 18
6.1	Cphy2010	ABC transporter related
6.6	Cphy2011	Monosaccharide-transporting ATPase
5.9	Cphy2012	Periplasmic binding protein/LacI transcriptional regulator
2.0	Cphy2105	Endo-1,4-beta-xylanase
2.4	Cphy2569	Extracellular solute-binding protein, family 1
3.3	Cphy2570	Binding-protein-dependent transport systems, inner membrane component
3.0	Cphy2571	Binding-protein-dependent transport systems, inner membrane component
2.5	Cphy2919	Protein of unknown function DUF1565
4.9	Cphy3153	RbsD or FucU transport
5.3	Cphy3154	Carbohydrate kinase, FGGY
5.3	Cphy3155	L-fucose isomerase
2.3	Cphy3308	Hypothetical protein
4.2	Cphy3367	Cellulose 1,4-beta-cellobiosidase
4.7	Cphy3368	Cellulose 1,4-beta-cellobiosidase
2.1	Cphy3854	Glycosyltransferase 36
2.3	Cphy3855	Phosphomannomutase
2.3	Cphy3858	Extracellular solute-binding protein, family 1
3.3	Cphy3859	Binding-protein-dependent transport systems, inner membrane component
3.2	Cphy3860	Binding-protein-dependent transport systems, inner membrane component
2.3	Cphy3861	Two-component transcriptional regulator, AraC family

Genes corresponding to transcripts that were differentially expressed more than 4 fold when grown on galactose compared to glucose are listed in table 15.

TABLE 15 expression on galactose

Differential expression (log₂)	JGI number	COG description
2.1	Cphy3367	Cellulose 1,4-beta-cellobiosidase
2.0	Cphy3368	Cellulose 1,4-beta-cellobiosidase

Genes corresponding to transcripts that were differentially expressed by more than 4 fold when grown on laminarin compared to glucose are listed in table 16.

TABLE 16 expression on laminarin

Differential expression (log₂)	JGI number	COG description
5.4	Cphy0857	Cellobiose phosphorylase-like protein
5.1	Cphy0858	Glycoside hydrolase family 30
4.9	Cphy0859	Hypothetical protein
5.7	Cphy0860	Binding-protein-dependent transport systems, inner membrane component
5.8	Cphy0861	Binding-protein-dependent transport systems, inner membrane component
3.9	Cphy0862	Extracellular solute-binding protein, family 1
3.8	Cphy0863	Histidine kinase internal region
3.8	Cphy0864	Two-component transcriptional regulator, AraC family
4.0	Cphy0865	Hypothetical protein
2.2	Cphy1448	Phosphonate ABC transporter, periplasmic phosphonate-binding protein
1.8	Cphy1449	Phosphonate ABC transporter, ATPase subunit
1.9	Cphy1450	Phosphonate ABC transporter, inner membrane subunit
2.0	Cphy1451	Phosphonate ABC transporter, inner membrane subunit
2.0	Cphy1929	Glycosyltransferase 36
4.9	Cphy3388	Glucan endo-1,3-beta-D-glucosidase

Genes corresponding to transcripts that were differentially expressed more than 4-fold when grown on mannose compared to glucose are listed in table 17.

TABLE 17 expression on mannose

Differential expression (log₂)	JGI number	COG description
2.6	Cphy1071	Glycoside hydrolase family 26
2.5	Cphy1585	Putative solute-binding component of ABC transporter
2.5	Cphy1586	ABC transporter related
2.9	Cphy1587	Monosaccharide-transporting ATPase
3.9	Cphy1799	Glycoside hydrolase family 18
4.2	Cphy1800	Glycoside hydrolase family 18
2.5	Cphy2105	Endo-1,4-beta-xylanase
3.1	Cphy2569	Extracellular solute-binding protein, family 1
3.6	Cphy2570	Binding-protein-dependent transport systems, inner membrane component
3.4	Cphy2571	Binding-protein-dependent transport systems, inner membrane component
3.8	Cphy3367	Cellulose 1,4-beta-cellobiosidase
4.1	Cphy3368	Cellulose 1,4-beta-cellobiosidase
2.1	Cphy3855	Phosphomannomutase
2.4	Cphy3858	Extracellular solute-binding protein, family 1
3.3	Cphy3859	Binding-protein-dependent transport systems, inner membrane component
3.1	Cphy3860	Binding-protein-dependent transport systems, inner membrane component
2.4	Cphy3861	Two-component transcriptional regulator, AraC family

Genes corresponding to transcripts that were differentially expressed by more than 4 fold when grown on pectin compared to glucose are listed in table 18.

TABLE 18 expression on pectin

Differential expression (log₂)	JGI number	COG description
2.7	Cphy0218	Glycoside hydrolase family 31
2.2	Cphy0219	Hypothetical protein
2.5	Cphy0220	Glycoside hydrolase family 3 domain protein
3.5	Cphy0430	Glycosyltransferase 36
2.2	Cphy1071	Glycoside hydrolase family 26
2.3	Cphy1174	Pyruvate formate-lyase
2.4	Cphy1175	Glycyl-radical enzyme activating protein family
2.1	Cphy1176	Microcompartments protein
3.3	Cphy1177	Class II aldolase/adducin family protein
2.9	Cphy1178	Aldehyde dehydrogenase
2.3	Cphy1179	Alcohol dehydrogenase zinc-binding domain protein
2.9	Cphy1180	Microcompartments protein
2.8	Cphy1181	Microcompartments protein
2.2	Cphy1182	Microcompartments protein
2.1	Cphy1183	Propanediol utilization protein
2.0	Cphy1219	Xylose isomerase
3.2	Cphy1612	Pectate lyase/Amb allergen
2.0	Cphy1714	Glycoside hydrolase family 85
3.6	Cphy1715	Binding-protein-dependent transport systems, inner membrane component
3.4	Cphy1716	Binding-protein-dependent transport systems, inner membrane component
2.3	Cphy1717	Extracellular solute-binding protein, family 1
2.7	Cphy1718	Glycosidase, PH1107-related
2.5	Cphy1719	Hypothetical protein
2.4	Cphy1720	Glycoside hydrolase family 38
2.4	Cphy1888	Hypothetical protein
3.4	Cphy1929	Glycosyltransferase 36
2.1	Cphy2010	ABC transporter related
2.1	Cphy2011	Monosaccharide-transporting ATPase
2.1	Cphy2262	N-acetylglucosamine-2-epimerase
3.1	Cphy2263	Hypothetical protein
3.3	Cphy2264	Glycosidase, PH1107-related
3.1	Cphy2265	Extracellular solute-binding protein, family 1
3.1	Cphy2266	Hypothetical protein
4.0	Cphy2267	Binding-protein-dependent transport systems, inner membrane component
3.8	Cphy2268	Binding-protein-dependent transport systems, inner membrane component
3.9	Cphy2269	Hypothetical protein
3.6	Cphy2270	-
3.6	Cphy2271	-
4.0	Cphy2272	Binding-protein-dependent transport systems, inner membrane component
4.0	Cphy2273	Binding-protein-dependent transport systems, inner membrane component
3.2	Cphy2274	Extracellular solute-binding protein, family 1
2.4	Cphy2275	Hypothetical protein
2.1	Cphy2276	Mannan-1,4-beta-mannosidase
3.9	Cphy2464	Binding-protein-dependent transport systems, inner membrane component
3.5	Cphy2465	Binding-protein-dependent transport systems, inner membrane component
2.4	Cphy2466	Extracellular solute-binding protein, family 1
2.9	Cphy2467	Transcriptional regulator, LacI family
2.9	Cphy2919	Protein of unknown function DUF1565
2.4	Cphy3153	RbsD or FucU transport
2.5	Cphy3154	Carbohydrate kinase, FGGY
2.6	Cphy3155	L-fucose isomerase
2.2	Cphy3160	Glycoside hydrolase family 2 carbohydrate binding
3.4	Cphy3367	Cellulose 1,4-beta-cellobiosidase
3.5	Cphy3368	Cellulose 1,4-beta-cellobiosidase
4.2	Cphy3585	Transcriptional regulator, LacI family
6.5	Cphy3586	Arabinogalactan endo-1,4-beta-galactosidase
6.8	Cphy3587	Hypothetical protein
6.5	Cphy3588	Binding-protein-dependent transport systems, inner membrane component
6.1	Cphy3589	Binding-protein-dependent transport systems, inner membrane component
6.7	Cphy3590	Extracellular solute-binding protein, family 1
2.2	Cphy3859	Binding-protein-dependent transport systems, inner membrane component
2.0	Cphy3860	Binding-protein-dependent transport systems, inner membrane component

Genes corresponding to transcripts that were more than 4-fold differentially expressed when grown on rhamnose compared to glucose are listed in table 19.

TABLE 19 expression on rhamnose

Genes corresponding to transcripts that were differentially expressed by more than 4 fold when grown on xylan compared to glucose are listed in table 20.

TABLE 20 expression on xylan

Differential expression (log₂)	JGI number	COG description
2.4	Cphy1132	Putative solute-binding component of ABC transporter
2.7	Cphy1133	Monosaccharide-transporting ATPase
2.6	Cphy1134	ABC transporter related
2.2	Cphy1177	Class II aldolase/adducin family protein
2.1	Cphy1178	Aldehyde dehydrogenase
2.0	Cphy1181	Microcompartments protein
3.6	Cphy1219	Xylose isomerase
2.7	Cphy1448	Phosphonate ABC transporter, periplasmic phosphonate-binding protein
2.3	Cphy1449	Phosphonate ABC transporter, ATPase subunit
2.5	Cphy1450	Phosphonate ABC transporter, inner membrane subunit
2.6	Cphy1451	Phosphonate ABC transporter, inner membrane subunit
2.8	Cphy1528	Transcriptional regulator, AraC family
6.7	Cphy1529	Extracellular solute-binding protein, family 1
6.8	Cphy1530	Binding-protein-dependent transport systems, inner membrane component
6.7	Cphy1531	Binding-protein-dependent transport systems, inner membrane component
5.3	Cphy1532	Domain of unknown function DUF1801
4.5	Cphy2105	Endo-1,4-beta-xylanase
4.6	Cphy2106	Protein of unknown function DUF323
4.8	Cphy2108	Endo-1,4-beta-xylanase
2.4	Cphy2632	Glycoside hydrolase family 43
2.4	Cphy2654	Carbohydrate ABC transporter substrate-binding protein
4.2	Cphy2655	Binding-protein-dependent transport systems, inner membrane component
4.3	Cphy2656	Binding-protein-dependent transport systems, inner membrane component
4.3	Cphy3009	Glycoside hydrolase family 3 domain protein
4.1	Cphy3010	Endo-1,4-beta-xylanase
4.6	Cphy3158	Alpha-glucuronidase
4.3	Cphy3206	Methyl-accepting chemotaxis sensory transducer
4.5	Cphy3207	Glycoside hydrolase family 8
4.4	Cphy3208	Binding-protein-dependent transport systems, inner membrane component
4.3	Cphy3209	Binding-protein-dependent transport systems, inner membrane component
4.6	Cphy3210	Putative polysaccharide transport system substrate-binding protein
3.3	Cphy3211	Two-component transcriptional regulator, AraC family
3.0	Cphy3212	Histidine kinase internal region
3.4	Cphy3419	Xylulokinase

Genes corresponding to transcripts that were differentially expressed by more than 4 fold when grown on xylose compared to glucose are listed in table 21.

TABLE 21 expression on xylose

Genes corresponding to transcripts that were differentially expressed by more than 4 fold when grown on yeast extract compared to when grown on glucose are listed in table 22.

TABLE 22 expression on Yeast extracts

Differential expression (log₂)	JGI number	COG description
2.4	Cphy0857	Cellobiose phosphorylase-like protein
2.2	Cphy0858	Glycoside hydrolase family 30
3.2	Cphy0860	Binding-protein-dependent transport systems, inner membrane component
3.6	Cphy0861	Binding-protein-dependent transport systems, inner membrane component
3.0	Cphy1448	Phosphonate ABC transporter
2.0	Cphy1449	Phosphonate ABC transporter
2.1	Cphy1450	Phosphonate ABC transporter
2.0	Cphy1451	Phosphonate ABC transporter

Example 3 Genomic profiling of the C. phytofermentans genome

Genome composition. C. phytofermentans ISDg (ATCC 700394) has a single circular chromosome of 4,847,594 bp and no plasmid. The origin of replication was defined using the position of the GC-skew transition point and the presence of the replication protein dnaA (FIG. 5). The G+C content is 35.3%. The G+C content plotted over 1 kb windows as a function of position in the genome (FIG. 6) shows several isolated genomic islands with higher G+C content. Six specific islands, defined as 1 kb regions with an average G+C content > 50%, are shown in FIG. 6. Genes were identified in or around each of these genomic islands (Table 3).
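A windowed G+C scan of the kind described can be sketched in Python as follows. This is a minimal illustration, assuming non-overlapping 1 kb windows and a plain nucleotide string as input; the function names and the synthetic test sequence are hypothetical and do not reproduce the analysis behind FIG. 6.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a nucleotide string."""
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)

def high_gc_windows(genome, window=1000, cutoff=0.50):
    """Return (start, gc_fraction) for each full window whose G+C exceeds cutoff.

    Windows are non-overlapping (an assumption); a trailing partial
    window is ignored.
    """
    hits = []
    for start in range(0, len(genome) - window + 1, window):
        fraction = gc_fraction(genome[start:start + window])
        if fraction > cutoff:
            hits.append((start, fraction))
    return hits

# Synthetic 3 kb sequence: AT-rich flanks around a GC-rich 1 kb island.
synthetic = "AT" * 500 + "GC" * 500 + "AT" * 500
islands = high_gc_windows(synthetic)
```

A sliding window with a smaller step would locate island boundaries more precisely, at the cost of reporting overlapping windows.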

TABLE 23 General characteristics of the C. phytofermentans genome

Taken together, these high G+C islands appear to have a low gene density. Of the 16 1-kb regions (including the 6 genomic islands) with a G+C content > 50%, 12 regions contained no genes. The only genes found in the high G+C islands encode a two-component system (histidine kinase and response regulator protein) and a protein with a putative collagen triple helix repeat. Most genes surrounding these high G+C regions are of unknown function. One of the genes adjacent to region V encodes a phage protein (FIG. 6). The genome encodes 3,926 predicted coding sequences (CDSs) (Table 23).

Clostridial genomes typically display a strong coding strand bias, but in C. phytofermentans the CDSs are encoded almost equally on the leading (52%) and lagging (48%) strands (Seedorf, H. et al. The genome of Clostridium kluyveri, a strict anaerobe with unique metabolic features. Proc. Natl. Acad. Sci. U.S.A. 105, 2128-2133 (2008)). Putative functions were assigned to 73% of the CDSs, while 11% have similarity to genes of unknown function and 16% are unique to C. phytofermentans.

61 tRNA genes were predicted in the genome, covering 20 amino acids (Table 23, Table 24).

TABLE 24 tRNA genes of C. phytofermentans

The genome contains 8 ribosomal operons, 3 oriented on the leading strand and 5 on the lagging strand, which are generally clustered near the origin of replication. The large number of rRNA operons in C. phytofermentans may be an evolutionary adaptation that benefits organisms experiencing changing growth conditions, as suggested by the greater capacity of bacteria with more rRNA operons to respond rapidly to favorable growth conditions (Schmidt, T.M. in Bacterial Genomes: Physical Structure and Analysis, 221 (Chapman and Hall Co., New York, N.Y., 1997); Klappenbach, J.A., Dunbar, J.M. & Schmidt, T.M. rRNA operon copy number reflects ecological strategies of bacteria. Appl. Environ. Microbiol. 66, 1328-).

Based on the presence of clustered phage-associated genes, the genome appears to contain a putative prophage. The phage cluster spans approximately 39 kb and includes 40 genes (Cphy2953-2993). Fifteen genes encoding head and tail structural components and assembly functions are homologous to genes of Clostridium difficile phage ΦC2 (Goh, S., Ong, P.F., Song, K.P., Riley, T.V. & Chang, B.J. The complete genome sequence of Clostridium difficile phage phiC2 and comparisons to phiCD119 and inducible prophages of CD630. Microbiology 153, 676-). It is not clear whether functionally equivalent genes necessary for the phage to complete its life cycle (i.e., DNA packaging, tail assembly, cell lysis, lysogeny control, and DNA replication, recombination and modification) are present in the genome (Id). There were 27 genes related to insertion sequences (transposases), fewer than in closely related genomes (Table 25).

TABLE 25 comparison of the Clostridia genomes

C. phytofermentans is evolutionarily related to plant humus-associated soil microorganisms. To illustrate the phylogenetic relationship between C. phytofermentans and other members of the class Clostridia, including those with unsequenced genomes, the 16S rRNA gene sequences (1611 bp) of the isolate and its most closely related members were used for a neighbor-joining analysis. The phylogeny demonstrated that strain ISDg is a member of cluster XIVa, which consists largely of human, rat and chicken intestinal microorganisms, and is only distantly related to cluster I, which contains many pathogens and the solventogenic Clostridium acetobutylicum, and to cluster III, which contains cellulolytic bacteria such as Clostridium cellulolyticum and Clostridium thermocellum (FIG. 7) (Warnick, T.A., Methe, B.A. & Leschine, S.B. Clostridium phytofermentans sp. nov., a cellulolytic mesophile from forest soil. Int. J. Syst. Evol. Microbiol. 52, 1155-1160 (2002); Collins, M.D. et al. The phylogeny of the genus Clostridium: proposal of five new genera and eleven new species combinations. Int. J. Syst. Bacteriol. 44, 812-826 (1994)). Within cluster XIVa, C. phytofermentans is part of a clade (93.7-93.8% similarity) comprising uncultured bacteria from metagenomic analyses of anoxic rice paddy soil and a methanogenic landfill leachate bioreactor (Burrell, P.C., O'Sullivan, C., Song, H., Clarke, W.P. & Blackall, L.L. Identification, detection, and spatial resolution of Clostridium populations responsible for cellulose degradation in a methanogenic landfill leachate bioreactor. Appl. Environ. Microbiol. 70, 2414-2419 (2004); Hengstmann, U., Chin, K.J., Janssen, P.H. & Liesack, W. Comparative phylogenetic assignment of environmental sequences of genes encoding 16S rRNA and numerically abundant culturable bacteria from an anoxic rice paddy soil. Appl. Environ. Microbiol. 65, 5050-5058 (1999)).

The placement of C. phytofermentans within the class Clostridia based on rRNA analysis is consistent with the overall distribution of the C. phytofermentans CDSs based on their similarity to genes in other fully sequenced genomes using BLASTP. Of the CDSs, 38% were most similar to genes of cluster XIVa, followed by 10% for cluster I and 7% for cluster III (FIG. 8). However, a significant proportion (14%) of the CDSs had no significant homolog within the class Clostridia and showed the highest level of similarity to CDSs in phylogenetically distant strains. This suggests that the C. phytofermentans genome may contain many genes acquired by horizontal gene transfer. These diverse gene sources underscore the diversity of the Clostridia and the uniqueness of C. phytofermentans.

A unique set of GHs assembled from different origins. The C. phytofermentans genome combines glycoside hydrolases (GHs) with a large number of functions in a single genome. Despite the diversity of the microbial systems organized for polysaccharide utilization, the basic building blocks show considerable uniformity. The catalytic domains found in polysaccharide-degrading enzymes can be classified into families by their primary sequence and folding topology. Representatives of these families can be found in many different bacteria and eukaryotic microorganisms. By quantifying similarities to and differences from the GH catalytic domains of other bacteria, and their organization from the gene level to the genome level, we sought to better understand their function and to estimate the uniqueness of C. phytofermentans.

When compared to enzymes in other sequenced bacteria using BLASTP, the GHs of C. phytofermentans are similar to those of widely divergent bacteria (6 phyla and 46 species). More GH genes were most similar to distantly related bacteria than expected from the distribution of all genes in C. phytofermentans (chi-square test, P = 0.0004998). Approximately 50% of the GHs were most similar to species outside the class Clostridia, compared with the expected 14% (FIG. 9). Approximately 18% of the GHs were most similar to bacilli, followed by 17% most similar to the cellulolytic bacteria of cluster III (FIG. 9). This suggests that horizontal gene transfer played an important role in the evolution and assembly of the plant-degrading ability of C. phytofermentans.

With the exception of the starch-degrading genes clustered on the genome (Cphy2304-2352), the hemicellulase and cellulase genes were, in general, rarely co-localized. This supports the hypothesis that they were acquired by independent horizontal gene transfer events. However, one example of a cluster of genes with related functions is the xylan-degrading cluster comprising β-glucosidase Cphy3009 (GH3), endo-xylanase Cphy3010 (GH10) and arabinofuranosidase Cphy3011 (GH43). This arrangement is unique to C. phytofermentans. Furthermore, the genes encoding the two major cellulases, Cphy3367 (GH9) and Cphy3368 (GH48), are adjacent in the genome. This is consistent with the high degree of synergy observed between bacterial GH48 and GH9 cellulases, which are present in all bacterial cellulase systems known to date (Riedel, K., Ritter, J. & Bronnenmeier, K. Synergistic interaction of the Clostridium stercorarium cellulases Avicelase I (CelZ) and Avicelase II (CelY) in the degradation of microcrystalline cellulose. FEMS Microbiol. Lett. 147, 239 (1997)).

In molecular phylogenies, the GH9 and GH48 catalytic domains of C. phytofermentans (see FIGS. 10 and 11) most closely resemble, respectively, the endoglucanase Z precursor (Avicelase I) (Jauris, S. et al. Sequence analysis of the Clostridium stercorarium celZ gene encoding a thermoactive cellulase (Avicelase I): identification of catalytic and cellulose-binding domains. Mol. Gen. Genet. 223, 258-267 (1990)) and the cellodextrinohydrolase (Avicelase II) (Bronnenmeier, K., Rücknagel, K.P. & Staudenbauer, W.L. (1991)) of the thermophilic, cellulolytic and xylanolytic Clostridium stercorarium. GH9 and GH48 are also adjacent in C. stercorarium (Schwarz, W.H., Zverlov, V.V. & Bahl, H. Extracellular glycosyl hydrolases from clostridia. Adv. Appl. Microbiol. 56, 215-). The arrangement is even more extreme in a third, thermophilic bacterium, in which GH9 and GH48 domains highly similar to those of C. phytofermentans and C. stercorarium are fused into a single protein. These observations suggest a common origin and cooperative function of these key enzymes in these three bacteria.

Since a given GH family contains enzymes with a very broad range of known activities, one might ask whether redundancy within a family reflects functional redundancy or a lack of specificity of the catalytic domain. More detailed molecular studies of the GH families associated with plant cell wall degradation show that these genes are related but that their sequences have often diverged. It should be noted that C. phytofermentans generally has less redundancy per family but a broader range of GH families, reflecting its more versatile ecology, compared with the cellulose specialist C. thermocellum, whose prominent numbers of GH9 (16), GH48 (2) and GH5 (11) cellulases reflect its specialized cellulose-degrading role (FIG. 9). Some GH families in C. phytofermentans nevertheless contain a prominent number of genes. These are the GH3 glucosidases, the GH5 cellulases, and the GH10, GH26 and GH43 xylan-degrading enzymes. Molecular phylogeny of the GH5 cellulases (pfam00150) of C. phytofermentans showed that they are divergent and fall into two subclusters (FIG. 11). Cluster B includes fungal cellulases. This example underlines how lateral gene transfer has affected GH evolution. More specifically, it emphasizes the importance of gene transfer between microorganisms belonging to different kingdoms, and suggests an even more important role for gene transfer within kingdoms. Phylogeny of the 6 GH10 domains of C. phytofermentans suggests a wide range of variation and of possible functions. Cphy2108 (GH10) is very similar to the multimodular xylanase Xyn10C of C. stercorarium, a thermostable, cell-bound protein that binds cellulose and xylan and thereby attaches cells to the substrate (Ali, M.K., Kimura, T., Sakka, K. & Ohmiya, K. The multidomain xylanase Xyn10B as a cellulose-binding protein in Clostridium stercorarium. FEMS Microbiol. Lett. 198, 79-83 (2001)).
The discovery of two distinct but closely related GH10 domains on a single enzyme, fused to a CE domain, indicates a duplication event followed by fusion and divergence. This particular arrangement of catalytic functions on a single protein is unique to C. phytofermentans.

Duplication, followed by fusion, rearrangement and sequence divergence, produced a vast array of multimodular enzymes in C. phytofermentans. In general, however, a prominent feature of C. phytofermentans is the importance of horizontal gene transfer, through which such complex genes and gene clusters can be obtained from other members of the habitat community.

Degradation of plant cell walls without cellulosomes. C. phytofermentans shares an ecology similar to that of cellulosome-producing bacteria. However, there is neither biochemical nor genetic evidence for the production of cellulosomes (no scaffoldin, cohesin or dockerin domains) in this bacterium. Cellulosome complexes are thought to be involved in plant cell wall breakdown for three reasons: they provide a bacterial cell-surface structure that retains the high concentrations of proteins, representing a range of substrate specificities, that are necessary to break the different linkages of plant cell wall polysaccharides; they can maximize the stoichiometry and synergy between different enzymatic and binding specificities; and they can help limit the diffusion of hydrolysis products away from the cell by providing a specific environment between the cell membrane and the substrate (Flint, H.J., Bayer, E.A., Rincon, M.T., Lamed, R. & White, B.A. Polysaccharide utilization by gut bacteria: potential for new insights from genomic analysis. Nat. Rev. Microbiol. 6, 121-131 (2008)). The strategy by which C. phytofermentans achieves efficient degradation of plant cell walls and uptake of the products without cellulosomes is not clear.

First, surprisingly, none of the cellulases (GH5, GH9 and GH48) grouped on the phylogenetic trees according to whether or not the protein belongs to a cellulosome. This suggests that the domain anchoring the protein to the cellulosome complex has been lost or gained independently multiple times (FIGS. 10 and 11). The remarkable multimodular character of cellulosomal proteins is preserved in C. phytofermentans. C. phytofermentans has 19 multimodular GH proteins, representing approximately 17% of all its GHs. In non-cellulolytic bacteria, the corresponding GHs are found predominantly on single-domain polypeptides (Flint, H.J., Bayer, E.A., Rincon, M.T., Lamed, R. & White, B.A. Polysaccharide utilization by gut bacteria: potential for new insights from genomic analysis. Nat. Rev. Microbiol. 6, 121-131 (2008)). This likely reflects the different cellular localization of the gene products and the fact that they act on smaller, soluble carbohydrate substrates (Id.). According to this hypothesis, the multimodular organization that seems characteristic of enzymes from cellulolytic clostridia may exist mainly for the extracellular processing of heterogeneous insoluble substrates such as cell walls (Id.).

In the absence of cellulosomes, CBMs can tightly anchor enzymes to plant cell walls and thus hold them in proximity to their substrates. 35 putative CBMs representing 15 CAZy families were identified (Table 5). CBM2, CBM3, CBM4, CBM6 and CBM46 have been shown to bind cellulose (Table 5). CBM2, CBM4, CBM6, CBM13, CBM22, CBM35 and CBM36 have been shown to bind xylan (Table 5). The presence of multiple combinations of CBM domains whose specificities do not match those of the catalytic domains may provide an advantage when acting on plant cell walls of different topologies, in which multiple polysaccharide types are cross-linked. Xylanases with cellulose-binding CBMs may help C. phytofermentans attach to cellulose fibers while degrading cross-linked xylan. The presence of CBMs whose specificity is unrelated to that of the catalytic domain can also be explained by their thermostabilizing effect, which has been demonstrated in some cases. Another type of domain, X2, is found between the catalytic domain and the CBM domain, or between CBM domains, in one mannanase and three cellulases of C. phytofermentans. Very little information is available about the function of X2 in bacterial extracellular enzymes. X2 domains may function as spacers or linker sequences that allow optimal interaction between the catalytic module and the substrate-binding module, as sites of protein-protein interaction, or as potential carbohydrate-binding domains.

In addition to binding to polysaccharides, some enzymes appear to bind to the cell, keeping the cell near its substrate. Among the 31 GH enzymes predicted to be secreted (predicted to have a signal peptide and/or to be extracellular), the prediction of transmembrane helices, a TonB box (COG0810 and PS00430), a SORT domain and/or cell wall binding suggests that these enzymes may be associated with the membrane or cell wall (Table 6). Of these proteins, 8 GH enzymes are predicted to have both a CBM and cell-attachment capability, potentially giving cells direct access to the substrates they degrade. Finally, one particular gene product, Cphy1775 (SLH-GH*-CBM32-CBM32), has a predicted SLH domain (pfam00395) that would anchor it to the cell wall and two immunoglobulin-like folds (CBM32) that probably behave as CBM domains, binding the cell to its substrate. Other GH enzymes may be anchored to the cell surface by other, unknown mechanisms. Cells may adhere to one another through different domains such as pfam07705 (CARDB, cell adhesion domain in bacteria) and pfam01391 (collagen triple helix repeat). Biofilm formation may also play a role in the coordinated degradation of plant cell wall polysaccharides.

Degradation linked to uptake and phosphorylation through a wide range of inducible ABC transporters. Although a phosphoenolpyruvate-dependent phosphotransferase system was found in the genome, preliminary expression data indicate that it may not be expressed when cells grow on any of the major components of the plant cell wall (such as xylan, cellulose, cellobiose or glucose). More specifically, the significant number (137) of proteins with ABC_tran (pfam00005) domains in the genome is consistent with carbohydrate uptake by ATP-binding cassette (ABC) transporters. Consistently, C. phytofermentans has a significantly higher number (21) of solute-binding domains (SBP_bac_1, pfam01547) than its Clostridium relatives (Table 25); these domains are typically associated with uptake ABC transporters and allow specific binding of different solutes. This suggests a need for affinity for different kinds of solutes, which is consistent with the hypothesis that C. phytofermentans can take up multiple oligosaccharides. Finally, the polysaccharide ABC transporter LplB (COG4209) domain, a permease component of some ABC transporters, is over-represented (20) in C. phytofermentans compared with other bacteria of the genus Clostridium (Table 25).

The number and diversity of these domains may allow the uptake of a wide range of oligosaccharides (Table 25). C. phytofermentans devotes 0.5% of its genes to this function, whereas the similarly versatile C. saccharolyticus devotes only 0.2%, suggesting that C. saccharolyticus takes up more sugars of the same type. Another notable feature is the presence of GH genes (53 of 109) adjacent to 41 ABC-type transporters, together with regulatory proteins, indicating coupled uptake and degradation and specific regulation (Table 7). The significant number and diversity of GH94 (cellobiose phosphorylase/cellodextrin phosphorylase) and GH65 (maltose phosphorylase) enzymes (Table 25) is consistent with the hypothesis that a broad range of oligosaccharide types enters the cell. The location of 4 of the 5 membrane-bound GH94 cellobiose/cellodextrin phosphorylases close to ABC transporters is consistent with transport of cellobiose and cellodextrins by ABC proteins, which is also the case in C. cellulolyticum (Desvaux, M., Guedon, E. & Petitdemange, H. Cellulose catabolism by Clostridium cellulolyticum growing in batch culture on defined medium. Appl. Environ. Microbiol. 66, 2461-).

There is also a high number of β-glucosidases (8 GH3) active against cellobiose or xylobiose. C. phytofermentans feeds oligosaccharides into its metabolism, possibly through energetically favorable phosphorolysis by cellobiose/cellodextrin phosphorylases or through energy-consuming hydrolysis by β-glucosidases. It is likely that the cellodextrin concentration and the availability of other growth substrates (e.g., cellulose or cellobiose) help determine the fate of cellodextrins and the relative importance of phosphorolytic and hydrolytic cleavage. Given the widespread occurrence of both phosphorolytic and hydrolytic pathways of cellodextrin metabolism in cellulolytic microorganisms, it is possible that this apparent redundancy has selective value. Adjusting the relative flux through these two pathways may allow the microorganism to modulate its rate of ATP production in response to environmental factors (e.g., substrate availability). C. phytofermentans also takes up monosaccharides such as xylose, as evidenced by the presence of 9 XylF proteins predicted to take up xylose (Table 25).
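The energetic difference between the two routes can be made concrete with a simple accounting sketch. It rests on simplifying assumptions that are not stated in the text: each free glucose costs one ATP to phosphorylate, glucose-1-phosphate released by phosphorolysis enters glycolysis without further ATP input, and transport costs are ignored. The function name is hypothetical.

```python
def atp_to_activate(n_glucose_units, route):
    """ATP spent to convert one cellodextrin of n glucose units into hexose phosphates.

    Simplifying assumptions (not from the source): every free glucose costs one ATP
    (kinase step), and glucose-1-phosphate released by phosphorolysis enters
    glycolysis without further ATP input.
    """
    if n_glucose_units < 2:
        raise ValueError("a cellodextrin has at least two glucose units")
    if route == "hydrolysis":
        # All n units are released as free glucose.
        return n_glucose_units
    if route == "phosphorolysis":
        # n - 1 units leave as glucose-1-phosphate; one free glucose remains.
        return 1
    raise ValueError(f"unknown route: {route}")

for n in (2, 3, 4):
    saved = atp_to_activate(n, "hydrolysis") - atp_to_activate(n, "phosphorolysis")
    print(f"n={n}: phosphorolysis saves {saved} ATP per oligosaccharide")
```

Under these assumptions the saving grows with chain length, which is one way to read the suggestion that shifting flux between the two pathways changes the cell's ATP balance.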

Fine-tuning of carbohydrate metabolism. C. phytofermentans has a large number of AraC (70) and PurR (23) transcriptional regulators compared with its Clostridium relatives (Table 25). Prokaryotic transcriptional regulatory proteins are classified into families based on sequence similarity and on structural and functional criteria. AraC regulatory proteins typically activate transcription of genes involved in carbon metabolism, stress responses and pathogenesis (Ramos, J.L. et al., "The TetR family of transcriptional repressors," Microbiol. Mol. Biol. Rev. 69, 326-356 (2005)). PurR belongs to the lactose repressor (lac) family, and the gene product generally acts as a repressor, with physiological concentrations of the ligand causing dissociation of the PurR-DNA complex (Id.). The abundance of these regulatory proteins is consistent with an extensive range of substrate utilization and a complex regulatory network.

Of the 23 genes with significant similarity to PurR, 8 were adjacent to ABC transporters clustered with the GH enzyme. Of the 70 genes with significant similarity to AraC, 20 were found to be close to ABC transporters clustered with the GH enzyme. Of the 41 ABC transporters found clustered with the GH enzyme, 20 were adjacent to one AraC and 8 were close to PurR. Based on these findings, we hypothesized that they regulate the expression of the respective genomic regions (table 7).
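The adjacency counts above can be sketched as a simple neighborhood search over locus numbers. This is only an illustration: the locus numbers below are hypothetical and are not taken from Table 7, and the window size is an assumption standing in for whatever adjacency criterion was actually applied.

```python
def count_near(query_loci, target_loci, window):
    """Count query genes lying within `window` locus numbers of any target gene."""
    return sum(
        1 for q in query_loci
        if any(abs(q - t) <= window for t in target_loci)
    )

# Hypothetical locus numbers, for illustration only (not taken from Table 7).
abc_transporters = [100, 101, 102, 450, 451, 900]
arac_regulators = [99, 300, 452, 1500]
purr_regulators = [103, 2000]

# AraC-like genes near ABC transporter genes.
print(count_near(arac_regulators, abc_transporters, window=2))
# PurR-like genes near ABC transporter genes.
print(count_near(purr_regulators, abc_transporters, window=2))
```

Using consecutive locus numbers as a proxy for physical adjacency is a convenience; a more careful analysis would use genome coordinates and strand information.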

Example 4 detection of hydrolase Activity

Various methods of detecting the biological activity of a predicted hydrolase may be used. In one embodiment, genes identified in C. phytofermentans that are predicted to encode hydrolases are transformed into a microorganism such as E. coli. The activity of the expressed gene is measured as follows: the transformed microorganism is supplied with a substrate of the predicted hydrolase, the consumption of substrate and the increase in hydrolysis products are measured, and this activity is compared with the level of activity in an untransformed control microorganism. In some embodiments, the expression vector is designed for extracellular expression of the predicted hydrolase. An increase in substrate hydrolysis may indicate that the predicted hydrolase is in fact a hydrolase.

Example 5 detection of ABC transporter Activity

A variety of methods for detecting the biological activity of a predicted ABC transporter can be used. In one embodiment, genes identified in C. phytofermentans that are predicted to encode ABC transporters are transformed into a microorganism such as E. coli. The activity of the expressed gene is measured as follows: the transformed microorganism is supplied with the substrate of the predicted ABC transporter, the transport of the substrate into the cell is measured, and the uptake level is compared with the uptake level in an untransformed control microorganism. An increase in uptake may indicate that the predicted ABC transporter is in fact an ABC transporter.

Example 6 detection of transcriptional regulatory protein Activity

Various methods of detecting the biological activity of a predicted transcriptional regulator protein may be used. In one embodiment, genes identified in C. phytofermentans that are predicted to encode transcriptional regulator proteins are transformed into a microorganism such as E. coli. The activity of the expressed gene is detected by co-transforming the microorganism with a plasmid containing a target nucleic acid sequence of the transcriptional regulator protein and a reporter gene. The activity of the reporter gene is detected and compared with the level of activity of the same reporter gene in a control microorganism. An increase in reporter gene activity indicates that the predicted transcriptional regulator protein may in fact be a transcriptional regulator protein.

Example 7 engineering of E. coli to utilize cellobiose

E. coli laboratory strains and natural isolates do not express functional cellobiose-utilization genes, although they characteristically carry cryptic cellobiose-utilization genes on their chromosome (Hall et al., J. Bacteriol., June 1987; 169: 2713-2717). E. coli is engineered to utilize cellobiose through the expression of Cphy2464-2466 and Cphy0430, where Cphy2464-2466 encodes an ABC transporter and Cphy0430 encodes a cellobiose phosphorylase that converts cellobiose to glucose and glucose-1-phosphate. The Cphy2464-2466 and Cphy0430 genes are expressed from constitutive promoters on plasmids. The signal sequence of Cphy2466 is replaced by the signal sequence of an endogenous E. coli ABC transporter periplasmic binding protein to direct the protein to the periplasm. The engineered E. coli is able to grow using cellobiose as the sole carbon source.

Example 8 modification of S. cerevisiae for improved pectin breakdown

Cphy1714, Cphy1720 and Cphy3586 are cloned into E. coli-S. cerevisiae shuttle vectors and expressed heterologously from plasmids in S. cerevisiae. For secretion of the gene products, the native signal sequence is replaced by an S. cerevisiae signal sequence. The modified yeast exhibits improved pectin degradation.

Example 9 microbial improvement

pIMPCphy

The pIMPCphy vector was constructed as a shuttle vector for C. phytofermentans. It has an ampicillin resistance cassette and an origin of replication (ori) for selection and replication in E. coli. It also has a Gram-positive origin of replication, allowing replication of the plasmid in C. phytofermentans. To select for the presence of the plasmid, the pIMPCphy vector carries an erythromycin resistance gene under the control of the promoter of the Cphy1029 gene of C. phytofermentans. The plasmid is transferred to C. phytofermentans by electrotransformation or by conjugative transfer from an E. coli strain carrying a helper plasmid (e.g. pRK 2020). The plasmid map of pIMPCphy is depicted in FIG. 18.

Constitutive promoter

In the first step, several promoters from C. phytofermentans are chosen. Promoter elements are selected from key genes that are likely to be constitutively expressed (e.g., ribosomal genes, or genes for ethanol production such as alcohol dehydrogenase). Examples of such genes, whose promoters may be used, include, but are not limited to:

Cphy_1029: iron-containing alcohol dehydrogenase

Cphy_3510: Ig domain-containing protein

Cphy_3925: bifunctional acetaldehyde-CoA/alcohol dehydrogenase

Cloning of promoters

The different promoters in the upstream regions of the genes were amplified by PCR. The primers for this PCR reaction were chosen so that the amplified fragment contains the promoter region but not the ribosome binding site of the downstream gene. The primers were designed to introduce, at the ends of the promoter fragment, restriction sites that are present in the multiple cloning site of pIMPCphy but absent from the promoter region itself, such as SalI, BamHI, XmaI, SmaI, EcoRI.
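The site-selection rule described above (choose enzymes that do not cut inside the promoter fragment) can be sketched as a short check. The recognition sequences are the standard ones for the enzymes named in the text; the promoter fragment and the helper names are hypothetical and serve only as an illustration.

```python
# Recognition sequences of the enzymes named in the text (5'->3', all palindromic,
# so searching one strand is sufficient).
SITES = {
    "SalI": "GTCGAC",
    "BamHI": "GGATCC",
    "XmaI": "CCCGGG",
    "SmaI": "CCCGGG",
    "EcoRI": "GAATTC",
}

def usable_enzymes(promoter_seq):
    """Return enzymes whose recognition site does not occur inside the promoter fragment."""
    seq = promoter_seq.upper()
    return [name for name, site in SITES.items() if site not in seq]

def add_sites(promoter_seq, left_enzyme, right_enzyme):
    """Flank the fragment with two sites, as the primer 5' tails would after PCR."""
    return SITES[left_enzyme] + promoter_seq.upper() + SITES[right_enzyme]

# Hypothetical promoter fragment containing an internal BamHI site.
promoter = "ttgacaatataggatccttatataat"
print(usable_enzymes(promoter))
print(add_sites(promoter, "SalI", "EcoRI"))
```

In practice a few extra bases are usually added 5' of each site so that the enzyme cuts efficiently near the end of the PCR product; that detail is omitted here.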

The PCR reaction was performed with a commercially available PCR kit, GoTaq™ Green Master Mix (Promega), according to the manufacturer's conditions. The reaction was carried out in a Gene Amp System 24 thermal cycler (Perkin Elmer). The PCR product was purified with the GenElute™ PCR Clean-Up Kit (Sigma). The purified PCR product and the plasmid pIMPCphy were then digested with the relevant enzymes in appropriate amounts according to the manufacturer's conditions (restriction enzymes from New England Biolabs and Promega). The PCR product and plasmid were then run on a Recovery FlashGel™ (Lonza) and gel purified. The PCR product was then ligated to the plasmid using the Quick Ligation Kit (New England Biolabs), E. coli (DH5α) competent cells were transformed with the ligation mix, and the cells were plated on LB plates containing 1 μg/ml ampicillin. The plates were incubated overnight at 37 ℃.

Ampicillin-resistant E. coli clones were picked from the plates and restreaked on fresh selection plates. After growth at 37 ℃, a single colony was inoculated into liquid LB medium containing 1 μg/ml ampicillin and grown overnight at 37 ℃. Plasmid was isolated from the liquid culture using the GenElute Plasmid Mini-prep kit.

Correct insertion into the plasmid was checked by PCR with appropriate primers and by restriction digestion with the appropriate enzymes. To ensure sequence integrity, the insert was sequenced at this step.

Cloning of cellulase Gene

One or more cellulase genes, each including its own ribosome binding site, were amplified by PCR and subsequently digested with the appropriate enzymes as described above for promoter cloning. The resulting plasmid was also treated with the corresponding restriction enzymes, and the amplified gene was inserted into the plasmid by standard ligation. The pCphyP3510-3367 plasmid (FIG. 19; SEQ ID NO: 1) was formed by ligating Cphy_3367 downstream of the Cphy_3510 promoter. The plasmid was transformed into E. coli, and correct inserts were confirmed in transformants selected on selection plates.

Conjugative transfer

The different plasmids described above were transformed into E. coli DH5α carrying the helper plasmid pRK 2030. After overnight growth at 37 ℃, E. coli colonies carrying both plasmids were selected on LB plates containing 1 μg/ml ampicillin and 50 μg/ml kanamycin. Single colonies were obtained after restreaking on selection plates at 37 ℃. Growth media for E. coli (e.g., LB, or LB supplemented with 1% glucose and 1% cellobiose) were inoculated with single colonies and grown overnight either aerobically at 37 ℃ or anaerobically at 35 ℃. Fresh growth medium was inoculated 1:1 with the overnight culture and grown to mid-log phase. C. phytofermentans strains were also grown to mid-log phase in the same medium.

The two cultures, C. phytofermentans and E. coli carrying pRK 2030 and one of the plasmids, were mixed in different ratios, e.g. 1:10, 1:1, 10:1. Mating was carried out at 35 ℃ in liquid medium, on plates, or on 25 mm Nuclepore™ Track-Etch membranes (Whatman). The mating time was between 2 and 24 hours, and the mating medium was the same medium used to grow the cultures before mating. After mating, the bacterial mixture was either plated directly or first grown in liquid medium for 6-18 hours and then plated. The plates contained 10 μg/ml erythromycin as the selection agent for C. phytofermentans, and 10 μg/ml trimethoprim, 150 μg/ml cyclosporine and 1 μg/ml nalidixic acid as counter-selection agents against E. coli.

After 3-5 days of incubation at 35 ℃, erythromycin resistant colonies were picked from the plates and restreaked on fresh selection plates. Single colonies were picked and the presence of plasmid was determined by PCR reaction.

Cellulase Gene expression

The expression of the cellulase genes on the different plasmids was then examined under conditions in which the corresponding genes at the chromosomal locus show little or no expression. A positive result shows constitutive expression of the cloned cellulase.

SEQ ID NO: 1

Plasmid pCphyP3510-3367

1 ccgggaattc actggccgtc gttttacaac gtcgtgactg ggaaaaccct ggcgttaccc

61 aacttaatcg ccttgcagca catccccctt tcgccagctg gcgtaatagc gaagaggccc

121 gcaccgatcg cccttcccaa cagttgcgca gcctgaatgg cgaatggcgc ctgatgcggt

181 attttctcct tacgcatctg tgcggtattt cacaccgcat atggtgcact ctcagtacaa

241 tctgctctga tgccgcatag ttaagccagc cccgacaccc gccaacaccc gctgacgcgc

301 cctgacgggc ttgtctgctc ccggcatccg cttacagaca agctgtgacc gtctccggga

361 gctgcatgtg tcagaggttt tcaccgtcat caccgaaacg cgcgagacga aagggcctcg

421 tgatacgcct atttttatag gttaatgtca tgataataat ggtttcttag acgtcaggtg

481 gcacttttcg gggaaatgtg cgcggaaccc ctatttgttt atttttctaa atacattcaa

541 atatgtatcc gctcatgaga caataaccct gataaatgct tcaataatat tgaaaaagga

601 agagtatgag tattcaacat ttccgtgtcg cccttattcc cttttttgcg gcattttgcc

661 ttcctgtttt tgctcaccca gaaacgctgg tgaaagtaaa agatgctgaa gatcagttgg

721 gtgcacgagt gggttacatc gaactggatc tcaacagcgg taagatcctt gagagttttc

781 gccccgaaga acgttttcca atgatgagca cttttaaagt tctgctatgt ggcgcggtat

841 tatcccgtat tgacgccggg caagagcaac tcggtcgccg catacactat tctcagaatg

901 acttggttga gtactcacca gtcacagaaa agcatcttac ggatggcatg acagtaagag

961 aattatgcag tgctgccata accatgagtg ataacactgc ggccaactta cttctgacaa

1021 cgatcggagg accgaaggag ctaaccgctt ttttgcacaa catgggggat catgtaactc

1081 gccttgatcg ttgggaaccg gagctgaatg aagccatacc aaacgacgag cgtgacacca

1141 cgatgcctgt agcaatggca acaacgttgc gcaaactatt aactggcgaa ctacttactc

1201 tagcttcccg gcaacaatta atagactgga tggaggcgga taaagttgca ggaccacttc

1261 tgcgctcggc ccttccggct ggctggttta ttgctgataa atctggagcc ggtgagcgtg

1321 ggtctcgcgg tatcattgca gcactggggc cagatggtaa gccctcccgt atcgtagtta

1381 tctacacgac ggggagtcag gcaactatgg atgaacgaaa tagacagatc gctgagatag

1441 gtgcctcact gattaagcat tggtaactgt cagaccaagt ttactcatat atactttaga

1501 ttgatttaaa acttcatttt taatttaaaa ggatctaggt gaagatcctt tttgataatc

1561 tcatgaccaa aatcccttaa cgtgagtttt cgttccactg agcgtcagac cccgtagaaa

1621 agatcaaagg atcttcttga gatccttttt ttctgcgcgt aatctgctgc ttgcaaacaa

1681 aaaaaccacc gctaccagcg gtggtttgtt tgccggatca agagctacca actctttttc

1741 cgaaggtaac tggcttcagc agagcgcaga taccaaatac tgtccttcta gtgtagccgt

1801 agttaggcca ccacttcaag aactctgtag caccgcctac atacctcgct ctgctaatcc

1861 tgttaccagt ggctgctgcc agtggcgata agtcgtgtct taccgggttg gactcaagac

1921 gatagttacc ggataaggcg cagcggtcgg gctgaacggg gggttcgtgc acacagccca

1981 gcttggagcg aacgacctac accgaactga gatacctaca gcgtgagcta tgagaaagcg

2041 ccacgcttcc cgaagggaga aaggcggaca ggtatccggt aagcggcagg gtcggaacag

2101 gagagcgcac gagggagctt ccagggggaa acgcctggta tctttatagt cctgtcgggt

2161 ttcgccacct ctgacttgag cgtcgatttt tgtgatgctc gtcagggggg cggagcctat

2221 ggaaaaacgc cagcaacgcg gcctttttac ggttcctggc cttttgctgg ccttttgctc

2281 acatgttctt tcctgcgtta tcccctgatt ctgtggataa ccgtattacc gcctttgagt

2341 gagctgatac cgctcgccgc agccgaacga ccgagcgcag cgagtcagtg agcgaggaag

2401 cggaagagcg cccaatacgc aaaccgcctc tccccgcgcg ttggccgatt cattaatgca

2461 gctggcacga caggtttccc gactggaaag cgggcagtga gcgcaacgca attaatgtga

2521 gttagctcac tcattaggca ccccaggctt tacactttat gcttccggct cgtatgttgt

2581 gtggaattgt gagcggataa caatttcaca caggaaacag ctatgaccat gattacgcca

2641 aagctttggc taacacacac gccattccaa ccaatagttt tctcggcata aagccatgct

2701 ctgacgctta aatgcactaa tgccttaaaa aaacattaaa gtctaacaca ctagacttat

2761 ttacttcgta attaagtcgt taaaccgtgt gctctacgac caaaagtata aaacctttaa

2821 gaactttctt ttttcttgta aaaaaagaaa ctagataaat ctctcatatc ttttattcaa

2881 taatcgcatc agattgcagt ataaatttaa cgatcactca tcatgttcat atttatcaga

2941 gctccttata ttttatttcg atttatttgt tatttattta acatttttct attgacctca

3001 tcttttctat gtgttattct tttgttaatt gtttacaaat aatctacgat acatagaagg

3061 aggaaaaact agtatactag tatgaacgag aaaaatataa aacacagtca aaactttatt

3121 acttcaaaac ataatataga taaaataatg acaaatataa gattaaatga acatgataat

3181 atctttgaaa tcggctcagg aaaagggcat tttacccttg aattagtaca gaggtgtaat

3241 ttcgtaactg ccattgaaat agaccataaa ttatgcaaaa ctacagaaaa taaacttgtt

3301 gatcacgata atttccaagt tttaaacaag gatatattgc agtttaaatt tcctaaaaac

3361 caatcctata aaatatttgg taatatacct tataacataa gtacggatat aatacgcaaa

3421 attgtttttg atagtatagc tgatgagatt tatttaatcg tggaatacgg gtttgctaaa

3481 agattattaa atacaaaacg ctcattggca ttatttttaa tggcagaagt tgatatttct

3541 atattaagta tggttccaag agaatatttt catcctaaac ctaaagtgaa tagctcactt

3601 atcagattaa atagaaaaaa atcaagaata tcacacaaag ataaacagaa gtataattat

3661 ttcgttatga aatgggttaa caaagaatac aagaaaatat ttacaaaaaa tcaatttaac

3721 aattccttaa aacatgcagg aattgacgat ttaaacaata ttagctttga acaattctta

3781 tctcttttca atagctataa attatttaat aagtaagtta agggatgcat aaactgcatc

3841 ccttaacttg tttttcgtgt acctattttt tgtgaatcga tccggccagc ctcgcagagc

3901 aggattcccg ttgagcaccg ccaggtgcga ataagggaca gtgaagaagg aacacccgct

3961 cgcgggtggg cctacttcac ctatcctgcc cggatcgatt atgtcttttg cgcattcact

4021 tcttttctat ataaatatga gcgaagcgaa taagcgtcgg aaaagcagca aaaagtttcc

4081 tttttgctgt tggagcatgg gggttcaggg ggtgcagtat ctgacgtcaa tgccgagcga

4141 aagcgagccg aagggtagca tttacgttag ataaccccct gatatgctcc gacgctttat

4201 atagaaaaga agattcaact aggtaaaatc ttaatatagg ttgagatgat aaggtttata

4261 aggaatttgt ttgttctaat ttttcactca ttttgttcta atttctttta acaaatgttc

4321 tttttttttt agaacagtta tgatatagtt agaatagttt aaaataagga gtgagaaaaa

4381 gatgaaagaa agatatggaa cagtctataa aggctctcag aggctcatag acgaagaaag

4441 tggagaagtc atagaggtag acaagttata ccgtaaacaa acgtctggta acttcgtaaa

4501 ggcatatata gtgcaattaa taagtatgtt agatatgatt ggcggaaaaa aacttaaaat

4561 cgttaactat atcctagata atgtccactt aagtaacaat acaatgatag ctacaacaag

4621 agaaatagca aaagctacag gaacaagtct acaaacagta ataacaacac ttaaaatctt

4681 agaagaagga aatattataa aaagaaaaac tggagtatta atgttaaacc ctgaactact

4741 aatgagaggc gacgaccaaa aacaaaaata cctcttactc gaatttggga actttgagca

4801 agaggcaaat gaaatagatt gacctcccaa taacaccacg tagttattgg gaggtcaatc

4861 tatgaaatgc gattaagctt agcttggctg caggtcgaca gacagcataa gtcacatcca

4921 gacaaatgtc ctataggatg ttagtagggg tttggagaat tgcccgtaag gcaggttatt

4981 tggctagata taatcaatcc agttacagga tagtaggatt gcaacccagt cgttttgacc

5041 agtttgtaca agaattttaa tttgtcgaaa tattgtggca aatcaaatga agttctttga

5101 tgaaatgttt agaaacatga cttagaatgg ggtacaaaaa gtgaatttgt aagcaaaaag

5161 acttgacctt tcctacgata gttgttataa tcatcttgtt attggaacga ttatatttac

5221 ttatgcacat tttagagttt ttcgaattgt taatacatca ttaacaattt aattatactc

5281 gttatgtgac gtaagtcaat ataatacaaa accatatatt ttaagccgcg ggcagaaagg

5341 atgagagata tgaaaaagat aataagtctt ttattagtga taacacttct gatatccatg

5401 gcaccatcga aagctgacgc agcggaaacc aattataatt acggagaagc tcttcaaaaa

5461 tcaatcatgt tttatgagtt tcaacgttct ggtaaactgc caagtaccat tcggaataat

5521 tggagaggtg actctggttt aaccgatgga gcagatgttg gtttggatct aactggtggc

5581 tggtatgatg ctggtgatca tgtaaaattt aatcttcctt tggcttatac tgtaacaatg

5641 ttagcatggg cagtatatga agaagaggct actctttcaa aggcaggcca attaagttat

5701 ttattagatg aaattaagtg gtctagtgat tacctaatta aatgtcatcc acaagcaaat

5761 gtattttatt atcaggttgg taatggaaat acagatcact cttggtgggg acctgctgaa

5821 gttatgcaga tggctagacc gtcctataag gttgatttaa ataacccagg ttctactgta

5881 gtaggagaag cagcagcagc tcttgcagca acagcactta tatataagac aaaagaccct

5941 acttattcag caacttgcct tcgtcatgca aaagagcttt ttaattttgc agatacaaca

6001 aaaagcgatg ctggatatac agcagcaagt gggttctata cttcctatag tggattttat

6061 gatgaattat cctgggcagc tacatggatt taccttgcaa gtggagaagc gacctatttg

6121 gataaggcag aatcttatgt agccaaatgg ggaacagaac ctcaatcttc cacattaagt

6181 tataagtggg cacaaaactg ggatgatgtt cactatggtg cagctttatt attagcaaga

6241 attacaaata aagcaattta taagaacaat attgaaatgc atcttgacta ttggactaca

6301 gggtataatg gtagtcgtat tacttataca ccaaaaggac ttgcttggtt agattcctgg

6361 ggtgcattaa gatatgcgac gacaacagca tttctagcaa gtgtttatgc tgattggagc

6421 ggatgtagtg ctggaaaagt tagtacttac aatgcatttg cgaaacagca ggtagattat

6481 gcattaggaa gtaccggaag aagttttgtg gttggatatg gtgtaaattc tccaacaaga

6541 cctcatcata gaactgctca tagttcatgg gcagacagtc agacggagcc aaattaccat

6601 agacacacca tttatggtgc tttagtaggt ggacctggta ataatgatag ttatgaggat

6661 aacattaata attatgtaaa caatgaaatc gcttgtgact ataatgcagg ttttgttggc

6721 gcattggcta aagtttataa aacatatggc ggaacaccaa ttgcaaactt taaggcaatc

6781 gaaacagtaa caaacgatga gttatttatt caagctggta ttaatgcctc tggtccatct

6841 tttatcgaag taaaggcatt ggttttcaat gagacaggtt ggccagctcg tgttaccgat

6901 aaattatcct ttaagtattt tattgatatc tcggaatatg tagcaaaggg atatacaaag

6961 aatgatttta cggtatcgac aaattataac aatggagcaa ccacatcggc attgcttcct

7021 tgggatgctg cgaataatat ctattatgtg aatgtagact tctctggaac taagatttat

7081 cctggtggac agtctgcata taagaaagaa gtacaattta gaattgctgg tccacaaaac

7141 gttaatatat gggacaattc caatgactac tcctttacac aaattgctaa tgttagttca

7201 ggaaataccg taaagaccac atatatacca ttgtatgata atggtaaatt agtatttggt

7261 aatgagccaa agacgggtgt tccttctgca agtcttgata agactacagc aaactttgac

7321 aaaaacccag ctgtatccgc agatatacca gtaaccatta actataatgg taatacatta

7381 acagcggtta agaatggaac aacggtttta acgaaaggta ctgattatac tgtatctggt

7441 aatgtagtaa cgttatctaa gaattatttc ttagcacaga gcgctagtac ggttacttta

7501 acatttgtat ttagtggcgg taacgatgca acattaaaag tgactttagt agatacttct

7561 ccaagtgcat ccattaatcc aaattctgct gtctttgata aggctagcgg aaaacaggaa

7621 aatatagtta ttacgcttac accaaatggc aataccttag ctggacttaa gaatgggtct

7681 aagagcctgg taactggaac tgattatacc gtttccggaa caacagtgac gattctatct

7741 tcttatttaa gtcaatttgc agtaggaagt caatctattg tatttgaaat gaataaaggg

7801 acaaatccag tcttagcagt taccattaag gattcttctg ttgttactcc aacaggaaat

7861 attaaacttc aaatgtttaa tggaaattct tctgcaacaa cgaatggcat tgcaccaaga

7921 attaaattaa ttaacaccgg aactactgca atcaacttat ccgatgttaa gattcgctat

7981 tattatacaa tcaatggcga aaaggatcag gcattctggt gtgattattc gacgattggt

8041 agttccaatg taaatggtac tttcgtaaag atgagtacac caaaaacaaa tgcagattac

8101 tatctagaat tttcatttaa gtccgctgcc ggaactttaa acgcagggca aagtattgaa

8161 gttcaaggaa gattttctaa ggtagactgg acaaactata cacaaacaga tgattattcg

8221 tttggtgata gtaactcaag ttatgctgat tggaataaga caacagtata tatctctgat

8281 gttttggttt ggggagtcga accataatag gagaaaaaat gtaataattt ttagaggggt

8341 cataacttag tatacatgtc tgtatatgag gtccgacacg tgccacacgg catgtgtcgg

8401 gcctcatttt tatacagcgt gtatgtgacc ttattcatga caagggatcg tccgcc

Other embodiments

The above description discloses several methods and materials of the present invention. The invention is susceptible to modifications in the methods and materials, as well as to alterations in the preparation methods and equipment. Such modifications will become apparent to those skilled in the art from a consideration of this disclosure or from practice of the invention disclosed herein. Consequently, it is not intended that the invention be limited to the specific embodiments disclosed herein, but that it cover all modifications and alternatives falling within the true scope and spirit of the invention as embodied in the appended claims.

Claims

1. A product for biofuel production, comprising: lignocellulosic biomass and a microorganism capable of directly hydrolyzing and fermenting said biomass, wherein said microorganism is modified to increase the activity of one or more cellulases.

2. The product of claim 1, wherein the microorganism is capable of direct fermentation of five and six carbon sugars.

3. The product of claim 2, wherein the microorganism is a bacterium.

4. The product of claim 1, wherein the microorganism is a Clostridium species.

5. The product of claim 1, wherein the microorganism is Clostridium phytofermentans.

6. The product of claim 1, wherein the microorganism is non-recombinant or recombinant.

7. The product of claim 2, wherein said microorganism comprises one or more heterologous polynucleotides capable of increasing the activity of said one or more cellulases.

8. A product for biofuel production, comprising: a carbonaceous biomass and a microorganism capable of directly hydrolyzing and fermenting said biomass, wherein said microorganism is modified to increase the activity of one or more cellulase enzymes.

9. The product of claim 8, wherein the microorganism is capable of producing a fermentation end product, a substantial portion of which is ethanol.

10. The product of claim 8, wherein the microorganism is capable of producing a fermentation end product comprising lactic acid, acetic acid and formic acid.

11. The product of claim 8, wherein the microorganism is capable of absorbing one or more complex carbohydrates.

12. The product of claim 8, wherein the biomass comprises a higher concentration of oligomeric carbohydrates than monomeric carbohydrates.

13. The product of claim 8, wherein said one or more cellulases are selected from the group consisting of Cphy3367, Cphy3368, Cphy0218, Cphy3207, Cphy2058 and Cphy1163.

14. The product of claim 8, wherein said one or more cellulase enzymes is Cphy3367.

15. A method for producing a biofuel, the method comprising:

(a) contacting a carbon-containing biomass with a microorganism capable of directly hydrolyzing and fermenting the carbon-containing biomass, wherein the microorganism is modified to increase the activity of one or more cellulases; and

(b) allowing sufficient time for the hydrolysis and fermentation to produce a biofuel.

16. The method of claim 15, wherein the one or more cellulases are selected from the group consisting of Cphy3367, Cphy3368, Cphy0218, Cphy3207, Cphy2058 and Cphy1163.

17. The method of claim 15, wherein the microorganism is capable of absorbing one or more complex carbohydrates.

18. The method of claim 15, wherein the biomass comprises a higher concentration of oligomeric carbohydrates than monomeric carbohydrates.

19. The method of claim 18, wherein the hydrolysis produces a higher concentration of cellobiose and/or larger oligomers than monomeric carbohydrates.

20. The method of claim 15, wherein the one or more cellulase enzymes is Cphy3367.

21. According to SEQ ID NO: 1.