
WO2014089797A1 - Locked nucleic acid-modified dna fragment for high-throughput sequencing - Google Patents


Info

Publication number
WO2014089797A1
WO2014089797A1 PCT/CN2012/086521 CN2012086521W WO2014089797A1 WO 2014089797 A1 WO2014089797 A1 WO 2014089797A1 CN 2012086521 W CN2012086521 W CN 2012086521W WO 2014089797 A1 WO2014089797 A1 WO 2014089797A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
primer
nucleic acid
locked nucleic
chain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2012/086521
Other languages
French (fr)
Chinese (zh)
Inventor
龚梅花
章文蔚
李计广
朱鹏远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BGI Technology Solutions Co Ltd
Original Assignee
BGI Technology Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BGI Technology Solutions Co Ltd filed Critical BGI Technology Solutions Co Ltd
Priority to PCT/CN2012/086521 priority Critical patent/WO2014089797A1/en
Publication of WO2014089797A1 publication Critical patent/WO2014089797A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00: Structure or type of the nucleic acid
    • C12N2310/30: Chemical structure
    • C12N2310/32: Chemical structure of the sugar
    • C12N2310/323: Chemical structure of the sugar modified ring structure
    • C12N2310/3231: Chemical structure of the sugar modified ring structure having an additional ring, e.g. LNA, ENA

Definitions

  • the present invention relates to high throughput sequencing related techniques, and in particular to LNA modified DNA fragments for high throughput sequencing, including DNA linkers, PCR primers and/or sequencing primers.
  • the present invention also relates to a high-throughput sequencing method comprising a DNA library preparation and sequencing method for high-throughput sequencing, the DNA library preparation and sequencing method comprising the step of using an LNA-modified DNA fragment.

Background art

  • Locked nucleic acid (LNA) is a synthetic antisense oligonucleotide, a special bicyclic nucleotide derivative in which the 2'-oxygen and 4'-carbon of the ribose ring (β-D-ribofuranose) of the nucleotide residue form a methylene linkage by condensation. The structure contains one or more 2'-O,4'-C-methylene-β-D-ribofuranosyl nucleotide monomers: the 2'-O position and the 4'-C position of ribose form an oxymethylene bridge, a thiomethylene bridge or an aminomethylene bridge through different condensation reactions and are connected into a ring.
  • this ring bridge locks the furanose in the C3'-endo (N-type) conformation, which reduces the flexibility of the ribose structure and increases the stability of the local structure of the phosphate backbone. Because LNA and DNA/RNA have the same phosphate backbone, LNA has good recognition ability and strong affinity for DNA and RNA (Li Shengmao, Xu Xiang, Liang Huaping, Research progress in locked nucleic acid, Progress in Physiological Sciences, 2003, 34(4), 319-323).
  • the present invention includes the following aspects:
  • a first aspect of the invention relates to a locked nucleic acid-modified DNA fragment for high-throughput sequencing, wherein the DNA fragment is selected from one, two or three of a linker, a PCR primer and a sequencing primer, characterized in that the linker, PCR primer and/or sequencing primer in the DNA fragment contains a locked nucleic acid.
  • the high throughput sequencing refers to SOLEXA sequencing.
  • the DNA fragment is a linker.
  • the DNA fragment is a PCR primer.
  • the DNA fragment is a sequencing primer.
  • the DNA fragment is a linker and a PCR primer.
  • the DNA fragment is a PCR primer and a sequencing primer.
  • in another embodiment of the invention, the DNA fragment is a linker, a PCR primer, and a sequencing primer.
  • the locked nucleic acid contained in the linker is located at the 5th nucleotide from the 5' end of the linker F chain and at the 3rd nucleotide from the 3' end of the linker R chain; the number of locked nucleic acids in each of the F chain and the R chain of the linker is one.
  • the sequence of the linker F chain is the sequence shown in SEQ ID NO: 3; in a specific embodiment of the invention, the sequence of the linker R chain is the sequence shown in SEQ ID NO: 4.
  • the sense primer contains a locked nucleic acid located near the 5' end of the PCR primer; preferably, near the 5' end refers to the 2nd to 5th (e.g., 2nd, 3rd, 4th, 5th) nucleotides from the 5' end of the primer.
  • the antisense primer contains a locked nucleic acid located near the 3' end of the PCR primer; preferably, near the 3' end refers to the 2nd to 5th (e.g., 2nd, 3rd, 4th, 5th) nucleotides from the 3' end of the primer.
  • the number of locked nucleic acids in the PCR primer is 1 to 3 (for example, 1, 2, 3).
  • the sense primer refers to a PCR primer for amplifying the coding strand in genomic DNA; in one embodiment of the present invention, it is a PCR primer whose partial sequence is identical to the fixed sequence P5 on the chip.
  • the PCR primer whose partial sequence is identical to the immobilized sequence P5 on the chip refers to the PCR Primer PE 1.0 primer.
  • the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 5' end of the PCR Primer PE 1.0 primer, and the number of locked nucleic acids is 1.
  • the antisense primer refers to a PCR primer for amplifying the template strand in genomic DNA; in one embodiment of the invention, it is a PCR primer whose partial sequence is identical to the fixed sequence P7 on the chip.
  • the PCR primer that binds to the immobilized sequence P7 on the chip comprises a tag sequence for distinguishing different sequencing results, so the number of such PCR primers is related to the number of libraries that are mixed and sequenced. In a specific embodiment of the present invention, four different libraries are mixed and sequenced, so the number of tag sequences is four and the number of PCR primers that bind to the immobilized sequence P7 on the chip is four.
  • the tag consists of A/T/C/G, which serves to identify different libraries, allowing different libraries to be mixed and sequenced to take full advantage of sequencing throughput.
  • the tag has a length of 8 nt, such as AGAGACTT, GCGAGGCC, AGATCTCT or TAGAGAGC.
  • tags can be introduced by ligation of a tagged linker or by PCR.
  • PCR primers are used to introduce tags using PCR, allowing different libraries to be labeled with different markers for sequencing.
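As a rough illustration of how 8 nt tags let mixed libraries be separated again after sequencing, the following minimal Python sketch assigns an index read to a library by Hamming distance. The four tag sequences are the examples given in the text; the library names, the function names and the one-mismatch tolerance are assumptions made only for this sketch and are not part of the described method.

```python
# Example tags from the text; the library labels are hypothetical.
TAGS = {
    "AGAGACTT": "library_A",
    "GCGAGGCC": "library_B",
    "AGATCTCT": "library_C",
    "TAGAGAGC": "library_D",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags) -> int:
    """Smallest Hamming distance between any two distinct tags."""
    tags = list(tags)
    return min(hamming(a, b) for i, a in enumerate(tags) for b in tags[i + 1:])


def assign_library(index_read: str, max_mismatches: int = 1):
    """Return the library whose tag matches index_read, or None.

    A read is assigned only if exactly one tag lies within
    max_mismatches (an assumed tolerance, not taken from the text).
    """
    hits = [name for tag, name in TAGS.items()
            if hamming(index_read, tag) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    print(min_pairwise_distance(TAGS))
    print(assign_library("AGAGACTA"))
```

Because every pair of the four example tags differs in at least four positions, a single sequencing error in the tag cannot make a read match two libraries at once.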
  • the PCR primer whose partial sequence is identical to the immobilized sequence P7 on the chip is a PCR Primer PE 2.0 primer, for example, PCR Primer PE 2.0 primers A, B, C and D.
  • the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 3' end of the PCR Primer PE 2.0 primer, and the number of locked nucleic acids is 1.
  • the sequence of the PCR primer is the sequence shown in SEQ ID NO: 6.
  • the sequence of the PCR primer is the sequence shown in SEQ ID NOs: 11-14.
  • the locked nucleic acid contained in the sequencing primer is located near the 3' end of the sequencing primer; preferably, near the 3' end refers to the 2nd to 5th (for example, 2nd, 3rd, 4th, 5th) nucleotides from the 3' end of the sequencing primer.
  • the number of locked nucleic acids in the sequencing primer is 1 to 3 (for example, 1, 2, 3).
  • the sequencing primer is a Read2 sequencing primer.
  • the locked nucleic acid contained in the sequencing primer is located at the 2nd and 4th nucleotides of the sequencing primer near the 3' end, and the number of locked nucleic acids in the sequencing primer is 2.
  • sequence of the sequencing primer is the sequence shown in SEQ ID NO: 16.
  • a second aspect of the invention relates to a composition comprising the DNA fragment of any of the first aspects of the invention.
  • a third aspect of the invention relates to a method for constructing a DNA library, the method comprising the step of performing locked nucleic acid modification on a DNA fragment, the DNA fragment being a linker and/or a PCR primer, and the locked nucleic acid modification meaning that the linker and/or PCR primer in the DNA fragment contains a locked nucleic acid; preferably,
  • the locked nucleic acid contained in the linker is located near the 5' end of the linker F chain and/or near the 3' end of the linker R chain; preferably, near the 5' end of the F chain refers to the 2nd to 5th (for example, 2nd, 3rd, 4th, 5th) nucleotides from the 5' end of the F chain, and near the 3' end of the R chain refers to the 2nd to 5th (for example, 2nd, 3rd, 4th, 5th) nucleotides from the 3' end of the R chain; preferably, the number of locked nucleic acids in the F chain or R chain of the linker is 1 to 3 (for example, 1, 2, 3);
  • the sense primer contains a locked nucleic acid located near the 5' end of the PCR primer, where, preferably, near the 5' end refers to the 2nd to 5th (for example, 2nd, 3rd, 4th, 5th) nucleotides from the 5' end of the primer; and/or the antisense primer contains a locked nucleic acid located near the 3' end of the PCR primer, where, preferably, near the 3' end refers to the 2nd to 5th (for example, 2nd, 3rd, 4th, 5th) nucleotides from the 3' end of the primer; further preferably, the number of locked nucleic acids in the PCR primer is 1 to 3 (for example, 1, 2, 3).
  • the locked nucleic acid contained in the linker is located at the 5th nucleotide from the 5' end of the F chain and at the 3rd nucleotide from the 3' end of the R chain, and the number of locked nucleic acids in each of the F chain and the R chain is one.
  • the sequence of the linker F chain is the sequence shown in SEQ ID NO: 3; in a specific embodiment of the invention, the sequence of the linker R chain is the sequence shown in SEQ ID NO: 4.
  • the sense primer refers to a PCR primer for amplifying the coding strand in genomic DNA; in one embodiment of the present invention, it is a PCR primer whose partial sequence is identical to the fixed sequence P5 on the chip.
  • the PCR primer whose partial sequence is identical to the immobilized sequence P5 on the chip refers to the PCR Primer PE 1.0 primer.
  • the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 5' end of the PCR Primer PE 1.0 primer, and the number of locked nucleic acids is 1.
  • the antisense primer refers to a PCR primer for amplifying the template strand in genomic DNA; in one embodiment of the invention, it is a PCR primer whose partial sequence is identical to the fixed sequence P7 on the chip.
  • the PCR primers that bind to the immobilized sequence P7 on the chip comprise a tag sequence for distinguishing between different sequencing results, such that the number of PCR primers that bind to the immobilized sequence P7 on the chip is the same as the number of tag sequences. In a specific embodiment of the present invention, the number of tag sequences is four, and therefore the number of PCR primers that bind to the immobilized sequence P7 on the chip is four.
  • the PCR primer whose partial sequence is identical to the immobilized sequence P7 on the chip is a PCR Primer PE 2.0 primer, for example, PCR Primer PE 2.0 primers A, B, C and D.
  • the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 3' end of the PCR Primer PE 2.0 primer, and the number of locked nucleic acids is 1.
  • the sequence of the PCR primer is the sequence shown in SEQ ID NO: 6.
  • the sequence of the PCR primer is the sequence shown in SEQ ID NOs: 11-14.
  • a fourth aspect of the invention relates to a method of sequencing a DNA library, the method comprising the step of sequencing with a sequencing primer modified with a locked nucleic acid.
  • the locked nucleic acid contained in the sequencing primer is located near the 3' end of the sequencing primer; preferably, near the 3' end refers to the 2nd to 5th (for example, 2nd, 3rd, 4th, 5th) nucleotides from the 3' end of the sequencing primer; preferably, the number of locked nucleic acids in the sequencing primer is 1 to 3 (for example, 1, 2, 3).
  • the sequencing primer is a Read2 sequencing primer.
  • the locked nucleic acid contained in the sequencing primer is located at the 2nd and 4th nucleotides of the sequencing primer near the 3' end, and the number of locked nucleic acids in the sequencing primer is 2.
  • the sequence of the sequencing primer is the sequence shown in SEQ ID NO: 16.
  • a fifth aspect of the present invention relates to a high-throughput sequencing method, which comprises a method for constructing a DNA library and a method for sequencing a DNA library, the method for constructing the DNA library being the method according to any one of the third aspect of the present invention, and the method for sequencing the DNA library being the sequencing method according to any one of the fourth aspect of the invention.
  • a sixth aspect of the invention relates to the use of a DNA fragment according to any of the first aspects of the invention in high throughput sequencing, construction of a DNA library or sequencing of a DNA library.
  • the high throughput sequencing refers to SOLEXA sequencing.
  • The invention is further described below.
  • high-throughput sequencing, also called "next-generation" sequencing technology, is characterized by the ability to determine the sequences of hundreds of thousands to millions of DNA molecules in parallel at a time. Because it makes detailed analysis of the transcriptome and genome of a species possible, it is also known as deep sequencing. It includes but is not limited to:
  • Massively Parallel Signature Sequencing (MPSS)
  • Polony sequencing
  • 454 pyrosequencing
  • Illumina (Solexa) sequencing
  • ABI SOLiD sequencing
  • Ion semiconductor sequencing
  • DNA nanoball sequencing
  • Helicos single-molecule DNA sequencing technology, etc.
  • sequencing-by-synthesis methods, such as SOLEXA sequencing and the various sequencing technologies developed on the basis of Solexa sequencing.
  • SOLEXA sequencing is a next-generation sequencing technology developed by Solexa, and its core idea is sequencing by synthesis. That is, when a new complementary DNA strand is generated, either the incorporated dNTP triggers an enzymatic cascade reaction that causes a substrate to emit fluorescence, or fluorescently labeled dNTPs or semi-degenerate primers are added directly, and a fluorescent signal is released when the complementary strand is synthesized or ligated. Complementary strand sequence information is obtained by capturing the optical signal and converting it into a sequencing peak (Mardis ER, 2008).
  • SOLEXA sequencing includes sequencing of DNA samples and sequencing of RNA samples. Depending on the sequencing method, SOLEXA sequencing can be divided into single-end sequencing (Single-read Sequencing) and double-end sequencing (Paired-end Sequencing and Mate-pair Sequencing). In an embodiment of the invention, the SOLEXA sequencing method is the Paired-end sequencing method.
  • the locked nucleic acid refers to a synthetic antisense oligonucleotide, which is a special bicyclic nucleotide derivative and is also a kind of nucleotide.
  • the 2'-oxygen and 4'-carbon of the ribose ring (β-D-ribofuranose) of the nucleotide residue form a methylene linkage by condensation (see Formula I, where B is a base), and the structure contains one or more 2'-O,4'-C-methylene-β-D-ribofuranosyl nucleotide monomers; the 2'-O position and the 4'-C position of ribose form an oxymethylene bridge, a thiomethylene bridge or an aminomethylene bridge through different condensation reactions and are connected into a ring.
  • this ring bridge locks the furanose in the C3'-endo (N-type) conformation, reduces the flexibility of the ribose structure, and increases the stability of the local structure of the phosphate backbone.
  • the locked nucleic acid modification means that the nucleotide in the DNA fragment is replaced by a locked nucleic acid having the same base.
  • the DNA library refers to a library prepared by extracting genomic DNA from a cell, then breaking it to a size of about 100-1000 bp, and then ligating the linker to the fragment and PCR-amplifying it.
  • the library is used for high throughput sequencing, such as SOLEXA sequencing.
  • the construction of the DNA library refers to a process from the extraction of DNA in a cell to the obtaining of a DNA library.
  • the sequencing of the DNA library refers to a process of sequencing the obtained DNA library to obtain a nucleotide sequence of each fragment in the library.
  • the DNA fragment refers to a small DNA fragment required for high-throughput sequencing such as SOLEXA sequencing, including a linker, a PCR primer, and a sequencing primer.
  • the genomic DNA fragment refers to a fragment obtained by disrupting the extracted genomic DNA.
  • the linker (adaptor) is used in high-throughput sequencing, particularly in SOLEXA sequencing, and specifically refers to a "Y"-type double-stranded DNA fragment added to the ends of the fragmented genomic DNA; it is formed by annealing the F chain and the R chain.
  • the function of the linker is to add a known sequence to design the corresponding primer for PCR.
  • the PCR primer is used in high-throughput sequencing, particularly in SOLEXA sequencing, w.
  • the sequencing primers are used in high-throughput sequencing, particularly SOLEXA sequencing, and specifically refer to primers for sequencing a constructed DNA library.
  • the F chain of the linker refers to the forward oligonucleotide chain of the double-stranded linker; its 5' end sequence is complementary to the 3' end sequence of the R chain, forming a Y-type double-stranded fragment, and after phosphorylation its 5' end can be ligated to the 3' terminal A of the genomic DNA fragment after A-addition.
  • the 3' end sequence of the R chain of the linker is complementary to the 5' end sequence of the F chain, forming a Y-type double-stranded fragment, and its 3' end may be ligated to the 5' end of the genomic DNA fragment.
  • near the 5' or 3' end of a DNA fragment means located within one third of the fragment length from the 5' or 3' end of the fragment.
  • the n-th nucleotide at the 5' or 3' end of a DNA fragment refers to the position of the n-th nucleotide counted from the first nucleotide at the 5' or 3' end of the fragment.
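The position convention above (counting the n-th nucleotide from either the 5' or the 3' end) can be made concrete with a small Python sketch. It assumes sequences are written 5' to 3' and marks an LNA position by prefixing the base with "+"; that notation and the function names are illustrative choices for this sketch only. The example uses the P5 binding sequence quoted in the text (SEQ ID NO: 19), not a full primer.

```python
def lna_index(length: int, n: int, end: str) -> int:
    """0-based index of the n-th nucleotide counted from the given end.

    `end` is "5" or "3"; the sequence is assumed to be written 5' -> 3'.
    """
    if not 1 <= n <= length:
        raise ValueError("n must be between 1 and the sequence length")
    if end == "5":
        return n - 1
    if end == "3":
        return length - n
    raise ValueError('end must be "5" or "3"')


def mark_lna(seq: str, positions, end: str) -> str:
    """Return seq with '+' placed before each LNA-substituted base.

    The base itself is unchanged, matching the statement that an LNA
    replaces a nucleotide with a locked nucleic acid of the same base.
    """
    marked = {lna_index(len(seq), n, end) for n in positions}
    return "".join("+" + b if i in marked else b for i, b in enumerate(seq))


if __name__ == "__main__":
    # 3rd nucleotide from the 5' end of the P5 binding sequence
    print(mark_lna("AATGATACGGCGACCACCGA", [3], "5"))
    # 2nd and 4th nucleotides from the 3' end of a toy sequence
    print(mark_lna("ACGT", [2, 4], "3"))
```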
  • the flowcell refers to a sequencing chip to which a single-stranded oligonucleotide sequence is attached.
  • the fixed sequence P7 refers to a binding sequence on a flowcell, and to the 5' end sequence of the sequencing template in the SBS process.
  • the fixed sequence P5 refers to a binding sequence on a flowcell.
  • the sense primer is also referred to as an upstream primer, and refers to a primer which is identical to the 5' end sequence of the coding strand in the DNA fragment to be amplified.
  • the antisense primer, also referred to as a downstream primer, refers to a primer complementary to the 3' end sequence of the coding strand in the DNA fragment to be amplified.
  • the Read1 sequencing primer refers to the sequencing primer used for reading the 5' terminal sequence of the library during double-end sequencing.
  • the Read2 sequencing primer refers to the sequencing primer used for reading the 3' terminal sequence of the library during double-end sequencing.
  • the nucleotide includes deoxyribonucleotides, ribonucleotides, and also includes a locked nucleic acid.
  • the position of the locked nucleic acid means that the deoxyribonucleotide at that place is replaced by a corresponding locked nucleic acid, that is, by a locked nucleic acid containing the same base.
  • libraries are constructed separately with LNA-modified linkers and PCR primers and with the commonly used small DNA fragments that have not been modified by LNA. After successful library preparation, Solexa high-throughput sequencing is performed, in which the LNA-modified libraries are sequenced with the LNA-modified sequencing primer and the non-LNA-modified libraries are sequenced with the sequencing primers provided by Illumina. The results are compared to verify the stability, reproducibility and reliability of the present invention.
  • the invention combines LNA modification technology with high-throughput sequencing technology. Performing LNA modification on the linker, PCR primer and/or sequencing primer involved in SOLEXA sequencing improves the thermal stability of the DNA fragments and their resistance to enzymatic degradation, can activate RNase H, reduces DNA dimer formation, and improves ligation and PCR efficiency.
  • the present invention optimizes the LNA modification site, and adopts different strategies for modification of different DNA fragments, thereby further improving the effect of LNA modification.
  • the modification sites are located near the 5' end of the F chain of the linker and near the 3' end of the R chain, which can improve the annealing efficiency of the linker and improve the ligation of the linker to the target fragment after the "A-addition" reaction.
  • the LNA modification site of the common PCR Primer PE 1.0 primer is close to the 5' end, which can improve the binding efficiency with the P5 sequence.
  • the PCR Primer PE 2.0 primers (A, B, C, D) have LNA modification sites close to the 3' end, which improves the binding efficiency to the P7 sequence and ultimately makes the binding of the sequence of interest to the immobilized sequence more stable.
  • for LNA modification of the sequencing primer, double modification near the 3' end of the sequencing primer can improve the stability, specificity and sensitivity of the sequencing primer, thereby improving the quality of the entire sequencing run (a single sequencing reaction on the instrument).
  • the present invention performs parallel analysis on the sequencing data of the LNA-modified library and the non-LNA-modified library, and evaluates sequencing quality by the base quality value (Q30%), the sequencing error rate, the GC content, linker contamination, and the genome alignment ratio.
  • Library quality was evaluated by GC distribution, gene coverage, and the like. By comparison, it was found that the library modified with LNA was better than the library without LNA modification, both in library quality and in sequencing quality.
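The GC distribution used for library evaluation is computed per window of the reference sequence. A minimal Python sketch of that windowed GC calculation is shown below; the non-overlapping window scheme, the function names and the toy reference are assumptions for illustration, since the text does not state the window size or the software used.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_by_window(reference: str, window: int):
    """GC fraction of each full, non-overlapping window of the reference."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [gc_fraction(reference[i:i + window])
            for i in range(0, len(reference) - window + 1, window)]


if __name__ == "__main__":
    # Toy reference; a real analysis would use the Acinetobacter genome.
    print(gc_by_window("GGCCAATTGCAT", 4))
```

Plotting the number of reads aligned to each window against that window's GC fraction gives the kind of GC content distribution map described for the figures.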
  • Figure 1 LNA modification protocol for SOLEXA sequencing
  • Figure 2 Schematic diagram of the preparation process of the DNA PE index library
  • Figure 4A Sequencing quality distribution map of the non-LNA-modified library; the abscissa is the cycle number of PE90, the ordinate is the base quality value at each cycle, and the color represents the percentage: white: 0, green: 10%, yellow: 30%, red: 50%, dark red: 70%, black: 100%;
  • Figure 4B Sequencing quality distribution map of the LNA-modified library; the abscissa is the cycle number of PE90, the ordinate is the base quality value at each cycle, and the color represents the percentage: white: 0, green: 10%, yellow: 30%, red: 50%, dark red: 70%, black: 100%;
  • Figure 5 Error rate distribution of library sequencing; the abscissa is the number of cycles, and the ordinate is the error rate (number of erroneous bases per cycle / number of bases in all cycles);
  • GC content distribution map of the sequencing results of the non-LNA-modified library; the abscissa is the GC content of each window on the Acinetobacter reference sequence, and the ordinate is the number of reads aligned to each window;
  • Figure 6 GC content distribution map of the sequencing results of the LNA-modified library; the abscissa is the GC content of each window on the Acinetobacter reference sequence, and the ordinate is the number of reads aligned to each window;
  • Figure 7 Gene coverage map of library sequencing results; the abscissa is the number of times each base is covered, and the ordinate is the number of bases; the light curve (lower peak) represents the non-LNA-modified library, and the dark curve (higher peak) represents the LNA-modified library;
  • Figure 8A Agilent 2100 results for annealing of the unmodified linker
  • Figure 8B Agilent 2100 results for annealing of the LNA-modified linker
  • the box plot is drawn by sorting Q30% of all tiles in each cycle and plotting 5 points as a box plot: highest, lowest, median, quarter, three-quarters;
  • the box plot is drawn in such a way that Q30% of all tiles are sorted and 5 points are drawn into a box plot: highest, lowest, median, quarter, three quarters.
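The five points of the box plot described above (lowest, first quartile, median, third quartile, highest) can be computed as in the following Python sketch. The per-tile Q30% values are made up for the example, and the choice of the "inclusive" quartile method is an assumption; the text does not say which quartile convention was used.

```python
import statistics


def five_number_summary(values):
    """Return (lowest, first quartile, median, third quartile, highest)."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # method="inclusive" treats the data as the whole population of tiles
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


if __name__ == "__main__":
    # Hypothetical Q30% values for the tiles of one cycle
    q30_per_tile = [94.0, 90.0, 98.0, 92.0, 96.0]
    print(five_number_summary(q30_per_tile))
```

A tighter box with a higher median, as reported for the LNA-modified library, corresponds to a smaller spread between the quartiles and larger values overall.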
  • Acinetobacter genomic DNA (genome ~3.6 Mb, GC content 40.4%), about 30 μg, was extracted as the template, and the starting amount for each library was 3 μg. The DNA was fragmented with a Covaris S2 to a main band of 350 bp, and eight DNA PE index libraries with 350 bp inserts were constructed in parallel,
  • 4 of which used LNA-modified linkers and PCR primers, and the other 4 of which used linkers and PCR primers without LNA.
  • Solexa high-throughput sequencing (SBS, sequencing by synthesis)
  • was then carried out, in which the LNA-modified libraries were sequenced with the LNA-modified Read 2 sequencing primer, and the libraries without LNA modification used the sequencing primers provided by Illumina.
  • For the preparation of the 350 bp DNA PE index library, see Figure 1 and Figure 2. The specific steps are as follows:
  • the end-repair product was purified with the QIAquick PCR Purification Kit (QIAGEN) and dissolved in 34 μl of EB buffer.
  • the product was purified with the MinElute PCR Purification Kit (QIAGEN) and dissolved in 12 μl of EB buffer.
  • the underlined parts are the LNA modification sites of the linker, representing the LNA-modified G and T, respectively.
  • the LNA modification sites are close to the 5' end of the F chain and the 3' end of the R chain, which can improve the annealing efficiency of the linker, improve the ligation of the linker to the target fragment after the "A-addition" reaction, and make the binding of the linker to the PCR primer more efficient, specific and sensitive.
  • the ligation product was purified with the QIAquick PCR Purification Kit (QIAGEN) and dissolved in 32 μl of EB buffer.
  • the ligation product from step 4 was run on a 2% agarose gel with a 50 bp DNA Ladder (NEB) at 120 V for 60 minutes. The gel was then cut, with the excised range determined by the size of the linker and the desired fragment size. The gel slices were recovered with the QIAquick Gel Extraction Kit (QIAGEN) and finally dissolved in 23 μl of EB buffer.
  • the PCR Primer PE 1.0 primer (5' end primer, upstream primer) sequence is:
  • the LNA-modified PCR Primer PE 1.0 primer sequence is:
  • the underlined part is the LNA modification site of the linker, indicating the LNA modified T.
  • the sequences of the four unmodified PCR Primer PE 2.0 primers (3' end primers, downstream primers) are:
  • ACGTGTGCTCTTCCGATCT 3' (SEQ ID NO: 8)
  • the four LNA-modified PCR Primer PE 2.0 primer sequences are:
  • the LNA modification sites of the underlined linker represent the LNA modified T, respectively.
  • the partial sequence of the PCR Primer PE 2.0 primer is identical to the fixed sequence P7 on the flowcell, underlined.
  • regarding the LNA modification sites of the PCR primers: the LNA modification site of the common PCR Primer PE 1.0 primer is close to the 5' end, which can improve the binding efficiency to the P5 complementary sequence (DNA library template strand).
  • the LNA modification sites of the PCR Primer PE 2.0 primers (A, B, C, D) are close to the 3' end, which can improve the binding efficiency to the P7 complementary sequence (DNA library coding strand), so that the target sequence is ultimately bound more stably to the fixed sequence.
  • the binding sequence in the fixed sequence P5 is: AATGATACGGCGACCACCGA (SEQ ID NO: 19);
  • the binding sequence in the fixed sequence P7 is: CAAGCAGAAGACGGCATACGA (SEQ ID NO: 20).
  • Take the PCR amplification product obtained in step 6, prepare a 2% agarose gel, select a 50 bp DNA Ladder, run electrophoresis at 120 V for 60 min, and cut the gel. The excised range is determined by the size of the linker and the desired target fragment. The gel slices are recovered with the QIAquick Gel Extraction Kit and finally dissolved in 25 μl.
  • The results of the sequencing information analysis are shown in Tables 1 and 2, which give the sequencing analysis results for one set of libraries constructed in parallel.
  • Table 1 Sequencing information analysis results of the unmodified library
  • the ratio of the ratio of CT to 0 is 1.
  • the ratio is 99.85; the paired reads are only 96.6 inserts large 320 insert size error -12/+15 (%Align) 97.96 ratio is small ( Insert Size
  • the sequencing result is better than the unmodified sequencing result.
  • the Q30 box plot is more concentrated, and the median is also significantly higher, indicating that the base quality of the LNA-modified library is better than that of the unmodified library.
  • Figure 4A shows the sequencing results without LNA modification.
  • Figure 4B shows the sequencing results with LNA modification.
  • the abscissa is the cycle number of PE90, and the ordinate is the base quality value at each cycle.
  • the color represents the percentage (white: 0, green: 10%, yellow: 30%, red: 50%, dark red: 70%, black: 100%). For example, if a certain quality value at a certain cycle is shown in green,
  • bases with that quality value account for 10% of all bases at that cycle. The higher the quality values and the darker the color at those values, the better the quality. It can be seen from Figures 4A and 4B that the LNA-modified library has a significantly better quality value distribution than the library without LNA.
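For readers unfamiliar with base quality values, the sketch below assumes the standard Phred definition, Q = -10 log10(p), where p is the probability that a base call is wrong, so that Q30 corresponds to an error probability of 1 in 1000. Q30% is then taken to be the percentage of bases with a quality value of at least 30. The text does not spell out these definitions, so both are stated here as assumptions, and the function names are illustrative.

```python
def error_probability(q: float) -> float:
    """Error probability implied by a Phred quality value q."""
    return 10 ** (-q / 10)


def q30_percent(quality_values) -> float:
    """Percentage of bases whose quality value is at least 30."""
    quality_values = list(quality_values)
    if not quality_values:
        raise ValueError("no quality values given")
    passing = sum(q >= 30 for q in quality_values)
    return 100.0 * passing / len(quality_values)


if __name__ == "__main__":
    print(error_probability(30))
    # Hypothetical quality values for four bases
    print(q30_percent([40, 35, 30, 20]))
```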
  • Figure 5A shows the sequencing result without LNA modification.
  • Figure 5B shows the sequencing result of LNA modification.
  • the sequencing output data were aligned to the reference genome, with the first 32 bp of each read used as the seed and up to 2 mismatches allowed. For reads whose first 32 bp aligned with no more than 2 mismatches, the eland software was used to calculate, over the alignable reads, the number of erroneous bases at each cycle / the number of bases in all cycles. The abscissa is the cycle number of PE90, and the ordinate is the number of erroneous bases at each cycle / the number of bases in all cycles. The results show that the LNA-modified library has a much lower error rate than the unmodified library.
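A per-cycle error rate of this kind can be sketched in Python as follows. This is not the eland software; it is a simplified stand-in that assumes each read is already paired with the reference segment it aligns to, keeps only reads whose seed has at most the allowed number of mismatches, and divides the mismatches at each cycle by the number of retained bases at that cycle. That per-cycle denominator is one reading of the definition in the text and is an assumption of this sketch.

```python
def mismatches(a: str, b: str) -> int:
    """Number of mismatching positions over the common length of a and b."""
    return sum(x != y for x, y in zip(a, b))


def per_cycle_error_rate(pairs, seed: int = 32, max_mismatches: int = 2):
    """Error rate at each cycle over reads that pass the seed filter.

    `pairs` is a list of (read, reference_segment) strings of equal length.
    """
    kept = [(r, ref) for r, ref in pairs
            if mismatches(r[:seed], ref[:seed]) <= max_mismatches]
    if not kept:
        return []
    n_cycles = min(len(r) for r, _ in kept)
    rates = []
    for c in range(n_cycles):
        errors = sum(r[c] != ref[c] for r, ref in kept)
        rates.append(errors / len(kept))
    return rates


if __name__ == "__main__":
    ref = "ACGT"
    pairs = [("ACGT", ref), ("ACGA", ref), ("TCGT", ref),
             ("ACGT", ref), ("TTTT", ref)]  # last read fails the seed filter
    print(per_cycle_error_rate(pairs, seed=4, max_mismatches=2))
```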
  • the gene coverage distribution is shown in Figure 7.
  • the abscissa is the coverage depth of each base, and the ordinate is the number of bases.
  • the curve conforms to the Poisson distribution. The more concentrated the graph is on the central axis, the more random the coverage is. The figure shows that the coverage randomness is improved after the LNA modification.
  • Table 3: Sequencing information of two sets of parallel libraries
  • Groups a and b are the unmodified libraries, and groups A and B are the LNA-modified libraries.
  • the LNA-modified library had a higher unique rate and map-to-genome rate and a lower duplication rate, and its coverage and depth were better than those of the unmodified library.
  • the F chain and the R chain (SEQ ID NO: 1 and SEQ ID NO: 2, respectively) of an unmodified linker, at equal volume and a concentration of 100 μM, were subjected to gradient annealing to form the linker; after annealing, the product was diluted to 5 ⁇ and tested on an Agilent 2100. The results are shown in Figure 8A.
  • the F chain and the R chain (SEQ ID NO: 3 and SEQ ID NO: 4, respectively) of an LNA-modified linker, at equal volume and a concentration of 100 μM, were subjected to gradient annealing to form the linker, and after annealing, diluted to 1 ⁇
  • the test was carried out on an Agilent 2100, and the results are shown in Figure 8B.
  • the double-stranded products formed by annealing are 80 bp and 82 bp in size, respectively. With LNA modification, the proportion of double-stranded linker formed is 60% and there is only one single-strand peak.
  • without modification, the proportion of double-stranded linker is 57% and there are two single-strand peaks.
  • a set of parallel libraries was prepared according to the method of Example 1 and subjected to SOLEXA high-throughput sequencing, except that the linker added was a linker without LNA modification (SEQ ID NO: 3 and SEQ ID NO: 4).
  • the PCR primers used were PCR primers without LNA modification (SEQ ID NOs: 7-10); only the sequencing primers differed, with LNA-modified and unmodified sequencing primers (SEQ ID NOs: 15-16) used for sequencing. The sequencing results are shown in Tables 4 and 5.
  • RawClusters/Tile: the number of raw DNA clusters per tile; the median over all tiles is reported.
  • PFClusters/Tile: the number of DNA clusters per tile remaining after PF filtering.
  • PF: Illumina's default filter rule. At most one base among the first 25 bases may have poor quality; reads that do not meet this condition are filtered out.
  • %PF: PFClusters/RawClusters. FirstCycleInt: the light intensity of the first cycle.
  • %Phasing: the probability of reaction lag (the proportion of reads that, at the current cycle, are still at the previous cycle).
  • %Prephasing: the probability of reaction advance (the proportion of reads that, at the current cycle, have already advanced to the next cycle).
  • Figures 9A and 9B are box plots of the fq quality value Q30 for read2 sequenced with the LNA-modified or the unmodified sequencing primer. The more concentrated the box plot, the closer the Q30% values are across cycles; the higher the median in the box plot, the better the base quality.
  • the read2 Q30 box plot obtained with the LNA-modified sequencing primer is more concentrated than that obtained with the unmodified sequencing primer, and its median is also higher, indicating that read2 sequenced with the LNA-modified primer has better base quality than read2 sequenced with the unmodified primer.
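The run metrics described above can be written as short calculations. The following is a minimal Python sketch, not the ELAND or Illumina implementation: the function names are hypothetical, the "poor quality" decision for each base is assumed to have been made upstream and is passed in as boolean flags, and the per-cycle error rate follows the definition given above (erroneous bases in a cycle divided by the bases in all cycles).

```python
from typing import List, Sequence


def percent_pf(pf_clusters: int, raw_clusters: int) -> float:
    """%PF = PFClusters / RawClusters, expressed as a percentage."""
    if raw_clusters <= 0:
        raise ValueError("raw_clusters must be positive")
    return 100.0 * pf_clusters / raw_clusters


def passes_pf(bad_quality_flags: Sequence[bool]) -> bool:
    """Apply the rule as stated in the text: at most one poor-quality base
    among the first 25 bases. How a base is judged 'poor' is an assumption
    left to the caller."""
    return sum(1 for bad in bad_quality_flags[:25] if bad) <= 1


def per_cycle_error_rate(errors_per_cycle: Sequence[int],
                         total_bases: int) -> List[float]:
    """Erroneous bases in each cycle divided by the bases in all cycles."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return [errors / total_bases for errors in errors_per_cycle]
```

For a PE90 run, `errors_per_cycle` would hold one count per cycle, and `total_bases` would be the number of alignable reads multiplied by the number of cycles.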

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Disclosed is a locked nucleic acid-modified DNA fragment for high-throughput sequencing. The DNA fragment is selected from one, two or three of a linker, a PCR primer and a sequencing primer, and the linker, PCR primer and sequencing primer in the DNA fragment contain locked nucleic acids. Also disclosed are a high-throughput sequencing method, a DNA library construction method and a DNA library sequencing method, comprising a step of performing locked nucleic acid modification on a linker, a PCR primer and/or a sequencing primer. Also disclosed is the use of the DNA fragment in high-throughput sequencing, DNA library construction or DNA library sequencing.

Description

Locked nucleic acid-modified DNA fragment for high-throughput sequencing

Technical Field

The present invention relates to high-throughput sequencing technology, and in particular to LNA-modified DNA fragments for high-throughput sequencing, the DNA fragments including linkers, PCR primers and/or sequencing primers. The present invention also relates to a high-throughput sequencing method that includes DNA library preparation and sequencing methods for high-throughput sequencing, the DNA library preparation and sequencing methods comprising a step of using an LNA-modified DNA fragment.

Background

Locked nucleic acid (LNA) is a synthetic antisense oligonucleotide and a special bicyclic nucleotide derivative in which the 2'-oxygen and the 4'-carbon of the ribose ring (β-D-ribofuranose) of a nucleotide residue are joined by a methylene linkage formed through condensation. The structure contains one or more 2'-O,4'-C-methylene-β-D-ribofuranosyl nucleotide monomers. The 2'-O and 4'-C positions of the ribose form an oxymethylene, thiomethylene or aminomethylene bridge through different condensation reactions and close into a ring. This ring bridge locks the furanose in the C3'-endo N-type conformation, which reduces the flexibility of the ribose structure and increases the stability of the local structure of the phosphate backbone. Because LNA has the same phosphate backbone structure as DNA/RNA, it recognizes DNA and RNA well and binds them with high affinity (Li Shengmao, Xu Xiang, Liang Huaping, et al., Research progress on locked nucleic acids, Progress in Physiological Sciences, 2003, 34(4), 319-323).

With the application and development of high-throughput sequencing, DNA sequencing technology has steadily matured. Commonly used DNA library construction techniques readily produce linker and primer dimers, which lower library quality and increase the amount of invalid data. Few reports so far describe applying LNA technology directly to high-throughput sequencing. A 2010 study (Reduction of non-insert sequence reads by dimer eliminator LNA oligonucleotide for small RNA deep sequencing, BioTechniques, 49:751-755 (October 2010), doi 10.2144/000113516) used an LNA probe to remove self-ligated fragments from small RNA libraries and thereby reduce the proportion of non-insert data in the library. Whether LNA technology can be combined with high-throughput DNA sequencing to improve library construction efficiency and sequencing accuracy is therefore a problem that urgently needs to be solved in this field.

Summary of the Invention

Through persistent effort, the inventors surprisingly found that applying locked nucleic acid (LNA) modification to the linkers, PCR primers and/or sequencing primers used in high-throughput sequencing can greatly improve the quality of high-throughput sequencing, for example the construction quality and/or the sequencing quality of a DNA library. The present invention was completed on the basis of this finding. Specifically, the present invention includes the following aspects:

A first aspect of the invention relates to a locked nucleic acid-modified DNA fragment for high-throughput sequencing, wherein the DNA fragment is selected from one, two or three of a linker, a PCR primer and a sequencing primer, characterized in that the linker, PCR primer and/or sequencing primer in the DNA fragment contains a locked nucleic acid.

In one embodiment of the invention, the high-throughput sequencing is SOLEXA sequencing.

In another embodiment of the invention, the DNA fragment is a PCR primer.

In another embodiment of the invention, the DNA fragment is a sequencing primer.

In another embodiment of the invention, the DNA fragment is a linker and a PCR primer.

In another embodiment of the invention, the DNA fragment is a PCR primer and a sequencing primer.

In another embodiment of the invention, the DNA fragment is a linker, a PCR primer and a sequencing primer.

In the DNA fragment according to any item of the first aspect of the invention, the locked nucleic acid modification contained in the linker is located near the 5' end of the linker F chain and/or near the 3' end of the linker R chain. Preferably, "near the 5' end of the linker F chain" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 5' end of the F chain, and "near the 3' end of the linker R chain" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 3' end of the R chain. Preferably, the F chain or R chain of the linker contains 1 to 3 (for example 1, 2 or 3) locked nucleic acids.

In one embodiment of the invention, the locked nucleic acid contained in the linker is located at the 5th nucleotide from the 5' end of the linker F chain and at the 3rd nucleotide from the 3' end of the linker R chain, and the F chain and the R chain of the linker each contain 1 locked nucleic acid.

In a specific embodiment of the invention, the sequence of the linker F chain is as shown in SEQ ID NO: 3; in a specific embodiment of the invention, the sequence of the linker R chain is as shown in SEQ ID NO: 4.

In the DNA fragment according to any item of the first aspect of the invention, among the PCR primers, the locked nucleic acid contained in the sense primer (upstream primer) is located near the 5' end of that PCR primer; preferably, "near the 5' end" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 5' end of the primer. Additionally or alternatively, the locked nucleic acid contained in the antisense primer (downstream primer) is located near the 3' end of that PCR primer; preferably, "near the 3' end" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 3' end of the primer. Further preferably, the PCR primer contains 1 to 3 (for example 1, 2 or 3) locked nucleic acids.

In embodiments of the invention, the sense primer is the PCR primer used to amplify the coding strand of genomic DNA; in one embodiment of the invention, it is a PCR primer part of whose sequence is identical to the fixed sequence P5 on the chip.

In a specific embodiment of the invention, the PCR primer part of whose sequence is identical to the fixed sequence P5 on the chip is the PCR Primer PE 1.0 primer.

In one embodiment of the invention, the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 5' end of the PCR Primer PE 1.0 primer, and the number of locked nucleic acids is 1.

In embodiments of the invention, the antisense primer is the PCR primer used to amplify the template strand of genomic DNA; in one embodiment of the invention, it is a PCR primer part of whose sequence is identical to the fixed sequence P7 on the chip.

In a specific embodiment of the invention, the PCR primer that binds to the fixed sequence P7 on the chip contains a tag sequence, which is used to distinguish different sequencing results; the number of such primers therefore depends on the number of libraries pooled for sequencing. In a specific embodiment of the invention, four different libraries are pooled for sequencing, so there are 4 tag sequences and hence 4 PCR primers that bind to the fixed sequence P7 on the chip.

In embodiments of the invention, the tag consists of A/T/C/G and serves to identify different libraries, so that different libraries can be pooled for sequencing and the sequencing throughput fully used. In a specific embodiment of the invention, the tag is 8 nt long, for example AGAGACTT, GCGAGGCC, AGATCTCT or TAGAGAGC. During library construction, the tag can be introduced by ligating a tagged linker or by PCR. In a specific embodiment of the invention, the tag is incorporated when the PCR primers are synthesized and introduced by PCR, so that different libraries carry different identifiers and can be pooled for sequencing.
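To illustrate how 8-nt tags let pooled libraries be separated after sequencing, the following minimal Python sketch assigns an index read to a library by comparing it with the four example tags. It is an illustration under stated assumptions, not the demultiplexing software used in the patent: the library names, the function names and the one-mismatch tolerance are hypothetical, and the index read is assumed to be in the same orientation as the tags listed above.

```python
from typing import Optional

# The four example 8-nt tags from the text; the library names are hypothetical.
TAGS = {
    "AGAGACTT": "library_1",
    "GCGAGGCC": "library_2",
    "AGATCTCT": "library_3",
    "TAGAGAGC": "library_4",
}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_library(index_read: str, max_mismatches: int = 1) -> Optional[str]:
    """Return the library whose tag matches the index read within the allowed
    number of mismatches, or None if no tag or more than one tag matches."""
    hits = [name for tag, name in TAGS.items()
            if hamming(index_read, tag) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```

Reads whose index matches no tag, or matches more than one, are left unassigned rather than guessed.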

In a specific embodiment of the invention, the PCR primer part of whose sequence is identical to the fixed sequence P7 on the chip is a PCR Primer PE2.0 primer, for example PCR Primer PE2.0 primer A, B, C or D.

In a specific embodiment of the invention, the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 3' end of the PCR Primer PE2.0 primer, and the number of locked nucleic acids is 1.

In a specific embodiment of the invention, the sequence of the PCR primer is as shown in SEQ ID NO: 6.

In a specific embodiment of the invention, the sequence of the PCR primer is as shown in SEQ ID NOs: 11-14.

In the DNA fragment according to any item of the first aspect of the invention, the locked nucleic acid contained in the sequencing primer is located near the 3' end of the sequencing primer. Preferably, "near the 3' end of the sequencing primer" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 3' end of the sequencing primer. Preferably, the sequencing primer contains 1 to 3 (for example 1, 2 or 3) locked nucleic acids.

In embodiments of the invention, the sequencing primer is the Read2 sequencing primer.

In one embodiment of the invention, the locked nucleic acids contained in the sequencing primer are located at the 2nd and 4th nucleotides from the 3' end of the sequencing primer, and the sequencing primer contains 2 locked nucleic acids.

In a specific embodiment of the invention, the sequence of the sequencing primer is as shown in SEQ ID NO: 16.

A second aspect of the invention relates to a composition containing the DNA fragment according to any item of the first aspect of the invention.

A third aspect of the invention relates to a method for constructing a DNA library, the method comprising a step of performing locked nucleic acid modification on a DNA fragment, wherein the DNA fragment is a linker and/or a PCR primer, and locked nucleic acid modification means that the linker and/or PCR primer in the DNA fragment contains a locked nucleic acid; preferably,

the locked nucleic acid contained in the linker is located near the 5' end of the linker F chain and/or near the 3' end of the linker R chain; preferably, "near the 5' end of the linker F chain" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 5' end of the F chain, and "near the 3' end of the linker R chain" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 3' end of the R chain; preferably, the F chain or R chain of the linker contains 1 to 3 (for example 1, 2 or 3) locked nucleic acids;

among the PCR primers, the locked nucleic acid contained in the sense primer is located near the 5' end of that PCR primer, preferably at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 5' end of the primer; and/or the locked nucleic acid contained in the antisense primer is located near the 3' end of that PCR primer, preferably at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 3' end of the primer; further preferably, the PCR primer contains 1 to 3 (for example 1, 2 or 3) locked nucleic acids.

In one embodiment of the invention, the locked nucleic acid contained in the linker is located at the 5th nucleotide from the 5' end of the F chain and at the 3rd nucleotide from the 3' end of the R chain, and the F chain and the R chain each contain 1 locked nucleic acid.

In a specific embodiment of the invention, the sequence of the linker F chain is as shown in SEQ ID NO: 3; in a specific embodiment of the invention, the sequence of the linker R chain is as shown in SEQ ID NO: 4.

In embodiments of the invention, the sense primer is the PCR primer used to amplify the coding strand of genomic DNA; in one embodiment of the invention, it is a PCR primer part of whose sequence is identical to the fixed sequence P5 on the chip.

In a specific embodiment of the invention, the PCR primer part of whose sequence binds to the fixed sequence P5 on the chip is the PCR Primer PE 1.0 primer.

In one embodiment of the invention, the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 5' end of the PCR Primer PE 1.0 primer, and the number of locked nucleic acids is 1.

In embodiments of the invention, the antisense primer is the PCR primer used to amplify the template strand of genomic DNA; in one embodiment of the invention, it is a PCR primer part of whose sequence is identical to the fixed sequence P7 on the chip.

In a specific embodiment of the invention, the PCR primer that binds to the fixed sequence P7 on the chip contains a tag sequence, which is used to distinguish different sequencing results; the number of PCR primers that bind to the fixed sequence P7 on the chip is therefore the same as the number of tag sequences. In a specific embodiment of the invention, there are 4 tag sequences and hence 4 PCR primers that bind to the fixed sequence P7 on the chip.

The number of tag sequences and their position in the PCR primer are well known in the art.

In a specific embodiment of the invention, the PCR primer part of whose sequence is identical to the fixed sequence P7 on the chip is a PCR Primer PE2.0 primer, for example PCR Primer PE2.0 primer A, B, C or D.

In a specific embodiment of the invention, the locked nucleic acid contained in the PCR primer is located at the 3rd nucleotide from the 3' end of the PCR Primer PE2.0 primer, and the number of locked nucleic acids is 1.

In a specific embodiment of the invention, the sequence of the PCR primer is as shown in SEQ ID NO: 6.

In a specific embodiment of the invention, the sequence of the PCR primer is as shown in SEQ ID NOs: 11-14.

A fourth aspect of the invention relates to a method for sequencing a DNA library, the method comprising sequencing with a locked nucleic acid-modified sequencing primer; preferably,

the locked nucleic acid contained in the sequencing primer is located near the 3' end of the sequencing primer; preferably, "near the 3' end of the sequencing primer" means at the 2nd to 5th (for example the 2nd, 3rd, 4th or 5th) nucleotide from the 3' end of the sequencing primer; preferably, the sequencing primer contains 1 to 3 (for example 1, 2 or 3) locked nucleic acids.

In embodiments of the invention, the sequencing primer is the Read2 sequencing primer.

In one embodiment of the invention, the locked nucleic acids contained in the sequencing primer are located at the 2nd and 4th nucleotides from the 3' end of the sequencing primer, and the sequencing primer contains 2 locked nucleic acids.

In a specific embodiment of the invention, the sequence of the sequencing primer is as shown in SEQ ID NO: 16.

A fifth aspect of the invention relates to a high-throughput sequencing method comprising a method for constructing a DNA library and a method for sequencing a DNA library, wherein the construction method is the construction method according to any item of the third aspect of the invention and the sequencing method is the sequencing method according to any item of the fourth aspect of the invention.

A sixth aspect of the invention relates to the use of the DNA fragment according to any item of the first aspect of the invention in high-throughput sequencing, DNA library construction or DNA library sequencing.

In one embodiment of the invention, the high-throughput sequencing is SOLEXA sequencing.

The invention is further described below.

In the present invention, high-throughput sequencing, also called "next-generation" sequencing technology, is characterized by the ability to sequence hundreds of thousands to millions of DNA molecules in parallel in a single run. Because it makes a detailed, comprehensive analysis of the transcriptome and genome of a species possible, it is also called deep sequencing. It includes, but is not limited to, Massively Parallel Signature Sequencing (MPSS), polony sequencing, 454 pyrosequencing, Illumina (Solexa) sequencing, ABI SOLiD sequencing, ion semiconductor sequencing, DNA nanoball sequencing and the single-molecule DNA sequencing technology of Helicos. Sequencing by synthesis is preferred, for example SOLEXA sequencing and the various sequencing technologies developed from Solexa sequencing.

In the present invention, SOLEXA sequencing is a next-generation sequencing technology developed by Solexa whose core idea is sequencing by synthesis. That is, when a new complementary DNA strand is generated, either the incorporated dNTP triggers an enzymatic cascade that catalyzes a substrate to emit fluorescence, or fluorescently labeled dNTPs or semi-degenerate primers are added directly, so that a fluorescent signal is released as the complementary strand is synthesized or ligated. The optical signal is captured and converted into a sequencing peak, yielding the sequence of the complementary strand (Mardis ER (2008), 387-402).

SOLEXA sequencing covers the sequencing of both DNA samples and RNA samples. Depending on the method, it can be divided into single-end sequencing (single-read sequencing) and paired-end sequencing (paired-end sequencing and mate-pair sequencing). In embodiments of the invention, the SOLEXA sequencing method is paired-end sequencing.

In the present invention, a locked nucleic acid is a synthetic antisense oligonucleotide and a special bicyclic nucleotide derivative, and is itself a kind of nucleotide. The 2'-oxygen and the 4'-carbon of the ribose ring (β-D-ribofuranose) of the nucleotide residue are joined by a methylene linkage formed through condensation (see Formula I, where B is a base). The structure contains one or more 2'-O,4'-C-methylene-β-D-ribofuranosyl nucleotide monomers. The 2'-O and 4'-C positions of the ribose form an oxymethylene, thiomethylene or aminomethylene bridge through different condensation reactions and close into a ring. This ring bridge locks the furanose in the C3'-endo N-type conformation, which reduces the flexibility of the ribose structure and increases the stability of the local structure of the phosphate backbone.

Figure imgf000007_0001
In the present invention, locked nucleic acid modification means that a nucleotide in the DNA fragment is replaced by a locked nucleic acid having the same base.

In the present invention, the DNA library is a library prepared by extracting genomic DNA from cells, fragmenting it to about 100-1000 bp, ligating linkers to the fragments and amplifying by PCR; the library is used for high-throughput sequencing, for example SOLEXA sequencing.

In the present invention, construction of the DNA library is the process from extracting DNA from cells to obtaining the DNA library.

In the present invention, sequencing of the DNA library is the process of sequencing the obtained DNA library to obtain the nucleotide sequence of each fragment in the library.

In the present invention, the DNA fragment is a small DNA molecule required in high-throughput sequencing such as SOLEXA sequencing, including linkers, PCR primers, sequencing primers and the like.

In the present invention, the genomic DNA fragment is a fragment obtained by fragmenting extracted genomic DNA.

In the present invention, the linker (adapter) is used in high-throughput sequencing, in particular SOLEXA sequencing. Specifically, it is a "Y"-shaped double-stranded DNA fragment added to the ends of the fragmented genomic DNA and formed by annealing the F chain and the R chain of the linker. The linker adds a known sequence so that corresponding primers can be designed for PCR.

在本发明中, 所述 PCR引物用于高通量测序特别是 SOLEXA测序中, 具 w 。 In the present invention, the PCR primer is used in high-throughput sequencing, particularly in SOLEXA sequencing, w.

在本发明中, 所述测序引物用于高通量测序特别是 SOLEXA测序中, 具 体是指用于对构建的 DNA文库进行测序的引物。  In the present invention, the sequencing primers are used in high-throughput sequencing, particularly SOLEXA sequencing, and specifically refer to primers for sequencing a constructed DNA library.

在本发明中, 所述接头的 F链是指双链接头的正向寡核苷酸链, 其 5,端序 列与 R链 3,端序列互补, 形成 Y型双链片段, 其 5,端经过磷酸化后可以与加 A后的基因组 DNA片段的 3,端 A连接。  In the present invention, the F chain of the linker refers to a forward oligonucleotide chain of a double-linker, wherein the 5, end sequence is complementary to the R chain 3, the end sequence, forming a Y-type double-stranded fragment, 5, After phosphorylation, it can be linked to the 3, end A of the genomic DNA fragment after addition of A.

在本发明中, 所述接头的 R链的 3,端序列与 F链 5,端序列互补, 形成 Y 型双链片段, 其 3,端可以与基因组 DNA片段的 5,端连接。  In the present invention, the 3' end sequence of the R chain of the linker is complementary to the F chain 5, the end sequence to form a Y-type double-stranded fragment, and the 3' end may be ligated to the 5' end of the genomic DNA fragment.

在本发明中 , 所述靠近某一种 DNA片段的 5,或 3,端是指位于该片段的 5, 或 3,端的三分之一长度内。  In the present invention, the 5, or 3, terminal near the DNA fragment is located within a third of the length of the 5, or 3, end of the fragment.

在本发明中, 所述位于某一种 DNA片段 5,或 3,端的第 n个核苷酸, 是指 从该片段的 5,或 3,端的第一个核苷酸开始计算,所得到的第 n个核苷酸的位置。  In the present invention, the n-th nucleotide located at the 5, or 3, end of a DNA fragment is calculated from the first nucleotide of the 5, or 3, end of the fragment. The position of the nth nucleotide.
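The two positional definitions above (counting the n-th nucleotide from either end, and "near" an end meaning within one-third of the fragment length) can be sketched as follows. This is a minimal illustration, not part of the disclosure: the function names and the 12-nt example sequence are hypothetical, and the sequence is assumed to be written 5' to 3'.

```python
def nth_from_end(seq, n, end):
    """Return the n-th nucleotide counted from the given end (n starts at 1).

    seq is written 5' to 3'; end is "5'" or "3'".
    """
    if end not in ("5'", "3'"):
        raise ValueError("end must be 5' or 3'")
    if not 1 <= n <= len(seq):
        raise ValueError("n is outside the fragment")
    return seq[n - 1] if end == "5'" else seq[-n]


def is_near_end(seq_len, n):
    """True if position n (counted from an end) lies within one-third of the length."""
    return 1 <= n and 3 * n <= seq_len


# Hypothetical 12-nt fragment used only for illustration.
example = "ACGTTGCAAGCT"
first_5p = nth_from_end(example, 1, "5'")
third_3p = nth_from_end(example, 3, "3'")
```

With a 12-nt fragment, positions 1 to 4 counted from either end would qualify as "near" that end under this reading.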

In the present invention, the chip (flowcell) refers to the sequencing chip, whose surface carries a layer of single-stranded oligonucleotide sequences.

In the present invention, the fixed sequence P7 refers to a binding sequence on the flowcell, which is the 5'-end sequence of the template in the sequencing-by-synthesis (SBS) process.

In the present invention, the fixed sequence P5 refers to a binding sequence on the flowcell.

In the present invention, the sense primer, also called the upstream primer, refers to a primer identical to the 5'-end sequence of the coding strand of the DNA fragment to be amplified.

In the present invention, the antisense primer, also called the downstream primer, refers to a primer complementary to the 3'-end sequence of the coding strand of the DNA fragment to be amplified.
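The sense and antisense primer definitions above can be illustrated with a short sketch. The coding-strand sequence and primer length below are hypothetical and chosen only to show the relationship: the sense primer copies the 5' end of the coding strand, and the antisense primer is the reverse complement of its 3' end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(coding_strand, length):
    """Return (sense, antisense) primers of the given length, both written 5' to 3'."""
    sense = coding_strand[:length]                           # identical to the 5' end
    antisense = reverse_complement(coding_strand[-length:])  # complementary to the 3' end
    return sense, antisense


# Hypothetical coding strand, for illustration only.
sense, antisense = primer_pair("ATGCGTACCTGAAGT", 5)
```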

In the present invention, the Read 1 sequencing primer refers to the sequencing primer used in paired-end sequencing to read, by synthesis, the 5'-end sequence of the library.

In the present invention, the Read 2 sequencing primer refers to the sequencing primer used in paired-end sequencing to read, by synthesis, the 3'-end sequence of the library.

In the present invention, nucleotides include deoxyribonucleotides and ribonucleotides, and also locked nucleic acids.

In the present invention, the position of a locked nucleic acid means that the deoxyribonucleotide at that position is replaced by the corresponding locked nucleic acid, that is, by a locked nucleic acid containing the same base.

In one embodiment of the present invention, libraries using LNA-modified adapters and PCR primers and conventional small-fragment DNA libraries without LNA modification are constructed separately. After successful library preparation, the libraries are sequenced by Solexa high-throughput sequencing: the LNA-modified libraries are sequenced with LNA-modified sequencing primers, and the libraries without LNA modification are sequenced with the sequencing primers supplied by Illumina. The results are compared to verify the stability, reproducibility, and reliability of the invention. Repeated experiments on the same sample show that the data obtained with the invention are reliable and that, compared with the prior art, both the library quality and the sequencing quality of the LNA-modified libraries are improved.

The invention combines LNA modification with high-throughput sequencing. By applying LNA modification to the adapters, PCR primers, and/or sequencing primers used in SOLEXA sequencing, it increases the thermal stability of the DNA fragments and their stability against enzymatic degradation, and can also activate RNase H, thereby reducing the formation of DNA dimers and improving ligation and PCR efficiency.

In addition, the invention optimizes the LNA modification sites and applies a different strategy to each type of DNA fragment, further improving the effect of LNA modification. For the adapter, the modification sites are located at the 5' end of the F strand and the 3' end of the R strand, which improves adapter annealing efficiency and improves ligation of the adapter to the A-tailed target fragment, making the binding of adapter and PCR primer more efficient, specific, and sensitive. For the PCR primers, the LNA modification site of the common PCR Primer PE 1.0 is placed near the 5' end, which improves binding efficiency to the P5 sequence, and the LNA modification sites of the PCR Primer PE 2.0 primers (A, B, C, D) are placed near the 3' end, which improves binding efficiency to the P7 sequence, so that the target sequence ultimately binds the fixed sequence more stably. For the sequencing primer, a double modification is applied at the 3' end, which improves the stability, specificity, and sensitivity of the sequencing primer and thus the quality of the entire sequencing run (a single on-instrument sequencing reaction).

The invention analyzes the sequencing data of the LNA-modified and non-LNA-modified libraries in parallel. Sequencing quality is evaluated by base quality (%Q30), sequencing error rate, GC content, adapter contamination, and genome alignment rate; library quality is evaluated by GC distribution, gene coverage, and similar measures. The comparison shows that libraries prepared with LNA modification are better than libraries without LNA modification in both library quality and sequencing quality.

Brief Description of the Drawings

Figure 1: LNA modification scheme for SOLEXA sequencing;

Figure 2: Schematic workflow for preparing the DNA PE index library;

Figure 3A: Q30 box plot of fq quality values for the sequencing results of the non-LNA-modified library. The abscissa is the cycle number of the PE90 run and the ordinate is Q30% (Q-64>=30, i.e. the percentage of qualified bases). In each cycle, the Q30% values of all tiles are sorted and five points are drawn as a box plot: maximum, minimum, median, first quartile, and third quartile.

Figure 3B: Q30 box plot of fq quality values for the sequencing results of the LNA-modified library. The abscissa is the cycle number of the PE90 run and the ordinate is Q30% (Q-64>=30, i.e. the percentage of qualified bases). In each cycle, the Q30% values of all tiles are sorted and five points are drawn as a box plot: maximum, minimum, median, first quartile, and third quartile.

Figure 4A: Sequencing quality distribution of the non-LNA-modified library. The abscissa is the cycle number of the PE90 run and the ordinate is the base quality value in each cycle. Colors represent different percentages: white, 0; green, 10%; yellow, 30%; red, 50%; dark red, 70%; black, 100%;

Figure 4B: Sequencing quality distribution of the LNA-modified library. The abscissa is the cycle number of the PE90 run and the ordinate is the base quality value in each cycle. Colors represent different percentages: white, 0; green, 10%; yellow, 30%; red, 50%; dark red, 70%; black, 100%;

Figure 5A: Base error rate distribution for the sequencing results of the non-LNA-modified library. The abscissa is the cycle number of the PE90 run and the ordinate is the error rate (number of erroneous bases in each cycle / number of bases in all cycles);

Figure 5B: Base error rate distribution for the sequencing results of the LNA-modified library. The abscissa is the cycle number of the PE90 run and the ordinate is the error rate (number of erroneous bases in each cycle / number of bases in all cycles);

Figure 6A: GC content distribution for the sequencing results of the non-LNA-modified library. The abscissa is the GC content of each window on the Acinetobacter reference sequence and the ordinate is the coverage depth of reads aligned to each window;

Figure 6B: GC content distribution for the sequencing results of the LNA-modified library. The abscissa is the GC content of each window on the Acinetobacter reference sequence and the ordinate is the coverage depth of reads aligned to each window;

Figure 7: Gene coverage distribution of the library sequencing results. The abscissa is the per-base coverage depth and the ordinate is the number of bases. The light curve (lower peak) represents the non-LNA-modified library and the dark curve (higher peak) represents the LNA-modified library;

Figure 8A: Agilent 2100 result for the annealed unmodified adapter;

Figure 8B: Agilent 2100 result for the annealed LNA-modified adapter;

Figure 9A: Q30 box plot of fq quality values for library sequencing with the unmodified Read 2 sequencing primer. The abscissa is the cycle number of the PE90 run and the ordinate is Q30% (Q-64>=30, i.e. the percentage of qualified bases). In each cycle, the Q30% values of all tiles are sorted and five points are drawn as a box plot: maximum, minimum, median, first quartile, and third quartile;

Figure 9B: Q30 box plot of fq quality values for library sequencing with the LNA-modified Read 2 sequencing primer. The abscissa is the cycle number of the PE90 run and the ordinate is Q30% (Q-64>=30, i.e. the percentage of qualified bases). In each cycle, the Q30% values of all tiles are sorted and five points are drawn as a box plot: maximum, minimum, median, first quartile, and third quartile.

Detailed Description of the Embodiments

Embodiments of the invention are described in detail below with reference to examples, but those skilled in the art will understand that the following examples only illustrate the invention and should not be regarded as limiting its scope. Where specific conditions are not stated in the examples, conventional conditions or the conditions recommended by the manufacturer were used. Reagents and instruments for which no manufacturer is stated are conventional, commercially available products. Sequencing below was performed on an Illumina HiSeq 2000 sequencer and, unless otherwise noted, according to the manufacturer's instructions.

In the following examples, unmodified DNA fragments were synthesized by Invitrogen and LNA-modified DNA fragments were synthesized by Exiqon.

Example 1 SOLEXA high-throughput sequencing method

Genomic DNA of Acinetobacter (genome about 3.6 Mb, GC content 40.4%) was extracted as template, about 30 μg in total, with 3 μg of starting material per library. The DNA was sheared with a Covaris S2 to a main band of 350 bp, and eight DNA PE index libraries with 350 bp inserts were constructed in parallel: four libraries used LNA-modified adapters and PCR primers, and the other four used adapters and PCR primers without LNA modification. After library preparation, Solexa (SBS, sequencing by synthesis) high-throughput sequencing was performed. The LNA-modified libraries were sequenced with the LNA-modified Read 2 sequencing primer, and the libraries without LNA modification were sequenced with the sequencing primers supplied by Illumina. The preparation workflow for the 350 bp DNA PE index library is shown in Figures 1 and 2. The specific steps are as follows:

1. Genomic DNA shearing

3 μg of Acinetobacter genomic DNA was sheared to a main band of 350 bp using a Covaris S2 (Covaris). After the sheared DNA passed quality inspection, it was purified with the QIAquick PCR Purification Kit (QIAGEN) and dissolved in 32 μl EB buffer.

2. End repair

To the 30 μl of eluted DNA from the previous step, the following were added in order: 45 μl water, 10 μl 10× End Repair Buffer (ENZYMATICS), 4 μl 10 mM dNTP Mix (NEB), 5 μl T4 DNA Polymerase (ENZYMATICS), 1 μl Klenow DNA Polymerase (ENZYMATICS), and 5 μl T4 PNK (ENZYMATICS), for a total reaction volume of 100 μl. The reaction tube was incubated at 20 °C for 30 min.

After the reaction, the end-repaired product was purified with the QIAquick PCR Purification Kit (QIAGEN) and dissolved in 34 μl EB buffer.

3. A-tailing

To the 32 μl of eluted DNA from the previous step, the following were added in order: 5 μl 10× Blue Buffer (ENZYMATICS), 10 μl 1 mM dATP (ENZYMATICS), and 3 μl Klenow exo- (3' to 5' exo minus) (ENZYMATICS), for a total reaction volume of 50 μl. The reaction tube was incubated at 37 °C for 30 min.

After the reaction, the A-tailed product was purified with the MinElute PCR Purification Kit (QIAGEN) and dissolved in 12 μl EB buffer.

4. Adapter ligation

To the 10 μl of A-tailed product from the previous step, the following were added in order: 25 μl 2× Rapid T4 DNA Ligase Buffer (ENZYMATICS), 10 μl PE Adapter Oligo Mix (PE adapter mixture, 20 μM) or 10 μl LNA-modified PE Adapter Oligo Mix (20 μM), and 5 μl T4 DNA Ligase (ENZYMATICS), for a total reaction volume of 50 μl. The reaction tube was incubated at 20 °C for 15 min.

Unmodified PE adapter:

adapter_F_1:

5'Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCAC3' (where 5'Phos denotes 5' phosphorylation) (SEQ ID NO: 1)

adapter_R_1:

5'TACACTCTTTCCCTACACGACGCTCTTCCGATCT3' (SEQ ID NO: 2)

LNA-modified PE adapter:

adapter_F_1:

5'Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCAC3' (where 5'Phos denotes 5' phosphorylation) (SEQ ID NO: 3)

adapter_R_1:

5'TACACTCTTTCCCTACACGACGCTCTTCCGATCT3' (SEQ ID NO: 4)

The underlined parts are the LNA modification sites of the adapter, representing LNA-modified G and T, respectively. The LNA modification sites are near the 5' end of the F strand and the 3' end of the R strand, which improves adapter annealing efficiency and improves ligation of the adapter to the A-tailed target fragment, making the binding of adapter and PCR primer more efficient, specific, and sensitive.
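The Y shape of the adapter follows from the two strand sequences listed above: only the 5' end of the F strand and the 3' end of the R strand are complementary. The sketch below measures that complementary stem from SEQ ID NO: 1 and SEQ ID NO: 2. It is a simplified check using exact Watson-Crick pairing; the assumption that the last 3' base of the R strand is left unpaired as a one-base overhang (to pair with the A added in step 3) is ours, and LNA positions are not modeled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def stem_length(f_strand, r_strand, r_overhang=1):
    """Number of consecutive base pairs between the 5' end of F and the 3' end of R.

    r_overhang bases at the 3' end of R are assumed to stay unpaired.
    """
    r_paired = r_strand[:-r_overhang] if r_overhang else r_strand
    partner = reverse_complement(r_paired)
    length = 0
    for f_base, partner_base in zip(f_strand, partner):
        if f_base != partner_base:
            break
        length += 1
    return length


ADAPTER_F = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"   # SEQ ID NO: 1, without the 5' phosphate
ADAPTER_R = "TACACTCTTTCCCTACACGACGCTCTTCCGATCT"  # SEQ ID NO: 2

stem = stem_length(ADAPTER_F, ADAPTER_R)
stem_sequence = ADAPTER_F[:stem]
overhang_base = ADAPTER_R[-1]
```

Under these assumptions the duplex stem covers the first bases of the F strand, and the remaining bases of both strands form the non-complementary arms of the Y.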

After the reaction, the ligation product was purified with the QIAquick PCR Purification Kit (QIAGEN) and dissolved in 32 μl EB buffer.
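As a quick consistency check on steps 2 to 4, the component volumes stated for each reaction can be summed and compared with the stated total volume. The sketch below uses only the volumes given in the text; the dictionary layout and function names are illustrative, and the final-concentration helper is the ordinary dilution formula (stock concentration × volume added / total volume).

```python
# Volumes in microlitres, as stated in steps 2-4: (stated total, components)
REACTIONS = {
    "end repair": (100, {"eluted DNA": 30, "water": 45, "10x End Repair Buffer": 10,
                         "10 mM dNTP Mix": 4, "T4 DNA Polymerase": 5,
                         "Klenow DNA Polymerase": 1, "T4 PNK": 5}),
    "A-tailing": (50, {"eluted DNA": 32, "10x Blue Buffer": 5,
                       "1 mM dATP": 10, "Klenow exo-": 3}),
    "adapter ligation": (50, {"A-tailed product": 10, "2x Rapid T4 DNA Ligase Buffer": 25,
                              "PE Adapter Oligo Mix (20 uM)": 10, "T4 DNA Ligase": 5}),
}


def volumes_match(reactions):
    """Map each reaction name to True if its components sum to the stated total."""
    return {name: sum(parts.values()) == total
            for name, (total, parts) in reactions.items()}


def final_concentration(stock, volume_added, total_volume):
    """Concentration after dilution, in the same unit as the stock."""
    return stock * volume_added / total_volume


checks = volumes_match(REACTIONS)
adapter_final_uM = final_concentration(20, 10, 50)  # adapter mix in the ligation
```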

5. Gel purification of the ligation product

The ligation product from step 4 was run on a 2% agarose gel alongside a 50 bp DNA Ladder (NEB) at 120 V for 60 min, and the gel was cut for recovery; the excised range was determined by the adapter and the desired target fragment size. The excised gel slice was recovered with the QIAquick Gel Extraction Kit (QIAGEN) and finally dissolved in 23 μl EB buffer.

6. PCR amplification

To the 23 μl of ligation product from the previous step, the following were added in order: 25 μl Phusion® High-Fidelity DNA Polymerase (NEB), 1 μl PCR Primer PE 1.0 (10 μM) or 1 μl LNA-modified PCR Primer PE 1.0 (10 μM), and 1 μl PCR Primer PE 2.0 (10 μM) or 1 μl LNA-modified PCR Primer PE 2.0 (10 μM), for a total reaction volume of 50 μl. The reaction was run on a thermal cycler with the following program:

a. 98 °C, 30 s;

b. 10 cycles of:

98 °C, 10 s; 65 °C, 30 s; 72 °C, 30 s;

c. 72 °C, 5 min;

d. 4 °C, hold.
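The PCR setup above can be checked with a few lines of arithmetic: the component volumes should add up to the stated 50 μl, and the programmed hold times give a lower bound on the run time. The sketch below assumes the annealing step is 30 s (the source gives "30" without a unit) and ignores ramping time and the final 4 °C hold; the variable names are illustrative.

```python
# Component volumes in microlitres, as stated in step 6.
PCR_MIX = {"ligation product": 23, "Phusion High-Fidelity DNA Polymerase": 25,
           "PCR Primer PE 1.0": 1, "PCR Primer PE 2.0": 1}

# Thermal program as (temperature in deg C, seconds).
INITIAL_DENATURATION = (98, 30)
CYCLE_STEPS = [(98, 10), (65, 30), (72, 30)]  # 65 deg C step assumed to be 30 s
N_CYCLES = 10
FINAL_EXTENSION = (72, 5 * 60)


def programmed_seconds():
    """Total programmed hold time, excluding ramping and the 4 deg C hold."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


total_volume = sum(PCR_MIX.values())
total_seconds = programmed_seconds()
```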

The sequence of the PCR Primer PE 1.0 primer (5'-end primer, upstream primer) is:

CGCTCTTCCGATCT3' (SEQ ID NO: 5)

The sequence of the LNA-modified PCR Primer PE 1.0 primer is:

CGCTCTTCCGATCT3' (SEQ ID NO: 6)

The underlined part is the LNA modification site of the primer, representing an LNA-modified T.

The sequences of the four unmodified PCR Primer PE 2.0 primers (3'-end primers, downstream primers) are:

PCR_primer_A:

5'CAAGCAGAAGACGGCATACGAGATAGAGACTTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 7)

PCR_primer_B:

ACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 8)

PCR_primer_C:

5'CAAGCAGAAGACGGCATACGAGATAGATCTCTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 9)

PCR_primer_D:

5'CAAGCAGAAGACGGCATACGAGATTAGAGAGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 10)

The sequences of the four LNA-modified PCR Primer PE 2.0 primers are:

PCR_primer_A:

5'CAAGCAGAAGACGGCATACGAGATAGAGACTTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 11)

PCR_primer_B:

5'CAAGCAGAAGACGGCATACGAGATGCGAGGCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 12)

PCR_primer_C:

5'CAAGCAGAAGACGGCATACGAGATAGATCTCTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 13)

PCR_primer_D:

5'CAAGCAGAAGACGGCATACGAGATTAGAGAGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 14)

The underlined parts are the LNA modification sites of the primers, each representing an LNA-modified T.

Part of the 5'-end sequence of the common PCR Primer PE 1.0 is identical to the fixed sequence P5 on the flowcell, and part of the sequence of the PCR Primer PE 2.0 primers is identical to the fixed sequence P7 on the flowcell. The underlined parts are the LNA modification sites of the PCR primers. Placing the LNA modification site of PCR Primer PE 1.0 near the 5' end improves binding efficiency to the sequence complementary to P5 (the template strand of the DNA library), and placing the LNA modification sites of the PCR Primer PE 2.0 primers (A, B, C, D) near the 3' end improves binding efficiency to the sequence complementary to P7 (the coding strand of the DNA library), so that the target sequence ultimately binds the fixed sequence more stably.

The binding sequence in the fixed sequence P5 is: AATGATACGGCGACCACCGA (SEQ ID NO: 19);

The binding sequence in the fixed sequence P7 is: CAAGCAGAAGACGGCATACGA (SEQ ID NO: 20).
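The statement that part of each PCR Primer PE 2.0 sequence matches the fixed sequence P7 can be checked directly against the listed sequences. The sketch below uses the fully listed unmodified primers A, C, and D (SEQ ID NO: 7, 9, 10), the P7 binding sequence (SEQ ID NO: 20), and the unmodified Read 2 sequencing primer given later in this example (SEQ ID NO: 15). Splitting each primer into a P7 prefix, a variable middle segment, and a Read 2 primer suffix is our reading of the sequences, not a structure stated in the text.

```python
P7_BINDING = "CAAGCAGAAGACGGCATACGA"                 # SEQ ID NO: 20
READ2_PRIMER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"  # SEQ ID NO: 15

PE2_PRIMERS = {
    "A": "CAAGCAGAAGACGGCATACGAGATAGAGACTTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",  # SEQ ID NO: 7
    "C": "CAAGCAGAAGACGGCATACGAGATAGATCTCTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",  # SEQ ID NO: 9
    "D": "CAAGCAGAAGACGGCATACGAGATTAGAGAGCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT",  # SEQ ID NO: 10
}


def variable_segment(primer):
    """Return the part of a PE 2.0 primer between the P7 prefix and the Read 2 suffix."""
    if not primer.startswith(P7_BINDING):
        raise ValueError("primer does not start with the P7 binding sequence")
    if not primer.endswith(READ2_PRIMER):
        raise ValueError("primer does not end with the Read 2 primer sequence")
    return primer[len(P7_BINDING):-len(READ2_PRIMER)]


segments = {name: variable_segment(seq) for name, seq in PE2_PRIMERS.items()}
```

On this reading, the three primers differ only in the short middle segment, which is consistent with their use as index (barcode) primers for distinguishing libraries.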

7. Gel purification of the PCR product

The PCR product from step 6 was run on a 2% agarose gel alongside a 50 bp DNA Ladder at 120 V for 60 min, and the gel was cut for recovery; the excised range was determined by the adapter and the desired target fragment size. The excised gel slice was recovered with the QIAquick Gel Extraction Kit and finally dissolved in 25 μl EB buffer.

8. After library preparation was complete, Solexa high-throughput sequencing was performed.

Four LNA-modified and four non-LNA-modified libraries were constructed in parallel as described above and, after passing quality control, sequenced on an Illumina HiSeq 2000 sequencer. The four LNA-modified libraries were sequenced with the LNA-modified Read 2 sequencing primer, and the four non-LNA-modified libraries with the Read 2 sequencing primer supplied by Illumina.

Read 1 sequencing primer:

5'ACACTCTTTCCCTACACGACGCTCTTCCGATCT3' (SEQ ID NO: 17)

Index sequencing primer:

5'GATCGGAAGAGCACACGTCTGAACTCCAGTCAC3' (SEQ ID NO: 18)

Unmodified Read 2 sequencing primer:

5'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 15)

LNA-modified Read 2 sequencing primer:

5'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT3' (SEQ ID NO: 16)

The underlined parts are the LNA modification sites of the sequencing primer, representing LNA-modified A and C, respectively. Because a high-throughput sequencing run takes a long time, the primers are prone to degradation, which affects sequencing quality. Modifying the sequencing primer is therefore critical, especially close to the 3' terminus. The invention applies a double modification at the 3' end of the sequencing primer in order to improve its stability, specificity, and sensitivity, and thus the quality of the entire sequencing run.

Example 2 Analysis of SOLEXA high-throughput sequencing results

The off-machine data (i.e. the sequencing result data) of the LNA-modified and non-LNA-modified libraries obtained in Example 1 were analyzed in parallel. Base quality values, sequencing error rate, %GC content, adapter contamination, and genome alignment rate were used to evaluate sequencing quality; basic data statistics, GC distribution, gene coverage, and similar measures were further analyzed to evaluate library quality. The comparison shows that libraries prepared with LNA modification are, overall, better than libraries without LNA modification in both library quality and sequencing quality. (Apart from the base quality values, which were produced by the software supplied with the HiSeq 2000, the results below were calculated with the SOAP alignment software; Figures 3A and 3B show the average results of four libraries.)

The sequencing information analysis results are shown in Tables 1 and 2, which report the results for one pair of libraries constructed in parallel.

Table 1. Sequencing information analysis results of the unmodified library

Total reads (Total Reads): 1,733,304
Total bases (Total Bases): 311,994,720
Read length (Length): 90; 90
Phasing rate (%Phasing): 0.141
Prephasing rate (%Prephasing): 0.397; 0.248
%Q20: 97.94; 93.03
%Q30: 93.50; 87.13
GC content (%GC): 41.40; 41.35
Error rate (%ErrorRate): 0.12; 0.43
Index: AGAGATCT
Proportion of index reads with 0 mismatches (%Index 0 mismatch): 97.6
Proportion of index reads with 1 mismatch (%Index 1 mismatch): 2.3
Alignment rate (%Align): 99.85; 97.96
Unique alignment rate of paired reads (%Uniqpair): 96.6
Insert size (InsertSize): 320
Insert size deviation (Insert Size SD): -12/+15
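The totals in Table 1 can be cross-checked against each other and against the genome size given in Example 1. The sketch below assumes that "Total Reads" counts read pairs, each contributing two 90-base reads (an interpretation suggested by the arithmetic, not stated in the text), and takes the genome size as 3.6 Mb; the resulting depth is therefore only a rough estimate of raw sequencing depth before alignment.

```python
TOTAL_READS = 1_733_304      # Table 1, read here as read pairs (assumption)
READ_LENGTHS = (90, 90)      # Table 1, Read 1 and Read 2
TOTAL_BASES = 311_994_720    # Table 1
GENOME_SIZE = 3_600_000      # "about 3.6 Mb" in Example 1 (approximate)


def expected_total_bases(n_pairs, read_lengths):
    """Bases produced if every pair yields one read of each listed length."""
    return n_pairs * sum(read_lengths)


def raw_depth(total_bases, genome_size):
    """Average number of sequenced bases per genome position, before alignment."""
    return total_bases / genome_size


bases_consistent = expected_total_bases(TOTAL_READS, READ_LENGTHS) == TOTAL_BASES
depth = raw_depth(TOTAL_BASES, GENOME_SIZE)
```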

Table 2. Sequencing information analysis results of the LNA-modified library

[Table 2 appears as an image (imgf000015_0001) in the original publication.]

The data in Tables 1 and 2 were generated automatically by the software supplied with the HiSeq 2000 sequencer. As Tables 1 and 2 show, %Q30 for the LNA-modified library was (96.77; 96.15), whereas %Q30 for the non-LNA-modified library was (93.50; 87.13). A higher Q30% (Q-64>=30, i.e. the percentage of qualified bases) indicates better quality. The actual GC content of Acinetobacter is 40.4%; the %GC of the LNA-modified library (40.91; 40.84) is closer to the true value than that of the non-LNA-modified library (41.40; 41.35). As for error rate and alignment rate, the LNA-modified library shows a higher alignment rate and a lower error rate; for the specific figures, compare with Table 1.
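The GC-content comparison above amounts to measuring how far each reported %GC value lies from the stated true value of 40.4%. A minimal sketch using only the numbers quoted in the text (the names are illustrative):

```python
TRUE_GC = 40.4  # stated actual GC content of Acinetobacter, in percent

REPORTED_GC = {
    "LNA-modified": (40.91, 40.84),      # Read 1; Read 2
    "non-LNA-modified": (41.40, 41.35),  # Read 1; Read 2
}


def gc_deviation(values, true_gc=TRUE_GC):
    """Absolute deviation from the true GC content, in percentage points."""
    return [round(abs(v - true_gc), 2) for v in values]


deviations = {name: gc_deviation(vals) for name, vals in REPORTED_GC.items()}
lna_is_closer = max(deviations["LNA-modified"]) < min(deviations["non-LNA-modified"])
```

For these values the LNA-modified library deviates by roughly half a percentage point and the non-LNA-modified library by roughly one percentage point.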

The Q30 box plots of fq quality values are shown in Figures 3A and 3B; Figure 3A shows the sequencing results without LNA modification and Figure 3B the results with LNA modification. The abscissa is the cycle number of the PE90 run and the ordinate is Q30% (Q-64>=30, i.e. the percentage of qualified bases). The more concentrated a box is, the closer the Q30% values are within that cycle, and the higher the median of the box, the better the quality. The box plot for the LNA-modified sequencing results is more concentrated than that for the unmodified results, and its median is also clearly higher, indicating that base quality is better for the LNA-modified library than for the unmodified one.

The sequencing quality distribution is shown in Figures 4A and 4B. Figure 4A is the sequencing result without LNA modification, and Figure 4B is the sequencing result with LNA modification. The abscissa is the cycle number of the PE90 run, and the ordinate is the base quality value for each cycle. Colors represent percentages: white, 0; green, 10%; yellow, 30%; red, 50%; dark red, 70%; black, 100%. For example, if a given quality value at a given position is green, then at that cycle on the abscissa the quality value on the ordinate accounts for 10% of all quality values. Darker colors at higher quality values indicate better quality. Figures 4A and 4B show that the quality value distribution of the LNA-modified library is clearly better than that of the library without LNA modification.
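The per-cycle distribution behind such a heat map can be sketched as follows. This is only an illustration under the assumption of Phred+64 quality strings; `quality_distribution` and the toy reads are hypothetical names and data, not part of the original analysis.

```python
from collections import Counter


def quality_distribution(quality_strings, offset=64):
    """For each cycle, map quality value -> fraction of reads with that value."""
    n_cycles = max((len(q) for q in quality_strings), default=0)
    counts = [Counter() for _ in range(n_cycles)]
    for qual in quality_strings:
        for cycle, ch in enumerate(qual):
            counts[cycle][ord(ch) - offset] += 1
    dist = []
    for c in counts:
        total = sum(c.values())
        dist.append({q: n / total for q, n in c.items()})
    return dist


# Toy input (assumption): 'h' -> Q40, 'T' -> Q20 under Phred+64.
dist = quality_distribution(["hh", "hT", "hT", "hT"])
print(dist[0])  # cycle 1: every read has Q40
print(dist[1])  # cycle 2: Q40 in 1 of 4 reads, Q20 in 3 of 4 reads
```

Each fraction would then be mapped to a color bin (for example, 0.10 to green) when drawing the figure.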

The base error rate distribution is shown in Figures 5A and 5B. Figure 5A is the sequencing result without LNA modification, and Figure 5B is the sequencing result with LNA modification. The raw output data were aligned to the reference genome. Reads were selected if their first 32 bp aligned with up to 2 mismatches allowed; reads whose first 32 bp could not be aligned with 2 mismatches allowed were discarded. Using the ELAND software, the following was calculated over the reads that aligned: the number of erroneous bases in each cycle divided by the number of bases in all cycles. The abscissa is the cycle number of the PE90 run, and the ordinate is the number of erroneous bases in each cycle divided by the number of bases in all cycles. The results show that the error rate of the LNA-modified library is much lower than that of the unmodified library.
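The error-rate calculation described above can be sketched as follows. This is a simplified illustration, not the ELAND implementation: it assumes each read is already paired with the reference segment it aligns to (no indels), and it reads the denominator literally as the total number of bases over all cycles of the retained reads. The function name and parameters are hypothetical.

```python
def per_cycle_error_rate(pairs, seed_len=32, max_seed_mismatches=2):
    """pairs: iterable of (read, reference_segment) strings of equal length.

    Reads whose first seed_len bases differ from the reference at more than
    max_seed_mismatches positions are discarded. Returns one value per
    cycle: mismatches at that cycle / total bases of all retained reads.
    """
    kept = []
    for read, ref in pairs:
        seed_mm = sum(a != b for a, b in zip(read[:seed_len], ref[:seed_len]))
        if seed_mm <= max_seed_mismatches:
            kept.append((read, ref))
    total_bases = sum(len(read) for read, _ in kept)
    n_cycles = max((len(read) for read, _ in kept), default=0)
    errors = [0] * n_cycles
    for read, ref in kept:
        for cycle, (a, b) in enumerate(zip(read, ref)):
            if a != b:
                errors[cycle] += 1
    return [e / total_bases for e in errors] if total_bases else []


# Toy input with a 4-base seed and 1 allowed seed mismatch (assumptions).
pairs = [
    ("ACGTAC", "ACGTAA"),  # kept: seed matches, error at cycle 6
    ("TTTTAC", "ACGTAC"),  # discarded: 3 seed mismatches
    ("ACGAAC", "ACGTAC"),  # kept: 1 seed mismatch, error at cycle 4
]
print(per_cycle_error_rate(pairs, seed_len=4, max_seed_mismatches=1))
```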

The GC content distribution is shown in Figures 6A and 6B. Figure 6A is the sequencing result without LNA modification, and Figure 6B is the sequencing result with LNA modification. The Acinetobacter reference sequence was divided into windows of 500 bp. The abscissa is the GC content of each window of the reference sequence, and the ordinate is the number of times each window is covered by aligned reads. A correlation analysis was performed to assess the GC bias introduced by PCR; the more uniform the coverage, the lower the GC bias. Comparing the figures shows that GC bias is clearly reduced after LNA modification.
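The windowed GC analysis can be sketched as follows, assuming a reference sequence string and a per-base depth list are already available. The function names are hypothetical, the Pearson coefficient is one possible choice for the unspecified "correlation analysis", and returning 0.0 for a constant input is a convention of this sketch only.

```python
import math


def gc_by_window(reference, window=500):
    """GC fraction of each consecutive, non-overlapping window."""
    out = []
    for start in range(0, len(reference), window):
        chunk = reference[start:start + window].upper()
        out.append(sum(base in "GC" for base in chunk) / len(chunk))
    return out


def mean_depth_by_window(depth, window=500):
    """Mean per-base depth of each consecutive, non-overlapping window."""
    out = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        out.append(sum(chunk) / len(chunk))
    return out


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Toy data with 4-base windows (assumption; the text uses 500 bp windows).
gc = gc_by_window("GGCCATATGCAT", window=4)
cov = mean_depth_by_window([10] * 4 + [30] * 4 + [20] * 4, window=4)
print(gc, cov, pearson(gc, cov))
```

A correlation near zero would correspond to coverage that does not depend on GC content.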

The gene coverage distribution is shown in Figure 7. The abscissa is the number of times each base is covered, and the ordinate is the number of bases. The curve follows a Poisson distribution; the more the curve is concentrated around its central axis, the more random the coverage. The figure shows that coverage randomness improved after LNA modification.

Table 3. Comparative analysis of sequencing information for two sets of libraries constructed in parallel

Figure imgf000016_0001

Sequencing depth (Depth)    72        74.1      68.3      74.1
Coverage                    98.85%    98.82%    98.84%    98.84%
Coverage at least 40x       0.943721  0.978885  0.93051   0.978784
Coverage at least 60x       0.745516  0.832488  0.681204  0.831284
Coverage at least 80x       0.340046  0.355042  0.267971  0.357859

(Groups a and b are the unmodified libraries; groups A and B are the LNA-modified libraries.)

As the table above shows, the LNA-modified libraries have a higher unique rate and map-to-genome rate and a lower duplication rate, and their coverage and depth are both better than those of the libraries without LNA modification.
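The depth and coverage rows of Table 3 can be illustrated with a minimal sketch, assuming a list of per-base depths is available. The function name `coverage_summary` and the toy depths are hypothetical; the thresholds mirror the 40x, 60x, and 80x rows.

```python
def coverage_summary(depth, thresholds=(40, 60, 80)):
    """Summarize a per-base depth list.

    mean_depth     : average depth over all reference positions
    coverage       : fraction of positions covered at least once
    at_least_<t>x  : fraction of positions with depth >= t
    """
    n = len(depth)
    summary = {
        "mean_depth": sum(depth) / n,
        "coverage": sum(d > 0 for d in depth) / n,
    }
    for t in thresholds:
        summary[f"at_least_{t}x"] = sum(d >= t for d in depth) / n
    return summary


# Toy per-base depths for a 10-base reference (assumption).
depth = [0, 45, 62, 85, 90, 38, 70, 41, 59, 80]
print(coverage_summary(depth))
```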

This confirms that applying LNA-modified adapters, PCR primers, and sequencing primers to high-throughput library construction and sequencing, as in the present invention, can improve library quality, reduce invalid data, and increase sequencing accuracy.

Example 3: Annealing results of LNA-modified and unmodified adapters

Equal volumes of the F strand and R strand of the unmodified PE adapter (SEQ ID NO: 1 and SEQ ID NO: 2, respectively), each at a concentration of 100 μM, were subjected to gradient adapter annealing. After annealing, the product was diluted to 5 μM and analyzed on an Agilent 2100; the result is shown in Figure 8A.

Equal volumes of the F strand and R strand of the LNA-modified PE adapter (SEQ ID NO: 3 and SEQ ID NO: 4, respectively), each at a concentration of 100 μM, were subjected to gradient adapter annealing. After annealing, the product was diluted to 1 μM and analyzed on an Agilent 2100; the result is shown in Figure 8B.

Figures 8A and 8B show that the double strands formed after adapter annealing are 80 bp and 82 bp in size, respectively. With LNA modification, the double-stranded adapter accounts for 60% of the product and there is only one single-strand peak; without modification, the double-stranded adapter accounts for 57% and there are two single-strand peaks.

The results show that the LNA-modified PE adapter gives a higher concentration of double-stranded product and a higher annealing efficiency than the unmodified adapter.

Example 4: Sequencing results using LNA-modified and unmodified sequencing primers

A set of parallel libraries was prepared according to the method of Example 1 and subjected to SOLEXA high-throughput sequencing, with the following differences: the adapters used for adapter ligation were adapters without LNA modification (SEQ ID NO: 3 and SEQ ID NO: 4); the PCR primers used for PCR amplification were PCR primers without LNA modification (SEQ ID NO: 7 to 10); and only at the sequencing step were both LNA-modified and unmodified sequencing primers (SEQ ID NO: 15 to 16) used. The sequencing results are shown in Tables 4 and 5.

Figure imgf000018_0001

Figure imgf000018_0002

RawClusters/Tile: the number of DNA clusters per tile; the median over all tiles is reported here.
PFClusters/Tile: the number of DNA clusters per tile after PF filtering. PF is Illumina's default filtering rule: at most one base of poor quality is allowed within the first 25 bases, and reads that do not meet this condition are filtered out.
%PF: PFClusters/RawClusters.
FirstCycleInt: the light intensity of the first cycle.
%Phasing: the probability of lagging (the proportion of reads that, at the current cycle, are still undergoing the reaction of the previous cycle).
%Prephasing: the probability of running ahead (the proportion of reads that, at the current cycle, are already undergoing the reaction of the next cycle).
%Q20: the percentage of qualified bases with Q-64>=20.
%Q30: the percentage of qualified bases with Q-64>=30.
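The PF rule as worded above, and the %PF ratio derived from it, can be sketched as follows. This follows the text's description only: the cutoff for a "poor quality" base (`bad_below`) is a hypothetical placeholder, and the instrument software applies its own criterion, so the sketch should not be read as Illumina's actual filter.

```python
def passes_pf(quals, window=25, max_bad=1, bad_below=20):
    """quals: list of integer quality values for one read.

    A read passes if at most max_bad of its first `window` bases fall
    below bad_below (bad_below is an assumed stand-in threshold).
    """
    bad = sum(q < bad_below for q in quals[:window])
    return bad <= max_bad


def percent_pf(reads_quals, **kwargs):
    """%PF = clusters passing the filter / raw clusters, as a percentage."""
    raw = len(reads_quals)
    pf = sum(passes_pf(q, **kwargs) for q in reads_quals)
    return 100.0 * pf / raw if raw else 0.0


# Toy reads (assumption): three clusters of quality values.
reads = [
    [30] * 25,                    # passes: no poor bases
    [30] * 23 + [10, 10],         # fails: two poor bases in the first 25
    [10] + [30] * 24 + [5] * 5,   # passes: poor bases after base 25 are ignored
]
print(percent_pf(reads))  # 2 of 3 clusters pass
```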

As Tables 4 and 5 show, the sequencing results obtained with the LNA-modified read2 sequencing primer are of better sequencing quality.

Figures 9A and 9B are Q30 box plots of the fq quality values for sequencing with the LNA-modified or unmodified read2 sequencing primer. The more concentrated the box plot, the closer the %Q30 values within each cycle; the higher the median in the box plot, the better the base quality. The two figures show that the Q30 box plot obtained with the LNA-modified read2 sequencing primer is more concentrated, and has a higher median, than the one obtained with the unmodified read2 sequencing primer. This indicates that the LNA-modified read2 sequencing primer gives better base quality than the unmodified read2 sequencing primer.

Comparing Table 2 with Table 4, and Figure 3B with Figure 9B, shows that both Q20 and Q30 in Table 2 are higher than in Table 4, and that the Q30 box plot in Figure 3B is more concentrated, with a clearly higher median, than that in Figure 9B. The results show that using LNA-modified adapters, PCR primers, and sequencing primers together gives better sequencing results than modifying the sequencing primer alone.

Although specific embodiments of the present invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions may be made to those details, and that such changes fall within the scope of protection of the present invention. The full scope of the invention is given by the appended claims and any equivalents thereof.

Claims

1. A locked nucleic acid-modified DNA fragment for high-throughput sequencing, wherein the DNA fragment is selected from one, two, or three of an adapter, a PCR primer, and a sequencing primer, characterized in that the adapter, PCR primer, and/or sequencing primer in the DNA fragment contains a locked nucleic acid.

2. The DNA fragment of claim 1, wherein the locked nucleic acid contained in the adapter is located near the 5' end of the F strand of the adapter and/or near the 3' end of the R strand; preferably, near the 5' end of the F strand means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 5' end of the F strand, and near the 3' end of the R strand means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 3' end of the R strand; preferably, the number of locked nucleic acids in the F strand or the R strand is 1 to 3 (e.g., 1, 2, or 3).

3. The DNA fragment of claim 1, wherein, in the PCR primers, the locked nucleic acid contained in the sense primer is located near the 5' end of that PCR primer, preferably, near the 5' end means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 5' end of the primer; and/or the locked nucleic acid contained in the antisense primer is located near the 3' end of that PCR primer, preferably, near the 3' end means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 3' end of the primer; further preferably, the number of locked nucleic acids in the PCR primer is 1 to 3 (e.g., 1, 2, or 3).

4. The DNA fragment of claim 1, wherein the locked nucleic acid contained in the sequencing primer is located near the 3' end of the sequencing primer; preferably, near the 3' end of the sequencing primer means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 3' end of the sequencing primer; preferably, the number of locked nucleic acids in the sequencing primer is 1 to 3 (e.g., 1, 2, or 3).

5. A composition comprising the DNA fragment of any one of claims 1 to 4.

6. A method for constructing a DNA library, the method comprising the step of constructing the library using a locked nucleic acid-modified DNA fragment, wherein the DNA fragment is an adapter and/or a PCR primer, and the locked nucleic acid modification means that the adapter and PCR primer in the DNA fragment contain a locked nucleic acid; preferably,
the locked nucleic acid contained in the adapter is located near the 5' end of the adapter F strand and/or near the 3' end of the adapter R strand; preferably, near the 5' end of the adapter F strand means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 5' end of the F strand, and near the 3' end of the adapter R strand means at the 2nd to 5th nucleotide from the 3' end of the R strand; preferably, the number of locked nucleic acids in the adapter F strand or R strand is 1 to 3 (e.g., 1, 2, or 3);
or, preferably,
in the PCR primers, the locked nucleic acid contained in the sense primer is located near the 5' end of that PCR primer, preferably, near the 5' end means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 5' end of the primer; and/or the locked nucleic acid contained in the antisense primer is located near the 3' end of that PCR primer, preferably, near the 3' end means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 3' end of the primer; further preferably, the number of locked nucleic acids in the PCR primer is 1 to 3 (e.g., 1, 2, or 3).

7. A method for sequencing a DNA library, the method comprising the step of sequencing using a locked nucleic acid-modified sequencing primer, wherein the locked nucleic acid modification means that the sequencing primer contains a locked nucleic acid; preferably,
the locked nucleic acid contained in the sequencing primer is located near the 3' end of the sequencing primer; preferably, near the 3' end of the sequencing primer means at the 2nd to 5th (e.g., 2nd, 3rd, 4th, or 5th) nucleotide from the 3' end of the sequencing primer; preferably, the number of locked nucleic acids in the sequencing primer is 1 to 3 (e.g., 1, 2, or 3).

8. A high-throughput sequencing method comprising construction of a DNA library and sequencing of the DNA library, wherein the DNA library is constructed by the construction method of claim 6 and the DNA library is sequenced by the sequencing method of claim 7.

9. Use of the DNA fragment of any one of claims 1 to 4 in high-throughput sequencing, construction of a DNA library, or sequencing of a DNA library.
PCT/CN2012/086521 2012-12-13 2012-12-13 Locked nucleic acid-modified dna fragment for high-throughput sequencing Ceased WO2014089797A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/086521 WO2014089797A1 (en) 2012-12-13 2012-12-13 Locked nucleic acid-modified dna fragment for high-throughput sequencing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/086521 WO2014089797A1 (en) 2012-12-13 2012-12-13 Locked nucleic acid-modified dna fragment for high-throughput sequencing

Publications (1)

Publication Number Publication Date
WO2014089797A1 true WO2014089797A1 (en) 2014-06-19

Family

ID=50933710

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/086521 Ceased WO2014089797A1 (en) 2012-12-13 2012-12-13 Locked nucleic acid-modified dna fragment for high-throughput sequencing

Country Status (1)

Country Link
WO (1) WO2014089797A1 (en)



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12890130

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12890130

Country of ref document: EP

Kind code of ref document: A1