WO2025062001A1 - Optimised nucleic acid sequencing - Google Patents

Optimised nucleic acid sequencing

Info

Publication number
WO2025062001A1
Authority
WO
WIPO (PCT)
Prior art keywords
immobilised
sequencing
primer
solid support
primers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2024/076524
Other languages
French (fr)
Inventor
Aathavan KARUNAKARAN
Seth MCDONALD
Merek SIU
Nileshi SARAF
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Marks & Clerk LLP
Illumina Inc
Original Assignee
Marks & Clerk LLP
Illumina Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Marks & Clerk LLP, Illumina Inc
Publication of WO2025062001A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay

Definitions

  • aspects relate to solid supports and methods for use in nucleic acid sequencing, in particular solid supports and methods for use in concurrent sequencing.
  • the invention also relates to methods and kits for use in nucleic acid sequencing, in particular methods for use in optimising the signal-to-noise ratio in simultaneous sequencing, in particular by using a calculated amount of terminated primer(s).
  • deoxyribonucleic acid analogs conjugated to fluorescent labels are hybridized to the template nucleic acids, and excitation light sources are used to excite the fluorescent labels on the deoxyribonucleic acid analogs.
  • Detectors capture fluorescent emissions from the fluorescent labels and identify the deoxyribonucleic acid analogs.
  • the sequence of the template nucleic acids may be determined by repeatedly performing such sequencing cycles.
  • NGS allows for the sequencing of a number of different template nucleic acids simultaneously, which has significantly reduced the cost of sequencing in the last twenty years.
  • a solid support comprising: a plurality of first immobilised primers, wherein a proportion of the first immobilised primers are configured to be cleavable under first cleavage conditions; a plurality of second immobilised primers, wherein a proportion or substantially all of the second immobilised primers are configured to be cleavable under second cleavage conditions; and wherein the proportion of the first immobilised primers configured to be cleavable under first cleavage conditions is less than the proportion of the second immobilised primers configured to be cleavable under second cleavage conditions.
  • the solid support comprises at least one well, and wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within the well. In one aspect, the solid support comprises a plurality of wells, wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within each of the wells.
  • the proportion of first immobilised primers are configured to be cleavable by a thermal trigger, a light trigger, and/or a chemical/biochemical trigger. In one aspect, the proportion of first immobilised primers are configured to be cleavable by a glycosylase. In one aspect, the proportion of first immobilised primers are configured to be cleavable by a uracil glycosylase or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase). In one aspect, the proportion of first immobilised primers are configured to be cleavable by an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
  • each first immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the first immobilised primer is a DNA sequence; or wherein each first immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the first immobilised primer is an RNA sequence.
  • each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) or uracil when the first immobilised primer is a DNA sequence, or wherein each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is an RNA sequence.
  • each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence.
  • the proportion of second immobilised primers are configured to be cleavable by a thermal trigger, a light trigger, or a chemical/biochemical trigger. In one aspect, the proportion of second immobilised primers are configured to be cleavable by a glycosylase. In one aspect, the proportion of second immobilised primers are configured to be cleavable by a uracil glycosylase or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase). In one aspect, the proportion of second immobilised primers are configured to be cleavable by a uracil glycosylase.
  • each second immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the second immobilised primer is a DNA sequence; or wherein each second immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the second immobilised primer is an RNA sequence.
  • each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) or uracil when the second immobilised primer is a DNA sequence, or wherein each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the second immobilised primer is an RNA sequence.
  • each first immobilised primer comprises a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof; and each second immobilised primer comprises a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; or wherein each first immobilised primer comprises a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; and each second immobilised primer comprises a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof.
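The proportion constraint stated for the solid support (the cleavable proportion of first immobilised primers is less than the cleavable proportion of second immobilised primers) can be sketched as a simple check. The class and field names below (`PrimerPopulation`, `SolidSupport`, `cleavable_fraction`) are hypothetical and the numeric values are illustrative only; nothing here is taken from the application itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPopulation:
    """Hypothetical description of one immobilised primer population."""
    name: str
    cleavable_fraction: float  # proportion cleavable under its own cleavage conditions

    def __post_init__(self):
        if not 0.0 <= self.cleavable_fraction <= 1.0:
            raise ValueError("cleavable_fraction must lie in [0, 1]")


@dataclass(frozen=True)
class SolidSupport:
    """Hypothetical container pairing the two primer populations in a well."""
    first: PrimerPopulation
    second: PrimerPopulation

    def satisfies_constraint(self) -> bool:
        # The first population must have a strictly smaller cleavable proportion.
        return self.first.cleavable_fraction < self.second.cleavable_fraction


# Illustrative values: half of the first primers and all of the second are cleavable.
support = SolidSupport(
    first=PrimerPopulation("first immobilised primers", 0.5),
    second=PrimerPopulation("second immobilised primers", 1.0),
)
print(support.satisfies_constraint())
```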
  • Each second polynucleotide sequence may comprise a first adaptor sequence, wherein the first adaptor sequence comprises a portion, which is substantially complementary to the first immobilised primer (or is substantially complementary to the first immobilised primer).
  • the first adaptor sequence may be at a 3’-end of the second polynucleotide sequence.
  • a solution comprising a polynucleotide library prepared by ligating adaptor sequences to double-stranded polynucleotide sequences as described above may be flowed across a flowcell.
  • a particular polynucleotide strand from the polynucleotide library to be sequenced comprising, in a 5’ to 3’ direction, a second primer-binding complement sequence 302 (e.g. P7), a first terminal binding site complement 303’ (e.g. SBS12), a forward strand of the sequence 101, a second terminal sequencing primer binding site 304 (e.g. SBS3’) and a first primer-binding sequence 301’ (e.g. P5’), may anneal (via the first primer-binding sequence 301’) to the first immobilised primer 201 (e.g. P5 lawn primer) located within a particular well 203 (Figure 7A).
  • the polynucleotide library may comprise other polynucleotide strands with different forward strands of the sequence 101.
  • Such other polynucleotide strands may anneal to corresponding first immobilised primers 201 (e.g. P5 lawn primers) in different wells 203, thus enabling parallel processing of the various different strands within the polynucleotide library.
  • a new polynucleotide strand may then be synthesised, extending from the first immobilised primer 201 (e.g. P5 lawn primer) in a direction away from the substrate 204.
  • this generates a template strand comprising, in a 5’ to 3’ direction, the first immobilised primer 201 (e.g. P5 lawn primer) which is attached to the solid support 200, a second terminal sequencing primer binding site complement 304’ (e.g. SBS3), a forward strand of the template 101’ (which represents a type of “first portion”), a first terminal sequencing primer binding site 303 (which represents a type of “first sequencing primer binding site”) (e.g. SBS12’), and a second primer-binding sequence 302’ (e.g. P7’) (Figure 7B).
  • Such a process may utilise an appropriate polymerase, such as a DNA or RNA polymerase.
  • the polynucleotide strand from the polynucleotide library may then be dehybridised and washed away, leaving a template strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) (Figure 7C).
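The relationship between the annealed library strand and the template strand extended from the P5 lawn primer can be illustrated at the level of element labels. This is a minimal sketch: the function names are hypothetical, "insert" stands in for the forward strand of the sequence 101, and an ASCII apostrophe marks a complement. Because the new strand is the reverse complement of the library strand, the element order is reversed and each label is swapped for its complement.

```python
def complement_label(label: str) -> str:
    """Toggle the trailing prime that marks a complementary element."""
    return label[:-1] if label.endswith("'") else label + "'"


def extend_from_primer(library_strand: list[str]) -> list[str]:
    """Return the newly synthesised strand, 5' to 3', as element labels.

    The input lists the elements of the annealed library strand 5' to 3'.
    """
    return [complement_label(label) for label in reversed(library_strand)]


# Library strand as described: P7, SBS12, insert, SBS3', P5' (5' to 3').
library_strand = ["P7", "SBS12", "insert", "SBS3'", "P5'"]
template_strand = extend_from_primer(library_strand)
print(template_strand)  # ['P5', 'SBS3', "insert'", "SBS12'", "P7'"]
```

The result matches the order given for the template strand: P5, SBS3, the template portion, SBS12’ and P7’.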
  • the second primer-binding sequence 302’ (e.g. P7’) on the template strand may then anneal to a second immobilised primer 202 (e.g. P7 lawn primer) located within the well 203. This forms a “bridge” or “sequence bridge” (Figure 7D).
  • the strand attached to the second immobilised primer 202 may then be dehybridised from the strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) (Figure 7F).
  • a subsequent bridge amplification cycle can then lead to amplification of the strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) and the strand attached to the second immobilised primer 202 (e.g. P7 lawn primer).
  • further bridge amplification cycles may be conducted to increase the number of first polynucleotide sequences and second polynucleotide sequences within the well 203.
  • the “first portion” corresponds with the forward strand of the template 101’.
  • the “second portion” corresponds with the forward complement strand of the template 101.
  • the template provides information (e.g. identification of the genetic sequence, identification of epigenetic modifications) on the original target polynucleotide sequence.
  • a sequencing process e.g. a sequencing-by-synthesis or sequencing-by-ligation process
  • identification is meant here obtaining genetic information from the polynucleotide strand or polynucleotide strands. This may include identification of the genetic sequence of the polynucleotide strand or polynucleotide strands (i.e. sequencing). Furthermore, this may instead, or additionally, include identification of mismatched base pairs. In addition, this may instead, or additionally, include identification of any epigenetic modifications, for example methylation. Accordingly, “identification” may mean identification of the genetic sequence of the polynucleotide strand or polynucleotide strands, mismatched base pairs, and/or identification of any epigenetic modifications.
  • sequencing may be carried out using any suitable "sequencing-by-synthesis" technique, wherein nucleotides are added successively in cycles to the free 3' hydroxyl group, resulting in synthesis of a polynucleotide chain in the 5' to 3' direction.
  • the nature of the nucleotide added may be determined after each addition.
  • One particular sequencing method relies on the use of modified nucleotides that can act as reversible chain terminators. Such reversible chain terminators comprise removable 3' blocking groups.
  • the modified nucleotides may carry a label to facilitate their detection.
  • a label may be configured to emit a signal, such as an electromagnetic signal, or a (visible) light signal.
  • the label is a fluorescent label (e.g. a dye).
  • the label may be configured to emit an electromagnetic signal, or a (visible) light signal.
  • One method for detecting the fluorescently labelled nucleotides comprises using laser light of a wavelength specific for the labelled nucleotides, or the use of other suitable sources of illumination.
  • the fluorescence from the label on an incorporated nucleotide may be detected by a CCD camera or other suitable detection means. Suitable detection means are described in PCT/US2007/007991, the contents of which are incorporated herein by reference in their entirety.
  • the detectable label need not be a fluorescent label. Any label can be used which allows the detection of the incorporation of the nucleotide into the DNA sequence.
  • Each cycle may involve simultaneous delivery of four different nucleotide types to the array of template molecules.
  • different nucleotide types can be added sequentially and an image of the array of template molecules can be obtained between each addition step.
  • each nucleotide type may have a (spectrally) distinct label.
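The cycle structure described above (one reversibly blocked, labelled nucleotide incorporated per cycle, then detected, then deblocked) can be sketched as an idealised simulation. The function name and the error-free incorporation are assumptions made for illustration; real chemistry involves phasing, incomplete deblocking and detection noise that are not modelled here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequencing_by_synthesis(template_3_to_5: str, cycles: int) -> str:
    """Simulate idealised sequencing cycles and return the bases added, 5' to 3'.

    template_3_to_5 lists the template bases in the order the polymerase
    encounters them, starting next to the sequencing primer.
    """
    read = []
    for cycle in range(min(cycles, len(template_3_to_5))):
        # One reversibly blocked nucleotide is incorporated per cycle ...
        incorporated = COMPLEMENT[template_3_to_5[cycle]]
        # ... its label is detected, then the label and 3' block are removed.
        read.append(incorporated)
    return "".join(read)


print(sequencing_by_synthesis("TACGGA", 4))  # ATGC
```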
  • four channels may be used to detect four nucleobases (also known as 4-channel chemistry) (Figure 8 - left).
  • a first nucleotide type e.g. A
  • a second nucleotide type e.g. G
  • a second label e.g. configured to emit a second wavelength, such as blue light
  • a third nucleotide type e.g. T
  • a third label e.g.
  • one channel may be used to detect four nucleobases (also known as 1-channel chemistry) (Figure 8 - right).
  • a first nucleotide type e.g. A
  • a second nucleotide type e.g. G
  • a third nucleotide type e.g. T
  • a non-cleavable label e.g. configured to emit the wavelength, such as green light
  • a fourth nucleotide type e.g. C
  • a label-accepting site which does not include the label.
  • the sequencing process comprises a first sequencing read and second sequencing read.
  • the first sequencing read and the second sequencing read may be conducted concurrently. In other words, the first sequencing read and the second sequencing read may be conducted at the same time.
  • the first and second sequencing strands, and the clusters generated therefrom are spatially unresolved. Accordingly, the methods of the present invention allow the simultaneous sequencing of spatially unresolved clusters.
  • the methods of the invention provide a method of sequencing without the need for a paired-end turn and cluster re-synthesis. This in turn reduces the time taken to sequence a target polynucleotide, thus improving even further the efficiency of the sequencing protocol when using the method of the present invention.
  • a paired-end turn refers to the sequence of stages required to effectively invert the sequence for the second read in paired-end reading, after sequencing read 1 (Figure 7). The paired-end turn may be facilitated by a cycle of bridge amplification and linearization.
  • the present invention also eliminates the need for cluster re-synthesis of the first or second polynucleotide sequences for read 2 of conventional SBS workflows (Figure 7).
  • a method of preparing a first and second polynucleotide sequence for concurrent sequencing comprising:
  • the method of the invention may further comprise hybridising the first polynucleotide sequence to first immobilised primers on a solid support and hybridising the second polynucleotide sequence to second immobilised primers on a solid support; and synthesising a plurality of first and second polynucleotide sequences by conducting an amplification reaction to extend the first and second immobilised primers.
  • measurement of a first and second sequencing intensity is carried out sequentially.
  • the measurement of a first signal intensity generated from the first polynucleotide sequence is first, and the measurement of a second signal intensity generated from the second polynucleotide sequence is second - or vice versa.
  • the order of application is immaterial, so long as the first and second sequencing primers are applied separately and sequentially.
  • the method comprises applying (i.e. flowing across the surface of the solid support) a plurality of first (or second) sequencing primers (also known as a read 1 sequencing primer).
  • the first sequencing primers hybridise to the first sequencing primer binding site (e.g. first terminal sequencing primer binding site 303).
  • the method comprises applying a plurality of labelled first sequencing primers, and measuring the signal intensity generated from binding of the first sequencing primers to the first polynucleotide strand. This is the measurement of the first signal intensity described above.
  • the method comprises applying (i.e. flowing across the surface of the solid support) a plurality of first (or second) sequencing primers, where the sequencing primers are not labelled and conducting an extension reaction, as described above, to extend the first (or second) sequencing primer.
  • the sequencing primer may be extended by a plurality of labelled nucleotides, which are subsequently detected. In one embodiment, the sequencing primer is extended by just one labelled nucleotide. Alternatively, the sequencing primer is extended by 2, 3, 4, 5 or more labelled nucleotides.
  • the label comprises a cleavable covalent bond.
  • “cleavable covalent bond” refers to a covalent bond that can be cleaved, for example, under the application of heat, light or other (bio)chemical methods (e.g. by exposure to a degradation agent, such as an enzyme or a catalyst), while a “non-cleavable covalent bond” is stable to degradation under such conditions.
  • cleavable covalent bonds include thermally or photolytically cleavable cycloadducts (e.g.
  • X is the desired ratio of signal intensities. In one embodiment, X may be 2.
  • tR1_P is the terminated R1 primer
  • R1_P is the Read 1 sequencing primer (non-terminated)
  • R1_SI is a first signal intensity generated from a first nucleotide sequence
  • R2_SI is a second signal intensity generated from a second nucleotide sequence
  • first and second polynucleotide strands can be sequenced concurrently.
  • first sequence e.g. forward strand of the template
  • second sequence e.g. reverse strand of the template
  • sequencing errors are reduced because concurrently received signals can be better distinguished by 16 QAM analysis. This is illustrated in Figures 11B and 12.
  • improving spatially unresolved imaging for clusters by improving the SNR allows for smaller pitch and nanowell dimensions to be used.
  • improved SNR allows imaging technologies with lower numerical aperture to be used, allowing for faster imaging of larger areas. This in turn increases the throughput power of sequencing reads.
  • sequences to be identified comprise one or more index sequences
  • the index is typically read separately from read 1 and read 2, either before read 1 and before read 2, or afterwards. If afterwards, the extended sequencing primer is denatured and washed off the flowcell, and an index primer is hybridized for the several cycles of the index read.
  • the first portions and second portions may be different polynucleotide sequences. That is, the sequences may be genetically unrelated and/or derived from different sources.
  • the single (concatenated) polynucleotide strand with a first and second portion may comprise a first sequencing primer binding site and a second sequencing primer binding site, (used to sequence the first and second portions respectively) where the first sequencing primer binding site and second sequencing primer binding site are of a different sequence to each other and bind different sequencing primers.
  • a method of preparing a polynucleotide sequence comprising a first portion and a second portion for concurrent sequencing, wherein the method comprises: measuring a first signal intensity generated from a first portion of the polynucleotide sequence, and measuring a second signal intensity generated from a second portion of the polynucleotide sequence, and based on the first signal intensity and the second signal intensity, a subsequent signal intensity generated from sequencing the first nucleotide strand can be or is attenuated, wherein the attenuated subsequent signal intensity generated from sequencing the first polynucleotide strand has a lower intensity than a subsequent signal intensity generated from sequencing the second polynucleotide strand.
  • the measured intensity of a first and second signal as described above can be used to calculate the amount of terminated first (or second) sequencing primer to be added.
  • the method comprises applying a calculated amount of first (or second) terminated primer. Any amount of terminated sequencing primer that modulates the signal intensity ratio towards 2:1 or around 2:1 can be used.
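One way to picture how the two measured intensities could feed into a calculated amount of terminated primer is the sketch below. It is not the formula of the application, which is not reproduced in this excerpt. It assumes, hypothetically, that the signal from the attenuated read scales linearly with the fraction of non-terminated sequencing primer, and it targets a ratio X between the reference read and the attenuated read (2 in the embodiment mentioned above). The function and parameter names are illustrative.

```python
def terminated_primer_fraction(attenuated_si: float,
                               reference_si: float,
                               x: float = 2.0) -> float:
    """Fraction of the sequencing primer pool to supply as terminated primer.

    Assumption (not from the source): signal is proportional to the fraction
    of non-terminated primer, so the attenuated read should end up at
    reference_si / x.
    """
    if attenuated_si <= 0 or reference_si <= 0 or x <= 0:
        raise ValueError("intensities and ratio must be positive")
    keep = reference_si / (x * attenuated_si)  # non-terminated fraction needed
    keep = min(keep, 1.0)  # cannot keep more than all of the primer
    return 1.0 - keep


# Equal measured intensities and X = 2: half the primer pool is terminated.
print(terminated_primer_fraction(100.0, 100.0))  # 0.5
```

If the read to be attenuated is already at or below the target intensity, the sketch returns 0.0, meaning no terminated primer is needed under this simple model.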
  • Figure 9 is a scatter plot showing an example of sixteen distributions of signals generated by polynucleotide sequences disclosed herein. By creating different intensities of signals between the two clusters their identities can be resolved, despite them being spatially unresolved, i.e. within the same CMOS pixel.
  • the scatter plot of Figure 9 shows sixteen distributions (or bins) of intensity values from the combination of a brighter signal (i.e. a first signal as described herein) and a dimmer signal (i.e. a second signal as described herein); the two signals may be co-localized and may not be optically resolved as described above.
  • the intensity values shown in Figure 9 may be up to a scale or normalisation factor; the units of the intensity values may be arbitrary or relative (i.e., representing the ratio of the actual intensity to a reference intensity).
  • the sum of the brighter signal generated by the first portions and the dimmer signal generated by the second portions results in a combined signal.
  • the combined signal may be captured by a first optical channel and a second optical channel. Since the brighter signal may be A, T, C or G, and the dimmer signal may be A, T, C or G, there are sixteen possibilities for the combined signal, corresponding to sixteen distinguishable patterns when optically captured. That is, each of the sixteen possibilities corresponds to a bin shown in Figure 9.
  • the computer system can map the combined signal generated into one of the sixteen bins, and thus determine the added nucleobase at the first portion and the added nucleobase at the second portion, respectively.
  • the computer processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as C.
  • the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as T.
  • the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as G.
  • the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as A.
  • the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as C.
  • the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as T.
  • the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as G.
  • the processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as A.
  • T is configured to emit a signal in both the IMAGE 1 channel and the IMAGE 2 channel
  • A is configured to emit a signal in the IMAGE 1 channel only
  • C is configured to emit a signal in the IMAGE 2 channel only
  • G does not emit a signal in either channel.
  • A may be configured to emit a signal in both the IMAGE 1 channel and the IMAGE 2 channel
  • T may be configured to emit a signal in the IMAGE 1 channel only
  • C may be configured to emit a signal in the IMAGE 2 channel only
  • G may be configured to not emit a signal in either channel.
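The mapping of a combined two-channel signal into one of sixteen bins can be sketched as a nearest-centroid lookup. The sketch uses the first encoding listed above (T in both channels, A in IMAGE 1 only, C in IMAGE 2 only, G dark) and assumes an ideal 2:1 intensity ratio between the brighter and dimmer signals with no noise model; the constant and function names are hypothetical, and a production base caller would fit the bin centres from data.

```python
from itertools import product

# Two-channel encoding: (IMAGE 1, IMAGE 2) emission for each nucleobase.
ENCODING = {"T": (1, 1), "A": (1, 0), "C": (0, 1), "G": (0, 0)}
BRIGHT, DIM = 2.0, 1.0  # assumed 2:1 intensity ratio between the two signals

# Expected combined signal for each (brighter base, dimmer base) pair.
CENTROIDS = {
    (b1, b2): tuple(BRIGHT * u + DIM * v
                    for u, v in zip(ENCODING[b1], ENCODING[b2]))
    for b1, b2 in product("ACGT", repeat=2)
}


def call_bases(image1: float, image2: float) -> tuple[str, str]:
    """Assign a combined two-channel signal to the nearest of the sixteen bins."""
    return min(
        CENTROIDS,
        key=lambda pair: (CENTROIDS[pair][0] - image1) ** 2
        + (CENTROIDS[pair][1] - image2) ** 2,
    )


print(len(set(CENTROIDS.values())))  # 16 distinct bin centres
print(call_bases(2.1, 0.9))  # ('A', 'C')
```

With a 2:1 ratio each channel takes one of four levels (0, 1, 2 or 3), so the sixteen base pairs land on a 4 x 4 grid, which is why the combined signal can be resolved even though the two sources share a pixel.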
  • Figure 10 is a flow diagram showing a method 1700 of base calling according to the present disclosure.
  • the described method allows for simultaneous sequencing of two (or more) portions (e.g. the first portion and the second portion) in a single sequencing run from a single combined signal obtained from the first portion and the second portion, thus requiring less sequencing reagent consumption and faster generation of data from both the first portion and the second portion.
  • the simplified method may reduce the number of workflow steps while producing the same yield as compared to existing next-generation sequencing methods. Thus, the simplified method may result in reduced sequencing runtime.
  • the disclosed method 1700 may start from block 1701. The method may then move to block 1710.
  • first polynucleotide sequences are extended from the first immobilised primers and second polynucleotide sequences are extended from the second immobilised primers
  • this difference in proportions that are cleavable in the first immobilised primers and the second immobilised primers provides a way of selectively processing the first polynucleotide sequences relative to the second polynucleotide sequences.
  • the second immobilised primer is different in sequence to the first immobilised primer.
  • first immobilised primers i.e. other than the first immobilised primers that are configured to be cleavable under first cleavage conditions
  • the remaining population of first immobilised primers are not cleavable under the first cleavage conditions. As such, only some of the total population of first immobilised primers will become cleaved when the solid support is exposed to the first cleavage conditions.
  • the remaining population of second immobilised primers i.e. other than the second immobilised primers that are configured to be cleavable under second cleavage conditions
  • the remaining population of second immobilised primers are not cleavable under the second cleavage conditions.
  • there is no (or substantially no) remaining population of second immobilised primers. As such, some or substantially all of the total population of second immobilised primers will become cleaved when the solid support is exposed to the second cleavage conditions.
  • the phosphate linkage is located at a 5’-end or a 3’-end of a nucleotide comprising an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or an RNA sequence).
  • first cleavage conditions refers to reaction conditions that cause cleavage within the first immobilised primer (i.e. at the first cleavage site).
  • the first cleavage conditions may involve exposure to a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate).
  • the “first cleavage conditions” may allow linearisation to occur, and may be referred to as “first linearisation conditions”.
  • the proportion of first immobilised primers may be configured to be cleavable by a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate).
  • the proportion of first immobilised primers may be configured to be cleavable by a glycosylase.
  • the first cleavage conditions involve exposure to a glycosylase.
  • the proportion of first immobilised primers may be configured to be cleavable by a glycosylase that recognises any nitrogenous base (e.g. purine or pyrimidine) which is not selected from guanine (G), cytosine (C), adenine (A) and thymine (T) when the first immobilised primer is a DNA sequence; or the proportion of first immobilised primers may be configured to be cleavable by a glycosylase that recognises any nitrogenous base (e.g.
  • the glycosylase may recognise an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or an RNA sequence).
  • unnatural nucleobases may include oxoguanine (e.g. 8-oxoguanine), hypoxanthine, xanthine, methylguanines (e.g. O⁶-methylguanine, N⁷-methylguanine), methyladenines (e.g.
  • methylcytosines (e.g. 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine)
  • dihydrouracil, inosine
  • uracil (if the first immobilised primer is a DNA sequence).
  • the proportion of first immobilised primers may be configured to be cleavable by a uracil glycosylase (when the first immobilised primer is a DNA sequence) or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase); and in a further embodiment, an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
  • each first immobilised primer that is cleavable may comprise a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the first immobilised primer is a DNA sequence, or wherein each first immobilised primer that is cleavable may comprise a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the first immobilised primer is an RNA sequence.
  • each first immobilised primer that is cleavable may comprise an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or an RNA sequence).
  • examples of unnatural nucleobases may include oxoguanine (e.g. 8-oxoguanine), hypoxanthine, xanthine, methylguanines (e.g. O⁶-methylguanine, N⁷-methylguanine), methyladenines (e.g. 3-methyladenine, N⁶-methyladenine), modified cytosines including methylcytosines (e.g. 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine), dihydrouracil, inosine, and uracil (if the first immobilised primer is a DNA sequence).
  • each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) or uracil when the first immobilised primer is a DNA sequence, or wherein each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is an RNA sequence; and in an even further embodiment, wherein each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence.
  • oxoguanine e.g. 8-oxoguanine
  • uracil when the first immobilised primer is a DNA sequence
  • each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is an RNA sequence
  • the proportion of second immobilised primers may be configured to be cleavable by a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate).
  • a thermal trigger e.g. by heating
  • a light trigger e.g. by exposure to ultraviolet light
  • a chemical/biochemical trigger e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate.
  • the glycosylase may recognise an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or an RNA sequence).
  • unnatural nucleobases may include oxoguanine (e.g. 8-oxoguanine), hypoxanthine, xanthine, methylguanines (e.g. O6-methylguanine, N7-methylguanine), methyladenines (e.g.
  • the proportion of first immobilised primers cleavable under first cleavage conditions relative to a total population of first immobilised primers may be between 0.2 to 0.8, between 0.25 to 0.75 (in a further embodiment), between ⅓ to ⅔ (in an even further embodiment), or about 0.5 (in a yet even further embodiment); whilst respective proportions of first immobilised primers which are not cleavable under first cleavage conditions relative to a total population of first immobilised primers may be between 0.8 to 0.2, between 0.75 to 0.25 (in the further embodiment), between ⅔ to ⅓ (in the even further embodiment), or about 0.5 (in the yet even further embodiment) (wherein the proportion of first immobilised primers cleavable under first cleavage conditions and the proportion of first immobilised primers which are not cleavable under first cleavage conditions sums to 1).
  • a surface of the solid support may comprise at least one first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group; in a further embodiment, non-covalent interactions or covalent bonds.
  • the surface of the solid support may comprise a plurality of first linking groups.
  • first immobilised primer and/or the second immobilised primer may comprise a first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group; in a further embodiment, covalent bonds.
  • the first linking groups are advantageous because these can form cross-links with the template strands, thus allowing the template strands to be fixed to the surface of the solid support and/or the immobilised primers, and preventing them from becoming washed away (e.g. during template melting). Although not strictly necessary, this may be useful during amplification, clustering or sequencing of the template, particularly when the first immobilised primer and/or the second immobilised primer is cleaved on exposure to the first cleavage conditions and/or second cleavage conditions.
  • the first linking groups may be located only within the well (or plurality of wells). In other words, a region outside the well (or plurality of wells) may not comprise the first linking groups. In particular, a region outside the well (or plurality of wells) may not comprise the first linking groups, as well as not comprising the first immobilised primers and the second immobilised primers.
  • the first linking groups may be capable of forming non-covalent interactions.
  • These non-covalent interactions may include one or more of ionic bonds, hydrogen bonds, hydrophobic interactions, π-π interactions, van der Waals interactions and host-guest interactions.
  • the type of interaction is not particularly limited, provided that the interactions are (collectively) sufficiently strong for the template strands to remain attached to the solid support during amplification, clustering or sequencing.
  • the term “ionic bond” refers to a chemical bond between two or more ions that involves an electrostatic attraction between a cation and an anion.
  • the cation may be selected from “metal cations”, as described herein, or “non-metal cations”.
  • Non-metal cations may include ammonium salts (e.g. alkylammonium salts) or phosphonium salts (e.g. alkylphosphonium salts).
  • the anion may be selected from phosphates, thiophosphates, phosphonates, thiophosphonates, phosphinates, thiophosphinates, sulfates, sulfonates, sulfites, sulfinates, carbonates, carboxylates, alkoxides, phenolates and thiophenolates.
  • hydrogen bond refers to a bonding interaction between a lone pair on an electron-rich atom (e.g. nitrogen, oxygen or fluorine) and a hydrogen atom attached to an electronegative atom (e.g. nitrogen or oxygen).
  • electron-rich atom e.g. nitrogen, oxygen or fluorine
  • hydrogen atom attached to an electronegative atom (e.g. nitrogen or oxygen).
  • the term “host-guest interaction” refers to two or more groups which are able to form bound complexes via one or more types of non-covalent interactions by molecular recognition, such as ionic bonding, hydrogen bonding, hydrophobic interactions, van der Waals interactions and π-π interactions.
  • the host-guest interaction may include interactions formed between cucurbiturils with adamantanes (e.g. 1-adamantylamine), ammonium ions (e.g. amino acids), ferrocenes; cyclodextrins with adamantanes (e.g. 1-adamantylamine), ammonium ions (e.g.
  • ferrocenes calixarenes with adamantanes (e.g. 1-adamantylamine), ammonium ions (e.g. amino acids), ferrocenes; crown ethers (e.g. 18-crown-6, 15-crown-5, 12-crown-4) or cryptands (e.g. [2.2.2]cryptand) with cations (e.g. metal cations, ammonium ions); avidins (e.g. streptavidin) and biotin; and antibodies and haptens.
  • adamantanes e.g. 1-adamantylamine
  • ammonium ions e.g. amino acids
  • ferrocenes e.g. 18-crown-6, 15-crown-5, 12-crown-4
  • cryptands e.g. [2.2.2]cryptand
  • the first linking groups may be capable of forming covalent bonds.
  • the bond may be stable such that the template strands remain attached to the solid support during amplification, clustering or sequencing.
  • covalent bonds include cycloadducts (e.g. triazole cycloadducts, cyclobutane cycloadducts, furan-maleimide cycloadducts), alkylene linkages, alkenylene linkages, esters, amides, acetals, hemiaminal ethers, aminals, imines, hydrazones, polysulfide linkages (e.g. disulfide linkages), boron-based linkages (e.g. boronic and borinic acids/esters), silicon-based linkages (e.g. silyl ether, siloxane), and phosphorus-based linkages (e.g. phosphite, phosphate, thiophosphate).
  • cycloadducts e.g. triazole cycloadducts, cyclobutane cycloadducts, furan-maleimide cycloadducts
  • cycloadduct refers to a cyclic structure formed from a cycloaddition reaction between two components (e.g. Diels-Alder type cycloadditions between a diene and a dienophile, including inverse Diels-Alder type cycloadditions, 1,3-dipolar type cycloadditions between a dipole and a dipolarophile, or [2 + 2] cycloadditions between two alkenes).
  • Diels-Alder type cycloadditions between a diene and a dienophile including inverse Diels-Alder type cycloadditions, 1,3-dipolar type cycloadditions between a dipole and a dipolarophile, or [2 + 2] cycloadditions between two alkenes).
  • alkyl or alkylene refers to monovalent or divalent straight and branched chain groups respectively having from 1 to 12 carbon atoms.
  • the alkyl or alkylene groups are straight or branched alkyl or alkylene groups having from 1 to 6 carbon atoms; in a yet further embodiment, straight or branched alkyl or alkylene groups having from 1 to 4 carbon atoms.
  • An alkyl or alkylene group may comprise one or more “substituents”, as described herein.
  • “alkenyl” or “alkenylene” refers to monovalent or divalent straight and branched chain groups respectively having from 1 to 12 carbon atoms, and which comprise at least one carbon-carbon double bond.
  • the alkenyl or alkenylene groups are straight or branched alkenyl or alkenylene groups having from 1 to 6 carbon atoms; in a yet further embodiment, straight or branched alkenyl or alkenylene groups having from 1 to 4 carbon atoms.
  • An alkenyl or alkenylene group may comprise one or more “substituents”, as described herein.
  • alkynyl refers to monovalent straight and branched chain groups respectively having from 1 to 12 carbon atoms, and which comprise at least one carbon-carbon triple bond.
  • the alkynyl groups are straight or branched alkynyl groups having from 1 to 6 carbon atoms; in a yet further embodiment, straight or branched alkynyl groups having from 1 to 4 carbon atoms.
  • An alkynyl group may comprise one or more “substituents”, as described herein.
  • amino refers to a -N(R)(R’) group, where R and R’ are independently hydrogen or a “substituent” as defined herein.
  • amine linkage refers to a -NR- group, and where R is hydrogen or a “substituent” as defined herein.
  • aryl refers to a monovalent monocyclic, bicyclic or tricyclic aromatic group respectively containing from 6 to 14 carbon atoms in the ring. Common aryl groups include C6-C14 aryl, for example, Ce-C aryl. An aryl group may comprise one or more “substituents”, as described herein.
  • a “heterocyclyl” group refers to a monovalent saturated or partially saturated 3 to 7 membered monocyclic, or 7 to 10 membered bicyclic ring system respectively, which consists of carbon atoms and from one to four heteroatoms independently selected from the group consisting of O, N, and S, wherein the nitrogen and sulfur heteroatoms may be optionally oxidised, the nitrogen may be optionally quaternised, and includes any bicyclic group in which any of the above-defined rings is fused to a benzene ring, and wherein the ring may be substituted on carbon or on a nitrogen atom if the resulting compound is stable.
  • heterocyclyl groups include pyrrolidinyl, tetrahydrofuranyl, dihydrofuranyl, tetrahydrothienyl, tetrahydrothiopyranyl, isoxazolinyl, piperidyl, morpholinyl, thiomorpholinyl, thioxanyl, piperazinyl, azetidinyl, oxetanyl, thietanyl, homopiperidyl, oxepanyl, thiepanyl, oxazepinyl, diazepinyl, thiazepinyl, 1,2,3,6-tetrahydropyridyl, 2-pyrrolinyl, 3-pyrrolinyl, indolinyl, 2H-pyranyl, 4H-pyranyl, dioxanyl, 1,3-dioxolanyl, pyrazolinyl, di
  • a heterocyclyl group may comprise one or more “substituents”, as described herein.
  • heteroaryl group refers to monovalent aromatic groups having 5 to 14 ring atoms respectively (for example, 5 to 10 ring atoms) and containing carbon atoms and 1, 2 or 3 oxygen, nitrogen or sulfur heteroatoms.
  • Non-limiting examples of “heteroaryl” groups include quinolyl including 8-quinolyl, isoquinolyl, coumarinyl including 8-coumarinyl, pyridyl, pyrazinyl, pyrazolyl, pyrimidinyl, pyridazinyl, furyl, pyrrolyl, thienyl, thiazolyl, isothiazolyl, triazolyl (e.g.
  • tetrazolyl isoxazolyl, oxazolyl, imidazolyl, indolyl, isoindolyl, indazolyl, indolizinyl, phthalazinyl, pteridinyl, purinyl, oxadiazolyl, thiadiazolyl, furazanylene, pyridazinyl, triazinyl, cinnolinyl, benzimidazolyl, benzofuranyl, benzofurazanyl, benzothiophenyl, benzothiazolyl, benzoxazolyl, quinazolinyl, quinoxalinyl, naphthyridinyl and furopyridyl.
  • heteroaryl (or heteroarylene) group contains a nitrogen atom in a ring
  • nitrogen atom may be in the form of an N-oxide, e.g., a pyridyl N-oxide, pyrazinyl N-oxide, pyrimidinyl N-oxide and pyridazinyl N-oxide.
  • a heteroaryl group may comprise one or more “substituents”, as described herein.
  • acetal refers to a -OC(R)(R’)O- group, where R and R’ are independently hydrogen or a “substituent” as described herein.
  • hemiaminal ether refers to a -OC(R)(R’)NR”- group, where R, R’ and R” are independently hydrogen or a “substituent” as described herein.
  • the term “aminal” refers to a -NRC(R’)(R”)NR”’- group, where R, R’, R” and R”’ are independently hydrogen or a “substituent” as described herein.
  • polysulfide refers to a -(S)n- group, wherein n is 2 to 10, or 2 to 6.
  • n may be 2, forming a “disulfide” linkage.
  • boron-based linkage refers to a -(O)a-B(OR)-(O)b- group, where R is independently hydrogen or a “substituent” as described herein, and where a and b are independently 0 or 1.
  • silicon-based linkage refers to a -(O)a-Si(R)(R’)-(O)b- group, where R and R’ are independently hydrogen or a “substituent” as described herein, and where a and b are independently 0 or 1.
  • phosphorus-based linkage refers to a -(O)a-P(R)-(O)b- group, where R and R’ are independently hydrogen or a “substituent” as described herein, and where a and b are independently 0 or 1.
  • substituents may be chosen from the foregoing list.
  • each R’ may be the same or different.
  • the first linking groups may be capable of forming metal-coordination bonds. Where metal-coordination bonds are used, the bond may be strong enough such that the template strands remain attached to the solid support during amplification, clustering or sequencing.
  • the non-terminated first sequencing primers comprise or consist of a sequence selected from SEQ ID NOs: 7 to 10 or a variant or fragment thereof and the second sequencing primers comprise or consist of a different sequence selected from SEQ ID NOs: 7 to 10, or a variant or fragment thereof.
  • the data processing device may comprise a solid support as described herein, such as a flow cell.
  • a computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out the methods as described herein.
  • a software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of computer-readable storage medium known in the art.
  • An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor.
  • the processor and the storage medium can reside in an ASIC.
  • a software module can comprise computer-executable instructions which cause a hardware processor to execute the computer-executable instructions.
  • Computer-executable instructions may be stored in a (transitory or non-transitory) computer readable storage medium (e.g., memory, storage system, etc.) storing code, or computer readable instructions.
  • a (transitory or non-transitory) computer readable storage medium e.g., memory, storage system, etc.
  • Disjunctive language such as the phrase “at least one of X, Y or Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y or Z, or any combination thereof (e.g., X, Y and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y or at least one of Z to each be present.
  • the terms “about” or “approximate” and the like are synonymous and are used to indicate that the value modified by the term has an understood range associated with it, where the range can be ±20%, ±15%, ±10%, ±5%, or ±1%.
  • the term “substantially” is used to indicate that a result (e.g., measurement value) is close to a targeted value, where close can mean, for example, the result is within 80% of the value, within 90% of the value, within 95% of the value, or within 99% of the value.
  • the term “partially” is used to indicate that an effect is only in part or to a limited extent.
  • “a device configured to” or “a device to” are intended to include one or more recited devices.
  • Such one or more recited devices can also be collectively configured to carry out the stated recitations.
  • a processor to carry out recitations A, B and C can include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.
  • a PAZAM coated, polished, non-grafted HiSeq4K flowcell was grafted with 10 µM P7-8oxoG and 10 µM P5-U (lanes 1 and 2): oligo mixes were made up in 1.125 M Na2SO4 with 0.1% Tween20. 175 µl of this buffer was spiked with the appropriate primers and used in the grafting reaction on an Illumina cBot. The grafting mix was pumped onto the flowcell surface and incubated at 60 °C for 60 mins to allow the click chemistry between the 5’ BCNs on the oligos and the free azides to take place.
  • the flowcell lanes were washed with HT1 buffer and then oligo grafting checked via a TET-QC assay (hybridisation of TET labelled complements to the P5/P7 oligo sequences, Typhoon instrument scanning to assess the levels of TET signal in each lane).
  • TET-QC oligos were removed by 0.1 N NaOH dehyb before the flowcell was used to make clusters.
  • lanes were prepared for either “read 1” or “read 2” by linearising the P5 oligos with LMX1 (for read 1 in lane 1) or linearising the linearizable P7 oligos with FpG (PLM2v2 reagent, HiSeqX PE kit) (for read 2 in lane 2). Linearisation for both cases was for 30 mins at 38 °C. Lanes were then primer hybed with either HP10 (read 1 primer mix) or HP11 (read 2 primer mix) as appropriate.
  • R1 linearised lanes all show similar first cycle intensity (lanes 1, 3, 5 and 7), while the R2 linearised lanes (lanes 2, 4, 6 and 8) show decreasing intensity due to the increasing concentration of P7-nonlin within the grafting mix.
  • Lanes 3 and 4 in particular show an almost 2:1 ratio of first cycle intensities as desired, but the flowcell in general shows how the ratio is completely tunable as required.
  • SEQ ID NO. 2 P7 sequence
  • SEQ ID NO. 4 P7’ sequence (complementary to P7)
  • SEQ ID NO. 6 Alternative P5’ sequence (complementary to alternative P5 sequence)
  • SEQ ID NO. 7 SBS3
  • SEQ ID NO. 9 SBS12
  • SEQ ID NO. 12 BCN-P7-8oxoG

Landscapes

  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Aspects relate to solid supports and methods for use in nucleic acid sequencing, in particular solid supports and methods for use in concurrent sequencing. The invention also relates to methods and kits for use in nucleic acid sequencing, in particular methods for use in optimising the signal-to-noise ratio in simultaneous sequencing, in particular by using a calculated amount of terminated primer(s).

Description

OPTIMISED NUCLEIC ACID SEQUENCING
Field of the Invention
Aspects relate to solid supports and methods for use in nucleic acid sequencing, in particular solid supports and methods for use in concurrent sequencing. The invention also relates to methods and kits for use in nucleic acid sequencing, in particular methods for use in optimising the signal-to-noise ratio in simultaneous sequencing, in particular by using a calculated amount of terminated primer(s).
Background of the Invention
In some types of next-generation sequencing (NGS) technologies, a nucleic acid cluster is created on a flow cell by amplifying an original template nucleic acid strand. Sequencing cycles may be performed as complementary strands of the template nucleic acids are being synthesized, i.e., using sequencing-by-synthesis (SBS) processes.
In each sequencing cycle, deoxyribonucleic acid analogs conjugated to fluorescent labels are hybridized to the template nucleic acids, and excitation light sources are used to excite the fluorescent labels on the deoxyribonucleic acid analogs. Detectors capture fluorescent emissions from the fluorescent labels and identify the deoxyribonucleic acid analogs. As a result, the sequence of the template nucleic acids may be determined by repeatedly performing such sequencing cycles.
NGS allows for the sequencing of a number of different template nucleic acids simultaneously, which has significantly reduced the cost of sequencing in the last twenty years. However, there remains a desire for further improvements in sequencing throughput and speed.
Summary
According to an aspect of the present invention, there is provided a solid support, comprising: a plurality of first immobilised primers, wherein a proportion of the first immobilised primers are configured to be cleavable under first cleavage conditions; a plurality of second immobilised primers, wherein a proportion or substantially all of the second immobilised primers are configured to be cleavable under second cleavage conditions; and wherein the proportion of the first immobilised primers configured to be cleavable under first cleavage conditions is less than the proportion of the second immobilised primers configured to be cleavable under second cleavage conditions.
In one aspect, the solid support comprises at least one well, and wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within the well. In one aspect, the solid support comprises a plurality of wells, wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within each of the wells.
In one aspect, the proportion of first immobilised primers are configured to be cleavable by a thermal trigger, a light trigger, and/or a chemical/biochemical trigger. In one aspect, the proportion of first immobilised primers are configured to be cleavable by a glycosylase. In one aspect, the proportion of first immobilised primers are configured to be cleavable by a uracil glycosylase or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase). In one aspect, the proportion of first immobilised primers are configured to be cleavable by an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
In one aspect, each first immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the first immobilised primer is a DNA sequence; or wherein each first immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the first immobilised primer is an RNA sequence.
In one aspect, each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) or uracil when the first immobilised primer is a DNA sequence, or wherein each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is an RNA sequence. In one aspect, each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence.
In one aspect, the proportion of second immobilised primers are configured to be cleavable by a thermal trigger, a light trigger, or a chemical/biochemical trigger. In one aspect, the proportion of second immobilised primers are configured to be cleavable by a glycosylase. In one aspect, the proportion of second immobilised primers are configured to be cleavable by a uracil glycosylase or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase). In one aspect, the proportion of second immobilised primers are configured to be cleavable by a uracil glycosylase.
In one aspect, each second immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the second immobilised primer is a DNA sequence; or wherein each second immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the second immobilised primer is an RNA sequence.
In one aspect, each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) or uracil when the second immobilised primer is a DNA sequence, or wherein each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the second immobilised primer is an RNA sequence.
In one aspect, each second immobilised primer that is cleavable comprises uracil when the second immobilised primer is a DNA sequence.
In one aspect, each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable comprises uracil when the second immobilised primer is a DNA sequence; or wherein each first immobilised primer that is cleavable comprises uracil when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the second immobilised primer is a DNA sequence.
In one aspect, the first cleavage conditions and the second cleavage conditions are the same or are different. In one aspect, the first cleavage conditions and the second cleavage conditions are different.
In one aspect, a ratio between the first immobilised primers configured to be cleavable under first cleavage conditions and the second immobilised primers configured to be cleavable under second cleavage conditions is between 1:1.25 to 1:5. In one aspect, the ratio is between 1:1.5 to 1:3. In one aspect, the ratio is about 1:2.
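As an illustration of the ratio ranges just recited, the following minimal sketch checks whether a pair of cleavable-primer amounts falls inside a 1:x window. The function name `ratio_in_range` and the example amounts are hypothetical and are not taken from the specification; the amounts may be in any common unit (for example, primers per well).

```python
def ratio_in_range(first_cleavable: float, second_cleavable: float,
                   low: float = 1.25, high: float = 5.0) -> bool:
    """Return True if first:second can be written as 1:x with low <= x <= high.

    Hypothetical helper; the default bounds are the broadest range recited
    in the text (1:1.25 to 1:5).
    """
    if first_cleavable <= 0 or second_cleavable <= 0:
        raise ValueError("amounts must be positive")
    x = second_cleavable / first_cleavable
    return low <= x <= high


# Illustrative amounts only: 50 cleavable first primers per 100 cleavable
# second primers is a 1:2 ratio.
print(ratio_in_range(50, 100))              # broadest range, 1:1.25 to 1:5
print(ratio_in_range(50, 100, 1.5, 3.0))    # narrower range, 1:1.5 to 1:3
print(ratio_in_range(100, 100))             # 1:1 lies outside the range
```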
In one aspect, the proportion of first immobilised primers cleavable under first cleavage conditions relative to a proportion of first immobilised primers which are not cleavable under first cleavage conditions is between 20:80 to 80:20.
In one aspect, the proportion is between 1:3 to 3:1. In one aspect, the proportion is between 1:2 to 2:1. In one aspect, the proportion is about 1:1.
In one aspect, the proportion of first immobilised primers cleavable under first cleavage conditions relative to a total population of first immobilised primers is between 0.2 to 0.8. In one aspect, the proportion is between 0.25 to 0.75. In one aspect, the proportion is between ⅓ to ⅔. In one aspect, the proportion is about 0.5.
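The cleavable:non-cleavable ratios and the proportions relative to the total population describe the same quantities in two forms: a ratio a:b corresponds to a cleavable share of a/(a+b), with the non-cleavable share being the remainder. The sketch below works through the ratio end-points recited above; the helper name `cleavable_fraction` is hypothetical.

```python
from fractions import Fraction


def cleavable_fraction(cleavable: int, non_cleavable: int) -> Fraction:
    """Convert a cleavable:non-cleavable ratio into the cleavable share of the total."""
    return Fraction(cleavable, cleavable + non_cleavable)


# Ratio end-points written as (cleavable, non_cleavable) parts.
for parts in [(20, 80), (80, 20), (1, 3), (3, 1), (1, 2), (2, 1), (1, 1)]:
    share = cleavable_fraction(*parts)
    # The non-cleavable share is the remainder, so the two shares sum to 1.
    print(f"{parts[0]}:{parts[1]} -> cleavable {share}, non-cleavable {1 - share}")
```

Exact fractions are used so that thirds are not rounded.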
In one aspect, the proportion of second immobilised primers cleavable under second cleavage conditions relative to a total population of second immobilised primers is 0.9 or more. In one aspect, the proportion is 0.95 or more.
In one aspect, substantially all of the second immobilised primers are cleavable under second cleavage conditions.
In one aspect, each first immobilised primer comprises a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof; and each second immobilised primer comprises a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; or wherein each first immobilised primer comprises a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; and each second immobilised primer comprises a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof.
In one aspect, a surface of the solid support comprises at least one first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group.
In one aspect, the surface of the solid support comprises at least one first linking group capable of forming non-covalent interactions or covalent bonds with the second linking group. In one aspect, the first linking group is capable of forming non-covalent interactions, and the first linking group comprises a biotin moiety or an avidin (e.g. streptavidin). In one aspect, the first linking group comprises an avidin (e.g. streptavidin).
In one aspect, the first linking group is capable of forming covalent bonds, and the first linking group comprises a hydroxyl group, an alkyne (e.g. a terminal alkyne) or an azide (e.g. poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide), PAZAM). In one aspect, the first linking group comprises an azide (e.g. poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide), PAZAM).
In one aspect, the first immobilised primer and/or the second immobilised primer comprises a first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group. In one aspect, the first linking group is capable of forming covalent bonds with the second linking group.
In one aspect, the first linking group is capable of forming covalent bonds, and the first linking group comprises an alkene moiety (e.g. an electron-deficient alkene such as 3-cyanovinylcarbazole).
In one aspect, the solid support is a flow cell.
According to another aspect of the present invention, there is provided a kit comprising a solid support as described herein.
According to another aspect of the present invention, there is provided a use of a solid support as described herein in nucleic acid sequencing.
According to another aspect of the present invention, there is provided a process of manufacturing a solid support, comprising:
(a) immobilising a plurality of first precursor primers onto a solid support to form a plurality of first immobilised primers, wherein a proportion of the first precursor primers are configured to be cleavable under first cleavage conditions; and
(b) immobilising a plurality of second precursor primers onto the solid support to form a plurality of second immobilised primers, wherein a proportion or substantially all of the second precursor primers are configured to be cleavable under second cleavage conditions; wherein the proportion of the first precursor primers configured to be cleavable under first cleavage conditions is less than the proportion of the second precursor primers configured to be cleavable under second cleavage conditions.
In one aspect, steps (a) and (b) are conducted sequentially or simultaneously. In one aspect, step (b) is conducted after step (a). In one aspect, step (a) is conducted after step (b). In one aspect, steps (a) and (b) are conducted simultaneously.
In one aspect, a ratio of a concentration of first precursor primers that are configured to be cleavable under first cleavage conditions used in step (a) compared to a concentration of second precursor primers that are configured to be cleavable under second cleavage conditions used in step (b) is between 0.2 to 0.8. In one aspect, the ratio is between 0.25 to 0.75. In one aspect, the ratio is between 0.5 to 0.75.
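One way to read the concentration ratio above is as a recipe for splitting the first precursor primers into cleavable and non-cleavable portions. The following is a minimal sketch under stated assumptions: all concentrations share one unit, the total first-primer concentration is fixed, and the helper name `first_primer_mix` and the example values are hypothetical rather than taken from the specification.

```python
def first_primer_mix(second_cleavable_conc: float, target_ratio: float,
                     first_total_conc: float) -> tuple[float, float]:
    """Return (cleavable, non_cleavable) first-primer concentrations.

    target_ratio is the cleavable first-primer concentration divided by the
    cleavable second-primer concentration (0.2 to 0.8 in the broadest range).
    """
    if not 0.2 <= target_ratio <= 0.8:
        raise ValueError("target_ratio outside the 0.2 to 0.8 range")
    cleavable = target_ratio * second_cleavable_conc
    if cleavable > first_total_conc:
        raise ValueError("cleavable portion exceeds the total first-primer concentration")
    # Whatever is not cleavable makes up the rest of the first-primer total.
    return cleavable, first_total_conc - cleavable


# Illustrative values only (arbitrary but consistent concentration units).
cleavable, non_cleavable = first_primer_mix(10.0, 0.5, 10.0)
print(cleavable, non_cleavable)
```

The sketch treats the mix concentrations as a design input only; how they translate into immobilised primer densities on the surface is not modelled here.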
In one aspect, immobilisation comprises forming covalent linkages between the solid support and each of the plurality of first precursor primers, and between the solid support and each of the plurality of second precursor primers. In one aspect, forming covalent linkages involves using a click reaction. In one aspect, forming covalent linkages involves forming a 1,2,3-triazole linkage.
According to another aspect of the present invention, there is provided a method of preparing polynucleotide sequences for identification, comprising: providing a solid support as described herein, and synthesising a plurality of first polynucleotide sequences each comprising first portions and each extending from the first immobilised primers, and a plurality of second polynucleotide sequences each comprising second portions and each extending from the second immobilised primers.
In one aspect, the method further comprises a step of exposing the solid support to first cleavage conditions and/or second cleavage conditions after the step of synthesising the plurality of first polynucleotide sequences and the second polynucleotide sequences.
In one aspect, the first cleavage conditions and/or second cleavage conditions comprise exposure to a thermal trigger, a light trigger, and/or a chemical/biochemical trigger. In one aspect, the solid support is exposed to a glycosylase. In one aspect, the solid support is exposed to a uracil glycosylase and/or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase). In one aspect, the method further comprises a step of blocking 3’-ends of the first polynucleotide sequences and the second polynucleotide sequences to prevent further extension of the first polynucleotide sequences and the second polynucleotide sequences.
In one aspect, the method further comprises a step of removing first immobilised primers and/or second immobilised primers that are not extended.
In one aspect, the solid support is a solid support wherein a surface of the solid support comprises at least one first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group, wherein the first polynucleotide sequence and/or the second polynucleotide sequence comprises a second linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with the first linking group, and the method further comprises a step of forming the non-covalent interaction, covalent bond and/or metal-coordination bond between the first linking group and the second linking group.
In one aspect, the second linking group is capable of forming non-covalent interactions, and the second linking group comprises an avidin (e.g. streptavidin) or a biotin moiety.
In one aspect, the second linking group comprises a biotin moiety.
In one aspect, the second linking group is capable of forming covalent bonds, and the second linking group comprises a thiophosphate, an azide or an alkyne (e.g. a terminal alkyne).
In one aspect, the non-covalent interaction comprises an avidin-biotin interaction (e.g. a streptavidin-biotin interaction).
In one aspect, the covalent bond comprises a thiophosphate ester linkage, or a triazole linkage.
In one aspect, the solid support is a solid support wherein the first immobilised primer and/or the second immobilised primer comprises a first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group, wherein the first polynucleotide sequence and/or the second polynucleotide sequence comprises a second linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with the first linking group, and the method further comprises a step of forming the non-covalent interaction, covalent bond and/or metal-coordination bond between the first linking group and the second linking group.
In one aspect, the second linking group is capable of forming covalent bonds, and the second linking group comprises an alkene moiety (e.g. a pyrimidine nucleobase, such as a thymine, uracil or cytosine nucleobase). In one aspect, the covalent bond comprises a cyclobutane linkage.
In one aspect, the method further comprises a step of linearising the first polynucleotide sequence and the second polynucleotide sequence.
In one aspect, the method further comprises treating the linearised first polynucleotide sequence and the linearised second polynucleotide sequence with a single-stranded binding protein.
According to another aspect of the present invention, there is provided a method of sequencing polynucleotide sequences, comprising: preparing polynucleotide sequences for identification using a method as described herein; and concurrently sequencing nucleobases in the first portion and the second portion.
In one aspect, the step of concurrently sequencing nucleobases comprises performing sequencing-by-synthesis or sequencing-by-ligation. In one aspect, the step of concurrently sequencing nucleobases comprises treatment with a strand displacement polymerase (e.g. phi29). In one aspect, the step of concurrently sequencing nucleobases comprises treatment with a 5’-3’ exonuclease.
In one aspect, the method further comprises a step of conducting paired-end reads.
In one aspect, the step of concurrently sequencing nucleobases comprises:
(a) obtaining first intensity data comprising a combined intensity of a first signal component obtained based upon a respective first nucleobase at a first portion of the template sequence and a second signal component obtained based upon a respective second nucleobase at a second portion of the template complement sequence, wherein the first and second signal components are obtained simultaneously;
(b) obtaining second intensity data comprising a combined intensity of a third signal component obtained based upon the respective first nucleobase at the first portion and a fourth signal component obtained based upon the respective second nucleobase at the second portion, wherein the third and fourth signal components are obtained simultaneously;
(c) selecting one of a plurality of classifications based on the first and the second intensity data, wherein each classification represents a possible combination of respective first and second nucleobases; and
(d) based on the selected classification, base calling the respective first and second nucleobases.
In one aspect, selecting the classification based on the first and second intensity data comprises selecting the classification based on the combined intensity of the first and second signal components and the combined intensity of the third and fourth signal components.
In one aspect, the plurality of classifications comprises sixteen classifications, each classification representing one of sixteen unique combinations of first and second nucleobases.
In one aspect, the first signal component, second signal component, third signal component and fourth signal component are generated based on light emissions associated with the respective nucleobase.
In one aspect, the light emissions are detected by a sensor, wherein the sensor is configured to provide a single output based upon the first and second signals.
In one aspect, the sensor comprises a single sensing element.
In one aspect, the method further comprises repeating steps (a) to (d) for each of a plurality of base calling cycles.
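Steps (a) to (d) above can be sketched, purely by way of illustration, as a nearest-centroid classifier over sixteen classifications. The sketch below is not the base calling method of any particular sequencer. The on/off dye pattern in `CHANNEL_CODE`, the 2:1 scaling of the two signal components, and all function names are assumptions made for the example.

```python
from itertools import product

# Hypothetical on/off pattern of each nucleobase in the two intensity
# measurements (an assumption for illustration only).
CHANNEL_CODE = {"A": (1, 1), "C": (1, 0), "T": (0, 1), "G": (0, 0)}


def build_centroids(first_scale=2.0, second_scale=1.0):
    """Expected (first intensity, second intensity) for each of the
    sixteen combinations of first and second nucleobases.

    Each expected intensity is the sum of the component from the first
    portion and the component from the second portion.
    """
    centroids = {}
    for first_base, second_base in product("ACGT", repeat=2):
        code_1 = CHANNEL_CODE[first_base]
        code_2 = CHANNEL_CODE[second_base]
        centroids[(first_base, second_base)] = (
            first_scale * code_1[0] + second_scale * code_2[0],
            first_scale * code_1[1] + second_scale * code_2[1],
        )
    return centroids


def call_bases(first_intensity, second_intensity, centroids):
    """Select the classification whose centroid is closest to the
    measured pair of combined intensities, and return both bases."""
    def squared_distance(pair):
        cx, cy = centroids[pair]
        return (cx - first_intensity) ** 2 + (cy - second_intensity) ** 2

    return min(centroids, key=squared_distance)


centroids = build_centroids()
print(len(set(centroids.values())))        # number of distinct centroids
print(call_bases(2.9, 1.1, centroids))     # one simulated base calling cycle
```

With the assumed 2:1 scaling, the sixteen combinations map to sixteen distinct points. With equal scaling (`build_centroids(1.0, 1.0)`), several combinations coincide and only nine distinct points remain, so the two nucleobases can no longer be resolved unambiguously from the combined intensities.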
In one aspect, the first portion is at least 25 base pairs and the second portion is at least 25 base pairs.
According to another aspect of the present invention, there is provided a kit comprising instructions for preparing polynucleotide sequences for identification as described herein; and/or polynucleotide sequences as described herein.
According to another aspect of the present invention, there is provided a data processing device comprising means for carrying out a method as described herein.
In one aspect, the data processing device is a polynucleotide sequencer.
According to another aspect of the present invention, there is provided a computer program product comprising instructions which, when the program is executed by a processor, cause the processor to carry out a method as described herein.
According to another aspect of the present invention, there is provided a computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out a method as described herein.
According to another aspect of the present invention, there is provided a computer-readable data carrier having stored thereon a computer program product as described herein.
According to another aspect of the present invention, there is provided a data carrier signal carrying a computer program product as described herein.
We have also found that the signal-to-noise ratio (SNR) for two clusters (of amplified polynucleotide strands) imaged from the same CMOS pixel or well is at a maximum when the intensities of the two clusters are in the ratio 2:1. That is, the SNR is optimal when the signal intensities of the first sequencing read and the second sequencing read are at a ratio of 2:1. However, systematic deviations from target surface primer densities and amplification efficiencies can alter the resulting ratios. Deviation from this optimal ratio of signal intensities is reflected in a reduced SNR. The further the ratio of signal intensities deviates from the optimal ratio of 2:1, the weaker the SNR and the greater the number of sequencing errors.
Here we describe a method that can maximise the SNR, and consequently lower the sequencing error rate. The method comprises modulating - or attenuating - the intensity of concurrent signals towards the optimal 2:1 ratio. In particular, the signal intensity ratio is modulated using a calculated ratio of terminated primers. The sequencing method described herein may comprise measuring a first signal intensity and a second signal intensity for a first and second library strand respectively. The signal intensity measurements may be measured using a modified (e.g. labelled) sequencing primer, or alternatively a (non-modified) sequencing primer and at least one or a plurality of modified nucleotide(s). In either method, the measurement of a signal intensity from the modified sequencing primer or one modified nucleotide is sufficient to subsequently attenuate the sequencing signal intensity generated from the first or second strand to the desired ratio.
According to an aspect of the present invention, there is provided a method of preparing a first and second polynucleotide strand for concurrent sequencing, wherein the method comprises:
(a) measuring a first signal intensity generated from a first polynucleotide strand; and
(b) measuring a second signal intensity generated from a second polynucleotide strand; and based on the first signal intensity and the second signal intensity, a subsequent signal intensity generated from sequencing the first nucleotide strand can be attenuated, wherein the attenuated subsequent signal intensity generated from sequencing the first polynucleotide strand has a lower intensity than a subsequent signal intensity generated from sequencing the second polynucleotide strand.
In one embodiment, measurement of the intensity of the first signal and the second signal is carried out sequentially.
The method may comprise applying a first sequencing primer that binds to the first nucleotide strand, wherein the first sequencing primer is labelled, obtaining a first signal from the label and determining the intensity of the first signal.
Alternatively, the method comprises applying a first sequencing primer that binds to the first nucleotide strand, conducting an extension reaction using one or more labelled nucleotides, obtaining a first signal from the one or more labelled nucleotides and determining the intensity of the first signal. In one embodiment, the method comprises washing off the first sequencing primer. In another embodiment, the method comprises washing off the first sequencing primer and the labelled nucleotides.
In one embodiment, the method may further comprise applying a second sequencing primer that binds to the second nucleotide strand, wherein the second sequencing primer is labelled, obtaining a second signal from the label and determining the intensity of the second signal. In one embodiment, the label is cleavable, and the method comprises cleaving the label after determining the intensity of the second signal.
In an alternative embodiment, the method comprises applying a second sequencing primer that binds to the second nucleotide strand, conducting an extension reaction using one or more labelled nucleotides, obtaining a second signal from the one or more labelled nucleotides and determining the intensity of the second signal. Again, the label may be cleavable, and the method comprises cleaving the label after determining the intensity of the second signal.
In one embodiment, the subsequent signal generated from sequencing the first polynucleotide strand is attenuated using a mixture of terminated and non-terminated first sequencing primers.
The terminated primers may comprise at least one modification that prevents extension (i.e. elongation) of the primer by a polymerase. In one embodiment, the modification is selected from a blocking group such as a hairpin loop, a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer (e.g. -O-(CH2)3-OH instead of a 3’-OH group), a modification blocking the 3’-hydroxyl group (e.g. hydroxyl protecting groups, such as silyl ether groups (e.g. trimethylsilyl, triethylsilyl, triisopropylsilyl, t-butyl(dimethyl)silyl, t-butyl(diphenyl)silyl), ether groups (e.g. benzyl, allyl, t-butyl, methoxymethyl (MOM), 2-methoxyethoxymethyl (MEM), tetrahydropyranyl), or acyl groups (e.g. acetyl, benzoyl)), or an inverted nucleobase.
In one embodiment, the ratio of terminated:non-terminated first sequencing primers is calculated based on the first signal intensity and the second signal intensity.
In one embodiment, the ratio of terminated:non-terminated first sequencing primers is calculated using the following formula:
tR1p / R1p = (X * R1si - R2si) / R2si (Equation 2)
wherein:
X is the desired ratio of signal intensities. In one embodiment, X may be 2.
tR1p is the terminated R1 primer
R1p is the Read 1 sequencing primer (non-terminated)
R1si is a first signal intensity generated from a first nucleotide sequence
R2si is a second signal intensity generated from a second nucleotide sequence
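Equation 2 can be evaluated as in the following minimal Python sketch, which is provided for illustration only. The function names are hypothetical. The sketch assumes that the signal from the first strand scales linearly with the fraction of first sequencing primers that are non-terminated, and it clamps a negative result to zero (also an assumption), since a negative amount of terminated primer cannot be added. Under the linear assumption, Equation 2 leaves the second signal X times the attenuated first signal.

```python
def terminated_to_non_terminated_ratio(r1_si, r2_si, x=2.0):
    """Evaluate Equation 2: tR1p / R1p = (X * R1si - R2si) / R2si.

    r1_si and r2_si are the measured first and second signal intensities;
    x is the desired ratio of signal intensities.
    """
    if r1_si <= 0 or r2_si <= 0:
        raise ValueError("signal intensities must be positive")
    ratio = (x * r1_si - r2_si) / r2_si
    # Assumption: if the result is negative, no terminated primer is used.
    return max(ratio, 0.0)


def attenuated_first_signal(r1_si, ratio):
    """Predicted first signal after mixing in terminated primer, assuming
    signal is proportional to the non-terminated primer fraction."""
    non_terminated_fraction = 1.0 / (1.0 + ratio)
    return r1_si * non_terminated_fraction


# Hypothetical measurements, in arbitrary intensity units.
r1_si, r2_si = 1200.0, 1000.0
ratio = terminated_to_non_terminated_ratio(r1_si, r2_si)
print(ratio)
print(r2_si / attenuated_first_signal(r1_si, ratio))
```

For the hypothetical measurements shown, 1.4 parts of terminated primer per part of non-terminated primer would be mixed, and the predicted second signal is then twice the attenuated first signal.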
In one embodiment, the ratio of the intensity of the attenuated subsequent signal generated from sequencing the first polynucleotide strand to the intensity of the subsequent signal generated from sequencing the second polynucleotide strand is or is around 2:1.
The method may be used to improve the signal-to-noise ratio in nucleic acid sequencing.
In another aspect of the invention there is provided a method of concurrently sequencing a first and second polynucleotide sequence, wherein the method comprises: preparing first and second polynucleotide sequences for sequencing using a method of the invention; attenuating the first signal intensity to generate an adjusted first signal intensity, wherein the adjusted first signal intensity has lower intensity than the second signal intensity; and sequencing nucleobases in the first polynucleotide sequence and the second polynucleotide sequence.
In one embodiment, attenuating comprises applying a mixture of terminated and non-terminated first sequencing primers before sequencing nucleobases in the first polynucleotide sequence and the second polynucleotide sequence.
The step of concurrently sequencing nucleobases may comprise performing sequencing-by-synthesis or sequencing-by-ligation.
In another aspect of the invention there is provided a sequencing kit comprising a plurality of first and second sequencing primers, wherein the first sequencing primers comprise both terminated and non-terminated primers.
The sequencing kit may further comprise labelled first and second sequencing primers. In one embodiment, the labelled second sequencing primers comprise a cleavable site linking the label and the sequencing primer.
The sequencing kit may further comprise instructions for use.
In another aspect of the invention there is provided a data processing device comprising means for carrying out the methods of the invention.
In another aspect of the invention there is provided a data processing device according to the invention wherein the data processing device is a polynucleotide sequencer.
In another aspect of the invention there is provided a computer program product comprising instructions, which when the program is executed by a processor, cause the processor to carry out a method of the invention.
In another aspect of the invention there is provided a computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out a method of the invention.
In another aspect of the invention there is provided a computer-readable data carrier having stored thereon a computer program product of the invention.
In another aspect of the invention there is provided a data carrier signal carrying a computer program product of the invention.
Brief Description of the Drawings
Features of examples of the present disclosure will become apparent by reference to the following detailed description and drawings, in which like reference numerals correspond to similar, though perhaps not identical, components. For the sake of brevity, reference numerals or features having a previously described function may or may not be described in connection with other drawings in which they appear.
Figure 1 shows a forward strand, reverse strand, forward complement strand, and reverse complement strand of a polynucleotide molecule.
Figure 2 shows an example of a polynucleotide sequence (or insert) with 5’ and 3’ adaptor sequences.
Figure 3 shows a typical polynucleotide with 5’ and 3’ adaptor sequences.
Figure 4 shows an example of PCR stitching. Here, two sequences - a strand of a human library and a strand of a phiX library are joined together to create a single polynucleotide strand comprising both a first insert (comprising the strand of the human sequence) and a second insert (comprising the strand of the phiX sequence), as well as terminal and internal adaptor sequences.
Figure 5 shows the preparation of a polynucleotide sequence using a loop fork method.
Figure 6 shows an exemplar solid support 200 comprising a substrate 204 with a plurality of wells 203. Immobilised primers 201, 202 are found within the well.
Figure 7 shows the stages of bridge amplification and the generation of an amplified cluster comprising (Panel A) a library strand hybridising to an immobilised primer; (Panel B) generation of a template strand from the library strand; (Panel C) dehybridisation and washing away the library strand; (Panel D) hybridisation of the template strand to another immobilised primer; (Panel E) generation of a template complement strand from the template strand via bridge amplification; (Panel F) dehybridisation of the sequence bridge; (Panel G) hybridisation of the template strand and template complement strand to immobilised primers; and (Panel H) subsequent bridge amplification to provide a plurality of template and template complement strands.
Figure 8 shows the detection of nucleobases using 4-channel, 2-channel and 1-channel chemistry.
Figure 9 is a plot showing graphical representations of sixteen distributions of signals generated by polynucleotide sequences according to one embodiment.
Figure 10 is a flow diagram showing a method for base calling according to one embodiment.
Figure 11 shows an embodiment where sequencing is conducted on a solid support as described herein. Panel A indicates the process used from the initial preparation of the solid support to the sequencing-by-synthesis step used in concurrent sequencing. Panel B illustrates the sequencing process conducted for the R1 and R2 concurrent sequencing step, where R1 signal is twice that of R2. In this case, all P7 primers are nicked, whilst only half of the P5 primers are nicked. The boxed region indicates optional cross-linking between the 3’-end of the template strands and the surface of the solid support.
Figure 12 shows another embodiment where sequencing is conducted on a solid support as described herein. Panel A indicates the process used from the initial preparation of the solid support to the sequencing-by-synthesis step used in concurrent sequencing. Panel B illustrates the sequencing process conducted for the R1 and R2 concurrent sequencing step, where R1 signal is twice that of R2. Again, in this case, all P7 primers are nicked, whilst only half of the P5 primers are nicked. The double bar lines on the P7 primers and some of the P5 primers indicate the cleavage sites. The boxed region indicates cross-linking between the template strands and the P5/P7 primers.
Figure 13 shows various ways of cross-linking the template strands to the solid support, such as to the surface of the solid support, the first immobilised primers or the second immobilised primers. Panel A indicates a way of attaching the template strand to the surface of the solid support using a biotin-streptavidin interaction. This is done by putting a poly-A tail (dotted line) and a nucleotide comprising a biotin moiety on the end of the template strand, which is able to form a non-covalent interaction with streptavidin (rounded rectangle) on the solid support surface. Panel B indicates another way of attaching the template strand to the surface of the solid support using covalent bonds. This is done by putting a nucleotide (eight-pointed star) comprising an alkyne or thiophosphate group on the end of the template strand, which is able to form covalent bonds with azide groups or hydroxyl groups on the solid support surface. Panel C indicates a way of attaching the template strand to the immobilised primers. This is done by incorporating 3-cyanovinylcarbazole (CNVK) instead of a nucleobase in the immobilised primer. The 3-cyanovinylcarbazole can then undergo a [2 + 2] cycloaddition reaction under irradiation with UV light (at 365 nm) to form a covalent linkage with pyrimidine bases (Py) on the template strand.
Figure 14 shows a flowcell grafted with various ratios of first immobilised primers (P7), some of which are cleavable (P7-8oxoG) and some of which are not cleavable (P7 non-lin), as well as second immobilised primers (P5), which are all cleavable (P5-U), according to Examples 1 to 3 and Reference Example 1. In Panel A, from left to right, are shown lanes 1 to 8, with intensities obtained on the first cycle highlighted in various regions for each lane. Lanes 1 and 2 (Reference Example 1, control) show the intensity obtained on the first read (R1) and the second read (R2) respectively, where 10 pM cleavable first precursor primers (P7-8oxoG) and 10 pM cleavable second precursor primers (P5-U) were used for initial primer concentrations during grafting in both lanes. Lanes 3 and 4 (Example 1) show the intensity obtained on the first read (R1) and the second read (R2) respectively, where 7.5 pM cleavable first precursor primers (P7-8oxoG), 2.5 pM non-cleavable first precursor primers (P7 non-lin) and 10 pM cleavable second precursor primers (P5-U) were used for initial primer concentrations during grafting in both lanes. Lanes 5 and 6 (Example 2) show the intensity obtained on the first read (R1) and the second read (R2) respectively, where 5.0 pM cleavable first precursor primers (P7-8oxoG), 5.0 pM non-cleavable first precursor primers (P7 non-lin) and 10 pM cleavable second precursor primers (P5-U) were used for initial primer concentrations during grafting in both lanes. Lanes 7 and 8 (Example 3) show the intensity obtained on the first read (R1) and the second read (R2) respectively, where 2.5 pM cleavable first precursor primers (P7-8oxoG), 7.5 pM non-cleavable first precursor primers (P7 non-lin) and 10 pM cleavable second precursor primers (P5-U) were used for initial primer concentrations during grafting in both lanes.
Panel B shows a comparison of the average intensities of each lane, with lanes 3 and 4 coming closest to a 1:2 ratio of intensities between R1 and R2. (N.B. P7 is used as first immobilised primer and P5 is used as second immobilised primer in Panel A, whereas P5 is used as first immobilised primer and P7 is used as second immobilised primer in Figures 11 and 12).
Figure 15 shows the key steps of one embodiment of the present invention. (Step A) A first signal intensity is measured. The read 1 primer is flowed in and the intensity of a signal from a modification within the read 1 primer or from the extension of the read 1 primer is measured. (Step B) A second signal intensity is measured. The read 2 primer is flowed in and the intensity of a signal from a modification within the read 2 primer or from the extension of the read 2 primer is measured. (Step C) Using the first and second signal intensity measurements, the amount of terminated primer is calculated to achieve an effective 1:2 ratio of read 1:read 2 signals. Terminated and extensible primers are flowed in and simultaneous sequencing is started.
Figure 16 illustrates that as the primer ratio of read 1:read 2 primers is modulated towards 2:1, the signal-to-noise ratio (SNR) between 16QaM clouds is improved. A first and second library strand exist adjacent within the same nanowell, due to surface chemistry patterning. (A) Read 1 and read 2 primers at a ratio of 1:1 do not create sufficient SNR for 16QaM sequencing. (B) Read 1 and read 2 primers at a ratio of 1:1.2 generate low SNR between 16QaM clouds, so sequencing errors are high. (C) Modulating the effective ratio of read 1 and read 2 primers towards 2:1, using terminated primers for one read, creates an optimal SNR between 16QaM clouds, reducing sequencing errors.
Figure 17 shows panels illustrating that as the primer ratio of read 1:read 2 primers is modulated towards 2:1, the signal-to-noise ratio (SNR) between 16QaM clouds is improved. (Panel A) Read 1 and read 2 primers at a ratio of 1.5:1 do not show an even SNR between 16QaM clouds. (Panel B) Read 1 and read 2 primers at a ratio of 2:1 show an even SNR between 16QaM clouds, allowing for high-accuracy simultaneous sequencing. (Panel C) Read 1 and read 2 primers at a ratio of 3:1 show uneven SNR between 16QaM clouds. Between-cloud SNR values of 7.00 - 9.99 indicate poor SNR, 10.00 - 11.99 indicate fair SNR, and 12.00 - 17.00 indicate good SNR.
Figure 18 shows panels of plots illustrating that primer ratios of approximately 2:1 show a low error rate (ER, % confusion between each pair of clouds) and a high minimum cloud-to-cloud SNR. (Panels A-C) Dim Intensity = 100; (Panels D-F) Dim Intensity = 50. (Panels A, D) error rate (ER, % confusion) and primer ratio; (Panels B, E) minimum cloud-to-cloud SNR and primer ratio; (Panels C, F) median lowest 10 SNR and primer ratio.
Detailed Description
All patents, patent applications, and other publications referred to herein, including all sequences disclosed within these references, are expressly incorporated herein by reference, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated by reference. All documents cited are, in relevant part, incorporated herein by reference in their entireties for the purposes indicated by the context of their citation herein. However, the citation of any document is not to be construed as an admission that it is prior art with respect to the present disclosure.
The present invention can be used in sequencing, in particular concurrent sequencing. Methodologies applicable to the present invention have been described in WO 08/041002, WO 07/052006, WO 98/44151, WO 00/18957, WO 02/06456, WO 07/107710, WO 05/068656, US 13/661,524 and US 2012/0316086, the contents of which are herein incorporated by reference. Further information can be found in US 20060024681, US 20060292611, WO 06/110855, WO 06/135342, WO 03/074734, WO 07/010252, WO 07/091077, WO 00/179553, WO 98/44152 and WO 2022/087150, the contents of which are herein incorporated by reference.
As used herein, the term “variant” refers to a variant polypeptide sequence or part of the polypeptide sequence that retains desired function of the full non-variant sequence. For example, a desired function of the immobilised primer is the ability to bind (i.e. hybridise) to a target sequence.
As used in any aspect described herein, a “variant” has at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to the non-variant nucleic acid sequence. The sequence identity of a variant can be determined using any number of sequence alignment programs known in the art. As an example, Emboss Stretcher from the EMBL-EBI may be used as found on the Internet at ebi.ac.uk/Tools/psa/emboss_stretcher/ (using default parameters: pair output format, Matrix = BLOSUM62, Gap open = 1, Gap extend = 1 for proteins; pair output format, Matrix = DNAfull, Gap open = 16, Gap extend = 4 for nucleotides).
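As an illustration of how an overall sequence identity percentage may be derived once two sequences have been aligned, a minimal Python sketch is given below. It is not an implementation of Emboss Stretcher and performs no alignment itself. The function name and the example sequences are hypothetical, and the convention of dividing identical positions by the full alignment length (including gapped columns) is an assumption.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns in which both sequences carry the
    same residue. Inputs must already be aligned and of equal length,
    with gaps written as '-'."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical pre-aligned sequences: five identical columns out of six.
print(percent_identity("ACGT-A", "ACGTTA"))
```

A candidate variant could then be compared against a chosen threshold from the list above, for example `percent_identity(a, b) >= 90.0`.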
As used herein, the term “fragment” refers to a functionally active series of consecutive nucleic acids from a longer nucleic acid sequence. The fragment may be at least 99%, at least 95%, at least 90%, at least 80%, at least 70%, at least 60%, at least 50%, at least 40% or at least 30% the length of the longer nucleic acid sequence. In one embodiment, a fragment as used herein also retains the ability to bind (i.e. hybridise) to a target sequence.
Sequencing generally comprises four fundamental steps: 1) library preparation to form a plurality of target polynucleotides for identification; 2) cluster generation to form an array of amplified template polynucleotides; 3) sequencing the cluster array of amplified template polynucleotides; and 4) data analysis to identify characteristics of the target polynucleotides from the amplified template polynucleotide sequences. These steps are described in greater detail below.
Library strands and template terminology
As shown in Figure 1, for a given double-stranded polynucleotide sequence 100 to be identified, the polynucleotide sequence 100 comprises a forward strand of the sequence 101 and a reverse strand of the sequence 102.
When the polynucleotide sequence 100 is replicated (e.g. using a DNA/RNA polymerase), complementary versions of the forward strand 101 of the sequence 100 and the reverse strand 102 of the sequence 100 are generated. Thus, replication of the polynucleotide sequence 100 provides a double-stranded polynucleotide sequence 100a that comprises a forward strand of the sequence 101 and a forward complement strand of the sequence 101’, and a double-stranded polynucleotide sequence 100b that comprises a reverse strand of the sequence 102 and a reverse complement strand of the sequence 102’.
The term “template” may be used to describe a complementary version of the double-stranded polynucleotide sequence 100. As such, the “template” comprises a forward complement strand of the sequence 101’ and a reverse complement strand of the sequence 102’. Thus, by using the forward complement strand of the sequence 101’ as a template for complementary base pairing, a sequencing process (e.g. a sequencing-by-synthesis or a sequencing-by-ligation process) reproduces information that was present in the original forward strand of the sequence 101. Similarly, by using the reverse complement strand of the sequence 102’ as a template for complementary base pairing, a sequencing process (e.g. a sequencing-by-synthesis or a sequencing-by-ligation process) reproduces information that was present in the original reverse strand of the sequence 102.
The two strands in the template may also be referred to as a forward strand of the template 101’ and a reverse strand of the template 102’. The complement of the forward strand of the template 101’ is termed the forward complement strand of the template 101, whilst the complement of the reverse strand of the template 102’ is termed the reverse complement strand of the template 102.
Generally, where forward strand, reverse strand, forward complement strand, and reverse complement strand are used herein without qualifying whether they are with respect to the original polynucleotide sequence 100 or with respect to the “template”, these terms may be interpreted as referring to the “template”.
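By way of illustration only, the relationship between a strand and its complement described above can be sketched in a few lines of Python. The sequence used is a made-up placeholder and the function name is hypothetical; the sketch simply shows that copying the forward complement strand by complementary base pairing recovers the information in the original forward strand.

```python
# Illustrative sketch only: the sequence below is a made-up placeholder.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written in the 5' to 3' direction."""
    return strand.translate(COMPLEMENT)[::-1]


forward_101 = "ATGGCATTC"                              # hypothetical forward strand 101
forward_complement = reverse_complement(forward_101)   # plays the role of 101'

# A strand synthesised against 101' by complementary base pairing
# carries the same information as the original forward strand 101.
read = reverse_complement(forward_complement)
```

Here `read` is identical to `forward_101`, which is the sense in which sequencing against the template reproduces the original strand.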
Library preparation
Library preparation is the first step in any high-throughput sequencing platform. These libraries allow templates to be generated via complementary base pairing that can subsequently be clustered and amplified. During library preparation, a nucleic acid sample, for example a genomic DNA sample, or a cDNA or RNA sample, is converted into a sequencing library, which can then be sequenced. By way of example with a DNA sample, the first step in library preparation is random fragmentation of the DNA sample. Sample DNA is first fragmented and the fragments of a specific size (typically 200-500 bp, but can be larger) are ligated, sub-cloned or “inserted” in-between two oligo adaptors (adaptor sequences). The original sample DNA fragments are referred to as “inserts”. The target polynucleotides may advantageously also be size-fractionated prior to modification with the adaptor sequences.
As described herein, the templates to be generated typically include separate polynucleotide sequences, in particular a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion. Generating these templates from particular libraries may be performed according to methods known to persons of skill in the art. However, some example approaches of preparing libraries suitable for generation of such templates are described below. In some embodiments, the library may be prepared by ligating adaptor sequences to double-stranded polynucleotide sequences, each comprising a forward strand of the sequence and a reverse strand of the sequence, as described in more detail in e.g. WO 07/052006, which is incorporated herein by reference. In some cases, “tagmentation” can be used to attach the sample DNA to the adaptors, as described in more detail in e.g. WO 10/048605, US 2012/0301925, US 2013/0143774 and WO 2016/189331, each of which is incorporated herein by reference. In tagmentation, double-stranded DNA is simultaneously fragmented and tagged with adaptor sequences and PCR primer binding sites. The combined reaction eliminates the need for a separate mechanical shearing step during library preparation. These procedures may be used, for example, for preparing templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, wherein the first portion is a forward strand of the template, and the second portion is a forward complement strand of the template - i.e. a copy of the forward strand (or alternatively, wherein the first portion is a reverse strand of the template, and the second portion is a reverse complement strand of the template).
Where features herein are described in relation to the “forward” strand, it should be considered that these features could equally be applied to the “reverse strand”. Equally, where reference is made to the reverse strand, this can mean a complement of the forward strand, and the forward strand can mean a complement of the reverse strand.
Where libraries are prepared by ligating adaptor sequences to double-stranded polynucleotide sequences as described above, library preparation may comprise ligating a first primer-binding sequence 301’ or an adaptor that comprises a primer-binding sequence 301’ (e.g. P5’, such as SEQ ID NO. 3) and a second terminal sequencing primer binding site 304 (e.g. SBS3’, for example, SEQ ID NO. 8) to a 3’-end of a forward strand of a sequence 101. See Figure 2. The library preparation may be arranged such that the second terminal sequencing primer binding site 304 is attached (e.g. directly attached) to the 3’-end of the forward strand of the sequence 101, and such that the first primer-binding sequence 301’ is attached (e.g. directly attached) to the 3’-end of the second terminal sequencing primer binding site 304.
The library preparation may further comprise ligating a complement or an adaptor comprising a complement of first terminal sequencing primer binding site 303’ (e.g. SBS12, such as SEQ ID NO. 9) (also referred to herein as a first terminal sequencing primer binding site complement 303’) and a complement of a second primer-binding sequence 302 (also referred to herein as a second primer-binding complement sequence 302) (e.g. P7, such as SEQ ID NO. 2) to a 5’-end of the forward strand of the sequence 101. The library preparation may be arranged such that first terminal sequencing primer binding site complement 303’ is attached (e.g. directly attached) to the 5’-end of the forward strand of the sequence 101 , and such that second primer-binding complement sequence 302 is attached (e.g. directly attached) to the 5’-end of first terminal sequencing primer binding site complement 303’.
Thus, one strand of a polynucleotide within a polynucleotide library may comprise, in a 5’ to 3’ direction, a second primer-binding complement sequence 302 or a first adaptor comprising a second primer-binding complement sequence 302 (e.g. P7), a first terminal sequencing primer binding site complement 303’ (e.g. SBS12), a forward strand of the sequence/insert 101, a second terminal sequencing primer binding site 304 or a second adaptor comprising a second terminal sequencing primer binding site 304 (e.g. SBS3’), and a first primer-binding sequence 301’ (e.g. P5’) (Figure 2 - bottom strand).
Although not shown in Figure 2, the strand may further comprise one or more index sequences. As such, a first index sequence (e.g. i7) may be provided between the second primer-binding complement sequence 302 (e.g. P7) and the first terminal sequencing primer binding site complement 303’ (e.g. SBS12). Separately, or in addition, a second index complement sequence (e.g. i5’) may be provided between the second terminal sequencing primer binding site 304 (e.g. SBS3’) and the first primer-binding sequence 301’ (e.g. P5’). Thus, in some embodiments, one strand of a polynucleotide within a polynucleotide library may comprise, in a 5’ to 3’ direction, a second primer-binding complement sequence 302 (e.g. P7), a first index sequence (e.g. i7), a first terminal sequencing primer binding site complement 303’ (e.g. SBS12), a forward strand of the sequence 101, a second terminal sequencing primer binding site 304 (e.g. SBS3’), a second index complement sequence (e.g. i5’), and a first primer-binding sequence 301’ (e.g. P5’). A typical polynucleotide is shown in Figure 3 (bottom strand).
When a double-stranded sequence 100 is used, the library preparation may also comprise ligating a second primer-binding sequence 302’ or a third adaptor comprising a second primer-binding sequence 302’ (e.g. P7’) and a first terminal sequencing primer binding site 303 (e.g. SBS12’) to a 3’-end of a reverse strand of a sequence 102. The library preparation may be arranged such that first terminal sequencing primer binding site 303 is attached (e.g. directly attached) to the 3’-end of the reverse strand of the sequence 102, and such that the second primer-binding sequence 302’ is attached (e.g. directly attached) to the 3’-end of first terminal sequencing primer binding site 303.
The library preparation may further comprise ligating a complement of a second terminal sequencing primer binding site 304’ or a fourth adaptor comprising a second terminal sequencing primer binding site 304’ (e.g. SBS3) (also referred to herein as a second terminal sequencing primer binding site complement 304’) and a complement of a first primer-binding sequence 301 (also referred to herein as a first primer-binding complement sequence 301) (e.g. P5) to a 5’-end of the reverse strand of the sequence 102. The library preparation may be arranged such that the second terminal sequencing primer binding site complement 304’ is attached (e.g. directly attached) to the 5’-end of the reverse strand of the sequence 102, and such that the first primer-binding complement sequence 301 is attached (e.g. directly attached) to the 5’-end of the second terminal sequencing primer binding site complement 304’.
Thus, another strand of a polynucleotide within a polynucleotide library may comprise, in a 5’ to 3’ direction, a first primer-binding complement sequence 301 or a third adaptor comprising a first primer-binding complement sequence 301 (e.g. P5), a second terminal sequencing primer binding site complement 304’ (e.g. SBS3), a reverse strand of the sequence/insert 102, a first terminal sequencing primer binding site 303 or a fourth adaptor comprising first terminal sequencing primer binding site 303 (e.g. SBS12’), and a second primer-binding sequence 302’ (e.g. P7’) (Figure 2 - top strand).
Although not shown in Figure 2, this other strand may further comprise one or more index sequences. As such, a second index sequence (e.g. i5) may be provided between the first primer-binding complement sequence 301 (e.g. P5) and the second terminal sequencing primer binding site complement 304’ (e.g. SBS3). Separately, or in addition, a first index complement sequence (e.g. i7’) may be provided between the first terminal sequencing primer binding site 303 (e.g. SBS12’) and the second primer-binding sequence 302’ (e.g. P7’). Thus, in some embodiments, another strand of a polynucleotide within a polynucleotide library may comprise, in a 5’ to 3’ direction, a first primer-binding complement sequence 301 (e.g. P5), a second index sequence (e.g. i5), a second terminal sequencing primer binding site complement 304’ (e.g. SBS3), a reverse strand of the sequence 102, a first terminal sequencing primer binding site 303 (e.g. SBS12’), a first index complement sequence (e.g. i7’), and a second primer-binding sequence 302’ (e.g. P7’). A typical polynucleotide is shown in Figure 3 (top strand).
In some embodiments, the library may be prepared using PCR stitching methods, such as (splicing by) overlap extension PCR (also known as OE-PCR or SOE-PCR), as described in more detail in e.g. Higuchi et al. (Nucleic Acids Res., 1988, vol. 16, pp. 7351-7367), which is incorporated herein by reference. A representative process for conducting PCR stitching for a human and PhiX library is shown in Figure 4.
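As a non-limiting sketch, the two indexed strand layouts set out above for the adaptor-ligated library (Figure 3) can be represented as ordered lists of named segments. All sequences below are short made-up placeholders and do not correspond to any SEQ ID NO.; the helper names are likewise hypothetical. The sketch derives the layout of the top strand from the bottom strand by reversing the segment order and taking the complement of each segment.

```python
# Illustrative sketch only: segment sequences are made-up placeholders,
# not the sequences defined by the SEQ ID NOs of this document.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written in the 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


# Bottom strand of Figure 3, listed 5' to 3'.
BOTTOM_STRAND = [
    ("P7", "CAAGCAGA"),
    ("i7", "ATCACG"),
    ("SBS12", "GTGACTGG"),
    ("101", "ATGGCATTCGGA"),   # forward strand of the sequence (insert)
    ("SBS3'", "AGATCGGA"),
    ("i5'", "TTAGGC"),
    ("P5'", "TCGGTGGT"),
]


def complement_name(name: str) -> str:
    """Name of the complementary segment (101 pairs with 102; X pairs with X')."""
    if name == "101":
        return "102"
    return name[:-1] if name.endswith("'") else name + "'"


# Top strand: reverse the segment order and complement every segment.
TOP_STRAND = [
    (complement_name(name), reverse_complement(seq))
    for name, seq in reversed(BOTTOM_STRAND)
]

bottom_seq = "".join(seq for _, seq in BOTTOM_STRAND)
top_seq = "".join(seq for _, seq in TOP_STRAND)
top_layout = [name for name, _ in TOP_STRAND]
```

Under these assumptions `top_layout` reads P5, i5, SBS3, 102, SBS12’, i7’, P7’ in the 5’ to 3’ direction, and `top_seq` is the reverse complement of `bottom_seq`.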
As used herein, the term “genetically unrelated” refers to portions which are not related in the sense of being any two of the group consisting of: forward strands, reverse strands, forward complement strands, and reverse complement strands. However, the “genetically unrelated” sequences could be different fragment sequences which are derived from the same source, but are different fragments from that source (e.g. from the same fragmented library preparation process). This includes sequences that can be overlapping in sequence (but not identical in sequence).
As will be described later, during clustering and amplification, further processes may be used to generate templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, wherein the first portion and the second portion are genetically unrelated.
In some embodiments, the library may be prepared using a loop fork method, which is described below. This procedure may be used, for example, for preparing templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, wherein the first portion is a forward strand of the template, and the second portion is a reverse complement strand of the template (or alternatively, wherein the first portion is a reverse strand of the template, and the second portion is a forward complement strand of the template). Such libraries may also be referred to as self-tandem inserts. A representative process for conducting a loop fork method is shown in Figure 5.
Starting from a double-stranded polynucleotide sequence comprising a forward strand of the sequence and a reverse strand of the sequence, adaptors may be ligated to a first end of the sequence (e.g. using processes as described in more detail in e.g. WO 07/052006, or “tagmentation” methods as described above). A second end of the sequence (different from the first end) may be ligated to a loop, which connects the forward strand of the sequence and the reverse strand of the sequence, thus generating a loop fork ligated polynucleotide sequence. Conducting PCR on the loop fork ligated polynucleotide sequence produces a new double-stranded polynucleotide sequence, one strand comprising the forward strand of the sequence and the reverse strand of the sequence, and the other strand comprising a forward complement strand of the sequence and a reverse complement strand of the sequence. The library is now ready for seeding, clustering and amplification.
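Purely as a simplified sketch of the loop fork arrangement described above, the following Python fragment joins a forward strand to its reverse strand through a loop and then derives the other strand produced by PCR. The adaptors ligated to the first end are omitted, the sequences are made-up placeholders, and it is assumed for illustration that the loop joins the 3’-end of the forward strand to the 5’-end of the reverse strand.

```python
# Illustrative sketch only: sequences are placeholders and adaptors are omitted.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written in the 5' to 3' direction."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "ATGGCATTC"                    # hypothetical forward strand of the sequence
reverse = reverse_complement(forward)    # reverse strand of the sequence, 5' to 3'
loop = "TTTT"                            # hypothetical loop sequence

# Assumption: the loop links the 3' end of the forward strand
# to the 5' end of the reverse strand.
loop_fork_strand = forward + loop + reverse

# PCR copies the loop fork strand into its complement, which carries the
# reverse complement strand and the forward complement strand.
other_pcr_strand = reverse_complement(loop_fork_strand)
```

In this sketch one strand carries the forward and reverse strands in tandem, and the other carries their complements, which is why such libraries may be regarded as self-tandem inserts.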
As will be described later, during clustering and amplification, further processes may be used to generate templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, wherein the first portion is a forward strand of the template, and the second portion is a reverse complement strand of the template (or alternatively, wherein the first portion is a reverse strand of the template, and the second portion is a forward complement strand of the template).
As will be understood by the skilled person, a double-stranded nucleic acid will typically be formed from two complementary polynucleotide strands comprised of deoxyribonucleotides or ribonucleotides joined by phosphodiester bonds, but may additionally include one or more ribonucleotides and/or non-nucleotide chemical moieties and/or non-naturally occurring nucleotides and/or non-naturally occurring backbone linkages. In particular, the double-stranded nucleic acid may include non-nucleotide chemical moieties, e.g. linkers or spacers, at the 5' end of one or both strands. By way of non-limiting example, the double-stranded nucleic acid may include methylated nucleotides, uracil bases, phosphorothioate groups, peptide conjugates etc. Such non-DNA or non-natural modifications may be included in order to confer some desirable property to the nucleic acid, for example to enable covalent, non-covalent or metal-coordination attachment to a solid support, or to act as spacers to position the site of cleavage an optimal distance from the solid support. A single stranded nucleic acid consists of one such polynucleotide strand. Where a polynucleotide strand is only partially hybridised to a complementary strand - for example, a long polynucleotide strand hybridised to a short nucleotide primer - it may still be referred to herein as a single stranded nucleic acid.
A sequence comprising at least a primer-binding sequence (a primer-binding sequence and a sequencing primer binding site, or in another aspect, a combination of a primer-binding sequence, an index sequence and a sequencing primer binding site) may be referred to herein as an adaptor sequence, and an insert is flanked by a 5’ adaptor sequence and a 3’ adaptor sequence. The primer-binding sequence may also comprise a sequencing primer for the index read. As used herein, an “adaptor” refers to a sequence that comprises a short sequence-specific oligonucleotide that is ligated to the 5' and 3' ends of each DNA (or RNA) fragment in a sequencing library as part of library preparation. The adaptor sequence may further comprise non-peptide linkers.
In a further embodiment, the P5’ and P7’ primer-binding sequences are complementary to short primer sequences (or lawn primers) present on the surface of a flow cell. Binding of P5’ and P7’ to their complements (P5 and P7) on - for example - the surface of the flow cell, permits nucleic acid amplification. As used herein, “’” denotes the complementary strand.
The primer-binding sequences in the adaptor which permit hybridisation to amplification primers (e.g. lawn primers) will typically be around 20-40 nucleotides in length, although the invention is not limited to sequences of this length. The precise identity of the amplification primers (e.g. lawn primers), and hence the cognate sequences in the adaptors, are generally not material to the invention, as long as the primer-binding sequences are able to interact with the amplification primers in order to direct PCR amplification. The sequence of the amplification primers may be specific for a particular target nucleic acid that it is desired to amplify, but in other embodiments these sequences may be "universal" primer sequences which enable amplification of any target nucleic acid of known or unknown sequence which has been modified to enable amplification with the universal primers. The criteria for design of PCR primers are generally well known to those of ordinary skill in the art.
The index sequences (also known as a barcode or tag sequence) are unique short DNA (or RNA) sequences that are added to each DNA (or RNA) fragment during library preparation. The unique sequences allow many libraries to be pooled together and sequenced simultaneously. Sequencing reads from pooled libraries are identified and sorted computationally, based on their barcodes, before final data analysis. Library multiplexing is also a useful technique when working with small genomes or targeting genomic regions of interest. Multiplexing with barcodes can exponentially increase the number of samples analysed in a single run, without drastically increasing run cost or run time. Examples of tag sequences are found in WO05/068656, whose contents are incorporated herein by reference in their entirety. The tag can be read at the end of the first read, or equally at the end of the second read, for example using a sequencing primer complementary to the strand marked P7. The invention is not limited by the number of reads per cluster, for example two reads per cluster: three or more reads per cluster are obtainable simply by dehybridising a first extended sequencing primer, and rehybridising a second primer before or after a cluster repopulation/strand resynthesis step. Methods of preparing suitable samples for indexing are described in, for example WO 2008/093098, which is incorporated herein by reference. Single or dual indexing may also be used. With single indexing, up to 48 unique 6-base indexes can be used to generate up to 48 uniquely tagged libraries. With dual indexing, up to 24 unique 8-base Index 1 sequences and up to 16 unique 8-base Index 2 sequences can be used in combination to generate up to 384 uniquely tagged libraries. Pairs of indexes can also be used such that every i5 index and every i7 index are used only one time. 
With these unique dual indexes, it is possible to identify and filter index-hopped reads, providing even higher confidence in multiplexed samples.
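The indexing arithmetic above can be checked with a short sketch. The index names below are hypothetical labels rather than real index sequences, and the hopping test is only one simple way of flagging reads whose i7 and i5 indexes are each valid but do not form an expected pair.

```python
# Illustrative sketch only: index names are hypothetical labels.
from itertools import product

index1 = [f"i7_{n:02d}" for n in range(1, 25)]   # 24 unique Index 1 sequences
index2 = [f"i5_{n:02d}" for n in range(1, 17)]   # 16 unique Index 2 sequences

# Combinatorial dual indexing: every Index 1 may be paired with every Index 2.
combinatorial_pairs = list(product(index1, index2))

# Unique dual indexing: each i7 and each i5 is used only once,
# so the number of pairs is limited by the shorter list.
unique_pairs = set(zip(index1, index2))
known_i7 = {i7 for i7, _ in unique_pairs}
known_i5 = {i5 for _, i5 in unique_pairs}


def is_index_hopped(i7: str, i5: str) -> bool:
    """Flag a read whose indexes are both known but are not an expected pair."""
    return i7 in known_i7 and i5 in known_i5 and (i7, i5) not in unique_pairs
```

With these lists, combinatorial pairing gives 24 x 16 = 384 uniquely tagged libraries, whereas unique pairing gives 16 pairs and allows a read carrying, say, `i7_01` together with `i5_02` to be flagged and filtered.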
The sequencing primer binding sites are sequencing and/or index primer binding sites and indicate the starting point of the sequencing read. In one embodiment, a sequencing primer binding site is only required to be a binding site for a sequencing primer. During the sequencing process, a sequencing primer anneals (i.e. hybridises) to at least a portion of the sequencing primer binding site on the template strand. The polymerase enzyme binds to this site and incorporates complementary nucleotides base by base into the growing opposite strand.
A sequence comprising at least a primer-binding sequence (a primer-binding sequence and a sequencing primer binding site, or in another aspect, a combination of a primer-binding sequence, an index sequence and a sequencing primer binding site) may be referred to herein as an adaptor sequence, and an insert is flanked by a 5’ adaptor sequence and a 3’ adaptor sequence. As used herein the 5’ adaptor sequence is the adaptor sequence that is attached to the 5’ end of the insert. Similarly, the 3’ adaptor sequence is the adaptor sequence that is attached to the 3’ end of the insert. The primer-binding sequence may also comprise a sequencing primer for the index read. However, one advantage of the present invention is that a separate sequencing primer for the index read is not needed, reducing the number of reagents/primers needed for a sequencing read.
Accordingly, as used herein, an “adaptor” refers to a sequence that comprises a short sequence-specific oligonucleotide that is ligated to the 5' and 3' ends of each DNA (or RNA) fragment in a sequencing library as part of library preparation. The adaptor sequence may further comprise non-peptide linkers.
Cluster generation and amplification
Once a double-stranded nucleic acid library is formed, the library typically has previously been, or is then, subjected to denaturing conditions to provide single-stranded nucleic acids. Suitable denaturing conditions will be apparent to the skilled reader with reference to standard molecular biology protocols (Sambrook et al., 2001, Molecular Cloning, A Laboratory Manual, 4th Ed, Cold Spring Harbor Laboratory Press, NY; Current Protocols, eds Ausubel et al). In one embodiment, chemical denaturation may be used.
Following denaturation, a single-stranded library may be contacted in free solution onto a solid support comprising surface capture moieties (for example P5 and P7 lawn primers).
Thus, embodiments of the present invention may be performed on a solid support 200, such as a flowcell. However, in alternative embodiments, seeding and clustering can be conducted off-flowcell using other types of solid support.
The solid support 200 may comprise a substrate 204. See Figure 6. The substrate 204 comprises at least one well 203 (e.g. a nanowell), and typically comprises a plurality of wells 203 (e.g. a plurality of nanowells).
In one embodiment, the solid support comprises a plurality of first immobilised primers and a plurality of second immobilised primers.
Thus, each well 203 may comprise a plurality of first immobilised primers 201. In addition, each well 203 may comprise a plurality of second immobilised primers 202. Thus, each well 203 may comprise a plurality of first immobilised primers 201 and a plurality of second immobilised primers 202.
The first immobilised primer 201 may be attached via a 5’-end of its polynucleotide chain to the solid support 200. When extension occurs from first immobilised primer 201 , the extension may be in a direction away from the solid support 200.
The second immobilised primer 202 may be attached via a 5’-end of its polynucleotide chain to the solid support 200. When extension occurs from second immobilised primer 202, the extension may be in a direction away from the solid support 200. The first immobilised primer 201 may be different to the second immobilised primer 202 and/or a complement of the second immobilised primer 202. The second immobilised primer 202 may be different to the first immobilised primer 201 and/or a complement of the first immobilised primer 201.
The (or each of the) first immobilised primer(s) 201 may comprise a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof. The (or each of the) second immobilised primer(s) 202 may comprise a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof. Whilst first immobilised primer(s) 201 are shown here to correspond to P5 and second immobilised primer(s) 202 are shown here to correspond to P7, the definitions of these may be swapped - in other words, first immobilised primer(s) 201 may correspond instead to P7, and second immobilised primer(s) 202 may correspond to P5.
By way of brief example, following attachment of the P5 and P7 primers to the solid support, the solid support may be contacted with the template to be amplified under conditions which permit hybridisation (or annealing - such terms may be used interchangeably) between the template and the immobilised primers. The template is usually added in free solution under suitable hybridisation conditions, which will be apparent to the skilled reader. Typically, hybridisation conditions are, for example, 5xSSC at 40°C. However, other temperatures may be used during hybridisation, for example about 50°C to about 75°C, about 55°C to about 70°C, or about 60°C to about 65°C. Solid-phase amplification can then proceed. The first step of the amplification is a primer extension step in which nucleotides are added to the 3' end of the immobilised primer using the template to produce a fully extended complementary strand. The template is then typically washed off the solid support. The complementary strand will include at its 3' end a primer-binding sequence (i.e. either P5’ or P7’) which is capable of bridging to the second primer molecule immobilised on the solid support and binding. Further rounds of amplification (analogous to a standard PCR reaction) lead to the formation of clusters or colonies of template molecules bound to the solid support. This is called clustering.
Thus, solid-phase amplification by either a method analogous to that of WO 98/44151 or that of WO 00/18957 (the contents of which are incorporated herein in their entirety by reference) will result in production of a clustered array comprised of colonies of "bridged" amplification products. This process is known as bridge amplification. Both strands of the amplification products will be immobilised on the solid support at or near the 5' end, this attachment being derived from the original attachment of the amplification primers. Typically, the amplification products within each colony will be derived from amplification of a single template molecule. Other amplification procedures may be used, and will be known to the skilled person. For example, amplification may be isothermal amplification using a strand displacement polymerase; or may be exclusion amplification as described in WO 2013/188582. Further information on amplification can be found in WO 02/06456 and WO 07/107710, the contents of which are incorporated herein in their entirety by reference.
Through such approaches, a cluster of template molecules is formed, comprising copies of a template strand and copies of the complement of the template strand.
As used herein, the term “cluster” may refer to a clonal group of template polynucleotides (e.g. DNA or RNA) bound within a single well of a solid support (e.g. flow cell). As such, a cluster may refer to the population of polynucleotide molecules within a well that are then sequenced. A “cluster” may contain a sufficient number of copies of template polynucleotides such that the cluster is able to output a signal (e.g. a light signal) that allows sequencing reads to be performed on the cluster. A “cluster” may comprise, for example, about 500 to about 2000 copies, about 600 to about 1800 copies, about 700 to about 1600 copies, about 800 to about 1400 copies, about 900 to about 1200 copies, or about 1000 copies of template polynucleotides.
A first signal that is capable of being produced by the first portion and a second signal that is capable of being produced by the second portion may be spatially unresolved (e.g. generated from the same region or substantially overlapping regions). In one embodiment, a first region occupied by the first polynucleotide sequence comprising the first portion within the duoclonal cluster is the same as, or substantially overlapping with, a second region occupied by the second polynucleotide sequence comprising the second portion within the duoclonal cluster.
By “duoclonal” cluster is meant that the population of polynucleotide sequences that are then sequenced (as the next step) are substantially of two types - e.g. a first sequence and a second sequence. As such, a “duoclonal” cluster may refer to the population of single first sequences and single second sequences within a well that are then sequenced. A “duoclonal” cluster may contain a sufficient number of copies of a single first sequence and copies of a single second sequence such that the cluster is able to output a signal (e.g. a light signal) that allows sequencing reads to be performed on the “duoclonal” cluster. A “duoclonal” cluster may comprise, for example, about 500 to about 2000 combined copies, about 600 to about 1800 combined copies, about 700 to about 1600 combined copies, about 800 to about 1400 combined copies, about 900 to about 1200 combined copies, or about 1000 combined copies of single first sequences and single second sequences. The copies of single first sequences and single second sequences together may comprise at least about 50%, at least about 60%, at least about 70%, even at least about 80%, at least about 90%, or about 95%, 98%, 99% or 100% of all polynucleotides within a single well of the flow cell, thus providing a substantially duoclonal “cluster”.
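As a hedged illustration of the percentages given above, the following sketch computes the fraction of polynucleotides in a well that is accounted for by the two most abundant sequences and compares it with a chosen threshold. The function names, the default threshold of 50% and the example counts are assumptions made for illustration only.

```python
# Illustrative sketch only: names, threshold and counts are assumptions.
from collections import Counter


def duoclonal_fraction(well_sequences: list[str]) -> float:
    """Fraction of the well occupied by its two most abundant sequences."""
    if not well_sequences:
        return 0.0
    top_two = Counter(well_sequences).most_common(2)
    return sum(count for _, count in top_two) / len(well_sequences)


def is_substantially_duoclonal(well_sequences: list[str], threshold: float = 0.5) -> bool:
    """True when the two dominant sequences reach the chosen threshold."""
    return duoclonal_fraction(well_sequences) >= threshold


# Hypothetical well: 520 copies of a first sequence, 450 of a second, 30 others.
example_well = ["first"] * 520 + ["second"] * 450 + ["other"] * 30
example_fraction = duoclonal_fraction(example_well)
```

For the hypothetical well above, the two dominant sequences account for 970 of 1000 polynucleotides, so the well would meet even a 95% threshold.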
The steps of cluster generation and amplification for templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, are illustrated below and in Figure 7.
In cases where (separate) polynucleotide strands are used, each first polynucleotide sequence may be attached (via the 5’-end of the first polynucleotide sequence) to a first immobilised primer, and each second polynucleotide sequence may be attached (via the 5’-end of the second polynucleotide sequence) to a second immobilised primer. Each first polynucleotide sequence may comprise a second adaptor sequence, wherein the second adaptor sequence comprises a portion which is substantially complementary to the second immobilised primer (or is itself substantially complementary to the second immobilised primer). The second adaptor sequence may be at a 3’-end of the first polynucleotide sequence. Each second polynucleotide sequence may comprise a first adaptor sequence, wherein the first adaptor sequence comprises a portion which is substantially complementary to the first immobilised primer (or is itself substantially complementary to the first immobilised primer). The first adaptor sequence may be at a 3’-end of the second polynucleotide sequence.
In an embodiment, a solution comprising a polynucleotide library prepared by ligating adaptor sequences to double-stranded polynucleotide sequences as described above may be flowed across a flowcell.
A particular polynucleotide strand from the polynucleotide library to be sequenced comprising, in a 5’ to 3’ direction, a second primer-binding complement sequence 302 (e.g. P7), a first terminal sequencing primer binding site complement 303’ (e.g. SBS12), a forward strand of the sequence 101, a second terminal sequencing primer binding site 304 (e.g. SBS3’) and a first primer-binding sequence 301’ (e.g. P5’), may anneal (via the first primer-binding sequence 301’) to the first immobilised primer 201 (e.g. P5 lawn primer) located within a particular well 203 (Figure 7A).
The polynucleotide library may comprise other polynucleotide strands with different forward strands of the sequence 101. Such other polynucleotide strands may anneal to corresponding first immobilised primers 201 (e.g. P5 lawn primers) in different wells 203, thus enabling parallel processing of the various different strands within the polynucleotide library.
A new polynucleotide strand may then be synthesised, extending from the first immobilised primer 201 (e.g. P5 lawn primer) in a direction away from the substrate 204. By using complementary base-pairing, this generates a template strand comprising, in a 5’ to 3’ direction, the first immobilised primer 201 (e.g. P5 lawn primer) which is attached to the solid support 200, a second terminal sequencing primer binding site complement 304’ (e.g. SBS3), a forward strand of the template 101’ (which represents a type of “first portion”), a first terminal sequencing primer binding site 303 (which represents a type of “first sequencing primer binding site”) (e.g. SBS12’), and a second primer-binding sequence 302’ (e.g. P7’) (Figure 7B). Such a process may utilise an appropriate polymerase, such as a DNA or RNA polymerase.
If the polynucleotides in the library comprise index sequences, then corresponding index sequences are also produced in the template.
The polynucleotide strand from the polynucleotide library may then be dehybridised and washed away, leaving a template strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) (Figure 7C).
The second primer-binding sequence 302’ (e.g. P7’) on the template strand may then anneal to a second immobilised primer 202 (e.g. P7 lawn primer) located within the well 203. This forms a “bridge” or “sequence bridge” (Figure 7D).
A new polynucleotide strand may then be synthesised by bridge amplification, extending from the second immobilised primer 202 (e.g. P7 lawn primer) (initially) in a direction away from the substrate 204. By using complementary base-pairing, this generates a template strand comprising, in a 5’ to 3’ direction, the second immobilised primer 202 (e.g. P7 lawn primer) which is attached to the solid support 200, a first terminal sequencing primer binding site complement 303’ (e.g. SBS12), a forward complement strand of the template 101 (which represents a type of “second portion”), a second terminal sequencing primer binding site 304 (which represents a type of “second sequencing primer binding site”) (e.g. SBS3’), and a first primer-binding sequence 301’ (e.g. P5’) (Figure 7E). Again, such a process may utilise a suitable polymerase, such as a DNA or RNA polymerase.
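The two copying steps described above (Figures 7A to 7E) can be traced symbolically. The following is a minimal Python sketch rather than part of the described method: the function name `complement_strand` and the list representation are illustrative assumptions, the example names (P5, P7, SBS3, SBS12) stand in for the reference numerals, and a trailing apostrophe marks a complement.

```python
def complement_strand(strand):
    """Return the strand made by copying `strand`; both are listed 5' to 3'.

    Each element is a label; a trailing apostrophe marks a complement.
    Copying reverses the element order and toggles the apostrophe.
    """
    def toggle(label):
        return label[:-1] if label.endswith("'") else label + "'"

    return [toggle(label) for label in reversed(strand)]


# Library strand supplied in solution (5' to 3'), using the example names.
library_strand = ["P7", "SBS12", "101", "SBS3'", "P5'"]

# Extension from the P5 lawn primer gives the first template strand.
first_template = complement_strand(library_strand)

# Extension from the P7 lawn primer across the bridge gives the second.
second_template = complement_strand(first_template)

print(first_template)
print(second_template)
```

In this representation the second template strand has the same element order as the original library strand, which is why the cluster ends up holding both the forward strand and its complement.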
The strand attached to the second immobilised primer 202 (e.g. P7 lawn primer) may then be dehybridised from the strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) (Figure 7F).
A subsequent bridge amplification cycle can then lead to amplification of the strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) and the strand attached to the second immobilised primer 202 (e.g. P7 lawn primer). Similar to Figure 7D, the second primer-binding sequence 302’ (e.g. P7’) on the template strand attached to the first immobilised primer 201 (e.g. P5 lawn primer) may then anneal to another second immobilised primer 202 (e.g. P7 lawn primer) located within the well 203. In a similar fashion, the first primer-binding sequence 301’ (e.g. P5’) on the template strand attached to the second immobilised primer 202 (e.g. P7 lawn primer) may then anneal to another first immobilised primer 201 (e.g. P5 lawn primer) located within the well 203 (Figure 7G).
Completion of bridge amplification and dehybridisation may then provide an amplified (duoclonal) cluster, thus providing a plurality of first polynucleotide sequences comprising the forward strand of the template 101’ (i.e. “first portions”), and a plurality of second polynucleotide sequences comprising the forward complement strand of the template 101 (i.e. “second portions”) (Figure 7H).
If desired, further bridge amplification cycles may be conducted to increase the number of first polynucleotide sequences and second polynucleotide sequences within the well 203.
In this particular example, the “first portion” corresponds with the forward strand of the template 101’, and the “second portion” corresponds with the forward complement strand of the template 101.
However, other set-ups may be obtained by changing the library used. For example, by using a loop fork method to prepare a library, a portion at or close to the loop (or the loop complement) may be cleaved (e.g. by nicking). In these cases, the loop may comprise a cleavage site (e.g. a restriction recognition site, a cleavable linker, a modified nucleotide, or the like). By conducting cleavage at the loop, it is possible to produce a well 203, where the “first portion” corresponds with a forward strand of the template, and the “second portion” corresponds with a reverse complement strand of the template. In addition, by using a PCR stitching method to prepare a library, a portion at or close to the overlap region may comprise a cleavage site (e.g. a restriction recognition site, a cleavable linker, a modified nucleotide, or the like). By conducting cleavage at the overlap region, it is possible to produce a well 203, where the “first portion” corresponds with a first insert sequence, and the “second portion” corresponds with a second insert sequence that is genetically unrelated to the first insert sequence. As such, different types of strands for the “first portions” and “second portions” may be prepared for templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion, and as such the forward strand of the template 101’ and the forward complement strand of the template 101 may be substituted as appropriate.
Sequencing
As described herein, the template provides information (e.g. identification of the genetic sequence, identification of epigenetic modifications) on the original target polynucleotide sequence. For example, a sequencing process (e.g. a sequencing-by-synthesis or sequencing-by-ligation process) may reproduce information that was present in the original target polynucleotide sequence, by using complementary base pairing.
By “identification” is meant here obtaining genetic information from the polynucleotide strand or polynucleotide strands. This may include identification of the genetic sequence of the polynucleotide strand or polynucleotide strands (i.e. sequencing). Furthermore, this may instead, or additionally, include identification of mismatched base pairs. In addition, this may instead, or additionally, include identification of any epigenetic modifications, for example methylation. Accordingly, “identification” may mean identification of the genetic sequence of the polynucleotide strand or polynucleotide strands, mismatched base pairs, and/or identification of any epigenetic modifications.
In one embodiment, sequencing may be carried out using any suitable "sequencing-by-synthesis" technique, wherein nucleotides are added successively in cycles to the free 3' hydroxyl group, resulting in synthesis of a polynucleotide chain in the 5' to 3' direction. The nature of the nucleotide added may be determined after each addition. One particular sequencing method relies on the use of modified nucleotides that can act as reversible chain terminators. Such reversible chain terminators comprise removable 3' blocking groups. Once such a modified nucleotide has been incorporated into the growing polynucleotide chain complementary to the region of the template being sequenced, there is no free 3'-OH group available to direct further sequence extension and therefore the polymerase cannot add further nucleotides. Once the nature of the base incorporated into the growing chain has been determined, the 3' block may be removed to allow addition of the next successive nucleotide. By ordering the products derived using these modified nucleotides it is possible to deduce the DNA sequence of the DNA template. Such reactions can be done in a single experiment if each of the modified nucleotides has attached thereto a different label, known to correspond to the particular base, to facilitate discrimination between the bases added at each incorporation step. Suitable labels are described in PCT application PCT/GB2007/001770, the contents of which are incorporated herein by reference in their entirety. Alternatively, a separate reaction may be carried out containing each of the modified nucleotides added individually.
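The cycle of incorporation, detection and deblocking described above can be illustrated with a short simulation. This is a minimal Python sketch under simplifying assumptions (error-free incorporation, exactly one base per cycle, a template written 3' to 5'); the function name `sequence_by_synthesis` is hypothetical and no real chemistry or instrument behaviour is modelled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(template_3_to_5, cycles):
    """Simulate reversible-terminator cycles; return called bases 5' to 3'."""
    read = []
    for position in range(min(cycles, len(template_3_to_5))):
        # Incorporation: the 3' block limits extension to a single nucleotide.
        incorporated = COMPLEMENT[template_3_to_5[position]]
        # Detection: the label identifies the base that was just added.
        read.append(incorporated)
        # Deblocking: removing the 3' block lets the next cycle proceed.
    return "".join(read)


print(sequence_by_synthesis("TACG", cycles=3))
```

Because each cycle adds exactly one base, the order of detected labels is the order of bases in the growing chain.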
The modified nucleotides may carry a label to facilitate their detection. Such a label may be configured to emit a signal, such as an electromagnetic signal, or a (visible) light signal.
In a particular embodiment, the label is a fluorescent label (e.g. a dye). Thus, such a label may be configured to emit an electromagnetic signal, or a (visible) light signal. One method for detecting the fluorescently labelled nucleotides comprises using laser light of a wavelength specific for the labelled nucleotides, or the use of other suitable sources of illumination. The fluorescence from the label on an incorporated nucleotide may be detected by a CCD camera or other suitable detection means. Suitable detection means are described in PCT/US2007/007991, the contents of which are incorporated herein by reference in their entirety.
However, the detectable label need not be a fluorescent label. Any label can be used which allows the detection of the incorporation of the nucleotide into the DNA sequence.
Each cycle may involve simultaneous delivery of four different nucleotide types to the array of template molecules. Alternatively, different nucleotide types can be added sequentially and an image of the array of template molecules can be obtained between each addition step.
In some embodiments, each nucleotide type may have a (spectrally) distinct label. In other words, four channels may be used to detect four nucleobases (also known as 4-channel chemistry) (Figure 8 - left). For example, a first nucleotide type (e.g. A) may include a first label (e.g. configured to emit a first wavelength, such as red light), a second nucleotide type (e.g. G) may include a second label (e.g. configured to emit a second wavelength, such as blue light), a third nucleotide type (e.g. T) may include a third label (e.g. configured to emit a third wavelength, such as green light), and a fourth nucleotide type (e.g. C) may include a fourth label (e.g. configured to emit a fourth wavelength, such as yellow light). Four images can then be obtained, each using a detection channel that is selective for one of the four different labels. For example, the first nucleotide type (e.g. A) may be detected in a first channel (e.g. configured to detect the first wavelength, such as red light), the second nucleotide type (e.g. G) may be detected in a second channel (e.g. configured to detect the second wavelength, such as blue light), the third nucleotide type (e.g. T) may be detected in a third channel (e.g. configured to detect the third wavelength, such as green light), and the fourth nucleotide type (e.g. C) may be detected in a fourth channel (e.g. configured to detect the fourth wavelength, such as yellow light). Although specific pairings of bases to signal types (e.g. wavelengths) are described above, different signal types (e.g. wavelengths) and/or permutations may also be used.
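The 4-channel example above amounts to a one-to-one lookup from detection channel to base. The following is a minimal Python sketch using the example colour pairings given above; the function name `call_base_four_channel` and the rule of taking the brightest channel are illustrative assumptions, not a described base-calling algorithm.

```python
# Example pairing from the text: red = A, blue = G, green = T, yellow = C.
CHANNEL_TO_BASE = {"red": "A", "blue": "G", "green": "T", "yellow": "C"}


def call_base_four_channel(intensities):
    """Call a base from a dict mapping channel name to measured intensity."""
    # Assumption: the channel with the highest intensity carries the label.
    brightest = max(intensities, key=intensities.get)
    return CHANNEL_TO_BASE[brightest]


print(call_base_four_channel(
    {"red": 0.1, "blue": 0.9, "green": 0.2, "yellow": 0.05}
))
```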
In some embodiments, detection of each nucleotide type may be conducted using fewer than four different labels. For example, sequencing-by-synthesis may be performed using methods and systems described in US 2013/0079232, which is incorporated herein by reference.
Thus, in some embodiments, two channels may be used to detect four nucleobases (also known as 2-channel chemistry) (Figure 8 - middle). For example, a first nucleotide type (e.g. A) may include a first label (e.g. configured to emit a first wavelength, such as green light) and a second label (e.g. configured to emit a second wavelength, such as red light), a second nucleotide type (e.g. G) may not include the first label and may not include the second label, a third nucleotide type (e.g. T) may include the first label (e.g. configured to emit the first wavelength, such as green light) and may not include the second label, and a fourth nucleotide type (e.g. C) may not include the first label and may include the second label (e.g. configured to emit the second wavelength, such as red light). Two images can then be obtained, using detection channels for the first label and the second label. For example, the first nucleotide type (e.g. A) may be detected in both a first channel (e.g. configured to detect the first wavelength, such as green light) and a second channel (e.g. configured to detect the second wavelength, such as red light), the second nucleotide type (e.g. G) may not be detected in the first channel and may not be detected in the second channel, the third nucleotide type (e.g. T) may be detected in the first channel (e.g. configured to detect the first wavelength, such as green light) and may not be detected in the second channel, and the fourth nucleotide type (e.g. C) may not be detected in the first channel and may be detected in the second channel (e.g. configured to detect the second wavelength, such as red light). Although specific pairings of bases to signal types (e.g. wavelengths) and/or combinations of channels are described above, different signal types (e.g. wavelengths) and/or permutations may also be used.
In some embodiments, one channel may be used to detect four nucleobases (also known as 1-channel chemistry) (Figure 8 - right). For example, a first nucleotide type (e.g. A) may include a cleavable label (e.g. configured to emit a wavelength, such as green light), a second nucleotide type (e.g. G) may not include a label, a third nucleotide type (e.g. T) may include a non-cleavable label (e.g. configured to emit the wavelength, such as green light), and a fourth nucleotide type (e.g. C) may include a label-accepting site which does not include the label. A first image can then be obtained, and a subsequent treatment carried out to cleave the label attached to the first nucleotide type, and to attach the label to the label-accepting site on the fourth nucleotide type. A second image may then be obtained. For example, the first nucleotide type (e.g. A) may be detected in a channel (e.g. configured to detect the wavelength, such as green light) in the first image and not detected in the channel in the second image, the second nucleotide type (e.g. G) may not be detected in the channel in the first image and may not be detected in the channel in the second image, the third nucleotide type (e.g. T) may be detected in the channel (e.g. configured to detect the wavelength, such as green light) in the first image and may be detected in the channel (e.g. configured to detect the wavelength, such as green light) in the second image, and the fourth nucleotide type (e.g. C) may not be detected in the channel in the first image and may be detected in the channel in the second image (e.g. configured to detect the wavelength, such as green light). Although specific pairings of bases to signal types (e.g. wavelengths) and/or combinations of images are described above, different signal types (e.g. wavelengths), images and/or permutations may also be used.
In one embodiment, the sequencing process comprises a first sequencing read and second sequencing read.
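The 2-channel example above encodes each base as an on/off pattern across the two channels. The following is a minimal Python sketch of that decoding, using the example base assignments given above; the function name `call_base_two_channel` and the fixed `threshold` are illustrative assumptions.

```python
# (detected in first channel, detected in second channel) -> base,
# following the example: A = both, G = neither, T = first only, C = second only.
TWO_CHANNEL_CODE = {
    (True, True): "A",
    (False, False): "G",
    (True, False): "T",
    (False, True): "C",
}


def call_base_two_channel(first_channel, second_channel, threshold=0.5):
    """Call a base from two channel intensities (hypothetical fixed threshold)."""
    key = (first_channel > threshold, second_channel > threshold)
    return TWO_CHANNEL_CODE[key]


for pair in [(0.9, 0.8), (0.1, 0.05), (0.9, 0.1), (0.1, 0.9)]:
    print(pair, call_base_two_channel(*pair))
```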
The first sequencing read and the second sequencing read may be conducted concurrently. In other words, the first sequencing read and the second sequencing read may be conducted at the same time.
The first sequencing read may comprise the binding of a first sequencing primer (also known as a read 1 sequencing primer) to the first sequencing primer binding site (e.g. first terminal sequencing primer binding site 303 in templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion). The second sequencing read may comprise the binding of a second sequencing primer (also known as a read 2 sequencing primer) to the second sequencing primer binding site (e.g. second terminal sequencing primer binding site 304 in templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion).
This leads to sequencing of the first portion (e.g. forward strand of the template 101’ in templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion; or other types of first portion if different library preparations are used, such as by PCR stitching or loop fork methods) and the second portion (e.g. forward complement strand of the template 101 in templates including a first polynucleotide sequence comprising a first portion and a second polynucleotide sequence comprising a second portion; or other types of second portion if different library preparations are used, such as by PCR stitching or loop fork methods).
Alternative methods of sequencing include sequencing by ligation, for example as described in US 6,306,597 or WO 06/084132, the contents of which are incorporated herein by reference.
Methods of conducting simultaneous sequencing
The sequencing process comprises a first sequencing read and second sequencing read. The first sequencing read and the second sequencing read may be conducted concurrently. In other words, the first sequencing read and the second sequencing read may be conducted at the same time. The present invention is directed to methods of simultaneous (or concurrent - such terms are used interchangeably) sequencing. By simultaneous sequencing is meant that both the first and second polynucleotide sequences (or strands) are read (i.e. sequenced) at the same time. This is achieved by altering the ratio of first to second sequencing reads, or vice versa (and the signal from these sequencing reads), which in turn allows the first and second sequencing reads to be differentiated, and allows the first and second polynucleotide strands to be sequenced simultaneously.
In the methods of the invention, the first and second sequencing strands, and the clusters generated therefrom, are spatially unresolved. Accordingly, the methods of the present invention allow the simultaneous sequencing of spatially unresolved clusters.
Advantageously, the methods of the invention provide a method of sequencing without the need for a paired-end turn and cluster re-synthesis. This in turn reduces the time taken to sequence a target polynucleotide, thus improving even further the efficiency of the sequencing protocol when using the method of the present invention. A paired-end turn refers to the sequence of stages required to effectively invert the sequence for the second read in paired-end reading, after sequencing read 1 (Figure 7). The paired-end turn may be facilitated by a cycle of bridge amplification and linearization. The present invention also eliminates the need for cluster re-synthesis of the first or second polynucleotide sequences for read 2 of conventional SBS workflows (Figure 7). Cluster re-synthesis follows the paired-end turn, and consists of cycles of bridge amplification to restore the clusters of library preparations for the second read (Figure 7). Thus, as described above, eliminating these steps provides additional time savings. For the simultaneous sequencing with 16QAM, as described below, the sequencing time can be almost halved.
As the clusters are spatially unresolved, the methods of the invention further permit lower numerical aperture (NA) epifluorescence imaging of the sequencing reads. This increases the speed at which imaging can occur, thus reducing the time taken to sequence a target polynucleotide, and again improves the efficiency of the sequencing protocol when using the method of the present invention.
As a yet further advantage, the method of the invention also simplifies the immobilised primers. The immobilised primers of the present invention do not need to be cleavable from the solid support. Accordingly, in one embodiment, the immobilised primers may not comprise a cleavable site. In another embodiment, where the immobilised primers comprise a cleavable site, this cleavable site is different from or cleavable under different cleavable conditions from the cleavable sites present in the first and/or second sequencing primer. In another embodiment, the immobilised primers may not comprise a uracil or oxoguanine (e.g. 8-oxoguanine).
As a yet further advantage, the methods of the invention further improve (i.e. increase) the signal-to-noise ratio (SNR). This is achieved, as explained below, by attenuating the first signal intensity from the first sequencing read to generate an adjusted first signal intensity. As a result, the signal intensity generated from the second sequencing read: the adjusted signal intensity generated from the first sequencing read is optimised at or around 2:1.
The SNR is used to compute an estimated error rate for the entire ensemble of clusters, and so is a primary determinant of sequencing error. Therefore, a sub-optimal SNR equates to an increased error rate.
The SNR of a sequencing read may be defined as the ability to differentiate one signal from another and/or background noise. SNR is correlative to sequencing error rate; the greater the SNR the greater the accuracy of sequencing two signals received concurrently. In the context of the present invention, the two signals received may correspond to a first and second sequencing read.
In one embodiment, the SNR between two signals (i.e. two clouds of a 16QAM output) equals the difference of mean intensities divided by the sum of noises.
In one embodiment, the SNR is measured in decibels (dB) and may be defined by Equation 1:
$$\mathrm{SNR}_{\mathrm{dB}} = 20 \log_{10}(\mathrm{SNR}) \qquad \text{(Equation 1)}$$
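The SNR definition above and the decibel conversion of Equation 1 can be written out directly. This is a minimal Python sketch; the function names are hypothetical, and the choice of noise measure (for example, the standard deviation of each cloud) is an assumption, since the text only refers to "noises".

```python
import math


def snr_between(mean_a, mean_b, noise_a, noise_b):
    """SNR between two clouds: difference of mean intensities over sum of noises."""
    return abs(mean_a - mean_b) / (noise_a + noise_b)


def snr_to_db(snr):
    """Convert a linear SNR to decibels, following Equation 1."""
    return 20.0 * math.log10(snr)


linear = snr_between(mean_a=3.0, mean_b=1.0, noise_a=0.5, noise_b=0.5)
print(linear, snr_to_db(linear))
```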
In the 16QAM scheme, a high SNR between adjacent clouds and the background is required to differentiate between concurrently received signals. The optimal SNR is achieved when the signal intensity ratio of the first sequencing read and the second sequencing read (hereafter referred to as the signal intensity ratio) is approximately 2:1. This is shown in Figure 17. The improved SNR equates to a reduced error rate at this primer ratio, as shown in Figure 18. Figure 18 (A, D) shows that a primer ratio of 2:1 corresponds to a very low error rate, wherein the error rate is estimated by calculating SNR between all 120 possible pairings and converting SNR to confusion percentage between each pair. This optimal primer ratio corresponds to a high SNR value (Figure 18 B, E; minimum SNR value).
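The count of 120 pairings follows from choosing 2 clouds out of 16. The Python sketch below illustrates one way 16 distinguishable clouds can arise, assuming (this is an assumption for illustration, not stated in this passage) that both reads use the 2-channel base encoding described earlier and that their signals add linearly with relative weights of 1 and 2. The names `TWO_CHANNEL` and `constellation` are hypothetical.

```python
from itertools import combinations, product

# Assumed 2-channel encoding: base -> (first channel, second channel).
TWO_CHANNEL = {"A": (1, 1), "G": (0, 0), "T": (1, 0), "C": (0, 1)}


def constellation(weight_a, weight_b):
    """Map each summed (channel 1, channel 2) intensity to the base pairs producing it."""
    points = {}
    for base_a, base_b in product(TWO_CHANNEL, repeat=2):
        code_a, code_b = TWO_CHANNEL[base_a], TWO_CHANNEL[base_b]
        point = (
            weight_a * code_a[0] + weight_b * code_b[0],
            weight_a * code_a[1] + weight_b * code_b[1],
        )
        points.setdefault(point, []).append(base_a + base_b)
    return points


unequal = constellation(1, 2)  # one read twice as bright as the other
equal = constellation(1, 1)    # equal intensities

print(len(unequal), len(list(combinations(unequal, 2))))
print(len(equal))
```

Under these assumptions, unequal weights give every base pair its own intensity point, whereas equal weights make several base pairs coincide, so they can no longer be told apart.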
As described above, achieving a high SNR (and ~2:1 signal intensity ratio) may be hindered for a number of reasons, such as systematic deviations in surface chemistry patterns and amplification efficiencies. To resolve this possible issue, the methods of the present invention maximise the SNR and consequently lower the error rate of a sequencing system.
Accordingly, in one aspect of the invention there is provided a method of preparing a first and second polynucleotide sequence for concurrent sequencing, wherein the method comprises:
(a) measuring a first signal intensity generated from a first nucleotide sequence, and
(b) measuring a second signal intensity generated from a second nucleotide sequence, and
(c) based on the first signal intensity and the second signal intensity, a subsequent signal intensity generated from sequencing the first nucleotide strand can be attenuated, wherein the attenuated subsequent signal intensity generated from sequencing the first polynucleotide strand has a lower intensity than a subsequent signal intensity generated from sequencing the second polynucleotide strand.
In other words, measurement of the first signal intensity and the second signal intensity provides the information necessary to be able to attenuate a subsequent signal intensity.
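Steps (a) to (c) can be reduced to a simple calculation of how much the first signal needs to be attenuated. The following Python sketch assumes a target second:first intensity ratio of 2:1 and a linear response; the function name `attenuation_factor` and its parameters are hypothetical, and the sketch says nothing about how the attenuation is physically achieved.

```python
def attenuation_factor(first_intensity, second_intensity, target_ratio=2.0):
    """Fraction of the first signal to keep so that second:first reaches target_ratio."""
    if first_intensity <= 0 or second_intensity <= 0:
        raise ValueError("intensities must be positive")
    # A value of 1.0 means the first signal is already weak enough;
    # the sketch only attenuates and never amplifies.
    return min(1.0, second_intensity / (target_ratio * first_intensity))


factor = attenuation_factor(first_intensity=100.0, second_intensity=100.0)
print(factor, 100.0 * factor)
```

For equal measured intensities, the sketch keeps half of the first signal, so that the second signal is twice the adjusted first signal.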
In another aspect of the invention there is provided a method of preparing a first and second polynucleotide sequence for concurrent sequencing, wherein the method comprises:
(a) measuring a first signal intensity generated from a first nucleotide sequence, and
(b) measuring a second signal intensity generated from a second nucleotide sequence, and
(c) based on the first signal intensity and the second signal intensity, a subsequent signal intensity generated from sequencing the first nucleotide strand is attenuated, wherein the attenuated subsequent signal intensity generated from sequencing the first polynucleotide strand has a lower intensity than a subsequent signal intensity generated from sequencing the second polynucleotide strand.
In one embodiment the plurality of first and second polynucleotide sequences is synthesised by bridge or exclusion amplification, as described above. This leads to the formation of clusters or colonies of template molecules that are bound to the solid support via the immobilised primers. Accordingly, in one embodiment, synthesising a plurality of first and second polynucleotide sequences by conducting an amplification reaction forms a cluster or plurality of clusters. As described above, bridge amplification may be performed a number of times (i.e. a number of cycles are carried out) to obtain a cluster of first and second polynucleotide sequences - that is a cluster pair comprising clusters of the first polynucleotide sequence and clusters of the second polynucleotide sequence. As described above, this cluster pair may be spatially unresolved. In one embodiment, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20 or 25 or more cycles of bridge amplification are performed.
Accordingly, the method of the invention may further comprise hybridising the first polynucleotide sequence to first immobilised primers on a solid support and hybridising the second polynucleotide sequence to second immobilised primers on a solid support; and synthesising a plurality of first and second polynucleotide sequences by conducting an amplification reaction to extend the first and second immobilised primers.
Following cluster formation, the cluster (or library) is linearised. Linearisation can occur as only one end of each side of the “bridge” formed during bridge amplification is attached (e.g. covalently bonded) to the solid support. The other side of the bridge is hybridized to its complementary immobilized primer. As shown in Figure 5G and 5H, linearization leads to a plurality of single-stranded first and second polynucleotide sequences immobilised on the solid support through the first and second immobilised primers. We may refer to these as first immobilised polynucleotide strands and second immobilised polynucleotide strands.
Following linearization the method comprises measuring a first signal intensity generated from a first nucleotide sequence, and measuring a second signal intensity generated from a second nucleotide sequence. By the first nucleotide strand is meant the first immobilised polynucleotide strand, and by the second nucleotide strand is meant the second immobilised polynucleotide strand.
By “measuring a (first and second) sequencing intensity” as used in (a) and (b) above, is meant obtaining a signal from the first and second polynucleotide strand that is sufficient to inform a subsequent calculation, which is then used to adjust the final signal intensity ratio, preferably towards 1:2 for read 1:read 2. In one embodiment, the method may comprise measuring the signal intensity of a single base within a sequencing primer. Alternatively, the method may comprise measuring the signal intensity from extending the bound sequencing primer and reading the first base. Though a signal intensity from a single nucleotide or base is sufficient to obtain a signal intensity measurement, the signal intensity from multiple, i.e. 2, 3, 4, 5, 6 or 10 or at least 15, nucleotides or bases may be determined and averaged.
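Averaging over several bases and comparing the result with the preferred 1:2 ratio can be sketched as follows. This is a minimal Python illustration; the function name `measured_ratio` and the example intensity values are hypothetical and are not measured data.

```python
from statistics import mean


def measured_ratio(read1_intensities, read2_intensities):
    """Ratio of mean read 1 intensity to mean read 2 intensity."""
    return mean(read1_intensities) / mean(read2_intensities)


TARGET_RATIO = 1 / 2  # preferred read 1:read 2 ratio of 1:2

# Hypothetical per-base intensities from three measurement cycles per read.
ratio = measured_ratio([10, 12, 14], [12, 12, 12])
print(ratio, ratio > TARGET_RATIO)
```

A measured ratio above the target indicates that the read 1 signal would need to be attenuated to approach 1:2.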
Preferably, measurement of a first and second sequencing intensity is carried out sequentially. For example, the measurement of a first signal intensity generated from the first polynucleotide sequence is first, and the measurement of a second signal intensity generated from the second polynucleotide sequence is second - or vice versa. The order of application is immaterial, so long as the first and second sequencing primers are applied separately and sequentially.
Accordingly, in one embodiment, the method comprises applying (i.e. flowing across the surface of the solid support) a plurality of first (or second) sequencing primers (also known as a read 1 sequencing primer). The first sequencing primers hybridise to the first sequencing primer binding site (e.g. first terminal sequencing primer binding site 303).
By “sequencing primer” is meant a polynucleotide that is sufficiently complementary to a terminal sequencing primer binding site of a library strand, and is functionally able to initiate replication of the immobilised library strand i.e. sequencing. In one embodiment, the sequencing primer may be selected from SBS12’ or SBS3, for example SEQ ID NO: 10 and 7 respectively. By extension (or extending) of a sequencing primer is meant the sequence beyond a sequencing primer resulting from replication of the immobilised library strand.
In one embodiment, the first (or second) sequencing primer comprises a label to facilitate their detection. Such a label may be configured to emit a signal, such as an electromagnetic signal, or a (visible) light signal. In a particular embodiment, the label is a fluorescent label (e.g. a dye). Thus, such a label may be configured to emit an electromagnetic signal, or a (visible) light signal as described above in relation to labelled nucleotides for use in sequencing.
Accordingly, in one embodiment, the method comprises applying a plurality of labelled first sequencing primers, and measuring the signal intensity generated from binding of the first sequencing primers to the first polynucleotide strand. This is the measurement of the first signal intensity described above.
In an alternative embodiment, the method comprises applying (i.e. flowing across the surface of the solid support) a plurality of first (or second) sequencing primers, where the sequencing primers are not labelled and conducting an extension reaction, as described above, to extend the first (or second) sequencing primer. The sequencing primer may be extended by a plurality of labelled nucleotides, which are subsequently detected. In one embodiment, the sequencing primer is extended by just one labelled nucleotide. Alternatively, the sequencing primer is extended by 2, 3, 4, 5 or more labelled nucleotides.
The nucleotides may be labelled as described above. The labelled nucleotides may also comprise a 3’ blocking group. The blocking group may be any modification that prevents extension (i.e. elongation) of the free end by a polymerase. Examples of suitable blocking groups include a hairpin loop, a hydrogen atom instead of a 3’-OH group, a phosphate group, a propyl spacer (e.g. -O-(CH2)3-OH instead of a 3’-OH group), a modification blocking the 3’-hydroxyl group (e.g. hydroxyl protecting groups, such as silyl ether groups (e.g. trimethylsilyl, triethylsilyl, triisopropylsilyl, t-butyl(dimethyl)silyl, t-butyl(diphenyl)silyl), ether groups (e.g. benzyl, allyl, t-butyl, methoxymethyl (MOM), 2-methoxyethoxymethyl (MEM), tetrahydropyranyl), or acyl groups (e.g. acetyl, benzoyl)), or an inverted nucleobase. The 3’ block may be removed after the labelled nucleotide has been detected.
In one embodiment, the first sequencing primers and/or any extension products may be washed off before any further steps.
Once the first (or second) signal is obtained (i.e. step (a) as shown in Figure 15), and the intensity thereof measured, as described above, a second (or first) signal may be obtained (and the intensity thereof measured) (i.e. step (b) is performed). This second signal may be obtained by any of the means described above for obtaining the first signal. For example, in one embodiment, the method comprises applying (i.e. flowing across the surface of the solid support) a plurality of second (or first) sequencing primers (also known as a read 2 sequencing primer). The second sequencing primers hybridise to the second sequencing primer binding site (e.g. second terminal sequencing primer binding site 304) in the template. A second signal can then be obtained, and the intensity thereof measured, as described above using either labelled sequencing primers or non-labelled sequencing primers and labelled nucleotides.
In a further embodiment, where the second sequencing primer is labelled, the label may be cleavable. Advantageously, this means that the second sequencing primer can be extended when sequencing the full insert following the signal intensity measurement without a wash stage by cleaving the label.
In one embodiment, the label comprises a cleavable covalent bond. As used herein, the term “cleavable covalent bond” refers to a covalent bond that can be cleaved, for example, under the application of heat, light or other (bio)chemical methods (e.g. by exposure to a degradation agent, such as an enzyme or a catalyst), while a “non-cleavable covalent bond” is stable to degradation under such conditions. Non-limiting examples of cleavable covalent bonds include thermally or photolytically cleavable cycloadducts (e.g. furan-maleimide cycloadducts), alkenylene linkages, esters, amides, acetals, hemiaminal ethers, aminals, imines, hydrazones, 1,2-diol linkages (e.g. glycols cleavable by periodates), polysulfide linkages (e.g. disulfide linkages), boron-based linkages (e.g. boronic and borinic acids/esters), silicon-based linkages (e.g. silyl ether, siloxane), and phosphorus-based linkages (e.g. phosphite, phosphate).
The cleavable element of a label will be configured to be cleaved at a specific location, hereafter called the cleavage site, under certain reaction conditions. As used herein, the term “cleavage conditions” refers to reaction conditions that cause cleavage within the label, i.e. at the cleavage site. The cleavage conditions may involve exposure to a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, periodate).
Accordingly, in one embodiment, the method comprises applying a second cleavable-labelled sequencing primer, measuring the intensity of a signal from the labelled sequencing primer and cleaving the cleavable label.
Once a first and second signal intensity has been measured, as described above, it is possible to attenuate the first (or second) signal intensity to generate an adjusted first (or second) signal that has a lower intensity than the second (or first respectively) signal. In particular, as described above, it is highly advantageous to be able to manipulate the ratio of the first and second signal intensities, particularly towards the optimal ratio of 2:1 of second:first signal intensities, to permit concurrent sequencing and achieve an optimal SNR.
In one embodiment, the optimum ratio of 2:1 or around 2:1 is achieved using first (or second) sequencing primers comprising a mix of terminated and non-terminated primers.
By terminated primer (or blocked primer - such terms may be used interchangeably), is meant a sequencing primer comprising a modification that prevents extension (i.e. elongation) of the primer by a polymerase. Suitable modifications include blocking groups such as a hairpin loop, a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer (e.g. -O-(CH2)3-OH instead of a 3’-OH group), a modification blocking the 3’-hydroxyl group (e.g. hydroxyl protecting groups, such as silyl ether groups (e.g. trimethylsilyl, triethylsilyl, triisopropylsilyl, t-butyl(dimethyl)silyl, t-butyl(diphenyl)silyl), ether groups (e.g. benzyl, allyl, t-butyl, methoxymethyl (MOM), 2-methoxyethoxymethyl (MEM), tetrahydropyranyl), or acyl groups (e.g. acetyl, benzoyl)), or an inverted nucleobase.
The measured intensity of a first and second signal as described above (i.e. steps (a) and (b)) can be used to calculate the amount of terminated first (or second) sequencing primer to be added. This stage is illustrated in Figure 15C.
Accordingly, in one embodiment, the method comprises applying a calculated amount of first (or second) terminated primer. Any amount of terminated sequencing primer that modulates the signal intensity ratio towards 2:1 can be used.
In one embodiment, the calculated amount of terminated primer is determined by inputting the first and second measured signal intensities from steps (a) and (b) into Equation 2:
$$\frac{tR1_{p}}{R1_{p}} = \frac{X \cdot R1_{si} - R2_{si}}{R2_{si}} \qquad \text{(Equation 2)}$$
wherein:
X is the desired ratio of signal intensities; in one embodiment X may be 2, so that tR1p/R1p = (2 × R1si − R2si)/R2si;
tR1p is the terminated R1 primer;
R1p is the Read 1 sequencing primer (non-terminated);
R1si is a first signal intensity generated from a first nucleotide sequence; and
R2si is a second signal intensity generated from a second nucleotide sequence.
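By way of illustration only, Equation 2 can be evaluated as in the following minimal Python sketch. The function names, the clamping of negative results to zero, and the model in which the read 1 signal scales with the non-terminated fraction of primer are assumptions made for the example and are not part of the described method.

```python
def terminated_primer_ratio(r1_si: float, r2_si: float, x: float = 2.0) -> float:
    """Evaluate Equation 2: tR1p / R1p = (X * R1si - R2si) / R2si."""
    if r1_si <= 0 or r2_si <= 0:
        raise ValueError("signal intensities must be positive")
    ratio = (x * r1_si - r2_si) / r2_si
    # Assumption: a negative result means the read 1 signal is already low
    # enough, so no terminated primer is added.
    return max(ratio, 0.0)


def attenuated_r1_signal(r1_si: float, ratio: float) -> float:
    """Assumed model: signal scales with the non-terminated fraction 1 / (1 + ratio)."""
    return r1_si / (1.0 + ratio)


# Hypothetical measured intensities from steps (a) and (b), arbitrary units.
r1_si, r2_si = 1200.0, 800.0
ratio = terminated_primer_ratio(r1_si, r2_si)
adjusted = attenuated_r1_signal(r1_si, ratio)
print(f"tR1p/R1p = {ratio}, second:first after attenuation = {r2_si / adjusted}")
```

With these hypothetical inputs, two parts terminated primer per part non-terminated primer would be applied, and under the assumed model the second:first intensity ratio becomes 2:1.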
Once the amount of first (or second) terminated sequencing primers has been determined that leads to the optimal 2:1 signal of second:first reads, the first and second polynucleotide strands can be sequenced concurrently.
The first sequencing read may comprise the binding of a mixture of terminated and non-terminated first sequencing primers (also known as read 1 sequencing primers) at the ratio calculated above to the first sequencing primer binding site (e.g. first terminal sequencing primer binding site 303). The second sequencing read may comprise binding second sequencing primers to the second sequencing primer binding site (e.g. second terminal sequencing primer binding site 304).
This leads to concurrent sequencing of the first sequence (e.g. forward strand of the template) and the second sequence (e.g. reverse strand of the template).
Many benefits are afforded by improving the SNR of the system. Firstly, sequencing errors are reduced because concurrently received signals can be better distinguished by 16 QaM analysis. This is illustrated in Figures 11B and 12. Secondly, improving the SNR improves spatially unresolved imaging of clusters, which allows smaller pitch and nanowell dimensions to be used. Thirdly, improved SNR allows imaging technologies with lower numerical aperture to be used, allowing for faster imaging of larger areas. This in turn increases the throughput of sequencing reads.
Where the sequences to be identified comprise one or more index sequences, the index is typically read separately from read 1 and read 2, either before read 1 and before read 2, or afterwards. If afterwards, the extended sequencing primer is denatured and washed off the flowcell, and an index primer is hybridised for the several cycles of the index read.
Concurrent sequencing of concatenated polynucleotides
The present invention also pertains to methods of preparing a polynucleotide strand or strands for identification where the strand comprises two portions to be identified (in other words, a concatenated polynucleotide sequence comprising a first portion and a second portion), such that such portions can be identified concurrently. This may be achieved by altering the ratio of the different portions which are capable of emitting a signal, which in turn means that during sequencing the signal from the first portion will be greater than the signal from the second portion. It is this difference in the intensity of the first and second signals that allows for the two portions to be identified simultaneously.
The first portions and second portions may be different polynucleotide sequences. That is, the sequences may be genetically unrelated and/or derived from different sources.
Alternatively, the first portions and second portions may be genetically related.
Where a single (concatenated) polynucleotide strand has a first portion and a second portion, the single (concatenated) polynucleotide strand may comprise a first sequencing primer binding site and a second sequencing primer binding site (used to sequence the first and second portions respectively), where the first sequencing primer binding site and the second sequencing primer binding site are of a different sequence to each other and bind different sequencing primers.
In one aspect of the invention there is provided a method of preparing a polynucleotide sequence comprising a first portion and a second portion for concurrent sequencing, wherein the method comprises: measuring a first signal intensity generated from a first portion of the polynucleotide sequence, and measuring a second signal intensity generated from a second portion of the polynucleotide sequence, and based on the first signal intensity and the second signal intensity, a subsequent signal intensity generated from sequencing the first nucleotide strand can be or is attenuated, wherein the attenuated subsequent signal intensity generated from sequencing the first polynucleotide strand has a lower intensity than a subsequent signal intensity generated from sequencing the second polynucleotide strand.
As described above, the measured intensities of the first and second signals (i.e. steps (a) and (b)) can be used to calculate the amount of terminated first (or second) sequencing primer to be added. Accordingly, in one embodiment, the method comprises applying a calculated amount of first (or second) terminated primer. Any amount of terminated sequencing primer that modulates the signal intensity ratio towards 2:1 or around 2:1 can be used.
Data analysis using 16 QaM
Figure 9 is a scatter plot showing an example of sixteen distributions of signals generated by polynucleotide sequences disclosed herein. By creating different intensities of signals between the two clusters their identities can be resolved, despite them being spatially unresolved, i.e. within the same CMOS pixel.
We can use 16 Quadrature Modulation (QaM) to concurrently read signals. The scatter plot of Figure 9 shows sixteen distributions (or bins) of intensity values from the combination of a brighter signal (i.e. a first signal as described herein) and a dimmer signal (i.e. a second signal as described herein); the two signals may be co-localized and may not be optically resolved as described above. This is shown in Figure 9. The intensity values shown in Figure 9 may be up to a scale or normalisation factor; the units of the intensity values may be arbitrary or relative (i.e., representing the ratio of the actual intensity to a reference intensity). The sum of the brighter signal generated by the first portions and the dimmer signal generated by the second portions results in a combined signal. The combined signal may be captured by a first optical channel and a second optical channel. Since the brighter signal may be A, T, C or G, and the dimmer signal may be A, T, C or G, there are sixteen possibilities for the combined signal, corresponding to sixteen distinguishable patterns when optically captured. That is, each of the sixteen possibilities corresponds to a bin shown in Figure 9. The computer system can map the combined signal generated into one of the sixteen bins, and thus determine the added nucleobase at the first portion and the added nucleobase at the second portion, respectively.
For example, when the combined signal is mapped to bin 1612 for a base calling cycle, the computer processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as C. When the combined signal is mapped to bin 1614 for the base calling cycle, the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as T. When the combined signal is mapped to bin 1616 for the base calling cycle, the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as G. When the combined signal is mapped to bin 1618 for the base calling cycle, the processor base calls the added nucleobase at the first portion as C and the added nucleobase at the second portion as A.
When the combined signal is mapped to bin 1622 for the base calling cycle, the processor base calls the added nucleobase at the first portion as T and the added nucleobase at the second portion as C. When the combined signal is mapped to bin 1624 for the base calling cycle, the processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as T. When the combined signal is mapped to bin 1626 for the base calling cycle, the processor base calls the added nucleobase at the first portion as T and the added nucleobase at the second portion as G. When the combined signal is mapped to bin 1628 for the base calling cycle, the processor base calls the added nucleobase at the first portion as T and the added nucleobase at the second portion as A.
When the combined signal is mapped to bin 1632 for the base calling cycle, the processor base calls the added nucleobase at the first portion as G and the added nucleobase at the second portion as C. When the combined signal is mapped to bin 1634 for the base calling cycle, the processor base calls the added nucleobase at the first portion as G and the added nucleobase at the second portion as T. When the combined signal is mapped to bin 1636 for the base calling cycle, the processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as G. When the combined signal is mapped to bin 1638 for the base calling cycle, the processor base calls the added nucleobase at the first portion as G and the added nucleobase at the second portion as A.
When the combined signal is mapped to bin 1642 for the base calling cycle, the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as C. When the combined signal is mapped to bin 1644 for the base calling cycle, the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as T. When the combined signal is mapped to bin 1646 for the base calling cycle, the processor base calls the added nucleobase at the first portion as A and the added nucleobase at the second portion as G. When the combined signal is mapped to bin 1648 for the base calling cycle, the processor base calls both the added nucleobase at the first portion and the added nucleobase at the second portion as A.
In this particular example, T is configured to emit a signal in both the IMAGE 1 channel and the IMAGE 2 channel, A is configured to emit a signal in the IMAGE 1 channel only, C is configured to emit a signal in the IMAGE 2 channel only, and G does not emit a signal in either channel. However, different permutations of nucleobases can be used to achieve the same effect by performing dye swaps. For example, A may be configured to emit a signal in both the IMAGE 1 channel and the IMAGE 2 channel, T may be configured to emit a signal in the IMAGE 1 channel only, C may be configured to emit a signal in the IMAGE 2 channel only, and G may be configured to not emit a signal in either channel.
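As a non-limiting illustration of why sixteen distinguishable bins arise, the following Python sketch computes idealised, noise-free combined intensities for every pairing of nucleobases under the dye configuration of this particular example (T in both channels, A in IMAGE 1 only, C in IMAGE 2 only, G dark). The relative intensities of 2 for the brighter signal and 1 for the dimmer signal, and all names in the code, are assumptions chosen for the example.

```python
from itertools import product

# (IMAGE 1, IMAGE 2) emission for each nucleobase in this particular example.
DYE = {"T": (1, 1), "A": (1, 0), "C": (0, 1), "G": (0, 0)}


def expected_bins(bright: float = 2.0, dim: float = 1.0) -> dict:
    """Map (first-portion base, second-portion base) to idealised (IMAGE 1, IMAGE 2)."""
    bins = {}
    for first, second in product("ACGT", repeat=2):
        image1 = bright * DYE[first][0] + dim * DYE[second][0]
        image2 = bright * DYE[first][1] + dim * DYE[second][1]
        bins[(first, second)] = (image1, image2)
    return bins


unequal = expected_bins(bright=2.0, dim=1.0)
equal = expected_bins(bright=1.0, dim=1.0)
print(len(set(unequal.values())), "distinct intensity pairs at 2:1")
print(len(set(equal.values())), "distinct intensity pairs at 1:1")
```

In this idealised model a 2:1 intensity ratio gives sixteen distinct intensity pairs, whereas equal intensities collapse several combinations onto the same pair, which is why a difference in intensity between the two signals is needed to resolve both bases.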
Further details regarding performing base-calling based on a scatter plot having sixteen bins may be found in U.S. Patent Application Publication No. 2019/0212294, the disclosure of which is incorporated herein by reference.
Figure 10 is a flow diagram showing a method 1700 of base calling according to the present disclosure. The described method allows for simultaneous sequencing of two (or more) portions (e.g. the first portion and the second portion) in a single sequencing run from a single combined signal obtained from the first portion and the second portion, thus requiring less sequencing reagent consumption and faster generation of data from both the first portion and the second portion. Further, the simplified method may reduce the number of workflow steps while producing the same yield as compared to existing next-generation sequencing methods. Thus, the simplified method may result in reduced sequencing runtime.
As shown in Figure 10, the disclosed method 1700 may start from block 1701. The method may then move to block 1710.
At block 1710, intensity data is obtained. The intensity data includes first intensity data and second intensity data. The first intensity data comprises a combined intensity of a first signal component obtained based upon a respective first nucleobase of the first portion and a second signal component obtained based upon a respective second nucleobase of the second portion. Similarly, the second intensity data comprises a combined intensity of a third signal component obtained based upon the respective first nucleobase of the first portion and a fourth signal component obtained based upon the respective second nucleobase of the second portion. As such, the first portion is capable of generating a first signal comprising a first signal component and a third signal component. The second portion is capable of generating a second signal comprising a second signal component and a fourth signal component.
As described above, the first portion and the second portion may be arranged on the solid support such that signals from the first portion and the second portion are detected by a single sensing portion and/or may comprise a single cluster such that first signals and second signals from each of the respective first portions and second portions cannot be spatially resolved.
In one example, obtaining the intensity data comprises selecting intensity data that corresponds to two (or more) different portions (e.g. the first portion and the second portion). In one example, intensity data is selected based upon a chastity score. A chastity score may be calculated as the ratio of the brightest base intensity divided by the sum of the brightest and second brightest base intensities. The desired chastity score may be different depending upon the expected intensity ratio of the light emissions associated with the different portions. As described above, it may be desired to produce clusters comprising the first portion and the second portion, which give rise to signals in a ratio of 2:1. In one example, high-quality data corresponding to two portions with an intensity ratio of 2:1 may have a chastity score of around 0.8 to 0.9.
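The chastity calculation described above can be sketched in Python as follows. This is an illustrative example only; the function names and the default selection window of 0.8 to 0.9 (taken from the 2:1 example above) are assumptions for the sketch, and a real pipeline may define per-base intensities differently.

```python
def chastity(base_intensities):
    """Brightest base intensity divided by the sum of the two brightest."""
    top_two = sorted(base_intensities, reverse=True)[:2]
    if len(top_two) < 2:
        raise ValueError("at least two base intensities are required")
    brightest, second_brightest = top_two
    total = brightest + second_brightest
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    return brightest / total


def select_by_chastity(clusters, low=0.8, high=0.9):
    """Keep clusters whose chastity falls inside an assumed window [low, high]."""
    return [c for c in clusters if low <= chastity(c) <= high]


# Hypothetical per-base intensities (arbitrary units) for three clusters.
clusters = [
    [850.0, 150.0, 20.0, 10.0],
    [500.0, 480.0, 15.0, 5.0],
    [990.0, 10.0, 5.0, 2.0],
]
print([round(chastity(c), 2) for c in clusters])
print(select_by_chastity(clusters))
```

Only the first hypothetical cluster falls within the assumed window; the second has two bases of nearly equal brightness and the third is dominated by a single base.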
After the intensity data has been obtained, the method may proceed to block 1720. In this step, one of a plurality of classifications is selected based on the intensity data. Each classification represents a possible combination of respective first and second nucleobases. In one example, the plurality of classifications comprises sixteen classifications as shown in Figure 9, each representing a unique combination of first and second nucleobases. Where there are two portions, there are sixteen possible combinations of first and second nucleobases. Selecting the classification based on the first and second intensity data comprises selecting the classification based on the combined intensity of the first and second signal components and the combined intensity of the third and fourth signal components.
The method may then proceed to block 1730, where the respective first and second nucleobases are base called based on the classification selected in block 1720. The signals generated during a cycle of a sequencing are indicative of the identity of the nucleobase(s) added during sequencing (e.g. using sequencing-by-synthesis). It will be appreciated that there is a direct correspondence between the identity of the nucleobases that are incorporated and the identity of the complementary base at the corresponding position of the template sequence bound to the solid support. Therefore, any references herein to the base calling of respective nucleobases at the two portions encompasses the base calling of nucleobases hybridised to the template sequences and, alternatively or additionally, the identification of the corresponding nucleobases of the template sequences. The method may then end at block 1740.
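Blocks 1710 to 1730 can be illustrated with the following minimal Python sketch. It assumes idealised bin centres computed from the dye configuration of the example above with a 2:1 intensity ratio, and it assigns each combined signal to the nearest bin centre by Euclidean distance. The nearest-centre rule and all names are assumptions for illustration; the present disclosure does not prescribe a particular classifier.

```python
import math
from itertools import product

# (IMAGE 1, IMAGE 2) emission per nucleobase, as in the example above.
DYE = {"T": (1, 1), "A": (1, 0), "C": (0, 1), "G": (0, 0)}


def build_classifications(bright=2.0, dim=1.0):
    """Sixteen classifications: (first base, second base) -> idealised bin centre."""
    return {
        (first, second): (
            bright * DYE[first][0] + dim * DYE[second][0],
            bright * DYE[first][1] + dim * DYE[second][1],
        )
        for first, second in product("ACGT", repeat=2)
    }


def base_call(first_intensity, second_intensity, classifications):
    """Select the classification (block 1720) and return both base calls (block 1730)."""
    observed = (first_intensity, second_intensity)
    return min(
        classifications,
        key=lambda pair: math.dist(observed, classifications[pair]),
    )


classifications = build_classifications()
# Hypothetical normalised intensity data for one cycle (block 1710).
first_base, second_base = base_call(0.9, 3.1, classifications)
print(first_base, second_base)
```

Here the first intensity data and the second intensity data are the combined intensities in the two optical channels, and the selected classification yields the base call for the first portion and the second portion together.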
Selective processing methods - solid supports and methods of preparing polynucleotide sequences for identification
The present invention is directed to a solid support that enables selective processing and methods of preparing polynucleotide sequences for identification using the solid support. The solid support comprises first immobilised primers and second immobilised primers, wherein a proportion of first immobilised primers that are cleavable is different from a proportion of second immobilised primers that are cleavable. Advantageously, once first polynucleotide sequences are extended from the first immobilised primers and second polynucleotide sequences are extended from the second immobilised primers, this difference in proportions that are cleavable in the first immobilised primers and the second immobilised primers provides a way of selectively processing the first polynucleotide sequences relative to the second polynucleotide sequences. This leads to a difference in intensity of signals that can be produced by the first polynucleotide sequences and the second polynucleotide sequences (e.g. a difference in intensity of signals that can be produced by a first portion in the first polynucleotide sequence and a second portion in the second polynucleotide sequence), which then allows both the first polynucleotide sequence and the second polynucleotide sequence to be identified simultaneously. In turn, this enables at least a doubling of the throughput of a sequencing reaction (i.e. increased sequencing efficiency) as well as a decrease in the time taken to sequence target polynucleotide strands. Furthermore, designing this feature into the solid support simplifies subsequent amplification and sequencing steps, as it is possible to rely on normal amplification and sequencing methods (such as those described above), rather than specifically designed amplification or sequencing methods that allow selective processing.
Accordingly, we describe a solid support, comprising: a plurality of first immobilised primers, wherein a proportion of the first immobilised primers are configured to be cleavable under first cleavage conditions; a plurality of second immobilised primers, wherein a proportion or substantially all of the second immobilised primers are configured to be cleavable under second cleavage conditions; and wherein the proportion of the first immobilised primers configured to be cleavable under first cleavage conditions is less than the proportion of the second immobilised primers configured to be cleavable under second cleavage conditions.
In one embodiment, the second immobilised primer is different in sequence to the first immobilised primer.
The remaining population of first immobilised primers (i.e. other than the first immobilised primers that are configured to be cleavable under first cleavage conditions) are not cleavable under the first cleavage conditions. As such, only some of the total population of first immobilised primers will become cleaved when the solid support is exposed to the first cleavage conditions.
In addition, the remaining population of second immobilised primers (i.e. other than the second immobilised primers that are configured to be cleavable under second cleavage conditions), if applicable, are not cleavable under the second cleavage conditions. When substantially all of the second immobilised primers are configured to be cleavable under the second cleavage conditions, then there is no (or substantially no) remaining population of second immobilised primers. As such, some or substantially all of the total population of second immobilised primers will become cleaved when the solid support is exposed to the second cleavage conditions.
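The relationship between the cleavable proportions can be illustrated with the following short Python sketch. The class name, field names and example proportions are hypothetical; the sketch simply checks that the cleavable proportion of first immobilised primers is less than that of the second immobilised primers and reports the fraction of each population that would remain uncleaved, assuming every cleavable primer is cleaved on exposure to its cleavage conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImmobilisedPrimerMix:
    first_cleavable: float   # proportion of first immobilised primers that are cleavable
    second_cleavable: float  # proportion of second immobilised primers that are cleavable

    def __post_init__(self):
        for value in (self.first_cleavable, self.second_cleavable):
            if not 0.0 <= value <= 1.0:
                raise ValueError("proportions must lie between 0 and 1")
        if not self.first_cleavable < self.second_cleavable:
            raise ValueError(
                "first cleavable proportion must be less than the second"
            )

    def uncleaved_fractions(self):
        """Fractions of (first, second) primers left intact after cleavage,
        assuming complete cleavage of every cleavable primer."""
        return 1.0 - self.first_cleavable, 1.0 - self.second_cleavable


# Hypothetical support: half of the first primers and all second primers cleavable.
mix = ImmobilisedPrimerMix(first_cleavable=0.5, second_cleavable=1.0)
print(mix.uncleaved_fractions())
```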
The location at which the first immobilised primer is configured to be cleavable under first cleavage conditions may also be referred to as a first cleavage site. As such, the first immobilised primer comprises a first cleavage site. The first cleavage site may comprise a cleavable covalent bond. In some cases, when the first cleavage site is nicked, this allows sequencing to occur starting from the nick location (e.g. in conjunction with a strand displacement polymerase). In some cases, when the first cleavage site is nicked, this allows linearisation to occur since one side of a “bridge” formed during bridge amplification can be detached from a solid support. In such linearisation cases, the first cleavage site may also be referred to as a first linearisation site (or that the first immobilised primer is configured to be linearisable under first linearisation conditions). The location at which the second immobilised primer is configured to be cleavable under second cleavage conditions may also be referred to as a second cleavage site. As such, the second immobilised primer comprises a second cleavage site. The second cleavage site may comprise a cleavable covalent bond. In some cases, when the second cleavage site is nicked, this allows sequencing to occur starting from the nick location (e.g. in conjunction with a strand displacement polymerase). In some cases, when the second cleavage site is nicked, this allows linearisation to occur since one side of a “bridge” formed during bridge amplification can be detached from a solid support. In such linearisation cases, the second cleavage site may also be referred to as a second linearisation site (or that the second immobilised primer is configured to be linearisable under second linearisation conditions).
As used herein, the term “cleavable covalent bond” refers to a covalent bond that can be cleaved, for example, under the application of heat, light or other (bio)chemical methods (e.g. by exposure to a degradation agent, such as an enzyme or a catalyst), while a “non-cleavable covalent bond” is stable to degradation under such conditions. Non-limiting examples of cleavable covalent bonds include thermally or photolytically cleavable cycloadducts (e.g. furan-maleimide cycloadducts), alkenylene linkages, esters, amides, acetals, hemiaminal ethers, aminals, imines, hydrazones, 1,2-diol linkages (e.g. glycols cleavable by periodates), polysulfide linkages (e.g. disulfide linkages), boron-based linkages (e.g. boronic and borinic acids/esters), silicon-based linkages (e.g. silyl ether, siloxane), and phosphorus-based linkages (e.g. phosphite, phosphate).
In one embodiment, the cleavable covalent bond may comprise a phosphate linkage. In a further embodiment, the phosphate linkage may be a phosphate linkage located at a 5’-end or a 3’-end of a nucleotide comprising a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the first immobilised primer and/or second immobilised primer is a DNA sequence; or wherein the first immobilised primer comprises a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the first immobilised primer and/or second immobilised primer is an RNA sequence. In other words, the phosphate linkage is located at a 5’-end or a 3’-end of a nucleotide comprising an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or an RNA sequence). By using the native phosphate linkages in a DNA/RNA sequence as the cleavable covalent bond, it is possible to avoid the use of sequencing primers during sequencing, as it is possible to use the “nicked” location as the start of a sequencing run. If the “nicked” location still comprises a 3’-phosphate group after cleavage, the 3’-phosphate group may be removed using a kinase (e.g. T4 kinase) to allow extension to occur at the “nicked” location.
In one embodiment, the solid support may comprise at least one well, and wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within the well. In a further embodiment, the solid support may comprise a plurality of wells, wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within each of the wells. Since each well is configured such that the proportion of first immobilised primers cleavable under first cleavage conditions is less than the proportion of second immobilised primers cleavable under second cleavage conditions, this provides a way of selectively processing all of these wells, thus enabling concurrent sequencing across all of the wells.
The plurality of first immobilised primers and second immobilised primers may be located only within the at least one well (or plurality of wells). In other words, a region outside the well (or plurality of wells) may not comprise the first immobilised primers and second immobilised primers.
As used herein, the term “first cleavage conditions” refers to reaction conditions that cause cleavage within the first immobilised primer (i.e. at the first cleavage site). The first cleavage conditions may involve exposure to a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate). In some cases, the “first cleavage conditions” may allow linearisation to occur, and may be referred to as “first linearisation conditions”.
Accordingly, in an embodiment, the proportion of first immobilised primers may be configured to be cleavable by a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate).
In an embodiment, the proportion of first immobilised primers may be configured to be cleavable by a glycosylase. In other words, the first cleavage conditions involve exposure to a glycosylase. In a further embodiment, the proportion of first immobilised primers may be configured to be cleavable by a glycosylase that recognises any nitrogenous base (e.g. purine or pyrimidine) which is not selected from guanine (G), cytosine (C), adenine (A) and thymine (T) when the first immobilised primer is a DNA sequence; or the proportion of first immobilised primers may be configured to be cleavable by a glycosylase that recognises any nitrogenous base (e.g. purine or pyrimidine) which is not selected from guanine (G), cytosine (C), adenine (A) and uracil (U) when the first immobilised primer is an RNA sequence. In other words, the glycosylase may recognise an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or an RNA sequence). Examples of unnatural nucleobases may include oxoguanine (e.g. 8-oxoguanine), hypoxanthine, xanthine, methylguanines (e.g. O6-methylguanine, N7-methylguanine), methyladenines (e.g. 3-methyladenine, N6-methyladenine), modified cytosines including methylcytosines (e.g. 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine), dihydrouracil, inosine, and uracil (if the first immobilised primer is a DNA sequence).
In an embodiment, the proportion of first immobilised primers may be configured to be cleavable by a uracil glycosylase (when the first immobilised primer is a DNA sequence) or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase); and in a further embodiment, an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
In an embodiment, each first immobilised primer that is cleavable may comprise a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the first immobilised primer is a DNA sequence, or wherein each first immobilised primer that is cleavable may comprise a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the first immobilised primer is an RNA sequence. In other words, each first immobilised primer that is cleavable may comprise an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or an RNA sequence). As mentioned above, examples of unnatural nucleobases may include oxoguanine (e.g. 8-oxoguanine), hypoxanthine, xanthine, methylguanines (e.g. O6-methylguanine, N7-methylguanine), methyladenines (e.g. 3-methyladenine, N6-methyladenine), modified cytosines including methylcytosines (e.g. 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine), dihydrouracil, inosine, and uracil (if the first immobilised primer is a DNA sequence).
In a further embodiment, each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) or uracil when the first immobilised primer is a DNA sequence, or wherein each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is an RNA sequence; and in an even further embodiment, wherein each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence.
As used herein, the term “second cleavage conditions” refers to reaction conditions that cause cleavage within the second immobilised primer (i.e. at the second cleavage site). The second cleavage conditions may involve exposure to a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate). In some cases, the “second cleavage conditions” may allow linearisation to occur, and may be referred to as “second linearisation conditions”.
Accordingly, in an embodiment, the proportion of second immobilised primers may be configured to be cleavable by a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate).
In an embodiment, the proportion of second immobilised primers may be configured to be cleavable by a glycosylase. In other words, the second cleavage conditions involve exposure to a glycosylase. In a further embodiment, the proportion of second immobilised primers may be configured to be cleavable by a glycosylase that recognises any nitrogenous base (e.g. purine or pyrimidine) which is not selected from guanine (G), cytosine (C), adenine (A) and thymine (T) when the second immobilised primer is a DNA sequence; or the proportion of second immobilised primers may be configured to be cleavable by a glycosylase that recognises any nitrogenous base (e.g. purine or pyrimidine) which is not selected from guanine (G), cytosine (C), adenine (A) and uracil (U) when the second immobilised primer is an RNA sequence. In other words, the glycosylase may recognise an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or RNA sequence). Examples of unnatural nucleobases may include oxoguanine (e.g. 8-oxoguanine), hypoxanthine, xanthine, methylguanines (e.g. O6-methylguanine, N7-methylguanine), methyladenines (e.g. 3-methyladenine, N6-methyladenine), modified cytosines including methylcytosines (e.g. 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine), dihydrouracil, inosine, and uracil (if the second immobilised primer is a DNA sequence).
In an embodiment, the proportion of second immobilised primers may be configured to be cleavable by a uracil glycosylase (when the second immobilised primer is a DNA sequence) or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase); and in a further embodiment, a uracil glycosylase (when the second immobilised primer is a DNA sequence).
In an embodiment, each second immobilised primer that is cleavable may comprise a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the second immobilised primer is a DNA sequence, or wherein each second immobilised primer that is cleavable may comprise a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the second immobilised primer is an RNA sequence. In other words, each second immobilised primer that is cleavable may comprise an unnatural nucleobase (i.e. one which is not usually present in a typical DNA sequence or RNA sequence). As mentioned above, examples of unnatural nucleobases may include oxoguanine (e.g. 8-oxoguanine), hypoxanthine, xanthine, methylguanines (e.g. O6-methylguanine, N7-methylguanine), methyladenines (e.g. 3-methyladenine, N6-methyladenine), modified cytosines including methylcytosines (e.g. 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine, 5-carboxylcytosine), dihydrouracil, inosine, and uracil (if the second immobilised primer is a DNA sequence).
In a further embodiment, each second immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) or uracil when the second immobilised primer is a DNA sequence, or wherein each second immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the second immobilised primer is an RNA sequence; and in an even further embodiment, wherein each second immobilised primer that is cleavable may comprise uracil when the second immobilised primer is a DNA sequence.
In an embodiment, each first immobilised primer that is cleavable may be configured to be cleavable by an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase) when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable may be configured to be cleavable by a uracil glycosylase when the second immobilised primer is a DNA sequence; or wherein each first immobilised primer that is cleavable may be configured to be cleavable by a uracil glycosylase when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable may be configured to be cleavable by an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase) when the second immobilised primer is a DNA sequence.
In an embodiment, each first immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable may comprise uracil when the second immobilised primer is a DNA sequence; or wherein each first immobilised primer that is cleavable may comprise uracil when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable may comprise oxoguanine (e.g. 8-oxoguanine) when the second immobilised primer is a DNA sequence.
As used herein, the term “glycosylase” may refer to an enzyme which catalyses the removal of a nitrogenous base from one of the nucleotides in a (poly)nucleotide chain by breaking an N-glycosidic bond, resulting in the formation of an apurinic/apyrimidinic site (AP site). For DNA chains, the glycosylase may recognise any nitrogenous base (e.g. purine or pyrimidine) which is not selected from guanine (G), cytosine (C), adenine (A) and thymine (T); for RNA chains, the glycosylase may recognise any nitrogenous base (e.g. purine or pyrimidine) which is not selected from guanine (G), cytosine (C), adenine (A) and uracil (U). Examples of typical nitrogenous bases recognised by glycosylases include oxoguanine (e.g. 8-oxoguanine), uracil, inosine and alkylpurines.
Glycosylases may be monofunctional, such that they only possess glycosylase activity (i.e. breaking of the N-glycosidic bond); cleavage of a phosphodiester bond in the sugar-phosphate backbone may then occur in an uncatalysed manner by elimination. Other glycosylases may be bifunctional, such that they also possess AP lyase activity by catalysing cleavage of the phosphodiester bond of the (poly)nucleotide chain. In an embodiment, the glycosylase is bifunctional (i.e. possesses both glycosylase and AP lyase activity).
In an embodiment, the first cleavage conditions and the second cleavage conditions may be the same or may be different. In some embodiments, the first cleavage conditions and the second cleavage conditions may be the same. This allows the cleavage within the first immobilised primers and the second immobilised primers to occur using a single exposure, which reduces the number of steps required for preparing first polynucleotide sequences and second polynucleotide sequences for concurrent sequencing. In alternative embodiments, the first cleavage conditions and the second cleavage conditions may be different. This allows control of which of the first immobilised primer and/or the second immobilised primer become cleaved, should this be necessary during the preparation processes.
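The distinction between shared and different cleavage conditions can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the `Primer` class, the `expose` function and the trigger labels are hypothetical names introduced here, and each primer is treated as either responding to a single trigger or being non-cleavable.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Primer:
    """Hypothetical model of an immobilised primer (illustration only)."""
    kind: str               # "first" or "second"
    trigger: Optional[str]  # trigger that cleaves the primer; None if non-cleavable


def expose(primers: List[Primer], trigger: str) -> Tuple[List[Primer], List[Primer]]:
    """Return (cleaved, intact) primers after a single exposure to `trigger`."""
    cleaved = [p for p in primers if p.trigger == trigger]
    intact = [p for p in primers if p.trigger != trigger]
    return cleaved, intact


# Different conditions: each exposure cleaves only one primer type.
different = [
    Primer("first", "oxoguanine glycosylase"),
    Primer("first", None),
    Primer("second", "uracil glycosylase"),
]
cleaved_diff, intact_diff = expose(different, "uracil glycosylase")

# Same conditions: a single exposure cleaves both primer types.
same = [
    Primer("first", "uracil glycosylase"),
    Primer("first", None),
    Primer("second", "uracil glycosylase"),
]
cleaved_same, intact_same = expose(same, "uracil glycosylase")
```

With different triggers, only the second immobilised primer is cleaved by the uracil glycosylase exposure; with a shared trigger, one exposure cleaves the cleavable primers of both types, while the non-cleavable first immobilised primer stays intact in both cases.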
In an embodiment, a ratio between the first immobilised primers configured to be cleavable under first cleavage conditions and the second immobilised primers configured to be cleavable under second cleavage conditions may be between 1:1.25 to 1:5; in a further embodiment, between 1:1.5 to 1:3; and in an even further embodiment, about 1:2. These ratios provide successively better distribution of clouds when analysing the concurrent sequencing results.
In an embodiment, the proportion of first immobilised primers cleavable under first cleavage conditions relative to a proportion of first immobilised primers which are not cleavable under first cleavage conditions may be between 20:80 to 80:20; in a further embodiment, between 1:3 to 3:1; in an even further embodiment, between 1:2 to 2:1; and in a yet even further embodiment, about 1:1.
In an embodiment, the proportion of first immobilised primers cleavable under first cleavage conditions relative to a total population of first immobilised primers may be between 0.2 to 0.8; in a further embodiment, between 0.25 to 0.75; in an even further embodiment, between ⅓ to ⅔; in a yet even further embodiment, about 0.5. In particular, the proportion of first immobilised primers cleavable under first cleavage conditions relative to a total population of first immobilised primers may be between 0.2 to 0.8, between 0.25 to 0.75 (in a further embodiment), between ⅓ to ⅔ (in an even further embodiment), or about 0.5 (in a yet even further embodiment); whilst respective proportions of first immobilised primers which are not cleavable under first cleavage conditions relative to a total population of first immobilised primers may be between 0.8 to 0.2, between 0.75 to 0.25 (in the further embodiment), between ⅔ to ⅓ (in the even further embodiment), or about 0.5 (in the yet even further embodiment) (wherein the proportion of first immobilised primers cleavable under first cleavage conditions and the proportion of first immobilised primers which are not cleavable under first cleavage conditions sums to 1).
In an embodiment, the proportion of second immobilised primers cleavable under second cleavage conditions relative to a total population of second immobilised primers may be 0.9 or more; in a further embodiment, 0.95 or more; in an even further embodiment, substantially all of the second immobilised primers are cleavable under second cleavage conditions. In particular, the proportion of second immobilised primers cleavable under second cleavage conditions relative to a total population of second immobilised primers may be 0.9 or more, 0.95 or more (in a further embodiment), or substantially all of the second immobilised primers are cleavable under second cleavage conditions (in an even further embodiment), whilst respective proportions of second immobilised primers which are not cleavable under second cleavage conditions relative to a total population of second immobilised primers may be 0.1 or less, 0.05 or less (in the further embodiment), or substantially none of the second immobilised primers are not cleavable under second cleavage conditions (in the even further embodiment).
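The cleavable to non-cleavable ratios and the fractions of the total population given above describe the same ranges in two notations. The following sketch shows the arithmetic; the function names are hypothetical, and the second function additionally assumes equal total numbers of first and second immobilised primers by default, which is an illustrative assumption rather than a stated requirement.

```python
from fractions import Fraction


def cleavable_fraction(cleavable: int, non_cleavable: int) -> Fraction:
    """Convert a cleavable:non-cleavable ratio into a fraction of the total population."""
    return Fraction(cleavable, cleavable + non_cleavable)


def second_per_first(first_fraction: Fraction, second_fraction: Fraction,
                     first_total: int = 1, second_total: int = 1) -> Fraction:
    """Number of cleavable second primers per cleavable first primer.

    Assumption (illustrative): equal total primer populations unless
    `first_total` and `second_total` are supplied.
    """
    return (second_fraction * second_total) / (first_fraction * first_total)


# Ratios expressed as fractions of the total first primer population.
low = cleavable_fraction(20, 80)      # 20:80 corresponds to 1/5 (0.2)
quarter = cleavable_fraction(1, 3)    # 1:3 corresponds to 1/4 (0.25)
third = cleavable_fraction(1, 2)      # 1:2 corresponds to 1/3
half = cleavable_fraction(1, 1)       # 1:1 corresponds to 1/2 (0.5)

# Half of the first primers cleavable, all second primers cleavable,
# equal populations: one cleavable first primer per two cleavable second primers.
ratio = second_per_first(half, Fraction(1))
```

Under the equal-population assumption, a first-primer fraction of about 0.5 combined with substantially all second primers being cleavable gives a ratio of about 1:2 between cleavable first and cleavable second immobilised primers.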
In an embodiment, each first immobilised primer may comprise a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof; and each second immobilised primer may comprise a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; or wherein each first immobilised primer may comprise a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; and each second immobilised primer may comprise a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof.
In an embodiment, a surface of the solid support may comprise at least one first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group; in a further embodiment, non-covalent interactions or covalent bonds. In an even further embodiment, the surface of the solid support may comprise a plurality of first linking groups.
In an additional or alternative embodiment, the first immobilised primer and/or the second immobilised primer may comprise a first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group; in a further embodiment, covalent bonds.
The first linking groups (whether part of the surface of the solid support, the first immobilised primer, and/or the second immobilised primer) are advantageous because these can form cross-links with the template strands, thus allowing the template strands to be fixed to the surface of the solid support and/or the immobilised primers, and preventing them from becoming washed away (e.g. during template melting). Although not strictly necessary, this may be useful during amplification, clustering or sequencing of the template, particularly when the first immobilised primer and/or the second immobilised primer is cleaved on exposure to the first cleavage conditions and/or second cleavage conditions. Where at least one well (or a plurality of wells) is/are present on the solid support and the at least one well (or plurality of wells) comprise the first immobilised primers and the second immobilised primers, the first linking groups may be located within the well (or plurality of wells).
The first linking groups may be located only within the well (or plurality of wells). In other words, a region outside the well (or plurality of wells) may not comprise the first linking groups. In particular, a region outside the well (or plurality of wells) may not comprise the first linking groups, as well as not comprising the first immobilised primers and the second immobilised primers.
In some embodiments, the first linking groups may be capable of forming non-covalent interactions. These non-covalent interactions may include one or more of ionic bonds, hydrogen bonds, hydrophobic interactions, π-π interactions, van der Waals interactions and host-guest interactions. Where non-covalent interactions are used, the type of interaction is not particularly limited, provided that the interactions are (collectively) sufficiently strong for the template strands to remain attached to the solid support during amplification, clustering or sequencing.
As used herein, the term “ionic bond” refers to a chemical bond between two or more ions that involves an electrostatic attraction between a cation and an anion. For example, the cation may be selected from “metal cations”, as described herein, or “non-metal cations”. Non-metal cations may include ammonium salts (e.g. alkylammonium salts) or phosphonium salts (e.g. alkylphosphonium salts). The anion may be selected from phosphates, thiophosphates, phosphonates, thiophosphonates, phosphinates, thiophosphinates, sulfates, sulfonates, sulfites, sulfinates, carbonates, carboxylates, alkoxides, phenolates and thiophenolates.
As used herein, the term “hydrogen bond” refers to a bonding interaction between a lone pair on an electron-rich atom (e.g. nitrogen, oxygen or fluorine) and a hydrogen atom attached to an electronegative atom (e.g. nitrogen or oxygen).
As used herein, the term “host-guest interaction” refers to two or more groups which are able to form bound complexes via one or more types of non-covalent interactions by molecular recognition, such as ionic bonding, hydrogen bonding, hydrophobic interactions, van der Waals interactions and π-π interactions. For example, the host-guest interaction may include interactions formed between cucurbiturils with adamantanes (e.g. 1-adamantylamine), ammonium ions (e.g. amino acids), ferrocenes; cyclodextrins with adamantanes (e.g. 1-adamantylamine), ammonium ions (e.g. amino acids), ferrocenes; calixarenes with adamantanes (e.g. 1-adamantylamine), ammonium ions (e.g. amino acids), ferrocenes; crown ethers (e.g. 18-crown-6, 15-crown-5, 12-crown-4) or cryptands (e.g. [2.2.2]cryptand) with cations (e.g. metal cations, ammonium ions); avidins (e.g. streptavidin) and biotin; and antibodies and haptens.
In other embodiments, the first linking groups may be capable of forming covalent bonds. Where covalent bonds are used, the bond may be stable such that the template strands remain attached to the solid support during amplification, clustering or sequencing.
Examples of covalent bonds include cycloadducts (e.g. triazole cycloadducts, cyclobutane cycloadducts, furan-maleimide cycloadducts), alkylene linkages, alkenylene linkages, esters, amides, acetals, hemiaminal ethers, aminals, imines, hydrazones, polysulfide linkages (e.g. disulfide linkages), boron-based linkages (e.g. boronic and borinic acids/esters), silicon-based linkages (e.g. silyl ether, siloxane), and phosphorus-based linkages (e.g. phosphite, phosphate, thiophosphate).
As used herein, the term “cycloadduct” refers to a cyclic structure formed from a cycloaddition reaction between two components (e.g. Diels-Alder type cycloadditions between a diene and a dienophile, including inverse Diels-Alder type cycloadditions, 1,3-dipolar type cycloadditions between a dipole and a dipolarophile, or [2 + 2] cycloadditions between two alkenes).
As used herein, the term “alkyl” or “alkylene” refers to monovalent or divalent straight and branched chain groups respectively having from 1 to 12 carbon atoms. In a further embodiment, the alkyl or alkylene groups are straight or branched alkyl or alkylene groups having from 1 to 6 carbon atoms; in a yet further embodiment, straight or branched alkyl or alkylene groups having from 1 to 4 carbon atoms. An alkyl or alkylene group may comprise one or more “substituents”, as described herein.
As used herein, the term “alkenyl” or “alkenylene” refers to monovalent or divalent straight and branched chain groups respectively having from 1 to 12 carbon atoms, and which comprise at least one carbon-carbon double bond. In a further embodiment, the alkenyl or alkenylene groups are straight or branched alkenyl or alkenylene groups having from 1 to 6 carbon atoms; in a yet further embodiment, straight or branched alkenyl or alkenylene groups having from 1 to 4 carbon atoms. An alkenyl or alkenylene group may comprise one or more “substituents”, as described herein.
As used herein, the term “alkynyl” refers to monovalent straight and branched chain groups respectively having from 1 to 12 carbon atoms, and which comprise at least one carbon-carbon triple bond. In a further embodiment, the alkynyl groups are straight or branched alkynyl groups having from 1 to 6 carbon atoms; in a yet further embodiment, straight or branched alkynyl groups having from 1 to 4 carbon atoms. An alkynyl group may comprise one or more “substituents”, as described herein.
As used herein, the term “amino” refers to a -N(R)(R’) group, where R and R’ are independently hydrogen or a “substituent” as defined herein. As used herein, the term “amine linkage” refers to a -NR- group, and where R is hydrogen or a “substituent” as defined herein.
As used herein, the term “ester” refers to a -O-C(=O)- group, where the group is attached to two other carbon atoms at the points of attachment to the group.
As used herein, the term “amide” refers to a -NR-C(=O)- group, where R is hydrogen or a “substituent” as described herein.
An “aryl” group refers to a monovalent monocyclic, bicyclic or tricyclic aromatic group respectively containing from 6 to 14 carbon atoms in the ring. Common aryl groups include C6-C14 aryl, for example, Ce-C aryl. An aryl group may comprise one or more “substituents”, as described herein.
A “heterocyclyl” group refers to a monovalent saturated or partially saturated 3 to 7 membered monocyclic, or 7 to 10 membered bicyclic ring system respectively, which consists of carbon atoms and from one to four heteroatoms independently selected from the group consisting of O, N, and S, wherein the nitrogen and sulfur heteroatoms may be optionally oxidised, the nitrogen may be optionally quaternised, and includes any bicyclic group in which any of the above-defined rings is fused to a benzene ring, and wherein the ring may be substituted on carbon or on a nitrogen atom if the resulting compound is stable. Non-limiting examples of “heterocyclyl” groups include pyrrolidinyl, tetrahydrofuranyl, dihydrofuranyl, tetrahydrothienyl, tetrahydrothiopyranyl, isoxazolinyl, piperidyl, morpholinyl, thiomorpholinyl, thioxanyl, piperazinyl, azetidinyl, oxetanyl, thietanyl, homopiperidyl, oxepanyl, thiepanyl, oxazepinyl, diazepinyl, thiazepinyl, 1,2,3,6-tetrahydropyridyl, 2-pyrrolinyl, 3-pyrrolinyl, indolinyl, 2H-pyranyl, 4H-pyranyl, dioxanyl, 1,3-dioxolanyl, pyrazolinyl, dithianyl, dithiolanyl, dihydropyranyl, dihydrothienyl, dihydrofuranyl, dihydropyridazinyl (e.g. 1,4-dihydropyridazinyl), pyrazolidinyl, imidazolinyl, imidazolidinyl, 3-azabicyclo[3.1.0]hexyl, 3-azabicyclo[4.1.0]heptyl, 3H-indolyl, and quinolizinyl. A heterocyclyl group may comprise one or more “substituents”, as described herein.
A “heteroaryl” group refers to monovalent aromatic groups having 5 to 14 ring atoms respectively (for example, 5 to 10 ring atoms) and containing carbon atoms and 1, 2 or 3 oxygen, nitrogen or sulfur heteroatoms. Non-limiting examples of “heteroaryl” groups include quinolyl including 8-quinolyl, isoquinolyl, coumarinyl including 8-coumarinyl, pyridyl, pyrazinyl, pyrazolyl, pyrimidinyl, pyridazinyl, furyl, pyrrolyl, thienyl, thiazolyl, isothiazolyl, triazolyl (e.g. 1,2,3-triazolyl), tetrazolyl, isoxazolyl, oxazolyl, imidazolyl, indolyl, isoindolyl, indazolyl, indolizinyl, phthalazinyl, pteridinyl, purinyl, oxadiazolyl, thiadiazolyl, furazanylene, pyridazinyl, triazinyl, cinnolinyl, benzimidazolyl, benzofuranyl, benzofurazanyl, benzothiophenyl, benzothiazolyl, benzoxazolyl, quinazolinyl, quinoxalinyl, naphthyridinyl and furopyridyl. Where the heteroaryl (or heteroarylene) group contains a nitrogen atom in a ring, such nitrogen atom may be in the form of an N-oxide, e.g., a pyridyl N-oxide, pyrazinyl N-oxide, pyrimidinyl N-oxide and pyridazinyl N-oxide. A heteroaryl group may comprise one or more “substituents”, as described herein.
As used herein, the term “acetal” refers to a -OC(R)(R’)O- group, where R and R’ are independently hydrogen or a “substituent” as described herein.
As used herein, the term “hemiaminal ether” refers to a -OC(R)(R’)NR”- group, where R, R’ and R” are independently hydrogen or a “substituent” as described herein.
As used herein, the term “aminal” refers to a -NR(R’)(R”)NR”’- group, where R, R’, R” and R’” are independently hydrogen or a “substituent” as described herein. As used herein, the term “imine” refers to a -C(R)=N- group, where R is hydrogen or a “substituent” as described herein.
As used herein, the term “hydrazone” refers to a -C(R)=N-NR’- group, where R and R’ are independently hydrogen or a “substituent” as described herein.
As used herein, the term “polysulfide” refers to a -(S)n- group, wherein n is 2 to 10, or 2 to 6. For example, n may be 2, forming a “disulfide” linkage.
As used herein, the term “boron-based linkage” refers to a -(O)a-B(OR)-(O)b- group, where R is independently hydrogen or a “substituent” as described herein, and where a and b are independently 0 or 1.
As used herein, the term “silicon-based linkage” refers to a -(O)a-Si(R)(R’)-(O)b- group, where R and R’ are independently hydrogen or a “substituent” as described herein, and where a and b are independently 0 or 1.
As used herein, the term “phosphorus-based linkage” refers to a -(O)a-P(R)-(O)b- group, where R and R’ are independently hydrogen or a “substituent” as described herein, and where a and b are independently 0 or 1.
As used herein, the term “substituent” refers to groups such as OR’, =O, SR’, SOR’, SO2R’, NO2, NHR’, NR’R’, =N-R’, NHCOR’, N(COR’)2, NHSO2R’, NR’C(=NR’)NR’R’, CN, halogen, COR’, COOR’, OCOR’, OCONHR’, OCONR’R’, CONHR’, CONR’R’, protected OH, protected amino, protected SH, substituted or unsubstituted C1-C12 alkyl, substituted or unsubstituted C2-C12 alkenyl, substituted or unsubstituted C2-C12 alkynyl, substituted or unsubstituted aryl, substituted or unsubstituted heterocyclyl, and substituted or unsubstituted heteroaryl, where each of the R’ groups is independently selected from the group consisting of hydrogen, OH, NO2, NH2, SH, CN, halogen, COH, COalkyl, CO2H, substituted or unsubstituted C1-C12 alkyl, substituted or unsubstituted C2-C12 alkenyl, substituted or unsubstituted C2-C12 alkynyl, substituted or unsubstituted aryl, substituted or unsubstituted heterocyclyl, and substituted or unsubstituted heteroaryl. Where such groups are themselves substituted, the substituents may be chosen from the foregoing list. In addition, where there is more than one R’ group on a substituent, each R’ may be the same or different.
In other embodiments, the first linking groups may be capable of forming metal-coordination bonds. Where metal-coordination bonds are used, the bond may be strong enough such that the template strands remain attached to the solid support during amplification, clustering or sequencing.
For example, the metal-coordination bond may be one formed between nickel and histidine, such as nickel-His6 tag. The surface of the solid support, the first immobilised primers, and/or the second immobilised primers may comprise nickel (e.g. nickel metal or nickel ions), and may be attachable to a histidine (e.g. His6 tag) moiety on the template. Alternatively, the surface of the solid support, the first immobilised primers, and/or the second immobilised primers may comprise a histidine (e.g. His6 tag), and may be attachable to nickel (e.g. nickel metal or nickel ions) on the template.
As used herein, the term “metal-coordination bond” refers to a reversible ionic bond and/or a reversible dative covalent bond formed between a metal moiety and a ligand (e.g. a “metal-coordination group”, as described herein).
As used herein, the term “metal-coordination group” refers to a group which is able to coordinate with a metal moiety by forming a reversible ionic bond and/or a reversible dative covalent bond between the coordinating group and the metal moiety. Non-limiting examples of metal-coordination groups include benzenediols (e.g. catechols) or derivatives thereof; benzenetriols (e.g. gallols) or derivatives thereof; amino acids including histidine (e.g. polyhistidines such as His6 tag), serine, threonine, asparagine, glutamine, lysine, or cysteine; and ethylenediaminetetraacetic acid and derivatives thereof.
The ratio of metal-coordination group(s) to metal moieties can be tuned. There may be one, two or three coordinating groups per metal moiety.
As used herein, a “metal moiety” can be any metal moiety suitable to form ionic bonds, or to coordinate with a metal-coordinating group. For the metal-coordinating group, the metal moiety forms reversible ionic bonds and/or reversible dative covalent bonds with metal-coordination group(s). Suitable metal moieties include metal cations, metal oxides, metal hydroxides, metal carbides, metal nitrides and/or metal nanoparticles.
Particular metal cations include lithium, sodium, potassium, rubidium, caesium, beryllium, magnesium, calcium, strontium, barium, chromium, manganese, iron, cobalt, nickel, copper, silver, gold, platinum, palladium, zinc, cadmium, mercury, aluminium, gallium, indium, tin, lead and bismuth. In a further embodiment, the metal cation is nickel.
More particularly, suitable cations include alkali metal ions (e.g. Li+ lithium ion, Na+ sodium ion, K+ potassium ion, Rb+ rubidium ion, Cs+ caesium ion), alkaline earth metal ions (e.g. Be2+ beryllium ion, Mg2+ magnesium ion, Ca2+ calcium ion, Sr2+ strontium ion, Ba2+ barium ion), transition metal ions (e.g. Ti2+ titanium (II) ion, Ti4+ titanium (IV) ion, V2+ vanadium (II) ion, V3+ vanadium (III) ion, V4+ vanadium (IV) ion, V5+ vanadium (V) ion, Cr2+ chromium (II) ion, Cr3+ chromium (III) ion, Cr6+ chromium (VI) ion, Mn2+ manganese (II) ion, Mn3+ manganese (III) ion, Mn4+ manganese (IV) ion, Fe2+ iron (II) ion, Fe3+ iron (III) ion, Co2+ cobalt (II) ion, Co3+ cobalt (III) ion, Ni2+ nickel (II) ion, Ni3+ nickel (III) ion, Cu+ copper (I) ion, Cu2+ copper (II) ion, Ag+ silver ion, Au+ gold (I) ion, Au3+ gold (III) ion, Pt2+ platinum (II) ion, Pt4+ platinum (IV) ion, Pd2+ palladium (II) ion, Pd4+ palladium (IV) ion, Zn2+ zinc ion, Cd2+ cadmium ion, Hg+ mercury (I) ion, Hg2+ mercury (II) ion), Group III metal ions (e.g. Al3+ aluminium ion, Ga3+ gallium ion, In+ indium (I) ion, In3+ indium (III) ion), Group IV metal ions (e.g. Sn2+ tin (II) ion, Sn4+ tin (IV) ion, Pb2+ lead (II) ion, Pb4+ lead (IV) ion), and/or Group V metal ions (e.g. Bi3+ bismuth (III) ion, Bi5+ bismuth (V) ion). In a further embodiment, the cation is a Ni2+ nickel (II) ion.
The metal moiety may be in the form of a metal salt. Suitable metal salts include but are not limited to halides, nitriles, hydroxides and the like.
The metal moiety may be in the form of an oxide or nanoparticle. For example, iron oxide nanoparticles may be used. Other suitable oxides or nanoparticles include iron oxides, iron nitrides, iron carbides, iron metal particles, nickel oxides, nickel carbides, nickel particles, titanium oxides, titanium metal particles, titanium nitrides, titanium carbides, silver metal particles and gold metal particles.
In one embodiment, where the first linking group is present on the surface of the solid support, the first linking group may be capable of forming non-covalent interactions, and the first linking group may comprise a biotin moiety or an avidin (e.g. streptavidin); in a further embodiment, an avidin (e.g. streptavidin).
In one embodiment, where the first linking group is present on the surface of the solid support, the first linking group may be capable of forming covalent bonds, and the first linking group may comprise a hydroxyl group, an alkyne (e.g. a terminal alkyne) or an azide (e.g. poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide), PAZAM); in a further embodiment, an azide (e.g. poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide), PAZAM).
In one embodiment, where the first linking group is present on the first immobilised primer and/or the second immobilised primer, the first linking group may be capable of forming covalent bonds, and the first linking group may comprise an alkene moiety (e.g. an electron-deficient alkene such as 3-cyanovinylcarbazole).
As mentioned above, the solid support is useful in nucleic acid sequencing, particularly concurrent sequencing.
Accordingly, in another aspect of the present invention, there is provided a use of a solid support as described herein in nucleic acid sequencing.
In another aspect of the present invention, there is provided a process of manufacturing a solid support, comprising:
(a) immobilising a plurality of first precursor primers onto a solid support to form a plurality of first immobilised primers, wherein a proportion of the first precursor primers are configured to be cleavable under first cleavage conditions; and
(b) immobilising a plurality of second precursor primers onto the solid support to form a plurality of second immobilised primers, wherein a proportion or substantially all of the second precursor primers are configured to be cleavable under second cleavage conditions; wherein the proportion of the first precursor primers configured to be cleavable under first cleavage conditions is less than the proportion of the second precursor primers configured to be cleavable under second cleavage conditions.
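The condition linking steps (a) and (b) can be expressed as a simple check on a planned primer mixture. The sketch below is illustrative only: the function name and the use of relative amounts (for example, counts or molar amounts of cleavable and total precursor primers) are assumptions introduced here, not part of the process as described.

```python
def proportions_satisfy_process(first_cleavable: float, first_total: float,
                                second_cleavable: float, second_total: float) -> bool:
    """Check that the cleavable proportion of first precursor primers is
    less than the cleavable proportion of second precursor primers.

    Amounts are hypothetical relative quantities in any consistent unit.
    """
    p_first = first_cleavable / first_total
    p_second = second_cleavable / second_total
    # Only a proportion of the first primers is cleavable (0 < p_first < 1);
    # a proportion or substantially all of the second primers is cleavable.
    return 0 < p_first < 1 and p_first < p_second <= 1


# Half of the first precursor primers cleavable, all second precursor primers cleavable.
valid_mix = proportions_satisfy_process(50, 100, 100, 100)

# Reversed proportions do not satisfy the condition.
invalid_mix = proportions_satisfy_process(100, 100, 50, 100)
```

A check of this kind could be applied when preparing the precursor primer mixtures, before immobilisation, since the proportions are set by the composition of each mixture.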
In one embodiment, the second precursor primer is different in sequence to the first precursor primer.
The solid support to be manufactured may be a solid support as described herein. Accordingly, aspects relating to the solid support and other characteristics of the solid support (such as the first immobilised primers and the second immobilised primers) as described herein apply equally to processes as described herein for manufacturing the solid support. The term “first precursor primer” refers to a state of the first immobilised primers of the solid support before they are immobilised to the solid support. As such, the first precursor primers may be provided as “free” primers in solution. After immobilisation, the “first precursor primers” are then referred to as “first immobilised primers”.
The term “second precursor primer” refers to a state of the second immobilised primers of the solid support before they are immobilised to the solid support. As such, the second precursor primers may be provided as “free” primers in solution. After immobilisation, the “second precursor primers” are then referred to as “second immobilised primers”.
Steps (a) and (b) may be conducted sequentially or simultaneously.
For example, in one embodiment, where steps (a) and (b) are conducted sequentially, step (b) may be conducted after step (a). Alternatively, step (a) may be conducted after step (b).
In one embodiment, steps (a) and (b) may be conducted simultaneously.
The immobilisation method is not particularly limited provided that the first immobilised primers and the second immobilised primers remain on the solid support during amplification, clustering and sequencing.
In one embodiment, immobilisation may comprise forming covalent linkages between the solid support and each of the plurality of first precursor primers, and between the solid support and each of the plurality of second precursor primers. In a further embodiment, the forming covalent linkages involves using a click reaction (e.g. metal-catalysed azide-alkyne cycloaddition reactions, such as copper-catalysed azide-alkyne cycloaddition reactions and strain-promoted azide-alkyne cycloadditions).
In particular, forming covalent linkages may involve forming a 1,2,3-triazole linkage. The solid support prior to immobilisation may include azide moieties (e.g. PAZAM), whilst the first precursor primers and the second precursor primers may each comprise alkyne moieties (e.g. terminal alkynes, cycloalkynes). A click reaction between the azide moieties on the solid support and the alkyne moieties on the first precursor primers and the second precursor primers allows a 1,2,3-triazole linkage to be formed. The configuration of azide moieties and alkyne moieties can also be swapped, for example by including alkyne moieties on the solid support prior to immobilisation, and including azide moieties on each of the first precursor primers and the second precursor primers.
In one embodiment, a ratio of a concentration of first precursor primers that are configured to be cleavable under first cleavage conditions used in step (a) compared to a concentration of second precursor primers that are configured to be cleavable under second cleavage conditions used in step (b) may be between 0.2 to 0.8; in a further embodiment, between 0.25 to 0.75; in an even further embodiment, between 0.5 to 0.75.
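As an illustration only, the ratio described above can be computed and compared against the recited ranges as in the following minimal Python sketch. The function names, the example concentrations and the treatment of the range end points as inclusive are assumptions made for the example, and both concentrations are assumed to be expressed in the same unit.

```python
# Ranges recited in the text, from broadest to narrowest.
RATIO_RANGES = [(0.2, 0.8), (0.25, 0.75), (0.5, 0.75)]


def cleavable_ratio(first_cleavable_conc, second_cleavable_conc):
    """Ratio of the cleavable first precursor primer concentration (step (a))
    to the cleavable second precursor primer concentration (step (b))."""
    if second_cleavable_conc <= 0:
        raise ValueError("second_cleavable_conc must be positive")
    return first_cleavable_conc / second_cleavable_conc


def ranges_satisfied(ratio):
    """Return the recited ranges that contain the ratio (end points inclusive,
    which is an assumption of this sketch)."""
    return [(low, high) for (low, high) in RATIO_RANGES if low <= ratio <= high]


# Hypothetical example values in a common concentration unit.
ratio = cleavable_ratio(5.0, 10.0)
print(ratio, ranges_satisfied(ratio))
```

With these example values the ratio falls within all three recited ranges; a ratio of 0.3 would fall within only the two broader ranges.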
In one embodiment, a concentration of first precursor primers used in step (a) may be between 0.1 pM to 150 pM; in a further embodiment between 1 pM to 15 pM; in an even further embodiment, between 5.0 pM to 10 pM; in a yet even further embodiment, between 7.5 pM to 10 pM.
In one embodiment, a concentration of first precursor primers that are configured to be cleavable under first cleavage conditions used in step (a) may be between 0.1 pM to 100 pM; in a further embodiment between 1 pM to 10 pM; in an even further embodiment, between 2.5 pM to 7.5 pM; in a yet even further embodiment, between 5.0 pM to 7.5 pM.
In one embodiment, a concentration of second precursor primers used in step (b) may be between 0.1 pM to 150 pM; in a further embodiment between 1 pM to 15 pM; in an even further embodiment, between 5.0 pM to 10 pM; in a yet even further embodiment, between 7.5 pM to 10 pM.
In one embodiment, a concentration of second precursor primers that are configured to be cleavable under second cleavage conditions used in step (b) may be between 0.1 pM to 150 pM; in a further embodiment between 1 pM to 15 pM; in an even further embodiment, between 5.0 pM to 10 pM; in a yet even further embodiment, between 7.5 pM to 10 pM.
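By way of a hedged example, the relationship between the total and cleavable primer concentrations and the proportions recited in steps (a) and (b) may be checked as sketched below in Python. The function names and the example concentrations are hypothetical, and the sketch assumes that the proportion of cleavable primers on the support tracks the proportion of cleavable primers in the solution used for immobilisation.

```python
def cleavable_proportion(cleavable_conc, total_conc):
    """Fraction of precursor primers that are configured to be cleavable."""
    if total_conc <= 0:
        raise ValueError("total_conc must be positive")
    if not 0 <= cleavable_conc <= total_conc:
        raise ValueError("cleavable_conc must lie between 0 and total_conc")
    return cleavable_conc / total_conc


def satisfies_process_condition(first_proportion, second_proportion):
    """The process requires the cleavable proportion of the first precursor
    primers to be less than that of the second precursor primers."""
    return first_proportion < second_proportion


# Hypothetical example: half of the first precursor primers are cleavable,
# whereas substantially all of the second precursor primers are cleavable.
first = cleavable_proportion(cleavable_conc=5.0, total_conc=10.0)
second = cleavable_proportion(cleavable_conc=10.0, total_conc=10.0)
print(first, second, satisfies_process_condition(first, second))
```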
The solid support may be useful in methods of preparing polynucleotide sequences for identification.
Accordingly, in another aspect of the present invention, there is provided a method of preparing polynucleotide sequences for identification, comprising: providing a solid support as described herein, and synthesising a plurality of first polynucleotide sequences each comprising first portions and each extending from the first immobilised primers, and a plurality of second polynucleotide sequences each comprising second portions and each extending from the second immobilised primers.
By “identification” is meant here obtaining genetic information from the polynucleotide strands. This may include identification of the genetic sequence of the polynucleotide strands (i.e. sequencing). Furthermore, this may instead, or additionally, include identification of mismatched base pairs. In addition, this may instead, or additionally, include identification of any epigenetic modifications, for example methylation. Accordingly, “identification” may mean identification of the genetic sequence of the polynucleotide strands, mismatched base pairs, and/or identification of any epigenetic modifications.
The present invention can be applied to (separate) polynucleotide strands where a first strand comprises a first portion to be identified and a second strand comprises a second portion to be identified.
The first portions and second portions may be different polynucleotide sequences. That is, the sequences may be genetically unrelated and/or derived from different sources.
Alternatively, the first portions and second portions may be genetically related.
For example, the (separate) polynucleotide strands may comprise a first strand that comprises a first portion that may comprise (or be) the forward strand of a polynucleotide sequence (e.g. forward strand of a template), and a second strand that comprises a second portion that may comprise (or be) the reverse strand of the polynucleotide sequence (e.g. reverse strand of the template) or the forward complement strand of the polynucleotide sequence (e.g. forward complement strand of the template). As a further alternative, the (separate) polynucleotide strands may comprise a first strand that comprises a first portion that may comprise (or be) the reverse strand of a polynucleotide sequence (e.g. reverse strand of a template), and a second strand that comprises a second portion that may comprise (or be) the forward strand of the polynucleotide sequence (e.g. forward strand of the template) or the reverse complement strand of the polynucleotide sequence (e.g. reverse complement strand of the template). Alternatively, the (separate) polynucleotide strands may comprise a first strand that comprises a first portion that may comprise (or be) the forward strand of a polynucleotide sequence (e.g. forward strand of a template), and a second strand that comprises a second portion that may comprise (or be) the reverse complement strand of the polynucleotide sequence (e.g. reverse complement strand of the template) (in effect, a reverse complement strand may be considered a “copy” of the forward strand). As a further alternative, the (separate) polynucleotide strands may comprise a first strand that comprises a first portion that may comprise (or be) the reverse strand of a polynucleotide sequence (e.g. reverse strand of a template), and a second strand that comprises a second portion that may comprise (or be) the forward complement strand of the polynucleotide sequence (e.g. forward complement strand of the template) (in effect, a forward complement strand may be considered a “copy” of the reverse strand).
In some embodiments, the first portion may be derived from a forward strand of a target polynucleotide to be sequenced, and the second portion may be derived from a reverse complement strand of the target polynucleotide to be sequenced; or the first portion may be derived from a reverse strand of a target polynucleotide to be sequenced, and the second portion may be derived from a forward complement strand of the target polynucleotide to be sequenced. In these particular embodiments, concurrent sequencing of both the forward and reverse complement strands (or the reverse and forward complement strands) allows mismatched base pairs and/or epigenetic modification to be detected.
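The strand relationships described above may be illustrated by the following minimal Python sketch. It assumes a perfectly base-paired template with no mismatches, sequences written 5' to 3', and a hypothetical six-base example sequence; the function name is likewise an assumption made for the example.

```python
# Watson-Crick pairing for the four DNA nucleobases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_strand(seq):
    """Return the 5'->3' sequence of the strand that base-pairs with `seq`,
    where `seq` is itself written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "ATGGCA"                               # hypothetical forward strand
reverse = complement_strand(forward)             # reverse strand of the template
forward_complement = complement_strand(forward)  # complement of the forward strand
reverse_complement = complement_strand(reverse)  # complement of the reverse strand

print(reverse)                        # TGCCAT
print(forward_complement == reverse)  # True: a "copy" of the reverse strand
print(reverse_complement == forward)  # True: a "copy" of the forward strand
```

Under these assumptions, a position at which a read of the forward strand and a read of the reverse complement strand disagree would indicate a candidate mismatched base pair or modification in the original template.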
The first portion may be referred to herein as read 1 (R1). The second portion may be referred to herein as read 2 (R2).
In one embodiment, the first portion is at least 25 or at least 50 base pairs and the second portion is at least 25 base pairs or at least 50 base pairs.
The polynucleotide strands may form or be part of a cluster on the solid support.
As used herein, the term “cluster” may refer to a clonal group of template polynucleotides (e.g. DNA or RNA) bound within a single well of a solid support (e.g. flow cell). As such, a cluster may refer to the population of polynucleotide molecules within a well that are then sequenced. A “cluster” may contain a sufficient number of copies of template polynucleotides such that the cluster is able to output a signal (e.g. a light signal) that allows sequencing reads to be performed on the cluster. A “cluster” may comprise, for example, about 500 to about 2000 copies, about 600 to about 1800 copies, about 700 to about 1600 copies, about 800 to about 1400 copies, about 900 to about 1200 copies, or about 1000 copies of template polynucleotides.
A cluster may be formed by bridge amplification, as described above.
The cluster formed may be a duoclonal cluster.
By “duoclonal” cluster is meant that the population of polynucleotide sequences that are then sequenced (as the next step) are substantially of two types - e.g. a first sequence and a second sequence. As such, a “duoclonal” cluster may refer to the population of single first sequences and single second sequences within a well that are then sequenced. A “duoclonal” cluster may contain a sufficient number of copies of a single first sequence and copies of a single second sequence such that the cluster is able to output a signal (e.g. a light signal) that allows sequencing reads to be performed on the “duoclonal” cluster. A “duoclonal” cluster may comprise, for example, about 500 to about 2000 combined copies, about 600 to about 1800 combined copies, about 700 to about 1600 combined copies, about 800 to about 1400 combined copies, about 900 to about 1200 combined copies, or about 1000 combined copies of single first sequences and single second sequences. The copies of single first sequences and single second sequences together may comprise at least about 50%, at least about 60%, at least about 70%, even at least about 80%, at least about 90%, or about 95%, 98%, 99% or 100% of all polynucleotides within a single well of the flow cell, thus providing a substantially duoclonal “cluster”.
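As a simple worked example of the percentages recited above, the fraction of polynucleotides in a well that belong to the two sequence types may be calculated as sketched below in Python. The function name and the copy numbers are hypothetical and are chosen only to illustrate the arithmetic.

```python
def duoclonal_fraction(first_copies, second_copies, other_copies):
    """Fraction of all polynucleotides in a well that are copies of the
    single first sequence or the single second sequence."""
    total = first_copies + second_copies + other_copies
    if total <= 0:
        raise ValueError("the well must contain at least one polynucleotide")
    return (first_copies + second_copies) / total


# Hypothetical well: 900 combined copies of the two sequences plus 100 others.
fraction = duoclonal_fraction(first_copies=450, second_copies=450, other_copies=100)
print(fraction)  # 0.9, i.e. the "at least about 90%" level recited in the text
```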
A first signal that is capable of being produced by the first portion and a second signal that is capable of being produced by the second portion may be spatially unresolved (e.g. generated from the same region or substantially overlapping regions). In one embodiment, a first region occupied by the first polynucleotide sequence comprising the first portion within the duoclonal cluster is the same as, or substantially overlapping with, a second region occupied by the second polynucleotide sequence comprising the second portion within the duoclonal cluster.
In one embodiment, the method may further comprise a step of exposing the solid support to first cleavage conditions and/or second cleavage conditions after the step of synthesising the plurality of first polynucleotide sequences and the second polynucleotide sequences. Suitable first cleavage conditions and second cleavage conditions are described above. In a further embodiment, the first cleavage conditions and/or second cleavage conditions may comprise exposure to a thermal trigger (e.g. by heating), a light trigger (e.g. by exposure to ultraviolet light), and/or a chemical/biochemical trigger (e.g. an enzyme, a metal catalyst including transition metal catalysts such as a palladium-based or a nickel-based catalyst, periodate).
The solid support may be exposed to the first cleavage conditions and then subsequently exposed to the second cleavage conditions, or may be exposed to the second cleavage conditions and then subsequently exposed to the first cleavage conditions (i.e. sequentially). Alternatively, the solid support may be exposed to the first cleavage conditions and the second cleavage conditions at the same time (i.e. simultaneously).
In one embodiment, the solid support may be exposed to a glycosylase. In other words, the first cleavage conditions and/or second cleavage conditions may involve exposure to a glycosylase. Suitable glycosylases are described above. In a further embodiment, the solid support is exposed to a uracil glycosylase and/or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
In one embodiment, the method may further comprise a step of blocking 3’-ends of the first polynucleotide sequences and the second polynucleotide sequences to prevent further extension of the first polynucleotide sequences and the second polynucleotide sequences. When sequencing-by-synthesis methods are used, this ensures that the labelled nucleotides are added at the sites to be sequenced (i.e. forming base pairs with the first polynucleotide sequences and the second polynucleotide sequences), rather than the 3’-end of the template strands.
In one embodiment, the method may further comprise a step of removing first immobilised primers and/or second immobilised primers that are not extended. Alternatively, the method may further comprise a step of blocking 3’-ends of first immobilised primers and/or second immobilised primers that are not extended. When sequencing-by-synthesis methods are used, this ensures that labelled nucleotides are added at the sites to be sequenced (i.e. forming base pairs with the first polynucleotide sequences and the second polynucleotide sequences), rather than the 3’-end of first immobilised primers and/or second immobilised primers that are not yet extended.
As mentioned above, it can be advantageous to form cross-links between the template strands and the surface of the solid support and/or the immobilised primers, as this prevents the template strands from becoming washed away (e.g. during template melting). Accordingly, the solid support may comprise a first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group on the surface of the solid support, the first immobilised primer, and/or the second immobilised primer; and the first polynucleotide sequence and/or the second polynucleotide sequence may comprise a second linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with the first linking group. Suitable non-covalent interactions, covalent bonds or metal-coordination bonds that can be formed by the second linking group have already been described in connection with the first linking group, and these apply equally to the second linking group.
In one embodiment, the solid support may comprise a first linking group on the surface of the solid support (e.g. as described herein), the first polynucleotide sequence and/or the second polynucleotide sequence may comprise a second linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with the first linking group, and the method further comprises a step of forming the non-covalent interaction, covalent bond and/or metal-coordination bond between the first linking group and the second linking group.
In one embodiment, the solid support may comprise a first linking group on the first immobilised primer and/or the second immobilised primer, the first polynucleotide sequence and/or the second polynucleotide sequence may comprise a second linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with the first linking group, and the method further comprises a step of forming the non-covalent interaction, covalent bond and/or metal-coordination bond between the first linking group and the second linking group.
In one embodiment, where the first linking group is present on the surface of the solid support, the second linking group may be capable of forming non-covalent interactions, and the second linking group may comprise an avidin (e.g. streptavidin) or a biotin moiety; in a further embodiment, a biotin moiety. In particular, where the first linking group may comprise an avidin (e.g. streptavidin), the second linking group may comprise a biotin moiety. Such a second linking group may be attached by extending a 3’-end of a template strand (e.g. the first polynucleotide sequence and/or the second polynucleotide sequence) and attaching a nucleotide comprising a biotin moiety. For example, the 3’-end of the template strand (e.g. the first polynucleotide sequence and/or the second polynucleotide sequence) may comprise a poly-A tail and a nucleotide comprising a biotin moiety.
In one embodiment, where the first linking group is present on the surface of the solid support, the second linking group may be capable of forming covalent bonds, and the second linking group may comprise a thiophosphate, an azide or an alkyne (e.g. a terminal alkyne); in a further embodiment, an alkyne (e.g. a terminal alkyne). In particular, where the first linking group may comprise a hydroxyl group, the second linking group may comprise a thiophosphate. In addition, in particular, where the first linking group may comprise an azide (e.g. PAZAM), the second linking group may comprise an alkyne (e.g. a terminal alkyne). Such a second linking group may be attached by extending a 3’-end of a template strand (e.g. the first polynucleotide sequence and/or the second polynucleotide sequence) and attaching a nucleotide comprising an alkyne or a thiophosphate group (e.g. an alkyne or a thiophosphate group attached to the 3’-end of the nucleotide).
In one embodiment, where the first linking group is present on the first immobilised primer and/or the second immobilised primer, the second linking group may be capable of forming covalent bonds, and the second linking group may comprise an alkene moiety (e.g. a pyrimidine nucleobase, such as a thymine, uracil or cytosine nucleobase). In particular, where the first linking group may comprise an alkene moiety (e.g. an electron-deficient alkene such as 3-cyanovinylcarbazole), the second linking group may comprise an alkene (e.g. a pyrimidine nucleobase, such as a thymine, uracil or cytosine nucleobase). Such a second linking group may be already present in the template strand (e.g. the first polynucleotide sequence and/or the second polynucleotide sequence). Alternatively, such a second linking group may be introduced into the template strand (e.g. the first polynucleotide sequence and/or the second polynucleotide sequence) by extending a 3’-end of the template strand and attaching a nucleotide comprising thymine, uracil and/or cytosine (e.g. a poly-T tail, a poly-U tail, or a poly-C tail).
In one embodiment, the non-covalent interaction may comprise an avidin-biotin interaction (e.g. a streptavidin-biotin interaction), in particular where the first linking group is present on the surface of the solid support.
In one embodiment, the covalent bond may comprise a thiophosphate ester linkage, or a triazole linkage (e.g. a 1,2,3-triazole linkage), in particular where the first linking group is present on the surface of the solid support. Where a triazole linkage (e.g. a 1,2,3-triazole linkage) is present, the linkage may be formed by using click chemistry (e.g. metal-catalysed azide-alkyne cycloaddition reactions, such as copper-catalysed azide-alkyne cycloaddition reactions and strain-promoted azide-alkyne cycloadditions).
In one embodiment, the covalent bond may comprise a cyclobutane linkage, in particular where the first linking group is present on the first immobilised primer and/or the second immobilised primer. Where a cyclobutane linkage is present, the linkage may be formed by [2 + 2] cycloaddition reactions (e.g. [2 + 2] photocycloadditions, such as those promoted by irradiation using ultraviolet light).
In one embodiment, the method may further comprise a step of linearising the first polynucleotide sequence and the second polynucleotide sequence. In a further embodiment, the method may further comprise treating the linearised first polynucleotide sequence and the linearised second polynucleotide sequence with a single-stranded binding protein.
Methods of sequencing
Also described herein is a method of sequencing polynucleotide sequences, comprising preparing polynucleotide sequences for identification using a method as described herein; and concurrently sequencing nucleobases in the first portion and the second portion.
In one embodiment, sequencing is performed by sequencing-by-synthesis or sequencing-by-ligation.
In one embodiment, the step of concurrently sequencing nucleobases may comprise treatment with a strand displacement polymerase (e.g. phi29).
In one embodiment, the step of concurrently sequencing nucleobases may comprise treatment with a 5’-3’ exonuclease. For example, a polymerase (e.g. DNA polymerase) and a 5’-3’ exonuclease or a polymerase (e.g. DNA polymerase) with 5’-3’ exonuclease activity may be used. The 5’-3’ exonuclease activity is used to essentially clear the path ahead of the growing strand (i.e. remove the downstream hybridised strand). Exonucleases are enzymes that catalyse the removal of individual nucleotides from an end of a polynucleotide chain, by cleaving the phosphodiester bond via hydrolysis. In one embodiment, the polymerase (e.g. DNA polymerase) has native 5’-3’ exonuclease activity. That is, the polymerase (e.g. DNA polymerase) naturally has 5’-3’ exonuclease activity. Examples of polymerases (e.g. DNA polymerases) with native 5’-3’ exonuclease activity include: Taq DNA Polymerase, T7 DNA polymerase and DNA Polymerase I.
In another embodiment, a polymerase (e.g. DNA polymerase) and an exonuclease may be separately applied (e.g. flowed across the solid support) either sequentially or concurrently. In one embodiment, the exonuclease may be applied prior to incorporation of the polymerase. Examples of suitable exonucleases include RecJf, Lambda Exonuclease, T7 exonuclease domain, T5 exonuclease, or the DNA polymerase I-like H3TH domain, Exonuclease V (RecBCD) or Exonuclease VII, with ss- and ds- DNA exonuclease activity.
In one embodiment, the polymerase (e.g. DNA polymerase) may be engineered to have 5’-3’ exonuclease activity. In one example, a polymerase (e.g. DNA polymerase) may be fused with a protein possessing 5’-3’ exonuclease activity. In this way, the resulting fusion protein will contain an exonuclease domain, a DNA-binding domain and a polymerase domain. Suitable polymerases include Pfu DNA polymerase, DNA Pol 5, Therminator™ DNA Polymerase, DNA Polymerase III. Commercially available fusion proteins include the Phusion DNA polymerase. Alternatively, a fusion protein could be engineered using the polymerases and exonucleases discussed above. The skilled person would understand that fusion proteins can be recombinant fusion proteins, created through genetic engineering of a fusion gene. Most commonly, fusion proteins are created by tandem fusion or linker-mediated fusion.
In one embodiment, the method may further comprise a step of conducting paired-end reads.
In some embodiments, the data may be analysed using 16 QAM as mentioned herein.
Accordingly, the step of concurrently sequencing nucleobases may comprise:
(a) obtaining first intensity data comprising a combined intensity of a first signal component obtained based upon a respective first nucleobase at the first portion and a second signal component obtained based upon a respective second nucleobase at the second portion, wherein the first and second signal components are obtained simultaneously; (b) obtaining second intensity data comprising a combined intensity of a third signal component obtained based upon the respective first nucleobase at the first portion and a fourth signal component obtained based upon the respective second nucleobase at the second portion, wherein the third and fourth signal components are obtained simultaneously;
(c) selecting one of a plurality of classifications based on the first and the second intensity data, wherein each classification represents a possible combination of respective first and second nucleobases; and
(d) based on the selected classification, base calling the respective first and second nucleobases.
In one embodiment, selecting the classification based on the first and second intensity data may comprise selecting the classification based on the combined intensity of the first and second signal components and the combined intensity of the third and fourth signal components.
In one embodiment, the plurality of classifications may comprise sixteen classifications, each classification representing one of sixteen unique combinations of first and second nucleobases.
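Steps (a) to (d) may be sketched, purely for illustration, as a nearest-centroid classification over sixteen expected intensity pairs, as in the Python example below. The two-channel dye encoding, the relative amplitudes of the signals from the first and second portions, and the nearest-centroid rule are all assumptions of this sketch and are not taken from the description, which requires only that a classification is selected based on the first and the second intensity data.

```python
from itertools import product

# Hypothetical two-channel encoding: base -> (signal in channel 1, signal in channel 2).
BASE_CODE = {"A": (1, 1), "C": (1, 0), "T": (0, 1), "G": (0, 0)}

# Hypothetical relative amplitudes: the first portion is assumed to give a
# brighter signal than the second portion, so the combined intensities differ.
FIRST_AMPLITUDE = 1.0
SECOND_AMPLITUDE = 0.5


def build_centroids():
    """Expected (first intensity, second intensity) for each of the sixteen
    combinations of first and second nucleobases."""
    centroids = {}
    for b1, b2 in product(BASE_CODE, repeat=2):
        ch1 = FIRST_AMPLITUDE * BASE_CODE[b1][0] + SECOND_AMPLITUDE * BASE_CODE[b2][0]
        ch2 = FIRST_AMPLITUDE * BASE_CODE[b1][1] + SECOND_AMPLITUDE * BASE_CODE[b2][1]
        centroids[(b1, b2)] = (ch1, ch2)
    return centroids


CENTROIDS = build_centroids()


def call_bases(first_intensity, second_intensity):
    """Select the classification whose expected intensities are closest to the
    measured combined intensities, then return the two base calls."""
    def squared_distance(item):
        _, (ch1, ch2) = item
        return (first_intensity - ch1) ** 2 + (second_intensity - ch2) ** 2

    (b1, b2), _ = min(CENTROIDS.items(), key=squared_distance)
    return b1, b2


# One base calling cycle with hypothetical, slightly noisy measurements.
print(call_bases(1.45, 0.55))  # ('C', 'A')
```

Because the two assumed amplitudes differ, each of the sixteen base combinations maps to a distinct point in the two-dimensional intensity space, which is what allows both nucleobases to be called from a single pair of combined measurements.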
In one embodiment, the first signal component, second signal component, third signal component and fourth signal component may be generated based on light emissions associated with the respective nucleobase.
In one example, the light emissions may be detected by a sensor, wherein the sensor is configured to provide a single output based upon the first and second signals.
In one embodiment, the sensor may comprise a single sensing element.
In one embodiment, the method may further comprise repeating steps (a) to (d) for each of a plurality of base calling cycles.
Kits
Methods as described herein may be performed by a user physically. In other words, a user may themselves conduct the methods of preparing polynucleotide sequences for identification as described herein, and as such the methods as described herein may not need to be computer-implemented.
In another aspect of the invention, there is provided a kit comprising a solid support as described herein.
In another aspect of the invention, there is provided a kit comprising instructions for preparing polynucleotide sequences for identification according to the methods described herein and/or sequencing polynucleotide sequences according to the methods described herein.
In another aspect of the invention, there is provided a sequencing kit comprising a plurality of first and second sequencing primers, wherein the first sequencing primers comprise both terminated and non-terminated primers.
In one example, the terminated first sequencing primers comprise or consist of a sequence selected from SEQ ID NOs: 7 to 10, or a variant or fragment thereof.
In another example, the non-terminated first sequencing primers comprise or consist of a sequence selected from SEQ ID NOs: 7 to 10, or a variant or fragment thereof, and the second sequencing primers comprise or consist of a different sequence selected from SEQ ID NOs: 7 to 10, or a variant or fragment thereof.
The kit may also comprise labelled first and second sequencing primers, as described above. In one embodiment, the labelled second sequencing primers may comprise a cleavable site linking the label and the sequencing primer, as described above.
The kit may further comprise a polymerase.
Computer programs and products
In other embodiments, methods as described herein may be performed by a computer. In other words, a computer may contain instructions to conduct the methods of preparing polynucleotide sequences for identification as described herein, and as such the methods as described herein may be computer-implemented. Accordingly, in another aspect of the invention, there is provided a data processing device comprising means for carrying out the methods as described herein.
The data processing device may be a polynucleotide sequencer.
The data processing device may comprise reagents used for methods as described herein.
The data processing device may comprise a solid support as described herein, such as a flow cell.
In another aspect of the invention, there is provided a computer program product comprising instructions which, when the program is executed by a processor, cause the processor to carry out the methods as described herein.
In another aspect of the invention, there is provided a computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out the methods as described herein.
In another aspect of the invention, there is provided a computer-readable data carrier having stored thereon the computer program product as described herein.
In another aspect of the invention, there is provided a data carrier signal carrying the computer program product as described herein.
The various illustrative imaging or data processing techniques described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or combinations of both. To illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. The described functionality can be implemented in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosure. The various illustrative detection systems described in connection with the embodiments disclosed herein can be implemented or performed by a machine, such as a processor configured with specific instructions, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor can be a microprocessor, but in the alternative, the processor can be a controller, microcontroller, or state machine, combinations of the same, or the like. A processor can also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. For example, systems described herein may be implemented using a discrete memory chip, a portion of memory in a microprocessor, flash, EPROM, or other types of memory.
The elements of a method, process, or algorithm described in connection with the embodiments disclosed herein can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of computer-readable storage medium known in the art. An exemplary storage medium can be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an ASIC. A software module can comprise computer-executable instructions which cause a hardware processor to execute the computer-executable instructions.
Computer-executable instructions may be stored in a (transitory or non-transitory) computer readable storage medium (e.g., memory, storage system, etc.) storing code, or computer readable instructions.
Additional Notes
The embodiments described herein are exemplary. Modifications, rearrangements, substitute processes, etc. may be made to these embodiments and still be encompassed within the teachings set forth herein. One or more of the steps, processes, or methods described herein may be carried out by one or more processing and/or digital devices, suitably programmed. Conditional language used herein, such as, among others, “can,” “might,” “may,” “e.g.,” and the like, unless specifically stated otherwise, or otherwise understood within the context as used, is generally intended to convey that certain embodiments include, while other embodiments do not include, certain features, elements and/or states. Thus, such conditional language is not generally intended to imply that features, elements and/or states are in any way required for one or more embodiments or that one or more embodiments necessarily include logic for deciding, with or without author input or prompting, whether these features, elements and/or states are included or are to be performed in any particular embodiment. The terms “comprising,” “including,” “having,” “involving,” and the like are synonymous and are used inclusively, in an open-ended fashion, and do not exclude additional elements, features, acts, operations, and so forth. Also, the term “or” is used in its inclusive sense (and not in its exclusive sense) so that when used, for example, to connect a list of elements, the term “or” means one, some, or all of the elements in the list. The term “comprising” may be considered to encompass “consisting”.
Disjunctive language such as the phrase “at least one of X, Y or Z,” unless specifically stated otherwise, is otherwise understood with the context as used in general to present that an item, term, etc., may be either X, Y or Z, or any combination thereof (e.g., X, Y and/or Z). Thus, such disjunctive language is not generally intended to, and should not, imply that certain embodiments require at least one of X, at least one of Y or at least one of Z to each be present.
The terms “about” or “approximate” and the like are synonymous and are used to indicate that the value modified by the term has an understood range associated with it, where the range can be ±20%, ±15%, ±10%, ±5%, or ±1%. The term “substantially” is used to indicate that a result (e.g., measurement value) is close to a targeted value, where close can mean, for example, the result is within 80% of the value, within 90% of the value, within 95% of the value, or within 99% of the value. The term “partially” is used to indicate that an effect is only in part or to a limited extent.
Unless otherwise explicitly stated, articles such as “a” or “an” should generally be interpreted to include one or more described items. Accordingly, phrases such as “a device configured to” or “a device to” are intended to include one or more recited devices. Such one or more recited devices can also be collectively configured to carry out the stated recitations. For example, “a processor to carry out recitations A, B and C” can include a first processor configured to carry out recitation A working in conjunction with a second processor configured to carry out recitations B and C.
While the above detailed description has shown, described, and pointed out novel features as applied to illustrative embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the devices or algorithms illustrated can be made without departing from the spirit of the disclosure. As will be recognized, certain embodiments described herein can be embodied within a form that does not provide all of the features and benefits set forth herein, as some features can be used or practiced separately from others. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
It should be appreciated that all combinations of the foregoing concepts (provided such concepts are not mutually inconsistent) are contemplated as being part of the inventive subject matter disclosed herein. In particular, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the inventive subject matter disclosed herein.
The present invention will now be described by way of the following non-limiting examples.
Examples
Oligos:
BCN-P5-U (SEQ ID NO. 11)
5’BCN-TTTTTTAATGATACGGCGACCACCGAGAUCTACAC
BCN-P7-8oxoG (SEQ ID NO. 12)
5’BCN-TTTTTTCAAGCAGAAGACGGCATAC(8-oxo-G)AGAT
BCN-P7-non-lin (SEQ ID NO. 13)
5’BCN-TTTTTTCAAGCAGAAGACGGCATACGAGAT
BCN = bicyclononyne
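The relationship between the three grafting oligos and the unmodified P5 and P7 sequences (SEQ ID NO. 1 and 2) can be checked with a short sketch. This is an illustration only: the function name `modifications`, the `SPACER` constant and the single-letter placeholder `O` for 8-oxo-G are assumptions made here and do not come from the disclosure.

```python
# Unmodified adapter sequences as given in the sequence listing (SEQ ID NO. 1 and 2).
P5 = "AATGATACGGCGACCACCGAGATCTACAC"
P7 = "CAAGCAGAAGACGGCATACGAGAT"

# Six-T run that precedes the adapter sequence in each grafting oligo.
SPACER = "T" * 6

# Grafting oligos (SEQ ID NO. 11 to 13), written without the 5' BCN group.
# "O" is a placeholder chosen for this sketch to stand for 8-oxo-G.
OLIGOS = {
    "BCN-P5-U": "TTTTTTAATGATACGGCGACCACCGAGAUCTACAC",
    "BCN-P7-8oxoG": "TTTTTTCAAGCAGAAGACGGCATACOAGAT",
    "BCN-P7-non-lin": "TTTTTTCAAGCAGAAGACGGCATACGAGAT",
}


def modifications(oligo, adapter):
    """List (0-based adapter position, adapter base, oligo base) for each difference."""
    if not oligo.startswith(SPACER):
        raise ValueError("oligo does not start with the expected spacer")
    body = oligo[len(SPACER):]
    if len(body) != len(adapter):
        raise ValueError("oligo body and adapter differ in length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(adapter, body)) if a != b]


p5_u = modifications(OLIGOS["BCN-P5-U"], P5)
p7_oxo = modifications(OLIGOS["BCN-P7-8oxoG"], P7)
p7_nonlin = modifications(OLIGOS["BCN-P7-non-lin"], P7)

for name, diffs in (("BCN-P5-U", p5_u), ("BCN-P7-8oxoG", p7_oxo), ("BCN-P7-non-lin", p7_nonlin)):
    print(name, diffs)
```

Each cleavable oligo differs from its adapter at a single base (the uracil in P5-U, the 8-oxo-G in P7-8oxoG), while P7-non-lin carries no cleavable base at all.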
Overview summary of flowcell:
[Figure: overview summary of the flowcell (image imgf000089_0001 not reproduced)]
Reference Example 1
1. A PAZAM coated, polished, non-grafted HiSeq4K flowcell was grafted with 10 µM P7-8oxoG and 10 µM P5-U (lanes 1 and 2): oligo mixes were made up in 1.125 M Na2SO4 with 0.1% Tween20. 175 µl of this buffer was spiked with the appropriate primers and used in the grafting reaction on an Illumina cBot. The grafting mix was pumped onto the flowcell surface and incubated at 60 °C for 60 mins to allow the click chemistry between the 5’ BCNs on the oligos and the free azides to take place. After the grafting step, the flowcell lanes were washed with HT1 buffer and oligo grafting was then checked via a TET-QC assay (hybridisation of TET labelled complements to the P5/P7 oligo sequences, followed by Typhoon instrument scanning to assess the levels of TET signal in each lane). TET-QC oligos were removed by 0.1 N NaOH dehyb before the flowcell was used to make clusters.
2. Flowcell lanes were seeded with 200 pM of PhiX v3 control library in EPX (HiSeqX ExAmp mix) and amplified for 60 mins at 38 °C, followed by 10 cycles of bridge boost amplification from a standard HiSeqX cBot reagent plate.
3. After the amplification was complete, lanes were prepared for either “read 1” or “read 2” by linearising the P5 oligos with LMX1 (for read 1 in lane 1) or linearising the linearizable P7 oligos with FpG (PLM2v2 reagent, HiSeqX PE kit) (for read 2 in lane 2). Linearisation for both cases was for 30 mins at 38 °C. Lanes were then primer hybed with either HP10 (read 1 primer mix) or HP11 (read 2 primer mix) as appropriate.
4. The flowcell was then put on a HiSeqX for a 1x36 cycle run.
Intensity results are shown in Figure 14. As expected, in lanes 1 and 2, the intensities for R1 and R2 are approximately equal.
Example 1
The same method as Reference Example 1 was used, but this time with 7.5 µM P7-8oxoG, 2.5 µM P7-non-lin and 10 µM P5-U for grafting (read 1 in lane 3, read 2 in lane 4). Intensity results are shown in Figure 14. In lanes 3 and 4, the intensity of R2 is roughly half that of R1.
Example 2
The same method as Reference Example 1 was used, but this time with 5.0 µM P7-8oxoG, 5.0 µM P7-non-lin and 10 µM P5-U for grafting (read 1 in lane 5, read 2 in lane 6). Intensity results are shown in Figure 14. In lanes 5 and 6, the intensity of R2 is roughly a fifth of that of R1.
Example 3
The same method as Reference Example 1 was used, but this time with 2.5 µM P7-8oxoG, 7.5 µM P7-non-lin and 10 µM P5-U for grafting (read 1 in lane 7, read 2 in lane 8).
Intensity results are shown in Figure 14. In lanes 7 and 8, the intensity of R2 is roughly a tenth of that of R1.
Overall, these results show that R1 linearised lanes all show similar first cycle intensity (lanes 1, 3, 5 and 7), while the R2 linearised lanes (lanes 2, 4, 6 and 8) show decreasing intensity due to the increasing concentration of P7-non-lin within the grafting mix. Lanes 3 and 4 in particular show an almost 2:1 ratio of first cycle intensities as desired, but the flowcell in general shows how the ratio is completely tunable as required.
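The nominal share of linearisable P7 primers in each grafting mix follows directly from the two P7 concentrations used in the examples above. The sketch below computes that share; the function name `linearisable_fraction` and the `mixes` table layout are assumptions made for illustration. The result is only the composition of the grafting mix: the R2 intensities reported for Figure 14 (roughly a half, a fifth and a tenth of R1) do not track it linearly, so in practice the mix would presumably need to be calibrated against measured intensity.

```python
def linearisable_fraction(cleavable_conc, non_cleavable_conc):
    """Share of P7 primers in the grafting mix that can be linearised.

    Both concentrations must be given in the same units.
    """
    total = cleavable_conc + non_cleavable_conc
    if total <= 0:
        raise ValueError("total P7 concentration must be positive")
    return cleavable_conc / total


# (P7-8oxoG, P7-non-lin) grafting concentrations from the examples, in identical units.
mixes = {
    "lanes 1-2": (10.0, 0.0),
    "lanes 3-4": (7.5, 2.5),
    "lanes 5-6": (5.0, 5.0),
    "lanes 7-8": (2.5, 7.5),
}

fractions = {lanes: linearisable_fraction(*conc) for lanes, conc in mixes.items()}

for lanes, frac in fractions.items():
    print(f"{lanes}: {frac:.2f} of grafted P7 primers are linearisable")
```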
SEQUENCE LISTING
SEQ ID NO. 1: P5 sequence
AATGATACGGCGACCACCGAGATCTACAC
SEQ ID NO. 2: P7 sequence
CAAGCAGAAGACGGCATACGAGAT
SEQ ID NO. 3: P5’ sequence (complementary to P5)
GTGTAGATCTCGGTGGTCGCCGTATCATT
SEQ ID NO. 4: P7’ sequence (complementary to P7)
ATCTCGTATGCCGTCTTCTGCTTG
SEQ ID NO. 5: Alternative P5 sequence
AATGATACGGCGACCGA
SEQ ID NO. 6: Alternative P5’ sequence (complementary to alternative P5 sequence)
TCGGTCGCCGTATCATT
SEQ ID NO. 7: SBS3
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
SEQ ID NO. 8: SBS3’
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
SEQ ID NO. 9: SBS12
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
SEQ ID NO. 10: SBS12’
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
SEQ ID NO. 11: BCN-P5-U
5’BCN-TTTTTTAATGATACGGCGACCACCGAGAUCTACAC
BCN = bicyclononyne
SEQ ID NO. 12: BCN-P7-8oxoG
5’BCN-TTTTTTCAAGCAGAAGACGGCATAC(8-oxo-G)AGAT
BCN = bicyclononyne
SEQ ID NO. 13: BCN-P7-non-lin
5’BCN-TTTTTTCAAGCAGAAGACGGCATACGAGAT
BCN = bicyclononyne
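The listing describes SEQ ID NO. 3, 4 and 6 as complementary to SEQ ID NO. 1, 2 and 5, and the primed SBS sequences pair with their unprimed counterparts in the same way. A minimal sketch for checking this, assuming "complementary" means the reverse complement written 5' to 3' (the helper name `reverse_complement` and the `PAIRS` table are introduced here for illustration):

```python
# Watson-Crick complement table for unmodified DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# (sequence, listed complement) pairs copied from SEQ ID NO. 1 to 10.
PAIRS = {
    "P5 / P5'": ("AATGATACGGCGACCACCGAGATCTACAC", "GTGTAGATCTCGGTGGTCGCCGTATCATT"),
    "P7 / P7'": ("CAAGCAGAAGACGGCATACGAGAT", "ATCTCGTATGCCGTCTTCTGCTTG"),
    "alt P5 / alt P5'": ("AATGATACGGCGACCGA", "TCGGTCGCCGTATCATT"),
    "SBS3 / SBS3'": ("ACACTCTTTCCCTACACGACGCTCTTCCGATCT", "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"),
    "SBS12 / SBS12'": ("GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT", "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"),
}

# Names of any pairs whose listed complement is not the reverse complement.
mismatches = [name for name, (fwd, rev) in PAIRS.items() if reverse_complement(fwd) != rev]
print("inconsistent pairs:", mismatches)
```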

Claims

CLAIMS:
1. A solid support, comprising: a plurality of first immobilised primers, wherein a proportion of the first immobilised primers are configured to be cleavable under first cleavage conditions; a plurality of second immobilised primers, wherein a proportion or substantially all of the second immobilised primers are configured to be cleavable under second cleavage conditions; and wherein the proportion of the first immobilised primers configured to be cleavable under first cleavage conditions is less than the proportion of the second immobilised primers configured to be cleavable under second cleavage conditions.
2. A solid support according to claim 1, wherein the solid support comprises at least one well, and wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within the well.
3. A solid support according to claim 1 or claim 2, wherein the solid support comprises a plurality of wells, wherein the plurality of first immobilised primers and the plurality of second immobilised primers are located within each of the wells.
4. A solid support according to any one of claims 1 to 3, wherein the proportion of first immobilised primers are configured to be cleavable by a thermal trigger, a light trigger, and/or a chemical/biochemical trigger.
5. A solid support according to any one of claims 1 to 4, wherein the proportion of first immobilised primers are configured to be cleavable by a glycosylase.
6. A solid support according to claim 5, wherein the proportion of first immobilised primers are configured to be cleavable by a uracil glycosylase or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
7. A solid support according to claim 6, wherein the proportion of first immobilised primers are configured to be cleavable by an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
8. A solid support according to any one of claims 1 to 7, wherein each first immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the first immobilised primer is a DNA sequence; or wherein each first immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the first immobilised primer is an RNA sequence.
9. A solid support according to claim 8, wherein each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) or uracil when the first immobilised primer is a DNA sequence, or wherein each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is an RNA sequence.
10. A solid support according to claim 9, wherein each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence.
11. A solid support according to any one of claims 1 to 10, wherein the proportion of second immobilised primers are configured to be cleavable by a thermal trigger, a light trigger, or a chemical/biochemical trigger.
12. A solid support according to any one of claims 1 to 11, wherein the proportion of second immobilised primers are configured to be cleavable by a glycosylase.
13. A solid support according to claim 12, wherein the proportion of second immobilised primers are configured to be cleavable by a uracil glycosylase or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
14. A solid support according to claim 13, wherein the proportion of second immobilised primers are configured to be cleavable by a uracil glycosylase.
15. A solid support according to any one of claims 1 to 14, wherein each second immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or thymine when the second immobilised primer is a DNA sequence; or wherein each second immobilised primer that is cleavable comprises a nucleobase which is not selected from guanine, cytosine, adenine or uracil when the second immobilised primer is an RNA sequence.
16. A solid support according to claim 15, wherein each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) or uracil when the second immobilised primer is a DNA sequence, or wherein each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the second immobilised primer is an RNA sequence.
17. A solid support according to claim 16, wherein each second immobilised primer that is cleavable comprises uracil when the second immobilised primer is a DNA sequence.
18. A solid support according to any one of claims 1 to 6, 8, 9, 11 to 13, 15 or 16, wherein each first immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable comprises uracil when the second immobilised primer is a DNA sequence; or wherein each first immobilised primer that is cleavable comprises uracil when the first immobilised primer is a DNA sequence, and wherein each second immobilised primer that is cleavable comprises oxoguanine (e.g. 8-oxoguanine) when the second immobilised primer is a DNA sequence.
19. A solid support according to any one of claims 1 to 18, wherein the first cleavage conditions and the second cleavage conditions are the same or are different.
20. A solid support according to claim 19, wherein the first cleavage conditions and the second cleavage conditions are different.
21. A solid support according to any one of claims 1 to 20, wherein a ratio between the first immobilised primers configured to be cleavable under first cleavage conditions and the second immobilised primers configured to be cleavable under second cleavage conditions is between 1:1.25 to 1:5.
22. A solid support according to claim 21, wherein the ratio is between 1:1.5 to 1:3.
23. A solid support according to claim 22, wherein the ratio is about 1:2.
24. A solid support according to any one of claims 1 to 23, wherein the proportion of first immobilised primers cleavable under first cleavage conditions relative to a proportion of first immobilised primers which are not cleavable under first cleavage conditions is between 20:80 to 80:20.
25. A solid support according to claim 24, wherein the proportion is between 1:3 to 3:1.
26. A solid support according to claim 25, wherein the proportion is between 1:2 to 2:1.
27. A solid support according to claim 26, wherein the proportion is about 1:1.
28. A solid support according to any one of claims 1 to 27, wherein the proportion of first immobilised primers cleavable under first cleavage conditions relative to a total population of first immobilised primers is between 0.2 to 0.8.
29. A solid support according to claim 28, wherein the proportion is between 0.25 to 0.75.
30. A solid support according to claim 29, wherein the proportion is between ⅓ to ⅔.
31. A solid support according to claim 30, wherein the proportion is about 0.5.
32. A solid support according to any one of claims 1 to 31, wherein the proportion of second immobilised primers cleavable under second cleavage conditions relative to a total population of second immobilised primers is 0.9 or more.
33. A solid support according to claim 32, wherein the proportion is 0.95 or more.
34. A solid support according to claim 33, wherein substantially all of the second immobilised primers are cleavable under second cleavage conditions.
35. A solid support according to any one of claims 1 to 34, wherein each first immobilised primer comprises a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof; and each second immobilised primer comprises a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; or wherein each first immobilised primer comprises a sequence as defined in SEQ ID NO. 2, or a variant or fragment thereof; and each second immobilised primer comprises a sequence as defined in SEQ ID NO. 1 or 5, or a variant or fragment thereof.
36. A solid support according to any one of claims 1 to 35, wherein a surface of the solid support comprises at least one first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group.
37. A solid support according to claim 36, wherein the surface of the solid support comprises at least one first linking group capable of forming non-covalent interactions or covalent bonds with the second linking group.
38. A solid support according to claim 37, wherein the first linking group is capable of forming non-covalent interactions, and the first linking group comprises a biotin moiety or an avidin (e.g. streptavidin).
39. A solid support according to claim 38, wherein the first linking group comprises an avidin (e.g. streptavidin).
40. A solid support according to claim 37, wherein the first linking group is capable of forming covalent bonds, and the first linking group comprises a hydroxyl group, an alkyne (e.g. a terminal alkyne) or an azide (e.g. poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide), PAZAM).
41. A solid support according to claim 40, wherein the first linking group comprises an azide (e.g. poly(N-(5-azidoacetamidylpentyl)acrylamide-co-acrylamide), PAZAM).
42. A solid support according to any one of claims 1 to 41, wherein the first immobilised primer and/or the second immobilised primer comprises a first linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with a second linking group.
43. A solid support according to claim 42, wherein the first linking group is capable of forming covalent bonds with the second linking group.
44. A solid support according to claim 43, wherein the first linking group is capable of forming covalent bonds, and the first linking group comprises an alkene moiety (e.g. an electron-deficient alkene such as 3-cyanovinylcarbazole).
45. A solid support according to any one of claims 1 to 44, wherein the solid support is a flow cell.
46. A kit comprising a solid support according to any one of claims 1 to 45.
47. Use of a solid support according to any one of claims 1 to 45 in nucleic acid sequencing.
48. A process of manufacturing a solid support, comprising:
(a) immobilising a plurality of first precursor primers onto a solid support to form a plurality of first immobilised primers, wherein a proportion of the first precursor primers are configured to be cleavable under first cleavage conditions; and
(b) immobilising a plurality of second precursor primers onto the solid support to form a plurality of second immobilised primers, wherein a proportion or substantially all of the second precursor primers are configured to be cleavable under second cleavage conditions; wherein the proportion of the first precursor primers configured to be cleavable under first cleavage conditions is less than the proportion of the second precursor primers configured to be cleavable under second cleavage conditions.
49. A process according to claim 48, wherein steps (a) and (b) are conducted sequentially or simultaneously.
50. A process according to claim 49, wherein step (b) is conducted after step (a).
51. A process according to claim 49, wherein step (a) is conducted after step (b).
52. A process according to claim 49, wherein steps (a) and (b) are conducted simultaneously.
53. A process according to any one of claims 48 to 52, wherein a ratio of a concentration of first precursor primers that are configured to be cleavable under first cleavage conditions used in step (a) compared to a concentration of second precursor primers that are configured to be cleavable under second cleavage conditions used in step (b) is between 0.2 to 0.8.
54. A process according to claim 53, wherein the ratio is between 0.25 to 0.75.
55. A process according to claim 54, wherein the ratio is between 0.5 to 0.75.
56. A process according to any one of claims 48 to 55, wherein immobilisation comprises forming covalent linkages between the solid support and each of the plurality of first precursor primers, and between the solid support and each of the plurality of second precursor primers.
57. A process according to claim 56, wherein forming covalent linkages involves using a click reaction.
58. A process according to claim 56 or claim 57, wherein forming covalent linkages involves forming a 1,2,3-triazole linkage.
59. A method of preparing polynucleotide sequences for identification, comprising: providing a solid support according to any one of claims 1 to 45, and synthesising a plurality of first polynucleotide sequences each comprising first portions and each extending from the first immobilised primers, and a plurality of second polynucleotide sequences each comprising second portions and each extending from the second immobilised primers.
60. A method according to claim 59, further comprising a step of exposing the solid support to first cleavage conditions and/or second cleavage conditions after the step of synthesising the plurality of first polynucleotide sequences and the second polynucleotide sequences.
61. A method according to claim 60, wherein the first cleavage conditions and/or second cleavage conditions comprise exposure to a thermal trigger, a light trigger, and/or a chemical/biochemical trigger.
62. A method according to claim 61, wherein the solid support is exposed to a glycosylase.
63. A method according to claim 62, wherein the solid support is exposed to a uracil glycosylase and/or an oxoguanine glycosylase (e.g. 8-oxoguanine glycosylase).
64. A method according to any one of claims 59 to 63, further comprising a step of blocking 3’-ends of the first polynucleotide sequences and the second polynucleotide sequences to prevent further extension of the first polynucleotide sequences and the second polynucleotide sequences.
65. A method according to any one of claims 59 to 64, further comprising a step of removing first immobilised primers and/or second immobilised primers that are not extended.
66. A method according to any one of claims 59 to 65, wherein the solid support is a solid support according to any one of claims 36 to 41, wherein the first polynucleotide sequence and/or the second polynucleotide sequence comprises a second linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with the first linking group, and the method further comprises a step of forming the non-covalent interaction, covalent bond and/or metal-coordination bond between the first linking group and the second linking group.
67. A method according to claim 66, wherein the second linking group is capable of forming non-covalent interactions, and the second linking group comprises an avidin (e.g. streptavidin) or a biotin moiety.
68. A method according to claim 67, wherein the second linking group comprises a biotin moiety.
69. A method according to claim 66, wherein the second linking group is capable of forming covalent bonds, and the second linking group comprises a thiophosphate, an azide or an alkyne (e.g. a terminal alkyne).
70. A method according to claim 67 or claim 68, wherein the non-covalent interaction comprises an avidin-biotin interaction (e.g. a streptavidin-biotin interaction).
71. A method according to claim 69, wherein the covalent bond comprises a thiophosphate ester linkage, or a triazole linkage.
72. A method according to any one of claims 59 to 71, wherein the solid support is a solid support according to any one of claims 42 to 44, wherein the first polynucleotide sequence and/or the second polynucleotide sequence comprises a second linking group capable of forming non-covalent interactions, covalent bonds, or metal-coordination bonds with the first linking group, and the method further comprises a step of forming the non-covalent interaction, covalent bond and/or metal-coordination bond between the first linking group and the second linking group.
73. A method according to claim 72, wherein the second linking group is capable of forming covalent bonds, and the second linking group comprises an alkene moiety (e.g. a pyrimidine nucleobase, such as a thymine, uracil or cytosine nucleobase).
74. A method according to claim 73, wherein the covalent bond comprises a cyclobutane linkage.
75. A method according to any one of claims 59 to 74, wherein the method further comprises a step of linearising the first polynucleotide sequence and the second polynucleotide sequence.
76. A method according to claim 75, wherein the method further comprises treating the linearised first polynucleotide sequence and the linearised second polynucleotide sequence with a single-stranded binding protein.
77. A method of sequencing polynucleotide sequences, comprising: preparing polynucleotide sequences for identification using a method according to any one of claims 59 to 76; and concurrently sequencing nucleobases in the first portion and the second portion.
78. A method according to claim 77, wherein the step of concurrently sequencing nucleobases comprises performing sequencing-by-synthesis or sequencing-by-ligation.
79. A method according to claim 77 or claim 78, wherein the step of concurrently sequencing nucleobases comprises treatment with a strand displacement polymerase (e.g. phi29).
80. A method according to claim 77 or claim 78, wherein the step of concurrently sequencing nucleobases comprises treatment with a 5’-3’ exonuclease.
81. A method according to any one of claims 77 to 80, wherein the method further comprises a step of conducting paired-end reads.
82. A kit comprising instructions for preparing polynucleotide sequences for identification according to any one of claims 59 to 76; and/or polynucleotide sequences according to any one of claims 77 to 81.
83. A data processing device comprising means for carrying out a method according to any one of claims 59 to 81.
84. A data processing device according to claim 83, wherein the data processing device is a polynucleotide sequencer.
85. A computer program product comprising instructions which, when the program is executed by a processor, cause the processor to carry out a method according to any one of claims 59 to 81.
86. A computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out a method according to any one of claims 59 to 81.
87. A computer-readable data carrier having stored thereon a computer program product according to claim 85.
88. A data carrier signal carrying a computer program product according to claim 85.
89. A method of preparing a first and second polynucleotide strand for concurrent sequencing, wherein the method comprises:
(a) measuring a first signal intensity generated from a first polynucleotide strand; and
(b) measuring a second signal intensity generated from a second polynucleotide strand; and based on the first signal intensity and the second signal intensity, a subsequent signal intensity generated from sequencing the first nucleotide strand can be attenuated, wherein the attenuated subsequent signal intensity generated from sequencing the first polynucleotide strand has a lower intensity than a subsequent signal intensity generated from sequencing the second polynucleotide strand.
90. The method of claim 89, wherein measurement of the intensity of the first signal and the second signal is carried out sequentially.
91. The method of claim 89 or 90, wherein the method comprises applying a first sequencing primer that binds to the first nucleotide strand, wherein the first sequencing primer is labelled, obtaining a first signal from the label and determining the intensity of the first signal.
92. The method of claim 89 or 90, wherein the method comprises applying a first sequencing primer that binds to the first nucleotide strand, conducting an extension reaction using one or more labelled nucleotides, obtaining a first signal from the one or more labelled nucleotides and determining the intensity of the first signal.
93. The method of claim 91 wherein the method comprises washing off the first sequencing primer.
94. The method of claim 92, wherein the method comprises washing off the first sequencing primer and the labelled nucleotides.
95. The method of any of claims 89 to 94, wherein the method comprises applying a second sequencing primer that binds to the second nucleotide strand, wherein the second sequencing primer is labelled, obtaining a second signal from the label and determining the intensity of the second signal.
96. The method of claim 95, wherein the label is cleavable, and wherein the method comprises cleaving the label after determining the intensity of the second signal.
97. The method of any of claims 89 to 94, wherein the method comprises applying a second sequencing primer that binds to the second nucleotide strand, conducting an extension reaction using one or more labelled nucleotides, obtaining a second signal from the one or more labelled nucleotides and determining the intensity of the second signal.
98. The method of any of claims 89 to 97, wherein the subsequent signal generated from sequencing the first polynucleotide strand is attenuated using a mixture of terminated and non-terminated first sequencing primers.
99. The method of claim 98, wherein the terminated primers comprise at least one modification that prevents extension (i.e. elongation) of the primer by a polymerase.
100. The method of claim 99, wherein the modification is selected from a blocking group such as a hairpin loop, a deoxynucleotide, a deoxyribonucleotide, a hydrogen atom instead of a 3’-OH group, a phosphate group, a phosphorothioate group, a propyl spacer (e.g. -O-(CH2)3-OH instead of a 3’-OH group), a modification blocking the 3’-hydroxyl group (e.g. hydroxyl protecting groups, such as silyl ether groups (e.g. trimethylsilyl, triethylsilyl, triisopropylsilyl, t-butyl(dimethyl)silyl, t-butyl(diphenyl)silyl), ether groups (e.g. benzyl, allyl, t-butyl, methoxymethyl (MOM), 2-methoxyethoxymethyl (MEM), tetrahydropyranyl), or acyl groups (e.g. acetyl, benzoyl)), or an inverted nucleobase.
101. The method of any of claims 98 to 100, wherein a ratio of terminated:non-terminated first sequencing primers is calculated based on the first signal intensity and the second signal intensity.
102. The method of claim 101, wherein the ratio of terminated:non-terminated first sequencing primers is calculated using the following formula:
tR1p / R1p = (X × R1si − R2si) / R2si (Equation 2)
wherein:
X is the desired ratio of signal intensities, wherein preferably X is 2
tR1p is terminated R1 primer
R1p is Read 1 sequencing primer (non-terminated)
R1si is a first signal intensity generated from a first nucleotide sequence
R2si is a second signal intensity generated from a second nucleotide sequence.
103. The method of any of claims 101 to 102, wherein the ratio of the intensity of the attenuated subsequent signal generated from sequencing the first polynucleotide strand to the intensity of the subsequent signal generated from sequencing the second polynucleotide strand is or is around 2:1.
104. The method of any of claims 89 to 103, wherein the method improves the signal-to-noise ratio in nucleic acid sequencing.
105. A method of concurrently sequencing a first and second polynucleotide sequence, wherein the method comprises: preparing first and second polynucleotide sequences for sequencing using a method according to any one of claims 89 to 104; attenuating the first signal intensity to generate an adjusted first signal intensity, wherein the adjusted first signal intensity has lower intensity than the second signal intensity; and sequencing nucleobases in the first polynucleotide sequence and the second polynucleotide sequence.
106. A method according to claim 105, wherein the attenuating comprises applying a mixture of terminated and non-terminated first sequencing primers before sequencing nucleobases in the first polynucleotide sequence and the second polynucleotide sequence.
107. A method according to any one of claims 105 to 106, wherein the step of concurrently sequencing nucleobases comprises performing sequencing-by-synthesis or sequencing-by-ligation.
108. A sequencing kit comprising a plurality of first and second sequencing primers, wherein the first sequencing primers comprise both terminated and non-terminated primers.
109. The sequencing kit of claim 108, wherein the sequencing kit further comprises labelled first and second sequencing primers.
110. The sequencing kit of claim 109, wherein the labelled second sequencing primers comprise a cleavable site linking the label and the sequencing primer.
111. The sequencing kit of any of claims 108 to 110 further comprising instructions for use.
112. A data processing device comprising means for carrying out a method according to any one of claims 89 to 107.
113. A data processing device according to claim 112, wherein the data processing device is a polynucleotide sequencer.
114. A computer program product comprising instructions which, when the program is executed by a processor, cause the processor to carry out a method according to any one of claims 89 to 107.
115. A computer-readable storage medium comprising instructions which, when executed by a processor, cause the processor to carry out a method according to any one of claims 89 to 107.
116. A computer-readable data carrier having stored thereon a computer program product according to claim 114.
117. A data carrier signal carrying a computer program product according to claim 114.
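Illustrative note (not part of the claims): claims 103, 105 and 106 describe adjusting the first signal intensity relative to the second by applying a mixture of terminated and non-terminated first sequencing primers. The arithmetic can be sketched in Python as below. The sketch assumes, as a simplification that the claims do not state, that the first signal scales linearly with the fraction of non-terminated first primers. The function name and its parameters are hypothetical and chosen only for illustration.

```python
def non_terminated_fraction(r1_si: float, r2_si: float, target_ratio: float) -> float:
    """Return the fraction of non-terminated first sequencing primers needed so
    that (attenuated first signal) / (second signal) equals target_ratio.

    Assumption (not from the claims): the attenuated first signal equals
    r1_si multiplied by the fraction of non-terminated primers.
    """
    if r1_si <= 0 or r2_si <= 0 or target_ratio <= 0:
        raise ValueError("intensities and target ratio must be positive")
    fraction = target_ratio * r2_si / r1_si
    if fraction > 1.0:
        # Terminated primers can only lower the first signal, never raise it.
        raise ValueError("target ratio cannot be reached by attenuation alone")
    return fraction


# Hypothetical numbers: an unattenuated first signal four times the second
# signal, attenuated to the roughly 2:1 ratio mentioned in claim 103.
fraction = non_terminated_fraction(r1_si=4.0, r2_si=1.0, target_ratio=2.0)
attenuated_r1 = 4.0 * fraction  # 2.0 under the linear assumption
```

Under the same assumption, a target ratio below 1 gives an adjusted first signal that is lower than the second signal, which is the situation recited in claim 105.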
PCT/EP2024/076524 2023-09-20 2024-09-20 Optimised nucleic acid sequencing Pending WO2025062001A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202363584129P 2023-09-20 2023-09-20
US202363583981P 2023-09-20 2023-09-20
US63/583,981 2023-09-20
US63/584,129 2023-09-20

Publications (1)

Publication Number Publication Date
WO2025062001A1 (en) 2025-03-27

Family

ID=92900098

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2024/076524 Pending WO2025062001A1 (en) 2023-09-20 2024-09-20 Optimised nucleic acid sequencing

Country Status (1)

Country Link
WO (1) WO2025062001A1 (en)

Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998044152A1 (en) 1997-04-01 1998-10-08 Glaxo Group Limited Method of nucleic acid sequencing
WO1998044151A1 (en) 1997-04-01 1998-10-08 Glaxo Group Limited Method of nucleic acid amplification
WO2000018957A1 (en) 1998-09-30 2000-04-06 Applied Research Systems Ars Holding N.V. Methods of nucleic acid amplification and sequencing
US6306597B1 (en) 1995-04-17 2001-10-23 Lynx Therapeutics, Inc. DNA sequencing by parallel oligonucleotide extensions
WO2002006456A1 (en) 2000-07-13 2002-01-24 Invitrogen Corporation Methods and compositions for rapid protein and peptide extraction and isolation using a lysis matrix
WO2003074734A2 (en) 2002-03-05 2003-09-12 Solexa Ltd. Methods for detecting genome-wide sequence variations associated with a phenotype
WO2005068656A1 (en) 2004-01-12 2005-07-28 Solexa Limited Nucleic acid characterisation
US20060024681A1 (en) 2003-10-31 2006-02-02 Agencourt Bioscience Corporation Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof
WO2006084132A2 (en) 2005-02-01 2006-08-10 Agencourt Bioscience Corp. Reagents, methods, and libraries for bead-based sequencing
WO2006110855A2 (en) 2005-04-12 2006-10-19 454 Life Sciences Corporation Methods for determining sequence variants using ultra-deep sequencing
WO2006135342A1 (en) 2005-06-14 2006-12-21 Agency For Science, Technology And Research Method of processing and/or genome mapping of ditag sequences
US20060292611A1 (en) 2005-06-06 2006-12-28 Jan Berka Paired end sequencing
WO2007010252A1 (en) 2005-07-20 2007-01-25 Solexa Limited Method for sequencing a polynucleotide template
WO2007052006A1 (en) 2005-11-01 2007-05-10 Solexa Limited Method of preparing libraries of template polynucleotides
WO2007091077A1 (en) 2006-02-08 2007-08-16 Solexa Limited Method for sequencing a polynucleotide template
WO2007107710A1 (en) 2006-03-17 2007-09-27 Solexa Limited Isothermal methods for creating clonal single molecule arrays
WO2008041002A2 (en) 2006-10-06 2008-04-10 Illumina Cambridge Limited Method for sequencing a polynucleotide template
WO2008093098A2 (en) 2007-02-02 2008-08-07 Illumina Cambridge Limited Methods for indexing samples and sequencing multiple nucleotide templates
WO2010048605A1 (en) 2008-10-24 2010-04-29 Epicentre Technologies Corporation Transposon end compositions and methods for modifying nucleic acids
US20120301925A1 (en) 2011-05-23 2012-11-29 Alexander S Belyaev Methods and compositions for dna fragmentation and tagging by transposases
US20120316086A1 (en) 2011-06-09 2012-12-13 Illumina, Inc. Patterned flow-cells useful for nucleic acid analysis
US20130079232A1 (en) 2011-09-23 2013-03-28 Illumina, Inc. Methods and compositions for nucleic acid sequencing
US20130143774A1 (en) 2011-12-05 2013-06-06 The Regents Of The University Of California Methods and compositions for generating polynucleic acid fragments
WO2013188582A1 (en) 2012-06-15 2013-12-19 Illumina, Inc. Kinetic exclusion amplification of nucleic acid libraries
US9347092B2 (en) * 2009-02-25 2016-05-24 Roche Molecular System, Inc. Solid support for high-throughput nucleic acid analysis
WO2016189331A1 (en) 2015-05-28 2016-12-01 Illumina Cambridge Limited Surface-based tagmentation
US9670529B2 (en) * 2012-02-28 2017-06-06 Population Genetics Technologies Ltd. Method for attaching a counter sequence to a nucleic acid sample
US20180037950A1 (en) * 2014-11-11 2018-02-08 Illumina Cambridge Limited Methods and arrays for producing and sequencing monoclonal clusters of nucleic acid
WO2019028470A2 (en) * 2017-08-04 2019-02-07 Billiontoone, Inc. Sequencing output determination and analysis with target-associated molecules in quantification associated with biological targets
US20190212294A1 (en) 2018-01-08 2019-07-11 Illumina, Inc. High-Throughput Sequencing with Semiconductor-Based Detection
WO2019222264A1 (en) * 2018-05-15 2019-11-21 Illumina, Inc. Compositions and methods for chemical cleavage and deprotection of surface-bound oligonucleotides
WO2022087150A2 (en) 2020-10-21 2022-04-28 Illumina, Inc. Sequencing templates comprising multiple inserts and compositions and methods for improving sequencing throughput
CN114846154A (en) * 2019-12-23 2022-08-02 深圳华大智造科技股份有限公司 Controlled strand displacement for paired-end sequencing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIGUCHI ET AL., NUCLEIC ACIDS RES., vol. 16, 1988, pages 7351 - 7367
SAMBROOK ET AL.: "Molecular Cloning, A Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 24776869; Country of ref document: EP; Kind code of ref document: A1