HK1116222A

HK1116222A - Method for improving the characterisation of a polynucleotide sequence

Info

Publication number: HK1116222A
Application number: HK08106448.7A
Authority: HK
Inventors: Preben Lexow
Original assignee: Lingvitae As
Priority date: 2005-03-01
Filing date: 2006-03-01
Publication date: 2008-12-19

Description

Method for improving the characterisation of a polynucleotide sequence

Technical Field

The present invention relates to a method for improving the accuracy of polynucleotide sequence characterization.

Background

Advances in molecular research have been driven to some extent by improvements in the techniques used to characterize molecules or their biological responses. In particular, the study of nucleic acids DNA and RNA has benefited from technological advances in sequence analysis and hybridization reaction studies.

The principal method in general use for large-scale DNA sequencing is the chain termination method. This method was first developed by Sanger and Coulson (Sanger et al., Proc. Natl. Acad. Sci. USA, 1977; 74: 5463-) and relies on the use of dideoxy derivatives of the four nucleoside triphosphates, which are incorporated into the nascent polynucleotide chain in a polymerase reaction. Once incorporated, the dideoxy derivative terminates the polymerase reaction, and the products are then separated by gel electrophoresis and analysed to reveal the position at which the particular dideoxy derivative was incorporated into the chain.

Although the method is widely used and reliable results are obtained, it is recognized that it is slow, labor intensive and expensive.

US-A-5302509 discloses a method of sequencing polynucleotides immobilized on a solid support. The method relies on incorporating 3'-blocked bases A, G, C and T, each carrying a different fluorescent label, into the immobilized polynucleotide in the presence of a DNA polymerase. The polymerase incorporates a base complementary to the target polynucleotide, but is prevented from further addition by the 3'-blocking group. The label of the incorporated base is then determined and the blocking group is removed by chemical cleavage, allowing further polymerization to occur. However, the need to remove the blocking group makes the process time-consuming, and the removal must be carried out efficiently.

WO-A-00/39333 describes a method for sequencing a polynucleotide by converting the sequence of a target polynucleotide into a second polynucleotide which contains defined sequence and positional information. The sequence information of the target is said to be "magnified" in the second polynucleotide, making it easier to distinguish the individual bases of the target molecule. This is achieved by using "magnifying tags", which are predetermined units of nucleic acid sequence. Each base on the target molecule, i.e. adenine, cytosine, guanine and thymine, is represented by its own magnifying tag, so that the original target sequence is converted into a magnified sequence. The order of the magnifying tags is then determined using conventional techniques, and the specific sequence of the target polynucleotide is deduced from it.

In a preferred sequencing method, each magnifying tag comprises a label, for example a fluorescent label, which can then be identified and used to characterise the magnifying tag.

WO-A-04/094664 describes an improvement of the conversion process disclosed in WO-A-00/39333. In both methods, it is preferred that each magnifying tag comprises two units having a unique sequence that can be used as a binary system, where one unit represents a "0" and the other represents a "1". Each base on the target is characterized by a combination of these two units, e.g., adenine can be represented by "0" + "0", cytosine by "0" + "1", guanine by "1" + "0", and thymine by "1" + "1".

As with all sequencing methods, maintaining high accuracy is very important for a successful sequencing reaction. There is therefore a long-felt need to achieve the highest possible accuracy in any sequencing reaction.

Disclosure of Invention

The present invention provides a method of improving the accuracy of sequencing reactions, in particular those involving the use of binary signals, such as those described in WO-A-00/39333 and WO-A-04/094664 (both incorporated herein by reference), or those involving base-to-base signals, such as ligation proximity assays. The invention is based on the recognition that, when a sequencing reaction involves converting a target molecule, such as a polynucleotide, into a polynucleotide comprising units of unique sequence information, the accuracy of the sequence data obtained can be improved by incorporating into that polynucleotide a defined sequence that serves as an internal control, so that sequencing errors can be detected. These control sequences do not directly represent the sequence of the target polynucleotide.

According to a first aspect of the invention, a method of identifying at least one characteristic of a target molecule comprises the steps of:

(i) converting said at least one characteristic into a signal polynucleotide sequence; and

(ii) Identifying the signal polynucleotide sequences, thereby identifying the at least one characteristic of the target molecule, wherein each signal polynucleotide sequence comprises at least one control sequence that defines a characteristic of the signal polynucleotide sequence, and wherein identification of the control sequences confirms whether the signal polynucleotide sequences have been correctly identified, and optionally, if the identification is incorrect, the identification provides the necessary information to determine what the correct signal polynucleotide sequence should be.

According to a second aspect of the invention, a method of sequencing a target polynucleotide comprises the steps of:

(i) converting at least one base of the target polynucleotide into a signal sequence; and

(ii) Identifying the signal sequences, and thereby the sequence of the target polynucleotide, wherein each signal sequence comprises at least one control sequence defining a feature of the signal sequence, and wherein identification of the control sequences confirms whether the signal sequence has been correctly identified, and optionally, if the identification is incorrect, provides the necessary information to determine what the correct polynucleotide sequence should be.

Drawings

The invention is described with reference to the accompanying drawings, in which:

FIG. 1A shows a binary signal sequence containing information on three bases in a target polynucleotide and two control bits; and

FIG. 1B shows a method for defining the bit content of a binary signal sequence to which it is linked using a control sequence, wherein bit triplets without a "0" or with two "0" bits are designated with the control bit "0" and bit triplets without a "1" or with two "1" bits are designated with the control bit "1".

Detailed Description

The invention is based on the recognition that: a target molecule can be converted to a defined polynucleotide sequence and the accuracy of the final read step can be assessed by incorporating a control sequence into the polynucleotide sequence formed above and detecting the presence or absence of the control sequence.

The methods of the invention are particularly suited to improve the accuracy of sequencing reactions in which one target polynucleotide is converted to another polynucleotide of defined sequence (referred to herein as a "signal sequence"). The method is based on the recognition that: the addition of a control sequence to the signal sequence provides an internal check of the sequence data obtained and allows identification of potential errors during the read-out step.

With respect to the target molecule, the characteristics of the target molecule may be expressed (converted) into a signal polynucleotide sequence. For example, if the target molecule is a protein, each amino acid monomer can be represented by a particular sequence on the signal polynucleotide sequence. In a preferred embodiment, the target molecule is a polynucleotide and the conversion is achieved by amplifying the polynucleotide. The invention is further described by the use of polynucleotides as target molecules.

The term "polynucleotide" is well known in the art and is used to refer to a series of linked nucleic acid molecules, such as DNA or RNA. Nucleic acid mimetics such as PNA, LNA (locked nucleic acid) and 2'-O-methyl RNA are also within the scope of the invention.

As known in the art, the bases A, T (U), G and C referred to herein are the nucleotide bases adenine, thymine (uracil), guanine and cytosine. Uracil replaces thymine when the polynucleotide is RNA, or when uracil is introduced into DNA using dUTP, as is also well known in the art.

A "signal sequence" is a single- or double-stranded polynucleotide that includes unique nucleotide sequence "units". Each A, T (U), G and C base on the target is represented by a unique predetermined unit or a unique combination of units in the signal sequence. Each unit preferably comprises two or more nucleotide bases, preferably 2 to 50 bases, more preferably 2 to 20 bases, most preferably 4 to 10 bases, for example 6 bases. Each unit includes at least two different bases. The units are designed so that different units can be distinguished in a "read-out" step, e.g. one involving incorporation of nucleotides with a detectable label in a polymerization reaction, or hybridization of complementary oligonucleotides. Sequencing methods in which a target is converted to another polynucleotide "signal sequence" are well known in the art, for example as described in WO-A-00/39333 and WO-A-04/094664.

In a preferred embodiment of these sequencing techniques, each base of the target polynucleotide is represented by two distinct sequence units in the signal sequence. According to this embodiment, the two units can be used as a binary system, where one unit represents a "0" and the other unit represents a "1", so that each base on the target can be represented by a 2-bit binary code. Each "0" or "1" is referred to herein as a "sequence bit". Each base on the target is characterized by the combination of the two bits. For example, adenine may be represented by "0" + "0", cytosine by "0" + "1", guanine by "1" + "0", and thymine by "1" + "1". The units need to be distinguishable from one another, so a "stop" signal can be incorporated into each unit. It may also be preferable to use different units to represent "1" and "0" depending on whether the base on the target (template) polynucleotide is at an odd or an even position.

Examples are given below:

Template sequences for odd positions:

"0": TTTTTTA(CCC)

"1": TTTTTTG(CCC)

Template sequences for even positions:

"0": CCCCCCA(TTT)

"1": CCCCCCG(TTT)

In this example, the base immediately before the parentheses (A or G, underlined in the original) is the target for the labelled nucleotide in the polymerase reaction, the bases in parentheses serve as the stop signal, and the remaining bases provide the spacing between labels.

Suitable signal sequences are also described in WO-A-00/39333.

Thus, the binary method involves combining sequence "bits" to form a signal sequence. The methods of the invention incorporate a "control bit" into the signal sequence. The term "control bit" as used herein refers to a predetermined sequence unit that defines the bit sequence of the part of the signal sequence adjacent to it; each control bit provides a summary of the sequence adjacent to it. In the read-out step, the information contained in the control bits is used to verify whether the information read from the adjacent sequence is correct. The terms "control bit" and "control sequence" are used interchangeably.

In its simplest form, each bit in the signal sequence is immediately preceded (or followed) by another, identical bit that serves as a control bit. In this embodiment, each sequence bit in the signal sequence is duplicated by a control bit, thereby providing an internal control and check for the final sequencing of the signal sequence.

In a preferred embodiment, each control bit defines a plurality of sequence bits. Preferably, each control bit defines 2 to 10 sequence bits, more preferably 2 to 5 bits. Most preferably, as shown in FIG. 1A, each control bit defines 3 sequence bits. When each control bit defines three sequence bits, the control bits can define the sequence bits as shown in FIG. 1B. If a triplet of sequence bits contains no "0" bits or two "0" bits, a "0" control bit is attached to the triplet. If a triplet contains no "1" bits or two "1" bits, the control bit "1" is used. In such a system, a change to a single bit in a triplet always requires a change in the control bit, so a misinterpretation of one bit during read-out, for example a "0" read as a "1", can be detected with the control bit (unless two bits in the same triplet are misinterpreted simultaneously). When using the preferred control system shown in FIG. 1B, a control bit of "1" indicates that the preceding triplet must contain either no "1" bits or two "1" bits. If one of the bits is misread, e.g. a "1" is read as a "0", the control bit will highlight the error. The control bit thus defines the number of bits of each type in the triplet of sequence bits attached to it, i.e. the "bit-content". This provides an internal control for the read-out step.
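The FIG. 1B rule described above can be sketched in a few lines of Python. This is a minimal illustration only; the function names `control_bit`, `is_consistent` and `flip` are hypothetical and do not appear in the source.

```python
def control_bit(triplet: str) -> str:
    """Return the control bit for three sequence bits, following the FIG. 1B rule."""
    if len(triplet) != 3 or set(triplet) - {"0", "1"}:
        raise ValueError("expected three characters, each '0' or '1'")
    ones = triplet.count("1")
    # No "1" bits or two "1" bits -> control bit "1".
    # No "0" bits or two "0" bits (i.e. three or one "1" bits) -> control bit "0".
    return "1" if ones % 2 == 0 else "0"


def is_consistent(triplet: str, control: str) -> bool:
    """True if the control bit that was read agrees with the triplet that was read."""
    return control_bit(triplet) == control


def flip(triplet: str, index: int) -> str:
    """Simulate a misread of one sequence bit."""
    swapped = "1" if triplet[index] == "0" else "0"
    return triplet[:index] + swapped + triplet[index + 1:]


if __name__ == "__main__":
    for n in range(8):
        bits = f"{n:03b}"
        print(bits, control_bit(bits))
```

Under this rule a single misread bit always disagrees with the control bit, whereas two misreads in the same triplet cancel out and pass the check, which matches the limitation noted in the text.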

The preferred system borrows the concept of "parity bits" from the field of computing and applies it in the fields of molecular biology and biochemistry. In the preferred embodiment, the control bit functions as a parity bit by defining the bit-content of each triplet (or other number) of sequence bits attached to it. Either "odd" or "even" parity can be used, i.e. a parity (control) bit can define whether the number of a particular sequence bit ("0" or "1") within the region of the signal polynucleotide to which the parity (control) bit is attached is odd or even.

The table below shows the increase in accuracy obtained by using one control bit for every three sequence bits.

Accuracy per base (%)        Accuracy per base (%)
without the control bit      using the control bit

90                           97.57
91                           97.99
92                           98.37
93                           98.73
94                           99.05
95                           99.32
96                           99.56
97                           99.75
98                           99.88
99                           99.97
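The source does not state how the figures in the table were derived. As an observation only, and not the author's stated method, the tabulated values coincide with 1 - 3p²(1 - p)², where p is the accuracy without the control bit; the term 3p²(1 - p)² is half the probability of exactly two misreads among four bits read independently with accuracy p. The sketch below assumes that formula (the function name `accuracy_with_control_bit` is hypothetical) and prints the two columns for comparison.

```python
def accuracy_with_control_bit(p: float) -> float:
    """Assumed formula (not given in the source): 1 - 3 * p^2 * (1 - p)^2."""
    q = 1.0 - p
    return 1.0 - 3.0 * p ** 2 * q ** 2


def table_rows():
    """Yield (accuracy without control bit, accuracy with control bit) as text."""
    for percent in range(90, 100):
        with_control = 100.0 * accuracy_with_control_bit(percent / 100.0)
        yield str(percent), f"{with_control:.2f}"


if __name__ == "__main__":
    for without, with_control in table_rows():
        print(without, with_control)
```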

In a preferred embodiment of the invention, each signal sequence contains binary information, i.e. 6 bits of information, encoding three bases in the target polynucleotide. After every third bit, a control bit is incorporated into the signal sequence, which defines the preceding three bits in the sequence, as shown in FIGS. 1A and 1B. Thus, each signal sequence contains eight bits of information, six of which represent bases in the target polynucleotide and two of which are control bits. In each cycle of "converting" the target polynucleotide to a signal sequence, the information for 3 bases in the target is represented in the signal sequence. To sequence targets of more than three bases using this preferred embodiment, further cycles can be used to add more signal sequences and so form a single strand comprising a defined series of signal sequences, as described in WO-A-00/39333 and WO-A-04/094664.
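As an illustration of this eight-bit layout, the following sketch converts three bases into six sequence bits using the example coding given earlier (A = 00, C = 01, G = 10, T = 11), adds a control bit after each triplet according to the FIG. 1B rule, and checks the control bits when decoding. The function names are hypothetical, and the sketch models only the bit logic, not the biochemical conversion.

```python
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def control_bit(triplet: str) -> str:
    """FIG. 1B rule: zero or two "1" bits -> "1"; one or three "1" bits -> "0"."""
    return "1" if triplet.count("1") % 2 == 0 else "0"


def encode_three_bases(bases: str) -> str:
    """Three bases -> 3 sequence bits + control bit + 3 sequence bits + control bit."""
    if len(bases) != 3:
        raise ValueError("expected exactly three bases")
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    first, second = bits[:3], bits[3:]
    return first + control_bit(first) + second + control_bit(second)


def decode_signal(signal: str) -> str:
    """Return the three bases, or raise ValueError if a control bit disagrees."""
    if len(signal) != 8:
        raise ValueError("expected eight bits")
    first, c1, second, c2 = signal[:3], signal[3], signal[4:7], signal[7]
    if control_bit(first) != c1 or control_bit(second) != c2:
        raise ValueError("control bit mismatch: read-out error detected")
    bits = first + second
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, 6, 2))


if __name__ == "__main__":
    signal = encode_three_bases("ACG")
    print(signal, decode_signal(signal))
```

Because a triplet of sequence bits does not align with the two-bit base codes, each control bit covers one and a half bases; a detected mismatch therefore flags the whole three-base signal sequence rather than a single base.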

In one embodiment, the control bit can have a defined sequence that is characteristic of a particular polynucleotide signal sequence (or part of it). If there is an error in the signal sequence, for example if an incorrect number of bases is detected in the read-out step, the control bit can be identified, and this identification reveals what the correct signal sequence (or part of the signal sequence) should be. Used in this way, the control bits act as an error-correcting sequence, similar to the error correction codes (e.g. Hamming codes) used in computer design. The control bits should therefore be long enough to allow specific characterization of the signal sequence (or part of it). For example, if a portion of the signal sequence corresponds to the nucleotide base A, the control bit should be such that characterizing it establishes that the portion corresponds to A, even if the signal sequence was not read correctly or was formed incorrectly from the original target molecule.
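To make the analogy with error correction codes concrete, the sketch below implements the standard Hamming(7,4) code, in which three check bits allow any single misread bit among seven to be located and corrected. It is offered as an analogy only: the source does not specify this code, and the mapping of four data bits to two bases (for example "CG" as 0110) is an assumption made for illustration.

```python
def hamming74_encode(data: str) -> str:
    """Encode four data bits as seven bits laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = (int(bit) for bit in data)
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return "".join(str(bit) for bit in (p1, p2, d1, p4, d2, d3, d4))


def hamming74_correct(word: str) -> str:
    """Correct at most one flipped bit and return the four data bits."""
    bits = [int(bit) for bit in word]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    position = s1 + 2 * s2 + 4 * s4  # 0 means no error; otherwise 1-based position
    if position:
        bits[position - 1] ^= 1
    return "".join(str(bits[i]) for i in (2, 4, 5, 6))


if __name__ == "__main__":
    word = hamming74_encode("0110")           # e.g. bases C (01) and G (10)
    misread = word[:4] + ("1" if word[4] == "0" else "0") + word[5:]
    print(word, misread, hamming74_correct(misread))
```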

In addition to the control bits present in each signal sequence, the method of the invention can be implemented with additional control bits inserted in defined regions or at defined intervals during construction of the signal sequence. The presence of additional control bits at regular intervals allows the user to confirm that the polynucleotide signal sequence is present in the correct form (sequence) and thus that the conversion and/or read-out steps have taken place correctly. For example, if the target molecule is a polynucleotide and the conversion is carried out in order to sequence the target, additional control bits present at intervals of every 10 bases (on the target) are expected to increase the likelihood of detecting any frameshift. If a frameshift occurs during sequencing of the signal sequence, the additional control bits will not be identified where expected after the corresponding 10-base sequence; this indicates that there is an error somewhere in the sequence after the last additional control bit that was detected. These additional control bits can be inserted after any defined number of bases (or other features) on the target. For example, they may be inserted after every 1 to 10 base conversions. In the example below, the bases A, C, G and T are each represented by a binary sequence followed by a control bit sequence that separates each "converted" base.

A = 00 01

C = 01 01

G = 10 01

T = 11 01

Sequence 01 is a control bit and the sequence can be identified after sequencing each base code. If 01 is not identified in the sequencing of a base, this indicates that the read step misses a sequence and so the sequencing/read step is repeated once.
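The check described above can be sketched as follows, assuming the read-out has already been reduced to a string of "0" and "1" characters. The names `encode`, `decode` and `SEPARATOR` are hypothetical; raising an error stands in for the decision to repeat the read-out step.

```python
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}
SEPARATOR = "01"  # control bit sequence expected after every base code


def encode(sequence: str) -> str:
    """Write each base as its two-bit code followed by the control sequence."""
    return "".join(BASE_TO_BITS[base] + SEPARATOR for base in sequence)


def decode(read: str) -> str:
    """Decode a read, raising ValueError where the control sequence is missing."""
    bases = []
    for start in range(0, len(read), 4):
        word = read[start:start + 4]
        if len(word) != 4 or word[2:] != SEPARATOR:
            raise ValueError(
                f"control bits not found at bit offset {start}; repeat the read-out"
            )
        bases.append(BITS_TO_BASE[word[:2]])
    return "".join(bases)


if __name__ == "__main__":
    signal = encode("GATC")
    print(signal, decode(signal))
    frameshifted = signal[:5] + signal[6:]  # one bit missed during read-out
    try:
        decode(frameshifted)
    except ValueError as error:
        print(error)
```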

In another, separate embodiment, the control bits can be used to ensure that the read-out step is performed correctly when sequencing bases characterized by a run of "0"s or "1"s. A read-out platform often has difficulty resolving a long run of "0"s or "1"s; for example, the read-out step may be able to read three, but not four, consecutive "0"s. It is therefore preferable to ensure that successive "0"s (or "1"s) are broken up. This can be achieved by introducing a redundant control bit sequence into each sequence corresponding to one base, so that only a limited number of "0"s are contiguous. The redundant control bits are removed after sequencing (typically by a computer algorithm) to identify the correct sequence.

For example, taking the binary coding of A, C, G and T described above, redundant control bits can be introduced in the following manner:

A = 01001

C = 01101

G = 11001

T = 10101

The bit at position 2 (underlined in the original) is the redundant control bit. This ensures that no code for a single base contains three or more consecutive "0"s or "1"s, and that no run in a series of such codes is longer than three. The read-out step can then be performed and the redundant control bit removed (it is known to be at position 2). The redundant control bits can be inserted at the correct position by using the appropriate linker molecule, as disclosed in WO-A-04/094664.
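The removal of the redundant bit can be sketched as below. Removing the bit at position 2 from each five-bit code recovers the two-bit base code followed by the "01" control sequence of the earlier example. The helper `longest_run` is a hypothetical addition for checking run lengths: for these four codes the longest run within a single code is two, and the longest run across concatenated codes is three (for example where G follows another base).

```python
CODES = {"A": "01001", "C": "01101", "G": "11001", "T": "10101"}
REDUNDANT_INDEX = 1  # position 2 when counting from 1


def strip_redundant(word: str) -> str:
    """Remove the redundant control bit from one five-bit code."""
    return word[:REDUNDANT_INDEX] + word[REDUNDANT_INDEX + 1:]


def encode(sequence: str) -> str:
    return "".join(CODES[base] for base in sequence)


def decode(read: str) -> str:
    """Split the read into five-bit codes, strip the redundant bit, look up the base."""
    if len(read) % 5:
        raise ValueError("read length is not a multiple of five bits")
    lookup = {strip_redundant(code): base for base, code in CODES.items()}
    return "".join(lookup[strip_redundant(read[i:i + 5])] for i in range(0, len(read), 5))


def longest_run(bits: str) -> int:
    """Length of the longest run of identical consecutive bits."""
    best = run = 0
    previous = None
    for bit in bits:
        run = run + 1 if bit == previous else 1
        best = max(best, run)
        previous = bit
    return best


if __name__ == "__main__":
    signal = encode("AGTC")
    print(signal, longest_run(signal), decode(signal))
```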

Once a signal sequence containing at least one control sequence has been generated, a "read-out step" must be performed to obtain the sequence information encoded in it.

The read-out step may be performed using any suitable technique, such as those described in WO-A-00/39333 and WO-A-04/094663 and summarised herein. A preferred detection technique, as described above, uses a polymerase reaction to incorporate bases complementary to the bases of the signal sequence, using selected nucleotides that carry a detectable label or a group for subsequent indirect labelling, and monitors any incorporation.

To perform a read-out step based on a polymerase reaction, it is generally necessary to first anneal a primer sequence to the signal sequence polynucleotide, which can be identified by the polymerase and serves as a starting site for continued extension of the complementary strand. The primer sequence may be added as a separate component to the polynucleotide which contains a complementary sequence to which the primer anneals. The polymerase reaction is preferably performed under conditions that allow controlled incorporation of complementary nucleotides one unit at a time. This allows each amplified signal sequence unit to be classified by detecting the incorporated label. As described above, because each unit preferably contains a "stop" sequence, incorporation can be controlled by providing only those nucleotides that are required for incorporation into the first unit. Since each unit can be identified by a specific label, two different units (0 and 1) can be distinguished in each cycle. This allows any incorporated label to be detected and the unit to be identified.

The readout method can be performed as follows:

(i) contacting a signal sequence comprising the defined unit with at least one of the nucleotides dATP, dTTP, dGTP and dCTP, wherein the at least one nucleotide comprises a detectable label specific for the nucleotide, under conditions such that a polymerisation reaction can occur;

(ii) removing all unincorporated nucleotides and detecting all incorporation;

(iii) removing the label from the incorporated nucleotide; and

(iv) repeating steps (i) to (iii), thereby identifying the different units and thereby determining the sequence of the target polynucleotide.

The number of different nucleotides required in step (i) of each cycle depends on the design of the signal sequence unit. If each unit contains only one base type, only one nucleotide (with a detectable label) is required. However, if two bases are used (one as a target for a detectably labeled nucleotide and the other providing a gap between different target bases) two nucleotides (one bound to the target base and the other "filling in" the base between the target bases) are required.

The use of a base as a stop signal allows the detection step to be carried out without the need for blocked nucleotides to prevent uncontrolled incorporation during the polymerase reaction. The stop signal is effective when the polymerase mixture contains no base complementary to the "stop" base. A "fill-in" step can therefore be performed after each unit has been characterized, in which the previously omitted nucleotide is supplied so that the base complementary to the stop base is incorporated, making it possible to characterize the next unit. This is done after the detection step. The "stop" base of one unit and the first base of the next unit should not be of the same type. This ensures that the "fill-in" process does not proceed into the next unit. The unincorporated nucleotides used in the "fill-in" process can then be removed, and the next unit can be characterized.

The choice of polymerase and detectable label will be apparent to the skilled person. The following is used as guidance only:

Klenow and Klenow (exo-) can efficiently incorporate tetramethylrhodamine-4-dUTP and rhodamine-110-dCTP (Amersham Pharmacia Biotech) (Brakmann and Nieckchen, 2001; Brakmann and Löbermann, 2000).

Vent, Taq and Tgo DNA polymerases can efficiently incorporate digoxigenin and fluorophores such as AMCA, tetramethylrhodamine, fluorescein and Cy5 without spacing between labelled positions, at least for up to several positions (Augustin et al., 2001).

T4 DNA polymerase can efficiently fill in fluorophore-labeled nucleotides.

Preferred polymerases are Klenow large fragment (exo-) and T4 DNA polymerase.

Other conditions necessary to carry out the polymerase reaction, including temperature, pH, buffer composition, and the like, will be apparent to those skilled in the art. The polymerization step may be carried out for a period of time sufficient to allow base incorporation into the first unit. The non-incorporated nucleotides are then removed, for example by washing the array, and detection of the incorporated label is then performed.

Another read-out is to hybridize short detectably labeled oligonucleotides to the magnified readable signal sequence and/or units on the positional tag and detect any hybridization results. The short oligonucleotide has a sequence complementary to a specific unit in the readable signal sequence. For example, if a binary system is used and each monomer in the sample fragment is defined by a different combination of signal sequence units (one representing "0" and the other representing "1"), the invention will require an oligonucleotide specific for unit "1". In such embodiments, selective hybridization of the oligonucleotides may be achieved by designing each unit to be a different polynucleotide sequence relative to the other units. This ensures that the hybridisation reaction only occurs in the presence of the particular unit and detection of the hybridisation event identifies the characteristics of the sample fragment.

In a preferred embodiment, the label is a fluorescent moiety. As mentioned above, many examples of fluorophores that can be used are known in the art. Conventional methods can be used to attach a suitable fluorophore to a nucleotide. Nucleotides with appropriate labels are also commercially available. The label is attached in such a way that it can be removed after the detection step. This can be accomplished by conventional methods, including:

I. Attacking the signal itself:

1) Bleaching

i) Photobleaching

ii) Chemical bleaching

2) Fluorescence quenching

i) Antibodies raised against the fluorophore (e.g. anti-fluorescein antibody, anti-Oregon Green antibody)

ii) Quenching of the signal by FRET (incorporation of a quencher in the vicinity of the signal, e.g. the TaqMan method)

3) Cleavage of the signal

i) Chemical cleavage (e.g. reduction of a disulfide bond between the base and the signal)

ii) Photocleavage (e.g. by introduction of a nitrobenzyl or tert-butyl keto group)

iii) Enzymatic methods (e.g. alpha-chymotrypsin digestion of a peptide linker)

II. Attacking the signal-carrying nucleotides:

1) Exonucleolytic removal

i) 3'-5' exonucleolytic degradation of the incorporated nucleotides (e.g. exonuclease III, or activation of the 3'-5' exonuclease activity of a DNA polymerase by withholding certain nucleotides)

2) Restriction enzyme digestion

i) Digestion of the double-stranded DNA carrying the signal (e.g. ApaI, DraI or SmaI sites, which can be incorporated into the stop signal)

Another method of using removable labels is to use inactive labels that can be reactivated during biochemical processes.

The preferred method is by photocleavage or chemical cleavage.

When the label is a fluorophore, the fluorescence signal generated by incorporation can be measured optically, for example by confocal microscopy. Alternatively, a sensitive 2-D detector such as a charge-coupled device (CCD) may be used to image the individual signals generated.

A conventional optical detection setup is as follows:

Microscope: epi-fluorescence

Objective: immersion (100X, 1.3 NA)

Light source: laser or lamp

Filters: band-pass

Mirrors: dichroic mirror and dichroic wedge

Detector: photomultiplier tube (PMT) or CCD camera

Other setups may also be used, including:

A. Total internal reflection fluorescence microscopy (TIRFM)

Light source: one or more lasers

Background reduction: no pinholes

Detector: CCD camera (video and digital imaging systems)

B. Confocal laser scanning microscopy (CLSM)

Light source: one or more lasers

Background reduction: one or more pinhole apertures

Detector: a) single pinhole: photomultiplier tube (PMT) detectors for different fluorescence wavelengths [final image formed point by point, over time, by computer]

b) several thousand pinholes (rotating Nipkow disk): CCD camera detection [final image recorded directly by the camera]

C. Two-photon (TPLSM) and multiphoton laser scanning microscopy

Light source: one or more lasers

Background reduction: no pinholes

Detector: CCD camera (video and digital imaging systems)

Preferred methods are TIRFM and confocal microscopy.

The read-out platform may also be based on the nanopores disclosed in WO-A-00/39333, the content of which is incorporated herein by reference.

It will be appreciated that although specific examples of techniques suitable for reading signal sequences are provided in this specification, signal sequences may be read using any suitable reading platform.

Claims

1. A method of identifying at least one characteristic of a target molecule comprising the steps of:

(i) converting said at least one characteristic into a signal polynucleotide; and

(ii) Identifying the signal polynucleotide sequence, thereby identifying the at least one characteristic of the target molecule, wherein each signal polynucleotide comprises at least one control sequence that defines one characteristic of the signal polynucleotide, and wherein identification of the control sequences confirms whether the signal polynucleotide sequence has been correctly identified, and optionally, if the identification is incorrect, the identification provides the necessary information to determine what the correct signal polynucleotide sequence should be.

2. The method according to claim 1, wherein said target molecule is a polymer.

3. The method according to claim 2, wherein the feature to be identified is at least one monomer.

4. A method according to claim 3, wherein said at least one monomer is a nucleotide.

5. A method according to any one of claims 1 to 4 wherein each feature of the target polymer is represented as at least one unique sequence unit in the signal polynucleotide.

6. A method according to claim 5 wherein the target polymer is characterised by a specific combination of two or more distinct polynucleotide sequence units in the signal polynucleotide.

7. A method according to claim 6 wherein each feature of the target polymer is represented as a specific combination of two or more polynucleotide sequence units designated "0" and "1" in the signal polynucleotide sequence, thereby forming a binary signal polynucleotide.

8. A method according to any one of claims 2 to 7 wherein three monomers on the target polymer are converted to one signal polynucleotide in step (i).

9. A method according to any preceding claim, wherein the control sequence is incorporated into the signal polynucleotide at predetermined intervals.

10. A method according to any one of claims 5 to 9 wherein a control sequence is incorporated into the signal polynucleotide after every third sequence unit.

11. A method according to any one of claims 6 to 10 wherein the combination of units linked thereto is defined by the control sequence.

12. A method according to any one of claims 7 to 10 wherein the control sequence is a "0" or "1" unit which defines the number of "0" or "1" units in the region of the signal polynucleotide to which it is linked.

13. The method according to any one of claims 7 to 9, wherein the control sequence is present at a defined position in the signal sequence such that no more than three sequence units of the same type are present which are characteristic of the target.

14. A method according to any preceding claim, wherein steps (i) and (ii) are repeated to form a molecule having a series of polynucleotide signal sequences which are indicative of said characteristic of said target molecule.

15. A method according to claim 13, wherein additional control sequences are incorporated into the formed molecule at defined intervals, whereby identification of the additional control sequences reveals whether the correct number of signal sequences have been incorporated.

16. A method of sequencing a target polynucleotide comprising the steps of:

i) converting at least one base of the target polynucleotide into a signal sequence; and

ii) identifying said signal sequences, and thereby said sequence of said target polynucleotide, wherein each signal sequence comprises at least one control sequence defining a feature of said signal sequence, and wherein identification of said control sequences confirms whether said signal sequence has been correctly identified, and optionally, if said identification is incorrect, said identification provides the necessary information to determine what sequence said correct signal sequence should be.