CN111676276A - A method for rapid and accurate determination of gene editing mutation and its application

Info

Publication number: CN111676276A
Application number: CN202010669771.5A
Authority: CN
Inventors: 吴磊; 吴谦
Original assignee: Hubei Boyuan Synthetic Biotechnology Co ltd
Current assignee: Hubei Boyuan Synthetic Biotechnology Co ltd
Priority date: 2020-07-13
Filing date: 2020-07-13
Publication date: 2020-09-18

Abstract

The invention relates to a method for quickly and accurately determining gene editing mutations and to its application, and belongs to the technical field of gene detection. The method includes the following steps: 1) designing primers based on the target interval, amplifying, constructing a library, and sending it for high-throughput sequencing; 2) performing quality control on the sequencing data by removing sequences with a quality value less than 30 and sequences with an "N" base content greater than 10%; 3) merging the paired-end reads to obtain long sequences, and tallying the count of each distinct sequence; 4) aligning the data obtained in step 3 with the target genome sequence to obtain the analysis results. The invention detects the gene editing mutations of each sample quickly and efficiently at low cost. In addition, results for 1G of sequencing data can be obtained in only 1-2 minutes, which greatly improves the analysis speed.

Description

A method for rapid and accurate determination of gene editing mutation and its application

【Technical Field】

The invention relates to the technical field of gene detection, and in particular to a method for rapidly and accurately determining gene editing mutations and its application.

【Background Art】

Genome editing technology has been widely applied in basic research, gene therapy, genetic improvement and other fields. After editing is complete, however, sequencing analysis is required to determine the exact mutation status and mutation type of each mutant. Sanger sequencing is expensive and low in throughput, and a single sequencing result cannot reveal the exact mutation type of an edited product. High-throughput sequencing is inexpensive, and published work has shown that it can genotype accurately. However, the current high-throughput-sequencing-based methods for identifying gene editing mutations align all sequencing reads back to the reference genome before performing mutation analysis and statistics. The publication "Hi-TOM: a platform for high-throughput tracking of mutations induced by CRISPR/Cas systems" (Liu Q et al., SCIENCE CHINA Life Sciences, Vol. 62, No. 1, pp. 1-7, January 2019) describes the widely used Hi-TOM high-throughput mutation analysis platform, which can identify gene mutations in different species in large batches. That platform, however, suffers from a long analysis time. Because it aligns relatively short reads, a read may map to many locations, so highly similar sequences cannot be aligned precisely. Analyzing 1G of sequencing data may take an hour or more, and sometimes no identification result is obtained at all. In addition, its analysis workflow and sequence design are relatively fixed and inflexible, and cannot satisfy some specific analysis requirements.

The present method therefore aims to provide an analysis workflow that accurately determines the gene editing status of large batches of mutants in a short time, and that allows experiments and analysis steps to be designed for different specific needs. It is accordingly necessary to exploit the advantages of high-throughput sequencing to develop a low-cost, rapid and accurate method for determining gene editing mutations.

【Summary of the Invention】

To address the deficiencies of the prior art, the present invention provides a method for determining gene editing mutations that exploits the advantages of high-throughput sequencing to achieve low cost, speed and accuracy.

To solve the above problems, the present invention provides a method for quickly and accurately determining gene editing mutations, including the following steps:

1. Design primers based on the target interval, amplify, construct a library, and send it for high-throughput sequencing;

2. Perform quality control on the sequencing data;

3. Merge the paired-end reads of the quality-controlled sequencing data to obtain long sequences; then, using statistical software or scripts, treat identical sequences as a single entry and tally the count of each distinct sequence;

4. Align the data obtained in step 3 with the target genome sequence, identify the bases that differ, obtain the analysis results, and determine the gene mutations.

Further, when primers are designed for amplification in step 1, the sequencing data from the left and right ends overlap by more than 8 bp across the target interval;

Further, the PE150 or PE250 method is used for sequencing in step 1; a 100-260 bp library is constructed for the PE150 sequencing method, and a 100-460 bp library is constructed for the PE250 sequencing method;

Further, the sequencing in step 1 is based on the Illumina sequencing platform, and when multiple samples are pooled for sequencing, different barcode sequences are added to distinguish them.

Further, in step 2, the quality control consists of: removing reads with a quality value less than 30; and removing reads whose "N" base content exceeds 10%.

Further, in step 3, depending on the quality of the data obtained, sequences whose count accounts for less than 5% of the total number of reads may be removed from the tally.

Further, the target genome sequence in step 4 contains the target interval of step 1; the target genome sequence is 500-1000 bp long;

Further, the alignment in step 4 is performed by first building an index of the target genome sequence and then aligning the data from step 3 back to the target genome sequence.

Another object of the present invention is to provide the use, in screening gene mutants, of the method for quickly and accurately determining gene editing mutations based on high-throughput sequencing.

Further, the gene mutants include mutants obtained by artificial methods and by natural methods;

Further, the gene mutants include plant mutants, animal mutants and microbial mutants.

Compared with the prior art, the beneficial effects of the present invention are:

1) Fast: for 1G of second-generation sequencing data, the whole process takes only 1-2 minutes to produce the statistical results;

2) Accurate: thanks to the high throughput of second-generation sequencing, thousands of sequencing reads can be obtained for each sample, which gives the results significant statistical value;

3) Low cost: multiple samples can be pooled for sequencing; for the pooled PCR products of more than 100 samples, only 1G of data is needed to meet the analysis requirements, which greatly reduces the cost of analysis.
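The depth claim in points 2) and 3) can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that "1G" means 1e9 sequenced bases, that the run is PE150, and that reads are shared equally among 96 samples; the function name is hypothetical.

```python
def read_pairs_per_sample(total_bases: float, read_length: int, n_samples: int) -> int:
    """Estimate the paired-end read pairs available to each sample in a pooled run."""
    bases_per_pair = 2 * read_length            # each fragment yields two reads
    total_pairs = total_bases / bases_per_pair  # read pairs in the whole run
    return int(total_pairs // n_samples)        # equal share per sample (assumption)


# Assumption: "1G" = 1e9 bases, PE150, 96 equally pooled samples.
print(read_pairs_per_sample(1e9, 150, 96))  # -> 34722
```

Under these assumptions each sample receives tens of thousands of read pairs, which is consistent with the "thousands of sequencing reads per sample" stated above.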

【Brief Description of the Drawings】

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from them without creative effort.

Figure 1 is a schematic diagram of the PCR amplification product of the present invention.

Figure 2 is a schematic diagram of the data analysis workflow of the present invention.

Figure 3 shows part of the alignment results against the target genome in Example 1 of the present invention.

Figure 4 shows the alignment results against the target genome in Example 2 of the present invention.

【Detailed Description】

The following examples illustrate the present invention but do not limit its scope. Modifications or substitutions made to the methods, steps or conditions of the present invention without departing from its spirit and essence all fall within the scope of the present invention.

Example 1

This example used rice seedlings edited at 7 different targets by CRISPR-Cas9-based genome editing, 96 samples in total, numbered A1-H12.

1. Primer design

Because a specific primer combined with a high-throughput sequencing adapter would be too long, a multiplex PCR library construction technique was chosen: two rounds of PCR enrich the target region and build the library.

The first-round primers should include the following three parts: 1) the specific amplification primer sequence; 2) the sample tag sequence; 3) the bridging sequence.

The second-round primers should include: 1) the bridging sequence; 2) the index sequence; 3) the sequencing adapter sequence. A schematic of the PCR primers is shown in Figure 1.

This example uses the PE150 sequencing method. The forward and reverse primers are designed 70-120 bp from the target, the sequencing data from the left and right ends overlap by more than 8 bp across the target interval, and the amplified fragment, including the sequencing adapters and barcode, is 100-260 bp long.
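The relationship between read length, fragment length and the required overlap can be written down as a quick design check. This is a minimal sketch, not part of the patented method: it assumes `fragment_length` is the length actually covered by the two reads (adapter bases that are not sequenced would only increase the overlap), and both function names are hypothetical.

```python
def expected_overlap(read_length: int, fragment_length: int) -> int:
    """Bases covered by both mates when a fragment is sequenced from both ends."""
    # The overlap can never exceed the fragment itself (short fragments are read through).
    return min(fragment_length, 2 * read_length - fragment_length)


def design_ok(read_length: int, fragment_length: int, min_overlap: int = 8) -> bool:
    """True if the two reads overlap by more than min_overlap bases."""
    return expected_overlap(read_length, fragment_length) > min_overlap


print(expected_overlap(150, 260))  # PE150, longest library in this example -> 40
print(expected_overlap(250, 460))  # PE250, longest library in Example 2   -> 40
print(design_ok(150, 295))         # only 5 bp of overlap                  -> False
```

With these numbers, the upper library sizes stated for PE150 and PE250 both leave a 40 bp overlap, comfortably above the 8 bp requirement.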

2. Genome extraction and target fragment amplification

To extract genomes rapidly in large batches, the following quick extraction procedure is used. Place 2 mL centrifuge tubes corresponding to the plant samples in a 96-well rack for 2 mL tubes and label them, then add a 3-5 cm piece of leaf from each mutant rice seedling to its tube. Add 50 uL of 0.2 mol KOH solution and 1-2 grinding beads to each well. At room temperature, transfer the 2 mL tubes to the adapter of an automatic sample grinder and run it for 30-60 seconds, until the buffer turns green. Using a multichannel pipette, transfer 2 uL of each supernatant to a 96-well PCR plate as the PCR template. Perform PCR with the first-round primers to obtain the target band; after PCR, run 3-5 uL of product on an electrophoresis gel to confirm that the target band was amplified.

The first-round PCR amplification system and amplification program are shown in Table 1 below.

Table 1: First-round PCR amplification system and amplification program

The second round of amplification uses the first-round product as the template; the amplification system and amplification program are shown in Table 2.

Table 2: Second-round PCR amplification system and amplification program

After PCR amplification, the products are checked by electrophoresis, mixed in equal proportions, and sent for high-throughput sequencing with a data volume of at least 1G. Barcode sequences of any few bases can be custom-designed and added to distinguish the samples.

3. Analysis of sequencing results

After the sequencing data are obtained, they are analyzed following the workflow in Figure 2. Sequencing providers usually deliver clean data, that is, data that have already undergone preliminary quality control (alternatively, raw data can be obtained and the preliminary quality control performed manually). The data are split by sample according to the sample tag sequences, giving separate sequencing data for each of the 96 samples, and the following analyses are then carried out:
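Splitting the pooled data by sample tag can be sketched as follows. This is an illustrative sketch only: it assumes the tag occupies the first bases of each read, that all tags have the same length, and that only exact matches are accepted; the tag sequences and the `demultiplex` helper are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample):
    """Group reads by the sample tag found at the start of each read.

    reads: iterable of sequence strings.
    tag_to_sample: dict mapping tag sequence -> sample name (tags of equal length).
    """
    tag_len = len(next(iter(tag_to_sample)))
    by_sample = defaultdict(list)
    for seq in reads:
        sample = tag_to_sample.get(seq[:tag_len])  # exact-match lookup
        if sample is not None:                     # unknown tags are discarded
            by_sample[sample].append(seq[tag_len:])  # strip the tag from the read
    return dict(by_sample)


# Hypothetical tags for two of the 96 samples.
tags = {"ACGT": "A1", "TGCA": "A2"}
print(demultiplex(["ACGTGGGG", "TGCACCCC", "NNNNAAAA"], tags))
```

A production workflow would typically tolerate one mismatch in the tag and handle tags on both mates; those refinements are omitted here.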

3.1 Data quality control

The high-throughput data quality control software Fastp can be used to process the sequencing data. The quality control requirements are: the base quality threshold is set to 30, so reads with a quality value of 30 or more pass and all others are removed; reads containing more than 10% "N" bases are also removed.
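The two thresholds can be illustrated with a small stand-alone filter. This sketch is not Fastp and does not reproduce its behavior; it assumes Phred+33 quality encoding and interprets "quality value" as the mean Phred score of the read, which the text does not specify. The function names are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def passes_qc(seq: str, quality: str,
              min_quality: float = 30, max_n_fraction: float = 0.10) -> bool:
    """Keep a read if its mean quality is >= 30 and at most 10% of its bases are N."""
    if not seq:
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    return mean_phred(quality) >= min_quality and n_fraction <= max_n_fraction


print(passes_qc("ACGTACGTAC", "I" * 10))  # Q40, no N      -> True
print(passes_qc("ACGTACGTAC", "5" * 10))  # Q20            -> False
print(passes_qc("NNGTACGTAC", "I" * 10))  # 20% N bases    -> False
```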

3.2 Sequence merging

A custom script or sequence merging software can be used to merge the paired-end reads, exploiting the fact that the two reads of a pair share an overlapping segment, to obtain longer, higher-quality merged sequences.
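A custom merging script of the kind mentioned above might look like the sketch below. It is a simplified illustration under stated assumptions: the overlap must match exactly (real merging tools tolerate mismatches and use base qualities), the second read is reverse-complemented before comparison, and the minimum overlap of 9 bp encodes the "greater than 8 bp" requirement. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 9):
    """Merge a read pair using the longest exact suffix/prefix overlap.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases is found.
    """
    mate = reverse_complement(read2)  # bring read 2 onto the strand of read 1
    for k in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:    # end of read 1 matches start of the mate
            return read1 + mate[k:]
    return None


fragment = "ATGGCCTTAGCATCGGATACCGTTAGCAAT"     # 30 bp toy fragment
r1 = fragment[:20]                               # forward read
r2 = reverse_complement(fragment[-20:])          # reverse read
print(merge_pair(r1, r2) == fragment)            # 10 bp overlap -> True
```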

3.3 De-redundancy and statistics

Existing deduplication software or scripts can be used to keep a single copy of each duplicated (identical) sequence within a sample while recording how many times it occurs. Because the fragments amplified from the same target in the same sample are essentially the same size, the merged sequences are also essentially identical, so keeping one copy of each with its count greatly reduces the amount of data that must later be aligned back to the target sequence. To ensure accurate results, a filter value should be set to remove sequences with low counts; in general, a sequence is kept and recorded if its reads account for 5% or more of the total number of reads. After deduplication, the data volume is greatly reduced.
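The collapse-and-count step with the 5% filter can be sketched in a few lines. This is a minimal illustration, assuming the merged sequences of one sample are already held in memory as strings; the function name is hypothetical.

```python
from collections import Counter


def collapse_and_filter(sequences, min_fraction: float = 0.05):
    """Collapse identical sequences and keep those supported by >= 5% of reads.

    Returns a dict mapping each retained sequence to its read count,
    ordered from most to least abundant.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    return {seq: n for seq, n in counts.most_common() if n / total >= min_fraction}


reads = ["AAA"] * 60 + ["AAT"] * 37 + ["CCC"] * 3   # toy data for one sample
print(collapse_and_filter(reads))  # -> {'AAA': 60, 'AAT': 37}
```

Only the retained distinct sequences need to be aligned in the next step, which is where the reduction in data volume comes from.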

3.4 Alignment with the target genome

To obtain alignment results more quickly, an index of the target genome is built first, and the de-redundant sequences from the previous step are then aligned back to the target genome using the Burrows-Wheeler Aligner principle; a sequence that differs from the reference genome in the alignment is judged to carry a mutation. Aligning all 96 samples together takes only a few seconds (the whole sequencing data processing workflow of this example takes 1-2 minutes) and yields the alignment results shown in Table 3 and Figure 3. The results show at a glance the mutation status at each target site, as well as the specific mutations and the number of supporting reads for each sample. The merged sequences obtained for each sample can also be used for further in-depth analysis and research.
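The decision rule "differs from the reference, therefore mutated" can be illustrated without an aligner. The sketch below deliberately swaps in Python's standard-library `difflib.SequenceMatcher` for the Burrows-Wheeler alignment described above, so it is a toy stand-in and not the method of this example; it assumes the merged sequence and the reference cover the same region, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def describe_differences(reference: str, sequence: str):
    """List (kind, ref_start, ref_bases, seq_bases) for every non-matching block.

    kind is 'replace', 'delete' or 'insert', as reported by difflib.
    An empty list means the sequence is identical to the reference.
    """
    matcher = SequenceMatcher(None, reference, sequence, autojunk=False)
    return [
        (tag, i1, reference[i1:i2], sequence[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


ref = "AAAACCCCGGGGTTTT"
print(describe_differences(ref, ref))             # unedited -> []
print(describe_differences(ref, "AAAACCCCTTTT"))  # 4 bp deletion at position 8
```

Combining the output with the read counts from the de-redundancy step gives, for each sample, the mutation type and the number of reads supporting it.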

Table 3: Alignment analysis statistics for some samples (A01-B12)

Example 2

The prime editor (PE), developed from the CRISPR-Cas9 system, can introduce insertions and deletions (indels) and can also perform all 12 base-to-base conversions, making it a powerful new-generation editing tool. Its editing efficiency is currently low, however, and the editing outcome is difficult to determine accurately with first-generation (Sanger) sequencing. This example therefore evaluates editing efficiency after 293T cells were edited with the PE system.

1. Primer design

Because a specific primer combined with a high-throughput sequencing adapter would be too long, a multiplex PCR library construction technique was chosen: two rounds of PCR enrich the target region and build the library. The first-round primers should include the following three parts: 1) the specific amplification primer sequence; 2) the sample tag sequence; 3) the bridging sequence. The second-round primers should include: 1) the bridging sequence; 2) the index sequence; 3) the sequencing adapter sequence. A schematic of the PCR primers is shown in Figure 1.

This example uses the PE250 sequencing method. The forward and reverse primers are designed 70-120 bp from the target, the sequencing data from the left and right ends overlap by more than 8 bp across the target interval, and the amplified fragment, including the sequencing adapters and barcode, is 100-460 bp long.

Cells and bacterial cultures need no separate genome extraction; the edited cells or cultures are used directly for PCR amplification. Perform PCR with the first-round primers to obtain the target band; after PCR, run 3-5 uL of product on an electrophoresis gel to confirm that the target band was amplified.

The first-round PCR amplification system and amplification program are shown in Table 4.

Table 4:

The second round of amplification uses the first-round product as the template; the amplification system and amplification program are shown in Table 5.

Table 5:

After PCR amplification, the products are checked by electrophoresis, mixed in equal proportions, and sent for high-throughput sequencing with a data volume of at least 1G.

3. Analysis of sequencing results

The data analysis workflow of this example is the same as in Example 1 and is shown in Figure 2. Because PE editing efficiency is low, the filter value (counted reads as a percentage of total reads) was not set in this example. The filter value can be set according to experimental requirements, but the other quality control conditions must not be changed. The detailed analysis results are shown in Table 6 and Figure 4. They give an accurate picture of the editing outcomes of the PE system and their proportions, and clearly show that PE editing can introduce indels and point mutations into the genome, but with low editing efficiency.

Table 6: Alignment analysis statistics for 293T cells

Comparative Example 1

The Hi-TOM method ("Hi-TOM: a platform for high-throughput tracking of mutations induced by CRISPR/Cas systems", Liu Q et al., SCIENCE CHINA Life Sciences, Vol. 62, No. 1, pp. 1-7, January 2019) was used for data processing and alignment analysis of rice seedlings edited at 7 different targets by CRISPR-Cas9-based genome editing, 96 samples in total, numbered A1-H12.

The data were aligned with the target genome. After more than one hour, the full set of results still could not be obtained. The results are shown in Table 7; there are no data for A03 to A12 or for any sample after B03.

Table 7:

Comparing the results of Comparative Example 1 with those of Example 1 demonstrates the superiority, in accuracy and speed, of the high-throughput-sequencing-based method of the present invention for determining gene editing mutations, which obtains all results in a short time.

Comparative Example 2

The Hi-TOM method (as in Comparative Example 1) was used to evaluate editing efficiency after 293T cells were edited with the PE system.

The data were aligned with the target genome; after more than one hour, no alignment result had been obtained.

Comparing the results of Comparative Example 2 with those of Example 2 demonstrates the superiority, in accuracy and speed, of the high-throughput-sequencing-based method of the present invention for determining gene editing mutations, which obtains all results in a short time.

The present invention is not limited to what is described in the specification and embodiments. Additional advantages and modifications will readily occur to those skilled in the art, so the invention is not limited to the specific details, representative schemes and illustrative examples shown and described herein, provided that the spirit and scope of the general concept defined by the claims and their equivalents are not departed from.

Claims

1. A method for quickly and accurately determining a gene editing mutation, comprising the following steps:

S1. Design primers based on the target interval, amplify, construct a library, and send for high-throughput sequencing;

S2. Quality control the sequencing data;

S3. Merge the paired-end reads of the quality-controlled sequencing data to obtain long sequences, and count them; the counting method is: using statistical software or scripts, treat identical sequences as a single entry and tally the count of each distinct sequence;

S4. Compare the data obtained in S3 with the target genome sequence, identify different base sequences, obtain analysis results, and determine gene mutation.

2. The method for quickly and accurately determining gene editing mutations according to claim 1, wherein, when primers are designed for amplification in step S1, the sequencing data from the left and right ends overlap by more than 8 bp across the target interval.

3. The method for quickly and accurately determining gene editing mutations according to claim 1, wherein in step S1 the sequencing uses one of the PE150 sequencing method and the PE250 sequencing method; a 100-260 bp library is constructed when using the PE150 sequencing method, and a 100-460 bp library is constructed when using the PE250 sequencing method; the sequencing is based on the Illumina sequencing platform, and when multiple samples are pooled for sequencing, different barcode sequences are added to distinguish them.

4. The method for quickly and accurately determining gene editing mutations according to claim 1, wherein in step S2 the quality control method is: removing reads with a quality value less than 30; and removing reads with an "N" base content greater than 10%.

5. The method for quickly and accurately determining gene editing mutations according to claim 1, wherein step S3 further comprises de-redundancy: removing reads whose count accounts for less than 5% of the total number of reads in the tally.

6. The method for quickly and accurately determining gene editing mutations according to claim 1, wherein the target genome sequence in step S4 contains the target interval of step S1; the target genome sequence is 500-1000 bp long.

7. The method for quickly and accurately determining gene editing mutations according to claim 1, wherein the alignment in step S4 is performed by first building an index of the target genome sequence and then aligning the data from step S3 back to the target genome sequence.

8. The application of the method for rapidly and accurately determining gene editing mutations according to any one of claims 1-6 in screening gene mutants.

9. The application according to claim 7, wherein the gene mutants include mutants obtained by artificial methods and natural methods.

10. The application according to claim 7, wherein the gene mutants include plant mutants, animal mutants and microorganism mutants.