
CN109903810A - Method for analyzing metagenomic integrons and mobile genetic elements

Info

Publication number
CN109903810A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811505402.1A
Other languages
Chinese (zh)
Inventor
杨洋
薛正晟
孙子奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI PERSONAL BIOTECHNOLOGY CO Ltd
Original Assignee
SHANGHAI PERSONAL BIOTECHNOLOGY CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-12-10
Filing date
2018-12-10
Publication date
2019-06-18
Application filed by SHANGHAI PERSONAL BIOTECHNOLOGY CO Ltd
Priority to CN201811505402.1A
Publication of CN109903810A
Legal status: Withdrawn

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for analyzing metagenomic integrons and mobile genetic elements, characterized by comprising the following steps: a sequencing-output data preparation step; a local database construction step; a BLAST alignment step; and an annotation construction step. The beneficial effects of the invention are as follows. Large annotation volume and high speed: the analysis workflow uses an offline database, so even very large data sets can be annotated easily, and annotation is much faster than online submission. Complete annotation results: the workflow automatically classifies and summarizes the annotation results, which is convenient for users.

Description

Method for analyzing metagenomic integrons and mobile genetic elements
Technical field
The present invention relates to the field of bioinformatics, and in particular to a method for analyzing metagenomic integrons and mobile genetic elements.
Background art
An integron is a genetic mechanism that allows bacteria to adapt and evolve rapidly by storing and expressing new genes. These genes are embedded in specific genetic structures called gene cassettes (a term recently changed to integron cassettes), which usually carry a promoterless ORF and a recombination site (attC). Integron cassettes are integrated into the attI site of the integron platform by a site-specific recombination reaction mediated by the integrase.
A mobile genetic element (MGE) is genetic material that can move within a genome or be transferred from one species to another. MGEs are present in all organisms. In humans, about 50% of the genome is considered to be MGEs. MGEs play a unique role in evolution. Gene duplication events can also occur through MGE mechanisms. MGEs can also cause mutations in protein-coding regions, which can change protein function. In addition, they can rearrange genes in the host genome. One example of MGEs in an evolutionary context is that virulence factor and antibiotic resistance genes can be carried on MGEs and shared with neighboring bacteria. New genes acquired through this mechanism can increase fitness by providing new or additional functions. On the other hand, MGEs can also reduce fitness by introducing disease-causing alleles or mutations.
The existing methods for analyzing integrons and mobile genetic elements in metagenomes have the following defects:
1) Single annotation method: with existing methods, sequences can only be submitted to the corresponding website.
2) The amount of sequence that can be annotated is too small: because annotation is only possible online, the upload size is restricted and large-batch annotation is not possible.
3) Incomplete annotation results: the annotation results lack information and must be matched manually against the website's information tables, which requires a large amount of manual work, takes more time, and reduces efficiency.
Summary of the invention
To overcome the above defects of the prior art, the object of the present invention is to provide a method for analyzing metagenomic integrons and mobile genetic elements.
To achieve this object, the following technical solution is adopted:
A method for analyzing metagenomic integrons and mobile genetic elements, comprising the following steps:
Step 1: download the corresponding data from the integron and mobile element analysis website (http://integrall.bio.ua.pt/?getFastaAll);
Step 2: build a BLAST database from the downloaded data using makeblastdb from the BLAST sequence alignment software;
Step 3: align the input gene sequences against the database built in the previous step using blastn from the BLAST software; the E-value must be less than 1e-5 and the alignment similarity greater than 60%;
Step 4: collect annotation information from the website http://integrall.bio.ua.pt using the Linux network tool curl; this step takes a relatively long time and requires patience;
Step 5: use awk to integrate the annotation results of step 3 with the information from step 4, finally obtaining the annotation results for integrons and mobile genetic elements.
In a preferred embodiment of the invention, the annotation information collection of step 4 is specifically:
Using the Linux network tool curl, first access the integron website http://integrall.bio.ua.pt/?list, and then use the command:
seq 50 50 10000 | xargs -i -P 10 curl -L "http://integrall.bio.ua.pt/?list&s={}&ob=org"
to access the pages concurrently and extract the information.
The main innovations of the invention are:
Large annotation volume and high speed: the analysis workflow uses an offline database, so even very large data sets can be annotated easily, and annotation is much faster than online submission.
Complete annotation results: the workflow automatically classifies and summarizes the annotation results, which is convenient for users.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention.
Fig. 2 is a schematic diagram of prior-art online BLAST.
Fig. 3 is schematic diagram 1 of step 4.
Fig. 4 is schematic diagram 2 of step 4.
Fig. 5 is schematic diagram 3 of step 4.
Fig. 6 is schematic diagram 4 of step 4.
Fig. 7 is schematic diagram 5 of step 4.
Fig. 8 is schematic diagram 1 of step 5.
Figs. 9a and 9b are schematic diagram 2 of step 5.
Fig. 10 is schematic diagram 3 of step 5.
Detailed description of embodiments
The present invention is further illustrated by the following examples, which shall not be construed as limiting the invention.
1. Download the corresponding data from the integron and mobile element analysis website (http://integrall.bio.ua.pt/?getFastaAll);
2. Build a BLAST database from the downloaded data using makeblastdb from the BLAST sequence alignment software;
3. Align the input gene sequences against the database built in the previous step using blastn from the BLAST software; the E-value must be less than 1e-5 and the alignment similarity greater than 60%;
4. Collect annotation information from the website http://integrall.bio.ua.pt using the Linux network tool curl; this step takes a relatively long time and requires patience;
5. Use awk to integrate the annotation results of step 3 with the information from step 4, finally obtaining the annotation results for integrons and mobile genetic elements.
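Steps 2 and 3 might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's script: the file names (integrall.fasta, query.fasta, hits.tsv), the database name integrall_db, and the use of tabular output (-outfmt 6) do not appear in the text. The run_blast function is only defined, not called, so the filter can be demonstrated on three made-up hit lines without BLAST installed. Because the text requires strict inequalities, the awk filter applies them explicitly rather than relying on how blastn treats values exactly at the cutoff.

```shell
#!/bin/sh
# Sketch of steps 2-3. File and database names are illustrative assumptions.
run_blast() {
    # Step 2: build a nucleotide BLAST database from the downloaded FASTA.
    makeblastdb -in integrall.fasta -dbtype nucl -out integrall_db
    # Step 3: align the input sequences. -outfmt 6 writes 12 tab-separated columns:
    # qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    blastn -query query.fasta -db integrall_db \
           -evalue 1e-5 -perc_identity 60 -outfmt 6 -out hits.tsv
}

# Keep hits with identity > 60% (column 3) and E-value < 1e-5 (column 11).
filter_hits() {
    awk -F'\t' '($3 + 0) > 60 && ($11 + 0) < 1e-5' "$1"
}

# Demonstration on made-up hits (hypothetical IDs), so BLAST is not required here.
printf 'q1\tACC0001\t98.50\t200\t3\t0\t1\t200\t1\t200\t2e-90\t350\n'  > demo_hits.tsv
printf 'q2\tACC0002\t55.00\t120\t54\t0\t1\t120\t1\t120\t1e-08\t60\n' >> demo_hits.tsv
printf 'q3\tACC0003\t88.00\t40\t5\t0\t1\t40\t1\t40\t0.002\t30\n'      >> demo_hits.tsv
filter_hits demo_hits.tsv > demo_filtered.tsv
cat demo_filtered.tsv
```

In the demonstration only the q1 line survives: q2 fails the identity threshold and q3 fails the E-value threshold. On real data, run_blast would be called first and filter_hits applied to hits.tsv.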
Previous annotation approaches could only use online BLAST, as shown in Fig. 2:
The defects of prior-art online BLAST are that a single upload cannot exceed 2 MB and the waiting time is long, so it is not suitable for annotating large-scale data. The present analysis method was developed for this reason.
Accordingly, step 4 of the invention proceeds as follows. Using the Linux network tool curl, first access the integron website http://integrall.bio.ua.pt/?list, which takes 1.729 s, to obtain the first page of the integron database as it currently stands, focusing on the annotation information for the corresponding data, as shown in Fig. 3:
Then Accession Nr, Organism, Integrase gene, and Cassette array must be extracted from the returned web page, together with the number of result pages. As shown in Fig. 4, the integron website currently contains 201 pages with 10032 records in total. Once the total number of pages is known, the pages can be accessed concurrently to speed up information extraction, using the command:
seq 50 50 10000 | xargs -i -P 10 curl -L "http://integrall.bio.ua.pt/?list&s={}&ob=org"
With this command, 10 web pages can be accessed within 2 s and their information extracted. Extracting all 201 target pages is expected to take 50 to 60 s, provided no network interruption occurs, so the approach is highly efficient.
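The offsets passed to the s= parameter follow from the record count. The sketch below is a dry run that only prints the URLs and makes no network access. It assumes 50 records per page (an inference from 10032 records on 201 pages and from the step of 50 in the command) and that the first page, fetched separately via ?list, needs no offset. The output file name list_urls.txt is hypothetical.

```shell
#!/bin/sh
# Dry run: derive the page offsets for the list view and write the URLs
# that would be fetched. Nothing is downloaded.
total_records=10032   # figure stated in the text
per_page=50           # assumption inferred from the seq step of 50

# Number of pages, rounded up.
pages=$(( (total_records + per_page - 1) / per_page ))
echo "pages: $pages"

# Offsets for pages 2..N; page 1 is the plain ?list request.
last_offset=$(( (pages - 1) * per_page ))
seq "$per_page" "$per_page" "$last_offset" \
    | sed 's|.*|http://integrall.bio.ua.pt/?list\&s=&\&ob=org|' > list_urls.txt

wc -l < list_urls.txt
head -n 1 list_urls.txt
# To fetch for real (assumes curl and network access), at most 10 at a time:
#   xargs -P 10 -n 1 curl -sL < list_urls.txt
```

With these numbers the script derives 201 pages and the same 200 offsets (50 through 10000) that the seq command in the text generates.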
In addition to the listed records, the gene information contained in each record must also be obtained. For each of the 10032 records, its page is accessed with:
curl -L "http://integrall.bio.ua.pt/?acc={Accession Nr}"
As shown in Fig. 5, the corresponding Gene, Product, and Sequences information is extracted. The 10032 records are accessed 10 at a time concurrently, each access taking about 1.88 s, until all are extracted. The command used is: cat Accession_num_File | xargs -i -P 10 curl -L "http://integrall.bio.ua.pt/?acc={}"
Provided no network interruption occurs, this is expected to take 2000 s.
The whole process takes 2060 s in total, which is very fast.
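The per-record requests can be previewed without touching the network. In this sketch the accession values, the file names, and the choice to save each page to its own file with curl -o are assumptions for illustration. The final line reproduces the rough arithmetic behind the estimate: 10032 records in batches of 10 at about 1.88 s per batch comes to roughly 1886 s, in line with the stated 2000 s.

```shell
#!/bin/sh
# Dry run of the per-record requests. Accession values are made up.
printf '%s\n' ACC0001 ACC0002 ACC0003 > accession_list.txt

# Print, rather than execute, one curl command per accession.
# -I{} substitutes each input line; -P 10 allows up to 10 parallel jobs.
# sort makes the order deterministic, since parallel jobs finish in any order.
xargs -P 10 -I{} echo curl -sL -o 'acc_{}.html' 'http://integrall.bio.ua.pt/?acc={}' \
    < accession_list.txt | sort > acc_commands.txt
cat acc_commands.txt

# Rough wall-clock estimate: batches of 10 requests at about 1.88 s per batch.
awk 'BEGIN { printf "estimated seconds: %.0f\n", (10032 / 10) * 1.88 }'
```

Removing the echo would run the requests for real, writing one HTML file per accession so that concurrent downloads do not interleave their output.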
The extracted information tables are shown in Fig. 6:
As shown in Fig. 7, the columns of one table are Accession Nr, Organism, Integrase gene, and Cassette array;
the columns of the other table are Accession Nr, Gene, Product, and Sequences.
Step 5 of the invention matches the annotation results to this information.
For example, the annotation result is shown in Fig. 8. It is matched one-to-one against the first column of the two tables obtained in step 4 to produce the annotated result, as shown in Figs. 9a, 9b, and 10.
As shown in Fig. 10, the columns are the input gene sequence, Accession Nr, Organism, Gene, Product, Integrase gene, and Cassette array.
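The matching in step 5 could look like the following awk join. It is a sketch under several assumptions that the text does not state: all three inputs are tab-separated files with the hypothetical names list.tsv, genes.tsv, and step3_hits.tsv; the second BLAST column holds exactly the Accession Nr; and each accession has a single gene row. All values in the sample data are made up.

```shell
#!/bin/sh
# Sketch of step 5 with made-up, tab-separated inputs (all values hypothetical).
# list.tsv:  Accession Nr, Organism, Integrase gene, Cassette array
printf 'ACC0001\tOrganism A\tintI1\tcassetteA-cassetteB\n' > list.tsv
# genes.tsv: Accession Nr, Gene, Product, Sequences
printf 'ACC0001\tgeneX\tproduct X\tATGC\n' > genes.tsv
# step3_hits.tsv: input sequence ID, matched Accession Nr
printf 'q1\tACC0001\nq2\tACC0009\n' > step3_hits.tsv

awk -F'\t' -v OFS='\t' '
    # First file: remember the list-view columns, keyed by accession.
    FILENAME == ARGV[1] { org[$1] = $2; intg[$1] = $3; cas[$1] = $4; next }
    # Second file: remember gene and product, keyed by accession.
    FILENAME == ARGV[2] { gene[$1] = $2; prod[$1] = $3; next }
    # Third file: emit one annotated row per hit whose accession is known.
    ($2 in org) {
        print $1, $2, org[$2], gene[$2], prod[$2], intg[$2], cas[$2]
    }
' list.tsv genes.tsv step3_hits.tsv > annotated.tsv
cat annotated.tsv
```

The output columns follow the order given for Fig. 10. The q2 hit is dropped because its accession is absent from list.tsv; a real pipeline would need to decide how to report such unmatched hits and how to handle accessions with several genes.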
With the above technical solution, the advantages of the invention over prior-art online annotation are:
Large annotation volume and high speed: the analysis workflow uses an offline database, so even very large data sets can be annotated easily, and annotation is much faster than online submission.
Complete annotation results: the workflow automatically classifies and summarizes the annotation results, which is convenient for users.

Claims (2)

1. A method for analyzing metagenomic integrons and mobile genetic elements, characterized by comprising the following steps:
Step 1: download the corresponding data from the integron and mobile element analysis website;
Step 2: build a BLAST database from the downloaded data using makeblastdb from the BLAST sequence alignment software;
Step 3: align the input gene sequences against the database built in the previous step using blastn from the BLAST software; the E-value must be less than 1e-5 and the alignment similarity greater than 60%;
Step 4: collect annotation information from the website http://integrall.bio.ua.pt using the Linux network tool curl; this step takes a relatively long time and requires patience;
Step 5: use awk to integrate the annotation results of step 3 with the information from step 4, finally obtaining the annotation results for integrons and mobile genetic elements.
2. The method for analyzing metagenomic integrons and mobile genetic elements according to claim 1, characterized in that the annotation information collection of step 4 is specifically:
Using the Linux network tool curl, first access the integron website http://integrall.bio.ua.pt/?list, and then use the command:
seq 50 50 10000 | xargs -i -P 10 curl -L "http://integrall.bio.ua.pt/?list&s={}&ob=org"
to access the pages concurrently and extract the information.
CN201811505402.1A 2018-12-10 2018-12-10 Method for analyzing metagenomic integrons and mobile genetic elements Withdrawn CN109903810A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811505402.1A CN109903810A (en) | 2018-12-10 | 2018-12-10 | Method for analyzing metagenomic integrons and mobile genetic elements

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811505402.1A CN109903810A (en) | 2018-12-10 | 2018-12-10 | Method for analyzing metagenomic integrons and mobile genetic elements

Publications (1)

Publication Number | Publication Date
CN109903810A | 2019-06-18

Family

ID=66943417

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201811505402.1A Withdrawn CN109903810A (en) | Method for analyzing metagenomic integrons and mobile genetic elements | 2018-12-10 | 2018-12-10

Country Status (1)

Country | Link
CN (1) | CN109903810A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140280327A1 (en) * 2013-03-15 2014-09-18 Cypher Genomics Systems and methods for genomic variant annotation
CN107194208A (en) * 2017-04-25 2017-09-22 北京荣之联科技股份有限公司 A kind of genetic analysis annotates method and apparatus
CN108334750A (en) * 2018-04-19 2018-07-27 江苏先声医学诊断有限公司 A method and system for analyzing metagenomic data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
企鹅号-美吉生物: "《Blast:大神教你轻松搞定序列比对》", 《HTTPS://CLOUD.TENCENT.COM/DEVELOPER/NEWS/155849》 *
欧易生物: "《如何提取gff文件中的基因注释信息》", 《HTTPS://WWW.SOHU.COM/A/124625014_464200?_TRANS_=000019_WZWZA》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119626328A (en) * 2024-11-18 2025-03-14 中国人民解放军军事科学院军事医学研究院 A method for efficiently identifying mobile genetic elements in bacterial plasmids
CN119626328B (en) * 2024-11-18 2025-05-23 中国人民解放军军事科学院军事医学研究院 Method for efficiently identifying bacterial plasmid intracellular movement genetic element

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190618