CN109903810A - Analysis method for metagenomic integrons and mobile genetic elements - Google Patents
- Publication number: CN109903810A
- Application number: CN201811505402.1A
- Authority: CN (China)
- Prior art keywords: annotation, information, integrall, bio, moving element
- Legal status: Withdrawn (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses an analysis method for metagenomic integrons and mobile genetic elements, comprising the following steps: a sequencer output data preparation step; a local database construction step; a BLAST alignment step; and an annotation construction step. The beneficial effects of the invention are as follows. Large annotation volume and high speed: the analysis pipeline uses an offline database, so even very large data sets can be annotated easily, and annotation is much faster than online submission. Complete annotation results: the pipeline automatically classifies and summarizes the annotation results, which is convenient for users.
Description
Technical field
The present invention relates to the field of bioinformatics, and in particular to an analysis method for metagenomic integrons and mobile genetic elements.
Background art
Integrons are genetic mechanisms that allow bacteria to adapt and evolve rapidly by storing and expressing new genes. These genes are embedded in a specific genetic structure called a gene cassette (a term recently changed to integron cassette), which usually carries one promoterless ORF and a recombination site (attC). Integron cassettes are integrated into the attI site of the integron platform by an integrase-mediated site-specific recombination reaction.
A mobile genetic element (MGE) is a type of genetic material that can move within a genome or be transferred from one species to another. MGEs are present in all organisms. In humans, about 50% of the genome is considered to be MGEs. MGEs play a distinct role in evolution. Gene duplication events can occur through MGE mechanisms. MGEs can also cause mutations in protein-coding regions, which can change protein function, and they can rearrange genes in the host genome. One example of MGEs in an evolutionary context is that virulence factor and antibiotic resistance genes carried on MGEs can be transported and shared with neighboring bacteria. Genes newly acquired through this mechanism can increase fitness by providing new or additional functions. On the other hand, MGEs can also reduce fitness by introducing disease-causing alleles or mutations.
The existing analysis methods for metagenomic integrons and mobile genetic elements have the following defects:
1) Only one annotation route: with existing methods, sequences can only be submitted to the corresponding website.
2) Too few sequences can be annotated: because annotation is only possible online, the amount of sequence that can be uploaded is restricted, and large-batch annotation is not possible.
3) Incomplete annotation results: the information included in the annotation results is incomplete and has to be matched manually against the information tables on the website, which adds a large amount of manual work, takes more time, and reduces efficiency.
Summary of the invention
To overcome the above defects of the prior art, the purpose of the present invention is to provide an analysis method for metagenomic integrons and mobile genetic elements.
To achieve this purpose, the following technical solution is adopted:
An analysis method for metagenomic integrons and mobile genetic elements, comprising the following steps:
Step 1: download the corresponding data from the integron and mobile genetic element analysis website (http://integrall.bio.ua.pt/?getFastaAll);
Step 2: build a BLAST database from the downloaded data using makeblastdb from the BLAST sequence alignment software;
Step 3: compare the input gene sequences against the database built in the previous step using blastn from the BLAST sequence alignment software; the e-value must be less than 1e-5 and the alignment similarity greater than 60%;
Step 4: use the Linux network tool curl to collect annotation information from the website http://integrall.bio.ua.pt; this step takes a relatively long time and requires patience;
Step 5: use awk to integrate the annotation results of step 3 with the information from step 4, finally obtaining the annotation results for integrons and mobile genetic elements.
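Steps 1 to 3 can be sketched as a small shell script. This is a minimal sketch under stated assumptions, not the applicant's actual script: the file names (integrall.fasta, integrall_db, query.fasta, hits.tsv), the tabular output format, the thread count, and the RUN_PIPELINE guard are hypothetical and introduced only for illustration. The -evalue and -perc_identity options of blastn act as inclusive cutoffs, so they only approximate the strict thresholds stated in step 3.

```shell
#!/bin/sh
# Sketch of steps 1-3: download the reference FASTA, build a BLAST
# database, and run blastn. All file names are assumptions.
set -eu

INTEGRALL_FASTA_URL='http://integrall.bio.ua.pt/?getFastaAll'

# Step 1: download the reference sequences to the file named in $1.
download_reference() {
    curl -L -o "$1" "$INTEGRALL_FASTA_URL"
}

# Step 2: build a nucleotide BLAST database ($1 = FASTA, $2 = database name).
build_database() {
    makeblastdb -in "$1" -dbtype nucl -out "$2"
}

# Step 3: align queries ($1) against the database ($2), writing tabular hits to $3.
run_blastn() {
    blastn -query "$1" -db "$2" \
        -evalue 1e-5 -perc_identity 60 \
        -outfmt 6 -num_threads 4 -out "$3"
}

# Nothing runs unless RUN_PIPELINE=1 is set (hypothetical guard).
if [ "${RUN_PIPELINE:-0}" = "1" ]; then
    download_reference integrall.fasta
    build_database integrall.fasta integrall_db
    run_blastn query.fasta integrall_db hits.tsv
fi
```

Running the script with RUN_PIPELINE=1 in a directory containing query.fasta would execute the three steps in order; without the variable it only defines the functions.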
In a preferred embodiment of the invention, the annotation information collection of step 4 is specifically as follows:
Using the Linux network tool curl, first access the integron website http://integrall.bio.ua.pt/?list, then use the command:
seq 50 50 10000 | xargs -I {} -P 10 curl -L "http://integrall.bio.ua.pt/?list&s={}&ob=org"
to access the pages concurrently and extract the information.
The main innovations of the invention are:
Large annotation volume and high speed: the analysis pipeline uses an offline database, so even very large data sets can be annotated easily, and annotation is much faster than online submission.
Complete annotation results: the pipeline automatically classifies and summarizes the annotation results, which is convenient for users.
Brief description of the drawings
Fig. 1 is a flow diagram of the invention.
Fig. 2 is a schematic diagram of prior-art online BLAST.
Fig. 3 is schematic diagram 1 of step 4.
Fig. 4 is schematic diagram 2 of step 4.
Fig. 5 is schematic diagram 3 of step 4.
Fig. 6 is schematic diagram 4 of step 4.
Fig. 7 is schematic diagram 5 of step 4.
Fig. 8 is schematic diagram 1 of step 5.
Figs. 9a and 9b are schematic diagram 2 of step 5.
Fig. 10 is schematic diagram 3 of step 5.
Specific embodiments
The present invention is further illustrated by the following examples, but these examples shall not be construed as limiting the invention.
1. Download the corresponding data from the integron and mobile genetic element analysis website (http://integrall.bio.ua.pt/?getFastaAll);
2. Build a BLAST database from the downloaded data using makeblastdb from the BLAST sequence alignment software;
3. Compare the input gene sequences against the database built in the previous step using blastn from the BLAST sequence alignment software; the e-value must be less than 1e-5 and the alignment similarity greater than 60%;
4. Use the Linux network tool curl to collect annotation information from the website http://integrall.bio.ua.pt; this step takes a relatively long time and requires patience;
5. Use awk to integrate the annotation results of step 3 with the information from step 4, finally obtaining the annotation results for integrons and mobile genetic elements.
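The thresholds in step 3 can also be applied as strict inequalities to tabular blastn output with awk. The following is a minimal sketch, assuming the hits are in the standard 12-column tabular format (-outfmt 6), where column 3 is percent identity and column 11 is the e-value; the function name filter_hits is hypothetical.

```shell
#!/bin/sh
# Sketch: keep only hits with identity > 60% and e-value < 1e-5.
# Assumes 12-column BLAST tabular input (-outfmt 6), tab-separated.
set -eu

# Reads the files given as arguments, or standard input if none are given.
filter_hits() {
    awk -F '\t' '($3 + 0) > 60 && ($11 + 0) < 1e-5' "$@"
}
```

For example, `filter_hits hits.tsv > hits.filtered.tsv` would write the surviving hits to a new file (both file names are placeholders).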
Previous annotation approaches could only use online BLAST, as shown in Fig. 2.
The defects of prior-art online BLAST are that a single annotation upload cannot exceed 2 MB and the waiting time is long, so it is not suitable for annotating large data sets. The present analysis method was developed to address this.
Step 4 of the invention therefore proceeds as follows. Using the Linux network tool curl, first access the integron website http://integrall.bio.ua.pt/?list. This takes 1.729 s and returns the first page of the integron database as it currently stands; the annotation information for each record is what matters here, as shown in Fig. 3.
Accession Nr, Organism, Integrase gene, and Cassette array must then be extracted from the returned page, together with the total number of result pages. As shown in Fig. 4, the integron website currently holds 201 pages with 10032 records in total. Once the total number of pages is known, the pages can be accessed concurrently to speed up information extraction, using the command:
seq 50 50 10000 | xargs -I {} -P 10 curl -L "http://integrall.bio.ua.pt/?list&s={}&ob=org"
Ten pages can be accessed, and their information extracted, within 2 s. Extracting all 201 target pages is expected to take 50 to 60 s, provided the network connection is not interrupted. The efficiency is therefore very high.
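The concurrent list-page collection can be sketched as follows. This is an illustrative sketch only: the output directory, the list_N.html file naming, the helper names, and the RUN_FETCH guard are assumptions, and the parsing of the returned HTML is left out because the page structure is not described in the text. The offsets 50, 100, ..., 10000 plus the first page give the 201 pages mentioned above.

```shell
#!/bin/sh
# Sketch of the list-page collection in step 4.
# Output paths and helper names are assumptions.
set -eu

BASE_URL='http://integrall.bio.ua.pt/'

# Print one list-page URL per offset 50, 100, ..., up to $1,
# so the requests can be inspected before anything is sent.
page_urls() {
    seq 50 50 "$1" | while read -r offset; do
        printf '%s?list&s=%s&ob=org\n' "$BASE_URL" "$offset"
    done
}

# Fetch the first page, then the remaining pages 10 at a time.
# $1 = last offset (10000 in the text), $2 = output directory.
fetch_list_pages() {
    mkdir -p "$2"
    curl -sL -o "$2/list_0.html" "${BASE_URL}?list"
    seq 50 50 "$1" |
        xargs -P 10 -I {} curl -sL -o "$2/list_{}.html" "${BASE_URL}?list&s={}&ob=org"
}

# Nothing is downloaded unless RUN_FETCH=1 is set (hypothetical guard).
if [ "${RUN_FETCH:-0}" = "1" ]; then
    fetch_list_pages 10000 integrall_pages
fi
```

Saving each page to its own file, rather than printing to standard output as in the original command, keeps the output of the 10 parallel curl processes from interleaving.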
Besides the list of records, the gene information contained in each record must also be obtained. For each of the 10032 records, its record page is accessed:
curl -L "http://integrall.bio.ua.pt/?acc={Accession Nr}"
As shown in Fig. 5, the corresponding Gene, Product, and Sequences information is extracted. The 10032 records are accessed one by one with a concurrency of 10, each access taking about 1.88 s. The command used is:
cat Accession_num_file | xargs -I {} -P 10 curl -L "http://integrall.bio.ua.pt/?acc={}"
Provided the network connection is not interrupted, complete extraction is expected to take 2000 s.
The whole process takes 2060 s in total, which is very fast.
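The per-record collection follows the same pattern. The sketch below assumes that Accession_num_file holds one accession number per line; the output directory, the per-accession file naming, the helper names, and the RUN_FETCH guard are hypothetical.

```shell
#!/bin/sh
# Sketch of the per-record collection in step 4.
# Assumes one accession number per line in the input file.
set -eu

BASE_URL='http://integrall.bio.ua.pt/'

# Print the record-page URL for the accession number in $1.
accession_url() {
    printf '%s?acc=%s\n' "$BASE_URL" "$1"
}

# Fetch every record page, 10 requests at a time.
# $1 = accession list file, $2 = output directory (assumption).
fetch_records() {
    mkdir -p "$2"
    grep -v '^[[:space:]]*$' "$1" |
        xargs -P 10 -I {} curl -sL -o "$2/{}.html" "${BASE_URL}?acc={}"
}

# Nothing is downloaded unless RUN_FETCH=1 is set (hypothetical guard).
if [ "${RUN_FETCH:-0}" = "1" ]; then
    fetch_records Accession_num_file integrall_records
fi
```

With 10032 records fetched 10 at a time at about 1.88 s per access, roughly 1003 batches are needed, which is in line with the estimate of about 2000 s given above.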
The extracted information tables are shown in Figs. 6 and 7.
The columns of the first table are Accession Nr, Organism, Integrase gene, and Cassette array;
the columns of the second table are Accession Nr, Gene, Product, and Sequences.
Step 5 of the invention matches the annotation results against this information.
For example, the annotation results are shown in Fig. 8. They are matched one-to-one against the first column of the two tables obtained in step 4, giving the final annotation results shown in Figs. 9a, 9b, and 10.
As shown in Fig. 10, the columns are the input gene sequence, Accession Nr, Organism, Gene, Product, Integrase gene, and Cassette array.
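The matching in step 5 can be sketched with awk as a lookup keyed on Accession Nr. This is a sketch under several assumptions that the text does not confirm: all three inputs are tab-separated, the subject ID in column 2 of the blastn output is exactly the Accession Nr, and each accession appears once in the gene table (if it appears several times, the last row wins). The function name join_annotations is hypothetical.

```shell
#!/bin/sh
# Sketch of step 5: attach the step 4 annotation tables to blastn hits.
# Assumes tab-separated inputs keyed on Accession Nr in column 1
# (column 2 for the blastn hits).
set -eu

# $1 = blastn hits (tabular), $2 = record table
#      (Accession Nr, Organism, Integrase gene, Cassette array),
# $3 = gene table (Accession Nr, Gene, Product, Sequences).
join_annotations() {
    awk -F '\t' -v OFS='\t' '
        # First file: remember organism, integrase gene, cassette array.
        FILENAME == ARGV[1] { org[$1] = $2; intg[$1] = $3; cas[$1] = $4; next }
        # Second file: remember gene and product.
        FILENAME == ARGV[2] { gene[$1] = $2; prod[$1] = $3; next }
        # Third file (hits): print annotated rows in the Fig. 10 column order.
        ($2 in org) { print $1, $2, org[$2], gene[$2], prod[$2], intg[$2], cas[$2] }
    ' "$2" "$3" "$1"
}
```

Hits whose subject ID has no entry in the record table are dropped; printing them with empty annotation columns instead would be an equally reasonable choice.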
By adopting the above technical solution, the present invention has the following advantages over prior-art online annotation:
Large annotation volume and high speed: the analysis pipeline uses an offline database, so even very large data sets can be annotated easily, and annotation is much faster than online submission.
Complete annotation results: the pipeline automatically classifies and summarizes the annotation results, which is convenient for users.
Claims (2)
1. An analysis method for metagenomic integrons and mobile genetic elements, characterized by comprising the following steps:
Step 1: download the corresponding data from the integron and mobile genetic element analysis website;
Step 2: build a BLAST database from the downloaded data using makeblastdb from the BLAST sequence alignment software;
Step 3: compare the input gene sequences against the database built in the previous step using blastn from the BLAST sequence alignment software; the e-value must be less than 1e-5 and the alignment similarity greater than 60%;
Step 4: use the Linux network tool curl to collect annotation information from the website http://integrall.bio.ua.pt; this step takes a relatively long time and requires patience;
Step 5: use awk to integrate the annotation results of step 3 with the information from step 4, finally obtaining the annotation results for integrons and mobile genetic elements.
2. The analysis method for metagenomic integrons and mobile genetic elements according to claim 1, characterized in that the annotation information collection of step 4 is specifically as follows:
Using the Linux network tool curl, first access the integron website http://integrall.bio.ua.pt/?list, then use the command:
seq 50 50 10000 | xargs -I {} -P 10 curl -L "http://integrall.bio.ua.pt/?list&s={}&ob=org"
to access the pages concurrently and extract the information.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811505402.1A CN109903810A (en) | 2018-12-10 | 2018-12-10 | Analysis method for metagenomic integrons and mobile genetic elements |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN109903810A (en) | 2019-06-18 |
Family
ID=66943417
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811505402.1A Withdrawn CN109903810A (en) | Analysis method for metagenomic integrons and mobile genetic elements | 2018-12-10 | 2018-12-10 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109903810A (en) |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20190618 |