
Cunial et al., 2022 - Google Patents

Fast and compact matching statistics analytics


Document ID: 13498632182449248599
Authors: Cunial F, Denas O, Belazzougui D
Publication year: 2022
Publication venue: Bioinformatics

Snippet

Motivation: Fast, lightweight methods for comparing the sequences of ever larger assembled genomes from ever-growing databases are increasingly needed in the era of accurate long reads and pan-genome initiatives. Matching statistics is a popular method for computing …

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30312 Storage and indexing structures; Management thereof
    • G06F17/30321 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30943 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
    • G06F17/30946 Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/21 Text processing
    • G06F17/22 Manipulating or registering by use of codes, e.g. in sequence of text characters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30613 Indexing
    • G06F17/30619 Indexing indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/22 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/28 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for programming tools or database systems, e.g. ontologies, heterogeneous data integration, data warehousing or computing architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/20 Handling natural language data
    • G06F17/27 Automatic analysis, e.g. parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F19/00 Digital computing or data processing equipment or methods, specially adapted for specific applications
    • G06F19/10 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
    • G06F19/14 Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for phylogeny or evolution, e.g. evolutionarily conserved regions determination or phylogenetic tree construction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled

Similar Documents

Publication    Title
Cox et al. Large-scale compression of genomic sequence databases with the Burrows–Wheeler transform
Marçais et al. Locality-sensitive hashing for the edit distance
Kuhnle et al. Efficient construction of a complete index for pan-genomics read alignment
Rahman et al. Representation of k-mer sets using spectrum-preserving string sets
Bernard et al. Alignment-free inference of hierarchical and reticulate phylogenomic relationships
Simpson et al. Efficient construction of an assembly string graph using the FM-index
Li Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
Chikhi et al. On the representation of de Bruijn graphs
Conway et al. Succinct data structures for assembling large genomes
Muggli et al. Building large updatable colored de Bruijn graphs via merging
Nicolae et al. LFQC: a lossless compression algorithm for FASTQ files
US20180373839A1 (en) Systems and methods for encoding genomic graph information
Kingsford et al. Reference-based compression of short-read sequences using path encoding
Löchel et al. Fractal construction of constrained code words for DNA storage systems
Elworth et al. To petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics
Sun et al. Allsome sequence bloom trees
Vinga et al. Pattern matching through Chaos Game Representation: bridging numerical and discrete data structures for biological sequence analysis
Marschall et al. Efficient exact motif discovery
Mustafa et al. Dynamic compression schemes for graph coloring
Ginart et al. Optimal compressed representation of high throughput sequence data via light assembly
Braga et al. The solution space of sorting by DCJ
Cunial et al. Fast and compact matching statistics analytics
Recanati et al. A spectral algorithm for fast de novo layout of uncorrected long nanopore reads
Holley et al. Dynamic alignment-free and reference-free read compression