Wang et al., 2009 - Google Patents
Efficient approximate entity extraction with edit distance constraintsWang et al., 2009
View PDF- Document ID
- 10002040964927638121
- Author
- Wang W
- Xiao C
- Lin X
- Zhang C
- Publication year
- Publication venue
- Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
External Links
Snippet
Named entity recognition aims at extracting named entities from unstructured text. A recent trend of named entity recognition is finding approximate matches in the text with respect to a large dictionary of known entities, as the domain knowledge encoded in the dictionary helps …
- 238000000605 extraction 0 title abstract description 15
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30634—Querying
- G06F17/30657—Query processing
- G06F17/30675—Query execution
- G06F17/30678—Query execution using boolean model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30533—Other types of queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30861—Retrieval from the Internet, e.g. browsers
- G06F17/30864—Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems
- G06F17/30867—Retrieval from the Internet, e.g. browsers by querying, e.g. search engines or meta-search engines, crawling techniques, push systems with filtering and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30964—Querying
- G06F17/30979—Query processing
- G06F17/30985—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/20—Handling natural language data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G06F19/22—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology for sequence comparison involving nucleotides or amino acids, e.g. homology search, motif or SNP [Single-Nucleotide Polymorphism] discovery or sequence alignment
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99936—Pattern matching access
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Efficient approximate entity extraction with edit distance constraints | |
Xiao et al. | Top-k set similarity joins | |
Gipp | Citation-based plagiarism detection | |
Wang et al. | Vchunkjoin: An efficient algorithm for edit similarity joins | |
Bar-Yossef et al. | Do not crawl in the DUST: Different URLs with similar text | |
Tang et al. | Bibliometric fingerprints: name disambiguation based on approximate structure equivalence of cognitive maps | |
Aizawa et al. | A fast linkage detection scheme for multi-source information integration | |
US7503035B2 (en) | Software tool for detecting plagiarism in computer source code | |
Kalashnikov et al. | Web people search via connection analysis | |
Deng et al. | A pivotal prefix based filtering algorithm for string similarity search | |
Li et al. | Auto-fuzzyjoin: Auto-program fuzzy similarity joins without labeled examples | |
Sood et al. | Probabilistic near-duplicate detection using simhash | |
Zhang et al. | Hmsearch: An efficient hamming distance query processing algorithm | |
Wang et al. | Efficient and effective knn sequence search with approximate n-grams | |
Li et al. | A partition-based method for string similarity joins with edit-distance constraints | |
Wang et al. | Local similarity search for unstructured text | |
Hourrane et al. | Survey of plagiarism detection approaches and big data techniques related to plagiarism candidate retrieval | |
Chaudhuri et al. | Mining document collections to facilitate accurate approximate entity matching | |
KR100788440B1 (en) | Copy detection system based on theft pattern | |
Alonso et al. | Duplicate news story detection revisited | |
Kim et al. | Efficient overlap and content reuse detection in blogs and online news articles | |
Kalashnikov et al. | Towards breaking the quality curse. a web-querying approach to web people search. | |
Qin et al. | Asymmetric signature schemes for efficient exact edit similarity query processing | |
Lee et al. | An efficient similarity join algorithm with cosine similarity predicate | |
Zou et al. | A cluster-based plagiarism detection method |