Yan, 1995 - Google Patents

Duplicate detection in information dissemination

Document ID: 2494630744496179479
Author: Yan T
Publication year: 1995

Snippet

Our experience with the SIFT [YGM95] information dissemination system (in use by over 7,000 users daily) has identified an important and generic dissemination problem: duplicate information. In this paper we explain why duplicates arise, we quantify the problem, and we …
Full text: ilpubs.stanford.edu:8090 (PDF)
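The snippet does not say how SIFT detects duplicates. As a minimal sketch, assume a "duplicate" is a document whose text matches one the subscriber has already received, after case and whitespace are normalized. Under that assumption, a dissemination server could keep a set of content hashes for each subscriber and suppress repeats. `fingerprint` and `DuplicateFilter` are hypothetical names. Exact hashing is a stand-in chosen for illustration and is not necessarily the technique the paper proposes.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash a document after normalizing case and whitespace.

    Assumption (not from the paper): two documents are duplicates when their
    text is identical after this normalization. Near-duplicates are not caught.
    """
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateFilter:
    """Hypothetical per-subscriber filter: remembers the fingerprints already
    sent to each subscriber and suppresses documents it has seen before."""

    def __init__(self) -> None:
        # subscriber id -> set of fingerprints already delivered to them
        self._seen = defaultdict(set)

    def deliver(self, subscriber: str, text: str) -> bool:
        """Return True if the document should be sent, False if it is a repeat."""
        fp = fingerprint(text)
        seen = self._seen[subscriber]
        if fp in seen:
            return False
        seen.add(fp)
        return True


if __name__ == "__main__":
    f = DuplicateFilter()
    post = "Meeting moved to   Friday"
    repost = "meeting moved to Friday\n"  # same text arriving a second time
    print(f.deliver("alice", post))    # True: first copy is delivered
    print(f.deliver("alice", repost))  # False: identical after normalization
    print(f.deliver("bob", repost))    # True: bob has not received it yet
```

Exact hashing keeps the per-subscriber state to one digest per delivered document. However, it misses near-duplicates, such as a repost with an added signature line. Catching those would require a similarity measure rather than an equality test.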

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/30 Information retrieval; Database structures therefor; File system structures therefor
    • G06F17/30286 Information retrieval; Database structures therefor; File system structures therefor in structured data stores
    • G06F17/30386 Retrieval requests
    • G06F17/30424 Query processing
    • G06F17/30477 Query execution
    • G06F17/30483 Query execution of query operations
    • G06F17/30486 Unary operations; data partitioning operations
    • G06F17/3061 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F17/30017 Multimedia data retrieval; Retrieval of more than one type of audiovisual media
    • G06F17/30861 Retrieval from the Internet, e.g. browsers
    • G06F17/30011 Document retrieval systems
    • G06F17/30067 File systems; File servers

Similar Documents

Publication (or author): Title
US11947513B2 (en): Search phrase processing
Yan et al.: The SIFT information dissemination system
Yan: Duplicate detection in information dissemination
US6757675B2 (en): Method and apparatus for indexing document content and content comparison with World Wide Web search service
US6721749B1 (en): Populating a data warehouse using a pipeline approach
Chen et al.: Ti: an efficient indexing mechanism for real-time search on tweets
US6314421B1 (en): Method and apparatus for indexing documents for message filtering
US6226630B1 (en): Method and apparatus for filtering incoming information using a search engine and stored queries defining user folders
US20040205044A1 (en): Method for storing inverted index, method for on-line updating the same and inverted index mechanism
JP2000003321A (en): Message storage structure of high performance
JP2010520549A (en): Data storage and management methods
KR20040017008A (en): System and method for offering information using a search engine
CN111460255A (en): Music work information data acquisition and storage method
Shivakumar: Detecting digital copyright violations on the Internet
Garcia-Molina: Duplicate Removal in Information Dissemination
Yan et al.: Information finding in a digital library: the Stanford perspective
Uehara et al.: Information retrieval based on temporal attributes in WWW archives
Yan: Efficient techniques for wide-area information dissemination
Kourdounakis: Subscription Indexes for Web Syndication Systems
Schäuble: Integrating Information Retrieval and Database Functions