RU2007141666A - METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES - Google Patents
- Publication number
- RU2007141666A
- Authority
- RU
- Russia
- Prior art keywords
- information
- classes
- processing
- class
- document
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract 8
- 230000000877 morphologic effect Effects 0.000 claims abstract 3
- 238000010606 normalization Methods 0.000 claims abstract 2
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
1. A method for collecting, processing, and cataloging target information from unstructured sources, in which clients formulate a task of searching information networks for, and selecting, information matching their request; the client is identified by registering on the website of the company that collects and analyzes such information; the client is offered a topic or a list of topics that are defined and configured in advance by experts; a database of control information features to be detected in the information stream is formed in advance; the information stream, i.e. electronic documents selected from information resources, is received; the electronic documents from the information stream are processed sequentially; a list of elements and a list of words are extracted from each electronic document submitted for processing, using lexical analysis of the text, which provides preparatory normalization of the processed documents; information features are extracted according to established rules and compared with the control information features from a database containing all reference information, including all morphological and semantic characteristics of word combinations, as well as synonyms and thematically related words; based on the comparison, the presence or absence of the identification features to be detected is recorded for each electronic document submitted for processing; based on this analysis, a decision is made on further processing of the electronic documents; and these documents are processed using detailed m…
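The screening stage described in the claim (lexical analysis, comparison against control features, recording presence or absence, deciding on further processing) can be illustrated with a minimal Python sketch. The claim does not specify data structures or thresholds, so everything named here is an assumption made for illustration: the `CONTROL_FEATURES` dictionary, the `min_hits` threshold, and the use of lowercasing as a stand-in for the preparatory normalization.

```python
import re
from dataclasses import dataclass

# Hypothetical control-feature database: each feature maps to a set of
# normalized word forms (the feature word plus synonyms and related words).
CONTROL_FEATURES = {
    "patent": {"patent", "patents", "invention"},
    "search": {"search", "retrieval", "lookup"},
}


@dataclass
class ScreeningResult:
    found: dict            # feature name -> True if present in the document
    process_further: bool  # decision on whether to continue processing


def lexical_analysis(text):
    """Split text into a list of elements and a list of normalized words.

    Elements are word tokens and punctuation marks; words are the word
    tokens lowercased (a simple stand-in for preparatory normalization).
    """
    elements = re.findall(r"\w+|[^\w\s]", text)
    words = [e.lower() for e in elements if re.fullmatch(r"\w+", e)]
    return elements, words


def screen_document(text, control=CONTROL_FEATURES, min_hits=1):
    """Record which control features occur and decide on further processing."""
    _, words = lexical_analysis(text)
    vocabulary = set(words)
    found = {name: bool(vocabulary & forms) for name, forms in control.items()}
    # Assumed decision rule: continue if at least `min_hits` features matched.
    return ScreeningResult(found, sum(found.values()) >= min_hits)


def screen_stream(documents):
    """Process the documents of an information stream one after another."""
    return [screen_document(doc) for doc in documents]
```

For example, `screen_document("New patents on text retrieval.")` marks both hypothetical features as present and flags the document for further processing, while a document with no matching words is recorded as lacking every feature and is dropped from the stream.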
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2007141666/09A RU2007141666A (en) | 2007-11-13 | 2007-11-13 | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2007141666/09A RU2007141666A (en) | 2007-11-13 | 2007-11-13 | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES |
Publications (1)
Publication Number | Publication Date |
---|---|
RU2007141666A (en) | 2009-05-20 |
Family
ID=41021336
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2007141666/09A RU2007141666A (en) | 2007-11-13 | 2007-11-13 | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES |
Country Status (1)
Country | Link |
---|---|
RU (1) | RU2007141666A (en) |
- 2007-11-13: RU application RU2007141666/09A, published as RU2007141666A, not active (Application Discontinuation)
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9959259B2 (en) | 2009-01-02 | 2018-05-01 | Apple Inc. | Identification of compound graphic elements in an unstructured document |
US8380753B2 (en) | 2011-01-18 | 2013-02-19 | Apple Inc. | Reconstruction of lists in a document |
US8886676B2 (en) | 2011-01-18 | 2014-11-11 | Apple Inc. | Reconstruction of lists in a document |
WO2013073999A2 (en) | 2011-11-18 | 2013-05-23 | Общество С Ограниченной Ответственностью "Центр Инноваций Натальи Касперской" | Method for the automated analysis of text documents |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110597988B (en) | Text classification method, device, equipment and storage medium | |
WO2022141861A1 (en) | Emotion classification method and apparatus, electronic device, and storage medium | |
US7761447B2 (en) | Systems and methods that rank search results | |
CN106649818B (en) | Application search intent identification method, device, application search method and server | |
US10394830B1 (en) | Sentiment detection as a ranking signal for reviewable entities | |
US8396867B2 (en) | Identifying and ranking networked biographies and referral paths corresponding to selected qualifications | |
CN112035658B (en) | Enterprise public opinion monitoring method based on deep learning | |
CN111079029B (en) | Sensitive account detection method, storage medium and computer equipment | |
JP2011222004A (en) | System and method for recommending interesting content in information stream | |
CN102663139A (en) | Method and system for constructing emotional dictionary | |
CN103744889B (en) | A kind of method and apparatus for problem progress clustering processing | |
Ozoh et al. | Identification and classification of toxic comments on social media using machine learning techniques | |
CN103744887B (en) | It is a kind of for the method for people search, device and computer equipment | |
CN109446393B (en) | Network community topic classification method and device | |
Jiang et al. | PITT at TREC 2011 session track | |
CN111488453B (en) | Resource grading method, device, equipment and storage medium | |
CN115827989A (en) | Network public opinion artificial intelligence early warning system and method under big data environment | |
RU2007141666A (en) | METHOD FOR COLLECTING, PROCESSING, AND CATALOGIZING TARGET INFORMATION FROM UNSTRUCTURED SOURCES | |
CN104899310B (en) | Information sorting method, the method and device for generating information sorting model | |
Morales-Ramirez et al. | Discovering Speech Acts in Online Discussions: A Tool-supported method. | |
CN109325099A (en) | A kind of method and apparatus of automatically retrieval | |
CN118536957A (en) | Talent post matching method and device based on model screening, medium and equipment | |
CN115860283B (en) | Contribution degree prediction method and device based on knowledge worker portrait | |
JP5315726B2 (en) | Information providing method, information providing apparatus, and information providing program | |
CN116796199A (en) | Project matching analysis system and method based on artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | FA92 | Acknowledgement of application withdrawn (lack of supplementary materials submitted) | Effective date: 20091130 |