RU2003104608A

RU2003104608A - METHOD FOR AUTOMATED PROCESSING OF INFORMATION TEXT MATERIALS

Info

Publication number: RU2003104608A
Application number: RU2003104608/09A
Authority: RU
Inventors: Владимир Фёдорович Хорошевский; Виктор Петрович Клинцов
Original assignee: Онтос Аг
Filing date: 2003-02-18
Publication date: 2004-09-20

Claims

1. A method for the automated processing of information text materials, in which the presence in the processed materials of information whose content can be described by elements of a characterization of the user's information needs is detected, the fact that such information is present and the corresponding elements of the characterization are recorded, and the recorded elements and their combinations are used to present the content of the processed materials to the user graphically, wherein the information text material is processed in an interactive mode, a structure image of the graphical representation of the material is formed as a graph with vertices and links, and a cognitive map of the material is formed once processing is complete, characterized in that:
the user's information needs on a given topic are formed in advance as a directed graph, with the types of objects of interest to the user at the vertices and the typical relations between these types of objects on the arcs;
for each vertex of the graph and each arc connecting a pair of vertices, a separate set of logical constructions is built, each containing on its left side a search template for examples of types of objects and/or examples of typical relations between them, and on its right side operators that record in the text the examples of types of objects and/or typical relations found by the template;
and the text material is processed by performing in sequence:
a preprocessing phase, comprising a stage of morphological analysis of the processed material, in which each word is assigned a morphological tag according to the results of the analysis; a stage of searching the processed material for stable phrases, in which each phrase found is assigned a semantic tag; and a stage of segmenting the processed material into sentences, which consists in identifying the punctuation marks that correspond to sentence ends and recording an end-of-sentence mark;
a processing phase, comprising a stage of extracting examples of typical relations, which consists in searching the processed material for verb groups by comparing words whose morphological tag corresponds to verb groups with the templates on the left sides of the set of logical constructions, recording the fragment of the material that contains a verb group matching the template by means of the operators specified on the right side of the corresponding logical construction, comparing the matched verb groups with the names of the arcs of the graph of the user's information needs, and, when arcs whose names correspond to the found verb groups are detected, recording them as a list of examples of typical relations, whereas if no arc has a name corresponding to a verb group of the processed material, further processing of the material is stopped; and a stage of extracting examples of objects, which is carried out by searching the processed material for examples corresponding to the objects of those vertices of the graph that are connected by arcs whose names match the found verb groups, and recording them in a list of examples of objects of the given type together with the types of objects to which they belong, whereas if the processed material contains no examples of types of objects matching the templates, further processing of the material is stopped;
and a postprocessing phase, performed as a sequence of a stage of forming elementary graphs from the list of examples of typical relations and the list of examples of types of objects, each elementary graph having a “vertex – arc – vertex” structure with examples of the corresponding types of objects at its vertices and an example of the corresponding typical relation connecting the selected vertices on its arc, the list of elementary graphs being recorded, and a stage of merging the elementary graphs into a cognitive map of the processed information text material.
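
The data structures named in claim 1 can be pictured with a minimal Python sketch. It is an illustration only, not the patented implementation: the class names `Rule` and `NeedsGraph`, the use of regular expressions as search templates, and the sample sentence are all assumptions introduced here.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Rule:
    """One logical construction (hypothetical encoding)."""
    template: str  # left side: search template, assumed here to be a regex
    label: str     # right side: label under which a match is recorded

    def apply(self, text):
        # Record every fragment of the text that matches the template.
        return [(self.label, m.group(0)) for m in re.finditer(self.template, text)]

@dataclass
class NeedsGraph:
    """Directed graph of the user's information needs."""
    vertices: dict = field(default_factory=dict)  # object type -> list of Rule
    arcs: dict = field(default_factory=dict)      # (source type, relation, target type) -> list of Rule

graph = NeedsGraph()
graph.vertices["Company"] = [Rule(r"\b[A-Z][a-z]+ (?:Inc|AG)\b", "Company")]
graph.vertices["Person"] = [Rule(r"\b[A-Z][a-z]+ [A-Z][a-z]+ov\b", "Person")]
graph.arcs[("Person", "heads", "Company")] = [Rule(r"\bheads\b", "heads")]

text = "Ivan Petrov heads Alpha AG."  # invented sample sentence
relations = [hit for rules in graph.arcs.values() for r in rules for hit in r.apply(text)]
objects = [hit for rules in graph.vertices.values() for r in rules for hit in r.apply(text)]
```

In this sketch each vertex and each arc carries its own rule set, mirroring the "separate set of logical constructions" of the claim; a real system would match on morphological and semantic tags rather than raw characters.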

2. The method of automated processing of information text materials according to claim 1, characterized in that the structure image of the graphical representation of the user's information needs on a given topic is formed by human experts, who transform the user's information need on a specific topic into a directed graph by selecting and recording the types of objects essential to this topic and the typical relations between them.

3. The method of automated processing of information text materials according to claim 1, characterized in that the typical relations between the types of objects are divided into general and specialized.

4. The method of automated processing of information text materials according to claim 3, characterized in that the list of general relations for a particular topic is fixed and constant, while the list of specialized relations is open to extension and modification.

5. The method of automated processing of information text materials according to claim 4, characterized in that relations typical of a descendant-ancestor hierarchy are classified as general typical relations.

6. The method of automated processing of information text materials according to claim 5, characterized in that the relation “to be an example of” is classified as a general typical relation.

7. The method of automated processing of information text materials according to claim 3, characterized in that relations specific to the selected topic are classified as specialized typical relations.

8. The method of automated processing of information text materials according to claim 1, characterized in that each of the sets of logical constructions is used as a rule for searching the text for examples of types of objects or examples of typical relations between objects.

9. The method of automated processing of information text materials according to claim 1, characterized in that dictionaries of the Russian language are used as a general dictionary, and dictionaries compiled and updated by users are used as specialized dictionaries.

10. The method of automated processing of information text materials according to claim 1, characterized in that the individual units identified as words are sequences of letters between spaces, and/or punctuation marks, and/or special characters, and/or dates, and/or numbers.
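
The unit types listed in claim 10 can be illustrated with a small tokenizer. This is a sketch under stated assumptions: the regular expression, the group names and the `DD.MM.YYYY` date format are choices made here for illustration and are not taken from the patent.

```python
import re

# Alternatives are tried left to right, so dates win over plain numbers.
TOKEN_RE = re.compile(r"""
    (?P<date>\d{1,2}\.\d{1,2}\.\d{4})      # dates such as 18.02.2003 (assumed format)
  | (?P<number>\d+(?:[.,]\d+)?)            # integers and decimals
  | (?P<word>[^\W\d_]+(?:-[^\W\d_]+)*)     # runs of letters, optionally hyphenated
  | (?P<punct>[.,;:!?])                    # punctuation marks
  | (?P<special>\S)                        # any other non-space character
""", re.VERBOSE)

def tokenize(text):
    """Return (unit type, unit text) pairs; whitespace only separates units."""
    return [(m.lastgroup, m.group(0)) for m in TOKEN_RE.finditer(text)]

tokens = tokenize("Filed 18.02.2003, fee $25.")
```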

11. The method of automated processing of information text materials according to claim 1, characterized in that the stage of morphological analysis of the processed text is carried out by isolating the ending of each word of the processed text and comparing the remaining part of the word with the corresponding words in the general dictionary, after which the word is reduced to its normal form and its morphological features are assigned at the same time, where reducing a noun to normal form means recording the word in its proper gender, in the nominative case and in the singular, and reducing a verb to normal form means recording the verb in the infinitive.

12. The method of automated processing of information text materials according to claims 1 and 11, characterized in that gender, number and case are used as morphological features for nouns, and aspect, person and tense for verbs.
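
The ending-stripping analysis of claims 11 and 12 might look roughly like the following toy sketch. The two-entry stem dictionary, the ending tables and the function name `analyze` are invented for illustration; a real general dictionary of Russian would be far larger and would resolve ambiguous endings.

```python
# Hypothetical miniature "general dictionary": stem -> normal form and fixed features.
STEMS = {
    "компани": {"lemma": "компания", "pos": "noun", "gender": "feminine"},
    "купи": {"lemma": "купить", "pos": "verb", "aspect": "perfective"},
}

# Hypothetical ending tables: ending -> features contributed by the ending.
ENDINGS = {
    "noun": {
        "я": {"case": "nominative", "number": "singular"},
        "ю": {"case": "accusative", "number": "singular"},
    },
    "verb": {
        "ть": {"form": "infinitive"},
        "л": {"tense": "past", "number": "singular"},
    },
}

def analyze(word):
    """Split a word into stem + ending and return its lemma and features."""
    word = word.lower()
    for cut in range(len(word), 0, -1):  # try the longest stem first
        stem, ending = word[:cut], word[cut:]
        entry = STEMS.get(stem)
        if entry and ending in ENDINGS[entry["pos"]]:
            return {**entry, **ENDINGS[entry["pos"]][ending]}
    return None  # the word is not covered by the toy dictionary

noun = analyze("компанию")
verb = analyze("купил")
```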

13. The method of automated processing of information text materials according to claim 1, characterized in that the step of searching for stable phrases in the processed information text material is carried out after the stage of morphological analysis.

14. The method of automated processing of information text materials according to claim 1, characterized in that dictionary entries of specialized dictionaries prepared and updated by users are used as stable phrases.

15. The method of automated processing of information text materials according to claim 1, characterized in that the stage of searching for stable phrases in the processed information text material is carried out by searching the material for words and phrases that are present in the specialized dictionaries and recording, for each word and phrase found, the semantic tag from the corresponding dictionary.
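
A dictionary lookup of the kind described in claims 14 and 15 can be sketched as a greedy longest-match scan. The function name `tag_phrases`, the `max_len` window and the sample dictionary with its tags are assumptions made for this illustration.

```python
def tag_phrases(tokens, dictionary, max_len=4):
    """Scan left to right, preferring the longest dictionary phrase at each position.

    Returns (phrase, semantic tag) pairs for every dictionary entry found.
    """
    tagged, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n]).lower()
            if phrase in dictionary:
                tagged.append((phrase, dictionary[phrase]))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1      # no entry starts here; move on by one token
    return tagged

# Invented specialized dictionary: phrase -> semantic tag.
special = {"joint venture": "ORG_FORM", "board of directors": "BODY", "board": "BODY_SHORT"}
found = tag_phrases("The board of directors approved a joint venture".split(), special)
```

Longest-match-first is one design choice among several; it keeps "board of directors" from being recorded as the shorter entry "board".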

16. The method of automated processing of information text materials according to claim 1, characterized in that the stage of segmenting the processed information text material is carried out by identifying a part of it that starts either with a capital letter or after one or more blank lines and ends with a punctuation mark, to which the “end of sentence” mark is assigned.
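
The segmentation rule of claim 16 can be approximated as follows. This is a simplified sketch: the function name `segment` and the choice of `.`, `!` and `?` as sentence-ending marks are assumptions, and abbreviations such as "Inc." followed by a capital letter would be split incorrectly.

```python
import re

def segment(text):
    """Split text into sentences.

    A sentence starts after one or more blank lines or at a capital letter
    that follows sentence-ending punctuation, and it must end with such a mark.
    """
    sentences = []
    for block in re.split(r"\n\s*\n", text):  # blank lines separate blocks
        parts = re.split(r"(?<=[.!?])\s+(?=[A-ZА-ЯЁ])", block.strip())
        for part in parts:
            if part and part[-1] in ".!?":    # the end-of-sentence mark
                sentences.append(part)
    return sentences

sample = "The board met on Monday. It approved the merger!\n\nsecond block ends here?"
sentences = segment(sample)
```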

17. The method of automated processing of information text materials according to claim 1, characterized in that, when searching for verb groups at the stage of extracting examples of typical relations, words or phrases bearing the morphological tag “verb”, “participle” or “verbal noun” are selected.

18. The method of automated processing of information text materials according to claim 1, characterized in that, at the stage of extracting examples of objects, the types of objects located at the vertices of the graph connected by the found arcs are determined and recorded in a list; then, for each recorded type of object, the corresponding set of logical constructions is selected, each of which is used to extract examples of that type of object from the information text material by comparing words or phrases from the processed text with the template on the left side of the logical construction; if the comparison succeeds, the example found is recorded in the list of examples of objects of this type together with the type of object to which it belongs; and if the processed material contains no examples of types of objects matching the templates, further processing of the material is stopped.

19. The method of automated processing of information text materials according to claim 1, characterized in that, where the types of objects and/or typical relations in the graph of the structure image of the graphical representation of the user's information needs are described by additional characteristics, for which corresponding logical constructions for extracting them from the information text material have been created in advance, processing of the information text material continues by searching it for specific fragments corresponding to the described additional characteristics and recording these fragments in the lists of types of objects and/or typical relations.

20. The method of automated processing of information text materials according to claim 1, characterized in that the step of forming elementary graphs comprises the step of forming elementary graphs for examples of typical relations from a list of examples of typical relations and the stage of searching and processing synonyms.

21. The method of automated processing of information text materials according to claim 20, characterized in that, to form elementary graphs for examples of typical relations from the list of examples of typical relations, for each element of that list the corresponding arc is first selected from the graph of the structure image of the graphical representation of the information needs, together with the vertices connected by this arc; examples of objects corresponding to the selected vertices are then selected from the list of types of objects; and for each such triple an elementary graph with a “vertex – arc – vertex” structure is formed, with examples of the corresponding types of objects placed at its vertices and an example of the corresponding typical relation connecting the selected vertices placed on its arc.
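
The triple construction of claim 21 can be sketched as below. The function name `elementary_graphs`, the dictionary layouts and the sample names are hypothetical; the sketch also pairs every source example with every target example, which is one possible reading of "each such triple".

```python
from itertools import product

def elementary_graphs(relation_examples, object_examples, arcs):
    """Build (vertex, arc, vertex) triples.

    relation_examples: names of relations found in the text
    object_examples:   object type -> examples of that type found in the text
    arcs:              relation name -> (source type, target type) in the needs graph
    """
    triples = []
    for rel in relation_examples:
        src_type, dst_type = arcs[rel]
        sources = object_examples.get(src_type, [])
        targets = object_examples.get(dst_type, [])
        for src, dst in product(sources, targets):
            if src != dst:  # skip an example related to itself
                triples.append((src, rel, dst))
    return triples

arcs = {"acquires": ("Company", "Company"), "heads": ("Person", "Company")}
found_objects = {"Company": ["Alpha", "Beta"], "Person": ["Ivanov"]}
triples = elementary_graphs(["heads"], found_objects, arcs)
```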

22. The method of automated processing of information text materials according to claim 20, characterized in that the stage of searching and processing synonyms is carried out by comparing the name of each example of a type of object from the list of types of objects with the entries of a dictionary of synonyms, or by a heuristic algorithm in which two examples of the same type of object are considered synonyms if the representation of the first in the source text is completely “embedded” in the representation of the second, or if their representations in the source text are identical; and, if synonyms are identified, for each pair an elementary graph with a “vertex – arc – vertex” structure is formed, whose first vertex corresponds to the example of the object, whose second vertex corresponds to its synonym, and whose arc corresponds to a relation named “synonym”, the resulting elementary graph being recorded in the general list of elementary graphs.
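
The synonym test of claim 22 can be sketched in a few lines. The function name `synonym_graphs` and the thesaurus layout are assumptions, and "embedded" is interpreted here as case-insensitive substring containment, which is cruder than a word-level comparison would be.

```python
def synonym_graphs(examples, thesaurus=None):
    """Return (example, "synonym", example) triples for examples of ONE object type.

    thesaurus: optional mapping name -> set of known synonyms (hypothetical layout).
    """
    thesaurus = thesaurus or {}
    graphs = []
    for i, a in enumerate(examples):
        for b in examples[i + 1:]:
            in_dictionary = b in thesaurus.get(a, ()) or a in thesaurus.get(b, ())
            # Heuristic: one representation is contained in the other (or equal to it).
            embedded = a.lower() in b.lower() or b.lower() in a.lower()
            if in_dictionary or embedded:
                graphs.append((a, "synonym", b))
    return graphs

by_embedding = synonym_graphs(["Alpha", "Alpha AG", "Beta"])
by_dictionary = synonym_graphs(["Alpha", "the holding"], {"Alpha": {"the holding"}})
```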

23. The method for automated processing of information text materials according to claim 20, characterized in that the step of forming elementary graphs further includes a step of forming elementary graphs for undefined relationships.

24. The method of automated processing of information text materials according to claim 23, characterized in that the stage of forming elementary graphs for undefined relations is carried out for examples of those types of objects that have remained unused but lie within the same sentence, by forming all possible pairs from the set of unused elements of the list of examples of typical relations and of the list of types of objects, and forming for each such pair an elementary graph that also has a “vertex – arc – vertex” structure, whose first vertex corresponds to the first example of an object in the pair, whose second vertex corresponds to the second example of an object in the pair, and whose arc corresponds to an undefined relation named “???”, the resulting elementary graph being recorded in the general list of elementary graphs.
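
The pairing step of claim 24 might be sketched like this. The function name `undefined_graphs` and the input layout (object examples grouped by sentence, plus a set of examples already used) are assumptions introduced for the illustration.

```python
from itertools import combinations

def undefined_graphs(sentence_objects, used):
    """Link unused object examples that share a sentence with a "???" arc.

    sentence_objects: one list of object examples per sentence
    used:             examples that already appear in some elementary graph
    """
    graphs = []
    for objs in sentence_objects:
        unused = [o for o in objs if o not in used]
        # Every possible pair of unused examples within the same sentence.
        graphs.extend((a, "???", b) for a, b in combinations(unused, 2))
    return graphs

pairs = undefined_graphs([["Alpha", "Beta", "Ivanov"], ["Gamma"]], used={"Alpha"})
```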

25. The method of automated processing of information text materials according to claim 1, characterized in that the stage of merging elementary graphs into a cognitive map of the processed information text material is carried out by superimposing identical vertices of the constructed elementary graphs and eliminating duplicate arcs.

26. The method of automated processing of information text materials according to claim 1, characterized in that elementary graphs whose arcs carry the undefined relation “???” are merged into the cognitive map of the processed information text material only after the user confirms that this operation is needed.
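
The merge described in claims 25 and 26 can be sketched with sets, which superimpose identical vertices and drop duplicate arcs automatically. The function name `merge` and the `confirm_undefined` callback, standing in for the user's interactive confirmation, are assumptions.

```python
def merge(elementary, confirm_undefined=lambda triple: False):
    """Merge (vertex, arc, vertex) triples into a cognitive map.

    Returns (vertices, arcs) as sets. Triples with the "???" relation are
    included only if confirm_undefined(triple) returns True.
    """
    vertices, arcs = set(), set()
    for src, rel, dst in elementary:
        if rel == "???" and not confirm_undefined((src, rel, dst)):
            continue                  # the user has not confirmed this arc
        vertices.update((src, dst))   # identical vertices collapse in the set
        arcs.add((src, rel, dst))     # duplicate arcs collapse in the set
    return vertices, arcs

triples = [
    ("Ivanov", "heads", "Alpha"),
    ("Ivanov", "heads", "Alpha"),     # duplicate arc
    ("Alpha", "acquires", "Beta"),
    ("Beta", "???", "Gamma"),
]
strict_map = merge(triples)
confirmed_map = merge(triples, confirm_undefined=lambda triple: True)
```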

27. The method of automated processing of information text materials according to claim 1, characterized in that the stage of forming a cognitive map for a set of processed information text materials is carried out after the last material of the given set has been processed, by superimposing identical vertices of all the constructed cognitive maps and eliminating duplicate arcs.

28. The method of automated processing of information text materials according to claim 1, characterized in that each of the types of objects and/or typical relations has single or multiple characteristics.

29. The method of automated processing of information text materials according to claim 28, characterized in that the characteristics can be numerical, string or reference.
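
The characteristics of claims 28 and 29 could be modelled as follows. This is a hedged sketch: the class names `Reference` and `ObjectExample`, the list-per-key storage for multiple values, and all sample data are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class Reference:
    """A reference characteristic: points at another object example by name."""
    target: str

# A characteristic value is numerical, string, or a reference.
Value = Union[int, float, str, Reference]

@dataclass
class ObjectExample:
    name: str
    type: str
    characteristics: dict = field(default_factory=dict)  # key -> list of Value

    def add(self, key, value):
        # A list per key allows both single and multiple characteristics.
        self.characteristics.setdefault(key, []).append(value)

company = ObjectExample("Alpha", "Company")
company.add("employees", 120)                # numerical (made-up figure)
company.add("alias", "Alpha AG")             # string
company.add("alias", "Alpha Holding")        # second value for the same key
company.add("parent", Reference("Beta"))     # reference to another example
```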