CN108628928B

CN108628928B - Text mining support method and apparatus

Info

Publication number: CN108628928B
Application number: CN201810156475.8A
Authority: CN
Inventors: 西川康平
Original assignee: Screen Holdings Co Ltd
Current assignee: Screen Holdings Co Ltd
Priority date: 2017-03-15
Filing date: 2018-02-24
Publication date: 2021-12-07
Anticipated expiration: 2038-02-24
Also published as: TW201835790A; JP2018152023A; KR20180105566A; CN108628928A; KR102230102B1; JP6829117B2; TWI692696B

Abstract

A text mining support method and apparatus display, when displaying a scatter diagram indicating a result of correspondence analysis, a support screen including the scatter diagram and a hint indicating how to read the scatter diagram. When a scatter diagram relating to words and variables is displayed, the screen specified by the user is displayed from among a basic screen that includes no hint, a 1st support screen that includes as the hint how to judge words near the origin, a 2nd support screen that includes as the hint how to judge the degree of association of the words that characterize a variable, a 3rd support screen that includes as the hint how to judge the degree of similarity between words, and a 4th support screen that includes as the hint how to judge the degree of similarity between variables. This makes it possible to efficiently derive findings from a graph showing the result of the correspondence analysis.

Description

Text mining support method and apparatus

Technical Field

The present invention relates to data mining technology, and more particularly to a text mining support method and apparatus for supporting the execution of text mining.

Background

In recent years, attention has been paid to data mining techniques, which apply data analysis techniques such as statistics or pattern recognition to a large amount of data and derive findings (such as rules appearing in the data) from it. Data mining that targets text data is called text mining. The following considers the case where correspondence analysis, one such data analysis technique, is performed on text data.

In correspondence analysis, the items of a composite table (cross-tabulation table) are rearranged so that the association between the header items and the table-side items is maximized. The result of correspondence analysis is typically represented as a scatter diagram (a two-dimensional chart). For example, performing correspondence analysis on the composite table shown in fig. 2 yields the scatter diagram shown in fig. 3.

Japanese Patent Laid-Open No. 2005-44087 (hereinafter, patent document 1), which is related to the present invention, describes a text mining system that presents to the user an analysis flow for using a plurality of analysis tools. With the system described in that document, even a user with little knowledge or experience of text mining can perform analysis using a plurality of analysis tools in an appropriate order.

Disclosure of Invention

[ problems to be solved by the invention ]

In correspondence analysis, examining the obtained scatter diagram and deriving findings from it is more important than obtaining the scatter diagram itself. However, a user with little knowledge or experience of text mining does not know how to read the scatter diagram, and therefore does not know what to do first even when looking at it. Such a user cannot efficiently derive findings from the scatter diagram.

The system described in patent document 1 presents an analysis flow to the user, but does not support the process of deriving findings from an analysis result. Therefore, the above problem cannot be solved even if the system described in patent document 1 is used.

Accordingly, an object of the present invention is to provide a text mining support method and apparatus that make it possible to efficiently derive findings from a graph showing a result of correspondence analysis.

[ means for solving problems ]

In order to achieve the above object, the present invention has the following features.

The 1st embodiment of the present invention is a text mining support method for displaying an analysis result obtained by correspondence analysis, including:

a step of inputting the analysis result;

a step of inputting an instruction from a user;

a step of generating screen data of a screen including a graph representing the analysis result; and

a step of displaying a screen based on the screen data,

wherein the step of generating screen data generates, in response to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph.

The 2nd embodiment of the present invention is that, in the 1st embodiment of the present invention,

the step of generating screen data generates screen data corresponding to the screen selected by the instruction from among a plurality of support screens and a basic screen that includes the graph and does not include the hint.

The 3rd embodiment of the present invention is that, in the 2nd embodiment of the present invention,

in the step of inputting the analysis result, a result of correspondence analysis on the 1st item and the 2nd item, that is, a result including the 1st component and the 2nd component of the 1st item and the 1st component and the 2nd component of the 2nd item, is input as the analysis result, and

the step of generating screen data generates, as the graph, a scatter diagram in which the 1st item and the 2nd item are plotted in a plane having the 1st component as the horizontal axis and the 2nd component as the vertical axis.

The 4th embodiment of the present invention is that, in the 3rd embodiment of the present invention,

the plurality of support screens include a 1st support screen, and the 1st support screen includes, as the hint, a statement that a 1st item near the origin of the scatter diagram has no significant feature.

The 5th embodiment of the present invention is that, in the 4th embodiment of the present invention,

the range near the origin is shown in the scatter diagram included in the 1st support screen.

The 6th embodiment of the present invention is that, in the 3rd embodiment of the present invention,

the plurality of support screens include a 2nd support screen, and the 2nd support screen includes, as the hint, a statement that a 1st item located in the direction from the origin toward a 2nd item in the scatter diagram characterizes that 2nd item.

The 7th embodiment of the present invention is that, in the 6th embodiment of the present invention,

the range of directions from the origin toward the selected 2nd item is shown in the scatter diagram included in the 2nd support screen.

The 8th embodiment of the present invention is that, in the 3rd embodiment of the present invention,

the plurality of support screens include a 3rd support screen, and the 3rd support screen includes, as the hint, a statement that 1st items close to each other in the scatter diagram are highly similar.

The 9th embodiment of the present invention is that, in the 8th embodiment of the present invention,

the range around the selected 1st item is shown in the scatter diagram included in the 3rd support screen.

The 10th embodiment of the present invention is that, in the 3rd embodiment of the present invention,

the plurality of support screens include a 4th support screen, and the 4th support screen includes, as the hint, a statement that 2nd items close to each other in the scatter diagram are highly similar.

The 11th embodiment of the present invention is that, in the 10th embodiment of the present invention,

a symbol indicating the 2nd item closest to the selected 2nd item is shown in the scatter diagram included in the 4th support screen.

The 12th embodiment of the present invention is that, in the 3rd embodiment of the present invention,

in the step of inputting the analysis result, the result of performing correspondence analysis on a composite table is input as the analysis result, the composite table having words as the 1st items, parts of a text as the 2nd items, and the frequency of occurrence of each word in each part of the text as the in-table data.

The 13th embodiment of the present invention is a text mining support apparatus that displays an analysis result obtained by correspondence analysis, including:

an analysis result input unit for inputting the analysis result;

an instruction input unit for inputting an instruction from a user;

a screen generation unit that generates screen data of a screen including a graph representing the analysis result; and

an analysis result display unit for displaying a screen based on the screen data,

wherein the screen generation unit generates, in response to the instruction, screen data of a support screen including the graph and a hint indicating how to read the graph.

The 14th embodiment of the present invention is that, in the 13th embodiment of the present invention,

the screen generation unit generates screen data corresponding to the screen selected by the instruction from among a plurality of support screens and a basic screen that includes the graph and does not include the hint.

The 15th embodiment of the present invention is that, in the 14th embodiment of the present invention,

the analysis result input unit inputs, as the analysis result, a result of correspondence analysis on the 1st item and the 2nd item, that is, a result including the 1st component and the 2nd component of the 1st item and the 1st component and the 2nd component of the 2nd item, and

the screen generation unit generates, as the graph, a scatter diagram in which the 1st item and the 2nd item are plotted in a plane having the 1st component as the horizontal axis and the 2nd component as the vertical axis.

The 16th embodiment of the present invention is that, in the 15th embodiment of the present invention,

the analysis result input unit inputs, as the analysis result, the result of performing correspondence analysis on a composite table having words as the 1st items, parts of a text as the 2nd items, and the frequency of occurrence of each word in each part of the text as the in-table data.

[ Effect of the invention ]

According to the 1st or 13th embodiment, by using the support screen, which includes the graph showing the result of the correspondence analysis and the hint indicating how to read the graph, the user can efficiently derive findings from the graph.

According to the 2nd or 14th embodiment, by selectively displaying the support screen including the hint and the basic screen not including the hint, a screen suited to the user's level can be displayed. Further, by selectively displaying a plurality of support screens, a plurality of ways of reading the graph can be presented to the user.

According to the 3rd or 15th embodiment, the user can efficiently derive findings from the scatter diagram showing the result of the correspondence analysis on the 1st item and the 2nd item.

According to the 4th embodiment, the user can efficiently derive findings from the graph showing the result of the correspondence analysis, using the knowledge that a 1st item near the origin of the scatter diagram has no significant feature.

According to the 5th embodiment, the user can look at the illustrated range and easily identify the 1st items that have no significant feature.

According to the 6th embodiment, the user can efficiently derive findings from the graph showing the result of the correspondence analysis, using the knowledge that a 1st item located in the direction from the origin toward a 2nd item in the scatter diagram characterizes that 2nd item.

According to the 7th embodiment, the user can look at the illustrated range and easily identify the 1st items that characterize the selected 2nd item.

According to the 8th embodiment, the user can efficiently derive findings from the graph showing the result of the correspondence analysis, using the knowledge that 1st items close to each other in the scatter diagram are highly similar.

According to the 9th embodiment, the user can look at the illustrated range and easily identify the 1st items that are highly similar to the selected 1st item.

According to the 10th embodiment, the user can efficiently derive findings from the graph showing the result of the correspondence analysis, using the knowledge that 2nd items close to each other in the scatter diagram are highly similar.

According to the 11th embodiment, the user can look at the illustrated symbol and easily identify the 2nd item most similar to the selected 2nd item.

According to the 12th or 16th embodiment, the user can efficiently derive findings from the scatter diagram showing the result of correspondence analysis on words and parts of a text.

These and other objects, features, embodiments and effects of the present invention will become more apparent from the following detailed description with reference to the accompanying drawings.

Drawings

Fig. 1 is a block diagram showing a configuration of a text mining support device according to an embodiment of the present invention.

Fig. 2 is a diagram showing a composite table to be subjected to correspondence analysis.

Fig. 3 is a diagram showing a scatter diagram created by the text mining support device shown in fig. 1.

Fig. 4 is a block diagram showing a configuration of a computer functioning as the text mining support device shown in fig. 1.

Fig. 5 is a flowchart showing an operation of the text mining support apparatus shown in fig. 1.

Fig. 6 is a diagram showing a basic screen of the text mining support apparatus shown in fig. 1.

Fig. 7 is a diagram showing a 1st support screen of the text mining support device shown in fig. 1.

Fig. 8 is a diagram showing a 2nd support screen of the text mining support device shown in fig. 1.

Fig. 9 is a diagram showing a 3rd support screen of the text mining support device shown in fig. 1.

Fig. 10 is a diagram showing a 4th support screen of the text mining support device shown in fig. 1.

[ description of symbols ]

1: text data

2: analysis results

5: text analysis device

10: text mining support device

11: analysis result input unit

12: instruction input unit

13: screen generation unit

14: analysis result display unit

20: computer

21: CPU

22: main memory

23: storage unit

24: input unit

25: display unit

26: communication unit

27: recording medium reading unit

28: keyboard

29: mouse

30: recording medium

31: text mining support program

100: basic screen

101: screen selection window

102, 112, 122, 132, 142: scatter diagram window

103: radio button

110, 120, 130, 140: support screen

113, 123, 133: word list window

143: variable list window

114, 124, 134, 144: hint window

115, 135: circle

125, 145: arrow

126, 127: half line

S101 to S112: steps

Detailed Description

Hereinafter, a text mining support method, a text mining support device, and a text mining support program according to embodiments of the present invention will be described with reference to the drawings. The text mining support method according to the present embodiment is typically executed by using a computer. The text mining support device according to the present embodiment is typically configured using a computer. The text mining support program according to the present embodiment is a program for implementing a text mining support method using a computer. A computer that executes the text mining support program functions as a text mining support device.

Fig. 1 is a block diagram showing a configuration of a text mining support device according to an embodiment of the present invention. The text mining support device 10 shown in fig. 1 includes an analysis result input unit 11, an instruction input unit 12, a screen generation unit 13, and an analysis result display unit 14. The result of the correspondence analysis of the text data is input to the text mining support device 10. The text mining support device 10 displays a scatter diagram indicating the inputted analysis result on the screen.

In fig. 1, a text analysis device 5 is provided in the stage preceding the text mining support device 10. Text data 1 is input to the text analysis device 5. In the following description, the text data 1 is text data having a plurality of parts (hereinafter referred to as "chapters"). In the context of correspondence analysis, a "chapter" is also referred to as a "variable". The text analysis device 5 extracts the words included in the text data 1 and creates a composite table in which the words are the table-side items, the chapters are the header items, and the frequency of occurrence of each word in each chapter is the in-table data. The text analysis device 5 performs correspondence analysis on the created composite table and outputs an analysis result 2. Performing the correspondence analysis yields two or more components representing the characteristics of the data being processed. The analysis result 2 includes at least the 1st and 2nd components of each word, the 1st and 2nd components of each variable, the contribution ratio of the 1st component, and the contribution ratio of the 2nd component.

Fig. 2 is a diagram showing a composite table to be subjected to correspondence analysis. The composite table shown in fig. 2 is created by inputting the text of the novel "Ningen Shikkaku" ("No Longer Human") as text data 1 into the text analysis device 5. This novel is a Japanese novel that has 5 chapters, "preamble", "first hand note", "second hand note", "third hand note", and "postscript", and includes words such as "oneself", "human", "flatfish", and "mood". The composite table shown in fig. 2 has words such as "oneself", "human", "flatfish", and "mood" as table-side items, and the 5 variables (chapters) "preamble", "first hand note", "second hand note", "third hand note", and "postscript" as header items. The word "human" appears 38 times in "first hand note". Accordingly, in the composite table shown in fig. 2, 38 is written in the cell (hatched portion) whose table-side item is "human" and whose header item is "first hand note". So that the correspondence analysis can be performed appropriately, the composite table shown in fig. 2 includes only words whose frequency of occurrence is equal to or higher than a predetermined value.
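The construction of such a composite table can be sketched as follows. This is a minimal illustration, not the text analysis device 5 itself: the function name `build_composite_table`, the pre-tokenized input format, and the choice to apply the frequency threshold to the total count across all chapters are assumptions made for the example.

```python
from collections import Counter


def build_composite_table(chapters, min_total=2):
    """Build a word-by-chapter frequency table.

    chapters:  dict mapping chapter name -> list of word tokens
               (hypothetical, already tokenized input).
    min_total: assumed threshold; words whose total count over all
               chapters is below it are left out of the table.
    Returns (header_items, table), where table maps each word to a
    list of counts in the same order as header_items.
    """
    per_chapter = {name: Counter(tokens) for name, tokens in chapters.items()}

    # Total frequency of each word over the whole text.
    totals = Counter()
    for counts in per_chapter.values():
        totals.update(counts)

    header_items = list(chapters)
    words = sorted(w for w, n in totals.items() if n >= min_total)
    table = {w: [per_chapter[name][w] for name in header_items] for w in words}
    return header_items, table


if __name__ == "__main__":
    sample = {
        "preamble": ["human", "oneself", "human"],
        "first hand note": ["human", "mood"],
    }
    header, table = build_composite_table(sample, min_total=2)
    print(header)
    print(table)
```

Each row of the returned table corresponds to one table-side item (a word) and each column to one header item (a chapter), which is the layout the correspondence analysis takes as input.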

Fig. 3 is a diagram showing a scatter diagram created by the text mining support device 10. As described above, the analysis result 2 input to the text mining support device 10 includes at least the 1st component and the 2nd component of each word, the 1st component and the 2nd component of each variable, the contribution ratio of the 1st component, and the contribution ratio of the 2nd component. The screen generation unit 13 creates a scatter diagram by plotting the words and the variables in a plane having the 1st component as the horizontal axis and the 2nd component as the vertical axis. For example, the scatter diagram shown in fig. 3 is created from the analysis result 2 for the composite table shown in fig. 2. The analysis result display unit 14 displays a screen including the created scatter diagram.

In fig. 3, black circles are drawn at the positions of words and hollow squares at the positions of variables; words are set in upright type and variables in italics. Fig. 3 also shows the contribution ratio of the 1st component and the contribution ratio of the 2nd component. In general, the contribution ratio of the 1st component is larger than that of the 2nd component. In view of this, the distance $d$ between two points $P(p_1, p_2)$ and $Q(q_1, q_2)$ in the scatter diagram is defined by the following formula (1), using the contribution ratio $k_1$ of the 1st component and the contribution ratio $k_2$ of the 2nd component.

$$d = \sqrt{\{k_1(p_1 - q_1)\}^2 + \{k_2(p_2 - q_2)\}^2} \qquad (1)$$

The distance in the following description refers to the distance in the scatter diagram defined by formula (1). Because the contribution ratio of the 1st component is generally larger than that of the 2nd component, a circle drawn in the scatter diagram under this distance appears as an ellipse that is shorter in the 1st component direction than in the 2nd component direction.
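The distance of formula (1), and the reason a circle under that distance appears as an ellipse, can be sketched as follows. The function names and the sample contribution ratios are hypothetical and chosen only for illustration.

```python
import math


def weighted_distance(p, q, k1, k2):
    """Distance of formula (1): each component difference is scaled by
    the contribution ratio of its component before taking the norm."""
    return math.hypot(k1 * (p[0] - q[0]), k2 * (p[1] - q[1]))


def ellipse_semi_axes(radius, k1, k2):
    """Semi-axes, in component coordinates, of the set of points whose
    formula (1) distance from a center equals `radius`.

    Along the 1st component, k1 * |dx| = radius, so |dx| = radius / k1;
    along the 2nd component, |dy| = radius / k2.
    """
    return radius / k1, radius / k2


if __name__ == "__main__":
    k1, k2 = 0.6, 0.3  # assumed contribution ratios, with k1 > k2
    print(weighted_distance((1.0, 0.0), (0.0, 0.0), k1, k2))
    print(weighted_distance((0.0, 1.0), (0.0, 0.0), k1, k2))
    print(ellipse_semi_axes(0.3, k1, k2))
```

With k1 larger than k2, the semi-axis along the 1st component is the shorter one, which matches the elliptical appearance described above.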

Fig. 4 is a block diagram showing a configuration of a computer functioning as the text mining support device 10. The computer 20 shown in fig. 4 includes a Central Processing Unit (CPU) 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, a Dynamic Random Access Memory (DRAM). The storage unit 23 is, for example, a hard disk or a solid-state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores a program and the like. The recording medium 30 is a non-transitory recording medium such as a Compact Disc Read-Only Memory (CD-ROM) or a Digital Versatile Disc Read-Only Memory (DVD-ROM).

When the computer 20 executes the text mining support program 31, the storage unit 23 stores the text mining support program 31 and the analysis result 2. The text mining support program 31 and the analysis result 2 may be received from a server or another computer using the communication unit 26, or may be read from the recording medium 30 using the recording medium reading unit 27, for example.

When the text mining support program 31 is executed, the text mining support program 31 and the analysis result 2 are copied and transferred to the main memory 22. The CPU 21 processes the analysis result 2 stored in the main memory 22 by executing the text mining support program 31 stored in the main memory 22 using the main memory 22 as a work memory. At this time, the computer 20 functions as the text mining support device 10. The configuration of the computer 20 described above is merely an example, and the text mining support device 10 may be configured using any computer.

A user with knowledge or experience of text mining has the following knowledge about a scatter diagram that represents the result of correspondence analysis, and can use this knowledge to derive findings from the scatter diagram.

1st knowledge: "words near the origin have no significant features"

2nd knowledge: "words located in the direction from the origin toward a variable are highly associated with that variable and characterize it"

3rd knowledge: "words that are close to each other are highly similar"

4th knowledge: "variables that are close to each other are highly similar"

On the other hand, a user with little knowledge or experience of text mining does not have the knowledge described above, and therefore cannot efficiently derive findings from the scatter diagram. To solve this problem, the text mining support device 10 displays, in response to an instruction from the user, not only a basic screen that includes the scatter diagram but also a support screen that includes the scatter diagram and a hint indicating how to read the scatter diagram.

The operation of each part of the text mining support device 10 will be described with reference to fig. 1. The analysis result input unit 11 receives the analysis result 2 output from an external device (for example, the text analysis device 5). An instruction from the user is input to the instruction input unit 12. The screen generation unit 13 creates a scatter diagram representing the analysis result 2 and generates screen data of a screen including the scatter diagram. In response to the user's instruction input through the instruction input unit 12, the screen generation unit 13 selectively generates either screen data of a support screen, which includes the scatter diagram and a hint, or screen data of the basic screen, which includes the scatter diagram and no hint. The analysis result display unit 14 displays a screen based on the screen data generated by the screen generation unit 13. In the following, the text mining support device 10 displays 4 types of support screens, which are referred to as the 1st to 4th support screens.

Fig. 5 is a flowchart showing the operation of the text mining support device 10. First, the CPU 21 transfers the analysis result 2 output from the text analysis device 5 to the main memory 22. The analysis result 2 is thereby input to the text mining support device 10 (step S101). Next, the CPU 21 creates a scatter diagram from the analysis result 2 (step S102). The scatter diagram is created by plotting the words and the variables in a plane having the 1st component as the horizontal axis and the 2nd component as the vertical axis. Next, the CPU 21 generates screen data of the basic screen including the scatter diagram created in step S102 (step S103). Next, the CPU 21 causes the display unit 25 to display the basic screen based on the screen data generated in step S103 (step S104).

Fig. 6 is a diagram showing the basic screen. The basic screen 100 shown in fig. 6 includes a screen selection window 101 and a scatter diagram window 102. The scatter diagram shown in fig. 3 is drawn in the scatter diagram window 102. The screen selection window 101 has 6 radio buttons 103, hereinafter referred to as the 1st to 6th radio buttons. The 1st to 6th radio buttons correspond, in order, to the basic screen, the 1st to 4th support screens, and "end". While the basic screen 100 is displayed, the user operates the keyboard 28 or the mouse 29 to press one of the 1st to 6th radio buttons. An instruction from the user is thereby input.

The CPU 21 accepts the instruction from the user input through the screen selection window 101 (step S105). The CPU 21 then branches to one of the following steps according to the instruction (step S106). When the instruction is "basic screen" (the 1st radio button is pressed), the CPU 21 proceeds to step S107 and generates screen data of the basic screen in the same manner as in step S103 (step S107). When the instruction is "1st support screen" (the 2nd radio button is pressed), the CPU 21 proceeds to step S108 and generates screen data of the 1st support screen (step S108). When the instruction is "2nd support screen" (the 3rd radio button is pressed), the CPU 21 proceeds to step S109 and generates screen data of the 2nd support screen (step S109). When the instruction is "3rd support screen" (the 4th radio button is pressed), the CPU 21 proceeds to step S110 and generates screen data of the 3rd support screen (step S110). When the instruction is "4th support screen" (the 5th radio button is pressed), the CPU 21 proceeds to step S111 and generates screen data of the 4th support screen (step S111). When the instruction is "end" (the 6th radio button is pressed), the CPU 21 ends the processing.

After executing any one of steps S107 to S111, the CPU 21 proceeds to step S112 and causes the display unit 25 to display a screen based on the screen data generated in that step (step S112). The CPU 21 then returns to step S105. In this way, the text mining support device 10 displays the screen selected from the basic screen and the 1st to 4th support screens in response to an instruction from the user.
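The control flow of steps S101 to S112 can be sketched as a simple dispatch loop. This is only an outline under stated assumptions: the function name `run_support_loop`, the instruction strings, and the representation of screen data as plain strings are hypothetical, and the actual screen generation is replaced by placeholder callables.

```python
def run_support_loop(analysis_result, read_instruction, display, generators):
    """Outline of the flow in fig. 5.

    analysis_result:  the input analysis result (step S101).
    read_instruction: callable returning the next user instruction (S105).
    display:          callable that shows screen data (S104, S112).
    generators:       dict mapping an instruction name to a callable that
                      builds screen data from the analysis result.
    """
    # S102-S104: create and show the basic screen first.
    display(generators["basic"](analysis_result))

    while True:
        instruction = read_instruction()          # S105
        if instruction == "end":                  # S106: "end" selected
            break
        screen_data = generators[instruction](analysis_result)  # S107-S111
        display(screen_data)                      # S112, then back to S105


if __name__ == "__main__":
    # Placeholder generators that just label the screen they would build.
    names = ["basic", "support1", "support2", "support3", "support4"]
    generators = {n: (lambda result, n=n: f"{n} screen") for n in names}

    instructions = iter(["support1", "support3", "basic", "end"])
    run_support_loop({}, lambda: next(instructions), print, generators)
```

Keeping the screen generators in a lookup table mirrors the branch at step S106: adding a further support screen would only require one more entry.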

The components of the computer 20 shown in fig. 4 and the steps shown in fig. 5 correspond to the components of the text mining support device 10 shown in fig. 1 as follows. The CPU 21 executing step S101 functions as the analysis result input unit 11. The input unit 24 and the CPU 21 executing step S105 function as the instruction input unit 12. The CPU 21 that executes step S102 to step S103 and step S106 to step S111 functions as the screen generating unit 13. The display unit 25 and the CPU 21 that executes step S104 and step S112 function as the analysis result display unit 14.

Fig. 7 is a diagram showing the 1st support screen. The 1st support screen 110 shown in fig. 7 includes a screen selection window 101, a scatter diagram window 112, a word list window 113, and a hint window 114. The 1st support screen 110 relates to the 1st knowledge, "words near the origin have no significant features". By looking at the 1st support screen 110, the user can efficiently derive findings from the scatter diagram using the 1st knowledge.

Before the 1st support screen 110 is displayed, the user operates the keyboard 28 or the mouse 29 to specify the range to be regarded as near the origin. An initial value for this range may be determined in advance. The scatter diagram shown in fig. 3 is drawn in the scatter diagram window 112, together with a circle 115 (which appears as an ellipse) indicating the vicinity of the origin. The circle 115 is preferably drawn in a color different from that of the scatter diagram (for example, red). In this way, the range near the origin is illustrated by the circle 115 in the scatter diagram included in the 1st support screen 110. The user can therefore look at the illustrated range and easily identify the words that have no significant features.

The word list window 113 shows a word list containing the words located near the origin (the words within the circle 115) and the distance of each word from the origin, arranged in ascending order of distance. The upward-pointing triangle in the word list window 113 indicates that the list is sorted in ascending order of distance. In the hint window 114, the 1st knowledge is stated under the title "Key point of analysis". The hint window 114 is placed so that it overlaps the scatter diagram window 112.

The size of the circle 115 may be determined by any method. For example, the size of the circle 115 can be determined by the user specifying the number of words to be contained in the circle 115 (for example, 10 words). Alternatively, it may be determined by the user specifying the proportion of words to be contained in the circle 115 (for example, 10% of all words). Alternatively, the user may determine the size of the circle 115 by specifying the distance from the origin on the 1st support screen 110 using the mouse 29.
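The sizing options above and the contents of the word list window 113 can be sketched as follows. The function names, the word-to-coordinates dictionary, and the use of the contribution-ratio-weighted distance from the origin are assumptions for illustration; when several words are tied at the chosen radius, the circle may contain more words than requested.

```python
import math


def distance_from_origin(point, k1, k2):
    """Contribution-ratio-weighted distance of a plotted point from the origin."""
    return math.hypot(k1 * point[0], k2 * point[1])


def radius_for_word_count(words, k1, k2, count):
    """Smallest radius whose circle contains at least `count` words."""
    distances = sorted(distance_from_origin(p, k1, k2) for p in words.values())
    if not distances:
        return 0.0
    count = max(1, min(count, len(distances)))
    return distances[count - 1]


def radius_for_proportion(words, k1, k2, proportion):
    """Radius whose circle contains the given proportion of all words."""
    count = math.ceil(len(words) * proportion)
    return radius_for_word_count(words, k1, k2, count)


def words_near_origin(words, k1, k2, radius):
    """(word, distance) pairs inside the circle, nearest first."""
    rows = [(w, distance_from_origin(p, k1, k2)) for w, p in words.items()]
    return sorted((r for r in rows if r[1] <= radius), key=lambda r: r[1])


if __name__ == "__main__":
    # Hypothetical component coordinates of four words.
    words = {"a": (0.1, 0.0), "b": (1.0, 0.0), "c": (0.0, 4.0), "d": (3.0, 3.0)}
    k1, k2 = 0.5, 0.25
    radius = radius_for_word_count(words, k1, k2, count=2)
    for word, dist in words_near_origin(words, k1, k2, radius):
        print(word, dist)
```

The third option, a radius specified directly with the mouse, corresponds to calling `words_near_origin` with that radius.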

In the 1st support screen 110 shown in fig. 7, the words near the origin (the words within the circle 115) are displayed in the same style as the other words. Alternatively, on the 1st support screen the words near the origin may be displayed in a style different from that of the other words (for example, in a lighter color), or may not be displayed at all. On the 2nd to 4th support screens, words that are displayed in a different style on the 1st support screen may likewise be displayed in a different style, and words that are not displayed on the 1st support screen may likewise be omitted.

Fig. 8 is a diagram showing the 2nd support screen. The 2nd support screen 120 shown in fig. 8 includes a screen selection window 101, a scatter diagram window 122, a word list window 123, and an enlightenment window 124. The 2nd support screen 120 relates to the 2nd knowledge: "a word located in the direction from the origin toward a variable is highly associated with that variable and characterizes it". By viewing the 2nd support screen 120, the user can efficiently derive findings from the scatter diagram using the 2nd knowledge.

Before the 2nd support screen 120 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable (chapter). Here, the case where the variable "prolog" is selected is described. The scatter diagram shown in fig. 3 is displayed in the scatter diagram window 122. In the scatter diagram window 122, an arrow 125 extending from the origin through the selected variable is drawn, together with two half-lines 126 and 127 that start at the origin and each form a predetermined angle (for example, 10°) with the arrow 125. The words located in the direction from the origin toward the selected variable lie within the area bounded by the half-lines 126 and 127. In this way, in the scatter diagram included in the 2nd support screen 120, the range of directions from the origin toward the selected variable is illustrated using the half-line 126 and the half-line 127. Therefore, by viewing the illustrated range, the user can easily identify the words that characterize the selected variable.

The word list window 123 displays a word list in which the words located in the direction from the origin toward the selected variable (the words in the area between the half-line 126 and the half-line 127) and the distance between each word and the origin are arranged in order of distance from far to near. The downward triangle in the word list window 123 indicates that the list is sorted by distance from far to near. In connection with the 2nd knowledge, the word list window 123 also states that "the farther a word is from the origin, the higher its degree of association can be judged to be". The enlightenment window 124 describes the 2nd knowledge under the title "main point of analysis". The enlightenment window 124 is disposed at a position overlapping the scatter diagram window 122.
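One way to pick out the words between the two half-lines is to compare the angle between each word's direction and the arrow's direction with the predetermined angle. The sketch below assumes this approach; the function name, the default of 10 degrees, and the sample data are illustrative assumptions, and the sketch does not handle the case where a half-line is moved onto a component axis.

```python
import math

def words_toward_variable(word_coords, variable_xy, half_angle_deg=10.0):
    """Return (word, distance) pairs inside the sector, farthest first."""
    vx, vy = variable_xy
    v_len = math.hypot(vx, vy)
    limit = math.radians(half_angle_deg)
    rows = []
    for word, (x, y) in word_coords.items():
        distance = math.hypot(x, y)
        if distance == 0.0 or v_len == 0.0:
            continue  # a point at the origin has no direction
        # Angle between origin->word and origin->variable via the dot product.
        cos_theta = (x * vx + y * vy) / (distance * v_len)
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        if theta <= limit:
            rows.append((word, distance))
    rows.sort(key=lambda row: row[1], reverse=True)  # far to near
    return rows

# Hypothetical coordinates, for illustration only.
words = {
    "night": (2.0, 2.1),
    "letter": (0.5, 0.45),
    "door": (1.0, 0.0),
    "hand": (-1.0, -1.0),
}
selected_variable = (1.0, 1.0)  # hypothetical position of the selected variable

for word, distance in words_toward_variable(words, selected_variable):
    print(f"{word}\t{distance:.3f}")
```

Sorting from far to near puts the words with the highest degree of association, in the sense of the 2nd knowledge, at the top of the list.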

The angle that the arrow 125 forms with each of the half-lines 126 and 127 may be determined by any method, as long as the half-lines 126 and 127 lie in the same quadrant as the arrow 125. If drawing a half-line at the given angle from the arrow 125 would place it in a quadrant different from that of the arrow 125, that half-line is instead drawn on the 1st component axis or the 2nd component axis. The arrow 125 is preferably drawn in a color different from that of the scatter diagram (e.g., red). The half-lines 126 and 127 are preferably drawn in a color (for example, blue) different from those of the scatter diagram and the arrow 125.
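The rule of keeping both half-lines in the arrow's quadrant can be sketched as a clamping of angles. This example assumes angles measured counter-clockwise in degrees from the positive 1st component axis; the function name is hypothetical. When the arrow lies exactly on an axis the quadrant is ambiguous, and this sketch arbitrarily uses the quadrant on the counter-clockwise side.

```python
import math

def half_line_angles(variable_xy, half_angle_deg=10.0):
    """Return the directions (degrees) of the two half-lines.

    Each half-line is clamped to the boundary of the arrow's quadrant,
    i.e. onto a component axis, if it would otherwise leave that quadrant.
    """
    vx, vy = variable_xy
    arrow = math.degrees(math.atan2(vy, vx)) % 360.0  # direction of the arrow
    quadrant_start = math.floor(arrow / 90.0) * 90.0
    quadrant_end = quadrant_start + 90.0
    lower = max(arrow - half_angle_deg, quadrant_start)
    upper = min(arrow + half_angle_deg, quadrant_end)
    return lower, upper

# An arrow close to the positive 1st component axis: the lower half-line
# is clamped to 0 degrees, i.e. it is drawn on the axis itself.
print(half_line_angles((1.0, 0.1)))
```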

Fig. 9 is a diagram showing the 3rd support screen. The 3rd support screen 130 shown in fig. 9 includes a screen selection window 101, a scatter diagram window 132, a word list window 133, and an enlightenment window 134. The 3rd support screen 130 relates to the 3rd knowledge: "words that are close to each other have high similarity". By viewing the 3rd support screen 130, the user can efficiently derive findings from the scatter diagram using the 3rd knowledge.

Before the 3rd support screen 130 is displayed, the user operates the keyboard 28 or the mouse 29 to select one word and to designate the range to be regarded as near the selected word. Here, the case where the word "eye" is selected is described. The scatter diagram shown in fig. 3 is displayed in the scatter diagram window 132, together with a circle 135 (oval in appearance) indicating the vicinity of the selected word. The circle 135 is preferably drawn in a color different from that of the scatter diagram (e.g., red). In this way, in the scatter diagram included in the 3rd support screen 130, the range near the selected word is illustrated using the circle 135. Therefore, by viewing the illustrated range, the user can easily identify words having a high degree of similarity to the selected word.

The word list window 133 displays a word list in which the words located near the selected word (the words within the circle 135) and the distance between each word and the selected word are arranged in order of distance from near to far. In connection with the 3rd knowledge, the word list window 133 states that "the closer a word is to the selected word, the higher its similarity can be judged to be". In this example, the word closest to the selected word "eye" is "face". Therefore, the word having the highest similarity to the selected word "eye" is "face". This is described in the enlightenment window 134 under the title "main point of analysis". The enlightenment window 134 is disposed at a position overlapping the scatter diagram window 132.

The size of the circle 135 is determined by an arbitrary method, similarly to the size of the circle 115 in the 1st support screen 110. For example, the size of the circle 135 is determined by the user specifying the number of words contained in the circle 135, the proportion of words contained in the circle 135, or the distance from the selected word.

Fig. 10 is a diagram showing the 4th support screen. The 4th support screen 140 shown in fig. 10 includes a screen selection window 101, a scatter diagram window 142, a variable list window 143, and an enlightenment window 144. The 4th support screen 140 relates to the 4th knowledge: "variables that are close to each other have high similarity". By viewing the 4th support screen 140, the user can efficiently derive findings from the scatter diagram using the 4th knowledge.

Before the 4th support screen 140 is displayed, the user operates the keyboard 28 or the mouse 29 to select one variable. Here, the case where the variable "prolog" is selected is described. The scatter diagram shown in fig. 3 is displayed in the scatter diagram window 142. In the scatter diagram window 142, an arrow 145 is drawn that starts at the selected variable and ends at the variable closest to the selected variable. The arrow 145 is preferably drawn in a color different from that of the scatter diagram (e.g., red). In this way, in the scatter diagram included in the 4th support screen 140, the arrow 145 indicating the variable closest to the selected variable is illustrated. Therefore, by viewing the illustrated arrow 145, the user can easily identify the variable having the highest similarity to the selected variable.

The variable list window 143 displays a variable list in which the other variables and their distances from the selected variable are arranged in order of distance from near to far. In connection with the 4th knowledge, the variable list window 143 states that "the closer a variable is to the selected variable, the higher its similarity can be judged to be". In this example, the variable closest to the selected variable "prolog" is "postscript". Therefore, the variable having the highest similarity to the selected variable "prolog" is "postscript". This is described in the enlightenment window 144 under the title "main point of analysis". The enlightenment window 144 is disposed at a position overlapping the scatter diagram window 142.
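The variable list and the end points of the arrow 145 can be obtained from the same distance ranking, as in the sketch below. The function names and all coordinates are assumptions of this example; only the variable names "prolog" and "postscript" come from the description, and "chapter 1" and "chapter 2" are hypothetical.

```python
import math

def rank_variables(variable_coords, selected):
    """Return (variable, distance) pairs for the other variables, nearest first."""
    sx, sy = variable_coords[selected]
    rows = [
        (name, math.hypot(x - sx, y - sy))
        for name, (x, y) in variable_coords.items()
        if name != selected  # the selected variable itself is not listed
    ]
    rows.sort(key=lambda row: row[1])  # near to far
    return rows

def arrow_to_nearest(variable_coords, selected):
    """Return (start, end) points of an arrow to the closest variable, or None."""
    ranked = rank_variables(variable_coords, selected)
    if not ranked:
        return None
    nearest_name = ranked[0][0]
    return variable_coords[selected], variable_coords[nearest_name]

# Hypothetical coordinates, for illustration only.
variables = {
    "prolog": (0.5, 0.5),
    "postscript": (0.6, 0.4),
    "chapter 1": (-0.8, 0.2),
    "chapter 2": (0.1, -0.9),
}

print(rank_variables(variables, "prolog"))
print(arrow_to_nearest(variables, "prolog"))
```

The same ranking, applied to word coordinates and restricted to a given radius, would yield the word list of the 3rd support screen.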

The text mining support device 10 may display support screens other than those described above. A support screen may include any content as long as it includes the scatter diagram and a hint indicating how to view the scatter diagram. The hint may indicate how to view the scatter diagram either explicitly or implicitly. The hint may be placed in any part of the support screen: in a window overlapping the scatter diagram window, in a window not overlapping the scatter diagram window, or in a message box whose position is fixed.

As described above, the text mining support method according to the present embodiment includes: a step of inputting an analysis result 2; a step of inputting an instruction from a user; a step of generating screen data of a screen including a graph (scatter diagram) indicating the analysis result 2; and a step of displaying the screen according to the screen data. In the step of generating screen data, screen data of a support screen including the graph and a hint indicating how to view the graph is generated in response to the instruction. Therefore, using the support screen including the graph showing the result of the correspondence analysis and the hint indicating how to view the graph, the user can efficiently derive findings from the graph showing the result of the correspondence analysis.

In the step of generating screen data, screen data is generated for the screen selected by the instruction from among the plurality of support screens (the 1st support screen 110, the 2nd support screen 120, the 3rd support screen 130, and the 4th support screen 140) and the basic screen 100, which includes the graph but not the hint. By selectively displaying the support screens, which include a hint, and the basic screen, which does not, a screen suited to the user's level can be displayed. Further, by selectively displaying the plurality of support screens, a plurality of ways of viewing the graph can be presented to the user.
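The selection among the basic screen and the four support screens can be pictured as a simple dispatch on the user's instruction. The sketch below is only an illustration: the `Screen` enumeration, the dictionary-based screen data, and the wording of the hint strings (paraphrased from the 1st to 4th knowledge) are assumptions of this example, not the data format of the embodiment.

```python
from enum import Enum

class Screen(Enum):
    BASIC = 0      # graph only, no hint
    SUPPORT_1 = 1  # words near the origin
    SUPPORT_2 = 2  # words characterizing a variable
    SUPPORT_3 = 3  # similarity between words
    SUPPORT_4 = 4  # similarity between variables

# Hint text per support screen (paraphrased; hypothetical wording).
HINTS = {
    Screen.SUPPORT_1: "Words near the origin have no prominent feature.",
    Screen.SUPPORT_2: "Words lying in the direction of a variable characterize it.",
    Screen.SUPPORT_3: "Words close to each other have high similarity.",
    Screen.SUPPORT_4: "Variables close to each other have high similarity.",
}

def generate_screen_data(analysis_result, selected):
    """Build screen data for the screen chosen by the user's instruction."""
    return {
        "screen": selected.name,
        "graph": analysis_result,      # data to be plotted as the scatter diagram
        "hint": HINTS.get(selected),   # None for the basic screen
    }

print(generate_screen_data({"words": {}, "variables": {}}, Screen.SUPPORT_3))
```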

In the step of inputting the analysis result, a result of correspondence analysis on the 1st item (word) and the 2nd item (variable), that is, a result including the 1st component and the 2nd component of the 1st item and the 1st component and the 2nd component of the 2nd item, is input as the analysis result 2. In the step of generating screen data, a scatter diagram in which the 1st item and the 2nd item are plotted on a plane having the 1st component as the horizontal axis and the 2nd component as the vertical axis is created as the graph. Therefore, the user can efficiently derive findings from the scatter diagram showing the result of the correspondence analysis on the 1st item and the 2nd item.

The plurality of support screens include: a 1st support screen 110 including, as the hint, a statement that a 1st item near the origin in the scatter diagram has no prominent feature; a 2nd support screen 120 including, as the hint, a statement that a 1st item located in the direction from the origin toward a 2nd item in the scatter diagram characterizes that 2nd item; a 3rd support screen 130 including, as the hint, a statement that 1st items close to each other in the scatter diagram have high similarity; and a 4th support screen 140 including, as the hint, a statement that 2nd items close to each other in the scatter diagram have high similarity. Therefore, the user can efficiently derive findings from the graph showing the result of the correspondence analysis using the hints included in the respective support screens.

In the scatter diagram included in the 1st support screen 110, the range near the origin is illustrated using the circle 115. In the scatter diagram included in the 2nd support screen 120, the range of directions from the origin toward the selected 2nd item is illustrated using the half-line 126 and the half-line 127. In the scatter diagram included in the 3rd support screen 130, the range around the selected 1st item is illustrated using the circle 135. In the scatter diagram included in the 4th support screen 140, a symbol (arrow 145) indicating the 2nd item closest to the selected 2nd item is illustrated. Therefore, by viewing the range or the symbol shown in each support screen, the user can easily identify the 1st items having no prominent feature, the 1st items that characterize the selected 2nd item, the 1st items having a high degree of similarity to the selected 1st item, and the 2nd item having a high degree of similarity to the selected 2nd item.

In the step of inputting the analysis result, a result of performing correspondence analysis on a composite table in which words are the 1st item, parts of a text are the 2nd item, and the frequency of occurrence of each word in each part of the text is the data in the table is input as the analysis result. Therefore, the user can efficiently derive findings from the scatter diagram showing the result of the correspondence analysis on the words and the parts of the text.
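A table of the kind described above, with words as rows, parts of the text as columns, and occurrence frequencies as cells, can be built as in the following sketch. The function name and the sample data are assumptions of this example, and the text is assumed to have been split into words beforehand; the correspondence analysis itself is not shown.

```python
from collections import Counter

def build_frequency_table(parts):
    """Build {word: {part: frequency}} from {part: list of words}."""
    counts = {part: Counter(words) for part, words in parts.items()}
    vocabulary = sorted(set().union(*[c.keys() for c in counts.values()]))
    # A Counter returns 0 for a word that does not occur in a part.
    return {
        word: {part: counts[part][word] for part in parts}
        for word in vocabulary
    }

# Hypothetical, already tokenized text parts, for illustration only.
parts = {
    "prolog": ["eye", "face", "eye"],
    "postscript": ["face", "door"],
}

table = build_frequency_table(parts)
for word, row in table.items():
    print(word, row)
```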

The text mining support device 10 according to the present embodiment and the text mining support program 31 according to the present embodiment have the same features and achieve the same effects as the text mining support method according to the present embodiment.

In the above description, the text mining support device 10 displays a scatter diagram that represents the result of the correspondence analysis two-dimensionally. The present invention is not limited to this, and can also be applied to a text mining support method and apparatus that display a graph (e.g., a three-dimensional graph) representing the result of the correspondence analysis in three or more dimensions. Further, in the same manner as the text mining support method and apparatus that display a scatter diagram showing the result of correspondence analysis on a composite table relating to text data, a data mining support method and apparatus can be configured that display a graph (a scatter diagram, a three-dimensional graph, or the like) showing the result of correspondence analysis on a composite table relating to arbitrary data other than text data.

According to the text mining support method and apparatus of the present invention, by displaying the support screen including the graph showing the result of the correspondence analysis and the hint indicating how to view the graph, the user can efficiently derive findings from the graph showing the result of the correspondence analysis.

The present invention has been described in detail, but the above description is illustrative and not restrictive in all respects. It is to be understood that many other variations or modifications may be made without departing from the scope of the invention.

Claims

1. A text mining support method for displaying an analysis result obtained by correspondence analysis, characterized by comprising:

a step of inputting the analysis result;

a step of inputting an instruction from a user;

a step of displaying a screen according to screen data; and

a step of generating screen data of a support screen including the graph and a hint indicating how to view the graph in accordance with the instruction,

the step of generating screen data generates screen data corresponding to a screen selected by the instruction from among a plurality of support screens and a basic screen including the chart and not including the hint,

2. The text mining support method according to claim 1, wherein: the plurality of support screens include a 1st support screen, and the 1st support screen includes, as the hint, a statement that a 1st item near the origin in a scatter diagram has no prominent feature.

3. The text mining support method according to claim 2, wherein: a range near the origin is shown in the scatter diagram included in the 1st support screen.

4. The text mining support method according to claim 1, wherein: the plurality of support screens include a 2nd support screen, and the 2nd support screen includes, as the hint, a statement that a 1st item located in the direction from the origin toward a 2nd item in a scatter diagram characterizes the 2nd item.

5. The text mining support method according to claim 4, wherein: a range of directions from the origin toward a selected 2nd item is shown in the scatter diagram included in the 2nd support screen.

6. The text mining support method according to claim 1, wherein: the plurality of support screens include a 3rd support screen, and the 3rd support screen includes, as the hint, a statement that 1st items close to each other in a scatter diagram have high similarity.

7. The text mining support method according to claim 6, wherein: a range around a selected 1st item is shown in the scatter diagram included in the 3rd support screen.

8. The text mining support method according to claim 1, wherein: the plurality of support screens include a 4th support screen, and the 4th support screen includes, as the hint, a statement that 2nd items close to each other in a scatter diagram have high similarity.

9. The text mining support method according to claim 8, wherein: a symbol indicating the 2nd item closest to a selected 2nd item is shown in the scatter diagram included in the 4th support screen.

10. The text mining support method according to claim 1, wherein: in the step of inputting the analysis result, a result of performing correspondence analysis on a composite table in which words are the 1st item, parts of a text are the 2nd item, and the frequency of occurrence of each word in each part of the text is the data in the table is input as the analysis result.

11. A text mining support device that displays an analysis result obtained by correspondence analysis, characterized by comprising:

an analysis result input unit for inputting the analysis result;

an instruction input unit for inputting an instruction from a user;

the screen generating unit generates screen data of a support screen including the graph and a hint indicating how to view the graph in response to the instruction,

the screen generating unit generates screen data corresponding to a screen selected by the instruction from among a plurality of support screens and a basic screen including the graph and not including the hint,

12. The text mining support device according to claim 11, wherein: the analysis result input unit inputs, as the analysis result, a result of performing correspondence analysis on a composite table in which words are the 1st item, parts of a text are the 2nd item, and the frequency of occurrence of each word in each part of the text is the data in the table.