CN106227911B - A Visualization Method of Topic Evolution Based on Circuit Diagram Element Metaphor - Google Patents
- Publication number
- CN106227911B (application CN201610487736.5A)
- Authority
- CN
- China
- Prior art keywords
- word
- theme
- disk
- metaphor
- evolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/398—Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Instructional Devices (AREA)
Abstract
The invention belongs to the field of data visualization analysis and relates in particular to a topic evolution visualization method based on circuit diagram element metaphors. The method includes: preprocessing the text data by word segmentation and stop-word removal, so that the text collection is processed into a lexicon; processing the text collection with the LDA algorithm to extract topics, and recording the words, texts, times, and locations corresponding to each topic; and representing each word in a topic with a pad icon, called a word disk, drawn as a hollow pie chart that shows the geographic distribution proportions of the word within the time period; and so on. The method shows the content of topics, how topic strength changes over time, and the evolutionary relationships between topics, so that users can analyze the evolution process more easily. It can also show the geographic distribution of a topic's strength within each time period.
Description
Technical Field
The invention belongs to the field of data visualization analysis and relates in particular to a topic evolution visualization method based on circuit diagram element metaphors.
Background
Topic evolution is the process by which topics change over time. It is an active research area with wide application in text mining, public opinion analysis, analysis of research hotspots, and related fields. Its main task is to apply topic evolution algorithms to discover trends in how topics change across a document collection. The results of topic evolution research are mostly expressed in mathematical forms such as probabilities, which are hard to interpret; the evolutionary relationships between topics are especially hard to discover. A visual analysis method is therefore needed to display the topic evolution process and help analyze it.
Existing topic evolution visualization methods, such as ThemeRiver, TextFlow, and NEViewer, mainly use stacked charts, alluvial diagrams, and similar forms. They can show how topics evolve over time and how topics relate to one another, but they can only show relative topic strength. Stacked charts emphasize the evolution process but use continuous shapes to represent discrete quantities, which easily leads to misreading. In addition, these methods are not clear enough when there are many topics with complex connections: low-strength topics in particular are easily covered, and the words that make up a topic are shown unclearly or not at all. To solve these problems, the present invention proposes a new topic evolution visualization method based on the circuit diagram metaphor.
Summary of the Invention
The purpose of the present invention is to provide a clearer topic evolution visualization method based on circuit diagram element metaphors.
The purpose of the present invention is achieved as follows:
(1) Data preprocessing: preprocess the text data by word segmentation and stop-word removal, so that the text collection is processed into a lexicon; process the text collection with the LDA algorithm to extract topics, and record the words, texts, times, and locations corresponding to each topic;
(2) Represent each word in a topic with a pad icon, called a word disk. The word disk is drawn as a hollow pie chart that shows the geographic distribution proportions of the word within the time period;
(3) Arrange the word disks of the same topic extracted within the same time period close to one another;
(4) Represent each topic with a component icon, called a topic box. The topic box encloses the word disks of one topic, indicating that they belong to the same topic within the selected time period. The box width represents topic strength, i.e., the number of documents containing the topic in the period, and the box height represents the number of words the topic contains;
(5) Use the "+" and "-" symbols to mark the appearance and disappearance of a word in a topic, with the flow of current from the positive to the negative terminal in a circuit diagram serving as a metaphor for the direction of topic evolution;
(6) At the position where each word appears, display the content of the word in a rounded rectangle to the left of the appearance marker;
(7) Use circuit traces as a metaphor for the evolutionary relationships of words between topics in different time periods, i.e., connect the same word in different time periods with a trace. Each word connection line is drawn in three segments: the first segment leaves the source word disk, the third segment ends at the target word disk, and the middle segment joins the other two. The word strength, i.e., the number of occurrences of the word, is labeled on the first and third segments.
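The preprocessing part of step (1) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the whitespace tokenizer, the tiny stop-word list, and the names `Document`, `preprocess`, and `build_lexicon` are all hypothetical. Chinese text would need a real word segmenter, and the LDA topic extraction itself is left out.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical stop-word list; a real system would load a complete one.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "to"}


@dataclass
class Document:
    text: str
    period: str    # time period the document belongs to
    location: str  # place associated with the document


def preprocess(text: str) -> list[str]:
    """Split text into lowercase words and drop stop words."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def build_lexicon(docs: list[Document]) -> Counter:
    """Count every remaining word across the whole collection."""
    lexicon: Counter = Counter()
    for doc in docs:
        lexicon.update(preprocess(doc.text))
    return lexicon


docs = [
    Document("The evolution of topics in text.", "2015", "Harbin"),
    Document("Topics and circuit metaphors.", "2016", "Beijing"),
]
lexicon = build_lexicon(docs)
```

Keeping `period` and `location` on each document is what later allows words and topics to be tied back to time and place, as the step requires.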
The beneficial effects of the present invention are as follows:
The method shows the content of topics, how topic strength changes over time, and the evolutionary relationships between topics, so that users can analyze the evolution process more easily. It can also show the geographic distribution of a topic's strength within each time period. The invention effectively addresses problems in existing topic evolution visualization methods: unclear representation of strength, insufficiently clear representation of complex topic relationships, and low-strength topics being easily obscured.
Brief Description of the Drawings
Figure 1 is a diagram of the steps of the present invention;
Figure 2 is a diagram of a specific implementation of the present invention;
Figure 3 is a diagram of the geographic distribution shown by the word disks of the present invention.
Detailed Description
The present invention is further described below with reference to the drawings.
A topic evolution visualization method based on circuit diagram element metaphors is implemented through the following steps:
Step 1: Data preprocessing. Preprocess the text data: first perform word segmentation, stop-word removal, and similar operations to process the text collection into a lexicon; then process the text collection with an algorithm such as LDA to extract topics, and record the words, texts, times, locations, and other information corresponding to each topic;
Step 2: Represent each word in a topic with a pad icon, referred to here as a "word disk". The word disk is drawn as a hollow pie chart, as shown by reference numeral 1 in Figure 2, and can show the geographic distribution proportions of the word within the time period, as shown in Figure 3;
Step 3: Arrange the word disks of the same topic extracted within the same time period close to one another, as shown in Figure 3;
Step 4: Represent each topic with a component icon (a rectangular box), referred to here as a "topic box". The topic box encloses the word disks of one topic, indicating that they belong to the same topic within the selected time period. The box width represents topic strength (the number of documents containing the topic in the period), as shown by reference numeral 2 in Figure 2;
Step 5: Use the "+" and "-" symbols to mark the appearance and disappearance of a word in a topic, with the flow of current from the positive to the negative terminal in a circuit diagram serving as a metaphor for the direction of topic evolution, as shown by reference numerals 3 and 4 in Figure 2;
Step 6: At the position where each word appears, display the content of the word in a rounded rectangle to the left of the appearance marker, as shown by reference numeral 5 in Figure 2;
Step 7: Use circuit traces as a metaphor for the evolutionary relationships of words between topics in different time periods, i.e., connect the same word in different time periods with a trace. Each word connection line is drawn in three segments: the first segment leaves the source word disk, the third segment ends at the target word disk, and the middle segment joins the other two, as shown by reference numeral 6 in Figure 2. The word strength (the number of occurrences of the word) is labeled on the first and third segments, as shown by reference numeral 7 in Figure 2.
In step 2, the words included in a topic are represented by pad icons. Each word disk is drawn as a hollow pie chart and can show the geographic distribution proportions of the word within the time period.
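As a rough sketch of how a word disk's ring segments could be derived, the function below converts per-location counts for one word in one time period into arc angles. The function name `word_disk_arcs`, the alphabetical ordering of locations, and the use of degrees are assumptions made for the example; the patent does not specify them.

```python
from __future__ import annotations


def word_disk_arcs(location_counts: dict[str, int]) -> list[tuple[str, float, float]]:
    """Return (location, start_angle, end_angle) in degrees for each ring segment."""
    total = sum(location_counts.values())
    if total == 0:
        return []  # the word did not occur anywhere in this period
    arcs = []
    start = 0.0
    # Sort locations so the segment order is stable between time periods.
    for location, count in sorted(location_counts.items()):
        sweep = 360.0 * count / total  # share of the ring for this location
        arcs.append((location, start, start + sweep))
        start += sweep
    return arcs


arcs = word_disk_arcs({"Harbin": 3, "Beijing": 1})
```

Here the word occurred three times in Harbin and once in Beijing, so Beijing takes a quarter of the ring and Harbin the remaining three quarters.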
In step 4, each topic is represented by a component icon (a rectangular box), referred to here as a "topic box". The topic box encloses the word disks of one topic, indicating that they belong to the same topic within the selected time period. The box width represents topic strength (the number of documents containing the topic in the period), and the box height represents the number of words the topic contains.
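The mapping from topic data to box size can be illustrated with a short sketch. The linear scaling and the parameters `unit_width` and `row_height` are hypothetical choices for the example; the patent states only that width encodes the document count and height encodes the word count.

```python
from dataclasses import dataclass


@dataclass
class TopicBox:
    width: float   # encodes topic strength (documents containing the topic)
    height: float  # encodes the number of words in the topic


def topic_box(doc_count: int, word_count: int,
              unit_width: float = 4.0, row_height: float = 20.0) -> TopicBox:
    """Size a topic box linearly from document and word counts (assumed scaling)."""
    return TopicBox(width=doc_count * unit_width, height=word_count * row_height)


box = topic_box(doc_count=25, word_count=3)
```

Because width is proportional to an absolute document count rather than a share of the total, boxes from different time periods can be compared directly.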
In step 5, the "+" and "-" symbols mark the appearance and disappearance of a word in a topic, with the flow of current from the positive to the negative terminal in a circuit diagram serving as a metaphor for the direction of topic evolution.
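One possible way to decide which marker a word disk carries is sketched below. It rests on an assumption the patent does not spell out: a word is treated as appearing ("+") when it is absent from the previous time period and as disappearing ("-") when it is absent from the next one, with the ends of the data treated as absent. The function name `mark_words` is hypothetical.

```python
from __future__ import annotations


def mark_words(periods: list[set[str]]) -> list[dict[str, str]]:
    """For each period, map every word to "+", "-", "+-" or "" (no marker)."""
    marks = []
    for i, words in enumerate(periods):
        prev_words = periods[i - 1] if i > 0 else set()
        next_words = periods[i + 1] if i + 1 < len(periods) else set()
        period_marks = {}
        for word in words:
            mark = ""
            if word not in prev_words:
                mark += "+"  # word appears in this period
            if word not in next_words:
                mark += "-"  # word disappears after this period
            period_marks[word] = mark
        marks.append(period_marks)
    return marks


marks = mark_words([{"circuit", "topic"}, {"topic", "metaphor"}, {"metaphor"}])
```

Reading the markers from "+" to "-" gives the direction of evolution, matching the current-flow metaphor.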
In step 7, circuit traces serve as a metaphor for the evolutionary relationships of words between topics in different time periods, i.e., a trace connects the same word in different time periods.
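Finding which word disks to connect can be sketched as a simple matching between two time periods. The data layout (a mapping from topic name to per-word occurrence counts) and the names `WordLink` and `link_periods` are assumptions for illustration only.

```python
from __future__ import annotations

from typing import NamedTuple


class WordLink(NamedTuple):
    word: str
    source_topic: str
    target_topic: str
    source_strength: int  # occurrences of the word in the earlier period
    target_strength: int  # occurrences of the word in the later period


def link_periods(earlier: dict[str, dict[str, int]],
                 later: dict[str, dict[str, int]]) -> list[WordLink]:
    """Link every word that occurs in a topic of both time periods."""
    links = []
    for src_topic, src_words in earlier.items():
        for dst_topic, dst_words in later.items():
            for word in sorted(src_words.keys() & dst_words.keys()):
                links.append(WordLink(word, src_topic, dst_topic,
                                      src_words[word], dst_words[word]))
    return links


earlier = {"T1": {"circuit": 5, "topic": 2}}
later = {"T2": {"topic": 4}, "T3": {"circuit": 1}}
links = link_periods(earlier, later)
```

In this example topic T1 feeds two later topics: "topic" carries over into T2 and "circuit" into T3, which is the kind of split that the traces are meant to make visible.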
Each word connection trace is drawn in three segments: the first segment leaves the source word disk, the third segment ends at the target word disk, and the middle segment joins the other two. The word strength (the number of occurrences of the word) is labeled on the first and third segments.
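The three-segment trace can be sketched geometrically as below. The orthogonal routing (horizontal, vertical, horizontal) through the midpoint between the two disks is an assumption chosen to resemble circuit traces; the patent specifies only that there are three segments and that strength labels sit on the first and third. The names `route_word_line` and `label_positions` are hypothetical.

```python
from __future__ import annotations

Point = tuple[float, float]


def route_word_line(source: Point, target: Point) -> list[Point]:
    """Return the four vertices of a three-segment trace between two word disks."""
    (x1, y1), (x2, y2) = source, target
    mid_x = (x1 + x2) / 2  # assumed: the middle segment runs halfway between disks
    return [(x1, y1), (mid_x, y1), (mid_x, y2), (x2, y2)]


def label_positions(path: list[Point]) -> tuple[Point, Point]:
    """Place strength labels at the midpoints of the first and third segments."""
    def midpoint(a: Point, b: Point) -> Point:
        return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return midpoint(path[0], path[1]), midpoint(path[2], path[3])


path = route_word_line((0.0, 10.0), (100.0, 50.0))
labels = label_positions(path)
```

The first label would show the word's strength in the source period and the second its strength in the target period.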
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610487736.5A CN106227911B (en) | 2016-06-28 | 2016-06-28 | A Visualization Method of Topic Evolution Based on Circuit Diagram Element Metaphor |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610487736.5A CN106227911B (en) | 2016-06-28 | 2016-06-28 | A Visualization Method of Topic Evolution Based on Circuit Diagram Element Metaphor |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106227911A (en) | 2016-12-14 |
| CN106227911B (en) | 2019-08-06 |
Family
ID=57519733
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610487736.5A (CN106227911B, Active) | 2016-06-28 | 2016-06-28 | A Visualization Method of Topic Evolution Based on Circuit Diagram Element Metaphor |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106227911B (en) |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103164545A (en) * | 2011-12-09 | 2013-06-19 | 北京邮电大学 | Visual editing method of virtual electronic components |
| CN105426556A (en) * | 2014-09-19 | 2016-03-23 | 北京华大九天软件有限公司 | Visualization analysis method for image layer relation in layout design rule file |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2000019369A1 (en) * | 1998-10-01 | 2000-04-06 | Bios Group Lp | Automatic evolution of mixed analog and digital electronic circuits |
- 2016-06-28: CN application CN201610487736.5A filed (patent CN106227911B, status: Active)
Non-Patent Citations (1)
| Title |
|---|
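| Yin Guisheng, "Evolution analysis of software architecture oriented to component semantic relationships," Journal of Harbin Engineering University, 2011-10-31, pp. 1329-1335 |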
Also Published As
| Publication number | Publication date |
|---|---|
| CN106227911A (en) | 2016-12-14 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN112836052B (en) | Automobile comment text viewpoint mining method, equipment and storage medium | |
| CN105095288B (en) | Data analysis method and data analysis device | |
| US9626414B2 (en) | Automatic log record segmentation | |
| CN104794212A (en) | Context sentiment classification method and system based on user comment text | |
| CN105431886A (en) | Hierarchical visualization of rendered datasets | |
| CN109726712A (en) | Character recognition method, device, storage medium, and server | |
| CN107085726A (en) | Single character location method in oracle bone rubbings based on multi-method denoising and connected region analysis | |
| US11615618B2 (en) | Automatic image annotations | |
| CN112036373A (en) | Method for training video text classification model, video text classification method and device | |
| CN107977379A (en) | Method and apparatus for mined information | |
| CN104537392B (en) | A kind of method for checking object based on the semantic part study of identification | |
| Wu et al. | Automatic object extraction from images using deep neural networks and the level‐set method | |
| CN105657575B (en) | Video labeling method and device | |
| CN106227911B (en) | A Visualization Method of Topic Evolution Based on Circuit Diagram Element Metaphor | |
| CN113128513A (en) | Small sample training method based on target segmentation | |
| Kucher et al. | Visual Analysis of Text Annotations for Stance Classification with ALVA. | |
| US10360993B2 (en) | Extract information from molecular pathway diagram | |
| CN114461806A (en) | Training method and device of advertisement recognition model and advertisement shielding method | |
| CN103455607A (en) | Method for automatically converting waveform image file into preset waveform data file | |
| CN112463844A (en) | Data processing method and device, electronic equipment and storage medium | |
| CN118626637A (en) | A visual data processing method based on cloud server | |
| CN104036252A (en) | Image processing method, image processing device and electronic device | |
| CN103593337B (en) | A kind of method for visualizing of image-text set | |
| Yang et al. | Automated extraction of lecture outlines from lecture videos | |
| CN113553815B (en) | Method and device for automatic generation of intelligent report description based on hierarchical attention pointer generation network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||