CN106815320B - Investigation big data visual modeling method and system based on expanded three-dimensional histogram - Google Patents

Info

Publication number
CN106815320B
Authority
CN
China
Prior art keywords
data
dimensional
original
dimensional histogram
expanded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611225454.4A
Other languages
Chinese (zh)
Other versions
CN106815320A (en
Inventor
胡钦太
黄昌勤
张瑜
卢春和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGZHOU CREATEVIEW OPTOELECTRONICS TECHNOLOGY Co Ltd
South China Normal University
Original Assignee
GUANGZHOU CREATEVIEW OPTOELECTRONICS TECHNOLOGY Co Ltd
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGZHOU CREATEVIEW OPTOELECTRONICS TECHNOLOGY Co Ltd and South China Normal University
Priority to CN201611225454.4A
Publication of CN106815320A
Application granted
Publication of CN106815320B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/177 Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18 Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Educational Technology (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a survey big data visualization modeling method and system based on an extended three-dimensional histogram. The method comprises: initializing a three-dimensional visualization model; preprocessing the survey data by category according to their specific data types; reading the original survey data and, in accordance with the requirements of the three-dimensional visualization model and the results of the per-category preprocessing, extracting and normalizing extended three-dimensional histogram data to generate extended three-dimensional histogram data that conform to the standard format of the three-dimensional visualization model; and displaying the generated extended three-dimensional histogram data visually by computer graphics methods. Because it is based on the extended three-dimensional histogram, the invention can produce a unified overall visual analysis result and is more widely applicable; because the extended three-dimensional histogram, as a visual chart, is linked to the original survey data, fidelity is high; and because a suitable preprocessing method can be applied to each data type, the processing is more effective. The invention can be widely applied in the field of big data processing.

Description

Survey big data visualization modeling method and system based on an extended three-dimensional histogram

Technical Field

The invention relates to the field of big data processing, and in particular to a survey big data visualization modeling method and system based on an extended three-dimensional histogram.

Background

Educational equipment is a prerequisite for the modernization of education. Educational decision-making departments need to collect and compile survey data on the level of information technology application in order to understand how infrastructure has been provisioned during educational informatization and to establish how software and hardware facilities are actually used. They also need to describe the overall progress of educational informatization on the basis of the survey data and to generate reports such as application analyses, problem diagnoses, planning consultations and development decisions, so that the rich survey data can support the assessment, consultation and planning of the implementation status, application effect and development level of educational informatization at every level.

In recent years, the huge investment in educational informatization, together with the large gap between its actual application and the results that had been expected, has forced people to attend to deeper issues such as strategic decision-making and return on investment. More and more people are paying attention to the performance of educational informatization and are shifting its focus from investment and the provision of informatization solutions, platforms and systems to the integration, value assessment and sustainable development of educational informatization. Evaluation of educational informatization is likewise moving from judging its level mainly by input to judging it mainly by performance, so that performance evaluation can promote the application and development of educational informatization. However, performance evaluation of educational informatization is still at an exploratory stage both at home and abroad. It is a difficult task. On the one hand, educational informatization is not only a dynamic process of development but also a multi-input, multi-output problem whose outputs are hard to measure with quantitative indicators. On the other hand, the field still lacks mature theoretical guidance and suitable measurement methods and tools. In addition, the benefits of educational informatization are diverse: social benefits matter more than economic benefits, long-term benefits more than current ones, and derived benefits more than inherent ones. Performance evaluation of educational informatization has therefore become an important and pressing topic of general concern.

Survey data on educational informatization involve fairly complex data types and are dynamic, changing with circumstances, and the huge volume of data produced by large-scale provincial and municipal surveys urgently calls for means of automated processing and visual analysis. Conventional visualization techniques mainly perform comparative statistics and visualization on simplified numeric types; as the volume and variety of data grow sharply, they no longer meet the needs of survey analysis, and in that case the data are usually analyzed and processed by clustering.

The traditional cluster analysis methods mainly include the following:

1. Partitioning methods

Given a data set of N tuples or records, a partitioning method constructs K groups, each representing one cluster, with K < N. The K groups satisfy the following conditions: (1) each group contains at least one data record; (2) each data record belongs to one and only one group (this requirement can be relaxed in some fuzzy clustering algorithms). For a given K, the algorithm first produces an initial grouping and then changes the grouping iteratively so that each revised grouping is better than the previous one. The criterion for "better" is that records in the same group are as close together as possible and records in different groups are as far apart as possible. Algorithms built on this basic idea include K-MEANS, K-MEDOIDS and CLARANS.

Most partitioning methods are distance based. Given the number of partitions K to construct, a partitioning method first creates an initial partition; it then applies an iterative relocation technique that improves the partition by moving objects from one group to another. The usual criterion for a good partition is that objects in the same cluster are as close or as related to each other as possible, while objects in different clusters are as far apart or as different as possible. Traditional partitioning methods can only be extended to subspace clustering rather than searching the entire data space, which suits data that have many attributes and are sparse. Reaching the global optimum with a partition-based method may require enumerating all possible partitions, which is computationally prohibitive. In practice, most applications use popular heuristics such as the k-means and k-medoids algorithms, which improve clustering quality step by step and approach a local optimum. These heuristic methods are well suited to finding spherical clusters in small and medium-sized databases. Finding clusters with complex shapes and clustering very large data sets requires further extensions of partition-based clustering.
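As a rough illustration of the iterative relocation idea described above, the following Python sketch runs a minimal k-means loop. It is not taken from the patent: the function name `kmeans`, the use of the first K records as initial centroids and the iteration cap are all assumptions made for brevity.

```python
import math

def kmeans(points, k, iterations=100):
    """Minimal k-means on a list of equal-length numeric tuples."""
    # Assumption: seed the centroids with the first k records (simple, not optimal).
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each record to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:  # keep the old centroid if a group became empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_assignment == assignment:
            break  # no record changed group, so a local optimum was reached
        assignment = new_assignment
    return assignment, centroids

# Usage with made-up two-dimensional records.
labels, centres = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

The loop stops at a local optimum, which matches the heuristic behaviour the text attributes to k-means; a different seeding can yield a different grouping.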

2. Hierarchical methods

A hierarchical method decomposes a given data set hierarchically until some condition is met, and can be either "bottom-up" or "top-down". In the bottom-up scheme, for example, each data record initially forms its own group; in subsequent iterations, the method merges neighbouring groups until all records form a single group or some termination condition is met. Representative algorithms include BIRCH, CURE and CHAMELEON.

Hierarchical clustering methods can be based on distance, density or connectivity. Some extensions also consider subspace clustering. The drawback of hierarchical methods is that once a step (a merge or a split) has been carried out, it cannot be undone. This rigidity keeps the computational cost low, because the method does not have to consider the combinatorial number of alternative choices, but it also means that a wrong decision cannot be corrected. The clustering quality of hierarchical methods therefore still needs to be improved.
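The bottom-up scheme can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `agglomerate`, the single-linkage distance and the stopping rule (a target number of groups) are assumptions, and it is not one of the named algorithms (BIRCH, CURE, CHAMELEON).

```python
import math

def agglomerate(points, target_groups):
    """Bottom-up clustering: merge the two closest groups until target_groups remain."""
    target_groups = max(1, target_groups)
    groups = [[p] for p in points]  # every record starts as its own group

    def linkage(a, b):
        # Single linkage (an assumption): distance between the closest pair of records.
        return min(math.dist(p, q) for p in a for q in b)

    while len(groups) > target_groups:
        pairs = ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups)))
        i, j = min(pairs, key=lambda ij: linkage(groups[ij[0]], groups[ij[1]]))
        groups[i] = groups[i] + groups[j]  # the merge is final and never revisited
        del groups[j]
    return groups

# Usage with made-up one-dimensional records.
clusters = agglomerate([(0,), (1,), (10,), (11,)], target_groups=2)
```

Because each merge is final, an early poor merge propagates to the result, which is the weakness the paragraph above points out.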

3. Model-based methods

A model-based method assumes a model for each cluster and then looks for the data that fit that model well. Such a model may be, for example, the density distribution function of the data points in space. The underlying assumption is that the target data set is generated by a series of probability distributions. Model-based methods generally follow one of two directions: statistical approaches and neural-network approaches.

In summary, current visualization modeling methods for informatization survey big data have the following defects or deficiencies:

(1) They achieve only simple, local, data-level visualization, which suits only the direct comparison of data of the same type; they cannot produce a unified overall visual analysis result, so their applicability is limited.

(2) The visualization is irreversible with respect to the data processing, so fidelity is low.

(3) They cannot meet the processing requirements of complex multi-type data, and cannot apply a suitable, effective preprocessing method to each data type.

Summary of the Invention

To solve the above technical problems, one object of the invention is to provide a survey big data visualization modeling method based on an extended three-dimensional histogram that is widely applicable, high in fidelity and effective.

Another object of the invention is to provide a survey big data visualization modeling system based on an extended three-dimensional histogram that is widely applicable, high in fidelity and effective.

The technical solution adopted by the invention is:

A survey big data visualization modeling method based on an extended three-dimensional histogram, comprising the following steps:

initializing a three-dimensional visualization model;

preprocessing the survey data by category according to their specific data types;

reading the original survey data and, in accordance with the requirements of the three-dimensional visualization model and the results of the per-category preprocessing, extracting and normalizing extended three-dimensional histogram data to generate extended three-dimensional histogram data that conform to the standard format of the three-dimensional visualization model, wherein the horizontal dimension of the extended three-dimensional histogram is made up of different reporting entities at different levels, the vertical dimension covers multiple complex data types, and the Z dimension is a structure composed of cell attributes, cell height and top texture that is linked to the original survey data; the reporting entities at different levels include but are not limited to provinces, cities, counties/districts and schools, the complex data types include but are not limited to logical, text, numeric and enumerated data, and the top texture uses different colours to indicate the trend of the data;

displaying the generated extended three-dimensional histogram data visually by computer graphics methods.
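One way to picture the three dimensions described in these steps is as a grid of cells keyed by reporting entity and survey item, each carrying its Z-axis structure and a link back to the source record. The sketch below is an assumed data layout, not the patent's format: every class name, field name and colour value is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class DataType(Enum):
    LOGICAL = "logical"
    TEXT = "text"
    NUMERIC = "numeric"
    ENUMERATED = "enumerated"

class Trend(Enum):
    RISING = "rising"
    FLAT = "flat"
    FALLING = "falling"

# Hypothetical colour coding; the source only says that colours mark the trend.
TREND_COLOURS = {Trend.RISING: "green", Trend.FLAT: "grey", Trend.FALLING: "red"}

@dataclass
class Cell:
    subject: str         # horizontal axis: reporting entity, e.g. "province/city/school"
    item: str            # vertical axis: survey item
    data_type: DataType  # cell attribute
    height: float        # Z axis: normalized cell height
    trend: Trend         # drives the top texture
    source_id: str       # link back to the original survey record

    @property
    def top_colour(self):
        return TREND_COLOURS[self.trend]

@dataclass
class ExtendedHistogram:
    cells: dict = field(default_factory=dict)

    def put(self, cell):
        self.cells[(cell.subject, cell.item)] = cell

    def source_of(self, subject, item):
        # Going from a displayed cell back to its record keeps the chart traceable.
        return self.cells[(subject, item)].source_id

# Usage with made-up identifiers.
hist = ExtendedHistogram()
hist.put(Cell("province-1/city-1/school-1", "has_broadband",
              DataType.LOGICAL, 1.0, Trend.RISING, "rec-001"))
```

Storing `source_id` on every cell is one simple way to realize the stated link between the chart and the original survey data.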

Further, the step of initializing the three-dimensional visualization model comprises:

determining the absolute width, the data interval and the total number of smallest-unit dimensions in the horizontal and vertical directions of the extended three-dimensional histogram;

determining the specific dimension structure of the horizontal and vertical coordinates of the extended three-dimensional histogram;

determining the total height of the extended three-dimensional histogram along the z axis, the position of the ground plane, and how height is represented for the different data types;

setting the top texture parameters of the extended three-dimensional histogram along the z axis.
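The initialization parameters listed above could be gathered into a small configuration object. The following is a sketch under assumptions: the names `AxisLayout` and `ModelConfig`, the default values and the formula that derives a cell width from the absolute width, interval and unit count are illustrative choices, not details given in the source.

```python
from dataclasses import dataclass, field

@dataclass
class AxisLayout:
    absolute_width: float  # total extent of the axis in scene units
    gap: float             # data interval between neighbouring cells
    unit_count: int        # number of smallest-unit dimensions on the axis

    @property
    def cell_width(self):
        # Width left for each cell once the gaps between cells are subtracted.
        return (self.absolute_width - self.gap * (self.unit_count - 1)) / self.unit_count

    def offset(self, index):
        # Position of the near edge of cell `index` along the axis.
        return index * (self.cell_width + self.gap)

@dataclass
class ModelConfig:
    horizontal: AxisLayout       # reporting entities
    vertical: AxisLayout         # survey items / data types
    total_height: float = 10.0   # assumed z-axis extent
    ground_level: float = 0.0    # assumed position of the ground plane
    trend_colours: dict = field(
        default_factory=lambda: {"up": "green", "flat": "grey", "down": "red"}
    )

# Usage with made-up dimensions.
config = ModelConfig(horizontal=AxisLayout(98.0, 2.0, 10),
                     vertical=AxisLayout(50.0, 0.0, 5))
```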

Further, the step of preprocessing the survey data by category according to their specific data types comprises:

reading in the original survey data;

processing the original survey data by category according to their specific data types to obtain processed data;

archiving the processed data.

Further, the step of processing the original survey data by category according to their specific data types to obtain processed data is specifically as follows:

If the original survey data are numeric data, first determine the value range of the numeric data, then determine their average, then determine the method for mapping the numeric values, and finally mark the trend of each individual numeric datum. If the original survey data are logical data, first enumerate the value range of each survey item of the logical data, then determine the reference value of the logical data, then determine the method for mapping the logical values to numbers, and finally mark the trend of each individual logical datum. If the original survey data are text data, first list the keywords of the text data, then extract a summary of the text data, then determine the keyword mapping method, and finally mark the trend of each individual text datum.
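The per-type processing above can be read as a dispatch on data type. The sketch below is one possible interpretation, not the patent's implementation: the function names, the min-max mapping for numeric data, the keyword hit count for text, and the reading of "trend" as a comparison against the reference value are all assumptions.

```python
def _trend(value, reference):
    # Assumed meaning of "trend": position of a datum relative to its reference value.
    if value > reference:
        return "up"
    if value < reference:
        return "down"
    return "flat"

def preprocess_numeric(values):
    lo, hi = min(values), max(values)      # value range
    mean = sum(values) / len(values)       # average used as the reference
    span = (hi - lo) or 1                  # avoid division by zero for constant data
    return {"range": (lo, hi), "reference": mean,
            "mapped": [(v - lo) / span for v in values],
            "trend": [_trend(v, mean) for v in values]}

def preprocess_logical(values, reference=True):
    mapped = [1.0 if v else 0.0 for v in values]  # map truth values to numbers
    ref = 1.0 if reference else 0.0
    return {"range": (False, True), "reference": reference,
            "mapped": mapped, "trend": [_trend(m, ref) for m in mapped]}

def preprocess_text(texts, keywords, summary_length=40):
    mapped = [sum(t.count(k) for k in keywords) for t in texts]  # keyword hit count
    mean = sum(mapped) / len(mapped)
    return {"keywords": list(keywords),
            "summaries": [t[:summary_length] for t in texts],    # naive truncation
            "mapped": mapped, "trend": [_trend(m, mean) for m in mapped]}

PREPROCESSORS = {"numeric": preprocess_numeric,
                 "logical": preprocess_logical,
                 "text": preprocess_text}

def preprocess(data_type, values, **options):
    return PREPROCESSORS[data_type](values, **options)

# Usage with made-up survey answers.
result = preprocess("numeric", [1, 2, 3])
```

Adding a further data type, such as enumerated data, would only require registering another function in `PREPROCESSORS`.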

Further, the step of reading the original survey data and, in accordance with the requirements of the three-dimensional visualization model and the results of the per-category preprocessing, extracting and normalizing extended three-dimensional histogram data to generate extended three-dimensional histogram data that conform to the standard format of the three-dimensional visualization model comprises:

reading the original survey data record by record and parsing each record layer by layer according to the data format predefined for composite-structure survey data, until the smallest data units of the original survey data have been parsed out;

extracting the data needed for graphical representation from the original survey data according to their specific data types, the requirements of the three-dimensional visualization model and the results of the per-category preprocessing, and calculating the corresponding data trends;

normalizing the extracted graphical representation data and the calculated data trends for the three-dimensional visualization model to obtain data that conform to its standard format;

writing the data that conform to the standard format of the three-dimensional visualization model into the extended three-dimensional histogram data set.
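The layer-by-layer parsing in the first of these steps maps naturally onto a recursive walk over nested records. The sketch below assumes records arrive as nested dictionaries and lists with a `subject` and an `items` key; that layout and the function names are hypothetical, and the extraction and normalization steps are left out to keep the example short.

```python
def iter_leaves(node, path=()):
    """Walk a nested record and yield (path, value) for each smallest data unit."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_leaves(value, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_leaves(value, path + (index,))
    else:
        yield path, node  # a scalar is treated as the smallest unit

def build_dataset(records):
    """Read records one by one and write one row per smallest data unit."""
    dataset = []
    for record in records:
        subject = record["subject"]  # assumed key naming the reporting entity
        for path, value in iter_leaves(record["items"]):
            dataset.append({"subject": subject,
                            "item": "/".join(map(str, path)),
                            "value": value})
    return dataset

# Usage with a made-up composite record.
rows = build_dataset([
    {"subject": "school-1",
     "items": {"network": {"broadband": True, "bandwidth_mbps": [100, 200]}}},
])
```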

Further, the step of normalizing the extended three-dimensional histogram data comprises:

taking the ground plane as the reference plane and formulating a cube height normalization strategy according to the characteristics of the different data types;

normalizing the extended three-dimensional histogram data according to the cube height normalization strategy, wherein the height of the extended three-dimensional histogram along the Z axis may lie above or below the ground plane.
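A signed height relative to the ground plane can be obtained by scaling each value's deviation from a reference. The following sketch assumes one particular strategy, scaling by the largest absolute deviation; the source does not specify the formula, and the function name and parameters are hypothetical.

```python
def normalize_heights(values, reference, max_height=1.0):
    """Map values to signed heights: positive above the ground plane, negative below."""
    if not values:
        return []
    deviations = [v - reference for v in values]
    # Assumed strategy: the largest deviation fills the full available height.
    scale = max(abs(d) for d in deviations) or 1.0
    return [max_height * d / scale for d in deviations]

# Usage with made-up values and a reference of 20.
heights = normalize_heights([10, 20, 40], reference=20)
```

A value equal to the reference sits exactly on the ground plane, so a viewer can tell at a glance which cells exceed the reference and which fall short of it.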

Another technical solution adopted by the invention is:

A survey big data visualization modeling system based on an extended three-dimensional histogram, comprising:

a three-dimensional visualization model initialization module, configured to initialize the three-dimensional visualization model;

a survey data per-category preprocessing module, configured to preprocess the survey data by category according to their specific data types;

an extended three-dimensional histogram data generation module, configured to read the original survey data and, in accordance with the requirements of the three-dimensional visualization model and the results of the per-category preprocessing, extract and normalize extended three-dimensional histogram data to generate extended three-dimensional histogram data that conform to the standard format of the three-dimensional visualization model, wherein the horizontal dimension of the extended three-dimensional histogram is made up of different reporting entities at different levels, the vertical dimension covers multiple complex data types, and the Z dimension is a structure composed of cell attributes, cell height and top texture that is linked to the original survey data; the reporting entities at different levels include but are not limited to provinces, cities, counties/districts and schools, the complex data types include but are not limited to logical, text, numeric and enumerated data, and the top texture uses different colours to indicate the trend of the data;

a survey data extended three-dimensional histogram display module, configured to display the generated extended three-dimensional histogram data visually by computer graphics methods.

Further, the survey data per-category preprocessing module comprises:

a data read-in unit, configured to read in the original survey data;

a per-category processing unit, configured to process the original survey data by category according to their specific data types to obtain processed data;

an archiving unit, configured to archive the processed data.

Further, the per-category processing unit specifically performs the following operations:

If the original survey data are numeric data, first determine the value range of the numeric data, then determine their average, then determine the method for mapping the numeric values, and finally mark the trend of each individual numeric datum. If the original survey data are logical data, first enumerate the value range of each survey item of the logical data, then determine the reference value of the logical data, then determine the method for mapping the logical values to numbers, and finally mark the trend of each individual logical datum. If the original survey data are text data, first list the keywords of the text data, then extract a summary of the text data, then determine the keyword mapping method, and finally mark the trend of each individual text datum.

Further, the extended three-dimensional histogram data generation module comprises:

a reading and parsing unit, configured to read the original survey data record by record and parse each record layer by layer according to the data format predefined for composite-structure survey data, until the smallest data units of the original survey data have been parsed out;

an extraction and calculation unit, configured to extract the data needed for graphical representation from the original survey data according to their specific data types, the requirements of the three-dimensional visualization model and the results of the per-category preprocessing, and to calculate the corresponding data trends;

a normalization unit, configured to normalize the extracted graphical representation data and the calculated data trends for the three-dimensional visualization model to obtain data that conform to its standard format;

a writing unit, configured to write the data that conform to the standard format of the three-dimensional visualization model into the extended three-dimensional histogram data set.

本发明的方法的有益效果是:包括进行三维可视化模型初始化,对调研数据按照具体的数据类型进行分类别预处理,进行拓展三维直方图数据提取和规格化以及进行可视化显示的步骤,基于拓展三维直方图,通过拓展三维直方图数据提取和规格化生成符合三维可视化模型标准格式的拓展三维直方图数据并进行可视化显示,能形成统一的整体可视化分析结果,适用性更广;拓展三维直方图Z向维度为由单元格属性、单元高度和顶端纹理组成并与原调研数据相联系的结构,通过拓展三维直方图这一可视化图表与原调研数据相联系,克服了常规可视化对数据处理的不可逆性缺陷,保真度高;拓展三维直方图纵向维度包括多种复杂的数据类型,对数据类型进行了扩展,并增设了对调研数据按照具体的数据类型进行分类别预处理,满足了复杂的多类型数据的处理要求,能根据不同的数据类型采取相应有效的数据预处理方法,更加有效。The beneficial effects of the method of the present invention are: including the steps of initializing a three-dimensional visualization model, preprocessing the survey data according to specific data types, extracting and normalizing the expanded three-dimensional histogram data, and performing visual display, based on the expanded three-dimensional Histogram, through the extraction and normalization of extended 3D histogram data, generates extended 3D histogram data that conforms to the standard format of 3D visualization model and displays it visually, which can form a unified overall visual analysis result and has wider applicability; extended 3D histogram Z The directional dimension is a structure composed of cell attributes, cell heights and top textures and is related to the original research data. By expanding the three-dimensional histogram visualization chart to connect with the original research data, it overcomes the irreversibility of data processing by conventional visualization. Defects and high fidelity; the vertical dimension of the extended 3D histogram includes a variety of complex data types, the data types are expanded, and the preprocessing of the survey data according to specific data types is added, which satisfies the complex and complex data types. It is more effective to take corresponding and effective data preprocessing methods according to the processing requirements of different data types.

The beneficial effects of the system of the present invention are as follows. The system comprises a three-dimensional visualization model initialization module, a survey data category preprocessing module, an expanded three-dimensional histogram data generation module and a survey data expanded three-dimensional histogram display module. Based on the expanded three-dimensional histogram, the data generation module extracts and normalizes expanded three-dimensional histogram data that conforms to the standard format of the three-dimensional visualization model, and the display module displays it visually, so that a unified, overall visual analysis result can be formed and the system is more widely applicable. In the data generation module, the Z dimension of the expanded three-dimensional histogram is a structure composed of cell attributes, cell height and top texture that is linked to the original survey data; because the chart remains linked to the original survey data, the irreversibility of data processing in conventional visualization is overcome and fidelity is high. The vertical dimension of the expanded three-dimensional histogram covers multiple complex data types, which extends the range of supported data types, and a survey data category preprocessing module is added to preprocess the survey data by category according to its specific data type; this meets the processing requirements of complex multi-type data and allows an appropriate preprocessing method to be applied to each data type, which makes the system more effective.

Description of drawings

FIG. 1 is a flow chart of the steps of the survey big data visualization modeling method based on the expanded three-dimensional histogram according to the present invention;

FIG. 2 is a flow chart of the step of performing category-specific processing according to the specific data type of the original survey data according to the present invention;

FIG. 3 is a block diagram of the overall structure of the survey big data visualization modeling system based on the expanded three-dimensional histogram according to the present invention.

Detailed description

Referring to FIG. 1, the survey big data visualization modeling method based on the expanded three-dimensional histogram comprises the following steps:

initializing a three-dimensional visualization model;

preprocessing the survey data by category according to its specific data type;

reading the original survey data and, according to the requirements of the three-dimensional visualization model and the results of the category-specific preprocessing, extracting and normalizing expanded three-dimensional histogram data to generate expanded three-dimensional histogram data that conforms to the standard format of the three-dimensional visualization model, wherein the horizontal dimension of the expanded three-dimensional histogram is formed by different reporting entities at different levels, the vertical dimension includes multiple complex data types, and the Z dimension is a structure composed of cell attributes, cell height and top texture that is linked to the original survey data, wherein the different reporting entities at different levels include but are not limited to provinces, cities, counties and schools, the multiple complex data types include but are not limited to logical data, text data, numerical data and enumeration data, and the top texture uses different colors to represent the change trend of the data;

displaying the generated expanded three-dimensional histogram data visually according to computer graphics methods.

As a further preferred embodiment, the step of initializing the three-dimensional visualization model comprises:

determining the absolute width, the data interval and the total number of minimum unit dimensions in the horizontal and vertical directions of the expanded three-dimensional histogram;

determining the specific dimensional structure of the horizontal and vertical coordinates of the expanded three-dimensional histogram;

determining the total height of the expanded three-dimensional histogram in the z-axis direction, the position of the ground plane, and the height representation method for each data type;

setting the top texture parameters in the z-axis direction of the expanded three-dimensional histogram.
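
By way of non-limiting illustration, the parameters fixed during initialization could be gathered into a configuration object, as in the following Python sketch. The names `AxisSpec` and `ModelConfig`, their fields and the default trend colors are hypothetical and are not specified in this disclosure; the cell-width formula is only one simple assumption about how the absolute width, the data interval and the number of units relate.

```python
from dataclasses import dataclass, field


@dataclass
class AxisSpec:
    """One planar axis (horizontal or vertical) of the expanded 3D histogram."""
    absolute_width: float  # total extent of the axis, in model units
    data_interval: float   # gap between adjacent cells
    unit_count: int        # total number of minimum unit dimensions

    def cell_width(self) -> float:
        # Assumed layout: unit_count cells separated by (unit_count - 1) gaps.
        gaps = self.data_interval * (self.unit_count - 1)
        return (self.absolute_width - gaps) / self.unit_count


@dataclass
class ModelConfig:
    """Hypothetical container for the values chosen at initialization."""
    horizontal: AxisSpec
    vertical: AxisSpec
    total_height: float    # total extent along the z axis
    ground_plane_z: float  # position of the ground plane on the z axis
    # Top texture parameters: one color per change trend (assumed palette).
    trend_colors: dict = field(
        default_factory=lambda: {"up": "green", "flat": "gray", "down": "red"}
    )


axis = AxisSpec(absolute_width=100.0, data_interval=2.0, unit_count=10)
config = ModelConfig(horizontal=axis, vertical=axis,
                     total_height=50.0, ground_plane_z=20.0)
print(round(axis.cell_width(), 3))
```

Placing the ground plane inside the total height, rather than at its base, leaves room for cells that extend below the plane.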

Referring to FIG. 2, as a further preferred embodiment, the step of preprocessing the survey data by category according to its specific data type comprises:

reading in the original survey data;

performing category-specific processing according to the specific data type of the original survey data to obtain processed data;

archiving the processed data.

Referring to FIG. 2, as a further preferred embodiment, the step of performing category-specific processing according to the specific data type of the original survey data to obtain processed data is specifically as follows:

if the original survey data is numerical data, first determining the value range of the numerical data, then determining its average value, then determining the numerical mapping method for the numerical data, and finally marking the change trend of each individual numerical datum; if the original survey data is logical data, first enumerating the value range of each survey item of the logical data, then determining the reference value of the logical data, then determining the numerical mapping method for the logical data, and finally marking the change trend of each individual logical datum; if the original survey data is text data, first listing the keywords of the text data, then extracting a summary of the text data, then determining the keyword mapping method for the text data, and finally marking the change trend of each individual text datum.
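
The category-specific processing described above might be sketched in Python as follows. This is an illustration under stated assumptions, not the mapping methods of the invention: the linear mapping of numerical data to [0, 1], the +1/-1 mapping of logical data against a reference value, and the weighted keyword score for text data are all hypothetical choices, as are the function names. Summary extraction for text data is omitted.

```python
from statistics import mean


def trend(old, new):
    """Label the change of a single datum between two survey rounds."""
    if new > old:
        return "up"
    if new < old:
        return "down"
    return "flat"


def preprocess_numeric(values, previous=None):
    """Value range, average, linear mapping to [0, 1], and per-datum trend."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    mapped = [(v - lo) / span for v in values]
    trends = None
    if previous is not None:
        trends = [trend(old, new) for old, new in zip(previous, values)]
    return {"range": (lo, hi), "mean": mean(values),
            "mapped": mapped, "trends": trends}


def preprocess_logical(values, reference=True):
    """Assumed mapping: +1 when an answer matches the reference value, else -1."""
    return [1 if v == reference else -1 for v in values]


def preprocess_text(text, keyword_weights):
    """Assumed mapping: sum the weights of the listed keywords found in the text."""
    found = [k for k in keyword_weights if k in text]
    return {"keywords": found, "score": sum(keyword_weights[k] for k in found)}


result = preprocess_numeric([10, 20, 30], previous=[10, 25, 20])
print(result["mapped"], result["trends"])
```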

As a further preferred embodiment, the step of reading the original survey data and, according to the requirements of the three-dimensional visualization model and the results of the category-specific preprocessing, extracting and normalizing expanded three-dimensional histogram data to generate expanded three-dimensional histogram data that conforms to the standard format of the three-dimensional visualization model comprises:

reading the original survey data record by record, and parsing it layer by layer according to the data format of the predetermined composite-structure survey data until the smallest data unit of the original survey data is obtained;

extracting the graphical representation data required for the original survey data according to the specific data type of the original survey data, the requirements of the three-dimensional visualization model and the results of the category-specific preprocessing, and calculating the corresponding data change trend;

normalizing the extracted graphical representation data and the calculated data change trend according to the three-dimensional visualization model, to obtain data that conforms to the standard format of the three-dimensional visualization model;

writing the data that conforms to the standard format of the three-dimensional visualization model into the expanded three-dimensional histogram data set.
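
The layer-by-layer parsing down to the smallest data unit can be pictured with the following Python sketch. It assumes, purely for illustration, that a composite-structure survey record is held as nested dictionaries and lists (for example, decoded JSON); the disclosure does not fix a concrete format, and the function name `iter_units` and the sample record are hypothetical.

```python
def iter_units(record, path=()):
    """Yield (path, value) for every smallest data unit in a nested record.

    Dictionaries and lists are treated as composite layers; anything else
    is treated as a smallest data unit that cannot be parsed further.
    """
    if isinstance(record, dict):
        for key, child in record.items():
            yield from iter_units(child, path + (key,))
    elif isinstance(record, list):
        for index, child in enumerate(record):
            yield from iter_units(child, path + (index,))
    else:
        yield path, record


# Hypothetical record: a province containing one city and one school.
sample = {
    "province": "P",
    "city": {"name": "C", "schools": [{"name": "S1", "has_network": True}]},
}
for unit_path, value in iter_units(sample):
    print(unit_path, value)
```

Keeping the path alongside each value preserves the reporting entity and survey item to which the unit belongs, which the later extraction step needs.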

As a further preferred embodiment, the step of normalizing the expanded three-dimensional histogram data comprises:

taking the ground plane as the reference plane and formulating a cubic height normalization strategy according to the characteristics of the different data types;

normalizing the expanded three-dimensional histogram data according to the cubic height normalization strategy, wherein the height of the expanded three-dimensional histogram in the Z-axis direction may be above or below the ground plane.
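
One possible reading of a height normalization relative to the ground plane is sketched below in Python. The disclosure does not give a formula, so the piecewise-linear scaling, the function name `normalize_height` and its parameters are assumptions made only for illustration; here the ground plane corresponds to a reference value such as the average of numerical data.

```python
def normalize_height(value, reference, lo, hi, max_height=1.0):
    """Signed cell height relative to the ground plane.

    Values above `reference` rise above the plane and values below it
    sink beneath it; the result lies in [-max_height, max_height].
    """
    if not lo <= reference <= hi:
        raise ValueError("reference must lie within [lo, hi]")
    # Scale each side of the plane by its own span so both extremes
    # reach the full height.
    span = (hi - reference) if value >= reference else (reference - lo)
    if span == 0:
        return 0.0
    clamped = min(max(value, lo), hi)  # out-of-range values are clipped
    return max_height * (clamped - reference) / span


print(normalize_height(75, reference=50, lo=0, hi=100))
print(normalize_height(25, reference=50, lo=0, hi=100))
```

Under this assumption a positive result is drawn above the ground plane and a negative result below it.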

Referring to FIG. 3, the survey big data visualization modeling system based on the expanded three-dimensional histogram comprises:

a three-dimensional visualization model initialization module, configured to initialize the three-dimensional visualization model;

a survey data category preprocessing module, configured to preprocess the survey data by category according to its specific data type;

an expanded three-dimensional histogram data generation module, configured to read the original survey data and, according to the requirements of the three-dimensional visualization model and the results of the category-specific preprocessing, extract and normalize expanded three-dimensional histogram data to generate expanded three-dimensional histogram data that conforms to the standard format of the three-dimensional visualization model, wherein the horizontal dimension of the expanded three-dimensional histogram is formed by different reporting entities at different levels, the vertical dimension includes multiple complex data types, and the Z dimension is a structure composed of cell attributes, cell height and top texture that is linked to the original survey data, wherein the different reporting entities at different levels include but are not limited to provinces, cities, counties and schools, the multiple complex data types include but are not limited to logical data, text data, numerical data and enumeration data, and the top texture uses different colors to represent the change trend of the data;

a survey data expanded three-dimensional histogram display module, configured to display the generated expanded three-dimensional histogram data visually according to computer graphics methods.

As a further preferred embodiment, the survey data category preprocessing module comprises:

a data read-in unit, configured to read in the original survey data;

a category processing unit, configured to perform category-specific processing according to the specific data type of the original survey data to obtain processed data;

an archiving unit, configured to archive the processed data.

As a further preferred embodiment, the category processing unit specifically performs the following operations:

if the original survey data is numerical data, first determining the value range of the numerical data, then determining its average value, then determining the numerical mapping method for the numerical data, and finally marking the change trend of each individual numerical datum; if the original survey data is logical data, first enumerating the value range of each survey item of the logical data, then determining the reference value of the logical data, then determining the numerical mapping method for the logical data, and finally marking the change trend of each individual logical datum; if the original survey data is text data, first listing the keywords of the text data, then extracting a summary of the text data, then determining the keyword mapping method for the text data, and finally marking the change trend of each individual text datum.

As a further preferred embodiment, the expanded three-dimensional histogram data generation module comprises:

a reading and parsing unit, configured to read the original survey data record by record and parse it layer by layer according to the data format of the predetermined composite-structure survey data until the smallest data unit of the original survey data is obtained;

an extraction and calculation unit, configured to extract the graphical representation data required for the original survey data according to the specific data type of the original survey data, the requirements of the three-dimensional visualization model and the results of the category-specific preprocessing, and to calculate the corresponding data change trend;

a normalization unit, configured to normalize the extracted graphical representation data and the calculated data change trend according to the three-dimensional visualization model, to obtain data that conforms to the standard format of the three-dimensional visualization model;

a writing unit, configured to write the data that conforms to the standard format of the three-dimensional visualization model into the expanded three-dimensional histogram data set.

The present invention is further explained and described below with reference to the accompanying drawings and specific embodiments.

Embodiment 1

To address the narrow applicability, low fidelity and limited effectiveness of the prior art, the present invention proposes a new survey big data visualization modeling method and system based on an expanded three-dimensional histogram. The key to the invention is to raise the processing of multi-level, multi-dimensional and multi-type survey data, by clustering, from the conventional direct comparison of data of a single type to a unified, overall visual analysis model. The invention is based on the expanded three-dimensional histogram, which extends the dimensions of the ordinary three-dimensional histogram. The horizontal dimension is extended from different parts at a single level to reporting entities at different levels, such as provinces, cities, counties and schools, which makes subsequent big data processing at different granularities possible. The vertical dimension is extended from the usual purely numerical data type to a collection of complex data types including logical data, text data, numerical data and enumeration data, which greatly expands the expressive power and scope of application of the three-dimensional visualization model. The Z dimension is extended from the usual single height to a high-fidelity structure that is composed of three elements, namely cell attributes, cell height and top texture, and is linked to the original survey data. The top texture uses different colors to represent the change trend of the data, which lays a foundation for the application of temporal databases. During visual modeling, when preprocessing the newly introduced complex data types, the invention formulates a cubic height normalization strategy according to the characteristics of each data type. To this end, the invention also introduces a ground plane as the reference plane, so that the extended height can be expressed as being above or below the ground plane, which enriches the meaning and expressive power of the cubic height.
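
As a hedged illustration of a Z dimension that carries cell attributes, cell height and top texture while remaining linked to the original survey data, a single cell might be represented as in the Python sketch below. The class `Cell`, its field names and the dictionary used to hold the original records are hypothetical and serve only to show how a displayed cell could be traced back to its source value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One bar of the expanded 3D histogram (illustrative structure only)."""
    entity: str        # reporting entity on the horizontal axis, e.g. a school
    item: str          # survey item on the vertical axis
    data_type: str     # cell attribute: "numeric", "logical", "text" or "enum"
    height: float      # signed height relative to the ground plane
    top_color: str     # top texture color encoding the change trend
    source_key: tuple  # key of the original survey record this cell came from


def lookup_source(cell, raw_records):
    """Return the original survey value from which a cell was derived."""
    return raw_records[cell.source_key]


raw = {("school-1", "pc_count"): 40}
cell = Cell("school-1", "pc_count", "numeric", 0.3, "green",
            ("school-1", "pc_count"))
print(lookup_source(cell, raw))
```

Storing the key of the original record in each cell is what would keep the visualization reversible in this sketch: any bar can be resolved back to the datum it represents.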

As shown in FIG. 3, the survey big data visualization modeling system of the present invention comprises four parts: a survey data category preprocessing module, a three-dimensional visualization model initialization module, an expanded three-dimensional histogram data generation module and a survey data expanded three-dimensional histogram display module. The survey data category preprocessing module preprocesses the different categories of survey data. The three-dimensional visualization model initialization module initializes the data that must be initialized, such as the width, height and precision of the three-dimensional visualization model. The expanded three-dimensional histogram data generation module reads in the original survey data, combines and normalizes it according to the requirements of the three-dimensional visualization model, and finally produces data in the standard format of the three-dimensional visualization model. The survey data expanded three-dimensional histogram display module takes the standardized data from the data generation module and, following computer graphics methods, assembles it row by row and column by column, by height and top texture, into output data that can be displayed directly.

As shown in FIG. 1, the survey big data visualization modeling method of the present invention comprises the following steps:

(1) Initializing the three-dimensional visualization model. This includes: determining the absolute widths in the horizontal and vertical directions and the data interval; determining the total number of minimum unit dimensions; determining, according to the needs of the survey system, the containment structure of the dimensions on the horizontal and vertical coordinates; and, for the z-axis direction, determining the total height, the position of the ground plane and the height representation method for each data type, so that purely numerical heights at different plane positions are given different practical meanings related to the level of information technology application. During initialization, a set of differentiated top textures is also defined to represent changes in value.

(2) Preprocessing the survey data by category. As shown in FIG. 2, the category-specific preprocessing mainly handles three types of data. For numerical data, the value range is determined first, then the average value, then the numerical mapping method, and finally the change trend of each individual numerical datum is marked. For logical data, the value range of each survey item is enumerated first, then the reference value is determined, then the numerical mapping method for the logical data, and finally the change trend of each individual logical datum is marked. For text data, the keywords are listed first, then a summary of the text data is extracted, then the keyword mapping method is determined, and finally the change trend of each individual text datum is marked.

(3) Generating the expanded three-dimensional histogram data.

The expanded three-dimensional histogram data is generated as follows. First, the original survey data is read record by record and parsed layer by layer according to the data format of the predetermined composite-structure survey data (that is, the target format set for parsing) until the smallest data unit of the original survey data is obtained. Next, according to the data type of the original survey data and the processing method for that type of data in step (2), the required graphical representation data is extracted and the corresponding data change trend is calculated. The extracted and calculated data is then normalized for the three-dimensional visualization model and finally written into the expanded three-dimensional histogram data set.

(4) Displaying the expanded three-dimensional histogram of the survey data.

This step converts the meaningful expanded three-dimensional histogram data generated in step (3) into a data set expressed entirely in the format required by computer graphics, for visual display. This step specifies the length, width and height of the three-dimensional visualization model, the spacing and height of the rows and columns, and the specific display requirements for textures such as the top map, so that the basic conditions are met for operations such as display, translation, rotation, and projection and cutting along the coordinate axis of each dimension.
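
To make the conversion into display geometry more concrete, the following Python sketch computes the axis-aligned box that a renderer might draw for one cell. The function name `cell_box`, the uniform cell size and gap, and the choice of returning two opposite corners are assumptions for illustration; the disclosure does not prescribe a particular graphics format.

```python
def cell_box(col, row, height, cell_size=1.0, gap=0.2, ground_z=0.0):
    """Return (min_corner, max_corner) of the cuboid drawn for one cell.

    `col` and `row` index the cell on the horizontal and vertical axes;
    `height` is signed, so a negative value yields a box that hangs
    below the ground plane.
    """
    pitch = cell_size + gap  # distance between the origins of adjacent cells
    x0, y0 = col * pitch, row * pitch
    z0, z1 = sorted((ground_z, ground_z + height))
    return (x0, y0, z0), (x0 + cell_size, y0 + cell_size, z1)


# A cell in column 2, row 1 that extends half a unit below the ground plane.
print(cell_box(2, 1, -0.5, gap=0.5))
```

The two corners are sufficient to derive the eight vertices of the cuboid, and the top face is the surface to which a trend color or texture would be applied.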

Embodiment 2

Evaluating the application level of educational information technology is a complex systematic process. Modeling it with the visual modeling system of Embodiment 1 comprises the following steps:

(1) establishing an indicator system for evaluating information technology application, including the main evaluation indicators for planning, management, investment, application, training and so on, and keeping the indicators relatively stable across successive evaluations;

(2) establishing an online evaluation system, conducting cross-regional surveys over the network wherever possible, and accumulating a sufficient volume of survey data;

(3) establishing the visual modeling system of Embodiment 1, so that traditional data processing is moved toward visual processing and analysis;

(4) performing visual analysis and presentation of the survey data with the visual modeling system of Embodiment 1;

(5) on the basis of the visual analysis, establishing a long-term survey mechanism based on the top texture, so as to move from summative survey and evaluation to continuous monitoring.

Compared with the prior art, the present invention has the following advantages:

a. a visualization model is established to improve the statistical analysis framework, changing the service from a digital form to a graphical form;

b. visualization is extended from simple, local data to the data of the whole system;

c. the expanded three-dimensional histogram, as a visualization chart, is linked to the original survey data, which overcomes the irreversibility of data processing in conventional visualization and gives higher fidelity;

d. complex multi-type data is clustered and extended on a semantic basis; the model itself distinguishes the different data types and applies an appropriate preprocessing method to each, which is more effective;

e. the top texture is fully used to attach fourth-dimension information representing the change trend to the traditional three-dimensional visualization model, which makes it possible to turn the survey system into a continuous monitoring system;

f. the meaning of the three-dimensional histogram is enriched, and a structured basic visualization model that is layered, analyzable and interpretable is constructed.

The preferred embodiments of the present invention have been described in detail above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.

Claims (10)

1. The investigation big data visualization modeling method based on the expanded three-dimensional histogram is characterized in that: the method comprises the following steps:
initializing a three-dimensional visual model;
classifying and preprocessing the research data according to specific data types;
reading original investigation data, and extracting and normalizing expanded three-dimensional histogram data according to the requirements of the three-dimensional visual model and the classification preprocessing results, to generate expanded three-dimensional histogram data that conforms to the standard format of the three-dimensional visual model, wherein the horizontal dimension of the expanded three-dimensional histogram is formed by different reporting entities at different levels, the longitudinal dimension comprises multiple complex data types, and the Z-direction dimension is a structure that is composed of cell attributes, cell height and top texture and is linked to the original investigation data, wherein the different reporting entities at different levels comprise provinces, cities, counties and schools, the multiple complex data types comprise logic data, text data and numerical data, and the top texture uses different colors to express the variation trend of the data;
and carrying out visual display on the generated expanded three-dimensional histogram data according to a graphics method.
2. The investigation big data visualization modeling method based on the extended three-dimensional histogram of claim 1, characterized in that: the step of initializing the three-dimensional visualization model includes:
determining the horizontal and longitudinal absolute widths, data intervals and the total number of the minimum unit dimensions contained in the expanded three-dimensional histogram;
determining a specific dimension structure for expanding the horizontal coordinate and the vertical coordinate of the three-dimensional histogram;
determining the total height of the expanded three-dimensional histogram in the z-axis direction, the position of a ground plane and height representation methods of different data types;
and setting a top texture parameter expanding the z-axis direction of the three-dimensional histogram.
3. The investigation big data visualization modeling method based on the extended three-dimensional histogram of claim 1, characterized in that: the step of classifying and preprocessing the research data according to specific data types comprises the following steps:
reading in original research data;
performing corresponding classification processing according to the specific data type of the original research data to obtain processed data;
and archiving the processed data.
4. The investigation big data visualization modeling method based on the extended three-dimensional histogram of claim 3, characterized in that: the step of performing corresponding classification processing according to the specific data type of the original research data to obtain processed data specifically includes:
if the original research data is numerical data, determining the value range of the numerical data, then determining the average value of the numerical data, then determining a numerical mapping method of the numerical data, and finally identifying the variation trend of the single numerical data; if the original investigation data is logic data, firstly enumerating the value range of each investigation item of the logic data, then determining the reference value of the logic data, then determining the logical data value mapping method, and finally identifying the variation trend of the single logic data; if the original research data is text data, firstly listing keywords of the text data, then extracting an abstract of the text data, then determining a method for mapping the keywords of the text data, and finally identifying the variation trend of the single text data.
5. The investigation big data visualization modeling method based on the expanded three-dimensional histogram of claim 3, characterized in that: the step of reading the original investigation data, extracting and normalizing the expanded three-dimensional histogram data according to the requirements of the three-dimensional visual model and the classification preprocessing result, and generating expanded three-dimensional histogram data conforming to the standard format of the three-dimensional visual model comprises the following steps:
reading the original investigation data one by one, and parsing the original investigation data in depth, layer by layer, according to the predetermined data format of the composite-structure investigation data, until the minimum data unit of the original investigation data is reached;
extracting the graphical representation data required for the original investigation data according to the specific data type of the original investigation data, the requirements of the three-dimensional visual model and the classification preprocessing result, and calculating the corresponding data change trend;
normalizing the extracted graphical representation data and the calculated data change trend according to the three-dimensional visual model to obtain data conforming to the standard format of the three-dimensional visual model;
and writing the data conforming to the standard format of the three-dimensional visual model into an expanded three-dimensional histogram data set.
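The layer-by-layer parsing step can be illustrated as a recursive walk. This is a sketch under stated assumptions: the claim does not fix the composite data format, so nested Python dicts and lists stand in for it, and `iter_minimal_units`, `classify` and `build_dataset` are hypothetical helper names. Extraction of the graphical data and normalization are left out; only reading, parsing down to the minimum data unit, type tagging and writing to a data set are shown.

```python
def iter_minimal_units(record, path=()):
    """Yield (path, value) for every leaf of a nested dict/list record."""
    if isinstance(record, dict):
        for key, child in record.items():
            yield from iter_minimal_units(child, path + (key,))
    elif isinstance(record, (list, tuple)):
        for index, child in enumerate(record):
            yield from iter_minimal_units(child, path + (index,))
    else:
        # A value that cannot be split further is a minimum data unit.
        yield path, record


def classify(value):
    """Assumed type test; bool is checked first because it subclasses int."""
    if isinstance(value, bool):
        return "logical"
    if isinstance(value, (int, float)):
        return "numerical"
    return "text"


def build_dataset(records):
    """Read records one by one and write their tagged leaves to a list."""
    dataset = []
    for record in records:
        for path, value in iter_minimal_units(record):
            dataset.append({"path": path, "type": classify(value), "value": value})
    return dataset
```

Keeping the path of each leaf makes it possible to trace a histogram cell back to the investigation item it came from.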
6. The investigation big data visualization modeling method based on the expanded three-dimensional histogram according to any one of claims 1-5, characterized in that: the step of normalizing the expanded three-dimensional histogram data comprises:
taking the ground plane as the reference plane, and formulating a cube height normalization strategy according to the characteristics of the different data types;
and normalizing the expanded three-dimensional histogram data according to the cube height normalization strategy, wherein the height of a cube of the expanded three-dimensional histogram in the Z-axis direction may be above or below the ground plane.
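One way to obtain heights on both sides of the ground plane is to measure each value against a baseline. The sketch below is an assumption, not the strategy defined by the patent: it treats the baseline (for instance the average for numerical data, or the reference value for logical data) as height zero and scales the largest deviation to `max_height`. The function name and parameters are hypothetical.

```python
def signed_heights(values, baseline, max_height=1.0):
    """Map values to signed cube heights relative to the ground plane."""
    deviations = [v - baseline for v in values]
    largest = max((abs(d) for d in deviations), default=0)
    if largest == 0:
        # Every value sits exactly on the baseline, so all cubes are flat.
        return [0.0 for _ in values]
    # Positive results rise above the ground plane, negative ones sink below.
    return [max_height * d / largest for d in deviations]
```

With a baseline of 20, the values 10, 20 and 40 become heights of -0.5, 0.0 and 1.0, so the first cube extends below the ground plane and the last one above it.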
7. An investigation big data visualization modeling system based on an expanded three-dimensional histogram, characterized by comprising:
the three-dimensional visualization model initialization module is used for initializing a three-dimensional visualization model;
the investigation data classification preprocessing module is used for classifying and preprocessing the investigation data according to specific data types;
an expanded three-dimensional histogram data generation module, used for reading the original investigation data, extracting and normalizing the expanded three-dimensional histogram data according to the requirements of the three-dimensional visualization model and the classification preprocessing result, and generating expanded three-dimensional histogram data conforming to the standard format of the three-dimensional visualization model, wherein the horizontal dimension of the expanded three-dimensional histogram is formed by different filling bodies at different levels, the longitudinal dimension comprises a plurality of complex data types, and the Z-direction dimension is a structure that is composed of cell attributes, cell heights and top textures and is associated with the original investigation data; the different filling bodies at different levels comprise provinces, cities, counties and schools; the plurality of complex data types comprise logical data, text data and numerical data; and the top texture uses different colors to express the variation trend of the data;
and the investigation data expansion three-dimensional histogram display module is used for visually displaying the generated expansion three-dimensional histogram data according to a graphics method.
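The cell structure described for the generation module could be held in a small record type. The following is an illustrative sketch only: the `Cell` fields, the `make_cells` helper and the color palette in `TREND_COLORS` are assumptions, since the claim states only that the top texture uses different colors for the variation trend and does not name the colors.

```python
from dataclasses import dataclass

# Assumed palette; the claim does not specify which colors are used.
TREND_COLORS = {"up": "green", "flat": "gray", "down": "red"}


@dataclass(frozen=True)
class Cell:
    x: int           # position along the horizontal axis (filling body)
    y: int           # position along the longitudinal axis (data item)
    entity: str      # province, city, county or school
    data_type: str   # "logical", "text" or "numerical"
    height: float    # signed height along the Z axis
    top_color: str   # color of the top texture, encoding the trend


def make_cells(entities, items, heights, trends):
    """Build one cell per (filling body, data item) pair."""
    cells = []
    for x, entity in enumerate(entities):
        for y, (item, data_type) in enumerate(items):
            cells.append(Cell(
                x, y, entity, data_type,
                heights[entity][item],
                TREND_COLORS[trends[entity][item]],
            ))
    return cells
```

A display module would then only need to iterate over the cells and draw a cube at `(x, y)` with the given height and top color.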
8. The investigation big data visualization modeling system based on the expanded three-dimensional histogram according to claim 7, characterized in that: the investigation data classification preprocessing module comprises:
the data reading unit is used for reading in the original investigation data;
the classification processing unit is used for carrying out corresponding classification processing according to the specific data type of the original investigation data to obtain processed data;
and the archiving unit is used for archiving the processed data.
9. The investigation big data visualization modeling system based on the expanded three-dimensional histogram according to claim 8, characterized in that: the classification processing unit specifically executes the following operations:
if the original investigation data is numerical data, first determining the value range of the numerical data, then determining its average value, then determining a value-mapping method for the numerical data, and finally identifying the variation trend of each individual numerical data item; if the original investigation data is logical data, first enumerating the value range of each investigation item of the logical data, then determining the reference value of the logical data, then determining a value-mapping method for the logical data, and finally identifying the variation trend of each individual logical data item; if the original investigation data is text data, first listing the keywords of the text data, then extracting an abstract of the text data, then determining a keyword-mapping method for the text data, and finally identifying the variation trend of each individual text data item.
10. The investigation big data visualization modeling system based on the expanded three-dimensional histogram according to claim 7, 8 or 9, characterized in that: the expanded three-dimensional histogram data generation module comprises:
the reading and analyzing unit is used for reading the original investigation data one by one and parsing the original investigation data in depth, layer by layer, according to the predetermined data format of the composite-structure investigation data, until the minimum data unit of the original investigation data is reached;
the extraction and calculation unit is used for extracting the graphical representation data required for the original investigation data according to the specific data type of the original investigation data, the requirements of the three-dimensional visualization model and the classification preprocessing result, and calculating the corresponding data change trend;
the normalization unit is used for normalizing the extracted graphical representation data and the calculated data change trend according to the three-dimensional visualization model to obtain data conforming to the standard format of the three-dimensional visualization model;
and the writing unit is used for writing the data conforming to the standard format of the three-dimensional visual model into the expanded three-dimensional histogram data set.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611225454.4A CN106815320B (en) 2016-12-27 2016-12-27 Investigation big data visual modeling method and system based on expanded three-dimensional histogram

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611225454.4A CN106815320B (en) 2016-12-27 2016-12-27 Investigation big data visual modeling method and system based on expanded three-dimensional histogram

Publications (2)

Publication Number Publication Date
CN106815320A CN106815320A (en) 2017-06-09
CN106815320B true CN106815320B (en) 2020-03-17

Family

ID=59110274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611225454.4A Expired - Fee Related CN106815320B (en) 2016-12-27 2016-12-27 Investigation big data visual modeling method and system based on expanded three-dimensional histogram

Country Status (1)

Country Link
CN (1) CN106815320B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109033204B (en) * 2018-06-29 2021-10-08 浙江大学 A Visual Query Method for Histogram of Hierarchical Integral Based on World Wide Web
CN112365110A (en) * 2019-07-24 2021-02-12 中移信息技术有限公司 Research method, platform, server and computer storage medium
CN111523009B (en) * 2020-07-03 2020-10-13 北京每日优鲜电子商务有限公司 Data visualization processing method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103295025A (en) * 2013-05-03 2013-09-11 南京大学 Automatic selecting method of three-dimensional model optimal view
CN103412871A (en) * 2013-07-08 2013-11-27 北京百度网讯科技有限公司 Method and device for generating visualized view
CN103617220A (en) * 2013-11-22 2014-03-05 北京掌阔移动传媒科技有限公司 Method and device for implementing mobile terminal 3D (three dimensional) model
CN105069020A (en) * 2015-07-14 2015-11-18 国家信息中心 Method and system for three-dimensional visualization of natural resource data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8626750B2 (en) * 2011-01-28 2014-01-07 Bitvore Corp. Method and apparatus for 3D display and analysis of disparate data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
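Research on seabed multidimensional integrated data modeling and visualization technology (海底多维综合数据建模及可视化技术研究); Su Tianyun (苏天赟); China Doctoral and Master's Dissertations Full-text Database (Doctoral), Basic Sciences; 20070215 (No. 02); full text *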

Also Published As

Publication number Publication date
CN106815320A (en) 2017-06-09

Similar Documents

Publication Publication Date Title
Liu et al. Cylinder detection in large-scale point cloud of pipeline plant
CN109492796A (en) A kind of Urban Spatial Morphology automatic Mesh Partition Method and system
He et al. Multivariate spatial data visualization: a survey
CN108388559A (en) Name entity recognition method and system, computer program of the geographical space under
CN112734913A (en) Three-dimensional model sphere expression calculation method based on multi-stage deformation reconstruction
CN118093673B (en) Mapping data processing method
CN114169771B (en) Area division method and device, electronic device and storage medium
Ma et al. Automatic discovery of common design structures in CAD models
CN116662468B (en) Urban functional area identification method and system based on geographic object space mode characteristics
CN111863135B (en) False positive structure variation filtering method, storage medium and computing device
CN106815320B (en) Investigation big data visual modeling method and system based on expanded three-dimensional histogram
CN113254517A (en) Service providing method based on internet big data
CN115713605A (en) Commercial building group automatic modeling method based on image learning
Yu et al. An optimization model for landscape planning and environmental design of smart cities based on big data analysis
CN117972111B (en) A knowledge reasoning method based on online graph processing technology for knowledge graph
CN101350035A (en) Content-based 3D model retrieval method test platform
Li Typical trajectory extraction method for ships based on ais data and trajectory clustering
CN105631465A (en) Density peak-based high-efficiency hierarchical clustering method
CN114722735B (en) A flow field feature tracking method based on graph optimization
CN116452842A (en) Clustering algorithm and device for reduced point cloud data set based on attention mechanism
CN101826098A (en) AB column diagram-based method for estimating spatial query selection rate
Cao et al. A survey on visual data mining techniques and applications
CN102779288A (en) Ontology analysis method based on field theory
Wu et al. Cartographic generalization
CN116166708B (en) An operation platform management system based on big data analytics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20200317