CN105303456A

CN105303456A - Method for processing monitoring data of electric power transmission equipment

Info

Publication number: CN105303456A
Application number: CN201510674398.1A
Authority: CN
Inventors: 耿利; 许海霞; 苗泽玮; 赵娜; 陈迪; 刘泉; 李晨; 李振宇; 胡青学; 张子建
Original assignee: Guo Wang Shandong Ningyang Power Supply Co; Taian Power Supply Co of State Grid Shandong Electric Power Co Ltd; State Grid Corp of China SGCC
Current assignee: Guo Wang Shandong Ningyang Power Supply Co; Taian Power Supply Co of State Grid Shandong Electric Power Co Ltd; State Grid Corp of China SGCC
Priority date: 2015-10-16
Filing date: 2015-10-16
Publication date: 2016-02-03

Abstract

The invention provides a method for processing monitoring data of power transmission equipment. The method stores multiple backup copies of the monitoring data by consistent hashing according to the data's relevance and its temporal and spatial attributes, and uses a parallel computing framework to perform combined retrieval, parallel retrieval, and feature analysis across multiple monitoring data sources. Built on cloud computing technology, the method stores monitoring data efficiently and reliably and supports fast access and analysis.

Description

Monitoring data processing method for power transmission equipment

Technical field

The invention relates to power grid data processing, and in particular to a method for processing monitoring data of power transmission equipment.

Background

As power grids grow rapidly in scale and their structure becomes more complex, power companies have stepped up the deployment of power transmission equipment monitoring, and the volume of data acquired and transmitted is growing geometrically. This data includes not only the signals raised when equipment is abnormal and the status information of equipment in operation, but also a large amount of related data, such as geographic information, weather, on-site temperature and humidity, inspection video and images, and related documents. Together these make up the monitoring data of power transmission equipment. A large number of monitoring nodes continuously send collected data to the data platform, forming massive heterogeneous data streams. The data platform must store this data reliably and also analyze and process it in a timely manner. Although the prior art processes massive monitoring data on cloud computing platforms, power transmission equipment monitoring differs considerably from cloud computing applications in the Internet field in data storage, communication, and computation. How to store this data efficiently and reliably, and to access and analyze it quickly, is a problem in urgent need of a solution.

Summary of the invention

To solve the problems of the prior art described above, the present invention proposes a method for processing monitoring data of power transmission equipment, including:

Storing multiple backup copies of the monitoring data by consistent hashing according to the data's relevance and its temporal and spatial attributes, and using a parallel computing framework to perform combined retrieval, parallel retrieval, and feature analysis across multiple monitoring data sources.

Preferably, the consistent hash storage of multiple backup copies according to the relevance and the temporal and spatial attributes of the monitoring data further includes:

Obtaining the temporal and spatial characteristics of the data collected by each monitoring device, namely the collection time and collection location of the data, together with a custom correlation coefficient, as the keys for data retrieval and analysis; storing the data as 3 backup copies on the cloud platform; using consistent hashing, hash-mapping the first backup copy by monitoring device number, the second backup copy by collection time, and the third backup copy by the custom correlation coefficient, where the correlation coefficient is a specific attribute of the monitoring data whose value is assigned according to the needs of the upper-layer application. The consistent hash storage further includes the following process:

1) Predefine the correlation coefficient of the monitoring data and the number of redundant backup copies in a configuration file;

2) Compute the hash value of each storage node in the cloud platform and place it on the pre-established circular hash queue (hash ring);

3) Compute the hash values of the data from its temporal and spatial attributes and its correlation coefficient. For the first of the multiple backup copies held on the cloud platform, compute a first hash value from the source of the data, that is, the monitoring device number, and map it onto the circular hash queue; for the second backup copy, compute a second hash value from the time attribute of the data, that is, the collection time, and map it onto the circular hash queue; for the third backup copy, compute a third hash value from the correlation coefficient of the data and map it onto the circular hash queue. If the cloud platform is configured with more than 3 backup copies, compute their hash values by cycling through the three methods above and map them onto the circular hash queue in turn;

4) Determine the storage location of the data from the data hash value and the storage node hash values, mapping the data clockwise to the nearest storage node;

5) If the node that would store the data has insufficient space, skip it and look for the next storage node;

In addition, when data is read, the NameNode sorts the storage nodes by their distance from the client and returns the sorted list to the client, so that data is read from the nearest node. The distance between two nodes is defined as the number of nodes traversed in going from one node to the other.

Preferably, the combined retrieval of multiple monitoring data sources further includes:

Retrieval draws on the following data: device attribute data (name, running time, installation location, body parameters); monitoring data (conductor temperature, ampacity, tension); environmental data (ambient temperature, humidity, air pressure); and geographic information data (altitude, latitude and longitude). The different data sources, which come from multiple files, are joined. The monitoring device collects and uploads insulation terminal leakage current, conductor tension, conductor current, conductor temperature, and micro-meteorological data in a unified way, and raises the relevant alarm when an insulation terminal is abnormal or a conductor joint is overheated or unbalanced. When monitoring leakage current, three data files are used for retrieval: the device attribute data file, the insulation terminal leakage current data file, and the environmental data file. The retrieval generates the monitoring data of a monitoring device over a predetermined period, and the three data files are joined to perform the combined retrieval;

After the monitoring data of the power transmission equipment has been stored, the data is retrieved with a parallel query executed on the map side: filtering and joining are completed in the map phase so that the reduce phase is avoided. Retrieval includes the following steps:

1) Filter the data according to the retrieval conditions given by the user, removing records that do not satisfy them;

2) Set the primary key according to the retrieval requirement; the primary key is the monitoring device number, the time data, or the correlation coefficient;

3) Tag each record of each data source, using the data file name as the tag;

4) Group records that share the same primary key value and join them;

The filtering, tagging, grouping and sorting, and join operations in the map process of the combined retrieval are performed on the local node, and the result of the combined retrieval is then written to the distributed file system;

Furthermore, the parallel retrieval and feature analysis of multiple monitoring data sources further includes:

Based on the dynamic interrelationships of multi-channel time series, integrated feature extraction is performed on signal data collected synchronously over multiple channels. The data is first uploaded to the distributed file system, which splits it into blocks and distributes the blocks randomly across multiple storage nodes. The dynamic interrelationships of the multi-channel time series are computed in the reduce phase, and the results are written to the distributed file system. Exploiting the temporal correlation of the data, the collection time is used as the key for computing the hash storage location. The feature extraction process further includes:

1) Compute the task time and filter the data, removing records that do not satisfy the time condition;

2) Use the time data as the primary key and tag each record;

3) Group records that share the same primary key value, invoke the multivariate sample entropy calculation, and write the results to the distributed file system.

Compared with the prior art, the present invention has the following advantage:

The invention proposes a method for processing monitoring data of power transmission equipment that, built on cloud computing technology, stores the monitoring data efficiently and reliably and supports fast access and analysis.

Detailed description

A detailed description of one or more embodiments of the invention is provided below. The invention is described in connection with these embodiments but is not limited to any of them. The scope of the invention is limited only by the claims, and the invention encompasses numerous alternatives, modifications, and equivalents. Numerous specific details are set forth in the following description to provide a thorough understanding of the invention. These details are provided as examples, and the invention may be practiced according to the claims without some or all of them.

The invention addresses the storage and parallel analysis of power transmission equipment monitoring data on a cloud platform. Taking the relevance of the data and its temporal and spatial attributes into account, it proposes a relevance-aware, multiple-backup consistent hash storage method and optimizes the cloud platform's data partitioning strategy and network topology planning. On this basis, a parallel framework is used to implement parallel retrieval across monitoring data sources and parallel computation of integrated feature extraction from multi-channel data.

From the perspective of the upper-layer applications of the power transmission equipment monitoring data platform, data distribution is mainly affected by the following factors: 1) data should be distributed as evenly as possible across the nodes of the cloud platform to keep the load balanced; 2) node failure is treated as a normal condition in the cloud platform, so node failure must be considered when optimizing data distribution; 3) to ensure data reliability and efficient retrieval, a multiple-backup scheme is required; 4) in the cloud platform's operating environment, network transmission and disk I/O are major factors in overall performance, so reducing the volume of data communication effectively reduces processing time. Take the association retrieval commonly used in monitoring systems as an example. When association retrieval is executed in parallel under the standard cloud platform data layout, which ignores data relevance, the join has to be completed in the reduce phase. In the map phase, all data is grouped and sorted on multiple nodes, after which the nodes running reduce tasks download the data by remote access. During this process, a large amount of data irrelevant to the final join may also be copied and transmitted over the network. If, at upload time, the data of the same device is stored on the same node as far as possible according to the device attributes of the data, the join can be completed in the map phase, the data communication of the reduce phase is eliminated, and overall execution efficiency improves.

Based on this analysis, the data layout of the cloud platform is optimized with the following storage method: related data is stored together, and during retrieval and analysis the main work is done on the map side, which reduces the network communication load of the intermediate map-to-reduce stage and improves overall retrieval and analysis performance. Each type of monitoring data may have its own data types and formats, but all of them share temporal and spatial characteristics: every item collected by a monitoring device corresponds to a specific collection time and a specific collection location. These are the keys most commonly used in data retrieval and analysis. Because the cloud platform stores 3 backup copies of the data by default, the method considers three kinds of relevance: monitoring device location, data collection time, and custom relevance. Using consistent hashing, the first backup copy is hash-mapped by monitoring device number, the second by collection time, and the third by a custom correlation coefficient, so that different retrieval and analysis needs can be met. The correlation coefficient is an attribute of the monitoring data that is assigned according to the needs of the upper-layer application, which is how custom relevance is achieved. The method requires a circular hash queue (hash ring) to be built. The procedure is as follows:

1) The correlation coefficient of the monitoring data and the number of redundant backup copies are predefined in a configuration file; here the number of backup copies is set to 3;

2) Compute the hash value of each storage node in the cloud platform and place it on the circular hash queue;

3) Compute the hash values of the data from its temporal and spatial attributes and its correlation coefficient. The cloud platform holds multiple backup copies of the data. For the first backup copy, compute hash value 1 from the source of the data, that is, the monitoring device number, and map it onto the circular hash queue; for the second backup copy, compute hash value 2 from the time attribute of the data, that is, the collection time, and map it onto the circular hash queue; for the third backup copy, compute hash value 3 from the correlation coefficient of the data and map it onto the circular hash queue. If higher storage reliability is required and more than 3 backup copies are configured, compute hash value i for each additional copy by cycling through the three methods above, and map them onto the circular hash queue in turn;

4) Determine the storage location of the data from the data hash value and the storage node hash values, mapping the data clockwise to the nearest storage node;

5) If the node that would store the data is in an abnormal state, such as having insufficient space, skip it and look for the next storage node.
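The five steps above can be sketched as a small hash ring in Python. This is a minimal illustration rather than the patented implementation: the MD5-based `ring_hash`, the 32-bit ring size, the class name `ReplicaRing`, and the record field names `device_id`, `collected_at`, and `correlation` are all assumptions made for the example.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string onto the ring [0, 2**32); MD5 is an arbitrary choice."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % (2 ** 32)


class ReplicaRing:
    # Key used for copy 1, 2, 3: device number, collection time, correlation.
    KEY_FIELDS = ("device_id", "collected_at", "correlation")

    def __init__(self, nodes, full_nodes=()):
        # Step 2: hash every storage node onto the ring.
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.hashes = [h for h, _ in self.ring]
        self.full = set(full_nodes)  # nodes reporting insufficient space

    def locate(self, key: str) -> str:
        # Step 4: walk clockwise from the key's position on the ring.
        start = bisect.bisect_left(self.hashes, ring_hash(key))
        for step in range(len(self.ring)):
            _, node = self.ring[(start + step) % len(self.ring)]
            if node not in self.full:  # Step 5: skip nodes without space
                return node
        raise RuntimeError("no storage node has free space")

    def place(self, record: dict, copies: int = 3) -> list:
        # Step 3: copy i is keyed by one of the three attributes, cycling
        # through them when more than three copies are configured.
        placements = []
        for i in range(copies):
            field = self.KEY_FIELDS[i % 3]
            placements.append(self.locate(f"{field}={record[field]}"))
        return placements
```

Because the first copy is keyed only by device number, every record from one device lands on the same node, which is what later allows joins to run on the map side. Note that this sketch does not prevent two copies of a record from landing on the same node; a real deployment would need to handle that case.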

When data is read, the NameNode sorts the storage nodes by their distance from the client and returns the sorted list to the client, so that data is read from the nearest node. The network nodes of the cloud platform form a tree in which the root of each subtree is usually the switch that connects the computers below it. The distance between two nodes is defined as the number of nodes traversed in going from one node to the other.
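As a rough illustration of this ordering, the sketch below represents each node by a path in the network tree (for example `/rack1/n2`) and counts the hops up to the closest common ancestor and back down. The path notation and the function names `tree_distance` and `sort_by_distance` are assumptions for the example, and counting hops is one plausible reading of "number of nodes traversed".

```python
def tree_distance(a: str, b: str) -> int:
    """Hops between two nodes given as tree paths such as '/rack1/n3'."""
    pa = a.strip("/").split("/")
    pb = b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Climb from a to the common ancestor, then descend to b.
    return (len(pa) - common) + (len(pb) - common)


def sort_by_distance(client: str, storage_nodes: list) -> list:
    """Return storage nodes ordered from nearest to farthest from the client."""
    return sorted(storage_nodes, key=lambda node: tree_distance(client, node))
```

With this measure a copy on the client's own node has distance 0, a copy in the same rack has distance 2, and a copy in another rack has distance 4, so the client is offered local data first.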

The cloud platform's default configuration assumes that all nodes are in a single rack. The actual network topology of the cluster therefore has to be passed to the platform so that its scheduler can choose suitable storage nodes for reads and writes. The topology can be supplied to the platform in the form of a script.
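The text does not give the script itself. A minimal sketch follows, assuming a Hadoop-style convention in which the platform invokes the script with one or more host addresses as arguments and reads one rack path per host from standard output. The addresses, rack names, and the `/default-rack` fallback are hypothetical placeholders.

```python
import sys

# Hypothetical host-to-rack table; a real deployment would load this from
# a maintained inventory file instead of hard-coding it.
RACKS = {
    "10.0.1.11": "/substation-a/rack1",
    "10.0.1.12": "/substation-a/rack1",
    "10.0.2.21": "/substation-b/rack1",
}
DEFAULT_RACK = "/default-rack"  # placeholder for hosts not in the table


def resolve_racks(hosts):
    """Return the rack path for each host, in the order given."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hosts arrive as command-line arguments; print one rack path per line.
    print("\n".join(resolve_racks(sys.argv[1:])))
```

How the script is registered with the platform depends on the platform's own configuration and is not specified in the text.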

Power transmission equipment monitoring requires combined retrieval over the many devices and line parameters monitored online, using conditions such as monitoring device number and collection time. Combined retrieval involves data sources such as device attribute data (name, running time, installation location, and so on), body parameters, monitoring data (conductor temperature, ampacity, tension, and so on), environmental data (ambient temperature and humidity, air pressure, and so on), and geographic information data (altitude, latitude and longitude, and so on), which means that different data sources have to be joined. Multi-source data usually comes from different files. The monitoring device collects and uploads insulation terminal leakage current, conductor tension, conductor current, conductor temperature, micro-meteorological data, and other data in a unified way, and can raise the relevant alarm when an insulation terminal is abnormal or a conductor joint is overheated or unbalanced. Take leakage current retrieval as an example. The retrieval involves three data files: the device attribute data file, the insulation terminal leakage current data file, and the environmental data file. The retrieval must produce the monitoring data of a monitoring device over a period of time, that is, a list of monitoring data annotated with device and environmental information, which requires the three data files to be joined.

After the monitoring data of the power transmission equipment has been stored, the data is retrieved with a parallel query executed on the map side. The method completes filtering and joining in the map phase and avoids the reduce phase, which saves network transmission overhead. It presupposes that the data has already been distributed by the relevance-based, multiple-backup consistent hashing method described above, so that the data needed for the join is gathered on the same storage node. The retrieval procedure is as follows:

1) Filter the data according to the retrieval conditions given by the user, removing records that do not satisfy them;

2) Set the primary key according to the retrieval requirement; the primary key can be the monitoring device number, the time data, or the correlation coefficient;

3) Tag each record of each data source; the data file name can be used as the tag;

4) Group records that share the same primary key value and join them.

Once the data distribution has been optimized, the filtering, tagging, grouping and sorting, and join operations in the map process of the combined retrieval are performed on the local node, and the result of the combined retrieval is written to the distributed file system.
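The retrieval steps can be illustrated with a small in-memory join. The sketch below assumes the records of all sources are already co-located on one node and are available as Python dictionaries. The function name `map_side_join`, the `predicate` callback, and the inner-join behaviour (a key must appear in every source) are choices made for the example, not details given in the text.

```python
from collections import defaultdict


def map_side_join(sources, key, predicate):
    """Join co-located records without a reduce phase.

    sources:   {file_name: [record dicts]}; the file name serves as the tag
    key:       name of the primary key field, e.g. "device_id"
    predicate: function(tag, record) -> bool implementing the user's filter
    """
    groups = defaultdict(lambda: defaultdict(list))
    for tag, records in sources.items():
        for rec in records:
            if predicate(tag, rec):                # 1) filter
                groups[rec[key]][tag].append(rec)  # 2)-4) key, tag, group
    joined = []
    for value in sorted(groups):
        tagged = groups[value]
        if len(tagged) < len(sources):
            continue  # inner join: the key must appear in every source
        rows = [{}]
        for tag in sources:  # combine one source at a time
            rows = [{**row, **rec} for row in rows for rec in tagged[tag]]
        joined.extend(rows)
    return joined
```

For the leakage current example, `sources` would hold the device attribute file, the leakage current file, and the environmental data file, and `key` would be the monitoring device number.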

As multi-sensor measurement technology comes into wide use in power equipment monitoring, synchronously monitored multi-channel data sequences are being collected and stored. The dynamic interrelationships within and between these synchronized multi-channel sequences carry rich feature information and reflect the operating state of power equipment more completely. Based on the dynamic interrelationships of multi-channel time series, the invention performs integrated feature extraction on the vibration signal data that a vibration monitoring device collects synchronously over 6 channels. On the cloud platform, a parallel feature extraction method is designed around consistent hashing to speed up feature extraction.

The 6 synchronously collected vibration monitoring channels are stored independently in 6 files. The signals are stored in segments, and each segment carries time data. To analyze the signals in parallel, the data is first uploaded to the distributed file system, which splits it into blocks and distributes the blocks randomly across multiple storage nodes. Because this layout ignores data relevance, a parallel data-relationship evaluation can only filter the data on the map side and send each signal segment over the network to the reduce side for computation. Each channel file is split into multiple segments stored across multiple storage nodes. The dynamic interrelationships of the multi-channel time series are computed in the reduce phase, and the results are written to the distributed file system. Applying the optimized data distribution method described above, the synchronously collected multi-channel data is redistributed: exploiting the temporal correlation of the data, the collection time is used as the key for computing the hash storage location.

With the optimized distribution, synchronized data is co-located and the computation is completed within the map tasks.

The feature extraction procedure based on the consistent hashing algorithm is as follows:

1) Compute the task time and filter the data, removing records that do not satisfy the time condition;

2) Use the time data as the primary key and tag each record;

3) Group records that share the same primary key value and invoke the multivariate sample entropy calculation. The results are written to the distributed file system.
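The grouping logic of these three steps can be sketched as follows. Segments are assumed to arrive as `(channel, timestamp, samples)` tuples, and the feature computation is passed in as a callable so that the sketch stays independent of how multivariate sample entropy is implemented. The names `extract_features` and `feature_fn` are illustrative.

```python
from collections import defaultdict


def extract_features(segments, start, end, feature_fn):
    """Group synchronized channel segments by timestamp and compute a feature.

    segments:   iterable of (channel, timestamp, samples) tuples
    start, end: task time window; segments outside [start, end) are dropped
    feature_fn: callable taking a list of per-channel sample lists
    """
    groups = defaultdict(dict)
    for channel, timestamp, samples in segments:
        if start <= timestamp < end:              # 1) time filter
            groups[timestamp][channel] = samples  # 2)-3) key by time, group
    results = {}
    for timestamp in sorted(groups):
        # Order channels by name so the feature sees a stable layout.
        channels = [groups[timestamp][c] for c in sorted(groups[timestamp])]
        results[timestamp] = feature_fn(channels)
    return results
```

When segments with the same timestamp are stored on the same node, this grouping needs no network transfer, which is the point of keying the hash by collection time.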

The multivariate sample entropy calculation is as follows:

1) Let the original p-dimensional (p-channel) time series be $\{x_{k,i}\}_{i=1}^{N}$, $k = 1, 2, \ldots, p$, where each channel has N points. For a given scale factor β, first construct the multivariate time series $\{y_{k,j}^{\beta}\}$, namely $y_{k,j}^{\beta} = \frac{1}{\beta} \sum_{i=(j-1)\beta+1}^{j\beta} x_{k,i}, \quad k = 1, 2, \ldots, p,$ where $1 \le j \le \frac{N}{\beta}$.

2) Preset the p-dimensional embedding vector $M = [m_1, m_2, \ldots, m_p]$ and the p-dimensional time delay vector $T = [T_1, T_2, \ldots, T_p]$. Using the multivariate time series $\{y_{k,j}^{\beta}\}$, construct $(N - n)$ composite delay vectors $Y_m(i)$.

3) Define the distance between $Y_m(i)$ and $Y_m(j)$ as the maximum absolute difference between corresponding components: $d[Y_m(i), Y_m(j)] = \max_{l = 1, \ldots, m} \{ |x(i + l - 1) - x(j + l - 1)| \}$.

4) For a given threshold r, for each i let $P_i$ be the number of indices $j \ne i$ for which $d[Y_m(i), Y_m(j)] < r$. The corresponding probability is $B_i^m(r) = \frac{P_i}{N - n - 1}$, which expresses the degree of association between $Y_m(i)$ and all the other $Y_m(j)$.

5) Average $B_i^m(r)$ over all i, namely: $B^m(r) = \frac{1}{N - n} \sum_{i=1}^{N-n} B_i^m(r)$.

6) Extend m in step 2) to m+1 and repeat steps 3) to 5) to obtain $B^{m+1}(r)$.

7) Compute the multivariate sample entropy as $MSE(M, T, r, N) = -\ln\left(\frac{B^{m+1}(r)}{B^{m}(r)}\right)$.
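The seven steps can be sketched in plain Python as below. Several details are assumptions, because the text does not spell them out: each composite delay vector is built by concatenating `M[k]` samples of channel `k` spaced `T[k]` apart, the margin `n` is taken as `max(M) * max(T)`, and the extension to m+1 in step 6 is simplified to extending one channel at a time and averaging the resulting match probabilities. The code is quadratic in the number of vectors and is meant for illustration only.

```python
import math


def coarse_grain(series, beta):
    """Step 1: average consecutive non-overlapping windows of length beta."""
    return [sum(series[j * beta:(j + 1) * beta]) / beta
            for j in range(len(series) // beta)]


def delay_vectors(channels, M, T, count):
    """Step 2 (assumed form): for each start index i, concatenate M[k]
    samples of channel k spaced T[k] apart."""
    return [[channels[k][i + l * T[k]]
             for k in range(len(channels)) for l in range(M[k])]
            for i in range(count)]


def match_probability(vectors, r):
    """Steps 3-5: mean fraction of other vectors within max-norm distance r."""
    n = len(vectors)
    hits = sum(
        1
        for i in range(n) for j in range(n)
        if i != j
        and max(abs(a - b) for a, b in zip(vectors[i], vectors[j])) < r
    )
    return hits / (n * (n - 1))


def multivariate_sample_entropy(channels, M, T, r, beta=1):
    ys = [coarse_grain(ch, beta) for ch in channels]
    count = len(ys[0]) - max(M) * max(T)  # N - n, with n assumed as above
    b_m = match_probability(delay_vectors(ys, M, T, count), r)
    if b_m == 0:
        raise ValueError("no matches at dimension m; try a larger r")
    # Step 6 (simplified): extend one channel at a time and average.
    b_m1 = 0.0
    for k in range(len(ys)):
        extended = list(M)
        extended[k] += 1
        b_m1 += match_probability(delay_vectors(ys, extended, T, count), r)
    b_m1 /= len(ys)
    # Step 7: negative log ratio of the two match probabilities.
    return math.inf if b_m1 == 0 else -math.log(b_m1 / b_m)
```

A perfectly regular signal keeps all of its matches when the embedding grows, so its entropy under this sketch is zero, while irregular signals lose matches and yield a positive value.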

Preferably, the cloud platform of the invention applies a merging strategy to small files: a certain number of small files are merged into a new storage file, and in general small files that share the same attribute are merged together. The index file is updated at the same time as the new storage file is written to the system. The cloud platform's index has two levels: the primary index is the resource collection to which a file belongs, such as its type, and the secondary index is the specific resource entry. When a file has to be read, the primary index and then the secondary index are queried in turn, which narrows the search range and keeps read response times low. The core of the storage layer design of the cloud platform is to first merge small files into storage files, then build a secondary index over the merged files based on the storage characteristics of the database, and use index prefetching to speed up file reads. The details of the storage layer are described below.

The distributed file system divides each file into blocks; the default block size is 64 MB. The namespace of the distributed file system is persisted in an image file, which the NameNode loads into memory at startup. A large number of small files can exhaust the NameNode's memory, and the resulting oversized image file reduces the efficiency of file lookup on reads. Every read or write of a file first queries the namespace for information such as the file's block addresses and size, and then searches the DataNode space. When the file being read is small, most of the time is spent on lookup rather than on transferring file data, which hurts the processing efficiency of the server cluster.

The cloud platform generates storage files by merging small files. A filter first screens files by type and size and selects document files that support full-text retrieval. Here the file size threshold is set to 10 MB; a file larger than 10 MB is treated as a large file and is not merged. After filtering, the cloud platform merges the remaining small files into file blocks, using the resource collection to which each file entry belongs as the unit of merging. A resource collection is a set of related resource entries, and each resource entry belongs to exactly one resource collection. Collections are usually divided by attribute range, time, and so on, and files can be divided by attribute domain. The resource entries within a new file block are closely related, so in later data processing the whole file block can be assigned to a single MapReduce task. This avoids wasting task allocation and switching time on tasks with too little computation, and reduces the movement of data within the cluster.
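The filtering and grouping step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `plan_merges`, the tuple layout, and the set of mergeable file types are assumptions made for the example, and sending non-qualifying files straight to direct storage is also an assumption, since the text only says that large files are not merged.

```python
from collections import defaultdict

SMALL_FILE_LIMIT = 10 * 1024 * 1024  # 10 MB threshold stated in the text


def plan_merges(files, mergeable_types=frozenset({"txt", "doc", "pdf"})):
    """Split files into per-resource-collection merge batches and direct writes.

    files: iterable of (name, file_type, size_bytes, resource_collection).
    mergeable_types is a hypothetical stand-in for "document files that
    support full-text retrieval".
    """
    batches = defaultdict(list)  # resource collection -> names to merge together
    direct = []                  # files written without merging (assumption)
    for name, file_type, size, collection in files:
        if file_type in mergeable_types and size <= SMALL_FILE_LIMIT:
            batches[collection].append(name)
        else:
            direct.append(name)
    return dict(batches), direct
```

Grouping by resource collection before merging is what lets one merged block later be handed to a single MapReduce task.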

The name node's memory is the performance bottleneck of the whole file system, because all file metadata must be held in it. Merging small files reduces the number of files and saves a great deal of memory, but reading from the merged files becomes inefficient. A preferred embodiment of the present invention therefore uses a hierarchical index for small-file metadata, dividing one large index file into small index files according to reasonable rules. The resource collection serves as the primary index, and the resource entries under each resource collection serve as the secondary index. A lookup first locates the collection that contains the resource entry and then searches the corresponding secondary index file. Although this adds a lookup in the primary index, the number of resource collections is small, so that lookup is fast, and each secondary index file is much smaller than a global index file, so overall lookup efficiency improves. The secondary index files are also not all loaded into memory; they can be scheduled flexibly according to memory usage and the cache policy, which addresses the problem of insufficient memory.
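A minimal sketch of the two-level lookup described above follows. It keeps both levels in memory as dictionaries purely for illustration; in the text the secondary indexes are separate files that are loaded on demand. The class name `TwoLevelIndex` and the `(offset, length)` metadata tuple are assumptions.

```python
class TwoLevelIndex:
    """Primary index: resource collection -> secondary index.
    Secondary index: resource entry -> (offset, length) in the merged file."""

    def __init__(self):
        self.primary = {}

    def add(self, collection, entry, offset, length):
        # Create the secondary index for a collection on first use.
        self.primary.setdefault(collection, {})[entry] = (offset, length)

    def lookup(self, collection, entry):
        # Step 1: narrow the search to one collection via the primary index.
        secondary = self.primary.get(collection)
        if secondary is None:
            return None
        # Step 2: search only that collection's (much smaller) secondary index.
        return secondary.get(entry)
```

The point of the design is that the second step never scans entries belonging to other collections.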

Index prefetching, as proposed here, means predicting from the data a user is currently accessing which data the user will access next, and loading the index of that data into the cache. If the prediction is accurate, the index of the data the user is about to access is already cached, and the system responds faster when the access occurs.

Before downloading or browsing a resource entry, a user normally has to obtain an "intermediate result set" through retrieval or directory lookup, and then select the desired resource entry from it for further access. There is an interval of several seconds between the user seeing the result set page and starting a download or browse operation. If the indexes of the resource entries in the intermediate result set are cached during this interval, the series of file metadata queries is no longer needed when the user clicks to download or browse, and the file can be transferred directly. This greatly improves the response to such requests without requiring much memory.

The storage layer architecture of the cloud platform of the present invention is described in detail below. Beyond the strategies above, the storage layer architecture is the foundation of the system as implemented. The storage layer is built on the distributed storage system of a Hadoop cluster and provides basic file storage and reading services.

The cloud platform storage architecture uses a three-layer design: a user interface layer, a business logic layer, and a storage layer. To improve performance, the Web server is separated from the server cluster. The user interface layer provides the user interface through which users send requests and receive feedback. The business logic layer implements small-file reading and writing, including file merging, index construction, and cache construction.

The business logic layer comprises functional modules for file merging, retrieval, small file indexing, caching, and the distributed system client. Each module is implemented as follows:

(1) File merging: the file merging function has two stages: creating a SequenceFile object and merging the small files into it. Files that pass the filter and meet the merging requirements are merged as follows. The primary index is first searched for the resource collection that contains the resource entry. Once the file path corresponding to that resource collection is found, a SequenceFile object is created, and the SequenceFile's Writer object is obtained and configured in preparation for writing. While the file is being written, a new thread is started to write the metadata of the resource entry, such as its file position and length, into the secondary index of resource entries. When the resource entry has been written successfully, the output stream is closed and a submission success is returned; otherwise a submission failure is returned.
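The append-and-index pattern in step (1) can be illustrated with a small sketch. It does not use the Hadoop SequenceFile API; an in-memory `io.BytesIO` buffer stands in for the merged container file, and a plain dictionary stands in for the secondary index. The function names and the `(offset, length)` layout are assumptions for illustration.

```python
import io
import threading


def append_entry(container, index, index_lock, name, payload):
    """Append payload to the merged container and record where it landed.

    The index update runs on its own thread, mirroring the text's statement
    that metadata is written to the secondary index while the file is written.
    """
    offset = container.seek(0, io.SEEK_END)  # next free position in the container

    def update_index():
        with index_lock:
            index[name] = (offset, len(payload))

    worker = threading.Thread(target=update_index)
    worker.start()
    container.write(payload)  # file write proceeds alongside the index update
    worker.join()             # report success only after both have finished
    return offset, len(payload)


def read_entry(container, index, name):
    """Read one small file back out of the merged container."""
    offset, length = index[name]
    container.seek(offset)
    return container.read(length)
```

Recording the offset and length at write time is what later allows a single small file to be read without scanning the merged file.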

(2) Retrieval: provides file retrieval. The platform relies on this module to optimize reads from the distributed file system based on the "intermediate result set".

(3) Small file index: builds the small file index, consisting of the primary index of resource collections and the secondary index of resource entries, and provides functions such as creating index files and appending and deleting records.

Primary index data is stored in a relational database and accessed through the database access interface; in memory it is held in a Java Map. Because the resource collections are already stored in the database, the index only needs an additional field whose value the system generates when a resource entry is added, so it can be kept in the relational database without affecting processing efficiency. The primary index data has a key/value structure, so a Java Map can be used to speed up queries. To ensure retrieval efficiency, the Map object must be initialized from the database contents when the service starts and must remain resident. Because the primary index is small, the Map object occupies little memory and the system overhead is limited. The Map object must be updated whenever a resource collection is added or deleted.

The secondary index is created with the open source project Lucene and supports retrieval of small-file metadata. Lucene provides a complete solution for building, updating, and searching indexes, its query efficiency is very high when the index file is smaller than 1 GB, and it can be used to build commercial search engines. The index created by the cloud platform needs some special capabilities: the index file must be updated in real time whenever a user adds a resource entry; file writes must be subject to concurrency control when several users add resource entries to the same resource collection at the same time; and index files must be compressed to reduce memory usage.

(4) Prefetching: to further improve response speed, this module provides cache management for the "intermediate result sets" that users are interested in, including cache space maintenance, cache updates, and maintenance of the update algorithm.

After a user issues a retrieval request, the Web service queries for the result set of resource entries that match the user's search conditions and returns it to the user. At the same time it creates an asynchronous thread to update the cache, so that the cache content is refreshed during the interval between returning the result set and the user choosing to download or browse an entry. When the cache module receives a cache update request, it calls the index module to retrieve the metadata of the entries in the current result set and loads that metadata into the cache. When the user sends a download or browse request, the Web service calls the distributed system client, which looks up the metadata in the cache, starts reading the data, and transmits it to the client.
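The prefetch flow above can be sketched with a standard thread pool. The `search` and `load_metadata` callables are hypothetical stand-ins for the retrieval system and the index module, the cache is a plain dictionary, and the default page size of 20 follows the figure given elsewhere in this description.

```python
from concurrent.futures import ThreadPoolExecutor


def search_and_prefetch(query, search, load_metadata, cache, pool, page_size=20):
    """Return search results immediately and warm the cache in the background.

    search(query) -> list of entry names; load_metadata(name) -> metadata.
    Both callables are assumptions made for this illustration.
    """
    results = search(query)

    def prefetch():
        # Only the first page is prefetched, since that is what the user sees.
        for name in results[:page_size]:
            cache[name] = load_metadata(name)

    future = pool.submit(prefetch)  # runs while the user reads the result page
    return results, future
```

A caller would normally ignore the returned future; it is exposed here only so the background work can be awaited in a test.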

The system maintains a thread pool with a fixed number of threads. Each time a cache update request is received, one thread from the pool handles it; if the pool has no idle thread, the cache task waits. This keeps the system resources consumed by cache updates within a reasonable range and does not affect overall system performance. The present invention uses the FIFO algorithm for cache scheduling, evicting the oldest cache entries in the most efficient manner. The implementation is as follows: a cache pool is created with a configurable size, 32 MB by default, which can hold 200,000 file metadata records. The cache pool stores key/value pairs: the file name is the key, and the combination of the file's data node ID, start position, and length is the value. The cache pool provides two operations, put and get. The put operation places data into the cache pool; if the pool has reached its limit, the corresponding data is replaced according to the cache replacement algorithm, and if there is still space, the data is simply added. The get operation returns the value for a given key, or empty if the key is not present.
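The put/get cache pool with FIFO replacement can be sketched as below. Capacity is counted in entries rather than bytes, which is a simplification of the 32 MB pool described above, and the class name `FifoCache` is an assumption.

```python
from collections import OrderedDict


class FifoCache:
    """Fixed-capacity key/value cache that evicts the oldest inserted entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # remembers insertion order

    def put(self, key, value):
        if key in self._items:
            # Overwriting keeps the original insertion position (FIFO, not LRU).
            self._items[key] = value
            return
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry
        self._items[key] = value

    def get(self, key):
        # Reads do not change eviction order; a missing key yields None.
        return self._items.get(key)
```

In the scheme described above, the key would be the file name and the value a tuple such as `(data_node_id, start, length)`.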

The distributed system client encapsulates the API through which the file system interacts with the outside world, including reading and writing files and querying file locations. When the file system receives a file read request, the request first passes through the file filter. If the file is one that has been merged, its metadata is looked up first in the cache; if it is not there, the index file is searched; and if it still cannot be found, the client communicates with the name node. Once the file metadata is found, a SequenceFile object is constructed and its Reader object is obtained to send a read request to the data node. After the data has been transmitted to the user, the input stream is closed and completion is returned.
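The lookup order for a merged file (cache, then index, then name node) can be written as a short cascade. The function name `locate` and the `ask_name_node` callable are hypothetical; the cache and index are modeled as dictionaries.

```python
def locate(name, cache, index, ask_name_node):
    """Find file metadata, trying the cheapest source first.

    Returns (metadata, source) so callers can see which tier answered.
    """
    meta = cache.get(name)
    if meta is not None:
        return meta, "cache"
    meta = index.get(name)
    if meta is not None:
        return meta, "index"
    # Last resort: a round trip to the name node (hypothetical callable).
    return ask_name_node(name), "name node"
```
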

Users make two kinds of requests: write requests that submit files, and read requests that query, browse, or fetch resources. When the Web server receives a request to submit a resource, it first determines whether small-file merging is required. If so, the files are merged; if not, the file is written directly through the write interface of the distributed file system. After merging, the distributed file system client prepares to write the file to the distributed file system. While the client writes the file, the small file index update module is called to build and update the small file index. Because the Web server host and the server cluster are separate, the write and the index update can be executed concurrently on different threads without affecting each other. When the write to the distributed file system succeeds, the Web service returns a submission success message to the client.

A file read request is sent when a user wants to browse the detailed content of a file or download it. These requests are frequent and consume the most system resources. When the Web server receives a read request, the retrieval system first searches according to the conditions submitted by the user and returns the resulting set of resource entries for the user to browse. At the same time, the entries shown on the first page of the result set in the user interface (20 by default) are sent to the cache module, and a separate thread is started to update the cache. When the user has looked through the returned result page and requests a download or detailed view, the Web service calls the distributed file system client to read the file content. The client first looks up the file location in the cache; if it is not found there, it searches the small file index. Once the location is found, the client reads the data directly from the data node and returns it to the user.

It should be understood that the above specific embodiments of the present invention are only used to illustrate or explain the principles of the present invention and do not limit it. Therefore, any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the present invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and bounds of the appended claims, or equivalents of such scope and bounds.

Claims

1. A method for processing monitoring data of power transmission equipment, characterized in that:

Consistent hash storage with multiple backups is performed according to the relevance and the temporal and spatial attributes of the monitoring data, and a parallel computing framework is used to perform combined retrieval, parallel retrieval, and feature analysis on multiple monitoring data sources.

2. The method according to claim 1, characterized in that performing consistent hash storage with multiple backups according to the relevance and the temporal and spatial attributes of the monitoring data further comprises:

Obtaining the temporal and spatial characteristics of the data collected by each monitoring device, namely the acquisition time and acquisition location of the data, and using them together with a user-defined correlation coefficient as keywords for data retrieval and analysis; storing data in the cloud platform as 3 backup copies; using consistent hashing to hash-map the 1st backup of the data according to the monitoring device number; hash-mapping the 2nd backup of the data according to the acquisition time; hash-mapping the 3rd backup of the data according to the user-defined correlation coefficient, the correlation coefficient being a specific attribute of the monitoring data whose value is assigned according to the upper-layer application; the consistent hash storage further comprising the following process:

1) predefining the correlation coefficient of the monitoring data and the number of redundant backups in a configuration file;

2) calculating the hash value of each storage node in the cloud platform and mapping it onto a pre-established circular hash queue interval;

3) calculating the hash values of the data from the temporal and spatial attributes and the correlation coefficient of the monitoring data: for the 1st of the multiple backups of the data held in the cloud platform, calculating a first hash value from the source of the data, namely the monitoring device number, and mapping it onto the circular hash queue; for the 2nd backup, calculating a second hash value from the time attribute of the monitoring data, namely the acquisition time, and mapping it onto the circular hash queue; for the 3rd backup, calculating a third hash value from the correlation coefficient of the data and mapping it onto the circular hash queue; if the cloud platform is configured with more than 3 backups, calculating their hash values by cycling through the methods used for the first to third backups and mapping them onto the circular hash queue in turn;

4) determining the storage location of the data from the data hash value and the storage node hash values, mapping the data clockwise onto the nearest storage node;

5) if the node on which the data is to be stored has insufficient space, skipping the current node and looking for the next storage node;

In addition, when data is read, the name node sorts the multiple storage nodes by their distance from the client and returns the sorted list to the client, so that data is read from the nearest node, wherein the distance between two nodes is defined as the number of nodes passed through in going from one node to the other.
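For illustration only, the placement procedure recited in claim 2 can be sketched as follows. The claim does not name a hash function, so MD5 is an assumption here, as are the class name `Ring`, the key formats, and the `is_full` callback used to model a node with insufficient space.

```python
import bisect
import hashlib


def ring_hash(key):
    # MD5 is an assumed choice; the claim does not specify the hash function.
    return int(hashlib.md5(str(key).encode("utf-8")).hexdigest(), 16)


class Ring:
    """Circular hash queue of storage nodes."""

    def __init__(self, nodes):
        self._ring = sorted((ring_hash(n), n) for n in nodes)
        self._points = [point for point, _ in self._ring]

    def place(self, key, is_full=lambda node: False):
        # Walk clockwise from the key's position to the nearest node,
        # skipping any node that reports insufficient space.
        start = bisect.bisect_left(self._points, ring_hash(key))
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if not is_full(node):
                return node
        raise RuntimeError("no storage node has free space")


def place_replicas(ring, device_id, acquired_at, coefficient):
    """Three backups, keyed by device number, acquisition time, and coefficient."""
    return [
        ring.place(f"device:{device_id}"),
        ring.place(f"time:{acquired_at}"),
        ring.place(f"coefficient:{coefficient}"),
    ]
```

Because each backup is keyed by a different attribute, records that share a device, a time, or a coefficient value tend to be co-located for the corresponding kind of query.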

3. The method according to claim 2, characterized in that performing combined retrieval on multiple monitoring data sources further comprises:

Retrieving according to the following conditions: device attribute data, namely name, operating time, installation location, and body parameters; monitoring data, namely conductor temperature, current-carrying capacity, and tension; environmental data, namely ambient temperature, humidity, and air pressure; and geographic information data, namely altitude, longitude, and latitude; joining the different data sources, the different data sources coming from multiple files; the monitoring device uniformly collecting and uploading insulated terminal leakage current, wire tension, wire current, conductor temperature, and microclimate data, and issuing the relevant alarms when an insulated terminal is abnormal, a terminal overheats, or an imbalance occurs; wherein, in monitoring leakage current, three data files, namely the device attribute data file, the insulated terminal leakage current data file, and the environmental data file, are used for retrieval to generate the monitoring data of a monitoring device over a predetermined period, and the three data files are joined to perform the combined retrieval;

After the monitoring data of the power transmission equipment has been stored, the data is retrieved by a parallel query executed on the map side, the filtering and joining of the data being completed in the map stage so that the reduce stage is avoided, the retrieval comprising the following steps:

1) filtering the data according to the search conditions supplied by the user, removing data that do not satisfy the conditions;

2) setting a primary key according to the retrieval requirement, the primary key being the monitoring device number, the time data, or the correlation coefficient;

3) labeling every record of each data source, using the data file name as the label;

4) grouping records with the same attribute value according to the primary key, and joining the data;

The filtering, label setting, grouping and sorting, and join operations in the map process of the combined retrieval are performed on the local node, and the result of the combined retrieval is then output to the distributed file system;

Further, performing parallel retrieval and feature analysis on multiple monitoring data sources further comprises:

Based on the dynamic interrelationships of multi-channel time series, integrated feature extraction is performed on signal data acquired synchronously over multiple channels: the data is first uploaded to the distributed file system, which splits it into blocks and distributes them randomly across multiple storage nodes; the computation of the dynamic interrelationships of the multi-channel time series is completed in the reduce stage, and the result is output to the distributed file system for storage; exploiting the temporal correlation of the data, the acquisition time is used as the keyword for calculating the hash storage location; the feature extraction process further comprises:

1) filtering the data according to the time range of the computing task, removing data that do not satisfy the time condition; 2) labeling every record, using the time data as the primary key; 3) grouping records with the same attribute value according to the primary key, invoking the multivariate sample entropy computation, and outputting the result to the distributed file system.
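For illustration only, the filter, label, group, and join steps of the map-side combined retrieval recited in claim 3 can be sketched on in-memory records. This is not a MapReduce job: the function name `map_side_join`, the record dictionaries, and the choice of an inner join (keeping only keys present in every source) are assumptions, since the claim does not state the join type.

```python
from collections import defaultdict


def map_side_join(sources, key, keep=lambda record: True):
    """Join records from several labeled sources on a shared primary key.

    sources: {file_name: [record_dict, ...]}; the file name is the label.
    keep: predicate implementing the user's search condition.
    """
    groups = defaultdict(dict)  # key value -> {label: [records]}
    for label, records in sources.items():
        for record in records:
            if keep(record):  # step 1: filter
                # steps 2-4: tag with the source label and group by primary key
                groups[record[key]].setdefault(label, []).append(record)
    joined = []
    for value in sorted(groups):
        by_label = groups[value]
        if len(by_label) == len(sources):  # inner join (assumption)
            joined.append((value, by_label))
    return joined
```
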