CN112115164B - Data processing method and device, data query method and device and network equipment

Info

Publication number
CN112115164B
Authority
CN
China
Prior art keywords
data
structured
preset
original
aggregated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910535480.4A
Other languages
Chinese (zh)
Other versions
CN112115164A (en
Inventor
罗艳军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd and Beijing Kingsoft Cloud Technology Co Ltd
Priority to CN201910535480.4A
Publication of CN112115164A
Application granted
Publication of CN112115164B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data processing method and device, a data query method and device, and network equipment. The data processing method comprises: collecting raw data to be processed; structuring the raw data according to a specified format to obtain structured data; and aggregating the structured data to obtain aggregated data, which is taken as the processed form of the raw data, wherein the data volume of the aggregated data is smaller than that of the raw data. The invention can effectively reduce data volume and alleviates the problem of excessive data volume in the prior art.

Description

Data processing method and device, data query method and device, network equipment

Technical Field

The present invention relates to the field of data processing technology, and in particular to a data processing method and device, a data query method and device, and a network device.

Background Art

Fields such as cloud services usually involve huge amounts of data. For example, a cloud service provider offers users a variety of billing products (also called resources), such as computing, network, and storage services, and records as data the amount of resources each user consumes within a preset time period; the resulting resource consumption data serves as the basis for charging. The volume of resource consumption data recorded is usually huge. For a single resource, recording one resource consumption record per second yields 86,400 records per day. If the provider offers 10,000 resources, the records for one day number more than 800 million (86,400 × 10,000 = 864 million), and providers commonly offer tens of thousands, hundreds of thousands, or even millions of resources. Such data volumes make storage inconvenient and queries slow, among other problems.
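The arithmetic behind these figures can be checked with a short sketch. The function name and its parameters are illustrative assumptions, not part of the patent; the exact product for 10,000 resources sampled once per second is 864 million, which the text rounds to roughly 800 million.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(resources: int, interval_seconds: int = 1) -> int:
    """Records written per day when every resource is sampled at a fixed interval."""
    return resources * (SECONDS_PER_DAY // interval_seconds)


# One resource sampled every second, then ten thousand resources.
print(records_per_day(1))
print(records_per_day(10_000))
```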

Summary of the Invention

In view of this, an object of the present invention is to provide a data processing method and device, a data query method and device, and a network device that can effectively reduce the amount of data and alleviate the problem of excessive data volume in the prior art.

To achieve the above object, the embodiments of the present invention adopt the following technical solutions:

In a first aspect, an embodiment of the present invention provides a data processing method, comprising: collecting raw data to be processed; structuring the raw data according to a specified format to obtain structured data; and aggregating the structured data to obtain aggregated data, the aggregated data being taken as the processed form of the raw data; wherein the data volume of the aggregated data is smaller than the data volume of the raw data.

In some embodiments, the step of structuring the raw data according to a specified format to obtain structured data includes: searching the raw data for data corresponding to each field in a preset structured table, and filling the data found into the corresponding fields of the preset structured table; and taking the filled-in preset structured table as the structured data.

In some embodiments, the method further includes: if no data corresponding to a key field specified in the preset structured table is found in the raw data, discarding the raw data.

In some embodiments, the step of aggregating the structured data to obtain aggregated data includes: grouping the structured data to obtain at least one data group, wherein the multiple pieces of structured data in the same data group hold the same data in a preset field; and, for each data group, aggregating the multiple pieces of structured data in that group to obtain the aggregated data corresponding to the group, wherein the aggregated data is either one piece of structured data selected from the group, or one new piece of structured data formed from the pieces in the group according to a preset algorithm.

In some embodiments, the method further includes: storing the aggregated data in a non-relational database, the non-relational database being deployed on a server cluster.

In some embodiments, the step of collecting the raw data to be processed includes: obtaining the raw data at a preset frequency from a server on which a billing product is deployed, wherein the raw data is resource consumption data of the billing product recorded by the server, and the number of servers is one or more.

In a second aspect, an embodiment of the present invention further provides a data query method, comprising: upon receiving a data query request, searching a preset database for the aggregated data corresponding to the request, wherein the database stores aggregated data obtained by the data processing method of any embodiment of the first aspect; and returning the aggregated data found to the requester of the data query request.

In a third aspect, an embodiment of the present invention provides a data processing device, comprising: a data collection module, configured to collect raw data to be processed; a data structuring module, configured to structure the raw data according to a specified format to obtain structured data; and a data aggregation module, configured to aggregate the structured data to obtain aggregated data and to take the aggregated data as the processed form of the raw data; wherein the data volume of the aggregated data is smaller than the data volume of the raw data.

In a fourth aspect, an embodiment of the present invention provides a data query device, comprising: a data search module, configured to search, upon receiving a data query request, a preset database for the aggregated data corresponding to the request, wherein the database stores aggregated data obtained by the data processing method of any embodiment of the first aspect; and a data feedback module, configured to return the aggregated data found to the requester of the data query request.

In a fifth aspect, an embodiment of the present invention provides a network device, comprising a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the method of any embodiment of the first aspect or the method of the second aspect.

In a sixth aspect, an embodiment of the present invention provides a machine-readable storage medium storing machine-executable instructions which, when called and executed by a processor, cause the processor to implement the method of any embodiment of the first aspect or the method of the second aspect.

The embodiments of the present invention provide a data processing method and device that structure the collected raw data according to a specified format to obtain structured data, and then aggregate the structured data to obtain aggregated data, which is taken as the processed form of the raw data. Structuring and then aggregating the raw data effectively reduces its volume and alleviates the problem of excessive data volume in the prior art.

The embodiments of the present invention also provide a data query method and device that search a preset database for the aggregated data corresponding to a received data query request and return it to the requester. Because the database stores aggregated data obtained by the above data processing method, it holds less data than in the prior art, where raw data is stored in the database directly. This shortens data query time and improves query efficiency.

Other features and advantages of the embodiments of the present invention will be set forth in the description that follows; some of them can be inferred or unambiguously determined from the description, or learned by practicing the above techniques of the embodiments.

To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

To illustrate the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for their description are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 shows a flowchart of a data processing method provided by an embodiment of the present invention;

FIG. 2 shows a detailed flowchart of a data processing method provided by an embodiment of the present invention;

FIG. 3 shows a flowchart of a data query method provided by an embodiment of the present invention;

FIG. 4 shows a structural block diagram of a data processing device provided by an embodiment of the present invention;

FIG. 5 shows a structural block diagram of a data query device provided by an embodiment of the present invention;

FIG. 6 shows a schematic structural diagram of a network device provided by an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

Fields such as cloud services currently involve huge amounts of data. Take the charging model of a cloud service provider as an example. A typical way of charging users is to sample the resources a user consumes during a time period at a predetermined frequency and to use the peak resource consumption within that period as the unit price standard for the period. For instance, the provider samples a user's resource consumption once every 5 seconds, obtains the set of the user's resource consumption data for one hour, determines the user's unit price for that hour from the highest value in the set, and charges the user accordingly. To support this, the provider must frequently collect and store each user's resource consumption data for every charged resource. The amount of resource consumption data collected each day is huge, which severely affects storage space and query efficiency.
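The peak-based charging model described above can be sketched as follows. This is a minimal illustration, assuming samples arrive as (timestamp, usage) pairs with timestamps in the `YYYY-MM-DD HH:MM:SS` form seen later in the raw data example; the function name `hourly_peaks` and the sample values are hypothetical.

```python
from datetime import datetime


def hourly_peaks(samples):
    """Return {hour_start: peak usage} for an iterable of (timestamp, usage) pairs."""
    peaks = {}
    for ts, usage in samples:
        # Truncate the sample time to the start of its hour.
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(minute=0, second=0)
        if hour not in peaks or usage > peaks[hour]:
            peaks[hour] = usage
    return peaks


# Hypothetical samples taken every 5 seconds.
samples = [
    ("2019-01-29 07:31:35", 120),
    ("2019-01-29 07:31:40", 340),
    ("2019-01-29 07:31:45", 90),
    ("2019-01-29 08:00:00", 15),
]
peaks = hourly_peaks(samples)
print(peaks)
```

Only one value per hour is needed for billing in this model, which is why the remaining samples are candidates for aggregation.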

To alleviate the above problems, the embodiments of the present invention provide a data processing method and device, a data query method and device, and a network device. The technique can be applied in fields that tend to involve large data volumes, such as cloud services. The embodiments are described in detail below.

First, an embodiment of the present invention discloses a data processing method, which can be executed by a network device such as a server. Referring to the flowchart of the data processing method shown in FIG. 1, the method mainly includes the following steps S102 to S106:

Step S102: collect raw data to be processed.

The raw data may be data that is collected directly and has not been processed, and its volume is usually large; an example is the large amount of resource consumption data that a cloud service provider collects directly from servers on which billing products are deployed. In one specific implementation, the raw data to be processed is obtained at a preset frequency from a server on which a billing product is deployed; the raw data is the resource consumption data of the billing product recorded by the server.

In practice, the number of servers is one or more. A billing product may be deployed centrally on one server or distributed across a server cluster, and a server on which a billing product is deployed may be provided with a data collector for collecting raw data. Billing products deliver user-facing services such as computing, network, and storage services, and may take the form of cloud servers, elastic IPs, storage blocks, and the like. A cloud service provider usually has multiple billing products, each with a unique ID.

Whereas the prior art mostly stores raw data directly in a database, making the database very large, the present application further processes the collected raw data in steps S104 and S106 below so as to reduce the data volume.

Step S104: structure the raw data according to a specified format to obtain structured data.

Raw data is mostly unstructured data in string form. The form, number, and order of fields differ from one piece of raw data to another, and the data is disorganized. To make the raw data easier to process and store, this embodiment converts it into orderly structured data. The structuring may be performed offline.

Step S106: aggregate the structured data to obtain aggregated data, and take the aggregated data as the processed form of the raw data; wherein the data volume of the aggregated data is smaller than the data volume of the raw data.

Specifically, the structured data can be aggregated according to a preset aggregation rule to obtain a smaller number of aggregated records. For example, if 1,000 pieces of structured data are aggregated into 1 piece of aggregated data, the data volume is greatly reduced. The aggregation rule can be set flexibly according to actual needs, but whatever rule is used, the data volume of the resulting aggregated data must be smaller than that of the raw data.

The data processing method provided by this embodiment structures the collected raw data according to a specified format to obtain structured data, and then aggregates the structured data to obtain aggregated data, which is taken as the processed form of the raw data. Structuring and then aggregating the raw data effectively reduces its volume and alleviates the problem of excessive data volume in the prior art, which in turn helps with the inconvenient storage and low query efficiency that such volume causes.

This embodiment gives a specific implementation of the structuring. Step S104 can be implemented as follows: search the raw data for data corresponding to each field in a preset structured table, and fill the data found into the corresponding fields of the preset structured table; then take the filled-in preset structured table as the structured data. For ease of understanding, an example follows.

For example, one piece of raw data is:

2019-01-29 07:31:39 body:

{"status":"available","share_type_id":"473acf19-f54a-408a-83b0-f835519125f8","share_id":"2b151c02-eb6b-4962-a822-5d69e888aebc","usage_realtime":0,"quota_upperbound":20971520,"az":"ksc_shpbstest_zone_raidssd_1001","size":20480,"user_id":"11e440e7c797415ca65421273fde3b2c","share_type":"Capacity","tenant_id":"8ec373464e3f4a53b8ee6d32afc62c41","timestamp":"2019-01-29 07:31:39"}

As can be seen, the raw data is unstructured data in string form.

The preset structured table is shown in Table 1:

Table 1

As shown in Table 1, the fields of the preset structured table are storage_type (the storage type), user_id (the user identifier), availability_zone (the availability zone, which can also be understood as the data center), share_id (the share identifier, which can also be understood as the file system identifier), used_size (the amount used, which can be understood as the resource usage), and create_time (the creation time of this record). All of these fields carry required, useful information. In practice, the types, number, and order of the fields in the structured table can be set flexibly according to the actual business and requirements.

Structuring the raw data according to the above preset structured table yields the structured data shown in Table 2:

Table 2

As shown in Table 2, the data corresponding to each field of the preset structured table is located in the disorganized, unstructured raw data and filled into the structured table to obtain the structured data. Data the structured table does not need (such as share_type_id and usage_realtime in the raw data) can simply be discarded. In this way the raw data is better organized, which not only simplifies the subsequent aggregation but also cleans out less important information and helps reduce the data volume.
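The structuring step can be sketched as below. Since Tables 1 and 2 are not reproduced in this text, the mapping from structured-table fields to raw JSON keys (`FIELD_SOURCES`) is an assumption made for illustration, in particular the choice of `size` as the source of `used_size` and `share_type` as the source of `storage_type`. The raw record is a shortened form of the example above.

```python
import json

# Hypothetical mapping: structured-table field -> key in the raw JSON body.
FIELD_SOURCES = {
    "storage_type": "share_type",
    "user_id": "user_id",
    "availability_zone": "az",
    "share_id": "share_id",
    "used_size": "size",
    "create_time": "timestamp",
}


def structure(raw_line: str) -> dict:
    """Fill the preset structured table from one raw record.

    Keys that no table field maps to are dropped; missing keys become None.
    """
    payload = json.loads(raw_line[raw_line.index("{"):])
    return {field: payload.get(source) for field, source in FIELD_SOURCES.items()}


raw_line = (
    '2019-01-29 07:31:39 body: '
    '{"status": "available", "share_id": "2b151c02-eb6b-4962-a822-5d69e888aebc", '
    '"usage_realtime": 0, "az": "ksc_shpbstest_zone_raidssd_1001", "size": 20480, '
    '"user_id": "11e440e7c797415ca65421273fde3b2c", "share_type": "Capacity", '
    '"timestamp": "2019-01-29 07:31:39"}'
)
row = structure(raw_line)
print(row)
```

Driving the extraction from a single mapping keeps the field set easy to change, in line with the statement that the table's fields can be configured per business need.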

The raw data obtained may include data that does not meet requirements; for example, some raw data lacks the key information needed for later queries. Taking resource consumption data as an example, the resource ID, the resource consumption, and the time are all indispensable key information, and resource consumption data that lacks any of them cannot be used later. To prevent such data from occupying storage space and wasting query time, the method further includes: if no data corresponding to a key field specified in the preset structured table is found in the raw data, discarding the raw data. A key field is a field that cannot be omitted, and which fields are key fields can be set according to actual needs.
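A minimal sketch of this filtering rule follows, assuming structured rows are dictionaries in which a field that was not found holds `None`. The choice of `KEY_FIELDS` is hypothetical and would be configured per deployment. Note that a usage of 0 is a valid value and must not be mistaken for a missing field.

```python
# Assumed key fields: resource identifier, consumption, and time.
KEY_FIELDS = ("share_id", "used_size", "create_time")


def drop_incomplete(rows, key_fields=KEY_FIELDS):
    """Keep only rows in which every key field was found (is not None)."""
    return [row for row in rows if all(row.get(name) is not None for name in key_fields)]


rows = [
    {"share_id": "fs-1", "used_size": 0, "create_time": "2019-01-29 07:31:39"},
    {"share_id": "fs-2", "used_size": None, "create_time": "2019-01-29 07:31:39"},
    {"share_id": None, "used_size": 512, "create_time": "2019-01-29 07:31:44"},
]
kept = drop_incomplete(rows)
print(kept)
```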

This embodiment gives an implementation of aggregating the structured data to obtain aggregated data, which includes the following steps (1) and (2):

(1) Group the structured data to obtain at least one data group, wherein the multiple pieces of structured data in the same data group hold the same data in a preset field. For example, the (already structured) resource consumption data within a preset time period that shares the same file system identifier is placed in one data group.

(2) For each data group, aggregate the multiple pieces of structured data in the group to obtain the aggregated data corresponding to the group, wherein the aggregated data is either one piece of structured data selected from the group, or one new piece of structured data formed from the pieces in the group according to a preset algorithm. Suppose the structured data is structured resource consumption data, each piece of which contains a resource consumption value. As one example, the piece with the largest resource consumption in each data group is selected as the aggregated data for that group. As another example, the resource consumption values of all pieces in a data group are combined by a preset algorithm (such as a weighted algorithm) into a resource consumption value used for charging, and a new piece of structured data is formed from the computed value.
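Steps (1) and (2) can be sketched as follows. The grouping key (`share_id` plus an `hour` field assumed to be derived from `create_time`), the function names, and the weights are all illustrative assumptions; the patent does not specify the weighted algorithm, so a plain weighted average stands in for it here.

```python
from collections import defaultdict


def group_rows(rows, key_fields=("share_id", "hour")):
    """Step (1): rows with identical values in the preset fields share a group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[name] for name in key_fields)].append(row)
    return groups


def select_peak(group):
    """Step (2), option A: pick the existing row with the largest consumption."""
    return max(group, key=lambda row: row["used_size"])


def weighted_row(group, weights):
    """Step (2), option B: form a new row from a weighted average (assumed algorithm)."""
    total = sum(w * row["used_size"] for w, row in zip(weights, group))
    merged = dict(group[0])
    merged["used_size"] = total / sum(weights)
    return merged


rows = [
    {"share_id": "fs-1", "hour": "2019-01-29 07", "used_size": 10},
    {"share_id": "fs-1", "hour": "2019-01-29 07", "used_size": 30},
    {"share_id": "fs-1", "hour": "2019-01-29 07", "used_size": 20},
    {"share_id": "fs-2", "hour": "2019-01-29 07", "used_size": 5},
]
groups = group_rows(rows)
peaks = {key: select_peak(group) for key, group in groups.items()}
blended = weighted_row(groups[("fs-1", "2019-01-29 07")], weights=[1, 1, 2])
print(peaks)
print(blended)
```

Either option yields one record per group, so four structured rows shrink to two aggregated rows in this example.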

The prior art mostly stores raw data directly in a relational database, and a relational database is usually deployed on a single server, so it has a storage capacity bottleneck and struggles with large data volumes. Having processed the raw data into a smaller volume of aggregated data, and in order to scale storage further and relieve the existing storage bottleneck, the data processing method provided by this embodiment also includes: storing the aggregated data in a non-relational database, the non-relational database being deployed on a server cluster. This distributed storage further alleviates the storage problems that data volume can cause. In one specific implementation, the non-relational database is based on distributed Elasticsearch (ES), which can store and retrieve data online in near real time.
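As a hedged illustration of writing aggregated rows to Elasticsearch, the sketch below only builds the newline-delimited JSON body that the Elasticsearch bulk endpoint (`_bulk`) accepts: one action line followed by one document line per row, with a trailing newline. It does not send any request. The index name `resource_usage_hourly` and the document ID scheme are hypothetical choices, not taken from the patent.

```python
import json


def bulk_body(rows, index="resource_usage_hourly"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint (no request is sent)."""
    lines = []
    for row in rows:
        # Hypothetical ID scheme: one document per file system and creation time.
        doc_id = f'{row["share_id"]}:{row["create_time"]}'
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"


aggregated = [
    {"share_id": "fs-1", "used_size": 30, "create_time": "2019-01-29 07:31:39"},
]
body = bulk_body(aggregated)
print(body)
```

A deterministic document ID makes repeated writes of the same aggregated record overwrite rather than duplicate it, which suits a batch job that may be rerun.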

本实施例进一步给出了一种数据处理的具体实施方式,在该实施方式中,原始产品为资源消耗数据,计费产品分散部署在服务器集群中,每台服务器上都设置有数据采集器flume,flume是一种可以收集所需数据,并将数量庞大的数据汇集起来发送至诸如基于Hadoop的Hive等数据仓库工具进行进一步处理,其中,Hadoop是一种分布式系统基础构架,其可以使用户在不了解分布式底层细节的情况下开发分布式程序,充分利用集群的优势进行高速运算和存储。Hadoop的存储系统是HDFS(Hadoop Distributed File System,分布式文件系统),Hive可基于HDFS进行底层存储。参见图2所示的一种数据处理方法的具体流程示意图,包括如下步骤:This embodiment further provides a specific implementation method of data processing, in which the original product is resource consumption data, and the billing product is dispersedly deployed in the server cluster. A data collector flume is set on each server. Flume is a data warehouse tool that can collect the required data and collect a large amount of data and send it to data warehouse tools such as Hive based on Hadoop for further processing. Hadoop is a distributed system infrastructure that allows users to develop distributed programs without understanding the details of the distributed underlying layer, and fully utilize the advantages of the cluster for high-speed computing and storage. Hadoop's storage system is HDFS (Hadoop Distributed File System), and Hive can perform underlying storage based on HDFS. Referring to the specific flow diagram of a data processing method shown in Figure 2, it includes the following steps:

Step S202: The Flume data collector obtains the raw data of the current server at a predetermined frequency and transmits it to Hive, where it is recorded in the original Hive table.

Hive can be regarded as a Hadoop-based data warehouse tool. It resembles a relational database and offers the visualization capability of one. In this embodiment it serves as the intermediate repository used while the raw data is processed. When the Flume data collector sends raw data to Hive, the data forms the original Hive table, which stores the raw data.

Step S204: Spark performs structured processing on the raw data recorded in the original Hive table, and the resulting structured data is migrated in batches to the structured Hive table.

Spark is a fast, general-purpose engine for large-scale data processing. In this embodiment, Spark batch jobs clean and parse the raw data in the original Hive table, convert it into structured data, and load the result into the structured Hive table by ETL (Extract-Transform-Load); the batch migration mentioned above is this ETL process. For example, Spark can locate the useful information in disordered, unstructured raw data (such as the data corresponding to each field of the preset structured table described earlier), write that information into the corresponding fields of the preset structured table to obtain structured data, and then store the structured data in batches in the structured Hive table.
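The field-filling part of this step can be illustrated with a minimal Python sketch. The column names (`user_id`, `product`, `timestamp`, `usage`), the `key=value` raw format, and the function name `structure_record` are assumptions made purely for illustration; the patent does not specify a raw data format, and the embodiment runs this step as a Spark batch job rather than as plain Python.

```python
import re

# Hypothetical preset structured table: these column names are illustrative only.
PRESET_FIELDS = ("user_id", "product", "timestamp", "usage")

# Assumed raw format: free text that contains key=value tokens.
_TOKEN = re.compile(r"(\w+)=(\S+)")


def structure_record(raw_line):
    """Fill each preset field with the matching value found in a raw line.

    Tokens that do not belong to the preset table are ignored, and preset
    fields that do not appear in the raw line are left as None.
    """
    found = dict(_TOKEN.findall(raw_line))
    return {field: found.get(field) for field in PRESET_FIELDS}


raw = "INFO billing tick user_id=u42 noise=xyz product=cdn timestamp=2019-06-19T10:00 usage=128"
row = structure_record(raw)
print(row)
```

Returning one dictionary per raw line keeps the sketch close to the idea of "one filled row of the preset structured table"; a real job would write such rows to the structured Hive table in batches.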

Step S206: Spark migrates the structured Hive table to ElasticSearch in batches and performs aggregation processing on the structured data.

ElasticSearch (ES) is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data online in near real time. It scales well: it can be extended to hundreds of servers and handle petabyte-scale data. Migrating the structured Hive table to ElasticSearch enables fast subsequent retrieval through ElasticSearch (including online queries), which further shortens query time and improves query efficiency. In addition, Spark can further aggregate the structured data according to preset aggregation rules, reducing the amount of data finally stored in ElasticSearch. In practice, Spark can run both the structuring and the aggregation as offline processing.

Step S208: The aggregated data obtained by the aggregation processing is stored in a structured table of the non-relational database ElasticSearch.

Because the structured table of the non-relational database ElasticSearch can be deployed on a server cluster, storage space can be expanded readily, which effectively relieves the storage bottleneck caused by deploying a traditional relational database on a single server. Moreover, the aggregated data is far smaller in volume than the raw data, which mitigates the problem of large data volume, saves storage resources, and helps improve query efficiency.
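As a rough illustration of the storage step, the sketch below serializes aggregated rows into the newline-delimited JSON body that Elasticsearch's bulk indexing endpoint accepts (an action line followed by a document line). The index name `billing_aggregates`, the row fields, and the function name `build_bulk_body` are hypothetical, and the patent does not say which client or API the embodiment uses; sending the body over HTTP is deliberately left out.

```python
import json


def build_bulk_body(index_name, rows, id_fields):
    """Serialize aggregated rows as newline-delimited JSON for a bulk index request."""
    lines = []
    for row in rows:
        # A deterministic document id means that re-running the batch
        # overwrites the same documents instead of duplicating them.
        doc_id = "|".join(str(row[field]) for field in id_fields)
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(row))
    if not lines:
        return ""
    # The bulk format requires a newline after the last line as well.
    return "\n".join(lines) + "\n"


aggregated = [{"user_id": "u42", "product": "cdn", "usage": 300}]
body = build_bulk_body("billing_aggregates", aggregated, ("user_id", "product"))
print(body)
```

Deriving the document id from the grouping fields is a design choice of this sketch, not something the patent prescribes.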

Further, this embodiment also provides a data query method. Referring to the flowchart of the data query method shown in FIG. 3, the method mainly includes the following steps S302 to S304:

Step S302: If a data query request is received, search a preset database for the aggregated data corresponding to the request, where the database stores aggregated data obtained by any of the data processing methods described above. Because the database stores aggregated data obtained by processing the raw data, it holds less data than an existing database that stores the raw data directly and is queried through MySQL, so searches take less time and query efficiency improves. In practice, the database can be a single-node relational database or a distributed, multi-node non-relational database. Preferably, the aggregated data is stored in a non-relational database, for example online in ElasticSearch, which both supports distributed storage to expand storage space and makes retrieval faster.

Step S304: Feed the aggregated data found back to the requester of the data query request.

The data query method provided by this embodiment of the present invention searches a preset database for the aggregated data corresponding to a received data query request and feeds it back to the requester. Because the database stores aggregated data obtained by the data processing method above, it holds less data than a prior-art database that stores raw data directly, so this approach effectively shortens query time and improves query efficiency.
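For the preferred case in which the aggregated data lives in ElasticSearch, step S302 amounts to turning the query request into a search body. The sketch below builds a `bool` query with `term` filters from equality criteria. The request fields, the default `size`, and the function name `build_search_body` are assumptions for illustration, and the sketch further assumes the fields are mapped as exact-match (keyword) fields; the patent does not describe the query format.

```python
import json


def build_search_body(criteria, size=100):
    """Translate equality criteria from a query request into a search body."""
    # Sorting the criteria keeps the generated body stable across runs.
    filters = [{"term": {field: value}} for field, value in sorted(criteria.items())]
    return {"size": size, "query": {"bool": {"filter": filters}}}


request = {"user_id": "u42", "product": "cdn"}
search_body = build_search_body(request)
print(json.dumps(search_body, indent=2))
```

Filters are used instead of scored matches because a lookup of billing aggregates needs exact matching rather than relevance ranking.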

Corresponding to the data processing method provided in this embodiment, this embodiment also provides a data processing device. As shown in FIG. 4, the device includes a data acquisition module 40, a data structuring module 42, and a data aggregation module 44 connected in sequence, wherein:

The data acquisition module 40 is configured to collect the raw data to be processed;

The data structuring module 42 is configured to perform structured processing on the raw data according to a specified format to obtain structured data;

The data aggregation module 44 is configured to aggregate the structured data to obtain aggregated data and to use the aggregated data as the processed form of the raw data, where the data volume of the aggregated data is smaller than that of the raw data.

The data processing device provided by this embodiment of the present invention performs structured processing on the collected raw data according to a specified format to obtain structured data, then aggregates the structured data to obtain aggregated data, which is used as the processed form of the raw data. By structuring and then aggregating the raw data, this embodiment effectively reduces the data volume and mitigates the prior-art problem of excessive data volume.

In a specific implementation, the data acquisition module 40 is further configured to obtain the raw data to be processed, at a preset frequency, from a server on which a billing product is deployed, where the raw data is the resource consumption data of the billing product recorded by the server, and there are one or more servers.
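Collection at a preset frequency from one or more servers can be pictured as a simple polling loop. In the sketch below, the server names, the `read_usage` callback, and the function name `collect` are hypothetical; the described embodiment uses a Flume collector on each server, which this plain-Python loop only imitates. The `sleep` parameter is injectable so that the demonstration does not actually wait.

```python
import time


def collect(servers, read_usage, interval_seconds, rounds, sleep=time.sleep):
    """Poll every server once per round, pausing interval_seconds between rounds."""
    collected = []
    for round_index in range(rounds):
        for server in servers:
            # read_usage returns the raw records the server has logged.
            collected.extend(read_usage(server))
        if round_index < rounds - 1:
            sleep(interval_seconds)
    return collected


# Demonstration with a fake reader and a recorded (not real) pause.
pauses = []
records = collect(
    servers=["server-a", "server-b"],
    read_usage=lambda server: [f"{server} user_id=u42 usage=1"],
    interval_seconds=60,
    rounds=2,
    sleep=pauses.append,
)
print(len(records), pauses)
```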

In one implementation, the data structuring module 42 is configured to search the raw data for data corresponding to each field of a preset structured table, fill the data found into the corresponding fields of the preset structured table, and use the filled preset structured table as the structured data. On this basis, the device further includes a data discarding module configured to discard the raw data if no data corresponding to a key field specified in the preset structured table is found in the raw data.
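The discarding rule can be sketched as a filter over rows that have already been matched against the preset structured table. Which fields count as key fields is not stated in the patent; `user_id` and `timestamp` below are assumed for illustration, as are the function names.

```python
# Hypothetical key fields of the preset structured table.
KEY_FIELDS = ("user_id", "timestamp")


def has_key_fields(row, key_fields=KEY_FIELDS):
    """Return True when every key field was found and is non-empty."""
    return all(row.get(field) not in (None, "") for field in key_fields)


def discard_incomplete(rows, key_fields=KEY_FIELDS):
    """Keep only rows whose key fields are all present; the rest are discarded."""
    return [row for row in rows if has_key_fields(row, key_fields)]


rows = [
    {"user_id": "u42", "timestamp": "2019-06-19T10:00", "usage": "128"},
    {"user_id": None, "timestamp": "2019-06-19T10:05", "usage": "64"},
    {"user_id": "u7", "timestamp": "2019-06-19T10:10", "usage": None},
]
kept = discard_incomplete(rows)
print(kept)
```

In this sketch the third row survives even though `usage` is missing, because only the fields designated as key fields trigger a discard.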

In a specific implementation, the data aggregation module 44 is further configured to group the structured data to obtain at least one data group, where the pieces of structured data in the same data group have identical data filled in a preset field. For each data group, the module aggregates the pieces of structured data in the group to obtain the aggregated data corresponding to that group. The aggregated data is either one piece of structured data selected from the pieces in the data group, or one new piece of structured data formed from the pieces in the data group according to a preset algorithm.
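Both aggregation options described here (selecting one existing row per group, or forming a new row by a preset algorithm) can be sketched in a few lines of Python. The grouping fields, the "latest timestamp" selection rule, and the "sum of usage" algorithm are assumptions chosen for illustration; the patent leaves the preset field and the preset algorithm open.

```python
from collections import defaultdict


def aggregate(rows, group_fields, combine):
    """Group rows by the values of group_fields and reduce each group to one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[field] for field in group_fields)].append(row)
    return [combine(members) for members in groups.values()]


def pick_latest(members):
    """Option 1: select one existing row, here the one with the latest timestamp."""
    return max(members, key=lambda row: row["timestamp"])


def sum_usage(members):
    """Option 2: form a new row, here by summing usage over the group."""
    merged = dict(members[-1])
    merged["usage"] = sum(row["usage"] for row in members)
    return merged


rows = [
    {"user_id": "u42", "product": "cdn", "timestamp": "2019-06-19T10:00", "usage": 100},
    {"user_id": "u42", "product": "cdn", "timestamp": "2019-06-19T10:05", "usage": 200},
    {"user_id": "u7", "product": "cdn", "timestamp": "2019-06-19T10:00", "usage": 50},
]
selected = aggregate(rows, ("user_id", "product"), pick_latest)
summed = aggregate(rows, ("user_id", "product"), sum_usage)
print(selected)
print(summed)
```

Either way, three structured rows collapse into two aggregated rows, which is the data-volume reduction the method relies on.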

In one implementation, the device further includes a storage module configured to store the aggregated data in a non-relational database deployed on a server cluster.

Corresponding to the data query method provided in this embodiment, this embodiment also provides a data query device. As shown in FIG. 5, the device includes a data search module 50 and a data feedback module 52 connected in sequence, wherein:

The data search module 50 is configured to, if a data query request is received, search a preset database for the aggregated data corresponding to the request, where the database stores aggregated data obtained by any of the data processing methods described above;

The data feedback module 52 is configured to feed the aggregated data found back to the requester of the data query request.

The data query device provided by this embodiment of the present invention searches a preset database for the aggregated data corresponding to a received data query request and feeds it back to the requester. Because the database stores aggregated data obtained by the data processing method above, it holds less data than a prior-art database that stores raw data directly, so this approach effectively shortens query time and improves query efficiency.

The implementation principle and technical effects of the device provided in this embodiment are the same as those of the foregoing method embodiments. For brevity, where the device embodiment is silent, refer to the corresponding content of the foregoing method embodiments.

This embodiment further provides a network device including a processor and a machine-readable storage medium. The machine-readable storage medium stores machine-executable instructions that the processor can execute, and the processor executes them to implement the foregoing data processing method or the foregoing data query method.

FIG. 6 is a schematic structural diagram of a network device provided by an embodiment of the present invention. The device 100 includes a processor 60, a memory 61, a bus 62, and a communication interface 63; the processor 60, the communication interface 63, and the memory 61 are connected through the bus 62. The processor 60 executes executable modules, such as computer programs, stored in the memory 61.

The memory 61 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk storage device. The communication connection between this system network element and at least one other network element is established through at least one communication interface 63 (wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and so on.

The bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, FIG. 6 shows it as a single bidirectional arrow, but this does not mean that there is only one bus or one type of bus.

The memory 61 stores a program, and the processor 60 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flows disclosed in any of the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 60.

The processor 60 may be an integrated circuit chip with signal processing capability. In implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 60 or by instructions in software form. The processor 60 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 61; the processor 60 reads the information in the memory 61 and completes the steps of the above method in combination with its hardware.

An embodiment of the present invention also provides a machine-readable storage medium that stores machine-executable instructions. When called and executed by a processor, the machine-executable instructions cause the processor to implement the foregoing data processing method or the foregoing data query method. For the specific implementation, refer to the method embodiments; it is not repeated here.

If the functions are implemented as software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the embodiments described above are only specific implementations of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions recorded in the foregoing embodiments, readily conceive of variations, or make equivalent replacements of some of their technical features. Such modifications, variations, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A data processing method, comprising:
collecting raw data to be processed;
performing structured processing on the raw data according to a specified format to obtain structured data; and
aggregating the structured data to obtain aggregated data, and using the aggregated data as the processed form of the raw data, wherein the data volume of the aggregated data is smaller than the data volume of the raw data;
wherein the step of performing structured processing on the raw data according to a specified format to obtain structured data comprises:
searching the raw data for data corresponding to each field of a preset structured table, and filling the data found into the corresponding fields of the preset structured table; and
using the filled preset structured table as the structured data;
and wherein the step of aggregating the structured data to obtain aggregated data comprises:
grouping the structured data to obtain at least one data group, wherein the pieces of structured data in the same data group have identical data filled in a preset field; and
for each data group, aggregating the pieces of structured data in the data group to obtain the aggregated data corresponding to the data group, wherein the aggregated data is one piece of structured data selected from the pieces of structured data in the data group, or the aggregated data is one new piece of structured data formed from the pieces of structured data in the data group according to a preset algorithm.
2. The method according to claim 1, further comprising:
discarding the raw data if no data corresponding to a key field specified in the preset structured table is found in the raw data.
3. The method according to claim 1, further comprising:
storing the obtained aggregated data in a non-relational database, wherein the non-relational database is deployed on a server cluster.
4. The method according to any one of claims 1 to 3, wherein the step of collecting raw data to be processed comprises:
obtaining the raw data to be processed, at a preset frequency, from a server on which a billing product is deployed, wherein the raw data is the resource consumption data of the billing product recorded by the server, and the number of servers is one or more.
5. A data query method, comprising:
if a data query request is received, searching a preset database for aggregated data corresponding to the data query request, wherein the database stores aggregated data obtained by the data processing method according to any one of claims 1 to 4; and
feeding the aggregated data found back to a requester of the data query request.
6. A data processing device, comprising:
a data acquisition module configured to collect raw data to be processed;
a data structuring module configured to perform structured processing on the raw data according to a specified format to obtain structured data; and
a data aggregation module configured to aggregate the structured data to obtain aggregated data and to use the aggregated data as the processed form of the raw data, wherein the data volume of the aggregated data is smaller than the data volume of the raw data;
wherein the data structuring module is further configured to: search the raw data for data corresponding to each field of a preset structured table, and fill the data found into the corresponding fields of the preset structured table; and use the filled preset structured table as the structured data;
and wherein the data aggregation module is further configured to: group the structured data to obtain at least one data group, wherein the pieces of structured data in the same data group have identical data filled in a preset field; and, for each data group, aggregate the pieces of structured data in the data group to obtain the aggregated data corresponding to the data group, wherein the aggregated data is one piece of structured data selected from the pieces of structured data in the data group, or the aggregated data is one new piece of structured data formed from the pieces of structured data in the data group according to a preset algorithm.
7. A data query device, comprising:
a data search module configured to, if a data query request is received, search a preset database for aggregated data corresponding to the data query request, wherein the database stores aggregated data obtained by the data processing method according to any one of claims 1 to 4; and
a data feedback module configured to feed the aggregated data found back to a requester of the data query request.
8. A network device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to perform the method of any one of claims 1 to 4 or to perform the method of claim 5.
9. A machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the method of any one of claims 1 to 4 or cause the processor to implement the method of claim 5.
CN201910535480.4A 2019-06-19 2019-06-19 Data processing method and device, data query method and device and network equipment Active CN112115164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910535480.4A CN112115164B (en) 2019-06-19 2019-06-19 Data processing method and device, data query method and device and network equipment

Publications (2)

Publication Number Publication Date
CN112115164A CN112115164A (en) 2020-12-22
CN112115164B true CN112115164B (en) 2024-09-03

Family

ID=73795034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910535480.4A Active CN112115164B (en) 2019-06-19 2019-06-19 Data processing method and device, data query method and device and network equipment

Country Status (1)

Country Link
CN (1) CN112115164B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312434A (en) * 2021-07-29 2021-08-27 北京快立方科技有限公司 Pre-polymerization treatment method for massive structured data
CN119719096A (en) * 2023-09-28 2025-03-28 华为云计算技术有限公司 A data processing method and related equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598536A (en) * 2014-12-29 2015-05-06 浙江大学 Structured processing method of distributed network information
CN105139281A (en) * 2015-08-20 2015-12-09 北京中电普华信息技术有限公司 Method and system for processing big data of electric power marketing

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011095988A2 (en) * 2010-02-03 2011-08-11 Puranik Anita Kulkarni A system and method for extraction of structured data from arbitrarily structured composite data
CN105956932A (en) * 2016-04-29 2016-09-21 中国南方电网有限责任公司电网技术研究中心 Distribution and utilization data fusion method and system
CN106294866B (en) * 2016-08-23 2020-02-11 北京奇虎科技有限公司 Log processing method and device
US20180165306A1 (en) * 2016-12-09 2018-06-14 International Business Machines Corporation Executing Queries Referencing Data Stored in a Unified Data Layer
CN107679192B (en) * 2017-10-09 2020-09-22 中国工商银行股份有限公司 Multi-cluster cooperative data processing method, system, storage medium and equipment
CN108449375A (en) * 2018-01-30 2018-08-24 上海天旦网络科技发展有限公司 The system and method for network interconnection data grabber distribution
CN108764525A (en) * 2018-04-20 2018-11-06 北京化工大学 For structured analysis of the distributed generation resource with electric model and parameter optimization method
CN108897796B (en) * 2018-06-12 2023-07-14 平安科技(深圳)有限公司 Method for calling Influxdb database by service system, storage medium and server
CN109325036A (en) * 2018-07-25 2019-02-12 浙江精功机器人智能装备有限公司 A kind of system and method for realizing real-time data synchronization
CN109471863B (en) * 2018-11-12 2021-07-20 北京懿医云科技有限公司 Information query method and device based on distributed database and electronic equipment
CN109460412A (en) * 2018-11-14 2019-03-12 北京锐安科技有限公司 Data aggregation method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant