
CN111221698A - Task data collection method and device - Google Patents

Info

Publication number
CN111221698A
Authority
CN
China
Prior art keywords
statistical
value
data
execution information
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811419180.1A
Other languages
Chinese (zh)
Inventor
李海涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Financial Technology Holding Co Ltd
Original Assignee
Beijing Jingdong Financial Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Financial Technology Holding Co Ltd
Priority to CN201811419180.1A
Publication of CN111221698A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure provides a task data collection method and device, relating to the technical field of data processing. The task data collection method includes: establishing n corresponding data collectors for n subtasks; acquiring execution information of the subtasks, and acquiring, according to the execution information, the preset statistical dimensions recorded in the data collectors; finding in the data collector the statistical value corresponding to the preset statistical dimension; determining the statistical category corresponding to the preset statistical dimension, and updating the statistical value according to the statistical category and the execution information; and, after the n subtasks have finished executing, aggregating the n data collectors. The task data collection method provided by the present disclosure can improve the efficiency of data collection.

Description

Task data acquisition method and device
Technical Field
The disclosure relates to the technical field of computers, in particular to a task data acquisition method and device.
Background
On the Hadoop big data platform, MapReduce is a widely used data processing framework. The platform also provides functions for monitoring MapReduce tasks and viewing execution logs. However, a task that completes successfully has not necessarily processed all of its data correctly, so finer-grained data about the execution process needs to be collected; for example, the input and output data volumes can be counted to judge whether the data volume is correct.
At present, the MapReduce framework provides counters, which collect counts during task execution in order to report information such as resource consumption and the total input and output data volume. If other counts are needed, a custom counter and its counting logic can be added to the processing code, and the counter's value is looked up after the task has finished running.
Because each counter yields only a single result, counters struggle to meet statistical requirements when collection needs to be finer-grained or when counts must be broken down by certain dimensions. For example, to count the rows of each input data file separately, one counter must be added per file; and if counting must follow more conditions, such as both the input file and the write date, the combinations can no longer be enumerated in advance, so the requirement cannot be implemented with counters.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The purpose of the present disclosure is to provide a task data acquisition method and a task data acquisition apparatus, which are used to overcome, at least to some extent, the problem that it is difficult to meet the data statistics requirement during the operation of MapReduce framework due to the limitations and defects of the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided a task data acquisition method, including: establishing n corresponding data collectors for the n subtasks; acquiring execution information of the subtasks, and acquiring preset statistical dimensions recorded in the data collector according to the execution information; searching a statistic value corresponding to the preset statistic dimension in the data collector; judging a statistic type corresponding to the preset statistic dimension, and updating the statistic value according to the statistic type and the execution information; and after the n subtasks are executed, summarizing the n data collectors.
In an exemplary embodiment of the present disclosure, the statistical categories include counting, extremum determination (maximum or minimum), and summation.
In an exemplary embodiment of the present disclosure, updating the statistical value according to the statistical category and the execution information includes:
when the statistical category is counting, incrementing the statistical value by one to generate a new value, and replacing the statistical value with the new value.
In an exemplary embodiment of the present disclosure, updating the statistical value according to the statistical category and the execution information includes:
when the statistical category is maximum, acquiring the value corresponding to the preset statistical dimension in the execution information, comparing that value with the statistical value, and writing the larger of the two into the data collector as the new statistical value.
In an exemplary embodiment of the present disclosure, updating the statistical value according to the statistical category and the execution information includes:
when the statistical category is minimum, acquiring the value corresponding to the preset statistical dimension in the execution information, comparing that value with the statistical value, and writing the smaller of the two into the data collector as the new statistical value.
In an exemplary embodiment of the present disclosure, updating the statistical value according to the statistical category and the execution information includes:
when the statistical category is summation, acquiring the value corresponding to the preset statistical dimension in the execution information, and writing the sum of that value and the statistical value into the data collector as the new statistical value.
In an exemplary embodiment of the present disclosure, aggregating the n data collectors includes:
converting the data in the n data collectors into a preset format one by one, and writing the data into a statistical file corresponding to the task through a preset output method.
According to a second aspect of the embodiments of the present disclosure, there is provided a task data acquisition apparatus including:
a data collector establishing module, configured to establish n corresponding data collectors for the n subtasks;
a data identification module, configured to acquire the execution information of the subtasks and to acquire, according to the execution information, the preset statistical dimension recorded in the data collector;
a data value determining module, configured to find in the data collector the statistical value corresponding to the preset statistical dimension;
a value updating module, configured to determine the statistical category corresponding to the preset statistical dimension and to update the statistical value according to the statistical category and the execution information;
a data summarizing module, configured to summarize the n data collectors after the n subtasks have been executed.
According to a third aspect of the present disclosure, there is provided a task data acquisition apparatus comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements a task data collection method as recited in any one of the above.
In the embodiments of the present disclosure, task information is collected in real time for each subtask according to preset conditions during MapReduce execution, the statistical data is updated as the subtasks run, and the statistical data is aggregated after the task has finished. This enables finer-grained collection of MapReduce execution-process data without adding other tasks and without significantly increasing resource consumption. The MapReduce execution process can therefore be inspected and diagnosed more comprehensively before results are checked or execution logs are queried, which enables refined monitoring of the execution process and discovery of anomalies.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 schematically illustrates a flow chart of a task data collection method in an exemplary embodiment of the present disclosure.
Fig. 2 schematically illustrates a flow chart of a task data collection method in an exemplary embodiment of the present disclosure.
Fig. 3 schematically illustrates a flow chart of a task data collection method in an exemplary embodiment of the present disclosure.
Fig. 4 schematically illustrates a block diagram of a task data collection device in an exemplary embodiment of the present disclosure.
Fig. 5 schematically illustrates a block diagram of an electronic device in an exemplary embodiment of the disclosure.
Fig. 6 schematically illustrates a schematic diagram of a computer-readable storage medium in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Further, the drawings are merely schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus, a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The following detailed description of exemplary embodiments of the disclosure refers to the accompanying drawings.
Fig. 1 schematically illustrates a flow chart of a task data collection method in an exemplary embodiment of the present disclosure. Referring to fig. 1, a task data collection method 100 may include:
Step S102: establishing n corresponding data collectors for the n subtasks;
Step S104: acquiring the execution information of the subtasks, and acquiring, according to the execution information, the preset statistical dimension recorded in the data collector;
Step S106: finding in the data collector the statistical value corresponding to the preset statistical dimension;
Step S108: determining the statistical category corresponding to the preset statistical dimension, and updating the statistical value according to the statistical category and the execution information;
Step S110: after the n subtasks have been executed, summarizing the n data collectors.
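The five steps can be sketched in a few lines of Python. This is only an illustrative in-memory model, not Hadoop code: plain dictionaries stand in for the data collectors, each piece of execution information is assumed to be a dictionary mapping dimension names to numeric values, and the function names (`update`, `run_subtasks`, `summarize`) are hypothetical.

```python
# Sketch only: dicts stand in for the data collectors; all names are hypothetical.

def update(current, category, value):
    """Return the new statistical value for one category (step S108)."""
    if category == "count":
        return (current or 0) + 1
    if category == "max":
        return value if current is None else max(current, value)
    if category == "min":
        return value if current is None else min(current, value)
    if category == "sum":
        return (current or 0) + value
    raise ValueError(f"unknown category: {category}")


def run_subtasks(subtasks, categories):
    """subtasks: one list of execution-info dicts per subtask."""
    collectors = [{} for _ in subtasks]                      # S102
    for collector, records in zip(collectors, subtasks):
        for info in records:
            for dim in categories.keys() & info.keys():      # S104
                current = collector.get(dim)                 # S106
                collector[dim] = update(current, categories[dim], info[dim])  # S108
    return collectors


def summarize(collectors):
    """Flatten the n collectors into (subtask index, dimension, value) rows (S110)."""
    return sorted(
        (index, dim, value)
        for index, collector in enumerate(collectors)
        for dim, value in collector.items()
    )
```

Each subtask touches only its own collector, so no coordination between subtasks is needed until the final summarization.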
In the embodiments of the present disclosure, task information is collected in real time for each subtask according to preset conditions during MapReduce execution, the statistical data is updated as the subtasks run, and the statistical data is aggregated after the task has finished. This enables finer-grained collection of MapReduce execution-process data without adding other tasks and without significantly increasing resource consumption, so that the execution process can be inspected and diagnosed more comprehensively before results are checked or execution logs are queried.
The steps of the task data collection method 100 will be described in detail below.
In one embodiment of the technical solution of the present disclosure, a data collector (for example, a hash table) is established in each MapReduce subtask, and the data collector maintains a set that is used to store the collected data. When the task is completed, the data in the data collector is written to the HDFS file system through a preset output mode, and the collected data is then consolidated in HDFS for subsequent monitoring and checking.
In step S102, n corresponding data collectors are established for the n subtasks.
Taking a MapReduce task as an example, a data collector can be constructed for each subtask when the Mapper and Reducer are initialized, and preset statistical dimensions are set for each subtask. For example, basic information about the subtask, such as JobId, JobName, input file path and name, task phase, and Mapper or Reducer implementation class name, may be initialized as data collection dimensions. Many preset statistical dimensions are possible, and those skilled in the art can set them according to actual requirements.
In addition, a hash table may be constructed in each data collector to maintain the set of collected data. The preset dimensions of the data to be collected are written into a List that serves as the key name of the hash table, and the count or other data value to be collected is written into the hash table as the key value corresponding to that key name.
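As a rough illustration of the key layout described above, the following Python sketch uses a dictionary as the hash table and freezes the ordered list of dimension values into a tuple so that it can serve as the key name. The helper `make_key` and the sample dimension values are hypothetical.

```python
def make_key(dimension_values):
    """Freeze an ordered list of dimension values into a hashable key name."""
    return tuple(dimension_values)


# The collector's hash table: key name -> collected value.
collector = {}

# Hypothetical dimensions: job id, input file, write date.
key = make_key(["job_001", "/input/part-0.txt", "2018-11-26"])

# Two records that share the same dimension values update the same entry.
collector[key] = collector.get(key, 0) + 1
collector[key] = collector.get(key, 0) + 1
```

Because entries are created the first time a combination of dimension values is seen, the combinations do not have to be enumerated before the task runs.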
In some embodiments, before the task is submitted, the task output may also be constructed in a multi-output manner, for example with MultipleOutputs. In addition to the task's own output data, a file output format is set for the data collector; for example, the addNamedOutput method is used to declare the output format (for example, Text format) of the data collection file, so that the data collection files can be merged after the task is completed.
In step S104, the execution information of the subtask is obtained, and the preset statistical dimension recorded in the data collector is obtained according to the execution information.
In step S106, the statistics corresponding to the preset statistical dimension is found in the data collector.
In step S108, a statistical type corresponding to the preset statistical dimension is determined, and the statistical value is updated according to the statistical type and the execution information.
During the execution of the subtask, a plurality of pieces of execution information may be acquired, preset statistical dimensions recorded in a corresponding data collector of the subtask are searched for in the pieces of execution information, and when one or more of the preset statistical dimensions are found to exist in the pieces of execution information, the pieces of execution information are further processed.
Firstly, the type of the preset statistical dimension related to the execution information can be confirmed, and the statistical type corresponding to each preset statistical dimension is judged according to the preset value in the data collector. A plurality of statistical categories may be preset in each data collector and a data update method corresponding to each statistical category is provided.
In embodiments of the present disclosure, the statistical categories provided by the data collector may include, for example, counting, extremum determination (maximum or minimum), and summation. In other embodiments, the data collector may define further statistical categories and provide corresponding data update methods, which is not limited by the present disclosure.
FIG. 2 is a diagram illustrating the sub-steps of step S108 in one embodiment.
Referring to fig. 2, step S108 may include:
Step S1081: determining the statistical category corresponding to the preset statistical dimension;
Step S1082: when the statistical category is counting, incrementing the statistical value by one to generate a new value, and replacing the statistical value with the new value;
Step S1083: when the statistical category is maximum, obtaining the value corresponding to the preset statistical dimension from the execution information, comparing that value with the statistical value, and writing the larger of the two into the data collector as the new statistical value;
Step S1084: when the statistical category is minimum, obtaining the value corresponding to the preset statistical dimension from the execution information, comparing that value with the statistical value, and writing the smaller of the two into the data collector as the new statistical value;
Step S1085: when the statistical category is summation, obtaining the value corresponding to the preset statistical dimension from the execution information, and writing the sum of that value and the statistical value into the data collector as the new statistical value.
In one embodiment, each data update method takes two parameters. The first parameter is the preset statistical dimension, that is, the key name of the hash table mentioned above. The second parameter is a numerical value: if the statistical category is counting, it defaults to 1 and need not be supplied; for other categories, a specific value is supplied as needed, for example the value associated with the preset statistical dimension in the latest task execution information.
For example, the methods corresponding to counting, maximum, minimum, and summation may be named collectCount, collectMax, collectMin, and collectSum, respectively, and update the statistical value according to the following logic:
collectCount: taking the first parameter as the key name, find the corresponding key value in the hash table, add one to it to form the new key value, and write the first parameter and the new key value into the hash table.
collectMax: taking the first parameter as the key name, find the corresponding key value in the hash table; obtain from the execution information the numerical value corresponding to the first parameter and use it as the second parameter; then write the larger of the key value and the second parameter into the hash table as the new key value for the first parameter. collectMin is implemented in the same way as collectMax, except that the smaller of the key value and the second parameter becomes the new key value.
collectSum: taking the first parameter as the key name, find the corresponding key value in the hash table; obtain from the execution information the numerical value corresponding to the first parameter and use it as the second parameter; then write the sum of the key value and the second parameter into the hash table as the new key value for the first parameter.
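The four update methods might look as follows in Python. This is a minimal sketch, not the patent's implementation: the class name `Collector` and the snake_case method names are illustrative renderings of collectCount, collectMax, collectMin, and collectSum, and the handling of a key that is not yet in the table (start at 0 for counting and summation, take the first value for extrema) is an assumption the source does not spell out.

```python
class Collector:
    """Hash-table-backed data collector (illustrative sketch)."""

    def __init__(self):
        self.table = {}  # key name -> key value

    def collect_count(self, key, value=1):
        # The second parameter defaults to 1, so callers normally omit it.
        self.table[key] = self.table.get(key, 0) + value

    def collect_max(self, key, value):
        old = self.table.get(key)
        # Assumption: the first observed value initializes the entry.
        self.table[key] = value if old is None else max(old, value)

    def collect_min(self, key, value):
        old = self.table.get(key)
        self.table[key] = value if old is None else min(old, value)

    def collect_sum(self, key, value):
        self.table[key] = self.table.get(key, 0) + value
```

A caller would invoke, for instance, `collector.collect_count("rows")` once per processed record and `collector.collect_max("line_length", n)` with the value taken from that record.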
In addition to the above examples, other statistical categories, such as filtering and classified statistics, may be defined. Because a single piece of task execution information may involve several preset statistical dimensions, data can be extracted from it from different angles according to those dimensions, and the statistical value of each preset statistical dimension is updated accordingly.
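To make the multi-dimension case concrete, the sketch below updates several entries from one record. The record fields (`file`, `date`, `bytes`) and the key layout are hypothetical choices for illustration; a plain dictionary stands in for the collector's hash table.

```python
def collect_record(table, record):
    """Update every statistic that one piece of execution information touches."""
    name, date, size = record["file"], record["date"], record["bytes"]

    # Count rows per input file.
    key = ("rows", name)
    table[key] = table.get(key, 0) + 1

    # Count rows per (input file, write date) combination.
    key = ("rows", name, date)
    table[key] = table.get(key, 0) + 1

    # Sum bytes per input file.
    key = ("bytes_sum", name)
    table[key] = table.get(key, 0) + size

    # Track the largest record seen overall.
    key = ("bytes_max",)
    table[key] = max(table.get(key, size), size)


table = {}
collect_record(table, {"file": "a.txt", "date": "2018-11-26", "bytes": 120})
collect_record(table, {"file": "a.txt", "date": "2018-11-27", "bytes": 80})
```

After the two calls, `table` holds one entry per dimension combination that actually occurred, without any of them having been declared beforehand.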
In step S110, after the n subtasks are executed, the n data collectors are summarized.
In some embodiments, aggregating the information of the n data collectors may include converting the data in the n data collectors into a preset format one by one, and writing the preset format into a statistical file corresponding to the task through a preset output method.
Still taking the MapReduce task execution process as an example, in the cleanup stage of the MapReduce task, the data in the hash table may be taken out and serialized item by item into a preset format (for example, Text format). Serialization joins the dimension values in the key name and the key value into a single comma-separated string of text, which is then written through MultipleOutputs to the temporary directory corresponding to the task.
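The serialization step can be illustrated with a short Python function. It assumes that key names are tuples of strings and that no dimension value itself contains a comma; the function name `serialize` and the sample data are hypothetical.

```python
def serialize(table):
    """Turn each hash-table entry into one comma-separated text line."""
    lines = []
    for key, value in sorted(table.items()):
        # Dimension values first, then the collected key value.
        fields = [str(part) for part in key] + [str(value)]
        lines.append(",".join(fields))
    return lines


table = {
    ("job_001", "map", "b.txt"): 5,
    ("job_001", "map", "a.txt"): 3,
}
lines = serialize(table)
```

Sorting is not required by the format; it is used here only to make the output order deterministic.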
In the commit stage of task execution, FileOutputCommitter is overridden so that, on the one hand, the data collection files generated by the task are merged into one file (HDFS provides merging support for text files) and, on the other hand, the merged data file is copied from the temporary directory to a designated data collection directory.
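The merge-then-copy behavior of the commit stage can be imitated on a local filesystem, which is used here purely as a stand-in for HDFS. The function names, the `part-*` file naming pattern, and the merged file name are assumptions made for this sketch; it does not use or reproduce the Hadoop FileOutputCommitter API.

```python
import shutil
import tempfile
from pathlib import Path


def merge_and_publish(temp_dir, final_dir, merged_name="collected.txt"):
    """Merge per-subtask collection files, then copy the result out."""
    temp_dir, final_dir = Path(temp_dir), Path(final_dir)
    merged = temp_dir / merged_name
    parts = sorted(p for p in temp_dir.glob("part-*") if p.is_file())
    with merged.open("w", encoding="utf-8") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            # Ensure every part ends with a newline so lines never fuse.
            out.write(text if text.endswith("\n") or not text else text + "\n")
    final_dir.mkdir(parents=True, exist_ok=True)
    target = final_dir / merged_name
    shutil.copyfile(merged, target)
    return target


def demo():
    """Create two fake collection files and return the merged content."""
    with tempfile.TemporaryDirectory() as root:
        temp_dir = Path(root) / "tmp"
        temp_dir.mkdir()
        (temp_dir / "part-0").write_text("a.txt,3\n", encoding="utf-8")
        (temp_dir / "part-1").write_text("b.txt,5", encoding="utf-8")
        target = merge_and_publish(temp_dir, Path(root) / "collected")
        return target.read_text(encoding="utf-8")
```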
In subsequent processing, the generated data file can be loaded into the Hive table, and query and analysis are performed in an SQL manner.
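The kind of SQL analysis this enables can be tried out locally. The sketch below substitutes an in-memory SQLite database for Hive, which is a deliberate swap made only so the example is self-contained; the table name, column names, and sample rows are hypothetical.

```python
import sqlite3


def load_and_query(lines):
    """Load comma-separated collection lines and total the row counts per file."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE collected (job TEXT, input_file TEXT, row_count INTEGER)"
    )
    rows = []
    for line in lines:
        job, input_file, row_count = line.split(",")
        rows.append((job, input_file, int(row_count)))
    conn.executemany("INSERT INTO collected VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT input_file, SUM(row_count) FROM collected "
        "GROUP BY input_file ORDER BY input_file"
    ).fetchall()
    conn.close()
    return result


totals = load_and_query(["job_001,a.txt,3", "job_001,b.txt,5", "job_002,a.txt,4"])
```

A comparable `GROUP BY` query against a Hive table built over the merged collection file would serve the same purpose on the actual platform.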
FIG. 3 is a flow chart of a task data collection method in one embodiment of the present disclosure.
Referring to fig. 3, a task data collection process may include, for example:
Step S31: when constructing the MapReduce Job, using MultipleOutputs and adding the result output format for data collection through addNamedOutput;
Step S32: constructing a data collector when the Mapper and the Reducer are initialized, and storing task information that can be selected as data collection dimensions, where the task information can be basic task information such as JobId, JobName, input file path and name, task phase, and Mapper or Reducer implementation class name;
Step S33: during the execution of the Mapper and Reducer, collecting counts, extreme values, and sums through the collector's collectCount, collectMax, collectMin, collectSum, and similar methods;
Step S34: when the Mapper and Reducer finish, in the cleanup method, writing the data collected in the collector into a temporary collection file through the MultipleOutputs.write method;
Step S35: overriding FileOutputCommitter so that, when the MapReduce task is completed, the data collection files in the temporary directory are merged and the merged file is transferred to the final data collection directory.
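The lifecycle in steps S31 to S35 can be mimicked with a small Python class whose `setup`, `map`, and `cleanup` hooks loosely mirror the Mapper phases. Everything here is a hypothetical stand-in: the class is not the Hadoop Mapper API, Python lists play the role of the temporary collection files, and the job id and collected statistics are invented for illustration.

```python
class CollectingMapper:
    """Hypothetical stand-in for a Mapper that carries a data collector."""

    def setup(self, job_id, input_file):
        # S32: build the collector and remember the basic task dimensions.
        self.base = (job_id, input_file, "map")
        self.table = {}

    def map(self, line):
        # S33: update a count and a maximum while records are processed.
        key = self.base + ("rows",)
        self.table[key] = self.table.get(key, 0) + 1
        key = self.base + ("max_len",)
        self.table[key] = max(self.table.get(key, 0), len(line))

    def cleanup(self, named_output):
        # S34: serialize the collected data into the temporary output.
        for key, value in sorted(self.table.items()):
            named_output.append(",".join(key + (str(value),)))


def run_job(inputs, job_id="job_001"):
    temp_files = []  # S31: one collection output per subtask
    for name, lines in sorted(inputs.items()):
        mapper = CollectingMapper()
        mapper.setup(job_id, name)
        for line in lines:
            mapper.map(line)
        output = []
        mapper.cleanup(output)
        temp_files.append(output)
    # S35: merge the per-subtask outputs into one collection.
    return [row for output in temp_files for row in output]
```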
According to the embodiments of the present disclosure, a data collector is constructed as a data set during MapReduce task execution and supports data collection across multiple statistical dimensions and multiple statistical categories. When the task completes, the data in the data collector is written to the HDFS file system in a multi-output manner, which addresses the difficulty of meeting data statistics requirements while the MapReduce framework is running.
Corresponding to the method embodiment, the present disclosure also provides a task data acquisition device, which may be used to execute the method embodiment.
Fig. 4 schematically illustrates a block diagram of a task data collection device in an exemplary embodiment of the present disclosure.
Referring to fig. 4, the task data collecting apparatus 400 may include:
a data collector creation module 402 configured to create n data collectors corresponding to the n subtasks;
a data identification module 404 configured to acquire execution information of the subtasks, and acquire a preset statistical dimension recorded in the data collector according to the execution information;
a data determining module 406 configured to find a statistic corresponding to the preset statistic dimension in the data collector;
a value updating module 408 configured to determine a statistic category corresponding to the preset statistic dimension, and update the statistic value according to the statistic category and the execution information;
the data summarization module 410 is configured to summarize the n data collectors after the n subtasks are executed.
In an exemplary embodiment of the present disclosure, the statistical categories include counting, extremum determination (maximum or minimum), and summation.
In an exemplary embodiment of the present disclosure, the value update module 408 is configured to:
when the statistical category is counting, increment the statistical value by one to generate a new value, and replace the statistical value with the new value;
when the statistical category is maximum, acquire the value corresponding to the preset statistical dimension in the execution information, compare that value with the statistical value, and write the larger of the two into the data collector as the new statistical value;
when the statistical category is minimum, acquire the value corresponding to the preset statistical dimension in the execution information, compare that value with the statistical value, and write the smaller of the two into the data collector as the new statistical value;
when the statistical category is summation, acquire the value corresponding to the preset statistical dimension in the execution information, and write the sum of that value and the statistical value into the data collector as the new statistical value.
In an exemplary embodiment of the present disclosure, the data summarization module 410 is configured to:
and converting the data in the n data collectors into a preset format one by one, and writing the data into a statistical file corresponding to the task through a preset output method.
Since the functions of the apparatus 400 have been described in detail in the corresponding method embodiments, the disclosure is not repeated herein.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. According to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 500 according to this embodiment of the invention is described below with reference to fig. 5. The electronic device 500 shown in fig. 5 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, the electronic device 500 is embodied in the form of a general purpose computing device. The components of the electronic device 500 may include, but are not limited to: the at least one processing unit 510, the at least one memory unit 520, and a bus 530 that couples various system components including the memory unit 520 and the processing unit 510.
The storage unit stores program code that can be executed by the processing unit 510, so that the processing unit 510 performs the steps according to various exemplary embodiments of the present invention described in the "exemplary methods" section of this specification. For example, the processing unit 510 may execute, as shown in fig. 1, step S102: establishing n corresponding data collectors for the n subtasks; step S104: acquiring execution information of the subtasks, and acquiring, according to the execution information, the preset statistical dimensions recorded in the data collector; step S106: finding in the data collector the statistical value corresponding to the preset statistical dimension; step S108: determining the statistical category corresponding to the preset statistical dimension, and updating the statistical value according to the statistical category and the execution information.
The memory unit 520 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 5201 and/or a cache memory unit 5202, and may further include a read only memory unit (ROM) 5203.
Storage unit 520 may also include a program/utility 5204 having a set (at least one) of program modules 5205, such program modules 5205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, and a processing unit bus or local bus using any of a variety of bus architectures.
The electronic device 500 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 550. Also, the electronic device 500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 560. As shown, the network adapter 560 communicates with the other modules of the electronic device 500 over the bus 530. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present disclosure may therefore be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that enable a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 6, a program product 600 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
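As one illustration of asynchronous execution, the sketch below runs the subtasks in a thread pool, with each subtask filling its own collector. It is only an assumption about how the processes might be arranged: the function names `run_subtask` and `run_all`, the use of a plain dict as the collector, and the choice of Python's `concurrent.futures` are not taken from the specification.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subtask(records):
    """Process one subtask and return its private collector (a plain dict here)."""
    collector = {"count": 0, "total": 0}
    for value in records:
        collector["count"] += 1      # count-type statistic
        collector["total"] += value  # sum-type statistic
    return collector


def run_all(subtask_inputs):
    """Run every subtask concurrently; results come back in input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_subtask, subtask_inputs))


collectors = run_all([[1, 2, 3], [10]])
```

Since no collector is shared between subtasks, this arrangement needs no locking while the subtasks run; the collectors are only read once every subtask has finished.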
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. A task data collection method, comprising:
establishing n corresponding data collectors for n subtasks;
obtaining execution information of the subtasks, and obtaining, according to the execution information, the preset statistical dimension recorded in the data collector;
finding, in the data collector, the statistical value corresponding to the preset statistical dimension;
determining the statistical type corresponding to the preset statistical dimension, and updating the statistical value according to the statistical type and the execution information; and
after the n subtasks have finished executing, aggregating the n data collectors.

2. The task data collection method according to claim 1, wherein the statistical types include count, extreme value, and sum.

3. The task data collection method according to claim 2, wherein updating the statistical value according to the statistical type and the execution information comprises:
when the statistical type is count, adding one to the statistical value to generate a new value, and replacing the statistical value with the new value.

4. The task data collection method according to claim 2, wherein updating the statistical value according to the statistical type and the execution information comprises:
when the statistical type is maximum, obtaining from the execution information the value corresponding to the preset statistical dimension, comparing it with the statistical value, and writing the larger of the two into the data collector as the new statistical value.

5. The task data collection method according to claim 2, wherein updating the statistical value according to the statistical type and the execution information comprises:
when the statistical type is minimum, obtaining from the execution information the value corresponding to the preset statistical dimension, comparing it with the statistical value, and writing the smaller of the two into the data collector as the new statistical value.

6. The task data collection method according to claim 2, wherein updating the statistical value according to the statistical type and the execution information comprises:
when the statistical type is sum, obtaining from the execution information the value corresponding to the preset statistical dimension, and writing the sum of that value and the statistical value into the data collector as the new statistical value.

7. The task data collection method according to claim 1, wherein aggregating the n data collectors comprises:
converting the data in the n data collectors, entry by entry, into a preset format, and writing it into a statistics file corresponding to the task by a preset output method.

8. A task data collection device, comprising:
a data collector creation module, configured to establish n corresponding data collectors for n subtasks;
a data identification module, configured to obtain execution information of the subtasks and to obtain, according to the execution information, the preset statistical dimension recorded in the data collector;
a data value determination module, configured to find, in the data collector, the statistical value corresponding to the preset statistical dimension;
a value updating module, configured to determine the statistical type corresponding to the preset statistical dimension and to update the statistical value according to the statistical type and the execution information; and
a data aggregation module, configured to aggregate the n data collectors after the n subtasks have finished executing.

9. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the task data collection method according to any one of claims 1-7.

10. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, implements the task data collection method according to any one of claims 1-7.
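The aggregation step of claim 7 can be sketched as follows. The claims leave both the "preset format" and the "preset output method" open, so the sketch assumes a tab-separated line per entry and a text stream as the output; the function name `summarize` and the sample dimension names are hypothetical.

```python
import io


def summarize(collectors, out):
    """Write each (subtask index, dimension, value) entry as one tab-separated line.

    `collectors` is a list of dicts, one per subtask, mapping a dimension
    name to its statistical value. `out` is any writable text stream, for
    example an open statistics file.
    """
    for subtask_id, values in enumerate(collectors):
        for dim in sorted(values):  # fixed order keeps the file reproducible
            out.write(f"{subtask_id}\t{dim}\t{values[dim]}\n")


# Usage: an in-memory buffer stands in for the statistics file.
buf = io.StringIO()
summarize([{"records": 2, "bytes": 17}, {"records": 1}], buf)
```

Passing the stream in as a parameter keeps the conversion to the preset format separate from the output method, so the same function can write to a file, a socket wrapper, or a test buffer.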
CN201811419180.1A 2018-11-26 2018-11-26 Task data collection method and device Pending CN111221698A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811419180.1A CN111221698A (en) 2018-11-26 2018-11-26 Task data collection method and device

Publications (1)

Publication Number Publication Date
CN111221698A true CN111221698A (en) 2020-06-02

Family

ID=70830299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811419180.1A Pending CN111221698A (en) 2018-11-26 2018-11-26 Task data collection method and device

Country Status (1)

Country Link
CN (1) CN111221698A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984628A (en) * 2020-08-28 2020-11-24 北京人大金仓信息技术股份有限公司 Database statistical information collection method, device, medium and electronic equipment
CN112181779A (en) * 2020-09-28 2021-01-05 北京云歌科技有限责任公司 AI metadata comprehensive processing method and system
CN113010376A (en) * 2021-03-01 2021-06-22 北京聚云科技有限公司 Method and device for monitoring cloud storage system for storing training data
CN113806034A (en) * 2021-01-06 2021-12-17 北京沃东天骏信息技术有限公司 Task execution method and device, computer readable storage medium and electronic device
CN114489974A (en) * 2021-12-30 2022-05-13 北京亿阳信通科技有限公司 Method and device for processing real-time data

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101197876A (en) * 2006-12-06 2008-06-11 中兴通讯股份有限公司 Method and system for multi-dimensional analysis of message service data
CN102402459A (en) * 2010-09-10 2012-04-04 中兴通讯股份有限公司 Method and device for summarizing performance data of network management system
CN103188702A (en) * 2011-12-29 2013-07-03 中兴通讯股份有限公司 Device performance reporting and statistical method, distributed devices, general control device and system
CN105045871A (en) * 2015-07-15 2015-11-11 国家超级计算深圳中心(深圳云计算中心) Data aggregation query method and apparatus
CN105574152A (en) * 2015-12-16 2016-05-11 北京邮电大学 Method and system for rapidly counting frequencies
CN105634845A (en) * 2014-10-30 2016-06-01 任子行网络技术股份有限公司 Method and system for carrying out multi-dimensional statistic analysis on large number of DNS journals
CN105786973A (en) * 2016-02-02 2016-07-20 重庆秒盈电子商务有限公司 Concurrent data processing method and system based on big data technology
CN105893411A (en) * 2015-11-20 2016-08-24 乐视致新电子科技(天津)有限公司 Statistic data processing method and apparatus
CN106202280A (en) * 2016-06-29 2016-12-07 联想(北京)有限公司 A kind of information processing method and server
CN107729138A (en) * 2017-09-14 2018-02-23 北京天耀宏图科技有限公司 A kind of analysis method and device of high-performance distributed Vector spatial data
CN107749896A (en) * 2017-11-13 2018-03-02 天津开心生活科技有限公司 Private clound concurrency control method and device, storage medium and electric terminal
CN107766504A (en) * 2017-10-20 2018-03-06 华迪计算机集团有限公司 A kind of real time streaming data Treatment Analysis method and system
CN107844374A (en) * 2017-11-02 2018-03-27 上海携程商务有限公司 The task executing method of terminal device, device, electronic equipment, storage medium
CN108427772A (en) * 2018-04-10 2018-08-21 携程商旅信息服务(上海)有限公司 Online report form generation method, system, equipment and storage medium
CN108829505A (en) * 2018-06-28 2018-11-16 北京奇虎科技有限公司 A kind of distributed scheduling system and method

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984628A (en) * 2020-08-28 2020-11-24 北京人大金仓信息技术股份有限公司 Database statistical information collection method, device, medium and electronic equipment
CN112181779A (en) * 2020-09-28 2021-01-05 北京云歌科技有限责任公司 AI metadata comprehensive processing method and system
CN112181779B (en) * 2020-09-28 2024-06-04 北京云歌科技有限责任公司 Comprehensive processing method and system for AI metadata
CN113806034A (en) * 2021-01-06 2021-12-17 北京沃东天骏信息技术有限公司 Task execution method and device, computer readable storage medium and electronic device
CN113010376A (en) * 2021-03-01 2021-06-22 北京聚云科技有限公司 Method and device for monitoring cloud storage system for storing training data
CN113010376B (en) * 2021-03-01 2023-07-21 北京聚云科技有限公司 Monitoring method and device for cloud storage system for storing training data
CN114489974A (en) * 2021-12-30 2022-05-13 北京亿阳信通科技有限公司 Method and device for processing real-time data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address before: Room 221, 2nd floor, Block C, 18 Kechuang 11th Street, Daxing Economic and Technological Development Zone, Beijing, 100176

Applicant before: BEIJING JINGDONG FINANCIAL TECHNOLOGY HOLDING Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200602