CN112860779B - Batch data importing method and device - Google Patents
- Publication number: CN112860779B (application CN202110336651.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- server
- imported
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The application provides a batch data importing method and device that use a pre-trained configuration generation model to generate parameter configuration information when batch data import is performed. The process requires little manual intervention: the work of determining parameter configuration information is largely carried by the configuration generation model, which reduces the consumption of human resources and lowers the threshold for determining the parameter configuration information. In addition, the configuration generation model in the present specification takes information about the data and the hardware parameters of the server as inputs, so the parameter configuration information it outputs is better suited both to the data to be imported and to the server that performs data storage. Moreover, the technical scheme in the specification is applicable to various business processing scenarios, and is particularly suitable for financial business processing scenarios.
Description
Technical Field
The application relates to the technical field of data processing, in particular to a batch data importing method and device.
Background
With the advancement of computer technology and the network age, batch processing technology has become very widely used. Particularly in systems connected with traditional industries, a large amount of traditional-industry data needs to be stored electronically, and batch processing techniques based on batch import are generally used for this.
In batch processing, data is exported to a data file, the file is transmitted to a related system, and the related system writes its own storage logic (i.e., configures parameters) to parse the data in the file into its own database. In existing batch processes, different data files often require different storage logic, and the types of data files a related system faces are rarely uniform. Maintenance personnel of the related system therefore have to write different storage logic for different data files, making the process labor-intensive. In addition, if the storage logic is written poorly, processing files becomes slow, and in batch processing one must also consider whether too large a data volume will cause the service to break down. Further, when many related systems participate in the batch process, writing storage logic separately for each system makes the human-resource cost even more prominent.
Disclosure of Invention
The application provides a batch data importing method and device that effectively reduce the consumption of human resources in the batch data importing process, so that determining parameter configuration information becomes more convenient. The technical scheme adopted by the application is as follows:
In a first aspect, a batch data import method is provided. The method is based on a batch data import system comprising a server, a pre-trained configuration generation model and a database, and includes the following steps:
acquiring data to be imported;
generating a plurality of tasks according to the data to be imported;
inputting the information of the data to be imported and the hardware parameters of the server into the pre-trained configuration generation model to obtain parameter configuration information output by the configuration generation model;
configuring the server according to the parameter configuration information to obtain a configured server;
and processing each task with the configured server so as to import the batch data corresponding to each task into the database.
In an alternative embodiment of the present specification, the pre-trained configuration generation model is obtained by:
obtaining training samples according to the information of data acquired during historical batch data imports;
inputting the training samples and the hardware parameters of the server into a configuration generation model to be trained to obtain undetermined parameter configuration information output by the model;
configuring the server with the undetermined parameter configuration information to obtain an undetermined server;
processing, with the undetermined server, each task obtained from the data corresponding to the training samples, and determining the loss of the configuration generation model to be trained according to the processing effect;
and, taking loss minimization as the training target, adjusting the parameters of the configuration generation model to be trained to obtain the pre-trained configuration generation model.
In an optional embodiment of the present disclosure, the information of the data to be imported includes at least one of an amount of the data to be imported and a format of the data to be imported.
In an optional embodiment of the present disclosure, the parameter configuration information includes at least one of a number of main threads, a number of auxiliary threads, a first granularity at which each thread processes tasks, a capacity of a buffer pool, a buffer pool size for reading files, a number of allowed execution failures, a failure count threshold, and a second granularity; the second granularity is the granularity used to re-divide a task that failed to be processed when that task is processed again.
In an optional embodiment of the present disclosure, the processing each task by using the configured server includes:
the main thread of the server reads the data to be imported to a buffer pool;
and the auxiliary thread acquires the data to be imported from the buffer pool, and executes the task corresponding to the auxiliary thread according to the acquired data to be imported.
In an alternative embodiment of the present specification, the method further comprises:
If any auxiliary thread used for task processing becomes abnormal, then once no data to be imported remains in the buffer pool of the server, the task processed by the abnormal auxiliary thread is distributed to other auxiliary threads for reprocessing.
In an alternative embodiment of the present disclosure, the task processed by the abnormal secondary thread is allocated to other secondary threads for reprocessing, including:
determining a task processed by the abnormal auxiliary thread as a target task;
and dividing the target task into a plurality of subtasks according to the second granularity output by the pre-trained configuration generation model, and distributing the subtasks to auxiliary threads other than the abnormal auxiliary thread.
In an alternative embodiment of the present disclosure, the task processed by the abnormal secondary thread is allocated to other secondary threads for reprocessing, including:
recording the number of execution failures of the task processed by the abnormal auxiliary thread as 1, and thereafter updating the number of execution failures each time that task fails to execute again;
and, when the updated number of execution failures reaches the failure count threshold output by the pre-trained configuration generation model, generating and displaying alarm information.
In an alternative embodiment of the present specification, the configuration generation model is an RNN model.
In a second aspect, a batch data import apparatus is provided, which can perform a batch data import process provided in the above-described embodiments of the present application. The apparatus is for a batch data import system, as shown in FIG. 3, and includes one or more of the following modules:
The acquisition module is configured to acquire data to be imported;
the task generating module is configured to generate a plurality of tasks according to the data to be imported;
The parameter configuration information generation module is configured to input the information of the data to be imported and the hardware parameters of the server into a pre-trained configuration generation model to obtain parameter configuration information output by the configuration generation model;
The configuration module is configured to configure the server according to the parameter configuration information to obtain a configured server;
And the import module is configured to process each task by adopting the configured server so as to import batch data corresponding to the task into the database.
In an alternative embodiment of the present disclosure, the batch data importing apparatus may further include a training module.
The training module is configured to obtain training samples according to the information of data acquired during historical batch data imports; input the training samples and the hardware parameters of the server into a configuration generation model to be trained to obtain undetermined parameter configuration information output by the model; configure the server with the undetermined parameter configuration information to obtain an undetermined server; process, with the undetermined server, each task obtained from the data corresponding to the training samples, and determine the loss of the configuration generation model to be trained according to the processing effect; and, taking loss minimization as the training target, adjust the parameters of the configuration generation model to be trained to obtain the pre-trained configuration generation model.
In an optional embodiment of the present disclosure, the information of the data to be imported includes at least one of an amount of the data to be imported and a format of the data to be imported.
In an optional embodiment of the present disclosure, the parameter configuration information includes at least one of a number of main threads, a number of auxiliary threads, a first granularity at which each thread processes tasks, a capacity of a buffer pool, a buffer pool size for reading files, a number of allowed execution failures, a failure count threshold, and a second granularity; the second granularity is the granularity used to re-divide a task that failed to be processed when that task is processed again.
In an optional embodiment of the present disclosure, the importing module is specifically configured to read, by a main thread of the server, the data to be imported to a buffer pool; and the auxiliary thread acquires the data to be imported from the buffer pool, and executes the task corresponding to the auxiliary thread according to the acquired data to be imported.
In an alternative embodiment of the present disclosure, the batch data importing apparatus may further include an exception handling module.
The exception handling module is configured to, if any auxiliary thread used for task processing becomes abnormal, allocate the task handled by the abnormal auxiliary thread to other auxiliary threads for reprocessing once no data to be imported remains in the buffer pool of the server.
In an optional embodiment of the present disclosure, the exception handling module is specifically configured to determine the task handled by the abnormal auxiliary thread as a target task; divide the target task into a plurality of subtasks according to the second granularity output by the pre-trained configuration generation model; and distribute the subtasks to auxiliary threads other than the abnormal auxiliary thread.
In an optional embodiment of the present disclosure, the exception handling module is further configured to record the number of execution failures of the task handled by the abnormal auxiliary thread as 1, and thereafter update the number of execution failures each time that task fails to execute again; and, when the updated number of execution failures reaches the failure count threshold output by the pre-trained configuration generation model, generate and display alarm information.
In an alternative embodiment of the present specification, the configuration generation model is an RNN model.
In a third aspect, an electronic device is provided, the electronic device comprising:
One or more processors;
A memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the batch data import method of the first aspect.
In a fourth aspect, there is provided a computer readable storage medium storing computer instructions that, when run on a computer, cause the computer to perform the batch data import method of the first aspect.
The application provides a batch data importing method, a batch data importing device, an electronic device and a computer-readable storage medium, which use a pre-trained configuration generation model to generate parameter configuration information when batch data import is performed. The process requires little manual intervention: the work of determining parameter configuration information is largely carried by the configuration generation model, which reduces the consumption of human resources and lowers the threshold for determining the parameter configuration information. In addition, the configuration generation model in the present specification takes information about the data and the hardware parameters of the server as inputs, so the parameter configuration information it outputs is better suited both to the data to be imported and to the server that performs data storage. Moreover, the technical scheme in the specification is applicable to various business processing scenarios, and is particularly suitable for financial business processing scenarios.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a batch data import system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a batch data import process according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a batch data importing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
In describing embodiments of the present application, words such as "exemplary," "such as" or "for example" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "exemplary," "such as" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary," "such as" or "for example," etc., is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, the term "and/or" is merely an association relationship describing an association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a alone, B alone, and both A and B. In addition, unless otherwise indicated, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of terminals means two or more terminals.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
For convenience of description, only a portion related to the present invention is shown in the drawings. Embodiments and features of embodiments in this specification may be combined with each other without conflict.
A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
The batch data import method is based on a batch data import system, which comprises a server, a pre-trained configuration generation model and a database. Illustratively, the batch data import process in this specification involves the network architecture of the batch data import system as shown in FIG. 1.
The present specification does not limit the source of the data to be imported; the data may come from different data sources or from the same data source. In this specification, data is referred to as data to be imported until it is successfully imported into the database in the batch data import system.
The server in the batch data importing system is used for importing data to be imported into the database. The configuration generation model is used to generate parameters for the server. The database is used for data storage.
In general, the imports a server must perform differ from batch to batch in the amount and/or size of the data. If the configuration of the server never changes, the server may import some batches well and other batches poorly.
Therefore, if the parameters of the server are never adjusted, the server cannot meet the requirements of importing complex and changeable data, and the overall effect of data import suffers.
In view of this, a batch data import method in the present specification is specifically proposed to enable timely and effective configuration of parameters of a server. The batch data import method in this specification may include one or more of the following steps.
S200: and acquiring data to be imported.
The timing of acquiring the data to be imported is not particularly limited in this specification.
In an alternative embodiment of the present disclosure, while the import of the data to be imported for one batch has not yet completed, the data to be imported for the next batch is not acquired. That is, this step may be performed only after the import of the previous batch of data to be imported has completed.
The batch data import system in this specification may further include a buffer pool for temporarily storing data to be imported into the database. Detect whether data to be imported remains in the buffer pool: if not, execute this step; if so, wait a preset time and detect again, until the detection result is no.
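The buffer-pool check described above can be sketched as a simple polling loop. This is an illustrative sketch rather than the patent's implementation: the `empty()` interface (as provided by `queue.Queue`) and the default poll interval are assumptions.

```python
import time


def wait_for_empty_buffer(buffer_pool, poll_interval=1.0):
    """Block until no data to be imported remains in the buffer pool.

    `buffer_pool` is assumed to expose an `empty()` check; the preset
    waiting time between detections is `poll_interval` seconds.
    """
    while not buffer_pool.empty():
        time.sleep(poll_interval)  # wait the preset time, then detect again
```

Only after this returns would the system proceed to acquire the next batch of data to be imported.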
S202: and generating a plurality of tasks according to the data to be imported.
In general, the amount of data to be imported is not small. In some scenarios, to implement multi-threaded data import, the data may be divided into a plurality of tasks according to a preset division rule; the server then processes these tasks to import the data to be imported into the database.
The specification does not particularly limit the division rule and the number of divided tasks. The division rules and the number of tasks obtained by division can be determined according to the actual scene.
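As one concrete illustration of such a division rule, the data can be cut into fixed-size tasks. The record-count granularity used here is an assumption, since the patent leaves the rule and the number of tasks to the actual scenario.

```python
def split_into_tasks(records, granularity):
    """Divide the data to be imported into tasks of at most `granularity` records."""
    if granularity <= 0:
        raise ValueError("granularity must be positive")
    # one task per contiguous slice of the data to be imported
    return [records[i:i + granularity] for i in range(0, len(records), granularity)]
```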
S204: inputting the information of the data to be imported and the hardware parameters of the server into a pre-trained configuration generation model to obtain parameter configuration information output by the configuration generation model.
In this specification, the information of the data may be any information that affects the process of importing the data into the database; which specific information that is can be determined according to the actual scenario.
In an alternative embodiment of the present specification, the information of the data to be imported may include at least one of an amount of the data to be imported and a format of the data to be imported. In addition, other information may be used as information of the data to be imported, which is not described in detail.
In addition, the hardware parameters of the server are not particularly limited in the present specification, and the hardware parameters need to be determined according to the server adopted in the actual scenario. In an alternative embodiment of the present specification, the hardware parameter may be at least one of a maximum number of threads (at least one of a maximum number of main threads and a maximum number of auxiliary threads) that the server can provide, a maximum capacity of a buffer pool that the server can provide, and a processor parameter of the server.
The configuration generation model in this specification is used to generate at least part of the parameter configuration information employed by the server in performing data importation. In an alternative embodiment of the present specification, the parameter configuration information includes at least one of a main thread number, a secondary thread number, a first granularity of processing tasks for each thread, a capacity of a buffer pool, and a buffer pool size of a read file.
In addition, in another optional embodiment of the present specification, at least one of the number of allowed execution failures, the failure count threshold, and the second granularity for a task may also be included in the parameter configuration information; the second granularity is the granularity used to re-divide a task that failed to be processed when that task is processed again.
The configuration generation model in this description may be an artificial intelligence model; existing artificial intelligence models capable of realizing a prediction function are suitable for the process in this specification. In an alternative embodiment of the present description, the configuration generation model is an RNN (Recurrent Neural Network) model.
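To make the model's interface concrete, the sketch below shows one way a model's raw numeric output could be mapped onto the named configuration parameters listed above. The field names, their order in the output vector, and the round-then-clamp rule are all assumptions of this sketch; the patent does not prescribe an output encoding.

```python
# Field names follow the parameter configuration information listed above;
# their order in the model's output vector is an assumption.
CONFIG_FIELDS = [
    "main_threads", "aux_threads", "first_granularity", "buffer_capacity",
    "read_buffer_size", "allowed_failures", "failure_threshold",
    "second_granularity",
]


def decode_config(output_vector):
    """Map a raw model output vector to named, positive integer parameters."""
    if len(output_vector) != len(CONFIG_FIELDS):
        raise ValueError("unexpected output size")
    # round each raw value and clamp to at least 1 so every parameter is usable
    return {name: max(1, round(v)) for name, v in zip(CONFIG_FIELDS, output_vector)}
```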
S206: and configuring the server according to the parameter configuration information to obtain the configured server.
After the configuration generation model generates the parameter configuration information, at least a portion of the parameters of the server may be configured according to the parameter configuration information.
The configuration mode actually adopted in the parameter configuration process is not particularly limited in this specification. In an alternative embodiment of the present description, the parameter configuration information may be written into a configuration file whose format the server can read directly. Importing that configuration file into the server then completes the configuration, yielding the configured server.
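For illustration, a sketch of writing the parameter configuration into a server-readable file. JSON is an assumed format, chosen only because it is a format a server could plausibly parse directly; the patent does not name one.

```python
import json


def write_config_file(parameter_config, path):
    """Persist the parameter configuration in a format the server reads directly."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(parameter_config, f, indent=2)


def load_config_file(path):
    """The server-side counterpart: load the configuration file as-is."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```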
S208: and processing each task by adopting the configured server so as to import batch data corresponding to the task into the database.
The server after configuration obtained by the above steps has at least parameters suitable for the data to be imported of the batch and the hardware condition of the server, and then the data to be imported can be imported into the database by using the server after configuration.
After the import of the data to be imported completes, step S200 may be executed again. Even if the data to be processed differs greatly from batch to batch, the configured server obtained through the process in this specification can cope with it well.
It can be seen that the parameter configuration information is generated by a pre-trained configuration generation model. The process requires little manual intervention: the work of determining parameter configuration information is largely carried by the model, which reduces the consumption of human resources and lowers the threshold for determining the parameter configuration information. Moreover, because the model takes information about the data and the hardware parameters of the server as inputs, the parameter configuration information it outputs is better suited both to the data to be imported and to the server that performs data storage.
From the foregoing, it is apparent that the configuration generation model in the present specification plays a significant role in performing data import. How the configuration generation model is obtained will now be described.
In an alternative embodiment of the present disclosure, training samples may be obtained from the information of data acquired during historical batch data imports. The training samples and the hardware parameters of the server are input into a configuration generation model to be trained, yielding undetermined parameter configuration information output by the model. The server is configured with the undetermined parameter configuration information to obtain an undetermined server. Each task obtained from the data corresponding to the training samples is processed with the undetermined server, and the loss of the configuration generation model to be trained is determined according to the processing effect. Finally, taking loss minimization as the training target, the parameters of the configuration generation model to be trained are adjusted to obtain the pre-trained configuration generation model.
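The training outline above can be sketched as the loop below. Every interface here is an assumption: `model.predict` stands in for the configuration generation model to be trained, `evaluate` stands for running an import with the undetermined configuration and summarizing the processing effect as a scalar loss, and `model.update` stands for whatever loss-minimizing parameter adjustment the concrete model (e.g. an RNN) supports.

```python
def train_config_model(model, training_samples, hw_params, evaluate, epochs=1):
    """Sketch of the training procedure: predict, evaluate, adjust."""
    for _ in range(epochs):
        for sample in training_samples:
            # undetermined parameter configuration from the model to be trained
            pending_config = model.predict(sample, hw_params)
            # process the sample's tasks with the undetermined server;
            # the processing effect is summarized as a scalar loss
            loss = evaluate(pending_config, sample)
            # adjust model parameters with loss minimization as the target
            model.update(sample, loss)
    return model
```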
In addition, the configuration generation model to be trained can be trained in other ways, which are not listed in this description.
Furthermore, in some cases, the process of importing data to be imported by the server into the database is affected not only by the parameters of the server itself, but also to some extent by the database. When the configuration generating model to be trained is trained, the training sample, the hardware parameters of the server and the information of the database can be input into the configuration generating model to be trained, and training for the model is performed to obtain the trained configuration generating model.
Furthermore, in step S204, the information of the database may be input into the configuration generation model, and the configuration generation model outputs parameter configuration information according to the influence of the database on the data import of the server. The server is used for importing the data to be imported into the database according to the parameter configuration information, and the cooperation between the server and the database is considered, so that the effect of importing the data is improved comprehensively.
As is clear from the foregoing, in the present specification, the step of importing data to be imported into the database is mainly performed by the server. In the present specification, a server may be a cluster formed by a plurality of devices, components, and middleware, and the cluster may have a distributed structure.
When the configured server performs data import, the main thread of the server reads the data to be imported into the buffer pool, and each auxiliary thread takes data to be imported from the buffer pool and executes its corresponding task on that data. In some alternative embodiments, the number of main threads, the number of auxiliary threads, the capacity of the buffer pool, and the first granularity at which each thread processes tasks (i.e., the amount of data a task contains) may all be obtained from the configuration generation model.
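A minimal sketch of this main-thread/auxiliary-thread arrangement, using a bounded queue as the buffer pool. Appending to a list stands in for executing a task against the database, and the sentinel-based shutdown is an implementation choice of this sketch, not something the patent specifies.

```python
import queue
import threading


def run_import(records, n_aux_threads, buffer_capacity, sink):
    pool = queue.Queue(maxsize=buffer_capacity)  # the buffer pool
    SENTINEL = object()

    def main_thread():
        # the main thread reads the data to be imported into the buffer pool
        for record in records:
            pool.put(record)
        for _ in range(n_aux_threads):
            pool.put(SENTINEL)  # tell each auxiliary thread to stop

    def aux_thread():
        # an auxiliary thread takes data from the buffer pool and executes
        # its task on it (here: appending to `sink`)
        while True:
            item = pool.get()
            if item is SENTINEL:
                break
            sink.append(item)

    workers = [threading.Thread(target=main_thread)]
    workers += [threading.Thread(target=aux_thread) for _ in range(n_aux_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```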
In addition, the server is inevitably abnormal in the process of data import. The abnormal phenomenon may be caused by the data to be imported, or may be caused by the environment where the server is located.
In order to avoid the abnormal phenomenon from affecting the data importing process, the data importing process in the present specification further includes: when data import is performed, the operating states of the threads are detected. If any auxiliary line adopted in the task processing is abnormal, when the data to be imported does not exist in the buffer pool of the server, the task processed by the abnormal auxiliary line is distributed to other auxiliary lines for reprocessing, so that each task obtained according to the data to be imported can be processed properly.
According to the method and the device, the task corresponding to the abnormality is reprocessed only when no data to be imported remains in the buffer pool; this effectively avoids interfering with the normal task processing of the other, non-abnormal auxiliary threads.
In an alternative embodiment of the present disclosure, if an abnormality occurs in an auxiliary thread, the task handled by the abnormal auxiliary thread is determined as a target task. The target task is divided into a plurality of subtasks according to a second granularity output by the pre-trained configuration generation model, and the subtasks are distributed to the auxiliary threads other than the abnormal one. The second granularity is smaller than the first granularity described above.
Dividing the target task at the second granularity yields smaller subtasks, so that when the subtasks are processed, it can be located in which subtask the abnormality of the task specifically occurs.
After that, if an abnormality occurs again when the data corresponding to a subtask is imported, the subtask corresponding to the recurring abnormality is further divided according to a third granularity output by the configuration generation model, obtaining a plurality of task units. The task units are then distributed to normal threads so that the normal threads process them.
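The first/second/third-granularity hierarchy can be sketched with a single splitting helper. The representation of a task as a list of records is an assumption for illustration; the patent does not specify one.

```python
def subdivide(task, granularity):
    """Divide a task into pieces of at most `granularity` records each;
    here a task is assumed to be a list of records to import."""
    return [task[i:i + granularity] for i in range(0, len(task), granularity)]


# A failing task is re-divided at the smaller second granularity; a subtask
# that fails again is re-divided at the still-smaller third granularity,
# progressively narrowing down which records cause the abnormality.
failing_task = list(range(8))           # hypothetical records
subtasks = subdivide(failing_task, 4)   # second granularity
task_units = subdivide(subtasks[0], 2)  # third granularity
```

Each round of subdivision localizes the fault more precisely, at the cost of more scheduling overhead, which is why progressively smaller granularities are applied only on repeated failure.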
Further, when an abnormality occurs in a task handled by an auxiliary thread, the number of execution failures of that task is recorded as 1, and the number is thereafter updated each time re-execution of the task fails. When the updated number of execution failures reaches the failure number threshold output by the pre-trained configuration generation model, alarm information is generated and displayed.
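The failure counter and threshold alarm can be sketched as follows; the function and parameter names are assumptions, and `alarm` stands in for whatever mechanism displays the alarm information.

```python
def record_failure(failure_counts, task_id, threshold, alarm):
    """Increment the execution-failure count of a task (starting at 1 on its
    first failure) and raise alarm information once the count reaches the
    failure number threshold output by the configuration generation model."""
    failure_counts[task_id] = failure_counts.get(task_id, 0) + 1
    if failure_counts[task_id] >= threshold:
        alarm(f"task {task_id} failed {failure_counts[task_id]} times")
    return failure_counts[task_id]
```

Keeping the threshold model-generated (rather than hard-coded) matches the patent's idea that retry tolerance should depend on the data and hardware at hand.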
The present disclosure further provides a batch data importing apparatus that may perform a batch data importing process provided in the above embodiment of the present application. The apparatus is for a batch data import system, as shown in FIG. 3, and includes one or more of the following modules:
an acquisition module 300 configured to acquire data to be imported;
a task generating module 302, configured to generate a plurality of tasks according to the data to be imported;
the parameter configuration information generating module 304 is configured to input the information of the data to be imported and the hardware parameters of the server into a pre-trained configuration generating model to obtain parameter configuration information output by the configuration generating model;
The configuration module 306 is configured to configure the server according to the parameter configuration information to obtain a configured server;
and the importing module 308 is configured to process each task by adopting the configured server so as to import batch data corresponding to the task into the database.
In an alternative embodiment of the present disclosure, the batch data importing apparatus may further include a training module 310.
The training module 310 is configured to obtain a training sample according to the information of data obtained when batch data was imported historically; input the training sample and the hardware parameters of the server into a configuration generation model to be trained, to obtain undetermined parameter configuration information output by the configuration generation model to be trained; configure the server with the undetermined parameter configuration information to obtain an undetermined server; process, by the undetermined server, each task obtained from the data corresponding to the training sample, and determine the loss of the configuration generation model to be trained according to the processing effect; and, taking loss minimization as the training target, adjust the parameters of the configuration generation model to be trained to obtain the pre-trained configuration generation model.
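The patent describes model training (e.g., of an RNN) driven by a loss derived from the processing effect. A minimal stand-in capturing only the loss-minimization objective — candidate enumeration rather than gradient descent, with all names assumed — might look like:

```python
def select_configuration(candidate_configs, samples, evaluate):
    """Toy stand-in for the training loop: each candidate plays the role of
    the undetermined parameter configuration, `evaluate` measures the
    processing effect on the samples as a loss (lower is better), and the
    configuration with minimal loss is kept."""
    best_cfg, best_loss = None, float("inf")
    for cfg in candidate_configs:
        loss = evaluate(cfg, samples)  # e.g., import time or error rate
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg
```

In the patent's actual scheme the loss would instead drive parameter updates of the configuration generation model itself, so that the model generalizes to unseen data/hardware combinations rather than memorizing one best configuration.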
In an optional embodiment of the present disclosure, the information of the data to be imported includes at least one of an amount of the data to be imported and a format of the data to be imported.
In an optional embodiment of the present disclosure, the parameter configuration information includes at least one of a number of main threads, a number of auxiliary threads, a first granularity of each thread's processing tasks, a capacity of the buffer pool, a size of the buffer pool for reading files, a number of allowed execution failures, a failure number threshold, and a second granularity; the second granularity is the granularity at which a task that failed to be processed is re-divided when it is processed again.
In an optional embodiment of the present disclosure, the importing module 308 is specifically configured to read, by a main thread of the server, the data to be imported to a buffer pool; and the auxiliary thread acquires the data to be imported from the buffer pool, and executes the task corresponding to the auxiliary thread according to the acquired data to be imported.
In an alternative embodiment of the present disclosure, the batch data import apparatus may further include an exception handling module 312.
The exception handling module 312 is configured to, if any one of the auxiliary threads adopted when the task is handled is abnormal, allocate the task handled by the abnormal auxiliary thread to the other auxiliary threads for reprocessing when there is no data to be imported in the buffer pool of the server.
In an alternative embodiment of the present disclosure, the exception handling module 312 is specifically configured to determine the task handled by the abnormal auxiliary thread as a target task, divide the target task into a plurality of subtasks according to the second granularity output by the pre-trained configuration generation model, and distribute the subtasks to the auxiliary threads other than the abnormal one.
In an alternative embodiment of the present specification, the exception handling module 312 is further configured to record the number of execution failures of the task handled by the abnormal auxiliary thread as 1, thereafter update the number each time re-execution of the task fails, and, when the updated number of execution failures reaches the failure number threshold output by the pre-trained configuration generation model, generate and display alarm information.
In an alternative embodiment of the present specification, the configuration generation model is an RNN model.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of one or more embodiments of the present disclosure. Those of ordinary skill in the art can understand and implement this without undue burden.
Further, an embodiment of the present application provides an electronic device. As shown in fig. 4, the electronic device 40 includes a processor 401 and a storage device 403, where the processor 401 is coupled to the storage device 403, for example via a bus 402. Further, the electronic device 40 may also include a transceiver 404, which includes a receiver and a transmitter. It should be noted that, in practical applications, the transceiver 404 is not limited to one, and the structure of the electronic device 40 does not limit the embodiment of the present application. The processor 401 is applied in the embodiment of the present application to implement the functions of the modules shown in fig. 3.
The processor 401 may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware components, or any combination thereof. Which may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. Processor 401 may also be a combination that implements computing functionality, such as a combination comprising one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 402 may include a path to transfer information between the components. Bus 402 may be a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 4, but this does not mean that there is only one bus or one type of bus.
Storage device 403 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device, an EEPROM, a CD-ROM or other optical disk storage (including compact disks, laser disks, digital versatile disks, Blu-ray disks, etc.), magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The storage device 403 is used for storing application program code for executing the solutions of the present application, and execution of that code is controlled by the processor 401. The processor 401 executes the application code stored in the storage device 403 to implement the functions of the modules illustrated in fig. 3.
The embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method shown in the above embodiment.
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The foregoing is only a partial embodiment of the present application, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations should and are intended to be comprehended within the scope of the present application.
Claims (10)
1. A batch data import method based on a batch data import system comprising a server, a pre-trained configuration generation model and a database; the method comprises the following steps:
acquiring data to be imported;
Generating a plurality of tasks according to the data to be imported;
inputting the information of the data to be imported and the hardware parameters of the server into a pre-trained configuration generation model to obtain parameter configuration information output by the configuration generation model;
Configuring the server according to the parameter configuration information to obtain a configured server;
Processing each task by adopting the configured server so as to import batch data corresponding to the task into the database;
the information of the data to be imported comprises the amount of the data to be imported and/or the format of the data to be imported;
The parameter configuration information comprises a number of main threads, a number of auxiliary threads, a first granularity of each thread's processing tasks, a capacity of a buffer pool, a size of the buffer pool for reading files, a number of allowed execution failures, a failure number threshold, and a second granularity; the second granularity is the granularity at which a task that failed to be processed is re-divided when it is processed again.
2. The method of claim 1, wherein the pre-trained configuration generation model is obtained by:
Obtaining a training sample according to the information of the data obtained when the batch data is imported in the history;
Inputting the training samples and the hardware parameters of the server into a configuration generating model to be trained to obtain undetermined parameter configuration information output by the configuration generating model to be trained;
configuring the server by adopting the undetermined parameter configuration information to obtain an undetermined server;
Processing, by the undetermined server, each task obtained from the data corresponding to the training sample, and determining the loss of the configuration generation model to be trained according to the processing effect;
And taking the loss minimization as a training target, and adjusting parameters of the configuration generating model to be trained to obtain a pre-trained configuration generating model.
3. The method of claim 1, wherein processing each task with the post-configuration server comprises:
the main thread of the server reads the data to be imported to a buffer pool;
and the auxiliary thread acquires the data to be imported from the buffer pool, and executes the task corresponding to the auxiliary thread according to the acquired data to be imported.
4. The method according to claim 1, wherein the method further comprises:
If any auxiliary thread adopted in task processing is abnormal, when data to be imported does not exist in a buffer pool of the server, the task processed by the abnormal auxiliary thread is distributed to other auxiliary threads for reprocessing.
5. The method of claim 4, wherein assigning tasks handled by the anomalous secondary thread to other secondary threads for reprocessing, comprises:
determining a task processed by the abnormal auxiliary thread as a target task;
And dividing the target task into a plurality of subtasks according to a second granularity output by the pre-trained configuration generation model, and distributing the subtasks to the auxiliary threads other than the abnormal auxiliary thread.
6. The method of claim 4, wherein assigning tasks handled by the anomalous secondary thread to other secondary threads for reprocessing, comprises:
Recording the number of execution failures of the task processed by the abnormal auxiliary thread as 1, and thereafter updating the number each time re-execution of the task fails;
and when the updated execution failure times reach a failure times threshold value output by the pre-trained configuration generation model, generating and displaying alarm information.
7. The method according to any one of claims 1 to 6, wherein the configuration generation model is an RNN model.
8. A batch data importing apparatus that can execute the batch data importing method provided in any one of claims 1 to 7; the device is used for a batch data import system, and comprises one or more of the following modules:
The acquisition module is configured to acquire data to be imported;
the task generating module is configured to generate a plurality of tasks according to the data to be imported;
The parameter configuration information generation module is configured to input the information of the data to be imported and the hardware parameters of the server into a pre-trained configuration generation model to obtain parameter configuration information output by the configuration generation model;
The configuration module is configured to configure the server according to the parameter configuration information to obtain a configured server, wherein the parameter configuration information comprises a number of main threads, a number of auxiliary threads, a first granularity of each thread's processing tasks, a capacity of a buffer pool, a size of the buffer pool for reading files, a number of allowed execution failures, a failure number threshold, and a second granularity; the second granularity is the granularity at which a task that failed to be processed is re-divided when it is processed again;
The import module is configured to process each task by adopting the configured server so as to import batch data corresponding to the task into the database, wherein the information of the data to be imported comprises the quantity of the data to be imported and/or the format of the data to be imported;
the batch data importing device further comprises a training module.
9. An electronic device, one or more processors;
A memory;
One or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to: performing the method of any of the preceding claims 1-7.
10. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110336651.8A CN112860779B (en) | 2021-03-29 | 2021-03-29 | Batch data importing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112860779A CN112860779A (en) | 2021-05-28 |
CN112860779B true CN112860779B (en) | 2024-05-24 |
Family
ID=75993122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110336651.8A Active CN112860779B (en) | 2021-03-29 | 2021-03-29 | Batch data importing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112860779B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114201232B (en) * | 2021-12-01 | 2023-08-22 | 东莞新能安科技有限公司 | Battery management system parameter configuration method, device, system and upper computer |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6289345B1 (en) * | 1997-06-26 | 2001-09-11 | Fujitsu Limited | Design information management system having a bulk data server and a metadata server |
CN104980421A (en) * | 2014-10-15 | 2015-10-14 | 腾讯科技(深圳)有限公司 | Method and system for processing batch requests |
CN110502344A (en) * | 2019-08-26 | 2019-11-26 | 联想(北京)有限公司 | A kind of data adjustment method and device |
CN110991649A (en) * | 2019-10-28 | 2020-04-10 | 中国电子产品可靠性与环境试验研究所((工业和信息化部电子第五研究所)(中国赛宝实验室)) | Deep learning model building method, device, equipment and storage medium |
CN111459631A (en) * | 2020-03-27 | 2020-07-28 | 厦门梦加网络科技股份有限公司 | Automatic batch processing method and system for server |
CN111666144A (en) * | 2020-06-19 | 2020-09-15 | 中信银行股份有限公司 | Batch processing task execution method and system and machine room deployment system |
CN112561078A (en) * | 2020-12-18 | 2021-03-26 | 北京百度网讯科技有限公司 | Distributed model training method, related device and computer program product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6447120B2 (en) | Job scheduling method, data analyzer, data analysis apparatus, computer system, and computer-readable medium | |
CN110389748A (en) | Service data processing method and terminal equipment | |
CN110609807B (en) | Method, apparatus and computer readable storage medium for deleting snapshot data | |
WO2020215925A1 (en) | Event subscription method and apparatus based on blockchain | |
CN112860779B (en) | Batch data importing method and device | |
CN112631994A (en) | Data migration method and system | |
CN109614386B (en) | Data processing method, device, server and computer readable storage medium | |
CN117149378A (en) | Task scheduling method, device, equipment and media for smart car operating system | |
Mustafee et al. | Hybrid Models with Real-Time Data in Healthcare: A Focus on Data Synchronization and Experimentation | |
CN115079881A (en) | Virtual reality-based picture correction method and system | |
CN116483558A (en) | Resource management method, system, device, processor and electronic equipment | |
CN116360960A (en) | Memory allocation method and memory allocation device based on many-core chip | |
CN113971074A (en) | Transaction processing method, apparatus, electronic device, and computer-readable storage medium | |
CN118519734A (en) | Job processing method and device, storage medium and electronic equipment | |
US20200110642A1 (en) | Funnel locking for normal rcu grace period requests | |
CN105573920A (en) | Storage space management method and device | |
CN111078449A (en) | Information processing method, information processing device, and terminal device | |
CN114546623B (en) | Task scheduling method and system based on big data system | |
CN111159353B (en) | System and method for constructing intelligent reporting robot based on multidimensional data | |
US12430068B2 (en) | Managing provenance information for data processing pipelines | |
CN112084297B (en) | Data processing method, device, electronic equipment and storage medium | |
CN118096386A (en) | Asset management method, equipment and medium based on Lambda architecture and Kappa architecture | |
CN117911153A (en) | Service data processing method, device and equipment based on attribute change | |
CN117314087A (en) | Technical support resource selection method, device, equipment and medium | |
CN106570161A (en) | Data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||