CN116775646B

CN116775646B - Database data management method, device, computer equipment and storage medium

Info

Publication number: CN116775646B
Application number: CN202310586811.3A
Authority: CN
Inventors: 孙辽东
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2023-05-23
Filing date: 2023-05-23
Publication date: 2025-09-26
Anticipated expiration: 2043-05-23
Also published as: CN116775646A

Abstract

The present invention relates to the field of database technology, and discloses a database data management method, device, computer equipment, and storage medium, including: obtaining time information of data to be processed; determining a storage area corresponding to the data to be processed based on the time information; processing the data to be processed based on different storage areas, and determining a processing result. The method divides the data to be processed according to the time information, stores the divided data to be processed in different storage areas, and then processes the data to be processed accordingly based on the storage areas. The specific processing method can be determined according to the actual application scenario. For scenarios such as writing and reading database data, the method can avoid memory overflow caused by writing or loading a large amount of data at one time, effectively improving the stability of the database, reducing the resource usage of the database, and improving the efficiency of data management.

Description

Database data management method, device, computer equipment and storage medium

Technical Field

The present invention relates to the field of database technologies, and in particular, to a method and apparatus for managing database data, a computer device, and a storage medium.

Background

A server is a type of computer that runs faster and handles heavier loads than an ordinary computer, and it can provide computing or application services to other clients in the network. A server offers high-speed operation, long-term reliable operation, strong external I/O throughput, and good expandability. Monitoring a server helps to improve it and to discover its faults in time. Monitoring system resources such as CPU utilization, memory consumption, and CPU temperature during server performance monitoring can help identify performance-related problems with the server.

During performance monitoring of a server, the collected performance data is stored in a database, and the data in the database is displayed through a front-end page, or the corresponding data is displayed in response to a user query. In general, performance monitoring is performed on all servers in a server cluster, and because of the large number of server nodes, the database may run into memory overflow during large-scale data acquisition, data writing, and data query.

Disclosure of Invention

In view of this, the embodiments of the present invention provide a method, an apparatus, a computer device, and a storage medium for managing data of a database, so as to solve the problem of memory overflow of the database.

In a first aspect, an embodiment of the present invention provides a method for managing data in a database, where the method includes:

acquiring time information of data to be processed;

determining a storage area corresponding to the data to be processed based on the time information;

and processing the data to be processed based on different storage areas, and determining a processing result.

In the data management method of the database described above, time information of the data to be processed is acquired, a storage area corresponding to the data to be processed is determined according to the time information, and the data to be processed in the different storage areas is then processed. The method divides the data to be processed according to the time information, stores the divided data in different storage areas, and processes the data accordingly based on those storage areas; the specific processing mode can be determined according to the actual application scenario. The method can avoid memory overflow caused by writing or loading a large amount of data at one time, which effectively improves the stability of the database, reduces its resource usage, and improves the efficiency of data management.

In some optional embodiments, if the data to be processed is data to be written, the time information includes a collection time of the data to be written, the storage area includes a target database and a disk file, and the determining, based on the time information, a storage area corresponding to the data to be processed includes:

calculating the time difference between the acquisition time of the data to be written and the current time;

comparing the time difference with a preset disk-flush threshold to obtain a judging result;

and dividing the data to be written into first data to be written and second data to be written based on the judging result, wherein the storage area corresponding to the first data to be written is the target database, and the storage area corresponding to the second data to be written is the disk file.

In some optional embodiments, the processing the data to be processed based on different storage areas, and determining a processing result includes:

writing the first data to be written into the target database;

and writing the second data to be written into the disk file.

In some optional embodiments, the processing the data to be processed based on different storage areas, determining a processing result, further includes:

acquiring the second data to be written from the disk file, and storing the second data to be written into the target database.

In some optional embodiments, if the data to be processed is data to be read, the time information includes a collection time and a queried time of the data to be read, and determining, based on the time information, a storage area corresponding to the data to be processed includes:

storing index information corresponding to the data to be read, the acquisition time of which is smaller than a preset acquisition time threshold value, into a first reading area;

moving, from the first reading area into a second reading area, the index information corresponding to the data to be read whose queried time is lower than a preset query time threshold, and storing, in a third reading area, the data to be read corresponding to the index information held in the first reading area and in the second reading area.

Acquiring a query request;

Judging whether a target index corresponding to the query request is stored in the first reading area or not based on the query request;

And when the target index corresponding to the query request is stored in the first reading area, acquiring target data to be read corresponding to the target index from the third reading area.

When the target index corresponding to the query request is not stored in the first reading area, judging whether the target index corresponding to the query request is stored in the second reading area;

and when the target index corresponding to the query request is stored in the second reading area, acquiring target data to be read corresponding to the target index from the third reading area.

In a second aspect, an embodiment of the present invention provides a data management apparatus for a database, the apparatus including:

the information acquisition module is used for acquiring time information of the data to be processed;

the area determining module is used for determining a storage area corresponding to the data to be processed based on the time information;

And the data processing module is used for processing the data to be processed based on different storage areas and determining a processing result.

In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor, where the memory and the processor are communicatively connected to each other, and the memory stores computer instructions, and the processor executes the computer instructions, thereby executing the data management method of the database according to the first aspect or any implementation manner corresponding to the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, where computer instructions are stored on the computer readable storage medium, where the computer instructions are configured to cause a computer to perform a method for managing data in a database according to the first aspect or any one of the embodiments corresponding to the first aspect.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flow diagram of a method of data management of a database according to some embodiments of the invention;

FIG. 2 is a flow chart of a method of data management of a database according to some embodiments of the invention;

FIG. 3 is a schematic diagram of a cache queue according to some embodiments of the invention;

FIG. 4 is a flow chart of a method of data management of a database according to some embodiments of the invention;

FIG. 5 is a flow diagram of a method of data management of a database according to some embodiments of the invention;

FIG. 6 is a schematic diagram of a memory region according to some embodiments of the invention;

FIG. 7 is a schematic diagram of a data query process according to some embodiments of the invention;

FIG. 8 is a schematic diagram of a data writing process according to some embodiments of the invention;

FIG. 9 is a block diagram of a data management apparatus of a database according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Monitoring the performance of a server helps to identify its performance problems and to repair the server in time. In the related art, server performance can be collected and displayed by monitoring software. Because server clusters are generally monitored at large scale, the database may run into memory overflow when data is written at second-level intervals and when index data changes frequently.

Based on the above, the embodiment of the invention provides a data management method of a database, so that the memory occupation of the database is optimized, and meanwhile, the data query performance is improved.

According to an embodiment of the present invention, there is provided a data management method embodiment of a database, it being noted that the steps shown in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.

In this embodiment, a method for managing data of a database is provided, and fig. 1 is a flowchart of a method for managing data of a database according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:

Step S11, acquiring time information of the data to be processed.

This method embodiment can be used in server performance monitoring software, which collects server performance data, stores it in a database, and displays the collected data through a front-end page. When a user needs to query server performance, the data can be filtered by entering keywords, a time, and the like, and the corresponding performance data is displayed. Monitoring is generally performed on a server cluster, which contains multiple server devices. A collection tool is installed on each server device and collects the server's performance data periodically according to a set period. The performance data to collect and the period can be set according to actual requirements; for example, collection may run once every 5 seconds and gather CPU utilization, CPU temperature, GPU thread utilization, disk file read/write performance, and so on. After collecting the performance data, the collection tool sends a data writing request to the database side, which receives the request and stores the performance data.

The data to be processed refers to server performance data. Because the embodiments of the invention cover scenarios such as writing and querying database data, the data to be processed can include data that must be stored in the database after the collection tool collects it, data that a user needs to query and read from the database, and so on. The location of the data to be processed differs between scenarios.

The time information may include a collection time of server performance data, a latest queried time of data stored in the database, a queried frequency of data stored in the database, and the like. For the data to be processed in different scenes, the specific time information corresponding to the data is different, for example, for the data which is not stored in the database, the time information comprises the acquisition time of the performance data.

Step S12, based on the time information, a storage area corresponding to the data to be processed is determined.

Before the data to be processed is processed, it must be placed in different storage areas. The data to be processed in each scenario is divided according to its time information: specifically, a time threshold can be set, the time information is compared with the threshold, the data is divided accordingly, and each resulting batch is stored in its corresponding storage area.

The storage area can take the form of a queue or a table; its specific form is not limited and can be chosen according to the actual scenario.

Step S13, processing the data to be processed based on different storage areas, and determining a processing result.

Different processing operations are performed on the data to be processed placed in the different storage areas, and the specific processing mode differs between scenarios. When the application scenario is writing data, the final processing result is that the data to be processed is written into the database; when the application scenario is querying data, the final processing result is that data is read from the database according to the query request.

Take writing data into a database as an example. The data to be processed is data that must be stored in the database, and collected data may often need to be stored in large batches. To avoid database memory overflow caused by storing a large amount of data at once, this scheme uses the time information of the data to be processed, which here includes the collection time: data that has gone unprocessed for a long time is not stored in the database directly but is placed in another area, while data with a more recent collection time is stored in the database. This limits the number and size of the records written to the database at one time and so avoids memory overflow caused by writing a large amount of data.

The method in this scheme can be implemented as a plug-in. Different application scenarios can correspond to different plug-ins, or the functions can be integrated into a single plug-in. Integrating the functions into the server performance monitoring software as a plug-in does not intrude on the software's original services and is not tied to a specific database product, which can reduce maintenance cost and technical risk. The database software involved is not limited; examples include InfluxDB, kdb+, and Graphite.

In this embodiment, a method for managing data of a database is provided, fig. 2 is a flowchart of a method for managing data of a database according to an embodiment of the present invention, as shown in fig. 2, if data to be processed is data to be written, time information includes collection time of the data to be written, and a storage area includes a target database and a disk file, where the flowchart includes the following steps:

Step S21, acquiring time information of the data to be processed.

For details, refer to step S11 in the embodiment shown in FIG. 1; they are not repeated here.

Step S22, calculating the time difference between the acquisition time of the data to be written and the current time.

This embodiment applies to the scenario of writing data into a database, where the data to be processed is the data to be written. After the performance collection tool on a server collects the data, it sends a data writing request containing the data to be written to the database. The server receives the data writing request and obtains the data to be written from it.

Before step S22 is executed, the data to be written may be stored in a cache queue. Specifically, the data may be written into the cache queue using two nested hash tables, as shown in FIG. 3 and described below:

1. Hash table 1 (HashTable1) uses an array plus linked list structure. The key is the name of the node to be collected, and the value is the nested hash table. The size of HashTable1 is initialized to the number of nodes, where a node is a server and the number of nodes is the number of servers in the server cluster. The table must also support capacity expansion. In this embodiment, expansion is performed when the number of used slots >= 0.75 × the initial value, and the expanded size is (2 × Old size + 1), where Old size refers to the number of nodes when the previous data writing operation was performed.

Addressing works as follows: the position of the linked list in the array is found from the hash value of nodeName, that is, hash(nodeName) % threadNum, and the final HashEntry is then found by searching the list.

2. Hash table 2 (HashTable2) is similar to HashTable1, except that its initial size is the number of collection items and its addressing is hash(collectName) % collectNum.
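The expansion rule and the addressing rule described for the two hash tables can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, `zlib.crc32` is an assumed stand-in for the hash function (which the text does not specify), and the 0.75 ratio is applied here to the current capacity.

```python
import zlib

LOAD_FACTOR = 0.75  # expansion ratio stated in the text


def stable_hash(name: str) -> int:
    """Deterministic stand-in for the unspecified hash function (assumption)."""
    return zlib.crc32(name.encode("utf-8"))


def bucket_index(name: str, bucket_count: int) -> int:
    """Addressing rule: hash(name) % bucket_count."""
    return stable_hash(name) % bucket_count


def needs_expansion(used: int, capacity: int) -> bool:
    """Expand once the used slots reach 0.75 of the capacity."""
    return used >= LOAD_FACTOR * capacity


def expanded_size(old_size: int) -> int:
    """Size after expansion: 2 * old size + 1."""
    return 2 * old_size + 1
```

For hash table 1 the divisor passed as `bucket_count` would be threadNum, and for hash table 2 it would be collectNum, following the addressing formulas in the text.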

The actual data stored in hash table 2 is compressed binary data containing the following fields: MetricID; Data (the actual collected data, such as CPU utilization and memory utilization); Lock (a lock that is set while the data is being flushed to disk or written to InfluxDB, which prevents dirty data from being written); and Status (which indicates whether the entry can be recycled, with values 0: draft, 1: disk flush, 2: write to InfluxDB, 3: InfluxDB write succeeded, 4: InfluxDB write failed, 5: disk flush succeeded, 6: disk flush failed).

MetricID is the identification number of the data and makes each record unique; it can be composed as time + nodeName + collectName + DeviceId. Lock and Status may change according to the state of the data before it is written to the database.

The double hash table is used as follows: hash table 1 records the node name and the name of the data to be written, and these are used to locate hash table 2, from which the specific data corresponding to that node name and data name is obtained.

The following illustrates the process of storing data to be written into a cache queue:

First, the initial configuration of the cache queue is completed according to the cluster scale. It includes the queue length, the threshold time for writing data into a disk file, the number of concurrent threads, and the number of collection items. Collection items are the server attributes that need to be collected, such as temperature and utilization. The name of a collection item is the name of the data to be written.

The format of the data to be written is { nodeName=node1, collectName=CPU, cpuTemp=80 ℃, idle=80% }, where cpuTemp and idle are collection item names;

HashTable1 is retrieved according to nodeName, and nodeName=node1, collectName=CPU is written into the HashEntry of HashTable1;

HashTable2 is retrieved according to nodeName + collectName, and the following is written into the HashEntry of HashTable2:

MetricID = 2023010123959_node1_cpu_cpu0, data = { cpuTemp = 80 ℃, idle = 80% }, lock = unlock, status = 0.
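The storing procedure above can be illustrated with a minimal Python sketch that uses built-in dictionaries in place of the array plus linked list tables. The class and function names, the underscore separator and lowercase collection name in the MetricID (both inferred from the example above), and the device identifier `cpu0` are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class HashEntry:
    metric_id: str
    data: dict
    lock: bool = False  # False corresponds to "unlock"
    status: int = 0     # 0: draft


def make_metric_id(time: str, node_name: str, collect_name: str, device_id: str) -> str:
    """Compose time + nodeName + collectName + DeviceId (separator is assumed)."""
    return "_".join([time, node_name, collect_name.lower(), device_id])


class CacheQueue:
    """Outer dict plays the role of hash table 1, inner dicts of hash table 2."""

    def __init__(self):
        self.table1 = {}  # nodeName -> {collectName -> [HashEntry, ...]}

    def put(self, time, node_name, collect_name, device_id, data):
        table2 = self.table1.setdefault(node_name, {})
        entry = HashEntry(
            make_metric_id(time, node_name, collect_name, device_id), dict(data)
        )
        table2.setdefault(collect_name, []).append(entry)
        return entry

    def get(self, node_name, collect_name):
        """Locate hash table 2 by node name, then the entries by data name."""
        return self.table1.get(node_name, {}).get(collect_name, [])
```

A call such as `CacheQueue().put("2023010123959", "node1", "CPU", "cpu0", {"cpuTemp": "80", "idle": "80%"})` would create an entry with status 0 and the lock released, mirroring the example record.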

After the data to be written is stored in the cache queue, storage of the data is completed by monitoring the cache queue. The data to be written is read concurrently from the cache queue, and the time difference between its acquisition time and the current time is calculated; the acquisition time is carried in the data to be written.

Step S23, comparing the time difference with the preset disk-flush threshold to obtain a judging result.

The preset disk-flush threshold is the threshold time for writing data into a disk file and is set according to actual conditions. The time difference is compared with the preset disk-flush threshold to obtain a judging result, which is either that the time difference is not greater than the threshold or that the time difference is greater than the threshold.

Step S24, dividing the data to be written into first data to be written and second data to be written based on the judging result, wherein the storage area corresponding to the first data to be written is the target database, and the storage area corresponding to the second data to be written is the disk file.

When the judging result is that the time difference is greater than the preset disk-flush threshold, the data to be written is stale, that is, it has gone unprocessed for a long time after being collected. Such data is determined to be second data to be written, and its corresponding storage area is the disk file.

When the judging result is that the time difference is not greater than the preset disk-flush threshold, the data is determined to be first data to be written, and its corresponding storage area is the target database, that is, the database in which the data is to be stored.
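Steps S22 to S24 amount to a single comparison per record. The following Python sketch shows one way to express that division; the function name, the `collected_at` field, and the use of plain numeric timestamps in seconds are assumptions for illustration.

```python
def split_by_age(records, now, flush_threshold):
    """Divide records into (first, second) data to be written.

    first  -> time difference not greater than the threshold (target database)
    second -> time difference greater than the threshold (disk file)
    """
    first, second = [], []
    for record in records:
        time_difference = now - record["collected_at"]
        if time_difference > flush_threshold:
            second.append(record)
        else:
            first.append(record)
    return first, second
```

A record whose time difference exactly equals the threshold falls into the first group, matching the "not greater than" wording of the judging result.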

Step S25, processing the data to be processed based on different storage areas, and determining a processing result.

After the data to be written is divided and the storage area corresponding to each piece of data is determined, each piece of data is written into its corresponding storage area.

Specifically, step S25 includes the steps of:

S251, writing the first data to be written into the target database;

S252, writing the second data to be written into the disk file.

In some alternative embodiments, step S25 further comprises obtaining second data to be written from the disk file, and storing the second data to be written to the target database.

The second data to be written stored in the disk file can be replayed: it is read concurrently from the disk file and written directly into the target database. If writing to the target database fails because of a timeout, a network error, or a similar problem, the data that failed to be written is written back into the disk file.

The complete process is described below with reference to FIG. 4, taking InfluxDB as the target database. The data to be written in the write cache queue is read concurrently and is handled by a direct write queue and a data playback queue. The direct write queue decides whether the data to be written should be flushed to disk by comparing the time difference with the preset disk-flush threshold: if not, the first data to be written is written into InfluxDB; if so, the second data to be written is written into the disk file. The data playback queue concurrently reads the second data to be written from the disk file and writes it into InfluxDB. If writing to InfluxDB fails, the data is stored in the disk file.
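The direct write path and the playback path can be sketched in Python as follows. This is a single-threaded simplification under stated assumptions: the target database is represented by a caller-supplied `db_write` callable rather than a real InfluxDB client, the disk file is a JSON Lines file, and all function and field names are hypothetical.

```python
import json
import os


def append_to_disk(disk_path, record):
    """Append one record to the disk file as a JSON line."""
    with open(disk_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def write_batch(records, now, flush_threshold, db_write, disk_path):
    """Direct write path: fresh records go to the database, stale ones to disk."""
    for record in records:
        if now - record["collected_at"] > flush_threshold:
            append_to_disk(disk_path, record)
            continue
        try:
            db_write(record)
        except Exception:
            # A failed database write falls back to the disk file.
            append_to_disk(disk_path, record)


def replay(disk_path, db_write):
    """Playback path: move records from the disk file into the database."""
    if not os.path.exists(disk_path):
        return 0
    with open(disk_path, encoding="utf-8") as f:
        pending = [json.loads(line) for line in f if line.strip()]
    written, failed = 0, []
    for record in pending:
        try:
            db_write(record)
            written += 1
        except Exception:
            failed.append(record)
    # Keep only the records that still could not be written.
    with open(disk_path, "w", encoding="utf-8") as f:
        for record in failed:
            f.write(json.dumps(record) + "\n")
    return written
```

Passing the writer as a callable keeps the sketch independent of any particular database client; a real deployment would also need the concurrency and locking that the text describes.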

While the data in the cache queue is being written into the database, the status of the data in the hash table is changed according to its actual state. When data to be written has been stored in the database or the disk file, its status in the cache queue is changed to "InfluxDB write succeeded" or "disk flush succeeded" and the data is deleted from the cache queue, which avoids keeping too much data in the cache queue and reduces memory usage. After data is taken from the cache queue and before it is stored in the database or the disk file, the data can be locked; once locked, its status can only change in one direction.
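The locking and one-way status change can be pictured as a small state machine. The Python sketch below uses the seven status codes listed for hash table 2; the exact set of allowed transitions is an assumption, since the text only states that the change is one-directional, and the class and function names are hypothetical.

```python
from enum import IntEnum


class Status(IntEnum):
    DRAFT = 0
    FLUSHING = 1      # being flushed to the disk file
    WRITING = 2       # being written to the database
    WRITE_OK = 3
    WRITE_FAILED = 4
    FLUSH_OK = 5
    FLUSH_FAILED = 6


# Assumed one-way transitions; no status may move back to an earlier one.
ALLOWED = {
    Status.DRAFT: {Status.FLUSHING, Status.WRITING},
    Status.FLUSHING: {Status.FLUSH_OK, Status.FLUSH_FAILED},
    Status.WRITING: {Status.WRITE_OK, Status.WRITE_FAILED},
}
DONE = {Status.WRITE_OK, Status.FLUSH_OK}


class Entry:
    def __init__(self, metric_id):
        self.metric_id = metric_id
        self.status = Status.DRAFT
        self.locked = False

    def lock(self):
        self.locked = True

    def advance(self, new_status):
        """Change status only while locked and only along an allowed edge."""
        if not self.locked:
            raise RuntimeError("entry must be locked before its status changes")
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(
                f"illegal transition {self.status.name} -> {new_status.name}"
            )
        self.status = new_status


def purge_finished(entries):
    """Drop entries that were stored successfully, freeing cache memory."""
    return [e for e in entries if e.status not in DONE]
```

Entries in a failed state are kept by `purge_finished` in this sketch so that they remain available for a later retry; how failures are retried is not specified in the text.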

In the data management method of the database provided by this embodiment, the time difference between the acquisition time of the data to be written and the current time is calculated and compared with the preset disk-flush threshold, and the data to be written is divided according to the judging result. The first data to be written is written into the target database, the second data to be written is written into the disk file, and the second data to be written is later read from the disk file and stored into the target database. By dividing the data to be written, the method limits the amount of data written to the target database at one time, so the data can be stored in batches, memory overflow of the target database caused by writing a large amount of data is avoided, and memory usage is optimized.

In this embodiment, a method for managing data in a database is provided, fig. 5 is a flowchart of a method for managing data in a database according to an embodiment of the present invention, and as shown in fig. 5, if data to be processed is data to be read, time information includes acquisition time and queried time of the data to be read, where the flowchart includes the following steps:

Step S31, time information of the data to be processed is acquired.

Step S32, storing index information corresponding to the data to be read, the acquisition time of which is smaller than a preset acquisition time threshold value, in a first reading area.

The embodiment is applied to a scene of reading data in a database, the data to be processed is the data to be read, and the data to be read is stored in a target database. The time information of the data to be read comprises acquisition time and queried time, wherein the queried time comprises each query time, and the queried time can determine the query frequency of the data to be read.

The preset acquisition time threshold is expressed in days. Taking 30 days as an example, after the target database is started, the data to be read that was collected within the last 30 days is loaded and its index information is stored in the first reading area. The data to be read includes the performance data of each server; for example, if the data is a CPU temperature of 50 ℃, the corresponding index information is the CPU temperature and the name of the server node. Index information is used to query the data to be read and can take the form of an identification number, a node name, and so on.

Step S33, moving, from the first reading area into a second reading area, the index information corresponding to the data to be read whose queried time is lower than the preset query time threshold, and storing, in a third reading area, the data to be read corresponding to the index information held in the first reading area and in the second reading area.

The preset query time threshold refers to the maximum queried time; data that exceeds the preset query time threshold has not been queried for a long time. Data whose queried time is lower than the preset query time threshold is the first data to be read, and the index information corresponding to the first data to be read is stored in the second reading area.

That is, the data to be read in the first read area is screened, and the index information of the data that is not frequently queried is transferred to the second read area.

In addition, data in the first read region may also be screened using a data screening algorithm, such as the least frequently used algorithm.

Index information of data to be read is stored in the first reading area and the second reading area, and data to be read corresponding to all the index information is stored in the third reading area. The index information is used for inquiring the data to be read in the third reading area.

As shown in FIG. 6, the first reading area is the direct memory, the second reading area is the replacement memory, and the third reading area is the replacement data. The two cache queues store frequently used index information, and the replacement data stores the data corresponding to the two cache queues, which optimizes memory usage.

After the database service is started, index information in the database is loaded into the direct memory in time order according to the preset acquisition time threshold, and index information in the direct memory is migrated into the replacement memory using the least frequently used algorithm. The data corresponding to the index information in the direct memory and in the replacement memory is stored in the replacement data. Index information stored in the replacement memory may be evicted, that is, deleted, and the corresponding data stored in the replacement data is deleted at the same time.
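The three reading areas can be sketched in Python with two sets of index keys and one dictionary of data. This is an illustrative simplification: a fixed cutoff on the last queried timestamp is swapped in for the least frequently used algorithm, and the function names, field names, and numeric timestamps are all assumptions.

```python
def build_read_areas(rows, now, max_age, query_cutoff):
    """Return (direct_memory, replacement_memory, replacement_data).

    rows: dicts with "index", "collected_at", "last_queried_at", "value".
    Only rows collected within max_age are loaded. Index keys whose last
    query is older than query_cutoff go to the replacement memory.
    """
    direct_memory, replacement_memory, replacement_data = set(), set(), {}
    for row in rows:
        if now - row["collected_at"] >= max_age:
            continue  # outside the acquisition window, not loaded
        key = row["index"]
        replacement_data[key] = row["value"]
        if row["last_queried_at"] < query_cutoff:
            replacement_memory.add(key)
        else:
            direct_memory.add(key)
    return direct_memory, replacement_memory, replacement_data


def evict(replacement_memory, replacement_data, key):
    """Delete an index key and its data together."""
    replacement_memory.discard(key)
    replacement_data.pop(key, None)
```

Keeping only index keys in the two memories and a single copy of the data in the third area means a record is never duplicated when its index moves between the direct memory and the replacement memory.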

The default size of the cache queues may be defined as 1m × 10 × 2, and the default size of the replacement data is 16m × 20.

Step S34, processing the data to be processed based on different storage areas, and determining a processing result.

Specifically, step S34 includes the steps of:

step S341, obtaining a query request.

The user can send out a query request on the front-end page, and the query request can comprise index information, namely a target index, such as the name of the server node, the name of the data acquisition item, the acquisition time and the like.

In step S342, it is determined whether the target index corresponding to the query request is stored in the first read area based on the query request.

The first read area is queried according to the index information in the query request to determine whether the target index contained in the query request is stored there.

In step S343, when the target index corresponding to the query request is stored in the first reading area, the target data to be read corresponding to the target index is obtained from the third reading area.

When the target index is stored in the first reading area, acquiring data corresponding to the target index, namely target data to be read, from the third reading area.

In some alternative embodiments, step S34 further comprises the steps of:

Step S344, when the target index corresponding to the query request is not stored in the first read area, determining whether the target index corresponding to the query request is stored in the second read area.

If the target index is not stored in the first read area, the second read area is searched to determine whether the target index is stored there.

Step S345, when the target index corresponding to the query request is stored in the second read area, obtaining the target data to be read corresponding to the target index from the third read area.

When the target index is stored in the second read area, the data corresponding to the target index, namely the target data to be read, is acquired from the third read area.

The data management method of the database provided by this embodiment of the invention divides the data to be read according to the time information and stores its index information in the first read area and the second read area respectively. When a query is processed, the first read area is searched first, and the second read area is searched only if the target index is not found there. The dual cache queue shortens data queries and improves query efficiency, and because not all data has to be loaded, memory usage is optimized.

In the query process of this embodiment, as shown in fig. 7, the target database is InfluxDB, the first read area is the direct memory, the second read area is the replacement memory, and the third read area is the replacement data. TSM (Time-Structured Merge Tree) is the storage engine of InfluxDB, and InfluxDB uses it to store all data. When the database starts, it loads the TSM files and reads them into memory for subsequent query operations. Data retrieval proceeds according to the query request. The direct memory is queried first. When the index information is stored in the direct memory, it is determined whether the data corresponding to the index information is stored in the replacement data: if it is, the query is complete; if it is not, the TSM is searched and the data retrieved from the TSM is stored in the replacement data. If the index information is not found in the direct memory, the replacement memory is searched. If the index information exists in the replacement memory, it is likewise determined whether the corresponding data is stored in the replacement data. If the index information does not exist in the replacement memory either, the TSM is searched, the retrieved index information is written into the direct memory, and the data corresponding to the index information is written into the replacement data.
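The lookup order described for fig. 7 can be sketched as a single function. This is an illustrative sketch only: `query`, its parameters, and the `read_tsm` callable standing in for a TSM lookup are hypothetical names, and the sketch assumes that a hit in the replacement memory is handled the same way as a hit in the direct memory.

```python
def query(index, direct, replacement, data, read_tsm):
    """Return the data for `index`, filling the caches on a miss.

    direct / replacement: sets of index keys (first / second read area).
    data: dict mapping index -> data (third read area).
    read_tsm: hypothetical callable standing in for a lookup in the TSM.
    """
    if index in direct or index in replacement:
        if index not in data:
            # The index is cached but its data is not: fetch it and cache it.
            data[index] = read_tsm(index)
        return data[index]
    # The index is in neither queue: read from the TSM, then cache the index
    # in the direct memory and the data in the replacement data.
    value = read_tsm(index)
    direct.add(index)
    data[index] = value
    return value
```

Because `direct` is checked before `replacement`, a query for a recently collected item is resolved without touching the second queue or the TSM.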

Taking InfluxDB as the target database, an embodiment of the data management method of the database is provided below. The specific application scenarios are writing data to the database and querying (reading) data, and they involve two cases: frequent changes to index data and second-level (per-second) data writing.

InfluxDB is an open-source distributed time series, event, and metrics database, written in the Go language, with no external dependencies. Its design goal is distributed operation and horizontal scaling, and it is a core product of InfluxData. In the artificial intelligence development platform, it is used to store monitoring data and report data. InfluxDB loads the following information into memory when the service is started:

(1) Metadata: when InfluxDB starts, it reads the metadata stored in the metadata storage area, including information such as databases, data preprocessors, and continuous queries. This metadata store is typically kept on disk so that InfluxDB can maintain its state after a reboot.

(2) WAL log (write-ahead log): when InfluxDB starts, it reads the WAL log, including write operations and query operations. The WAL log is typically used to persist data and ensure data consistency.

(3) TSM files (Time-Structured Merge Tree storage engine): InfluxDB uses the TSM storage engine to store all data. When InfluxDB starts, it loads the TSM files and reads them into memory for subsequent query operations.

InfluxDB uses an LSM-tree-based storage engine and a memory caching mechanism to optimize write performance. However, InfluxDB may run into memory overflow under high write load, large-scale data acquisition, or insufficient server resources.

The two cases involved in this embodiment, frequent changes to index data and second-level data writing, are sized as follows:

(1) Index data frequent change scenario: 400 physical nodes, each running 256 tasks (each node has 256 logical cores, and each task uses 1 core). Each task runs for 2 hours and has 3 tag values (task name, task category, and the node where the task runs), each of 20 characters. About 400 × 256 × 24/2 × 3 × 20 / 1024 / 1024 ≈ 70 MB of in-memory data is therefore added every day (about 25 GB per year).

(2) Second-level data writing scenario: 400 physical nodes, each with 200+ acquisition items, so 400 × 200 = 80,000 data points are written every second. At an average of 0.4 KB per data point, 400 × 200 × 0.4 KB / 1024 ≈ 31 MB is written every second.
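The two estimates can be reproduced with a few lines of arithmetic. The variable names are illustrative, and the sketch assumes one byte per character and 1024-based units, as the divisions by 1024 in the text imply.

```python
MIB = 1024 * 1024

# Scenario (1): tag data added per day by short-lived tasks.
nodes, tasks_per_node = 400, 256
runs_per_day = 24 // 2            # each task runs for 2 hours
tags, chars_per_tag = 3, 20       # assumption: 1 byte per character
index_bytes_per_day = nodes * tasks_per_node * runs_per_day * tags * chars_per_tag
index_mib_per_day = index_bytes_per_day / MIB
index_gib_per_year = index_mib_per_day * 365 / 1024

# Scenario (2): per-second write volume.
items_per_node, kib_per_point = 200, 0.4
points_per_second = nodes * items_per_node
write_mib_per_second = points_per_second * kib_per_point / 1024

print(f"{index_mib_per_day:.1f} MiB/day, {index_gib_per_year:.1f} GiB/year")
print(f"{points_per_second} points/s, {write_mib_per_second:.2f} MiB/s")
```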

The method provides a data writing component WRITECACHE and a data reading component READCACHE for a database.

The data writing component comprises write request handling, a data playback listener, and configuration items. It receives write requests, temporarily writes data to be written into a disk file according to the memory limit, and then completes data storage in batches through a background task. The specific process is shown in fig. 8:

The written data is the data to be written. After the data to be written is received, it is stored in a cache queue; part of it is written into InfluxDB and part of it is dropped to disk (written into a disk file). If a write to InfluxDB fails, the data is marked as having failed to write to InfluxDB; if it succeeds, the data is removed from the cache queue. Likewise, if dropping data to disk fails, the data is marked as having failed to drop to disk; if it succeeds, the data is removed from the cache queue. Data written to the disk file can be read and written into InfluxDB concurrently. Deploying WRITECACHE includes defining the sizes of the configuration items according to the cluster size, initializing the nested HashTable according to the configuration items, and starting the data playback listener.
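The write path of fig. 8 can be sketched as follows. This is a simplified illustration under stated assumptions, not the WRITECACHE implementation: the function names, the `collected_at` field, the JSON-lines spool file, and the `write_db` callable are hypothetical, and the sketch assumes that data whose time difference is within the threshold goes directly to the database while older data is dropped to disk.

```python
import json
import time
from pathlib import Path


def route_write(point, now, drop_threshold):
    """Choose a storage area from the age of the point's collection time."""
    age = now - point["collected_at"]
    # Assumption: points within the threshold go straight to the database.
    return "database" if age <= drop_threshold else "disk"


def write_batch(points, write_db, spool: Path, drop_threshold, now=None):
    """Write each point to its storage area; return the points that failed."""
    now = time.time() if now is None else now
    failed = []
    with spool.open("a", encoding="utf-8") as fh:
        for point in points:
            try:
                if route_write(point, now, drop_threshold) == "database":
                    write_db(point)
                else:
                    fh.write(json.dumps(point) + "\n")
            except Exception:
                failed.append(point)  # marked as failed and kept for a retry
    return failed


def replay(spool: Path, write_db):
    """Background step: move spooled points from the disk file into the database."""
    if not spool.exists():
        return 0
    lines = spool.read_text(encoding="utf-8").splitlines()
    for line in lines:
        write_db(json.loads(line))
    spool.unlink()
    return len(lines)
```

In this sketch `replay` plays the role of the data playback listener's background task; a production version would also need to handle a failure partway through the replay.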

The data reading component receives read requests and changes InfluxDB's own strategy of loading all Index (index information) at startup into loading on demand. A dual cache queue stores the commonly used Index, and a buffer cache (the third read area) stores temporarily replaced and frequently accessed data, so that memory occupation is optimized and data query performance is improved at the same time.

The dual cache queue consists of the first read area and the second read area. Deploying READCACHE includes initializing the dual Index cache queue, initializing the data buffer cache, and loading the Index of the last 30 days (the default) into the Index cache queue.

The component adds a cache queue in front of the TSM in InfluxDB and loads Index data according to different strategies, replacing InfluxDB's full load, which causes memory overflow. It also optimizes the data writing logic through a slicing technique, which reduces memory occupation and the loss of written data.

The embodiment provides a method for solving the memory overflow of an InfluxDB single node under massive data, which not only can solve the memory overflow problem of the InfluxDB, but also can improve the query performance through a double Index queue and a data Buffer queue. The method is integrated into the product in a plug-in mode, does not invade the original service, and reduces the influence on the InfluxDB. Based on the scheme, the stability of InfluxDB is improved, the occupation of InfluxDB resources is reduced, the maintenance cost and the technical risk are reduced, and the competitiveness of the AI platform in similar products is improved.

This embodiment also provides a data management device of a database. The device is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

The present embodiment provides a data management apparatus for a database, as shown in fig. 9, including:

An information acquisition module 81 for acquiring time information of data to be processed;

the area determining module 82 is configured to determine a storage area corresponding to the data to be processed based on the time information;

the data processing module 83 is configured to process the data to be processed based on different storage areas, and determine a processing result.

In some optional embodiments, if the data to be processed is data to be written, the time information includes a collection time of the data to be written, the storage area includes a target database and a disk file, and the area determining module 82 includes:

The time calculation unit is used for calculating the time difference between the acquisition time of the data to be written and the current time;

The time judging unit is used for comparing the time difference with a preset disk write threshold to obtain a judging result;

The data writing unit is used for dividing the data to be written into first data to be written and second data to be written based on the judging result, a storage area corresponding to the first data to be written is the target database, and a storage area corresponding to the second data to be written is the disk file.

In some alternative embodiments, the data processing module 83 includes:

A first writing unit, configured to write the first data to be written into the target database;

And the second writing unit is used for writing the second data to be written into the disk file.

In some alternative embodiments, the data processing module 83 includes:

And the third writing unit is used for acquiring the second data to be written from the disk file and storing the second data to be written into the target database.

In some optional embodiments, if the data to be processed is data to be read, the time information includes an acquisition time and a queried time of the data to be read, and the area determining module includes:

The first storage unit is used for storing index information corresponding to the data to be read, the acquisition time of which is smaller than a preset acquisition time threshold value, into a first reading area;

The second storage unit is used for storing, into the second read area, index information corresponding to the data to be read in the first read area whose queried time is lower than a preset query time threshold, and for storing, into the third read area, the data to be read corresponding to the index information in the first read area and to the index information in the second read area.

In some alternative embodiments, the data processing module 83 includes:

a request acquisition unit for acquiring a query request;

A first index retrieval unit, configured to determine, based on the query request, whether a target index corresponding to the query request is stored in the first read area;

and the first data reading unit is used for acquiring, when the target index corresponding to the query request is stored in the first read area, target data to be read corresponding to the target index from the third read area.

In some alternative embodiments, the data processing module 83 includes:

A second index retrieval unit, configured to determine whether a target index corresponding to the query request is stored in the second read area when the target index corresponding to the query request is not stored in the first read area;

And the second data reading unit is used for acquiring, when the target index corresponding to the query request is stored in the second read area, target data to be read corresponding to the target index from the third read area.

The data management device of the database in this embodiment is presented in the form of functional units, where a unit refers to an ASIC (application-specific integrated circuit), a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the functionality described above.

Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.

The embodiment of the invention also provides a computer device having the data management device of the database shown in fig. 9.

Referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 10, the computer device includes one or more processors 10, a memory 20, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 10.

The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, generic array logic, or any combination thereof.

The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 implements the methods shown in the above embodiments.

The memory 20 may include a storage program area that may store an operating system, an application program required for at least one function, and a storage data area that may store data created from the use of a computer device according to the presentation of an applet landing page, etc. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk file storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, memory 20 may optionally include memory located remotely from processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The memory 20 may comprise volatile memory, such as random access memory, or nonvolatile memory, such as flash memory, hard disk or solid state disk, or the memory 20 may comprise a combination of the above types of memory.

The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.

The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored in a remote storage medium or a non-transitory machine-readable storage medium, downloaded through a network, and stored in a local storage medium, so that the method described herein can be carried out by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or special-purpose hardware. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random-access memory, a flash memory, a hard disk, a solid state disk, or the like; the storage medium may also include a combination of the above types of memories. It will be appreciated that a computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.

Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims

1. A method for managing data in a database, characterized in that the method comprises:

Obtaining time information of data to be processed;

Determining a storage area corresponding to the data to be processed based on the time information;

Processing the data to be processed based on the different storage areas to determine a processing result; the time information corresponding to the data to be processed in different scenarios is different, and the scenarios include writing and querying database data;

If the data to be processed is data to be written, the time information includes the acquisition time of the data to be written, and the storage area includes a target database and a disk file, determining the storage area corresponding to the data to be processed based on the time information includes: calculating a time difference between the acquisition time of the data to be written and the current time; comparing the time difference with a preset disk write threshold to obtain a determination result; and dividing the data to be written into first data to be written and second data to be written based on the determination result, the storage area corresponding to the first data to be written being the target database, and the storage area corresponding to the second data to be written being the disk file;

If the data to be processed is data to be read, the time information includes an acquisition time and a query time of the data to be read, and determining a storage area corresponding to the data to be processed based on the time information includes: storing index information corresponding to the data to be read whose acquisition time is less than a preset acquisition time threshold in a first read area; storing index information corresponding to the data to be read in the first read area whose query time is less than a preset query time threshold in a second read area, and storing the data to be read corresponding to the index information of the data to be read in the first read area and the index information of the data to be read in the second read area in a third read area.

2. The method according to claim 1, wherein processing the data to be processed based on the different storage areas and determining the processing results comprises:

Writing the first data to be written into the target database;

The second data to be written is written into the disk file.

3. The method according to claim 2, wherein the processing of the data to be processed based on the different storage areas and determining the processing results further comprises:

The second data to be written is obtained from the disk file, and the second data to be written is stored in the target database.

4. The method according to claim 1, wherein processing the data to be processed based on the different storage areas and determining the processing results comprises:

Obtaining a query request;

Determining, based on the query request, whether the first read area stores a target index corresponding to the query request;

When the target index corresponding to the query request is stored in the first read area, the target to-be-read data corresponding to the target index is acquired from the third read area.

5. The method according to claim 4, wherein the processing the data to be processed based on the different storage areas and determining the processing results further comprises:

When the target index corresponding to the query request is not stored in the first read area, determining whether the target index corresponding to the query request is stored in the second read area;

When the target index corresponding to the query request is stored in the second read area, the target to-be-read data corresponding to the target index is acquired from the third read area.

6. A data management device for a database, characterized in that the device comprises:

An information acquisition module is used to obtain time information of the data to be processed;

an area determination module, configured to determine a storage area corresponding to the data to be processed based on the time information;

A data processing module, configured to process the data to be processed based on the different storage areas and determine a processing result;

The time information corresponding to the data to be processed in different scenarios is different, and the scenarios include writing and querying database data;

If the data to be processed is data to be written, the time information includes the acquisition time of the data to be written, and the storage area includes a target database and a disk file, the area determination module includes: a time calculation unit for calculating the time difference between the acquisition time of the data to be written and the current time; a time judgment unit for comparing the time difference with a preset disk write threshold to obtain a judgment result; and a data writing unit for dividing the data to be written into first data to be written and second data to be written based on the judgment result, the storage area corresponding to the first data to be written being the target database, and the storage area corresponding to the second data to be written being the disk file;

If the data to be processed is data to be read, the time information includes the acquisition time and the query time of the data to be read, and the area determination module includes:

The first storage unit is used to store the index information corresponding to the to-be-read data whose acquisition time is less than the preset acquisition time threshold in the first reading area; the second storage unit is used to store the index information corresponding to the to-be-read data whose query time in the first reading area is less than the preset query time threshold in the second reading area, and the to-be-read data corresponding to the index information of the to-be-read data in the first reading area and the index information of the to-be-read data in the second reading area are stored in the third reading area.

7. A computer device, comprising:

A memory and a processor, wherein the memory and the processor are communicatively connected to each other, the memory stores computer instructions, and the processor executes the database data management method according to any one of claims 1 to 5 by executing the computer instructions.

8. A computer-readable storage medium, characterized in that computer instructions are stored on the computer-readable storage medium, and the computer instructions are used to enable a computer to execute the database data management method according to any one of claims 1 to 5.