CN116775646A - Database data management method, device, computer equipment and storage medium - Google Patents

Database data management method, device, computer equipment and storage medium

Info

Publication number
CN116775646A
CN116775646A
Authority
CN
China
Prior art keywords
data
written
database
processed
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310586811.3A
Other languages
Chinese (zh)
Inventor
孙辽东 (Sun Liaodong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202310586811.3A
Publication of CN116775646A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22: Indexing; Data structures therefor; Storage structures
    • G06F 16/2291: User-Defined Types; Storage management thereof
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3409: Recording or statistical evaluation of computer activity for performance assessment

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of databases and discloses a database data management method, apparatus, computer device, and storage medium. The method comprises the following steps: acquiring time information of data to be processed; determining, based on the time information, a storage area corresponding to the data to be processed; and processing the data to be processed according to the different storage areas and determining a processing result. The method divides the data to be processed according to its time information, stores the divided data in different storage areas, and processes it according to the area in which it is stored, so that the specific processing mode can be chosen for the actual application scenario. This avoids the memory overflow caused by writing or loading a large amount of data at once, effectively improves database stability, reduces the database's resource occupation, and improves the efficiency of data management.

Description

Database data management method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of database technologies, and in particular, to a method and apparatus for managing database data, a computer device, and a storage medium.
Background
A server is a computer that runs faster and carries a higher load than an ordinary computer, and can provide computing or application services to other clients on the network. A server offers high-speed computation, long-term reliable operation, strong I/O data throughput, and good scalability. Monitoring a server helps improve it and discover its faults in time. Monitoring system resources alongside server performance, for example CPU utilization, memory consumption, and CPU temperature, can help identify the server's performance-related problems.
During performance monitoring of a server, the collected performance data is stored in a database, and the data in the database is displayed on a front-end page or returned in response to user queries. Typically, performance monitoring covers all servers in a server cluster; because of the large number of server nodes, the database may face memory overflow when performing large-scale data acquisition, data writing, and data querying.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a method, an apparatus, a computer device, and a storage medium for managing data of a database, so as to solve the problem of memory overflow of the database.
In a first aspect, an embodiment of the present invention provides a method for managing data in a database, where the method includes:
acquiring time information of data to be processed;
determining a storage area corresponding to the data to be processed based on the time information;
and processing the data to be processed based on different storage areas, and determining a processing result.
According to this database data management method, time information of the data to be processed is acquired, the storage area corresponding to the data to be processed is determined from the time information, and the data to be processed in the different storage areas is then processed. The method divides the data to be processed according to its time information, stores the divided data in different storage areas, and processes it according to the area in which it is stored, so that the specific processing mode can be chosen for the actual application scenario. This avoids the memory overflow caused by writing or loading a large amount of data at once, effectively improves database stability, reduces the database's resource occupation, and improves the efficiency of data management.
In some optional embodiments, if the data to be processed is data to be written, the time information includes a collection time of the data to be written, the storage area includes a target database and a disk file, and the determining, based on the time information, a storage area corresponding to the data to be processed includes:
calculating the time difference between the acquisition time of the data to be written and the current time;
comparing the time difference with a preset disk-flush threshold to obtain a comparison result;
and dividing the data to be written into first data to be written and second data to be written based on the comparison result, wherein the storage area corresponding to the first data to be written is the target database, and the storage area corresponding to the second data to be written is a disk file.
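The division step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record field name `acquired_at` and the threshold value are assumptions.

```python
import time

# Hypothetical sketch of the division step: a record whose age (current time
# minus acquisition time) exceeds the disk-flush threshold becomes "second
# data to be written" (destined for a disk file); otherwise it is "first data
# to be written" (destined for the target database).
FLUSH_THRESHOLD_SECONDS = 60  # illustrative value


def partition_pending_writes(records, now=None, threshold=FLUSH_THRESHOLD_SECONDS):
    now = time.time() if now is None else now
    first_to_write, second_to_write = [], []
    for record in records:
        age = now - record["acquired_at"]
        (second_to_write if age > threshold else first_to_write).append(record)
    return first_to_write, second_to_write
```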
In some optional embodiments, the processing the data to be processed based on different storage areas, and determining a processing result includes:
writing the first data to be written into the target database;
and writing the second data to be written into the disk file.
In some optional embodiments, the processing the data to be processed based on different storage areas, determining a processing result, further includes:
and acquiring the second data to be written from the disk file, and storing the second data to be written into the target database.
In some optional embodiments, if the data to be processed is data to be read, the time information includes a collection time and a queried time of the data to be read, and determining, based on the time information, a storage area corresponding to the data to be processed includes:
storing, into a first reading area, index information corresponding to data to be read whose time since acquisition is within a preset acquisition time threshold;
and storing, into a second reading area, the index information in the first reading area that corresponds to data to be read whose queried time is below a preset query time threshold, and storing, into a third reading area, the data to be read corresponding to the index information held in the first reading area and in the second reading area.
In some optional embodiments, the processing the data to be processed based on different storage areas, and determining a processing result includes:
acquiring a query request;
judging whether a target index corresponding to the query request is stored in the first reading area or not based on the query request;
and when the target index corresponding to the query request is stored in the first reading area, acquiring target data to be read corresponding to the target index from the third reading area.
In some optional embodiments, the processing the data to be processed based on different storage areas, determining a processing result, further includes:
when the target index corresponding to the query request is not stored in the first reading area, judging whether the target index corresponding to the query request is stored in the second reading area;
and when the target index corresponding to the query request is stored in the second reading area, acquiring the target data to be read corresponding to the target index from the third reading area.
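The two-step lookup in these embodiments can be sketched as follows. Container types are illustrative assumptions: sets stand in for the first and second reading areas, and a dict keyed by index stands in for the third reading area.

```python
def query_target_data(target_index, first_area, second_area, third_area):
    """Check the first reading area, then fall back to the second; on a hit
    in either, fetch the target data from the third reading area."""
    if target_index in first_area:
        return third_area.get(target_index)
    if target_index in second_area:
        return third_area.get(target_index)
    return None  # index not cached in either reading area
```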
In a second aspect, an embodiment of the present invention provides a data management apparatus for a database, the apparatus including:
the information acquisition module is used for acquiring time information of the data to be processed;
the area determining module is used for determining a storage area corresponding to the data to be processed based on the time information;
and the data processing module is used for processing the data to be processed based on different storage areas and determining a processing result.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory and a processor communicatively connected to each other, wherein the memory stores computer instructions and the processor executes the computer instructions to perform the database data management method of the first aspect or of any corresponding embodiment thereof.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions configured to cause a computer to perform the database data management method of the first aspect or of any corresponding embodiment thereof.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a method of data management of a database according to some embodiments of the invention;
FIG. 2 is a flow chart of a method of data management of a database according to some embodiments of the invention;
FIG. 3 is a schematic diagram of a cache queue according to some embodiments of the invention;
FIG. 4 is a flow chart of a method of data management of a database according to some embodiments of the invention;
FIG. 5 is a flow diagram of a method of data management of a database according to some embodiments of the invention;
FIG. 6 is a schematic diagram of a memory region according to some embodiments of the invention;
FIG. 7 is a schematic diagram of a data query process according to some embodiments of the invention;
FIG. 8 is a schematic diagram of a data writing process according to some embodiments of the invention;
FIG. 9 is a block diagram of a data management apparatus of a database according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Monitoring server performance helps identify the server's performance problems and repair them in time. In the related art, server performance can be collected and displayed by monitoring software. Because a server cluster is usually monitored at large scale, the database may face memory overflow under second-level data writing and under frequently changing index data.
Based on the above, the embodiment of the invention provides a data management method of a database, so that the memory occupation of the database is optimized, and meanwhile, the data query performance is improved.
According to an embodiment of the present invention, there is provided a data management method embodiment of a database, it being noted that the steps shown in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that herein.
In this embodiment, a method for managing data of a database is provided, and fig. 1 is a flowchart of a method for managing data of a database according to an embodiment of the present invention, as shown in fig. 1, where the flowchart includes the following steps:
step S11, time information of the data to be processed is acquired.
This method embodiment can be used in server performance monitoring software, which collects server performance data, stores it in a database, and displays the collected data on a front-end page. When a user needs to query server performance, the data can be filtered by entering keywords, time ranges, and the like, and the corresponding performance data is displayed. A server cluster is usually monitored as a whole; the cluster involves multiple server devices, each of which has an acquisition tool installed that collects the server's performance data periodically at a set interval. The specific performance data and interval can be set according to actual requirements, for example collecting once every 5 seconds and gathering CPU utilization, CPU temperature, GPU thread utilization, disk file read/write performance, and so on. After collecting the performance data, the acquisition tool sends a data write request to the database side, which receives the request and stores the performance data.
The data to be processed refers to server performance data. Since embodiments of the invention cover scenarios such as writing and querying database data, the data to be processed can include data that needs to be stored into the database after the acquisition tool collects it, data that the user needs to query and read from the database, and so on; its location differs between scenarios.
The time information may include the acquisition time of the server performance data, the most recent queried time of data stored in the database, the queried frequency of data stored in the database, and the like. The specific time information differs for data to be processed in different scenarios; for example, for data not yet stored in the database, the time information comprises the acquisition time of the performance data.
Step S12, based on the time information, a storage area corresponding to the data to be processed is determined.
Before the data to be processed is processed, it must first be stored in different storage areas. The data to be processed in different scenarios is divided according to its time information: specifically, a time threshold can be set and the time information compared against it to divide the data, and each batch of divided data is then stored in its corresponding storage area.
The storage area can take the form of a queue or a table; the specific form is not limited and can be set according to the actual scenario.
Step S13, processing the data to be processed based on different storage areas, and determining a processing result.
Different processing operations are performed on the data to be processed placed in different storage areas, and the specific processing mode differs between scenarios. When the application scenario is writing data, the final processing result is that the data to be processed is written into the database; when the application scenario is querying data, the final processing result is that data is read from the database according to the query request.
Take the scenario of writing data into a database as an example. The data to be processed is data that needs to be stored in the database, and the collected data may need to be stored in large batches. To avoid the database memory overflow caused by storing a large amount of data at once, this scheme uses the time information of the data to be processed, which includes the acquisition time: data that has gone unwritten for a long time is stored into another area, while data with a more recent acquisition time is stored into the database. In this way, the number and size of records written into the database at one time are limited, avoiding the memory overflow caused by writing a large amount of data.
The method in this scheme can be implemented through plug-ins. Different application scenarios can correspond to different plug-ins, or the functions can be integrated into a single plug-in. Integrating the functions into the server performance monitoring software as plug-ins does not intrude on the software's original business and does not restrict the specific database products involved, which can reduce maintenance cost and technical risk. The database software involved is not limited, for example: InfluxDB, kdb+, Graphite, etc.
According to this database data management method, time information of the data to be processed is acquired, the storage area corresponding to the data to be processed is determined from the time information, and the data to be processed in the different storage areas is then processed. The method divides the data to be processed according to its time information, stores the divided data in different storage areas, and processes it according to the area in which it is stored, so that the specific processing mode can be chosen for the actual application scenario. This avoids the memory overflow caused by writing or loading a large amount of data at once, effectively improves database stability, reduces the database's resource occupation, and improves the efficiency of data management.
In this embodiment, a method for managing data of a database is provided, fig. 2 is a flowchart of a method for managing data of a database according to an embodiment of the present invention, as shown in fig. 2, if data to be processed is data to be written, time information includes collection time of the data to be written, and a storage area includes a target database and a disk file, where the flowchart includes the following steps:
step S21, time information of the data to be processed is acquired.
For details, refer to step S11 of the embodiment shown in fig. 1; the description is not repeated here.
Step S22, calculating the time difference between the acquisition time of the data to be written and the current time.
This embodiment applies to the scenario of writing data into a database, where the data to be processed is the data to be written. After collection, the data to be written is sent to the database by the performance acquisition tool on the server: the acquisition tool sends a data write request containing the data to be written, and the database side receives the request and obtains the data to be written from it.
Before the method of step S22 is executed, the data to be written may be stored in a buffer queue; specifically, it may be written into the buffer queue in a doubly nested hash manner, as shown in fig. 3, as follows:
1. Hash table 1 (HashTable1): adopts an array + linked-list structure, where the key is the name of the node to be acquired and the value is a nested hash table. HashTable1 is initialized to the number of nodes; a node refers to a server, and the number of nodes is the number of servers in the server cluster. Capacity expansion is checked first: in this embodiment, the expansion policy is to expand when the used capacity reaches 0.75 of the initial value, and the new capacity is (2 × old size + 1), where old size refers to the number of nodes at the last data write operation.
The addressing mode: the position of the linked list in the array is first found from the hash of the node name, hash(nodeName) % threadNum, and the final HashEntry is then found by searching the list one entry at a time.
2. Hash table 2 (HashTable2): similar to HashTable1, except that its initialization size is the number of acquisition items and its addressing mode is hash(collectName) % collectNum.
The actual data stored in HashTable2 is compressed binary data comprising MetricID, Data (the actual collected data, such as CPU utilization and memory utilization), Lock (a lock added while the data is being flushed to disk or written into InfluxDB, preventing dirty data from being written), and Status (whether the entry can be reclaimed: 0: initial; 1: being flushed to disk; 2: being written into InfluxDB; 3: InfluxDB write succeeded; 4: InfluxDB write failed; 5: disk flush succeeded; 6: disk flush failed).
Here, MetricID is the identification number of the data and guarantees its uniqueness; it can be composed as: time + nodeName + collectName + deviceID. Lock and Status may be changed according to the state of the data before it is written into the database.
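As a sketch, the MetricID composition just described (time + nodeName + collectName + deviceID) could be built as follows. The function name is an assumption, and the underscore separator is inferred from the example shown later in this section.

```python
def make_metric_id(timestamp: str, node_name: str,
                   collect_name: str, device_id: str) -> str:
    # MetricID = time + nodeName + collectName + deviceID, joined so that the
    # identifier is unique per sample, node, acquisition item, and device.
    return f"{timestamp}_{node_name}_{collect_name}_{device_id}"
```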
Briefly, the double hash table is used as follows: HashTable1 records the node name and the name of the data to be written, and these two values locate the entry in HashTable2 that holds the specific data corresponding to that node name and data name.
The following illustrates the process of storing data to be written into a cache queue:
First, the initial configuration of the buffer queue is completed according to the cluster scale, including: queue length, the threshold time for writing data to a disk file, the thread concurrency count, and the number of acquisition items. Acquisition items are the server attributes that specifically need to be collected, for example temperature and utilization; the acquisition item name is the name of the data to be written.
The format of the data to be written is {nodeName=node1, collectName=cpu, cpuTemp=80℃, idle=80%}, where cpuTemp and idle are acquisition item names;
HashTable1 is retrieved according to the nodeName, and nodeName=node1, collectName=cpu is written into a HashEntry of HashTable1;
HashTable2 is retrieved according to nodeName+collectName, and the following is written into a HashEntry of HashTable2:
MetricID=2023010123959_node1_cpu_cpu0, Data={cpuTemp=80℃, idle=80%}, Lock=unlock, Status=0.
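The nested write illustrated above can be sketched with plain Python dicts (which are themselves hash tables), standing in for HashTable1 and HashTable2. The helper name and entry layout are assumptions based on the fields listed earlier.

```python
def cache_write(queue, node_name, collect_name, metric_id, data):
    # Outer table keyed by node name (HashTable1's role); inner table keyed
    # by acquisition-item name (HashTable2's role). Each entry carries the
    # MetricID, Data, Lock, and Status fields described in the text; Status 0
    # is the initial state and Lock starts unlocked.
    entry = {"MetricID": metric_id, "Data": data, "Lock": "unlock", "Status": 0}
    queue.setdefault(node_name, {})[collect_name] = entry
    return entry
```

Writing the example record would then look like `cache_write(queue, "node1", "cpu", metric_id, {"cpuTemp": "80℃", "idle": "80%"})`.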
After the data to be written is stored in the cache queue, its storage is completed by monitoring the cache queue: the data to be written is read concurrently from the cache queue, and the time difference between its acquisition time, which is carried within the data itself, and the current time is calculated.
Step S23, comparing the time difference with the preset disk-flush threshold to obtain a comparison result.
The preset disk-flush threshold refers to the threshold time for writing data into a disk file and is set according to the actual situation. The time difference is compared with the preset disk-flush threshold, and the comparison result is either that the time difference is not larger than the threshold or that it is larger.
Step S24, dividing the data to be written into first data to be written and second data to be written based on the comparison result, wherein the storage area corresponding to the first data to be written is the target database and the storage area corresponding to the second data to be written is a disk file.
When the comparison result is that the time difference is larger than the preset disk-flush threshold, the data to be written is stale, i.e. it has gone unprocessed for a long time after collection; it is determined to be second data to be written, and its storage area is a disk file.
When the comparison result is that the time difference is not larger than the preset disk-flush threshold, the data is determined to be first data to be written, and its storage area is the target database, the database into which the data is to be stored.
Step S25, processing the data to be processed based on different storage areas, and determining a processing result.
After the data to be written is divided and the storage area corresponding to each piece of data is determined, each piece of data is written into its corresponding storage area.
Specifically, step S25 includes the steps of:
s251, writing the first data to be written into a target database;
s252, writing the second data to be written into the disk file.
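Steps S251 and S252 can be sketched as follows. A plain list stands in for the real target-database client, and the JSON-lines disk format is an assumption for illustration.

```python
import json


def flush_partitions(first_to_write, second_to_write, database, disk_path):
    # S251: write the first data to be written into the target database.
    database.extend(first_to_write)
    # S252: append the second data to be written to the disk file.
    with open(disk_path, "a", encoding="utf-8") as fh:
        for record in second_to_write:
            fh.write(json.dumps(record) + "\n")
```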
In some alternative embodiments, step S25 further comprises: and acquiring second data to be written from the disk file, and storing the second data to be written into the target database.
For the second data to be written stored in the disk file, data playback can be performed: the second data to be written is read concurrently from the disk file and written directly into the target database. If the write to the target database fails because of a timeout, a network error, or a similar problem, the data that failed to be written is written back into the disk file.
The above process is described in full below. As shown in fig. 4, taking InfluxDB as the target database, the data to be written in the write cache queue is read concurrently and fed into a direct-write queue and a data playback queue. The direct-write queue judges, by comparing the time difference with the preset disk-flush threshold, whether the data should be flushed to disk: if not, the first data to be written is written into InfluxDB; if so, the second data to be written is written into the disk file. The data playback queue concurrently reads the second data to be written from the disk file and writes it into InfluxDB; if the InfluxDB write fails, the data is stored back to the disk file.
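The playback path, including the re-flush of records whose database write fails, can be sketched as follows. The JSON-lines file format and function names are assumptions, and `write_to_db` stands in for the real InfluxDB client call, with failures surfacing as exceptions.

```python
import json


def replay_disk_file(disk_path, write_to_db):
    """Read second data to be written back from the disk file, write each
    record into the target database, and keep any record whose database
    write fails (e.g. timeout or network error) for re-flushing."""
    with open(disk_path, "r", encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    failed = []
    for record in records:
        try:
            write_to_db(record)
        except Exception:
            failed.append(record)
    # Rewrite the disk file so it holds only the records still pending.
    with open(disk_path, "w", encoding="utf-8") as fh:
        for record in failed:
            fh.write(json.dumps(record) + "\n")
    return failed
```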
While the data in the cache queue is being written into the database, the state of the data in the hash table is changed according to its actual state. When the state of data in the cache queue changes to "InfluxDB write succeeded" or "disk flush succeeded", the data is deleted from the cache queue; this avoids accumulating excess data in the cache queue and reduces memory occupation while the data to be written is stored into the database or a disk file. After data is extracted from the cache queue and before it is stored into the database or a disk file, it can be locked; once locked, its state can only be converted one way.
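A hypothetical guard for the one-way state conversion just described, assuming the numeric Status codes listed earlier increase monotonically and that the two "succeeded" states mark an entry as deletable; both assumptions are illustrative rather than stated explicitly in the text.

```python
DELETABLE_STATUSES = {3, 5}  # assumed: InfluxDB write / disk flush succeeded


def advance_status(entry, new_status):
    # Enforce one-way conversion: a locked entry's status may only move forward.
    if new_status < entry["Status"]:
        raise ValueError("status may only be converted one way")
    entry["Status"] = new_status
    # The caller deletes the entry from the cache queue when this returns True.
    return entry["Status"] in DELETABLE_STATUSES
```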
According to this database data management method, the time difference between the acquisition time of the data to be written and the current time is calculated, the time difference is compared with the preset disk-flush threshold, and the data to be written is divided according to the comparison result: the first data to be written is written into the target database, the second data to be written is written into the disk file, and the second data to be written is later read back from the disk file and stored into the target database. By dividing the data to be written, the method limits the amount of data written into the target database at once, enabling batched storage, avoiding target-database memory overflow caused by writing a large amount of data, and optimizing memory occupation.
In this embodiment, a method for managing data in a database is provided, fig. 5 is a flowchart of a method for managing data in a database according to an embodiment of the present invention, and as shown in fig. 5, if data to be processed is data to be read, time information includes acquisition time and queried time of the data to be read, where the flowchart includes the following steps:
step S31, time information of the data to be processed is acquired.
Please refer to step S11 in the embodiment shown in fig. 1 for details, which are not repeated here.
Step S32, storing, in a first reading area, the index information corresponding to the data to be read whose time since collection is less than a preset collection-time threshold.
This embodiment applies to the scenario of reading data from a database: the data to be processed is data to be read, and the data to be read is stored in the target database. The time information of the data to be read includes its collection time and its queried times; the queried times record each time the data was queried, from which the query frequency of the data to be read can be determined.
The preset collection-time threshold is a number of days. Taking 30 days as an example: after the target database starts, data collected within the last 30 days is loaded first, and the index information of that data is stored in the first reading area. The data to be read includes performance data of each server; for example, for a reading "CPU temperature is 50 °C", the corresponding index information is the metric name (CPU temperature) and the name of the server node. Index information is used to query the data to be read and may take forms such as an identification number or a node name.
Step S33, storing, in a second reading area, the index information corresponding to the data to be read in the first reading area whose queried times are below a preset query-times threshold, and storing, in a third reading area, the data to be read corresponding to the index information in the first reading area and in the second reading area.
The preset query-times threshold marks the boundary of frequent use: data whose queried times fall below it has not been queried for a long time. The index information corresponding to such infrequently queried data is stored in the second reading area.
That is, the data to be read in the first reading area is screened, and the index information of data that is not frequently queried is moved to the second reading area.
In addition, data in the first reading area may also be screened with a data-screening algorithm, such as a least-frequently-used (LFU) algorithm.
Index information of the data to be read is stored in the first and second reading areas, and the data corresponding to all of that index information is stored in the third reading area. The index information is used to query the data to be read in the third reading area.
As shown in fig. 6, the first reading area is the direct memory, the second reading area is the replacement memory, and the third reading area holds the replacement data. The dual cache queues store commonly used index information, and the replacement-data area stores the data corresponding to both queues, which reduces memory usage.
After the database service starts, index information in the database is loaded into the direct memory in time order according to the preset collection-time threshold, and entries are migrated from the direct memory into the replacement memory using the least-frequently-used algorithm. The data corresponding to the index information in the direct memory and in the replacement memory is stored in the replacement-data area. Data in the replacement memory may be evicted, that is, deleted; when it is deleted, the corresponding data stored in the replacement-data area is deleted at the same time.
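The two index tiers backed by one shared data store can be sketched as follows. This is a simplified illustration, not the patent's implementation: capacity handling, class and field names, and the demotion policy details are assumptions; only the direct/replacement/replacement-data layout and the LFU demotion come from the text above.

```python
class TieredIndexCache:
    """Sketch of the direct-memory / replacement-memory / replacement-data
    layout: index keys live in two tiers, the data they point to lives in one
    shared store, and least-frequently-queried keys are demoted."""
    def __init__(self, direct_capacity):
        self.direct = {}        # index key -> query count (first reading area)
        self.replacement = {}   # index key -> query count (second reading area)
        self.data = {}          # index key -> payload     (third reading area)
        self.direct_capacity = direct_capacity

    def load(self, key, payload):
        self.direct[key] = 0
        self.data[key] = payload
        if len(self.direct) > self.direct_capacity:
            # Demote the least frequently queried key to the replacement tier.
            lfu = min(self.direct, key=self.direct.get)
            self.replacement[lfu] = self.direct.pop(lfu)

    def get(self, key):
        for tier in (self.direct, self.replacement):
            if key in tier:
                tier[key] += 1
                return self.data.get(key)
        return None  # miss in both tiers: caller falls back to the TSM files
```

A demoted key remains queryable through the replacement tier; only a full miss forces a fallback to the storage engine.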
By default, each cache queue may be sized as 1 MB × 10 × 2, and the replacement-data area defaults to 16 MB × 20.
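One possible reading of those defaults is the following arithmetic. The interpretation of "1 MB × 10 × 2" (two index queues of ten 1 MB segments) and "16 MB × 20" (twenty 16 MB segments) is an assumption, not stated explicitly in the patent.

```python
# Hypothetical totals implied by the default sizes quoted above.
index_cache_mb = 1 * 10 * 2        # 20 MB for the dual index cache queues
replacement_data_mb = 16 * 20      # 320 MB for the replacement-data area
total_default_mb = index_cache_mb + replacement_data_mb
```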
And step S34, processing the data to be processed based on different storage areas, and determining a processing result.
Specifically, step S34 includes the steps of:
step S341, obtaining a query request.
A user can issue a query request from the front-end page; the query request may include index information, that is, a target index, such as the server node name, the name of the collected metric, and the collection time.
In step S342, it is determined whether the target index corresponding to the query request is stored in the first read area based on the query request.
And inquiring in the first reading area according to the index information in the inquiry request, and judging whether the target index contained in the inquiry request is stored in the first reading area.
In step S343, when the target index corresponding to the query request is stored in the first reading area, the target data to be read corresponding to the target index is obtained from the third reading area.
When the target index is stored in the first reading area, acquiring data corresponding to the target index, namely target data to be read, from the third reading area.
In some alternative embodiments, step S34 further comprises the steps of:
step S344, when the target index corresponding to the query request is not stored in the first reading area, determining whether the target index corresponding to the query request is stored in the second reading area;
if the target index is not stored in the first reading area, searching is performed in the second reading area, and whether the target index is stored in the second reading area is judged.
In step S345, when the target index corresponding to the query request is stored in the second reading area, the target data to be read corresponding to the target index is obtained from the third reading area.
And when the target index is stored in the second reading area, acquiring data corresponding to the target index, namely target data to be read, from the third reading area.
In the database data management provided by this embodiment, the data to be read is divided according to its time information and stored in the first and second reading areas respectively; when data is processed, the first reading area is searched first, and the second reading area is searched only if the target is not found there. The dual storage queues shorten query time and improve query efficiency, and because not all data needs to be loaded, memory usage is optimized.
Fig. 7 shows the query process of this embodiment. The target database is InfluxDB; the first reading area is the direct memory, the second reading area is the replacement memory, and the third reading area holds the replacement data. TSM (the Time-Structured Merge Tree storage engine) is InfluxDB's storage engine, which it uses to store all data; when the database starts, it loads the TSM files and reads them into memory for subsequent query operations. Data retrieval from a query request proceeds as follows. The direct memory is searched first. If the index information is found there, the replacement data is checked for the corresponding data: if present, the query completes; if not, the TSM is searched and the data found there is stored into the replacement data. If the index information is not found in the direct memory, the replacement memory is searched, and if the index exists there, the replacement data is again checked for the corresponding data. If the index exists in neither, the TSM is searched, the found index information is written into the direct memory, and the corresponding data is written into the replacement data.
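The lookup order can be sketched as a small function. This is an illustrative sketch of the flow, not InfluxDB's API: `direct` and `replacement` are sets of index keys, `replacement_data` maps keys to data, and `tsm_lookup` is a stand-in for reading the TSM storage engine; all names are assumptions.

```python
def query_tsm_cached(index, direct, replacement, replacement_data, tsm_lookup):
    """Direct memory first, then replacement memory, then the TSM files."""
    if index in direct or index in replacement:
        if index not in replacement_data:
            # Index known but its data was evicted: refill from the TSM files.
            replacement_data[index] = tsm_lookup(index)
        return replacement_data[index]
    # Miss in both tiers: read the TSM, then register the index and its data.
    value = tsm_lookup(index)
    direct.add(index)
    replacement_data[index] = value
    return value
```

The TSM is consulted only on an index miss or a data eviction, so repeated queries for common indexes stay in memory.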
Taking InfluxDB as the target database, an embodiment of the database data management method is provided. The specific application scenarios include writing data to the database and querying/reading data, which involve a scenario with frequently changing index data and a scenario with second-level data writing.
InfluxDB is an open-source distributed database for time-series data, events, and metrics, written in Go, with no external dependencies. Its design goals are distribution and horizontal scalability, and it is the core product of InfluxData. Here it is used to store the monitoring data and report data of an artificial-intelligence development platform. When its service starts, InfluxDB loads the following into memory:
(1) Meta data: upon startup of the InfluxDB, it will read the metadata stored in the metadata storage area, including information such as databases, data preprocessors, continuous queries, etc. This metadata store is typically kept on disk to ensure that the InfluxDB can maintain its state after reboot.
(2) WAL (Write-Ahead Log): on startup, InfluxDB reads the WAL, which records write and delete operations. The WAL is used to persist data and ensure data consistency.
(3) TSM files (Time-Structured Merge Tree storage engine): InfluxDB stores all data with the TSM storage engine. On startup, it loads the TSM files and reads them into memory for subsequent query operations.
InfluxDB's storage engine is an LSM-tree variant and uses an in-memory caching mechanism to optimize write performance. Under high write load, large-scale data collection, or insufficient server memory, however, InfluxDB can run into memory overflow problems.
The application scenarios of this embodiment include writing data to the database and querying/reading data, covering a frequently-changing-index scenario and a second-level data-writing scenario:
(1) Frequently changing index data: with 400 physical nodes, each node generating 256 tasks (256 logical cores per node, one core per task), each task running for 2 hours, and each task carrying 3 tag values (task name, task category, node of the task) of 20 characters each, 400 × 256 × (24/2) × 3 × 20 / 1024 / 1024 ≈ 70 MB of index data is added to memory every day (about 25 GB per year).
(2) Second-level data writing: with 400 physical nodes and 200+ collected metrics per node, 400 × 200 = 80,000 data points are written concurrently every second; at an average of 0.4 KB per point, 400 × 200 × 0.4 KB / 1024 ≈ 31 MB is written per second.
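Both workload figures above can be checked with back-of-the-envelope arithmetic:

```python
# (1) Frequently changing index data: nodes * tasks * runs/day * tags * chars.
tag_bytes_per_day = 400 * 256 * (24 // 2) * 3 * 20
mb_per_day = tag_bytes_per_day / 1024 / 1024     # new index data per day, ~70 MB
gb_per_year = mb_per_day * 365 / 1024            # ~25 GB per year

# (2) Second-level data writing: points per second at 0.4 KB each.
points_per_second = 400 * 200                    # 80,000 points per second
mb_per_second = points_per_second * 0.4 / 1024   # ~31 MB written per second
```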
The method provides the database with a data-writing component, WriteCache, and a data-reading component, ReadCache.
The data-writing component receives write requests, temporarily writes data to be written into a disk file when the memory limit requires it, and then completes data storage in batches through a background task. It mainly comprises a write cache queue, a data playback listener, and configuration items. The specific process is shown in fig. 8:
the written data is the data to be written. After it is received, it is stored in the cache queue; part of it is written into InfluxDB and part is flushed to disk (written into a disk file). If a write to InfluxDB fails, the data is marked as having failed to write; if it succeeds, the data is removed from the cache queue. Likewise for the flushed data: a failed flush is marked as such, and a successful flush removes the data from the cache queue. Data written to disk files can later be read back and written into InfluxDB concurrently. WriteCache deployment comprises: defining the size of the configuration items according to the cluster scale, initializing the nested hash table from the configuration items, and starting the data playback listener.
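The drain-and-mark flow just described can be sketched as below. This is an illustrative sketch, not the WriteCache implementation: the sink callables, the `failed` mark, and the field names are assumptions; only the route-by-age and mark-or-remove behavior comes from the text above.

```python
def drain_write_cache(queue, write_influxdb, write_disk, threshold_seconds, now):
    """write_influxdb and write_disk stand in for the real sinks and return
    True on success. Failed entries are marked and kept in the queue so the
    data playback listener can retry them; successful entries are removed."""
    remaining = []
    for entry in queue:
        recent = now - entry["collected_at"] <= threshold_seconds
        sink = write_influxdb if recent else write_disk
        if sink(entry):
            continue                     # success: the entry leaves the queue
        entry["failed"] = True           # failure: mark for replay
        remaining.append(entry)
    return remaining
```

Returning the surviving entries lets the caller swap in the pruned queue atomically, so the replay listener only ever sees marked failures.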
The data-reading component receives read requests and modifies InfluxDB's strategy of loading all index information at once, optimizing it to load on demand. The dual cache queue stores commonly used indexes, and the buffer cache (the third reading area) stores the frequently accessed data that is temporarily swapped in, which reduces memory usage while improving data query performance.
The dual cache queue comprises the first and second reading areas. ReadCache deployment includes: initializing the dual index cache queues, initializing the data buffer cache, and by default loading the indexes of the last 30 days into the index cache queue.
The component adds a cache queue in front of the TSM in InfluxDB and loads index data according to different policies, replacing InfluxDB's load-everything behavior that causes memory overflow; it also optimizes the data-writing logic through sharding, reducing memory usage and data-write loss.
This embodiment provides a method for handling single-node InfluxDB memory overflow under massive data. It not only addresses the memory-overflow problem but also improves query performance through the dual index queues and the data buffer queue. The method is integrated into the product as a plug-in, does not intrude on the original service, and minimizes its impact on InfluxDB. The scheme improves InfluxDB's stability, reduces its resource usage, lowers maintenance cost and technical risk, and strengthens the AI platform's competitiveness among similar products.
This embodiment also provides a database data management apparatus, which implements the foregoing embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
The present embodiment provides a data management apparatus for a database, as shown in fig. 9, including:
an information acquisition module 81 for acquiring time information of data to be processed;
the area determining module 82 is configured to determine a storage area corresponding to the data to be processed based on the time information;
the data processing module 83 is configured to process the data to be processed based on different storage areas, and determine a processing result.
In some optional embodiments, if the data to be processed is data to be written, the time information includes a collection time of the data to be written, the storage area includes a target database and a disk file, and the area determining module 82 includes:
the time calculation unit is used for calculating the time difference between the acquisition time of the data to be written and the current time;
the time judging unit is used for comparing the time difference with a preset disk-flush threshold to obtain a judgment result;
the data dividing unit is used for dividing the data to be written into first data to be written and second data to be written based on the judgment result, wherein the storage area corresponding to the first data to be written is the target database, and the storage area corresponding to the second data to be written is a disk file.
In some alternative embodiments, the data processing module 83 includes:
a first writing unit, configured to write the first data to be written into the target database;
and the second writing unit is used for writing the second data to be written into the disk file.
In some alternative embodiments, the data processing module 83 includes:
and the third writing unit is used for acquiring the second data to be written from the disk file and storing the second data to be written into the target database.
In some optional embodiments, if the data to be processed is data to be read, the time information includes an acquisition time and a queried time of the data to be read, and the area determining module includes:
the first storage unit is used for storing index information corresponding to the data to be read, the acquisition time of which is smaller than a preset acquisition time threshold value, into a first reading area;
the second storage unit is used for storing index information corresponding to the data to be read, of which the queried time is lower than a preset queried time threshold value, in the first reading area into the second reading area, and storing the index information of the data to be read in the first reading area and the data to be read corresponding to the index information of the data to be read in the second reading area into the third reading area.
In some alternative embodiments, the data processing module 83 includes:
a request acquisition unit for acquiring a query request;
a first index retrieval unit, configured to determine, based on the query request, whether a target index corresponding to the query request is stored in the first read area;
and the first data reading unit is used for, when the target index corresponding to the query request is stored in the first reading area, acquiring the target data to be read corresponding to the target index from the third reading area.
In some alternative embodiments, the data processing module 83 includes:
a second index retrieval unit, configured to determine whether a target index corresponding to the query request is stored in the second read area when the target index corresponding to the query request is not stored in the first read area;
and the second data reading unit is used for, when the target index corresponding to the query request is stored in the second reading area, acquiring the target data to be read corresponding to the target index from the third reading area.
The database data management apparatus in this embodiment is presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and memory executing one or more software or firmware programs, and/or other devices that can provide the functionality described above.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiment of the invention also provides a computer device having the database data management apparatus shown in fig. 9.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention. As shown in fig. 10, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are communicatively coupled to each other over different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 10.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip, among others. The hardware chip may be an application specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field programmable gate array, a general-purpose array logic, or any combination thereof.
Wherein the memory 20 stores instructions executable by the at least one processor 10 to cause the at least one processor 10 to perform the methods shown in implementing the above embodiments.
The memory 20 may include a storage program area and a storage data area; the storage program area may store an operating system and at least one application program required for functions, and the storage data area may store data created from the use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10 and connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The methods according to the above embodiments may be implemented in hardware or firmware, or realized as computer code that can be recorded on a storage medium, or as computer code originally stored on a remote or non-transitory machine-readable storage medium and downloaded over a network to a local storage medium, so that the methods described herein can be carried out by software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random-access memory, a flash memory, a hard disk, a solid-state disk, or the like; further, the storage medium may also comprise a combination of such memories. It will be appreciated that the computer, processor, microprocessor controller, or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (10)

1. A method of data management of a database, the method comprising:
acquiring time information of data to be processed;
determining a storage area corresponding to the data to be processed based on the time information;
and processing the data to be processed based on different storage areas, and determining a processing result.
2. The method according to claim 1, wherein if the data to be processed is data to be written, the time information includes a collection time of the data to be written, the storage area includes a target database and a disk file, and the determining, based on the time information, the storage area corresponding to the data to be processed includes:
calculating the time difference between the acquisition time of the data to be written and the current time;
comparing the time difference with a preset disk-flush threshold to obtain a judgment result;
and dividing the data to be written into first data to be written and second data to be written based on the judging result, wherein a storage area corresponding to the first data to be written is a target database, and a storage area corresponding to the second data to be written is a disk file.
3. The method according to claim 2, wherein the processing the data to be processed based on the different storage areas, determining a processing result, includes:
writing the first data to be written into the target database;
and writing the second data to be written into the disk file.
4. A method according to claim 3, wherein said processing said data to be processed based on different said storage areas, determining a processing result, further comprises:
and acquiring the second data to be written from the disk file, and storing the second data to be written into the target database.
5. The method of claim 1, wherein if the data to be processed is data to be read, the time information includes a collection time and a queried time of the data to be read, and determining, based on the time information, a storage area corresponding to the data to be processed includes:
storing index information corresponding to the data to be read, the acquisition time of which is smaller than a preset acquisition time threshold value, into a first reading area;
and storing index information corresponding to the data to be read, of which the queried time is lower than a preset query time threshold value, in the first reading area into a second reading area, and storing the index information of the data to be read in the first reading area and the data to be read corresponding to the index information of the data to be read in the second reading area into a third reading area.
6. The method of claim 5, wherein the processing the data to be processed based on the different storage areas, determining a processing result, comprises:
acquiring a query request;
judging whether a target index corresponding to the query request is stored in the first reading area or not based on the query request;
and when the target index corresponding to the query request is stored in the first reading area, acquiring target data to be read corresponding to the target index from the third reading area.
7. The method of claim 6, wherein the processing the data to be processed based on the different storage areas, determining a processing result, further comprises:
when the target index corresponding to the query request is not stored in the first reading area, judging whether the target index corresponding to the query request is stored in the second reading area;
and when the target index corresponding to the query request is stored in the second reading area, acquiring target data to be read corresponding to the target index from the third reading area.
8. A data management apparatus for a database, the apparatus comprising:
The information acquisition module is used for acquiring time information of the data to be processed;
the area determining module is used for determining a storage area corresponding to the data to be processed based on the time information;
and the data processing module is used for processing the data to be processed based on different storage areas and determining a processing result.
9. A computer device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory having stored therein computer instructions, the processor executing the computer instructions to perform the method of data management of a database according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the data management method of the database of any one of claims 1 to 7.
CN202310586811.3A 2023-05-23 2023-05-23 Database data management method, device, computer equipment and storage medium Pending CN116775646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310586811.3A CN116775646A (en) 2023-05-23 2023-05-23 Database data management method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116775646A true CN116775646A (en) 2023-09-19

Family

ID=87995464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310586811.3A Pending CN116775646A (en) 2023-05-23 2023-05-23 Database data management method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116775646A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117313892A (en) * 2023-09-26 2023-12-29 上海悦普网络科技有限公司 Training device and method for text processing model

Similar Documents

Publication Publication Date Title
US9609060B2 (en) Distributed storage system and method
CN111046034B (en) Method and system for managing memory data and maintaining data in memory
CN102667772B (en) File level hierarchical storage management system, method, and apparatus
US9679021B2 (en) Parallel transactional-statistics collection for improving operation of a DBMS optimizer module
CN108509462B (en) Method and device for synchronizing activity transaction table
EP2380090B1 (en) Data integrity in a database environment through background synchronization
US8793288B2 (en) Online access to database snapshots
CN107092628B (en) Time series data processing method and device
CN116775646A (en) Database data management method, device, computer equipment and storage medium
CN111125171A (en) Monitoring data access method, device, equipment and readable storage medium
US7660790B1 (en) Method and apparatus for utilizing a file change log
CN114116762A (en) Offline data fuzzy search method, device, equipment and medium
US20110093688A1 (en) Configuration management apparatus, configuration management program, and configuration management method
CN111708895B (en) Knowledge graph system construction method and device
CN114556320A (en) Switching to a final consistent database copy
CN117743002A (en) Database maintenance method, device, computer equipment and storage medium
JP2023531751A (en) Vehicle data storage method and system
CN111666045A (en) Data processing method, system, equipment and storage medium based on Git system
CN113515518A (en) Data storage method and device, computer equipment and storage medium
CN113901018A (en) Method and device for identifying file to be migrated, computer equipment and storage medium
CN114297196A (en) Metadata storage method and device, electronic equipment and storage medium
CN111399759B (en) Method for reading data and writing data and object file system
CN114398334A (en) Prometheus remote storage method and system based on ZNBase cluster
CN113495807A (en) Data backup method, data recovery method and device
US8108354B2 (en) Archive device, method of managing archive device, and computer product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination