CN109284295B

CN109284295B - Data optimization method and device

Info

Publication number: CN109284295B
Application number: CN201811209843.7A
Authority: CN
Inventors: 岳斌 (Yue Bin)
Original assignee: Zhengzhou Yunhai Information Technology Co Ltd
Current assignee: Zhengzhou Yunhai Information Technology Co Ltd
Priority date: 2018-10-17
Filing date: 2018-10-17
Publication date: 2021-09-17
Anticipated expiration: 2038-10-17
Also published as: CN109284295A

Abstract

The embodiments of this application disclose a data optimization method that uses insertion merging and load balancing to merge or rebalance child nodes whose vacancy rate reaches a threshold. When the tree is deep, this can significantly reduce the tree depth and the number of child nodes, save storage resources, and improve storage efficiency. The method includes: calculating the vacancy rate of a first node and the vacancy rate of a second node, where the first node is adjacent to the second node and the vacancy rate is the ratio of the number of blank factors in a node to the node's total number of factors; and, if the vacancy rate of at least one of the first node and the second node reaches a first threshold, generating a third node that contains all non-blank factors of the first node and the second node.

Description

Data optimization method and device

Technical Field

The present application relates to the field of computers, and in particular, to a method and an apparatus for data optimization.

Background

In the storage field, querying and storing massive amounts of data consumes huge resources and seriously affects data storage performance. To reduce the resources occupied by stored data and improve storage performance, various mature and effective query algorithms have been developed.

Among them, the most classical is the B+ Tree algorithm. Existing research mainly pursues performance improvements in two directions: better hardware and optimized algorithms. Improving hardware alone cannot deliver the maximum benefit; it must be combined with suitable algorithm optimization. When data is inserted, a B+ Tree preferentially splits leaf nodes (Leaf Pages), and as the data grows, storage expands by increasing the depth of the tree.

The B+ Tree algorithm meets performance requirements when the data volume is small. In mass storage, however, too many Leaf Pages and an excessively deep tree reduce query efficiency and consume too many resources, degrading data storage performance.

Disclosure of Invention

The embodiments of this application provide a data optimization method that uses an insertion merging algorithm to merge Leaf Pages whose vacancy rate reaches a threshold. When the tree is deep, this can significantly reduce the tree depth and the number of Leaf Pages, saving storage resources and improving storage efficiency. Further load balancing of adjacent Leaf Pages balances the contents of each Leaf Page and evens out the vacancy rates, which reduces page split operations when factors are subsequently added and further improves storage performance.

In order to achieve the above purpose, the embodiments of the present application provide the following technical solutions:

A first aspect of this application provides a data optimization method. The method may be described as an algorithm in C and then embedded in a storage system through Java calls into C; the data optimization apparatus involved in executing the method corresponds to a functional entity in an intelligent terminal. The method may include the following steps: calculating the vacancy rate of a first node and the vacancy rate of a second node, where the first node is adjacent to the second node and the vacancy rate is the ratio of the number of blank factors in a node to the node's total number of factors;

and, if the vacancy rate of at least one of the first node and the second node reaches a first threshold, generating a third node that contains all non-blank factors of the first node and the second node.

Optionally, in some embodiments of this application, after the third node is generated because the vacancy rate of at least one of the first node and the second node reaches the first threshold, the method further includes: updating a first root node factor to a second root node factor, where the second root node factor is the last non-blank factor of the third node, and the first root node factor is the index factor of the first node and the second node.

Optionally, in some embodiments of this application, updating the first root node factor to the second root node factor includes: deleting the first root node factor; and adding the last non-blank factor of the third node as the second root node factor.

Optionally, in some embodiments of this application, after the third node is generated because the vacancy rate of at least one of the first node and the second node reaches the first threshold, the method further includes: calculating the imbalance rate of the third node, where the imbalance rate of the third node is the ratio of the vacancy rate of the third node to the sum of the vacancy rates of the third node and a fourth node, the third node is adjacent to the fourth node, and the imbalance rate of the fourth node does not reach a second threshold; and, if the imbalance rate of the third node reaches the second threshold, performing load balancing on the third node.

Optionally, in some embodiments of this application, after load balancing is performed on the third node because its imbalance rate reaches the second threshold, the method further includes: updating a third root node factor to a fourth node factor, where the fourth node factor is the last non-blank factor of the third node, and the third root node factor is the index factor of the first node and the second node.

In a second aspect, an embodiment of the present application provides a data optimization apparatus having a function of implementing the method according to the first aspect or any one of the possible implementation manners of the first aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above.

In a third aspect, an embodiment of this application provides another data optimization apparatus, which may be an entity such as a terminal device or a chip. The data optimization apparatus includes a processor and a memory; the memory is configured to store instructions, and the processor is configured to execute the instructions in the memory so that the data optimization apparatus performs the method of any implementation of the first aspect.

In a fourth aspect, the present application provides a chip system comprising a processor for enabling a data optimization apparatus to implement the functions referred to in the above aspects, e.g. to transmit or process data and/or information referred to in the above methods. In one possible design, the system-on-chip further includes a memory for storing program instructions and data necessary for the data optimization device. The chip system can be a data optimization device, and can also be a system chip which is applied to the data optimization device and executes corresponding functions.

In a fifth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to perform the method as described in the first aspect and any one of the alternative implementations.

In a sixth aspect, embodiments of the present application provide a computer storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the method as described in the first aspect and any one of the optional implementations.

The computer storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

According to the technical scheme, the embodiment of the application has the following advantages:

By using an insertion merging algorithm, Leaf Pages whose vacancy rate reaches a threshold are merged. When the tree is deep, this can significantly reduce the tree depth and the number of Leaf Pages, saving storage resources and improving storage efficiency. Further load balancing of adjacent Leaf Pages balances the contents of each Leaf Page and evens out the vacancy rates, which reduces page split operations when factors are subsequently added and further improves storage performance.

Drawings

FIG. 1 is a schematic diagram of data insertion with the B+ Tree algorithm;

FIG. 2 is a flow chart of a method of data optimization according to an embodiment of the present application;

FIG. 3 is a flow chart of another method of data optimization in an embodiment of the present application;

FIG. 4 is a flow chart of another method of data optimization according to an embodiment of the present application;

FIG. 5 is a diagram of an application scenario of a method for data optimization according to an embodiment of the present application;

FIG. 6 is a flow chart of another method of data optimization according to an embodiment of the present application;

FIG. 7 is a flow chart of another method of data optimization in an embodiment of the present application;

FIG. 8 is a diagram illustrating an application scenario of another method for data optimization according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a data optimization apparatus in an embodiment of the present application;

FIG. 10 is a schematic structural diagram of another data optimization apparatus in an embodiment of the present application.

Detailed Description

Embodiments of this application are described below with reference to the accompanying drawings. The described embodiments are merely some rather than all of the embodiments of this application. As those skilled in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of this application are also applicable to similar technical problems.

In the storage field, querying and storing massive amounts of data consumes huge resources and seriously affects data storage performance. To reduce the resources occupied by stored data and improve storage performance, various mature and effective query algorithms have been developed, of which the most classical is the B+ Tree algorithm. Existing research mainly pursues performance improvements in two directions, better hardware and optimized algorithms; improving hardware alone cannot deliver the maximum benefit, so it must be combined with suitable algorithm optimization. B+ Tree is widely used among query algorithms; FIG. 1 is a schematic diagram of data insertion with the B+ Tree algorithm. As a general-purpose algorithm, however, it still leaves considerable room for performance optimization. When data is inserted, a B+ Tree preferentially splits Leaf Pages, which meets performance requirements when the data volume is small. In mass storage, however, too many Leaf Pages and an excessively deep tree become performance bottlenecks.

To address these problems, the invention provides a B+ Tree optimization algorithm with limiting-factor insertion and load balancing. The algorithm is written in C and then embedded into the storage system through Java calls into C. This shortens the product development cycle and improves the performance of mass data storage.

The method of this embodiment addresses this problem. For ease of understanding, its specific flow is described below. FIG. 2 is a flowchart of a data optimization method according to an embodiment of this application; the method includes, but is not limited to, the following steps:

201. Calculating the vacancy rate of the first node and the vacancy rate of the second node;

In this embodiment, the first node is adjacent to the second node; "adjacent" describes a positional relationship, that is, the two nodes are neighbors in the data storage. A blank factor is a factor slot to which no data has been written, and the total number of node factors is the number of factor slots in the node that can hold data. Mathematically, factor = a / A, where factor is the vacancy rate, a is the number of blank factors, and A is the total number of node factors.
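As a rough illustration of factor = a / A, the following C sketch counts the blank slots in a fixed-size node. The `Node` layout, the capacity `NODE_CAP` of 4 (taken from the four-slot examples in FIG. 5 and FIG. 8), and the `BLANK` sentinel standing in for X are hypothetical choices for illustration, not the implementation described in this application.

```c
#include <stddef.h>

/* Hypothetical layout: four factor slots per node, as in the examples.
 * BLANK marks a slot to which no data has been written; this assumes
 * that stored factors are never negative. */
#define NODE_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[NODE_CAP];
} Node;

/* Vacancy rate factor = a / A, where a is the number of blank factors
 * and A is the total number of factor slots in the node. */
double vacancy_rate(const Node *node)
{
    size_t blanks = 0;
    for (size_t i = 0; i < NODE_CAP; i++) {
        if (node->factors[i] == BLANK) {
            blanks++;
        }
    }
    return (double)blanks / (double)NODE_CAP;
}
```

For example, a node (5, X, X, X) has three blank factors out of four, so its vacancy rate is 3/4 = 75%, the value given for the first node in FIG. 5.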

202. The vacancy rate of at least one of the first node and the second node reaches a first threshold;

In this embodiment, the condition that the vacancy rate of at least one of the first node and the second node reaches the first threshold covers three cases: the vacancy rate of the first node reaches the first threshold while that of the second node does not; the vacancy rate of the second node reaches the first threshold while that of the first node does not; or the vacancy rates of both nodes reach the first threshold. Which case applies depends on the actual situation and is not limited herein.

In this embodiment, the first threshold may be a limiting factor whose value may be, for example, 30% or 40%; the specific value depends on the actual situation and is not limited herein.
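The three cases above reduce to a single "either node reaches the threshold" test. A minimal C sketch follows; reading "reaches" as `>=` and the name `merge_condition_met` are assumptions, and 30% is just one of the example values mentioned above.

```c
#include <stdbool.h>

/* Example limiting factor; the text gives 30% or 40% as possible values. */
#define FIRST_THRESHOLD 0.30

/* True when at least one of two adjacent nodes has reached the first
 * threshold. This covers all three cases: only the first node, only the
 * second node, or both. "Reaches" is read here as ">=". */
bool merge_condition_met(double vacancy_first, double vacancy_second,
                         double threshold)
{
    return vacancy_first >= threshold || vacancy_second >= threshold;
}
```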

203. Generating a third node;

In this embodiment, the third node contains all non-blank factors of the first node and the second node. Writing these non-blank factors into the third node does not change the third node's total number of writable factors; that is, writing data fills existing blank factors rather than adding new ones.
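Step 203 can be sketched in C as copying the non-blank factors of both nodes, in order, into a single node of the same capacity. The node layout is the same hypothetical four-slot `Node` with a `BLANK` sentinel. The sketch also refuses the merge when the combined factors would not fit in one node; the text does not state this condition, so it is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: four factor slots, BLANK = unwritten slot. */
#define NODE_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[NODE_CAP];
} Node;

/* Builds a third node holding every non-blank factor of `first` followed by
 * every non-blank factor of `second`, padding the rest with BLANK. The node
 * capacity is unchanged. Returns false, leaving *third untouched, if the
 * factors do not fit (an assumed precondition, not stated in the text). */
bool merge_nodes(const Node *first, const Node *second, Node *third)
{
    Node out;
    size_t n = 0;
    const Node *src[2] = {first, second};

    for (size_t s = 0; s < 2; s++) {
        for (size_t i = 0; i < NODE_CAP; i++) {
            if (src[s]->factors[i] == BLANK) {
                continue;
            }
            if (n == NODE_CAP) {
                return false; /* combined factors exceed one node */
            }
            out.factors[n++] = src[s]->factors[i];
        }
    }
    while (n < NODE_CAP) {
        out.factors[n++] = BLANK;
    }
    *third = out;
    return true;
}
```

Merging (5, X, X, X) with (25, 30, X, X) in this way yields (5, 25, 30, X), which matches the merged node in FIG. 5.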

The method of this embodiment reduces the number of nodes, and thus the depth of the tree and the number of Leaf Pages, thereby saving storage resources and improving storage efficiency.

In one possible scenario, the root node factor may change as the node factors are merged. FIG. 3 is a flowchart of another data optimization method according to an embodiment of this application.

For ease of understanding, the following describes a specific procedure in this embodiment, including, but not limited to, the following steps:

301. Calculating the vacancy rate of the first node and the vacancy rate of the second node;

302. Determining that the vacancy rate of at least one of the first node and the second node reaches a first threshold;

303. Generating a third node;

In this embodiment, steps 301-303 are similar to steps 201-203 and are not described herein again.

304. Updating the first root node factor to a second root node factor;

In this embodiment, the second root node factor is the last non-blank factor of the third node, and the first root node factor is the index factor of the first node and the second node; that is, the positions of the first node and the second node can be determined from the position of the first root node factor.

Note that the last non-blank factor of the third node is the last data item in write order. For example, if data is written in ascending order of value from left to right, the second root node factor is the rightmost non-blank factor of the third node; if data is written in ascending order of value from right to left, the second root node factor is the leftmost non-blank factor of the third node. The specific case depends on the actual situation and is not limited herein.
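Picking the last non-blank factor therefore depends on the write direction. A minimal C sketch follows, assuming the same hypothetical four-slot `Node` with a `BLANK` sentinel and a hypothetical `WriteOrder` flag for the two directions described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: four factor slots, BLANK = unwritten slot. */
#define NODE_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[NODE_CAP];
} Node;

/* Hypothetical flag for the two write orders described in the text. */
typedef enum {
    WRITE_RIGHTWARD, /* ascending values written left to right */
    WRITE_LEFTWARD   /* ascending values written right to left */
} WriteOrder;

/* Stores the last non-blank factor in write order in *out: the rightmost
 * one for rightward writing, the leftmost one for leftward writing.
 * Returns false if the node contains no data. */
bool last_non_blank(const Node *node, WriteOrder order, int *out)
{
    for (size_t k = 0; k < NODE_CAP; k++) {
        size_t i = (order == WRITE_RIGHTWARD) ? NODE_CAP - 1 - k : k;
        if (node->factors[i] != BLANK) {
            *out = node->factors[i];
            return true;
        }
    }
    return false;
}
```

For the merged node (5, 25, 30, X) written rightward, this returns 30, which is the new root node factor in FIG. 5.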

Because the root node is updated, related data can be located more accurately during queries, which improves search efficiency.

The method of this embodiment addresses this problem. For ease of understanding, its specific flow is described below. FIG. 4 is a flowchart of another data optimization method according to an embodiment of this application; the method includes, but is not limited to, the following steps:

401. Calculating the vacancy rate of the first node and the vacancy rate of the second node;

402. Determining that the vacancy rate of at least one of the first node and the second node reaches a first threshold;

403. Generating a third node;

In this embodiment, steps 401-403 are similar to steps 201-203 and are not described herein again.

404. Deleting the first root node factor;

In this embodiment, deleting the first root node factor means deleting the original data in the first root node factor's slot; the slot becomes a blank factor into which data can be written again.

405. Adding the last non-blank factor of the third node as a second root node factor;

In this embodiment, adding the last non-blank factor of the third node as the second root node factor means writing the data of that factor into the blank factor, which then becomes the second root node factor.

In this embodiment, the last non-blank factor of the third node is the last data item in write order. For example, if data is written in ascending order of value from left to right, the second root node factor is the rightmost non-blank factor of the third node; if data is written in ascending order of value from right to left, the second root node factor is the leftmost non-blank factor of the third node. The specific case depends on the actual situation and is not limited herein.
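Steps 404 and 405 can be sketched as two small operations on an index node: blank the slot that held the first root node factor, then write the new factor into that blank slot. The `RootNode` layout and the function names are hypothetical. Reusing the freed slot assumes the new factor still sorts between its neighboring root factors; otherwise a real implementation would have to reposition it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical root (index) node: four factor slots, BLANK = unwritten. */
#define ROOT_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[ROOT_CAP];
} RootNode;

/* Step 404: delete the first root node factor by turning its slot into a
 * blank factor. Returns the freed slot index, or -1 if it was not found. */
int delete_root_factor(RootNode *root, int old_factor)
{
    for (int i = 0; i < ROOT_CAP; i++) {
        if (root->factors[i] == old_factor) {
            root->factors[i] = BLANK;
            return i;
        }
    }
    return -1;
}

/* Step 405: write the last non-blank factor of the third node into a blank
 * slot so that it becomes the second root node factor. Fails if the slot
 * is out of range or already holds data. */
bool add_root_factor(RootNode *root, int slot, int new_factor)
{
    if (slot < 0 || slot >= ROOT_CAP || root->factors[slot] != BLANK) {
        return false;
    }
    root->factors[slot] = new_factor;
    return true;
}
```

With a hypothetical root (25, 70, X, X), deleting 25 frees slot 0, and adding 30 there gives (30, 70, X, X).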

By merging nodes and updating the root node, the method of the above embodiment reduces the number of nodes, that is, the tree depth and the number of Leaf Pages, thereby saving storage resources and improving storage efficiency. A specific data scenario is described below.

For ease of understanding, a specific flow of this embodiment is described below. FIG. 5 is an application scenario diagram of a data optimization method according to an embodiment of this application.

In this embodiment, there are 4 nodes in total: the vacancy rate of the first node is 75%, that of the second node is 50%, that of the third node is 0%, and that of the fourth node is 25%.

First, the limiting factor is set to 30% and compared with each node's vacancy rate; the vacancy rates of the first node and the second node reach this threshold.

Then, the non-blank factors are merged: in the figure, the data (5) and (25, 30) are merged into a single node, (5, 25, 30), so the original 4 nodes become 3.

Because the rightmost factor after merging is (30), the root node factor is changed from (25) to (30), which completes the storage logic.
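To show how the pairwise merge could extend across a whole leaf level, here is a C sketch of the FIG. 5 scenario. Several details are assumptions rather than statements from the text: the left-to-right scan, the rule that a pair is merged only when its factors fit in one node, the left-packed node layout, and taking each node's rightmost factor as its root factor. The contents of the third and fourth nodes used below are hypothetical, chosen only to match their stated vacancy rates of 0% and 25%.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: four left-packed factor slots, BLANK = unwritten. */
#define NODE_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[NODE_CAP];
} Node;

static size_t used_slots(const Node *n)
{
    size_t c = 0;
    for (size_t i = 0; i < NODE_CAP; i++) {
        if (n->factors[i] != BLANK) c++;
    }
    return c;
}

static double vacancy(const Node *n)
{
    return (double)(NODE_CAP - used_slots(n)) / (double)NODE_CAP;
}

/* Walks adjacent leaf pairs from left to right. When either node of a pair
 * reaches `threshold` and their factors fit in one node, the right node is
 * appended to the left one and later nodes shift left; the merged node is
 * then compared with its new right neighbor. Finally, root[i] is set to the
 * rightmost non-blank factor of node i. Returns the new node count. */
size_t compact_leaves(Node *nodes, int *root, size_t count, double threshold)
{
    size_t i = 0;
    while (i + 1 < count) {
        Node *l = &nodes[i], *r = &nodes[i + 1];
        size_t nl = used_slots(l), nr = used_slots(r);
        bool reached = vacancy(l) >= threshold || vacancy(r) >= threshold;
        if (!reached || nl + nr > NODE_CAP) {
            i++;
            continue;
        }
        for (size_t k = 0; k < nr; k++) {
            l->factors[nl + k] = r->factors[k]; /* fill l's blank slots */
        }
        for (size_t k = i + 1; k + 1 < count; k++) {
            nodes[k] = nodes[k + 1]; /* close the gap left by r */
        }
        count--;
    }
    for (size_t k = 0; k < count; k++) {
        size_t n = used_slots(&nodes[k]);
        root[k] = n ? nodes[k].factors[n - 1] : BLANK;
    }
    return count;
}
```

With leaves (5, X, X, X), (25, 30, X, X), (40, 45, 50, 55), (60, 65, 70, X) and a threshold of 0.30, `compact_leaves` returns 3, the first leaf becomes (5, 25, 30, X), and its root factor becomes 30.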

Note that this embodiment uses 4 nodes, 2 of which are merged, as an example; an actual case will contain many more nodes. This embodiment describes the method, which can be applied to optimizing mass data storage; the specifics depend on the actual scenario.

In some possible scenarios, the vacancy rates of the merged nodes may be unbalanced, which can affect data writing efficiency, as described below.

The method of this embodiment addresses this problem. For ease of understanding, its specific flow is described below. FIG. 6 is a flowchart of another data optimization method according to an embodiment of this application; the method includes, but is not limited to, the following steps:

601. Calculating the imbalance rate of the third node;

In this embodiment, the imbalance rate of the third node is the ratio of the vacancy rate of the third node to the sum of the vacancy rates of the third node and a fourth node, where the third node is adjacent to the fourth node and the imbalance rate of the fourth node does not reach a second threshold. The imbalance rate may be calculated as lv3 = factor3 / (factor3 + factor4), where lv3 is the imbalance rate of the third node, factor3 is the vacancy rate of the third node, and factor4 is the vacancy rate of the fourth node.

602. The imbalance rate of the third node reaches a second threshold;

In this embodiment, the second threshold may be, for example, 50% or 40%; the specific value depends on the actual situation and is not limited herein.
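The imbalance-rate formula and the second-threshold check from steps 601 and 602 can be sketched in C as follows. Reading "reaches" as `>=`, returning 0 when both nodes are full (the formula's denominator is then zero, a case the text leaves undefined), and the function names are assumptions of this sketch.

```c
#include <stdbool.h>

/* Imbalance rate of the third node: lv3 = factor3 / (factor3 + factor4),
 * where factor3 and factor4 are the vacancy rates of the third and fourth
 * nodes. Returns 0 when both nodes are full (zero denominator), an edge
 * case the formula leaves undefined. */
double imbalance_rate(double factor3, double factor4)
{
    double sum = factor3 + factor4;
    return sum > 0.0 ? factor3 / sum : 0.0;
}

/* Step 602: true when lv3 reaches the second threshold (read as >=). */
bool needs_balancing(double factor3, double factor4, double second_threshold)
{
    return imbalance_rate(factor3, factor4) >= second_threshold;
}
```

For a third node (5, X, X, X) next to a fourth node (15, 20, 25, X), lv3 = 0.75 / (0.75 + 0.25) = 0.75, which reaches a second threshold of 50%.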

603. Performing load balancing on the third node;

In this embodiment, load balancing rotates factors from the fourth node into the third node so that the imbalance rates of the two nodes become close. For example, if the third node is (5, X, X, X) and the fourth node is (15, 20, 25, X), where X is a blank factor, then after load balancing the third node is (5, 15, X, X) and the fourth node is (20, 25, X, X). The two nodes are now evenly loaded: lv3 falls from 0.75 / (0.75 + 0.25) = 75% to 0.5 / (0.5 + 0.5) = 50%.
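The rotation in the example above can be sketched in C as repeatedly moving the smallest factor of the fourth node to the end of the third node. The left-packed `Node` layout, the `BLANK` sentinel, and the stopping rule (stop once the fourth node holds at most one factor more than the third) are assumptions; like the text, the sketch only rotates from the fourth node into the third.

```c
#include <stddef.h>

/* Hypothetical layout: four left-packed factor slots, BLANK = unwritten. */
#define NODE_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[NODE_CAP];
} Node;

/* Number of non-blank factors; assumes they are packed to the left. */
static size_t used_slots(const Node *n)
{
    size_t c = 0;
    while (c < NODE_CAP && n->factors[c] != BLANK) c++;
    return c;
}

/* Rotates the smallest factors of `fourth` (the right neighbor) into
 * `third` until `fourth` holds at most one factor more than `third`.
 * Returns how many factors were moved. */
size_t rotate_into_left(Node *third, Node *fourth)
{
    size_t nl = used_slots(third), nr = used_slots(fourth), moved = 0;
    while (nr > nl + 1) {
        third->factors[nl++] = fourth->factors[0];
        for (size_t k = 0; k + 1 < nr; k++) {
            fourth->factors[k] = fourth->factors[k + 1]; /* shift left */
        }
        fourth->factors[--nr] = BLANK;
        moved++;
    }
    return moved;
}
```

Applied to (5, X, X, X) and (15, 20, 25, X), the sketch moves one factor and produces (5, 15, X, X) and (20, 25, X, X).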

Note that this embodiment describes load balancing between 2 nodes, with the rotated factors coming from these 2 nodes; in mass data storage, load balancing may also be performed across more nodes. The specific operation depends on the actual situation and is not limited herein.

Further load balancing evens out the contents of each Leaf Page and equalizes the vacancy rates, which reduces page split operations when factors are subsequently added and further improves storage performance.

In one possible scenario, the root node factor may change as the node factors are redistributed. FIG. 7 is a flowchart of another data optimization method according to an embodiment of this application.

701. Calculating the imbalance rate of the third node;

702. The imbalance rate of the third node reaches a second threshold;

703. Performing load balancing on the third node;

In this embodiment, steps 701-703 are similar to steps 601-603 and are not described herein again.

704. Updating the third root node factor to a fourth node factor;

In this embodiment, the fourth node factor is the last non-blank factor of the third node, and the third root node factor is the index factor of the first node and the second node; that is, the positions of the first node and the second node can be determined from the position of the third root node factor.

Note that the last non-blank factor of the third node is the last data item in write order. For example, if data is written in ascending order of value from left to right, the fourth node factor is the rightmost non-blank factor of the third node; if data is written in ascending order of value from right to left, the fourth node factor is the leftmost non-blank factor of the third node. The specific case depends on the actual situation and is not limited herein.

Through load balancing of the nodes and updating of the root node, the method of the above embodiment balances the number of non-blank factors across nodes, thereby saving storage resources and improving storage efficiency. A specific data scenario is described below.

For ease of understanding, a specific flow of this embodiment is described below. FIG. 8 is an application scenario diagram of another data optimization method according to an embodiment of this application.

First, the scenario includes the following third-order nodes: (5, X, X, X) (10, 15, 25, 30) (40, 45, X, X) (55, 60, 65, 70) (75, 76, 77, 78) (80, 85, 86, X) (90, 95, X, X) (105, 110, 115, 120), where X is a blank factor.

Then, load balancing is performed: (5, X, X, X) becomes (5, 10, X, X), (10, 15, 25, 30) becomes (15, 25, 30, X), (40, 45, X, X) becomes (40, 45, 55, X), (55, 60, 65, 70) becomes (60, 65, 70, X), (90, 95, X, X) becomes (90, 95, 105, X), and (105, 110, 115, 120) becomes (110, 115, 120, X); (75, 76, 77, 78) and (80, 85, 86, X) are unchanged. The corresponding root node factors change to follow the new rightmost factor of each node.
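The FIG. 8 operation can be sketched in C as rotating factors within fixed pairs of neighboring nodes and then refreshing each node's root factor. The pairing (1st with 2nd, 3rd with 4th, and so on) is inferred from how the changes in the figure are grouped. The left-packed layout, the stopping rule for rotation, and taking each node's rightmost factor as its root factor are likewise assumptions of this sketch, not details confirmed by the text.

```c
#include <stddef.h>

/* Hypothetical layout: four left-packed factor slots, BLANK = unwritten. */
#define NODE_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[NODE_CAP];
} Node;

static size_t used_slots(const Node *n)
{
    size_t c = 0;
    while (c < NODE_CAP && n->factors[c] != BLANK) c++;
    return c;
}

/* Moves the smallest factors of `right` into `left` until `right` holds at
 * most one factor more than `left`. */
static void rotate_into_left(Node *left, Node *right)
{
    size_t nl = used_slots(left), nr = used_slots(right);
    while (nr > nl + 1) {
        left->factors[nl++] = right->factors[0];
        for (size_t k = 0; k + 1 < nr; k++) {
            right->factors[k] = right->factors[k + 1];
        }
        right->factors[--nr] = BLANK;
    }
}

/* Balances nodes pairwise as (0, 1), (2, 3), ... and then sets root[i] to
 * the rightmost non-blank factor of node i (BLANK for an empty node). */
void balance_leaf_pairs(Node *nodes, int *root, size_t count)
{
    for (size_t i = 0; i + 1 < count; i += 2) {
        rotate_into_left(&nodes[i], &nodes[i + 1]);
    }
    for (size_t i = 0; i < count; i++) {
        size_t n = used_slots(&nodes[i]);
        root[i] = n ? nodes[i].factors[n - 1] : BLANK;
    }
}
```

Applied to the eight FIG. 8 nodes, the sketch turns (40, 45, X, X) into (40, 45, 55, X) and yields root factors 10, 30, 55, 70, 78, 86, 105, and 120.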

In this embodiment, after load balancing the vacancy rates of the nodes are more even, so page split operations are reduced when factors are subsequently added, which further improves storage performance.

In the embodiments of this application, the data optimization apparatus may be divided into functional modules based on the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in hardware or as a software functional module. Note that the module division in the embodiments of this application is illustrative and is merely a logical function division; other divisions are possible in actual implementation.

For example, when the functional units are divided in an integrated manner, the apparatus may be structured as shown in FIG. 9, which is a schematic structural diagram of a data optimization apparatus in an embodiment of this application.

In fig. 9, the data optimization apparatus 900 provided in the embodiment of the present application includes a calculation unit 901 and a generation unit 902;

a calculating unit 901, configured to calculate the vacancy rate of a first node and the vacancy rate of a second node, where the first node is adjacent to the second node, and the vacancy rate is the ratio of the number of blank factors in a node to the node's total number of factors;

a generating unit 902, configured to generate a third node if the vacancy rate of at least one of the first node and the second node reaches a first threshold, where the third node contains all non-blank factors of the first node and the second node.

Optionally, the generating unit 902 is further configured to update the first root node factor to a second root node factor, where the second root node factor is the last non-blank factor of the third node, and the first root node factor is the index factor of the first node and the second node.

Optionally, the generating unit 902 is specifically configured to delete the first root node factor; the generating unit is specifically configured to add the rightmost factor of the third node as the second root node factor.

Optionally, the calculating unit 901 is further configured to calculate the imbalance rate of the third node, where the imbalance rate of the third node is the ratio of the vacancy rate of the third node to the sum of the vacancy rates of the third node and a fourth node, the third node is adjacent to the fourth node, and the imbalance rate of the fourth node does not reach a second threshold; the generating unit 902 is further configured to perform load balancing on the third node if the imbalance rate of the third node reaches the second threshold.

Optionally, the generating unit 902 is further configured to update the third root node factor to a fourth node factor, where the fourth node factor is the last non-blank factor of the third node, and the third root node factor is the index factor of the first node and the second node.
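One way to picture the module split of FIG. 9 in software is a struct whose function pointers play the roles of the calculating unit 901 and the generating unit 902. This is a hypothetical C sketch: the type and function names, the four-slot `Node` with a `BLANK` sentinel, and the "refuse the merge if the factors do not fit" rule are all assumptions, and only the basic merge flow of FIG. 2 is wired up.

```c
#include <stdbool.h>
#include <stddef.h>

#define NODE_CAP 4
#define BLANK (-1)

typedef struct {
    int factors[NODE_CAP];
} Node;

/* Hypothetical apparatus: each unit is a replaceable function pointer. */
typedef struct {
    double (*calculate)(const Node *node);                     /* unit 901 */
    bool (*generate)(const Node *a, const Node *b, Node *out); /* unit 902 */
    double first_threshold;
} DataOptimizationApparatus;

static double calc_vacancy(const Node *node)
{
    size_t blanks = 0;
    for (size_t i = 0; i < NODE_CAP; i++) {
        if (node->factors[i] == BLANK) blanks++;
    }
    return (double)blanks / (double)NODE_CAP;
}

/* Copies all non-blank factors of a and b into *out, if they fit. */
static bool gen_merged(const Node *a, const Node *b, Node *out)
{
    Node tmp;
    size_t n = 0;
    const Node *src[2] = {a, b};
    for (size_t s = 0; s < 2; s++) {
        for (size_t i = 0; i < NODE_CAP; i++) {
            if (src[s]->factors[i] == BLANK) continue;
            if (n == NODE_CAP) return false;
            tmp.factors[n++] = src[s]->factors[i];
        }
    }
    while (n < NODE_CAP) tmp.factors[n++] = BLANK;
    *out = tmp;
    return true;
}

DataOptimizationApparatus make_apparatus(double first_threshold)
{
    DataOptimizationApparatus dev = {calc_vacancy, gen_merged, first_threshold};
    return dev;
}

/* FIG. 2 flow through the apparatus: the calculating unit computes both
 * vacancy rates, and the generating unit builds the third node only if at
 * least one of them reaches the first threshold. */
bool apparatus_optimize(const DataOptimizationApparatus *dev,
                        const Node *first, const Node *second, Node *third)
{
    double v1 = dev->calculate(first);
    double v2 = dev->calculate(second);
    if (v1 < dev->first_threshold && v2 < dev->first_threshold) {
        return false;
    }
    return dev->generate(first, second, third);
}
```

Keeping the units behind function pointers mirrors the text's point that either unit may be realized in different ways, for example in software or by other hardware.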

All relevant content of the steps in the above method embodiment applies to the functional descriptions of the corresponding functional units, and details are not described herein again.

In this embodiment, the data optimization apparatus 900 may be presented with its functional modules divided in an integrated manner. A "module" here may refer to an application-specific integrated circuit (ASIC), an electronic circuit, a processor and memory that execute one or more software or firmware programs, an integrated logic circuit, and/or another device that provides the described functions. In a simple embodiment, those skilled in the art will appreciate that the data optimization apparatus 900 may take the form shown in FIG. 10, which is a schematic structural diagram of another data optimization apparatus in an embodiment of this application.

The data optimization apparatus 1000 includes an input/output (I/O) interface 1001, a processor 1002, and a memory 1003. Specifically, the processor 1002 in FIG. 10 may invoke computer-executable instructions stored in the memory 1003 so that the data optimization apparatus 1000 performs the data optimization method in the above method embodiment.

Specifically, the functions/implementation processes of the calculating unit 901 and the generating unit 902 in FIG. 9 may be implemented by the processor 1002 in FIG. 10 invoking computer-executable instructions stored in the memory 1003. Alternatively, the functions/implementation processes of the calculating unit 901 and the generating unit 902 in FIG. 9 may be implemented by the input/output (I/O) interface 1001 in FIG. 10.

Because the data optimization apparatus 1000 provided in this embodiment of the application can perform the above data optimization method, for the technical effects it achieves, refer to the above method embodiment; details are not described herein again.

Optionally, an embodiment of the present application provides a chip system, where the chip system includes a processor, and is used to support a data optimization apparatus to implement the data optimization method. In one possible design, the system-on-chip further includes a memory. The memory is used for storing program instructions and data necessary for the data optimization device. The chip system may be formed by a chip, and may also include a chip and other discrete devices, which is not specifically limited in this embodiment of the present application.

All or some of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented wholly or partially in the form of a computer program product.

The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are wholly or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

The terms "first," "second," and the like in the description, the claims, and the above-described drawings of the present application are used to distinguish between similar elements and are not necessarily used to describe a particular sequence or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. Moreover, the terms "comprise," "include," and any variations thereof are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to the steps or modules expressly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus. The naming or numbering of steps in the present application does not mean that the steps of the method flow must be executed in the chronological or logical order indicated by that naming or numbering; the named or numbered process steps may be executed in a different order according to the technical objective to be achieved, as long as the same or similar technical effect is achieved. The division of modules presented in this application is a logical division, and other divisions are possible in practical applications; for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and indirect couplings or communication connections between modules may be electrical or take another similar form; this is not limited in this application.

The modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, and may be distributed across a plurality of circuit modules; some or all of the modules may be selected according to actual needs to achieve the objectives of the present application.

Those skilled in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing associated hardware. The program may be stored in a computer-readable storage medium, and the storage medium may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.

The data optimization method, apparatus, and system provided in the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the embodiments is intended only to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims

1. A method of data optimization, comprising:

calculating a vacancy rate of a first node and a vacancy rate of a second node, wherein the first node is adjacent to the second node, and the vacancy rate is the ratio of the number of vacant factors in a node to the total number of node factors; a vacant factor is a factor slot into which no data has been written, and the total number of node factors is the total number of data factors that can be written into the node;

if the vacancy rate of at least one of the first node and the second node reaches a first threshold, generating a third node, wherein the third node comprises all non-vacant factors of the first node and the second node; wherein, if the vacancy rate of at least one of the first node and the second node reaches the first threshold, after the third node is generated, the method further comprises: deleting a first root node factor, the first root node factor being the index factor of the first node and the second node; and adding the last non-vacant factor of the third node as a second root node factor;

wherein, if the vacancy rate of at least one of the first node and the second node reaches the first threshold, after the third node is generated, the method further comprises:

calculating an imbalance rate of the third node, wherein the imbalance rate of the third node is the ratio of the vacancy rate of the third node to the sum of the vacancy rate of the third node and a vacancy rate of a fourth node, the third node is adjacent to the fourth node, and an imbalance rate of the fourth node does not reach a second threshold;

if the imbalance rate of the third node reaches the second threshold, performing load balancing on the third node, and updating a third root node factor to a fourth node factor, wherein the fourth node factor is the last non-vacant factor of the third node, and the third root node factor is the index factor of the first node and the second node.

2. A data optimization apparatus, comprising:

a calculating unit, configured to calculate a vacancy rate of a first node and a vacancy rate of a second node, wherein the first node is adjacent to the second node, and the vacancy rate is the ratio of the number of vacant factors in a node to the total number of node factors; a vacant factor is a factor slot into which no data has been written, and the total number of node factors is the total number of data factors that can be written into the node;

a generating unit, configured to generate a third node if the vacancy rate of at least one of the first node and the second node reaches a first threshold, wherein the third node comprises all non-vacant factors of the first node and the second node; the generating unit is further configured to delete a first root node factor, the first root node factor being the index factor of the first node and the second node, and to add the last non-vacant factor of the third node as a second root node factor;

the calculating unit is further configured to calculate an imbalance rate of the third node, wherein the imbalance rate of the third node is the ratio of the vacancy rate of the third node to the sum of the vacancy rate of the third node and a vacancy rate of a fourth node, the third node is adjacent to the fourth node, and an imbalance rate of the fourth node does not reach a second threshold;

the generating unit is further configured to perform load balancing on the third node if the imbalance rate of the third node reaches the second threshold, and to update a third root node factor to a fourth node factor, wherein the fourth node factor is the last non-vacant factor of the third node, and the third root node factor is the index factor of the first node and the second node.
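For illustration only, the vacancy-rate merge and the imbalance-rate load balancing recited in the claims can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the claimed implementation. The names `Node`, `Parent`, `try_merge`, `imbalance_rate`, and `try_load_balance` are hypothetical. The sketch also makes several assumptions of its own: a parent holds one index factor per child, equal to that child's last non-vacant factor; a merge proceeds only when all non-vacant factors of both nodes fit into one node; the fourth node is the right-hand neighbor of the third node; and load balancing moves factors from the fourth node into the third node until their factor counts differ by at most one. A rate "reaches" a threshold when it is greater than or equal to it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical leaf node with a fixed number of factor slots."""
    capacity: int
    factors: List[int] = field(default_factory=list)  # non-vacant factors, kept sorted

    def vacancy_rate(self) -> float:
        # Vacant (unwritten) slots divided by the total number of writable slots.
        return (self.capacity - len(self.factors)) / self.capacity


@dataclass
class Parent:
    """Hypothetical parent: index_factors[k] is the last non-vacant factor of children[k]."""
    children: List[Node]
    index_factors: List[int]


def try_merge(parent: Parent, i: int, first_threshold: float) -> bool:
    """Merge children[i] and children[i + 1] into a third node if either is sparse enough."""
    first, second = parent.children[i], parent.children[i + 1]
    if max(first.vacancy_rate(), second.vacancy_rate()) < first_threshold:
        return False  # neither node reaches the first threshold
    combined = first.factors + second.factors  # adjacent nodes, so the result stays sorted
    if not combined or len(combined) > first.capacity:
        return False  # assumption: merge only if the result is non-empty and fits in one node
    third = Node(first.capacity, combined)
    parent.children[i:i + 2] = [third]
    # Delete the index factors of the first and second nodes; add the third node's last factor.
    parent.index_factors[i:i + 2] = [third.factors[-1]]
    return True


def imbalance_rate(a: Node, b: Node) -> float:
    """Vacancy rate of a divided by the sum of the vacancy rates of a and b."""
    total = a.vacancy_rate() + b.vacancy_rate()
    return a.vacancy_rate() / total if total else 0.0


def try_load_balance(parent: Parent, j: int, second_threshold: float) -> bool:
    """Balance children[j] (third node) against its right neighbor children[j + 1] (fourth node)."""
    third, fourth = parent.children[j], parent.children[j + 1]
    if imbalance_rate(third, fourth) < second_threshold:
        return False
    if imbalance_rate(fourth, third) >= second_threshold:
        return False  # the claims require that the fourth node does not reach the threshold
    # Assumed balancing rule: move the smallest factors of the fourth node into the
    # third node until the two factor counts differ by at most one.
    move = max(0, (len(fourth.factors) - len(third.factors)) // 2)
    third.factors.extend(fourth.factors[:move])
    del fourth.factors[:move]
    if not third.factors:
        return False  # nothing to index
    # Update the third root node factor to the third node's last non-vacant factor.
    parent.index_factors[j] = third.factors[-1]
    return True


if __name__ == "__main__":
    parent = Parent([Node(4, [1]), Node(4, [5]), Node(4, [7, 8, 9, 10])], [1, 5, 10])
    try_merge(parent, 0, first_threshold=0.5)          # both nodes are 75% vacant
    try_load_balance(parent, 0, second_threshold=0.7)  # merged node 50% vacant, neighbor full
    print([c.factors for c in parent.children], parent.index_factors)
    # prints: [[1, 5, 7], [8, 9, 10]] [7, 10]
```

The functions return `False` instead of raising when a precondition fails, so a caller could attempt a merge first and fall back to load balancing; that ordering is a design choice of this sketch and is not specified by the claims.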