CN111400320B

CN111400320B - Method and device for generating information

Info

Publication number: CN111400320B
Application number: CN202010190583.4A
Authority: CN
Inventors: 欧阳可守
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-03-18
Filing date: 2020-03-18
Publication date: 2023-06-20
Anticipated expiration: 2040-03-18
Also published as: CN111400320A

Abstract

The embodiment of the application discloses a method and a device for generating information. One embodiment of the method comprises the following steps: deriving a set of key metadata from a key-value database, wherein each key metadata piece comprises a key and a value-occupying capacity; writing the key metadata set into a statistics tree; based on the statistical tree, storage distribution information of the key value database is generated. This embodiment provides a way of analyzing the storage distribution of the new key-value database.

Description

Method and device for generating information

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating information.

Background

Key value databases are now widely used by the industry. Meanwhile, the key value database can bear a large amount of data without obviously degrading the performance due to the flexibility and the distributed characteristics. Because the flexibility technicians usually do not pay much attention to the use condition of storage when using, the users often do not pay attention to the data condition of internal storage until the cluster storage space is about to reach the limit, and try to perform certain optimization to improve the cluster utilization rate.

At present, the key value database storage analysis method mainly comprises the following three steps: firstly, reversing all key capacities in the key value database, and analyzing out the key with large occupied capacity. And secondly, summarizing the capacity conditions of the keys according to the known key templates or prefixes, and further analyzing the capacity use conditions. Thirdly, sampling a small part of keys for manual analysis.

Disclosure of Invention

The embodiment of the application provides a method and a device for generating information.

In a first aspect, an embodiment of the present application proposes a method for generating information, including: deriving a set of key metadata from a key-value database, wherein each key metadata piece comprises a key and a value-occupying capacity; writing the key metadata set into a statistics tree; based on the statistical tree, storage distribution information of the key value database is generated.

In some embodiments, writing the set of key metadata to the statistics tree includes: dividing keys in the key metadata set to generate an element sequence set; for an element sequence in the element sequence set, sequentially inserting elements of the element sequence into a statistical tree, wherein the rank of the element in the element sequence corresponds to the hierarchy of the element in the statistical tree; based on the value-occupied capacity in the key metadata set, the number of keys and the value-occupied capacity in the nodes in the statistics tree are updated.

In some embodiments, partitioning keys in a key metadata set generates a sequence of elements comprising: for a key in a key metadata set, determining a target separator in the key; the key is split into a sequence of elements based on the target separator.

In some embodiments, determining the target separator in the key includes: counting the frequency of the occurrence of the preset separator in the preset separator set in the key; based on the counted frequency, a target separator is determined from a set of preset separators.

In some embodiments, inserting elements of the sequence of elements into the statistics tree sequentially includes: for a current element of the sequence of elements, determining a position of the current element in the statistics tree, and inserting the current element at the position.

In some embodiments, determining the location of the current element in the statistics tree, and inserting the current element at the location, comprises: the following insertion steps are performed: matching the current element in a child node set of a node corresponding to the last element of the current element in the statistical tree; if the matching is successful, determining that the current element is inserted into the statistical tree; determining whether the current element is the last element in the sequence of elements; if the element is the last element, it is determined that the sequence of elements has been inserted into the statistics tree.

In some embodiments, determining the location of the current element in the statistics tree and inserting the current element at the location further comprises: in response to determining that the current element is not the last element in the sequence of elements, taking the next element in the sequence of elements as the current element, and continuing to perform the inserting step.

In some embodiments, determining the location of the current element in the statistics tree and inserting the current element at the location further comprises: if the matching of the current element in the child node set of the node corresponding to the last element of the current element in the statistical tree fails, creating the child node corresponding to the current element for the node corresponding to the last element of the current element.

In some embodiments, determining the location of the current element in the statistics tree and inserting the current element at the location further comprises: determining whether the number of child nodes of the node corresponding to the last element of the current element meets a merging condition; and if the merging condition is met, merging at least part of the child nodes of the node corresponding to the last element of the current element into a wild card child node.

In some embodiments, the merge conditions include at least one of: the number of the child nodes exceeds a preset child node number threshold, and the number of the global nodes of the statistical tree exceeds a preset global node data threshold.

In some embodiments, determining the location of the current element in the statistics tree and inserting the current element at the location further comprises: the leaf nodes corresponding to numbers created under the same node in the statistical tree are merged into a wild-type Fu Shezi node.

In some embodiments, generating the stored distribution information of the key-value database based on the statistical tree includes: traversing the statistical tree to generate storage distribution information of the key value database.

In some embodiments, traversing the statistical tree to generate the stored distribution information of the key-value database includes: performing preamble traversal on the statistical tree, and outputting a matched character string set and the occupation capacity of each matched character string; and sorting the matched character string sets according to the output occupied capacity, and generating storage distribution information of the key value database based on the sorting result.

In some embodiments, generating the storage distribution information of the key-value database based on the ordering result includes: selecting a target character string from the matched character string set based on the sorting result; and performing wild card expansion on the target character string to obtain an expanded character string set and the occupied capacity of each expanded character string.

In a second aspect, an embodiment of the present application proposes an apparatus for generating information, including: a deriving unit configured to derive a set of key metadata from a key-value database, wherein each piece of key metadata comprises a key and a value-occupying capacity; a writing unit configured to write the key metadata set into the statistical tree; and a generation unit configured to generate storage distribution information of the key value database based on the statistical tree.

In some embodiments, a write unit includes: a segmentation subunit configured to segment keys in the key metadata set to generate an element sequence set; an inserting subunit configured to insert, for an element sequence in the element sequence set, elements of the element sequence into the statistics tree in sequence, wherein a rank of the element in the element sequence corresponds to a hierarchy of the element in the statistics tree; an updating subunit configured to update the number of keys and the value-occupied capacity in the nodes in the statistics tree based on the value-occupied capacity in the key metadata set.

In some embodiments, the partitioning subunit comprises: a determination module configured to determine, for a key in the set of key metadata, a target separator in the key; a splitting module configured to split the key into a sequence of elements based on the target delimiter.

In some embodiments, the determining module comprises: a statistics sub-module configured to count the frequency of occurrence of preset segmenters in the set of preset separators in the key; a first determination submodule configured to determine a target separator from a preset separator set based on the counted frequency.

In some embodiments, the insertion subunit comprises: an insertion module configured to determine, for a current element of the sequence of elements, a position of the current element in the statistics tree, and insert the current element at the position.

In some embodiments, the insertion module comprises: an insertion sub-module configured to perform the following insertion steps: for the current element of the element sequence, matching the current element in a child node set of a node corresponding to the last element of the current element in the statistical tree; if the matching is successful, determining that the current element is inserted into the statistical tree; determining whether the current element is the last element in the sequence of elements; if the element is the last element, it is determined that the sequence of elements has been inserted into the statistics tree.

In some embodiments, the insertion module further comprises: and a loop sub-module configured to, in response to determining that the current element is not the last element in the sequence of elements, take a next element of the sequence of elements as the current element, and continue performing the inserting step.

In some embodiments, the insertion module further comprises: the creating sub-module is configured to create a sub-node corresponding to the current element for the node corresponding to the previous element of the current element if the matching of the current element in the sub-node set of the node corresponding to the previous element of the current element in the statistics tree fails.

In some embodiments, the insertion module further comprises: a second determining sub-module configured to determine whether the number of sub-nodes of the node corresponding to the last element of the current element satisfies a merging condition; and the first merging sub-module is configured to merge at least part of sub-nodes of the node corresponding to the last element of the current element into a wild card sub-node if the merging condition is met.

In some embodiments, the insertion module further comprises: a second merging sub-module configured to merge the numerically corresponding leaf nodes created under the same node in the statistics tree into a wild-type Fu Shezi node.

In some embodiments, the generating unit comprises: and the traversing subunit is configured to traverse the statistical tree and generate storage distribution information of the key value database.

In some embodiments, traversing the subunit comprises: the traversing module is configured to conduct preamble traversal on the statistical tree and output a matching character string set and the occupied capacity of each matching character string; the generation module is configured to sort the matched character string sets according to the output occupied capacity and generate storage distribution information of the key value database based on the sorting result.

In some embodiments, the generating module comprises: a selection sub-module configured to select a target string from the set of matching strings based on the ranking result; and the expansion submodule is configured to perform wild card expansion on the target character string to obtain an expansion character string set and the occupied capacity of each expansion character string.

In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer readable medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.

The method and the device for generating information provided by the embodiment of the application firstly derive a key metadata set from a key value database; then writing the key metadata set into a statistical tree; and finally, generating storage distribution information of the key value database based on the statistical tree. A new way of analyzing the storage distribution of a key-value database is provided.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings, in which:

FIG. 1 is an exemplary system architecture in which the present application may be applied;

FIG. 2 is a flow chart of one embodiment of a method for generating information according to the present application;

FIG. 3 is a flow chart of yet another embodiment of a method for generating information according to the present application;

FIG. 4 is a flow chart of another embodiment of a method for generating information according to the present application;

FIG. 5 is a flow chart of one application scenario of a method for generating information according to the present application;

FIG. 6A shows a schematic diagram of a tree entry process;

FIG. 6B shows a schematic diagram of the merging process;

FIG. 6C shows yet another schematic diagram of the tree entry process;

FIG. 6D shows yet another schematic diagram of the merging process;

FIG. 6E shows another schematic diagram of the merging process;

FIG. 7 is a schematic structural diagram of one embodiment of an apparatus for generating information according to the present application;

fig. 8 is a schematic diagram of a computer system suitable for use in implementing embodiments of the present application.

Detailed Description

The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the methods for generating information or the apparatus for generating information of the present application may be applied.

As shown in fig. 1, a key value database 101, a network 102, and a server 103 may be included in a system architecture 100. Network 102 is the medium used to provide a communication link between key-value store 101 and server 103. Network 102 may include various connection types such as wired, wireless communication links, or fiber optic cables, among others.

The key-value database 101 may be used to store data in the form of key-value pairs.

The server 103 may provide various services. For example, the server 103 may perform processing such as analysis on data such as a key metadata set derived from the key-value database 101, and generate a processing result (for example, storage distribution information of the key-value database).

The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 103 is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module. The present invention is not particularly limited herein.

It should be noted that, the method for generating information provided in the embodiments of the present application is generally performed by the server 103, and accordingly, the apparatus for generating information is generally disposed in the server 103.

With continued reference to fig. 2, a flow 200 of one embodiment of a method for generating information according to the present application is shown. The method for generating information comprises the steps of:

step 201, deriving a key metadata set from a key value database.

In this embodiment, an execution subject of the method for generating information (e.g., the server 103 shown in fig. 1) may derive a key metadata set from a key database (e.g., the key database 101 shown in fig. 1). Wherein each piece of key metadata in the set of key metadata may include a key (key), and an occupied capacity of a value (value) stored by the key.

In practice, key-value databases, also called KV databases, are usually only capable of classifying resources by keys. Whereas keys are typically designed to be multi-segmented using specific separators, including fixed character segments and data segments. The fixed character segment may be used to uniquely describe a type of data and the data segment may be used to describe a unique record in such data. For example, storing user data in a key-value database, a key may be designed to: key: prefix_user_ _$ { uid }. Where prefix_user_is a fixed character segment, indicating that this fixed character segment is used to maintain the user data set. The $ { uid } is the unique identity of the user. Thus, each user corresponds to only one key. It should be noted that, the present application automatically discovers the fixed character segment of prefix_user_in the key value database, and realizes automatic analysis of the storage distribution of key metadata without human intervention. The application is realized on the basis of the unique data set with the fixed character segment description.

Step 202, writing the key metadata set into a statistics tree.

In this embodiment, the execution body may write the key metadata set into the statistics tree. Specifically, for each piece of key metadata in the key metadata set, the execution body may first write the key in the key metadata into the path of the statistics tree, and then update the occupied capacity on the path based on the value occupied capacity in each piece of key metadata.

Step 203, based on the statistical tree, generating storage distribution information of the key value database.

In this embodiment, the execution body may generate the storage distribution information of the key value database based on the statistical tree. For example, the executing entity may output the character string and the corresponding occupied capacity on the path with the top occupied capacity (e.g., top 10) in the statistics tree as the storage distribution information of the key value database.

The method for generating information provided by the embodiment of the application comprises the steps of firstly, exporting a key metadata set from a key value database; then writing the key metadata set into a statistical tree; and finally, generating storage distribution information of the key value database based on the statistical tree. A new way of analyzing the storage distribution of a key-value database is provided.

With further reference to fig. 3, a flow 300 of yet another embodiment of a method for generating information according to the present application is shown. The method for generating information comprises the steps of:

step 301, deriving a key metadata set from a key value database.

In this embodiment, the specific operation of step 301 is described in detail in step 201 in the embodiment shown in fig. 2, and will not be described herein.

Step 302, partitioning keys in a key metadata set to generate a set of element sequences.

In this embodiment, for key metadata in the key metadata set, the execution body may divide the keys in the key metadata to generate the element sequence. Specifically, the execution body may divide the key into a plurality of elements, and arrange each element in the order in which the elements appear in the key, to generate the element sequence. For example, the execution body may divide the key by a fixed character length (e.g., 1 character, 2 characters) to obtain the element sequence.

In some alternative implementations of the present embodiment, the execution body may first determine the target separator in the key; the key is then split into a sequence of elements based on the target separator. Wherein the target segmenter may be one or more of the separators that appear in the key. For example, the execution subject may take a separator frequently appearing in a key as a target separator. For another example, the executing body may first count the frequency of occurrence of the preset separator in the preset separator set in the key; a target separator is then determined from the set of preset separators based on the counted frequency. Typically, the preset separator with the highest occurrence frequency is the target separator of the key. Wherein the preset separator set may include, but is not limited to "_", ": a plurality of separators such as "," { "," } ", etc.

If the target separator in the key fails to be specified, the entire key is taken as one element without splitting.

Step 303, for an element sequence in the element sequence set, sequentially inserting elements of the element sequence into the statistics tree.

In this embodiment, for an element sequence in the element sequence set, the execution body may sequentially insert elements in the element sequence into the statistics tree. Wherein, for an element sequence, an element in the element sequence is inserted into a node of a layer in the statistical tree, and the order of the element in the element sequence corresponds to the level of the element in the statistical tree.

In some optional implementations of this embodiment, for a current element of the sequence of elements, the execution body may determine a location of the current element in the statistics tree, and insert the current element at the location.

Step 304, updating the number of keys and the value-occupied capacity in the nodes in the statistics tree based on the value-occupied capacity in the key metadata set.

In this embodiment, for key metadata in the key metadata set, the execution body may update the number of keys and the value occupation capacity in the nodes in the statistics tree for the value occupation capacity in the key metadata. The number of elements inserted by one node in the statistical tree is equal to the number of keys in the node, and the sum of the value occupied capacity in all key metadata corresponding to the inserted elements is equal to the value occupied capacity in the node.

Step 305, traversing the statistical tree to generate storage distribution information of the key value database.

In this embodiment, the executing body may traverse the statistics tree to generate the storage distribution information of the key value database. For example, the execution body may perform a preamble traversal on the statistics tree, and output the set of matching strings and the occupation capacity of each matching string as the storage distribution information of the key value database. Wherein, one matching string in the matching string set may be composed of elements inserted by all nodes on one path from the root node to the leaf node in the statistical tree. Further, the execution body may further sort the matched string sets according to the output occupied capacity, and generate storage distribution information of the key value database based on the sorting result. For example, the sorted set of matching strings and the occupied capacity of each matching string are used as the storage distribution information of the key value database. For another example, a matching string with a occupation capacity of 10 in front of the rank is selected from the sorted matching string set, and the selected matching string and the occupation capacity of each matching string are used as the storage distribution information of the key value database.

As can be seen from fig. 3, the flow 300 of the method for generating information in this embodiment highlights the writing step and the generating step compared to the corresponding embodiment of fig. 2. Thus, the scheme described in this embodiment firstly segments the keys in the key metadata set to generate an element sequence set; then, for the element sequence in the element sequence set, sequentially inserting the elements of the element sequence into a statistical tree; then updating the key number and the value occupation capacity in the nodes in the statistical tree based on the value occupation capacity in the key metadata set; and finally traversing the statistical tree to generate storage distribution information of the key value database. And inserting keys in the key metadata set into a statistical tree, and traversing the statistical tree to obtain the storage distribution information of the key value database, thereby improving the analysis accuracy of the storage distribution of the key value database. In addition, the analysis efficiency of the storage distribution of the key value database is improved.

With further reference to fig. 4, a flow 400 of another embodiment of a method for generating information according to the present application is shown. The method for generating information comprises the steps of:

step 401, deriving a key metadata set from a key value database.

Step 402, partitioning keys in the key metadata set to generate a set of element sequences.

In this embodiment, the specific operations of steps 401 to 402 are described in detail in steps 301 to 302 in the embodiment shown in fig. 3, and are not described herein.

Step 403, for a current element of the element sequence in the element sequence set, determining whether the current element is successfully matched in a child node set of a node corresponding to a previous element of the current element in the statistics tree.

In this embodiment, for an element sequence in the element sequence set, the execution body may match a current element in the element sequence in a child node set of a node corresponding to a previous element of the current element in the statistics tree, and determine whether the matching of the current element in the child node set of the node corresponding to the previous element is successful. In general, the execution body may determine whether a node corresponding to an element previous to the current element has a child node corresponding to the current element. If so, the matching is successful, and step 404 is performed; if not, the match fails, step 405 is performed.

Step 404, determining that the current element has been inserted into the statistics tree.

In this embodiment, if the matching of the current element in the child node set of the node corresponding to the current element is successful, it is determined that the current element has been inserted into the statistics tree. At this time, the execution body may directly execute step 406.

Step 405, creating a child node corresponding to the current element for a node corresponding to the previous element of the current element.

In this embodiment, if the matching of the current element in the child node set of the node corresponding to the current element fails, it is determined that the current element has not been inserted into the statistics tree. At this time, the execution body may first create a child node corresponding to the current element for the node corresponding to the previous element of the current element, and then execute step 406.

It should be noted that, when none of the element sequences in the element sequence set is inserted into the statistical tree, the statistical tree may be empty. If a first element in the first sequence of elements is to be inserted into the statistics tree, a root node corresponding to the first element may be created in the statistics tree. Steps 403-405 are then performed for any element other than the first element.

Step 406, determining whether the number of child nodes of the node corresponding to the last element of the current element satisfies the merging condition.

In this embodiment, the execution body may determine whether the number of child nodes of the node corresponding to the previous element of the current element satisfies the merging condition. If the merging condition is satisfied, step 407 is executed first, and step 408 is executed again; if the merge condition is not satisfied, step 408 is performed directly.

In this embodiment, the merging condition may include one or more conditions set in advance. In general, the merging condition may be a condition for describing an excessive number of child nodes of the same node and/or an excessive number of nodes of the statistical tree. The merge condition is primarily a threshold determination, for example, the merge condition may include, but is not limited to, at least one of: the number of children exceeds a preset number of children threshold, the number of global nodes of the statistical tree exceeds a preset global node data threshold, etc. The number of the sub-nodes exceeding the threshold value of the number of the preset sub-nodes can be the situation of distinguishing the depth of the statistical tree, and the threshold values of the nodes with different depths can be different. The number of global nodes of the statistical tree exceeding the preset global node data threshold may be a global unified threshold without distinguishing the depth of the statistical tree.

Step 407, merging at least part of the sub-nodes of the node corresponding to the last element of the current element into a wild card sub-node.

In this embodiment, if the merging condition is satisfied, the execution body may merge at least some child nodes of the node corresponding to the previous element of the current element into the wild card child node. Typically, the execution body may screen at least part of the sub-nodes for merging according to a predefined variable screening function. For example, the execution body may merge all the children nodes of the node corresponding to the last element of the current element into a wildcard child node, and then merge element statistics of all the positions into the wildcard child node. For another example, the execution entity may merge only the child nodes corresponding to the digits into a wild card child node, and then merge all the digital element statistics at that location into the wild card child node.

Step 408 determines if the current element is the last element in the sequence of elements.

In this embodiment, the execution body may determine whether the current element is the last element in the element sequence. If yes, go to step 409; if not the last element, step 410 is performed.

Step 409, determining that the element sequence has been inserted into a statistics tree.

In this embodiment, if the current element is the last element in the sequence of elements, then it is determined that the sequence of elements has been inserted into the statistics tree. Step 411 is performed when all element sequences in the set of element sequences have been inserted into the statistics tree.

In step 410, the next element of the sequence of elements is taken as the current element.

In this embodiment, if the current element is not the last element in the element sequence, it is determined that there is an element in the element sequence that has not been inserted into the statistics tree. At this time, the execution body may take the next element of the element sequence as the current element, and continue to execute step 403. This is repeated until all the element sequences in the set of element sequences have been inserted into the statistics tree.

In some optional implementations of this embodiment, the executing entity may further merge the leaf nodes corresponding to the numbers created under the same node in the statistics tree into a wild-type Fu Shezi node.

Step 411, updating the number of keys and the value-occupied capacity in the nodes in the statistics tree based on the value-occupied capacity in the key metadata set.

In this embodiment, the specific operation of step 411 is described in detail in step 404 in the embodiment shown in fig. 4, and will not be described herein.

Step 412, performing a preamble traversal on the statistics tree, and outputting the set of matching strings and the occupation capacity of each matching string.

In this embodiment, the execution body may perform preamble traversal on the statistics tree, and output the set of matching strings and the occupation capacity of each matching string. Wherein, one matching string in the matching string set may be composed of elements inserted by all nodes on one path from the root node to the leaf node in the statistical tree. The occupied capacity of the matching string may be equal to the sum of the value occupied capacities in all the key metadata corresponding thereto, that is, the value occupied capacity in the leaf node on the path corresponding to the matching string.

Step 413, sorting the matched string set according to the outputted occupation capacity.

In this embodiment, the execution body may sort the matching string set according to the output occupied capacity. For example, the execution body may sort the matching string sets in order of the occupied capacity from large to small.

Step 414, selecting a target character string from the matched character string set based on the sorting result.

In this embodiment, the execution body may select the target string from the matching string set based on the sorting result. For example, the execution body may select the matching string ordered first from the matching string set as the target string.

And step 415, performing wild card expansion on the target character string to obtain an expanded character string set and the occupied capacity of each expanded character string.

In this embodiment, the execution body may perform wild card expansion on the target string to obtain an expanded string set and an occupied capacity of each expanded string.

Generally, if the target string includes only one wildcard, the execution body may expand the wildcard to obtain all the expanded strings covered by the wildcard and the occupied capacity of each expanded string. Then, the execution body may sort all the expansion strings based on the occupied capacity, and select the expansion strings sorted in the first 10 bits and the occupied capacity of each expansion string as the storage distribution information of the key value database. If the target character string includes at least two wildcards, the execution body may first expand the first wildcard to obtain all expanded character strings covered by the first wildcard and an occupied capacity of each expanded character string; then sorting all the expansion character strings based on the occupied capacity, and selecting the expansion character strings sorted in the first 10 bits; and finally, expanding the second wildcard character of the selected expanded character string to obtain all the expanded character strings covered by the second wildcard character and the occupied capacity of each expanded character string, and circularly selecting and expanding until the occupied capacity of the expanded character string is lower than 50%.

For ease of understanding, one application scenario of the method for generating information of the embodiments of the present application is provided below. The flow chart of the application scenario is shown in fig. 5, and includes the following steps:

step 501, key value database key metadata set extraction.

Step 502, a piece of key metadata is extracted.

Step 503, separator analysis.

Element splitting, step 504.

Step 505, take the first element to start tree construction.

Step 506, whether there are no elements.

In step 507, a child node is found.

At step 508, child nodes are created.

In step 509, there are too many child node elements.

Step 510, merging the plurality of child nodes under the current node to generate an aggregate node.

Step 511, take the next element and continue to build the tree based on child nodes.

Step 512 updates the number of node keys and the occupancy capacity.

In step 513, there is no key metadata.

Step 514, the statistical tree is output in order.

Step 515, the first wildcard is expanded, and the expanded string is counted.

In step 516, the expanded string occupying more than 50% of the capacity is searched.

Step 517, outputting the wild card expansion result.

Furthermore, the key metadata set derived from the key-value database includes the following 8 keys: the description will be continued by taking a_b_c_1, a_b_c_2, a_b_c_3, a_1_b_2, a_2_b_3, a_3_b_4, a_4_b_5 and a_1_b_6 as examples. First, a_b_c_1, a_b_c_2, and a_b_c_3 are put into the tree. Referring specifically to fig. 6A, a schematic diagram of this tree entry process is shown. After 3 leaf nodes are established, a large number of digital nodes exist under the node c, and the nodes are automatically combined into wild card nodes. Referring specifically to fig. 6B, a schematic diagram of the merging process is shown. After that, a_1_b_2, a_2_b_3, a_3_b_4 are continued into the tree. Referring specifically to fig. 6C, a schematic diagram of this tree entry process is shown. Because more secondary nodes exist under the node a, the digital nodes are automatically combined into common symbol nodes. Referring specifically to fig. 6D, a schematic diagram of the merging process is shown. And in the merging process, a large number of nodes are found to exist under the merged b nodes, and the b nodes are recursively merged into common symbol nodes. Referring specifically to fig. 6E, a schematic diagram of the merging process is shown. Then, a_4_b_5 and a_1_b_6 continue to enter the tree, because the matching with a_b_does not change the tree structure, only the statistical information is updated, and finally the tree building is finished, and the first stage result is output: a_ _ b_ 62.5% and a_ b_ c_ 37.5%. Then, by analyzing the results of the first stage, it is found that the core capacity is mainly concentrated in a_b_this mode, and the second stage spreads two wildcards to obtain: a_1_b _ 25%, a_2_b _ 12.5%, a_3_b _ 12.5%, a_4_b _ 12.5%, a_ _ b_ 2.5%, a_ _ b_ 3_ 12.5%, a_ _ b_ 4.5%, a_ _ b_ 5.12.5% and a_ _ b_ 6.5%. Finally, the main problem of further localization is concentrated in a_1_b _ this pattern, for which further analysis is continued, giving a corresponding solution.

As can be seen from fig. 4, the flow 400 of the method for generating information in this embodiment highlights the insertion step and the traversal step, compared to the corresponding embodiment of fig. 3. Therefore, the scheme described in the embodiment inserts elements in the statistical tree based on the element matching mode, and path repetition in the statistical tree is avoided. In addition, the storage contents of the key value database are combined into a limited number of combined results in a node combining mode, and the analysis efficiency of storage distribution can be improved by analyzing the limited number of combined results, so that the positioning time of the storage bottleneck of the key value database is shortened, and the corresponding solution can be provided quickly.

With further reference to fig. 7, as an implementation of the method shown in the foregoing figures, the present application provides an embodiment of an apparatus for generating information, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.

As shown in fig. 7, the apparatus 700 for generating information of the present embodiment may include: a deriving unit 701, a writing unit 702, and a generating unit 703. Wherein the deriving unit 701 is configured to derive a set of key metadata from a key-value database, wherein each key metadata piece comprises a key and a value-occupying capacity; a writing unit 702 configured to write the set of key metadata into a statistics tree; a generating unit 703 configured to generate storage distribution information of the key value database based on the statistical tree.

In the present embodiment, in the apparatus 700 for generating information: the specific processes of the deriving unit 701, the writing unit 702 and the generating unit 703 and the technical effects thereof may refer to the relevant descriptions of steps 201-203 in the corresponding embodiment of fig. 2, and are not repeated here.

In some optional implementations of the present embodiment, the writing unit 702 includes: a segmentation subunit (not shown in the figure) configured to segment keys in the key metadata set to generate a set of element sequences; an inserting subunit (not shown in the figure) configured to insert, for an element sequence in the element sequence set, elements of the element sequence into the statistics tree in turn, wherein a rank of an element in the element sequence corresponds to a hierarchy of the element in the statistics tree; an updating subunit (not shown in the figure) configured to update the number of keys and the value-occupied capacity in the nodes in the statistical tree based on the value-occupied capacity in the key metadata set.

In some optional implementations of the present embodiment, the partitioning sub-unit includes: a determining module (not shown) configured to determine, for a key in the set of key metadata, a target separator in the key; a splitting module (not shown) is configured to split the key into a sequence of elements based on the target delimiter.

In some optional implementations of this embodiment, the determining module includes: a counting sub-module (not shown in the figure) configured to count the frequency of occurrence of preset separators in the preset separator set in the key; a first determining sub-module (not shown in the figure) is configured to determine a target separator from a set of preset separators based on the counted frequency.

In some optional implementations of the present embodiment, the inserting subunit includes: an insertion module (not shown in the figure) configured to determine, for a current element of the sequence of elements, a position of the current element in the statistics tree, and to insert the current element at the position.

In some optional implementations of this embodiment, the insertion module includes: an insertion sub-module (not shown in the figures) configured to perform the following insertion steps: for the current element of the element sequence, matching the current element in a child node set of a node corresponding to the last element of the current element in the statistical tree; if the matching is successful, determining that the current element is inserted into the statistical tree; determining whether the current element is the last element in the sequence of elements; if the element is the last element, it is determined that the sequence of elements has been inserted into the statistics tree.

In some optional implementations of this embodiment, the insertion module further includes: a loop sub-module (not shown in the figure) configured to, in response to determining that the current element is not the last element in the sequence of elements, take the next element in the sequence of elements as the current element, and continue performing the inserting step.

In some optional implementations of this embodiment, the insertion module further includes: a creating sub-module (not shown in the figure) configured to create a child node corresponding to the current element for a node corresponding to a previous element of the current element if the current element fails to match in the child node set of the node corresponding to the previous element of the current element in the statistics tree.

In some optional implementations of this embodiment, the insertion module further includes: a second determining sub-module (not shown in the figure) configured to determine whether the number of sub-nodes of the node corresponding to the last element of the current element satisfies a merging condition; a first merging sub-module (not shown in the figure) is configured to merge at least part of the sub-nodes of the node corresponding to the element above the current element into a wildcard sub-node if the merging condition is met.

In some optional implementations of this embodiment, the merge conditions include at least one of: the number of the child nodes exceeds a preset child node number threshold, and the number of the global nodes of the statistical tree exceeds a preset global node data threshold.

In some optional implementations of this embodiment, the insertion module further includes: a second merge sub-module (not shown) is configured to merge the numerically corresponding leaf nodes created under the same node in the statistics tree into a wild-type Fu Shezi node.

In some optional implementations of the present embodiment, the generating unit 703 includes: a traversing subunit (not shown) configured to traverse the statistical tree to generate the storage distribution information of the key-value database.

In some alternative implementations of the present embodiment, traversing the sub-units includes: a traversing module (not shown in the figure) configured to perform a preamble traversal on the statistical tree, and output a set of matching strings and an occupied capacity of each matching string; a generating module (not shown in the figure) configured to sort the matched string sets according to the outputted occupied capacity, and generate storage distribution information of the key value database based on the sorting result.

In some optional implementations of this embodiment, the generating module includes: a selection sub-module (not shown) configured to select a target string from the set of matching strings based on the sorting result; a spreading sub-module (not shown in the figure) configured to perform wild card spreading on the target character string to obtain a set of spread character strings and an occupied capacity of each of the spread character strings.

Referring now to FIG. 8, there is illustrated a schematic diagram of a computer system 800 suitable for use in implementing an electronic device (e.g., server 103 shown in FIG. 1) of an embodiment of the present application. The electronic device shown in fig. 8 is only an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present application.

As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input portion 806 including a keyboard, mouse, etc.; an output portion 807 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. The drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read out therefrom is mounted into the storage section 808 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable media 811. The above-described functions defined in the method of the present application are performed when the computer program is executed by a Central Processing Unit (CPU) 801.

It should be noted that, the computer readable medium described in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present application may be written in one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or electronic device. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware. The described units may also be provided in a processor, for example, described as: a processor includes a export unit, a write unit, and a generation unit. Where the names of the units do not constitute a limitation of the unit itself in each case, for example, the export unit may also be described as "unit exporting a set of key metadata from a key-value database".

As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: deriving a set of key metadata from a key-value database, wherein each key metadata piece comprises a key and a value-occupying capacity; writing the key metadata set into a statistics tree; based on the statistical tree, storage distribution information of the key value database is generated.

The foregoing description is only of the preferred embodiments of the present application and is presented as a description of the principles of the technology being utilized. It will be appreciated by persons skilled in the art that the scope of the invention referred to in this application is not limited to the specific combinations of features described above, but it is intended to cover other embodiments in which any combination of features described above or equivalents thereof is possible without departing from the spirit of the invention. Such as the above-described features and technical features having similar functions (but not limited to) disclosed in the present application are replaced with each other.

Claims

1. A method for generating information, comprising:

deriving a set of key metadata from a key-value database, wherein each key metadata piece comprises a key and a value-occupying capacity;

writing the key metadata set into a statistics tree;

generating storage distribution information of the key value database based on the statistical tree;

wherein the writing the key metadata set into the statistics tree comprises:

dividing keys in the key metadata set to generate an element sequence set;

for the element sequence in the element sequence set, sequentially inserting the elements of the element sequence into a statistical tree, wherein the rank of the element in the element sequence corresponds to the hierarchy of the element in the statistical tree;

And updating the key number and the value occupation capacity in the nodes in the statistical tree based on the value occupation capacity in the key metadata set.

2. The method of claim 1, wherein the partitioning of keys in the set of key metadata generates a set of element sequences, comprising:

for a key in the key metadata set, determining a target separator in the key;

the key is split into a sequence of elements based on the target separator.

3. The method of claim 2, wherein the determining the target separator in the key comprises:

counting the frequency of the occurrence of the preset separator in the preset separator set in the key;

based on the counted frequency, a target separator is determined from the preset separator set.

4. The method of claim 1, wherein inserting elements of the sequence of elements into the statistics tree sequentially comprises:

for a current element of the sequence of elements, determining a position of the current element in the statistics tree, and inserting the current element at the position.

5. The method of claim 4, wherein the determining the location of the current element in the statistics tree and inserting the current element at the location comprises:

The following insertion steps are performed: matching the current element in a child node set of a node corresponding to the last element of the current element in the statistical tree; if the matching is successful, determining that the current element is inserted into the statistical tree; determining whether the current element is the last element in the sequence of elements; if the element is the last element, determining that the element sequence has been inserted into the statistics tree.

6. The method of claim 5, wherein the determining the location of the current element in the statistics tree and inserting the current element at the location further comprises:

in response to determining that the current element is not the last element in the sequence of elements, taking the next element in the sequence of elements as the current element, and continuing to perform the inserting step.

7. The method of claim 5, wherein the determining the location of the current element in the statistics tree and inserting the current element at the location further comprises:

if the matching of the current element in the child node set of the node corresponding to the last element of the current element in the statistical tree fails, creating the child node corresponding to the current element for the node corresponding to the last element of the current element.

8. The method of one of claims 5-7, wherein the determining a location of a current element in the statistics tree and inserting the current element at the location further comprises:

determining whether the number of child nodes of the node corresponding to the last element of the current element meets a merging condition;

and if the merging condition is met, merging at least part of the child nodes of the node corresponding to the last element of the current element into a wild card child node.

9. The method of claim 8, wherein the merge condition comprises at least one of: the number of the child nodes exceeds a preset child node number threshold value, and the number of the global nodes of the statistical tree exceeds a preset global node data threshold value.

10. The method of claim 8, wherein the determining the location of the current element in the statistics tree and inserting the current element at the location further comprises:

the leaf nodes corresponding to numbers created under the same node in the statistical tree are combined into a wild-type Fu Shezi node.

11. The method of claim 1, wherein the generating storage distribution information of the key-value database based on the statistical tree comprises:

Traversing the statistical tree to generate storage distribution information of the key value database.

12. The method of claim 11, wherein the traversing the statistics tree to generate storage distribution information for the key-value database comprises:

performing preamble traversal on the statistical tree, and outputting a matched character string set and the occupied capacity of each matched character string;

and sorting the matched character string sets according to the output occupied capacity, and generating storage distribution information of the key value database based on the sorting result.

13. The method of claim 12, wherein the generating storage distribution information of the key-value database based on the ordering result comprises:

selecting a target character string from the matched character string set based on the sorting result;

and performing wild card expansion on the target character string to obtain an expanded character string set and the occupied capacity of each expanded character string.

14. An apparatus for generating information, comprising:

a deriving unit configured to derive a set of key metadata from a key-value database, wherein each piece of key metadata comprises a key and a value-occupying capacity;

a writing unit configured to write the key metadata set into a statistics tree;

A generation unit configured to generate storage distribution information of the key value database based on the statistical tree;

wherein the writing unit includes:

a segmentation subunit configured to segment keys in the key metadata set to generate an element sequence set;

an inserting subunit configured to insert, for an element sequence in the element sequence set, elements of the element sequence into a statistics tree in turn, wherein a rank of an element in the element sequence corresponds to a hierarchy of the element in the statistics tree;

an updating subunit configured to update the number of keys and the value-occupied capacity in the nodes in the statistics tree based on the value-occupied capacity in the key metadata set.

15. The apparatus of claim 14, wherein the segmentation subunit comprises:

a determination module configured to, for a key in the set of key metadata, determine a target separator in the key;

a splitting module is configured to split the key into a sequence of elements based on the target separator.

16. The apparatus of claim 15, wherein the means for determining comprises:

a statistics sub-module configured to count the frequency of occurrence of preset segmenters in the set of preset separators in the key;

A first determination submodule configured to determine a target separator from the set of preset separators based on the counted frequency.

17. The apparatus of claim 14, wherein the insertion subunit comprises:

an insertion module configured to determine, for a current element of the sequence of elements, a position of the current element in the statistics tree, and insert the current element at the position.

18. The apparatus of claim 17, wherein the insertion module comprises:

an insertion sub-module configured to perform the following insertion steps: for the current element of the element sequence, matching the current element in a child node set of a node corresponding to the last element of the current element in the statistical tree; if the matching is successful, determining that the current element is inserted into the statistical tree; determining whether the current element is the last element in the sequence of elements; if the element is the last element, determining that the element sequence has been inserted into the statistics tree.

19. The apparatus of claim 18, wherein the insertion module further comprises:

and a loop sub-module configured to, in response to determining that the current element is not the last element in the sequence of elements, take the next element in the sequence of elements as the current element, and continue performing the inserting step.

20. The apparatus of claim 18, wherein the insertion module further comprises:

the creating sub-module is configured to create a sub-node corresponding to the current element for the node corresponding to the previous element of the current element if the matching of the current element in the sub-node set of the node corresponding to the previous element of the current element in the statistics tree fails.

21. The apparatus of one of claims 18-20, wherein the insertion module further comprises:

a second determining sub-module configured to determine whether the number of sub-nodes of the node corresponding to the last element of the current element satisfies a merging condition;

and the first merging sub-module is configured to merge at least part of sub-nodes of the node corresponding to the last element of the current element into a wild card sub-node if the merging condition is met.

22. The apparatus of claim 21, wherein the merge condition comprises at least one of: the number of the child nodes exceeds a preset child node number threshold value, and the number of the global nodes of the statistical tree exceeds a preset global node data threshold value.

23. The apparatus of claim 21, wherein the insertion module further comprises:

a second merging sub-module configured to merge the numerically corresponding leaf nodes created under the same node in the statistical tree into a wild-type Fu Shezi node.

24. The apparatus of claim 14, wherein the generating unit comprises:

and a traversing subunit configured to traverse the statistical tree to generate storage distribution information of the key value database.

25. The apparatus of claim 24, wherein the traversal subunit comprises:

the traversing module is configured to conduct preamble traversal on the statistical tree and output a matching character string set and the occupied capacity of each matching character string;

and the generation module is configured to sort the matched character string set according to the output occupied capacity and generate storage distribution information of the key value database based on the sorting result.

26. The apparatus of claim 25, wherein the means for generating comprises:

a selecting sub-module configured to select a target character string from the set of matching character strings based on the sorting result;

and the expansion submodule is configured to perform wild card expansion on the target character string to obtain an expansion character string set and the occupied capacity of each expansion character string.

27. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

When executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-13.

28. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-13.