CN113760839A - Log data compression processing method and device, electronic equipment and storage medium

Info

Publication number: CN113760839A
Application number: CN202011339912.3A
Authority: CN
Inventors: 李汶良
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2021-12-07

Abstract

The embodiment of the invention discloses a log data compression processing method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring each first attribute name in first log data to be processed; determining a first compressed name corresponding to each first attribute name according to a mapping relation between the currently stored attribute name and the compressed name and each first attribute name in the first log data, wherein the number of characters of the first compressed name is less than that of the first attribute name; and replacing each first attribute name in the first log data with a corresponding first compressed name to obtain compressed second log data. By the technical scheme of the embodiment of the invention, the log data volume can be greatly reduced under the condition of not destroying the text structure of the log data.

Description

Log data compression processing method and device, electronic equipment and storage medium

Technical Field

The embodiment of the invention relates to the internet technology, in particular to a log data compression processing method and device, electronic equipment and a storage medium.

Background

With the rapid development of internet technology, system service architectures have evolved from the original monolithic type to the distributed type and, currently most popular, the micro-service type, so that the request interaction process between services has become more complex.

In general, a server may write generated log files into a full-text search engine (e.g., an Elasticsearch server), so that when a failure occurs, the logs can be searched in the full-text search engine by log keyword and the cause of the failure can be quickly analyzed from the search results. As application traffic grows, the amount of generated log data also grows, and thus the log data needs to be compressed in order to reduce its volume.

At present, log data are usually compressed by a common data compression method, and then the compressed log data are written into a log file or a full-text search engine.

However, in the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:

A common data compression method gives the compressed log text a different structure from the original log text. This destroys the text structure of the log data, so that a full-text search engine cannot segment words or build indexes, which in turn makes keyword-based full-text search impossible and reduces search efficiency.

Disclosure of Invention

The embodiment of the invention provides a log data compression processing method and device, electronic equipment and a storage medium, which are used for greatly reducing the volume of log data under the condition that the text structure of the log data is not damaged.

In a first aspect, an embodiment of the present invention provides a log data compression processing method, including:

acquiring each first attribute name in first log data to be processed;

determining a first compressed name corresponding to each first attribute name according to a mapping relation between the currently stored attribute name and the compressed name and each first attribute name in the first log data, wherein the number of characters of the first compressed name is less than that of the first attribute name;

and replacing each first attribute name in the first log data with the corresponding first compressed name to obtain compressed second log data.

In a second aspect, an embodiment of the present invention further provides a log data compression processing apparatus, including:

the first log data acquisition module is used for acquiring each first attribute name in the first log data to be processed;

a first compressed name determining module, configured to determine, according to a mapping relationship between currently stored attribute names and compressed names and each first attribute name in the first log data, a first compressed name corresponding to each first attribute name, where a number of characters of the first compressed name is smaller than a number of characters of the first attribute name;

and the first attribute name replacing module is used for replacing each first attribute name in the first log data with the corresponding first compressed name to obtain compressed second log data.

In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:

one or more processors;

a memory for storing one or more programs;

when the one or more programs are executed by the one or more processors, the one or more processors implement the log data compression processing method provided by any embodiment of the invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement a log data compression processing method according to any embodiment of the present invention.

The embodiment of the invention has the following advantages or beneficial effects:

The first compressed name corresponding to each first attribute name in the first log data to be processed is determined according to the currently stored mapping relation between attribute names and compressed names, and each first attribute name in the first log data is replaced by the corresponding first compressed name, which has fewer characters, so that the log data volume can be greatly reduced. Because only the first attribute names in the first log data are replaced and the rest of the first log data is unchanged, the text structure of the log data is not damaged, so that subsequent full-text search can be performed based on keywords and search efficiency is improved.

Drawings

Fig. 1 is a flowchart of a log data compression processing method according to a first embodiment of the present invention;

Fig. 2 is a diagram illustrating an example of log data compression according to the first embodiment of the present invention;

Fig. 3 is a flowchart of a log data compression processing method according to a second embodiment of the present invention;

Fig. 4 is a flowchart of a log data compression processing method according to a third embodiment of the present invention;

Fig. 5 is an example of a log data compression processing architecture according to the third embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a log data compression processing apparatus according to a fourth embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

Example one

Fig. 1 is a flowchart of a log data compression processing method according to an embodiment of the present invention, which is applicable to a case of compressing log data of a serialized character string structure. The method can be executed by a log data compression processing device, the device can be realized by software and/or hardware, and is integrated with an application end corresponding to an application program, wherein the application end can be a client end provided with the application program, and can also be a service end serving the application program. As shown in fig. 1, the method specifically includes the following steps:

S110, acquiring each first attribute name in the first log data to be processed.

The first log data may be log data that has a character string structure after serialization, or may be log data to be serialized, that is, request response data and/or program internal data in an application that is to be serialized. In other words, in the present embodiment the log data may be serialized and then compressed, or it may be compressed while being serialized. Serialization refers to the process of converting request response data and/or program internal data in an application into a form that can be stored or transmitted. A string structure refers to a data structure in the form of a string expressed in a structured language. The structured language may be, but is not limited to, JSON (JavaScript Object Notation) or XML (Extensible Markup Language). An attribute name refers to the name of an attribute field included in the first log data.

Specifically, based on the service requirement, the request response data and/or the program internal data generated by the application terminal may be stored as log data in a log file, so as to perform fault analysis based on the log data in the following. For the case of performing compression processing after serializing log data, when generating request response data and/or program internal data to be stored, the application terminal may serialize the request response data object and/or program internal data object by using a structured language, obtain the serialized request response data and/or program internal data having a string structure, that is, first log data, and obtain each first attribute name included in the first log data.

It should be noted that the data characteristics of the first log data to be processed are as follows: there is a large amount of list-type data and the attribute name field is fixed, so that there may be a large number of duplicate attribute name fields, resulting in a large amount of log data.

S120, determining a first compressed name corresponding to each first attribute name according to the mapping relation between the currently stored attribute name and the compressed name and each first attribute name in the first log data, wherein the number of characters of the first compressed name is less than that of the first attribute name.

The compressed name may be a name obtained by character-compressing the attribute name. The characters used in a compressed name may follow the rules for programming language identifiers to ensure the validity of the compressed name. For example, a compressed name may be composed of $, underscores, digits, and upper and lower case letters. The mapping relationship between attribute names and compressed names is a one-to-one correspondence. The mapping relationship can be preset to improve mapping efficiency, or it can be established in real time to reduce memory usage. The number of characters of each compressed name is smaller than that of the corresponding attribute name, so that the characters spent on attribute names are reduced. For example, the attribute name "name" may correspond to the compressed name "a". A compressed name in this embodiment may consist of one or two characters, so as to reduce the number of characters as much as possible and improve the compression effect.

The mapping relationship between attribute names and compressed names can be stored in the form of a mapping table. The mapping table may be implemented using a hash table. The mapping relationship in this embodiment may include a forward mapping relationship (i.e., a forward mapping table) and a reverse mapping relationship (i.e., a reverse mapping table). The forward mapping relationship refers to the mapping from an attribute name to a compressed name. The reverse mapping relationship refers to the mapping from a compressed name to an attribute name. For example, if the attribute name message corresponds to the compressed name b, the forward mapping relationship is message -> b and the reverse mapping relationship is b -> message.
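As a minimal illustration of the two mapping tables, the following Python sketch keeps a forward and a reverse hash table in step with each other. The pairs name -> a and message -> b come from the examples in the text; the helper `register` and the pair level -> c are hypothetical and only show how a one-to-one correspondence might be enforced.

```python
# Forward table: attribute name -> compressed name (pairs taken from the text).
forward_map = {"name": "a", "message": "b"}
# Reverse table: compressed name -> attribute name, derived from the forward table.
reverse_map = {short: full for full, short in forward_map.items()}


def register(attribute_name: str, compressed_name: str) -> None:
    """Hypothetical helper: store one pair in both tables, keeping them one-to-one."""
    if forward_map.get(attribute_name, compressed_name) != compressed_name:
        raise ValueError(f"{attribute_name!r} already has a different compressed name")
    if reverse_map.get(compressed_name, attribute_name) != attribute_name:
        raise ValueError(f"{compressed_name!r} is already used by another attribute")
    forward_map[attribute_name] = compressed_name
    reverse_map[compressed_name] = attribute_name


register("level", "c")  # "level" -> "c" is an invented pair for illustration
```

Compression would consult `forward_map`, while a reader of the compressed log could use `reverse_map` to restore the original attribute names.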

Specifically, a compressed name corresponding to each attribute name that may be used in the log data may be preset, a mapping relationship including all the attribute names may be generated, and the generated mapping relationship may be stored, so that when compressing the first log data, the first compressed name corresponding to each first attribute name in the first log data may be quickly and directly determined based on the stored mapping relationship. For example, the first compressed name corresponding to each first attribute name can be determined more quickly based on the forward mapping relationship between the attribute names and the compressed names, i.e., the forward mapping table, so as to improve the mapping efficiency.

S130, replacing each first attribute name in the first log data with a corresponding first compressed name, and obtaining compressed second log data.

Specifically, each first attribute name in the first log data is replaced with a corresponding first compressed name with a smaller number of characters, so that all the first attribute names of the first log data are compressed, and the amount of log data is reduced. For example, fig. 2 shows an example of compression of log data. The left side in fig. 2 is the first log data, and the right side is the second log data after the compression processing. As can be seen, the amount of log data can be greatly reduced by performing compression processing on all the first attribute names. In the embodiment, only the first attribute name in the first log data is replaced, and the rest part in the first log data is not changed, so that the text structure of the first log data is not damaged, the full-text log search can be performed based on the keyword in the following process, and the search efficiency is improved.
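Steps S110 to S130 can be sketched for JSON log data as follows. This is an illustrative Python sketch, not the patented implementation: the log content, the mapping, and the function name `compress_keys` are all invented for the example, and the data shown in Fig. 2 is not reproduced here.

```python
import json


def compress_keys(value, forward_map):
    """Replace attribute names (JSON object keys) and leave every value untouched."""
    if isinstance(value, dict):
        return {forward_map.get(key, key): compress_keys(item, forward_map)
                for key, item in value.items()}
    if isinstance(value, list):
        return [compress_keys(item, forward_map) for item in value]
    return value  # strings, numbers, booleans and null are attribute values


# Hypothetical first log data: list-type data with repeated attribute names.
first_log = json.dumps({"orders": [
    {"orderId": 1, "message": "paid"},
    {"orderId": 2, "message": "timeout"},
]})
# Hypothetical stored mapping relationship (forward mapping table).
forward_map = {"orders": "a", "orderId": "b", "message": "c"}

second_log = json.dumps(compress_keys(json.loads(first_log), forward_map))
print(len(first_log), len(second_log))
```

The second log data is still well-formed JSON and the attribute values such as "timeout" are unchanged, so a keyword search on values would still match.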

It should be noted that the different objects generated by the application end often share many repeated attribute names, and the number of distinct attribute names is generally not large, usually hundreds to thousands, so that compressing the attribute names based on the mapping relationship can greatly reduce the overall log data volume. In addition, in different data application scenarios the attribute values corresponding to the same attribute name differ and cannot be predicted, so the attribute values corresponding to the first attribute names are not compressed, which facilitates subsequent full-text log search using keywords and improves search efficiency.

According to the technical scheme of the embodiment, the first compression name corresponding to each first attribute name in the first log data to be processed is determined according to the mapping relation between the currently stored attribute name and the compression name, and each first attribute name in the first log data is replaced by the corresponding first compression name with a small number of characters, so that the log data volume can be greatly reduced. In addition, only the first attribute name in the first log data is replaced, and the rest part is unchanged, so that the text structure of the log data is not damaged, full-text search can be performed based on the keywords in the following process, and the search efficiency is improved.

On the basis of the above technical solution, S110 may include: obtaining first log data to be serialized, and parsing each first attribute name in the first log data when the first log data is serialized. Accordingly, "obtaining the compressed second log data" in S130 includes: performing serialization splicing processing based on each first compressed name to obtain serialized second log data.

Specifically, for the case of performing compression processing during serialization of log data, the application end may directly use the generated request response data and/or program internal data to be stored as first log data to be serialized, and when serializing the first log data, analyze each first attribute name in the first log data, so as to replace each first attribute name in the first log data with a corresponding first compressed name with a smaller number of characters, thereby greatly reducing the amount of log data. And according to a format splicing mode required by the structured language, based on each first compression name, performing serialization splicing processing on the processed first log data to obtain serialized second log data. By directly compressing the log data in the serialization process, the data volume of the serialized log data can be greatly reduced, so that the memory space occupancy rate of the serialized log data is reduced, the memory recovery efficiency of an application end is improved, and the system performance is improved.
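Compression during serialization can be pictured as a serializer that writes the compressed name at the moment it splices each attribute into the output string, so that the uncompressed serialized text never exists in memory. The sketch below is one possible reading for JSON; the function name `serialize_compressed` and the sample data are assumptions, and it only handles objects with string keys.

```python
import json


def serialize_compressed(obj, forward_map):
    """Serialize obj to JSON text, substituting compressed names while splicing."""
    if isinstance(obj, dict):
        parts = []
        for key, item in obj.items():
            short = forward_map.get(key, key)  # look up the first compressed name
            parts.append(json.dumps(short) + ":" + serialize_compressed(item, forward_map))
        return "{" + ",".join(parts) + "}"
    if isinstance(obj, (list, tuple)):
        return "[" + ",".join(serialize_compressed(item, forward_map) for item in obj) + "]"
    return json.dumps(obj)  # attribute values are serialized unchanged


# Hypothetical data object and mapping.
data = {"name": "x", "items": [{"name": "y"}]}
text = serialize_compressed(data, {"name": "a", "items": "b"})
```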

On the basis of the above technical solution, S120 may include: detecting whether a compressed name corresponding to each first attribute name in the first log data exists in a mapping relation between the currently stored attribute name and the compressed name; if so, determining the compression name corresponding to the first attribute name as the first compression name corresponding to the first attribute name; if not, determining a first compressed name corresponding to the first attribute name based on a preset mapping mode, and storing the first attribute name and the corresponding first compressed name into a mapping relation so as to update the mapping relation.

The preset mapping mode may be a preset mode for establishing a mapping relationship between the attribute name and the compression name.

Specifically, in the process of compressing the attribute names, the embodiment may determine, in real time, the compressed names corresponding to the related attribute names, so as to establish and store the mapping relationships corresponding to the attribute names in real time, so that the currently stored mapping relationships only include the mapping relationships corresponding to the currently related attribute names and do not include the mapping relationships corresponding to the attribute names that are not currently related, so that a useless mapping relationship does not need to be stored, the memory space occupancy rate is reduced, and the memory utilization rate is improved.

For example, for each first attribute name in the first log data, it may be detected whether a mapping relationship corresponding to the first attribute name exists in the currently stored mapping relationship, such as the currently stored forward mapping table. If it exists, the compressed name corresponding to the first attribute name may be determined directly from the mapping relationship. If not, the mapping relationship corresponding to the first attribute name needs to be generated in real time, and the first compressed name corresponding to the first attribute name may then be determined based on the preset mapping mode. For example, every usable compressed name may be generated from the preset characters and stored in advance, and a cursor pointer that points to the storage location of the next compressed name to be used may be initialized. The storage location of the next compressed name to be used is then obtained from the cursor pointer, and the compressed name stored at that location is determined as the first compressed name corresponding to the first attribute name. After the first compressed name is determined, the cursor pointer is advanced by one position in the compressed name use order so that it points to the storage location of the next compressed name to be used. In this way the first compressed name can be obtained quickly from the cursor pointer, which improves mapping efficiency. The compressed name use order may be generated by arranging the compressed names by number of characters from smallest to largest, so that the compressed names with the fewest characters are used first, which further reduces the log data volume and improves the compression effect.

The mapping relationship between the first attribute name and the corresponding first compressed name is stored, and the currently stored mapping relationship is updated in real time, for example, the generated mapping relationship between the first attribute name and the corresponding first compressed name can be stored in a forward mapping table, so that the first compressed name corresponding to the first attribute name can be directly inquired from the forward mapping table in the following process, and therefore, the currently stored mapping relationships are all the mapping relationships corresponding to the related first attribute name, useless mapping relationships are not stored, and the memory space occupancy rate is reduced.
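The lookup-or-create flow with a cursor pointer might look like the following Python sketch. The class name `LazyNameMapper`, the concrete character sets, and their ordering are assumptions chosen to satisfy the identifier rules mentioned earlier ($, underscore, digits, letters); only the overall flow is taken from the description.

```python
import string

# Assumed character sets: a valid identifier start, then any identifier character.
FIRST_CHARS = "$" + string.ascii_uppercase + string.ascii_lowercase
SECOND_CHARS = "$_" + string.digits + string.ascii_uppercase + string.ascii_lowercase


class LazyNameMapper:
    """Builds the mapping on demand from a pre-generated, shortest-first name list."""

    def __init__(self):
        # One-character names first, then two-character names (fewest characters first).
        self.candidates = list(FIRST_CHARS) + [a + b for a in FIRST_CHARS for b in SECOND_CHARS]
        self.cursor = 0    # index of the next compressed name to be used
        self.forward = {}  # attribute name -> compressed name
        self.reverse = {}  # compressed name -> attribute name

    def compressed_name(self, attribute_name: str) -> str:
        existing = self.forward.get(attribute_name)
        if existing is not None:
            return existing  # mapping already stored
        if self.cursor >= len(self.candidates):
            raise RuntimeError("no unused compressed names left")
        short = self.candidates[self.cursor]
        self.cursor += 1  # advance the cursor to the next unused name
        self.forward[attribute_name] = short
        self.reverse[short] = attribute_name
        return short


mapper = LazyNameMapper()
first = mapper.compressed_name("message")
second = mapper.compressed_name("level")
again = mapper.compressed_name("message")
```

Only attribute names that have actually been seen occupy space in `forward` and `reverse`, which is the memory saving the description refers to.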

Example two

Fig. 3 is a flowchart of a log data compression processing method according to a second embodiment of the present invention, where "determining a first compression name corresponding to the first attribute name based on a preset mapping manner" is further optimized in this embodiment based on the foregoing embodiments. Wherein explanations of the same or corresponding terms as those of the above embodiments are omitted.

Referring to fig. 3, the log data compression processing method provided in this embodiment specifically includes the following steps:

S310, acquiring each first attribute name in the first log data to be processed.

S320, detecting whether a compressed name corresponding to each first attribute name in the first log data exists in the mapping relation between the currently stored attribute name and the compressed name, if so, going to the step S330, and if not, going to the step S340.

S330, determine the compressed name corresponding to the first attribute name as the first compressed name corresponding to the first attribute name, and go to step S380.

Specifically, when a mapping relationship corresponding to the first attribute name exists in the currently stored mapping relationship, the first compressed name corresponding to the first attribute name may be directly obtained based on the mapping relationship. When all the first attribute names in the first log data have determined the corresponding first compression names, the process may proceed to step S380 to perform the compression processing operation of the attribute names.

S340, acquiring the current number of compressed names currently in use.

The current number of currently used compression names may refer to the number of compression names used in all the currently stored mappings. The mapping relationship between the attribute names and the compressed names in this embodiment is a one-to-one correspondence relationship, so that the current number of the currently used compressed names is equal to the number of the currently related attribute names. For example, the current number of compression names that have been currently used may be the number of compression names contained in the currently stored mapping table, or the number of attribute names contained in the currently stored mapping table.

Specifically, when the mapping relationship corresponding to the first attribute name does not exist in the currently stored mapping relationship, it indicates that the first compressed name corresponding to the first attribute name needs to be generated in real time, and at this time, the current number of the currently used compressed names may be counted based on the currently stored mapping relationship. For example, the present embodiment may set a global variable for storing the current number of currently used compressed names in real time, so that the current number may be quickly obtained based on the global variable without performing statistics each time, thereby improving the mapping efficiency.

S350, if it is detected that the current number is smaller than the first preset number, generating a first compressed name corresponding to the first attribute name by using a single preset character, and proceeding to step S370.

The first preset number may refer to the total number of compressed names that can be generated using one preset character. When a compressed name is generated using one preset character, the preset character used needs to conform to the rules for programming language identifiers. For example, the preset character may be $ or an upper or lower case letter. When compressed names are generated using $ or upper and lower case letters, the total number of compressed names that can be generated is 53 (1 + 26 + 26), i.e., the first preset number is 53.

Specifically, it may be determined whether the compressed name can be generated using one-bit preset characters by detecting whether the current number is smaller than a first preset number. If the current number is smaller than the first preset number, it is indicated that there is still an unused compressed name generated by using a preset character, and at this time, a first compressed name corresponding to the first attribute name may be generated by continuously using the preset character. For example, a compressed name may be randomly generated by using a preset character, and whether the compressed name is used in the currently stored mapping relationship is detected, and if not, the compressed name may be determined as the first compressed name corresponding to the first attribute name; if so, another compression name needs to be generated until a compression name which is not currently used is generated as the first compression name.
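The random variant described above can be sketched as a retry loop. This is an illustrative Python sketch: the function name `random_unused_name` and the parameter `used` (the set of compressed names already present in the stored mapping) are assumptions.

```python
import random
import string

# The 53 single preset characters: $ plus upper and lower case letters.
SINGLE_CHARS = "$" + string.ascii_uppercase + string.ascii_lowercase


def random_unused_name(used, rng=random):
    """Draw single-character names at random until one is not yet in use."""
    if all(ch in used for ch in SINGLE_CHARS):
        raise ValueError("all single-character compressed names are in use")
    while True:
        candidate = rng.choice(SINGLE_CHARS)
        if candidate not in used:
            return candidate
```

As more names are taken, the loop needs more draws on average before it finds a free one, which is one reason an ordered scheme based on the current number can be preferable.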

For example, the step S350 of generating a first compressed name corresponding to the first attribute name by using a one-bit preset character may include: and determining a target preset character corresponding to the first attribute name according to the current number based on the mapping sequence of the first preset character, and taking the target preset character as a corresponding first compressed name.

The first-bit preset character mapping order may refer to the order of use obtained by sorting the preset characters available for the first bit. For example, the preset characters available for the first bit may be $ and upper and lower case letters, so that sorting $ and the upper and lower case letters yields the first-bit preset character mapping order.

Specifically, the preset characters available for the first bit may be stored in advance in the first-bit preset character mapping order. When a compressed name needs to be generated using one preset character, the preset character that immediately follows the characters already in use in the first-bit preset character mapping order may be taken as the target preset character and used as the corresponding first compressed name. For example, if the current number is 5, then 5 preset characters are already in use, and the 6th preset character in the first-bit preset character mapping order may be used as the corresponding first compressed name. In this way a first compressed name containing only one preset character can be generated more quickly, which improves mapping efficiency.

For example, determining, based on the first bit preset character mapping order, a target preset character corresponding to the first attribute name according to the current number as a corresponding first compressed name may include: determining a target ASCII code value corresponding to the first attribute name according to the current number according to the arrangement sequence of the ASCII code values corresponding to the first-bit usable preset characters from small to large; and determining a target preset character corresponding to the target ASCII code value as a first compressed name corresponding to the first attribute name.

Specifically, the preset characters available for the first bit may be arranged in ascending order of ASCII code value. For example, when the preset characters available for the first bit are $ and upper and lower case letters, the sorted order is: $, upper case letters A through Z, and lower case letters a through z. The target ASCII code value corresponding to the first attribute name is then determined from the current number. For example, if the current number X is 0, the target ASCII code value is 36 (the code of $); if X is greater than 0 and less than or equal to 26, the target ASCII code value may be determined by the formula 65 + (X - 1); if X is greater than 26 and less than 53 (i.e., the first preset number), the target ASCII code value may be determined by the formula 97 + (X - 27). Converting the target ASCII code value yields the corresponding target preset character, which is used as the first compressed name corresponding to the first attribute name. The usable target preset character can thus be determined quickly in real time from the ASCII code value without storing all preset characters available for the first bit in advance, which reduces memory usage and improves runtime performance.
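The piecewise rule can be written directly as a small function. In this Python sketch the function name `single_char_name` is an assumption; the constants 65 and 97 are the ASCII codes of A and a, and 36 is the ASCII code of $, which is the first character in the sorted order.

```python
FIRST_PRESET_NUMBER = 53  # $ + 26 upper case + 26 lower case letters


def single_char_name(current_number: int) -> str:
    """Map the count of names already in use to the next single-character name."""
    if current_number == 0:
        code = 36                          # "$"
    elif 0 < current_number <= 26:
        code = 65 + (current_number - 1)   # "A".."Z"
    elif 26 < current_number < FIRST_PRESET_NUMBER:
        code = 97 + (current_number - 27)  # "a".."z"
    else:
        raise ValueError("current number is outside the single-character range")
    return chr(code)


names = [single_char_name(x) for x in range(FIRST_PRESET_NUMBER)]
```

Because the character is computed from the count, no table of available characters needs to be kept in memory.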

It should be noted that, after the first compression name corresponding to the first attribute name is generated by using one-bit preset character, it indicates that a used compression name is added currently, at this time, 1 may be added to the current number of the currently used compression name, and the current number is updated in real time, so that the accurate current number may be directly obtained in the following, and the mapping efficiency and the mapping accuracy are improved.

S360, if it is detected that the current number is greater than or equal to the first preset number and less than the second preset number, a first compressed name corresponding to the first attribute name is generated by using two preset characters, and the process goes to step S370.

The second preset number may refer to the total number of compressed names that can be generated using two preset characters. When a compressed name is generated using two preset characters, both characters need to conform to the rules for programming language identifiers. For example, the first character may be $ or an upper or lower case letter, and the second character may be $, an underscore, a digit, or an upper or lower case letter, so that the total number of compressed names that can be generated from two preset characters is 3392 (53 × 64), i.e., the second preset number is 3392.

Specifically, the current number may be detected based on a first preset number and a second preset number, and it is determined whether the compression name may be generated using one-bit preset characters or two-bit preset characters. If the current number is detected to be greater than or equal to the first preset number and less than the second preset number, it is indicated that all the compression names generated by using one-bit preset characters are used, but compression names generated by using two-bit preset characters which are not used exist currently, and at this time, the first compression name corresponding to the first attribute name can be generated by using the two-bit preset characters. For example, a compression name may be randomly generated by using two usable preset characters, and whether the compression name is used in the currently stored mapping relationship is detected, and if not, the compression name may be determined as a first compression name corresponding to the first attribute name; if so, another compression name needs to be generated until a compression name which is not currently used is generated as the first compression name.

It should be noted that, if it is detected that the current number is greater than or equal to the second preset number and is less than the total number of the compression names that can be generated by using the three-bit preset characters, the three-bit preset characters may be used to generate the first compression name corresponding to the first attribute name based on a manner similar to the manner of generating the compression name by using the two-bit preset characters, which is not described herein again.
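The threshold checks described above can be sketched as follows. This is only an illustrative sketch: the function name `name_length` and the constants are hypothetical, the alphabet sizes (53 and 64) are taken from the example in the text, and the thresholds follow the text's definition of each preset number as the total number of names of that length.

```python
# Alphabet sizes from the example in the text (assumed here, not mandated).
FIRST_POSITION_CHARS = 53   # '$' plus 52 upper- and lower-case letters
OTHER_POSITION_CHARS = 64   # '$', '_', 10 digits, 52 letters

# Preset numbers as defined in the text: total names available per length.
FIRST_PRESET = FIRST_POSITION_CHARS                           # 53
SECOND_PRESET = FIRST_POSITION_CHARS * OTHER_POSITION_CHARS   # 3392
THIRD_PRESET = SECOND_PRESET * OTHER_POSITION_CHARS           # 217088


def name_length(current_count: int) -> int:
    """Return how many preset characters the next compressed name uses."""
    if current_count < 0:
        raise ValueError("current_count must be non-negative")
    if current_count < FIRST_PRESET:
        return 1
    if current_count < SECOND_PRESET:
        return 2
    if current_count < THIRD_PRESET:
        return 3
    raise ValueError("more than three characters are not covered by this sketch")
```

Checking the shortest length first is what lets the scheme hand out the shortest available names before longer ones.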

For example, generating the first compressed name corresponding to the first attribute name by using two preset characters in step S360 may include: processing the current number according to a third preset number and a fourth preset number to determine a first-bit reference value and a second-bit reference value; determining a first-bit preset character corresponding to the first attribute name according to the first-bit reference value based on a first-bit preset character mapping order; determining a second-bit preset character corresponding to the first attribute name according to the second-bit reference value based on a second-bit preset character mapping order; and splicing the first-bit preset character and the second-bit preset character to obtain the first compressed name corresponding to the first attribute name.

The third preset number may refer to the number of preset characters available for the first position. The fourth preset number may refer to the number of preset characters available for the second position. The third preset number in this embodiment may be equal to the first preset number. For example, when the first position may use $ or an upper- or lower-case letter, the corresponding third preset number is 53. When the second position may use $, an underscore, a digit, or an upper- or lower-case letter, the corresponding fourth preset number is 64. The first-bit reference value may refer to a reference value used to determine the first-bit preset character, such as the number of preset characters already used in the first position. The second-bit reference value may refer to a reference value used to determine the second-bit preset character, such as the number of preset characters already used in the second position. The second-bit preset character mapping order, similar to the first-bit preset character mapping order described above, may refer to the order of use obtained by sorting the preset characters available for the second position. For example, if the preset characters available for the second position are $, the underscore, digits, and upper- and lower-case letters, sorting these characters yields the second-bit preset character mapping order.

Specifically, the current number may be processed using the third preset number and the fourth preset number to determine the number of preset characters already used in the first position (i.e., the first-bit reference value) and the number of preset characters already used in the second position (i.e., the second-bit reference value). In the manner described above, the first-bit preset character corresponding to the first attribute name is determined according to the first-bit reference value based on the first-bit preset character mapping order. For example, the preset character that follows the position given by the first-bit reference value in the first-bit preset character mapping order may be used as the first-bit preset character; alternatively, with the preset characters available for the first position arranged in ascending order of ASCII code value, a target ASCII code value is determined according to the first-bit reference value, and the preset character corresponding to the target ASCII code value is used as the first-bit preset character. The second-bit preset character in this embodiment is determined in a similar manner. For example, the preset character that follows the position given by the second-bit reference value in the second-bit preset character mapping order may be used as the second-bit preset character; alternatively, with the preset characters available for the second position arranged in ascending order of ASCII code value, a target ASCII code value is determined according to the second-bit reference value, and the preset character corresponding to the target ASCII code value is used as the second-bit preset character.
For example, the preset characters available for the second position may be arranged in ascending order of ASCII code value. When the preset characters available for the second position are $, the underscore, digits, and upper- and lower-case letters, the sorted order is: $, digits 0 to 9, upper-case letters A to Z, the underscore, and lower-case letters a to z. The target ASCII code value corresponding to the second-bit preset character is then determined according to the second-bit reference value. For example, if the second-bit reference value Y is 0, the target ASCII code value is determined to be 36 (the character $); if Y is greater than 0 and less than or equal to 10, the target ASCII code value may be determined as 48 + (Y - 1); if Y is greater than 10 and less than or equal to 36, as 65 + (Y - 11); if Y is equal to 37, the target ASCII code value is 95; and if Y is greater than 37 and less than 64 (i.e., the fourth preset number), as 97 + (Y - 38). The corresponding second-bit preset character is obtained by converting the target ASCII code value into a character. The first-bit preset character and the second-bit preset character are then spliced, and the splicing result is the first compressed name corresponding to the first attribute name, so that a first compressed name consisting of only two preset characters can be generated quickly and the mapping efficiency is improved.
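The piecewise mapping from the second-bit reference value Y to a character can be sketched as below. The function name `second_bit_char` is hypothetical, and the sketch assumes that Y = 0 selects $ (ASCII 36), since $ comes first in the ascending ASCII order given above.

```python
def second_bit_char(y: int) -> str:
    """Map a second-bit reference value (0..63) to its preset character."""
    if y == 0:
        code = 36                 # '$' (assumed for Y = 0, first in ASCII order)
    elif 1 <= y <= 10:
        code = 48 + (y - 1)       # digits '0'..'9'
    elif 11 <= y <= 36:
        code = 65 + (y - 11)      # upper-case 'A'..'Z'
    elif y == 37:
        code = 95                 # underscore '_'
    elif 38 <= y <= 63:
        code = 97 + (y - 38)      # lower-case 'a'..'z'
    else:
        raise ValueError("second-bit reference value must be in 0..63")
    return chr(code)
```

Iterating Y from 0 to 63 yields the 64 characters in ascending ASCII order, which matches the second-bit preset character mapping order in the example.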

Exemplarily, processing the current number according to the third preset number and the fourth preset number to determine the first-bit reference value and the second-bit reference value may include: determining the difference between the current number and the third preset number; and dividing the difference by the fourth preset number, taking the quotient as the first-bit reference value and the remainder as the second-bit reference value.

Specifically, the third preset number may be subtracted from the current number, and the resulting difference may be divided by the fourth preset number. The division result rounded down (the integer quotient) is the first-bit reference value, and the remainder of the division is the second-bit reference value.
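As a minimal sketch of this quotient-and-remainder step, the following assumes the example alphabets from the text (53 first-position characters, 64 second-position characters), both listed in ascending ASCII order, and assumes the reference value is used directly as a zero-based index into the mapping order. The names `FIRST_ORDER`, `SECOND_ORDER`, and `two_char_name` are hypothetical.

```python
import string

# Mapping orders in ascending ASCII order (assumed from the example).
FIRST_ORDER = "$" + string.ascii_uppercase + string.ascii_lowercase
SECOND_ORDER = ("$" + string.digits + string.ascii_uppercase
                + "_" + string.ascii_lowercase)

THIRD_PRESET = len(FIRST_ORDER)    # 53 characters for the first position
FOURTH_PRESET = len(SECOND_ORDER)  # 64 characters for the second position


def two_char_name(current_count: int) -> str:
    """Build a two-character compressed name from the current count."""
    diff = current_count - THIRD_PRESET
    if not 0 <= diff < THIRD_PRESET * FOURTH_PRESET:
        raise ValueError("count is outside the two-character range")
    # Quotient -> first-bit reference value, remainder -> second-bit one.
    first_ref, second_ref = divmod(diff, FOURTH_PRESET)
    return FIRST_ORDER[first_ref] + SECOND_ORDER[second_ref]
```

For instance, a current number of 53 gives a difference of 0, so both reference values are 0 and the name is `$$`; a current number of 117 gives a difference of 64, so the quotient is 1 and the remainder 0, which yields `A$`.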

It should be noted that, after the first compressed name corresponding to the first attribute name has been generated using two preset characters, one more compressed name is in use. At this point, the current number of used compressed names may be incremented by 1 so that it is updated in real time; the accurate current number can then be obtained directly in subsequent processing, which improves mapping efficiency and mapping accuracy.

S370, store the first attribute name and the corresponding first compressed name in the mapping relationship, so as to update the mapping relationship.

It should be noted that the operation of step S380 may be executed after the first compressed name corresponding to each first attribute name in the first log data is determined by performing the operations of steps S350-S370 in a loop.

And S380, replacing each first attribute name in the first log data with a corresponding first compressed name, and obtaining compressed second log data.
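The replacement in S380 can be illustrated with a short sketch. It assumes, purely for illustration, that the first log data is a JSON string and that every attribute name already has an entry in the mapping relationship; the function name `replace_attribute_names` and the sample data are hypothetical.

```python
import json


def replace_attribute_names(value, mapping):
    """Recursively replace each attribute name (dict key) with its compressed name."""
    if isinstance(value, dict):
        return {mapping[key]: replace_attribute_names(item, mapping)
                for key, item in value.items()}
    if isinstance(value, list):
        return [replace_attribute_names(item, mapping) for item in value]
    return value  # attribute values are left untouched


# Hypothetical sample data and mapping relationship.
first_log = '{"orderId": 1001, "userName": "alice", "items": [{"orderId": 7}]}'
mapping = {"orderId": "$", "userName": "A", "items": "B"}

second_log = json.dumps(replace_attribute_names(json.loads(first_log), mapping))
```

Because only the keys change, the second log data remains a well-formed string of the same structure, which is what allows a search engine to tokenize and index it later.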

According to the technical scheme of this embodiment, when it is detected that the current number of currently used compressed names is smaller than the first preset number, the first compressed name corresponding to the first attribute name is generated using one preset character; when it is detected that the current number is greater than or equal to the first preset number and smaller than the second preset number, the first compressed name is generated using two preset characters. Compressed names with the fewest characters are therefore generated first, which further reduces the volume of the compressed log data and further improves the compression effect.

Example three

Fig. 4 is a flowchart of a log data compression processing method according to a third embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment describes in detail the log search process performed after the compressed second log data is obtained. Explanations of terms that are the same as or correspond to those in the above embodiments are omitted.

Referring to fig. 4, the log data compression processing method provided in this embodiment specifically includes the following steps:

S410, acquiring each first attribute name in the first log data to be processed.

In particular, fig. 5 gives an example of a log data compression processing architecture. As shown in fig. 5, the log serialization module in the application may serialize the request response data object and/or the program internal data object, obtain the request response data and/or the program internal data of the serialized string structure, that is, the first log data to be processed, and obtain each first attribute name in the first log data.

S420, determining a first compressed name corresponding to each first attribute name according to the mapping relation between the currently stored attribute name and the compressed name and each first attribute name in the first log data, wherein the number of characters of the first compressed name is less than that of the first attribute name.

Specifically, the centralized cache module in fig. 5 may be configured to store the mapping relationship between attribute names and compressed names and to provide write and query functions for it, so that the mapping relationship data is shared between the application and the full-text search engine. The local memory cache module at the application end can load the mapping relationship (such as a forward mapping table) from the centralized cache module into local memory, so that the compression module can query the local memory cache module for the mapping relationship corresponding to each first attribute name in the first log data. If the mapping relationship exists, the compression module determines the corresponding first compressed name based on it. If it does not exist, the compression module generates a corresponding first compressed name in real time based on a preset mapping mode, and stores the mapping relationship between the generated first compressed name and the first attribute name in both the local memory cache module and the centralized cache module in real time, so that the mapping relationship is updated in real time and mapping accuracy is guaranteed.
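The interaction between the compression module, the local memory cache, and the centralized cache might look like the sketch below. All names here (`CompressionMapper`, `compressed_name`, the `generate` callback) are hypothetical, a plain dict stands in for the centralized cache module, and the extra lookup in the centralized cache on a local miss is an assumption that the text does not state.

```python
class CompressionMapper:
    """Look up or create compressed names using a local and a central cache."""

    def __init__(self, central: dict, generate):
        self.central = central        # stand-in for the centralized cache module
        self.local = dict(central)    # local memory cache loaded at start-up
        self.generate = generate      # callable: current count -> new name

    def compressed_name(self, attribute_name: str) -> str:
        name = self.local.get(attribute_name)
        if name is not None:
            return name               # mapping already known locally
        # Assumption: re-check the central cache before generating.
        name = self.central.get(attribute_name)
        if name is None:
            name = self.generate(len(self.central))
            self.central[attribute_name] = name   # update the shared mapping
        self.local[attribute_name] = name         # update the local mapping
        return name


# Hypothetical usage with a toy generator that indexes into a short alphabet.
central_cache = {"orderId": "$"}
mapper = CompressionMapper(central_cache, generate=lambda n: "$ABCDEFG"[n])
known = mapper.compressed_name("orderId")
created = mapper.compressed_name("userName")
```

Writing a new entry to both caches at once keeps the application and the search engine looking at the same mapping relationship.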

S430, replacing each first attribute name in the first log data with a corresponding first compressed name, and obtaining compressed second log data.

Specifically, as shown in fig. 5, after obtaining the serialized first log data, the log serialization module may obtain a first compression name (i.e., a compression field name) corresponding to each first attribute name from the compression module, and replace each first attribute name with a corresponding first compression name, thereby obtaining the compressed second log data.

S440, writing the second log data into the log file, and writing the log file into a full-text search engine, so as to perform log search in the full-text search engine based on the log keywords input by the user, and obtain search result data.

The full-text search engine may be a server that retrieves, from all log files, the log data matching the log keywords input by the user and displays the search results to the user. For example, the full-text search engine may be, but is not limited to, an Elasticsearch server. The log keywords may refer to any words to be searched that the user inputs based on business requirements. For example, a log keyword may be an attribute name or attribute value information.

Specifically, as shown in fig. 5, the log serialization module in the application end may write the second log data into the log file, and output the log file to the search module of the full-text search engine for storage and index establishment. The log files stored in the full-text search engine do not damage the text structure of the original log data, so that the full-text search engine can perform word segmentation and establish indexes, and further can perform full-text search based on the input log keywords, and the search efficiency is improved.

Exemplarily, S440 may include: determining a search keyword for log search according to a log keyword input by a user; searching in each log file based on the search keyword to obtain third log data matched with the search keyword; determining a third attribute name corresponding to each third compressed name according to the mapping relation between the currently stored attribute name and the compressed name and each third compressed name in the third log data; and replacing each third compressed name of the third log data with a corresponding third attribute name to obtain decompressed search result data.

The search keyword may be the keyword that the full-text search engine actually uses to retrieve log data. The third log data may refer to the log search result, containing the search keyword, that the full-text search engine retrieves directly.

Specifically, as shown in fig. 5, a log keyword retrieval module in a full-text search engine may receive a log keyword input by a user. A local memory cache module in the full-text search engine may load a mapping relationship (such as a forward mapping table) from the centralized cache module to the local memory, so that the keyword compression module may determine a search keyword corresponding to the log keyword based on the stored mapping relationship. For example, if it is detected that the log keyword input by the user is an attribute name, the log keyword may be compressed based on the mapping relationship stored in the local memory cache module, so as to obtain a search keyword after the log keyword is compressed. If the log keywords input by the user are detected to be non-attribute names, such as attribute value information, the log keywords can be directly determined to be search keywords without compression processing. The search module in the full-text search engine can search in all log files based on the search keywords to obtain third log data matched with the search keywords. The decompression module in the full-text search engine can query the mapping relation from the local memory cache module, determine a third attribute name corresponding to each third compression name in the third log data, replace each third compression name of the third log data with a corresponding third attribute name, obtain decompressed search result data, and display the obtained search result data through the log display module, so that fault analysis can be performed on the basis of the displayed log data.
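The keyword compression, search, and decompression flow can be sketched as follows. This is not the search engine's actual implementation: exact token matching over parsed JSON records stands in for the full-text index, the log lines are assumed to be JSON, and the helper names (`tokens`, `restore_attribute_names`, `search_logs`) are hypothetical.

```python
import json


def tokens(value):
    """Yield every key and every leaf value of a parsed record as a string."""
    if isinstance(value, dict):
        for key, item in value.items():
            yield key
            yield from tokens(item)
    elif isinstance(value, list):
        for item in value:
            yield from tokens(item)
    else:
        yield str(value)


def restore_attribute_names(value, reverse):
    """Replace each compressed name (dict key) with its original attribute name."""
    if isinstance(value, dict):
        return {reverse.get(key, key): restore_attribute_names(item, reverse)
                for key, item in value.items()}
    if isinstance(value, list):
        return [restore_attribute_names(item, reverse) for item in value]
    return value


def search_logs(log_keyword, log_lines, forward):
    """Compress the keyword if it is an attribute name, search, then decompress."""
    reverse = {short: full for full, short in forward.items()}
    # Attribute names are compressed; other keywords are used unchanged.
    search_keyword = forward.get(log_keyword, log_keyword)
    results = []
    for line in log_lines:
        record = json.loads(line)
        if search_keyword in tokens(record):   # stand-in for full-text search
            results.append(restore_attribute_names(record, reverse))
    return results
```

With a forward mapping of `{"orderId": "$", "userName": "A"}`, a query for `orderId` is rewritten to `$` before matching, and each hit is returned with its original attribute names restored.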

It should be noted that the mapping relationship loaded by the full-text search engine from the centralized cache module includes the mapping relationships of all relevant attribute names. The mapping relationships corresponding to the log keyword and to each third compressed name in the third log data can therefore be found in the loaded mapping relationship, so that the corresponding search keyword and third attribute names can be determined directly. Sharing the mapping relationship in this way further improves search efficiency.

According to the technical scheme, the second log data are written into the log file, and the log file is written into the full-text search engine, so that the written log file does not damage the text structure of the original log data, the full-text search engine can perform word segmentation and index establishment, full-text search can be performed based on the input log keywords, and the search efficiency is improved.

The following is an embodiment of a log data compression processing apparatus according to an embodiment of the present invention, which belongs to the same inventive concept as the log data compression processing methods according to the above embodiments, and reference may be made to the above embodiment of the log data compression processing method for details that are not described in detail in the embodiment of the log data compression processing apparatus.

Example four

Fig. 6 is a schematic structural diagram of a log data compression processing apparatus according to a fourth embodiment of the present invention, which is applicable to a case of performing compression processing on log data of a serialized character string structure, and the apparatus specifically includes: a first attribute name acquisition module 610, a first compressed name determination module 620, and a first attribute name replacement module 630.

The first attribute name obtaining module 610 is configured to obtain each first attribute name in the first log data to be processed; a first compressed name determining module 620, configured to determine, according to a mapping relationship between currently stored attribute names and compressed names and each first attribute name in the first log data, a first compressed name corresponding to each first attribute name, where the number of characters of the first compressed name is smaller than the number of characters of the first attribute name; a first attribute name replacing module 630, configured to replace each first attribute name in the first log data with a corresponding first compressed name, to obtain compressed second log data.

Optionally, the first compression name determining module 620 includes:

the mapping relation detection unit is used for detecting whether a compression name corresponding to each first attribute name in the first log data exists in the mapping relation between the currently stored attribute name and the compression name;

a first determining unit, configured to determine, if yes, a compression name corresponding to the first attribute name as a first compression name corresponding to the first attribute name;

and a second determining unit, configured to, if not, determine a first compressed name corresponding to the first attribute name based on a preset mapping mode, and store the first attribute name and the corresponding first compressed name into the mapping relation so as to update the mapping relation.

Optionally, the second determining unit includes:

a current number obtaining subunit, configured to obtain a current number of currently used compression names;

the first generating subunit is configured to generate a first compressed name corresponding to the first attribute name by using a single preset character if it is detected that the current number is smaller than a first preset number;

the second generating subunit is configured to generate a first compressed name corresponding to the first attribute name by using two-bit preset characters if it is detected that the current number is greater than or equal to a first preset number and is less than a second preset number;

wherein the first preset number refers to the total number of compressed names that can be generated using one preset character, and the second preset number refers to the total number of compressed names that can be generated using two preset characters.

Optionally, the first generating subunit is specifically configured to: and determining a target preset character corresponding to the first attribute name according to the current number based on the mapping sequence of the first preset character, and taking the target preset character as a corresponding first compressed name.

Optionally, the first generating subunit is further specifically configured to: determining a target ASCII code value corresponding to the first attribute name according to the current number according to the arrangement sequence of the ASCII code values corresponding to the first-bit usable preset characters from small to large; and determining a target preset character corresponding to the target ASCII code value as a first compressed name corresponding to the first attribute name.

Optionally, the second generating subunit is specifically configured to: processing the current number according to a third preset number and a fourth preset number, and determining a first reference value and a second reference value, wherein the third preset number refers to the number of preset characters available for the first position, and the fourth preset number refers to the number of preset characters available for the second position; determining a first-bit preset character corresponding to the first attribute name according to a first-bit reference value based on a first-bit preset character mapping sequence; determining a second-bit preset character corresponding to the first attribute name according to a second-bit reference numerical value based on a second-bit preset character mapping sequence; and splicing the first preset character and the second preset character to obtain a first compressed name corresponding to the first attribute name.

Optionally, the second generating subunit is further specifically configured to: determining a difference between the current quantity and a third preset quantity; the difference is divided by a fourth predetermined number to obtain a quotient as the first digit reference value and a remainder as the second digit reference value.

Optionally, the first attribute name obtaining module 610 is specifically configured to: acquiring first log data to be serialized, and analyzing each first attribute name in the first log data when the first log data is serialized; the first attribute name replacing module 630 is further specifically configured to: and performing serialization splicing processing based on each first compression name to obtain serialized second log data.

Optionally, the apparatus further comprises:

and the log writing module is used for writing the second log data into a log file after the compressed second log data are obtained, and writing the log file into the full-text search engine so as to perform log search in the full-text search engine based on the log keywords input by the user and obtain search result data.

Optionally, the full-text search engine is specifically configured to: determining a search keyword for log search according to a log keyword input by a user; searching in each log file based on the search keyword to obtain third log data matched with the search keyword; determining a third attribute name corresponding to each third compressed name according to the mapping relation between the currently stored attribute name and the compressed name and each third compressed name in the third log data; and replacing each third compressed name of the third log data with a corresponding third attribute name to obtain decompressed search result data.

The log data compression processing device provided by the embodiment of the invention can execute the log data compression processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the log data compression processing method.

It should be noted that, in the embodiment of the log data compression processing apparatus, each unit and each module included in the log data compression processing apparatus are only divided according to functional logic, but are not limited to the above division as long as the corresponding function can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

Example five

Fig. 7 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention. FIG. 7 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.

As shown in FIG. 7, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, and commonly referred to as a "hard drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, to implement a log data compression processing method provided by the embodiment of the present invention, the method includes:

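acquiring each first attribute name in first log data to be processed;

determining a first compressed name corresponding to each first attribute name according to a mapping relation between the currently stored attribute name and the compressed name and each first attribute name in the first log data, wherein the number of characters of the first compressed name is less than that of the first attribute name;

and replacing each first attribute name in the first log data with a corresponding first compressed name to obtain compressed second log data.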

Of course, those skilled in the art can understand that the processor can also implement the technical solution of the log data compression processing method provided by any embodiment of the present invention.

Example six

The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the steps of a log data compression processing method provided in any embodiment of the present invention, the method including:

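acquiring each first attribute name in first log data to be processed;

determining a first compressed name corresponding to each first attribute name according to a mapping relation between the currently stored attribute name and the compressed name and each first attribute name in the first log data, wherein the number of characters of the first compressed name is less than that of the first attribute name;

and replacing each first attribute name in the first log data with a corresponding first compressed name to obtain compressed second log data.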

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a memory device and executed by the computing device; alternatively, they may be fabricated as separate integrated circuit modules, or several of the modules or steps may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

It should be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the scope of the appended claims.

Claims

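1. A log data compression processing method, characterized by comprising the following steps:

acquiring each first attribute name in first log data to be processed;

determining a first compressed name corresponding to each first attribute name according to a mapping relation between the currently stored attribute name and the compressed name and each first attribute name in the first log data, wherein the number of characters of the first compressed name is less than that of the first attribute name;

and replacing each first attribute name in the first log data with the corresponding first compressed name to obtain compressed second log data.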

2. The method according to claim 1, wherein determining, according to the mapping relationship between the currently stored attribute name and the compressed name and each first attribute name in the first log data, a first compressed name corresponding to each first attribute name comprises:

detecting whether a compressed name corresponding to each first attribute name in the first log data exists in a mapping relation between currently stored attribute names and compressed names;

if so, determining the compressed name corresponding to the first attribute name as the first compressed name corresponding to the first attribute name;

if not, determining a first compressed name corresponding to the first attribute name based on a preset mapping mode, and storing the first attribute name and the corresponding first compressed name into the mapping relation so as to update the mapping relation.
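By way of non-limiting illustration, the lookup-or-create logic of claim 2 might be sketched in Python as follows. The names `compressed_name_for`, `generate` and `demo_generate`, and the use of an in-memory dictionary as the stored mapping relation, are assumptions made for this sketch only and do not come from the embodiments.

```python
from typing import Callable, Dict


def compressed_name_for(
    attribute_name: str,
    mapping: Dict[str, str],
    generate: Callable[[int], str],
) -> str:
    """Return the compressed name for attribute_name, creating one if absent."""
    existing = mapping.get(attribute_name)
    if existing is not None:
        # A compressed name is already stored: reuse it.
        return existing
    # No stored name: derive one from the number of names already in use,
    # then store the new pair so that the mapping relation is updated.
    new_name = generate(len(mapping))
    mapping[attribute_name] = new_name
    return new_name


def demo_generate(current_number: int) -> str:
    """Hypothetical preset mapping mode for the demo: 'a', 'b', 'c', ..."""
    return chr(ord("a") + current_number)


mapping: Dict[str, str] = {}
first = compressed_name_for("requestId", mapping, demo_generate)
second = compressed_name_for("userName", mapping, demo_generate)
again = compressed_name_for("requestId", mapping, demo_generate)
```

In this sketch a repeated attribute name always resolves to the same compressed name, which is what allows the replacement to be reversed later.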

3. The method of claim 2, wherein determining the first compressed name corresponding to the first attribute name based on a preset mapping manner comprises:

acquiring the current number of compressed names currently in use;

if the current number is smaller than a first preset number, generating the first compressed name corresponding to the first attribute name by using a one-bit preset character;

if the current number is greater than or equal to the first preset number and smaller than a second preset number, generating the first compressed name corresponding to the first attribute name by using two-bit preset characters;

wherein the first preset number refers to the total number of compressed names that can be generated by using a one-bit preset character, and the second preset number refers to the total number of compressed names that can be generated by using two-bit preset characters.

4. The method of claim 3, wherein generating the first compressed name corresponding to the first attribute name by using a one-bit preset character comprises:

determining, based on a first-bit preset character mapping order, a target preset character corresponding to the first attribute name according to the current number, and taking the target preset character as the corresponding first compressed name.

5. The method of claim 4, wherein determining, based on the first-bit preset character mapping order, the target preset character corresponding to the first attribute name according to the current number comprises:

determining a target ASCII code value corresponding to the first attribute name according to the current number, following the ascending order of the ASCII code values of the preset characters usable for the first bit;

and determining the target preset character corresponding to the target ASCII code value as the first compressed name corresponding to the first attribute name.
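As a non-limiting illustration of the one-character case in claims 4 and 5, the following Python sketch selects the target ASCII code value from an ascending list and converts it back to a character. The choice of the 52 ASCII letters as the preset characters usable for the first bit, and the names `FIRST_BIT_CODES` and `one_char_name`, are assumptions of this sketch; the claims do not fix the character set.

```python
import string

# Assumption: the preset characters usable for the first bit are A-Z and a-z.
# Their ASCII code values are kept in ascending order.
FIRST_BIT_CODES = sorted(ord(ch) for ch in string.ascii_letters)


def one_char_name(current_number: int) -> str:
    """Map the current number of used names to a one-character compressed name."""
    if not 0 <= current_number < len(FIRST_BIT_CODES):
        raise ValueError("current number is outside the one-character range")
    # The current number indexes into the ascending list of ASCII code values.
    target_code = FIRST_BIT_CODES[current_number]
    return chr(target_code)
```

Under this assumed character set, the first 26 names are the uppercase letters and the next 26 are the lowercase letters, because uppercase letters have smaller ASCII code values.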

6. The method of claim 3, wherein generating the first compressed name corresponding to the first attribute name by using two bits of preset characters comprises:

processing the current number according to a third preset number and a fourth preset number, and determining a first-bit reference value and a second-bit reference value, wherein the third preset number refers to the number of preset characters usable for the first bit, and the fourth preset number refers to the number of preset characters usable for the second bit;

determining a first-bit preset character corresponding to the first attribute name according to the first-bit reference value based on a first-bit preset character mapping order;

determining a second-bit preset character corresponding to the first attribute name according to the second-bit reference value based on a second-bit preset character mapping order;

and splicing the first-bit preset character and the second-bit preset character to obtain the first compressed name corresponding to the first attribute name.

7. The method of claim 6, wherein processing the current number according to a third preset number and a fourth preset number to determine a first bit reference value and a second bit reference value comprises:

determining a difference between the current quantity and the third preset quantity;

dividing the difference by the fourth preset number, taking the resulting quotient as the first-bit reference value and the resulting remainder as the second-bit reference value.
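The arithmetic of claims 6 and 7 can be sketched, purely for illustration, as the Python function below. The character sets are assumptions of this sketch (letters for the first bit, letters and digits for the second bit), as are the names `FIRST_BIT_CHARS`, `SECOND_BIT_CHARS` and `two_char_name`; the upper bound check is likewise an assumption about how the second preset number would be enforced.

```python
import string

# Assumed character sets, each kept in ascending ASCII order.
FIRST_BIT_CHARS = sorted(string.ascii_letters)                   # 52 characters
SECOND_BIT_CHARS = sorted(string.ascii_letters + string.digits)  # 62 characters


def two_char_name(current_number: int) -> str:
    """Map the current number of used names to a two-character compressed name."""
    third_preset = len(FIRST_BIT_CHARS)    # characters usable for the first bit
    fourth_preset = len(SECOND_BIT_CHARS)  # characters usable for the second bit
    difference = current_number - third_preset
    if difference < 0:
        raise ValueError("current number still fits a one-character name")
    # Quotient gives the first-bit reference value, remainder the second-bit one.
    first_ref, second_ref = divmod(difference, fourth_preset)
    if first_ref >= third_preset:
        raise ValueError("current number exceeds the two-character range")
    return FIRST_BIT_CHARS[first_ref] + SECOND_BIT_CHARS[second_ref]
```

With these assumed sets, a current number of 52 yields a difference of 0 and therefore the name `A0`, and the two-character range is exhausted after 52 × 62 = 3224 further names.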

8. The method of claim 1, wherein obtaining each first attribute name in the first log data to be processed comprises:

acquiring first log data to be serialized, and parsing out each first attribute name in the first log data when the first log data is serialized;

and wherein obtaining the compressed second log data comprises:

performing serialization splicing processing based on each first compressed name to obtain serialized second log data.
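As one possible, non-limiting reading of claim 8, the sketch below replaces attribute names while a log record is being serialized. It assumes the first log data is a flat Python dictionary serialized as JSON and that every attribute name already has an entry in the mapping; the function name `serialize_compressed` and the sample field names are hypothetical.

```python
import json
from typing import Any, Dict


def serialize_compressed(record: Dict[str, Any], mapping: Dict[str, str]) -> str:
    """Serialize a flat log record, emitting compressed names instead of attribute names."""
    # Each attribute name is swapped for its compressed name as the output is built;
    # the values and the JSON structure are left untouched.
    compressed = {mapping[name]: value for name, value in record.items()}
    return json.dumps(compressed, separators=(",", ":"))


sample_mapping = {"requestId": "a", "costMillis": "b"}
sample_line = serialize_compressed(
    {"requestId": "r-1", "costMillis": 12}, sample_mapping
)
```

Because only the keys change, the output remains well-formed JSON text, consistent with the stated aim of not destroying the text structure of the log data.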

9. The method according to any one of claims 1 to 8, further comprising, after obtaining the compressed second log data:

and writing the second log data into a log file, and writing the log file into a full-text search engine so as to perform log search in the full-text search engine based on log keywords input by a user and obtain search result data.

10. The method of claim 9, wherein performing a log search based on a log keyword input by a user to obtain search result data comprises:

determining a search keyword for log search according to a log keyword input by a user;

searching in each log file based on the search keyword to obtain third log data matched with the search keyword;

determining a third attribute name corresponding to each third compressed name according to a mapping relation between the currently stored attribute name and the compressed name and each third compressed name in the third log data;

and replacing each third compressed name of the third log data with the corresponding third attribute name to obtain decompressed search result data.
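The decompression step of claim 10 can be illustrated, again without limitation, by inverting the stored mapping relation. The sketch assumes that each matched log entry is a flat JSON object and that compressed names are unique; `decompress_record` and the sample names are hypothetical.

```python
import json
from typing import Any, Dict


def decompress_record(line: str, mapping: Dict[str, str]) -> Dict[str, Any]:
    """Restore the original attribute names in one matched log line."""
    # Invert attribute name -> compressed name into compressed name -> attribute name.
    reverse = {short: full for full, short in mapping.items()}
    record = json.loads(line)
    # Keys without a stored mapping are kept unchanged.
    return {reverse.get(key, key): value for key, value in record.items()}


stored_mapping = {"requestId": "a", "costMillis": "b"}
restored = decompress_record('{"a":"r-1","b":12}', stored_mapping)
```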

11. A log data compression processing apparatus, comprising:

12. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the log data compression processing method of any one of claims 1-10.

13. A computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing a log data compression processing method according to any one of claims 1 to 10.