CN112749139A

CN112749139A - Log file processing method, electronic device and storage medium

Info

Publication number: CN112749139A
Application number: CN202011614132.5A
Authority: CN
Inventors: 邵传贤; 周振江; 王浩然; 马兵; 吴庆双
Original assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-05-04
Anticipated expiration: 2040-12-30
Also published as: CN112749139B

Abstract

The invention provides a log file processing method, an electronic device and a storage medium. The method comprises: determining each coding field in a file to be decoded, each coding field comprising a coding type, a coding table ID and a code number within a coding table; for each coding field, determining the corresponding coding table in a coding table set according to the coding type and coding table ID of the field, and determining the source character segment corresponding to the field according to its code number and that coding table; and generating a source file from the source character segments corresponding to the coding fields. With this log file processing method, electronic device and storage medium, the coding table set is established and updated based on the character segments in source files, and source files are encoded and compressed using the coding table set, which reduces the storage space they occupy and lowers hardware costs; encoded files are decoded using the same coding table set, enabling rapid decoding.

Description

Log file processing method, electronic device and storage medium

Technical Field

The present invention relates to the field of information encoding and decoding technologies, and in particular, to a log file processing method, an electronic device, and a storage medium.

Background

During the operation of a business system, user requests and system responses are recorded in log files. The log files can be collected on a big data analysis platform for analysis, and the analyzed data can then be transferred uniformly to a data storage system for storage. Storing log files makes it possible, on the one hand, to mine historical business data more deeply at a later stage and, on the other hand, to analyze historical data when problems arise, find the patterns behind them, and make the problems easier to locate and solve.

Storing log files as raw source strings requires a large amount of hardware storage space. The log files therefore need to be encoded and compressed to reduce the storage space they occupy.

Disclosure of Invention

To address these problems in the prior art, the invention provides a log file processing method, an electronic device and a storage medium.

The invention provides a log file processing method, which comprises the following steps:

determining each coding field in a file to be decoded, wherein each coding field comprises a coding type, a coding table ID and a code number within a coding table;

for each coding field, determining the coding table corresponding to the coding field in a coding table set according to the coding type and coding table ID of the coding field, and determining the source character segment corresponding to the coding field according to the code number of the coding field and that coding table;

generating a source file from the source character segments corresponding to the coding fields;

wherein the coding table set includes one or more coding table subsets; each coding table subset includes at least one coding table, and all coding tables in a subset have the same coding type; the coding tables within a subset have different coding table IDs; and the coding type is determined by the length of the source character segment.

According to the log file processing method provided by the invention, the code numbers in a coding table increase in length, and source character segments are assigned to the code numbers in descending order of occurrence count.

According to the log file processing method provided by the invention, the number of coding table subsets equals the value of a preset truncation length, the truncation length being the reference length used to segment the source file during source file encoding.

According to the log file processing method provided by the invention, the method further comprises:

acquiring a source file to be encoded, and dividing it into sub-segments according to the truncation length;

and encoding the source character segments in each sub-segment based on the existing coding table set to obtain an encoded file, and updating the coding table set.

According to the log file processing method provided by the invention, encoding the source character segments in each sub-segment based on the existing coding table set to obtain the encoded file, and updating the coding table set, comprises:

determining, according to the maximum length of the sub-segment, whether the existing coding table set contains a coding table subset of the corresponding coding type, to obtain a first judgment result;

if the first judgment result is yes, determining whether a coding table in the existing coding table subset matches the sub-segment, to obtain a second judgment result, and configuring a code number for the sub-segment according to the second judgment result;

if the first judgment result is no, establishing a new coding table subset, configuring its coding type according to the maximum length of the sub-segment, establishing a new coding table in the new subset, configuring a corresponding coding table ID, and configuring a code number for the sub-segment.

According to the log file processing method provided by the invention, when the first judgment result is no, the method further comprises:

for each single character of the sub-segment, if the existing coding table set contains a coding table subset of the corresponding coding type but no coding table in that subset matches the character, configuring a code number for the character in a coding table;

for each single character of the sub-segment, if the existing coding table set contains no coding table subset of the corresponding coding type, establishing a coding table in a new coding table subset and configuring a code number for the character in that coding table;

acquiring, for each value of i from 1 to (L-1), the character segment S(0, i) of the sub-segment, where S(0, i) denotes the character segment formed by concatenating, in order, the 0th through ith characters of the sub-segment, and L is the maximum length of the sub-segment;

if the existing coding table set contains no coding table subset of the coding type corresponding to S(0, i), establishing a new coding table subset, configuring its coding type according to the length of S(0, i), establishing a new coding table in the new subset, configuring a corresponding coding table ID, and configuring a code number for S(0, i);

and if the existing coding table set contains a coding table subset of the coding type corresponding to S(0, i) but S(0, i) cannot be matched in any coding table of that subset, assigning a new code number to S(0, i) and updating the coding table.

According to the log file processing method provided by the invention, configuring a code number for the sub-segment according to the second judgment result comprises:

if the sub-segment cannot be matched in any coding table of the existing coding table subset, configuring a new code number for the sub-segment and updating the coding table;

and if the sub-segment is matched in a coding table of the existing coding table subset, configuring the matched code number for the sub-segment.

The invention also provides a log file processing method, which comprises the following steps:

acquiring a source file to be encoded, and dividing it into sub-segments according to a truncation length, wherein the truncation length is the reference length for segmenting the source file;

and encoding the source character segments in each sub-segment based on the existing coding table set to obtain an encoded file, and updating the coding table set.

The invention also provides an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the log file processing method described above.

The invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the log file processing methods described above.

With the log file processing method, electronic device and storage medium provided by the invention, the coding table set is established and updated based on the character segments in source files, and source files are encoded and compressed using the coding table set, which reduces the storage space they occupy and lowers hardware costs; encoded files are decoded using the same coding table set, enabling rapid decoding.

Drawings

To illustrate the technical solutions of the present invention or the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of a log file processing method provided by the present invention;

FIG. 2 is another schematic flow chart of a log file processing method provided by the present invention;

FIG. 3 is a schematic structural diagram of a log file processing apparatus provided in the present invention;

FIG. 4 is a schematic diagram of another structure of the log file processing apparatus provided by the present invention;

FIG. 5 is a schematic structural diagram of an electronic device provided in the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The log file processing method, apparatus, electronic device and storage medium provided by the present invention are described below with reference to FIG. 1 to FIG. 5.

FIG. 1 is a schematic flow chart of a log file processing method provided by the present invention. Referring to FIG. 1, the method includes the following steps:

S11: determining each coding field in the file to be decoded, each coding field comprising a coding type, a coding table ID and a code number within a coding table;

S12: for each coding field, determining the corresponding coding table in the coding table set according to the coding type and coding table ID of the field, and determining the source character segment corresponding to the field according to its code number and that coding table;

S13: generating a source file from the source character segments corresponding to the coding fields.

The coding table set includes one or more coding table subsets; each subset includes at least one coding table, and all coding tables in a subset have the same coding type; the coding tables within a subset have different coding table IDs; and the coding type is determined by the length of the source character segment.

Regarding steps S11 to S13, it should be noted that during the operation of a business system, user requests and system responses are recorded in log files. The log files can be collected on a big data analysis platform for analysis, and the analyzed data can then be transferred uniformly to a data storage system for storage. Storing log files makes it possible, on the one hand, to mine historical business data more deeply at a later stage and, on the other hand, to analyze historical data when problems arise, find the patterns behind them, and make the problems easier to locate and solve.

Each log file is a character string. Depending on the order in which its characters are arranged, a complete string yields different character segments, and different character segments produce different coded information. A coding table set therefore needs to be established based on the characteristics of the log files, and the log files are encoded and decoded using this coding table set.

Character segments differ in length depending on how many characters they contain. For example, the character segment ab is 2 bytes long and the character segment abcdf is 5 bytes long.

To save encoding time, a character segment of a given length should only be matched against the coding tables for that length to obtain its code number. To this end, the coding table set includes one or more coding table subsets, and each subset stores one or more coding tables of the same type. That is, each coding table subset has a unique coding type and contains one or more coding tables.

Since character segments are distinguished by length (i.e., byte length), the coding type of each coding table subset in the present invention is determined by the length of the character segments it holds.

In the present invention, a character segment contains one or more characters.

The coding table for character segments of one character is called a single-character coding table, and its coding type is "single character".

The coding table for character segments of two characters is called a double-character coding table, and its coding type is "double character".

…

The coding table for character segments of i characters is called an i-character coding table, and its coding type is "i characters".

In the present invention, each coding table subset includes one or more coding tables, and each coding table within a subset has a unique coding table ID.

For example, if a coding table subset (here, the single-character subset) includes 3 coding tables, their IDs are 1.1, 1.2 and 1.3.

In the present invention, the coding tables are used to encode log files, so each coding table stores the correspondence between code numbers and source character segments. Each log file to be encoded is referred to as a source file, and since a log file is composed of character segments, the character segments of a source file are referred to herein as source character segments. Source character segments and code numbers are in one-to-one correspondence and are stored in a coding table.

A code number is a unique number corresponding to a source character segment and is expressed in binary.

In the invention, a source file is encoded segment by segment, so the encoded file comprises a plurality of coding fields, each comprising a coding type, a coding table ID and a code number within a coding table.
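As a minimal sketch of the data structures described above, the Python code below models a coding field and the coding table set. The class names CodingField and CodingTable, the representation of a coding type as the integer segment length (3 for "three characters"), and the use of one dictionary per direction of the code number to source character segment mapping are illustrative assumptions, not the structures used by the invention.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodingField:
    """One field of an encoded file: coding type, coding table ID, code number."""
    coding_type: int   # segment length, e.g. 3 for "three characters" (assumption)
    table_id: str      # e.g. "3.1": table 1 of the three-character subset
    code_number: str   # binary string, e.g. "011"


@dataclass
class CodingTable:
    """One coding table: a one-to-one map between code numbers and source segments."""
    table_id: str
    code_to_segment: dict[str, str] = field(default_factory=dict)
    segment_to_code: dict[str, str] = field(default_factory=dict)

    def add(self, segment: str, code_number: str) -> None:
        # Refuse entries that would break the one-to-one correspondence.
        if segment in self.segment_to_code or code_number in self.code_to_segment:
            raise ValueError("segment or code number already present")
        self.code_to_segment[code_number] = segment
        self.segment_to_code[segment] = code_number


# Coding table set: coding type -> coding table subset (table ID -> coding table).
CodingTableSet = dict[int, dict[str, CodingTable]]

# Usage: the three-character table 3.1 holding the pair 011 <-> abc.
table_set: CodingTableSet = {3: {"3.1": CodingTable("3.1")}}
table_set[3]["3.1"].add("abc", "011")
f = CodingField(3, "3.1", "011")
print(table_set[f.coding_type][f.table_id].code_to_segment[f.code_number])  # abc
```

Keeping both directions of the mapping lets the same table serve encoding (segment to code number) and decoding (code number to segment) without a linear search.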

During decoding, the file to be decoded is obtained, and the coding type, coding table ID and code number of each coding field are determined from it.

The coding table corresponding to each coding field is then located in the coding table set using the field's coding type and coding table ID, and the source character segment for the field is obtained from its code number and that coding table.

Finally, a source file is generated from the source character segments corresponding to the coding fields.

For example, suppose the file to be decoded contains the coding fields {three characters, 3.1, 011} and {three characters, 3.2, 001}.

If the coding table with ID 3.1 in the three-character coding table subset maps 011 to abc, the source character segment for the coding field {three characters, 3.1, 011} is abc.

If the coding table with ID 3.2 in the three-character coding table subset maps 001 to jkh, the source character segment for the coding field {three characters, 3.2, 001} is jkh.

The generated source file is therefore abcjkh.
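The decoding steps S11 to S13 can be sketched as follows, assuming (hypothetically) that each coding field is a (coding type, coding table ID, code number) tuple and that the coding table set is a nested dictionary keyed by coding type, then coding table ID, then code number. The function name decode is illustrative. The usage line reproduces the example above.

```python
def decode(fields, table_set):
    """Rebuild the source file from its coding fields (steps S11 to S13).

    fields: iterable of (coding_type, table_id, code_number) tuples.
    table_set: coding type -> {table ID -> {code number -> source segment}}.
    """
    segments = []
    for coding_type, table_id, code_number in fields:
        # S12: coding type and table ID select one table; the code number selects the segment.
        table = table_set[coding_type][table_id]
        segments.append(table[code_number])
    # S13: concatenate the source segments in field order.
    return "".join(segments)


table_set = {3: {"3.1": {"011": "abc"}, "3.2": {"001": "jkh"}}}
print(decode([(3, "3.1", "011"), (3, "3.2", "001")], table_set))  # abcjkh
```

Each field is resolved with two dictionary lookups, which is what makes decoding fast once the coding table set is available.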

The log file processing method provided by the invention establishes and updates the coding table set based on the character segments in source files and uses it to encode and compress source files, which reduces the storage space they occupy and lowers hardware costs; encoded files are decoded using the same coding table set, enabling rapid decoding.

A further refinement of the above method concerns the code numbers in a coding table: the code numbers increase in length, and source character segments are assigned to them in descending order of occurrence count.

Note that each code number has its own length. For example, in the binary sequence 0, 1, 10, 11, 100, 101, 110, 111, 1000, …, the code number length increases overall.
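The code number sequence in this example is simply the binary representation of the ranks 0, 1, 2, …, so a code number can be derived from a segment's rank in its coding table. A minimal sketch; the function name code_number is hypothetical:

```python
def code_number(rank: int) -> str:
    """Return the rank-th code number (0-based) in the sequence 0, 1, 10, 11, 100, ..."""
    # format(n, "b") gives the binary digits of n without the "0b" prefix.
    return format(rank, "b")


print([code_number(r) for r in range(9)])
# ['0', '1', '10', '11', '100', '101', '110', '111', '1000']
```

Because the length never decreases with rank, giving low ranks to frequent segments gives them the shortest code numbers.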

Some character segments occur frequently yet may have been assigned code numbers of greater length in the coding table. The coding table therefore needs to be updated and optimized so that source character segments are assigned code numbers in descending order of occurrence count.

For example, suppose the source character segment abcdefgh appears many times across different source files but its code number in the coding table is 10000, while another, less frequent source character segment abcvgjhk has code number 0. The code number of abcdefgh is then changed to 0 and that of abcvgjhk to 10000. Other frequently occurring source character segments are likewise moved forward to shorter code numbers in the coding table.

In the invention, the same character segment always compresses to the same result, so the occurrence frequency of each character segment can be counted from the compression results and used to optimize the coding table, achieving a better compression ratio.
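A minimal sketch of this frequency-based optimization, assuming the occurrence counts of source character segments have already been gathered (for example from the compression results, as described above). The function name reassign_by_frequency, the alphabetical tie-breaking, and the counts in the usage example are hypothetical:

```python
from collections import Counter


def reassign_by_frequency(counts: Counter) -> dict[str, str]:
    """Give the shortest code numbers to the most frequent source segments.

    counts: occurrences of each source character segment.
    Returns segment -> code number, using 0, 1, 10, 11, ... in rank order.
    """
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda seg: (-counts[seg], seg))
    return {seg: format(rank, "b") for rank, seg in enumerate(ranked)}


# Hypothetical counts: abcdefgh is frequent, abcvgjhk is rare.
counts = Counter({"abcvgjhk": 2, "abcdefgh": 40, "xyz": 7})
print(reassign_by_frequency(counts))
# {'abcdefgh': '0', 'xyz': '1', 'abcvgjhk': '10'}
```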

A further refinement of the above method concerns the number of coding table subsets: it equals the preset truncation length, which is the reference length for segmenting the source file during encoding.

Note that the method of the present invention encodes and compresses source files. Compression here means that a character segment of a certain byte length (i.e., the truncation length) in the source file is encoded as a code of shorter length. That is, the maximum length of the code numbers in each coding table is kept shorter than the truncation length, which is what achieves file compression.

Given the definition of coding types above, the number of coding table subsets equals the value of the preset truncation length, which is the reference length for segmenting the source file during encoding.

For example, with a truncation length of 10 bytes, a 98-byte source file is divided into 9 sub-segments of 10 bytes and 1 sub-segment of 8 bytes.
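Segmentation by truncation length can be sketched as below, assuming one character occupies one byte as in the example. The function name split_into_subsegments is hypothetical:

```python
def split_into_subsegments(source: str, truncation_length: int) -> list[str]:
    """Cut the source into consecutive sub-segments of truncation_length characters;
    the last sub-segment keeps whatever remains."""
    if truncation_length <= 0:
        raise ValueError("truncation_length must be positive")
    return [source[i:i + truncation_length]
            for i in range(0, len(source), truncation_length)]


parts = split_into_subsegments("x" * 98, 10)
print(len(parts), [len(p) for p in parts])
# 10 [10, 10, 10, 10, 10, 10, 10, 10, 10, 8]
```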

In this encoding process with a truncation length of 10 bytes, 10 bytes is the maximum sub-segment length, so the number of coding table subsets is 10, one for each segment length from 1 to 10 bytes.

By setting the truncation length appropriately, this refinement keeps the number of coding table subsets that are established under control.

A further refinement of the above method concerns how coding tables are established and updated:

Note that in the present invention, the coding tables in the coding table set can be updated dynamically while log files are being encoded.

Encoding and compressing a source file means encoding character segments of a certain byte length (i.e., the truncation length) in the source file as codes of shorter length.

Therefore, after the source file to be encoded is obtained, it is divided into sub-segments according to the configured truncation length.

The source character segments in each sub-segment are then encoded based on the existing coding table set to obtain an encoded file, and the coding table set is updated.

In the present invention, an initial coding table set may be configured in which the code numbers correspond to common characters or character segments. That is, common characters and character segments are encoded, establishing the correspondences between code numbers and character segments in the coding tables for the different segment lengths.

The initial coding table set is then used to encode a number of source files, producing encoded files and a more complete coding table set.

The more complete coding table set is then used to encode subsequent source files, producing encoded files while continuing to update the coding table set dynamically.

By encoding source files with the existing coding table set, this refinement updates the coding table set dynamically while producing encoded files, so the coding table set becomes more complete and adapts better to the data.

A further refinement of the above method concerns how the source character segments in each sub-segment are encoded based on the existing coding table set to obtain the encoded file, and how the coding table set is updated:

Note that in the present invention, with a configured truncation length of 10 bytes, a 98-byte source file is divided into 9 sub-segments of 10 bytes and 1 sub-segment of 8 bytes.

For each of the 9 sub-segments of 10 bytes, the maximum length of the sub-segment is 10.

For the sub-segment of 8 bytes, the maximum length of the sub-segment is 8.

During encoding, the method first determines whether the current sub-segment can be matched to a code number in a coding table. If it can, that code number is used directly and encoding of the current sub-segment is complete.

If, according to the maximum length of the sub-segment, the existing coding table set contains no coding table subset of the corresponding coding type, no character segment of this length has been encoded so far. A new coding table subset is then established for this length, a coding table with a corresponding coding table ID is created in it, a code number is configured for the sub-segment, and the sub-segment is encoded using the newly configured code number.

For example, for the sub-segment abcdefghlk, a coding table subset with coding type "10 characters" is established, a coding table with ID 10.1 is configured, and the correspondence between "0" and "abcdefghlk" is recorded in that table.

If, according to the maximum length of the sub-segment, the existing coding table set does contain a coding table subset of the corresponding coding type, there is no guarantee that any coding table in that subset already maps the sub-segment to a code number, so the sub-segment must be matched against the coding tables.

In this refinement, when the existing coding tables cannot match a sub-segment, a new coding table is established and a code number configured, so that subsequent identical sub-segments can be matched directly.
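The first and second judgments can be sketched as a lookup-or-insert routine. This assumes, for illustration only, that the coding type is the sub-segment length, that each coding table maps source character segments to code numbers, that table IDs follow the "<type>.<index>" pattern seen in 3.1 and 10.1, and that a new code number is the binary form of the table's current size; the function name encode_subsegment is hypothetical. The usage lines reproduce the abcdefghlk example above.

```python
def encode_subsegment(sub: str, table_set: dict) -> tuple[int, str, str]:
    """Return the coding field (coding type, table ID, code number) for one
    sub-segment, updating table_set when the sub-segment is new.

    table_set: coding type -> {table ID -> {source segment -> code number}}.
    """
    coding_type = len(sub)  # coding type follows the maximum length of the sub-segment
    subset = table_set.get(coding_type)
    if subset is None:
        # First judgment negative: create the subset and its first table.
        table_id = f"{coding_type}.1"
        table_set[coding_type] = {table_id: {sub: "0"}}
        return coding_type, table_id, "0"
    # First judgment positive: try every table in the subset (second judgment).
    for table_id, table in subset.items():
        if sub in table:
            return coding_type, table_id, table[sub]
    # Second judgment negative: give the sub-segment the next code number
    # in the most recently created table of this subset.
    table_id = list(subset)[-1]
    table = subset[table_id]
    code = format(len(table), "b")
    table[sub] = code
    return coding_type, table_id, code


tables = {}
print(encode_subsegment("abcdefghlk", tables))  # (10, '10.1', '0')
print(encode_subsegment("abcdefghlk", tables))  # (10, '10.1', '0')
print(encode_subsegment("zyxwvutsrq", tables))  # (10, '10.1', '1')
```

The second call matches directly, so the table is not modified; only a new sub-segment extends it.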

The following additionally describes the processing when the first judgment result is negative:

Since file compression is achieved by keeping the maximum length of the code numbers in each coding table shorter than the truncation length, when the length of the code numbers in a coding table approaches the truncation length, a new coding table needs to be created, and the new code numbers for sub-segments that do not yet have a correspondence with a code number are added to that new table.
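The text does not quantify when code numbers are "close to" the truncation length. The sketch below assumes, hypothetically, that a table is full once its next code number (counted in binary digits) would be as long as the truncation length, comparing the two lengths literally as the text does; a new table with the next ID in the subset is then opened. The function name next_slot is illustrative:

```python
def next_slot(subset: dict, coding_type: int, truncation_length: int) -> tuple[str, str]:
    """Pick (table ID, code number) for a new segment in one coding table subset.

    subset: table ID -> {source segment -> code number}, tables in creation order.
    Assumed rule: a table is full once its next code number would be as long as
    truncation_length; a new table "<type>.<k+1>" is then opened.
    """
    if subset:
        table_id = list(subset)[-1]
        code = format(len(subset[table_id]), "b")
        if len(code) < truncation_length:
            return table_id, code
    # No table yet, or the newest table is full: open the next table.
    table_id = f"{coding_type}.{len(subset) + 1}"
    subset[table_id] = {}
    return table_id, "0"


subset = {}
slots = []
for seg in ["aa", "bb", "cc", "dd", "ee"]:
    table_id, code = next_slot(subset, 2, truncation_length=3)
    subset[table_id][seg] = code
    slots.append((table_id, code))
print(slots)
# [('2.1', '0'), ('2.1', '1'), ('2.1', '10'), ('2.1', '11'), ('2.2', '0')]
```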

In this case no sub-segment of this length has been encoded so far, and it cannot be determined whether the character segments of other byte lengths contained in the sub-segment have correspondences with code numbers in the coding tables for those lengths. Moreover, when a source file is divided, the last remaining character segment may be shorter than the maximum sub-segment length; for example, the maximum sub-segment length may be 10 while the last character segment is 2 characters long.

The method therefore continues by acquiring, for each value of i from 1 to (L-1), the character segment S(0, i) of the sub-segment, where S(0, i) denotes the character segment formed by the 0th through ith characters of the sub-segment and L is the maximum length of the sub-segment;

if the existing coding table set contains no coding table subset of the coding type corresponding to S(0, i), a new coding table subset is established, its coding type is configured according to the length of S(0, i), a new coding table with a corresponding coding table ID is established in it, and a code number is configured for S(0, i); if the existing coding table set contains a coding table subset of the coding type corresponding to the length of S(0, i) but S(0, i) cannot be matched in any coding table of that subset, a new code number is assigned to S(0, i) and the coding table is updated.

The processing of each character segment S(0, i) of the sub-segment is the same as the processing of the whole sub-segment described above and is not repeated here.

In addition, when the first judgment result is negative, each single character of the sub-segment is processed as follows: if the existing coding table set contains a coding table subset of the corresponding coding type but no coding table in it matches the character, a code number is configured for the character in a coding table; if the existing coding table set contains no subset of the corresponding coding type, a coding table is established in a new coding table subset and a code number is configured for the character in it. These steps follow the same procedure as above and are not described in detail here.

The processing above is illustrated by the following example:

if the source file to be encoded is a character string:&₁&₂…&₁₀@₁@₂…@₁₀％₁％₂…％₁₀…, the string is 101 characters in length. Each time the compressed character segment is 10 characters, the compression is divided into 11 times. I.e., 10 character segments of 10 characters and 1 character segment of 1 character.

Firstly, 1-10 characters are taken from source file to be coded as first character segment to be compressed "&₁&₂…&₁₀”；

If there is a 10 character code table subset in the code table set "&₁&₂…&₁₀"there is corresponding code table in 10 character code table subsets, and the corresponding code 01 in the code table, then output the code 01 directly;

if there is no 10 character code table subset or there is 10 character code table subset but the matching is not successful in the code table, then "&₁&₂…&₁₀"get first character"&₁", determine the character"&₁"whether it is in single character coding table, if yes, continue the next step; if not, first "&₁Putting the code into a single character code table, allocating code numbers for the code, and then carrying out the next step;

from "&₁&₂…&₁₀"middle reading second character"&₂", the processing of the single character is consistent with the processing in the previous step; after the single character processing is finished, the last character processing is carried out "&₁"and the single character read this time"&₂Splicing to obtain character string "&₁&₂". Judgment "&₁&₂"whether it is in the double character coding table, if yes, then proceed to next step; if not, will "&₁&₂Adding the code number into a double-character code table and distributing a corresponding code number for the double-character code table;

from "&₁&₂…&₁₀"middle reading second character"&₃", the processing of the single character is consistent with the processing in the previous step; after the character processing is finished, the character segment 'ab' processed last time and the read character of this time are compared "&₃Splicing to obtain character string "&₁&₂&₃". Judgment "&₁&₂&₃Whether the code is in the three-character code table or not is judged, and if yes, the next step is carried out; if not, will "&₁&₂&₃Adding into the three-character code table and distributing corresponding codes for the three-character code tableCode number;

The i-th character "&ᵢ" is read from "&₁&₂…&₁₀", for i = 4 to 10, and processed as a single character in the same way as in the previous step. After this single-character processing is completed, the previously processed character segment "&₁…&ᵢ₋₁" is concatenated with the character "&ᵢ" just read to obtain the character segment "&₁…&ᵢ". It is determined whether "&₁…&ᵢ" is in the i-character encoding table; if yes, proceed to the next step; if not, "&₁…&ᵢ" is added to the i-character encoding table and assigned a corresponding code number;

After "&₁&₂…&₁₀" has been processed, the code in the 10-character encoding table corresponding to this character segment is obtained, and the processing of the first character segment is finished. The final encoded output has the format {ten characters, 10.1, 00}, where "ten characters" is the encoding type, 10.1 is the encoding table ID, and 00 is the code number.

The subsequent character segments "@₁@₂…@₁₀", "%₁%₂…%₁₀", … are then extracted in turn, and each is processed on the same principle as "&₁&₂…&₁₀" above, until the whole source file to be encoded has been processed, at which point the compression is complete.
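The per-segment walkthrough above (whole-segment lookup first, then growing prefixes "&₁", "&₁&₂", … registered in the 1-, 2-, …, n-character tables) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each subset is modeled as a single table whose ID is "<n>.1", and code numbers are assumed to be two-digit decimal strings issued in insertion order starting at "00". The helper names `next_code` and `encode_segment` are hypothetical.

```python
def next_code(table: dict[str, str]) -> str:
    # Assumed numbering scheme: "00", "01", ... in order of insertion.
    return f"{len(table):02d}"


def encode_segment(tables: dict[int, dict[str, str]], segment: str) -> tuple[int, str, str]:
    """Encode one segment and grow the per-length tables as in the walkthrough."""
    n = len(segment)
    table_n = tables.setdefault(n, {})
    if segment in table_n:                  # whole-segment hit: output its code directly
        return n, f"{n}.1", table_n[segment]
    for i in range(1, n + 1):               # miss: visit prefixes of length 1..n in order
        prefix = segment[:i]
        table_i = tables.setdefault(i, {})  # the i-character table
        if prefix not in table_i:
            table_i[prefix] = next_code(table_i)
    return n, f"{n}.1", table_n[segment]    # the full segment was added at i == n


tables: dict[int, dict[str, str]] = {}
print(encode_segment(tables, "abcdefghij"))  # (10, '10.1', '00')
print(encode_segment(tables, "abcdefghiz"))  # (10, '10.1', '01')
print(tables[3])                             # {'abc': '00'}
```

The second call shows the dynamic update: the nine shared prefixes are already present, so only the new 10-character segment receives a fresh code number.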

In this further method, when a sub-segment of the source file to be encoded finds no match in the encoding table set, each character string of that sub-segment is encoded and analyzed individually, so that the encoding tables are updated dynamically.

Fig. 2 is a schematic flowchart of a log file processing method provided by the present invention. Referring to Fig. 2, the method includes the following steps:

S21: obtain the source file to be encoded, and segment it according to the truncation length to obtain the sub-segments, where the truncation length is the reference length for segmenting the source file;

S22: encode the source character segments in each sub-segment based on the existing encoding table set to obtain an encoded file, and update the encoding table set.
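Steps S21 and S22 can be combined into a single pass over the file. The sketch below is a compact, hypothetical illustration that reuses the assumptions of a one-table-per-length layout with table ID "<n>.1" and two-digit code numbers starting at "00"; the name `encode_file` and the sample log text are made up for illustration.

```python
def encode_file(source: str, tables: dict[int, dict[str, str]],
                truncation_length: int = 10) -> list[tuple[int, str, str]]:
    """S21 + S22 sketch: split the source, encode each sub-segment, update tables in place."""
    fields = []
    for start in range(0, len(source), truncation_length):  # S21: segmentation
        seg = source[start:start + truncation_length]
        n = len(seg)
        if seg not in tables.get(n, {}):                     # S22: miss, so register prefixes
            for i in range(1, n + 1):
                table = tables.setdefault(i, {})
                # setdefault keeps an existing code and assigns the next one only to new entries
                table.setdefault(seg[:i], f"{len(table):02d}")
        fields.append((n, f"{n}.1", tables[n][seg]))         # (type, table ID, code number)
    return fields


tables: dict[int, dict[str, str]] = {}
log = "GET /a 200GET /a 200GET /b 404"  # hypothetical 30-character log
fields = encode_file(log, tables)
print(fields)  # [(10, '10.1', '00'), (10, '10.1', '00'), (10, '10.1', '01')]
```

Encoding the same text a second time yields the same fields and adds no entries, because every sub-segment now hits its table directly.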

As a further explanation of the above method, the processing performed when the first determination result is negative is as follows:

As a further explanation of the above method, the process of configuring the code number for a sub-segment according to the second determination result is as follows:

This encoding process has been described in detail above and is not repeated here.

The method of the invention encodes the source file using the existing encoding table set and dynamically updates the encoding table set while producing the encoded file, so that the encoding table set becomes more complete and more adaptable.

The log file processing apparatus provided by the present invention is described below; the apparatus described below and the log file processing method described above may be read in correspondence with each other.

Fig. 3 is a schematic structural diagram of a log file processing apparatus provided by the present invention. Referring to Fig. 3, the apparatus includes a parsing module 31, a decoding module 32, and a generating module 33, where:

the parsing module 31 is configured to determine each encoding field in the file to be decoded, where each encoding field includes the corresponding encoding type, encoding table ID, and code number within the encoding table;

a decoding module 32, configured to determine, for each coding field, a coding table corresponding to the coding field in a coding table set according to a coding type and a coding table ID corresponding to the coding field, and determine, according to a coding number and the coding table corresponding to the coding field, a source character segment corresponding to the coding field;

a generating module 33, configured to generate a source file according to the source character segments corresponding to the encoding fields.
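The three modules map onto a short decoding loop: take each parsed field, select the table by encoding type and table ID, resolve the code number, and concatenate the results. The sketch below assumes, hypothetically, that the decoder keeps the inverse mapping (code number to source segment) under the same type and ID keys; the names `DecodeTables` and `decode` and the sample tables are illustrative only.

```python
# Hypothetical layout: encoding type -> table ID -> {code number: source character segment}
DecodeTables = dict[int, dict[str, dict[str, str]]]


def decode(fields: list[tuple[int, str, str]], tables: DecodeTables) -> str:
    parts = []
    for enc_type, table_id, code in fields:  # fields as produced by the parsing step
        table = tables[enc_type][table_id]   # decoding step: select the encoding table
        parts.append(table[code])            # ... and resolve the code number
    return "".join(parts)                    # generating step: rebuild the source text


tables: DecodeTables = {10: {"10.1": {"00": "GET /a 200", "01": "GET /b 404"}}}
fields = [(10, "10.1", "00"), (10, "10.1", "01")]
print(decode(fields, tables))  # GET /a 200GET /b 404
```

Each lookup is a dictionary access, which is consistent with the fast decoding the method aims for; an unknown type, table ID, or code number raises `KeyError` in this sketch.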

In a further embodiment of the above apparatus, the code numbers in an encoding table increase in length, and the source character segments are assigned code numbers in descending order of their number of occurrences, so that the most frequent segments receive the shortest code numbers.
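One way to realize "more frequent segments get shorter code numbers" is to rank segments by occurrence count and hand out code numbers shortest first. The sketch below is an assumption-laden illustration: the decimal alphabet, the enumeration order "0".."9", "00".."99", …, and the alphabetical tie-breaking are all hypothetical choices. Because each encoded field stores its code number as a separate element, the sketch does not attempt to make codes prefix-free.

```python
from collections import Counter
from itertools import count, product


def code_numbers(alphabet: str = "0123456789"):
    """Yield code numbers shortest first: '0'..'9', then '00'..'99', and so on."""
    for length in count(1):
        for digits in product(alphabet, repeat=length):
            yield "".join(digits)


def assign_codes(segments: list[str]) -> dict[str, str]:
    """Give the most frequent segments the shortest code numbers."""
    counts = Counter(segments)
    ranked = sorted(counts, key=lambda s: (-counts[s], s))  # ties broken alphabetically
    return dict(zip(ranked, code_numbers()))                # zip stops after the last segment


segs = ["GET"] * 5 + ["POST"] * 3 + [f"id{k}" for k in range(10)]
codes = assign_codes(segs)
print(codes["GET"], codes["POST"], codes["id8"])  # 0 1 00
```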

In a further embodiment of the above apparatus, the number of encoding table subsets is equal to a preset truncation length, where the truncation length is the reference length for segmenting the source file when the source file is encoded.
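With segments never longer than the truncation length, the encoding types run from 1 to the truncation length, so at most that many subsets are needed; each subset may hold several tables distinguished by table ID. The sketch below models that structure under hypothetical assumptions (nested dictionaries, a first table per subset with ID "<n>.1", and illustrative names `empty_table_set` and `find`); it shows a whole-segment lookup that searches every table in the subset matching the segment length.

```python
from typing import Optional

# Hypothetical layout: encoding type (segment length) -> table ID -> {segment: code number}
TableSet = dict[int, dict[str, dict[str, str]]]


def empty_table_set(truncation_length: int) -> TableSet:
    """One subset per segment length 1..truncation_length, each starting with one empty table."""
    return {n: {f"{n}.1": {}} for n in range(1, truncation_length + 1)}


def find(table_set: TableSet, segment: str) -> Optional[tuple[int, str, str]]:
    """Search every table in the subset whose encoding type matches len(segment)."""
    for table_id, table in table_set.get(len(segment), {}).items():
        if segment in table:
            return len(segment), table_id, table[segment]
    return None


tables = empty_table_set(10)
tables[10]["10.2"] = {"abcdefghij": "01"}  # a second 10-character table
print(len(tables))                         # 10
print(find(tables, "abcdefghij"))          # (10, '10.2', '01')
print(find(tables, "abcdefghijk"))         # None (no 11-character subset)
```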

In a further description of the above apparatus, the apparatus further comprises an encoding module configured to:

In a further description of the above apparatus, when encoding the source character segments in each sub-segment based on the existing encoding table set to obtain an encoded file and updating the encoding table set, the encoding module is specifically configured to:

In a further description of the foregoing apparatus, when the first determination result is negative, the encoding module is further configured to:

acquire the character segment S(0, i) of the sub-segment, where i ranges from 0 to L-1, S(0, i) denotes the character segment formed by concatenating, in order, the 0th through the i-th characters of the sub-segment, and L is the maximum length of the sub-segment;
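Since S(0, i) includes both endpoints, it corresponds to the Python slice `sub_segment[:i + 1]`. The tiny sketch below (the name `prefixes` is hypothetical) lists S(0, 0) through S(0, L-1) for a sub-segment of length L.

```python
def prefixes(sub_segment: str) -> list[str]:
    """Return S(0, i) for i = 0 .. L-1: the 0th through i-th characters joined in order."""
    return [sub_segment[: i + 1] for i in range(len(sub_segment))]


print(prefixes("GET /a"))  # ['G', 'GE', 'GET', 'GET ', 'GET /', 'GET /a']
```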

In a further description of the above apparatus, when configuring the code number for a sub-segment according to the second determination result, the encoding module is specifically configured to:

Since the apparatus according to the embodiment of the present invention works on the same principle as the method according to the above embodiment, the details are not repeated here.

It should be noted that, in the embodiment of the present invention, the relevant functional modules may be implemented by a hardware processor.

Fig. 4 is a schematic structural diagram of a log file processing apparatus provided by the present invention. Referring to Fig. 4, the apparatus includes a dividing module 41 and an encoding module 42, where:

the dividing module 41 is configured to obtain a source file to be encoded and segment it according to a truncation length to obtain the sub-segments, where the truncation length is the reference length for segmenting the source file;

and the encoding module 42 is configured to encode the source character segments in each sub-segment based on the existing encoding table set to obtain an encoded file, and to update the encoding table set.

In a further description of the above apparatus, the encoding module is specifically configured to, in a processing procedure of respectively encoding the source character segments in each sub-segment based on an existing encoding table set to obtain an encoded file and updating the encoding table set:

The apparatus of the invention encodes the source file using the existing encoding table set and dynamically updates the encoding table set while producing the encoded file, so that the encoding table set becomes more complete and more adaptable.

Fig. 5 is a schematic diagram of the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include a processor 51, a communication interface 52, a memory 53, and a communication bus 54, where the processor 51, the communication interface 52, and the memory 53 communicate with one another through the communication bus 54. The processor 51 may call logic instructions in the memory 53 to perform a log file processing method comprising: determining each encoding field in the file to be decoded, where each encoding field includes the corresponding encoding type, encoding table ID, and code number within an encoding table; for each encoding field, determining the encoding table corresponding to the field in the encoding table set according to the field's encoding type and encoding table ID, and determining the source character segment corresponding to the field according to its code number and that encoding table; and generating a source file from the source character segments corresponding to the encoding fields. The encoding table set includes one or more encoding table subsets; each encoding table subset includes at least one encoding table, and all encoding tables in a subset have the same encoding type; the encoding tables in the same subset have different encoding table IDs; and the encoding type is determined by the length of the source character segment.

In addition, the logic instructions in the memory 53 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the log file processing method provided above, the method comprising: determining each encoding field in the file to be decoded, where each encoding field includes the corresponding encoding type, encoding table ID, and code number within an encoding table; for each encoding field, determining the encoding table corresponding to the field in the encoding table set according to the field's encoding type and encoding table ID, and determining the source character segment corresponding to the field according to its code number and that encoding table; and generating a source file from the source character segments corresponding to the encoding fields. The encoding table set includes one or more encoding table subsets; each encoding table subset includes at least one encoding table, and all encoding tables in a subset have the same encoding type; the encoding tables in the same subset have different encoding table IDs; and the encoding type is determined by the length of the source character segment.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the log file processing method provided above, the method comprising: determining each encoding field in the file to be decoded, where each encoding field includes the corresponding encoding type, encoding table ID, and code number within an encoding table; for each encoding field, determining the encoding table corresponding to the field in the encoding table set according to the field's encoding type and encoding table ID, and determining the source character segment corresponding to the field according to its code number and that encoding table; and generating a source file from the source character segments corresponding to the encoding fields. The encoding table set includes one or more encoding table subsets; each encoding table subset includes at least one encoding table, and all encoding tables in a subset have the same encoding type; the encoding tables in the same subset have different encoding table IDs; and the encoding type is determined by the length of the source character segment.

The present invention also provides an electronic device, which may include a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus. The processor may call logic instructions in the memory to perform a log file processing method comprising: obtaining a source file to be encoded, and segmenting it according to a truncation length to obtain the sub-segments, where the truncation length is the reference length for segmenting the source file; and encoding the source character segments in each sub-segment based on the existing encoding table set to obtain an encoded file, and updating the encoding table set.

In addition, the logic instructions in the memory may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the log file processing method provided above, the method comprising: obtaining a source file to be encoded, and segmenting it according to a truncation length to obtain the sub-segments, where the truncation length is the reference length for segmenting the source file; and encoding the source character segments in each sub-segment based on the existing encoding table set to obtain an encoded file, and updating the encoding table set.

In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the log file processing method provided above, the method comprising: obtaining a source file to be encoded, and segmenting it according to a truncation length to obtain the sub-segments, where the truncation length is the reference length for segmenting the source file; and encoding the source character segments in each sub-segment based on the existing encoding table set to obtain an encoded file, and updating the encoding table set.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A log file processing method, characterized by comprising the following steps:

2. The log file processing method according to claim 1, wherein the code numbers in the encoding table increase in length, and the source character segments are assigned code numbers in descending order of their number of occurrences.

3. The log file processing method according to claim 2, wherein the maximum number of encoding table subsets is equal to a preset truncation length, the truncation length being the reference length for segmenting the source file during encoding of the source file.

4. A log file processing method according to any of claims 1-3, wherein the method further comprises:

5. The log file processing method according to claim 4, wherein the encoding the source character segments in each sub-segment based on an existing encoding table set to obtain an encoded file, and updating the encoding table set includes:

6. The log file processing method according to claim 5, further comprising, when the first determination result is negative:

7. The log file processing method according to claim 6, further comprising, when the first determination result is negative:

8. The log file processing method according to claim 5, wherein configuring a code number for a sub-segment according to the second determination result comprises:

9. A log file processing method, characterized by comprising the following steps:

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the log file processing method according to any one of claims 1 to 8 or implements the steps of the log file processing method according to claim 9 when executing the program.

11. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the log file processing method according to any one of claims 1 to 8, or implements the steps of the log file processing method according to claim 9.