CN112966101B - Statement clustering method, transaction clustering method, statement clustering device and transaction clustering device - Google Patents


Info

Publication number
CN112966101B
Authority
CN
China
Prior art keywords
transaction
sentence
characteristic value
request
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110167246.8A
Other languages
Chinese (zh)
Other versions
CN112966101A (en)
Inventor
白腊梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202110167246.8A priority Critical patent/CN112966101B/en
Publication of CN112966101A publication Critical patent/CN112966101A/en
Application granted granted Critical
Publication of CN112966101B publication Critical patent/CN112966101B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention is applicable to the technical field of databases and provides a statement clustering method comprising the following steps: receiving statement request input in real time; denoising and normalizing the statement request to obtain a purified statement request; extracting a statement characteristic value from the purified statement request through a feature extraction algorithm; and adding the statement characteristic values in order of first appearance to form a statement identification sequence, and establishing a statement mapping relation between the statement characteristic values and the statement identification sequence. The invention also provides a transaction clustering method, a statement clustering device and a transaction clustering device. The statement clustering method of the embodiment establishes an effective link between a single statement request and the higher-level statement identification sequence formed by clustering statement requests. As a result, operation and maintenance personnel can analyze the cause of a database abnormality from the statement identification sequence, and others can apply the same method to cluster data in other fields and to observe and analyze how that data develops.

Description

Statement clustering method, transaction clustering method, statement clustering device and transaction clustering device
Technical Field
The invention belongs to the technical field of databases, and particularly relates to a statement clustering method, a transaction clustering method, a statement clustering device and a transaction clustering device.
Background
In the prior art, when data on the database server side is abnormal, operation and maintenance personnel manually determine a data classification standard on site, and the database server side classifies the data according to that standard, which helps the personnel observing changes in the data to determine the cause of the abnormality. When the volume of data on the database server is relatively large, classifying and counting the data manually on site makes it difficult to determine the cause of the abnormality in a timely and effective manner.
In addition, even if operation and maintenance personnel spend considerable time and effort to confirm the specific cause of an abnormality in a single piece of data, that finding cannot readily be extended to analyzing why other data of the same type may become abnormal, nor to analyzing why the higher-level body of data to which that type belongs may become abnormal. That is, it is difficult to extend the cause of a single data abnormality to an analysis of possible abnormalities in a larger, higher-level body of data.
Similarly, in other situations, even when the feature information of one object has been confirmed, it is difficult to accurately and effectively identify objects with the same feature information as belonging to the same class, to group all objects of that class into larger, higher-level classes, and to establish an effective link between a single object and the higher-level subject that such objects make up.
Disclosure of Invention
The embodiments of the invention provide a statement clustering method, a transaction clustering method, a statement clustering device and a transaction clustering device, which aim to solve the technical problem in the prior art that it is difficult to establish an effective link between a single target object and the higher-level subject formed by such objects. For example, when data in a database server is abnormal, operation and maintenance personnel working manually on site cannot determine the data classification, and therefore the cause of the abnormality, in a timely and effective manner.
An embodiment of the invention is implemented as a statement clustering method comprising the following steps:
receiving statement request input in real time;
denoising and normalizing the statement request to obtain a purified statement request;
extracting a statement characteristic value from the purified statement request through a feature extraction algorithm;
and adding the statement characteristic values in order of first appearance to form a statement identification sequence, and establishing a statement mapping relation between the statement characteristic values and the statement identification sequence.
An embodiment of the invention also provides a transaction clustering method comprising the following steps:
receiving input of a transaction characteristic value set contained in a transaction, wherein the transaction characteristic value set is formed by the statement identification sequence obtained by the statement clustering method;
comparing each received transaction characteristic value set in order of first appearance;
clustering identical transaction characteristic value sets into one class when the compared transaction characteristic value sets are the same;
and, when the compared transaction characteristic value sets are different, adding the different transaction characteristic value sets in order of first appearance to form a transaction characteristic value sequence.
An embodiment of the invention also provides a statement clustering device comprising:
a statement request receiving unit, configured to receive statement request input in real time;
a statement request processing unit, configured to denoise and normalize the statement request to obtain a purified statement request;
a statement characteristic value extraction unit, configured to extract a statement characteristic value from the purified statement request through a feature extraction algorithm;
and a statement identification sequence forming unit, configured to add the statement characteristic values in order of first appearance to form a statement identification sequence, and to establish a statement mapping relation between the statement characteristic values and the statement identification sequence.
An embodiment of the invention also provides a transaction clustering device comprising:
a transaction characteristic value set receiving unit, configured to receive input of a transaction characteristic value set contained in a transaction, wherein the transaction characteristic value set is formed by the statement identification sequence obtained by the statement clustering device;
a transaction characteristic value set comparison unit, configured to compare each received transaction characteristic value set in order of first appearance;
a transaction characteristic value set clustering unit, configured to cluster identical transaction characteristic value sets into one class when the compared transaction characteristic value sets are the same;
and a transaction characteristic value sequence forming unit, configured to add the different transaction characteristic value sets in order of first appearance to form a transaction characteristic value sequence when the compared transaction characteristic value sets are different.
The advantages of the method are as follows. The received statement request is processed to obtain a statement characteristic value, and the statement identification sequence is then formed from the statement characteristic values. The statement identification sequence is used to cluster the received statement requests, which establishes an effective link between a single statement request and the higher-level statement identification sequence formed by clustering statement requests. In one concrete scenario, when operation and maintenance personnel encounter a data abnormality, the abnormal data is processed to obtain its statement characteristic value, and the cause of the abnormality is judged by comparing that value with the stored statement identification sequence. The statement identification sequence output by statement clustering can also be used to analyze possible abnormalities in statement requests with similar characteristic values, and further to analyze development trends and possible abnormalities in the transactions and services formed by statement identification sequences, which provides observation data to support the handling of abnormal conditions. Others can also apply the statement clustering method to cluster data in other fields and to observe and analyze how that data develops.
Drawings
Fig. 1 and Fig. 2 are schematic flow diagrams of a statement clustering method according to an embodiment of the present invention;
Fig. 3 to Fig. 5 are schematic flow diagrams of a transaction clustering method according to an embodiment of the present invention;
Fig. 6 and Fig. 7 are schematic structural diagrams of a statement clustering device according to an embodiment of the present invention;
Fig. 8 to Fig. 10 are schematic structural diagrams of a transaction clustering device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
Referring to Fig. 1, the statement clustering method in the embodiment of the present invention includes the following steps:
S1: receiving statement request input in real time;
S2: denoising and normalizing the statement request to obtain a purified statement request;
S3: extracting a statement characteristic value from the purified statement request through a feature extraction algorithm;
S4: adding the statement characteristic values in order of first appearance to form a statement identification sequence, and establishing a statement mapping relation between the statement characteristic values and the statement identification sequence.
In this embodiment, the received statement request is processed to obtain a statement characteristic value, and the statement identification sequence is then formed from the statement characteristic values. The statement identification sequence is used to cluster the received statement requests, which establishes an effective link between a single statement request and the higher-level statement identification sequence formed by clustering statement requests. In one concrete scenario, when operation and maintenance personnel encounter a data abnormality, the abnormal data is processed to obtain its statement characteristic value, and the cause of the abnormality is judged by comparing that value with the stored statement identification sequence.
More importantly, the statement identification sequence output by statement clustering can be used to analyze possible abnormalities in statement requests with similar characteristic values, and further to analyze development trends and possible abnormalities in the transactions and services formed by statement identification sequences, which provides observation data to support the handling of abnormal conditions.
The statement clustering method of this embodiment is applied to a database server. The data in the database server consists of statement requests, so "data" below can be understood as statement requests. A statement request is a Structured Query Language (SQL) request to the database server; SQL is commonly used to access data and to query, update and manage relational database systems. Clustering is the process of grouping statement requests stored in the database server into one class according to a given condition, so that the different statement requests received by the database server are automatically divided into a number of classes.
Specifically, once the database server is started, it receives statement requests sent by clients in real time. After receiving and processing a statement request, the database server returns response data (an answer to the statement request) to the client. The data generated by this interaction between the database server and the client needs to be clustered in order to determine whether abnormal data in the interaction comes from the database server or from the client, and further to determine the specific type of the abnormal data, which helps operation and maintenance personnel quickly locate the cause of the abnormality.
Statement requests often contain noise, which affects both how the database server judges the statement request and the subsequent clustering. Therefore, in this embodiment, a received statement request is first denoised and normalized. Denoising removes the noise from the statement request so that its format is more regular and standardized, which improves the efficiency of subsequent processing. After the noise has been removed, gaps and spaces remain in the statement request, so normalization makes all characters in the statement request contiguous, forming a continuous character string; this continuous character string is the purified (Clean SQL) request. Step S2 thus removes the noise from the statement request and ensures that the character string is continuous and complete, which speeds up the processing of statement requests and keeps the processing running normally.
A statement request is a character string made up of multiple characters in a computer language, and the character strings, and hence the characters, of different statement requests differ. Therefore, when statement requests are to be clustered, the characters, the fields, or the character strings formed from them serve as the points of distinction between statement requests; they can be understood as the characteristic values that distinguish different statement requests, that is, statement characteristic values (specialvalue). In this embodiment, once the purified statement request is obtained, its statement characteristic value is extracted by the feature extraction algorithm and stored. Each received statement request is then clustered using the statement characteristic value as the clustering standard and added to a specific storage space, which facilitates management and lookup.
In one embodiment, the feature extraction algorithm may be an algorithm with a message digest function, such as the MD5 algorithm or another hash algorithm. Taking MD5 as an example, the algorithm generates a special character string, called the MD5 digest, from a character string or a file according to a fixed rule. The MD5 digest is the statement characteristic value in this embodiment. Different files or character strings in practice yield different MD5 digests, so two files or character strings with the same MD5 digest can be considered the same.
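The digest-based feature extraction described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `statement_feature_value` and the sample purified statements are hypothetical, and Python's standard `hashlib.md5` is assumed as the digest function.

```python
import hashlib


def statement_feature_value(purified_statement: str) -> str:
    """Return the MD5 digest (32 hex characters) of a purified statement request.

    The name and the UTF-8 encoding choice are assumptions for illustration.
    """
    return hashlib.md5(purified_statement.encode("utf-8")).hexdigest()


# Hypothetical purified statements: identical text gives an identical digest,
# different text gives a different digest.
a = statement_feature_value("SELECT*FROMTWHEREID=?")
b = statement_feature_value("SELECT*FROMTWHEREID=?")
c = statement_feature_value("SELECT*FROMUWHEREID=?")
print(a == b, a == c)
```

Because the digest depends only on the purified text, two requests that purify to the same string fall into the same cluster without the strings themselves having to be compared or stored as class names.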
In addition, the MD5 digest of a given file or character string is fixed, so when the content of the file or character string changes (becomes abnormal), the MD5 digest changes as well. Therefore, after MD5 has been applied to the purified statement request to obtain the statement characteristic value, that value can be used as the clustering standard for the statement requests that continue to arrive, and an abnormal statement request in the database server can be located quickly and accurately by checking whether the statement characteristic value of a stored statement request has changed.
In other embodiments, the statement characteristic value may be extracted from the purified statement request by other algorithms and is not limited to MD5 or hash algorithms; the feature extraction algorithm may be chosen freely provided that the statement characteristic value can be obtained accurately.
Statement characteristic values are character strings of some length, and different statement characteristic values may differ in length. If the extracted statement characteristic values were used directly as the class names of the clusters, the class names would be unwieldy, and it would be inconvenient to tell different clusters apart and to divide them.
Therefore, in this embodiment, the extracted statement characteristic values are added in order of first appearance to form the statement identification sequence, and the statement mapping relation between the statement characteristic values and the statement identification sequence is established and stored. That is, once statement clustering has started, a new identifier is created only for a statement characteristic value that appears for the first time; if the same statement characteristic value appears again, it is given the identifier assigned at its first appearance. Consequently, given any statement characteristic value or identifier, the corresponding identifier or statement characteristic value can be looked up quickly and accurately, repeated clustering of the same statement request is avoided, and statement requests are processed faster.
Furthermore, the identifiers in the statement identification sequence may be numbers, letters, special characters or other symbols that can serve as identifiers, such as 1, 2, 3, ... or a, b, c, .... Different statement characteristic values are added automatically and in order to form the statement identification sequence, and identical statement characteristic values are given the same identifier. The mapping relation between the statement identification sequence and the statement characteristic values is thereby established simply, effectively and accurately; the statement characteristic value represented by an identifier can be determined quickly and accurately from the statement identification sequence, which facilitates subsequent processing of the statement identification sequence.
In this embodiment, the statement identification sequence may be an increasing sequence of natural numbers, that is, identifiers are assigned starting from the natural number "1". For example, if the current statement identification sequence is (1) and the next extracted statement characteristic value is different, it is given the natural number "2" and the statement identification sequence is updated to (1/2). As further statement characteristic values are extracted, the sequence keeps growing, giving a statement identification sequence of the form (1/2/3/4 ...) with self-incrementing numbers.
For example, if the statement characteristic value 1 extracted from the 1st statement request is "xxxxxx1", the statement cluster of statement request 1 is "1" and the statement identification sequence is (1). If the statement characteristic value 2 extracted from the 2nd statement request is "xxxxxx2", which differs from statement characteristic value 1, the statement cluster of statement request 2 is "2" and the statement identification sequence is (1/2). If the statement characteristic value 3 extracted from the 3rd statement request is "xxxxxx1", which is the same as statement characteristic value 1, the statement cluster of statement request 3 is "1" and the statement identification sequence is (1/2/1). If the statement characteristic value 4 extracted from the 4th statement request is "xxxxxx3", which differs from statement characteristic values 1 and 2, the statement cluster of statement request 4 is "3" and the statement identification sequence is (1/2/1/3). The statement identification sequence is updated continuously in this way.
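The first-appearance numbering in this example can be sketched with two dictionaries, one per direction of the statement mapping relation. The class name `StatementIdentifier` and its attribute names are hypothetical, and the feature values are placeholders mirroring the example rather than real digests.

```python
class StatementIdentifier:
    """Assign natural-number identifiers to feature values in first-appearance order."""

    def __init__(self):
        self.id_by_feature = {}   # feature value -> identifier
        self.feature_by_id = {}   # identifier -> feature value
        self.sequence = []        # statement identification sequence

    def add(self, feature_value):
        ident = self.id_by_feature.get(feature_value)
        if ident is None:
            # First appearance: allocate the next natural number, starting at 1.
            ident = len(self.id_by_feature) + 1
            self.id_by_feature[feature_value] = ident
            self.feature_by_id[ident] = feature_value
        # Repeated values reuse the identifier from their first appearance.
        self.sequence.append(ident)
        return ident


clusterer = StatementIdentifier()
for fv in ["xxxxxx1", "xxxxxx2", "xxxxxx1", "xxxxxx3"]:
    clusterer.add(fv)
print(clusterer.sequence)  # [1, 2, 1, 3]
```

Keeping both dictionaries makes the lookup cheap in either direction, which is the property the text relies on when it says that an identifier or a characteristic value can each be used to find the other.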
In one embodiment, step S4 may be omitted and the extracted statement characteristic value used directly as the cluster category, which simplifies the statement clustering flow. In yet another embodiment, the mapping relation, the statement identification sequence and the statement characteristic values are all associated and stored together, which facilitates lookup and management.
Example 2
Referring to Fig. 2, step S2 includes the following steps:
S21: traversing each character in the statement request and converting all characters in the statement request to uppercase;
S22: removing the first type of characters from the statement request;
S23: replacing the second type of characters in the statement request with the third type of character;
S24: outputting all remaining characters of the statement request as the purified statement request.
Specifically, after the statement request is obtained, each character in it is traversed and converted to uppercase. The first type of characters is then removed from the statement request. The first type of characters is the noise in the statement request, such as '-', '\n' and 0x20 (the space character); these characters can affect the correctness of the statement request, so removing them denoises it.
After the first type of characters has been removed, the second type of characters in the statement request is replaced with the third type of character. The second type of characters is the values among the characters of the statement request, such as values enclosed in quotation marks, and the third type of character is '?'. The resulting character string is the output purified statement request, and what different statement requests have in common is determined by the characters in their purified statement requests.
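Steps S21 to S24 can be sketched as a small function. This is an illustration under stated assumptions, not the patented procedure: the name `purify` is hypothetical, the first type of characters is taken to be exactly '-', newline and space as listed in the text, and the two regular expressions that decide what counts as a "value" (single-quoted literals and numbers not attached to an identifier) are assumptions, since the text does not define the second type of characters precisely.

```python
import re

FIRST_TYPE = {"-", "\n", " "}                      # noise characters named in the text
QUOTED = re.compile(r"'[^']*'")                    # assumed: single-quoted literals
NUMBER = re.compile(r"(?<![A-Z0-9_])\d+(?:\.\d+)?")  # assumed: stand-alone numbers


def purify(statement: str) -> str:
    upper = statement.upper()                                       # S21
    stripped = "".join(ch for ch in upper if ch not in FIRST_TYPE)  # S22
    replaced = QUOTED.sub("?", stripped)                            # S23
    replaced = NUMBER.sub("?", replaced)
    return replaced                                                 # S24


p1 = purify("select * from t where id = 42")
p2 = purify("SELECT * FROM t\nWHERE id = 7")
p3 = purify("select name from users where name = 'Bob Smith'")
print(p1)  # SELECT*FROMTWHEREID=?
print(p3)  # SELECTNAMEFROMUSERSWHERENAME=?
```

Two requests that differ only in case, spacing or literal values purify to the same string and therefore share a characteristic value. One consequence of removing spaces before replacing values is that a number directly following a keyword (for example after `LIMIT`) becomes attached to it and is not replaced by this sketch.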
Example 3
Referring to Fig. 3, the transaction clustering method in the embodiment of the invention includes the following steps:
S5: receiving input of a transaction characteristic value set contained in a transaction, wherein the transaction characteristic value set is formed by the statement identification sequence obtained by the statement clustering method;
S6: comparing each received transaction characteristic value set in order of first appearance;
S7: clustering identical transaction characteristic value sets into one class when the compared transaction characteristic value sets are the same;
S8: when the compared transaction characteristic value sets are different, adding the different transaction characteristic value sets in order of first appearance to form a transaction characteristic value sequence.
In this embodiment, the statement identification sequence obtained by the statement clustering method is used as the transaction characteristic value set contained in a transaction. Identical transaction characteristic value sets are clustered into the same class, and different transaction characteristic value sets are added to form the transaction characteristic value sequence. The received transactions are thereby clustered effectively, and an effective link is established between a single transaction and the higher-level transaction characteristic value sequence formed by clustering transactions. In one concrete scenario, when operation and maintenance personnel encounter an abnormal transaction, the transaction is processed to obtain its transaction characteristic value set, and the cause of the abnormality is judged by comparing that set with the stored transaction characteristic value sequence.
More importantly, the transaction characteristic value sequence output by transaction clustering can be used to analyze possible abnormalities in transactions with similar characteristic values, and further to analyze development trends and possible abnormalities in other subjects (such as services) formed by transaction characteristic value sequences, which provides observation data to support the handling of abnormal conditions.
Specifically, a transaction is composed of multiple statement requests. As described for statement requests in the statement clustering method embodiment, the statement requests that make up different transactions differ, so the statement characteristic values and statement identification sequences that make up different transactions also differ. The statement characteristic values, and the statement identification sequences derived from them, can therefore serve as the "characteristic values" that distinguish different transaction clusters and are used to confirm and create new transaction clusters.
In this embodiment, once the statement requests have been clustered, and since a transaction contains multiple statement requests, the statement identification sequence of a transaction contains multiple statement characteristic values, that is, it represents multiple statement requests. Therefore, when transactions are clustered, a transaction characteristic value set formed by the statement identification sequence is input directly, which amounts to inputting a complete transaction; the statement identification sequence is the characteristic value that distinguishes different transactions.
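How a transaction's statements turn into a transaction characteristic value set can be sketched end to end. All names here (`id_by_feature`, `statement_id`, `transaction_feature_set`) and the sample SQL are hypothetical, MD5 via `hashlib` is assumed as the feature extraction algorithm, and the purification step is deliberately reduced to uppercasing and removing whitespace as a stand-in for step S2.

```python
import hashlib

id_by_feature = {}  # statement characteristic value -> identifier (first-appearance order)


def statement_id(statement: str) -> int:
    # Simplified stand-in for step S2: uppercase and drop all whitespace.
    purified = "".join(statement.upper().split())
    feature = hashlib.md5(purified.encode("utf-8")).hexdigest()
    # Reuse the identifier if the feature value was seen before, else take the next number.
    return id_by_feature.setdefault(feature, len(id_by_feature) + 1)


def transaction_feature_set(statements):
    """Return the ordered identifiers of a transaction's statements as a tuple."""
    return tuple(statement_id(s) for s in statements)


t1 = transaction_feature_set(["BEGIN", "update acct set bal = ? where id = ?", "COMMIT"])
t2 = transaction_feature_set(["begin", "UPDATE acct SET bal = ?  WHERE id = ?", "commit"])
t3 = transaction_feature_set(["begin", "select 1", "commit"])
print(t1, t2, t3)  # (1, 2, 3) (1, 2, 3) (1, 4, 3)
```

A tuple is used so that the set keeps the order of the statements and can be compared or used as a dictionary key when transactions are clustered.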
Further, by comparing the received transaction characteristic value sets (sentence identification sequences) of each transaction according to the first occurrence sequence, it can be determined whether the same transaction exists at the database server currently, and different transactions can be distinguished or different transaction clusters can be newly created.
If the compared transaction characteristic value sets are the same, namely the transaction characteristic value set of the currently input transaction exists, clustering the transactions represented by the same transaction characteristic value set into the same class, and avoiding repeated addition of the same transaction; if the compared transaction characteristic value sets are different, namely the transaction characteristic value sets of the currently input transaction do not have the same object in the database server, the transactions represented by the different transaction characteristic value sets are added to form a transaction characteristic value sequence according to the first appearance sequence, so that the processing process of the data is clearer and more regular, and the management is convenient.
Notably, the transaction characteristic value sequence can be directly formed by adding the transaction characteristic value set so as to simplify the data processing flow; the method can also be formed by indirectly adding the transaction characteristic value sets, such as the statement clustering method, wherein the set identification is firstly given to the transaction characteristic value sets, and then the identification is added to form the transaction characteristic value sequence, so that the character length of the transaction characteristic value sequence is shortened, the composition of the transaction characteristic value sequence is relatively simple and clear, and the inquiry and the management are convenient.
For steps S7 and S8, for example, suppose the transaction characteristic value set 1 contained in the 1st transaction received by the database server is [1/2/3], that is, the sentence identification sequence forming the transaction characteristic value set 1 is (1/2/3). Since [1/2/3] does not yet exist in the database server, [1/2/3] is added to form the transaction characteristic value sequence {[1/2/3]}, and the cluster of the 1st transaction is the cluster represented by [1/2/3]. If the transaction characteristic value set 2 contained in the 2nd transaction is [1/3/3], that is, the sentence identification sequence forming the transaction characteristic value set 2 is (1/3/3), then [1/3/3] does not exist in the database server either, so [1/3/3] is added to the transaction characteristic value sequence, which now comprises {[1/2/3], [1/3/3]}, and the cluster of the 2nd transaction is the cluster represented by [1/3/3]. If the transaction characteristic value set 3 contained in the 3rd transaction is [1/2/3], that is, the sentence identification sequence forming the transaction characteristic value set 3 is (1/2/3), then [1/2/3] already exists in the database server, so the 3rd transaction is clustered with the 1st transaction and the transaction characteristic value sequence remains {[1/2/3], [1/3/3]}, and so on.
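The comparison described for steps S7 and S8 can be illustrated with a minimal Python sketch. It is only one possible reading of the text, not the claimed implementation: the function name `cluster_transactions`, the use of tuples for transaction characteristic value sets, and the zero-based cluster index are all assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: a transaction characteristic value set is modelled as a tuple
# of statement class identifiers, e.g. (1, 2, 3) for [1/2/3].
FeatureSet = Tuple[int, ...]


def cluster_transactions(
    feature_sets: List[FeatureSet],
) -> Tuple[List[FeatureSet], List[int]]:
    """Return the transaction characteristic value sequence and, for each
    input transaction, the index of the cluster it was placed in."""
    sequence: List[FeatureSet] = []        # distinct sets, first-occurrence order
    position: Dict[FeatureSet, int] = {}   # set -> index in `sequence`
    assignments: List[int] = []
    for feature_set in feature_sets:
        if feature_set not in position:
            # Not seen before: append it to the sequence as a new cluster.
            position[feature_set] = len(sequence)
            sequence.append(feature_set)
        # Seen before (or just added): reuse the existing cluster.
        assignments.append(position[feature_set])
    return sequence, assignments


sequence, assignments = cluster_transactions([(1, 2, 3), (1, 3, 3), (1, 2, 3)])
print(sequence)     # [(1, 2, 3), (1, 3, 3)]
print(assignments)  # [0, 1, 0]
```

With the three transactions of the example, the third transaction is placed in the same cluster as the first and the sequence is not extended.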
In one embodiment, the plurality of sentence feature values received in step S5 may also be a plurality of sentence requests that constitute a transaction, where in the sentence clustering method embodiment, it is not necessary to build a sentence class identification sequence according to the sentence feature values, that is, it is not necessary to construct the identification sequence into a transaction feature value set, so that the data processing flows of the sentence clustering method and the transaction clustering method are simplified.
Example IV
Referring to fig. 4, after step S09, the method includes the steps of:
S10: a corresponding transaction class identifier is given to the transaction characteristic value set, and a transaction mapping relation between the transaction characteristic value set and the transaction class identifier is established;
S11: the transaction class identifiers are added in the order of first occurrence to form a transaction class identifier sequence.
Specifically, the transaction identifier may be a number, a letter, a special character, or the like with an identifying function, such as 01, 02, 03 … …, A, B, C … …, or the like, and different transaction identifiers are automatically and orderly assigned to different transaction characteristic value sets, and the same transaction characteristic value set is assigned to the same transaction identifier, so that a mapping relationship between the transaction identifier and the transaction characteristic value set is simply, effectively and accurately established, and the transaction characteristic value set represented by the transaction identifier can be quickly and accurately determined through the transaction identifier, so that other data processing processes performed for the transaction identifier are facilitated.
In the embodiment of the invention, the transaction class identifier is a natural number and the transaction class identifier sequence is an incrementing sequence of natural numbers. To distinguish it from the statement class identification sequence in the statement clustering method, the transaction class identifiers may start from the natural number "01": the first transaction characteristic value set is given "01", and each time a different transaction characteristic value set is received the transaction class identifier is incremented, so that the transaction class identifier sequence {01/02/03/04 ……} is obtained by updating with auto-incrementing sequence numbers.
For example, the transaction characteristic value set 1 of the 1st received transaction (transaction 1) is [1/2/3]; the transaction class identifier "01" is given to the transaction characteristic value set 1, the mapping relationship between the two is [1/2/3] ↔ 01, the transaction cluster of transaction 1 is 01, and the transaction class identifier sequence formed at this point is {01}. The transaction characteristic value set 2 of the 2nd received transaction (transaction 2) is [1/3/3], which is different from the transaction characteristic value set 1, so the transaction class identifier "02" is given to the transaction characteristic value set 2, the mapping relationship between the two is [1/3/3] ↔ 02, the transaction cluster of transaction 2 is 02, and the transaction class identifier sequence becomes {01/02}. The transaction characteristic value set 3 of the 3rd received transaction (transaction 3) is [1/2/3], which is the same as the transaction characteristic value set 1 of transaction 1, so the transaction class identifier "01" is given to the transaction characteristic value set 3, the mapping relationship is [1/2/3] ↔ 01, the transaction cluster of transaction 3 is 01, and the transaction class identifier sequence remains {01/02}. As further different transaction characteristic value sets are extracted, the transaction class identifiers continue to be incremented, giving for example {01/02/03/04 ……}.
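The assignment of transaction class identifiers in steps S10 and S11 might be sketched as follows. This is a hedged illustration only: the function name `assign_transaction_ids`, the dictionary used for the mapping relationship, and the two-digit zero-padded format are assumptions chosen to reproduce the "01", "02" identifiers of the example.

```python
from typing import Dict, List, Tuple

FeatureSet = Tuple[int, ...]  # assumed model of a transaction characteristic value set


def assign_transaction_ids(
    feature_sets: List[FeatureSet],
) -> Tuple[Dict[FeatureSet, str], List[str], List[str]]:
    """Return (mapping, transaction class identifier sequence, cluster of
    each input transaction)."""
    mapping: Dict[FeatureSet, str] = {}  # set -> transaction class identifier
    id_sequence: List[str] = []          # identifiers in first-occurrence order
    clusters: List[str] = []             # identifier assigned to each transaction
    for feature_set in feature_sets:
        if feature_set not in mapping:
            # A new set receives the next auto-incremented identifier.
            mapping[feature_set] = f"{len(mapping) + 1:02d}"
            id_sequence.append(mapping[feature_set])
        # An already known set reuses the identifier from its first occurrence.
        clusters.append(mapping[feature_set])
    return mapping, id_sequence, clusters


mapping, id_sequence, clusters = assign_transaction_ids(
    [(1, 2, 3), (1, 3, 3), (1, 2, 3)]
)
print(mapping)      # {(1, 2, 3): '01', (1, 3, 3): '02'}
print(id_sequence)  # ['01', '02']
print(clusters)     # ['01', '02', '01']
```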
In the embodiment of the invention, the transaction identification, the transaction characteristic value set, the transaction identification sequence and the mapping relation can be associated and stored, so that the transaction identification, the transaction characteristic value set, the transaction identification sequence and the mapping relation are convenient to compare and cluster with other transactions, the character length is shorter, and the inquiry, comparison and management are more clear.
Example five
Referring to fig. 5, step S5 includes the steps of:
S51: determining a transaction start identifier and a transaction end identifier;
S52: acquiring a statement class identification sequence between the transaction start identification and the transaction end identification and forming a transaction characteristic value set.
It will be appreciated that in operation of the database server, a complete transaction includes a start time point and an end time point, and statement requests between the start time point and the end time point constitute a complete transaction. Therefore, in the embodiment of the present invention, the start identifier and the end identifier of a transaction are first determined, which is equivalent to acquiring the identifiers of the start time point and the end time point of a transaction, that is, the start time point and the end time point of the transaction are confirmed by the start identifier and the end identifier of the transaction.
The statement class identification sequence between the transaction start identification and the transaction end identification is then acquired. Since the statement class identification sequence is built from the statement characteristic values of a plurality of statement requests, it stands for those statement requests. Therefore, step S52 may be understood as acquiring the statement requests between the start time point and the end time point of a transaction (including the statement requests made at the start time point and at the end time point); all of these statement requests form one transaction characteristic value set and thus one complete transaction, which avoids omitting statement requests of the transaction, ensures the integrity of the transaction, and allows the transactions to be clustered accurately and effectively.
In addition, in the embodiment of the invention, the transaction start mark is a start sentence, the transaction end mark is an end sentence, and when a transaction starts to be performed or is about to be ended, marks (such as special characters) for identification are added into sentence requests under the start time point and the end time point to divide a complete transaction, so that the start sentence and the end sentence can be conveniently distinguished. The transaction itself is composed of a plurality of statement requests, the starting statement and the ending statement are respectively the starting and the ending of a transaction, and the starting statement and the ending statement of the transaction are respectively used as the transaction starting identifier and the transaction ending identifier in the embodiment of the invention, so that the complete transaction is conveniently and accurately identified without additional identifiers.
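Steps S51 and S52 can be sketched in Python as follows. The sketch assumes, purely for illustration, that the start sentence is the literal text `BEGIN`, that the end sentence is `COMMIT`, and that the input is a stream of (purified statement text, statement class identifier) pairs; none of these details is specified in the description, and the function name `split_transactions` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

START_STATEMENT = "BEGIN"   # assumed transaction start identification
END_STATEMENT = "COMMIT"    # assumed transaction end identification


def split_transactions(
    stream: Iterable[Tuple[str, int]],
) -> List[Tuple[int, ...]]:
    """Group statement class identifiers into transaction characteristic
    value sets, including the start and end statements themselves."""
    transactions: List[Tuple[int, ...]] = []
    current: Optional[List[int]] = None  # None while outside a transaction
    for text, class_id in stream:
        if text == START_STATEMENT:
            current = [class_id]                 # open a new transaction
        elif current is not None:
            current.append(class_id)
            if text == END_STATEMENT:
                transactions.append(tuple(current))  # close the transaction
                current = None
        # Statements outside any transaction are ignored in this sketch.
    return transactions


stream = [
    ("BEGIN", 1), ("SELECT?", 2), ("COMMIT", 4),
    ("BEGIN", 1), ("UPDATE?", 3), ("COMMIT", 4),
]
transactions = split_transactions(stream)
print(transactions)  # [(1, 2, 4), (1, 3, 4)]
```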
Example six
Referring to fig. 6, a sentence clustering apparatus 10 according to an embodiment of the present invention includes:
A sentence request receiving unit 11, configured to receive a sentence request input in real time;
A sentence request processing unit 12, configured to denoise and normalize the sentence request to obtain a purified sentence request;
A sentence characteristic value extracting unit 13, configured to extract a sentence characteristic value in the purified sentence request through a characteristic extraction algorithm;
A sentence identification sequence forming unit 14, configured to add the sentence characteristic values in the order of first occurrence to form a sentence identification sequence, and establish a sentence mapping relationship between the two.
In the embodiment of the invention, a sentence identification sequence for clustering the sentence requests stored in the database server is obtained. When the database server receives or stores a large amount of data and a data abnormality occurs, the sentence characteristic value of the abnormal data can be determined and compared with the sentence identification sequence stored in the database server. If the sentence characteristic value of the abnormal data matches an entry in the compared sentence identification sequence, the sentence cluster of the abnormal data is confirmed, the specific position of the abnormal data is then located, and the cause of the abnormality is further analyzed, so that the data abnormality is resolved rapidly, accurately and effectively. At the same time, it can be judged whether the same sentence request already exists at the database server, so that different sentence requests can be distinguished or new sentence clusters can be created.
The statement clustering method of the embodiment of the invention is applied to a database server, the data in the database server is composed of statement requests, the data written in the following can be understood as statement requests, the statement requests are structured query language (Structured Query Language, SQL) requests for the database server, and the statement requests are commonly used for accessing data and querying, updating and managing a relational database system. The clustering is a process of gathering statement requests stored in the database server into one class according to a certain condition so as to automatically divide different statement requests received by the database server into a plurality of classes.
Specifically, after the database server is started, the receipt of the statement request sent by the client is performed in real time, after the database server receives and processes the statement request, response data (answer statement request) is returned to the client for the statement request, and clustering is needed for the data generated by the interaction between the database server and the client to determine whether the abnormal data is from the database server or the client in the interaction process between the database server and the client, and further, the specific type of the abnormal data can be determined to help operation and maintenance personnel to quickly locate the cause of the data abnormality.
It can be appreciated that noise often exists in a sentence request, which affects the database server's judgment of the sentence request and the subsequent clustering process. Therefore, in the embodiment of the invention, after a sentence request is received, it is first denoised and normalized. Denoising removes the noise in the sentence request so that its format is more regular and standardized, which improves the efficiency of subsequent processing. After the noise is removed, gaps and spaces remain in the sentence request, so normalization makes all the characters in the sentence request contiguous and forms one continuous character string, which is the purified (clean) SQL request. Through step S2, the noise in the sentence request is removed and the character string is made continuous and complete, which improves the processing speed of the sentence request and ensures the normal operation of the processing.
Since a sentence request is a character string composed of a plurality of characters written in a computer language, the character strings of different sentence requests are different, and so are the characters in them. Therefore, when sentence requests need to be clustered, the characters, the fields, and the character strings formed by the characters or fields in the sentence requests serve as the distinguishing points between different sentence requests; they can be understood as the characteristic values that distinguish different sentence requests, namely the sentence characteristic values. Therefore, in the embodiment of the invention, after the purified sentence request is obtained, the sentence characteristic value in it is extracted and stored through the characteristic extraction algorithm, and each received sentence request is clustered with the sentence characteristic value as the clustering criterion and added to a specific storage space, which facilitates management and searching.
In one embodiment, the feature extraction algorithm may be an algorithm with an information digest function, such as the MD5 algorithm or another hash algorithm. Taking the MD5 algorithm as an example, a special character string may be generated from a character string or a file according to a fixed rule by the MD5 algorithm. This special character string is called the MD5 digest, and the MD5 digest is the sentence feature value in the embodiment of the present invention. Different files or character strings generally have different MD5 digests, and if the MD5 digests of two files or character strings are the same, the two files or character strings may be considered to be the same.
In addition, since the MD5 digest corresponding to a file or a character string is fixed, when the content of the file or the character string changes (is abnormal), the MD5 digest is different, so after the MD5 algorithm processes the purified sentence request to obtain the sentence feature value, the abnormal sentence request in the database server can be rapidly and accurately located by using the sentence feature value as a clustering standard for the continuously received sentence request and by judging whether the sentence feature value of the sentence request stored in the database server is changed.
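As an illustration of using an MD5 digest as the sentence feature value, the following Python sketch relies on the standard-library `hashlib` module. The function name `statement_feature_value` and the UTF-8 encoding of the purified request are assumptions; the description does not prescribe them.

```python
import hashlib


def statement_feature_value(purified_request: str) -> str:
    """Return the MD5 digest (32 hexadecimal characters) of a purified
    sentence request, used here as its sentence feature value."""
    # Assumption: the purified request is encoded as UTF-8 before hashing.
    return hashlib.md5(purified_request.encode("utf-8")).hexdigest()


first = statement_feature_value("SELECT*FROMUSERSWHEREID=?")
second = statement_feature_value("SELECT*FROMUSERSWHEREID=?")
other = statement_feature_value("DELETEFROMUSERSWHEREID=?")

print(first == second)  # True: identical purified requests share a digest
print(first == other)   # False: different requests give different digests here
print(len(first))       # 32
```

MD5 is used in this sketch only as a fingerprint for grouping identical strings; it is not collision-resistant in the cryptographic sense, so another digest from `hashlib` could be substituted where that matters.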
In other embodiments, the statement feature value in the purified statement request may be extracted by other algorithms, which are not limited to the above-mentioned MD5 algorithm and Hash algorithm, and the feature value extraction algorithm for processing the purified statement request may be specifically selected on the premise of ensuring that the statement feature value can be accurately obtained.
It can be understood that, since the sentence feature values are character strings with a certain length, and the character string lengths of different sentence feature values are different, if the extracted sentence feature values are directly used as the class names of the clusters, the class names are disordered, and meanwhile, the distinction between different clusters is inconvenient to distinguish, and the division between different clusters is inconvenient.
Therefore, in the embodiment of the invention, the extracted sentence characteristic values are added in the order of first occurrence to form the sentence identification sequence, and the sentence mapping relationship between the sentence characteristic values and the sentence identification sequence is established and stored. That is, after sentence clustering starts, only a sentence characteristic value that appears for the first time is given a new identifier in the sentence identification sequence; if the same sentence characteristic value appears again, it is assigned the identifier given at its first occurrence. In this way, from any sentence characteristic value or identifier, the corresponding identifier or sentence characteristic value can be queried quickly and accurately, repeated clustering of the same sentence request is avoided, and the processing speed of sentence requests is improved.
Furthermore, the identifiers in the sentence identification sequence may consist of numbers, letters, special characters, or the like with an identifying function, such as 1, 2, 3 ……, a, b, c ……, and so on. Different sentence characteristic values are automatically added in order to form the sentence identification sequence, and identical sentence characteristic values are given the same identifier, so that the mapping relationship between the sentence identification sequence and the sentence characteristic values is established simply, effectively and accurately; the sentence characteristic value represented by an identifier can be determined rapidly and accurately through the sentence identification sequence, which facilitates subsequent data processing performed on the sentence identification sequence.
In the embodiment of the invention, the sentence identification sequence may be an incrementing sequence of natural numbers. That is, identifiers may be bound starting from the natural number "1": the first sentence characteristic value is given "1", so the sentence identification sequence is (1); if a subsequently extracted sentence characteristic value is different, it is given the natural number "2" and the sentence identification sequence is updated to (1/2); the identifiers then continue to be incremented as auto-incrementing sequence numbers, so that the sentence identification sequence is updated to (1/2/3/4 ……).
For example, if the extracted sentence characteristic value 1 of the 1st sentence request (sentence request 1) is "xxxxxx1", the sentence cluster of sentence request 1 is "1", and the sentence identification sequence is (1). If the extracted sentence characteristic value 2 of the 2nd sentence request (sentence request 2) is "xxxxxx2", which is different from sentence characteristic value 1, the sentence cluster of sentence request 2 is "2", and the sentence identification sequence is (1/2). If the extracted sentence characteristic value 3 of the 3rd sentence request (sentence request 3) is "xxxxxx1", which is the same as sentence characteristic value 1, the sentence cluster of sentence request 3 is "1", and the sentence identification sequence is (1/2/1). If the extracted sentence characteristic value 4 of the 4th sentence request (sentence request 4) is "xxxxxx3", which is different from sentence characteristic value 1 and sentence characteristic value 2, the sentence cluster of sentence request 4 is "3", and the sentence identification sequence is (1/2/1/3), and so on, so that the sentence identification sequence is continuously updated.
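The worked example above can be reproduced with a short Python sketch. The function name `assign_statement_ids` and the placeholder feature values `"v1"`, `"v2"`, `"v3"` are hypothetical; in practice the feature values would be the digests produced by the feature extraction algorithm.

```python
from typing import Dict, List, Tuple


def assign_statement_ids(
    feature_values: List[str],
) -> Tuple[Dict[str, int], List[int]]:
    """Return (sentence mapping relationship, sentence identification
    sequence) for a list of sentence feature values in arrival order."""
    mapping: Dict[str, int] = {}  # feature value -> cluster identifier
    id_sequence: List[int] = []   # identifier of each request, in order
    for value in feature_values:
        if value not in mapping:
            # First occurrence: give the next natural number, starting at 1.
            mapping[value] = len(mapping) + 1
        # Repeated value: reuse the identifier from its first occurrence.
        id_sequence.append(mapping[value])
    return mapping, id_sequence


mapping, id_sequence = assign_statement_ids(["v1", "v2", "v1", "v3"])
print(mapping)      # {'v1': 1, 'v2': 2, 'v3': 3}
print(id_sequence)  # [1, 2, 1, 3]
```

Because the mapping is kept in both directions implicitly (the dictionary and its insertion order), either a feature value or an identifier can be looked up without rescanning earlier requests.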
In one embodiment, the extracted sentence characteristic value can be directly used as a specific category of the clustering, so that the sentence clustering flow is simplified. In yet another embodiment, the mapping relation, the sentence identification sequence and the sentence characteristic value are all associated and stored together, so that the inquiry and the management are facilitated.
Example seven
Referring to fig. 7, the sentence request processing unit 12 includes:
A character conversion module 121, configured to traverse each character in the sentence request and convert all characters in the sentence request into uppercase;
A character rejecting module 122, configured to reject a first type of character in the sentence request;
A character replacing module 123, configured to replace the second type character in the sentence request with a third type character;
The character output module 124 is configured to output all the remaining characters of the sentence request as a clean sentence request.
Specifically, after the sentence request is obtained, each character in the sentence request is first traversed and converted into uppercase, and the first type of characters in the sentence request is then removed. The first type of characters is the noise in the sentence request, such as '-', '\n', and 0x20 (that is, the space character). These characters can affect the correctness of the sentence request, so they are removed in order to denoise the sentence request.
After the first type of characters is removed, the second type of characters in the sentence request is replaced with the third type of character. The second type of characters is the values among all the characters of the sentence request, such as ', ', and the third type of character is '?'. The resulting character string is the output purified sentence request, and the common points of different sentence requests are determined from the characters in the purified sentence request.
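The four modules of the sentence request processing unit can be sketched in Python as below. The first-type characters are the three named in the text; what counts as a "value" (second-type character) is not fully specified, so this sketch assumes single-quoted string literals and numeric literals. The names `NOISE_CHARS`, `VALUE_PATTERN` and `purify` are hypothetical.

```python
import re

# First-type (noise) characters named in the description: '-', '\n', space.
NOISE_CHARS = "-\n "

# Assumption: "values" are single-quoted string literals or numeric literals.
VALUE_PATTERN = re.compile(r"'[^']*'|\d+(?:\.\d+)?")


def purify(request: str) -> str:
    """Uppercase the request, remove noise characters, and replace every
    value with the third-type character '?'."""
    text = request.upper()                                        # module 121
    text = "".join(ch for ch in text if ch not in NOISE_CHARS)    # module 122
    return VALUE_PATTERN.sub("?", text)                           # modules 123/124


a = purify("select * from users where id = 42 and name = 'bob'")
b = purify("SELECT * FROM users\nWHERE id = 7 AND name = 'al-ice'")
print(a)       # SELECT*FROMUSERSWHEREID=?ANDNAME=?
print(a == b)  # True
```

Two requests that differ only in case, spacing and literal values reduce to the same purified string, which is what lets them fall into the same sentence cluster. Under the assumed value pattern, digits inside identifiers would also be replaced; a real implementation would need a more careful definition of values.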
Example eight
Referring to fig. 8, a transaction clustering device 20 according to an embodiment of the present invention includes:
A transaction characteristic value set receiving unit 21 for receiving an input of a transaction characteristic value set contained in a transaction, the transaction characteristic value set being formed by the sentence identification sequence obtained by the sentence clustering device 10;
A transaction characteristic value set comparing unit 22 for comparing each received transaction characteristic value set in the order of first occurrence;
A transaction characteristic value set clustering unit 23, configured to cluster the same transaction characteristic value set into a class when the compared transaction characteristic value sets are the same;
A transaction characteristic value sequence forming unit 24, configured to, when the compared transaction characteristic value sets are different, add the different transaction characteristic value sets in the order of first occurrence to form a transaction characteristic value sequence.
In the embodiment of the invention, the sentence identification sequence obtained by the sentence clustering method is used as the transaction characteristic value set contained in a transaction, identical transaction characteristic value sets are clustered into the same class, and different transaction characteristic value sets are added to form the transaction characteristic value sequence, so that the received transactions are clustered effectively and an effective link is established between a single transaction and the higher-level transaction characteristic value sequence produced by transaction clustering. In one specific implementation scenario of transaction clustering, when an operator encounters an abnormal transaction, the transaction characteristic value set of the abnormal transaction is obtained by processing, and the cause of the abnormality is judged by comparing that transaction characteristic value set with the stored transaction characteristic value sequence.
More importantly, based on the transaction characteristic value sequence output by transaction clustering, possible abnormal conditions of transactions with similar characteristic values can be analyzed and judged, and the development trends, possible abnormal problems and the like of other subjects (such as businesses) formed by the transaction characteristic value sequences can be further analyzed and judged, thereby providing effective observational data support for handling abnormal conditions.
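The lookup used in the anomaly scenario might be sketched as follows. It assumes the stored transaction characteristic value sequence is a list of tuples in first-occurrence order; the function name `find_cluster` is hypothetical, and how the cause of the abnormality is then analyzed is outside the scope of this sketch.

```python
from typing import List, Optional, Tuple

FeatureSet = Tuple[int, ...]  # assumed model of a transaction characteristic value set


def find_cluster(
    stored_sequence: List[FeatureSet], abnormal_set: FeatureSet
) -> Optional[int]:
    """Return the position of the abnormal transaction's characteristic
    value set in the stored sequence, or None if it was never clustered."""
    try:
        return stored_sequence.index(abnormal_set)
    except ValueError:
        return None


stored = [(1, 2, 3), (1, 3, 3)]
print(find_cluster(stored, (1, 3, 3)))  # 1
print(find_cluster(stored, (2, 2, 2)))  # None
```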
Specifically, a transaction is composed of a plurality of sentence requests. From the description of sentence requests in the sentence clustering method embodiment, it is known that the sentence requests forming different transactions are different, so both the sentence feature values and the sentence class identification sequences of different transactions are different. The sentence feature values, and the sentence class identification sequences derived from them, can therefore be used as "feature values" for distinguishing different transaction clusters and for confirming and establishing new transaction clusters.
In the embodiment of the invention, a transaction contains a plurality of statement requests, so after the statement requests are clustered, the statement class identification sequence of a transaction corresponds to a plurality of statement characteristic values; in other words, the statement class identification sequence represents the plurality of statement requests of the transaction. Therefore, when the transactions are clustered, the transaction characteristic value set formed by the statement class identification sequence is input directly, which is equivalent to inputting a complete transaction, and the statement class identification sequence serves as the characteristic value for distinguishing different transactions.
Further, by comparing the received transaction characteristic value sets (sentence identification sequences) of each transaction according to the first occurrence sequence, it can be determined whether the same transaction exists at the database server currently, and different transactions can be distinguished or different transaction clusters can be newly created.
If the compared transaction characteristic value sets are the same, namely the transaction characteristic value set of the currently input transaction exists, clustering the transactions represented by the same transaction characteristic value set into the same class, and avoiding repeated addition of the same transaction; if the compared transaction characteristic value sets are different, namely the transaction characteristic value sets of the currently input transaction do not have the same object in the database server, the transactions represented by the different transaction characteristic value sets are added to form a transaction characteristic value sequence according to the first occurrence sequence, so that the processing process of the data is clearer and more regular, and the management is convenient.
Notably, the transaction characteristic value sequence can be directly formed by adding the transaction characteristic value set so as to simplify the data processing flow; the method can also be formed by indirectly adding the transaction characteristic value sets, such as the statement clustering method, wherein the set identification is firstly given to the transaction characteristic value sets, and then the identification is added to form the transaction characteristic value sequence, so that the character length of the transaction characteristic value sequence is shortened, the composition of the transaction characteristic value sequence is relatively simple and clear, and the inquiry and the management are convenient.
For example, suppose the transaction characteristic value set 1 contained in the 1st transaction received by the database server is [1/2/3], that is, the sentence identification sequence forming the transaction characteristic value set 1 is (1/2/3). Since [1/2/3] does not yet exist in the database server, [1/2/3] is added to form the transaction characteristic value sequence {[1/2/3]}, and the cluster of the 1st transaction is the cluster represented by [1/2/3]. If the transaction characteristic value set 2 contained in the 2nd transaction is [1/3/3], that is, the sentence identification sequence forming the transaction characteristic value set 2 is (1/3/3), then [1/3/3] does not exist in the database server either, so [1/3/3] is added to the transaction characteristic value sequence, which now comprises {[1/2/3], [1/3/3]}, and the cluster of the 2nd transaction is the cluster represented by [1/3/3]. If the transaction characteristic value set 3 contained in the 3rd transaction is [1/2/3], that is, the sentence identification sequence forming the transaction characteristic value set 3 is (1/2/3), then [1/2/3] already exists in the database server, so the 3rd transaction is clustered with the 1st transaction and the transaction characteristic value sequence remains {[1/2/3], [1/3/3]}, and so on.
In one embodiment, the transaction characteristic value set receiving unit 21 may also receive a plurality of statement characteristic values of a plurality of statement requests forming a transaction, and at this time, it is unnecessary to establish a statement class identification sequence according to the statement characteristic values, that is, it is unnecessary to form the statement class identification sequence into a transaction characteristic value set, so as to simplify the data processing flows of the statement clustering device and the transaction clustering device.
Example nine
Referring to fig. 9, the transaction clustering apparatus 20 further includes:
A transaction class identifier giving unit 25, configured to give a corresponding transaction class identifier to the transaction characteristic value set, and establish a transaction mapping relationship between the transaction class identifier and the transaction characteristic value set;
A transaction class identification sequence forming unit 26, configured to add the transaction class identifiers in the order of first occurrence to form a transaction class identification sequence.
Specifically, the transaction identifier may be a number, a letter, a special character, or the like with an identifying function, such as 01, 02, 03 … …, A, B, C … …, or the like, and different transaction identifiers are automatically and orderly assigned to different transaction characteristic value sets, and the same transaction characteristic value set is assigned to the same transaction identifier, so that a mapping relationship between the transaction identifier and the transaction characteristic value set is simply, effectively and accurately established, and the transaction characteristic value set represented by the transaction identifier can be quickly and accurately determined through the transaction identifier, so that other data processing processes performed for the transaction identifier are facilitated.
In the embodiment of the invention, the transaction class identifier is a natural number and the transaction class identification sequence is an increasing sequence of natural numbers. To distinguish it from the statement class identification sequence in the statement clustering method, transaction class identifiers may be assigned to transaction characteristic value sets starting from the natural number "01". Each time a different transaction characteristic value set is received, the identifier is incremented, so that the transaction class identification sequence {01/02/03/04 ……} is obtained by updating with self-incrementing sequence numbers.
For example, the transaction characteristic value set 1 of the 1st received transaction (transaction 1) is [1/2/3]. Transaction class identifier "01" is given to transaction characteristic value set 1, the mapping relationship between transaction characteristic value set 1 and its transaction class identifier is [1/2/3] → 01, the transaction cluster of transaction 1 is 01, and the transaction class identification sequence formed at this time is {01}. The transaction characteristic value set 2 of the 2nd received transaction (transaction 2) is [1/3/3], which differs from transaction characteristic value set 1. Transaction class identifier "02" is given to transaction characteristic value set 2, the mapping relationship between transaction characteristic value set 2 and its transaction class identifier is [1/3/3] → 02, the transaction cluster of transaction 2 is 02, and the transaction class identification sequence becomes {01/02}. The transaction characteristic value set 3 of the 3rd received transaction (transaction 3) is [1/2/3], which is the same as transaction characteristic value set 1 of transaction 1. Transaction class identifier "01" is therefore given to transaction characteristic value set 3, the mapping relationship is [1/2/3] → 01, the transaction cluster of transaction 3 is 01, and the transaction class identification sequence remains {01/02}. As further different transaction characteristic value sets are received, the transaction class identifiers continue to increment, forming for example {01/02/03/04 ……}.
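The identifier assignment walked through above can be sketched as follows. This is a hedged illustration only: the class name `TransactionClassRegistry`, the two-digit zero-padded string format, and the in-memory dictionary are assumptions, not details confirmed by the source beyond the "01", "02" examples.

```python
from typing import Dict, List, Tuple

FeatureSet = Tuple[int, ...]  # assumed representation, e.g. (1, 2, 3) for [1/2/3]


class TransactionClassRegistry:
    """Assigns self-incrementing class identifiers to distinct characteristic value sets."""

    def __init__(self) -> None:
        # Transaction mapping relationship: characteristic value set -> class identifier.
        self.mapping: Dict[FeatureSet, str] = {}
        # Transaction class identification sequence, in order of first appearance.
        self.id_sequence: List[str] = []

    def assign(self, feature_set: FeatureSet) -> str:
        """Return the existing identifier for a known set, or mint the next one."""
        if feature_set not in self.mapping:
            # Two-digit zero padding is an assumption based on the "01", "02" examples.
            new_id = f"{len(self.id_sequence) + 1:02d}"
            self.mapping[feature_set] = new_id
            self.id_sequence.append(new_id)
        return self.mapping[feature_set]


registry = TransactionClassRegistry()
assigned = [registry.assign(fs) for fs in [(1, 2, 3), (1, 3, 3), (1, 2, 3)]]
print(assigned)              # ['01', '02', '01']
print(registry.id_sequence)  # ['01', '02']
```

Storing the mapping alongside the identifier sequence keeps later comparisons to a short string lookup rather than a comparison of full characteristic value sets.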
In the embodiment of the invention, the transaction class identification sequence formed by adding the transaction class identifiers can be stored and used for comparison and clustering with other transactions. Because its character length is short, it is convenient to query, compare and manage.
In the embodiment of the invention, the transaction class identifier, the transaction characteristic value set, the transaction class identification sequence and the mapping relationship can be stored in association with one another, which makes comparison and clustering with other transactions convenient and makes querying, comparison and management clearer.
Example ten
Referring to fig. 10, the transaction characteristic value set receiving unit 21 includes:
a transaction identifier determining module 211, configured to determine a transaction start identifier and a transaction end identifier, wherein the transaction start identifier is a start statement and the transaction end identifier is an end statement;
a transaction characteristic value set forming module 212, configured to obtain the statement class identification sequence between the transaction start identifier and the transaction end identifier and form a transaction characteristic value set.
It will be appreciated that, during operation of the database server, a complete transaction has a start time point and an end time point, and the statement requests between the start time point and the end time point constitute the complete transaction. Therefore, in the embodiment of the present invention, the start identifier and the end identifier of a transaction are determined first. This is equivalent to obtaining markers for the start time point and the end time point of the transaction, that is, the start and end time points of the transaction are confirmed by its start identifier and end identifier.
The statement class identification sequence between the transaction start identifier and the transaction end identifier is then obtained. Because the statement class identification sequence is formed from the statement characteristic values of a plurality of statement requests, it stands in for those statement requests. Obtaining the statement class identification sequence between the transaction start identifier and the transaction end identifier and forming the transaction characteristic value set can therefore be understood as obtaining all statement requests between the start time point and the end time point of a transaction (including the statement requests made at the start and end time points themselves) and forming the transaction characteristic value set from all of them. This yields a complete transaction, avoids omitting any statement request in the transaction, ensures the integrity of the transaction, and allows transactions to be clustered accurately and effectively.
In addition, in the embodiment of the invention, the transaction start identifier is a start statement and the transaction end identifier is an end statement. One option is to add marks (such as special characters) to the statement requests at the start and end time points when a transaction begins or is about to end, so as to delimit a complete transaction and make the start statement and the end statement easy to distinguish. However, a transaction itself consists of a plurality of statement requests, of which the start statement and the end statement are the beginning and the end of the transaction. The embodiment of the invention therefore uses the start statement and the end statement of the transaction directly as the transaction start identifier and the transaction end identifier, so that a complete transaction can be identified conveniently and accurately without additional identifiers.
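Forming a transaction characteristic value set from the statements between a start statement and an end statement might look like the sketch below. The source does not name the start and end statements, so the `BEGIN`, `START TRANSACTION`, `COMMIT` and `ROLLBACK` markers, the function name `split_transactions`, and the (statement, class identifier) input format are all assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed markers: the source only states that the start and end identifiers
# are the start statement and the end statement of the transaction.
START_STATEMENTS = {"BEGIN", "START TRANSACTION"}
END_STATEMENTS = {"COMMIT", "ROLLBACK"}


def split_transactions(stream: Iterable[Tuple[str, int]]) -> Iterator[Tuple[int, ...]]:
    """Yield one characteristic value set per complete transaction.

    `stream` holds (purified statement, statement class identifier) pairs.
    The class identifiers of the start and end statements are included,
    matching the text's note that statements at both time points belong
    to the transaction. Statements outside any transaction are skipped.
    """
    current: List[int] = []
    in_transaction = False
    for statement, class_id in stream:
        if statement in START_STATEMENTS:
            current = [class_id]
            in_transaction = True
        elif in_transaction:
            current.append(class_id)
            if statement in END_STATEMENTS:
                yield tuple(current)
                in_transaction = False


stream = [
    ("BEGIN", 1), ("SELECT", 2), ("UPDATE", 3), ("COMMIT", 4),
    ("BEGIN", 1), ("UPDATE", 3), ("UPDATE", 3), ("COMMIT", 4),
]
feature_sets = list(split_transactions(stream))
print(feature_sets)  # [(1, 2, 3, 4), (1, 3, 3, 4)]
```

Because the delimiters are ordinary statements, no extra marker has to be injected into the request stream; the trade-off is that the sketch relies on the delimiting statements being recognizable after purification.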
The foregoing describes only preferred embodiments of the invention and is not intended to limit it; all modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be covered.

Claims (4)

1. A transaction clustering method, comprising:
receiving input of a transaction characteristic value set contained in a transaction, wherein the transaction characteristic value set is formed by a statement identification sequence obtained by a statement clustering method;
Comparing each received set of transaction characteristic values in the order of first occurrence;
Clustering the same transaction characteristic value sets into one type when the compared transaction characteristic value sets are the same;
When the compared transaction characteristic value sets are different, adding the different transaction characteristic value sets according to the first appearance sequence to form a transaction characteristic value sequence;
wherein, when an abnormal condition occurs in a transaction, the cause of the abnormality can be determined by comparing the transaction characteristic value set of the abnormal transaction with the stored transaction characteristic value sequence, and abnormal conditions that may occur in transactions with similar characteristic values can be analyzed and judged according to the transaction characteristic value sequence;
the receiving an input of a set of transaction characteristic values contained within a transaction, comprising:
Determining a transaction start identifier and a transaction end identifier;
acquiring the statement class identification sequence between the transaction start identification and the transaction end identification and forming a transaction characteristic value set;
the transaction start mark is a start statement, and the transaction end mark is an end statement;
The sentence clustering method comprises the following steps:
Receiving statement request input in real time;
Denoising and normalizing the statement request to obtain a purified statement request, wherein the purified statement request is a continuous character string after removing noise;
Extracting statement feature values in the purified statement request through a feature extraction algorithm, wherein the statement feature values are characters, fields or character strings in the statement request;
Adding the sentence characteristic values according to the first appearance sequence to form a sentence identification sequence, establishing a sentence mapping relation between the sentence characteristic values and the sentence identification sequence, and if the same sentence characteristic values appear later, adding the sentence characteristic values according to the sentence identification sequence corresponding to the first appearance sequence, and storing the sentence characteristic values, the sentence identification sequence and the mapping relation in an associated manner;
the step of denoising and normalizing the statement request to obtain a purified statement request comprises the following steps:
Traversing each character in the sentence request, and converting all characters in the sentence request into capitalization;
Rejecting the first type of characters in the sentence request;
Replacing the second type characters in the sentence request with third type characters;
And outputting all the remaining characters of the sentence request as a clean sentence request.
2. The transaction clustering method of claim 1, wherein, when the transaction feature value sets that are compared are different, adding the different transaction feature value sets in order of first occurrence to form a transaction feature value sequence, comprises:
Assigning corresponding transaction class identifiers for the transaction characteristic value sets, and establishing a transaction mapping relation between the transaction characteristic value sets and the transaction class identifiers;
And adding the transaction type identifiers according to the first appearance sequence to form a transaction type identifier sequence.
3. A transaction clustering device, comprising:
the transaction characteristic value set receiving unit is used for receiving the input of a transaction characteristic value set contained in a transaction, wherein the transaction characteristic value set is formed by a statement identification sequence obtained by a statement clustering device;
a transaction characteristic value set comparison unit, configured to compare each received transaction characteristic value set in the order of first occurrence;
The transaction characteristic value set clustering unit is used for clustering the same transaction characteristic value set into one type when the compared transaction characteristic value sets are the same;
The transaction characteristic value sequence forming unit is used for adding different transaction characteristic value sets according to the first appearance sequence to form a transaction characteristic value sequence when the compared transaction characteristic value sets are different;
wherein, when an abnormal condition occurs in a transaction, the cause of the abnormality can be determined by comparing the transaction characteristic value set of the abnormal transaction with the stored transaction characteristic value sequence, and abnormal conditions that may occur in transactions with similar characteristic values can be analyzed and judged according to the transaction characteristic value sequence;
the transaction characteristic value set receiving unit includes:
the transaction identification determining module is used for determining a transaction starting identification and a transaction ending identification;
the transaction characteristic value set forming module is used for acquiring the statement class identification sequence between the transaction starting identification and the transaction ending identification and forming a transaction characteristic value set;
the transaction start mark is a start statement, and the transaction end mark is an end statement;
the sentence clustering device includes:
the sentence request receiving unit is used for receiving sentence request input in real time;
The sentence request processing unit is used for carrying out denoising and normalization processing on the sentence request to obtain a purified sentence request, wherein the purified sentence request is a continuous character string after the noise is removed;
The sentence characteristic value extraction unit is used for extracting sentence characteristic values in the purifying sentence request through a characteristic extraction algorithm, wherein the sentence characteristic values are characters, fields or character strings in the sentence request;
The sentence identification sequence forming unit is used for adding the sentence characteristic values according to the first appearance sequence to form a sentence identification sequence, establishing a sentence mapping relation between the sentence characteristic values and the sentence identification sequence, and if the same sentence characteristic values appear in the follow-up sequence, adding the sentence characteristic values, the sentence identification sequence and the mapping relation according to the sentence identification sequence corresponding to the first appearance sequence, and storing the sentence characteristic values, the sentence identification sequence and the mapping relation in an associated manner;
the statement request processing unit includes:
the character conversion module is used for traversing each character in the sentence request and converting all characters in the sentence request into capitalization;
the character rejecting module is used for rejecting the first type of characters in the sentence request;
The character replacing module is used for replacing the second type of characters in the sentence request with the third type of characters;
And the character output module is used for outputting all the remaining characters of the sentence request as a purifying sentence request.
4. The transaction clustering device of claim 3, wherein the transaction clustering device further comprises:
a transaction class identifier giving unit, configured to give a corresponding transaction class identifier to the transaction characteristic value set, and establish a transaction mapping relationship between the transaction class identifier and the transaction characteristic value set;
and a transaction class identification sequence forming unit, configured to add the transaction class identifiers in order of first appearance to form a transaction class identification sequence.
CN202110167246.8A 2021-02-07 2021-02-07 Statement clustering method, transaction clustering method, statement clustering device and transaction clustering device Active CN112966101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110167246.8A CN112966101B (en) 2021-02-07 2021-02-07 Statement clustering method, transaction clustering method, statement clustering device and transaction clustering device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110167246.8A CN112966101B (en) 2021-02-07 2021-02-07 Statement clustering method, transaction clustering method, statement clustering device and transaction clustering device

Publications (2)

Publication Number Publication Date
CN112966101A CN112966101A (en) 2021-06-15
CN112966101B true CN112966101B (en) 2024-06-18

Family

ID=76275095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110167246.8A Active CN112966101B (en) 2021-02-07 2021-02-07 Statement clustering method, transaction clustering method, statement clustering device and transaction clustering device

Country Status (1)

Country Link
CN (1) CN112966101B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102945256A (en) * 2012-10-18 2013-02-27 福建省海峡信息技术有限公司 Method and device for merging and classifying massive SQL (Structured Query Language) sentences
CN108090351A (en) * 2017-12-14 2018-05-29 北京百度网讯科技有限公司 For handling the method and apparatus of request message

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608086B (en) * 2014-11-17 2021-07-27 中兴通讯股份有限公司 Transaction processing method and device for distributed database system
CN109800240B (en) * 2018-12-13 2024-03-22 平安科技(深圳)有限公司 SQL sentence classifying method, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112966101A (en) 2021-06-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant