CN113010882B - Custom position sequence pattern matching method suitable for cache loss attack - Google Patents

Custom position sequence pattern matching method suitable for cache loss attack

Info

Publication number
CN113010882B
Authority
CN
China
Prior art keywords
character
node
edge
mode
mov
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110292042.7A
Other languages
Chinese (zh)
Other versions
CN113010882A (en)
Inventor
刘立坤
余翔湛
韦贤葵
史建焘
叶麟
葛蒙蒙
李精卫
石开宇
车佳臻
王久金
冯帅
赵跃
宋赟祖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202110292042.7A
Publication of CN113010882A
Application granted
Publication of CN113010882B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70: Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/78: Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a custom position sequence pattern matching method suitable for cache miss attacks and relates to matching algorithms. An automaton is built over a custom position sequence, and the current scan character is matched against the current state node. If the current scan character matches the chr in an edge value or in the failure pointer of the current state node, the automaton jumps to the next node along that edge or failure pointer. If the new node is the tail node of a pattern's custom sequence, that is, a leaf node of the automaton, the pattern is hit: the pattern recorded in the OUTPUT table is output, the current state of the automaton jumps to the next node recorded by the current node, and scanning and matching continue. The custom position sequence matching algorithm addresses the problem that, under attack, a pattern matching algorithm is driven to scan deep into the automaton every time, so that many automaton nodes are not in the CPU cache, the cache hit rate drops and system processing performance falls sharply.

Description

Custom position sequence pattern matching method suitable for cache loss attack
Technical Field
The present invention relates to matching algorithms, and more particularly to a custom position sequence pattern matching method suitable for cache miss attacks.
Background
Pattern matching has long been both a research focus and a hard problem in computer science. In the field of information security, cache miss attacks aimed at pattern matching algorithms pose a serious threat to network security systems. The attacker exploits the order in which the scan pointer moves in the pattern matching algorithm to forge attack data, so that the algorithm always scans deep along a path and the system's cache miss rate rises. Multi-pattern exact matching algorithms fall into three categories according to how they search: prefix search, suffix search and substring search. Among prefix search methods, Aho-Corasick (AC) is the most typical algorithm; it moves the window by computing the longest common prefix between the text and the patterns. Among suffix search methods, the classic algorithm is Wu-Manber (WM), which searches backwards from right to left within a window. Among substring search methods, the representative algorithm is Set Backward Oracle Matching (SBOM).
The AC algorithm builds a deterministic finite automaton (DFA) that records the pattern set as a trie. It uses three tables: the GOTO table, the FAIL table and the OUTPUT table. The GOTO table gives the next state for the current state and the next character; the FAIL table gives the state to fall back to when the GOTO table has no valid next state; and the OUTPUT table stores the patterns matched in each state. In an AC automaton, every node therefore carries GOTO transition edges and a failure pointer.
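The three tables can be illustrated with a minimal sketch of the classic AC algorithm. This is a textbook-style illustration rather than code from the patent; the function names `build_ac` and `ac_search` and the list-based table layout are assumptions made for the example.

```python
from collections import deque

def build_ac(patterns):
    """Build the GOTO, FAIL and OUTPUT tables of a classic AC automaton."""
    goto = [{}]      # goto[state][char] -> next state (trie edges)
    fail = [0]       # fail[state] -> state to fall back to
    output = [[]]    # output[state] -> patterns that end in this state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].append(p)
    # Breadth-first pass: a node's failure pointer is derived from its parent's.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            output[t] = output[t] + output[fail[t]]
            queue.append(t)
    return goto, fail, output

def ac_search(text, tables):
    """Scan text one character per state transition; return (start, pattern) hits."""
    goto, fail, output = tables
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in output[s]:
            hits.append((i - len(p) + 1, p))
    return hits

tables = build_ac(["he", "she", "his", "hers"])
print(sorted(ac_search("ushers", tables)))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Note that the scan position advances by exactly one character per transition, which is the fixed scan order the later sections contrast with the custom position sequence.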
The WM algorithm has two steps, preprocessing and scanning. Preprocessing builds three tables: the SHIFT table, the HASH table and the PREFIX table. Matching is based on fixed-length substring blocks rather than on the position of a single character in the pattern string. The scanning step uses a sliding window, whose size is the shortest pattern length, starting at the first character of the text. The hash of the window suffix block is computed first and used to look up the SHIFT table. If the SHIFT value is greater than zero, the window moves forward in the text by that value. When the SHIFT value is zero, the suffix hash is used to look up the HASH table and obtain a candidate list of possible matches.
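A simplified sketch of the SHIFT and HASH tables may make the scanning step concrete. It assumes a block size of 2, uses the block string itself as the hash key, and omits the PREFIX table; the names `build_wm` and `wm_search` are hypothetical, and the pattern set is the example set used later in the description, with its case normalized.

```python
def build_wm(patterns, block=2):
    """Build simplified SHIFT and HASH (candidate) tables; requires min length >= block."""
    m = min(len(p) for p in patterns)   # window size = shortest pattern length
    default = m - block + 1             # shift for blocks that occur in no pattern
    shift, candidates = {}, {}
    for p in patterns:
        for end in range(block, m + 1):           # block ends at 1-based position `end`
            b = p[end - block:end]
            shift[b] = min(shift.get(b, default), m - end)
        candidates.setdefault(p[m - block:m], []).append(p)
    return m, block, default, shift, candidates

def wm_search(text, tables):
    """Slide the window over text; verify candidates only when the shift is zero."""
    m, block, default, shift, candidates = tables
    hits, pos = [], m                   # pos = exclusive end of the current window
    while pos <= len(text):
        b = text[pos - block:pos]       # suffix block of the window
        s = shift.get(b, default)
        if s > 0:
            pos += s
        else:
            start = pos - m
            for p in candidates.get(b, []):
                if text.startswith(p, start):
                    hits.append((start, p))
            pos += 1
    return hits

tables = build_wm(["ADCAB", "EBPCPCA", "ECCAPDC", "BDCABPDA"])
print(wm_search("XXEBPCPCAYYADCAB", tables))  # [(2, 'EBPCPCA'), (11, 'ADCAB')]
```

Every shift is at most the window size, which is why the scan position of WM cannot jump further than the shortest pattern length.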
The SBOM algorithm uses the Factor Oracle structure to build an automaton for the string set; the automaton accepts a superset of the substrings of the strings in the set. During preprocessing, the prefix of minimum length of every string is taken and inserted in reverse to build a trie, and the Factor Oracle automaton is then built on that trie. During matching, the text is scanned through a sliding window whose size is that minimum prefix length. Within the window, the algorithm scans from right to left, starting from the initial state, for the longest suffix that is a factor of some string. If the suffix is hit, the remainder of the string is checked; if it is not hit, the window is shifted and the automaton returns to the initial state.
In the AC and SBOM algorithms, the scan position moves by one character each time the automaton changes state. In the WM algorithm, the scan position moves within a sliding window of size w, where w is the shortest pattern length. In these common algorithms, therefore, each movement of the scan position lies within [1, w] and cannot cover the longest pattern.
Because the existing algorithms all scan in a fixed position order, an attacker can easily use them to mount a cache attack. The attacker only needs to obtain some of the patterns in advance, remove or modify the character at the last position of each pattern's scan order, and send the result repeatedly in large volume. The pattern matching algorithm then scans deep every time; many automaton nodes are not in the CPU cache, so the cache hit rate is low and system processing performance drops sharply.
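The effect on a left-to-right (prefix) scan can be shown with a small sketch that measures how deep a trie walk descends before it fails. The helper names `build_trie` and `depth_reached` are hypothetical, and the pattern set is the example set used later in the description with its case normalized; the sketch only counts trie levels and does not model the cache itself.

```python
def build_trie(patterns):
    """Plain trie over the patterns in byte order; trie[node][char] -> child node."""
    trie = [{}]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in trie[s]:
                trie.append({})
                trie[s][ch] = len(trie) - 1
            s = trie[s][ch]
    return trie

def depth_reached(trie, data):
    """Number of trie levels a left-to-right scan descends before it fails."""
    s = depth = 0
    for ch in data:
        if ch not in trie[s]:
            break
        s = trie[s][ch]
        depth += 1
    return depth

patterns = ["ADCAB", "EBPCPCA", "ECCAPDC", "BDCABPDA"]
trie = build_trie(patterns)
# Drop the character at the last position of the (byte-order) scan sequence.
attack = [p[:-1] for p in patterns]
print([depth_reached(trie, a) for a in attack])  # [4, 6, 6, 7]
print(depth_reached(trie, "XYZ"))                # 0
```

Each attack string reaches the deepest non-leaf node of its branch without ever producing a hit, whereas unrelated data fails at the root.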
Disclosure of Invention
To solve the technical problem in the prior art that attackers can easily mount cache attacks, which lower the cache hit rate and degrade system processing performance, the invention provides a custom position sequence pattern matching method suitable for cache miss attacks.
A custom position sequence pattern matching method suitable for cache miss attacks comprises the following steps:
S1. Construct an automaton;
S2. Point the scan pointer at the first character of the data to be matched and set the current state of the automaton to the root node;
S3. Match the current scan character against the current state node. If the current scan character matches the chr of an edge value of the current state node, execute step S4. If it matches none of the edge values of the current state node but matches the chr of an entry in the failure pointer, the scan pointer moves by the distance given by that entry's mov value, the automaton jumps to the next node recorded in its failure pointer, and scanning resumes;
S4. The automaton jumps to the next node along the edge. If the new node is the tail node of a pattern's custom sequence, that is, a leaf node of the automaton, execute step S5. Otherwise, the scan pointer moves by the distance given by the mov value of the edge; if the new position is beyond the last character of the data to be matched, scanning terminates, and otherwise scanning resumes;
S5. A pattern is hit: the pattern recorded in the OUTPUT table is output, the current state of the automaton jumps to the next node recorded by the current node, and step S3 is executed.
Preferably, the automaton consists of a GOTO table, a FAIL table and an OUTPUT table.
Preferably, two types of windows are set when constructing the automaton: a fixed-size window and variable-size windows. The fixed window covers positions [1, shortest pattern length - 1]. A variable window covers positions [2, tail character position of the longest pattern that begins with a given first character]. There are multiple variable windows; their number equals the size of the set of first characters of the patterns.
Preferably, the automaton is constructed in the order layer 1, layer 2, layer 3. The custom position sequence of each layer is as follows:
(1) Layer 1: contains 3 characters, namely the first character of the fixed window, the tail character of the fixed window, and the tail character of the longest variable window;
(2) Layer 2: the largest substring of the fixed window that remains after its first and last characters (2 characters) are removed, taken in reverse order;
(3) Layer 3: the largest substring between the tail character of the fixed window and the tail character of the longest variable window.
Preferably, the automaton is constructed in step S1 as follows:
S1.1. Construct the automaton according to the custom position sequence; create a root node numbered 0;
S1.2. Create new nodes for the first character of layer 1, with increasing numbers, adding 1 node for each pattern. Draw an edge between the root node and each newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the first character of layer 1 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the root node for all characters other than the character of the created edge;
S1.3. Create new nodes for the second character of layer 1, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the second character of layer 1 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
S1.4. Create new nodes for the third character of layer 1, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the third character of layer 1 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
S1.5. Create new nodes for the first character of layer 2, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the first character of layer 2 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
S1.6. Move each pattern's current node to the newly created node and repeat step S1.5 until the node for the last character of layer 2 has been created;
S1.7. Create new nodes for the first character of layer 3, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the first character of layer 3 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
S1.8. Move each pattern's current node to the newly created node and repeat step S1.7 until the node for the last character of layer 3 has been created.
Preferably, the window sliding method in step S1.2 comprises the following steps:
S1.2.1. Align the first character of each pattern with the first character of the window;
S1.2.2. Compare the characters at the scanned positions in the window with the characters at the corresponding positions of each pattern. If all positions match, execute step S1.2.3; if the match fails, execute step S1.2.4;
S1.2.3. Find the best pattern and return the next position to be scanned in that pattern according to the custom position sequence. The mov value is the distance from the most recently scanned character position to the next position to be scanned. The algorithm ends;
S1.2.4. Move the window one character to the left. If the tail position of the window is smaller than the first character positions of all patterns, the algorithm ends; otherwise, execute step S1.2.2.
Preferably, the mov value is calculated with the window sliding method in two steps: a pattern consistent with the scanned positions is found, and the first unscanned position of that pattern is then determined and used to calculate mov.
[Formula image omitted: mov calculation formula]
The formula uses an indicator of whether a given character has been scanned (1 if scanned, 0 if not scanned), the current scan position k, and the position of the first unscanned character of the pattern found.
The invention has the following beneficial effects. The custom position sequence pattern matching method takes the form of an automaton and differs from the ordinary AC algorithm in two respects: first, the automaton is built by traversing each pattern in the custom position order rather than in byte order; second, the edge values of the automaton are defined differently. Under a cache miss attack, the nodes visited by the custom position sequence pattern matching algorithm are concentrated in layer 1, so the cache miss rate does not rise noticeably and matching performance stays stable. Each node has a failure list, and all failure transition states are in layer 1, so the nodes accessed by attack data are mainly in layer 1; this greatly reduces the cache miss rate and improves CPU performance.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a diagram illustrating an exemplary automaton according to an embodiment of the present invention;
FIG. 2 is a flowchart of automaton scanning according to an embodiment of the present invention;
FIG. 3 is a diagram of an automaton scanning example according to an embodiment of the present invention;
FIG. 4 is a flowchart of the window sliding method according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating an example of the windows and the custom position sequence according to an embodiment of the present invention.
Detailed Description
To make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present application and are not an exhaustive list. The embodiments and the features of the embodiments in the present application may be combined with each other provided they do not conflict.
Embodiment 1. With reference to FIGS. 1 to 5, this embodiment describes a custom position sequence pattern matching method suitable for cache miss attacks, comprising the following steps:
Step 1. Construct the automaton according to the custom position sequence. The automaton consists of a GOTO table, a FAIL table and an OUTPUT table. The GOTO table records the next state and the scan position movement distance for the current state and the current character. The FAIL table determines which state to return to, and the scan position movement distance, when the GOTO table has no valid next state. The OUTPUT table stores the pattern hit in a state together with the next state. Two types of windows are set when constructing the automaton: a fixed-size window and multiple variable-size windows. The fixed window covers positions [1, shortest pattern length - 1]. A variable window covers positions [2, tail character position of the longest pattern that begins with a given first character]; the number of variable windows equals the size of the set of first characters of the patterns. For each pattern, the algorithm gives priority to matching the first character, the tail character and the tail character of the fixed window, so that characters that fail to hit are handled in the first few layers of automaton nodes. The algorithm divides every pattern into 3 layers, each with its own custom position sequence. The automaton is constructed in the order layer 1, layer 2, layer 3, and the custom position sequence of each layer is as follows:
(1) Layer 1: contains 3 characters, namely the first character of the fixed window, the tail character of the fixed window, and the tail character of the longest variable window;
(2) Layer 2: the largest substring of the fixed window that remains after its first and last characters (2 characters) are removed, taken in reverse order;
(3) Layer 3: the largest substring between the tail character of the fixed window and the tail character of the longest variable window.
Referring to FIG. 5, which shows the custom position order: the pattern set {ADCAB, EBPCPCA, ECCAPDC, BDCABPDA} has a shortest length of 5, so the fixed window is [1, 4]. The longest pattern with first character A has length 5, giving the variable window [2, 5]; the longest with first character E has length 7, giving [2, 7]; the longest with first character B has length 8, giving [2, 8]. Layer 1 consists of the 1st, 4th and tail characters of every pattern and has a fixed length of 3. Layer 2 consists of the 2nd and 3rd characters of every pattern in reverse order and has a fixed length of 2. Layer 3 consists of the 5th through penultimate characters of every pattern in order, so its length varies from pattern to pattern.
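The layer positions in this example can be written out as a short sketch. The function names `custom_layers` and `layer_chars` are hypothetical, positions are 1-based as in the text, and the sketch assumes that the tail character in layer 1 is each pattern's own last character, as the example states.

```python
def custom_layers(pattern, w):
    """Return the 1-based positions of layers 1, 2 and 3; w is the fixed window tail."""
    n = len(pattern)
    layer1 = [1, w, n]                    # first, fixed-window tail, pattern tail
    layer2 = list(range(w - 1, 1, -1))    # inside of the fixed window, reversed
    layer3 = list(range(w + 1, n))        # after the window tail, up to the penultimate
    return layer1, layer2, layer3

def layer_chars(pattern, w):
    """Characters of each layer, in the order in which they would be scanned."""
    return ["".join(pattern[i - 1] for i in layer) for layer in custom_layers(pattern, w)]

patterns = ["ADCAB", "EBPCPCA", "ECCAPDC", "BDCABPDA"]
w = min(len(p) for p in patterns) - 1     # shortest length 5 gives the window [1, 4]
for p in patterns:
    print(p, custom_layers(p, w), layer_chars(p, w))
# ADCAB ([1, 4, 5], [3, 2], []) ['AAB', 'CD', '']
# EBPCPCA ([1, 4, 7], [3, 2], [5, 6]) ['ECA', 'PB', 'PC']
# ECCAPDC ([1, 4, 7], [3, 2], [5, 6]) ['EAC', 'CC', 'PD']
# BDCABPDA ([1, 4, 8], [3, 2], [5, 6, 7]) ['BAA', 'CD', 'BPD']
```

For the shortest pattern, layer 3 is empty, which matches the statement that the length of layer 3 is not fixed.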
An edge is defined as edge(chr, mov), where chr is the current character and mov is the distance the scan pointer moves after the current character has been matched.
Step 1.1. Create a root node numbered 0.
Step 1.2. Create new nodes for the first character of layer 1, with increasing numbers, adding 1 node for each pattern. Draw an edge between the root node and each newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the first character of layer 1 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the root node for all characters other than the character of the created edge.
The window sliding method comprises the following steps:
Step 1.2.1. Align the first character of each pattern with the first character of the window;
Step 1.2.2. Compare the characters at the scanned positions in the window with the characters at the corresponding positions of each pattern. If all positions match, execute step 1.2.3; if the match fails, execute step 1.2.4;
Step 1.2.3. Find the best pattern and return the next position to be scanned in that pattern according to the custom position sequence. The mov value is the distance from the most recently scanned character position to the next position to be scanned. The algorithm ends;
Step 1.2.4. Move the window one character to the left. If the tail position of the window is smaller than the first character positions of all patterns, the algorithm ends; otherwise, execute step 1.2.2.
mov is calculated as follows.
Assume that the current scan position is k, where k is the position relative to the window start position. The calculation also uses the position of the tail character of the fixed window and the position of the tail character of the variable window. The invention designs a window sliding method for calculating the mov value, which has two steps: first, find a pattern that is consistent with the scanned positions; second, determine the first unscanned position of the pattern found and calculate mov.
Step 1: given the pattern set and the characters already scanned in the window together with their positions, find a pattern whose characters agree with the scanned characters at all of the corresponding positions.
[Formula images omitted: pattern set, scanned characters and positions, and the matching conditions]
Step 2: for the pattern found, take its custom position order and calculate mov.
[Formula image omitted: mov calculation formula]
The formula uses an indicator of whether a given character has been scanned (1 if scanned, 0 if not scanned), the current scan position k, and the position of the first unscanned character of the pattern found.
Step 1.3. Create new nodes for the second character of layer 1, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the second character of layer 1 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
Step 1.4. Create new nodes for the third character of layer 1, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the third character of layer 1 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
Step 1.5. Create new nodes for the first character of layer 2, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the first character of layer 2 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
Step 1.6. Move each pattern's current node to the newly created node and repeat step 1.5 until the node for the last character of layer 2 has been created;
Step 1.7. Create new nodes for the first character of layer 3, with increasing numbers, adding 1 node for each pattern. Draw an edge between each pattern's current node and its newly added node. The GOTO table entry is the edge value edge(chr, mov), where chr is the first character of layer 3 and mov is the movement distance calculated by the window sliding method. The FAIL table holds the edge values of the pattern's current node for all characters other than the character of the created edge;
Step 1.8. Move each pattern's current node to the newly created node and repeat step 1.7 until the node for the last character of layer 3 has been created.
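The level-by-level construction of the GOTO edges can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `custom_order` and `build_goto` are hypothetical names, patterns that share an edge character reuse the same node, the mov of an edge along a pattern's own branch is taken as the next custom position minus the current one, and the mov of the final edge into a leaf is set to 0 because the text does not specify it. FAIL lists are not built here.

```python
def custom_order(pattern, w):
    """1-based scan positions: layer 1, then layer 2 (reversed), then layer 3."""
    n = len(pattern)
    return [1, w, n] + list(range(w - 1, 1, -1)) + list(range(w + 1, n))

def build_goto(patterns):
    """Build GOTO edges one custom position at a time for all patterns."""
    w = min(len(p) for p in patterns) - 1
    orders = [custom_order(p, w) for p in patterns]
    goto = [{}]                      # goto[node][chr] -> (next node, mov); node 0 is the root
    current = [0] * len(patterns)    # current node of each pattern
    output = {}                      # leaf node -> pattern
    for depth in range(max(len(o) for o in orders)):
        for idx, (p, order) in enumerate(zip(patterns, orders)):
            if depth >= len(order):
                continue
            pos = order[depth]
            ch = p[pos - 1]
            last = depth + 1 == len(order)
            mov = 0 if last else order[depth + 1] - pos   # assumption: 0 on the final edge
            if ch not in goto[current[idx]]:
                goto.append({})
                goto[current[idx]][ch] = (len(goto) - 1, mov)
            current[idx] = goto[current[idx]][ch][0]
            if last:
                output[current[idx]] = p
    return goto, output

goto, output = build_goto(["ADCAB", "EBPCPCA", "ECCAPDC", "BDCABPDA"])
print(goto[0])   # {'A': (1, 3), 'E': (2, 3), 'B': (3, 3)}
print(goto[1])   # {'A': (4, 1)}
```

With this pattern set the root edges carry mov 3 (from position 1 to position 4), and the edge below the A node carries mov 1 (from position 4 to the tail at position 5). Two patterns that share a path but differ in length would need different mov values on the same edge; this sketch does not handle that case.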
Referring to FIG. 1, the automaton construction result is illustrated with the pattern set {ADCAB, EBPCA, ECCADCP, BDCABPDA}, which shows the automaton produced by the custom position sequence algorithm. First, a root node is created and numbered 0. New nodes are then created for the first characters {A, B, E} of layer 1 of the 4 patterns and numbered in increasing order. Edges are drawn between the root node and the 3 newly added nodes, with edge values edge(A,3), edge(B,3) and edge(E,3) respectively. The failure list of the root node is {(*,1), root node}. New nodes are then created for the second characters {A, A, C, A} of layer 1 of the 4 patterns and numbered in increasing order. An edge with edge value edge(A,1) is drawn from the corresponding first-level node, and the failure list of that node is a set of entries giving the mov for each of the other characters: {(B,-1)(C,-2)(D,-1)(E,3)(P,-2)(*,1)}. The edges and failure lists of the other two first-level nodes are built similarly. The remaining nodes are constructed in the same way as in step (3), which is not repeated here. Each branch of the tree corresponds to a pattern, and a node is created for each character in the custom position order. An edge is a pair of values: the first is the character and the second is the distance the scan position moves after the character is scanned. Each node has a failure list, and all failure transition states are in layer 1, so the nodes accessed by attack data are mainly in layer 1; this greatly reduces the cache miss rate and improves CPU performance.
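The mov values quoted for FIG. 1 can be reproduced with a sketch of the window sliding method. This is one reading of the steps, not the patented implementation: the names `custom_order` and `slide_mov` are hypothetical, the pattern set is the FIG. 5 set with its case normalized, the first matching pattern in list order stands in for the "best pattern", only mov is computed (not the next state), and a return value of None corresponds to the default entry (*,1).

```python
def custom_order(pattern, w):
    """1-based scan positions: layer 1, then layer 2 (reversed), then layer 3."""
    n = len(pattern)
    return [1, w, n] + list(range(w - 1, 1, -1)) + list(range(w + 1, n))

def slide_mov(patterns, scanned, k):
    """scanned maps 1-based positions (relative to the window start) to the characters
    read there; k is the current scan position. Returns mov, or None if nothing fits."""
    w = min(len(p) for p in patterns) - 1
    for shift in range(w):               # window moved left by `shift` characters
        # Scanned positions re-expressed as pattern positions under this alignment.
        seen = {pos - shift: ch for pos, ch in scanned.items() if pos > shift}
        if not seen:
            break
        for p in patterns:
            if all(q <= len(p) and p[q - 1] == ch for q, ch in seen.items()):
                nxt = next((q for q in custom_order(p, w) if q not in seen), None)
                if nxt is not None:
                    return nxt + shift - k   # distance from k to the next position
    return None

patterns = ["ADCAB", "EBPCPCA", "ECCAPDC", "BDCABPDA"]
print(slide_mov(patterns, {1: "A"}, 1))             # 3, as in edge(A,3)
print(slide_mov(patterns, {1: "A", 4: "A"}, 4))     # 1, as in edge(A,1)
print({c: slide_mov(patterns, {1: "A", 4: c}, 4) for c in "BCDEP"})
# {'B': -1, 'C': -2, 'D': -1, 'E': 3, 'P': -2}
```

For example, a C read at position 4 fits position 3 of a pattern that starts one character later, so the next position to scan is that pattern's first character, two characters back, which gives mov -2.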
Step two, scanning the pointer to point to the first character of the data to be matched, and setting the current state of the automaton as a root node;
step three, matching the current scanning character with the current state node; all edge values and fail pointers for current scan character and current state nodechrMatching successfully, executing step four, wherein the current scanning character and all the edge values and the failure pointers of the current state nodechrFail to match, if a value in the fail pointer is matchedchrSuccess, the scanning pointer is according tomovThe value is moved by a corresponding distance, the automaton jumps to the next node according to the value recorded in the failure pointer of the automaton, and rescanning is carried out; referring to FIG. 2 for robot scanningThe flow chart understands this step.
Step four: the automaton jumps along the edge to the next node. If the new node is the tail node of the pattern's custom sequence, i.e. a leaf node of the automaton, execute step five. Otherwise, the scan pointer moves by the mov value of the edge; if the new position lies beyond the last character of the data to be matched, scanning terminates, and otherwise scanning resumes;
Step five: a pattern has been matched. The pattern recorded in the OUTPUT table is output, the automaton jumps from its current state to the next node recorded by the current node, and step three is executed.
FIG. 3 illustrates the automaton scanning an attack sample. Taking the patterns {ADCAB, EBPCPCA, ECCAPDC, BDBCAPA} as an example, attack data is first constructed: {ECCADC, BDBCAPD} to attack the AC algorithm, {EBMCP} to attack the WM algorithm, and {CDCAB} to attack the SBOM algorithm. The attack sample {CDCAB EBMCP ECCADC BDBCAPD} is generated from this attack data. The number on each edge in the figure indicates which scan it belongs to, and
Figure DEST_PATH_IMAGE080
is the next state node of the automaton. Specifically:
First, the state machine enters the root node
Figure DEST_PATH_IMAGE082
and scans the 1st character, C. The state machine fails to match C and finds the entry {(*, 1),
Figure 617435DEST_PATH_IMAGE082
} in the failure list, so the state remains
Figure 842880DEST_PATH_IMAGE082
; the scan pointer moves 1 character to the right, and the current character is the 2nd character, D;
Second, the 2nd character, D, is scanned. The state machine fails to match D and finds the entry {(*, 1),
Figure 899303DEST_PATH_IMAGE082
} in the failure list, so the state remains
Figure 272516DEST_PATH_IMAGE082
; the scan pointer moves 1 character to the right, and the current character is the 3rd character, C;
Third, the 3rd character, C, is scanned. The state machine fails to match C and finds the entry {(*, 1),
Figure 933304DEST_PATH_IMAGE082
} in the failure list, so the state remains
Figure 137889DEST_PATH_IMAGE082
; the scan pointer moves 1 character to the right, and the current character is the 4th character, A;
Fourth, the 4th character, A, is scanned, and the state machine matches A. According to the edge of
Figure 757089DEST_PATH_IMAGE082
with value edge(A, 3), the scan pointer moves 3 characters to the right, the current character becomes the 7th character, B, and the automaton jumps along the edge to state
Figure DEST_PATH_IMAGE084
;
Fifth, the 7th character, B, is scanned, and the state machine fails to match B. In the failure list of
Figure 214003DEST_PATH_IMAGE084
, the entry {(B, -1),
Figure 68696DEST_PATH_IMAGE082
} is found; the scan pointer moves 1 character to the left, the current character becomes the 6th character, E, and the automaton jumps along the failure pointer to
Figure 534312DEST_PATH_IMAGE082
;
Sixth, the 6th character, E, is scanned, and the state machine matches E. According to the edge of
Figure 324413DEST_PATH_IMAGE082
with value edge(E, 3), the scan pointer moves 3 characters to the right, the current character becomes the 9th character, C, and the automaton jumps along the edge to state
Figure DEST_PATH_IMAGE086
;
Finally, the remaining characters are scanned in the same way until the scan pointer passes the last character, D, of the string to be matched.
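The first six scans of this walkthrough can be replayed with a short sketch. It is not the full automaton: only the transitions quoted above are encoded, the state names `root`, `after_A` and `after_E` are hypothetical stand-ins for the node symbols in the figures, and the sample is assumed to be the four attack strings concatenated without separators.

```python
# Attack sample: CDCAB + EBMCP + ECCADC + BDBCAPD (assumed to be concatenated)
SAMPLE = "CDCABEBMCPECCADCBDBCAPD"

# Only the transitions quoted in the walkthrough; state names are hypothetical.
GOTO = {
    "root": {"A": (3, "after_A"), "E": (3, "after_E")},
}
FAIL = {
    "root": {"*": (1, "root")},
    "after_A": {"B": (-1, "root")},
}


def step(state, ch):
    """Return (mov, next_state, matched), or None if the partial table has no entry."""
    if ch in GOTO.get(state, {}):
        mov, nxt = GOTO[state][ch]
        return mov, nxt, True
    fail = FAIL.get(state, {})
    entry = fail.get(ch, fail.get("*"))
    if entry is None:
        return None
    mov, nxt = entry
    return mov, nxt, False


def replay(sample, max_steps=6):
    """Replay up to max_steps scans; positions are 1-based, as in the text."""
    pos, state, trace = 1, "root", []
    for _ in range(max_steps):
        if not 1 <= pos <= len(sample):
            break
        result = step(state, sample[pos - 1])
        if result is None:
            break
        mov, nxt, matched = result
        trace.append((pos, sample[pos - 1], state, matched))
        pos, state = pos + mov, nxt
    return trace, pos, state


trace, pos, state = replay(SAMPLE)
for entry in trace:
    print(entry)  # (position, character, state, matched)
print(pos, SAMPLE[pos - 1], state)
```

The scanned positions come out as 1, 2, 3, 4, 7, 6, after which the pointer rests on the 9th character, C, in agreement with the steps above. Leaf handling and the OUTPUT table are left out because the walkthrough does not reach them.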
In this example, the data to be scanned contains 23 characters. The automaton performs 16 scans in total, i.e. 69.6% of the character count, and skips 8 characters, i.e. 34.8%. The node jump sequence of the automaton is
Figure DEST_PATH_IMAGE088
. All 16 nodes are in layer 1; the root node
Figure 189995DEST_PATH_IMAGE082
appears 8 times, accounting for 50%. Therefore, under a cache miss attack, all nodes visited by the custom-sequence pattern matching algorithm are concentrated in layer 1 and cache misses do not occur, so the matching performance remains stable.
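The percentages quoted for this example follow directly from the reported counts. A quick check, assuming the attack sample is the concatenation of the four attack strings (the variable names are illustrative only):

```python
# Attack strings from the example; the sample is assumed to be their concatenation.
attack_parts = ["CDCAB", "EBMCP", "ECCADC", "BDBCAPD"]
sample = "".join(attack_parts)
total_chars = len(sample)

# Counts as reported in the description.
scans, skipped, root_visits = 16, 8, 8

scan_share = round(100 * scans / total_chars, 1)     # scans relative to sample length
skip_share = round(100 * skipped / total_chars, 1)   # skipped characters relative to sample length
root_share = round(100 * root_visits / scans, 1)     # root visits relative to all node visits

print(total_chars, scan_share, skip_share, root_share)
```

The scan count and the skip count are both taken relative to the 23-character sample, so they need not sum to 100%: a character can be scanned more than once, as the 6th and 7th characters show in the walkthrough.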
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (5)

1. A custom position sequence pattern matching method suitable for cache loss attacks, characterized by comprising the following steps:
S1. Constructing an automaton, comprising the following steps:
S1.1. The automaton is constructed according to the custom position sequence; a root node numbered 0 is established;
S1.2. New nodes are created according to the first layer-1 character, with incrementing numbers, one node being added for each pattern; an edge is drawn between the root node and each newly added node; the GOTO table holds the edge value edge(chr, mov), where chr is the first layer-1 character and mov is the movement distance calculated by the window sliding method; the FAIL table holds the edge values of the root node for characters other than those of the created edges;
the window sliding method comprises the following steps:
S1.2.1. The first character of each pattern is aligned with the first character of the window;
S1.2.2. The scanned positions in the window are compared with the corresponding positions of each pattern; if all positions match, step S1.2.3 is executed; if the match fails, step S1.2.4 is executed;
S1.2.3. The best pattern is found, and the next position to be scanned in that pattern is returned according to the custom position sequence; the mov value is the distance from the most recently scanned character position to the next position to be scanned; the algorithm ends;
S1.2.4. The window is moved one character to the left; if the tail position of the window is less than the position of the first character of every pattern, the algorithm ends; otherwise, step S1.2.2 is executed;
S1.3. New nodes are created according to the second layer-1 character, with incrementing numbers, one node being added for each pattern; an edge is drawn between the current node of each pattern and its newly added node; the GOTO table holds the edge value edge(chr, mov), where chr is the second layer-1 character and mov is the movement distance calculated by the window sliding method; the FAIL table holds the edge values of the pattern's current node for characters other than those of the created edges;
S1.4. New nodes are created according to the third layer-1 character, with incrementing numbers, one node being added for each pattern; an edge is drawn between the current node of each pattern and its newly added node; the GOTO table holds the edge value edge(chr, mov), where chr is the third layer-1 character and mov is the movement distance calculated by the window sliding method; the FAIL table holds the edge values of the pattern's current node for characters other than those of the created edges;
S1.5. New nodes are created according to the first layer-2 character, with incrementing numbers, one node being added for each pattern; an edge is drawn between the current node of each pattern and its newly added node; the GOTO table holds the edge value edge(chr, mov), where chr is the first layer-2 character and mov is the movement distance calculated by the window sliding method; the FAIL table holds the edge values of the pattern's current node for characters other than those of the created edges;
S1.6. The current node of each pattern is moved to the newly created node, and step S1.5 is repeated until a node has been created for the last layer-2 character;
S1.7. New nodes are created according to the first layer-3 character, with incrementing numbers, one node being added for each pattern; an edge is drawn between the current node of each pattern and its newly added node; the GOTO table holds the edge value edge(chr, mov), where chr is the first layer-3 character and mov is the movement distance calculated by the window sliding method; the FAIL table holds the edge values of the pattern's current node for characters other than those of the created edges;
S1.8. The current node of each pattern is moved to the newly created node, and step S1.7 is repeated until a node has been created for the last layer-3 character;
S2. The scan pointer is pointed at the first character of the data to be matched, and the current state of the automaton is set to the root node;
S3. The current scan character is matched against the current state node; if the character matches the chr of one of the node's edge values, step S4 is executed; if it matches none of the edge values, it is looked up in the node's failure pointers, and when the chr of an entry matches, the scan pointer moves by that entry's mov value, and the automaton, according to the
Figure 780143DEST_PATH_IMAGE002
recorded in its failure pointer, jumps to the next node and rescans;
S4. The automaton jumps along the edge to the next node; if the new node is the tail node of the pattern's custom sequence, i.e. a leaf node of the automaton, step S5 is executed; otherwise, the scan pointer moves by the mov value of the edge; if the new position lies beyond the last character of the data to be matched, scanning terminates, and otherwise scanning resumes;
S5. A pattern is hit; the pattern recorded in the OUTPUT table is output, the automaton jumps from its current state to the next node recorded by the current node, and step S3 is executed.
2. The method of claim 1, wherein the automaton comprises a GOTO table, a FAIL table, and an OUTPUT table.
3. The method as claimed in claim 2, wherein constructing the automaton requires two types of windows, fixed and variable: one fixed-size window
Figure 966405DEST_PATH_IMAGE004
and multiple variable-size windows
Figure 768139DEST_PATH_IMAGE006
, wherein
Figure 254615DEST_PATH_IMAGE004
has size [1, shortest pattern length - 1],
Figure 749182DEST_PATH_IMAGE006
has size [2,
Figure 688319DEST_PATH_IMAGE004
tail character position of the longest pattern with the second character as the first character], and
Figure 816113DEST_PATH_IMAGE006
exists in multiple instances, their number being the size of the set of first characters of the patterns.
4. The custom position sequence pattern matching method suitable for cache loss attacks as claimed in claim 3, wherein the automaton is constructed in the order layer 1, layer 2, layer 3; the custom position sequence of each layer is as follows:
(1) Layer 1: comprises 3 characters: the
Figure 157096DEST_PATH_IMAGE004
first character, the
Figure 822563DEST_PATH_IMAGE004
tail character, and the
Figure 717838DEST_PATH_IMAGE006
longest-window tail character;
(2) Layer 2: the largest substring of
Figure DEST_PATH_IMAGE007
with its first and last characters (2 characters) removed, reordered in reverse;
(3) Layer 3: the largest substring between the
Figure 799058DEST_PATH_IMAGE007
tail character and the tail character of the longest
Figure 991617DEST_PATH_IMAGE006
.
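The layer ordering can be illustrated with a short sketch that turns a pattern into its custom scan order. Several points here are assumptions rather than statements of the claims: the fixed window is taken at its maximum size (shortest pattern length minus 1), the tail of the variable window is taken to be the pattern's own last character, and the function name `custom_order` is hypothetical.

```python
def custom_order(pattern, fixed_window_size):
    """Return 0-based pattern positions in the assumed layer 1 -> 2 -> 3 scan order."""
    first = 0
    fixed_tail = fixed_window_size - 1
    last = len(pattern) - 1  # assumption: variable-window tail = last character

    # Layer 1: first character, fixed-window tail, variable-window tail.
    layer1 = [first, fixed_tail] + ([last] if last > fixed_tail else [])
    # Layer 2: interior of the fixed window, in reverse order.
    layer2 = list(range(fixed_tail - 1, first, -1))
    # Layer 3: positions strictly between the two window tails.
    layer3 = list(range(fixed_tail + 1, last))
    return layer1 + layer2 + layer3


patterns = ["ADCAB", "EBPCPCA"]
window = min(len(p) for p in patterns) - 1  # assumed fixed-window size

for p in patterns:
    order = custom_order(p, window)
    print(p, order, "".join(p[i] for i in order))
```

Under these assumptions the first two moves for ADCAB are +3 (position 0 to 3) and +1 (position 3 to 4), which agrees with the edge values edge(A, 3) and edge(A, 1) mentioned in the description. Every position of a pattern appears exactly once in the resulting order.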
5. The method according to claim 4, wherein the window sliding method calculates the mov value by finding the pattern that corresponds to the scanned positions and computing the mov value to the first unscanned position found; the mov value is calculated by the following formula:
Figure DEST_PATH_IMAGE009
where
Figure DEST_PATH_IMAGE011
indicates whether the character
Figure DEST_PATH_IMAGE013
has been scanned (1 for scanned, 0 for not scanned); k denotes the current scan position;
Figure DEST_PATH_IMAGE015
indicates whether the pattern
Figure DEST_PATH_IMAGE017
has been scanned or not.
CN202110292042.7A 2021-03-18 2021-03-18 Custom position sequence pattern matching method suitable for cache loss attack Active CN113010882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110292042.7A CN113010882B (en) 2021-03-18 2021-03-18 Custom position sequence pattern matching method suitable for cache loss attack

Publications (2)

Publication Number Publication Date
CN113010882A CN113010882A (en) 2021-06-22
CN113010882B true CN113010882B (en) 2022-08-30

Family

ID=76402487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110292042.7A Active CN113010882B (en) 2021-03-18 2021-03-18 Custom position sequence pattern matching method suitable for cache loss attack

Country Status (1)

Country Link
CN (1) CN113010882B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103500178A (en) * 2013-09-09 2014-01-08 中国科学院计算机网络信息中心 Quick multi-mode matching method on worst-case scenario of FS algorithm
CN109977276A (en) * 2019-03-22 2019-07-05 华南理工大学 A kind of single pattern matching method based on Sunday algorithm improvement

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7725510B2 (en) * 2006-08-01 2010-05-25 Alcatel-Lucent Usa Inc. Method and system for multi-character multi-pattern pattern matching
US8504510B2 (en) * 2010-01-07 2013-08-06 Interdisciplinary Center Herzliya State machine compression for scalable pattern matching
AU2013207274A1 (en) * 2012-01-08 2014-08-21 Imagistar Llc System and method for item self-assessment as being extant or displaced
AU2014205389A1 (en) * 2013-01-11 2015-06-04 Db Networks, Inc. Systems and methods for detecting and mitigating threats to a structured data storage system
US9996387B2 (en) * 2013-11-04 2018-06-12 Lewis Rhodes Labs, Inc. Context switching for computing architecture operating on sequential data
EP3742687A1 (en) * 2014-04-23 2020-11-25 Bequant S.L. Method and apparatus for network congestion control based on transmission rate gradients
CN104796354A (en) * 2014-11-19 2015-07-22 中国科学院信息工程研究所 Out-of-order data packet string matching method and system
CN105260354B (en) * 2015-08-20 2018-08-21 及时标讯网络信息技术(北京)有限公司 A kind of Chinese AC automatic machines working method based on keyword dictionary tree construction
CN106067039B (en) * 2016-05-30 2019-01-29 桂林电子科技大学 Method for mode matching based on decision tree beta pruning
US10678907B2 (en) * 2017-01-26 2020-06-09 University Of South Florida Detecting threats in big data platforms based on call trace and memory access patterns
CN110071871A (en) * 2019-03-13 2019-07-30 国家计算机网络与信息安全管理中心 A kind of large model pool ip address matching process
CN109918548A (en) * 2019-04-08 2019-06-21 上海凡响网络科技有限公司 A kind of methods and applications of automatic detection document sensitive information
CN110362669B (en) * 2019-07-18 2022-07-01 中科信息安全共性技术国家工程研究中心有限公司 Method suitable for fast keyword retrieval
CN111046938B (en) * 2019-12-06 2020-12-01 邑客得(上海)信息技术有限公司 Network traffic classification and identification method and equipment based on character string multi-mode matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant