CN103023883A - Character string matching method based on Aho-Corasick (AC) automaton and suffix tree - Google Patents
- Publication number
- CN103023883A (application number CN2012104884271A)
- Authority
- CN
- China
- Prior art keywords
- packet
- described current
- order
- suffix
- suffix tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a character string matching method based on an Aho-Corasick (AC) automaton and a suffix tree, comprising the following steps: S1, compiling the feature strings into an AC automaton; S2, compiling the set of suffixes of the feature strings into a suffix tree; S3, whenever a data packet enters the network security device, matching the packet against the AC automaton and saving the matching state by means of the suffix tree; and S4, if the match succeeds, discarding the packet. Because the method saves the state numbers of the AC automaton and of the suffix tree while matching the strings in a packet, matching can resume from the last saved state even when packets arrive out of order, so earlier packets need not be buffered. This avoids the increased delay, higher memory consumption, and reduced cache locality caused by buffering, lowers the resources required by the network security device, and improves its performance.
Description
Technical field
The present invention relates to the technical field of network filtering and monitoring, and in particular to a character string matching method based on an AC automaton and a suffix tree.
Background art
As network security requirements keep rising, functions such as intrusion detection, anti-virus, and content filtering are increasingly deployed in network security devices. String matching is the core algorithm behind these functions and therefore determines the performance of the device. The string matching algorithm most widely used in network security devices today is the Aho-Corasick (AC) automaton algorithm, a string matching algorithm based on the automaton principle. As shown in Fig. 1, it works as follows: the feature strings (for example, a virus signature database or filtering keywords) are first compiled into an automaton. Starting from state 0, the content to be matched is read character by character. Each time a character (for example, a) is read, the algorithm checks whether the current state has a transition arrow for that character; if so, it jumps to the next state given by that transition, and if not, it jumps back to state 0. Some states are marked as matching states, and entering such a state means that a match has succeeded.
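The matching procedure described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the class name `ACAutomaton`, its methods, and the example patterns are hypothetical. One deliberate difference should be noted: the description above falls back to state 0 when no transition exists, whereas this sketch uses the failure links of the standard Aho-Corasick construction, which fall back to the longest partial match that can still be extended. The `state` argument of `match` lets matching resume from a saved state number.

```python
from collections import deque


class ACAutomaton:
    """Aho-Corasick automaton built with the standard failure-link construction."""

    def __init__(self, patterns):
        self.goto = [{}]    # goto[s][ch] -> next state
        self.fail = [0]     # failure link of each state
        self.out = [set()]  # patterns recognised on entering each state
        for pattern in patterns:
            self._insert(pattern)
        self._build_failure_links()

    def _insert(self, pattern):
        s = 0
        for ch in pattern:
            if ch not in self.goto[s]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[s][ch] = len(self.goto) - 1
            s = self.goto[s][ch]
        self.out[s].add(pattern)  # s is a matching state

    def _build_failure_links(self):
        # Breadth-first traversal; depth-1 states keep failure link 0.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] |= self.out[self.fail[t]]
                queue.append(t)

    def step(self, state, ch):
        """Consume one character and return the next state."""
        while state and ch not in self.goto[state]:
            state = self.fail[state]
        return self.goto[state].get(ch, 0)

    def match(self, data, state=0):
        """Scan data starting from `state`; return (end_state, matches found)."""
        found = []
        for ch in data:
            state = self.step(state, ch)
            found.extend(sorted(self.out[state]))
        return state, found
```

With the hypothetical pattern set `["abcde", "he", "she"]`, `match("ushers")` reports `he` and `she`. Scanning `"xxabc"` and then calling `match("deyy", saved_state)` with the state returned by the first call reports `abcde`, even though the string is split across the two calls.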
When a feature string may be spread across several packets, the approach generally adopted in the industry is to buffer and reassemble the packets before performing string matching, which increases the memory usage of the network security device.
However, buffering and reassembling packets has the following drawbacks. First, buffering packets increases network delay. Second, in high-speed networks at gigabit rates and above, reassembling packets requires a large amount of memory and can easily exhaust the memory of the network security device. Third, in a network security device equipped with a cache memory, reading and writing large amounts of packet data in memory reduces cache locality and thus degrades the performance of the device.
Summary of the invention
(1) Technical problem to be solved
The technical problem solved by the present invention is to provide a character string matching method that can detect content spanning multiple packets without reassembling the packets.
(2) Technical solution
The present invention proposes a character string matching method based on an AC automaton and a suffix tree, the method comprising:
S1, compiling the feature strings into an AC automaton;
S2, compiling the set of suffixes of the feature strings into a suffix tree;
S3, when a packet enters the network security device, matching the packet against the AC automaton and using the suffix tree to save the matching state;
S4, if the match succeeds, discarding the packet.
Preferably, when the packets arrive in order, step S3 specifically comprises:
S31, upon receiving the current in-order packet, searching for whether a record of the packet preceding the current in-order packet exists;
S32, determining the state number of the current in-order packet according to whether the record of the preceding packet exists;
S33, matching the current in-order packet against the AC automaton, starting from the state number of the current in-order packet.
Preferably, step S32 specifically comprises:
if the record of the preceding packet exists, using the state number of the preceding packet as the state number of the current in-order packet;
if the record of the preceding packet does not exist, setting the state number of the current in-order packet to 0.
Preferably, when the packets arrive out of order, step S3 specifically comprises:
S31', before the current out-of-order packet is received, searching for whether a suffix tree record of the packet following the current out-of-order packet exists, and detecting whether the head of the following packet is a suffix in the suffix tree;
S32', if the suffix tree record of the following packet exists and the head of the following packet is detected to be a suffix in the suffix tree, saving the suffix tree state number corresponding to that suffix;
S33', upon receiving the current out-of-order packet, backtracking from the saved suffix tree state number to recover the suffix;
S34', appending the recovered suffix to the tail of the current out-of-order packet, and matching the reassembled current out-of-order packet against the AC automaton.
Preferably, step S31' further comprises searching for whether a suffix tree record of the packet preceding the current out-of-order packet exists:
if the suffix tree record of the preceding packet exists and the preceding packet is part of the suffix, the suffix tree state number of the current out-of-order packet is the suffix tree state number of the preceding packet;
if the suffix tree record of the preceding packet exists and the preceding packet is not part of the suffix, the suffix tree state number of the current out-of-order packet is the suffix tree state number of the preceding packet.
Preferably, in step S32', if the suffix tree record of the following packet exists but the following packet is detected not to be part of a suffix, the following packet is reassembled with the current out-of-order packet for matching, and the state number of the reassembled current out-of-order packet is the state number of the following packet.
(3) Beneficial effects
The present invention saves the state number of the state machine while matching the strings in a packet, so that matching can continue from the previous state when packets arrive out of order. This eliminates the drawbacks caused by buffering, namely increased delay, higher memory consumption, and reduced cache locality, and thereby reduces the resources required by the network security device and improves its performance.
Description of drawings
Fig. 1 is a schematic diagram of an AC automaton;
Fig. 2 is a flow chart of the method proposed by the present invention;
Fig. 3 is a schematic diagram of the suffix tree of the feature string abcde in the present invention.
Embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
The present invention proposes a character string matching method based on an AC automaton and a suffix tree. It improves on the AC algorithm by recording the state number of the automaton instead of buffering the content of the whole packet. The method can also be applied to algorithms derived from the AC algorithm, such as AC-Optimized and Cl-AC. The flow chart of the method is shown in Fig. 2, and the method comprises:
S1, compiling the feature strings into an AC automaton;
S2, compiling the set of suffixes of the feature strings into a suffix tree;
S3, when a packet enters the network security device, matching the packet against the AC automaton and using the suffix tree to save the matching state;
S4, if the match succeeds, discarding the packet.
When the packets arrive in order, step S3 specifically comprises:
S31, upon receiving the current in-order packet, searching for whether a record of the packet preceding the current in-order packet exists;
S32, determining the state number of the current in-order packet according to whether the record of the preceding packet exists;
S33, matching the current in-order packet against the AC automaton, starting from the state number of the current in-order packet.
Step S32 specifically comprises:
if the record of the preceding packet exists, using the state number of the preceding packet as the state number of the current in-order packet;
if the record of the preceding packet does not exist, setting the state number of the current in-order packet to 0.
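Steps S31 to S33 can be sketched as follows, under simplifying assumptions that are not in the source: a single feature string `abcde` whose characters are all distinct (so a small hand-written transition function can stand in for the AC automaton), a hypothetical `flow_id` that tells connections apart, and a hypothetical per-flow packet index `seq` that increases by 1 per packet. The names `InOrderMatcher`, `records`, and `on_packet` are likewise illustrative. Only a state number is stored per packet, never the payload.

```python
PATTERN = "abcde"  # feature string used in the embodiment


def step(state, ch):
    """One transition of a tiny automaton for PATTERN (all characters distinct)."""
    if state < len(PATTERN) and ch == PATTERN[state]:
        return state + 1
    # On a mismatch the only partial match that can survive is a fresh "a".
    return 1 if ch == PATTERN[0] else 0


def match_from(state, payload):
    """Run payload through the automaton from `state`; return (end_state, matched)."""
    matched = False
    for ch in payload:
        state = step(state, ch)
        if state == len(PATTERN):
            matched = True
    return state, matched


class InOrderMatcher:
    """Keeps one automaton state number per packet instead of the packet itself."""

    def __init__(self):
        self.records = {}  # (flow_id, seq) -> state number at the end of that packet

    def on_packet(self, flow_id, seq, payload):
        # S31/S32: use the preceding packet's state number if a record exists, else 0.
        start = self.records.pop((flow_id, seq - 1), 0)
        # S33: match the current packet starting from that state number.
        end, matched = match_from(start, payload)
        self.records[(flow_id, seq)] = end
        return matched  # S4: the caller discards the packet when this is True
```

For example, feeding `"xxabc"` as packet 1 and `"deyy"` as packet 2 of the same flow returns `False` and then `True`, while `"deyy"` alone in a flow with no preceding record returns `False`.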
When the packets arrive out of order, step S3 specifically comprises:
S31', before the current out-of-order packet is received, searching for whether a suffix tree record of the packet following the current out-of-order packet exists, and detecting whether the head of the following packet is a suffix in the suffix tree;
S32', if the suffix tree record of the following packet exists and the head of the following packet is detected to be a suffix in the suffix tree, saving the suffix tree state number corresponding to that suffix;
S33', upon receiving the current out-of-order packet, backtracking from the saved suffix tree state number to recover the suffix;
S34', appending the recovered suffix to the tail of the current out-of-order packet, and matching the reassembled current out-of-order packet against the AC automaton.
Step S31' further comprises searching for whether a suffix tree record of the packet preceding the current out-of-order packet exists:
if the suffix tree record of the preceding packet exists and the preceding packet is part of the suffix, the suffix tree state number of the current out-of-order packet is the suffix tree state number of the preceding packet;
if the suffix tree record of the preceding packet exists and the preceding packet is not part of the suffix, the suffix tree state number of the current out-of-order packet is the suffix tree state number of the preceding packet.
In step S32', if the suffix tree record of the following packet exists but the following packet is detected not to be part of a suffix, the following packet is reassembled with the current out-of-order packet for matching, and the state number of the reassembled current out-of-order packet is the state number of the following packet.
This embodiment assumes that the feature string is abcde. The set of suffixes of abcde is then {bcde, cde, de, e}, and the corresponding suffix tree is shown schematically in Fig. 3.
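The suffix set for abcde can be reproduced with a few lines of Python. The sketch below is illustrative only: it stores the suffixes in an uncompressed trie of nested dictionaries, which is an assumption about the structure, since Fig. 3 is not available here, and the function names are hypothetical.

```python
def proper_suffixes(s):
    """All suffixes of s that are shorter than s itself."""
    return [s[i:] for i in range(1, len(s))]


def build_trie(words):
    """Insert every word into a trie represented as nested dictionaries."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def count_nodes(node):
    """Number of trie nodes (states) in the subtree, counting the node itself."""
    return 1 + sum(count_nodes(child) for child in node.values())


def longest_head(trie, text):
    """Longest prefix of text that follows a path from the root of the trie."""
    node, taken = trie, []
    for ch in text:
        if ch not in node:
            break
        node = node[ch]
        taken.append(ch)
    return "".join(taken)
```

For abcde this yields the four suffixes listed above and a trie with 11 nodes including the root; `longest_head(trie, "deyy")` returns `"de"`, the head of a packet that coincides with a suffix of the feature string.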
The character string matching method based on an AC automaton and a suffix tree proposed by the present invention uses three data structures: the AC automaton state, the suffix tree state, and the entry of the record table Buffer, which stores information about out-of-order packets. [Data structure listings not reproduced in this text.]
The pseudocode of the method is given in Table 1. [Table 1 not reproduced in this text.]
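As an illustration of the out-of-order steps S31' to S34', the following Python sketch may help. It is a sketch under assumptions and not a reconstruction of Table 1 or of the patent's data structures: the names `OutOfOrderMatcher`, `buffer`, `flow_id`, and `seq` are hypothetical, a plain substring test stands in for the AC automaton, and only the case of a feature string split across two adjacent packets is covered.

```python
PATTERN = "abcde"  # feature string used in the embodiment


def build_suffix_trie(pattern):
    """Trie of the proper suffixes of pattern, stored as parallel arrays."""
    children, parent = [{}], [None]  # parent[n] = (parent node, edge character)
    for i in range(1, len(pattern)):
        node = 0
        for ch in pattern[i:]:
            if ch not in children[node]:
                children.append({})
                parent.append((node, ch))
                children[node][ch] = len(children) - 1
            node = children[node][ch]
    return children, parent


CHILDREN, PARENT = build_suffix_trie(PATTERN)


def head_state(payload):
    """S31'/S32': walk the head of a packet down the trie; return the node number."""
    node = 0
    for ch in payload:
        if ch not in CHILDREN[node]:
            break
        node = CHILDREN[node][ch]
    return node


def backtrack(node):
    """S33': recover the text spelled by the path from the root to node."""
    chars = []
    while PARENT[node] is not None:
        node, ch = PARENT[node]
        chars.append(ch)
    return "".join(reversed(chars))


class OutOfOrderMatcher:
    def __init__(self):
        self.buffer = {}  # (flow_id, seq) -> suffix trie node number, not payload

    def on_early_packet(self, flow_id, seq, payload):
        """A packet arrived before its predecessor: keep only a node number."""
        self.buffer[(flow_id, seq)] = head_state(payload)

    def on_late_packet(self, flow_id, seq, payload):
        """The predecessor arrives: append the recovered suffix and match (S34')."""
        node = self.buffer.pop((flow_id, seq + 1), 0)
        rebuilt = payload + backtrack(node)
        return PATTERN in rebuilt  # stand-in for matching with the AC automaton
```

If packet 2 with payload `"deyy"` arrives first, only the node number for `de` is stored; when packet 1 with payload `"xxabc"` arrives, the recovered `de` is appended and the match on `abcde` succeeds. Because the appended text is always a true prefix of the following packet, it cannot create a match that the reassembled stream would not contain.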
The above embodiments are intended only to illustrate the present invention and not to limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore also fall within the scope of the present invention, and the scope of patent protection of the present invention shall be defined by the claims.
Claims (6)
1. A character string matching method based on an AC automaton and a suffix tree, characterized in that the method comprises:
S1, compiling the feature strings into an AC automaton;
S2, compiling the set of suffixes of the feature strings into a suffix tree;
S3, when a packet enters the network security device, matching the packet against the AC automaton and using the suffix tree to save the matching state;
S4, if the match succeeds, discarding the packet.
2. The method according to claim 1, characterized in that, when the packets arrive in order, step S3 specifically comprises:
S31, upon receiving the current in-order packet, searching for whether a record of the packet preceding the current in-order packet exists;
S32, determining the state number of the current in-order packet according to whether the record of the preceding packet exists;
S33, matching the current in-order packet against the AC automaton, starting from the state number of the current in-order packet.
3. The method according to claim 2, characterized in that step S32 specifically comprises:
if the record of the preceding packet exists, using the state number of the preceding packet as the state number of the current in-order packet;
if the record of the preceding packet does not exist, setting the state number of the current in-order packet to 0.
4. The method according to claim 1, characterized in that, when the packets arrive out of order, step S3 specifically comprises:
S31', before the current out-of-order packet is received, searching for whether a suffix tree record of the packet following the current out-of-order packet exists, and detecting whether the head of the following packet is a suffix in the suffix tree;
S32', if the suffix tree record of the following packet exists and the head of the following packet is detected to be a suffix in the suffix tree, saving the suffix tree state number corresponding to that suffix;
S33', upon receiving the current out-of-order packet, backtracking from the saved suffix tree state number to recover the suffix;
S34', appending the recovered suffix to the tail of the current out-of-order packet, and matching the reassembled current out-of-order packet against the AC automaton.
5. The method according to claim 4, characterized in that step S31' further comprises searching for whether a suffix tree record of the packet preceding the current out-of-order packet exists:
if the suffix tree record of the preceding packet exists and the preceding packet is part of the suffix, the suffix tree state number of the current out-of-order packet is the suffix tree state number of the preceding packet;
if the suffix tree record of the preceding packet exists and the preceding packet is not part of the suffix, the suffix tree state number of the current out-of-order packet is the suffix tree state number of the preceding packet.
6. The method according to claim 4, characterized in that, in step S32', if the suffix tree record of the following packet exists but the following packet is detected not to be part of a suffix, the following packet is reassembled with the current out-of-order packet for matching, and the state number of the reassembled current out-of-order packet is the state number of the following packet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104884271A CN103023883A (en) | 2012-11-26 | 2012-11-26 | Character string matching method based on Aho-Corasick (AC) automaton and suffix tree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104884271A CN103023883A (en) | 2012-11-26 | 2012-11-26 | Character string matching method based on Aho-Corasick (AC) automaton and suffix tree |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103023883A (en) | 2013-04-03 |
Family
ID=47972014
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012104884271A Pending CN103023883A (en) | Character string matching method based on Aho-Corasick (AC) automaton and suffix tree | 2012-11-26 | 2012-11-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103023883A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104796354A (en) * | 2014-11-19 | 2015-07-22 | 中国科学院信息工程研究所 | Out-of-order data packet string matching method and system |
CN105183788A (en) * | 2015-08-20 | 2015-12-23 | 及时标讯网络信息技术(北京)有限公司 | Operation method for Chinese AC automatic machine based on retrieval of keyword dictionary tree |
CN105407096A (en) * | 2015-11-26 | 2016-03-16 | 深圳市风云实业有限公司 | Message data detection method based on stream management |
CN105468597A (en) * | 2014-08-14 | 2016-04-06 | 腾讯科技(北京)有限公司 | Method and device for acquiring jump distance |
CN106067039A (en) * | 2016-05-30 | 2016-11-02 | 桂林电子科技大学 | Method for mode matching based on decision tree beta pruning |
CN108471355A (en) * | 2018-02-28 | 2018-08-31 | 哈尔滨工程大学 | A kind of Internet of Things Information Interoperability method based on extra large cloud computing framework |
CN112506789A (en) * | 2020-12-17 | 2021-03-16 | 中国科学院计算技术研究所 | Parallel pattern matching method for data packet detection |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101009660A (en) * | 2007-01-19 | 2007-08-01 | 杭州华为三康技术有限公司 | Universal method and device for processing the match of the segmented message mode |
CN101258721A (en) * | 2005-06-30 | 2008-09-03 | 英特尔公司 | Stateful packet content matching mechanisms |
CN101562604A (en) * | 2008-04-17 | 2009-10-21 | 北京启明星辰信息技术股份有限公司 | Non-cache model matching method based on message flow data |
CN102685098A (en) * | 2012-02-24 | 2012-09-19 | 华南理工大学 | Recombination-free multi-mode matching method for out-of-order data package flow |
Non-Patent Citations (1)
Title |
---|
XINMING CHEN等: "AC-Suffix-Tree: Buffer Free String Matching on Out-of-Sequence Packets", 《2011 SEVENTH ACM/IEEE SYMPOSIUM ON ARCHITECTURES FOR NETWORKING AND COMMUNICATIONS SYSTEMS》, 4 October 2011 (2011-10-04) * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105468597A (en) * | 2014-08-14 | 2016-04-06 | 腾讯科技(北京)有限公司 | Method and device for acquiring jump distance |
CN105468597B (en) * | 2014-08-14 | 2020-09-25 | 腾讯科技(北京)有限公司 | Method and device for acquiring jump distance |
CN104796354A (en) * | 2014-11-19 | 2015-07-22 | 中国科学院信息工程研究所 | Out-of-order data packet string matching method and system |
CN105183788A (en) * | 2015-08-20 | 2015-12-23 | 及时标讯网络信息技术(北京)有限公司 | Operation method for Chinese AC automatic machine based on retrieval of keyword dictionary tree |
CN105183788B (en) * | 2015-08-20 | 2019-01-25 | 及时标讯网络信息技术(北京)有限公司 | A kind of Chinese AC automatic machine working method based on the retrieval of keyword dictionary tree |
CN105407096A (en) * | 2015-11-26 | 2016-03-16 | 深圳市风云实业有限公司 | Message data detection method based on stream management |
CN105407096B (en) * | 2015-11-26 | 2019-03-19 | 深圳市风云实业有限公司 | Message data detection method based on flow management |
CN106067039A (en) * | 2016-05-30 | 2016-11-02 | 桂林电子科技大学 | Method for mode matching based on decision tree beta pruning |
CN106067039B (en) * | 2016-05-30 | 2019-01-29 | 桂林电子科技大学 | Method for mode matching based on decision tree beta pruning |
CN108471355A (en) * | 2018-02-28 | 2018-08-31 | 哈尔滨工程大学 | A kind of Internet of Things Information Interoperability method based on extra large cloud computing framework |
CN112506789A (en) * | 2020-12-17 | 2021-03-16 | 中国科学院计算技术研究所 | Parallel pattern matching method for data packet detection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103023883A (en) | Character string matching method based on Aho-Corasick (AC) automaton and suffix tree | |
US9798714B2 (en) | System and method for keyword spotting using representative dictionary | |
Goel et al. | Small subset queries and bloom filters using ternary associative memories, with applications | |
CN102510323B (en) | Frame identifying method for serial data | |
CN102184197B (en) | Regular expression matching method based on smart finite automaton (SFA) | |
US20080104702A1 (en) | Network-based internet worm detection apparatus and method using vulnerability analysis and attack modeling | |
CN106649362B (en) | Webpage crawling method and device | |
CN100452055C (en) | Large-scale and multi-key word matching method for text or network content analysis | |
CN104753931A (en) | DPI (deep packet inspection) method based on regular expression | |
CN101442540A (en) | High speed mode matching algorithm based on field programmable gate array | |
CN100495407C (en) | Multiple character string matching method and chip | |
CN103412858A (en) | Method for large-scale feature matching of text content or network content analyses | |
CN109660517A (en) | Anomaly detection method, device and equipment | |
CN109327451A (en) | A kind of method, system, device and medium that the upload verifying of defence file bypasses | |
CN105407096A (en) | Message data detection method based on stream management | |
CN101030897B (en) | Method for matching mode in invading detection | |
US8812480B1 (en) | Targeted search system with de-obfuscating functionality | |
Afek et al. | Making DPI engines resilient to algorithmic complexity attacks | |
CN103746869A (en) | Data/mask and regular expression combined multistage deep packet detection method | |
CN102685098B (en) | Recombination-free multi-mode matching method for out-of-order data package flow | |
CN102437959B (en) | Stream forming method based on dual overtime network message | |
CN101938474A (en) | Network intrusion detection and protection method and device | |
CN106612303A (en) | Data processing method and data processing device | |
Chen et al. | Ac-suffix-tree: Buffer free string matching on out-of-sequence packets | |
CN102143151A (en) | Deep packet inspection based protocol packet spanning inspection method and deep packet inspection based protocol packet spanning inspection device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20130403 |