CN103023883A - String matching method based on the Aho-Corasick (AC) automaton and a suffix tree - Google Patents


Info

Publication number
CN103023883A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012104884271A
Other languages
Chinese (zh)
Inventor
陈新明
李军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN2012104884271A
Publication of CN103023883A
Legal status: Pending

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a string matching method based on an Aho-Corasick (AC) automaton and a suffix tree, comprising the following steps: S1, compiling the feature strings into an AC automaton; S2, compiling the set of suffixes of the feature strings into a suffix tree; S3, whenever a data packet enters the network security device, matching the packet with the AC automaton and using the suffix tree to preserve the matching state; and S4, discarding the packet if the match succeeds. Because the method saves the state numbers of the AC automaton and the suffix tree while matching packet contents, matching can resume from the previous state even when packets arrive out of order, so earlier packets need not be buffered. This avoids the increased delay, higher memory consumption, and reduced cache locality that buffering causes, lowers the resources the network security device requires, and improves its performance.

Description

String matching method based on an AC automaton and a suffix tree
Technical field
The present invention relates to the field of network filtering and monitoring, and in particular to a string matching method based on an AC automaton and a suffix tree.
Background art
As network security requirements keep rising, functions such as intrusion detection, antivirus, and content filtering are increasingly deployed in network security devices. String matching is the core algorithm behind these functions and therefore determines the performance of the device. The string matching algorithm most widely used in network security devices today is the Aho-Corasick (AC) automaton algorithm, a string matching algorithm based on finite automata. As shown in Fig. 1, it works as follows. The feature strings (for example a virus signature database or a list of filtering keywords) are first compiled into an automaton. Starting from state 0, the content to be matched is read one character at a time. Each time a character (for example a) is read, the algorithm checks whether the current state has a transition arrow for that character; if so, it moves to the state that the transition points to, and if not, it returns to state 0. Some states are marked as matching states, and entering one of them means that a match has succeeded.
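The automaton described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names `build_ac` and `ac_match` are assumptions, and the failure links are the standard Aho-Corasick construction, which refines the simplified "return to state 0" rule in the text so that overlapping feature strings are not missed.

```python
from collections import deque

def build_ac(patterns):
    """Compile feature strings into goto/fail/output tables (state 0 is the start)."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state
    out = [set()]    # feature strings recognized on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    queue = deque(goto[0].values())      # depth-1 states keep fail = 0
    while queue:                         # breadth-first: shallower states first
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]       # inherit matches that end at the same character
            queue.append(t)
    return goto, fail, out

def ac_match(tables, text, state=0):
    """Scan text from the given state; return (final state, [(end index, pattern)])."""
    goto, fail, out = tables
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for p in sorted(out[state]):
            hits.append((i, p))
    return state, hits

tables = build_ac(["abcde", "cd"])
final_state, hits = ac_match(tables, "xxabcdeyy")
print(final_state, hits)
```

The `state` parameter of `ac_match` is what makes the automaton resumable: a scan can be stopped at a packet boundary and continued later from the returned state number.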
When a feature string may be split across several packets, the approach generally adopted in industry is to buffer the packets, reassemble them, and only then run string matching, which increases the memory usage of the network security device.
Buffering and reassembling packets has the following drawbacks. First, buffering packets increases network latency. Second, in high-speed networks at gigabit rates and above, reassembly requires a large amount of memory and can easily exhaust the memory of the network security device. Third, in devices equipped with a cache, reading and writing large amounts of packet data in memory reduces cache locality and thereby lowers the performance of the device.
Summary of the invention
(1) Technical problem to be solved
The technical problem addressed by the present invention is to provide a string matching method that can detect content spanning multiple packets without reassembling the packets.
(2) Technical solution
The present invention proposes a string matching method based on an AC automaton and a suffix tree, the method comprising:
S1, compiling the feature strings into an AC automaton;
S2, compiling the set of suffixes of the feature strings into a suffix tree;
S3, when a packet enters the network security device, matching the packet with the AC automaton and using the suffix tree to preserve the matching state;
S4, discarding the packet if the match succeeds.
Preferably, when packets arrive in order, step S3 specifically comprises:
S31, on receiving the current in-order packet, looking up whether a record of the packet preceding it exists;
S32, determining the state number of the current in-order packet according to whether that record exists;
S33, matching the current in-order packet with the AC automaton, starting from the state number of the current in-order packet.
Preferably, step S32 specifically comprises:
if a record of the preceding packet exists, using the state number of the preceding packet as the state number of the current in-order packet;
if no record of the preceding packet exists, setting the state number of the current in-order packet to 0.
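Steps S31 to S33 can be illustrated with a small Python sketch. It rests on several assumptions that are not in the patent: the record table is a plain dictionary keyed by a hypothetical (flow, next expected sequence number) pair, and the automaton is hard-wired for the single example feature string abcde, whose characters are all distinct, so a mismatch can only fall back to state 0 or state 1. A real device would use the full AC automaton.

```python
PATTERN = "abcde"   # example feature string; its characters are all distinct

def step(state, ch):
    """One automaton transition; this shortcut is valid only for distinct-character patterns."""
    if ch == PATTERN[state]:
        return state + 1
    return 1 if ch == PATTERN[0] else 0

def match_in_order(records, flow, seq, payload):
    """Match one in-order packet, resuming from the state saved for its predecessor.

    records maps (flow, next expected sequence number) -> saved state number.
    Returns True if PATTERN is completed inside this packet.
    """
    state = records.pop((flow, seq), 0)       # S31/S32: no record means state 0
    matched = False
    for ch in payload:                        # S33: continue from that state
        state = step(state, ch)
        if state == len(PATTERN):
            matched = True
            state = 0                         # restart after a full match
    records[(flow, seq + len(payload))] = state   # only a state number is kept
    return matched

records = {}
print(match_in_order(records, "flow-1", 0, "xxab"))
print(match_in_order(records, "flow-1", 4, "cdeyy"))
```

Only one integer per flow survives between packets, which is the point of the scheme: the payload of the first packet is never stored.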
Preferably, when packets arrive out of order, step S3 specifically comprises:
S31', before the current out-of-order packet is received, looking up whether a suffix tree record of the packet following it exists, and detecting whether the head of that following packet is a suffix in the suffix tree;
S32', if the suffix tree record of the following packet exists and the head of the following packet is detected to be a suffix in the suffix tree, saving the suffix tree number that the suffix tree reached on that suffix;
S33', on receiving the current out-of-order packet, tracing back the suffix from the saved suffix tree number;
S34', appending the suffix obtained by tracing back to the tail of the current out-of-order packet, and matching the recombined current out-of-order packet with the AC automaton.
Preferably, step S31' further comprises looking up whether a suffix tree record of the packet preceding the current out-of-order packet exists:
if the suffix tree record of the preceding packet exists and the preceding packet is part of the suffix, the suffix tree number of the current out-of-order packet is the suffix tree number of the preceding packet;
if the suffix tree record of the preceding packet exists and the preceding packet is not part of the suffix, the suffix tree number of the current out-of-order packet is the suffix tree number of the preceding packet.
Preferably, in step S32', if the suffix tree record of the following packet exists but the following packet is detected not to be part of a suffix, the following packet is recombined with the current out-of-order packet for matching, and the state number of the recombined current out-of-order packet is the state number of the following packet.
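The core of steps S31' to S34' can be sketched as follows, under stated assumptions. The sketch uses an uncompressed trie of proper suffixes in place of the suffix tree, covers only the case where a feature string spans exactly two packets, and replaces the AC automaton scan with Python's substring test to stay short. All function names and the (flow, sequence number) record key are hypothetical.

```python
def build_suffix_trie(patterns):
    """Trie over all proper suffixes of the patterns; node 0 is the root."""
    children = [{}]       # children[node][char] -> node
    parent = [(0, "")]    # parent[node] = (parent node, edge char), used for trace-back
    terminal = [False]    # True where a complete suffix ends
    for p in patterns:
        for start in range(1, len(p)):
            node = 0
            for ch in p[start:]:
                if ch not in children[node]:
                    children.append({})
                    parent.append((node, ch))
                    terminal.append(False)
                    children[node][ch] = len(children) - 1
                node = children[node][ch]
            terminal[node] = True
    return children, parent, terminal

def head_suffix_state(trie, payload):
    """Deepest node at which the packet head spells a complete suffix (0 if none)."""
    children, _, terminal = trie
    node, best = 0, 0
    for ch in payload:
        node = children[node].get(ch)
        if node is None:
            break
        if terminal[node]:
            best = node
    return best

def trace_back(trie, node):
    """Recover the suffix string from a saved node number by walking parent links."""
    _, parent, _ = trie
    chars = []
    while node:
        node, ch = parent[node]
        chars.append(ch)
    return "".join(reversed(chars))

def on_following_packet(trie, records, flow, seq, payload):
    """S31'/S32': the following packet arrived first; keep a node number, not the payload."""
    records[(flow, seq)] = head_suffix_state(trie, payload)

def on_late_packet(trie, records, patterns, flow, seq, payload):
    """S33'/S34': the late packet arrives; rebuild the boundary text and match it."""
    node = records.pop((flow, seq + len(payload)), 0)
    data = payload + trace_back(trie, node)
    return any(p in data for p in patterns)   # stand-in for the AC automaton scan

patterns = ["abcde"]
trie = build_suffix_trie(patterns)
records = {}
on_following_packet(trie, records, "flow-1", 4, "cdeyy")
print(on_late_packet(trie, records, patterns, "flow-1", 0, "xxab"))
```

Saving the deepest matching suffix is sufficient in this two-packet case because every shorter suffix that also matches the packet head is a prefix of the deepest one.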
(3) Beneficial effects
The present invention saves the state number of the state machine while matching the contents of a packet, so that when the next packet arrives, matching can continue from the previous state. This removes the drawbacks caused by buffering, namely increased delay, higher memory consumption, and reduced cache locality, and thereby reduces the resources the network security device requires and improves its performance.
Description of the drawings
Fig. 1 is a schematic diagram of an AC automaton;
Fig. 2 is a flow chart of the method proposed by the present invention;
Fig. 3 is a schematic diagram of the suffix tree of the feature string abcde in the present invention.
Detailed description of the embodiments
The technical solution in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawings.
The present invention proposes a string matching method based on an AC automaton and a suffix tree. It improves on the AC algorithm by recording the state number of the automaton instead of buffering the content of the whole packet. The method can also be applied to algorithms derived from AC, such as AC-Optimized and Cl-AC. The flow chart of the method is shown in Fig. 2, and the method comprises:
S1, compiling the feature strings into an AC automaton;
S2, compiling the set of suffixes of the feature strings into a suffix tree;
S3, when a packet enters the network security device, matching the packet with the AC automaton and using the suffix tree to preserve the matching state;
S4, discarding the packet if the match succeeds.
When packets arrive in order, step S3 specifically comprises:
S31, on receiving the current in-order packet, looking up whether a record of the packet preceding it exists;
S32, determining the state number of the current in-order packet according to whether that record exists;
S33, matching the current in-order packet with the AC automaton, starting from the state number of the current in-order packet.
Step S32 specifically comprises:
if a record of the preceding packet exists, using the state number of the preceding packet as the state number of the current in-order packet;
if no record of the preceding packet exists, setting the state number of the current in-order packet to 0.
When packets arrive out of order, step S3 specifically comprises:
S31', before the current out-of-order packet is received, looking up whether a suffix tree record of the packet following it exists, and detecting whether the head of that following packet is a suffix in the suffix tree;
S32', if the suffix tree record of the following packet exists and the head of the following packet is detected to be a suffix in the suffix tree, saving the suffix tree number that the suffix tree reached on that suffix;
S33', on receiving the current out-of-order packet, tracing back the suffix from the saved suffix tree number;
S34', appending the suffix obtained by tracing back to the tail of the current out-of-order packet, and matching the recombined current out-of-order packet with the AC automaton.
Step S31' further comprises looking up whether a suffix tree record of the packet preceding the current out-of-order packet exists:
if the suffix tree record of the preceding packet exists and the preceding packet is part of the suffix, the suffix tree number of the current out-of-order packet is the suffix tree number of the preceding packet;
if the suffix tree record of the preceding packet exists and the preceding packet is not part of the suffix, the suffix tree number of the current out-of-order packet is the suffix tree number of the preceding packet.
In step S32', if the suffix tree record of the following packet exists but the following packet is detected not to be part of a suffix, the following packet is recombined with the current out-of-order packet for matching, and the state number of the recombined current out-of-order packet is the state number of the following packet.
In this embodiment the feature string is assumed to be abcde. The set of suffixes of abcde is then {bcde, cde, de, e}, and the suffix tree of abcde is shown in Fig. 3.
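The suffix set in this example can be reproduced with a few lines of Python. The sketch below is illustrative only: it builds an uncompressed trie from nested dictionaries rather than the suffix tree of Fig. 3, the "$" end marker is an assumption, and the node numbering used in the figure is not reproduced.

```python
def proper_suffixes(s):
    """All suffixes of s except s itself, longest first."""
    return [s[i:] for i in range(1, len(s))]

def build_trie(words):
    """Nested-dict trie; the key "$" marks the end of a complete suffix."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def count_nodes(node):
    """Number of trie nodes, including the root."""
    return 1 + sum(count_nodes(child) for key, child in node.items() if key != "$")

suffixes = proper_suffixes("abcde")
trie = build_trie(suffixes)
print(suffixes, count_nodes(trie))
```

For abcde the four suffixes share no common prefix, so the trie has four separate branches hanging off the root.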
The data structures used in the string matching method based on an AC automaton and a suffix tree proposed by the present invention are as follows.
The data structure of an AC automaton state is:
[Figure BDA00002469050200061: image not reproduced]
The data structure of a suffix tree state is:
The entry data structure of the record table Buffer, which stores out-of-order information, is:
[Figures BDA00002469050200063 and BDA00002469050200071: images not reproduced]
The pseudo-code of the string matching method based on an AC automaton and a suffix tree proposed by the present invention is given in Table 1 below:
[Figures BDA00002469050200072, BDA00002469050200081 and BDA00002469050200091: images not reproduced]
Table 1
The above embodiments are intended only to illustrate the present invention and not to limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the invention. All equivalent technical solutions therefore also fall within the scope of the invention, and the scope of patent protection of the invention shall be defined by the claims.

Claims (6)

1. A string matching method based on an AC automaton and a suffix tree, characterized in that the method comprises:
S1, compiling the feature strings into an AC automaton;
S2, compiling the set of suffixes of the feature strings into a suffix tree;
S3, when a packet enters the network security device, matching the packet with the AC automaton and using the suffix tree to preserve the matching state;
S4, discarding the packet if the match succeeds.
2. The method according to claim 1, characterized in that, when packets arrive in order, step S3 specifically comprises:
S31, on receiving the current in-order packet, looking up whether a record of the packet preceding it exists;
S32, determining the state number of the current in-order packet according to whether that record exists;
S33, matching the current in-order packet with the AC automaton, starting from the state number of the current in-order packet.
3. The method according to claim 2, characterized in that step S32 specifically comprises:
if a record of the preceding packet exists, using the state number of the preceding packet as the state number of the current in-order packet;
if no record of the preceding packet exists, setting the state number of the current in-order packet to 0.
4. The method according to claim 1, characterized in that, when packets arrive out of order, step S3 specifically comprises:
S31', before the current out-of-order packet is received, looking up whether a suffix tree record of the packet following it exists, and detecting whether the head of that following packet is a suffix in the suffix tree;
S32', if the suffix tree record of the following packet exists and the head of the following packet is detected to be a suffix in the suffix tree, saving the suffix tree number that the suffix tree reached on that suffix;
S33', on receiving the current out-of-order packet, tracing back the suffix from the saved suffix tree number;
S34', appending the suffix obtained by tracing back to the tail of the current out-of-order packet, and matching the recombined current out-of-order packet with the AC automaton.
5. The method according to claim 4, characterized in that step S31' further comprises looking up whether a suffix tree record of the packet preceding the current out-of-order packet exists:
if the suffix tree record of the preceding packet exists and the preceding packet is part of the suffix, the suffix tree number of the current out-of-order packet is the suffix tree number of the preceding packet;
if the suffix tree record of the preceding packet exists and the preceding packet is not part of the suffix, the suffix tree number of the current out-of-order packet is the suffix tree number of the preceding packet.
6. The method according to claim 4, characterized in that, in step S32', if the suffix tree record of the following packet exists but the following packet is detected not to be part of a suffix, the following packet is recombined with the current out-of-order packet for matching, and the state number of the recombined current out-of-order packet is the state number of the following packet.
Application CN2012104884271A, priority date 2012-11-26, filing date 2012-11-26: String matching method based on the Aho-Corasick (AC) automaton and a suffix tree. Status: Pending. Publication: CN103023883A (en)


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130403