WO2014000305A1 - Method and apparatus for content matching - Google Patents

Method and apparatus for content matching

Info

Publication number
WO2014000305A1
WO2014000305A1 PCT/CN2012/077996 CN2012077996W WO2014000305A1 WO 2014000305 A1 WO2014000305 A1 WO 2014000305A1 CN 2012077996 W CN2012077996 W CN 2012077996W WO 2014000305 A1 WO2014000305 A1 WO 2014000305A1
Authority
WO
WIPO (PCT)
Prior art keywords
hash
target
string
table entry
matching
Prior art date
Application number
PCT/CN2012/077996
Other languages
French (fr)
Chinese (zh)
Inventor
徐文广
戴崇经
田聃
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201280000614.9A (CN102870116B)
Priority to PCT/CN2012/077996 (WO2014000305A1)
Publication of WO2014000305A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/90335: Query processing
    • G06F16/90344: Query processing by using string matching techniques

Definitions

  • The present invention relates to data processing technologies, and in particular to a content matching method and apparatus.
  • Background
  • Deep Packet Inspection (DPI)
  • The prior-art content matching technique for strings or feature words typically performs the following operations: a) dividing the target string into at least one first string; b) generating a group of second strings by combination, for example by further taking substrings of the first strings as second strings; c) extracting third strings from the second strings, for example by selecting commonly used strings according to a blacklist or whitelist, and compiling each third string with an algorithm such as a state machine or a rule tree; d) using a sliding window to compare, from different starting positions, whether the detected string matches the third string at the first string node; e) if the match succeeds but a next string node exists, entering the next matching flow; f) if the match succeeds and there is no next string node, the detected string matches the target string; g) if the match fails, the detected string does not match the target string.
  • The existing content matching method has at least the following defects: 1) if the target string is long, the number of matching node branches and the matching time multiply, and performance drops sharply; 2) performance can only be improved by using multiple matching engines, which consumes too many resources; 3) when a target string is added, the rule tree must be recompiled, which hinders hot upgrades and can only be worked around by backing up and switching table entries.
  • Summary of the invention
  • The embodiments of the invention provide a content matching method and device that can increase the speed of string content matching, reduce resource consumption, and simplify upgrade and maintenance.
  • An embodiment of the present invention provides a content matching method, including:
  • An embodiment of the invention further provides a content matching device, including:
  • a first hash operation module, configured to perform a hash operation on at least one target string according to at least one set hash algorithm, to obtain the target hash result of each target string under each hash algorithm;
  • a hash table forming module, configured to form a hash table entry for each target string from the target hash results of that target string, and to combine the hash table entries of all target strings into a hash matching table;
  • a second hash operation module, configured to perform a hash operation on the tested string according to the at least one hash algorithm, to obtain the tested hash result of the tested string under each hash algorithm; and
  • a hash table matching module, configured to match each tested hash result of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
  • The content matching method and device reduce the system resources occupied by matching and require no additional switching or backup resources. The string extraction process and the string hash matching process are performed in parallel, which can greatly shorten the matching time and improve the matching speed. The matching of hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table does not need to be recompiled; only the corresponding hash table entries are modified. The hash algorithms and their number can be updated at any time, so upgrade and maintenance are easy.
  • FIG. 1 is a flowchart of a content matching method according to Embodiment 1 of the present invention.
  • FIG. 2 is a flowchart of a content matching method according to Embodiment 2 of the present invention.
  • FIG. 3 is a flowchart of a content matching method according to Embodiment 3 of the present invention.
  • FIG. 4 is a flowchart of a content matching method according to Embodiment 4 of the present invention.
  • FIG. 5 is a schematic structural diagram of a content matching apparatus according to Embodiment 5 of the present invention.
  • FIG. 6 is a schematic structural diagram of a content matching apparatus according to Embodiment 6 of the present invention.
  • FIG. 7 is a schematic structural diagram of a content matching apparatus according to Embodiment 7 of the present invention.
  • Detailed description
  • FIG. 1 is a flowchart of a content matching method according to Embodiment 1 of the present invention.
  • The content matching method may be applied to various application scenarios, such as URL filtering and message filtering. It may be executed by a content matching device implemented in software and/or hardware, for example a device carried in a server such as a Gateway GPRS Support Node (GGSN).
  • Step 110: The content matching device performs a hash operation on at least one target string according to at least one set hash algorithm, to obtain the target hash result of each target string under each hash algorithm.
  • Step 120: The content matching device forms a hash table entry for each target string from the target hash results of that target string, and combines the hash table entries of all target strings into a hash matching table.
  • Step 130: The content matching device performs a hash operation on the tested string according to the at least one hash algorithm, to obtain the tested hash result of the tested string under each hash algorithm.
  • Step 140: The content matching device matches each tested hash result of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
  • The technical solution of this embodiment includes steps 110 and 120, which compile the target strings, and steps 130 and 140, which match the tested string against the compiled hash matching table.
  • The target string is a string used as the matching criterion in the content matching technology and can be preset by the user.
  • The tested string is a string that needs to be matched and filtered in the content matching technology, for example a field or a web address in a message to be filtered.
  • A user can set keywords that reflect a filtering target as target strings, for example strings that reflect violence or pornography, and the target strings are pre-compiled for the subsequent matching operation.
  • A URL opened by the user, as the tested string, is first matched against the pre-compiled target strings. If it matches, the web page can be filtered out; otherwise, the web page at that URL is opened normally.
  • The target string is converted into target hash results by the hash algorithms, the same hash algorithms are used to obtain the tested hash results of the tested string, and the hash results are compared to determine whether the tested string matches the target string.
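The compile-then-match flow of steps 110 to 140 can be sketched roughly as follows. This is a minimal illustration under assumptions, not the patented implementation: the function names, the choice of hash algorithms (byte-wise XOR, CRC-32, Adler-32 and string length), the numeric string IDs, and the use of a Python dict as the hash matching table are all hypothetical, since the text leaves the concrete algorithms open.

```python
import zlib


def xor_bytes(data: bytes) -> int:
    """Hypothetical 8-bit hash: XOR of all bytes."""
    result = 0
    for b in data:
        result ^= b
    return result


# Assumed set of hash algorithms; any other set could be configured instead.
HASH_ALGORITHMS = [xor_bytes, zlib.crc32, zlib.adler32, len]


def hash_results(s: str) -> tuple:
    """Apply every configured hash algorithm to one string."""
    data = s.encode("utf-8")
    return tuple(h(data) for h in HASH_ALGORITHMS)


def compile_targets(targets):
    """Steps 110 and 120: one table entry per target string, keyed by its hash results."""
    return {hash_results(t): t_id for t_id, t in enumerate(targets, start=1)}


def match(table, tested: str):
    """Steps 130 and 140: hash the tested string and look it up; None means no match."""
    return table.get(hash_results(tested))


table = compile_targets(["ABCDEFG123456789", "abcdefg-xyz", "Accept-Language"])
print(match(table, "Accept-Language"))  # ID of the matching target string
print(match(table, "Host"))             # None: no target has these hash results
```

Note that the lookup compares only fixed-size hash results, so its cost does not grow with the length of the target strings once the hashes have been computed.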
  • The technical solution of the embodiment of the present invention reduces the system resources occupied by matching and requires no additional switching or backup resources. In addition, the string extraction process and the string hash matching process are performed in parallel, so when the text contains many tested strings, for example more than 20, the matching time can be greatly shortened and the matching speed improved. The matching of hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table does not need to be recompiled; only the corresponding hash table entries are modified. The hash algorithms and their number can also be updated at any time, so upgrade and maintenance are easy.
  • The accuracy of the matching result depends on the specific hash algorithms used and on their number. Choosing appropriate hash algorithms captures the characteristics of the target strings as fully as possible, so that, as far as possible, only identical strings yield identical hash results. Increasing the number of hash algorithms also reduces the probability that different strings have identical hash results, and thus reduces the false match rate.
  • The specific hash algorithms and their number can be set according to the actual application scenario, such as the number of target strings.
  • The technical solution of the embodiment of the present invention can be applied in various situations; it is not limited to matching strings and is also suitable for matching data sequences.
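As a rough quantitative illustration of this point, assume (an idealisation that the source does not state) that the k hash algorithms behave like independent, uniformly distributed functions with output widths of b_1, ..., b_k bits. The probability that a non-matching tested string produces the same complete set of hash results as one given target string, and a union bound over N target strings, are then:

```latex
P_{\text{false}} = \prod_{i=1}^{k} 2^{-b_i} = 2^{-\sum_{i=1}^{k} b_i},
\qquad
P_{\text{false}}^{(N)} \le N \cdot 2^{-\sum_{i=1}^{k} b_i}
```

Under these assumptions, each additional hash algorithm of width b bits divides the false match probability by 2^b, which is consistent with the statement that more hash algorithms lower the false match rate.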
  • Preferably, the number of hash algorithms is at least two, and forming the hash table entry of a target string from the target hash results of that target string may specifically include: using the first target hash result of each target string as the hash table entry index, and using the other target hash results as the hash table entry content.
  • In the above step the first target hash result is selected as the hash table entry index, but in practice the choice of index is not tied to the order of the hash results: the order of the hash algorithms can be set arbitrarily, so any target hash result can be made the first one and used as the hash table entry index.
  • Step 140, matching each tested hash result of the tested string against the hash table entries of the hash matching table to obtain a matching result, may then specifically include the following operations:
  • Step 141: The content matching device uses the first tested hash result of the tested string as the hash table entry index and searches the hash matching table for the corresponding hash table entry.
  • Step 142: If the corresponding hash table entry is found, the content matching device matches the other tested hash results of the tested string against the content of the found hash table entry.
  • Step 143: When the other tested hash results are consistent with the content of the found hash table entry, a match-success result is obtained.
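Steps 141 to 143 split the hash results into an entry index and an entry content. A minimal sketch of that split follows; the three hash algorithms and all names are assumptions chosen for illustration, and index collisions between target strings are deliberately ignored here.

```python
import zlib


def xor_bytes(data: bytes) -> int:
    """Hypothetical 8-bit hash used as the entry index."""
    result = 0
    for b in data:
        result ^= b
    return result


def hash_results(s: str) -> list:
    """First result is the index; the remaining results form the entry content."""
    data = s.encode("utf-8")
    return [xor_bytes(data), zlib.crc32(data), len(data)]


def build_table(targets) -> dict:
    """Index each target by its first hash result (collisions not handled in this sketch)."""
    table = {}
    for t in targets:
        index, *content = hash_results(t)
        table[index] = tuple(content)
    return table


def is_match(table: dict, tested: str) -> bool:
    index, *content = hash_results(tested)
    entry = table.get(index)        # Step 141: look up the entry by index
    if entry is None:
        return False                # empty entry: the match fails immediately
    return entry == tuple(content)  # Steps 142 and 143: compare the entry content


table = build_table(["ABCDEFG123456789", "abcdefg-xyz", "Accept-Language"])
print(is_match(table, "Accept-Language"))
print(is_match(table, "Host"))
```

The design point is that most non-matching strings are rejected by a single index lookup, and the remaining hash results are compared only when the indexed entry is occupied.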
  • FIG. 2 is a flowchart of a content matching method according to Embodiment 2 of the present invention. This embodiment describes each step in detail by way of an example.
  • In this embodiment the user sets five hash algorithms, chosen to reflect the string features as fully as possible, as shown in Table 1. In practice this is not limiting: any one or any combination of the following hash algorithms may be used, other hash algorithms may be added, and when the target strings are short or few, the original string itself may also be set as a hash algorithm:
  • Step 201: The content matching device performs a hash operation on the target strings according to the five set hash algorithms, to obtain the target hash result of each target string under each hash algorithm. Assume that there are three target strings:
  • Target string 1: "ABCDEFG123456789"
  • Target string 2: "abcdefg-xyz"
  • Target string 3: "Accept-Language"
  • Step 202: The content matching device forms a hash table entry for each target string from the target hash results of that target string, and the hash table entries of all target strings are combined into a hash matching table.
  • The target hash results of the target strings constitute the hash matching table shown in Table 3 below, in which the first target hash result is used as the hash table entry index (tab_index) and the other target hash results are used as the hash table entry content (tab_content). That is, target hash result 1 is used as the hash table entry index and target hash results 2 to 5 are used as the hash table entry content to generate the hash matching table.
  • Table 3
  • Step 203: The content matching device performs a hash operation on the tested string according to the at least one hash algorithm, to obtain the tested hash result of the tested string under each hash algorithm.
  • the measured character string which may be determined by the specific application scenario of the content matching method, and may be a string to be matched in the network, for example, a hypertext transfer protocol in the receiving network (Hypertext Transfer Protocol, Referred to as HTTP), the request is as follows:
  • User-Agent Mozilla/4.0 (compatible; MSIE 6.0; Windows NT
  • the extracted character string 8 "Connection"
  • Various extraction strategies may be used for the above strings; the embodiment of the present invention does not limit them.
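Since the extraction strategy is left open, the following is only one possible sketch: it assumes that the header field names of an HTTP request are the strings to be tested. The function name and the sample request (including the host `example.com`) are hypothetical and do not come from the source.

```python
def extract_header_names(request: str) -> list:
    """Return the header field names of an HTTP request, skipping the request line."""
    names = []
    for line in request.split("\r\n")[1:]:
        if not line:          # a blank line ends the header section
            break
        name, sep, _value = line.partition(":")
        if sep:               # keep only well-formed "Name: value" lines
            names.append(name.strip())
    return names


sample = (
    "GET / HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept-Language: en\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
print(extract_header_names(sample))
```

Each extracted name would then be hashed and looked up in the hash matching table as one tested string.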
  • The tested hash results obtained according to the set hash algorithms are shown in Table 4 below:
  • Step 204: The content matching device uses the first tested hash result of the tested string as the hash table entry index and searches the hash matching table for the corresponding hash table entry.
  • Step 205: If the corresponding hash table entry is found, the content matching device matches the other tested hash results of the tested string against the content of that hash table entry. If no corresponding hash table entry is found, the match fails and no subsequent steps are required.
  • Host: 0x20, entry is empty, does not match
  • Connection: 0x59, entry is empty, does not match
  • Step 206: When the other tested hash results of the tested string are consistent with all of the content of the found hash table entry, a match-success result is obtained.
  • The tested strings "Accept-Language" and "ABCDEFG123456789" match the target strings.
  • The ID of a successfully matched string can be output according to the matching result, so that subsequent operations such as URL filtering can be performed. It is also possible to determine whether further messages are being input and, if so, to repeat the above matching process.
  • The technical solution of this embodiment describes the operation of each step in detail. Because the matching operations on the entries are independent of each other, the hash calculation and matching of the tested strings can be performed in parallel and pipelined to achieve high-rate matching, which significantly improves the matching speed.
  • The hash result of a single hash algorithm may be selected as the entry index, or the hash results of several hash algorithms may be combined and used as the entry index.
  • Combining multiple hash results in effect constitutes a combined hash algorithm. The hash algorithm in the embodiment of the present invention can therefore be not only a simple hash calculation but also a combination of several simple hash calculations, which characterizes the string more distinctly and improves the matching precision.
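One way such a combined index could look is sketched below. The two 8-bit hashes (byte-wise XOR and byte sum) and the concatenation into a 16-bit index are assumptions chosen for illustration; the source does not prescribe how hash results are combined.

```python
def xor_bytes(data: bytes) -> int:
    """Hypothetical simple hash 1: XOR of all bytes (8 bits)."""
    result = 0
    for b in data:
        result ^= b
    return result


def sum_bytes(data: bytes) -> int:
    """Hypothetical simple hash 2: sum of all bytes, truncated to 8 bits."""
    return sum(data) & 0xFF


def combined_index(data: bytes) -> int:
    """Combined hash: concatenate both 8-bit results into one 16-bit entry index."""
    return (xor_bytes(data) << 8) | sum_bytes(data)


a, b = b"Connection", b"abcdefg-xyz"
print(xor_bytes(a) == xor_bytes(b))            # the XOR hash alone cannot tell them apart
print(combined_index(a) == combined_index(b))  # the combined index can
```

In this sketch the two example strings share the same XOR value, so a table indexed by XOR alone would place them in the same entry, while the combined index separates them at the cost of a larger index space.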
  • FIG. 3 is a flowchart of a content matching method according to Embodiment 3 of the present invention.
  • On the basis of the foregoing embodiments, this embodiment may further include an upgrade operation of adding a target string, which includes the following steps:
  • Step 310: According to the target string to be added that is carried in a received target string addition request, the content matching device performs a hash operation on the target string to be added based on the at least one set hash algorithm, to obtain the target hash result of the target string to be added under each hash algorithm.
  • Other restrictions on the adding operation may also be set, for example first determining whether the number of entries in the hash matching table has reached an upper limit, and thereby deciding whether a new target string may be added.
  • Step 320: The content matching device uses the first target hash result of the target string to be added as the hash table entry index and reads the corresponding hash table entry from the hash matching table as the current hash table entry.
  • Step 330: The content matching device determines whether the content of the current hash table entry is empty; if yes, step 340 is performed; if no, step 350 is performed.
  • Step 340: When the content of the current hash table entry is empty, the other target hash results of the target string to be added are added to the current hash table entry as its content.
  • Step 350: When the content of the current hash table entry is not empty, the other target hash results of the target string to be added are added to the hash matching table in cascading mode, as the content of a next-level entry of the current hash table entry.
  • The process of adding a target string in this embodiment effectively avoids conflicts caused by identical hash table entry indexes. If target strings have the same hash table entry index, a further hash table entry can be set up in cascading mode. When a tested string is matched, the entries stored in cascaded form are matched in order, which ensures the accuracy of the matching result.
  • The operation of adding the other target hash results of the target string to be added to the hash matching table in cascading mode, as the content of a next-level entry of the current hash table entry, may include:
  • Step 351: Compare whether the other target hash results of the target string to be added are consistent with the content of the current hash table entry; if yes, go to step 352; if no, go to step 353.
  • Step 352: If they are consistent, discard the target string to be added and end the adding operation.
  • Step 353: If they are inconsistent, read the index of the next-level offset entry of the current hash table entry, read the next-level hash table entry according to that offset entry index, and use the next-level hash table entry as the updated current hash table entry.
  • Step 354: Determine whether the content of the updated current hash table entry is empty; if yes, go to step 355; if no, go to step 356.
  • Step 355: When the content of the updated current hash table entry is empty, add the other target hash results of the target string to be added as the content of the current hash table entry.
  • Step 356: When the content of the updated current hash table entry is not empty, return to the comparison operation of step 351.
  • The cascaded entries may have multiple levels, and each level records the index of the offset entry of the next level, so that when matching against one level fails, matching continues at the next level until the match succeeds or there is no next-level offset entry. That is, in the matching process of the tested string, the operation performed when the corresponding hash table entry is found, namely matching the other tested hash results of the tested string against the content of the found hash table entry, further includes: when the other tested hash results are inconsistent with the content of the found hash table entry, searching for the next-level hash table entry in the order given by the offset entry indexes, and returning to the operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
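The add and match operations on cascaded entries can be sketched as follows. This is an illustration under assumptions: a Python list stands in for the chain of cascaded entries (its position plays the role of the next-level offset entry index), the hash algorithms are arbitrary, and all names are hypothetical.

```python
import zlib


def xor_bytes(data: bytes) -> int:
    """Hypothetical 8-bit hash used as the entry index."""
    result = 0
    for b in data:
        result ^= b
    return result


def hash_results(s: str):
    """Return (entry index, entry content) for one string."""
    data = s.encode("utf-8")
    return xor_bytes(data), (zlib.crc32(data), len(data))


def add_target(table: dict, target: str) -> bool:
    """Steps 310 to 356; returns False when the target is discarded as a duplicate."""
    index, content = hash_results(target)
    chain = table.setdefault(index, [])  # an empty chain models an empty entry
    for level in chain:                  # walk the cascaded levels in order
        if level == content:             # steps 351 and 352: already present
            return False
    chain.append(content)                # steps 340 and 355: fill the first empty level
    return True


def is_match(table: dict, tested: str) -> bool:
    """Try each cascaded level until one matches or the chain ends."""
    index, content = hash_results(tested)
    return any(level == content for level in table.get(index, []))


table = {}
add_target(table, "abcdefg-xyz")
add_target(table, "Connection")  # same XOR index as the first target, so it is cascaded
print(len(table[xor_bytes(b"Connection")]))
print(is_match(table, "Connection"), is_match(table, "Host"))
```

Adding a string touches only the chain for its own index, which mirrors the point that the rest of the hash matching table is left unchanged.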
  • Adding an entry does not affect other entries, and a full update of the hash matching table is unnecessary, so maintenance and upgrade are easy to implement.
  • The hash algorithms may include at least an original-string hash algorithm, which takes the original string itself as the hash result, and the original string is used as the content of a next-level hash table entry.
  • Hash algorithm 1: the string XORed byte by byte
  • The hash results of the target strings are shown in Table 6 below:
  • FIG. 4 is a flowchart of a content matching method according to Embodiment 4 of the present invention. On the basis of any of the foregoing embodiments, this embodiment may further add an operation of modifying or deleting a target string. The modification and deletion operations are basically similar and include the following steps:
  • Step 410: According to the target string to be modified or deleted that is carried in a received target string modification request or deletion request, the content matching device performs a hash operation on that target string based on the at least one set hash algorithm, to obtain the target hash result of the target string to be modified or deleted under each hash algorithm.
  • Step 420: The content matching device uses the first target hash result of the target string to be modified or deleted as the hash table entry index and reads the corresponding hash table entry from the hash matching table as the current hash table entry.
  • Step 430: The content matching device modifies or deletes the content of the current hash table entry.
  • The modification operation modifies the content of the corresponding hash table entry.
  • The deletion operation deletes the corresponding cascaded entry or deletes the entire entry.
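Steps 410 to 430 can be sketched in the same style. The list-based chain of cascaded entries, the hash algorithms, and the function names are assumptions made for illustration; modification is modelled here simply as removing the old content and adding the new one.

```python
import zlib


def hash_results(s: str):
    """Return (entry index, entry content); both hash choices are hypothetical."""
    data = s.encode("utf-8")
    index = 0
    for b in data:
        index ^= b
    return index, (zlib.crc32(data), len(data))


def add_target(table: dict, target: str) -> None:
    index, content = hash_results(target)
    chain = table.setdefault(index, [])
    if content not in chain:
        chain.append(content)


def delete_target(table: dict, target: str) -> bool:
    """Steps 410 to 430 for deletion; returns False if the target is not in the table."""
    index, content = hash_results(target)  # step 410: hash the target string
    chain = table.get(index)               # step 420: read the current entry
    if not chain or content not in chain:
        return False
    chain.remove(content)                  # step 430: delete the cascaded entry
    if not chain:
        del table[index]                   # nothing left: delete the entire entry
    return True


def modify_target(table: dict, old: str, new: str) -> bool:
    """Replace one target string with another; only the affected entries change."""
    if not delete_target(table, old):
        return False
    add_target(table, new)
    return True


table = {}
add_target(table, "abcdefg-xyz")
add_target(table, "Connection")
print(delete_target(table, "Connection"), delete_target(table, "Connection"))
```

As with addition, neither operation requires rebuilding the other entries of the table.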
  • FIG. 5 is a schematic structural diagram of a content matching apparatus according to Embodiment 5 of the present invention.
  • The content matching apparatus includes: a first hash operation module 510, a hash table forming module 520, a second hash operation module 530, and a hash table matching module 540.
  • The first hash operation module 510 is configured to perform a hash operation on at least one target string according to at least one set hash algorithm, to obtain the target hash result of each target string under each hash algorithm.
  • The hash table forming module 520 is configured to form a hash table entry for each target string from the target hash results of that target string, and to combine the hash table entries of all target strings into a hash matching table.
  • The second hash operation module 530 is configured to perform a hash operation on the tested string according to the at least one hash algorithm, to obtain the tested hash result of the tested string under each hash algorithm.
  • The hash table matching module 540 is configured to match each tested hash result of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
  • The number of hash algorithms is preferably at least two, and the hash table forming module 520 is specifically configured to use the first target hash result of each target string as the hash table entry index and the other target hash results as the hash table entry content, and to combine the hash table entries of all target strings into a hash matching table. The hash table matching module 540 may then include: an index matching unit 541, a content matching unit 542, and a result acquisition unit 543.
  • The index matching unit 541 is configured to use the first tested hash result of the tested string as the hash table entry index and to search the hash matching table for the corresponding hash table entry. The content matching unit 542 is configured to match the other tested hash results of the tested string against the content of the found hash table entry if the corresponding hash table entry is found. The result acquisition unit 543 is configured to obtain a match-success result when the other tested hash results are consistent with the content of the found hash table entry.
  • The content matching device provided by this embodiment matches strings by matching hash results and can match the entries in parallel, which improves the matching speed. Updates to the entries of the hash matching table do not affect each other, so target strings are easy to add and modify.
  • FIG. 6 is a schematic structural diagram of a content matching apparatus according to Embodiment 6 of the present invention.
  • The content matching apparatus may further include: a third hash operation module 610, an entry reading module 620, a content adding module 630, and a cascading adding module 640.
  • The third hash operation module 610 is configured to perform, according to the target string to be added that is carried in a received target string addition request, a hash operation on the target string to be added based on the at least one set hash algorithm, to obtain the target hash result of the target string to be added under each hash algorithm. The entry reading module 620 is configured to use the first target hash result of the target string to be added as the hash table entry index and to read the corresponding hash table entry from the hash matching table as the current hash table entry.
  • The content adding module 630 is configured to add, when the content of the current hash table entry is empty, the other target hash results of the target string to be added to the current hash table entry as its content.
  • The cascading adding module 640 is configured to add, when the content of the current hash table entry is not empty, the other target hash results of the target string to be added to the hash matching table in cascading mode, as the content of a next-level entry of the current hash table entry.
  • The cascading adding module 640 preferably includes: a comparing unit 641, a discarding unit 642, an offset index reading unit 643, a content adding unit 644, and a content judging unit 645.
  • The comparing unit 641 is configured to compare whether the other target hash results of the target string to be added are consistent with the content of the current hash table entry. The discarding unit 642 is configured to discard the target string to be added if they are consistent.
  • The offset index reading unit 643 is configured to, if they are inconsistent, read the index of the next-level offset entry of the current hash table entry, read the next-level hash table entry according to that offset entry index, and use the next-level hash table entry as the updated current hash table entry.
  • The content adding unit 644 is configured to add, when the content of the updated current hash table entry is determined to be empty, the other target hash results of the target string to be added as the content of the current hash table entry; the content judging unit 645 is configured to
  • The content matching device may further include an offset entry search unit, configured to search, when the other tested hash results are inconsistent with the content of the found hash table entry, for the next-level hash table entry in the order given by the offset entry indexes, and to trigger the content matching unit to match the other tested hash results of the tested string against the content of the found hash table entry.
  • The content matching device provided in this embodiment can conveniently add a target string without recompiling the entire hash matching table, so upgrade and maintenance are easy.
  • By setting up cascaded entries, conflicts between the hash table entries of target strings can be effectively resolved, and the matching precision is improved.
  • FIG. 7 is a schematic structural diagram of a content matching apparatus according to Embodiment 7 of the present invention.
  • The content matching apparatus may further include: a fourth hash operation module 710, an index matching module 720, and a modification and deletion module 730.
  • The fourth hash operation module 710 is configured to perform, according to the target string to be modified or deleted that is carried in a received target string modification request or deletion request, a hash operation on that target string based on the at least one set hash algorithm, to obtain the target hash result of the target string to be modified or deleted under each hash algorithm. The index matching module 720 is configured to use the first target hash result of the target string to be modified or deleted as the hash table entry index and to read the corresponding hash table entry from the hash matching table as the current hash table entry. The modification and deletion module 730 is configured to modify or delete the content of the current hash table entry.
  • The content matching device provided in this embodiment can easily modify and delete target strings without recompiling the entire hash matching table, so upgrade and maintenance are easy.
  • The content matching device provided by the embodiments of the present invention can perform the content matching method provided by any embodiment of the present invention and has the corresponding functional modules.
  • The content matching method and device can improve the matching speed, reduce resource occupation, and are easy to upgrade and maintain.
  • When the program is executed, the steps of the foregoing method embodiments are performed; the foregoing storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a method and an apparatus for content matching. The method comprises: performing a hash operation on at least one target string based on at least one set hash algorithm, to obtain each target hash result; forming a hash table item for each target string according to the target hash results of that target string, and combining the hash table items of all target strings to form a hash matching table; performing a hash operation on to-be-tested strings according to the at least one hash algorithm, to obtain to-be-tested hash results; and matching against each hash table item in the hash matching table according to each to-be-tested hash result of the to-be-tested strings, to obtain matching results. According to the present invention, the system resources occupied by matching are reduced; the string extraction process and the string hash matching process are executed in parallel, so matching speed can be improved; and when target strings are added or removed, the hash matching table does not need to be recompiled, which makes upgrade and maintenance easy.

Description

Content matching method and apparatus
Technical field
Embodiments of the present invention relate to data processing technologies, and in particular to a content matching method and apparatus.
Background
As networks become more finely managed, many network users and equipment vendors are paying increasing attention to packet content at layer 7 and above, for packet filtering, content-based charging, traffic inspection, search engines, and so on. Such inspection is also increasingly used in national defense, public security, security, network service management, commercial advertising, and other fields. Deep Packet Inspection (DPI) technology emerged to meet this need; it identifies the content of each field in a packet based on protocol specifications. Protocol identification/parsing is one of the key DPI technologies, and string/feature-word matching is an important part of protocol identification/parsing, so matching speed directly affects product performance.
A typical prior-art content matching technique for strings or feature words performs the following operations:
a) divide the target string into at least one first string;
b) generate a group of second strings by combination, for example by further taking substrings of the first strings as second strings;
c) extract third strings from the second strings, for example by selecting commonly used strings according to a blacklist or whitelist, and compile each third string using an algorithm such as a state machine or rule tree;
d) using a sliding window, and for different starting positions, check whether the string under test matches the third string at the first string node;
e) if the match succeeds but a next string node exists, proceed to the next matching round;
f) if the match succeeds and there is no next string node, the string under test matches the target string;
g) if the match fails, the string under test does not match the target string.
Existing content matching methods have at least the following defects: 1) when the target string is long, the number of matching node branches and the matching time multiply, and performance drops sharply; 2) performance can only be improved by using multiple matching engines, which consumes excessive resources; 3) when a target string is added, the rule tree must be recompiled, which hinders hot upgrades and can only be worked around by switching between backup table entries.
Summary of the invention
Embodiments of the present invention provide a content matching method and apparatus to increase matching speed during string content matching, reduce resource usage, and ease upgrade and maintenance.
An embodiment of the present invention provides a content matching method, including:
performing a hash operation on each of at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
forming a hash table entry for each target string from that target string's target hash results, and combining the hash table entries of the target strings into a hash matching table;
performing a hash operation on a string under test according to the at least one hash algorithm, to obtain the test hash result of the string under test corresponding to each hash algorithm; and
matching the test hash results of the string under test against the hash table entries of the hash matching table, to obtain a matching result.
An embodiment of the present invention further provides a content matching apparatus, including:
a first hash operation module, configured to perform a hash operation on each of at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm;
a hash table forming module, configured to form a hash table entry for each target string from that target string's target hash results, and to combine the hash table entries of the target strings into a hash matching table;
a second hash operation module, configured to perform a hash operation on a string under test according to the at least one hash algorithm, to obtain the test hash result of the string under test corresponding to each hash algorithm; and
a hash table matching module, configured to match the test hash results of the string under test against the hash table entries of the hash matching table, to obtain a matching result.
The content matching method and apparatus provided by the embodiments of the present invention reduce the system resources occupied by matching and need no additional switchover or backup resources. The string extraction process and the string hash matching process execute in parallel, which can greatly shorten matching time and increase matching speed. Matching on hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table need not be recompiled; only the corresponding hash table entries are modified, and the hash algorithms and their number can also be updated at any time, so upgrade and maintenance are easy.
Brief description of the drawings
FIG. 1 is a flowchart of the content matching method according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of the content matching method according to Embodiment 2 of the present invention;
FIG. 3 is a flowchart of the content matching method according to Embodiment 3 of the present invention;
FIG. 4 is a flowchart of the content matching method according to Embodiment 4 of the present invention;
FIG. 5 is a schematic structural diagram of the content matching apparatus according to Embodiment 5 of the present invention;
FIG. 6 is a schematic structural diagram of the content matching apparatus according to Embodiment 6 of the present invention;
FIG. 7 is a schematic structural diagram of the content matching apparatus according to Embodiment 7 of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment 1
FIG. 1 is a flowchart of the content matching method according to Embodiment 1 of the present invention. The method can be applied in various scenarios, typically URL filtering, packet filtering, and the like, and is performed by a content matching apparatus hosted in a server in software and/or hardware form, for example on a Gateway GPRS Support Node (GGSN). The method includes the following steps:
Step 110: The content matching apparatus performs a hash operation on each of at least one target string based on at least one set hash algorithm, to obtain, for each target string, the target hash result corresponding to each hash algorithm.
Step 120: The content matching apparatus forms a hash table entry for each target string from that target string's target hash results, and combines the hash table entries of the target strings into a hash matching table.
Step 130: The content matching apparatus performs a hash operation on the string under test according to the at least one hash algorithm, to obtain the test hash result of the string under test corresponding to each hash algorithm.
Step 140: The content matching apparatus matches the test hash results of the string under test against the hash table entries of the hash matching table, to obtain a matching result.
The technical solution of this embodiment comprises steps 110 and 120, which compile the target strings, and steps 130 and 140, which match the string under test against the compiled hash matching table. A target string is a string that serves as the matching reference in content matching and can be preset by the user. A string under test is a string that needs to be matched and filtered, for example a field or URL in a packet to be filtered. For example, in a web page filtering application, the user can set keywords that express the filtering goal as target strings, such as strings indicating violent or pornographic content to be filtered; the target strings are precompiled for the subsequent matching operations. A URL that the user then opens is treated as the string under test and is first matched against the precompiled target strings. If it matches, the web page can be filtered out; otherwise, the page at that URL is opened normally.
In this embodiment, target strings are converted into target hash results by the hash algorithms, the same hash algorithms are applied to the string under test to obtain its test hash results, and comparing the hash results determines whether the string under test matches a target string.
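As an illustration only, steps 110 to 140 might be sketched in Python as follows. The two hash functions (`xor_bytes` and `length_hash`) and all other names are assumptions chosen for the example; the embodiment leaves the choice of hash algorithms open.

```python
# Hypothetical stand-in hash algorithms (assumptions, not taken from Table 1).
def xor_bytes(data: bytes) -> int:
    """XOR of all bytes of the string; the result is one byte wide."""
    result = 0
    for b in data:
        result ^= b
    return result


def length_hash(data: bytes) -> int:
    """String length, truncated to one byte."""
    return len(data) & 0xFF


HASHES = (xor_bytes, length_hash)


def hash_results(text: str) -> tuple:
    """Apply every set hash algorithm to one string."""
    data = text.encode("utf-8")
    return tuple(h(data) for h in HASHES)


def build_table(targets) -> set:
    """Steps 110 and 120: hash each target string and collect the entries."""
    return {hash_results(t) for t in targets}


def matches(text: str, table: set) -> bool:
    """Steps 130 and 140: hash the string under test and look it up."""
    return hash_results(text) in table


table = build_table(["ABCDEFG123456789", "abcdefg-xyz", "Accept-Language"])
print(matches("Accept-Language", table))  # a target string
print(matches("Host", table))             # not a target string
```

Because only hash results are compared, two different strings with identical results under every algorithm would be reported as a match; the choice and number of hash algorithms controls how likely that is.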
The technical solution of this embodiment reduces the system resources occupied by matching and needs no additional switchover or backup resources. In addition, the string extraction process and the string hash matching process execute in parallel, so even when a packet contains many strings under test, for example more than 20, matching time can be greatly shortened; the work can be pipelined, which increases matching speed. Matching on hash results is not affected by the length of the target string, so matching efficiency is high. When target strings are added or removed, the hash matching table need not be recompiled; only the corresponding hash table entries are modified, and the hash algorithms and their number can also be updated at any time, so upgrade and maintenance are easy.
In the technical solution of this embodiment, the accuracy of the matching result depends on the specific hash algorithms used and on their number. A well-chosen hash algorithm captures the characteristics of the target string as fully as possible, so that identical strings have identical hash results. Increasing the number of hash algorithms likewise lowers the probability that different strings have exactly the same hash results, which reduces the false match rate. The specific hash algorithms and their number can be set according to the actual application scenario, such as the number of target strings. The technical solution of the embodiments of the present invention applies to many situations; it is not limited to string matching and also applies to matching an arbitrary sequence of data.
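As a rough illustration of why more hash algorithms reduce false matches, assume (an idealization that the description itself does not state) that the $k$ hash algorithms behave like independent, uniformly distributed functions, with algorithm $i$ producing $b_i$ bits. For a string under test $s$ that differs from a target string $t$, and for a table of $N$ target strings:

```latex
\Pr\bigl[h_1(s)=h_1(t),\ \dots,\ h_k(s)=h_k(t)\bigr]
  \;=\; \prod_{i=1}^{k} 2^{-b_i}
  \;=\; 2^{-\sum_{i=1}^{k} b_i},
\qquad
\Pr[\text{false match against any of the } N \text{ targets}]
  \;\le\; N \cdot 2^{-\sum_{i=1}^{k} b_i}.
```

Under this assumption, each added algorithm multiplies the bound by $2^{-b_i}$. Simple hash computations are generally not uniform or independent, so the expression is a guide to the trend rather than a guarantee.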
Embodiment 2
On the basis of the technical solution of the foregoing embodiment, the number of hash algorithms is preferably at least two. Forming the hash table entry of a target string from its target hash results may then specifically include: using the first target hash result of each target string as the hash table entry index, and the other target hash results as the hash table entry content. Here the first hash result is chosen as the entry index, but in practice the choice is not tied to the order of the hash results: the order of the hash algorithms can be set arbitrarily, so any algorithm's result can be made the first target hash result and thus the entry index.
Step 140, matching the test hash results of the string under test against the hash table entries of the hash matching table to obtain a matching result, may then be performed as follows:
Step 141: The content matching apparatus uses the first test hash result of the string under test as the hash table entry index and looks up the corresponding hash table entry in the hash matching table.
Step 142: If a corresponding hash table entry is found, the content matching apparatus matches the other test hash results of the string under test against the content of the found entry.
Step 143: When the other test hash results all match the content of the found hash table entry, a successful match result is obtained.
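Steps 141 to 143 can be sketched as a lookup by entry index followed by a comparison of the entry content. This is a minimal sketch: the four hash functions below are hypothetical stand-ins, and an index collision between two target strings simply overwrites the earlier entry here.

```python
# Hypothetical stand-in hash algorithms (assumptions for illustration).
def xor_bytes(data: bytes) -> int:
    result = 0
    for b in data:
        result ^= b
    return result


def first_byte(data: bytes) -> int:
    return data[0] if data else 0


def last_byte(data: bytes) -> int:
    return data[-1] if data else 0


def length_hash(data: bytes) -> int:
    return len(data) & 0xFF


# The first algorithm supplies the entry index; the rest supply the content.
HASHES = (xor_bytes, first_byte, last_byte, length_hash)


def hash_results(text: str) -> tuple:
    data = text.encode("utf-8")
    return tuple(h(data) for h in HASHES)


def build_table(targets) -> dict:
    """Map entry index -> entry content (the remaining hash results)."""
    table = {}
    for target in targets:
        index, *content = hash_results(target)
        table[index] = tuple(content)  # a colliding index overwrites here
    return table


def match(text: str, table: dict) -> bool:
    index, *content = hash_results(text)
    entry = table.get(index)           # step 141: look up by entry index
    if entry is None:
        return False                   # no entry: match fails immediately
    return entry == tuple(content)     # steps 142-143: compare the content


table = build_table(["ABCDEFG123456789", "abcdefg-xyz", "Accept-Language"])
print(match("ABCDEFG123456789", table), match("Accept", table))
```

Most non-matching strings are rejected at the index lookup alone, so the content comparison runs only for the few strings whose index hits an occupied entry.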
FIG. 2 is a flowchart of the content matching method according to Embodiment 2 of the present invention. This embodiment describes each step in detail by way of an example.
First, five hash algorithms are defined, chosen to reflect string characteristics as far as possible, as shown in Table 1. Practice is not limited to these: any one or any combination of the following hash algorithms may be used, other hash algorithms may be added, and when the target strings are short or few, the original string itself may also be used as a hash algorithm:
Table 1
Figure imgf000007_0001
Step 201: The content matching apparatus performs hash operations on the target strings based on the five set hash algorithms, to obtain the target hash result of each target string for each hash algorithm.
Assume there are three target strings:
Target string 1 = "ABCDEFG123456789"
Target string 2 = "abcdefg-xyz"
Target string 3 = "Accept-Language"
The ASCII code sequences of the target strings are then:
ASCII code sequence of target string 1 = "41424344454647313233343536373839"
ASCII code sequence of target string 2 = "616263646566672D78797A"
ASCII code sequence of target string 3 = "4163636570742D4C616E6775616765"
The target hash results are shown in Table 2:
Table 2
Figure imgf000008_0001
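The ASCII code sequences listed in step 201 can be reproduced with a short helper. The function name `ascii_code_sequence` is introduced here for illustration only.

```python
def ascii_code_sequence(text: str) -> str:
    """Return the hexadecimal ASCII codes of a string, concatenated."""
    return text.encode("ascii").hex().upper()


for target in ("ABCDEFG123456789", "abcdefg-xyz", "Accept-Language"):
    print(target, "=", ascii_code_sequence(target))
```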
Step 202: The content matching apparatus forms the hash table entry of each target string from its target hash results, and the hash table entries of the target strings are combined into the hash matching table.
The target hash results of the target strings form the hash matching table shown in Table 3, in which the first target hash result serves as the hash table entry index (tab_index) and the other target hash results serve as the hash table entry content (tab_content). That is, target hash result 1 is used as the entry index and target hash results 2 to 5 as the entry content to generate the hash matching table. The format of the entry content (tab_content) can be written as {stringID, hash result 2, hash result 3, hash result 4, hash result 5}, as shown in Table 3.
Table 3
Figure imgf000008_0002

| tab_index | stringID | hash result 2 | hash result 3 | hash result 4 | hash result 5 | tab_content |
| --- | --- | --- | --- | --- | --- | --- |
| 0x71 | 1 | 0x41 | 0x39 | 0x10 | 0x0879 | 0x01_41_39_10_0879 |

Step 203: The content matching apparatus performs hash operations on the strings under test according to the at least one hash algorithm, to obtain the test hash result of each string under test for each hash algorithm.
Strings under test can be obtained in many ways, depending on the specific application scenario of the content matching method. Typically they are the strings to be matched in network packets. For example, the following Hypertext Transfer Protocol (HTTP) request packet is received from the network:
GET /product/ggsn/index.htm HTTP/1.1\r\n
Accept: */*\r\n
Referer: http://www.huawei.com/\r\n
Accept-Language: zh-cn\r\n
Accept-Encoding: gzip, deflate\r\n
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1;.NET CLR 2.0.50727)\r\n
Host: www.huawei.com\r\n
ABCDEFG123456789: xxxxxxxxxxx\r\n
Connection: Keep-Alive\r\n
\r\n
According to the HTTP parsing rules, the strings between the "\r\n" feature characters and the ":" feature character are extracted as follows:
Extracted string under test 1 = "Accept"
Extracted string under test 2 = "Referer"
Extracted string under test 3 = "Accept-Language"
Extracted string under test 4 = "Accept-Encoding"
Extracted string under test 5 = "User-Agent"
Extracted string under test 6 = "Host"
Extracted string under test 7 = "ABCDEFG123456789"
Extracted string under test 8 = "Connection"
Many other string extraction strategies are possible; the embodiments of the present invention are not limited to this one. The test hash results computed with the set hash algorithms are shown in Table 4:
Table 4
Figure imgf000010_0001
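The extraction rule used in step 203 (take the text between a "\r\n" and the next ":") can be sketched as below. The function name `extract_header_names` and the choice to skip the request line are assumptions of this sketch; the description allows other extraction strategies.

```python
REQUEST = (
    "GET /product/ggsn/index.htm HTTP/1.1\r\n"
    "Accept: */*\r\n"
    "Referer: http://www.huawei.com/\r\n"
    "Accept-Language: zh-cn\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1;.NET CLR 2.0.50727)\r\n"
    "Host: www.huawei.com\r\n"
    "ABCDEFG123456789: xxxxxxxxxxx\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)


def extract_header_names(raw_request: str) -> list:
    """Return the strings found between each "\\r\\n" and the first ":" after it."""
    names = []
    lines = raw_request.split("\r\n")
    for line in lines[1:]:          # lines[0] is the request line, not a header
        if not line:
            break                   # an empty line ends the header section
        name, separator, _value = line.partition(":")
        if separator:               # keep only lines that contain a ":"
            names.append(name)
    return names


print(extract_header_names(REQUEST))
```

Splitting at the first ":" only is what keeps "Referer" intact even though its value, a URL, contains further colons.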
Step 204: The content matching apparatus uses the first test hash result of each string under test as the hash table entry index and looks up the corresponding hash table entry in the hash matching table.
Step 205: If a corresponding hash table entry is found, the content matching apparatus matches the other test hash results of the string under test against the content of the found entry. If no corresponding entry is found, the subsequent steps are skipped and the match fails.
Take "Accept-Language" as an example: its hash table entry index is 0x3F, and a corresponding entry is found in the hash matching table of Table 3. A corresponding entry is likewise found for "ABCDEFG123456789". No entry is found for the other strings under test, so they are treated directly as match failures. The other test hash results of "Accept-Language" and "ABCDEFG123456789" are then matched against the content of the entries found. The matching results for the strings under test are shown in Table 5:

| String to be matched | Entry index | Matching result |
| --- | --- | --- |
| Accept | 0x20 | Entry empty, no match |
| Referer | 0x51 | Entry empty, no match |
| Accept-Language | 0x3F | 0x03_41_65_0F_7D42, exact match |
| Accept-Encoding | 0x3D | Entry empty, no match |
| User-Agent | 0x45 | Entry empty, no match |
| Host | 0x20 | Entry empty, no match |
| ABCDEFG123456789 | 0x71 | 0x01_41_39_10_0879, exact match |
| Connection | 0x59 | Entry empty, no match |

Step 206: When the other test hash results of a string under test all match the content of the found hash table entry, a successful match result is obtained.
That is, the strings under test "Accept-Language" and "ABCDEFG123456789" match target strings. The string IDs of the successful matches can be output so that follow-up operations, such as URL filtering, can be performed based on the matching result. It can also be checked whether further packets are arriving; if so, the matching flow above is repeated.
The technical solution of this embodiment describes the operation of each step in detail. Because the matching operations on the table entries are independent of one another, hash computation and matching for the strings under test can run in parallel and be pipelined for high-rate matching, which significantly increases matching speed.
In the technical solutions of the foregoing embodiments, the hash result of a single hash algorithm may be chosen as the entry index, or several hash results from several hash algorithms may be combined to form the entry index. Combining several hash results is in effect a combined hash algorithm. The hash algorithms in the embodiments of the present invention can therefore be simple hash computations or combinations of several simple hash computations, which characterize a string more distinctively and improve matching accuracy.
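One possible way to combine several hash results into a single entry index is to concatenate them bitwise. The sketch below assumes two 8-bit results (a byte-wise XOR and the string length) packed into a 16-bit index; both the functions and the bit layout are illustrative choices, not taken from the description.

```python
def xor_bytes(data: bytes) -> int:
    """XOR of all bytes; an 8-bit result."""
    result = 0
    for b in data:
        result ^= b
    return result


def length_hash(data: bytes) -> int:
    """String length truncated to 8 bits."""
    return len(data) & 0xFF


def combined_index(data: bytes) -> int:
    """Concatenate two 8-bit hash results into one 16-bit entry index."""
    return (xor_bytes(data) << 8) | length_hash(data)


# "Accept" and "Host" share the XOR result 0x20 but differ in length,
# so the combined index separates them.
print(hex(combined_index(b"Accept")), hex(combined_index(b"Host")))
```

The trade-off is table size: a 16-bit index needs 65536 primary slots instead of 256, in exchange for fewer index collisions.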
Embodiment 3
FIG. 3 is a flowchart of the content matching method according to Embodiment 3 of the present invention. Building on the foregoing embodiments, this embodiment further includes an upgrade operation that adds a target string. In addition to the foregoing flow, it includes the following steps:
Step 310: Upon receiving a target string addition request, the content matching apparatus performs hash operations on the target string to be added, based on the at least one set hash algorithm, to obtain its target hash result for each hash algorithm. Other restrictions on the add operation may also be applied in this step, for example first checking whether the number of entries in the hash matching table has reached its upper limit, to decide whether adding a new target string is allowed.
Step 320: The content matching apparatus uses the first target hash result of the target string to be added as the hash table entry index and reads the corresponding hash table entry from the hash matching table as the current hash table entry.
Step 330: The content matching apparatus determines whether the content of the current hash table entry is empty; if yes, step 340 is performed; if no, step 350 is performed.
Step 340: When the content of the current hash table entry is empty, the other target hash results of the target string to be added are written into the current hash table entry as its content.
Step 350: When the content of the current hash table entry is not empty, the other target hash results of the target string to be added are added to the hash matching table, in cascaded fashion, as the content of a next-level entry of the current hash table entry.
The process of adding a target string in this embodiment effectively avoids conflicts caused by strings having the same hash table entry index. If target strings share the same entry index, a further hash table entry can be set up in cascaded fashion. When a string under test is matched, entries stored in cascaded form can be matched in sequence to ensure the accuracy of the matching result.
Adding the other target hash results of the target string to be added to the hash matching table, in cascaded fashion, as the content of a next-level entry of the current hash table entry may specifically include:
Step 351: Compare whether the other target hash results of the target string to be added are identical to the content of the current hash table entry; if yes, perform step 352; if no, perform step 353.
Step 352: If they are identical, discard the target string to be added and end the add flow.
Step 353: If they are not identical, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to that offset entry index, and take the next-level hash table entry as the updated current hash table entry.
Step 354: Determine whether the content of the updated current hash table entry is empty; if yes, perform step 355; if no, perform step 356.
Step 355: When the content of the updated current hash table entry is empty, add the other target hash results of the target string to be added as the content of the current hash table entry.
Step 356: When the content of the updated current hash table entry is not empty, return to the comparison of step 351.
Steps 354 to 356 above are equivalent to returning to step 330.
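The add flow of steps 320 to 356 can be sketched as a walk along a chain of entries linked by offset indexes. The `Entry` and `HashMatchTable` names, the 256-slot primary table, and the policy of appending a fresh next-level entry when none exists yet are assumptions of this sketch; the description does not say how next-level entries are allocated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    content: Optional[tuple] = None    # None means the entry is empty
    next_offset: Optional[int] = None  # index of the next-level entry, if any


class HashMatchTable:
    def __init__(self, primary_slots: int = 256):
        # Primary entries are addressed directly by the entry index;
        # next-level (cascaded) entries are appended after them.
        self.entries = [Entry() for _ in range(primary_slots)]

    def add(self, index: int, content: tuple) -> bool:
        """Add one target string's entry; return False if it is discarded."""
        current = self.entries[index]                    # step 320
        while current.content is not None:               # steps 330 / 354
            if current.content == content:               # step 351
                return False                             # step 352: discard
            if current.next_offset is None:
                # Assumed allocation policy: append a new empty entry.
                self.entries.append(Entry())
                current.next_offset = len(self.entries) - 1
            current = self.entries[current.next_offset]  # step 353
        current.content = content                        # steps 340 / 355
        return True


table = HashMatchTable()
print(table.add(0x20, ("Accept",)))  # empty primary entry: stored directly
print(table.add(0x20, ("Host",)))    # same index: stored one level down
print(table.add(0x20, ("Accept",)))  # identical content: discarded
```

Only the entries on one chain are touched by an addition, which is why the rest of the table needs no rebuild.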
In the above technical solution, if the entry content at some level is not empty and the comparison finds it identical, a conflict has occurred in which different target strings have the same hash table entry, and such a target string can be discarded directly. This costs some matching precision, but such conflicts can be minimized by choosing the hash algorithms and their number appropriately, or by prompting the operator to avoid configuring such target strings.
In the above operation, the cascaded entries may have multiple levels. Each level records the offset entry index of the next level, so that when matching fails at one level, matching moves on to the next level and continues until it succeeds or there is no further next-level offset entry. That is, in the matching procedure for the tested string, after the corresponding hash table entry has been found and the other tested hash results of the tested string have been matched against the content of the found hash table entry, the method further includes: when the other tested hash results do not match the content of the found hash table entry, looking up the next-level hash table entry in offset entry index order, and returning to the operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
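The cascaded lookup described above can be sketched as follows. This is an illustrative sketch only: the table layout (a list of `(content, next_index)` pairs), the function names, and the choice of byte-wise XOR as the first hash with the original string as entry content are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# (entry content, next-level offset entry index); both hypothetical field choices
Entry = Tuple[Optional[bytes], Optional[int]]


def xor_bytes(data: bytes) -> int:
    """Assumed first hash algorithm: XOR of all bytes."""
    result = 0
    for byte in data:
        result ^= byte
    return result


def match(table: List[Entry], tested: bytes) -> bool:
    """Follow the cascade until a level matches or no next level exists."""
    index: Optional[int] = xor_bytes(tested)  # first tested hash result
    while index is not None:
        content, next_index = table[index]
        if content is None:       # no entry at this level
            return False
        if content == tested:     # other tested hash results match
            return True
        index = next_index        # mismatch: move to the next-level entry
    return False                  # no further next-level offset entry


# Hand-built table: b"ABCD" and b"BADC" share the XOR index 0x04.
table: List[Entry] = [(None, None)] * 257
table[0x04] = (b"ABCD", 256)
table[256] = (b"BADC", None)

print(match(table, b"BADC"))  # found at the second level
print(match(table, b"CDAB"))  # same index, but no level matches
```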
With the target string addition method provided in this embodiment, adding one entry does not affect the other entries, so the hash matching table does not have to be updated in full, which makes maintenance and upgrades easy.
On the basis of the above technical solution, the configured hash algorithms preferably include at least an original-string hash algorithm that uses the original string itself as the hash result, and the original string serves as the entry content of a next-level hash table entry.
For example, two hash algorithms are configured:
Hash algorithm 1 = XOR of the string's bytes, taken one byte at a time
[Figure imgf000013_0001, not reproduced: definition of hash algorithm 2. Judging from the values in Table 6, its result is the original string itself.]
The hash results of the target strings are then as shown in Table 6:
Table 6
[Figure imgf000013_0002, not reproduced: table header and rows 1 and 2. The columns below are sequence number, target string, hash algorithm 1 result, hash algorithm 2 result.]
3    cd      0x07    0x00006364
4    ef      0x03    0x00006566
5    ABCD    0x04    0x41424344
6    EFGH    0x0C    0x45464748
7    BCD     0x45    0x00424344
8    EFG     0x44    0x00454647
The advantage of using the original target string as one of the hash algorithms is that it guarantees matching precision. The advantage of placing this algorithm's result at the next level or the last level of the hash table entry cascade is that cascaded entries are matched serially: if an earlier hash result does not match, the match failure can be determined quickly, and the exact comparison of the original string is performed only at the next or last level to guarantee accuracy. This both guarantees matching precision and saves matching time.
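The values in Table 6 can be reproduced with a short sketch. It assumes that hash algorithm 2 is the original ASCII string packed big-endian into a 32-bit word (its definition is in a figure that is not reproduced here, so this reading is inferred from the table values), and the function names `hash1` and `hash2` are hypothetical.

```python
def hash1(s: str) -> int:
    """Hash algorithm 1: XOR of the string's bytes, one byte at a time."""
    result = 0
    for byte in s.encode("ascii"):
        result ^= byte
    return result


def hash2(s: str) -> int:
    """Assumed hash algorithm 2: the original string read as a big-endian integer."""
    return int.from_bytes(s.encode("ascii"), "big")


for target in ["cd", "ef", "ABCD", "EFGH", "BCD", "EFG"]:
    # Print in the same layout as the table rows: string, hash 1, hash 2.
    print(f"{target:<6}0x{hash1(target):02X}    0x{hash2(target):08X}")
```

Under this assumption, strings longer than four bytes would not fit the 32-bit width shown in the table; the text does not say how such strings are handled.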
Embodiment 4
FIG. 4 is a flowchart of a content matching method according to Embodiment 4 of the present invention. This embodiment may be based on any of the foregoing embodiments and further adds operations for modifying or deleting a target string. The modification and deletion operations are essentially similar and specifically include the following steps:
Step 410: According to the target string to be modified or deleted carried in a received target string modification request or deletion request, the content matching apparatus performs a hash operation on that target string based on the at least one configured hash algorithm, to obtain the target hash results of that target string corresponding to the respective hash algorithms.
Step 420: The content matching apparatus uses the first target hash result of the target string to be modified or deleted as a hash table entry index and reads the corresponding hash table entry from the hash matching table as the current hash table entry.
Step 430: The content matching apparatus modifies or deletes the hash table entry content of the current hash table entry.
The modification operation modifies the content of the corresponding hash table entry; the deletion operation deletes the corresponding cascaded entry or deletes the entire entry.
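Steps 410 to 430 can be sketched for the simplest case of a single-level table. This is a hedged illustration: the table is modeled as a plain list of entry contents, the function names are hypothetical, the first hash is assumed to be the byte-wise XOR, and the equality check against the stored content before changing it is an added safeguard that the text does not spell out. Cascaded entries are left out.

```python
from typing import List, Optional


def xor_bytes(data: bytes) -> int:
    """Assumed first hash algorithm: XOR of all bytes."""
    result = 0
    for byte in data:
        result ^= byte
    return result


def modify_entry(table: List[Optional[bytes]], target: bytes, new_content: bytes) -> bool:
    """Steps 410-430 for modification: hash, read the entry, replace its content."""
    index = xor_bytes(target)      # steps 410/420: first hash result as index
    if table[index] != target:     # assumption: only touch an entry holding the target
        return False
    table[index] = new_content     # step 430: modify the entry content
    return True


def delete_entry(table: List[Optional[bytes]], target: bytes) -> bool:
    """Steps 410-430 for deletion: hash, read the entry, clear it."""
    index = xor_bytes(target)
    if table[index] != target:
        return False
    table[index] = None            # step 430: delete the entry content
    return True


table: List[Optional[bytes]] = [None] * 256
table[xor_bytes(b"ABCD")] = b"ABCD"
table[xor_bytes(b"cd")] = b"cd"

print(modify_entry(table, b"ABCD", b"BADC"))  # new content shares the same index
print(delete_entry(table, b"cd"))
```

Note that this sketch only makes sense when the new content maps to the same index as the old one; the text does not describe what happens otherwise.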
Embodiment 5
FIG. 5 is a schematic structural diagram of a content matching apparatus according to Embodiment 5 of the present invention. The content matching apparatus includes a first hash operation module 510, a hash table forming module 520, a second hash operation module 530, and a hash table matching module 540. The first hash operation module 510 is configured to perform a hash operation on each of at least one target string based on at least one configured hash algorithm, to obtain, for each target string, the target hash results corresponding to the respective hash algorithms. The hash table forming module 520 is configured to form a hash table entry for each target string from that target string's target hash results, and to combine the hash table entries of the target strings into a hash matching table. The second hash operation module 530 is configured to perform a hash operation on a tested string according to the at least one hash algorithm, to obtain the tested hash results of the tested string corresponding to the respective hash algorithms. The hash table matching module 540 is configured to match the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
On the basis of the above technical solution, there are preferably at least two hash algorithms. In that case, the hash table forming module 520 is specifically configured to use the first target hash result of each target string as a hash table entry index and the other target hash results as hash table entry content, and to combine the hash table entries of the target strings into the hash matching table. The hash table matching module 540 may then include an index matching unit 541, a content matching unit 542, and a result obtaining unit 543. The index matching unit 541 is configured to use the first tested hash result of the tested string as a hash table entry index to look up the corresponding hash table entry in the hash matching table. The content matching unit 542 is configured to, if the corresponding hash table entry is found, match the other tested hash results of the tested string against the content of the found hash table entry. The result obtaining unit 543 is configured to obtain a successful matching result when the other tested hash results all match the content of the found hash table entry.
The content matching apparatus provided by this embodiment of the present invention implements string matching by matching hash results and can match entries in parallel, which improves matching speed. Updates to entries in the hash matching table do not affect one another, so target strings are easy to add and modify.
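The division of work among modules 510 to 540 can be sketched with plain functions. This is an illustrative sketch, not the apparatus itself: the three hash algorithms (byte-wise XOR, string length, first character), the dictionary-based table, and all function names are assumptions chosen for the example, and collisions on the index are simply resolved in favor of the first target rather than by cascading.

```python
from typing import Dict, Iterable, Tuple

Content = Tuple[int, bytes]  # (length, first character): the "other" hash results


def xor_bytes(data: bytes) -> int:
    """Assumed first hash algorithm: XOR of all bytes."""
    result = 0
    for byte in data:
        result ^= byte
    return result


def hash_results(s: bytes) -> Tuple[int, Content]:
    """Hash operation modules: first result is the index, the rest is the content."""
    return xor_bytes(s), (len(s), s[:1])


def build_table(targets: Iterable[bytes]) -> Dict[int, Content]:
    """Hash table forming module: one entry per target, keyed by the first hash."""
    table: Dict[int, Content] = {}
    for target in targets:
        index, content = hash_results(target)
        table.setdefault(index, content)  # assumption: first target wins on collision
    return table


def match(table: Dict[int, Content], tested: bytes) -> bool:
    """Hash table matching module: index lookup, then content comparison."""
    index, content = hash_results(tested)
    return table.get(index) == content


table = build_table([b"ABCD", b"cd"])
print(match(table, b"ABCD"))  # configured target
print(match(table, b"EFG"))   # no entry at its index
print(match(table, b"ADCB"))  # different string, but all three hashes coincide
```

The last call shows why the text recommends including the original string among the hash results: without it, distinct strings whose hash results all coincide are reported as matches.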
Embodiment 6
FIG. 6 is a schematic structural diagram of a content matching apparatus according to Embodiment 6 of the present invention. On the basis of the foregoing embodiment, the content matching apparatus may further include a third hash operation module 610, an entry reading module 620, a content adding module 630, and a cascade adding module 640. The third hash operation module 610 is configured to, according to the target string to be added carried in a received target string addition request, perform a hash operation on that target string based on the at least one configured hash algorithm, to obtain the target hash results of the target string to be added corresponding to the respective hash algorithms. The entry reading module 620 is configured to use the first target hash result of the target string to be added as a hash table entry index and read the corresponding hash table entry from the hash matching table as the current hash table entry. The content adding module 630 is configured to, when the entry content of the current hash table entry is empty, add the other target hash results of the target string to be added to the current hash table entry as its content. The cascade adding module 640 is configured to, when the entry content of the current hash table entry is not empty, add, in a cascading manner, the other target hash results of the target string to be added to the hash matching table as the content of a next-level entry of the current hash table entry.
The cascade adding module 640 preferably includes a comparing unit 641, a discarding unit 642, an offset index reading unit 643, a content adding unit 644, and a content determining unit 645. The comparing unit 641 is configured to compare whether the other target hash results of the target string to be added are identical to the entry content of the current hash table entry. The discarding unit 642 is configured to discard the target string to be added if they are identical. The offset index reading unit 643 is configured to, if they are not identical, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to that offset entry index, and take the next-level hash table entry as the updated current hash table entry. The content adding unit 644 is configured to, when the entry content of the updated current hash table entry is determined to be empty, add the other target hash results of the target string to be added as the entry content of the current hash table entry. The content determining unit 645 is configured to return to the comparison operation when the entry content of the updated current hash table entry is determined not to be empty.
The content matching apparatus may further include an offset entry lookup unit, configured to, when the other tested hash results do not match the content of the found hash table entry, look up the next-level hash table entry in offset entry index order, and return to the content matching unit's operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
With the content matching apparatus provided in this embodiment, a target string can be added conveniently without recompiling the entire hash matching table, so upgrades and maintenance are easy. Cascaded entries also effectively resolve hash table entry conflicts between target strings and improve matching precision.
Embodiment 7
FIG. 7 is a schematic structural diagram of a content matching apparatus according to Embodiment 7 of the present invention. The content matching apparatus may further include a fourth hash operation module 710, an index matching module 720, and a modification and deletion module 730. The fourth hash operation module 710 is configured to, according to the target string to be modified or deleted carried in a received target string modification request or deletion request, perform a hash operation on that target string based on the at least one configured hash algorithm, to obtain the target hash results of that target string corresponding to the respective hash algorithms. The index matching module 720 is configured to use the first target hash result of the target string to be modified or deleted as a hash table entry index and read the corresponding hash table entry from the hash matching table as the current hash table entry. The modification and deletion module 730 is configured to modify or delete the hash table entry content of the current hash table entry.
With the content matching apparatus provided in this embodiment, a target string can be modified or deleted conveniently without recompiling the entire hash matching table, so upgrades and maintenance are easy.
The content matching apparatus provided by the embodiments of the present invention can perform the content matching method provided by any embodiment of the present invention and has the corresponding functional modules. The content matching method and apparatus improve matching speed, reduce resource usage, and are easy to upgrade and maintain.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A content matching method, comprising:
performing a hash operation on each of at least one target string based on at least one configured hash algorithm, to obtain, for each target string, the target hash results corresponding to the respective hash algorithms;
forming a hash table entry for each target string from that target string's target hash results, and combining the hash table entries of the target strings into a hash matching table;
performing a hash operation on a tested string according to the at least one hash algorithm, to obtain the tested hash results of the tested string corresponding to the respective hash algorithms; and
matching the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
2. The content matching method according to claim 1, wherein there are at least two hash algorithms;
forming the hash table entry for each target string from that target string's target hash results comprises: using the first target hash result of each target string as a hash table entry index and the other target hash results as hash table entry content; and
matching the tested hash results of the tested string against the hash table entries of the hash matching table to obtain the matching result comprises:
using the first tested hash result of the tested string as a hash table entry index to look up the corresponding hash table entry in the hash matching table;
if the corresponding hash table entry is found, matching the other tested hash results of the tested string against the content of the found hash table entry; and
when the other tested hash results all match the content of the found hash table entry, obtaining a successful matching result.
3. The content matching method according to claim 2, further comprising:
according to a target string to be added carried in a received target string addition request, performing a hash operation on the target string to be added based on the at least one configured hash algorithm, to obtain the target hash results of the target string to be added corresponding to the respective hash algorithms;
using the first target hash result of the target string to be added as a hash table entry index, and reading the corresponding hash table entry from the hash matching table as a current hash table entry;
when the entry content of the current hash table entry is empty, adding the other target hash results of the target string to be added to the current hash table entry as the content of the current hash table entry; and
when the entry content of the current hash table entry is not empty, adding, in a cascading manner, the other target hash results of the target string to be added to the hash matching table as the content of a next-level entry of the current hash table entry.
4. The content matching method according to claim 3, wherein adding, in a cascading manner, the other target hash results of the target string to be added to the hash matching table as the content of a next-level entry of the current hash table entry comprises:
comparing whether the other target hash results of the target string to be added are identical to the entry content of the current hash table entry;
if they are identical, discarding the target string to be added;
if they are not identical, reading the next-level offset entry index of the current hash table entry, reading the next-level hash table entry according to the offset entry index, and taking the next-level hash table entry as the updated current hash table entry;
when the entry content of the updated current hash table entry is determined to be empty, adding the other target hash results of the target string to be added as the entry content of the current hash table entry; and
when the entry content of the updated current hash table entry is determined not to be empty, returning to the comparison operation.
5. The content matching method according to claim 3 or 4, wherein, after the corresponding hash table entry has been found and the other tested hash results of the tested string have been matched against the content of the found hash table entry, the method further comprises:
when the other tested hash results do not match the content of the found hash table entry, looking up the next-level hash table entry in offset entry index order, and returning to the operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
6. The content matching method according to claim 5, wherein the hash algorithms include at least an original-string hash algorithm that uses the original string itself as the hash result, and the original string serves as the entry content of a next-level hash table entry.
7. The content matching method according to claim 3 or 4, further comprising:
according to a target string to be modified or deleted carried in a received target string modification request or deletion request, performing a hash operation on the target string to be modified or deleted based on the at least one configured hash algorithm, to obtain the target hash results of the target string to be modified or deleted corresponding to the respective hash algorithms;
using the first target hash result of the target string to be modified or deleted as a hash table entry index, and reading the corresponding hash table entry from the hash matching table as a current hash table entry; and
modifying or deleting the hash table entry content of the current hash table entry.
8. The content matching method according to any one of claims 1 to 7, wherein the hash algorithm comprises one or any combination of the following: XOR of the string taken one byte at a time, the first character of the string, the last character of the string, the length of the string, and XOR of the string taken two bytes at a time.
9. A content matching apparatus, comprising:
a first hash operation module, configured to perform a hash operation on each of at least one target string based on at least one configured hash algorithm, to obtain, for each target string, the target hash results corresponding to the respective hash algorithms;
a hash table forming module, configured to form a hash table entry for each target string from that target string's target hash results, and to combine the hash table entries of the target strings into a hash matching table;
a second hash operation module, configured to perform a hash operation on a tested string according to the at least one hash algorithm, to obtain the tested hash results of the tested string corresponding to the respective hash algorithms; and
a hash table matching module, configured to match the tested hash results of the tested string against the hash table entries of the hash matching table, to obtain a matching result.
10. The content matching apparatus according to claim 9, wherein there are at least two hash algorithms, and the hash table forming module is specifically configured to use the first target hash result of each target string as a hash table entry index and the other target hash results as hash table entry content, and to combine the hash table entries of the target strings into the hash matching table;
and the hash table matching module comprises:
an index matching unit, configured to use the first tested hash result of the tested string as a hash table entry index to look up the corresponding hash table entry in the hash matching table;
a content matching unit, configured to, if the corresponding hash table entry is found, match the other tested hash results of the tested string against the content of the found hash table entry; and
a result obtaining unit, configured to obtain a successful matching result when the other tested hash results all match the content of the found hash table entry.
11. The content matching apparatus according to claim 10, further comprising:
a third hash operation module, configured to, according to a target string to be added carried in a received target string addition request, perform a hash operation on the target string to be added based on the at least one configured hash algorithm, to obtain the target hash results of the target string to be added corresponding to the respective hash algorithms;
an entry reading module, configured to use the first target hash result of the target string to be added as a hash table entry index and read the corresponding hash table entry from the hash matching table as a current hash table entry;
a content adding module, configured to, when the entry content of the current hash table entry is empty, add the other target hash results of the target string to be added to the current hash table entry as the content of the current hash table entry; and
a cascade adding module, configured to, when the entry content of the current hash table entry is not empty, add, in a cascading manner, the other target hash results of the target string to be added to the hash matching table as the content of a next-level entry of the current hash table entry.
12. The content matching apparatus according to claim 11, wherein the cascade adding module comprises:
a comparing unit, configured to compare whether the other target hash results of the target string to be added are identical to the entry content of the current hash table entry;
a discarding unit, configured to discard the target string to be added if they are identical;
an offset index reading unit, configured to, if they are not identical, read the next-level offset entry index of the current hash table entry, read the next-level hash table entry according to the offset entry index, and take the next-level hash table entry as the updated current hash table entry;
a content adding unit, configured to, when the entry content of the updated current hash table entry is determined to be empty, add the other target hash results of the target string to be added as the entry content of the current hash table entry; and
a content determining unit, configured to return to the comparison operation when the entry content of the updated current hash table entry is determined not to be empty.
13. The content matching apparatus according to claim 11 or 12, further comprising:
an offset entry lookup unit, configured to, when the other tested hash results do not match the content of the found hash table entry, look up the next-level hash table entry in offset entry index order, and return to the content matching unit's operation of matching the other tested hash results of the tested string against the content of the found hash table entry.
14. The content matching apparatus according to claim 11 or 12, further comprising: a fourth hash operation module, configured to perform, based on at least one set hash algorithm, a hash operation on the target string to be modified or deleted that is carried in a received target string modification request or deletion request, so as to obtain the target hash results of that target string corresponding to the respective hash algorithms;
An index matching module, configured to use the first target hash result of the target string to be modified or deleted as a hash table entry index, and to read the corresponding hash table entry from the hash matching table as the current hash table entry;
A modification and deletion module, configured to modify or delete the hash table entry content of the current hash table entry.
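The modification and deletion path can be sketched in the same spirit. This is only an assumed reading of the claim: the table is modelled as a list whose slots hold the entry content or `None`, CRC-32 and Adler-32 stand in for the set hash algorithms, and `modify_target`, `delete_target`, `target_hashes` and `TABLE_SIZE` are hypothetical names. The claim mentions only the current hash table entry, so next-level entries are not handled here.

```python
import zlib
from typing import List, Optional, Tuple

TABLE_SIZE = 16  # assumed table size

Content = Optional[Tuple[int, ...]]


def target_hashes(s: bytes) -> Tuple[int, Tuple[int, ...]]:
    """First hash gives the entry index; the remaining hashes are the content."""
    return zlib.crc32(s) % TABLE_SIZE, (zlib.adler32(s),)


def modify_target(table: List[Content], s: bytes,
                  new_content: Tuple[int, ...]) -> int:
    """Overwrite the content of the entry indexed by the first hash of s."""
    index, _ = target_hashes(s)   # first target hash result as the entry index
    table[index] = new_content    # modify the current entry's content
    return index


def delete_target(table: List[Content], s: bytes) -> int:
    """Clear the content of the entry indexed by the first hash of s."""
    index, _ = target_hashes(s)
    table[index] = None           # delete the current entry's content
    return index


table: List[Content] = [None] * TABLE_SIZE
index, others = target_hashes(b"example")
table[index] = others             # pretend the string was added earlier
modify_target(table, b"example", (1, 2))
delete_target(table, b"example")
```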
PCT/CN2012/077996 2012-06-30 2012-06-30 Method and apparatus for content matching WO2014000305A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201280000614.9A CN102870116B (en) 2012-06-30 2012-06-30 Method and apparatus for content matching
PCT/CN2012/077996 WO2014000305A1 (en) 2012-06-30 2012-06-30 Method and apparatus for content matching

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/077996 WO2014000305A1 (en) 2012-06-30 2012-06-30 Method and apparatus for content matching

Publications (1)

Publication Number Publication Date
WO2014000305A1 (en) 2014-01-03

Family

ID=47447746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/077996 WO2014000305A1 (en) 2012-06-30 2012-06-30 Method and apparatus for content matching

Country Status (2)

Country Link
CN (1) CN102870116B (en)
WO (1) WO2014000305A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116629B (en) * 2013-02-01 2016-04-20 腾讯科技(深圳)有限公司 A kind of matching process of audio content and system
CN103414701B (en) * 2013-07-25 2017-03-01 华为技术有限公司 A kind of rule matching method and device
CN103500183A (en) * 2013-09-12 2014-01-08 国家计算机网络与信息安全管理中心 Storage structure based on multiple-relevant-field combined index and building, inquiring and maintaining method
CN105426413B (en) * 2015-10-31 2018-05-04 华为技术有限公司 A kind of coding method and device
CN106067876B (en) * 2016-05-27 2019-08-16 成都广达新网科技股份有限公司 A kind of HTTP request packet identification method based on pattern match
CN109977295A (en) * 2019-04-11 2019-07-05 北京安护环宇科技有限公司 A kind of black and white lists matching process and device
CN111627536A (en) * 2020-05-14 2020-09-04 广元市中心医院 Adverse event management system and method for hospital
US11301440B2 (en) * 2020-06-18 2022-04-12 Lexisnexis Risk Solutions, Inc. Fuzzy search using field-level deletion neighborhoods
CN113347214B (en) * 2021-08-05 2021-11-12 湖南戎腾网络科技有限公司 High-frequency state matching method and system
CN114422389B (en) * 2022-02-24 2023-09-12 成都北中网芯科技有限公司 High-speed real-time network data monitoring method based on hash and hardware acceleration

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204703A1 (en) * 2002-04-25 2003-10-30 Priya Rajagopal Multi-pass hierarchical pattern matching
US20060034115A1 (en) * 2003-06-27 2006-02-16 Dialog Semiconductor Gmbh Natural analog or multilevel transistor DRAM-cell
CN1794236A (en) * 2004-12-21 2006-06-28 英特尔公司 Efficient CAM-based techniques to perform string searches in packet payloads
CN101350788A (en) * 2008-08-25 2009-01-21 中兴通讯股份有限公司 Method for mixed loop-up table of network processor inside and outside
CN101692651A (en) * 2009-09-27 2010-04-07 中兴通讯股份有限公司 Method and device for Hash lookup table

Also Published As

Publication number Publication date
CN102870116B (en) 2014-09-03
CN102870116A (en) 2013-01-09

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 201280000614.9; Country of ref document: CN
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 12880250; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 12880250; Country of ref document: EP; Kind code of ref document: A1