WO2012151843A1 - URL filtering system, method and gateway - Google Patents
- Publication number
- WO2012151843A1 (application number PCT/CN2011/080608)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- url
- unit
- rule file
- memory
- message
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0236—Filtering by address, protocol, port number or service, e.g. IP-address or URL
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Transfer Between Computers (AREA)
- Computer And Data Communications (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The present invention relates to a URL filtering system, method and gateway. The system includes an identification unit, a memory unit, a rule unit, a scanning unit and a matching unit. The method includes: generating a URL rule file that the system can recognize according to a user-defined URL list, and loading the URL rule file into memory; when a message is received, the system scans it and judges whether it is an HTTP message; when it is determined to be an HTTP message, the system scans the URL information in the HTTP message, matches it against the URL information in the URL rule file in memory, and allows or does not allow the HTTP message to pass according to the match result. The present invention does not need to distinguish URL types, which accelerates URL processing.
Description
URL filtering system, method for filtering URLs, and gateway

Technical field
The present invention relates to the field of communications, and in particular to a Uniform/Universal Resource Locator (URL) filtering system, a method for filtering URLs, and a gateway.

Background
A URL, also known as a web page address, is the address of a standard resource on the Internet; it is an identification method used to fully describe the addresses of web pages and other resources on the Internet. Every web page on the Internet has a unique URL address name identifier, usually called a URL address. The address can point to a local disk or to a computer on a local area network, but most often it points to a site on the Internet. Simply put, a URL is a Web address, commonly called a "website address".
With the spread of the Internet, online information provides more and more convenience in people's lives and work, and more and more teenagers have access to the Internet. However, the quality of online information is mixed; in particular, many websites promote pornography, violence, superstition and other harmful content. To present teenagers with healthy, positive websites, the URLs they visit need to be filtered so that unhealthy and illegal websites are blocked, supporting the healthy growth of teenagers.
There are currently three main URL filtering methods:
First, a hash table is used to store the URL information. This method suits lookups of URLs with different domain names, but when many URLs share the same domain name, lookups take a long time.
Second, a string matching algorithm is used. This method suits keyword lookup, but the lookup speed is relatively slow.
Third, a regular expression matching algorithm is used. This method suits lookups of URLs whose exact form is not known in advance, but its lookup speed is also relatively slow.
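To make the first drawback concrete, the Python sketch below assumes a hash table keyed by domain name, with all URLs of one domain kept in a list inside that bucket. The patent does not state the key or the bucket layout, so both are assumptions, and `DomainHashTable` is a hypothetical name. Lookups across different domains touch one short bucket, but when many entries share a domain the lookup degrades to a linear scan of that bucket.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class DomainHashTable:
    """Hypothetical prior-art store: one hash bucket per domain name."""

    def __init__(self, urls):
        self.buckets = defaultdict(list)
        for url in urls:
            self.buckets[urlsplit(url).hostname].append(url)

    def lookup(self, url):
        """Exact-match lookup; returns (found, number_of_string_comparisons)."""
        bucket = self.buckets.get(urlsplit(url).hostname, [])  # O(1) hash step
        comparisons = 0
        for candidate in bucket:  # linear scan within the domain's bucket
            comparisons += 1
            if candidate == url:
                return True, comparisons
        return False, comparisons
```

With 1000 URLs spread over 1000 domains, a hit costs one comparison; with 1000 URLs under a single domain, the last entry costs 1000. The comparison count is only an illustrative cost measure.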
The lookup speed of these existing methods drops significantly as the number of URL records in the URL list grows, so they cannot meet the demands of URL management in today's high-throughput networks.

Summary of the invention
An object of the present invention is to provide a URL filtering system, a method for filtering URLs, and a gateway, so as to solve the problem of slow URL lookup in the prior art.
The present invention provides a method for filtering a URL, the method comprising:
generating a URL rule file recognizable by the URL filtering system according to a user-defined URL list, and loading the URL rule file into memory;
when the system receives a packet, scanning the packet and determining whether it is a Hyper Text Transfer Protocol (HTTP) packet; when it is determined to be an HTTP packet, scanning the URL information in the HTTP packet and matching it against the URL information in the URL rule file in memory;
allowing or not allowing the HTTP packet to pass according to the matching result.
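The patent does not disclose the format of the "system-recognizable URL rule file". As a purely illustrative sketch, the Python below assumes a plain-text file with one normalized `host/path` entry per line, and models loading it into memory as building an in-memory set. `normalize`, `generate_rule_file` and `load_rule_file` are hypothetical names, and the patent itself targets a hardware matcher rather than Python sets.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Hypothetical normalization: drop the scheme, lower-case the host,
    keep the path (defaulting to '/')."""
    if "://" not in url:
        url = "http://" + url  # treat bare 'host/path' entries as HTTP URLs
    parts = urlsplit(url)
    # urllib already lower-cases .hostname
    return (parts.hostname or "") + (parts.path or "/")


def generate_rule_file(user_list) -> str:
    """Build rule-file text: one normalized, de-duplicated entry per line, sorted."""
    entries = sorted({normalize(u) for u in (s.strip() for s in user_list) if u})
    return "\n".join(entries) + "\n"


def load_rule_file(text: str) -> frozenset:
    """'Load into memory': parse the rule file into a set for constant-time lookups."""
    return frozenset(line for line in text.splitlines() if line)
```

For example, `["HTTP://Bad.Example/x", "bad.example/x", "evil.example"]` collapses to the two entries `bad.example/x` and `evil.example/`, so differently written user entries for the same resource do not produce duplicate rules.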
In the above solution, after the URL rule file is loaded into memory, the method further includes: determining whether the user-defined URL list has changed and, when it has changed, regenerating the system-recognizable URL rule file according to the changed user-defined URL list and loading the newly generated URL rule file into memory;
after loading succeeds, the system uses the new URL rule file in memory to match URL information and deletes the old URL rule file from memory.
In the above solution, the method further includes: when the system determines that a received packet is not an HTTP packet, the system allows the packet to pass directly.
In the above solution, the user-defined URL list is a blacklist or a whitelist.

In the above solution, allowing or not allowing the HTTP packet to pass according to the matching result includes:
when the user-defined URL list is a blacklist: if the URL information in the received HTTP packet matches the URL information in the URL rule file in memory, the HTTP packet is not allowed to pass; if the match fails, the HTTP packet is allowed to pass;
when the user-defined URL list is a whitelist: if the URL information in the received HTTP packet matches the URL information in the URL rule file in memory, the HTTP packet is allowed to pass; if the match fails, the HTTP packet is not allowed to pass.
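The blacklist and whitelist rules above reduce to a small truth table on (list type, match result). A minimal Python sketch, where `ListMode` and `allow_http_packet` are hypothetical names:

```python
from enum import Enum


class ListMode(Enum):
    BLACKLIST = "blacklist"
    WHITELIST = "whitelist"


def allow_http_packet(mode: ListMode, matched: bool) -> bool:
    """Return True if the HTTP packet may pass.

    mode       matched -> result
    BLACKLIST  True    -> blocked
    BLACKLIST  False   -> passes
    WHITELIST  True    -> passes
    WHITELIST  False   -> blocked
    """
    if mode is ListMode.BLACKLIST:
        return not matched
    return matched
```

Because the two modes are exact inverses, the matching stage can stay identical for both list types; only this final decision depends on the list type.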
The present invention also provides a URL filtering system, including an identification unit and a memory unit; the system further includes a rule unit, a scanning unit and a matching unit, wherein:
the identification unit is configured to identify whether a received packet is an HTTP packet and send the identification result to the scanning unit;
the rule unit is configured to generate a system-recognizable URL rule file according to the user-defined URL list and load the URL rule file into the memory unit;
the scanning unit is configured to scan a received packet and send it to the identification unit; when the identification result returned by the identification unit indicates that the received packet is an HTTP packet, it scans the URL information in the HTTP packet, sends the URL information to the matching unit, and allows or does not allow the HTTP packet to pass according to the matching result returned by the matching unit;
the matching unit is configured to match the URL information in the HTTP packet against the URL information in the URL rule file in the memory unit and send the matching result to the scanning unit.
In the above solution, the rule unit is further configured to determine whether the user-defined URL list has changed and, when it has, to regenerate the system-recognizable URL rule file according to the changed user-defined URL list, load the newly generated URL rule file into the memory unit, and, after loading succeeds, notify the matching unit to use the new URL rule file for URL information matching.
In the above solution, the matching unit is further configured to, after receiving the notification from the rule unit, use the new URL rule file for URL information matching and delete the old URL rule file from the memory unit.
In the above solution, the scanning unit is further configured to allow a packet to pass directly when the identification result returned by the identification unit indicates that the received packet is not an HTTP packet.
The present invention also provides a gateway that includes the above URL filtering system.
The invention converts the user-defined URL list into a URL rule file recognizable by the URL system hardware and loads it into memory. When a packet is received, the system can quickly match the HTTP packet against the URL rule file in memory and return the matching result; the scan-matching speed can reach at least 2 Gbps. URLs do not need to be distinguished by type, which removes the complex and cumbersome URL classification and lookup of existing methods and speeds up URL processing. The invention supports URL filtering over large data volumes and is applicable to network devices such as Integrated Service Gateways (ISG), Wireless Application Protocol (WAP) gateways and WEB gateways.

Description of the drawings
The drawings described here are provided for a further understanding of the invention and form part of it; the schematic embodiments of the invention and their descriptions serve to explain the invention and do not constitute an improper limitation of it. In the drawings:
FIG. 1 is a flowchart of the method for filtering URLs according to the present invention;
FIG. 2 is a schematic block diagram of the URL filtering system according to the present invention;
FIG. 3 is a schematic block diagram of the gateway according to the present invention.

Detailed description
To make the technical problem solved by the present invention, its technical solutions and its beneficial effects clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.

FIG. 1 is a flowchart of the method for filtering URLs according to the present invention. This embodiment assumes that the user-defined URL list is a blacklist. As shown in FIG. 1, the method includes the following steps:
Step S001: generate a URL rule file recognizable by the URL filtering system according to the user-defined blacklist;
Step S002: load the generated URL rule file into memory;
Step S003: the system receives a packet;
Step S004: scan the packet;
Step S005: determine whether the packet is an HTTP packet; if yes, go to step S006; otherwise, go to step S010;
Step S006: scan the URL information in the packet;
Step S007: match it against the URL information in the URL rule file in memory;
Step S008: determine whether the match succeeded; if yes, go to step S009; otherwise, go to step S010;
Step S009: filter the packet.
Here, filtering the packet means that the packet is not allowed to pass.
Step S010: release the packet.
Here, releasing the packet means allowing it to pass; the packets in this step include both HTTP packets and non-HTTP packets.
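Steps S001 to S010 can be sketched in Python for the blacklist embodiment. This is a software illustration under several assumptions the patent does not state: packets are raw TCP payload bytes, an HTTP packet is recognized by a request line of the form `METHOD target HTTP/x`, and the "URL information" is the lower-cased `Host` header plus the request path with any query string removed. All function names are hypothetical.

```python
HTTP_METHODS = {b"GET", b"POST", b"HEAD", b"PUT", b"DELETE",
                b"OPTIONS", b"PATCH", b"CONNECT", b"TRACE"}


def load_blacklist(entries) -> frozenset:
    """S001/S002 stand-in: normalize 'host/path' entries into an in-memory set."""
    rules = set()
    for entry in entries:
        entry = entry.strip()
        if entry:
            host, _, path = entry.partition("/")
            rules.add(host.lower() + "/" + path)
    return frozenset(rules)


def extract_http_url(payload: bytes):
    """S005/S006: return 'host/path' for an HTTP request, or None otherwise."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    parts = lines[0].split(b" ")
    if len(parts) != 3 or parts[0] not in HTTP_METHODS or not parts[2].startswith(b"HTTP/"):
        return None  # not an HTTP request line
    target = parts[1].decode("latin-1").split("?", 1)[0]  # drop the query string
    host = ""
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            host = value.strip().decode("latin-1")
            break
    if target.lower().startswith("http://"):  # absolute-form target (proxy style)
        host, _, rest = target[len("http://"):].partition("/")
        target = "/" + rest
    return host.lower() + (target or "/")


def filter_packet(payload: bytes, blacklist: frozenset) -> bool:
    """S003-S010 for one packet: True = release (S010), False = filter (S009)."""
    url = extract_http_url(payload)  # S004-S006
    if url is None:
        return True  # S005 -> S010: non-HTTP packets are released
    matched = url in blacklist  # S007/S008
    return not matched  # hit -> S009, miss -> S010
```

Encrypted HTTPS traffic fails the request-line check in this sketch and is therefore released, consistent with the rule that non-HTTP packets pass directly.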
In other embodiments, when the user-defined URL list is a whitelist, the HTTP packet is released if the URL information in the received HTTP packet matches the URL information in the in-memory URL rule file, and is filtered if the match fails.
In the present invention, while processing packets, the system may also determine whether the user-defined URL list has changed. If it has, the system regenerates the system-recognizable URL rule file according to the changed list and loads the newly generated file into memory; after loading completes, it uses the new URL rule file for URL information matching and deletes the old one. This allows the invention to update the URL rule file in real time without interrupting the scan-matching service. In a specific embodiment, two memory regions, memory A and memory B, can be reserved. If the old URL rule file is stored in memory A, then after the user-defined URL list changes, the newly generated URL rule file is loaded into memory B; once loading completes, the system matches URL information using the rule file in memory B and, at the same time, deletes the rule file in memory A. When the user-defined URL list changes again, the newly generated rule file is loaded into memory A, and so on. In other words, the system performs two tasks concurrently: one processes received packets, and the other detects whether the user-defined URL list has changed.
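The memory A / memory B scheme is a form of double buffering. The Python sketch below assumes rule sets held as in-memory sets and a lock that serializes reloads only; lookups read a single reference to the active set, so a reload never blocks or interrupts matching. The class and method names are hypothetical, and the patent's hardware memory regions are modeled as two list slots.

```python
import threading


class DoubleBufferedRules:
    """Hypothetical model of the reserved 'memory A' and 'memory B' regions."""

    def __init__(self, user_list):
        self._slots = [frozenset(user_list), None]  # index 0 = memory A, 1 = memory B
        self._active = 0  # region currently used for matching
        self._current = self._slots[0]  # single reference read by matches()
        self._reload_lock = threading.Lock()  # serializes reloads, not lookups

    @property
    def active_region(self) -> str:
        return "AB"[self._active]

    def matches(self, url: str) -> bool:
        # One attribute read: a concurrent reload swaps this reference in a
        # single assignment, so matching always sees a complete rule set.
        return url in self._current

    def reload_if_changed(self, user_list) -> bool:
        """Load new rules into the idle region if the user list changed."""
        new_rules = frozenset(user_list)
        with self._reload_lock:
            if new_rules == self._current:
                return False  # unchanged: keep matching with the active region
            standby = 1 - self._active
            self._slots[standby] = new_rules  # load into the idle region
            self._current = new_rules  # switch matching to the new rules
            old, self._active = self._active, standby
            self._slots[old] = None  # drop the old rule set (the "delete" step)
            return True
```

A background task would call `reload_if_changed` whenever it polls the user-defined list, while packet processing keeps calling `matches`; regions alternate A, B, A on successive changes.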
Compared with existing software-based methods, the hardware-based filtering method of the present invention increases the speed of processing HTTP packets.
FIG. 2 is a schematic block diagram of the URL filtering system according to the present invention. As shown in FIG. 2, the system includes a scanning unit 01, an identification unit 02, a rule unit 03, a matching unit 04 and a memory unit 05, wherein:
the scanning unit 01 is configured to scan received packets and send them to the identification unit 02, or to scan the URL information in an HTTP packet and send the URL information to the matching unit 04; and to release and/or filter received packets according to the identification result returned by the identification unit 02 and the matching result returned by the matching unit 04;
the identification unit 02 is configured to identify whether a received packet is an HTTP packet and send the identification result to the scanning unit 01;
the rule unit 03 is configured to generate a system-recognizable URL rule file according to the user-defined URL list and load the generated URL rule file into the memory unit 05; it is also configured to determine whether the user-defined URL list has changed and, when it has, to regenerate the system-recognizable URL rule file according to the changed list, load the newly generated URL rule file into the memory unit 05, and, after loading completes, notify the matching unit 04 to use the new URL rule file for URL information matching;
the matching unit 04 is configured to match received URL information against the URL information in the URL rule file in the memory unit 05 and send the matching result to the scanning unit 01; or, upon receiving the notification from the rule unit 03, to perform URL information matching using the newly loaded URL rule file in the memory unit 05 and delete the old URL rule file from the memory unit 05.
When the identification result returned by the identification unit 02 indicates that the received packet is an HTTP packet, the scanning unit 01 scans the URL information in the HTTP packet, sends it to the matching unit 04, and releases or filters the HTTP packet according to the matching result returned by the matching unit 04;
when the identification result returned by the identification unit 02 indicates that the received packet is not an HTTP packet, the scanning unit 01 releases the packet directly.
FIG. 3 is a schematic block diagram of the gateway according to the present invention. As shown in FIG. 3, the gateway includes the URL filtering system shown in FIG. 2, which comprises a scanning unit 01, an identification unit 02, a rule unit 03, a matching unit 04 and a memory unit 05. For the function of each unit, refer to the description of FIG. 2 above; it is not repeated here.
The above description shows and describes preferred embodiments of the present invention. As stated above, however, it should be understood that the invention is not limited to the forms disclosed herein and should not be construed as excluding other embodiments. It may be used in various other combinations, modifications, and environments, and may be modified within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the related art. Changes and variations made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.
Claims
1. A method for filtering a uniform resource locator (URL), wherein the method comprises: generating, according to a user-defined URL list, a URL rule file recognizable by a URL filtering system, and loading the URL rule file into memory;
when the system receives a packet, scanning the packet and determining whether it is a Hypertext Transfer Protocol (HTTP) packet; when it is determined to be an HTTP packet, scanning the URL information in the HTTP packet and matching it against the URL information in the URL rule file in memory; and
allowing or disallowing the HTTP packet to pass according to the matching result.
2. The method according to claim 1, wherein, after the URL rule file is loaded into memory, the method further comprises:
determining whether the user-defined URL list has changed and, when it is determined to have changed, regenerating the system-recognizable URL rule file according to the changed user-defined URL list and loading the newly generated URL rule file into memory;
after the loading succeeds, the system using the new URL rule file in memory for URL information matching and deleting the old URL rule file from memory.
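Claim 2 switches to the new rule file only after loading succeeds. Below is a minimal Python sketch of that gating. It assumes a hypothetical plain-text rule file with one "host/path" entry per line and `#` comment lines, since the patent does not define the file format. If parsing fails, `reload` returns `False` and matching continues against the old rules. `RuleStore` and `parse_rule_file` are illustrative names.

```python
from typing import FrozenSet


def parse_rule_file(text: str) -> FrozenSet[str]:
    """Parse one rule entry per line; raise ValueError on a malformed entry."""
    entries = set()
    for lineno, line in enumerate(text.splitlines(), 1):
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        if "://" in entry or any(ch.isspace() for ch in entry):
            raise ValueError(f"line {lineno}: malformed entry {entry!r}")
        entries.add(entry.lower())
    return frozenset(entries)


class RuleStore:
    """Holds the active rule set that the matching step reads."""

    def __init__(self) -> None:
        self.active: FrozenSet[str] = frozenset()
        self.version = 0

    def reload(self, rule_file_text: str) -> bool:
        """Load a regenerated rule file; switch over only if loading succeeds."""
        try:
            new_rules = parse_rule_file(rule_file_text)
        except ValueError:
            return False  # keep matching against the old rule set
        # Rebinding drops the store's reference to the old set so it can be reclaimed.
        self.active = new_rules
        self.version += 1
        return True
```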
3. The method according to claim 1, wherein the method further comprises: when the system determines that a received packet is not an HTTP packet, directly allowing the packet to pass.
4. The method according to claim 2, wherein the user-defined URL list is a blacklist or a whitelist.
5. The method according to any one of claims 1 to 4, wherein allowing or disallowing the HTTP packet to pass according to the matching result comprises:
when the user-defined URL list is a blacklist: if the URL information in the received HTTP packet matches the URL information in the URL rule file in memory, not allowing the HTTP packet to pass; if it fails to match, allowing the HTTP packet to pass;
when the user-defined URL list is a whitelist: if the URL information in the received HTTP packet matches the URL information in the URL rule file in memory, allowing the HTTP packet to pass; if it fails to match, not allowing the HTTP packet to pass.
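The blacklist and whitelist cases in claim 5 mirror each other. The decision therefore reduces to one comparison between the match result and the list mode. Below is a minimal Python sketch; `ListMode` and `allow_packet` are hypothetical names.

```python
from enum import Enum


class ListMode(Enum):
    BLACKLIST = "blacklist"
    WHITELIST = "whitelist"


def allow_packet(mode: ListMode, matched: bool) -> bool:
    """Return True if the HTTP packet may pass under claim 5's rules.

    Blacklist: a match blocks, a miss passes.
    Whitelist: a match passes, a miss blocks.
    """
    return matched == (mode is ListMode.WHITELIST)
```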
6. A URL filtering system comprising an identification unit and a memory unit, wherein the system further comprises a rule unit, a scanning unit, and a matching unit; wherein
the identification unit is configured to identify whether a received packet is an HTTP packet and send the identification result to the scanning unit;
the rule unit is configured to generate a system-recognizable URL rule file according to the user-defined URL list and load the URL rule file into the memory unit;
the scanning unit is configured to scan a received packet and send it to the identification unit; when the identification result returned by the identification unit indicates that the received packet is an HTTP packet, to scan the URL information in the HTTP packet and send the URL information to the matching unit; and to allow or disallow the HTTP packet to pass according to the matching result returned by the matching unit;
the matching unit is configured to match the URL information in the HTTP packet against the URL information in the URL rule file in the memory unit and send the matching result to the scanning unit.
7. The system according to claim 6, wherein
the rule unit is further configured to determine whether the user-defined URL list has changed and, when it has changed, to regenerate the system-recognizable URL rule file according to the changed user-defined URL list, load the newly generated URL rule file into the memory unit, and, after the loading succeeds, notify the matching unit to use the new URL rule file for URL information matching.
8. The system according to claim 7, wherein the matching unit is further configured, after receiving the notification from the rule unit, to use the new URL rule file for URL information matching and to delete the old URL rule file from the memory unit.
9. The system according to any one of claims 6 to 8, wherein the scanning unit is further configured, when the identification result returned by the identification unit indicates that the received packet is not an HTTP packet, to allow the packet to pass directly.
10. A gateway, wherein the gateway comprises the URL filtering system according to any one of claims 6 to 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011101213726A CN102780681A (en) | 2011-05-11 | 2011-05-11 | URL (Uniform Resource Locator) filtering system and URL filtering method |
CN201110121372.6 | 2011-05-11 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012151843A1 true WO2012151843A1 (en) | 2012-11-15 |
Family ID: 47125437
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2011/080608 WO2012151843A1 (en) | 2011-05-11 | 2011-10-10 | Ulr filtering system, method and gateway |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN102780681A (en) |
WO (1) | WO2012151843A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080209057A1 (en) * | 2006-09-28 | 2008-08-28 | Paul Martini | System and Method for Improved Internet Content Filtering |
CN101795272A (en) * | 2010-01-22 | 2010-08-04 | 联想网御科技(北京)有限公司 | Illegal website filtering method and device |
CN102004770A (en) * | 2010-11-16 | 2011-04-06 | 杭州迪普科技有限公司 | Webpage auditing method and device |
CN102004789A (en) * | 2010-12-07 | 2011-04-06 | 苏州迈科网络安全技术股份有限公司 | Application method of uniform/universal resource locator (URL) filter system |
- 2011-05-11: CN2011101213726A (CN102780681A), active, pending
- 2011-10-10: PCT/CN2011/080608 (WO2012151843A1), application filing
Also Published As
Publication number | Publication date |
---|---|
CN102780681A (en) | 2012-11-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 11865201; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 11865201; Country of ref document: EP; Kind code of ref document: A1 |