CN104391979B - Malicious web crawler recognition method and device
- Publication number
- CN104391979B
- Authority
- CN
- China
- Prior art keywords
- network address
- network
- detected
- address
- time period
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Claims (8)
- 1. A method for identifying a malicious web crawler, characterized by comprising: obtaining a network address to be detected, wherein the network address to be detected is a network address that satisfies a first preset condition, and a network address is determined to satisfy the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold; obtaining user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; calculating a target access ratio from the number of network addresses to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; determining whether the target access ratio exceeds a preset ratio threshold; and, if the target access ratio exceeds the preset ratio threshold, determining that the access to the target website through the network address to be detected is malicious crawler access behavior, wherein calculating the target access ratio comprises: counting the number of times the target website is accessed through the network address to be detected within the preset time period; determining whether the user access information corresponding to the network address to be detected contains the target network terminal information; if it does, counting the number of network addresses to be detected whose corresponding user access information contains the target network terminal information; and calculating the target access ratio by the formula S = A/B, wherein S is the target access ratio, A is the number of network addresses to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
- 2. The method according to claim 1, characterized in that obtaining the user access information corresponding to the network address to be detected comprises: obtaining an access log of the target website; parsing the access log to obtain a parsing result; and obtaining, from the parsing result, the user access information corresponding to the network address to be detected.
- 3. The method according to claim 1, characterized in that the preset ratio threshold is determined as follows: determining a reference network address set, wherein the reference network address set includes a plurality of network addresses, each of which satisfies a second preset condition, and a network address is determined to satisfy the second preset condition if the number of times the target website is accessed through that network address within the preset time period does not exceed the preset count threshold; obtaining user access information corresponding to the reference network address set; and determining the preset ratio threshold from the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
- 4. The method according to claim 3, characterized in that the target website is accessed through a plurality of network addresses within the preset time period, and determining the reference network address set comprises: detecting, for each of the plurality of network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and determining that the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold are the network addresses in the reference network address set.
- 5. A device for identifying a malicious web crawler, characterized by comprising: a first acquisition unit, configured to obtain a network address to be detected, wherein the network address to be detected is a network address that satisfies a first preset condition, and a network address is determined to satisfy the first preset condition if the number of times the target website is accessed through that network address within a preset time period exceeds a preset count threshold; a second acquisition unit, configured to obtain user access information corresponding to the network address to be detected, wherein the user access information includes network terminal information of the terminals accessing the target website, and the network terminal information includes target network terminal information; a calculation unit, configured to calculate a target access ratio from the number of network addresses to be detected whose corresponding user access information contains the target network terminal information and the number of times the target website is accessed through the network address to be detected within the preset time period; a judging unit, configured to determine whether the target access ratio exceeds a preset ratio threshold; and a determining unit, configured to determine, when the target access ratio exceeds the preset ratio threshold, that the access to the target website through the network address to be detected is malicious crawler access behavior, wherein the calculation unit comprises: a first counting module, configured to count the number of times the target website is accessed through the network address to be detected within the preset time period; a judging module, configured to determine whether the user access information corresponding to the network address to be detected contains the target network terminal information; a second counting module, configured to count, when the user access information corresponding to the network address to be detected contains the target network terminal information, the number of network addresses to be detected whose corresponding user access information contains the target network terminal information; and a calculation module, configured to calculate the target access ratio by the formula S = A/B, wherein S is the target access ratio, A is the number of network addresses to be detected whose corresponding user access information contains the target network terminal information, and B is the number of times the target website is accessed through the network address to be detected within the preset time period.
- 6. The device according to claim 5, characterized in that the second acquisition unit comprises: a first acquisition module, configured to obtain an access log of the target website; a parsing module, configured to parse the access log to obtain a parsing result; and a second acquisition module, configured to obtain, from the parsing result, the user access information corresponding to the network address to be detected.
- 7. The device according to claim 5, characterized in that the preset ratio threshold is determined by the following modules: a first determining module, configured to determine a reference network address set, wherein the reference network address set includes a plurality of network addresses, each of which satisfies a second preset condition, and a network address is determined to satisfy the second preset condition if the number of times the target website is accessed through that network address within the preset time period does not exceed the preset count threshold; a third acquisition module, configured to obtain user access information corresponding to the reference network address set; and a second determining module, configured to determine the preset ratio threshold from the user access information corresponding to the reference network address set, wherein the preset ratio threshold is the ratio of the number of network addresses in the reference network address set whose corresponding user access information contains the target network terminal information to the number of times the target website is accessed through the network addresses in the reference network address set within the preset time period.
- 8. The device according to claim 7, characterized in that the target website is accessed through a plurality of network addresses within the preset time period, and the first determining module comprises: a detection submodule, configured to detect, for each of the plurality of network addresses, whether the number of times the target website is accessed through it within the preset time period exceeds the preset count threshold; and a determination submodule, configured to determine that the network addresses whose number of accesses to the target website within the preset time period does not exceed the preset count threshold are the network addresses in the reference network address set.
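
The flow recited in claims 1, 3 and 4 can be illustrated with a short Python sketch. This is only one possible reading, not the patentee's implementation. It assumes the access log has already been parsed (claim 2) into records carrying a network address and a terminal string such as a User-Agent. It also interprets A in S = A/B as the number of access records from an address whose terminal information equals the target, whereas the claim text literally counts network addresses. The names `Access`, `access_ratio`, `find_malicious` and all sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set


class Access(NamedTuple):
    """One parsed access-log record (hypothetical shape, not from the patent)."""
    ip: str        # network address
    terminal: str  # network terminal information, e.g. a User-Agent string


def access_ratio(records: Iterable[Access], addresses: Set[str], target_terminal: str) -> float:
    """S = A / B over the given addresses.

    Assumption: A counts the records from these addresses whose terminal
    information equals target_terminal; B counts all records from them.
    """
    selected = [r for r in records if r.ip in addresses]
    a = sum(1 for r in selected if r.terminal == target_terminal)
    b = len(selected)
    return a / b if b else 0.0


def find_malicious(records: Iterable[Access], count_threshold: int, target_terminal: str) -> Set[str]:
    """Return the addresses whose access ratio exceeds the reference ratio."""
    records = list(records)
    counts = Counter(r.ip for r in records)

    # First preset condition: more visits than the preset count threshold.
    suspects = {ip for ip, n in counts.items() if n > count_threshold}
    # Second preset condition: visits do not exceed the threshold (reference set).
    reference = set(counts) - suspects

    # Preset ratio threshold derived from the reference set (claim 3).
    # If the reference set is empty this falls back to 0.0.
    ratio_threshold = access_ratio(records, reference, target_terminal)

    return {
        ip for ip in suspects
        if access_ratio(records, {ip}, target_terminal) > ratio_threshold
    }


# Hypothetical sample data for one preset time period.
records = (
    [Access("10.0.0.1", "Firefox"), Access("10.0.0.1", "curl/7.0")]
    + [Access("10.0.0.2", "Firefox")] * 2
    + [Access("10.0.0.3", "curl/7.0")] * 5
    + [Access("10.0.0.4", "curl/7.0")]
    + [Access("10.0.0.4", "Firefox")] * 4
)
flagged = find_malicious(records, count_threshold=3, target_terminal="curl/7.0")
print(flagged)
```

In this sample, the two low-volume addresses form the reference set and give a ratio threshold of 1/4. Of the two high-volume addresses, only the one whose every request carries the target terminal string exceeds it. Exact string equality is used for matching here for simplicity; the claims only require that the access information "contains" the target terminal information.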
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410743056.6A CN104391979B (en) | 2014-12-05 | 2014-12-05 | Malicious web crawler recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410743056.6A CN104391979B (en) | 2014-12-05 | 2014-12-05 | Malicious web crawler recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104391979A (en) | 2015-03-04 |
CN104391979B (en) | 2017-12-19 |
Family
ID=52609883
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410743056.6A Active CN104391979B (en) | 2014-12-05 | 2014-12-05 | Malicious web crawler recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391979B (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202077B (en) * | 2015-04-30 | 2020-01-21 | 华为技术有限公司 | Task distribution method and device |
CN105187396A (en) * | 2015-08-11 | 2015-12-23 | 小米科技有限责任公司 | Method and device for identifying web crawler |
CN105426415A (en) * | 2015-10-30 | 2016-03-23 | Tcl集团股份有限公司 | Management method, device and system of website access request |
CN107341395B (en) * | 2016-05-03 | 2020-03-03 | 北京京东尚科信息技术有限公司 | Method for intercepting crawlers |
CN106021552A (en) * | 2016-05-30 | 2016-10-12 | 深圳市华傲数据技术有限公司 | Internet crawler concurrent data collection method and system based on crowd behavior simulation |
CN106886906B (en) * | 2016-08-15 | 2020-06-30 | 阿里巴巴集团控股有限公司 | Equipment identification method and device |
CN108429721B (en) * | 2017-02-15 | 2020-08-04 | 腾讯科技(深圳)有限公司 | Identification method and device for web crawler |
CN108664489B (en) * | 2017-03-29 | 2022-12-23 | 腾讯科技(深圳)有限公司 | Website content monitoring method and device |
CN107392022B (en) * | 2017-07-20 | 2020-12-29 | 北京星选科技有限公司 | Crawler identification and processing method and related device |
CN109510800B (en) * | 2017-09-14 | 2020-11-27 | 北京金山云网络技术有限公司 | Network request processing method and device, electronic equipment and storage medium |
CN107800684B (en) * | 2017-09-20 | 2018-09-18 | 贵州白山云科技有限公司 | Low-frequency crawler recognition method and device |
CN109559245B (en) * | 2017-09-26 | 2022-02-25 | 北京国双科技有限公司 | Method and device for identifying specific user |
CN107786542A (en) * | 2017-09-26 | 2018-03-09 | 杭州安恒信息技术有限公司 | Scoring method and device for malicious IPs based on big-data intelligent analysis |
CN107770171B (en) * | 2017-10-18 | 2020-01-24 | 厦门集微科技有限公司 | Verification method and system for anti-crawler of server |
CN107943949B (en) * | 2017-11-24 | 2020-06-26 | 厦门集微科技有限公司 | Method and server for determining web crawler |
CN108388794B (en) * | 2018-02-01 | 2020-09-08 | 金蝶软件(中国)有限公司 | Page data protection method and device, computer equipment and storage medium |
CN109145185B (en) * | 2018-02-02 | 2019-07-02 | 北京数安鑫云信息技术有限公司 | Method and device for identifying web crawlers and extracting web crawler features |
CN108521402B (en) * | 2018-03-07 | 2021-01-22 | 创新先进技术有限公司 | Method, device and equipment for outputting label |
CN109474640B (en) * | 2018-12-29 | 2021-01-05 | 奇安信科技集团股份有限公司 | Malicious crawler detection method and device, electronic equipment and storage medium |
CN109862018B (en) * | 2019-02-21 | 2021-07-09 | 中国工商银行股份有限公司 | Anti-crawler method and system based on user access behavior |
CN110245280B (en) * | 2019-05-06 | 2021-03-02 | 北京三快在线科技有限公司 | Method and device for identifying web crawler, storage medium and electronic equipment |
CN110401639B (en) * | 2019-06-28 | 2021-12-24 | 平安科技(深圳)有限公司 | Method and device for judging abnormality of network access, server and storage medium thereof |
CN110460593B (en) * | 2019-07-29 | 2021-12-14 | 腾讯科技(深圳)有限公司 | Network address identification method, device and medium for mobile traffic gateway |
CN111859069B (en) * | 2020-07-15 | 2021-10-15 | 北京市燃气集团有限责任公司 | Network malicious crawler identification method, system, terminal and storage medium |
KR102595303B1 (en) * | 2021-04-20 | 2023-10-27 | 주식회사 스크립터스 | Method for detecting web scraping, and server for executing the same |
CN113612768B (en) * | 2021-08-02 | 2023-10-17 | 北京知道创宇信息技术股份有限公司 | Network protection method and related device |
CN114978674B (en) * | 2022-05-18 | 2023-12-05 | 中国电信股份有限公司 | Crawler recognition enhancement method and device, storage medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101707598A (en) * | 2009-11-10 | 2010-05-12 | 成都市华为赛门铁克科技有限公司 | Method, device and system for identifying flood attack |
CN103905372A (en) * | 2012-12-24 | 2014-07-02 | 珠海市君天电子科技有限公司 | Method and device for removing false alarm of phishing website |
CN104113519A (en) * | 2013-04-16 | 2014-10-22 | 阿里巴巴集团控股有限公司 | Network attack detection method and device thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130318609A1 (en) * | 2012-05-25 | 2013-11-28 | Electronics And Telecommunications Research Institute | Method and apparatus for quantifying threat situations to recognize network threat in advance |
Also Published As
Publication number | Publication date |
---|---|
CN104391979A (en) | 2015-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104391979B (en) | Malicious web crawler recognition method and device | |
CN105357195B (en) | Method and device for detecting unauthorized-access vulnerabilities in web access | |
CN103179132B (en) | Method and device for detecting and defending against CC attacks | |
CN107465651B (en) | Network attack detection method and device | |
CN109951500A (en) | Network attack detection method and device | |
CN104601601B (en) | Web crawler detection method and device | |
CN107465648A (en) | Method and device for identifying abnormal devices | |
CN110609937A (en) | Crawler identification method and device | |
CN106888211A (en) | Network attack detection method and device | |
CN110830445B (en) | Method and device for identifying abnormal access object | |
CN106547793A (en) | Method and apparatus for obtaining proxy server addresses | |
CN104935609A (en) | Network attack detection method and detection apparatus | |
CN108763274A (en) | Access request identification method and device, electronic equipment, and storage medium | |
CN106921504A (en) | Method and apparatus for determining associated paths of different users | |
CN105516390B (en) | Domain name management method and device | |
CN109241733A (en) | Crawler behavior recognition method and device based on web access logs | |
CN106685899A (en) | Method and device for identifying malicious access | |
CN106301980A (en) | Method and apparatus for detecting traffic-inflation tools | |
CN103905372A (en) | Method and device for removing false alarm of phishing website | |
CN106802904A (en) | Log processing method, apparatus and system | |
CN108768921A (en) | Malicious web page discovery method and system based on feature detection | |
CN104391953B (en) | Method and device for detecting web page updates | |
CN107395650A (en) | Method and device for identifying Trojan callback connections based on sandbox file detection | |
CN108206769A (en) | Method, apparatus, equipment and the medium of screen quality alarm | |
CN107528812A (en) | Attack detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: Malicious web crawler recognition method and device. Effective date of registration: 20190531. Granted publication date: 20171219. Pledgee: Shenzhen Black Horse World Investment Consulting Co.,Ltd. Pledgor: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. Registration number: 2019990000503 | |
CP02 | Change in the address of a patent holder | Address after: No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing 100083. Patentee after: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. Address before: 8th Floor A, Cuigong Hotel, No. 76 Zhichun Road, Shuangyushu, Haidian District, Beijing 100086. Patentee before: BEIJING GRIDSUM TECHNOLOGY Co.,Ltd. | |
PP01 | Preservation of patent right | Effective date of registration: 20240604. Granted publication date: 20171219 | |