CN108924199A - Method, apparatus, computer storage medium and terminal device for a crawler program to automatically obtain network proxy servers - Google Patents

Info

Publication number
CN108924199A
Authority
CN
China
Prior art keywords
proxy server
score value
network
internet protocol
protocol address
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810645506.6A
Other languages
Chinese (zh)
Inventor
曾兴华
邵雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongshan Yingmeirui Information Technology Co Ltd
Original Assignee
Zhongshan Yingmeirui Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongshan Yingmeirui Information Technology Co Ltd
Priority to CN201810645506.6A
Publication of CN108924199A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/56: Provisioning of proxy services

Abstract

Disclosed are a method, an apparatus, a computer-readable storage medium and a terminal device for a crawler program to automatically obtain network proxy servers, belonging to the technical field of computer programs. The method includes the following steps: obtaining proxy Internet Protocol (IP) addresses published on the network; obtaining live proxy servers; removing duplicate entries between the published proxy IP addresses and the live proxy servers to obtain valid proxy servers; ranking the valid proxy servers by quality; and having the crawler program automatically obtain network proxy servers according to the quality ranking of the live proxy servers. The apparatus, computer-readable storage medium and terminal device are used to implement the method. The invention provides a simple and efficient technique for automatically obtaining a large number of high-quality proxy servers, so that a crawler system can simulate multiple, varied identities.

Description

Method, apparatus, computer storage medium and terminal device for a crawler program to automatically obtain network proxy servers
Technical field
The present invention relates to the technical field of computer programs, and in particular to a method, an apparatus, a computer-readable storage medium and a terminal device for a crawler program to automatically obtain network proxy servers.
Background
As the number of Internet users grows, Internet data is increasing exponentially, and finding useful data among the vast body of Internet resources is becoming increasingly important.
Existing crawler and collection schemes generally collect and store data by locating a target, obtaining an entry point, traversing URLs (Uniform Resource Locators), determining the data objects, and then requesting and storing them.
However, existing schemes have the following problem. Once the crawler has been configured and collection has started, the crawler sends its own identity information to the target server with every network request, so the crawled site can easily detect frequent, high-volume requests from the same node within a short time. To prevent exhaustion of system resources and abnormal use of data services, most systems enable anti-crawling policies that partially restrict or completely block frequent requests sharing the same identity characteristics. This reduces crawling efficiency and leaves the crawled data incomplete. Crawler systems commonly use proxies to simulate multiple identities, but the proxy server resources must be prepared in advance, which either incurs direct monetary cost or adds workload, reducing overall efficiency.
Accordingly, there is a need for a simple and efficient technique that automatically obtains a large number of high-quality proxy servers, so that a crawler system can simulate multiple, varied identities.
Summary of the invention
In view of this, the present invention provides a method, an apparatus, a computer-readable storage medium and a terminal device for a crawler program to automatically obtain network proxy servers, which are better suited to practical use.
To achieve the first object above, the technical solution of the method provided by the invention for a crawler program to automatically obtain network proxy servers is as follows:
The method provided by the invention for a crawler program to automatically obtain network proxy servers includes the following steps:
Obtain proxy IP addresses published on the network;
Obtain live proxy servers;
Remove duplicate entries between the proxy IP addresses published on the network and the live proxy servers, obtaining valid proxy servers;
Rank the valid proxy servers by quality;
According to the quality ranking of the live proxy servers, automatically obtain network proxy servers.
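The de-duplication step above can be illustrated with a minimal Python sketch. It reads the step as a union of the two lists with duplicates removed, which is an interpretation; the function name `merge_valid_proxies`, the `(ip, port)` tuple representation and the sample addresses are assumptions made for illustration and do not come from the patent.

```python
def merge_valid_proxies(published, alive):
    """Merge two proxy lists, dropping duplicates while keeping first-seen order."""
    seen = set()
    valid = []
    for ip, port in list(published) + list(alive):
        key = (ip.strip(), int(port))  # normalise so "8080" and 8080 compare equal
        if key not in seen:
            seen.add(key)
            valid.append(key)
    return valid


# Hypothetical sample data (documentation-reserved address ranges).
published = [("203.0.113.5", "8080"), ("198.51.100.7", 3128)]
alive = [("203.0.113.5", 8080), ("192.0.2.10", 80)]
valid = merge_valid_proxies(published, alive)
```

Normalising the port to an integer before comparison avoids treating the same proxy as two entries when one source stores ports as text.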
The method provided by the invention for a crawler program to automatically obtain network proxy servers may be further implemented with the following technical measures.
Preferably, the method further includes a step of maintaining the proxy IP addresses published on the network.
Preferably, the method further includes a step of pushing to the user a visual report on the working efficiency of the network proxy servers.
Preferably, obtaining the proxy IP addresses published on the network specifically includes the following steps:
Create a crawl task that uses "proxy IP" and "proxy server" as keywords and targets the Baidu, Google and Bing search engine sites;
According to the crawl task, automatically crawl the data content of the first n pages of the search engines' result list pages, where n is a user-defined positive integer;
Clean and store the fields in the first n pages of data content that contain a proxy IP address and port number, obtaining the proxy IP addresses published on the network.
Preferably, when the fields containing a proxy IP address and port number in the first n pages of data content are stored, they are stored as key-value pairs.
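The cleaning and key-value storage described above might look like the following sketch. The regular expression, the function name `extract_proxies` and the choice of the IP address as key and the port as value are assumptions; the patent only states that address and port fields are cleaned and stored as key-value pairs.

```python
import re

# Dotted-quad IPv4 address, a colon, then a port of 1 to 5 digits.
PROXY_RE = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")


def extract_proxies(page_text):
    """Return {ip: port} for every syntactically valid ip:port found in page_text."""
    found = {}
    for ip, port in PROXY_RE.findall(page_text):
        octets_ok = all(0 <= int(octet) <= 255 for octet in ip.split("."))
        if octets_ok and 0 < int(port) <= 65535:
            found[ip] = int(port)  # key-value pair: address -> port
    return found


# Hypothetical page text; the second entry has an invalid octet and is dropped.
sample = "Free proxies: 203.0.113.5:8080, 999.1.1.1:80 and 198.51.100.7:3128."
proxies = extract_proxies(sample)
```

With this layout an address that appears with several ports keeps only the last one seen; storing a list of ports per address would be an equally reasonable choice.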
Preferably, the process of automatically crawling the data content of the first n pages of the search engines' result list pages according to the crawl task further includes the following step:
Automatically discard entries that the search engine marks as "advertisement" or "promotion".
Preferably, the method for the proxy server for obtaining survival specifically includes following steps:
Agent Internet protocol address disclosed in the network is grouped at random, obtains multiple groups agent Internet protocol Address;
For multiple groups agent Internet protocol address, common agent side slogan is set;
According to multiple groups agent Internet protocol address and common agent side slogan, obtains the network host of survival and open The common agent side slogan put;
The network host of the survival and the common agent side slogan of opening are merged into storage, obtain the agency of the survival Server.
Preferably, the process that the network host of the survival and the common agent side slogan of opening are merged to storage In, it is stored in the form of key-value pair after the network host of the survival and the common agent side slogan merging of opening.
Preferably, the common agent side slogan for obtaining the network host and opening survived is realized using NMap order 's.
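As a rough sketch of the grouping and Nmap steps above, the following Python code splits addresses into random groups, builds (but does not run) an Nmap command line, and parses Nmap's grepable output into host-to-ports pairs. The port list, group size and function names are assumptions chosen for illustration; the patent does not say which ports count as "common" or which Nmap options are used.

```python
import random

COMMON_PROXY_PORTS = [80, 3128, 8080, 8888]  # assumed "common" ports, not from the patent


def random_groups(ips, group_size, seed=None):
    """Shuffle the addresses and split them into groups of at most group_size."""
    pool = list(ips)
    random.Random(seed).shuffle(pool)
    return [pool[i:i + group_size] for i in range(0, len(pool), group_size)]


def nmap_command(group, ports=COMMON_PROXY_PORTS):
    """Build an nmap call: -p port list, --open ports only, -oG - grepable output on stdout."""
    return ["nmap", "-p", ",".join(map(str, ports)), "--open", "-oG", "-"] + list(group)


def parse_grepable(output):
    """Return {host: [open ports]} from nmap grepable output."""
    alive = {}
    for line in output.splitlines():
        if not line.startswith("Host:") or "Ports:" not in line:
            continue
        host = line.split()[1]
        ports_field = line.split("Ports:", 1)[1].split("\t")[0]
        open_ports = [int(entry.split("/")[0]) for entry in ports_field.split(",")
                      if entry.strip().split("/")[1:2] == ["open"]]
        if open_ports:
            alive[host] = open_ports  # key-value pair: live host -> open proxy ports
    return alive
```

The command list can be passed to `subprocess.run(..., capture_output=True, text=True)` and its standard output fed to `parse_grepable`. Port scanning should only be run against hosts one is permitted to probe.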
Preferably, the reference parameters used when ranking the valid proxy servers by quality include proxy IP address classification, response speed, data forwarding rate and packet loss rate.
Preferably, proxy IP address classification is performed according to the label of the valid proxy server.
Preferably, the label of the proxy server is any one of SSL proxy, HTTP proxy, socket proxy, transparent proxy and anonymous proxy.
Preferably, ranking the valid proxy servers by quality according to response speed specifically includes the following step:
Using ping (the Packet Internet Groper), sort the valid proxy servers in ascending order of average round-trip time and record a score; the higher the score, the better the quality of the valid proxy server.
Preferably, with the average round-trip time of a valid proxy server denoted t_avg, the scoring standard used when sorting in ascending order of average round-trip time and recording the score is as follows:
t_avg ≤ 20 ms: score = 10; 20 ms < t_avg ≤ 100 ms: score = 8; 100 ms < t_avg ≤ 200 ms: score = 7; 200 ms < t_avg ≤ 300 ms: score = 6; 300 ms < t_avg ≤ 500 ms: score = 0. Results with a score of 0 are discarded.
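The response-speed scoring standard above maps directly onto a small lookup function. In this sketch the function names are illustrative, and treating round-trip times above 500 ms the same as the 300 to 500 ms band (score 0, discarded) is an assumption, since the text does not cover that range.

```python
def response_time_score(t_avg_ms):
    """Map an average round-trip time in milliseconds to the stated score."""
    if t_avg_ms <= 20:
        return 10
    if t_avg_ms <= 100:
        return 8
    if t_avg_ms <= 200:
        return 7
    if t_avg_ms <= 300:
        return 6
    return 0  # 300-500 ms per the text; above 500 ms treated the same (assumption)


def rank_by_response_time(avg_times):
    """Score each proxy, drop zero scores, and sort by ascending round-trip time."""
    scored = [(proxy, t, response_time_score(t)) for proxy, t in avg_times.items()]
    kept = [row for row in scored if row[2] > 0]
    return sorted(kept, key=lambda row: row[1])


# Hypothetical measurements in milliseconds.
ranking = rank_by_response_time({"a": 15, "b": 250, "c": 400, "d": 90})
```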
Preferably, ranking the valid proxy servers by quality according to data forwarding rate specifically includes the following step:
By invoking a route-tracing command, send Internet Control Message Protocol (ICMP) echo messages with different time-to-live (TTL) values to the target to determine the route to the destination; discard timed-out results; sort the remaining results in ascending order of hop count and total time between hops, and record a score; the higher the score, the better the quality of the valid proxy server.
Preferably, when the route is traced, timed-out results are discarded, and the results are sorted in ascending order of hop count and total time between hops as described above, the scoring standard is as follows:
When the hop count is ≤ 10:
Total time ≤ 20 ms: score = 10; total time ≤ 50 ms: score = 9; total time ≤ 100 ms: score = 8; total time ≤ 200 ms: score = 7; total time ≤ 300 ms: score = 6; total time ≤ 500 ms: score = 5; total time > 500 ms: score = 0. Results with a score of 0 are discarded;
When the hop count is ≤ 20:
Total time ≤ 20 ms: score = 9; total time ≤ 50 ms: score = 8; total time ≤ 100 ms: score = 7; total time ≤ 200 ms: score = 6; total time ≤ 300 ms: score = 5; total time ≤ 500 ms: score = 4; total time > 500 ms: score = 0. Results with a score of 0 are discarded;
When the hop count is ≤ 30:
Total time ≤ 20 ms: score = 8; total time ≤ 50 ms: score = 7; total time ≤ 100 ms: score = 6; total time ≤ 200 ms: score = 5; total time ≤ 300 ms: score = 4; total time ≤ 500 ms: score = 3; total time > 500 ms: score = 0. Results with a score of 0 are discarded.
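The three scoring bands above follow a regular pattern: each step up in hop-count band and each step up in time band lowers the score by one. A compact sketch that exploits this is shown below. The function name is illustrative, and returning 0 for hop counts above 30 is an assumption, because the text does not define that case.

```python
TIME_LIMITS_MS = [20, 50, 100, 200, 300, 500]  # upper bounds of the total-time bands
HOP_LIMITS = [10, 20, 30]                      # upper bounds of the hop-count bands


def route_score(hops, total_ms):
    """Score a traced route; 0 means the result is discarded."""
    if total_ms > 500 or hops > 30:  # hops > 30 is not covered by the text (assumption)
        return 0
    hop_band = next(i for i, limit in enumerate(HOP_LIMITS) if hops <= limit)
    time_band = next(i for i, limit in enumerate(TIME_LIMITS_MS) if total_ms <= limit)
    return 10 - hop_band - time_band
```

For example, a route with 15 hops and a total time of 60 ms falls in the second hop band and the third time band, giving 10 - 1 - 2 = 7, which matches the listed value.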
Preferably, when the route-tracing command is invoked to send ICMP echo messages with different TTL values to the target in order to determine the route to the destination, the command is either traceroute or tracert.
Preferably, ranking the valid proxy servers by quality according to packet loss rate specifically includes the following step:
Using the ping command, send an ICMP echo request packet to the target host and wait to receive the echo reply packet; send 100 requests, estimate the packet loss rate from the timing and the number of successful replies, sort in ascending order of packet loss rate, and record a score; the higher the score, the better the quality of the valid proxy server.
Preferably, when the packet loss rate is estimated from 100 ping requests and the results are sorted in ascending order of packet loss rate as described above, the scoring standard is as follows:
Packet loss ≤ 5%: score = 10; packet loss ≤ 15%: score = 8; packet loss ≤ 20%: score = 7; packet loss ≤ 25%: score = 6; packet loss ≤ 30%: score = 5. Proxy entries with a packet loss rate above 30% are discarded.
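The packet-loss standard above can be sketched as two small functions, one computing the loss rate from request and reply counts and one applying the thresholds. The function names and the use of `None` to signal a discarded proxy are illustrative choices, not part of the patent.

```python
LOSS_BANDS = [(5, 10), (15, 8), (20, 7), (25, 6), (30, 5)]  # (max loss %, score)


def packet_loss_rate(sent, received):
    """Packet loss as a percentage of requests that received no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def packet_loss_score(loss_percent):
    """Return the score for a loss percentage, or None if the proxy is discarded."""
    for limit, score in LOSS_BANDS:
        if loss_percent <= limit:
            return score
    return None  # above 30 %: discard


# 100 requests with 97 replies gives 3 % loss, which scores 10.
loss = packet_loss_rate(100, 97)
score = packet_loss_score(loss)
```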
Preferably, having the crawler program automatically obtain network proxy servers according to the quality ranking of the live proxy servers specifically includes the following steps:
When configuring the crawler for a newly added task, start multiple processes;
Enable a dedicated proxy and user agent per process, assigning each process one proxy and one set of simulated user-agent information;
Monitor whether the proxy used by each process is valid. If it is valid, the process continues to use that proxy server; if it is invalid, another valid network proxy server is reassigned, either at random or in order, from the list of network proxy server IP addresses.
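The allocation and reassignment logic above can be sketched with a small bookkeeping class. Launching the worker processes themselves (for example with Python's `multiprocessing` module) and deciding when a proxy counts as invalid are left out; the class name `ProxyAssigner`, its methods and the sample values are all hypothetical.

```python
import itertools
import random


class ProxyAssigner:
    """Gives each worker process its own proxy and user-agent string."""

    def __init__(self, ranked_proxies, user_agents, seed=None):
        self.proxies = list(ranked_proxies)      # best quality first
        self.assigned = {}                       # worker id -> (proxy, user agent)
        self._agents = itertools.cycle(user_agents)
        self._rng = random.Random(seed)

    def _free(self):
        in_use = {proxy for proxy, _ in self.assigned.values()}
        return [p for p in self.proxies if p not in in_use]

    def assign(self, worker_id):
        """Assign the best unassigned proxy and the next user agent to a worker."""
        free = self._free()
        if not free:
            raise RuntimeError("no unassigned proxy left")
        self.assigned[worker_id] = (free[0], next(self._agents))
        return self.assigned[worker_id]

    def replace_invalid(self, worker_id, in_order=True):
        """Drop the worker's failed proxy and hand out another, in order or at random."""
        bad_proxy, agent = self.assigned.pop(worker_id)
        self.proxies.remove(bad_proxy)           # never hand it out again
        free = self._free()
        if not free:
            raise RuntimeError("no unassigned proxy left")
        new_proxy = free[0] if in_order else self._rng.choice(free)
        self.assigned[worker_id] = (new_proxy, agent)
        return new_proxy
```

A supervising process could call `assign` once per worker at start-up and `replace_invalid` whenever a worker reports repeated request failures through its proxy.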
Preferably, the method further includes a step of maintaining the IP addresses according to a timer, which specifically includes the following steps:
Set a timer and start a polling service on schedule;
Exclude the IP addresses of valid proxy servers that the polling service shows as already used, obtaining the IP address list of the remaining valid proxy servers;
Rank the IP address list of the remaining valid proxy servers by quality, obtaining a refreshed list of valid proxy servers.
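The timed maintenance steps above could be sketched as follows. The helper names `maintain` and `start_polling`, the callback-based design and the use of a background thread are assumptions for illustration; the patent only specifies a timer, a polling service, exclusion of used addresses and re-ranking.

```python
import threading


def maintain(valid, used, score_of):
    """Drop already-used proxies, then re-rank the rest by descending score."""
    remaining = [proxy for proxy in valid if proxy not in used]
    return sorted(remaining, key=score_of, reverse=True)


def start_polling(interval_s, get_state, publish):
    """Run maintain() every interval_s seconds until the returned event is set."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout, so the body runs once per interval.
        while not stop.wait(interval_s):
            valid, used, score_of = get_state()
            publish(maintain(valid, used, score_of))

    threading.Thread(target=loop, daemon=True).start()
    return stop


# One maintenance pass on hypothetical data: "b" was used, the rest are re-ranked.
scores = {"a": 6, "c": 10, "d": 8}
refreshed = maintain(["a", "b", "c", "d"], {"b"}, scores.get)
```

Calling `set()` on the event returned by `start_polling` stops the polling loop.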
Preferably, the content of the visual report includes, for the IP address of each proxy server, the number of crawls it participated in, the amount of data crawled and the crawl time.
To achieve the second object above, the technical solution of the apparatus provided by the invention for a crawler program to automatically obtain network proxy servers is as follows:
The apparatus provided by the invention for a crawler program to automatically obtain network proxy servers includes:
A published IP address acquisition unit, configured to obtain proxy IP addresses published on the network;
A live proxy server acquisition unit, configured to obtain live proxy servers;
A valid proxy server acquisition module, configured to remove duplicate entries between the proxy IP addresses published on the network and the live proxy servers, obtaining valid proxy servers;
A valid proxy server quality ranking unit, configured to rank the valid proxy servers by quality;
A network proxy server acquisition unit, configured to automatically obtain network proxy servers according to the quality ranking of the live proxy servers.
The apparatus provided by the invention for a crawler program to automatically obtain network proxy servers may be further implemented with the following technical measures.
Preferably, the apparatus further includes:
A visual report pushing module, configured to push to the user a visual report on the working efficiency of the network proxy servers.
Preferably, the published IP address acquisition unit includes:
A crawl task creation module, configured to create a crawl task that uses "proxy IP" and "proxy server" as keywords and targets the Baidu, Google and Bing search engine sites;
An automatic crawling module, configured to automatically crawl, according to the crawl task, the data content of the first n pages of the search engines' result list pages, where n is a user-defined positive integer;
A published IP address acquisition module, configured to clean and store the fields in the first n pages of data content that contain a proxy IP address and port number, obtaining the proxy IP addresses published on the network.
Preferably, the apparatus further includes:
An entry discarding module, configured to automatically discard entries that the search engine marks as "advertisement" or "promotion" while the data content of the first n pages of the search engines' result list pages is crawled according to the crawl task.
Preferably, the live proxy server acquisition unit includes:
A grouping module, configured to randomly group the proxy IP addresses published on the network, obtaining multiple groups of proxy IP addresses;
A common proxy port number setting module, configured to set common proxy port numbers for the groups of proxy IP addresses;
A live host and open port acquisition module, configured to obtain, according to the groups of proxy IP addresses and the common proxy port numbers, the live network hosts and their open common proxy port numbers;
A live proxy server acquisition module, configured to merge and store the live network hosts and their open common proxy port numbers, obtaining the live proxy servers.
Preferably, the valid proxy server quality ranking unit includes:
A response speed ranking module, configured to use ping to sort the valid proxy servers in ascending order of average round-trip time and record a score; the higher the score, the better the quality of the valid proxy server.
Preferably, the valid proxy server quality ranking unit includes:
A data forwarding rate ranking module, configured to invoke a route-tracing command that sends ICMP echo messages with different TTL values to the target to determine the route to the destination, discard timed-out results, sort the remaining results in ascending order of hop count and total time between hops, and record a score; the higher the score, the better the quality of the valid proxy server.
Preferably, the valid proxy server quality ranking unit includes:
A packet loss rate ranking module, configured to use the ping command to send an ICMP echo request packet to the target host and wait to receive the echo reply packet, send 100 requests, estimate the packet loss rate from the timing and the number of successful replies, sort in ascending order of packet loss rate, and record a score; the higher the score, the better the quality of the valid proxy server.
Preferably, the network proxy server acquisition unit includes:
A process starting module, configured to start multiple processes when the crawler is configured for a newly added task;
A process allocation module, configured to enable a dedicated proxy and user agent per process and to assign each process one proxy and one set of simulated user-agent information;
A monitoring module, configured to monitor whether the proxy used by each process is valid; if it is valid, the process continues to use that proxy server; if it is invalid, another valid network proxy server is reassigned, either at random or in order, from the list of network proxy server IP addresses.
Preferably, the apparatus further includes:
An IP address maintenance unit, configured to maintain the proxy IP addresses published on the network.
Preferably, the IP address maintenance unit includes:
A polling service starting module, configured to set a timer and start a polling service on schedule;
An IP address list acquisition module, configured to exclude the IP addresses of valid proxy servers that the polling service shows as already used, obtaining the IP address list of the remaining valid proxy servers;
A valid proxy server list acquisition module, configured to rank the IP address list of the remaining valid proxy servers by quality, obtaining a refreshed list of valid proxy servers.
To achieve the third object above, the technical solution of the computer-readable storage medium provided by the invention is as follows:
The computer-readable storage medium provided by the invention stores a program for a crawler program to automatically obtain network proxy servers; when executed, the program implements the steps of the method provided by the invention for a crawler program to automatically obtain network proxy servers.
To achieve the fourth object above, the technical solution of the terminal device provided by the invention is as follows:
The terminal device provided by the invention includes a processor and a memory; the memory stores a program for a crawler program to automatically obtain network proxy servers; when executed, the program implements the steps of the method provided by the invention for a crawler program to automatically obtain network proxy servers.
The method, apparatus, computer-readable storage medium and terminal device provided by the embodiments of the invention can automatically obtain usable proxy IPs from the network, independently probe each proxy IP's liveness, response speed and packet routing and forwarding efficiency, rank the proxy IPs by quality, and independently rotate through the proxy IPs while crawling. Network requests therefore appear to come from different IPs, countering the anti-crawling policies of the crawl target.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 is a flowchart of the steps of the method, provided by Embodiment One of the invention, for a crawler program to automatically obtain network proxy servers;
Fig. 2 is a schematic diagram of the signal flow relationships of the apparatus, provided by Embodiment Two of the invention, for a crawler program to automatically obtain network proxy servers.
Detailed description of the embodiments
To solve the problems of the prior art, the present invention provides a method, an apparatus, a computer-readable storage medium and a terminal device for a crawler program to automatically obtain network proxy servers, which are better suited to practical use.
To further explain the technical means adopted by the invention to achieve its intended purpose, and their effects, the specific embodiments, structures, features and effects of the method, apparatus, computer-readable storage medium and terminal device proposed by the invention are described in detail below with reference to the drawings and preferred embodiments. In the following description, different references to "an embodiment" or "one embodiment" do not necessarily refer to the same embodiment. In addition, the features, structures or characteristics of one or more embodiments may be combined in any suitable form.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" means that A and B may both be present, A may be present alone, or B may be present alone; any of these three cases may apply.
Embodiment One
Referring to Fig. 1, the method provided by Embodiment One of the invention for a crawler program to automatically obtain network proxy servers includes the following steps:
Obtain proxy IP addresses published on the network;
Obtain live proxy servers;
Remove duplicate entries between the proxy IP addresses published on the network and the live proxy servers, obtaining valid proxy servers;
Rank the valid proxy servers by quality;
According to the quality ranking of the live proxy servers, automatically obtain network proxy servers.
The method, apparatus, computer-readable storage medium and terminal device provided by the embodiments of the invention can automatically obtain usable proxy IPs from the network, independently probe each proxy IP's liveness, response speed and packet routing and forwarding efficiency, rank the proxy IPs by quality, and independently rotate through the proxy IPs while crawling. Network requests therefore appear to come from different IPs, countering the anti-crawling policies of the crawl target. By presenting itself as multiple request points, the crawler (1) avoids crawl failures or incomplete crawls caused by the target blocking it, and (2) combines multiple identities with multiple coroutines to multiply the efficiency of data crawling.
The method further includes a step of maintaining the proxy IP addresses published on the network.
The method further includes a step of pushing to the user a visual report on the working efficiency of the network proxy servers.
Obtaining the proxy IP addresses published on the network specifically includes the following steps:
Create a crawl task that uses "proxy IP" and "proxy server" as keywords and targets the Baidu, Google and Bing search engine sites;
According to the crawl task, automatically crawl the data content of the first n pages of the search engines' result list pages, where n is a user-defined positive integer;
Clean and store the fields in the first n pages of data content that contain a proxy IP address and port number, obtaining the proxy IP addresses published on the network.
When the fields containing a proxy IP address and port number in the first n pages of data content are stored, they are stored as key-value pairs.
The process of automatically crawling the data content of the first n pages of the search engines' result list pages according to the crawl task further includes the following step:
Automatically discard entries that the search engine marks as "advertisement" or "promotion".
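The advertisement-filtering step above could be sketched as a simple filter over parsed result entries. The dictionary layout with `url` and `label` keys, the function name `drop_sponsored` and the exact label strings are assumptions; real result pages mark sponsored entries in engine-specific ways that the patent does not detail.

```python
# Assumed ad markers: "advertisement" and "promotion", in Chinese and English.
AD_LABELS = {"广告", "推广", "advertisement", "promotion"}


def drop_sponsored(entries):
    """Keep only result entries whose label is not an advertisement marker."""
    return [entry for entry in entries
            if entry.get("label", "").strip().lower() not in AD_LABELS]


# Hypothetical parsed entries from one result page.
entries = [
    {"url": "https://example.com/free-proxy-list", "label": ""},
    {"url": "https://example.com/paid-proxy", "label": "广告"},
    {"url": "https://example.org/proxies", "label": "Promotion"},
    {"url": "https://example.net/proxy-servers"},
]
organic = drop_sponsored(entries)
```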
Obtaining the live proxy servers specifically includes the following steps:
Randomly group the proxy IP addresses published on the network, obtaining multiple groups of proxy IP addresses;
Set common proxy port numbers for the groups of proxy IP addresses;
According to the groups of proxy IP addresses and the common proxy port numbers, obtain the live network hosts and their open common proxy port numbers;
Merge and store the live network hosts and their open common proxy port numbers, obtaining the live proxy servers.
When the live network hosts and their open common proxy port numbers are merged and stored, the merged result is stored as key-value pairs.
The live network hosts and their open common proxy port numbers are obtained using the Nmap command.
The reference parameters used when ranking the valid proxy servers by quality include proxy IP address classification, response speed, data forwarding rate and packet loss rate.
Proxy IP address classification is performed according to the label of the valid proxy server.
The label of the proxy server is any one of SSL proxy, HTTP proxy, socket proxy, transparent proxy and anonymous proxy.
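Classification by label, as described above, amounts to grouping proxies under one of the five listed labels. The short label strings, the `(address, label)` input format and the function name below are assumptions for illustration; the patent does not say how the label of a proxy is determined or stored.

```python
from collections import defaultdict

# Short names assumed for the five labels listed in the text.
LABELS = ("ssl", "http", "socket", "transparent", "anonymous")


def classify_by_label(proxies):
    """Group (address, label) pairs into {label: [addresses]}."""
    groups = defaultdict(list)
    for address, label in proxies:
        if label not in LABELS:
            raise ValueError(f"unknown proxy label: {label!r}")
        groups[label].append(address)
    return dict(groups)


# Hypothetical labelled proxies.
classified = classify_by_label([
    ("203.0.113.5:8080", "http"),
    ("198.51.100.7:3128", "anonymous"),
    ("192.0.2.10:443", "ssl"),
    ("192.0.2.11:8080", "http"),
])
```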
Wherein, the method that speed carries out quality-ordered to effective proxy server according to response specifically includes following step Suddenly:
By the Internet packets survey meter, according to the ascending sequence of average turnaround time of effective proxy server, and Score value is recorded, the quality of the higher effective proxy server of score value is better.
Wherein, the average turnaround time for defining effective proxy server is tIt is average, according to the flat of effective proxy server Equal turnaround time ascending sequence, and when recording score value, standards of grading are as follows:
tIt is averageWhen≤20ms, score value=10;20ms < tIt is averageWhen≤100ms, score value=8;100ms < tIt is averageWhen≤200ms, Score value=7;200ms < tIt is averageWhen≤300ms, score value=6;300ms < tIt is averageWhen≤500ms, score value=0, abandoning score value is 0 As a result.
Wherein, following step is specifically included according to the method that data forwarding rate carries out quality-ordered to effective proxy server Suddenly:
By calling routing trace command to send to target there is the Internet Control Information Protocol of different life spans to return Answer message, with determine to destination routing, abandon time-out as a result, by the ascending row of total time between metric and hop Sequence, and score value is recorded, the quality of the higher effective proxy server of score value is better.
Wherein, by calling routing trace command to send to target there is the internet-based control information of different life spans to assist Discuss back message, with determine to destination routing, abandon time-out as a result, ascending by total time between metric and hop When sorting, and recording score value, standards of grading are as follows:
When the hop count ≤ 10:
Total time ≤ 20ms, score value = 10; total time ≤ 50ms, score value = 9; total time ≤ 100ms, score value = 8; total time ≤ 200ms, score value = 7; total time ≤ 300ms, score value = 6; total time ≤ 500ms, score value = 5; total time > 500ms, score value = 0, and results with a score value of 0 are discarded;
When the hop count ≤ 20:
Total time ≤ 20ms, score value = 9; total time ≤ 50ms, score value = 8; total time ≤ 100ms, score value = 7; total time ≤ 200ms, score value = 6; total time ≤ 300ms, score value = 5; total time ≤ 500ms, score value = 4; total time > 500ms, score value = 0, and results with a score value of 0 are discarded;
When the hop count ≤ 30:
Total time ≤ 20ms, score value = 8; total time ≤ 50ms, score value = 7; total time ≤ 100ms, score value = 6; total time ≤ 200ms, score value = 5; total time ≤ 300ms, score value = 4; total time ≤ 500ms, score value = 3; total time > 500ms, score value = 0, and results with a score value of 0 are discarded.
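The three hop-count tiers share the same time thresholds and differ only in their top score, so the standard can be sketched compactly as below (Python, illustrative only). The function name is hypothetical, and scoring routes longer than 30 hops as 0 is an assumption, because the text does not cover that case.

```python
from typing import List, Tuple

# Total-time thresholds in ms, shared by all hop-count tiers.
TIME_BOUNDS_MS: List[float] = [20, 50, 100, 200, 300, 500]

# (maximum hop count, score value awarded for total time <= 20 ms).
HOP_TIERS: List[Tuple[int, int]] = [(10, 10), (20, 9), (30, 8)]


def forwarding_score(hops: int, total_ms: float) -> int:
    """Score a traced route from its hop count and total time between hops."""
    top_score = None
    for max_hops, tier_top in HOP_TIERS:
        if hops <= max_hops:
            top_score = tier_top
            break
    if top_score is None:
        # Assumption: more than 30 hops is not covered by the text; score 0.
        return 0
    for step, bound_ms in enumerate(TIME_BOUNDS_MS):
        if total_ms <= bound_ms:
            # Each slower time threshold costs one point within the tier.
            return top_score - step
    return 0  # total time > 500 ms: score 0, result is discarded
```

For instance, a route with 15 hops and 250ms total time falls in the second tier at the 300ms threshold and receives a score value of 5.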
Wherein, when ICMP echo messages with different TTL values are sent to the target by invoking a route tracing command to determine the route to the destination, the command is selected from the traceroute command or the tracert command.
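A minimal sketch of choosing between the two commands, assuming the choice is made by operating system (`tracert` on Windows, `traceroute` elsewhere). The helper name is hypothetical; the `-d` and `-n` flags, which suppress name resolution, are an added convenience not mentioned in the text.

```python
import platform
from typing import List


def route_trace_command(target: str, system: str = "") -> List[str]:
    """Build the argument list for the platform's route tracing command."""
    system = system or platform.system()
    if system == "Windows":
        # tracert -d: do not resolve addresses to host names.
        return ["tracert", "-d", target]
    # traceroute -n: print numeric addresses only.
    return ["traceroute", "-n", target]
```

The returned list can be handed to `subprocess.run`; parsing the hop lines is omitted because the two commands print different formats.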
Wherein, the method of ranking the effective proxy servers by quality according to packet loss rate specifically includes the following step:
Using the ping command, send an ICMP echo request packet to the destination host and wait to receive the echo reply packet; send 100 requests, estimate the packet loss rate from the time and the number of successful responses, sort by packet loss rate in ascending order, and record a score value; the higher the score value, the better the quality of the effective proxy server.
Wherein, when the packet loss rate is estimated with the ping command as described and the results are sorted by packet loss rate in ascending order and the score value is recorded, the scoring standard is as follows:
Packet loss rate ≤ 5%, score value = 10; packet loss rate ≤ 15%, score value = 8; packet loss rate ≤ 20%, score value = 7; packet loss rate ≤ 25%, score value = 6; packet loss rate ≤ 30%, score value = 5; proxies with a packet loss rate above 30% are discarded.
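As a worked illustration of the packet-loss standard: with 100 requests sent and 97 replies received, the loss rate is 3% and the score value is 10. The Python sketch below encodes this; the function names are hypothetical, and returning `None` for a discarded proxy is a design choice of the sketch, not something the text prescribes.

```python
from typing import List, Optional, Tuple

# (upper bound in percent, score value) tiers from the scoring standard.
LOSS_TIERS: List[Tuple[float, int]] = [(5, 10), (15, 8), (20, 7), (25, 6), (30, 5)]


def loss_rate_percent(sent: int, received: int) -> float:
    """Estimate the packet loss rate from request and reply counts."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return 100.0 * (sent - received) / sent


def loss_score(loss_percent: float) -> Optional[int]:
    """Return the score value, or None when the proxy should be discarded."""
    for upper, score in LOSS_TIERS:
        if loss_percent <= upper:
            return score
    return None  # packet loss above 30%: discard the proxy
```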
Wherein, the method by which the crawler program automatically obtains a network proxy server according to the quality ranking of the surviving proxy servers specifically includes the following steps:
When a crawler is configured for a newly added task, start multiple processes;
While enabling a process-exclusive proxy and user agent, assign one proxy and one set of simulated user-agent information to each process;
Monitor whether the proxy used by each process is valid; if it is valid, continue to use that proxy server; if it is invalid, reassign a valid network proxy server, chosen randomly or sequentially from the network proxy server Internet protocol address list.
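The assignment and monitoring steps above can be sketched as follows. This is an illustration under several assumptions: processes are represented only by integer worker ids, proxies are `ip:port` strings, and the validity check is a caller-supplied function (`is_valid`), since the text does not say how validity is tested. All names are hypothetical.

```python
import random
from itertools import cycle
from typing import Callable, Dict, List, Optional


def assign_proxies(worker_ids: List[int], proxies: List[str],
                   user_agents: List[str]) -> Dict[int, Dict[str, str]]:
    """Give each worker its own proxy and one user-agent string."""
    if len(proxies) < len(worker_ids) or not user_agents:
        raise ValueError("need one proxy per worker and at least one user agent")
    ua_cycle = cycle(user_agents)  # user agents may repeat; proxies may not
    return {w: {"proxy": p, "user_agent": next(ua_cycle)}
            for w, p in zip(worker_ids, proxies)}


def reassign_invalid(assignment: Dict[int, Dict[str, str]], pool: List[str],
                     is_valid: Callable[[str], bool],
                     rng: Optional[random.Random] = None) -> Dict[int, Dict[str, str]]:
    """Keep valid proxies; replace invalid ones from the unused part of the pool."""
    in_use = {slot["proxy"] for slot in assignment.values()}
    for slot in assignment.values():
        if is_valid(slot["proxy"]):
            continue  # still valid: keep using this proxy server
        candidates = [p for p in pool if p not in in_use and is_valid(p)]
        if not candidates:
            raise RuntimeError("no valid proxy left in the pool")
        # Random choice when an rng is given, otherwise sequential (first free).
        new_proxy = rng.choice(candidates) if rng else candidates[0]
        in_use.add(new_proxy)
        slot["proxy"] = new_proxy
    return assignment
```

Passing a `random.Random` instance selects the random reallocation mode; omitting it selects the sequential mode.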
Wherein, the method by which the crawler program automatically obtains a network proxy server further includes the step of maintaining the Internet protocol addresses according to the range of a timer, which specifically includes the following steps:
Set a timer and start a polling service on schedule;
Exclude the Internet protocol addresses of the effective proxy servers shown as already used during the polling service, obtaining the Internet protocol address list of the remaining effective proxy servers;
Rank the Internet protocol address list of the remaining effective proxy servers by quality, obtaining a new effective proxy server list.
Wherein, the content of the visualized list includes, for the Internet protocol address of each proxy server, the number of crawls it participated in, the amount of data crawled and the crawl time.
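One polling round of the maintenance step can be sketched as a pure function: drop the addresses already used, then re-rank what remains. The name `maintenance_round` and the caller-supplied `score_of` function are assumptions; scheduling the round on a timer (for example with `threading.Timer`) is left out of the sketch.

```python
from typing import Callable, Iterable, List, Set


def maintenance_round(effective: Iterable[str], used: Set[str],
                      score_of: Callable[[str], int]) -> List[str]:
    """Exclude used addresses and return the rest, best score first."""
    remaining = [ip for ip in effective if ip not in used]
    scored = [(ip, score_of(ip)) for ip in remaining]
    # Higher score value means better quality; score 0 results are discarded.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [ip for ip, score in scored if score > 0]
```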
Embodiment two
The device provided by Embodiment two of the present invention, by which a crawler program automatically obtains a network proxy server, includes:
A disclosed Internet protocol address acquiring unit, for obtaining proxy Internet protocol addresses disclosed on the network;
A surviving proxy server acquiring unit, for obtaining surviving proxy servers;
An effective proxy server obtaining module, for excluding duplicate data between the proxy Internet protocol addresses disclosed on the network and the surviving proxy servers, obtaining effective proxy servers;
An effective proxy server quality ranking unit, for ranking the effective proxy servers by quality;
A network proxy server acquiring unit, for automatically obtaining a network proxy server according to the quality ranking of the surviving proxy servers.
Wherein, the device by which a crawler program automatically obtains a network proxy server further includes:
A visual report pushing module, for pushing to the user a visual report on the working efficiency of the network proxy server.
Wherein, the disclosed Internet protocol address acquiring unit includes:
A crawl task creation module, for creating a crawl task whose keywords are "proxy IP" and "proxy server" and whose targets are the search engine sites Baidu, Google and Bing;
An automatic crawling module, for automatically crawling, according to the crawl task, the data content of the first n pages of the search engine result list pages, where n is a user-defined positive integer;
A disclosed Internet protocol address obtaining module, for cleaning and storing the fields containing proxy Internet protocol addresses and port numbers in the data content of the first n pages, obtaining the proxy Internet protocol addresses disclosed on the network.
Wherein, the device by which a crawler program automatically obtains a network proxy server further includes:
An entry discarding module, for automatically discarding entries that the search engine marks as "advertisement" or "promotion" while the data content of the first n pages of the search engine result list pages is being crawled according to the crawl task.
Wherein, the surviving proxy server acquiring unit includes:
A grouping module, for randomly grouping the proxy Internet protocol addresses disclosed on the network, obtaining multiple groups of proxy Internet protocol addresses;
A common proxy port number setting module, for setting common proxy port numbers for the multiple groups of proxy Internet protocol addresses;
A surviving network host and open common proxy port number obtaining module, for obtaining the surviving network hosts and the open common proxy port numbers according to the multiple groups of proxy Internet protocol addresses and the common proxy port numbers;
A surviving proxy server obtaining module, for merging and storing the surviving network hosts and the open common proxy port numbers, obtaining the surviving proxy servers.
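The merge-and-store step of the last module can be illustrated with a key-value layout, here a Python dictionary from host address to its open proxy ports. The input shape (tuples of host, port and an open flag, as a port scan might yield) and the function name are assumptions made for the sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple


def merge_scan_results(results: Iterable[Tuple[str, int, bool]]) -> Dict[str, List[int]]:
    """Merge (host, port, is_open) scan results into host -> sorted open ports."""
    live: Dict[str, Set[int]] = {}
    for host, port, is_open in results:
        if is_open:
            # A set removes duplicate reports of the same open port.
            live.setdefault(host, set()).add(port)
    return {host: sorted(ports) for host, ports in live.items()}
```

Hosts with no open common proxy port never enter the dictionary, so the result contains only surviving proxy servers.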
Wherein, the effective proxy server quality ranking unit includes:
A response speed ranking module, for sorting the effective proxy servers by average response time in ascending order using the Packet Internet Groper (ping) and recording a score value; the higher the score value, the better the quality of the effective proxy server.
Wherein, the effective proxy server quality ranking unit includes:
A data forwarding rate ranking module, for sending Internet Control Message Protocol (ICMP) echo messages with different time-to-live (TTL) values to the target by invoking a route tracing command to determine the route to the destination, discarding results that time out, sorting by hop count (metric) and total time between hops in ascending order, and recording a score value; the higher the score value, the better the quality of the effective proxy server.
Wherein, the effective proxy server quality ranking unit includes:
A packet loss rate ranking module, for sending an ICMP echo request packet to the destination host with the ping command and waiting to receive the echo reply packet, sending 100 requests, estimating the packet loss rate from the time and the number of successful responses, sorting by packet loss rate in ascending order, and recording a score value; the higher the score value, the better the quality of the effective proxy server.
Wherein, the network proxy server acquiring unit includes:
A process starting module, for starting multiple processes when a crawler is configured for a newly added task;
A process assignment module, for assigning one proxy and one set of simulated user-agent information to each process while enabling a process-exclusive proxy and user agent;
A monitoring module, for monitoring whether the proxy used by each process is valid; if it is valid, the proxy server continues to be used; if it is invalid, a valid network proxy server is reassigned, chosen randomly or sequentially from the network proxy server Internet protocol address list.
Wherein, the device by which a crawler program automatically obtains a network proxy server further includes:
An Internet protocol address maintenance unit, for maintaining the proxy Internet protocol addresses disclosed on the network.
Wherein, the Internet protocol address maintenance unit includes:
A polling service starting module, for setting a timer and starting a polling service on schedule;
An Internet protocol address list obtaining module, for excluding the Internet protocol addresses of the effective proxy servers shown as already used during the polling service, obtaining the Internet protocol address list of the remaining effective proxy servers;
An effective proxy server list obtaining module, for ranking the Internet protocol address list of the remaining effective proxy servers by quality, obtaining a new effective proxy server list.
Embodiment three
The computer readable storage medium provided by Embodiment three of the present invention stores a program by which a crawler program automatically obtains a network proxy server; when executed, the program implements the steps of the method provided by the present invention by which a crawler program automatically obtains a network proxy server.
Embodiment four
The terminal device provided by Embodiment four of the present invention includes a processor and a memory; the memory stores a program by which a crawler program automatically obtains a network proxy server; when executed, the program implements the steps of the method provided by the present invention by which a crawler program automatically obtains a network proxy server.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and variations to the invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is also intended to include them.

Claims (10)

1. A method by which a crawler program automatically obtains a network proxy server, characterized by including the following steps:
Obtaining proxy Internet protocol addresses disclosed on the network;
Obtaining surviving proxy servers;
Excluding duplicate data between the proxy Internet protocol addresses disclosed on the network and the surviving proxy servers, obtaining effective proxy servers;
Ranking the effective proxy servers by quality;
Automatically obtaining a network proxy server according to the quality ranking of the surviving proxy servers.
2. The method by which a crawler program automatically obtains a network proxy server according to claim 1, characterized by further including the step of maintaining the proxy Internet protocol addresses disclosed on the network.
3. The method by which a crawler program automatically obtains a network proxy server according to claim 1, characterized by further including the step of pushing to the user a visual report on the working efficiency of the network proxy server.
4. The method by which a crawler program automatically obtains a network proxy server according to claim 1, characterized in that the method of obtaining proxy Internet protocol addresses disclosed on the network specifically includes the following steps:
Creating a crawl task whose keywords are "proxy IP" and "proxy server" and whose targets are the search engine sites Baidu, Google and Bing;
According to the crawl task, automatically crawling the data content of the first n pages of the search engine result list pages, where n is a user-defined positive integer;
Cleaning and storing the fields containing proxy Internet protocol addresses and port numbers in the data content of the first n pages, obtaining the proxy Internet protocol addresses disclosed on the network;
Preferably, when the fields containing proxy Internet protocol addresses and port numbers in the data content of the first n pages are stored, the storage form is key-value pairs;
Preferably, the process of automatically crawling the data content of the first n pages of the search engine result list pages according to the crawl task further includes the following step:
Automatically discarding entries that the search engine marks as "advertisement" or "promotion";
Preferably, the method of obtaining surviving proxy servers specifically includes the following steps:
Randomly grouping the proxy Internet protocol addresses disclosed on the network, obtaining multiple groups of proxy Internet protocol addresses;
Setting common proxy port numbers for the multiple groups of proxy Internet protocol addresses;
According to the multiple groups of proxy Internet protocol addresses and the common proxy port numbers, obtaining the surviving network hosts and the open common proxy port numbers;
Merging and storing the surviving network hosts and the open common proxy port numbers, obtaining the surviving proxy servers;
Preferably, in the process of merging and storing the surviving network hosts and the open common proxy port numbers, the surviving network hosts and the open common proxy port numbers are stored as key-value pairs after merging;
Preferably, the surviving network hosts and the open common proxy port numbers are obtained by applying the Nmap command;
Preferably, the reference parameters used in ranking the effective proxy servers by quality include proxy Internet protocol address classification, response speed, data forwarding rate and packet loss rate;
Preferably, the proxy Internet protocol address classification is performed according to the label of the effective proxy server;
Preferably, the label of the proxy server is any one selected from SSL proxy, HTTP proxy, socket proxy, transparent proxy and anonymous proxy;
Preferably, the method of ranking the effective proxy servers by quality according to the response speed specifically includes the following step:
Using the Packet Internet Groper (ping), sorting the effective proxy servers by average response time in ascending order and recording a score value; the higher the score value, the better the quality of the effective proxy server;
Preferably, the average response time of the effective proxy server is defined as t_avg; when the effective proxy servers are sorted by average response time in ascending order and the score value is recorded, the scoring standard is as follows:
When t_avg ≤ 20ms, score value = 10; when 20ms < t_avg ≤ 100ms, score value = 8; when 100ms < t_avg ≤ 200ms, score value = 7; when 200ms < t_avg ≤ 300ms, score value = 6; when 300ms < t_avg ≤ 500ms, score value = 0, and results with a score value of 0 are discarded;
Preferably, the method of ranking the effective proxy servers by quality according to the data forwarding rate specifically includes the following step:
By invoking a route tracing command, sending Internet Control Message Protocol (ICMP) echo messages with different time-to-live (TTL) values to the target to determine the route to the destination, discarding results that time out, sorting by hop count (metric) and total time between hops in ascending order, and recording a score value; the higher the score value, the better the quality of the effective proxy server;
Preferably, when the results of the route tracing command are sorted by hop count and total time between hops in ascending order and the score value is recorded, the scoring standard is as follows:
When the hop count ≤ 10:
Total time ≤ 20ms, score value = 10; total time ≤ 50ms, score value = 9; total time ≤ 100ms, score value = 8; total time ≤ 200ms, score value = 7; total time ≤ 300ms, score value = 6; total time ≤ 500ms, score value = 5; total time > 500ms, score value = 0, and results with a score value of 0 are discarded;
When the hop count ≤ 20:
Total time ≤ 20ms, score value = 9; total time ≤ 50ms, score value = 8; total time ≤ 100ms, score value = 7; total time ≤ 200ms, score value = 6; total time ≤ 300ms, score value = 5; total time ≤ 500ms, score value = 4; total time > 500ms, score value = 0, and results with a score value of 0 are discarded;
When the hop count ≤ 30:
Total time ≤ 20ms, score value = 8; total time ≤ 50ms, score value = 7; total time ≤ 100ms, score value = 6; total time ≤ 200ms, score value = 5; total time ≤ 300ms, score value = 4; total time ≤ 500ms, score value = 3; total time > 500ms, score value = 0, and results with a score value of 0 are discarded;
Preferably, when ICMP echo messages with different TTL values are sent to the target by invoking a route tracing command to determine the route to the destination, the command is selected from the traceroute command or the tracert command;
Preferably, the method of ranking the effective proxy servers by quality according to the packet loss rate specifically includes the following step:
Using the ping command, sending an ICMP echo request packet to the destination host and waiting to receive the echo reply packet; sending 100 requests, estimating the packet loss rate from the time and the number of successful responses, sorting by packet loss rate in ascending order, and recording a score value; the higher the score value, the better the quality of the effective proxy server;
Preferably, when the packet loss rate is estimated with the ping command as described and the results are sorted by packet loss rate in ascending order and the score value is recorded, the scoring standard is as follows:
Packet loss rate ≤ 5%, score value = 10; packet loss rate ≤ 15%, score value = 8; packet loss rate ≤ 20%, score value = 7; packet loss rate ≤ 25%, score value = 6; packet loss rate ≤ 30%, score value = 5; proxies with a packet loss rate above 30% are discarded;
Preferably, the method by which the crawler program automatically obtains a network proxy server according to the quality ranking of the surviving proxy servers specifically includes the following steps:
When a crawler is configured for a newly added task, starting multiple processes;
While enabling a process-exclusive proxy and user agent, assigning one proxy and one set of simulated user-agent information to each process;
Monitoring whether the proxy used by each process is valid; if it is valid, continuing to use that proxy server; if it is invalid, reassigning a valid network proxy server, chosen randomly or sequentially from the network proxy server Internet protocol address list;
Preferably, the method by which the crawler program automatically obtains a network proxy server further includes the step of maintaining the Internet protocol addresses according to the range of a timer, which specifically includes the following steps:
Setting a timer and starting a polling service on schedule;
Excluding the Internet protocol addresses of the effective proxy servers shown as already used during the polling service, obtaining the Internet protocol address list of the remaining effective proxy servers;
Ranking the Internet protocol address list of the remaining effective proxy servers by quality, obtaining a new effective proxy server list;
Preferably, the content of the visualized list includes, for the Internet protocol address of each proxy server, the number of crawls it participated in, the amount of data crawled and the crawl time.
5. A device by which a crawler program automatically obtains a network proxy server, characterized by including:
A disclosed Internet protocol address acquiring unit, for obtaining proxy Internet protocol addresses disclosed on the network;
A surviving proxy server acquiring unit, for obtaining surviving proxy servers;
An effective proxy server obtaining module, for excluding duplicate data between the proxy Internet protocol addresses disclosed on the network and the surviving proxy servers, obtaining effective proxy servers;
An effective proxy server quality ranking unit, for ranking the effective proxy servers by quality;
A network proxy server acquiring unit, for automatically obtaining a network proxy server according to the quality ranking of the surviving proxy servers.
6. The device by which a crawler program automatically obtains a network proxy server according to claim 5, characterized by further including:
A visual report pushing module, for pushing to the user a visual report on the working efficiency of the network proxy server.
7. The device by which a crawler program automatically obtains a network proxy server according to claim 5, characterized in that the disclosed Internet protocol address acquiring unit includes:
A crawl task creation module, for creating a crawl task whose keywords are "proxy IP" and "proxy server" and whose targets are the search engine sites Baidu, Google and Bing;
An automatic crawling module, for automatically crawling, according to the crawl task, the data content of the first n pages of the search engine result list pages, where n is a user-defined positive integer;
A disclosed Internet protocol address obtaining module, for cleaning and storing the fields containing proxy Internet protocol addresses and port numbers in the data content of the first n pages, obtaining the proxy Internet protocol addresses disclosed on the network.
8. The device by which a crawler program automatically obtains a network proxy server according to claim 7, characterized by further including:
An entry discarding module, for automatically discarding entries that the search engine marks as "advertisement" or "promotion" while the data content of the first n pages of the search engine result list pages is being crawled according to the crawl task;
Preferably, the surviving proxy server acquiring unit includes:
A grouping module, for randomly grouping the proxy Internet protocol addresses disclosed on the network, obtaining multiple groups of proxy Internet protocol addresses;
A common proxy port number setting module, for setting common proxy port numbers for the multiple groups of proxy Internet protocol addresses;
A surviving network host and open common proxy port number obtaining module, for obtaining the surviving network hosts and the open common proxy port numbers according to the multiple groups of proxy Internet protocol addresses and the common proxy port numbers;
A surviving proxy server obtaining module, for merging and storing the surviving network hosts and the open common proxy port numbers, obtaining the surviving proxy servers;
Preferably, the effective proxy server quality ranking unit includes:
A response speed ranking module, for sorting the effective proxy servers by average response time in ascending order using the Packet Internet Groper (ping) and recording a score value; the higher the score value, the better the quality of the effective proxy server;
Preferably, the effective proxy server quality ranking unit includes:
A data forwarding rate ranking module, for sending Internet Control Message Protocol (ICMP) echo messages with different time-to-live (TTL) values to the target by invoking a route tracing command to determine the route to the destination, discarding results that time out, sorting by hop count (metric) and total time between hops in ascending order, and recording a score value; the higher the score value, the better the quality of the effective proxy server;
Preferably, the effective proxy server quality ranking unit includes:
A packet loss rate ranking module, for sending an ICMP echo request packet to the destination host with the ping command and waiting to receive the echo reply packet, sending 100 requests, estimating the packet loss rate from the time and the number of successful responses, sorting by packet loss rate in ascending order, and recording a score value; the higher the score value, the better the quality of the effective proxy server;
Preferably, the network proxy server acquiring unit includes:
A process starting module, for starting multiple processes when a crawler is configured for a newly added task;
A process assignment module, for assigning one proxy and one set of simulated user-agent information to each process while enabling a process-exclusive proxy and user agent;
A monitoring module, for monitoring whether the proxy used by each process is valid; if it is valid, the proxy server continues to be used; if it is invalid, a valid network proxy server is reassigned, chosen randomly or sequentially from the network proxy server Internet protocol address list;
Preferably, the device by which the crawler program automatically obtains a network proxy server further includes:
An Internet protocol address maintenance unit, for maintaining the proxy Internet protocol addresses disclosed on the network;
Preferably, the Internet protocol address maintenance unit includes:
A polling service starting module, for setting a timer and starting a polling service on schedule;
An Internet protocol address list obtaining module, for excluding the Internet protocol addresses of the effective proxy servers shown as already used during the polling service, obtaining the Internet protocol address list of the remaining effective proxy servers;
An effective proxy server list obtaining module, for ranking the Internet protocol address list of the remaining effective proxy servers by quality, obtaining a new effective proxy server list.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a program by which a crawler program automatically obtains a network proxy server, and the program, when executed, implements the steps of the method according to any one of claims 1 to 4.
10. A terminal device, characterized by including a processor and a memory, wherein the memory stores a program by which a crawler program automatically obtains a network proxy server, and the program, when executed, implements the steps of the method according to any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810645506.6A CN108924199A (en) 2018-06-21 2018-06-21 Crawlers obtain the method, apparatus, computer storage medium and terminal device of network proxy server automatically

Publications (1)

Publication Number Publication Date
CN108924199A true CN108924199A (en) 2018-11-30

Family

ID=64420901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810645506.6A Pending CN108924199A (en) 2018-06-21 2018-06-21 Crawlers obtain the method, apparatus, computer storage medium and terminal device of network proxy server automatically

Country Status (1)

Country Link
CN (1) CN108924199A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105245607A (en) * 2015-10-23 2016-01-13 中国联合网络通信集团有限公司 Proxy server dynamic automatic selection method and system
US20170032044A1 (en) * 2006-11-14 2017-02-02 Paul Vincent Hayes System and Method for Personalized Search While Maintaining Searcher Privacy
CN106547793A (en) * 2015-09-22 2017-03-29 北京国双科技有限公司 The method and apparatus for obtaining proxy server address
CN107832355A (en) * 2017-10-23 2018-03-23 北京金堤科技有限公司 The method and device that a kind of agency of crawlers obtains

US11949755B2 (en) 2013-08-28 2024-04-02 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11870874B2 (en) 2013-08-28 2024-01-09 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11924306B2 (en) 2013-08-28 2024-03-05 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11729297B2 (en) 2013-08-28 2023-08-15 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11799985B2 (en) 2013-08-28 2023-10-24 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11949756B2 (en) 2013-08-28 2024-04-02 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11924307B2 (en) 2013-08-28 2024-03-05 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11677856B2 (en) 2013-08-28 2023-06-13 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11902400B2 (en) 2013-08-28 2024-02-13 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11838386B2 (en) 2013-08-28 2023-12-05 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11979475B2 (en) 2013-08-28 2024-05-07 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11595496B2 (en) 2013-08-28 2023-02-28 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11757961B2 (en) 2015-05-14 2023-09-12 Bright Data Ltd. System and method for streaming content from multiple servers
US11729012B2 (en) 2017-08-28 2023-08-15 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11876612B2 (en) 2017-08-28 2024-01-16 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11888639B2 (en) 2017-08-28 2024-01-30 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11888638B2 (en) 2017-08-28 2024-01-30 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11979249B2 (en) 2017-08-28 2024-05-07 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11863339B2 (en) 2017-08-28 2024-01-02 Bright Data Ltd. System and method for monitoring status of intermediate devices
US11902044B2 (en) 2017-08-28 2024-02-13 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11729013B2 (en) 2017-08-28 2023-08-15 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11979250B2 (en) 2017-08-28 2024-05-07 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11962430B2 (en) 2017-08-28 2024-04-16 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11909547B2 (en) 2017-08-28 2024-02-20 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11956094B2 (en) 2017-08-28 2024-04-09 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11711233B2 (en) 2017-08-28 2023-07-25 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US11764987B2 (en) 2017-08-28 2023-09-19 Bright Data Ltd. System and method for monitoring proxy devices and selecting therefrom
US11757674B2 (en) 2017-08-28 2023-09-12 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
CN109587007A (en) * 2018-12-27 2019-04-05 湖南宸睿通信科技有限公司 Communication equipment detection device and detection method thereof
US11657110B2 (en) 2019-02-25 2023-05-23 Bright Data Ltd. System and method for URL fetching retry mechanism
US11675866B2 (en) 2019-02-25 2023-06-13 Bright Data Ltd. System and method for URL fetching retry mechanism
US11902253B2 (en) 2019-04-02 2024-02-13 Bright Data Ltd. System and method for managing non-direct URL fetching service
CN110034979A (en) * 2019-04-23 2019-07-19 恒安嘉新(北京)科技股份公司 Proxy resource monitoring method and device, electronic equipment, and storage medium
CN110147271B (en) * 2019-05-15 2020-04-28 重庆八戒传媒有限公司 Method and device for improving quality of crawler proxy and computer readable storage medium
CN110147271A (en) * 2019-05-15 2019-08-20 重庆八戒传媒有限公司 Method and device for improving quality of crawler proxy and computer readable storage medium
CN111277662A (en) * 2020-01-22 2020-06-12 咪咕文化科技有限公司 Processing method of proxy server, electronic device and storage medium
CN111277662B (en) * 2020-01-22 2022-11-08 咪咕文化科技有限公司 Processing method of proxy server, electronic device and storage medium
US11985210B2 (en) 2022-02-26 2024-05-14 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes
US11985212B2 (en) 2023-03-11 2024-05-14 Bright Data Ltd. System and method for improving internet communication by using intermediate nodes

Similar Documents

Publication Publication Date Title
CN108924199A (en) Crawlers obtain the method, apparatus, computer storage medium and terminal device of network proxy server automatically
Roughan et al. 10 lessons from 10 years of measuring and modeling the internet's autonomous systems
CN105357054B (en) Website traffic analysis method, device and electronic equipment
Alderson et al. The many facets of internet topology and traffic
CN107431712A (en) Network flow logs for multi-tenant environments
CN104298782B (en) Method for analyzing the behavior trails of Internet users' active accesses
CN103795575B (en) System monitoring method for multiple data centers
Claffy Tracking IPv6 evolution: data we have and data we need
Krishnamurthy et al. A Socratic method for validation of measurement-based networking research
CN106713506A (en) Data acquisition method and data acquisition system
Agarwal et al. High speed streaming data analysis of web generated log streams
CN109873793A (en) Darknet discovery and source tracing method and system based on sample traffic analysis
Roscoe The End of Internet Architecture.
Pak et al. Intermedia reliance and sustainability of emergent media: a large-scale analysis of American news outlets’ external linking behaviors
CN101599857A (en) Method, device and network detection system for detecting the number of shared-access hosts
CN103957252B (en) Log acquisition method and system for a cloud storage system
Raban et al. Acting or reacting? Preferential attachment in a people‐tagging system
WO2015062652A1 (en) Technique for data traffic analysis
Zygmunt Role identification of social networkers
Jain et al. Temporal analysis of user behavior and topic evolution on Twitter
CN114189451A (en) Method for identifying target network backbone node
López et al. Exploring the availability, protocols and advertising of tor v3 domains
Gonzalez et al. On the tweet arrival process at Twitter: Analysis and applications
Hellerstein et al. The Network Oracle.
Zhao et al. Intelligent online BGP-4 analyzer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 2018-11-30)