CN108924199A - Method, apparatus, computer storage medium and terminal device for a crawler program to automatically obtain network proxy servers - Google Patents
- Publication number: CN108924199A
- Application number: CN201810645506.6A
- Authority: CN (China)
- Prior art keywords: proxy server, score value, network, internet protocol, protocol address
- Legal status: Pending (as listed by Google Patents, which notes that the status is an assumption and not a legal conclusion)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
Abstract
Disclosed are a method, an apparatus, a computer-readable storage medium and a terminal device by which a crawler program automatically obtains network proxy servers; the invention belongs to the field of computer programs. The method comprises the following steps: obtaining proxy Internet protocol (IP) addresses published on the network; obtaining surviving (live) proxy servers; removing duplicate data between the published proxy IP addresses and the surviving proxy servers to obtain effective proxy servers; ranking the effective proxy servers by quality; and having the crawler program automatically obtain network proxy servers according to the quality ranking of the surviving proxy servers. The apparatus, computer-readable storage medium and terminal device implement this method. The invention provides a simple and efficient technical means of automatically obtaining large numbers of high-quality proxy servers, allowing a crawler system to simulate many complex and changing identities.
Description
Technical field
The present invention relates to the field of computer programs, and more particularly to a method, an apparatus, a computer-readable storage medium and a terminal device by which a crawler program automatically obtains network proxy servers.
Background
At present, as the number of Internet users grows, Internet data is growing exponentially, and efficiently finding useful data among vast network resources is becoming ever more important.
Existing crawling and collection schemes generally work by locating a target, obtaining an entry point, traversing URLs (uniform resource locators), identifying data objects, and issuing requests and storing the results, thereby collecting and storing data.
However, the existing technical solutions have the following problem. Once a crawler is configured and collection starts, the crawler sends its own identity information to the target server with every network request, so the target can easily detect frequent, high-volume requests from the same node within a short time. To prevent excessive consumption of system resources and abnormal data service, most systems enable anti-crawling strategies that partially restrict or completely block frequent requests carrying the same identity characteristics. As a result, crawling efficiency drops and the crawled data are incomplete. Conventional crawler systems use proxies to simulate multiple identities, but the proxy server resources must be prepared in advance, which either incurs direct economic cost or adds workload and lowers overall efficiency.
Accordingly, there is a need for a simple and efficient technical means of automatically obtaining large numbers of high-quality proxy servers, so that a crawler system can simulate many complex and changing identities.
Summary of the invention
In view of this, the present invention provides a method, an apparatus, a computer-readable storage medium and a terminal device by which a crawler program automatically obtains network proxy servers, making the approach more suitable for practical use.
To achieve the first object above, the technical solution of the method provided by the present invention is as follows:
The method provided by the present invention, by which a crawler program automatically obtains network proxy servers, comprises the following steps:
Obtaining proxy IP addresses published on the network;
Obtaining surviving proxy servers;
Removing duplicate data between the proxy IP addresses published on the network and the surviving proxy servers to obtain effective proxy servers;
Ranking the effective proxy servers by quality;
Automatically obtaining network proxy servers according to the quality ranking of the surviving proxy servers.
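The de-duplication step above can be read as merging the two candidate sources and keeping each (IP, port) pair only once. The following is a minimal Python sketch under that reading; the function name `merge_unique` and the `(ip, port)` tuple representation are illustrative assumptions, not the data model of the patent.

```python
from typing import Iterable


def merge_unique(
    disclosed: Iterable[tuple[str, int]],
    surviving: Iterable[tuple[str, int]],
) -> list[tuple[str, int]]:
    """Merge (ip, port) pairs from both sources, keeping first occurrences in order."""
    seen: set[tuple[str, int]] = set()
    merged: list[tuple[str, int]] = []
    for source in (disclosed, surviving):
        for ip, port in source:
            key = (ip.strip(), int(port))  # normalise before comparing
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

Preserving first-seen order keeps the result deterministic, which makes later ranking steps easier to reproduce.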
The method may be further implemented with the following technical measures.
Preferably, the method further includes a step of maintaining the proxy IP addresses published on the network.
Preferably, the method further includes a step of pushing to the user a visual report on the working efficiency of the network proxy servers.
Preferably, obtaining the proxy IP addresses published on the network specifically includes the following steps:
Creating a crawl task whose keywords are "proxy IP" and "proxy server" and whose targets are the Baidu, Google and Bing search engine websites;
Automatically crawling, according to the crawl task, the data content of the first n pages of each search engine's result listing, where n is a user-defined positive integer;
Cleaning and storing the fields in the first n pages of data content that contain proxy IP addresses and port numbers, to obtain the proxy IP addresses published on the network.
Preferably, the fields containing proxy IP addresses and port numbers in the first n pages of data content are stored as key-value pairs.
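The cleaning step can be sketched as a regular-expression pass over the crawled page text that keeps only "IP:port" or "IP port" pairs on one line and stores them as key-value pairs (IP address mapped to its ports). This is a minimal sketch under assumptions: the pattern, the name `extract_proxies`, and the choice of IP as key are illustrative, not taken from the patent, and only IPv4 is handled.

```python
import re

# One IPv4 octet (0-255), then the full dotted quad, then ":" or spaces/tabs, then a port.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IP_PORT = re.compile(
    rf"(?<![\d.])({_OCTET}(?:\.{_OCTET}){{3}})(?:[ \t]*:[ \t]*|[ \t]+)(\d{{1,5}})(?![\d.])"
)


def extract_proxies(page_text: str) -> dict[str, list[int]]:
    """Clean crawled page text into key-value pairs: IP address -> list of ports."""
    found: dict[str, list[int]] = {}
    for match in IP_PORT.finditer(page_text):
        ip, port = match.group(1), int(match.group(2))
        if not 1 <= port <= 65535:
            continue  # not a valid TCP port, treat as noise
        ports = found.setdefault(ip, [])
        if port not in ports:
            ports.append(port)
    return found
```

The look-behind and look-ahead guards stop the pattern from matching inside longer dotted numbers, such as an IP followed on the next line by another IP.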
Preferably, crawling the first n pages of the search engine result listings according to the crawl task further includes the following step:
Automatically discarding entries that the search engine labels as "advertisement" or "promotion".
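Discarding promoted entries amounts to a filter on whatever label the result page attaches to each entry. The sketch below assumes a hypothetical parsed-entry shape (`SearchEntry` with a `label` field) and an assumed marker list that includes the Chinese labels 广告 ("advertisement") and 推广 ("promotion"); real engines vary and change their markup, so these markers are not authoritative.

```python
from typing import TypedDict


class SearchEntry(TypedDict):
    """Hypothetical shape of one parsed search-result entry."""
    title: str
    url: str
    label: str  # annotation the engine attaches to the entry, "" if none


# Assumed marker strings: "advertisement" and "promotion" in Chinese and English.
AD_MARKERS = ("广告", "推广", "advertisement", "promotion")


def drop_promoted(entries: list[SearchEntry]) -> list[SearchEntry]:
    """Discard entries whose label marks them as advertising or promotion."""
    kept: list[SearchEntry] = []
    for entry in entries:
        label = entry["label"].lower()  # case-insensitive for English labels
        if any(marker in label for marker in AD_MARKERS):
            continue
        kept.append(entry)
    return kept
```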
Preferably, the method for the proxy server for obtaining survival specifically includes following steps:
Agent Internet protocol address disclosed in the network is grouped at random, obtains multiple groups agent Internet protocol
Address;
For multiple groups agent Internet protocol address, common agent side slogan is set;
According to multiple groups agent Internet protocol address and common agent side slogan, obtains the network host of survival and open
The common agent side slogan put;
The network host of the survival and the common agent side slogan of opening are merged into storage, obtain the agency of the survival
Server.
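Random grouping and a shared port list can be sketched as follows. The patent does not enumerate the "common proxy ports", so `COMMON_PROXY_PORTS` below is an assumed example list, and `random_groups` and `port_spec` are hypothetical helper names; a fixed `seed` is accepted only to make runs reproducible.

```python
import random
from typing import Optional

# Assumed list of common proxy ports; the patent does not name specific ports.
COMMON_PROXY_PORTS = (80, 1080, 3128, 8080, 8888)


def random_groups(ips: list[str], group_size: int,
                  seed: Optional[int] = None) -> list[list[str]]:
    """Shuffle the unique IPs and split them into groups of at most group_size."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    pool = list(dict.fromkeys(ips))  # de-duplicate, keep first-seen order
    random.Random(seed).shuffle(pool)
    return [pool[i:i + group_size] for i in range(0, len(pool), group_size)]


def port_spec(ports: tuple[int, ...] = COMMON_PROXY_PORTS) -> str:
    """Comma-separated port list, e.g. for a scanner's port option."""
    return ",".join(str(p) for p in ports)
```

Small random groups spread probes across unrelated hosts, and each group can be scanned independently (for example, in parallel).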
Preferably, when the surviving network hosts and their open common proxy ports are merged and stored, the merged records are stored as key-value pairs.
Preferably, the surviving network hosts and their open common proxy ports are obtained using the Nmap command.
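One way to drive Nmap for one group is to request only open ports (`--open`) in grepable format on standard output (`-oG -`), then parse each "Host: ... Ports: ..." line into key-value pairs (live host mapped to its open ports). This is a sketch under assumptions: the helper names are hypothetical, Nmap must be installed separately, and the parser only reads the port entries in the form `port/state/protocol/...`.

```python
import subprocess


def nmap_command(hosts: list[str], ports: str) -> list[str]:
    """Build an Nmap invocation: probe `ports` on `hosts`, report only open
    ports, and write grepable (-oG) output to stdout."""
    return ["nmap", "-p", ports, "--open", "-oG", "-", *hosts]


def run_scan(hosts: list[str], ports: str) -> str:
    """Run Nmap (must be installed and on PATH) and return its grepable output."""
    result = subprocess.run(nmap_command(hosts, ports),
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_grepable(output: str) -> dict[str, list[int]]:
    """Turn grepable output into key-value pairs: live host -> open ports."""
    live: dict[str, list[int]] = {}
    for line in output.splitlines():
        if not line.startswith("Host:") or "Ports:" not in line:
            continue  # comments and "Status: Up" lines carry no port data
        ip = line.split()[1]
        # Keep only the Ports field; later tab-separated fields are ignored.
        ports_field = line.split("Ports:", 1)[1].split("\t", 1)[0]
        for entry in ports_field.split(","):
            # Each entry looks like "8080/open/tcp//http-proxy///".
            fields = entry.strip().split("/")
            if len(fields) > 1 and fields[0].isdigit() and fields[1] == "open":
                live.setdefault(ip, []).append(int(fields[0]))
    return live
```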
Preferably, the reference parameters used when ranking the effective proxy servers by quality include the proxy IP address category, response speed, data forwarding rate and packet loss rate.
Preferably, the proxy IP addresses are categorized according to the labels of the effective proxy servers.
Preferably, the label of a proxy server is any one of: SSL proxy (ssl-proxy), HTTP proxy, socket proxy, transparent proxy and anonymous proxy.
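The ranking inputs listed above can be held in one record per proxy. The text gives a separate score for response speed, data forwarding and packet loss but does not say how they are combined, so the equal-weight sum in `total` is purely an assumption. `ProxyRecord`, `rank` and the short label identifiers are likewise hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Labels named in the text above, written as short identifiers.
PROXY_LABELS = frozenset({"ssl-proxy", "http", "socket", "transparent", "anonymous"})


@dataclass
class ProxyRecord:
    ip: str
    port: int
    label: str
    rtt_score: int = 0    # response-speed score
    route_score: int = 0  # data-forwarding (route) score
    loss_score: int = 0   # packet-loss score

    def __post_init__(self) -> None:
        if self.label not in PROXY_LABELS:
            raise ValueError(f"unknown proxy label: {self.label!r}")

    @property
    def total(self) -> int:
        # Assumption: equal-weight sum; the text does not say how scores combine.
        return self.rtt_score + self.route_score + self.loss_score


def rank(records: list[ProxyRecord], label: Optional[str] = None) -> list[ProxyRecord]:
    """Optionally filter by category, then order best (highest total) first."""
    chosen = [r for r in records if label is None or r.label == label]
    return sorted(chosen, key=lambda r: r.total, reverse=True)
```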
Preferably, ranking the effective proxy servers by quality according to response speed specifically includes the following step:
Using the Packet Internet Groper (ping), sorting the effective proxy servers in ascending order of average round-trip time and recording a score; the higher the score, the better the quality of the effective proxy server.
Preferably, with the average round-trip time of an effective proxy server denoted $t_{\text{avg}}$, the scoring standard used when sorting in ascending order of average round-trip time is:

$$
\text{score} =
\begin{cases}
10, & t_{\text{avg}} \le 20\ \text{ms} \\
8, & 20\ \text{ms} < t_{\text{avg}} \le 100\ \text{ms} \\
7, & 100\ \text{ms} < t_{\text{avg}} \le 200\ \text{ms} \\
6, & 200\ \text{ms} < t_{\text{avg}} \le 300\ \text{ms} \\
0, & 300\ \text{ms} < t_{\text{avg}} \le 500\ \text{ms}
\end{cases}
$$

Results with a score of 0 are discarded.
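The response-speed scoring table translates directly into a threshold function. In this sketch, `rtt_score` and `rank_by_rtt` are hypothetical names, and treating an average above 500 ms as 0 is an assumption, since the table stops at 500 ms.

```python
def rtt_score(t_avg_ms: float) -> int:
    """Score an average round-trip time using the thresholds listed above."""
    if t_avg_ms <= 20:
        return 10
    if t_avg_ms <= 100:
        return 8
    if t_avg_ms <= 200:
        return 7
    if t_avg_ms <= 300:
        return 6
    # 300-500 ms scores 0 per the table; > 500 ms is not listed and is
    # treated as 0 here (assumption).
    return 0


def rank_by_rtt(t_avg_by_proxy: dict[str, float]) -> list[tuple[str, int]]:
    """Sort proxies by ascending average RTT, dropping those that score 0."""
    ordered = sorted(t_avg_by_proxy.items(), key=lambda item: item[1])
    return [(proxy, rtt_score(t)) for proxy, t in ordered if rtt_score(t) > 0]
```

For example, `rank_by_rtt({"a": 150.0, "b": 15.0, "c": 400.0})` keeps `b` (score 10) ahead of `a` (score 7) and drops `c`.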
Preferably, ranking the effective proxy servers by quality according to data forwarding rate specifically includes the following step:
Calling a route-tracing command to send Internet Control Message Protocol (ICMP) echo messages with different time-to-live (TTL) values to the target so as to determine the route to the destination; discarding timed-out results; sorting in ascending order by metric (hop count) and the total time across hops; and recording a score; the higher the score, the better the quality of the effective proxy server.
Preferably, when the route-tracing results are sorted in ascending order by metric and total time across hops and a score is recorded, the scoring standard is as follows:
When metric ≤ 10: total time ≤ 20 ms, score 10; ≤ 50 ms, score 9; ≤ 100 ms, score 8; ≤ 200 ms, score 7; ≤ 300 ms, score 6; ≤ 500 ms, score 5; > 500 ms, score 0.
When metric ≤ 20: total time ≤ 20 ms, score 9; ≤ 50 ms, score 8; ≤ 100 ms, score 7; ≤ 200 ms, score 6; ≤ 300 ms, score 5; ≤ 500 ms, score 4; > 500 ms, score 0.
When metric ≤ 30: total time ≤ 20 ms, score 8; ≤ 50 ms, score 7; ≤ 100 ms, score 6; ≤ 200 ms, score 5; ≤ 300 ms, score 4; ≤ 500 ms, score 3; > 500 ms, score 0.
In each case, results with a score of 0 are discarded.
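The three bands of the route table share one pattern: each band has a top score for ≤ 20 ms, and each following time threshold subtracts 1. The sketch below encodes the table that way. `route_score` is a hypothetical name, and returning 0 for a metric above 30 is an assumption because the table does not cover that case.

```python
# Upper time bounds (ms) for each step of the table; the score drops by 1 per step.
TIME_LIMITS_MS = (20, 50, 100, 200, 300, 500)
# (largest metric in the band, score when total time <= 20 ms)
BANDS = ((10, 10), (20, 9), (30, 8))


def route_score(metric: int, total_ms: float) -> int:
    """Score a route-trace result by metric and total time across hops."""
    for max_metric, top_score in BANDS:
        if metric <= max_metric:
            for step, limit in enumerate(TIME_LIMITS_MS):
                if total_ms <= limit:
                    return top_score - step
            return 0  # > 500 ms
    return 0  # metric > 30 is not covered by the table (assumption: discard)
```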
Preferably, the route-tracing command used to send ICMP echo messages with different TTL values to the target, in order to determine the route to the destination, is either the traceroute command or the tracert command.
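The two commands take different options. Windows `tracert` probes with ICMP echo requests and limits hops with `-h`. Unix `traceroute` limits hops with `-m` and sends UDP probes by default; `-I` switches it to ICMP echo, as described above, but usually requires elevated privileges. The helper name `route_trace_command` and the default of 30 hops are assumptions for illustration.

```python
import platform
from typing import Optional


def route_trace_command(host: str, max_hops: int = 30, icmp: bool = False,
                        system: Optional[str] = None) -> list[str]:
    """Build a route-trace invocation for the current (or given) OS."""
    system = system or platform.system()
    if system == "Windows":
        # tracert probes with ICMP echo requests; -h caps the number of hops.
        return ["tracert", "-h", str(max_hops), host]
    cmd = ["traceroute", "-m", str(max_hops)]  # -m caps the max TTL (hops)
    if icmp:
        cmd.append("-I")  # ICMP echo probes instead of the default UDP
    return cmd + [host]
```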
Preferably, ranking the effective proxy servers by quality according to packet loss rate specifically includes the following step:
Using the ping command, sending ICMP echo request packets to the destination host and waiting for the echo reply packets; sending 100 requests and estimating the packet loss rate from the number of successful responses; sorting in ascending order of packet loss rate; and recording a score; the higher the score, the better the quality of the effective proxy server.
Preferably, when the effective proxy servers are sorted in ascending order of packet loss rate (estimated from 100 ping requests) and a score is recorded, the scoring standard is as follows:
Packet loss ≤ 5%, score 10; ≤ 15%, score 8; ≤ 20%, score 7; ≤ 25%, score 6; ≤ 30%, score 5. Proxies with packet loss above 30% are discarded from the list.
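The packet-loss step can be sketched as a ping invocation with a fixed count (`-n` on Windows, `-c` on Unix-like systems), a loss percentage computed from the number of echo replies received, and the score table above. The helper names are hypothetical, and parsing the reply count out of ping's text output is left out because that output format differs between platforms.

```python
import platform
from typing import Optional


def ping_command(host: str, count: int = 100, system: Optional[str] = None) -> list[str]:
    """ping with a fixed number of echo requests (-n on Windows, -c elsewhere)."""
    system = system or platform.system()
    flag = "-n" if system == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def loss_percent(sent: int, replies: int) -> float:
    """Packet loss as a percentage, estimated from the count of echo replies."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - replies) * 100 / sent


def loss_score(percent: float) -> int:
    """Score a packet-loss percentage; 0 means the proxy is dropped."""
    for limit, score in ((5, 10), (15, 8), (20, 7), (25, 6), (30, 5)):
        if percent <= limit:
            return score
    return 0
```

Computing the percentage from integer counts (rather than multiplying a fractional rate by 100) keeps boundary values such as exactly 5% or 30% exact.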
Preferably, the crawler program automatically obtaining network proxy servers according to the quality ranking of the surviving proxy servers specifically includes the following steps:
When a crawler is configured for a newly added task, enabling multiple processes;
When per-process exclusive proxies and user agents are enabled, assigning each process one proxy and one kind of user-agent simulation information;
Monitoring whether the proxy used by each process is valid; if it is valid, continuing to use that proxy server; if it is invalid, reassigning an effective network proxy server, either at random or in ranked order, from the network proxy server IP address list.
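The per-process assignment can be sketched as a small allocator. Workers are identified by an integer ID here instead of real OS processes, and how a proxy is judged "valid" is left to the caller. `ProxyAssigner`, its method names and the policy of dropping a failed proxy (not returning it to the pool) are assumptions. With `randomize=False`, replacements follow the ranked order; with `randomize=True`, they are drawn at random.

```python
import random
from typing import Optional


class ProxyAssigner:
    """Hands each crawler worker an exclusive proxy plus a user-agent string."""

    def __init__(self, ranked_proxies: list[str], user_agents: list[str],
                 randomize: bool = False, seed: Optional[int] = None) -> None:
        self._free = list(ranked_proxies)       # best proxy first
        self._user_agents = list(user_agents)
        self._randomize = randomize             # random vs. ranked reassignment
        self._rng = random.Random(seed)
        self._assigned: dict[int, tuple[str, str]] = {}

    def _take(self) -> str:
        if not self._free:
            raise RuntimeError("no unassigned proxies left")
        index = self._rng.randrange(len(self._free)) if self._randomize else 0
        return self._free.pop(index)

    def assign(self, worker_id: int) -> tuple[str, str]:
        """Give a worker its own proxy and one user agent."""
        user_agent = self._user_agents[worker_id % len(self._user_agents)]
        pair = (self._take(), user_agent)
        self._assigned[worker_id] = pair
        return pair

    def check(self, worker_id: int, proxy_ok: bool) -> tuple[str, str]:
        """Keep a working proxy; replace a failed one (the failed proxy is dropped)."""
        proxy, user_agent = self._assigned[worker_id]
        if proxy_ok:
            return proxy, user_agent
        pair = (self._take(), user_agent)
        self._assigned[worker_id] = pair
        return pair
```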
Preferably, the method further includes a step of maintaining the IP addresses on a timer, which specifically includes the following steps:
Setting a timer and starting a polling service at scheduled times;
Excluding the IP addresses of effective proxy servers shown as already used during the polling service, to obtain the IP address list of the remaining effective proxy servers;
Ranking the IP address list of the remaining effective proxy servers by quality to regenerate the effective proxy server list.
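The maintenance cycle can be sketched as a pure refresh function plus a background timer loop. `refresh_pool` and `start_polling` are hypothetical names; the `score` callable stands in for whichever quality scoring is applied, and discarding proxies that score 0 follows the tables above. `threading.Event.wait` returns `True` once the event is set, which ends the loop.

```python
import threading
from typing import Callable


def refresh_pool(candidates: list[str], used: list[str],
                 score: Callable[[str], int]) -> list[str]:
    """Drop already-used proxies, re-score the rest, best first; score 0 is discarded."""
    used_set = set(used)
    scored = [(proxy, score(proxy)) for proxy in candidates if proxy not in used_set]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [proxy for proxy, value in scored if value > 0]


def start_polling(interval_s: float, job: Callable[[], None]) -> threading.Event:
    """Run `job` every `interval_s` seconds on a daemon thread until the event is set."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.wait(interval_s):  # wait() returns True once stop is set
            job()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

A caller would typically pass a `job` that calls `refresh_pool` and swaps in the new list, then call `set()` on the returned event to stop polling.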
Preferably, the content of the visual report includes, for each proxy server's IP address, the number of crawls it participated in, the amount of data crawled and the crawl time.
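The report fields can be aggregated from per-request crawl logs before charting. The record type `CrawlLog` and its fields are hypothetical, because the patent does not specify a log format. Only the aggregation is shown here, not the visualization.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CrawlLog:
    """Hypothetical record written once per crawl performed through a proxy."""
    proxy_ip: str
    items: int       # number of data items crawled
    seconds: float   # time spent crawling


def build_report(logs: list[CrawlLog]) -> dict[str, dict[str, float]]:
    """Per proxy IP: crawls participated in, data items crawled, total crawl time."""
    report: dict[str, dict[str, float]] = defaultdict(
        lambda: {"crawls": 0, "items": 0, "seconds": 0.0})
    for log in logs:
        row = report[log.proxy_ip]
        row["crawls"] += 1
        row["items"] += log.items
        row["seconds"] += log.seconds
    return dict(report)
```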
To achieve the second object above, the technical solution of the apparatus provided by the present invention is as follows:
The apparatus provided by the present invention, by which a crawler program automatically obtains network proxy servers, includes:
A published IP address acquisition unit, configured to obtain proxy IP addresses published on the network;
A surviving proxy server acquisition unit, configured to obtain surviving proxy servers;
An effective proxy server acquisition module, configured to remove duplicate data between the proxy IP addresses published on the network and the surviving proxy servers to obtain effective proxy servers;
An effective proxy server quality ranking unit, configured to rank the effective proxy servers by quality;
A network proxy server acquisition unit, configured to automatically obtain network proxy servers according to the quality ranking of the surviving proxy servers.
The apparatus may be further implemented with the following technical measures.
Preferably, the apparatus further includes:
A visual report pushing module, configured to push to the user a visual report on the working efficiency of the network proxy servers.
Preferably, the published IP address acquisition unit includes:
A crawl task creation module, configured to create a crawl task whose keywords are "proxy IP" and "proxy server" and whose targets are the Baidu, Google and Bing search engine websites;
An automatic crawling module, configured to automatically crawl, according to the crawl task, the data content of the first n pages of the search engine result listings, where n is a user-defined positive integer;
A published IP address acquisition module, configured to clean and store the fields in the first n pages of data content that contain proxy IP addresses and port numbers, to obtain the proxy IP addresses published on the network.
Preferably, the apparatus further includes:
An entry discarding module, configured to automatically discard entries that the search engine labels as "advertisement" or "promotion" while the first n pages of the search engine result listings are crawled according to the crawl task.
Preferably, the surviving proxy server acquisition unit includes:
A grouping module, configured to randomly group the proxy IP addresses published on the network to obtain multiple groups of proxy IP addresses;
A common proxy port setting module, configured to set common proxy port numbers for the groups of proxy IP addresses;
A surviving host and open port acquisition module, configured to obtain, from the groups of proxy IP addresses and the common proxy port numbers, the surviving network hosts and their open common proxy ports;
A surviving proxy server acquisition module, configured to merge and store the surviving network hosts together with their open common proxy ports to obtain the surviving proxy servers.
Preferably, the effective proxy server quality ranking unit includes:
A response speed ranking module, configured to use ping to sort the effective proxy servers in ascending order of average round-trip time and record a score; the higher the score, the better the quality of the effective proxy server.
Preferably, the effective proxy server quality ranking unit includes:
A data forwarding rate ranking module, configured to call a route-tracing command to send ICMP echo messages with different TTL values to the target so as to determine the route to the destination, discard timed-out results, sort in ascending order by metric (hop count) and total time across hops, and record a score; the higher the score, the better the quality of the effective proxy server.
Preferably, the effective proxy server quality ranking unit includes:
A packet loss ranking module, configured to use the ping command to send ICMP echo request packets to the destination host and wait for the echo reply packets, send 100 requests, estimate the packet loss rate from the number of successful responses, sort in ascending order of packet loss rate, and record a score; the higher the score, the better the quality of the effective proxy server.
Preferably, the network proxy server acquisition unit includes:
A process enabling module, configured to enable multiple processes when a crawler is configured for a newly added task;
A process assignment module, configured to assign each process one proxy and one kind of user-agent simulation information when per-process exclusive proxies and user agents are enabled;
A monitoring module, configured to monitor whether the proxy used by each process is valid; if it is valid, the process continues to use that proxy server; if it is invalid, an effective network proxy server is reassigned, at random or in ranked order, from the network proxy server IP address list.
Preferably, the apparatus further includes:
An IP address maintenance unit, configured to maintain the proxy IP addresses published on the network.
Preferably, the IP address maintenance unit includes:
A polling service starting module, configured to set a timer and start a polling service at scheduled times;
An IP address list acquisition module, configured to exclude the IP addresses of effective proxy servers shown as already used during the polling service, to obtain the IP address list of the remaining effective proxy servers;
An effective proxy server list acquisition module, configured to rank the IP address list of the remaining effective proxy servers by quality to regenerate the effective proxy server list.
To achieve the third object above, the technical solution of the computer-readable storage medium provided by the present invention is as follows:
The computer-readable storage medium provided by the present invention stores a program by which a crawler program automatically obtains network proxy servers; when executed, the program implements the steps of the method provided by the present invention.
To achieve the fourth object above, the technical solution of the terminal device provided by the present invention is as follows:
The terminal device provided by the present invention includes a processor and a memory. The memory stores a program by which a crawler program automatically obtains network proxy servers; when executed, the program implements the steps of the method provided by the present invention.
With the method, apparatus, computer-readable storage medium and terminal device provided by the embodiments of the present invention, a crawler program can automatically obtain usable proxy IPs from the network; autonomously probe each proxy IP's liveness, response speed and packet routing and forwarding efficiency; rank the proxy IPs by quality; and crawl while autonomously rotating among proxy IPs. Requests therefore appear to come from different IPs, which counters the target's anti-crawling strategies.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the present invention. Throughout the drawings, the same reference signs denote the same parts. In the drawings:
Fig. 1 is a flowchart of the steps of the method, provided by Embodiment 1 of the present invention, by which a crawler program automatically obtains network proxy servers;
Fig. 2 is a schematic diagram of the signal flow relationships of the apparatus, provided by Embodiment 2 of the present invention, by which a crawler program automatically obtains network proxy servers.
Detailed description of embodiments
To solve the problems in the prior art, the present invention provides a method, an apparatus, a computer-readable storage medium and a terminal device by which a crawler program automatically obtains network proxy servers, making the approach more suitable for practical use.
To further explain the technical means adopted by the present invention to achieve the intended objects, and their effects, the specific implementations, structures, features and effects of the method, apparatus, computer-readable storage medium and terminal device proposed according to the present invention are described in detail below with reference to the drawings and preferred embodiments. In the following description, different occurrences of "an embodiment" or "the embodiment" do not necessarily refer to the same embodiment. Moreover, particular features, structures or characteristics of one or more embodiments may be combined in any suitable manner.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" covers three cases: A and B both exist, A exists alone, or B exists alone.
Embodiment 1
Referring to Fig. 1, the method provided by Embodiment 1 of the present invention, by which a crawler program automatically obtains network proxy servers, includes the following steps:
Obtaining proxy IP addresses published on the network;
Obtaining surviving proxy servers;
Removing duplicate data between the proxy IP addresses published on the network and the surviving proxy servers to obtain effective proxy servers;
Ranking the effective proxy servers by quality;
Automatically obtaining network proxy servers according to the quality ranking of the surviving proxy servers.
With the method, apparatus, computer-readable storage medium and terminal device provided by the embodiments of the present invention, a crawler program can automatically obtain usable proxy IPs from the network; autonomously probe each proxy IP's liveness, response speed and packet routing and forwarding efficiency; rank the proxy IPs by quality; and crawl while autonomously rotating among proxy IPs. Requests therefore appear to come from different IPs, which counters the target's anti-crawling strategies. By appearing as multiple request points, the crawler (1) avoids crawl failures or incomplete crawls caused by the target blocking access, and (2) improves crawling efficiency by combining multiple identities with multiple coroutines.
The method further includes a step of maintaining the proxy IP addresses published on the network.
The method further includes a step of pushing to the user a visual report on the working efficiency of the network proxy servers.
Obtaining the proxy IP addresses published on the network specifically includes the following steps:
Creating a crawl task whose keywords are "proxy IP" and "proxy server" and whose targets are the Baidu, Google and Bing search engine websites;
Automatically crawling, according to the crawl task, the data content of the first n pages of each search engine's result listing, where n is a user-defined positive integer;
Cleaning and storing the fields in the first n pages of data content that contain proxy IP addresses and port numbers, to obtain the proxy IP addresses published on the network.
The fields containing proxy IP addresses and port numbers in the first n pages of data content are stored as key-value pairs.
Crawling the first n pages of the search engine result listings according to the crawl task further includes the following step:
Automatically discarding entries that the search engine labels as "advertisement" or "promotion".
Obtaining the surviving proxy servers specifically includes the following steps:
Randomly grouping the proxy IP addresses published on the network to obtain multiple groups of proxy IP addresses;
Setting common proxy port numbers for the groups of proxy IP addresses;
Obtaining, from the groups of proxy IP addresses and the common proxy port numbers, the surviving network hosts and their open common proxy ports;
Merging and storing the surviving network hosts together with their open common proxy ports to obtain the surviving proxy servers.
When the surviving network hosts and their open common proxy ports are merged and stored, the merged records are stored as key-value pairs.
The surviving network hosts and their open common proxy ports are obtained using the Nmap command.
The reference parameters used when ranking the effective proxy servers by quality include the proxy IP address category, response speed, data forwarding rate and packet loss rate.
The proxy IP addresses are categorized according to the labels of the effective proxy servers.
The label of a proxy server is any one of: SSL proxy (ssl-proxy), HTTP proxy, socket proxy, transparent proxy and anonymous proxy.
Ranking the effective proxy servers by quality according to response speed specifically includes the following step:
Using the Packet Internet Groper (ping), sorting the effective proxy servers in ascending order of average round-trip time and recording a score; the higher the score, the better the quality of the effective proxy server.
With the average round-trip time of an effective proxy server denoted $t_{\text{avg}}$, the scoring standard used when sorting in ascending order of average round-trip time is:

$$
\text{score} =
\begin{cases}
10, & t_{\text{avg}} \le 20\ \text{ms} \\
8, & 20\ \text{ms} < t_{\text{avg}} \le 100\ \text{ms} \\
7, & 100\ \text{ms} < t_{\text{avg}} \le 200\ \text{ms} \\
6, & 200\ \text{ms} < t_{\text{avg}} \le 300\ \text{ms} \\
0, & 300\ \text{ms} < t_{\text{avg}} \le 500\ \text{ms}
\end{cases}
$$

Results with a score of 0 are discarded.
Ranking the effective proxy servers by quality according to data forwarding rate specifically includes the following step:
Calling a route-tracing command to send ICMP echo messages with different TTL values to the target so as to determine the route to the destination; discarding timed-out results; sorting in ascending order by metric (hop count) and the total time across hops; and recording a score; the higher the score, the better the quality of the effective proxy server.
When the route-tracing results are sorted in ascending order by metric and total time across hops and a score is recorded, the scoring standard is as follows:
When metric ≤ 10: total time ≤ 20 ms, score 10; ≤ 50 ms, score 9; ≤ 100 ms, score 8; ≤ 200 ms, score 7; ≤ 300 ms, score 6; ≤ 500 ms, score 5; > 500 ms, score 0.
When metric ≤ 20: total time ≤ 20 ms, score 9; ≤ 50 ms, score 8; ≤ 100 ms, score 7; ≤ 200 ms, score 6; ≤ 300 ms, score 5; ≤ 500 ms, score 4; > 500 ms, score 0.
When metric ≤ 30: total time ≤ 20 ms, score 8; ≤ 50 ms, score 7; ≤ 100 ms, score 6; ≤ 200 ms, score 5; ≤ 300 ms, score 4; ≤ 500 ms, score 3; > 500 ms, score 0.
In each case, results with a score of 0 are discarded.
The route-tracing command used to send ICMP echo messages with different TTL values to the target, in order to determine the route to the destination, is either the traceroute command or the tracert command.
Ranking the effective proxy servers by quality according to packet loss rate specifically includes the following step:
Using the ping command, sending ICMP echo request packets to the destination host and waiting for the echo reply packets; sending 100 requests and estimating the packet loss rate from the number of successful responses; sorting in ascending order of packet loss rate; and recording a score; the higher the score, the better the quality of the effective proxy server.
When ping is used in this way and the results are sorted in ascending order of packet loss rate and scored, the scoring criteria are as follows:
Packet loss ≤ 5%, score = 10; packet loss ≤ 15%, score = 8; packet loss ≤ 20%, score = 7; packet loss ≤ 25%, score = 6; packet loss ≤ 30%, score = 5. Proxies with a packet loss rate above 30% are discarded.
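A minimal Python sketch of the packet-loss score, assuming the counts of echo requests sent (100 in the method above) and replies received have already been read from the ping output. `loss_score` and `rank_by_loss` are hypothetical names. The comparison is done in integers so that a boundary value such as exactly 5% loss is not misclassified by floating-point rounding.

```python
# (max loss percentage, score) pairs from the criteria above, checked in order.
LOSS_BANDS = ((5, 10), (15, 8), (20, 7), (25, 6), (30, 5))


def loss_score(sent: int, received: int) -> int:
    """Score a proxy from ping counts; 0 means discard (loss above 30 %)."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    lost = sent - received
    for max_pct, score in LOSS_BANDS:
        # Integer form of lost / sent <= max_pct / 100.
        if lost * 100 <= max_pct * sent:
            return score
    return 0


def rank_by_loss(counts: dict[str, tuple[int, int]]) -> list[tuple[str, int]]:
    """Sort proxies by ascending loss rate, dropping discarded ones."""
    kept = []
    for proxy, (sent, received) in counts.items():
        score = loss_score(sent, received)
        if score > 0:
            kept.append((proxy, score, (sent - received) / sent))
    kept.sort(key=lambda item: item[2])
    return [(proxy, score) for proxy, score, _ in kept]
```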
The method by which the crawler program automatically obtains external network proxy servers according to the quality ranking of the surviving proxy servers specifically includes the following steps:
When a new task is configured for the crawler, multiple processes are started;
With exclusive per-process proxies and user agents enabled, each process is assigned one proxy and one kind of user-agent simulation information;
Monitor whether the proxy used by each process is valid; if it is valid, continue using that proxy server; if it is invalid, reassign one valid network proxy server from the network proxy server IP address list, either at random or in order.
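A minimal, single-process Python sketch of the assignment and monitoring steps, assuming the ranked proxy list and a set of user-agent strings are already available. `ProxyAllocator`, `assign` and `check` are hypothetical names; in a real crawler the worker processes would share one allocator (for example through a manager process), which is not shown. The validity test is passed in as a function because the source does not specify how validity is judged.

```python
import itertools
import random
from typing import Callable, Optional


class ProxyAllocator:
    """Hypothetical helper: one exclusive proxy and one user agent per crawler process.

    `pool` is the ranked list of valid proxies (best first). Proxies are popped
    from the pool when handed out, so no two processes share one.
    """

    def __init__(self, pool: list[str], user_agents: list[str],
                 mode: str = "sequential", seed: Optional[int] = None):
        if mode not in ("sequential", "random"):
            raise ValueError("mode must be 'sequential' or 'random'")
        self.pool = list(pool)
        self.mode = mode
        self.rng = random.Random(seed)
        self.user_agents = itertools.cycle(user_agents)
        self.assigned: dict[int, tuple[str, str]] = {}

    def _take(self) -> str:
        if not self.pool:
            raise RuntimeError("no valid proxy left in the list")
        index = self.rng.randrange(len(self.pool)) if self.mode == "random" else 0
        return self.pool.pop(index)

    def assign(self, pid: int) -> tuple[str, str]:
        """Give a newly started process its proxy and user-agent string."""
        self.assigned[pid] = (self._take(), next(self.user_agents))
        return self.assigned[pid]

    def check(self, pid: int, is_valid: Callable[[str], bool]) -> str:
        """Keep the current proxy if still valid, otherwise reassign one."""
        proxy, user_agent = self.assigned[pid]
        if not is_valid(proxy):
            proxy = self._take()
            self.assigned[pid] = (proxy, user_agent)
        return proxy
```

Popping proxies from the pool is one way to keep them exclusive per process; returning a proxy to the pool once its task ends would be a separate design decision.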
The method by which the crawler program automatically obtains network proxy servers further includes a step of maintaining the IP addresses according to the timer's schedule, which specifically includes the following steps:
Set a timer and start the polling service at regular intervals;
Exclude the IP addresses of the effective proxy servers shown as already used during the polling service, obtaining the IP address list of the remaining effective proxy servers;
Rank the IP address list of the remaining effective proxy servers by quality to obtain a refreshed effective proxy server list.
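A minimal Python sketch of one polling pass and of a timer loop that repeats it, assuming a per-proxy score function such as the ones described for response speed, forwarding rate or packet loss. `refresh_pool` and `start_polling` are hypothetical names, and dropping zero-score proxies during re-ranking follows the discard rule used by the scoring criteria.

```python
import threading
from typing import Callable, Iterable


def refresh_pool(proxies: Iterable[str], used: set[str],
                 score: Callable[[str], int]) -> list[str]:
    """One polling pass: exclude used proxies, then re-rank the rest by score."""
    remaining = [p for p in proxies if p not in used]
    ranked = sorted(remaining, key=score, reverse=True)
    return [p for p in ranked if score(p) > 0]  # zero score = discarded


def start_polling(interval_s: float, job: Callable[[], None],
                  stop: threading.Event) -> threading.Thread:
    """Call job() every interval_s seconds until stop is set."""
    def loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            job()

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker
```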
The content of the visual report includes, for the IP address of each proxy server, the number of crawls it participated in, the amount of data crawled, and the crawl time.
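A minimal sketch of how the per-proxy figures for the visual report could be aggregated from a crawl log, assuming each log entry records the proxy IP, the number of records crawled and the crawl time in seconds. `ProxyStats` and `build_report` are hypothetical names, and rendering the report itself is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ProxyStats:
    crawls: int = 0        # number of crawls this proxy IP took part in
    items: int = 0         # amount of data crawled (records, by assumption)
    seconds: float = 0.0   # total crawl time


def build_report(log: Iterable[tuple[str, int, float]]) -> dict[str, ProxyStats]:
    """Aggregate (proxy_ip, items, seconds) log entries per proxy IP."""
    report: dict[str, ProxyStats] = defaultdict(ProxyStats)
    for ip, items, seconds in log:
        stats = report[ip]
        stats.crawls += 1
        stats.items += items
        stats.seconds += seconds
    return dict(report)
```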
Embodiment two
The device provided by Embodiment 2 of the present invention for a crawler program to automatically obtain network proxy servers includes:
A public IP address acquisition unit, for obtaining proxy IP addresses published on the network;
A surviving proxy server acquisition unit, for obtaining surviving proxy servers;
An effective proxy server acquisition module, for excluding duplicate data between the proxy IP addresses published on the network and the surviving proxy servers, to obtain effective proxy servers;
An effective proxy server quality ranking unit, for ranking the effective proxy servers by quality;
A network proxy server acquisition unit, for automatically obtaining network proxy servers according to the quality ranking of the surviving proxy servers.
The device for a crawler program to automatically obtain network proxy servers further includes:
A visual report push module, for pushing to the user a visual report on the working efficiency of the network proxy servers.
The public IP address acquisition unit includes:
A crawl task creation module, for creating crawl tasks that use "proxy IP" and "proxy server" as keywords and target the search engine sites Baidu, Google, and Bing;
An automatic crawling module, for automatically crawling, according to the crawl tasks, the data content of the first n pages of the search engine result listings, where n is a user-defined positive integer;
A public IP address acquisition module, for cleaning and storing the fields in the first n pages of data content that contain proxy IP addresses and port numbers, to obtain the proxy IP addresses published on the network.
The device for a crawler program to automatically obtain network proxy servers further includes:
An entry discard module, for automatically discarding entries that the search engine labels as "advertisement" or "promotion" while the data content of the first n pages of the search engine result listings is crawled according to the crawl tasks.
The surviving proxy server acquisition unit includes:
A grouping module, for randomly grouping the proxy IP addresses published on the network to obtain multiple groups of proxy IP addresses;
A common proxy port setup module, for setting common proxy port numbers for the multiple groups of proxy IP addresses;
A surviving host and open port acquisition module, for obtaining the surviving network hosts and the open common proxy port numbers according to the multiple groups of proxy IP addresses and the common proxy port numbers;
A surviving proxy server acquisition module, for merging and storing the surviving network hosts and the open common proxy port numbers, to obtain the surviving proxy servers.
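Claim 4 names Nmap for this step. The Python sketch below swaps in a plain TCP connect check from the standard library instead, so it illustrates the grouping, probing and merging idea rather than Nmap's behavior. `COMMON_PROXY_PORTS` is an assumption (the source does not list the ports), and `group_randomly`, `port_open` and `surviving_proxies` are hypothetical names.

```python
import random
import socket
from typing import Iterable, Optional, Sequence

# Assumption: the source does not say which ports count as "common proxy ports".
COMMON_PROXY_PORTS = (80, 8080, 3128, 1080)


def group_randomly(ips: Iterable[str], size: int, seed: Optional[int] = None) -> list[list[str]]:
    """Shuffle the published IPs and cut them into groups of at most `size`."""
    shuffled = list(ips)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + size] for i in range(0, len(shuffled), size)]


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def surviving_proxies(ips: Iterable[str], ports: Sequence[int] = COMMON_PROXY_PORTS,
                      size: int = 50, timeout: float = 1.0,
                      seed: Optional[int] = None) -> dict[str, list[int]]:
    """Merge each live host with its open common ports as {ip: [ports]}."""
    alive: dict[str, list[int]] = {}
    for group in group_randomly(ips, size, seed):
        for ip in group:
            open_ports = [p for p in ports if port_open(ip, p, timeout)]
            if open_ports:
                alive[ip] = open_ports
    return alive
```

Here the groups are probed one after another; handing each random group to a separate worker would be the natural way to benefit from the grouping.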
The effective proxy server quality ranking unit includes:
A response speed ranking module, for using ping (Packet Internet Groper) to sort the effective proxy servers in ascending order of average round-trip time and record a score; the higher the score, the better the quality of the effective proxy server.
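A minimal Python sketch of the response-speed score, using the average round-trip time thresholds listed in claim 4 and assuming the per-probe round-trip times have already been read from the ping output (`None` marks a probe with no reply). `rtt_score` and `rank_by_rtt` are hypothetical names, and scoring an average above 500 ms as 0 is an assumption, since claim 4 only lists bands up to 500 ms.

```python
from statistics import mean
from typing import Optional, Sequence

# (upper bound of average round-trip time in ms, score) from claim 4, checked in order.
RTT_BANDS = ((20, 10), (100, 8), (200, 7), (300, 6))


def rtt_score(samples_ms: Sequence[Optional[float]]) -> int:
    """Score one proxy from its ping round-trip times; 0 means discard."""
    replies = [s for s in samples_ms if s is not None]  # ignore lost probes
    if not replies:
        return 0
    t_avg = mean(replies)
    for max_ms, score in RTT_BANDS:
        if t_avg <= max_ms:
            return score
    return 0  # 300-500 ms scores 0 in claim 4; above 500 ms assumed 0 as well


def rank_by_rtt(samples: dict[str, Sequence[Optional[float]]]) -> list[str]:
    """Proxies with a non-zero score, lowest average round-trip time first."""
    kept = []
    for proxy, probes in samples.items():
        if rtt_score(probes) > 0:
            kept.append((mean(s for s in probes if s is not None), proxy))
    return [proxy for _, proxy in sorted(kept)]
```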
The effective proxy server quality ranking unit includes:
A data forwarding rate ranking module, for calling the route trace command to send ICMP messages with different TTL values to the target, so as to determine the route to the destination and discard timed-out results, sorting in ascending order of total time with respect to the hop count (metric), and recording a score; the higher the score, the better the quality of the effective proxy server.
The effective proxy server quality ranking unit includes:
A packet loss ranking module, for using the ping command to send ICMP echo request packets to the destination host and wait for echo reply packets, sending 100 requests and estimating the packet loss rate from the number of responses received in time, sorting in ascending order of packet loss rate, and recording a score; the higher the score, the better the quality of the effective proxy server.
The network proxy server acquisition unit includes:
A process opening module, for starting multiple processes when a new task is configured for the crawler;
A process assignment module, for assigning one proxy and one kind of user-agent simulation information to each process when exclusive per-process proxies and user agents are enabled;
A monitoring module, for monitoring whether the proxy used by each process is valid; if it is valid, the proxy server continues to be used; if it is invalid, one valid network proxy server is reassigned from the network proxy server IP address list, either at random or in order.
The device for a crawler program to automatically obtain network proxy servers further includes:
An IP address maintenance unit, for maintaining the proxy IP addresses published on the network.
The IP address maintenance unit includes:
A polling service starting module, for setting a timer and starting the polling service at regular intervals;
An IP address list acquisition module, for excluding the IP addresses of the effective proxy servers shown as already used during the polling service, to obtain the IP address list of the remaining effective proxy servers;
An effective proxy server list acquisition module, for ranking the IP address list of the remaining effective proxy servers by quality, to obtain a refreshed effective proxy server list.
Embodiment three
The computer-readable storage medium provided by Embodiment 3 of the present invention stores a program for a crawler program to automatically obtain network proxy servers; when executed, the program implements the steps of the method provided by the present invention for a crawler program to automatically obtain network proxy servers.
Embodiment four
The terminal device provided by Embodiment 4 of the present invention includes a processor and a memory. The memory stores a program for a crawler program to automatically obtain network proxy servers; when executed, the program implements the steps of the method provided by the present invention for a crawler program to automatically obtain network proxy servers.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.
Claims (10)
1. A method for a crawler program to automatically obtain network proxy servers, characterized by comprising the following steps:
Obtaining proxy IP addresses published on the network;
Obtaining surviving proxy servers;
Excluding duplicate data between the proxy IP addresses published on the network and the surviving proxy servers, to obtain effective proxy servers;
Ranking the effective proxy servers by quality;
Automatically obtaining network proxy servers according to the quality ranking of the surviving proxy servers.
2. The method for a crawler program to automatically obtain network proxy servers according to claim 1, characterized by further comprising a step of maintaining the proxy IP addresses published on the network.
3. The method for a crawler program to automatically obtain network proxy servers according to claim 1, characterized by further comprising a step of pushing to the user a visual report on the working efficiency of the network proxy servers.
4. The method for a crawler program to automatically obtain network proxy servers according to claim 1, characterized in that obtaining the proxy IP addresses published on the network specifically comprises the following steps:
Creating crawl tasks that use "proxy IP" and "proxy server" as keywords and target the search engine sites Baidu, Google, and Bing;
Automatically crawling, according to the crawl tasks, the data content of the first n pages of the search engine result listings, where n is a user-defined positive integer;
Cleaning and storing the fields in the first n pages of data content that contain proxy IP addresses and port numbers, to obtain the proxy IP addresses published on the network;
Preferably, the fields in the first n pages of data content that contain proxy IP addresses and port numbers are stored as key-value pairs;
Preferably, while the data content of the first n pages of the search engine result listings is automatically crawled according to the crawl tasks, the method further comprises the following step:
Automatically discarding entries that the search engine labels as "advertisement" or "promotion";
Preferably, obtaining the surviving proxy servers specifically comprises the following steps:
Randomly grouping the proxy IP addresses published on the network to obtain multiple groups of proxy IP addresses;
Setting common proxy port numbers for the multiple groups of proxy IP addresses;
Obtaining the surviving network hosts and the open common proxy port numbers according to the multiple groups of proxy IP addresses and the common proxy port numbers;
Merging and storing the surviving network hosts and the open common proxy port numbers, to obtain the surviving proxy servers;
Preferably, when the surviving network hosts and the open common proxy port numbers are merged and stored, the merged results are stored as key-value pairs;
Preferably, the surviving network hosts and the open common proxy port numbers are obtained using the Nmap command;
Preferably, the reference parameters used when ranking the effective proxy servers by quality include proxy IP address classification, response speed, data forwarding rate, and packet loss rate;
Preferably, the proxy IP address classification is made according to the labels of the effective proxy servers;
Preferably, the label of a proxy server is any one of ssl-proxy, HTTP proxy, socket proxy, transparent proxy, and anonymous proxy;
Preferably, ranking the effective proxy servers by quality according to response speed specifically comprises the following step:
Using ping (Packet Internet Groper), sorting the effective proxy servers in ascending order of average round-trip time and recording a score; the higher the score, the better the quality of the effective proxy server;
Preferably, with the average round-trip time of an effective proxy server defined as $t_{\mathrm{avg}}$, when the effective proxy servers are sorted in ascending order of average round-trip time and scored, the scoring criteria are as follows:
$t_{\mathrm{avg}} \le 20$ ms, score = 10; $20 < t_{\mathrm{avg}} \le 100$ ms, score = 8; $100 < t_{\mathrm{avg}} \le 200$ ms, score = 7; $200 < t_{\mathrm{avg}} \le 300$ ms, score = 6; $300 < t_{\mathrm{avg}} \le 500$ ms, score = 0, and results with a score of 0 are discarded;
Preferably, ranking the effective proxy servers by quality according to data forwarding rate specifically comprises the following step:
Calling the route trace command to send ICMP messages with different TTL values to the target, so as to determine the route to the destination and discard timed-out results, sorting in ascending order of total time with respect to the hop count (metric), and recording a score; the higher the score, the better the quality of the effective proxy server;
Preferably, when the route trace command is called to send ICMP messages with different TTL values to the target, so as to determine the route to the destination and discard timed-out results, and the results are sorted in ascending order of total time with respect to the hop count (metric) and scored, the scoring criteria are as follows:
When metric ≤ 10:
Total time ≤ 20 ms, score = 10; total time ≤ 50 ms, score = 9; total time ≤ 100 ms, score = 8; total time ≤ 200 ms, score = 7; total time ≤ 300 ms, score = 6; total time ≤ 500 ms, score = 5; total time > 500 ms, score = 0, and results with a score of 0 are discarded;
When metric ≤ 20:
Total time ≤ 20 ms, score = 9; total time ≤ 50 ms, score = 8; total time ≤ 100 ms, score = 7; total time ≤ 200 ms, score = 6; total time ≤ 300 ms, score = 5; total time ≤ 500 ms, score = 4; total time > 500 ms, score = 0, and results with a score of 0 are discarded;
When metric ≤ 30:
Total time ≤ 20 ms, score = 8; total time ≤ 50 ms, score = 7; total time ≤ 100 ms, score = 6; total time ≤ 200 ms, score = 5; total time ≤ 300 ms, score = 4; total time ≤ 500 ms, score = 3; total time > 500 ms, score = 0, and results with a score of 0 are discarded;
Preferably, the route trace command used to send ICMP messages with different TTL values to the target and determine the route to the destination is selected from the traceroute command and the tracert command;
Preferably, ranking the effective proxy servers by quality according to packet loss rate specifically comprises the following step:
Using the ping command, sending ICMP echo request packets to the destination host and waiting for echo reply packets, sending 100 requests and estimating the packet loss rate from the number of responses received in time, sorting in ascending order of packet loss rate, and recording a score; the higher the score, the better the quality of the effective proxy server;
Preferably, when the ping command is used to send ICMP echo request packets to the destination host and wait for echo reply packets, 100 requests are sent, the packet loss rate is estimated from the number of responses received in time, and the results are sorted in ascending order of packet loss rate and scored, the scoring criteria are as follows:
Packet loss ≤ 5%, score = 10; packet loss ≤ 15%, score = 8; packet loss ≤ 20%, score = 7; packet loss ≤ 25%, score = 6; packet loss ≤ 30%, score = 5; proxies with a packet loss rate above 30% are discarded;
Preferably, automatically obtaining network proxy servers by the crawler program according to the quality ranking of the surviving proxy servers specifically comprises the following steps:
Starting multiple processes when a new task is configured for the crawler;
Assigning one proxy and one kind of user-agent simulation information to each process while exclusive per-process proxies and user agents are enabled;
Monitoring whether the proxy used by each process is valid; if it is valid, continuing to use that proxy server; if it is invalid, reassigning one valid network proxy server from the network proxy server IP address list, either at random or in order;
Preferably, the method for a crawler program to automatically obtain network proxy servers further comprises a step of maintaining the IP addresses according to the timer's schedule, which specifically comprises the following steps:
Setting a timer and starting the polling service at regular intervals;
Excluding the IP addresses of the effective proxy servers shown as already used during the polling service, to obtain the IP address list of the remaining effective proxy servers;
Ranking the IP address list of the remaining effective proxy servers by quality, to obtain a refreshed effective proxy server list;
Preferably, the content of the visual report includes, for the IP address of each proxy server, the number of crawls it participated in, the amount of data crawled, and the crawl time.
5. A device for a crawler program to automatically obtain network proxy servers, characterized by comprising:
A public IP address acquisition unit, for obtaining proxy IP addresses published on the network;
A surviving proxy server acquisition unit, for obtaining surviving proxy servers;
An effective proxy server acquisition module, for excluding duplicate data between the proxy IP addresses published on the network and the surviving proxy servers, to obtain effective proxy servers;
An effective proxy server quality ranking unit, for ranking the effective proxy servers by quality;
A network proxy server acquisition unit, for automatically obtaining network proxy servers according to the quality ranking of the surviving proxy servers.
6. The device for a crawler program to automatically obtain network proxy servers according to claim 5, characterized by further comprising:
A visual report push module, for pushing to the user a visual report on the working efficiency of the network proxy servers.
7. The device for a crawler program to automatically obtain network proxy servers according to claim 5, characterized in that the public IP address acquisition unit comprises:
A crawl task creation module, for creating crawl tasks that use "proxy IP" and "proxy server" as keywords and target the search engine sites Baidu, Google, and Bing;
An automatic crawling module, for automatically crawling, according to the crawl tasks, the data content of the first n pages of the search engine result listings, where n is a user-defined positive integer;
A public IP address acquisition module, for cleaning and storing the fields in the first n pages of data content that contain proxy IP addresses and port numbers, to obtain the proxy IP addresses published on the network.
8. The device for a crawler program to automatically obtain network proxy servers according to claim 7, characterized by further comprising:
An entry discard module, for automatically discarding entries that the search engine labels as "advertisement" or "promotion" while the data content of the first n pages of the search engine result listings is crawled according to the crawl tasks;
Preferably, the surviving proxy server acquisition unit comprises:
A grouping module, for randomly grouping the proxy IP addresses published on the network to obtain multiple groups of proxy IP addresses;
A common proxy port setup module, for setting common proxy port numbers for the multiple groups of proxy IP addresses;
A surviving host and open port acquisition module, for obtaining the surviving network hosts and the open common proxy port numbers according to the multiple groups of proxy IP addresses and the common proxy port numbers;
A surviving proxy server acquisition module, for merging and storing the surviving network hosts and the open common proxy port numbers, to obtain the surviving proxy servers;
Preferably, the effective proxy server quality ranking unit comprises:
A response speed ranking module, for using ping (Packet Internet Groper) to sort the effective proxy servers in ascending order of average round-trip time and record a score; the higher the score, the better the quality of the effective proxy server;
Preferably, the effective proxy server quality ranking unit comprises:
A data forwarding rate ranking module, for calling the route trace command to send ICMP messages with different TTL values to the target, so as to determine the route to the destination and discard timed-out results, sorting in ascending order of total time with respect to the hop count (metric), and recording a score; the higher the score, the better the quality of the effective proxy server;
Preferably, the effective proxy server quality ranking unit comprises:
A packet loss ranking module, for using the ping command to send ICMP echo request packets to the destination host and wait for echo reply packets, sending 100 requests and estimating the packet loss rate from the number of responses received in time, sorting in ascending order of packet loss rate, and recording a score; the higher the score, the better the quality of the effective proxy server;
Preferably, the network proxy server acquisition unit comprises:
A process opening module, for starting multiple processes when a new task is configured for the crawler;
A process assignment module, for assigning one proxy and one kind of user-agent simulation information to each process when exclusive per-process proxies and user agents are enabled;
A monitoring module, for monitoring whether the proxy used by each process is valid; if it is valid, the proxy server continues to be used; if it is invalid, one valid network proxy server is reassigned from the network proxy server IP address list, either at random or in order;
Preferably, the device for a crawler program to automatically obtain network proxy servers further comprises:
An IP address maintenance unit, for maintaining the proxy IP addresses published on the network;
Preferably, the IP address maintenance unit comprises:
A polling service starting module, for setting a timer and starting the polling service at regular intervals;
An IP address list acquisition module, for excluding the IP addresses of the effective proxy servers shown as already used during the polling service, to obtain the IP address list of the remaining effective proxy servers;
An effective proxy server list acquisition module, for ranking the IP address list of the remaining effective proxy servers by quality, to obtain a refreshed effective proxy server list.
9. A computer-readable storage medium, characterized in that a program for a crawler program to automatically obtain network proxy servers is stored on the computer-readable storage medium, and when the program is executed, the steps of the method according to any one of claims 1 to 4 are implemented.
10. A terminal device, characterized by comprising a processor and a memory, wherein a program for a crawler program to automatically obtain network proxy servers is stored on the memory, and when the program is executed, the steps of the method according to any one of claims 1 to 4 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810645506.6A CN108924199A (en) | 2018-06-21 | 2018-06-21 | Crawlers obtain the method, apparatus, computer storage medium and terminal device of network proxy server automatically |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108924199A | 2018-11-30 |
Family
ID=64420901
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109587007A (en) * | 2018-12-27 | 2019-04-05 | 湖南宸睿通信科技有限公司 | A kind of communication equipment detecting device and its detection method |
CN110034979A (en) * | 2019-04-23 | 2019-07-19 | 恒安嘉新(北京)科技股份公司 | A kind of proxy resources monitoring method, device, electronic equipment and storage medium |
CN110147271A (en) * | 2019-05-15 | 2019-08-20 | 重庆八戒传媒有限公司 | Promote the method, apparatus and computer readable storage medium of crawler agent quality |
CN111277662A (en) * | 2020-01-22 | 2020-06-12 | 咪咕文化科技有限公司 | Processing method of proxy server, electronic device and storage medium |
US11595496B2 (en) | 2013-08-28 | 2023-02-28 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11611607B2 (en) | 2009-10-08 | 2023-03-21 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11657110B2 (en) | 2019-02-25 | 2023-05-23 | Bright Data Ltd. | System and method for URL fetching retry mechanism |
US11711233B2 (en) | 2017-08-28 | 2023-07-25 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11757961B2 (en) | 2015-05-14 | 2023-09-12 | Bright Data Ltd. | System and method for streaming content from multiple servers |
US11902253B2 (en) | 2019-04-02 | 2024-02-13 | Bright Data Ltd. | System and method for managing non-direct URL fetching service |
US11985212B2 (en) | 2023-03-11 | 2024-05-14 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105245607A (en) * | 2015-10-23 | 2016-01-13 | 中国联合网络通信集团有限公司 | Proxy server dynamic automatic selection method and system |
US20170032044A1 (en) * | 2006-11-14 | 2017-02-02 | Paul Vincent Hayes | System and Method for Personalized Search While Maintaining Searcher Privacy |
CN106547793A (en) * | 2015-09-22 | 2017-03-29 | 北京国双科技有限公司 | The method and apparatus for obtaining proxy server address |
CN107832355A (en) * | 2017-10-23 | 2018-03-23 | 北京金堤科技有限公司 | The method and device that a kind of agency of crawlers obtains |
Cited By (61)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11770435B2 (en) | 2009-10-08 | 2023-09-26 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11962636B2 (en) | 2009-10-08 | 2024-04-16 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11956299B2 (en) | 2009-10-08 | 2024-04-09 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11949729B2 (en) | 2009-10-08 | 2024-04-02 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11916993B2 (en) | 2009-10-08 | 2024-02-27 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11902351B2 (en) | 2009-10-08 | 2024-02-13 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11888921B2 (en) | 2009-10-08 | 2024-01-30 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11611607B2 (en) | 2009-10-08 | 2023-03-21 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11616826B2 (en) | 2009-10-08 | 2023-03-28 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11659017B2 (en) | 2009-10-08 | 2023-05-23 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11888922B2 (en) | 2009-10-08 | 2024-01-30 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11659018B2 (en) | 2009-10-08 | 2023-05-23 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11671476B2 (en) | 2009-10-08 | 2023-06-06 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11876853B2 (en) | 2009-10-08 | 2024-01-16 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11838119B2 (en) | 2009-10-08 | 2023-12-05 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11811849B2 (en) | 2009-10-08 | 2023-11-07 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11700295B2 (en) | 2009-10-08 | 2023-07-11 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11811848B2 (en) | 2009-10-08 | 2023-11-07 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11811850B2 (en) | 2009-10-08 | 2023-11-07 | Bright Data Ltd. | System providing faster and more efficient data communication |
US11838388B2 (en) | 2013-08-28 | 2023-12-05 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11689639B2 (en) | 2013-08-28 | 2023-06-27 | Bright Data Ltd. | System and method for improving Internet communication by using intermediate nodes |
US11758018B2 (en) | 2013-08-28 | 2023-09-12 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11949755B2 (en) | 2013-08-28 | 2024-04-02 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11870874B2 (en) | 2013-08-28 | 2024-01-09 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11924306B2 (en) | 2013-08-28 | 2024-03-05 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11729297B2 (en) | 2013-08-28 | 2023-08-15 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11799985B2 (en) | 2013-08-28 | 2023-10-24 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11949756B2 (en) | 2013-08-28 | 2024-04-02 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11924307B2 (en) | 2013-08-28 | 2024-03-05 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11677856B2 (en) | 2013-08-28 | 2023-06-13 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11902400B2 (en) | 2013-08-28 | 2024-02-13 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11838386B2 (en) | 2013-08-28 | 2023-12-05 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11979475B2 (en) | 2013-08-28 | 2024-05-07 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11595496B2 (en) | 2013-08-28 | 2023-02-28 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11757961B2 (en) | 2015-05-14 | 2023-09-12 | Bright Data Ltd. | System and method for streaming content from multiple servers |
US11729012B2 (en) | 2017-08-28 | 2023-08-15 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11876612B2 (en) | 2017-08-28 | 2024-01-16 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11888639B2 (en) | 2017-08-28 | 2024-01-30 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11888638B2 (en) | 2017-08-28 | 2024-01-30 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11979249B2 (en) | 2017-08-28 | 2024-05-07 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11863339B2 (en) | 2017-08-28 | 2024-01-02 | Bright Data Ltd. | System and method for monitoring status of intermediate devices |
US11902044B2 (en) | 2017-08-28 | 2024-02-13 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11729013B2 (en) | 2017-08-28 | 2023-08-15 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11979250B2 (en) | 2017-08-28 | 2024-05-07 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11962430B2 (en) | 2017-08-28 | 2024-04-16 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11909547B2 (en) | 2017-08-28 | 2024-02-20 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11956094B2 (en) | 2017-08-28 | 2024-04-09 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11711233B2 (en) | 2017-08-28 | 2023-07-25 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
US11764987B2 (en) | 2017-08-28 | 2023-09-19 | Bright Data Ltd. | System and method for monitoring proxy devices and selecting therefrom |
US11757674B2 (en) | 2017-08-28 | 2023-09-12 | Bright Data Ltd. | System and method for improving content fetching by selecting tunnel devices |
CN109587007A (en) * | 2018-12-27 | 2019-04-05 | 湖南宸睿通信科技有限公司 | Communication equipment detection device and detection method thereof |
US11657110B2 (en) | 2019-02-25 | 2023-05-23 | Bright Data Ltd. | System and method for URL fetching retry mechanism |
US11675866B2 (en) | 2019-02-25 | 2023-06-13 | Bright Data Ltd. | System and method for URL fetching retry mechanism |
US11902253B2 (en) | 2019-04-02 | 2024-02-13 | Bright Data Ltd. | System and method for managing non-direct URL fetching service |
CN110034979A (en) * | 2019-04-23 | 2019-07-19 | 恒安嘉新(北京)科技股份公司 | Proxy resource monitoring method and device, electronic equipment and storage medium |
CN110147271B (en) * | 2019-05-15 | 2020-04-28 | 重庆八戒传媒有限公司 | Method and device for improving quality of crawler proxy and computer readable storage medium |
CN110147271A (en) * | 2019-05-15 | 2019-08-20 | 重庆八戒传媒有限公司 | Method, apparatus and computer-readable storage medium for improving crawler proxy quality |
CN111277662A (en) * | 2020-01-22 | 2020-06-12 | 咪咕文化科技有限公司 | Processing method of proxy server, electronic device and storage medium |
CN111277662B (en) * | 2020-01-22 | 2022-11-08 | 咪咕文化科技有限公司 | Processing method of proxy server, electronic device and storage medium |
US11985210B2 (en) | 2022-02-26 | 2024-05-14 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
US11985212B2 (en) | 2023-03-11 | 2024-05-14 | Bright Data Ltd. | System and method for improving internet communication by using intermediate nodes |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108924199A (en) | | Method, apparatus, computer storage medium and terminal device for crawler programs to automatically obtain network proxy servers |
Roughan et al. | | 10 lessons from 10 years of measuring and modeling the internet's autonomous systems |
CN105357054B (en) | | Website traffic analysis method, device and electronic equipment |
Alderson et al. | | The many facets of internet topology and traffic |
CN107431712A (en) | | Network flow logs for multi-tenant environments |
CN104298782B (en) | | Method for analyzing the action trails of Internet users' active access |
CN103795575B (en) | | System monitoring method for multiple data centers |
Claffy | | Tracking IPv6 evolution: data we have and data we need |
Krishnamurthy et al. | | A Socratic method for validation of measurement-based networking research |
CN106713506A (en) | | Data acquisition method and data acquisition system |
Agarwal et al. | | High speed streaming data analysis of web generated log streams |
CN109873793A (en) | | Darknet discovery and source-tracing method and system based on sample traffic analysis |
Roscoe | | The End of Internet Architecture. |
Pak et al. | | Intermedia reliance and sustainability of emergent media: a large-scale analysis of American news outlets’ external linking behaviors |
CN101599857A (en) | | Method and device for detecting the number of hosts sharing an access connection, and network measurement system |
CN103957252B (en) | | Log acquisition method and system for a cloud storage system |
Raban et al. | | Acting or reacting? Preferential attachment in a people‐tagging system |
WO2015062652A1 (en) | | Technique for data traffic analysis |
Zygmunt | | Role identification of social networkers |
Jain et al. | | Temporal analysis of user behavior and topic evolution on Twitter |
CN114189451A (en) | | Method for identifying target network backbone node |
López et al. | | Exploring the availability, protocols and advertising of tor v3 domains |
Gonzalez et al. | | On the tweet arrival process at Twitter: Analysis and applications |
Hellerstein et al. | | The Network Oracle. |
Zhao et al. | | Intelligent online BGP-4 analyzer |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181130 |
|
RJ01 | Rejection of invention patent application after publication |