CN105302801A - Resource caching method and apparatus - Google Patents


Info

Publication number
CN105302801A
Authority
CN
China
Prior art keywords
pages
content
buffer memory
url address
judge
Legal status
Pending
Application number
CN201410228106.7A
Other languages
Chinese (zh)
Inventor
刘永霞
徐羽
刘杉
陈伟伟
Current Assignee
Shenzhen Yayue Technology Co ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410228106.7A
Publication of CN105302801A


Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the field of communications and provides a resource caching method and apparatus. The method comprises: receiving an access request for a uniform resource locator (URL) address; determining, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached; when the page content pointed to by the URL address can be cached, determining whether that page content has been cached; and when the page content pointed to by the URL address is not cached locally, caching it. According to the embodiments of the invention, the cached page content is more comprehensive.

Description

Resource caching method and device
Technical field
The invention belongs to the field of communications, and in particular relates to a resource caching method and device.
Background art
Caching means extracting frequently used data from its original storage location in advance and storing the extracted data in a buffer area. When the user needs the data, it can be read directly from the buffer area, unaffected by factors such as the network or the deployment of the original storage location, which increases the speed of data retrieval.
Existing resource caching methods usually identify the resources a user frequently uses by counting the user's accesses, and then cache those frequently used resources. However, a rarely accessed resource may reside on a website with poor access performance. If such resources are not cached in advance, retrieving them takes too much time, so existing methods are prone to caching the wrong resources and missing resources that should be cached.
Summary of the invention
Embodiments of the present invention provide a resource caching method, intended to solve the problem that existing methods cache the wrong resources and miss resources that should be cached.
The embodiments of the present invention are implemented as follows: a resource caching method, the method comprising the following steps:
receiving an access request for a uniform resource locator (URL) address;
determining, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached;
when the page content pointed to by the URL address can be cached, determining whether that page content has been cached;
when the page content pointed to by the URL address is not cached locally, caching the page content pointed to by the URL address.
Another object of the embodiments of the present invention is to provide a resource caching device, the device comprising:
an access request receiving unit, configured to receive an access request for a uniform resource locator (URL) address;
a cache identifier judging unit, configured to determine, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached;
a cache content judging unit, configured to determine, when the page content pointed to by the URL address can be cached, whether that page content has been cached;
a page content caching unit, configured to cache the page content pointed to by the URL address when that page content is not cached locally.
In the embodiments of the present invention, after an access request for a URL address is received, it is determined whether the page content pointed to by the URL address can be cached; if it can, it is determined whether the page content has already been cached; if it has not, the page content is fetched from the cloud or from a third-party website, and whether to cache the fetched page content is decided by judging whether it needs to be cached. Because page content passes through multiple layers of checks before it is cached, the cached page content is more comprehensive and more accurate.
Brief description of the drawings
Fig. 1 is a flowchart of a resource caching method provided by the first embodiment of the present invention;
Fig. 2 is a schematic diagram of asynchronously looking up page content, provided by the first embodiment of the present invention;
Fig. 3 is a structural diagram of a resource caching device provided by the second embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment 1:
Fig. 1 shows the flowchart of a resource caching method provided by the first embodiment of the present invention. The resource caching method of this embodiment can reduce the load that massive numbers of users and page requests place on third-party websites. The details are as follows:
Step S11: receive an access request for a uniform resource locator (URL) address.
In this step, a uniform resource locator (URL) is used to locate web pages, multimedia files, and other resources. The server address in a URL, an Internet Protocol (IP) address, can generally be replaced by a domain name, because IP addresses are hard to remember. For example, if a URL address is http://emuch.net/bbs/viewthread.php?tid=6017207, its domain name is "emuch.net".
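As an illustration only (the patent does not say how the domain name is parsed), the domain name can be extracted from the example URL with Python's standard urllib.parse module. The helper name `extract_domain` is hypothetical:

```python
from urllib.parse import urlsplit

def extract_domain(url: str) -> str:
    """Return the host part of a URL, lower-cased and without any port."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        raise ValueError(f"URL has no host: {url!r}")
    return host

print(extract_domain("http://emuch.net/bbs/viewthread.php?tid=6017207"))  # emuch.net
```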
Step S12: determine, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached.
Before the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached, the method may comprise the following steps:
Whether the page content pointed to by the URL address can be cached is determined through a whitelist, which speeds up the determination, specifically as follows:
A1: parse the domain name of the URL address;
A2: compare the parsed domain name with the domain names stored in advance in a whitelist; when the whitelist contains a domain name identical to the parsed domain name, determine that the page content pointed to by the URL address can be cached. The whitelist stores the domain names whose page content can be cached.
The whitelist stores the domain names whose content can be cached. These include the domain names of websites known in advance to be safe, as well as the domain names of third-party websites with poor performance. For example, a website may hold too many resources of its own, such as pictures, so that reading the pictures is too slow and the site performs poorly; or a website may hold few resources but receive so many visits that its performance degrades. The domain names of poorly performing third-party websites are added to the whitelist to increase the speed at which users pull page content from those websites, and to increase the probability that users succeed in pulling page content from them. Whether a third-party website performs poorly can be judged by, for example, checking whether the response speed of its web pages meets a requirement. Adding a whitelist effectively improves the accuracy of identifying safe websites and reduces the risk of caching unsafe page content.
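A minimal sketch of how such a whitelist might be assembled and queried, assuming the response-speed requirement is a single average-response-time threshold. The patent gives no figure, so `threshold_ms`, the helper names, and the sample domains below are hypothetical:

```python
from typing import Dict, Iterable, Set
from urllib.parse import urlsplit

def build_whitelist(safe_domains: Iterable[str],
                    avg_response_ms: Dict[str, float],
                    threshold_ms: float) -> Set[str]:
    """Known-safe domains plus third-party domains that respond too slowly."""
    slow = {domain for domain, ms in avg_response_ms.items() if ms > threshold_ms}
    return set(safe_domains) | slow

def cacheable_by_whitelist(url: str, whitelist: Set[str]) -> bool:
    """Steps A1/A2: parse the domain name and look for an identical entry."""
    return (urlsplit(url).hostname or "") in whitelist

wl = build_whitelist({"emuch.net"},
                     {"slow.example": 2500.0, "fast.example": 80.0},
                     threshold_ms=1000.0)
```

Here `slow.example` ends up in the whitelist because it misses the hypothetical threshold, while `fast.example` does not.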
Before the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached, the method may comprise the following steps:
Whether the page content pointed to by the URL address can be cached is determined through a blacklist, which speeds up the determination, specifically as follows:
A1': parse the domain name of the URL address;
A2': compare the parsed domain name with the domain names stored in advance in a blacklist; when the blacklist contains a domain name identical to the parsed domain name, determine that the page content pointed to by the URL address cannot be cached. The blacklist stores the domain names whose page content cannot be cached.
The blacklist stores in advance the domain names of websites known to be unsafe. A website in the blacklist may indicate that its page content can be cached, but based on historical records it has been judged to be a risky website. Setting up a blacklist reduces the probability of caching risky web page content. In practice, the domain names stored in the whitelist and blacklist can be updated at any time. For example, if it is determined that a website in the blacklist no longer poses a risk, its domain name can be deleted from the blacklist and may also be added to the whitelist.
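The blacklist check and the run-time update described above could be sketched as follows. The `DomainLists` class and the example domains are hypothetical, and exact domain-name matching is assumed, as in step A2':

```python
from typing import Iterable
from urllib.parse import urlsplit

class DomainLists:
    """Whitelist and blacklist that can be updated while the cache is running."""

    def __init__(self, whitelist: Iterable[str] = (), blacklist: Iterable[str] = ()) -> None:
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)

    def blocked(self, url: str) -> bool:
        """Steps A1'/A2': content from a blacklisted domain cannot be cached."""
        return (urlsplit(url).hostname or "") in self.blacklist

    def clear_risk(self, domain: str) -> None:
        """A domain no longer judged risky leaves the blacklist and joins the whitelist."""
        self.blacklist.discard(domain)
        self.whitelist.add(domain)
```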
The above describes separately the implementations in which the domain name of the URL address is compared with the domain names stored in the whitelist and in the blacklist. In practice, the parsed domain name may also first be compared with the domain names stored in the whitelist and, if the whitelist does not contain it, then compared with the domain names stored in the blacklist, to determine whether the page content corresponding to the domain name can be cached. Alternatively, the parsed domain name may first be compared with the domain names stored in the blacklist and, if the blacklist does not contain it, then compared with the domain names stored in the whitelist. No limitation is imposed here.
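Either comparison order can be expressed with one helper that returns a three-way result, leaving the "in neither list" case to a later check. This is a sketch that assumes exact domain-name matches; `list_verdict` and the `whitelist_first` flag are hypothetical:

```python
from typing import Optional, Set
from urllib.parse import urlsplit

def list_verdict(url: str, whitelist: Set[str], blacklist: Set[str],
                 whitelist_first: bool = True) -> Optional[bool]:
    """True = cacheable, False = not cacheable, None = domain is in neither list."""
    domain = urlsplit(url).hostname or ""
    checks = [(whitelist, True), (blacklist, False)]
    if not whitelist_first:
        checks.reverse()  # consult the blacklist before the whitelist
    for domains, verdict in checks:
        if domain in domains:
            return verdict
    return None
```

The order only matters for a domain that appears in both lists, in which case the list consulted first decides.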
In addition to using the whitelist and blacklist, whether the page content pointed to by the URL address can be cached may also be determined by matching the regular expression of the URL address.
That is, before the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached, the method may comprise the following steps:
Match the regular expression of the URL address against predetermined standard regular expressions. When the regular expression of the URL address successfully matches a standard regular expression, determine that the page content pointed to by the URL address can be cached; when matching fails, determine that the page content pointed to by the URL address cannot be cached.
Matching the regular expression of the URL address identifies the type of the URL address and improves the security of page caching. Matching mainly determines whether the syntactic rules of the URL address's regular expression are the same as those of a predetermined standard regular expression; if they are, the two are judged to match, otherwise they are judged not to match. A match indicates that the page content of this URL address is relatively safe. To speed up identification of URL address types, the standard regular expressions used for matching can be processed in advance, as follows: (1) Obtain the regular expressions used to match the URLs where page content resides: an offline system determines the common features of pages, collects page samples according to those features, trains on the collected samples to obtain models, aggregates the resulting models, and outputs the regular expressions corresponding to the models. (2) Optimize the obtained regular expressions to obtain the standard regular expressions, by reducing their number and shortening their length. (3) Package the optimized regular expressions into a lib library in advance and integrate it into the server. A lib library is a static library; integrating the regular expressions into the server as a static library helps speed up calls to them.
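The sketch below simplifies matching to testing the URL string itself against a set of standard expressions, merged into a single precompiled alternation in the spirit of steps (2) and (3). The two patterns are hypothetical stand-ins for the output of the offline training step, which is not shown:

```python
import re

# Hypothetical "standard" expressions; the patent obtains them offline by
# training on page samples, which is not reproduced here.
STANDARD_PATTERNS = [
    r"https?://emuch\.net/bbs/viewthread\.php\?tid=\d+",
    r"https?://[\w.-]+/static/[\w/.-]+\.(?:png|jpe?g|gif|css|js)",
]

# Merging the patterns into one alternation and compiling it once mirrors the
# idea of fewer expressions that are preloaded for fast calls.
STANDARD_RE = re.compile("|".join(f"(?:{p})" for p in STANDARD_PATTERNS))

def cacheable_by_regex(url: str) -> bool:
    """True when the whole URL matches one of the standard expressions."""
    return STANDARD_RE.fullmatch(url) is not None
```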
Of course, because matching the regular expression of a URL address usually takes longer than checking whether its domain name is in the whitelist or blacklist, the regular-expression method can be used only when the domain name of the URL address is in neither the whitelist nor the blacklist.
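A minimal sketch of that ordering, with cheap set lookups first and the regular expression only as a fallback. The function name and the whitelist, blacklist, and pattern used in the example are hypothetical:

```python
import re
from typing import Set
from urllib.parse import urlsplit

def is_cacheable(url: str, whitelist: Set[str], blacklist: Set[str],
                 standard_re: re.Pattern) -> bool:
    domain = urlsplit(url).hostname or ""
    if domain in whitelist:      # fast path: known cacheable domain
        return True
    if domain in blacklist:      # fast path: known risky domain
        return False
    # Slower path, used only for domains that are in neither list.
    return standard_re.fullmatch(url) is not None

pattern = re.compile(r"https?://[\w.-]+/static/.+\.png")
```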
As another embodiment of the present invention, the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached specifically comprises:
B1: parse the HTTP response to the access request for the URL address;
B2: when the HTTP response identifies the page content as cacheable, determine that the page content pointed to by the URL address can be cached; when the HTTP response identifies the page content as not cacheable, determine that the page content pointed to by the URL address cannot be cached.
An HTTP response packet contains several cache-related fields, such as the Cache-Control field and the Expires field. When the Cache-Control field identifies the page content as cacheable, the page content is cached; when the Expires field indicates that the page content has not yet expired, the page content is cached. Table 1 shows the cache-related fields in an HTTP response:
Table 1:
If both the Expires and Cache-Control fields are present, the max-age directive of Cache-Control overrides the information in the Expires field.
Of course, if a relevant field in the access request for the URL address requires that the page content not be cached, the page content is not cached. For example, when the access request for this URL address carries Cache-Control: no-cache, the page content pointed to by the URL address is not cached. Table 2 lists some of the fields in the access request for a URL address:
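As a simplified sketch of steps B1/B2 and the request-side rule, the decision might look like the following. This is not a full implementation of HTTP caching semantics. It treats no-store, no-cache, and private responses as non-cacheable, lets max-age override Expires, refuses to cache when the request carries no-cache or no-store, and assumes plain header dictionaries with exact-case keys for brevity:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Mapping, Optional

def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().lower().partition("=")
        if name:
            directives[name.strip()] = arg.strip().strip('"') or None
    return directives

def response_cacheable(request_headers: Mapping[str, str],
                       response_headers: Mapping[str, str],
                       now: Optional[datetime] = None) -> bool:
    now = now or datetime.now(timezone.utc)
    # A request that asks for no caching wins (e.g. "Cache-Control: no-cache").
    request_cc = parse_cache_control(request_headers.get("Cache-Control", ""))
    if request_cc.keys() & {"no-cache", "no-store"}:
        return False
    response_cc = parse_cache_control(response_headers.get("Cache-Control", ""))
    if response_cc.keys() & {"no-cache", "no-store", "private"}:
        return False
    # When both are present, max-age overrides Expires.
    if "max-age" in response_cc:
        try:
            return int(response_cc["max-age"] or "") > 0
        except ValueError:
            return False
    expires = response_headers.get("Expires")
    if expires:
        try:
            expires_at = parsedate_to_datetime(expires)
        except (TypeError, ValueError):  # malformed date, e.g. "Expires: 0"
            return False
        if expires_at.tzinfo is None:
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        return expires_at > now
    return False  # no cache-related fields: assume not cacheable
```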
Table 2:
Step S13: when the page content pointed to by the URL address can be cached, determine whether that page content has been cached.
In this step, once the page content is determined to be cacheable, it is further determined whether it is already cached locally, so that the corresponding operation can be performed according to the result and the same page content is not cached locally twice, which would waste storage space.
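Putting steps S12 to S14 together, the top-level flow can be sketched as below. The callables `is_cacheable` and `fetch` are hypothetical placeholders for the checks of step S12 and the fetching of steps C1 to C3, and an in-memory dictionary stands in for the local cache:

```python
from typing import Callable, Dict, Optional

def handle_request(
    url: str,
    is_cacheable: Callable[[str], bool],      # step S12 (lists, regex or HTTP fields)
    fetch: Callable[[str], Optional[bytes]],  # cloud or third-party fetch
    local_cache: Dict[str, bytes],
) -> Optional[bytes]:
    """Steps S11 to S14 for one received URL."""
    if not is_cacheable(url):        # S12: not cacheable, pass straight through
        return fetch(url)
    if url in local_cache:           # S13: already cached locally, no duplicate copy
        return local_cache[url]
    body = fetch(url)                # S14: not cached yet, fetch and store
    if body is not None:
        local_cache[url] = body
    return body
```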
Step S14: when the page content pointed to by the URL address is not cached locally, cache the page content pointed to by the URL address.
Specifically, when caching the page content pointed to by the URL address, it is determined whether the page content has already been compressed; if not, the page content is compressed with gzip before being cached, which saves network traffic. gzip is short for GNU zip, a free-software compression program from the GNU project.
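A minimal sketch of the "compress only if not already compressed" rule with Python's gzip module. Detecting prior compression from the Content-Encoding header or the gzip magic bytes is an assumption, since the patent does not say how it is detected; `compress_for_cache` is a hypothetical name:

```python
import gzip
from typing import Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def compress_for_cache(body: bytes, content_encoding: Optional[str] = None) -> bytes:
    """Gzip the body before caching unless it is already gzip-compressed."""
    already_gzipped = ((content_encoding or "").strip().lower() == "gzip"
                       or body.startswith(GZIP_MAGIC))
    return body if already_gzipped else gzip.compress(body)
```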
The step of caching the page content pointed to by the URL address when that page content is not cached locally specifically comprises:
C1: when the page content pointed to by the URL address is not cached locally, establish a connection with the third-party website to fetch its page content; at the same time, determine whether the cloud has cached the page content pointed to by the URL address;
C2: when the cloud has cached the page content pointed to by the URL address, fetch that page content and cache it locally;
C3: when the cloud has not cached the page content pointed to by the URL address, determine whether the time spent fetching the page content from the third-party website exceeds a preset time threshold. If so, reset the connection time and data receiving time for the third-party website, fetch the page content of the third-party website again, and cache the fetched page content locally and in the cloud; if not, cache the fetched page content of the third-party website locally and in the cloud.
Fig. 2 is a schematic diagram of asynchronously looking up page content when the page content is not cached locally. In steps C1 to C3, when it is determined that the page content pointed to by the URL address is not cached locally, it is determined whether the cloud back end has cached the page content. Because fetching cached page content from the cloud back end takes only milliseconds, when the cloud back end has cached the page content, the cached content is fetched directly from the cloud back end and cached locally; when the cloud back end has not cached it, the page content pointed to by the URL address is fetched from the third-party website. During fetching, if the fetch time exceeds the preset time threshold, the connection time and data receiving time are reset at the network layer. The reset connection time is usually longer than the original one, which extends the connection with the third-party website; likewise, the reset data receiving time is usually longer than the original. Resetting the connection time and data receiving time increases the probability of successfully fetching the page content of the third-party website.
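Steps C1 to C3 can be sketched with asyncio, assuming two hypothetical coroutines: `fetch_origin` for the third-party website and `cloud_lookup` for the cloud back end, with dictionaries standing in for the local and cloud caches. A single longer timeout on the retry stands in for the patent's reset connection and data receiving times. This is an illustration, not the patented network-layer implementation:

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

async def fetch_on_local_miss(
    url: str,
    fetch_origin: Callable[[str], Awaitable[bytes]],
    cloud_lookup: Callable[[str], Awaitable[Optional[bytes]]],
    local: Dict[str, bytes],
    cloud: Dict[str, bytes],
    threshold_s: float = 1.0,
    retry_timeout_s: float = 5.0,
) -> bytes:
    loop = asyncio.get_running_loop()
    started = loop.time()
    # C1: start fetching from the third-party site and query the cloud at the same time.
    origin_task = asyncio.create_task(fetch_origin(url))
    cached = await cloud_lookup(url)
    if cached is not None:
        # C2: cloud hit; cache locally and stop the third-party fetch.
        origin_task.cancel()
        local[url] = cached
        return cached
    # C3: cloud miss; give the third-party fetch whatever remains of the threshold.
    remaining = max(0.0, threshold_s - (loop.time() - started))
    try:
        body = await asyncio.wait_for(origin_task, timeout=remaining)
    except asyncio.TimeoutError:
        # Threshold exceeded: fetch again with a longer time limit.
        body = await asyncio.wait_for(fetch_origin(url), timeout=retry_timeout_s)
    local[url] = body
    cloud[url] = body
    return body
```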
Specifically, the step in C2 of caching the fetched page content locally comprises:
determining whether the return code in the response header of the fetched page content is a normal value, and caching the fetched page content locally when it is.
The step in C3 of caching the fetched page content of the third-party website locally and in the cloud specifically comprises:
determining whether the return code in the response header of the fetched page content is a normal value, and caching the fetched page content locally and in the cloud when it is.
The normal value of the return code is generally 200 OK. When the return code is the normal value, the fetched page content is cached; when it is not, for example 404 or 503, the fetched page content is not cached.
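A minimal sketch of the return-code check, assuming 200 is the only normal value (the text says "generally 200 OK"). Passing a cloud dictionary selects the step C3 behaviour; `store_if_ok` is a hypothetical name:

```python
from http import HTTPStatus
from typing import Dict, Optional

def store_if_ok(url: str, status: int, body: bytes,
                local: Dict[str, bytes],
                cloud: Optional[Dict[str, bytes]] = None) -> bool:
    """Cache a fetched page only when its return code is the normal value 200 OK."""
    if status != HTTPStatus.OK:   # e.g. 404 or 503: do not cache
        return False
    local[url] = body             # step C2: local cache
    if cloud is not None:         # step C3: local and cloud caches
        cloud[url] = body
    return True
```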
Further, before the step of caching the fetched page content locally, or before the step of caching the fetched page content locally and in the cloud, the method comprises the following step:
determining whether the response header of the fetched page content contains user identity state information and, when the response header of the fetched page content of the third-party website contains such information, discarding it.
In this step, user identity state information is usually recorded in the Set-Cookie field. If the response header of the page content is found to contain a Set-Cookie field, the field is discarded before the fetched data is cached, to protect user privacy. Specifically, if the cloud has cached the page content but it is not cached locally, the page content is cached locally; if it is cached neither in the cloud nor locally, it is cached both locally and in the cloud.
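A sketch of discarding the user identity state before caching, assuming response headers are held as a list of name/value pairs (Set-Cookie may appear more than once) and that only Set-Cookie carries such state; `strip_identity_headers` is a hypothetical name:

```python
from typing import List, Tuple

Header = Tuple[str, str]

def strip_identity_headers(headers: List[Header]) -> List[Header]:
    """Return the headers without any Set-Cookie entries (name compared case-insensitively)."""
    return [(name, value) for name, value in headers if name.lower() != "set-cookie"]
```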
As another embodiment of the present invention, before the step in C2 of caching the fetched page content locally, or before the step in C3 of caching the fetched page content of the third-party website locally and in the cloud, the method comprises the following steps:
D1: determine whether the hash value of the fetched page content is identical to the hash value of recorded page content. In this step, after page content is fetched, its hash value is computed and recorded in memory. When the hash value of newly fetched page content is identical to any recorded hash value, the page contents corresponding to the two hash values are identical, and the cumulative response count of the page content corresponding to that hash value is incremented by 1.
D2: when the hash value of the fetched page content is identical to the hash value of recorded page content, determine whether the cumulative response count of the fetched page content is greater than or equal to a preset verification count, and when it is, determine that the fetched page content can be cached. In this step, the preset verification count is an integer greater than or equal to 1. After a piece of page content is fetched, if its cumulative response count is greater than or equal to the preset verification count, the page content has been requested many times and is judged to need caching, which speeds up later fetches of it; otherwise it is judged not to need caching.
D3: when the hash value of the fetched page content differs from the hash values of recorded page content, determine that the fetched page content cannot be cached.
In steps D1 to D3, when different user parameters repeatedly request the same page content, the page content is judged to need caching; when the same user parameter repeatedly requests the same page content and different user parameters request different page content, the different page content is judged to need caching; when the page content repeatedly requested by the same user differs each time, the page content is judged not to need caching. The user parameters include the user agent (UA) and the user IP address.
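Steps D1 to D3 can be sketched as a small counter keyed by content hash. SHA-256 is an assumption (the patent does not name a hash function), the `ResponseVerifier` class is hypothetical, and the count records how many later responses matched a hash already seen, which is one reading of "cumulative response count":

```python
import hashlib
from typing import Dict

class ResponseVerifier:
    """Decide whether a fetched page may be cached, based on repeated identical responses."""

    def __init__(self, verify_count: int = 1) -> None:
        if verify_count < 1:
            raise ValueError("verify_count must be an integer >= 1")
        self.verify_count = verify_count
        self.counts: Dict[str, int] = {}  # content hash -> cumulative match count

    def observe(self, body: bytes) -> bool:
        digest = hashlib.sha256(body).hexdigest()
        if digest not in self.counts:
            # D3: no identical recorded hash, so record it and do not cache yet.
            self.counts[digest] = 0
            return False
        # D1: identical hash found, so add 1 to its cumulative count.
        self.counts[digest] += 1
        # D2: cacheable once the count reaches the preset verification count.
        return self.counts[digest] >= self.verify_count
```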
As another embodiment of the present invention, when the page content pointed to by the URL address has already been cached locally, the cache time of the cached page content is compared with the current time to determine whether the difference between them lies within a preset time range; when it does, the page content is fetched again and cached. To keep the page content in the local page cache consistent with the page content stored on the server, the cloud back end can perform freshness checks on the cached page content. When a user requests a URL and page content for that URL is already cached locally, it is determined whether the locally cached page content has expired. If it has not expired, the cached page content is returned; if it has expired, the page cache is refreshed asynchronously.
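A sketch of the freshness check on a local hit. Returning the cached copy while triggering a refresh of an expired entry, rather than blocking on the refetch, is an assumption about what "asynchronous refresh" means here; `refresh` is a hypothetical callback, `max_age_s` a hypothetical preset limit, and `CacheEntry` a hypothetical record type:

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

@dataclass
class CacheEntry:
    body: bytes
    cached_at: float  # seconds since the epoch when the entry was stored

def serve_from_local(url: str, cache: Dict[str, CacheEntry], max_age_s: float,
                     refresh: Callable[[str], None],
                     now: Optional[float] = None) -> Optional[bytes]:
    entry = cache.get(url)
    if entry is None:
        return None  # local miss: handled by steps C1 to C3
    now = time.time() if now is None else now
    if now - entry.cached_at > max_age_s:
        refresh(url)  # expired: schedule a refresh of this entry
    return entry.body
```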
In the embodiments of the present invention, after an access request for a URL address is received, it is determined whether the page content pointed to by the URL address can be cached; when it can, it is determined whether the page content has already been cached; if it has not, the page content is fetched from the cloud or from a third-party website, and whether to cache the fetched page content is decided by judging whether it needs to be cached. Because page content passes through multiple layers of checks before it is cached, the cached page content is more comprehensive and more accurate.
Embodiment 2:
Fig. 3 shows the structural diagram of a resource caching device provided by the second embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown.
The resource caching device comprises: an access request receiving unit 31, a cache identifier judging unit 32, a cache content judging unit 33, and a page content caching unit 34, wherein:
The access request receiving unit 31 is configured to receive an access request for a uniform resource locator (URL) address.
The cache identifier judging unit 32 is configured to determine, according to the access request for the URL address, whether the page content pointed to by the URL address can be cached.
When the cache identifier judging unit 32 determines whether page content can be cached by examining the Cache-Control field of the HTTP response to the access request for the URL address, the cache identifier judging unit 32 comprises:
a field parsing module, configured to parse the HTTP response to the access request for the URL address;
a cacheability determination module, configured to determine that the page content pointed to by the URL address can be cached when the HTTP response identifies the page content as cacheable, and that it cannot be cached when the HTTP response identifies the page content as not cacheable.
In the cache identifier judging unit 32, the field parsing module parses the HTTP response, and the cacheability determination module then determines whether the page content can be cached according to the cache-related fields in the HTTP response. These fields include, but are not limited to, Cache-Control and Expires.
As another embodiment of the present invention, the resource caching device may also determine whether the page content pointed to by the URL address can be cached by parsing the domain name of the URL address and comparing the parsed domain name with the domain names in a pre-stored whitelist. In this case, the resource caching device comprises:
a domain name parsing unit, configured to parse the domain name of the URL address;
a whitelist comparing unit, configured to compare the parsed domain name with the domain names stored in advance in the whitelist and, when the whitelist contains a domain name identical to the parsed domain name, determine that the page content pointed to by the URL address can be cached, the whitelist storing the domain names whose page content can be cached.
Setting up a whitelist speeds up determining whether a piece of page content can be cached. The whitelist stores the domain names whose content can be cached, including the domain names of websites known in advance to be safe and the domain names of poorly performing third-party websites.
As another embodiment of the present invention, the resource caching device may also determine whether the page content pointed to by the URL address can be cached by parsing the domain name of the URL address and comparing the parsed domain name with the domain names in a pre-stored blacklist. In this case, the resource caching device comprises:
a domain name parsing unit, configured to parse the domain name of the URL address;
a blacklist comparing unit, configured to compare the parsed domain name with the domain names stored in advance in the blacklist and, when the blacklist contains a domain name identical to the parsed domain name, determine that the page content pointed to by the URL address cannot be cached, the blacklist storing the domain names whose page content cannot be cached.
The blacklist stores in advance the domain names of websites known to be unsafe. A website in the blacklist may indicate that its page content can be cached, but based on historical records it has been judged to be a risky website.
Besides using a whitelist and a blacklist, whether the page content pointed to by a URL address is cacheable can also be determined by matching the URL address against regular expressions. As another embodiment of the present invention, the resource caching apparatus includes:
Regular expression matching unit, configured to match the URL address against predetermined standard regular expressions. When the URL address matches a standard regular expression, the page content pointed to by the URL address is determined to be cacheable; when the match fails, the page content pointed to by the URL address is determined not to be cacheable.
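A minimal sketch of the regular expression matching unit, assuming the URL must fully match at least one standard pattern. The two patterns are hypothetical placeholders; the patent derives its patterns from trained page models and does not publish them.

```python
import re

# Hypothetical "standard" patterns standing in for the offline-generated ones.
STANDARD_PATTERNS = [
    re.compile(r"https?://[^/]+/static/.+\.(?:js|css|png|jpg|gif)"),
    re.compile(r"https?://[^/]+/news/\d{4}/\d{2}/\d+\.html"),
]


def cacheable_by_regex(url: str, patterns=STANDARD_PATTERNS) -> bool:
    """Cacheable if the whole URL matches at least one standard pattern."""
    return any(p.fullmatch(url) for p in patterns)
```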
To speed up recognition of URL address types, the standard regular expressions used for matching can be preprocessed as follows: (1) Obtain regular expressions for matching the URLs at which page content resides: an offline system determines the common features of pages, collects page samples according to those features, trains a model on the collected samples, aggregates the resulting models, and outputs the corresponding regular expressions. (2) Optimize the obtained regular expressions into standard regular expressions by reducing their number and shortening their length. (3) Package the optimized regular expressions in advance into a lib library and integrate it into the server. A lib library is a static library; integrating the packaged regular expressions into the server helps speed up calls to them.
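Steps (2) and (3) can be approximated in Python as below. This is an assumption-laden analogue rather than the described process: duplicates are removed and the remaining patterns are merged into a single alternation (which reduces the number of expressions but does not shorten them), and compiling once at module load stands in for packaging the expressions into a static lib library.

```python
import re
from typing import Iterable


def build_standard_regex(patterns: Iterable[str]) -> re.Pattern:
    """Merge offline-generated URL patterns into one compiled expression.

    Duplicates are dropped and the rest joined into a single alternation;
    sorting only makes the resulting pattern string deterministic.
    """
    unique = sorted(set(patterns))
    # Wrap each pattern in a non-capturing group so "|" binds per pattern.
    return re.compile("|".join(f"(?:{p})" for p in unique))


# Hypothetical patterns; the second is a duplicate, as two trained models
# could plausibly emit the same expression.
STANDARD_REGEX = build_standard_regex([
    r"https?://[^/]+/static/.+\.(?:js|css)",
    r"https?://[^/]+/static/.+\.(?:js|css)",
    r"https?://[^/]+/img/\d+\.(?:png|jpg)",
])  # compiled once at import time, then reused for every request


def matches_standard(url: str) -> bool:
    """True if the whole URL matches any of the merged standard patterns."""
    return STANDARD_REGEX.fullmatch(url) is not None
```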
Cache content judging unit 33, configured to determine, when the page content pointed to by the URL address is cacheable, whether that page content has already been cached.
Specifically, the unit checks whether the hash values of the cached page content include a hash value equal to that of the page content pointed to by the URL address; if so, the page content pointed to by the URL address is determined to be cached.
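The hash comparison can be sketched as a cache whose entries are keyed by a SHA-256 digest of the page content. The text does not say how the hash of the requested page is known before it is fetched; this sketch assumes a URL-to-hash record kept from earlier fetches. `HashIndexedCache` and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib
from typing import Dict, Optional


class HashIndexedCache:
    """Local page cache whose entries are keyed by a SHA-256 content hash."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}   # content hash -> page bytes
        self._url_hash: Dict[str, str] = {}  # URL -> hash of its last content

    @staticmethod
    def digest(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def put(self, url: str, content: bytes) -> str:
        """Store the content under its hash and record that hash for the URL."""
        h = self.digest(content)
        self._store[h] = content
        self._url_hash[url] = h
        return h

    def is_cached(self, url: str) -> bool:
        """True if a cached entry has a hash equal to this URL's content hash."""
        h = self._url_hash.get(url)
        return h is not None and h in self._store

    def get(self, url: str) -> Optional[bytes]:
        h = self._url_hash.get(url)
        return self._store.get(h) if h is not None else None
```

A side effect of content-keyed storage is that identical pages reached through different URLs are stored only once.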
Page content caching unit 34, configured to cache the page content pointed to by the URL address when that page content has not been cached.
The page content caching unit 34 includes:
Third-party website connection module, configured to establish a connection with the third-party website when the page content pointed to by the URL address is not cached locally, so as to fetch the page content from that third-party website, and, at the same time, to determine whether the cloud has cached the page content pointed to by the URL address.
Cloud page content processing module, configured to, when the cloud has cached the page content pointed to by the URL address, fetch that page content, determine whether the return code in the response header of the fetched page content is a normal value, and, if it is, cache the fetched page content locally.
Third-party website page content processing module, configured to, when the cloud has not cached the page content pointed to by the URL address, determine whether the time spent fetching page content from the third-party website exceeds a preset time threshold. If it does, the module resets the connection time and data reception time for the third-party website and fetches the page content from the third-party website again. In either case, the module then determines whether the return code in the response header of the fetched page content is a normal value and, if it is, caches the fetched page content both locally and in the cloud.
When the page content pointed to by the URL address is not cached locally, page content requests are sent simultaneously to the cloud and to the third-party website hosting the content. Thus, when the cloud has cached the page content, it can be fetched within milliseconds; when the cloud has not, the request sent to the cloud does not delay fetching the page content from the third-party website.
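The parallel cloud and origin fetch can be sketched with `concurrent.futures`. Assumptions not found in the source: the cloud lookup, cloud store, and origin fetch are passed in as plain callables; the "normal value" of the return code is HTTP 200; and "resetting the connection time and data reception time" is modeled as abandoning the slow request and issuing a fresh one with a new timeout.

```python
import concurrent.futures as cf
from dataclasses import dataclass, field
from typing import Callable, Dict, MutableMapping, Optional

# Assumption: the "normal value" of the return code is taken to be HTTP 200.
NORMAL_STATUS = 200


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def fetch_and_cache(url: str,
                    cloud_lookup: Callable[[str], Optional[Response]],
                    cloud_store: Callable[[str, Response], None],
                    origin_fetch: Callable[[str], Response],
                    local: MutableMapping[str, Response],
                    timeout_s: float = 2.0) -> Response:
    """Handle a local cache miss by querying the cloud and the origin in parallel."""
    pool = cf.ThreadPoolExecutor(max_workers=3)
    try:
        # Fire both requests at once, so a cloud miss adds no extra latency.
        origin_future = pool.submit(origin_fetch, url)
        cloud_resp = pool.submit(cloud_lookup, url).result()

        if cloud_resp is not None:
            # Cloud hit: keep it locally only if the return code is normal.
            if cloud_resp.status == NORMAL_STATUS:
                local[url] = cloud_resp
            return cloud_resp

        try:
            resp = origin_future.result(timeout=timeout_s)
        except cf.TimeoutError:
            # Over the time threshold: start a fresh request, which resets the
            # connection and data-receive timers. A second timeout propagates.
            resp = pool.submit(origin_fetch, url).result(timeout=timeout_s)

        if resp.status == NORMAL_STATUS:
            local[url] = resp        # cache locally ...
            cloud_store(url, resp)   # ... and in the cloud
        return resp
    finally:
        # Do not wait for an abandoned origin request (e.g. after a cloud hit).
        pool.shutdown(wait=False, cancel_futures=True)
```

Submitting the origin request before waiting on the cloud is what keeps a cloud miss from costing extra time; after a cloud hit the origin request is simply abandoned. `cancel_futures` requires Python 3.9 or later.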
As another embodiment of the present invention, the apparatus further includes the following unit, which operates before page content is cached:
User identity state information judging unit, configured to determine whether the response header of the fetched page content contains user identity state information and, when the response header of the page content fetched from the third-party website contains such information, discard it.
Before page content is cached, any user identity state information contained in its response header must be discarded to protect user privacy. User identity state information is usually carried in the Set-Cookie field.
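Discarding user identity state information can be as simple as filtering the Set-Cookie field out of the response headers before storing them. The sketch assumes headers arrive as (name, value) pairs, since Set-Cookie may occur several times in one response, and compares field names case-insensitively, as HTTP requires.

```python
from typing import Iterable, List, Tuple

# The text names Set-Cookie as the usual carrier of user identity state.
IDENTITY_FIELDS = frozenset({"set-cookie"})


def strip_identity_headers(headers: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the response headers with user identity state fields removed.

    Headers are (name, value) pairs because Set-Cookie may repeat within one
    response; names are compared case-insensitively.
    """
    return [(name, value) for name, value in headers
            if name.lower() not in IDENTITY_FIELDS]
```

Other fields could be added to `IDENTITY_FIELDS` if a deployment considers them identifying; the source names only Set-Cookie.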
As another embodiment of the present invention, the apparatus further includes the following units, which operate before page content is cached:
Same page content judging unit, configured to determine whether the hash value of the fetched page content is identical to the hash value of the recorded page content.
Page content cacheability judging unit, configured to, when the hash value of the fetched page content is identical to that of the recorded page content, determine whether the number of times the page content has been fetched is greater than or equal to a preset fetch count and, if it is, determine that the fetched page content is cacheable. When the hash value of the fetched page content differs from that of the recorded page content, the fetched page content is determined not to be cacheable.
In this embodiment, the hash value of the page content is used to determine whether the page content has been requested before, and only page content fetched at least the preset number of times is cached; other page content is not cached. Screening page content before caching in this way reduces both wrongly cached content and missed content.
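The repeat-fetch screening can be sketched as a small gate object. Assumptions: records are kept per URL, SHA-256 serves as the hash, the default threshold of 3 is arbitrary, and a changed hash restarts the count. As the text specifies, the first fetch of any content is never cached, because there is no matching recorded hash yet.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass
class _Record:
    digest: str
    fetch_count: int


class RepeatFetchGate:
    """Allow caching only after identical content has been fetched often enough."""

    def __init__(self, min_fetches: int = 3) -> None:
        self.min_fetches = min_fetches
        self._records: Dict[str, _Record] = {}

    def should_cache(self, url: str, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        rec = self._records.get(url)
        if rec is None or rec.digest != digest:
            # First fetch, or content differs from the record: not cacheable.
            self._records[url] = _Record(digest, 1)
            return False
        # Same hash as recorded: count this fetch and compare with the threshold.
        rec.fetch_count += 1
        return rec.fetch_count >= self.min_fetches
```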
As another embodiment of the present invention, the apparatus includes:
Page content updating unit, configured to, when the page content pointed to by the URL address has been cached, compare the cache time of the cached page content with the current time, determine whether the difference between them falls within a preset time range, and, if it does, fetch the page content again and cache it.
Performing this freshness check on locally cached page content helps keep the content in the local page cache consistent with the page content stored on the server.
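The freshness check can be sketched as follows. The text only says the age of the cached entry is compared against "a preset time range"; the sketch models that range as inclusive bounds `(low, high)` and uses a hypothetical default meaning "cached one hour ago or earlier". The `now` parameter exists so the function can be tested without a real clock.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical preset range for the age of a cached entry, in seconds.
REFRESH_RANGE: Tuple[float, float] = (3600.0, float("inf"))


def refresh_if_due(url: str,
                   cache: Dict[str, Tuple[float, bytes]],
                   fetch: Callable[[str], bytes],
                   refresh_range: Tuple[float, float] = REFRESH_RANGE,
                   now: Optional[float] = None) -> bool:
    """Refetch and re-cache `url` if its cache age falls in `refresh_range`.

    `cache` maps URL -> (cache time as a Unix timestamp, page bytes).
    Returns True when the entry was refreshed.
    """
    if url not in cache:
        return False
    now = time.time() if now is None else now
    cached_at, _ = cache[url]
    low, high = refresh_range
    if low <= now - cached_at <= high:
        cache[url] = (now, fetch(url))  # refetch and record the new cache time
        return True
    return False
```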
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the foregoing embodiments may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (19)

1. A resource caching method, characterized in that the method comprises the following steps:
receiving an access request for a uniform resource locator (URL) address;
determining, according to the access request for the URL address, whether the page content pointed to by the URL address is cacheable;
when the page content pointed to by the URL address is cacheable, determining whether the page content pointed to by the URL address has been cached; and
when the page content pointed to by the URL address is not cached locally, caching the page content pointed to by the URL address.
2. The method according to claim 1, characterized in that, before the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address is cacheable, the method comprises the following steps:
parsing the domain name of the URL address; and
comparing the parsed domain name with the domain names pre-stored in a whitelist and, when the whitelist contains a domain name identical to the parsed domain name, determining that the page content pointed to by the URL address is cacheable, wherein the whitelist stores domain names whose page content is cacheable.
3. The method according to claim 1, characterized in that, before the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address is cacheable, the method comprises the following steps:
parsing the domain name of the URL address; and
comparing the parsed domain name with the domain names pre-stored in a blacklist and, when the blacklist contains a domain name identical to the parsed domain name, determining that the page content pointed to by the URL address is not cacheable, wherein the blacklist stores domain names whose page content is not cacheable.
4. The method according to claim 1, characterized in that, before the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address is cacheable, the method comprises the following step:
matching the URL address against predetermined standard regular expressions; when the match succeeds, determining that the page content pointed to by the URL address is cacheable; and when the match fails, determining that the page content pointed to by the URL address is not cacheable.
5. The method according to claim 1, characterized in that the step of determining, according to the access request for the URL address, whether the page content pointed to by the URL address is cacheable specifically comprises:
parsing the HTTP response to the access request for the URL address; and
when the HTTP response identifies the page content as cacheable, determining that the page content pointed to by the URL address is cacheable; when the HTTP response identifies the page content as not cacheable, determining that the page content pointed to by the URL address is not cacheable.
6. The method according to claim 1, characterized in that the step of caching the page content pointed to by the URL address when that page content is not cached locally specifically comprises:
when the page content pointed to by the URL address is not cached locally, establishing a connection with a third-party website to fetch the page content from the third-party website and, at the same time, determining whether the cloud has cached the page content pointed to by the URL address;
when the cloud has cached the page content pointed to by the URL address, fetching that page content and caching the fetched page content locally; and
when the cloud has not cached the page content pointed to by the URL address, determining whether the time spent fetching the page content from the third-party website exceeds a preset time threshold; if so, resetting the connection time and data reception time for the third-party website, fetching the page content from the third-party website again, and caching the fetched page content locally and in the cloud; if not, caching the page content fetched from the third-party website locally and in the cloud.
7. The method according to claim 6, characterized in that:
the step of caching the fetched page content locally specifically comprises:
determining whether the return code in the response header of the fetched page content is a normal value and, when it is, caching the fetched page content locally; and
the step of caching the page content fetched from the third-party website locally and in the cloud specifically comprises:
determining whether the return code in the response header of the fetched page content is a normal value and, when it is, caching the fetched page content locally and in the cloud.
8. The method according to claim 7, characterized in that, before the step of caching the fetched page content locally, or before the step of caching the fetched page content locally and in the cloud, the method comprises the following step:
determining whether the response header of the fetched page content contains user identity state information and, when the response header of the page content fetched from the third-party website contains user identity state information, discarding the user identity state information.
9. The method according to claim 6, characterized in that, before the step of caching the fetched page content locally, or before the step of caching the page content fetched from the third-party website locally and in the cloud, the method comprises the following steps:
determining whether the hash value of the fetched page content is identical to the hash value of the recorded page content;
when the hash values are identical, determining whether the number of times the page content has been fetched is greater than or equal to a preset fetch count and, when it is, determining that the fetched page content is cacheable; and
when the hash values are not identical, determining that the fetched page content is not cacheable.
10. The method according to claim 1, characterized in that, when the page content pointed to by the URL address has been cached locally, the cache time of the cached page content is compared with the current time to determine whether the difference between them falls within a preset time range and, when it does, the page content is fetched again and cached.
11. A resource caching apparatus, characterized in that the apparatus comprises:
an access request receiving unit, configured to receive an access request for a uniform resource locator (URL) address;
a cacheability judging unit, configured to determine, according to the access request for the URL address, whether the page content pointed to by the URL address is cacheable;
a cache content judging unit, configured to determine, when the page content pointed to by the URL address is cacheable, whether that page content has been cached; and
a page content caching unit, configured to cache the page content pointed to by the URL address when that page content is not cached locally.
12. The apparatus according to claim 11, characterized in that the apparatus comprises:
a domain name parsing unit, configured to parse the domain name of the URL address; and
a whitelist comparison unit, configured to compare the parsed domain name with the domain names pre-stored in a whitelist and, when the whitelist contains a domain name identical to the parsed domain name, determine that the page content pointed to by the URL address is cacheable, wherein the whitelist stores domain names whose page content is cacheable.
13. The apparatus according to claim 11, characterized in that the apparatus comprises:
a domain name parsing unit, configured to parse the domain name of the URL address; and
a blacklist comparison unit, configured to compare the parsed domain name with the domain names pre-stored in a blacklist and, when the blacklist contains a domain name identical to the parsed domain name, determine that the page content pointed to by the URL address is not cacheable, wherein the blacklist stores domain names whose page content is not cacheable.
14. The apparatus according to claim 11, characterized in that the apparatus comprises:
a regular expression matching unit, configured to match the URL address against predetermined standard regular expressions, determine that the page content pointed to by the URL address is cacheable when the match succeeds, and determine that the page content pointed to by the URL address is not cacheable when the match fails.
15. The apparatus according to claim 11, characterized in that the cacheability judging unit comprises:
a field parsing module, configured to parse the HTTP response to the access request for the URL address; and
a cacheability determination module, configured to determine that the page content pointed to by the URL address is cacheable when the HTTP response identifies the page content as cacheable, and that it is not cacheable when the HTTP response identifies the page content as not cacheable.
16. The apparatus according to claim 11, characterized in that the page content caching unit comprises:
a third-party website connection module, configured to establish a connection with a third-party website when the page content pointed to by the URL address is not cached locally, so as to fetch the page content from the third-party website, and, at the same time, to determine whether the cloud has cached the page content pointed to by the URL address;
a cloud page content processing module, configured to, when the cloud has cached the page content pointed to by the URL address, fetch that page content, determine whether the return code in the response header of the fetched page content is a normal value, and, when it is, cache the fetched page content locally; and
a third-party website page content processing module, configured to, when the cloud has not cached the page content pointed to by the URL address, determine whether the time spent fetching the page content from the third-party website exceeds a preset time threshold; if so, reset the connection time and data reception time for the third-party website and fetch the page content from the third-party website again; and in either case, determine whether the return code in the response header of the fetched page content is a normal value and, when it is, cache the fetched page content locally and in the cloud.
17. The apparatus according to claim 16, characterized in that the apparatus further comprises:
a user identity state information judging unit, configured to determine whether the response header of the fetched page content contains user identity state information and, when the response header of the page content fetched from the third-party website contains user identity state information, discard the user identity state information.
18. The apparatus according to claim 16, characterized in that the apparatus further comprises:
a same page content judging unit, configured to determine whether the hash value of the fetched page content is identical to the hash value of the recorded page content; and
a page content cacheability judging unit, configured to, when the hash values are identical, determine whether the number of times the page content has been fetched is greater than or equal to a preset fetch count and, when it is, determine that the fetched page content is cacheable; and, when the hash values are not identical, determine that the fetched page content is not cacheable.
19. The apparatus according to claim 11, characterized in that the apparatus comprises:
a page content updating unit, configured to, when the page content pointed to by the URL address has been cached locally, compare the cache time of the cached page content with the current time, determine whether the difference between them falls within a preset time range, and, when it does, fetch the page content again and cache it.
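Claims 5 and 15 decide cacheability from the HTTP response but do not name the fields consulted. A minimal sketch, assuming the `Cache-Control` response header is used and that the `no-store` or `private` directive marks content that a shared, server-side cache must not store, which is how HTTP caching (RFC 9111) treats those directives.

```python
from typing import Mapping

# Directives under which a shared cache must not store a response, per HTTP
# caching rules; the claims themselves do not name any directive.
NON_CACHEABLE_DIRECTIVES = frozenset({"no-store", "private"})


def response_allows_caching(headers: Mapping[str, str]) -> bool:
    """Decide cacheability from the Cache-Control header of an HTTP response."""
    cache_control = next(
        (value for name, value in headers.items() if name.lower() == "cache-control"),
        "",
    )
    # Keep only directive names, e.g. "max-age=600" -> "max-age".
    directives = {
        part.split("=", 1)[0].strip().lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    return not (directives & NON_CACHEABLE_DIRECTIVES)
```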
CN201410228106.7A 2014-05-27 2014-05-27 Resource caching method and apparatus Pending CN105302801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410228106.7A CN105302801A (en) 2014-05-27 2014-05-27 Resource caching method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410228106.7A CN105302801A (en) 2014-05-27 2014-05-27 Resource caching method and apparatus

Publications (1)

Publication Number Publication Date
CN105302801A true CN105302801A (en) 2016-02-03

Family

ID=55200080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410228106.7A Pending CN105302801A (en) 2014-05-27 2014-05-27 Resource caching method and apparatus

Country Status (1)

Country Link
CN (1) CN105302801A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017198145A1 (en) * 2016-05-20 2017-11-23 中兴通讯股份有限公司 Processing method and device for scheduling rule of uniform resource locator
CN107508807A (en) * 2017-08-14 2017-12-22 平伟功 A kind of web page contents renewal, the method and system of data storage
WO2018112685A1 (en) * 2016-12-19 2018-06-28 深圳中兴力维技术有限公司 Method, device and system for processing high concurrent hypertext transfer protocol request
CN108259198A (en) * 2016-12-28 2018-07-06 ***通信集团辽宁有限公司 A kind of pre-judging method, device and the equipment of domain name cache hit rate
CN109586937A (en) * 2017-09-28 2019-04-05 中兴通讯股份有限公司 A kind of O&M method, equipment and the storage medium of caching system
CN109639801A (en) * 2018-12-17 2019-04-16 深圳市网心科技有限公司 Back end distribution and data capture method and system
CN109922368A (en) * 2019-02-18 2019-06-21 青岛海信电器股份有限公司 A kind of image display method and smart television based on Webpage
CN111563216A (en) * 2020-07-16 2020-08-21 平安国际智慧城市科技股份有限公司 Local data caching method and device and related equipment
CN112100541A (en) * 2020-08-24 2020-12-18 浙江三维万易联科技有限公司 Website page element loading method and device, electronic device and storage medium
CN113971057A (en) * 2020-07-22 2022-01-25 北京奇虎科技有限公司 Page component information caching method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080201331A1 (en) * 2007-02-15 2008-08-21 Bjorn Marius Aamodt Eriksen Systems and Methods for Cache Optimization
CN101287013A (en) * 2008-05-30 2008-10-15 杭州华三通信技术有限公司 Method for updating Webpage and Web proxy device
CN102810101A (en) * 2011-06-03 2012-12-05 北京搜狗科技发展有限公司 Webpage pre-reading method and device and browser
CN103595702A (en) * 2012-08-17 2014-02-19 中兴通讯股份有限公司 Selection method and apparatus for content providing device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107404392A (en) * 2016-05-20 2017-11-28 中兴通讯股份有限公司 The processing method and processing device of the scheduling rule of uniform resource position mark URL
WO2017198145A1 (en) * 2016-05-20 2017-11-23 中兴通讯股份有限公司 Processing method and device for scheduling rule of uniform resource locator
WO2018112685A1 (en) * 2016-12-19 2018-06-28 深圳中兴力维技术有限公司 Method, device and system for processing high concurrent hypertext transfer protocol request
CN108259198B (en) * 2016-12-28 2021-08-06 ***通信集团辽宁有限公司 Method, device and equipment for prejudging domain name cache hit rate
CN108259198A (en) * 2016-12-28 2018-07-06 ***通信集团辽宁有限公司 A kind of pre-judging method, device and the equipment of domain name cache hit rate
CN107508807A (en) * 2017-08-14 2017-12-22 平伟功 A kind of web page contents renewal, the method and system of data storage
CN109586937A (en) * 2017-09-28 2019-04-05 中兴通讯股份有限公司 A kind of O&M method, equipment and the storage medium of caching system
CN109586937B (en) * 2017-09-28 2022-03-15 中兴通讯股份有限公司 Operation and maintenance method, equipment and storage medium of cache system
CN109639801A (en) * 2018-12-17 2019-04-16 深圳市网心科技有限公司 Back end distribution and data capture method and system
CN109922368A (en) * 2019-02-18 2019-06-21 青岛海信电器股份有限公司 A kind of image display method and smart television based on Webpage
CN111563216A (en) * 2020-07-16 2020-08-21 平安国际智慧城市科技股份有限公司 Local data caching method and device and related equipment
CN113971057A (en) * 2020-07-22 2022-01-25 北京奇虎科技有限公司 Page component information caching method, device, equipment and storage medium
CN112100541A (en) * 2020-08-24 2020-12-18 浙江三维万易联科技有限公司 Website page element loading method and device, electronic device and storage medium
CN112100541B (en) * 2020-08-24 2024-04-02 三维通信股份有限公司 Method and device for loading website page element, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN105302801A (en) Resource caching method and apparatus
US10652265B2 (en) Method and apparatus for network forensics compression and storage
US9218482B2 (en) Method and device for detecting phishing web page
CN106657044B (en) Web page address hopping method for improving the security protection of a website system
CN101582887B (en) Safety protection method, gateway device and safety protection system
CN103118007B (en) Method and system for acquiring user access behavior
CN104714965B (en) Static resource de-duplication method, static resource management method and device
CN109768992B (en) Webpage malicious scanning processing method and device, terminal device and readable storage medium
CN103577482B (en) Webpage collection method, device and browser
CN107241300B (en) User request intercepting method and device
CN105635064B (en) CSRF attack detection method and device
CN107528812B (en) Attack detection method and device
CN104572777A (en) Webpage loading method and device based on UIWebView component
CN113518077A (en) Malicious web crawler detection method, device, equipment and storage medium
CN105337993A (en) Dynamic and static combination-based mail security detection device and method
CN109660552A (en) Web defense method combining address hopping and WAF technology
CN108429785A (en) Method for generating a crawler identification encrypted string, crawler identification method and device
CN104253785A (en) Dangerous web address identification method, device and system
CN102664872A (en) System used for detecting and preventing attack to server in computer network and method thereof
CN113469866A (en) Data processing method and device and server
CN102130791A (en) Method, device and gateway server for detecting agent on gateway server
CN103152325A (en) Method and device for preventing Internet access in sharing mode
CN109726340A (en) Method and device for querying uniform resource locator classification
CN107040606B (en) Method and device for processing http request
EP2175589A1 (en) Content control method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221115

Address after: 1402, Floor 14, Block A, Haina Baichuan Headquarters Building, No. 6, Baoxing Road, Haibin Community, Xin'an Street, Bao'an District, Shenzhen, Guangdong 518133

Applicant after: Shenzhen Yayue Technology Co.,Ltd.

Address before: Room 403, East, Building 2, SEG Science and Technology Park, Zhenxing Road, Futian District, Shenzhen, Guangdong 518044

Applicant before: TENCENT TECHNOLOGY (SHENZHEN) Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20160203