CN106227823A - Webpage update detection method, and webpage information capture and presentation method
- Publication number
- CN106227823A (application number CN201610587575.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9574—Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a webpage update detection method and a webpage information capture and presentation method. The webpage update detection method comprises: S1, analyzing the frame structure of the webpage at a predetermined URL and determining a capture information region; S2, comparing the information in the capture information region with the locally stored information for similarity; S3, judging that the webpage at this URL has been updated when the similarity is below a set threshold, and otherwise judging that it has not been updated. The method can accurately identify the updated information of a webpage, so it is not misled by irrelevant updates and avoids wasting time and bandwidth.
Description
Technical field
The present invention relates to the technical field of web information processing, and in particular to a webpage update detection method and a webpage information capture and presentation method.
Background
Chinese invention patent application No. 201310007246.7, titled "A capture method based on detecting the update cycle of web pages", judges whether a webpage has been updated by obtaining the page's update time. If the page's update time differs from the update time in the historical information, the page acquisition mode is set to fetch (GET); if the two times are identical, the page acquisition mode is set to check (CHK). The drawback of this scheme is that judging by the webpage's update time can be misleading: a capture is started even when the update is minor or concerns information that is of no interest.
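The time-based decision described above can be sketched as follows. This is a minimal illustration only; the function name `choose_fetch_mode` and the sample timestamps are assumptions and do not come from the cited application.

```python
from datetime import datetime


def choose_fetch_mode(page_updated_at: datetime, stored_updated_at: datetime) -> str:
    """Return "GET" when the page's update time differs from the stored one, else "CHK"."""
    return "GET" if page_updated_at != stored_updated_at else "CHK"


stored_time = datetime(2016, 7, 20, 9, 0)
# Hypothetical case: only an unimportant part of the page changed, but the
# timestamp still moved, so a full capture is started anyway.
minor_edit_time = datetime(2016, 7, 21, 9, 0)
mode_after_minor_edit = choose_fetch_mode(minor_edit_time, stored_time)
mode_when_unchanged = choose_fetch_mode(stored_time, stored_time)
```

Because the decision looks only at the timestamp, it cannot tell a relevant change from an irrelevant one, which is the drawback the paragraph above points out.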
The above background is disclosed only to assist in understanding the inventive concept and technical solution of the present invention. It does not necessarily belong to the prior art of this patent application, and in the absence of clear evidence that the above content was disclosed before the filing date of this application, it should not be used to evaluate the novelty and inventiveness of this application.
Summary of the invention
The main purpose of the present invention is to propose a webpage update detection method that solves the technical problem of the above prior art, namely that judging by the time information of webpage updates can be misleading.
To this end, the present invention proposes a webpage update detection method, comprising: S1, analyzing the frame structure of the webpage at a predetermined URL and determining a capture information region; S2, comparing the information in the capture information region with the locally stored information for similarity; S3, judging that the webpage at this URL has been updated when the similarity is below a set threshold, and otherwise judging that it has not been updated.
Preferably, the present invention may also have the following technical features:
Judging the similarity between the information in the capture information region and the local information comprises the following steps: S201, taking a screenshot of the capture information region and binarizing it to obtain a binarized image; S202, comparing the binarized image so obtained with the locally stored binarized image of the same URL webpage; S203, judging from the comparison result whether there is an update.
The method further comprises step S204: when the comparison result is judged as no update, enlarging the capture information region determined in step S1 by a set factor and returning to step S201 at least once.
Alternatively, judging the similarity between the information in the capture information region and the local information comprises the following steps: S301, determining the code lines in which the capture information region is located; S302, capturing the specific information corresponding to those code lines; S303, comparing that specific information with the locally stored specific information of the same URL webpage; S304, judging from the comparison result whether there is an update.
The method further comprises step S305: when the comparison result is judged as no update, extending the capture information region determined in step S1 to at least one additional code line, adjacent or non-adjacent, and returning to step S301 at least once.
The present invention also proposes a webpage information capture method which, based on any of the webpage update detection methods described above, performs update detection on the webpage at a preset URL. When the result is that there is an update, the information is captured and the local copy is updated; when the result is that there is no update, nothing is captured and the existing local information is kept unchanged.
Preferably, the information capture performed when there is an update uses a targeted capture method that captures only the information within the capture information region.
The present invention further proposes a webpage information capture and presentation method which, based on any of the webpage update detection methods described above, performs update detection on the webpage at a preset URL. When the result is that there is an update, the information is captured and the local copy is updated; when the result is that there is no update, the existing local information is kept unchanged. When webpage information is presented, it is presented progressively, with the information of webpages without updates first and the information of webpages with updates afterwards.
Preferably, while information is being captured from the webpages with updates, the webpages without updates are presented, so as to shorten the waiting time for information presentation.
Also preferably, each captured webpage with updates is presented immediately, inserted one by one, while it is being updated to the local copy.
Compared with the prior art, the beneficial effects of the present invention include: because the capture information region is determined in advance and the update judgment is made for that region only, the misleading effect of irrelevant updates is avoided, webpage information capture tasks are started more accurately and effectively, and time and bandwidth resources are saved.
Brief description of the drawings
Fig. 1 is a schematic block diagram of the present invention;
Fig. 2 is a flow diagram of one specific embodiment of the present invention;
Fig. 3 is a flow diagram of another specific embodiment of the present invention.
Detailed description
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its application.
Non-limiting and non-exclusive embodiments are described with reference to Figs. 1-3 below, in which identical reference numerals denote identical parts unless otherwise stated.
In a webpage information capture and presentation method, update detection is first performed on the webpage at a preset URL. When the result is that there is an update, the information is captured and the local copy is updated; when the result is that there is no update, the existing local information is kept unchanged. When webpage information is presented, it is presented progressively, with the information of webpages without updates first and the information of webpages with updates afterwards.
A more preferred approach is for the information capture performed when there is an update to use a targeted capture method that captures only the information within the capture information region.
Another more preferred approach is to present the webpages without updates while information is being captured from the webpages with updates, so as to shorten the waiting time for information presentation.
In addition, each captured webpage with updates can be presented immediately, inserted one by one, while it is being updated to the local copy. In this way content is displayed as soon as it is captured, so the display of webpage content is continuous and pauses are reduced as far as possible.
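The presentation order described above can be sketched with a Python generator. This is a simplified, sequential illustration under stated assumptions: `present_pages`, `local_store`, `has_update` and `fetch` are hypothetical names, and the update check and the capture step are passed in as callables rather than implemented here.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def present_pages(
    urls: Iterable[str],
    local_store: Dict[str, str],
    has_update: Callable[[str], bool],
    fetch: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (url, info) pairs in presentation order."""
    pending = []
    for url in urls:
        if has_update(url):
            pending.append(url)          # updated page: capture it later
        else:
            yield url, local_store[url]  # no update: present the local copy first
    for url in pending:
        local_store[url] = fetch(url)    # capture and update the local copy
        yield url, local_store[url]      # present each page as soon as it is captured


# Hypothetical data: only page "a" has changed.
store = {"a": "old-a", "b": "old-b", "c": "old-c"}
order = list(present_pages(["a", "b", "c"], store, lambda u: u == "a", lambda u: "new-" + u))
```

Because the function is a generator, a caller can display each item as it arrives instead of waiting for every capture to finish. Running the capture in parallel with the presentation of unchanged pages, as the text suggests, would need threads or asynchronous I/O, which this sketch leaves out.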
As shown in Fig. 1, the method for performing update detection on the webpage at the preset URL comprises: S1, analyzing the frame structure of the webpage at the predetermined URL and determining a capture information region; S2, comparing the information in the capture information region with the locally stored information for similarity; S3, judging that the webpage at this URL has been updated when the similarity is below a set threshold, and otherwise judging that it has not been updated.
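Steps S1 to S3 can be sketched as follows. The patent does not specify a similarity measure, so this sketch assumes text similarity from Python's standard `difflib.SequenceMatcher`; the page is assumed to be already split into named regions, and the names `has_update`, `region` and the threshold value 0.9 are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict


def region_similarity(captured: str, local: str) -> float:
    """S2: similarity in [0, 1] between captured and locally stored region text."""
    return SequenceMatcher(None, captured, local).ratio()


def has_update(page: Dict[str, str], local: Dict[str, str],
               region: str, threshold: float = 0.9) -> bool:
    """S3: report an update only when the chosen region's similarity is below the threshold."""
    return region_similarity(page[region], local[region]) < threshold


# S1 is assumed done: the page is split into regions and "headline" is the capture region.
local_copy = {"headline": "Rates unchanged", "footer": "Visitors: 100"}
footer_only_change = {"headline": "Rates unchanged", "footer": "Visitors: 101"}
headline_change = {"headline": "Markets rally", "footer": "Visitors: 101"}

ignored = has_update(footer_only_change, local_copy, "headline")
detected = has_update(headline_change, local_copy, "headline")
```

A change outside the capture region (here the footer counter) does not trigger a capture, while a change inside it does.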
Judging the similarity between the information in the capture information region and the local information, as shown in Fig. 2, comprises the following steps: S201, taking a screenshot of the capture information region and binarizing it to obtain a binarized image; S202, comparing the binarized image so obtained with the locally stored binarized image of the same URL webpage; S203, judging from the comparison result whether there is an update. The method may further comprise step S204: when the comparison result is judged as no update, enlarging the capture information region determined in step S1 by a set factor and returning to step S201 at least once. The advantage of this similarity judgment method is that an accurate judgment can be made whether the updated content is text or graphics.
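Steps S201 to S204 can be sketched in plain Python. This is an illustration under several assumptions: screenshots are modelled as equally sized 2-D lists of grayscale values (0 to 255), the stored screenshot is binarized on the fly rather than kept as a binarized image, and the cutoff, threshold, enlargement factor and all function names are hypothetical.

```python
from typing import List, Tuple

Gray = List[List[int]]           # grayscale screenshot, values 0..255
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def crop(img: Gray, box: Box) -> Gray:
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def binarize(img: Gray, cutoff: int = 128) -> Gray:
    """S201: map each pixel to 1 (light) or 0 (dark)."""
    return [[1 if v >= cutoff else 0 for v in row] for row in img]


def similarity(a: Gray, b: Gray) -> float:
    """S202: fraction of pixels that are equal in two same-sized binary images."""
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total if total else 1.0


def enlarge(box: Box, factor: float, height: int, width: int) -> Box:
    """S204: grow the box around its centre by `factor`, clipped to the image."""
    top, left, bottom, right = box
    dh = int((bottom - top) * (factor - 1) / 2)
    dw = int((right - left) * (factor - 1) / 2)
    return (max(0, top - dh), max(0, left - dw),
            min(height, bottom + dh), min(width, right + dw))


def image_has_update(current: Gray, stored: Gray, box: Box,
                     threshold: float = 0.95, factor: float = 2.0,
                     retries: int = 1) -> bool:
    height, width = len(current), len(current[0])
    for _ in range(retries + 1):
        sim = similarity(binarize(crop(current, box)), binarize(crop(stored, box)))
        if sim < threshold:          # S203: below threshold means updated
            return True
        box = enlarge(box, factor, height, width)
    return False


stored_shot = [[0] * 4 for _ in range(4)]
current_shot = [row[:] for row in stored_shot]
current_shot[3][3] = 255             # a change just outside the initial region
centre = (1, 1, 3, 3)
found_after_enlarging = image_has_update(current_shot, stored_shot, centre)
missed_without_retry = image_has_update(current_shot, stored_shot, centre, retries=0)
```

In practice the grayscale grids would come from a screenshot tool and an image library; those are left out so the sketch stays self-contained.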
Alternatively, judging the similarity between the information in the capture information region and the local information, as shown in Fig. 3, comprises the following steps: S301, determining the code lines in which the capture information region is located; S302, capturing the specific information corresponding to those code lines; S303, comparing that specific information with the locally stored specific information of the same URL webpage; S304, judging from the comparison result whether there is an update. The method may further comprise step S305: when the comparison result is judged as no update, extending the capture information region determined in step S1 to at least one additional code line, adjacent or non-adjacent, and returning to step S301 at least once. The advantage of this similarity judgment method is that information capture is fast and presentation is more timely: the necessary information has already been captured while judging whether the webpage has been updated (judgment and capture are unified into a single task), so if the result is that there is an update, the information is presented and saved locally directly.
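Steps S301 to S305 can be sketched as follows. The sketch assumes the page source is available as a list of lines, that the capture region is given as line indices, and that similarity is measured with the standard library's `difflib.SequenceMatcher`; the function names and the threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Sequence, Tuple


def extract(lines: List[str], indices: Sequence[int]) -> str:
    """S302: collect the text of the chosen code lines."""
    return "\n".join(lines[i].strip() for i in indices if 0 <= i < len(lines))


def lines_have_update(current: List[str], stored: List[str],
                      indices: Sequence[int], extra: Sequence[int] = (),
                      threshold: float = 0.9) -> Tuple[bool, Optional[str]]:
    """Return (updated, captured_info); captured_info is ready to present and save."""
    regions = [list(indices)]                              # S301: initial code lines
    if extra:
        regions.append(list(indices) + list(extra))        # S305: extended region
    for region in regions:
        captured = extract(current, region)
        local = extract(stored, region)                    # S303: same lines, local copy
        if SequenceMatcher(None, captured, local).ratio() < threshold:
            return True, captured                          # S304: updated
    return False, None


stored_src = ["<h1>News</h1>", "<p>aaaaaaaaaa</p>", "<p>footer</p>"]
current_src = ["<h1>News</h1>", "<p>bbbbbbbbbb</p>", "<p>footer</p>"]

first_pass_only = lines_have_update(current_src, stored_src, [0])
with_extension = lines_have_update(current_src, stored_src, [0], extra=[1])
```

Returning the captured text together with the verdict reflects the point made above: when an update is found, the information needed for presentation is already in hand.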
Those skilled in the art will recognize that numerous variations of the above description are possible, so the embodiments are used only to describe one or more particular implementations.
Although what are regarded as example embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes and substitutions may be made without departing from the spirit of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the present invention without departing from the central concept described herein. Therefore, the present invention is not limited to the specific embodiments disclosed herein, but may include all embodiments falling within the scope of the invention and their equivalents.
Claims (10)
1. A webpage update detection method, characterized by comprising: S1, analyzing the frame structure of the webpage at a predetermined URL and determining a capture information region; S2, comparing the information in the capture information region with the locally stored information for similarity; S3, judging that the webpage at this URL has been updated when the similarity is below a set threshold, and otherwise judging that it has not been updated.
2. The webpage update detection method of claim 1, characterized in that judging the similarity between the information in the capture information region and the local information comprises the following steps: S201, taking a screenshot of the capture information region and binarizing it to obtain a binarized image; S202, comparing the binarized image so obtained with the locally stored binarized image of the same URL webpage; S203, judging from the comparison result whether there is an update.
3. The webpage update detection method of claim 2, characterized by further comprising step S204: when the comparison result is judged as no update, enlarging the capture information region determined in step S1 by a set factor and returning to step S201 at least once.
4. The webpage update detection method of claim 1, characterized in that judging the similarity between the information in the capture information region and the local information comprises the following steps: S301, determining the code lines in which the capture information region is located; S302, capturing the specific information corresponding to those code lines; S303, comparing that specific information with the locally stored specific information of the same URL webpage; S304, judging from the comparison result whether there is an update.
5. The webpage update detection method of claim 4, characterized by further comprising step S305: when the comparison result is judged as no update, extending the capture information region determined in step S1 to at least one additional code line, adjacent or non-adjacent, and returning to step S301 at least once.
6. A webpage information capture method, characterized in that, based on the webpage update detection method of any one of claims 1-5, update detection is performed on the webpage at a preset URL; when the result is that there is an update, the information is captured and the local copy is updated; when the result is that there is no update, nothing is captured and the existing local information is kept unchanged.
7. The webpage information capture method of claim 6, characterized in that the information capture performed when there is an update uses a targeted capture method that captures only the information within the capture information region.
8. A webpage information capture and presentation method, characterized in that, based on the webpage update detection method of any one of claims 1-5, update detection is performed on the webpage at a preset URL; when the result is that there is an update, the information is captured and the local copy is updated; when the result is that there is no update, the existing local information is kept unchanged;
when webpage information is presented, it is presented progressively, with the information of webpages without updates first and the information of webpages with updates afterwards.
9. The webpage information capture and presentation method of claim 8, characterized in that, while information is being captured from the webpages with updates, the webpages without updates are presented, so as to shorten the waiting time for information presentation.
10. The webpage information capture and presentation method of claim 8, characterized in that each captured webpage with updates is presented immediately, inserted one by one, while it is being updated to the local copy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610587575.7A CN106227823A (en) | 2016-07-21 | 2016-07-21 | Webpage update detection method, and webpage information capture and presentation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610587575.7A CN106227823A (en) | 2016-07-21 | 2016-07-21 | Webpage update detection method, and webpage information capture and presentation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106227823A (en) | 2016-12-14 |
Family
Family ID: 57532701
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610587575.7A Pending CN106227823A (en) | 2016-07-21 | 2016-07-21 | Webpage update detection method, and webpage information capture and presentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106227823A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030204561A1 (en) * | 2002-04-30 | 2003-10-30 | International Business Machines Corporation | Method and apparatus for enabling an internet web server to keep an accurate count of page hits |
CN102375830A (en) * | 2010-08-13 | 2012-03-14 | 富士通株式会社 | Webpage updating judging method and device as well as website synchronization method and device |
CN103207874A (en) * | 2012-01-17 | 2013-07-17 | 腾讯科技(深圳)有限公司 | Updated webpage content prompting method and system |
CN102982161A (en) * | 2012-12-05 | 2013-03-20 | 北京奇虎科技有限公司 | Method and device for acquiring webpage information |
CN103885957A (en) * | 2012-12-20 | 2014-06-25 | 百度在线网络技术(北京)有限公司 | Webpage information extraction method and device |
CN103049576A (en) * | 2013-01-05 | 2013-04-17 | 北京世纪高通科技有限公司 | Event acquisition method and event acquisition device |
CN104462152A (en) * | 2013-09-23 | 2015-03-25 | 深圳市腾讯计算机***有限公司 | Webpage recognition method and device |
CN104142987A (en) * | 2014-07-24 | 2014-11-12 | 腾讯科技(深圳)有限公司 | Page content management method and device and terminal device |
CN105069032A (en) * | 2015-07-20 | 2015-11-18 | 东南大学 | Filtering expression and rendering engine based method for automatically monitoring update of dynamic webpage |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106910195A (en) * | 2017-01-22 | 2017-06-30 | 北京奇艺世纪科技有限公司 | Webpage layout monitoring method and device |
CN106910195B (en) * | 2017-01-22 | 2020-06-16 | 北京奇艺世纪科技有限公司 | Webpage layout monitoring method and device |
CN111367962A (en) * | 2020-02-28 | 2020-07-03 | 北京金堤科技有限公司 | Database updating method and device, computer readable storage medium and electronic equipment |
CN111367962B (en) * | 2020-02-28 | 2024-01-30 | 北京金堤科技有限公司 | Database updating method and device, computer readable storage medium and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 1226830; Country of ref document: HK |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161214 |
REG | Reference to a national code | Ref country code: HK; Ref legal event code: WD; Ref document number: 1226830; Country of ref document: HK |