CN106227823A - Webpage update detection method, and webpage information crawling and presentation method - Google Patents

Webpage update detection method, and webpage information crawling and presentation method

Info

Publication number
CN106227823A
Authority
CN
China
Prior art keywords
webpage
information
update
updating
web information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610587575.7A
Other languages
Chinese (zh)
Inventor
王喜宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
For Science And Technology (shenzhen) Co Ltd
Original Assignee
For Science And Technology (shenzhen) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-07-21
Filing date
2016-07-21
Publication date
2016-12-14
Application filed by For Science And Technology (shenzhen) Co Ltd
Priority to CN201610587575.7A
Publication of CN106227823A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a webpage update detection method and a webpage information crawling and presentation method. The webpage update detection method includes: S1, analyzing the frame structure of the webpage at a predetermined URL and determining a crawl information area; S2, comparing the similarity between the information in the crawl information area and the locally stored information; S3, when the similarity is less than a set threshold, determining that the webpage at this URL has been updated, and otherwise determining that it has not been updated. This webpage update detection method can accurately identify the updated information of a webpage, so it is not misled by irrelevant updates and avoids wasting time and bandwidth.

Description

Webpage update detection method, and webpage information crawling and presentation method
Technical field
The present invention relates to the technical field of web information processing, and in particular to a webpage update detection method and a webpage information crawling and presentation method.
Background
Chinese invention patent application No. 201310007246.7, entitled "A crawling method based on detecting the webpage update cycle", determines whether a webpage has been updated by obtaining the page's update time. If the page's update time differs from the update time in the historical information, the page acquisition mode is set to fetch (GET); if the two update times are the same, the acquisition mode is set to check (CHK). The drawback of this scheme is that it relies on the webpage's update-time information and can therefore be misled: for example, when an update is minor or concerns information that is not of interest, the crawling action is still started.
The above disclosure of the background art is intended only to assist in understanding the inventive concept and technical solution of the present invention; it does not necessarily belong to the prior art of this patent application. In the absence of clear evidence that the above content was disclosed before the filing date of this application, the above background art should not be used to evaluate the novelty and inventiveness of this application.
Summary of the invention
The main object of the present invention is to provide a webpage update detection method that solves the technical problem of the above prior art, namely that judging by a webpage's update-time information can be misleading.
To this end, the present invention provides a webpage update detection method, including: S1, analyzing the frame structure of the webpage at a predetermined URL and determining a crawl information area; S2, comparing the similarity between the information in the crawl information area and the local information; S3, when the similarity is less than a set threshold, determining that the webpage at this URL has been updated, and otherwise determining that it has not been updated.
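Steps S1 to S3 reduce to a threshold test on a similarity score. The sketch below is a minimal illustration, assuming the crawl information area has already been reduced to text in S1 and using Python's `difflib.SequenceMatcher.ratio()` as the similarity measure. The patent names neither a measure nor a threshold value, so `has_update`, `similarity` and the 0.95 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical value: the patent only speaks of "a set threshold".
SIMILARITY_THRESHOLD = 0.95


def similarity(remote_region: str, local_region: str) -> float:
    """S2: similarity ratio in [0, 1] between crawled and locally stored region text."""
    return SequenceMatcher(None, remote_region, local_region).ratio()


def has_update(remote_region: str, local_region: str,
               threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """S3: the page counts as updated only when similarity drops below the threshold."""
    return similarity(remote_region, local_region) < threshold


if __name__ == "__main__":
    stored = "Notice: exam schedule published on 2016-07-01"
    fetched = "Notice: exam schedule revised on 2016-07-20"
    print(has_update(fetched, stored))  # the monitored region changed
    print(has_update(stored, stored))   # identical region text
```

Because only the text of the crawl information area is compared, changes elsewhere on the page never reach `has_update` at all.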
Preferably, the present invention may also have the following technical features:
Determining the similarity between the information in the crawl information area and the local information includes the following steps: S201, taking a screenshot of the crawl information area and binarizing it to obtain a binary image; S202, comparing the binary image obtained by the binarization with the locally stored binary image of the same URL's webpage; S203, determining, according to the comparison result, that there is or is not an update.
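Steps S201 to S203 can be sketched as follows, assuming the screenshot of the crawl information area is already available as a grid of 8-bit grayscale values and that the locally stored copy is kept in binarized form. Capturing the screenshot is outside the sketch; the function names, the binarization cutoff of 128 and the 0.98 pixel-agreement threshold are hypothetical choices, since the patent fixes none of them.

```python
from typing import List

Gray = List[List[int]]    # grayscale screenshot, values 0-255
Binary = List[List[int]]  # binarized image, values 0 or 1


def binarize(img: Gray, cutoff: int = 128) -> Binary:
    """S201: map each pixel to 1 (dark, e.g. text or lines) or 0 (background)."""
    return [[1 if px < cutoff else 0 for px in row] for row in img]


def pixel_similarity(a: Binary, b: Binary) -> float:
    """S202: fraction of pixel positions where two binary images agree."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0  # different sizes: treat as completely different
    total = sum(len(row) for row in a)
    if total == 0:
        return 1.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def screenshot_updated(remote: Gray, stored: Binary, threshold: float = 0.98) -> bool:
    """S203: updated when the binarized screenshots agree on too few pixels."""
    return pixel_similarity(binarize(remote), stored) < threshold
```

Binarization discards colour and small grayscale differences (for example anti-aliasing), so only changes that alter the dark/light pattern of the area count, whether the content is text or graphics.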
The method further includes step S204: when the comparison result is "no update", enlarging the crawl information area determined in step S1 by a set factor and returning to step S201 at least once.
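Step S204 turns the detection into a retry loop over a growing area. A minimal sketch, assuming the crawl information area is a pixel box on the page and that `is_updated` (hypothetical) runs S201 to S203 for a given box; the helper names, the scale factor of 1.5 and the limit of two retries are illustrative choices, not values from the patent.

```python
from typing import Callable, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


def enlarge(box: Box, factor: float, page_w: int, page_h: int) -> Box:
    """Scale a box about its centre by `factor`, clamped to the page bounds."""
    left, top, right, bottom = box
    cx, cy = (left + right) / 2, (top + bottom) / 2
    half_w = (right - left) * factor / 2
    half_h = (bottom - top) * factor / 2
    return (max(0, round(cx - half_w)), max(0, round(cy - half_h)),
            min(page_w, round(cx + half_w)), min(page_h, round(cy + half_h)))


def detect_with_enlargement(region: Box, is_updated: Callable[[Box], bool],
                            page_w: int, page_h: int,
                            factor: float = 1.5, max_retries: int = 2) -> bool:
    """Run S201-S203 on `region`; on "no update", enlarge it and retry (S204)."""
    for _ in range(max_retries + 1):
        if is_updated(region):
            return True
        region = enlarge(region, factor, page_w, page_h)
    return False
```

Capping the number of retries keeps the extra screenshots bounded while still catching an update that falls just outside the originally chosen area.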
Alternatively, determining the similarity between the information in the crawl information area and the local information includes the following steps: S301, determining the code lines in which the crawl information area is located; S302, crawling the specific information corresponding to those code lines; S303, comparing that specific information with the locally stored specific information of the same URL's webpage; S304, determining, according to the comparison result, that there is or is not an update.
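For the code-line variant (S301 to S304), here is a minimal sketch, assuming the crawl information area can be located by a marker string in the page source (for example an `id` attribute identified while analyzing the frame structure in S1). The marker, the function names, the regular-expression tag stripping and the exact-equality comparison are all simplifications introduced here; a similarity threshold as in S3 could be used instead of equality.

```python
import re
from typing import List

# Crude tag stripper; adequate for one-line fragments, not a general HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def find_region_lines(source: str, marker: str) -> List[int]:
    """S301: indices of the source lines that belong to the crawl information area."""
    return [i for i, line in enumerate(source.splitlines()) if marker in line]


def extract_text(source: str, line_indices: List[int]) -> str:
    """S302: visible text of those lines, with markup removed."""
    lines = source.splitlines()
    parts = [TAG_RE.sub("", lines[i]).strip() for i in line_indices]
    return " ".join(p for p in parts if p)


def code_lines_updated(source: str, marker: str, stored_text: str) -> bool:
    """S303-S304: updated when the extracted text differs from the stored copy."""
    return extract_text(source, find_region_lines(source, marker)) != stored_text
```

A visitor counter changing in another part of the source does not affect the result, which is exactly the kind of irrelevant update the background section warns about.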
The method further includes step S305: when the comparison result is "no update", extending the crawl information area determined in step S1 to at least one additional code line, adjacent or non-adjacent, and returning to step S301 at least once.
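Step S305 can be sketched by widening the set of code lines each time the comparison finds no change. This sketch assumes the locally stored copy of the page source is available line by line, implements only the adjacent-line variant (the patent also allows non-adjacent lines), and uses hypothetical helper names and a hypothetical `max_rounds` limit.

```python
import re
from typing import List, Set

TAG_RE = re.compile(r"<[^>]+>")


def text_of(lines: List[str], indices: Set[int]) -> str:
    """Visible text of the chosen source lines, tags stripped, in line order."""
    parts = [TAG_RE.sub("", lines[i]).strip() for i in sorted(indices) if i < len(lines)]
    return " ".join(p for p in parts if p)


def widen(indices: Set[int], n_lines: int) -> Set[int]:
    """S305 (adjacent variant): add the line just above and below each selected line."""
    out = set(indices)
    for i in indices:
        out.update(j for j in (i - 1, i + 1) if 0 <= j < n_lines)
    return out


def detect_with_line_expansion(remote_src: str, stored_src: str,
                               start: Set[int], max_rounds: int = 2) -> bool:
    """Compare the region's lines (S301-S304); on "no update", widen and retry."""
    remote, stored = remote_src.splitlines(), stored_src.splitlines()
    region = set(start)
    for _ in range(max_rounds + 1):
        if text_of(remote, region) != text_of(stored, region):
            return True
        region = widen(region, len(remote))
    return False
```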
The present invention also provides a webpage information crawling method which, based on the webpage update detection method described in any of the foregoing, performs update detection on the webpage at a preset URL. When the result is "updated", the information is crawled and updated to the local copy; when the result is "not updated", nothing is crawled and the original local information is kept unchanged.
Preferably, the crawling performed when there is an update uses a directed crawling method that crawls only the information within the crawl information area.
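The crawling method and its directed-crawl refinement amount to gating each fetch on the detection result. A minimal sketch, assuming a detector is passed in as `detect` and that `fetch_region` (hypothetical) downloads only the crawl information area; the local store is a plain dictionary keyed by URL, and the function name is illustrative.

```python
from typing import Callable, Dict, Iterable, List


def crawl_updated_pages(urls: Iterable[str],
                        detect: Callable[[str], bool],
                        fetch_region: Callable[[str], str],
                        local_store: Dict[str, str]) -> List[str]:
    """Crawl only pages whose update detection reports a change.

    Pages judged "not updated" are never fetched and their local copy is left
    untouched. For updated pages only the crawl information area is fetched
    (directed crawling) and written back to the local store.
    """
    updated = []
    for url in urls:
        if detect(url):
            local_store[url] = fetch_region(url)
            updated.append(url)
    return updated
```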
The present invention further provides a webpage information crawling and presentation method which, based on the webpage update detection method described in any of the foregoing, performs update detection on the webpage at a preset URL. When the result is "updated", the information is crawled and updated to the local copy; when the result is "not updated", the original local information is kept unchanged. The webpage information is presented progressively: the information of webpages without updates first, and the information of updated webpages afterwards.
Preferably, the webpages without updates are presented while the updated webpages are being crawled, to shorten the waiting time before information is presented.
Also preferably, each crawled updated webpage is presented immediately, inserted one by one, as it is updated to the local copy.
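The presentation order described in the last three paragraphs (unchanged pages first, updated pages crawled in parallel and inserted one by one as each crawl finishes) can be sketched with a thread pool. This is an illustration under stated assumptions: `detect`, `fetch_region` and `render` are hypothetical callables, all detection results are computed up front, and the worker count of 4 is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def crawl_and_present(urls: List[str],
                      detect: Callable[[str], bool],
                      fetch_region: Callable[[str], str],
                      local_store: Dict[str, str],
                      render: Callable[[str, str], None],
                      workers: int = 4) -> None:
    updated_flags = {url: detect(url) for url in urls}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Start crawling the updated pages in the background.
        futures = {pool.submit(fetch_region, url): url
                   for url in urls if updated_flags[url]}
        # Meanwhile, present the unchanged pages straight from the local copy.
        for url in urls:
            if not updated_flags[url]:
                render(url, local_store[url])
        # Insert each updated page as soon as its crawl completes.
        for future in as_completed(futures):
            url = futures[future]
            local_store[url] = future.result()
            render(url, local_store[url])
```

All rendering happens on the calling thread, so `render` does not need to be thread-safe; only the fetches run concurrently.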
Compared with the prior art, the beneficial effects of the present invention include: because the crawl information area is determined in advance and the update judgment is made for that area, being misled by irrelevant updates is avoided, so webpage information crawling tasks are started accurately and effectively, saving time and bandwidth resources.
Brief description of the drawings
Fig. 1 is a block diagram of the principle of the present invention;
Fig. 2 is a flow chart of a specific embodiment of the present invention;
Fig. 3 is a flow chart of another specific embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.
Non-limiting and non-exhaustive embodiments are described below with reference to Figs. 1-3, in which the same reference numerals denote the same parts unless otherwise stated.
A webpage information crawling and presentation method: first, update detection is performed on the webpage at a preset URL. When the result is "updated", the information is crawled and updated to the local copy; when the result is "not updated", the original local information is kept unchanged. The webpage information is presented progressively, with the information of webpages without updates first and the information of updated webpages afterwards.
A more preferable approach is for the crawling performed when there is an update to use a directed crawling method, crawling only the information within the crawl information area.
Another more preferable approach is to present the webpages without updates while the updated webpages are being crawled, to shorten the waiting time before information is presented.
In addition, each crawled updated webpage can be presented immediately, inserted one by one, as it is updated to the local copy. In this way, content is displayed as soon as it is crawled, the displayed web content appears continuous, and pauses are kept to a minimum.
As shown in Fig. 1, the method for performing update detection on the webpage at the preset URL includes: S1, analyzing the frame structure of the webpage at a predetermined URL and determining a crawl information area; S2, comparing the similarity between the information in the crawl information area and the local information; S3, when the similarity is less than a set threshold, determining that the webpage at this URL has been updated, and otherwise determining that it has not been updated.
As shown in Fig. 2, determining the similarity between the information in the crawl information area and the local information includes the following steps: S201, taking a screenshot of the crawl information area and binarizing it to obtain a binary image; S202, comparing the binary image obtained by the binarization with the locally stored binary image of the same URL's webpage; S203, determining, according to the comparison result, that there is or is not an update. Step S204 may also be included: when the comparison result is "no update", the crawl information area determined in step S1 is enlarged by a set factor and the method returns to step S201 at least once. The advantage of this similarity judgment method is that updates are detected accurately whether the updated content is text or graphics.
Alternatively, as shown in Fig. 3, determining the similarity between the information in the crawl information area and the local information includes the following steps: S301, determining the code lines in which the crawl information area is located; S302, crawling the specific information corresponding to those code lines; S303, comparing that specific information with the locally stored specific information of the same URL's webpage; S304, determining, according to the comparison result, that there is or is not an update. Step S305 may also be included: when the comparison result is "no update", the crawl information area determined in step S1 is extended to at least one additional code line, adjacent or non-adjacent, and the method returns to step S301 at least once. The advantage of this similarity judgment method is that information is crawled quickly and presented more promptly: the necessary information has already been crawled while judging whether the webpage was updated (judging and crawling are unified into a single job), so if the result is "updated", the information is presented directly and saved locally.
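The advantage claimed for the code-line variant, that judging and crawling are unified, can be made concrete: the source fetched for the comparison is the only download, and on an update the extracted text is stored as-is. A minimal sketch, assuming a marker string locates the crawl information area in the page source, tags are stripped with a regular expression, and the comparison is exact; `detect_and_capture` and `fetch_source` are hypothetical names.

```python
import re
from typing import Callable, Dict

# Crude tag stripper; adequate for one-line fragments, not a general HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def detect_and_capture(url: str, marker: str,
                       fetch_source: Callable[[str], str],
                       local_store: Dict[str, str]) -> bool:
    """Detect an update (S301-S304) and crawl the region in the same pass."""
    source = fetch_source(url)  # the only download for this URL
    region_text = " ".join(
        TAG_RE.sub("", line).strip()
        for line in source.splitlines() if marker in line
    )
    if local_store.get(url) == region_text:
        return False  # no update: local copy stays as it is
    # Update: the text extracted for the comparison is already the crawl result.
    local_store[url] = region_text
    return True
```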
Those skilled in the art will recognize that numerous variations of the above description are possible, so the embodiments serve only to describe one or more particular implementations.
Although example embodiments of the present invention have been described and illustrated, it will be apparent to those skilled in the art that various changes and substitutions can be made without departing from the spirit of the invention. Furthermore, many modifications may be made to adapt a particular situation to the teachings of the present invention without departing from the central concept of the invention described herein. Therefore, the present invention is not limited to the specific embodiments disclosed here, but may also include all embodiments falling within the scope of the invention and their equivalents.

Claims (10)

1. A webpage update detection method, characterized by: S1, analyzing the frame structure of the webpage at a predetermined URL and determining a crawl information area; S2, comparing the similarity between the information in the crawl information area and the local information; S3, when the similarity is less than a set threshold, determining that the webpage at this URL has been updated, and otherwise determining that it has not been updated.
2. The webpage update detection method according to claim 1, characterized in that determining the similarity between the information in the crawl information area and the local information comprises the following steps: S201, taking a screenshot of the crawl information area and binarizing it to obtain a binary image; S202, comparing the binary image obtained by the binarization with the locally stored binary image of the same URL's webpage; S203, determining, according to the comparison result, that there is or is not an update.
3. The webpage update detection method according to claim 2, characterized by further comprising step S204: when the comparison result is "no update", enlarging the crawl information area determined in step S1 by a set factor and returning to step S201 at least once.
4. The webpage update detection method according to claim 1, characterized in that determining the similarity between the information in the crawl information area and the local information comprises the following steps: S301, determining the code lines in which the crawl information area is located; S302, crawling the specific information corresponding to the code lines; S303, comparing the specific information with the locally stored specific information of the same URL's webpage; S304, determining, according to the comparison result, that there is or is not an update.
5. The webpage update detection method according to claim 4, characterized by further comprising step S305: when the comparison result is "no update", extending the crawl information area determined in step S1 to at least one additional code line, adjacent or non-adjacent, and returning to step S301 at least once.
6. A webpage information crawling method, characterized by: based on the webpage update detection method according to any one of claims 1-5, performing update detection on the webpage at a preset URL; when the result is "updated", crawling the information and updating it locally; when the result is "not updated", not crawling and keeping the original local information unchanged.
7. The webpage information crawling method according to claim 6, characterized in that the crawling performed when there is an update uses a directed crawling method that crawls only the information within the crawl information area.
8. A webpage information crawling and presentation method, characterized by: based on the webpage update detection method according to any one of claims 1-5, performing update detection on the webpage at a preset URL; when the result is "updated", crawling the information and updating it locally; when the result is "not updated", keeping the original local information unchanged;
when presenting the webpage information, presenting progressively, with the information of webpages without updates first and the information of updated webpages afterwards.
9. The webpage information crawling and presentation method according to claim 8, characterized in that the webpages without updates are presented while the updated webpages are being crawled, to shorten the waiting time before information is presented.
10. The webpage information crawling and presentation method according to claim 8, characterized in that each crawled updated webpage is presented immediately, inserted one by one, as it is updated locally.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610587575.7A CN106227823A (en) 2016-07-21 2016-07-21 Webpage update detection method, and webpage information crawling and presentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610587575.7A CN106227823A (en) 2016-07-21 2016-07-21 Webpage update detection method, and webpage information crawling and presentation method

Publications (1)

Publication Number Publication Date
CN106227823A true CN106227823A (en) 2016-12-14

Family

ID=57532701

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610587575.7A Pending CN106227823A (en) 2016-07-21 2016-07-21 Webpage update detection method, and webpage information crawling and presentation method

Country Status (1)

Country Link
CN (1) CN106227823A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030204561A1 (en) * 2002-04-30 2003-10-30 International Business Machines Corporation Method and apparatus for enabling an internet web server to keep an accurate count of page hits
CN102375830A (en) * 2010-08-13 2012-03-14 富士通株式会社 Webpage updating judging method and device as well as website synchronization method and device
CN102982161A (en) * 2012-12-05 2013-03-20 北京奇虎科技有限公司 Method and device for acquiring webpage information
CN103049576A (en) * 2013-01-05 2013-04-17 北京世纪高通科技有限公司 Event acquisition method and event acquisition device
CN103207874A (en) * 2012-01-17 2013-07-17 腾讯科技(深圳)有限公司 Updated webpage content prompting method and system
CN103885957A (en) * 2012-12-20 2014-06-25 百度在线网络技术(北京)有限公司 Webpage information extraction method and device
CN104142987A (en) * 2014-07-24 2014-11-12 腾讯科技(深圳)有限公司 Page content management method and device and terminal device
CN104462152A (en) * 2013-09-23 2015-03-25 深圳市腾讯计算机***有限公司 Webpage recognition method and device
CN105069032A (en) * 2015-07-20 2015-11-18 东南大学 Filtering expression and rendering engine based method for automatically monitoring update of dynamic webpage

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106910195A (en) * 2017-01-22 2017-06-30 北京奇艺世纪科技有限公司 Web page layout monitoring method and device
CN106910195B (en) * 2017-01-22 2020-06-16 北京奇艺世纪科技有限公司 Webpage layout monitoring method and device
CN111367962A (en) * 2020-02-28 2020-07-03 北京金堤科技有限公司 Database updating method and device, computer readable storage medium and electronic equipment
CN111367962B (en) * 2020-02-28 2024-01-30 北京金堤科技有限公司 Database updating method and device, computer readable storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109657431B (en) Method for identifying user identity
CN109995601B (en) Network traffic identification method and device
US20130243249A1 (en) Electronic device and method for recognizing image and searching for concerning information
WO2015074503A1 (en) Statistical method and apparatus for webpage access data
KR102002024B1 (en) Method for processing labeling of object and object management server
CN103365967B (en) Automatic difference detection method and device based on crawler
WO2017167088A1 (en) A user relationship based multimedia recommendation method and apparatus
CN103870824A (en) Method and device for capturing face in face detecting and tracking process
CN110348345A (en) A kind of Weakly supervised timing operating position fixing method based on continuity of movement
CN106227823A (en) Webpage update detection method, and webpage information crawling and presentation method
CN103077380A (en) Method and device for carrying out statistics on number of people on basis of video
CN107301245B (en) Power information video search system
US10217455B2 (en) Linguistic model database for linguistic recognition, linguistic recognition device and linguistic recognition method, and linguistic recognition system
CN102866885A (en) Method and device for confirming clicking position in webpage
US20200073877A1 (en) Video cookies
CN110599232A (en) Consumption group analysis method based on big data
CN111008987B (en) Method and device for extracting edge image based on gray background and readable storage medium
JP2014532220A (en) Net comment collection method and system
KR20220090203A (en) Automatic Data Labeling Method based on Deep learning Object Detection amd Trace and System thereof
US9852350B2 (en) Character string recognition device
CN106407218B (en) Navigation webpage detection method and device
CN107016316B (en) barcode identification method and device
CN107729898B (en) Method and device for detecting text lines in text image
CN110502990B (en) Method and system for data acquisition by image processing
CN113553370A (en) Abnormality detection method, abnormality detection device, electronic device, and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK; Ref legal event code: DE; Ref document number: 1226830; Country of ref document: HK
RJ01 Rejection of invention patent application after publication
Application publication date: 20161214
REG Reference to a national code
Ref country code: HK; Ref legal event code: WD; Ref document number: 1226830; Country of ref document: HK