CN105447130A - Method and device for acquiring new chapter of network novel - Google Patents

Info

Publication number
CN105447130A
Authority
CN
China
Prior art keywords
chapter list
list page
original
usual time
authentic copy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510796828.7A
Other languages
Chinese (zh)
Other versions
CN105447130B (en)
Inventor
邝景胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd
Priority to CN201510796828.7A
Publication of CN105447130A
Application granted
Publication of CN105447130B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the field of computer data mining, and in particular to a method and device for acquiring new chapters of a network novel. The method comprises the following steps: merging a plurality of chapter list pages that share the same topic name to obtain a combined result page; determining the similarity between each chapter list page and the combined result page, and designating the chapter list page with the greatest similarity as a first original copy and the other chapter list pages as corresponding first duplicate copies; obtaining a first usual time at which the first original copy is updated, a second usual time at which the first duplicate copies are updated, and a usual time difference between the first usual time and the second usual time; and, in response to an external request for the chapter list pages, querying the first original copy and the first duplicate copies using the time-pattern data represented by the first usual time, the second usual time and the usual time difference, so as to obtain and return the chapter list pages. The method and device save network resources, return updated chapter list pages to users and improve the user experience.

Description

Method and device for acquiring new chapters of a network novel
[Technical Field]
The present invention relates to computer data mining applications, and in particular to a method and device for acquiring new chapters of a network novel.
[Background Art]
In recent years, with the growth of network novels, a large number of websites dedicated to serializing them have appeared. Accessing and searching such a site usually means entering the site first and then running an in-site keyword search, which returns the novel content on that site related to the keyword. This approach is used mostly by devoted followers of particular novels or by network novel fans; more casual users generally still search through a general-purpose search engine (such as Baidu or Google).
With existing search methods, the update time of a novel's latest chapter is hard to predict, so a search engine has to crawl the chapter list pages continuously to pick up new chapters, which is inefficient. Search results also contain many reading sites with false novel content, so the user's search need goes unmet and the user experience is poor. In addition, for reasons such as copyright, the new chapters of some network novels cannot be viewed directly on the original website, although their content is available on copy websites; the existing single-site search approach cannot recommend to the user a copy website where the chapters can be viewed directly, which again degrades the user experience.
[Summary of the Invention]
The present invention aims to solve at least one of the above problems by providing a method and device for acquiring new chapters of a network novel.
To this end, the present invention adopts the following technical solutions:
The invention provides a method for acquiring new chapters of a network novel, comprising the steps of:
Merging multiple chapter list pages that share the same topic name to obtain a combined result page;
Determining the similarity between each chapter list page and the combined result page, and designating the chapter list page with the greatest similarity as the first original copy and the other chapter list pages as the corresponding first duplicate copies;
Obtaining a first usual time at which the first original copy is updated, a second usual time at which a first duplicate copy is updated, and the usual time difference between the first usual time and the second usual time;
In response to an external request for the chapter list pages, querying the first original copy and the first duplicate copies using the time-pattern data represented by the first usual time, the second usual time and the usual time difference, so as to obtain and return the chapter list pages.
Further, before the step of merging multiple chapter list pages that share the same topic name to obtain a combined result page, the method also comprises the steps of:
Detecting and acquiring chapter list pages and determining the topic name of each, where each chapter list page corresponds to one website;
Clustering the chapter list pages that have the same topic name;
Establishing an association between the topic name and the information of the multiple websites hosting those chapter list pages.
Further, before the step of querying the first original copy and the first duplicate copies, in response to an external request for the chapter list pages, using the time-pattern data represented by the first usual time, the second usual time and the usual time difference so as to obtain and return the chapter list pages, the method also comprises the step of:
Receiving the external request for the chapter list pages.
Specifically, the step of querying the first original copy and the first duplicate copies, in response to an external request for the chapter list pages, using the time-pattern data represented by the first usual time, the second usual time and the usual time difference so as to obtain and return the chapter list pages, also comprises the steps of:
In response to the external request for the chapter list pages, querying the first original copy at a set time interval, based on the first usual time;
Determining whether the chapter list page corresponding to the first original copy has been updated;
When the first original copy has been updated, querying the first duplicate copies at a set time interval, based on the usual time difference;
Obtaining and returning the website information corresponding to the first duplicate copies that have been updated.
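The polling order in the steps above (original first, then the copies after the usual delay) can be sketched as follows. This is a minimal illustration only: the function names, the `max_polls` cap and the injected `wait_*` callables are assumptions made for the sketch, not part of the patent, and a real deployment would replace the simulated checks with page fetches and the waits with actual timers.

```python
from typing import Callable

def poll_until(check: Callable[[], bool], max_polls: int,
               wait: Callable[[], None]) -> bool:
    """Call check() up to max_polls times, waiting between failed attempts."""
    for _ in range(max_polls):
        if check():
            return True
        wait()
    return False

def find_updated_duplicates(original_updated: Callable[[], bool],
                            duplicate_checks: dict[str, Callable[[], bool]],
                            wait_interval: Callable[[], None],
                            wait_lag: Callable[[], None],
                            max_polls: int = 10) -> list[str]:
    """Poll the original; once it has updated, wait the usual lag, then poll each copy."""
    if not poll_until(original_updated, max_polls, wait_interval):
        return []                      # original never updated within the cap
    wait_lag()                         # hypothetical: sleep for the usual time difference
    return [site for site, check in duplicate_checks.items()
            if poll_until(check, max_polls, wait_interval)]

# Simulated run: the original updates on the third poll, one copy has updated.
waits: list[str] = []
original_states = iter([False, False, True])
updated_sites = find_updated_duplicates(
    original_updated=lambda: next(original_states),
    duplicate_checks={"b.example": lambda: True, "c.example": lambda: False},
    wait_interval=lambda: waits.append("interval"),
    wait_lag=lambda: waits.append("lag"),
    max_polls=3,
)
```

Passing the waits in as callables keeps the sketch testable without sleeping; in practice they might wrap `time.sleep` with the interval and the stored usual time difference.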
Specifically, the step of determining whether the chapter list page corresponding to the first original copy has been updated also comprises:
Analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original copy to determine whether the first original copy has been updated.
Further, after the step of determining whether the chapter list page corresponding to the first original copy has been updated, the method also comprises the step of:
When the first original copy has not been updated, performing again the step of querying the first original copy at a set time interval based on the first usual time.
Specifically, the step of querying the first original copy and the first duplicate copies, in response to an external request for the chapter list pages, using the time-pattern data represented by the first usual time, the second usual time and the usual time difference so as to obtain and return the chapter list pages, also comprises the steps of:
Querying the first duplicate copies at a set time interval, based on the second usual time;
Determining whether the chapter list pages corresponding to the first duplicate copies have all been updated;
When the first duplicate copies have all been updated, querying the first original copy at a set time interval, based on the usual time difference, to determine whether the first original copy has been updated.
Specifically, the step of determining whether the chapter list pages corresponding to the first duplicate copies have all been updated also comprises:
Analyzing the most recently created or modified chapter information in all chapter list pages corresponding to the first duplicate copies to determine whether the first duplicate copies have all been updated.
Further, after the step of determining whether the chapter list pages corresponding to the first duplicate copies have all been updated, the method also comprises the step of:
When the first duplicate copies have not been updated, performing again the step of querying the first duplicate copies at a set time interval based on the second usual time.
Further, before the step of merging multiple chapter list pages that share the same topic name to obtain a combined result page, the method also comprises the steps of:
Determining, from the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
When the chapter list page is determined to be a false chapter list page, filtering it out.
Specifically, the step of determining, from the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page also comprises the steps of:
Obtaining the text feature vector of each chapter list page;
Determining the mean, taken over the other chapter list pages, of the text feature vectors that the given chapter list page has in common with them;
When the mean is greater than or equal to a preset similarity threshold, determining that the chapter list page is a valid chapter list page;
When the mean is less than the preset similarity threshold, determining that the chapter list page is a false chapter list page.
The present invention also provides a device for acquiring new chapters of a network novel, comprising:
A merging module, configured to merge multiple chapter list pages that share the same topic name to obtain a combined result page;
An original and duplicate determination module, configured to determine the similarity between each chapter list page and the combined result page, and to designate the chapter list page with the greatest similarity as the first original copy and the other chapter list pages as the corresponding first duplicate copies;
A time acquisition module, configured to obtain a first usual time at which the first original copy is updated, a second usual time at which a first duplicate copy is updated, and the usual time difference between the first usual time and the second usual time;
A feedback module, configured to, in response to an external request for the chapter list pages, query the first original copy and the first duplicate copies using the time-pattern data represented by the first usual time, the second usual time and the usual time difference, so as to obtain and return the chapter list pages.
Further, the device also comprises a clustering module.
The clustering module is configured to, before the merging module merges multiple chapter list pages that share the same topic name, detect and acquire chapter list pages and determine the topic name of each, where each chapter list page corresponds to one website; and
to cluster the chapter list pages that have the same topic name; and
to establish an association between the topic name and the information of the multiple websites hosting those chapter list pages.
Further, the device also comprises a receiving module.
The receiving module is configured to receive the external request for the chapter list pages.
Specifically, the feedback module also comprises:
An original query unit, configured to, in response to the external request for the chapter list pages, query the first original copy at a set time interval, based on the first usual time;
An original determination unit, configured to determine whether the chapter list page corresponding to the first original copy has been updated;
A duplicate scheduling unit, configured to, when the first original copy has been updated, query the first duplicate copies at a set time interval, based on the usual time difference;
A duplicate feedback unit, configured to obtain and return the website information corresponding to the first duplicate copies that have been updated.
Specifically, the original determination unit is also configured to analyze the most recently created or modified chapter information in the chapter list page corresponding to the first original copy to determine whether the first original copy has been updated.
Specifically, the duplicate scheduling unit is also configured to, when the first original copy has not been updated, call the original query unit to perform again the step of querying the first original copy at a set time interval based on the first usual time.
Specifically, the feedback module also comprises:
A duplicate query unit, configured to query the first duplicate copies at a set time interval, based on the second usual time;
A duplicate determination unit, configured to determine whether the chapter list pages corresponding to the first duplicate copies have all been updated;
An original scheduling unit, configured to, when the first duplicate copies have all been updated, query the first original copy at a set time interval, based on the usual time difference, to determine whether the first original copy has been updated.
Specifically, the duplicate determination unit analyzes the most recently created or modified chapter information in all chapter list pages corresponding to the first duplicate copies to determine whether the first duplicate copies have all been updated.
Specifically, the duplicate determination unit is also configured to, when the first duplicate copies have not been updated, call the duplicate query unit to perform again the step of querying the first duplicate copies at a set time interval based on the second usual time.
Specifically, the device also comprises a false-page determination module and a filtering module.
The false-page determination module is configured to, before the merging module merges multiple chapter list pages that share the same topic name to obtain a combined result page, determine from the similarity between a given chapter list page and the other chapter list pages whether that chapter list page is a false chapter list page;
The filtering module is configured to filter out the chapter list page when it is determined to be a false chapter list page.
Further, the false-page determination module is also configured to obtain the text feature vector of each chapter list page;
to determine the mean, taken over the other chapter list pages, of the text feature vectors that the given chapter list page has in common with them;
to determine, when the mean is greater than or equal to a preset similarity threshold, that the chapter list page is a valid chapter list page; and
to determine, when the mean is less than the preset similarity threshold, that the chapter list page is a false chapter list page.
Compared with the prior art, the present invention has the following advantages:
1. In the method for acquiring new chapters of a network novel provided by the present invention, multiple chapter list pages that share the same topic name are merged into a combined result page, and, based on the similarity between each chapter list page and the combined result page, the chapter list page most similar to the combined result page is designated the first original copy and the remaining chapter list pages the corresponding first duplicate copies. Then, in response to an external request for the chapter list pages, the first original copy and the first duplicate copies are queried using the pattern data formed by their usual update times and the usual time difference, so as to obtain and return the chapter list pages. Because the method queries the chapter list pages corresponding to the first original copy or the first duplicate copies on a schedule derived from the usual update times, it obtains updated chapter list pages without continuously crawling the chapter list page of every website, which saves network resources, and it can return the updated chapter list pages to the user, which improves the user experience;
2. Further, before merging multiple chapter list pages that share the same topic name, the present invention also determines whether each chapter list page is a false chapter list page and filters out any page so determined. This reduces the likelihood that the chapter list pages returned to the user contain false information, further improves the user experience and ensures that the scheme works effectively;
3. Further, after detecting that the chapter list page corresponding to the first original copy has been updated, the present invention queries the first duplicate copies at a set time interval based on the usual time difference and returns to the user the website information of the first duplicate copies that have been updated. The user thus receives the website information of the first duplicate copies that correspond to the first original copy; since the new chapters can normally be viewed directly on those duplicate websites, this solves the problem that users cannot directly view new chapters on some novels' original websites and further improves the user experience.
Additional aspects and advantages of the present invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
[Brief Description of the Drawings]
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of one embodiment of the method for acquiring new chapters of a network novel according to the present invention;
Fig. 2 is a flowchart of one embodiment of the method for acquiring new chapters of a network novel according to the present invention;
Fig. 3 is a flowchart of one embodiment of the method for acquiring new chapters of a network novel according to the present invention;
Fig. 4 is a flowchart of one embodiment of the method for acquiring new chapters of a network novel according to the present invention;
Fig. 5 is a flowchart of one embodiment of the method for acquiring new chapters of a network novel according to the present invention;
Fig. 6 is a structural diagram of one embodiment of the device for acquiring new chapters of a network novel according to the present invention;
Fig. 7 is a structural diagram of one embodiment of the device for acquiring new chapters of a network novel according to the present invention;
Fig. 8 is a structural diagram of one embodiment of the device for acquiring new chapters of a network novel according to the present invention;
Fig. 9 is a structural diagram of one embodiment of the feedback module according to the present invention;
Fig. 10 is a structural diagram of one embodiment of the feedback module according to the present invention.
[Detailed Description of the Embodiments]
The present invention is further described below with reference to the accompanying drawings and exemplary embodiments. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it. In addition, detailed descriptions of known techniques are omitted where they are unnecessary for illustrating the features of the present invention.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes all or any one of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general-purpose dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, are not to be interpreted in an idealized or overly formal sense.
It is first necessary to give the following introductory explanation of the application scenario of the present invention and its principles.
The Internet generally comprises user terminals (user mobile terminals), the network and servers (such as a website's Web server). A user terminal may be the user's Internet terminal, such as a desktop computer (PC), a laptop computer (Laptop), or a smart device with web browsing capability, such as a personal digital assistant (Personal Digital Assistant, PDA), a mobile Internet device (Mobile Internet Device, MID) or a smartphone (Phone). In a network environment, typically the Internet, these terminals can request a service provided by another process (such as a process provided by a server). For example, in the present invention the user terminal is a mobile phone, such as an Android phone, on which an app with a network novel search function is installed. The app provides a search input field in which the user can enter the topic of a network novel to search for the e-book, and the remote server can return the search results to the user in response to the search request.
A server is usually a remote computer system accessed through a communication medium such as the Internet. A server can usually serve multiple user terminals on the Internet. Providing the service includes receiving requests sent by user terminals, collecting user terminal information, returning information, and so on. In effect, the server plays the role of information provider in a computer network. A server is usually located on the side that provides the service, or is configured with the service content by the service provider; such a service provider may be, for example, the website of an Internet service company.
The embodiments of several technical solutions that the present invention proposes for realizing the above scenario using the above principles are described in detail below. It should be noted that the method for acquiring new chapters of a network novel provided by the present invention is described from the server's perspective; the method can be implemented by programming as a computer program that runs on a remote network device, including but not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed of multiple servers.
Referring to Fig. 1, a typical embodiment of the method for acquiring new chapters of a network novel according to the present invention specifically comprises the following steps:
S11, merging multiple chapter list pages that share the same topic name to obtain a combined result page.
After multiple chapter list pages with the same topic name are obtained, a de-duplication and merging algorithm is used to merge them into one result page. It will be understood that the chapter list in this result page is more complete than that of any individual chapter list page and includes the latest chapter list entries. Note that in the method of the present invention, data from multiple websites can be crawled by a web spider, and automatic web page structure analysis can indicate whether a site is a novel website.
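The text does not specify which de-duplication and merging algorithm is used. As one possible reading, the sketch below takes the union of chapter titles across pages, treating titles as equal after whitespace and case normalization. The function names and the normalization rule are assumptions made for illustration.

```python
from typing import Iterable

def normalize(title: str) -> str:
    """Collapse whitespace and case so trivially different titles compare equal."""
    return " ".join(title.split()).casefold()

def merge_chapter_lists(pages: Iterable[list[str]]) -> list[str]:
    """Union of chapter titles across pages, in first-seen order, duplicates removed."""
    seen: set[str] = set()
    merged: list[str] = []
    for page in pages:
        for title in page:
            key = normalize(title)
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged

site_a = ["Chapter 1 Dawn", "Chapter 2 Storm", "Chapter 3 Ashes"]
site_b = ["Chapter 1  Dawn", "Chapter 2 Storm"]
site_c = ["chapter 1 dawn", "Chapter 2 Storm", "Chapter 3 Ashes", "Chapter 4 Return"]
merged = merge_chapter_lists([site_a, site_b, site_c])
```

First-seen order is a simplification: if a site is missing a middle chapter, a production merge would need to order entries by chapter number rather than by arrival.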
In one embodiment of the present invention, referring to Fig. 2, the following steps are also performed before step S11:
S101, detecting and acquiring chapter list pages and determining the topic name of each, where each chapter list page corresponds to one website;
S102, clustering the chapter list pages that have the same topic name;
S103, establishing an association between the topic name and the information of the multiple websites hosting those chapter list pages.
Specifically, the server performs structural analysis on the web pages under a novel website's domain name. If a web page contains multiple parallel chapter list tags, the page can be judged to be a novel chapter list page. The link targets (href, Hypertext Reference) of these parallel chapter list tags are highly similar to one another: the chapter list directory they point to is the same, but the specific file names differ. For example, suppose the directory contained in the href attributes of the parallel chapter list tags is 5_5288, while the file names contained in the href attributes differ, running from 970871 to 970980.
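The structural test described above can be approximated with the standard-library HTML parser: collect the `href` of every `<a>` tag, group the links by directory, and flag the page when one directory holds several distinct file names. The `min_links` threshold and the function names are assumptions made for this sketch; the text does not say how many parallel links are required.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def looks_like_chapter_list(html: str, min_links: int = 3) -> bool:
    """True if some directory is the target of at least min_links distinct files."""
    collector = HrefCollector()
    collector.feed(html)
    files_by_dir: dict[str, set[str]] = {}
    for href in collector.hrefs:
        directory, filename = posixpath.split(urlparse(href).path)
        if filename:
            files_by_dir.setdefault(directory, set()).add(filename)
    return any(len(names) >= min_links for names in files_by_dir.values())

chapter_page = (
    '<a href="/about.html">About</a>'
    '<a href="/5_5288/970871.html">Chapter 1</a>'
    '<a href="/5_5288/970872.html">Chapter 2</a>'
    '<a href="/5_5288/970873.html">Chapter 3</a>'
)
plain_page = '<a href="/about.html">About</a><a href="/contact.html">Contact</a>'
```

A navigation menu with many links in one directory would also pass this check, so a real crawler would likely combine it with the chapter keywords and chapter numbers found in the link text.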
Further, the multiple parallel chapter list tags contained in the novel chapter list page include a chapter text feature vector, which contains keywords and/or chapter numbers that characterize the chapters. Based on these keywords and/or chapter numbers, the search engine can extract the topic name of the chapter list page; for example, "title + author" can be used as the topic name. The chapter list pages with the same topic name are then clustered into one set, the website information for each chapter list page is obtained, and an association between the topic name and the multiple pieces of website information is established.
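Clustering by topic name and associating the name with its websites amounts to a grouping step. The sketch below assumes the title and author have already been extracted from each page; the `ChapterListPage` type and both function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class ChapterListPage(NamedTuple):
    site: str      # website hosting this chapter list page
    title: str     # novel title extracted from the page
    author: str    # author name extracted from the page

def topic_name(page: ChapterListPage) -> str:
    """Build the 'title + author' key suggested in the text."""
    return f"{page.title.strip()}+{page.author.strip()}"

def cluster_by_topic(pages: list[ChapterListPage]) -> dict[str, list[str]]:
    """Map each topic name to the sites whose chapter list pages carry it."""
    clusters: defaultdict[str, list[str]] = defaultdict(list)
    for page in pages:
        clusters[topic_name(page)].append(page.site)
    return dict(clusters)

pages = [
    ChapterListPage("a.example", "Novel X", "Author Y"),
    ChapterListPage("b.example", "Novel X ", "Author Y"),
    ChapterListPage("c.example", "Novel Z", "Author W"),
]
clusters = cluster_by_topic(pages)
```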
Further, to reduce the likelihood that the chapter list pages returned to the user contain false information, improve the user experience and ensure that the scheme works effectively, in one embodiment of the present invention (see Fig. 3) the following steps are also performed before step S11:
S01, determining, from the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
S02, when the chapter list page is determined to be a false chapter list page, filtering it out.
Specifically, in one embodiment of the present invention, the text feature vector of each chapter list page is obtained, and the mean, taken over the other chapter list pages, of the text feature vectors that a given chapter list page has in common with them is determined. When the mean is greater than or equal to a preset similarity threshold, the chapter list page is determined to be a valid chapter list page; when the mean is less than the preset similarity threshold, it is determined to be a false chapter list page.
It should be noted that the text feature vector may consist of multiple keywords from the chapter list titles, with the similarity between those keywords judged by some similarity evaluation algorithm; alternatively, numeric feature vectors may be extracted from the page numbers corresponding to the chapter list page titles of the same topic, where a numeric feature vector may be the numeric value representing a page number. In this embodiment, the similarity between any two chapter list pages may be computed from the text feature vector together with its corresponding numeric feature vector, or from either kind of feature vector alone. Once a chapter list page is judged to be false, it is filtered out directly.
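The mean-similarity filter can be sketched as below. The text leaves the similarity measure open, so Jaccard overlap between keyword sets is swapped in here as an assumed stand-in; the 0.4 threshold and the function names are likewise illustrative.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two keyword sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def filter_false_pages(pages: dict[str, set[str]],
                       threshold: float = 0.4) -> dict[str, set[str]]:
    """Keep pages whose mean similarity to all other pages meets the threshold."""
    kept: dict[str, set[str]] = {}
    for site, features in pages.items():
        others = [f for s, f in pages.items() if s != site]
        if not others:                 # a lone page has nothing to compare against
            kept[site] = features
            continue
        mean = sum(jaccard(features, other) for other in others) / len(others)
        if mean >= threshold:
            kept[site] = features
    return kept

candidate_pages = {
    "a.example": {"ch1", "ch2", "ch3"},
    "b.example": {"ch1", "ch2", "ch3"},
    "c.example": {"ch1", "ch2"},
    "fake.example": {"ad1", "ad2"},
}
valid_pages = filter_false_pages(candidate_pages)
```

Because every page is compared against all the others, a single false page also lowers the mean of the genuine ones; the threshold has to leave room for that.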
Further, referring to accompanying drawing 1, the method of the present invention also includes the step:
S12: judge the similarity between each Chapter List page and the amalgamation result page, and determine that the Chapter List page with the greatest similarity is the first original, the other Chapter List pages being the corresponding first authentic copies.
Specifically, step S11 above yields an amalgamation result page that is more complete and includes the latest Chapter List items. In this step, each Chapter List page is compared with the amalgamation result page; the Chapter List page with the greatest similarity is determined to be the first original, and the other Chapter List pages are the corresponding first authentic copies. It can be understood that the first original most probably includes the latest Chapter List items, which indicates that it is the original Chapter List page that is updated earliest.
Specifically, in one embodiment of the invention, the text feature vector of each Chapter List page may be obtained, and the total number of text feature vectors that each Chapter List page shares with the amalgamation result page may be calculated. The Chapter List page for which this total is the greatest is determined to be the first original, and the other Chapter List pages are the corresponding first authentic copies.
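A minimal sketch of step S12 follows, assuming each page and the amalgamation result page are represented as sets of text features. The function name and the data are hypothetical; ties are resolved in favour of the first page encountered, which the patent does not specify.

```python
def pick_first_original(pages, merged):
    """Return (first_original, first_copies) by counting features shared with merged."""
    # max() keeps the first site encountered when two sites tie (an assumption).
    original = max(pages, key=lambda site: len(pages[site] & merged))
    copies = [site for site in pages if site != original]
    return original, copies


pages = {
    "site_a": {"ch1", "ch2", "ch3"},
    "site_b": {"ch1", "ch2", "ch3", "ch4"},
    "site_c": {"ch1", "ch2"},
}
merged = set().union(*pages.values())  # stands in for the amalgamation result page
original, copies = pick_first_original(pages, merged)
```

Because site_b shares all four features with the merged set, it is selected as the first original and the remaining sites become first authentic copies.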
Further, referring to accompanying drawing 1, the method of the present invention also includes the step:
S13: obtain the first usual time at which the first original is updated, the second usual time at which the first authentic copy is updated, and the usual time difference between the first usual time and the second usual time.
Specifically, in the present invention, the multiple times at which the first original has been updated are collected and analyzed to obtain its first usual time. Likewise, the multiple times at which each Chapter List page corresponding to a first authentic copy has been updated are collected and analyzed to obtain the second usual time of each such Chapter List page. The delay between an update of the first original and the subsequent update of a given first authentic copy under the same subject name is calculated; this delay is the usual time difference corresponding to that first authentic copy. The delay between the moment all first authentic copies have been updated and the next update of the first original is also calculated; this delay is the usual time difference of the first original relative to all first authentic copies. The server may store these time values in association with the site information of the corresponding first original and first authentic copies; the first original and its corresponding first authentic copies are, of course, stored in advance in association with the same subject name.
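The statistics in step S13 can be sketched as follows. The patent does not say how the usual time is derived from the observed update times, so the median is used here purely as an assumption; the function names and timestamps are hypothetical, and updates that straddle midnight are not handled.

```python
from datetime import datetime
from statistics import median


def usual_minute(update_times):
    """Median minute-of-day of the observed updates (ignores midnight wrap-around)."""
    return median(t.hour * 60 + t.minute for t in update_times)


def usual_lag_minutes(original_times, copy_times):
    """Median delay, in minutes, between paired original and copy updates."""
    return median((c - o).total_seconds() / 60
                  for o, c in zip(original_times, copy_times))


original_times = [datetime(2015, 11, 1, 20, 0),
                  datetime(2015, 11, 2, 20, 10),
                  datetime(2015, 11, 3, 19, 50)]
copy_times = [datetime(2015, 11, 1, 20, 30),
              datetime(2015, 11, 2, 20, 50),
              datetime(2015, 11, 3, 20, 20)]

first_usual = usual_minute(original_times)   # minute-of-day for the first original
second_usual = usual_minute(copy_times)      # minute-of-day for this copy
usual_diff = usual_lag_minutes(original_times, copy_times)
```

With these sample observations the first original usually updates at 20:00, the copy at 20:30, and the usual time difference is 30 minutes.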
Further, referring to accompanying drawing 1, the method of the present invention also comprises the step:
S14: in response to an external request to obtain a Chapter List page, use the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference to query the first original and the first authentic copy, so as to obtain and feed back the Chapter List page.
It can be understood that, in one embodiment of the invention, the following step is also included before step S14: receiving the external request to obtain a Chapter List page.
Specifically, in one exemplary embodiment of the invention, the user side is a mobile phone on which an APP with a network novel search function is installed. The APP has a search input field in which the user can enter the subject name of a network novel to search for its latest Chapter List; based on the latest Chapter List page obtained, the user then enters the latest chapter content page that the list page links to. It should be noted that this embodiment is only exemplary and is not to be construed as limiting the invention.
Specifically, in one embodiment of the invention, referring to accompanying drawing 4, step S14 further comprises the following steps:
S141: in response to the external request to obtain a Chapter List page, query the first original at a certain time interval according to the first usual time;
S142: judge whether the Chapter List page corresponding to the first original has been updated;
S143: when the first original has been updated, query the first authentic copy at a certain time interval according to the usual time difference;
S144: obtain and feed back the site information corresponding to the first authentic copy that has been updated.
Specifically, after the server receives from the client an external request to obtain the Chapter List page of a given subject name, it queries the first original corresponding to that subject name at a certain time interval, according to the preset usual time at which that first original is updated, and judges whether the Chapter List page corresponding to the first original has been updated. When the first original has been updated, the server queries the first authentic copies corresponding to the first original at a certain time interval, according to the preset usual time difference. When a first authentic copy is found to have been updated, the server obtains the site information corresponding to that first authentic copy and feeds this site information back to the client. Otherwise, when the first original has not been updated, the step of querying the first original at a certain time interval according to the first usual time is repeated until the first original is judged to have been updated.
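The flow of steps S141 to S144 can be sketched as a polling loop. This is only an illustration under stated assumptions: the caller is assumed to start the loop near the first usual time, the update checks are injected as callables, and the function name, intervals, polling budget and site names are all hypothetical. The sleep function is injectable so the example runs without real waiting.

```python
import time


def fetch_updated_copy(original_updated, copy_updated, copies,
                       usual_lag=1800, interval=300, max_polls=12,
                       sleep=time.sleep):
    """Poll the original (S141/S142), wait the usual lag, poll the copies (S143),
    and return the site of the first updated copy (S144), or None."""
    for _ in range(max_polls):
        if original_updated():
            break
        sleep(interval)          # original not updated yet: repeat S141
    else:
        return None              # polling budget exhausted
    sleep(usual_lag)             # copies usually follow after the usual time difference
    for _ in range(max_polls):
        for site in copies:
            if copy_updated(site):
                return site
        sleep(interval)
    return None


# Simulated run: the original is updated on the third check, copy_b has updated.
calls = {"n": 0}

def original_updated():
    calls["n"] += 1
    return calls["n"] >= 3

waited = []  # records requested waits instead of actually sleeping
site = fetch_updated_copy(original_updated, lambda s: s == "copy_b",
                          ["copy_a", "copy_b"], sleep=waited.append)
```

In this simulated run the loop waits twice for the original, once for the usual time difference, and then returns copy_b.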
Specifically, in one embodiment of the invention, whether the first original has been updated is judged by analyzing the most recently created or modified chapter information in the Chapter List page corresponding to the first original. For example, in one exemplary embodiment of the invention, the creation time or modification time of each parallel Chapter List label in the Chapter List page, or of the chapter text content that the label hyperlinks to, is obtained periodically, and the latest of these creation or modification times is obtained and recorded. The newly obtained time point is compared with the previously recorded time point: if the two time points differ, the Chapter List page has been updated; if they are identical, the Chapter List page has not been updated. It should be noted that this way of judging whether the first original has been updated is exemplary; those skilled in the art may adopt other approaches, and the present embodiment is not to be construed as limiting the invention.
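The timestamp comparison described above can be sketched as follows, assuming the creation or modification times of the chapter entries are already available as datetime values. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime


def check_update(chapter_times, last_recorded):
    """Return (updated, latest): updated is True when the newest creation or
    modification time differs from the previously recorded one."""
    latest = max(chapter_times)
    return latest != last_recorded, latest


recorded = datetime(2015, 11, 1, 20, 0)
times = [datetime(2015, 10, 31, 20, 5), datetime(2015, 11, 1, 20, 0)]

updated, recorded = check_update(times, recorded)        # same latest time point
times.append(datetime(2015, 11, 2, 20, 10))              # a new chapter appears
updated_again, recorded = check_update(times, recorded)  # latest time point changed
```

The first check reports no update because the latest time point matches the record; the second reports an update and the recorded time point advances.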
It can be understood that, through the above embodiment, the site information of the first authentic copy corresponding to the first original can be fed back to the user. Under normal conditions the corresponding new chapter can be viewed directly on the site of that first authentic copy, which solves the problem that the user cannot directly view new chapters on the original sites of some novels, and improves the user experience.
Further, referring to accompanying drawing 5, in another embodiment of the invention, step S14 also includes the steps:
S145: query the first authentic copies at a certain time interval according to the second usual time;
S146: judge whether the Chapter List pages corresponding to the first authentic copies have all been updated;
S147: when the first authentic copies have all been updated, query the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated.
Specifically, in this embodiment, when the first authentic copies are detected to have all been updated, the corresponding first original is then checked, according to the preset usual time difference, to see whether it has been updated again. When the first authentic copies are detected not to have all been updated, the step of querying the first authentic copies at a certain time interval according to the second usual time is repeated until all first authentic copies have been updated.
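The decision in steps S145 to S147 can be sketched as a small scheduling helper. It assumes the latest observed update time of each first authentic copy is tracked (None when a copy has not yet been updated) and that the original is next checked one usual time difference after the last copy update; the function name, site names and the 23-hour value are hypothetical.

```python
from datetime import datetime, timedelta


def next_original_check(copy_update_times, usual_lag):
    """Return when to query the first original next, or None if some copy
    has not been updated yet (in which case the copies keep being polled)."""
    if any(t is None for t in copy_update_times.values()):
        return None                                      # S145/S146: keep polling copies
    return max(copy_update_times.values()) + usual_lag   # S147: schedule the original


copies = {"copy_a": datetime(2015, 11, 2, 20, 30), "copy_b": None}
pending = next_original_check(copies, timedelta(hours=23))

copies["copy_b"] = datetime(2015, 11, 2, 21, 0)          # last copy catches up
due = next_original_check(copies, timedelta(hours=23))
```

While copy_b is outstanding no check of the original is scheduled; once it updates, the original is scheduled for 20:00 on the following day.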
Further, in one exemplary embodiment of the invention, whether the first authentic copies have all been updated is judged by analyzing the most recently created or modified chapter information in all Chapter List pages corresponding to the first authentic copies. For example, in one exemplary embodiment of the invention, the creation time or modification time of each parallel Chapter List label in a Chapter List page, or of the chapter text content that the label hyperlinks to, is obtained periodically, and the latest of these creation or modification times is obtained and recorded. The newly obtained time point is compared with the previously recorded time point: if the two time points differ, the Chapter List page has been updated; if they are identical, the Chapter List page has not been updated. It should be noted that this way of judging whether the first authentic copies have been updated is exemplary; those skilled in the art may adopt other approaches, and the present embodiment is not to be construed as limiting the invention.
In summary, in the method for acquiring new chapters of a network novel provided by the present invention, multiple Chapter List pages of the same subject name are merged to obtain an amalgamation result page; according to the similarity between each Chapter List page and the amalgamation result page, the Chapter List page most similar to the amalgamation result page is determined to be the first original, and the remaining Chapter List pages are the corresponding first authentic copies. Then, in response to an external request to obtain a Chapter List page, the regularity data formed by the usual update times of the first original and the first authentic copies and by the usual time difference is used to query the first original and the first authentic copies, so as to obtain and feed back the Chapter List page. With this method, the Chapter List page corresponding to the first original or to a first authentic copy can be queried on a schedule derived from the usual update time regularity data, and the updated Chapter List page obtained, without continuously crawling the Chapter List pages of every site. This saves network resources, allows the updated Chapter List page to be fed back to the user, and improves the user experience.
Further, following the functional modularization approach of computer software, the present invention also provides a device for acquiring new chapters of a network novel; refer to Fig. 6. The device comprises a merging module 11, a reserved copy and duplicate determination module 12, a time-obtaining module 13 and a feedback module 14. These modules establish the overall framework of the device and thereby realize a modular embodiment. The specific functions implemented by each module are disclosed below.
The merging module 11 is configured to merge multiple Chapter List pages of the same subject name to obtain an amalgamation result page.
After obtaining multiple Chapter List pages of the same subject name, the merging module 11 applies a duplicate-removal and merging algorithm to merge the Chapter List pages into one result page. It can be understood that the Chapter List in this result page is more complete than that of any other Chapter List page and includes the latest Chapter List items. It should be noted that, in the device for acquiring new chapters of a network novel of the present invention, the data of multiple sites can be crawled by a web spider, and whether a site is a novel site can be determined by automatic web page structure analysis.
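The patent leaves the duplicate-removal and merging algorithm open. One possible sketch, assuming each Chapter List page is an ordered list of chapter titles and that titles identical after whitespace trimming are duplicates, is an order-preserving union; the function name and titles are hypothetical.

```python
def merge_chapter_lists(pages):
    """Merge several chapter title lists into one, dropping duplicate titles
    and keeping the order in which each title is first seen."""
    seen = set()
    merged = []
    for chapters in pages:
        for title in chapters:
            key = title.strip()      # treat titles differing only in whitespace as equal
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


merged = merge_chapter_lists([
    ["Ch 1", "Ch 2"],
    ["Ch 1", "Ch 2", "Ch 3"],
    ["Ch 2 ", "Ch 4"],
])
```

A production merge would also need to reconcile differing title spellings and chapter ordering across sites, which this sketch does not attempt.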
In one embodiment of the invention, referring to accompanying drawing 7, the acquisition device also includes a cluster module 10.
The cluster module 10 is configured, before the merging module 11 merges the multiple Chapter List pages of the same subject name, to detect and obtain Chapter List pages and determine the subject name of each Chapter List page, each Chapter List page corresponding to one site; and
to cluster the Chapter List pages having the same subject name; and
to establish the association between the subject name and the site information of the multiple sites where the Chapter List pages are located.
Specifically, the cluster module 10 performs structure analysis on the web pages under a novel site's domain name. If a web page includes multiple parallel Chapter List labels, the web page can be judged to be a novel Chapter List page. The target links of these parallel Chapter List labels, that is, their href (Hypertext Reference) attributes, are highly similar to one another: the directory is identical while the specific filenames differ. For example, suppose the directory contained in the href attributes of the parallel Chapter List labels is 5_5288, while the filenames contained in the href attributes differ, ranging from 970871 to 970980.
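The structural test described above can be sketched as a heuristic over the href values found in a page. The thresholds, the function name and the ".html" suffix on the example links are assumptions made for illustration; only the directory 5_5288 and the filename range come from the description.

```python
import posixpath
from collections import Counter


def looks_like_chapter_list(hrefs, min_links=3, min_share=0.8):
    """Heuristic: most links share one directory and differ only in filename."""
    if len(hrefs) < min_links:
        return False
    dirs = Counter(posixpath.dirname(h) for h in hrefs)
    top_dir, count = dirs.most_common(1)[0]
    names = {posixpath.basename(h) for h in hrefs
             if posixpath.dirname(h) == top_dir}
    # Enough links under one directory, and every filename there is distinct.
    return count / len(hrefs) >= min_share and len(names) == count


chapter_hrefs = ["/5_5288/970871.html", "/5_5288/970872.html",
                 "/5_5288/970873.html", "/5_5288/970980.html"]
nav_hrefs = ["/index.html", "/rank/week.html", "/user/login"]

is_chapter_list = looks_like_chapter_list(chapter_hrefs)
is_nav_only = looks_like_chapter_list(nav_hrefs)
```

The chapter links all sit under /5_5288 with distinct filenames, so the first page qualifies, while the navigation links are spread over several directories and do not.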
Further, the multiple parallel Chapter List labels in the novel Chapter List page carry a chapter text feature vector, which includes the keywords and/or chapter numbers characterizing the chapters. Based on these keywords and/or chapter numbers, the cluster module 10 can extract the subject name of the Chapter List page; for example, "title+author" may be adopted as the subject name of the Chapter List page. The cluster module 10 then clusters the Chapter List pages having the same subject name into one set, obtains the site information of the site where each Chapter List page is located, and establishes the association between the subject name and the multiple pieces of site information.
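The clustering and association step can be sketched as grouping pages by a "title+author" key. It assumes the title and author have already been extracted from each page; the function name, site names and novel titles are hypothetical.

```python
from collections import defaultdict


def cluster_by_subject(pages):
    """Group sites by subject name, where pages is an iterable of
    (site, title, author) tuples and the subject name is 'title+author'."""
    clusters = defaultdict(list)
    for site, title, author in pages:
        subject = f"{title.strip()}+{author.strip()}"
        clusters[subject].append(site)   # association: subject name -> site information
    return dict(clusters)


clusters = cluster_by_subject([
    ("site_a", "Novel One", "Author X"),
    ("site_b", "Novel One ", "Author X"),
    ("site_c", "Novel Two", "Author Y"),
])
```

The resulting mapping plays the role of the stored association between each subject name and the sites that carry its Chapter List page.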
Further, in order to reduce the possibility that the resulting Chapter List page fed back to the user contains false information, to improve the user experience and to ensure the effectiveness of the scheme, in one embodiment of the invention, referring to accompanying drawing 8, the acquisition device also includes a false judge module 01 and a filtering module 02.
The false judge module 01 is configured to judge, according to the similarity between a given Chapter List page and the other Chapter List pages, whether this Chapter List page is a false Chapter List page;
The filtering module 02 is configured to filter out the Chapter List page when it is judged to be a false Chapter List page.
Specifically, in one embodiment of the invention, the false judge module 01 obtains the text feature vector of each Chapter List page and calculates the average number of text feature vectors that a given Chapter List page shares with the other Chapter List pages. When this average is greater than or equal to a preset similarity threshold, the false judge module 01 determines that the Chapter List page is a valid Chapter List page; when the average is less than the preset similarity threshold, the false judge module 01 determines that the Chapter List page is a false Chapter List page.
It should be noted that the text feature vector may consist of multiple keywords in the Chapter List titles, with the similarity between the keywords judged by a similarity evaluation algorithm. Alternatively, a numerical feature vector may be extracted from the page numbers corresponding to the Chapter List page titles of the same subject, where the numerical feature vector may be the values characterizing the page numbers. In the present embodiment, the similarity between any two Chapter List pages may be calculated from the text feature vector together with its corresponding numerical feature vector, or from either feature vector alone. After the false judge module 01 judges a Chapter List page to be a false Chapter List page, the filtering module 02 filters it out directly.
Further, referring to accompanying drawing 6, the reserved copy and duplicate determination module 12 is configured to judge the similarity between each Chapter List page and the amalgamation result page, and to determine that the Chapter List page with the greatest similarity is the first original, the other Chapter List pages being the corresponding first authentic copies.
Specifically, the merging module 11 yields an amalgamation result page that is more complete and includes the latest Chapter List items. The reserved copy and duplicate determination module 12 compares each Chapter List page with the amalgamation result page, determines that the Chapter List page with the greatest similarity is the first original, and treats the other Chapter List pages as the corresponding first authentic copies. It can be understood that the first original most probably includes the latest Chapter List items, which indicates that it is the original Chapter List page that is updated earliest.
Specifically, in one embodiment of the invention, the reserved copy and duplicate determination module 12 may obtain the text feature vector of each Chapter List page and calculate the total number of text feature vectors that each Chapter List page shares with the amalgamation result page. The Chapter List page for which this total is the greatest is determined to be the first original, and the other Chapter List pages are the corresponding first authentic copies.
Further, referring to accompanying drawing 6, the time-obtaining module 13 is configured to obtain the first usual time at which the first original is updated, the second usual time at which the first authentic copy is updated, and the usual time difference between the first usual time and the second usual time.
Specifically, in the present invention, the time-obtaining module 13 collects the multiple times at which the first original has been updated and analyzes them to obtain its first usual time. Likewise, the time-obtaining module 13 collects the multiple times at which each Chapter List page corresponding to a first authentic copy has been updated and analyzes them to obtain the second usual time of each such Chapter List page. The time-obtaining module 13 calculates the delay between an update of the first original and the subsequent update of a given first authentic copy under the same subject name; this delay is the usual time difference corresponding to that first authentic copy. The time-obtaining module 13 also calculates the delay between the moment all first authentic copies have been updated and the next update of the first original; this delay is the usual time difference of the first original relative to all first authentic copies. The time-obtaining module 13 may store these time values in association with the site information of the corresponding first original and first authentic copies; the first original and its corresponding first authentic copies are, of course, stored in advance in association with the same subject name.
Further, referring to accompanying drawing 6, the feedback module 14 is configured, in response to an external request to obtain a Chapter List page, to use the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference to query the first original and the first authentic copy, so as to obtain and feed back the Chapter List page.
It can be understood that, in one embodiment of the invention, the acquisition device also includes a receiver module configured to receive the external request to obtain a Chapter List page.
Specifically, in one exemplary embodiment of the invention, the user side is a mobile phone on which an APP with a network novel search function is installed. The APP has a search input field in which the user can enter the subject name of a network novel to search for its latest Chapter List, and the receiver module receives this external request. Based on the latest Chapter List page obtained, the user side then enters the latest chapter content page that the list page links to. It should be noted that this embodiment is only exemplary and is not to be construed as limiting the invention.
Specifically, in one embodiment of the invention, referring to accompanying drawing 9, the feedback module 14 also includes an original query unit 141, an original judging unit 142, a copy scheduling unit 143 and a copy feedback unit 144.
The original query unit 141 is configured, in response to the external request to obtain a Chapter List page, to query the first original at a certain time interval according to the first usual time;
The original judging unit 142 is configured to judge whether the Chapter List page corresponding to the first original has been updated;
The copy scheduling unit 143 is configured, when the first original has been updated, to query the first authentic copy at a certain time interval according to the usual time difference;
The copy feedback unit 144 is configured to obtain and feed back the site information corresponding to the first authentic copy that has been updated.
Specifically, after the receiver module receives from the client an external request to obtain the Chapter List page of a given subject name, the original query unit 141 queries the first original corresponding to that subject name at a certain time interval, according to the preset usual time at which that first original is updated, and the original judging unit 142 judges whether the Chapter List page corresponding to the first original has been updated. When the first original has been updated, the copy scheduling unit 143 queries the first authentic copies corresponding to the first original at a certain time interval, according to the preset usual time difference. When a first authentic copy is found to have been updated, the copy feedback unit 144 obtains the site information corresponding to that first authentic copy and feeds this site information back to the client. Otherwise, when the first original has not been updated, the copy scheduling unit 143 repeats the step of querying the first original at a certain time interval according to the first usual time, until the first original is judged to have been updated.
Specifically, in one embodiment of the invention, the original judging unit 142 judges whether the first original has been updated by analyzing the most recently created or modified chapter information in the Chapter List page corresponding to the first original. For example, in one exemplary embodiment of the invention, the original judging unit 142 periodically obtains the creation time or modification time of each parallel Chapter List label in the Chapter List page, or of the chapter text content that the label hyperlinks to, and obtains and records the latest of these creation or modification times. The newly obtained time point is compared with the previously recorded time point: if the two time points differ, the Chapter List page has been updated; if they are identical, the Chapter List page has not been updated. It should be noted that this way in which the original judging unit 142 judges whether the first original has been updated is exemplary; those skilled in the art may adopt other approaches, and the present embodiment is not to be construed as limiting the invention.
It can be understood that, through the above embodiment, the site information of the first authentic copy corresponding to the first original can be fed back to the user. Under normal conditions the corresponding new chapter can be viewed directly on the site of that first authentic copy, which solves the problem that the user cannot directly view new chapters on the original sites of some novels, and improves the user experience.
Further, referring to accompanying drawing 10, in another embodiment of the invention, the feedback module 14 also includes a copy query unit 145, a copy judging unit 146 and an original scheduling unit 147.
The copy query unit 145 is configured to query the first authentic copies at a certain time interval according to the second usual time;
The copy judging unit 146 is configured to judge whether the Chapter List pages corresponding to the first authentic copies have all been updated;
The original scheduling unit 147 is configured, when the first authentic copies have all been updated, to query the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated.
Specifically, in this embodiment, when the copy judging unit 146 detects that the first authentic copies have all been updated, the original scheduling unit 147 then checks, according to the preset usual time difference, whether the corresponding first original has been updated again. When the copy judging unit 146 detects that the first authentic copies have not all been updated, the copy query unit 145 is called to repeat the step of querying the first authentic copies at a certain time interval according to the second usual time, until all first authentic copies have been updated.
Further, in one exemplary embodiment of the invention, the copy judging unit 146 judges whether the first authentic copies have all been updated by analyzing the most recently created or modified chapter information in all Chapter List pages corresponding to the first authentic copies. For example, in one exemplary embodiment of the invention, the copy judging unit 146 periodically obtains the creation time or modification time of each parallel Chapter List label in a Chapter List page, or of the chapter text content that the label hyperlinks to, and obtains and records the latest of these creation or modification times. The newly obtained time point is compared with the previously recorded time point: if the two time points differ, the Chapter List page has been updated; if they are identical, the Chapter List page has not been updated. It should be noted that this way in which the copy judging unit 146 judges whether the first authentic copies have been updated is exemplary; those skilled in the art may adopt other approaches, and the present embodiment is not to be construed as limiting the invention.
In summary, in the device for acquiring new chapters of a network novel provided by the present invention, the merging module 11 merges multiple Chapter List pages of the same subject name to obtain an amalgamation result page; the reserved copy and duplicate determination module 12 determines, according to the similarity between each Chapter List page and the amalgamation result page, that the Chapter List page most similar to the amalgamation result page is the first original and that the remaining Chapter List pages are the corresponding first authentic copies. The feedback module 14 then, in response to an external request to obtain a Chapter List page, uses the regularity data obtained by the time-obtaining module 13, namely the usual update times of the first original and the first authentic copies and the usual time difference, to query the first original and the first authentic copies, so as to obtain and feed back the Chapter List page. With this device, the Chapter List page corresponding to the first original or to a first authentic copy can be queried on a schedule derived from the usual update time regularity data, and the updated Chapter List page obtained, without continuously crawling the Chapter List pages of every site. This saves network resources, allows the updated Chapter List page to be fed back to the user, and improves the user experience.
Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some embodiments, well-known methods, structures and techniques are not shown in detail, so as not to obscure the understanding of this description.
Although some exemplary embodiments of the invention have been shown above, those skilled in the art should appreciate that changes may be made to these exemplary embodiments without departing from the principle or spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A method for acquiring new chapters of a network novel, characterized in that it comprises the steps of:
merging multiple Chapter List pages of the same subject name to obtain an amalgamation result page;
judging the similarity between each Chapter List page and the amalgamation result page, and determining that the Chapter List page with the greatest similarity is the first original, the other Chapter List pages being the corresponding first authentic copies;
obtaining the first usual time at which the first original is updated, the second usual time at which the first authentic copy is updated, and the usual time difference between the first usual time and the second usual time;
in response to an external request to obtain a Chapter List page, using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference to query the first original and the first authentic copy, so as to obtain and feed back the Chapter List page.
2. The method according to claim 1, characterized in that, before the step of merging multiple Chapter List pages of the same subject name to obtain an amalgamation result page, the method further comprises the steps of:
detecting and obtaining Chapter List pages, and determining the subject name of each Chapter List page, each Chapter List page corresponding to one site;
clustering the Chapter List pages having the same subject name;
establishing the association between the subject name and the site information of the multiple sites where the Chapter List pages are located.
3. The method according to claim 1, characterized in that, before the step of, in response to an external request to obtain a Chapter List page, using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference to query the first original and the first authentic copy so as to obtain and feed back the Chapter List page, the method further comprises the step of:
receiving the external request to obtain a Chapter List page.
4. The method according to claim 1, characterized in that the step of, in response to an external request to obtain a Chapter List page, using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference to query the first original and the first authentic copy so as to obtain and feed back the Chapter List page further comprises the steps of:
in response to the external request to obtain a Chapter List page, querying the first original at a certain time interval according to the first usual time;
judging whether the Chapter List page corresponding to the first original has been updated;
when the first original has been updated, querying the first authentic copy at a certain time interval according to the usual time difference;
obtaining and feeding back the site information corresponding to the first authentic copy that has been updated.
5. The method according to claim 4, characterized in that the step of judging whether the chapter list page corresponding to the first original has been updated further comprises:
Analyze the most recently created or modified chapter information in the chapter list page corresponding to the first original to judge whether the first original has been updated.
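One simple way to realize the update check of claim 5 is to compare the chapter titles seen on the previous visit with those on the current page. The sketch below is only an illustration under that assumption: the function names are hypothetical, and a modified chapter is treated as a title that was not present before.

```python
def newly_added_chapters(previous, current):
    """Return the chapters in `current` that were not in `previous`, in page order."""
    seen = set(previous)
    return [chapter for chapter in current if chapter not in seen]


def has_updated(previous, current):
    """A page counts as updated when it lists at least one chapter not seen before."""
    return bool(newly_added_chapters(previous, current))
```

For example, comparing `["Ch1", "Ch2"]` with `["Ch1", "Ch2", "Ch3"]` reports `"Ch3"` as the new chapter.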
6. The method according to claim 4, characterized in that, after the step of judging whether the chapter list page corresponding to the first original has been updated, the method further comprises the step of:
When the first original has not been updated, perform again the step of querying the first original at a certain time interval according to the first usual time.
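The flow of claims 4 and 6 (poll the first original, retry while it is unchanged, then poll the first authentic copies after the usual time difference) might be sketched as follows. This is a minimal sketch under stated assumptions rather than the patented implementation: the fetch callables, the `max_polls` cap, and the injectable `sleep` parameter are all hypothetical, and the caller is assumed to start the procedure around the first usual time.

```python
import time


def wait_for_update(fetch, known, interval_s, max_polls, sleep=time.sleep):
    """Poll `fetch()` every `interval_s` seconds until it lists a chapter not in `known`.

    Returns the updated chapter list, or None when `max_polls` is exhausted.
    """
    known_set = set(known)
    for attempt in range(max_polls):
        chapters = fetch()
        if any(c not in known_set for c in chapters):
            return chapters
        if attempt < max_polls - 1:
            sleep(interval_s)  # not updated yet: query again (claim 6)
    return None


def original_then_copies(fetch_original, fetch_copies, known, usual_diff_s,
                         interval_s=60, max_polls=10, sleep=time.sleep):
    """Poll the original first; once it updates, poll each copy (claim 4).

    `fetch_copies` maps a site identifier to a callable returning that site's
    chapter list. Returns the sites whose copies were found to be updated.
    """
    if wait_for_update(fetch_original, known, interval_s, max_polls, sleep) is None:
        return []
    sleep(usual_diff_s)  # copies usually lag the original by this amount
    updated_sites = []
    for site, fetch in fetch_copies.items():
        if wait_for_update(fetch, known, interval_s, max_polls, sleep) is not None:
            updated_sites.append(site)
    return updated_sites
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; the claims themselves do not bound the number of polls.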
7. The method according to claim 1, characterized in that the step of querying the first original and the first authentic copy, in response to an external request to obtain a chapter list page and using the temporal regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page, further comprises the steps of:
Query the first authentic copy at a certain time interval according to the second usual time;
Judge whether the chapter list pages corresponding to the first authentic copy have all been updated;
When the first authentic copies have all been updated, query the first original at a certain time interval according to the usual time difference to judge whether the first original has been updated.
8. The method according to claim 7, characterized in that the step of judging whether the chapter list pages corresponding to the first authentic copy have all been updated further comprises:
Analyze the most recently created or modified chapter information in all chapter list pages corresponding to the first authentic copy to judge whether the first authentic copies have all been updated.
9. The method according to claim 7, characterized in that, after the step of judging whether the chapter list pages corresponding to the first authentic copy have all been updated, the method further comprises the step of:
When the first authentic copy has not been updated, perform again the step of querying the first authentic copy at a certain time interval according to the second usual time.
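Claims 7 to 9 describe the reverse direction: poll the first authentic copies until all of them have been updated, then check the first original. The sketch below illustrates that flow under assumptions of its own. The fetch callables, `max_polls`, and the injectable `sleep` are hypothetical, and the original is queried only once after the usual time difference, which simplifies the interval-based querying named in claim 7.

```python
import time


def all_copies_updated(fetch_copies, known):
    """True when every copy lists at least one chapter not in `known` (claim 8)."""
    known_set = set(known)
    return all(
        any(c not in known_set for c in fetch())
        for fetch in fetch_copies.values()
    )


def copies_then_original(fetch_original, fetch_copies, known, usual_diff_s,
                         interval_s=60, max_polls=10, sleep=time.sleep):
    """Poll the copies until all have updated, then check the original (claim 7).

    Returns True or False for whether the original has updated, or None when
    the copies did not all update within `max_polls` polls.
    """
    for attempt in range(max_polls):
        if all_copies_updated(fetch_copies, known):
            break
        if attempt < max_polls - 1:
            sleep(interval_s)  # not all updated yet: query again (claim 9)
    else:
        return None  # polling budget exhausted without a full update
    sleep(usual_diff_s)
    known_set = set(known)
    return any(c not in known_set for c in fetch_original())
```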
10. A device for acquiring new chapters of a network novel, characterized in that it comprises:
A merging module, configured to merge multiple chapter list pages of the same subject name to obtain an amalgamation result page;
An original and authentic copy determination module, configured to judge the similarity between each chapter list page and the amalgamation result page, determine the chapter list page with the greatest similarity to be the first original, and determine the other chapter list pages to be the corresponding first authentic copies;
A time acquisition module, configured to obtain the first usual time at which the first original is updated, the second usual time at which the first authentic copy is updated, and the usual time difference between the first usual time and the second usual time;
A feedback module, configured to, in response to an external request to obtain a chapter list page, query the first original and the first authentic copy using the temporal regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page.
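The merging, determination, and time acquisition modules of claim 10 can be illustrated with a short Python sketch. The claim does not say how pages are merged, how similarity is measured, or how a usual time is derived, so the choices here are assumptions: an ordered union for merging, Jaccard similarity over chapter titles, and the median of observed update times expressed in minutes after midnight. All function names are hypothetical.

```python
from statistics import median


def merge_chapter_lists(pages):
    """Ordered union of chapter titles across sites (stands in for the merged result page)."""
    merged, seen = [], set()
    for chapters in pages.values():
        for chapter in chapters:
            if chapter not in seen:
                seen.add(chapter)
                merged.append(chapter)
    return merged


def jaccard(a, b):
    """Jaccard similarity of two chapter title collections (assumed measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def pick_original(pages):
    """Return (original_site, copy_sites): the page most similar to the merged result wins."""
    merged = merge_chapter_lists(pages)
    original = max(pages, key=lambda site: jaccard(pages[site], merged))
    copies = [site for site in pages if site != original]
    return original, copies


def usual_time(update_minutes):
    """Assumed estimate of a usual update time: median of observed minutes after midnight."""
    return median(update_minutes)


def usual_time_difference(original_minutes, copy_minutes):
    """Usual lag of a copy behind the original, in minutes."""
    return usual_time(copy_minutes) - usual_time(original_minutes)
```

With pages `{"a.example": ["Ch1", "Ch2", "Ch3"], "b.example": ["Ch1", "Ch2"]}`, the site listing all three chapters is selected as the original, since it matches the merged result exactly.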
CN201510796828.7A 2015-11-18 2015-11-18 Method and device for acquiring new chapter of network novel Active CN105447130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510796828.7A CN105447130B (en) 2015-11-18 2015-11-18 Method and device for acquiring new chapter of network novel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510796828.7A CN105447130B (en) 2015-11-18 2015-11-18 Method and device for acquiring new chapter of network novel

Publications (2)

Publication Number Publication Date
CN105447130A (en) 2016-03-30
CN105447130B (en) 2018-12-25

Family

ID=55557307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510796828.7A Active CN105447130B (en) 2015-11-18 2015-11-18 Method and device for acquiring new chapter of network novel

Country Status (1)

Country Link
CN (1) CN105447130B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218744A (en) * 2012-07-20 2013-07-24 上海大智慧股份有限公司 Industry investment information and data processing system based on strength, weakness, opportunity, and threat (SWOT) model
CN104050273A (en) * 2014-06-24 2014-09-17 北京奇虎科技有限公司 Devices and methods for recording latest network file and modifying search result
CN104317903A (en) * 2014-10-24 2015-01-28 北京奇虎科技有限公司 Chapter type text chapter integrity identification method and device
CN104346443A (en) * 2014-10-20 2015-02-11 北京国双科技有限公司 Web text processing method and device


Also Published As

Publication number Publication date
CN105447130B (en) 2018-12-25

JP3708893B2 (en) Knowledge information collecting system and knowledge information collecting method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220726

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.
