CN105447130B - Method and device for acquiring new chapters of an online novel - Google Patents

Method and device for acquiring new chapters of an online novel

Info

Publication number
CN105447130B
Authority
CN
China
Prior art keywords
chapter list
list page
original
page
usual time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510796828.7A
Other languages
Chinese (zh)
Other versions
CN105447130A (en
Inventor
邝景胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201510796828.7A
Publication of CN105447130A
Application granted
Publication of CN105447130B
Status: Active
Anticipated expiration

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to the field of computer data mining, and in particular to a method and device for acquiring new chapters of an online novel. The method includes: merging multiple chapter list pages that share the same subject name to obtain a merged result page; determining the similarity between each chapter list page and the merged result page, and designating the chapter list page with the greatest similarity as the first original and the other chapter list pages as the corresponding first copies; obtaining the first usual time at which the first original is updated, the second usual time at which the first copies are updated, and the usual time difference between the first usual time and the second usual time; and, in response to an external request for a chapter list page, querying the first original and the first copies according to the timing-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page. The invention saves network resources and returns the updated chapter list page to the user, improving the user experience.

Description

Method and device for acquiring new chapters of an online novel
[Technical field]
The present invention relates to the field of computer data mining, and in particular to a method and device for acquiring new chapters of an online novel.
[Background]
In recent years, with the growth of online novels, a large number of websites have appeared that specialize in serializing them. To access and search the content of such a novel website, a user typically enters the website and then types keywords into its on-site search, which retrieves the novel content on that website related to the keywords. This approach is used mostly by followers of particular novels or by online-novel enthusiasts; more casual users generally still search through a search engine (such as Baidu or Google).
With existing search methods, because the update time of a novel's newest chapter is hard to predict, the search engine has to crawl chapter list pages continuously to pick up new chapters, which is inefficient. Search results also include many reading websites that carry fake novel content, so the user's search needs are not fully met and the user experience is poor. In addition, for reasons such as copyright, the new chapters on the original website of some online novels cannot be viewed directly, although their content is available on copy websites; existing single-site search cannot recommend to the user the copy websites where the chapters can be viewed directly, which again makes for a poor user experience.
[Summary of the invention]
The purpose of the present invention is to solve at least one of the above problems by providing a method and device for acquiring new chapters of an online novel.
To achieve this purpose, the present invention adopts the following technical solutions:
The present invention provides a method for acquiring new chapters of an online novel, comprising the steps of:
Merging multiple chapter list pages that share the same subject name to obtain a merged result page;
Determining the similarity between each chapter list page and the merged result page, and designating the chapter list page with the greatest similarity as the first original and the other chapter list pages as the corresponding first copies;
Obtaining the first usual time at which the first original is updated, the second usual time at which the first copies are updated, and the usual time difference between the first usual time and the second usual time;
In response to an external request for a chapter list page, querying the first original and the first copies according to the timing-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page.
Further, before the step of merging multiple chapter list pages that share the same subject name to obtain a merged result page, the method further comprises the steps of:
Detecting and obtaining chapter list pages, and determining the subject name of each chapter list page, each chapter list page corresponding to one website;
Clustering the chapter list pages that have the same subject name;
Establishing an association between the subject name and the site information of the multiple websites where the chapter list pages are located.
Further, before the step of querying the first original and the first copies in response to an external request for a chapter list page, so as to obtain and return the chapter list page, the method further comprises the step of:
Receiving the external request for a chapter list page.
Specifically, the step of querying the first original and the first copies in response to an external request for a chapter list page, so as to obtain and return the chapter list page, further comprises the steps of:
In response to the external request for a chapter list page, querying the first original at a certain time interval according to the first usual time;
Determining whether the chapter list page corresponding to the first original has been updated;
When the first original has been updated, querying the first copies at a certain time interval according to the usual time difference;
Obtaining and returning the site information corresponding to the updated first copies.
Specifically, the step of determining whether the chapter list page corresponding to the first original has been updated further comprises:
Analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original to determine whether the first original has been updated.
Further, after the step of determining whether the chapter list page corresponding to the first original has been updated, the method further comprises the step of:
When the first original has not been updated, performing again the step of querying the first original at a certain time interval according to the first usual time.
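The original-first query flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the polling limit `max_polls`, and the callables `original_has_update` and `copy_has_update` are hypothetical, and the caller is assumed to start the function around the first usual time.

```python
import time


def poll_until(check, interval_seconds, max_polls, sleep=time.sleep):
    """Call check() up to max_polls times, sleeping between calls.

    Returns True as soon as check() returns True, otherwise False.
    """
    for attempt in range(max_polls):
        if check():
            return True
        if attempt < max_polls - 1:
            sleep(interval_seconds)
    return False


def updated_copy_sites(original_has_update, copy_has_update, copy_sites,
                       usual_time_difference, interval_seconds=60,
                       max_polls=5, sleep=time.sleep):
    """Return the copy sites that have updated after the original did.

    original_has_update: hypothetical callable, True once the original
        chapter list page shows a new chapter.
    copy_has_update: hypothetical callable taking a site name.
    usual_time_difference: seconds to wait between the original's update
        and the first query of the copies.
    """
    # Query the original at a fixed interval until it shows an update.
    if not poll_until(original_has_update, interval_seconds, max_polls, sleep):
        return []
    # The original has updated: wait the usual time difference, then
    # query each copy at the same fixed interval.
    sleep(usual_time_difference)
    updated = []
    for site in copy_sites:
        if poll_until(lambda: copy_has_update(site),
                      interval_seconds, max_polls, sleep):
            updated.append(site)
    return updated
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production scheduler would more likely use timed jobs than blocking sleeps.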
Specifically, the step of querying the first original and the first copies in response to an external request for a chapter list page, so as to obtain and return the chapter list page, further comprises the steps of:
Querying the first copies at a certain time interval according to the second usual time;
Determining whether the chapter list page corresponding to a first copy has been updated;
When the first copy has been updated, querying the first original at a certain time interval according to the usual time difference, to determine whether the first original has been updated.
Specifically, the step of determining whether the chapter list page corresponding to the first copy has been updated further comprises:
Analyzing the most recently created or modified chapter information in all chapter list pages corresponding to the first copies to determine whether the first copies have been updated.
Further, after the step of determining whether the chapter list page corresponding to the first copy has been updated, the method further comprises the step of:
When the first copy has not been updated, performing again the step of querying the first copies at a certain time interval according to the second usual time.
Further, before the step of merging multiple chapter list pages that share the same subject name to obtain a merged result page, the method further comprises the steps of:
Determining, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a fake chapter list page;
When the chapter list page is determined to be a fake chapter list page, filtering out that chapter list page.
Specifically, the step of determining, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a fake chapter list page further comprises the steps of:
Obtaining the text feature vector of each chapter list page;
Determining the average similarity, in terms of shared text feature vectors, between a given chapter list page and the other chapter list pages;
When the average is greater than or equal to a preset similarity threshold, determining that the chapter list page is a valid chapter list page;
When the average is less than the preset similarity threshold, determining that the chapter list page is a fake chapter list page.
The present invention also provides a device for acquiring new chapters of an online novel, comprising:
A merging module, configured to merge multiple chapter list pages that share the same subject name to obtain a merged result page;
An original and copy determining module, configured to determine the similarity between each chapter list page and the merged result page, and to designate the chapter list page with the greatest similarity as the first original and the other chapter list pages as the corresponding first copies;
A time obtaining module, configured to obtain the first usual time at which the first original is updated, the second usual time at which the first copies are updated, and the usual time difference between the first usual time and the second usual time;
A feedback module, configured to, in response to an external request for a chapter list page, query the first original and the first copies according to the timing-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page.
Further, the acquisition device also includes a clustering module,
The clustering module being configured to, before the merging module merges multiple chapter list pages that share the same subject name, detect and obtain chapter list pages and determine the subject name of each chapter list page, each chapter list page corresponding to one website; and
Cluster the chapter list pages that have the same subject name; and
Establish an association between the subject name and the site information of the multiple websites where the chapter list pages are located.
Further, the acquisition device also includes a receiving module,
The receiving module being configured to receive the external request for a chapter list page.
Specifically, the feedback module includes:
An original query unit, configured to, in response to the external request for a chapter list page, query the first original at a certain time interval according to the first usual time;
An original judging unit, configured to determine whether the chapter list page corresponding to the first original has been updated;
A copy scheduling unit, configured to, when the first original has been updated, query the first copies at a certain time interval according to the usual time difference;
A copy feedback unit, configured to obtain and return the site information corresponding to the updated first copies.
Specifically, the original judging unit is further configured to analyze the most recently created or modified chapter information in the chapter list page corresponding to the first original to determine whether the first original has been updated.
Specifically, the copy scheduling unit is further configured to, when the first original has not been updated, call the original query unit to perform again the step of querying the first original at a certain time interval according to the first usual time.
Specifically, the feedback module further includes:
A copy query unit, configured to query the first copies at a certain time interval according to the second usual time;
A copy judging unit, configured to determine whether the chapter list page corresponding to a first copy has been updated;
An original scheduling unit, configured to, when the first copy has been updated, query the first original at a certain time interval according to the usual time difference, to determine whether the first original has been updated.
Specifically, the copy judging unit analyzes the most recently created or modified chapter information in all chapter list pages corresponding to the first copies to determine whether the first copies have been updated.
Specifically, the copy judging unit is further configured to, when the first copy has not been updated, call the copy query unit to perform again the step of querying the first copies at a certain time interval according to the second usual time.
Specifically, the device further includes a fake-page judging module and a filtering module,
The fake-page judging module being configured to, before the merging module merges multiple chapter list pages that share the same subject name to obtain a merged result page, determine, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a fake chapter list page;
The filtering module being configured to filter out the chapter list page when it is determined to be a fake chapter list page.
Further, the fake-page judging module is also configured to obtain the text feature vector of each chapter list page;
Determine the average similarity, in terms of shared text feature vectors, between a given chapter list page and the other chapter list pages;
When the average is greater than or equal to a preset similarity threshold, determine that the chapter list page is a valid chapter list page;
When the average is less than the preset similarity threshold, determine that the chapter list page is a fake chapter list page.
Compared with the prior art, the present invention has the following advantages:
1. The method for acquiring new chapters of an online novel provided by the present invention merges multiple chapter list pages that share the same subject name into a merged result page and, according to the similarity between each chapter list page and the merged result page, designates the chapter list page most similar to the merged result page as the first original and the remaining chapter list pages as the corresponding first copies. Then, in response to an external request for a chapter list page, it queries the first original and the first copies according to the pattern data formed by their usual update times and the usual time difference, so as to obtain and return the chapter list page. With this method, the chapter list pages corresponding to the first original or the first copies can be queried on a schedule derived from the usual update pattern to obtain the updated chapter list page. The chapter list pages of every website no longer need to be crawled continuously, which saves network resources, and the updated chapter list page can be returned to the user, improving the user experience;
2. Further, before merging multiple chapter list pages that share the same subject name, the present invention also determines whether each chapter list page is a fake chapter list page and filters out any page determined to be fake. This reduces the likelihood that the resulting chapter list page returned to the user contains false information, further improves the user experience, and ensures the effectiveness of the scheme;
3. Further, in the present invention, after an update of the chapter list page corresponding to the first original is detected, the first copies are queried at a certain time interval according to the usual time difference, and the site information corresponding to the updated first copies is returned to the user. The user thus receives the site information of the first copies that correspond to the first original. In general, the corresponding new chapters can be viewed directly on the first-copy websites, which solves the problem that the user cannot directly view new chapters on the original websites of some novels and further improves the user experience.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
[Brief description of the drawings]
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of one embodiment of the method for acquiring new chapters of an online novel according to the present invention;
Fig. 2 is a flowchart of one embodiment of the method for acquiring new chapters of an online novel according to the present invention;
Fig. 3 is a flowchart of one embodiment of the method for acquiring new chapters of an online novel according to the present invention;
Fig. 4 is a flowchart of one embodiment of the method for acquiring new chapters of an online novel according to the present invention;
Fig. 5 is a flowchart of one embodiment of the method for acquiring new chapters of an online novel according to the present invention;
Fig. 6 is a structural diagram of one embodiment of the device for acquiring new chapters of an online novel according to the present invention;
Fig. 7 is a structural diagram of one embodiment of the device for acquiring new chapters of an online novel according to the present invention;
Fig. 8 is a structural diagram of one embodiment of the device for acquiring new chapters of an online novel according to the present invention;
Fig. 9 is a structural diagram of one embodiment of the feedback module according to the present invention;
Fig. 10 is a structural diagram of one embodiment of the feedback module according to the present invention.
[Detailed description]
The present invention is further described below with reference to the accompanying drawings and exemplary embodiments. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it. In addition, detailed descriptions of known technology are omitted where they are unnecessary for illustrating the features of the invention.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in this specification indicates the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, should not be interpreted in an idealized or overly formal sense.
It is necessary first to give the following introductory explanation of the application scenario of the present invention and its principles.
The Internet generally comprises user terminals (such as users' mobile terminals), the network, and servers (such as the web servers of websites). A user terminal may be any Internet terminal of the user, such as a desktop computer (PC), a laptop computer, a smart device with web browsing capability such as a personal digital assistant (PDA), a mobile Internet device (MID), or a smartphone. In a network environment, typically the Internet, these terminals can request a service provided by another process (such as a process provided by a server). For example, in the present invention, a mobile phone on which an app with online-novel search capability is installed, such as an Android phone, serves as the user terminal. The app provides a search input field in which the user can enter the subject of an online novel to search for the e-book, and the remote server can return the search results to the user in response to the search request.
A server is usually a remote computer system that can be accessed through a communication medium such as the Internet. Moreover, a server typically provides services to multiple clients on the Internet. Providing a service includes receiving requests sent by user terminals, collecting user terminal information, returning information, and so on. In essence, the server plays the role of an information provider in a computer network. The server is usually located on the side of the party providing the service, or is configured with service content by the service provider; such a service provider may be, for example, the website of an Internet service company.
Specific embodiments of several technical solutions proposed by the present invention to realize the above scenario using the above principles are described in detail below. It should be noted that the method for acquiring new chapters of an online novel provided by the present invention is described from the server's perspective. The method can be implemented by programming as a computer program that runs on remote network equipment, which includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers.
Referring to Fig. 1, an exemplary embodiment of the method for acquiring new chapters of an online novel according to the present invention specifically includes the following steps:
S11: merge multiple chapter list pages that share the same subject name to obtain a merged result page.
After the multiple chapter list pages that share the same subject name are obtained, a de-duplication and merging algorithm is used to merge them into a result page. It will be appreciated that the chapter list in the result page is more complete than that of any individual chapter list page and includes the newest chapter list entries. It should be noted that, in the method for acquiring new chapters of an online novel according to the present invention, the data of multiple websites can be crawled by a web spider, and automatic analysis of web page structure can indicate whether a website is a novel website.
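The source does not specify the de-duplication and merging algorithm. As one possible illustration, the sketch below treats each chapter list page as a list of chapter titles, normalizes titles by case and whitespace, and keeps first-seen order starting from the longest list; the function name and the normalization rule are assumptions.

```python
def merge_chapter_lists(chapter_lists):
    """Merge several chapter lists into one de-duplicated list.

    chapter_lists: iterable of lists of chapter titles, one list per site.
    Titles are compared after collapsing whitespace and lowercasing
    (an assumed normalization; real titles may need fuzzier matching).
    """
    merged = []
    seen = set()
    # Start from the longest list so the most complete site defines the order.
    for chapters in sorted(chapter_lists, key=len, reverse=True):
        for title in chapters:
            key = " ".join(title.split()).lower()
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged
```

Chapters that appear only in shorter lists are appended at the end, so a real implementation would probably also order by chapter number.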
In one embodiment of the present invention, referring to Fig. 2, the following steps are performed before step S11:
S101: detect and obtain chapter list pages, and determine the subject name of each chapter list page, each chapter list page corresponding to one website;
S102: cluster the chapter list pages that have the same subject name;
S103: establish an association between the subject name and the site information of the multiple websites where the chapter list pages are located.
Specifically, the server performs structural analysis on the web pages under a novel website's domain name. If a web page contains multiple parallel chapter list tags, the page can be determined to be a novel chapter list page. Here the links that the href (Hypertext Reference) attributes of these parallel chapter list tags point to are highly similar to one another: their chapter list directories are identical, while the specific file names differ. For example, suppose the directory in the href attributes of the parallel chapter list tags is 5_5288, while the file names in the href attributes differ, ranging from 970871 to 970980.
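The href-based check described above can be sketched with the Python standard library. This is an illustrative heuristic only: the function name and the `min_links` threshold are hypothetical, and a real crawler would need to tell chapter links apart from navigation links more carefully.

```python
from collections import Counter
from html.parser import HTMLParser
import posixpath
from urllib.parse import urlparse


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def looks_like_chapter_list(html, min_links=5):
    """Return True if many links share one directory but differ in file name.

    min_links is an assumed threshold, not a value from the source.
    """
    collector = _HrefCollector()
    collector.feed(html)
    links_per_directory = Counter()
    seen = set()
    for href in collector.hrefs:
        directory, filename = posixpath.split(urlparse(href).path)
        if filename and (directory, filename) not in seen:
            seen.add((directory, filename))
            links_per_directory[directory] += 1
    return bool(links_per_directory) and max(links_per_directory.values()) >= min_links
```

With the example from the text, links such as `/5_5288/970871.html` through `/5_5288/970876.html` all fall in the directory `/5_5288` and are counted together.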
Further, the multiple parallel chapter list tags contained in the novel chapter list page include chapter text feature vectors, which contain keywords and/or chapter numbers that characterize the chapters. The search engine can extract the subject name of the chapter list page based on these keywords and/or chapter numbers; for example, "title + author" can be used as the subject name of the chapter list page. The chapter list pages with the same subject name are then clustered into one set, the site information for each chapter list page is obtained, and an association is established between the subject name and the site information of the multiple websites.
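Clustering by subject name and recording the associated sites can be sketched as a simple grouping step. The dictionary keys `title`, `author`, and `site` are hypothetical field names chosen for this illustration, and the "title + author" key follows the example given in the text.

```python
from collections import defaultdict


def cluster_by_subject(pages):
    """Group chapter list pages by subject name and collect their sites.

    pages: iterable of dicts with assumed keys "title", "author", "site".
    Returns a dict mapping (title, author) to the list of sites that
    host a chapter list page for that subject.
    """
    clusters = defaultdict(list)
    for page in pages:
        # "title + author" serves as the subject name; normalize lightly.
        subject = (page["title"].strip().lower(), page["author"].strip().lower())
        clusters[subject].append(page["site"])
    return dict(clusters)
```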
Further, to reduce the likelihood that the resulting chapter list page returned to the user contains false information, to improve the user experience, and to ensure the effectiveness of the scheme, in one embodiment of the present invention, referring to Fig. 3, the following steps are performed before step S11:
S01: determine, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a fake chapter list page;
S02: when the chapter list page is determined to be a fake chapter list page, filter out that chapter list page.
Specifically, in one embodiment of the present invention, the text feature vector of each chapter list page is obtained, and the average similarity, in terms of shared text feature vectors, between a given chapter list page and the other chapter list pages is determined. When the average is greater than or equal to a preset similarity threshold, the chapter list page is determined to be a valid chapter list page; when the average is less than the preset similarity threshold, the chapter list page is determined to be a fake chapter list page.
It should be noted that the text feature vector may consist of multiple keywords in the chapter list titles, with the similarity between the keywords determined by a similarity algorithm. Alternatively, numerical feature vectors may be extracted from the page numbers corresponding to the titles of multiple chapter list pages of the same subject, where a numerical feature vector may be the numerical value representing the page number. In this embodiment, the similarity between any two chapter list pages can be calculated jointly from the text feature vector and its corresponding numerical feature vector, or from either feature vector alone. Once a chapter list page is determined to be a fake chapter list page, it is filtered out directly.
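The threshold test above can be sketched as follows. The source leaves the similarity algorithm open, so this illustration swaps in Jaccard similarity over sets of chapter titles as an assumed stand-in; the function names and the default threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_fake_pages(pages, threshold=0.5):
    """Drop pages whose average similarity to the others is below threshold.

    pages: dict mapping site name to its list of chapter titles.
    threshold: assumed preset similarity threshold.
    """
    kept = {}
    for site, chapters in pages.items():
        others = [c for s, c in pages.items() if s != site]
        if not others:
            # A single page has nothing to be compared against; keep it.
            kept[site] = chapters
            continue
        average = sum(jaccard(chapters, other) for other in others) / len(others)
        if average >= threshold:
            kept[site] = chapters
    return kept
```

Because every page is compared with every other page, a fake page also lowers the averages of genuine pages, so the threshold has to be chosen with the expected share of fake pages in mind.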
Further, referring to Fig. 1, the method of the present invention also includes the step of:
S12: determine the similarity between each chapter list page and the merged result page, and designate the chapter list page with the greatest similarity as the first original and the other chapter list pages as the corresponding first copies.
Specifically, step S11 above yields a merged result page that is more complete and includes the newest chapter list entries. In this step, the similarity between each chapter list page and the merged result page is compared, and the chapter list page with the greatest similarity is designated as the first original, with the other chapter list pages as the corresponding first copies. It will be appreciated that the first original is the page most likely to include the newest chapter list entries, which indicates that it is the original chapter list page that is updated earliest; that chapter list page is therefore designated as the first original.
Specifically, in one embodiment of the present invention, the text feature vector of each chapter list page can be obtained, and the total number of text features that each chapter list page shares with the merged result page can be calculated. The chapter list page with the largest total is designated as the first original, and the other chapter list pages are the corresponding first copies.
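Selecting the first original can be sketched as picking the page that shares the most entries with the merged result page. The sketch assumes chapter titles stand in for text features; the function name is hypothetical.

```python
def pick_original(pages, merged):
    """Split pages into the first original and its copies.

    pages: dict mapping site name to its list of chapter titles.
    merged: the merged chapter list built from all pages.
    Returns (original_site, list_of_copy_sites). On a tie, the site
    that appears first in pages is chosen.
    """
    merged_set = set(merged)
    # Score each page by how many merged entries it contains.
    shared = {site: len(set(chapters) & merged_set)
              for site, chapters in pages.items()}
    original = max(shared, key=shared.get)
    copies = [site for site in pages if site != original]
    return original, copies
```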
Further, referring to Fig. 1, the method of the present invention also includes the step:
S13: obtain the first usual time at which the first original is updated, the second usual time at which the first copy is updated, and the usual time difference between the first usual time and the second usual time.
Specifically, in the present invention, the times of multiple updates of the first original are collected and analyzed to obtain its first usual time. Similarly, the times of multiple updates of each chapter list page corresponding to the first copies are collected and analyzed to obtain the second usual time of each such page. The delay between an update of the first original and the subsequent update of a given first copy of the same subject name is calculated; this delay is the usual time difference for that first copy. The delay between the completion of updates to all first copies and the next update of the first original is also calculated; this delay is the usual time difference of the first original relative to all first copies. The server stores these time values in association with the site information of the corresponding first original and first copies, the first original and its multiple first copies having been associated with the same subject name and stored in advance.
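The description does not say how the update times are analyzed. As one possible sketch, the usual time can be taken as the median of past update times and the usual time difference as the median delay between paired updates. The use of the median, the representation of times as minutes after midnight, and the function names are all assumptions made for illustration.

```python
from statistics import median


def usual_time(update_minutes):
    """Typical update time of day, in minutes after midnight (median of observations)."""
    return median(update_minutes)


def usual_time_difference(original_updates, copy_updates):
    """Typical delay from an update of the original to the matching update of a copy."""
    return median(c - o for o, c in zip(original_updates, copy_updates))


# Three observed days: the original updates around 10:05, the copy about 30 minutes later.
original_updates = [600, 610, 605]
copy_updates = [630, 645, 635]

first_usual = usual_time(original_updates)
second_usual = usual_time(copy_updates)
usual_diff = usual_time_difference(original_updates, copy_updates)
```

This sketch ignores updates that straddle midnight; a fuller version would need to handle that wraparound.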
Further, referring to Fig. 1, the method of the present invention also includes the step:
S14: in response to an external request for a chapter list page, query the first original and the first copies using the time-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page.
It will be appreciated that, in one embodiment of the invention, the method further includes, before step S14, the step of receiving the external request for a chapter list page.
Specifically, in an exemplary embodiment of the present invention, the user terminal is a mobile phone on which an app with a network novel search function is installed. The app provides a search input field in which the user can enter the subject name of a network novel to search for its newest chapter list; based on the newest chapter list page obtained, the user can then open the newest chapter content page linked from that list page. It should be noted that this example is merely illustrative and does not limit the present invention.
Specifically, in one embodiment of the invention, referring to Fig. 4, step S14 further includes the following steps:
S141: in response to the external request for a chapter list page, query the first original at a certain time interval according to the first usual time;
S142: judge whether the chapter list page corresponding to the first original has been updated;
S143: when the first original has been updated, query the first copies at a certain time interval according to the usual time difference;
S144: obtain and return the site information corresponding to the updated first copy.
Specifically, after the server receives from a client an external request for the chapter list page of a given subject name, it queries the first original at a certain time interval according to the preset usual time at which the first original for that subject name is updated, and judges whether the chapter list page corresponding to the first original has been updated. When the first original has been updated, the server queries the first copies corresponding to that first original at a certain time interval according to the preset usual time difference. When a first copy is found to have been updated, the server obtains the site information corresponding to that first copy and returns it to the client. Conversely, when the first original has not been updated, the step of querying the first original at a certain time interval according to the first usual time is repeated until the first original is judged to have been updated.
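The control flow of steps S141 to S144 can be sketched as below. This is only an outline under stated assumptions: the callables `original_updated` and `copy_updated` stand in for real page checks, `wait` stands in for sleeping until the next scheduled query (derived from the usual time and usual time difference), and `max_polls` is a hypothetical safety limit that the description does not mention.

```python
def find_updated_copy(original_updated, copy_sites, copy_updated,
                      max_polls=10, wait=lambda: None):
    """Return the site of the first updated copy, or None if nothing updated in time."""
    # S141/S142: query the original repeatedly until it reports an update.
    for _ in range(max_polls):
        if original_updated():
            break
        wait()
    else:
        return None  # the original never updated within the polling budget

    # S143/S144: query the copies; return the first site found to be updated.
    for _ in range(max_polls):
        for site in copy_sites:
            if copy_updated(site):
                return site
        wait()
    return None


# Simulated checks: the original updates on the third query, then c.example has updated.
states = iter([False, False, True])
result = find_updated_copy(lambda: next(states),
                           ["b.example", "c.example"],
                           lambda site: site == "c.example")
```

Passing the checks in as callables keeps the scheduling logic separate from however the pages are actually fetched.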
Specifically, in one embodiment of the invention, whether the first original has been updated is judged by analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original. For example, in an exemplary embodiment of the present invention, the creation time or modification time of each parallel chapter list tag in the chapter list page, or of the chapter text content hyperlinked by that tag, is obtained periodically; the latest creation or modification time point is obtained and recorded; and the newly obtained time point is compared with the previously recorded one. If the two time points differ, the chapter list page has been updated; if they are the same, it has not. It should be noted that this way of judging whether the first original has been updated is merely illustrative; those skilled in the art may use other approaches, and this embodiment does not limit the present invention.
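The timestamp comparison described above is small enough to show directly. In this sketch the timestamps are plain numbers and the function name `has_updated` is hypothetical; how the creation or modification times are extracted from a page is left out.

```python
def has_updated(chapter_times, last_recorded):
    """Compare the latest chapter timestamp with the previously recorded one.

    chapter_times: creation or modification timestamps of the chapters on the page.
    last_recorded: the latest timestamp recorded at the previous check.
    Returns (updated, latest) so the caller can store `latest` for the next check.
    """
    latest = max(chapter_times)
    return latest != last_recorded, latest


updated, latest = has_updated([100, 250, 200], last_recorded=200)
```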
It will be appreciated that the foregoing embodiment makes it possible to return to the user the site information of the first copy corresponding to the first original. Since the corresponding new chapters can usually be viewed directly on the first copy's site, this solves the problem that users cannot directly view new chapters on the original sites of some novels, and improves the user experience.
Further, referring to Fig. 5, in another embodiment of the present invention, step S14 further includes the steps:
S145: query the first copies at a certain time interval according to the second usual time;
S146: judge whether the chapter list pages corresponding to the first copies have been updated;
S147: when the first copies have been updated, query the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated.
Specifically, in this embodiment, when it is detected that all first copies have been updated, the corresponding first original is checked, according to the preset usual time difference, to determine whether it has been updated again. When it is detected that not all first copies have been updated, the step of querying the first copies at a certain time interval according to the second usual time is repeated until all first copies have been updated.
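The decision made in steps S146 and S147 can be sketched as a single check: keep querying the copies until every one of them has updated, and only then turn to the original. The function name `next_target` and the string return values are hypothetical, chosen only to make the branch visible.

```python
def next_target(copy_sites, copy_updated):
    """Decide what to query next.

    Returns "original" once every copy has updated (S147),
    otherwise "copies" to repeat the copy query (S145).
    """
    if all(copy_updated(site) for site in copy_sites):
        return "original"
    return "copies"


status = {"b.example": True, "c.example": False}
before = next_target(["b.example", "c.example"], status.get)
status["c.example"] = True
after = next_target(["b.example", "c.example"], status.get)
```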
Further, in an exemplary embodiment of the present invention, whether the first copies have been updated is judged by analyzing the most recently created or modified chapter information in all chapter list pages corresponding to the first copies. For example, in an exemplary embodiment of the present invention, the creation time or modification time of each parallel chapter list tag in the chapter list page, or of the chapter text content hyperlinked by that tag, is obtained periodically; the latest creation or modification time point is obtained and recorded; and the newly obtained time point is compared with the previously recorded one. If the two time points differ, the chapter list page has been updated; if they are the same, it has not. It should be noted that this way of judging whether the first copies have been updated is merely illustrative; those skilled in the art may use other approaches, and this embodiment does not limit the present invention.
In summary, in the method for acquiring new chapters of a network novel provided by the present invention, multiple chapter list pages of the same subject name are merged to obtain a merged result page, and, according to the similarity between each chapter list page and the merged result page, the chapter list page most similar to the merged result page is determined to be the first original and the remaining chapter list pages are the corresponding first copies. Then, in response to an external request for a chapter list page, the first original and the first copies are queried using the pattern data of their usual update times and usual time difference, so as to obtain and return the chapter list page. With this method, the chapter list pages corresponding to the first original or the first copies can be queried on a schedule based on the usual update time pattern data to obtain updated chapter list pages. There is no need to crawl the chapter list pages of every site continuously, which saves network resources, and the updated chapter list page can be returned to the user, which improves the user experience.
Further, following the functional modularization approach of computer software, the present invention also provides a device for acquiring new chapters of a network novel; please refer to Fig. 6. The device includes a merging module 11, an original-and-copy determining module 12, a time-obtaining module 13, and a feedback module 14. These modules form the basic framework of the whole device and provide a modular implementation. The specific function of each module is disclosed in detail below.
The merging module 11 is configured to merge multiple chapter list pages of the same subject name to obtain a merged result page.
After obtaining multiple chapter list pages of the same subject name, the merging module 11 applies a deduplication and merging algorithm to merge them into a result page. It will be appreciated that the chapter list in the result page is more complete than that of any other chapter list page and includes the newest chapter list items. It should be noted that, in the device for acquiring new chapters of a network novel according to the present invention, data from multiple websites may be crawled by a web spider, and whether a website is a novel website may be determined by automatic web page structure analysis.
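The description leaves the deduplication and merging algorithm unspecified. One simple possibility, shown here purely as an assumption, is to key each chapter by its chapter number, keep the first title seen for each number, and sort the result. The function name `merge_chapter_lists` and the `(number, title)` representation are hypothetical.

```python
def merge_chapter_lists(pages):
    """Merge several chapter lists into one deduplicated, ordered list.

    pages: list of chapter lists, each a list of (chapter_number, title) pairs.
    """
    merged = {}
    for page in pages:
        for number, title in page:
            # Keep the first title seen for a chapter number; later duplicates are dropped.
            merged.setdefault(number, title)
    return [(number, merged[number]) for number in sorted(merged)]


site_a = [(1, "One"), (2, "Two")]
site_b = [(2, "Two"), (3, "Three")]
merged_page = merge_chapter_lists([site_a, site_b])
```

The merged list contains chapter 3 even though site_a lacks it, which matches the observation that the result page is more complete than any single page.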
In one embodiment of the invention, referring to Fig. 7, the acquisition device further includes a cluster module 10.
The cluster module 10 is configured to, before the merging module 11 merges multiple chapter list pages of the same subject name, detect and obtain chapter list pages and determine the subject name of each chapter list page, each chapter list page corresponding to one site; and
cluster the chapter list pages that have the same subject name; and
establish an association between the subject name and the site information of the multiple sites where those chapter list pages are located.
Specifically, the cluster module 10 performs structural analysis on the web pages under a novel website's domain name. If a web page contains multiple parallel chapter list tags, the page can be determined to be a novel chapter list page. Here, the link targets given by the href (Hypertext Reference) attributes of the parallel chapter list tags are highly similar to one another: they share the same chapter list directory but have different file names. For example, suppose the href attributes of the parallel chapter list tags all contain the directory 5_5288, while the file names they contain differ, ranging from 970871 to 970980.
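The href pattern described above can be checked with the Python standard library. The sketch below collects the href of every anchor tag and reports a chapter list page when enough links share one directory but differ in file name. The class and function names and the `min_links` threshold are assumptions for illustration; the description does not give a concrete threshold.

```python
from html.parser import HTMLParser
import posixpath


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def looks_like_chapter_list(html, min_links=3):
    """True if at least min_links links share a directory but have distinct file names."""
    collector = HrefCollector()
    collector.feed(html)
    files_by_dir = {}
    for href in collector.hrefs:
        directory, filename = posixpath.split(href)
        if directory and filename:
            files_by_dir.setdefault(directory, set()).add(filename)
    return any(len(files) >= min_links for files in files_by_dir.values())


chapter_page = (
    '<a href="/5_5288/970871.html">Chapter 1</a>'
    '<a href="/5_5288/970872.html">Chapter 2</a>'
    '<a href="/5_5288/970873.html">Chapter 3</a>'
)
other_page = '<a href="/about.html">About</a><a href="/5_5288/970871.html">Chapter 1</a>'
```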
Further, the multiple parallel chapter list tags in a novel chapter list page carry chapter text feature vectors, which include keywords and/or chapter numbers characterizing the chapters. The cluster module 10 can extract the subject name of the chapter list page based on these keywords and/or chapter numbers; for example, "title + author" may be used as the subject name of the chapter list page. The cluster module 10 then clusters the chapter list pages with the same subject name into one set, obtains the site information of each chapter list page, and establishes an association between the subject name and the multiple pieces of site information.
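The clustering and association step can be sketched as a grouping by subject name. The sketch assumes the title and author have already been extracted from each page; the dictionary keys `title`, `author`, and `site`, the function name, and the sample values are hypothetical.

```python
from collections import defaultdict


def cluster_by_subject(pages):
    """Group chapter list pages by subject name and associate each subject with its sites.

    pages: iterable of dicts with 'title', 'author', and 'site' keys (assumed fields).
    Returns a dict mapping "title+author" to the list of sites hosting that subject.
    """
    clusters = defaultdict(list)
    for page in pages:
        subject = f"{page['title']}+{page['author']}"
        clusters[subject].append(page["site"])
    return dict(clusters)


pages = [
    {"title": "NovelA", "author": "AuthorX", "site": "a.example"},
    {"title": "NovelA", "author": "AuthorX", "site": "b.example"},
    {"title": "NovelB", "author": "AuthorY", "site": "a.example"},
]
associations = cluster_by_subject(pages)
```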
Further, to reduce the likelihood that the resulting chapter list page returned to the user contains false information, to improve the user experience, and to ensure the effectiveness of the scheme, in one embodiment of the invention, referring to Fig. 8, the acquisition device further includes a false judgment module 01 and a filtering module 02.
The false judgment module 01 is configured to judge, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
The filtering module 02 is configured to filter out the chapter list page when it is judged to be a false chapter list page.
Specifically, in one embodiment of the invention, the false judgment module 01 obtains the text feature vector of each chapter list page and determines the average number of text feature vectors that a given chapter list page shares with the other chapter list pages. When the average is greater than or equal to a preset similarity threshold, the false judgment module 01 determines that the chapter list page is a valid chapter list page; when the average is less than the preset similarity threshold, the false judgment module 01 determines that the chapter list page is a false chapter list page.
It should be noted that the text feature vector may consist of multiple keywords in the chapter list titles, with the similarity between those keywords determined by a suitable similarity algorithm. Alternatively, a numeric feature vector may be extracted from the page numbers corresponding to the titles of multiple chapter list pages of the same subject, where the numeric feature vector may be the numeric values representing those page numbers. In this embodiment, the similarity between any two chapter list pages may be calculated using the text feature vector together with its corresponding numeric feature vector, or using either feature vector alone. After the false judgment module 01 judges a chapter list page to be a false chapter list page, the filtering module 02 filters out that chapter list page directly.
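The false-page test can be sketched as follows, assuming the text features of a page are a list of keywords and that similarity is measured as the number of keywords two pages share. The function name `filter_false_pages`, the site names, and the threshold value are hypothetical.

```python
def filter_false_pages(pages, threshold):
    """Drop pages whose average number of shared keywords with the others is below threshold.

    pages: dict mapping a site name to its list of keywords (assumed text features).
    Returns a dict containing only the pages judged valid.
    """
    kept = {}
    for site, keywords in pages.items():
        others = [set(kw) for other, kw in pages.items() if other != site]
        if not others:
            kept[site] = keywords  # nothing to compare against, so keep the page
            continue
        own = set(keywords)
        average = sum(len(own & other) for other in others) / len(others)
        if average >= threshold:
            kept[site] = keywords
    return kept


pages = {
    "a.example": ["ch1", "ch2", "ch3"],
    "b.example": ["ch1", "ch2", "ch3"],
    "spam.example": ["ads", "lottery"],
}
valid = filter_false_pages(pages, threshold=1)
```

Here each genuine page shares an average of 1.5 keywords with the others, while the spam page shares none and is filtered out.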
Further, referring to Fig. 6, the original-and-copy determining module 12 is configured to determine the similarity between each chapter list page and the merged result page, and to determine the chapter list page with the greatest similarity to be the first original, the other chapter list pages being the corresponding first copies.
Specifically, the merging module 11 described above yields a merged result page that is more complete and contains the newest chapter list items. The original-and-copy determining module 12 compares the similarity between each chapter list page and the merged result page, determines the chapter list page with the greatest similarity to be the first original, and treats the other chapter list pages as the corresponding first copies. It will be appreciated that the first original most likely contains the newest chapter list items, which indicates that it is the original chapter list page updated earliest; that chapter list page is therefore determined to be the first original.
Specifically, in one embodiment of the invention, the original-and-copy determining module 12 may obtain the text feature vector of each chapter list page and count the number of text feature vectors that each chapter list page shares with the merged result page. The chapter list page with the largest count is determined to be the first original, and the other chapter list pages are the corresponding first copies.
Further, referring to Fig. 6, the time-obtaining module 13 is configured to obtain the first usual time at which the first original is updated, the second usual time at which the first copy is updated, and the usual time difference between the first usual time and the second usual time.
Specifically, in the present invention, the time-obtaining module 13 collects and analyzes the times of multiple updates of the first original to obtain its first usual time. Similarly, the time-obtaining module 13 collects and analyzes the times of multiple updates of each chapter list page corresponding to the first copies to obtain the second usual time of each such page. The time-obtaining module 13 also calculates the delay between an update of the first original and the subsequent update of a given first copy of the same subject name; this delay is the usual time difference for that first copy. The time-obtaining module 13 further calculates the delay between the completion of updates to all first copies and the next update of the first original; this delay is the usual time difference of the first original relative to all first copies. The time-obtaining module 13 stores these time values in association with the site information of the corresponding first original and first copies, the first original and its multiple first copies having been associated with the same subject name and stored in advance.
Further, referring to Fig. 6, the feedback module 14 is configured to, in response to an external request for a chapter list page, query the first original and the first copies using the time-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page.
It will be appreciated that, in one embodiment of the invention, the acquisition device further includes a receiving module configured to receive the external request for a chapter list page.
Specifically, in an exemplary embodiment of the present invention, the user terminal is a mobile phone on which an app with a network novel search function is installed. The app provides a search input field in which the user can enter the subject name of a network novel to search for its newest chapter list, and the receiving module then receives this external request. Based on the newest chapter list page obtained, the user terminal can then open the newest chapter content page linked from that list page. It should be noted that this example is merely illustrative and does not limit the present invention.
Specifically, in one embodiment of the invention, referring to Fig. 9, the feedback module 14 further includes an original query unit 141, an original judging unit 142, a copy scheduling unit 143, and a copy feedback unit 144.
The original query unit 141 is configured to, in response to the external request for a chapter list page, query the first original at a certain time interval according to the first usual time;
The original judging unit 142 is configured to judge whether the chapter list page corresponding to the first original has been updated;
The copy scheduling unit 143 is configured to, when the first original has been updated, query the first copies at a certain time interval according to the usual time difference;
The copy feedback unit 144 is configured to obtain and return the site information corresponding to the updated first copy.
Specifically, after the receiving module receives from a client an external request for the chapter list page of a given subject name, the original query unit 141 queries the first original at a certain time interval according to the preset usual time at which the first original for that subject name is updated, and the original judging unit 142 judges whether the chapter list page corresponding to the first original has been updated. When the first original has been updated, the copy scheduling unit 143 queries the first copies corresponding to that first original at a certain time interval according to the preset usual time difference. When a first copy is found to have been updated, the copy feedback unit 144 obtains the site information corresponding to that first copy and returns it to the client. Conversely, when the first original has not been updated, the copy scheduling unit 143 repeats the step of querying the first original at a certain time interval according to the first usual time, until the first original is judged to have been updated.
Specifically, in one embodiment of the invention, the original judging unit 142 judges whether the first original has been updated by analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original. For example, in an exemplary embodiment of the present invention, the original judging unit 142 periodically obtains the creation time or modification time of each parallel chapter list tag in the chapter list page, or of the chapter text content hyperlinked by that tag; obtains and records the latest creation or modification time point; and compares the newly obtained time point with the previously recorded one. If the two time points differ, the chapter list page has been updated; if they are the same, it has not. It should be noted that this way in which the original judging unit 142 judges whether the first original has been updated is merely illustrative; those skilled in the art may use other approaches, and this embodiment does not limit the present invention.
It will be appreciated that the foregoing embodiment makes it possible to return to the user the site information of the first copy corresponding to the first original. Since the corresponding new chapters can usually be viewed directly on the first copy's site, this solves the problem that users cannot directly view new chapters on the original sites of some novels, and improves the user experience.
Further, referring to Fig. 10, in another embodiment of the present invention, the feedback module 14 further includes a copy query unit 145, a copy judging unit 146, and an original scheduling unit 147.
The copy query unit 145 is configured to query the first copies at a certain time interval according to the second usual time;
The copy judging unit 146 is configured to judge whether the chapter list pages corresponding to the first copies have been updated;
The original scheduling unit 147 is configured to, when the first copies have been updated, query the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated.
Specifically, in this embodiment, when the copy judging unit 146 detects that all first copies have been updated, the original scheduling unit 147 checks, according to the preset usual time difference, whether the corresponding first original has been updated again. When the copy judging unit 146 detects that not all first copies have been updated, the copy query unit 145 is called to repeat the step of querying the first copies at a certain time interval according to the second usual time, until all first copies have been updated.
Further, in an exemplary embodiment of the present invention, the copy judging unit 146 judges whether the first copies have been updated by analyzing the most recently created or modified chapter information in all chapter list pages corresponding to the first copies. For example, in an exemplary embodiment of the present invention, the copy judging unit 146 periodically obtains the creation time or modification time of each parallel chapter list tag in the chapter list page, or of the chapter text content hyperlinked by that tag; obtains and records the latest creation or modification time point; and compares the newly obtained time point with the previously recorded one. If the two time points differ, the chapter list page has been updated; if they are the same, it has not. It should be noted that this way in which the copy judging unit 146 judges whether the first copies have been updated is merely illustrative; those skilled in the art may use other approaches, and this embodiment does not limit the present invention.
In summary, in the device for acquiring new chapters of a network novel provided by the present invention, the merging module 11 merges multiple chapter list pages of the same subject name to obtain a merged result page, and the original-and-copy determining module 12, according to the similarity between each chapter list page and the merged result page, determines the chapter list page most similar to the merged result page to be the first original and the remaining chapter list pages to be the corresponding first copies. The feedback module 14 then, in response to an external request for a chapter list page, queries the first original and the first copies using the pattern data of the usual update times and usual time difference obtained by the time-obtaining module 13, so as to obtain and return the chapter list page. With this device, the chapter list pages corresponding to the first original or the first copies can be queried on a schedule based on the usual update time pattern data to obtain updated chapter list pages. There is no need to crawl the chapter list pages of every site continuously, which saves network resources, and the updated chapter list page can be returned to the user, which improves the user experience.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some embodiments, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Although some exemplary embodiments of the invention have been shown above, those skilled in the art will understand that changes may be made to these exemplary embodiments without departing from the principles or spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (22)

1. A method for acquiring new chapters of a network novel, characterized by comprising the steps of:
merging multiple chapter list pages of the same subject name to obtain a merged result page;
determining the similarity between each chapter list page and the merged result page, and determining the chapter list page with the greatest similarity to be the first original, the other chapter list pages being the corresponding first copies;
obtaining the first usual time at which the first original is updated, the second usual time at which the first copy is updated, and the usual time difference between the first usual time and the second usual time;
in response to an external request for a chapter list page, querying the first original and the first copies using the time-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page.
2. The method according to claim 1, characterized in that, before the step of merging multiple chapter list pages of the same subject name to obtain a merged result page, the method further comprises the steps of:
detecting and obtaining chapter list pages, and determining the subject name of each chapter list page, each chapter list page corresponding to one site;
clustering the chapter list pages that have the same subject name;
establishing an association between the subject name and the site information of the multiple sites where the chapter list pages are located.
3. The method according to claim 1, characterized in that, before the step of, in response to an external request for a chapter list page, querying the first original and the first copies using the time-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page, the method further comprises the step of:
receiving the external request for a chapter list page.
4. The method according to claim 1, characterized in that the step of, in response to an external request for a chapter list page, querying the first original and the first copies using the time-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page, further comprises the steps of:
in response to the external request for a chapter list page, querying the first original at a certain time interval according to the first usual time;
judging whether the chapter list page corresponding to the first original has been updated;
when the first original has been updated, querying the first copies at a certain time interval according to the usual time difference;
obtaining and returning the site information corresponding to the updated first copy;
wherein the usual time difference is the delay between an update of the first original and the subsequent update of a first copy corresponding to the subject name.
5. The method according to claim 4, characterized in that the step of judging whether the chapter list page corresponding to the first original has been updated further comprises:
judging whether the first original has been updated by analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original.
6. The method according to claim 4, characterized in that, after the step of judging whether the chapter list page corresponding to the first original has been updated, the method further comprises the step of:
when the first original has not been updated, performing the step of querying the first original at a certain time interval according to the first usual time.
7. The method according to claim 1, characterized in that the step of, in response to an external request for a chapter list page, querying the first original and the first copies using the time-pattern data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and return the chapter list page, further comprises the steps of:
querying the first copies at a certain time interval according to the second usual time;
judging whether the chapter list pages corresponding to the first copies have been updated;
when the first copies have been updated, querying the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated;
wherein the usual time difference is the delay between the completion of updates to all first copies and the next update of the first original.
8. The method according to claim 7, wherein the step of judging whether the chapter list page corresponding to the first copy has been updated further comprises:
Analyzing the most recently created or modified chapter information in all chapter list pages corresponding to the first copy, so as to judge whether the first copy has been updated.
9. The method according to claim 7, wherein, after the step of judging whether the chapter list page corresponding to the first copy has been updated, the method further comprises:
When the first copy has not been updated, executing again the step of querying the first copy at the certain time interval according to the second usual time.
10. The method according to claim 1, wherein, before the step of merging multiple chapter list pages of the same subject name to obtain a merged result page, the method further comprises:
Judging, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
When the chapter list page is judged to be a false chapter list page, filtering out the chapter list page.
11. The method according to claim 10, wherein the step of judging, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page further comprises:
Obtaining the text feature vector of each chapter list page;
Determining the average value of shared text feature vectors between a given chapter list page and the other chapter list pages;
When the average value is greater than or equal to a preset similarity threshold, determining that the chapter list page is a valid chapter list page;
When the average value is less than the preset similarity threshold, determining that the chapter list page is a false chapter list page.
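The filtering of claims 10 and 11 can be sketched as follows. The claims do not say how the text feature vector or the similarity is computed, so this sketch makes two assumptions: the feature vector is a bag of chapter titles, and similarity is cosine similarity averaged over all other pages. The function names and the threshold value of 0.5 are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List


def feature_vector(titles: List[str]) -> Counter:
    """Assumed text feature vector: a bag of chapter titles."""
    return Counter(titles)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_false_pages(pages: Dict[str, List[str]], threshold: float) -> Dict[str, List[str]]:
    """Keep pages whose mean similarity to all other pages reaches the threshold."""
    vectors = {site: feature_vector(titles) for site, titles in pages.items()}
    kept = {}
    for site, vector in vectors.items():
        others = [cosine(vector, v) for s, v in vectors.items() if s != site]
        mean = sum(others) / len(others) if others else 1.0  # a lone page is kept
        if mean >= threshold:
            kept[site] = pages[site]
    return kept


pages = {
    "site-a": ["c1", "c2", "c3"],
    "site-b": ["c1", "c2", "c3"],
    "site-c": ["c1", "c2"],
    "spam": ["x", "y"],  # shares no chapter titles with the others
}
kept = filter_false_pages(pages, threshold=0.5)
```

Here the page labeled `spam` has a mean similarity of zero and is filtered out, while the three pages that share chapter titles are kept.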
12. A device for acquiring new chapters of a network novel, comprising:
A merging module, configured to merge multiple chapter list pages of the same subject name to obtain a merged result page;
An original and copy determining module, configured to judge the similarity between each chapter list page and the merged result page, and to determine the chapter list page with the greatest similarity as the first original and the other chapter list pages as the corresponding first copies;
A time obtaining module, configured to obtain a first usual time at which the first original is updated, a second usual time at which the first copy is updated, and the usual time difference between the first usual time and the second usual time;
A feedback module, configured to, in response to an external request to obtain a chapter list page, query the first original and the first copy using the time-regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page.
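The merging module and the original and copy determining module of claim 12 can be illustrated with a short sketch. The claim does not fix a merge rule or a similarity measure, so the sketch assumes an order-preserving union of chapter titles for merging and Jaccard similarity for comparison; all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_pages(pages: Dict[str, List[str]]) -> List[str]:
    """Assumed merge rule: order-preserving union of all chapter titles."""
    merged, seen = [], set()
    for titles in pages.values():
        for title in titles:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity between two title lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def split_original_and_copies(pages: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """The page most similar to the merged result is the original; the rest are copies."""
    merged = merge_pages(pages)
    original = max(pages, key=lambda site: jaccard(pages[site], merged))
    copies = [site for site in pages if site != original]
    return original, copies


pages = {
    "siteB": ["c1", "c2", "c3"],
    "siteA": ["c1", "c2", "c3", "c4"],
    "siteC": ["c1", "c2"],
}
merged = merge_pages(pages)
original, copies = split_original_and_copies(pages)
```

With these inputs the page from `siteA` contains every merged chapter and is therefore selected as the first original, regardless of its position in the input.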
13. The device according to claim 12, further comprising a clustering module,
The clustering module being configured to, before the merging module merges multiple chapter list pages of the same subject name, detect and obtain chapter list pages and determine the subject name of each chapter list page, each chapter list page corresponding to one website; and
Cluster the chapter list pages having the same subject name; and
Establish an association between the subject name and the site information of the multiple sites where the chapter list pages are located.
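The clustering step of claim 13 amounts to grouping pages by subject name and recording which sites host each subject. A minimal sketch, assuming each detected page is already reduced to a (subject name, site) pair; the function name and the example site names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cluster_by_subject(pages: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (subject name, site) pairs so each subject maps to its hosting sites."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for subject, site in pages:
        clusters[subject].append(site)
    return dict(clusters)


detected = [
    ("Novel X", "site-a.example"),
    ("Novel Y", "site-a.example"),
    ("Novel X", "site-b.example"),
]
clusters = cluster_by_subject(detected)
```

The resulting mapping is the association between a subject name and the site information that the later merging step would consume.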
14. The device according to claim 12, further comprising a receiving module,
The receiving module being configured to receive the external request to obtain a chapter list page.
15. The device according to claim 12, wherein the feedback module further comprises:
An original query unit, configured to, in response to the external request to obtain a chapter list page, query the first original at a certain time interval according to the first usual time;
An original judging unit, configured to judge whether the chapter list page corresponding to the first original has been updated;
A copy scheduling unit, configured to, when the first original has been updated, query the first copy at the certain time interval according to the usual time difference;
A copy feedback unit, configured to obtain and feed back the site information corresponding to the updated first copy;
Wherein the usual time difference is the time difference by which the update of the first copy corresponding to the subject name lags behind the update of the first original.
16. The device according to claim 15, wherein the original judging unit is further configured to analyze the most recently created or modified chapter information in the chapter list page corresponding to the first original, so as to judge whether the first original has been updated.
17. The device according to claim 15, wherein the copy scheduling unit is further configured to, when the first original has not been updated, call the original query unit to execute the step of querying the first original at the certain time interval according to the first usual time.
18. The device according to claim 12, wherein the feedback module further comprises:
A copy query unit, configured to query the first copy at a certain time interval according to the second usual time;
A copy judging unit, configured to judge whether the chapter list page corresponding to the first copy has been updated;
An original scheduling unit, configured to, when the first copy has been updated, query the first original at the certain time interval according to the usual time difference, so as to judge whether the first original has been updated;
Wherein the usual time difference is the time difference by which, after all first copies have finished updating, the first original is updated again following a further delay.
19. The device according to claim 18, wherein the copy judging unit is further configured to analyze the most recently created or modified chapter information in all chapter list pages corresponding to the first copy, so as to judge whether the first copy has been updated.
20. The device according to claim 18, wherein the copy judging unit is further configured to, when the first copy has not been updated, call the copy query unit to execute the step of querying the first copy at the certain time interval according to the second usual time.
21. The device according to claim 12, further comprising a false judgment module and a filtering module,
The false judgment module being configured to, before the merging module merges multiple chapter list pages of the same subject name to obtain the merged result page, judge, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
The filtering module being configured to filter out the chapter list page when the chapter list page is judged to be a false chapter list page.
22. The device according to claim 21, wherein the false judgment module is further configured to obtain the text feature vector of each chapter list page;
Determine the average value of shared text feature vectors between a given chapter list page and the other chapter list pages;
When the average value is greater than or equal to a preset similarity threshold, determine that the chapter list page is a valid chapter list page;
When the average value is less than the preset similarity threshold, determine that the chapter list page is a false chapter list page.
CN201510796828.7A 2015-11-18 2015-11-18 The acquisition methods and device of the new chapters and sections of the network novel Active CN105447130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510796828.7A CN105447130B (en) 2015-11-18 2015-11-18 The acquisition methods and device of the new chapters and sections of the network novel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510796828.7A CN105447130B (en) 2015-11-18 2015-11-18 The acquisition methods and device of the new chapters and sections of the network novel

Publications (2)

Publication Number Publication Date
CN105447130A CN105447130A (en) 2016-03-30
CN105447130B true CN105447130B (en) 2018-12-25

Family

ID=55557307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510796828.7A Active CN105447130B (en) 2015-11-18 2015-11-18 The acquisition methods and device of the new chapters and sections of the network novel

Country Status (1)

Country Link
CN (1) CN105447130B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218744A (en) * 2012-07-20 2013-07-24 上海大智慧股份有限公司 Industry investment information and data processing system based on strength, weakness, opportunity, and threat (SWOT) model
CN104050273A (en) * 2014-06-24 2014-09-17 北京奇虎科技有限公司 Devices and methods for recording latest network file and modifying search result
CN104317903A (en) * 2014-10-24 2015-01-28 北京奇虎科技有限公司 Chapter type text chapter integrity identification method and device
CN104346443A (en) * 2014-10-20 2015-02-11 北京国双科技有限公司 Web text processing method and device

Also Published As

Publication number Publication date
CN105447130A (en) 2016-03-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220726

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: Room 112, Block D, No. 28 Xinjiekou Outer Street, Xicheng District, Beijing 100088 (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.