CN105447130B - Method and device for acquiring new chapters of a network novel
- Publication number
- CN105447130B CN201510796828.7A
- Authority
- CN
- China
- Prior art keywords
- chapter list
- list page
- original
- page
- usual time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention relates to the field of computer data mining, and in particular to a method and device for acquiring new chapters of a network novel. The method includes: merging multiple chapter list pages that share the same subject name to obtain a merged result page; judging the similarity between each chapter list page and the merged result page, determining the chapter list page with the greatest similarity to be the first original, and treating the other chapter list pages as the corresponding first copies; obtaining the first usual time at which the first original is updated, the second usual time at which a first copy is updated, and the usual time difference between the first usual time and the second usual time; and, in response to an external request to obtain a chapter list page, querying the first original and the first copies using the time-regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page. The invention saves network resources and can feed the updated chapter list page back to the user, improving the user experience.
Description
[Technical field]
The present invention relates to the field of computer data mining, and in particular to a method and device for acquiring new chapters of a network novel.
[Background]
In recent years, with the growth of network novels, a large number of websites dedicated to serializing them have appeared. To access and search the content of such a novel website, a user typically enters the website first and then runs an in-site search on a keyword, which retrieves the novel content on that website related to the keyword. This approach is mostly used by followers of particular novels or by network novel enthusiasts; more general users still commonly search through a search engine (such as Baidu or Google).
With existing search approaches, it is difficult to predict when the newest chapter of a given novel will be updated, so the search engine has to crawl chapter list pages continuously to obtain new chapters, which is inefficient. The search results also contain many reading websites carrying false novel content, so the user's search need is not fully met and the user experience is poor. Moreover, for reasons such as copyright, the new chapters on the original website of some network novels cannot be viewed directly, although the content of those chapters is available on copy websites. The existing single-site search approach cannot recommend to the user the copy websites where the chapters can be viewed directly, which again makes for a poor user experience.
[Summary of the invention]
The purpose of the present invention is to solve at least one of the above problems by providing a method and device for acquiring new chapters of a network novel.
To achieve this purpose, the present invention adopts the following technical solutions:
The present invention provides a method for acquiring new chapters of a network novel, including the steps of:
Merging multiple chapter list pages that share the same subject name to obtain a merged result page;
Judging the similarity between each chapter list page and the merged result page, determining the chapter list page with the greatest similarity to be the first original, and treating the other chapter list pages as the corresponding first copies;
Obtaining the first usual time at which the first original is updated, the second usual time at which a first copy is updated, and the usual time difference between the first usual time and the second usual time;
In response to an external request to obtain a chapter list page, querying the first original and the first copies using the time-regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page.
Further, before the step of merging multiple chapter list pages that share the same subject name to obtain a merged result page, the method also includes the steps of:
Detecting and obtaining chapter list pages, and determining the subject name of each chapter list page, where each chapter list page corresponds to one website;
Clustering the chapter list pages that have the same subject name;
Establishing an association between the subject name and the site information of the multiple websites where the chapter list pages are located.
Further, before the step of responding to the external request to obtain a chapter list page by querying the first original and the first copies using the time-regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page, the method also includes the step of:
Receiving the external request to obtain a chapter list page.
Specifically, the step of responding to the external request to obtain a chapter list page by querying the first original and the first copies using the time-regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page, also includes the steps of:
In response to the external request to obtain a chapter list page, querying the first original at a certain time interval according to the first usual time;
Judging whether the chapter list page corresponding to the first original has been updated;
When the first original has been updated, querying the first copies at a certain time interval according to the usual time difference;
Obtaining and feeding back the site information corresponding to the updated first copy.
Specifically, the step of judging whether the chapter list page corresponding to the first original has been updated also includes:
Analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original, to judge whether the first original has been updated.
Further, after the step of judging whether the chapter list page corresponding to the first original has been updated, the method also includes the step of:
When the first original has not been updated, executing again the step of querying the first original at a certain time interval according to the first usual time.
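The original-first query flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the update checks are passed in as callables, the polling is bounded by a hypothetical `max_polls` (the text itself simply repeats the query), and the names `poll_until` and `find_updated_copy` are invented for the example.

```python
import time


def poll_until(check, interval_s, max_polls, wait=time.sleep):
    """Call check() up to max_polls times, waiting interval_s between tries."""
    for attempt in range(max_polls):
        if check():
            return True
        if attempt < max_polls - 1:
            wait(interval_s)
    return False


def find_updated_copy(original_updated, copy_updated, copy_sites,
                      usual_diff_s, interval_s=60, max_polls=5,
                      wait=time.sleep):
    """Return the site of the first updated copy, or None.

    original_updated: callable returning True once the first original changed.
    copy_updated: callable taking a site and returning True if it changed.
    """
    # Step 1: query the first original at a fixed interval until it updates.
    if not poll_until(original_updated, interval_s, max_polls, wait):
        return None
    # Step 2: wait for the usual time difference, then query the copies.
    wait(usual_diff_s)
    for site in copy_sites:
        if poll_until(lambda: copy_updated(site), interval_s, max_polls, wait):
            return site
    return None
```

Passing `wait` as a parameter keeps the sketch testable without real delays; a caller would normally leave it as `time.sleep` and start the first poll around the first usual time.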
Specifically, the step of responding to the external request to obtain a chapter list page by querying the first original and the first copies using the time-regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page, also includes the steps of:
Querying the first copies at a certain time interval according to the second usual time;
Judging whether the chapter list page corresponding to a first copy has been updated;
When the first copy has been updated, querying the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated.
Specifically, the step of judging whether the chapter list page corresponding to the first copy has been updated also includes:
Analyzing the most recently created or modified chapter information in all the chapter list pages corresponding to the first copies, to judge whether a first copy has been updated.
Further, after the step of judging whether the chapter list page corresponding to the first copy has been updated, the method also includes the step of:
When the first copy has not been updated, executing again the step of querying the first copies at a certain time interval according to the second usual time.
Further, before the step of merging multiple chapter list pages that share the same subject name to obtain a merged result page, the method also includes the steps of:
Judging, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
When the chapter list page is judged to be a false chapter list page, filtering out that chapter list page.
Specifically, the step of judging, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page also includes the steps of:
Obtaining the text feature vector of each chapter list page;
Computing the average similarity between a given chapter list page and the other chapter list pages, based on the text feature vectors they have in common;
When the average is greater than or equal to a preset similarity threshold, determining that the chapter list page is a valid chapter list page;
When the average is less than the preset similarity threshold, determining that the chapter list page is a false chapter list page.
The present invention also provides a device for acquiring new chapters of a network novel, which includes:
A merging module, for merging multiple chapter list pages that share the same subject name to obtain a merged result page;
An original and copy determining module, for judging the similarity between each chapter list page and the merged result page, determining the chapter list page with the greatest similarity to be the first original, and treating the other chapter list pages as the corresponding first copies;
A time obtaining module, for obtaining the first usual time at which the first original is updated, the second usual time at which a first copy is updated, and the usual time difference between the first usual time and the second usual time;
A feedback module, for responding to an external request to obtain a chapter list page by querying the first original and the first copies using the time-regularity data characterized by the first usual time, the second usual time, and the usual time difference, so as to obtain and feed back the chapter list page.
Further, the acquisition device also includes a clustering module,
The clustering module is used, before the merging module merges multiple chapter list pages that share the same subject name, to detect and obtain chapter list pages and determine the subject name of each chapter list page, where each chapter list page corresponds to one website; and
To cluster the chapter list pages that have the same subject name; and
To establish an association between the subject name and the site information of the multiple websites where the chapter list pages are located.
Further, the acquisition device also includes a receiving module,
The receiving module is used to receive the external request to obtain a chapter list page.
Specifically, the feedback module also includes:
An original query unit, for querying the first original at a certain time interval according to the first usual time, in response to the external request to obtain a chapter list page;
An original judging unit, for judging whether the chapter list page corresponding to the first original has been updated;
A copy scheduling unit, for querying the first copies at a certain time interval according to the usual time difference when the first original has been updated;
A copy feedback unit, for obtaining and feeding back the site information corresponding to the updated first copy.
Specifically, the original judging unit is also used to analyze the most recently created or modified chapter information in the chapter list page corresponding to the first original, to judge whether the first original has been updated.
Specifically, the copy scheduling unit is also used, when the first original has not been updated, to call the original query unit to execute again the step of querying the first original at a certain time interval according to the first usual time.
Specifically, the feedback module also includes:
A copy query unit, for querying the first copies at a certain time interval according to the second usual time;
A copy judging unit, for judging whether the chapter list page corresponding to a first copy has been updated;
An original scheduling unit, for querying the first original at a certain time interval according to the usual time difference when the first copy has been updated, to judge whether the first original has been updated.
Specifically, the copy judging unit analyzes the most recently created or modified chapter information in all the chapter list pages corresponding to the first copies, to judge whether a first copy has been updated.
Specifically, the copy judging unit is also used, when the first copy has not been updated, to call the copy query unit to execute again the step of querying the first copies at a certain time interval according to the second usual time.
Specifically, the device also includes a false-page judging module and a filtering module,
The false-page judging module is used, before the merging module merges multiple chapter list pages that share the same subject name to obtain a merged result page, to judge, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
The filtering module is used to filter out the chapter list page when it is judged to be a false chapter list page.
Further, the false-page judging module is also used to obtain the text feature vector of each chapter list page;
To compute the average similarity between a given chapter list page and the other chapter list pages, based on the text feature vectors they have in common;
When the average is greater than or equal to a preset similarity threshold, to determine that the chapter list page is a valid chapter list page;
When the average is less than the preset similarity threshold, to determine that the chapter list page is a false chapter list page.
Compared with the prior art, the present invention has the following advantages:
1. In the method for acquiring new chapters of a network novel provided by the present invention, multiple chapter list pages with the same subject name are merged to obtain a merged result page, and, according to the similarity between each chapter list page and the merged result page, the chapter list page most similar to the merged result page is determined to be the first original, with the remaining chapter list pages being the corresponding first copies. Then, in response to an external request to obtain a chapter list page, the first original and the first copies are queried using the regularity data of their usual update times and usual time difference, so as to obtain and feed back the chapter list page. With this method, the chapter list page corresponding to the first original or a first copy can be queried periodically according to the usual update-time regularity data to obtain the updated chapter list page. There is no need to crawl the chapter list pages of every website continuously, which saves network resources, and the updated chapter list page can be fed back to the user, improving the user experience;
2. Further, before merging multiple chapter list pages with the same subject name, the present invention also determines whether each chapter list page is a false chapter list page, and filters out any chapter list page judged to be false. This reduces the likelihood that the result chapter list page fed back to the user contains false information, further improves the user experience, and ensures the effectiveness of the scheme;
3. Further, in the present invention, after detecting that the chapter list page corresponding to the first original has been updated, the first copies are queried at a certain time interval according to the usual time difference, so that the site information corresponding to the updated first copy is fed back to the user. The user thus receives the site information of a first copy corresponding to the first original. The corresponding new chapters can usually be viewed directly on that first-copy website, which solves the problem that users cannot directly view new chapters on the original websites of some novels and further improves the user experience.
Additional aspects and advantages of the present invention will be set forth in part in the description that follows; they will become apparent from that description or be learned through practice of the invention.
[Brief description of the drawings]
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of one embodiment of the method for acquiring new chapters of a network novel in the present invention;
Fig. 2 is a flow chart of one embodiment of the method for acquiring new chapters of a network novel in the present invention;
Fig. 3 is a flow chart of one embodiment of the method for acquiring new chapters of a network novel in the present invention;
Fig. 4 is a flow chart of one embodiment of the method for acquiring new chapters of a network novel in the present invention;
Fig. 5 is a flow chart of one embodiment of the method for acquiring new chapters of a network novel in the present invention;
Fig. 6 is a structural schematic diagram of one embodiment of the device for acquiring new chapters of a network novel in the present invention;
Fig. 7 is a structural schematic diagram of one embodiment of the device for acquiring new chapters of a network novel in the present invention;
Fig. 8 is a structural schematic diagram of one embodiment of the device for acquiring new chapters of a network novel in the present invention;
Fig. 9 is a structural schematic diagram of one embodiment of the feedback module in the present invention;
Fig. 10 is a structural schematic diagram of one embodiment of the feedback module in the present invention.
[Detailed description of the embodiments]
The present invention is further described below with reference to the accompanying drawings and exemplary embodiments. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it. In addition, detailed descriptions of known technology are omitted where they are unnecessary for showing the features of the invention.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present invention refers to the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.
It is necessary first to give the following introductory explanation of the application scenario of the present invention and its principle.
The internet generally comprises user terminals (user mobile terminals), the network, and servers (such as the web servers of websites). A user terminal may be the user's internet terminal, such as a desktop computer (PC), a laptop computer (Laptop), or a smart device with a web browsing function, such as a personal digital assistant (Personal Digital Assistant, PDA), a mobile internet device (Mobile Internet Device, MID), or a smartphone (Phone). In an internet environment, typically the Internet, these terminals can request that another process (such as a process provided by a server) provide a certain service. For example, in the present invention, a mobile phone on which an APP with a network novel search function is installed, such as an Android phone, serves as the user terminal. The APP provides a search input field in which the user can enter the subject of a network novel to search for the e-book, and a remote server can respond to the search request by feeding the search results back to the user.
A server is usually a remote computer system that can be accessed through a telecommunication medium such as the internet, typically the Internet. A server typically provides services to multiple clients on the internet. Providing a service includes receiving requests sent by user terminals, collecting user terminal information, feeding back information, and so on. In essence, the server plays the role of an information provider in a computer network. The server is usually located on the side that provides the service, or its service content is configured by the service provider; such a service provider may be, for example, the website of an Internet service company.
Specific embodiments of several technical solutions of the present invention, proposed to realize the above scenario on the above principle, are described in detail below. It should be noted that the method for acquiring new chapters of a network novel provided by the present invention is described from the perspective of the server. The method can be implemented by programming as a computer program that runs on remote network equipment, which includes but is not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers.
Referring to Fig. 1, an exemplary embodiment of the method for acquiring new chapters of a network novel of the present invention specifically includes the following steps:
S11, merging multiple chapter list pages that share the same subject name to obtain a merged result page.
After the multiple chapter list pages with the same subject name are obtained, a de-duplication and merging algorithm is used to merge them into a result page. It will be appreciated that the chapter list in the result page is more complete than that of any individual chapter list page and includes the newest chapter list items. It should be noted that, in the method for acquiring new chapters of a network novel of the present invention, a web crawler can be used to crawl the data of multiple websites, and automatic web page structure analysis can determine whether a given website is a novel website.
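The text does not name a specific de-duplication and merging algorithm. One simple possibility, shown here purely as an assumption, is to key every chapter by its chapter number and take the union across all pages. The function name `merge_chapter_lists` and the `(number, title)` tuple layout are hypothetical.

```python
def merge_chapter_lists(pages):
    """Merge several chapter lists into one de-duplicated, ordered list.

    pages: iterable of chapter lists, each a list of (number, title) tuples.
    When two pages disagree on a title, the first one seen is kept
    (an arbitrary choice made for this sketch).
    """
    merged = {}
    for chapters in pages:
        for number, title in chapters:
            merged.setdefault(number, title)
    return [(number, merged[number]) for number in sorted(merged)]


site_a = [(1, "Prologue"), (2, "The Road")]
site_b = [(2, "The Road"), (3, "The Gate")]
merged_page = merge_chapter_lists([site_a, site_b])
```

Here `merged_page` holds chapters 1 to 3, which is more complete than either input list, matching the property the paragraph above expects of the merged result page.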
In one embodiment of the present invention, referring to Fig. 2, the following steps are also included before step S11:
S101, detecting and obtaining chapter list pages, and determining the subject name of each chapter list page, where each chapter list page corresponds to one website;
S102, clustering the chapter list pages that have the same subject name;
S103, establishing an association between the subject name and the site information of the multiple websites where the chapter list pages are located.
Specifically, the server performs structural analysis on the web pages under the domain name of a novel website. If a web page contains multiple parallel chapter list tags, the web page can be determined to be a novel chapter list page. The link targets given by the href (Hypertext Reference) attributes of these parallel chapter list tags are highly similar to one another: the chapter list directory they point to is the same, but the specific file names differ. For example, suppose the directory contained in the href attributes of the parallel chapter list tags is 5_5288, while the file names contained in the href attributes differ, running from 970871 to 970980.
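The href-based structural check can be illustrated with the Python standard library. This is a rough sketch under stated assumptions: the thresholds `min_links` and `min_share`, the restriction to `<a>` tags, and the function name `looks_like_chapter_list` are all invented for the example and are not taken from the patent.

```python
from collections import Counter
from html.parser import HTMLParser
import posixpath
from urllib.parse import urlparse


class _HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def looks_like_chapter_list(html, min_links=3, min_share=0.8):
    """Return True when most links share one directory but differ in file name."""
    parser = _HrefCollector()
    parser.feed(html)
    paths = [urlparse(h).path for h in parser.hrefs]
    if len(paths) < min_links:
        return False
    # Find the directory that the largest number of links point into.
    dirs = Counter(posixpath.dirname(p) for p in paths)
    top_dir, count = dirs.most_common(1)[0]
    names = {posixpath.basename(p) for p in paths
             if posixpath.dirname(p) == top_dir}
    # Same directory for most links, and every file name in it is distinct.
    return count / len(paths) >= min_share and len(names) == count
```

A page whose links are `/5_5288/970871.html`, `/5_5288/970872.html`, and so on passes the check, while a navigation page whose links point into unrelated directories does not.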
Further, the multiple parallel chapter list tags contained in the novel chapter list page include chapter text feature vectors, which contain keywords and/or chapter numbers that characterize the chapters. The search engine can extract the subject name of the chapter list page based on these keywords and/or chapter numbers; for example, "title + author" can be used as the subject name of the chapter list page. The chapter list pages with the same subject name are then clustered into one set, the site information of the website where each chapter list page is located is obtained, and an association is established between the subject name and the multiple pieces of site information.
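Clustering by subject name and associating it with site information amounts to grouping pages under a "title + author" key. The following minimal sketch assumes each detected page has already been reduced to a dictionary with `title`, `author`, and `site` fields; that record layout and the function name `cluster_by_subject` are hypothetical.

```python
from collections import defaultdict


def cluster_by_subject(pages):
    """Group chapter list pages by subject name ("title+author").

    pages: iterable of dicts with 'title', 'author' and 'site' keys.
    Returns a dict mapping each subject name to the list of sites
    that host a chapter list page for it.
    """
    clusters = defaultdict(list)
    for page in pages:
        subject = f"{page['title'].strip()}+{page['author'].strip()}"
        clusters[subject].append(page["site"])
    return dict(clusters)


detected = [
    {"title": "Star Road", "author": "Li", "site": "a.example"},
    {"title": "Star Road ", "author": "Li", "site": "b.example"},
    {"title": "Iron Moon", "author": "Wu", "site": "a.example"},
]
subject_to_sites = cluster_by_subject(detected)
```

The resulting mapping is the association between a subject name and its sites: `"Star Road+Li"` maps to both example sites, and `"Iron Moon+Wu"` maps to one.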
Further, to reduce the likelihood that the result chapter list page fed back to the user contains false information, improve the user experience, and ensure the effectiveness of the scheme, in one embodiment of the present invention, referring to Fig. 3, the following steps are also included before step S11:
S01, judging, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page;
S02, filtering out the chapter list page when it is judged to be a false chapter list page.
Specifically, in one embodiment of the present invention, the text feature vector of each chapter list page is obtained, and the average similarity between a given chapter list page and the other chapter list pages is computed based on the text feature vectors they have in common. When the average is greater than or equal to a preset similarity threshold, the chapter list page is determined to be a valid chapter list page; when the average is less than the preset similarity threshold, the chapter list page is determined to be a false chapter list page.
It should be noted that the text feature vector may consist of multiple keywords in the chapter list titles, with the similarity between the keywords judged by a similarity-judgment algorithm. Alternatively, a numerical feature vector may be extracted from the page numbers corresponding to the titles of the multiple chapter list pages of the same subject, where the numerical feature vector consists of values characterizing the page numbers. In this embodiment, the similarity between any two chapter list pages can be calculated from the text feature vector and its corresponding numerical feature vector together, or from either feature vector alone. Once a chapter list page is judged to be a false chapter list page, it is filtered out directly.
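The false-page filter can be sketched with any pairwise similarity measure. The example below assumes Jaccard similarity over sets of title keywords and a threshold of 0.4; the patent specifies neither the measure nor a threshold value, and the names `jaccard` and `filter_false_pages` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def filter_false_pages(pages, threshold=0.4):
    """Keep pages whose average similarity to all other pages meets the threshold.

    pages: dict mapping a site to the list of keywords from its chapter list.
    Returns the list of sites judged valid, in input order.
    """
    kept = []
    for site, features in pages.items():
        others = [f for s, f in pages.items() if s != site]
        if not others:
            kept.append(site)  # nothing to compare against
            continue
        average = sum(jaccard(features, o) for o in others) / len(others)
        if average >= threshold:
            kept.append(site)
    return kept


candidates = {
    "a.example": ["ch1", "ch2", "ch3"],
    "b.example": ["ch1", "ch2", "ch3"],
    "c.example": ["ch1", "ch2"],
    "fake.example": ["ad1", "ad2"],
}
valid_sites = filter_false_pages(candidates)
```

In this data the page on `fake.example` shares no keywords with the others, so its average similarity is 0 and it is filtered out, while the slightly stale `c.example` (average about 0.44) is kept.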
Further, referring to Fig. 1, the method of the present invention also includes the step of:
S12, judging the similarity between each chapter list page and the merged result page, determining the chapter list page with the greatest similarity to be the first original, and treating the other chapter list pages as the corresponding first copies.
Specifically, the preceding step S11 yields a merged result page that is more complete and includes the newest chapter list items. In this step, the similarity between each chapter list page and the merged result page is compared, the chapter list page with the greatest similarity is determined to be the first original, and the other chapter list pages are the corresponding first copies. It will be appreciated that the first original is the page most likely to include the newest chapter list items, which indicates that it is the original chapter list page that is updated earliest; that chapter list page is therefore determined to be the first original.
It, can be by obtaining the character features of each Chapter List page specifically, in one embodiment of the invention
Vector;And calculate the sum of each Chapter List page and amalgamation result page with same text feature vector.When the sum numerical value
When maximum, determine that the Chapter List page is the first original, other Chapter List pages are the corresponding first authentic copy.
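The selection in step S12 can be sketched as follows, assuming feature vectors are represented as sets and the shared count is a set intersection. The name `pick_first_original` is hypothetical, and ties are resolved by dictionary order, which the source does not specify.

```python
def pick_first_original(pages: dict[str, set[str]], merged: set[str]) -> tuple[str, list[str]]:
    """Return (first original, first copies) by overlap with the merged result page."""
    # Number of features each page shares with the merged result page.
    scores = {site: len(features & merged) for site, features in pages.items()}
    original = max(scores, key=scores.get)
    copies = [site for site in pages if site != original]
    return original, copies
```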
Further, referring to Fig. 1, the method of the invention also includes the step:
S13: obtain the first usual time at which the first original is updated, the second usual time at which the first copies are updated, and the usual time difference between the first usual time and the second usual time.
Specifically, in the invention, the times of multiple updates of the first original are collected and analyzed to obtain its first usual time. Similarly, the times of multiple updates of each chapter list page corresponding to the first copies are collected and analyzed to obtain the second usual time of each such chapter list page. The delay between an update of the first original and the update of a given first copy of the same subject name is also calculated; this delay is the usual time difference corresponding to that first copy. In addition, the delay between completion of the update of all first copies and the next update of the first original is calculated; this delay is the usual time difference of the first original relative to all first copies. The server stores these time values in association with the site information of the corresponding first original and first copies; the first original and its corresponding first copies are, of course, stored in advance in association with the same subject name.
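One way to derive the usual times from recorded update timestamps is sketched below. The source does not say which statistic is used; taking the median is an assumption, as are the function names. The time-of-day median also ignores wraparound at midnight, so it only suits updates clustered within a single day.

```python
from datetime import datetime, timedelta
from statistics import median


def usual_time_of_day(updates: list[datetime]) -> int:
    """Typical update time, in minutes after midnight (median of past updates)."""
    return int(median(u.hour * 60 + u.minute for u in updates))


def usual_time_difference(original_updates: list[datetime],
                          copy_updates: list[datetime]) -> timedelta:
    """Typical delay between an original update and the matching copy update.

    The two lists are assumed to be aligned chapter by chapter.
    """
    delays = [(c - o).total_seconds() for o, c in zip(original_updates, copy_updates)]
    return timedelta(seconds=median(delays))
```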
Further, referring to Fig. 1, the method of the invention also includes the step:
S14: in response to an external request to obtain a chapter list page, query the first original and the first copies using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference, so as to obtain and feed back the chapter list page.
It can be appreciated that, in one embodiment of the invention, the method further includes, before step S14, the step of receiving the external request to obtain a chapter list page.
Specifically, in an exemplary embodiment of the invention, the user terminal is a mobile phone on which an APP with a network novel search function is installed. The APP provides a search input field in which the user can enter the subject name of a network novel to search for the newest chapter list of that novel and then, from the newest chapter list page obtained, open the newest chapter content page linked from that list page. It should be noted that this example is only illustrative and does not limit the invention.
Specifically, in one embodiment of the invention, referring to Fig. 4, step S14 further includes the following steps:
S141: in response to the external request to obtain a chapter list page, query the first original at a certain time interval according to the first usual time;
S142: judge whether the chapter list page corresponding to the first original has been updated;
S143: when the first original has been updated, query the first copies at a certain time interval according to the usual time difference;
S144: obtain and feed back the site information corresponding to the updated first copy.
Specifically, after the server receives from a client an external request to obtain the chapter list page of a certain subject name, it queries the first original at a certain time interval, according to the preset usual time at which the first original corresponding to that subject name is updated, and judges whether the chapter list page corresponding to the first original has been updated. When the first original has been updated, the server queries the first copies corresponding to the first original at a certain time interval according to the preset usual time difference. When a first copy is found to have been updated, the server obtains the site information corresponding to that first copy and feeds the site information back to the client. Conversely, when the first original has not been updated, the step of querying the first original at a certain time interval according to the first usual time is repeated until the first original is judged to have been updated.
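The flow of steps S141 to S144 can be sketched as a polling loop. Everything here is a simplified assumption: the update checks and the wait function are passed in as callables so the logic can be exercised without network access, and `max_polls` is a safety bound added for illustration (the source simply repeats until an update is seen).

```python
from typing import Callable, Optional


def find_updated_copy(original_updated: Callable[[], bool],
                      copies: dict[str, Callable[[], bool]],
                      wait: Callable[[float], None],
                      delay_to_usual_time: float,
                      usual_time_difference: float,
                      interval: float,
                      max_polls: int = 100) -> Optional[str]:
    """Return the site of the first updated copy, or None if nothing updates."""
    # S141: wait until the first usual time, then poll the first original.
    wait(delay_to_usual_time)
    for _ in range(max_polls):
        if original_updated():          # S142: has the original updated?
            break
        wait(interval)
    else:
        return None
    # S143: after the usual time difference, poll the first copies.
    wait(usual_time_difference)
    for _ in range(max_polls):
        for site, updated in copies.items():
            if updated():
                return site             # S144: feed back this site's information.
        wait(interval)
    return None
```

In a real service `wait` could be `time.sleep` and each callable would fetch and inspect the corresponding chapter list page.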
Specifically, in one embodiment of the invention, whether the first original has been updated is judged by analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original. For example, in an exemplary embodiment of the invention, the creation time or modification time of each parallel chapter list tag in the chapter list page, or of the chapter text content hyperlinked by that tag, is obtained periodically, and the time point of the latest creation or modification is obtained and recorded. The newly obtained time point is compared with the time point recorded last time: if the two time points differ, the chapter list page has been updated; if they are the same, the chapter list page has not been updated. It should be noted that the above way of judging whether the first original has been updated is only exemplary; those skilled in the art may implement it in other ways, and this embodiment does not limit the invention.
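The timestamp comparison described above can be sketched as a small stateful checker. The class name `UpdateDetector` is hypothetical, and treating the very first observation as "not updated" (because there is no earlier record to compare with) is an assumption.

```python
from datetime import datetime
from typing import Optional


class UpdateDetector:
    """Remembers the latest creation/modification time seen on a chapter list page."""

    def __init__(self) -> None:
        self.last_latest: Optional[datetime] = None

    def check(self, chapter_times: list[datetime]) -> bool:
        """Return True if the latest time point differs from the one recorded last time."""
        latest = max(chapter_times)
        changed = self.last_latest is not None and latest != self.last_latest
        self.last_latest = latest  # record the newly obtained time point
        return changed
```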
It can be appreciated that, through the foregoing embodiment, the site information of a first copy corresponding to the first original can be fed back to the user. Since the corresponding new chapters can usually be viewed directly on the first copy's website, this solves the problem that users cannot directly view new chapters on some original novel websites, and improves the user experience.
Further, referring to Fig. 5, in another embodiment of the invention, step S14 also includes the steps:
S145: query the first copies at a certain time interval according to the second usual time;
S146: judge whether the chapter list pages corresponding to the first copies have been updated;
S147: when the first copies have been updated, query the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated.
Specifically, in this embodiment, when all the first copies are detected to have been updated, the corresponding first original is checked, according to the preset usual time difference, to determine whether it has been updated again. When not all the first copies are detected to have been updated, the step of querying the first copies at a certain time interval according to the second usual time is repeated until all the first copies have been updated.
Further, in an exemplary embodiment of the invention, whether the first copies have been updated is judged by analyzing the most recently created or modified chapter information in all the chapter list pages corresponding to the first copies. For example, in an exemplary embodiment of the invention, the creation time or modification time of each parallel chapter list tag in a chapter list page, or of the chapter text content hyperlinked by that tag, is obtained periodically, and the time point of the latest creation or modification is obtained and recorded. The newly obtained time point is compared with the time point recorded last time: if the two time points differ, the chapter list page has been updated; if they are the same, the chapter list page has not been updated. It should be noted that the above way of judging whether the first copies have been updated is only exemplary; those skilled in the art may implement it in other ways, and this embodiment does not limit the invention.
In summary, the method for acquiring new chapters of a network novel provided by the invention merges multiple chapter list pages of the same subject name to obtain a merged result page and, according to the similarity between each chapter list page and the merged result page, determines the chapter list page most similar to the merged result page to be the first original and the remaining chapter list pages to be the corresponding first copies. Then, in response to an external request to obtain a chapter list page, it queries the first original and the first copies using the regularity data of the usual times at which the first original and the first copies are updated and of the usual time difference, so as to obtain and feed back the chapter list page. The method can query the chapter list page corresponding to the first original or a first copy at scheduled times according to the usual update time regularity data and obtain the updated chapter list page, without constantly crawling the chapter list pages of every website. This saves network resources, and the updated chapter list page can be fed back to the user, improving the user experience.
Further, following the functional modularization approach of computer software, the invention also provides a device for acquiring new chapters of a network novel; see Fig. 6. The device includes a merging module 11, an original and copy determining module 12, a time obtaining module 13 and a feedback module 14. These modules form the basic framework of the whole device and realize a modular embodiment. The specific function of each module is disclosed in detail below.
The merging module 11 is configured to merge multiple chapter list pages of the same subject name to obtain a merged result page.
After obtaining multiple chapter list pages of the same subject name, the merging module 11 merges them into a result page using a deduplication and merging algorithm. It can be appreciated that the chapter list in the result page is more complete than that of any individual chapter list page and includes the newest chapter list items. It should be noted that, in the device for acquiring new chapters of a network novel of the invention, the data of multiple websites may be crawled by a web spider, and whether a website is a novel website may be determined by automatic web page structure analysis.
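The source does not name the deduplication and merging algorithm. One simple possibility, assuming chapter titles that differ only in letter case or whitespace denote the same chapter, is an order-preserving union; the function name and the normalization rule are both assumptions.

```python
def merge_chapter_lists(lists: list[list[str]]) -> list[str]:
    """Merge several chapter lists into one, dropping duplicate chapter titles."""
    merged: list[str] = []
    seen: set[str] = set()
    for chapters in lists:
        for title in chapters:
            # Assumed normalization: collapse whitespace and ignore case.
            key = " ".join(title.split()).lower()
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged
```

This keeps the first spelling encountered for each chapter and appends chapters found only on later sites; a production merge would also need to reconcile chapter order across sites.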
In one embodiment of the invention, referring to Fig. 7, the acquisition device also includes a cluster module 10.
The cluster module 10 is configured, before the merging module 11 merges the multiple chapter list pages of the same subject name, to:
detect and obtain chapter list pages and determine the subject name of each chapter list page, each chapter list page corresponding to one website;
cluster the chapter list pages that have the same subject name; and
establish the association between the subject name and the site information of the multiple websites where the chapter list pages are located.
Specifically, the cluster module 10 performs structural analysis on the web pages under a novel website's domain name; if a web page contains multiple parallel chapter list tags, the web page can be determined to be a novel chapter list page. The link targets given by the href (Hypertext Reference) attributes of these parallel chapter list tags are highly similar: they share the same directory but have different file names. For example, suppose the directory contained in the href attributes of the parallel chapter list tags is 5_5288, while the file names contained in the href attributes differ, ranging from 970871 to 970980.
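The structural check can be sketched with the standard library HTML parser: collect the href values of the anchor tags, group them by directory, and treat the page as a chapter list page when enough links share one directory. The function name, the `min_links` threshold and the `.html` file extension in the usage example are assumptions for illustration.

```python
import posixpath
from html.parser import HTMLParser


class _HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def looks_like_chapter_list(html: str, min_links: int = 3) -> bool:
    """True if at least `min_links` links share a directory but differ in file name."""
    collector = _HrefCollector()
    collector.feed(html)
    by_directory: dict[str, set[str]] = {}
    for href in collector.hrefs:
        directory, filename = posixpath.split(href)
        by_directory.setdefault(directory, set()).add(filename)
    return any(len(names) >= min_links for names in by_directory.values())
```

For instance, a page whose links are `/5_5288/970871.html`, `/5_5288/970872.html` and `/5_5288/970873.html` would pass the check with the default threshold.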
Further, the multiple parallel chapter list tags contained in the novel chapter list page include chapter text feature vectors, which contain keywords and/or chapter numbers characterizing the chapters. The cluster module 10 may extract the subject name of the chapter list page based on these keywords and/or chapter numbers; for example, "title + author" may be used as the subject name of the chapter list page. The cluster module 10 then clusters the chapter list pages with the same subject name into one set, obtains the site information of the website where each chapter list page is located, and establishes the association between the subject name and the multiple pieces of site information.
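A minimal sketch of the clustering step, assuming the title, author and site of each chapter list page have already been extracted into a dictionary. The field names and the `title+author` key format are illustrative assumptions based on the example in the text.

```python
from collections import defaultdict


def cluster_by_subject(pages: list[dict[str, str]]) -> dict[str, list[str]]:
    """Group site information by subject name ("title+author")."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for page in pages:
        subject = f"{page['title'].strip()}+{page['author'].strip()}"
        clusters[subject].append(page["site"])
    return dict(clusters)
```

The resulting mapping is the association between each subject name and the sites that host a chapter list page for it.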
Further, to reduce the possibility that the chapter list page fed back to the user contains false information, to improve the user experience and to ensure that the scheme is effective, in one embodiment of the invention, referring to Fig. 8, the acquisition device also includes a false judgment module 01 and a filtering module 02.
The false judgment module 01 is configured to judge, according to the similarity between a given chapter list page and the other chapter list pages, whether that chapter list page is a false chapter list page.
The filtering module 02 is configured to filter out a chapter list page when it is judged to be a false chapter list page.
Specifically, in one embodiment of the invention, the false judgment module 01 obtains the text feature vector of each chapter list page and computes the average number of text feature vectors that a given chapter list page shares with the other chapter list pages. When the average is greater than or equal to a preset similarity threshold, the false judgment module 01 determines that the chapter list page is a valid chapter list page; when the average is less than the preset similarity threshold, the false judgment module 01 determines that the chapter list page is a false chapter list page.
It should be noted that the text feature vector may consist of multiple keywords in the chapter list titles, with the similarity between those keywords judged by a suitable similarity algorithm. Alternatively, a numerical feature vector may be extracted from the page numbers corresponding to the titles of multiple chapter list pages of the same subject, where the numerical feature vector may be the numerical values characterizing those page numbers. In this embodiment, the similarity between any two chapter list pages may be calculated from the text feature vector together with its corresponding numerical feature vector, or from either feature vector alone. After the false judgment module 01 judges a chapter list page to be a false chapter list page, the filtering module 02 filters out that chapter list page directly.
Further, referring to Fig. 6, the original and copy determining module 12 is configured to judge the similarity between each chapter list page and the merged result page and to determine the chapter list page with the greatest similarity to be the first original; the other chapter list pages are the corresponding first copies.
Specifically, the merging module 11 yields a merged result page that is more complete and includes the newest chapter list items. The original and copy determining module 12 compares the similarity between each chapter list page and the merged result page, determines the chapter list page with the greatest similarity to be the first original, and treats the other chapter list pages as the corresponding first copies. It can be appreciated that the first original most probably contains the newest chapter list items, which indicates that it is the original chapter list page that is updated earliest; that chapter list page is therefore determined to be the first original.
Specifically, in one embodiment of the invention, the original and copy determining module 12 may obtain the text feature vector of each chapter list page and calculate the total number of text feature vectors that each chapter list page shares with the merged result page. The chapter list page with the largest total is determined to be the first original, and the other chapter list pages are the corresponding first copies.
Further, referring to Fig. 6, the time obtaining module 13 is configured to obtain the first usual time at which the first original is updated, the second usual time at which the first copies are updated, and the usual time difference between the first usual time and the second usual time.
Specifically, in the invention, the time obtaining module 13 collects and analyzes the times of multiple updates of the first original to obtain its first usual time. Similarly, the time obtaining module 13 collects and analyzes the times of multiple updates of each chapter list page corresponding to the first copies to obtain the second usual time of each such chapter list page. The time obtaining module 13 also calculates the delay between an update of the first original and the update of a given first copy of the same subject name; this delay is the usual time difference corresponding to that first copy. In addition, the time obtaining module 13 calculates the delay between completion of the update of all first copies and the next update of the first original; this delay is the usual time difference of the first original relative to all first copies. The time obtaining module 13 stores these time values in association with the site information of the corresponding first original and first copies; the first original and its corresponding first copies are, of course, stored in advance in association with the same subject name.
Further, referring to Fig. 6, the feedback module 14 is configured, in response to an external request to obtain a chapter list page, to query the first original and the first copies using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference, so as to obtain and feed back the chapter list page.
It can be appreciated that, in one embodiment of the invention, the acquisition device also includes a receiving module configured to receive the external request to obtain a chapter list page.
Specifically, in an exemplary embodiment of the invention, the user terminal is a mobile phone on which an APP with a network novel search function is installed. The APP provides a search input field in which the user can enter the subject name of a network novel to search for the newest chapter list of that novel, whereupon the receiving module receives the external request. The user terminal then, from the newest chapter list page obtained, opens the newest chapter content page linked from that list page. It should be noted that this example is only illustrative and does not limit the invention.
Specifically, in one embodiment of the invention, referring to Fig. 9, the feedback module 14 also includes an original query unit 141, an original judging unit 142, a copy scheduling unit 143 and a copy feedback unit 144.
The original query unit 141 is configured, in response to the external request to obtain a chapter list page, to query the first original at a certain time interval according to the first usual time.
The original judging unit 142 is configured to judge whether the chapter list page corresponding to the first original has been updated.
The copy scheduling unit 143 is configured, when the first original has been updated, to query the first copies at a certain time interval according to the usual time difference.
The copy feedback unit 144 is configured to obtain and feed back the site information corresponding to the updated first copy.
Specifically, after the receiving module receives from a client an external request to obtain the chapter list page of a certain subject name, the original query unit 141 queries the first original at a certain time interval, according to the preset usual time at which the first original corresponding to that subject name is updated, and the original judging unit 142 judges whether the chapter list page corresponding to the first original has been updated. When the first original has been updated, the copy scheduling unit 143 queries the first copies corresponding to the first original at a certain time interval according to the preset usual time difference. When a first copy is found to have been updated, the copy feedback unit 144 obtains the site information corresponding to that first copy and feeds the site information back to the client. Conversely, when the first original has not been updated, the copy scheduling unit 143 causes the step of querying the first original at a certain time interval according to the first usual time to be repeated until the first original is judged to have been updated.
Specifically, in one embodiment of the invention, the original judging unit 142 judges whether the first original has been updated by analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original. For example, in an exemplary embodiment of the invention, the original judging unit 142 periodically obtains the creation time or modification time of each parallel chapter list tag in the chapter list page, or of the chapter text content hyperlinked by that tag, and obtains and records the time point of the latest creation or modification. The newly obtained time point is compared with the time point recorded last time: if the two time points differ, the chapter list page has been updated; if they are the same, the chapter list page has not been updated. It should be noted that the above way in which the original judging unit 142 judges whether the first original has been updated is only exemplary; those skilled in the art may implement it in other ways, and this embodiment does not limit the invention.
It can be appreciated that, through the foregoing embodiment, the site information of a first copy corresponding to the first original can be fed back to the user. Since the corresponding new chapters can usually be viewed directly on the first copy's website, this solves the problem that users cannot directly view new chapters on some original novel websites, and improves the user experience.
Further, referring to Fig. 10, in another embodiment of the invention, the feedback module 14 also includes a copy query unit 145, a copy judging unit 146 and an original scheduling unit 147.
The copy query unit 145 is configured to query the first copies at a certain time interval according to the second usual time.
The copy judging unit 146 is configured to judge whether the chapter list pages corresponding to the first copies have been updated.
The original scheduling unit 147 is configured, when the first copies have been updated, to query the first original at a certain time interval according to the usual time difference, to judge whether the first original has been updated.
Specifically, in this embodiment, when the copy judging unit 146 detects that all the first copies have been updated, the original scheduling unit 147 checks, according to the preset usual time difference, whether the corresponding first original has been updated again. When the copy judging unit 146 detects that not all the first copies have been updated, the copy query unit 145 is called to repeat the step of querying the first copies at a certain time interval according to the second usual time, until all the first copies have been updated.
Further, in an exemplary embodiment of the invention, the copy judging unit 146 judges whether the first copies have been updated by analyzing the most recently created or modified chapter information in all the chapter list pages corresponding to the first copies. For example, in an exemplary embodiment of the invention, the copy judging unit 146 periodically obtains the creation time or modification time of each parallel chapter list tag in a chapter list page, or of the chapter text content hyperlinked by that tag, and obtains and records the time point of the latest creation or modification. The newly obtained time point is compared with the time point recorded last time: if the two time points differ, the chapter list page has been updated; if they are the same, the chapter list page has not been updated. It should be noted that the above way in which the copy judging unit 146 judges whether the first copies have been updated is only exemplary; those skilled in the art may implement it in other ways, and this embodiment does not limit the invention.
In summary, in the device for acquiring new chapters of a network novel provided by the invention, the merging module 11 merges multiple chapter list pages of the same subject name to obtain a merged result page, and the original and copy determining module 12, according to the similarity between each chapter list page and the merged result page, determines the chapter list page most similar to the merged result page to be the first original and the remaining chapter list pages to be the corresponding first copies. The feedback module 14 then, in response to an external request to obtain a chapter list page, queries the first original and the first copies using the regularity data, obtained by the time obtaining module 13, of the usual times at which the first original and the first copies are updated and of the usual time difference, so as to obtain and feed back the chapter list page. The device can query the chapter list page corresponding to the first original or a first copy at scheduled times according to the usual update time regularity data and obtain the updated chapter list page, without constantly crawling the chapter list pages of every website. This saves network resources, and the updated chapter list page can be fed back to the user, improving the user experience.
Numerous specific details are set forth in the description provided here. It should be understood, however, that embodiments of the invention can be practiced without these specific details. In some embodiments, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Although some exemplary embodiments of the invention have been shown above, those skilled in the art will understand that changes may be made to these exemplary embodiments without departing from the principle or spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (22)
1. A method for acquiring new chapters of a network novel, characterized by including the steps of:
merging multiple chapter list pages of the same subject name to obtain a merged result page;
judging the similarity between each chapter list page and the merged result page, and determining the chapter list page with the greatest similarity to be the first original, the other chapter list pages being the corresponding first copies;
obtaining the first usual time at which the first original is updated, the second usual time at which the first copies are updated, and the usual time difference between the first usual time and the second usual time;
in response to an external request to obtain a chapter list page, querying the first original and the first copies using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference, so as to obtain and feed back the chapter list page.
2. The method according to claim 1, characterized in that, before the step of merging multiple chapter list pages of the same subject name to obtain a merged result page, the method further includes the steps of:
detecting and obtaining chapter list pages, and determining the subject name of each chapter list page, each chapter list page corresponding to one website;
clustering the chapter list pages that have the same subject name;
establishing the association between the subject name and the site information of the multiple websites where the chapter list pages are located.
3. The method according to claim 1, characterized in that, before the step of querying, in response to an external request to obtain a chapter list page, the first original and the first copies using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference, so as to obtain and feed back the chapter list page, the method further includes the step of:
receiving the external request to obtain a chapter list page.
4. The method according to claim 1, characterized in that the step of querying, in response to an external request to obtain a chapter list page, the first original and the first copies using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference, so as to obtain and feed back the chapter list page, further includes the steps of:
in response to the external request to obtain a chapter list page, querying the first original at a certain time interval according to the first usual time;
judging whether the chapter list page corresponding to the first original has been updated;
when the first original has been updated, querying the first copies at a certain time interval according to the usual time difference;
obtaining and feeding back the site information corresponding to the updated first copy;
wherein the usual time difference is the delay between an update of the first original and the subsequent update of a first copy corresponding to the subject name.
5. The method according to claim 4, wherein the step of judging whether the chapter list page corresponding to the first original has been updated further comprises:
Analyzing the most recently created or modified chapter information in the chapter list page corresponding to the first original, so as to judge whether the first original has been updated.
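One way to read this update check is as a comparison of the newest chapter entry between two snapshots of the same chapter list page. The sketch below assumes, purely for illustration, that each chapter is an `(index, title, modified_at)` tuple; the patent does not prescribe this representation, and `latest_chapter` and `has_updated` are hypothetical names.

```python
def latest_chapter(chapters):
    """Return the most recently created or modified chapter entry, or None if there is none."""
    if not chapters:
        return None
    # Sort key: modification time first, chapter index as a tie-breaker.
    return max(chapters, key=lambda c: (c[2], c[0]))


def has_updated(previous, current):
    """Report an update when the newest entry differs between the two snapshots."""
    return latest_chapter(current) != latest_chapter(previous)
```

Comparing only the newest entry keeps the check cheap, at the cost of missing edits to older chapters that leave the newest entry untouched.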
6. The method according to claim 4, wherein, after the step of judging whether the chapter list page corresponding to the first original has been updated, the method further comprises:
When the first original has not been updated, executing again the step of querying the first original at a certain time interval according to the first usual time.
7. The method according to claim 1, wherein the step of, in response to an external request for obtaining a chapter list page, querying the first original and the first copy using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference, so as to obtain and feed back the chapter list page, further comprises:
Querying the first copy at a certain time interval according to the second usual time;
Judging whether the chapter list page corresponding to the first copy has been updated;
When the first copy has been updated, querying the first original at the certain time interval according to the usual time difference, so as to judge whether the first original has been updated;
Wherein the usual time difference is the delay, after all the first copies have finished updating, with which the first original is updated again.
8. The method according to claim 7, wherein the step of judging whether the chapter list page corresponding to the first copy has been updated further comprises:
Analyzing the most recently created or modified chapter information in all chapter list pages corresponding to the first copy, so as to judge whether the first copy has been updated.
9. The method according to claim 7, wherein, after the step of judging whether the chapter list page corresponding to the first copy has been updated, the method further comprises:
When the first copy has not been updated, executing again the step of querying the first copy at a certain time interval according to the second usual time.
10. The method according to claim 1, wherein, before the step of merging multiple chapter list pages of the same subject name to obtain a merged result page, the method further comprises:
Judging, according to the similarity between a given chapter list page and the other chapter list pages, whether the chapter list page is a false chapter list page;
When the chapter list page is judged to be a false chapter list page, filtering out the chapter list page.
11. The method according to claim 10, wherein the step of judging, according to the similarity between a given chapter list page and the other chapter list pages, whether the chapter list page is a false chapter list page further comprises:
Obtaining the text feature vector of each chapter list page;
Determining the average similarity, in terms of shared text feature vectors, between a given chapter list page and the other chapter list pages;
When the average is greater than or equal to a preset similarity threshold, determining that the chapter list page is a valid chapter list page;
When the average is less than the preset similarity threshold, determining that the chapter list page is a false chapter list page.
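The threshold test in this claim can be sketched in Python. The claim does not fix a similarity measure in this passage, so the example swaps in Jaccard similarity over sets of chapter titles as a stand-in for the text feature vectors; that choice, the function names and the sample threshold are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def filter_false_pages(pages, threshold):
    """Keep pages whose average similarity to all other pages reaches the threshold.

    pages maps a page identifier to its set of text features (here: chapter titles).
    """
    kept = {}
    for name, features in pages.items():
        others = [f for other, f in pages.items() if other != name]
        if not others:
            kept[name] = features  # a single page cannot be compared to anything
            continue
        average = sum(jaccard(features, o) for o in others) / len(others)
        if average >= threshold:
            kept[name] = features  # valid chapter list page
    return kept
```

A page that shares no chapters with any other page for the same subject name averages 0 and is dropped, while pages that are merely one or two chapters behind stay above a moderate threshold.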
12. A device for acquiring new chapters of a network novel, comprising:
A merging module, configured to merge multiple chapter list pages of the same subject name to obtain a merged result page;
An original and copy determining module, configured to judge the similarity between each chapter list page and the merged result page, and to determine that the chapter list page with the greatest similarity is the first original and that the other chapter list pages are the corresponding first copies;
A time obtaining module, configured to obtain the first usual time at which the first original is updated, the second usual time at which the first copy is updated, and the usual time difference between the first usual time and the second usual time;
A feedback module, configured to, in response to an external request for obtaining a chapter list page, query the first original and the first copy using the temporal regularity data characterized by the first usual time, the second usual time and the usual time difference, so as to obtain and feed back the chapter list page.
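The first three modules can be illustrated with a short Python sketch. Everything concrete in it is an assumption made for illustration: chapter list pages are modeled as lists of chapter titles keyed by site, similarity is the fraction of merged chapters a page contains, and the usual time is estimated as the median of past update times. The patent text in this section does not specify these choices.

```python
from statistics import median


def merge_pages(pages):
    """Merge chapter lists into one ordered list, keeping the first occurrence of each title."""
    merged, seen = [], set()
    for titles in pages.values():
        for title in titles:
            if title not in seen:
                seen.add(title)
                merged.append(title)
    return merged


def split_original_and_copies(pages):
    """Pick the page most similar to the merged result as the original; the rest are copies."""
    merged = set(merge_pages(pages))

    def similarity(titles):
        return len(set(titles) & merged) / len(merged) if merged else 0.0

    original = max(pages, key=lambda site: similarity(pages[site]))
    copies = [site for site in pages if site != original]
    return original, copies


def usual_time(past_update_minutes):
    """Estimate a usual update time as the median of observed update times (an assumption)."""
    return median(past_update_minutes)
```

With these pieces, the usual time difference is simply `usual_time(copy_history) - usual_time(original_history)`, which the feedback module can use to decide when to query each site.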
13. The device according to claim 12, further comprising a clustering module,
The clustering module being configured to, before the merging module merges the multiple chapter list pages of the same subject name, detect and obtain chapter list pages and determine the subject name of each chapter list page, each chapter list page corresponding to one website; and
Cluster the chapter list pages that have the same subject name; and
Establish an association between the subject name and the site information of the multiple websites where the chapter list pages are located.
14. The device according to claim 12, further comprising a receiving module,
The receiving module being configured to receive the external request for obtaining a chapter list page.
15. The device according to claim 12, wherein the feedback module further comprises:
An original query unit, configured to, in response to the external request for obtaining a chapter list page, query the first original at a certain time interval according to the first usual time;
An original judging unit, configured to judge whether the chapter list page corresponding to the first original has been updated;
A copy scheduling unit, configured to, when the first original has been updated, query the first copy at the certain time interval according to the usual time difference;
A copy feedback unit, configured to obtain and feed back the site information corresponding to the updated first copy;
Wherein the usual time difference is the delay, after the first original has been updated, with which the first copy corresponding to the subject name is updated.
16. The device according to claim 15, wherein the original judging unit is further configured to analyze the most recently created or modified chapter information in the chapter list page corresponding to the first original, so as to judge whether the first original has been updated.
17. The device according to claim 15, wherein the copy scheduling unit is further configured to, when the first original has not been updated, call the original query unit to execute again the step of querying the first original at a certain time interval according to the first usual time.
18. The device according to claim 12, wherein the feedback module further comprises:
A copy query unit, configured to query the first copy at a certain time interval according to the second usual time;
A copy judging unit, configured to judge whether the chapter list page corresponding to the first copy has been updated;
An original scheduling unit, configured to, when the first copy has been updated, query the first original at the certain time interval according to the usual time difference, so as to judge whether the first original has been updated;
Wherein the usual time difference is the delay, after all the first copies have finished updating, with which the first original is updated again.
19. The device according to claim 18, wherein the copy judging unit analyzes the most recently created or modified chapter information in all chapter list pages corresponding to the first copy, so as to judge whether the first copy has been updated.
20. The device according to claim 18, wherein the copy judging unit is further configured to, when the first copy has not been updated, call the copy query unit to execute again the step of querying the first copy at a certain time interval according to the second usual time.
21. The device according to claim 12, further comprising a false judgment module and a filtering module,
The false judgment module being configured to, before the merging module merges the multiple chapter list pages of the same subject name to obtain the merged result page, judge, according to the similarity between a given chapter list page and the other chapter list pages, whether the chapter list page is a false chapter list page;
The filtering module being configured to filter out the chapter list page when the chapter list page is judged to be a false chapter list page.
22. The device according to claim 21, wherein the false judgment module is further configured to obtain the text feature vector of each chapter list page;
Determine the average similarity, in terms of shared text feature vectors, between a given chapter list page and the other chapter list pages;
When the average is greater than or equal to a preset similarity threshold, determine that the chapter list page is a valid chapter list page;
When the average is less than the preset similarity threshold, determine that the chapter list page is a false chapter list page.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510796828.7A CN105447130B (en) | 2015-11-18 | 2015-11-18 | The acquisition methods and device of the new chapters and sections of the network novel |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510796828.7A CN105447130B (en) | 2015-11-18 | 2015-11-18 | The acquisition methods and device of the new chapters and sections of the network novel |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105447130A (en) | 2016-03-30 |
CN105447130B (en) | 2018-12-25 |
Family
ID=55557307
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510796828.7A Active CN105447130B (en) | 2015-11-18 | 2015-11-18 | The acquisition methods and device of the new chapters and sections of the network novel |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105447130B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218744A (en) * | 2012-07-20 | 2013-07-24 | 上海大智慧股份有限公司 | Industry investment information and data processing system based on strength, weakness, opportunity, and threat (SWOT) model |
CN104050273A (en) * | 2014-06-24 | 2014-09-17 | 北京奇虎科技有限公司 | Devices and methods for recording latest network file and modifying search result |
CN104317903A (en) * | 2014-10-24 | 2015-01-28 | 北京奇虎科技有限公司 | Chapter type text chapter integrity identification method and device |
CN104346443A (en) * | 2014-10-20 | 2015-02-11 | 北京国双科技有限公司 | Web text processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN105447130A (en) | 2016-03-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10885039B2 (en) | Machine learning based search improvement | |
US9251157B2 (en) | Enterprise node rank engine | |
US9300755B2 (en) | System and method for determining information reliability | |
US20150186524A1 (en) | Deep application crawling | |
CN102722498B (en) | Search engine and implementation method thereof | |
CN103870461B (en) | Subject recommending method, device and server | |
US20160259856A1 (en) | Consolidating and formatting search results | |
JP5084858B2 (en) | Summary creation device, summary creation method and program | |
US8180751B2 (en) | Using an encyclopedia to build user profiles | |
CN102722499B (en) | Search engine and implementation method thereof | |
CN102737021B (en) | Search engine and realization method thereof | |
US10579710B2 (en) | Bidirectional hyperlink synchronization for managing hypertexts in social media and public data repository | |
CN102722501A (en) | Search engine and realization method thereof | |
Achsan et al. | A fast distributed focused-web crawling | |
US20110208715A1 (en) | Automatically mining intents of a group of queries | |
CN112231598A (en) | Webpage path navigation method and device, electronic equipment and storage medium | |
CN105721519B (en) | A kind of webpage data acquiring method, apparatus and system | |
US20120246134A1 (en) | Detection and analysis of backlink activity | |
JP2010128917A (en) | Method, device and program for extracting information propagation network | |
US10127319B2 (en) | Distributed failover for unavailable content | |
CN105447130B (en) | The acquisition methods and device of the new chapters and sections of the network novel | |
KR20200119534A (en) | Ontology-based multilingual url filtering apparatus | |
CN102306181A (en) | Method and system for providing network resources | |
Zhang et al. | Detecting bad information in mobile wireless networks based on the wireless application protocol | |
CN101340463A (en) | Method and apparatus for determining network resource type |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right | Effective date of registration: 2022-07-26; Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015; Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park); Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Patentee before: Qizhi software (Beijing) Co.,Ltd. |