CN104699848B - Data extraction method and device for restricted Web databases - Google Patents
- Publication number
- CN104699848B (CN201510154092.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The present invention relates to the field of computer technology and provides a data extraction method and device for restricted Web databases. The method includes: an extraction device obtains an attribute value from the query interface of a Web database; generates a query request and sends it to the restricted Web database; parses the Web page returned for the query and extracts the query data; updates the data in a local database according to the query data; analyzes the local database with the EdaliwdbFCA algorithm to produce the next group of query attribute values; and terminates data extraction when the number of query data records equals a preset threshold. The extraction device includes a query attribute value obtaining unit, a query unit, a parsing unit, a data updating unit, a query attribute value generating unit and a query ending unit. By applying formal concept analysis to data extraction from restricted Web databases that expose an attribute-value query interface, the invention extracts higher-quality data from the restricted Web database, with good stability and high efficiency.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a data extraction method and device for restricted Web databases.
Background art
Whether for technical reasons or because of application requirements, if the results returned by a query against a Web database are limited to a certain range, that is, if a query with an attribute set lets a program automatically obtain only k objects, then a Web database with this characteristic is a restricted Web database. Web pages are divided into the surface Web and the deep Web. The surface Web consists of static Web pages connected by hyperlinks. According to statistics, deep Web resources are about 500 times the size of static page resources and also offer better data quality, and the most important resources in the deep Web are Web databases. How to extract data from a restricted Web database, and how to extract higher-quality data, has long been a widely studied problem.
Summary of the invention
In view of this, the object of the present invention is to provide a data extraction method and device for restricted Web databases that can extract higher-quality data from a restricted Web database.
The present invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a data extraction method for a restricted Web database, applied to a data extraction device for the restricted Web database, the extraction device including a local database, the method including:
The extraction device obtains an attribute value from the query interface of the Web database;
The extraction device generates a query request according to the attribute value and sends the query request to the restricted Web database;
The extraction device parses the Web page returned for the query and extracts the query data contained in the Web page;
The extraction device updates the data in the local database according to the query data;
The extraction device analyzes the local database with the maximal-sub-concept-based restricted Web database extraction algorithm (Extract data from Limited Web Database based on Formal Concept Analysis, EdaliwdbFCA) and produces the next group of query attribute values, so as to query the restricted Web database again;
When the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query, the extraction device terminates data extraction.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which, before the extraction device parses the Web page returned for the query, the method further includes:
Judging whether the Web page returned for the query has been received within a preset time;
If no Web page is returned for the query within the preset time, the extraction device sends the query request to the restricted Web database again.
The extraction method faces a complex and changing Internet, where any unexpected event may cause a query to fail during extraction. Every query therefore needs to be managed and tracked, so that failed queries can be detected and re-issued. This gives the extraction method better robustness and keeps the extraction work running smoothly.
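The timeout-and-resend behaviour described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the callable `send`, the attempt cap `max_attempts` and the pause `backoff_seconds` are assumptions introduced here, since the text only says that the query is sent again when no page arrives within the preset time.

```python
import time
from typing import Callable, Optional


def query_with_retry(send: Callable[[], Optional[str]],
                     max_attempts: int = 3,
                     backoff_seconds: float = 0.0) -> str:
    """Re-send a query until a page comes back or the attempts run out.

    `send` is a hypothetical callable that returns the page text, or None
    when no page arrived within the preset time.
    """
    for attempt in range(1, max_attempts + 1):
        page = send()
        if page is not None:
            return page  # a page was returned in time: hand it to the parser
        if attempt < max_attempts:
            time.sleep(backoff_seconds)  # assumed pause before re-sending
    raise TimeoutError(f"no page returned after {max_attempts} attempts")
```

Capping the number of attempts is a design choice of this sketch; it keeps a permanently unreachable source from blocking the whole extraction run.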
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which the extraction device updating the data in the local database according to the query data includes:
The extraction device compares the extracted query data with the data in the local database;
The extraction device adds to the local database the query data that differ from the data in the local database.
Data extraction pulls the data in the restricted Web database into the local database according to certain rules, so that the data in the restricted Web database can be put to use. If an extracted record already exists in the local database, it does not need to be added again.
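The compare-then-add update can be illustrated with a short sketch. The representation is an assumption made for illustration: each record is a hashable tuple and the local database is an in-memory set, whereas the patent does not say how records are stored or compared.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Record = Tuple[Hashable, ...]  # assumed record shape: one tuple per extracted row


def update_local_database(local_db: Set[Record],
                          extracted: Iterable[Record]) -> List[Record]:
    """Add only the extracted records that are not yet stored locally."""
    added: List[Record] = []
    for record in extracted:
        if record not in local_db:  # comparison step
            local_db.add(record)    # addition step
            added.append(record)
    return added
```

Returning the list of newly added records lets a caller see whether a query contributed anything new.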
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which the extraction device generating a query request according to the attribute value includes:
The extraction device converts single-valued attributes into multi-valued attributes that the query interface of the Web database can recognize.
In a second aspect, an embodiment of the present invention further provides a data extraction device for a restricted Web database, the extraction device including a local database and further including:
A query attribute value obtaining unit, for obtaining an attribute value from the query interface of the Web database;
A query unit, for generating a query request according to the attribute value and sending the query request to the restricted Web database;
A parsing unit, for parsing the Web page returned for the query and extracting the query data contained in the Web page;
A data updating unit, for updating the data in the local database according to the query data;
A query attribute value generating unit, for analyzing the local database with the maximal-sub-concept-based restricted Web database extraction algorithm EdaliwdbFCA and producing the next group of query attribute values, so as to query the restricted Web database again;
A query ending unit, for terminating data extraction when the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the parsing unit includes:
A Web page reception judging subunit, for judging whether the Web page returned for the query has been received within a preset time;
If no Web page is returned for the query within the preset time, the query unit sends the query request to the restricted Web database again.
The extraction device operates on a complex and changing Internet, where any unexpected event may cause a query to fail during extraction. Every query therefore needs to be managed and tracked, so that failed queries can be detected and re-issued. This gives the extraction device better robustness and keeps the extraction work running smoothly.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the data updating unit includes:
A comparing subunit, for comparing the query data extracted by the parsing unit with the data in the local database;
A data adding subunit, for adding to the local database the extracted query data that differ from the data in the local database.
Data extraction pulls the data in the restricted Web database into the local database according to certain rules, so that the data in the restricted Web database can be put to use. If an extracted record already exists in the local database, it does not need to be added again.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the query unit includes:
An attribute conversion subunit, for converting single-valued attributes into multi-valued attributes that the query interface of the Web database can recognize.
The embodiments of the present invention provide a data extraction method and device for restricted Web databases. By applying formal concept analysis to data extraction from restricted Web databases that expose an attribute-value query interface, they extract higher-quality data from the restricted Web database, with good stability and high efficiency.
To make the above objects, features and advantages of the present invention clearer, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 shows a data extraction method for a restricted Web database provided by an embodiment of the present invention;
Fig. 2 shows another data extraction method for a restricted Web database provided by an embodiment of the present invention;
Fig. 3 shows a data extraction device for a restricted Web database provided by an embodiment of the present invention;
Fig. 4 shows another data extraction device for a restricted Web database provided by an embodiment of the present invention.
Reference numerals in the figures: local database 301, query attribute value obtaining unit 302, query unit 303, restricted Web database 304, parsing unit 305, data updating unit 306, query attribute value generating unit 307, query ending unit 308, Web page reception judging subunit 309, comparing subunit 310, data adding subunit 311, attribute conversion subunit 312.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments described and illustrated in the drawings can generally be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Web data are obtained mainly by extracting information from Web pages. Web pages are divided into the surface Web and the deep Web. The surface Web consists of static Web pages connected by hyperlinks, whose content can be directly indexed and retrieved by current general-purpose search engines (Google, Baidu and so on). The deep Web refers to the Web pages that a Web server generates dynamically in response to user requests. Accessible online databases (abbreviated here as Web databases or WDB), such as CNKI, Wanfang and Joyo Amazon, are an important part of the deep Web. The content of a Web database is stored in a real back-end database, and most of it cannot be indexed by current general-purpose search engines. Deep Web page content is generated dynamically by the Web server only when it is queried, according to the user's query request, and the result is returned to the visitor.
The data extraction method and device for restricted Web databases provided by the embodiments of the present invention establish a mapping between the concept lattices of the global formal context and the local formal context, and use it to carry out a detailed formal analysis. They then construct only the lower cover of the current query concept as the search space for query concepts, which avoids constructing the lower half of the concept lattice. Corresponding construction theory and pruning rules are also given, which further reduce the complexity of query selection during data extraction from restricted Web databases based on formal concept analysis (FCA).
A formal context is a triple $K = (O, A, I)$, where $O$ is a set of objects (entities), $A$ is a set of descriptors (attributes), and $I$ is a binary relation between $O$ and $A$, i.e. $I \subseteq O \times A$.
A formal concept is a pair $c = (X, Y)$ with $X \subseteq O$ and $Y \subseteq A$ that satisfies $X' = Y$ and $X = Y'$; $c$ is then called a formal concept of the formal context $K$, and $X$ and $Y$ are called the extent and the intent of the concept $c$, respectively. The set of all formal concepts generated by the formal context $K$ is written $C_K$.
Concept lattice (Formal Concept Lattice), also called Galois lattice: for the set $C_K$ of all concepts generated by a formal context $K$, the ordered set $L_K = (C_K, \le)$ induced by the partial order on $C_K$ is called the concept lattice of the formal context $K$. Each node of the concept lattice is a formal concept.
The set of all direct parent concepts of a concept $c$ is called the upper cover of $c$, and the set of all its direct child concepts is called the lower cover of $c$.
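The definitions above can be made concrete with a small sketch. It assumes a formal context stored as a dictionary from object names to attribute sets, and it enumerates concepts by brute force over attribute subsets, which is only practical for tiny examples and is not the construction used by the patent. All function names are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Context = Dict[str, Set[str]]                   # object -> attributes it has
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent X, intent Y)


def extent(ctx: Context, attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Y': all objects that have every attribute in `attrs`."""
    return frozenset(o for o, has in ctx.items() if attrs <= has)


def intent(ctx: Context, objs: FrozenSet[str]) -> FrozenSet[str]:
    """X': all attributes shared by every object in `objs`."""
    shared = set().union(*ctx.values())
    for o in objs:
        shared &= ctx[o]
    return frozenset(shared)


def concepts(ctx: Context) -> Set[Concept]:
    """All pairs (X, Y) with X' = Y and Y' = X, found by closing each subset."""
    attrs = sorted(set().union(*ctx.values()))
    found: Set[Concept] = set()
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = extent(ctx, frozenset(combo))
            found.add((x, intent(ctx, x)))
    return found


def lower_cover(ctx: Context, concept: Concept) -> Set[Concept]:
    """Direct child concepts: maximal concepts strictly below `concept`."""
    x, _ = concept
    below = [c for c in concepts(ctx) if c[0] < x]
    return {c for c in below if not any(c[0] < d[0] for d in below)}
```

For a three-phone context such as `{"p1": {"cheap", "bar"}, "p2": {"cheap", "slide"}, "p3": {"pricey", "bar"}}`, the lattice has seven concepts, and the lower cover of the top concept consists of the concepts with intents `{"cheap"}` and `{"bar"}`.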
The data extraction process of a Web database can be modeled as a Select query in SQL. Using formal concept analysis, this process can be formalized as a function $Q$: when an attribute set $Y \subseteq A_G$ is used as the query, the query result for $Y$ is written $Q(Y)$. The online Web data can be regarded as the global formal context, written $K_G = (O_G, A_G, R_G)$; the data extracted locally form the local formal context, written $K_L = (O_L, A_L, R_L)$, where $O_L \subseteq O_G$, $A_L = A_G$, $R_L = R_G$. All concepts generated by the global formal context and the concept lattice they form are written $C_G$ and $L_G$. Correspondingly, all concepts generated by the local formal context $K_L$ and the concept lattice they form are written $C_L$ and $L_L$.
Function from $C_L$ to $C_G$:
where $(X, Y) \in L_L$, and the results of the Galois connection operations on the intent $Y$ over the global formal context are written $Y'_G$ and $Y''_G$.
Full concepts, if c ∈ LL, andThen concept c is referred to as Full concepts.
For a formal context $K = (O, A, I)$, if there is an object $a$ whose attribute set is $Y$, and the number of objects in the whole formal context that possess the attribute set $Y$ is greater than $\Delta$, i.e. $a' = Y$ and $\lVert a'' \rVert > \Delta$, then the object $a$ is called an undistinguishable object of the formal context $K$ under the restriction threshold $\Delta$.
Referring to Fig. 1, a data extraction method for a restricted Web database is applied to a data extraction device for the restricted Web database. The extraction device includes a local database, and the method includes:
S101: The extraction device obtains an attribute value from the query interface of the Web database.
S102: The extraction device generates a query request according to the attribute value and sends the query request to the restricted Web database.
S103: The extraction device parses the Web page returned for the query and extracts the query data contained in the Web page.
S104: The extraction device updates the data in the local database according to the query data.
S105: The extraction device analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values, so as to query the restricted Web database again.
S106: When the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query, the extraction device terminates data extraction.
The extraction device faces a complex and changing Internet, where any unexpected event may interrupt the extraction process. Referring to Fig. 2, an embodiment of the present invention provides another data extraction method for a restricted Web database, which is robust. The method includes:
S201: The extraction device obtains an attribute value from the query interface of the Web database.
S202: The extraction device generates a query request according to the attribute value and sends the query request to the restricted Web database. When a query request needs to be sent, the extraction device converts single-valued attributes into multi-valued attributes that the query interface of the Web database can recognize, so that the query can be carried out.
This embodiment describes the conversion relationship in an XML file, so that interface updates can be accommodated by modifying the interface file without recompiling the source code. Part of the scale mapping XML file for the Sina mobile phone query interface is listed below.
File:SinaMobileProScale.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- sina mobile select web deep database, scale definition -->
<!--DOCTYPE scale-set SYSTEM "scale.dtd"-->
<!DOCTYPE scale-set [
<!ELEMENT scale-set (scale+)>
<!ELEMENT scale (attribute-list, object+)>
<!ATTLIST scale name CDATA #REQUIRED>
<!ATTLIST scale type CDATA "rating">
<!ATTLIST scale id CDATA #IMPLIED>
<!ELEMENT attribute-list (#PCDATA)>
<!ELEMENT object (#PCDATA)>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object id CDATA #IMPLIED>
]>
<scale-set>
<scale name="mobile_jiage1" id="0">
<attribute-list></attribute-list>
<object name="0-499" id="0"></object>
<object name="500-999" id="1"></object>
<object name="1000-1499" id="2"></object>
<object name="1500-1999" id="3"></object>
<object name="2000-2999" id="4"></object>
<object name="3000-1000000" id="5"></object>
</scale>
<scale name="mobile_face" id="2">
<attribute-list></attribute-list>
<object name="straight panel" id="12"></object>
<object name="upturning lids, down turnover cover" id="13"></object>
<object name="slip lid" id="14"></object>
<object name="rotate, rotation shadow" id="15"></object>
<object name="other" id="16"></object>
</scale>
…
</scale-set>
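As an illustration of how such a scale file could drive the conversion from a single value to the value the query interface expects, the sketch below looks up the price range that contains a given price. It rests on assumptions not stated in the source: the function name `to_interface_id` is hypothetical, range names of the form `low-high` are read as inclusive bounds, and only an excerpt of the price scale is embedded.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Excerpt of the price scale listed above, embedded for a self-contained example.
SCALE_XML = """<scale-set>
  <scale name="mobile_jiage1" id="0">
    <attribute-list></attribute-list>
    <object name="0-499" id="0"></object>
    <object name="500-999" id="1"></object>
    <object name="1000-1499" id="2"></object>
  </scale>
</scale-set>"""


def to_interface_id(scale_name: str, value: int,
                    xml_text: str = SCALE_XML) -> Optional[str]:
    """Return the id of the range object that contains `value`, if any."""
    root = ET.fromstring(xml_text)
    for scale in root.findall("scale"):
        if scale.get("name") != scale_name:
            continue
        for obj in scale.findall("object"):
            # Assumed reading of the name attribute: inclusive "low-high" range.
            low, high = (int(part) for part in obj.get("name").split("-"))
            if low <= value <= high:
                return obj.get("id")
    return None  # unknown scale, or value outside every listed range
```

Because the ranges live in the XML text rather than in the code, a change to the query interface would only require editing the file, which matches the motivation given for the scale file.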
S203: Judge whether the Web page returned for the query has been received within a preset time; if no Web page is returned for the query within the preset time, the extraction device sends the query request to the restricted Web database again.
Once the Web page has been returned for the query within the preset time, S204 is performed.
S204: The extraction device parses the Web page returned for the query and extracts the query data contained in the Web page.
S205: The extraction device compares the extracted query data with the data in the local database.
If they differ, S206 is performed: the extraction device adds to the local database the query data that differ from the data in the local database.
S207: The extraction device analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values, so as to query the restricted Web database again.
S208: When the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query, the extraction device terminates data extraction.
The data extraction method for restricted Web databases disclosed by the embodiments of the present invention selects single-attribute concepts as the initial query concepts. If the current candidate query concept is not a Full concept, the number of returned results is too large and exceeds the restriction threshold Δ, so the results cannot all be displayed on one Web page and cannot all be extracted. In that case the lower cover Covl(c) of the concept c is constructed from the local formal context extracted so far, until a concept whose extent cardinality is less than or equal to the restriction threshold is selected as the actual query concept. Throughout the extraction of the Web database, the intent Y of the query concept is sent as the query attribute set, and the local formal context is updated by extracting the results returned under the restriction. Throughout the query process, pruning rules reduce the number of query concepts and improve the extraction efficiency of the algorithm.
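The refinement idea in the paragraph above can be sketched with a simulated restricted database. This is a simplified stand-in and not the EdaliwdbFCA algorithm: it does not build concept lattices or apply pruning rules, it treats a full result page as a sign that results may have been cut off, and it refines a query by adding one attribute observed in the returned objects. The names `restricted_query` and `extract`, and the assumption that the interface's attribute values are known up front, are introduced here for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Context = Dict[str, Set[str]]  # object -> attributes it has


def restricted_query(db: Context, attrs: FrozenSet[str], limit: int) -> Context:
    """Simulated restricted Web database: at most `limit` matches come back."""
    matches = sorted(o for o, has in db.items() if attrs <= has)
    return {o: db[o] for o in matches[:limit]}


def extract(db: Context, start_attrs: List[str], limit: int) -> Context:
    """Start from single-attribute queries and refine those that fill a page."""
    local: Context = {}  # local formal context built so far
    queue = deque(frozenset([a]) for a in start_attrs)
    seen = set(queue)
    while queue:
        query = queue.popleft()
        page = restricted_query(db, query, limit)
        local.update(page)
        if len(page) < limit:
            continue  # page not full: every match for this query was fetched
        # Full page: results may be truncated, so try more specific queries
        # built from attributes observed in the objects that were returned.
        for has in page.values():
            for attr in has - query:
                refined = query | {attr}
                if refined not in seen:
                    seen.add(refined)
                    queue.append(refined)
    return local
```

In this sketch, objects that share exactly the same attribute set and outnumber the limit can never all be retrieved, which corresponds to the undistinguishable objects defined earlier.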
Referring to Fig. 3, an embodiment of the present invention provides a data extraction device for a restricted Web database. The extraction device includes a local database 301 and further includes:
A query attribute value obtaining unit 302, for obtaining an attribute value from the query interface of the Web database.
A query unit 303, for generating a query request according to the attribute value and sending the query request to the restricted Web database 304.
The query unit 303 includes an attribute conversion subunit 312, for converting single-valued attributes into multi-valued attributes that the query interface of the Web database can recognize.
A parsing unit 305, for parsing the Web page returned for the query and extracting the query data contained in the Web page.
A data updating unit 306, for updating the data in the local database 301 according to the query data.
A query attribute value generating unit 307, for analyzing the local database 301 with the EdaliwdbFCA algorithm and producing the next group of query attribute values, so as to query the restricted Web database 304 again.
A query ending unit 308, for terminating data extraction when the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query.
With the above device, the target data in the restricted Web database 304 can be extracted into the local database 301, which makes deep Web resources searchable. To give the data extraction device for the restricted Web database 304 better robustness and extract the data in the restricted Web database 304 more effectively, referring to Fig. 4, an embodiment of the present invention provides another data extraction device for the restricted Web database 304, which includes a local database 301 and further includes:
A query attribute value obtaining unit 302, for obtaining an attribute value from the query interface of the Web database.
A query unit 303, for generating a query request according to the attribute value and sending the query request to the restricted Web database 304.
A parsing unit 305, for parsing the Web page returned for the query and extracting the query data contained in the Web page.
The parsing unit 305 includes a Web page reception judging subunit 309, for judging whether the Web page returned for the query has been received within a preset time. If no Web page is returned for the query within the preset time, the query unit 303 sends the query request to the restricted Web database 304 again.
A data updating unit 306, for updating the data in the local database 301 according to the query data.
The data updating unit 306 includes a comparing subunit 310 and a data adding subunit 311.
The comparing subunit 310, for comparing the query data extracted by the parsing unit 305 with the data in the local database 301;
The data adding subunit 311, for adding to the local database 301 the extracted query data that differ from the data in the local database 301.
A query attribute value generating unit 307, for analyzing the local database 301 with the EdaliwdbFCA algorithm and producing the next group of query attribute values, so as to query the restricted Web database 304 again.
A query ending unit 308, for terminating data extraction when the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query.
To give the extraction device provided by the embodiments of the present invention good extensibility, that is, so that it can serve extraction tasks for different data sources (different Web databases or a simulated Web database), EdaliwdbFCA is encapsulated in the ExtractStrategy class. The extraction device abstracts the data source to be extracted as a formal context, so the DBContext class describes the formal context and is aggregated into the extractor abstract class DataExtractor. The Galois connection operations required by the EdaliwdbFCA algorithm are also encapsulated in the DBContext class. The query operation that the extraction device needs to send is carried out by concrete implementations of the abstract function SendQuery in the abstract class DataExtractor. According to the query interface of the specific data source, the SendQuery function converts the query concept into multi-valued attributes that conform to the interface specification and sends the query request. Extracted data must be stored in the local database 301, so the abstract class DataExtractor contains a database module DBModule object. The classes SinaMobileExtractor and DBDataExtractor are concrete implementations of the abstract class DataExtractor that handle different extraction tasks; these tasks differ in data source, specific extraction process and Web query interface. XExtractor stands for any concrete implementation of the abstract class DataExtractor, indicating that the attribute selection algorithm is independent of the specific extraction process. To add a new extraction task, it is therefore enough to add the corresponding implementation of the abstract class DataExtractor. The DBDataExtractor class extracts data from a simulated Web database and therefore contains a DBModule object.
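The class structure described above could look roughly like the following sketch. It keeps the class names from the text but is otherwise an assumption-laden outline: method names such as `send_query` (for SendQuery), `step` and `next_queries` are illustrative, the strategy body is a placeholder rather than EdaliwdbFCA, and persistence through a DBModule object is omitted, so the in-memory context doubles as local storage.

```python
from abc import ABC, abstractmethod
from typing import Dict, FrozenSet, List, Set

Rows = Dict[str, Set[str]]  # object id -> attributes it has


class DBContext:
    """Formal context of the data extracted so far, with a Galois operation."""

    def __init__(self) -> None:
        self.rows: Rows = {}

    def add(self, obj: str, attrs: Set[str]) -> None:
        self.rows[obj] = set(attrs)

    def extent(self, attrs: FrozenSet[str]) -> FrozenSet[str]:
        """Objects that have every attribute in `attrs`."""
        return frozenset(o for o, has in self.rows.items() if attrs <= has)


class ExtractStrategy:
    """Stand-in for the attribute selection step (EdaliwdbFCA in the text)."""

    def next_queries(self, context: DBContext) -> List[FrozenSet[str]]:
        # Placeholder rule: one single-attribute query per attribute seen so far.
        seen = sorted(set().union(*context.rows.values()))
        return [frozenset({a}) for a in seen]


class DataExtractor(ABC):
    """Extractor base class; subclasses supply the source-specific query."""

    def __init__(self, strategy: ExtractStrategy) -> None:
        self.strategy = strategy
        self.context = DBContext()  # aggregated formal context

    @abstractmethod
    def send_query(self, attrs: FrozenSet[str]) -> Rows:
        """Translate the query for one data source, send it, return the rows."""

    def step(self, attrs: FrozenSet[str]) -> None:
        for obj, has in self.send_query(attrs).items():
            self.context.add(obj, has)


class DBDataExtractor(DataExtractor):
    """Extractor for a simulated restricted Web database held in memory."""

    def __init__(self, strategy: ExtractStrategy, source: Rows, limit: int) -> None:
        super().__init__(strategy)
        self.source, self.limit = source, limit

    def send_query(self, attrs: FrozenSet[str]) -> Rows:
        hits = sorted(o for o, has in self.source.items() if attrs <= has)
        return {o: self.source[o] for o in hits[: self.limit]}
```

Because the strategy only sees a `DBContext`, a new data source needs only a new `DataExtractor` subclass with its own `send_query`, which is the independence the text attributes to XExtractor.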
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
- 1. A data extraction method for a restricted Web database, characterized in that it is applied to a data extraction device for the restricted Web database, the extraction device including a local database, the method including: the extraction device obtains an attribute value from the query interface of the Web database; the extraction device generates a query request according to the attribute value and sends the query request to the restricted Web database; the extraction device parses the Web page returned for the query and extracts the query data contained in the Web page; the extraction device updates the data in the local database according to the query data; the extraction device analyzes the local database with the maximal-sub-concept-based restricted Web database extraction algorithm EdaliwdbFCA and produces the next group of query attribute values, so as to query the restricted Web database again; when the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query, the extraction device terminates data extraction; wherein the step in which the extraction device analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values includes: encapsulating the EdaliwdbFCA algorithm in the ExtractStrategy class; establishing the DBContext class for describing the data in the Web database, and encapsulating in the DBContext class the functions that implement the Galois connection operations of the EdaliwdbFCA algorithm; establishing the extractor abstract class DataExtractor, and aggregating the DBContext class into the extractor abstract class DataExtractor; implementing in the extractor abstract class DataExtractor the SendQuery function for sending query operations, and converting the query concept into multi-valued attributes in the SendQuery function; establishing, for different query data sources, the instance objects SinaMobileExtractor and DBDataExtractor of the extractor abstract class DataExtractor, wherein the DBDataExtractor includes a DBModule object for extracting data from the local database.
- 2. The data extraction method for a restricted Web database according to claim 1, characterized in that, before the extraction device parses the Web page returned for the query, the method further includes: judging whether the Web page returned for the query has been received within a preset time; if no Web page is returned for the query within the preset time, the extraction device sends the query request to the restricted Web database again.
- 3. The data extraction method for a restricted Web database according to claim 1, characterized in that the extraction device updating the data in the local database according to the query data includes: the extraction device compares the extracted query data with the data in the local database; the extraction device adds to the local database the query data that differ from the data in the local database.
- 4. The data extraction method for a restricted Web database according to claim 1, characterized in that the extraction device generating a query request according to the attribute value includes: the extraction device converts single-valued attributes into multi-valued attributes that the query interface of the Web database can recognize.
- 5. A data extraction device for a restricted Web database, characterized in that the extraction device includes a local database and further includes: a query attribute value obtaining unit, for obtaining an attribute value from the query interface of the Web database; a query unit, for generating a query request according to the attribute value and sending the query request to the restricted Web database; a parsing unit, for parsing the Web page returned for the query and extracting the query data contained in the Web page; a data updating unit, for updating the data in the local database according to the query data; a query attribute value generating unit, for analyzing the local database with the maximal-sub-concept-based restricted Web database extraction algorithm EdaliwdbFCA and producing the next group of query attribute values, so as to query the restricted Web database again; a query ending unit, for terminating data extraction when the number of query data records equals the preset threshold for the number of records displayed per page of the Web page returned for the query; wherein the manner in which the query attribute value generating unit analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values includes: encapsulating the EdaliwdbFCA algorithm in the ExtractStrategy class; establishing the DBContext class for describing the data in the Web database, and encapsulating in the DBContext class the functions that implement the Galois connection operations of the EdaliwdbFCA algorithm; establishing the extractor abstract class DataExtractor, and aggregating the DBContext class into the extractor abstract class DataExtractor; implementing in the extractor abstract class DataExtractor the SendQuery function for sending query operations, and converting the query concept into multi-valued attributes in the SendQuery function; establishing, for different query data sources, the instance objects SinaMobileExtractor and DBDataExtractor of the extractor abstract class DataExtractor, wherein the DBDataExtractor includes a DBModule object for extracting data from the local database.
- 6. the data pick-up device in limited web data storehouse according to claim 5, it is characterised in that the resolution unit Including:Webpage receives judgment sub-unit, for judging whether to receive the Webpage of feedback query in preset time;If the inquiry request is sent to institute by the Webpage of non-feedback query in preset time, the query unit again State limited web data storehouse.
- 7. the data pick-up device in limited web data storehouse according to claim 5, it is characterised in that the data update Unit includes:Comparing subunit, the data in the inquiry data and the local data base that are extracted for the resolution unit;Data add subelement, and the inquiry data extracted for will differ from the data in the local data base are added to In the local data base.
- 8. the data pick-up device in limited web data storehouse according to claim 5, it is characterised in that the query unit Including:Attribute transforming subunit, for single-value attribute to be converted into the multi-valued attribute that the web data library inquiry interface can identify.
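The class structure recited in claims 1 and 5 (a DBContext holding Galois connection operators, aggregated by an abstract DataExtrator with a query-sending function) and the compare-then-add update of claims 3 and 7 can be illustrated with a minimal Python sketch. This is not the patented implementation. The names `ListExtractor`, `extent`, `intent`, `next_values`, `page_size` and `max_queries` are hypothetical, the in-memory list merely stands in for a limited data source, and the next-query selection is a simple placeholder rather than the EdaliwdbFCA algorithm, whose details the claims do not give. The stopping rule (no unqueried values left, or a query budget exhausted) is likewise a simplification and not the threshold condition of the claims.

```python
from abc import ABC, abstractmethod


class DBContext:
    """Local database viewed as a formal context: records x attribute values."""

    def __init__(self):
        self.records = []  # each record is a frozenset of (attribute, value) pairs

    def add(self, record):
        """Add a record only if it differs from every stored record."""
        record = frozenset(record)
        if record in self.records:
            return False
        self.records.append(record)
        return True

    def extent(self, values):
        """Galois operator: stored records containing every given value."""
        values = frozenset(values)
        return [r for r in self.records if values <= r]

    def intent(self, records):
        """Galois operator: attribute values shared by every given record."""
        records = list(records)
        if not records:
            # Convention: the intent of the empty set is every known value.
            return frozenset().union(*self.records)
        return frozenset.intersection(*records)


class DataExtrator(ABC):
    """Abstract extractor aggregating a DBContext (spelling as in the claims)."""

    def __init__(self, page_size):
        self.context = DBContext()
        self.page_size = page_size  # assumed cap on records per result page

    @abstractmethod
    def send_query(self, value):
        """Return at most page_size records matching one attribute value."""

    def next_values(self, asked):
        # Placeholder, NOT EdaliwdbFCA: all locally seen values not yet queried.
        seen = frozenset().union(*self.context.records)
        return sorted(seen - asked)

    def extract(self, seed, max_queries=100):
        asked, pending = set(), [seed]
        while pending and len(asked) < max_queries:
            value = pending.pop(0)
            if value in asked:
                continue
            asked.add(value)
            for record in self.send_query(value):
                self.context.add(record)  # duplicates are skipped
            pending = self.next_values(asked)
        return self.context.records


class ListExtractor(DataExtrator):
    """Hypothetical limited source: an in-memory list with a page limit."""

    def __init__(self, rows, page_size):
        super().__init__(page_size)
        self.rows = [frozenset(r.items()) for r in rows]

    def send_query(self, value):
        hits = [r for r in self.rows if value in r]
        return hits[: self.page_size]


rows = [
    {"brand": "A", "color": "red"},
    {"brand": "A", "color": "blue"},
    {"brand": "B", "color": "blue"},
    {"brand": "B", "color": "green"},
]
extractor = ListExtractor(rows, page_size=2)
harvested = extractor.extract(("brand", "A"))
print(len(harvested))  # 4
```

Starting from the single seed value `("brand", "A")`, each answer page exposes new attribute values that drive later queries, so all four rows are eventually harvested even though no query returns more than two records.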
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510154092.3A CN104699848B (en) | 2015-04-02 | 2015-04-02 | The data pick-up method and device in limited web data storehouse |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510154092.3A CN104699848B (en) | 2015-04-02 | 2015-04-02 | The data pick-up method and device in limited web data storehouse |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104699848A (en) | 2015-06-10 |
CN104699848B (en) | 2018-04-27 |
Family
ID=53346968
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510154092.3A Expired - Fee Related CN104699848B (en) | 2015-04-02 | 2015-04-02 | The data pick-up method and device in limited web data storehouse |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104699848B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7181471B1 (en) * | 1999-11-01 | 2007-02-20 | Fujitsu Limited | Fact data unifying method and apparatus |
CN101697221A (en) * | 2009-09-18 | 2010-04-21 | 何国健 | Method for obtaining reading access to limited content of web site by purchasing web site products |
CN103560943A (en) * | 2013-10-31 | 2014-02-05 | 北京邮电大学 | Network analytic system and method supporting real-time mass data processing |
Non-Patent Citations (1)
Title |
---|
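Zhang Zhuo; "Research on Web Database Extraction Based on Formal Concept Analysis" (基于形式概念分析的Web数据库抽取研究); China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库信息科技辑》); 2012-07-15; pp. 18, 27, 57, 63 * |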
Also Published As
Publication number | Publication date |
---|---|
CN104699848A (en) | 2015-06-10 |
Similar Documents
Publication | Title |
---|---|
US20220035600A1 (en) | API Specification Generation |
CN102053983B (en) | Method, system and device for querying vertical search |
CN108304444B (en) | Information query method and device |
US8463739B2 (en) | Systems and methods for generating multi-population statistical measures using middleware |
US9495429B2 (en) | Automatic synthesis and presentation of OLAP cubes from semantically enriched data sources |
CN111460311A (en) | Search processing method, device and equipment based on dictionary tree and storage medium |
CN103136228A (en) | Image search method and image search device |
CN108804516B (en) | Similar user searching device, method and computer readable storage medium |
US9535966B1 (en) | Techniques for aggregating data from multiple sources |
CN101344881A (en) | Index generation method and device and search system for mass file type data |
CN105550206B (en) | The edition control method and device of structured query sentence |
CN108228743A (en) | A kind of real-time big data search engine system |
CN111680489B (en) | Target text matching method and device, storage medium and electronic equipment |
CN107423037B (en) | Application program interface positioning method and device |
CN104484392A (en) | Method and device for generating database query statement |
CN109408502A (en) | A kind of data standard processing method, device and its storage medium |
CN111209325A (en) | Service system interface identification method, device and storage medium |
CN104243565A (en) | Method and device for obtaining configuration data |
CN112905600B (en) | Data query method and device, storage medium and electronic equipment |
CN110955855A (en) | Information interception method, device and terminal |
CN113568923A (en) | Method and device for querying data in database, storage medium and electronic equipment |
CN110069489A (en) | A kind of information processing method, device, equipment and computer readable storage medium |
CN104699848B (en) | The data pick-up method and device in limited web data storehouse |
US20120284224A1 (en) | Build of website knowledge tables |
CN110704481A (en) | Method and device for displaying data |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CB03 | Change of inventor or designer information | Inventor after: Du Juan, Zhang Zhuo, Cao Jianchun. Inventor before: Du Juan, Zhang Zhuo |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2018-04-27. Termination date: 2021-04-02 |