CN104699848B - Data extraction method and device for a restricted Web database

Info

Publication number
CN104699848B
Authority
CN
China
Prior art keywords
data
query
extraction device
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510154092.3A
Other languages
Chinese (zh)
Other versions
CN104699848A (en
Inventor
杜鹃
张卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Yellow River Conservancy Technical Institute
Original Assignee
Zhengzhou University
Yellow River Conservancy Technical Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University and Yellow River Conservancy Technical Institute
Priority to CN201510154092.3A
Publication of CN104699848A
Application granted
Publication of CN104699848B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to the field of computer technology and provides a data extraction method and device for a restricted Web database. The method includes: an extraction device obtains an attribute value from the Web database query interface; generates a query request and sends it to the restricted Web database; parses the Web page returned for the query and extracts the query data; updates the data in a local database according to the query data; analyzes the local database with the EdaliwdbFCA algorithm to produce the next group of query attribute values; and terminates the data extraction when the number of query-data records equals a preset threshold. The extraction device includes a query attribute value obtaining unit, a query unit, a parsing unit, a data updating unit, a query attribute value generating unit and a query ending unit. By applying formal concept analysis to data extraction from a restricted Web database with an attribute-value query interface, the invention extracts higher-quality data from the restricted Web database, with good stability and high efficiency.

Description

Data extraction method and device for a restricted Web database
Technical field
The present invention relates to the field of computer technology, and in particular to a data extraction method and device for a restricted Web database.
Background art
Whether for technical reasons or because of application requirements, the results returned by a query to a Web database may be limited to a certain range: when the Web database is queried with an attribute set, only k objects can be obtained automatically by a program. A Web database with this characteristic is a restricted Web database. Web pages are divided into the surface Web and the deep Web. The surface Web consists of static Web pages connected by hyperlinks. According to statistics, deep Web resources are about 500 times the size of static page resources and are of better data quality, and the most important resources in the deep Web are Web databases. How to extract data from a restricted Web database, and how to extract higher-quality data, has long been a widely studied problem.
Summary of the invention
In view of this, an object of the present invention is to provide a data extraction method and device for a restricted Web database that can extract higher-quality data from the restricted Web database.
The present invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a data extraction method for a restricted Web database, applied to a data extraction device for the restricted Web database, the extraction device including a local database, the method including:
The extraction device obtains an attribute value from the Web database query interface;
The extraction device generates a query request according to the attribute value and sends the query request to the restricted Web database;
The extraction device parses the Web page returned for the query and extracts the query data contained in the Web page;
The extraction device updates the data in the local database according to the query data;
The extraction device analyzes the local database with the maximal-sub-concept-based restricted Web database extraction algorithm (Extract data from Limited Web Database based on Formal Concept Analysis, EdaliwdbFCA) and produces the next group of query attribute values, so as to query the restricted Web database again;
When the number of records in the query data equals the preset threshold for the number of records displayed per page of the Web page returned for the query, the extraction device terminates the data extraction.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which, before the extraction device parses the Web page returned for the query, the method further includes:
Determining whether the Web page returned for the query is received within a preset time;
If no Web page is returned for the query within the preset time, the extraction device sends the query request to the restricted Web database again.
The extraction method faces a complex and changeable Internet, and any unexpected event may cause a query to fail during extraction. Each query therefore needs to be managed and maintained so that failed queries can be detected and re-issued. This makes the extraction method more robust and ensures that the extraction work proceeds smoothly.
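The re-send behaviour described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `query_with_retry`, the convention that the sender returns `None` when no page arrives within the preset time, and the cap on attempts are all assumptions introduced here.

```python
import time
from typing import Callable, Optional


def query_with_retry(send: Callable[[], Optional[str]],
                     max_attempts: int = 3,
                     pause_s: float = 0.0) -> Optional[str]:
    """Re-send a query until a page arrives or the attempts run out.

    `send` is a hypothetical callable that returns the page text, or None
    when no page was received within the preset time.
    """
    for _ in range(max_attempts):
        page = send()
        if page is not None:
            return page
        time.sleep(pause_s)  # wait before sending the same query again
    return None


# Usage: a fake sender that times out twice and then answers.
_replies = iter([None, None, "<html>ok</html>"])
result = query_with_retry(lambda: next(_replies))
```

The source does not state how many times a query may be re-sent; the attempt limit is only there to keep the sketch from looping forever.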
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which the extraction device updating the data in the local database according to the query data includes:
The extraction device compares the extracted query data with the data in the local database;
The extraction device adds to the local database the query data that differ from the data already in the local database.
Data extraction moves data from the restricted Web database into the local database according to certain rules so that the data in the restricted Web database can be used. If extracted data already exist in the local database, they do not need to be added again.
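The compare-then-add update can be illustrated with a short sketch. It assumes, purely for illustration, that each record is a hashable tuple and that the local database is an in-memory set; the name `update_local` is hypothetical.

```python
def update_local(local: set, extracted: list) -> int:
    """Add only records that are not yet stored; return how many were added."""
    before = len(local)
    for record in extracted:
        if record not in local:  # compare against the local database
            local.add(record)    # add only data that differ
    return len(local) - before


# Usage: one of the two extracted records is already stored.
local_db = {("N95", "slider"), ("E71", "bar")}
added = update_local(local_db, [("E71", "bar"), ("X6", "bar")])
```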
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which the extraction device generating a query request according to the attribute value includes:
The extraction device converts a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
In a second aspect, an embodiment of the present invention further provides a data extraction device for a restricted Web database, the extraction device including a local database and further including:
A query attribute value obtaining unit, for obtaining an attribute value from the Web database query interface;
A query unit, for generating a query request according to the attribute value and sending the query request to the restricted Web database;
A parsing unit, for parsing the Web page returned for the query and extracting the query data contained in the Web page;
A data updating unit, for updating the data in the local database according to the query data;
A query attribute value generating unit, for analyzing the local database with the maximal-sub-concept-based restricted Web database extraction algorithm EdaliwdbFCA and producing the next group of query attribute values, so as to query the restricted Web database again;
A query ending unit, for terminating the data extraction when the number of records in the query data equals the preset threshold for the number of records displayed per page of the Web page returned for the query.
With reference to the second aspect, an embodiment of the present invention provides a first possible implementation of the second aspect, in which the parsing unit includes:
A Web page reception judging subunit, for determining whether the Web page returned for the query is received within a preset time;
If no Web page is returned for the query within the preset time, the query unit sends the query request to the restricted Web database again.
The extraction device operates in a complex and changeable Internet, and any unexpected event may cause a query to fail during extraction. Each query therefore needs to be managed and maintained so that failed queries can be detected and re-issued. This makes the extraction device more robust and ensures that the extraction work proceeds smoothly.
With reference to the second aspect, an embodiment of the present invention provides a second possible implementation of the second aspect, in which the data updating unit includes:
A comparing subunit, for comparing the query data extracted by the parsing unit with the data in the local database;
A data adding subunit, for adding to the local database the extracted query data that differ from the data already in the local database.
Data extraction moves data from the restricted Web database into the local database according to certain rules so that the data in the restricted Web database can be used. If extracted data already exist in the local database, they do not need to be added again.
With reference to the second aspect, an embodiment of the present invention provides a third possible implementation of the second aspect, in which the query unit includes:
An attribute converting subunit, for converting a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
The embodiments of the present invention provide a data extraction method and device for a restricted Web database. By applying formal concept analysis to data extraction from a restricted Web database with an attribute-value query interface, higher-quality data are extracted from the restricted Web database, with good stability and high efficiency.
To make the above objects, features and advantages of the present invention clearer, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 shows a data extraction method for a restricted Web database provided by an embodiment of the present invention;
Fig. 2 shows another data extraction method for a restricted Web database provided by an embodiment of the present invention;
Fig. 3 shows a data extraction device for a restricted Web database provided by an embodiment of the present invention;
Fig. 4 shows another data extraction device for a restricted Web database provided by an embodiment of the present invention.
Reference numerals in the figures: local database 301, query attribute value obtaining unit 302, query unit 303, restricted Web database 304, parsing unit 305, data updating unit 306, query attribute value generating unit 307, query ending unit 308, Web page reception judging subunit 309, comparing subunit 310, data adding subunit 311, attribute converting subunit 312.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments described and illustrated in the drawings can generally be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative work fall within the scope of protection of the present invention.
Web data are obtained mainly by extracting information from Web pages. Web pages are divided into the surface Web and the deep Web. The surface Web consists of static Web pages connected by hyperlinks, whose content can be directly indexed and retrieved by current general-purpose search engines (Google, Baidu, etc.). The deep Web refers to Web pages generated dynamically by a Web server in response to user requests. Accessible online databases (referred to here as Web databases or WDB), such as CNKI, Wanfang and Joyo Amazon, are an important part of the deep Web. The content of a Web database is stored in an actual back-end database, and most of it cannot be indexed by current general-purpose search engines. Deep Web page content is generated dynamically by the Web server only when queried, according to the user's query request, and the result is returned to the visitor.
The data extraction method and device for a restricted Web database provided by the embodiments of the present invention establish mapping relations between the concept lattices corresponding to the global formal context and the local formal context, and then carry out a detailed formal analysis. On this basis, only the lower cover of the current query concept is constructed as the query-concept search space, which avoids constructing the lower half of the concept lattice. Corresponding construction theory and pruning rules are also given, which further reduces the complexity of query selection when extracting data from a restricted Web database based on formal concept analysis (FCA).
A formal context is a triple $K = (O, A, I)$, where $O$ is a set of objects (entities), $A$ is a set of descriptors (attributes), and $I$ is a binary relation between $O$ and $A$, i.e., $I \subseteq O \times A$.
A formal concept is a pair $c = (X, Y)$, where $X \subseteq O$ and $Y \subseteq A$ satisfy $X' = Y$ and $X = Y'$; $c$ is then called a formal concept of the formal context $K$, and $X$ and $Y$ are called the extent and the intent of the concept $c$, respectively. The set of all formal concepts produced by the formal context $K$ is denoted $C_K$.
The concept lattice (Formal Concept Lattice), also called the Galois lattice, is the ordered set $L_K = (C_K, \le)$ derived from the set $C_K$ of all concepts produced by the formal context $K$ and the partial order on $C_K$; it is called the concept lattice of the formal context $K$. Each node of the concept lattice is a formal concept.
The set of all direct parent concepts (respectively, direct sub-concepts) of a concept $c$ is called the upper (respectively, lower) cover of $c$.
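The definitions above can be made concrete on a toy context. The sketch below is an illustration only: the three-phone context and all function names are assumptions, and the brute-force enumeration of attribute subsets is exponential, so it is suitable only for very small examples.

```python
from itertools import combinations


def intent(objs, ctx, attrs):
    """X': the attributes shared by every object in objs."""
    out = set(attrs)
    for o in objs:
        out &= ctx[o]
    return frozenset(out)


def extent(ys, ctx):
    """Y': the objects that have every attribute in ys."""
    return frozenset(o for o, a in ctx.items() if set(ys) <= a)


def concepts(ctx, attrs):
    """All formal concepts (X, Y), found by closing every attribute subset."""
    found = set()
    for r in range(len(attrs) + 1):
        for ys in combinations(sorted(attrs), r):
            x = extent(ys, ctx)
            found.add((x, intent(x, ctx, attrs)))
    return found


def lower_cover(c, all_concepts):
    """Direct sub-concepts of c: smaller extent, with no concept in between."""
    subs = [d for d in all_concepts if d[0] < c[0]]
    return {d for d in subs if not any(d[0] < e[0] for e in subs)}


# Hypothetical context: object -> set of attributes.
ctx = {"p1": {"bar", "cheap"}, "p2": {"bar"}, "p3": {"slider", "cheap"}}
attrs = {"bar", "slider", "cheap"}
all_c = concepts(ctx, attrs)
top = (frozenset(ctx), frozenset())
cover = lower_cover(top, all_c)
```

In this context the top concept has two direct sub-concepts, with intents {bar} and {cheap}; the concepts with intents {slider, cheap} and {bar, cheap} lie further down and are not part of its lower cover.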
The data extraction process for a Web database can be modeled as a Select query in SQL. Using formal concept analysis, this process can be formalized as a function $Q$: when a query is made with an attribute set $Y \subseteq A$, the query result for $Y$ is denoted $Q(Y)$. The online Web data can be regarded as a global formal context, denoted $K_G = (O_G, A_G, R_G)$, and the local data extracted so far form a local formal context, denoted $K_L = (O_L, A_L, R_L)$, where $O_L \subseteq O_G$, $A_L = A_G$, $R_L = R_G$. The set of all concepts produced by the global formal context and the concept lattice they form are denoted $C_G$ and $L_G$, respectively. Correspondingly, the set of all concepts produced by the local formal context $K_L$ and the concept lattice they form are denoted $C_L$ and $L_L$.
A function $C_L \to C_G$ [formula not reproduced in this text] is defined,
where $(X, Y) \in L_L$, and the Galois operations applied to the intent $Y$ in the global formal context are denoted $Y'_G$ and $Y''_G$.
Full concept: if $c \in L_L$ and [formula not reproduced in this text], the concept $c$ is called a Full concept.
For a formal context $K = (O, A, I)$, if there is an object $a$ whose attribute set is $Y$, and the number of objects in the whole formal context that possess the attribute set $Y$ is greater than $\Delta$, i.e., $a' = Y$ and $\|a''\| > \Delta$, then the object $a$ is called an undistinguishable object of the formal context $K$ under the limit threshold $\Delta$.
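The undistinguishable-object condition can be checked directly from its definition. The sketch below is illustrative only; the context, the threshold value and the function name are assumptions.

```python
def is_undistinguishable(obj, ctx, delta):
    """True when more than `delta` objects carry every attribute of `obj`."""
    attrs_of_obj = ctx[obj]                                   # a'
    same = [o for o, a in ctx.items() if attrs_of_obj <= a]   # a''
    return len(same) > delta


# Hypothetical context with limit threshold delta = 2.
ctx = {"p1": {"bar"}, "p2": {"bar"}, "p3": {"bar", "cheap"}, "p4": {"slider"}}
flags = {o: is_undistinguishable(o, ctx, 2) for o in ctx}
```

Here p1 and p2 are flagged: three objects carry the attribute set {bar}, so no query built from their attributes can return fewer than three results, which exceeds the limit of two.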
Referring to Fig. 1, a data extraction method for a restricted Web database is applied to a data extraction device for the restricted Web database, the extraction device including a local database. The method includes:
S101: The extraction device obtains an attribute value from the Web database query interface.
S102: The extraction device generates a query request according to the attribute value and sends the query request to the restricted Web database.
S103: The extraction device parses the Web page returned for the query and extracts the query data contained in the Web page.
S104: The extraction device updates the data in the local database according to the query data.
S105: The extraction device analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values, so as to query the restricted Web database again.
S106: When the number of records in the query data equals the preset threshold for the number of records displayed per page of the Web page returned for the query, the extraction device terminates the data extraction.
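Steps S101 to S106 can be sketched as a loop against a simulated restricted database. Everything concrete here is an assumption made for illustration: the four-row table, the limit K = 2, the function names, and above all `next_values`, which is a naive stand-in that proposes every attribute value seen so far and is not the EdaliwdbFCA algorithm. The stop test of S106 is left as a caller-supplied predicate because the source states it only briefly.

```python
def extract(initial_value, run_query, next_values,
            should_stop=lambda rows: False, max_rounds=100):
    """Query, store, pick new query values, repeat (S101 to S106)."""
    local_db = set()
    pending = [initial_value]              # S101: first attribute value
    asked = set()
    for _ in range(max_rounds):
        if not pending:
            break
        value = pending.pop(0)
        if value in asked:
            continue
        asked.add(value)
        rows = run_query(value)            # S102 and S103: query and parse
        local_db.update(rows)              # S104: update the local database
        if should_stop(rows):              # S106: termination test
            break
        # S105: derive the next group of query values from the local data
        pending.extend(v for v in next_values(local_db) if v not in asked)
    return local_db


# Simulated restricted database: at most K rows are returned per query.
TABLE = [("a", "red", "s"), ("b", "red", "m"),
         ("c", "blue", "m"), ("d", "blue", "s")]
K = 2


def run_query(value):
    return [r for r in TABLE if value in r[1:]][:K]


def next_values(db):
    return sorted({v for r in db for v in r[1:]})


extracted = extract("red", run_query, next_values)
```

Starting from "red", the loop reaches all four rows even though no single query returns more than two of them, because each answer exposes new attribute values to query with.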
The extraction device faces a complex and changeable Internet, and any unexpected event may interrupt the extraction process. Referring to Fig. 2, an embodiment of the present invention provides another data extraction method for a restricted Web database, which is robust. The method includes:
S201: The extraction device obtains an attribute value from the Web database query interface.
S202: The extraction device generates a query request according to the attribute value and sends the query request to the restricted Web database. When a query request needs to be sent, the extraction device converts a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize, so that the query can be made.
This embodiment describes the conversion relation in an XML file, so interface updates can be accommodated by modifying the interface file without recompiling the source code. Part of the scale mapping XML file for the Sina mobile phone query interface is listed below.
File: SinaMobileProScale.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- sina mobile select web deep database, scale definition -->
<!--DOCTYPE scale-set SYSTEM "scale.dtd"-->
<!DOCTYPE scale-set [
<!ELEMENT scale-set (scale+)>
<!ELEMENT scale (attribute-list, object+)>
<!ATTLIST scale name CDATA #REQUIRED>
<!ATTLIST scale type CDATA "rating">
<!ATTLIST scale id CDATA #IMPLIED>
<!ELEMENT attribute-list (#PCDATA)>
<!ELEMENT object (#PCDATA)>
<!ATTLIST object name CDATA #REQUIRED>
<!ATTLIST object id CDATA #IMPLIED>
]>
<scale-set>
<scale name="mobile_jiage1" id="0">
<attribute-list></attribute-list>
<object name="0-499" id="0"></object>
<object name="500-999" id="1"></object>
<object name="1000-1499" id="2"></object>
<object name="1500-1999" id="3"></object>
<object name="2000-2999" id="4"></object>
<object name="3000-1000000" id="5"></object>
</scale>
<scale name="mobile_face" id="2">
<attribute-list></attribute-list>
<object name="bar" id="12"></object>
<object name="flip-up, flip-down" id="13"></object>
<object name="slider" id="14"></object>
<object name="rotating, swivel" id="15"></object>
<object name="other" id="16"></object>
</scale>
</scale-set>
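A scale file of this shape could be used to turn a single value, such as a price, into the interval option that the query interface understands. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the price scale; the function names and the idea that the `id` attribute is what the interface expects are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the price scale, without the DTD.
SCALE_XML = """<scale-set>
  <scale name="mobile_jiage1" id="0">
    <attribute-list></attribute-list>
    <object name="0-499" id="0"></object>
    <object name="500-999" id="1"></object>
    <object name="1000-1499" id="2"></object>
  </scale>
</scale-set>"""


def load_ranges(xml_text, scale_name):
    """Read (low, high, id) triples for one named scale."""
    root = ET.fromstring(xml_text)
    ranges = []
    for scale in root.findall("scale"):
        if scale.get("name") != scale_name:
            continue
        for obj in scale.findall("object"):
            low, high = obj.get("name").split("-")
            ranges.append((int(low), int(high), obj.get("id")))
    return ranges


def to_interface_id(price, ranges):
    """Map a single price to the id of the interval that contains it."""
    for low, high, obj_id in ranges:
        if low <= price <= high:
            return obj_id
    return None


ranges = load_ranges(SCALE_XML, "mobile_jiage1")
option = to_interface_id(650, ranges)
```

Because the mapping lives in the XML text, changing the intervals requires editing only the file, which matches the stated aim of avoiding recompilation.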
S203: Determine whether the Web page returned for the query is received within a preset time; if no Web page is returned for the query within the preset time, the extraction device sends the query request to the restricted Web database again.
After the Web page for the query is returned within the preset time, S204 is performed.
S204: The extraction device parses the Web page returned for the query and extracts the query data contained in the Web page.
S205: The extraction device compares the extracted query data with the data in the local database.
If they differ, S206 is performed: the extraction device adds to the local database the query data that differ from the data already in the local database.
S207: The extraction device analyzes the local database with the EdaliwdbFCA algorithm and produces the next group of query attribute values, so as to query the restricted Web database again.
S208: When the number of records in the query data equals the preset threshold for the number of records displayed per page of the Web page returned for the query, the extraction device terminates the data extraction.
The data extraction method for a restricted Web database disclosed in the embodiments of the present invention selects a single-attribute concept as the initial query concept. If the current candidate query concept is not a Full concept, the number of returned results is too large and exceeds the limit threshold $\Delta$, so the results cannot all be displayed in the same Web page and cannot all be extracted. Based on the local formal context extracted so far, the lower cover $Cov_l(c)$ of the concept $c$ is constructed, until a concept whose extent cardinality is less than or equal to the limit threshold is chosen as the actual query concept. Throughout the extraction of the Web database, the intent $Y$ of the query concept is sent as the query attribute set, and the local formal context is updated by extracting the results returned under the restriction. Throughout the query process, pruning rules reduce the number of query concepts and improve the extraction efficiency of the algorithm.
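The idea of descending from a single-attribute concept until the extent fits under the limit can be sketched as follows. This is a simplification under stated assumptions, not the EdaliwdbFCA algorithm: sub-concepts are generated by adding one attribute and closing, which does not always yield a direct sub-concept; extent sizes are measured on a small in-memory context rather than observed from query results; and the only pruning applied is skipping empty extents. All names are hypothetical.

```python
def extent(ys, ctx):
    """Objects that have every attribute in ys."""
    return frozenset(o for o, a in ctx.items() if ys <= a)


def closure(ys, ctx, attrs):
    """Intent of the concept generated by the attribute set ys."""
    out = set(attrs)
    for o in extent(ys, ctx):
        out &= ctx[o]
    return frozenset(out)


def queries_within_limit(start, ctx, attrs, delta):
    """Refine `start` until each returned intent matches at most delta objects."""
    accepted, seen = [], set()
    stack = [closure(frozenset(start), ctx, attrs)]
    while stack:
        ys = stack.pop()
        if ys in seen:
            continue
        seen.add(ys)
        size = len(extent(ys, ctx))
        if size == 0:
            continue                   # prune: nothing to fetch
        if size <= delta:
            accepted.append(ys)        # fits within one result page
            continue
        for m in sorted(attrs - ys):   # too many results: go one step down
            stack.append(closure(ys | {m}, ctx, attrs))
    return accepted


# Hypothetical context with limit threshold delta = 2.
ctx = {"p1": {"bar", "cheap"}, "p2": {"bar", "mid"},
       "p3": {"bar", "cheap"}, "p4": {"slider", "cheap"}}
attrs = {"bar", "slider", "cheap", "mid"}
picked = queries_within_limit({"bar"}, ctx, attrs, 2)
```

The single-attribute query {bar} matches three objects, one more than the limit, so it is replaced by the two narrower queries {bar, cheap} and {bar, mid}, which together cover the same three objects.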
Referring to Fig. 3, an embodiment of the present invention provides a data extraction device for a restricted Web database. The extraction device includes a local database 301 and further includes:
A query attribute value obtaining unit 302, for obtaining an attribute value from the Web database query interface.
A query unit 303, for generating a query request according to the attribute value and sending the query request to the restricted Web database 304.
The query unit 303 includes an attribute converting subunit 312, for converting a single-valued attribute into a multi-valued attribute that the Web database query interface can recognize.
A parsing unit 305, for parsing the Web page returned for the query and extracting the query data contained in the Web page.
A data updating unit 306, for updating the data in the local database 301 according to the query data.
A query attribute value generating unit 307, for analyzing the local database 301 with the EdaliwdbFCA algorithm and producing the next group of query attribute values, so as to query the restricted Web database 304 again.
A query ending unit 308, for terminating the data extraction when the number of records in the query data equals the preset threshold for the number of records displayed per page of the Web page returned for the query.
With the above device, the target data in the restricted Web database 304 can be extracted into the local database 301, which makes deep Web resources searchable. To make the data extraction device more robust and to extract the data in the restricted Web database 304 more effectively, referring to Fig. 4, an embodiment of the present invention provides another data extraction device for the restricted Web database 304. The extraction device includes a local database 301 and further includes:
A query attribute value obtaining unit 302, for obtaining an attribute value from the Web database query interface.
A query unit 303, for generating a query request according to the attribute value and sending the query request to the restricted Web database 304.
A parsing unit 305, for parsing the Web page returned for the query and extracting the query data contained in the Web page.
The parsing unit 305 includes a Web page reception judging subunit 309, for determining whether the Web page returned for the query is received within a preset time. If no Web page is returned for the query within the preset time, the query unit 303 sends the query request to the restricted Web database 304 again.
A data updating unit 306, for updating the data in the local database 301 according to the query data.
The data updating unit 306 includes a comparing subunit 310 and a data adding subunit 311.
The comparing subunit 310, for comparing the query data extracted by the parsing unit 305 with the data in the local database 301;
The data adding subunit 311, for adding to the local database 301 the extracted query data that differ from the data already in the local database 301.
A query attribute value generating unit 307, for analyzing the local database 301 with the EdaliwdbFCA algorithm and producing the next group of query attribute values, so as to query the restricted Web database 304 again.
A query ending unit 308, for terminating the data extraction when the number of records in the query data equals the preset threshold for the number of records displayed per page of the Web page returned for the query.
To give the extraction device provided by the embodiments of the present invention good extensibility, that is, so that it can handle extraction from different data sources (different Web databases or a simulated Web database), EdaliwdbFCA is encapsulated in the ExtractStrategy class. The extraction device abstracts the data source to be extracted as a formal context, so the DBContext class describes the formal context and is aggregated with the extractor abstract class DataExtractor. The Galois connection operations required by the EdaliwdbFCA algorithm are also encapsulated in the DBContext class. The query operations that the extraction device needs to send are carried out by concrete implementations of the abstract function SendQuery in the abstract class DataExtractor. According to the query interface of the specific data source, the SendQuery function converts the query concept into multi-valued attributes that conform to the interface specification and sends the query request. Extracted data must be put into the local database 301, so the abstract class DataExtractor contains a database module DBModule object. The classes SinaMobileExtractor and DBDataExtractor are concrete implementations of the abstract class DataExtractor that handle different extraction tasks. These tasks differ in data source, in the specific extraction process and in the Web query interface. XExtractor denotes any concrete implementation of the abstract class DataExtractor, which shows that the attribute selection algorithm is independent of the specific extraction process. To add a new extraction task, it is therefore enough to add a corresponding implementation of the abstract class DataExtractor. The DBDataExtractor class extracts data from a simulated Web database and therefore contains a DBModule object.
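The class layout described above can be outlined in Python. The class names follow the text, but every method body is an assumption: the strategy shown is a naive placeholder rather than EdaliwdbFCA, the simulated database is an in-memory dictionary, the DBModule object is omitted, and having the extractor hold the DBContext is one possible reading of the aggregation.

```python
from abc import ABC, abstractmethod


class DBContext:
    """Formal context over the extracted data, with the Galois operators."""

    def __init__(self):
        self.rows = {}  # object -> set of attributes

    def add(self, obj, attrs):
        self.rows.setdefault(obj, set()).update(attrs)

    def attributes(self):
        return set().union(*self.rows.values())

    def extent(self, ys):
        return {o for o, a in self.rows.items() if set(ys) <= a}

    def intent(self, objs):
        out = self.attributes()
        for o in objs:
            out &= self.rows[o]
        return out


class ExtractStrategy:
    """Placeholder for the attribute-selection algorithm."""

    def next_queries(self, context):
        # Naive stand-in: one single-attribute query per known attribute.
        return [frozenset({m}) for m in sorted(context.attributes())]


class DataExtractor(ABC):
    def __init__(self, strategy):
        self.context = DBContext()
        self.strategy = strategy

    @abstractmethod
    def send_query(self, concept):
        """Send one query; return (object, attributes) pairs."""

    def step(self, concept):
        for obj, attrs in self.send_query(concept):
            self.context.add(obj, attrs)
        return self.strategy.next_queries(self.context)


class DBDataExtractor(DataExtractor):
    """Extracts from a simulated database limited to k rows per query."""

    def __init__(self, strategy, table, k):
        super().__init__(strategy)
        self.table, self.k = table, k

    def send_query(self, concept):
        hits = [(o, a) for o, a in self.table.items() if concept <= a]
        return hits[:self.k]


table = {"p1": {"bar", "cheap"}, "p2": {"bar"}, "p3": {"slider"}}
ex = DBDataExtractor(ExtractStrategy(), table, k=2)
nxt = ex.step(frozenset({"bar"}))
```

A new data source would be supported by adding another subclass that implements `send_query`, leaving the strategy and the context untouched.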
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

  1. A data extraction method for a restricted Web database, characterized in that the method is applied to a data extraction device for the restricted Web database, the extraction device comprising a local database, the method comprising:
    the extraction device obtaining an attribute value from the query interface of the Web database;
    the extraction device generating a query request from the attribute value and sending the query request to the restricted Web database;
    the extraction device parsing the Web page returned for the query and extracting the query data contained in the Web page;
    the extraction device updating the data in the local database according to the query data;
    the extraction device analyzing the local database using EdaliwdbFCA, a restricted Web database extraction algorithm based on maximal subconcepts, to produce the next group of query attribute values with which to query the restricted Web database again;
    when the number of query data items equals the preset threshold for the number of data items displayed per page of the Web page returned after a query, the extraction device terminating the data extraction;
    wherein the step in which the extraction device analyzes the local database using the EdaliwdbFCA algorithm to produce the next group of query attribute values comprises:
    encapsulating the EdaliwdbFCA algorithm in an ExtractStrategy class;
    establishing a DBContext class for describing the data in the Web database, and encapsulating in the DBContext class the functions that implement the Galois connection operations of the EdaliwdbFCA algorithm;
    establishing an abstract extractor class DataExtrator, and aggregating the abstract extractor class DataExtrator with the DBContext class;
    implementing, in the abstract extractor class DataExtrator, a SendQuery function for sending query operations, and converting the query concept into multi-valued attributes within the SendQuery function;
    establishing, for different query data sources, the instance objects SinaMobileExtractor and DBDataExtractor of the abstract extractor class DataExtrator, wherein the DBDataExtractor includes a DBModule object for extracting data from the local database.
  2. The data extraction method for a restricted Web database according to claim 1, characterized in that, before the extraction device parses the Web page returned for the query, the method further comprises:
    determining whether the Web page returned for the query is received within a preset time;
    if no Web page is returned for the query within the preset time, the extraction device sending the query request to the restricted Web database again.
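The resend-on-timeout behaviour of claim 2 might look like the sketch below. The send callable, the use of TimeoutError to signal a missing response, and the max_attempts cap are all assumptions made for illustration; the claim itself specifies only a preset time and a resend.

```python
def send_with_retry(send, request, timeout_s, max_attempts=3):
    """Send a request and resend it whenever no page arrives in time.

    `send` is a hypothetical callable that returns the page text or raises
    TimeoutError. `max_attempts` is an assumption: the claim sets no limit.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(request, timeout_s)
        except TimeoutError as error:
            last_error = error         # no page in time: send again
    raise TimeoutError(f"no response after {max_attempts} attempts") from last_error


# Usage with a fake sender that times out twice before answering.
calls = {"count": 0}


def flaky_send(request, timeout_s):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return f"<html>results for {request}</html>"


page = send_with_retry(flaky_send, "brand=A", timeout_s=5.0)
```

Capping the number of attempts keeps the sketch from looping forever against an unreachable server; a real extractor would choose its own policy.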
  3. The data extraction method for a restricted Web database according to claim 1, characterized in that the extraction device updating the data in the local database according to the query data comprises:
    the extraction device comparing the extracted query data with the data in the local database;
    the extraction device adding to the local database the query data that differs from the data in the local database.
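The compare-and-add update of claim 3 can be illustrated with a short sketch. It assumes that two records count as the same when all of their fields are equal; the claim does not say how records are compared, and the function name is hypothetical.

```python
def update_local_database(local_db, extracted):
    """Append records from `extracted` that local_db lacks; return the count.

    Records are compared field by field (assumption: whole-record equality).
    """
    known = {tuple(sorted(r.items())) for r in local_db}
    added = 0
    for record in extracted:
        key = tuple(sorted(record.items()))
        if key not in known:
            known.add(key)
            local_db.append(record)
            added += 1
    return added


local = [{"brand": "A", "color": "red"}]
page = [{"color": "red", "brand": "A"}, {"brand": "B", "color": "blue"}]
added = update_local_database(local, page)
```

The first record on the page already exists locally (field order does not matter), so only the second one is added.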
  4. The data extraction method for a restricted Web database according to claim 1, characterized in that the extraction device generating a query request from the attribute value comprises:
    the extraction device converting single-valued attributes into multi-valued attributes that the query interface of the Web database can recognize.
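One plausible reading of the conversion in claim 4 is that the binary (single-valued) attributes of a formal context, such as "brand=A", are turned back into name/value pairs that a query form accepts. The sketch below follows that reading; the "name=value" encoding and the function name are assumptions, not details taken from the patent.

```python
from urllib.parse import urlencode


def to_multi_valued(single_valued_attrs, separator="="):
    """Turn binary attributes like 'brand=A' into {'brand': 'A'}.

    The 'name=value' encoding is a hypothetical convention for this sketch.
    """
    query = {}
    for attr in single_valued_attrs:
        name, value = attr.split(separator, 1)
        if name in query and query[name] != value:
            raise ValueError(f"conflicting values for attribute {name!r}")
        query[name] = value
    return query


params = to_multi_valued(["brand=A", "color=red"])
query_string = urlencode(params)       # form parameters for the interface
```

Rejecting two different values for the same attribute name reflects the fact that a form field of this kind carries one value per query.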
  5. A data extraction device for a restricted Web database, characterized in that the extraction device comprises a local database and further comprises:
    a query attribute value obtaining unit for obtaining an attribute value from the query interface of the Web database;
    a query unit for generating a query request from the attribute value and sending the query request to the restricted Web database;
    a parsing unit for parsing the Web page returned for the query and extracting the query data contained in the Web page;
    a data updating unit for updating the data in the local database according to the query data;
    a query attribute value generation unit for analyzing the local database using EdaliwdbFCA, a restricted Web database extraction algorithm based on maximal subconcepts, to produce the next group of query attribute values with which to query the restricted Web database again;
    a query termination unit for terminating the data extraction when the number of query data items equals the preset threshold for the number of data items displayed per page of the Web page returned after a query;
    wherein the manner in which the query attribute value generation unit analyzes the local database using the EdaliwdbFCA algorithm to produce the next group of query attribute values comprises:
    encapsulating the EdaliwdbFCA algorithm in an ExtractStrategy class;
    establishing a DBContext class for describing the data in the Web database, and encapsulating in the DBContext class the functions that implement the Galois connection operations of the EdaliwdbFCA algorithm;
    establishing an abstract extractor class DataExtrator, and aggregating the abstract extractor class DataExtrator with the DBContext class;
    implementing, in the abstract extractor class DataExtrator, a SendQuery function for sending query operations, and converting the query concept into multi-valued attributes within the SendQuery function;
    establishing, for different query data sources, the instance objects SinaMobileExtractor and DBDataExtractor of the abstract extractor class DataExtrator, wherein the DBDataExtractor includes a DBModule object for extracting data from the local database.
  6. The data extraction device for a restricted Web database according to claim 5, characterized in that the parsing unit comprises:
    a Web page reception judgment subunit for determining whether the Web page returned for the query is received within a preset time;
    if no Web page is returned for the query within the preset time, the query unit sending the query request to the restricted Web database again.
  7. The data extraction device for a restricted Web database according to claim 5, characterized in that the data updating unit comprises:
    a comparison subunit for comparing the query data extracted by the parsing unit with the data in the local database;
    a data adding subunit for adding to the local database the extracted query data that differs from the data in the local database.
  8. The data extraction device for a restricted Web database according to claim 5, characterized in that the query unit comprises:
    an attribute conversion subunit for converting single-valued attributes into multi-valued attributes that the query interface of the Web database can recognize.
CN201510154092.3A 2015-04-02 2015-04-02 Data extraction method and device for a restricted Web database Expired - Fee Related CN104699848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510154092.3A CN104699848B (en) 2015-04-02 2015-04-02 Data extraction method and device for a restricted Web database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510154092.3A CN104699848B (en) 2015-04-02 2015-04-02 Data extraction method and device for a restricted Web database

Publications (2)

Publication Number Publication Date
CN104699848A CN104699848A (en) 2015-06-10
CN104699848B CN104699848B (en) 2018-04-27

Family

ID=53346968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510154092.3A Expired - Fee Related CN104699848B (en) 2015-04-02 2015-04-02 Data extraction method and device for a restricted Web database

Country Status (1)

Country Link
CN (1) CN104699848B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7181471B1 (en) * 1999-11-01 2007-02-20 Fujitsu Limited Fact data unifying method and apparatus
CN101697221A (en) * 2009-09-18 2010-04-21 何国健 Method for obtaining reading access to limited content of web site by purchasing web site products
CN103560943A (en) * 2013-10-31 2014-02-05 北京邮电大学 Network analytic system and method supporting real-time mass data processing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7181471B1 (en) * 1999-11-01 2007-02-20 Fujitsu Limited Fact data unifying method and apparatus
CN101697221A (en) * 2009-09-18 2010-04-21 何国健 Method for obtaining reading access to limited content of web site by purchasing web site products
CN103560943A (en) * 2013-10-31 2014-02-05 北京邮电大学 Network analytic system and method supporting real-time mass data processing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
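Research on Web Database Extraction Based on Formal Concept Analysis (基于形式概念分析的Web数据库抽取研究); Zhang Zhuo; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; 20120715; pp. 18, 27, 57, 63 *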

Also Published As

Publication number Publication date
CN104699848A (en) 2015-06-10

Similar Documents

Publication Publication Date Title
US20220035600A1 (en) API Specification Generation
CN102053983B (en) Method, system and device for querying vertical search
CN108304444B (en) Information query method and device
US8463739B2 (en) Systems and methods for generating multi-population statistical measures using middleware
US9495429B2 (en) Automatic synthesis and presentation of OLAP cubes from semantically enriched data sources
CN111460311A (en) Search processing method, device and equipment based on dictionary tree and storage medium
CN103136228A (en) Image search method and image search device
CN108804516B (en) Similar user searching device, method and computer readable storage medium
US9535966B1 (en) Techniques for aggregating data from multiple sources
CN101344881A (en) Index generation method and device and search system for mass file type data
CN105550206B (en) The edition control method and device of structured query sentence
CN108228743A (en) A kind of real-time big data search engine system
CN111680489B (en) Target text matching method and device, storage medium and electronic equipment
CN107423037B (en) Application program interface positioning method and device
CN104484392A (en) Method and device for generating database query statement
CN109408502A (en) A kind of data standard processing method, device and its storage medium
CN111209325A (en) Service system interface identification method, device and storage medium
CN104243565A (en) Method and device for obtaining configuration data
CN112905600B (en) Data query method and device, storage medium and electronic equipment
CN110955855A (en) Information interception method, device and terminal
CN113568923A (en) Method and device for querying data in database, storage medium and electronic equipment
CN110069489A (en) A kind of information processing method, device, equipment and computer readable storage medium
CN104699848B (en) Data extraction method and device for a restricted Web database
US20120284224A1 (en) Build of website knowledge tables
CN110704481A (en) Method and device for displaying data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Du Juan

Inventor after: Zhang Zhuo

Inventor after: Cao Jianchun

Inventor before: Du Juan

Inventor before: Zhang Zhuo

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180427

Termination date: 20210402
