CN104462547B - Method and system for configurable web page data collection - Google Patents

Method and system for configurable web page data collection

Info

Publication number
CN104462547B
Authority
CN
China
Prior art keywords
acquisition
configuration
information
website
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410822548.4A
Other languages
Chinese (zh)
Other versions
CN104462547A (en)
Inventor
吴正辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN LAN-YOU TECHNOLOG Co Ltd
Original Assignee
SHENZHEN LAN-YOU TECHNOLOG Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN LAN-YOU TECHNOLOG Co Ltd
Priority to CN201410822548.4A
Publication of CN104462547A
Application granted
Publication of CN104462547B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/957: Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577: Optimising the visualization of content, e.g. distillation of HTML documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a method and system for configurable web page data collection, particularly suited to cases where the way web page data is collected must be updated continually. The method comprises: S1, obtaining the configuration information for web page data collection from a database; S2, obtaining the required categorized website according to the configuration information and logging in; S3, obtaining the topics to be collected on the website according to the site information after login; S4, collecting the required web page content for the obtained topics according to the configuration information; S5, extracting the required information from the content pages according to the configured data table, using the regular expressions or other rules configured in that table; S6, storing the extracted table data in the database. With the method and system of the invention, users can freely configure the web page data they need to collect and gather the relevant data from across the web through the configured collection scheme, achieving flexible and convenient web page data collection.

Description

Method and system for configurable web page data collection
Technical field
The present invention relates to the field of network communication technology, and more specifically to a method and system for configurable web page data collection suited to cases where the way web page data is collected must be updated continually.
Background
With the rapid development of Web technology and Web applications and the arrival of the big data era, the monitoring of Web application sites (social platforms in particular), corporate public opinion monitoring, user data collection, and big data mining are applied ever more widely. All industries increasingly depend on the Internet and rely heavily on Internet information. Internet data, however, is massive, so how can the data we need be extracted?
Collection systems currently on the market target only one website or a few websites; there is no web page data collection method that can be configured to collect specified data.
Web page layouts may be built with tables, with DIVs, or with a mix of the two, so errors or exceptions can occur during collection. When a collected website is redesigned, the program has to be developed again, which increases development cost.
A system therefore has to be developed to collect this data. Each website has its own design and presentation, so a single parsing approach cannot collect all websites. To avoid writing a parser for every website and modifying the program whenever a website is redesigned, a general, configurable web page data collection system is needed.
Summary of the invention
The technical problem to be solved by the present invention is that existing web page data collection systems can collect only one or a few websites and are therefore single-purpose and of limited practical use. To address this defect, the invention provides a configurable, widely applicable method and system for web page data collection.
The technical solution adopted to solve the above technical problem is a method for configurable web page data collection, the method comprising:
S1, obtaining the configuration information for web page data collection from a database, the configuration information including: category information of the websites to be collected, a topic collection template, a content page collection template, and data table information;
S2, judging, according to the configured website category information, whether the category of the website to be collected is enabled; if it is enabled, obtaining the categorized website, otherwise ending the program;
S3, judging, according to the configured website category information, whether to log in to the categorized website being collected; if so, logging in to the categorized website, otherwise logging in to it through a virtual login page;
S4, obtaining the topics to be collected on the website according to the configured topic collection template;
S5, judging, for each obtained topic, whether its content spans multiple pages; if so, obtaining the URL list according to the pagination marker, otherwise obtaining the content pages of the topic directly;
S6, intercepting the content to be collected according to the start marker and end marker of the content pages, and obtaining the set of content page URLs according to an expression;
S7, obtaining the content pages to be collected according to the configured content page collection template;
S8, judging, for each collected content page, whether it spans multiple pages; if so, obtaining the URL list of the pages according to the pagination marker and then intercepting the content according to the start marker and end marker of the content pages, otherwise intercepting the content of the content page directly according to the start marker and end marker;
S9, obtaining the expression or rule corresponding to each field according to the configured data table information, and extracting the table data;
S10, storing the extracted table data in the database.
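The two gating decisions in steps S2 and S3 can be sketched as a small pure function. This is only an illustrative sketch: the names `SiteCategoryConfig`, `LoginMode`, and `plan_site_access`, as well as the `enabled` and `direct_login` flags, are hypothetical and do not appear in the patent, which does not specify how the category information is represented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LoginMode(Enum):
    DIRECT = "direct"    # log in to the categorized website itself (S3, "yes" branch)
    VIRTUAL = "virtual"  # log in through a virtual login page (S3, "no" branch)


@dataclass
class SiteCategoryConfig:
    """Hypothetical shape of the configured website category information."""
    name: str
    enabled: bool
    direct_login: bool


def plan_site_access(cfg: SiteCategoryConfig) -> Optional[LoginMode]:
    """Return the login mode to use, or None when collection should stop (S2)."""
    if not cfg.enabled:
        return None  # S2: category not enabled, end the program
    return LoginMode.DIRECT if cfg.direct_login else LoginMode.VIRTUAL
```

Keeping the decision separate from any network code makes the branch logic easy to check on its own before wiring in real login handling.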
In the method for configurable collecting webpage data of the present invention, the acquisition attributes information includes: acquisition Network address, acquisition website coding and frequency acquisition.
The acquisition network address, for acquiring the web page address for meeting configuration;
The acquisition website coding, for acquiring the source code of website;
The frequency acquisition is set as every 5 minutes once.
In the method for configurable collecting webpage data of the present invention, the data table information includes: acquisition mark Topic, acquisition time, acquisition content and the source for acquiring content.
Title is acquired, for acquiring the title of content pages;
Content is acquired, for acquiring the content of content pages;
Acquire the source of content, the information of the content sources for acquiring content pages.
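One way to picture the data table information is as a mapping from each field to the expression that extracts it from a content page. The sketch below is an assumption-laden illustration: the field names, the `FIELD_PATTERNS` dictionary, the `extract_record` helper, and the HTML patterns are all hypothetical and would in practice come from the user's configuration for a particular site.

```python
import re

# Hypothetical per-field expressions; real ones depend on the target site's markup.
FIELD_PATTERNS = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "time": r"(\d{4}-\d{1,2}-\d{1,2})",
    "content": r'<div class="body">(.*?)</div>',
    "source": r"Source:\s*([^<]+)",
}


def extract_record(html, patterns):
    """Apply each field's expression to the page; missing fields become None."""
    record = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, html, flags=re.DOTALL)
        record[field] = match.group(1).strip() if match else None
    return record
```

Because the expressions live in data rather than in code, a site redesign only requires editing the stored patterns, which is the kind of flexibility the configuration is meant to provide.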
In the method for configurable collecting webpage data of the present invention, the configuration of the configuration information of the step S1 Step includes:
A, the classification and acquisition attributes information of configuration acquisition website;
B, configuration acquisition theme Template Information;
C, configuration acquisition content pages Template Information;
D, storage configuration information transfers use into database after convenient.
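Configuration steps A to D can be sketched as building one configuration object and persisting it. The example below is a minimal sketch that assumes SQLite and a JSON-encoded body; the table name `collect_config`, the helper names, and every key in the sample configuration are hypothetical, since the patent does not describe the storage schema.

```python
import json
import sqlite3


def save_config(conn, name, config):
    """Step D: store the configuration so that it can be retrieved later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collect_config "
        "(name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO collect_config (name, body) VALUES (?, ?)",
        (name, json.dumps(config)),
    )
    conn.commit()


def load_config(conn, name):
    """Counterpart of step S1: read a stored configuration back, or None."""
    row = conn.execute(
        "SELECT body FROM collect_config WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Steps A to C: hypothetical configuration for one categorized website.
example_config = {
    "site": {"category": "news", "url": "http://example.com/",
             "encoding": "utf-8", "frequency_minutes": 5},
    "topic_template": {"pattern": r'href="(/topic/\d+)"'},
    "content_template": {"start_marker": "<!--start-->", "end_marker": "<!--end-->"},
    "data_table": {"title": r"<h1>(.*?)</h1>"},
}
```

A typical use would be `save_config(conn, "news-site", example_config)` once at configuration time, followed by `load_config(conn, "news-site")` at the start of each collection run.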
A system for configurable web page data collection is constructed, comprising: a start module, a configuration retrieval module, a judgment module, a configuration information acquisition module, a database, a content interception module, and a storage module;
The database is used to store the configuration information and the table data;
The configuration information acquisition module is used to configure the web page data that the user needs to collect;
The configuration information acquisition module includes a website acquisition module, a website topic acquisition module, a content page acquisition module, and a table data acquisition module, wherein
The website acquisition module is used to obtain the categorized website required by the user;
The website topic acquisition module is used to obtain the topics required by the user within the categorized website;
The content page acquisition module is used to obtain the content pages required by the user within a topic;
The table data acquisition module is used to obtain the table data within a content page.
The judgment module includes: a first judgment module, a second judgment module, a third judgment module, and a fourth judgment module;
The content interception module includes: a first content interception module and a second content interception module;
The configuration information acquisition module includes: a website acquisition module, a website topic acquisition module, a content page acquisition module, and a table data acquisition module.
The start module is used to start the configurable web page data collection system;
The configuration retrieval module is used to retrieve from the database the configuration information for the required collection;
The first judgment module is used to judge whether the website category and collection attributes have been configured and whether the category of the website to be collected is enabled; if it is enabled, the categorized website is obtained, otherwise the program ends;
The second judgment module is used to judge whether to log in to the categorized website being collected; if so, the website is logged in to, otherwise it is logged in to through a virtual login page;
The website topic acquisition module is used to obtain the required topics of the logged-in categorized website according to the configured website topic template;
The third judgment module is used to judge whether the topic content spans multiple pages; if so, the URL list of the pages is obtained according to the pagination marker and the content pages are obtained through that list, otherwise the content pages of the topic are obtained directly;
The first content interception module is used to intercept content according to the start marker and end marker of the content pages;
The content page acquisition module is used to obtain the required content pages from the website topic module according to the configured content page collection information;
The fourth judgment module is used to judge whether a content page spans multiple pages; if so, the URL list of the pages is obtained according to the pagination marker and the content of the content pages is then intercepted according to the start marker and end marker, otherwise the content is intercepted directly according to the start marker and end marker;
The second content interception module is used to intercept content according to the start marker and end marker of the web content page;
The table data extraction module is used to extract the table data, according to the configured data table information, using the expression or rule corresponding to each field;
The storage module is used to store the extracted data in the database.
In the system for configurable web page data collection of the present invention, the website acquisition module first judges, before executing, whether the website is enabled and logged in; if so, the modules for obtaining website topics and content pages are run, otherwise the process ends.
In the system for configurable web page data collection of the present invention, if the fourth judgment module encounters a multi-page situation, the paged content is collected by looping over the pages and merging the data.
Implementing the method and system for configurable web page data collection of the present invention has the following beneficial effect: users can freely configure the web page data and conditions they need to collect and gather the relevant data from across the web through the configured collection scheme, so that data content can be collected from any web page flexibly and conveniently.
Brief description of the drawings
Fig. 1 is a flowchart of the first preferred embodiment of the method for configurable web page data collection of the present invention;
Fig. 2 is a flowchart of the second preferred embodiment of the method for configurable web page data collection of the present invention;
Fig. 3 is a flowchart of the configuration steps in the first or second preferred embodiment of the method for configurable web page data collection of the present invention;
Fig. 4 is a block diagram of the system for configurable web page data collection of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in Fig. 1, in the flowchart of the first preferred embodiment of the method for configurable web page data collection of the present invention, the method starts at step S100 and proceeds to step S110, in which the configuration information for web page data collection is obtained from the database; the configuration information includes category information of the websites to be collected, a topic collection template, a content page collection template, and data table information. In step S120, whether the category of the website to be collected is enabled is judged according to the configured website category information; if it is enabled, the categorized website is obtained, otherwise the program ends. In step S130, whether to log in to the categorized website being collected is judged according to the configured website category information; if so, the categorized website is logged in to, otherwise it is logged in to through a virtual login page. In step S140, the topics to be collected on the website are obtained according to the configured topic collection template. In step S150, whether the content of an obtained topic spans multiple pages is judged; if so, the URL list of the pages is obtained according to the pagination marker and the content pages are obtained through that list, otherwise the content pages of the topic are obtained directly. In step S160, the content to be collected is intercepted according to the start marker and end marker of the content pages, and the set of URLs of the content pages is obtained according to an expression. In step S170, the content pages to be collected are obtained according to the configured content page collection template. In step S180, whether a collected content page spans multiple pages is judged; if so, the URL list of the pages is obtained according to the pagination marker and the content of the content pages is then intercepted according to the start marker and end marker, otherwise the content is intercepted directly according to the start marker and end marker. In step S190, the expression or rule corresponding to each field is obtained according to the configured data table information and the table data is extracted. In step S200, the extracted table data is stored in the database. Finally, the method ends at step S210.
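The marker-based interception and URL extraction of step S160 can be sketched as follows. This is a sketch under stated assumptions, not the patented implementation: the function names, the default `href` pattern, and the HTML-comment markers used in the usage example are hypothetical.

```python
import re
from urllib.parse import urljoin


def intercept(html, start_marker, end_marker):
    """Return the text between the first start marker and the next end marker."""
    start = html.find(start_marker)
    if start == -1:
        return ""  # start marker missing: nothing to collect
    start += len(start_marker)
    end = html.find(end_marker, start)
    return html[start:end] if end != -1 else html[start:]


def content_page_urls(topic_html, base_url, start_marker, end_marker,
                      link_pattern=r'href="([^"]+)"'):
    """Collect de-duplicated absolute content page URLs from the marked region."""
    region = intercept(topic_html, start_marker, end_marker)
    seen, urls = set(), []
    for href in re.findall(link_pattern, region):
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Restricting link extraction to the region between the two markers keeps navigation and footer links out of the URL set, which is presumably why the markers are applied before the expression.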
Further, the collection attribute information includes: the collection URL, the website encoding, and the collection frequency.
Further, the data table information includes: the collected title, the collection time, the collected content, and the source of the collected content.
Further, the expression is a regular expression. For example, to find the collection time with a regular expression, the expression for extracting the date is: \d{4}(-|/|)\d{1,2}\1\d{1,2}.
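The date expression can be tried out directly in Python. The pattern below assumes the intended expression is `\d{4}(-|/|)\d{1,2}\1\d{1,2}`, with the backslashes restored; the helper name `find_date` and the sample strings are hypothetical.

```python
import re

# Four-digit year, a separator ("-", "/", or none), month, the same separator, day.
DATE_PATTERN = re.compile(r"\d{4}(-|/|)\d{1,2}\1\d{1,2}")


def find_date(text):
    """Return the first date-like substring in the text, or None if there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


print(find_date("Posted on 2014-12-25 by admin"))  # 2014-12-25
print(find_date("Updated 2015/3/2"))               # 2015/3/2
```

The back-reference `\1` requires the second separator to be the same as the first, so a string that mixes `-` and `/` within one date is not accepted.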
The method for configurable web page data collection of the present invention gives users a way to customize the web page data they need to collect, which increases its practicality and effectiveness.
As shown in Fig. 2, in the flowchart of the second preferred embodiment of the method for configurable web page data collection of the present invention, the method starts at step S300 and proceeds to step S310, in which the configuration information for web page data collection is obtained from the database. In step S320, the categorized website is obtained according to the configured website category information. In step S330, the topics to be collected on the website are obtained according to the configured topic information. In step S340, the required web page content is collected for the obtained topics. In step S350, the information of the content pages is extracted according to the configured data table information, using the regular expressions or other rules configured there. In step S360, the extracted table data is stored in the database. Finally, the method ends at step S370.
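The simplified flow of the second embodiment (site, topics, content pages, field extraction, storage) can be sketched as one loop. This is a minimal sketch assuming that fetching and storing are supplied by the caller: `run_collection`, the `fetch` and `store` callables, and the configuration keys are hypothetical names, and login and pagination handling are deliberately left out.

```python
import re


def run_collection(config, fetch, store):
    """Walk site -> topics -> content pages, extract fields, store each record.

    fetch(url) returns the page HTML and store(record) persists one record;
    both are injected so the flow can be exercised without network access.
    """
    site_html = fetch(config["site_url"])                                # obtain the website
    stored = 0
    for topic_url in re.findall(config["topic_pattern"], site_html):     # obtain the topics
        topic_html = fetch(topic_url)
        for page_url in re.findall(config["content_pattern"], topic_html):  # content pages
            page_html = fetch(page_url)
            record = {"url": page_url}
            for field, pattern in config["data_table"].items():          # extract fields
                match = re.search(pattern, page_html, flags=re.DOTALL)
                record[field] = match.group(1).strip() if match else None
            store(record)                                                 # store the record
            stored += 1
    return stored
```

Injecting `fetch` and `store` keeps the control flow independent of any particular HTTP client or database, so the same loop can be driven by a dictionary of canned pages in a test.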
The method for configurable web page data collection of the present invention gives users a way to customize the web page data they need to collect; it is simpler and easier for users to operate, and it increases practicality and effectiveness.
As shown in Fig. 3, in the flowchart of the configuration steps of the first or second preferred embodiment of the method for configurable web page data collection of the present invention, the configuration procedure starts at step S400 and proceeds to step S410, in which the category and collection attributes of the website to be collected are configured. In step S420, the topic collection template is configured. In step S430, the content page collection template is configured. In step S440, the configuration information is stored in the database so that it can be retrieved for later use. Finally, the procedure ends at step S450.
The configuration procedure of the present invention clearly and thoroughly defines the conditions for finding and collecting the required website data, which supports collection and facilitates the execution of the method.
As shown in Fig. 4, in the block diagram of the system for configurable web page data collection of the present invention, the system comprises: a start module 510, a configuration retrieval module 520, a judgment module 530, a configuration information acquisition module 540, a content interception module 550, a storage module 560, and a database 570;
The judgment module 530 includes: a first judgment module 531, a second judgment module 532, a third judgment module 533, and a fourth judgment module 534;
The content interception module 550 includes: a first content interception module 551 and a second content interception module 552;
The database 570 is used to store the configuration information and the table data;
The configuration information acquisition module 540 includes: a website acquisition module 541, a website topic acquisition module 542, a content page acquisition module 543, and a table data acquisition module 544.
The start module 510 is used to start the configurable web page data collection system;
The configuration retrieval module 520 is used to retrieve from the database the configuration information for the required collection;
The first judgment module 531 is used to judge whether the website category and collection attributes have been configured and whether the category of the website to be collected is enabled; if it is enabled, collection proceeds, otherwise the program ends;
The website acquisition module 541 is used to obtain the required website from the various websites according to the configured website category and attribute information;
The second judgment module 532 is used to judge whether to log in to the categorized website being collected; if so, the website is logged in to, otherwise it is logged in to through a virtual login page;
The website topic acquisition module 542 is used to obtain the required topic information of the logged-in website according to the configured website topic template;
The third judgment module 533 is used to judge whether the topic content spans multiple pages; if so, the URL list of the pages is obtained according to the pagination marker, otherwise the web page content of the topic is obtained directly;
The first content interception module 551 is used to intercept content according to the start marker and end marker of the web page content;
The content page acquisition module 543 is used to obtain the required content page information from the website topic module according to the configured content page collection information;
The fourth judgment module 534 is used to judge whether a content page spans multiple pages; if so, the URL list of the pages is obtained according to the pagination marker and the content is then intercepted according to the start marker and end marker, otherwise the content is intercepted directly according to the start marker and end marker;
The second content interception module 552 is used to intercept content according to the start marker and end marker of the web content page;
The table data acquisition module 544 is used to extract the table data, according to the configured data table information, using the expression or rule corresponding to each field;
The storage module 560 is used to store the extracted data in the database.
Further, the website acquisition module first judges, before executing, whether the website is enabled and logged in; if so, the modules for obtaining website topics and content pages are run, otherwise the process ends.
Further, if the fourth judgment module encounters a multi-page situation, the paged content is collected by looping over the pages and merging the data.
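The loop-and-merge handling of multi-page content can be sketched as follows. This is an illustrative sketch only: it assumes the pagination marker can be expressed as a regular expression whose first group is the next page's URL, and the names `between`, `collect_paged_content`, and `max_pages` are hypothetical.

```python
import re


def between(html, start_marker, end_marker):
    """Return the text between the first start marker and the next end marker."""
    start = html.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)
    end = html.find(end_marker, start)
    return html[start:end] if end != -1 else html[start:]


def collect_paged_content(first_url, fetch, pagination_pattern,
                          start_marker, end_marker, max_pages=50):
    """Follow next-page links, intercept each page's content, and merge the parts."""
    parts = []
    visited = set()
    url = first_url
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        html = fetch(url)
        parts.append(between(html, start_marker, end_marker).strip())
        match = re.search(pagination_pattern, html)
        url = match.group(1) if match else None  # no pagination marker: last page
    return "\n".join(parts)
```

The `visited` set and the `max_pages` cap are safeguards added for this sketch so that a pagination link pointing back to an earlier page cannot cause an endless loop.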
Compared with the prior art, the advantage of the method and system for configurable web page data collection of the present invention is that users can freely configure the web page data they need to collect and gather the relevant data from across the web through the configured collection scheme, achieving flexible and convenient web page data collection.
The above is only an embodiment of the present invention and does not limit the scope of the invention. Any equivalent structural transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (6)

1. A method for configurable web page data collection, characterized in that the method comprises:
S1, obtaining the configuration information for web page data collection from a database, the configuration information including: category information of the websites to be collected, a topic collection template, a content page collection template, and data table information;
S2, obtaining the categorized website to be collected according to the configured website category information, and judging whether to log in to the categorized website being collected; if so, logging in to the categorized website, otherwise logging in to it through a virtual login page;
S3, obtaining the topics to be collected in the obtained categorized website according to the configured topic collection template, and judging whether a topic spans multiple pages; if so, obtaining the list of page URLs according to the pagination marker and obtaining the content pages through that list, otherwise obtaining the content pages directly;
S4, obtaining the content pages to be collected from the obtained topics according to the configured content page collection template, and judging whether a content page spans multiple pages; if so, obtaining the URL list of the pages according to the pagination marker and intercepting the content of the content pages according to their start marker and end marker, otherwise intercepting the content of the content page directly according to its start marker and end marker;
S5, obtaining the expression or rule corresponding to each field according to the configured data table information, and extracting the table data from the obtained content pages;
S6, storing the extracted table data in the database.
2. The method for configurable web page data collection according to claim 1, characterized in that the data table information includes: the collected title, the collection time, the collected content, and the source of the collected content.
3. The method for configurable web page data collection according to claim 1, characterized in that the configuration information of step S1 is configured by the following steps:
A, configuring the category and collection attributes of the website to be collected;
B, configuring the topic collection template;
C, configuring the content page collection template;
D, storing the configuration information in the database for later retrieval and use.
4. The method for configurable web page data collection according to claim 3, characterized in that the collection attributes include: the collection URL, the website encoding, and the collection frequency.
5. A system for configurable web page data collection based on the method of claim 1, characterized in that it includes a database and a configuration information acquisition module, wherein:
The configuration information acquisition module is used to obtain the configuration information for web page data collection from the database;
The database is used to store the configuration information and the table data;
The configuration information acquisition module includes a website acquisition module, a website topic acquisition module, a content page acquisition module, and a table data acquisition module, wherein
The website acquisition module is used to obtain the categorized website to be collected according to the configured website category information;
The website topic acquisition module is used to obtain the topics to be collected in the obtained categorized website according to the configured topic collection template;
The content page acquisition module is used to obtain the content pages to be collected from the obtained topics according to the configured content page collection template;
The table data acquisition module is used to obtain the expression or rule corresponding to each field and to extract the table data from the obtained content pages.
6. The system for configurable web page data collection according to claim 5, characterized in that the website acquisition module is further used to judge, after execution, whether the categorized website is enabled; if so, the website topic acquisition module and the content page acquisition module are run, otherwise the process ends.
CN201410822548.4A 2014-12-25 2014-12-25 Method and system for configurable web page data collection Active CN104462547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410822548.4A CN104462547B (en) 2014-12-25 2014-12-25 A kind of method and system of configurable collecting webpage data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410822548.4A CN104462547B (en) 2014-12-25 2014-12-25 A kind of method and system of configurable collecting webpage data

Publications (2)

Publication Number Publication Date
CN104462547A CN104462547A (en) 2015-03-25
CN104462547B true CN104462547B (en) 2019-04-02

Family

ID=52908582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410822548.4A Active CN104462547B (en) 2014-12-25 2014-12-25 Method and system for configurable web page data collection

Country Status (1)

Country Link
CN (1) CN104462547B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547749A (en) * 2015-09-16 2017-03-29 北京国双科技有限公司 The method and apparatus of collecting webpage data

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915334A (en) * 2015-05-29 2015-09-16 浪潮软件集团有限公司 Automatic extraction method of key information of bidding project based on semantic analysis
CN106022126B (en) * 2016-05-06 2018-07-24 哈尔滨工程大学 A kind of web page characteristics extracting method towards WEB trojan horse detections
CN106341470A (en) * 2016-08-31 2017-01-18 北京量科邦信息技术有限公司 Method for keeping conversation and grasping continuously-updated data of conversation
CN108520043A (en) * 2018-03-30 2018-09-11 纳思达股份有限公司 Data object acquisition method, apparatus and system, computer readable storage medium
CN108549678B (en) * 2018-04-02 2020-06-19 北京今朝在线科技有限公司 Information acquisition system
CN108763279B (en) * 2018-04-11 2020-12-15 北京中科闻歌科技股份有限公司 Webpage data distributed template acquisition method and system
CN109902220B (en) * 2019-02-27 2023-11-24 腾讯科技(深圳)有限公司 Webpage information acquisition method, device and computer readable storage medium
CN110334259A (en) * 2019-04-22 2019-10-15 新分享科技服务(深圳)有限公司 Webpage data acquiring method, device and computer readable storage medium
CN110188259A (en) * 2019-05-27 2019-08-30 厦门商集网络科技有限责任公司 A kind of data grab method and device of configurableization
CN111953766A (en) * 2020-08-07 2020-11-17 福建省天奕网络科技有限公司 Method and system for collecting network data
CN112667872B (en) * 2020-11-17 2023-04-07 国家计算机网络与信息安全管理中心 Real-time acquisition method of new coronary pneumonia epidemic situation data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101034997A (en) * 2006-03-09 2007-09-12 新数通兴业科技(北京)有限公司 Method and system for accurately publishing the data information
CN101561802A (en) * 2008-04-18 2009-10-21 上海复旦光华信息科技股份有限公司 Web page structural data extraction method and system
CN103593344A (en) * 2012-08-13 2014-02-19 北大方正集团有限公司 Information acquisition method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106547749A (en) * 2015-09-16 2017-03-29 北京国双科技有限公司 The method and apparatus of collecting webpage data
CN106547749B (en) * 2015-09-16 2021-02-12 北京国双科技有限公司 Webpage data acquisition method and device

Also Published As

Publication number Publication date
CN104462547A (en) 2015-03-25

Similar Documents

Publication Publication Date Title
CN104462547B (en) A kind of method and system of configurable collecting webpage data
US11372935B2 (en) Automatically generating a website specific to an industry
JP6377807B2 (en) Rewriting search queries in online social networks
US8856100B2 (en) Displaying browse sequence with search results
CN104765729B (en) A kind of cross-platform microblogging community account matching process
US9355137B2 (en) Displaying articles matching a user's interest based on key words and the number of comments
CN103294781A (en) Method and equipment used for processing page data
CN102314440B (en) Utilize the method and system in network operation language model storehouse
WO2019080910A1 (en) Information processing system and method thereof for implementing information processing
CN108170678A (en) A kind of text entities abstracting method and system
CN106302849A (en) A kind of method carrying out moving solid fusion by carrier data
CN101894109A (en) Database building method and device
CN104915438B (en) A method of obtaining PCU associated data in specific topics microblogging
EP4232980A1 (en) Content based related view recommendations
CN103999079A (en) Aligning annotation of fields of documents
CN103997492B (en) A kind of adaption system and method
CN106339381A (en) Method and device for processing information
CN103377207B (en) Microblog users relation acquisition method based on script engine
JP6680472B2 (en) Information processing apparatus, information processing method, and information processing program
Vicient et al. Unsupervised semantic clustering of Twitter hashtags
JP7003481B2 (en) Reinforcing rankings for social media accounts and content
Shen et al. A Catalogue Service for Internet GIS Services Supporting Active Service Evaluation and Real‐Time Quality Monitoring
CN106599076B (en) Forum guide map generation method and device
JP2009230483A (en) Information retrieving method, program and device
CN104331472A (en) Construction method and device of word segmentation training data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant