CN107273450A - A novel intelligent resource recommendation system - Google Patents

A novel intelligent resource recommendation system

Info

Publication number
CN107273450A
Authority
CN
China
Prior art keywords
module
crawler
resource
web page
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710398772.9A
Other languages
Chinese (zh)
Inventor
肖雪松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Bright Technology Co Ltd
Original Assignee
Chengdu Bright Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Bright Technology Co Ltd
Priority to CN201710398772.9A
Publication of CN107273450A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a novel intelligent resource recommendation system comprising a network resource collection module, which includes a crawler distribution device and a crawler execution unit. The network resource collection module is connected to a crawler dependency module and a web page decomposition module; the web page decomposition module is connected to a working database; the working database is connected to a temporary incremental database; the temporary incremental database is connected to an update incremental database; the update incremental database is connected to a screening module; the screening module is connected to a local file subsystem and an interaction module; and the local file subsystem is connected to the working database.

Description

A novel intelligent resource recommendation system
Technical field
The present invention relates to a system, and in particular to a novel intelligent resource recommendation system.
Background
With the help of Internet technology, big-data applications keep deepening in all fields, especially in people's daily lives, where they bring many conveniences. For example, recommendation-style news clients such as Toutiao ("Today's Headlines"), NetEase News and Phoenix News have effectively accumulated high-quality news and recommend it based on users' clicking and reading behavior. The news covers entertainment, sports, military affairs, science and technology, finance and economics, and so on; evidently these apps concentrate mainly on people's personal lives.
However, under all kinds of heavy work pressure, people need this kind of information recommendation just as much, or even more, in their professional lives, to keep improving their work ability and broaden their professional horizons. At present, no work-oriented application that gathers resources and recommends them has been found on the market.
Currently, when people encounter problems at work, they typically solve them in one of two ways:
1. Search engines such as Baidu are a great help at work, but they still require people to find useful information on an Internet flooded with all kinds of resources. Screening out large amounts of invalid information (advertisements) and duplicate information consumes much of the user's time and energy.
2. In the mobile Internet era, people's lives are extremely fragmented: on the commute, during breaks and so on, the mobile phone has become a tool people rely on heavily. Some applications (such as WeChat official accounts and the mobile clients of professional websites) already help people use these fragments of time to improve themselves to some extent, but their content is overly dispersed, their resources are not concentrated enough, and they cannot provide personalized service for each user's individual situation.
Therefore, an application that aggregates the high-quality resources scattered across the Internet and recommends them in a personalized way, according to each user's job position and usage behavior, would greatly improve users' work ability and professional competence.
Summary of the invention
The technical problem to be solved by the invention is that the total amount of information on the network is huge and cluttered, and in today's fast-paced life large amounts of invalid information and advertisements significantly reduce people's efficiency and hinder quick searching. The purpose of the invention is to provide a novel intelligent resource recommendation system that solves the problems that existing information is not concentrated, content is dispersed, and high-quality, useful resources cannot be quickly filtered out according to a person's individual situation.
The present invention is achieved through the following technical solutions:
A novel intelligent resource recommendation system comprises a network resource collection module, which includes a crawler distribution device and a crawler execution unit. The network resource collection module is connected to a crawler dependency module and a web page decomposition module; the web page decomposition module is connected to a working database; the working database is connected to a temporary incremental database; the temporary incremental database is connected to an update incremental database; the update incremental database is connected to a screening module; the screening module is connected to a local file subsystem and an interaction module; and the local file subsystem is connected to the working database;
The crawler dependency module is used to configure the dependency relationship between the network resource collection module and the target network resources; the network resource collection module establishes these dependencies through the crawler dependency module, and the crawler distribution device configures the corresponding crawler execution unit to perform resource collection;
The web page decomposition module is used to decompose web pages, removing advertising information and noise;
The working database is used to compare content against the user's current interests by similarity and to push content to the user in order of similarity;
The update incremental database is used to store content updated on the network within a time period;
The temporary incremental database is used to store content crawled by resuming from the breakpoint at which the previous crawl stopped;
The interaction module is used to analyze the user's preferences and frequently entered keywords;
The screening module is used to screen, from the update incremental database, content matching the keywords obtained from the user through the interaction module;
The local file subsystem is used to store the web page data that has passed through the screening module.
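The web page decomposition module is described only as removing advertising information and noise. A minimal sketch of that step using Python's standard-library `html.parser`, assuming well-formed HTML and assuming that noise can be recognized by tag name (`script`, `style`, `nav`, `footer`, `iframe`) or by ad-like tokens in `class`/`id` attributes. The marker lists and the `extract_main_text` helper are hypothetical, not taken from the patent.

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristics: tags that never hold main content, and
# class/id tokens that commonly mark advertising containers.
NOISE_TAGS = {"script", "style", "nav", "footer", "iframe"}
AD_MARKERS = {"ad", "ads", "advert", "banner", "sponsor"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "source", "wbr"}


class MainTextExtractor(HTMLParser):
    """Collects visible text while skipping noise tags and ad containers."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a noise/ad subtree
        self.chunks = []

    def _is_noise(self, tag, attrs) -> bool:
        if tag in NOISE_TAGS:
            return True
        attr = dict(attrs)
        label = f"{attr.get('class') or ''} {attr.get('id') or ''}".lower()
        # Match whole tokens so that e.g. "header" is not mistaken for "ad".
        return any(tok in AD_MARKERS for tok in re.split(r"[\s_-]+", label))

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # void elements never get a closing tag
            return
        if self.skip_depth or self._is_noise(tag, attrs):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, `extract_main_text('<div class="ad-box">Buy now</div><p>Main text</p>')` keeps only "Main text". Real pages usually need stronger main-content extraction (for example, text-density scoring); the token heuristic only keeps the sketch short.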
In the described novel intelligent resource recommendation system, the local file subsystem is connected to a distributed file subsystem, which is used to synchronize the web page data in the local file subsystem.
In the described novel intelligent resource recommendation system, the crawler distribution device includes an initialization unit, a web page download module and a closing unit. The initialization unit is used to prepare the necessary storage space and system resources for network resource collection; the web page download module is used to select different crawler programs, according to the data type of the target network resource, to collect its data; the closing unit is used to release the system resources after the collection device has collected the required target data, and to perform exception handling when the collection device encounters an exception.
In the described novel intelligent resource recommendation system, the local file subsystem includes a URL filter, which is used to deduplicate the collected web page data.
In the described novel intelligent resource recommendation system, the URL filter is a filter based on a binary bit array (bitSet).
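The three units of the crawler distribution device map naturally onto a context manager: setup on entry, per-type dispatch while running, and release on exit. The following is a minimal sketch under that assumption; `CrawlerDispatcher`, the `fetchers` mapping and the stub crawlers are hypothetical names, and real crawler programs would perform network I/O instead of returning strings.

```python
from typing import Callable, Dict, List, Tuple

Fetcher = Callable[[str], str]  # a "crawler program": URL -> raw content


class CrawlerDispatcher:
    """Hypothetical sketch of the crawler distribution device."""

    def __init__(self, fetchers: Dict[str, Fetcher]) -> None:
        self.fetchers = fetchers          # data type -> crawler program
        self.buffer: List[str] = []
        self.errors: List[str] = []

    def __enter__(self) -> "CrawlerDispatcher":
        # Initialization unit: prepare fresh storage for this collection run.
        self.buffer, self.errors = [], []
        return self

    def collect(self, url: str, data_type: str) -> None:
        # Web page download module: select a crawler by the target's data type.
        fetcher = self.fetchers.get(data_type)
        if fetcher is None:
            self.errors.append(f"{url}: no crawler for {data_type!r}")
            return
        try:
            self.buffer.append(fetcher(url))
        except Exception as exc:
            # Exception handling: record the failure and keep collecting.
            self.errors.append(f"{url}: {exc}")

    def results(self) -> Tuple[List[str], List[str]]:
        return list(self.buffer), list(self.errors)

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Closing unit: release the staging storage when the run ends.
        self.buffer.clear()
        return False  # never swallow unexpected exceptions


# Usage with stub crawlers (hypothetical).
def fetch_html(url: str) -> str:
    return f"<html>{url}</html>"


def fetch_pdf(url: str) -> str:
    raise IOError("timeout")


with CrawlerDispatcher({"html": fetch_html, "pdf": fetch_pdf}) as dispatcher:
    dispatcher.collect("http://a.example/1", "html")
    dispatcher.collect("http://a.example/2.pdf", "pdf")
    dispatcher.collect("http://a.example/3.mp4", "video")
    pages, errors = dispatcher.results()
```

Copying results out before `__exit__` runs is what allows the closing unit to release its buffer without losing the collected data.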
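The patent states only that the URL filter is based on a binary bit array (`bitSet`). A Bloom filter is one common way to deduplicate URLs with a bit array, and the sketch below assumes that technique. It uses a `bytearray` as the bit set, SHA-256 double hashing, and hypothetical sizing (2^20 bits, 4 hash positions per URL). A Bloom filter can report false positives (a new URL judged already seen) but never false negatives.

```python
import hashlib
from typing import Iterator


class BitSetUrlFilter:
    """Bloom-filter style URL deduplication over a fixed-size bit array."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4) -> None:
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)  # the "bitSet"

    def _positions(self, url: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.size

    def seen(self, url: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(url))

    def add(self, url: str) -> None:
        for p in self._positions(url):
            self.bits[p >> 3] |= 1 << (p & 7)

    def check_and_add(self, url: str) -> bool:
        """Return True (and record the URL) if it is new, False if probably seen."""
        if self.seen(url):
            return False
        self.add(url)
        return True
```

For example, filtering `["http://a.example/1", "http://a.example/2", "http://a.example/1"]` through `check_and_add` keeps the first two URLs. The bit array costs a fixed 128 KiB here regardless of how many URLs are recorded, which is the usual reason to prefer it over a set of strings.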
The crawler system mainly acquires and accumulates high-quality network resources, and processes and preliminarily analyzes the crawled data. Its working steps are as follows:
1. Create the working database, the update incremental database and the temporary incremental database. The working database is used to compare content against the user's current interests by similarity and to push content to the user in order of similarity; the update incremental database stores content updated on the network within a time period (typically one day); the temporary incremental database stores content crawled by resuming from the breakpoint of the previous crawl.
2. Crawl web pages on the network with the crawler module.
3. Extract the main content of each web page and regenerate a page suited to reading on the client.
4. Perform word segmentation on the text of the content and obtain attributes such as URL, title, tags, source, time and word frequency.
5. Determine whether the content is newly updated network content.
6. If the content was recently updated on the website, store the computed content attributes in the update incremental database; otherwise, store them in the temporary incremental database.
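The attribute-extraction and routing steps can be sketched as follows. The patent does not say how "recently updated" is decided; this sketch assumes it means "published within the last day" (the period the description calls typical). It also substitutes a placeholder regex tokenizer for real word segmentation, which Chinese text would require. `Page`, `page_attributes` and `route` are hypothetical names, and plain lists stand in for the two databases.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption: "recently updated" = published within one day of the crawl.
UPDATE_WINDOW = timedelta(days=1)


@dataclass
class Page:
    url: str
    title: str
    source: str
    published: datetime
    text: str
    tags: List[str] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    # Placeholder tokenizer: runs of word characters, lower-cased.
    # Chinese text would need a real word segmenter here.
    return re.findall(r"\w+", text.lower())


def page_attributes(page: Page) -> Dict[str, object]:
    return {
        "url": page.url,
        "title": page.title,
        "tags": list(page.tags),
        "source": page.source,
        "time": page.published,
        "term_freq": Counter(tokenize(page.text)),
    }


def route(page: Page, now: datetime, update_db: list, temp_db: list) -> str:
    """Store the page's attributes in the update or temporary incremental DB."""
    attrs = page_attributes(page)
    if now - page.published <= UPDATE_WINDOW:
        update_db.append(attrs)
        return "update"
    temp_db.append(attrs)
    return "temporary"
```

With `now = datetime(2017, 5, 31, 12)`, a page published that morning goes to the update list and one published on 2017-05-20 goes to the temporary list.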
The working steps of the main system's recommendation task are as follows. Determine whether the user is using the mobile client for the first time; if so, the main system takes a number of items from the update incremental database and pushes them to the user. Otherwise, based on the feature vector of the content the user was last interested in, items are pushed to the user in a certain proportion from the working database and from the update incremental database, where the items from the working database are the results of a similarity comparison against the user's content of interest. The mobile client collects user behavior and uploads it to the interaction module of the main system; the interaction module combines the user's job-position characteristics with the user's behavior (the content the user is interested in) to determine what to recommend next, and the flow then passes to the working database.
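The recommendation task can be sketched as a cold-start branch plus a ratio-based mix. The patent does not specify the similarity measure or the proportion. This sketch assumes cosine similarity over term-frequency vectors, a 70/30 default split, and an update database already ordered newest first; `cosine`, `recommend` and `working_ratio` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(
    profile: Optional[Counter],
    working_db: Dict[str, Counter],
    update_db: List[str],
    n: int = 10,
    working_ratio: float = 0.7,
) -> List[str]:
    # First-time user: no interest vector yet, so push fresh items only.
    if profile is None:
        return update_db[:n]
    # Returning user: take the most similar working-database items ...
    n_work = round(n * working_ratio)
    ranked = sorted(working_db, key=lambda d: cosine(profile, working_db[d]), reverse=True)
    picks = ranked[:n_work]
    # ... and fill the remaining slots from the update incremental database.
    for doc_id in update_db:
        if len(picks) >= n:
            break
        if doc_id not in picks:
            picks.append(doc_id)
    return picks
```

For example, with a profile of `Counter({"crawler": 2, "lucene": 1})`, `n=4` and `working_ratio=0.5`, the two working-database documents closest to that profile come first, followed by the two newest update items.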
The search module of the main system takes the keywords the user enters, combines them with the interest keywords generated during use, and uses the Lucene search library to run a full-text search over all crawled content related to the user's job position.
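The system itself uses Lucene. The following is not Lucene's API. It is a deliberately small stand-in inverted index that shows the same idea: the typed query and the interest keywords both contribute to the score, and results are restricted to documents filed under the user's job position. The term-frequency scoring, the 0.5 interest weight and all names here are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self) -> None:
        self.postings: Dict[str, Dict[str, int]] = defaultdict(dict)
        self.job_of: Dict[str, str] = {}  # doc_id -> job position it belongs to

    def add(self, doc_id: str, text: str, job: str) -> None:
        self.job_of[doc_id] = job
        for term in tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query: str, job: str, interest_terms: Iterable[str] = (),
               interest_weight: float = 0.5) -> List[str]:
        scores: Dict[str, float] = defaultdict(float)
        weighted = [(1.0, tokenize(query)),
                    (interest_weight, [t.lower() for t in interest_terms])]
        for weight, terms in weighted:
            for term in terms:
                for doc_id, tf in self.postings.get(term, {}).items():
                    if self.job_of[doc_id] == job:  # only this job position
                        scores[doc_id] += weight * tf
        # Highest score first; ties broken by doc_id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))
```

Lucene adds what this sketch omits, such as analyzers (including Chinese tokenization), relevance scoring beyond raw term frequency, and on-disk index segments.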
The follow (subscription) part of the main system works essentially like search; the difference is that it supports a user's long-term tracking of a particular keyword.
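Long-term keyword tracking is search turned around. Instead of running a query on demand, each newly crawled document is checked against the stored keywords of every user who follows them. A minimal sketch, assuming single-word keywords and the same placeholder tokenizer as a plain search; `KeywordTracker` and its methods are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


class KeywordTracker:
    """Maps followed keywords to users and matches new content against them."""

    def __init__(self) -> None:
        self.followers: Dict[str, Set[str]] = defaultdict(set)  # keyword -> users

    def follow(self, user: str, keyword: str) -> None:
        self.followers[keyword.lower()].add(user)

    def on_new_content(self, text: str) -> Dict[str, List[str]]:
        """Return each following user with the tracked keywords found in text."""
        hits: Dict[str, List[str]] = defaultdict(list)
        for term in sorted(set(tokenize(text))):
            for user in self.followers.get(term, ()):
                hits[user].append(term)
        return dict(hits)
```

Indexing the keywords rather than the documents keeps the per-document cost proportional to the document's distinct terms, however many users are following keywords.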
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. The novel intelligent resource recommendation system of the invention screens network information according to the user's preferences, filters out junk information, and pushes content that matches the user's needs and work-related interests, making search faster and more convenient;
2. The system pushes resources based on the user's job-position characteristics, which makes it more practical and effectively improves the user's work ability.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the embodiments of the invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a schematic structural diagram of the invention.
Detailed description of the embodiments
To make the objectives, technical solution and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiment and the drawings. The exemplary embodiment and its description are only used to explain the invention and do not limit it.
Embodiment
As shown in Fig. 1, the novel intelligent resource recommendation system of the invention comprises a network resource collection module, which includes a crawler distribution device and a crawler execution unit. The network resource collection module is connected to a crawler dependency module and a web page decomposition module; the web page decomposition module is connected to a working database; the working database is connected to a temporary incremental database; the temporary incremental database is connected to an update incremental database; the update incremental database is connected to a screening module; the screening module is connected to a local file subsystem and an interaction module; and the local file subsystem is connected to the working database;
The crawler dependency module is used to configure the dependency relationship between the network resource collection module and the target network resources; the network resource collection module establishes these dependencies through the crawler dependency module, and the crawler distribution device configures the corresponding crawler execution unit to perform resource collection;
The web page decomposition module is used to decompose web pages, removing advertising information and noise;
The working database is used to compare content against the user's current interests by similarity and to push content to the user in order of similarity;
The update incremental database is used to store content updated on the network within a time period;
The temporary incremental database is used to store content crawled by resuming from the breakpoint at which the previous crawl stopped;
The interaction module is used to analyze the user's preferences and frequently entered keywords;
The screening module is used to screen, from the update incremental database, content matching the keywords obtained from the user through the interaction module;
The local file subsystem is used to store the web page data that has passed through the screening module.
In the novel intelligent resource recommendation system, the local file subsystem is connected to a distributed file subsystem, which is used to synchronize the web page data in the local file subsystem.
In the novel intelligent resource recommendation system, the crawler distribution device includes an initialization unit, a web page download module and a closing unit. The initialization unit is used to prepare the necessary storage space and system resources for network resource collection; the web page download module is used to select different crawler programs, according to the data type of the target network resource, to collect its data; the closing unit is used to release the system resources after the collection device has collected the required target data, and to perform exception handling when the collection device encounters an exception.
In the novel intelligent resource recommendation system, the local file subsystem includes a URL filter, which is used to deduplicate the collected web page data.
In the novel intelligent resource recommendation system, the URL filter is a filter based on a binary bit array (bitSet).
The specific embodiment above further explains the purpose, technical solution and beneficial effects of the present invention in detail. It should be understood that the above is only a specific embodiment of the invention and does not limit its scope of protection; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (5)

1. A novel intelligent resource recommendation system, characterized by comprising a network resource collection module, wherein the network resource collection module includes a crawler distribution device and a crawler execution unit; the network resource collection module is connected to a crawler dependency module and a web page decomposition module; the web page decomposition module is connected to a working database; the working database is connected to a temporary incremental database; the temporary incremental database is connected to an update incremental database; the update incremental database is connected to a screening module; the screening module is connected to a local file subsystem and an interaction module; and the local file subsystem is connected to the working database;
The crawler dependency module is used to configure the dependency relationship between the network resource collection module and the target network resources; the network resource collection module establishes these dependencies through the crawler dependency module, and the crawler distribution device configures the corresponding crawler execution unit to perform resource collection;
The web page decomposition module is used to decompose web pages, removing advertising information and noise;
The working database is used to compare content against the user's current interests by similarity and to push content to the user in order of similarity;
The update incremental database is used to store content updated on the network within a time period;
The temporary incremental database is used to store content crawled by resuming from the breakpoint at which the previous crawl stopped;
The interaction module is used to analyze the user's preferences and frequently entered keywords;
The screening module is used to screen, from the update incremental database, content matching the keywords obtained from the user through the interaction module;
The local file subsystem is used to store the web page data that has passed through the screening module.
2. The novel intelligent resource recommendation system according to claim 1, characterized in that the local file subsystem is connected to a distributed file subsystem, which is used to synchronize the web page data in the local file subsystem.
3. The novel intelligent resource recommendation system according to claim 1, characterized in that the crawler distribution device includes an initialization unit, a web page download module and a closing unit; the initialization unit is used to prepare the necessary storage space and system resources for network resource collection; the web page download module is used to select different crawler programs, according to the data type of the target network resource, to collect its data; and the closing unit is used to release the system resources after the collection device has collected the required target data, and to perform exception handling when the collection device encounters an exception.
4. The novel intelligent resource recommendation system according to claim 1, characterized in that the local file subsystem includes a URL filter, which is used to deduplicate the collected web page data.
5. The novel intelligent resource recommendation system according to claim 4, characterized in that the URL filter is a filter based on a binary bit array (bitSet).
CN201710398772.9A, priority date 2017-05-31, filed 2017-05-31: A novel intelligent resource recommendation system (CN107273450A, pending)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710398772.9A CN107273450A (en) 2017-05-31 2017-05-31 A novel intelligent resource recommendation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710398772.9A CN107273450A (en) 2017-05-31 2017-05-31 A novel intelligent resource recommendation system

Publications (1)

Publication Number Publication Date
CN107273450A 2017-10-20

Family

ID=60064335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710398772.9A Pending CN107273450A (en) 2017-05-31 2017-05-31 A novel intelligent resource recommendation system

Country Status (1)

Country Link
CN (1) CN107273450A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314463A (en) * 2010-07-07 2012-01-11 北京瑞信在线***技术有限公司 Distributed crawler system and webpage data extraction method for the same
US20130254217A1 (en) * 2012-03-07 2013-09-26 Ut-Battelle, Llc Recommending personally interested contents by text mining, filtering, and interfaces
CN103902732A (en) * 2014-04-18 2014-07-02 北京大学 Construction and network resource collection method of self-adaption network resource collection system
CN104809154A (en) * 2015-03-19 2015-07-29 百度在线网络技术(北京)有限公司 Method and device for recommending information
CN105893559A (en) * 2016-03-31 2016-08-24 北京奇艺世纪科技有限公司 Data pushing method and device


Similar Documents

Publication Publication Date Title
López-Robles et al. Understanding the intellectual structure and evolution of Competitive Intelligence: A bibliometric analysis from 1984 to 2017
Marres et al. Scraping the social? Issues in live social research
CN102354315B (en) Generation method of site navigation page and device thereof
CN104111941B (en) The method and apparatus that information is shown
CN105468744B (en) Big data platform for realizing tax public opinion analysis and full text retrieval
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN102831193A (en) Topic detecting device and topic detecting method based on distributed multistage cluster
CN107657057A (en) A kind of enterprise's reference information fusion graphic method
CN105930469A (en) Hadoop-based individualized tourism recommendation system and method
CN103646078B (en) Method and device for realizing internet propaganda monitoring target evaluations
CN110597981A (en) Network news summary system for automatically generating summary by adopting multiple strategies
CN102880712A (en) Method and system for sequencing searched network videos
CN107943812A (en) Recommend method for the news of user's centralized integration resource
Escadafal et al. First appraisal of the current structure of research on land and soil degradation as evidenced by bibliometric analysis of publications on desertification
CN105975609A (en) Industrial design product intelligent recommendation method and system
CN103810283A (en) Microblog data acquisition method based on user correlation
CN102214227B (en) Automatic public opinion monitoring method based on internet hierarchical structure storage
CN103902579A (en) Method and device for acquiring information
CN103714120B (en) A kind of system that user interest topic is extracted in the access record from user url
CN106294358A (en) The search method of a kind of information and system
CN103106234A (en) Searching method and device of webpage content
CN103198078B (en) A kind of internet news event report trend analysis and system
CN106776640A (en) A kind of stock information information displaying method and device
Urbinati et al. Measuring scientific brain drain with hubs and authorities: A dual perspective
CN110717089A (en) User behavior analysis system and method based on weblog

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171020
