CN103577403A - Cloud computing technology based recommendation system implementation method - Google Patents

Info

Publication number
CN103577403A
Authority
CN
China
Prior art keywords
recommendation
data
cloud computing
algorithm
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201210250029.6A
Other languages
Chinese (zh)
Inventor
***
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhenjiang Yction Software Co Ltd
Original Assignee
Zhenjiang Yction Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhenjiang Yction Software Co Ltd
Priority to CN201210250029.6A
Publication of CN103577403A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of cloud computing technology and provides a CCBRS (Cloud Computing Based Recommendation System) built on a comparative study of traditional recommendation algorithms. The system can adopt different recommendation strategies according to different recommendation requirements. Multiple recommendation algorithms are tested and analyzed with the CCBRS in a single-machine environment, on a pseudo-distributed platform, and on a distributed platform, so that a suitable recommendation strategy can be selected for each set of conditions. As a result, the system has good generality and extensibility and provides good support for large-scale data processing.

Description

Recommendation system implementation method based on cloud computing technology
Technical field
The present invention relates to a method for implementing a recommendation system, in particular a recommendation system based on cloud computing technology, and belongs to the technical field of computer networks.
Background art
The goal of a personalized recommendation system (recommendation system for short) is to generate meaningful recommendations of products or items that may interest a group of users and to present those recommendations to them. Compared with traditional system tools and technologies such as databases and search engines, research on recommendation systems started late: it became a relatively independent research field only around the mid-1990s. Nevertheless, in barely two decades, research on and application of recommendation systems have made considerable progress. In recent years, with the rapid development of top sites such as Amazon, Yahoo, Google, YouTube, Netflix, and IMDb and of the new forms of Internet application they represent, enthusiasm for researching and developing recommendation systems has kept growing. In e-commerce in particular, as the number and variety of goods grow rapidly and their quality and grade vary widely, customers often need a great deal of time to find the goods they want. To attract customers and make a profit amid keen competition, an e-commerce website must therefore design an efficient personalized recommendation system that provides fully personalized decision support and information services for shopping, based on each customer's preferences or needs.
At present, many commercial recommendation systems are widely used in practice. They offer users hot-item recommendations, new arrivals, related-product recommendations, recommendations based on customer groups with shared interests, and so on. Representative examples include Amazon, eBay, CDNow, GroupLens, Netflix, and Moviefinder. Amazon mainly uses the user's preferences or other users' purchase information to recommend related books or other products; Moviefinder mainly uses collaborative filtering to recommend films or music discs according to user preferences; GroupLens mainly uses automatic filtering based on data sets to provide news recommendation services. In general, existing recommendation systems still have problems: they usually require custom development and are inflexible, most fail to take business strategy fully into account, and they find it difficult to adopt different recommendation strategies as recommendation requirements change. Meanwhile, as e-commerce systems grow further in scale, providing real-time recommendation services to tens of thousands of users on large-scale data sets becomes increasingly difficult.
In recent years, cloud computing has developed on the basis of distributed computing, grid computing, parallel computing, and network storage. Cloud computing can integrate multiple relatively low-cost computing entities over a network into one distributed system with powerful computing capability and, through concepts such as infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), and managed service providers (MSP), deliver that computing and storage capability to end users. The advantages of building a personalized recommendation system on cloud computing are clear. For example: cloud computing helps carry out efficient large-scale data mining on massive data sets; it makes a low-cost distributed parallel computing environment easy to realize, reducing the data-processing cost of the recommendation system and its dependence on high-performance servers; and it can shield recommendation system development from the heterogeneity of the underlying layers and make effective use of existing equipment to raise the capacity and speed of large-scale data processing, improving the portability and fault tolerance of the recommendation system.
Therefore, on the basis of a comparative study of traditional personalized recommendation algorithms, the present invention proposes a cloud-computing-based personalized recommendation system (CCBRS). The system can adopt different recommendation strategies according to different recommendation requirements, has good generality and scalability, and provides good support for processing large-scale data.
Summary of the invention
The present invention applies cloud computing technologies such as Hadoop and Mahout to process large-scale data and proposes a cloud-computing-based personalized recommendation system (CCBRS) that can customize different recommendation strategies for different situations and different recommendation requirements.
The CCBRS comprises three subsystems: a data storage system, a recommendation computing system, and a business application system. The data storage system has two parts: a real-time interactive database (serving the business application system and based on an ordinary relational database) and a distributed file system (using Hadoop HDFS to provide highly reliable distributed file storage, spreading massive data across a cluster of many computers). The recommendation computing system comprises a data preprocessing module (which cleans, transforms, and loads heterogeneous data), a data mining module (which produces user clusters and item clusters by means of clustering and association rule algorithms), and a recommendation module (which computes recommendation results by applying content-based filtering, collaborative filtering, hybrid recommendation algorithms, and so on). The algorithms in these modules all run on the MapReduce distributed computing framework, and the recommendation algorithms are built mainly on the Mahout machine learning framework. Mahout implements data mining algorithms such as clustering, classification, collaborative filtering, and evolutionary programming and allows extension, so a corresponding Mahout algorithm library can be customized according to the business requirements of the application layer. The business application system customizes the recommendation strategy according to actual business requirements; the concrete computation is carried out by the recommendation computing system, and the results are made available for the business system to call. In the CCBRS, the business application system generally works online in real time while the recommendation computing system works offline, which reduces server load as far as possible and improves the execution efficiency of the recommendation system.
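The split between offline computation and online lookup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class names DataStore, RecommendationEngine, and BusinessApp, the strategy name "popular", and the in-memory containers standing in for HDFS and the relational database are all hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, item, score)
Strategy = Callable[[List[Rating], str, int], List[str]]


class DataStore:
    """Hypothetical stand-in for the data storage system."""

    def __init__(self) -> None:
        self.ratings: List[Rating] = []          # plays the role of bulk data on HDFS
        self.results: Dict[str, List[str]] = {}  # plays the role of the online database


def popular_items(ratings: List[Rating], user: str, n: int) -> List[str]:
    """A deliberately simple strategy: most-rated items the user has not rated yet."""
    seen = {item for u, item, _ in ratings if u == user}
    counts = Counter(item for _, item, _ in ratings if item not in seen)
    # Sort by descending count, then by name, so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


class RecommendationEngine:
    """Hypothetical stand-in for the offline recommendation computing system."""

    def __init__(self, store: DataStore) -> None:
        self.store = store
        # Further strategies could be registered here under other names.
        self.strategies: Dict[str, Strategy] = {"popular": popular_items}

    def run_offline(self, strategy: str, n: int = 5) -> None:
        """Batch job: precompute results for every known user with the chosen strategy."""
        compute = self.strategies[strategy]
        for user in sorted({u for u, _, _ in self.store.ratings}):
            self.store.results[user] = compute(self.store.ratings, user, n)


class BusinessApp:
    """Hypothetical stand-in for the online business application system."""

    def __init__(self, store: DataStore) -> None:
        self.store = store

    def recommend(self, user: str) -> List[str]:
        # Online path: only a lookup of precomputed results, no heavy computation.
        return self.store.results.get(user, [])
```

In this sketch the business side chooses a strategy by name when it schedules the batch job, and the online path never touches the raw ratings, which mirrors the load-reduction argument made in the text.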
Brief description of the drawings:
Fig. 1: Recommendation time of each algorithm in the stand-alone environment
Fig. 2: Running time of Item Clustering in the stand-alone environment
Fig. 3: Running time of Slope One in pseudo-distributed mode
Fig. 4: Running time of the improved Item-Based algorithm under distributed execution
Detailed description of the embodiments:
The Hadoop-based distributed cloud environment has two main modes: Hadoop pseudo-distributed mode, which runs on a single machine, and Hadoop fully distributed mode. The experimental environment was built with Hadoop version 0.2; because Hadoop needs a JDK to run, the corresponding jdk1.6.0_24 was selected. The development environment was Eclipse with the Hadoop Eclipse plugin. The hardware test platform was configured as follows: OS: CentOS 5.5 x64; CPU: Intel(R) Xeon(R) E5420 2.50GHz; Memory: 4GB RAM. The tests mainly used four PCs (PC1 to PC4) to build the cloud computing environment, with PC1 acting as namenode and jobtracker and PC2 to PC4 acting as datanodes and tasktrackers.
The /etc/hosts configuration on every PC was as follows: masters: 192.168.10.1; slaves: 192.168.10.1, 192.168.10.2, 192.168.10.3, 192.168.10.4. In addition, ssh-keygen was used on PC1 to generate PC1's key pair, and its public key was then copied into the /home/.ssh directory of each machine, so that PC1 can log in to every machine over ssh without a password. For the key Hadoop configuration, localhost in conf/masters and conf/slaves on every machine was changed to the corresponding IP addresses, and the IP addresses of the namenode and jobtracker were configured in conf/mapred-site.xml. Table 3 lists the relevant key Hadoop configuration parameters; in practice they are set by modifying conf/core-site.xml, conf/mapred-site.xml, and conf/hdfs-site.xml.
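As a small illustration of the masters and slaves settings just described, the following sketch renders the contents of the two host files from the addresses given in the text. The helper name render_host_files is hypothetical, and the sketch only builds strings; it assumes the one-address-per-line layout used by conf/masters and conf/slaves and does not write to any real Hadoop installation.

```python
from typing import Dict, List

# Addresses exactly as listed in the text; the dictionary keys mirror the file names it mentions.
MASTER = "192.168.10.1"
SLAVES = ["192.168.10.1", "192.168.10.2", "192.168.10.3", "192.168.10.4"]


def render_host_files(master: str, slaves: List[str]) -> Dict[str, str]:
    """Return the text of conf/masters and conf/slaves, one address per line."""
    return {
        "conf/masters": master + "\n",
        "conf/slaves": "".join(ip + "\n" for ip in slaves),
    }


files = render_host_files(MASTER, SLAVES)
for name, body in files.items():
    # Print each file name followed by its contents for inspection.
    print(name)
    print(body, end="")
```

Generating the files from one list keeps every machine's copy identical, which matters because the same configuration has to be present on all four PCs.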
The data sets used in the tests come from MovieLens [18] and Libimseti [19]. Three data sets come from MovieLens, containing 100,000, 1,000,000, and 10,000,000 movie ratings (1.88MB, 23.4MB, and 234MB respectively); the Libimseti data set contains 17.36 million anonymous ratings generated by 140,000 users (253MB). First, the six recommendation algorithms mentioned above were run in the stand-alone environment to recommend 5 items to an arbitrary designated user, with K set to 10 in the SVD algorithm and the number of neighbors set to 10 in the improved KNN algorithm. Because the Item Clustering algorithm takes a long time on data sets above the 100,000 order of magnitude, several smaller data volumes were chosen to test this algorithm.
As Fig. 1 shows, the user-based and item-based recommendation algorithms perform well overall, while the execution time of the SVD algorithm and the improved KNN algorithm rises sharply as the data volume grows. The Slope One algorithm is constrained by memory: it runs more and more slowly as the data volume grows, and a memory overflow occurs on the 10-million-rating data set. Fig. 2 shows that the execution time of the Item Clustering algorithm also rises sharply as the data volume grows. Next, Slope One was tested on the pseudo-distributed platform consisting of four PCs; Fig. 3 shows its execution time on data sets of different magnitudes.
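To make the memory behavior of Slope One more concrete, here is a minimal single-machine sketch of the standard weighted Slope One scheme. It is not the Mahout implementation used in the tests; the function names build_deviations and predict are hypothetical. The point it illustrates is that the algorithm keeps a table with an entry for every co-rated item pair, which is one plausible reason for the memory pressure reported above.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}
Pair = Tuple[str, str]


def build_deviations(ratings: Ratings) -> Tuple[Dict[Pair, float], Dict[Pair, int]]:
    """For every ordered item pair (i, j), accumulate sum(r_i - r_j) and the co-rating count."""
    diff_sum: Dict[Pair, float] = defaultdict(float)
    count: Dict[Pair, int] = defaultdict(int)
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[(i, j)] += r_i - r_j
                    count[(i, j)] += 1
    return diff_sum, count


def predict(user_ratings: Dict[str, float], target: str,
            diff_sum: Dict[Pair, float], count: Dict[Pair, int]) -> Optional[float]:
    """Weighted Slope One: average of (deviation + known rating), weighted by co-rating count."""
    numerator = 0.0
    denominator = 0
    for j, r_j in user_ratings.items():
        c = count.get((target, j), 0)
        if c:
            deviation = diff_sum[(target, j)] / c
            numerator += (deviation + r_j) * c
            denominator += c
    # None signals that no co-rated item links the user to the target.
    return numerator / denominator if denominator else None
```

Both dictionaries grow with the number of item pairs that at least one user has rated together, so their size can approach the square of the item count on dense data.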
As Fig. 3 shows, the overall execution time of Slope One improves markedly on the pseudo-distributed platform. As in the stand-alone environment, however, when the data volume is too large the algorithm cannot finish within a reasonable time because of memory limits, and a memory overflow occurred at the 10-million-rating magnitude. Finally, the improved Item-Based algorithm was run in distributed mode on four PCs; Fig. 4 shows the execution time for different data volumes.
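The text does not spell out the improved Item-Based algorithm, so the following is only a sketch of a plain co-occurrence item-based recommender arranged in map and reduce steps, to show why this family of algorithms distributes well. The functions map_user, reduce_counts, and recommend are hypothetical names, and the whole pipeline is simulated in one Python process rather than on Hadoop.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Pair = Tuple[str, str]


def map_user(items: Set[str]) -> Iterator[Tuple[Pair, int]]:
    """Map step: for one user's item set, emit ((item_a, item_b), 1) for each ordered pair."""
    for a, b in permutations(sorted(items), 2):
        yield (a, b), 1


def reduce_counts(pairs: Iterable[Tuple[Pair, int]]) -> Dict[Pair, int]:
    """Reduce step: sum the emitted counts for each item pair."""
    totals: Dict[Pair, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def recommend(user_items: Set[str], cooccurrence: Dict[Pair, int], n: int = 5) -> List[str]:
    """Score unseen items by how often they co-occur with the user's items; return the top n."""
    scores: Dict[str, int] = defaultdict(int)
    for (a, b), c in cooccurrence.items():
        if a in user_items and b not in user_items:
            scores[b] += c
    # Sort by descending score, then by name, so ties are broken deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]
```

Each user's record is processed independently in the map step, so the work can be spread over as many nodes as are available, and only the per-pair sums have to be brought together in the reduce step.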
In addition, the improved Item-Based algorithm was run in distributed mode on the 17.36 million rating records with the number of nodes increased to 5 and to 7. With 5 nodes, the execution time for the 17.36 million records was 5.5 hours; with 7 nodes, it was 3.5 hours. This reflects the trend that execution time falls as the number of nodes rises, and provides experimental support for the view that a suitably sized cloud computing environment built from many nodes can significantly improve recommendation system performance.
In addition to the embodiments above, the present invention may have other embodiments. All technical solutions formed by equivalent replacement or equivalent transformation fall within the scope of protection claimed by the present invention.
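The two timings reported above can be turned into a relative speedup with simple arithmetic. The sketch below uses only the figures from the text (5.5 hours on 5 nodes, 3.5 hours on 7 nodes); the function name relative_speedup is hypothetical, and with only two data points the result should be read as an indication rather than a scaling law.

```python
def relative_speedup(time_before_h: float, time_after_h: float) -> float:
    """Ratio of the earlier running time to the later one (values above 1 mean faster)."""
    return time_before_h / time_after_h


TIME_5_NODES_H = 5.5  # reported running time with 5 nodes
TIME_7_NODES_H = 3.5  # reported running time with 7 nodes

speedup = relative_speedup(TIME_5_NODES_H, TIME_7_NODES_H)  # about 1.57
node_ratio = 7 / 5                                          # 1.4

print(f"speedup from 5 to 7 nodes: {speedup:.2f}")
print(f"increase in node count:    {node_ratio:.2f}")
```

On these two measurements the running time fell slightly faster than the node count rose; the text gives no further runs, so nothing stronger can be concluded from it.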

Claims (2)

1. A recommendation system implementation method based on cloud computing technology, characterized in that it comprises the following functional modules: a data storage system, a recommendation computing system, and a business application system.
2. The recommendation system implementation method based on cloud computing technology according to claim 1, further characterized in that:
The data storage system comprises a real-time interactive database (serving the business application system and based on an ordinary relational database) and a distributed file system (using Hadoop HDFS to provide highly reliable distributed file storage, spreading massive data across a cluster of many computers);
The recommendation computing system comprises a data preprocessing module (which cleans, transforms, and loads heterogeneous data), a data mining module (which produces user clusters and item clusters by means of clustering and association rule algorithms), and a recommendation module (which computes recommendation results by applying content-based filtering, collaborative filtering, hybrid recommendation algorithms, and so on);
The business application system customizes the recommendation strategy according to actual business requirements; the concrete computation is carried out by the recommendation computing system, and the results are made available for the business system to call.
CN201210250029.6A 2012-07-19 2012-07-19 Cloud computing technology based recommendation system implementation method Pending CN103577403A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210250029.6A CN103577403A (en) 2012-07-19 2012-07-19 Cloud computing technology based recommendation system implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210250029.6A CN103577403A (en) 2012-07-19 2012-07-19 Cloud computing technology based recommendation system implementation method

Publications (1)

Publication Number Publication Date
CN103577403A true CN103577403A (en) 2014-02-12

Family

ID=50049211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210250029.6A Pending CN103577403A (en) 2012-07-19 2012-07-19 Cloud computing technology based recommendation system implementation method

Country Status (1)

Country Link
CN (1) CN103577403A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503967A (en) * 2014-10-24 2015-04-08 浪潮电子信息产业股份有限公司 Hadoop-based network recommendation method
CN105704566A (en) * 2016-04-25 2016-06-22 浪潮软件集团有限公司 Video recommendation system based on television set top box
CN106503140A (en) * 2016-10-20 2017-03-15 安徽大学 One kind is based on Hadoop cloud platform web resource personalized recommendation system and method
CN106528812A (en) * 2016-08-05 2017-03-22 浙江工业大学 USDR model based cloud recommendation method
CN109873856A (en) * 2018-12-18 2019-06-11 深圳先进技术研究院 A kind of side cloud Synergistic method of rule-based evolution
EP3553676A4 (en) * 2016-12-27 2019-11-06 Huawei Technologies Co., Ltd. Smart recommendation method and terminal
CN111310042A (en) * 2020-02-13 2020-06-19 研祥智能科技股份有限公司 Collaborative filtering recommendation method and system based on cloud computing
CN112395197A (en) * 2020-11-19 2021-02-23 中国平安人寿保险股份有限公司 Data processing method, data processing device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957968A (en) * 2010-08-31 2011-01-26 南京财经大学 Online transaction service aggregation method based on Hadoop
CN102169505A (en) * 2011-05-16 2011-08-31 苏州两江科技有限公司 Recommendation system building method based on cloud computing
CN102523246A (en) * 2011-11-23 2012-06-27 陈刚 Cloud computation treating system and method
CN102546771A (en) * 2011-12-27 2012-07-04 西安博构电子信息科技有限公司 Cloud mining network public opinion monitoring system based on characteristic model

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104503967A (en) * 2014-10-24 2015-04-08 浪潮电子信息产业股份有限公司 Hadoop-based network recommendation method
CN105704566A (en) * 2016-04-25 2016-06-22 浪潮软件集团有限公司 Video recommendation system based on television set top box
CN106528812A (en) * 2016-08-05 2017-03-22 浙江工业大学 USDR model based cloud recommendation method
CN106528812B (en) * 2016-08-05 2019-04-23 浙江工业大学 A kind of cloud recommended method based on USDR model
CN106503140A (en) * 2016-10-20 2017-03-15 安徽大学 One kind is based on Hadoop cloud platform web resource personalized recommendation system and method
EP3553676A4 (en) * 2016-12-27 2019-11-06 Huawei Technologies Co., Ltd. Smart recommendation method and terminal
CN109873856A (en) * 2018-12-18 2019-06-11 深圳先进技术研究院 A kind of side cloud Synergistic method of rule-based evolution
CN111310042A (en) * 2020-02-13 2020-06-19 研祥智能科技股份有限公司 Collaborative filtering recommendation method and system based on cloud computing
CN112395197A (en) * 2020-11-19 2021-02-23 中国平安人寿保险股份有限公司 Data processing method, data processing device and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140212