CN103577403A

CN103577403A - Cloud computing technology based recommendation system implementation method

Info

Publication number: CN103577403A
Application number: CN201210250029.6A
Authority: CN
Inventors: ***
Original assignee: Zhenjiang Yction Software Co Ltd
Current assignee: Zhenjiang Yction Software Co Ltd
Priority date: 2012-07-19
Filing date: 2012-07-19
Publication date: 2014-02-12

Abstract

The invention belongs to the field of cloud computing technology. Based on a comparative study of traditional recommendation algorithms, it provides a CCBRS (Cloud Computing Based Recommendation System) that can adopt different recommendation strategies for different recommendation requirements. Multiple recommendation algorithms are tested and analyzed with the CCBRS in a stand-alone environment, on a pseudo-distributed platform and on a distributed platform, so that a recommendation strategy can be selected for each condition. Because the CCBRS can adopt different recommendation strategies for different requirements, it offers good versatility and extensibility and provides good support for large-scale data processing.

Description

Recommendation system implementation method based on cloud computing technology

Technical field

The present invention relates to an implementation method for a recommendation system, in particular a recommendation system based on cloud computing technology, and belongs to the technical field of computer networks.

Background technology

The goal of a personalized recommendation system (recommendation system for short) is to generate meaningful recommendations of products or items that a group of users may be interested in, and to present them to those users. Compared with conventional tools and techniques such as databases and search engines, research on recommendation systems started late: it became a relatively independent research field only around the mid-1990s. Nevertheless, in just two decades, the research and application of recommendation systems have made significant progress. In recent years, with the rapid development of leading websites such as Amazon, Yahoo, Google, YouTube, Netflix and IMDb and the new forms of Internet application they represent, enthusiasm for the research and development of recommendation systems has grown steadily. This is especially true in e-commerce: as the number and variety of goods grow rapidly and their quality and grade vary widely, customers often need a great deal of time to find the goods they want. To attract customers and remain profitable amid fierce competition, e-commerce websites must therefore design efficient personalized recommendation systems that, according to each customer's preferences or needs, provide fully personalized decision support and information services for shopping.

At present, many commercial recommendation systems are widely used in practice. They offer users services such as popular-item recommendations, new-arrival recommendations, related-product recommendations and recommendations based on customer groups with similar interests; representative examples include Amazon, eBay, CDNow, GroupLens, Netflix and Moviefinder. Amazon mainly uses a user's preferences or other users' purchase information to recommend related books or other products; Moviefinder mainly uses collaborative filtering to recommend films or music albums according to user preferences; GroupLens mainly uses an automatic, data-set-based filtering system to provide news recommendation services. In general, however, existing recommendation systems still have problems: they usually require customized development and are inflexible, most fail to take business strategy fully into account, and it is difficult for them to adopt different recommendation strategies as recommendation requirements change. Meanwhile, as e-commerce systems keep growing, providing real-time recommendation services to tens of thousands of users on large-scale data sets becomes increasingly difficult.

In recent years, cloud computing has developed on the basis of distributed computing, grid computing, parallel computing, network storage and related technologies. Over a network, cloud computing integrates many relatively low-cost computing entities into a distributed system with powerful computing capability, and through concepts such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS) and managed service providers (MSP), it delivers that computing and storage capacity to end users. Building a personalized recommendation system on cloud computing has clear advantages: cloud computing enables efficient large-scale data mining on massive data sets; it makes a low-cost distributed parallel computing environment easy to build, reducing the recommendation system's data-processing cost and its dependence on high-performance servers; and it hides the heterogeneity of the underlying infrastructure from recommendation-system developers, making effective use of existing equipment to increase the capacity and speed of large-scale data processing and improving the portability and fault tolerance of the recommendation system.

Therefore, on the basis of a comparative study of traditional personalized recommendation algorithms, the present invention proposes a cloud-computing-based personalized recommendation system (CCBRS). The system can adopt different recommendation strategies for different recommendation requirements, offers good versatility and scalability, and provides good support for processing large-scale data.

Summary of the invention

The present invention applies cloud computing technologies such as Hadoop and Mahout to process large-scale data and proposes a cloud-computing-based personalized recommendation system (CCBRS) that can customize different recommendation strategies for different situations and different recommendation requirements.

The cloud-computing-based CCBRS mainly comprises three subsystems: a data storage system, a recommendation computing system and a business application system.

The data storage system consists of two parts: a real-time interactive database (serving the business application system and based on a common relational database) and a distributed file system (which uses Hadoop HDFS to provide highly reliable distributed file storage, distributing massive data across a multi-computer cluster).

The recommendation computing system comprises a data preprocessing module (cleaning, transforming and loading heterogeneous data), a data mining module (producing user clusters and item clusters through clustering and association-rule algorithms) and a recommendation module (computing recommendation results with content-based filtering, collaborative filtering, hybrid recommendation algorithms, etc.). At run time the algorithms in these modules execute on the MapReduce distributed computing framework, and the corresponding recommendation algorithms are built mainly with the Mahout machine learning framework. Mahout implements data mining algorithms such as clustering, classification, collaborative filtering and evolutionary programming and allows extension, so a Mahout algorithm library can be customized to the business needs of the application layer.

The business application system customizes the recommendation strategy according to actual business needs; the actual computation is performed by the recommendation computing system, and the results are called by the business system. In the CCBRS, the business application system generally works online in real time, while the recommendation computing system works offline, which minimizes the load on the server and improves the execution efficiency of the recommendation system.
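The offline/online division of labour described above can be illustrated with a minimal in-process sketch. It assumes ratings held in a plain Python dict rather than in HDFS or a relational database, and the names `offline_batch`, `OnlineService` and `popular_unseen` are hypothetical; `popular_unseen` is a toy placeholder strategy, not one of the algorithms used in the CCBRS.

```python
from collections import Counter
from typing import Callable, Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def offline_batch(ratings: Ratings,
                  recommend: Callable[[Ratings, str, int], List[str]],
                  n: int = 5) -> Dict[str, List[str]]:
    """Offline side: precompute a top-n list for every user.
    (In a real deployment this would be a batch job on the cluster.)"""
    return {user: recommend(ratings, user, n) for user in ratings}


class OnlineService:
    """Online side: answers requests from the precomputed table only,
    so no recommendation computation happens on the request path."""

    def __init__(self, table: Dict[str, List[str]], fallback: List[str]):
        self.table = table
        self.fallback = fallback  # hypothetical default list for unknown users

    def recommend(self, user: str) -> List[str]:
        return self.table.get(user, self.fallback)


def popular_unseen(ratings: Ratings, user: str, n: int) -> List[str]:
    """Toy strategy: most-rated items the user has not rated yet
    (ties keep first-seen order, as Counter.most_common does)."""
    counts = Counter(item for prefs in ratings.values() for item in prefs)
    seen = ratings.get(user, {})
    return [item for item, _ in counts.most_common() if item not in seen][:n]
```

Swapping the `recommend` argument is where a different strategy would be plugged in; the online side is unchanged:

```python
ratings = {"u1": {"a": 5, "b": 3}, "u2": {"a": 4, "c": 2}, "u3": {"b": 1, "c": 5, "d": 4}}
service = OnlineService(offline_batch(ratings, popular_unseen, n=2), fallback=["a", "b"])
service.recommend("u1")  # ["c", "d"]
```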

Description of the drawings:

Fig. 1: recommendation time of each algorithm in the stand-alone environment

Fig. 2: running time of Item Clustering in the stand-alone environment

Fig. 3: running time of SlopeOne in pseudo-distributed mode

Fig. 4: running time of the improved Item-Based algorithm executed in distributed mode

Embodiment:

A Hadoop-based distributed cloud environment has two main modes: Hadoop pseudo-distributed mode on a single machine, and Hadoop fully distributed mode. The experimental environment uses Hadoop 0.20; since Hadoop requires a JDK at run time, the corresponding JDK 1.6.0_24 was selected. The development environment is Eclipse with the Hadoop Eclipse plugin, and the hardware platform is configured as follows: OS: CentOS 5.5 x64; CPU: Intel(R) Xeon(R) E5420 @ 2.50 GHz; memory: 4 GB RAM. The tests mainly use four PCs (PC1 to PC4) to build the cloud computing environment, with PC1 serving as the NameNode and JobTracker and PC2 to PC4 serving as DataNodes and TaskTrackers.

On every PC, the /etc/hosts file is configured as follows: masters: 192.168.10.1; slaves: 192.168.10.1, 192.168.10.2, 192.168.10.3, 192.168.10.4. In addition, ssh-keygen is used on PC1 to generate PC1's key pair, and its public key is then copied into the /home/.ssh directory of each machine, so that PC1 can log in to every machine over SSH without a password. For the key Hadoop configuration, localhost in conf/masters and conf/slaves on every machine is replaced with the corresponding IP addresses, and the IP addresses of the NameNode and JobTracker are configured in conf/mapred-site.xml. Table 3 lists the key Hadoop configuration parameters; they are set by modifying conf/core-site.xml, conf/mapred-site.xml and conf/hdfs-site.xml.
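Since Table 3 itself is not reproduced here, the following is only a sketch of how the four configuration files could be generated, not the values actually used. It relies on property names that Hadoop 0.20.x does use (`fs.default.name` in core-site.xml for the NameNode address, `mapred.job.tracker` in mapred-site.xml for the JobTracker address, `dfs.replication` in hdfs-site.xml); the ports 9000/9001 and the replication factor are assumptions, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# PC1 hosts the NameNode and JobTracker; the slave list follows the text above.
MASTER = "192.168.10.1"
SLAVES = ["192.168.10.1", "192.168.10.2", "192.168.10.3", "192.168.10.4"]

# Assumed values: ports and replication factor are NOT taken from the source.
SITE_FILES: Dict[str, Dict[str, str]] = {
    "core-site.xml": {"fs.default.name": f"hdfs://{MASTER}:9000"},
    "mapred-site.xml": {"mapred.job.tracker": f"{MASTER}:9001"},
    "hdfs-site.xml": {"dfs.replication": "3"},
}


def render_site(props: Dict[str, str]) -> str:
    """Render one Hadoop *-site.xml body as a <configuration> element."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def render_all() -> Dict[str, str]:
    """Return file name -> contents for conf/masters, conf/slaves and the site files."""
    files = {name: render_site(props) for name, props in SITE_FILES.items()}
    files["masters"] = MASTER + "\n"
    files["slaves"] = "\n".join(SLAVES) + "\n"
    return files


if __name__ == "__main__":
    for name, text in render_all().items():
        print(f"--- conf/{name}\n{text}")
```

Generating the files from one source of truth keeps the four PCs consistent; the output would then be copied into each machine's conf/ directory.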

The data sets used in the tests come from MovieLens [18] and Libimseti [19]. MovieLens supplies three data sets containing 100,000, 1,000,000 and 10,000,000 movie ratings (1.88 MB, 23.4 MB and 234 MB respectively); the Libimseti data set contains 17.36 million anonymous ratings generated by about 140,000 users (253 MB). First, the six aforementioned recommendation algorithms are run in the stand-alone environment to recommend 5 items to an arbitrary specified user, with K = 10 in the SVD algorithm and 10 neighbours in the improved KNN algorithm. Because the Item Clustering algorithm takes a long time on data sets of more than 100,000 ratings, it was tested on several smaller data sets instead.
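The improvement to the KNN algorithm is not described in this section, so the sketch below shows only plain user-based KNN collaborative filtering with cosine similarity, parameterised to mirror the test settings (10 neighbours, 5 recommended items). The function names and the in-memory rating dict are hypothetical; this is a textbook baseline, not the CCBRS implementation.

```python
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Vector cosine similarity, treating unrated items as 0."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_knn_top_n(ratings: Ratings, user: str, k: int = 10, n: int = 5) -> List[str]:
    """Predict unseen items as the similarity-weighted average rating of the
    k most similar users, then return the n highest-scoring item ids."""
    target = ratings[user]
    neighbours = sorted(
        ((cosine(target, prefs), other) for other, prefs in ratings.items() if other != user),
        reverse=True,
    )[:k]
    scores: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for sim, other in neighbours:
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, r in ratings[other].items():
            if item in target:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * r
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=lambda i: (-predicted[i], i))[:n]
```

Every call compares the target user with all other users, which is one reason neighbourhood methods slow down as the data set grows.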

As Fig. 1 shows, the user-based and item-based recommendation algorithms perform well overall; the execution time of the SVD algorithm and the improved KNN algorithm increases sharply as the data volume grows; and the SlopeOne algorithm, limited by memory, becomes slower and slower as the data volume increases and runs out of memory on the 10-million-rating data set. As Fig. 2 shows, the execution time of the Item Clustering algorithm also increases sharply as the data volume grows. Next, the SlopeOne algorithm was tested on a pseudo-distributed platform built from four PCs; its execution time on data sets of different sizes is shown in Fig. 3.
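The memory behaviour of SlopeOne follows from its model: it stores an average rating deviation for every pair of co-rated items. The sketch below is the standard weighted Slope One scheme in plain Python, not the code used in the tests; the function names are hypothetical. The deviation table it builds grows roughly with the square of the number of items, which is consistent with an in-memory model running out of memory on the largest data set.

```python
from collections import defaultdict
from typing import Dict, Tuple

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}
DevTable = Dict[Tuple[str, str], Tuple[float, int]]  # (i, j) -> (avg dev, count)


def build_deviations(ratings: Ratings) -> DevTable:
    """Average deviation dev(i, j) = mean(r_i - r_j) over users who rated both,
    stored with the co-rating count for every ordered item pair."""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for prefs in ratings.values():
        for i, ri in prefs.items():
            for j, rj in prefs.items():
                if i != j:
                    sums[(i, j)] += ri - rj
                    counts[(i, j)] += 1
    return {pair: (sums[pair] / counts[pair], counts[pair]) for pair in sums}


def predict(prefs: Dict[str, float], item: str, dev: DevTable) -> float:
    """Weighted Slope One: average of (dev(item, j) + r_j) over the user's
    rated items j, weighted by how many users co-rated item and j."""
    num, den = 0.0, 0
    for j, rj in prefs.items():
        if (item, j) in dev:
            d, c = dev[(item, j)]
            num += (d + rj) * c
            den += c
    if den == 0:
        raise ValueError(f"no co-rated items for {item!r}")
    return num / den
```

For example, with John rating A=5, B=3, C=2, Mark rating A=3, B=4 and Lucy rating B=2, C=5, the predicted rating of A for Lucy is ((0.5 + 2)·2 + (3 + 5)·1) / 3 = 13/3 ≈ 4.33.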

Fig. 3: running time of the SlopeOne algorithm in pseudo-distributed mode

Fig. 4: running time of the improved Item-Based algorithm executed in distributed mode

As Fig. 3 shows, the overall execution time of the SlopeOne algorithm on the pseudo-distributed platform improves markedly. However, as in the stand-alone environment, when the data volume is too large the job cannot finish within a reasonable time because of memory limits: at 10 million ratings the algorithm runs out of memory. Finally, the improved Item-Based algorithm was run in distributed mode on the four PCs; its execution time for different data volumes is shown in Fig. 4.
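The improvement made to the Item-Based algorithm is not described in this section. As an illustration of why item-based recommendation distributes well, the sketch below simulates, in plain local Python, the widely used co-occurrence formulation expressed as map, shuffle and reduce steps. The function names are hypothetical and nothing here calls Hadoop or Mahout; on a cluster each step would run as part of a MapReduce job.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def map_user(user: str, items: List[str]) -> Iterator[Tuple[Pair, int]]:
    """Mapper: one input record per user; emit ((item_i, item_j), 1)
    for every ordered pair of items in that user's history."""
    for i, j in permutations(sorted(set(items)), 2):
        yield (i, j), 1


def reduce_pair(pair: Pair, values: Iterable[int]) -> Tuple[Pair, int]:
    """Reducer: total co-occurrence count for one item pair."""
    return pair, sum(values)


def run_job(histories: Dict[str, List[str]]) -> Dict[Pair, int]:
    """Local stand-in for one MapReduce job: map, shuffle (group by key), reduce."""
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for user, items in histories.items():
        for key, value in map_user(user, items):
            groups[key].append(value)  # the shuffle groups values by key
    return dict(reduce_pair(key, values) for key, values in groups.items())


def recommend(cooc: Dict[Pair, int], history: List[str], n: int = 5) -> List[str]:
    """Score each unseen item by its summed co-occurrence with the user's items."""
    seen = set(history)
    scores: Dict[str, int] = defaultdict(int)
    for (i, j), count in cooc.items():
        if i in seen and j not in seen:
            scores[j] += count
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]
```

Because each mapper only needs one user's history and each reducer only one item pair, the work splits naturally across TaskTrackers, unlike an in-memory model that must hold all pairwise statistics on one machine.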

In addition, the number of nodes was further increased to 5 and 7, and the improved Item-Based algorithm was run in distributed mode on the 17.36 million profile ratings. With 5 nodes, processing the 17.36 million ratings took 5.5 hours; with 7 nodes, it took 3.5 hours. This shows the execution time decreasing as the number of nodes increases, and provides good experimental support for significantly improving recommendation system performance by using an appropriately sized cloud computing environment built from many nodes.
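The two reported timings can be turned into a speedup figure with simple arithmetic. The sketch below uses only the numbers stated above; with just two measurements it shows the ratio between these two cluster sizes and should not be read as a general scaling law.

```python
# Reported above: improved Item-Based algorithm on 17.36 million ratings.
runs = {5: 5.5, 7: 3.5}  # number of nodes -> execution time in hours

speedup = runs[5] / runs[7]                        # how much faster 7 nodes ran than 5
node_ratio = 7 / 5                                 # how many more nodes were used
node_hours = {n: n * t for n, t in runs.items()}   # total node time consumed per run
relative_efficiency = speedup / node_ratio         # > 1: faster than proportional

print(f"speedup {speedup:.2f}x for {node_ratio:.1f}x nodes")
print(f"node-hours: {node_hours}")
print(f"relative efficiency {relative_efficiency:.2f}")
```

Going from 5 to 7 nodes (1.4x) gave about a 1.57x speedup, and total node time fell from 27.5 to 24.5 node-hours, so between these two sizes the extra nodes paid for themselves.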

In addition to the embodiments described above, the present invention may have other embodiments. All technical solutions formed by equivalent substitution or equivalent transformation fall within the scope of protection claimed by the present invention.

Claims

1. A method for implementing a recommendation system based on cloud computing technology, characterized mainly in that it comprises the following functional modules: a data storage system, a recommendation computing system and a business application system.

2. The method for implementing a recommendation system based on cloud computing technology according to claim 1, further characterized in that:

the data storage system comprises a real-time interactive database (serving the business application system and based on a common relational database) and a distributed file system (using Hadoop HDFS to provide highly reliable distributed file storage, distributing massive data across a multi-computer cluster);

the recommendation computing system comprises a data preprocessing module (cleaning, transforming and loading heterogeneous data), a data mining module (producing user clusters and item clusters through clustering and association-rule algorithms) and a recommendation module (computing recommendation results with content-based filtering, collaborative filtering, hybrid recommendation algorithms, etc.);

the business application system customizes the recommendation strategy according to actual business needs; the actual computation is performed by the recommendation computing system, and the results are called by the business system.