CN109471982B

CN109471982B - Web service recommendation method based on QoS (quality of service) perception of user and service clustering

Info

Publication number: CN109471982B
Application number: CN201811394933.8A
Authority: CN
Inventors: 暴建民; 温韬
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2018-11-21
Filing date: 2018-11-21
Publication date: 2022-06-17
Anticipated expiration: 2038-11-21
Also published as: CN109471982A

Abstract

The invention provides a QoS-aware Web service recommendation method based on user and service clustering. Starting from the QoS records of users' service invocations, the method uses cosine similarity to cluster users with similar service preferences. It then extracts and normalizes feature words from the WSDL description documents of the services and clusters services with similar functional features using the K-means++ algorithm. Finally, it integrates the similar-user sets and similar-service sets into a collaborative filtering algorithm to compute the missing QoS values of the service invocation matrix and generate recommendations. Simulation experiments on the data set provided by the WSDream open-source project show that, compared with the prior art, the method achieves better results on the two main evaluation metrics, mean absolute error and root mean square error.

Description

Web service recommendation method based on QoS (quality of service) perception of user and service clustering

Technical Field

The invention relates to the field of service recommendation in data mining, in particular to a Web service recommendation method based on QoS (quality of service) perception of user and service clustering.

Background

With the rapid development of the Internet and the explosive growth of Web services with diverse functions, it is increasingly difficult for users to select appropriate Web services, so there is a strong need for methods that automatically and efficiently select suitable Web services and recommend them to users. Faced with a massive number of service types and a large volume of service information, recommending relevant services efficiently and accurately according to each user's particular needs has become a research problem in the industry. Web service recommendation offers a way out of this dilemma.

Web service recommendation builds a recommendation model from a user's basic information, historical invocation records, preference attributes, and similar data. It has become a research hotspot in service computing in recent years, and academic research on it is now mature: QoS-based prediction and recommendation of services and related variables has been achieved, and recommendation accuracy has been further improved by taking other factors into account, such as geographic or service-function factors, with good results. However, the combined influence of user neighborhoods and service functional attributes on recommendation results when the QoS matrix is sparse has not been addressed. The performance of a Web service recommendation system depends on the recommendation method it uses, and Web service recommendation methods rest on three basic assumptions: 1. a user who likes a certain service also likes services similar to it; 2. users like the services used by users with similar backgrounds and preferences; 3. a user who likes a service with a certain feature also likes other services with that feature. Current Web service recommendation methods mainly include IF/THEN rule-based recommendation, content-based recommendation, collaborative filtering recommendation, and hybrid recommendation. Among them, Collaborative Filtering Recommendation (CFR) is currently a relatively mature and mainstream approach.
Compared with other recommendation algorithms, collaborative filtering has the following advantages: first, it places no special requirements on the recommended objects, so it can recommend complex and abstract resources; second, it needs only explicit or implicit historical rating data from users, requires no attribute knowledge about the users themselves, and does not negatively affect the user's recommendation experience. Although collaborative filtering has produced many important research results, several key problems remain to be solved, chiefly data sparsity, cold start, and scalability. Sparsity means that users have rated few of the Web services in the current data set, so only a few latent factors influencing users' preferences for Web services can be learned, yet each of these factors strongly affects the recommendation result, which greatly reduces recommendation accuracy. Many researchers in China and abroad have attempted to improve the accuracy and performance of Web service recommendation algorithms, with fruitful results. Zheng et al. introduced geographic location factors, alleviated the sparsity of the prediction matrix through matrix factorization, and used stochastic gradient descent to converge quickly to a prediction; their improved collaborative filtering algorithm gives more accurate predictions than other algorithms. Duncylin et al. proposed a collaborative filtering algorithm based on item rating prediction, which fills the empty ratings in the union of the users' rated items using item-based collaborative filtering and then computes user similarity on the filled union.
Zhang et al. pointed out that most of the nearest neighbors found by user-based collaborative filtering have not rated the target item, which seriously affects rating-prediction accuracy. They set an iteration threshold for the nearest neighbors and run user-based collaborative filtering repeatedly to fill in the neighbors' unrated items; once all nearest neighbors have ratings for the target item, the final predicted rating is produced with traditional collaborative filtering. Guosheng Kang et al. compute ranking scores for Web service candidates from users' QoS preferences and their diversity characteristics over potential services: using users' interests and QoS preferences in their historical Web service invocations, they propose a diversity-aware Web service ranking algorithm that builds a Web service graph from the functional similarity between Web services and evaluates candidates by their scores and by the diversity derived from that graph. Yan Hu et al. proposed an improved time-aware collaborative filtering method that integrates time information into similarity measurement and QoS prediction for high-performance Web service recommendation.

These studies perform well at QoS-based prediction and recommendation of services and related variables, but most of them ignore the strong influence of service functions and user similarity on recommendation accuracy, and many recommendation models are severely affected by the high sparsity of QoS matrix data. The resulting low recommendation accuracy is a problem that urgently needs to be solved.

Disclosure of Invention

To solve the above problems, the present invention provides a QoS-aware Web service recommendation method based on user and service clustering. The method first extracts contextual features from the WSDL documents of Web services to obtain their functional descriptions and clusters the services accordingly. It then clusters users by the similarity of the QoS attributes of their historically invoked services, and uses an improved matrix factorization recommendation algorithm, combined with the service functional clusters, to predict QoS and generate recommendations, thereby mitigating the low similarity caused by the sparsity of the recommendation matrix. Finally, the algorithm was tested: 80% of the data set was used to train the algorithm and obtain the optimal parameters, and the remaining data were used for testing. Compared with three other mainstream Web service recommendation algorithms, the method achieved better results on the two error metrics MAE and RMSE.

The invention provides a QoS-aware Web service recommendation method based on user and service clustering, comprising the following steps:

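Step 1: preprocess the user-service invocation matrix in the data set and cluster similar users with the adjusted (modified) cosine similarity, obtaining for each user i the set U_(i) of the N most similar users. The calculation formula (1) is:

$$SimU(i,j)=\frac{\sum_{k\in I_{ij}}(R_{i,k}-\bar{R}_i)(R_{j,k}-\bar{R}_j)}{\sqrt{\sum_{k\in I_i}(R_{i,k}-\bar{R}_i)^2}\,\sqrt{\sum_{k\in I_j}(R_{j,k}-\bar{R}_j)^2}} \tag{1}$$
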
where I_ij is the set of items rated by both users i and j, I_i is the set of items rated by user i, I_j is the set of items rated by user j, R_i,k is the rating of user i for item k, R_j,k is the rating of user j for item k, and R̄_i, R̄_j are the average ratings of users i and j over their respective rated items;

Step 2: extract features from the functional descriptions in the WSDL files (Web service description documents), apply normalized Google distance processing, and cluster services with similar functional features using the K-means++ algorithm to obtain the similar-service set S_(i). The calculation formula (2) is:

where FV_s1 and FV_s2 are the feature word vectors of Web services S₁ and S₂ respectively, |FV_s1| is the cardinality (modulus) of the vector, and the Normalized Google Distance (NGD) is a relevance measure between two words, computed by normalizing page-hit data obtained from the Google search engine;

Step 3: solve for the feature vectors p_i and q_j of the user matrix U and the service matrix S of the service invocation matrix R;

Step 4: let r_i,j, an element of matrix R, be the QoS value of user i invoking service j, and denote a missing QoS value in R by r̂_i,j, with

$$\hat{r}_{i,j}=p_i^{T}q_j$$

Step 5: to calculate

for each non-empty r_i,j ∈ R, calculate

Step 6: for each j ∈ [1, n] and k ∈ U_(i), calculate

where e₁ is the update factor of p_i, I_i,j indicates whether user i has invoked service j (1 if invoked, 0 otherwise), and α is a weight factor; a larger α means neighboring users have a greater influence on the currently predicted QoS;

Step 7: for each i ∈ [1, m] and d ∈ S_(j), calculate

where e₂ is the update factor of q_j and β is a weight factor; a larger β means the functional characteristics of the service have a greater influence on the currently predicted QoS;

Step 8: compute p_i = p_i - γe₁ and q_j = q_j - γe₂, where γ is the iteration factor (step size) that controls the number and speed of iterations;

Step 9: repeat steps 5 to 8 until all non-empty r_i,j have been processed or the iteration termination condition is met;

Step 10: obtain the missing QoS values r̂_i,j and recommend them, together with the similar services, to the relevant users to complete the recommendation.
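The loop in steps 3 to 9 can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: because the update expressions of steps 5 to 7 are not reproduced in this text, it assumes a pairwise neighbor term α·SimU(i, k)·(p_i - p_k) in e₁ and β·SimS(j, d)·(q_j - q_d) in e₂, adds L2 regularization with a hypothetical weight `lam`, treats neighbor vectors as fixed when forming each gradient, and uses a fixed epoch count as the termination condition. The name `train_mf`, the latent dimension `l`, and the toy data are illustrative only.

```python
import numpy as np

def train_mf(R, mask, user_nbrs, svc_nbrs, sim_u, sim_s, l=10,
             alpha=0.4, beta=0.5, gamma=0.013, lam=0.01, epochs=200, seed=0):
    """Hypothetical sketch of neighbor-regularized matrix factorization.

    R         : (m, n) QoS matrix; cells with mask == 0 are ignored
    mask      : (m, n) invocation indicator I_ij (1 if user i invoked service j)
    user_nbrs : user_nbrs[i] lists the similar users U_(i)
    svc_nbrs  : svc_nbrs[j] lists the similar services S_(j)
    sim_u, sim_s : (m, m) SimU and (n, n) SimS similarity matrices
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = 0.1 * rng.standard_normal((m, l))  # row i is the user feature vector p_i
    Q = 0.1 * rng.standard_normal((n, l))  # row j is the service feature vector q_j
    for _ in range(epochs):  # assumed termination condition: fixed number of epochs
        for i in range(m):
            # e1: gradient w.r.t. p_i over all invoked services, plus user-neighbor term
            err = mask[i] * (Q @ P[i] - R[i])
            e1 = err @ Q + lam * P[i]
            for k in user_nbrs[i]:
                e1 += alpha * sim_u[i, k] * (P[i] - P[k])
            P[i] -= gamma * e1
        for j in range(n):
            # e2: gradient w.r.t. q_j over all users who invoked j, plus service-neighbor term
            err = mask[:, j] * (P @ Q[j] - R[:, j])
            e2 = err @ P + lam * Q[j]
            for d in svc_nbrs[j]:
                e2 += beta * sim_s[j, d] * (Q[j] - Q[d])
            Q[j] -= gamma * e2
    return P, Q

# Toy usage on a synthetic rank-2 matrix with one unobserved cell
rng = np.random.default_rng(1)
R = rng.random((6, 2)) @ rng.random((2, 5))
mask = np.ones_like(R)
mask[0, 0] = 0
user_nbrs = [[(i + 1) % 6] for i in range(6)]
svc_nbrs = [[(j + 1) % 5] for j in range(5)]
P, Q = train_mf(R, mask, user_nbrs, svc_nbrs, np.full((6, 6), 0.5),
                np.full((5, 5), 0.5), l=2, gamma=0.05, epochs=500)
R_hat = P @ Q.T  # predicted QoS, including the missing cell R_hat[0, 0]
```

The sketch updates all p_i before all q_j in each epoch, an alternating schedule chosen for clarity rather than taken from the source, which updates p_i and q_j together in step 8.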

Further improvement: in step 1, similar users are clustered as follows:

Step 1.1: input I_i, I_j, I_ij, R_i,k, R_j,k;

Step 1.2: compute

Step 1.3: for each user i (0 < i ≤ m) and each other user j ≠ i (the remaining m - 1 users), compute SimU(i, j) and store it in the temp mapping table;

Step 1.4: take the top N users in the temp mapping table to form the neighbor set of user i, U_(i) = {U_k | U_k ∈ Top-N(U_i), i ≠ k}, which completes the clustering.
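Steps 1.3 and 1.4 amount to sorting each user's similarity row and keeping the N best entries other than the user itself. A minimal NumPy sketch, assuming the temp mapping table is held as a dense (m, m) array of SimU values; the function name `top_n_neighbors` and the toy similarity values are illustrative only:

```python
import numpy as np

def top_n_neighbors(sim, n_neighbors):
    """Return U_(i) for every user i: indices of the N most similar other users.

    sim : (m, m) array of SimU(i, j) values (the temp mapping table of step 1.3).
    """
    m = sim.shape[0]
    n_keep = min(n_neighbors, m - 1)        # a user can have at most m - 1 neighbors
    neighbors = []
    for i in range(m):
        scores = sim[i].astype(float)       # astype returns a copy, so sim is untouched
        scores[i] = -np.inf                 # exclude the user itself (i != k)
        order = np.argsort(-scores, kind="stable")  # descending similarity
        neighbors.append([int(k) for k in order[:n_keep]])
    return neighbors

sim = np.array([[1.0, 0.9, 0.1, 0.4],
                [0.9, 1.0, 0.2, 0.3],
                [0.1, 0.2, 1.0, 0.8],
                [0.4, 0.3, 0.8, 1.0]])
print(top_n_neighbors(sim, 2))  # [[1, 3], [0, 3], [3, 1], [2, 0]]
```

The embodiment uses N = 10; the toy call uses N = 2 only because the example has four users.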

Further improvement: in step 2, similar services are clustered as follows:

Step 2.1: perform TF-IDF computation on the WSDL file and extract the set of words describing the service's functional features, denoted FV_si;

Step 2.2: apply normalized Google distance processing to the feature word set, using formula (3):

$$NGD(x,y)=\frac{\max\{\log f(x),\log f(y)\}-\log f(x,y)}{\log M-\min\{\log f(x),\log f(y)\}} \tag{3}$$

where M is the total number of web pages indexed by Google, f(x) and f(y) are the numbers of hits returned when searching for feature words x and y respectively, and f(x, y) is the number of web pages in which x and y appear together;

Step 2.3: perform K-means++ clustering on the feature word vectors and select the number of clusters K;

Step 2.4: randomly select one of the service feature word vectors FV_si as the initial cluster center C₁;

Step 2.5: compute the distance D(x) between each vector and its nearest existing cluster center, and the probability that each vector becomes the next cluster center,

$$P(x)=\frac{D(x)^2}{\sum_{x'}D(x')^2}$$

then select the next center by roulette-wheel selection;

Step 2.6: repeat step 2.5 until K cluster centers have been selected;

Step 2.7: for each service vector FV_si, compute its normalized Google distance to each of the K cluster centers and assign it to the cluster whose center is nearest;

Step 2.8: for each cluster C_(i), recompute its center from all of its elements;

Step 2.9: repeat steps 2.7 and 2.8 until the cluster centers no longer change, obtaining the Web service clusters S_(i) = {s | s ∈ C_(i), i ∈ [1, k]}, which completes the clustering.
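Steps 2.3 to 2.9 describe standard K-means++ seeding followed by Lloyd-style reassignment. The NumPy sketch below follows that structure but swaps in squared Euclidean distance on numeric feature vectors in place of the normalized Google distance used in step 2.7, so it should be read as an illustration of the clustering loop under that assumption; the function names and toy vectors are hypothetical.

```python
import numpy as np

def _sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)

def kmeans_pp(X, k, max_iter=100, seed=0):
    """K-means++ clustering of service feature vectors X with shape (s, d).

    Assumption: Euclidean distance replaces the NGD-based distance of step 2.7.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 2.4: pick the first center uniformly at random
    centers = [X[rng.integers(len(X))]]
    # Steps 2.5-2.6: roulette-wheel choice with probability D(x)^2 / sum of D(x)^2
    while len(centers) < k:
        d2 = _sq_dists(X, np.array(centers)).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=probs)])
    C = np.array(centers)
    for _ in range(max_iter):
        # Step 2.7: assign each vector to its nearest center
        labels = _sq_dists(X, C).argmin(axis=1)
        # Step 2.8: recompute centers as member means (an empty cluster keeps its center)
        new_C = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else C[c]
                          for c in range(k)])
        # Step 2.9: stop when the centers no longer change
        if np.allclose(new_C, C):
            break
        C = new_C
    labels = _sq_dists(X, C).argmin(axis=1)  # final assignment to the final centers
    return labels, C

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans_pp(X, k=2)
```

With the two well-separated toy groups, the first three vectors end up in one cluster and the last three in the other.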

Further improvement: the range of NGD in step 2.2 is NGD(x, y) ∈ [0, ∞).

The beneficial effects of the invention are as follows. Features are extracted from important attributes of the services themselves, such as their functional descriptions, which play a crucial role in user preference; users are then clustered by similarity based on the QoS of the services they invoke. Factors influencing users' preferences for Web services are thus fully considered on both the server side and the client side, which solves the prior-art problem that recommendation accuracy suffers because the services themselves are not adequately considered. Because the sparsity of the user-service invocation matrix greatly affects the accuracy of service recommendation, a matrix factorization model is proposed to predict the missing QoS values; the model is solved iteratively by gradient descent, and results are obtained efficiently by setting the iteration factor according to the actual data volume.

Drawings

FIG. 1 is a scene framework diagram of the present invention.

FIG. 2 is a flow chart of the similar-user clustering algorithm of the present invention.

FIG. 3 is a flow chart of the similar-service clustering algorithm of the present invention.

FIG. 4 is a flow chart of the gradient descent algorithm of the present invention for predicting missing QoS values and generating recommendations.

FIG. 5 compares the recommendation accuracy (MAE) of the present invention with mainstream Web service recommendation algorithms on the same data set.

FIG. 6 compares the recommendation accuracy (RMSE) of the present invention with mainstream Web service recommendation algorithms on the same data set.

Detailed Description

For the purpose of enhancing understanding of the present invention, the present invention will be further described in detail with reference to the following examples, which are provided for illustration only and are not to be construed as limiting the scope of the present invention.

The data for the embodiment are provided by the open-source project WSDream, which contains invocation records of 339 users calling 5,825 real-world Web services, more than 1.5 million records in total. The raw data were cleaned and preprocessed; 80% of the data set was used for model training to determine the optimal parameters, and the remaining 20% was used to verify the effectiveness of the algorithm model.
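An 80/20 split of the observed invocation records, as used in this embodiment, might be sketched as below. The sketch assumes the QoS matrix is a NumPy array with missing records stored as `np.nan` (a hypothetical convention, not necessarily the WSDream file format), and the function name `split_observed` is illustrative.

```python
import numpy as np

def split_observed(R, train_ratio=0.8, seed=0):
    """Randomly split the observed (non-NaN) cells of a QoS matrix.

    Returns (train, test) matrices of the same shape as R: train keeps about
    train_ratio of the observed values, test keeps the rest, other cells are NaN.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.where(~np.isnan(R))          # coordinates of observed records
    idx = rng.permutation(len(rows))             # shuffle the observed records
    cut = int(round(train_ratio * len(rows)))
    train = np.full(R.shape, np.nan)
    test = np.full(R.shape, np.nan)
    tr, te = idx[:cut], idx[cut:]
    train[rows[tr], cols[tr]] = R[rows[tr], cols[tr]]
    test[rows[te], cols[te]] = R[rows[te], cols[te]]
    return train, test

R = np.arange(20, dtype=float).reshape(4, 5)
R[0, 0] = np.nan                                 # one missing record
train, test = split_observed(R)                  # 15 training cells, 4 test cells
```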

The architecture of the Web recommendation algorithm based on user and service clustering is shown in FIG. 1. First, user similarities are computed and users are clustered; then, text features are extracted from the functional descriptions in the WSDL documents of the Web services in the data set, normalized with the Google distance, and clustered with the K-means++ algorithm; finally, the similar users and services are integrated into a collaborative filtering recommendation model to predict QoS values, and recommendations are generated by comparing the predicted values with the QoS of the related services.

To describe the principle of the algorithm, the following definitions are given:

Definition 1: the service ratings of users i and j are given by the QoS values of their service invocations, and the similarity between user i and user j is defined on the basis of these QoS ratings as follows:

$$SimU(i,j)=\frac{\sum_{k\in I_{ij}}(R_{i,k}-\bar{R}_i)(R_{j,k}-\bar{R}_j)}{\sqrt{\sum_{k\in I_i}(R_{i,k}-\bar{R}_i)^2}\,\sqrt{\sum_{k\in I_j}(R_{j,k}-\bar{R}_j)^2}}$$

where I_ij is the set of items rated by both users i and j, I_i is the set of items rated by user i, I_j is the set of items rated by user j, R_i,k is the rating of user i for item k, R_j,k is the rating of user j for item k, and R̄_i, R̄_j are the average ratings of users i and j over their respective rated items.
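A direct Python transcription of the adjusted cosine similarity in Definition 1 is sketched below, assuming each user's ratings are held in a dictionary from service identifier to QoS value (a hypothetical representation). The numerator runs over the co-rated set I_ij while each factor of the denominator runs over that user's full rated set; returning 0 for a user with constant ratings is an added assumption, since the formula is undefined there.

```python
import math

def adjusted_cosine(ratings_i, ratings_j):
    """SimU(i, j) for two users given as {item: rating} dicts (non-empty)."""
    mean_i = sum(ratings_i.values()) / len(ratings_i)   # R̄_i over I_i
    mean_j = sum(ratings_j.values()) / len(ratings_j)   # R̄_j over I_j
    common = ratings_i.keys() & ratings_j.keys()         # I_ij: co-rated items
    num = sum((ratings_i[k] - mean_i) * (ratings_j[k] - mean_j) for k in common)
    den_i = math.sqrt(sum((r - mean_i) ** 2 for r in ratings_i.values()))
    den_j = math.sqrt(sum((r - mean_j) ** 2 for r in ratings_j.values()))
    if den_i == 0 or den_j == 0:
        return 0.0  # assumption: constant ratings give zero similarity
    return num / (den_i * den_j)

u1 = {"s1": 1.0, "s2": 2.0, "s3": 3.0}
u2 = {"s1": 2.0, "s2": 4.0, "s3": 6.0}
u3 = {"s1": 3.0, "s2": 2.0, "s3": 1.0}
print(adjusted_cosine(u1, u2), adjusted_cosine(u1, u3))  # approximately 1.0 and -1.0
```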

Definition 2: for the WSDL service description document of each Web service, the Term Frequency-Inverse Document Frequency (TF-IDF) method is used to obtain a feature word vector of the service description text, denoted FV_si, the feature word vector of service si. The feature word set is then converted into numerical values to facilitate similarity calculation, using NGD processing. NGD is a relevance measure between two words, normalized using data from the Google search engine, and is defined as follows:

$$NGD(x,y)=\frac{\max\{\log f(x),\log f(y)\}-\log f(x,y)}{\log M-\min\{\log f(x),\log f(y)\}}$$
where M is the total number of web pages indexed by Google, f(x) and f(y) are the numbers of hits returned when searching for feature words x and y respectively, and f(x, y) is the number of web pages in which x and y appear together. NGD(x, y) ∈ [0, ∞).
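The NGD of Definition 2 can be computed from page-hit counts with a few lines of Python. This sketch assumes the counts f(x), f(y), f(x, y) and the index size M have already been obtained (it does not query any search engine), and it returns infinity when the two words never co-occur, consistent with the range NGD(x, y) ∈ [0, ∞). The hit counts in the example are made up.

```python
import math

def ngd(f_x, f_y, f_xy, M):
    """Normalized Google Distance from page-hit counts.

    f_x, f_y : hits for words x and y; f_xy : hits for pages containing both;
    M : total number of pages indexed by the search engine.
    """
    if f_xy == 0:
        return math.inf  # the two words never appear together
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(M) - min(lx, ly))

print(ngd(1000, 1000, 1000, 10**9))  # 0.0: the words always co-occur
print(ngd(1000, 100, 10, 10**6))     # approximately 0.5
```

The logarithm base cancels in the ratio, so natural logarithms give the same value as any other base.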

Definition 3: the similarity of two web services is defined as:

FV_s1 and FV_s2 are the feature word vectors of Web services S₁ and S₂ respectively, and |FV_s1| is the cardinality of the vector.

Definition 4: let the rating matrix of user service invocations be expressed as R = U^T S, where U ∈ R^(l×m) is the similar-user feature matrix and S ∈ R^(l×n) is the similar-service feature matrix. The feature vector of user i in the user feature matrix is p_i, and the feature vector of service j is q_j. The missing value r̂_i,j of user i for service j in the rating matrix is defined as:

$$\hat{r}_{i,j}=p_i^{T}q_j$$
Definition 5: similar users and similar services are integrated into the matrix factorization model, expressed mathematically as follows:

U_(i) denotes the set of users similar to user i, and S_(j) denotes the set of services whose functional descriptions are similar to that of service j. I_i,j indicates whether user i has invoked service j: it is 1 if so and 0 otherwise.

‖·‖²_F denotes the squared Frobenius norm. The last two terms are regularization terms that keep the user and service matrices from overfitting, and λ_i, λ_j are parameters that control the strength of these constraints. SimU(i, k) is the similarity between user i and user k, and SimS(j, d) is the similarity between Web service j and service d. α and β are weight coefficients for the user and service features: a larger α means similar users have a greater influence on the currently predicted QoS, and a larger β means the functional characteristics of the service have a greater influence on the currently predicted QoS.
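The equation of Definition 5 is not reproduced in this text. As an illustration only, the sketch below evaluates one plausible objective built from the terms described above: a squared-error fit term masked by I_i,j, α- and β-weighted similarity terms over U_(i) and S_(j), and two squared-Frobenius-norm regularization terms. The pairwise form of the neighbor terms and the names `objective`, `lam_u`, `lam_s` are assumptions, not the patent's exact equation (5).

```python
import numpy as np

def objective(R, mask, P, Q, user_nbrs, svc_nbrs, sim_u, sim_s,
              alpha=0.4, beta=0.5, lam_u=0.01, lam_s=0.01):
    """Hypothetical Definition 5 objective; rows of P are p_i, rows of Q are q_j."""
    # Fit term: squared error on invoked (I_ij = 1) entries only
    fit = 0.5 * np.sum(mask * (R - P @ Q.T) ** 2)
    # Assumed user term: pull p_i toward its similar users, weighted by SimU(i, k)
    user_term = 0.5 * alpha * sum(sim_u[i, k] * np.sum((P[i] - P[k]) ** 2)
                                  for i in range(P.shape[0]) for k in user_nbrs[i])
    # Assumed service term: pull q_j toward its similar services, weighted by SimS(j, d)
    svc_term = 0.5 * beta * sum(sim_s[j, d] * np.sum((Q[j] - Q[d]) ** 2)
                                for j in range(Q.shape[0]) for d in svc_nbrs[j])
    # Regularization: squared Frobenius norms of the two factor matrices
    reg = 0.5 * lam_u * np.sum(P ** 2) + 0.5 * lam_s * np.sum(Q ** 2)
    return fit + user_term + svc_term + reg

R = np.array([[1.0, 2.0], [3.0, 4.0]])
sim = np.array([[1.0, 0.5], [0.5, 1.0]])
value = objective(R, np.ones_like(R), np.eye(2), np.eye(2),
                  [[1], [0]], [[1], [0]], sim, sim)  # 11 + 0.4 + 0.5 + 0.02 = 11.92
```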

Definition 6: to find the U and S that minimize equation (5), differentiate equation (5) with respect to the feature vectors; the elements p_i, q_j of the target feature matrices U and S are then updated iteratively as follows:

where γ is the iteration factor that controls the number and speed of iterations.

Definition 7: the metrics used to evaluate the accuracy of the recommendation results, Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), are defined as follows:

$$MAE=\frac{1}{N}\sum_{i,j}\left|r_{i,j}-\hat{r}_{i,j}\right|$$

$$RMSE=\sqrt{\frac{1}{N}\sum_{i,j}\left(r_{i,j}-\hat{r}_{i,j}\right)^2}$$
where r_i,j is the QoS value actually observed when user i invoked Web service j, r̂_i,j is the QoS value predicted by the model, and N is the number of predicted values. MAE measures the average absolute deviation of the predicted values from the true values, while RMSE gives greater weight to large errors. Smaller values of both indicate smaller errors and higher accuracy.

The flow of user clustering in the embodiment is shown in FIG. 2; the specific steps are as follows:

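Step 1.1: input I_i, I_j, I_ij, R_i,k, R_j,k;

Step 1.2: compute

Step 1.3: for each user i (0 < i ≤ m) and each other user j ≠ i, compute SimU(i, j) and store it in the temp mapping table;

Step 1.4: take the top N users in the temp mapping table to form the neighbor set of user i, U_(i) = {U_k | U_k ∈ Top-N(U_i), i ≠ k};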

The process of clustering Web services in the embodiment is shown in FIG. 3; the specific steps are as follows:

Step 2.1: perform TF-IDF computation on the WSDL file and extract the set of words describing the service's functional features, denoted FV_si;

Step 2.2: apply normalized Google distance processing to the feature word set, using the formula:

$$NGD(x,y)=\frac{\max\{\log f(x),\log f(y)\}-\log f(x,y)}{\log M-\min\{\log f(x),\log f(y)\}}$$

where M is the total number of web pages indexed by Google, f(x) and f(y) are the numbers of hits returned when searching for feature words x and y respectively, and f(x, y) is the number of web pages in which x and y appear together. NGD(x, y) ∈ [0, ∞);

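Step 2.3: perform K-means++ clustering on the feature word vectors and select the number of clusters K;

Step 2.4: randomly select one of the service feature word vectors FV_si as the initial cluster center C₁;

Step 2.5: compute the distance D(x) between each vector and its nearest existing cluster center, and the probability that each vector becomes the next cluster center,

$$P(x)=\frac{D(x)^2}{\sum_{x'}D(x')^2}$$

then select the next center by roulette-wheel selection;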

Step 2.6: repeat step 2.5 until K cluster centers have been selected;

Step 2.7: for each service vector FV_si, compute its normalized Google distance to each of the K cluster centers and assign it to the cluster whose center is nearest;

Step 2.8: for each cluster C_(i), recompute its center from all of its elements;

Step 2.9: repeat steps 2.7 and 2.8 until the cluster centers no longer change, obtaining the Web service clusters S_(i) = {s | s ∈ C_(i), i ∈ [1, k]};

The flow of the QoS-aware Web service recommendation method based on user and service clustering in the embodiment is shown in FIG. 4 and comprises the following steps:

Step 1: preprocess the user-service invocation matrix in the data set and cluster similar users with the adjusted (modified) cosine similarity, obtaining for each user i the set U_(i) of the N most similar users. The calculation formula is as follows:

$$SimU(i,j)=\frac{\sum_{k\in I_{ij}}(R_{i,k}-\bar{R}_i)(R_{j,k}-\bar{R}_j)}{\sqrt{\sum_{k\in I_i}(R_{i,k}-\bar{R}_i)^2}\,\sqrt{\sum_{k\in I_j}(R_{j,k}-\bar{R}_j)^2}}$$
Step 2: extract features from the functional descriptions in the WSDL files (Web service description documents), apply normalized Google distance processing, and cluster services with similar functional features using the K-means++ algorithm to obtain the similar-service set S_(i). The calculation formula is as follows:

where FV_s1 and FV_s2 are the feature word vectors of Web services S₁ and S₂ respectively, and |FV_s1| is the cardinality (modulus) of the vector. NGD is a relevance measure between two words, computed by normalizing data obtained from the Google search engine;

Step 3: solve for the feature vectors p_i and q_j of the user matrix U and the service matrix S of the service invocation matrix R;

Step 4: let r_i,j, an element of matrix R, be the QoS value of user i invoking service j, and denote a missing QoS value in R by r̂_i,j, with

$$\hat{r}_{i,j}=p_i^{T}q_j$$

Step 5: to calculate

for each non-empty r_i,j ∈ R, calculate

Step 6: for each j ∈ [1, n] and k ∈ U_(i), calculate

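Step 7: for each i ∈ [1, m] and d ∈ S_(j), calculate

Step 8: compute p_i = p_i - γe₁ and q_j = q_j - γe₂, where γ is the iteration factor that controls the number and speed of iterations;

Step 9: repeat steps 5 to 8 until all non-empty r_i,j have been processed or the iteration termination condition is met.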

Step 10: obtain the missing QoS values r̂_i,j and recommend them, together with similar services, to the relevant users.

To analyze the recommendation accuracy of the QoS-aware Web service recommendation algorithm based on user and service clustering in this embodiment, simulation experiments were performed on real-world data: the data set hosted by the open-source project WSDream on GitHub, which contains invocation records from 339 users on 5,825 Web services in 73 countries. To compare the recommendation accuracy of traditional service recommendation methods with that of the present method, six mainstream recommendation methods were selected and each was tested on the same data set; MAE and RMSE were computed and the recommendation accuracies compared. Smaller MAE and RMSE indicate smaller errors and higher recommendation accuracy.

Three mainstream Web service recommendation methods, IPCC, NIMF, and LoNMF, were selected for the comparison experiments. Each method was run 10 times, the mean MAE and RMSE were calculated, and the results were compared with those of the invention. The experiments were run on a Linux operating system with an Intel(R) Core(TM) i7-8550U processor and 8 GB of memory, and the simulations were implemented in Python and C++.
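The evaluation in Definition 7 and the 10-run averaging described above can be sketched in NumPy as follows. The function names are hypothetical, and the numbers in the example are toy values, not experimental results.

```python
import numpy as np

def mae_rmse(actual, predicted):
    """MAE and RMSE over the N predicted values (Definition 7)."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))

def mean_over_runs(run_results):
    """Average (MAE, RMSE) pairs collected from repeated runs of one method."""
    arr = np.array(run_results, dtype=float)
    return tuple(float(v) for v in arr.mean(axis=0))

print(mae_rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # MAE 0.5, RMSE about 0.645
```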

For the experimental part of the embodiment, the values of the key parameters were determined through extensive training and tuning: α = 0.4, β = 0.5, γ = 0.013, and N = 10. FIG. 5 and FIG. 6 compare the present invention with the six other conventional algorithms on MAE and RMSE, respectively. The abscissa is the matrix density; a higher density means more data is available. At every matrix density, the present invention achieves higher accuracy than the conventional algorithms.

Claims

1. A QoS-aware Web service recommendation method based on user and service clustering, characterized in that the method comprises the following steps:

are the average ratings of user i and user j over their respective rated items;

Step 2: extract features from the functional descriptions in the WSDL files (Web service description documents), apply normalized Google distance processing, and cluster services with similar functional features using the K-means++ algorithm to obtain the similar-service set S_(a), wherein the calculation formula (2) is:

where FV_s1 and FV_s2 are the feature word vectors of Web services S₁ and S₂ respectively, |FV_s1| is the cardinality (modulus) of the vector, and NGD is a relevance measure between two words, computed by normalizing data obtained from the Google search engine;

Step 3: solve for the feature vectors p_i and q_n of the user matrix U and the service matrix S of the service invocation matrix R;

Step 4: let r_i,n, an element of matrix R, be the QoS value of user i invoking service n, and denote a missing QoS value in R by r̂_i,n, with

$$\hat{r}_{i,n}=p_i^{T}q_n$$

Step 5: to calculate

for each non-empty r_i,n ∈ R, calculate

Step 6: for each service n (n ∈ [1, m]) and user u (u ∈ U_(i)), calculate

where e₁ is the update factor of p_i; I_i,n indicates whether user i has invoked service n, being 1 if so and 0 otherwise; α is a weight factor, and the larger α is, the greater the influence of neighboring users on the currently predicted QoS;

Step 7: for each user i (i ∈ [1, w]), with d the current service, calculate

where e₂ is the update factor of q_n; β is a weight factor, and the larger β is, the greater the influence of the service's functional characteristics on the currently predicted QoS;

Step 8: compute p_i = p_i - γe₁ and q_n = q_n - γe₂, where γ is the iteration factor that controls the number and speed of iterations;

Step 9: repeat steps 5 to 8 until all non-empty r_i,n have been processed or the iteration termination condition is met;

Step 10: obtain the missing QoS values r̂_i,n and recommend them, together with similar services, to the relevant users.

2. The method according to claim 1, wherein in step 1 similar users are clustered as follows:

Step 1.1: input I_i, I_j, I_ij, R_i,k, R_j,k;

Step 1.2: compute

Step 1.3: for each user i (0 < i ≤ w) and each other user j (0 < j ≤ w - 1), compute SimU(i, j) and store it in the temp mapping table;

Step 1.4: take the top N users in the temp mapping table to form the neighbor set of user i, U_(i) = {U_j | U_j ∈ TopN(U_i), i ≠ j}, which completes the clustering.

3. The method according to claim 1, wherein in step 2 similar services are clustered as follows:

Step 2.5: compute the distance D(x) between each vector and its nearest existing cluster center, and the probability that each vector becomes the next cluster center,

$$P(x)=\frac{D(x)^2}{\sum_{x'}D(x')^2}$$

then select the next center by roulette-wheel selection;

Step 2.6: repeat step 2.5 until K cluster centers have been selected;

Step 2.8: for each cluster C_(i), recompute its center from all of its elements;

Step 2.9: repeat steps 2.7 and 2.8 until the cluster centers no longer change, obtaining the Web service clusters S_(a) = {s | s ∈ C_(a), a ∈ [1, m], m ≥ 1}, which completes the clustering.

4. The QoS-aware Web service recommendation method based on user and service clustering according to claim 3, wherein the range of NGD in step 2.2 is NGD(x, y) ∈ [0, ∞).