CN113553493A

CN113553493A - Service selection method based on demand service probability matrix

Info

Publication number: CN113553493A
Application number: CN202010333583.5A
Authority: CN
Inventors: 刘睿霖; 徐汉川; 王忠杰; 涂志莹; 徐晓飞
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2021-10-26

Abstract

The invention discloses a service selection method based on a demand-service probability matrix. Step 1: cluster the user requirements in the historical service records using a fuzzy clustering method. Step 2: mine the solutions that already exist in the service system using the FP-growth (frequent pattern growth) algorithm, focusing on Web services that are frequently invoked together in the historical records. Step 3: establish a probability matrix to improve the efficiency of traditional service selection. Step 4: calculate the frequent service pattern most likely to be used by a new demand, as well as the likelihood of combining multiple frequent service patterns. The invention uses prior knowledge obtained from historical data as guidance, effectively reducing the search space of the service selection problem and thereby improving efficiency. The method overcomes the failure of the prior art to consider domain prior knowledge, and fills a gap in the service selection field by using existing partial service solution fragments to select services.

Description

Service selection method based on demand service probability matrix

Technical Field

The invention relates to the technical field of service selection, in particular to a service selection method based on a demand service probability matrix.

Background

With the proliferation of a wide variety of Web services on the internet, the number of available services continues to grow and the relationships between services become more complex, posing significant challenges to service selection. Because the search space is huge, existing methods struggle to select the optimal services from a large number of candidate services within a limited time and to construct a feasible service solution for the user.

Disclosure of Invention

The invention provides a service selection method based on a demand-service probability matrix. On the one hand, the method analyzes the distribution and characteristics of user demands and partitions a large number of demands with a fuzzy clustering method, forming service demand clusters with different characteristics. On the other hand, it mines frequent service patterns, i.e., valuable service solution fragments, from historical service solutions. It then constructs a demand-service probability matrix, which represents the statistical probability of the mapping between service demand clusters and frequent service patterns. Finally, the demand-service probability matrix is combined with traditional service selection methods to improve the efficiency of service selection.

The invention is realized by the following technical scheme:

A service selection method based on a demand-service probability matrix, the service selection method comprising the following steps:

Step 1: cluster the user requirements in the historical service records using a fuzzy clustering method;

Step 2: mine the solutions that already exist in the service system using the FP-growth (frequent pattern growth) algorithm, focusing on Web services that are frequently invoked together in the historical records;

Step 3: establish a probability matrix to improve the efficiency of traditional service selection;

Step 4: calculate the frequent service pattern most likely to be used by a new demand, as well as the likelihood of combining multiple frequent service patterns.

Further, the user requirements in step 1 are clustered by focusing on the non-functional requirements of the user and the constraint conditions on the service quality.

Further, the fuzzy clustering method in step 1 improves the similarity measure of the original algorithm: the Euclidean distance is replaced by the Pearson correlation coefficient and the cosine similarity, which are linearly superposed to calculate the similarity between two demands or the membership between a demand and a cluster center. The fuzzy clustering method establishes a membership matrix $\mu$ of size $M \times H$, aggregating the $H$ service demands into $M$ user demand clusters $Clu$, wherein the membership matrix is defined as follows:

$$\mu = [\mu_{ij}]_{M \times H}, \quad i = 1, \dots, M, \quad j = 1, \dots, H$$

wherein $\mu_{ij}$ represents the degree of membership of the j-th service demand in the i-th demand cluster, and the membership matrix must satisfy the following constraints:

$$\mu_{ij} \in [0, 1], \qquad \sum_{i=1}^{M} \mu_{ij} = 1 \;\; \text{for every } j, \qquad 0 < \sum_{j=1}^{H} \mu_{ij} < H \;\; \text{for every } i$$

The value of $\mu_{ij}$ ranges from 0 to 1, and a larger value means that demand j belongs to demand cluster i to a greater degree. For any demand j, its membership degrees over all demand clusters sum to 1. Since the clustering result consists of M demand clusters, the sum of all demand membership degrees within each demand cluster must be greater than 0 and less than H.
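The three constraints on the membership matrix can be checked mechanically. The following is a minimal sketch, assuming the matrix is stored as a list of M rows (clusters) with H entries each (demands); the function name `is_valid_membership` and the tolerance `tol` are illustrative and not part of the original disclosure.

```python
def is_valid_membership(mu, tol=1e-9):
    """Check the three constraints on an M x H membership matrix."""
    M = len(mu)
    H = len(mu[0])
    # Constraint 1: every membership degree lies in [0, 1].
    in_range = all(0.0 <= v <= 1.0 for row in mu for v in row)
    # Constraint 2: for each demand j, memberships over all clusters sum to 1.
    cols_ok = all(
        abs(sum(mu[i][j] for i in range(M)) - 1.0) <= tol for j in range(H)
    )
    # Constraint 3: each cluster's total membership is strictly between 0 and H.
    rows_ok = all(0.0 < sum(row) < H for row in mu)
    return in_range and cols_ok and rows_ok


# Hypothetical example: M = 2 clusters, H = 3 demands.
mu_example = [
    [0.9, 0.2, 0.5],
    [0.1, 0.8, 0.5],
]
# Degenerate case: cluster 0 absorbs everything and cluster 1 is empty.
mu_degenerate = [
    [1.0, 1.0],
    [0.0, 0.0],
]
```

The degenerate matrix fails the third constraint, which is exactly the situation that constraint is meant to exclude.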

Furthermore, different calculation methods are adopted for the quality-of-service constraints and the quality-of-service preferences in the user requirement:

the similarity of the quality-of-service constraints is calculated using the Pearson correlation coefficient,

the similarity of the quality-of-service preferences is calculated using the cosine similarity,
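The combined measure can be sketched as follows. This is an illustration only: it assumes each demand is represented by a numeric vector of QoS constraints and a numeric vector of QoS preferences, and it assumes the linear superposition uses a single weight `alpha`. The patent text does not give the weights or the vector encoding, so `alpha`, `demand_similarity`, and the sample vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (sd_x * sd_y)


def cosine(x, y):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def demand_similarity(constraints_a, prefs_a, constraints_b, prefs_b, alpha=0.5):
    """Linear superposition: PCC on constraints, cosine on preferences.

    alpha is an assumed weight; the source does not specify it.
    """
    return alpha * pearson(constraints_a, constraints_b) + (1 - alpha) * cosine(
        prefs_a, prefs_b
    )


# Hypothetical demands: (QoS constraint vector, QoS preference vector).
sim_same = demand_similarity([1, 2, 3], [0.5, 0.5], [2, 4, 6], [1.0, 1.0])
sim_diff = demand_similarity([1, 2, 3], [1.0, 0.0], [3, 2, 1], [0.0, 1.0])
```

In this sketch the first pair is perfectly correlated and perfectly aligned, while the second pair is anti-correlated with orthogonal preferences, so the first similarity is the larger one.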

Further, the clustering in step 1 specifically works by iteratively minimizing an objective function, so that the similarity between each sample point and its cluster center gradually approaches its maximum value. In each iteration, the optimal demand cluster centers and the membership matrix for that iteration are calculated. $Clu^{i}$ and $\mu^{i}$ denote, respectively, all demand cluster centers in the i-th iteration and the optimal membership matrix obtained by that iteration. The algorithm stops when the change in the membership matrix between two successive iterations is smaller than a preset threshold, or when the number of iterations reaches a preset maximum. The resulting demand cluster centers and optimal membership matrix represent the optimal clustering partition under the given parameter settings.
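The iteration and its two stopping conditions can be sketched as below. The objective function and update formulas are not reproduced in this text, so the sketch substitutes the standard fuzzy C-means updates with fuzzifier `m` and an assumed dissimilarity of 1 minus similarity; a plain cosine similarity stands in for the combined measure. The names `fuzzy_cluster`, `eps`, and `max_iter` are illustrative, and the sketch should not be read as the patented formulas.

```python
import math


def cosine(x, y):
    """Cosine similarity; stands in for the combined PCC + cosine measure."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def update_membership(demands, centers, similarity, m):
    M, H = len(centers), len(demands)
    mu = [[0.0] * H for _ in range(M)]
    for j, x in enumerate(demands):
        # Assumption: dissimilarity = 1 - similarity, clamped at 0.
        d = [max(1.0 - similarity(x, c), 0.0) for c in centers]
        exact = [i for i, v in enumerate(d) if v < 1e-12]
        if exact:  # demand coincides with a center: share membership among them
            for i in exact:
                mu[i][j] = 1.0 / len(exact)
            continue
        for i in range(M):
            mu[i][j] = 1.0 / sum(
                (d[i] / d[k]) ** (2.0 / (m - 1.0)) for k in range(M)
            )
    return mu


def update_centers(demands, mu, old_centers, m):
    dim = len(demands[0])
    centers = []
    for row, old in zip(mu, old_centers):
        w = [v ** m for v in row]
        total = sum(w)
        if total == 0.0:  # empty cluster: keep the previous center
            centers.append(list(old))
            continue
        centers.append(
            [sum(wj * x[t] for wj, x in zip(w, demands)) / total for t in range(dim)]
        )
    return centers


def fuzzy_cluster(demands, init_centers, similarity=cosine, m=2.0,
                  eps=1e-5, max_iter=100):
    centers = [list(c) for c in init_centers]
    mu = update_membership(demands, centers, similarity, m)
    for _ in range(max_iter):  # stop condition 2: maximum number of iterations
        centers = update_centers(demands, mu, centers, m)
        new_mu = update_membership(demands, centers, similarity, m)
        change = max(
            abs(a - b) for r_new, r_old in zip(new_mu, mu) for a, b in zip(r_new, r_old)
        )
        mu = new_mu
        if change < eps:  # stop condition 1: membership matrix barely changed
            break
    return centers, mu


# Hypothetical demand vectors forming two visible groups.
demands = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
centers, mu = fuzzy_cluster(demands, init_centers=[[1.0, 0.0], [0.0, 1.0]])
```

With these inputs the first two demands end up mostly in cluster 0 and the last two mostly in cluster 1, and every column of `mu` still sums to 1.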

Further, the FP-growth algorithm in step 2 comprises an FP-tree construction method and an FP-growth-based mining method.

The FP-tree construction method is as follows. The frequent pattern tree is a special prefix tree consisting of a root node, a frequent item header table, and item prefix subtrees. Each service prefix subtree is a child of the root node, and each node of a service prefix subtree consists of a service name, a support count, and a node pointer. The service name identifies the node in the frequent pattern tree. The support count is the number of transactions in the data set that satisfy the condition, where a transaction satisfies the condition if it contains all the frequent services on the path from the root node to that node. The node pointer points to the next node with the same service name in the frequent pattern tree, and is marked as null if no such node exists. The frequent item header table consists of service names and node head pointers, where each node head pointer points to the first node with that service name in the frequent pattern tree.

The FP-growth-based mining method is as follows. Each frequent service $ws_{ij}$ in the frequent item header table is traversed in ascending order of support, that is, from the tail of the header table to its head, and each frequent service is taken as an initial suffix pattern. For each frequent service $ws_{ij}$, the algorithm follows the node head pointer to find all potential frequent patterns containing $ws_{ij}$, and thereby creates a conditional pattern base for each suffix pattern. The conditional pattern base is a sub-dataset of the nodes that co-occur with the suffix pattern and consists of all prefix paths leading to the suffix pattern. If the number of times some node in the conditional pattern base co-occurs with the suffix pattern reaches the minimum support threshold, the algorithm uses the conditional pattern base to build a conditional frequent pattern tree for that suffix pattern (line 14). Mining then proceeds recursively on the conditional frequent pattern tree: as long as the conditional frequent pattern tree is not empty, the frequent service pattern can continue to grow.

Further, the frequent service pattern calculation in step 4 is specifically based on a probability matrix that describes the mapping probabilities between the M service demand clusters and the N frequent service patterns. Each element $o_{ij}$ of the matrix denotes the probability that the j-th frequent service pattern occurs in the i-th service demand cluster, i.e.

$$o_{ij} = p(sp_j \mid clu_i), \quad clu_i \in Clu, \quad sp_j \in SP$$

The conditional probability $p(sp_j \mid clu_i)$ is obtained by dividing the number of occurrences of the frequent service pattern $sp_j$ in all service solutions handled by the service demand cluster $clu_i$ by the total number of service solutions in that cluster.
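The counting rule for the conditional probability can be sketched as follows. The sketch assumes each historical service solution has already been assigned to a single demand cluster (for example the one with the highest membership; the text does not state how the assignment is made) and treats a pattern as occurring in a solution when all of its services appear there. The function name and the sample records are hypothetical.

```python
def build_probability_matrix(records, patterns, num_clusters):
    """Return an M x N matrix whose entry [i][j] estimates p(sp_j | clu_i).

    records:  list of (cluster_index, set_of_services) for historical solutions
    patterns: list of frozensets, one per frequent service pattern
    """
    totals = [0] * num_clusters
    hits = [[0] * len(patterns) for _ in range(num_clusters)]
    for cluster, services in records:
        totals[cluster] += 1
        for j, sp in enumerate(patterns):
            if sp <= services:  # pattern occurs if all its services are used
                hits[cluster][j] += 1
    # Occurrences of sp_j in cluster i divided by the solutions in cluster i.
    return [
        [hits[i][j] / totals[i] if totals[i] else 0.0 for j in range(len(patterns))]
        for i in range(num_clusters)
    ]


# Hypothetical history: three solutions in cluster 0, one in cluster 1.
records = [
    (0, {"ws73", "ws15", "ws32"}),
    (0, {"ws73", "ws15"}),
    (0, {"ws73", "ws22"}),
    (1, {"ws15", "ws48", "ws91"}),
]
patterns = [frozenset({"ws73", "ws15"}), frozenset({"ws48", "ws91"})]
prob_matrix = build_probability_matrix(records, patterns, num_clusters=2)
```

Here the first pattern occurs in two of the three solutions of cluster 0, so the corresponding entry is 2/3, and the second pattern occurs in the only solution of cluster 1.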

The invention has the beneficial effects that:

The invention uses prior knowledge obtained from historical data as guidance, effectively reducing the search space of the service selection problem and thereby improving efficiency. It overcomes the failure of the prior art to consider domain prior knowledge, and fills a gap in the service selection field by using existing partial service solution fragments to select services.

Drawings

FIG. 1 is a schematic flow diagram of the present invention.

FIG. 2 is a frequent pattern tree diagram of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Step 2: mine the solutions that already exist in the service system using the FP-growth (frequent pattern growth) algorithm, focusing on Web services that are frequently invoked together in the historical records. Although the reasons for bundling these services together vary, the solution fragments composed of these frequent services provide significant guidance for subsequent service solution construction;

Step 4: based on Bayesian theory, calculate the frequent service pattern most likely to be used by a new demand, as well as the likelihood of combining multiple frequent service patterns.
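The text does not spell out the formula used in step 4. One plausible reading, shown here only as a sketch, scores each frequent service pattern for a new demand by weighting the probability matrix rows with the new demand's cluster memberships (the law of total probability, under the assumption that pattern usage depends on the demand only through its cluster). The function `rank_patterns` and all numbers are hypothetical.

```python
def rank_patterns(memberships, prob_matrix):
    """Rank patterns by sum_i membership_i * p(sp_j | clu_i), highest first.

    memberships: membership degrees of the new demand in each of the M clusters
    prob_matrix: M x N matrix with entries p(sp_j | clu_i)
    """
    num_patterns = len(prob_matrix[0])
    scores = [
        sum(memberships[i] * prob_matrix[i][j] for i in range(len(memberships)))
        for j in range(num_patterns)
    ]
    order = sorted(range(num_patterns), key=lambda j: scores[j], reverse=True)
    return [(j, scores[j]) for j in order]


# Hypothetical new demand: mostly in cluster 0, partly in cluster 1.
memberships = [0.8, 0.2]
prob_matrix = [
    [0.6, 0.1, 0.3],
    [0.2, 0.7, 0.5],
]
ranked = rank_patterns(memberships, prob_matrix)
```

The first entry of `ranked` is the pattern a selection algorithm would try first; lower-ranked patterns could be used to prune or reorder the remaining search space.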

Further, the clustering of user requirements in step 1 focuses on the non-functional requirements of the user and the constraints on service quality. Representative features of each demand cluster are identified, which avoids analyzing every demand in the historical data. When a new demand arrives, the prior knowledge in its similar demand clusters can therefore be used as guidance to quickly construct a service solution.

Further, the fuzzy clustering method (Fuzzy C-Means, FCM) in step 1 improves the similarity measure of the original algorithm: the Euclidean distance is replaced by the Pearson correlation coefficient (PCC) and the cosine similarity (COS), which are linearly superposed to calculate the similarity between two demands or the membership between a demand and a cluster center. The fuzzy clustering method establishes a membership matrix $\mu$ of size $M \times H$, aggregating the $H$ service demands into $M$ user demand clusters $Clu$, wherein the membership matrix is defined as $\mu = [\mu_{ij}]_{M \times H}$, with $\mu_{ij}$ the degree of membership of the j-th service demand in the i-th demand cluster.

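Furthermore, the similarity of the quality-of-service constraints is calculated using the Pearson correlation coefficient, and the similarity of the quality-of-service preferences is calculated using the cosine similarity.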

Further, the clustering in step 1 specifically works by iteratively minimizing the objective function (lines 5-10). In each iteration, the optimal demand cluster centers (line 7) and the membership matrix (line 8) for that iteration are calculated. $Clu^{i}$ and $\mu^{i}$ denote, respectively, all demand cluster centers in the i-th iteration and the optimal membership matrix obtained by that iteration. The algorithm stops when the change in the membership matrix between two successive iterations is smaller than a preset threshold, or when the number of iterations reaches a preset maximum. The resulting demand cluster centers and optimal membership matrix represent the optimal clustering partition under the given parameter settings.

Further, step 2 uses the FP-growth algorithm to identify and discover valuable frequent service patterns in the history of the service system. The FP-growth algorithm compresses the database that provides the frequent item sets into a frequent pattern tree (FP-tree) while retaining the item set association information, thereby avoiding the generation and storage of the candidate frequent item sets otherwise required during frequent pattern mining. The algorithm consists of an FP-tree construction method and an FP-growth-based mining method, which are introduced in turn below.

The FP-tree construction method is as follows. The method uses a compressed data structure, the frequent pattern tree, to represent a data set with a large number of samples. The frequent pattern tree retains the information needed for frequent pattern mining and avoids the repeated data set scans otherwise required during mining, which reduces unnecessary scanning cost and improves the efficiency of the algorithm.

The frequent pattern tree is a special prefix tree consisting of a root node, a frequent item header table, and item prefix subtrees (called service prefix subtrees here, because frequent pattern mining is performed on service history records). Each service prefix subtree is a child of the root node, as shown in FIG. 2. Each node of a service prefix subtree consists of a service name, a support count, and a node pointer. The service name identifies the node in the frequent pattern tree. The support count is the number of transactions in the data set that satisfy the condition, where a transaction satisfies the condition if it contains all the frequent services on the path from the root node to that node. The node pointer points to the next node with the same service name in the frequent pattern tree, and is marked as null if no such node exists. The frequent item header table consists of service names and node head pointers, where each node head pointer points to the first node with that service name in the frequent pattern tree.

See Algorithm 2 for the FP-tree construction process. First, the frequent services in the service data set are identified and arranged in descending order of support (line 1), and a frequent item header table is built to facilitate traversal of the frequent pattern tree (line 2). The frequent item header table points to the location where each service occurs in the tree, and nodes with the same service name are linked in turn through their node pointers (line 5 of the method insert_tree). After all service solutions have been traversed, a tree with node links has been built, as shown in FIG. 2.

The construction of the frequent pattern tree is described below using example 1:

Example 1: given a service history Log (as shown in Tables 3-12) and a minimum support threshold ξ = 3, all service solutions in the data set are first traversed (first two columns of Table 1). The services whose frequency is at least the minimum support threshold are identified and ranked in descending order of support (written with the symbols <…>), that is, <(ws₇₃:4), (ws₁₅:4), (ws₃₂:3), (ws₄₈:3), (ws₂₂:3), (ws₉₁:3)>. The frequent services of each solution (third column of Table 1) and the frequent item header table are both arranged in descending order of support, which ensures that the nodes on every path of the frequent pattern tree follow the descending order of support.

Table 1. Service solutions and frequent services

Secondly, the root node of the frequent pattern tree is created and marked as a null node. The frequent pattern tree can then be generated by traversing the set of service solutions Sol just once more. It should be noted that every branch of the frequent pattern tree must follow the descending order of support.

Traversing service solution sol₁ yields the first branch of the frequent pattern tree, <(ws₇₃:1), (ws₁₅:1), (ws₃₂:1), (ws₄₈:1), (ws₂₂:1)>. Since every node on this branch occurs for the first time, the corresponding node head pointers in the frequent item header table are pointed to these nodes.

The frequent services in service solution sol₂ are <ws₇₃, ws₁₅, ws₃₂, ws₂₂>, which share the prefix <(ws₇₃:1), (ws₁₅:1), (ws₃₂:1)> with the existing branch <(ws₇₃:1), (ws₁₅:1), (ws₃₂:1), (ws₄₈:1), (ws₂₂:1)>. The support of every node on the prefix is therefore increased by one, and a new node (ws₂₂:1) is created as a child of the ws₃₂ node. Nodes with the same service name are then linked: the existing node (ws₂₂:1), a child of the ws₄₈ node, points to the newly created node (ws₂₂:1), a child of the ws₃₂ node.

The frequent services of service solution sol₃, <ws₇₃, ws₁₅, ws₃₂, ws₄₈, ws₉₁>, share the prefix <(ws₇₃:2), (ws₁₅:2), (ws₃₂:2), (ws₄₈:1)> with an existing path. The support of every node on the prefix is therefore increased by one, a child node (ws₉₁:1) of (ws₄₈:2) is created, and the node head pointer of ws₉₁ in the frequent item header table is pointed to the new node (ws₉₁:1).

The frequent service set of service solution sol₄ is <ws₇₃, ws₂₂, ws₉₁>, which shares only the prefix node ws₇₃ with the existing frequent pattern tree. The support of node ws₇₃ is increased by one, the sub-branch <(ws₂₂:1), (ws₉₁:1)> is created under (ws₇₃:4), and the frequent item header table is used to find and link the nodes that have the same service names.

Finally, traversing service solution sol₅ creates the second branch of the frequent pattern tree under the root, namely <(ws₁₅:1), (ws₄₈:1), (ws₉₁:1)>.
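The walkthrough of Example 1 can be replayed with a small prefix tree. This is a minimal sketch, assuming the frequent services of each solution are already sorted in descending order of support; the class `Node` and the helper names are illustrative and do not correspond to the line numbers of Algorithm 2.

```python
class Node:
    """One FP-tree node: service name, support count, and same-name link."""

    def __init__(self, name, parent):
        self.name = name
        self.count = 0
        self.parent = parent
        self.children = {}
        self.link = None  # next node with the same service name, or None


def build_fp_tree(transactions):
    root = Node(None, None)  # the null root node
    header = {}  # service name -> first node with that name (node head pointer)
    tails = {}   # service name -> last node in the same-name chain
    for services in transactions:
        node = root
        for ws in services:
            child = node.children.get(ws)
            if child is None:
                child = Node(ws, node)
                node.children[ws] = child
                if ws in tails:
                    tails[ws].link = child  # link to the previous same-name node
                else:
                    header[ws] = child      # first occurrence of this service
                tails[ws] = child
            child.count += 1  # shared prefix nodes simply gain support
            node = child
    return root, header


def linked_counts(header, name):
    """Follow the same-name chain from the header table and collect counts."""
    counts, node = [], header.get(name)
    while node is not None:
        counts.append(node.count)
        node = node.link
    return counts


# Frequent services of sol1..sol5 from Example 1, in descending support order.
transactions = [
    ["ws73", "ws15", "ws32", "ws48", "ws22"],
    ["ws73", "ws15", "ws32", "ws22"],
    ["ws73", "ws15", "ws32", "ws48", "ws91"],
    ["ws73", "ws22", "ws91"],
    ["ws15", "ws48", "ws91"],
]
root, header = build_fp_tree(transactions)
```

After all five solutions are inserted, the root has two branches (starting at ws₇₃ with count 4 and at ws₁₅ with count 1), and the chains for ws₂₂ and ws₉₁ each link three separate nodes.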

The FP-growth-based mining method is as follows. Each frequent service $ws_{ij}$ in the frequent item header table is traversed (lines 10-17) in ascending order of support, that is, from the tail of the header table to its head, and each frequent service is taken as an initial suffix pattern. For each frequent service $ws_{ij}$, the algorithm follows the node head pointer to find all potential frequent patterns containing $ws_{ij}$, and thereby creates a conditional pattern base for each suffix pattern (see the second column of Table 2). The conditional pattern base is a sub-dataset of the nodes that co-occur with the suffix pattern and consists of all prefix paths leading to the suffix pattern. If the number of times some node in the conditional pattern base co-occurs with the suffix pattern reaches the minimum support threshold, the algorithm uses the conditional pattern base to build a conditional frequent pattern tree for that suffix pattern at line 14 (denoted by the symbol { … } in the third column of Table 2). Mining then proceeds recursively on the conditional frequent pattern tree: as long as the conditional frequent pattern tree is not empty, the frequent service pattern can continue to grow (line 16).
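The suffix-pattern recursion can be illustrated compactly. The sketch below is a simplification: it performs pattern growth directly on conditional pattern bases stored as lists of (prefix path, count) pairs, and does not materialize the conditional frequent pattern trees or follow node head pointers as the described algorithm does. The input is the set of frequent services from Example 1; the function name `pattern_growth` is illustrative, and Table 2 is not available here for comparison.

```python
from collections import defaultdict


def pattern_growth(base, min_support, suffix=()):
    """Mine frequent patterns from a conditional pattern base.

    base: list of (tuple of services in descending-support order, count)
    Returns {frozenset(pattern): support}.
    """
    support = defaultdict(int)
    for items, count in base:
        for ws in items:
            support[ws] += count
    patterns = {}
    # Visit candidate services in ascending order of support.
    for ws, sup in sorted(support.items(), key=lambda kv: kv[1]):
        if sup < min_support:
            continue
        pattern = (ws,) + suffix  # grow the suffix pattern by one service
        patterns[frozenset(pattern)] = sup
        # Conditional pattern base: the prefix paths that lead to ws.
        cond = [
            (items[: items.index(ws)], count)
            for items, count in base
            if ws in items
        ]
        cond = [(prefix, count) for prefix, count in cond if prefix]
        patterns.update(pattern_growth(cond, min_support, pattern))
    return patterns


# Frequent services of sol1..sol5 from Example 1, each with count 1.
base = [
    (("ws73", "ws15", "ws32", "ws48", "ws22"), 1),
    (("ws73", "ws15", "ws32", "ws22"), 1),
    (("ws73", "ws15", "ws32", "ws48", "ws91"), 1),
    (("ws73", "ws22", "ws91"), 1),
    (("ws15", "ws48", "ws91"), 1),
]
frequent_patterns = pattern_growth(base, min_support=3)
```

For the suffix ws₃₂, for instance, the conditional pattern base is the prefix <ws₇₃, ws₁₅> with total count 3, so the patterns {ws₇₃, ws₃₂}, {ws₁₅, ws₃₂}, and {ws₇₃, ws₁₅, ws₃₂} are all found with support 3.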

It should be noted that a special structure, the single prefix path, may exist in the frequent pattern tree. A single prefix path means either that the frequent pattern tree consists of only one path, i.e., every node other than the leaf has exactly one child, or that a single path runs from the root node to the first branching node, where a branching node is a node with at least two children. If the frequent pattern tree T has a single prefix path P, the set of frequent patterns of T must include every combination of the nodes on P, and the support of each combination is that of its node with the smallest support. Handling single prefix paths in this way (lines 2-6) effectively improves the efficiency of mining frequent service patterns.
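The shortcut for a single prefix path amounts to enumerating combinations instead of recursing. A minimal sketch follows, assuming the path is given as a list of (service, support) pairs from the root downward; the service names wsA, wsB, wsC and the function name are hypothetical.

```python
from itertools import combinations


def single_path_patterns(path, min_support):
    """Enumerate every combination of nodes on a single prefix path.

    The support of a combination is the smallest support among its nodes.
    """
    patterns = {}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            support = min(count for _, count in combo)
            if support >= min_support:
                patterns[frozenset(name for name, _ in combo)] = support
    return patterns


# Hypothetical single path: root -> wsA:4 -> wsB:3 -> wsC:2
path = [("wsA", 4), ("wsB", 3), ("wsC", 2)]
path_patterns = single_path_patterns(path, min_support=3)
```

With a threshold of 3, every combination containing wsC is dropped because its support is limited to 2, leaving {wsA}, {wsB}, and {wsA, wsB}.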

Further, regarding the frequent service patterns in step 4: the statistical probability reflects the prior knowledge in the history records, and conditional probability is used to establish a mapping probability matrix between service demand clusters and frequent service patterns. The construction method of the probability matrix is shown in Algorithm 4. Given a service demand cluster, the statistics show which frequent service patterns are more likely to be adopted by the service solutions that satisfy the demands in that cluster.

The probability matrix describes the mapping probabilities between the M service demand clusters and the N frequent service patterns. Each element $o_{ij}$ of the matrix denotes the probability that the j-th frequent service pattern occurs in the i-th service demand cluster, i.e. $o_{ij} = p(sp_j \mid clu_i)$, with $clu_i \in Clu$ and $sp_j \in SP$.

Example 2

In a service system, the quality-of-service constraints and quality-of-service preferences contained in each user demand are personalized. Nevertheless, when a large number of user demands are gathered together, they still exhibit a specific demand distribution and can be divided into user demand groups with different characteristics. The embodiment of the invention uses a fuzzy clustering method (Fuzzy C-Means, FCM) to solve the user demand clustering problem and improves the similarity measure of the original algorithm: the Euclidean distance is replaced by the Pearson correlation coefficient (PCC) and the cosine similarity (COS), which are linearly superposed to calculate the similarity between two demands or the membership between a demand and a cluster center.

Example 3

A compressed data structure, i.e., a frequent pattern tree, is used to represent a data set having a large number of samples. The frequent pattern tree reserves important information required by frequent pattern mining, and multiple data set scanning required in the mining process is avoided, so that unnecessary scanning cost is reduced, and the algorithm efficiency is improved.

Example 4

The frequent pattern growth algorithm traverses each frequent service in the frequent item header table in ascending order of support, that is, from the tail of the header table to its head, and takes each frequent service as an initial suffix pattern. For each frequent service, the algorithm follows the node head pointer to find all potential frequent patterns containing that service, and thereby creates a conditional pattern base for each suffix pattern. The conditional pattern base is a sub-dataset of the nodes that co-occur with the suffix pattern and consists of all prefix paths leading to the suffix pattern. If the number of times some node in the conditional pattern base co-occurs with the suffix pattern reaches the minimum support threshold, the algorithm uses the conditional pattern base to build a conditional frequent pattern tree for that suffix pattern. Mining then proceeds recursively on the conditional frequent pattern tree: as long as the conditional frequent pattern tree is not empty, the frequent service pattern can continue to grow.

Example 5

Probability mapping relationships hidden between requirements and solutions are described and expressed by establishing probability matrices. The statistical probability reflects the prior knowledge in the historical records, and a mapping probability matrix reflecting the service demand cluster and the frequent mode is established by using the conditional probability. When a certain service demand cluster is given, which service frequent patterns are more prone to be adopted by the service solution meeting the demand in the cluster can be known according to the statistical result.

Example 6

The probability matrix reveals the prior knowledge in the service history, which can be extracted, organized, and used to guide the construction of the service solutions required by new service demands. In addition, the probability matrix reflects both the distribution of user demands and the distribution of frequent patterns. The embodiment of the invention uses the service selection problem as a case study to examine the effect of using the probability matrix. It selects a global planning optimization method (GP) based on integer programming and the artificial bee colony algorithm (ABC) as the methods to be improved. The two methods are strong in solution quality and in algorithm efficiency, respectively: the former finds the highest-quality feasible service solution by traversing the entire solution space, while the latter searches the solution space randomly and iteratively in an attempt to quickly find a near-optimal solution, i.e., an approximation of the global optimum.

The above embodiments of the present invention provide a service selection method. As a wide variety of Web services emerge on the internet, the number of available services keeps increasing and the relationships between services become more and more complex, which poses great challenges to service selection. Because the search space is huge, existing methods struggle to select the optimal services from a large number of candidate services within a limited time and to construct a feasible service solution for the user. The invention uses prior knowledge obtained from historical data as guidance, effectively reducing the search space of the service selection problem and thereby improving efficiency. The method overcomes the failure of the prior art to consider domain prior knowledge, and fills a gap in the service selection field by using existing partial service solution fragments to select services.

Claims

1. A service selection method based on a demand service probability matrix is characterized by comprising the following steps:

2. The method of claim 1, wherein the user requirements in step 1 are clustered to focus on non-functional requirements of users and constraints on service quality.

3. The method of claim 1, wherein the fuzzy clustering method in step 1 improves the similarity measure of the original algorithm, the Euclidean distance is replaced by the Pearson correlation coefficient and the cosine similarity, the Pearson correlation coefficient and the cosine similarity between two demands are linearly superposed to calculate the similarity between demands or the membership between a demand and a cluster center, and the fuzzy clustering method establishes a membership matrix $\mu$ of size $M \times H$, aggregating the $H$ service demands into $M$ user demand clusters $Clu$, wherein the membership matrix is defined as follows:

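4. The method of claim 3, wherein different calculation methods are used for the quality-of-service constraints and the quality-of-service preferences in the user demand,

the similarity of the quality-of-service constraints is calculated using the Pearson correlation coefficient,

the similarity of the quality-of-service preferences is calculated using the cosine similarity,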

5. the method as claimed in claim 2, wherein the clustering in step 1 is specifically to minimize an objective function formula through iteration

6. The method for selecting a service based on a demand service probability matrix according to claim 2, wherein the FP growth algorithm in the step 2 comprises a FP tree construction method and a FP-growth-based mining method;

7. The method of claim 1, wherein the frequent service pattern in step 4 is specifically based on a probability matrix that describes the mapping probabilities between M service demand clusters and N frequent service patterns, each element $o_{ij}$ of the matrix denoting the probability that the j-th frequent service pattern occurs in the i-th service demand cluster, i.e. $o_{ij} = p(sp_j \mid clu_i)$, $clu_i \in Clu$, $sp_j \in SP$,