CN113553493A - Service selection method based on demand service probability matrix - Google Patents


Info

Publication number
CN113553493A
Authority
CN
China
Prior art keywords
service
frequent
node
pattern
demand
Legal status
Pending
Application number
CN202010333583.5A
Other languages
Chinese (zh)
Inventor
刘睿霖 (Liu Ruilin)
徐汉川 (Xu Hanchuan)
王忠杰 (Wang Zhongjie)
涂志莹 (Tu Zhiying)
徐晓飞 (Xu Xiaofei)
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology
Priority to CN202010333583.5A
Publication of CN113553493A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques

Abstract

The invention discloses a service selection method based on a demand service probability matrix. Step 1: the user requirements in historical service records are clustered using a fuzzy clustering method. Step 2: the solutions already present in the service system are mined with the FP-growth (frequent pattern growth) algorithm, focusing on Web services that are frequently invoked together in the history. Step 3: a probability matrix is established to improve the efficiency of traditional service selection. Step 4: the service frequent patterns most likely to be used by a new requirement are calculated, together with the likelihood of combining multiple service frequent patterns. Guided by prior knowledge obtained from historical data, the invention effectively reduces the search space of the service selection problem and thereby improves efficiency. It overcomes the shortcoming that the prior art does not consider domain prior knowledge, and fills the gap in the service selection field of selecting services using existing partial service solution fragments.

Description

Service selection method based on demand service probability matrix
Technical Field
The invention relates to the technical field of service selection, in particular to a service selection method based on a demand service probability matrix.
Background
With the proliferation of Web services on the Internet, the number of available services keeps growing and the relationships between services grow more complex, posing significant challenges to service selection. Because the search space is huge, existing methods struggle to select the optimal services from a large number of candidates within a limited time and to construct a feasible service solution for the user.
Disclosure of Invention
The invention provides a service selection method based on a demand service probability matrix. On the one hand, it analyzes the distribution patterns and characteristics of user requirements and partitions a large number of requirements with a fuzzy clustering method, forming service requirement clusters with different characteristics. On the other hand, it mines service (frequent) patterns, i.e., valuable service solution fragments, from historical service solutions. Next, it constructs a demand-service probability matrix that represents the statistical probability of the mapping between service requirement clusters and service frequent patterns. Finally, it combines the demand-service probabilities with a traditional service selection method to improve service selection efficiency.
The invention is realized by the following technical scheme:
A service selection method based on a demand service probability matrix comprises the following steps:
Step 1: clustering the user requirements in historical service records using a fuzzy clustering method;
Step 2: mining the solutions existing in a service system using the FP-growth (frequent pattern growth) algorithm, focusing on Web services that are frequently invoked together in the history;
Step 3: establishing a probability matrix to improve the efficiency of the traditional service selection problem;
Step 4: calculating the service frequent patterns most likely to be used by a new requirement, and the likelihood of combining multiple service frequent patterns.
Further, in step 1 the user requirements are clustered with a focus on users' non-functional requirements and their constraints on quality of service (QoS).
Further, the fuzzy clustering method in step 1 improves the similarity measure of the original algorithm by replacing the Euclidean distance with the Pearson correlation coefficient and cosine similarity. The Pearson correlation coefficient and cosine similarity between two requirements are linearly superimposed and used to calculate the similarity between requirements, or the membership of a requirement to a cluster center. A membership matrix μ_{M×H} of size M×H is established, which aggregates the H service requirements into M user requirement clusters Clu. The membership matrix is defined as follows:
$$\mu_{M\times H}=\left[\mu_{ij}\right],\quad i=1,\dots,M,\ j=1,\dots,H$$
where μ_ij is the degree of membership of the jth service requirement in the ith requirement cluster. The membership matrix must satisfy the following constraints:
$$\mu_{ij}\in[0,1];\qquad \sum_{i=1}^{M}\mu_{ij}=1\ \ (1\le j\le H);\qquad 0<\sum_{j=1}^{H}\mu_{ij}<H\ \ (1\le i\le M)$$
μ_ij ranges from 0 to 1; the larger its value, the more strongly requirement j belongs to requirement cluster i. For any requirement j, its memberships over all requirement clusters sum to 1. Since clustering yields M requirement clusters, the sum of all requirement memberships within each cluster must be greater than 0 and less than H.
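The three constraints above can be checked mechanically. The following Python sketch is illustrative only: the function name `is_valid_membership`, the floating-point tolerance, and the example matrix are assumptions, not part of the patent.

```python
def is_valid_membership(mu, tol=1e-9):
    """Check the fuzzy-partition constraints on an M x H membership matrix.

    mu[i][j] is the membership of requirement j in requirement cluster i.
    `tol` absorbs floating-point rounding (an assumed tolerance).
    """
    m = len(mu)
    h = len(mu[0]) if m else 0
    # Every membership must lie in [0, 1].
    if any(not (-tol <= v <= 1 + tol) for row in mu for v in row):
        return False
    # For each requirement j, memberships over all clusters sum to 1.
    for j in range(h):
        if abs(sum(mu[i][j] for i in range(m)) - 1.0) > tol:
            return False
    # Each cluster's total membership must be strictly between 0 and H.
    return all(0 < sum(row) < h for row in mu)


# Two clusters, four requirements (made-up numbers for illustration).
mu = [
    [0.9, 0.8, 0.1, 0.3],
    [0.1, 0.2, 0.9, 0.7],
]
print(is_valid_membership(mu))  # True
```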
Furthermore, different calculation methods are adopted for the QoS constraints and the QoS preferences in a user requirement:
Figure BDA0002465817190000023
the similarity of QoS constraints is calculated using the Pearson correlation coefficient:
Figure BDA0002465817190000024
and the similarity of QoS preferences is calculated using cosine similarity:
Figure BDA0002465817190000025
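The similarity formulas themselves survive only as images, so the exact weighting is not recoverable here. As a minimal sketch, assuming a requirement is split into a QoS-constraint vector and a QoS-preference vector and that the two measures are combined with a hypothetical weight `lam`, the linear superposition could look like this (the dict keys and example values are also assumptions):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Undefined for constant vectors; treated as 0 (uncorrelated) in this sketch.
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def requirement_similarity(r1, r2, lam=0.5):
    """Weighted sum of PCC on QoS constraints and cosine on QoS preferences.

    The dict keys and the weight `lam` are hypothetical choices for illustration.
    """
    return (lam * pearson(r1["constraints"], r2["constraints"])
            + (1 - lam) * cosine(r1["preferences"], r2["preferences"]))


# Two made-up requirements with normalised QoS values.
a = {"constraints": [0.2, 0.9, 0.5], "preferences": [0.6, 0.3, 0.1]}
b = {"constraints": [0.3, 0.8, 0.4], "preferences": [0.5, 0.4, 0.1]}
print(round(requirement_similarity(a, b), 3))
```

Because PCC lies in [-1, 1] and cosine similarity of non-negative preference weights lies in [0, 1], the combined score stays within [-1, 1] for any `lam` between 0 and 1.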
Further, the clustering in step 1 proceeds by iteratively minimizing the objective function
Figure BDA0002465817190000026
so that the similarity between each sample point and its cluster center gradually approaches its maximum. In each iteration, the following formulas
Figure BDA0002465817190000031
Figure BDA0002465817190000032
are used to calculate, respectively, the optimal requirement cluster centers and the membership matrix for that iteration; clu^(i) and μ^(i) denote all requirement cluster centers in the ith iteration and the optimal membership matrix obtained in that iteration. The algorithm stops when the change in the membership matrix between two consecutive iterations is smaller than a preset threshold, or when the number of iterations reaches a preset maximum; the resulting cluster centers and membership matrix represent the optimal clustering under the given parameter settings.
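The objective function and update formulas are also available only as images, so the sketch below is not the patent's algorithm. It plugs an assumed dissimilarity d = 1 - sim (sim being the PCC/cosine superposition) into the standard fuzzy C-means membership update u_ij = 1 / Σ_k (d_ij / d_kj)^(2/(m-1)), recomputes centers as membership-weighted means, and stops on the membership-change threshold or the iteration cap described above. The weight `lam = 0.5`, fuzzifier `m = 2`, the farthest-point initialisation, and the toy requirements are all assumptions. In standard FCM the weighted-mean center update is derived for Euclidean distance, so using it with a correlation-based dissimilarity is a heuristic.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def similarity(r1, r2, lam=0.5):
    """r = (QoS constraint vector, QoS preference vector); lam is an assumed weight."""
    return lam * pearson(r1[0], r2[0]) + (1 - lam) * cosine(r1[1], r2[1])


def weighted_center(reqs, weights):
    """Membership-weighted mean of both parts of the requirements."""
    total = sum(weights)
    return tuple(
        [sum(w * r[p][k] for r, w in zip(reqs, weights)) / total
         for k in range(len(reqs[0][p]))]
        for p in (0, 1)
    )


def fuzzy_cluster(reqs, c, m=2.0, tol=1e-5, max_iter=100, eps=1e-9):
    """Return (centers, mu); mu[i][j] is the membership of reqs[j] in cluster i."""
    h = len(reqs)
    # Assumed initialisation: start from reqs[0], then repeatedly add the
    # requirement least similar to the centers chosen so far.
    centers = [reqs[0]]
    while len(centers) < c:
        centers.append(min(reqs, key=lambda r: max(similarity(r, z) for z in centers)))
    mu = [[0.0] * h for _ in range(c)]
    for _ in range(max_iter):
        # Assumed dissimilarity d = 1 - similarity, clamped away from zero.
        d = [[max(1.0 - similarity(reqs[j], centers[i]), eps) for j in range(h)]
             for i in range(c)]
        # Standard FCM membership update with fuzzifier m.
        new_mu = [[1.0 / sum((d[i][j] / d[k][j]) ** (2 / (m - 1)) for k in range(c))
                   for j in range(h)] for i in range(c)]
        change = max(abs(new_mu[i][j] - mu[i][j]) for i in range(c) for j in range(h))
        mu = new_mu
        centers = [weighted_center(reqs, [u ** m for u in mu[i]]) for i in range(c)]
        # Stop when the membership matrix changes less than tol between iterations.
        if change < tol:
            break
    return centers, mu


reqs = [
    ([1.0, 2.0, 3.0], [0.8, 0.1, 0.1]),
    ([1.1, 2.1, 3.2], [0.7, 0.2, 0.1]),
    ([3.0, 2.0, 1.0], [0.1, 0.1, 0.8]),
    ([3.1, 2.0, 0.9], [0.1, 0.2, 0.7]),
]
centers, mu = fuzzy_cluster(reqs, c=2)
labels = [max(range(2), key=lambda i: mu[i][j]) for j in range(len(reqs))]
print(labels)  # [0, 0, 1, 1]
```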
Further, the FP-growth algorithm in step 2 comprises an FP-tree construction method and an FP-growth mining method.
The FP-tree is constructed as follows. The frequent pattern tree is a special prefix tree consisting of a root node, a frequent item header table, and item prefix subtrees; each service prefix subtree is a child of the root node. Each node of a service prefix subtree consists of a service name, a support count, and a node pointer. The service name identifies the node in the frequent pattern tree. The support count is the number of transactions in the data set that satisfy the node's condition: a transaction satisfies it if every frequent service on the path from the root node to that node appears in the transaction. The node pointer points to the next node with the same service name in the frequent pattern tree, or is null if there is none. The frequent item header table consists of service names and node head pointers; each head pointer points to the first node with that service name in the frequent pattern tree.
The FP-growth mining method is as follows. Each frequent service ws_ij in the frequent item header table is traversed in ascending order of support, i.e., from the tail of the header table to its head, and is taken as an initial suffix pattern. For each frequent service ws_ij, the algorithm follows the head pointer to find all nodes containing ws_ij, thereby creating for each suffix pattern a conditional pattern base: a sub-data set containing the nodes that co-occur with the suffix pattern, made up of all prefix paths leading to it. If some node in the conditional pattern base co-occurs with the suffix pattern at least as often as the minimum support threshold, the algorithm uses the conditional pattern base to build a conditional frequent pattern tree for the suffix pattern (line 14 of the mining algorithm). Mining then proceeds recursively on the conditional frequent pattern tree; as long as that tree is not empty, the service frequent pattern keeps growing.
Further, in step 4 the service frequent patterns are related to the requirement clusters through a probability matrix
$$O_{M\times N}=\left[o_{ij}\right],\quad i=1,\dots,M,\ j=1,\dots,N$$
which describes the mapping probabilities between the M service requirement clusters and the N service frequent patterns. Each element o_ij of the matrix is the probability that the jth service frequent pattern occurs in the ith service requirement cluster, i.e.
$$o_{ij}=p\left(sp_j\mid clu_i\right)$$
$$p\left(sp_j\mid clu_i\right)=\frac{\text{number of occurrences of } sp_j \text{ in the service solutions of } clu_i}{\text{total number of service solutions in } clu_i}$$
That is, the conditional probability p(sp_j | clu_i) is obtained by dividing the number of occurrences of service frequent pattern sp_j in all service solutions handled within service requirement cluster clu_i by the total number of service solutions within that cluster.
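A minimal sketch of this counting rule, assuming each historical solution has already been assigned to a single requirement cluster (for example, the cluster with its largest membership; the patent's clusters are fuzzy, so this hard assignment is an assumption) and counting each solution at most once per pattern. The solutions and patterns reuse services from Example 1 of the detailed description; the cluster labels are hypothetical. `rank_patterns` is a hypothetical helper that sorts one row of the matrix; it does not implement the Bayesian combination of multiple patterns mentioned in step 4.

```python
from fractions import Fraction


def probability_matrix(solutions, cluster_of, n_clusters, patterns):
    """o[i][j] = p(sp_j | clu_i) as an exact fraction.

    solutions:  list of service lists (historical service solutions)
    cluster_of: cluster index of each solution (assumed hard assignment)
    patterns:   list of frozensets (service frequent patterns)
    """
    totals = [0] * n_clusters
    hits = [[0] * len(patterns) for _ in range(n_clusters)]
    for sol, i in zip(solutions, cluster_of):
        totals[i] += 1
        services = set(sol)
        for j, sp in enumerate(patterns):
            if sp <= services:  # every service of the pattern occurs in the solution
                hits[i][j] += 1
    return [[Fraction(hits[i][j], totals[i]) if totals[i] else Fraction(0)
             for j in range(len(patterns))]
            for i in range(n_clusters)]


def rank_patterns(matrix, cluster):
    """Hypothetical helper: pattern indices for one cluster, most probable first."""
    row = matrix[cluster]
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)


solutions = [
    ["ws73", "ws15", "ws32", "ws48", "ws22"],
    ["ws73", "ws15", "ws32", "ws22"],
    ["ws73", "ws15", "ws32", "ws48", "ws91"],
    ["ws73", "ws22", "ws91"],
    ["ws15", "ws48", "ws91"],
]
cluster_of = [0, 0, 0, 1, 1]  # hypothetical cluster labels
patterns = [frozenset({"ws73", "ws15", "ws32"}),
            frozenset({"ws15", "ws48"}),
            frozenset({"ws73", "ws22"})]
O = probability_matrix(solutions, cluster_of, 2, patterns)
print(rank_patterns(O, 1))  # [1, 2, 0]
```

Exact fractions keep each row's values comparable to the hand calculation, for example 2/3 for {ws15, ws48} in cluster 0.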
The beneficial effects of the invention are as follows:
guided by prior knowledge obtained from historical data, the invention effectively reduces the search space of the service selection problem and thereby improves efficiency. It overcomes the shortcoming that the prior art does not consider domain prior knowledge, and fills the gap in the service selection field of selecting services using existing partial service solution fragments.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
FIG. 2 is a frequent pattern tree diagram of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A service selection method based on a demand service probability matrix comprises the following steps:
Step 1: clustering the user requirements in historical service records using a fuzzy clustering method;
Step 2: mining the solutions existing in a service system using the FP-growth (frequent pattern growth) algorithm, focusing on Web services that are frequently invoked together in the history; although the reasons these services are bound together vary, the solution fragments composed of these frequent services offer significant guidance for constructing subsequent service solutions;
Step 3: establishing a probability matrix to improve the efficiency of the traditional service selection problem;
Step 4: based on Bayesian theory, calculating the service frequent patterns most likely to be used by a new requirement, as well as the likelihood of combining multiple service frequent patterns.
Further, in step 1 the user requirements are clustered with a focus on users' non-functional requirements and their constraints on quality of service. The representative features of each requirement cluster are identified, which avoids analyzing every individual requirement in the historical data; when a new requirement arrives, the prior knowledge in the requirement cluster most similar to it can guide the rapid construction of a service solution.
Further, the fuzzy clustering method (FCM) in step 1 improves the similarity measure of the original algorithm by replacing the Euclidean distance with the Pearson correlation coefficient (PCC) and cosine similarity (COS). The PCC and cosine similarity between two requirements are linearly superimposed to calculate the similarity between requirements or the membership of a requirement to a cluster center, and the fuzzy clustering method establishes a membership matrix μ_{M×H} of size M×H that aggregates the H service requirements into M user requirement clusters Clu. The membership matrix is defined as follows:
$$\mu_{M\times H}=\left[\mu_{ij}\right],\quad i=1,\dots,M,\ j=1,\dots,H$$
where μ_ij is the degree of membership of the jth service requirement in the ith requirement cluster. The membership matrix must satisfy the following constraints:
$$\mu_{ij}\in[0,1];\qquad \sum_{i=1}^{M}\mu_{ij}=1\ \ (1\le j\le H);\qquad 0<\sum_{j=1}^{H}\mu_{ij}<H\ \ (1\le i\le M)$$
μ_ij ranges from 0 to 1; the larger its value, the more strongly requirement j belongs to requirement cluster i. For any requirement j, its memberships over all requirement clusters sum to 1. Since clustering yields M requirement clusters, the sum of all requirement memberships within each cluster must be greater than 0 and less than H.
Furthermore, different calculation methods are adopted for the QoS constraints and the QoS preferences in a user requirement:
Figure BDA0002465817190000061
the similarity of QoS constraints is calculated using the Pearson correlation coefficient:
Figure BDA0002465817190000062
and the similarity of QoS preferences is calculated using cosine similarity:
Figure BDA0002465817190000063
Further, the clustering in step 1 proceeds by iteratively minimizing the objective function (lines 5-10 of the algorithm listing)
Figure BDA0002465817190000064
so that the similarity between each sample point and its cluster center gradually approaches its maximum. In each iteration, the following formulas
Figure BDA0002465817190000065
Figure BDA0002465817190000066
are used to calculate, respectively, the optimal requirement cluster centers (line 7) and the membership matrix (line 8) for that iteration; clu^(i) and μ^(i) denote all requirement cluster centers in the ith iteration and the optimal membership matrix obtained in that iteration. The algorithm stops when the change in the membership matrix between two consecutive iterations is smaller than a preset threshold, or when the number of iterations reaches a preset maximum; the resulting cluster centers and membership matrix represent the optimal clustering under the given parameter settings.
Figure BDA0002465817190000067
Figure BDA0002465817190000071
Further, step 2 uses the FP-growth algorithm to identify valuable service frequent patterns from the history of the service system. FP-growth compresses the database that supplies the frequent itemsets into a frequent pattern tree (FP-tree) while retaining the itemset association information, thereby avoiding the generation and storage of the candidate frequent itemsets otherwise needed during frequent pattern mining. The algorithm consists of a frequent pattern tree construction method and an FP-growth mining method, each described below.
The FP-tree is constructed as follows. The method uses a compressed data structure, the frequent pattern tree, to represent a data set with a large number of samples. The frequent pattern tree retains the information needed for frequent pattern mining and avoids repeated scans of the data set during mining, which reduces unnecessary scanning cost and improves algorithm efficiency. The frequent pattern tree is a special prefix tree consisting of a root node, a frequent item header table, and item prefix subtrees (because this section mines frequent patterns from service history records, these are also called service prefix subtrees); each service prefix subtree is a child of the root node. As shown in FIG. 2, each node of a service prefix subtree consists of a service name, a support count, and a node pointer. The service name identifies the node in the frequent pattern tree. The support count is the number of transactions in the data set that satisfy the node's condition: a transaction satisfies it if every frequent service on the path from the root node to that node appears in the transaction. The node pointer points to the next node with the same service name in the frequent pattern tree, or is marked null if there is none. The frequent item header table consists of service names and node head pointers; each head pointer points to the first node with that service name in the frequent pattern tree.
See Algorithm 2 for the FP-tree construction process. First, the frequent services in the service data set are identified and sorted in descending order of support (line 1), and a frequent item header table is built to facilitate traversal of the frequent pattern tree (line 2). The header table points to where each service occurs in the tree, and nodes with the same service name are linked in turn by node pointers (line 5 of the method insert_tree). After all service solutions have been traversed, a tree with node links is obtained, as shown in FIG. 2.
Figure BDA0002465817190000081
The construction of the frequent pattern tree is illustrated below with Example 1:
Example 1: given a service history log Log (shown in Table 3-12) and a minimum support threshold ξ = 3, all service solutions in the data set are first traversed (first two columns of Table 1) to find the services whose support is at least the minimum support threshold. These are sorted in descending order of support (a sorted sequence is written <…>), giving <(ws73:4), (ws15:4), (ws32:3), (ws48:3), (ws22:3), (ws91:3)>. The frequent services of each solution (third column of Table 1) and the frequent item header table are arranged in descending order of support, which ensures that the nodes on every path of the frequent pattern tree also follow descending support order.
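This first pass (count supports, keep services meeting ξ, sort by descending support) can be sketched as follows. The inputs are only the frequent services of sol1 to sol5 as listed in the walk-through, because the full Table 1 is not reproduced here. The function names and the tie-breaking rule (services with equal support keep their first-appearance order) are assumptions; that rule happens to reproduce the order stated above.

```python
def frequent_service_order(solutions, xi):
    """Services with support >= xi, sorted by descending support.

    Support = number of solutions containing the service. Ties keep the order in
    which services first appear (an assumed tie-break).
    """
    support = {}
    for sol in solutions:
        for service in dict.fromkeys(sol):  # count each service once per solution
            support[service] = support.get(service, 0) + 1
    frequent = [(s, n) for s, n in support.items() if n >= xi]
    frequent.sort(key=lambda item: -item[1])  # stable sort preserves tie order
    return frequent


def ordered_frequent_services(sol, order):
    """Keep only the frequent services of one solution, in header-table order."""
    rank = {s: r for r, (s, _) in enumerate(order)}
    return sorted((s for s in dict.fromkeys(sol) if s in rank), key=rank.__getitem__)


# Frequent services of sol1..sol5 as listed in Example 1 (the infrequent
# services of Table 1 are not reproduced here).
solutions = [
    ["ws73", "ws15", "ws32", "ws48", "ws22"],
    ["ws73", "ws15", "ws32", "ws22"],
    ["ws73", "ws15", "ws32", "ws48", "ws91"],
    ["ws73", "ws22", "ws91"],
    ["ws15", "ws48", "ws91"],
]
order = frequent_service_order(solutions, xi=3)
print(order)
# [('ws73', 4), ('ws15', 4), ('ws32', 3), ('ws48', 3), ('ws22', 3), ('ws91', 3)]
```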
Table 1 Service solutions and frequent services
Figure BDA0002465817190000082
Figure BDA0002465817190000091
Next, the root node of the frequent pattern tree is created and marked as null, and the tree is generated by traversing the full set of service solutions Sol a second time. Traversing service solution sol1 yields the first branch of the tree, <(ws73:1), (ws15:1), (ws32:1), (ws48:1), (ws22:1)>; since every node on this branch appears for the first time, each corresponding node head pointer in the frequent item header table is pointed to its node. Note that every branch in the frequent pattern tree must follow descending order of support. The frequent services in sol2 are <ws73, ws15, ws32, ws22>, which share the prefix <(ws73:1), (ws15:1), (ws32:1)> with the existing branch <(ws73:1), (ws15:1), (ws32:1), (ws48:1), (ws22:1)>. The support of every node on that prefix is therefore incremented by one, a new node (ws22:1) is created as a child of (ws32:2), and nodes with the same service name are linked: the node pointer of the existing node (ws22:1), the child of ws48, is pointed to the newly created node (ws22:1), the child of ws32. The frequent services of sol3, <ws73, ws15, ws32, ws48, ws91>, share the prefix <(ws73:2), (ws15:2), (ws32:2), (ws48:1)> with the existing path, so the support of each node on the prefix is incremented by one, a child node (ws91:1) of (ws48:2) is created, and the head pointer for ws91 in the frequent item header table is pointed to the new node (ws91:1). The frequent service set of sol4, <ws73, ws22, ws91>, shares only the prefix node ws73 with the existing tree, so the support of ws73 is incremented by one, a sub-branch <(ws22:1), (ws91:1)> is created under (ws73:4), and the header table is used to find the existing nodes with the same service names and link them to the new nodes. Finally, traversing sol5 creates the second branch of the frequent pattern tree, <(ws15:1), (ws48:1), (ws91:1)>.
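The second pass of Example 1 can be replayed with a small FP-tree class. This is a sketch, assuming each solution has already been reduced to its frequent services in header-table order; the class and method names are hypothetical and it is not the patent's Algorithm 2 listing. The header table keeps a pointer to the first node of each service name, and each new node is appended to the end of that name's node-link chain.

```python
class FPNode:
    """Node of a frequent pattern tree: service name, support count, node pointer."""

    def __init__(self, name, parent):
        self.name = name
        self.count = 0
        self.parent = parent
        self.children = {}  # service name -> child FPNode
        self.link = None  # next node with the same service name (None = null)


class FPTree:
    """Hypothetical FP-tree with a frequent item header table."""

    def __init__(self):
        self.root = FPNode(None, None)  # root node marked null
        self.head = {}  # header table: service name -> first node with that name
        self._tail = {}  # last node on each service name's node-link chain

    def insert(self, services):
        """Insert one solution's frequent services (already in header-table order)."""
        node = self.root
        for name in services:
            child = node.children.get(name)
            if child is None:
                child = FPNode(name, node)
                node.children[name] = child
                if name in self._tail:
                    self._tail[name].link = child  # chain to previous same-name node
                else:
                    self.head[name] = child  # first occurrence: set head pointer
                self._tail[name] = child
            child.count += 1  # shared prefix or new node: add this solution's support
            node = child

    def nodes(self, name):
        """Yield every node with this service name by following the node links."""
        node = self.head.get(name)
        while node is not None:
            yield node
            node = node.link


# Frequent services of sol1..sol5 from Example 1, in header-table order.
tree = FPTree()
for sol in [
    ["ws73", "ws15", "ws32", "ws48", "ws22"],
    ["ws73", "ws15", "ws32", "ws22"],
    ["ws73", "ws15", "ws32", "ws48", "ws91"],
    ["ws73", "ws22", "ws91"],
    ["ws15", "ws48", "ws91"],
]:
    tree.insert(sol)

print([(c.name, c.count) for c in tree.root.children.values()])
# [('ws73', 4), ('ws15', 1)]
```

Following `tree.nodes("ws22")` visits the ws22 nodes under ws48, ws32 and ws73 in the order they were created, and their counts sum to the support of ws22 (3).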
The FP-growth mining method is as follows. Each frequent service ws_ij in the frequent item header table is traversed (lines 10-17) in ascending order of support, i.e., from the tail of the header table to its head, and is taken as an initial suffix pattern. For each frequent service ws_ij, the algorithm follows the head pointer to find all nodes containing ws_ij, thereby creating for each suffix pattern a conditional pattern base (see the second column of Table 2): a sub-data set containing the nodes that co-occur with the suffix pattern, made up of all prefix paths leading to the suffix pattern. If some node in the conditional pattern base co-occurs with the suffix pattern at least as often as the minimum support threshold, the algorithm uses the conditional pattern base to build a conditional frequent pattern tree for the suffix pattern (line 14; denoted by {…} in the third column of Table 2). Mining then proceeds recursively on the conditional frequent pattern tree, and as long as that tree is not empty, the service frequent pattern keeps growing (line 16).
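A compact recursive FP-growth sketch, assuming transactions are given as (services, count) pairs so that each conditional pattern base (prefix paths weighted by node counts) can be fed back into the same tree builder. It follows the traversal described above (header table from tail to head, one conditional pattern base per suffix, recursion on the conditional tree) but is not the patent's algorithm listing; all names are hypothetical. On the frequent services of Example 1 with ξ = 3 it yields 12 service frequent patterns, the largest being {ws73, ws15, ws32}.

```python
from collections import defaultdict


class Node:
    """FP-tree node: service name, support count, parent, children, node link."""

    def __init__(self, name, parent):
        self.name, self.count, self.parent = name, 0, parent
        self.children = {}
        self.link = None  # next node with the same service name


def build_tree(transactions, xi):
    """Build an FP-tree from (services, count) pairs; return (order, heads, support)."""
    support = defaultdict(int)
    for items, count in transactions:
        for item in items:
            support[item] += count
    # Descending support; ties keep first-appearance order (assumed tie-break).
    order = sorted((i for i in support if support[i] >= xi), key=lambda i: -support[i])
    rank = {item: r for r, item in enumerate(order)}
    root, heads, tails = Node(None, None), {}, {}
    for items, count in transactions:
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                if item in tails:
                    tails[item].link = child
                else:
                    heads[item] = child
                tails[item] = child
            child.count += count
            node = child
    return order, heads, support


def fp_growth(transactions, xi, suffix=frozenset()):
    """Return {frozenset(pattern): support} for every frequent pattern."""
    order, heads, support = build_tree(transactions, xi)
    patterns = {}
    for item in reversed(order):  # ascending support: tail of the header table first
        pattern = suffix | {item}
        patterns[pattern] = support[item]
        # Conditional pattern base: the prefix path above every node of `item`.
        base, node = [], heads[item]
        while node is not None:
            path, parent = [], node.parent
            while parent.name is not None:  # stop at the null root
                path.append(parent.name)
                parent = parent.parent
            if path:
                base.append((path[::-1], node.count))
            node = node.link
        # Recurse on the conditional FP-tree; an empty tree adds nothing.
        patterns.update(fp_growth(base, xi, pattern))
    return patterns


SOLUTIONS = [(sol, 1) for sol in [
    ["ws73", "ws15", "ws32", "ws48", "ws22"],
    ["ws73", "ws15", "ws32", "ws22"],
    ["ws73", "ws15", "ws32", "ws48", "ws91"],
    ["ws73", "ws22", "ws91"],
    ["ws15", "ws48", "ws91"],
]]
patterns = fp_growth(SOLUTIONS, 3)
print(len(patterns))  # 12
```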
Figure BDA0002465817190000101
Note that a special structure, the single prefix path, may exist in a frequent pattern tree. A single prefix path means either that the frequent pattern tree consists of one single path, i.e., every node except the leaf has exactly one child, or that a single path runs from the root node to the first branching node, i.e., the first node with at least two children. If a frequent pattern tree T has a single prefix path P, the frequent pattern set of T must include every combination of the nodes on P, and the support of each such combination equals the smallest support among the nodes it contains. Handling single prefix paths in this way (lines 2-6) effectively improves the efficiency of mining service frequent patterns.
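The single-prefix-path shortcut amounts to enumerating combinations. A minimal sketch, assuming the path is given as (service, count) pairs from the root downward and that every count already meets the minimum support threshold; the function name is hypothetical. In Example 1, the conditional frequent pattern tree for the suffix ws32 is the single path ws73:3 → ws15:3, so the shortcut directly yields {ws73, ws32}, {ws15, ws32} and {ws73, ws15, ws32}, each with support 3.

```python
from itertools import combinations


def single_path_patterns(path, suffix=()):
    """Enumerate the frequent patterns contributed by a single prefix path.

    path:   [(service, count), ...] along the single path, root side first
    suffix: services of the current suffix pattern (empty at the top level)
    Returns {frozenset(pattern): support}; a pattern's support is the smallest
    count among the path nodes it uses.
    """
    patterns = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            names = frozenset(name for name, _ in combo) | frozenset(suffix)
            patterns[names] = min(count for _, count in combo)
    return patterns


# Conditional frequent pattern tree of suffix ws32 in Example 1: ws73:3 -> ws15:3.
print(single_path_patterns([("ws73", 3), ("ws15", 3)], suffix=("ws32",)))
```

A path of n nodes contributes 2^n - 1 patterns without building any further conditional trees, which is where the efficiency gain comes from.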
Further, in step 4 the statistical probabilities reflect the prior knowledge in the history records, and conditional probability is used to build a mapping probability matrix between service requirement clusters and service frequent patterns. The construction of the probability matrix is shown in Algorithm 4. Given a service requirement cluster, the statistics show which service frequent patterns the service solutions satisfying requirements in that cluster are more likely to adopt. The probability matrix
$$O_{M\times N}=\left[o_{ij}\right],\quad i=1,\dots,M,\ j=1,\dots,N$$
describes the mapping probabilities between the M service requirement clusters and the N service frequent patterns. Each element o_ij of the matrix is the probability that the jth service frequent pattern occurs in the ith service requirement cluster, i.e.
$$o_{ij}=p\left(sp_j\mid clu_i\right)$$
$$p\left(sp_j\mid clu_i\right)=\frac{\text{number of occurrences of } sp_j \text{ in the service solutions of } clu_i}{\text{total number of service solutions in } clu_i}$$
That is, the conditional probability p(sp_j | clu_i) is obtained by dividing the number of occurrences of service frequent pattern sp_j in all service solutions handled within service requirement cluster clu_i by the total number of service solutions within that cluster.
Example 2
In a service system, although the QoS constraints and QoS preferences contained in each user requirement are personalized, large-scale user requirements taken together still exhibit a specific distribution and can be divided into user requirement groups with different characteristics. The embodiment of the invention uses a fuzzy clustering method (fuzzy C-means, FCM) to solve the user requirement clustering problem. It improves the similarity measure of the original algorithm by replacing the Euclidean distance with the Pearson correlation coefficient (PCC) and cosine similarity (COS), linearly superimposes the two measures between requirements, and uses the result to calculate the similarity between requirements or the membership of a requirement to a cluster center.
Example 3
A compressed data structure, the frequent pattern tree, is used to represent a data set with a large number of samples. The frequent pattern tree retains the information needed for frequent pattern mining and avoids repeated scans of the data set during mining, which reduces unnecessary scanning cost and improves algorithm efficiency.
Example 4
The frequent pattern growth algorithm traverses the frequent services in the frequent item header table in ascending order of support, i.e., from the tail of the header table to its head, taking each as an initial suffix pattern. For each frequent service, the algorithm follows the head pointer to find all potentially frequent patterns containing that service, creating for each suffix pattern a conditional pattern base: a sub-data set containing the nodes that co-occur with the suffix pattern, made up of all prefix paths leading to it. If some node in the conditional pattern base co-occurs with the suffix pattern at least as often as the minimum support threshold, the algorithm uses the conditional pattern base to build a conditional frequent pattern tree for the suffix pattern. Mining then proceeds recursively on the conditional frequent pattern tree; as long as that tree is not empty, the service frequent pattern keeps growing.
Example 5
The probabilistic mapping hidden between requirements and solutions is described and expressed by establishing a probability matrix. The statistical probabilities reflect the prior knowledge in the history records, and conditional probability is used to build a mapping probability matrix between service requirement clusters and frequent patterns. Given a service requirement cluster, the statistics show which service frequent patterns the service solutions satisfying requirements in that cluster are more likely to adopt.
Example 6
The probability matrix reveals prior knowledge in the service history, which can be extracted, organized, and used to guide the construction of the service solutions required by new service requirements. The probability matrix also reflects the distribution of user requirements and the distribution of frequent patterns. The embodiment uses the service selection problem as a case study of the probability matrix's effect, selecting as the methods to be improved a global planning optimization method (GP) based on integer programming and the artificial bee colony algorithm (ABC). GP excels in solution quality and ABC in efficiency: the former finds the highest-quality feasible service solution by traversing the entire solution space, while the latter searches the solution space randomly and iteratively to quickly find an approximately optimal solution, i.e., one close to the global optimum.
The above embodiments of the present invention provide a service selection method. As Web services proliferate on the Internet, the number of available services keeps growing and the relationships among services become increasingly complex, which poses great challenges to service selection. Because the search space is huge, existing methods struggle to select the best services from a large number of candidates within a limited time and to construct a feasible service solution for the user. The invention uses prior knowledge obtained from historical data as guidance, which effectively reduces the search space of the service selection problem and thereby improves efficiency. It overcomes the shortcoming that the prior art ignores domain prior knowledge, and fills the gap in the service selection field of reusing existing partial service solution fragments.

Claims (7)

1. A service selection method based on a demand service probability matrix, characterized by comprising the following steps:
Step 1: clustering the user requirements in historical service records using a fuzzy clustering method;
Step 2: mining the solutions existing in the service system using the FP (frequent pattern) growth algorithm, focusing on Web services that are frequently invoked together in the historical records;
Step 3: improving the efficiency of the traditional service selection problem by establishing a probability matrix;
Step 4: calculating the service frequent patterns most likely to be used by a new demand, and calculating the possibility of combining multiple service frequent patterns.
2. The method of claim 1, wherein the clustering of user requirements in step 1 focuses on users' non-functional requirements and their constraints on quality of service.
3. The method of claim 1, wherein the fuzzy clustering method in step 1 improves the similarity measure of the original algorithm: instead of the Euclidean distance, it uses the Pearson correlation coefficient and the cosine similarity, linearly superposing the Pearson correlation coefficient and the cosine similarity between two demands. Based on the computed similarity between demands, or the membership between a demand and a cluster center, the fuzzy clustering method builds a membership matrix μ_{M×H} of size M × H and aggregates the H service demands into M user demand clusters Clu, where the membership matrix is defined as follows:
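$$\mu_{M\times H}=\begin{bmatrix}\mu_{11}&\mu_{12}&\cdots&\mu_{1H}\\ \mu_{21}&\mu_{22}&\cdots&\mu_{2H}\\ \vdots&\vdots&\ddots&\vdots\\ \mu_{M1}&\mu_{M2}&\cdots&\mu_{MH}\end{bmatrix}$$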
where μ_ij denotes the degree to which the j-th service requirement belongs to the i-th requirement cluster, and the membership matrix must satisfy the following constraints:
$$\mu_{ij}\in[0,1];\qquad \sum_{i=1}^{M}\mu_{ij}=1\ \ \forall j;\qquad 0<\sum_{j=1}^{H}\mu_{ij}<H\ \ \forall i$$
μ_ij ranges from 0 to 1; the larger the value, the more strongly requirement j belongs to requirement cluster i. For any requirement j, its memberships over all requirement clusters sum to 1. Since clustering yields M requirement clusters in total, the sum of the memberships of all requirements in each cluster must be greater than 0 and less than H.
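The three constraints on the membership matrix can be checked mechanically. A minimal sketch, assuming μ is stored as a list of M rows (clusters) with H entries each (demands) and a small floating-point tolerance; the function name is hypothetical.

```python
def is_valid_membership(mu, tol=1e-9):
    """Check the fuzzy-partition constraints on an M x H membership matrix."""
    m, h = len(mu), len(mu[0])
    # Every membership degree lies in [0, 1].
    if any(not (-tol <= x <= 1 + tol) for row in mu for x in row):
        return False
    # For each demand j, its memberships over all clusters sum to 1.
    if any(abs(sum(mu[i][j] for i in range(m)) - 1) > tol for j in range(h)):
        return False
    # Each cluster's total membership is strictly between 0 and H.
    return all(0 < sum(row) < h for row in mu)
```

The last check rules out degenerate partitions: an empty cluster (row sum 0) or a cluster that absorbs every demand completely (row sum H).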
4. The method of claim 3, wherein different calculation methods are used for the QoS constraints and the QoS preferences in a user demand:
[Formula image not reproduced]
the similarity of the quality-of-service constraints is calculated using the Pearson correlation coefficient:
[Formula image not reproduced: Pearson-based similarity of QoS constraints]
and the similarity of the quality-of-service preferences is calculated using the cosine similarity:
[Formula image not reproduced: cosine similarity of QoS preferences]
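A sketch of the similarity measure of claims 3 and 4, assuming each demand is split into a QoS-constraint vector and a QoS-preference vector and that the linear superposition uses a single weight `alpha`. The weight, the dictionary keys, and the function names are assumptions, since the exact formulas are not reproduced in this text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def demand_similarity(d1, d2, alpha=0.5):
    """Hypothetical linear superposition of the two similarities.

    Each demand is a dict with hypothetical keys "constraints" (compared with
    Pearson) and "preferences" (compared with cosine similarity).
    """
    return (alpha * pearson(d1["constraints"], d2["constraints"])
            + (1 - alpha) * cosine(d1["preferences"], d2["preferences"]))
```

Both component measures lie in [-1, 1], so any convex combination does as well; constant vectors (zero variance or zero norm) are mapped to 0 to avoid division by zero.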
5. The method of claim 2, wherein the clustering in step 1 iteratively minimizes the objective function
[Formula image not reproduced: clustering objective function]
so that the similarity between the sample points and the cluster centers gradually approaches its maximum; in each iteration, the following formulas are used
[Formula images not reproduced: demand cluster center update and membership matrix update]
to compute, respectively, the optimal demand cluster centers and the membership matrix in that iteration, where Clu_i and μ_i denote all demand cluster centers in the i-th iteration and the optimal membership matrix obtained in the current iteration, respectively. The algorithm stops when the change between the membership matrices of two consecutive iterations is smaller than a preset threshold or the number of iterations reaches a preset maximum; the resulting demand cluster centers and optimal membership matrix represent the optimal clustering under the given parameter settings.
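The alternating update and stopping rule of claim 5 follow the standard fuzzy c-means scheme. The sketch below shows that scheme with its textbook Euclidean-distance updates, not the patent's similarity-based formulas, which appear only as images in the source. The fuzzifier m = 2, the tolerance, the random initialization, and the array layout (demands as rows of X, μ as an M × H array) are assumptions.

```python
import numpy as np


def fuzzy_c_means(X, M, m=2.0, tol=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means on H demand vectors (rows of X) into M clusters.

    Returns (centers, mu): centers has shape (M, d) and mu has shape (M, H),
    with every column of mu summing to 1.
    """
    rng = np.random.default_rng(seed)
    H = X.shape[0]
    mu = rng.random((M, H))
    mu /= mu.sum(axis=0, keepdims=True)  # each demand's memberships sum to 1
    for _ in range(max_iter):
        w = mu ** m
        # Cluster centers: membership-weighted means of the demands.
        centers = (w @ X) / w.sum(axis=1, keepdims=True)
        # Distance from every center to every demand, shape (M, H).
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij = 1 / sum_k (d_ij / d_kj) ** (2 / (m - 1)).
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        new_mu = 1.0 / ratio.sum(axis=1)
        # Stop when the membership matrix changes less than the threshold.
        converged = np.abs(new_mu - mu).max() < tol
        mu = new_mu
        if converged:
            break
    return centers, mu
```

Adapting this to the claimed method would mean replacing the distance with a dissimilarity derived from the combined Pearson/cosine similarity; the exact form of that substitution is not recoverable from this text.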
6. The service selection method based on a demand service probability matrix according to claim 2, wherein the FP-growth algorithm in step 2 comprises an FP-tree construction method and an FP-growth-based mining method;
the FP tree is constructed as follows: the frequent pattern tree is a special prefix tree consisting of a root node, a frequent-item header table, and item prefix subtrees. Each service prefix subtree is a child of the root node, and each node of a service prefix subtree consists of a service name, a support count, and a node pointer. The service name identifies the node in the frequent pattern tree. The support count is the number of transactions in the dataset that satisfy the condition, where a transaction satisfies the condition if all frequent service nodes on the path from the root node to this node appear in it. The node pointer points to the next node with the same service name in the frequent pattern tree, or is null if there is no such node. The frequent-item header table consists of service names and node head pointers, each node head pointer pointing to the first node with that service name in the frequent pattern tree;
the FP-growth-based mining method is as follows: each frequent service ws_ij in the frequent-item header table is traversed in ascending order of support, that is, from the tail of the header table to its head, and each frequent service is taken as an initial suffix pattern; for each frequent service ws_ij, the algorithm follows the node head pointer to find all potentially frequent patterns containing ws_ij, and thereby creates for each suffix pattern a conditional pattern base, defined as the sub-dataset of nodes co-occurring with the suffix pattern and consisting of all prefix paths leading to the suffix pattern; if some node in the conditional pattern base co-occurs with the suffix pattern more often than the minimum support threshold, the algorithm uses the conditional pattern base to build a conditional frequent pattern tree for the suffix pattern; mining then proceeds recursively on the conditional frequent pattern tree, and as long as that tree is not empty, the service frequent patterns continue to grow.
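The FP-tree structure of claim 6 (nodes with a service name, support count and node pointer, plus a header table of node head pointers) can be sketched as follows. This is an illustrative sketch, not the patent's code; the class and function names are hypothetical, ties in support are broken by service name so that insertion order is deterministic, and support is tested with `>=`.

```python
from collections import defaultdict


class FPNode:
    """FP-tree node: service name, support count, parent, children, node pointer."""

    def __init__(self, name, parent):
        self.name = name
        self.count = 0
        self.parent = parent
        self.children = {}
        self.link = None  # next node with the same service name, or None (null)


def build_fp_tree(transactions, min_support):
    """Build an FP tree and its header table from lists of service names."""
    support = defaultdict(int)
    for t in transactions:
        for s in set(t):
            support[s] += 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    root = FPNode(None, None)
    header = {s: None for s in frequent}  # service name -> node head pointer
    last = {}  # tail of each node-pointer chain, for constant-time appends
    for t in transactions:
        # Keep frequent services only, ordered by descending support.
        items = sorted((s for s in set(t) if s in frequent),
                       key=lambda s: (-frequent[s], s))
        node = root
        for s in items:
            child = node.children.get(s)
            if child is None:
                child = FPNode(s, node)
                node.children[s] = child
                # Append the new node to the chain of same-named nodes.
                if header[s] is None:
                    header[s] = child
                else:
                    last[s].link = child
                last[s] = child
            child.count += 1
            node = child
    return root, header
```

Summing `count` along a service's node-pointer chain, starting from its header entry, recovers that service's total support; this is the traversal the mining step relies on.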
7. The service selection method based on a demand service probability matrix according to claim 1, wherein, for the service frequent patterns in step 4, the probability matrix
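$$\left[o_{ij}\right]_{M\times N}=\begin{bmatrix}o_{11}&\cdots&o_{1N}\\ \vdots&\ddots&\vdots\\ o_{M1}&\cdots&o_{MN}\end{bmatrix}$$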
describes the mapping probabilities between the M service requirement clusters and the N service frequent patterns; each element o_ij of the matrix is the probability that the j-th service frequent pattern occurs in the i-th service requirement cluster, i.e. o_ij = p(sp_j | clu_i), clu_i ∈ Clu, sp_j ∈ SP,
[Formula image not reproduced]
The conditional probability p(sp_j | clu_i) is obtained by dividing the number of occurrences of the service frequent pattern sp_j in all service solutions handled in the service requirement cluster clu_i by the total number of service solutions in that cluster.
CN202010333583.5A 2020-04-24 2020-04-24 Service selection method based on demand service probability matrix Pending CN113553493A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010333583.5A CN113553493A (en) 2020-04-24 2020-04-24 Service selection method based on demand service probability matrix

Publications (1)

Publication Number Publication Date
CN113553493A true CN113553493A (en) 2021-10-26

Family

ID=78129712

Country Status (1)

Country Link
CN (1) CN113553493A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455578A (en) * 2013-08-23 2013-12-18 华南师范大学 Association rule and bi-clustering-based airline customer data mining method
CN103823823A (en) * 2013-07-08 2014-05-28 电子科技大学 Denormalization strategy selection method based on frequent item set mining algorithm
CN104281617A (en) * 2013-07-10 2015-01-14 广州中国科学院先进技术研究所 Domain knowledge-based multilayer association rules mining method and system
CN106056466A (en) * 2016-05-26 2016-10-26 国网湖北省电力公司 Large-power-grid key line identification method based on FP-growth algorithm
CN107507028A (en) * 2017-08-16 2017-12-22 北京京东尚科信息技术有限公司 User preference determines method, apparatus, equipment and storage medium
US20180239949A1 (en) * 2015-02-23 2018-08-23 Cellanyx Diagnostics, Llc Cell imaging and analysis to differentiate clinically relevant sub-populations of cells
CN110442038A (en) * 2019-07-25 2019-11-12 南京邮电大学 Method is determined based on the thermal power unit operation optimization target values of FP-Growth algorithm

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
NAYANA RAMAKRISHNAN ET AL.: "Hypergraph based clustering for document similarity using FP growth algorithm", 《2019 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICCS)》 *
PIETRO DUCANGE ET AL.: "A MapReduce-based fuzzy associative classifier for big data", 《2015 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE)》 *
周兴华: "时间序列流的层次聚类和频繁模式的挖掘算法研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *
陈晓 等: "基于最大熵模糊聚类的快速多目标跟踪算法研究", 《西北工业大学学报》 *
黎昂: "协议分析及聚类算法在入侵检测中的应用研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758528A (en) * 2022-03-31 2022-07-15 中国民用航空飞行学院 Airport terminal area capacity prediction method based on service resource supply and demand balance
CN116681266A (en) * 2023-08-02 2023-09-01 广东台正精密机械有限公司 Production scheduling method and system of mirror surface electric discharge machine
CN116681266B (en) * 2023-08-02 2024-02-02 广东台正精密机械有限公司 Production scheduling method and system of mirror surface electric discharge machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
Application publication date: 20211026