CN105404619A

CN105404619A - Similarity based semantic Web service clustering labeling method

Info

Publication number: CN105404619A
Application number: CN201510568188.4A
Authority: CN
Inventors: 刘发贵; 邓达成; 彭晨漪; 李平
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2015-09-08
Filing date: 2015-09-08
Publication date: 2016-03-16
Anticipated expiration: 2035-09-08
Also published as: CN105404619B

Abstract

The invention discloses a similarity based semantic Web service clustering labeling method. The method is characterized by comprising two parts of realizing semantic Web service similarity calculation and realizing a semantic Web service clustering labeling algorithm. During the semantic Web service similarity calculation, in combination with results of input/output (I/O) parameter mixed similarity calculation and service description keyword similarity calculation, a calculation result of semantic Web service similarity is comprehensively obtained, and the difference and similarity between service functions are reflected; and I/O parameters can directly describe functions of corresponding service modules and serve as measurement standards for calculating the semantic Web service similarity from a functional perspective. According to the method, the accuracy of similarity calculation can be improved and the performance of a service discovery system is further improved.

Description

A similarity-based method for clustering and labeling semantic Web services

Technical field

The invention belongs to the field of semantic Web service computing in the intelligent Semantic Web, and specifically relates to a similarity-based method for clustering and labeling semantic Web services.

Background technology

With the rapid rise of the Internet of Things (IoT), the types of devices and resources that can be operated over the network and exchange data keep growing. Cisco predicts that by 2020 the number of connected devices will reach about 50 billion. As IoT entity devices and application platforms proliferate, the IoT faces the problem of information exchange and collaboration between heterogeneous entities.

Some current research attempts to bring service-oriented, cross-platform thinking into the IoT. The functions of each entity in the physical world are described as services, so that an entity's functions can be accessed and invoked through a unified service interface and the entity can in turn offer its own functions to the outside. Heterogeneous entities can then exchange information and cooperate through service interfaces. Furthermore, service discovery technology can find the service or service chain that meets a user's request, locate the entities that execute the corresponding service functions, and finally drive the heterogeneous entities to cooperate and complete the request. Service discovery therefore offers an effective solution to the collaboration problem between heterogeneous device entities. In addition, adding the Semantic Web raises the intelligence of service discovery, because explicit semantics help entities better understand the meaning of each other's information. Introducing semantic technology into the IoT strengthens the data fusion and resource query capabilities of networked platforms and meets complex, changing application demands.

However, describing the functions of a massive number of entities as services breaks the heterogeneity barrier between resources but also produces a large number of semantic Web services with many and varied function types, among which many services have similar functions. The service repository that stores these services therefore needs to be clustered, and each cluster labeled with a center service, to improve service discovery efficiency.

The similarity-based semantic Web service clustering and labeling method divides semantic Web services into categories according to the computed similarity between them, thereby classifying and labeling entity functions, so that service discovery efficiency improves even as the number of IoT entity service descriptions keeps growing.

For service similarity calculation, typical APIs in existing research include Woogle and OWLS-MX. The Web service structure that Woogle supports is non-semantic, so it cannot be applied directly to semantic Web service similarity calculation. OWLS-MX, by contrast, includes a semantic reasoning module and supports semantic similarity calculation. However, OWLS-MX uses matching filters: it determines the similarity relation between two services with the five filters EXACT, PLUGIN, SUBSUMES, SUBSUMED-BY and NEAREST-NEIGHBOR. The resulting similarity therefore takes one of five fixed relation values, which is not fine-grained or precise enough. The existing DoM calculation method computes the similarity between services from their I/O ontology parameters. Although the I/O ontology parameters of a semantic Web service describe the function of the corresponding service module very directly, I/O parameters alone cannot clearly describe the domain and characteristics of the service function when similarity is calculated.

For service clustering, Christian et al. use a variant of the AL (Average Linkage) hierarchical clustering algorithm that replaces the mean with the median when computing the distance between a class center and other classes. However, hierarchical clustering is likely to end with a small number of large classes, which tends to lower service search efficiency. Wu et al. propose a multi-dimensional similarity calculation method and cluster services with the K-MEANS algorithm. K-MEANS requires the number of clusters to be specified, and the choice of initial points strongly affects the result. Aliz et al. propose a semantic Web service clustering method based on particle swarm optimization, but it requires both the number of clusters and the number of iterations to be specified and is easily trapped in a local optimum. Pop et al. propose a service clustering method based on the ant colony algorithm, but it converges slowly and may also be trapped in a local optimum.

Summary of the invention

The object of the invention is to overcome the deficiencies of existing semantic Web service similarity calculation and service clustering and labeling techniques. It proposes a semantic Web service similarity calculation method based on mixed I/O parameters and keywords, and further provides a similarity-based method for clustering and labeling semantic Web services.

The object of the invention is achieved by the following technical solution:

A similarity-based semantic Web service clustering and labeling method comprises the following two steps:

1) calculating the similarity between semantic Web services;

2) based on the similarity results, using an algorithm to cluster the semantic Web services and label the center service of each cluster.

The calculation in step 1) has two parts: mixed I/O parameter similarity calculation and keyword similarity calculation.

The semantic Web service similarity combines the result of the mixed I/O parameter similarity calculation with the result of the service description keyword similarity calculation, so that the final similarity reflects the differences and similarities between service functions more accurately.

The I/O parameters describe the function of the corresponding service module very directly and are commonly used as the criterion for calculating semantic Web service similarity from a functional perspective.

The similarity in step 1) is computed as

Sim(S1, S2) = \frac{Sim_{Func}(S1, S2) + Sim_{Key}(S1, S2)}{2}

where Sim(S1, S2) is the similarity value between semantic Web services S1 and S2, Sim_Func(S1, S2) is the mixed I/O parameter similarity between S1 and S2, and Sim_Key(S1, S2) is the keyword similarity between S1 and S2.

The meaning of the mixed I/O parameter similarity in step 1) is illustrated in Fig. 1. It is computed as:

Sim_{Func}(S1, S2) = \frac{Sim_{Inputs}(S1, S2) + Sim_{Outputs}(S1, S2)}{2}

where Sim_Inputs(S1, S2) is the input parameter similarity of semantic Web services S1 and S2, and Sim_Outputs(S1, S2) is the output parameter similarity. Here,

Sim_{Inputs}(S1, S2) = \frac{Sim_{In}(S1, S2) + Sim_{In}(S2, S1)}{2}

Sim_In(S1, S2) is the similarity of the input parameters of S1 with respect to the input parameters of S2. Conversely, Sim_In(S2, S1) is the similarity of the input ontology parameters of S2 with respect to the input ontology parameters of S1. Here,

Sim_{In}(S1, S2) = \frac{\sum_{i=1}^{|In(S1)|} \max_{j=1}^{|In(S2) \cup Out(S2)|} Sim_{con}(C1_i, C2_j)}{|In(S1)|}

In this formula, Sim_con(C1_i, C2_j) is the similarity between a single input parameter C1_i of S1 and a single input or output parameter C2_j of S2. The term \max_{j=1}^{|In(S2) \cup Out(S2)|} Sim_{con}(C1_i, C2_j) means that concept C1_i is matched against every input and output ontology concept of S2 and the similarity of the best-matching pair is taken. The matching degree between two concepts is computed as:

sim(X, Y) = \alpha \frac{|X \cap Y|}{|X|} + (1 - \alpha) \frac{|X \cap Y|}{|Y|} = \alpha \frac{\sum_{d \in D} \min\{\mu_X(d), \mu_Y(d)\}}{\sum_{d \in D} \mu_X(d)} + (1 - \alpha) \frac{\sum_{d \in D} \min\{\mu_X(d), \mu_Y(d)\}}{\sum_{d \in D} \mu_Y(d)}

sim(X, Y) is the degree to which Y is similar to X. α is a weighting parameter, α ∈ [0, 1]. D = {x} is the object space, X and Y are fuzzy sets in D, and sim: U × U → [0, 1] is the fuzzy similarity relation on the product space U × U. Applying this formula gives the similarity between two concepts.
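The concept matching and directional input similarity described above can be sketched in Python. This is a minimal illustration, not the patented implementation: representing a fuzzy set as a dictionary of membership degrees, the default `alpha = 0.5`, the zero result for empty inputs, and all function names are assumptions made for the example.

```python
from typing import Callable, Dict, Sequence, TypeVar

C = TypeVar("C")
FuzzySet = Dict[str, float]  # element of D -> membership degree in [0, 1]


def fuzzy_sim(x: FuzzySet, y: FuzzySet, alpha: float = 0.5) -> float:
    """sim(X, Y) with fuzzy cardinalities; alpha weights the |X| term."""
    # |X ∩ Y| is the sum of min(mu_X(d), mu_Y(d)); elements absent from X add 0.
    inter = sum(min(mu, y.get(d, 0.0)) for d, mu in x.items())
    card_x, card_y = sum(x.values()), sum(y.values())
    if card_x == 0 or card_y == 0:
        return 0.0  # assumption: an empty concept matches nothing
    return alpha * inter / card_x + (1 - alpha) * inter / card_y


def sim_in(params_1: Sequence[C], io_2: Sequence[C],
           sim_con: Callable[[C, C], float]) -> float:
    """Sim_In(S1, S2): mean over In(S1) of the best match in In(S2) ∪ Out(S2)."""
    if not params_1 or not io_2:
        return 0.0  # assumption: no parameters means no functional overlap
    best = (max(sim_con(c1, c2) for c2 in io_2) for c1 in params_1)
    return sum(best) / len(params_1)


def sim_inputs(in_1: Sequence[C], io_1: Sequence[C],
               in_2: Sequence[C], io_2: Sequence[C],
               sim_con: Callable[[C, C], float]) -> float:
    """Sim_Inputs(S1, S2): average of the two directional scores."""
    return (sim_in(in_1, io_2, sim_con) + sim_in(in_2, io_1, sim_con)) / 2
```

Here `io_1` and `io_2` stand for the combined input and output concepts of each service. In practice `sim_con` would wrap `fuzzy_sim` over ontology concepts; any callable returning a value in [0, 1] fits the sketch.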

The keywords used in the keyword similarity of step 1) are the intersection of the words in the title of the OWL-S service description file that describes the semantic Web service and the words in the text under profile:textDescription.

The keyword similarity of semantic Web services in step 1) is computed as

Sim_{Key}(S1, S2) = \frac{Sim_{word}(S1, S2) + Sim_{word}(S2, S1)}{2}

where Sim_word(S1, S2) is the keyword similarity between S1 and S2 taken from the side of S1, and Sim_word(S2, S1) is the keyword similarity between S2 and S1 taken from the side of S2. Here,

Sim_{word}(S1, S2) = \sum_{i=1}^{|Word(S1)|} T(k1_i) \times \max_{j=1}^{|Word(S2)|} Sim_G(k1_i, k2_j)

T(k1_i) is the TF-IDF weight of keyword k1_i. TF-IDF is a statistical method for measuring how important a word is in a text. Here,

T(k1_i) = TF_{i,d} \times IDF_i

IDF_i is the inverse document frequency (IDF) of keyword i. IDF expresses the importance of a word across all files. Because only the importance of a keyword within its own service description document is considered here, and not its prevalence, IDF_i = 1.

TF_{i,d} is the term frequency (TF) of keyword i in document d, that is, how often it occurs. Here document d is the semantic Web service description document, and keyword i comes from the intersection of the words in textDescription and the words in the semantic Web service title.

Sim_G(k1_i, k2_j) is the similarity between a keyword k1_i of S1 and a keyword k2_j of S2. The term \max_{j=1}^{|Word(S2)|} Sim_G(k1_i, k2_j) means that the similarity between k1_i and every keyword k2_j of S2 is computed one by one and the maximum is returned. Because keywords are plain text with no directly associated ontology information, the Normalized Google Distance [52] is used to compute the similarity between keywords: Sim_G(k1_i, k2_j) = 1 - NGD(k1_i, k2_j)

where NGD(k1_i, k2_j) is the Normalized Google Distance between the two keywords,

NGD(k1_i, k2_j) = \frac{\max\{\log f(k1_i), \log f(k2_j)\} - \log f(k1_i, k2_j)}{\log N - \min\{\log f(k1_i), \log f(k2_j)\}}
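A small Python sketch of the Normalized Google Distance may help. It assumes the usual reading of the symbols, which the text does not spell out: f(x) is the number of documents containing term x, f(x, y) the number containing both terms, and N the total number of indexed documents. How the counts are obtained is left open, and clamping the similarity to [0, 1] and returning 0 for terms that never co-occur are choices made for this example only.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from raw hit counts (all assumed > 0)."""
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    return (max(log_fx, log_fy) - log_fxy) / denominator


def sim_g(fx: float, fy: float, fxy: float, n: float) -> float:
    """Sim_G = 1 - NGD, clamped to [0, 1] (the clamp is an assumption)."""
    if fx <= 0 or fy <= 0 or fxy <= 0 or n <= min(fx, fy):
        return 0.0  # assumption: undefined distance is treated as no similarity
    return min(1.0, max(0.0, 1.0 - ngd(fx, fy, fxy, n)))
```

For instance, two terms that always occur together (`fx == fy == fxy`) have distance 0 and similarity 1.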

In step 2), based on the similarity results, a clustering and labeling algorithm based on Affinity Propagation (AP) is proposed to cluster the semantic Web services and label the center services. The algorithm steps are:

Step 2.1: treat each semantic Web service as a data point;

Step 2.2: convert the similarity between semantic Web services into the distance between data points;

Step 2.3: initialize the preference (reference value) parameter and the damping factor of the clustering and labeling algorithm;

Step 2.4: pass two kinds of messages, responsibility and availability, between data points to determine which data points are center points;

Step 2.5: assign the other data points to the service sets represented by the center data points.

The conversion in step 2.2 of the similarity between semantic Web services into the distance between data points works as follows. s(i, k) denotes the distance between service i and service k in the distance matrix of semantic Web services. It is obtained by converting the similarity matrix Sim into the distance matrix S. Suppose there are n services to be clustered; the similarity matrix can then be written as

Sim = \begin{bmatrix} sim(0, 0) & sim(0, 1) & \cdots & sim(0, n-1) \\ sim(1, 0) & sim(1, 1) & \cdots & sim(1, n-1) \\ \vdots & \vdots & \ddots & \vdots \\ sim(n-1, 0) & sim(n-1, 1) & \cdots & sim(n-1, n-1) \end{bmatrix}

where sim(i, j) is the similarity between semantic Web service i and semantic Web service j, calculated by the method of step 1).

The distance matrix S based on semantic Web service similarity is obtained by the formula S = Sim - 1, where every element s(i, j) of S lies in [-1, 0].
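As a quick illustration of S = Sim - 1, the following sketch subtracts 1 from every entry of a similarity matrix stored as nested lists. The function name and the list-of-lists representation are assumptions for the example.

```python
from typing import List

Matrix = List[List[float]]


def to_distance_matrix(sim: Matrix) -> Matrix:
    """Map similarities in [0, 1] to non-positive distances in [-1, 0]."""
    return [[value - 1.0 for value in row] for row in sim]
```

A similarity of 1 maps to distance 0 and a similarity of 0 maps to distance -1, so more similar services get larger (less negative) values, which is the convention Affinity Propagation expects.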

When i = k, s(k, k) is a diagonal element of the distance matrix S; its value in the initial state is called the preference (reference value).

The preference in step 2.3 generally takes one of two values. The first is preference = median(S), the median of the distance values between the semantic Web service data points. The second is preference = average(S), the mean of those distance values. Experiments show that preference = median(S) gives the largest improvement in service discovery efficiency and the best clustering result.
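The preference initialization can be sketched as below. One point is an interpretation rather than something the text fixes: the statistic is taken over the off-diagonal entries (distances between distinct services), although the embodiment speaks of "all elements" of the distance matrix. The function name and the `use_mean` switch are hypothetical.

```python
import statistics
from typing import List

Matrix = List[List[float]]


def set_preference(s: Matrix, use_mean: bool = False) -> Matrix:
    """Return a copy of S whose diagonal holds the median (or mean) distance."""
    n = len(s)
    # Assumption: only distances between distinct services enter the statistic.
    off_diagonal = [s[i][k] for i in range(n) for k in range(n) if i != k]
    if use_mean:
        preference = statistics.mean(off_diagonal)
    else:
        preference = statistics.median(off_diagonal)
    result = [row[:] for row in s]
    for k in range(n):
        result[k][k] = preference
    return result
```

A lower preference generally yields fewer clusters, which is why the choice between median and mean affects the clustering result.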

The damping factor in step 2.3 is introduced to suppress numerical oscillation. Experiments show that with lam = 0.5 the numerical fluctuation is small and the algorithm converges at a moderate speed.

The two kinds of messages passed in step 2.4 between the data points formed by the semantic Web services are responsibility (attraction degree) and availability (membership degree). responsibility(i, k), usually abbreviated r(i, k), expresses, from the viewpoint of semantic Web service data point i, how suitable service data point k is as the cluster center of i. availability(i, k), usually abbreviated a(i, k), expresses, from the viewpoint of service data point k, how likely service data point i is to choose point k as its cluster center.

The update rule for r(i, k) is:

r(i, k) \leftarrow s(i, k) - \max_{k' \ s.t.\ k' \neq k} \{ a(i, k') + s(i, k') \}

where s(i, k) is the distance between service i and service k in the distance matrix of semantic Web services. In the initial state the availability between all services is 0.

The update rule for a(i, k) is:

a(i, k) \leftarrow \begin{cases} \min\{0, r(k, k) + \sum_{i' \ s.t.\ i' \notin \{i, k\}} \max\{0, r(i', k)\}\}, & i \neq k \\ \sum_{i' \ s.t.\ i' \notin \{i, k\}} \max\{0, r(i', k)\}, & i = k \end{cases}

The larger r(i, k) and a(i, k) are, the more attractive service point k is as a cluster center, and the more strongly service point i belongs to the cluster whose center is service point k. The algorithm iteratively updates the responsibility and availability values between each service point and the other services, stops iterating at some point, and produces m high-quality cluster center service points (exemplars), while the remaining services are assigned to the corresponding service sets.

To suppress numerical oscillation, the damping factor lam is added, that is:

r_{i+1}(i, k) \leftarrow lam \cdot r_i(i, k) + (1 - lam) \cdot r_{i+1}(i, k), \quad lam \in (0, 1)

a_{i+1}(i, k) \leftarrow lam \cdot a_i(i, k) + (1 - lam) \cdot a_{i+1}(i, k), \quad lam \in (0, 1)

where the subscript is the iteration number: r_{i+1}(i, k) is the responsibility of k for i computed in iteration i+1, and r_i(i, k) is the responsibility of k for i computed in iteration i. lam is the damping factor of step 2.3. Each update of the responsibility and availability values therefore multiplies the result of the previous iteration by lam and adds the result of the current iteration multiplied by 1 - lam, which reduces numerical oscillation and speeds up convergence.
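The two damped update rules can be written out as a pure-Python sketch. It follows the formulas above literally, using nested lists for S, R and A. The function names are assumptions, and the code expects at least two data points so that the maximum over k' ≠ k is defined.

```python
from typing import List

Matrix = List[List[float]]


def update_responsibility(s: Matrix, r: Matrix, a: Matrix, lam: float = 0.5) -> Matrix:
    """One damped update of every r(i, k) from s and the current a."""
    n = len(s)
    new_r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            # Strongest competing candidate k' != k for point i.
            rival = max(a[i][kk] + s[i][kk] for kk in range(n) if kk != k)
            raw = s[i][k] - rival
            new_r[i][k] = lam * r[i][k] + (1 - lam) * raw
    return new_r


def update_availability(r: Matrix, a: Matrix, lam: float = 0.5) -> Matrix:
    """One damped update of every a(i, k) from the current r."""
    n = len(r)
    new_a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            # Positive support for k from all points other than i and k.
            support = sum(max(0.0, r[ii][k]) for ii in range(n) if ii not in (i, k))
            raw = support if i == k else min(0.0, r[k][k] + support)
            new_a[i][k] = lam * a[i][k] + (1 - lam) * raw
    return new_a
```

One iteration of message passing calls `update_responsibility` first and then feeds its result into `update_availability`, matching the order of the two rules in the text.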

Determining which data points are center points in step 2.4 works as follows:

Step 2.4.1: create the matrix E = R + A. R is the N*N matrix recording the r(i, k) values, with i ∈ [0, N) and k ∈ [0, N). A is the N*N matrix recording the a(i, k) values, with i ∈ [0, N) and k ∈ [0, N).

Step 2.4.2: check one by one whether each value on the diagonal of E is greater than 0. If it is, the semantic Web service corresponding to that diagonal element becomes a cluster center point.

Assigning the other data points to the service sets represented by the center data points in step 2.5 means that, given the semantic Web services selected as cluster centers in step 2.4.2, every other semantic Web service chooses as its own cluster the center point whose associated element in matrix E is largest.
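Steps 2.4.1, 2.4.2 and 2.5 can be sketched as a single function. The name `select_exemplars` and the choice to return `None` labels when no diagonal entry of E is positive are assumptions for this example.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def select_exemplars(r: Matrix, a: Matrix) -> Tuple[List[int], List[Optional[int]]]:
    """Pick centers where E[k][k] > 0, then assign every other point to a center."""
    n = len(r)
    e = [[r[i][k] + a[i][k] for k in range(n)] for i in range(n)]  # E = R + A
    exemplars = [k for k in range(n) if e[k][k] > 0]
    labels: List[Optional[int]] = []
    for i in range(n):
        if i in exemplars:
            labels.append(i)  # a center labels its own service set
        elif exemplars:
            labels.append(max(exemplars, key=lambda k: e[i][k]))
        else:
            labels.append(None)  # assumption: no center was found
    return exemplars, labels
```

The returned `labels[i]` is the index of the center service that labels the service set containing service i.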

Compared with the prior art, the invention has the following advantages and technical effects: it calculates similarity values between services and clusters the semantic Web services based on that similarity, grouping services with similar functions into one class. During clustering it also extracts the center service of each clustered service set to label the function of that class. The invention can improve the accuracy of similarity calculation and further improve the performance of a service discovery system.

Brief description of the drawings

Fig. 1 is a schematic diagram of the input parameter similarity calculation for semantic service S1;

Fig. 2 is a flowchart of the semantic Web service similarity calculation;

Fig. 3 is a flowchart of the semantic Web service clustering and labeling algorithm.

Embodiment

To make the object, technical solution and advantages of the invention clearer, the invention is further described below with reference to the accompanying drawings; the implementation and protection of the invention are not limited to this description.

Fig. 2 is a flowchart of the semantic Web service similarity calculation. The calculation comprises two parts, functional parameter matching and keyword matching. The flow in Fig. 2 is:

Step 1: input the two services whose similarity is to be calculated; their default similarity is 0;

Step 2: check whether the I/O parameters of either service are empty. If so, return the default similarity and end the flow; if not, continue with step 3;

Step 3: calculate the Input parameter similarity and the Output parameter similarity of the semantic Web services separately, then combine the two values to obtain the functional parameter similarity between the services.

The Input parameter similarity of step 3 is calculated with the formula

Sim_{Inputs}(S1, S2) = \frac{Sim_{In}(S1, S2) + Sim_{In}(S2, S1)}{2}

where

Sim_{In}(S1, S2) = \frac{\sum_{i=1}^{|In(S1)|} \max_{j=1}^{|In(S2) \cup Out(S2)|} Sim_{con}(C1_i, C2_j)}{|In(S1)|}

Sim_con(C1_i, C2_j) is the similarity between a single input parameter C1_i of semantic Web service S1 and a single input or output parameter C2_j of semantic Web service S2, computed as

sim(X, Y) = \alpha \frac{|X \cap Y|}{|X|} + (1 - \alpha) \frac{|X \cap Y|}{|Y|} = \alpha \frac{\sum_{d \in D} \min\{\mu_X(d), \mu_Y(d)\}}{\sum_{d \in D} \mu_X(d)} + (1 - \alpha) \frac{\sum_{d \in D} \min\{\mu_X(d), \mu_Y(d)\}}{\sum_{d \in D} \mu_Y(d)}

Step 4: extract the keyword set of the service from the content under the profile tag of the semantic Web service and from the service title.

Extracting the keyword set in step 4 comprises four sub-steps:

Step 4.1: split the content under the profile tag of the semantic Web service description document into words, obtaining a set of English text words;

Step 4.2: split the service title into words, obtaining a set of English text words;

Step 4.3: intersect the sets obtained in steps 4.1 and 4.2 to obtain a new word set;

Step 4.4: for each word in the set obtained in step 4.3, calculate its TF-IDF weight in the corresponding semantic Web service description document. The formula is T(k1_i) = TF_{i,d} \times IDF_i, where T(k1_i) is the TF-IDF weight of keyword k1_i, IDF_i expresses the prevalence of keyword i, and TF_{i,d} is the frequency of occurrence of keyword i in document d, the corresponding semantic Web service description document. Here, IDF_i = 1.
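Steps 4.1 to 4.4 can be sketched in Python as follows. Several details are assumptions made for the example because the text does not fix them: the service title is split on camel-case boundaries, words are lower-cased before intersecting, and term frequency is the word count divided by the number of words in the description text. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def split_words(text: str) -> List[str]:
    """Split text into lower-case English words, breaking camelCase names."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    return [word.lower() for word in re.findall(r"[A-Za-z]+", spaced)]


def keyword_weights(service_name: str, text_description: str) -> Dict[str, float]:
    """Keywords = title words ∩ description words, weighted by TF (IDF = 1)."""
    description_words = split_words(text_description)   # step 4.1
    name_words = split_words(service_name)              # step 4.2
    keywords = set(description_words) & set(name_words)  # step 4.3
    total = len(description_words)
    if total == 0:
        return {}
    counts = Counter(description_words)
    # Step 4.4: T(k) = TF * IDF with IDF fixed to 1.
    return {word: counts[word] / total for word in keywords}
```

A call such as `keyword_weights("BookPriceService", "This service returns the price of a book.")` keeps only the words shared by the title and the description.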

Step 5: from the keyword sets of the semantic Web services, calculate the keyword similarity value using the Google distance. Specifically,

Sim_{Key}(S1, S2) = \frac{Sim_{word}(S1, S2) + Sim_{word}(S2, S1)}{2}

where Sim_word(S1, S2) is the keyword similarity between S1 and S2 taken from the side of semantic Web service S1.

In step 5,

Sim_{word}(S1, S2) = \sum_{i=1}^{|Word(S1)|} T(k1_i) \times \max_{j=1}^{|Word(S2)|} Sim_G(k1_i, k2_j)

T(k1_i) is the TF-IDF weight of keyword k1_i, and Sim_G(k1_i, k2_j) is the similarity between a keyword k1_i of semantic Web service S1 and a keyword k2_j of semantic Web service S2.

Sim_G(k1_i, k2_j) in step 5 uses the Normalized Google Distance to calculate the similarity between keywords: Sim_G(k1_i, k2_j) = 1 - NGD(k1_i, k2_j), where NGD(k1_i, k2_j) is the Normalized Google Distance between the two keywords.

Step 6: combine the functional parameter similarity value and the service description keyword similarity value to obtain the similarity between the semantic Web services. Specifically,

Sim(S1, S2) = \frac{Sim_{Func}(S1, S2) + Sim_{Key}(S1, S2)}{2}

where Sim(S1, S2) is the similarity value between semantic Web services S1 and S2, Sim_Func(S1, S2) is the functional parameter similarity, and Sim_Key(S1, S2) is the keyword similarity.
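Steps 5 and 6, together with the empty-parameter check of step 2, can be sketched as below. The keyword-to-keyword similarity is passed in as a callable so that any Sim_G implementation can be plugged in. All function names and the `has_empty_io` flag are assumptions for this illustration.

```python
from typing import Callable, Dict, Iterable


def sim_word(weights_1: Dict[str, float], words_2: Iterable[str],
             sim_g: Callable[[str, str], float]) -> float:
    """Sim_word(S1, S2): TF-weighted best match of each S1 keyword in S2."""
    words_2 = list(words_2)
    if not words_2:
        return 0.0  # assumption: nothing to match against
    return sum(weight * max(sim_g(k1, k2) for k2 in words_2)
               for k1, weight in weights_1.items())


def sim_key(weights_1: Dict[str, float], weights_2: Dict[str, float],
            sim_g: Callable[[str, str], float]) -> float:
    """Sim_Key(S1, S2): average of the two directional keyword scores."""
    return (sim_word(weights_1, weights_2, sim_g)
            + sim_word(weights_2, weights_1, sim_g)) / 2


def service_similarity(sim_func: float, sim_key_value: float,
                       has_empty_io: bool = False) -> float:
    """Sim(S1, S2); returns the default 0 when a service has no I/O parameters."""
    if has_empty_io:
        return 0.0
    return (sim_func + sim_key_value) / 2
```

Passing dictionaries for both services works because iterating a dictionary yields its keys, which are the keywords.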

Based on the calculated semantic Web service similarity, the semantic Web services are clustered and their center services extracted.

Fig. 3 is a flowchart of the semantic Web service clustering and labeling algorithm. The algorithm has four main stages. The first stage calculates the similarity between semantic Web services. The second stage initializes the parameters required by the clustering algorithm. The third stage iteratively calculates and passes the responsibility and availability values between services and selects the center data points of the clusters. The fourth stage assigns the other data points to categories according to their responsibility and availability with respect to the center data points. This completes the clustering of the semantic Web services. The center data points selected during clustering are then mapped back to semantic Web services, and these services serve as the center services of the service sets, labeling the typical function described by each set. The concrete steps are as follows:

Step 1: use the semantic Web service similarity calculation method above to calculate the similarity between every pair of semantic Web services in the service repository.

Step 2: build the similarity matrix Sim from the similarities between semantic Web services. The rows and columns of this matrix list the semantic Web services in the same order.

Step 3: use the similarity matrix Sim built in step 2 to initialize the distance matrix S to be clustered, with S = Sim - 1. At the same time, build the responsibility matrix R and the availability matrix A and initialize R = 0, A = 0.

Step 4: initialize the preference to the median of all elements in the distance matrix and set the damping factor lam = 0.5. The elements of the distance matrix are sorted in ascending order, and the median of the sorted elements is used to initialize the preference.

Step 5: apply the formula r(i, k) \leftarrow s(i, k) - \max_{k' \ s.t.\ k' \neq k} \{ a(i, k') + s(i, k') \} to calculate the elements of the responsibility matrix R one by one, and update R with r_{i+1}(i, k) \leftarrow lam \cdot r_i(i, k) + (1 - lam) \cdot r_{i+1}(i, k).

r(i, k) in step 5 is responsibility(i, k): from the viewpoint of semantic Web service data point i, how suitable service data point k is as the cluster center of i. a(i, k) is availability(i, k): from the viewpoint of service data point k, how likely service data point i is to choose point k as its cluster center. The initial value of a(i, k) is 0.

s(i, k) in step 5 is the distance between service i and service k in the distance matrix of semantic Web services.

Step 6: apply the formula

a(i, k) \leftarrow \begin{cases} \min\{0, r(k, k) + \sum_{i' \ s.t.\ i' \notin \{i, k\}} \max\{0, r(i', k)\}\}, & i \neq k \\ \sum_{i' \ s.t.\ i' \notin \{i, k\}} \max\{0, r(i', k)\}, & i = k \end{cases}

to calculate the elements of the availability matrix A one by one, and update A with a_{i+1}(i, k) \leftarrow lam \cdot a_i(i, k) + (1 - lam) \cdot a_{i+1}(i, k).

Step 7: check whether the number of iterations has exceeded the prescribed 100. If so, go to step 9; if not, go to step 8.

Step 8: check whether the values of the responsibility matrix and the availability matrix have both remained unchanged for more than 10 iterations. If so, go to step 9; if not, go back to step 5.

Step 9: create the matrix E and calculate its value as E = R + A.

Step 10: check one by one whether each value on the diagonal of E is greater than 0. If it is, the semantic Web service corresponding to that diagonal element becomes a cluster center point.

Step 11: according to the element values of matrix E, each non-center service selects the center point with the largest element value as the center service of its own service set.

Step 12: clustering is complete, yielding one or more service sets and the center service of each set.
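The stopping logic of steps 5 to 8 can be sketched as a small driver loop. The message updates are abstracted as a `step` callable (hypothetical, supplied by the caller) that maps the current pair (R, A) to the next pair. Treating the limit of 100 as a hard cap on the number of iterations and comparing matrices by exact equality are simplifying assumptions; an implementation might prefer a small tolerance.

```python
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Step = Callable[[Matrix, Matrix], Tuple[Matrix, Matrix]]


def iterate_messages(r: Matrix, a: Matrix, step: Step,
                     max_iter: int = 100,
                     stable_iter: int = 10) -> Tuple[Matrix, Matrix, int]:
    """Repeat `step` until R and A stay unchanged for more than `stable_iter`
    consecutive iterations, or until `max_iter` iterations have run."""
    unchanged = 0
    iterations = 0
    while iterations < max_iter:
        new_r, new_a = step(r, a)          # steps 5 and 6
        iterations += 1
        if new_r == r and new_a == a:      # step 8: did anything change?
            unchanged += 1
        else:
            unchanged = 0
        r, a = new_r, new_a
        if unchanged > stable_iter:
            break
    return r, a, iterations
```

The matrices returned by the loop are then combined as E = R + A to select the center services, as in steps 9 to 11.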

Claims

1. A similarity-based semantic Web service clustering and labeling method, characterized in that it comprises two parts: implementing the semantic Web service similarity calculation and implementing the semantic Web service clustering and labeling algorithm;

The semantic Web service similarity calculation combines the result of the mixed I/O parameter similarity calculation with the result of the service description keyword similarity calculation to obtain the semantic Web service similarity, which reflects the differences and similarities between service functions; the I/O parameters directly describe the function of the corresponding service module and serve as the criterion for calculating semantic Web service similarity from a functional perspective.

2. The similarity-based semantic Web service clustering and labeling method according to claim 1, characterized in that the semantic Web service similarity is computed as

Sim(S1, S2) = \frac{Sim_{Func}(S1, S2) + Sim_{Key}(S1, S2)}{2}

where Sim(S1, S2) is the similarity value between semantic Web services S1 and S2, Sim_Func(S1, S2) is the mixed I/O parameter similarity between S1 and S2, and Sim_Key(S1, S2) is the service description keyword similarity between S1 and S2;

The mixed I/O parameter similarity is computed as

Sim_{Func}(S1, S2) = \frac{Sim_{Inputs}(S1, S2) + Sim_{Outputs}(S1, S2)}{2}

where Sim_Inputs(S1, S2) is the input parameter similarity of semantic Web services S1 and S2, and Sim_Outputs(S1, S2) is the output parameter similarity.

3. the Semantic Web Services cluster mask method based on similarity according to claim 2, it is characterized in that, the circular of input parameter similarity is

{Sim}_{I n p u t s} (S 1, S 2) = \frac{{Sim}_{I n} (S 1, S 2) + {Sim}_{I n} (S 2, S 1)}{2},

Sim _in(S1, S2) represent the input parameter of Semantic Web Services S1 for Semantic Web Services S2 input parameter between similarity, Sim _in(S2, S1) otherwise, represent Semantic Web Services S2 input body parameter for Semantic Web Services S1 input body parameter between similarity; Wherein,

{Sim}_{I n} (S 1, S 2) = \frac{Σ_{i = 1}^{| I n (S 1) |} \max_{j = 1}^{| I n (S 2) \cup O u t (S 2) |} {Sim}_{c o n} (C 1_{i}, C 2_{j})}{| I n (S 1) |},

Sim _con(C1 _i, C2 _j) represent the single input parameter C1 of Semantic Web Services S1 _iwith the single input or output parameter C2 of Semantic Web Services S2 _jbetween similarity;

Described input parameter similarity is identical with the computing method of output parameter similarity;

Described Sim _con(C1 _i, C2 _j) computing method are specially:

\begin{matrix} s i m (X, Y) = α \frac{| X \cap Y |}{| X |} + (1 - α) \frac{| X \cap Y |}{| Y |} = \\ α \frac{Σ_{d &Element; D} \min {μ_{X} (d), μ_{Y} (d)}}{Σ_{d &Element; D} μ_{X} (d)} + (1 - α) \frac{Σ_{d &Element; D} \min {μ_{X} (d), μ_{Y} (d)}}{Σ_{d &Element; D} μ_{Y} (d)} \end{matrix}

sim(X, Y) denotes the degree of similarity of Y with respect to X; α is a weight-adjusting parameter, α ∈ [0, 1]; D = {x} denotes the object space; X and Y are fuzzy sets in D; and sim: U × U → [0, 1] is a fuzzy similarity relation on the product space U × U. Applying this formula yields the similarity between two concepts.
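As a non-limiting illustration, the fuzzy-set formula and the directional and symmetric parameter similarities of this claim may be sketched in Python as follows. The representation of a concept as a dictionary mapping domain elements to membership degrees, the default α = 0.5, and the function names `fuzzy_sim`, `sim_in` and `sim_inputs` are assumptions made for the example and are not specified by the claim.

```python
def fuzzy_sim(mu_x, mu_y, alpha=0.5):
    """sim(X, Y) for fuzzy sets given as {element: membership degree} dicts.

    The dict representation and alpha=0.5 are assumptions for illustration.
    """
    domain = set(mu_x) | set(mu_y)
    # |X ∩ Y| is the sum of pointwise minimum memberships over the domain D.
    inter = sum(min(mu_x.get(d, 0.0), mu_y.get(d, 0.0)) for d in domain)
    card_x = sum(mu_x.values())
    card_y = sum(mu_y.values())
    if card_x == 0 or card_y == 0:
        return 0.0  # assumption: an empty fuzzy set is similar to nothing
    return alpha * inter / card_x + (1 - alpha) * inter / card_y


def sim_in(params_1, candidates_2, concept_sim):
    """Directional Sim_In(S1, S2): mean best match of each S1 parameter."""
    if not params_1 or not candidates_2:
        return 0.0
    best = (max(concept_sim(c1, c2) for c2 in candidates_2) for c1 in params_1)
    return sum(best) / len(params_1)


def sim_inputs(in_1, out_1, in_2, out_2, concept_sim):
    """Symmetric Sim_Inputs(S1, S2): average of the two directions."""
    forward = sim_in(in_1, in_2 + out_2, concept_sim)   # In(S1) vs In(S2) ∪ Out(S2)
    backward = sim_in(in_2, in_1 + out_1, concept_sim)  # In(S2) vs In(S1) ∪ Out(S1)
    return (forward + backward) / 2
```

Here `concept_sim` is any callable returning Sim_con for two concepts; in practice it could wrap `fuzzy_sim` applied to the fuzzy sets of the two concepts.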

4. The similarity-based semantic Web service clustering labeling method according to claim 1, characterized in that, in the service description keyword similarity, the service description keywords are the intersection of the words of the title of the OWL-S service description file and the words of the content under the profile:textDescription tag in that file;

The service description keyword similarity is computed from Sim_word(S1, S2) and Sim_word(S2, S1), wherein Sim_word(S1, S2) is the keyword similarity of service S1 with respect to service S2, and Sim_word(S2, S1) is the keyword similarity of service S2 with respect to service S1;

The keyword similarity between semantic Web service S1 and semantic Web service S2 is computed as

Sim_{word}(S1, S2) = \sum_{i=1}^{|Word(S1)|} T(k1_i) \times \max_{j=1}^{|Word(S2)|} Sim_G(k1_i, k2_j),

T(k1_i) denotes the TF-IDF weight of keyword k1_i, and Sim_G(k1_i, k2_j) denotes the similarity between a keyword k1_i of semantic Web service S1 and a keyword k2_j of semantic Web service S2;

For the TF-IDF weight, only the importance of a keyword within its own service description file is considered, and not how common the keyword is across files, so the parameter IDF is set to 1.
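As a non-limiting illustration, the directional keyword similarity Sim_word with IDF = 1 may be sketched as follows. Taking the term frequency as a keyword's count divided by the total number of keyword occurrences is an assumption, as are the function names `tf_weights` and `sim_word`; the claim does not fix how TF is normalized.

```python
from collections import Counter


def tf_weights(keywords):
    """TF-IDF weight with IDF = 1, i.e. plain term frequency.

    Assumption: TF is the keyword count divided by the total count.
    """
    counts = Counter(keywords)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def sim_word(keywords_1, keywords_2, pair_sim):
    """Directional Sim_word(S1, S2).

    Each distinct keyword of S1 contributes its weight T(k1_i) times its
    best pairwise similarity Sim_G against the keywords of S2.
    """
    if not keywords_1 or not keywords_2:
        return 0.0
    targets = set(keywords_2)
    return sum(
        weight * max(pair_sim(k1, k2) for k2 in targets)
        for k1, weight in tf_weights(keywords_1).items()
    )
```

Because the weights sum to 1, the result stays in [0, 1] whenever `pair_sim` does.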

5. The similarity-based semantic Web service clustering labeling method according to claim 4, characterized in that Sim_G(k1_i, k2_j) is computed using the Normalized Google Distance, specifically Sim_G(k1_i, k2_j) = 1 - NGD(k1_i, k2_j), where NGD(k1_i, k2_j) denotes the Normalized Google Distance between the two keywords.
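As a non-limiting illustration, Sim_G may be sketched from the standard definition of the Normalized Google Distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f denotes a hit count and N the number of indexed pages. The claim does not say how the counts are obtained, so they are passed in as plain numbers here; the function names and the clamping of the result to [0, 1] are likewise assumptions.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: hits for each keyword alone; fxy: hits for both together;
    n: total number of indexed pages (must exceed fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # keywords that never co-occur are infinitely far apart
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def sim_g(fx, fy, fxy, n):
    """Sim_G = 1 - NGD, clamped to [0, 1] (the clamping is an assumption)."""
    return min(1.0, max(0.0, 1.0 - ngd(fx, fy, fxy, n)))
```

The clamp is included because NGD can exceed 1 for rarely co-occurring keywords, which would otherwise produce a negative similarity.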

6. The similarity-based semantic Web service clustering labeling method according to any one of claims 1 to 5, characterized in that, once the similarity values between the semantic Web services have been obtained, the semantic Web services are clustered and the center services are labeled, which specifically comprises:

Step 1: treat each semantic Web service as a data point;

Step 2: convert the similarity between semantic Web services into the distance between data points;

Step 3: initialize the parameters of the clustering labeling algorithm, namely the reference value and the damping coefficient;

Step 4: determine which data points are center points by passing two kinds of messages, responsibility and availability, between the data points;

Step 5: assign the other data points to the service sets represented by the center data points.

7. The similarity-based semantic Web service clustering labeling method according to claim 6, characterized in that, in step 2, s(i, k) denotes the distance between service i and service k in the distance matrix of the semantic Web services, and s(i, k) is computed as follows: the similarity matrix Sim is converted into the distance matrix S by the formula S = Sim - 1, yielding the distance matrix S based on semantic Web service similarity.

8. The similarity-based semantic Web service clustering labeling method according to claim 7, characterized in that the reference value parameter (preference) in step 3 is the value of s(k, k), with reference value = median(S); the damping coefficient is introduced to suppress numerical oscillation, and when lam = 0.5 the numerical fluctuation is small and the algorithm converges at a moderate speed.
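As a non-limiting illustration, the conversion S = Sim - 1 and the placement of the reference value on the diagonal s(k, k) may be sketched with NumPy as follows. Taking median(S) over the off-diagonal entries only is an assumption, since the claim states only reference value = median(S); the function name `to_distance_matrix` is hypothetical.

```python
import numpy as np


def to_distance_matrix(sim):
    """Convert a similarity matrix Sim into the distance matrix S = Sim - 1.

    The diagonal s(k, k) is set to the reference value (preference).
    Assumption: the median is taken over off-diagonal entries only.
    """
    sim = np.asarray(sim, dtype=float)
    s = sim - 1.0  # new array; similarities in [0, 1] map to [-1, 0]
    off_diagonal = s[~np.eye(len(s), dtype=bool)]
    np.fill_diagonal(s, np.median(off_diagonal))
    return s
```

With this convention identical services have distance 0 and dissimilar ones approach -1, so a larger s(i, k) means service k is a better candidate center for service i.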

9. The similarity-based semantic Web service clustering labeling method according to claim 7, characterized in that, in step 4, the responsibility (degree of attraction) responsibility(i, k), abbreviated r(i, k), expresses, from the viewpoint of semantic Web service data point i, how well suited service data point k is to serve as the cluster center of i; and the availability (degree of membership) availability(i, k), abbreviated a(i, k), expresses, from the viewpoint of service data point k, how likely service data point i is to select point k as its cluster center;

r(i, k) is computed by the rule r(i, k) ← s(i, k) - max_{k' s.t. k' ≠ k} {a(i, k') + s(i, k')}, wherein s(i, k) denotes the distance between service i and service k in the distance matrix of the semantic Web services; in the initial state, the availability values between all services are 0;

a(i, k) is computed by the rule:

a(i, k) \leftarrow \begin{cases} \min\{0, r(k, k) + \sum_{i' \text{ s.t. } i' \notin \{i, k\}} \max\{0, r(i', k)\}\}, & i \neq k \\ \sum_{i' \text{ s.t. } i' \notin \{i, k\}} \max\{0, r(i', k)\}, & i = k \end{cases};

Determining the center services specifically comprises two steps:

Step 1: create the matrix E = R + A, where R is the N×N matrix recording the r(i, k) values, A is the N×N matrix recording the a(i, k) values, and i ∈ [0, N), k ∈ [0, N);

Step 2: check one by one whether each value on the diagonal of matrix E is greater than 0; if it is, take the semantic Web service corresponding to that diagonal element as a center point of a cluster.
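As a non-limiting illustration, the responsibility and availability updates and the diagonal test on E = R + A may be sketched with NumPy as follows. Applying the damping coefficient as new = lam × old + (1 - lam) × update, running a fixed number of iterations, and the function names `affinity_messages` and `find_centers` are assumptions; the claims state the update rules and lam = 0.5 but not how damping or termination is applied.

```python
import numpy as np


def affinity_messages(s, lam=0.5, iterations=200):
    """Pass responsibility/availability messages over distance matrix s.

    Assumes at least two services, with s(k, k) holding the reference value.
    """
    s = np.asarray(s, dtype=float)
    n = len(s)
    r = np.zeros((n, n))
    a = np.zeros((n, n))  # availabilities start at 0
    rows = np.arange(n)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max_{k' != k} {a(i,k') + s(i,k')}
        total = a + s
        best = total.argmax(axis=1)
        best_val = total[rows, best]
        total[rows, best] = -np.inf
        second_val = total.max(axis=1)
        r_new = s - best_val[:, None]
        r_new[rows, best] = s[rows, best] - second_val
        r = lam * r + (1 - lam) * r_new  # damped update (assumed form)

        # a(i,k) <- min{0, r(k,k) + sum_{i' not in {i,k}} max{0, r(i',k)}}
        # a(k,k) <- sum_{i' != k} max{0, r(i',k)}
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, np.diag(r))  # keep r(k,k) unclipped
        a_new = rp.sum(axis=0)[None, :] - rp
        diag = np.diag(a_new).copy()
        a_new = np.minimum(a_new, 0)
        np.fill_diagonal(a_new, diag)
        a = lam * a + (1 - lam) * a_new
    return r, a


def find_centers(r, a):
    """Return indices k with E(k, k) > 0, together with E = R + A."""
    e = np.asarray(r, dtype=float) + np.asarray(a, dtype=float)
    return np.flatnonzero(np.diag(e) > 0), e
```

The column-sum trick computes every a(i, k) at once: the full column sum contains r(k, k) plus all positive responsibilities, and subtracting the entry for row i removes that row's own contribution.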

10. The similarity-based semantic Web service clustering labeling method according to claim 9, characterized in that assigning the other data points to the service sets represented by the center data points in step 5 means that, given the semantic Web services selected as cluster center points, each semantic Web service that is not a cluster center point selects as its own cluster the center point whose associated element value in the matrix E is the largest.
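As a non-limiting illustration, the assignment of non-center services to clusters may be sketched with NumPy as follows. The function name `assign_clusters`, the labeling of each cluster by the index of its center service, and the error raised when no center exists are assumptions made for the example.

```python
import numpy as np


def assign_clusters(e, centers):
    """Assign every service to the center with the largest E(i, center).

    e: the N x N matrix E = R + A; centers: indices of the center services.
    Returns an array whose i-th entry is the center index chosen for service i.
    """
    e = np.asarray(e, dtype=float)
    centers = np.asarray(centers, dtype=int)
    if centers.size == 0:
        raise ValueError("no cluster centers were found")
    # Restrict E to the center columns and pick the best one per row.
    labels = centers[np.argmax(e[:, centers], axis=1)]
    labels[centers] = centers  # each center service labels its own cluster
    return labels
```

For example, with centers 0 and 2, a service whose row of E holds 0.1 in column 0 and -0.9 in column 2 joins the cluster of service 0.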