CN113760778A

CN113760778A - Word vector model-based micro-service interface division evaluation method

Info

Publication number: CN113760778A
Application number: CN202111316694.6A
Authority: CN
Inventors: 李莹; 夏轩轩; 张凌飞; 朱晓莉; 方燕翎; 毛义华
Original assignee: Tianjin Zhongyi Science And Technology Co ltd; Binhai Industrial Technology Research Institute of Zhejiang University
Current assignee: Tianjin Zhongyi Science And Technology Co ltd; Binhai Industrial Technology Research Institute of Zhejiang University
Priority date: 2021-11-09
Filing date: 2021-11-09
Publication date: 2021-12-07
Anticipated expiration: 2041-11-09
Also published as: CN113760778B

Abstract

The invention provides a word vector model-based micro-service interface division evaluation method, comprising the following steps: a server side constructs a micro-service cluster; log data are collected to restore the distributed link calling process among the micro-service applications; model training: each graph-shaped call chain is split into linear call subchains, interface names are extracted in calling order to form an interface string array, and the manually defined micro-service interface partition set Ω is obtained; a word vector model is trained on the interface string array to obtain a word vector for each interface name; interface division evaluation: the number K of micro-service application categories in the current cluster is taken as the number of clusters to obtain the cluster partition set of the K-means algorithm; and the rationality of the interface partition set Ω is evaluated with the Purity algorithm, using the cluster partition set of the K-means algorithm as the reference. The method is based on the calling relations observed when the micro-service interfaces actually run, re-partitions the interface set by mathematical means, compares the result with the manually partitioned micro-service interfaces, and thereby guides the optimization of the existing micro-service architecture.

Description

Word vector model-based micro-service interface division evaluation method

Technical Field

The invention belongs to the field of micro service interfaces, and particularly relates to a micro service interface division evaluation method based on a word vector model.

Background

The traditional monolithic application architecture is generally based on Tomcat middleware. As a system grows, this architecture increases its complexity, makes cooperation among developers difficult, and hinders smooth continuous integration and continuous delivery. In actual operation, cascading failures occur easily, and the architecture cannot keep up with the rapidly growing business scale of internet companies.

Compared with the traditional monolithic architecture, the micro-service architecture decomposes functions into discrete services, each sufficiently cohesive, so that the coupling of the system is reduced. Services can be scaled horizontally and vertically and deployed independently; a failure in one service does not paralyze the whole system, and the system is not locked into a particular technology stack for a long time. Projects adopting the micro-service architecture can achieve rapid iteration, frequent releases, and the integration of development with operation and maintenance.

Based on the above advantages, more and more companies split the monolithic application into the micro-service architecture, for example, patent document with publication number CN112988122A discloses a monolithic application splitting tool and method based on the correlation between functional characteristics and micro-services, and patent document with publication number CN111026468A discloses a backend splitting strategy based on micro-services.

However, when a monolithic application system has complex business logic, a huge code base and numerous modules coupled together, it is challenging to work out an ideal micro-service structure by manual decomposition. Unreasonable service interface division leads to more complex service dependencies and compounds the call delay among services, and sometimes even simple functions become difficult to build. As a result, development slows down and migration becomes more difficult.

In order to better build a micro-service architecture and reduce the call delay between services, the rationality of micro-service interface division needs to be measured and objectively evaluated.

Disclosure of Invention

In view of the above, the present invention aims to provide a method for evaluating micro-service interface division based on a word vector model, so as to solve the problem of low efficiency caused by unreasonable interface division and complex inter-service dependency relationship.

In order to achieve the purpose, the technical scheme of the invention is realized as follows:

a micro service interface division evaluation method based on a word vector model comprises the following steps:

s1, collecting data, specifically comprising the following steps:

s11, the server side constructs a micro service cluster;

s12, collecting and restoring the distributed link calling process among the micro service applications and forming a graph-shaped calling chain;

s2, setting a word vector model, inputting an interface character string array, and obtaining a word vector of an interface name, wherein the method comprises the following specific steps:

s21, dividing the graph-shaped calling chain into m linear calling subchains by a depth-first search method DFS, extracting interface names according to a calling sequence to form an interface character string array, and obtaining a man-made micro-service interface division set omega;

s22, carrying out word vector model training based on the interface character string array of the step S21 to obtain a word vector of the interface name;

s3, interface division evaluation, which comprises the following steps:

s31, taking the number K of micro-service applications as the number of clusters, and clustering the word vectors of the interface names with the K-means algorithm to obtain the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm;

s32, taking the cluster partition set $C$ of the K-means algorithm as the reference, evaluating the rationality of the manual micro-service interface partition set Ω using the Purity algorithm.

Further, in step S11, the method for the server to construct the micro service cluster includes:

the server side constructs the micro-service cluster on the basis of Spring Cloud, the service discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients are enabled on the startup class of each micro-service application, and the micro-service applications call one another through the Feign client.

Further, in step S12, the method for collecting and restoring the distributed link calling process among the micro-service applications and forming a graph-shaped call chain comprises:

adding the dependency of the link tracking tool SOFATracer, the Spring Cloud OpenFeign dependency and the dependency of the data collection tool Zipkin to the configuration file of each micro-service application, and instrumenting the Spring Cloud OpenFeign component with SOFATracer to obtain the link calling process of each micro-service application;

introducing the link collection and display tool Zipkin into each project, starting a Zipkin server, and receiving the link log data reported by SOFATracer, the link log data being cleaned by Zipkin to form a graph-shaped call chain and restore the distributed link calling process.

Further, the parameters of the SOFATracer configuration include:

logging.path, which designates the log file output directory;

com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data to Zipkin remotely;

com.alipay.sofa.tracer.zipkin.baseUrl, the address of the Zipkin server to which data are reported.

The Spring Cloud OpenFeign summary log output by SOFATracer can be found in the log directory of the project, and each record in the log contains the following parameters:

app, the name of the current micro-service application;

url, the request interface address;

traceId, the ID that identifies a unique request in SOFATracer;

spanId, the level of the request within the whole call link;

the naming rule of the spanId is the parent spanId followed by the number of the child span, so that it carries the context of the call chain; the records with the same traceId are collected to form a complete link tree.
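The way records sharing a traceId can be assembled into a link tree might be sketched as follows. This is a minimal illustration, assuming that spanIds are dotted numeric paths such as "0", "0.1", "0.1.1" (so the parent id is everything before the last dot); the function name `build_call_trees`, the record layout and the sample URLs are hypothetical and not taken from the source.

```python
from collections import defaultdict


def build_call_trees(records):
    """Group log records by traceId and link every span to its parent.

    Assumption: spanIds are dotted numeric paths ("0", "0.1", "0.1.2"), so
    the parent id is everything before the last dot and the root has no dot.
    """
    spans_by_trace = defaultdict(dict)
    for record in records:
        spans_by_trace[record["traceId"]][record["spanId"]] = record

    def numeric_key(span_id):
        # Sort "0.2" before "0.10" by comparing the components as integers.
        return tuple(int(part) for part in span_id.split("."))

    trees = {}
    for trace_id, spans in spans_by_trace.items():
        children = defaultdict(list)
        roots = []
        for span_id in sorted(spans, key=numeric_key):
            if "." in span_id:
                children[span_id.rsplit(".", 1)[0]].append(span_id)
            else:
                roots.append(span_id)
        trees[trace_id] = {"roots": roots, "children": dict(children)}
    return trees


# Hypothetical log records; only the four fields named in the text are used.
records = [
    {"app": "gateway", "url": "http://gateway/order/create", "traceId": "t1", "spanId": "0"},
    {"app": "order", "url": "http://order/stock/check", "traceId": "t1", "spanId": "0.1"},
    {"app": "stock", "url": "http://stock/device/getInfo", "traceId": "t1", "spanId": "0.1.1"},
    {"app": "order", "url": "http://order/pay/submit", "traceId": "t1", "spanId": "0.2"},
    {"app": "gateway", "url": "http://gateway/user/login", "traceId": "t2", "spanId": "0"},
]
trees = build_call_trees(records)
```

The sketch does not handle spans whose parent record is missing from the log; a production pipeline would need to decide whether to drop or re-root such spans.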

Further, in step S21, the method for extracting the interface names according to the calling order and forming the interface character string array includes:

converting each call subchain into an interface string separated by spaces, the m linear call subchains forming an interface string array of length m, wherein each interface string represents the interface calling process of one sub-request, and the extracted interface granularity is the parent path in the interface address, which represents a resource type in the micro-service application;

de-duplicating all extracted interface names and dividing them into k clusters $\Omega=\{\omega_1, \omega_2, \dots, \omega_k\}$ according to the category of the micro-service application to which each interface name belongs, where $\Omega$ is the manual micro-service interface partition set and k is the number of categories of micro-service applications in the current cluster.

Further, in step S22, the word vector model is the CBOW model among the word vector models provided by the Python gensim library;

the specific steps of training the word vector model are as follows:

setting the generated word vector dimension S, the window size C and the minimum word frequency min_count = 1;

inputting the interface string array, and establishing a sliding window of size C on each interface string;

taking the center word of the window as the target of the training and the remaining words in the window as the input nodes of the neural network, generating one piece of training data each time the window slides, and obtaining the word vector representation set $V=\{v_1, v_2, \dots, v_N\}$ of the interface names through repeated iterative training.
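How the sliding window turns interface strings into CBOW training data can be sketched in plain Python. This is an illustration only: the function name `cbow_training_pairs` is hypothetical, and truncating the window at the ends of a string is an assumption the text does not spell out.

```python
def cbow_training_pairs(interface_strings, window_size=5):
    """Build (context_words, center_word) pairs for CBOW training.

    window_size is the total window width C; the center word is the target
    and the other words inside the window are the inputs. Windows are
    truncated at the ends of a string (an assumption of this sketch).
    """
    half = window_size // 2
    pairs = []
    for line in interface_strings:
        words = line.split()
        for i, center in enumerate(words):
            context = words[max(0, i - half):i] + words[i + 1:i + 1 + half]
            if context:  # a one-word string yields no training data
                pairs.append((context, center))
    return pairs


# One interface string, with a window of 3 to keep the output short.
pairs = cbow_training_pairs(["sa sd sc sg"], window_size=3)
```

If the gensim `Word2Vec` class is used for the actual training, CBOW corresponds to `sg=0` and the minimum word frequency to `min_count=1`; note that gensim's `window` parameter counts words on each side of the center word, so a total window of C = 5 would correspond to `window=2`.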

Further, in step S31, clustering the word vectors of the interface names using the K-means algorithm to obtain the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm comprises the following specific steps:

taking the number k of categories of micro-service applications in the current cluster in step S21 as the number of clusters of the K-means algorithm, first randomly selecting k vectors $\{\mu_1, \mu_2, \dots, \mu_k\}$ from the interface word vector set $V=\{v_1, v_2, \dots, v_N\}$ as the initial mean vectors of the clusters $C_i$ in the set $C$, and initializing the clusters $C_i=\varnothing$, $1 \le i \le k$;

computing the distance $d_{ji}$ between each interface word vector $v_j$ and each mean vector $\mu_i$, where $1 \le j \le N$ and $1 \le i \le k$; determining the cluster label $\lambda_j$ of $v_j$ from the nearest mean vector, $\lambda_j$ being the value of the variable $i$ at which the distance $d_{ji}$ is smallest, i.e. $\lambda_j=\arg\min_{i\in\{1,\dots,k\}} d_{ji}$; and placing the interface word vector $v_j$ into the corresponding cluster $C_t$, $t=\lambda_j$, with $C_t=\varnothing$ at the beginning;

after one iteration is finished, for each cluster $C_i$, $1 \le i \le k$, recalculating the center point $\mu_i'=\frac{1}{|C_i|}\sum_{v\in C_i} v$, updating the current mean vector $\mu_i$ of the cluster to $\mu_i'$, and then searching again for the center point closest to each interface word vector $v_j$;

repeating the loop until the set $C$ no longer changes between two iterations, finally obtaining the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm.

Further, the specific method for computing the distance $d_{ji}$ between the interface word vector $v_j$ and each mean vector $\mu_i$ is as follows:

the interface word vector $v_j$ and each mean vector $\mu_i$ are normalized and converted into unit vectors;

a vector dot product is performed on the normalized unit vectors of the interface word vector $v_j$ and each mean vector $\mu_i$ to obtain the vector inner product, i.e. the cosine of the angle between the vectors in the vector space, and the value of the cosine is taken as the distance between the two vectors, $d_{ji}=\cos(v_j,\mu_i)=\frac{v_j\cdot\mu_i}{\lVert v_j\rVert\,\lVert\mu_i\rVert}$.

The cosine lies in the range [-1, 1]: the closer the cosine between two vectors is to -1, the larger their semantic difference, and the closer it is to 1, the higher their semantic similarity.
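The clustering loop and the cosine measure can be illustrated together with a small pure-Python sketch. It is not the patented implementation: the names `cosine` and `kmeans_cosine`, the `seed` and `max_iter` parameters and the toy vectors are hypothetical. Because a larger cosine means higher similarity, the sketch reads "nearest mean vector" as the one with the largest cosine; that reading is an assumption.

```python
import math
import random


def normalize(v):
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine(u, v):
    """Dot product of the two unit vectors, a value in [-1, 1]."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def kmeans_cosine(vectors, k, seed=0, max_iter=100):
    """Cluster vectors into k groups; returns lists of vector indices."""
    rng = random.Random(seed)
    means = [list(v) for v in rng.sample(vectors, k)]  # random initial means
    assignment = None
    for _ in range(max_iter):
        # Assign every vector to the mean vector with the largest cosine.
        new_assignment = [
            max(range(k), key=lambda i: cosine(v, means[i])) for v in vectors
        ]
        if new_assignment == assignment:  # clusters unchanged: converged
            break
        assignment = new_assignment
        for i in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == i]
            if members:  # keep the old mean if a cluster became empty
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[] for _ in range(k)]
    for index, label in enumerate(assignment):
        clusters[label].append(index)
    return clusters


# Toy 2-dimensional "interface word vectors" forming two obvious groups.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = kmeans_cosine(vectors, k=2)
```

Equivalently, one could minimize 1 - cos as a distance; the assignment step would pick the same cluster.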

Further, in step S32, the formula of the Purity algorithm is:

$$\mathrm{purity}(\Omega, C)=\frac{1}{N}\sum_{i=1}^{k}\max_{1\le j\le k}\lvert C_i\cap\omega_j\rvert$$

in the formula, N represents the total number of word vectors, $\Omega=\{\omega_1, \omega_2, \dots, \omega_k\}$ represents the manual micro-service interface partition set, and $C=\{C_1, C_2, \dots, C_k\}$ represents the cluster partition set of the K-means algorithm; the closer $\mathrm{purity}(\Omega, C)$ is to 1, the more reasonable the micro-service interface partition.

Each cluster $C_i$ is assigned a category $\omega_j$, the assignment principle being that the interface word vectors of category $\omega_j$ occur most often in cluster $C_i$, where $1\le i\le k$ and $1\le j\le k$; for each cluster $C_i$, the number of occurrences of the word vectors belonging to its assigned category $\omega_j$ is counted, and the counts are summed and normalized to obtain the final score $\mathrm{purity}(\Omega, C)$.
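The Purity score described above can be computed in a few lines. The following is a minimal sketch in which each partition is a list of sets of interface names; the function name `purity` and the sample partitions are hypothetical.

```python
def purity(manual_partition, kmeans_partition):
    """Purity of the manual partition measured against the K-means clusters.

    For every K-means cluster, count the members of the manual category that
    overlaps it most, sum these counts and divide by the number of items.
    """
    total = sum(len(cluster) for cluster in kmeans_partition)
    matched = sum(
        max(len(cluster & category) for category in manual_partition)
        for cluster in kmeans_partition
    )
    return matched / total


# Hypothetical partitions of seven interface names into k = 3 groups.
omega = [{"sa", "sb", "sc"}, {"sd", "se"}, {"sf", "sg"}]      # manual
clusters = [{"sa", "sb"}, {"sc", "sd", "se"}, {"sf", "sg"}]   # K-means
score = purity(omega, clusters)  # (2 + 2 + 2) / 7
```

In this example only the interface `sc` is grouped differently by the two partitions, so six of the seven interfaces are matched.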

Compared with the prior art, the word vector model-based micro-service interface division evaluation method has the following beneficial effects:

the method is based on the calling relations observed when the micro-service interfaces actually run; it re-partitions the interface set using mathematical methods such as the word vector model, K-means clustering and the Purity algorithm, compares the result with the manually partitioned micro-service interfaces, calculates an evaluation score for the manual interface partition, and guides further optimization and adjustment of the existing micro-service architecture, so that the architecture better conforms to the micro-service principle of high cohesion and low coupling.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart of a micro-service interface partitioning evaluation method based on a word vector model according to the present invention;

FIG. 2 is a process diagram of a restore request call chain according to the present invention;

FIG. 3 is a diagram illustrating a word vector model according to the present invention;

FIG. 4 is a schematic diagram of the K-means clustering algorithm and the Purity algorithm according to the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.

In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.

The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.

As shown in fig. 1, the word vector model-based micro-service interface division evaluation method mainly comprises a data collection phase S1, a model training phase S2 and an interface evaluation phase S3.

S1, a data collection stage, which comprises the following steps:

s11, the server side constructs a micro service cluster, and each micro service application independently collects the embedded point logs.

s2, in the model training phase, setting a word vector model, inputting the preprocessed interface character string array, and obtaining the word vector representation of the interface name, wherein the method specifically comprises the following steps:

s21, dividing the graph-shaped calling chain into m linear calling subchains by a depth-first search method DFS, extracting interface names according to a calling sequence, forming an interface character string array, generating training data of a word vector model, and obtaining a micro-service interface set omega divided artificially;

s3, interface evaluation stage, which comprises the following steps:

s31, taking the number K of micro-service applications as the number of clusters, and clustering the interface name word vectors with the K-means algorithm to obtain the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm;

s32, taking the cluster partition set $C$ of the K-means algorithm as the reference, evaluating the rationality of the manual micro-service interface partition set Ω using the Purity algorithm.

In step S11, the method for the server to construct the micro service cluster includes:

The server side constructs the micro-service cluster on the basis of Spring Cloud. The SOFATracer dependency, the Spring Cloud OpenFeign dependency and the Zipkin dependency are added to the pom file of each project module, and the parameters required by the link tracking tool SOFATracer and the data collection tool Zipkin are added to the configuration file of each micro-service application, the parameters including:

logging.path, which designates the log file output directory;

com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data to Zipkin remotely;

com.alipay.sofa.tracer.zipkin.baseUrl, the address of the Zipkin server to which data are reported.

After the dependencies and parameters of each micro-service project have been configured, the service discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients are enabled on the startup class of each micro-service application, and the micro-service applications call one another through the Feign client.

The Spring Cloud OpenFeign summary log output by SOFATracer can be found in the log directory of the project, and each record in the log contains the following parameters:

app, the name of the current micro-service application;

url, the request interface address;

traceId, the ID that identifies a unique request in SOFATracer;

spanId, the level of the request within the whole call link.

In step S12, the method for collecting and restoring the distributed link calling process among the micro-service applications and forming a graph-shaped call chain is as follows:

A Zipkin server is started, and the SOFATracer component integrated into each micro-service application reports the Spring Cloud OpenFeign summary log to the Zipkin server. Optionally, depending on the data volume, the Zipkin server is configured so that the log data are persisted to a database such as MySQL or Elasticsearch.

As shown in fig. 2, the reported link log data are first extracted from the database. Records with the same traceId come from the same request. The naming rule of the spanId parameter in each record is the parent spanId followed by the number of the child span, which carries the context of the call chain, so the position of each record in the call chain of its request is restored from the spanId. The url parameter of each record has the format "micro-service application address/micro-service resource class name/method name in the class",

such as: "http://122.224.64.250:8083/device/getInfo";

the resource class name in the url parameter, such as device, is extracted as the interface api of the record, and finally each request is restored to a graph-shaped call chain. As shown in the first dotted box of fig. 2, A, B, …, G denote records with the same traceId in the database; traceId and spanId are parameters carried by the records, and api is a parameter generated by manual extraction.
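Extracting the api parameter from a url can be sketched with the Python standard library. The helper name `extract_api` is hypothetical, and the sketch assumes every url follows the "address/resource class name/method name" format given above, so that the first path segment is the resource class name.

```python
from urllib.parse import urlparse


def extract_api(url):
    """Return the first path segment of url (the resource class name)."""
    path = urlparse(url).path  # e.g. "/device/getInfo"
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else ""


api = extract_api("http://122.224.64.250:8083/device/getInfo")
```

A url without any path yields an empty string, which a real pipeline would probably filter out before building the call chain.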

In step S21, the method for generating word vector model training data includes:

The link data of each request are traversed by the depth-first search method DFS, and all graph-shaped call chains are split into m linear call subchains, as shown in the second dotted box of FIG. 2. Each subchain is then traversed, the api parameter of each record is extracted in calling order, and each call subchain is converted into an interface string separated by spaces, such as 'sa sd sc sg'. Each interface string represents the interface calling process of one sub-request, and the m linear call subchains form an interface string array of length m.

All extracted interface names sa, sb, sc and so on are de-duplicated and divided into k clusters $\Omega=\{\omega_1, \omega_2, \dots, \omega_k\}$ according to the category of the micro-service application to which they belong, where $\Omega$ is the manual micro-service interface partition set.

The interface string array serves as the training corpus of the word vector model in step S22.
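The splitting of a call chain into linear subchains and the construction of the corpus and of Ω can be sketched as follows. The sketch assumes that the call chain of one request is a tree (each span has one parent); the function name `split_into_subchains`, the span labels and the mapping of interfaces to applications are hypothetical.

```python
from collections import defaultdict


def split_into_subchains(children, root):
    """Return every root-to-leaf path of a call tree, found by iterative DFS."""
    chains = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            chains.append(path)  # a leaf closes one linear subchain
        for kid in reversed(kids):  # reversed so the first child is popped first
            stack.append((kid, path + [kid]))
    return chains


# Hypothetical call tree of one request: span -> child spans in call order.
children = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"]}
# Hypothetical api name of each span and owning application of each api.
api = {"A": "sa", "B": "sb", "C": "sc", "D": "sd", "E": "se", "F": "sf"}
app = {"sa": "gateway", "sb": "order", "sc": "device",
       "sd": "order", "se": "device", "sf": "device"}

# Interface string array: one space-separated string per linear subchain.
corpus = [" ".join(api[s] for s in chain)
          for chain in split_into_subchains(children, "A")]

# Manual partition: de-duplicated interface names grouped by application.
omega = defaultdict(set)
for name in {n for line in corpus for n in line.split()}:
    omega[app[name]].add(name)
```

Here the tree with three leaves produces m = 3 interface strings, and the three applications give k = 3 categories.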

As shown in fig. 3, in step S22, the word vector model is the CBOW model among the word vector models provided by the Python gensim library, where the CBOW model is a three-layer neural network comprising an input layer (Input layer), a hidden layer (Hidden layer) and an output layer (Output layer);

the specific steps of training the word vector model are as follows:

setting the training parameters of the word vector model: generated word vector dimension S = 100, window size C = 5, and minimum word frequency min_count = 1 (no interface appearing on a request link should be ignored);

an interface string array is input, and a sliding window of size C is established on each interface string; a1, a2, …, a6 in fig. 3 represent the interface names contained in one interface string. The center word a3 of the window is the target of the training, and the remaining words a1, a2, a4 and a5 in the window are the input nodes of the neural network. Each interface name is converted into an N-dimensional One-Hot code, where N is the number of extracted, de-duplicated interface names. The One-Hot codes of the 4 input nodes are each multiplied by the shared input weight matrix $W_{N\times S}$ to obtain 4 vectors, and an S-dimensional hidden layer vector is generated by weighted averaging. The hidden layer vector is multiplied by the output weight matrix $W'_{S\times N}$ to obtain an output vector, the output vector is compared with the One-Hot code of the center word a3, and the weight matrices $W$ and $W'$ are updated. One piece of training data is generated each time the window slides. After repeated iterative training, the resulting weight matrix $W$ is the interface word vector matrix, each row of which corresponds to an S-dimensional interface word vector, and finally the word vector representation set $V=\{v_1, v_2, \dots, v_N\}$ of the interface names is obtained. The distribution of the set $V$ in space is shown in the first dotted box of fig. 4.

The interface word vectors with similar contexts in the call chain are close to each other in position in the space coordinate, and the interface word vectors with larger context difference are far away from each other.
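One training step of the CBOW network can be written compactly. The following is a sketch under assumptions the text does not state: the context vectors are averaged with equal weights, and the output vector is compared with the target through a softmax and a cross-entropy loss. The symbols are chosen for illustration: $x_c$ is the One-Hot code of a context word, $W$ the $N\times S$ input weight matrix, $W'$ the $S\times N$ output weight matrix, $y$ the One-Hot code of the center word, and $C$ the window size.

```latex
h = \frac{1}{C-1}\sum_{c=1}^{C-1} W^{\top} x_c, \qquad
u = W'^{\top} h, \qquad
\hat{y}_i = \frac{\exp(u_i)}{\sum_{n=1}^{N}\exp(u_n)}, \qquad
L = -\sum_{i=1}^{N} y_i \log \hat{y}_i
```

Since each $x_c$ is a One-Hot code, $W^{\top}x_c$ simply selects one row of $W$, which is why the rows of $W$ can be read off as the S-dimensional interface word vectors after training. With C = 5 the average runs over the 4 context words a1, a2, a4 and a5.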

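In step S31, the word vectors of the interface names are clustered using the K-means algorithm to obtain the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm; the specific steps are as follows:

taking the number k of categories of micro-service applications in the current cluster in step S21 as the number of clusters of the K-means algorithm, first randomly selecting k vectors $\{\mu_1, \mu_2, \dots, \mu_k\}$ from the interface word vector set $V=\{v_1, v_2, \dots, v_N\}$ as the initial mean vectors of the clusters $C_i$ in the set $C$, and initializing the clusters $C_i=\varnothing$, $1 \le i \le k$;

computing the distance $d_{ji}$ between each interface word vector $v_j$ and each mean vector $\mu_i$, where $1 \le j \le N$ and $1 \le i \le k$; determining the cluster label $\lambda_j$ of $v_j$ from the nearest mean vector, $\lambda_j$ being the value of the variable $i$ at which the distance $d_{ji}$ is smallest, i.e. $\lambda_j=\arg\min_{i\in\{1,\dots,k\}} d_{ji}$; and placing the interface word vector $v_j$ into the corresponding cluster $C_t$, $t=\lambda_j$, with $C_t=\varnothing$ at the beginning;

after one iteration is finished, for each cluster $C_i$, $1 \le i \le k$, recalculating the center point $\mu_i'=\frac{1}{|C_i|}\sum_{v\in C_i} v$, updating the current mean vector $\mu_i$ of the cluster to $\mu_i'$, and then searching again for the center point closest to each interface word vector $v_j$;

repeating the loop until the set $C$ no longer changes between two iterations, finally obtaining the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm.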

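The specific method for computing the distance $d_{ji}$ between the interface word vector $v_j$ and each mean vector $\mu_i$ is as follows:

the interface word vector $v_j$ and each mean vector $\mu_i$ are normalized and converted into unit vectors;

a vector dot product is performed on the normalized unit vectors of the interface word vector $v_j$ and each mean vector $\mu_i$ to obtain the vector inner product, i.e. the cosine of the angle between the vectors in the vector space, and the value of the cosine is taken as the distance between the two vectors, $d_{ji}=\cos(v_j,\mu_i)=\frac{v_j\cdot\mu_i}{\lVert v_j\rVert\,\lVert\mu_i\rVert}$.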

In step S32, the Purity algorithm formula is:

$$\mathrm{purity}(\Omega, C)=\frac{1}{N}\sum_{i=1}^{k}\max_{1\le j\le k}\lvert C_i\cap\omega_j\rvert$$

in the formula, N represents the total number of word vectors, $\Omega=\{\omega_1, \omega_2, \dots, \omega_k\}$ represents the manual micro-service interface partition set, and $C=\{C_1, C_2, \dots, C_k\}$ represents the cluster partition set of the K-means algorithm;

The Purity algorithm flow is shown in FIG. 4. The filled circles represent interface word vectors that have not yet been classified by the K-means algorithm; the open circles, open triangles and open squares represent interface word vectors that the K-means algorithm has classified into different clusters. The second dashed box in FIG. 4 represents the interface word vectors in the set $C$, and the third dashed box represents the interface word vectors in the set $\Omega$.

The Purity formula assigns each cluster $C_i$ a category $\omega_j$, the assignment principle being that the interface word vectors of category $\omega_j$ occur most often in cluster $C_i$, where $1\le i\le k$ and $1\le j\le k$; for each cluster $C_i$, the number of occurrences of the interface word vectors belonging to its assigned category $\omega_j$ is counted, and the counts are summed and normalized to obtain the final score $\mathrm{purity}(\Omega, C)$.

Based on the calling relation of the actual operation of the micro-service interface, the invention uses mathematical methods such as a word vector model, K-means clustering and a Purity algorithm to re-divide the interface set, compares the interface set with the micro-service interface divided manually, calculates to obtain the evaluation score of the division of the manual interface, and guides the existing micro-service architecture to carry out further optimization and adjustment so as to ensure that the micro-service architecture better conforms to the principle of the micro-service architecture with high cohesion and low coupling.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A micro service interface division evaluation method based on a word vector model is characterized by comprising the following steps:

s1, collecting data, specifically comprising the following steps:

s11, the server side constructs a micro service cluster;

s22, inputting the interface character string array based on the step S21 into a set word vector model to obtain a word vector of the interface name;

s3, interface division evaluation, which comprises the following steps:

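s31, taking the number K of categories of micro-service applications in the current cluster as the number of clusters, and clustering the word vectors of the interface names with the K-means algorithm to obtain the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm;

s32, taking the cluster partition set $C$ of the K-means algorithm as the reference, evaluating the rationality of the manual micro-service interface partition set Ω using the Purity algorithm.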

2. The method for dividing and evaluating the micro-service interface based on the word vector model according to claim 1, wherein in step S11, the method for the server to construct the micro-service cluster comprises:

3. The method for evaluating division of micro-service interfaces based on word vector model according to claim 1, wherein in step S12, the method for restoring the distributed link calling process between micro-service applications and forming a graph-like calling chain is collected:

introducing the link collection and display tool Zipkin into each project, starting a Zipkin server, receiving the link log data reported by SOFATracer, cleaning the link log data to form a graph-shaped call chain, and restoring the distributed link calling process.

4. The micro service interface partition evaluation method based on the word vector model according to claim 3, wherein the parameters of the SOFATracer configuration include:

logging.path, which designates the log file output directory;

app, representing the current microservice application name;

url, which represents the request interface address;

traceId, which represents the ID that identifies a unique request in SOFATracer;

the spanId represents the level of the request in the whole call link;

5. The method for evaluating division of micro-service interfaces based on word vector models according to claim 1, wherein in step S21, the method for extracting the interface names according to the calling order and forming the interface character string array comprises:

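converting each call subchain into an interface string separated by spaces, the m linear call subchains forming an interface string array of length m, wherein each interface string represents the interface calling process of one sub-request, and the extracted interface granularity is the parent path in the interface address, which represents a resource class name in the micro-service application;

de-duplicating all extracted interface names and dividing them into k clusters $\Omega=\{\omega_1, \omega_2, \dots, \omega_k\}$ according to the category of the micro-service application to which each interface name belongs, where $\Omega$ is the manual micro-service interface partition set and k is the number of categories of micro-service applications in the current cluster.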

6. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S22, the word vector model is the CBOW model among the word vector models provided by the Python gensim library;

the specific steps of training the word vector model are as follows:

the center word of the window is taken as the target of the training, the remaining words in the window are taken as the input nodes of the neural network, one piece of training data is generated each time the window slides, and the word vector representation set $V=\{v_1, v_2, \dots, v_N\}$ of the interface names is obtained through repeated iterative training.

7. The method for evaluating micro-service interface partition based on word vector model of claim 5, wherein in step S31, clustering the word vectors of the interface names using the K-means algorithm to obtain the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm comprises the following specific steps:

taking the number k of categories of micro-service applications in the current cluster as the number of clusters of the K-means algorithm, first randomly selecting k vectors $\{\mu_1, \mu_2, \dots, \mu_k\}$ from the interface word vector set $V=\{v_1, v_2, \dots, v_N\}$ as the initial mean vectors of the clusters $C_i$ in the set $C$, and initializing the clusters $C_i=\varnothing$, $1 \le i \le k$;

computing the distance $d_{ji}$ between each interface word vector $v_j$ and each mean vector $\mu_i$, where $1 \le j \le N$ and $1 \le i \le k$; determining the cluster label $\lambda_j$ of $v_j$ from the nearest mean vector, $\lambda_j$ being the value of the variable $i$ at which the distance $d_{ji}$ is smallest, i.e. $\lambda_j=\arg\min_{i\in\{1,\dots,k\}} d_{ji}$; and placing the interface word vector $v_j$ into the corresponding cluster $C_t$, $t=\lambda_j$, with $C_t=\varnothing$ at the beginning;

after one iteration is finished, for each cluster $C_i$, $1 \le i \le k$, recalculating the center point $\mu_i'=\frac{1}{|C_i|}\sum_{v\in C_i} v$, updating the current mean vector $\mu_i$ of the cluster to $\mu_i'$, and then searching again for the center point closest to each interface word vector $v_j$;

repeating the loop until the set $C$ no longer changes between two iterations, finally obtaining the cluster partition set $C=\{C_1, C_2, \dots, C_k\}$ of the K-means algorithm.

8. The method for evaluating micro-service interface partition based on word vector model according to claim 7, wherein the specific method for computing the distance $d_{ji}$ between the interface word vector $v_j$ and each mean vector $\mu_i$ is as follows:

the interface word vector $v_j$ and each mean vector $\mu_i$ are normalized and converted into unit vectors;

a vector dot product is performed on the normalized unit vectors of the interface word vector $v_j$ and each mean vector $\mu_i$ to obtain the vector inner product, i.e. the cosine of the angle between the vectors in the vector space, and the value of the cosine is taken as the distance between the two vectors, $d_{ji}=\cos(v_j,\mu_i)=\frac{v_j\cdot\mu_i}{\lVert v_j\rVert\,\lVert\mu_i\rVert}$.

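9. The method for evaluating division of micro-service interfaces based on word vector models according to claim 5, wherein in step S32, the Purity algorithm formula is:

$$\mathrm{purity}(\Omega, C)=\frac{1}{N}\sum_{i=1}^{k}\max_{1\le j\le k}\lvert C_i\cap\omega_j\rvert$$

in the formula, N represents the total number of word vectors, $\Omega=\{\omega_1, \omega_2, \dots, \omega_k\}$ represents the manual micro-service interface partition set, and $C=\{C_1, C_2, \dots, C_k\}$ represents the cluster partition set of the K-means algorithm;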