CN113760778A - Word vector model-based micro-service interface division evaluation method - Google Patents

Publication number
CN113760778A
Authority
CN
China
Prior art keywords
interface
micro
service
word vector
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111316694.6A
Other languages
Chinese (zh)
Other versions
CN113760778B (en)
Inventor
李莹
夏轩轩
张凌飞
朱晓莉
方燕翎
毛义华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Zhongyi Science And Technology Co ltd
Binhai Industrial Technology Research Institute of Zhejiang University
Original Assignee
Tianjin Zhongyi Science And Technology Co ltd
Binhai Industrial Technology Research Institute of Zhejiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Zhongyi Science And Technology Co ltd and Binhai Industrial Technology Research Institute of Zhejiang University
Priority to CN202111316694.6A
Publication of CN113760778A
Application granted
Publication of CN113760778B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3608 Software analysis for verifying properties of programs using formal methods, e.g. model checking, abstract interpretation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G06F11/3612 Software analysis for verifying properties of programs by runtime analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a micro-service interface partition evaluation method based on a word vector model, comprising the following steps. Data collection: the server side builds a micro-service cluster and collects log data to restore the distributed link call process among all micro-service applications. Model training: each graph-shaped call chain is split into linear call sub-chains, interface names are extracted in call order to form an interface string array, and the manually defined micro-service interface partition set Ω is obtained; a word vector model is then trained on the interface string array to obtain a word vector for each interface name. Interface partition evaluation: the number K of micro-service application categories in the current cluster is used as the number of clusters to obtain the cluster partition set of the K-means algorithm; with that cluster partition set as the reference, the rationality of the partition set Ω is evaluated using the Purity algorithm. The method starts from the call relationships observed while the micro-service interfaces actually run, re-partitions the interface set mathematically, compares the result with the manually partitioned micro-service interfaces, and thereby guides optimization of the existing micro-service architecture.

Description

Word vector model-based micro-service interface division evaluation method
Technical Field
The invention belongs to the field of micro service interfaces, and particularly relates to a micro service interface division evaluation method based on a word vector model.
Background
The traditional monolithic application architecture is generally based on Tomcat middleware. As a system grows, this architecture increases its complexity, makes cooperation among developers difficult, and hinders smooth continuous integration and continuous delivery. In actual operation, faults easily trigger chain reactions, and the architecture cannot keep up with the rapidly growing business scale of internet companies.
Compared with the traditional monolithic architecture, the micro-service architecture decomposes functions into discrete services, each of which is sufficiently cohesive. This reduces the coupling of the system; services can be scaled horizontally and vertically and deployed independently; a problem in one service does not paralyze the whole system; and the system is not locked into one technology stack for the long term. Projects that adopt a micro-service architecture can achieve rapid iteration, frequent releases, and integrated development and operations.
Because of these advantages, more and more companies split monolithic applications into micro-service architectures. For example, the patent document with publication number CN112988122A discloses a monolithic application splitting tool and method based on the correlation between functional features and micro-services, and the patent document with publication number CN111026468A discloses a micro-service-based back-end splitting strategy.
However, when a monolithic application system has complex business logic, a huge code base, and numerous modules coupled together, it is challenging to work out an ideal micro-service structure by manual decomposition. Unreasonable service interface partitioning leads to more complex service dependencies and recursively increases the call latency between services, and sometimes even simple functions become difficult to build. As a result, development slows down and migration becomes more difficult.
To build a better micro-service architecture and reduce the call latency between services, the rationality of the micro-service interface partition needs to be measured and objectively evaluated.
Disclosure of Invention
In view of the above, the present invention aims to provide a method for evaluating micro-service interface division based on a word vector model, so as to solve the problem of low efficiency caused by unreasonable interface division and complex inter-service dependency relationship.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
A micro-service interface partition evaluation method based on a word vector model comprises the following steps:
S1, collecting data, specifically comprising the following steps:
S11, the server side constructs a micro-service cluster;
S12, collecting and restoring the distributed link call process among the micro-service applications to form a graph-shaped call chain;
S2, setting up a word vector model, inputting an interface string array, and obtaining a word vector for each interface name, specifically comprising the following steps:
S21, splitting the graph-shaped call chain into m linear call sub-chains by depth-first search (DFS), extracting interface names in call order to form an interface string array, and obtaining the manually defined micro-service interface partition set Ω;
S22, training the word vector model on the interface string array of step S21 to obtain the word vector of each interface name;
S3, interface partition evaluation, comprising the following steps:
S31, taking the number K of micro-service applications as the number of clusters, clustering the word vectors of the interface names with the K-means algorithm to obtain the cluster partition set $\mathcal{C}$ of the K-means algorithm;
S32, taking the cluster partition set $\mathcal{C}$ of the K-means algorithm as the reference, evaluating the rationality of the manually defined micro-service interface partition set Ω using the Purity algorithm.
Further, in step S11, the method for the server side to construct the micro-service cluster is as follows:
the server side constructs the micro-service cluster on the basis of Spring Cloud; the service discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients are enabled on the startup class of each micro-service application, and the micro-service applications call one another through the Feign client.
Further, in step S12, the method for collecting and restoring the distributed link call process among the micro-service applications and forming a graph-shaped call chain is as follows:
adding the dependency of the link tracing tool SOFATracer, the Spring Cloud OpenFeign dependency, and the dependency of the data collection tool Zipkin to the configuration file of each micro-service application, and using SOFATracer to instrument the Spring Cloud OpenFeign component so as to obtain the link call process of each micro-service application;
introducing the link collection and display tool Zipkin into each project, starting a Zipkin server to receive the link log data reported by SOFATracer, and having Zipkin clean the link log data to form a graph-shaped call chain, thereby restoring the distributed link call process.
Further, the parameters of the SOFATracer configuration include:
logging.path, which specifies the log file output directory;
com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data to Zipkin remotely;
com.alipay.sofa.tracer.zipkin.baseUrl, which is the address of the Zipkin server to which data is reported.
The Spring Cloud OpenFeign summary log output by SOFATracer can be found in the log directory of the project, and one record in the log contains the following parameters:
app, the name of the current micro-service application;
url, the address of the requested interface;
traceId, the ID that identifies a unique request in SOFATracer;
spanId, the level of the request within the whole call link;
the naming rule of a spanId is the parent spanId followed by the number of the child span, so it carries the context of the call chain, and the spans with the same traceId are gathered to form a complete link tree.
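The link-tree reconstruction described above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes that spanIds are dot-separated (for example "0", "0.1", "0.1.1"), so that the parent spanId is the child spanId without its last segment. The function name `build_link_trees`, the sample records, and the host names in the URLs are hypothetical.

```python
from collections import defaultdict

def build_link_trees(records):
    """Group span records by traceId and link every span to its parent.

    Assumption: spanIds are dot-separated ("0", "0.1", "0.1.2"), so the
    parent spanId is the child spanId without its last segment.
    """
    spans_by_trace = defaultdict(dict)
    for record in records:
        spans_by_trace[record["traceId"]][record["spanId"]] = record

    trees = {}
    for trace_id, spans in spans_by_trace.items():
        children = defaultdict(list)
        roots = []
        # Sort numerically by segment so that siblings keep their call order.
        ordered = sorted(spans, key=lambda s: [int(part) for part in s.split(".")])
        for span_id in ordered:
            parent_id = span_id.rpartition(".")[0]
            if parent_id in spans:
                children[parent_id].append(span_id)
            else:
                roots.append(span_id)
        trees[trace_id] = {"roots": roots, "children": dict(children)}
    return trees

# Hypothetical log records; only the four fields named in the text are used.
records = [
    {"app": "gateway", "url": "http://host-a/order/create", "traceId": "t1", "spanId": "0"},
    {"app": "order", "url": "http://host-b/stock/check", "traceId": "t1", "spanId": "0.2"},
    {"app": "order", "url": "http://host-b/user/get", "traceId": "t1", "spanId": "0.1"},
    {"app": "user", "url": "http://host-c/device/getInfo", "traceId": "t1", "spanId": "0.1.1"},
    {"app": "gateway", "url": "http://host-a/user/get", "traceId": "t2", "spanId": "0"},
]
trees = build_link_trees(records)
```

Each entry of `trees` holds the root spans and a parent-to-children map for one request, which is the tree that later steps traverse.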
Further, in step S21, the method for extracting the interface names in call order and forming the interface string array is as follows:
converting each call sub-chain into a space-separated interface string, so that the m linear call sub-chains form an interface string array of length m, in which each interface string represents the interface call process of one sub-request; the granularity of the extracted interface is the parent path in the interface address, which represents a resource type in a micro-service application;
de-duplicating all extracted interface names and dividing them into k clusters $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$ according to the micro-service application to which each interface name belongs, where $\Omega$ is the manually defined micro-service interface partition set and k is the number of categories of micro-service applications in the current cluster.
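As a rough sketch of how the manual partition set Ω could be assembled, the snippet below de-duplicates interface names and groups them by the micro-service application they belong to. The record layout (`app`, `api`), the function name `build_manual_partition`, and the sample application names are assumptions made for illustration only.

```python
from collections import defaultdict

def build_manual_partition(records):
    """Return {app name: set of interface names}; sets de-duplicate the names."""
    omega = defaultdict(set)
    for record in records:
        omega[record["app"]].add(record["api"])
    return dict(omega)

# Hypothetical extracted records: one (app, api) pair per logged call.
records = [
    {"app": "device-service", "api": "device"},
    {"app": "device-service", "api": "sensor"},
    {"app": "user-service", "api": "user"},
    {"app": "device-service", "api": "device"},  # duplicate, kept only once
]
omega = build_manual_partition(records)
k = len(omega)  # number of micro-service application categories
```

Here `k` is simply the number of distinct applications, which is the value later reused as the number of K-means clusters.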
Further, in step S22, the word vector model is the CBOW model among the word vector models provided by the Python gensim library;
the specific steps of training the word vector model are as follows:
setting the dimension S of the generated word vectors, the window size C, and the minimum word frequency min_count = 1;
inputting the interface string array, and establishing a sliding window of size C on each interface string;
the center word of the window is used as the target of the training, and the remaining words in the window are used as the input nodes of the neural network; each slide of the window generates one piece of training data, and repeated iterative training yields the word vector representation set $V = \{v_1, v_2, \ldots, v_N\}$ of the interface names.
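To make the sliding-window step concrete, the following sketch generates (context, target) pairs from an interface string array without any external library. It assumes that a window of size C holds the center word plus C // 2 words on each side and that windows are truncated at the ends of a string; the text does not specify edge handling, and the function name `cbow_training_pairs` is hypothetical.

```python
def cbow_training_pairs(interface_strings, window_size=5):
    """Yield one (context words, center word) pair per window position."""
    half = window_size // 2
    pairs = []
    for line in interface_strings:
        words = line.split()
        for i, target in enumerate(words):
            # Words to the left and right of the center word, truncated at the edges.
            context = words[max(0, i - half):i] + words[i + 1:i + 1 + half]
            if context:
                pairs.append((context, target))
    return pairs

# One interface string with six interface names.
pairs = cbow_training_pairs(["a1 a2 a3 a4 a5 a6"], window_size=5)
```

In this example the pair for center word `a3` has the context `a1, a2, a4, a5`. A library such as gensim builds these pairs internally, so this function only illustrates what one piece of training data looks like.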
Further, in step S31, clustering the word vectors of the interface names with the K-means algorithm to obtain the cluster partition set $\mathcal{C}$ of the K-means algorithm comprises the following specific steps:
taking the number k of categories of micro-service applications in the current cluster from step S21 as the number of clusters of the K-means algorithm; first, randomly selecting k vectors $\{\mu_1, \mu_2, \ldots, \mu_k\}$ from the interface word vector set $V = \{v_1, v_2, \ldots, v_N\}$ as the initial mean vectors of the clusters $c_i$ in the set $\mathcal{C} = \{c_1, c_2, \ldots, c_k\}$, and initializing each cluster as $c_i = \varnothing$, $1 \le i \le k$;
computing the distance $d_{ji}$ between each interface word vector $v_j$ and each mean vector $\mu_i$, where $1 \le j \le N$ and $1 \le i \le k$;
determining the cluster label $\lambda_j$ of $v_j$ from the nearest mean vector: $\lambda_j = \arg\min_{i \in \{1, \ldots, k\}} d_{ji}$, where $\arg\min$ denotes the value of the variable $i$ at which the distance $d_{ji}$ is smallest;
placing the interface word vector $v_j$ into the corresponding cluster: $c_t = c_t \cup \{v_j\}$, $t = \lambda_j$, where initially $c_t = \varnothing$;
after one iteration is finished, recalculating the center point of each cluster $c_i$, $1 \le i \le k$: $\mu_i' = \frac{1}{|c_i|} \sum_{v \in c_i} v$, and updating the mean vector $\mu_i$ of the current cluster to $\mu_i'$; then searching again for the center point nearest to each interface word vector $v_j$;
repeating this loop until the set $\mathcal{C}$ no longer changes between two successive iterations, which finally gives the cluster partition set $\mathcal{C}$ of the K-means algorithm.
Further, the specific method for computing the distance $d_{ji}$ between the interface word vector $v_j$ and each mean vector $\mu_i$ is as follows:
the interface word vector $v_j$ and each mean vector $\mu_i$ are normalized and converted into unit vectors;
the dot product of the normalized unit vectors of $v_j$ and $\mu_i$ is computed to obtain the vector inner product, that is, the cosine of the angle between the two vectors in the vector space, and this cosine value is taken as the distance $d_{ji}$ between the two vectors: $d_{ji} = \cos\theta_{ji} = \frac{v_j \cdot \mu_i}{\lVert v_j \rVert \, \lVert \mu_i \rVert}$.
The cosine ranges over [-1, 1]: the closer the cosine between two vectors is to -1, the larger their semantic difference, and the closer it is to 1, the higher their semantic similarity.
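The normalize-then-dot-product computation above can be illustrated in a few lines of plain Python. This is a sketch under the assumption that neither vector is the zero vector; the function name `cosine` is chosen here for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between u and v via normalized dot product."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    # Convert both vectors to unit vectors (assumes non-zero norms).
    unit_u = [x / norm_u for x in u]
    unit_v = [x / norm_v for x in v]
    # The inner product of two unit vectors is the cosine of their angle.
    return sum(a * b for a, b in zip(unit_u, unit_v))

same_direction = cosine([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine([1.0, 0.0], [0.0, 3.0])
opposite = cosine([1.0, 0.0], [-5.0, 0.0])
```

The three sample calls cover the ends and the midpoint of the [-1, 1] range: parallel vectors, perpendicular vectors, and vectors pointing in opposite directions.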
Further, in step S32, the formula of the Purity algorithm is:
$$\mathrm{Purity}(\Omega, \mathcal{C}) = \frac{1}{N} \sum_{i=1}^{k} \max_{1 \le j \le k} \lvert c_i \cap \omega_j \rvert$$
in the formula, N represents the total number of word vectors, $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$ represents the manually defined micro-service interface partition set, and $\mathcal{C} = \{c_1, c_2, \ldots, c_k\}$ represents the cluster partition set of the K-means algorithm;
the closer $\mathrm{Purity}(\Omega, \mathcal{C})$ is to 1, the more reasonable the micro-service interface partition.
For each cluster $c_i$, a class $\omega_j$ is assigned; the assignment principle is that the interface word vectors of class $\omega_j$ occur in cluster $c_i$ more often than those of any other class, where $1 \le i, j \le k$;
for each cluster $c_i$, the number of word vectors that belong to its assigned class $\omega_j$ is counted, and these counts are summed and normalized to obtain the final score $\mathrm{Purity}(\Omega, \mathcal{C})$.
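The Purity score can be sketched directly from its description: for each K-means cluster, take the size of its largest overlap with any manual class, sum these, and divide by the total number of interfaces. The representation of clusters and classes as Python sets of interface names, the function name `purity`, and the sample names are assumptions for illustration.

```python
def purity(clusters, manual_classes):
    """Purity of the manual classes measured against reference clusters.

    Both arguments are lists of sets of interface names.
    """
    total = sum(len(cluster) for cluster in clusters)
    matched = 0
    for cluster in clusters:
        # Size of the overlap with the best-matching manual class.
        matched += max(len(cluster & cls) for cls in manual_classes)
    return matched / total

# Hypothetical example with five interfaces and k = 2.
kmeans_clusters = [{"sa", "sb"}, {"sc", "sd", "se"}]
omega = [{"sa", "sb", "sc"}, {"sd", "se"}]
score = purity(kmeans_clusters, omega)
perfect = purity(kmeans_clusters, kmeans_clusters)
```

In the example, the first cluster overlaps its best class in 2 interfaces and the second in 2, so the score is 4/5 = 0.8; identical partitions give 1.0.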
Compared with the prior art, the word vector model-based micro-service interface division evaluation method has the following beneficial effects:
The method is based on the call relationships observed while the micro-service interfaces actually run. It uses mathematical methods, namely the word vector model, K-means clustering, and the Purity algorithm, to re-partition the interface set, compares the result with the manually partitioned micro-service interfaces, and computes an evaluation score for the manual interface partition. The score guides further optimization and adjustment of the existing micro-service architecture so that it better conforms to the principle of high cohesion and low coupling.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a flow chart of a micro-service interface partitioning evaluation method based on a word vector model according to the present invention;
FIG. 2 is a process diagram of a restore request call chain according to the present invention;
FIG. 3 is a diagram illustrating a word vector model according to the present invention;
FIG. 4 is a schematic diagram of the K-means clustering algorithm and the Purity algorithm according to the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," etc. may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless otherwise specified.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art through specific situations.
The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in FIG. 1, the micro-service interface partition evaluation method based on the word vector model mainly includes a data collection stage S1, a model training stage S2, and an interface evaluation stage S3.
S1, the data collection stage, comprises the following steps:
S11, the server side constructs a micro-service cluster, and each micro-service application independently collects its instrumentation logs;
S12, collecting and restoring the distributed link call process among the micro-service applications to form a graph-shaped call chain.
S2, the model training stage: setting up a word vector model, inputting the preprocessed interface string array, and obtaining the word vector representation of each interface name, specifically comprising the following steps:
S21, splitting the graph-shaped call chain into m linear call sub-chains by depth-first search (DFS), extracting interface names in call order to form an interface string array as the training data of the word vector model, and obtaining the manually partitioned micro-service interface set Ω;
S22, training the word vector model on the interface string array of step S21 to obtain the word vector of each interface name.
S3, the interface evaluation stage, comprises the following steps:
S31, taking the number K of micro-service applications as the number of clusters, clustering the interface name word vectors with the K-means algorithm to obtain the cluster partition set $\mathcal{C}$ of the K-means algorithm;
S32, taking the cluster partition set $\mathcal{C}$ of the K-means algorithm as the benchmark, evaluating the rationality of the manually defined micro-service interface partition set Ω using the Purity algorithm.
In step S11, the method for the server side to construct the micro-service cluster is as follows:
the server side constructs the micro-service cluster on the basis of Spring Cloud; the SOFATracer dependency, the Spring Cloud OpenFeign dependency, and the Zipkin dependency are added to the pom file of the project module, and the parameters needed by the link tracing tool SOFATracer and the data collection tool Zipkin are added to the configuration file of each micro-service application, wherein the parameters include:
logging.path, which specifies the log file output directory;
com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data to Zipkin remotely;
com.alipay.sofa.tracer.zipkin.baseUrl, which is the address of the Zipkin server to which data is reported.
After the dependencies and parameters of each micro-service project are configured, the service discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients are enabled on the startup class of each micro-service application, and the micro-service applications call one another through Feign clients.
The Spring Cloud OpenFeign summary log output by SOFATracer can be found in the log directory of the project, and one record in the log contains the following parameters:
app, the name of the current micro-service application;
url, the address of the requested interface;
traceId, the ID that identifies a unique request in SOFATracer;
spanId, the level of the request within the whole call link.
In step S12, the method for collecting and restoring the distributed link call process among the micro-service applications and forming a graph-shaped call chain is as follows:
starting a Zipkin server, to which the SOFATracer component integrated in each micro-service application reports the Spring Cloud OpenFeign summary log; optionally, depending on the data volume, the Zipkin server is configured so that the log data is persisted to a database such as MySQL or Elasticsearch.
As shown in FIG. 2, the reported link log data is first extracted from the database. Records with the same traceId come from the same request. The naming rule of the spanId parameter in each record is the parent spanId followed by the number of the child span, so it carries the context of the call chain, and the position of each record in the call chain of its request is restored from the spanId. The format of the request url is "micro-service application address/micro-service resource class name/method in the class",
such as: "http://122.224.64.250:8083/device/getInfo";
the resource part of the url parameter, such as device, is extracted as the interface api of the requested data, and finally each request is restored to a graph-shaped call chain. As shown in the first dotted box of FIG. 2, A, B, …, G denote records with the same traceId in the database; traceId and spanId are parameters carried by the records, and api is a parameter produced by this extraction.
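Extracting the api value from a logged url might look like the sketch below, which uses the standard library URL parser. It assumes that the resource class name is always the first path segment, as in the example address above; the function name `extract_api` is hypothetical.

```python
from urllib.parse import urlparse

def extract_api(url):
    """Return the first path segment of a request url, or "" if there is none."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    return segments[0] if segments else ""

api = extract_api("http://122.224.64.250:8083/device/getInfo")
```

For the example address, the path is `/device/getInfo`, so the extracted api is `device`. URLs with a context prefix before the resource name would need a different rule.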
In step S21, the method for generating the training data of the word vector model is as follows:
the link data of each request is traversed by depth-first search (DFS), and all graph-shaped call chains are split into m linear call sub-chains, as shown in the second dotted box of FIG. 2. Each sub-chain is then traversed, the api parameter of each record is extracted in call order, and each call sub-chain is converted into a space-separated interface string such as "sa sd sc sg". Each interface string represents the interface call process of one sub-request, and the m linear call sub-chains form an interface string array of length m.
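The DFS split can be sketched as follows, under the assumption that each root-to-leaf path of the call tree is one linear call sub-chain; the text does not define the sub-chains more precisely. The function name `split_into_subchains`, the `children` and `apis` maps, and the sample tree are hypothetical.

```python
def split_into_subchains(children, apis, root):
    """Depth-first traversal that emits one interface string per root-to-leaf path.

    children: {span id: list of child span ids in call order}
    apis:     {span id: interface name extracted for that span}
    """
    chains = []
    stack = [(root, [apis[root]])]
    while stack:
        node, path = stack.pop()
        kids = children.get(node, [])
        if not kids:
            # A leaf closes one linear call sub-chain.
            chains.append(" ".join(path))
            continue
        # Push in reverse so the first child is visited first.
        for child in reversed(kids):
            stack.append((child, path + [apis[child]]))
    return chains

# Hypothetical call tree: sa calls sb and sc; sb calls sd.
children = {"0": ["0.1", "0.2"], "0.1": ["0.1.1"]}
apis = {"0": "sa", "0.1": "sb", "0.2": "sc", "0.1.1": "sd"}
chains = split_into_subchains(children, apis, "0")
```

The sample tree yields the two interface strings `sa sb sd` and `sa sc`, which would be two entries of the interface string array. An explicit stack is used instead of recursion so that very deep call chains do not hit the interpreter's recursion limit.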
All extracted interface names sa, sb, sc, and so on are de-duplicated and divided into k clusters $\Omega = \{\omega_1, \omega_2, \ldots, \omega_k\}$ according to the micro-service application to which each belongs; $\Omega$ is the manually defined micro-service interface partition set, and k is the number of categories of micro-service applications in the current cluster.
The interface string array serves as the training corpus of the word vector model in step S22.
As shown in FIG. 3, in step S22, the word vector model is the CBOW model among the word vector models provided by the Python gensim library, where the CBOW model is a three-layer neural network comprising an input layer, a hidden layer, and an output layer;
the specific steps of training the word vector model are as follows:
setting the training parameters of the word vector model: word vector dimension S = 100, window size C = 5, and minimum word frequency min_count = 1 (no interface that appears on a request link should be ignored);
inputting the interface string array and establishing a sliding window of size C on each interface string, where a1, a2, …, a6 in FIG. 3 represent the interface names contained in one interface string. The center word a3 of the window is used as the target of the training, and the remaining words a1, a2, a4, and a5 in the window are used as the input nodes of the neural network. Each interface name is converted into an N-dimensional one-hot code, where N is the number of interface names after extraction and de-duplication. The one-hot codes of the 4 input nodes are each multiplied by the shared input weight matrix $W$ to obtain 4 vectors, and an S-dimensional hidden layer vector is generated after weighted averaging. The hidden layer vector is multiplied by the output weight matrix $W'$ to obtain the output vector, which is compared with the one-hot code of the center word a3 to update the weight matrices $W$ and $W'$. Each slide of the window generates one piece of training data. After repeated iterative training, the output weight matrix $W'$ is taken as the interface word vector matrix, each row of which corresponds to an S-dimensional interface word vector, and the word vector representation set $V = \{v_1, v_2, \ldots, v_N\}$ of the interface names is finally obtained.
The distribution of the set $V$ in space is shown in the first dotted box of FIG. 4.
Interface word vectors whose contexts in the call chains are similar lie close to each other in the space, while interface word vectors whose contexts differ greatly lie far apart.
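For orientation, one CBOW training step for the window centered on a3 can be written out as below. This is a sketch under stated assumptions rather than the formulation in the original figures: the shapes $W, W' \in \mathbb{R}^{N \times S}$, the equal averaging weights, and the softmax with cross-entropy loss are all assumed here, and $x_c$ denotes the N-dimensional one-hot code of context word $c$.

```latex
h = \frac{1}{4} \sum_{c \in \{a_1, a_2, a_4, a_5\}} W^{\top} x_c, \qquad
u = W' h, \qquad
\hat{y} = \mathrm{softmax}(u), \qquad
L = -\log \hat{y}_{a_3}
```

Under these assumptions $h$ is the S-dimensional hidden layer vector, $u$ has one score per interface name, and minimizing $L$ by gradient descent updates both weight matrices. Libraries such as gensim typically replace the full softmax with a cheaper approximation, so the actual loss used in training may differ.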
In step S31, the word vectors of the interface names are clustered with the K-means algorithm to obtain the cluster partition set $C$ of the K-means algorithm. The specific steps are as follows:
The number $k$ of categories of micro-service applications in the current cluster in step S21 is taken as the cluster number of the K-means algorithm. First, $k$ vectors $\{\mu_1, \mu_2, \dots, \mu_k\}$ are randomly selected from the interface word vector set $D = \{x_1, x_2, \dots, x_N\}$ as the initial mean vectors of the clusters $C_i$ in the set $C = \{C_1, C_2, \dots, C_k\}$, and each cluster is initialized as $C_i = \varnothing$, $1 \le i \le k$.
The distance $d_{ji}$ between each interface word vector $x_j$ and each mean vector $\mu_i$ is computed, where $1 \le j \le N$ and $1 \le i \le k$. The cluster label $\lambda_j$ of $x_j$ is determined by the nearest mean vector: $\lambda_j$ is the value of the variable $i$ at which the distance $d_{ji}$ is minimal, i.e. $\lambda_j = \arg\min_{i \in \{1, \dots, k\}} d_{ji}$. The interface word vector $x_j$ is placed into the corresponding cluster $C_{\lambda_j}$ of the current iteration $t$.
After one iteration is finished, for each cluster $C_i$, $1 \le i \le k$, the center point is recalculated as $\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x$, the mean vector $\mu_i$ of the current cluster is updated to $\mu_i'$, and then, for each interface word vector $x_j$, the center point closest to it is searched again;
the loop is repeated until the set $C$ no longer changes between two successive iterations, which finally gives the cluster partition set $C$ of the K-means algorithm.
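As a non-limiting illustration, the iteration above can be sketched in a few lines. The function names, the `seed` and `max_iter` parameters, and the default squared Euclidean distance are assumptions made for this sketch; any other distance callable, such as one derived from the cosine measure the description specifies, could be passed instead.

```python
import random

def squared_euclidean(a, b):
    """Placeholder distance used only to keep the sketch self-contained."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, k, distance=squared_euclidean, seed=0, max_iter=100):
    """Return (clusters as lists of vector indices, final mean vectors)."""
    rng = random.Random(seed)
    # Randomly select k vectors as the initial mean vectors.
    means = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every vector to the cluster of its nearest mean vector.
        new_labels = [min(range(k), key=lambda i: distance(x, means[i]))
                      for x in vectors]
        if new_labels == labels:  # partition unchanged between two iterations
            break
        labels = new_labels
        # Recalculate the center point of every non-empty cluster.
        for i in range(k):
            members = [x for x, lab in zip(vectors, labels) if lab == i]
            if members:
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    clusters = [[j for j, lab in enumerate(labels) if lab == i] for i in range(k)]
    return clusters, means

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters, means = k_means(points, k=2, seed=3)
```

An empty cluster simply keeps its previous mean vector in this sketch; the description does not say how that case is handled.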
The specific method for computing the distance $d_{ji}$ between the interface word vector $x_j$ and each mean vector $\mu_i$ is as follows:
the interface word vector $x_j$ and each mean vector $\mu_i$ are normalized and converted into unit vectors;
the dot product of the normalized unit vectors of $x_j$ and $\mu_i$ is computed to obtain the vector inner product, i.e. the cosine of the angle between the two vectors in space, and this cosine value is taken as the distance $d_{ji}$ between the two vectors. The cosine ranges over $[-1, 1]$: the closer the cosine between two vectors is to $-1$, the larger their semantic difference, and the closer it is to 1, the higher their semantic similarity.
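As a non-limiting illustration, the normalize-then-dot-product computation can be sketched as below. The helper names `unit`, `cosine` and `cosine_distance` are hypothetical. Converting the cosine into a quantity to be minimized as `1 - cosine` is an assumption of this sketch, made because a larger cosine means greater similarity; the description itself takes the cosine value directly.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("a zero vector cannot be normalized")
    return [x / norm for x in v]

def cosine(a, b):
    """Dot product of the two unit vectors, a value in [-1, 1]."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))

def cosine_distance(a, b):
    """Assumed conversion: 0 for identical directions, 2 for opposite ones."""
    return 1.0 - cosine(a, b)
```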
In step S32, the Purity algorithm formula is:
$$\mathrm{Purity}(\Omega, C) = \frac{1}{N} \sum_{i=1}^{k} \max_{1 \le j \le k} \lvert \omega_i \cap C_j \rvert$$
in the formula, $N$ represents the total number of word vectors, $\Omega = \{\omega_1, \omega_2, \dots, \omega_k\}$ represents the manual micro-service interface partition set, and $C = \{C_1, C_2, \dots, C_k\}$ represents the cluster partition set of the K-means algorithm; the closer $\mathrm{Purity}(\Omega, C)$ is to 1, the more reasonable the partition of the micro-service interfaces.
The Purity algorithm flow is shown in FIG. 4. The filled circles represent interface word vectors that have not yet been classified by the K-means algorithm; the open circles, open triangles and open squares represent interface word vectors that the K-means algorithm has assigned to different classes; and the second dashed box in FIG. 4 represents the interface word vectors in the set
Figure 780634DEST_PATH_IMAGE037
The third dotted box represents the interface word vector in the set
Figure 100888DEST_PATH_IMAGE038
The Purity formula assigns a class $C_j$ to each cluster $\omega_i$. The assignment principle is that, among all classes, the interface word vectors of class $C_j$ occur most often in cluster $\omega_i$. For each cluster $\omega_i$, the number of interface word vectors belonging to its assigned class $C_j$ is counted, and these counts are summed and normalized by the total number $N$ of word vectors to obtain the final score $\mathrm{Purity}(\Omega, C)$.
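As a non-limiting illustration, the score can be computed directly from two partitions of the same interface names. The function name `purity`, its parameter names and the toy interface names are hypothetical. Treating the manual partition as the set being evaluated and the K-means partition as the reference is an assumption of this sketch, based on the statement that the K-means result serves as the reference.

```python
def purity(evaluated, reference):
    """Purity of `evaluated` against `reference`; both are lists of sets."""
    total = sum(len(group) for group in evaluated)
    if total == 0:
        raise ValueError("the evaluated partition contains no items")
    # For each evaluated cluster, count the members shared with the
    # reference cluster that overlaps it most, then normalize by N.
    matched = sum(max(len(group & ref) for ref in reference)
                  for group in evaluated)
    return matched / total

manual = [{"/order", "/cart", "/pay"}, {"/stock", "/goods"}]
kmeans = [{"/order", "/cart"}, {"/pay", "/stock", "/goods"}]
score = purity(manual, kmeans)  # (2 + 2) / 5 = 0.8
```

In this toy case one interface, `/pay`, sits in a manual group whose dominant K-means cluster does not contain it, which lowers the score from 1 to 0.8.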
Based on the calling relationships observed when the micro-service interfaces actually run, the invention re-partitions the interface set using a word vector model, K-means clustering and the Purity algorithm, compares the result with the manually partitioned micro-service interfaces, and calculates an evaluation score for the manual interface partition. This score guides further optimization and adjustment of the existing micro-service architecture so that it better conforms to the micro-service principle of high cohesion and low coupling.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. A micro-service interface division evaluation method based on a word vector model, characterized by comprising the following steps:
S1, collecting data, specifically comprising the following steps:
S11, the server side constructs a micro-service cluster;
S12, collecting and restoring the distributed link calling process among the micro-service applications and forming a graph-shaped calling chain;
S2, setting a word vector model, inputting an interface character string array, and obtaining the word vectors of the interface names, specifically comprising the following steps:
S21, dividing the graph-shaped calling chain into m linear calling subchains by depth-first search (DFS), extracting the interface names in calling order to form an interface character string array, and obtaining the manual micro-service interface partition set $\Omega$;
S22, inputting the interface character string array from step S21 into the set word vector model to obtain the word vectors of the interface names;
S3, interface division evaluation, comprising the following steps:
S31, taking the number k of categories of micro-service applications in the current cluster as the cluster number, and clustering the word vectors of the interface names with the K-means algorithm to obtain the cluster partition set $C$ of the K-means algorithm;
S32, taking the cluster partition set $C$ of the K-means algorithm as the reference, evaluating the rationality of the manual micro-service interface partition set $\Omega$ with the Purity algorithm.
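As a non-limiting illustration of step S21, a graph-shaped calling chain can be split into linear calling subchains by enumerating its root-to-leaf paths with an iterative depth-first search. Reading "linear calling subchain" as a root-to-leaf path, the adjacency-dictionary representation, the function name `linear_subchains` and the example interface names are all assumptions made for this sketch.

```python
def linear_subchains(graph, root):
    """Return every root-to-leaf path of a call graph, found by DFS."""
    chains = []
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        # Ignore callees already on the current path so cycles terminate.
        callees = [c for c in graph.get(node, []) if c not in path]
        if not callees:
            chains.append(path)
            continue
        # Push in reverse so the first callee is explored first.
        for callee in reversed(callees):
            stack.append((callee, path + [callee]))
    return chains

call_graph = {"/order": ["/stock", "/pay"], "/pay": ["/account"]}
chains = linear_subchains(call_graph, "/order")
# chains == [["/order", "/stock"], ["/order", "/pay", "/account"]]
```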
2. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S11, the method by which the server side constructs the micro-service cluster comprises:
the server side constructs the micro-service cluster based on Spring Cloud, the service discovery annotation @EnableDiscoveryClient and the Feign annotation @EnableFeignClients are enabled on the startup class of each micro-service application, and calls between micro-service applications are made through the Feign client.
3. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S12, the method for collecting and restoring the distributed link calling process among the micro-service applications and forming the graph-shaped calling chain comprises:
adding the dependency of the link tracing tool SOFATracer, the Spring Cloud OpenFeign dependency and the dependency of the data collection tool Zipkin to the configuration file of each micro-service application, and using SOFATracer to instrument the Spring Cloud OpenFeign component so as to obtain the link calling process of each micro-service application;
introducing the link collection and display tool Zipkin into each project, starting a Zipkin server, receiving the link log data reported by SOFATracer, and cleaning the link log data to form the graph-shaped calling chain and restore the distributed link calling process.
4. The micro-service interface division evaluation method based on the word vector model according to claim 3, wherein the parameters configured for SOFATracer include:
logging.path, which specifies the log file output directory;
com.alipay.sofa.tracer.zipkin.enabled, which enables SOFATracer to report data remotely to Zipkin;
com.alipay.sofa.tracer.zipkin.baseUrl, the address of the Zipkin server to which the data is reported;
the Spring Cloud OpenFeign summary log output by SOFATracer can be found in the log directory of the project, and one record in the log contains the following parameters:
app, which represents the name of the current micro-service application;
url, which represents the address of the requested interface;
traceId, which represents the ID that identifies a unique request in SOFATracer;
spanId, which represents the level of the request within the whole call link;
the naming rule of a spanId is the parent spanId plus the number of the child spanId, so that it carries the context of the calling chain, and the spans with the same traceId are collected to form a complete link tree.
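As a non-limiting illustration, spans that share a traceId can be assembled into a link tree by taking the prefix of each spanId as its parent. The dotted spanId form (for example `0`, `0.1`, `0.1.1`), the record layout as dictionaries and the function name `build_link_trees` are assumptions made for this sketch.

```python
from collections import defaultdict

def build_link_trees(records):
    """Group spans by traceId and link each span to its parent spanId."""
    trees = defaultdict(dict)
    for rec in records:
        trees[rec["traceId"]][rec["spanId"]] = {"url": rec["url"], "children": []}
    for spans in trees.values():
        for span_id in sorted(spans):
            # "0.1.2" -> parent "0.1"; a root such as "0" has no separator.
            parent_id, sep, _ = span_id.rpartition(".")
            if sep and parent_id in spans:
                spans[parent_id]["children"].append(span_id)
    return dict(trees)

records = [
    {"traceId": "t1", "spanId": "0", "url": "/order/create"},
    {"traceId": "t1", "spanId": "0.1", "url": "/pay/submit"},
    {"traceId": "t1", "spanId": "0.2", "url": "/stock/deduct"},
    {"traceId": "t1", "spanId": "0.1.1", "url": "/account/debit"},
]
trees = build_link_trees(records)
```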
5. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S21, the method for extracting the interface names in calling order and forming the interface character string array comprises:
converting each calling subchain into an interface character string separated by spaces, so that the m linear calling subchains form an interface character string array of length m, wherein each interface character string represents the interface calling process of one sub-request, and the granularity of the extracted interface is the parent path in the interface address, which represents a resource class name in the micro-service application;
removing duplicates from all extracted interface names and dividing them into k clusters according to the category of the micro-service application to which each interface name belongs, giving $\Omega = \{\omega_1, \omega_2, \dots, \omega_k\}$, wherein $\Omega$ is the manual micro-service interface partition set and k represents the number of categories of micro-service applications in the current cluster.
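As a non-limiting illustration, the extraction and grouping might look as follows. Interpreting "parent path" as the request path with its last segment removed, representing each span as an `(app, url)` pair, and the names `interface_name`, `chain_to_string` and `manual_partition` are assumptions made for this sketch.

```python
import posixpath
from urllib.parse import urlparse

def interface_name(url):
    """Parent path of the request address, e.g. /stock/deduct -> /stock."""
    path = urlparse(url).path.rstrip("/")
    parent = posixpath.dirname(path)
    # A single-segment path has no parent, so it is kept as the name.
    return parent if parent not in ("", "/") else path

def chain_to_string(chain):
    """Turn one linear calling subchain of (app, url) pairs into a string."""
    return " ".join(interface_name(url) for _, url in chain)

def manual_partition(spans):
    """De-duplicate interface names and group them by owning application."""
    groups = {}
    for app, url in spans:
        groups.setdefault(app, set()).add(interface_name(url))
    return groups

chain = [("order-app", "http://order-app/order/create"),
         ("stock-app", "http://stock-app/stock/deduct"),
         ("stock-app", "http://stock-app/stock/query")]
sentence = chain_to_string(chain)   # "/order /stock /stock"
omega = manual_partition(chain)     # {"order-app": {"/order"}, "stock-app": {"/stock"}}
```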
6. The micro-service interface division evaluation method based on the word vector model according to claim 1, wherein in step S22, the word vector model is the CBOW model among the word vector models provided by the Python gensim library;
the specific steps of training the word vector model are as follows:
setting the dimension S of the generated word vectors, the window size C and the lowest word frequency min_count = 1;
inputting the interface character string array, and establishing a sliding window of size C on each interface character string;
the central word of the window is used as the target of the training and the remaining words in the window are used as the input nodes of the neural network, one piece of training data is generated each time the window slides, and the word vector representation set $D = \{x_1, x_2, \dots, x_N\}$ of the interface names is obtained through repeated iterative training.
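As a non-limiting illustration, the generation of one training sample per window position can be sketched without any library. The function name `cbow_samples`, the choice to count the central word inside the window size C, and the restriction to windows that fit entirely inside a string are assumptions made for this sketch; they may differ from how a particular word vector library interprets its window parameter.

```python
def cbow_samples(interface_strings, window):
    """Yield (context words, central word) for every full window position."""
    samples = []
    for line in interface_strings:
        words = line.split()
        for start in range(len(words) - window + 1):
            span = words[start:start + window]
            middle = window // 2
            context = span[:middle] + span[middle + 1:]
            samples.append((context, span[middle]))
    return samples

samples = cbow_samples(["a1 a2 a3 a4 a5 a6"], window=5)
# samples == [(["a1", "a2", "a4", "a5"], "a3"),
#             (["a2", "a3", "a5", "a6"], "a4")]
```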
7. The micro-service interface division evaluation method based on the word vector model according to claim 5, wherein in step S31, the word vectors of the interface names are clustered with the K-means algorithm to obtain the cluster partition set $C$ of the K-means algorithm, the specific steps being as follows:
taking the number k of categories of micro-service applications in the current cluster as the cluster number of the K-means algorithm, first randomly selecting k vectors $\{\mu_1, \mu_2, \dots, \mu_k\}$ from the interface word vector set $D = \{x_1, x_2, \dots, x_N\}$ as the initial mean vectors of the clusters $C_i$ in the set $C = \{C_1, C_2, \dots, C_k\}$, and initializing each cluster as $C_i = \varnothing$, $1 \le i \le k$;
computing the distance $d_{ji}$ between each interface word vector $x_j$ and each mean vector $\mu_i$, wherein $1 \le j \le N$ and $1 \le i \le k$; determining the cluster label $\lambda_j$ of $x_j$ from the nearest mean vector, $\lambda_j$ being the value of the variable $i$ at which the distance $d_{ji}$ is minimal, i.e. $\lambda_j = \arg\min_{i \in \{1, \dots, k\}} d_{ji}$; and placing the interface word vector $x_j$ into the corresponding cluster $C_{\lambda_j}$ of the current iteration $t$;
after one iteration is finished, for each cluster $C_i$, $1 \le i \le k$, recalculating the center point $\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x$, updating the mean vector $\mu_i$ of the current cluster to $\mu_i'$, and then, for each interface word vector $x_j$, searching again for the center point closest to it;
repeating the loop until the set $C$ no longer changes between two successive iterations, finally obtaining the cluster partition set $C$ of the K-means algorithm.
8. The micro-service interface division evaluation method based on the word vector model according to claim 7, wherein the specific method for computing the distance $d_{ji}$ between the interface word vector $x_j$ and each mean vector $\mu_i$ comprises:
normalizing the interface word vector $x_j$ and each mean vector $\mu_i$ and converting them into unit vectors;
computing the dot product of the normalized unit vectors of the interface word vector $x_j$ and each mean vector $\mu_i$ to obtain the vector inner product, i.e. the cosine of the angle between the two vectors in space, and taking this cosine value as the distance $d_{ji}$ between the two vectors.
9. The micro-service interface division evaluation method based on the word vector model according to claim 5, wherein in step S32, the Purity algorithm formula is:
$$\mathrm{Purity}(\Omega, C) = \frac{1}{N} \sum_{i=1}^{k} \max_{1 \le j \le k} \lvert \omega_i \cap C_j \rvert$$
in the formula, $N$ represents the total number of word vectors, $\Omega = \{\omega_1, \omega_2, \dots, \omega_k\}$ represents the manual micro-service interface partition set, and $C = \{C_1, C_2, \dots, C_k\}$ represents the cluster partition set of the K-means algorithm; the closer $\mathrm{Purity}(\Omega, C)$ is to 1, the more reasonable the partition of the micro-service interfaces.
CN202111316694.6A 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method Active CN113760778B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111316694.6A CN113760778B (en) 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111316694.6A CN113760778B (en) 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method

Publications (2)

Publication Number Publication Date
CN113760778A true CN113760778A (en) 2021-12-07
CN113760778B CN113760778B (en) 2022-02-08

Family

ID=78784664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111316694.6A Active CN113760778B (en) 2021-11-09 2021-11-09 Word vector model-based micro-service interface division evaluation method

Country Status (1)

Country Link
CN (1) CN113760778B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6965886B2 (en) * 2001-11-01 2005-11-15 Actimize Ltd. System and method for analyzing and utilizing data, by executing complex analytical models in real time
CN107580018A (en) * 2017-07-28 2018-01-12 北京北信源软件股份有限公司 The tracking and device of a kind of distributed system
WO2019080615A1 (en) * 2017-10-23 2019-05-02 阿里巴巴集团控股有限公司 Cluster-based word vector processing method, device, and apparatus
CN109670022A (en) * 2018-12-13 2019-04-23 南京航空航天大学 A kind of java application interface use pattern recommended method based on semantic similarity
CN109921927A (en) * 2019-02-20 2019-06-21 苏州人之众信息技术有限公司 Real-time calling D-chain trace method based on micro services
CN109948710A (en) * 2019-03-21 2019-06-28 杭州电子科技大学 Micro services recognition methods based on API similarity
CN110262972A (en) * 2019-06-17 2019-09-20 中国科学院软件研究所 A kind of failure testing tool and method towards micro services application
CN112148254A (en) * 2019-06-27 2020-12-29 Sap欧洲公司 Application evaluation system for achieving interface design consistency between microservices
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method
CN111459766A (en) * 2019-11-14 2020-07-28 国网浙江省电力有限公司信息通信分公司 Calling chain tracking and analyzing method for micro-service system
US20210157802A1 (en) * 2019-11-21 2021-05-27 Dell Products L. P. Consistent structured data hash value generation across formats and platforms
CN111459760A (en) * 2020-04-01 2020-07-28 交通银行股份有限公司太平洋***中心 Micro-service monitoring method and device and computer storage medium
CN111651451A (en) * 2020-04-25 2020-09-11 复旦大学 Scene-driven single system micro-service splitting method
CN111552509A (en) * 2020-04-30 2020-08-18 深圳前海微众银行股份有限公司 Method and device for determining dependency relationship between interfaces
CN111984346A (en) * 2020-08-12 2020-11-24 八维通科技有限公司 Method, system, device and storage medium for call chain tracking in micro-service environment
CN112650614A (en) * 2020-12-30 2021-04-13 平安消费金融有限公司 Call chain monitoring method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
丁丹 等: ""场景驱动且自底向上的单体***微服务拆分方法"", 《软件学报》 *
吴化尧: ""面向微服务软件开发方法研究进展"", 《计算机研究与发展》 *
钟陈星 等: ""限界上下文视角下的微服务粒度评估"", 《软件学报》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115061836A (en) * 2022-08-16 2022-09-16 浙江大学滨海产业技术研究院 Micro-service splitting method based on graph embedding algorithm for interface layer
CN115061836B (en) * 2022-08-16 2022-11-08 浙江大学滨海产业技术研究院 Micro-service splitting method based on graph embedding algorithm for interface layer
CN116112569A (en) * 2023-02-23 2023-05-12 安超云软件有限公司 Micro-service scheduling method and management system
CN116112569B (en) * 2023-02-23 2023-07-21 安超云软件有限公司 Micro-service scheduling method and management system
CN117311801A (en) * 2023-11-27 2023-12-29 湖南科技大学 Micro-service splitting method based on networking structural characteristics
CN117311801B (en) * 2023-11-27 2024-04-09 湖南科技大学 Micro-service splitting method based on networking structural characteristics

Also Published As

Publication number Publication date
CN113760778B (en) 2022-02-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant