CN115034504B - Cutter wear state prediction system and method based on cloud edge cooperative training - Google Patents

Cutter wear state prediction system and method based on cloud edge cooperative training

Info

Publication number
CN115034504B
CN115034504B (application CN202210754025.5A)
Authority
CN
China
Prior art keywords
model
convolution
data
training
attention
Prior art date
Legal status
Active
Application number
CN202210754025.5A
Other languages
Chinese (zh)
Other versions
CN115034504A (en)
Inventor
李孝斌
王明星
江沛
尹超
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202210754025.5A
Publication of CN115034504A
Application granted
Publication of CN115034504B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a tool wear state prediction system and method based on cloud edge cooperative training. The system comprises: an equipment layer for acquiring sensor data of the tool to be tested; an edge platform deployed with a trained feature extraction model and a lightweight prediction model, where the feature extraction model extracts data features from the sensor data as the feature information to be tested, and the lightweight prediction model takes this feature information as input and outputs the corresponding tool wear state prediction result; and a cloud platform deployed with a large-scale prediction model based on an attention mechanism, which learns attention features and distills them into the lightweight prediction model of the edge platform to realize cooperative training of the cloud and edge models. The invention also discloses a tool wear state prediction method. By deploying the prediction model at the edge, the tool wear state can be predicted in real time, and the prediction accuracy is improved through cloud edge cooperative training.

Description

Cutter wear state prediction system and method based on cloud edge cooperative training
Technical Field
The invention relates to the technical field of cutter wear state prediction, in particular to a cutter wear state prediction system and method based on cloud edge cooperative training.
Background
The wear state of machine tool cutters is an important factor affecting the stability and reliability of product quality in a manufacturing workshop. When tool wear grows beyond a certain limit, the cutting parameters become unstable and the product failure rate rises; continued machining wastes time and material, and in severe cases the whole production process cannot run normally. It is therefore important to monitor and predict the tool wear state of the machine tool accurately and in real time during machining.
Existing methods for detecting the tool wear state fall into direct and indirect measurement. Direct measurement identifies the tool's appearance, surface quality and wear state directly by means of sensors, but detection requires stopping the machine. Because the environment around the tool during actual production is complex, the wear state cannot be measured directly in real time, so indirect measurement is generally adopted: multi-sensor data such as vibration signals, cutting force, cutting temperature and cutting power are collected in real time during machining, and after data cleaning, data fusion and feature engineering the feature data are input into a machine learning model that outputs a prediction result, thereby completing the monitoring of the tool wear state.
The applicant has found that deep learning methods driven by large data volumes usually require substantial computing resources. However, existing centralized intelligent operation modes generally deploy the prediction model in the cloud, so both model training and actual prediction are affected by network conditions, which lowers the stability of tool wear state prediction. Meanwhile, uploading large amounts of training data or sensor data to the cloud consumes considerable bandwidth and cannot meet the real-time response requirements of tool wear state monitoring in an actual production environment, so the real-time performance of the prediction is poor. How to improve the stability and real-time performance of tool wear state prediction is therefore a technical problem to be solved.
Disclosure of Invention
In view of the above deficiencies in the prior art, the technical problem addressed by the invention is: how to provide a tool wear state prediction method based on cloud edge cooperative training, so that the prediction model can be deployed at the edge to predict the tool wear state, thereby improving the stability and real-time performance of the prediction, while improving its accuracy through cloud edge cooperative training.
In order to solve the technical problems, the invention adopts the following technical scheme:
Cutter wear state prediction system based on cloud edge cooperative training includes:
The equipment layer is used for acquiring sensor data of the tool to be tested;
the edge platform is provided with a trained feature extraction model and a lightweight prediction model; the feature extraction model is used for extracting data features in sensor data as feature information to be detected, and the lightweight prediction model is used for taking the feature information to be detected as input and outputting a corresponding cutter abrasion state prediction result;
The cloud platform is deployed with a large-scale prediction model based on an attention mechanism; the large-scale prediction model learns attention characteristics and distills the attention characteristics into a lightweight prediction model of the edge platform so as to realize cooperative training of the cloud edge model.
Preferably, the edge platform is further provided with a data preprocessing module; the data preprocessing module is used for carrying out data cleaning and Z-score normalization processing on the sensor data.
The invention also discloses a cutter abrasion state prediction method based on cloud edge cooperative training, which is implemented based on the cutter abrasion state prediction system and specifically comprises the following steps:
s1: acquiring sensor data of a tool to be tested through an equipment layer, and uploading the sensor data to an edge platform;
S2: the edge platform receives sensor data and inputs the sensor data into a trained feature extraction model, and data features are extracted to serve as feature information to be detected; then inputting the characteristic information to be detected into a lightweight prediction model subjected to cloud edge cooperative training, and outputting a corresponding cutter abrasion state prediction result;
s3: the edge platform generates feedback control information based on the cutter abrasion state prediction result and transmits the feedback control information to the equipment layer;
S4: the equipment layer controls a machine tool of the to-be-tested cutter to execute corresponding actions based on the feedback control information.
Preferably, in step S2, cloud edge cooperative training is achieved through the following steps:
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
s205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
s206: steps S202 to S205 are repeated until the lightweight predictive model reaches the expectations.
Preferably, in step S202, the feature extraction model comprises a two-part convolution operation: the first part adds the convolution results of a 1×1 convolution kernel and a 3×1 convolution kernel and applies batch normalization; the second part channel-concatenates the basic-convolution results of kernels of different sizes, a basic convolution consisting of convolution, batch normalization and a ReLU activation function.
The feature extraction model takes the sensor data as the input of the first partial convolution operation; the batch-normalized output of the first part is pooled and used as the input of the second partial convolution operation; finally, the channel-concatenated result of the second partial convolution operation is pooled and the corresponding feature tensor, i.e. the data features, is output.
The basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
wherein: X is the input data; W_k is a convolution kernel of size k1×k2; * denotes the convolution operation; b_k is the bias; relu is the ReLU activation function; the batch normalization operation bn is inserted between the convolution and the ReLU activation.
The ReLU activation function is expressed as relu(x) = max(0, x), where x is the input data.
Batch normalization is realized by learning the mean μ_β and variance σ_β² of a small batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
x̂_i = (x_i − μ_β) / √(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: x_i is an input data sample; m is the current batch size; ε is a small value greater than zero; γ and β are the trainable scale and bias parameters, respectively; x̂_i is the normalized data; y_i is the output after the learned scaling and shifting.
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k1, s), conv(x, k2, s));
wherein: pool is the pooling operation; conv is the convolution operation; k1 and k2 are the sizes of the pooling kernel and the convolution kernel, respectively; s is the stride; concat denotes concatenation of the feature vectors along the channel dimension.
The shape of the output feature tensor F_f is w_f × 1 × c_f, where w_f is the time-domain size and c_f is the channel-domain size.
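For illustration, the following is a minimal PyTorch sketch of the basic convolution and basic pooling blocks described above; the channel counts, kernel sizes and class names are assumptions made for the example and are not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class BasicConv(nn.Module):
    """conv -> batch normalization -> ReLU over a 1-D (channel, time) signal."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_ch)   # bn inserted between conv and ReLU
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                  # x: (batch, channels, time)
        return self.relu(self.bn(self.conv(x)))

class BasicPool(nn.Module):
    """Channel concatenation of a pooling branch and a strided convolution branch."""
    def __init__(self, in_ch, out_ch, pool_k=3, conv_k=3, stride=2):
        super().__init__()
        self.pool = nn.MaxPool1d(pool_k, stride=stride, padding=pool_k // 2)
        self.conv = BasicConv(in_ch, out_ch, conv_k, stride=stride)

    def forward(self, x):
        # both branches keep the same time length, so they can be concatenated
        return torch.cat([self.pool(x), self.conv(x)], dim=1)

# usage: a (batch, sensor_channels, window_length) tensor of preprocessed sensor data
x = torch.randn(8, 6, 1024)
feat = BasicPool(6, 16)(BasicConv(6, 6, 3)(x))
```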
Preferably, in step S204, the large-scale prediction model comprises a densely connected structure formed by sequentially connecting three attention-dense modules, two pooling layers and a fully connected layer.
Each attention-dense module comprises several dense layers and an attention layer, with dense connections between the layers. In each dense layer, several basic convolutions with kernels of different sizes are applied, their outputs are channel-concatenated and fed into a linear convolution, and the result forms a residual connection with the input tensor followed by ReLU activation to give the output of the dense layer. The attention module performs weight learning on the features of the target data from two dimensions, time and channel.
The structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
wherein: IncepResLayer_B denotes a type-B dense layer; X_i is the input of the dense layer; relu is the ReLU activation function; linerConv(·) is a linear convolution layer without activation; concat denotes concatenation along the channel dimension; BasicConv(·) is the basic convolution.
The pooling layer comprises a max-pooling layer and basic convolution layers with the same stride and multi-size convolution kernels. Its structure is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k1×1);
X_m2 = BasicConv(X_i, k1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k1×1);
wherein: Pool is the pooling layer; concat denotes concatenation along the channel dimension; Maxpool(·) is the max-pooling operation with kernel size k1×1; X_i is the input of the pooling layer; BasicConv(·) is the basic convolution; the stride of all convolution and pooling operations is strides = 4.
The working logic of the attention module is as follows:
1) Given the input sequence X = x_1, x_2, ..., x_T and the filter f = f_1, f_2, ..., f_K, a time-domain convolution yields the correlation sequence a = a_1, a_2, ..., a_T, and a Softmax function applied to a yields the final time-domain weight sequence Y = y_1, y_2, ..., y_T.
2) For the input feature F_i of shape w×1×c, a single-channel 1×1 convolution produces a one-dimensional sequence, from which the time-domain weight is obtained through the time-domain convolution (3×1 kernel) and the Softmax function.
3) The time-domain weight, after transposition, is multiplied with the input feature F_i to obtain a one-dimensional channel sequence; its channel number is reduced by the ratio c/r, layer normalization and ReLU activation are applied, and the channels are then restored to the original number at the same ratio to obtain the channel-domain weight, where c/r is the channel-domain dimension-reduction ratio.
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight and channel-domain weight, respectively, to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i to give the attention feature output F_o of shape w×1×c, i.e. the attention feature.
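As an illustration of steps 1) to 4), the following PyTorch sketch implements a time-domain plus channel-domain attention of this kind; the reduction ratio r, the 3×1 temporal kernel and the sigmoid used to form the channel weight are assumptions made for the example.

```python
import torch
import torch.nn as nn

class TimeChannelAttention(nn.Module):
    """Learns a time-domain weight and a channel-domain weight for a (batch, c, w) feature."""
    def __init__(self, channels, r=4, temporal_k=3):
        super().__init__()
        self.squeeze = nn.Conv1d(channels, 1, kernel_size=1)                  # single-channel 1x1 conv
        self.temporal = nn.Conv1d(1, 1, temporal_k, padding=temporal_k // 2)  # time-domain conv
        reduced = max(channels // r, 1)
        self.down = nn.Linear(channels, reduced)   # channel reduction by ratio c/r
        self.norm = nn.LayerNorm(reduced)
        self.up = nn.Linear(reduced, channels)     # restore the original channel count

    def forward(self, x):                          # x: (batch, c, w)
        t = self.temporal(self.squeeze(x))         # (batch, 1, w)
        t_w = torch.softmax(t, dim=-1)             # time-domain weight over the w positions
        # weighted sum over time gives one value per channel (the channel sequence)
        ch = torch.bmm(x, t_w.transpose(1, 2)).squeeze(-1)                    # (batch, c)
        c_w = torch.sigmoid(self.up(torch.relu(self.norm(self.down(ch)))))    # channel-domain weight
        att = x * t_w * c_w.unsqueeze(-1)          # apply both weights to the feature
        return x + att                             # residual connection -> attention feature F_o
```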
Preferably, in step S203, the lightweight prediction model keeps the architecture of the large-scale prediction model but removes the dense connection structure and replaces ordinary convolution with dilated (hole) convolution, thereby realizing the lightweight design.
Preferably, in step S205, the MSE loss and the attention distillation loss are calculated from the tag data and from the attention features of the historical data, respectively, to obtain the corresponding training loss, and the parameters of the lightweight prediction model are then updated with this training loss;
wherein:
Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^{c_i}, f_t^{e_i});
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;
D_C(f_1, f_2) = 1 − <f_1, f_2> / (‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
wherein: Loss_all is the training loss; Loss_mse is the MSE loss; Loss_att is the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) is the cosine distance; <f_1, f_2> is the inner product of the two vectors; f_t^{c_i} and f_t^{e_i} are the time feature sequences of the attention features in the large-scale prediction model and in the lightweight prediction model, respectively, whose elements are compared position by position over the tensor time-domain length w; λ is the dynamic distillation loss coefficient; α is a number less than 1; λ_0 is the initial distillation loss coefficient; ep is the number of training rounds.
The time feature sequence is obtained by an attention mapping over the channel dimension: for the attention feature F_o of shape w×1×c, the feature vectors f^i (i = 1, ..., c) of the c channel domains are aggregated at every time step to give the time feature sequence f_t of length w. The time feature sequences f_t^{c_i} and f_t^{e_i} of the attention features in the large-scale prediction model and the lightweight prediction model are obtained by this mapping.
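A minimal sketch of this combined loss is given below, assuming the time feature sequence is formed by averaging channel magnitudes and using illustrative default values for λ_0 and α; it is not the patent's reference implementation.

```python
import torch.nn.functional as F

def attention_time_sequence(att_feat):
    """Map an attention feature of shape (batch, c, w) to a time sequence of length w
    by aggregating channel magnitudes (the aggregation rule is an assumption)."""
    return att_feat.abs().mean(dim=1)                        # (batch, w)

def distillation_loss(pred, target, cloud_feats, edge_feats, ep, lam0=0.5, alpha=0.9):
    """Loss_all = Loss_mse + lambda * sum of cosine distances between cloud/edge time sequences."""
    loss_mse = F.mse_loss(pred, target)
    loss_att = 0.0
    for fc, fe in zip(cloud_feats, edge_feats):              # one pair per attention module
        tc = attention_time_sequence(fc)
        te = attention_time_sequence(fe)
        loss_att = loss_att + (1 - F.cosine_similarity(tc, te, dim=-1)).mean()
    lam = (1 - alpha ** ep) * lam0                           # dynamic distillation coefficient
    return loss_mse + lam * loss_att
```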
Preferably, in step S2, the feature information to be measured is uploaded to the cloud platform and input to the large-scale prediction model for model accuracy evaluation, and when the loss of the large-scale prediction model exceeds the expected threshold, incremental training is performed on the large-scale prediction model.
Preferably, the incremental training specifically comprises the steps of:
S211: the model to be trained is initialized with the latest parameters in the historical model library; the new training data are input into the model to be trained and into each historical model, and the Euclidean distance between the feature mapping of each historical model and that of the model to be trained is calculated;
the feature distance D(F, F_i) combines a time-sequence distance and a channel-sequence distance:
D_t = √(Σ_j (x_j − x_j^i)²);  D_c = √(Σ_k (y_k − y_k^i)²);
wherein: D(F, F_i) is the Euclidean distance between the model features; F and F_i are the feature tensors of the model to be trained and of the i-th historical model, respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence, respectively; x_j and x_j^i are the time-sequence elements of the model to be trained and of the historical model, respectively; y_k and y_k^i are the channel-sequence elements of the model to be trained and of the historical model, respectively;
S212: a distance loss is obtained for each historical model from its feature distance D(F, F_i);
S213: a corresponding forgetting factor η is set according to the importance of each historical model:
η = η_0·e^(−k·i);
wherein: η_0 is the initial forgetting factor; k is the forgetting coefficient; i is the historical model number; the feature-distance loss weight of a historical model therefore decays exponentially as the model is updated;
S214: an incremental loss function is constructed from the Euclidean distances between the feature mappings of the historical models and of the model to be trained, and incremental training is performed with this incremental loss as the objective:
L_incre = L_mse + Σ_{i=1..N} η_i·D(F, F_i);
wherein: L_incre is the incremental loss; L_mse is the mean square error with respect to the tag data; N is the number of historical models.
The cutter abrasion state prediction method based on cloud edge cooperative training has the following beneficial effects:
In the invention, the sensor data of the tool to be tested are acquired by the equipment layer, the data features are extracted by the feature extraction model of the edge platform, and the lightweight prediction model outputs the tool wear state prediction result. Because the edge platform is located close to the actual production environment, the high response latency of the traditional cloud-centric architecture is effectively avoided, the flexibility and scalability of the whole system are improved, and the stability and real-time performance of tool wear state prediction are improved. In addition, the lightweight prediction model deployed on the edge platform retains the fitting capability of the model while having fewer parameters and faster inference, which further improves the real-time performance of tool wear state prediction.
Meanwhile, the large-scale prediction model deployed on the cloud platform distills attention features in every training round to assist the training of the lightweight prediction model, forming an intelligent framework of cloud-edge cooperative training and edge real-time inference. The model knowledge of the cloud platform is thus fully used to improve the accuracy of the edge lightweight prediction model, avoiding the limited prediction accuracy that would result from its simple structure and small number of parameters, so the accuracy of tool wear state prediction is further improved through cloud edge cooperative training.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:
FIG. 1 is a logic block diagram of a tool wear state prediction system based on cloud edge co-training;
FIG. 2 is a logic block diagram of a tool wear state prediction method based on cloud edge co-training;
FIG. 3 is a frame diagram of a feature extraction model;
FIG. 4 is a frame diagram of a large-scale predictive model;
FIG. 5 is a schematic diagram of a time-domain attention mechanism module of an attention module;
FIG. 6 is a frame diagram of a lightweight predictive model;
FIG. 7 is a logic block diagram of a cloud edge co-training method;
FIG. 8 is a graph of feature mapping and distillation loss based on cosine distance;
Fig. 9 is a logic diagram of incremental training.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. In the description of the present invention, it should be noted that, directions or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., are directions or positional relationships based on those shown in the drawings, or are directions or positional relationships conventionally put in use of the inventive product, are merely for convenience of describing the present invention and simplifying the description, and are not indicative or implying that the apparatus or element to be referred to must have a specific direction, be constructed and operated in a specific direction, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance. Furthermore, the terms "horizontal," "vertical," and the like do not denote a requirement that the component be absolutely horizontal or overhang, but rather may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined. In the description of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
The following is a further detailed description of the embodiments:
Embodiment one:
the embodiment discloses a cutter wear state prediction system based on cloud edge cooperative training.
As shown in fig. 1, the tool wear state prediction system based on cloud edge co-training includes:
The equipment layer is used for acquiring sensor data of the tool to be tested;
In this embodiment, a number of data acquisition cards and controllers are installed on the numerically controlled machine tool carrying the tool to be tested; an acceleration sensor and a current sensor are installed on the machine tool spindle and on the spindle motor, respectively, and connected to the data acquisition cards to collect the cutting vibration signal and the current signal of the tool to be tested, i.e. the sensor data.
The sensor data collected during cutting are uploaded to the edge platform through DDS for subsequent processing; meanwhile, the controllers installed at the equipment layer can receive the feedback control information (including control instructions and early-warning signals) returned by the edge platform and control the machine tool and related equipment to take corresponding measures.
The edge platform is deployed with a trained feature extraction model and a lightweight prediction model; the feature extraction model extracts the data features of the (preprocessed) sensor data as the feature information to be tested, and the lightweight prediction model takes this feature information as input and outputs the corresponding tool wear state prediction result;
In this embodiment, the edge platform is further configured with a data preprocessing module and an edge model library for storing a history model and data thereof; the data preprocessing module is used for performing data cleaning and Z-score normalization processing on the sensor data, and further storing the preprocessed sensor data into the edge database for subsequent input into the feature extraction model.
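A minimal sketch of this preprocessing step, assuming simple NaN removal and outlier clipping as the cleaning rules, is:

```python
import numpy as np

def preprocess(signal):
    """Data cleaning (drop NaNs, clip gross outliers) followed by Z-score normalization.
    The cleaning rules shown here are illustrative assumptions."""
    x = np.asarray(signal, dtype=float)
    x = x[~np.isnan(x)]                              # drop missing samples
    mu, sigma = x.mean(), x.std()
    x = np.clip(x, mu - 5 * sigma, mu + 5 * sigma)   # clip gross outliers
    return (x - x.mean()) / (x.std() + 1e-8)         # Z-score normalization
```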
The edge platform can also generate corresponding feedback control information (comprising control instructions and early warning signals) based on the cutter abrasion state prediction result and send the feedback control information to the equipment layer. The feedback control information is generated by the cutter wear state prediction result by adopting the existing mature means, and the specific contents are not repeated here.
The cloud platform is deployed with a large-scale prediction model based on an attention mechanism; the model learns attention characteristics and distills the attention characteristics into a lightweight predictive model of an edge platform to realize cooperative training of a cloud edge model.
In this embodiment, the cloud platform is further configured with a model evaluation module for evaluating the prediction accuracy of the large-scale prediction model, and an incremental training module for performing incremental training on the large-scale prediction model.
It should be noted that the lightweight prediction model in the invention is not absolutely "lightweight"; it is "lightweight" relative to the large-scale prediction model and other existing deep network models, with fewer parameters and faster inference than those models.
In the invention, the sensor data of the tool to be tested are acquired by the equipment layer, the data features are extracted by the feature extraction model of the edge platform, and the lightweight prediction model outputs the tool wear state prediction result. Because the edge platform is located close to the actual production environment, the high response latency of the traditional cloud-centric architecture is effectively avoided, the flexibility and scalability of the whole system are improved, and the stability and real-time performance of tool wear state prediction are improved. In addition, the lightweight prediction model deployed on the edge platform achieves fewer parameters and faster inference while retaining the fitting capability of the model, which reduces the training difficulty, improves prediction efficiency, and further improves the real-time performance of tool wear state prediction.
Meanwhile, the large-scale prediction model deployed on the cloud platform distills attention features in every training round to assist the training of the lightweight prediction model, forming an intelligent framework of cloud-edge cooperative training and edge real-time inference. The model knowledge of the cloud platform is thus fully used to improve the accuracy of the edge lightweight prediction model, avoiding the limited prediction accuracy that would result from its simple structure and small number of parameters, so the accuracy of tool wear state prediction is further improved through cloud edge cooperative training.
Specific:
As shown in fig. 3, the feature extraction model comprises a two-part convolution operation: the first part adds the convolution results of a 1×1 convolution kernel and a 3×1 convolution kernel and applies batch normalization; the second part channel-concatenates the basic-convolution results of kernels of different sizes, a basic convolution consisting of convolution, batch normalization and a ReLU activation function.
The feature extraction model takes the sensor data as the input of the first partial convolution operation; the batch-normalized output of the first part is pooled and used as the input of the second partial convolution operation; finally, the channel-concatenated result of the second partial convolution operation is pooled and the corresponding feature tensor, i.e. the data features, is output.
The basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
wherein: X is the input data; W_k is a convolution kernel of size k1×k2; * denotes the convolution operation; b_k is the bias; relu is the ReLU activation function; the batch normalization operation bn is inserted between the convolution and the ReLU activation.
The ReLU activation function is expressed as relu(x) = max(0, x), where x is the input data.
Batch normalization is realized by learning the mean μ_β and variance σ_β² of a small batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
x̂_i = (x_i − μ_β) / √(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: x_i is an input data sample; m is the current batch size; ε is a small value greater than zero; γ and β are the trainable scale and bias parameters, respectively; x̂_i is the normalized data; y_i is the output after the learned scaling and shifting.
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k1, s), conv(x, k2, s));
wherein: pool is the pooling operation; conv is the convolution operation; k1 and k2 are the sizes of the pooling kernel and the convolution kernel, respectively; s is the stride; concat denotes concatenation of the feature vectors along the channel dimension.
The shape of the output feature tensor F_f is w_f × 1 × c_f, where w_f is the time-domain size and c_f is the channel-domain size.
The feature extraction model has sparse interaction and parameter sharing capabilities, so data features can be effectively extracted from the sensor data for model training and real-time prediction, the amount of computation is effectively reduced, overfitting is suppressed, and the real-time performance of tool wear state prediction is further improved.
As shown in fig. 4, the large-scale prediction model comprises a densely connected structure formed by sequentially connecting three attention-dense modules, two pooling layers and a fully connected layer.
Each attention-dense module comprises several dense layers and an attention layer, with dense connections between the layers. In each dense layer, several basic convolutions with kernels of different sizes are applied, their outputs are channel-concatenated and fed into a linear convolution, and the result forms a residual connection with the input tensor followed by ReLU activation to give the output of the dense layer. The attention module performs weight learning on the features of the target data from two dimensions, time and channel.
The structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
wherein: IncepResLayer_B denotes a type-B dense layer; X_i is the input of the dense layer; relu is the ReLU activation function; linerConv(·) is a linear convolution layer without activation; concat denotes concatenation along the channel dimension; BasicConv(·) is the basic convolution.
The pooling layer comprises a max-pooling layer and basic convolution layers with the same stride and multi-size convolution kernels. Its structure is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k1×1);
X_m2 = BasicConv(X_i, k1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k1×1);
wherein: Pool is the pooling layer; concat denotes concatenation along the channel dimension; Maxpool(·) is the max-pooling operation with kernel size k1×1; X_i is the input of the pooling layer; BasicConv(·) is the basic convolution; the stride of all convolution and pooling operations is strides = 4.
As shown in fig. 5, the working logic of the attention module is as follows:
1) Given the input sequence X = x_1, x_2, ..., x_T and the filter f = f_1, f_2, ..., f_K, a time-domain convolution yields the correlation sequence a = a_1, a_2, ..., a_T, and a Softmax function applied to a yields the final time-domain weight sequence Y = y_1, y_2, ..., y_T.
2) For the input feature F_i of shape w×1×c, a single-channel 1×1 convolution produces a one-dimensional sequence, from which the time-domain weight is obtained through the time-domain convolution (3×1 kernel) and the Softmax function.
3) The time-domain weight, after transposition, is multiplied with the input feature F_i to obtain a one-dimensional channel sequence; its channel number is reduced by the ratio c/r, layer normalization and ReLU activation are applied, and the channels are then restored to the original number at the same ratio to obtain the channel-domain weight, where c/r is the channel-domain dimension-reduction ratio.
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight and channel-domain weight, respectively, to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i to give the attention feature output F_o of shape w×1×c, i.e. the attention feature.
In the implementation process, as shown in fig. 6, the lightweight prediction model keeps the architecture of the large-scale prediction model but removes the dense connection structure and replaces ordinary convolution with dilated (hole) convolution, thereby realizing the lightweight design.
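A brief sketch of the substitution follows: a dilated (hole) 1-D convolution enlarges the receptive field without adding weights, which is what keeps the lightweight model cheap; the dilation rate shown is an assumption.

```python
import torch.nn as nn

def dilated_conv(in_ch, out_ch, k=3, dilation=2):
    """A dilated (atrous) 1-D convolution: receptive field (k - 1) * dilation + 1
    with the same number of weights as an ordinary k-tap convolution."""
    pad = dilation * (k - 1) // 2          # keep the time length unchanged
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, k, padding=pad, dilation=dilation),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
    )
```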
In the implementation process, since the learning capacity of the lightweight prediction model is limited, in order to enable the model to mine more data information at the edge end and have stronger generalization capacity, knowledge learned by the large-scale prediction model needs to be transferred to the edge side.
As shown in fig. 7, the edge model library, the feature extraction model and the lightweight prediction model are deployed on the edge platform, and the large-scale prediction model is deployed on the cloud platform. Each training round updates the parameters of the three models and transmits data between cloud and edge. First, the cloud model is ignored: the feature extraction model extracts data features, which are fed into the lightweight prediction model, and the parameters of both are updated according to the MSE loss. The data are then forward-propagated through the feature extraction model again, and the features and labels are uploaded to the cloud platform, where the large-scale prediction model is updated for attention distillation; only the attention features of the large-scale prediction model are transmitted back to the edge platform. Finally, the lightweight prediction model updates its parameters according to the MSE loss plus the weighted attention-feature distillation loss, completing one training round.
The algorithm of cloud edge system training is as follows:
cloud edge cooperative training is realized through the following steps:
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
In this embodiment, the parameters of the feature extraction model and the lightweight prediction model are optimized by the existing MSE loss function.
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
In this embodiment, the parameters of the large-scale predictive model are optimized by the existing MSE loss function.
S205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
in this embodiment, the parameters of the lightweight predictive model are optimized by the MSE loss function+the attention deficit loss function.
S206: steps S202 to S205 are repeated until the feature extraction model and the lightweight prediction model reach expectations.
Based on the attention mechanism, residual networks and related deep learning methods, the invention provides a deep multi-convolution-kernel attention residual network model (the large-scale prediction model) and a lightweight dynamic dilated convolution model (the lightweight prediction model), and establishes an intelligent framework of cloud-edge collaborative training and edge-side real-time inference, so that the data characteristics of the time-series signals of different sensors in different domains can be fully explored and fused in time and space.
In the specific implementation process, as shown in fig. 8, the cloud model and the lightweight model each obtain a corresponding attention feature F_o of shape w×1×c, and the attention mapping operation is applied to obtain the corresponding time feature sequence f_t: the feature vectors f^i (i = 1, ..., c) of the c channel domains of the attention feature are aggregated at every time step, giving a sequence of length w. The time feature sequences f_t^{c_i} and f_t^{e_i} of the attention features in the large-scale prediction model and the lightweight prediction model are obtained by this mapping.
The cosine distance is used to measure the similarity between each feature time sequence of the edge model and the corresponding sequence of the cloud model.
The MSE loss and the attention distillation loss are calculated from the tag data and from the attention features of the historical data, respectively, to obtain the corresponding training loss, and the parameters of the lightweight prediction model are then updated with this training loss;
wherein:
Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^{c_i}, f_t^{e_i});
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;
D_C(f_1, f_2) = 1 − <f_1, f_2> / (‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
wherein: Loss_all is the training loss; Loss_mse is the MSE loss; Loss_att is the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) is the cosine distance; <f_1, f_2> is the inner product of the two vectors; f_t^{c_i} and f_t^{e_i} are the time feature sequences of the attention features in the large-scale prediction model and in the lightweight prediction model, respectively, whose elements are compared position by position over the tensor time-domain length w; λ is the dynamic distillation loss coefficient; α is a number less than 1; λ_0 is the initial distillation loss coefficient; ep is the number of training rounds.
According to the method, the MSE loss and the attention distillation loss are calculated respectively based on the tag data and the attention characteristics of the historical data to obtain the corresponding training loss, and then the parameters of the lightweight prediction model are updated through the training loss, so that the cloud edge cooperative training based on the attention characteristics is realized, the accuracy of the edge lightweight prediction model can be improved by fully utilizing the model knowledge of the cloud platform, and the problem of limited model prediction accuracy caused by the simple structure and the small parameter quantity of the edge lightweight prediction model is avoided.
In the implementation process, in an actual production environment, machine tool wear data are generated continuously, and model performance degrades over time with equipment aging and changing machining conditions. Old data often cannot be used for retraining because of storage limits or privacy protection; relying only on the knowledge in new data easily causes catastrophic forgetting of the model.
Therefore, the invention uploads the feature information to be tested to the cloud platform and inputs it into the large-scale prediction model for accuracy evaluation; when the loss of the large-scale prediction model exceeds the expected threshold, incremental training is performed on it. For this purpose, an incremental training method based on a historical model library and an attention forgetting factor is proposed.
The algorithm of this incremental training method is as follows:
As shown in fig. 9, the incremental training specifically includes the steps of:
S211: the model to be trained is initialized with the latest parameters in the historical model library; the new training data are input into the model to be trained and into each historical model, and the Euclidean distance between the feature mapping of each historical model and that of the model to be trained is calculated;
In this embodiment, the history model stores parameters of the large-scale prediction model of each version in the history training process.
The feature distance D(F, F_i) between the model to be trained and a historical model combines a time-sequence distance and a channel-sequence distance:
D_t = √(Σ_j (x_j − x_j^i)²);  D_c = √(Σ_k (y_k − y_k^i)²);
wherein: D(F, F_i) is the Euclidean distance between the model features; F and F_i are the feature tensors of the model to be trained and of the i-th historical model, respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence, respectively; x_j and x_j^i are the time-sequence elements of the model to be trained and of the historical model, respectively; y_k and y_k^i are the channel-sequence elements of the model to be trained and of the historical model, respectively;
S212: a distance loss is obtained for each historical model from its feature distance D(F, F_i);
S213: a corresponding forgetting factor η is set according to the importance of each historical model:
η = η_0·e^(−k·i);
wherein: η_0 is the initial forgetting factor; k is the forgetting coefficient; i is the historical model number; the feature-distance loss weight of a historical model therefore decays exponentially as the model is updated;
S214: an incremental loss function is constructed from the Euclidean distances between the feature mappings of the historical models and of the model to be trained, and incremental training is performed with this incremental loss as the objective:
L_incre = L_mse + Σ_{i=1..N} η_i·D(F, F_i);
wherein: L_incre is the incremental loss; L_mse is the mean square error with respect to the tag data; N is the number of historical models.
In the actual production environment, the tool wear data of the machine tool are continuously generated, and the performance of the model is reduced along with the time, the equipment aging, the processing condition change and the like.
Therefore, an incremental training algorithm based on the attention forgetting factor is proposed: the large-scale prediction model is incrementally trained by combining the historical models and the parameters in the edge model library with forgetting factors. This avoids large-scale retraining and catastrophic forgetting of historical data, improves the lifelong learning capability of the large-scale prediction model, and continuously guarantees the tool wear prediction accuracy of the lightweight prediction model during long-term operation, providing a practical solution for tool wear state detection on numerically controlled machine tools.
Embodiment two:
The embodiment also discloses a cutter abrasion state prediction method based on cloud edge cooperative training, which is implemented based on the cutter abrasion state prediction system in the first embodiment.
As shown in fig. 2, the method for predicting the tool wear state based on cloud edge cooperative training specifically includes the following steps:
s1: acquiring sensor data of a tool to be tested through an equipment layer, and uploading the sensor data to an edge platform;
In this embodiment, a number of data acquisition cards and controllers are installed on the numerically controlled machine tool carrying the tool to be tested; an acceleration sensor and a current sensor are installed on the machine tool spindle and on the spindle motor, respectively, and connected to the data acquisition cards to collect the cutting vibration signal and the current signal of the tool to be tested, i.e. the sensor data. The sensor data collected during cutting are uploaded to the edge platform through DDS for subsequent processing.
S2: the edge platform receives sensor data and inputs the sensor data into a trained feature extraction model, and data features are extracted to serve as feature information to be detected; then inputting the characteristic information to be detected into a lightweight prediction model subjected to cloud edge cooperative training, and outputting a corresponding cutter abrasion state prediction result;
In this embodiment, the edge platform is further configured with a data preprocessing module and an edge model library for storing a history model and data thereof; the data preprocessing module is used for performing data cleaning and Z-score normalization processing on the sensor data, and further storing the preprocessed sensor data into the edge database for subsequent input into the feature extraction model.
S3: the edge platform generates feedback control information based on the cutter abrasion state prediction result and transmits the feedback control information to the equipment layer;
In this embodiment, the feedback control information (including the control command and the early warning signal) is generated by using the cutter wear state prediction result by using the existing mature means, and details thereof are not described herein.
S4: the equipment layer controls a machine tool of the to-be-tested cutter to execute corresponding actions based on the feedback control information.
In this embodiment, the controller installed in the equipment layer may receive feedback control information (including a control instruction and an early warning signal) returned by the edge layer, and control related equipment such as a machine tool to take corresponding measures.
In the invention, the sensor data of the tool to be tested are acquired by the equipment layer, the data features are extracted by the feature extraction model of the edge platform, and the lightweight prediction model outputs the tool wear state prediction result. Because the edge platform is located close to the actual production environment, the high response latency of the traditional cloud-centric architecture is effectively avoided, the flexibility and scalability of the whole system are improved, and the stability and real-time performance of tool wear state prediction are improved. In addition, the lightweight prediction model deployed on the edge platform achieves fewer parameters and faster inference while retaining the fitting capability of the model, which reduces the training difficulty, improves prediction efficiency, and further improves the real-time performance of tool wear state prediction.
Meanwhile, the large-scale prediction model deployed on the cloud platform distills attention features in every training round to assist the training of the lightweight prediction model, forming an intelligent framework of cloud-edge cooperative training and edge real-time inference. The model knowledge of the cloud platform is thus fully used to improve the accuracy of the edge lightweight prediction model, avoiding the limited prediction accuracy that would result from its simple structure and small number of parameters, so the accuracy of tool wear state prediction is further improved through cloud edge cooperative training.
In the implementation process, compared with recurrent neural networks, convolutional neural networks can perform parallel computation and therefore train and infer faster. In addition, thanks to sparse interaction and parameter sharing, the amount of computation is effectively reduced and overfitting is suppressed.
As shown in fig. 3, the feature extraction model includes a two-part convolution operation, the first part convolution operation including adding the convolution results of the 1×1 convolution kernel and the 3×1 convolution kernel and performing batch regularization; the second partial convolution operation comprises channel splicing of basic convolution results of kernels with different sizes, wherein the basic convolution comprises convolution, batch regularization and ReLU activation functions;
the feature extraction model takes sensor data as input of a first partial convolution operation; carrying out pooling treatment on the output result after batch regularization of the first part of convolution operation, and taking the result as the input of the second part of convolution operation; finally, carrying out pooling treatment on the results obtained after the second partial convolution operation and channel splicing, and outputting corresponding characteristic tensors, namely data characteristics; wherein the basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
Wherein: X represents the input data; W_k represents a convolution kernel of size k_1 × k_2; * represents the convolution operation; b_k denotes the bias; relu denotes the ReLU activation function; a batch regularization operation bn is added between the convolution and the ReLU activation function;
the ReLU activation function is expressed as:
relu(x) = max(0, x);
wherein: x represents the input data;
batch regularization is achieved by learning the mean μ_β and the variance σ_β² of each mini-batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
wherein: x_i denotes an input data sample; m represents the current batch size;
x̂_i = (x_i − μ_β)/√(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: ε represents a small value greater than zero; γ and β represent the trainable scale and bias parameters respectively; x̂_i represents the normalized data; y_i is the output after the learned scale transformation and offset;
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k_1, s), conv(x, k_2, s));
Wherein: pool represents the pooling operation; conv represents the convolution operation; k_1 and k_2 represent the sizes of the pooling and convolution kernels respectively; s represents the stride; concat represents channel-dimension concatenation of the feature vectors;
The shape of the output feature tensor F_f is w_f × 1 × c_f;
Wherein: w_f represents the time-domain size; c_f denotes the channel-domain size.
The feature extraction model has sparse interaction and parameter sharing capabilities, so data features can be effectively extracted from the sensor data for model training and real-time prediction, the amount of computation is effectively reduced, overfitting is suppressed, and the real-time performance of tool wear state prediction is further improved.
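As an illustration, a minimal sketch of the basic convolution and basic pooling blocks described above is given below, written with PyTorch (the patent does not name a deep learning framework, so the framework, the 1-D layout of the sensor tensor and all channel and kernel sizes are assumptions):

```python
import torch
import torch.nn as nn

class BasicConv(nn.Module):
    """Basic convolution: relu(bn(W_k * X + b_k))."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_ch)   # batch regularization between conv and ReLU
        self.relu = nn.ReLU()

    def forward(self, x):                  # x: (batch, channels, time)
        return self.relu(self.bn(self.conv(x)))

class BasicPool(nn.Module):
    """Basic pooling: concat(pool(x, k1, s), conv(x, k2, s)) along the channel axis."""
    def __init__(self, in_ch, out_ch, k1=3, k2=3, stride=2):
        super().__init__()
        self.pool = nn.MaxPool1d(k1, stride=stride, padding=k1 // 2)
        self.conv = BasicConv(in_ch, out_ch, k2, stride=stride)

    def forward(self, x):
        return torch.cat([self.pool(x), self.conv(x)], dim=1)
```

In this layout a raw sensor window of shape (batch, c, w) would pass through the two convolution parts of fig. 3, composed from these blocks, to yield the feature tensor F_f of shape w_f × 1 × c_f described above.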
In the implementation process, after the data features are extracted, a large-scale tool wear value prediction model needs to be designed and deployed on the cloud platform for knowledge extraction. Introducing an attention mechanism facilitates the migration of feature-domain knowledge between models. Compared with image data, the signal is one-dimensional and sequential in the time domain, and the importance of different sensor data and of different channel features varies.
As shown in fig. 4, the large-scale prediction model includes a dense connection structure formed by sequentially connecting three attention dense modules, two pooling layers and a full connection layer;
Each attention dense module comprises a plurality of corresponding dense layers and an attention layer, with dense connections between the layers; in each dense layer, several basic convolutions with convolution kernels of different sizes are applied, the resulting tensors are channel-spliced and fed into a linear convolution, a residual connection with the input tensor is formed, and ReLU activation is applied to obtain the output of the dense layer; the attention module performs weight learning for the features of the target data from the two dimensions of time and channel;
wherein the structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
Wherein: IncepResLayer_B represents a type-B dense layer; X_i represents the input of the dense layer; relu denotes the ReLU activation function; linerConv(·) represents a non-activated linear convolution layer; concat represents channel-dimension concatenation of the feature vectors; BasicConv(·) represents the basic convolution;
the pooling layer comprises a maximum pooling layer and basic convolution layers with the same stride and convolution kernels of multiple sizes;
the structure of the pooling layer is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k_1×1);
X_m2 = BasicConv(X_i, k_1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k_1×1);
Wherein: Pool represents the pooling layer; concat represents channel-dimension concatenation of the feature vectors; Maxpool(·) represents the maximum pooling operation with a kernel size of k_1×1; X_i represents the input of the pooling layer; BasicConv(·) represents the basic convolution; the strides of all convolution operations and pooling operations are strides = 4;
as shown in fig. 5, the working logic of the attention module is as follows:
1) A given input sequence X = x_1, x_2, ..., x_T and a filter F = f_1, f_2, ..., f_K are convolved in the time domain to obtain the correlation sequence A = a_1, a_2, ..., a_T; the final time-domain weight sequence Y = y_1, y_2, ..., y_T is then obtained through a Softmax function;
2) For the input feature F_i of shape w×1×c, a one-dimensional sequence x_t is obtained through a single-channel 1×1 convolution, and the time-domain weight y_t is obtained through the time-domain convolution and the Softmax function;
Wherein: Softmax represents the Softmax function; TemporalConv denotes the time-domain convolution; conv represents the convolution; 1×1 and 3×1 represent the convolution kernel shapes of the convolution and the time-domain convolution respectively;
3) The time-domain weight y_t is transposed and multiplied with the input feature F_i (w×1×c) to obtain a one-dimensional channel sequence x_c; the channel number of x_c is reduced by the ratio c/r while layer normalization and ReLU activation are applied, and the original channel number is then restored at the same ratio to obtain the channel-domain weight y_c;
Wherein: conv represents the convolution; relu represents the ReLU activation function; LayerNorm denotes layer normalization; y_t represents the time-domain weight; the superscript T denotes the transpose operation; c/r represents the channel-domain dimension reduction ratio;
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight y_t and channel-domain weight y_c respectively to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i (w×1×c) to obtain the attention feature output F_o (w×1×c), i.e. the attention feature.
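A hedged sketch of this time-domain plus channel-domain attention module follows (PyTorch assumed; the kernel sizes, the reduction ratio r and the absence of an extra gating activation are read from the textual description above and may differ from the original implementation):

```python
import torch
import torch.nn as nn

class TemporalChannelAttention(nn.Module):
    """Learns a time-domain weight and a channel-domain weight for a feature
    map of shape (batch, c, w) and adds a residual connection."""
    def __init__(self, channels, r=4):
        super().__init__()
        self.squeeze = nn.Conv1d(channels, 1, kernel_size=1)            # single-channel 1x1 convolution
        self.temporal_conv = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # 3x1 time-domain convolution
        self.softmax = nn.Softmax(dim=-1)
        reduced = max(channels // r, 1)
        self.reduce = nn.Conv1d(channels, reduced, kernel_size=1)       # c -> c/r
        self.norm = nn.LayerNorm(reduced)
        self.expand = nn.Conv1d(reduced, channels, kernel_size=1)       # c/r -> c

    def forward(self, x):                                               # x: (batch, c, w)
        # steps 1)-2): one-dimensional sequence, time-domain convolution, Softmax
        y_t = self.softmax(self.temporal_conv(self.squeeze(x)))         # time weights (batch, 1, w)
        # step 3): transpose and multiply to get a channel descriptor, then squeeze/expand channels
        x_c = torch.matmul(x, y_t.transpose(1, 2))                      # (batch, c, 1)
        z = self.reduce(x_c).squeeze(-1)                                # (batch, c/r)
        z = torch.relu(self.norm(z)).unsqueeze(-1)                      # layer normalization + ReLU
        y_c = self.expand(z)                                            # channel weights (batch, c, 1)
        # step 4): attention mapping plus residual connection with the input
        return x * y_t * y_c + x
```

Here the channel weight is applied without a final gating activation, which is as far as the description above goes; a sigmoid gate could be added without changing the overall structure.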
In the implementation process, as shown in fig. 6, the lightweight prediction model is based on the architecture of the large-scale prediction model but removes the dense connection structure and replaces the ordinary convolution with dilated (hole) convolution, so as to realize the lightweight design.
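As a rough illustration of this substitution, the ordinary k×1 basic convolution can be swapped for a dilated convolution, which preserves the receptive field with fewer stacked layers and parameters (PyTorch assumed; the kernel size and dilation rate below are illustrative only):

```python
import torch.nn as nn

def dilated_basic_conv(in_ch, out_ch, k=3, dilation=2, stride=1):
    """Dilated conv -> batch regularization -> ReLU; the effective receptive field
    is dilation * (k - 1) + 1 while only k weights per channel are learned."""
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, k, stride=stride, padding=pad, dilation=dilation),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(),
    )
```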
In the implementation process, since the learning capacity of the lightweight prediction model is limited, the knowledge learned by the large-scale prediction model needs to be transferred to the edge side so that the edge model can mine more data information and obtain a stronger generalization capability.
As shown in fig. 7, the edge model library, the feature extraction model and the lightweight prediction model are deployed on the edge side, and the large-scale prediction model is deployed on the cloud platform. Each training round performs parameter updating of the three models and cloud edge data transmission. First, ignoring the cloud model, the feature extraction model extracts the data features, which are passed to the lightweight prediction model, and the parameters of both models are updated according to the MSE loss. The data are then forwarded through the feature extraction model again, the features and labels are uploaded to the cloud platform, the parameters of the large-scale prediction model are updated and its attention features are distilled, and only these attention features are transmitted back to the edge platform. Finally, the lightweight prediction model finishes the round by updating its parameters according to the MSE loss function plus the weighted attention feature distillation loss.
The cloud edge cooperative training algorithm is realized through the following steps (a code sketch of one training round is given after the steps):
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
In this embodiment, the parameters of the feature extraction model and the lightweight prediction model are optimized by the existing MSE loss function.
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
In this embodiment, the parameters of the large-scale predictive model are optimized by the existing MSE loss function.
S205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
in this embodiment, the parameters of the lightweight predictive model are optimized by the MSE loss function+the attention deficit loss function.
S206: steps S202 to S205 are repeated until the feature extraction model and the lightweight prediction model meet the expected accuracy.
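The per-round flow of steps S202 to S205 can be summarized by the following sketch (PyTorch assumed; feature_model, edge_model and cloud_model stand for the feature extraction, lightweight and large-scale networks, both prediction models are assumed to return a prediction together with their attention feature, cloud upload and download are abstracted to ordinary tensor hand-over, and attention_distill_loss is the distillation loss sketched further below):

```python
import torch
import torch.nn.functional as F

def train_one_round(feature_model, edge_model, cloud_model,
                    loader, opt_edge, opt_cloud, lam):
    # opt_edge is assumed to optimise the parameters of feature_model and edge_model,
    # opt_cloud those of cloud_model; lam is the dynamic distillation loss coefficient.
    for x, y in loader:                                  # sensor windows and wear labels
        # S202/S203: extract features, predict on the edge, update with plain MSE
        pred_e, _ = edge_model(feature_model(x))
        loss_e = F.mse_loss(pred_e, y)
        opt_edge.zero_grad(); loss_e.backward(); opt_edge.step()

        # S204: forward the data again, "upload" features and labels, update the
        # large-scale model, then bring its attention features back to the edge
        feats = feature_model(x)
        pred_c, att_cloud = cloud_model(feats.detach())  # no gradient flows back over the link
        loss_c = F.mse_loss(pred_c, y)
        opt_cloud.zero_grad(); loss_c.backward(); opt_cloud.step()

        # S205: edge update with MSE loss plus the weighted attention distillation loss
        pred_e, att_edge = edge_model(feats)
        loss_all = F.mse_loss(pred_e, y) + lam * attention_distill_loss(att_cloud.detach(), att_edge)
        opt_edge.zero_grad(); loss_all.backward(); opt_edge.step()
```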
The invention applies deep learning methods based on the attention mechanism, residual networks and the like, and provides a deep multi-convolution-kernel attention residual network model (the large-scale prediction model) and a lightweight dynamic dilated convolution model (the lightweight prediction model), and establishes an intelligent framework of cloud edge collaborative training and edge-side real-time inference. The data characteristics of the time-series signals of different sensors can thus be fully explored in different domains and fused in time and space. Compared with other existing models, the cloud edge collaborative framework offers better prediction accuracy and a faster inference speed, thereby further improving the accuracy and real-time performance of tool wear state prediction.
In the specific implementation process, as shown in fig. 8, the cloud model and the lightweight model each obtain a corresponding attention feature F_o (w×1×c), and the attention mapping operation is applied to obtain the corresponding time feature sequence f_t by aggregating the channel-domain feature vectors of F_o.
Wherein: f_t represents the time feature sequence; F_o (w×1×c) is the attention feature; f_i is the feature vector of each channel domain of the attention feature F_o; c represents the number of channels; the time feature sequences f_t^(c_i) and f_t^(e_i) of the attention features in the large-scale prediction model and the lightweight prediction model are obtained through this calculation.
The cosine distance is adopted to measure the similarity between each feature time sequence of the edge model and the corresponding sequence of the cloud model.
The MSE loss and the attention distillation loss are calculated based on the tag data and the attention features of the historical data respectively to obtain the corresponding training loss, and the parameters of the lightweight prediction model are then updated through the training loss (a code sketch follows the formulas below);
wherein Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^(c_i), f_t^(e_i));
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;  D_C(f_1, f_2) = 1 − ⟨f_1, f_2⟩/(‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
Wherein: Loss_all represents the training loss; Loss_mse represents the MSE loss; Loss_att represents the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) represents the cosine distance; ⟨f_1, f_2⟩ represents the inner product of the two vectors; f_t^(c_i) and f_t^(e_i) are the time feature sequences of the attention features in the large-scale prediction model and the lightweight prediction model respectively, and their element values are indexed over the tensor time-domain size w; λ represents the dynamic distillation loss coefficient; α represents a number less than 1; λ_0 represents the initial distillation loss coefficient; ep represents the number of training rounds.
In this way, the MSE loss and the attention distillation loss are calculated based on the tag data and the attention features of the historical data respectively to obtain the corresponding training loss, and the parameters of the lightweight prediction model are updated through this loss, realizing cloud edge cooperative training based on attention features. The model knowledge of the cloud platform is thus fully utilized to improve the accuracy of the edge lightweight prediction model, and the limitation on prediction accuracy caused by the simple structure and small parameter count of the edge model is avoided.
In the implementation process, in the actual production environment, the abrasion data of the machine tool cutter are continuously generated, and the performance of the model is reduced along with the time lapse, the equipment aging, the processing condition change and the like. Old data often cannot be used for model retraining due to storage limitations or privacy protection, but only relies on knowledge in the new data, which can easily cause catastrophic forgetting of the model.
Therefore, the invention uploads the feature information to be tested to the cloud platform and inputs it into the large-scale prediction model for model accuracy evaluation; when the loss of the large-scale prediction model exceeds an expected threshold, incremental training is performed on the large-scale prediction model, for which an incremental training method based on a historical model library and attention forgetting factors is provided.
As shown in fig. 9, the incremental training method based on the historical model library and the attention forgetting factor specifically includes the following steps (a code sketch is given after the steps):
s211: initializing a model to be trained by using the latest parameters in a historical model library, respectively inputting new training data into the model to be trained and the historical model, and respectively calculating Euclidean distances between each historical model and feature mapping of the model to be trained;
In this embodiment, the history model stores parameters of the large-scale prediction model of each version in the history training process.
D_t = √( Σ_j (x_j − x̃_j)² );  D_c = √( Σ_k (y_k − ỹ_k)² );
Wherein: D(F, F_i) represents the Euclidean distance between the model features, obtained from D_t and D_c; F and F_i are the feature tensors of the model to be trained and of the historical model respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence respectively; x_j and x̃_j are the time sequence elements of the model to be trained and of the historical model respectively; y_k and ỹ_k are the channel sequence elements of the model to be trained and of the historical model respectively;
S212: obtaining a distance loss based on each history model
S213: setting a corresponding forgetting factor eta based on the importance degree of the history model;
η = η_0·e^(−k·i);
wherein: η_0 represents the initial forgetting factor; k represents the forgetting coefficient; i represents the historical model number; the feature distance loss weight of a historical model therefore drops exponentially as the model is updated;
s214: constructing an incremental loss function based on Euclidean distance between the characteristic mapping of the historical model and the model to be trained, and performing incremental training by taking the incremental loss function as an index;
Wherein: l incre represents incremental loss; l mse represents the mean square error of the tag data; n represents the number of history models.
In the actual production environment, tool wear data are continuously generated, and the performance of the model degrades over time with equipment aging and changing machining conditions. The incremental training algorithm based on attention forgetting factors therefore incrementally trains the large-scale prediction model by combining the historical models and the parameters of the edge model library with the forgetting factors, so that large-scale retraining and catastrophic forgetting of historical data are avoided, the lifelong learning capability of the large-scale prediction model is improved, the tool wear state prediction accuracy of the lightweight prediction model is continuously guaranteed during long-term operation, and a practical solution is provided for tool wear state detection of numerically controlled machine tools.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the technical solution, and those skilled in the art should understand that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the present invention, and all such modifications and equivalents are included in the scope of the claims.

Claims (2)

1. A cutter wear state prediction system based on cloud edge cooperative training, characterized by comprising:
The equipment layer is used for acquiring sensor data of the tool to be tested;
the edge platform is provided with a trained feature extraction model and a lightweight prediction model; the feature extraction model is used for extracting data features in sensor data as feature information to be detected, and the lightweight prediction model is used for taking the feature information to be detected as input and outputting a corresponding cutter abrasion state prediction result;
The cloud platform is deployed with a large-scale prediction model based on an attention mechanism; the large-scale prediction model learns attention characteristics and distills the attention characteristics into a lightweight prediction model of an edge platform so as to realize cooperative training of a cloud edge model;
the tool wear state prediction method based on cloud edge cooperative training is implemented based on a tool wear state prediction system and specifically comprises the following steps of:
s1: acquiring sensor data of a tool to be tested through an equipment layer, and uploading the sensor data to an edge platform;
S2: the edge platform receives sensor data and inputs the sensor data into a trained feature extraction model, and data features are extracted to serve as feature information to be detected; then inputting the characteristic information to be detected into a lightweight prediction model subjected to cloud edge cooperative training, and outputting a corresponding cutter abrasion state prediction result;
in the step S2, cloud edge cooperative training is realized through the following steps:
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
In step S202, the feature extraction model includes a two-part convolution operation, where the first part convolution operation includes adding convolution results of the 1×1 convolution kernel and the 3×1 convolution kernel and performing batch regularization; the second partial convolution operation comprises channel splicing of basic convolution results of kernels with different sizes, wherein the basic convolution comprises convolution, batch regularization and ReLU activation functions;
The feature extraction model takes sensor data as input of a first partial convolution operation; carrying out pooling treatment on the output result after batch regularization of the first part of convolution operation, and taking the result as the input of the second part of convolution operation; finally, carrying out pooling treatment on the results obtained after the second partial convolution operation and channel splicing, and outputting corresponding characteristic tensors, namely data characteristics;
wherein the basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
Wherein: X represents the input data; W_k represents a convolution kernel of size k_1 × k_2; * represents the convolution operation; b_k denotes the bias; relu denotes the ReLU activation function; a batch regularization operation bn is added between the convolution and the ReLU activation function;
the ReLU activation function is expressed as:
relu(x) = max(0, x);
wherein: x represents the input data;
batch regularization is achieved by learning the mean μ_β and the variance σ_β² of each mini-batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
wherein: x_i denotes an input data sample; m represents the current batch size;
x̂_i = (x_i − μ_β)/√(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: ε represents a small value greater than zero; γ and β represent the trainable scale and bias parameters respectively; x̂_i represents the normalized data; y_i is the output after the learned scale transformation and offset;
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k_1, s), conv(x, k_2, s));
Wherein: pool represents the pooling operation; conv represents the convolution operation; k_1 and k_2 represent the sizes of the pooling and convolution kernels respectively; s represents the stride; concat represents channel-dimension concatenation of the feature vectors;
The shape of the output feature tensor F_f is w_f × 1 × c_f;
Wherein: w_f represents the time-domain size; c_f denotes the channel-domain size;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
In step S203, the lightweight prediction model is based on the architecture of the large-scale prediction model but removes the dense connection structure and replaces the ordinary convolution with dilated (hole) convolution, so as to realize the lightweight design;
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
in step S204, the large-scale prediction model includes a dense connection structure formed by sequentially connecting three attention dense modules, two pooling layers and a full connection layer;
Each attention dense module comprises a plurality of corresponding dense layers and an attention layer, with dense connections between the layers; in each dense layer, several basic convolutions with convolution kernels of different sizes are applied, the resulting tensors are channel-spliced and fed into a linear convolution, a residual connection with the input tensor is formed, and ReLU activation is applied to obtain the output of the dense layer; the attention module performs weight learning for the features of the target data from the two dimensions of time and channel;
wherein the structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
Wherein: IncepResLayer_B represents a type-B dense layer; X_i represents the input of the dense layer; relu denotes the ReLU activation function; linerConv(·) represents a non-activated linear convolution layer; concat represents channel-dimension concatenation of the feature vectors; BasicConv(·) represents the basic convolution;
the pooling layer comprises a maximum pooling layer and basic convolution layers with the same stride and convolution kernels of multiple sizes;
the structure of the pooling layer is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k_1×1);
X_m2 = BasicConv(X_i, k_1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k_1×1);
Wherein: Pool represents the pooling layer; concat represents channel-dimension concatenation of the feature vectors; Maxpool(·) represents the maximum pooling operation with a kernel size of k_1×1; X_i represents the input of the pooling layer; BasicConv(·) represents the basic convolution; the strides of all convolution operations and pooling operations are strides = 4;
The working logic of the attention module is as follows:
1) A given input sequence X = x_1, x_2, ..., x_T and a filter F = f_1, f_2, ..., f_K are convolved in the time domain to obtain the correlation sequence A = a_1, a_2, ..., a_T; the final time-domain weight sequence Y = y_1, y_2, ..., y_T is then obtained through a Softmax function;
2) For the input feature F_i of shape w×1×c, a one-dimensional sequence x_t is obtained through a single-channel 1×1 convolution, and the time-domain weight y_t is obtained through the time-domain convolution and the Softmax function;
Wherein: Softmax represents the Softmax function; TemporalConv denotes the time-domain convolution; conv represents the convolution; 1×1 and 3×1 represent the convolution kernel shapes of the convolution and the time-domain convolution respectively;
3) The time-domain weight y_t is transposed and multiplied with the input feature F_i (w×1×c) to obtain a one-dimensional channel sequence x_c; the channel number of x_c is reduced by the ratio c/r while layer normalization and ReLU activation are applied, and the original channel number is then restored at the same ratio to obtain the channel-domain weight y_c;
Wherein: conv represents the convolution; relu represents the ReLU activation function; LayerNorm denotes layer normalization; y_t represents the time-domain weight; the superscript T denotes the transpose operation; c/r represents the channel-domain dimension reduction ratio;
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight y_t and channel-domain weight y_c respectively to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i (w×1×c) to obtain the attention feature output F_o (w×1×c), i.e. the attention feature;
s205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
In step S205, MSE loss and attention distillation loss are calculated based on the tag data and the attention features of the history data, so as to obtain corresponding training loss, and then parameter updating is performed on the lightweight prediction model through the training loss;
wherein Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^(c_i), f_t^(e_i));
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;  D_C(f_1, f_2) = 1 − ⟨f_1, f_2⟩/(‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
Wherein: Loss_all represents the training loss; Loss_mse represents the MSE loss; Loss_att represents the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) represents the cosine distance; ⟨f_1, f_2⟩ represents the inner product of the two vectors; f_t^(c_i) and f_t^(e_i) are the time feature sequences of the attention features in the large-scale prediction model and the lightweight prediction model respectively, and their element values are indexed over the tensor time-domain size w; λ represents the dynamic distillation loss coefficient; α represents a number less than 1; λ_0 represents the initial distillation loss coefficient; ep represents the number of training rounds;
the time feature sequence is calculated by the attention mapping operation, which aggregates the channel-domain feature vectors of the attention feature;
Wherein: f_t represents the time feature sequence; F_o (w×1×c) is the attention feature; f_i is the feature vector of each channel domain of the attention feature F_o; c represents the number of channels; the time feature sequences f_t^(c_i) and f_t^(e_i) of the attention features in the large-scale prediction model and the lightweight prediction model are obtained through this calculation;
s206: repeating steps S202 to S205 until the light predictive model reaches the expectation;
In the step S2, the feature information to be detected is uploaded to a cloud platform and is input into a large-scale prediction model to carry out model precision evaluation, and when the loss of the large-scale prediction model exceeds an expected threshold value, incremental training is carried out on the large-scale prediction model;
the incremental training specifically comprises the following steps:
s211: initializing a model to be trained by using the latest parameters in a historical model library, respectively inputting new training data into the model to be trained and the historical model, and respectively calculating Euclidean distances between each historical model and feature mapping of the model to be trained;
D_t = √( Σ_j (x_j − x̃_j)² );  D_c = √( Σ_k (y_k − ỹ_k)² );
Wherein: D(F, F_i) represents the Euclidean distance between the model features, obtained from D_t and D_c; F and F_i are the feature tensors of the model to be trained and of the historical model respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence respectively; x_j and x̃_j are the time sequence elements of the model to be trained and of the historical model respectively; y_k and ỹ_k are the channel sequence elements of the model to be trained and of the historical model respectively;
S212: obtaining a distance loss based on each history model
S213: setting a corresponding forgetting factor eta based on the importance degree of the history model;
η = η_0·e^(−k·i);
wherein: η_0 represents the initial forgetting factor; k represents the forgetting coefficient; i represents the historical model number; the feature distance loss weight of a historical model therefore drops exponentially as the model is updated;
s214: constructing an incremental loss function based on Euclidean distance between the characteristic mapping of the historical model and the model to be trained, and performing incremental training by taking the incremental loss function as an index;
Wherein: l incre represents incremental loss; l mse represents the mean square error of the tag data; n represents the number of history models;
s3: the edge platform generates feedback control information based on the cutter abrasion state prediction result and transmits the feedback control information to the equipment layer;
S4: the equipment layer controls a machine tool of the to-be-tested cutter to execute corresponding actions based on the feedback control information.
2. The cloud edge co-training based tool wear state prediction system according to claim 1, wherein: the edge platform is also provided with a data preprocessing module; the data preprocessing module is used for carrying out data cleaning and Z-score normalization processing on the sensor data.
CN202210754025.5A 2022-06-28 2022-06-28 Cutter wear state prediction system and method based on cloud edge cooperative training Active CN115034504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210754025.5A CN115034504B (en) 2022-06-28 2022-06-28 Cutter wear state prediction system and method based on cloud edge cooperative training

Publications (2)

Publication Number Publication Date
CN115034504A CN115034504A (en) 2022-09-09
CN115034504B true CN115034504B (en) 2024-05-28

Family

ID=83126925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210754025.5A Active CN115034504B (en) 2022-06-28 2022-06-28 Cutter wear state prediction system and method based on cloud edge cooperative training

Country Status (1)

Country Link
CN (1) CN115034504B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117934931A (en) * 2024-01-16 2024-04-26 广州杰鑫科技股份有限公司 Model updating method and device, optical cable intelligent operation and maintenance system and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070208A (en) * 2020-08-05 2020-12-11 同济大学 Tool wear prediction method based on encoder-decoder stage attention mechanism
CN112706001A (en) * 2020-12-23 2021-04-27 重庆邮电大学 Machine tool cutter wear prediction method based on edge data processing and BiGRU-CNN network
CN113569903A (en) * 2021-06-09 2021-10-29 西安电子科技大学 Method, system, equipment, medium and terminal for predicting abrasion of numerical control machine tool cutter
CN114297912A (en) * 2021-12-08 2022-04-08 燕山大学 Tool wear prediction method based on deep learning
CN114619292A (en) * 2022-03-25 2022-06-14 南京航空航天大学 Milling cutter wear monitoring method based on fusion of wavelet denoising and attention mechanism with GRU network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Real-time tool wear state monitoring method based on a deep gated recurrent unit neural network; 陈启鹏; 谢庆生; 袁庆霓; 黄海松; 魏琴; 李宜汀; Computer Integrated Manufacturing Systems; 2020-07-15 (No. 07); 58-69 *
Milling tool wear prediction under multi-monitoring-data fusion; 陈熠道 et al.; Modular Machine Tool & Automatic Manufacturing Technique; 2022-04-15 (No. 4); 96-100 *

Also Published As

Publication number Publication date
CN115034504A (en) 2022-09-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant