CN115034504B - Cutter wear state prediction system and method based on cloud edge cooperative training - Google Patents

Cutter wear state prediction system and method based on cloud edge cooperative training

Info

Publication number
CN115034504B
CN115034504B (application CN202210754025.5A)
Authority
CN
China
Prior art keywords
model
convolution
data
training
attention
Prior art date
Legal status
Active
Application number
CN202210754025.5A
Other languages
Chinese (zh)
Other versions
CN115034504A (en)
Inventor
李孝斌
王明星
江沛
尹超
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202210754025.5A
Publication of CN115034504A
Application granted
Publication of CN115034504B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a tool wear state prediction system and method based on cloud edge cooperative training. The system comprises: an equipment layer for acquiring sensor data of the tool to be tested; an edge platform deployed with a trained feature extraction model and a lightweight prediction model, where the feature extraction model extracts data features from the sensor data as the feature information to be tested, and the lightweight prediction model takes this feature information as input and outputs the corresponding tool wear state prediction result; and a cloud platform deployed with a large-scale prediction model based on an attention mechanism, which learns attention features and distills them into the lightweight prediction model of the edge platform to realize cooperative training of the cloud and edge models. The invention also discloses a tool wear state prediction method. By deploying the prediction model at the edge, the tool wear state can be predicted in real time, and the prediction accuracy is improved through cloud edge cooperative training.

Description

Cutter wear state prediction system and method based on cloud edge cooperative training
Technical Field
The invention relates to the technical field of cutter wear state prediction, in particular to a cutter wear state prediction system and method based on cloud edge cooperative training.
Background
The wear state of machine tool cutters is an important factor affecting the stability and reliability of product quality in a manufacturing workshop. When tool wear grows beyond a certain limit, the cutting parameters become unstable and the product failure rate rises; continued machining wastes time and material, and in severe cases the whole production process cannot run normally. It is therefore important to monitor and predict the tool wear state of the machine tool accurately and in real time during machining.
Existing methods for detecting the tool wear state fall into direct and indirect measurement. Direct measurement identifies the tool's appearance, surface quality and wear state directly by means of sensors, but detection requires stopping the machine. Because the environment around the tool during actual production is complex, the wear state cannot be measured directly in real time, so indirect measurement is generally adopted: multi-sensor data such as vibration signals, cutting force, cutting temperature and cutting power are collected in real time during machining, and after data cleaning, data fusion and feature engineering the feature data are input into a machine learning model that outputs a prediction result, thereby completing the monitoring of the tool wear state.
The applicant has found that deep learning methods driven by large data volumes usually require substantial computing resources. However, existing centralized intelligent operation modes generally deploy the prediction model in the cloud, so both model training and actual prediction are affected by network conditions, which lowers the stability of tool wear state prediction. Meanwhile, uploading large amounts of training data or sensor data to the cloud consumes considerable bandwidth and cannot meet the real-time response requirements of tool wear state monitoring in an actual production environment, so the real-time performance of the prediction is poor. How to improve the stability and real-time performance of tool wear state prediction is therefore a technical problem to be solved.
Disclosure of Invention
In view of the above deficiencies in the prior art, the technical problem addressed by the invention is: how to provide a tool wear state prediction method based on cloud edge cooperative training, so that the prediction model can be deployed at the edge to predict the tool wear state, thereby improving the stability and real-time performance of the prediction, while improving its accuracy through cloud edge cooperative training.
In order to solve the technical problems, the invention adopts the following technical scheme:
Cutter wear state prediction system based on cloud edge cooperative training includes:
The equipment layer is used for acquiring sensor data of the tool to be tested;
the edge platform is provided with a trained feature extraction model and a lightweight prediction model; the feature extraction model is used for extracting data features in sensor data as feature information to be detected, and the lightweight prediction model is used for taking the feature information to be detected as input and outputting a corresponding cutter abrasion state prediction result;
The cloud platform is deployed with a large-scale prediction model based on an attention mechanism; the large-scale prediction model learns attention characteristics and distills the attention characteristics into a lightweight prediction model of the edge platform so as to realize cooperative training of the cloud edge model.
Preferably, the edge platform is further provided with a data preprocessing module; the data preprocessing module is used for carrying out data cleaning and Z-score normalization processing on the sensor data.
The invention also discloses a cutter abrasion state prediction method based on cloud edge cooperative training, which is implemented based on the cutter abrasion state prediction system and specifically comprises the following steps:
s1: acquiring sensor data of a tool to be tested through an equipment layer, and uploading the sensor data to an edge platform;
S2: the edge platform receives sensor data and inputs the sensor data into a trained feature extraction model, and data features are extracted to serve as feature information to be detected; then inputting the characteristic information to be detected into a lightweight prediction model subjected to cloud edge cooperative training, and outputting a corresponding cutter abrasion state prediction result;
s3: the edge platform generates feedback control information based on the cutter abrasion state prediction result and transmits the feedback control information to the equipment layer;
S4: the equipment layer controls a machine tool of the to-be-tested cutter to execute corresponding actions based on the feedback control information.
Preferably, in step S2, cloud edge cooperative training is achieved through the following steps:
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
s205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
s206: steps S202 to S205 are repeated until the lightweight predictive model reaches the expectations.
Preferably, in step S202, the feature extraction model comprises a two-part convolution operation: the first part adds the convolution results of a 1×1 convolution kernel and a 3×1 convolution kernel and applies batch normalization; the second part channel-concatenates the basic-convolution results of kernels of different sizes, a basic convolution consisting of convolution, batch normalization and a ReLU activation function.
The feature extraction model takes the sensor data as the input of the first partial convolution operation; the batch-normalized output of the first part is pooled and used as the input of the second partial convolution operation; finally, the channel-concatenated result of the second partial convolution operation is pooled and the corresponding feature tensor, i.e. the data features, is output.
The basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
wherein: X is the input data; W_k is a convolution kernel of size k1×k2; * denotes the convolution operation; b_k is the bias; relu is the ReLU activation function; the batch normalization operation bn is inserted between the convolution and the ReLU activation.
The ReLU activation function is expressed as relu(x) = max(0, x), where x is the input data.
Batch normalization is realized by learning the mean μ_β and variance σ_β² of a small batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
x̂_i = (x_i − μ_β) / √(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: x_i is an input data sample; m is the current batch size; ε is a small value greater than zero; γ and β are the trainable scale and bias parameters, respectively; x̂_i is the normalized data; y_i is the output after the learned scaling and shifting.
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k1, s), conv(x, k2, s));
wherein: pool is the pooling operation; conv is the convolution operation; k1 and k2 are the sizes of the pooling kernel and the convolution kernel, respectively; s is the stride; concat denotes concatenation of the feature vectors along the channel dimension.
The shape of the output feature tensor F_f is w_f × 1 × c_f, where w_f is the time-domain size and c_f is the channel-domain size.
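For illustration, the following is a minimal PyTorch sketch of the basic convolution and basic pooling blocks described above; the channel counts, kernel sizes and class names are assumptions made for the example and are not the patent's reference implementation.

```python
import torch
import torch.nn as nn

class BasicConv(nn.Module):
    """conv -> batch normalization -> ReLU over a 1-D (channel, time) signal."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_ch)   # bn inserted between conv and ReLU
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):                  # x: (batch, channels, time)
        return self.relu(self.bn(self.conv(x)))

class BasicPool(nn.Module):
    """Channel concatenation of a pooling branch and a strided convolution branch."""
    def __init__(self, in_ch, out_ch, pool_k=3, conv_k=3, stride=2):
        super().__init__()
        self.pool = nn.MaxPool1d(pool_k, stride=stride, padding=pool_k // 2)
        self.conv = BasicConv(in_ch, out_ch, conv_k, stride=stride)

    def forward(self, x):
        # both branches keep the same time length, so they can be concatenated
        return torch.cat([self.pool(x), self.conv(x)], dim=1)

# usage: a (batch, sensor_channels, window_length) tensor of preprocessed sensor data
x = torch.randn(8, 6, 1024)
feat = BasicPool(6, 16)(BasicConv(6, 6, 3)(x))
```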
Preferably, in step S204, the large-scale prediction model comprises a densely connected structure formed by sequentially connecting three attention-dense modules, two pooling layers and a fully connected layer.
Each attention-dense module comprises several dense layers and an attention layer, with dense connections between the layers. In each dense layer, several basic convolutions with kernels of different sizes are applied, their outputs are channel-concatenated and fed into a linear convolution, and the result forms a residual connection with the input tensor followed by ReLU activation to give the output of the dense layer. The attention module performs weight learning on the features of the target data from two dimensions, time and channel.
The structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
wherein: IncepResLayer_B denotes a type-B dense layer; X_i is the input of the dense layer; relu is the ReLU activation function; linerConv(·) is a linear convolution layer without activation; concat denotes concatenation along the channel dimension; BasicConv(·) is the basic convolution.
The pooling layer comprises a max-pooling layer and basic convolution layers with the same stride and multi-size convolution kernels. Its structure is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k1×1);
X_m2 = BasicConv(X_i, k1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k1×1);
wherein: Pool is the pooling layer; concat denotes concatenation along the channel dimension; Maxpool(·) is the max-pooling operation with kernel size k1×1; X_i is the input of the pooling layer; BasicConv(·) is the basic convolution; the stride of all convolution and pooling operations is strides = 4.
The working logic of the attention module is as follows:
1) Given the input sequence X = x_1, x_2, ..., x_T and the filter f = f_1, f_2, ..., f_K, a time-domain convolution yields the correlation sequence a = a_1, a_2, ..., a_T, and a Softmax function applied to a yields the final time-domain weight sequence Y = y_1, y_2, ..., y_T.
2) For the input feature F_i of shape w×1×c, a single-channel 1×1 convolution produces a one-dimensional sequence, from which the time-domain weight is obtained through the time-domain convolution (3×1 kernel) and the Softmax function.
3) The time-domain weight, after transposition, is multiplied with the input feature F_i to obtain a one-dimensional channel sequence; its channel number is reduced by the ratio c/r, layer normalization and ReLU activation are applied, and the channels are then restored to the original number at the same ratio to obtain the channel-domain weight, where c/r is the channel-domain dimension-reduction ratio.
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight and channel-domain weight, respectively, to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i to give the attention feature output F_o of shape w×1×c, i.e. the attention feature.
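As an illustration of steps 1) to 4), the following PyTorch sketch implements a time-domain plus channel-domain attention of this kind; the reduction ratio r, the 3×1 temporal kernel and the sigmoid used to form the channel weight are assumptions made for the example.

```python
import torch
import torch.nn as nn

class TimeChannelAttention(nn.Module):
    """Learns a time-domain weight and a channel-domain weight for a (batch, c, w) feature."""
    def __init__(self, channels, r=4, temporal_k=3):
        super().__init__()
        self.squeeze = nn.Conv1d(channels, 1, kernel_size=1)                  # single-channel 1x1 conv
        self.temporal = nn.Conv1d(1, 1, temporal_k, padding=temporal_k // 2)  # time-domain conv
        reduced = max(channels // r, 1)
        self.down = nn.Linear(channels, reduced)   # channel reduction by ratio c/r
        self.norm = nn.LayerNorm(reduced)
        self.up = nn.Linear(reduced, channels)     # restore the original channel count

    def forward(self, x):                          # x: (batch, c, w)
        t = self.temporal(self.squeeze(x))         # (batch, 1, w)
        t_w = torch.softmax(t, dim=-1)             # time-domain weight over the w positions
        # weighted sum over time gives one value per channel (the channel sequence)
        ch = torch.bmm(x, t_w.transpose(1, 2)).squeeze(-1)                    # (batch, c)
        c_w = torch.sigmoid(self.up(torch.relu(self.norm(self.down(ch)))))    # channel-domain weight
        att = x * t_w * c_w.unsqueeze(-1)          # apply both weights to the feature
        return x + att                             # residual connection -> attention feature F_o
```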
Preferably, in step S203, the lightweight prediction model keeps the architecture of the large-scale prediction model but removes the dense connection structure and replaces ordinary convolution with dilated (hole) convolution, thereby realizing the lightweight design.
Preferably, in step S205, the MSE loss and the attention distillation loss are calculated from the tag data and from the attention features of the historical data, respectively, to obtain the corresponding training loss, and the parameters of the lightweight prediction model are then updated with this training loss;
wherein:
Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^{c_i}, f_t^{e_i});
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;
D_C(f_1, f_2) = 1 − <f_1, f_2> / (‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
wherein: Loss_all is the training loss; Loss_mse is the MSE loss; Loss_att is the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) is the cosine distance; <f_1, f_2> is the inner product of the two vectors; f_t^{c_i} and f_t^{e_i} are the time feature sequences of the attention features in the large-scale prediction model and in the lightweight prediction model, respectively, whose elements are compared position by position over the tensor time-domain length w; λ is the dynamic distillation loss coefficient; α is a number less than 1; λ_0 is the initial distillation loss coefficient; ep is the number of training rounds.
The time feature sequence is obtained by an attention mapping over the channel dimension: for the attention feature F_o of shape w×1×c, the feature vectors f^i (i = 1, ..., c) of the c channel domains are aggregated at every time step to give the time feature sequence f_t of length w. The time feature sequences f_t^{c_i} and f_t^{e_i} of the attention features in the large-scale prediction model and the lightweight prediction model are obtained by this mapping.
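A minimal sketch of this combined loss is given below, assuming the time feature sequence is formed by averaging channel magnitudes and using illustrative default values for λ_0 and α; it is not the patent's reference implementation.

```python
import torch.nn.functional as F

def attention_time_sequence(att_feat):
    """Map an attention feature of shape (batch, c, w) to a time sequence of length w
    by aggregating channel magnitudes (the aggregation rule is an assumption)."""
    return att_feat.abs().mean(dim=1)                        # (batch, w)

def distillation_loss(pred, target, cloud_feats, edge_feats, ep, lam0=0.5, alpha=0.9):
    """Loss_all = Loss_mse + lambda * sum of cosine distances between cloud/edge time sequences."""
    loss_mse = F.mse_loss(pred, target)
    loss_att = 0.0
    for fc, fe in zip(cloud_feats, edge_feats):              # one pair per attention module
        tc = attention_time_sequence(fc)
        te = attention_time_sequence(fe)
        loss_att = loss_att + (1 - F.cosine_similarity(tc, te, dim=-1)).mean()
    lam = (1 - alpha ** ep) * lam0                           # dynamic distillation coefficient
    return loss_mse + lam * loss_att
```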
Preferably, in step S2, the feature information to be measured is uploaded to the cloud platform and input to the large-scale prediction model for model accuracy evaluation, and when the loss of the large-scale prediction model exceeds the expected threshold, incremental training is performed on the large-scale prediction model.
Preferably, the incremental training specifically comprises the steps of:
S211: the model to be trained is initialized with the latest parameters in the historical model library; the new training data are input into the model to be trained and into each historical model, and the Euclidean distance between the feature mapping of each historical model and that of the model to be trained is calculated;
the feature distance D(F, F_i) combines a time-sequence distance and a channel-sequence distance:
D_t = √(Σ_j (x_j − x_j^i)²);  D_c = √(Σ_k (y_k − y_k^i)²);
wherein: D(F, F_i) is the Euclidean distance between the model features; F and F_i are the feature tensors of the model to be trained and of the i-th historical model, respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence, respectively; x_j and x_j^i are the time-sequence elements of the model to be trained and of the historical model, respectively; y_k and y_k^i are the channel-sequence elements of the model to be trained and of the historical model, respectively;
S212: a distance loss is obtained for each historical model from its feature distance D(F, F_i);
S213: a corresponding forgetting factor η is set according to the importance of each historical model:
η = η_0·e^(−k·i);
wherein: η_0 is the initial forgetting factor; k is the forgetting coefficient; i is the historical model number; the feature-distance loss weight of a historical model therefore decays exponentially as the model is updated;
S214: an incremental loss function is constructed from the Euclidean distances between the feature mappings of the historical models and of the model to be trained, and incremental training is performed with this incremental loss as the objective:
L_incre = L_mse + Σ_{i=1..N} η_i·D(F, F_i);
wherein: L_incre is the incremental loss; L_mse is the mean square error with respect to the tag data; N is the number of historical models.
The cutter abrasion state prediction method based on cloud edge cooperative training has the following beneficial effects:
In the invention, the sensor data of the tool to be tested are acquired by the equipment layer, the data features are extracted by the feature extraction model of the edge platform, and the lightweight prediction model outputs the tool wear state prediction result. Because the edge platform is located close to the actual production environment, the high response latency of the traditional cloud-centric architecture is effectively avoided, the flexibility and scalability of the whole system are improved, and the stability and real-time performance of tool wear state prediction are improved. In addition, the lightweight prediction model deployed on the edge platform retains the fitting capability of the model while having fewer parameters and faster inference, which further improves the real-time performance of tool wear state prediction.
Meanwhile, the large-scale prediction model deployed on the cloud platform distills attention features in every training round to assist the training of the lightweight prediction model, forming an intelligent framework of cloud-edge cooperative training and edge real-time inference. The model knowledge of the cloud platform is thus fully used to improve the accuracy of the edge lightweight prediction model, avoiding the limited prediction accuracy that would result from its simple structure and small number of parameters, so the accuracy of tool wear state prediction is further improved through cloud edge cooperative training.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings, in which:
FIG. 1 is a logic block diagram of a tool wear state prediction system based on cloud edge co-training;
FIG. 2 is a logic block diagram of a tool wear state prediction method based on cloud edge co-training;
FIG. 3 is a frame diagram of a feature extraction model;
FIG. 4 is a frame diagram of a large-scale predictive model;
FIG. 5 is a schematic diagram of a time-domain attention mechanism module of an attention module;
FIG. 6 is a frame diagram of a lightweight predictive model;
FIG. 7 is a logic block diagram of a cloud edge co-training method;
FIG. 8 is a graph of feature mapping and distillation loss based on cosine distance;
Fig. 9 is a logic diagram of incremental training.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. In the description of the present invention, it should be noted that, directions or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., are directions or positional relationships based on those shown in the drawings, or are directions or positional relationships conventionally put in use of the inventive product, are merely for convenience of describing the present invention and simplifying the description, and are not indicative or implying that the apparatus or element to be referred to must have a specific direction, be constructed and operated in a specific direction, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance. Furthermore, the terms "horizontal," "vertical," and the like do not denote a requirement that the component be absolutely horizontal or overhang, but rather may be slightly inclined. As "horizontal" merely means that its direction is more horizontal than "vertical", and does not mean that the structure must be perfectly horizontal, but may be slightly inclined. In the description of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
The following is a further detailed description of the embodiments:
Embodiment one:
the embodiment discloses a cutter wear state prediction system based on cloud edge cooperative training.
As shown in fig. 1, the tool wear state prediction system based on cloud edge co-training includes:
The equipment layer is used for acquiring sensor data of the tool to be tested;
In this embodiment, a number of data acquisition cards and controllers are installed on the numerically controlled machine tool carrying the tool to be tested; an acceleration sensor and a current sensor are installed on the machine tool spindle and on the spindle motor, respectively, and connected to the data acquisition cards to collect the cutting vibration signal and the current signal of the tool to be tested, i.e. the sensor data.
The sensor data collected during cutting are uploaded to the edge platform through DDS for subsequent processing; meanwhile, the controllers installed at the equipment layer can receive the feedback control information (including control instructions and early-warning signals) returned by the edge platform and control the machine tool and related equipment to take corresponding measures.
The edge platform is deployed with a trained feature extraction model and a lightweight prediction model; the feature extraction model extracts the data features of the (preprocessed) sensor data as the feature information to be tested, and the lightweight prediction model takes this feature information as input and outputs the corresponding tool wear state prediction result;
In this embodiment, the edge platform is further configured with a data preprocessing module and an edge model library for storing a history model and data thereof; the data preprocessing module is used for performing data cleaning and Z-score normalization processing on the sensor data, and further storing the preprocessed sensor data into the edge database for subsequent input into the feature extraction model.
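A minimal sketch of this preprocessing step, assuming simple NaN removal and outlier clipping as the cleaning rules, is:

```python
import numpy as np

def preprocess(signal):
    """Data cleaning (drop NaNs, clip gross outliers) followed by Z-score normalization.
    The cleaning rules shown here are illustrative assumptions."""
    x = np.asarray(signal, dtype=float)
    x = x[~np.isnan(x)]                              # drop missing samples
    mu, sigma = x.mean(), x.std()
    x = np.clip(x, mu - 5 * sigma, mu + 5 * sigma)   # clip gross outliers
    return (x - x.mean()) / (x.std() + 1e-8)         # Z-score normalization
```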
The edge platform can also generate corresponding feedback control information (comprising control instructions and early warning signals) based on the cutter abrasion state prediction result and send the feedback control information to the equipment layer. The feedback control information is generated by the cutter wear state prediction result by adopting the existing mature means, and the specific contents are not repeated here.
The cloud platform is deployed with a large-scale prediction model based on an attention mechanism; the model learns attention characteristics and distills the attention characteristics into a lightweight predictive model of an edge platform to realize cooperative training of a cloud edge model.
In this embodiment, the cloud platform is further configured with a model evaluation module for evaluating the prediction accuracy of the large-scale prediction model, and an incremental training module for performing incremental training on the large-scale prediction model.
It should be noted that the lightweight prediction model in the invention is not absolutely "lightweight"; it is "lightweight" relative to the large-scale prediction model and other existing deep network models, with fewer parameters and faster inference than those models.
In the invention, the sensor data of the tool to be tested are acquired by the equipment layer, the data features are extracted by the feature extraction model of the edge platform, and the lightweight prediction model outputs the tool wear state prediction result. Because the edge platform is located close to the actual production environment, the high response latency of the traditional cloud-centric architecture is effectively avoided, the flexibility and scalability of the whole system are improved, and the stability and real-time performance of tool wear state prediction are improved. In addition, the lightweight prediction model deployed on the edge platform achieves fewer parameters and faster inference while retaining the fitting capability of the model, which reduces the training difficulty, improves prediction efficiency, and further improves the real-time performance of tool wear state prediction.
Meanwhile, the large-scale prediction model deployed on the cloud platform distills attention features in every training round to assist the training of the lightweight prediction model, forming an intelligent framework of cloud-edge cooperative training and edge real-time inference. The model knowledge of the cloud platform is thus fully used to improve the accuracy of the edge lightweight prediction model, avoiding the limited prediction accuracy that would result from its simple structure and small number of parameters, so the accuracy of tool wear state prediction is further improved through cloud edge cooperative training.
Specific:
As shown in fig. 3, the feature extraction model comprises a two-part convolution operation: the first part adds the convolution results of a 1×1 convolution kernel and a 3×1 convolution kernel and applies batch normalization; the second part channel-concatenates the basic-convolution results of kernels of different sizes, a basic convolution consisting of convolution, batch normalization and a ReLU activation function.
The feature extraction model takes the sensor data as the input of the first partial convolution operation; the batch-normalized output of the first part is pooled and used as the input of the second partial convolution operation; finally, the channel-concatenated result of the second partial convolution operation is pooled and the corresponding feature tensor, i.e. the data features, is output.
The basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
wherein: X is the input data; W_k is a convolution kernel of size k1×k2; * denotes the convolution operation; b_k is the bias; relu is the ReLU activation function; the batch normalization operation bn is inserted between the convolution and the ReLU activation.
The ReLU activation function is expressed as relu(x) = max(0, x), where x is the input data.
Batch normalization is realized by learning the mean μ_β and variance σ_β² of a small batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
x̂_i = (x_i − μ_β) / √(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: x_i is an input data sample; m is the current batch size; ε is a small value greater than zero; γ and β are the trainable scale and bias parameters, respectively; x̂_i is the normalized data; y_i is the output after the learned scaling and shifting.
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k1, s), conv(x, k2, s));
wherein: pool is the pooling operation; conv is the convolution operation; k1 and k2 are the sizes of the pooling kernel and the convolution kernel, respectively; s is the stride; concat denotes concatenation of the feature vectors along the channel dimension.
The shape of the output feature tensor F_f is w_f × 1 × c_f, where w_f is the time-domain size and c_f is the channel-domain size.
The feature extraction model has sparse interaction and parameter sharing capabilities, so data features can be effectively extracted from the sensor data for model training and real-time prediction, the amount of computation is effectively reduced, overfitting is suppressed, and the real-time performance of tool wear state prediction is further improved.
As shown in fig. 4, the large-scale prediction model comprises a densely connected structure formed by sequentially connecting three attention-dense modules, two pooling layers and a fully connected layer.
Each attention-dense module comprises several dense layers and an attention layer, with dense connections between the layers. In each dense layer, several basic convolutions with kernels of different sizes are applied, their outputs are channel-concatenated and fed into a linear convolution, and the result forms a residual connection with the input tensor followed by ReLU activation to give the output of the dense layer. The attention module performs weight learning on the features of the target data from two dimensions, time and channel.
The structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
wherein: IncepResLayer_B denotes a type-B dense layer; X_i is the input of the dense layer; relu is the ReLU activation function; linerConv(·) is a linear convolution layer without activation; concat denotes concatenation along the channel dimension; BasicConv(·) is the basic convolution.
The pooling layer comprises a max-pooling layer and basic convolution layers with the same stride and multi-size convolution kernels. Its structure is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k1×1);
X_m2 = BasicConv(X_i, k1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k1×1);
wherein: Pool is the pooling layer; concat denotes concatenation along the channel dimension; Maxpool(·) is the max-pooling operation with kernel size k1×1; X_i is the input of the pooling layer; BasicConv(·) is the basic convolution; the stride of all convolution and pooling operations is strides = 4.
As shown in fig. 5, the working logic of the attention module is as follows:
1) Given the input sequence X = x_1, x_2, ..., x_T and the filter f = f_1, f_2, ..., f_K, a time-domain convolution yields the correlation sequence a = a_1, a_2, ..., a_T, and a Softmax function applied to a yields the final time-domain weight sequence Y = y_1, y_2, ..., y_T.
2) For the input feature F_i of shape w×1×c, a single-channel 1×1 convolution produces a one-dimensional sequence, from which the time-domain weight is obtained through the time-domain convolution (3×1 kernel) and the Softmax function.
3) The time-domain weight, after transposition, is multiplied with the input feature F_i to obtain a one-dimensional channel sequence; its channel number is reduced by the ratio c/r, layer normalization and ReLU activation are applied, and the channels are then restored to the original number at the same ratio to obtain the channel-domain weight, where c/r is the channel-domain dimension-reduction ratio.
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight and channel-domain weight, respectively, to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i to give the attention feature output F_o of shape w×1×c, i.e. the attention feature.
In the implementation process, as shown in fig. 6, the lightweight prediction model keeps the architecture of the large-scale prediction model but removes the dense connection structure and replaces ordinary convolution with dilated (hole) convolution, thereby realizing the lightweight design.
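A brief sketch of the substitution follows: a dilated (hole) 1-D convolution enlarges the receptive field without adding weights, which is what keeps the lightweight model cheap; the dilation rate shown is an assumption.

```python
import torch.nn as nn

def dilated_conv(in_ch, out_ch, k=3, dilation=2):
    """A dilated (atrous) 1-D convolution: receptive field (k - 1) * dilation + 1
    with the same number of weights as an ordinary k-tap convolution."""
    pad = dilation * (k - 1) // 2          # keep the time length unchanged
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, k, padding=pad, dilation=dilation),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(inplace=True),
    )
```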
In the implementation process, since the learning capacity of the lightweight prediction model is limited, in order to enable the model to mine more data information at the edge end and have stronger generalization capacity, knowledge learned by the large-scale prediction model needs to be transferred to the edge side.
As shown in fig. 7, the edge model library, the feature extraction model and the lightweight prediction model are deployed on the edge platform, and the large-scale prediction model is deployed on the cloud platform. Each training round updates the parameters of the three models and transmits data between cloud and edge. First, the cloud model is ignored: the feature extraction model extracts data features, which are fed into the lightweight prediction model, and the parameters of both are updated according to the MSE loss. The data are then forward-propagated through the feature extraction model again, and the features and labels are uploaded to the cloud platform, where the large-scale prediction model is updated for attention distillation; only the attention features of the large-scale prediction model are transmitted back to the edge platform. Finally, the lightweight prediction model updates its parameters according to the MSE loss plus the weighted attention-feature distillation loss, completing one training round.
The algorithm of cloud edge system training is as follows:
cloud edge cooperative training is realized through the following steps:
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
In this embodiment, the parameters of the feature extraction model and the lightweight prediction model are optimized by the existing MSE loss function.
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
In this embodiment, the parameters of the large-scale predictive model are optimized by the existing MSE loss function.
S205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
in this embodiment, the parameters of the lightweight predictive model are optimized by the MSE loss function+the attention deficit loss function.
S206: steps S202 to S205 are repeated until the feature extraction model and the lightweight prediction model reach expectations.
Based on the attention mechanism, residual networks and related deep learning methods, the invention provides a deep multi-convolution-kernel attention residual network model (the large-scale prediction model) and a lightweight dynamic dilated convolution model (the lightweight prediction model), and establishes an intelligent framework of cloud-edge collaborative training and edge-side real-time inference, so that the data characteristics of the time-series signals of different sensors in different domains can be fully explored and fused in time and space.
In the specific implementation process, as shown in fig. 8, the cloud model and the lightweight model each obtain a corresponding attention feature F_o of shape w×1×c, and the attention mapping operation is applied to obtain the corresponding time feature sequence f_t: the feature vectors f^i (i = 1, ..., c) of the c channel domains of the attention feature are aggregated at every time step, giving a sequence of length w. The time feature sequences f_t^{c_i} and f_t^{e_i} of the attention features in the large-scale prediction model and the lightweight prediction model are obtained by this mapping.
The cosine distance is used to measure the similarity between each feature time sequence of the edge model and the corresponding sequence of the cloud model.
The MSE loss and the attention distillation loss are calculated from the tag data and from the attention features of the historical data, respectively, to obtain the corresponding training loss, and the parameters of the lightweight prediction model are then updated with this training loss;
wherein:
Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^{c_i}, f_t^{e_i});
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;
D_C(f_1, f_2) = 1 − <f_1, f_2> / (‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
wherein: Loss_all is the training loss; Loss_mse is the MSE loss; Loss_att is the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) is the cosine distance; <f_1, f_2> is the inner product of the two vectors; f_t^{c_i} and f_t^{e_i} are the time feature sequences of the attention features in the large-scale prediction model and in the lightweight prediction model, respectively, whose elements are compared position by position over the tensor time-domain length w; λ is the dynamic distillation loss coefficient; α is a number less than 1; λ_0 is the initial distillation loss coefficient; ep is the number of training rounds.
According to the method, the MSE loss and the attention distillation loss are calculated respectively based on the tag data and the attention characteristics of the historical data to obtain the corresponding training loss, and then the parameters of the lightweight prediction model are updated through the training loss, so that the cloud edge cooperative training based on the attention characteristics is realized, the accuracy of the edge lightweight prediction model can be improved by fully utilizing the model knowledge of the cloud platform, and the problem of limited model prediction accuracy caused by the simple structure and the small parameter quantity of the edge lightweight prediction model is avoided.
In the implementation process, in an actual production environment, machine tool wear data are generated continuously, and model performance degrades over time with equipment aging and changing machining conditions. Old data often cannot be used for retraining because of storage limits or privacy protection; relying only on the knowledge in new data easily causes catastrophic forgetting of the model.
Therefore, the invention uploads the feature information to be tested to the cloud platform and inputs it into the large-scale prediction model for accuracy evaluation; when the loss of the large-scale prediction model exceeds the expected threshold, incremental training is performed on it. For this purpose, an incremental training method based on a historical model library and an attention forgetting factor is proposed.
The algorithm of this incremental training method is as follows:
As shown in fig. 9, the incremental training specifically includes the steps of:
S211: the model to be trained is initialized with the latest parameters in the historical model library; the new training data are input into the model to be trained and into each historical model, and the Euclidean distance between the feature mapping of each historical model and that of the model to be trained is calculated;
In this embodiment, the history model stores parameters of the large-scale prediction model of each version in the history training process.
The feature distance D(F, F_i) between the model to be trained and a historical model combines a time-sequence distance and a channel-sequence distance:
D_t = √(Σ_j (x_j − x_j^i)²);  D_c = √(Σ_k (y_k − y_k^i)²);
wherein: D(F, F_i) is the Euclidean distance between the model features; F and F_i are the feature tensors of the model to be trained and of the i-th historical model, respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence, respectively; x_j and x_j^i are the time-sequence elements of the model to be trained and of the historical model, respectively; y_k and y_k^i are the channel-sequence elements of the model to be trained and of the historical model, respectively;
S212: a distance loss is obtained for each historical model from its feature distance D(F, F_i);
S213: a corresponding forgetting factor η is set according to the importance of each historical model:
η = η_0·e^(−k·i);
wherein: η_0 is the initial forgetting factor; k is the forgetting coefficient; i is the historical model number; the feature-distance loss weight of a historical model therefore decays exponentially as the model is updated;
S214: an incremental loss function is constructed from the Euclidean distances between the feature mappings of the historical models and of the model to be trained, and incremental training is performed with this incremental loss as the objective:
L_incre = L_mse + Σ_{i=1..N} η_i·D(F, F_i);
wherein: L_incre is the incremental loss; L_mse is the mean square error with respect to the tag data; N is the number of historical models.
In the actual production environment, the tool wear data of the machine tool are continuously generated, and the performance of the model is reduced along with the time, the equipment aging, the processing condition change and the like.
Therefore, an incremental training algorithm based on the attention forgetting factor is proposed: the large-scale prediction model is incrementally trained by combining the historical models and the parameters in the edge model library with forgetting factors. This avoids large-scale retraining and catastrophic forgetting of historical data, improves the lifelong learning capability of the large-scale prediction model, and continuously guarantees the tool wear prediction accuracy of the lightweight prediction model during long-term operation, providing a practical solution for tool wear state detection on numerically controlled machine tools.
Embodiment two:
The embodiment also discloses a cutter abrasion state prediction method based on cloud edge cooperative training, which is implemented based on the cutter abrasion state prediction system in the first embodiment.
As shown in fig. 2, the method for predicting the tool wear state based on cloud edge cooperative training specifically includes the following steps:
s1: acquiring sensor data of a tool to be tested through an equipment layer, and uploading the sensor data to an edge platform;
In this embodiment, a number of data acquisition cards and controllers are installed on the numerically controlled machine tool carrying the tool to be tested; an acceleration sensor and a current sensor are installed on the machine tool spindle and on the spindle motor, respectively, and connected to the data acquisition cards to collect the cutting vibration signal and the current signal of the tool to be tested, i.e. the sensor data. The sensor data collected during cutting are uploaded to the edge platform through DDS for subsequent processing.
S2: the edge platform receives sensor data and inputs the sensor data into a trained feature extraction model, and data features are extracted to serve as feature information to be detected; then inputting the characteristic information to be detected into a lightweight prediction model subjected to cloud edge cooperative training, and outputting a corresponding cutter abrasion state prediction result;
In this embodiment, the edge platform is further configured with a data preprocessing module and an edge model library for storing a history model and data thereof; the data preprocessing module is used for performing data cleaning and Z-score normalization processing on the sensor data, and further storing the preprocessed sensor data into the edge database for subsequent input into the feature extraction model.
S3: the edge platform generates feedback control information based on the cutter abrasion state prediction result and transmits the feedback control information to the equipment layer;
In this embodiment, the feedback control information (including the control command and the early warning signal) is generated by using the cutter wear state prediction result by using the existing mature means, and details thereof are not described herein.
S4: the equipment layer controls a machine tool of the to-be-tested cutter to execute corresponding actions based on the feedback control information.
In this embodiment, the controller installed in the equipment layer may receive feedback control information (including a control instruction and an early warning signal) returned by the edge layer, and control related equipment such as a machine tool to take corresponding measures.
In the invention, the sensor data of the tool to be tested are acquired by the equipment layer, the data features are extracted by the feature extraction model of the edge platform, and the lightweight prediction model outputs the tool wear state prediction result. Because the edge platform is located close to the actual production environment, the high response latency of the traditional cloud-centric architecture is effectively avoided, the flexibility and scalability of the whole system are improved, and the stability and real-time performance of tool wear state prediction are improved. In addition, the lightweight prediction model deployed on the edge platform achieves fewer parameters and faster inference while retaining the fitting capability of the model, which reduces the training difficulty, improves prediction efficiency, and further improves the real-time performance of tool wear state prediction.
Meanwhile, the large-scale prediction model deployed on the cloud platform distills attention features in every training round to assist the training of the lightweight prediction model, forming an intelligent framework of cloud-edge cooperative training and edge real-time inference. The model knowledge of the cloud platform is thus fully used to improve the accuracy of the edge lightweight prediction model, avoiding the limited prediction accuracy that would result from its simple structure and small number of parameters, so the accuracy of tool wear state prediction is further improved through cloud edge cooperative training.
In the implementation process, compared with recurrent neural networks, convolutional neural networks can perform parallel computation and therefore train and infer faster. In addition, thanks to sparse interaction and parameter sharing, the amount of computation is effectively reduced and overfitting is suppressed.
As shown in fig. 3, the feature extraction model includes a two-part convolution operation, the first part convolution operation including adding the convolution results of the 1×1 convolution kernel and the 3×1 convolution kernel and performing batch regularization; the second partial convolution operation comprises channel splicing of basic convolution results of kernels with different sizes, wherein the basic convolution comprises convolution, batch regularization and ReLU activation functions;
the feature extraction model takes sensor data as input of a first partial convolution operation; carrying out pooling treatment on the output result after batch regularization of the first part of convolution operation, and taking the result as the input of the second part of convolution operation; finally, carrying out pooling treatment on the results obtained after the second partial convolution operation and channel splicing, and outputting corresponding characteristic tensors, namely data characteristics; wherein the basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
Wherein: X represents the input data; W_k represents a convolution kernel of size k_1 × k_2; * represents the convolution operation; b_k denotes the bias; relu denotes the ReLU activation function; a batch regularization operation bn is added between the convolution and the ReLU activation function;
the ReLU activation function is expressed as:
relu(x) = max(0, x);
wherein: x represents the input data;
batch regularization is achieved by learning the mean μ_β and the variance σ_β² of each mini-batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
wherein: x_i denotes an input data sample; m represents the current batch size;
x̂_i = (x_i − μ_β)/√(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: ε represents a small value greater than zero; γ and β represent the trainable scale and bias parameters respectively; x̂_i represents the normalized data; y_i is the output after the learned scale transformation and offset;
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k_1, s), conv(x, k_2, s));
Wherein: pool represents the pooling operation; conv represents the convolution operation; k_1 and k_2 represent the sizes of the pooling and convolution kernels respectively; s represents the stride; concat represents channel-dimension concatenation of the feature vectors;
The shape of the output feature tensor F_f is w_f × 1 × c_f;
Wherein: w_f represents the time-domain size; c_f denotes the channel-domain size.
The feature extraction model has sparse interaction and parameter sharing capabilities, so data features can be effectively extracted from the sensor data for model training and real-time prediction, the amount of computation is effectively reduced, overfitting is suppressed, and the real-time performance of tool wear state prediction is further improved.
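As an illustration, a minimal sketch of the basic convolution and basic pooling blocks described above is given below, written with PyTorch (the patent does not name a deep learning framework, so the framework, the 1-D layout of the sensor tensor and all channel and kernel sizes are assumptions):

```python
import torch
import torch.nn as nn

class BasicConv(nn.Module):
    """Basic convolution: relu(bn(W_k * X + b_k))."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_ch)   # batch regularization between conv and ReLU
        self.relu = nn.ReLU()

    def forward(self, x):                  # x: (batch, channels, time)
        return self.relu(self.bn(self.conv(x)))

class BasicPool(nn.Module):
    """Basic pooling: concat(pool(x, k1, s), conv(x, k2, s)) along the channel axis."""
    def __init__(self, in_ch, out_ch, k1=3, k2=3, stride=2):
        super().__init__()
        self.pool = nn.MaxPool1d(k1, stride=stride, padding=k1 // 2)
        self.conv = BasicConv(in_ch, out_ch, k2, stride=stride)

    def forward(self, x):
        return torch.cat([self.pool(x), self.conv(x)], dim=1)
```

In this layout a raw sensor window of shape (batch, c, w) would pass through the two convolution parts of fig. 3, composed from these blocks, to yield the feature tensor F_f of shape w_f × 1 × c_f described above.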
In the implementation process, after the data features are extracted, a large-scale tool wear value prediction model needs to be designed and deployed on the cloud platform for knowledge extraction. Introducing an attention mechanism facilitates the migration of feature-domain knowledge between models. Compared with image data, the signal is one-dimensional and sequential in the time domain, and the importance of different sensor data and of different channel features varies.
As shown in fig. 4, the large-scale prediction model includes a dense connection structure formed by sequentially connecting three attention dense modules, two pooling layers and a full connection layer;
Each attention dense module comprises a plurality of corresponding dense layers and an attention layer, with dense connections between the layers; in each dense layer, several basic convolutions with convolution kernels of different sizes are applied, the resulting tensors are channel-spliced and fed into a linear convolution, a residual connection with the input tensor is formed, and ReLU activation is applied to obtain the output of the dense layer; the attention module performs weight learning for the features of the target data from the two dimensions of time and channel;
wherein the structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
Wherein: IncepResLayer_B represents a type-B dense layer; X_i represents the input of the dense layer; relu denotes the ReLU activation function; linerConv(·) represents a non-activated linear convolution layer; concat represents channel-dimension concatenation of the feature vectors; BasicConv(·) represents the basic convolution;
the pooling layer comprises a maximum pooling layer and basic convolution layers with the same stride and convolution kernels of multiple sizes;
the structure of the pooling layer is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k_1×1);
X_m2 = BasicConv(X_i, k_1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k_1×1);
Wherein: Pool represents the pooling layer; concat represents channel-dimension concatenation of the feature vectors; Maxpool(·) represents the maximum pooling operation with a kernel size of k_1×1; X_i represents the input of the pooling layer; BasicConv(·) represents the basic convolution; the strides of all convolution operations and pooling operations are strides = 4;
as shown in fig. 5, the working logic of the attention module is as follows:
1) A given input sequence X = x_1, x_2, ..., x_T and a filter F = f_1, f_2, ..., f_K are convolved in the time domain to obtain the correlation sequence A = a_1, a_2, ..., a_T; the final time-domain weight sequence Y = y_1, y_2, ..., y_T is then obtained through a Softmax function;
2) For the input feature F_i of shape w×1×c, a one-dimensional sequence x_t is obtained through a single-channel 1×1 convolution, and the time-domain weight y_t is obtained through the time-domain convolution and the Softmax function;
Wherein: Softmax represents the Softmax function; TemporalConv denotes the time-domain convolution; conv represents the convolution; 1×1 and 3×1 represent the convolution kernel shapes of the convolution and the time-domain convolution respectively;
3) The time-domain weight y_t is transposed and multiplied with the input feature F_i (w×1×c) to obtain a one-dimensional channel sequence x_c; the channel number of x_c is reduced by the ratio c/r while layer normalization and ReLU activation are applied, and the original channel number is then restored at the same ratio to obtain the channel-domain weight y_c;
Wherein: conv represents the convolution; relu represents the ReLU activation function; LayerNorm denotes layer normalization; y_t represents the time-domain weight; the superscript T denotes the transpose operation; c/r represents the channel-domain dimension reduction ratio;
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight y_t and channel-domain weight y_c respectively to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i (w×1×c) to obtain the attention feature output F_o (w×1×c), i.e. the attention feature.
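A hedged sketch of this time-domain plus channel-domain attention module follows (PyTorch assumed; the kernel sizes, the reduction ratio r and the absence of an extra gating activation are read from the textual description above and may differ from the original implementation):

```python
import torch
import torch.nn as nn

class TemporalChannelAttention(nn.Module):
    """Learns a time-domain weight and a channel-domain weight for a feature
    map of shape (batch, c, w) and adds a residual connection."""
    def __init__(self, channels, r=4):
        super().__init__()
        self.squeeze = nn.Conv1d(channels, 1, kernel_size=1)            # single-channel 1x1 convolution
        self.temporal_conv = nn.Conv1d(1, 1, kernel_size=3, padding=1)  # 3x1 time-domain convolution
        self.softmax = nn.Softmax(dim=-1)
        reduced = max(channels // r, 1)
        self.reduce = nn.Conv1d(channels, reduced, kernel_size=1)       # c -> c/r
        self.norm = nn.LayerNorm(reduced)
        self.expand = nn.Conv1d(reduced, channels, kernel_size=1)       # c/r -> c

    def forward(self, x):                                               # x: (batch, c, w)
        # steps 1)-2): one-dimensional sequence, time-domain convolution, Softmax
        y_t = self.softmax(self.temporal_conv(self.squeeze(x)))         # time weights (batch, 1, w)
        # step 3): transpose and multiply to get a channel descriptor, then squeeze/expand channels
        x_c = torch.matmul(x, y_t.transpose(1, 2))                      # (batch, c, 1)
        z = self.reduce(x_c).squeeze(-1)                                # (batch, c/r)
        z = torch.relu(self.norm(z)).unsqueeze(-1)                      # layer normalization + ReLU
        y_c = self.expand(z)                                            # channel weights (batch, c, 1)
        # step 4): attention mapping plus residual connection with the input
        return x * y_t * y_c + x
```

Here the channel weight is applied without a final gating activation, which is as far as the description above goes; a sigmoid gate could be added without changing the overall structure.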
In the implementation process, as shown in fig. 6, the lightweight prediction model is based on the architecture of the large-scale prediction model but removes the dense connection structure and replaces the ordinary convolution with dilated (hole) convolution, so as to realize the lightweight design.
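As a rough illustration of this substitution, the ordinary k×1 basic convolution can be swapped for a dilated convolution, which preserves the receptive field with fewer stacked layers and parameters (PyTorch assumed; the kernel size and dilation rate below are illustrative only):

```python
import torch.nn as nn

def dilated_basic_conv(in_ch, out_ch, k=3, dilation=2, stride=1):
    """Dilated conv -> batch regularization -> ReLU; the effective receptive field
    is dilation * (k - 1) + 1 while only k weights per channel are learned."""
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, k, stride=stride, padding=pad, dilation=dilation),
        nn.BatchNorm1d(out_ch),
        nn.ReLU(),
    )
```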
In the implementation process, since the learning capacity of the lightweight prediction model is limited, the knowledge learned by the large-scale prediction model needs to be transferred to the edge side so that the edge model can mine more data information and obtain a stronger generalization capability.
As shown in fig. 7, the edge model library, the feature extraction model and the lightweight prediction model are deployed on the edge side, and the large-scale prediction model is deployed on the cloud platform. Each training round performs parameter updating of the three models and cloud edge data transmission. First, ignoring the cloud model, the feature extraction model extracts the data features, which are passed to the lightweight prediction model, and the parameters of both models are updated according to the MSE loss. The data are then forwarded through the feature extraction model again, the features and labels are uploaded to the cloud platform, the parameters of the large-scale prediction model are updated and its attention features are distilled, and only these attention features are transmitted back to the edge platform. Finally, the lightweight prediction model finishes the round by updating its parameters according to the MSE loss function plus the weighted attention feature distillation loss.
The cloud edge cooperative training algorithm is realized through the following steps (a code sketch of one training round is given after the steps):
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
In this embodiment, the parameters of the feature extraction model and the lightweight prediction model are optimized by the existing MSE loss function.
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
In this embodiment, the parameters of the large-scale predictive model are optimized by the existing MSE loss function.
S205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
in this embodiment, the parameters of the lightweight predictive model are optimized by the MSE loss function+the attention deficit loss function.
S206: steps S202 to S205 are repeated until the feature extraction model and the lightweight prediction model meet the expected accuracy.
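The per-round flow of steps S202 to S205 can be summarized by the following sketch (PyTorch assumed; feature_model, edge_model and cloud_model stand for the feature extraction, lightweight and large-scale networks, both prediction models are assumed to return a prediction together with their attention feature, cloud upload and download are abstracted to ordinary tensor hand-over, and attention_distill_loss is the distillation loss sketched further below):

```python
import torch
import torch.nn.functional as F

def train_one_round(feature_model, edge_model, cloud_model,
                    loader, opt_edge, opt_cloud, lam):
    # opt_edge is assumed to optimise the parameters of feature_model and edge_model,
    # opt_cloud those of cloud_model; lam is the dynamic distillation loss coefficient.
    for x, y in loader:                                  # sensor windows and wear labels
        # S202/S203: extract features, predict on the edge, update with plain MSE
        pred_e, _ = edge_model(feature_model(x))
        loss_e = F.mse_loss(pred_e, y)
        opt_edge.zero_grad(); loss_e.backward(); opt_edge.step()

        # S204: forward the data again, "upload" features and labels, update the
        # large-scale model, then bring its attention features back to the edge
        feats = feature_model(x)
        pred_c, att_cloud = cloud_model(feats.detach())  # no gradient flows back over the link
        loss_c = F.mse_loss(pred_c, y)
        opt_cloud.zero_grad(); loss_c.backward(); opt_cloud.step()

        # S205: edge update with MSE loss plus the weighted attention distillation loss
        pred_e, att_edge = edge_model(feats)
        loss_all = F.mse_loss(pred_e, y) + lam * attention_distill_loss(att_cloud.detach(), att_edge)
        opt_edge.zero_grad(); loss_all.backward(); opt_edge.step()
```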
The invention applies deep learning methods based on the attention mechanism, residual networks and the like, and provides a deep multi-convolution-kernel attention residual network model (the large-scale prediction model) and a lightweight dynamic dilated convolution model (the lightweight prediction model), and establishes an intelligent framework of cloud edge collaborative training and edge-side real-time inference. The data characteristics of the time-series signals of different sensors can thus be fully explored in different domains and fused in time and space. Compared with other existing models, the cloud edge collaborative framework offers better prediction accuracy and a faster inference speed, thereby further improving the accuracy and real-time performance of tool wear state prediction.
In the specific implementation process, as shown in fig. 8, the cloud model and the lightweight model each obtain a corresponding attention feature F_o (w×1×c), and the attention mapping operation is applied to obtain the corresponding time feature sequence f_t by aggregating the channel-domain feature vectors of F_o.
Wherein: f_t represents the time feature sequence; F_o (w×1×c) is the attention feature; f_i is the feature vector of each channel domain of the attention feature F_o; c represents the number of channels; the time feature sequences f_t^(c_i) and f_t^(e_i) of the attention features in the large-scale prediction model and the lightweight prediction model are obtained through this calculation.
The cosine distance is adopted to measure the similarity between each feature time sequence of the edge model and the corresponding sequence of the cloud model.
The MSE loss and the attention distillation loss are calculated based on the tag data and the attention features of the historical data respectively to obtain the corresponding training loss, and the parameters of the lightweight prediction model are then updated through the training loss (a code sketch follows the formulas below);
wherein Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^(c_i), f_t^(e_i));
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;  D_C(f_1, f_2) = 1 − ⟨f_1, f_2⟩/(‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
Wherein: Loss_all represents the training loss; Loss_mse represents the MSE loss; Loss_att represents the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) represents the cosine distance; ⟨f_1, f_2⟩ represents the inner product of the two vectors; f_t^(c_i) and f_t^(e_i) are the time feature sequences of the attention features in the large-scale prediction model and the lightweight prediction model respectively, and their element values are indexed over the tensor time-domain size w; λ represents the dynamic distillation loss coefficient; α represents a number less than 1; λ_0 represents the initial distillation loss coefficient; ep represents the number of training rounds.
In this way, the MSE loss and the attention distillation loss are calculated based on the tag data and the attention features of the historical data respectively to obtain the corresponding training loss, and the parameters of the lightweight prediction model are updated through this loss, realizing cloud edge cooperative training based on attention features. The model knowledge of the cloud platform is thus fully utilized to improve the accuracy of the edge lightweight prediction model, and the limitation on prediction accuracy caused by the simple structure and small parameter count of the edge model is avoided.
In the implementation process, in the actual production environment, the abrasion data of the machine tool cutter are continuously generated, and the performance of the model is reduced along with the time lapse, the equipment aging, the processing condition change and the like. Old data often cannot be used for model retraining due to storage limitations or privacy protection, but only relies on knowledge in the new data, which can easily cause catastrophic forgetting of the model.
Therefore, the invention uploads the feature information to be tested to the cloud platform and inputs it into the large-scale prediction model for model accuracy evaluation; when the loss of the large-scale prediction model exceeds an expected threshold, incremental training is performed on the large-scale prediction model, for which an incremental training method based on a historical model library and attention forgetting factors is provided.
As shown in fig. 9, the incremental training method based on the historical model library and the attention forgetting factor specifically includes the following steps (a code sketch is given after the steps):
s211: initializing a model to be trained by using the latest parameters in a historical model library, respectively inputting new training data into the model to be trained and the historical model, and respectively calculating Euclidean distances between each historical model and feature mapping of the model to be trained;
In this embodiment, the history model stores parameters of the large-scale prediction model of each version in the history training process.
D_t = √( Σ_j (x_j − x̃_j)² );  D_c = √( Σ_k (y_k − ỹ_k)² );
Wherein: D(F, F_i) represents the Euclidean distance between the model features, obtained from D_t and D_c; F and F_i are the feature tensors of the model to be trained and of the historical model respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence respectively; x_j and x̃_j are the time sequence elements of the model to be trained and of the historical model respectively; y_k and ỹ_k are the channel sequence elements of the model to be trained and of the historical model respectively;
S212: obtaining a distance loss based on each history model
S213: setting a corresponding forgetting factor eta based on the importance degree of the history model;
η = η_0·e^(−k·i);
wherein: η_0 represents the initial forgetting factor; k represents the forgetting coefficient; i represents the historical model number; the feature distance loss weight of a historical model therefore drops exponentially as the model is updated;
s214: constructing an incremental loss function based on Euclidean distance between the characteristic mapping of the historical model and the model to be trained, and performing incremental training by taking the incremental loss function as an index;
Wherein: l incre represents incremental loss; l mse represents the mean square error of the tag data; n represents the number of history models.
In the actual production environment, tool wear data are continuously generated, and the performance of the model degrades over time with equipment aging and changing machining conditions. The incremental training algorithm based on attention forgetting factors therefore incrementally trains the large-scale prediction model by combining the historical models and the parameters of the edge model library with the forgetting factors, so that large-scale retraining and catastrophic forgetting of historical data are avoided, the lifelong learning capability of the large-scale prediction model is improved, the tool wear state prediction accuracy of the lightweight prediction model is continuously guaranteed during long-term operation, and a practical solution is provided for tool wear state detection of numerically controlled machine tools.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the technical solution, and those skilled in the art should understand that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the present invention, and all such modifications and equivalents are included in the scope of the claims.

Claims (2)

1. A cutter wear state prediction system based on cloud edge cooperative training, characterized by comprising:
The equipment layer is used for acquiring sensor data of the tool to be tested;
the edge platform is provided with a trained feature extraction model and a lightweight prediction model; the feature extraction model is used for extracting data features in sensor data as feature information to be detected, and the lightweight prediction model is used for taking the feature information to be detected as input and outputting a corresponding cutter abrasion state prediction result;
The cloud platform is deployed with a large-scale prediction model based on an attention mechanism; the large-scale prediction model learns attention characteristics and distills the attention characteristics into a lightweight prediction model of an edge platform so as to realize cooperative training of a cloud edge model;
the tool wear state prediction method based on cloud edge cooperative training is implemented based on a tool wear state prediction system and specifically comprises the following steps of:
s1: acquiring sensor data of a tool to be tested through an equipment layer, and uploading the sensor data to an edge platform;
S2: the edge platform receives sensor data and inputs the sensor data into a trained feature extraction model, and data features are extracted to serve as feature information to be detected; then inputting the characteristic information to be detected into a lightweight prediction model subjected to cloud edge cooperative training, and outputting a corresponding cutter abrasion state prediction result;
in the step S2, cloud edge cooperative training is realized through the following steps:
s201: acquiring a training data set with a plurality of groups of training data and tag data thereof;
S202: inputting training data into a feature extraction model, and extracting data features as training feature information;
In step S202, the feature extraction model includes a two-part convolution operation, where the first part convolution operation includes adding convolution results of the 1×1 convolution kernel and the 3×1 convolution kernel and performing batch regularization; the second partial convolution operation comprises channel splicing of basic convolution results of kernels with different sizes, wherein the basic convolution comprises convolution, batch regularization and ReLU activation functions;
The feature extraction model takes sensor data as input of a first partial convolution operation; carrying out pooling treatment on the output result after batch regularization of the first part of convolution operation, and taking the result as the input of the second part of convolution operation; finally, carrying out pooling treatment on the results obtained after the second partial convolution operation and channel splicing, and outputting corresponding characteristic tensors, namely data characteristics;
wherein the basic convolution is expressed as:
BasicConv(X) = relu(bn(conv(X, k, 1))) = relu(bn(W_k * X + b_k));
Wherein: X represents the input data; W_k represents a convolution kernel of size k_1 × k_2; * represents the convolution operation; b_k denotes the bias; relu denotes the ReLU activation function; a batch regularization operation bn is added between the convolution and the ReLU activation function;
the ReLU activation function is expressed as:
relu(x) = max(0, x);
wherein: x represents the input data;
batch regularization is achieved by learning the mean μ_β and the variance σ_β² of each mini-batch of data:
μ_β = (1/m)·Σ_{i=1..m} x_i;  σ_β² = (1/m)·Σ_{i=1..m} (x_i − μ_β)²;
wherein: x_i denotes an input data sample; m represents the current batch size;
x̂_i = (x_i − μ_β)/√(σ_β² + ε);  y_i = γ·x̂_i + β;
wherein: ε represents a small value greater than zero; γ and β represent the trainable scale and bias parameters respectively; x̂_i represents the normalized data; y_i is the output after the learned scale transformation and offset;
The basic pooling layer of the pooling operation is expressed as:
BasicPool(x) = concat(pool(x, k_1, s), conv(x, k_2, s));
Wherein: pool represents the pooling operation; conv represents the convolution operation; k_1 and k_2 represent the sizes of the pooling and convolution kernels respectively; s represents the stride; concat represents channel-dimension concatenation of the feature vectors;
The shape of the output feature tensor F_f is w_f × 1 × c_f;
Wherein: w_f represents the time-domain size; c_f denotes the channel-domain size;
S203: inputting training feature information and corresponding tag data into a lightweight prediction model, and updating parameters of a feature extraction model and the lightweight prediction model;
In step S203, the lightweight prediction model is based on the architecture of the large-scale prediction model but removes the dense connection structure and replaces the ordinary convolution with dilated (hole) convolution, so as to realize the lightweight design;
S204: uploading training feature information and corresponding tag data to a cloud platform, inputting the training feature information and the corresponding tag data into a large-scale prediction model, updating parameters of the large-scale prediction model, and further distilling and outputting attention features of the training;
in step S204, the large-scale prediction model includes a dense connection structure formed by sequentially connecting three attention dense modules, two pooling layers and a full connection layer;
Each attention dense module comprises a plurality of corresponding dense layers and an attention layer, with dense connections between the layers; in each dense layer, several basic convolutions with convolution kernels of different sizes are applied, the resulting tensors are channel-spliced and fed into a linear convolution, a residual connection with the input tensor is formed, and ReLU activation is applied to obtain the output of the dense layer; the attention module performs weight learning for the features of the target data from the two dimensions of time and channel;
wherein the structure of the dense layer is expressed as:
IncepResLayer_B(X_i) = relu(X_i + linerConv(X_m, 5×1));
X_m = concat(BasicConv(X_i, 1×1), BasicConv(BasicConv(X_i, 1×1), 5×1));
Wherein: IncepResLayer_B represents a type-B dense layer; X_i represents the input of the dense layer; relu denotes the ReLU activation function; linerConv(·) represents a non-activated linear convolution layer; concat represents channel-dimension concatenation of the feature vectors; BasicConv(·) represents the basic convolution;
the pooling layer comprises a maximum pooling layer and basic convolution layers with the same stride and convolution kernels of multiple sizes;
the structure of the pooling layer is expressed as:
Pool(X_i) = concat(X_m1, X_m2, X_m3);
X_m1 = Maxpool(X_i, k_1×1);
X_m2 = BasicConv(X_i, k_1×1);
X_m3 = BasicConv(BasicConv(X_i, 1×1), k_1×1);
Wherein: Pool represents the pooling layer; concat represents channel-dimension concatenation of the feature vectors; Maxpool(·) represents the maximum pooling operation with a kernel size of k_1×1; X_i represents the input of the pooling layer; BasicConv(·) represents the basic convolution; the strides of all convolution operations and pooling operations are strides = 4;
The working logic of the attention module is as follows:
1) A given input sequence X = x_1, x_2, ..., x_T and a filter F = f_1, f_2, ..., f_K are convolved in the time domain to obtain the correlation sequence A = a_1, a_2, ..., a_T; the final time-domain weight sequence Y = y_1, y_2, ..., y_T is then obtained through a Softmax function;
2) For the input feature F_i of shape w×1×c, a one-dimensional sequence x_t is obtained through a single-channel 1×1 convolution, and the time-domain weight y_t is obtained through the time-domain convolution and the Softmax function;
Wherein: Softmax represents the Softmax function; TemporalConv denotes the time-domain convolution; conv represents the convolution; 1×1 and 3×1 represent the convolution kernel shapes of the convolution and the time-domain convolution respectively;
3) The time-domain weight y_t is transposed and multiplied with the input feature F_i (w×1×c) to obtain a one-dimensional channel sequence x_c; the channel number of x_c is reduced by the ratio c/r while layer normalization and ReLU activation are applied, and the original channel number is then restored at the same ratio to obtain the channel-domain weight y_c;
Wherein: conv represents the convolution; relu represents the ReLU activation function; LayerNorm denotes layer normalization; y_t represents the time-domain weight; the superscript T denotes the transpose operation; c/r represents the channel-domain dimension reduction ratio;
4) The time-domain and channel-domain features are multiplied by the corresponding time-domain weight y_t and channel-domain weight y_c respectively to obtain the attention mapping tensor, which forms a residual connection with the input feature F_i (w×1×c) to obtain the attention feature output F_o (w×1×c), i.e. the attention feature;
s205: training and updating parameters of the feature extraction model and the lightweight prediction model based on the attention features and the historical data of cloud migration;
In step S205, MSE loss and attention distillation loss are calculated based on the tag data and the attention features of the history data, so as to obtain corresponding training loss, and then parameter updating is performed on the lightweight prediction model through the training loss;
wherein Loss_all = Loss_mse + λ·Loss_att = Loss_mse + λ·Σ_{i=a,b,c} D_C(f_t^(c_i), f_t^(e_i));
Loss_mse = (1/n)·Σ_{i=1..n} (ŷ_i − y_i)²;  D_C(f_1, f_2) = 1 − ⟨f_1, f_2⟩/(‖f_1‖·‖f_2‖);
λ = (1 − α^ep)·λ_0;
Wherein: Loss_all represents the training loss; Loss_mse represents the MSE loss; Loss_att represents the attention distillation loss; n is the batch size; ŷ_i is the predicted value; y_i is the tag data; D_C(f_1, f_2) represents the cosine distance; ⟨f_1, f_2⟩ represents the inner product of the two vectors; f_t^(c_i) and f_t^(e_i) are the time feature sequences of the attention features in the large-scale prediction model and the lightweight prediction model respectively, and their element values are indexed over the tensor time-domain size w; λ represents the dynamic distillation loss coefficient; α represents a number less than 1; λ_0 represents the initial distillation loss coefficient; ep represents the number of training rounds;
the time feature sequence is calculated by the attention mapping operation, which aggregates the channel-domain feature vectors of the attention feature;
Wherein: f_t represents the time feature sequence; F_o (w×1×c) is the attention feature; f_i is the feature vector of each channel domain of the attention feature F_o; c represents the number of channels; the time feature sequences f_t^(c_i) and f_t^(e_i) of the attention features in the large-scale prediction model and the lightweight prediction model are obtained through this calculation;
s206: repeating steps S202 to S205 until the light predictive model reaches the expectation;
In the step S2, the feature information to be detected is uploaded to a cloud platform and is input into a large-scale prediction model to carry out model precision evaluation, and when the loss of the large-scale prediction model exceeds an expected threshold value, incremental training is carried out on the large-scale prediction model;
the incremental training specifically comprises the following steps:
s211: initializing a model to be trained by using the latest parameters in a historical model library, respectively inputting new training data into the model to be trained and the historical model, and respectively calculating Euclidean distances between each historical model and feature mapping of the model to be trained;
D_t = √( Σ_j (x_j − x̃_j)² );  D_c = √( Σ_k (y_k − ỹ_k)² );
Wherein: D(F, F_i) represents the Euclidean distance between the model features, obtained from D_t and D_c; F and F_i are the feature tensors of the model to be trained and of the historical model respectively; D_t and D_c are the Euclidean distances of the time sequence and of the channel sequence respectively; x_j and x̃_j are the time sequence elements of the model to be trained and of the historical model respectively; y_k and ỹ_k are the channel sequence elements of the model to be trained and of the historical model respectively;
S212: obtaining a distance loss based on each history model
S213: setting a corresponding forgetting factor eta based on the importance degree of the history model;
η = η_0·e^(−k·i);
wherein: η_0 represents the initial forgetting factor; k represents the forgetting coefficient; i represents the historical model number; the feature distance loss weight of a historical model therefore drops exponentially as the model is updated;
s214: constructing an incremental loss function based on Euclidean distance between the characteristic mapping of the historical model and the model to be trained, and performing incremental training by taking the incremental loss function as an index;
Wherein: l incre represents incremental loss; l mse represents the mean square error of the tag data; n represents the number of history models;
s3: the edge platform generates feedback control information based on the cutter abrasion state prediction result and transmits the feedback control information to the equipment layer;
S4: the equipment layer controls a machine tool of the to-be-tested cutter to execute corresponding actions based on the feedback control information.
2. The cloud edge co-training based tool wear state prediction system according to claim 1, wherein: the edge platform is also provided with a data preprocessing module; the data preprocessing module is used for carrying out data cleaning and Z-score normalization processing on the sensor data.
CN202210754025.5A 2022-06-28 2022-06-28 Cutter wear state prediction system and method based on cloud edge cooperative training Active CN115034504B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210754025.5A CN115034504B (en) 2022-06-28 2022-06-28 Cutter wear state prediction system and method based on cloud edge cooperative training

Publications (2)

Publication Number Publication Date
CN115034504A CN115034504A (en) 2022-09-09
CN115034504B true CN115034504B (en) 2024-05-28

Family

ID=83126925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210754025.5A Active CN115034504B (en) 2022-06-28 2022-06-28 Cutter wear state prediction system and method based on cloud edge cooperative training

Country Status (1)

Country Link
CN (1) CN115034504B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117934931A (en) * 2024-01-16 2024-04-26 广州杰鑫科技股份有限公司 Model updating method and device, optical cable intelligent operation and maintenance system and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112070208A (en) * 2020-08-05 2020-12-11 同济大学 Tool wear prediction method based on encoder-decoder stage attention mechanism
CN112706001A (en) * 2020-12-23 2021-04-27 重庆邮电大学 Machine tool cutter wear prediction method based on edge data processing and BiGRU-CNN network
CN113569903A (en) * 2021-06-09 2021-10-29 西安电子科技大学 Method, system, equipment, medium and terminal for predicting abrasion of numerical control machine tool cutter
CN114297912A (en) * 2021-12-08 2022-04-08 燕山大学 Tool wear prediction method based on deep learning
CN114619292A (en) * 2022-03-25 2022-06-14 南京航空航天大学 Milling cutter wear monitoring method based on fusion of wavelet denoising and attention mechanism with GRU network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Real-time tool wear state monitoring method based on a deep gated recurrent unit neural network; 陈启鹏; 谢庆生; 袁庆霓; 黄海松; 魏琴; 李宜汀; Computer Integrated Manufacturing Systems; 2020-07-15 (No. 07); 58-69 *
Milling tool wear prediction under multi-monitoring-data fusion; 陈熠道 et al.; Modular Machine Tool & Automatic Manufacturing Technique; 2022-04-15 (No. 4); 96-100 *

Also Published As

Publication number Publication date
CN115034504A (en) 2022-09-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant