CN115713097A - Time calculation method of electron microscope based on seq2seq algorithm - Google Patents

Info

Publication number
CN115713097A
Authority
CN
China
Prior art keywords
current
state
time
sequence
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310017145.1A
Other languages
Chinese (zh)
Inventor
陈宁
陈盼
何世伟
叶宗昆
陈科明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Thingcom Information Technology Co ltd
Zhejiang Science And Technology Project Management Service Center
Original Assignee
Hangzhou Thingcom Information Technology Co ltd
Zhejiang Science And Technology Project Management Service Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Thingcom Information Technology Co ltd, Zhejiang Science And Technology Project Management Service Center filed Critical Hangzhou Thingcom Information Technology Co ltd
Priority to CN202310017145.1A priority Critical patent/CN115713097A/en
Publication of CN115713097A publication Critical patent/CN115713097A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses an electron microscope machine-hour calculation method based on the seq2seq algorithm. Current data of typical scanning electron microscopes and transmission electron microscopes are first screened as a sample set, and a model is trained using the seq2seq algorithm structure: the feature sequence obtained by CNN network processing is converted into a state probability distribution sequence through an Encoder-Decoder structure, an optimal trained model is obtained after multiple iterations, and the resulting model file is stored on a cloud server. Subsequently collected electron microscope current data need only undergo the same data segmentation and be input into the model, and the machine-hour state corresponding to the current can be quickly obtained, from which the machine-hour data follow. The invention integrates the current information of scientific instruments of the same type, trains a general model for that instrument type from the current information, and predicts the machine hours of such instruments more accurately, and therefore has considerable engineering value.

Description

Time calculation method of electron microscope based on seq2seq algorithm
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to an electron microscope time calculation method based on a seq2seq algorithm.
Background
Society is now entering the big-data era: as data acquisition technology keeps improving and data acquisition products multiply, the amount of data available for analysis and processing keeps growing. Meanwhile, with the further development of scientific research in colleges and universities, instruments grow in both number and variety while experimental technicians remain limited, so large instruments commonly suffer from problems such as high idle rates and unclear usage time. Machine-hour information (shutdown, standby, and working states) clearly reflects the working state of an instrument and plays an important role in instrument management. The demand to collect the current of large instruments through sensors and to extract their machine-hour information through big-data analysis algorithms urgently needs to be met with big-data analysis and mining technologies.
For such problems, it is common to extract characteristic values of an instrument and classify data points with clustering or classification algorithms from machine learning. For example, the document [Hu Feng, Zhu Chengzhi, Wang Shihua. Research on power load classification based on an improved K-means algorithm [J]. Electronic Measurement Technology, 2018] proposes a power load classification method based on an improved K-means algorithm, which clusters the characteristics of power loads; however, this method clusters poorly when the data types are unbalanced, for instance when class sizes are severely imbalanced or class variances differ. As another example, the document [Liu Yongmei, et al. A residual current classification method [J]. Application of Electronic Technique, 2018] proposes a residual current classification method based on the AdaBoost algorithm: it first obtains the characteristic components of different types of residual current through extraction experiments, then maps these component characteristics into the AdaBoost algorithm, which detects the type of electric-shock current components in the total residual current.
Both methods ignore the sequential characteristics of the current data and cannot mine the regularity within the sequence. Current data are the result of sampling the instrument current at a specified sampling rate over equally spaced time periods; they feature strong continuity, densely packed time points, and many related variables, and the change of each variable depends not only on its own historical values but also on the influence of other related variables. The currents of instruments of the same type may therefore follow the same rules, and a general model for an instrument type can be obtained by training on the currents of instruments of that type.
Disclosure of Invention
In view of the above, the invention provides an electron microscope time calculation method based on a seq2seq algorithm, which integrates current information of scientific instruments of the same type, trains a general model of the instruments of the same type through the current information, and predicts the time of the instruments of the same type more accurately, thereby having greater engineering value.
A time calculation method of an electron microscope based on seq2seq algorithm comprises the following steps:
(1) Establishing a data set about the operating current of an electron microscope scientific instrument;
(2) Preprocessing a data set, and dividing the whole data set into a training set and a testing set;
(3) Constructing a state prediction model based on a seq2seq algorithm, wherein the state prediction model is formed by sequentially connecting an encoder, an attention mechanism layer and a decoder, the encoder is used for carrying out characteristic encoding on input current data, the attention mechanism layer is used for giving different weights to different hidden layer states in characteristic information, and the decoder is used for decoding the characteristic information to obtain state probability distribution of an instrument at each moment;
(4) Training the model by using current data of a training set;
(5) And inputting the current data of the test set into the trained model, so that the state indication of the instrument at each moment can be predicted, and the duration of the working operation state of the instrument is further counted.
Further, the specific implementation of step (1) is as follows: first, several groups of operating current sequences of scanning electron microscopes and transmission electron microscopes are screened from the database of the instrument background management system, each sequence covering a complete current period (standby, working, and shutdown) and containing the current values at all moments within the period; then, each current value in the operating current sequence is labeled to obtain a corresponding label sequence, i.e., for the current value at any moment, if the instrument is in the working operation state at that moment, the corresponding label is assigned the value 2; if the instrument is in the standby state at that moment, the corresponding label is assigned the value 1; if the instrument is in the shutdown state at that moment, the corresponding label is assigned the value 0; finally, the operating current sequence and the corresponding label sequence are combined to form one group of current data, and several such groups of current data are obtained to construct the data set.
Further, the specific implementation of preprocessing the data set in step (2) is as follows: first, the operating current sequences in the data set are normalized; then each operating current sequence is converted and divided into current input vectors at all moments, i.e., for the current value $i_t$ at any moment $t$ in the operating current sequence, according to a predetermined window size $w$, take $i_t$ and the preceding $w$ current values to compose the current input vector at time $t$, $[i_{t-w}, i_{t-w+1}, \dots, i_{t-1}, i_t]$; if the number of current values therein is fewer than $w+1$, copy $i_t$ to complete the vector. The current values at all moments in the operating current sequence are traversed to obtain the current input vectors at all moments.
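To make the windowing concrete, the following is a minimal pure-Python sketch (not from the patent itself; the window size and sample values are arbitrary) of how the current input vectors could be assembled:

```python
def current_input_vectors(sequence, w):
    """For each time t, build the input vector [i_{t-w}, ..., i_t].

    When fewer than w earlier values exist, i_t is copied to pad the
    vector to length w + 1, as the preprocessing step describes
    (here the copies are placed at the front of the vector).
    """
    vectors = []
    for t, i_t in enumerate(sequence):
        window = sequence[max(0, t - w):t + 1]   # i_{t-w} .. i_t
        padding = [i_t] * (w + 1 - len(window))  # pad by copying i_t
        vectors.append(padding + window)
    return vectors

currents = [0.04, 0.05, 0.98, 1.0, 0.26]
vecs = current_input_vectors(currents, w=2)
# vecs[0] == [0.04, 0.04, 0.04]  (padded with copies of i_0)
# vecs[3] == [0.05, 0.98, 1.0]
```

Every vector has length $w+1$, so the encoder always receives fixed-size inputs regardless of position in the sequence.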
Further, the encoder adopts a BiLSTM network comprising a forward LSTM unit and a backward LSTM unit; the specific calculation expressions are as follows:

$$H=[h_1,h_2,\dots,h_m],\qquad h_t=\big[\overrightarrow{h_t};\,\overleftarrow{h_t}\big]$$
$$\overrightarrow{h_t}=\mathrm{BiLSTM}^{+}\big(x_t,\overrightarrow{h_{t-1}}\big),\qquad \overleftarrow{h_t}=\mathrm{BiLSTM}^{-}\big(x_t,\overleftarrow{h_{t+1}}\big)$$

wherein: $H$ is the encoded feature vector output by the encoder; $h_t$ is the hidden-layer state of the BiLSTM network at time $t$; $t$ is a natural number with $1 \le t \le m$; $m$ is the time length of the operating current sequence; $x_t$ is the current input vector at time $t$; $\overrightarrow{h_t}$ and $\overrightarrow{h_{t-1}}$ are the hidden-layer states of the forward LSTM unit at times $t$ and $t-1$, respectively; $\overleftarrow{h_t}$ and $\overleftarrow{h_{t+1}}$ are the hidden-layer states of the backward LSTM unit at times $t$ and $t+1$, respectively; $\mathrm{BiLSTM}^{+}()$ and $\mathrm{BiLSTM}^{-}()$ represent the internal computation functions of the forward and backward LSTM units, respectively.
Further, the calculation expressions of the attention mechanism layer are as follows:

$$v_t=V_e^{\mathrm T}\tanh\big(W_e\,[h_t;\,s_{t-1}]\big),\qquad \alpha_t=\frac{\exp(v_t)}{\sum_{k=1}^{m}\exp(v_k)},\qquad C_t=\sum_{k=1}^{m}\alpha_k h_k$$

wherein: $C_t$ is the output of the attention mechanism layer, i.e., the timing vector at time $t$; $\alpha_t$ is the weight of the hidden-layer state $h_t$; $v_t$ is the attention score of the hidden-layer state $h_t$; $\exp()$ is the exponential function with the natural constant $e$ as base; $\tanh()$ is the hyperbolic tangent function; $V_e$ and $W_e$ are the weight matrices to be learned; $^{\mathrm T}$ denotes transposition; $s_{t-1}$ is the hidden-layer state of the decoder at time $t-1$.
Further, the decoder computes the hidden-layer state $s_t$ with a unidirectional LSTM network; the hidden-layer state $s_t$, the attention-mechanism-layer output $C_t$, and the state probability density distribution vector $p_{t-1}$ of the instrument at time $t-1$ are then spliced and passed through a fully connected layer for dimension reduction to obtain the state probability density distribution vector $p_t$ of the instrument at time $t$ as the model output.
Further, the calculation expression of the hidden-layer state $s_t$ is as follows:

$$s_t=\mathrm{LSTM}\big(s_{t-1},\,[C_t;\,p_{t-1}]\big)$$

wherein: $s_t$ is the hidden-layer state of the decoder at time $t$; $p_{t-1}$ is the state probability density distribution vector of the instrument at time $t-1$; $\mathrm{LSTM}()$ represents the internal computation function of the unidirectional LSTM network.
Further, the initial hidden-layer state $s_0$ is obtained by reducing the dimension of the hidden-layer state $h_m$ through a fully connected layer.
Further, the specific process of training the model in the step (4) is as follows:
4.1 Initializing model parameters including a bias vector and a weight matrix of each layer, a learning rate and an optimizer;
4.2) Input the current input vector at each moment of the training-set current data into the model; the forward propagation of the model outputs the corresponding prediction result, i.e., the state probability density distribution vector, and the loss function L between the prediction result and the corresponding label sequence is calculated;
4.3) According to the loss function L, the optimizer continuously and iteratively updates the model parameters by gradient descent until the loss function L converges, whereupon training is complete.
Further, the expression of the loss function $L$ is as follows:

$$L=-\sum_{t=1}^{m}\log p_t[y_t]$$

wherein: $p_t$ is the state probability density distribution vector of the instrument at time $t$ in the prediction result output by the model, and $p_t[y_t]$ is the probability it assigns to the class $y_t$; $y_t$ is the label value at time $t$ in the corresponding label sequence; $t$ is a natural number with $1 \le t \le m$; $m$ is the time length of the operating current sequence.
The method first screens current data of typical scanning electron microscopes and transmission electron microscopes as the sample set and trains a model through the seq2seq algorithm structure. The core idea of the seq2seq network used in the invention is that the feature sequence obtained by CNN network processing is converted into a state probability distribution sequence through an Encoder-Decoder structure; the optimal trained model is obtained after multiple iterations, and the resulting model file is stored on a cloud server. Subsequently collected electron microscope current data need only undergo the same data segmentation and be input into the model, and the machine-hour state corresponding to the current can be quickly obtained, from which the machine-hour data follow.
In addition, the invention integrates the current information of the same type of scientific instruments, trains the general model of the same type of instruments through the current information, and predicts the machine hour of the same type of scientific instruments more accurately, thereby having greater engineering value.
Drawings
FIG. 1 is a schematic flow chart of the machine-hour calculation method for an electron microscope instrument according to the present invention.
Fig. 2 is an exemplary graph of current waveforms for a typical scanning electron microscope and transmission electron microscope.
FIG. 3 is a schematic diagram of the basic structure of the seq2seq algorithm model of the present invention.
FIG. 4 is a diagram illustrating simulation results of the seq2seq algorithm model of the present invention.
FIG. 5 is a diagram showing the comparison between the simulation results of seq2seq algorithm model with and without the addition of the Attention mechanism.
Detailed Description
In order to describe the present invention more specifically, the following detailed description of the present invention is made with reference to the accompanying drawings and the detailed description of the present invention.
Based on an instrument background management system, the invention screens and selects the current data of typical scanning electron microscopes and transmission electron microscopes and provides a machine-hour calculation method for electron microscope instruments; the specific flow is shown in FIG. 1. The method first needs to train a model, and typical electron microscope operating currents are selected during training; the specific implementation process is as follows:
(1) And establishing a data set, including data labels and data preprocessing.
1.1) Electron microscope current sequences with a complete current cycle (working, standby, shutdown) are screened from the instrument background management system database, as shown in FIG. 2.
1.2 And normalizing the current sequence, labeling, and dividing into a training set and a test set.
The current sequence is first normalized; the max-min normalization formula is as follows:

$$i'_t=\frac{i_t-i_{min}}{i_{max}-i_{min}}$$

wherein: $i_{min}$ represents the minimum of the current values in the sequence, and $i_{max}$ represents the maximum of the current values in the sequence.
The current data of each electron microscope is labeled according to the actual situation. Suppose the current data of a certain electron microscope is S = {0.04, 0.04, 0.05, 0.04, 0.05, 0.04, 0.04, 1.34, 1.37, 1.33, 1.35, 1.31, 0.34, 0.34, 0.35, 0.36, 0.35} and the normalized current data is D = {0.03, 0.03, 0.04, 0.03, 0.04, 0.03, 0.03, 0.98, 1, 0.97, 0.99, 0.96, 0.25, 0.25, 0.26, 0.26, 0.26}; after the labeling process, the corresponding label set is C = {0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1}, where 0 represents the shutdown state, 1 represents the standby state, and 2 represents the working state.
Finally, each obtained operating current sequence and its corresponding label sequence are combined to form one group of samples; several such groups of samples form the data set, which is divided into a training set and a test set in a ratio of 8:2.
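A minimal sketch of the normalization and labeling steps above, in pure Python. The normalization follows the max-min formula; the threshold-based labeling rule and its threshold values are illustrative assumptions only — the patent labels the data manually according to the actual situation:

```python
def min_max_normalize(seq):
    """Max-min normalization: (i - i_min) / (i_max - i_min)."""
    lo, hi = min(seq), max(seq)
    return [(i - lo) / (hi - lo) for i in seq]

def label_states(seq, standby_thresh, working_thresh):
    """Toy threshold labeling (thresholds are hypothetical, not from
    the patent): 0 = shutdown, 1 = standby, 2 = working."""
    return [2 if i >= working_thresh else 1 if i >= standby_thresh else 0
            for i in seq]

raw = [0.04, 0.04, 1.34, 1.37, 0.34]
norm = min_max_normalize(raw)   # [0.0, 0.0, ~0.977, 1.0, ~0.226]
labels = label_states(norm, standby_thresh=0.1, working_thresh=0.5)
# labels == [0, 0, 2, 2, 1]
```

In practice the labels come from the known operating history of the instrument; the thresholding here only stands in for that manual step.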
(2) And constructing a seq2seq algorithm model.
2.1 The Encoder structure of the BiLSTM network is used.
As shown in FIG. 3, the encoder is responsible for processing the incoming current sequence and packing all the information of the input sequence into a fixed-length vector. The Encoder structure uses a BiLSTM network instead of a unidirectional LSTM because the BiLSTM network is better able to exploit the input sequence information; the formula of the Encoder structure is as follows:
$$H=[h_1,h_2,\dots,h_m],\qquad h_t=\big[\overrightarrow{h_t};\,\overleftarrow{h_t}\big]$$
$$\overrightarrow{h_t}=\mathrm{BiLSTM}^{+}\big(x_t,\overrightarrow{h_{t-1}}\big),\qquad \overleftarrow{h_t}=\mathrm{BiLSTM}^{-}\big(x_t,\overleftarrow{h_{t+1}}\big)$$

wherein: $\overrightarrow{h_t}$ is the hidden-layer state of the forward LSTM unit at time $t$; $\overleftarrow{h_t}$ is the hidden-layer state of the backward LSTM unit at time $t$; $x_t$ is the current input vector at time $t$; $\overrightarrow{h_{t-1}}$ is the hidden-layer state of the forward LSTM unit at time $t-1$; $\overleftarrow{h_{t+1}}$ is the hidden-layer state of the backward LSTM unit at time $t+1$; $\mathrm{BiLSTM}^{+}()$ and $\mathrm{BiLSTM}^{-}()$ represent the computation functions of the forward and backward LSTM units, respectively; $h_t$ is the hidden-layer state of the BiLSTM network at time $t$. The initial states $\overrightarrow{h_0}$ and $\overleftarrow{h_{m+1}}$ are initialized to the hidden-layer code corresponding to a window interval whose current values are all 0.
2.2 An Attention mechanism was introduced.
An Attention mechanism is introduced into the seq2seq algorithm: before the output vector enters the Decoder structure, the encoded feature vectors of all moments output by the Encoder, $H=[h_1,h_2,\dots,h_m]$, are read first, where $m$ represents the number of hidden-layer states (i.e., the time length of the operating current sequence). The Attention mechanism assigns different weights to different variables so that the Decoder structure can pay more attention to the effective feature information; the formula of the Decoder structure is as follows:
$$s_t=\mathrm{LSTM}\big(s_{t-1},\,[C_t;\,p_{t-1}]\big)$$

wherein: $s_t$ represents the hidden-layer state of the Decoder structure at time $t$; $s_{t-1}$ represents the hidden-layer state of the Decoder structure at time $t-1$; $C_t$ represents the timing vector at time $t$; $p_{t-1}$ represents the state probability density distribution vector of the instrument at time $t-1$; $\mathrm{LSTM}()$ represents the internal computation function of the unidirectional LSTM network. The dynamically varying timing vector $C_t$ stores the complete effective information of the model input at the current moment; its calculation can be divided into the following 3 steps:
step 1: the attention mechanism score was calculated over 2 fully connected layers as shown below:
Figure 771584DEST_PATH_IMAGE013
wherein:V e W e as a weight matrix, the weight matrix is,v t representing hidden statesh t The score of attention of (a) is,tanh() Representing a hyperbolic tangent function.
Step 2: from the $v_t$ obtained by the above formula, the weight of each hidden-layer state is calculated with the softmax function, as shown below:

$$\alpha_t=\frac{\exp(v_t)}{\sum_{k=1}^{m}\exp(v_k)}$$

wherein: $\exp()$ is the exponential function with the natural constant $e$ as base; $\alpha_t$ represents the weight corresponding to the hidden-layer state $h_t$.
Step 3: the weights $\alpha_t$ obtained by the above formula and their corresponding hidden-layer states $h_t$ are weighted and summed to obtain the timing vector $C_t$, as shown below:

$$C_t=\sum_{k=1}^{m}\alpha_k h_k$$
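Steps 2 and 3 above can be sketched in a few lines of pure Python. The scores $v_t$ are taken as given (the two fully connected layers of step 1 are omitted since their weights are not specified here), and the hidden states are scalars rather than vectors for simplicity:

```python
import math

def attention(hidden_states, scores):
    """Softmax the attention scores v_t into weights alpha_t, then
    form the timing vector C = sum_t alpha_t * h_t."""
    exps = [math.exp(v) for v in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    C = sum(a * h for a, h in zip(alphas, hidden_states))
    return alphas, C
```

Equal scores yield equal weights (a plain average of the hidden states); a much larger score pushes its weight toward 1, which is exactly the "focus on effective feature information" behavior described above.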
2.3 The Decoder structure is used to obtain the final decomposition result.
The Decoder structure converts the timing feature sequence obtained in step 2.2) into a machine-hour state sequence. In contrast to the Encoder part, a unidirectional LSTM network must be used, because the Decoder needs to generate the state encoding of the current sequence in order. The last hidden-layer state $h_m$ of the BiLSTM network in the Encoder corresponds to the initial state $s_0$ of the LSTM layer in the Decoder structure; since the hidden-layer state $h_m=\big[\overrightarrow{h_m};\,\overleftarrow{h_m}\big]$ is formed by splicing a forward hidden-layer state and a backward hidden-layer state, its dimension does not match $s_0$, so $h_m$ must be passed through a fully connected layer for dimension reduction before being fed into the Decoder.
The Decoder splices the hidden-layer state $s_t$ at time $t$, the timing vector $C_t$ at time $t$, and the probability density distribution vector $p_{t-1}$ of the instrument at time $t-1$, and obtains the probability density distribution vector $p_t$ of the instrument at time $t$ through a fully connected layer; $p_t$ is calculated as follows:

$$p_t=\mathrm{softmax}\big(\mathrm{FC}\big([s_t;\,C_t;\,p_{t-1}]\big)\big)$$

wherein: the initial value $p_0$ is divided equally according to the number of states corresponding to the current values of the instrument, i.e., {0.5, 0.5} or {0.33, 0.33, 0.33}.
Because the current values of some instruments correspond to two states while those of others correspond to three, different fully connected layers are required when predicting the probability density distribution vector $p_t$ for different instruments; the number of states of the instrument corresponds to the number of neurons in its fully connected layer.
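One decoding step can be sketched as follows in pure Python — a hypothetical illustration in which the hidden state, timing vector, and previous probability vector are small lists and the fully connected layer is a plain matrix-vector product with made-up weights:

```python
import math

def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def decoder_step(s_t, C_t, p_prev, W, b):
    """One decoding step: splice [s_t; C_t; p_{t-1}], apply a fully
    connected layer (weights W, bias b are illustrative), and softmax
    the logits into the state probability vector p_t."""
    x = list(s_t) + list(C_t) + list(p_prev)
    logits = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
              for row, b_i in zip(W, b)]
    return softmax(logits)
```

The number of rows in `W` (and entries in `b`) equals the number of instrument states, matching the remark above that the output layer's neuron count follows the state count.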
2.4 A loss function and an optimization function are set.
Through clustering and coding operations, the model of the invention converts the decomposition problem into a multi-classification problem of solving the state of each instrument, and therefore uses the cross-entropy function as the loss function of model training, as shown below:

$$L=-\sum_{t=1}^{m}\log p_t[y_t]$$

wherein: $y_t$ is the label value at time $t$ in the corresponding label sequence, and $p_t[y_t]$ is the probability the model assigns to class $y_t$ at time $t$.
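A minimal sketch of this cross-entropy loss over a sequence of predicted state distributions (a standard hard-label formulation; variable names are our own):

```python
import math

def cross_entropy_loss(probs, labels):
    """Sum over the sequence of -log p_t[y_t]: the negative log of
    the probability the model assigns to the true state y_t at each
    time t. probs is a list of probability vectors, labels a list of
    class indices (0 = shutdown, 1 = standby, 2 = working)."""
    return -sum(math.log(p_t[y_t]) for p_t, y_t in zip(probs, labels))
```

A perfect prediction (probability 1 on the true class at every step) gives a loss of 0; mass placed on wrong classes drives the loss up, which is what the optimizer minimizes.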
The optimizer adopts SGD + Momentum, and introduces a first-order Momentum on the basis of SGD, as shown in the following formula:
$$V_{dW}=\beta V_{dW}+(1-\beta)\,dW,\qquad V_{db}=\beta V_{db}+(1-\beta)\,db$$
$$W=W-\alpha V_{dW},\qquad b=b-\alpha V_{db}$$

wherein: $\beta$ is a hyper-parameter that can be set freely; $\alpha$ is the learning rate; $W$ and $b$ are the weight and bias; $dW$ and $db$ are the partial derivatives with respect to $W$ and $b$. From the momentum point of view, taking the weight $W$ as an example, $V_{dW}$ can be understood as velocity and $dW$ as acceleration; the exponentially weighted average actually computes the current velocity, which is affected by both the previous velocity and the current acceleration. Moreover, since $\beta$ is less than 1, it limits the velocity $V_{dW}$ from growing too large, which ensures the smoothness and accuracy of gradient descent, reduces oscillation, and reaches the minimum faster.
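The momentum update above can be demonstrated on a scalar toy problem — minimizing $f(w)=w^2$, whose gradient is $dW = 2w$ (the learning rate and $\beta$ values here are arbitrary choices, not the patent's training settings):

```python
def sgd_momentum_step(w, dw, v, lr=0.1, beta=0.9):
    """One SGD+Momentum update for a scalar parameter:
    v <- beta*v + (1-beta)*dw ;  w <- w - lr*v."""
    v = beta * v + (1 - beta) * dw
    return w - lr * v, v

# minimize f(w) = w**2 (gradient dw = 2*w) starting from w = 1.0
w, v = 1.0, 0.0
for _ in range(200):
    w, v = sgd_momentum_step(w, 2 * w, v)
# w has been driven close to the minimum at 0
```

The exponentially weighted velocity smooths out sign flips in the gradient, which is the oscillation-damping effect described above.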
(3) The method is used for predicting the current data type of an electron microscope scientific instrument and mainly comprises the following steps:
3.1 And acquiring a current sequence to be calculated.
3.2 And converting the obtained current sequence into tensor, and predicting through a seq2seq algorithm model to finally obtain a machine-time state sequence mapped with the current sequence.
3.3 And performing time counting according to the prediction result.
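Step 3.3) — turning the predicted state sequence into machine-hour totals — amounts to summing the sampling interval per state. A minimal sketch (the sampling interval is an assumed deployment parameter, not specified by the patent):

```python
def machine_hours(states, sample_interval_s):
    """Aggregate a predicted machine-hour state sequence
    (0 = shutdown, 1 = standby, 2 = working) into total seconds
    spent in each state."""
    totals = {0: 0, 1: 0, 2: 0}
    for s in states:
        totals[s] += sample_interval_s
    return totals

# e.g. a prediction sampled once per minute
usage = machine_hours([0, 0, 2, 2, 2, 1], sample_interval_s=60)
# usage == {0: 120, 1: 60, 2: 180}
```

The working-state total (`usage[2]`) is the duration of the working operation state that steps (5) and 3.3) count.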
From the seq2seq algorithm simulation results shown in FIG. 4, it can be seen that the model trained by the invention performs the machine-hour calculation analysis of electron microscope instruments accurately. Generally speaking, a scanning electron microscope has a small current peak and small fluctuation, while a transmission electron microscope has a large current peak and more obvious fluctuation. The algorithm model can accurately identify electron microscopes of the same type under different operating conditions and with different waveforms: under large current or strong fluctuation it identifies the working (power-on) operation; under non-working current or stable fluctuation it identifies the standby state; all other conditions are identified as shutdown. The aim of the invention is thus basically achieved.
To verify the effect of the algorithm model of the invention, it is compared with a seq2seq algorithm without the Attention mechanism added for optimization. From the comparison of the simulation results shown in FIG. 5, it can be seen that the Attention mechanism lets the Decoder structure focus more on the effective feature information: the machine-hour prediction is better in current intervals with continuous fluctuation, and machine-hour misjudgments in such continuously fluctuating current intervals rarely occur.
The foregoing description of the embodiments is provided to enable one of ordinary skill in the art to make and use the invention, and it is to be understood that other modifications of the embodiments, and the generic principles defined herein may be applied to other embodiments without the use of inventive faculty, as will be readily apparent to those skilled in the art. Therefore, the present invention is not limited to the above embodiments, and those skilled in the art should make improvements and modifications to the present invention based on the disclosure of the present invention within the protection scope of the present invention.

Claims (10)

1. A time calculation method of an electron microscope based on seq2seq algorithm comprises the following steps:
(1) Aiming at scientific instruments such as an electron microscope, establishing a data set about the operating current of the scientific instruments;
(2) Preprocessing a data set, and dividing the whole data set into a training set and a testing set;
(3) Constructing a state prediction model based on a seq2seq algorithm, wherein the state prediction model is formed by sequentially connecting an encoder, an attention mechanism layer and a decoder, the encoder is used for carrying out characteristic encoding on input current data, the attention mechanism layer is used for endowing different hidden layer states in characteristic information with different weights, and the decoder is used for decoding the characteristic information to obtain state probability distribution of an instrument at each moment;
(4) Training the state prediction model by using the current data of the training set;
(5) And inputting the current data of the test set into the trained model, so that the state indication of the instrument at each moment can be predicted, and the duration of the working operation state of the instrument is counted.
2. The electron microscope time calculation method according to claim 1, characterized in that the specific implementation of step (1) is as follows: first, several groups of operating current sequences of scanning electron microscopes and transmission electron microscopes are screened from the database of the instrument background management system, each sequence having a complete current period and containing the current values at all moments within the period; then, each current value in the operating current sequence is labeled to obtain a corresponding label sequence, i.e., for the current value at any moment, if the instrument is in the working operation state at that moment, the corresponding label is assigned the value 2; if the instrument is in the standby state at that moment, the corresponding label is assigned the value 1; if the instrument is in the shutdown state at that moment, the corresponding label is assigned the value 0; finally, the operating current sequence and the corresponding label sequence are combined to form one group of current data, and several groups of current data are obtained to form the data set.
3. The electron microscope time calculation method according to claim 2, characterized in that the specific implementation of preprocessing the data set in step (2) is as follows: first, the operating current sequences in the data set are normalized; then each operating current sequence is converted and divided into current input vectors at all moments, i.e., for the current value $i_t$ at any moment $t$ in the operating current sequence, according to a predetermined window size $w$, take $i_t$ and the preceding $w$ current values to compose the current input vector at time $t$, $[i_{t-w}, i_{t-w+1}, \dots, i_{t-1}, i_t]$; if the number of current values is fewer than $w+1$, copy $i_t$ to complete the vector; the current values at all moments in the operating current sequence are traversed to obtain the current input vectors at all moments.
4. The electron microscope time calculation method according to claim 3, characterized in that the encoder adopts a BiLSTM network comprising a forward LSTM unit and a backward LSTM unit; the specific calculation expressions are as follows:

$$H=[h_1,h_2,\dots,h_m],\qquad h_t=\big[\overrightarrow{h_t};\,\overleftarrow{h_t}\big]$$
$$\overrightarrow{h_t}=\mathrm{BiLSTM}^{+}\big(x_t,\overrightarrow{h_{t-1}}\big),\qquad \overleftarrow{h_t}=\mathrm{BiLSTM}^{-}\big(x_t,\overleftarrow{h_{t+1}}\big)$$

wherein: $H$ is the encoded feature vector output by the encoder; $h_t$ is the hidden-layer state of the BiLSTM network at time $t$; $t$ is a natural number with $1 \le t \le m$; $m$ is the time length of the operating current sequence; $x_t$ is the current input vector at time $t$; $\overrightarrow{h_t}$ and $\overrightarrow{h_{t-1}}$ are the hidden-layer states of the forward LSTM unit at times $t$ and $t-1$, respectively; $\overleftarrow{h_t}$ and $\overleftarrow{h_{t+1}}$ are the hidden-layer states of the backward LSTM unit at times $t$ and $t+1$, respectively; $\mathrm{BiLSTM}^{+}()$ and $\mathrm{BiLSTM}^{-}()$ represent the internal computation functions of the forward and backward LSTM units, respectively.
5. The electron microscope time calculation method according to claim 4, characterized in that the calculation expressions of the attention mechanism layer are as follows:

$$v_t=V_e^{\mathrm T}\tanh\big(W_e\,[h_t;\,s_{t-1}]\big),\qquad \alpha_t=\frac{\exp(v_t)}{\sum_{k=1}^{m}\exp(v_k)},\qquad C_t=\sum_{k=1}^{m}\alpha_k h_k$$

wherein: $C_t$ is the output of the attention mechanism layer, i.e., the timing vector at time $t$; $\alpha_t$ is the weight of the hidden-layer state $h_t$; $v_t$ is the attention score of the hidden-layer state $h_t$; $\exp()$ is the exponential function with the natural constant $e$ as base; $\tanh()$ is the hyperbolic tangent function; $V_e$ and $W_e$ are the weight matrices to be learned; $^{\mathrm T}$ denotes transposition; $s_{t-1}$ is the hidden-layer state of the decoder at time $t-1$.
6. The electron microscope machine-time calculation method according to claim 5, characterized in that the decoder adopts a unidirectional LSTM network to calculate the hidden state s_t; the hidden state s_t, the attention mechanism layer output C_t and the state probability density distribution vector p_{t-1} of the instrument at moment t−1 are concatenated and then reduced in dimension through a fully connected layer to obtain the state probability density distribution vector p_t of the instrument at moment t as the model output.
7. The electron microscope machine-time calculation method according to claim 6, characterized in that the hidden state s_t is calculated as:

s_t = LSTM(s_{t-1}, [C_t ; p_{t-1}])

wherein: s_t is the hidden state of the decoder at moment t; p_{t-1} is the state probability density distribution vector of the instrument at moment t−1; LSTM() represents the internal calculation function of the unidirectional LSTM network.
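A sketch of one decoder step combining claims 6 and 7, assuming a standard LSTM cell and a softmax to normalize the fully connected layer's output into a probability distribution (the claims say only that p_t is a state probability density distribution vector):

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def _lstm_step(x, h, c, W, U, b):
    # Standard LSTM cell; W, U, b stack the four gate parameter blocks.
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    c_new = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
    return _sigmoid(o) * np.tanh(c_new), c_new

def decoder_step(s_prev, c_prev, C_t, p_prev, lstm_params, W_fc, b_fc):
    """Claim 7: s_t = LSTM(s_{t-1}, [C_t ; p_{t-1}]); then, per claim 6,
    [s_t ; C_t ; p_{t-1}] is reduced through a fully connected layer and
    normalized (softmax, an assumption) to give p_t."""
    x = np.concatenate([C_t, p_prev])
    s_t, c_t = _lstm_step(x, s_prev, c_prev, *lstm_params)
    z = W_fc @ np.concatenate([s_t, C_t, p_prev]) + b_fc
    e = np.exp(z - z.max())
    return s_t, c_t, e / e.sum()
```

Feeding p_{t-1} back into the next step is what makes the decoder autoregressive over the instrument's state sequence.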
8. The electron microscope machine-time calculation method according to claim 7, characterized in that the initial hidden state s_0 is obtained by reducing the dimension of the hidden state h_m through a fully connected layer.
9. The electron microscope machine-time calculation method according to claim 2, characterized in that the specific process of training the model in step (4) is:
4.1) initializing the model parameters, including the bias vector and weight matrix of each layer, the learning rate and the optimizer;
4.2) inputting the current input vector of each moment in the training current collection data into the model, performing forward propagation to obtain the corresponding prediction result, namely the state probability density distribution vector, and calculating the loss function L between the prediction result and the corresponding label sequence;
4.3) iteratively updating the model parameters with the optimizer by gradient descent according to the loss function L until L converges, thereby completing the training.
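The initialize → forward → loss → update cycle of claim 9, shown on a deliberately tiny stand-in model (softmax regression with plain gradient descent standing in for the optimizer) rather than the full seq2seq network:

```python
import numpy as np

def train(xs, ys, lr=0.5, steps=500):
    """Illustrative training loop: 4.1) initialize parameters,
    4.2) forward-propagate and form the cross-entropy loss gradient,
    4.3) update by gradient descent until the loop ends."""
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((2, xs.shape[1]))   # 4.1) init weights
    b = np.zeros(2)                                   # 4.1) init bias
    for _ in range(steps):
        z = xs @ W.T + b                              # 4.2) forward pass
        e = np.exp(z - z.max(axis=1, keepdims=True))
        p = e / e.sum(axis=1, keepdims=True)          # predicted distributions
        grad = p.copy()
        grad[np.arange(len(ys)), ys] -= 1.0           # dL/dz for cross-entropy
        W -= lr * grad.T @ xs / len(ys)               # 4.3) parameter update
        b -= lr * grad.mean(axis=0)
    return W, b, p
```

In the patent's setting the same cycle runs over the seq2seq model's parameters, with the loss of claim 10 in place of this toy objective.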
10. The electron microscope machine-time calculation method according to claim 9, characterized in that the loss function L is expressed as:

L = −Σ_{t=1}^{m} log p_t(y_t)

wherein: p_t is the prediction result output by the model, i.e. the state probability density distribution vector of the instrument at moment t; y_t is the label value at moment t in the corresponding label sequence, and p_t(y_t) denotes the probability that p_t assigns to the state y_t; t is a natural number with 1 ≤ t ≤ m, m being the time length of the operation current sequence.
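One reading of the claim-10 loss as the sequence negative log-likelihood, sketched in NumPy (averaging over the m moments is an assumption, since the original equation image is not available):

```python
import numpy as np

def sequence_loss(p_seq, y_seq):
    """Negative log-likelihood of the label sequence under the predicted
    state distributions: for each moment, take the probability the model
    assigned to the true label y_t, and average -log over the sequence."""
    return -sum(np.log(p[y]) for p, y in zip(p_seq, y_seq)) / len(y_seq)
```

The loss is zero only when every p_t puts all its mass on the true state, and grows as probability leaks to wrong states, which is exactly what gradient descent in claim 9 drives down.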
CN202310017145.1A 2023-01-06 2023-01-06 Time calculation method of electron microscope based on seq2seq algorithm Pending CN115713097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310017145.1A CN115713097A (en) 2023-01-06 2023-01-06 Time calculation method of electron microscope based on seq2seq algorithm

Publications (1)

Publication Number Publication Date
CN115713097A 2023-02-24

Family

ID=85236139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310017145.1A Pending CN115713097A (en) 2023-01-06 2023-01-06 Time calculation method of electron microscope based on seq2seq algorithm

Country Status (1)

Country Link
CN (1) CN115713097A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180329884A1 (en) * 2017-05-12 2018-11-15 Rsvp Technologies Inc. Neural contextual conversation learning
CN110442707A (en) * 2019-06-21 2019-11-12 电子科技大学 A kind of multi-tag file classification method based on seq2seq
CN115510964A (en) * 2022-09-21 2022-12-23 浙江省科技项目管理服务中心 On-machine computing method for liquid chromatograph scientific instruments

Similar Documents

Publication Publication Date Title
CN110399850B (en) Continuous sign language recognition method based on deep neural network
CN113806494B (en) Named entity recognition method based on pre-training language model
CN111597340A (en) Text classification method and device and readable storage medium
CN113435208A (en) Student model training method and device and electronic equipment
CN117037427B (en) Geological disaster networking monitoring and early warning system
CN115130591A (en) Cross supervision-based multi-mode data classification method and device
CN114019370A (en) Motor fault detection method based on gray level image and lightweight CNN-SVM model
CN115859142A (en) Small sample rolling bearing fault diagnosis method based on convolution transformer generation countermeasure network
CN116703642A (en) Intelligent management system of product manufacturing production line based on digital twin technology
CN117332206A (en) RCNN-FA-BiGRU escalator bearing fault diagnosis method based on attention mechanism
CN115659254A (en) Power quality disturbance analysis method for power distribution network with bimodal feature fusion
CN117217277A (en) Pre-training method, device, equipment, storage medium and product of language model
CN111783464A (en) Electric power-oriented domain entity identification method, system and storage medium
CN117350898A (en) Intelligent early warning system and method for annual patent fee
CN117034123B (en) Fault monitoring system and method for fitness equipment
CN117333146A (en) Manpower resource management system and method based on artificial intelligence
CN115713097A (en) Time calculation method of electron microscope based on seq2seq algorithm
CN114036947B (en) Small sample text classification method and system for semi-supervised learning
CN115994204A (en) National defense science and technology text structured semantic analysis method suitable for few sample scenes
CN116521863A (en) Tag anti-noise text classification method based on semi-supervised learning
CN114706054A (en) Method for identifying human body motion micro Doppler signal
CN114357166A (en) Text classification method based on deep learning
CN110825851A (en) Sentence pair relation discrimination method based on median conversion model
CN111158640B (en) One-to-many demand analysis and identification method based on deep learning
CN113836942B (en) Text matching method based on hidden keywords

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination