CN110895705A - Abnormal sample detection device, training device and training method thereof


Publication number: CN110895705A
Application number: CN201811067951.5A
Authority: CN (China)
Prior art keywords: reconstruction, training, data, reconstruction error, unit
Other languages: Chinese (zh)
Other versions: CN110895705B (granted publication)
Inventors: 庞占中, 于小亿, 孙俊
Assignee (original and current): Fujitsu Ltd
Application filed by Fujitsu Ltd; priority to CN201811067951.5A
Legal status: Granted, active


Classifications

    • G06F18/2433 — Pattern recognition; analysing; classification techniques relating to the number of classes; single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
    • G06F18/2413 — Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches; based on distances to training or reference patterns

Landscapes: Engineering & Computer Science; Data Mining & Analysis; Theoretical Computer Science; Computer Vision & Pattern Recognition; Bioinformatics & Cheminformatics; Bioinformatics & Computational Biology; Artificial Intelligence; Evolutionary Biology; Evolutionary Computation; Physics & Mathematics; General Engineering & Computer Science; General Physics & Mathematics; Life Sciences & Earth Sciences; Image Analysis

Abstract

The present disclosure relates to a training device and a training method for training an abnormal sample detection device, and to an abnormal sample detection device. A training apparatus according to the present disclosure includes a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data, wherein the first reconstruction unit and the back-end processing unit are jointly trained based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error. The abnormal sample detection device comprises the jointly trained first reconstruction unit and back-end processing unit. Compared with the prior art, this abnormal sample detection device offers improved abnormal sample detection performance.

Description

Abnormal sample detection device, training device and training method thereof
Technical Field
The present invention relates generally to the field of classification and detection, and more particularly, to a training apparatus and a training method for training an abnormal sample detection apparatus, as well as to an abnormal sample detection apparatus trained by the training apparatus and the training method.
Background
The purpose of abnormal sample detection is to identify abnormal samples that deviate from normal samples. Abnormal sample detection has important practical value and a wide range of applications. For example, it may be applied to industrial control, network intrusion detection, pathology detection, financial risk identification, video monitoring, and the like.
With the continuous development of artificial intelligence technology, deep learning has been applied to the problem of abnormal sample detection. However, the particular nature of the abnormal sample detection problem presents a significant challenge to deep learning. First, the purpose of abnormal sample detection is to distinguish normal samples from abnormal samples; however, unlike in conventional classification settings, abnormal samples occur infrequently, which makes it difficult to collect enough abnormal samples for classification training. For example, when abnormal sample detection is applied to detecting anomalies in the operating temperature of an industrial machine, the operating temperature may be abnormal only once or twice in several days of collected data, and the collected abnormal temperature samples are insufficient for classification training.
Furthermore, even if enough abnormal samples are collected, it is impossible to acquire complete knowledge of the abnormal samples. For example, in video monitoring, suppose that abnormal situations such as bicycles and motor vehicles appearing in a pedestrian street are to be monitored. The types of abnormal samples in the actual scene may, however, go beyond those anticipated in advance. For example, if only the presence of bicycles and motor vehicles is predefined as the abnormal sample class, it is difficult to judge whether objects such as skateboards, roller skates or tricycles appearing in the monitored scene are normal samples or abnormal samples.
At present, the solution to the above problem is based on the following idea: since the abnormal sample class cannot be completely defined, only the normal sample class is defined, and thus any sample that does not belong to the normal sample class is defined as belonging to the abnormal sample class.
Current abnormal sample detection techniques include reconstruction-error-based abnormal sample detection techniques (e.g., SROSR (Sparse Representation-based Open Set Recognition)), probability-density-based abnormal sample detection techniques (e.g., DAGMM (Deep Autoencoding Gaussian Mixture Model)), energy-based abnormal sample detection techniques (e.g., DSEBM (Deep Structured Energy-Based Model)), and the like. Among these conventional abnormal sample detection techniques, reconstruction-error-based detection is widely used because it is simple and performs well.
In particular, the reconstruction error refers to an error between an input sample of a reconstruction model and a reconstructed sample, wherein the reconstruction model is capable of compressing the input sample to extract feature data and reconstructing the input sample based on the extracted feature data. For the reconstruction model, the smaller the reconstruction error between the input sample and the reconstruction sample is, the better the reconstruction effect of the reconstruction model is.
During training, the reconstruction model uses only normal samples; that is, the reconstruction model learns only how to reconstruct normal samples. The trained reconstruction model therefore produces a small reconstruction error for a normal sample, whereas for an abnormal sample, which the model has never learned to reconstruct, it produces a large reconstruction error. The reconstruction model can thus distinguish normal samples from abnormal samples according to the magnitude of the reconstruction error, thereby realizing abnormal sample detection.
However, reconstruction models of the prior art suffer from the following problem in practical applications: some abnormal samples differ only slightly from normal samples and are therefore difficult to identify correctly. There is consequently still a need for an abnormal sample detection technique that can more accurately distinguish between normal samples and abnormal samples.
Disclosure of Invention
In order to further improve abnormal sample detection performance, an abnormal sample detection technique is proposed which uses only normal samples as training data, reconstructs the normal samples with a front-end reconstruction model, and applies further back-end processing to the information extracted by the front-end reconstruction model, with the back-end processing being trained jointly with the front-end reconstruction model.
A brief summary of the disclosure is provided below in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
An object of the present disclosure is to provide a training apparatus and a training method for training an abnormal sample detection apparatus. An abnormal sample detection apparatus trained by this training apparatus and training method can more accurately distinguish normal samples from abnormal samples.
In order to achieve the object of the present disclosure, according to one aspect of the present disclosure, there is provided a training device for training an abnormal sample detection device, the training device including: a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data, wherein the first reconstruction unit and the back-end processing unit are jointly trained based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error.
According to another aspect of the present disclosure, there is provided an abnormal sample detection apparatus including a trained first reconstruction unit and a back-end processing unit obtained by training of the training apparatus according to the above-described aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a training method for training an abnormal sample detection apparatus, the training method including: a first reconstruction step of generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data by a first reconstruction unit; a back-end processing step of generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and a joint training step for performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion on the first reconstruction error and the second reconstruction error.
According to another aspect of the present disclosure, a computer program is provided that is capable of implementing the training method described above. Furthermore, a computer program product in the form of at least a computer readable medium is provided, having computer program code recorded thereon for implementing the training method described above.
The abnormal sample detection device according to the technology disclosed herein is trained on normal samples, and the training process makes full use of the information extracted by the front-end reconstruction model, so that normal samples and abnormal samples can be distinguished more accurately.
Drawings
The above and other objects, features and advantages of the present disclosure will be more readily understood by reference to the following description of embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram illustrating a training apparatus for training an abnormal sample detection apparatus according to the present disclosure;
FIG. 2 is a schematic diagram illustrating a first reconstruction unit implemented using a deep convolutional auto-encoder, according to an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating a training apparatus according to a first embodiment of the present disclosure;
FIG. 4 is a schematic diagram showing the construction of a training apparatus according to a first embodiment of the present disclosure;
FIG. 5 is an operational flow diagram illustrating a second reconstruction unit implemented using a long short-term memory (LSTM) model according to a first embodiment of the present disclosure;
FIG. 6A is a graph illustrating a probability distribution of a first reconstruction error of a DCAE;
FIG. 6B is a diagram showing the joint distribution of the first reconstruction error e and the intermediate feature data h of the DCAE;
FIG. 7 is a schematic diagram showing the construction of a training apparatus according to a second embodiment of the present disclosure;
fig. 8 is a graph illustrating a method for predicting a second reconstruction error according to a second embodiment of the present disclosure;
FIG. 9 is a flow chart illustrating a training method for training an abnormal sample detection apparatus according to an embodiment of the present disclosure; and
FIG. 10 is a block diagram illustrating the structure of a general-purpose machine that may be used to implement a training apparatus and a training method according to embodiments of the present disclosure.
Detailed Description
Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the accompanying illustrative drawings. When elements of the drawings are denoted by reference numerals, the same elements will be denoted by the same reference numerals although the same elements are shown in different drawings. Further, in the following description of the present disclosure, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure unclear.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," and "having," when used in this specification, are intended to specify the presence of stated features, entities, operations, and/or components, but do not preclude the presence or addition of one or more other features, entities, operations, and/or components.
Unless otherwise defined, all terms used herein including technical and scientific terms have the same meaning as commonly understood by one of ordinary skill in the art to which the inventive concept belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. The present disclosure may be practiced without some or all of these specific details. In other instances, to avoid obscuring the disclosure with unnecessary detail, only components that are germane to the aspects in accordance with the disclosure are shown in the drawings, while other details that are not germane to the disclosure are omitted.
Hereinafter, a training apparatus and a training method for training an abnormal sample detection apparatus according to each embodiment of the present disclosure will be described in detail with reference to the accompanying drawings.
< first embodiment >
First, a training apparatus 100 for training an abnormal sample detection apparatus according to a first embodiment of the present disclosure will be described with reference to fig. 1 to 6B.
Fig. 1 is a block diagram illustrating a training apparatus 100 for training an abnormal sample detection apparatus according to the present disclosure.
As shown in fig. 1, the training apparatus 100 may include: a first reconstruction unit 101 for generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data, and a back-end processing unit 102 for generating a second reconstruction error based on the first reconstruction error and intermediate feature data. In a training process, joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error. The first reconstruction unit 101 and the back-end processing unit 102, which are finally obtained through the joint training, together constitute an abnormal sample detection apparatus.
According to an embodiment of the present disclosure, the first reconstruction unit 101 may perform a reconstruction operation on the intermediate feature data to output first reconstruction data. Furthermore, according to an embodiment of the present disclosure, the first reconstruction error may be a distance, e.g., a Euclidean distance, between the first reconstruction data and the corresponding training sample data in vector space.
Those skilled in the art will recognize that although embodiments of the present disclosure have been described using the Euclidean distance as an example of the first reconstruction error, the present disclosure is not limited thereto. Indeed, a person skilled in the art may use indices other than the Euclidean distance to measure the difference between the first reconstruction data and the training sample data in vector space, such as the Mahalanobis distance, the cosine distance, etc., all of which shall likewise be covered by the scope of the present disclosure.
According to one embodiment of the disclosure, the dimension of the training sample data is greater than or equal to the dimension of the intermediate feature data. In fact, the first reconstruction unit 101 may perform a feature extraction operation on the training sample data, the intermediate feature data characterizing the extracted features. For example, in the case where the technique according to the present disclosure is applied to image recognition, the training sample data may be normal two-dimensional image data, and the intermediate feature data may be one-dimensional vectors characterizing features of the extracted two-dimensional image data.
Further, in the case where the technique according to the present disclosure is applied to industrial control, the training sample data may be a one-dimensional vector composed of data sensed by each industrial sensor, and at this time, the intermediate feature data may be a one-dimensional vector having a smaller number of elements than the training sample data.
Subsequently, the first reconstruction unit 101 may perform a reconstruction operation based on the intermediate feature data, resulting in first reconstruction data. The first reconstructed data has the same dimensions as the training sample data.
According to an embodiment of the present disclosure, the first reconstruction unit 101 may be implemented by an auto-encoder.
An auto-encoder is a neural network comprising an input layer, a hidden layer and an output layer, each composed of neurons. The auto-encoder can implement compression and decompression of data.
An auto-encoder consists of an encoder and a decoder, both of which essentially perform a transformation of the data. The encoder is used to encode (compress) input data into low-dimensional data, and the decoder is used to decode (decompress) the compressed low-dimensional data into output data. An ideal auto-encoder makes the reconstructed output data identical to the original input data, i.e. the error between the input data and the output data is zero.
Since the auto-encoder is a technique known to those skilled in the art, its details are not further described here for the sake of brevity. Furthermore, those skilled in the art will recognize that although embodiments of the present disclosure implement the first reconstruction unit 101 using an auto-encoder, the present disclosure is not limited thereto. In fact, according to the idea of the present disclosure, a person skilled in the art may use reconstruction models other than the auto-encoder to implement the reconstruction function of the first reconstruction unit, as long as the reconstruction model is capable of extracting the intermediate feature data and calculating the first reconstruction error. All such reconstruction models are intended to be included within the scope of the present disclosure.
Fig. 2 is a schematic diagram illustrating a first reconstruction unit 101 implemented using a deep convolutional auto-encoder according to an embodiment of the present disclosure.
As shown in fig. 2, according to one embodiment of the present disclosure, the first reconstruction unit 101 may be implemented by a deep convolutional auto-encoder (DCAE).
As shown in fig. 2, both the encoder and decoder of DCAE are implemented by a Convolutional Neural Network (CNN), and thus can process complex image data.
Given that CNN is a technique known to those skilled in the art, the details of CNN are not described further herein for the sake of brevity.
Those skilled in the art will recognize that although embodiments of the present disclosure are illustrated by applying a deep convolutional auto-encoder to image data, the present disclosure is not so limited. Different types of auto-encoders may be applied depending on the type of sample data to be processed in a particular application environment. For example, when the abnormal sample detection apparatus according to the present disclosure is applied to an industrial control environment, the input data may be composed of data sensed by various sensors, in which case the technical solution of the present disclosure may be implemented using a sparse auto-encoder; all of these technical solutions should be covered within the scope of the present disclosure.
For example, as shown in fig. 2, the encoder and decoder of the DCAE constituting the first reconstruction unit 101 each include several hidden layers, such as convolutional layers, pooling layers and a fully-connected layer on the encoder side, and deconvolution layers, unpooling layers and a fully-connected layer on the decoder side. Features of the spatial information can be learned by the convolutional layers, and information redundancy in the learned feature maps is eliminated by the pooling layers. In addition, in order to obtain a DCAE with strong generalization capability, some additional processing of the training sample data, including noise addition, whitening, cropping and flipping, can be adopted during training. Furthermore, during training the DCAE may employ dropout and regularization to prevent overfitting.
Specifically, for training sample data x input as normal sample data, the encoder of DCAE will perform feature extraction on the training sample data x to obtain low-dimensional intermediate feature data h. In addition, the decoder of the DCAE restores the intermediate feature data h to the first reconstructed data x' whose dimensions coincide with the training sample vector x. The above process can be represented by the following formula (1).
h = f_1(W_1 x + b_1),  x' = f_2(W_2 h + b_2)    (1)
where f_1 and f_2 are activation functions, W_1 is the connection weight matrix of the neurons of the convolutional neural network on the encoder side, b_1 is the bias vector of the neurons of the convolutional neural network on the encoder side, W_2 is the connection weight matrix of the neurons of the convolutional neural network on the decoder side, and b_2 is the bias vector of the neurons of the convolutional neural network on the decoder side. Here W_1 and b_1 are the parameters to be trained on the encoder side, denoted collectively by θ_e, and W_2 and b_2 are the parameters to be trained on the decoder side, denoted collectively by θ_d.
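For illustration, a minimal sketch of a first reconstruction unit of this kind is given below, assuming a PyTorch implementation; the 28x28 single-channel input, the layer sizes and the feature dimension are arbitrary illustrative choices and are not taken from the patent.

```python
import torch
import torch.nn as nn

class DCAE(nn.Module):
    """Minimal deep convolutional auto-encoder sketch for the first reconstruction
    unit: returns the intermediate feature data h and the first reconstruction x'.
    The 28x28 single-channel input and all layer sizes are illustrative only."""
    def __init__(self, feature_dim=32):
        super().__init__()
        # Encoder side: convolution + pooling + fully-connected layer -> h = f_1(W_1 x + b_1)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 28x28 -> 14x14
            nn.Conv2d(16, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 14x14 -> 7x7
            nn.Flatten(),
            nn.Linear(8 * 7 * 7, feature_dim),
        )
        # Decoder side: fully-connected layer + deconvolution -> x' = f_2(W_2 h + b_2)
        self.decoder_fc = nn.Linear(feature_dim, 8 * 7 * 7)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=2, stride=2), nn.ReLU(),    # 7x7 -> 14x14
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2), nn.Sigmoid()  # 14x14 -> 28x28
        )

    def forward(self, x):
        h = self.encoder(x)                                            # intermediate feature data h
        x_rec = self.decoder(self.decoder_fc(h).view(-1, 8, 7, 7))     # first reconstruction data x'
        return h, x_rec
```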
In view of the above-described processing involving DCAE being a technique known to those skilled in the art, for the sake of brevity only the application of DCAE in embodiments of the present disclosure will be described herein without a more detailed description of its principles.
Assume that the training sample set contains N training samples x_i (1 ≤ i ≤ N). The loss function of the DCAE that may be used to characterize the first reconstruction error is represented by the following equation (2):
J_DCAE(θ_e, θ_d) = (1/N) Σ_{i=1}^N ||x_i − x'_i||_2^2    (2)
where the subscript 2 denotes the L2 (Euclidean) norm.
To avoid overfitting and improve generalization ability, according to one embodiment of the present disclosure, a regularization term may be added to the loss function of equation (2) above. Thus, the loss function of the above formula (2) may have a form as shown in the following formula (3).
J_DCAE(θ_e, θ_d) = (1/N) Σ_{i=1}^N ||x_i − x'_i||_2^2 + λ_1 Σ_{j=1}^k θ_j^2    (3)
where θ_j is a parameter to be trained of a fully-connected layer in the convolutional neural networks constituting the encoder and decoder of the DCAE, k is the number of parameters to be trained in the fully-connected layer, and λ_1 is a predetermined hyper-parameter, i.e. a regularization parameter, which may be determined empirically or experimentally. According to one embodiment of the present disclosure, λ_1 may be, for example, 10000.
Since the loss function of DCAE is known to the person skilled in the art, the details thereof will not be described further here for the sake of brevity.
During the joint training of the training apparatus 100, the parameters to be trained of the first reconstruction unit 101 implemented by the DCAE are θ_e and θ_d. Where the loss function contains a regularization term, the parameters to be trained also include the θ_j of the fully-connected layers.
It should be noted here that, for the i-th training sample data x_i of the N training sample data, the first reconstruction unit 101 generates, by performing a reconstruction operation, the first reconstruction data x'_i corresponding to x_i; the difference between the two is the first reconstruction error e_i for the training sample data x_i. Thus, the loss functions of equations (2) and/or (3) may in this sense represent the first reconstruction error over the population of N training sample data. Therefore, the training process that minimizes the first reconstruction error of the first reconstruction unit 101 may be regarded as a process that minimizes the loss function of the first reconstruction unit 101.
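A sketch of how the per-sample first reconstruction error e_i and the regularized loss of equations (2) and (3) might be computed is shown below; the squared Euclidean distance and the default value of 10000 for λ_1 follow the text above, while the function name and interface are assumptions made for illustration.

```python
import torch

def first_reconstruction_loss(x, x_rec, fc_params, lam1=1e4):
    """Per-sample first reconstruction error e_i and the regularized DCAE loss.

    x, x_rec : input samples and their reconstructions, shape (N, ...).
    fc_params: iterable of fully-connected-layer parameters to regularize (the theta_j).
    lam1     : regularization hyper-parameter lambda_1 (10000 is the example value above).
    """
    e = ((x - x_rec) ** 2).flatten(start_dim=1).sum(dim=1)   # e_i = ||x_i - x'_i||_2^2
    loss = e.mean()                                          # equation (2)
    reg = sum((p ** 2).sum() for p in fc_params)             # regularization term of equation (3)
    return e, loss + lam1 * reg
```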
Further, as shown in fig. 2, the first reconstruction unit 101 implemented by DCAE may generate the first reconstruction error e and the intermediate feature data h based on training sample data x whose dimension is greater than or equal to that of the intermediate feature data h.
As described above, the first reconstruction error e may be the difference between the input training sample data and the output first reconstruction data of the first reconstruction unit 101, which may be represented by the Euclidean distance between the training sample data and the first reconstruction data in vector space. The intermediate feature data h represent the features extracted from the training sample data, so both contain some information about the originally input training sample data.
In the prior art, for normal sample data, the first reconstruction unit implemented by DCAE can obtain a better reconstruction effect, that is, the first reconstruction error between the input sample data and the output reconstructed data is smaller. However, when the first reconstruction unit trained by using normal sample data reconstructs abnormal sample data, the obtained first reconstruction error is large, so that the abnormal sample data can be distinguished.
However, for some abnormal sample data, the first reconstruction error obtained by the first reconstruction unit may also be relatively small, and thus may be mistaken for normal sample data, thereby causing the detection of the abnormal sample to fail.
Therefore, the technical scheme according to the present disclosure comprehensively considers the first reconstruction error e and the intermediate feature data h, thereby utilizing the training sample data to the maximum extent.
According to the present disclosure, the back-end processing unit 102 may further process the first reconstruction error e and the intermediate feature data h obtained by the first reconstruction unit 101 to obtain a second reconstruction error e'. In this way, the training apparatus 100 can perform joint training, which takes into account not only information of training sample data included in the first reconstruction error e but also information of training sample data included in the intermediate feature data h, on the first reconstruction unit 101 and the back-end processing unit 102 based on predetermined criteria regarding the first reconstruction error e and the second reconstruction error e', and thus can obtain an abnormal sample detection apparatus with improved detection accuracy by this joint training.
The back-end processing unit 102 according to an embodiment of the present disclosure may process the first reconstruction error e and the intermediate feature data h in various ways.
Fig. 3 is a block diagram illustrating a training apparatus 100 according to a first embodiment of the present disclosure, in which one example of a back-end processing unit 102 is given. Fig. 4 is a schematic diagram showing the configuration of the training apparatus 100 according to the first embodiment of the present disclosure.
As shown in fig. 3, according to the first embodiment of the present disclosure, the back-end processing unit 102 may include a synthesis unit 1021 for generating synthetic data based on the first reconstruction error and the intermediate feature data, and a second reconstruction unit 1022 for generating a second reconstruction error based on the synthetic data, wherein the joint training is performed on the first reconstruction unit 101, the synthesis unit 1021, and the second reconstruction unit 1022 according to a predetermined criterion that minimizes the first reconstruction error and the second reconstruction error.
According to an embodiment of the present disclosure, the synthesis unit 1021 may generate the synthesized data z based on the first reconstruction error e and the intermediate feature data h. For example, the synthesis unit 1021 may concatenate the first reconstruction error e directly with the intermediate feature data h to form the synthesized data z. Typically, the first reconstruction error e is a scalar value and the intermediate feature data h is a one-dimensional vector, so directly concatenating the two forms a new one-dimensional vector as the synthesized data z. However, the present disclosure is not limited thereto. One skilled in the art may combine the first reconstruction error e and the intermediate feature data h in other ways to form the synthesized data z in accordance with the teachings of the present disclosure.
In some cases, the first reconstruction error e may differ significantly from the intermediate feature data h in size and magnitude and thus cannot be combined directly.
In this case, according to one embodiment of the present disclosure, the synthesis unit 1021 may normalize the intermediate feature data h to match the first reconstruction error e in terms of dimension and magnitude, and then may combine the normalized intermediate feature data with the first reconstruction error e into the synthesized data z. For example, the normalization process may be a normalization process of each data element of the intermediate feature data h based on the first reconstruction error e.
Further, the intermediate feature data h is compressed low-dimensional data, which itself has no sequence property. As shown in fig. 4, in a case where the second reconstruction unit 1022 is implemented by a long-short term memory model (LSTM) (described in more detail later), in order to facilitate further processing of the synthesized data z, sequence learning may be performed on the intermediate feature data h, according to an embodiment of the present disclosure.
For example, the sequence learning may be performed according to the following equation (4).
h' = f_3(W_3 · h + b_3)    (4)
where h' is the serialized intermediate feature data obtained by performing sequence learning on the intermediate feature data h, W_3 and b_3 are a connection weight matrix and a bias vector serving as parameters to be learned, and f_3 is an activation function. The parameters W_3 and b_3 used for the sequence learning performed by the synthesis unit 1021 are trained during the joint training of the first reconstruction unit 101, the synthesis unit 1021 and the second reconstruction unit 1022. It should be noted that the synthesis unit 1021 has no loss function of its own; the impact of the parameters W_3 and b_3 on the joint training is reflected in the loss function of the second reconstruction unit 1022.
Subsequently, the resulting serialized intermediate feature data h' subjected to sequence learning is combined with the first reconstruction error e to obtain synthetic data z.
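The following sketch shows one way the synthesis unit 1021 could be realized under the same PyTorch assumption as above: a single linear layer plays the role of W_3 and b_3 in equation (4), tanh is assumed as the activation f_3, and the first reconstruction error e is appended to the serialized feature vector to form z. The class name and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SynthesisUnit(nn.Module):
    """Sketch of the synthesis unit: a linear layer performs the sequence learning
    of equation (4), and the first reconstruction error e is appended to the
    serialized feature vector to form the synthetic data z."""
    def __init__(self, feature_dim=32):
        super().__init__()
        self.seq = nn.Linear(feature_dim, feature_dim)    # W_3, b_3 of equation (4)

    def forward(self, h, e):
        h_seq = torch.tanh(self.seq(h))                   # h' = f_3(W_3 h + b_3), tanh assumed for f_3
        return torch.cat([h_seq, e.unsqueeze(1)], dim=1)  # z = [h', e]
```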
Here, it should be noted that it is not necessary to perform sequence learning on the intermediate feature data h. For example, the second reconstruction unit 1022 may also be implemented by DCAE, in which case the sequence learning need not be performed on the intermediate feature data h.
As shown in fig. 4, the synthesized data z generated by the synthesis unit 1021 may be input into the second reconstruction unit 1022, and the second reconstruction unit 1022 performs a reconstruction operation on the synthesized data z and calculates a second reconstruction error e' from the difference between the resulting second reconstruction data z' and the synthesized data z.
According to an embodiment of the present disclosure, the second reconstruction error e' may be a distance, e.g., a Euclidean distance, between the second reconstruction data z' and the synthetic data z in vector space.
Similar to the case of the first reconstruction error e, although the embodiments of the present disclosure explain the second reconstruction error e' using the Euclidean distance as an example, the present disclosure is not limited thereto. Indeed, a person skilled in the art may use indicators other than the Euclidean distance to measure the difference between the second reconstruction data and the synthetic data, such as the Mahalanobis distance, the cosine distance, etc., all of which shall be covered by the scope of the present disclosure.
As shown in fig. 4, the second reconstruction unit 1022 may be implemented using a Long Short Term Memory (LSTM) model according to an embodiment of the present disclosure.
The LSTM model is a sequential recurrent neural network (RNN) suitable for processing and predicting significant events with very long intervals and delays in sequence features. The LSTM model is able to learn long-range temporal dependencies through its memory cells, which typically comprise four components: an input gate i_t, an output gate o_t, a forget gate f_t and a storage state c_t, where t denotes the current time step. The storage state c_t influences the current state of the other components according to the state of the previous time step. The forget gate f_t can be used to determine which information should be discarded. The above process can be represented by the following formula (5):
i_t = σ(W^{(i,x)} x_t + W^{(i,h)} h_{t−1} + b_i)
f_t = σ(W^{(f,x)} x_t + W^{(f,h)} h_{t−1} + b_f)
g_t = tanh(W^{(g,x)} x_t + W^{(g,h)} h_{t−1} + b_g)
c_t = i_t ⊙ g_t + f_t ⊙ c_{t−1}    (5)
o_t = σ(W^{(o,x)} x_t + W^{(o,h)} h_{t−1} + b_o)
h_t = o_t ⊙ tanh(c_t)
where σ is the sigmoid function, ⊙ denotes element-wise multiplication of vectors, x_t denotes the input at the current time step t, h_t denotes the intermediate state at the current time step t, and o_t denotes the output at the current time step t. The connection weight matrices W^{(i,x)}, W^{(f,x)}, W^{(g,x)}, W^{(o,x)} and the bias vectors b_i, b_f, b_g, b_o are the parameters to be trained, denoted herein by θ_l.
In view of the fact that the LSTM model is known to those skilled in the art, for the sake of brevity, only its application to embodiments of the present disclosure is described herein, without a more detailed description of its principles.
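For clarity, a direct single-step implementation of equation (5) is sketched below; in practice a library LSTM implementation would normally be used, and the parameter naming and shapes here are purely illustrative assumptions.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following equation (5). p is a dict holding the weight
    matrices and bias vectors; vector/matrix shapes are the caller's choice."""
    i_t = torch.sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])   # input gate
    f_t = torch.sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])   # forget gate
    g_t = torch.tanh(p["W_gx"] @ x_t + p["W_gh"] @ h_prev + p["b_g"])      # candidate state
    o_t = torch.sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])   # output gate
    c_t = i_t * g_t + f_t * c_prev          # storage state update
    h_t = o_t * torch.tanh(c_t)             # intermediate state / output
    return h_t, c_t
```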
According to an embodiment of the present disclosure, in order to improve the effect of the reconstruction operation, the second reconstruction unit 1022 implemented by the LSTM model may perform both forward propagation and backward propagation.
Fig. 5 is an operation flow diagram illustrating the second reconstruction unit 1022 implemented using the long-short term memory model according to the first embodiment of the present disclosure.
As shown in fig. 5, the LSTM model implementing the second reconstruction unit 1022 receives the synthetic data z, which comprises the serialized intermediate feature data h' and the first reconstruction error e, and propagates it forward through n LSTM units, where n is equal to the vector length of the serialized intermediate feature data h'.
Furthermore, to improve the effect of the reconstruction operation, the LSTM model also performs backward propagation for reconstruction. In fig. 5, symbols with a tilde (wavy-line) superscript denote forward-propagated reconstruction results, and symbols with a hat (sharp-corner) superscript denote backward-propagated reconstruction results.
Thus, the loss function of the LSTM model implementing the second reconstruction unit 1022 may be represented by the following equation (6):
J_LSTM(θ_l) = (1/N) Σ_{i=1}^N [ Σ_{j=1}^n ( ||h'_{i,j} − h̃'_{i,j}||_2^2 + ||h'_{i,j} − ĥ'_{i,j}||_2^2 ) + λ_2 (e_i − ẽ_i)^2 ]    (6)
where h'_{i,j} denotes the j-th sequence vector of the i-th of the N serialized intermediate feature data h', h̃'_{i,j} is the corresponding forward-propagated intermediate state, and ĥ'_{i,j} is the corresponding backward-propagated intermediate state. Furthermore, e_i denotes the i-th of the N first reconstruction errors, and ẽ_i denotes the corresponding forward-propagated result.
Further, λ_2 is a predetermined hyper-parameter that can be used to adjust the proportions of the serialized intermediate feature data h' and the first reconstruction error e in the resulting second reconstruction error e'; it may be determined empirically or experimentally. For example, λ_2 takes a value in the range of 0.1 to 1.
It should be noted here that, due to the recursive nature of the LSTM model and the physical meaning of the first reconstruction error e, the LSTM performs only forward propagation for the first reconstruction error e.
As described above, for the i-th training sample data x_i of the N training sample data, the corresponding first reconstruction data is x'_i, and the difference between the two is the first reconstruction error e_i of the first reconstruction unit 101 for the training sample data x_i. Further, the first reconstruction unit 101 generates intermediate feature data h_i for the i-th training sample data x_i. The synthesis unit 1021 performs sequence learning on the intermediate feature data h_i to obtain serialized intermediate feature data h'_i, and combines it with the first reconstruction error e_i into synthetic data z_i.
The LSTM model used to implement the second reconstruction unit 1022 generates, for the serialized intermediate feature data h'_i, two reconstructed intermediate feature data by forward and backward propagation respectively, and generates, for the first reconstruction error e_i, a reconstructed first reconstruction error by forward propagation.
In summary, the loss function of equation (6) above may represent the second reconstruction error with respect to the population of N synthetic data in this sense.
As described above, in the training apparatus 100 according to the first embodiment of the present disclosure, the first reconstruction unit 101 may generate the first reconstruction error and the intermediate feature data by performing reconstruction on normal sample data used for training; the synthesis unit 1021 may combine the first reconstruction error and the intermediate feature data into synthesized data; subsequently, the second reconstruction unit 1022 may perform reconstruction on the synthetic data to generate a second reconstruction error, where the first reconstruction error generated by the first reconstruction unit may be generally represented by equation (2) or (3) above, and the second reconstruction error generated by the second reconstruction unit may be generally represented by equation (6) above.
According to the first embodiment of the present disclosure, the predetermined criterion on which the joint training is performed on the first reconstruction unit 101 and the back-end processing unit 102 (e.g., including the synthesis unit 1021 and the second reconstruction unit 1022) is to minimize the sum of both the first reconstruction error and the second reconstruction error. The predetermined criterion may be expressed by an overall loss function of the training apparatus 100 as shown in the following equation (7).
J(θ_e, θ_d, θ_l) = J_DCAE(θ_e, θ_d) + λ_3 J_LSTM(θ_l)    (7)
where λ_3 is a predetermined hyper-parameter, i.e. a weight, which can be used to adjust the proportions of the first reconstruction error e and the second reconstruction error e' in the joint training process. In general, in order to generate representative low-dimensional intermediate feature data and first reconstruction errors, the loss function of the first reconstruction unit 101 should always dominate. Thus, the hyper-parameter λ_3 is usually set to less than 1; for example, its value is in the range of 0.1 to 0.001.
The training apparatus 100 performs joint training of the first reconstruction unit 101 and the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) using training sample data as normal sample data, by a gradient descent method based on the loss function of equation (7) above, until a predetermined number of iterations is reached or until the difference between the results of two or more iterations stabilizes within a predetermined range. The first reconstruction unit 101 and the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) finally obtained by the joint training may constitute an abnormal sample detection apparatus.
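A hypothetical joint-training loop for the first embodiment might look as follows, reusing the module sketches above; Adam stands in for the gradient-descent optimization described in the text, λ_3 = 0.01 is one value within the stated range, and j_lstm_fn is an assumed helper that computes J_LSTM of equation (6) for the second reconstruction unit.

```python
import torch

def train_jointly(dcae, synth, lstm_rec, j_lstm_fn, loader, lam3=0.01, epochs=50):
    """Hypothetical joint-training loop minimizing equation (7).

    dcae, synth, lstm_rec: the three modules to be trained jointly.
    j_lstm_fn: assumed helper computing J_LSTM of equation (6) for synthetic data z.
    loader: iterable yielding batches of normal training samples only.
    """
    params = list(dcae.parameters()) + list(synth.parameters()) + list(lstm_rec.parameters())
    opt = torch.optim.Adam(params)        # stand-in for the gradient descent described above
    for _ in range(epochs):               # or stop once the loss stabilizes
        for x in loader:
            h, x_rec = dcae(x)
            e = ((x - x_rec) ** 2).flatten(start_dim=1).sum(dim=1)  # per-sample first error e_i
            j_dcae = e.mean()                                       # J_DCAE of equation (2)
            z = synth(h, e)                                         # synthetic data z = [h', e]
            j_lstm = j_lstm_fn(lstm_rec, z)                         # J_LSTM of equation (6)
            loss = j_dcae + lam3 * j_lstm                           # overall loss J of equation (7)
            opt.zero_grad()
            loss.backward()
            opt.step()
```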
In the detection of abnormal sample data by the abnormal sample detection apparatus, when the input data is normal sample data, the second reconstruction error e' output by the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) that contains information on the first reconstruction error e output by the first reconstruction unit 101 is smaller than a predetermined threshold. The predetermined threshold may be used to distinguish between normal sample data and abnormal sample data. The predetermined threshold may be determined empirically or experimentally.
Therefore, when sample data to be detected is input, if the second reconstruction error e' output by the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) is not less than the predetermined threshold value, it may be determined that the input sample data is abnormal sample data.
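This detection rule can be sketched as follows; second_error_fn is an assumed helper that runs the trained second reconstruction unit on the synthetic data z and returns e', and the threshold is chosen empirically as the text indicates.

```python
import torch

def is_abnormal(x, dcae, synth, lstm_rec, second_error_fn, threshold):
    """Detection sketch for the first embodiment: a sample is flagged as abnormal
    when the second reconstruction error e' is not less than a chosen threshold."""
    with torch.no_grad():
        h, x_rec = dcae(x)
        e = ((x - x_rec) ** 2).flatten(start_dim=1).sum(dim=1)   # first reconstruction error e
        e2 = second_error_fn(lstm_rec, synth(h, e))              # second reconstruction error e'
    return e2 >= threshold
```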
The idea of the present disclosure is further explained below. Fig. 6A is a diagram illustrating a probability distribution of the first reconstruction error e of the DCAE, and fig. 6B is a diagram illustrating a distribution of the first reconstruction error e and the intermediate feature data h of the DCAE. The dark portions in fig. 6A and 6B correspond to normal samples for training, and the light portions correspond to abnormal samples to be detected.
As shown in fig. 6A, if only the first reconstruction error e of the first reconstruction unit 101 is considered, there is overlap of probability distributions of normal samples and abnormal samples at a portion circled in the figure, and thus the first reconstruction unit 101 cannot accurately identify abnormal samples within the portion. As shown in fig. 6B, considering the first reconstruction error e in further combination with the intermediate feature data h according to the technique of the present disclosure, it can be clearly seen that the normal sample and the abnormal sample can be more accurately distinguished.
Thus, according to the techniques of this disclosure, both the first reconstruction error e and the intermediate feature data h are used in conjunction for training to retain as much information as possible of the normal sample data input for training. Through processing according to the technique of the present disclosure, as shown in fig. 6B, a normal sample and an abnormal sample can be clearly distinguished, thereby improving the accuracy of abnormal sample detection.
The abnormal sample detection apparatus according to the first embodiment of the present disclosure was tested against a classical grayscale image data set MNIST commonly used in the art. The test results are shown in table 1 below.
TABLE 1
(Table 1: Prec, Rec and F1 at various abnormal ratios ρ for the proposed apparatus and for DSEBM, DAGMM and OCSVM; numerical entries not reproduced.)
Where ρ represents an abnormal ratio, and indexes Prec (precision rate), Rec (recall rate), and F1(F value) are indexes commonly used in the existing abnormal sample detection technology to measure the detection performance. The definition is shown in the following formula (8):
Prec = TP / (TP + FP)
Rec = TP / (TP + FN)
F1 = 2 · Prec · Rec / (Prec + Rec)    (8)
TP, FN, FP and TN in the formula (8) represent true positive, false negative, false positive and true negative, respectively.
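For reference, the indexes of equation (8) can be computed from these counts as follows:

```python
def detection_metrics(tp, fn, fp, tn):
    """Precision, recall and F1 as defined in equation (8); tn is unused by these
    three indexes but listed for completeness."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1
```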
The test results in table 1 show that the abnormal sample detection apparatus obtained by the training performed by the training apparatus 100 according to the first embodiment of the present disclosure is superior to the abnormal sample detection apparatuses DSEBM, DAGMM, and OCSVM of the related art in each index for measuring the abnormal sample detection performance.
< second embodiment >
Next, a training apparatus 100 for training an abnormal-sample detecting apparatus according to a second embodiment of the present disclosure will be described with reference to fig. 7 and 8.
The second embodiment of the present disclosure is different from the first embodiment in that the back-end processing unit 102 that performs back-end processing on the first reconstruction error and the intermediate feature data output by the first reconstruction unit 101 is implemented using a prediction mechanism, and therefore, for the sake of brevity, repetitive description of the first reconstruction unit 101 will not be made here.
As described above, if only the first reconstruction error e of the first reconstruction unit 101 is considered, there is overlap of probability distributions of normal samples and abnormal samples at the portion circled in fig. 6A, and thus the first reconstruction unit 101 cannot accurately identify abnormal samples within the portion.
According to the second embodiment of the present disclosure, the back-end processing unit 102 may predict the second reconstruction error e 'based on the first reconstruction error e and the intermediate feature data h, wherein the predetermined criterion for performing the joint training on the first reconstruction unit 101 and the back-end processing unit 102 is to minimize a difference between the second reconstruction error e' and the first reconstruction error e.
According to one embodiment, the back-end processing unit 102 may be implemented by a multi-layer perceptron (MLP).
Fig. 7 is a schematic diagram showing the configuration of the training apparatus 100 according to the second embodiment of the present disclosure.
As shown in fig. 7, the first reconstruction unit 101 of the training apparatus 100 of the second embodiment of the present disclosure is the same as the first reconstruction unit 101 of the first embodiment except that the back-end processing unit 102 is implemented by an MLP.
The MLP is a feed-forward neural network with a hidden layer that can be used to fit complex functions.
The second embodiment of the present disclosure is based on the idea that the second reconstruction error e' can be predicted from the intermediate feature data h by establishing a correspondence relationship between the intermediate feature data h and the first reconstruction error e through training of the back-end processing unit 102 implemented by MLP.
The second reconstruction error e' output by the back-end processing unit 102 of the MLP implementation may be represented by the following equation (9).
e' = f_m(W_m h + b_m)    (9)
where f_m is an activation function, W_m is the connection weight matrix of the neurons of each layer in the MLP, and b_m is the bias vector of the neurons. W_m and b_m are the parameters to be trained of the MLP, denoted herein by θ_m.
The training of the back-end processing unit 102 by MLP may be regarded as establishing a correspondence between the intermediate feature data h and the first reconstruction error e, and the trained back-end processing unit 102 may predict a second reconstruction error e 'for the intermediate feature data, the training aiming at bringing the second reconstruction error e' as close as possible to the corresponding first reconstruction error e. For example, according to one embodiment of the present disclosure, training is performed on the MLP such that the difference between the second reconstruction error e' and the first reconstruction error e is minimal.
In summary, for N training samples, the cost function of the MLP that can be used to generally characterize the difference between the second reconstruction error e' and the first reconstruction error e can be represented by equation (10) below.
J_MLP(θ_m) = (1/N) Σ_{i=1}^N (e'_i − e_i)^2    (10)
In this way, for an MLP trained with training sample data as normal sample data, the predicted second reconstruction error e' is very close to the first reconstruction error e, while for abnormal sample data the difference between the predicted second reconstruction error e' and the corresponding first reconstruction error e is very large, whereby abnormal sample data can be identified.
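A sketch of such an MLP back-end processing unit is shown below, under the same PyTorch assumption as earlier; the hidden width and the use of ReLU are illustrative choices, and mlp_loss corresponds to the mean-squared form of equation (10).

```python
import torch
import torch.nn as nn

class ErrorPredictor(nn.Module):
    """Sketch of an MLP back-end processing unit for the second embodiment: it predicts
    the second reconstruction error e' from the intermediate feature data h, as in
    equation (9). Hidden width and activation are illustrative choices."""
    def __init__(self, feature_dim=32, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, h):
        return self.net(h).squeeze(1)       # predicted second reconstruction error e'

def mlp_loss(e_pred, e):
    """J_MLP of equation (10): mean squared difference between e' and e."""
    return ((e_pred - e) ** 2).mean()
```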
As described above, in the training apparatus 100 according to the second embodiment of the present disclosure, the first reconstruction unit 101 may generate the first reconstruction error and the intermediate feature data by performing reconstruction on normal sample data used for training; the back-end processing unit 102 may establish a correspondence between the first reconstruction error and the intermediate feature data, and predict a second reconstruction error therefrom, wherein the first reconstruction error generated by the first reconstruction unit may be generally expressed by equation (2) or (3) above, and the second reconstruction error generated by the second reconstruction unit may be generally expressed by equation (10) above.
According to a second embodiment of the present disclosure, the predetermined criterion on which the joint training is performed for the first reconstruction unit 101 and the back-end processing unit 102 is such that the difference between the first reconstruction error and the second reconstruction error is minimal. The predetermined criterion may be expressed by an overall loss function of the training apparatus 100 as shown in the following equation (11).
J(θ_e, θ_d, θ_m) = J_DCAE(θ_e, θ_d) + λ_4 J_MLP(θ_m)    (11)
where λ_4 is a predetermined hyper-parameter, i.e. a weight, which may be used to adjust the proportions of the first reconstruction error e and the second reconstruction error e' in the joint training process; it may be determined empirically or experimentally. In general, in order to generate representative low-dimensional intermediate feature data and first reconstruction errors, the loss function of the first reconstruction unit 101 should always dominate. Thus, the hyper-parameter λ_4 is usually set to less than 1; for example, its value is in the range of 0.1 to 0.001.
The training apparatus 100 performs joint training of the first reconstruction unit 101 and the back-end processing unit 102 using training sample data as normal sample data in a gradient descent method based on the loss function of the above equation (11) until a predetermined number of iterations is reached, or until a difference between results of two or more iterations is stabilized within a predetermined range. The first reconstruction unit 101 and the back-end processing unit 102 obtained by the joint training may constitute an abnormal sample detection apparatus.
The principle of the second embodiment of the present disclosure is further explained below with reference to fig. 8. Fig. 8 is a graph illustrating a method for predicting a second reconstruction error e' according to a second embodiment of the present disclosure.
The graph shown in fig. 8 is schematic, which corresponds to fig. 6A. The dark curves in fig. 8 correspond to normal samples used for training, while the light curves correspond to abnormal samples to be detected.
As shown in fig. 8, since the first reconstruction unit 101 performs training using training sample data that is normal sample data, the first reconstruction error is small for normal sample data, and is large for abnormal sample data. However, as shown in fig. 8, the probability distribution curve of the first reconstruction error with respect to the normal sample data intersects the probability distribution curve of the first reconstruction error with respect to the abnormal sample data, resulting in a failure to accurately judge whether the data input to the first reconstruction unit 101 is the normal sample data or the abnormal sample data based on the first reconstruction error within the intersected portion.
Here, as shown in fig. 8, the normal sample data may be divided into two groups: the first group of normal sample data yields a smaller first reconstruction error e_n1 and corresponding intermediate feature data h_n1, while the second group of normal sample data yields a larger first reconstruction error e_n2 and corresponding intermediate feature data h_n2. Through joint training of the first reconstruction unit 101 and the back-end processing unit 102, a correspondence between (h_n1, h_n2) and (e_n1, e_n2) is established; at this point, the differences between the second reconstruction errors e'_n1, e'_n2 predicted by the back-end processing unit 102 and the first reconstruction errors e_n1, e_n2 are each less than some predetermined threshold.
When the trained abnormal sample detection device detects an abnormal sample, there are two cases. The first case is a larger first reconstruction error ea1Since such a large first reconstruction error never occurs in the training phase, the intermediate feature data h is not considereda1How the back-end processing unit 102 predicts the second reconstruction error e'a1Necessarily with the first reconstruction error ea1The difference is large, for example, greater than the predetermined threshold, so that it can be determined as abnormal sample data accordingly.
The second case is a smaller first reconstruction error e_a2 that is close to the larger first reconstruction error e_n2 of the normal sample data. However, the intermediate feature data h_a2 corresponding to the smaller first reconstruction error e_a2 necessarily differs from the intermediate feature data h_n2 corresponding to the larger first reconstruction error e_n2, so that the second reconstruction error e'_a2 predicted by the back-end processing unit 102 necessarily differs greatly from the first reconstruction error e_a2, for example, by more than the predetermined threshold, so that the input can be determined to be abnormal sample data accordingly.
Therefore, by using the back-end processing unit 102 to establish the correspondence between the intermediate feature data h of the normal sample data and the first reconstruction error e, abnormal samples falling in the intersection region of the probability distribution curves of the first reconstruction error for normal sample data and for abnormal sample data can be accurately identified, so that the accuracy of abnormal sample detection is improved.
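To make the detection rule for the two cases above concrete, the following short Python sketch flags a sample as abnormal when the predicted second reconstruction error deviates from the first reconstruction error by more than a threshold; it reuses the hypothetical first_unit and back_end modules from the earlier sketch, and the threshold value tau is an assumed placeholder rather than a value taken from this disclosure.

import torch

@torch.no_grad()
def is_abnormal(x, first_unit, back_end, tau):
    """Returns a boolean per sample: True when |e' - e| exceeds the threshold tau."""
    h, e = first_unit(x)           # first reconstruction error and intermediate feature data
    e_pred = back_end(h, e)        # predicted second reconstruction error e'
    return (e_pred - e).abs() > tau

# example usage with random placeholder data and the modules from the previous sketch
test_batch = torch.rand(8, 784)
flags = is_abnormal(test_batch, first_unit, back_end, tau=0.05)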
The abnormal sample detection apparatus according to the second embodiment of the present disclosure was tested on the classical grayscale image data set MNIST commonly used in the art. The test results are shown in table 2 below.
TABLE 2
The test results in table 2 show that the abnormal sample detection apparatus finally obtained by the training performed by the training apparatus 100 according to the second embodiment of the present disclosure is superior to the abnormal sample detection apparatuses DSEBM, DAGMM, and OCSVM of the related art in each index for measuring the abnormal sample detection performance.
Correspondingly, the disclosure also provides a training method for training the abnormal sample detection device.
Fig. 9 is a flow chart illustrating a training method 900 for training an abnormal sample detection apparatus according to an embodiment of the present disclosure.
The training method 900 begins at step S901. Subsequently, in a first reconstruction step S902, a first reconstruction error and intermediate feature data are generated by the first reconstruction unit based on training sample data that is normal sample data.
The first reconstruction step S902 may be realized by the first reconstruction unit 101 according to the first and second embodiments of the present disclosure.
Subsequently, in the back-end processing step S903, a second reconstruction error is generated by the back-end processing unit based on the first reconstruction error and the intermediate feature data.
The back-end processing step S903 may be realized by the back-end processing unit 102 including the synthesis unit 1021 and the second reconstruction unit 1022 according to the first embodiment of the present disclosure, or by the back-end processing unit 102 implemented by a multi-layer perceptron according to the second embodiment of the present disclosure.
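For the first-embodiment variant mentioned here, the following Python sketch shows one possible way a synthesis unit and a long-short term memory (LSTM) based second reconstruction unit could be composed; the normalization scheme, layer sizes, and the treatment of the synthetic data as a one-dimensional sequence are assumptions introduced for illustration, not the exact architecture of the first embodiment.

import torch
import torch.nn as nn

class SynthesisUnit(nn.Module):
    """Scales the intermediate feature data h toward the magnitude of the first reconstruction
    error e (one possible reading of 'match'), then appends e to form the synthetic data."""
    def forward(self, h, e):
        h_norm = h / (h.norm(dim=1, keepdim=True) + 1e-8) * e.unsqueeze(1)
        return torch.cat([h_norm, e.unsqueeze(1)], dim=1)      # shape: (batch, hid_dim + 1)

class SecondReconstructionUnit(nn.Module):
    """Treats the synthetic data as a sequence, reconstructs it with an LSTM encoder-decoder,
    and returns the per-sample second reconstruction error e'."""
    def __init__(self, hidden=16):
        super().__init__()
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTM(input_size=hidden, hidden_size=1, batch_first=True)

    def forward(self, s):
        seq = s.unsqueeze(-1)                                   # (batch, seq_len, 1)
        enc_out, _ = self.encoder(seq)
        recon, _ = self.decoder(enc_out)
        recon = recon.squeeze(-1)
        return ((s - recon) ** 2).mean(dim=1)                   # second reconstruction error

# example: synthetic data built from an assumed batch of h (dimension 32) and e
h, e = torch.rand(4, 32), torch.rand(4)
s = SynthesisUnit()(h, e)
e_second = SecondReconstructionUnit()(s)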
Next, in a joint training step S904, joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria regarding the first reconstruction error and the second reconstruction error.
The joint training performed in the joint training step S904 may be iterative training performed by a gradient descent method on the training sample data based on an overall loss function, where the number of iterations may be set to a predetermined number or determined according to a criterion such as the difference between the results of two or more iterations being stable within a predetermined range.
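As a small illustration of this stopping criterion, the loop below runs gradient-descent iterations until either a fixed iteration budget is exhausted or the change in the overall loss between consecutive iterations falls within a tolerance; the names train_one_epoch, max_iters, and tol are placeholders introduced for this sketch, and checking only consecutive iterations is a simplification of "two or more iterations".

def run_joint_training(train_one_epoch, max_iters=100, tol=1e-4):
    """Iterates until the budget is reached or the loss difference between iterations stabilizes."""
    prev_loss = None
    for it in range(max_iters):
        loss = train_one_epoch()     # one pass of joint gradient descent; returns the overall loss
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return it + 1                    # number of iterations actually performed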
Finally, the training method 900 ends at step S905.
Although the embodiments of the present disclosure are described above by taking image data as an example, it is obvious to those skilled in the art that the embodiments of the present disclosure can be applied to other abnormal sample detection fields as well, such as industrial control, network intrusion detection, pathology detection, financial risk identification, video monitoring, and the like.
FIG. 10 is a block diagram illustrating the structure of a general-purpose machine 1000 that may be used to implement a training apparatus and a training method according to embodiments of the present disclosure. General purpose machine 1000 may be, for example, a computer system. It should be noted that the general purpose machine 1000 is only one example and is not intended to suggest any limitation as to the scope of use or functionality of the methods and apparatus of the present disclosure. Neither should the general machine 1000 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the above-described training apparatus or method.
In fig. 10, a Central Processing Unit (CPU) 1001 executes various processes in accordance with a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. Data necessary when the CPU 1001 executes the various processes is also stored in the RAM 1003 as needed. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another via a bus 1004. An input/output interface 1005 is also connected to the bus 1004.
The following components are also connected to the input/output interface 1005: an input section 1006 (including a keyboard, a mouse, and the like), an output section 1007 (including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like), a storage section 1008 (including a hard disk and the like), and a communication section 1009 (including a network interface card such as a LAN card, a modem, and the like). The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 may also be connected to the input/output interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory can be mounted on the drive 1010 as needed, so that a computer program read out therefrom can be installed into the storage section 1008.
In the case where the above-described series of processes is realized by software, a program constituting the software may be installed from a network such as the internet or from a storage medium such as the removable medium 1011.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 1011 shown in fig. 10, in which the program is stored and which is distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 1011 include a magnetic disk (including a flexible disk), an optical disk (including a compact disc read only memory (CD-ROM) and a Digital Versatile Disc (DVD)), a magneto-optical disk (including a mini-disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1002, a hard disk included in the storage section 1008, or the like, in which the program is stored and which is distributed to the user together with the device including it.
In addition, the present disclosure also provides a program product storing machine-readable instruction codes. When the instruction codes are read and executed by a machine, the training method according to the present disclosure can be performed. Accordingly, the various storage media listed above for carrying such a program product are also included within the scope of the present disclosure.
Specific embodiments of apparatus and/or methods according to embodiments of the present disclosure have been described in detail above through block diagrams, flowcharts, and/or examples. When such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be apparent to those skilled in the art that each function and/or operation in such block diagrams, flowcharts, and/or examples can be implemented, individually and/or collectively, by a variety of hardware, software, firmware, or virtually any combination thereof. In one embodiment, portions of the subject matter described in this specification can be implemented by Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), or other integrated forms. However, those skilled in the art will recognize that some aspects of the embodiments described in this specification can be equivalently implemented, in whole or in part, in integrated circuits, in the form of one or more computer programs running on one or more computers (e.g., in the form of one or more computer programs running on one or more computer systems), in the form of one or more programs running on one or more processors (e.g., in the form of one or more programs running on one or more microprocessors), in firmware, or in virtually any combination thereof, and, in light of the present disclosure, designing the circuits and/or writing the software and/or firmware code is well within the ability of those skilled in the art.
It should be emphasized that the term "comprises/comprising", when used herein, specifies the presence of stated features, elements, steps, or components, but does not preclude the presence or addition of one or more other features, elements, steps, or components. The terms "first", "second", and the like, used as ordinal numbers, do not denote an order of execution or importance of the features, elements, steps, or components defined by these terms, but are used merely to identify these features, elements, steps, or components for clarity of description.
In summary, in the embodiments according to the present disclosure, the present disclosure provides the following schemes, but is not limited thereto:
scheme 1. a training apparatus for training an abnormal sample detection apparatus, the training apparatus comprising:
a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and
a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data,
wherein joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error.
Scheme 2. the training apparatus of scheme 1, wherein the first reconstruction unit is implemented by an auto-encoder.
Scheme 3. the training apparatus of scheme 2, wherein the first reconstruction unit is implemented by a deep convolutional auto-encoder.
Scheme 4. the training apparatus of scheme 1, wherein the first reconstruction unit is further configured to output first reconstruction data, the first reconstruction error being a distance of the first reconstruction data from the training sample data in vector space.
Scheme 5. the training apparatus of scheme 1, wherein the vector dimension of the training sample data is greater than or equal to the vector dimension of the intermediate feature data.
Scheme 6. the training apparatus of scheme 1, wherein the back-end processing unit comprises:
a synthesis unit configured to generate synthetic data based on the first reconstruction error and the intermediate feature data; and
a second reconstruction unit configured to generate the second reconstruction error based on the synthetic data,
wherein the predetermined criterion is to minimize a sum of the first reconstruction error and the second reconstruction error.
Scheme 7. the training apparatus of scheme 6, wherein the second reconstruction unit is implemented by a long-short term memory model.
Scheme 8. the training apparatus of scheme 6, wherein the second reconstruction unit is configured to output second reconstruction data, and the second reconstruction error is a distance of the second reconstruction data from the synthetic data in a vector space.
Scheme 9. the training apparatus of scheme 6, wherein the synthesis unit normalizes the intermediate feature data to match the first reconstruction error.
Scheme 10. the training apparatus according to scheme 7, wherein the synthesis unit performs sequence learning on the intermediate feature data.
Scheme 11. the training apparatus of scheme 6, wherein the loss function of the first reconstruction unit and the loss function of the second reconstruction unit are weighted and summed to obtain a total loss function, the joint training being performed based on the total loss function.
Scheme 12. the training apparatus of scheme 11, wherein the loss function of the second reconstruction unit implemented by the long-short term memory model is obtained by performing both forward propagation and backward propagation of the long-short term memory model.
Scheme 13. the training apparatus of scheme 11, wherein the weight of the loss function of the first reconstruction unit is larger than the weight of the loss function of the second reconstruction unit.
Scheme 14. the training apparatus of scheme 1, wherein the back-end processing unit is configured to predict the second reconstruction error based on the first reconstruction error and the intermediate feature data, and
wherein the predetermined criterion is to minimize a difference between the second reconstruction error and the first reconstruction error.
Scheme 15. the training apparatus of scheme 14, wherein the back-end processing unit is implemented by a multi-layer perceptron.
Scheme 16. the training apparatus of scheme 14, wherein the loss function of the first reconstruction unit and the loss function of the back-end processing unit are weighted and summed to obtain a total loss function, and the total loss function is used for performing the joint training.
Scheme 17. the training apparatus of scheme 14, wherein the weight of the loss function of the first reconstruction unit is greater than the weight of the loss function of the back-end processing unit.
Scheme 18. an abnormal sample detection apparatus comprising a trained first reconstruction unit and a trained back-end processing unit obtained through training by the training apparatus according to any one of schemes 1 to 17.
Scheme 19. a training method for training an abnormal sample detection apparatus, the training method comprising:
a first reconstruction step of generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data by a first reconstruction unit;
a back-end processing step for generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and
a joint training step of performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion on the first reconstruction error and the second reconstruction error.
Scheme 20. a computer readable storage medium having stored thereon a computer program which, when executed by a computer, implements the training method of scheme 19.
While the disclosure has been disclosed by the description of the specific embodiments thereof, it will be appreciated that those skilled in the art will be able to devise various modifications, improvements, or equivalents of the disclosure within the spirit and scope of the appended claims. Such modifications, improvements and equivalents are also intended to be included within the scope of the present disclosure.

Claims (19)

1. A training apparatus for training an abnormal sample detection apparatus, the training apparatus comprising:
a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and
a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data,
wherein joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error.
2. Training apparatus according to claim 1, wherein the first reconstruction unit is implemented by an auto-encoder.
3. Training apparatus according to claim 2, wherein the first reconstruction unit is implemented by a deep convolutional auto-encoder.
4. Training apparatus according to claim 1, wherein the first reconstruction unit is further configured to output first reconstruction data, the first reconstruction error being a distance of the first reconstruction data from the training sample data in vector space.
5. The training apparatus of claim 1, wherein the vector dimension of the training sample data is greater than or equal to the vector dimension of the intermediate feature data.
6. The training apparatus of claim 1, wherein the back-end processing unit comprises:
a synthesis unit configured to generate synthetic data based on the first reconstruction error and the intermediate feature data; and
a second reconstruction unit configured to generate the second reconstruction error based on the synthetic data,
wherein the predetermined criterion is to minimize a sum of the first reconstruction error and the second reconstruction error.
7. The training apparatus of claim 6, wherein the second reconstruction unit is implemented by a long-short term memory model.
8. The training apparatus of claim 6, wherein the second reconstruction unit is configured to output second reconstruction data, and the second reconstruction error is a distance in vector space of the second reconstruction data from the synthetic data.
9. The training apparatus according to claim 6, wherein the synthesis unit normalizes the intermediate feature data to match the first reconstruction error.
10. The training apparatus according to claim 7, wherein the synthesizing unit performs sequence learning on the intermediate feature data.
11. Training apparatus as claimed in claim 6, wherein the loss function of the first reconstruction unit and the loss function of the second reconstruction unit are weighted and summed to obtain a total loss function, the joint training being performed on the basis of the total loss function.
12. The training apparatus of claim 11, wherein the loss function of the second reconstruction unit implemented by a long-short term memory model is obtained by performing both forward propagation and backward propagation of the long-short term memory model.
13. Training apparatus according to claim 11, wherein the weight of the loss function of the first reconstruction unit is larger than the weight of the loss function of the second reconstruction unit.
14. The training apparatus of claim 1, wherein the back-end processing unit is configured to predict the second reconstruction error based on the first reconstruction error and the intermediate feature data, and
wherein the predetermined criterion is to minimize a difference between the second reconstruction error and the first reconstruction error.
15. The training apparatus of claim 14, wherein the back-end processing unit is implemented by a multi-layered perceptron.
16. The training apparatus as defined in claim 14, wherein the loss function of the first reconstruction unit and the loss function of the back-end processing unit are weighted and summed to obtain an overall loss function, the overall loss function being used for performing the joint training.
17. The training apparatus of claim 14, wherein the weight of the loss function of the first reconstruction unit is greater than the weight of the loss function of the back-end processing unit.
18. An abnormal sample detection apparatus comprising a trained first reconstruction unit and a back-end processing unit obtained by training of the training apparatus according to any one of claims 1 to 17.
19. A training method for training an abnormal sample detection apparatus, the training method comprising:
a first reconstruction step of generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data by a first reconstruction unit;
a back-end processing step for generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and
a joint training step of performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion on the first reconstruction error and the second reconstruction error.
CN201811067951.5A 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof Active CN110895705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811067951.5A CN110895705B (en) 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811067951.5A CN110895705B (en) 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof

Publications (2)

Publication Number Publication Date
CN110895705A true CN110895705A (en) 2020-03-20
CN110895705B CN110895705B (en) 2024-05-14

Family

ID=69785281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811067951.5A Active CN110895705B (en) 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof

Country Status (1)

Country Link
CN (1) CN110895705B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593269A (en) * 2008-05-29 2009-12-02 汉王科技股份有限公司 Face identification device and method
CN102708576A (en) * 2012-05-18 2012-10-03 西安电子科技大学 Method for reconstructing partitioned images by compressive sensing on the basis of structural dictionaries
US20140270353A1 (en) * 2013-03-14 2014-09-18 Xerox Corporation Dictionary design for computationally efficient video anomaly detection via sparse reconstruction techniques
US20150055783A1 (en) * 2013-05-24 2015-02-26 University Of Maryland Statistical modelling, interpolation, measurement and anthropometry based prediction of head-related transfer functions
US20160188574A1 (en) * 2014-12-25 2016-06-30 Clarion Co., Ltd. Intention estimation equipment and intention estimation system
CN106033548A (en) * 2015-03-13 2016-10-19 中国科学院西安光学精密机械研究所 Crowd abnormity detection method based on improved dictionary learning
CN104915686A (en) * 2015-07-03 2015-09-16 电子科技大学 NMF-based target detection method
CN105608478A (en) * 2016-03-30 2016-05-25 苏州大学 Combined method and system for extracting and classifying features of images
CN106203495A (en) * 2016-07-01 2016-12-07 广东技术师范学院 A kind of based on the sparse method for tracking target differentiating study
CN106778558A (en) * 2016-12-02 2017-05-31 电子科技大学 A kind of facial age estimation method based on depth sorting network
CN106803248A (en) * 2016-12-18 2017-06-06 南京邮电大学 Fuzzy license plate image blur evaluation method
WO2018120043A1 (en) * 2016-12-30 2018-07-05 华为技术有限公司 Image reconstruction method and apparatus
CN107679859A (en) * 2017-07-18 2018-02-09 ***股份有限公司 A kind of Risk Identification Method and system based on Transfer Depth study
CN107729393A (en) * 2017-09-20 2018-02-23 齐鲁工业大学 File classification method and system based on mixing autocoder deep learning
CN107870321A (en) * 2017-11-03 2018-04-03 电子科技大学 Radar range profile's target identification method based on pseudo label study
CN108009571A (en) * 2017-11-16 2018-05-08 苏州大学 A kind of semi-supervised data classification method of new direct-push and system
CN108399396A (en) * 2018-03-20 2018-08-14 深圳职业技术学院 A kind of face identification method based on kernel method and linear regression

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554146A (en) * 2020-04-26 2021-10-26 华为技术有限公司 Method for verifying labeled data, method and device for model training
WO2021139236A1 (en) * 2020-06-30 2021-07-15 平安科技(深圳)有限公司 Autoencoder-based anomaly detection method, apparatus and device, and storage medium
CN112379269A (en) * 2020-10-14 2021-02-19 武汉蔚来能源有限公司 Battery abnormity detection model training and detection method and device thereof
CN112379269B (en) * 2020-10-14 2024-03-05 武汉蔚来能源有限公司 Battery abnormality detection model training and detection method and device thereof
CN112287816A (en) * 2020-10-28 2021-01-29 西安交通大学 Dangerous working area accident automatic detection and alarm method based on deep learning
CN112819156A (en) * 2021-01-26 2021-05-18 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment
CN113469234A (en) * 2021-06-24 2021-10-01 成都卓拙科技有限公司 Network flow abnormity detection method based on model-free federal meta-learning

Also Published As

Publication number Publication date
CN110895705B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN110895705B (en) Abnormal sample detection device, training device and training method thereof
CN110909046B (en) Time-series abnormality detection method and device, electronic equipment and storage medium
Yoon et al. Semi-supervised learning with deep generative models for asset failure prediction
CN114297936A (en) Data anomaly detection method and device
US20200234086A1 (en) Systems for modeling uncertainty in multi-modal retrieval and methods thereof
Ma et al. Degradation prognosis for proton exchange membrane fuel cell based on hybrid transfer learning and intercell differences
CN114239725B (en) Electric larceny detection method for data poisoning attack
CN112101400A (en) Industrial control system abnormality detection method, equipment, server and storage medium
CN116522265A (en) Industrial Internet time sequence data anomaly detection method and device
CN115587335A (en) Training method of abnormal value detection model, abnormal value detection method and system
Fu et al. MCA-DTCN: A novel dual-task temporal convolutional network with multi-channel attention for first prediction time detection and remaining useful life prediction
CN115525896A (en) Malicious software detection method utilizing dynamic graph attention network
CN117150402A (en) Power data anomaly detection method and model based on generation type countermeasure network
Almasoud et al. Parkinson’s detection using RNN-graph-LSTM with optimization based on speech signals
Lu et al. Quality-relevant feature extraction method based on teacher-student uncertainty autoencoder and its application to soft sensors
Qin et al. CSCAD: Correlation structure-based collective anomaly detection in complex system
CN113642084A (en) Tunnel surrounding rock pressure prediction method and device for slurry balance shield and storage medium
CN117041972A (en) Channel-space-time attention self-coding based anomaly detection method for vehicle networking sensor
CN116628612A (en) Unsupervised anomaly detection method, device, medium and equipment
CN115691654B (en) Method for predicting antibacterial peptide of quantum gate-controlled circulating neural network based on fewer parameters
CN116628444A (en) Water quality early warning method based on improved meta-learning
CN116610973A (en) Sensor fault monitoring and failure information reconstruction method and system
Yu et al. Time series reconstruction using a bidirectional recurrent neural network based encoder-decoder scheme
Xu et al. A multi-task learning-based generative adversarial network for red tide multivariate time series imputation
Zheng et al. Multi‐channel response reconstruction using transformer based generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant