CN110895705A - Abnormal sample detection device, training device and training method thereof


Publication number: CN110895705A
Application number: CN201811067951.5A
Authority: CN (China)
Prior art keywords: reconstruction, training, data, reconstruction error, unit
Other languages: Chinese (zh)
Other versions: CN110895705B (granted publication)
Inventors: 庞占中, 于小亿, 孙俊
Assignee (original and current): Fujitsu Ltd
Application filed by Fujitsu Ltd; priority to CN201811067951.5A
Legal status: Granted, active


Classifications

    • G06F18/2433 — Pattern recognition; analysing; classification techniques relating to the number of classes; single-class perspective, e.g. one-against-all classification; novelty detection; outlier detection
    • G06F18/2413 — Pattern recognition; analysing; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches; based on distances to training or reference patterns

Landscapes: Engineering & Computer Science; Data Mining & Analysis; Theoretical Computer Science; Computer Vision & Pattern Recognition; Bioinformatics & Cheminformatics; Bioinformatics & Computational Biology; Artificial Intelligence; Evolutionary Biology; Evolutionary Computation; Physics & Mathematics; General Engineering & Computer Science; General Physics & Mathematics; Life Sciences & Earth Sciences; Image Analysis

Abstract

The present disclosure relates to a training device and a training method for training an abnormal sample detection device, and to an abnormal sample detection device. A training apparatus according to the present disclosure includes a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data, wherein the first reconstruction unit and the back-end processing unit are jointly trained based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error. The abnormal sample detection device comprises the jointly trained first reconstruction unit and back-end processing unit. Compared with the prior art, this abnormal sample detection device offers improved abnormal sample detection performance.

Description

Abnormal sample detection device, training device and training method thereof
Technical Field
The present invention relates generally to the field of classification and detection, and more particularly, to a training apparatus and a training method for training an abnormal sample detection apparatus, as well as to an abnormal sample detection apparatus trained by the training apparatus and the training method.
Background
The purpose of abnormal sample detection is to identify abnormal samples that deviate from normal samples. Abnormal sample detection has important practical value and a wide range of applications. For example, it may be applied to industrial control, network intrusion detection, pathology detection, financial risk identification, video monitoring, and the like.
With the continuous development of artificial intelligence technology, deep learning has been applied to the problem of abnormal sample detection. However, the particular nature of the abnormal sample detection problem presents a significant challenge to deep learning. First, the purpose of abnormal sample detection is to distinguish normal samples from abnormal samples; however, unlike in conventional classification settings, abnormal samples occur infrequently, which makes it difficult to collect enough abnormal samples for classification training. For example, when abnormal sample detection is applied to detecting anomalies in the operating temperature of an industrial machine, the operating temperature may be abnormal only once or twice in several days of collected data, and the collected abnormal temperature samples are insufficient for classification training.
Furthermore, even if enough abnormal samples are collected, it is impossible to acquire complete knowledge of the abnormal samples. For example, in video monitoring, suppose that abnormal situations such as bicycles and motor vehicles appearing in a pedestrian street are to be monitored. The types of abnormal samples in the actual scene may, however, go beyond those anticipated in advance. For example, if only the presence of bicycles and motor vehicles is predefined as the abnormal sample class, it is difficult to judge whether objects such as skateboards, roller skates or tricycles appearing in the monitored scene are normal samples or abnormal samples.
At present, the solution to the above problem is based on the following idea: since the abnormal sample class cannot be completely defined, only the normal sample class is defined, and thus any sample that does not belong to the normal sample class is defined as belonging to the abnormal sample class.
Current abnormal sample detection techniques include reconstruction-error-based abnormal sample detection techniques (e.g., SROSR (Sparse Representation-based Open Set Recognition)), probability-density-based abnormal sample detection techniques (e.g., DAGMM (Deep Autoencoding Gaussian Mixture Model)), energy-based abnormal sample detection techniques (e.g., DSEBM (Deep Structured Energy-Based Model)), and the like. Among these conventional abnormal sample detection techniques, reconstruction-error-based detection is widely used because it is simple and performs well.
In particular, the reconstruction error refers to an error between an input sample of a reconstruction model and a reconstructed sample, wherein the reconstruction model is capable of compressing the input sample to extract feature data and reconstructing the input sample based on the extracted feature data. For the reconstruction model, the smaller the reconstruction error between the input sample and the reconstruction sample is, the better the reconstruction effect of the reconstruction model is.
During training, the reconstruction model uses only normal samples; that is, the reconstruction model learns only how to reconstruct normal samples. The trained reconstruction model therefore produces a small reconstruction error for a normal sample, whereas for an abnormal sample, which the model has never learned to reconstruct, it produces a large reconstruction error. The reconstruction model can thus distinguish normal samples from abnormal samples according to the magnitude of the reconstruction error, thereby realizing abnormal sample detection.
However, reconstruction models of the prior art suffer from the following problem in practical applications: some abnormal samples differ only slightly from normal samples and are therefore difficult to identify correctly. There is consequently still a need for an abnormal sample detection technique that can more accurately distinguish between normal samples and abnormal samples.
Disclosure of Invention
In order to further improve abnormal sample detection performance, an abnormal sample detection technique is proposed which uses only normal samples as training data, reconstructs the normal samples with a front-end reconstruction model, and applies further back-end processing to the information extracted by the front-end reconstruction model, with the back-end processing being trained jointly with the front-end reconstruction model.
A brief summary of the disclosure is provided below in order to provide a basic understanding of some aspects of the disclosure. It should be understood that this summary is not an exhaustive overview of the disclosure. It is not intended to identify key or critical elements of the disclosure or to delineate the scope of the disclosure. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
An object of the present disclosure is to provide a training apparatus and a training method for training an abnormal sample detection apparatus. An abnormal sample detection apparatus trained by this training apparatus and training method can more accurately distinguish normal samples from abnormal samples.
In order to achieve the object of the present disclosure, according to one aspect of the present disclosure, there is provided a training device for training an abnormal sample detection device, the training device including: a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data, wherein the first reconstruction unit and the back-end processing unit are jointly trained based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error.
According to another aspect of the present disclosure, there is provided an abnormal sample detection apparatus including a trained first reconstruction unit and a back-end processing unit obtained by training of the training apparatus according to the above-described aspect of the present disclosure.
According to another aspect of the present disclosure, there is provided a training method for training an abnormal sample detection apparatus, the training method including: a first reconstruction step of generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data by a first reconstruction unit; a back-end processing step of generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and a joint training step for performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion on the first reconstruction error and the second reconstruction error.
According to another aspect of the present disclosure, a computer program is provided that is capable of implementing the training method described above. Furthermore, a computer program product in the form of at least a computer readable medium is provided, having computer program code recorded thereon for implementing the training method described above.
The abnormal sample detection device according to the technology disclosed herein is trained on normal samples, and the training process makes full use of the information extracted by the front-end reconstruction model, so that normal samples and abnormal samples can be distinguished more accurately.
Drawings
The above and other objects, features and advantages of the present disclosure will be more readily understood by reference to the following description of embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a block diagram illustrating a training apparatus for training an abnormal sample detection apparatus according to the present disclosure;
FIG. 2 is a schematic diagram illustrating a first reconstruction unit implemented using a deep convolutional auto-encoder, according to an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating a training apparatus according to a first embodiment of the present disclosure;
FIG. 4 is a schematic diagram showing the construction of a training apparatus according to a first embodiment of the present disclosure;
FIG. 5 is an operational flow diagram illustrating a second reconstruction unit implemented using a long short-term memory (LSTM) model according to a first embodiment of the present disclosure;
FIG. 6A is a graph illustrating a probability distribution of a first reconstruction error of a DCAE;
FIG. 6B is a diagram showing the joint distribution of the first reconstruction error e and the intermediate feature data h of the DCAE;
FIG. 7 is a schematic diagram showing the construction of a training apparatus according to a second embodiment of the present disclosure;
fig. 8 is a graph illustrating a method for predicting a second reconstruction error according to a second embodiment of the present disclosure;
FIG. 9 is a flow chart illustrating a training method for training an abnormal sample detection apparatus according to an embodiment of the present disclosure; and
FIG. 10 is a block diagram illustrating the structure of a general-purpose machine that may be used to implement a training apparatus and a training method according to embodiments of the present disclosure.
Detailed Description
Hereinafter, some embodiments of the present disclosure will be described in detail with reference to the accompanying illustrative drawings. When elements of the drawings are denoted by reference numerals, the same elements will be denoted by the same reference numerals although the same elements are shown in different drawings. Further, in the following description of the present disclosure, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure unclear.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," and "having," when used in this specification, are intended to specify the presence of stated features, entities, operations, and/or components, but do not preclude the presence or addition of one or more other features, entities, operations, and/or components.
Unless otherwise defined, all terms used herein including technical and scientific terms have the same meaning as commonly understood by one of ordinary skill in the art to which the inventive concept belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. The present disclosure may be practiced without some or all of these specific details. In other instances, to avoid obscuring the disclosure with unnecessary detail, only components that are germane to the aspects in accordance with the disclosure are shown in the drawings, while other details that are not germane to the disclosure are omitted.
Hereinafter, a training apparatus and a training method for training an abnormal sample detection apparatus according to each embodiment of the present disclosure will be described in detail with reference to the accompanying drawings.
< first embodiment >
First, a training apparatus 100 for training an abnormal sample detection apparatus according to a first embodiment of the present disclosure will be described with reference to fig. 1 to 6B.
Fig. 1 is a block diagram illustrating a training apparatus 100 for training an abnormal sample detection apparatus according to the present disclosure.
As shown in fig. 1, the training apparatus 100 may include: a first reconstruction unit 101 for generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data, and a back-end processing unit 102 for generating a second reconstruction error based on the first reconstruction error and intermediate feature data. In a training process, joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error. The first reconstruction unit 101 and the back-end processing unit 102, which are finally obtained through the joint training, together constitute an abnormal sample detection apparatus.
According to an embodiment of the present disclosure, the first reconstruction unit 101 may perform a reconstruction operation on the intermediate feature data to output first reconstruction data. Furthermore, according to an embodiment of the present disclosure, the first reconstruction error may be a distance, e.g., a Euclidean distance, between the first reconstruction data and the corresponding training sample data in vector space.
Those skilled in the art will recognize that although embodiments of the present disclosure have been described using the Euclidean distance as an example of the first reconstruction error, the present disclosure is not limited thereto. Indeed, a person skilled in the art may use indices other than the Euclidean distance to measure the difference between the first reconstruction data and the training sample data in vector space, such as the Mahalanobis distance, the cosine distance, etc., all of which shall likewise be covered by the scope of the present disclosure.
According to one embodiment of the disclosure, the dimension of the training sample data is greater than or equal to the dimension of the intermediate feature data. In fact, the first reconstruction unit 101 may perform a feature extraction operation on the training sample data, the intermediate feature data characterizing the extracted features. For example, in the case where the technique according to the present disclosure is applied to image recognition, the training sample data may be normal two-dimensional image data, and the intermediate feature data may be one-dimensional vectors characterizing features of the extracted two-dimensional image data.
Further, in the case where the technique according to the present disclosure is applied to industrial control, the training sample data may be a one-dimensional vector composed of data sensed by each industrial sensor, and at this time, the intermediate feature data may be a one-dimensional vector having a smaller number of elements than the training sample data.
Subsequently, the first reconstruction unit 101 may perform a reconstruction operation based on the intermediate feature data, resulting in first reconstruction data. The first reconstructed data has the same dimensions as the training sample data.
According to an embodiment of the present disclosure, the first reconstruction unit 101 may be implemented by an auto-encoder.
An auto-encoder is a neural network comprising an input layer, a hidden layer and an output layer, each composed of neurons. The auto-encoder can implement compression and decompression of data.
An auto-encoder consists of an encoder and a decoder, both of which essentially perform a transformation of the data. The encoder is used to encode (compress) input data into low-dimensional data, and the decoder is used to decode (decompress) the compressed low-dimensional data into output data. An ideal auto-encoder makes the reconstructed output data identical to the original input data, i.e. the error between the input data and the output data is zero.
Since the auto-encoder is a technique known to those skilled in the art, its details are not further described here for the sake of brevity. Furthermore, those skilled in the art will recognize that although embodiments of the present disclosure implement the first reconstruction unit 101 using an auto-encoder, the present disclosure is not limited thereto. In fact, according to the idea of the present disclosure, a person skilled in the art may use reconstruction models other than the auto-encoder to implement the reconstruction function of the first reconstruction unit, as long as the reconstruction model is capable of extracting the intermediate feature data and calculating the first reconstruction error. All such reconstruction models are intended to be included within the scope of the present disclosure.
Fig. 2 is a schematic diagram illustrating a first reconstruction unit 101 implemented using a deep convolutional auto-encoder according to an embodiment of the present disclosure.
As shown in fig. 2, according to one embodiment of the present disclosure, the first reconstruction unit 101 may be implemented by a deep convolutional auto-encoder (DCAE).
As shown in fig. 2, both the encoder and decoder of DCAE are implemented by a Convolutional Neural Network (CNN), and thus can process complex image data.
Given that CNN is a technique known to those skilled in the art, the details of CNN are not described further herein for the sake of brevity.
Those skilled in the art will recognize that although embodiments of the present disclosure are illustrated by applying a deep convolutional auto-encoder to image data, the present disclosure is not so limited. Different types of auto-encoders may be applied depending on the type of sample data to be processed in a particular application environment. For example, when the abnormal sample detection apparatus according to the present disclosure is applied to an industrial control environment, the input data may be composed of data sensed by various sensors, in which case the technical solution of the present disclosure may be implemented using a sparse auto-encoder; all of these technical solutions should be covered within the scope of the present disclosure.
For example, as shown in fig. 2, the encoder and decoder of the DCAE constituting the first reconstruction unit 101 each include several hidden layers, such as convolutional layers, pooling layers and a fully-connected layer on the encoder side, and deconvolution layers, unpooling layers and a fully-connected layer on the decoder side. Features of the spatial information can be learned by the convolutional layers, and information redundancy in the learned feature maps is eliminated by the pooling layers. In addition, in order to obtain a DCAE with strong generalization capability, some additional processing of the training sample data, including noise addition, whitening, cropping and flipping, can be adopted during training. Furthermore, during training the DCAE may employ dropout and regularization to prevent overfitting.
Specifically, for training sample data x input as normal sample data, the encoder of DCAE will perform feature extraction on the training sample data x to obtain low-dimensional intermediate feature data h. In addition, the decoder of the DCAE restores the intermediate feature data h to the first reconstructed data x' whose dimensions coincide with the training sample vector x. The above process can be represented by the following formula (1).
h = f_1(W_1 x + b_1),  x' = f_2(W_2 h + b_2)    (1)
where f_1 and f_2 are activation functions, W_1 is the connection weight matrix of the neurons of the convolutional neural network on the encoder side, b_1 is the bias vector of the neurons of the convolutional neural network on the encoder side, W_2 is the connection weight matrix of the neurons of the convolutional neural network on the decoder side, and b_2 is the bias vector of the neurons of the convolutional neural network on the decoder side. Here W_1 and b_1 are the parameters to be trained on the encoder side, denoted collectively by θ_e, and W_2 and b_2 are the parameters to be trained on the decoder side, denoted collectively by θ_d.
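For illustration, a minimal sketch of a first reconstruction unit of this kind is given below, assuming a PyTorch implementation; the 28x28 single-channel input, the layer sizes and the feature dimension are arbitrary illustrative choices and are not taken from the patent.

```python
import torch
import torch.nn as nn

class DCAE(nn.Module):
    """Minimal deep convolutional auto-encoder sketch for the first reconstruction
    unit: returns the intermediate feature data h and the first reconstruction x'.
    The 28x28 single-channel input and all layer sizes are illustrative only."""
    def __init__(self, feature_dim=32):
        super().__init__()
        # Encoder side: convolution + pooling + fully-connected layer -> h = f_1(W_1 x + b_1)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 28x28 -> 14x14
            nn.Conv2d(16, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 14x14 -> 7x7
            nn.Flatten(),
            nn.Linear(8 * 7 * 7, feature_dim),
        )
        # Decoder side: fully-connected layer + deconvolution -> x' = f_2(W_2 h + b_2)
        self.decoder_fc = nn.Linear(feature_dim, 8 * 7 * 7)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, kernel_size=2, stride=2), nn.ReLU(),    # 7x7 -> 14x14
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2), nn.Sigmoid()  # 14x14 -> 28x28
        )

    def forward(self, x):
        h = self.encoder(x)                                            # intermediate feature data h
        x_rec = self.decoder(self.decoder_fc(h).view(-1, 8, 7, 7))     # first reconstruction data x'
        return h, x_rec
```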
In view of the above-described processing involving DCAE being a technique known to those skilled in the art, for the sake of brevity only the application of DCAE in embodiments of the present disclosure will be described herein without a more detailed description of its principles.
Assume that the training sample set contains N training samples x_i (1 ≤ i ≤ N). The loss function of the DCAE that may be used to characterize the first reconstruction error is represented by the following equation (2):
J_DCAE(θ_e, θ_d) = (1/N) Σ_{i=1}^N ||x_i − x'_i||_2^2    (2)
where the subscript 2 denotes the L2 (Euclidean) norm.
To avoid overfitting and improve generalization ability, according to one embodiment of the present disclosure, a regularization term may be added to the loss function of equation (2) above. Thus, the loss function of the above formula (2) may have a form as shown in the following formula (3).
J_DCAE(θ_e, θ_d) = (1/N) Σ_{i=1}^N ||x_i − x'_i||_2^2 + λ_1 Σ_{j=1}^k θ_j^2    (3)
where θ_j is a parameter to be trained of a fully-connected layer in the convolutional neural networks constituting the encoder and decoder of the DCAE, k is the number of parameters to be trained in the fully-connected layer, and λ_1 is a predetermined hyper-parameter, i.e. a regularization parameter, which may be determined empirically or experimentally. According to one embodiment of the present disclosure, λ_1 may be, for example, 10000.
Since the loss function of DCAE is known to the person skilled in the art, the details thereof will not be described further here for the sake of brevity.
During the joint training of the training apparatus 100, the parameters to be trained of the first reconstruction unit 101 implemented by the DCAE are θ_e and θ_d. Where the loss function contains a regularization term, the parameters to be trained also include the θ_j of the fully-connected layers.
It should be noted here that, for the i-th training sample data x_i of the N training sample data, the first reconstruction unit 101 generates, by performing a reconstruction operation, the first reconstruction data x'_i corresponding to x_i; the difference between the two is the first reconstruction error e_i for the training sample data x_i. Thus, the loss functions of equations (2) and/or (3) may in this sense represent the first reconstruction error over the population of N training sample data. Therefore, the training process that minimizes the first reconstruction error of the first reconstruction unit 101 may be regarded as a process that minimizes the loss function of the first reconstruction unit 101.
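A sketch of how the per-sample first reconstruction error e_i and the regularized loss of equations (2) and (3) might be computed is shown below; the squared Euclidean distance and the default value of 10000 for λ_1 follow the text above, while the function name and interface are assumptions made for illustration.

```python
import torch

def first_reconstruction_loss(x, x_rec, fc_params, lam1=1e4):
    """Per-sample first reconstruction error e_i and the regularized DCAE loss.

    x, x_rec : input samples and their reconstructions, shape (N, ...).
    fc_params: iterable of fully-connected-layer parameters to regularize (the theta_j).
    lam1     : regularization hyper-parameter lambda_1 (10000 is the example value above).
    """
    e = ((x - x_rec) ** 2).flatten(start_dim=1).sum(dim=1)   # e_i = ||x_i - x'_i||_2^2
    loss = e.mean()                                          # equation (2)
    reg = sum((p ** 2).sum() for p in fc_params)             # regularization term of equation (3)
    return e, loss + lam1 * reg
```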
Further, as shown in fig. 2, the first reconstruction unit 101 implemented by DCAE may generate the first reconstruction error e and the intermediate feature data h based on training sample data x whose dimension is greater than or equal to that of the intermediate feature data h.
As described above, the first reconstruction error e may be the difference between the input training sample data and the output first reconstruction data of the first reconstruction unit 101, which may be represented by the Euclidean distance between the training sample data and the first reconstruction data in vector space. The intermediate feature data h represent the features extracted from the training sample data, so both contain some information about the originally input training sample data.
In the prior art, for normal sample data, the first reconstruction unit implemented by DCAE can obtain a better reconstruction effect, that is, the first reconstruction error between the input sample data and the output reconstructed data is smaller. However, when the first reconstruction unit trained by using normal sample data reconstructs abnormal sample data, the obtained first reconstruction error is large, so that the abnormal sample data can be distinguished.
However, for some abnormal sample data, the first reconstruction error obtained by the first reconstruction unit may also be relatively small, and thus may be mistaken for normal sample data, thereby causing the detection of the abnormal sample to fail.
Therefore, the technical scheme according to the present disclosure comprehensively considers the first reconstruction error e and the intermediate feature data h, thereby utilizing the training sample data to the maximum extent.
According to the present disclosure, the back-end processing unit 102 may further process the first reconstruction error e and the intermediate feature data h obtained by the first reconstruction unit 101 to obtain a second reconstruction error e'. In this way, the training apparatus 100 can perform joint training, which takes into account not only information of training sample data included in the first reconstruction error e but also information of training sample data included in the intermediate feature data h, on the first reconstruction unit 101 and the back-end processing unit 102 based on predetermined criteria regarding the first reconstruction error e and the second reconstruction error e', and thus can obtain an abnormal sample detection apparatus with improved detection accuracy by this joint training.
The back-end processing unit 102 according to an embodiment of the present disclosure may process the first reconstruction error e and the intermediate feature data h in various ways.
Fig. 3 is a block diagram illustrating a training apparatus 100 according to a first embodiment of the present disclosure, in which one example of a back-end processing unit 102 is given. Fig. 4 is a schematic diagram showing the configuration of the training apparatus 100 according to the first embodiment of the present disclosure.
As shown in fig. 3, according to the first embodiment of the present disclosure, the back-end processing unit 102 may include a synthesis unit 1021 for generating synthetic data based on the first reconstruction error and the intermediate feature data, and a second reconstruction unit 1022 for generating a second reconstruction error based on the synthetic data, wherein the joint training is performed on the first reconstruction unit 101, the synthesis unit 1021, and the second reconstruction unit 1022 according to a predetermined criterion that minimizes the first reconstruction error and the second reconstruction error.
According to an embodiment of the present disclosure, the synthesis unit 1021 may generate the synthesized data z based on the first reconstruction error e and the intermediate feature data h. For example, the synthesis unit 1021 may concatenate the first reconstruction error e directly with the intermediate feature data h to form the synthesized data z. Typically, the first reconstruction error e is a scalar value and the intermediate feature data h is a one-dimensional vector, so directly concatenating the two forms a new one-dimensional vector as the synthesized data z. However, the present disclosure is not limited thereto. One skilled in the art may combine the first reconstruction error e and the intermediate feature data h in other ways to form the synthesized data z in accordance with the teachings of the present disclosure.
In some cases, the first reconstruction error e may differ significantly from the intermediate feature data h in size and magnitude and thus cannot be combined directly.
In this case, according to one embodiment of the present disclosure, the synthesis unit 1021 may normalize the intermediate feature data h to match the first reconstruction error e in terms of dimension and magnitude, and then may combine the normalized intermediate feature data with the first reconstruction error e into the synthesized data z. For example, the normalization process may be a normalization process of each data element of the intermediate feature data h based on the first reconstruction error e.
Further, the intermediate feature data h is compressed low-dimensional data, which itself has no sequence property. As shown in fig. 4, in a case where the second reconstruction unit 1022 is implemented by a long-short term memory model (LSTM) (described in more detail later), in order to facilitate further processing of the synthesized data z, sequence learning may be performed on the intermediate feature data h, according to an embodiment of the present disclosure.
For example, the sequence learning may be performed according to the following equation (4).
h' = f_3(W_3 · h + b_3)    (4)
where h' is the serialized intermediate feature data obtained by performing sequence learning on the intermediate feature data h, W_3 and b_3 are a connection weight matrix and a bias vector serving as parameters to be learned, and f_3 is an activation function. The parameters W_3 and b_3 used for the sequence learning performed by the synthesis unit 1021 are trained during the joint training of the first reconstruction unit 101, the synthesis unit 1021 and the second reconstruction unit 1022. It should be noted that the synthesis unit 1021 has no loss function of its own; the impact of the parameters W_3 and b_3 on the joint training is reflected in the loss function of the second reconstruction unit 1022.
Subsequently, the resulting serialized intermediate feature data h' subjected to sequence learning is combined with the first reconstruction error e to obtain synthetic data z.
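The following sketch shows one way the synthesis unit 1021 could be realized under the same PyTorch assumption as above: a single linear layer plays the role of W_3 and b_3 in equation (4), tanh is assumed as the activation f_3, and the first reconstruction error e is appended to the serialized feature vector to form z. The class name and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class SynthesisUnit(nn.Module):
    """Sketch of the synthesis unit: a linear layer performs the sequence learning
    of equation (4), and the first reconstruction error e is appended to the
    serialized feature vector to form the synthetic data z."""
    def __init__(self, feature_dim=32):
        super().__init__()
        self.seq = nn.Linear(feature_dim, feature_dim)    # W_3, b_3 of equation (4)

    def forward(self, h, e):
        h_seq = torch.tanh(self.seq(h))                   # h' = f_3(W_3 h + b_3), tanh assumed for f_3
        return torch.cat([h_seq, e.unsqueeze(1)], dim=1)  # z = [h', e]
```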
Here, it should be noted that it is not necessary to perform sequence learning on the intermediate feature data h. For example, the second reconstruction unit 1022 may also be implemented by DCAE, in which case the sequence learning need not be performed on the intermediate feature data h.
As shown in fig. 4, the synthesized data z generated by the synthesis unit 1021 may be input into the second reconstruction unit 1022, and the second reconstruction unit 1022 performs a reconstruction operation on the synthesized data z and calculates a second reconstruction error e' from the difference between the resulting second reconstruction data z' and the synthesized data z.
According to an embodiment of the present disclosure, the second reconstruction error e' may be a distance, e.g., a Euclidean distance, between the second reconstruction data z' and the synthetic data z in vector space.
Similar to the case of the first reconstruction error e, although the embodiments of the present disclosure explain the second reconstruction error e' using the Euclidean distance as an example, the present disclosure is not limited thereto. Indeed, a person skilled in the art may use indicators other than the Euclidean distance to measure the difference between the second reconstruction data and the synthetic data, such as the Mahalanobis distance, the cosine distance, etc., all of which shall be covered by the scope of the present disclosure.
As shown in fig. 4, the second reconstruction unit 1022 may be implemented using a Long Short Term Memory (LSTM) model according to an embodiment of the present disclosure.
The LSTM model is a sequential recurrent neural network (RNN) suitable for processing and predicting significant events with very long intervals and delays in sequence features. The LSTM model is able to learn long-range temporal dependencies through its memory cells, which typically comprise four components: an input gate i_t, an output gate o_t, a forget gate f_t and a storage state c_t, where t denotes the current time step. The storage state c_t influences the current state of the other components according to the state of the previous time step. The forget gate f_t can be used to determine which information should be discarded. The above process can be represented by the following formula (5):
i_t = σ(W^{(i,x)} x_t + W^{(i,h)} h_{t−1} + b_i)
f_t = σ(W^{(f,x)} x_t + W^{(f,h)} h_{t−1} + b_f)
g_t = tanh(W^{(g,x)} x_t + W^{(g,h)} h_{t−1} + b_g)
c_t = i_t ⊙ g_t + f_t ⊙ c_{t−1}    (5)
o_t = σ(W^{(o,x)} x_t + W^{(o,h)} h_{t−1} + b_o)
h_t = o_t ⊙ tanh(c_t)
where σ is the sigmoid function, ⊙ denotes element-wise multiplication of vectors, x_t denotes the input at the current time step t, h_t denotes the intermediate state at the current time step t, and o_t denotes the output at the current time step t. The connection weight matrices W^{(i,x)}, W^{(f,x)}, W^{(g,x)}, W^{(o,x)} and the bias vectors b_i, b_f, b_g, b_o are the parameters to be trained, denoted herein by θ_l.
In view of the fact that the LSTM model is known to those skilled in the art, for the sake of brevity, only its application to embodiments of the present disclosure is described herein, without a more detailed description of its principles.
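For clarity, a direct single-step implementation of equation (5) is sketched below; in practice a library LSTM implementation would normally be used, and the parameter naming and shapes here are purely illustrative assumptions.

```python
import torch

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following equation (5). p is a dict holding the weight
    matrices and bias vectors; vector/matrix shapes are the caller's choice."""
    i_t = torch.sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])   # input gate
    f_t = torch.sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])   # forget gate
    g_t = torch.tanh(p["W_gx"] @ x_t + p["W_gh"] @ h_prev + p["b_g"])      # candidate state
    o_t = torch.sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])   # output gate
    c_t = i_t * g_t + f_t * c_prev          # storage state update
    h_t = o_t * torch.tanh(c_t)             # intermediate state / output
    return h_t, c_t
```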
According to an embodiment of the present disclosure, in order to improve the effect of the reconstruction operation, the second reconstruction unit 1022 implemented by the LSTM model may perform both forward propagation and backward propagation.
Fig. 5 is an operation flow diagram illustrating the second reconstruction unit 1022 implemented using the long-short term memory model according to the first embodiment of the present disclosure.
As shown in fig. 5, the LSTM model implementing the second reconstruction unit 1022 receives the synthetic data z, which comprises the serialized intermediate feature data h' and the first reconstruction error e, and propagates it forward through n LSTM units, where n is equal to the vector length of the serialized intermediate feature data h'.
Furthermore, to improve the effect of the reconstruction operation, the LSTM model also performs backward propagation for reconstruction. In fig. 5, symbols with a tilde (wavy-line) superscript denote forward-propagated reconstruction results, and symbols with a hat (sharp-corner) superscript denote backward-propagated reconstruction results.
Thus, the loss function of the LSTM model implementing the second reconstruction unit 1022 may be represented by the following equation (6):
J_LSTM(θ_l) = (1/N) Σ_{i=1}^N [ Σ_{j=1}^n ( ||h'_{i,j} − h̃'_{i,j}||_2^2 + ||h'_{i,j} − ĥ'_{i,j}||_2^2 ) + λ_2 (e_i − ẽ_i)^2 ]    (6)
where h'_{i,j} denotes the j-th sequence vector of the i-th of the N serialized intermediate feature data h', h̃'_{i,j} is the corresponding forward-propagated intermediate state, and ĥ'_{i,j} is the corresponding backward-propagated intermediate state. Furthermore, e_i denotes the i-th of the N first reconstruction errors, and ẽ_i denotes the corresponding forward-propagated result.
Further, λ_2 is a predetermined hyper-parameter that can be used to adjust the proportions of the serialized intermediate feature data h' and the first reconstruction error e in the resulting second reconstruction error e'; it may be determined empirically or experimentally. For example, λ_2 takes a value in the range of 0.1 to 1.
It should be noted here that, due to the recursive nature of the LSTM model and the physical meaning of the first reconstruction error e, the LSTM performs only forward propagation for the first reconstruction error e.
As described above, for the i-th training sample data x_i of the N training sample data, the corresponding first reconstruction data is x'_i, and the difference between the two is the first reconstruction error e_i of the first reconstruction unit 101 for the training sample data x_i. Further, the first reconstruction unit 101 generates intermediate feature data h_i for the i-th training sample data x_i. The synthesis unit 1021 performs sequence learning on the intermediate feature data h_i to obtain serialized intermediate feature data h'_i, and combines it with the first reconstruction error e_i into synthetic data z_i.
The LSTM model used to implement the second reconstruction unit 1022 generates, for the serialized intermediate feature data h'_i, two reconstructed intermediate feature data by forward and backward propagation respectively, and generates, for the first reconstruction error e_i, a reconstructed first reconstruction error by forward propagation.
In summary, the loss function of equation (6) above may represent the second reconstruction error with respect to the population of N synthetic data in this sense.
As described above, in the training apparatus 100 according to the first embodiment of the present disclosure, the first reconstruction unit 101 may generate the first reconstruction error and the intermediate feature data by performing reconstruction on normal sample data used for training; the synthesis unit 1021 may combine the first reconstruction error and the intermediate feature data into synthesized data; subsequently, the second reconstruction unit 1022 may perform reconstruction on the synthetic data to generate a second reconstruction error, where the first reconstruction error generated by the first reconstruction unit may be generally represented by equation (2) or (3) above, and the second reconstruction error generated by the second reconstruction unit may be generally represented by equation (6) above.
According to the first embodiment of the present disclosure, the predetermined criterion on which the joint training is performed on the first reconstruction unit 101 and the back-end processing unit 102 (e.g., including the synthesis unit 1021 and the second reconstruction unit 1022) is to minimize the sum of both the first reconstruction error and the second reconstruction error. The predetermined criterion may be expressed by an overall loss function of the training apparatus 100 as shown in the following equation (7).
J(θ_e, θ_d, θ_l) = J_DCAE(θ_e, θ_d) + λ_3 J_LSTM(θ_l)    (7)
where λ_3 is a predetermined hyper-parameter, i.e. a weight, which can be used to adjust the proportions of the first reconstruction error e and the second reconstruction error e' in the joint training process. In general, in order to generate representative low-dimensional intermediate feature data and first reconstruction errors, the loss function of the first reconstruction unit 101 should always dominate. Thus, the hyper-parameter λ_3 is usually set to less than 1; for example, its value is in the range of 0.1 to 0.001.
The training apparatus 100 performs joint training of the first reconstruction unit 101 and the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) using training sample data as normal sample data, by a gradient descent method based on the loss function of equation (7) above, until a predetermined number of iterations is reached or until the difference between the results of two or more iterations stabilizes within a predetermined range. The first reconstruction unit 101 and the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) finally obtained by the joint training may constitute an abnormal sample detection apparatus.
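A hypothetical joint-training loop for the first embodiment might look as follows, reusing the module sketches above; Adam stands in for the gradient-descent optimization described in the text, λ_3 = 0.01 is one value within the stated range, and j_lstm_fn is an assumed helper that computes J_LSTM of equation (6) for the second reconstruction unit.

```python
import torch

def train_jointly(dcae, synth, lstm_rec, j_lstm_fn, loader, lam3=0.01, epochs=50):
    """Hypothetical joint-training loop minimizing equation (7).

    dcae, synth, lstm_rec: the three modules to be trained jointly.
    j_lstm_fn: assumed helper computing J_LSTM of equation (6) for synthetic data z.
    loader: iterable yielding batches of normal training samples only.
    """
    params = list(dcae.parameters()) + list(synth.parameters()) + list(lstm_rec.parameters())
    opt = torch.optim.Adam(params)        # stand-in for the gradient descent described above
    for _ in range(epochs):               # or stop once the loss stabilizes
        for x in loader:
            h, x_rec = dcae(x)
            e = ((x - x_rec) ** 2).flatten(start_dim=1).sum(dim=1)  # per-sample first error e_i
            j_dcae = e.mean()                                       # J_DCAE of equation (2)
            z = synth(h, e)                                         # synthetic data z = [h', e]
            j_lstm = j_lstm_fn(lstm_rec, z)                         # J_LSTM of equation (6)
            loss = j_dcae + lam3 * j_lstm                           # overall loss J of equation (7)
            opt.zero_grad()
            loss.backward()
            opt.step()
```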
In the detection of abnormal sample data by the abnormal sample detection apparatus, when the input data is normal sample data, the second reconstruction error e' output by the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) that contains information on the first reconstruction error e output by the first reconstruction unit 101 is smaller than a predetermined threshold. The predetermined threshold may be used to distinguish between normal sample data and abnormal sample data. The predetermined threshold may be determined empirically or experimentally.
Therefore, when sample data to be detected is input, if the second reconstruction error e' output by the back-end processing unit 102 (including, for example, the synthesis unit 1021 and the second reconstruction unit 1022) is not less than the predetermined threshold value, it may be determined that the input sample data is abnormal sample data.
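This detection rule can be sketched as follows; second_error_fn is an assumed helper that runs the trained second reconstruction unit on the synthetic data z and returns e', and the threshold is chosen empirically as the text indicates.

```python
import torch

def is_abnormal(x, dcae, synth, lstm_rec, second_error_fn, threshold):
    """Detection sketch for the first embodiment: a sample is flagged as abnormal
    when the second reconstruction error e' is not less than a chosen threshold."""
    with torch.no_grad():
        h, x_rec = dcae(x)
        e = ((x - x_rec) ** 2).flatten(start_dim=1).sum(dim=1)   # first reconstruction error e
        e2 = second_error_fn(lstm_rec, synth(h, e))              # second reconstruction error e'
    return e2 >= threshold
```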
The idea of the present disclosure is further explained below. Fig. 6A is a diagram illustrating a probability distribution of the first reconstruction error e of the DCAE, and fig. 6B is a diagram illustrating a distribution of the first reconstruction error e and the intermediate feature data h of the DCAE. The dark portions in fig. 6A and 6B correspond to normal samples for training, and the light portions correspond to abnormal samples to be detected.
As shown in fig. 6A, if only the first reconstruction error e of the first reconstruction unit 101 is considered, there is overlap of probability distributions of normal samples and abnormal samples at a portion circled in the figure, and thus the first reconstruction unit 101 cannot accurately identify abnormal samples within the portion. As shown in fig. 6B, considering the first reconstruction error e in further combination with the intermediate feature data h according to the technique of the present disclosure, it can be clearly seen that the normal sample and the abnormal sample can be more accurately distinguished.
Thus, according to the techniques of this disclosure, both the first reconstruction error e and the intermediate feature data h are used in conjunction for training to retain as much information as possible of the normal sample data input for training. Through processing according to the technique of the present disclosure, as shown in fig. 6B, a normal sample and an abnormal sample can be clearly distinguished, thereby improving the accuracy of abnormal sample detection.
The abnormal sample detection apparatus according to the first embodiment of the present disclosure was tested against a classical grayscale image data set MNIST commonly used in the art. The test results are shown in table 1 below.
TABLE 1
(Table 1: Prec, Rec and F1 at various abnormal ratios ρ for the proposed apparatus and for DSEBM, DAGMM and OCSVM; numerical entries not reproduced.)
Where ρ represents an abnormal ratio, and indexes Prec (precision rate), Rec (recall rate), and F1(F value) are indexes commonly used in the existing abnormal sample detection technology to measure the detection performance. The definition is shown in the following formula (8):
Prec = TP / (TP + FP)
Rec = TP / (TP + FN)
F1 = 2 · Prec · Rec / (Prec + Rec)    (8)
TP, FN, FP and TN in the formula (8) represent true positive, false negative, false positive and true negative, respectively.
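For reference, the indexes of equation (8) can be computed from these counts as follows:

```python
def detection_metrics(tp, fn, fp, tn):
    """Precision, recall and F1 as defined in equation (8); tn is unused by these
    three indexes but listed for completeness."""
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    return prec, rec, f1
```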
The test results in table 1 show that the abnormal sample detection apparatus obtained by the training performed by the training apparatus 100 according to the first embodiment of the present disclosure is superior to the abnormal sample detection apparatuses DSEBM, DAGMM, and OCSVM of the related art in each index for measuring the abnormal sample detection performance.
< second embodiment >
Next, a training apparatus 100 for training an abnormal-sample detecting apparatus according to a second embodiment of the present disclosure will be described with reference to fig. 7 and 8.
The second embodiment of the present disclosure is different from the first embodiment in that the back-end processing unit 102 that performs back-end processing on the first reconstruction error and the intermediate feature data output by the first reconstruction unit 101 is implemented using a prediction mechanism, and therefore, for the sake of brevity, repetitive description of the first reconstruction unit 101 will not be made here.
As described above, if only the first reconstruction error e of the first reconstruction unit 101 is considered, there is overlap of probability distributions of normal samples and abnormal samples at the portion circled in fig. 6A, and thus the first reconstruction unit 101 cannot accurately identify abnormal samples within the portion.
According to the second embodiment of the present disclosure, the back-end processing unit 102 may predict the second reconstruction error e 'based on the first reconstruction error e and the intermediate feature data h, wherein the predetermined criterion for performing the joint training on the first reconstruction unit 101 and the back-end processing unit 102 is to minimize a difference between the second reconstruction error e' and the first reconstruction error e.
According to one embodiment, the back-end processing unit 102 may be implemented by a multi-layer perceptron (MLP).
Fig. 7 is a schematic diagram showing the configuration of the training apparatus 100 according to the second embodiment of the present disclosure.
As shown in fig. 7, the first reconstruction unit 101 of the training apparatus 100 of the second embodiment of the present disclosure is the same as the first reconstruction unit 101 of the first embodiment except that the back-end processing unit 102 is implemented by an MLP.
The MLP is a feed-forward neural network with a hidden layer that can be used to fit complex functions.
The second embodiment of the present disclosure is based on the idea that the second reconstruction error e' can be predicted from the intermediate feature data h by establishing a correspondence relationship between the intermediate feature data h and the first reconstruction error e through training of the back-end processing unit 102 implemented by MLP.
The second reconstruction error e' output by the back-end processing unit 102 of the MLP implementation may be represented by the following equation (9).
e' = f_m(W_m h + b_m)    (9)
where f_m is an activation function, W_m is the connection weight matrix of the neurons of each layer in the MLP, and b_m is the bias vector of the neurons. W_m and b_m are the parameters to be trained of the MLP, denoted herein by θ_m.
The training of the back-end processing unit 102 by MLP may be regarded as establishing a correspondence between the intermediate feature data h and the first reconstruction error e, and the trained back-end processing unit 102 may predict a second reconstruction error e 'for the intermediate feature data, the training aiming at bringing the second reconstruction error e' as close as possible to the corresponding first reconstruction error e. For example, according to one embodiment of the present disclosure, training is performed on the MLP such that the difference between the second reconstruction error e' and the first reconstruction error e is minimal.
In summary, for N training samples, the cost function of the MLP that can be used to generally characterize the difference between the second reconstruction error e' and the first reconstruction error e can be represented by equation (10) below.
J_MLP(θ_m) = (1/N) Σ_{i=1}^N (e'_i − e_i)^2    (10)
In this way, for an MLP trained with training sample data as normal sample data, the predicted second reconstruction error e' is very close to the first reconstruction error e, while for abnormal sample data the difference between the predicted second reconstruction error e' and the corresponding first reconstruction error e is very large, whereby abnormal sample data can be identified.
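A sketch of such an MLP back-end processing unit is shown below, under the same PyTorch assumption as earlier; the hidden width and the use of ReLU are illustrative choices, and mlp_loss corresponds to the mean-squared form of equation (10).

```python
import torch
import torch.nn as nn

class ErrorPredictor(nn.Module):
    """Sketch of an MLP back-end processing unit for the second embodiment: it predicts
    the second reconstruction error e' from the intermediate feature data h, as in
    equation (9). Hidden width and activation are illustrative choices."""
    def __init__(self, feature_dim=32, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, h):
        return self.net(h).squeeze(1)       # predicted second reconstruction error e'

def mlp_loss(e_pred, e):
    """J_MLP of equation (10): mean squared difference between e' and e."""
    return ((e_pred - e) ** 2).mean()
```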
As described above, in the training apparatus 100 according to the second embodiment of the present disclosure, the first reconstruction unit 101 may generate the first reconstruction error and the intermediate feature data by performing reconstruction on normal sample data used for training; the back-end processing unit 102 may establish a correspondence between the first reconstruction error and the intermediate feature data, and predict a second reconstruction error therefrom, wherein the first reconstruction error generated by the first reconstruction unit may be generally expressed by equation (2) or (3) above, and the second reconstruction error generated by the second reconstruction unit may be generally expressed by equation (10) above.
According to a second embodiment of the present disclosure, the predetermined criterion on which the joint training is performed for the first reconstruction unit 101 and the back-end processing unit 102 is such that the difference between the first reconstruction error and the second reconstruction error is minimal. The predetermined criterion may be expressed by an overall loss function of the training apparatus 100 as shown in the following equation (11).
J(θ_e, θ_d, θ_m) = J_DCAE(θ_e, θ_d) + λ_4 J_MLP(θ_m)    (11)
where λ_4 is a predetermined hyper-parameter, i.e. a weight, which may be used to adjust the proportions of the first reconstruction error e and the second reconstruction error e' in the joint training process; it may be determined empirically or experimentally. In general, in order to generate representative low-dimensional intermediate feature data and first reconstruction errors, the loss function of the first reconstruction unit 101 should always dominate. Thus, the hyper-parameter λ_4 is usually set to less than 1; for example, its value is in the range of 0.1 to 0.001.
The training apparatus 100 performs joint training of the first reconstruction unit 101 and the back-end processing unit 102 using training sample data as normal sample data in a gradient descent method based on the loss function of the above equation (11) until a predetermined number of iterations is reached, or until a difference between results of two or more iterations is stabilized within a predetermined range. The first reconstruction unit 101 and the back-end processing unit 102 obtained by the joint training may constitute an abnormal sample detection apparatus.
The principle of the second embodiment of the present disclosure is further explained below with reference to fig. 8. Fig. 8 is a graph illustrating a method for predicting a second reconstruction error e' according to a second embodiment of the present disclosure.
The graph shown in fig. 8 is schematic, which corresponds to fig. 6A. The dark curves in fig. 8 correspond to normal samples used for training, while the light curves correspond to abnormal samples to be detected.
As shown in fig. 8, since the first reconstruction unit 101 performs training using training sample data that is normal sample data, the first reconstruction error is small for normal sample data, and is large for abnormal sample data. However, as shown in fig. 8, the probability distribution curve of the first reconstruction error with respect to the normal sample data intersects the probability distribution curve of the first reconstruction error with respect to the abnormal sample data, resulting in a failure to accurately judge whether the data input to the first reconstruction unit 101 is the normal sample data or the abnormal sample data based on the first reconstruction error within the intersected portion.
Here, as shown in fig. 8, the normal sample data may be divided into two groups: the first group of normal sample data yields a smaller first reconstruction error e_n1 and corresponding intermediate feature data h_n1, while the second group of normal sample data yields a larger first reconstruction error e_n2 and corresponding intermediate feature data h_n2. Through joint training of the first reconstruction unit 101 and the back-end processing unit 102, a correspondence between (h_n1, h_n2) and (e_n1, e_n2) is established; at this point, the differences between the second reconstruction errors e'_n1, e'_n2 predicted by the back-end processing unit 102 and the first reconstruction errors e_n1, e_n2 are each less than some predetermined threshold.
When the trained abnormal sample detection device detects an abnormal sample, there are two cases. The first case is a larger first reconstruction error ea1Since such a large first reconstruction error never occurs in the training phase, the intermediate feature data h is not considereda1How the back-end processing unit 102 predicts the second reconstruction error e'a1Necessarily with the first reconstruction error ea1The difference is large, for example, greater than the predetermined threshold, so that it can be determined as abnormal sample data accordingly.
The second case is a smaller first reconstruction error e_a2 that is close to the larger first reconstruction error e_n2 of the normal sample data. However, the intermediate feature data h_a2 corresponding to the smaller first reconstruction error e_a2 necessarily differs from the intermediate feature data h_n2 corresponding to the larger first reconstruction error e_n2, so that the second reconstruction error e'_a2 predicted by the back-end processing unit 102 necessarily differs greatly from the first reconstruction error e_a2, for example, by more than the predetermined threshold, so that the input can be determined to be abnormal sample data accordingly.
Therefore, by using the back-end processing unit 102 to establish the correspondence between the intermediate feature data h of the normal sample data and the first reconstruction error e, abnormal samples falling in the intersection region of the probability distribution curves of the first reconstruction error for normal sample data and for abnormal sample data can be accurately identified, so that the accuracy of abnormal sample detection is improved.
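To make the detection rule for the two cases above concrete, the following short Python sketch flags a sample as abnormal when the predicted second reconstruction error deviates from the first reconstruction error by more than a threshold; it reuses the hypothetical first_unit and back_end modules from the earlier sketch, and the threshold value tau is an assumed placeholder rather than a value taken from this disclosure.

import torch

@torch.no_grad()
def is_abnormal(x, first_unit, back_end, tau):
    """Returns a boolean per sample: True when |e' - e| exceeds the threshold tau."""
    h, e = first_unit(x)           # first reconstruction error and intermediate feature data
    e_pred = back_end(h, e)        # predicted second reconstruction error e'
    return (e_pred - e).abs() > tau

# example usage with random placeholder data and the modules from the previous sketch
test_batch = torch.rand(8, 784)
flags = is_abnormal(test_batch, first_unit, back_end, tau=0.05)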
The abnormal sample detection apparatus according to the second embodiment of the present disclosure was tested on the classical grayscale image data set MNIST commonly used in the art. The test results are shown in table 2 below.
TABLE 2
The test results in table 2 show that the abnormal sample detection apparatus finally obtained by the training performed by the training apparatus 100 according to the second embodiment of the present disclosure is superior to the abnormal sample detection apparatuses DSEBM, DAGMM, and OCSVM of the related art in each index for measuring the abnormal sample detection performance.
Correspondingly, the disclosure also provides a training method for training the abnormal sample detection device.
Fig. 9 is a flow chart illustrating a training method 900 for training an abnormal sample detection apparatus according to an embodiment of the present disclosure.
The training method 900 begins at step S901. Subsequently, in a first reconstruction step S902, a first reconstruction error and intermediate feature data are generated by the first reconstruction unit based on training sample data that is normal sample data.
The first reconstruction step S902 may be realized by the first reconstruction unit 101 according to the first and second embodiments of the present disclosure.
Subsequently, in the back-end processing step S903, a second reconstruction error is generated by the back-end processing unit based on the first reconstruction error and the intermediate feature data.
The back-end processing step S903 may be realized by the back-end processing unit 102 including the synthesis unit 1021 and the second reconstruction unit 1022 according to the first embodiment of the present disclosure, or by the back-end processing unit 102 implemented by a multi-layer perceptron according to the second embodiment of the present disclosure.
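For the first-embodiment variant mentioned here, the following Python sketch shows one possible way a synthesis unit and a long-short term memory (LSTM) based second reconstruction unit could be composed; the normalization scheme, layer sizes, and the treatment of the synthetic data as a one-dimensional sequence are assumptions introduced for illustration, not the exact architecture of the first embodiment.

import torch
import torch.nn as nn

class SynthesisUnit(nn.Module):
    """Scales the intermediate feature data h toward the magnitude of the first reconstruction
    error e (one possible reading of 'match'), then appends e to form the synthetic data."""
    def forward(self, h, e):
        h_norm = h / (h.norm(dim=1, keepdim=True) + 1e-8) * e.unsqueeze(1)
        return torch.cat([h_norm, e.unsqueeze(1)], dim=1)      # shape: (batch, hid_dim + 1)

class SecondReconstructionUnit(nn.Module):
    """Treats the synthetic data as a sequence, reconstructs it with an LSTM encoder-decoder,
    and returns the per-sample second reconstruction error e'."""
    def __init__(self, hidden=16):
        super().__init__()
        self.encoder = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.decoder = nn.LSTM(input_size=hidden, hidden_size=1, batch_first=True)

    def forward(self, s):
        seq = s.unsqueeze(-1)                                   # (batch, seq_len, 1)
        enc_out, _ = self.encoder(seq)
        recon, _ = self.decoder(enc_out)
        recon = recon.squeeze(-1)
        return ((s - recon) ** 2).mean(dim=1)                   # second reconstruction error

# example: synthetic data built from an assumed batch of h (dimension 32) and e
h, e = torch.rand(4, 32), torch.rand(4)
s = SynthesisUnit()(h, e)
e_second = SecondReconstructionUnit()(s)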
Next, in a joint training step S904, joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria regarding the first reconstruction error and the second reconstruction error.
The joint training performed in the joint training step S904 may be iterative training performed by a gradient descent method on the training sample data based on an overall loss function, where the number of iterations may be set to a predetermined number or determined according to a criterion such as the difference between the results of two or more iterations being stable within a predetermined range.
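As a small illustration of this stopping criterion, the loop below runs gradient-descent iterations until either a fixed iteration budget is exhausted or the change in the overall loss between consecutive iterations falls within a tolerance; the names train_one_epoch, max_iters, and tol are placeholders introduced for this sketch, and checking only consecutive iterations is a simplification of "two or more iterations".

def run_joint_training(train_one_epoch, max_iters=100, tol=1e-4):
    """Iterates until the budget is reached or the loss difference between iterations stabilizes."""
    prev_loss = None
    for it in range(max_iters):
        loss = train_one_epoch()     # one pass of joint gradient descent; returns the overall loss
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return it + 1                    # number of iterations actually performed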
Finally, the training method 900 ends at step S905.
Although the embodiments of the present disclosure are described above by taking image data as an example, it is obvious to those skilled in the art that the embodiments of the present disclosure can be applied to other abnormal sample detection fields as well, such as industrial control, network intrusion detection, pathology detection, financial risk identification, video monitoring, and the like.
FIG. 10 is a block diagram illustrating the structure of a general-purpose machine 1000 that may be used to implement a training apparatus and a training method according to embodiments of the present disclosure. General purpose machine 1000 may be, for example, a computer system. It should be noted that the general purpose machine 1000 is only one example and is not intended to suggest any limitation as to the scope of use or functionality of the methods and apparatus of the present disclosure. Neither should the general machine 1000 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the above-described training apparatus or method.
In fig. 10, a Central Processing Unit (CPU) 1001 executes various processes in accordance with a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. Data necessary when the CPU 1001 executes the various processes is also stored in the RAM 1003 as needed. The CPU 1001, the ROM 1002, and the RAM 1003 are connected to one another via a bus 1004. An input/output interface 1005 is also connected to the bus 1004.
The following components are also connected to the input/output interface 1005: an input section 1006 (including a keyboard, a mouse, and the like), an output section 1007 (including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like), a storage section 1008 (including a hard disk and the like), and a communication section 1009 (including a network interface card such as a LAN card, a modem, and the like). The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 may also be connected to the input/output interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory can be mounted on the drive 1010 as needed, so that a computer program read out therefrom can be installed into the storage section 1008.
In the case where the above-described series of processes is realized by software, a program constituting the software may be installed from a network such as the internet or from a storage medium such as the removable medium 1011.
It will be understood by those skilled in the art that such a storage medium is not limited to the removable medium 1011 shown in fig. 10, in which the program is stored and which is distributed separately from the apparatus to provide the program to the user. Examples of the removable medium 1011 include a magnetic disk (including a flexible disk), an optical disk (including a compact disc read only memory (CD-ROM) and a Digital Versatile Disc (DVD)), a magneto-optical disk (including a mini-disk (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1002, a hard disk included in the storage section 1008, or the like, in which the program is stored and which is distributed to the user together with the device including it.
In addition, the present disclosure also provides a program product storing machine-readable instruction codes. When the instruction codes are read and executed by a machine, the training method according to the present disclosure can be performed. Accordingly, the various storage media listed above for carrying such a program product are also included within the scope of the present disclosure.
Specific embodiments of apparatus and/or methods according to embodiments of the present disclosure have been described in detail above through block diagrams, flowcharts, and/or examples. When such block diagrams, flowcharts, and/or examples contain one or more functions and/or operations, it will be apparent to those skilled in the art that each function and/or operation in such block diagrams, flowcharts, and/or examples can be implemented, individually and/or collectively, by a variety of hardware, software, firmware, or virtually any combination thereof. In one embodiment, portions of the subject matter described in this specification can be implemented by Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), Digital Signal Processors (DSPs), or other integrated forms. However, those skilled in the art will recognize that some aspects of the embodiments described in this specification can be equivalently implemented, in whole or in part, in integrated circuits, in the form of one or more computer programs running on one or more computers (e.g., in the form of one or more computer programs running on one or more computer systems), in the form of one or more programs running on one or more processors (e.g., in the form of one or more programs running on one or more microprocessors), in firmware, or in virtually any combination thereof, and, in light of the present disclosure, designing the circuits and/or writing the software and/or firmware code is well within the ability of those skilled in the art.
It should be emphasized that the term "comprises/comprising", when used herein, specifies the presence of stated features, elements, steps, or components, but does not preclude the presence or addition of one or more other features, elements, steps, or components. The terms "first", "second", and the like, used as ordinal numbers, do not denote an order of execution or importance of the features, elements, steps, or components defined by these terms, but are used merely to identify these features, elements, steps, or components for clarity of description.
In summary, in the embodiments according to the present disclosure, the present disclosure provides the following schemes, but is not limited thereto:
scheme 1. a training apparatus for training an abnormal sample detection apparatus, the training apparatus comprising:
a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and
a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data,
wherein joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error.
Scheme 2. the training apparatus of scheme 1, wherein the first reconstruction unit is implemented by an auto-encoder.
Scheme 3. the training apparatus of scheme 2, wherein the first reconstruction unit is implemented by a deep convolutional auto-encoder.
Scheme 4. the training apparatus of scheme 1, wherein the first reconstruction unit is further configured to output first reconstruction data, the first reconstruction error being a distance of the first reconstruction data from the training sample data in vector space.
Scheme 5. the training apparatus of scheme 1, wherein the vector dimension of the training sample data is greater than or equal to the vector dimension of the intermediate feature data.
Scheme 6. the training apparatus of scheme 1, wherein the back-end processing unit comprises:
a synthesis unit configured to generate synthetic data based on the first reconstruction error and the intermediate feature data; and
a second reconstruction unit configured to generate the second reconstruction error based on the synthetic data,
wherein the predetermined criterion is to minimize a sum of the first reconstruction error and the second reconstruction error.
Scheme 7. the training apparatus of scheme 6, wherein the second reconstruction unit is implemented by a long-short term memory model.
Scheme 8. the training apparatus of scheme 6, wherein the second reconstruction unit is configured to output second reconstruction data, and the second reconstruction error is a distance of the second reconstruction data from the synthetic data in a vector space.
Scheme 9. the training apparatus of scheme 6, wherein the synthesis unit normalizes the intermediate feature data to match the first reconstruction error.
Scheme 10. the training apparatus according to scheme 7, wherein the synthesis unit performs sequence learning on the intermediate feature data.
Scheme 11. the training apparatus of scheme 6, wherein the loss function of the first reconstruction unit and the loss function of the second reconstruction unit are weighted and summed to obtain a total loss function, the joint training being performed based on the total loss function.
Scheme 12. the training apparatus of scheme 11, wherein the loss function of the second reconstruction unit implemented by the long-short term memory model is obtained by performing both forward propagation and backward propagation of the long-short term memory model.
Scheme 13. the training apparatus of scheme 11, wherein the weight of the loss function of the first reconstruction unit is larger than the weight of the loss function of the second reconstruction unit.
Scheme 14. the training apparatus of scheme 1, wherein the back-end processing unit is configured to predict the second reconstruction error based on the first reconstruction error and the intermediate feature data, and
wherein the predetermined criterion is to minimize a difference between the second reconstruction error and the first reconstruction error.
Scheme 15. the training apparatus of scheme 14, wherein the back-end processing unit is implemented by a multi-layer perceptron.
Scheme 16. the training apparatus of scheme 14, wherein the loss function of the first reconstruction unit and the loss function of the back-end processing unit are weighted and summed to obtain a total loss function, and the total loss function is used for performing the joint training.
Scheme 17. the training apparatus of scheme 14, wherein the weight of the loss function of the first reconstruction unit is greater than the weight of the loss function of the back-end processing unit.
Scheme 18. an abnormal sample detection apparatus comprising a trained first reconstruction unit and a trained back-end processing unit obtained through training by the training apparatus according to any one of schemes 1 to 17.
Scheme 19. a training method for training an abnormal sample detection apparatus, the training method comprising:
a first reconstruction step of generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data by a first reconstruction unit;
a back-end processing step for generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and
a joint training step of performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion on the first reconstruction error and the second reconstruction error.
Scheme 20. a computer readable storage medium having stored thereon a computer program which, when executed by a computer, implements the training method of scheme 19.
While the disclosure has been disclosed by the description of the specific embodiments thereof, it will be appreciated that those skilled in the art will be able to devise various modifications, improvements, or equivalents of the disclosure within the spirit and scope of the appended claims. Such modifications, improvements and equivalents are also intended to be included within the scope of the present disclosure.

Claims (19)

1. A training apparatus for training an abnormal sample detection apparatus, the training apparatus comprising:
a first reconstruction unit configured to generate a first reconstruction error and intermediate feature data based on training sample data as normal sample data; and
a back-end processing unit configured to generate a second reconstruction error based on the first reconstruction error and the intermediate feature data,
wherein joint training is performed on the first reconstruction unit and the back-end processing unit based on predetermined criteria with respect to the first reconstruction error and the second reconstruction error.
2. Training apparatus according to claim 1, wherein the first reconstruction unit is implemented by an auto-encoder.
3. Training apparatus according to claim 2, wherein the first reconstruction unit is implemented by a deep convolutional auto-encoder.
4. Training apparatus according to claim 1, wherein the first reconstruction unit is further configured to output first reconstruction data, the first reconstruction error being a distance of the first reconstruction data from the training sample data in vector space.
5. The training apparatus of claim 1, wherein the vector dimension of the training sample data is greater than or equal to the vector dimension of the intermediate feature data.
6. The training apparatus of claim 1, wherein the back-end processing unit comprises:
a synthesis unit configured to generate synthetic data based on the first reconstruction error and the intermediate feature data; and
a second reconstruction unit configured to generate the second reconstruction error based on the synthetic data,
wherein the predetermined criterion is to minimize a sum of the first reconstruction error and the second reconstruction error.
7. The training apparatus of claim 6, wherein the second reconstruction unit is implemented by a long-short term memory model.
8. The training apparatus of claim 6, wherein the second reconstruction unit is configured to output second reconstruction data, and the second reconstruction error is a distance in vector space of the second reconstruction data from the synthetic data.
9. The training apparatus according to claim 6, wherein the synthesis unit normalizes the intermediate feature data to match the first reconstruction error.
10. The training apparatus according to claim 7, wherein the synthesizing unit performs sequence learning on the intermediate feature data.
11. Training apparatus as claimed in claim 6, wherein the loss function of the first reconstruction unit and the loss function of the second reconstruction unit are weighted and summed to obtain a total loss function, the joint training being performed on the basis of the total loss function.
12. The training apparatus of claim 11, wherein the loss function of the second reconstruction unit implemented by a long-short term memory model is obtained by performing both forward propagation and backward propagation of the long-short term memory model.
13. Training apparatus according to claim 11, wherein the weight of the loss function of the first reconstruction unit is larger than the weight of the loss function of the second reconstruction unit.
14. The training apparatus of claim 1, wherein the back-end processing unit is configured to predict the second reconstruction error based on the first reconstruction error and the intermediate feature data, and
wherein the predetermined criterion is to minimize a difference between the second reconstruction error and the first reconstruction error.
15. The training apparatus of claim 14, wherein the back-end processing unit is implemented by a multi-layered perceptron.
16. The training apparatus as defined in claim 14, wherein the loss function of the first reconstruction unit and the loss function of the back-end processing unit are weighted and summed to obtain an overall loss function, the overall loss function being used for performing the joint training.
17. The training apparatus of claim 14, wherein the weight of the loss function of the first reconstruction unit is greater than the weight of the loss function of the back-end processing unit.
18. An abnormal sample detection apparatus comprising a trained first reconstruction unit and a back-end processing unit obtained by training of the training apparatus according to any one of claims 1 to 17.
19. A training method for training an abnormal sample detection apparatus, the training method comprising:
a first reconstruction step of generating a first reconstruction error and intermediate feature data based on training sample data as normal sample data by a first reconstruction unit;
a back-end processing step for generating, by a back-end processing unit, a second reconstruction error based on the first reconstruction error and the intermediate feature data; and
a joint training step of performing joint training on the first reconstruction unit and the back-end processing unit based on a predetermined criterion on the first reconstruction error and the second reconstruction error.
CN201811067951.5A 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof Active CN110895705B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811067951.5A CN110895705B (en) 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811067951.5A CN110895705B (en) 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof

Publications (2)

Publication Number Publication Date
CN110895705A true CN110895705A (en) 2020-03-20
CN110895705B CN110895705B (en) 2024-05-14

Family

ID=69785281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811067951.5A Active CN110895705B (en) 2018-09-13 2018-09-13 Abnormal sample detection device, training device and training method thereof

Country Status (1)

Country Link
CN (1) CN110895705B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101593269A (en) * 2008-05-29 2009-12-02 汉王科技股份有限公司 Face identification device and method
CN102708576A (en) * 2012-05-18 2012-10-03 西安电子科技大学 Method for reconstructing partitioned images by compressive sensing on the basis of structural dictionaries
US20140270353A1 (en) * 2013-03-14 2014-09-18 Xerox Corporation Dictionary design for computationally efficient video anomaly detection via sparse reconstruction techniques
US20150055783A1 (en) * 2013-05-24 2015-02-26 University Of Maryland Statistical modelling, interpolation, measurement and anthropometry based prediction of head-related transfer functions
US20160188574A1 (en) * 2014-12-25 2016-06-30 Clarion Co., Ltd. Intention estimation equipment and intention estimation system
CN106033548A (en) * 2015-03-13 2016-10-19 中国科学院西安光学精密机械研究所 Crowd abnormity detection method based on improved dictionary learning
CN104915686A (en) * 2015-07-03 2015-09-16 电子科技大学 NMF-based target detection method
CN105608478A (en) * 2016-03-30 2016-05-25 苏州大学 Combined method and system for extracting and classifying features of images
CN106203495A (en) * 2016-07-01 2016-12-07 广东技术师范学院 A kind of based on the sparse method for tracking target differentiating study
CN106778558A (en) * 2016-12-02 2017-05-31 电子科技大学 A kind of facial age estimation method based on depth sorting network
CN106803248A (en) * 2016-12-18 2017-06-06 南京邮电大学 Fuzzy license plate image blur evaluation method
WO2018120043A1 (en) * 2016-12-30 2018-07-05 华为技术有限公司 Image reconstruction method and apparatus
CN107679859A (en) * 2017-07-18 2018-02-09 ***股份有限公司 A kind of Risk Identification Method and system based on Transfer Depth study
CN107729393A (en) * 2017-09-20 2018-02-23 齐鲁工业大学 File classification method and system based on mixing autocoder deep learning
CN107870321A (en) * 2017-11-03 2018-04-03 电子科技大学 Radar range profile's target identification method based on pseudo label study
CN108009571A (en) * 2017-11-16 2018-05-08 苏州大学 A kind of semi-supervised data classification method of new direct-push and system
CN108399396A (en) * 2018-03-20 2018-08-14 深圳职业技术学院 A kind of face identification method based on kernel method and linear regression

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554146A (en) * 2020-04-26 2021-10-26 华为技术有限公司 Method for verifying labeled data, method and device for model training
WO2021139236A1 (en) * 2020-06-30 2021-07-15 平安科技(深圳)有限公司 Autoencoder-based anomaly detection method, apparatus and device, and storage medium
CN112379269A (en) * 2020-10-14 2021-02-19 武汉蔚来能源有限公司 Battery abnormity detection model training and detection method and device thereof
CN112379269B (en) * 2020-10-14 2024-03-05 武汉蔚来能源有限公司 Battery abnormality detection model training and detection method and device thereof
CN112287816A (en) * 2020-10-28 2021-01-29 西安交通大学 Dangerous working area accident automatic detection and alarm method based on deep learning
CN112819156A (en) * 2021-01-26 2021-05-18 支付宝(杭州)信息技术有限公司 Data processing method, device and equipment
CN113469234A (en) * 2021-06-24 2021-10-01 成都卓拙科技有限公司 Network flow abnormity detection method based on model-free federal meta-learning

Also Published As

Publication number Publication date
CN110895705B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN110895705B (en) Abnormal sample detection device, training device and training method thereof
CN110909046B (en) Time-series abnormality detection method and device, electronic equipment and storage medium
Yoon et al. Semi-supervised learning with deep generative models for asset failure prediction
CN114297936A (en) Data anomaly detection method and device
US20200234086A1 (en) Systems for modeling uncertainty in multi-modal retrieval and methods thereof
Ma et al. Degradation prognosis for proton exchange membrane fuel cell based on hybrid transfer learning and intercell differences
CN114239725B (en) Electric larceny detection method for data poisoning attack
CN112101400A (en) Industrial control system abnormality detection method, equipment, server and storage medium
CN116522265A (en) Industrial Internet time sequence data anomaly detection method and device
CN115587335A (en) Training method of abnormal value detection model, abnormal value detection method and system
Fu et al. MCA-DTCN: A novel dual-task temporal convolutional network with multi-channel attention for first prediction time detection and remaining useful life prediction
CN115525896A (en) Malicious software detection method utilizing dynamic graph attention network
CN117150402A (en) Power data anomaly detection method and model based on generation type countermeasure network
Almasoud et al. Parkinson’s detection using RNN-graph-LSTM with optimization based on speech signals
Lu et al. Quality-relevant feature extraction method based on teacher-student uncertainty autoencoder and its application to soft sensors
Qin et al. CSCAD: Correlation structure-based collective anomaly detection in complex system
CN113642084A (en) Tunnel surrounding rock pressure prediction method and device for slurry balance shield and storage medium
CN117041972A (en) Channel-space-time attention self-coding based anomaly detection method for vehicle networking sensor
CN116628612A (en) Unsupervised anomaly detection method, device, medium and equipment
CN115691654B (en) Method for predicting antibacterial peptide of quantum gate-controlled circulating neural network based on fewer parameters
CN116628444A (en) Water quality early warning method based on improved meta-learning
CN116610973A (en) Sensor fault monitoring and failure information reconstruction method and system
Yu et al. Time series reconstruction using a bidirectional recurrent neural network based encoder-decoder scheme
Xu et al. A multi-task learning-based generative adversarial network for red tide multivariate time series imputation
Zheng et al. Multi‐channel response reconstruction using transformer based generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant