CN113780242A - Cross-scene underwater sound target classification method based on model transfer learning - Google Patents

Cross-scene underwater sound target classification method based on model transfer learning

Info

Publication number
CN113780242A
CN113780242A (application number CN202111160013.1A)
Authority
CN
China
Prior art keywords
target
scene
model
cross
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111160013.1A
Other languages
Chinese (zh)
Inventor
王大宇
罗恒光
张博轩
王晓庆
唐立赫
张锦灿
王志欣
李鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 54 Research Institute
Original Assignee
CETC 54 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 54 Research Institute filed Critical CETC 54 Research Institute
Priority to CN202111160013.1A priority Critical patent/CN113780242A/en
Publication of CN113780242A publication Critical patent/CN113780242A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02Preprocessing
    • G06F2218/04Denoising
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Signal Processing (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a cross-scene underwater sound target classification method based on model transfer learning. It belongs to the technical field of passive underwater acoustic target reconnaissance and is suitable for classifying underwater sound target signals received in array form. The method takes the target signal features extracted by time-frequency analysis as shared-feature-space samples for cross-scene classification and trains a convolutional neural network, first encoding the knowledge of the source data domain into a stable classification capability carried by the model parameters, model prior knowledge and model architecture. The model is then transferred to the target scene, where, based on knowledge distillation, the class-relation knowledge in the model is fitted to the new sample and label space to realize the classification capability. Through model transfer learning, the method overcomes the poor generalization of existing underwater sound target classification methods in cross-scene application, adapts quickly to new environments and new tasks under small-sample and data-imbalanced conditions, and greatly reduces the time, effort and resources consumed by large-scale target-domain data collection and model retraining.

Description

Cross-scene underwater sound target classification method based on model transfer learning
Technical Field
The invention belongs to the technical field of passive underwater acoustic target reconnaissance, and particularly relates to a cross-scene underwater sound target classification method based on model transfer learning, which can be used to classify underwater sound target signals received in array form, determine target types, and adapt rapidly to different application scenarios while maintaining good classification capability.
Background
Underwater sound target classification is an information-processing technology that analyzes the target radiated-noise signals received by sonar equipment, extracts target features and determines the target type. The commonly used target classification methods mainly include statistical classification, model matching, expert systems, deep neural networks, and the like. When existing methods are applied across scenes, their generalization capability is weak because of changes in the working environment, differences in target types, small sample sizes and other factors, so the classification model must be retrained; in addition, acquiring and labeling data for a new application scenario takes a long time and consumes substantial resources. This is therefore a major challenge in the current stage of converting these technical results into practice.
Transfer learning aims to solve the learning problem caused by the small amount of data in the target scene by effectively transferring a model from one scene to another. The underlying motivation of model-based transfer learning is that the transferred knowledge is encoded at the model level, in the model parameters, the model prior knowledge and the architecture; once an effective model has been validated in a specific scene, the classification-relation knowledge can be extracted in combination with knowledge distillation to perform discrimination and classification, which greatly reduces the cost of cross-scene transfer. A cross-scene underwater sound target classification and recognition method based on model migration is therefore an effective solution, but such a solution is still lacking in the prior art.
Disclosure of Invention
In view of this, the technical problem to be solved by the present invention is to provide a cross-scene underwater sound target classification method based on model transfer learning, which has the advantages of strong generalization capability, high accuracy and good robustness in cross-scene application.
In order to achieve the purpose, the invention adopts the technical scheme that:
a cross-scene underwater sound target classification method based on model transfer learning comprises the following steps:
(1) filtering and denoising an underwater sound target radiation noise signal received by sonar equipment, filtering out clutter and enhancing the signal to form processed time domain signal data;
(2) performing advanced feature mapping processing on the time domain signal data obtained in the step (1), and extracting a time-frequency spectrogram with inter-class discrimination as cross-scene identification shared space features;
(3) determining hyper-parameters of a convolutional neural network model, marking the class of the time-frequency spectrogram, and training the convolutional neural network model;
(4) introducing a knowledge distillation technology into the convolutional neural network model obtained in the step (3), extracting a target type soft label and training, so that class relation knowledge in the model is fitted to a classification performance target in a new sample and label space;
(5) carrying out cross-scene underwater sound target classification by using the convolutional neural network model trained in the step (4);
and completing the cross-scene underwater sound target classification based on model transfer learning.
Specifically, step (1) is performed as follows:
(101) dividing the real-valued underwater acoustic target radiation noise signals collected by the array into samples of equal duration;
(102) carrying out beam forming on the sample data by adopting a split beam correlation method, and carrying out spectrum peak detection in all directions to select a target suspected direction;
(103) carrying out time accumulation on the suspected target position obtained in the step (102), and obtaining a target accurate position through empirical threshold judgment;
(104) performing time domain beam forming according to the target accurate position obtained in the step (103), and extracting enhanced time domain signal data;
(105) and (5) performing frequency domain filtering on the time domain signal data obtained in the step (104) through an FIR filter to finish filtering and noise reduction.
Specifically, step (2) is performed as follows:
and performing short-time Fourier transform on each frame of time domain signal data, and extracting a time-frequency spectrogram with inter-class discrimination as cross-scene identification shared space characteristics.
The convolutional neural network model in the step (3) comprises 1 input layer, 4 convolutional layers, 4 maximum pooling layers, 2 fully connected layers and 1 Softmax layer, and the convolutional layers use the ReLU function as the activation function.
In step (3), the time-frequency spectrograms are labeled with the classes to which they belong, and the convolutional neural network model is trained as follows:
(301) labeling the time-frequency spectrograms according to the target types recorded during collection, and using the labeled spectrograms as training samples of the convolutional neural network model;
(302) dividing all time-frequency spectrogram characteristic data into a training set, a verification set and a test set, and disordering the sequence to ensure that various types of data in each sample set are uniformly distributed;
(303) and training the convolutional neural network model by using the sample set data to obtain the convolutional neural network model with the classification and recognition capability.
The specific method of the step (4) is as follows:
(401) adjusting the temperature parameter of a Softmax function in the convolutional neural network model;
(402) extracting cross-scene classification sharing spatial features of data in a target scene for labeling, and taking the cross-scene classification sharing spatial features as training samples;
(403) inputting the training samples into a convolution layer of the convolutional neural network model, and passing the convolution output through a nonlinear activation function;
(404) sending the output of the nonlinear activation function into a pooling layer, performing feature dimension reduction, and reserving key information;
(405) adjusting corresponding parameters, repeating the steps (403) and (404) for a set number of times, and inputting the result into the full connection layer;
(406) inputting the output result of the full connection layer into a softmax function, and extracting a target type soft tag;
(407) calculating the loss of the target type soft label output in the step (406) and the label value, training the soft label, and forming label space and class relation knowledge mapping under a new scene;
(408) and after the training is finished, adjusting the temperature parameter of the Softmax function to be 1, and deploying and applying.
The beneficial effects obtained by the invention are as follows:
1. The method uses an artificial-intelligence-based underwater acoustic target recognition approach, determines a time-frequency spectrogram with inter-class characterization as the shared spatial feature for cross-scene classification, learns the target features with a deep neural network, and forms the target recognition capability in the source data domain.
2. The invention is based on model-level transfer learning: the knowledge obtained in the source data domain is used to guide learning and performance in the target scene. The knowledge encoded into the model parameters, model prior knowledge and model architecture benefits the learning of the target-domain model, and knowledge is shared at the model level, which improves the generalization capability of the model in cross-scene application.
3. The invention introduces knowledge distillation, which addresses the practical problem of small or imbalanced sample sets in the target scene, allows the method to adapt quickly to a new environment, a new target or a new task, and achieves good classification performance with only a small number of target-domain samples, providing a technical route for converting research results into applications.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the present invention;
FIG. 2 is a flow chart of a filtering and denoising process in an embodiment of the present invention;
FIG. 3 is a flow chart of feature extraction in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a convolutional neural network structure for source domain knowledge learning in an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a convolutional neural network model based on a model migration technique in an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained in detail with reference to the attached drawings.
In the cross-scene underwater sound target classification method based on model transfer learning, the target noise signals acquired by the array are first preprocessed with signal-processing methods to filter out interference before features are extracted. Fig. 1 is a schematic diagram of the cross-scene training: source-scene target features are first extracted as training samples, and a convolutional neural network (CNN) then learns the sample feature knowledge to form a stable classification capability. The model is subsequently transferred to the target scene, where knowledge distillation is introduced so that the structure and parameters of the source-scene model quickly yield underwater sound target classification capability in the target scene. The method specifically comprises the following steps:
(1) filtering and denoising an underwater sound target radiation noise signal received by sonar equipment, filtering out clutter and enhancing the signal to form processed time domain signal data; the method comprises the following specific steps:
(101) dividing the real-valued underwater acoustic target radiation noise signals collected by the array into samples of equal duration;
(102) carrying out beam forming on the sample data by adopting a split beam correlation method, and carrying out spectrum peak detection in all directions to select a target suspected direction;
(103) carrying out time accumulation on the suspected target position obtained in the step (102), and obtaining a target accurate position through empirical threshold judgment;
(104) performing time domain beam forming according to the target accurate position obtained in the step (103), and extracting enhanced time domain signal data;
(105) and (5) performing frequency domain filtering on the time domain signal data obtained in the step (104) through an FIR filter to finish filtering and noise reduction.
(2) Performing domain transformation on the time domain signal data in the step (1), and extracting a time frequency spectrogram (LOFAR) feature sample with inter-class discrimination as a cross-scene recognition shared space feature; the method specifically comprises the following steps:
and performing short-time Fourier transform on each frame of time domain signal data, and extracting a time-frequency spectrogram with inter-class discrimination as cross-scene identification shared space characteristics.
(3) Labeling the classes of the time-frequency spectrogram feature samples obtained in the step (2), and training a convolutional neural network (CNN) with the time-frequency features; the 'dropout' training technique can be used during network training to prevent over-fitting;
the training process comprises the following steps:
(301) classifying and labeling the time-frequency spectrogram feature samples according to the target types recorded during collection, and using them as training samples of the convolutional neural network;
(302) dividing all time-frequency spectrogram data into a training set, a verification set and a test set in the ratio 3:1:1 and shuffling the order so that every class is evenly distributed in each sample set (a sketch of this split is given after these steps);
(303) constructing a convolutional neural network model according to the characteristics of time-frequency data, wherein the convolutional neural network model comprises 1 input layer, 4 convolutional layers, 4 maximum pooling layers, 2 full-connection layers and 1 Softmax layer, and the convolutional layers use a ReLU function as an activation function;
(304) in the training stage, firstly, trainable parameters and hyper-parameters are initialized, time-frequency data are input into CNN to extract characteristic information, and the characteristic information passes through a nonlinear activation function;
(305) sending the output of the activation function to a pooling layer, performing feature dimension reduction, and reserving key information;
(306) adjusting corresponding parameters, repeating the steps (304) and (305) for three times, and inputting the result into a full connection layer to enable the characteristics to be mapped to a sample mark space;
(307) inputting the result (306) into a softmax classification function to obtain a prediction type probability;
(308) calculating loss by using the output of (307) and the label value, and updating the model parameters;
(309) and performing adaptive optimization according to the trend of the verification-set classification accuracy during training, until a convolutional neural network model with classification and recognition capability is obtained.
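As an illustration of steps (301)-(302), the following Python sketch divides the labeled LOFAR samples into training, verification and test sets in the ratio 3:1:1; the function name and the stratified shuffling are illustrative assumptions, since the invention only fixes the ratio and the requirement that classes be evenly distributed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_3_1_1(X, y, seed=0):
    """Split LOFAR samples X with labels y into 60/20/20 (= 3:1:1) subsets.

    stratify keeps the class proportions of y in every subset, which
    realizes the requirement that each class be evenly distributed."""
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.6, shuffle=True, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_rest, y_rest, train_size=0.5, shuffle=True, stratify=y_rest,
        random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)
```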
(4) Introducing a target type soft label calculation and training method into the model obtained in the step (3) based on a knowledge distillation technology; the method comprises the following specific steps:
(401) loading the convolutional neural network classification model obtained in the step (3), and adjusting the temperature parameter of the Softmax function;
(402) extracting cross-scene classification sharing spatial features (LOFAR) of data in a target scene for labeling, and using the LOFAR as a training sample of a new neural network;
(403) inputting the time-frequency spectrogram into a convolution layer of a convolution neural network model to extract characteristic information, and enabling the characteristic information to pass through a nonlinear activation function;
(404) sending the output of the nonlinear activation function into a pooling layer, performing feature dimension reduction, and reserving key information;
(405) adjusting corresponding parameters, repeating the steps (403) and (404) for three times, and inputting the result into a full connection layer to enable the characteristics to be mapped to a sample mark space;
(406) inputting the result (405) into a softmax classification function, and extracting a target type soft label;
(407) calculating the loss by using the output of (406) and the label value, and forming a label space and category relation knowledge mapping under a new scene;
(408) and after the training is finished, adjusting the temperature parameter of the Softmax function to be 1, and deploying and applying.
The following is a more specific example:
a cross-scene underwater sound target classification method based on model transfer learning specifically comprises the following steps:
s1: filtering noise reduction
In the actual working environment, for both source-domain and target-domain data, several targets are usually present in the detection range at the same time, i.e. the data collected by the sonar equipment contain interleaved multi-target features, whereas deep learning needs feature samples that are as clean as possible. A method is therefore needed to suppress the noise signals of other targets as much as possible when the signal data of one target are acquired. Fig. 2 is a flow chart of the filtering and noise-reduction processing in this embodiment. The data are first divided at equal intervals. Using the spatial distribution characteristics of the targets, the split-beam correlation method performs spectrum-peak detection in all directions to select suspected target bearings; these bearings are accumulated over time and an empirical threshold decision yields the accurate target bearing. Time-domain beamforming toward this bearing completes the spatial filtering and suppresses target signals from other directions. The data obtained by the spatial filtering are then frequency-filtered with an FIR filter, which reduces the interference of ocean background noise and other noise and yields clean sample data.
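A minimal Python sketch of the last two stages of this chain (time-domain beamforming toward the detected bearing, followed by FIR band-pass filtering) is given below. The split-beam correlation detection, time accumulation and threshold decision are omitted; a uniform line array, whole-sample steering delays, a sound speed of 1500 m/s and the filter band are assumptions made only for illustration.

```python
import numpy as np
from scipy.signal import firwin, lfilter

def delay_and_sum(frames, element_pos, bearing, fs, c=1500.0):
    """Time-domain beamforming of one multi-channel frame toward `bearing`.

    frames      : (n_elements, n_samples) hydrophone samples
    element_pos : (n_elements,) element positions along the array axis [m]
    bearing     : steering angle in radians relative to broadside
    Steering delays are rounded to whole samples; edge wrap-around from
    np.roll is ignored in this sketch."""
    delays = element_pos * np.sin(bearing) / c        # seconds per element
    shifts = np.round(delays * fs).astype(int)
    aligned = [np.roll(ch, -s) for ch, s in zip(frames, shifts)]
    return np.mean(aligned, axis=0)                   # enhanced time-domain trace

def fir_bandpass(x, fs, f_lo, f_hi, numtaps=129):
    """FIR band-pass filtering of the beamformed trace (frequency-domain filtering step)."""
    taps = firwin(numtaps, [f_lo, f_hi], fs=fs, pass_zero=False)
    return lfilter(taps, 1.0, x)
```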
S2: time-frequency feature extraction
Referring to fig. 3, which shows the time-frequency feature extraction flow, a short-time Fourier transform is applied to the filtered sonar signal data (L_1(n), L_2(n), ..., L_k(n)) to generate the time-frequency spectrum sequence (F_1(u,v), F_2(u,v), ..., F_k(u,v)). The implementation steps are as follows:
S21: by controlling the overlap between frames, the sampling sequence of the original signal is divided into a number of consecutive, partially overlapping frames;
S22: a short-time Fourier transform is applied to each frame signal x_k(n) to obtain F_k(m,n);
The window function gives the short-time Fourier transform its local character, so a 'local spectrum' of the underwater acoustic signal can be obtained; the operation is as follows:
$$F_k(m,n)=\sum_{i} x_k(i)\,g^{*}(i-m)\,e^{-j2\pi n i/N}$$
where g* is the conjugate of the window function, n is the frequency index, N is the number of FFT points, and m is the sliding step.
A Hanning window is selected as the window function; the LOFAR spectrogram is obtained through the fast Fourier transform, and the low-frequency line-spectrum features of the acoustic signal are extracted.
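A short Python sketch of this LOFAR extraction step; the FFT length and hop size are illustrative choices rather than values fixed by the method.

```python
import numpy as np
from scipy.signal import stft

def lofar_spectrogram(x, fs, n_fft=1024, hop=256):
    """Compute a LOFAR-style time-frequency map of one filtered frame.

    A Hanning ('hann') window gives the STFT its local character; the
    log-magnitude emphasises the low-frequency line spectra."""
    f, t, F = stft(x, fs=fs, window='hann', nperseg=n_fft,
                   noverlap=n_fft - hop, nfft=n_fft)
    return 20.0 * np.log10(np.abs(F) + 1e-12)   # (freq bins, time frames)
```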
S3: convolutional neural network model training
A convolutional neural network model that learns the time-frequency features is built following the network structure of FIG. 4, and all sample data are divided into a training set, a verification set and a test set in the ratio 3:1:1. In the training stage, the feature maps of the training set and the verification set are input into the model, the change of the verification-set classification accuracy during training is observed, and the hyper-parameters are adjusted manually to improve the generalization capability and robustness of the model. In the testing stage, the feature maps of the test set are input into the trained CNN model, which completes the training process of the model.
S31: when a CNN model is built based on features of a LOFAR spectrogram, the size of the spectrogram needs to be normalized to 128 × 128, so that the consistency of dimensions is ensured;
S32: the normalized LOFAR spectrogram is input into a convolution layer with 8 × 8 convolution kernels, 5 channels and stride 1 to obtain the i-th hidden layer h_i; the operation of the convolution kernels in the convolution layer can be expressed as:
$$Z^{l+1}(i,j)=\bigl[Z^{l}\otimes W^{l+1}\bigr](i,j)+b=\sum_{k=1}^{K_l}\sum_{x=1}^{f}\sum_{y=1}^{f} Z^{l}_{k}\bigl(s_0 i+x,\,s_0 j+y\bigr)\,W^{l+1}_{k}(x,y)+b$$
$$(i,j)\in\{0,1,\ldots,L_{l+1}\},\qquad L_{l+1}=\frac{L_{l}+2p-f}{s_0}+1$$
where b is the bias, Z^l and Z^{l+1} denote the convolution input and output of layer l+1 (also called feature maps), W^{l+1} is the convolution kernel of layer l+1, and L_{l+1} is the side length of Z^{l+1} (the feature maps are assumed to have equal length and width). Z(i,j) corresponds to a pixel of the feature map, K_l is the number of channels of the feature map, k indexes one channel, and f, s_0 and p are the convolution-layer parameters: the convolution kernel size, the convolution stride and the number of padding layers.
To increase the nonlinearity between the layers of the neural network and alleviate over-fitting, h_i is passed through the ReLU activation function:
$$\mathrm{ReLU}(h_i)=\max(0,\,h_i)$$
S33: the result is fed into a pooling layer with window length 2 and stride 2, which down-samples the feature map, removes irrelevant information and reduces the number of parameters; the general form of the pooling layer is:
$$A^{l}_{k}(i,j)=\left[\sum_{x=1}^{f}\sum_{y=1}^{f} A^{l}_{k}\bigl(s_0 i+x,\,s_0 j+y\bigr)^{p}\right]^{1/p}$$
where the stride s_0 and the pixel (i,j) have the same meaning as in the convolution layer, and p is a pre-specified parameter. As p → ∞, the pooling takes the maximum within the region and is called max pooling.
S34: the convolution-kernel parameters are adjusted and the convolution layer + ReLU + pooling layer pattern is iterated 2 more times to extract deeper features of the feature map. After all the required local features have been obtained, they are mapped to the sample label space through 2 fully connected layers in sequence to obtain a one-dimensional vector; 2 fully connected layers are used to better fit the true probability distribution;
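A possible Keras layer stack for the network of S31-S34 (plus the final Softmax layer of the architecture) is sketched below; only the 128 × 128 input, the first convolution layer (8 × 8 kernels, 5 channels, stride 1), the 2 × 2 / stride-2 pooling and the overall 4-conv / 4-pool / 2-FC / Softmax structure come from the text, while the remaining filter counts, the 128-unit dense layer and the dropout rate are assumptions.

```python
from tensorflow.keras import layers, models

def build_cnn(n_classes):
    """Sketch of the source-domain CNN: 4 x (Conv + ReLU + MaxPool), 2 FC, Softmax."""
    return models.Sequential([
        layers.Input(shape=(128, 128, 1)),                       # normalized LOFAR map
        layers.Conv2D(5, 8, strides=1, padding='same', activation='relu'),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(16, 5, padding='same', activation='relu'), # assumed filter count
        layers.MaxPooling2D(2, 2),
        layers.Conv2D(32, 3, padding='same', activation='relu'), # assumed filter count
        layers.MaxPooling2D(2, 2),
        layers.Conv2D(64, 3, padding='same', activation='relu'), # assumed filter count
        layers.MaxPooling2D(2, 2),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),                    # first fully connected layer
        layers.Dropout(0.5),                                     # 'dropout' against over-fitting
        layers.Dense(n_classes, activation='softmax'),           # second FC layer + Softmax
    ])
```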
S35: the softmax function is then used to express the multi-class result in probability form; it is computed as:
$$\mathrm{softmax}(f_c)=\frac{e^{f_c}}{\sum_{j} e^{f_j}}$$
where f_c denotes the c-th component of the output vector f;
S36: the cross-entropy function is used as the loss function, in the form:
$$L=-\sum_{i} t_i \log(y_i)$$
where t_i is the label value and y_i is the output of the softmax function.
S37: an adam (adaptive motion estimation) algorithm is adopted as an optimization algorithm of a deep learning model, and adaptive adjustment of a learning rate is realized by using first-order Moment estimation and second-order Moment estimation, wherein the formula is as follows:
$$v_{dW}=0,\qquad v_{db}=0,\qquad S_{dW}=0,\qquad S_{db}=0$$
$$v_{dW}=\beta_1 v_{dW}+(1-\beta_1)\,dW,\qquad v_{db}=\beta_1 v_{db}+(1-\beta_1)\,db$$
$$S_{dW}=\beta_2 S_{dW}+(1-\beta_2)\,(dW)^2,\qquad S_{db}=\beta_2 S_{db}+(1-\beta_2)\,(db)^2$$
The parameter update formulas are:
$$W:=W-\alpha\,\frac{v_{dW}^{\mathrm{corrected}}}{\sqrt{S_{dW}^{\mathrm{corrected}}}+\epsilon},\qquad b:=b-\alpha\,\frac{v_{db}^{\mathrm{corrected}}}{\sqrt{S_{db}^{\mathrm{corrected}}}+\epsilon}$$
the deviation correction is carried out using an exponentially weighted average algorithm in the following way:
$$v_{dW}^{\mathrm{corrected}}=\frac{v_{dW}}{1-\beta_1^{t}},\quad v_{db}^{\mathrm{corrected}}=\frac{v_{db}}{1-\beta_1^{t}},\qquad S_{dW}^{\mathrm{corrected}}=\frac{S_{dW}}{1-\beta_2^{t}},\quad S_{db}^{\mathrm{corrected}}=\frac{S_{db}}{1-\beta_2^{t}}$$
where t denotes the iteration number, β_1 is the momentum parameter (usually 0.9), β_2 is the RMSprop parameter (usually 0.999), ε is used mainly to avoid a zero denominator (usually 10^-8), W and b denote the weights and biases of the neural network, and α is the learning rate.
(dW)^2 denotes the element-wise square of the gradient of the parameter W.
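A single Adam update implementing the formulas above can be sketched in a few lines of Python; the variable names mirror v_dW and S_dW, and the default learning rate is an illustrative choice.

```python
import numpy as np

def adam_step(w, dw, state, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of parameter tensor w with gradient dw.

    state holds the running first moment v_dw and second moment S_dw
    (both initialized to zero); t is the 1-based iteration count used
    for the bias correction 1 / (1 - beta**t)."""
    v, S = state
    v = beta1 * v + (1.0 - beta1) * dw           # first-moment estimate
    S = beta2 * S + (1.0 - beta2) * dw ** 2      # second-moment estimate
    v_hat = v / (1.0 - beta1 ** t)               # bias-corrected moments
    S_hat = S / (1.0 - beta2 ** t)
    w = w - alpha * v_hat / (np.sqrt(S_hat) + eps)
    return w, (v, S)
```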
S4: training of target cross-scene underwater sound target classification model
A convolutional neural network that learns the time-frequency features is built following the network structure of FIG. 5; a Softmax function with a high temperature is used to retain sufficient class-relation information, and the target-domain samples are divided into a training set, a test set and a verification set in the ratio 3:1:1. In the training stage, the target-type soft labels are extracted, and the fitting of the target samples and label space to the target-type relations is completed based on the inter-class relation-knowledge loss of the soft labels. In the testing stage, the feature maps of the test set are input into the trained CNN model, which completes the training process of the model.
S41: the temperature parameter of the Softmax function is set to T = 5 in the training stage; the Softmax function that presents the multi-class result in probability form is computed as:
$$\mathrm{softmax}(f_c;T)=\frac{e^{f_c/T}}{\sum_{j} e^{f_j/T}}$$
where f_c denotes the c-th component of the output vector f and T is the temperature parameter;
S42: the soft label of category t is extracted by averaging the tempered Softmax outputs of the activation function over the target-domain samples of type t, and the new model is trained following the same training process as in S3 (a sketch is given after these steps);
S43: after training is completed, the temperature parameter is reset to 1.
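The tempered Softmax and the soft-label handling of S41-S43 can be sketched as follows. The T² scaling and the soft/hard mixing weight lam follow the standard knowledge-distillation recipe and are assumptions; the patent itself only fixes T = 5 for training and T = 1 for deployment.

```python
import numpy as np
import tensorflow as tf

def softmax_T(logits, T=5.0):
    """Temperature-scaled softmax: softer class probabilities for T > 1."""
    z = logits / T
    z = z - np.max(z, axis=-1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def class_soft_label(source_logits_of_class_t, T=5.0):
    """Soft label of class t: mean tempered softmax of the source model's
    outputs over target-domain samples of type t (step S42)."""
    return np.mean(softmax_T(source_logits_of_class_t, T), axis=0)

def distillation_loss(student_logits, soft_labels, hard_labels, T=5.0, lam=0.5):
    """Loss for the new model: cross-entropy against the per-sample soft labels
    (each sample receives the soft label of its class), optionally mixed with
    the usual hard-label cross-entropy."""
    soft_ce = tf.keras.losses.categorical_crossentropy(
        soft_labels, tf.nn.softmax(student_logits / T))
    hard_ce = tf.keras.losses.sparse_categorical_crossentropy(
        hard_labels, tf.nn.softmax(student_logits))
    return lam * (T ** 2) * soft_ce + (1.0 - lam) * hard_ce
```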
In this method, the sonar signal features extracted by time-frequency analysis are used as the shared spatial features for cross-scene classification. A convolutional neural network first performs deep learning in the source domain to form a stable classification capability, encoding the knowledge in the model parameters, the model prior knowledge and the model architecture; the model-based transfer-learning technique is then applied to the target scene. In actual measurements, a deep neural network model trained on 14,600 samples of 4 target types collected in offshore areas of the South China Sea achieved a verified classification accuracy of 90.6%. When the model is transferred to the target scene, a classification accuracy of 85% is reached with only a small number of samples (sampling rate 5 kS/s, sample length 3.768 s); the method therefore adapts quickly to new environments and new tasks under small-sample and imbalanced conditions, greatly saving the time, effort and resources consumed by large-scale target-domain data collection and model retraining.
In summary, the strong representational capacity of deep learning is used to extract and transfer inter-class knowledge; in particular, knowledge distillation gives the deep neural network model the ability to be applied across domains in passive underwater sound target recognition, with good classification performance. The method is an effective technical route for the practical problems of costly underwater acoustic data acquisition and imbalanced samples.

Claims (6)

1. A cross-scene underwater sound target classification method based on model transfer learning is characterized by comprising the following steps:
(1) filtering and denoising an underwater sound target radiation noise signal received by sonar equipment, filtering out clutter and enhancing the signal to form processed time domain signal data;
(2) performing advanced feature mapping processing on the time domain signal data obtained in the step (1), and extracting a time-frequency spectrogram with inter-class discrimination as cross-scene identification shared space features;
(3) determining hyper-parameters of a convolutional neural network model, marking the class of the time-frequency spectrogram, and training the convolutional neural network model;
(4) introducing a knowledge distillation technology into the convolutional neural network model obtained in the step (3), extracting a target type soft label and training, so that class relation knowledge in the model is fitted to a classification performance target in a new sample and label space;
(5) carrying out cross-scene underwater sound target classification by using the convolutional neural network model trained in the step (4);
and completing the cross-scene underwater sound target classification based on model transfer learning.
2. The cross-scene underwater sound target classification method based on model transfer learning according to claim 1 is characterized in that the specific mode of the step (1) is as follows:
(101) dividing real number form underwater acoustic target radiation noise signals collected by a matrix into equal-time-length samples;
(102) carrying out beam forming on the sample data by adopting a split beam correlation method, and carrying out spectrum peak detection in all directions to select a target suspected direction;
(103) carrying out time accumulation on the suspected target position obtained in the step (102), and obtaining a target accurate position through empirical threshold judgment;
(104) performing time domain beam forming according to the target accurate position obtained in the step (103), and extracting enhanced time domain signal data;
(105) and (5) performing frequency domain filtering on the time domain signal data obtained in the step (104) through an FIR filter to finish filtering and noise reduction.
3. The cross-scene underwater sound target classification method based on model transfer learning according to claim 1, wherein the concrete mode of the step (2) is as follows:
and performing short-time Fourier transform on each frame of time domain signal data, and extracting a time-frequency spectrogram with inter-class discrimination as cross-scene identification shared space characteristics.
4. The model migration learning-based cross-scene underwater sound object classification method according to claim 1, wherein the convolutional neural network model in the step (3) comprises 1 input layer, 4 convolutional layers, 4 maximum pooling layers, 2 full-connected layers and 1 Softmax layer, and the convolutional layers use ReLU function as the activation function.
5. The cross-scene underwater sound target classification method based on model transfer learning of claim 1, wherein in the step (3), the time-frequency spectrogram is labeled to the category to which the time-frequency spectrogram belongs, and a convolutional neural network model is trained in a specific manner of:
(301) labeling the time spectrum graph according to the target type recorded in the collection process, and using the labeled time spectrum graph as a training sample of the convolutional neural network model;
(302) dividing all time-frequency spectrogram characteristic data into a training set, a verification set and a test set, and disordering the sequence to ensure that various types of data in each sample set are uniformly distributed;
(303) and training the convolutional neural network model by using the sample set data to obtain the convolutional neural network model with the classification and recognition capability.
6. The cross-scene underwater sound target classification method based on model transfer learning according to claim 4 is characterized in that the specific method in the step (4) is as follows:
(401) adjusting the temperature parameter of a Softmax function in the convolutional neural network model;
(402) extracting cross-scene classification sharing spatial features of data in a target scene for labeling, and taking the cross-scene classification sharing spatial features as training samples;
(403) inputting the training sample into a convolution layer of a convolution neural network model, and passing a nonlinear activation function through the convolution layer;
(404) sending the output of the nonlinear activation function into a pooling layer, performing feature dimension reduction, and reserving key information;
(405) adjusting corresponding parameters, repeating the steps (403) and (404) for a set number of times, and inputting the result into the full connection layer;
(406) inputting the output result of the full connection layer into a softmax function, and extracting a target type soft tag;
(407) calculating the loss of the target type soft label output in the step (406) and the label value, training the soft label, and forming label space and class relation knowledge mapping under a new scene;
(408) and after the training is finished, adjusting the temperature parameter of the Softmax function to be 1, and deploying and applying.
CN202111160013.1A 2021-09-30 2021-09-30 Cross-scene underwater sound target classification method based on model transfer learning Pending CN113780242A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111160013.1A CN113780242A (en) 2021-09-30 2021-09-30 Cross-scene underwater sound target classification method based on model transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111160013.1A CN113780242A (en) 2021-09-30 2021-09-30 Cross-scene underwater sound target classification method based on model transfer learning

Publications (1)

Publication Number Publication Date
CN113780242A true CN113780242A (en) 2021-12-10

Family

ID=78854661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111160013.1A Pending CN113780242A (en) 2021-09-30 2021-09-30 Cross-scene underwater sound target classification method based on model transfer learning

Country Status (1)

Country Link
CN (1) CN113780242A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330441A (en) * 2021-12-28 2022-04-12 厦门大学 Underwater sound JANUS signal identification method and system based on time-frequency spectrum and transfer learning
CN115147718A (en) * 2022-06-21 2022-10-04 北京理工大学 Scene self-adaption system and method for unmanned mobile terminal visual analysis
CN115147718B (en) * 2022-06-21 2024-05-28 北京理工大学 Scene self-adaptive system and method for unmanned mobile terminal visual analysis
CN115019149A (en) * 2022-08-03 2022-09-06 中国电子科技集团公司第五十四研究所 Zero sample identification method based on model interpretation result
CN115019149B (en) * 2022-08-03 2022-11-11 中国电子科技集团公司第五十四研究所 Zero sample identification method based on model interpretation result
CN116776230A (en) * 2023-08-22 2023-09-19 北京海格神舟通信科技有限公司 Method and system for identifying signal based on feature imprinting and feature migration
CN116776230B (en) * 2023-08-22 2023-11-14 北京海格神舟通信科技有限公司 Method and system for identifying signal based on feature imprinting and feature migration
CN116973901A (en) * 2023-09-14 2023-10-31 海底鹰深海科技股份有限公司 Algorithm application of time-frequency analysis in sonar signal processing

Similar Documents

Publication Publication Date Title
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN113780242A (en) Cross-scene underwater sound target classification method based on model transfer learning
CN109765053B (en) Rolling bearing fault diagnosis method using convolutional neural network and kurtosis index
CN109060001B (en) Multi-working-condition process soft measurement modeling method based on feature transfer learning
CN110245608B (en) Underwater target identification method based on half tensor product neural network
CN109993280B (en) Underwater sound source positioning method based on deep learning
CN109522857B (en) People number estimation method based on generation type confrontation network model
CN111160176B (en) Fusion feature-based ground radar target classification method for one-dimensional convolutional neural network
CN111126386A (en) Sequence field adaptation method based on counterstudy in scene text recognition
CN110197205A (en) A kind of image-recognizing method of multiple features source residual error network
CN105913081B (en) SAR image classification method based on improved PCAnet
CN111126332B (en) Frequency hopping signal classification method based on contour features
CN113392931B (en) Hyperspectral open set classification method based on self-supervision learning and multitask learning
CN107766893B (en) Target identification method based on label multilevel coding neural network
CN114595732B (en) Radar radiation source sorting method based on depth clustering
CN113076920B (en) Intelligent fault diagnosis method based on asymmetric domain confrontation self-adaptive model
CN103077408B (en) Method for converting seabed sonar image into acoustic substrate classification based on wavelet neutral network
CN111275108A (en) Method for performing sample expansion on partial discharge data based on generation countermeasure network
CN108596044B (en) Pedestrian detection method based on deep convolutional neural network
CN111178438A (en) ResNet 101-based weather type identification method
CN112001115B (en) Soft measurement modeling method of semi-supervised dynamic soft measurement network
CN114972904B (en) Zero sample knowledge distillation method and system based on fighting against triplet loss
CN117009916A (en) Actuator fault diagnosis method based on multi-sensor information fusion and transfer learning
CN108694375B (en) Imaging white spirit identification method applicable to multi-electronic nose platform
CN113109782B (en) Classification method directly applied to radar radiation source amplitude sequence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination