CN114492540B - Training method and device of target detection model, computer equipment and storage medium - Google Patents

Training method and device of target detection model, computer equipment and storage medium Download PDF

Info

Publication number
CN114492540B
CN114492540B (application CN202210308846.6A)
Authority
CN
China
Prior art keywords
loss
detection model
anchor frame
training
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210308846.6A
Other languages
Chinese (zh)
Other versions
CN114492540A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shuzhilian Technology Co Ltd
Original Assignee
Chengdu Shuzhilian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shuzhilian Technology Co Ltd
Priority to CN202210308846.6A
Publication of CN114492540A
Application granted
Publication of CN114492540B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 - Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/12 - Classification; Matching
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a training method and device of a target detection model, computer equipment and a storage medium, relating to the technical field of image recognition. The method comprises the following steps: constructing the network structure of the target detection model to obtain an initial detection model, inputting a plurality of signal time-frequency diagrams into the initial detection model and outputting final prediction results, constructing a loss function to calculate the training loss, and adjusting the initial detection model according to the training loss to obtain the target detection model. A dilated convolution layer and a deconvolution layer are added to the network structure, and a negative sample weight value is added to the loss function, so that the target detection model smoothly learns to distinguish real target regions from blank regions during training. The signal data in signal time-frequency diagrams can therefore be trained on and detected quickly, and various types of signal data can be detected accurately.

Description

Training method and device of target detection model, computer equipment and storage medium
Technical Field
The scheme belongs to the technical field of image recognition, and particularly relates to a training method and device of a target detection model, computer equipment and a storage medium.
Background
To ensure reliable information transmission, an information transmission system must have stable anti-interference capability, and signal detection is one of the most effective anti-interference methods. The existing signal detection scheme is time-frequency analysis: a one-dimensional signal is mapped onto a two-dimensional plane to generate a signal time-frequency diagram, and a deep neural network then detects the target signal data in the diagram, which is a target detection problem. The signal time-frequency diagram can be detected by the YOLO, YOLOV3 and Poly-YOLO algorithms. When YOLO is used, the detection effect on small and dense targets is poor, and a time-frequency diagram containing short, dense signals can hardly be detected at all. When YOLOV3 is used, large targets are identified inaccurately, bounding-box regression is imprecise, and dense small targets cannot be detected accurately. Because of its network structure, Poly-YOLO converges with difficulty during training, and its training and testing are slow, so it cannot meet the speed and real-time requirements of signal detection.
Disclosure of Invention
In order to solve the problems that existing target detection algorithms detect the different types of signal data in signal time-frequency diagrams poorly and are difficult to train, the application provides a training method and device of a target detection model, computer equipment and a storage medium that can accurately detect the signal data of different types of signal time-frequency diagrams, train and detect quickly, and meet the real-time requirement of signal detection.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for training a target detection model, where the method includes:
acquiring a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
inputting the signal time-frequency diagrams into the backbone network, calculating each signal time-frequency diagram through the backbone network, and outputting a plurality of feature layers;
inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network respectively to obtain a final prediction result corresponding to each signal time-frequency diagram;
acquiring the real target information of the signal time-frequency diagrams;
constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
and adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model.
In one possible implementation, the head network includes: a first transform convolutional layer, a second transform convolutional layer, a third transform convolutional layer, a fourth transform convolutional layer, a first dilated convolutional layer, a second dilated convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer; the plurality of feature layers includes: a first feature layer, a second feature layer and a third feature layer;
the step of inputting the first feature layer, the second feature layer and the third feature layer into the head network, and calculating the first feature layer, the second feature layer and the third feature layer through the head network to obtain the final prediction results corresponding to the signal time-frequency diagrams includes:
passing the second feature layer through the second transform convolutional layer to obtain a first output result;
transforming the number of channels of the third feature layer through the third transform convolutional layer, enlarging the receptive field through the first dilated convolutional layer, and upsampling through the first deconvolution layer to obtain a second output result;
adding the first output result and the second output result to obtain a third output result;
transforming the number of channels of the third output result through the fourth transform convolutional layer, enlarging the receptive field through the second dilated convolutional layer, and upsampling through the second deconvolution layer to obtain a fourth output result;
passing the first feature layer through the first transform convolutional layer to obtain a fifth output result;
adding the fourth output result and the fifth output result to obtain a sixth output result;
and passing the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
In one possible implementation, the method further includes:
dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
the preset loss function includes: a first loss function, a second loss function, a third loss function;
the step of calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function comprises the following steps:
calculating the objectness loss of each anchor frame through the first loss function, and summing the objectness losses of all anchor frames to obtain a first prediction loss;
calculating the target class loss of each anchor frame through the second loss function, and summing the target class losses of all anchor frames to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and summing the target coordinate losses of all anchor frames to obtain a third prediction loss;
and summing the product value of the first prediction loss multiplied by a first weight coefficient, the product value of the second prediction loss multiplied by a second weight coefficient and the product value of the third prediction loss multiplied by a third weight coefficient to obtain the current training loss.
In one possible implementation, the step of calculating the objectness loss of each anchor frame through the first loss function includes:
when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union (IoU) of the jth anchor frame of the ith grid with the region where the real target information is located.
In a possible implementation manner, when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to the following formula (1):
formula (1):
$L^{obj}_{ij} = -\mathbb{1}^{obj}_{ij}\log(\hat{C}_{ij})$
wherein $L^{obj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}^{obj}_{ij}$ indicates that the jth anchor frame of the ith grid contains real target information (1 if it does, 0 otherwise).
In a possible implementation manner, when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function, according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2):
the formula (2) is:
$L^{noobj}_{ij} = -\mathbb{1}^{noobj}_{ij}\,(1-\mathrm{IoU}_{ij})^{2}\,\log(1-\hat{C}_{ij})$
wherein $L^{noobj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, $(1-\mathrm{IoU}_{ij})^{2}$ is the negative sample weight value, and $\mathbb{1}^{noobj}_{ij}$ indicates that the jth anchor frame of the ith grid contains no real target information (1 if it does not, 0 otherwise).
In a possible implementation manner, the step of calculating the target class loss of each anchor frame through the second loss function includes:
determining the target class loss of each anchor frame according to formula (3), wherein:
the formula (3) is:
$L^{cls}_{ij} = -\frac{1}{nc}\sum_{k=1}^{nc}\left[y_{k}\log(p_{ijk}) + (1-y_{k})\log(1-p_{ijk})\right]$
wherein $L^{cls}_{ij}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of classes, k is the class of the real target information, $y_{k}$ is 1 when the prediction class corresponding to the jth anchor frame of the ith grid is the class of the real target information and 0 otherwise, and $p_{ijk}$ is the predicted probability that the jth anchor frame of the ith grid belongs to class k of the real target information.
In a possible implementation manner, the step of calculating the target coordinate loss of each anchor frame through the third loss function includes:
determining the target coordinate loss of each anchor frame according to formula (4), wherein:
formula (4) is:
$L^{coord}_{ij} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}}{c^{2}} + \alpha v$
wherein $L^{coord}_{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the predicted target region corresponding to the jth anchor frame of the ith grid with the region where the real target information is located, $\rho$ is the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, c is the diagonal length of the minimum enclosing rectangle of the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
In one possible implementation, the method further includes:
clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through a Kmeans algorithm to obtain cluster centers corresponding to the cluster categories;
and acquiring the coordinates of each clustering center, and adjusting the size of the corresponding anchor frame according to the coordinates of each clustering center.
In a possible implementation manner, the step of adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model includes:
judging whether the current training loss is smaller than a preset training loss threshold;
if the current training loss is larger than the preset training loss threshold, continuing to train the initial detection model: back-propagating the current training loss through a mini-batch gradient descent algorithm to adjust the initial detection model, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
In a second aspect, an embodiment of the present application provides an apparatus for training a target detection model, where the apparatus includes:
a first acquisition module, used for acquiring a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
a first construction module, used for constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
a first calculation module, used for inputting the signal time-frequency diagrams into the backbone network, calculating each signal time-frequency diagram through the backbone network, and outputting a plurality of feature layers;
a second calculation module, used for inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
a second acquisition module, used for acquiring the real target information of the signal time-frequency diagrams;
a second construction module, used for constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
and an adjustment module, used for adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, performs the training method for the target detection model according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the training method of the object detection model according to the first aspect.
Compared with the prior art, the method has the following beneficial effects:
the training method, the training device, the computer equipment and the storage medium of the target detection model provided by the embodiment use a network structure added with an expansion convolutional layer and a deconvolution layer, and expand the visual field of a characteristic layer; and a loss function added with a negative sample weight value is used, so that the target detection model is converged quickly in training. The finally obtained target detection model can quickly and accurately detect various types of signal data in the signal time-frequency diagram.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed to be used in the embodiments are briefly introduced below, and it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like components are numbered alike in the various figures, and other related figures may also be derived from these figures by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a flow chart of a method for training a target detection model according to an embodiment of the present invention;
fig. 2 is an exemplary diagram of a signal time-frequency diagram according to an embodiment of the present invention;
FIG. 3 is a diagram of an exemplary initial detection model provided by an embodiment of the invention;
FIG. 4 is another schematic flow chart of a method for training a target detection model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a training apparatus for an object detection model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, as used in various embodiments of the present invention, are intended only to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
The embodiment provides a training method of a target detection model. As shown in fig. 1, the training method of the target detection model provided in this embodiment includes the following steps:
step S110, a training set is obtained, wherein the training set comprises a plurality of signal time-frequency graphs.
When signal data in a signal time-frequency diagram are detected by the target detection model, a region where a signal exists is a signal target region, a region where no signal exists is a noise region, and the two have a clear boundary. Fig. 2 shows an example of a signal time-frequency diagram, where the abscissa is the time domain and the ordinate is the frequency domain. Signal data with short duration and a small frequency range appear in the diagram as small, dense rectangular objects, such as 201 in fig. 2; signal data with long duration and a wide frequency range appear as rectangular objects with large length and width, such as 202 in fig. 2. In practical applications these different types of signals may appear in one signal time-frequency diagram at the same time, and in order to better detect the various types and shapes of signal data in the diagram, this embodiment provides a target detection model and a training method thereof.
Step S120, an initial detection model is constructed, and the initial detection model comprises a backbone network and a head network.
In one embodiment, the backbone network is a ResNet18 network, whose small network structure trains and runs quickly and can better meet the real-time requirement of target detection.
In one embodiment, the head network comprises: a first transform convolutional layer, a second transform convolutional layer, a third transform convolutional layer, a fourth transform convolutional layer, a first dilated convolutional layer, a second dilated convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer. The operation of these convolutional layers is described in detail in steps S130 and S140.
Step S130, inputting the signal time-frequency graphs into the backbone network, calculating each signal time-frequency graph through the backbone network, and outputting a plurality of characteristic layers.
As shown in fig. 3, in an embodiment, a signal time-frequency diagram passing through the backbone network ResNet18 passes through a stem layer and four Block layers (B1, B2, B3, B4). The feature layer obtained after the second Block layer B2 is F2, the first feature layer; the feature layer obtained after the third Block layer B3 is F3, the second feature layer; and the feature layer obtained after the last Block layer B4 is F4, the third feature layer. A feature layer has three dimensions: the length and width of each feature map, and the number of channels. The number of channels is the number of feature maps in the layer, which is usually the number of convolution kernels of the convolutional layer the feature layer passed through.
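A minimal sketch of extracting F2, F3 and F4 with this backbone (assuming PyTorch and torchvision, which the scheme does not mandate):

```python
import torch
from torchvision.models import resnet18

class Backbone(torch.nn.Module):
    """ResNet18 trunk that returns the F2, F3, F4 feature layers."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        # Stem: conv + bn + relu + maxpool, followed by the Block layers B1..B4.
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.b1, self.b2, self.b3, self.b4 = net.layer1, net.layer2, net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        x = self.b1(x)
        f2 = self.b2(x)   # first feature layer  (128 channels, 1/8 scale)
        f3 = self.b3(f2)  # second feature layer (256 channels, 1/16 scale)
        f4 = self.b4(f3)  # third feature layer  (512 channels, 1/32 scale)
        return f2, f3, f4

# A 3-channel time-frequency image of size 512 x 512 yields
# f2: (1, 128, 64, 64), f3: (1, 256, 32, 32), f4: (1, 512, 16, 16).
feats = Backbone()(torch.zeros(1, 3, 512, 512))
```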
Step S140, inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram. The head network is HeadNet in fig. 3.
As shown in fig. 3 and 4, the step of inputting the first feature layer, the second feature layer, and the third feature layer into the head network, and calculating the first feature layer, the second feature layer, and the third feature layer through the head network to obtain the final prediction result corresponding to each signal time-frequency diagram includes:
step S410, obtaining a first output result from the second feature layer through the second transform convolution layer.
The second feature layer F3 has its number of channels transformed by the second transform convolutional layer conv2, giving the first output result.
Step S420, transforming the number of channels of the third feature layer through the third transform convolutional layer, enlarging the receptive field through the first dilated convolutional layer, and upsampling through the first deconvolution layer to obtain a second output result.
The third feature layer is F4 in fig. 3. In one embodiment, the third transform convolutional layer conv3 is a 1 × 1 convolutional layer; the dilation rate of the first dilated convolutional layer diaconv1 is 2, which enlarges the receptive field; and the first deconvolution layer transconv1 doubles the length and width of the feature maps of the third feature layer without changing the number of channels.
In step S430, the first output result and the second output result are added to obtain a third output result, i.e., H1 in fig. 3.
Step S440, transforming the number of channels of the third output result through the fourth transform convolutional layer, enlarging the receptive field through the second dilated convolutional layer, and upsampling through the second deconvolution layer to obtain a fourth output result.
The parameters of the fourth transform convolutional layer conv4, the second dilated convolutional layer diaconv2 and the second deconvolution layer transconv2 are set according to the specific task and are not limited here.
Step S450, a fifth output result is obtained by passing the first feature layer through the first transform convolution layer.
The first feature layer F2 has its number of channels transformed by the first transform convolutional layer conv1; the parameters of conv1 are set according to the specific task and are not limited here.
In step S460, the fourth output result and the fifth output result are added to obtain a sixth output result, i.e., H2 in fig. 3.
Step S470, passing the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
The subsequent convolutional layer convs changes only the number of channels of the sixth output result, not the size of its feature maps; the final prediction result is H3 in fig. 3.
In this embodiment, all feature layers, feature maps and output results are stored in the computer device as matrices, and their addition and subtraction operations follow matrix arithmetic.
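A minimal PyTorch sketch of steps S410 to S470; the channel width `mid` and the output channel count `out_ch` are illustrative assumptions, since the scheme fixes only that conv3 is 1 × 1, that diaconv1 has dilation rate 2, and that transconv1 doubles length and width without changing the number of channels:

```python
import torch
import torch.nn as nn

class HeadNet(nn.Module):
    """Head network combining the F2, F3, F4 feature layers of fig. 3."""
    def __init__(self, c2=128, c3=256, c4=512, mid=128, out_ch=126):
        super().__init__()
        self.conv1 = nn.Conv2d(c2, mid, 1)   # first transform conv (applied to F2)
        self.conv2 = nn.Conv2d(c3, mid, 1)   # second transform conv (applied to F3)
        self.conv3 = nn.Conv2d(c4, mid, 1)   # third transform conv (applied to F4), 1x1
        self.diaconv1 = nn.Conv2d(mid, mid, 3, padding=2, dilation=2)  # dilation rate 2
        self.transconv1 = nn.ConvTranspose2d(mid, mid, 2, stride=2)    # 2x up, same channels
        self.conv4 = nn.Conv2d(mid, mid, 1)  # fourth transform conv (applied to H1)
        self.diaconv2 = nn.Conv2d(mid, mid, 3, padding=2, dilation=2)
        self.transconv2 = nn.ConvTranspose2d(mid, mid, 2, stride=2)
        self.convs = nn.Conv2d(mid, out_ch, 1)  # subsequent conv: changes channels only

    def forward(self, f2, f3, f4):
        out1 = self.conv2(f3)                                   # step S410
        out2 = self.transconv1(self.diaconv1(self.conv3(f4)))   # step S420
        h1 = out1 + out2                                        # step S430
        out4 = self.transconv2(self.diaconv2(self.conv4(h1)))   # step S440
        out5 = self.conv1(f2)                                   # step S450
        h2 = out4 + out5                                        # step S460
        return self.convs(h2)                                   # step S470 -> H3
```

With the ResNet18 shapes sketched above, H1 has spatial size 32 × 32, H2 has 64 × 64, and H3 keeps that size with out_ch channels (e.g. 9 anchors × (5 + 9 classes) = 126, an assumption for illustration).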
And step S150, acquiring the real target information of the signal time-frequency diagrams.
In this embodiment, the real target information is the signal data of the region where a real target is located in the signal time-frequency diagram, and includes whether signal data exist in that region, the class of the signal data, and the coordinates of the signal data.
Step S160, constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function.
In an embodiment, the step of calculating, by using the preset loss function, a current training loss between a final predicted result corresponding to each signal time-frequency diagram and real target information includes:
dividing each signal time-frequency graph into a plurality of grids, and setting a preset number of anchor frames for each grid;
in the target detection algorithm, an Anchor Box (Anchor Box) is a plurality of preset rectangular boxes with different sizes for detecting signal data, wherein the size of the Anchor Box is configured when a model is configured.
In one embodiment, the size of the anchor box is obtained by the Kmeans algorithm. Clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through a Kmeans algorithm to obtain cluster centers corresponding to the cluster categories; and acquiring the coordinates of each clustering center, and adjusting the size of the corresponding anchor frame according to the coordinates of each clustering center. The Kmeans algorithm is a common clustering algorithm for classifying each sample in the data set into a class corresponding to the cluster center with the smallest distance.
The number of anchor frames set for the grid is equal to the cluster type set during clustering, such as: and (4) clustering the real target information in the signal time-frequency diagram into 9 types, and setting 9 anchor frames for each grid.
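A minimal sketch of this anchor sizing step (assuming, as in YOLO-style anchor selection, that the clustered quantities are the widths and heights of the labeled target regions, and using scikit-learn's KMeans):

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_sizes_from_labels(boxes_wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Cluster the (width, height) of all labeled target regions into
    n_anchors groups; the cluster centers become the anchor frame sizes."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(boxes_wh)
    centers = km.cluster_centers_
    return centers[np.argsort(centers.prod(axis=1))]  # sort by area, small to large

# boxes_wh: one row per real target across all time-frequency diagrams (example data)
boxes_wh = np.array([[12, 8], [300, 40], [15, 10], [280, 36], [14, 9],
                     [25, 200], [11, 7], [290, 42], [22, 190]], dtype=float)
print(anchor_sizes_from_labels(boxes_wh, n_anchors=3))
```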
In one embodiment, the preset loss function comprises: a first loss function, a second loss function, and a third loss function.
Calculating the objectness loss of each anchor frame through the first loss function, and summing the objectness losses of all anchor frames to obtain a first prediction loss;
calculating the target class loss of each anchor frame through the second loss function, and summing the target class losses of all anchor frames to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and summing the target coordinate losses of all anchor frames to obtain a third prediction loss;
in one embodiment, said step of calculating by said first penalty function whether there is a penalty on the target of each of said anchor boxes comprises:
when the jth anchor frame of the ith grid contains real target information, calculating the target loss of the jth anchor frame of the ith grid according to the confidence coefficient of the fact that the real target information exists in the jth anchor frame of the ith grid through the first loss function;
and when the jth anchor frame of the ith grid does not contain real target information, calculating the intersection ratio of the jth anchor frame of the ith grid and the area where the real target information exists according to the confidence coefficient of the real target information of the jth anchor frame of the ith grid and the jth anchor frame of the ith grid in the predicted jth anchor frame of the ith grid through the first loss function, wherein the target of the jth anchor frame of the ith grid has no loss.
The confidence that real target information exists is predicted by the target detection model, and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located is the ratio of the intersection to the union of the area of the jth anchor frame of the ith grid and the area of that region.
In one embodiment, when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to the following formula (1):
formula (1):
$L^{obj}_{ij} = -\mathbb{1}^{obj}_{ij}\log(\hat{C}_{ij})$
wherein $L^{obj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}^{obj}_{ij}$ indicates that the jth anchor frame of the ith grid contains real target information (1 if it does, 0 otherwise).
It should be noted that although the base of the logarithm log is omitted in formula (1), in practical application the natural logarithm (base e) is used; the same applies to formulas (2) and (3).
In one embodiment, when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function, according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2):
the formula (2) is:
$L^{noobj}_{ij} = -\mathbb{1}^{noobj}_{ij}\,(1-\mathrm{IoU}_{ij})^{2}\,\log(1-\hat{C}_{ij})$
wherein $L^{noobj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, $(1-\mathrm{IoU}_{ij})^{2}$ is the negative sample weight value, and $\mathbb{1}^{noobj}_{ij}$ indicates that the jth anchor frame of the ith grid contains no real target information (1 if it does not, 0 otherwise).
In the following description of the loss function, positive samples are anchor frames consistent with the real target information, and negative samples are anchor frames inconsistent with it. Formulas (1) and (2) compute a loss from the predicted confidence of every anchor frame and penalize wrongly predicted anchor frames according to the degree of error. In practical application, an anchor frame may contain no target yet overlap heavily with the region where real target information is located; its intersection-over-union with that region is then high. This generally happens to anchor frames in grids near the center of the real target information, and forcing the predicted confidence of such an anchor frame toward 1 or 0 harms the training of the target detection model. Therefore, in this case, the no-target term $\log(1-\hat{C}_{ij})$ is multiplied by the negative sample weight value $(1-\mathrm{IoU}_{ij})^{2}$. If $\mathrm{IoU}_{ij}$ is high, squaring makes the negative sample weight value very low, so even a predicted confidence close to 1 cannot produce a large objectness loss; if the anchor frame lies entirely in a blank area, $\mathrm{IoU}_{ij}$ is 0, the negative sample weight value is 1, and a confident wrong prediction produces a large objectness loss. It should also be noted that early in training the number of anchor frames is very large and the positive and negative samples are severely unbalanced, so the objectness loss easily causes gradient explosion: values of $\hat{C}_{ij}$ accumulating at extremes close to 0 or close to 1 produce an objectness loss with a very large gradient. Therefore $\hat{C}_{ij}$ is truncated before the objectness loss is computed: the scheme cuts $\hat{C}_{ij}$ off to the interval [0.0001, 0.9999]. Specifically, values less than 0.0001 are taken as 0.0001, and values greater than 0.9999 are taken as 0.9999. For example, if $\hat{C}_{ij}$ is 0.00005 it is computed as 0.0001, and if it is 0.99999 it is computed as 0.9999.
In this embodiment, adding the negative sample weight value forces the target detection model to distinguish real target regions from blank regions, which improves the training effect, and truncating the confidence to the above interval avoids the gradient explosion phenomenon.
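A minimal sketch of the objectness loss of formulas (1) and (2), with the negative sample weight value and the truncation described above (the flat tensor shapes and PyTorch itself are assumptions):

```python
import torch

def objectness_loss(conf_pred, obj_mask, iou_with_gt, eps=1e-4):
    """conf_pred:   predicted confidence per anchor frame, shape (N,)
       obj_mask:    1 where the anchor frame contains real target info, else 0
       iou_with_gt: IoU of each anchor frame with the real target region"""
    c = conf_pred.clamp(eps, 1 - eps)                 # truncate to [0.0001, 0.9999]
    pos = -obj_mask * torch.log(c)                    # formula (1)
    neg_w = (1 - iou_with_gt) ** 2                    # negative sample weight value
    neg = -(1 - obj_mask) * neg_w * torch.log(1 - c)  # formula (2)
    return (pos + neg).sum()                          # first prediction loss
```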
In one embodiment, the step of calculating the target class loss of each anchor frame through the second loss function includes: determining the target class loss of each anchor frame according to formula (3), wherein:
the formula (3) is:
$L^{cls}_{ij} = -\frac{1}{nc}\sum_{k=1}^{nc}\left[y_{k}\log(p_{ijk}) + (1-y_{k})\log(1-p_{ijk})\right]$
wherein $L^{cls}_{ij}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of classes, k is the class of the real target information, $y_{k}$ is 1 when the prediction class corresponding to the jth anchor frame of the ith grid is the class of the real target information and 0 otherwise, and $p_{ijk}$ is the predicted probability that the jth anchor frame of the ith grid belongs to class k.
The target class loss is generated only when the anchor frame contains real target information, and is 0 in all other cases. The target class loss of a single anchor frame is the average of the binary cross-entropy losses over all classes of that anchor frame, and measures the prediction accuracy of the target detection model. For example, if the signals in the signal time-frequency diagrams are divided into 9 classes, the preset number nc is also 9, and $p_{ijk}$ is the predicted probability that the jth anchor frame of the ith grid belongs to class k of the real target information.
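A minimal sketch of formula (3) under the same assumptions (flat tensors, PyTorch); the clamp mirrors the truncation described for the objectness loss:

```python
import torch

def class_loss(p_pred, y_onehot, obj_mask, eps=1e-4):
    """p_pred:   per-class probabilities, shape (N, nc)
       y_onehot: one-hot true class per anchor frame, shape (N, nc)
       obj_mask: 1 where the anchor frame contains real target info, shape (N,)"""
    p = p_pred.clamp(eps, 1 - eps)
    bce = -(y_onehot * torch.log(p) + (1 - y_onehot) * torch.log(1 - p))
    per_anchor = bce.mean(dim=1)          # average over the nc classes
    return (obj_mask * per_anchor).sum()  # second prediction loss
```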
In one embodiment, the step of calculating the target coordinate loss of each anchor frame through the third loss function includes:
determining the target coordinate loss of each anchor frame according to formula (4), wherein:
the formula (4) is:
$L^{coord}_{ij} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}}{c^{2}} + \alpha v$
wherein $L^{coord}_{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the predicted target region corresponding to the jth anchor frame of the ith grid with the region where the real target information is located, $\rho$ is the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, c is the diagonal length of the minimum enclosing rectangle of the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
In one embodiment, v is calculated by formula (5):
formula (5) is:
$v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$
wherein v is the second preset parameter, $w^{gt}$ is the width of the region where the real target information of the jth anchor frame of the ith grid is located, $h^{gt}$ is the height of that region, w is the width of the predicted target region, and h is the height of the predicted target region.
In one embodiment, α is calculated by formula (6):
formula (6) is:
$\alpha = \frac{v}{(1-\mathrm{IoU}_{ij}) + v}$
wherein α is the first preset parameter, and $\mathrm{IoU}_{ij}$ and v are as explained for formula (4) and formula (5) and are not described again here.
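A minimal sketch of the target coordinate loss of formulas (4) to (6) for boxes given as (cx, cy, w, h); the box encoding is an assumption:

```python
import math
import torch

def coord_loss(pred, gt):
    """pred, gt: boxes (cx, cy, w, h), shape (N, 4). Returns the summed
       target coordinate loss of formula (4), with alpha, v from (5), (6)."""
    px1, py1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    px2, py2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    gx1, gy1 = gt[:, 0] - gt[:, 2] / 2, gt[:, 1] - gt[:, 3] / 2
    gx2, gy2 = gt[:, 0] + gt[:, 2] / 2, gt[:, 1] + gt[:, 3] / 2
    inter = (torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(0) * \
            (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(0)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    iou = inter / union.clamp(min=1e-9)
    rho2 = (pred[:, 0] - gt[:, 0]) ** 2 + (pred[:, 1] - gt[:, 1]) ** 2  # center distance^2
    cw = torch.max(px2, gx2) - torch.min(px1, gx1)  # minimum enclosing rectangle
    ch = torch.max(py2, gy2) - torch.min(py1, gy1)
    c2 = (cw ** 2 + ch ** 2).clamp(min=1e-9)        # its diagonal length, squared
    v = (4 / math.pi ** 2) * (torch.atan(gt[:, 2] / gt[:, 3]) -
                              torch.atan(pred[:, 2] / pred[:, 3])) ** 2  # formula (5)
    alpha = v / ((1 - iou) + v).clamp(min=1e-9)     # formula (6)
    return (1 - iou + rho2 / c2 + alpha * v).sum()  # formula (4), third prediction loss
```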
The product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient are summed to obtain the current training loss. The first, second and third weight coefficients are preset parameters.
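Reusing the sketches above, the current training loss of this step could be assembled as follows (the weight coefficient values and the input tensors are illustrative assumptions):

```python
# w1, w2, w3: preset first, second and third weight coefficients (example values)
w1, w2, w3 = 1.0, 1.0, 5.0
current_training_loss = (w1 * objectness_loss(conf_pred, obj_mask, iou_with_gt)
                         + w2 * class_loss(p_pred, y_onehot, obj_mask)
                         + w3 * coord_loss(pred_boxes, gt_boxes))
```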
Step S170, adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model.
The step of adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model comprises:
judging whether the current training loss is smaller than a preset training loss threshold;
if the current training loss is larger than the preset training loss threshold, continuing to train the initial detection model: the current training loss is back-propagated through a mini-batch gradient descent algorithm to adjust the initial detection model, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and the adjusted initial detection model is taken as the target detection model.
In the actual training process, training parameters and verification parameters are preset. Training parameters include, but are not limited to, the total number of training rounds, the number of samples per training step, and so on; verification parameters include the evaluation period, the evaluation index, the preset training loss threshold, and so on. During actual training, within one training period, the preset number of signal time-frequency diagrams are input into the initial detection model, with random data augmentation applied to each diagram; the training loss is calculated through the loss function preset in step S160, propagated back to the initial detection model, and used to adjust the parameters of the initial detection model. In one embodiment, a Mini-Batch Gradient Descent algorithm may be used.
When the number of completed training periods is a multiple of the evaluation period, the evaluation index is calculated. The evaluation index may be the Mean Average Precision (mAP), which measures the detection precision of the adjusted initial detection model on the signal data in the signal time-frequency diagrams.
In practical application, training may simply be stopped when the number of training periods reaches the total number of training rounds, taking the adjusted initial detection model as the target detection model and calculating its evaluation index. The method adopted in this embodiment, however, takes the adjusted initial detection model as the target detection model only when the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, which ensures to some extent that the training has been effective.
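A minimal sketch of this training procedure and stopping criterion (the model, data loader, learning rate and N are assumptions; SGD over mini-batches stands in for the mini-batch gradient descent algorithm):

```python
import torch

def train(model, loader, loss_fn, threshold, n_required, lr=1e-3, max_epochs=300):
    """Mini-batch gradient descent with a stop after n_required consecutive
       training periods whose loss is below the preset threshold."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    below = 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, targets in loader:
            loss = loss_fn(model(images), targets)
            opt.zero_grad()
            loss.backward()   # back-propagate the current training loss
            opt.step()        # adjust the parameters of the initial detection model
            epoch_loss += loss.item()
        below = below + 1 if epoch_loss < threshold else 0
        if below >= n_required:   # N consecutive periods below the threshold
            break
    return model  # adjusted initial detection model, taken as the target detection model
```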
Existing target detection mainly includes the following schemes: (1) designing rectangular boxes according to prior knowledge of the input signal, with sizes covering all possible durations and bandwidths; this requires strong prior knowledge of the signals and is computationally heavy. (2) The YOLO algorithm divides the image into 7 × 7 grids, sets 2 anchor frames per grid, and detects signals from the anchor frames; because the number of grids is small, the detection effect on small and dense targets is poor. (3) The YOLOV3 algorithm increases the number of anchor frames per grid to 9 and divides them into large, medium and small scales, but it still identifies dense small targets inaccurately. (4) The Poly-YOLO algorithm makes the grid division much denser, but the model trains and tests slowly, and its regression of the coordinates and boundaries of large targets is inaccurate.
Compared with the existing schemes, the training method of the target detection model provided by this embodiment improves the network structure of the target detection model, solving the inaccurate identification of dense small targets by the YOLO and YOLOV3 algorithms as well as the slow training and testing and the inaccurate large-target coordinate and boundary regression of the Poly-YOLO network, so that target data in signal time-frequency diagrams can be detected quickly and accurately. The method also improves the loss function, computing the negative sample loss with a negative sample weight coefficient based on the intersection-over-union, so that the target detection model smoothly learns to distinguish real target regions from blank regions and converges quickly in training.
In summary, the training method of the target detection model provided by this embodiment improves the network structure of the target detection model, using dilated convolution layers and deconvolution layers for upsampling to enlarge the receptive field of the feature layers, and adds a negative sample weight coefficient to the loss function, so that the target detection model distinguishes real target regions from blank regions more smoothly during training, trains on and detects signal data in signal time-frequency diagrams quickly, and detects the various types of signal data accurately.
Example 2
Referring to fig. 5, the training apparatus 500 for a target detection model includes a first obtaining module 510, a first constructing module 520, a first calculating module 530, a second calculating module 540, a second obtaining module 550, a second constructing module 560, and an adjusting module 570.
In this embodiment, the first obtaining module 510 is configured to: acquire a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
the first constructing module 520 is configured to: construct an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
the first calculating module 530 is configured to: input the signal time-frequency diagrams into the backbone network, calculate each signal time-frequency diagram through the backbone network, and output a plurality of feature layers;
the second calculating module 540 is configured to: input the plurality of feature layers into the head network, and calculate each feature layer through the head network respectively to obtain a final prediction result corresponding to each signal time-frequency diagram;
the second obtaining module 550 is configured to: acquire the real target information of the signal time-frequency diagrams;
the second constructing module 560 is configured to: construct a preset loss function, and calculate the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
the adjusting module 570 is configured to: adjust the parameters of the initial detection model according to the current training loss to obtain a target detection model.
In an embodiment, the second calculating module 540 is specifically configured to: pass the second feature layer through the second transform convolutional layer to obtain a first output result;
transform the number of channels of the third feature layer through the third transform convolutional layer, enlarge the receptive field through the first dilated convolutional layer, and upsample through the first deconvolution layer to obtain a second output result;
add the first output result and the second output result to obtain a third output result;
transform the number of channels of the third output result through the fourth transform convolutional layer, enlarge the receptive field through the second dilated convolutional layer, and upsample through the second deconvolution layer to obtain a fourth output result;
pass the first feature layer through the first transform convolutional layer to obtain a fifth output result;
add the fourth output result and the fifth output result to obtain a sixth output result;
and pass the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
In an embodiment, the second constructing module 560 is specifically configured to: divide each signal time-frequency diagram into a plurality of grids, and set a preset number of anchor frames for each grid;
calculate the objectness loss of each anchor frame through the first loss function, and sum the objectness losses of all anchor frames to obtain a first prediction loss;
calculate the target class loss of each anchor frame through the second loss function, and sum the target class losses of all anchor frames to obtain a second prediction loss;
calculate the target coordinate loss of each anchor frame through the third loss function, and sum the target coordinate losses of all anchor frames to obtain a third prediction loss;
and sum the product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient to obtain the current training loss.
In an embodiment, the second constructing module 560 is further specifically configured to: when the jth anchor frame of the ith grid contains real target information, calculate the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculate the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located.
In an embodiment, the second building module 560 is further specifically configured to: calculate the objectness loss of the jth anchor frame of the ith grid according to the following formula (1):

the formula (1) is:

$$L_{ij}^{obj} = -\,\mathbb{1}_{ij}^{obj}\,\log\bigl(\hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{obj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}_{ij}^{obj}$ is 1 when the jth anchor frame of the ith grid contains real target information and 0 otherwise.
In an embodiment, the objectness loss of the jth anchor frame of the ith grid is calculated according to formula (2):

the formula (2) is:

$$w_{ij} = 1 - \mathrm{IoU}_{ij}$$

$$L_{ij}^{noobj} = -\,w_{ij}\,\mathbb{1}_{ij}^{noobj}\,\log\bigl(1 - \hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{noobj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located, $w_{ij}$ is the negative sample weight coefficient derived from it, and $\mathbb{1}_{ij}^{noobj}$ is 1 when the jth anchor frame of the ith grid contains no real target information and 0 otherwise.
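Taken together, formulas (1) and (2) act as a binary cross-entropy on the predicted confidence in which negative anchors are down-weighted. The following is a minimal PyTorch sketch under the reconstructed forms above; in particular, the weighting $w = 1 - \mathrm{IoU}$ is an assumption of this sketch:

```python
import torch

def objectness_loss(pred_conf, obj_mask, iou_with_gt, eps=1e-7):
    """Objectness loss per the reconstructed formulas (1) and (2).

    pred_conf:   predicted confidence in (0, 1), shape (grids, anchors)
    obj_mask:    1.0 where the anchor frame contains a real target, else 0.0
    iou_with_gt: IoU between each anchor frame and the real target region
    """
    pred_conf = pred_conf.clamp(eps, 1 - eps)
    # formula (1): anchor frames that contain real target information
    pos = -obj_mask * torch.log(pred_conf)
    # formula (2): anchor frames with no target, scaled by the negative
    # sample weight coefficient w = 1 - IoU (assumed form of the weighting)
    neg_w = 1.0 - iou_with_gt
    neg = -(1.0 - obj_mask) * neg_w * torch.log(1.0 - pred_conf)
    # sum over all grids and anchor frames
    return (pos + neg).sum()
```

The returned sum over all grids and anchor frames is the first prediction loss.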
In an embodiment, the second building module 560 is further specifically configured to: determine the target class loss of each anchor frame according to formula (3), wherein:

the formula (3) is:

$$L_{ij}^{cls} = -\sum_{k=1}^{nc} y_{ij}^{k}\,\log\bigl(\hat{p}_{ij}^{k}\bigr)$$

wherein, $L_{ij}^{cls}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k indexes the category of the real target information, $y_{ij}^{k}$ is 1 when the prediction category corresponding to the jth anchor frame of the ith grid is the category of the real target information and 0 otherwise, and $\hat{p}_{ij}^{k}$ is the predicted probability that the jth anchor frame of the ith grid belongs to the category of the real target information.
In an embodiment, the second building module 560 is further specifically configured to: determine the target coordinate loss of each anchor frame according to formula (4), wherein:

the formula (4) is:

$$L_{ij}^{box} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}\bigl(b_{ij},\,b^{gt}\bigr)}{c^{2}} + \alpha v$$

wherein, $L_{ij}^{box}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ represents the intersection-over-union between the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\rho(b_{ij}, b^{gt})$ represents the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, $c$ represents the diagonal length of the minimum circumscribed rectangle enclosing the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\alpha$ is a first preset parameter, and $v$ is a second preset parameter.
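The form of formula (4), with its IoU term, normalized center distance, and $\alpha v$ aspect-ratio term, matches the well-known CIoU loss. The sketch below computes it for boxes in (x1, y1, x2, y2) format; note that here $\alpha$ and $v$ are computed in the standard CIoU way, whereas the embodiment describes them as preset parameters:

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-7):
    """CIoU-style coordinate loss for boxes given as (x1, y1, x2, y2)."""
    # intersection and union areas
    ix1, iy1 = torch.max(pred[..., 0], gt[..., 0]), torch.max(pred[..., 1], gt[..., 1])
    ix2, iy2 = torch.min(pred[..., 2], gt[..., 2]), torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)
    # squared center distance rho^2 and squared enclosing-box diagonal c^2
    cpx = (pred[..., 0] + pred[..., 2]) / 2
    cpy = (pred[..., 1] + pred[..., 3]) / 2
    cgx = (gt[..., 0] + gt[..., 2]) / 2
    cgy = (gt[..., 1] + gt[..., 3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    ex1, ey1 = torch.min(pred[..., 0], gt[..., 0]), torch.min(pred[..., 1], gt[..., 1])
    ex2, ey2 = torch.max(pred[..., 2], gt[..., 2]), torch.max(pred[..., 3], gt[..., 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    # aspect-ratio consistency term v and trade-off parameter alpha
    wp = (pred[..., 2] - pred[..., 0]).clamp(eps)
    hp = (pred[..., 3] - pred[..., 1]).clamp(eps)
    wg = (gt[..., 2] - gt[..., 0]).clamp(eps)
    hg = (gt[..., 3] - gt[..., 1]).clamp(eps)
    v = (4 / math.pi ** 2) * (torch.atan(wg / hg) - torch.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```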
In an embodiment, the second building module 560 is further specifically configured to: cluster the real target information in each signal time-frequency diagram into the preset number of cluster categories through the K-means algorithm to obtain a cluster center for each cluster category;
and acquire the coordinates of each cluster center, and adjust the size of the corresponding anchor frame according to the coordinates of each cluster center.
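A minimal sketch of this anchor-fitting step, assuming the real target information is available as an array of (width, height) pairs taken from the labeled boxes; the function name and the plain Euclidean distance are illustrative choices:

```python
import numpy as np

def fit_anchor_sizes(box_wh, num_anchors, iters=100, seed=0):
    """Cluster ground-truth box sizes with K-means; the cluster centers
    become the anchor frame sizes.

    box_wh: ndarray of shape (n, 2) holding (width, height) per labeled box
    """
    rng = np.random.default_rng(seed)
    centers = box_wh[rng.choice(len(box_wh), num_anchors, replace=False)]
    for _ in range(iters):
        # assign each box to the nearest center (Euclidean in w-h space)
        d = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            box_wh[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(num_anchors)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers  # coordinates of the cluster centers = anchor (w, h) sizes
```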
In an embodiment, the adjusting module 570 is specifically configured to: judge whether the current training loss is smaller than a preset training loss threshold;
and if the current training loss is not smaller than the preset training loss threshold, continue training the initial detection model, back-propagate the current training loss and adjust the initial detection model through a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and take the adjusted initial detection model as the target detection model.
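The adjustment rule can be sketched as a standard mini-batch training loop with the described stopping condition; `model`, `loader`, `total_loss`, `loss_threshold`, and `n_periods` are illustrative stand-ins for the initial detection model, the training set loader, the preset loss function, the preset training loss threshold, and N:

```python
import torch

def train(model, loader, total_loss, loss_threshold, n_periods, lr=1e-3):
    """Mini-batch gradient descent with the described stopping rule."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    below = 0  # consecutive training periods below the threshold
    while below < n_periods:
        period_loss = 0.0
        for images, targets in loader:           # one training period (epoch)
            loss = total_loss(model(images), targets)
            opt.zero_grad()
            loss.backward()                      # back-propagate the training loss
            opt.step()                           # adjust the model parameters
            period_loss += loss.item()
        period_loss /= len(loader)
        below = below + 1 if period_loss < loss_threshold else 0
    return model  # adjusted model taken as the target detection model
```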
The training device for the target detection model provided by this embodiment improves the network structure of the target detection model: it performs up-sampling using the expansion convolutional layers and the deconvolution layers, which enlarges the receptive field of the feature layers, and it adds a negative sample weight coefficient to the loss function, so that during training the target detection model distinguishes real target regions from blank regions more stably. As a result, the device can quickly train on and detect the signal data in the signal time-frequency diagrams and can accurately detect multiple types of signal data.
Example 3
The present embodiment provides a computer device, which comprises a memory and a processor; the memory stores a computer program which, when run by the processor, performs the training method of the object detection model according to embodiment 1.
The computer device provided in this embodiment may implement the method for training the target detection model described in embodiment 1, and details are not described here again to avoid repetition.
Example 4
The present embodiment provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the training method of the object detection model according to embodiment 1.
The computer-readable storage medium provided in this embodiment may implement the method for training the target detection model described in embodiment 1, and is not described herein again to avoid repetition.
In this embodiment, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (12)

1. A method for training an object detection model, the method comprising:
acquiring a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
inputting the signal time-frequency diagrams into the backbone network, calculating each signal time-frequency diagram through the backbone network, and outputting a plurality of feature layers;
inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
acquiring real target information of the signal time-frequency diagrams;
constructing a preset loss function, and calculating, through the preset loss function, the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information;
adjusting parameters of the initial detection model according to feedback of the current training loss to obtain a target detection model;
the head network includes: a first transformation convolutional layer, a second transformation convolutional layer, a third transformation convolutional layer, a fourth transformation convolutional layer, a first expansion convolutional layer, a second expansion convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer; the plurality of feature layers includes: a first feature layer, a second feature layer and a third feature layer;
the step of inputting the plurality of feature layers into the head network and calculating each feature layer through the head network to obtain the final prediction result corresponding to each signal time-frequency diagram includes:
obtaining a first output result by passing the second feature layer through the second transformation convolutional layer;
converting the number of channels of the third feature layer through the third transformation convolutional layer, expanding the receptive field through the first expansion convolutional layer, and performing up-sampling through the first deconvolution layer to obtain a second output result;
adding the first output result and the second output result to obtain a third output result;
converting the number of channels of the third output result through the fourth transformation convolutional layer, expanding the receptive field through the second expansion convolutional layer, and performing up-sampling through the second deconvolution layer to obtain a fourth output result;
obtaining a fifth output result by passing the first feature layer through the first transformation convolutional layer;
adding the fourth output result and the fifth output result to obtain a sixth output result;
and passing the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
2. The method for training an object detection model according to claim 1, wherein the method further comprises:
dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
the preset loss function includes: a first loss function, a second loss function, a third loss function;
the step of calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function comprises the following steps:
calculating the objectness loss of each anchor frame through the first loss function, and summing the objectness losses of the anchor frames to obtain a first prediction loss;
calculating the target class loss of each anchor frame through the second loss function, and summing the target class losses of the anchor frames to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and summing the target coordinate losses of the anchor frames to obtain a third prediction loss;
and summing the product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient to obtain the current training loss.
3. The method for training an object detection model according to claim 2, wherein the step of calculating the objectness loss of each anchor frame through the first loss function comprises:
when the jth anchor frame of the ith grid contains real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located.
4. The method for training an object detection model according to claim 3, wherein, when the jth anchor frame of the ith grid contains real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (1),
the formula (1) is:

$$L_{ij}^{obj} = -\,\mathbb{1}_{ij}^{obj}\,\log\bigl(\hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{obj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}_{ij}^{obj}$ is 1 when the jth anchor frame of the ith grid contains real target information and 0 otherwise.
5. The method for training an object detection model according to claim 3, wherein, when the jth anchor frame of the ith grid does not contain real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2),
the formula (2) is:

$$w_{ij} = 1 - \mathrm{IoU}_{ij}$$

$$L_{ij}^{noobj} = -\,w_{ij}\,\mathbb{1}_{ij}^{noobj}\,\log\bigl(1 - \hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{noobj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located, $w_{ij}$ is the negative sample weight coefficient derived from it, and $\mathbb{1}_{ij}^{noobj}$ is 1 when the jth anchor frame of the ith grid contains no real target information and 0 otherwise.
6. The method for training an object detection model according to claim 2, wherein the step of calculating the target class loss of each anchor frame through the second loss function comprises:
determining the target class loss of each anchor frame according to formula (3),
the formula (3) is:

$$L_{ij}^{cls} = -\sum_{k=1}^{nc} y_{ij}^{k}\,\log\bigl(\hat{p}_{ij}^{k}\bigr)$$

wherein, $L_{ij}^{cls}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k indexes the category of the real target information, $y_{ij}^{k}$ is 1 when the prediction category corresponding to the jth anchor frame of the ith grid is the category of the real target information and 0 otherwise, and $\hat{p}_{ij}^{k}$ is the predicted probability that the jth anchor frame of the ith grid belongs to the category of the real target information.
7. The method for training an object detection model according to claim 2, wherein the step of calculating the target coordinate loss of each anchor frame through the third loss function comprises:
determining the target coordinate loss of each anchor frame according to formula (4), wherein,
the formula (4) is:

$$L_{ij}^{box} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}\bigl(b_{ij},\,b^{gt}\bigr)}{c^{2}} + \alpha v$$

wherein, $L_{ij}^{box}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ represents the intersection-over-union between the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\rho(b_{ij}, b^{gt})$ represents the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, $c$ represents the diagonal length of the minimum circumscribed rectangle enclosing the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\alpha$ is a first preset parameter, and $v$ is a second preset parameter.
8. The method for training an object detection model according to claim 2, wherein the method further comprises:
clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through the K-means algorithm to obtain a cluster center corresponding to each cluster category;
and acquiring the coordinates of each cluster center, and adjusting the size of the corresponding anchor frame according to the coordinates of each cluster center.
9. The method for training an object detection model according to claim 1, wherein the step of adjusting the parameters of the initial detection model according to feedback of the current training loss to obtain the target detection model comprises:
judging whether the current training loss is smaller than a preset training loss threshold;
and if the current training loss is not smaller than the preset training loss threshold, continuing training the initial detection model, back-propagating the current training loss and adjusting the initial detection model through a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
10. An apparatus for training an object detection model, the apparatus comprising:
a first acquisition module, configured to acquire a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
a first construction module, configured to construct an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
a first calculation module, configured to input the signal time-frequency diagrams into the backbone network, calculate each signal time-frequency diagram through the backbone network, and output a plurality of feature layers;
a second calculation module, configured to input the plurality of feature layers into the head network, and calculate each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
a second acquisition module, configured to acquire real target information of the signal time-frequency diagrams;
a second construction module, configured to construct a preset loss function, and calculate, through the preset loss function, the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information;
and an adjusting module, configured to adjust parameters of the initial detection model according to feedback of the current training loss to obtain a target detection model;
the head network includes: a first transformation convolutional layer, a second transformation convolutional layer, a third transformation convolutional layer, a fourth transformation convolutional layer, a first expansion convolutional layer, a second expansion convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer; the plurality of feature layers includes: a first feature layer, a second feature layer and a third feature layer;
the second calculation module is further configured to: pass the second feature layer through the second transformation convolutional layer to obtain a first output result;
convert the number of channels of the third feature layer through the third transformation convolutional layer, expand the receptive field through the first expansion convolutional layer, and perform up-sampling through the first deconvolution layer to obtain a second output result;
add the first output result and the second output result to obtain a third output result;
convert the number of channels of the third output result through the fourth transformation convolutional layer, expand the receptive field through the second expansion convolutional layer, and perform up-sampling through the second deconvolution layer to obtain a fourth output result;
pass the first feature layer through the first transformation convolutional layer to obtain a fifth output result;
add the fourth output result and the fifth output result to obtain a sixth output result;
and pass the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
11. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when run by the processor, performs the method for training an object detection model according to any one of claims 1-9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the method for training an object detection model according to any one of claims 1-9.