CN114492540B - Training method and device of target detection model, computer equipment and storage medium - Google Patents

Training method and device of target detection model, computer equipment and storage medium Download PDF

Info

Publication number
CN114492540B
CN114492540B (application CN202210308846.6A)
Authority
CN
China
Prior art keywords
loss
detection model
anchor frame
training
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210308846.6A
Other languages
Chinese (zh)
Other versions
CN114492540A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shuzhilian Technology Co Ltd
Original Assignee
Chengdu Shuzhilian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shuzhilian Technology Co Ltd
Priority to CN202210308846.6A
Publication of CN114492540A
Application granted
Publication of CN114492540B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 - Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/12 - Classification; Matching
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/23 - Clustering techniques
    • G06F 18/232 - Non-hierarchical techniques
    • G06F 18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 - Road transport of goods or passengers
    • Y02T 10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a training method and device of a target detection model, computer equipment and a storage medium, relating to the technical field of image recognition. The method comprises the following steps: constructing the network structure of the target detection model to obtain an initial detection model, inputting a plurality of signal time-frequency diagrams into the initial detection model and outputting final prediction results, constructing a loss function to calculate the training loss, and adjusting the initial detection model according to the training loss to obtain the target detection model. A dilated convolution layer and a deconvolution layer are added to the network structure, and a negative sample weight value is added to the loss function, so that the target detection model smoothly learns to distinguish real target regions from blank regions during training. The signal data in signal time-frequency diagrams can therefore be trained on and detected quickly, and various types of signal data can be detected accurately.

Description

Training method and device of target detection model, computer equipment and storage medium
Technical Field
The scheme belongs to the technical field of image recognition, and particularly relates to a training method and device of a target detection model, computer equipment and a storage medium.
Background
To ensure reliable information transmission, an information transmission system must have stable anti-interference capability, and signal detection is one of the most effective anti-interference methods. The existing signal detection scheme is time-frequency analysis: a one-dimensional signal is mapped onto a two-dimensional plane to generate a signal time-frequency diagram, and a deep neural network then detects the target signal data in the diagram, which is a target detection problem. The signal time-frequency diagram can be detected by the YOLO, YOLOV3 and Poly-YOLO algorithms. When YOLO is used, the detection effect on small and dense targets is poor, and a time-frequency diagram containing short, dense signals can hardly be detected at all. When YOLOV3 is used, large targets are identified inaccurately, bounding-box regression is imprecise, and dense small targets cannot be detected accurately. Because of its network structure, Poly-YOLO converges with difficulty during training, and its training and testing are slow, so it cannot meet the speed and real-time requirements of signal detection.
Disclosure of Invention
In order to solve the problems that existing target detection algorithms detect the different types of signal data in signal time-frequency diagrams poorly and are difficult to train, the application provides a training method and device of a target detection model, computer equipment and a storage medium that can accurately detect the signal data of different types of signal time-frequency diagrams, train and detect quickly, and meet the real-time requirement of signal detection.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for training a target detection model, where the method includes:
acquiring a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
inputting the signal time-frequency diagrams into the backbone network, calculating each signal time-frequency diagram through the backbone network, and outputting a plurality of feature layers;
inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network respectively to obtain a final prediction result corresponding to each signal time-frequency diagram;
acquiring the real target information of the signal time-frequency diagrams;
constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
and adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model.
In one possible implementation, the head network includes: a first transform convolutional layer, a second transform convolutional layer, a third transform convolutional layer, a fourth transform convolutional layer, a first dilated convolutional layer, a second dilated convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer; the plurality of feature layers includes: a first feature layer, a second feature layer and a third feature layer;
the step of inputting the first feature layer, the second feature layer and the third feature layer into the head network, and calculating the first feature layer, the second feature layer and the third feature layer through the head network to obtain the final prediction results corresponding to the signal time-frequency diagrams includes:
passing the second feature layer through the second transform convolutional layer to obtain a first output result;
transforming the number of channels of the third feature layer through the third transform convolutional layer, enlarging the receptive field through the first dilated convolutional layer, and upsampling through the first deconvolution layer to obtain a second output result;
adding the first output result and the second output result to obtain a third output result;
transforming the number of channels of the third output result through the fourth transform convolutional layer, enlarging the receptive field through the second dilated convolutional layer, and upsampling through the second deconvolution layer to obtain a fourth output result;
passing the first feature layer through the first transform convolutional layer to obtain a fifth output result;
adding the fourth output result and the fifth output result to obtain a sixth output result;
and passing the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
In one possible implementation, the method further includes:
dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
the preset loss function includes: a first loss function, a second loss function, a third loss function;
the step of calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function comprises the following steps:
calculating the objectness loss of each anchor frame through the first loss function, and summing the objectness losses of all anchor frames to obtain a first prediction loss;
calculating the target class loss of each anchor frame through the second loss function, and summing the target class losses of all anchor frames to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and summing the target coordinate losses of all anchor frames to obtain a third prediction loss;
and summing the product value of the first prediction loss multiplied by a first weight coefficient, the product value of the second prediction loss multiplied by a second weight coefficient and the product value of the third prediction loss multiplied by a third weight coefficient to obtain the current training loss.
In one possible implementation, the step of calculating the objectness loss of each anchor frame through the first loss function includes:
when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union (IoU) of the jth anchor frame of the ith grid with the region where the real target information is located.
In a possible implementation manner, when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to the following formula (1):
formula (1):
$L^{obj}_{ij} = -\mathbb{1}^{obj}_{ij}\log(\hat{C}_{ij})$
wherein $L^{obj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}^{obj}_{ij}$ indicates that the jth anchor frame of the ith grid contains real target information (1 if it does, 0 otherwise).
In a possible implementation manner, when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function, according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2):
the formula (2) is:
$L^{noobj}_{ij} = -\mathbb{1}^{noobj}_{ij}\,(1-\mathrm{IoU}_{ij})^{2}\,\log(1-\hat{C}_{ij})$
wherein $L^{noobj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, $(1-\mathrm{IoU}_{ij})^{2}$ is the negative sample weight value, and $\mathbb{1}^{noobj}_{ij}$ indicates that the jth anchor frame of the ith grid contains no real target information (1 if it does not, 0 otherwise).
In a possible implementation manner, the step of calculating the target class loss of each anchor frame through the second loss function includes:
determining the target class loss of each anchor frame according to formula (3), wherein:
the formula (3) is:
$L^{cls}_{ij} = -\frac{1}{nc}\sum_{k=1}^{nc}\left[y_{k}\log(p_{ijk}) + (1-y_{k})\log(1-p_{ijk})\right]$
wherein $L^{cls}_{ij}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of classes, k is the class of the real target information, $y_{k}$ is 1 when the prediction class corresponding to the jth anchor frame of the ith grid is the class of the real target information and 0 otherwise, and $p_{ijk}$ is the predicted probability that the jth anchor frame of the ith grid belongs to class k of the real target information.
In a possible implementation manner, the step of calculating the target coordinate loss of each anchor frame through the third loss function includes:
determining the target coordinate loss of each anchor frame according to formula (4), wherein:
formula (4) is:
$L^{coord}_{ij} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}}{c^{2}} + \alpha v$
wherein $L^{coord}_{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the predicted target region corresponding to the jth anchor frame of the ith grid with the region where the real target information is located, $\rho$ is the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, c is the diagonal length of the minimum enclosing rectangle of the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
In one possible implementation, the method further includes:
clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through a Kmeans algorithm to obtain cluster centers corresponding to the cluster categories;
and acquiring the coordinates of each clustering center, and adjusting the size of the corresponding anchor frame according to the coordinates of each clustering center.
In a possible implementation manner, the step of adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model includes:
judging whether the current training loss is smaller than a preset training loss threshold;
if the current training loss is larger than the preset training loss threshold, continuing to train the initial detection model: back-propagating the current training loss through a mini-batch gradient descent algorithm to adjust the initial detection model, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
In a second aspect, an embodiment of the present application provides an apparatus for training a target detection model, where the apparatus includes:
a first acquisition module, used for acquiring a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
a first construction module, used for constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
a first calculation module, used for inputting the signal time-frequency diagrams into the backbone network, calculating each signal time-frequency diagram through the backbone network, and outputting a plurality of feature layers;
a second calculation module, used for inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
a second acquisition module, used for acquiring the real target information of the signal time-frequency diagrams;
a second construction module, used for constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
and an adjustment module, used for adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model.
In a third aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, performs the training method for the target detection model according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the training method of the object detection model according to the first aspect.
Compared with the prior art, the method has the following beneficial effects:
the training method, the training device, the computer equipment and the storage medium of the target detection model provided by the embodiment use a network structure added with an expansion convolutional layer and a deconvolution layer, and expand the visual field of a characteristic layer; and a loss function added with a negative sample weight value is used, so that the target detection model is converged quickly in training. The finally obtained target detection model can quickly and accurately detect various types of signal data in the signal time-frequency diagram.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed to be used in the embodiments are briefly introduced below, and it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope of protection of the present application. Like components are numbered alike in the various figures, and other related figures may also be derived from these figures by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a flow chart of a method for training a target detection model according to an embodiment of the present invention;
fig. 2 is an exemplary diagram of a signal time-frequency diagram according to an embodiment of the present invention;
FIG. 3 is a diagram of an exemplary initial detection model provided by an embodiment of the invention;
FIG. 4 is another schematic flow chart of a method for training a target detection model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a training apparatus for an object detection model according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.
The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Hereinafter, the terms "including", "having", and their derivatives, as used in various embodiments of the present invention, are intended only to indicate specific features, numbers, steps, operations, elements, components, or combinations of the foregoing, and should not be construed as excluding the existence or addition of one or more other features, numbers, steps, operations, elements, components, or combinations of the foregoing.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of the present invention belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments of the present invention.
Example 1
The embodiment provides a training method of a target detection model. As shown in fig. 1, the training method of the target detection model provided in this embodiment includes the following steps:
step S110, a training set is obtained, wherein the training set comprises a plurality of signal time-frequency graphs.
When signal data in a signal time-frequency diagram are detected by the target detection model, a region where a signal exists is a signal target region, a region where no signal exists is a noise region, and the two have a clear boundary. Fig. 2 shows an example of a signal time-frequency diagram, where the abscissa is the time domain and the ordinate is the frequency domain. Signal data with short duration and a small frequency range appear in the diagram as small, dense rectangular objects, such as 201 in fig. 2; signal data with long duration and a wide frequency range appear as rectangular objects with large length and width, such as 202 in fig. 2. In practical applications these different types of signals may appear in one signal time-frequency diagram at the same time, and in order to better detect the various types and shapes of signal data in the diagram, this embodiment provides a target detection model and a training method thereof.
Step S120, an initial detection model is constructed, and the initial detection model comprises a backbone network and a head network.
In one embodiment, the backbone network is a ResNet18 network, whose small network structure trains and runs quickly and can better meet the real-time requirement of target detection.
In one embodiment, the head network comprises: a first transform convolutional layer, a second transform convolutional layer, a third transform convolutional layer, a fourth transform convolutional layer, a first dilated convolutional layer, a second dilated convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer. The operation of these convolutional layers is described in detail in steps S130 and S140.
Step S130, inputting the signal time-frequency graphs into the backbone network, calculating each signal time-frequency graph through the backbone network, and outputting a plurality of characteristic layers.
As shown in fig. 3, in an embodiment, a signal time-frequency diagram passing through the backbone network ResNet18 passes through a stem layer and four Block layers (B1, B2, B3, B4). The feature layer obtained after the second Block layer B2 is F2, the first feature layer; the feature layer obtained after the third Block layer B3 is F3, the second feature layer; and the feature layer obtained after the last Block layer B4 is F4, the third feature layer. A feature layer has three dimensions: the length and width of each feature map, and the number of channels. The number of channels is the number of feature maps in the layer, which is usually the number of convolution kernels of the convolutional layer the feature layer passed through.
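A minimal sketch of extracting F2, F3 and F4 with this backbone (assuming PyTorch and torchvision, which the scheme does not mandate):

```python
import torch
from torchvision.models import resnet18

class Backbone(torch.nn.Module):
    """ResNet18 trunk that returns the F2, F3, F4 feature layers."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        # Stem: conv + bn + relu + maxpool, followed by the Block layers B1..B4.
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.b1, self.b2, self.b3, self.b4 = net.layer1, net.layer2, net.layer3, net.layer4

    def forward(self, x):
        x = self.stem(x)
        x = self.b1(x)
        f2 = self.b2(x)   # first feature layer  (128 channels, 1/8 scale)
        f3 = self.b3(f2)  # second feature layer (256 channels, 1/16 scale)
        f4 = self.b4(f3)  # third feature layer  (512 channels, 1/32 scale)
        return f2, f3, f4

# A 3-channel time-frequency image of size 512 x 512 yields
# f2: (1, 128, 64, 64), f3: (1, 256, 32, 32), f4: (1, 512, 16, 16).
feats = Backbone()(torch.zeros(1, 3, 512, 512))
```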
Step S140, inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram. The head network is HeadNet in fig. 3.
As shown in fig. 3 and 4, the step of inputting the first feature layer, the second feature layer, and the third feature layer into the head network, and calculating the first feature layer, the second feature layer, and the third feature layer through the head network to obtain the final prediction result corresponding to each signal time-frequency diagram includes:
step S410, obtaining a first output result from the second feature layer through the second transform convolution layer.
The second feature layer F3 has its number of channels transformed by the second transform convolutional layer conv2, giving the first output result.
Step S420, transforming the number of channels of the third feature layer through the third transform convolutional layer, enlarging the receptive field through the first dilated convolutional layer, and upsampling through the first deconvolution layer to obtain a second output result.
The third feature layer is F4 in fig. 3. In one embodiment, the third transform convolutional layer conv3 is a 1 × 1 convolutional layer; the dilation rate of the first dilated convolutional layer diaconv1 is 2, which enlarges the receptive field; and the first deconvolution layer transconv1 doubles the length and width of the feature maps of the third feature layer without changing the number of channels.
In step S430, the first output result and the second output result are added to obtain a third output result, i.e., H1 in fig. 3.
Step S440, transforming the number of channels of the third output result through the fourth transform convolutional layer, enlarging the receptive field through the second dilated convolutional layer, and upsampling through the second deconvolution layer to obtain a fourth output result.
The parameters of the fourth transform convolutional layer conv4, the second dilated convolutional layer diaconv2 and the second deconvolution layer transconv2 are set according to the specific task and are not limited here.
Step S450, a fifth output result is obtained by passing the first feature layer through the first transform convolution layer.
The first feature layer F2 has its number of channels transformed by the first transform convolutional layer conv1; the parameters of conv1 are set according to the specific task and are not limited here.
In step S460, the fourth output result and the fifth output result are added to obtain a sixth output result, i.e., H2 in fig. 3.
Step S470, passing the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
The subsequent convolutional layer convs changes only the number of channels of the sixth output result, not the size of its feature maps; the final prediction result is H3 in fig. 3.
In this embodiment, all feature layers, feature maps and output results are stored in the computer device as matrices, and their addition and subtraction operations follow matrix arithmetic.
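A minimal PyTorch sketch of steps S410 to S470; the channel width `mid` and the output channel count `out_ch` are illustrative assumptions, since the scheme fixes only that conv3 is 1 × 1, that diaconv1 has dilation rate 2, and that transconv1 doubles length and width without changing the number of channels:

```python
import torch
import torch.nn as nn

class HeadNet(nn.Module):
    """Head network combining the F2, F3, F4 feature layers of fig. 3."""
    def __init__(self, c2=128, c3=256, c4=512, mid=128, out_ch=126):
        super().__init__()
        self.conv1 = nn.Conv2d(c2, mid, 1)   # first transform conv (applied to F2)
        self.conv2 = nn.Conv2d(c3, mid, 1)   # second transform conv (applied to F3)
        self.conv3 = nn.Conv2d(c4, mid, 1)   # third transform conv (applied to F4), 1x1
        self.diaconv1 = nn.Conv2d(mid, mid, 3, padding=2, dilation=2)  # dilation rate 2
        self.transconv1 = nn.ConvTranspose2d(mid, mid, 2, stride=2)    # 2x up, same channels
        self.conv4 = nn.Conv2d(mid, mid, 1)  # fourth transform conv (applied to H1)
        self.diaconv2 = nn.Conv2d(mid, mid, 3, padding=2, dilation=2)
        self.transconv2 = nn.ConvTranspose2d(mid, mid, 2, stride=2)
        self.convs = nn.Conv2d(mid, out_ch, 1)  # subsequent conv: changes channels only

    def forward(self, f2, f3, f4):
        out1 = self.conv2(f3)                                   # step S410
        out2 = self.transconv1(self.diaconv1(self.conv3(f4)))   # step S420
        h1 = out1 + out2                                        # step S430
        out4 = self.transconv2(self.diaconv2(self.conv4(h1)))   # step S440
        out5 = self.conv1(f2)                                   # step S450
        h2 = out4 + out5                                        # step S460
        return self.convs(h2)                                   # step S470 -> H3
```

With the ResNet18 shapes sketched above, H1 has spatial size 32 × 32, H2 has 64 × 64, and H3 keeps that size with out_ch channels (e.g. 9 anchors × (5 + 9 classes) = 126, an assumption for illustration).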
And step S150, acquiring the real target information of the signal time-frequency diagrams.
In this embodiment, the real target information is the signal data of the region where a real target is located in the signal time-frequency diagram, and includes whether signal data exist in that region, the class of the signal data, and the coordinates of the signal data.
Step S160, constructing a preset loss function, and calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function.
In an embodiment, the step of calculating, by using the preset loss function, a current training loss between a final predicted result corresponding to each signal time-frequency diagram and real target information includes:
dividing each signal time-frequency graph into a plurality of grids, and setting a preset number of anchor frames for each grid;
in the target detection algorithm, an Anchor Box (Anchor Box) is a plurality of preset rectangular boxes with different sizes for detecting signal data, wherein the size of the Anchor Box is configured when a model is configured.
In one embodiment, the size of the anchor box is obtained by the Kmeans algorithm. Clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through a Kmeans algorithm to obtain cluster centers corresponding to the cluster categories; and acquiring the coordinates of each clustering center, and adjusting the size of the corresponding anchor frame according to the coordinates of each clustering center. The Kmeans algorithm is a common clustering algorithm for classifying each sample in the data set into a class corresponding to the cluster center with the smallest distance.
The number of anchor frames set for the grid is equal to the cluster type set during clustering, such as: and (4) clustering the real target information in the signal time-frequency diagram into 9 types, and setting 9 anchor frames for each grid.
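A minimal sketch of this anchor sizing step (assuming, as in YOLO-style anchor selection, that the clustered quantities are the widths and heights of the labeled target regions, and using scikit-learn's KMeans):

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_sizes_from_labels(boxes_wh: np.ndarray, n_anchors: int = 9) -> np.ndarray:
    """Cluster the (width, height) of all labeled target regions into
    n_anchors groups; the cluster centers become the anchor frame sizes."""
    km = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(boxes_wh)
    centers = km.cluster_centers_
    return centers[np.argsort(centers.prod(axis=1))]  # sort by area, small to large

# boxes_wh: one row per real target across all time-frequency diagrams (example data)
boxes_wh = np.array([[12, 8], [300, 40], [15, 10], [280, 36], [14, 9],
                     [25, 200], [11, 7], [290, 42], [22, 190]], dtype=float)
print(anchor_sizes_from_labels(boxes_wh, n_anchors=3))
```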
In one embodiment, the preset loss function comprises: a first loss function, a second loss function, and a third loss function.
Calculating the objectness loss of each anchor frame through the first loss function, and summing the objectness losses of all anchor frames to obtain a first prediction loss;
calculating the target class loss of each anchor frame through the second loss function, and summing the target class losses of all anchor frames to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and summing the target coordinate losses of all anchor frames to obtain a third prediction loss;
in one embodiment, said step of calculating by said first penalty function whether there is a penalty on the target of each of said anchor boxes comprises:
when the jth anchor frame of the ith grid contains real target information, calculating the target loss of the jth anchor frame of the ith grid according to the confidence coefficient of the fact that the real target information exists in the jth anchor frame of the ith grid through the first loss function;
and when the jth anchor frame of the ith grid does not contain real target information, calculating the intersection ratio of the jth anchor frame of the ith grid and the area where the real target information exists according to the confidence coefficient of the real target information of the jth anchor frame of the ith grid and the jth anchor frame of the ith grid in the predicted jth anchor frame of the ith grid through the first loss function, wherein the target of the jth anchor frame of the ith grid has no loss.
The confidence that real target information exists is predicted by the target detection model, and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located is the ratio of the intersection to the union of the area of the jth anchor frame of the ith grid and the area of that region.
In one embodiment, when the jth anchor frame of the ith grid contains real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to the following formula (1):
formula (1):
$L^{obj}_{ij} = -\mathbb{1}^{obj}_{ij}\log(\hat{C}_{ij})$
wherein $L^{obj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}^{obj}_{ij}$ indicates that the jth anchor frame of the ith grid contains real target information (1 if it does, 0 otherwise).
It should be noted that although the base of the logarithm log is omitted in formula (1), in practical application the natural logarithm (base e) is used; the same applies to formulas (2) and (3).
In one embodiment, when the jth anchor frame of the ith grid does not contain real target information, calculating the objectness loss of the jth anchor frame of the ith grid through the first loss function, according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, includes:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2):
the formula (2) is:
$L^{noobj}_{ij} = -\mathbb{1}^{noobj}_{ij}\,(1-\mathrm{IoU}_{ij})^{2}\,\log(1-\hat{C}_{ij})$
wherein $L^{noobj}_{ij}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located, $(1-\mathrm{IoU}_{ij})^{2}$ is the negative sample weight value, and $\mathbb{1}^{noobj}_{ij}$ indicates that the jth anchor frame of the ith grid contains no real target information (1 if it does not, 0 otherwise).
In the following description of the loss function, positive samples are anchor frames consistent with the real target information, and negative samples are anchor frames inconsistent with it. Formulas (1) and (2) compute a loss from the predicted confidence of every anchor frame and penalize wrongly predicted anchor frames according to the degree of error. In practical application, an anchor frame may contain no target yet overlap heavily with the region where real target information is located; its intersection-over-union with that region is then high. This generally happens to anchor frames in grids near the center of the real target information, and forcing the predicted confidence of such an anchor frame toward 1 or 0 harms the training of the target detection model. Therefore, in this case, the no-target term $\log(1-\hat{C}_{ij})$ is multiplied by the negative sample weight value $(1-\mathrm{IoU}_{ij})^{2}$. If $\mathrm{IoU}_{ij}$ is high, squaring makes the negative sample weight value very low, so even a predicted confidence close to 1 cannot produce a large objectness loss; if the anchor frame lies entirely in a blank area, $\mathrm{IoU}_{ij}$ is 0, the negative sample weight value is 1, and a confident wrong prediction produces a large objectness loss. It should also be noted that early in training the number of anchor frames is very large and the positive and negative samples are severely unbalanced, so the objectness loss easily causes gradient explosion: values of $\hat{C}_{ij}$ accumulating at extremes close to 0 or close to 1 produce an objectness loss with a very large gradient. Therefore $\hat{C}_{ij}$ is truncated before the objectness loss is computed: the scheme cuts $\hat{C}_{ij}$ off to the interval [0.0001, 0.9999]. Specifically, values less than 0.0001 are taken as 0.0001, and values greater than 0.9999 are taken as 0.9999. For example, if $\hat{C}_{ij}$ is 0.00005 it is computed as 0.0001, and if it is 0.99999 it is computed as 0.9999.
In this embodiment, adding the negative sample weight value forces the target detection model to distinguish real target regions from blank regions, which improves the training effect, and truncating the confidence to the above interval avoids the gradient explosion phenomenon.
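A minimal sketch of the objectness loss of formulas (1) and (2), with the negative sample weight value and the truncation described above (the flat tensor shapes and PyTorch itself are assumptions):

```python
import torch

def objectness_loss(conf_pred, obj_mask, iou_with_gt, eps=1e-4):
    """conf_pred:   predicted confidence per anchor frame, shape (N,)
       obj_mask:    1 where the anchor frame contains real target info, else 0
       iou_with_gt: IoU of each anchor frame with the real target region"""
    c = conf_pred.clamp(eps, 1 - eps)                 # truncate to [0.0001, 0.9999]
    pos = -obj_mask * torch.log(c)                    # formula (1)
    neg_w = (1 - iou_with_gt) ** 2                    # negative sample weight value
    neg = -(1 - obj_mask) * neg_w * torch.log(1 - c)  # formula (2)
    return (pos + neg).sum()                          # first prediction loss
```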
In one embodiment, the step of calculating the target class loss of each anchor frame through the second loss function includes: determining the target class loss of each anchor frame according to formula (3), wherein:
the formula (3) is:
$L^{cls}_{ij} = -\frac{1}{nc}\sum_{k=1}^{nc}\left[y_{k}\log(p_{ijk}) + (1-y_{k})\log(1-p_{ijk})\right]$
wherein $L^{cls}_{ij}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of classes, k is the class of the real target information, $y_{k}$ is 1 when the prediction class corresponding to the jth anchor frame of the ith grid is the class of the real target information and 0 otherwise, and $p_{ijk}$ is the predicted probability that the jth anchor frame of the ith grid belongs to class k.
The target class loss is generated only when the anchor frame contains real target information, and is 0 in all other cases. The target class loss of a single anchor frame is the average of the binary cross-entropy losses over all classes of that anchor frame, and measures the prediction accuracy of the target detection model. For example, if the signals in the signal time-frequency diagrams are divided into 9 classes, the preset number nc is also 9, and $p_{ijk}$ is the predicted probability that the jth anchor frame of the ith grid belongs to class k of the real target information.
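A minimal sketch of formula (3) under the same assumptions (flat tensors, PyTorch); the clamp mirrors the truncation described for the objectness loss:

```python
import torch

def class_loss(p_pred, y_onehot, obj_mask, eps=1e-4):
    """p_pred:   per-class probabilities, shape (N, nc)
       y_onehot: one-hot true class per anchor frame, shape (N, nc)
       obj_mask: 1 where the anchor frame contains real target info, shape (N,)"""
    p = p_pred.clamp(eps, 1 - eps)
    bce = -(y_onehot * torch.log(p) + (1 - y_onehot) * torch.log(1 - p))
    per_anchor = bce.mean(dim=1)          # average over the nc classes
    return (obj_mask * per_anchor).sum()  # second prediction loss
```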
In one embodiment, the step of calculating the target coordinate loss of each anchor frame through the third loss function includes:
determining the target coordinate loss of each anchor frame according to formula (4), wherein:
the formula (4) is:
$L^{coord}_{ij} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}}{c^{2}} + \alpha v$
wherein $L^{coord}_{ij}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union of the predicted target region corresponding to the jth anchor frame of the ith grid with the region where the real target information is located, $\rho$ is the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, c is the diagonal length of the minimum enclosing rectangle of the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, α is a first preset parameter, and v is a second preset parameter.
In one embodiment, v is calculated by formula (5):
formula (5) is:
$v = \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}$
wherein v is the second preset parameter, $w^{gt}$ is the width of the region where the real target information of the jth anchor frame of the ith grid is located, $h^{gt}$ is the height of that region, w is the width of the predicted target region, and h is the height of the predicted target region.
In one embodiment, α is calculated by formula (6):
formula (6) is:
$\alpha = \frac{v}{(1-\mathrm{IoU}_{ij}) + v}$
wherein α is the first preset parameter, and $\mathrm{IoU}_{ij}$ and v are as explained for formula (4) and formula (5) and are not described again here.
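A minimal sketch of the target coordinate loss of formulas (4) to (6) for boxes given as (cx, cy, w, h); the box encoding is an assumption:

```python
import math
import torch

def coord_loss(pred, gt):
    """pred, gt: boxes (cx, cy, w, h), shape (N, 4). Returns the summed
       target coordinate loss of formula (4), with alpha, v from (5), (6)."""
    px1, py1 = pred[:, 0] - pred[:, 2] / 2, pred[:, 1] - pred[:, 3] / 2
    px2, py2 = pred[:, 0] + pred[:, 2] / 2, pred[:, 1] + pred[:, 3] / 2
    gx1, gy1 = gt[:, 0] - gt[:, 2] / 2, gt[:, 1] - gt[:, 3] / 2
    gx2, gy2 = gt[:, 0] + gt[:, 2] / 2, gt[:, 1] + gt[:, 3] / 2
    inter = (torch.min(px2, gx2) - torch.max(px1, gx1)).clamp(0) * \
            (torch.min(py2, gy2) - torch.max(py1, gy1)).clamp(0)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    iou = inter / union.clamp(min=1e-9)
    rho2 = (pred[:, 0] - gt[:, 0]) ** 2 + (pred[:, 1] - gt[:, 1]) ** 2  # center distance^2
    cw = torch.max(px2, gx2) - torch.min(px1, gx1)  # minimum enclosing rectangle
    ch = torch.max(py2, gy2) - torch.min(py1, gy1)
    c2 = (cw ** 2 + ch ** 2).clamp(min=1e-9)        # its diagonal length, squared
    v = (4 / math.pi ** 2) * (torch.atan(gt[:, 2] / gt[:, 3]) -
                              torch.atan(pred[:, 2] / pred[:, 3])) ** 2  # formula (5)
    alpha = v / ((1 - iou) + v).clamp(min=1e-9)     # formula (6)
    return (1 - iou + rho2 / c2 + alpha * v).sum()  # formula (4), third prediction loss
```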
The product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient are summed to obtain the current training loss. The first, second and third weight coefficients are preset parameters.
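Reusing the sketches above, the current training loss of this step could be assembled as follows (the weight coefficient values and the input tensors are illustrative assumptions):

```python
# w1, w2, w3: preset first, second and third weight coefficients (example values)
w1, w2, w3 = 1.0, 1.0, 5.0
current_training_loss = (w1 * objectness_loss(conf_pred, obj_mask, iou_with_gt)
                         + w2 * class_loss(p_pred, y_onehot, obj_mask)
                         + w3 * coord_loss(pred_boxes, gt_boxes))
```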
Step S170, adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model.
The step of adjusting the parameters of the initial detection model according to the current training loss to obtain a target detection model comprises:
judging whether the current training loss is smaller than a preset training loss threshold;
if the current training loss is larger than the preset training loss threshold, continuing to train the initial detection model: the current training loss is back-propagated through a mini-batch gradient descent algorithm to adjust the initial detection model, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and the adjusted initial detection model is taken as the target detection model.
In the actual training process, training parameters and verification parameters are preset. Training parameters include, but are not limited to, the total number of training rounds, the number of samples per training step, and so on; verification parameters include the evaluation period, the evaluation index, the preset training loss threshold, and so on. During actual training, within one training period, the preset number of signal time-frequency diagrams are input into the initial detection model, with random data augmentation applied to each diagram; the training loss is calculated through the loss function preset in step S160, propagated back to the initial detection model, and used to adjust the parameters of the initial detection model. In one embodiment, a Mini-Batch Gradient Descent algorithm may be used.
When the number of completed training periods is a multiple of the evaluation period, the evaluation index is calculated. The evaluation index may be the Mean Average Precision (mAP), which measures the detection precision of the adjusted initial detection model on the signal data in the signal time-frequency diagrams.
In practical application, training may simply be stopped when the number of training periods reaches the total number of training rounds, taking the adjusted initial detection model as the target detection model and calculating its evaluation index. The method adopted in this embodiment, however, takes the adjusted initial detection model as the target detection model only when the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, which ensures to some extent that the training has been effective.
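A minimal sketch of this training procedure and stopping criterion (the model, data loader, learning rate and N are assumptions; SGD over mini-batches stands in for the mini-batch gradient descent algorithm):

```python
import torch

def train(model, loader, loss_fn, threshold, n_required, lr=1e-3, max_epochs=300):
    """Mini-batch gradient descent with a stop after n_required consecutive
       training periods whose loss is below the preset threshold."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    below = 0
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, targets in loader:
            loss = loss_fn(model(images), targets)
            opt.zero_grad()
            loss.backward()   # back-propagate the current training loss
            opt.step()        # adjust the parameters of the initial detection model
            epoch_loss += loss.item()
        below = below + 1 if epoch_loss < threshold else 0
        if below >= n_required:   # N consecutive periods below the threshold
            break
    return model  # adjusted initial detection model, taken as the target detection model
```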
Existing target detection mainly includes the following schemes: (1) designing rectangular boxes according to prior knowledge of the input signal, with sizes covering all possible durations and bandwidths; this requires strong prior knowledge of the signals and is computationally heavy. (2) The YOLO algorithm divides the image into 7 × 7 grids, sets 2 anchor frames per grid, and detects signals from the anchor frames; because the number of grids is small, the detection effect on small and dense targets is poor. (3) The YOLOV3 algorithm increases the number of anchor frames per grid to 9 and divides them into large, medium and small scales, but it still identifies dense small targets inaccurately. (4) The Poly-YOLO algorithm makes the grid division much denser, but the model trains and tests slowly, and its regression of the coordinates and boundaries of large targets is inaccurate.
Compared with the existing schemes, the training method of the target detection model provided by this embodiment improves the network structure of the target detection model, solving the inaccurate identification of dense small targets by the YOLO and YOLOV3 algorithms as well as the slow training and testing and the inaccurate large-target coordinate and boundary regression of the Poly-YOLO network, so that target data in signal time-frequency diagrams can be detected quickly and accurately. The method also improves the loss function, computing the negative sample loss with a negative sample weight coefficient based on the intersection-over-union, so that the target detection model smoothly learns to distinguish real target regions from blank regions and converges quickly in training.
In summary, the training method of the target detection model provided by this embodiment improves the network structure of the target detection model, using dilated convolution layers and deconvolution layers for upsampling to enlarge the receptive field of the feature layers, and adds a negative sample weight coefficient to the loss function, so that the target detection model distinguishes real target regions from blank regions more smoothly during training, trains on and detects signal data in signal time-frequency diagrams quickly, and detects the various types of signal data accurately.
Example 2
Referring to fig. 5, the training apparatus 500 for a target detection model includes a first obtaining module 510, a first constructing module 520, a first calculating module 530, a second calculating module 540, a second obtaining module 550, a second constructing module 560, and an adjusting module 570.
In this embodiment, the first obtaining module 510 is configured to: acquire a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
the first constructing module 520 is configured to: construct an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
the first calculating module 530 is configured to: input the signal time-frequency diagrams into the backbone network, calculate each signal time-frequency diagram through the backbone network, and output a plurality of feature layers;
the second calculating module 540 is configured to: input the plurality of feature layers into the head network, and calculate each feature layer through the head network respectively to obtain a final prediction result corresponding to each signal time-frequency diagram;
the second obtaining module 550 is configured to: acquire the real target information of the signal time-frequency diagrams;
the second constructing module 560 is configured to: construct a preset loss function, and calculate the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function;
the adjusting module 570 is configured to: adjust the parameters of the initial detection model according to the current training loss to obtain a target detection model.
In an embodiment, the second calculating module 540 is specifically configured to: pass the second feature layer through the second transform convolutional layer to obtain a first output result;
transform the number of channels of the third feature layer through the third transform convolutional layer, enlarge the receptive field through the first dilated convolutional layer, and upsample through the first deconvolution layer to obtain a second output result;
add the first output result and the second output result to obtain a third output result;
transform the number of channels of the third output result through the fourth transform convolutional layer, enlarge the receptive field through the second dilated convolutional layer, and upsample through the second deconvolution layer to obtain a fourth output result;
pass the first feature layer through the first transform convolutional layer to obtain a fifth output result;
add the fourth output result and the fifth output result to obtain a sixth output result;
and pass the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
In an embodiment, the second constructing module 560 is specifically configured to: divide each signal time-frequency diagram into a plurality of grids, and set a preset number of anchor frames for each grid;
calculate the objectness loss of each anchor frame through the first loss function, and sum the objectness losses of all anchor frames to obtain a first prediction loss;
calculate the target class loss of each anchor frame through the second loss function, and sum the target class losses of all anchor frames to obtain a second prediction loss;
calculate the target coordinate loss of each anchor frame through the third loss function, and sum the target coordinate losses of all anchor frames to obtain a third prediction loss;
and sum the product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient to obtain the current training loss.
In an embodiment, the second constructing module 560 is further specifically configured to: when the jth anchor frame of the ith grid contains real target information, calculate the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculate the objectness loss of the jth anchor frame of the ith grid through the first loss function according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union of the jth anchor frame of the ith grid with the region where the real target information is located.
In an embodiment, the second building module 560 is further specifically configured to: calculate the objectness loss of the jth anchor frame of the ith grid according to the following formula (1):

the formula (1) is:

$$L_{ij}^{obj} = -\,\mathbb{1}_{ij}^{obj}\,\log\bigl(\hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{obj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}_{ij}^{obj}$ is 1 when the jth anchor frame of the ith grid contains real target information and 0 otherwise.
In an embodiment, the objectness loss of the jth anchor frame of the ith grid is calculated according to formula (2):

the formula (2) is:

$$w_{ij} = 1 - \mathrm{IoU}_{ij}$$

$$L_{ij}^{noobj} = -\,w_{ij}\,\mathbb{1}_{ij}^{noobj}\,\log\bigl(1 - \hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{noobj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located, $w_{ij}$ is the negative sample weight coefficient derived from it, and $\mathbb{1}_{ij}^{noobj}$ is 1 when the jth anchor frame of the ith grid contains no real target information and 0 otherwise.
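Taken together, formulas (1) and (2) act as a binary cross-entropy on the predicted confidence in which negative anchors are down-weighted. The following is a minimal PyTorch sketch under the reconstructed forms above; in particular, the weighting $w = 1 - \mathrm{IoU}$ is an assumption of this sketch:

```python
import torch

def objectness_loss(pred_conf, obj_mask, iou_with_gt, eps=1e-7):
    """Objectness loss per the reconstructed formulas (1) and (2).

    pred_conf:   predicted confidence in (0, 1), shape (grids, anchors)
    obj_mask:    1.0 where the anchor frame contains a real target, else 0.0
    iou_with_gt: IoU between each anchor frame and the real target region
    """
    pred_conf = pred_conf.clamp(eps, 1 - eps)
    # formula (1): anchor frames that contain real target information
    pos = -obj_mask * torch.log(pred_conf)
    # formula (2): anchor frames with no target, scaled by the negative
    # sample weight coefficient w = 1 - IoU (assumed form of the weighting)
    neg_w = 1.0 - iou_with_gt
    neg = -(1.0 - obj_mask) * neg_w * torch.log(1.0 - pred_conf)
    # sum over all grids and anchor frames
    return (pos + neg).sum()
```

The returned sum over all grids and anchor frames is the first prediction loss.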
In an embodiment, the second building module 560 is further specifically configured to: determine the target class loss of each anchor frame according to formula (3), wherein:

the formula (3) is:

$$L_{ij}^{cls} = -\sum_{k=1}^{nc} y_{ij}^{k}\,\log\bigl(\hat{p}_{ij}^{k}\bigr)$$

wherein, $L_{ij}^{cls}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k indexes the category of the real target information, $y_{ij}^{k}$ is 1 when the prediction category corresponding to the jth anchor frame of the ith grid is the category of the real target information and 0 otherwise, and $\hat{p}_{ij}^{k}$ is the predicted probability that the jth anchor frame of the ith grid belongs to the category of the real target information.
In an embodiment, the second building module 560 is further specifically configured to: determine the target coordinate loss of each anchor frame according to formula (4), wherein:

the formula (4) is:

$$L_{ij}^{box} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}\bigl(b_{ij},\,b^{gt}\bigr)}{c^{2}} + \alpha v$$

wherein, $L_{ij}^{box}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ represents the intersection-over-union between the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\rho(b_{ij}, b^{gt})$ represents the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, $c$ represents the diagonal length of the minimum circumscribed rectangle enclosing the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\alpha$ is a first preset parameter, and $v$ is a second preset parameter.
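The form of formula (4), with its IoU term, normalized center distance, and $\alpha v$ aspect-ratio term, matches the well-known CIoU loss. The sketch below computes it for boxes in (x1, y1, x2, y2) format; note that here $\alpha$ and $v$ are computed in the standard CIoU way, whereas the embodiment describes them as preset parameters:

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-7):
    """CIoU-style coordinate loss for boxes given as (x1, y1, x2, y2)."""
    # intersection and union areas
    ix1, iy1 = torch.max(pred[..., 0], gt[..., 0]), torch.max(pred[..., 1], gt[..., 1])
    ix2, iy2 = torch.min(pred[..., 2], gt[..., 2]), torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)
    # squared center distance rho^2 and squared enclosing-box diagonal c^2
    cpx = (pred[..., 0] + pred[..., 2]) / 2
    cpy = (pred[..., 1] + pred[..., 3]) / 2
    cgx = (gt[..., 0] + gt[..., 2]) / 2
    cgy = (gt[..., 1] + gt[..., 3]) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    ex1, ey1 = torch.min(pred[..., 0], gt[..., 0]), torch.min(pred[..., 1], gt[..., 1])
    ex2, ey2 = torch.max(pred[..., 2], gt[..., 2]), torch.max(pred[..., 3], gt[..., 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    # aspect-ratio consistency term v and trade-off parameter alpha
    wp = (pred[..., 2] - pred[..., 0]).clamp(eps)
    hp = (pred[..., 3] - pred[..., 1]).clamp(eps)
    wg = (gt[..., 2] - gt[..., 0]).clamp(eps)
    hg = (gt[..., 3] - gt[..., 1]).clamp(eps)
    v = (4 / math.pi ** 2) * (torch.atan(wg / hg) - torch.atan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v
```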
In an embodiment, the second building module 560 is further specifically configured to: cluster the real target information in each signal time-frequency diagram into the preset number of cluster categories through the K-means algorithm to obtain a cluster center for each cluster category;
and acquire the coordinates of each cluster center, and adjust the size of the corresponding anchor frame according to the coordinates of each cluster center.
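A minimal sketch of this anchor-fitting step, assuming the real target information is available as an array of (width, height) pairs taken from the labeled boxes; the function name and the plain Euclidean distance are illustrative choices:

```python
import numpy as np

def fit_anchor_sizes(box_wh, num_anchors, iters=100, seed=0):
    """Cluster ground-truth box sizes with K-means; the cluster centers
    become the anchor frame sizes.

    box_wh: ndarray of shape (n, 2) holding (width, height) per labeled box
    """
    rng = np.random.default_rng(seed)
    centers = box_wh[rng.choice(len(box_wh), num_anchors, replace=False)]
    for _ in range(iters):
        # assign each box to the nearest center (Euclidean in w-h space)
        d = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            box_wh[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(num_anchors)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers  # coordinates of the cluster centers = anchor (w, h) sizes
```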
In an embodiment, the adjusting module 570 is specifically configured to: judge whether the current training loss is smaller than a preset training loss threshold;
and if the current training loss is not smaller than the preset training loss threshold, continue training the initial detection model, back-propagate the current training loss and adjust the initial detection model through a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and take the adjusted initial detection model as the target detection model.
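The adjustment rule can be sketched as a standard mini-batch training loop with the described stopping condition; `model`, `loader`, `total_loss`, `loss_threshold`, and `n_periods` are illustrative stand-ins for the initial detection model, the training set loader, the preset loss function, the preset training loss threshold, and N:

```python
import torch

def train(model, loader, total_loss, loss_threshold, n_periods, lr=1e-3):
    """Mini-batch gradient descent with the described stopping rule."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    below = 0  # consecutive training periods below the threshold
    while below < n_periods:
        period_loss = 0.0
        for images, targets in loader:           # one training period (epoch)
            loss = total_loss(model(images), targets)
            opt.zero_grad()
            loss.backward()                      # back-propagate the training loss
            opt.step()                           # adjust the model parameters
            period_loss += loss.item()
        period_loss /= len(loader)
        below = below + 1 if period_loss < loss_threshold else 0
    return model  # adjusted model taken as the target detection model
```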
The training device for the target detection model provided by this embodiment improves the network structure of the target detection model: it performs up-sampling using the expansion convolutional layers and the deconvolution layers, which enlarges the receptive field of the feature layers, and it adds a negative sample weight coefficient to the loss function, so that during training the target detection model distinguishes real target regions from blank regions more stably. As a result, the device can quickly train on and detect the signal data in the signal time-frequency diagrams and can accurately detect multiple types of signal data.
Example 3
The present embodiment provides a computer device, which comprises a memory and a processor; the memory stores a computer program which, when run by the processor, performs the training method of the object detection model according to embodiment 1.
The computer device provided in this embodiment may implement the method for training the target detection model described in embodiment 1, and details are not described here again to avoid repetition.
Example 4
The present embodiment provides a computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the training method of the object detection model according to embodiment 1.
The computer-readable storage medium provided in this embodiment may implement the method for training the target detection model described in embodiment 1, and is not described herein again to avoid repetition.
In this embodiment, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention.

Claims (12)

1. A method for training an object detection model, the method comprising:
acquiring a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
constructing an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
inputting the signal time-frequency diagrams into the backbone network, calculating each signal time-frequency diagram through the backbone network, and outputting a plurality of feature layers;
inputting the plurality of feature layers into the head network, and calculating each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
acquiring real target information of the signal time-frequency diagrams;
constructing a preset loss function, and calculating, through the preset loss function, the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information;
adjusting parameters of the initial detection model according to feedback of the current training loss to obtain a target detection model;
the head network includes: a first transformation convolutional layer, a second transformation convolutional layer, a third transformation convolutional layer, a fourth transformation convolutional layer, a first expansion convolutional layer, a second expansion convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer; the plurality of feature layers includes: a first feature layer, a second feature layer and a third feature layer;
the step of inputting the plurality of feature layers into the head network and calculating each feature layer through the head network to obtain the final prediction result corresponding to each signal time-frequency diagram includes:
obtaining a first output result by passing the second feature layer through the second transformation convolutional layer;
converting the number of channels of the third feature layer through the third transformation convolutional layer, expanding the receptive field through the first expansion convolutional layer, and performing up-sampling through the first deconvolution layer to obtain a second output result;
adding the first output result and the second output result to obtain a third output result;
converting the number of channels of the third output result through the fourth transformation convolutional layer, expanding the receptive field through the second expansion convolutional layer, and performing up-sampling through the second deconvolution layer to obtain a fourth output result;
obtaining a fifth output result by passing the first feature layer through the first transformation convolutional layer;
adding the fourth output result and the fifth output result to obtain a sixth output result;
and passing the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
2. The method for training an object detection model according to claim 1, wherein the method further comprises:
dividing each signal time-frequency diagram into a plurality of grids, and setting a preset number of anchor frames for each grid;
the preset loss function includes: a first loss function, a second loss function, a third loss function;
the step of calculating the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information through the preset loss function comprises the following steps:
calculating the objectness loss of each anchor frame through the first loss function, and summing the objectness losses of the anchor frames to obtain a first prediction loss;
calculating the target class loss of each anchor frame through the second loss function, and summing the target class losses of the anchor frames to obtain a second prediction loss;
calculating the target coordinate loss of each anchor frame through the third loss function, and summing the target coordinate losses of the anchor frames to obtain a third prediction loss;
and summing the product of the first prediction loss and a first weight coefficient, the product of the second prediction loss and a second weight coefficient, and the product of the third prediction loss and a third weight coefficient to obtain the current training loss.
3. The method for training an object detection model according to claim 2, wherein the step of calculating the objectness loss of each anchor frame through the first loss function comprises:
when the jth anchor frame of the ith grid contains real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid;
and when the jth anchor frame of the ith grid does not contain real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located.
4. The method for training an object detection model according to claim 3, wherein, when the jth anchor frame of the ith grid contains real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (1),
the formula (1) is:

$$L_{ij}^{obj} = -\,\mathbb{1}_{ij}^{obj}\,\log\bigl(\hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{obj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, and $\mathbb{1}_{ij}^{obj}$ is 1 when the jth anchor frame of the ith grid contains real target information and 0 otherwise.
5. The method for training an object detection model according to claim 3, wherein, when the jth anchor frame of the ith grid does not contain real target information, calculating, through the first loss function, the objectness loss of the jth anchor frame of the ith grid according to the predicted confidence that real target information exists in the jth anchor frame of the ith grid and the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located comprises:
calculating the objectness loss of the jth anchor frame of the ith grid according to formula (2),
the formula (2) is:

$$w_{ij} = 1 - \mathrm{IoU}_{ij}$$

$$L_{ij}^{noobj} = -\,w_{ij}\,\mathbb{1}_{ij}^{noobj}\,\log\bigl(1 - \hat{C}_{ij}\bigr)$$

wherein, $L_{ij}^{noobj}$ is the objectness loss of the jth anchor frame of the ith grid, $\hat{C}_{ij}$ is the predicted confidence that real target information exists in the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ is the intersection-over-union between the jth anchor frame of the ith grid and the region where the real target information is located, $w_{ij}$ is the negative sample weight coefficient derived from it, and $\mathbb{1}_{ij}^{noobj}$ is 1 when the jth anchor frame of the ith grid contains no real target information and 0 otherwise.
6. The method for training an object detection model according to claim 2, wherein the step of calculating the target class loss of each anchor frame through the second loss function comprises:
determining the target class loss of each anchor frame according to formula (3),
the formula (3) is:

$$L_{ij}^{cls} = -\sum_{k=1}^{nc} y_{ij}^{k}\,\log\bigl(\hat{p}_{ij}^{k}\bigr)$$

wherein, $L_{ij}^{cls}$ is the target class loss of the jth anchor frame of the ith grid, nc is the preset number of categories, k indexes the category of the real target information, $y_{ij}^{k}$ is 1 when the prediction category corresponding to the jth anchor frame of the ith grid is the category of the real target information and 0 otherwise, and $\hat{p}_{ij}^{k}$ is the predicted probability that the jth anchor frame of the ith grid belongs to the category of the real target information.
7. The method for training an object detection model according to claim 2, wherein the step of calculating the target coordinate loss of each anchor frame through the third loss function comprises:
determining the target coordinate loss of each anchor frame according to formula (4), wherein,
the formula (4) is:

$$L_{ij}^{box} = 1 - \mathrm{IoU}_{ij} + \frac{\rho^{2}\bigl(b_{ij},\,b^{gt}\bigr)}{c^{2}} + \alpha v$$

wherein, $L_{ij}^{box}$ is the target coordinate loss of the jth anchor frame of the ith grid, $\mathrm{IoU}_{ij}$ represents the intersection-over-union between the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\rho(b_{ij}, b^{gt})$ represents the straight-line distance between the center point of the predicted target region corresponding to the jth anchor frame of the ith grid and the center point of the region where the real target information is located, $c$ represents the diagonal length of the minimum circumscribed rectangle enclosing the predicted target region corresponding to the jth anchor frame of the ith grid and the region where the real target information is located, $\alpha$ is a first preset parameter, and $v$ is a second preset parameter.
8. The method for training an object detection model according to claim 2, wherein the method further comprises:
clustering the real target information in each signal time-frequency diagram into the preset number of cluster categories through the K-means algorithm to obtain a cluster center corresponding to each cluster category;
and acquiring the coordinates of each cluster center, and adjusting the size of the corresponding anchor frame according to the coordinates of each cluster center.
9. The method for training an object detection model according to claim 1, wherein the step of adjusting the parameters of the initial detection model according to feedback of the current training loss to obtain the target detection model comprises:
judging whether the current training loss is smaller than a preset training loss threshold;
and if the current training loss is not smaller than the preset training loss threshold, continuing training the initial detection model, back-propagating the current training loss and adjusting the initial detection model through a mini-batch gradient descent algorithm, until the training losses obtained in N consecutive training periods are all smaller than the preset training loss threshold, and taking the adjusted initial detection model as the target detection model.
10. An apparatus for training an object detection model, the apparatus comprising:
a first acquisition module, configured to acquire a training set, wherein the training set comprises a plurality of signal time-frequency diagrams;
a first construction module, configured to construct an initial detection model, wherein the initial detection model comprises a backbone network and a head network;
a first calculation module, configured to input the signal time-frequency diagrams into the backbone network, calculate each signal time-frequency diagram through the backbone network, and output a plurality of feature layers;
a second calculation module, configured to input the plurality of feature layers into the head network, and calculate each feature layer through the head network to obtain a final prediction result corresponding to each signal time-frequency diagram;
a second acquisition module, configured to acquire real target information of the signal time-frequency diagrams;
a second construction module, configured to construct a preset loss function, and calculate, through the preset loss function, the current training loss between the final prediction result corresponding to each signal time-frequency diagram and the real target information;
and an adjusting module, configured to adjust parameters of the initial detection model according to feedback of the current training loss to obtain a target detection model;
the head network includes: a first transformation convolutional layer, a second transformation convolutional layer, a third transformation convolutional layer, a fourth transformation convolutional layer, a first expansion convolutional layer, a second expansion convolutional layer, a first deconvolution layer, a second deconvolution layer and a subsequent convolutional layer; the plurality of feature layers includes: a first feature layer, a second feature layer and a third feature layer;
the second calculation module is further configured to: pass the second feature layer through the second transformation convolutional layer to obtain a first output result;
convert the number of channels of the third feature layer through the third transformation convolutional layer, expand the receptive field through the first expansion convolutional layer, and perform up-sampling through the first deconvolution layer to obtain a second output result;
add the first output result and the second output result to obtain a third output result;
convert the number of channels of the third output result through the fourth transformation convolutional layer, expand the receptive field through the second expansion convolutional layer, and perform up-sampling through the second deconvolution layer to obtain a fourth output result;
pass the first feature layer through the first transformation convolutional layer to obtain a fifth output result;
add the fourth output result and the fifth output result to obtain a sixth output result;
and pass the sixth output result through the subsequent convolutional layer to obtain the final prediction result corresponding to each signal time-frequency diagram.
11. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program which, when run by the processor, performs the method for training an object detection model according to any one of claims 1-9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the method for training an object detection model according to any one of claims 1-9.