CN112308150A - Target detection model training method and device, computer equipment and storage medium - Google Patents

Target detection model training method and device, computer equipment and storage medium

Info

Publication number
CN112308150A
Authority
CN
China
Prior art keywords
target
feature
loss value
detection model
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011204414.8A
Other languages
Chinese (zh)
Other versions
CN112308150B (en)
Inventor
赵娅琳
赵晓辉
陈斌
宋晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011204414.8A priority Critical patent/CN112308150B/en
Publication of CN112308150A publication Critical patent/CN112308150A/en
Application granted granted Critical
Publication of CN112308150B publication Critical patent/CN112308150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of artificial intelligence and provides a target detection model training method and apparatus, a computer device and a storage medium. The method includes: acquiring a picture to be trained; extracting N feature layers; extracting the respective target areas of the target feature layers; obtaining the target feature corresponding to each target area; inputting each target feature into a feature selection module for calculation to obtain a second output value corresponding to each target feature; calculating a first loss value of the feature selection module according to the first output value and the second output value; calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value; and training the original target detection model according to the target loss value. By introducing the feature selection module and training it jointly with the original target detection model, the method and apparatus, computer device and storage medium enhance the precision and robustness of the resulting target detection model.

Description

Target detection model training method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for training a target detection model, a computer device, and a storage medium.
Background
Target detection is a major research focus in computer vision and image processing. It is widely applied in fields such as intelligent video surveillance, robot navigation and industrial inspection, and saves a large amount of human resources. The convolutional neural network, a typical feedforward network, is invariant to the displacement, scale and deformation of image targets, and therefore performs excellently in image processing.
A convolutional neural network contains a large number of two-dimensional filters and activation functions, and can extract image features and abstract representations very well. However, the numerous abstract features extracted by the network's massive parameters are not well combined and utilized in existing network structures, so the performance of these structures is not optimal. Most methods only downsample the picture with a convolutional neural network to obtain different feature layers and then perform target classification and position regression on those layers, or crudely add or concatenate the feature layers together before subsequent operations. Such feature selection makes no differentiated use of the features extracted by the convolutional neural network, and it is difficult to improve detection precision and the stability of detection boxes without adding extra network modules and increasing time consumption.
Disclosure of Invention
The main purpose of the present application is to provide a target detection model training method and apparatus, a computer device and a storage medium, aiming to solve the technical problem that the numerous extracted abstract features cannot be reasonably utilized, which leads to low accuracy and stability in network detection.
In order to achieve the above object, the present application provides a method for training a target detection model, comprising the following steps:
acquiring a picture to be trained; the picture to be trained has a correct first output value in a preset feature selection module;
performing N downsampling operations on the picture to be trained to extract N feature layers and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
extracting the respective target areas of the target feature layers through an original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
obtaining target characteristics corresponding to each target area through ROIAlign operation of each target area;
inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
calculating a first loss value of the feature selection module according to the first output value and the second output value;
calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and a target detection model is obtained.
Further, the step of obtaining the target feature corresponding to each target area by subjecting each target area to the ROIAlign operation includes:
extracting ROI (region of interest) features of each target region in each target feature layer;
reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and splicing the ROI features subjected to dimension reduction to obtain the target features.
Further, the feature selection module comprises three convolutional layers and a fully connected layer with softmax;
the step of inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature includes:
calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax.
Further, the step of calculating a first loss value of the feature selection module according to the first output value and the second output value comprises:
and calculating the first loss value from the first output value and the second output value through a cross entropy loss function.
Further, the step of training the original target detection model according to the target loss value includes:
and training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
Further, the step of calculating a second loss value of the original target detection model includes:
and calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
Further, after the step of calculating the second loss value of the original target detection network by the focal loss function and the GIoU loss function, the method includes:
acquiring an offset value of a pixel of each target feature layer in the downsampling process;
calculating a pixel loss value through a smooth-L1 function according to the offset value;
and adding the pixel loss value and the second loss value, and taking the added value as a new second loss value.
The application also provides a target detection model training device, including:
the acquisition unit is used for acquiring a picture to be trained; the picture to be trained has a correct first output value in a preset feature selection module;
the down-sampling unit is used for performing N downsampling operations on the picture to be trained to extract N feature layers and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
the extraction unit is used for extracting the respective target areas of the target feature layers through the original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
the target feature unit is used for obtaining target features corresponding to the target areas through ROIAlign operation of the target areas;
the first calculation unit is used for inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
a second calculating unit, configured to calculate a first loss value of the feature selection module according to the first output value and the second output value;
the third calculating unit is used for calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and the training unit is used for training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and the target detection model is obtained.
The present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the target detection model training method according to any one of the above methods when executing the computer program.
The present application further provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the object detection model training method of any one of the above.
According to the target detection model training method and apparatus, computer device and storage medium, the feature layers are input into the feature selection module to calculate their respective second output values, and a loss value is then calculated from the first and second output values, so that each feature layer can learn which of its features are important and the contribution of each feature layer is emphasized. The value obtained by adding the first loss value of the feature selection module and the second loss value of the original target detection model serves as the target loss value, and iterative training is performed according to this target loss value, which facilitates learning features for correct classification and position regression and improves the network accuracy and robustness of the target detection model.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a target detection model training method according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of an embodiment of a target detection model training apparatus;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a target detection model training method, including the following steps:
step S1, acquiring a picture to be trained; wherein the picture to be trained has a correct first output value in a preset feature selection module;
Step S2, carrying out N times of downsampling on the picture to be trained, extracting N feature layers, and forming a feature pyramid; wherein N is a positive integer greater than or equal to 2;
step S3, extracting respective target areas of the target feature layers through the original target detection model; the target feature layer is a feature layer except the feature layer at the bottommost layer in the feature pyramid;
step S4, obtaining target characteristics corresponding to each target area through ROIAlign operation of each target area;
step S5, inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
step S6, calculating a first loss value of the feature selection module according to the first output value and the second output value;
step S7, calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and step S8, training the original target detection model according to the target loss value, and stopping training the original target detection model after the target loss value reaches a preset result to obtain a target detection model.
In this embodiment, as described in step S1, the picture to be trained has a correct target detection result, that is, the target to be detected in the picture has a correct category and position. The feature selection module has three 3 × 3 convolutional layers and a fully connected layer with softmax; specifically, the convolutional layers have no padding but each has a ReLU activation function, as shown in Table 1. N-1 target feature layers are obtained by downsampling the picture to be trained N times. The feature selection module is trained in advance, and each target feature layer has a correct first output value in the feature selection module, namely the correct probability distribution corresponding to that target feature layer.
TABLE 1 (structure of the feature selection module): three 3 × 3 convolutional layers, each without padding and followed by a ReLU activation, and one fully connected layer with softmax.
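The following PyTorch sketch illustrates such a module. The channel widths are illustrative assumptions, since the recoverable part of Table 1 only fixes the layer types; the 1280 × 7 × 7 input matches the ROI features described later.

```python
import torch
import torch.nn as nn

class FeatureSelectionModule(nn.Module):
    """Three 3x3 convolutions (no padding, each followed by ReLU) and a fully
    connected layer with softmax, per Table 1. Channel widths are assumed."""
    def __init__(self, in_channels=1280, num_layers=4):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=0), nn.ReLU(),
            nn.Conv2d(256, 128, kernel_size=3, padding=0), nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, padding=0), nn.ReLU(),
        )
        # A 7x7 input shrinks to 1x1 after three unpadded 3x3 convolutions.
        self.fc = nn.Linear(64, num_layers)

    def forward(self, x):                  # x: (B, in_channels, 7, 7)
        h = self.convs(x).flatten(1)       # (B, 64)
        return torch.softmax(self.fc(h), dim=1)  # probability over the layers
```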
As described in step S2, the picture to be trained is scaled through N downsampling operations to obtain N feature layers, forming a feature pyramid in which each feature layer has a different scaling degree. Downsampling involves two passes, bottom-up and top-down. In the bottom-up pass, the picture to be trained is convolved so that each feature map becomes smaller. In the top-down pass, deconvolution is carried out layer by layer starting from the topmost feature map; besides restoring the size of the top-layer feature map, the deconvolution restores the semantic information extracted at the top layer, which ignores the background classes in the image and restores the foreground objects to their corresponding positions.
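A sketch of such a pyramid under stated assumptions: a generic FPN-style construction in which `stages` and `laterals` are hypothetical module lists, the lateral convolutions project every layer to a common channel width, and nearest-neighbour upsampling stands in for the deconvolution described above.

```python
import torch.nn.functional as F

def build_pyramid(image, stages, laterals):
    """Bottom-up: each backbone stage halves the spatial size (one downsampling).
    Top-down: upsample from the topmost map layer by layer and fuse, restoring
    semantic information to its corresponding positions."""
    feats = []
    x = image
    for stage in stages:                         # bottom-up pass
        x = stage(x)
        feats.append(x)
    pyramid = [laterals[-1](feats[-1])]          # topmost feature map
    for i in range(len(feats) - 2, -1, -1):      # top-down pass
        up = F.interpolate(pyramid[0], scale_factor=2, mode="nearest")
        pyramid.insert(0, laterals[i](feats[i]) + up)
    return pyramid                               # N feature layers, fine to coarse
```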
As described in step S3, the original target detection model is obtained based on a ResNet or MobileNet network. For example, if the original target detection network is trained based on the MobileNetV2 network, the MobileNetV2 network uses an inverted residual structure, so the accuracy is higher and the model is smaller; the specific structure of the trained original target detection network is shown in Table 2, where t denotes the expansion factor, c the number of output channels, n the number of repetitions, and s the stride.
Input | Operator | t | c | n | s
320² × 3 | Conv2d | - | 32 | 1 | 2
160² × 32 | bottleneck | 1 | 16 | 1 | 1
160² × 16 | bottleneck | 6 | 24 | 2 | 2
80² × 24 | bottleneck | 6 | 32 | 3 | 2
40² × 32 | bottleneck | 6 | 64 | 4 | 2
40² × 64 | bottleneck | 6 | 96 | 3 | 1
20² × 96 | bottleneck | 6 | 160 | 3 | 2
TABLE 2
The original target detection model has a certain target detection capability and can initialize useful features for the feature selection module. N feature layers are obtained in step S2; the first feature layer has the smallest scaling degree, and the remaining N-1 feature layers are used as target feature layers. The target region of each target feature layer, that is, the region of the target to be detected in that layer, is extracted through the original target detection model to obtain the category and position information of target detection. Because the first feature layer has a low scaling degree, the features it extracts are semantically too shallow to benefit the training of the target detection model, so it is not used as a target feature layer.
As described in the foregoing steps S4-S5, ROIAlign processing is performed on the target regions to obtain 1280 × 7 × 7 features, and the N-1 target features are serially spliced and then input into the feature selection module for calculation to obtain the second output value corresponding to each target feature, that is, the predicted probability distribution corresponding to each target feature layer. Specifically, ROIAlign traverses each candidate region in the target region, keeping the floating-point boundaries unquantized. The candidate region is divided into k × k units, with the boundaries of each unit likewise unquantized. Four fixed coordinate positions are computed in each unit, the values at these four positions are calculated by bilinear interpolation, and a maximum pooling operation is then performed to obtain the target feature of each target area.
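The torchvision ROIAlign operator can illustrate this step. Note that torchvision's `roi_align` averages the sampled points rather than max-pooling them as described above, and the 320-pixel input size, the feature-map size and the box coordinates below are assumptions.

```python
import torch
from torchvision.ops import roi_align

# One target feature layer with 1280 channels; box given in image coordinates.
feature_map = torch.randn(1, 1280, 40, 40)
boxes = [torch.tensor([[48.3, 52.7, 211.9, 240.1]])]  # floating-point boundaries kept unquantized

roi_feat = roi_align(
    feature_map, boxes,
    output_size=(7, 7),      # k x k units per candidate region
    spatial_scale=40 / 320,  # maps image coordinates onto this layer (assumed 320px input)
    sampling_ratio=4,        # bilinear sample points per unit
    aligned=True,
)
print(roi_feat.shape)        # torch.Size([1, 1280, 7, 7])
```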
As described in the above steps S6-S8, the first output values and the second output values of all target feature layers are fed into a loss function together to obtain the first loss value; the second loss value of the original target detection model is calculated from the target regions extracted by the original target detection model and the correct target regions; and the first loss value and the second loss value are added to obtain the target loss value. Further, the first loss value may be given a preset weight. Training the target detection model is an iterative process: the calculated target loss value is compared with a preset result, and training stops when the target loss value reaches the preset result, that is, when the trained target detection model can accurately detect the targets in the pictures to be trained. If the target loss value has not reached the preset result, the next training iteration begins.
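A sketch of one such training iteration, assuming a hypothetical `forward_fn` that returns the second output values and the second loss value, and a `ce_fn` cross entropy function (sketched later in this text); `w1` is the optional preset weight on the first loss value.

```python
def train_step(batch, first_outputs, forward_fn, ce_fn, optimizer, w1=1.0):
    """One joint-training iteration: target loss = w1 * first loss + second loss.
    All helper names are illustrative, not the patent's API."""
    optimizer.zero_grad()
    second_outputs, second_loss = forward_fn(batch)
    first_loss = ce_fn(first_outputs, second_outputs)   # feature selection loss
    target_loss = w1 * first_loss + second_loss         # target loss value
    target_loss.backward()                              # gradient back propagation
    optimizer.step()                                    # SGD parameter update
    return float(target_loss)

# Training iterates until the target loss value reaches the preset result:
# while loss > preset_result:
#     loss = train_step(batch, first_outputs, forward_fn, ce_fn, optimizer)
```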
In this embodiment, the feature layers are input into the feature selection module to calculate their respective second output values, and the loss value is then calculated from the first and second output values, so that each feature layer can learn which of its features are important and the contribution of each feature layer is emphasized. The value obtained by adding the first loss value of the feature selection module and the second loss value of the original target detection model is used as the target loss value, and iterative training is performed according to this target loss value, which facilitates learning features for correct classification and position regression and improves the network precision and robustness of the target detection model.
In an embodiment, the step S4 of obtaining the target feature corresponding to each target area by subjecting each target area to the ROIAlign operation includes:
extracting ROI (region of interest) features of each target region in each target feature layer;
reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and splicing the ROI features subjected to dimension reduction to obtain the target features.
In this embodiment, the ROIAlign operation is performed on each target feature layer, the features of each target feature layer are mapped, and the ROI features, that is, 1280 × 7 × 7 features, are extracted. Specifically, the ground-truth of the target is used to extract, on each target feature layer, the position corresponding to the target, namely the ROI feature. The feature dimensions of the ROI features are scaled so that the feature dimensions of the ROI features of the N-1 target feature layers stay consistent, and all the dimension-reduced ROI features are then serially spliced together to obtain the target feature.
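A sketch of this reduce-then-splice step, assuming 1 × 1 convolutions as the dimension-reduction operator (the text above does not name the exact operator):

```python
import torch

def fuse_roi_features(roi_feats, reducers):
    """roi_feats: one (B, C_l, 7, 7) ROI feature per target feature layer;
    reducers: per-layer 1x1 convolutions that make the feature dimensions
    consistent (an assumed choice). The dimension-reduced features are
    serially spliced along the channel axis to form the target feature."""
    reduced = [reduce(feat) for reduce, feat in zip(reducers, roi_feats)]
    return torch.cat(reduced, dim=1)
```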
In one embodiment, the feature selection module comprises three convolutional layers and one fully connected layer with softmax;
the step S5 of inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature includes:
calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax. In this embodiment, as described in step S51, the third loss value and the fourth loss value are calculated for each ROI feature in the target features through a focal-loss function and an IoU-loss function, respectively. If N is 5, there are 4 target feature layers, and the third and fourth loss values are calculated for the ROI feature of each target feature layer through the two functions, so four third loss values and four fourth loss values are obtained for one picture to be trained. Specifically, the expression of the focal-loss function is:
L_cls = -(1/M) · Σ_{x,y} (1 - Ŷ_{xy})^α · log(Ŷ_{xy}),                  if Y_{xy} = 1
L_cls = -(1/M) · Σ_{x,y} (1 - Y_{xy})^β · (Ŷ_{xy})^α · log(1 - Ŷ_{xy}),  otherwise
where x and y are the coordinates of the ROI feature of each target feature layer in that layer, Ŷ_{xy} is the predicted value at (x, y), Y_{xy} is the corresponding ground-truth value, M is the number of targets, and α and β are preset hyperparameters.
The IoU-loss function is expressed as:
L_IoU = 1 - |A ∩ B| / |A ∪ B|
where |A ∩ B| is the area of the intersection of the predicted target box and the real target box, A is the area of the predicted target box, B is the area of the real target box, and |A ∪ B| is the area of their union. In yet another embodiment, the fourth loss value may also be calculated using a GIoU-loss function. Further, the third loss value and the fourth loss value may each be weighted.
As described in the above step S52, the corresponding third loss value and fourth loss value are added for each ROI feature, and the second output value is calculated through the softmax of the fully connected layer; when N is 5, as above, four second output values are obtained.
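Read this way, the per-layer scoring can be sketched as follows. This is a minimal interpretation; whether the summed losses pass through further layers before the softmax is not fixed by the text above.

```python
import torch

def second_output_values(third_losses, fourth_losses):
    """third_losses / fourth_losses: one focal-loss and one IoU-loss value per
    target feature layer, each of shape (N-1,). Their sums are normalized by
    softmax into the predicted probability distribution over the layers."""
    combined = third_losses + fourth_losses   # per-layer sums
    return torch.softmax(combined, dim=0)     # (N-1,) second output values

scores = second_output_values(torch.tensor([0.9, 0.4, 0.6, 0.2]),
                              torch.tensor([0.3, 0.2, 0.5, 0.1]))
```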
In one embodiment, the step of calculating a first penalty value for the feature selection module based on the first output value and the second output value comprises:
and calculating the first loss value from the first output value and the second output value through a cross entropy loss function.
In this embodiment, the loss value is calculated by a cross entropy loss function, and the formula of the cross entropy loss function is:
H(p, q) = - Σ_i p(x_i) · log q(x_i)
where p(x_i) is the first output value, q(x_i) is the second output value, and x_i is the i-th target feature layer.
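A one-line sketch of this calculation; `eps` is an added guard against log(0), and the example distributions are illustrative.

```python
import torch

def cross_entropy(p, q, eps=1e-8):
    """First loss value: H(p, q) = -sum_i p(x_i) * log q(x_i), with p the
    correct first output values and q the predicted second output values
    over the target feature layers."""
    return -(p * torch.log(q + eps)).sum()

p = torch.tensor([0.10, 0.20, 0.40, 0.30])   # correct distribution (illustrative)
q = torch.tensor([0.25, 0.25, 0.30, 0.20])   # predicted distribution (illustrative)
first_loss = cross_entropy(p, q)
```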
In an embodiment, the step of training the original target detection model according to the target loss value includes:
and training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
In this embodiment, the SGD gradient back propagation algorithm is used for training: the output error is propagated back toward the input layer through the hidden layers, and the error is distributed to all units of each layer, so that each layer's units obtain an error signal, which serves as the basis for correcting each unit's weights. Back propagation computes the gradient of each parameter of the original target detection model, so SGD can be used to update every parameter, enhancing the model's learning capability.
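A minimal sketch with torch.optim.SGD; the learning rate, momentum, and the placeholder model and loss are illustrative stand-ins for the objects built in the preceding steps.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # placeholder for the detection network + selection module
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # assumed values

target_loss = model(torch.randn(1, 4)).sum()   # placeholder target loss value
optimizer.zero_grad()
target_loss.backward()   # propagate the error layer by layer back toward the input
optimizer.step()         # each unit's weights are corrected from its error signal
```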
In one embodiment, the step of calculating the second loss value of the original target detection model includes:
and calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
In this embodiment, the second loss value is calculated by using the focal loss function for classification and the GIoU loss function for position regression.
The loss function corresponding to the second loss value is: L_det = L_cls + λ_size · L_size, where L_cls is the focal loss function, L_size is the GIoU loss function, and λ_size is a preset weight that the user can set freely;
the expression of the focal-loss function is:
L_cls = -(1/M) · Σ_{x,y} (1 - Ŷ_{xy})^α · log(Ŷ_{xy}),                  if Y_{xy} = 1
L_cls = -(1/M) · Σ_{x,y} (1 - Y_{xy})^β · (Ŷ_{xy})^α · log(1 - Ŷ_{xy}),  otherwise
where x and y are the coordinates of the ROI feature of each target feature layer in that layer, and the remaining symbols are as defined above.
The expression of the GIoU-loss function is:
L_GIoU = 1 - |A ∩ B| / |A ∪ B| + (|C| - |A ∪ B|) / |C|
where |A ∩ B| is the area of the intersection of the predicted target box and the real target box, A is the area of the predicted target box, B is the area of the real target box, |A ∪ B| is the area of their union, and |C| is the area of the minimum closure of the two, that is, of the smallest box enclosing both.
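A sketch of this loss for single boxes in (x1, y1, x2, y2) form; degenerate boxes are not handled.

```python
import torch

def giou_loss(pred, gt):
    """L_GIoU = 1 - IoU + (|C| - |A∪B|) / |C|, with C the minimum closure."""
    x1, y1 = torch.max(pred[0], gt[0]), torch.max(pred[1], gt[1])
    x2, y2 = torch.min(pred[2], gt[2]), torch.min(pred[3], gt[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)       # |A ∩ B|
    area_a = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_b = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_a + area_b - inter                                # |A ∪ B|
    cx1, cy1 = torch.min(pred[0], gt[0]), torch.min(pred[1], gt[1])
    cx2, cy2 = torch.max(pred[2], gt[2]), torch.max(pred[3], gt[3])
    closure = (cx2 - cx1) * (cy2 - cy1)                            # |C|
    return 1 - inter / union + (closure - union) / closure

loss = giou_loss(torch.tensor([10., 10., 50., 60.]),
                 torch.tensor([12., 8., 48., 62.]))
```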
In an embodiment, the step of calculating the second loss value of the original target detection network by the focal loss function and the GIoU loss function is followed by:
acquiring an offset value of a pixel of each target feature layer in the downsampling process;
calculating a pixel loss value through a smooth-L1 function according to the offset value;
and adding the pixel loss value and the second loss value, and taking the added value as a new second loss value.
In this embodiment, pixels may shift during the downsampling process, which affects the detection of the target. The pixel loss value is calculated through the smooth-L1 function; specifically, the expression of the smooth-L1 function is:
smooth_L1(x) = 0.5 · x²,   if |x| < 1
smooth_L1(x) = |x| - 0.5,  otherwise
wherein x is the offset value.
The loss function corresponding to the second loss value in this embodiment is: L_det = L_cls + λ_size · L_size + λ_off · L_off, where λ_off is a preset weight for the pixel loss value; λ_off may be equal to 0.1.
In this embodiment, the loss of the pixel offset values is calculated, and this pixel loss value is added to the loss values calculated by the focal loss function and the GIoU loss function to obtain the second loss value. Because this second loss value takes the pixel offsets during downsampling into account, the trained target detection model achieves higher accuracy.
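A sketch of these two calculations together, using the λ values discussed above; the reduction over the offsets is an assumption.

```python
import torch

def smooth_l1(x):
    """smooth-L1: 0.5 * x^2 when |x| < 1, |x| - 0.5 otherwise."""
    ax = x.abs()
    return torch.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def detection_loss(l_cls, l_size, offsets, lambda_size=1.0, lambda_off=0.1):
    """New second loss value L_det = L_cls + λ_size·L_size + λ_off·L_off, with
    L_off the pixel loss computed from the downsampling offset values.
    λ_size is a free preset weight; λ_off = 0.1 follows the example above."""
    l_off = smooth_l1(offsets).sum()
    return l_cls + lambda_size * l_size + lambda_off * l_off
```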
The target detection model training method provided by the application can be applied in the field of blockchain, with the trained target detection model stored in a blockchain network. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated by cryptographic methods, each data block containing information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may include processing modules such as user management, basic services, smart contracts and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public/private key generation (account management), key management, and the correspondence between users' real identities and blockchain addresses (authority management); when authorized, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices to verify the validity of service requests and, after consensus is reached on a valid request, record it to storage; for a new service request, the basic service first performs interface adaptation analysis and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication) after encryption, and records it for storage. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; developers can define contract logic through a programming language, publish it to the blockchain (contract registration), and have it triggered by keys or other events and executed according to the logic of the contract clauses, with functions for upgrading and cancelling contracts also provided. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract setting, cloud adaptation, and the visual output of real-time states during product operation, for example: alarms, monitoring network conditions, and monitoring node device health.
Referring to fig. 2, an embodiment of the present application further provides a target detection model training apparatus, including:
an obtaining unit 10, configured to obtain a picture to be trained; wherein the picture to be trained has a correct first output value in a preset feature selection module;
The down-sampling unit 20 is configured to perform N times of down-sampling on the picture to be trained, extract N feature layers, and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
an extracting unit 30, configured to extract the respective target areas of the target feature layers through the original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
the target feature unit 40 is configured to perform the ROIAlign operation on each target area to obtain the target feature corresponding to each target area;
the first calculating unit 50 is configured to input each target feature into the feature selection module for calculation, so as to obtain a second output value corresponding to each target feature;
a second calculating unit 60, configured to calculate a first loss value of the feature selection module according to the first output value and the second output value;
a third calculating unit 70, configured to calculate a second loss value of the original target detection model, and add the first loss value and the second loss value to obtain a target loss value;
and the training unit 80 is configured to train the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and a target detection model is obtained.
In one embodiment, the target feature unit 40 includes:
the extraction subunit is configured to extract an ROI feature of each target region in each target feature map layer;
the dimension reduction subunit is used for reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and the splicing subunit is used for splicing the ROI features subjected to dimension reduction to obtain the target features.
In one embodiment, the first calculating unit 50 includes:
the first calculating subunit is used for calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and the adding subunit is used for adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax.
In one embodiment, the second computing unit 60 includes:
and the second calculating subunit is used for calculating the first loss value through a cross entropy loss function according to the first output value and the second output value.
In one embodiment, the training unit 80 includes:
and the training subunit is used for training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
In an embodiment, the third computing unit 70 includes:
and the third calculating subunit is used for calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
In one embodiment, the third computing subunit comprises:
an obtaining module, configured to obtain an offset value of a pixel in each target feature layer in the downsampling process;
a calculation module for passing smooth-L according to the offset value1Calculating a pixel loss value by a function;
the adding module adds the pixel loss value and the second loss value, and takes the added value as a new second loss value.
In this embodiment, please refer to the above method embodiment for specific implementation of the above units, sub-units, and modules, which are not described herein again.
Referring to fig. 3, the embodiments of the present application also provide a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is used to provide computation and control capabilities. The memory of the computer device comprises a storage medium and an internal memory. The storage medium stores an operating system, a computer program and a database. The internal memory provides an environment in which the operating system and the computer program in the storage medium run. The database of the computer device is used for storing pictures to be trained and the like. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements the target detection model training method.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements a method for training a target detection model.
In summary, for the target detection model training method and apparatus, computer device and storage medium provided in the embodiments of the present application: a picture to be trained is acquired, the picture having a correct first output value in a preset feature selection module; the picture is downsampled N times to extract N feature layers and form a feature pyramid, where N is a positive integer greater than or equal to 2; the respective target areas of the target feature layers are extracted through the original target detection model, the target feature layers being the feature layers in the feature pyramid other than the bottommost feature layer; the target feature corresponding to each target area is obtained through the ROIAlign operation; each target feature is input into the feature selection module for calculation to obtain a second output value corresponding to each target feature; a first loss value of the feature selection module is calculated from the first output value and the second output value; a second loss value of the original target detection model is calculated, and the first and second loss values are added to obtain the target loss value; and the original target detection model is trained according to the target loss value, training stopping once the target loss value reaches a preset result, yielding the target detection model. By inputting the feature layers into the feature selection module to calculate the second output values and computing a loss value from the first and second output values, each feature layer can learn which of its features are important and the contribution of each feature layer is emphasized; taking the sum of the first loss value of the feature selection module and the second loss value of the original target detection model as the target loss value and training iteratively against it facilitates learning features for correct classification and position regression and improves the network accuracy and robustness of the target detection model.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored on a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only for the preferred embodiment of the present application and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (10)

1. A target detection model training method is characterized by comprising the following steps:
acquiring a picture to be trained; the picture to be trained has a correct first output value in a preset feature selection module;
the picture to be trained is subjected to N times of downsampling, N characteristic layers are extracted, and a characteristic pyramid is formed; wherein N is a positive integer greater than or equal to 2;
extracting the respective target areas of the target feature layers through an original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
obtaining target characteristics corresponding to each target area through ROIAlign operation of each target area;
inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
calculating a first loss value of the feature selection module according to the first output value and the second output value;
calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and a target detection model is obtained.
2. The method for training the target detection model according to claim 1, wherein the step of obtaining the target feature corresponding to each target region by subjecting each target region to the ROIAlign operation includes:
extracting ROI (region of interest) features of each target region in each target feature layer;
reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and splicing the ROI features subjected to dimension reduction to obtain the target features.
3. The method for training the object detection model according to claim 1, wherein the feature selection module comprises three convolutional layers and a fully connected layer with softmax;
the step of inputting each target feature into a feature selection module for calculation to obtain a second output value corresponding to each target feature includes:
calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax.
4. The method of claim 1, wherein the step of calculating a first loss value of the feature selection module based on the first output value and the second output value comprises:
and calculating the first loss value from the first output value and the second output value through a cross entropy loss function.
5. The method of claim 4, wherein the step of training the original target detection model according to the target loss value comprises:
and training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
6. The method of claim 1, wherein the step of calculating the second loss value of the original object detection model comprises:
and calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
7. The method for training the object detection model according to claim 1, wherein the step of calculating the second loss value of the original object detection network by the focal loss function and the GIoU loss function is followed by:
acquiring an offset value of a pixel of each target feature layer in the downsampling process;
calculating a pixel loss value through a smooth-L1 function according to the offset value;
and adding the pixel loss value and the second loss value, and taking the added value as a new second loss value.
8. An object detection model training apparatus, comprising:
the acquisition unit is used for acquiring a picture to be trained; wherein the picture to be trained has a correct first output value in a preset feature selection module;
the down-sampling unit is used for performing N downsampling operations on the picture to be trained to extract N feature layers and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
the extraction unit is used for extracting the respective target areas of the target feature layers through the original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
the target feature unit is used for obtaining target features corresponding to the target areas through ROIAlign operation of the target areas;
the first calculation unit is used for inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
the second calculation unit is used for calculating a first loss value of the feature selection module according to the first output value and the second output value;
the third calculating unit is used for calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and the training unit is used for training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and the target detection model is obtained.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the object detection model training method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the object detection model training method of any one of claims 1 to 7.
CN202011204414.8A 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium Active CN112308150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011204414.8A CN112308150B (en) 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011204414.8A CN112308150B (en) 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112308150A true CN112308150A (en) 2021-02-02
CN112308150B CN112308150B (en) 2022-04-15

Family

ID=74333593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011204414.8A Active CN112308150B (en) 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112308150B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416394A (en) * 2018-03-22 2018-08-17 河南工业大学 Multi-target detection model building method based on convolutional neural networks
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN111160379A (en) * 2018-11-07 2020-05-15 北京嘀嘀无限科技发展有限公司 Training method and device of image detection model and target detection method and device
CN109961107A (en) * 2019-04-18 2019-07-02 北京迈格威科技有限公司 Training method, device, electronic equipment and the storage medium of target detection model
CN110517253A (en) * 2019-08-29 2019-11-29 电子科技大学 The method of the good pernicious classification of Lung neoplasm based on 3D multiple target feature learning
CN111259930A (en) * 2020-01-09 2020-06-09 南京信息工程大学 General target detection method of self-adaptive attention guidance mechanism
CN111667011A (en) * 2020-06-08 2020-09-15 平安科技(深圳)有限公司 Damage detection model training method, damage detection model training device, damage detection method, damage detection device, damage detection equipment and damage detection medium
CN111860265A (en) * 2020-07-10 2020-10-30 武汉理工大学 Multi-detection-frame loss balancing road scene understanding algorithm based on sample loss
CN111709497A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZHEN QIN, ET AL.: "Making Deep Neural Networks Robust to Label", 《IEEE ACCESS》 *
张筱晗等: "基于中心点的遥感图像多方向舰船目标检测", 《光子学报》 *
徐义鎏等: "基于改进faster RCNN 的木材运输车辆检测", 《计算机应用》 *
黄怡蒙等: "融合深度学习的机器人目标检测与定位", 《计算机工程与应用》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724011A (en) * 2022-05-25 2022-07-08 北京闪马智建科技有限公司 Behavior determination method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN112308150B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
WO2021135499A1 (en) Damage detection model training and vehicle damage detection methods, device, apparatus, and medium
CN111444881B (en) Fake face video detection method and device
CN110020620B (en) Face recognition method, device and equipment under large posture
CN112115783B (en) Depth knowledge migration-based face feature point detection method, device and equipment
CN114913565B (en) Face image detection method, model training method, device and storage medium
CN108805828B (en) Image processing method, device, computer equipment and storage medium
CN111860674A (en) Sample class identification method and device, computer equipment and storage medium
CN112084917B (en) Living body detection method and device
CN110110601A (en) Video pedestrian weight recognizer and device based on multi-space attention model
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN115496928B (en) Multi-modal image feature matching method based on multi-feature matching
CN110110668B (en) Gait recognition method based on feedback weight convolutional neural network and capsule neural network
CN113837942A (en) Super-resolution image generation method, device, equipment and storage medium based on SRGAN
CN112949468A (en) Face recognition method and device, computer equipment and storage medium
CN112464945A (en) Text recognition method, device and equipment based on deep learning algorithm and storage medium
CN112149590A (en) Hand key point detection method
CN112348116A (en) Target detection method and device using spatial context and computer equipment
CN111242840A (en) Handwritten character generation method, apparatus, computer device and storage medium
CN112308150B (en) Target detection model training method and device, computer equipment and storage medium
CN112241646A (en) Lane line recognition method and device, computer equipment and storage medium
CN116883466A (en) Optical and SAR image registration method, device and equipment based on position sensing
CN113591528A (en) Document correction method, device, computer equipment and storage medium
CN113537020B (en) Complex SAR image target identification method based on improved neural network
CN111046755A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN116091596A (en) Multi-person 2D human body posture estimation method and device from bottom to top

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant