CN112308150A - Target detection model training method and device, computer equipment and storage medium - Google Patents

Target detection model training method and device, computer equipment and storage medium

Info

Publication number
CN112308150A
Authority
CN
China
Prior art keywords
target
feature
loss value
detection model
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011204414.8A
Other languages
Chinese (zh)
Other versions
CN112308150B (en)
Inventor
赵娅琳
赵晓辉
陈斌
宋晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011204414.8A priority Critical patent/CN112308150B/en
Publication of CN112308150A publication Critical patent/CN112308150A/en
Application granted granted Critical
Publication of CN112308150B publication Critical patent/CN112308150B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of artificial intelligence and provides a target detection model training method and apparatus, a computer device and a storage medium. The method includes: acquiring a picture to be trained; extracting N feature layers; extracting the respective target areas of the target feature layers; obtaining the target feature corresponding to each target area; inputting each target feature into a feature selection module for calculation to obtain a second output value corresponding to each target feature; calculating a first loss value of the feature selection module according to the first output value and the second output value; calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value; and training the original target detection model according to the target loss value. By introducing the feature selection module and training it jointly with the original target detection model, the method and apparatus, computer device and storage medium enhance the precision and robustness of the resulting target detection model.

Description

Target detection model training method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for training a target detection model, a computer device, and a storage medium.
Background
Target detection is a major research focus in computer vision and image processing. It is widely applied in fields such as intelligent video surveillance, robot navigation and industrial inspection, and saves a large amount of human resources. The convolutional neural network, a typical feedforward network, is invariant to the displacement, scale and deformation of image targets, and therefore performs excellently in image processing.
A convolutional neural network contains a large number of two-dimensional filters and activation functions, and can extract image features and abstract representations very well. However, the numerous abstract features extracted by the network's massive parameters are not well combined and utilized in existing network structures, so the performance of these structures is not optimal. Most methods only downsample the picture with a convolutional neural network to obtain different feature layers and then perform target classification and position regression on those layers, or crudely add or concatenate the feature layers together before subsequent operations. Such feature selection makes no differentiated use of the features extracted by the convolutional neural network, and it is difficult to improve detection precision and the stability of detection boxes without adding extra network modules and increasing time consumption.
Disclosure of Invention
The main purpose of the present application is to provide a target detection model training method and apparatus, a computer device and a storage medium, aiming to solve the technical problem that the numerous extracted abstract features cannot be reasonably utilized, which leads to low accuracy and stability in network detection.
In order to achieve the above object, the present application provides a method for training a target detection model, comprising the following steps:
acquiring a picture to be trained; the picture to be trained has a correct first output value in a preset feature selection module;
performing N downsampling operations on the picture to be trained to extract N feature layers and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
extracting the respective target areas of the target feature layers through an original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
obtaining target characteristics corresponding to each target area through ROIAlign operation of each target area;
inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
calculating a first loss value of the feature selection module according to the first output value and the second output value;
calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and a target detection model is obtained.
Further, the step of obtaining the target feature corresponding to each target area by subjecting each target area to the ROIAlign operation includes:
extracting ROI (region of interest) features of each target region in each target feature layer;
reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and splicing the ROI features subjected to dimension reduction to obtain the target features.
Further, the feature selection module comprises three convolutional layers and a fully connected layer with softmax;
the step of inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature includes:
calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax.
Further, the step of calculating a first loss value of the feature selection module according to the first output value and the second output value comprises:
and calculating the first loss value from the first output value and the second output value through a cross entropy loss function.
Further, the step of training the original target detection model according to the target loss value includes:
and training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
Further, the step of calculating a second loss value of the original target detection model includes:
and calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
Further, after the step of calculating the second loss value of the original target detection network by the focal loss function and the GIoU loss function, the method includes:
acquiring an offset value of a pixel of each target feature layer in the downsampling process;
calculating a pixel loss value through a smooth-L1 function according to the offset value;
and adding the pixel loss value and the second loss value, and taking the added value as a new second loss value.
The application also provides a target detection model training device, including:
the acquisition unit is used for acquiring a picture to be trained; the picture to be trained has a correct first output value in a preset feature selection module;
the down-sampling unit is used for performing N downsampling operations on the picture to be trained to extract N feature layers and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
the extraction unit is used for extracting the respective target areas of the target feature layers through the original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
the target feature unit is used for obtaining target features corresponding to the target areas through ROIAlign operation of the target areas;
the first calculation unit is used for inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
a second calculating unit, configured to calculate a first loss value of the feature selection module according to the first output value and the second output value;
the third calculating unit is used for calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and the training unit is used for training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and the target detection model is obtained.
The present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the target detection model training method according to any one of the above methods when executing the computer program.
The present application further provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the object detection model training method of any one of the above.
According to the target detection model training method and apparatus, computer device and storage medium, the feature layers are input into the feature selection module to calculate their respective second output values, and a loss value is then calculated from the first and second output values, so that each feature layer can learn which of its features are important and the contribution of each feature layer is emphasized. The value obtained by adding the first loss value of the feature selection module and the second loss value of the original target detection model serves as the target loss value, and iterative training is performed according to this target loss value, which facilitates learning features for correct classification and position regression and improves the network accuracy and robustness of the target detection model.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a target detection model training method according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of an embodiment of a target detection model training apparatus;
fig. 3 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a target detection model training method, including the following steps:
step S1, acquiring a picture to be trained; wherein the picture to be trained has a correct first output value in a preset feature selection module;
Step S2, carrying out N times of downsampling on the picture to be trained, extracting N feature layers, and forming a feature pyramid; wherein N is a positive integer greater than or equal to 2;
step S3, extracting respective target areas of the target feature layers through the original target detection model; the target feature layer is a feature layer except the feature layer at the bottommost layer in the feature pyramid;
step S4, obtaining target characteristics corresponding to each target area through ROIAlign operation of each target area;
step S5, inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
step S6, calculating a first loss value of the feature selection module according to the first output value and the second output value;
step S7, calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and step S8, training the original target detection model according to the target loss value, and stopping training the original target detection model after the target loss value reaches a preset result to obtain a target detection model.
In this embodiment, as described in step S1, the picture to be trained has a correct target detection result, that is, the target to be detected in the picture has a correct category and position. The feature selection module has three 3 × 3 convolutional layers and a fully connected layer with softmax; specifically, the convolutional layers have no padding but each has a ReLU activation function, as shown in Table 1. N-1 target feature layers are obtained by downsampling the picture to be trained N times. The feature selection module is trained in advance, and each target feature layer has a correct first output value in the feature selection module, namely the correct probability distribution corresponding to that target feature layer.
TABLE 1 (structure of the feature selection module): three 3 × 3 convolutional layers, each without padding and followed by a ReLU activation, and one fully connected layer with softmax.
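The following PyTorch sketch illustrates such a module. The channel widths are illustrative assumptions, since the recoverable part of Table 1 only fixes the layer types; the 1280 × 7 × 7 input matches the ROI features described later.

```python
import torch
import torch.nn as nn

class FeatureSelectionModule(nn.Module):
    """Three 3x3 convolutions (no padding, each followed by ReLU) and a fully
    connected layer with softmax, per Table 1. Channel widths are assumed."""
    def __init__(self, in_channels=1280, num_layers=4):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=0), nn.ReLU(),
            nn.Conv2d(256, 128, kernel_size=3, padding=0), nn.ReLU(),
            nn.Conv2d(128, 64, kernel_size=3, padding=0), nn.ReLU(),
        )
        # A 7x7 input shrinks to 1x1 after three unpadded 3x3 convolutions.
        self.fc = nn.Linear(64, num_layers)

    def forward(self, x):                  # x: (B, in_channels, 7, 7)
        h = self.convs(x).flatten(1)       # (B, 64)
        return torch.softmax(self.fc(h), dim=1)  # probability over the layers
```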
As described in step S2, the picture to be trained is scaled through N downsampling operations to obtain N feature layers, forming a feature pyramid in which each feature layer has a different scaling degree. Downsampling involves two passes, bottom-up and top-down. In the bottom-up pass, the picture to be trained is convolved so that each feature map becomes smaller. In the top-down pass, deconvolution is carried out layer by layer starting from the topmost feature map; besides restoring the size of the top-layer feature map, the deconvolution restores the semantic information extracted at the top layer, which ignores the background classes in the image and restores the foreground objects to their corresponding positions.
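A sketch of such a pyramid under stated assumptions: a generic FPN-style construction in which `stages` and `laterals` are hypothetical module lists, the lateral convolutions project every layer to a common channel width, and nearest-neighbour upsampling stands in for the deconvolution described above.

```python
import torch.nn.functional as F

def build_pyramid(image, stages, laterals):
    """Bottom-up: each backbone stage halves the spatial size (one downsampling).
    Top-down: upsample from the topmost map layer by layer and fuse, restoring
    semantic information to its corresponding positions."""
    feats = []
    x = image
    for stage in stages:                         # bottom-up pass
        x = stage(x)
        feats.append(x)
    pyramid = [laterals[-1](feats[-1])]          # topmost feature map
    for i in range(len(feats) - 2, -1, -1):      # top-down pass
        up = F.interpolate(pyramid[0], scale_factor=2, mode="nearest")
        pyramid.insert(0, laterals[i](feats[i]) + up)
    return pyramid                               # N feature layers, fine to coarse
```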
As described in step S3, the original target detection model is obtained based on a ResNet or MobileNet network. For example, if the original target detection network is trained based on the MobileNetV2 network, the MobileNetV2 network uses an inverted residual structure, so the accuracy is higher and the model is smaller; the specific structure of the trained original target detection network is shown in Table 2, where t denotes the expansion factor, c the number of output channels, n the number of repetitions, and s the stride.
Input | Operator | t | c | n | s
320² × 3 | Conv2d | - | 32 | 1 | 2
160² × 32 | bottleneck | 1 | 16 | 1 | 1
160² × 16 | bottleneck | 6 | 24 | 2 | 2
80² × 24 | bottleneck | 6 | 32 | 3 | 2
40² × 32 | bottleneck | 6 | 64 | 4 | 2
40² × 64 | bottleneck | 6 | 96 | 3 | 1
20² × 96 | bottleneck | 6 | 160 | 3 | 2
TABLE 2
The original target detection model has a certain target detection capability and can initialize useful features for the feature selection module. N feature layers are obtained in step S2; the first feature layer has the smallest scaling degree, and the remaining N-1 feature layers are used as target feature layers. The target region of each target feature layer, that is, the region of the target to be detected in that layer, is extracted through the original target detection model to obtain the category and position information of target detection. Because the first feature layer has a low scaling degree, the features it extracts are semantically too shallow to benefit the training of the target detection model, so it is not used as a target feature layer.
As described in the foregoing steps S4-S5, ROIAlign processing is performed on the target regions to obtain 1280 × 7 × 7 features, and the N-1 target features are serially spliced and then input into the feature selection module for calculation to obtain the second output value corresponding to each target feature, that is, the predicted probability distribution corresponding to each target feature layer. Specifically, ROIAlign traverses each candidate region in the target region, keeping the floating-point boundaries unquantized. The candidate region is divided into k × k units, with the boundaries of each unit likewise unquantized. Four fixed coordinate positions are computed in each unit, the values at these four positions are calculated by bilinear interpolation, and a maximum pooling operation is then performed to obtain the target feature of each target area.
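The torchvision ROIAlign operator can illustrate this step. Note that torchvision's `roi_align` averages the sampled points rather than max-pooling them as described above, and the 320-pixel input size, the feature-map size and the box coordinates below are assumptions.

```python
import torch
from torchvision.ops import roi_align

# One target feature layer with 1280 channels; box given in image coordinates.
feature_map = torch.randn(1, 1280, 40, 40)
boxes = [torch.tensor([[48.3, 52.7, 211.9, 240.1]])]  # floating-point boundaries kept unquantized

roi_feat = roi_align(
    feature_map, boxes,
    output_size=(7, 7),      # k x k units per candidate region
    spatial_scale=40 / 320,  # maps image coordinates onto this layer (assumed 320px input)
    sampling_ratio=4,        # bilinear sample points per unit
    aligned=True,
)
print(roi_feat.shape)        # torch.Size([1, 1280, 7, 7])
```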
As described in the above steps S6-S8, the first output values and the second output values of all target feature layers are fed into a loss function together to obtain the first loss value; the second loss value of the original target detection model is calculated from the target regions extracted by the original target detection model and the correct target regions; and the first loss value and the second loss value are added to obtain the target loss value. Further, the first loss value may be given a preset weight. Training the target detection model is an iterative process: the calculated target loss value is compared with a preset result, and training stops when the target loss value reaches the preset result, that is, when the trained target detection model can accurately detect the targets in the pictures to be trained. If the target loss value has not reached the preset result, the next training iteration begins.
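A sketch of one such training iteration, assuming a hypothetical `forward_fn` that returns the second output values and the second loss value, and a `ce_fn` cross entropy function (sketched later in this text); `w1` is the optional preset weight on the first loss value.

```python
def train_step(batch, first_outputs, forward_fn, ce_fn, optimizer, w1=1.0):
    """One joint-training iteration: target loss = w1 * first loss + second loss.
    All helper names are illustrative, not the patent's API."""
    optimizer.zero_grad()
    second_outputs, second_loss = forward_fn(batch)
    first_loss = ce_fn(first_outputs, second_outputs)   # feature selection loss
    target_loss = w1 * first_loss + second_loss         # target loss value
    target_loss.backward()                              # gradient back propagation
    optimizer.step()                                    # SGD parameter update
    return float(target_loss)

# Training iterates until the target loss value reaches the preset result:
# while loss > preset_result:
#     loss = train_step(batch, first_outputs, forward_fn, ce_fn, optimizer)
```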
In this embodiment, the feature layers are input into the feature selection module to calculate their respective second output values, and the loss value is then calculated from the first and second output values, so that each feature layer can learn which of its features are important and the contribution of each feature layer is emphasized. The value obtained by adding the first loss value of the feature selection module and the second loss value of the original target detection model is used as the target loss value, and iterative training is performed according to this target loss value, which facilitates learning features for correct classification and position regression and improves the network precision and robustness of the target detection model.
In an embodiment, the step S4 of obtaining the target feature corresponding to each target area by subjecting each target area to the ROIAlign operation includes:
extracting ROI (region of interest) features of each target region in each target feature layer;
reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and splicing the ROI features subjected to dimension reduction to obtain the target features.
In this embodiment, the ROIAlign operation is performed on each target feature layer, the features of each target feature layer are mapped, and the ROI features, that is, 1280 × 7 × 7 features, are extracted. Specifically, the ground-truth of the target is used to extract, on each target feature layer, the position corresponding to the target, namely the ROI feature. The feature dimensions of the ROI features are scaled so that the feature dimensions of the ROI features of the N-1 target feature layers stay consistent, and all the dimension-reduced ROI features are then serially spliced together to obtain the target feature.
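A sketch of this reduce-then-splice step, assuming 1 × 1 convolutions as the dimension-reduction operator (the text above does not name the exact operator):

```python
import torch

def fuse_roi_features(roi_feats, reducers):
    """roi_feats: one (B, C_l, 7, 7) ROI feature per target feature layer;
    reducers: per-layer 1x1 convolutions that make the feature dimensions
    consistent (an assumed choice). The dimension-reduced features are
    serially spliced along the channel axis to form the target feature."""
    reduced = [reduce(feat) for reduce, feat in zip(reducers, roi_feats)]
    return torch.cat(reduced, dim=1)
```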
In one embodiment, the feature selection module comprises three convolutional layers and one fully connected layer with softmax;
the step S5 of inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature includes:
calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax. In this embodiment, as described in step S51, the third loss value and the fourth loss value are calculated for each ROI feature in the target features through a focal-loss function and an IoU-loss function, respectively. If N is 5, there are 4 target feature layers, and the third and fourth loss values are calculated for the ROI feature of each target feature layer through the two functions, so four third loss values and four fourth loss values are obtained for one picture to be trained. Specifically, the expression of the focal-loss function is:
L_cls = -(1/M) · Σ_{x,y} (1 - Ŷ_{xy})^α · log(Ŷ_{xy}),                  if Y_{xy} = 1
L_cls = -(1/M) · Σ_{x,y} (1 - Y_{xy})^β · (Ŷ_{xy})^α · log(1 - Ŷ_{xy}),  otherwise
where x and y are the coordinates of the ROI feature of each target feature layer in that layer, Ŷ_{xy} is the predicted value at (x, y), Y_{xy} is the corresponding ground-truth value, M is the number of targets, and α and β are preset hyperparameters.
The IoU-loss function is expressed as:
L_IoU = 1 - |A ∩ B| / |A ∪ B|
where |A ∩ B| is the area of the intersection of the predicted target box and the real target box, A is the area of the predicted target box, B is the area of the real target box, and |A ∪ B| is the area of their union. In yet another embodiment, the fourth loss value may also be calculated using a GIoU-loss function. Further, the third loss value and the fourth loss value may each be weighted.
As described in the above step S52, the corresponding third loss value and fourth loss value are added for each ROI feature, and the second output value is calculated through the softmax of the fully connected layer; when N is 5, as above, four second output values are obtained.
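Read this way, the per-layer scoring can be sketched as follows. This is a minimal interpretation; whether the summed losses pass through further layers before the softmax is not fixed by the text above.

```python
import torch

def second_output_values(third_losses, fourth_losses):
    """third_losses / fourth_losses: one focal-loss and one IoU-loss value per
    target feature layer, each of shape (N-1,). Their sums are normalized by
    softmax into the predicted probability distribution over the layers."""
    combined = third_losses + fourth_losses   # per-layer sums
    return torch.softmax(combined, dim=0)     # (N-1,) second output values

scores = second_output_values(torch.tensor([0.9, 0.4, 0.6, 0.2]),
                              torch.tensor([0.3, 0.2, 0.5, 0.1]))
```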
In one embodiment, the step of calculating a first penalty value for the feature selection module based on the first output value and the second output value comprises:
and calculating the first loss value from the first output value and the second output value through a cross entropy loss function.
In this embodiment, the loss value is calculated by a cross entropy loss function, and the formula of the cross entropy loss function is:
H(p, q) = - Σ_i p(x_i) · log q(x_i)
where p(x_i) is the first output value, q(x_i) is the second output value, and x_i is the i-th target feature layer.
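A one-line sketch of this calculation; `eps` is an added guard against log(0), and the example distributions are illustrative.

```python
import torch

def cross_entropy(p, q, eps=1e-8):
    """First loss value: H(p, q) = -sum_i p(x_i) * log q(x_i), with p the
    correct first output values and q the predicted second output values
    over the target feature layers."""
    return -(p * torch.log(q + eps)).sum()

p = torch.tensor([0.10, 0.20, 0.40, 0.30])   # correct distribution (illustrative)
q = torch.tensor([0.25, 0.25, 0.30, 0.20])   # predicted distribution (illustrative)
first_loss = cross_entropy(p, q)
```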
In an embodiment, the step of training the original target detection model according to the target loss value includes:
and training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
In this embodiment, the SGD gradient back propagation algorithm is used for training: the output error is propagated back toward the input layer through the hidden layers, and the error is distributed to all units of each layer, so that each layer's units obtain an error signal, which serves as the basis for correcting each unit's weights. Back propagation computes the gradient of each parameter of the original target detection model, so SGD can be used to update every parameter, enhancing the model's learning capability.
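A minimal sketch with torch.optim.SGD; the learning rate, momentum, and the placeholder model and loss are illustrative stand-ins for the objects built in the preceding steps.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # placeholder for the detection network + selection module
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # assumed values

target_loss = model(torch.randn(1, 4)).sum()   # placeholder target loss value
optimizer.zero_grad()
target_loss.backward()   # propagate the error layer by layer back toward the input
optimizer.step()         # each unit's weights are corrected from its error signal
```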
In one embodiment, the step of calculating the second loss value of the original target detection model includes:
and calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
In this embodiment, the second loss value is calculated by using the focal loss function for classification and the GIoU loss function for position regression.
The loss function corresponding to the second loss value is: L_det = L_cls + λ_size · L_size, where L_cls is the focal loss function, L_size is the GIoU loss function, and λ_size is a preset weight that the user can set freely;
the expression of the focal-loss function is:
L_cls = -(1/M) · Σ_{x,y} (1 - Ŷ_{xy})^α · log(Ŷ_{xy}),                  if Y_{xy} = 1
L_cls = -(1/M) · Σ_{x,y} (1 - Y_{xy})^β · (Ŷ_{xy})^α · log(1 - Ŷ_{xy}),  otherwise
where x and y are the coordinates of the ROI feature of each target feature layer in that layer, and the remaining symbols are as defined above.
The expression of the GIoU-loss function is:
L_GIoU = 1 - |A ∩ B| / |A ∪ B| + (|C| - |A ∪ B|) / |C|
where |A ∩ B| is the area of the intersection of the predicted target box and the real target box, A is the area of the predicted target box, B is the area of the real target box, |A ∪ B| is the area of their union, and |C| is the area of the minimum closure of the two, that is, of the smallest box enclosing both.
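A sketch of this loss for single boxes in (x1, y1, x2, y2) form; degenerate boxes are not handled.

```python
import torch

def giou_loss(pred, gt):
    """L_GIoU = 1 - IoU + (|C| - |A∪B|) / |C|, with C the minimum closure."""
    x1, y1 = torch.max(pred[0], gt[0]), torch.max(pred[1], gt[1])
    x2, y2 = torch.min(pred[2], gt[2]), torch.min(pred[3], gt[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)       # |A ∩ B|
    area_a = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_b = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_a + area_b - inter                                # |A ∪ B|
    cx1, cy1 = torch.min(pred[0], gt[0]), torch.min(pred[1], gt[1])
    cx2, cy2 = torch.max(pred[2], gt[2]), torch.max(pred[3], gt[3])
    closure = (cx2 - cx1) * (cy2 - cy1)                            # |C|
    return 1 - inter / union + (closure - union) / closure

loss = giou_loss(torch.tensor([10., 10., 50., 60.]),
                 torch.tensor([12., 8., 48., 62.]))
```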
In an embodiment, the step of calculating the second loss value of the original target detection network by the focal loss function and the GIoU loss function is followed by:
acquiring an offset value of a pixel of each target feature layer in the downsampling process;
calculating a pixel loss value through a smooth-L1 function according to the offset value;
and adding the pixel loss value and the second loss value, and taking the added value as a new second loss value.
In this embodiment, pixels may shift during the downsampling process, which affects the detection of the target. The pixel loss value is calculated through the smooth-L1 function; specifically, the expression of the smooth-L1 function is:
smooth_L1(x) = 0.5 · x²,   if |x| < 1
smooth_L1(x) = |x| - 0.5,  otherwise
wherein x is the offset value.
The loss function corresponding to the second loss value in this embodiment is: L_det = L_cls + λ_size · L_size + λ_off · L_off, where λ_off is a preset weight for the pixel loss value; λ_off may be equal to 0.1.
In this embodiment, the loss of the pixel offset values is calculated, and this pixel loss value is added to the loss values calculated by the focal loss function and the GIoU loss function to obtain the second loss value. Because this second loss value takes the pixel offsets during downsampling into account, the trained target detection model achieves higher accuracy.
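A sketch of these two calculations together, using the λ values discussed above; the reduction over the offsets is an assumption.

```python
import torch

def smooth_l1(x):
    """smooth-L1: 0.5 * x^2 when |x| < 1, |x| - 0.5 otherwise."""
    ax = x.abs()
    return torch.where(ax < 1, 0.5 * x ** 2, ax - 0.5)

def detection_loss(l_cls, l_size, offsets, lambda_size=1.0, lambda_off=0.1):
    """New second loss value L_det = L_cls + λ_size·L_size + λ_off·L_off, with
    L_off the pixel loss computed from the downsampling offset values.
    λ_size is a free preset weight; λ_off = 0.1 follows the example above."""
    l_off = smooth_l1(offsets).sum()
    return l_cls + lambda_size * l_size + lambda_off * l_off
```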
The target detection model training method provided by the application can be applied in the field of blockchain, with the trained target detection model stored in a blockchain network. A blockchain is a novel application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated by cryptographic methods, each data block containing information about a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
The blockchain underlying platform may include processing modules such as user management, basic services, smart contracts and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public/private key generation (account management), key management, and the correspondence between users' real identities and blockchain addresses (authority management); when authorized, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices to verify the validity of service requests and, after consensus is reached on a valid request, record it to storage; for a new service request, the basic service first performs interface adaptation analysis and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication) after encryption, and records it for storage. The smart contract module is responsible for contract registration and issuance, contract triggering and contract execution; developers can define contract logic through a programming language, publish it to the blockchain (contract registration), and have it triggered by keys or other events and executed according to the logic of the contract clauses, with functions for upgrading and cancelling contracts also provided. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract setting, cloud adaptation, and the visual output of real-time states during product operation, for example: alarms, monitoring network conditions, and monitoring node device health.
Referring to fig. 2, an embodiment of the present application further provides a target detection model training apparatus, including:
an obtaining unit 10, configured to obtain a picture to be trained; wherein the picture to be trained has a correct first output value in a preset feature selection module;
The down-sampling unit 20 is configured to perform N times of down-sampling on the picture to be trained, extract N feature layers, and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
an extracting unit 30, configured to extract the respective target areas of the target feature layers through the original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
the target feature unit 40 is configured to perform the ROIAlign operation on each target area to obtain the target feature corresponding to each target area;
the first calculating unit 50 is configured to input each target feature into the feature selection module for calculation, so as to obtain a second output value corresponding to each target feature;
a second calculating unit 60, configured to calculate a first loss value of the feature selection module according to the first output value and the second output value;
a third calculating unit 70, configured to calculate a second loss value of the original target detection model, and add the first loss value and the second loss value to obtain a target loss value;
and the training unit 80 is configured to train the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and a target detection model is obtained.
In one embodiment, the target feature unit 40 includes:
the extraction subunit is configured to extract an ROI feature of each target region in each target feature map layer;
the dimension reduction subunit is used for reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and the splicing subunit is used for splicing the ROI features subjected to dimension reduction to obtain the target features.
In one embodiment, the first calculating unit 50 includes:
the first calculating subunit is used for calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and the adding subunit is used for adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax.
In one embodiment, the second computing unit 60 includes:
and the second calculating subunit is used for calculating the first loss value through a cross entropy loss function according to the first output value and the second output value.
In one embodiment, the training unit 80 includes:
and the training subunit is used for training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
In an embodiment, the third computing unit 70 includes:
and the third calculating subunit is used for calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
In one embodiment, the third computing subunit comprises:
an obtaining module, configured to obtain an offset value of a pixel in each target feature layer in the downsampling process;
a calculation module for passing smooth-L according to the offset value1Calculating a pixel loss value by a function;
the adding module adds the pixel loss value and the second loss value, and takes the added value as a new second loss value.
In this embodiment, please refer to the above method embodiment for specific implementation of the above units, sub-units, and modules, which are not described herein again.
Referring to fig. 3, the embodiments of the present application also provide a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is used to provide computation and control capabilities. The memory of the computer device comprises a storage medium and an internal memory. The storage medium stores an operating system, a computer program and a database. The internal memory provides an environment in which the operating system and the computer program in the storage medium run. The database of the computer device is used for storing pictures to be trained and the like. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements the target detection model training method.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements a method for training a target detection model.
In summary, for the target detection model training method and apparatus, computer device and storage medium provided in the embodiments of the present application: a picture to be trained is acquired, the picture having a correct first output value in a preset feature selection module; the picture is downsampled N times to extract N feature layers and form a feature pyramid, where N is a positive integer greater than or equal to 2; the respective target areas of the target feature layers are extracted through the original target detection model, the target feature layers being the feature layers in the feature pyramid other than the bottommost feature layer; the target feature corresponding to each target area is obtained through the ROIAlign operation; each target feature is input into the feature selection module for calculation to obtain a second output value corresponding to each target feature; a first loss value of the feature selection module is calculated from the first output value and the second output value; a second loss value of the original target detection model is calculated, and the first and second loss values are added to obtain the target loss value; and the original target detection model is trained according to the target loss value, training stopping once the target loss value reaches a preset result, yielding the target detection model. By inputting the feature layers into the feature selection module to calculate the second output values and computing a loss value from the first and second output values, each feature layer can learn which of its features are important and the contribution of each feature layer is emphasized; taking the sum of the first loss value of the feature selection module and the second loss value of the original target detection model as the target loss value and training iteratively against it facilitates learning features for correct classification and position regression and improves the network accuracy and robustness of the target detection model.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored on a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above description is only for the preferred embodiment of the present application and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (10)

1. A target detection model training method is characterized by comprising the following steps:
acquiring a picture to be trained; the picture to be trained has a correct first output value in a preset feature selection module;
the picture to be trained is subjected to N times of downsampling, N characteristic layers are extracted, and a characteristic pyramid is formed; wherein N is a positive integer greater than or equal to 2;
extracting the respective target areas of the target feature layers through an original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
obtaining target characteristics corresponding to each target area through ROIAlign operation of each target area;
inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
calculating a first loss value of the feature selection module according to the first output value and the second output value;
calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and a target detection model is obtained.
2. The method for training the target detection model according to claim 1, wherein the step of obtaining the target feature corresponding to each target region by subjecting each target region to the ROIAlign operation includes:
extracting ROI (region of interest) features of each target region in each target feature layer;
reducing the dimensions of the ROI features to enable the dimensions of the ROI features to be consistent;
and splicing the ROI features subjected to dimension reduction to obtain the target features.
3. The method for training the object detection model according to claim 1, wherein the feature selection module comprises three convolutional layers and a fully connected layer with softmax;
the step of inputting each target feature into a feature selection module for calculation to obtain a second output value corresponding to each target feature includes:
calculating a third loss value through a focal-loss function and a fourth loss value through an IoU-loss function for each target feature;
and adding the third loss value and the fourth loss value of each target feature and calculating a second output value of each target feature through the softmax.
4. The method of claim 1, wherein the step of calculating a first loss value of the feature selection module based on the first output value and the second output value comprises:
and calculating the first loss value from the first output value and the second output value through a cross entropy loss function.
5. The method of claim 4, wherein the step of training the original target detection model according to the target loss value comprises:
and training the original target detection model by adopting an SGD gradient back propagation algorithm according to the target loss value.
6. The method of claim 1, wherein the step of calculating the second loss value of the original object detection model comprises:
and calculating a second loss value of the original target detection network through the focal loss function and the GIoU loss function.
7. The method for training the object detection model according to claim 1, wherein the step of calculating the second loss value of the original object detection network by the focal loss function and the GIoU loss function is followed by:
acquiring an offset value of a pixel of each target feature layer in the downsampling process;
calculating a pixel loss value through a smooth-L1 function according to the offset value;
and adding the pixel loss value and the second loss value, and taking the added value as a new second loss value.
8. An object detection model training apparatus, comprising:
the acquisition unit is used for acquiring a picture to be trained; wherein the picture to be trained has a correct first output value in a preset feature selection module;
the down-sampling unit is used for performing N downsampling operations on the picture to be trained to extract N feature layers and form a feature pyramid; wherein N is a positive integer greater than or equal to 2;
the extraction unit is used for extracting the respective target areas of the target feature layers through the original target detection model; wherein the target feature layers are the feature layers in the feature pyramid other than the bottommost feature layer;
the target feature unit is used for obtaining target features corresponding to the target areas through ROIAlign operation of the target areas;
the first calculation unit is used for inputting each target feature into the feature selection module for calculation to obtain a second output value corresponding to each target feature;
the second calculation unit is used for calculating a first loss value of the feature selection module according to the first output value and the second output value;
the third calculating unit is used for calculating a second loss value of the original target detection model, and adding the first loss value and the second loss value to obtain a target loss value;
and the training unit is used for training the original target detection model according to the target loss value, so that the training of the original target detection model is stopped after the target loss value reaches a preset result, and the target detection model is obtained.
9. A computer device comprising a memory and a processor, the memory having stored therein a computer program, characterized in that the processor, when executing the computer program, implements the steps of the object detection model training method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the object detection model training method of any one of claims 1 to 7.
CN202011204414.8A 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium Active CN112308150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011204414.8A CN112308150B (en) 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011204414.8A CN112308150B (en) 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112308150A true CN112308150A (en) 2021-02-02
CN112308150B CN112308150B (en) 2022-04-15

Family

ID=74333593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011204414.8A Active CN112308150B (en) 2020-11-02 2020-11-02 Target detection model training method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112308150B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416394A (en) * 2018-03-22 2018-08-17 河南工业大学 Multi-target detection model building method based on convolutional neural networks
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN111160379A (en) * 2018-11-07 2020-05-15 北京嘀嘀无限科技发展有限公司 Training method and device of image detection model and target detection method and device
CN109961107A (en) * 2019-04-18 2019-07-02 北京迈格威科技有限公司 Training method, device, electronic equipment and the storage medium of target detection model
CN110517253A (en) * 2019-08-29 2019-11-29 电子科技大学 The method of the good pernicious classification of Lung neoplasm based on 3D multiple target feature learning
CN111259930A (en) * 2020-01-09 2020-06-09 南京信息工程大学 General target detection method of self-adaptive attention guidance mechanism
CN111667011A (en) * 2020-06-08 2020-09-15 平安科技(深圳)有限公司 Damage detection model training method, damage detection model training device, damage detection method, damage detection device, damage detection equipment and damage detection medium
CN111860265A (en) * 2020-07-10 2020-10-30 武汉理工大学 Multi-detection-frame loss balancing road scene understanding algorithm based on sample loss
CN111709497A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZHEN QIN, ET AL.: "Making Deep Neural Networks Robust to Label", 《IEEE ACCESS》 *
张筱晗等: "基于中心点的遥感图像多方向舰船目标检测", 《光子学报》 *
徐义鎏等: "基于改进faster RCNN 的木材运输车辆检测", 《计算机应用》 *
黄怡蒙等: "融合深度学习的机器人目标检测与定位", 《计算机工程与应用》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724011A (en) * 2022-05-25 2022-07-08 北京闪马智建科技有限公司 Behavior determination method and device, storage medium and electronic device

Also Published As

Publication number Publication date
CN112308150B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
WO2021135499A1 (en) Damage detection model training and vehicle damage detection methods, device, apparatus, and medium
CN111444881B (en) Fake face video detection method and device
CN110020620B (en) Face recognition method, device and equipment under large posture
CN112115783B (en) Depth knowledge migration-based face feature point detection method, device and equipment
CN114913565B (en) Face image detection method, model training method, device and storage medium
CN108805828B (en) Image processing method, device, computer equipment and storage medium
CN111860674A (en) Sample class identification method and device, computer equipment and storage medium
CN112084917B (en) Living body detection method and device
CN110110601A (en) Video pedestrian weight recognizer and device based on multi-space attention model
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN115496928B (en) Multi-modal image feature matching method based on multi-feature matching
CN110110668B (en) Gait recognition method based on feedback weight convolutional neural network and capsule neural network
CN113837942A (en) Super-resolution image generation method, device, equipment and storage medium based on SRGAN
CN112949468A (en) Face recognition method and device, computer equipment and storage medium
CN112464945A (en) Text recognition method, device and equipment based on deep learning algorithm and storage medium
CN112149590A (en) Hand key point detection method
CN112348116A (en) Target detection method and device using spatial context and computer equipment
CN111242840A (en) Handwritten character generation method, apparatus, computer device and storage medium
CN112308150B (en) Target detection model training method and device, computer equipment and storage medium
CN112241646A (en) Lane line recognition method and device, computer equipment and storage medium
CN116883466A (en) Optical and SAR image registration method, device and equipment based on position sensing
CN113591528A (en) Document correction method, device, computer equipment and storage medium
CN113537020B (en) Complex SAR image target identification method based on improved neural network
CN111046755A (en) Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN116091596A (en) Multi-person 2D human body posture estimation method and device from bottom to top

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant