CN117437394A - Model training method, target detection method and device - Google Patents

Model training method, target detection method and device

Info

Publication number
CN117437394A
CN117437394A (application CN202210829603.7A)
Authority
CN
China
Prior art keywords
boundary
prediction
boundary frame
box
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210829603.7A
Other languages
Chinese (zh)
Inventor
吕永春
朱徽
王钰
周迅溢
曾定衡
蒋宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210829603.7A priority Critical patent/CN117437394A/en
Priority to PCT/CN2023/102175 priority patent/WO2024012179A1/en
Publication of CN117437394A publication Critical patent/CN117437394A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a model training method, a target detection method, and a device. In the model training stage, the target detection model to be trained is made to continuously learn the bounding box distribution based on the first reference bounding boxes and their corresponding actual bounding boxes, so that the predicted first prediction bounding boxes become closer to the corresponding actual bounding boxes, improving the bounding box prediction accuracy, generalization, and data migration capability of the trained target detection model. Moreover, the comparison result set used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of bounding box distribution similarity but also a second comparison result characterizing the degree of bounding box coordinate coincidence, so the bounding box regression loss value obtained from the first and second comparison results is more accurate, which in turn improves the accuracy of the model parameters updated based on that loss value.

Description

Model training method, target detection method and device
Technical Field
The present disclosure relates to the field of target detection, and in particular, to a model training method, a target detection method, and a device.
Background
With the rapid development of artificial intelligence technology, there is growing demand for performing target detection on an image with a pre-trained target detection model, that is, for predicting the coordinate information and classification information of the bounding box in which each target contained in the image is located.
However, in the training process of target detection models in the related art, what is mainly learned is the degree of similarity between the image features inside the predicted bounding box and those inside the real bounding box. As a result, the model parameters obtained by training are accurate on the sample image dataset, but their accuracy drops on new images to be detected, so the generalization of the target detection model is poor and the target detection accuracy in the model application stage is relatively low.
Disclosure of Invention
The embodiments of the present application aim to provide a model training method, a target detection method, and a device, which make the predicted first prediction bounding box closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, generalization, and data migration capability of the trained target detection model; in addition, the bounding box regression loss value based on the first comparison result and the second comparison result is more accurate, which further improves the accuracy of the model parameters updated based on that loss value.
To this end, the embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides a model training method, where the method includes:
acquiring a first bounding box subset from a first candidate bounding box set, and acquiring the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset; the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image dataset with a preset region-of-interest extraction model;
inputting the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative model training until the iterative training result meets a preset iteration termination condition, obtaining a trained target detection model; the target detection model includes a bounding box prediction sub-model, and each round of model training is implemented as follows:
for each of the first reference bounding boxes: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; a bounding box comparison result set is generated based on the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box; the bounding box comparison result set includes a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence;
determining a bounding box regression loss value based on the first comparison result and the second comparison result respectively corresponding to each first reference bounding box in the first bounding box subset;
and updating the parameters of the bounding box prediction sub-model based on the bounding box regression loss value.
In a second aspect, an embodiment of the present application provides a target detection method, where the method includes:
acquiring a second bounding box subset corresponding to an image to be detected from a second candidate bounding box set; the second bounding box subset includes a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model;
inputting the second reference bounding boxes into a target detection model for target detection, obtaining the second prediction bounding box and second class prediction result corresponding to each second reference bounding box;
and generating a target detection result of the image to be detected based on the second prediction bounding box and the second class prediction result corresponding to each second reference bounding box.
In a third aspect, an embodiment of the present application provides a model training apparatus, including:
a bounding box acquisition module configured to acquire a first bounding box subset from a first candidate bounding box set and to acquire the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset; the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image dataset with a preset region-of-interest extraction model;
and a model training module configured to input the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative model training until the iterative training result meets a preset iteration termination condition, obtaining a trained target detection model; the target detection model includes a bounding box prediction sub-model, and each round of model training is implemented as follows:
for each of the first reference bounding boxes: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; a bounding box comparison result set is generated based on the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box, the set including a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence; a bounding box regression loss value is determined based on the first comparison result and the second comparison result respectively corresponding to each first reference bounding box in the first bounding box subset; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
In a fourth aspect, an embodiment of the present application provides an object detection apparatus, including:
a bounding box acquisition module configured to acquire, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected; the second bounding box subset includes a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model;
a target detection module configured to input the second reference bounding boxes into a target detection model for target detection, obtaining the second prediction bounding box and second class prediction result corresponding to each second reference bounding box;
and a detection result generation module configured to generate a target detection result of the image to be detected based on the second prediction bounding box and the second class prediction result corresponding to each second reference bounding box.
In a fifth aspect, an embodiment of the present application provides a computer device, the device including:
a processor; and a memory arranged to store computer-executable instructions which, when executed by the processor, cause the steps of the method described in the first or second aspect to be performed.
In a sixth aspect, an embodiment of the present application provides a storage medium storing computer-executable instructions that cause a computer to perform the steps of the method described in the first or second aspect.
It can be seen that, in the embodiments of the present application, in the model training stage, the bounding box prediction sub-model predicts a first prediction bounding box from each first reference bounding box, and the target detection model to be trained then continuously learns the bounding box distribution based on the first prediction bounding boxes and their corresponding actual bounding boxes, so that the predicted first prediction bounding boxes become closer to the corresponding actual bounding boxes; this improves the bounding box prediction accuracy, generalization, and data migration capability of the trained target detection model. Furthermore, the comparison result set used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of bounding box distribution similarity but also a second comparison result characterizing the degree of bounding box coordinate coincidence. Because the bounding box regression loss value is obtained from the first and second comparison results corresponding to each first reference bounding box, it contains both the regression loss obtained from the coarse-grained comparison dimension of bounding box distribution similarity and the regression loss obtained from the fine-grained comparison dimension of bounding box coordinate coincidence, which improves the accuracy of the bounding box regression loss value and, in turn, the accuracy of the model parameters updated based on it.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments described in the present application, and that other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a schematic flow chart of a model training method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of each round of model training in the model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a first implementation principle of the model training method according to an embodiment of the present application;
FIG. 4a is a schematic diagram of a second implementation principle of the model training method according to an embodiment of the present application;
FIG. 4b is a schematic diagram of a third implementation principle of the model training method according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of a target detection method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an implementation principle of the target detection method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the module composition of a model training apparatus according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the module composition of a target detection apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to enable better understanding of the technical solutions in one or more embodiments of the present application, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort shall fall within the scope of protection of the present application.
It should be noted that, without conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The embodiments of the present application are described in detail below with reference to the accompanying drawings.
The related art extracts features with a deep network and drives the model to learn the image features inside the bounding boxes, continuously learning the degree of similarity between the image features in the predicted bounding box and those in the actual bounding box and adjusting the model parameters accordingly. A target detection model trained this way is tied to the sample dataset used in the training stage: its generalization is poor, its cross-data migration capability is weak, and although its detection accuracy on the sample dataset is high, its accuracy on new image data to be detected is low. In view of this, in the model training stage of the present application, the bounding box prediction sub-model predicts a first prediction bounding box from each first reference bounding box, and the bounding box distribution is then continuously learned based on the first prediction bounding boxes and their corresponding actual bounding boxes, so that the predicted first prediction bounding boxes become closer to the corresponding actual bounding boxes. This improves the accuracy with which the trained target detection model predicts the bounding box of a target object in an image to be detected, and improves the generalization and data migration capability of the trained model, so that it can be applied to new image data to be detected. In addition, if the model regression loss were determined only from the coarse-grained comparison dimension of bounding box distribution similarity, accurate learning of bounding box positions could not be taken into account; conversely, if it were determined only from the fine-grained comparison dimension of bounding box coordinate coincidence, the bounding box distribution could not be learned. Therefore, the comparison result set used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of bounding box distribution similarity but also a second comparison result characterizing the degree of bounding box coordinate coincidence, and the bounding box regression loss value is obtained from the first and second comparison results corresponding to each first reference bounding box. This simultaneously accounts for the regression loss caused by bounding boxes whose distributions are similar but whose specific positions deviate, and the regression loss caused by first prediction bounding boxes with blurred edges, so that the bounding box regression loss value contains both the regression loss from the coarse-grained comparison dimension of distribution similarity and the regression loss from the fine-grained comparison dimension of coordinate coincidence. The accuracy of the bounding box regression loss value is thereby improved, and the accuracy of the model parameters updated based on it is further improved.
FIG. 1 is a schematic flow chart of a model training method provided in one or more embodiments of the present application. The method in FIG. 1 can be performed by an electronic device provided with a model training apparatus, where the electronic device may be a terminal device or a designated server; the hardware device used for model training (i.e., the electronic device provided with the model training apparatus) and the hardware device used for target detection (i.e., the electronic device provided with the target detection apparatus) may be the same or different. Specifically, as shown in FIG. 1, the training process of the target detection model includes at least the following steps:
s102, acquiring a first boundary frame subset from a first alternative boundary frame set, and acquiring actual boundary frames corresponding to first reference boundary frames in the first boundary frame subset respectively; the first boundary box subset comprises a first appointed number of first reference boundary boxes, and the first alternative boundary box set is obtained by extracting a target region from a sample image data set by using a preset region of interest extraction model;
specifically, the determining process for the first reference bounding boxes with the first specified number may be that for each round of model training, a step of performing target region extraction on the sample image dataset by using the preset region of interest extraction model is performed once, so as to obtain the first reference bounding boxes with the first specified number; it is also possible to perform in advance the step of extracting the target region of the sample image dataset with a preset region of interest extraction model, and then randomly sampling, for each round of model training, a first specified number of first reference bounding boxes from a large number of candidate bounding boxes extracted in advance.
In particular, the sample image dataset may contain multiple sample target objects, and each sample target object may correspond to multiple first reference bounding boxes; that is, the first specified number of first reference bounding boxes includes at least one first reference bounding box for each sample target object.
Specifically, before acquiring the first bounding box subset from the first candidate bounding box set in step S102, the method further includes: inputting the sample image dataset into the preset region-of-interest extraction model for region-of-interest extraction to obtain the first candidate bounding box set, where the first candidate bounding box set includes a second specified number of candidate bounding boxes. In the case where the second specified number equals the first specified number, for each round of model training, region-of-interest extraction is performed on the multiple sample image data in the sample image dataset with the preset region-of-interest extraction model, obtaining the first specified number of first reference bounding boxes; in the case where the second specified number is greater than the first specified number, for each round of model training, the first specified number of first reference bounding boxes is obtained by random sampling from the second specified number of candidate bounding boxes.
One purpose of the model training process is to continuously learn the bounding box distribution through iterative training of the model parameters, so as to improve the generalization and data migration capability of the model (that is, the model parameters do not depend on the sample data used during training and adapt better to the data to be recognized during application). To prompt the target detection model to be trained to learn the bounding box distribution well, the extracted first reference bounding boxes fed into training must be guaranteed to obey a certain probability distribution (such as a Gaussian or Cauchy distribution); the larger the number N of anchor boxes extracted with the preset region-of-interest extraction model (such as a region-of-interest extraction algorithm, ROI), the better the bounding box distribution learning. However, if all N anchor boxes were input into the target detection model to be trained in every round, the data processing load would inevitably be large and the hardware requirements high;
in a specific implementation, therefore, it is preferable to extract N anchor boxes in advance with the preset region-of-interest extraction model and then, in each round of model training, randomly sample m anchor boxes from the N as the first reference bounding boxes to be input into the target detection model to be trained. This keeps the data processing load of each training round manageable while still allowing the model to learn the bounding box distribution well; that is, it balances the data processing load of the training process against bounding box distribution learning. On this basis, the second specified number is greater than the first specified number, and accordingly, acquiring the first bounding box subset from the first candidate bounding box set in step S102 specifically includes: randomly selecting the first specified number of candidate bounding boxes from the second specified number of candidate bounding boxes as the first reference bounding boxes, obtaining the first bounding box subset. In other words, region-of-interest extraction is performed in advance on the multiple sample image data in the sample image dataset with the preset region-of-interest extraction model, obtaining the second specified number of candidate bounding boxes; then, for each round of model training, the first specified number of first reference bounding boxes is randomly sampled from the second specified number of candidate bounding boxes.
That is, it is preferable to extract N anchor boxes (i.e., the second specified number of candidate bounding boxes) in advance and then, for each round of model training, randomly sample m anchor boxes from the N (i.e., the first specified number of first reference bounding boxes), as sketched below, before continuing with the following step S104.
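A minimal sketch of this preferred scheme: extract the N candidate anchor boxes once with the preset region-of-interest extraction model, then randomly sample m of them per training round. `extract_rois` is a hypothetical stand-in for that model; names and tensor shapes are illustrative assumptions, not taken from the patent.

```python
import torch

def build_candidate_set(sample_images, extract_rois):
    """Run ROI extraction once over the sample image dataset; returns (N, 4) boxes."""
    return torch.cat([extract_rois(img) for img in sample_images], dim=0)

def sample_first_bbox_subset(candidates, m):
    """Randomly draw m first reference bounding boxes (m < N) for one training round."""
    idx = torch.randperm(candidates.size(0))[:m]
    return candidates[idx]
```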
S104, inputting the first reference bounding boxes and the actual bounding boxes into the target detection model to be trained for iterative model training until the iterative training result meets the preset iteration termination condition, obtaining a trained target detection model; the preset iteration termination condition may include: the current number of training rounds equals the total number of training rounds, or the model loss function converges;
the specific implementation process of the model iterative training in the step S104 is described below, and since the processing process of each model training in the model iterative training process is the same, any model training is taken as an example for detailed description. Specifically, if the target detection model to be trained comprises a boundary box predictor model; as shown in fig. 2, the specific implementation manner of each model training includes the following steps S1042 to S1046:
S1042, for each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; a bounding box comparison result set is generated based on the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box; the bounding box comparison result set includes a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence;
specifically, the first comparison result characterizing the degree of bounding box distribution similarity can be obtained by calculating the relative entropy (KL divergence) between the actual bounding box and the corresponding first prediction bounding box. The magnitude of the KL divergence reflects the degree of difference between the probability distributions of the two bounding boxes (i.e., the actual bounding box and the corresponding first prediction bounding box); the greater the difference, the lower the corresponding degree of bounding box distribution similarity. The KL divergence can therefore characterize the degree of distribution similarity between the actual bounding box and the corresponding first prediction bounding box, and the first regression loss component, corresponding to the comparison dimension considered from the angle of bounding box distribution similarity, can be determined based on it, prompting the model to perform bounding box regression learning. Concretely, for the actual bounding box and first prediction bounding box corresponding to a given first reference bounding box: the larger the KL divergence, the lower the probability distribution similarity between the first prediction bounding box and the corresponding actual bounding box, and the larger the corresponding first regression loss component in the comparison dimension of bounding box distribution similarity. A first comparison result is therefore generated based on the KL divergence, so that it characterizes the degree of bounding box distribution similarity, and the first regression loss component for this comparison dimension is then determined from the KL divergence in the first comparison result.
Correspondingly, the second comparison result characterizing the degree of bounding box coordinate coincidence can be determined in two ways: by considering only the intersection-over-union (IoU) loss between an actual bounding box and its corresponding first prediction bounding box to obtain a target IoU loss; or by jointly considering the IoU loss between an actual bounding box and its corresponding first prediction bounding box and the IoU losses between that actual bounding box and the first prediction bounding boxes corresponding to the other actual bounding boxes, and determining the target IoU loss from them. The magnitude of the target IoU loss characterizes the degree of coordinate coincidence between the actual bounding box and the corresponding first prediction bounding box, so the second regression loss component, corresponding to the comparison dimension considered from the angle of bounding box coordinate coincidence, can be determined based on it, further prompting the model to perform bounding box regression learning. Concretely, for the actual bounding box and first prediction bounding box corresponding to a given first reference bounding box, the target IoU loss between them is determined: the larger the target IoU loss, the lower the coordinate coincidence between the first prediction bounding box and the corresponding actual bounding box, and the larger the corresponding second regression loss component in the comparison dimension of bounding box coordinate coincidence. A second comparison result is therefore generated based on the target IoU loss, so that it characterizes the degree of bounding box coordinate coincidence, and the second regression loss component for this comparison dimension is then determined from the IoU loss in the second comparison result.
S1044, determining the bounding box regression loss value of the target detection model to be trained based on the first comparison result and the second comparison result respectively corresponding to each first reference bounding box in the first bounding box subset;
specifically, after the bounding box comparison result set is obtained for each first reference bounding box, a sub-regression loss value corresponding to each first reference bounding box is obtained; this sub-regression loss value comprises at least a first regression loss component, corresponding to the first comparison dimension considered from the angle of bounding box distribution similarity, and a second regression loss component, corresponding to the second comparison dimension considered from the angle of bounding box coordinate coincidence. The bounding box regression loss value used to adjust the model parameters can then be determined based on the sub-regression loss values corresponding to the respective first reference bounding boxes.
It should be noted that, in a specific implementation, the sub-regression loss value corresponding to a first reference bounding box may be determined by considering the degree of bounding box distribution similarity and the degree of bounding box coordinate coincidence simultaneously, or by considering only the degree of bounding box distribution similarity; in the latter case, the bounding box comparison result set corresponding to the first reference bounding box includes the first comparison result, and the sub-regression loss value is determined accordingly from the first regression loss component corresponding to the first comparison result.
S1046, updating the parameters of the bounding box prediction sub-model based on the bounding box regression loss value.
Specifically, after the bounding box regression loss value is determined from the sub-regression loss values corresponding to the first reference bounding boxes, the parameters of the bounding box prediction sub-model are adjusted based on that loss value by a gradient descent method. Since each sub-regression loss value reflects at least the first regression loss component (corresponding to the comparison dimension of bounding box distribution similarity) and the second regression loss component (corresponding to the comparison dimension of bounding box coordinate coincidence), the bounding box regression loss value used to adjust the model parameters also reflects the regression loss components of both comparison dimensions. The finally trained target detection model can therefore ensure both that the probability distribution of the predicted first prediction bounding box is closer to that of the actual bounding box and that the coordinate coincidence between the first prediction bounding box and the actual bounding box is higher.
It should be noted that, for the iterative training of the model parameters based on the bounding box regression loss value of the target detection model to be trained, reference can be made to the existing process of adjusting and optimizing model parameters by back propagation with the gradient descent method, which is not repeated here.
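For concreteness, a minimal sketch of one training round (steps S1042 to S1046) in PyTorch style is given below. `bbox_predictor` is a hypothetical nn.Module standing in for the bounding box prediction sub-model; the per-box loss helpers `kl_regression_loss` and `iou_regression_loss` (the first and second regression loss components) are sketched later in this section. All names, signatures, and weights are illustrative assumptions, not the patent's own implementation.

```python
import torch

def train_one_round(bbox_predictor, optimizer, ref_boxes, gt_boxes,
                    lambda1=1.0, lambda2=1.0):
    """ref_boxes, gt_boxes: (m, 4) tensors of (x, y, w, h) boxes, aligned by index."""
    pred_boxes = bbox_predictor(ref_boxes)          # first prediction bounding boxes
    v1 = kl_regression_loss(gt_boxes, pred_boxes)   # per-box first comparison dimension
    v2 = iou_regression_loss(gt_boxes, pred_boxes)  # per-box second comparison dimension
    sub_losses = lambda1 * v1 + lambda2 * v2        # V_i(D, G) per reference box
    reg_loss = sub_losses.sum()                     # bounding box regression loss value
    optimizer.zero_grad()
    reg_loss.backward()                             # back propagation
    optimizer.step()                                # gradient descent parameter update
    return reg_loss.item()
```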
It should further be noted that the target detection model trained by the model training method provided in the embodiments of the present application can be applied to any specific application scenario in which target detection needs to be performed on an image to be detected. For example, in specific application scenario 1, target detection is performed on images captured by an image acquisition device at the entrance of a public place (such as a mall entrance, subway entrance, scenic spot entrance, or performance venue entrance); in specific application scenario 2, target detection is performed on images captured by the image acquisition devices at the monitoring points of a breeding base;
the sample image dataset used in the training process differs with the specific application scenario of the target detection model. For specific application scenario 1, the sample image dataset may consist of historical sample images captured at the entrance of the specified public place within a preset historical time period; the target object delimited by a first reference bounding box corresponds to a target user entering the specified public place in a historical sample image, and the actual category and the first prediction category may be categories to which the target user belongs, such as at least one of age, gender, height, and occupation. For specific application scenario 2, the sample image dataset may consist of historical sample images captured by the monitoring points of the specified breeding base within a preset historical time period; correspondingly, the target object delimited by a first reference bounding box is a target breeding object in a historical sample image, and the actual category and the first prediction category may be at least one of the living state and the body size of the target breeding object.
As shown in FIG. 3, a schematic diagram of a specific implementation principle of the training process of the target detection model is provided, which specifically includes:
acquiring the first specified number of first reference bounding boxes and the actual bounding boxes respectively corresponding to the first reference bounding boxes;
for each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; the comparison result generation module then generates a bounding box comparison result set based on the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box;
determining the bounding box regression loss value of the target detection model to be trained based on the first comparison result and the second comparison result corresponding to each first reference bounding box;
and iteratively updating the model parameters of the target detection model to be trained based on the bounding box regression loss value until the iterative training result meets the preset iteration termination condition, obtaining the target detection model.
Specifically, for the determination of the bounding box comparison result set, generating the bounding box comparison result set in step S1042 based on the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box specifically includes:
calculating the relative entropy (KL divergence) based on the actual bounding box and first prediction bounding box corresponding to the first reference bounding box to obtain the first comparison result; and calculating the bounding box IoU loss based on the actual bounding box and first prediction bounding box corresponding to the first reference bounding box to obtain the second comparison result.
Specifically, for each first reference bounding box, the corresponding bounding box comparison result set includes not only the first comparison result obtained from the angle of bounding box distribution similarity but also the second comparison result obtained from the angle of bounding box coordinate coincidence, which improves the comprehensiveness of the comparison result set and, in turn, the accuracy of the bounding box regression loss obtained from it.
In a specific implementation, as shown in FIG. 4a, a schematic diagram of another implementation principle of the training process of the target detection model is provided, which specifically includes:
performing target region extraction on the sample image dataset in advance with the preset region-of-interest extraction model to obtain N anchor boxes, where the sample image dataset includes multiple original sample images, each containing at least one target object; the feature information corresponding to each anchor box may include position information (x, y, w, h) and category information c, i.e., (x, y, w, h, c). Specifically, during model training the parameter dimensions can be set to be mutually independent, so that the iterative training of the model parameters for each dimension is also mutually independent;
randomly sampling m anchor boxes from the N anchor boxes as the first reference bounding boxes for each round of model training, and determining the actual bounding box corresponding to each first reference bounding box. Each target object in the sample image dataset may correspond to one actual bounding box; for example, if the total number of target objects in the sample image dataset is d, the number of actual bounding boxes before expansion is d. To place the actual bounding boxes in one-to-one correspondence with the first prediction bounding boxes, the actual bounding boxes corresponding to multiple first reference bounding boxes that contain the same target object may be identical; that is, the actual bounding boxes are expanded based on the target objects delimited by the first reference bounding boxes, yielding m actual bounding boxes (m > d), as sketched below. For example, if a certain original sample image contains cat A as a target object, cat A corresponds to actual bounding box A, and the number of first reference bounding boxes containing cat A is 4 (say the first reference bounding boxes with sequence numbers 6, 7, 8, and 9), then actual bounding box A is expanded into 4 copies (i.e., the actual bounding boxes with sequence numbers 6, 7, 8, and 9);
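A minimal sketch of this expansion: the d actual bounding boxes are expanded to m so that entry i of the expanded tensor is the actual box of the target object delimited by reference box i. The mapping `gt_index_of_ref` (from reference box to target object) is an illustrative assumption.

```python
import torch

def expand_gt_boxes(gt_boxes, gt_index_of_ref):
    """gt_boxes: (d, 4); gt_index_of_ref: (m,) long tensor; returns (m, 4), m > d."""
    return gt_boxes[gt_index_of_ref]

# e.g. reference boxes 6, 7, 8, 9 all delimit cat A (actual box A), so
# gt_index_of_ref[6:10] holds the index of box A and rows 6-9 are copies of it.
```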
for each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; the comparison result generation module then generates a bounding box comparison result set based on the actual bounding box corresponding to the first reference bounding box and the corresponding first prediction bounding box. Each first reference bounding box thus corresponds to one actual bounding box and one first prediction bounding box, the first prediction bounding boxes being predicted by the bounding box prediction sub-model as it continuously performs bounding box regression learning; in the example above, the target object delimited by the first prediction bounding boxes with sequence numbers 6, 7, 8, and 9 among the m first prediction bounding boxes output by the bounding box prediction sub-model is cat A;
determining, for each first reference bounding box, a first regression loss component based on the first comparison result in its bounding box comparison result set, and a second regression loss component based on the second comparison result in its bounding box comparison result set;
determining the bounding box regression loss value of the target detection model to be trained based on the first regression loss component and the second regression loss component respectively corresponding to each first reference bounding box, and adjusting the model parameters of the bounding box prediction sub-model based on that loss value with a stochastic gradient descent method, obtaining a bounding box prediction sub-model with updated parameters;
if the iterative training result meets the preset iteration termination condition, determining the updated bounding box prediction sub-model as the trained target detection model;
if the iterative training result does not meet the preset iteration termination condition, taking the updated bounding box prediction sub-model as the target detection model to be trained for the next round of model training, until the preset iteration termination condition is met.
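A minimal sketch of the preset iteration-termination condition named above: stop when the current round count reaches the total number of training rounds, or when the model loss function has converged. Testing convergence by the change in loss falling below `eps` is an illustrative criterion, not one fixed by the patent.

```python
def should_stop(round_idx, total_rounds, loss_history, eps=1e-4):
    """True when training rounds are exhausted or the loss has stopped changing."""
    if round_idx >= total_rounds:
        return True
    return len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < eps
```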
The bounding box regression loss value of the target detection model to be trained is determined based on the sub-regression loss values respectively corresponding to the first reference bounding boxes, where the sub-regression loss value corresponding to each first reference bounding box is determined based on its regression loss components. On this basis, determining the bounding box regression loss value in step S1044 based on the first comparison result and the second comparison result respectively corresponding to the first reference bounding boxes in the first bounding box subset specifically includes:
determining the sub-regression loss values respectively corresponding to the first reference bounding boxes in the first bounding box subset, where the sub-regression loss value corresponding to each first reference bounding box is determined based on target information that includes one or a combination of: the degree of bounding box distribution similarity characterized by the first comparison result corresponding to the first reference bounding box, and the degree of bounding box coordinate coincidence characterized by the second comparison result;
and determining the bounding box regression loss value of the target detection model currently to be trained based on the sub-regression loss values respectively corresponding to the first reference bounding boxes in the first bounding box subset.
Specifically, in a concrete implementation, the sub-regression loss value corresponding to a first reference bounding box may be determined by considering only the first regression loss component corresponding to the first comparison result, or by considering the first regression loss component corresponding to the first comparison result and the second regression loss component corresponding to the second comparison result simultaneously. Taking the case where both comparison dimensions are considered, for each first reference bounding box the corresponding sub-regression loss value equals the weighted sum of the two regression loss components, which can be expressed as:

$$V_i(D, G) = \lambda_1 V_{i1} + \lambda_2 V_{i2}$$

where $\lambda_1$ denotes the first weight coefficient corresponding to the first regression loss component in the first comparison dimension, $V_{i1}$ denotes the first regression loss component in the first comparison dimension (i.e., the regression loss component corresponding to the degree of bounding box distribution similarity characterized by the first comparison result), $\lambda_2$ denotes the second weight coefficient corresponding to the second regression loss component in the second comparison dimension, and $V_{i2}$ denotes the second regression loss component in the second comparison dimension (i.e., the regression loss component corresponding to the degree of bounding box coordinate coincidence characterized by the second comparison result). Specifically, the first comparison dimension may be the regression loss comparison dimension based on the degree of bounding box distribution similarity, and the second comparison dimension may be the regression loss comparison dimension based on the degree of bounding box coordinate coincidence.
In a specific implementation, the first weight coefficient and the second weight coefficient may be kept unchanged across the first reference bounding boxes. However, the first and second regression loss components correspond to different regression loss comparison dimensions (the dimension based on the degree of bounding box distribution similarity and the dimension based on the degree of bounding box coordinate coincidence), and these dimensions emphasize different aspects of the regression loss: the distribution-similarity dimension captures, for example, the regression loss of a first reference bounding box whose corresponding bounding box has blurred edges, whereas the coordinate-coincidence dimension captures the regression loss of a first reference bounding box whose distribution is similar to the actual bounding box but whose specific position deviates. The relative magnitude of the first and second regression loss components therefore reflects, to some degree, which comparison dimension more accurately represents the regression loss between the actual bounding box and the first prediction bounding box. Accordingly: if the absolute value of the difference between the first regression loss component and the second regression loss component is not greater than a preset loss threshold, the first and second weight coefficients are kept unchanged; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is greater than the second, the first weight coefficient is increased according to a first preset adjustment mode; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is smaller than the second, the second weight coefficient is increased according to a second preset adjustment mode. In this way, during model training the comparison dimension that better reflects the bounding box regression loss is given greater weight for each first reference bounding box, further improving the accuracy of the model parameter optimization.
It should be noted that the increase applied to the first weight coefficient in the first preset adjustment mode and the increase applied to the second weight coefficient in the second preset adjustment mode may be the same or different, and the magnitude of the increase may be set according to actual requirements, which is not limited in this application.
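A minimal sketch of the weight-adjustment rule just described. The threshold and the increase amounts `delta1`/`delta2` are illustrative values; the patent only requires that the increases may be equal or different and are set according to actual requirements.

```python
def adjust_weights(v1, v2, lambda1, lambda2, threshold, delta1=0.1, delta2=0.1):
    """v1, v2: first/second regression loss components of one reference box."""
    if abs(v1 - v2) <= threshold:
        return lambda1, lambda2                  # keep both weight coefficients
    if v1 > v2:
        return lambda1 + delta1, lambda2         # emphasize distribution-similarity dimension
    return lambda1, lambda2 + delta2             # emphasize coordinate-coincidence dimension
```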
Considered from the comparison dimension of bounding box distribution similarity, calculating the relative entropy (KL divergence) based on the actual bounding box and first prediction bounding box corresponding to the first reference bounding box to obtain the first comparison result specifically includes:
Step A1, determining a first probability distribution of the actual bounding box corresponding to the first reference bounding box, and determining a second probability distribution of the first prediction bounding box corresponding to the first reference bounding box;
Step A2, calculating the KL divergence value between the first probability distribution and the second probability distribution, where the KL divergence value is used to characterize the degree of distribution similarity between the first prediction bounding box and the actual bounding box;
Step A3, determining the first comparison result corresponding to the first reference bounding box based on the KL divergence value.
Specifically, for each first reference bounding box, the KL divergence value between the first probability distribution corresponding to the actual bounding box and the second probability distribution corresponding to the first prediction bounding box is calculated from the angle of bounding box distribution similarity. The KL divergence value characterizes the degree of bounding box distribution similarity between the actual bounding box and the corresponding first prediction bounding box: the smaller the KL divergence value, the smaller the degree of distribution difference and the greater the corresponding degree of distribution similarity. The first comparison result, obtained once the KL divergence value between the actual bounding box and the first prediction bounding box is determined, therefore characterizes the degree of bounding box distribution similarity. Further, the first regression loss component corresponding to this comparison dimension can be determined based on the first comparison result: the larger the KL divergence value, the lower the distribution similarity between the actual bounding box corresponding to the first reference bounding box and the corresponding first prediction bounding box, and hence the larger the first regression loss component corresponding to that first reference bounding box. The model parameters of the bounding box prediction sub-model are then updated based on the first regression loss component, improving the sub-model's bounding box prediction.
In particular, since the occurrence of the actual bounding box and of the predicted bounding box can each be assumed to obey a probability distribution (e.g. a Gaussian distribution), if the first probability distribution is denoted $P_D^{(i)}(b \mid \theta_d)$ and the second probability distribution is denoted $P_\Theta^{(i)}(b \mid \theta_g)$, the KL divergence value can be expressed as $D_{KL}\big(P_D^{(i)} \,\big\|\, P_\Theta^{(i)}\big)$. The first probability distribution and the second probability distribution may in particular be determined (under the Gaussian assumption) by:

$$P_D^{(i)}(b \mid \theta_d) = \frac{1}{\sqrt{2\pi}\,\sigma_1}\exp\!\left(-\frac{(b - b_{ground})^2}{2\sigma_1^2}\right)$$

wherein the superscript $i$ denotes the first reference bounding box with sequence number $i$, $\sigma_1$ represents the first variance, $b_{ground}$ represents the mean value of the actual bounding box, and $\theta_d$ represents parameters related to the actual bounding box distribution;

$$P_\Theta^{(i)}(b \mid \theta_g) = \frac{1}{\sqrt{2\pi}\,\sigma_2}\exp\!\left(-\frac{(b - b_{estimation})^2}{2\sigma_2^2}\right)$$

wherein $\sigma_2$ represents the second variance, $b_{estimation}$ represents the mean value of the first prediction bounding box, and $\theta_g$ represents the model parameters of the bounding box predictor sub-model.
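Under the Gaussian assumption above, the KL divergence value has the well-known closed form $D_{KL}(P_D \,\|\, P_\Theta) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (b_{ground} - b_{estimation})^2}{2\sigma_2^2} - \frac{1}{2}$. A minimal sketch follows, assuming each bounding box coordinate is modelled by a single univariate Gaussian; the function name is a hypothetical illustration:

```python
import math

def gaussian_kl(mu_gt, sigma1, mu_pred, sigma2):
    """Closed-form KL divergence D_KL(P_D || P_Theta) between two
    univariate Gaussians.

    P_D     = N(b_ground, sigma1^2)     -- actual-bounding-box distribution
    P_Theta = N(b_estimation, sigma2^2) -- predicted-bounding-box distribution
    """
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu_gt - mu_pred) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)
```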
Specifically, the above-mentioned bounding box regression loss value is equal to the sum of the sub-regression loss values corresponding to the first specified number of first reference bounding boxes, which may be expressed as:

$$L_{reg} = \sum_{i=1}^{N_{reg}} L_{reg}^{(i)}$$

wherein $N_{reg}$ represents the first specified number, $i$ represents the sequence number of a first reference bounding box and takes values from 1 to $N_{reg}$, and $L_{reg}^{(i)}$ denotes the sub-regression loss value corresponding to the first reference bounding box with sequence number $i$.
Calculating the bounding box intersection-over-union (IoU) loss between the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box, so as to obtain the second comparison result under the comparison dimension of bounding box coordinate coincidence, specifically includes:
Step B1, calculating the IoU loss between the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box, to obtain a first IoU loss;
specifically, taking the first reference bounding box with sequence number i as an example, the IoU loss between the actual bounding box with sequence number i and the first prediction bounding box with sequence number i is calculated, giving the first IoU loss corresponding to the first reference bounding box with sequence number i.
Step B2, determining the second comparison result corresponding to the first reference bounding box based on the first IoU loss; wherein the bounding box IoU loss can characterize the degree of coincidence of the bounding box coordinates.
Specifically, since the degree of coincidence of the bounding box coordinates can be represented by the magnitude of the IoU loss between two bounding boxes, a second comparison result can be obtained based on the IoU loss between the actual bounding box and the first prediction bounding box. The second regression loss component corresponding to the coordinate-coincidence comparison dimension is then determined based on the second comparison result, which drives the model to perform bounding box regression learning.
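For concreteness, a minimal sketch of the IoU computation on axis-aligned boxes in (x1, y1, x2, y2) form; taking the IoU loss as 1 − IoU is a common choice and an assumption here, since the text does not fix the exact form:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(box_gt, box_pred):
    """IoU loss taken as 1 - IoU (an assumed, common choice)."""
    return 1.0 - iou(box_gt, box_pred)
```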
Further, for the determination of the second comparison result, only the first IoU loss between the actual bounding box and its corresponding first prediction bounding box could be considered. However, in order to improve the accuracy of the second regression loss component under the coordinate-coincidence comparison dimension, and thus the accuracy of the bounding box regression loss value used for adjusting the model parameters, not only the first IoU loss between the actual bounding box and its corresponding first prediction bounding box is considered, but also the second IoU losses between the actual bounding box and the other first prediction bounding boxes. In this way, under the coordinate-coincidence comparison dimension, the actual bounding box can be compared both with a positive sample (namely, the first prediction bounding box obtained through bounding box regression learning for that actual bounding box) and with negative samples (namely, the first prediction bounding boxes obtained through bounding box regression learning for other actual bounding boxes), so that the model learns a specific position representation of the actual bounding box. Based on this, determining the second comparison result for each first reference bounding box preferably includes the following steps:
Step B21, determining a comparison bounding box set among the first prediction bounding boxes respectively corresponding to the first specified number of first reference bounding boxes; the comparison bounding box set includes the other first prediction bounding boxes except the first prediction bounding box corresponding to the current first reference bounding box, or the other first prediction bounding boxes that do not contain the target object outlined by the current first reference bounding box;
specifically, taking the first reference bounding box with sequence number i as an example, the comparison bounding box set may include all the other first prediction bounding boxes except the first prediction bounding box with sequence number i (i.e. the first prediction bounding boxes with sequence number k, k ≠ p, p = i); that is, all the other first prediction bounding boxes are taken as negative samples of the actual bounding box with sequence number i. Alternatively, in order to further improve the selection accuracy of the negative samples, the comparison bounding box set may include only the other first prediction bounding boxes that do not contain the target object outlined by the first reference bounding box with sequence number i (i.e. the first prediction bounding boxes with sequence number k, k ≠ p, where p = i or p = j, and the first prediction bounding box with sequence number j outlines the same target object as the first reference bounding box with sequence number i); that is, only the other first prediction bounding boxes containing a target object different from that of the first reference bounding box with sequence number i are taken as negative samples of the actual bounding box with sequence number i.
Step B22, calculating the IoU losses between the actual bounding box corresponding to the first reference bounding box and the other first prediction bounding boxes in the comparison bounding box set, respectively, to obtain second IoU losses;
specifically, taking the first reference bounding box with sequence number i as an example, for each other first prediction bounding box in the comparison bounding box set, the IoU loss between the actual bounding box with sequence number i and the first prediction bounding box with sequence number k is calculated, giving the second IoU loss corresponding to the first prediction bounding box with sequence number k.
Step B23, determining the second comparison result corresponding to the first reference bounding box based on the first IoU loss and the second IoU losses.
Specifically, in the process of determining the second comparison result representing the degree of bounding box coordinate coincidence, the first IoU loss is calculated based on the actual bounding box with sequence number i and the first prediction bounding box with sequence number i, and the second IoU losses (k ≠ p) are calculated based on the actual bounding box with sequence number i and the first prediction bounding boxes with sequence number k, so as to determine the second comparison result (that is, the second comparison result may include the first IoU loss and the second IoU losses). The second regression loss component related to the degree of bounding box coordinate coincidence can then be determined based on the second comparison result, and the model parameters are adjusted based on the second regression loss component, so that the degree of coordinate coincidence between the actual bounding box with sequence number i and the first prediction bounding box with sequence number i becomes higher, while its degree of coordinate coincidence with the other first prediction bounding boxes becomes smaller, thereby enhancing the global character of bounding box regression learning and further improving its accuracy.
In a specific implementation, the second regression loss component is the negative logarithm of a target ratio, where the target ratio is the quotient of the exponential of the first intersection-over-union term and the sum of that exponential and the exponentials of the second intersection-over-union terms. Taking p = i as an example, the second regression loss component may be expressed as:

$$L_{con}^{(i)}(\theta_g) = -\log\frac{\exp\!\big(\mathrm{IoU}(b_{gt}^{(i)}, b_{pred}^{(i)})/\omega\big)}{\exp\!\big(\mathrm{IoU}(b_{gt}^{(i)}, b_{pred}^{(i)})/\omega\big) + \sum_{k\neq i}\exp\!\big(\mathrm{IoU}(b_{gt}^{(i)}, b_{pred}^{(k)})/\omega\big)}$$

wherein $b_{gt}^{(i)}$ represents the actual bounding box corresponding to the first reference bounding box with sequence number $i$, $b_{pred}^{(i)}$ represents the first prediction bounding box corresponding to the first reference bounding box with sequence number $i$, $\mathrm{IoU}(b_{gt}^{(i)}, b_{pred}^{(i)})$ corresponds to the first IoU loss, $b_{pred}^{(k)}$ represents the first prediction bounding box corresponding to the first reference bounding box with sequence number $k$ (a member of the comparison bounding box set), $\mathrm{IoU}(b_{gt}^{(i)}, b_{pred}^{(k)})$ corresponds to the second IoU loss, $\theta_g$ represents the model parameters of the bounding box predictor sub-model (on which the predicted boxes depend), and $\omega$ represents a preset adjustment factor.
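The following sketch mirrors the contrastive form above, reusing the iou helper from the previous sketch. Treating the IoU value itself as the similarity inside the exponentials, and taking all other predicted boxes as the default negative set, are assumptions consistent with steps B21 to B23:

```python
import math

def contrastive_iou_loss(gt_box, pred_boxes, i, omega=1.0, negatives=None):
    """InfoNCE-style second regression loss component for reference box i.

    gt_box     : actual bounding box with sequence number i
    pred_boxes : first prediction bounding boxes, one per first reference box
    negatives  : indices forming the comparison bounding box set; by default,
                 every other predicted box is used as a negative sample
    omega      : preset adjustment factor, used here as a temperature

    Relies on the iou() helper defined in the previous sketch.
    """
    if negatives is None:
        negatives = [k for k in range(len(pred_boxes)) if k != i]
    pos = math.exp(iou(gt_box, pred_boxes[i]) / omega)
    neg = sum(math.exp(iou(gt_box, pred_boxes[k]) / omega) for k in negatives)
    # Minimising this drives the IoU with the positive sample up and the
    # IoU with the negative samples down, as described in step B23.
    return -math.log(pos / (pos + neg))
```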
Further, in the target detection process, the target detection model needs to determine not only the position of the target object but also its specific category. During training, therefore, the category identification accuracy may be low for some first reference bounding boxes. For a first reference bounding box with low category prediction accuracy, the corresponding first prediction bounding box may not truly reflect the bounding box prediction accuracy of the bounding box predictor sub-model, and consequently the sub-regression loss between that first prediction bounding box and the actual bounding box also cannot truly reflect the bounding box prediction accuracy of the bounding box predictor sub-model. Therefore, in order to further improve the accuracy of the bounding box regression loss value, the first prediction category corresponding to the first prediction bounding box is taken into account when determining the corresponding sub-regression loss value: the sub-regression loss value is counted only if the actual category corresponding to the first prediction bounding box matches the first prediction category under a preset category matching constraint condition; otherwise, the sub-regression loss value is set to zero.
Based on this, the specific implementation of each round of model training further includes: the bounding box classification sub-model performs category prediction on the first reference bounding box or the first prediction bounding box to obtain a first category prediction result. In a specific implementation, the bounding box classification sub-model performs category prediction on the first reference bounding box or on the first prediction bounding box, and the output result can be the first category prediction result. The first category prediction result includes the prediction probability that the target object outlined by the first reference bounding box or the first prediction bounding box belongs to each candidate category; the candidate category with the maximum prediction probability is the first prediction category. That is, the bounding box classification sub-model predicts that the category of the target object in the image area within the first reference bounding box or the first prediction bounding box is the first prediction category.
In addition, in a specific implementation, since the position information of the first reference bounding box and that of the first prediction bounding box do not deviate greatly, the image features within the two boxes also do not deviate greatly, so the identification of the category of the target object in the image area within either box is not affected. Based on this, for the case where bounding box prediction and category prediction are performed sequentially, the first prediction bounding box can be input into the bounding box classification sub-model for category prediction, so as to obtain the corresponding first category prediction result; that is, the first prediction bounding box is obtained based on the first reference bounding box, and category prediction is then performed on the first prediction bounding box. For the case where bounding box prediction and category prediction are performed synchronously, the first reference bounding box can be input into the bounding box classification sub-model for category prediction, so as to obtain the corresponding first category prediction result; that is, the first prediction bounding box is obtained based on the first reference bounding box while category prediction is performed on the first reference bounding box.
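A small sketch of the two orderings follows; classifier is a hypothetical stand-in for the bounding box classification sub-model, assumed to return a probability per candidate category:

```python
def first_class_prediction(ref_box, pred_box, classifier, synchronous):
    """Obtain the first class prediction result and first prediction category.

    classifier is a hypothetical stand-in for the bounding box classification
    sub-model: it maps a box (image region) to {candidate_class: probability}.
    """
    # Synchronous case: classify the first reference bounding box while the
    # bounding box prediction runs in parallel; sequential case: classify the
    # first prediction bounding box produced by the predictor sub-model.
    box = ref_box if synchronous else pred_box
    probs = classifier(box)
    first_prediction_category = max(probs, key=probs.get)
    return probs, first_prediction_category
```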
It should be noted that the iterative training of the model parameters of the above-mentioned bounding box classification sub-model may follow an existing classification model training process, which is not described herein again.
Specifically, the target information further includes a category matching result between the first prediction category represented by the first category prediction result corresponding to the first reference bounding box and the actual category of the first reference bounding box. For the determination of the sub-regression loss value corresponding to each first reference bounding box: if the corresponding category matching result is that the first prediction category does not match the actual category, the sub-regression loss value corresponding to the first reference bounding box is zero; if the corresponding category matching result is that the first prediction category matches the actual category, the sub-regression loss value corresponding to the first reference bounding box is determined based on at least one of the first regression loss component corresponding to the degree of bounding box distribution similarity and the second regression loss component corresponding to the degree of bounding box coordinate coincidence.
Specifically, whether the first prediction category corresponding to the first reference bounding box matches the actual category is determined from the first category prediction result under a preset category matching constraint condition, which may specifically include: a constraint condition with a single matching mode, or a constraint condition with a varying matching mode. For the constraint condition with a single matching mode, the category matching constraint used in every round of model training is kept unchanged (i.e. it is unrelated to the current number of training rounds); for example, for each round of model training, if the actual category is the same as the first prediction category, it is determined that the first prediction category corresponding to the first reference bounding box matches the actual category. For the constraint condition with a varying matching mode, the category matching constraint used in each round of model training is related to the current number of training rounds; specifically, the constraint condition with a varying matching mode can be classified into a staged category matching constraint condition or a gradual-change category matching constraint condition.
The staged category matching constraint condition may be: when the current number of training rounds is less than a first preset number of rounds, the actual category and the first prediction category belong to the same category group; and when the current number of training rounds is greater than or equal to the first preset number of rounds, the actual category is the same as the first prediction category. That is, a staged category matching constraint can be implemented based on the staged constraint condition and the category prediction result corresponding to the first reference bounding box. The gradual-change category matching constraint condition may be: the sum of a first constraint term and a second constraint term is greater than a preset probability threshold, where the first constraint term is the first prediction probability corresponding to the actual category in a category prediction probability subset, and the second constraint term is the product of a preset adjustment factor and the sum of the second prediction probabilities (other than the first prediction probability) in the category prediction probability subset; the preset adjustment factor gradually decreases as the current number of training rounds increases. That is, a gradual category matching constraint can be implemented based on the gradual-change constraint condition and the category prediction result corresponding to the first reference bounding box. Specifically, the category prediction probability subset is determined based on the category prediction result corresponding to the first reference bounding box; it includes the first prediction probability that the target object outlined by the first prediction bounding box belongs to the actual category, and the second prediction probabilities that the target object belongs to the non-actual categories in a target group (i.e. the candidate categories in the target group other than the actual category), where the target group is the category group in which the actual category is located. In a specific implementation, a plurality of candidate categories associated with the target detection task are predetermined, and the candidate categories are divided into a plurality of category groups based on the semantic information of each candidate category.
Specifically, since the first reference bounding box is obtained by extracting a region of interest using the preset region of interest extraction model, the region of the target object outlined by the first reference bounding box may not be accurate enough, so that in the initial stage of model training the category identification of the corresponding first prediction bounding box may be inaccurate. Based on this, in the process of determining the sub-regression loss value corresponding to the first reference bounding box, the category matching result between the first prediction category corresponding to the first reference bounding box and the actual category of the first reference bounding box is referred to; that is, the category matching result used for representing whether the first prediction category matches the actual category is determined based on the preset category matching constraint condition;
further, the bounding box classification sub-model may be pre-trained, or its model parameters may be trained synchronously in the process of training the model parameters of the bounding box predictor sub-model; that is, a classification loss value is determined based on the first prediction category and the actual category, and the model parameters of the bounding box classification sub-model are iteratively trained based on the classification loss value. For the case of synchronous training, low category prediction accuracy in the early stage may also be caused by the low accuracy of the model parameters of the bounding box classification sub-model in the target detection model to be trained. In the early stage of model training, the requirement on category accuracy is therefore relaxed: it suffices that the actual category corresponding to the first prediction bounding box belongs to the same category group as the first prediction category. In the later stage of model training, the requirement on category accuracy is tightened: the actual category corresponding to the first prediction bounding box must be the same as the first prediction category. A preset category matching constraint condition that can satisfy this is the constraint condition with a varying matching mode described above (such as the staged category matching constraint condition or the gradual-change category matching constraint condition);
further, in order to make the transition between the two category matching constraint branches (i.e. the first prediction category belongs to the target group, and the first prediction category is the same as the actual category) smoother, so that as the number of model training rounds increases the preset category matching constraint condition gradually changes from requiring the first prediction category to fall into the target group to requiring the first prediction category to be the same as the actual category, the preset category matching constraint condition preferably includes: the gradual-change category matching constraint condition.
In a specific implementation, for the case where the preset category matching constraint condition is the gradual-change category matching constraint condition, and taking the first reference bounding box with sequence number i as an example, the gradual-change category matching constraint condition may be expressed as:

$$p^{(i)}_{real_i} + \beta\sum_{f \in groups \setminus real_i} p^{(i)}_{f} > \mu$$

wherein $groups$ represents the target group, $real_i$ represents the actual category, within the target group, of the first reference bounding box with sequence number $i$, $f \in groups \setminus real_i$ ranges over the non-actual categories in the target group, $\beta$ represents the preset adjustment factor, $p^{(i)}_{real_i}$ represents the first prediction probability (i.e. the above-mentioned first constraint term), $p^{(i)}_{f}$ represents a second prediction probability, $\beta\sum_{f \in groups \setminus real_i} p^{(i)}_{f}$ represents the second constraint term, and $\mu$ represents the preset probability threshold. Specifically, the larger $p^{(i)}_{real_i}$ is, the closer the first prediction category is to the actual category. Since the preset adjustment factor decreases as the current number of training rounds increases, the reference weight of the second constraint term gradually decreases, so that in the later stage of model training whether the first prediction category matches the actual category is mainly determined by the first constraint term (i.e. the first prediction probability under the actual category). Once the current number of training rounds reaches a certain value, the second constraint term becomes zero; in that case, when $p^{(i)}_{real_i} > \mu$, the bounding box classification sub-model is considered to have determined the actual category as the first prediction category.
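A direct transcription of this constraint check; the dict-based interface is a hypothetical simplification:

```python
def gradual_class_match(probs, actual_class, group, beta, mu):
    """Gradual-change category matching constraint for one reference box.

    probs : first class prediction result, {candidate_class: probability}
    group : the target group, i.e. the class group containing actual_class
    beta  : preset adjustment factor (decreases over training rounds)
    mu    : preset probability threshold
    """
    first_term = probs[actual_class]                               # p_real
    second_term = beta * sum(probs[f] for f in group if f != actual_class)
    return first_term + second_term > mu
```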
Specifically, the preset adjustment factor decreases as the current number of training rounds increases. If the current number of training rounds is less than or equal to the target number of training rounds, the second constraint term is positively correlated with the preset adjustment factor, and the preset adjustment factor is negatively correlated with the current number of training rounds; if the current number of training rounds is greater than the target number of training rounds, the second constraint term is zero, where the target number of training rounds is smaller than the total number of training rounds.
In a specific implementation, in order to ensure the smoothness of the adjustment of the preset adjustment factor, the value of the preset adjustment factor β may be gradually reduced in a linearly decreasing manner, so that the determination of the preset adjustment factor used in the current round of model training specifically includes:
(1) For first-round model training, a first preset value is determined as the preset adjustment factor used for the current round of model training;
specifically, the first preset value may be set according to actual requirements; in order to simplify the adjustment, it may be set to 1, i.e. the preset adjustment factor $\beta = 1$. In that case, for first-round model training the above gradual-change category matching constraint condition becomes:

$$p^{(i)}_{real_i} + \sum_{f \in groups \setminus real_i} p^{(i)}_{f} > \mu$$

That is, for first-round model training, whether the first prediction category corresponding to the first reference bounding box matches the actual category is determined based on the sum of the first prediction probability and the second prediction probabilities corresponding to the target group.
(2) For non-first-round model training, the preset adjustment factor used for the current round of model training is determined in a factor-decreasing adjustment manner based on the current number of training rounds, the target number of training rounds, and the first preset value.
Specifically, if the preset adjustment factor corresponding to first-round model training is $\beta = 1$, then under non-first-round model training the gradual-change category matching constraint condition uses $\beta < 1$; that is, for non-first-round model training, the participation of the second constraint term $\beta\sum_{f \in groups \setminus real_i} p^{(i)}_{f}$ in the gradual-change constraint condition gradually decreases as the number of model training rounds increases.
For example, the decreasing formula corresponding to the factor-decreasing adjustment manner may be:

$$\beta = \max\!\left(1 - \frac{\delta}{Z},\ 0\right)$$

wherein $\max(\cdot, 0)$ takes the maximum of its argument and 0, the first term 1 represents the first preset value (i.e. the preset adjustment factor $\beta$ used for first-round training), $\delta$ represents the current number of training rounds, and $Z$ represents the target number of training rounds. The target number of training rounds may be the total number of training rounds minus 1, or may be a designated number of rounds smaller than the total number, with the difference between the total number and the designated number being a preset number of rounds Q, Q > 2; that is, the preset adjustment factor $\beta$ is set to 0 from a certain round in the later stage of model training (not only the last round), so the judgment condition used in model training from $\delta = Z + 1$ up to the last round is $p^{(i)}_{real_i} > \mu$.
It should be noted that, for the case where the target number of training rounds $Z$ is the total number of training rounds minus 1, the same decreasing formula applies, with the preset adjustment factor reaching 0 in the last round of model training; that is, the judgment condition used in the last round is $p^{(i)}_{real_i} > \mu$. In addition, the above decreasing formula is only a relatively simple linearly decreasing adjustment manner; in practical applications, the decrease of the preset adjustment factor $\beta$ can be set according to actual requirements, so the above formula does not constitute a limitation on the scope of protection of this application.
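A sketch of the linearly decreasing schedule reconstructed above; treating the first round as δ = 1 and returning the first preset value 1 for it is an assumption:

```python
def preset_adjustment_factor(delta, Z):
    """Linearly decreasing preset adjustment factor beta.

    delta: current number of training rounds (1 for the first round);
    Z:     target number of training rounds. The first round uses the first
    preset value 1; afterwards beta decays linearly and is clamped at 0.
    """
    if delta == 1:
        return 1.0                      # first preset value
    return max(1.0 - delta / Z, 0.0)    # factor-decreasing adjustment
```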
In a specific implementation, the above target detection model to be trained includes a bounding box predictor sub-model and a bounding box classification sub-model. As shown in fig. 4b, which shows a schematic diagram of the specific implementation principle of a further target detection model training process, the process specifically includes the following steps (a code sketch of the per-round loss computation is given after step (5)):
(1) Target region extraction is carried out on a sample image dataset in advance by using the preset region of interest extraction model, to obtain N anchor boxes;
(2) For each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as the first reference bounding boxes, and the actual bounding box corresponding to each first reference bounding box is determined;
(3) For each first reference bounding box, the bounding box predictor sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box. The comparison result generation module then generates a bounding box comparison result set based on the actual bounding box corresponding to the first reference bounding box and the corresponding first prediction bounding box. The bounding box classification sub-model performs category prediction on the first prediction bounding box to obtain a category prediction result. A category matching result is determined according to the preset category matching constraint condition, the actual category of the actual bounding box corresponding to the first reference bounding box, and the category prediction result of the corresponding first prediction bounding box. If the category matching result indicates that the first prediction category and the actual category do not satisfy the preset category matching constraint condition, the sub-regression loss value corresponding to the first reference bounding box is zero; if the category matching result indicates that they satisfy the constraint condition, a first regression loss component is determined based on the first comparison result in the bounding box comparison result set of the first reference bounding box, a second regression loss component is determined based on the second comparison result in that set, and the sub-regression loss value corresponding to the first reference bounding box is determined based on the first regression loss component and the second regression loss component;
It should be noted that the category matching result may be taken into account either in the process of determining the bounding box regression loss value based on the bounding box comparison result set, or in the process of generating the bounding box comparison result set for a given first reference bounding box. In the latter case, when the first prediction category and the actual category do not satisfy the preset category matching constraint condition, it suffices to directly set the corresponding bounding box comparison result set to empty or to preset information, without generating the set based on the actual bounding box and the corresponding first prediction bounding box, which can further improve model training efficiency. Specifically, referring to the comparison result generation module shown in fig. 4b, the bounding box comparison result set is generated based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box, together with the corresponding actual category and category prediction result: the category matching result is first determined from the actual category of the actual bounding box and the category prediction result of the first prediction bounding box; if the first prediction category and the actual category do not satisfy the preset category matching constraint condition, the corresponding bounding box comparison result set is empty or preset information, so the sub-regression loss value determined from it is zero; if they satisfy the constraint condition, the bounding box comparison result set is generated based on the actual bounding box and the corresponding first prediction bounding box, and the sub-regression loss value determined from it is based on the first regression loss component corresponding to the first comparison result and the second regression loss component corresponding to the second comparison result;
that is, when determining whether the sub-regression loss value corresponding to the first reference bounding box is zero, either (a) the bounding box comparison result set is generated directly based on the actual bounding box and the corresponding first prediction bounding box, the category matching result of the first prediction category and the actual category is determined based on the category prediction result, and the sub-regression loss value is set to zero on a category mismatch or determined from the comparison results in the set on a category match; or (b) the category matching result is determined first, the bounding box comparison result set is set to empty or preset information on a category mismatch (so the sub-regression loss value is zero), and the set is generated and the sub-regression loss value determined from its comparison results only on a category match;
(4) The bounding box regression loss value of the target detection model to be trained is determined based on the sub-regression loss values respectively corresponding to the first reference bounding boxes, and the model parameters of the bounding box predictor sub-model are adjusted based on the bounding box regression loss value by using stochastic gradient descent, to obtain a parameter-updated bounding box predictor sub-model;
(5) If the model iterative training result satisfies the preset model iterative training termination condition, the updated bounding box predictor sub-model is determined as the trained target detection model; if it does not, the updated bounding box predictor sub-model is used as the target detection model to be trained in the next round of model training, until the preset model iterative training termination condition is satisfied.
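Putting the pieces together, the following is a sketch of the per-round bounding box regression loss computation of steps (3) and (4), reusing gradual_class_match from the earlier sketch; the dict-based sample interface and the weighted combination of the two loss components are simplifying assumptions:

```python
def bounding_box_regression_loss(samples, w1, w2, beta, mu):
    """Per-round bounding box regression loss (steps (3)-(4) above).

    Each sample describes one first reference bounding box, as a dict with:
      'l_kl'  - first regression loss component (distribution similarity)
      'l_iou' - second regression loss component (coordinate coincidence)
      'probs' - first class prediction result {candidate_class: probability}
      'cls'   - actual class; 'group' - the class group containing it
    """
    total = 0.0
    for s in samples:
        # Mismatched category: the sub-regression loss value is zero.
        if not gradual_class_match(s['probs'], s['cls'], s['group'], beta, mu):
            continue
        # Matched category: combine the two regression loss components.
        total += w1 * s['l_kl'] + w2 * s['l_iou']
    return total
```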
In the model training method of the embodiments of this application, in the model training stage, the bounding box predictor sub-model performs prediction based on the first reference bounding box to obtain the first prediction bounding box, and the target detection model to be trained is then driven, based on the first prediction bounding box and the corresponding actual bounding box, to continuously learn the bounding box distribution, so that the predicted first prediction bounding box becomes more similar to the corresponding actual bounding box; this improves the bounding box prediction accuracy, model generalization, and data migration capability of the trained target detection model. The comparison result set used for determining the bounding box regression loss value includes not only the first comparison result representing the degree of bounding box distribution similarity but also the second comparison result representing the degree of bounding box coordinate coincidence, and the bounding box regression loss value is obtained based on the first and second comparison results respectively corresponding to each first reference bounding box. The bounding box regression loss value therefore combines the regression loss obtained from the coarse-grained comparison dimension of bounding box distribution similarity with the regression loss obtained from the fine-grained comparison dimension of bounding box coordinate coincidence, which improves the accuracy of the bounding box regression loss value and, further, the accuracy of the model parameters updated based on it.
Corresponding to the model training methods described in fig. 1 to fig. 4b, and based on the same technical concept, an embodiment of the present application further provides a target detection method. Fig. 5 is a flowchart of the target detection method provided in an embodiment of the present application. The method in fig. 5 can be performed by an electronic device provided with a target detection apparatus, and the electronic device may be a terminal device or a designated server; the hardware device for target detection (i.e. the electronic device provided with the target detection apparatus) and the hardware device for target detection model training (i.e. the electronic device provided with the target detection model training apparatus) may be the same or different. As shown in fig. 5, the method at least includes the following steps:
S502, acquiring a second bounding box subset corresponding to an image to be detected from a second alternative bounding box set; the second bounding box subset includes a third specified number of second reference bounding boxes, and the second alternative bounding box set is obtained by performing target region extraction on the image to be detected using the preset region of interest extraction model;
specifically, the process of obtaining the third specified number of second reference bounding boxes may refer to the process of obtaining the first specified number of first reference bounding boxes, and is not described herein again.
S504, inputting the second reference bounding boxes into a target detection model for target detection, to obtain second prediction bounding boxes and second category prediction results respectively corresponding to the second reference bounding boxes; the target detection model is trained based on the model training method described above, and its specific training process is described in the above embodiments and is not repeated herein.
Specifically, the target detection model comprises a boundary box classification sub-model and a boundary box prediction sub-model; for each second reference bounding box: in the target detection process, the boundary frame prediction sub-model carries out boundary frame prediction based on a second reference boundary frame to obtain a second prediction boundary frame corresponding to the second reference boundary frame; and the boundary frame classification sub-model classifies the second reference boundary frame or the second prediction boundary frame to obtain a second prediction category corresponding to the second reference boundary frame.
In a specific implementation, the bounding box classification sub-model performs category prediction on the second reference bounding box or the second prediction bounding box, and the output result may be the second category prediction result. The second category prediction result includes the prediction probability that the target object outlined by the second reference bounding box or the second prediction bounding box belongs to each candidate category; the candidate category with the maximum prediction probability is the second prediction category. That is, the bounding box classification sub-model predicts that the category of the target object in the image area within the second reference bounding box or the second prediction bounding box is the second prediction category. In addition, since the position information of the second reference bounding box and that of the second prediction bounding box do not deviate greatly, the image features within the two boxes also do not deviate greatly, so the identification of the category of the target object in the image area within either box is not affected. Based on this, for the case where bounding box prediction and category prediction are performed sequentially, the second prediction bounding box can be input into the bounding box classification sub-model for category prediction, so as to obtain the corresponding second category prediction result; that is, the second prediction bounding box is obtained based on the second reference bounding box, and category prediction is then performed on the second prediction bounding box. For the case where bounding box prediction and category prediction are performed synchronously, the second reference bounding box can be input into the bounding box classification sub-model for category prediction, so as to obtain the corresponding second category prediction result.
S506, generating a target detection result of the image to be detected based on the second prediction boundary boxes and the second class prediction results corresponding to the second reference boundary boxes.
Specifically, based on the second prediction bounding boxes and the second prediction categories corresponding to the second reference bounding boxes, the number of target objects contained in the image to be detected and the category to which each target object belongs can be determined, for example, the image to be detected contains a cat, a dog and a pedestrian.
In a specific implementation, the above target detection model includes a bounding box predictor sub-model and a bounding box classification sub-model. As shown in fig. 6, which provides a schematic diagram of the specific implementation principle of the target detection process, the process specifically includes the following steps (a code sketch of this pipeline is given after the last step):
target region extraction is performed on the image to be detected by using the preset region of interest extraction model, to obtain P anchor boxes;
n anchor boxes are randomly sampled from the P anchor boxes as the second reference bounding boxes;
for each second reference bounding box, the bounding box predictor sub-model performs bounding box prediction based on the second reference bounding box to obtain a second prediction bounding box, and the bounding box classification sub-model performs category prediction on the second prediction bounding box to obtain a second prediction category;
and a target detection result of the image to be detected is generated based on the second prediction bounding boxes and second prediction categories respectively corresponding to the second reference bounding boxes.
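A sketch of this inference pipeline follows; all callables are hypothetical stand-ins for the preset region of interest extraction model and the two sub-models:

```python
import random

def detect(image, roi_extractor, predictor, classifier, n):
    """Target detection following the fig. 6 pipeline (a sketch)."""
    anchors = roi_extractor(image)            # P anchor boxes
    refs = random.sample(anchors, n)          # n second reference bounding boxes
    results = []
    for ref in refs:
        pred_box = predictor(ref)             # second prediction bounding box
        probs = classifier(pred_box)          # second class prediction result
        pred_cls = max(probs, key=probs.get)  # second prediction category
        results.append((pred_box, pred_cls))
    return results                            # target detection result
```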
It should be noted that the target detection model obtained by training with the above target detection model training method can be applied to any specific application scenario in which target detection needs to be performed on an image to be detected. The image to be detected may be acquired by an image acquisition device disposed at a certain site. The corresponding target detection apparatus may belong to the image acquisition device, and may specifically be an image processing component within it, which receives the image to be detected transmitted by the acquisition component and performs target detection on it; the target detection apparatus may also be a separate target detection device independent of the image acquisition device, which receives the image to be detected from the image acquisition device and performs target detection on it.
Specifically, as one application scenario of target detection, the image to be detected may be acquired by an image acquisition device disposed at the entrance of a public place (such as a mall entrance, a subway entrance, a scenic spot entrance, or a performance site entrance), and the target object to be detected in the image is a target user entering the public place. The target detection model is used to perform target detection on the image to be detected, so as to outline second prediction bounding boxes containing the target users entering the public place and to determine the second prediction category corresponding to each second prediction bounding box (i.e. the category to which the contained target user belongs, such as at least one of age, gender, height, and occupation), thereby obtaining the target detection result of the image to be detected. A user group identification result (such as the flow of people entering the public place, or the attributes of the user group entering it) is then determined based on the target detection result, and corresponding business processing is performed based on the user group identification result (such as automatically triggering an entry-limit prompting operation, or pushing information to target users). The higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result it outputs for the image to be detected, and therefore the higher the accuracy of the business processing triggered based on that result.
As another example, the image to be detected may be acquired by image acquisition devices disposed at monitoring points in a cultivation base, and the target object to be detected in the image is a target cultivation object at the monitoring point. The target detection model is used to perform target detection on the image to be detected, so as to outline second prediction bounding boxes containing the target cultivation objects and to determine the second prediction category corresponding to each second prediction bounding box (i.e. the category to which the contained target cultivation object belongs, such as at least one of living state and body size), thereby obtaining the target detection result of the image to be detected. A cultivation object group identification result (such as the survival rate or the growth rate of the target cultivation objects at the monitoring point) is then determined based on the target detection result, and a corresponding control operation is performed based on that result (such as automatically issuing alarm prompt information if a decrease in survival rate is detected, or automatically increasing the feeding amount or feeding frequency if a decrease in growth rate is detected). The higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result it outputs, and therefore the higher the accuracy of the control operation triggered based on that result.
In the target detection process, a plurality of alternative bounding boxes are first extracted using the preset region of interest extraction model, and a third specified number of them are randomly sampled as the second reference bounding boxes. For each second reference bounding box, the bounding box predictor sub-model performs bounding box prediction based on the second reference bounding box to obtain a second prediction bounding box, and the classification sub-model performs category prediction on the second prediction bounding box to obtain a second prediction category; the target detection result of the image to be detected is then generated based on the second prediction bounding boxes and second prediction categories respectively corresponding to the second reference bounding boxes. Since, in the model training stage, the bounding box predictor sub-model predicts the first prediction bounding box based on the first reference bounding box, and the target detection model to be trained is driven, based on the first prediction bounding box and the corresponding actual bounding box, to continuously learn the bounding box distribution, the predicted first prediction bounding box becomes more similar to the corresponding actual bounding box, improving the bounding box prediction accuracy, model generalization, and data migration capability of the trained target detection model. Moreover, the comparison result set used for determining the bounding box regression loss value includes not only the first comparison result representing the degree of bounding box distribution similarity but also the second comparison result representing the degree of bounding box coordinate coincidence, so the bounding box regression loss value combines the regression loss from the coarse-grained distribution-similarity comparison dimension with the regression loss from the fine-grained coordinate-coincidence comparison dimension, which improves the accuracy of the bounding box regression loss value and, further, of the model parameters updated based on it.
It should be noted that this embodiment is based on the same inventive concept as the previous embodiments, so its specific implementation may refer to the implementation of the foregoing model training method; repeated description is omitted.
Corresponding to the model training methods described in fig. 1 to fig. 4b, and based on the same technical concept, an embodiment of the present application further provides a model training apparatus. Fig. 7 is a schematic diagram of the module composition of the model training apparatus provided in an embodiment of the present application; the apparatus is used to perform the model training methods described in fig. 1 to fig. 4b. As shown in fig. 7, the apparatus includes:
a bounding box acquisition module 702 configured to acquire a first subset of bounding boxes from a first set of alternative bounding boxes, and to acquire actual bounding boxes corresponding to respective first reference bounding boxes in the first subset of bounding boxes; the first boundary box subset comprises a first appointed number of first reference boundary boxes, and the first alternative boundary box set is obtained by extracting a target region from a sample image data set by utilizing a preset region of interest extraction model;
the model training module 704 is configured to input the first reference bounding box and the actual bounding box into a target detection model to be trained to perform model iterative training until a model iterative training result meets a preset model iterative training termination condition, and obtain a trained target detection model; wherein the target detection model comprises a bounding box predictor model; the specific implementation mode of each model training is as follows:
For each of the first reference bounding boxes: the boundary frame prediction sub-model carries out boundary frame prediction based on the first reference boundary frame to obtain a first prediction boundary frame; generating a boundary frame comparison result set based on the actual boundary frame corresponding to the first reference boundary frame and the first prediction boundary frame corresponding to the first reference boundary frame; the boundary box comparison result set comprises a first comparison result representing the distribution similarity degree of the boundary box and a second comparison result representing the coordinate coincidence degree of the boundary box; determining a bounding box regression loss value based on a first comparison result and a second comparison result respectively corresponding to the first reference bounding box in the first bounding box subset; and updating parameters of the boundary box predictor model based on the boundary box regression loss value.
In the model training apparatus of the embodiments of this application, in the model training stage, the bounding box predictor sub-model performs prediction based on the first reference bounding box to obtain the first prediction bounding box, and the target detection model to be trained is then driven, based on the first prediction bounding box and the corresponding actual bounding box, to continuously learn the bounding box distribution, so that the predicted first prediction bounding box becomes more similar to the corresponding actual bounding box; this improves the bounding box prediction accuracy, model generalization, and data migration capability of the trained target detection model. The comparison result set used for determining the bounding box regression loss value includes not only the first comparison result representing the degree of bounding box distribution similarity but also the second comparison result representing the degree of bounding box coordinate coincidence, and the bounding box regression loss value is obtained based on the first and second comparison results respectively corresponding to each first reference bounding box; it therefore combines the regression loss from the coarse-grained distribution-similarity comparison dimension with the regression loss from the fine-grained coordinate-coincidence comparison dimension, which improves the accuracy of the bounding box regression loss value and, further, of the model parameters updated based on it.
It should be noted that the model training apparatus embodiments and the model training method embodiments of the present application are based on the same inventive concept; for specific implementation details, reference may be made to the corresponding model training method, and repeated details are not described again.
Corresponding to the target detection methods described in fig. 5 to fig. 6, and based on the same technical concept, an embodiment of the present application further provides a target detection apparatus. Fig. 8 is a schematic block diagram of this apparatus, which is configured to perform the target detection method described in fig. 5 to fig. 6 and, as shown in fig. 8, comprises:
a bounding box acquisition module 802 configured to acquire, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected; the second bounding box subset comprises a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model;
a target detection module 804 configured to input the second reference bounding boxes into a target detection model for target detection, obtaining the second prediction bounding boxes and second class prediction results respectively corresponding to the second reference bounding boxes; and
a detection result generation module 806 configured to generate a target detection result of the image to be detected based on the second prediction bounding box and the second class prediction result corresponding to each second reference bounding box.
In the target detection process, the preset region-of-interest extraction model first extracts a set of candidate bounding boxes, and a third specified number of them are randomly sampled as the second reference bounding boxes. For each second reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on that reference box to obtain a second prediction bounding box, and the classification sub-model performs class prediction on the second prediction bounding box to obtain a second prediction class; a target detection result of the image to be detected is then generated based on the second prediction bounding box and second prediction class corresponding to each second reference bounding box. In the model training stage, the bounding box prediction sub-model predicts a first prediction bounding box from each first reference bounding box, and the first prediction bounding boxes together with the corresponding actual bounding boxes drive the target detection model under training to continually learn the bounding box distribution, so that the predicted boxes come ever closer to the actual boxes; this improves the bounding box prediction accuracy, generalization, and data-migration capability of the trained target detection model. Moreover, the comparison result set used to determine the bounding box regression loss value contains both a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence, and the loss value is computed from the first and second comparison results of every first reference bounding box; the regression loss therefore combines a coarse-grained term from the distribution-similarity comparison dimension with a fine-grained term from the coordinate-coincidence comparison dimension, which improves the accuracy of the loss value and, in turn, of the model parameters updated from it.
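For illustration, a hedged sketch of this detection flow follows; `roi_extractor`, `detector.box_predictor`, `detector.classifier`, and the default sample count are assumed names and values, not identifiers from the embodiment.

```python
import torch

def detect(image, roi_extractor, detector, third_specified_number=512):
    # Extract the second candidate bounding box set with the preset
    # region-of-interest extraction model.
    candidates = roi_extractor(image)                      # (M, 4) candidate boxes

    # Randomly sample the third specified number of candidates as the
    # second reference bounding boxes.
    idx = torch.randperm(candidates.size(0))[:third_specified_number]
    ref_boxes = candidates[idx]

    # The bounding box prediction sub-model refines each reference box into a
    # second prediction bounding box; the classification sub-model then
    # predicts a class for each predicted box.
    pred_boxes = detector.box_predictor(ref_boxes)
    class_logits = detector.classifier(pred_boxes)

    # Assemble the target detection result for the image to be detected.
    scores, labels = class_logits.softmax(dim=1).max(dim=1)
    return pred_boxes, labels, scores
```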
It should be noted that the target detection apparatus embodiments and the target detection method embodiments of the present application are based on the same inventive concept; for specific implementation details, reference may be made to the corresponding target detection method, and repeated details are not described again.
Further, corresponding to the methods shown in fig. 1 to fig. 4b and based on the same technical concept, an embodiment of the present application further provides a computer device, shown in fig. 9, configured to perform the model training method or the target detection method.
Computer devices may vary widely in configuration or performance, and may include one or more processors 901 and a memory 902, where the memory 902 may be transient or persistent storage and may store one or more application programs or data. An application program stored in the memory 902 may include one or more modules (not shown), each comprising a series of computer-executable instructions for the computer device. The processor 901 may communicate with the memory 902 to execute that series of computer-executable instructions on the computer device. The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, one or more keyboards 906, and the like.
In a particular embodiment, the computer device includes a memory and one or more programs stored therein, where a program may include one or more modules, each comprising a series of computer-executable instructions for the computer device, and is configured to be executed by the one or more processors; the one or more programs include computer-executable instructions for:
acquiring a first bounding box subset from a first candidate bounding box set, and acquiring the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, wherein the first bounding box subset comprises a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set with a preset region-of-interest extraction model; and
inputting the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative training until the iterative training result meets a preset termination condition, to obtain a trained target detection model, wherein the target detection model comprises a bounding box prediction sub-model and each training round proceeds as follows:
for each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; a bounding box comparison result set is generated based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box, the set comprising a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence; a bounding box regression loss value is determined based on the first and second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
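The second comparison result above is an intersection-over-union (IoU) loss; claim 6 below further measures a second IoU loss against the other prediction boxes in the subset. The following minimal sketch follows that description, assuming (x1, y1, x2, y2) boxes; how the two IoU losses are combined, and that overlap with the other boxes is penalized rather than rewarded, are assumptions.

```python
import torch

def iou(a, b):
    """Elementwise IoU between aligned (N, 4) boxes in (x1, y1, x2, y2) form."""
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2:] - a[:, :2]).clamp(min=0).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).clamp(min=0).prod(dim=1)
    return inter / (area_a + area_b - inter + 1e-7)

def second_comparison(actual_boxes, pred_boxes, i):
    """Second comparison result for the i-th first reference bounding box.

    actual_boxes / pred_boxes: (N, 4) actual and first prediction boxes.
    """
    # First IoU loss: the box's own prediction should coincide with its
    # actual bounding box.
    first_loss = 1.0 - iou(actual_boxes[i:i + 1], pred_boxes[i:i + 1])

    # Contrast bounding box set (claim 6): the other first prediction boxes.
    others = torch.cat([pred_boxes[:i], pred_boxes[i + 1:]])
    if others.shape[0] == 0:
        return first_loss.squeeze(0)

    # Second IoU loss against each other prediction box; the additive
    # combination below is an assumed merging rule.
    second_loss = iou(actual_boxes[i:i + 1].expand_as(others), others).mean()
    return (first_loss + second_loss).squeeze(0)
```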
In another particular embodiment, the computer device includes a memory and one or more programs stored therein, where a program may include one or more modules, each comprising a series of computer-executable instructions for the computer device, and is configured to be executed by the one or more processors; the one or more programs include computer-executable instructions for:
acquiring, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, wherein the second bounding box subset comprises a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model; inputting the second reference bounding boxes into a target detection model for target detection, to obtain the second prediction bounding boxes and second class prediction results respectively corresponding to the second reference bounding boxes; and generating a target detection result of the image to be detected based on the second prediction bounding box and the second class prediction result corresponding to each second reference bounding box.
In the computer device of this embodiment of the present application, during the model training stage, the bounding box prediction sub-model predicts a first prediction bounding box from each first reference bounding box, and the first prediction bounding boxes together with the corresponding actual bounding boxes drive the target detection model under training to continually learn the bounding box distribution, so that the predicted boxes come ever closer to the actual boxes; this improves the bounding box prediction accuracy, generalization, and data-migration capability of the trained target detection model. Moreover, the comparison result set used to determine the bounding box regression loss value contains both a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence, and the loss value is computed from the first and second comparison results of every first reference bounding box; the regression loss therefore combines a coarse-grained term from the distribution-similarity comparison dimension with a fine-grained term from the coordinate-coincidence comparison dimension, which improves the accuracy of the loss value and, in turn, of the model parameters updated from it.
It should be noted that the computer device embodiments and the model training method embodiments of the present application are based on the same inventive concept; for specific implementation details, reference may be made to the corresponding model training method, and repeated details are not described again.
Further, corresponding to the methods shown in fig. 1 to fig. 4b and based on the same technical concept, an embodiment of the present application further provides a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions it stores, when executed by a processor, implement the following flow:
acquiring a first bounding box subset from a first candidate bounding box set, and acquiring the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, wherein the first bounding box subset comprises a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set with a preset region-of-interest extraction model; and inputting the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative training until the iterative training result meets a preset termination condition, to obtain a trained target detection model, wherein the target detection model comprises a bounding box prediction sub-model and each training round proceeds as follows:
for each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; a bounding box comparison result set is generated based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box, the set comprising a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence; a bounding box regression loss value is determined based on the first and second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
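Claims 3 and 7 below refine how the per-box sub-regression losses form the bounding box regression loss, gating each box on its class matching result. A hedged sketch of that aggregation follows; the equal weighting of the two components and the averaging over the subset are assumptions.

```python
import torch

def bbox_regression_loss(first_cmp, second_cmp, pred_classes, actual_classes):
    """first_cmp / second_cmp: (N,) per-box comparison results (e.g. a KL
    divergence value and an IoU loss); pred_classes / actual_classes: (N,)
    integer class labels.
    """
    # Class matching result (claim 7): a box whose first prediction class
    # does not match its actual class contributes a sub-regression loss of zero.
    match = (pred_classes == actual_classes).float()

    # Per-box sub-regression loss from the two regression loss components;
    # equal weighting is an assumption.
    sub_losses = match * (first_cmp + second_cmp)

    # Aggregation over the first bounding box subset by averaging is an
    # assumption; the text only requires the loss be based on the sub-losses.
    return sub_losses.mean()
```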
In another specific embodiment, the storage medium may likewise be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions it stores, when executed by a processor, implement the following flow:
acquiring, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, wherein the second bounding box subset comprises a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model; inputting the second reference bounding boxes into a target detection model for target detection, to obtain the second prediction bounding boxes and second class prediction results respectively corresponding to the second reference bounding boxes; and generating a target detection result of the image to be detected based on the second prediction bounding box and the second class prediction result corresponding to each second reference bounding box.
When the computer-executable instructions stored in the storage medium of this embodiment of the present application are executed by a processor, during the model training stage the bounding box prediction sub-model predicts a first prediction bounding box from each first reference bounding box, and the first prediction bounding boxes together with the corresponding actual bounding boxes drive the target detection model under training to continually learn the bounding box distribution, so that the predicted boxes come ever closer to the actual boxes; this improves the bounding box prediction accuracy, generalization, and data-migration capability of the trained target detection model. Moreover, the comparison result set used to determine the bounding box regression loss value contains both a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence, and the loss value is computed from the first and second comparison results of every first reference bounding box; the regression loss therefore combines a coarse-grained term from the distribution-similarity comparison dimension with a fine-grained term from the coordinate-coincidence comparison dimension, which improves the accuracy of the loss value and, in turn, of the model parameters updated from it.
It should be noted that the storage medium embodiments and the model training method embodiments of the present application are based on the same inventive concept; for specific implementation details, reference may be made to the corresponding model training method, and repeated details are not described again.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed via the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed thereon to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

Embodiments of the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices linked through a communications network; in such environments, program modules may be located in both local and remote computer storage media, including memory storage devices.

The embodiments in this application are described in a progressive manner; identical and similar parts of the embodiments refer to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments; relevant parts may be found in the corresponding description of the method embodiments.

The foregoing description is by way of example only and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of this document are intended to be included within the scope of its claims.

Claims (14)

1. A model training method, the method comprising:
acquiring a first bounding box subset from a first candidate bounding box set, and acquiring the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, wherein the first bounding box subset comprises a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set with a preset region-of-interest extraction model;
inputting the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative training until the iterative training result meets a preset termination condition, to obtain a trained target detection model, wherein the target detection model comprises a bounding box prediction sub-model and each training round comprises:
for each first reference bounding box: performing, by the bounding box prediction sub-model, bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; and generating a bounding box comparison result set based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box, wherein the bounding box comparison result set comprises a first comparison result characterizing a degree of bounding box distribution similarity and a second comparison result characterizing a degree of bounding box coordinate coincidence;
determining a bounding box regression loss value based on the first and second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset; and
updating parameters of the bounding box prediction sub-model based on the bounding box regression loss value.
2. The method of claim 1, wherein generating the bounding box comparison result set based on the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box comprises:
calculating a relative entropy (KL divergence) based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box to obtain the first comparison result; and calculating a bounding box intersection-over-union (IoU) loss based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box to obtain the second comparison result.
3. The method of claim 2, wherein determining the bounding box regression loss value based on the first and second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset comprises:
determining the sub-regression loss values respectively corresponding to the first reference bounding boxes in the first bounding box subset, wherein the sub-regression loss value corresponding to each first reference bounding box is determined based on target information comprising one or a combination of: the degree of bounding box distribution similarity characterized by the first comparison result corresponding to that first reference bounding box, and the degree of bounding box coordinate coincidence characterized by the second comparison result corresponding to that first reference bounding box; and
determining the bounding box regression loss value based on the sub-regression loss values respectively corresponding to the first reference bounding boxes in the first bounding box subset.
4. The method according to claim 2, wherein calculating the relative entropy (KL divergence) based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box to obtain the first comparison result comprises:
determining a first probability distribution of the actual bounding box corresponding to the first reference bounding box, and a second probability distribution of the first prediction bounding box corresponding to the first reference bounding box;
calculating a KL divergence value between the first probability distribution and the second probability distribution, the KL divergence value characterizing the degree of distribution similarity between the first prediction bounding box and the actual bounding box; and
determining the first comparison result corresponding to the first reference bounding box based on the KL divergence value.
5. The method according to claim 2, wherein calculating the bounding box IoU loss based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box to obtain the second comparison result comprises:
performing a bounding box IoU loss calculation on the actual bounding box corresponding to the first reference bounding box and the first prediction bounding box corresponding to the first reference bounding box to obtain a first IoU loss; and
determining the second comparison result corresponding to the first reference bounding box based on the first IoU loss, wherein the bounding box IoU loss characterizes the degree of bounding box coordinate coincidence.
6. The method of claim 5, wherein determining the second comparison result corresponding to the first reference bounding box based on the first IoU loss comprises:
determining a contrast bounding box set among the first prediction bounding boxes respectively corresponding to the first specified number of first reference bounding boxes, wherein the contrast bounding box set comprises the other first prediction bounding boxes excluding the first prediction bounding box corresponding to the first reference bounding box, or the other first prediction bounding boxes that do not contain the target object outlined by the first reference bounding box;
performing bounding box IoU loss calculations on the actual bounding box corresponding to the first reference bounding box and each of the other first prediction bounding boxes to obtain a second IoU loss; and
determining the second comparison result corresponding to the first reference bounding box based on the first IoU loss and the second IoU loss.
7. The method of claim 3, wherein the target detection model further comprises a bounding box classification sub-model, and each training round further comprises: classifying, by the bounding box classification sub-model, the first reference bounding box or the first prediction bounding box to obtain a first class prediction result;
the target information further comprises a class matching result between the first prediction class indicated by the first class prediction result corresponding to the first reference bounding box and the actual class of the first reference bounding box, wherein if the class matching result is that the first prediction class does not match the actual class, the sub-regression loss value corresponding to the first reference bounding box is zero; and if the class matching result is that the first prediction class matches the actual class, the sub-regression loss value corresponding to the first reference bounding box is determined based on at least one of a first regression loss component corresponding to the degree of bounding box distribution similarity and a second regression loss component corresponding to the degree of bounding box coordinate coincidence.
8. The method according to claim 1, wherein the method further comprises:
inputting the sample image data set into the preset region-of-interest extraction model for region-of-interest extraction to obtain the first candidate bounding box set, wherein the first candidate bounding box set comprises a second specified number of candidate bounding boxes, the second specified number being greater than the first specified number; and
wherein acquiring the first bounding box subset from the first candidate bounding box set comprises: randomly selecting the first specified number of candidate bounding boxes from the second specified number of candidate bounding boxes as the first reference bounding boxes, to obtain the first bounding box subset.
9. A target detection method, the method comprising:
acquiring, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, wherein the second bounding box subset comprises a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model;
inputting the second reference bounding boxes into a target detection model for target detection, to obtain the second prediction bounding boxes and second class prediction results respectively corresponding to the second reference bounding boxes; and
generating a target detection result of the image to be detected based on the second prediction bounding box and the second class prediction result corresponding to each second reference bounding box.
10. The method of claim 9, wherein the target detection model comprises a bounding box prediction sub-model and a bounding box classification sub-model; and
for each second reference bounding box, in the target detection process: the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box to obtain the second prediction bounding box corresponding to the second reference bounding box; and the bounding box classification sub-model classifies the second reference bounding box or the second prediction bounding box to obtain the second class prediction result corresponding to the second reference bounding box.
11. A model training apparatus, the apparatus comprising:
a bounding box acquisition module configured to acquire a first bounding box subset from a first candidate bounding box set and to acquire the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, wherein the first bounding box subset comprises a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set with a preset region-of-interest extraction model; and
a model training module configured to input the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative training until the iterative training result meets a preset termination condition, to obtain a trained target detection model, wherein the target detection model comprises a bounding box prediction sub-model and each training round proceeds as follows:
for each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first prediction bounding box; a bounding box comparison result set is generated based on the actual bounding box and the first prediction bounding box corresponding to the first reference bounding box, the set comprising a first comparison result characterizing the degree of bounding box distribution similarity and a second comparison result characterizing the degree of bounding box coordinate coincidence; a bounding box regression loss value is determined based on the first and second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
12. A target detection apparatus, the apparatus comprising:
a bounding box acquisition module configured to acquire, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, wherein the second bounding box subset comprises a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model;
a target detection module configured to input the second reference bounding boxes into a target detection model for target detection, obtaining the second prediction bounding boxes and second class prediction results respectively corresponding to the second reference bounding boxes; and
a detection result generation module configured to generate a target detection result of the image to be detected based on the second prediction bounding box and the second class prediction result corresponding to each second reference bounding box.
13. A computer device, the device comprising:
a processor; and
a memory arranged to store computer-executable instructions which, when executed by the processor, cause the processor to perform the method of any one of claims 1-8 or any one of claims 9-10.
14. A storage medium storing computer-executable instructions which, when executed, cause a computer to perform the method of any one of claims 1-8 or any one of claims 9-10.
CN202210829603.7A 2022-07-15 2022-07-15 Model training method, target detection method and device Pending CN117437394A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210829603.7A CN117437394A (en) 2022-07-15 2022-07-15 Model training method, target detection method and device
PCT/CN2023/102175 WO2024012179A1 (en) 2022-07-15 2023-06-25 Model training method, target detection method and apparatuses

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210829603.7A CN117437394A (en) 2022-07-15 2022-07-15 Model training method, target detection method and device

Publications (1)

Publication Number Publication Date
CN117437394A true CN117437394A (en) 2024-01-23

Family

ID=89535495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210829603.7A Pending CN117437394A (en) 2022-07-15 2022-07-15 Model training method, target detection method and device

Country Status (2)

Country Link
CN (1) CN117437394A (en)
WO (1) WO2024012179A1 (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117831B (en) * 2018-09-30 2021-10-12 北京字节跳动网络技术有限公司 Training method and device of object detection network
CN111652216B (en) * 2020-06-03 2023-04-07 北京工商大学 Multi-scale target detection model method based on metric learning
CN112329873A (en) * 2020-11-12 2021-02-05 苏州挚途科技有限公司 Training method of target detection model, target detection method and device
CN113780270B (en) * 2021-03-23 2024-06-21 京东鲲鹏(江苏)科技有限公司 Target detection method and device
US20220004935A1 (en) * 2021-09-22 2022-01-06 Intel Corporation Ensemble learning for deep feature defect detection
CN114708185A (en) * 2021-10-28 2022-07-05 中国科学院自动化研究所 Target detection method, system and equipment based on big data enabling and model flow
CN114708462A (en) * 2022-04-29 2022-07-05 人民中科(北京)智能技术有限公司 Method, system, device and storage medium for generating detection model for multi-data training

Also Published As

Publication number Publication date
WO2024012179A1 (en) 2024-01-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination