CN117437396A - Target detection model training method, target detection method and target detection device - Google Patents

Target detection model training method, target detection method and target detection device

Info

Publication number
CN117437396A
Authority
CN
China
Prior art keywords
model
prediction
original
boundary
box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210831398.8A
Other languages
Chinese (zh)
Inventor
吕永春
朱徽
王钰
周迅溢
曾定衡
蒋宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210831398.8A priority Critical patent/CN117437396A/en
Publication of CN117437396A publication Critical patent/CN117437396A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

In the model training stage, the model to be trained is driven to continuously learn bounding box distribution and target object category recognition from the actual bounding boxes, the first original bounding boxes and the actual categories, so that the prediction results output by the generation sub-model become more realistic and the target detection accuracy, generalization and data migration capability of the model are improved. A loss value is determined from the discrimination result set output by the discrimination sub-model, and the model parameters are iteratively updated over multiple rounds based on this loss value. Because the discrimination result set contains not only a discrimination result characterizing the degree of bounding box distribution similarity and a discrimination result characterizing the degree of category similarity between the first prediction category and the actual category, but also a discrimination result characterizing the degree of bounding box coordinate coincidence, the loss value is more accurate, which ensures the accuracy of both target object position marking and target object classification during target detection.

Description

Target detection model training method, target detection method and target detection device
Technical Field
The present disclosure relates to the field of target detection, and in particular, to a target detection model training method, a target detection method, and a target detection device.
Background
At present, with the rapid development of artificial intelligence technology, there is a growing demand for performing target detection on an image with a pre-trained target detection model, that is, for predicting the coordinate information of the bounding box in which each target contained in the image is located, together with its category.
However, the existing training process of a target detection model mainly learns the degree of similarity between the image features inside the predicted bounding box and those inside the actual bounding box. As a result, the trained model parameters are accurate on the preset sample image set but less accurate on images to be detected, so the generalization of the target detection model is poor, and the accuracy of target object position marking and target object classification during target detection in the model application stage is correspondingly low.
Disclosure of Invention
The embodiment of the application aims to provide a target detection model training method, a target detection method and a target detection device, which can improve the target detection accuracy, generalization and data migration of a trained model, thereby realizing the purpose of simultaneously ensuring the accuracy of target object position marks and target object classification during target detection.
In order to achieve the above objective, the embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides a method for training a target detection model, where the method includes:
acquiring N first original boundary frames, and acquiring an actual boundary frame corresponding to each first original boundary frame and an actual category corresponding to the actual boundary frame; the first original boundary box is obtained by extracting a target region from a preset sample image set by using a preset region of interest extraction model, and N is a positive integer greater than 1;
inputting the first original boundary box, the actual boundary box and the actual category into a model to be trained for model iterative training until the current model iterative training result meets the model iterative training termination condition, and obtaining a target detection model;
the model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first original bounding boxes: the generation sub-model performs prediction based on the first original bounding box to obtain a first prediction bounding box and a first prediction category; the discrimination sub-model generates a discrimination result set based on the actual bounding box corresponding to the first original bounding box, the first prediction bounding box corresponding to the first original bounding box, the actual category and the first prediction category; the discrimination result set comprises a first discrimination result, a second discrimination result and a third discrimination result, wherein the first discrimination result characterizes the degree of bounding box distribution similarity between the first prediction bounding box and the actual bounding box under the condition that a preset constraint is met, the preset constraint being that the category of the target object in the first prediction bounding box predicted by the generation sub-model is a target category matched with the actual category, the second discrimination result characterizes the degree of category similarity between the first prediction category corresponding to the first original bounding box and the actual category, and the third discrimination result characterizes the degree of bounding box coordinate coincidence between the first prediction bounding box and the actual bounding box under the condition that the preset constraint is met;
determining a regression classification loss value of the model to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original bounding box;
and updating model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value.
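Purely for orientation, the following Python sketch shows how one round of the training step claimed above could be organized. Everything in it is an assumption introduced for illustration (the tiny MLP shapes, the 4-dimensional box encoding, the non-saturating GAN form of the adversarial terms, the 1 − IoU surrogate for the coordinate-coincidence term, and the names generator, box_disc and cls_disc); it is not the patent's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 3

# generation sub-model: refines a 4-d box encoding and predicts a class
generator = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 4 + NUM_CLASSES))
# discrimination sub-model: one head judges boxes, one head judges class vectors (real vs generated)
box_disc = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
cls_disc = nn.Sequential(nn.Linear(NUM_CLASSES, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(list(box_disc.parameters()) + list(cls_disc.parameters()), lr=1e-4)

def iou(a, b, eps=1e-6):
    """Intersection-over-union of [x1, y1, x2, y2] boxes, row by row."""
    wh = (torch.min(a[:, 2:], b[:, 2:]) - torch.max(a[:, :2], b[:, :2])).clamp_min(0)
    inter = wh[:, 0] * wh[:, 1]
    area = lambda t: (t[:, 2] - t[:, 0]).clamp_min(0) * (t[:, 3] - t[:, 1]).clamp_min(0)
    return inter / (area(a) + area(b) - inter + eps)

def training_round(first_original, actual_boxes, actual_labels):
    out = generator(first_original)
    pred_boxes, pred_probs = out[:, :4], out[:, 4:].softmax(dim=-1)
    one_hot = F.one_hot(actual_labels, NUM_CLASSES).float()
    # preset constraint: count regression terms only where the predicted class matches the actual class
    match = (pred_probs.argmax(dim=-1) == actual_labels).float()

    # discrimination sub-model step: separate real data from generated data as well as possible
    d_loss = (
        F.binary_cross_entropy(box_disc(actual_boxes), torch.ones(len(actual_boxes), 1))
        + F.binary_cross_entropy(box_disc(pred_boxes.detach()), torch.zeros(len(pred_boxes), 1))
        + F.binary_cross_entropy(cls_disc(one_hot), torch.ones(len(one_hot), 1))
        + F.binary_cross_entropy(cls_disc(pred_probs.detach()), torch.zeros(len(pred_probs), 1))
    )
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generation sub-model step: first (distribution), second (category) and third (IoU) components
    first_reg = (match * -box_disc(pred_boxes).clamp_min(1e-6).log().squeeze(-1)).mean()
    cls_comp = (-cls_disc(pred_probs).clamp_min(1e-6).log()).mean()
    second_reg = (match * (1.0 - iou(pred_boxes, actual_boxes))).mean()
    g_loss = first_reg + cls_comp + second_reg      # regression-classification loss value
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return float(d_loss), float(g_loss)

if __name__ == "__main__":
    torch.manual_seed(0)
    boxes = torch.rand(8, 4)
    print(training_round(boxes, boxes + 0.05 * torch.rand(8, 4), torch.randint(0, NUM_CLASSES, (8,))))
```

In this sketch the discrimination sub-model is stepped first on detached predictions and the generation sub-model is then stepped on the combined regression classification loss, mirroring the adversarial relationship described later in the detailed description.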
In a second aspect, an embodiment of the present application provides a target detection method, where the method includes:
obtaining M second original boundary frames; the second original boundary box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
inputting the second original boundary boxes into a target detection model to carry out target detection, and obtaining a second prediction boundary box and a second prediction category corresponding to each second original boundary box;
and generating a target detection result of the image to be detected based on the second prediction boundary box and the second prediction category corresponding to each second original boundary box.
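As a minimal sketch of this second-aspect flow (the roi_extractor and generator callables below are placeholders for the preset region of interest extraction model and the trained target detection model, and the returned score field is an added assumption):

```python
import numpy as np

def detect(image, roi_extractor, generator):
    """Second-aspect flow: extract M second original bounding boxes, refine them
    with the trained model, and assemble the target detection result."""
    second_original_boxes = roi_extractor(image)              # M x 4, [x1, y1, x2, y2]
    detections = []
    for box in second_original_boxes:
        pred_box, pred_class, score = generator(image, box)   # second prediction box / category
        detections.append({"box": pred_box, "category": pred_class, "score": score})
    return detections

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dummy_roi = lambda img: rng.uniform(0, 224, size=(5, 4))
    dummy_gen = lambda img, box: (box + rng.normal(0, 2, size=4), int(rng.integers(0, 3)), float(rng.random()))
    print(detect(np.zeros((224, 224, 3)), dummy_roi, dummy_gen))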
In a third aspect, an embodiment of the present application provides an object detection model training apparatus, where the apparatus includes:
the first bounding box acquisition module is configured to acquire N first original bounding boxes and acquire an actual bounding box corresponding to each first original bounding box and an actual category corresponding to the actual bounding box; the first original boundary box is obtained by extracting a target region from a preset sample image set by using a preset region of interest extraction model, and N is a positive integer greater than 1;
The model training module is configured to input the first original boundary frame, the actual boundary frame and the actual category into a model to be trained for model iterative training until a current model iterative training result meets a model iterative training termination condition to obtain a target detection model;
the model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first original bounding boxes: the generation sub-model predicts based on the first original boundary box to obtain a first prediction boundary box and a first prediction category; the judging sub-model generates a judging result set based on an actual boundary box corresponding to the first original boundary box, a first prediction boundary box corresponding to the first original boundary box, the actual category and the first prediction category; the judging result set comprises a first judging result, a second judging result and a third judging result, wherein the first judging result represents the boundary frame distribution similarity degree of the first prediction boundary frame and the actual boundary frame under the condition that the preset constraint is met, the preset constraint is that the class of the target object in the first prediction boundary frame is predicted by the generating sub-model to be a target class matched with the actual class, the second judging result represents the class similarity degree between the first prediction class and the actual class, and the third judging result represents the boundary frame coordinate coincidence degree of the first prediction boundary frame and the actual boundary frame under the condition that the preset constraint is met; determining regression classification loss values of the models to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original boundary box; and updating model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value.
In a fourth aspect, an embodiment of the present application provides an object detection apparatus, including:
a second bounding box acquisition module configured to acquire M second original bounding boxes; the second original boundary box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
the target detection module is configured to input the second original boundary boxes into a target detection model to carry out target detection, so as to obtain a second prediction boundary box and a second prediction category corresponding to each second original boundary box;
and the detection result generation module is configured to generate a target detection result of the image to be detected based on the second prediction boundary box and the second prediction category corresponding to each second original boundary box.
In a fifth aspect, a computer device provided in an embodiment of the present application, the device includes:
a processor; and a memory arranged to store computer executable instructions which, when executed by the processor, cause the processor to perform the steps of the method described in the first aspect or the second aspect.
In a sixth aspect, embodiments of the present application provide a storage medium storing computer executable instructions that cause a computer to perform the steps of the method as described in the first or second aspect.
It can be seen that, in the embodiments of the present application, in the model training stage, the discrimination sub-model outputs, based on the actual bounding box and the first prediction bounding box obtained from the first original bounding box, a first discrimination result characterizing the degree of bounding box distribution similarity. This drives continuous updating of the model parameters related to bounding box regression, so that the generation sub-model keeps learning the bounding box distribution, the predicted first prediction bounding box becomes closer to the actual bounding box, and the accuracy of the trained target detection model in predicting the position of the target object, as well as its generalization and data migration capability, are improved. Also in the model training stage, the discrimination sub-model outputs, based on the actual category and the first prediction category corresponding to the first original bounding box, a second discrimination result characterizing the degree of category similarity between the first prediction category and the actual category. This causes the generation sub-model to keep learning the target object category of the image region inside the bounding box, so that the predicted first prediction category becomes closer to the actual category; since the model parameters related to category prediction are updated according to the discrimination result of the discrimination sub-model rather than directly according to the prediction category, the target classification accuracy, generalization and data migration capability of the trained target detection model are improved. Furthermore, the discrimination result set output by the discrimination sub-model contains not only the first and second discrimination results but also a third discrimination result characterizing the degree of bounding box coordinate coincidence, which compensates for the bounding box regression loss that arises when the bounding box distributions are similar but the specific positions deviate. The regression classification loss value of the model to be trained is determined from this discrimination result set, and the model parameters of the generation sub-model and the discrimination sub-model are repeatedly updated based on this loss value; because the regression classification loss value obtained from the discrimination result set is more accurate, the model parameters updated from it are more accurate, which ensures the accuracy of both target object position marking and target object classification during target detection.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
Fig. 1 is a schematic flow chart of a training method of a target detection model according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of each model training process in the target detection model training method according to the embodiment of the present application;
FIG. 3 is a schematic diagram of a first implementation principle of the training method of the target detection model according to the embodiment of the present application;
FIG. 4a is a schematic diagram of a second implementation principle of the training method of the target detection model according to the embodiment of the present application;
fig. 4b is a schematic diagram of a third implementation principle of the training method of the target detection model according to the embodiment of the present application;
fig. 5 is a schematic flow chart of a target detection method according to an embodiment of the present application;
fig. 6 is a schematic diagram of an implementation principle of a target detection method according to an embodiment of the present application;
Fig. 7 is a schematic diagram of module composition of a training device for a target detection model according to an embodiment of the present application;
fig. 8 is a schematic block diagram of a target detection apparatus according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to better understand the technical solutions in one or more embodiments of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from one or more embodiments of the present application without inventive effort shall fall within the protection scope of the present application.
It should be noted that, without conflict, one or more embodiments of the present application and features of the embodiments may be combined with each other. Embodiments of the present application will be described in detail below with reference to the accompanying drawings in conjunction with the embodiments.
If the cross-entropy regression loss between the first prediction bounding box and the actual bounding box were calculated directly and the model parameters were iteratively trained on that loss, the trained target detection model would fit the preset sample image set used in the training stage, its generalization would be poor, its cross-data migration capability would be weak, and its bounding box prediction accuracy would tend to be high on the preset sample image set but low on new images to be detected. Therefore, in the model training stage, the discrimination sub-model outputs, based on the actual bounding box and the first prediction bounding box obtained from the first original bounding box, a first discrimination result characterizing the degree of bounding box distribution similarity. This drives continuous updating of the model parameters related to bounding box regression, causes the generation sub-model to keep learning the bounding box distribution, makes the predicted first prediction bounding box closer to the actual bounding box, and improves the accuracy, generalization and data migration capability of the trained target detection model in predicting the position of the target object, so that the model adapts better to the images to be detected.
Similarly, if the cross-entropy classification loss between the first prediction category and the actual category were calculated directly and the model parameters were iteratively trained on that loss, the trained model would again tend to classify target objects accurately on the preset sample image set but poorly on new images to be detected, with poor generalization and cross-data migration capability. Therefore, in the model training stage, the discrimination sub-model outputs, based on the actual category and the first prediction category corresponding to the first original bounding box, a second discrimination result characterizing the degree of category similarity between the two, so that the predicted first prediction category becomes closer to the actual category; since the classification-related model parameters are updated according to the discrimination result of the discrimination sub-model rather than directly according to the prediction category, the target classification accuracy, generalization and data migration capability of the trained model are improved.
Further, if the model regression loss were determined only from the coarse-grained comparison dimension of bounding box distribution similarity, precise position learning of the bounding box could not be taken into account; if it were determined only from the fine-grained comparison dimension of bounding box coordinate coincidence, the edge ambiguity of the bounding box could not be handled. On this basis, the model regression loss is determined by combining the coarse-grained comparison dimension of bounding box distribution similarity with the fine-grained comparison dimension of bounding box coordinate coincidence: the discrimination result set output by the discrimination sub-model includes not only the first and second discrimination results but also a third discrimination result characterizing the degree of bounding box coordinate coincidence, which compensates for the bounding box regression loss caused when the bounding box distributions are similar but the specific positions deviate. The regression classification loss value of the model to be trained is then determined from this discrimination result set, and the model parameters of the generation sub-model and the discrimination sub-model are iteratively updated over multiple rounds based on this loss value. Because the regression classification loss value obtained from the discrimination result set is more accurate, the updated model parameters are more accurate, which ensures the accuracy of both target object position marking and target object classification during target detection.
Fig. 1 is a schematic flow diagram of a method for training a target detection model according to one or more embodiments of the present application. The method in fig. 1 can be performed by an electronic device provided with a target detection model training apparatus, and the electronic device may be a terminal device or a designated server; the hardware apparatus for training the target detection model (i.e., the electronic device provided with the target detection model training apparatus) and the hardware apparatus for target detection (i.e., the electronic device provided with the target detection apparatus) may be the same or different. Specifically, as shown in fig. 1, the training process of the target detection model at least includes the following steps:
s102, acquiring N first original boundary frames, and acquiring an actual boundary frame corresponding to each first original boundary frame and an actual category corresponding to the actual boundary frame; the first original boundary box is obtained by extracting a target area from a preset sample image set by using a preset interested area extraction model, and N is a positive integer greater than 1;
specifically, for the determining process of the N first original bounding boxes, for each round of model training, a step of performing target region extraction on a preset sample image set by using a preset region of interest extraction model is performed once, so as to obtain N first original bounding boxes; the step of extracting the target region from the preset sample image set by using the preset region of interest extraction model may be performed in advance, and then, for each round of model training, N first original bounding boxes are randomly sampled from a large number of candidate bounding boxes extracted in advance.
Specifically, the preset sample image set may include a plurality of sample target objects, and each sample target object may correspond to a plurality of first original bounding boxes, that is, the N first original bounding boxes include at least one first original bounding box corresponding to each sample target object.
Specifically, before acquiring the N first original bounding boxes in step S102, the method further includes: inputting the preset sample image set into the preset region of interest extraction model for region of interest extraction to obtain X candidate bounding boxes, where X ≥ N, X is a positive integer greater than 1, N is a first preset number and X is a second preset number. For the case X = N, i.e., the second preset number equals the first preset number, region of interest extraction is performed on the plurality of sample image data in the preset sample image set with the preset region of interest extraction model for each round of model training, so as to obtain the first preset number of first original bounding boxes. For the case X > N, i.e., the second preset number is larger than the first preset number, the first preset number of first original bounding boxes is obtained for each round of model training by random sampling from the second preset number of candidate bounding boxes.
One of the purposes of the model training process is to make the model continuously learn the bounding box distribution through iterative training of the model parameters, thereby improving the generalization and data migration capability of the model (that is, the model parameters do not depend on the sample data used during training and adapt better to the data to be recognized in the model application stage). To prompt the model to be trained to learn the bounding box distribution well, the extracted first original bounding boxes fed into the model must be ensured to follow a certain probability distribution (such as a Gaussian distribution or a Cauchy distribution); the larger the number N of anchor boxes extracted with the preset region of interest extraction model, the better the bounding box distribution learning of the model to be trained. However, if X anchor boxes were extracted in real time as first original bounding boxes with the preset region of interest extraction model (such as a region of interest extraction algorithm, ROI) for every round of training, the data processing load of feeding them into the model to be trained would be large and the hardware requirements would be high.
In a specific implementation, it is therefore preferable to extract X anchor boxes in advance with the preset region of interest extraction model, and then, in each round of model training, randomly sample N anchor boxes from the X anchor boxes and input them into the model to be trained as first original bounding boxes. This keeps the data processing load of each training round manageable while still allowing the model to learn the bounding box distribution well, i.e., it balances the data processing load during training against bounding box distribution learning. On this basis, the second preset number X is greater than the first preset number N, and correspondingly, acquiring the N first original bounding boxes in step S102 specifically includes: randomly selecting N candidate bounding boxes from the X candidate bounding boxes as the first original bounding boxes. That is, region of interest extraction is performed in advance on the plurality of sample image data in the preset sample image set with the preset region of interest extraction model to obtain X candidate bounding boxes; then, for each round of model training, N first original bounding boxes are randomly sampled from the X candidate bounding boxes.
That is, in a preferred embodiment, X anchor boxes (i.e., the second preset number of candidate bounding boxes) are extracted in advance, then N anchor boxes (i.e., the first preset number of first original bounding boxes) are randomly sampled from the X anchor boxes for each round of model training, after which the following step S104 is performed.
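A minimal sketch of this pre-extract-then-sample strategy follows; the function names and the [x1, y1, x2, y2] box encoding are assumptions introduced for illustration only, not the patent's implementation.

```python
import numpy as np

def pre_extract_candidate_boxes(sample_images, roi_extractor, x_total):
    """Run the preset region-of-interest extractor once, offline, to obtain
    X candidate bounding boxes (each as [x1, y1, x2, y2])."""
    boxes = []
    for image in sample_images:
        boxes.extend(roi_extractor(image))
    return np.asarray(boxes[:x_total], dtype=np.float32)

def sample_first_original_boxes(candidate_boxes, n_per_round, rng=None):
    """Randomly sample N first original bounding boxes from the X candidates
    for one round of model training (X > N)."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(candidate_boxes), size=n_per_round, replace=False)
    return candidate_boxes[idx]

# usage sketch: `dummy_roi` stands in for the preset region of interest extraction model
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dummy_roi = lambda img: rng.uniform(0, 224, size=(16, 4)).tolist()
    candidates = pre_extract_candidate_boxes([None] * 64, dummy_roi, x_total=1024)
    first_original = sample_first_original_boxes(candidates, n_per_round=128, rng=rng)
    print(first_original.shape)  # (128, 4)
```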
S104, inputting the first original bounding box, the actual bounding box and the actual category into the model to be trained for iterative model training until the current model iterative training result meets the model iterative training termination condition, so as to obtain the target detection model; the model iterative training termination condition may include: the number of completed training rounds equals the total number of training rounds, the model loss function converges, or equilibrium is reached between the generation sub-model and the discrimination sub-model;
the specific implementation process of the model iterative training in the step S104 is described below, and since the processing process of each model training in the model iterative training process is the same, any model training is taken as an example for detailed description. Specifically, if the model to be trained comprises a generation sub-model and a judgment sub-model; as shown in fig. 2, each model training implementation may have the following steps S1042 to S1046:
S1042, for each first original bounding box: the generation sub-model performs prediction based on the first original bounding box to obtain a first prediction bounding box and a first prediction category; the discrimination sub-model generates a discrimination result set based on the actual bounding box corresponding to the first original bounding box, the first prediction bounding box corresponding to the first original bounding box, the corresponding actual category and the first prediction category. The discrimination result set comprises a first discrimination result, a second discrimination result and a third discrimination result: the first discrimination result characterizes the degree of bounding box distribution similarity between the first prediction bounding box and the actual bounding box under the condition that the preset constraint is met, where the preset constraint is that the category of the target object in the first prediction bounding box predicted by the generation sub-model is a target category matched with the corresponding actual category; the second discrimination result characterizes the degree of category similarity between the first prediction category corresponding to the first original bounding box and the corresponding actual category; and the third discrimination result characterizes the degree of bounding box coordinate coincidence between the first prediction bounding box and the actual bounding box under the condition that the preset constraint is met;
specifically, the generating sub-model is not only used for carrying out boundary frame prediction based on the first original boundary frame to obtain a corresponding first prediction boundary frame, but also used for carrying out category prediction on the first original boundary frame or the target object of the image area in the first prediction boundary frame, so that the model parameters of the generating sub-model comprise a first model parameter related to boundary frame regression and a second model parameter related to target object category prediction, and therefore, the first model parameter and the second model parameter need to be iteratively updated together in the model training process, namely, the first model parameter is iteratively updated based on a first discrimination result and a third discrimination result corresponding to each first original boundary frame, and the second model parameter is iteratively updated based on a second discrimination result corresponding to each first original boundary frame.
Specifically, to determine the first discrimination result characterizing the degree of bounding box distribution similarity, the KL divergence between the actual bounding box and the corresponding first prediction bounding box could be calculated directly. In a specific implementation, however, it is preferable to judge whether the first prediction bounding box predicted by the generation sub-model is sufficiently realistic: when the generated bounding box (i.e., the first prediction bounding box) becomes difficult to distinguish from the real bounding box (i.e., the actual bounding box), the presence of the discrimination sub-model and the adjustment of model parameters based on its discrimination results further drive the first prediction bounding box predicted by the generation sub-model towards the actual bounding box. To further improve the accuracy of the regression loss component corresponding to the degree of bounding box distribution similarity, and to further ensure that the first prediction bounding box predicted by the target detection model is more realistic, the discrimination sub-model is used to judge, for the actual bounding box and the corresponding first prediction bounding box, the discrimination probability that each comes from real data or from generated data. Specifically, for the actual bounding box and first prediction bounding box corresponding to a certain first original bounding box, the larger the discrimination probability that the discrimination sub-model judges the actual bounding box to come from real data, and the larger the discrimination probability that it judges the first prediction bounding box to come from generated data, the lower the probability distribution similarity between the first prediction bounding box and the corresponding actual bounding box, and the larger the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity. The degree of distribution similarity between the first prediction bounding box and the corresponding actual bounding box is thus determined based on the discrimination probabilities output by the discrimination sub-model; the first discrimination result can be generated from these discrimination probabilities, and the first regression loss component corresponding to bounding box distribution similarity can be determined from the probabilities in the first discrimination result.
In addition, since the generation sub-model performs bounding box prediction and category prediction synchronously, for a first original bounding box whose category prediction accuracy is low, the corresponding first prediction bounding box may not truly reflect the bounding box prediction accuracy of the generation sub-model, and the discrimination result of the discrimination sub-model for that first prediction bounding box and the corresponding actual bounding box would likewise fail to reflect it. The preset constraint is therefore introduced as a precondition in the determination of the first discrimination result characterizing the degree of bounding box distribution similarity (that is, the first prediction category corresponding to the first original bounding box must meet a preset category matching constraint condition): the discrimination sub-model judges the conditional discrimination probability that the actual bounding box comes from real data and the conditional discrimination probability that the first prediction bounding box comes from generated data, and the first discrimination result corresponding to a first original bounding box is determined only when the category of the target object in the first prediction bounding box predicted by the generation sub-model is a target category matched with the corresponding actual category (i.e., the first prediction category matches the actual category).
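To make the conditional discrimination probabilities above concrete, one standard conditional-GAN-style way of writing the first regression loss component is sketched below; the symbols are introduced here purely for illustration and are not the patent's own notation.

$$
\mathcal{L}_{reg}^{(1)} \;=\; \frac{1}{N}\sum_{i=1}^{N} m_i\Big[\log D_{box}(b_i) \;+\; \log\big(1 - D_{box}(\hat b_i)\big)\Big],
\qquad
m_i \;=\; \mathbb{1}\big[\hat y_i \text{ matches } y_i\big],
$$

where $b_i$ is the actual bounding box, $\hat b_i$ the first prediction bounding box, $y_i$ and $\hat y_i$ the actual and first prediction categories, $D_{box}(\cdot)$ the discrimination probability that a box comes from real data, and $m_i$ the indicator of the preset constraint. The quantity grows as the discrimination sub-model separates real from generated boxes more confidently; in an adversarial setup the discrimination sub-model would be trained to maximize it while the generation sub-model minimizes it, pushing the predicted box distribution towards the actual one.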
Correspondingly, to determine the second discrimination result characterizing the degree of category similarity between the first prediction category and the actual category, the cross-entropy classification loss between the two categories could be calculated directly from the first prediction category and the actual category. In a specific implementation, however, it is preferable to judge whether the first prediction category predicted by the generation sub-model is sufficiently realistic: when the generated target object category (i.e., the first prediction category) becomes difficult to distinguish from the real target object category (i.e., the actual category), adjusting the model parameters based on the discrimination result of the discrimination sub-model further drives the first prediction category predicted by the generation sub-model towards the actual category. To further improve the accuracy of the classification loss component corresponding to the degree of category similarity, and to further ensure that the first prediction category predicted by the target detection model is more realistic, the discrimination sub-model is used to judge, for the actual category corresponding to the first original bounding box and the corresponding first prediction category, the discrimination probability that each comes from real data or from generated data; these discrimination probabilities are related to how close the two categories are, so they can characterize the degree of category similarity between the actual category and the corresponding first prediction category, and the classification loss component corresponding to the first original bounding box can be determined from them. Specifically, for the actual category and first prediction category corresponding to a certain first original bounding box, the larger the discrimination probability that the discrimination sub-model judges the actual category to come from real data, and the larger the discrimination probability that it judges the first prediction category to come from generated data, the larger the classification loss component corresponding to that first original bounding box. The classification loss component corresponding to the first original bounding box is therefore determined based on the discrimination probabilities that the discrimination sub-model assigns to the actual category and the first prediction category coming from real data or generated data respectively.
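Written in the same assumed notation as above (again only an illustrative formalization, not the patent's formula), the classification loss component can take the analogous adversarial form

$$
\mathcal{L}_{cls} \;=\; \frac{1}{N}\sum_{i=1}^{N}\Big[\log D_{cls}(\mathbf{y}_i) \;+\; \log\big(1 - D_{cls}(\hat{\mathbf{y}}_i)\big)\Big],
$$

where $\mathbf{y}_i$ is the one-hot actual category, $\hat{\mathbf{y}}_i$ the predicted class-probability vector, and $D_{cls}(\cdot)$ the discrimination probability that a category vector comes from real data. No match indicator appears here, consistent with the later statement that the classification loss is counted for every first original bounding box regardless of the preset constraint.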
Correspondingly, to determine the third discrimination result characterizing the degree of bounding box coordinate coincidence, the target intersection-over-union loss may be obtained by considering only the intersection-over-union loss between a certain actual bounding box and the corresponding first prediction bounding box; alternatively, the intersection-over-union loss between that actual bounding box and its corresponding first prediction bounding box and the intersection-over-union losses between that actual bounding box and the first prediction bounding boxes corresponding to other actual bounding boxes may be considered together to determine the target intersection-over-union loss. The magnitude of the target intersection-over-union loss characterizes the degree of coordinate coincidence between the actual bounding box and the corresponding first prediction bounding box, so the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence can be determined from the target intersection-over-union loss, further driving the model to perform bounding box regression learning. Specifically, for the actual bounding box and first prediction bounding box corresponding to a certain first original bounding box, the target intersection-over-union loss between them is determined; the larger this loss, the lower the coordinate coincidence between the first prediction bounding box and the corresponding actual bounding box, and the larger the corresponding second regression loss component for the discrimination dimension of bounding box coordinate coincidence. The third discrimination result can therefore be generated from the target intersection-over-union loss, characterizes the degree of bounding box coordinate coincidence, and is used to determine the second regression loss component corresponding to that discrimination dimension. In addition, since the third discrimination result is also used to determine the regression loss value, the preset constraint is likewise introduced as a precondition in its determination (that is, the first prediction category corresponding to the first original bounding box must meet a preset category matching constraint condition), and a conditional intersection-over-union loss between the actual bounding box and the first prediction bounding box is computed: the third discrimination result corresponding to a first original bounding box is determined only when the category of the target object in the first prediction bounding box predicted by the generation sub-model is a target category matched with the corresponding actual category (i.e., the first prediction category matches the actual category).
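In the same assumed notation, the simplest form of this conditional term uses the per-pair intersection-over-union only (the variant that also aggregates losses against the prediction boxes of other actual boxes is not shown):

$$
\mathrm{IoU}(b_i,\hat b_i) \;=\; \frac{\big|\,b_i \cap \hat b_i\,\big|}{\big|\,b_i \cup \hat b_i\,\big|},
\qquad
\mathcal{L}_{reg}^{(2)} \;=\; \frac{1}{N}\sum_{i=1}^{N} m_i\,\big(1 - \mathrm{IoU}(b_i,\hat b_i)\big),
$$

where $|\cdot|$ denotes box area and $m_i$ is again the indicator that the first prediction category matches the actual category, so the coordinate-coincidence term is counted only under the preset constraint.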
S1044, determining a regression classification loss value of the model to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original boundary box;
the regression classification loss value comprises a regression loss value determined based on a first discrimination result and a third discrimination result corresponding to each first original boundary frame and a classification loss value determined based on a second discrimination result corresponding to each first original boundary frame;
specifically, after a discrimination result set is obtained for each first original bounding box, a sub-regression classification loss value corresponding to each first original bounding box is obtained, wherein the sub-regression classification loss value at least comprises a first regression loss component corresponding to a first discrimination dimension considered from the perspective of boundary box distribution similarity, a classification loss component corresponding to a classification discrimination dimension considered from the perspective of classification similarity between an actual class and a first prediction class, and a second regression loss component corresponding to a second discrimination dimension considered from the perspective of boundary box coordinate coincidence; then, based on the sub-regression classification loss values corresponding to each of the first original bounding boxes, a regression classification loss value for adjusting the model parameters may be determined.
It should be noted that the sub-regression classification loss value may be regarded as comprising a sub-regression loss value and a sub-classification loss value. In determining the sub-regression loss value corresponding to a first original bounding box, both the bounding box distribution similarity and the bounding box coordinate coincidence may be considered, or only the bounding box distribution similarity may be considered; in the latter case the sub-regression classification loss value corresponding to the first original bounding box is determined from the first regression loss component corresponding to the first discrimination result and the classification loss component corresponding to the second discrimination result, that is, the discrimination result set corresponding to the first original bounding box includes only the first discrimination result and the second discrimination result.
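Putting the assumed components above together, one illustrative way to write the per-round regression classification loss value (the weights $\lambda_1,\lambda_2$ are assumptions; the patent does not prescribe a weighting) is

$$
\mathcal{L} \;=\; \mathcal{L}_{cls} \;+\; \lambda_1\,\mathcal{L}_{reg}^{(1)} \;+\; \lambda_2\,\mathcal{L}_{reg}^{(2)},
$$

i.e., the sub-regression classification loss value of box $i$ is its classification loss component plus, only when $m_i = 1$, its first and second regression loss components, and the overall value is the average over the N first original bounding boxes.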
S1046, updating model parameters of the generation sub-model and the discrimination sub-model based on the regression classification loss value.
Specifically, after the regression classification loss value is determined from the sub-regression classification loss values corresponding to the first original bounding boxes, the parameters of the generation sub-model and the discrimination sub-model are adjusted based on the regression classification loss value using a gradient descent method. The classification loss value related to target object classification is obtained from the discrimination probabilities of the discrimination sub-model; since the training process relies on the true-or-fake discrimination result of the discrimination sub-model rather than on the prediction category itself, the target classification accuracy of the finally trained target detection model is higher. In addition, the sub-regression classification loss value corresponding to each first original bounding box may include a sub-classification loss value and a sub-regression loss value. Since the sub-regression loss value related to bounding box prediction reflects at least the first regression loss component corresponding to the regression loss discrimination dimension of bounding box distribution similarity and the second regression loss component corresponding to the regression loss discrimination dimension of bounding box coordinate coincidence, the regression loss value used to adjust the model parameters also reflects the regression loss components of both dimensions. The finally trained target detection model therefore not only ensures that the probability distribution of the predicted first prediction bounding box is closer to that of the actual bounding box, but also ensures a higher degree of coordinate coincidence between the first prediction bounding box and the actual bounding box.
During model training, the discrimination sub-model tries as far as possible to distinguish whether the actual bounding box corresponding to each first original bounding box and the corresponding first prediction bounding box come from real data or from generated data, and whether the actual category corresponding to each first original bounding box and the corresponding first prediction category come from real data or from generated data, so as to minimize the regression classification loss of the model to be trained; the generation sub-model, in order to maximize the resolution error of the discrimination sub-model, is forced to keep learning the bounding box distribution and target object category recognition. Through multiple rounds of this adversarial learning between the generation sub-model and the discrimination sub-model, the generation sub-model with more accurate predictions is obtained as the target detection model.
It should be noted that the process of iteratively training the model parameters based on the regression classification loss value to obtain the target detection model may follow the existing procedure of adjusting and optimizing model parameters through back propagation with the gradient descent method, which is not repeated herein.
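The outer iteration and termination logic mentioned above can be sketched as follows; the one_round and sample_batch abstractions (one gradient-descent update and one batch of boxes/labels per call) and the convergence tolerance are assumptions introduced only to illustrate the stopping condition.

```python
def train(one_round, sample_batch, max_rounds=1000, tol=1e-4):
    """Outer iterative-training loop: `one_round` performs a single parameter update
    and returns the current loss; `sample_batch` yields the data for one round.
    Stops when the loss change converges or the round budget is exhausted."""
    prev_loss = float("inf")
    for _ in range(max_rounds):
        loss = one_round(*sample_batch())
        if abs(prev_loss - loss) < tol:   # convergence-based termination condition
            return loss
        prev_loss = loss
    return prev_loss

# usage sketch with a dummy per-round update whose loss simply decays toward zero
if __name__ == "__main__":
    state = {"loss": 1.0}
    def dummy_round():
        state["loss"] *= 0.9
        return state["loss"]
    print(train(lambda: dummy_round(), lambda: ()))
```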
In addition, the target detection model trained with the target detection model training method provided by the embodiments of the present application can be applied to any specific application scenario requiring target detection on an image to be detected. For example, in specific application scenario 1, target detection is performed on images acquired by an image acquisition device at the entrance of a public place (such as a mall entrance, a subway entrance, a scenic spot entrance, or a performance site entrance); in specific application scenario 2, target detection is performed on images acquired by the image acquisition devices at monitoring points in a breeding base;
The preset sample image set used in training the target detection model differs according to the specific application scenario of the model. For specific application scenario 1, the preset sample image set may consist of historical sample images acquired at the entrance of the specified public place during a preset historical period; correspondingly, the target object outlined by a first original bounding box is a target user entering the specified public place in a historical sample image, and the actual category and the first prediction category may be categories to which the target user belongs, such as at least one of age, sex, height and occupation. For specific application scenario 2, the preset sample image set may consist of historical sample images acquired by the monitoring points in the specified breeding base during a preset historical period; correspondingly, the target object outlined by a first original bounding box is a target breeding object in a historical sample image, and the actual category and the first prediction category may be categories to which the target breeding object belongs, such as at least one of living state and body size.
As shown in fig. 3, a schematic diagram of a specific implementation principle of a training process of a target detection model is provided, which specifically includes:
Acquiring a first preset number of first original boundary frames, and acquiring actual boundary frames and actual categories corresponding to the first original boundary frames respectively;
for each first original bounding box: the generating sub-model carries out boundary frame prediction based on the first original boundary frame to obtain a first prediction boundary frame, and carries out target category prediction on the first original boundary frame or an image area in the first prediction boundary frame to obtain a first prediction category; the judgment sub-model generates a judgment result set based on an actual boundary box and a first prediction boundary box corresponding to the first original boundary box and an actual category and a first prediction category corresponding to the first original boundary box;
determining regression classification loss values of the model to be trained based on the first discrimination results, the second discrimination results and the third discrimination results corresponding to the first original boundary boxes;
and iteratively updating model parameters of the model to be trained based on the regression classification loss value until the current model training result meets the preset model training ending condition to obtain the target detection model.
Further, considering that during model training the gradient of the regression classification loss obtained from the discrimination results output by the discrimination sub-model may decrease suddenly or even become zero, a loss compensation value is introduced to further improve the training accuracy of the model parameters. On this basis, the discrimination result set further includes a fourth discrimination result. Correspondingly, generating the discrimination result set in S1042 based on the actual bounding box corresponding to the first original bounding box, the first prediction bounding box corresponding to the first original bounding box, the corresponding actual category and the first prediction category specifically includes:
Under the condition that the first prediction category corresponding to the first original bounding box satisfies the preset constraint, performing bounding box authenticity discrimination on the actual bounding box corresponding to the first original bounding box and on the first prediction bounding box, to obtain a first discrimination result;
performing category authenticity discrimination on the actual category corresponding to the first original bounding box and on the first prediction category, to obtain a second discrimination result;
under the condition that the first prediction category corresponding to the first original bounding box satisfies the preset constraint, calculating a bounding box intersection ratio (IoU) loss based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box, to obtain a third discrimination result;
and calculating, based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box, a loss compensation value for constraining the loss gradient of the loss function of the model to be trained, to obtain a fourth discrimination result.
Specifically, since the determination processes of both the first discrimination result and the third discrimination result depend on whether the first prediction category satisfies the preset category matching constraint condition, for a first original bounding box whose first prediction category does not satisfy that condition, the first discrimination result and the third discrimination result are empty; in other words, the corresponding discrimination result set only includes the second discrimination result and the fourth discrimination result, so that only the classification loss corresponding to that first original bounding box is counted and its regression loss is not. Correspondingly, for a first original bounding box whose first prediction category satisfies the preset category matching constraint condition, not only the second and fourth discrimination results but also the first and third discrimination results are considered; that is, both the classification loss and the regression loss corresponding to that first original bounding box are counted.
Specifically, for each first original bounding box, the corresponding discrimination result set not only includes a first discrimination result obtained from the perspective of bounding box distribution similarity, a second discrimination result used for determining the classification loss, and a third discrimination result obtained from the perspective of bounding box coordinate coincidence, but also includes a loss compensation value used for constraining the loss gradient of the loss function; in this way, the accuracy of the regression classification loss value can be improved, and the problem that the loss gradient suddenly decreases or even becomes zero can be alleviated.
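The following sketch illustrates, under stated assumptions, how one discrimination result set could be assembled, with the first and third results gated by the preset constraint; the field names and the four callables standing in for the discrimination sub-model's judgments are hypothetical, not from the original text.

```python
# Illustrative only; names and helpers are assumptions.
from dataclasses import dataclass
from typing import Optional

@dataclass
class DiscriminationResultSet:
    first: Optional[float]   # bounding box distribution similarity judgment
    second: float            # category authenticity / similarity judgment
    third: Optional[float]   # bounding box coordinate coincidence (IoU-based)
    fourth: float            # loss compensation value

def build_result_set(actual_box, pred_box, actual_cls, pred_cls,
                     satisfies_constraint,
                     box_authenticity, cls_authenticity,
                     iou_loss, compensation):
    """The four callables stand in for the discrimination sub-model."""
    first = box_authenticity(actual_box, pred_box) if satisfies_constraint else None
    second = cls_authenticity(actual_cls, pred_cls)
    third = iou_loss(actual_box, pred_box) if satisfies_constraint else None
    fourth = compensation(actual_box, pred_box)
    return DiscriminationResultSet(first, second, third, fourth)
```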
In implementation, as shown in fig. 4a, a schematic diagram of another specific implementation principle of the training process of the target detection model is provided, which specifically includes:
extracting target regions from a preset sample image set in advance by using a preset region-of-interest extraction model, to obtain X anchor boxes; the preset sample image set includes a plurality of original sample images, and each original sample image contains at least one target object; the feature information corresponding to each anchor box may include position information (x, y, w, h) and category information c, i.e., (x, y, w, h, c); specifically, during model training, the parameter dimensions can be set to be mutually independent, so that the iterative training of the model parameters for each dimension is also mutually independent;
randomly sampling N anchor boxes from the X anchor boxes as the first original bounding boxes for each round of model training, and determining the actual bounding box and the actual category corresponding to each first original bounding box (a sketch of this expansion is given after this step list); each target object in the preset sample image set corresponds to one actual bounding box and one actual category, so that if the total number of target objects in the preset sample image set is d, the number of actual bounding boxes before expansion is d. To make the actual bounding boxes correspond one-to-one with the first prediction bounding boxes, the actual bounding boxes corresponding to first original bounding boxes that contain the same target object may be identical; that is, the actual bounding boxes are expanded based on the target objects outlined by the first original bounding boxes, to obtain N actual bounding boxes (N > d). For example, if the target object contained in a certain original sample image is cat A, cat A corresponds to an actual bounding box A; if the number of first original bounding boxes containing cat A is 4 (e.g., the first original bounding boxes with sequence numbers 6, 7, 8 and 9), the actual bounding box A is expanded into 4 actual bounding boxes A (i.e., the actual bounding boxes with sequence numbers 6, 7, 8 and 9), and the actual categories corresponding to the 4 expanded actual bounding boxes A are all cat;
for each first original bounding box, the generation sub-model performs bounding box prediction based on the first original bounding box to obtain a first prediction bounding box, and performs target category prediction on the image area in the first original bounding box or in the first prediction bounding box to obtain a first prediction category; the discrimination sub-model generates a discrimination result set based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box, and the actual category and the first prediction category corresponding to the first original bounding box; each first original bounding box thus corresponds to an actual bounding box, the actual category of that actual bounding box, a first prediction bounding box, and the first prediction category of that first prediction bounding box, where the first prediction bounding box is predicted by the generation sub-model through continuous bounding box regression learning, and the first prediction category is predicted by the generation sub-model through continuous target classification learning; specifically, the target objects outlined by the first prediction bounding boxes with sequence numbers 6, 7, 8 and 9 among the N first prediction bounding boxes output by the generation sub-model are cat A;
for each first original bounding box, determining at least one of the following loss components: a first regression loss component determined based on the first discrimination result in the discrimination result set of the first original bounding box, a classification loss component determined based on the second discrimination result, a second regression loss component determined based on the third discrimination result, and a loss compensation component determined based on the fourth discrimination result; specifically, if the first prediction category of the first original bounding box does not satisfy the preset constraint, the first discrimination result and the third discrimination result are empty and the corresponding first and second regression loss components are zero, that is, the sub-regression loss value of that first original bounding box is not considered and only its sub-classification loss value and loss compensation value are considered;
determining the regression classification loss value of the model to be trained based on at least one of the first regression loss component, the classification loss component, the second regression loss component and the loss compensation component corresponding to each first original bounding box; and adjusting the model parameters of the generation sub-model and the discrimination sub-model based on the regression classification loss value by using stochastic gradient descent, to obtain a parameter-updated generation sub-model and a parameter-updated discrimination sub-model;
if the current model iteration training result meets the model iteration training termination condition, determining the updated generation sub-model as a trained target detection model;
if the current model iterative training result does not meet the model iterative training termination condition, determining the updated generation sub-model and the updated discrimination sub-model as the model to be trained for the next round of model training, until the model iterative training termination condition is met.
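As a purely illustrative aid to the expansion described in the sampling step above (d actual bounding boxes expanded to N, one per sampled first original bounding box), the following sketch assumes a hypothetical mapping from each sampled anchor to the target object it outlines; names such as `expand_actual_boxes` are not from the original text.

```python
# Hypothetical sketch: expand d ground-truth boxes to N, one per sampled anchor.
def expand_actual_boxes(sampled_anchor_object_ids, objects):
    """sampled_anchor_object_ids: for each of the N sampled first original
       bounding boxes, the id of the target object it outlines.
       objects: dict mapping object id -> (actual_box, actual_category)."""
    actual_boxes, actual_categories = [], []
    for obj_id in sampled_anchor_object_ids:
        box, category = objects[obj_id]
        actual_boxes.append(box)             # e.g. box A repeated for anchors 6-9
        actual_categories.append(category)   # e.g. "cat" repeated accordingly
    return actual_boxes, actual_categories

# Example mirroring the cat A case: anchors 6-9 all outline object "A".
objects = {"A": ((10, 20, 50, 40), "cat")}
boxes, cats = expand_actual_boxes(["A", "A", "A", "A"], objects)
assert len(boxes) == 4 and cats == ["cat"] * 4
```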
Specifically, during model training, for each round of model training, the model parameters of the discrimination sub-model can be adjusted based on the discrimination result sets, and the model parameters of the generation sub-model can also be adjusted based on the discrimination result sets; in a specific implementation, however, in order to improve the training accuracy of the model parameters of the generation sub-model, for each round of model training the model parameters of the discrimination sub-model are first adjusted cyclically t times based on the discrimination result sets, and the model parameters of the generation sub-model are then adjusted once based on the discrimination result sets, so that the parameter-adjusted discrimination sub-model and generation sub-model are obtained as the model to be trained in the next round.
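The alternating schedule just described (t discriminator updates followed by one generator update per round) could be organized as in the hedged sketch below; the optimizers, loss callables and the value of t are illustrative assumptions rather than part of the original method.

```python
def train_one_round(generator, discriminator, batch,
                    disc_loss_fn, gen_loss_fn,
                    disc_opt, gen_opt, t=3):
    """One round of the alternating schedule: adjust the discrimination
    sub-model t times, then adjust the generation sub-model once.
    All callables, optimizers and t are illustrative assumptions."""
    for _ in range(t):                       # t cyclic discriminator adjustments
        disc_opt.zero_grad()
        d_loss = disc_loss_fn(generator, discriminator, batch)
        d_loss.backward()
        disc_opt.step()

    gen_opt.zero_grad()                      # single generator adjustment
    g_loss = gen_loss_fn(generator, discriminator, batch)
    g_loss.backward()
    gen_opt.step()
    return d_loss.item(), g_loss.item()
```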
The step S1044 of determining the regression classification loss value of the model to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each of the first original bounding boxes, specifically includes:
determining a sub-regression classification loss value corresponding to each first original bounding box; the sub-regression classification loss value corresponding to each first original bounding box is determined based on target information, where the target information includes some or all of the following: whether the first prediction category corresponding to the first original bounding box satisfies the preset constraint, the bounding box distribution similarity characterized by the first discrimination result corresponding to the first original bounding box, the category similarity characterized by the second discrimination result, the bounding box coordinate coincidence degree characterized by the third discrimination result, and the loss compensation value characterized by the fourth discrimination result;
and determining the regression classification loss value of the model to be trained based on the sub-regression classification loss value corresponding to each first original boundary box.
The sub-regression classification loss value may or may not include the loss compensation value, and its regression part may be determined based on the first regression loss component alone or based on both the first regression loss component and the second regression loss component, while its classification part is determined based on the classification loss component. Specifically, for a certain first original bounding box, if the first prediction category corresponding to the first original bounding box does not satisfy the preset constraint, the target information used for determining the corresponding sub-regression classification loss value may include: the classification loss component corresponding to the second discrimination result, that is, only the sub-classification loss value of the first original bounding box is considered and its sub-regression loss value is not; or it may include: the classification loss component corresponding to the second discrimination result and the loss compensation component corresponding to the fourth discrimination result, that is, only the sub-classification loss value and the loss compensation value of the first original bounding box are considered and its sub-regression loss value is not. If the first prediction category corresponding to the first original bounding box satisfies the preset constraint, the target information may include: the first regression loss component corresponding to the first discrimination result and the classification loss component corresponding to the second discrimination result; or it may include: the first regression loss component corresponding to the first discrimination result, the classification loss component corresponding to the second discrimination result and the second regression loss component corresponding to the third discrimination result, that is, both the sub-classification loss value and the sub-regression loss value of the first original bounding box are considered; or it may further include: the first regression loss component corresponding to the first discrimination result, the classification loss component corresponding to the second discrimination result, the second regression loss component corresponding to the third discrimination result and the loss compensation component corresponding to the fourth discrimination result, that is, the sub-classification loss value, the sub-regression loss value and the loss compensation value of the first original bounding box are all considered simultaneously.
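A hedged sketch of this case analysis follows; it reuses the DiscriminationResultSet fields from the earlier sketch, the flags controlling the optional components are illustrative assumptions, and the weighted form with coefficients λ1–λ4 is given in the formula below.

```python
def sub_regression_classification_loss(result_set, satisfies_constraint,
                                        use_second_regression=True,
                                        use_compensation=True):
    """Illustrative combination of the loss components for one box."""
    loss = result_set.second      # classification loss component is always counted
    if satisfies_constraint:
        loss = loss + result_set.first                 # first regression loss component
        if use_second_regression and result_set.third is not None:
            loss = loss + result_set.third             # second regression loss component
    if use_compensation:
        loss = loss + result_set.fourth                # loss compensation component
    return loss
```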
Taking the first original bounding box satisfying the above-mentioned preset constraint and considering the loss compensation component as an example, the sub-regression classification loss value corresponding to the first original bounding box is equal to the weighted sum of four loss components, which can be expressed as,
$V_i(D,G)=\lambda_1 V_{i1}+\lambda_2 V_{i2}+\lambda_3 V_{i3}+\lambda_4 V_{i4}$
where $\lambda_1$ denotes the first weight coefficient corresponding to the first regression loss component in the first discrimination dimension, $V_{i1}$ denotes the first regression loss component in the first discrimination dimension (i.e., the regression loss component corresponding to the bounding box distribution similarity characterized by the first discrimination result), $\lambda_2$ denotes the second weight coefficient corresponding to the classification loss component, $V_{i2}$ denotes the classification loss component (i.e., the classification loss component corresponding to the category similarity characterized by the second discrimination result), $\lambda_3$ denotes the third weight coefficient corresponding to the second regression loss component in the second discrimination dimension, $V_{i3}$ denotes the second regression loss component in the second discrimination dimension (i.e., the regression loss component corresponding to the bounding box coordinate coincidence degree characterized by the third discrimination result), $\lambda_4$ denotes the fourth weight coefficient corresponding to the loss compensation value, and $V_{i4}$ denotes the loss compensation value (i.e., the loss compensation component); specifically, the first discrimination dimension may be a regression loss discrimination dimension based on the bounding box distribution similarity, and the second discrimination dimension may be a regression loss discrimination dimension based on the bounding box coordinate coincidence degree.
In a specific implementation, for the plurality of first original bounding boxes satisfying the preset constraint, the first weight coefficient and the third weight coefficient may be kept unchanged. However, the first regression loss component and the second regression loss component correspond to different regression loss discrimination dimensions (i.e., the discrimination dimension based on the bounding box distribution similarity and the discrimination dimension based on the bounding box coordinate coincidence degree), and these dimensions emphasize different aspects of the regression loss (for example, the distribution-similarity dimension better reflects the regression loss of a first original bounding box whose actual bounding box has blurred edges, while the coordinate-coincidence dimension better reflects the regression loss of a first original bounding box whose distribution is similar but whose specific position deviates). Therefore, the magnitude relationship between the first regression loss component and the second regression loss component reflects, to a certain extent, which regression discrimination dimension more accurately characterizes the regression loss between the actual bounding box and the first prediction bounding box, and the first weight coefficient and the third weight coefficient can be adjusted accordingly for each first original bounding box. Specifically, if the absolute value of the difference between the first regression loss component and the second regression loss component is not greater than a preset loss threshold, the first weight coefficient and the third weight coefficient are kept unchanged; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is greater than the second regression loss component, the first weight coefficient is increased according to a first preset adjustment mode; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is smaller than the second regression loss component, the third weight coefficient is increased according to a second preset adjustment mode. In this way, during model training, for each first original bounding box, emphasis is placed on the regression loss component of the discrimination dimension that better reflects its bounding box regression loss, which further improves the accuracy of model parameter optimization.
It should be noted that the increase amplitude of the first weight coefficient in the first preset adjustment mode and the increase amplitude of the third weight coefficient in the second preset adjustment mode may be the same or different, and may be set according to actual requirements, which is not limited in this application.
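A minimal sketch of the threshold-based weight adjustment just described follows; the additive step sizes are assumptions, since the original text does not fix the adjustment amplitudes.

```python
def adjust_regression_weights(lambda1, lambda3, v_i1, v_i3,
                              loss_threshold, step1=0.1, step3=0.1):
    """Illustrative adjustment of the first and third weight coefficients
    based on the first (v_i1) and second (v_i3) regression loss components."""
    if abs(v_i1 - v_i3) <= loss_threshold:
        return lambda1, lambda3             # keep both weights unchanged
    if v_i1 > v_i3:
        return lambda1 + step1, lambda3     # emphasize the distribution-similarity dimension
    return lambda1, lambda3 + step3         # emphasize the coordinate-coincidence dimension
```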
For the process of obtaining the first discrimination result from the discrimination dimension of bounding box distribution similarity in the case that the first prediction category satisfies the preset constraint, the above-mentioned performing of bounding box authenticity discrimination on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box to obtain the first discrimination result specifically includes:
step A1, determining, based on the actual bounding box corresponding to the first original bounding box, a first discrimination probability that the actual bounding box is discriminated as true by the discrimination sub-model; and determining, based on the first prediction bounding box corresponding to the first original bounding box, a second discrimination probability that the first prediction bounding box is discriminated as counterfeit by the discrimination sub-model;
and step A2, generating a first discrimination result corresponding to the first original bounding box based on the first discrimination probability and the second discrimination probability corresponding to the first original bounding box.
Specifically, for a first original bounding box whose first prediction category satisfies the preset category matching constraint condition, the discrimination sub-model judges the probability that the actual bounding box corresponding to the first original bounding box comes from real data; that is, for the actual bounding box, the discrimination sub-model performs authenticity discrimination on it to obtain a first discrimination probability that the actual bounding box is predicted to be real data. Similarly, the discrimination sub-model judges the probability that the first prediction bounding box corresponding to the first original bounding box comes from generated data (that is, the value 1 minus the probability that the first prediction bounding box comes from real data); that is, the discrimination sub-model performs authenticity discrimination on the first prediction bounding box to obtain a second discrimination probability that the first prediction bounding box is predicted to be generated data.
Specifically, from the perspective of bounding box distribution similarity, the discrimination sub-model compares the first probability distribution corresponding to the actual bounding box with the second probability distribution corresponding to the first prediction bounding box, thereby realizing the authenticity discrimination of the actual bounding box and the first prediction bounding box and obtaining the corresponding discrimination probabilities; these discrimination probabilities can characterize the distribution similarity between the actual bounding box and the corresponding first prediction bounding box, so that after the first discrimination probability and the second discrimination probability are determined, a first discrimination result that characterizes the bounding box distribution similarity can be obtained. Further, based on the first discrimination result, the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity can be determined, where the larger the first discrimination probability and the second discrimination probability are, the lower the distribution similarity between the actual bounding box corresponding to the first original bounding box and the corresponding first prediction bounding box is, and therefore the larger the first regression loss component corresponding to the first original bounding box is. The model parameters of the generation sub-model are updated based on the first regression loss component, so that the prediction results of the generation sub-model, after being discriminated by the discrimination sub-model, optimize the loss value of the model to be trained; this achieves the purpose of optimizing the generation sub-model and improves its bounding box prediction effect.
Further, in order to improve the accuracy of the first discrimination result corresponding to each first original bounding box, so that the accuracy of the first regression loss component corresponding to the discrimination dimension of bounding box distribution similarity can be improved when the regression classification loss value is determined based on the first discrimination result, the above step A2 of generating the first discrimination result corresponding to the first original bounding box based on the first discrimination probability and the second discrimination probability corresponding to the first original bounding box specifically includes:
step A21, determining a first weighted probability based on the first discrimination probability and a first prior probability of an actual bounding box corresponding to the first original bounding box; determining a second weighted probability based on the second discrimination probability and a second prior probability of the first original bounding box;
step A22, based on the first weighted probability and the second weighted probability corresponding to the first original boundary box, generating a first discrimination result corresponding to the first original boundary box.
Specifically, in determining the first discrimination result characterizing the bounding box distribution similarity, the first prior probability of the actual bounding box and the second prior probability of the first original bounding box are taken into account: the discrimination sub-model performs authenticity discrimination on the actual bounding box and on the first prediction bounding box, and the obtained first discrimination probability and second discrimination probability are weighted to determine the first discrimination result (i.e., the first discrimination result may include the first weighted probability and the second weighted probability), so that the first regression loss component related to the bounding box distribution similarity obtained based on the first discrimination result may be expressed as:
$V_{i1}=\tilde{p}^{\,r}_{i}\,P_{i1}+\tilde{p}^{\,o}_{i}\,P_{i2}$

where $\tilde{p}^{\,r}_{i}$ denotes the prior probability (i.e., the first prior probability) of the occurrence of the i-th actual bounding box, $P_{i1}$ denotes the first discrimination probability that the i-th actual bounding box is predicted as true by the discrimination sub-model, $\tilde{p}^{\,o}_{i}$ denotes the prior probability (i.e., the second prior probability) of the occurrence of the i-th first original bounding box, and $P_{i2}$ denotes the second discrimination probability that the i-th first prediction bounding box is predicted as counterfeit by the discrimination sub-model.
In a specific implementation, it should be noted that, since the first prediction bounding box is derived by the generation sub-model based on the first original bounding box, the prior probability of the occurrence of the i-th first original bounding box (i.e., $\tilde{p}^{\,o}_{i}$) may also be taken as the prior probability of the occurrence of the i-th first prediction bounding box.
Specifically, since the probabilities of occurrence of the actual bounding boxes and of the prediction bounding boxes both follow a certain probability distribution, such as a Gaussian distribution, the first prior probability and the second prior probability can be obtained by:
$\tilde{p}^{\,r}_{i}=\mathcal{N}\!\left(b^{r}_{i};\,\mu_1,\,\sigma_1\right)$

where $b^{r}_{i}$ denotes the actual bounding box corresponding to the first original bounding box with sequence number i, $\sigma_1$ denotes the variance of the distribution probability of the first preset number of actual bounding boxes, and $\mu_1$ denotes the mean value of the distribution probability of the first preset number of actual bounding boxes.
$\tilde{p}^{\,o}_{i}=\mathcal{N}\!\left(b^{o}_{i};\,\mu_2,\,\sigma_2\right)$

where $b^{o}_{i}$ denotes the first original bounding box with sequence number i, $\sigma_2$ denotes the variance of the distribution probability of the first preset number of first original bounding boxes, and $\mu_2$ denotes the mean value of the distribution probability of the first preset number of first original bounding boxes.
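Under the assumption stated above that box occurrences follow a Gaussian distribution, the priors and the prior-weighted discrimination probabilities could be computed as in the following illustrative sketch; representing each box by a scalar statistic (here left unspecified as an input) is an additional simplifying assumption of the sketch.

```python
import math

def gaussian_density(x, mean, variance):
    """Gaussian density used as a prior probability (illustrative)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def prior_weighted_probs(actual_stat, original_stat, p_i1, p_i2,
                         mu1, var1, mu2, var2):
    """actual_stat / original_stat: scalar statistics of the i-th actual and
    first original bounding boxes; p_i1, p_i2: discrimination probabilities."""
    first_prior = gaussian_density(actual_stat, mu1, var1)     # first prior probability
    second_prior = gaussian_density(original_stat, mu2, var2)  # second prior probability
    first_weighted = first_prior * p_i1     # first weighted probability
    second_weighted = second_prior * p_i2   # second weighted probability
    return first_weighted, second_weighted
```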
For the step of determining the second discrimination result capable of characterizing the classification loss component corresponding to the first original bounding box, i.e., obtaining the second discrimination result from the discrimination dimension of category similarity, the above-mentioned performing of category authenticity discrimination on the actual category corresponding to the first original bounding box and on the first prediction category to obtain the second discrimination result specifically includes:
step B1, determining a third discrimination probability that the actual category corresponding to the first original bounding box is discriminated as true by the discrimination sub-model; and determining a fourth discrimination probability that the first prediction category corresponding to the first original bounding box is discriminated as counterfeit by the discrimination sub-model;
and step B2, generating a second discrimination result corresponding to the first original bounding box based on the third discrimination probability and the fourth discrimination probability corresponding to the first original bounding box.
Specifically, for each first original bounding box, the discrimination sub-model judges the probability that the actual category corresponding to the first original bounding box comes from real data; that is, for the actual category, the discrimination sub-model performs authenticity discrimination on it to obtain a third discrimination probability that the actual category is predicted to be real data. Similarly, the discrimination sub-model judges the probability that the first prediction category corresponding to the first original bounding box comes from generated data (that is, the value 1 minus the probability that the discrimination sub-model judges the first prediction category to come from real data); that is, the discrimination sub-model performs authenticity discrimination on the first prediction category to obtain a fourth discrimination probability that the first prediction category is predicted to be generated data.
Specifically, from the perspective of category similarity, the discrimination sub-model compares the third probability distribution corresponding to the actual category with the fourth probability distribution corresponding to the first prediction category, thereby realizing the authenticity discrimination of the actual category and the first prediction category and obtaining the corresponding discrimination probabilities; these discrimination probabilities can characterize the category similarity between the actual category and the corresponding first prediction category, so that after the third discrimination probability and the fourth discrimination probability are determined, a second discrimination result that characterizes the category similarity can be obtained. Further, the classification loss component can be determined based on the second discrimination result, where the larger the third discrimination probability and the fourth discrimination probability are, the lower the similarity between the actual category corresponding to the first original bounding box and the corresponding first prediction category is, and therefore the larger the classification loss component corresponding to the first original bounding box is. The model parameters of the generation sub-model are updated based on the classification loss component, so that the prediction results of the generation sub-model, after being discriminated by the discrimination sub-model, optimize the loss value of the model to be trained; this achieves the purpose of optimizing the generation sub-model and improves the accuracy with which the generation sub-model identifies the target object category.
Further, in order to improve the accuracy of the second discrimination result corresponding to each first original bounding box, so that the accuracy of the classification loss component corresponding to the discrimination dimension of the similarity between the first prediction category and the actual category can be improved when the sub-regression classification loss value is determined based on the second discrimination result, the above step B2 of generating the second discrimination result corresponding to the first original bounding box based on the third discrimination probability and the fourth discrimination probability corresponding to the first original bounding box specifically includes:
b21, determining a third weighted probability based on the third discrimination probability and a third prior probability that the category of the actual bounding box corresponding to the first original bounding box is the actual category; determining a fourth weighted probability based on the fourth discrimination probability and a fourth prior probability that the category of the first original bounding box is an actual category;
and B22, generating a second judging result corresponding to the first original boundary box based on the third weighted probability and the fourth weighted probability corresponding to the first original boundary box.
Specifically, in determining the second discrimination result characterizing the category similarity, the third prior probability that the category of the actual bounding box is the actual category and the fourth prior probability that the category of the first original bounding box is the actual category are taken into account: the discrimination sub-model performs authenticity discrimination on the actual category and on the first prediction category, and the obtained third discrimination probability and fourth discrimination probability are weighted to determine the second discrimination result (i.e., the second discrimination result may include the third weighted probability and the fourth weighted probability), so that the classification loss component related to the category similarity obtained based on the second discrimination result may be expressed as:
$V_{i2}=\tilde{p}^{\,rc}_{i}\,P_{i3}+\tilde{p}^{\,pc}_{i}\,P_{i4}$

where $\tilde{p}^{\,rc}_{i}$ denotes the prior probability (i.e., the third prior probability) of the occurrence of the actual category corresponding to the i-th first original bounding box, $P_{i3}$ denotes the third discrimination probability that the actual category corresponding to the i-th first original bounding box is predicted as true by the discrimination sub-model, $\tilde{p}^{\,pc}_{i}$ denotes the prior probability (i.e., the fourth prior probability) of the occurrence of the first prediction category corresponding to the i-th first original bounding box, and $P_{i4}$ denotes the fourth discrimination probability that the first prediction category corresponding to the i-th first original bounding box is predicted as counterfeit by the discrimination sub-model.
Specifically, since the probability of occurrence of a certain category follows a certain probability distribution, such as a Gaussian distribution, the third prior probability and the fourth prior probability can be obtained by:
$\tilde{p}^{\,rc}_{i}=\mathcal{N}\!\left(c^{r}_{i};\,\mu_3,\,\sigma_3\right)$

where $c^{r}_{i}$ denotes the actual category corresponding to the first original bounding box with sequence number i, $\sigma_3$ denotes the variance of the distribution probability of the actual categories corresponding to the first preset number of first original bounding boxes, and $\mu_3$ denotes the mean value of the distribution probability of the actual categories corresponding to the first preset number of first original bounding boxes.
$\tilde{p}^{\,pc}_{i}=\mathcal{N}\!\left(c^{p}_{i};\,\mu_4,\,\sigma_4\right)$

where $c^{p}_{i}$ denotes the first prediction category corresponding to the first original bounding box with sequence number i, $\sigma_4$ denotes the variance of the distribution probability of the first prediction categories corresponding to the first preset number of first original bounding boxes, and $\mu_4$ denotes the mean value of the distribution probability of the first prediction categories corresponding to the first preset number of first original bounding boxes.
Specifically, the regression classification loss value is equal to the sum of the sub-regression classification loss values corresponding to the first original bounding boxes with the first preset number, and may be specifically expressed as:
wherein N is reg Representing a first preset number, i represents the serial number of the first original boundary frame, and the value of i is 1 to N reg
For the process of obtaining the third discrimination result from the discrimination dimension of bounding box coordinate coincidence in the case that the first prediction category satisfies the preset constraint, the above-mentioned calculating of the bounding box cross-ratio (i.e., intersection-over-union, IoU) loss based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box to obtain the third discrimination result specifically includes:
step C1, calculating boundary frame cross-ratio loss of an actual boundary frame corresponding to the first original boundary frame and a first prediction boundary frame corresponding to the first original boundary frame to obtain first cross-ratio loss;
specifically, if the first prediction category corresponding to the first original bounding box with the sequence number i meets the preset category matching constraint condition, calculating the cross-ratio loss between the actual bounding box with the sequence number i and the first prediction bounding box with the sequence number i, and obtaining the first cross-ratio loss corresponding to the first original bounding box with the sequence number i.
And C2, determining a third judging result corresponding to the first original boundary box based on the first cross ratio loss.
Specifically, since the degree of coincidence of bounding box coordinates can be characterized by the magnitude of the cross-ratio loss between two bounding boxes, the third discrimination result can be obtained based on the cross-ratio loss between the actual bounding box and the first prediction bounding box; the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence is then determined based on the third discrimination result, which further drives the model to perform bounding box regression learning.
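For reference, the cross-ratio (intersection-over-union) between two axis-aligned boxes can be computed as in the sketch below; the (x, y, w, h) box layout follows the anchor feature format mentioned earlier, and taking 1 − IoU as the cross-ratio loss is an assumption of this sketch rather than a statement of the original formula.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes, with (x, y) taken
    as the top-left corner (illustrative convention)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def cross_ratio_loss(actual_box, pred_box):
    """One common convention: loss = 1 - IoU (assumed here)."""
    return 1.0 - iou(actual_box, pred_box)
```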
Further, for the determination of the third discrimination result, only the first cross-ratio loss between the corresponding actual bounding box and first prediction bounding box may be considered. However, in order to improve the accuracy of the third discrimination result, and thereby the accuracy of the second regression loss component corresponding to the discrimination dimension of bounding box coordinate coincidence and, in turn, the accuracy of the regression classification loss value used for adjusting the model parameters, not only the first cross-ratio loss between the corresponding actual bounding box and first prediction bounding box but also the second cross-ratio losses between the actual bounding box and the other first prediction bounding boxes may be considered. This can be achieved by comparing, in terms of bounding box coordinate coincidence, the actual bounding box with a positive example sample (i.e., the first prediction bounding box corresponding to the actual bounding box, obtained through bounding box regression learning) and with negative example samples (i.e., the first prediction bounding boxes other than the one corresponding to the actual bounding box), so that the specific position representation of the actual bounding box is better learned. For this case, the process of determining the third discrimination result based on the first cross-ratio loss and the second cross-ratio losses specifically includes:
step C21, determining a comparison bounding box set from the first prediction bounding boxes respectively corresponding to the first preset number of first original bounding boxes;
the comparison boundary box set comprises other first prediction boundary boxes except for the first prediction boundary box corresponding to the first original boundary box or other first prediction boundary boxes which do not contain the target object outlined by the first original boundary box;
specifically, taking as an example the first original bounding box with sequence number i whose first prediction category satisfies the preset category matching constraint condition, the comparison bounding box set may include all the other first prediction bounding boxes except the first prediction bounding box with sequence number i (i.e., the first prediction bounding boxes with sequence number k, k ≠ p, p = i); that is, all the first prediction bounding boxes other than the one with sequence number i are taken as negative example samples of the actual bounding box with sequence number i. In order to further improve the selection accuracy of the negative example samples, the comparison bounding box set may instead include only those other first prediction bounding boxes that do not contain the target object outlined by the first original bounding box with sequence number i (i.e., the first prediction bounding boxes with sequence number k, k ≠ p, where p = i or p = j, and the first prediction bounding box with sequence number j outlines the same target object as the first original bounding box with sequence number i); that is, only the other first prediction bounding boxes containing a target object different from that of the first original bounding box with sequence number i are taken as negative example samples of the actual bounding box with sequence number i.
step C22, calculating the bounding box cross-ratio losses between the actual bounding box corresponding to the first original bounding box and each of the other first prediction bounding boxes, respectively, to obtain second cross-ratio losses;
specifically, taking as an example the first original bounding box with sequence number i whose first prediction category satisfies the preset category matching constraint condition, for each of the other first prediction bounding boxes in the comparison bounding box set, the cross-ratio loss between the actual bounding box with sequence number i and the first prediction bounding box with sequence number k is calculated, to obtain the second cross-ratio loss corresponding to the first prediction bounding box with sequence number k.
And C23, determining a third judging result corresponding to the first original boundary box based on the first cross-ratio loss and the second cross-ratio loss.
Specifically, in determining the third discrimination result characterizing the bounding box coordinate coincidence, the first cross-ratio loss is calculated based on the actual bounding box with sequence number i and the first prediction bounding box with sequence number i, and the second cross-ratio losses (k ≠ p) are calculated based on the actual bounding box with sequence number i and the first prediction bounding boxes with sequence number k, so as to determine the third discrimination result (i.e., the third discrimination result may include the first cross-ratio loss and the second cross-ratio losses). Then, based on the third discrimination result, the second regression loss component related to the bounding box coordinate coincidence can be determined, so that the coordinate coincidence between the actual bounding box and the first prediction bounding box with sequence number i becomes higher while its coordinate coincidence with the other first prediction bounding boxes becomes smaller, which enhances the global nature of the bounding box regression learning and further improves its accuracy.
In a specific implementation, the second regression loss component is the logarithm of a target cross-ratio, where the target cross-ratio is the quotient of the exponential of the first cross-ratio loss and a sum over the exponentials of the first cross-ratio loss and of the second cross-ratio losses; taking p = i as an example, the second regression loss component may be expressed as:
$V_{i3}=\log\dfrac{\exp\!\big(\mathrm{IoU}(b^{r}_{i},\hat{b}_{i})\big)}{\exp\!\big(\mathrm{IoU}(b^{r}_{i},\hat{b}_{i})\big)+\omega\sum_{k\neq i}\exp\!\big(\mathrm{IoU}(b^{r}_{i},\hat{b}_{k})\big)}$

where $b^{r}_{i}$ denotes the actual bounding box corresponding to the first original bounding box with sequence number i, $b^{o}_{i}$ denotes the first original bounding box with sequence number i, $\hat{b}_{i}=G(b^{o}_{i};\theta_{g1},\theta_{g2})$ denotes the first prediction bounding box corresponding to the first original bounding box with sequence number i in the case that its first prediction category satisfies the preset constraint, $\mathrm{IoU}(b^{r}_{i},\hat{b}_{i})$ denotes the first cross-ratio loss, $b^{o}_{k}$ denotes the first original bounding box with sequence number k, $\hat{b}_{k}$ denotes the first prediction bounding box corresponding to the first original bounding box with sequence number k in the case that the preset constraint is satisfied, $\mathrm{IoU}(b^{r}_{i},\hat{b}_{k})$ denotes the second cross-ratio loss, $\theta_{g1}$ denotes the first model parameter of the generation sub-model related to bounding box regression, $\theta_{g2}$ correspondingly denotes the second model parameter related to target object category prediction, and $\omega$ denotes a preset adjustment factor.
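The sketch below computes a contrastive cross-ratio term of the general shape described above (exponential of the positive-pair IoU divided by a sum of exponentials over positive and negative pairs); the exact placement of the adjustment factor and the sign convention are assumptions of this sketch, not statements of the original formula, and it reuses the iou() helper from the earlier sketch.

```python
import math

def second_regression_loss_component(actual_box, pos_pred_box, neg_pred_boxes,
                                      omega=1.0):
    """Contrastive IoU-based term for one actual bounding box.
    pos_pred_box   : its corresponding first prediction bounding box
    neg_pred_boxes : the comparison bounding box set (negative examples)
    omega          : assumed role of the preset adjustment factor."""
    pos = math.exp(iou(actual_box, pos_pred_box))
    neg = sum(math.exp(iou(actual_box, box)) for box in neg_pred_boxes)
    return math.log(pos / (pos + omega * neg))
```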
Further, in the target detection process the generation sub-model needs to determine not only the position of the target object but also its specific category, so the model parameters of the generation sub-model in the model to be trained include a first model parameter related to bounding box regression and a second model parameter related to target object category prediction, and the two need to be iteratively updated together during model training. On this basis, in order to further improve the accuracy of the regression classification loss value, the preset category matching constraint condition is introduced in the process of determining the sub-regression classification loss value corresponding to the first prediction bounding box: only when the actual category corresponding to the first prediction bounding box matches the first prediction category are both the sub-regression loss value and the sub-classification loss value corresponding to the first prediction bounding box considered; otherwise, only the sub-classification loss value is considered, that is, the regression loss of the corresponding first original bounding box is excluded. In a specific implementation, the generation sub-model outputs a first category prediction result when performing category prediction for the first original bounding box or the first prediction bounding box; the first category prediction result includes the prediction probability that the target object outlined by the first original bounding box or the first prediction bounding box belongs to each candidate category, and the candidate category corresponding to the maximum prediction probability is the first prediction category. In other words, the generation sub-model predicts the category of the target object in the image area of the first original bounding box or of the first prediction bounding box as the first prediction category;
In addition, it should be noted that, in a specific implementation, since the position information of the first original bounding box and of the first prediction bounding box does not deviate greatly, the image features within the two boxes also do not deviate greatly, and the identification of the target object category of the image area within the bounding box is therefore not affected. On this basis, when bounding box prediction and category prediction are performed successively, the category prediction may be performed on the first prediction bounding box to obtain the corresponding first category prediction result; that is, the first prediction bounding box is obtained based on the first original bounding box, and the category prediction is then performed on the first prediction bounding box to obtain the first category prediction result. When bounding box prediction and category prediction are performed synchronously, the category prediction may be performed on the first original bounding box at the same time as the bounding box prediction is performed based on the first original bounding box, to obtain the corresponding first category prediction result; that is, the first prediction bounding box is obtained based on the first original bounding box, and the category prediction is performed on the first original bounding box to obtain the first category prediction result.
Specifically, the preset category matching constraint condition may include a constraint condition of a single matching mode or a constraint condition of a changing matching mode, and the preset category matching constraint condition may be related to the first category prediction result. For the constraint condition of the single matching mode, the category matching constraint condition used in each round of model training remains unchanged (i.e., it is independent of the current number of training rounds); for example, for each round of model training, if the actual category is the same as the first prediction category, it is determined that the first prediction category corresponding to the first original bounding box matches the actual category (i.e., the first prediction category satisfies the preset constraint, meaning that the generation sub-model predicts the category of the target object in the first prediction bounding box as a target category matching the actual category). For the constraint condition of the changing matching mode, the category matching constraint condition used in each round of model training is related to the current number of training rounds; specifically, the constraint condition of the changing matching mode may be a category matching staged constraint condition or a category matching gradual-change constraint condition;
The category matching staged constraint condition may be that, when the current number of model training rounds is less than a first preset number of rounds, the actual category and the first prediction category belong to the same category group, and when the current number of model training rounds is greater than or equal to the first preset number of rounds, the actual category is the same as the first prediction category; that is, a staged category matching constraint can be implemented based on the category matching staged constraint condition and the first category prediction result corresponding to the first original bounding box. The category matching gradual-change constraint condition may be that the sum of a first constraint term and a second constraint term is greater than a preset probability threshold, where the first constraint term is the first prediction probability corresponding to the actual category in a category prediction probability subset, and the second constraint term is the product of a preset adjustment factor and the sum of the second prediction probabilities other than the first prediction probability in the category prediction probability subset, the preset adjustment factor decreasing gradually as the current number of training rounds increases; that is, a gradual-change category matching constraint can be implemented based on the category matching gradual-change constraint condition and the first category prediction result corresponding to the first original bounding box. Specifically, the category prediction probability subset is determined based on the first category prediction result corresponding to the first original bounding box, and includes the first prediction probability that the target object outlined by the first prediction bounding box belongs to the actual category and the second prediction probabilities that it belongs to the non-actual categories in a target group; in other words, the category prediction probability subset includes the first prediction probability under the actual category in the target group and the second prediction probabilities under the non-actual categories in the target group (i.e., the candidate categories in the target group other than the actual category), obtained when the generation sub-model performs category prediction on the first original bounding box or the first prediction bounding box, the target group being the category group in which the actual category is located. In a specific implementation, a plurality of candidate categories associated with the target detection task are predetermined, and the plurality of candidate categories are divided into a plurality of category groups based on the semantic information of each candidate category.
Specifically, because the first original bounding boxes are obtained by region-of-interest extraction using the preset region-of-interest extraction model, the region in which the target object outlined by a first original bounding box is located may not be accurate enough, so that the classification of the corresponding first prediction bounding box may be inaccurate in the initial stage of model training. On this basis, in the process of determining the sub-regression classification loss value corresponding to the first original bounding box, the matching relationship between the first prediction category corresponding to the first original bounding box and the actual category of the first original bounding box is referenced, that is, whether the first prediction bounding box satisfies the preset constraint is determined based on the preset category matching constraint condition;
further, since the generation sub-model is used both for bounding box prediction and for target object category prediction, the first model parameter related to bounding box regression and the second model parameter related to target object category prediction in the generation sub-model need to be iteratively trained during model training. Considering that in the early stage of model training the accuracy of the model parameters related to target object category prediction may be low, so that the category identification of the first prediction bounding box corresponding to the first original bounding box is inaccurate, the requirement on category accuracy is relaxed in the early stage of training: if the actual category corresponding to the first prediction bounding box and the first prediction category belong to the same category group, it is determined that the preset constraint is satisfied. In the later stage of model training the requirement on category accuracy is tightened: only if the actual category corresponding to the first prediction bounding box is the same as the first prediction category is it determined that the preset constraint is satisfied. On this basis, the preset category matching constraint condition may include the constraint condition of the changing matching mode described above (such as the category matching staged constraint condition or the category matching gradual-change constraint condition);
Further, in order to make the transition between the two matching constraint branches that define when the first prediction category satisfies the above-mentioned preset constraint (i.e. the first prediction category belongs to the target group, and the first prediction category is the same as the actual category) smoother, so that as the number of model training rounds increases the preset category matching constraint gradually changes from requiring only that the first prediction category belongs to the target group to requiring that the first prediction category is the same as the actual category, the above-mentioned preset category matching constraint condition preferably includes: the category matching gradual change constraint condition.
In a specific implementation, for the case that the preset category matching constraint condition is a category matching gradual change constraint condition, taking a first original bounding box with a sequence number i as an example, the category matching gradual change constraint condition may be expressed as:
P_i(\mathrm{real}_i) + \beta \sum_{f \in \mathrm{groups} \setminus \mathrm{real}_i} P_i(f) > \mu

wherein groups represents the target group, real_i represents the actual class of the first original bounding box with sequence number i in the target group groups, f ∈ groups\real_i represents a non-actual class in the target group, β represents the preset adjustment factor, P_i(real_i) represents the first prediction probability (i.e. the first constraint term mentioned above), P_i(f) represents a second prediction probability, β·Σ_{f ∈ groups\real_i} P_i(f) represents the second constraint term, and μ represents the preset probability threshold. Specifically, the larger P_i(real_i) is, the closer the first prediction class is to the actual class; since the preset adjustment factor decreases as the current number of training rounds increases, the weight of the second constraint term gradually decreases, so that in the later period of model training whether the first prediction class matches the actual class is mainly determined by the first constraint term (i.e. the first prediction probability under the actual class), and once the current number of model training rounds reaches a certain value the second constraint term becomes zero, i.e. when P_i(real_i) > μ it is determined that the generation sub-model predicts the actual class as the first prediction class.
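As an illustration, a minimal Python sketch of evaluating the category matching gradual change constraint condition from a class prediction probability subset follows; the dictionary representation of the probabilities and the example threshold value are assumptions:

def gradual_constraint_satisfied(class_probs, target_group, actual_class, beta, mu=0.5):
    # class_probs: per-candidate-category prediction probabilities for one box.
    first_term = class_probs[actual_class]                    # first constraint term
    second_term = beta * sum(class_probs[f]                   # second constraint term
                             for f in target_group if f != actual_class)
    return first_term + second_term > mu

# Early in training (beta close to 1) a prediction that spreads probability
# over the correct group can still satisfy the constraint; later (beta = 0)
# the first prediction probability alone must exceed the threshold.
probs = {"cat": 0.35, "dog": 0.30, "horse": 0.05, "car": 0.30}
print(gradual_constraint_satisfied(probs, {"cat", "dog", "horse"}, "cat", beta=1.0))  # True
print(gradual_constraint_satisfied(probs, {"cat", "dog", "horse"}, "cat", beta=0.0))  # False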
Specifically, the preset adjustment factor decreases as the current number of model training rounds increases: if the current number of model training rounds is smaller than or equal to the target training round number, the second constraint term is positively correlated with the preset adjustment factor, and the preset adjustment factor is negatively correlated with the current number of model training rounds; if the current number of model training rounds is larger than the target training round number, the second constraint term is zero, wherein the target training round number is smaller than the total number of training rounds.
In specific implementation, in order to ensure the adjustment smoothness of the preset adjustment factor, the value of the preset adjustment factor β may be gradually reduced by adopting a linearly decreasing adjustment manner, so that the determination process of the preset adjustment factor used for the current model training specifically includes:
(1) For first-round model training, determining a first preset value as the preset adjustment factor used for the current model training;
specifically, the first preset value may be set according to actual requirements; to simplify the adjustment, the first preset value may be set to 1, that is, the preset adjustment factor β = 1, so that in the case of first-round model training the above-mentioned category matching gradual change constraint condition may be:

P_i(\mathrm{real}_i) + \sum_{f \in \mathrm{groups} \setminus \mathrm{real}_i} P_i(f) > \mu
That is, for first round model training, it is determined whether a first predicted class corresponding to a first original bounding box matches an actual class based on a sum of a first predicted probability and a second predicted probability corresponding to a target group.
(2) For non-first-round model training, determining the preset adjustment factor used for the current model training according to a factor-decreasing adjustment manner, based on the current number of model training rounds, the target training round number and the first preset value.
Specifically, if the preset adjustment factor β = 1 for first-round model training, then in the case of non-first-round model training the category matching gradual change constraint condition may be:

P_i(\mathrm{real}_i) + \beta \sum_{f \in \mathrm{groups} \setminus \mathrm{real}_i} P_i(f) > \mu, \quad 0 \le \beta < 1

that is, for non-first-round model training, the second constraint term β·Σ_{f ∈ groups\real_i} P_i(f) in the above-mentioned category matching gradual change constraint condition participates with a gradually decreasing weight as the number of model training rounds increases.
For example, the decreasing formula corresponding to the factor-decreasing adjustment manner may be:

\beta = \max\left(1 - \frac{\delta - 1}{Z},\ 0\right)

wherein max(·, 0) denotes taking the maximum of the enclosed expression and 0, the first term 1 represents the first preset value (i.e. the preset adjustment factor β used for first-round training), δ represents the current number of model training rounds, and Z represents the target training round number. The target training round number may be the total number of training rounds minus 1, or may be a designated training round number smaller than the total number of training rounds, where the difference between the total number of training rounds and the designated training round number is a preset round number Q with Q greater than 2; that is, the preset adjustment factor β is set to 0 from a certain round (not the last round) in the later stage of model training, so that the judgment condition used in model training from δ = Z + 1 up to the last round is P_i(\mathrm{real}_i) > \mu.
It should be noted that, for the case where the target training round number Z is the total number of training rounds minus 1 (denote the total number of training rounds by T, so Z = T − 1), the decreasing formula may be β = max(1 − (δ − 1)/(T − 1), 0), that is, the preset adjustment factor becomes 0 only in the last round of model training, and the judgment condition used in the last round is P_i(real_i) > μ. In addition, the above decreasing formula is only a relatively simple linear decreasing adjustment manner; in practical application the decreasing rate of the preset adjustment factor β may be set according to actual requirements, so the decreasing formula does not limit the protection scope of the present application.
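A small Python sketch of the factor-decreasing adjustment follows, assuming the linearly decreasing form reconstructed above, beta = max(1 - (delta - 1)/Z, 0); as noted, other decreasing rates are equally admissible:

def adjustment_factor(current_round, target_round):
    # Preset adjustment factor beta for the given 1-based training round.
    return max(1.0 - (current_round - 1) / target_round, 0.0)

# Example with 10 total training rounds and Z = 9 (total minus 1): beta
# reaches 0 only in the last round, where the judgment condition degenerates
# to the first prediction probability alone exceeding the threshold.
total_rounds, Z = 10, 9
print([round(adjustment_factor(d, Z), 2) for d in range(1, total_rounds + 1)])
# [1.0, 0.89, 0.78, 0.67, 0.56, 0.44, 0.33, 0.22, 0.11, 0.0]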
In addition, in specific implementation, the preset category matching constraint condition may relate not only to the first category prediction result but also to the third discrimination probability and the fourth discrimination probability corresponding to the first original bounding box. Specifically, considering that the sub-classification loss value (i.e. the classification loss component) corresponding to the first original bounding box is determined based on the third discrimination probability and the fourth discrimination probability, these two probabilities can reflect the degree of class similarity between the first prediction class and the actual class, so the preset category matching constraint condition may also be related to them. Specifically, the constraint condition of the single matching manner may be that the third discrimination probability is smaller than a first value and the fourth discrimination probability is smaller than a second value (i.e. independent of the current number of model training rounds); the smaller the third and fourth discrimination probabilities are, the smaller the corresponding sub-classification loss value is, which indicates that it is harder for the discrimination sub-model to distinguish the first prediction class from the actual class, i.e. the first prediction class is more similar to the actual class, and that the class of the target object in the first prediction bounding box is predicted by the generation sub-model as a target class matched with the actual class. Correspondingly, the constraint condition of the change matching mode may be that, when the current number of model training rounds is smaller than the first preset number of rounds, the third discrimination probability is smaller than a third value and the fourth discrimination probability is smaller than a fourth value, and when the current number of model training rounds is greater than or equal to the first preset number of rounds, the third discrimination probability is smaller than the first value and the fourth discrimination probability is smaller than the second value (i.e. related to the current number of model training rounds), where the third value is larger than the first value and the fourth value is larger than the second value; that is, as the number of model training rounds increases, the first value and the second value that determine whether the preset category matching constraint condition is satisfied become smaller, requiring the first prediction category to be closer to the actual category, thereby realizing the staged category matching constraint.
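For illustration only, a sketch of the change-matching variant based on the third and fourth discrimination probabilities; the concrete threshold values are assumptions:

def discrimination_constraint_satisfied(p3, p4, current_round, first_preset_rounds):
    # Looser thresholds (third value, fourth value) early in training,
    # tighter thresholds (first value, second value) afterwards.
    if current_round < first_preset_rounds:
        third_value, fourth_value = 0.4, 0.4    # assumed looser thresholds
        return p3 < third_value and p4 < fourth_value
    first_value, second_value = 0.2, 0.2        # assumed tighter thresholds
    return p3 < first_value and p4 < second_value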
For the determination process of the loss compensation value, the above-mentioned calculating of a loss compensation value for constraining the loss gradient of the loss function of the model to be trained, based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box, specifically includes:
step D1, generating a synthetic boundary frame corresponding to the first original boundary frame based on the actual boundary frame and the first prediction boundary frame corresponding to the first original boundary frame;
specifically, taking a first original boundary box with a sequence number of i as an example, according to a preset coordinate information sampling mode, determining a sampling coordinate information set based on a first coordinate information set corresponding to an actual boundary box with the sequence number of i and a second coordinate information set corresponding to a first prediction boundary box with the sequence number of i; based on the set of sample coordinate information, a synthetic bounding box with a sequence number i is determined.
And D2, determining a loss compensation value based on the boundary frame distribution similarity degree of the synthesized boundary frame corresponding to the first original boundary frame and the actual boundary frame.
Specifically, after the synthetic bounding box corresponding to the first original bounding box with sequence number i is determined (denote it by \tilde{b}_i and the actual bounding box with sequence number i by b_i), the degree of bounding-box distribution similarity between the synthetic bounding box with sequence number i and the actual bounding box with sequence number i is calculated, i.e. D(\tilde{b}_i, b_i); then the compensation gradient of this distribution similarity with respect to the synthetic bounding box is calculated, i.e. \nabla_{\tilde{b}_i} D(\tilde{b}_i, b_i); and the loss compensation value corresponding to the first original bounding box with sequence number i is determined based on the matrix two-norm of the compensation gradient, i.e. \lVert \nabla_{\tilde{b}_i} D(\tilde{b}_i, b_i) \rVert_2.
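The computation can be sketched as follows with automatic differentiation; the stand-in similarity function is an assumption that replaces the discrimination sub-model's distribution-similarity output, which is not reproduced here:

import torch

def loss_compensation_value(synthetic_box, actual_box):
    synthetic_box = synthetic_box.clone().requires_grad_(True)
    # Stand-in distribution-similarity score; a real system would query the
    # discrimination sub-model here instead.
    similarity = -torch.sum((synthetic_box - actual_box) ** 2)
    grad, = torch.autograd.grad(similarity, synthetic_box)   # compensation gradient
    return torch.linalg.norm(grad)                           # two-norm of the gradient

# Example with (x1, y1, x2, y2) box coordinates.
print(loss_compensation_value(torch.tensor([10.0, 10.0, 50.0, 60.0]),
                              torch.tensor([12.0, 11.0, 49.0, 58.0])))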
Specifically, for the determination process of the synthetic bounding box corresponding to a certain first original bounding box, the step D1 generates the synthetic bounding box corresponding to the first original bounding box based on the actual bounding box corresponding to the first original bounding box and the first prediction bounding box, and specifically includes:
step D11, determining a first coordinate information subset based on a first sampling proportion and a first coordinate information set of an actual boundary box corresponding to the first original boundary box;
step D12, determining a second coordinate information subset based on a second sampling proportion and a second coordinate information set of a first prediction boundary box corresponding to the first original boundary box; the first sampling ratio and the second sampling ratio may be preset according to actual conditions, and a sum of the first sampling ratio and the second sampling ratio is equal to 1;
and step D13, generating a synthetic boundary box corresponding to the first original boundary box based on the first coordinate information subset and the second coordinate information subset.
Specifically, taking the first original bounding box with sequence number i as an example, random sampling is performed in the first coordinate information set of the actual bounding box with sequence number i according to the first sampling proportion to obtain the first coordinate information subset; random sampling is performed in the second coordinate information set of the first prediction bounding box with sequence number i according to the second sampling proportion to obtain the second coordinate information subset; the combination of the first coordinate information subset and the second coordinate information subset is determined as the sampling coordinate information set, and a bounding box is drawn based on the sampling coordinate information set, namely the synthetic bounding box with sequence number i. The synthetic bounding box is thus obtained by randomly sampling and mixing the coordinate information of the actual bounding box with sequence number i (i.e. real data) and the coordinate information of the first prediction bounding box with sequence number i (i.e. generated data), so that part of its coordinate information comes from the real data and the other part from the generated data; that is, the synthetic bounding box is jointly determined by the real data and the generated data and has a certain randomness. Therefore, the gradient of the regression classification loss value can be compensated in the case where the gradient of the regression loss corresponding to the first discrimination dimension suddenly decreases or even becomes zero, which avoids the problem of the gradient of the regression classification loss value suddenly decreasing or vanishing during model training, thereby improving the training accuracy of the model parameters.
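A minimal Python sketch of steps D11 to D13 follows, assuming a four-value (x1, y1, x2, y2) coordinate layout and an equal split between the two sampling proportions; both are illustrative choices:

import random

def synthesize_box(actual_box, predicted_box, first_ratio=0.5):
    # Randomly choose which coordinate entries come from the actual box
    # (real data); the remaining entries come from the predicted box
    # (generated data). first_ratio + second_ratio = 1.
    n = len(actual_box)
    k = round(first_ratio * n)                     # size of the first coordinate subset
    real_idx = set(random.sample(range(n), k))
    return [actual_box[i] if i in real_idx else predicted_box[i] for i in range(n)]

actual = [10.0, 10.0, 50.0, 60.0]      # actual bounding box with sequence number i
predicted = [12.0, 9.0, 48.0, 61.0]    # first prediction bounding box with sequence number i
print(synthesize_box(actual, predicted))   # e.g. [10.0, 9.0, 48.0, 60.0]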
In a specific implementation, the model to be trained includes a generating sub-model and a discriminating sub-model, as shown in fig. 4b, which provides a schematic diagram of a specific implementation principle of a training process of another target detection model, and specifically includes:
(1) Extracting a target region from a preset sample image set by using a preset region of interest extraction model in advance to obtain X anchor frames;
(2) Randomly sampling N anchor frames from the X anchor frames as first original boundary frames aiming at each round of model training, and determining an actual boundary frame and an actual category corresponding to each first original boundary frame respectively;
(3) Aiming at each first original boundary frame, generating a sub-model, carrying out boundary frame prediction based on the first original boundary frame to obtain a first prediction boundary frame, and carrying out target class prediction on the first original boundary frame or an image area in the first prediction boundary frame to obtain a first class prediction result; inputting the first prediction boundary box, the corresponding first category prediction result, the actual boundary box and the corresponding actual category into a judging sub-model; the judging sub-model generates a judging result set based on an actual boundary frame and a first predicting boundary frame corresponding to the first original boundary frame, an actual category and a first category predicting result corresponding to the first original boundary frame and a certain preset category matching constraint condition;
Specifically, whether the first prediction category meets the preset constraint is determined according to the preset category matching constraint condition; if the first prediction category does not meet the preset constraint, the judging result set comprises a second judging result and a fourth judging result (at the moment, the first judging result and the third judging result can be empty or preset information), correspondingly, a classification loss component is determined based on the second judging result in the judging result set of the first original boundary frame, a loss compensation component is determined based on the fourth judging result in the judging result set of the first original boundary frame, and then a sub-regression classification loss value corresponding to the first original boundary frame is determined based on the classification loss component and the loss compensation component; if the first prediction category meets the preset constraint, determining a first regression loss component based on the first discrimination result in the discrimination result set of the first original boundary frame, determining a classification loss component based on the second discrimination result in the discrimination result set of the first original boundary frame, determining a second regression loss component based on the third discrimination result in the discrimination result set of the first original boundary frame, determining a loss compensation component based on the fourth discrimination result in the discrimination result set of the first original boundary frame, and determining a sub regression classification loss value corresponding to the first original boundary frame based on the first regression loss component, the classification loss component, the second regression loss component and the loss compensation component;
It should be noted that, because whether the first prediction category and the actual category meet the preset category matching constraint condition (that is, whether the first prediction category meets the preset constraint) is considered in the process of generating the discrimination result set corresponding to each first original boundary frame by the discrimination sub-model, for the case that the first prediction category and the actual category do not meet the preset category matching constraint condition, only the first prediction category and the actual category need to be subjected to true-false discrimination to obtain a second discrimination result, and a loss compensation value is calculated to obtain a fourth discrimination result, and the true-false discrimination of the first prediction boundary frame and the actual boundary frame is not needed to be performed to obtain a first discrimination result, and the computation of the cross-ratio loss of the first prediction boundary frame and the actual boundary frame is also not needed to be performed to obtain a third discrimination result, that is, the first discrimination result and the third discrimination result are directly determined to be null or preset information, so that the model training efficiency can be further improved;
that is, in the process of determining the set of discrimination results corresponding to the first original bounding box, the first prediction bounding box and the actual bounding box may be directly subjected to true-false discrimination to obtain a first discrimination result, the first prediction class and the actual class may be subjected to true-false discrimination to obtain a second discrimination result, the first prediction bounding box and the actual bounding box may be subjected to calculation of the cross-ratio loss to obtain a third discrimination result, and the loss compensation value may be calculated to obtain a fourth discrimination result, so as to generate a set of discrimination results; further, whether a first regression loss component and a second regression loss component corresponding to the first original boundary box are considered or not is determined based on whether the first prediction category meets the preset constraint, namely whether a corresponding sub regression loss value is zero or not is determined; or determining whether to directly empty or preset the corresponding first discrimination result and third discrimination result based on whether the first prediction category meets the preset constraint, obtaining a discrimination result set, and determining a corresponding sub-regression classification loss value based on the discrimination result set;
In addition, it should be noted that, in the specific implementation, the actual category corresponding to each first original bounding box may be input into the generating sub-model, and the generating sub-model determines whether the first prediction category meets the preset constraint based on the preset category matching constraint condition, the actual category and the first category prediction result; if the first prediction category does not meet the preset constraint, only the sub-classification loss value corresponding to the first original boundary frame is considered, and the corresponding sub-regression loss value is not required to be calculated, so that the boundary frame prediction of the first original boundary frame is not required, and the data processing capacity of generating the sub-model can be further reduced;
(4) Determining a regression classification loss value of the model to be trained based on the sub regression classification loss values respectively corresponding to the first original boundary boxes; adjusting model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value by using a stochastic gradient descent method to obtain a generating sub-model and a judging sub-model after parameter updating;
(5) If the current model iteration training result meets the model iteration training termination condition, determining the updated generation sub-model as a trained target detection model; and if the current model iteration training result does not meet the model iteration training termination condition, determining the updated generation sub-model and the updated judgment sub-model as a model to be trained for the next round of model training until the model iteration training termination condition is met.
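Putting steps (2) to (4) together, the following Python sketch outlines one training round; gen, disc, loss_fn and optimizer are placeholders for the generation sub-model, the discrimination sub-model, the assembly of the sub regression classification loss from a discrimination result set, and a stochastic-gradient-descent optimizer, and their exact interfaces are assumptions:

import random

def train_one_round(gen, disc, loss_fn, optimizer, anchors, actual_boxes,
                    actual_classes, n_sample, current_round):
    # Step (2): randomly sample N first original bounding boxes.
    idx = random.sample(range(len(anchors)), n_sample)
    sub_losses = []
    for i in idx:
        # Step (3): the generation sub-model predicts box and class, the
        # discrimination sub-model produces the discrimination result set.
        pred_box, cls_result = gen(anchors[i])
        result_set = disc(actual_boxes[i], pred_box, actual_classes[i],
                          cls_result, current_round)
        sub_losses.append(loss_fn(result_set))      # sub regression classification loss
    # Step (4): average the sub losses and update both sub-models.
    loss = sum(sub_losses) / len(sub_losses)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss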
In the target detection model training method in the embodiment of the application, in a model training stage, a first judging result representing the similarity degree of boundary frame distribution is output through judging the sub-model based on an actual boundary frame and a first prediction boundary frame obtained by a first original boundary frame, so that model parameters related to boundary frame regression are continuously updated, generation of the sub-model is promoted to continuously learn the boundary frame distribution, the predicted first prediction boundary frame is enabled to be more similar to the actual boundary frame, and therefore accuracy, model generalization and data migration of boundary frame prediction of the trained target detection model on the position of a target object are improved; in the model training stage, a second judging result representing the class similarity between the first predicting class and the actual class is output through a judging sub-model based on the actual class and the first predicting class corresponding to the first original boundary box, so that the generation sub-model continuously learns the class of the target object of the image area in the boundary box, the predicted first predicting class is more similar to the actual class, and model parameters related to the target object class prediction are continuously updated by means of the judging sub-model judging result instead of the predicting class in the model training process, so that the target classifying accuracy, the model generalization and the data migration of the trained target detection model are improved; the judging result set output by the judging sub-model not only comprises a first judging result and a second judging result, but also comprises a third judging result representing the coordinate coincidence degree of the boundary frame, so that the effect of compensating the regression loss of the boundary frame caused by similar distribution of the boundary frame but specific position deviation is achieved, the regression classification loss value of the model to be trained is determined based on the judging result set, and model parameters of the generating sub-model and the judging sub-model are repeatedly updated continuously based on the regression classification loss value, so that the accuracy of the model parameters updated based on the regression classification loss value is higher due to the fact that the accuracy of the regression classification loss value obtained based on the judging result set is higher, and the accuracy of the position mark of the target object and the classification of the target object in the target detection process is ensured simultaneously.
Corresponding to the method for training the target detection model described in fig. 1 to fig. 4b, based on the same technical concept, the embodiment of the present application further provides a target detection method, fig. 5 is a flowchart of the target detection method provided in the embodiment of the present application, where the method in fig. 5 can be performed by an electronic device provided with a target detection apparatus, and the electronic device may be a terminal device or a designated server, where a hardware device for target detection (i.e. the electronic device provided with the target detection apparatus) and a hardware device for training the target detection model (i.e. the electronic device provided with the target detection model training apparatus) may be the same or different, and as shown in fig. 5, the method at least includes the following steps:
s502, obtaining M second original boundary boxes; the second original bounding box is obtained by extracting a target area of an image to be detected by using a preset interested area extraction model, and M is a third preset number;
specifically, the process of obtaining the third preset number of second original bounding boxes may refer to the process of obtaining the first preset number of first original bounding boxes, which is not described herein.
S504, inputting the second original boundary boxes into a target detection model to carry out target detection, and obtaining a second prediction boundary box and a second prediction category corresponding to each second original boundary box; the target detection model is obtained based on the training method of the target detection model, and the specific training process of the target detection model is referred to the above embodiment and will not be described herein.
Specifically, the object detection model includes a generation sub-model; for each second original bounding box: in the target detection process, the generation sub-model predicts based on the second original boundary frame to obtain a second prediction boundary frame and a second prediction category corresponding to the second original boundary frame; the model parameters of the generating sub-model comprise first model parameters related to the regression of the boundary frame and second model parameters related to the target classification, so that the generating sub-model not only can conduct boundary frame prediction based on the second original boundary frame to obtain a second prediction boundary frame corresponding to the second original boundary frame, but also can conduct category prediction based on the second original boundary frame to obtain a second prediction category corresponding to the second original boundary frame.
In implementation, the generation sub-model performs category prediction on the second original bounding box or the second prediction bounding box, and the output result may be a second category prediction result; the second category prediction result includes the prediction probability that the target object outlined by the second original bounding box or the second prediction bounding box belongs to each candidate category, and the candidate category corresponding to the maximum prediction probability is the second prediction category, that is, the category of the target object in the image area within the second original bounding box or the second prediction bounding box is predicted by the generation sub-model as the second prediction category. In addition, in specific implementation, considering that the position information of the second original bounding box and the second prediction bounding box does not deviate greatly, the image features within the two boxes also do not deviate greatly, so the identification of the target object category of the image area in the bounding box is not affected. Based on this, for the case where bounding box prediction and category prediction are performed successively, category prediction may be performed on the second prediction bounding box to obtain the corresponding second category prediction result, that is, the second prediction bounding box is first obtained based on the second original bounding box, and then category prediction is performed on the second prediction bounding box to obtain the second category prediction result; and for the case where bounding box prediction and category prediction are performed synchronously, category prediction may be performed on the second original bounding box at the same time as bounding box prediction is performed based on the second original bounding box, to obtain the corresponding second category prediction result, that is, the second prediction bounding box is obtained based on the second original bounding box and category prediction is performed on the second original bounding box to obtain the second category prediction result.
And S506, generating a target detection result of the image to be detected based on the second prediction boundary box and the second prediction category corresponding to each second original boundary box.
Specifically, based on the second prediction bounding boxes and the second prediction categories corresponding to the second original bounding boxes, the number of target objects contained in the image to be detected and the category to which each target object belongs can be determined, for example, the image to be detected contains a cat, a dog and a pedestrian.
In implementation, the object detection model includes a generation sub-model, as shown in fig. 6, which provides a schematic diagram of a specific implementation principle of an object detection process, and specifically includes:
extracting target areas of the image to be detected by using a preset interested area extraction model to obtain P anchor frames;
randomly sampling M anchor frames from the P anchor frames to serve as a second original boundary frame;
aiming at each second original boundary frame, generating a sub-model to conduct boundary frame prediction based on the second original boundary frame to obtain a second prediction boundary frame, and conducting category prediction on the second prediction boundary frame to obtain a second prediction category;
and generating a target detection result of the image to be detected based on the second prediction boundary boxes and the second prediction categories corresponding to the second original boundary boxes.
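A condensed Python sketch of this detection flow follows; roi_extract, gen and m_sample are illustrative names standing in for the preset region-of-interest extraction model, the trained generation sub-model and the sample size M:

import random

def detect(image, roi_extract, gen, m_sample):
    anchors = roi_extract(image)                             # P anchor boxes
    second_original = random.sample(anchors, min(m_sample, len(anchors)))
    detections = []
    for box in second_original:
        pred_box, pred_class = gen(box)                      # second prediction box and class
        detections.append((pred_box, pred_class))
    return detections                                        # target detection result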
It should be noted that, the target detection model obtained based on the training of the target detection model training method can be applied to any specific application scenario in which target detection is required to be performed on an image to be detected, where the image to be detected may be acquired by an image acquisition device disposed at a certain site position, and the corresponding target detection device may belong to the image acquisition device, and may specifically be an image processing device in the image acquisition device, where the image processing device receives the image to be detected transmitted by the image acquisition device in the image acquisition device, and performs target detection on the image to be detected; the object detection means may also be a separate object detection device independent of the image acquisition device, which receives the image to be detected of the image acquisition device and performs object detection on the image to be detected.
Specifically, for a specific application scenario of target detection, for example, an image to be detected may be acquired by an image acquisition device disposed at a certain public place entrance (such as a mall entrance, a subway entrance, a scenic spot entrance, or a performance site entrance, etc.), and the corresponding target object to be detected in the image to be detected is a target user entering the public place, and the target detection model is used to perform target detection on the image to be detected, so as to define a second prediction bounding box containing the target user entering the public place in the image to be detected, and determine a second prediction category corresponding to the second prediction bounding box (i.e., a category to which the target user contained in the second prediction bounding box belongs, such as at least one of age, gender, height, and occupation), so as to obtain a target detection result of the image to be detected; then, determining a user group identification result (such as the flow of people entering the public place or the attribute of the user group entering the public place) based on the target detection result, and further executing corresponding business processing (such as automatically triggering entry limit prompting operation or pushing information to the target user) based on the user group identification result; the higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result of the image to be detected output by the target detection model, and therefore, the higher the accuracy of triggering execution of corresponding business processing based on the target detection result.
For another example, the image to be detected may be acquired by an image acquisition device disposed at each monitoring point in a certain cultivation base, and the corresponding object to be detected in the image to be detected is a target cultivation object in the monitoring point, and the object detection model is used to perform object detection on the image to be detected, so as to define a second prediction boundary box containing the target cultivation object in the image to be detected, and determine a second prediction category corresponding to the second prediction boundary box (i.e. a category to which the target cultivation object contained in the second prediction boundary box belongs, such as at least one of a living body state and a body size), so as to obtain a target detection result of the image to be detected; then, determining a breeding object group identification result (such as the survival rate of the target breeding objects in the breeding monitoring point or the growth rate of the target breeding objects in the breeding monitoring point) based on the target detection result, and further executing corresponding control operation (such as automatically sending out alarm prompt information if the survival rate is detected to be reduced or automatically controlling to increase the feeding amount or the feeding frequency if the growth rate is detected to be reduced) based on the breeding object group identification result; the higher the accuracy of the model parameters of the target detection model, the higher the accuracy of the target detection result of the image to be detected output by the target detection model, and therefore, the higher the accuracy of triggering execution of the corresponding control operation based on the target detection result.
In the target detection method, in the target detection process, firstly, a plurality of candidate bounding boxes are extracted by using a preset region of interest extraction model, and then a certain number of candidate bounding boxes are randomly sampled in the candidate bounding boxes to serve as second original bounding boxes; aiming at each second original boundary frame, generating a sub-model to conduct boundary frame prediction and category prediction based on the second original boundary frame, and obtaining a second prediction boundary frame and a second prediction category; then, generating a target detection result of the image to be detected based on a second prediction boundary box and a second prediction category corresponding to each second original boundary box; in the model parameter training process of the generation sub-model, a first judging result representing the similarity degree of the boundary frame distribution is output through judging the sub-model based on an actual boundary frame and a first prediction boundary frame obtained by a first original boundary frame, so that model parameters related to boundary frame regression are continuously updated, the generation sub-model is continuously caused to learn the boundary frame distribution, the predicted first prediction boundary frame is enabled to be more close to the actual boundary frame, and therefore accuracy, model generalization and data migration of boundary frame prediction of the position of a target object by a trained target detection model are improved; in the model training stage, a second judging result representing the class similarity between the first predicting class and the actual class is output through a judging sub-model based on the actual class and the first predicting class corresponding to the first original boundary box, so that the generation sub-model continuously learns the class of the target object of the image area in the boundary box, the predicted first predicting class is more similar to the actual class, and model parameters related to the target object class prediction are continuously updated by means of the judging sub-model judging result instead of the predicting class in the model training process, so that the target classifying accuracy, the model generalization and the data migration of the trained target detection model are improved; the judging result set output by the judging sub-model not only comprises a first judging result and a second judging result, but also comprises a third judging result representing the coordinate coincidence degree of the boundary frame, so that the effect of compensating the regression loss of the boundary frame caused by similar distribution of the boundary frame but specific position deviation is achieved, the regression classification loss value of the model to be trained is determined based on the judging result set, and model parameters of the generating sub-model and the judging sub-model are repeatedly updated continuously based on the regression classification loss value, so that the accuracy of the model parameters updated based on the regression classification loss value is higher due to the fact that the accuracy of the regression classification loss value obtained based on the judging result set is higher, and the accuracy of the position mark of the target object and the classification of the target object in the target detection process is ensured simultaneously.
It should be noted that, in this application, the embodiment and the previous embodiment in this application are based on the same inventive concept, so the specific implementation of this embodiment may refer to the implementation of the foregoing object detection model training method, and the repetition is not repeated.
Corresponding to the above-mentioned target detection model training method described in fig. 1 to fig. 4b, based on the same technical concept, the embodiment of the present application further provides a target detection model training device, and fig. 7 is a schematic diagram of module composition of the target detection model training device provided in the embodiment of the present application, where the device is used to execute the target detection model training method described in fig. 1 to fig. 4b, and as shown in fig. 7, the device includes:
a first bounding box obtaining module 702 configured to obtain N first original bounding boxes, and obtain an actual bounding box corresponding to each of the first original bounding boxes and an actual category corresponding to the actual bounding box; the first original boundary box is obtained by extracting a target region from a preset sample image set by using a preset region of interest extraction model, and N is a positive integer greater than 1;
the model training module 704 is configured to input the first original bounding box, the actual bounding box and the actual class into a model to be trained for model iterative training until a current model iterative training result meets a model iterative training termination condition, and obtain a target detection model;
The model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first original bounding boxes: the generation sub-model predicts based on the first original boundary box to obtain a first prediction boundary box and a first prediction category; the judging sub-model generates a judging result set based on an actual boundary box corresponding to the first original boundary box, a first prediction boundary box corresponding to the first original boundary box, the actual category and the first prediction category; the judging result set comprises a first judging result, a second judging result and a third judging result, wherein the first judging result represents the boundary frame distribution similarity degree of the first prediction boundary frame and the actual boundary frame under the condition that the preset constraint is met, the preset constraint is that the class of the target object in the first prediction boundary frame is predicted by the generating sub-model to be a target class matched with the actual class, the second judging result represents the class similarity degree between the first prediction class and the actual class, and the third judging result represents the boundary frame coordinate coincidence degree of the first prediction boundary frame and the actual boundary frame under the condition that the preset constraint is met; determining regression classification loss values of the models to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original boundary box; and updating model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value.
In the target detection model training device in the embodiment of the application, in a model training stage, a first judging result representing the similarity degree of boundary frame distribution is output through judging the sub-model based on an actual boundary frame and a first prediction boundary frame obtained by a first original boundary frame, so that model parameters related to boundary frame regression are continuously updated, generation of the sub-model is promoted to continuously learn the boundary frame distribution, the predicted first prediction boundary frame is enabled to be more similar to the actual boundary frame, and therefore accuracy, model generalization and data migration of boundary frame prediction of a trained target detection model on the position of a target object are improved; in the model training stage, a second judging result representing the class similarity between the first predicting class and the actual class is output through a judging sub-model based on the actual class and the first predicting class corresponding to the first original boundary box, so that the generation sub-model continuously learns the class of the target object of the image area in the boundary box, the predicted first predicting class is more similar to the actual class, and model parameters related to the target object class prediction are continuously updated by means of the judging sub-model judging result instead of the predicting class in the model training process, so that the target classifying accuracy, the model generalization and the data migration of the trained target detection model are improved; the judging result set output by the judging sub-model not only comprises a first judging result and a second judging result, but also comprises a third judging result representing the coordinate coincidence degree of the boundary frame, so that the effect of compensating the regression loss of the boundary frame caused by similar distribution of the boundary frame but specific position deviation is achieved, the regression classification loss value of the model to be trained is determined based on the judging result set, and model parameters of the generating sub-model and the judging sub-model are repeatedly updated continuously based on the regression classification loss value, so that the accuracy of the model parameters updated based on the regression classification loss value is higher due to the fact that the accuracy of the regression classification loss value obtained based on the judging result set is higher, and the accuracy of the position mark of the target object and the classification of the target object in the target detection process is ensured simultaneously.
It should be noted that, in the present application, the embodiment of the object detection model training device and the specific embodiment of the object detection model training method in the present application are based on the same inventive concept, so that the specific implementation of the embodiment may refer to the implementation of the corresponding object detection model training method, and the repetition is omitted.
Corresponding to the above-mentioned target detection methods described in fig. 5 to 6, based on the same technical concept, an embodiment of the present application further provides a target detection apparatus, and fig. 8 is a schematic block diagram of the target detection apparatus provided in the embodiment of the present application, where the apparatus is configured to perform the target detection method described in fig. 5 to 6, and as shown in fig. 8, the apparatus includes:
a second bounding box acquisition module 802 configured to acquire M second original bounding boxes; the second original boundary box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
the target detection module 804 is configured to input the second original bounding boxes into a target detection model to perform target detection, so as to obtain a second prediction bounding box and a second prediction category corresponding to each second original bounding box;
The detection result generating module 806 is configured to generate a target detection result of the image to be detected based on the second prediction bounding box and the second prediction category corresponding to each of the second original bounding boxes.
In the target detection process, firstly extracting a plurality of alternative bounding boxes by using a preset region of interest extraction model, and randomly sampling a certain number of alternative bounding boxes in the alternative bounding boxes to serve as a second original bounding box; aiming at each second original boundary frame, generating a sub-model to conduct boundary frame prediction and category prediction based on the second original boundary frame, and obtaining a second prediction boundary frame and a second prediction category; then, generating a target detection result of the image to be detected based on a second prediction boundary box and a second prediction category corresponding to each second original boundary box; in the model parameter training process of the generation sub-model, a first judging result representing the similarity degree of the boundary frame distribution is output through judging the sub-model based on an actual boundary frame and a first prediction boundary frame obtained by a first original boundary frame, so that model parameters related to boundary frame regression are continuously updated, the generation sub-model is continuously caused to learn the boundary frame distribution, the predicted first prediction boundary frame is enabled to be more close to the actual boundary frame, and therefore accuracy, model generalization and data migration of boundary frame prediction of the position of a target object by a trained target detection model are improved; in the model training stage, a second judging result representing the class similarity between the first predicting class and the actual class is output through a judging sub-model based on the actual class and the first predicting class corresponding to the first original boundary box, so that the generation sub-model continuously learns the class of the target object of the image area in the boundary box, the predicted first predicting class is more similar to the actual class, and model parameters related to the target object class prediction are continuously updated by means of the judging sub-model judging result instead of the predicting class in the model training process, so that the target classifying accuracy, the model generalization and the data migration of the trained target detection model are improved; the judging result set output by the judging sub-model not only comprises a first judging result and a second judging result, but also comprises a third judging result representing the coordinate coincidence degree of the boundary frame, so that the effect of compensating the regression loss of the boundary frame caused by similar distribution of the boundary frame but specific position deviation is achieved, the regression classification loss value of the model to be trained is determined based on the judging result set, and model parameters of the generating sub-model and the judging sub-model are repeatedly updated continuously based on the regression classification loss value, so that the accuracy of the model parameters updated based on the regression classification loss value is higher due to the fact that the accuracy of the regression classification loss value obtained based on the judging result set is higher, and the accuracy of the position mark of the target object and the classification of the target object in the target detection process is ensured simultaneously.
It should be noted that, the embodiments of the object detection device in the present application and the specific embodiments of the object detection method in the present application are based on the same inventive concept, so that the specific implementation of the embodiments may refer to the implementation of the corresponding object detection method, and the repetition is omitted.
Further, corresponding to the methods shown in fig. 1 to 6, based on the same technical concept, the embodiments of the present application further provide a computer device, where the computer device is configured to perform the above-mentioned object detection model training method or the object detection method, as shown in fig. 9.
Computer devices may vary widely in configuration or performance, and may include one or more processors 901 and memory 902, where memory 902 may store one or more stored applications or data. Wherein the memory 902 may be transient storage or persistent storage. The application programs stored in the memory 902 may include one or more modules (not shown) each of which may include a series of computer-executable instructions for use in a computer device. Still further, the processor 901 may be provided in communication with a memory 902 for executing a series of computer executable instructions in the memory 902 on a computer device. The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input output interfaces 905, one or more keyboards 906, and the like.
In a particular embodiment, a computer device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the computer device, and configured to be executed by one or more processors, the one or more programs comprising computer-executable instructions for:
acquiring N first original boundary frames, and acquiring an actual boundary frame corresponding to each first original boundary frame and an actual category corresponding to the actual boundary frame; the first original boundary box is obtained by extracting a target region from a preset sample image set by using a preset region of interest extraction model, and N is a positive integer greater than 1;
inputting the first original boundary box, the actual boundary box and the actual category into a model to be trained for model iterative training until the current model iterative training result meets the model iterative training termination condition, and obtaining a target detection model;
the model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
For each of said first original bounding boxes: the generation sub-model predicts based on the first original boundary box to obtain a first prediction boundary box and a first prediction category; the judging sub-model generates a judging result set based on an actual boundary box corresponding to the first original boundary box, a first prediction boundary box corresponding to the first original boundary box, the actual category and the first prediction category; the first discrimination result represents the degree of similarity of the boundary frame distribution of the first prediction boundary frame and the actual boundary frame under the condition that a preset constraint is met, the preset constraint is that the class of the target object in the first prediction boundary frame is predicted by the generation sub-model to be a target class matched with the actual class, the second discrimination result represents the degree of similarity of the class between the first prediction class corresponding to the first original boundary frame and the actual class, and the third discrimination result represents the degree of coincidence of boundary frame coordinates of the first prediction boundary frame and the actual boundary frame under the condition that the preset constraint is met;
Determining regression classification loss values of the models to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original boundary box;
and updating model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value.
In another particular embodiment, a computer device includes a memory, and one or more programs, wherein the one or more programs are stored in the memory, and the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the computer device, and configured to be executed by one or more processors, the one or more programs comprising computer-executable instructions for:
obtaining M second original boundary frames; the second original boundary box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
inputting the second original boundary boxes into a target detection model to carry out target detection, and obtaining a second prediction boundary box and a second prediction category corresponding to each second original boundary box;
And generating a target detection result of the image to be detected based on the second prediction boundary box and the second prediction category corresponding to each second original boundary box.
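A correspondingly minimal inference sketch follows, assuming the trained generation sub-model `gen` from the sketch above and a hypothetical `extract_rois` helper standing in for the preset region of interest extraction model; the score-threshold filtering is an added assumption, not part of this application.

```python
import torch

@torch.no_grad()
def detect(gen, image_to_detect, extract_rois, score_threshold=0.5):
    # M second original boundary boxes from the region-of-interest extractor.
    second_original_boxes = extract_rois(image_to_detect)    # shape (M, 4)
    pred_boxes, pred_logits = gen(second_original_boxes)     # second prediction boxes / categories
    scores, pred_classes = pred_logits.softmax(dim=-1).max(dim=-1)
    keep = scores >= score_threshold                         # simple confidence filter (assumption)
    # Target detection result: kept boxes, their categories and confidences.
    return pred_boxes[keep], pred_classes[keep], scores[keep]
```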
In the model training stage, the computer device outputs, through the judging sub-model and based on the actual boundary box and the first prediction boundary box obtained from the first original boundary box, a first discrimination result representing the degree of similarity between the boundary box distributions. This causes the model parameters related to boundary box regression to be updated continuously, so that the generation sub-model keeps learning the boundary box distribution and the predicted first prediction boundary box becomes closer to the actual boundary box, thereby improving the accuracy, model generalization and data transferability of the boundary box prediction of the target object position by the trained target detection model. In the model training stage, the judging sub-model also outputs, based on the actual category and the first prediction category corresponding to the first original boundary box, a second discrimination result representing the degree of category similarity between the first prediction category and the actual category, so that the generation sub-model keeps learning the category of the target object in the image region within the boundary box and the predicted first prediction category becomes closer to the actual category; because the model parameters related to target object category prediction are updated during training by means of the discrimination result of the judging sub-model rather than the predicted category itself, the target classification accuracy, model generalization and data transferability of the trained target detection model are improved. In addition, the discrimination result set output by the judging sub-model includes not only the first discrimination result and the second discrimination result but also a third discrimination result representing the degree of coincidence of the boundary box coordinates, which compensates the boundary box regression loss in the case where the boundary box distributions are similar but the specific positions deviate. The regression classification loss value of the model to be trained is determined based on this discrimination result set, and the model parameters of the generation sub-model and the judging sub-model are updated repeatedly based on the regression classification loss value; since the regression classification loss value obtained from the discrimination result set is more accurate, the model parameters updated based on it are more accurate, which ensures the accuracy of both the position labeling of the target object and the classification of the target object during target detection.
It should be noted that the embodiments related to the computer device and the specific embodiments related to the target detection model training method in the present application are based on the same inventive concept; therefore, for the specific implementation of these embodiments, reference may be made to the implementation of the corresponding target detection model training method, and repeated details are omitted.
Further, corresponding to the methods shown in fig. 1 to 6 and based on the same technical concept, the embodiments of the present application further provide a storage medium for storing computer-executable instructions. In a specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored in the storage medium, when executed by a processor, can implement the following flow:
acquiring N first original boundary frames, and acquiring an actual boundary frame corresponding to each first original boundary frame and an actual category corresponding to the actual boundary frame; the first original boundary box is obtained by extracting a target region from a preset sample image set by using a preset region of interest extraction model, and N is a positive integer greater than 1;
inputting the first original boundary box, the actual boundary box and the actual category into a model to be trained for model iterative training until the current model iterative training result meets the model iterative training termination condition, and obtaining a target detection model;
The model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first original bounding boxes: the generation sub-model predicts based on the first original boundary box to obtain a first prediction boundary box and a first prediction category; the judging sub-model generates a discrimination result set based on the actual boundary box corresponding to the first original boundary box, the first prediction boundary box corresponding to the first original boundary box, the actual category and the first prediction category; the discrimination result set includes a first discrimination result, a second discrimination result and a third discrimination result; the first discrimination result represents the degree of similarity of the boundary box distributions of the first prediction boundary box and the actual boundary box under the condition that a preset constraint is met, the preset constraint being that the category of the target object in the first prediction boundary box is predicted by the generation sub-model as a target category matched with the actual category; the second discrimination result represents the degree of category similarity between the first prediction category corresponding to the first original boundary box and the actual category; and the third discrimination result represents the degree of coincidence of the boundary box coordinates of the first prediction boundary box and the actual boundary box under the condition that the preset constraint is met;
determining a regression classification loss value of the model to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original boundary box;
and updating model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value.
In another specific embodiment, the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like, and the computer-executable instructions stored in the storage medium, when executed by a processor, implement the following flow:
obtaining M second original boundary frames; the second original boundary box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
inputting the second original boundary boxes into a target detection model to carry out target detection, and obtaining a second prediction boundary box and a second prediction category corresponding to each second original boundary box;
and generating a target detection result of the image to be detected based on the second prediction boundary box and the second prediction category corresponding to each second original boundary box.
When the computer-executable instructions stored in the storage medium in the embodiment of the present application are executed by a processor, in the model training stage, the judging sub-model outputs, based on the actual boundary box and the first prediction boundary box obtained from the first original boundary box, a first discrimination result representing the degree of similarity between the boundary box distributions. This causes the model parameters related to boundary box regression to be updated continuously, so that the generation sub-model keeps learning the boundary box distribution and the predicted first prediction boundary box becomes closer to the actual boundary box, thereby improving the accuracy, model generalization and data transferability of the boundary box prediction of the target object position by the trained target detection model. In the model training stage, the judging sub-model also outputs, based on the actual category and the first prediction category corresponding to the first original boundary box, a second discrimination result representing the degree of category similarity between the first prediction category and the actual category, so that the generation sub-model keeps learning the category of the target object in the image region within the boundary box and the predicted first prediction category becomes closer to the actual category; because the model parameters related to target object category prediction are updated during training by means of the discrimination result of the judging sub-model rather than the predicted category itself, the target classification accuracy, model generalization and data transferability of the trained target detection model are improved. In addition, the discrimination result set output by the judging sub-model includes not only the first discrimination result and the second discrimination result but also a third discrimination result representing the degree of coincidence of the boundary box coordinates, which compensates the boundary box regression loss in the case where the boundary box distributions are similar but the specific positions deviate. The regression classification loss value of the model to be trained is determined based on this discrimination result set, and the model parameters of the generation sub-model and the judging sub-model are updated repeatedly based on the regression classification loss value; since the regression classification loss value obtained from the discrimination result set is more accurate, the model parameters updated based on it are more accurate, which ensures the accuracy of both the position labeling of the target object and the classification of the target object during target detection.
It should be noted that the embodiments related to the storage medium and the specific embodiments related to the target detection model training method in the present application are based on the same inventive concept; therefore, for the specific implementation of these embodiments, reference may be made to the implementation of the corresponding target detection model training method, and repeated details are omitted.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, Phase-change Random Access Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
Embodiments of the application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in this application are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on what differs from the other embodiments. In particular, the system embodiments are described relatively simply because they are substantially similar to the method embodiments, and reference may be made to the relevant parts of the description of the method embodiments.

The foregoing description is by way of example only and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, and the like that fall within the spirit and principles of this document are intended to be included within the scope of the claims of this document.

Claims (17)

1. A method for training a target detection model, the method comprising:
acquiring N first original boundary frames, and acquiring an actual boundary frame corresponding to each first original boundary frame and an actual category corresponding to the actual boundary frame; the first original boundary box is obtained by extracting a target region from a preset sample image set by using a preset region of interest extraction model, and N is a positive integer greater than 1;
inputting the first original boundary box, the actual boundary box and the actual category into a model to be trained for model iterative training until the current model iterative training result meets the model iterative training termination condition, and obtaining a target detection model;
the model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first original bounding boxes: the generation sub-model predicts based on the first original boundary box to obtain a first prediction boundary box and a first prediction category; the judging sub-model generates a discrimination result set based on the actual boundary box corresponding to the first original boundary box, the first prediction boundary box corresponding to the first original boundary box, the actual category and the first prediction category; the discrimination result set includes a first discrimination result, a second discrimination result and a third discrimination result; the first discrimination result represents the degree of similarity of the boundary box distributions of the first prediction boundary box and the actual boundary box under the condition that a preset constraint is met, the preset constraint being that the category of the target object in the first prediction boundary box is predicted by the generation sub-model as a target category matched with the actual category; the second discrimination result represents the degree of category similarity between the first prediction category corresponding to the first original boundary box and the actual category; and the third discrimination result represents the degree of coincidence of the boundary box coordinates of the first prediction boundary box and the actual boundary box under the condition that the preset constraint is met;
determining a regression classification loss value of the model to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original boundary box;
and updating model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value.
2. The method according to claim 1, wherein the method further comprises:
inputting a preset sample image set into a preset region of interest extraction model to extract regions of interest, so as to obtain X candidate bounding boxes; wherein X is greater than N, and X is a positive integer greater than 1;
the obtaining N first original bounding boxes includes: randomly selecting N candidate bounding boxes from the X candidate bounding boxes as the first original bounding boxes.
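As an illustration of claim 2's random selection (not part of the claims), and assuming the X candidate bounding boxes are held in a plain Python list:

```python
import random

def select_first_original_boxes(candidate_boxes, n):
    # Randomly pick N of the X candidate bounding boxes (requires N <= X).
    return random.sample(candidate_boxes, n)
```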
3. The method of claim 1, wherein the set of discrimination results further comprises a fourth discrimination result; the generating a discrimination result set based on the actual bounding box corresponding to the first original bounding box, the first prediction bounding box corresponding to the first original bounding box, the actual category and the first prediction category includes:
under the condition that the first prediction category meets the preset constraint, performing bounding box authenticity discrimination on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box to obtain a first discrimination result; performing category authenticity discrimination on the actual category corresponding to the first original bounding box and the first prediction category to obtain a second discrimination result; under the condition that the first prediction category meets the preset constraint, calculating a bounding box intersection-over-union loss based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box to obtain a third discrimination result; and calculating a loss compensation value for constraining the loss gradient of the loss function of the model to be trained based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box to obtain a fourth discrimination result.
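The decomposition in claim 3 can be pictured with the following sketch (illustrative only); the helper callables for box authenticity, category authenticity, IoU loss and loss compensation are assumed to be supplied by the discrimination side, and concrete sketches for them follow the later claims below.

```python
def discrimination_result_set(actual_box, pred_box, actual_cls, pred_cls,
                              constraint_met,
                              box_authenticity, class_authenticity,
                              iou_loss, loss_compensation):
    """Assemble the first to fourth discrimination results for one box."""
    results = {"second": class_authenticity(actual_cls, pred_cls)}
    if constraint_met:  # the preset constraint: predicted category matches the actual one
        results["first"] = box_authenticity(actual_box, pred_box)
        results["third"] = iou_loss(actual_box, pred_box)
        results["fourth"] = loss_compensation(actual_box, pred_box)
    return results
```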
4. The method of claim 3, wherein the determining the regression class loss value for the model to be trained based on the first, second, and third discrimination results for each of the first original bounding boxes comprises:
determining a sub-regression classification loss value corresponding to each first original boundary box; wherein the sub-regression classification loss value corresponding to the first original boundary box is determined based on target information, and the target information includes part or all of: whether or not the first prediction category corresponding to the first original boundary box meets the preset constraint, a boundary box distribution similarity degree represented by the first discrimination result corresponding to the first original boundary box, a category similarity degree represented by the second discrimination result, a boundary box coordinate coincidence degree represented by the third discrimination result, and a loss compensation value represented by the fourth discrimination result;
and determining the regression classification loss value of the model to be trained based on the sub regression classification loss value corresponding to each first original boundary box.
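A minimal sketch of claim 4, under the assumption that each discrimination result has already been reduced to a scalar and that the sub-losses are combined by a weighted sum and a plain mean; the weights w1..w4 are illustrative, not values from the application.

```python
def sub_loss(first, second, third, fourth, constraint_met,
             w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    # Sub regression classification loss for one first original bounding box.
    value = w2 * second
    if constraint_met:
        value += w1 * first + w3 * third + w4 * fourth
    return value

def regression_classification_loss(per_box_results):
    # per_box_results: iterable of (first, second, third, fourth, constraint_met).
    losses = [sub_loss(*r) for r in per_box_results]
    return sum(losses) / len(losses)
```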
5. The method of claim 3, wherein performing the bounding box authenticity discrimination on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box to obtain a first discrimination result includes:
determining, based on the actual bounding box corresponding to the first original bounding box, a first discrimination probability that the actual bounding box is discriminated as true by the discrimination sub-model; determining, based on the first prediction bounding box corresponding to the first original bounding box, a second discrimination probability that the first prediction bounding box is discriminated as fake by the discrimination sub-model;
and generating a first discrimination result based on the first discrimination probability and the second discrimination probability.
6. The method of claim 5, wherein generating a first discrimination result based on the first discrimination probability and the second discrimination probability comprises:
determining a first weighted probability based on the first discrimination probability and a first prior probability of an actual bounding box corresponding to the first original bounding box;
determining a second weighted probability based on the second discrimination probability and a second prior probability of the first original bounding box;
and generating a first judging result based on the first weighted probability and the second weighted probability.
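Claims 5 and 6 can be read as a prior-weighted, GAN-style discrimination term; the sketch below assumes the weighting is a simple product of discrimination probability and prior, and the log-sum form of the combined result is likewise an assumption.

```python
import math

def first_discrimination_result(p_actual_true, p_pred_fake,
                                first_prior, second_prior, eps=1e-8):
    # p_actual_true : probability that the actual bounding box is judged true
    # p_pred_fake   : probability that the first prediction box is judged fake
    # first_prior   : first prior probability of the actual bounding box
    # second_prior  : second prior probability of the first original bounding box
    first_weighted = first_prior * p_actual_true
    second_weighted = second_prior * p_pred_fake
    return math.log(first_weighted + eps) + math.log(second_weighted + eps)
```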
7. The method of claim 3, wherein the performing the category authenticity discrimination on the actual category and the first prediction category corresponding to the first original bounding box to obtain the second discrimination result includes:
Determining a third discrimination probability that the actual category corresponding to the first original boundary box is discriminated as true by the discrimination sub-model; determining a fourth discrimination probability that a first prediction category corresponding to the first original boundary box is discriminated as fake by the discrimination sub-model;
and generating a second discrimination result based on the third discrimination probability and the fourth discrimination probability.
8. The method of claim 7, wherein generating a second discrimination result based on the third discrimination probability and the fourth discrimination probability comprises:
determining a third weighted probability based on the third discrimination probability and a third prior probability that the class of the actual bounding box corresponding to the first original bounding box is the actual class;
determining a fourth weighted probability based on the fourth discrimination probability and a fourth prior probability that the class of the first original bounding box is the actual class;
and generating a second judging result based on the third weighted probability and the fourth weighted probability.
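Claims 7 and 8 mirror the box case above for categories; under the same assumptions the sketch is identical in shape.

```python
import math

def second_discrimination_result(p_actual_cls_true, p_pred_cls_fake,
                                 third_prior, fourth_prior, eps=1e-8):
    # Prior-weighted probabilities for the actual and predicted categories.
    third_weighted = third_prior * p_actual_cls_true
    fourth_weighted = fourth_prior * p_pred_cls_fake
    return math.log(third_weighted + eps) + math.log(fourth_weighted + eps)
```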
9. The method of claim 3, wherein the calculating a bounding box intersection-over-union loss based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box, to obtain a third discrimination result, comprises:
performing an intersection-over-union loss calculation on the actual bounding box corresponding to the first original bounding box and the first prediction bounding box corresponding to the first original bounding box to obtain a first intersection-over-union loss;
and determining a third discrimination result corresponding to the first original bounding box based on the first intersection-over-union loss.
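A self-contained sketch of the intersection-over-union computation in claim 9, for boxes given as (x1, y1, x2, y2); taking 1 - IoU as the loss is a common convention and an assumption here rather than the claimed formula.

```python
def iou_loss(actual_box, pred_box):
    ax1, ay1, ax2, ay2 = actual_box
    px1, py1, px2, py2 = pred_box
    inter_w = max(0.0, min(ax2, px2) - max(ax1, px1))
    inter_h = max(0.0, min(ay2, py2) - max(ay1, py1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (px2 - px1) * (py2 - py1) - inter
    iou = inter / union if union > 0 else 0.0
    return 1.0 - iou  # first intersection-over-union loss -> third discrimination result
```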
10. A method according to claim 3, wherein said calculating a loss compensation value for constraining a loss gradient of a loss function of the model to be trained based on an actual bounding box and a first prediction bounding box corresponding to the first original bounding box comprises:
generating a synthetic bounding box corresponding to the first original bounding box based on an actual bounding box and a first prediction bounding box corresponding to the first original bounding box;
and determining a loss compensation value based on the boundary frame distribution similarity degree of the synthesized boundary frame corresponding to the first original boundary frame and the actual boundary frame.
11. The method of claim 10, wherein the generating a composite bounding box corresponding to the first original bounding box based on the actual bounding box and the first prediction bounding box corresponding to the first original bounding box comprises:
Determining a first coordinate information subset based on a first sampling proportion and a first coordinate information set of an actual bounding box corresponding to the first original bounding box;
determining a second coordinate information subset based on a second sampling proportion and a second coordinate information set of a first prediction boundary box corresponding to the first original boundary box; the sum of the first sampling ratio and the second sampling ratio is equal to 1;
and generating a synthetic boundary box corresponding to the first original boundary box based on the first coordinate information subset and the second coordinate information subset.
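Claims 10 and 11 describe building a synthetic box from coordinate subsets of the actual and predicted boxes and turning its distance from the actual box into a loss compensation value. The sketch below assumes a random coordinate-wise split driven by the two sampling proportions and a smooth-L1 distance as the similarity measure; both choices are illustrative, not the claimed formulas.

```python
import random

def synthetic_box(actual_box, pred_box, first_ratio=0.5):
    # Take roughly first_ratio of the coordinates from the actual box and the
    # remaining (1 - first_ratio) from the first prediction box.
    k = round(first_ratio * len(actual_box))
    from_actual = set(random.sample(range(len(actual_box)), k))
    return [a if i in from_actual else p
            for i, (a, p) in enumerate(zip(actual_box, pred_box))]

def loss_compensation(actual_box, pred_box, first_ratio=0.5, beta=1.0):
    synth = synthetic_box(actual_box, pred_box, first_ratio)
    # Smooth-L1 distance between the synthetic box and the actual box, used as
    # the fourth discrimination result to temper the loss gradient (assumption).
    total = 0.0
    for s, a in zip(synth, actual_box):
        d = abs(s - a)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total
```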
12. A method of target detection, the method comprising:
obtaining M second original boundary frames; the second original boundary box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
inputting the second original boundary boxes into a target detection model to carry out target detection, and obtaining a second prediction boundary box and a second prediction category corresponding to each second original boundary box;
and generating a target detection result of the image to be detected based on the second prediction boundary box and the second prediction category corresponding to each second original boundary box.
13. The method of claim 12, wherein the target detection model comprises a generation sub-model;
for each of said second original bounding boxes: and in the target detection process, the generation sub-model predicts based on the second original boundary box to obtain a second prediction boundary box and a second prediction category corresponding to the second original boundary box.
14. A target detection model training apparatus, the apparatus comprising:
the first bounding box acquisition module is configured to acquire N first original bounding boxes and acquire an actual bounding box corresponding to each first original bounding box and an actual category corresponding to the actual bounding box; the first original boundary box is obtained by extracting a target region from a preset sample image set by using a preset region of interest extraction model, and N is a positive integer greater than 1;
the model training module is configured to input the first original boundary frame, the actual boundary frame and the actual category into a model to be trained for model iterative training until a current model iterative training result meets a model iterative training termination condition to obtain a target detection model;
The model to be trained comprises a generation sub-model and a judgment sub-model; the specific implementation mode of each model training is as follows:
for each of said first original bounding boxes: the generation sub-model predicts based on the first original boundary box to obtain a first prediction boundary box and a first prediction category; the judging sub-model generates a judging result set based on an actual boundary box corresponding to the first original boundary box, a first prediction boundary box corresponding to the first original boundary box, the actual category and the first prediction category; the judging result set comprises a first judging result, a second judging result and a third judging result, wherein the first judging result represents the boundary frame distribution similarity degree of the first prediction boundary frame and the actual boundary frame under the condition that the preset constraint is met, the preset constraint is that the class of the target object in the first prediction boundary frame is predicted by the generating sub-model to be a target class matched with the actual class, the second judging result represents the class similarity degree between the first prediction class and the actual class, and the third judging result represents the boundary frame coordinate coincidence degree of the first prediction boundary frame and the actual boundary frame under the condition that the preset constraint is met; determining regression classification loss values of the models to be trained based on the first discrimination result, the second discrimination result and the third discrimination result corresponding to each first original boundary box; and updating model parameters of the generating sub-model and the judging sub-model based on the regression classification loss value.
15. A target detection device, the device comprising:
a second bounding box acquisition module configured to acquire M second original bounding boxes; the second original boundary box is obtained by extracting a target region of an image to be detected by using a preset region of interest extraction model;
the target detection module is configured to input the second original boundary boxes into a target detection model to carry out target detection, so as to obtain a second prediction boundary box and a second prediction category corresponding to each second original boundary box;
and the detection result generation module is configured to generate a target detection result of the image to be detected based on the second prediction boundary box and the second prediction category corresponding to each second original boundary box.
16. A computer device, the device comprising:
a processor; and
a memory arranged to store computer executable instructions configured to be executed by the processor, the executable instructions comprising steps for performing the method of any of claims 1-11 or any of claims 12-13.
17. A storage medium storing computer executable instructions for causing a computer to perform the method of any one of claims 1-11 or any one of claims 12-13.
CN202210831398.8A 2022-07-15 2022-07-15 Target detection model training method, target detection method and target detection device Pending CN117437396A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210831398.8A CN117437396A (en) 2022-07-15 2022-07-15 Target detection model training method, target detection method and target detection device

Publications (1)

Publication Number Publication Date
CN117437396A true CN117437396A (en) 2024-01-23



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination