WO2024012179A1 - Model training method, target detection method and apparatuses - Google Patents

Model training method, target detection method and apparatuses

Info

Publication number
WO2024012179A1
Authority
WO
WIPO (PCT)
Prior art keywords
bounding box
model
predicted
bounding
category
Prior art date
Application number
PCT/CN2023/102175
Other languages
French (fr)
Chinese (zh)
Inventor
吕永春
朱徽
王钰
周迅溢
曾定衡
蒋宁
Original Assignee
马上消费金融股份有限公司
Priority date
Filing date
Publication date
Application filed by 马上消费金融股份有限公司
Publication of WO2024012179A1 publication Critical patent/WO2024012179A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes

Definitions

  • the present application relates to the field of target detection, and in particular, to a model training method, target detection method and device.
  • In the related art, the model mainly learns the degree of similarity between the image features inside the predicted bounding box and those inside the real bounding box. As a result, the model parameters of the trained target detection model achieve relatively high accuracy on the sample image data set, but their accuracy drops on the images to be detected, so the target detection model generalizes poorly, which in turn makes the target detection accuracy in the model application stage relatively low.
  • The purpose of the embodiments of the present application is to provide a model training method, a target detection method and a device, which make the predicted first predicted bounding box closer to the corresponding actual bounding box, thereby improving the bounding-box prediction accuracy, generalization and data transferability of the trained target detection model. Furthermore, the bounding box regression loss value obtained based on the first comparison result and the second comparison result is more accurate, which further improves the accuracy of the model parameters updated based on that loss value.
  • In a first aspect, an embodiment of the present application provides a model training method. The method includes: acquiring a first bounding box subset from a first candidate bounding box set, and acquiring the actual bounding box corresponding to each first reference bounding box in the first bounding box subset, where the first candidate bounding box set is obtained by extracting target areas from a sample image data set; and inputting the first reference bounding boxes and the actual bounding boxes into the target detection model to be trained for iterative training until the training results meet the preset model iteration termination conditions, obtaining a trained target detection model. The target detection model includes a bounding box prediction sub-model, and each round of training is implemented as follows: for each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box; based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box, a set of bounding box comparison results is generated, which includes a first comparison result characterizing the similarity of the bounding box distributions and a second comparison result characterizing the degree of coincidence of the bounding box coordinates; the bounding box regression loss value is then determined based on the first comparison result and the second comparison result corresponding to each first reference bounding box in the first bounding box subset, and the parameters of the bounding box prediction sub-model are updated based on this loss value.
  • In a second aspect, an embodiment of the present application provides a target detection method. The method includes: obtaining a second bounding box subset corresponding to an image to be detected from a second candidate bounding box set, where the second bounding box subset includes a third specified number of second reference bounding boxes and the second candidate bounding box set is obtained by extracting target areas of the image to be detected using a preset region of interest extraction model; inputting the second reference bounding boxes into the target detection model to perform target detection and obtain the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
  • An embodiment of the present application further provides a model training device. The device includes: a bounding box acquisition module configured to acquire a first bounding box subset from a first candidate bounding box set and acquire the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, where the first bounding box subset includes a first specified number of first reference bounding boxes and the first candidate bounding box set is obtained by extracting target areas from the sample image data set using a preset region of interest extraction model; and a model training module configured to input the first reference bounding boxes and the actual bounding boxes into the target detection model to be trained for iterative training until the training results meet the preset termination conditions, obtaining the trained target detection model. The target detection model includes a bounding box prediction sub-model, and each round of training is implemented as follows: for each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box, and a set of bounding box comparison results is generated, including a first comparison result characterizing the similarity of the bounding box distributions and a second comparison result characterizing the degree of coincidence of the bounding box coordinates; the bounding box regression loss value is determined based on the first comparison result and the second comparison result respectively corresponding to the first reference bounding boxes in the first bounding box subset; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
  • An embodiment of the present application further provides a target detection device. The device includes: a bounding box acquisition module configured to acquire a second bounding box subset corresponding to the image to be detected from a second candidate bounding box set, where the second bounding box subset includes a third specified number of second reference bounding boxes and the second candidate bounding box set is obtained by extracting target areas of the image to be detected using a preset region of interest extraction model; a target detection module configured to input the second reference bounding boxes into the target detection model for target detection and obtain the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box; and a detection result generation module configured to generate the target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
  • Embodiments of the present application further provide a computer device. The device includes: a processor; and a memory arranged to store computer-executable instructions, the executable instructions being configured to be executed by the processor and including steps for performing the method described in the first aspect or the second aspect.
  • embodiments of the present application provide a storage medium, wherein the storage medium is used to store computer-executable instructions, and the executable instructions cause the computer to perform the steps in the above method.
  • Figure 1 is a schematic flow chart of a model training method provided by an embodiment of the present application.
  • Figure 2 is a schematic flow chart of each model training process in the model training method provided by the embodiment of the present application;
  • Figure 3 is a schematic diagram of the first implementation principle of the model training method provided by the embodiment of the present application.
  • Figure 4a is a schematic diagram of the second implementation principle of the model training method provided by the embodiment of the present application.
  • Figure 4b is a schematic diagram of the third implementation principle of the model training method provided by the embodiment of the present application.
  • Figure 5 is a schematic flow chart of the target detection method provided by the embodiment of the present application.
  • Figure 6 is a schematic diagram of the implementation principle of the target detection method provided by the embodiment of the present application.
  • Figure 7 is a schematic diagram of the module composition of the model training device provided by the embodiment of the present application.
  • FIG. 8 is a schematic diagram of the module composition of the target detection device provided by the embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • In the related art, the model is prompted to learn the image features within the bounding boxes, continuously learning the similarity between the image features in the predicted bounding box and those in the actual bounding box and adjusting the model parameters accordingly, so that the trained target detection model depends heavily on the sample data; as a result, the target detection model has poor generalization and poor cross-data migration capability.
  • In the embodiments of the present application, the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box, and the model parameters are then adjusted based on the first predicted bounding box and its corresponding actual bounding box, prompting the target detection model to be trained to continuously learn the bounding box distribution so that the predicted first predicted bounding box becomes closer to the corresponding actual bounding box. This not only improves the accuracy with which the trained target detection model predicts the bounding box locating the target object in the image to be detected, but also improves the generalization and data migration capability of the trained target detection model, thereby ensuring the target detection accuracy on new images to be detected.
  • In addition, the comparison result set used to determine the bounding box regression loss value includes not only the first comparison result, which characterizes the similarity of the bounding box distributions, but also the second comparison result, which characterizes the degree of coincidence of the bounding box coordinates. The bounding box regression loss value is then obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box, so that the regression loss caused by bounding boxes whose distributions are similar but whose specific positions deviate, and the regression loss caused by first predicted bounding boxes whose corresponding actual bounding boxes have ambiguous edges, are considered simultaneously. The bounding box regression loss value therefore includes both the regression loss obtained from the coarse-grained comparison dimension based on the similarity of the bounding box distributions and the regression loss obtained from the fine-grained comparison dimension based on the coincidence degree of the bounding box coordinates, which improves the accuracy of the bounding box regression loss value and, in turn, the accuracy of the model parameters updated based on it.
  • Figure 1 is a schematic flow chart of a model training method provided by one or more embodiments of the present application.
  • the method in Figure 1 can be executed by an electronic device equipped with a model training device.
  • The electronic device can be a terminal device or a designated server. The hardware device used for model training (i.e., the electronic device equipped with the model training device) and the hardware device used for target detection (i.e., the electronic device equipped with the target detection device) may be the same device or different devices. The training process of the target detection model, as shown in Figure 1, includes at least the following steps:
  • The first reference bounding boxes and the first candidate bounding box set are obtained by extracting target areas from the sample image data set using a preset region of interest extraction model.
  • The first specified number of first reference bounding boxes may be determined by performing, for each round of model training, the step of extracting target areas from the sample image data set using the preset region of interest extraction model to obtain the first specified number of first reference bounding boxes.
  • The sample image data set may contain multiple sample target objects, and each sample target object may correspond to multiple first reference bounding boxes; that is, the first specified number of first reference bounding boxes includes at least one first reference bounding box for each sample target object.
  • Before step S102 of obtaining the first bounding box subset from the first candidate bounding box set, the method may also include: inputting the sample image data set into the preset region of interest extraction model for region of interest extraction to obtain the first candidate bounding box set.
  • In one implementation, each round of model training uses the preset region of interest extraction model to extract regions of interest from the multiple sample images in the sample image data set, obtaining the first specified number of first reference bounding boxes; in the case where the second specified number is greater than the first specified number, for each round of model training, the first specified number of first reference bounding boxes are randomly sampled from the second specified number of candidate bounding boxes.
  • In this way, the model parameters do not depend on the sample data used in the model training process and can be better adapted to the data to be identified in the model application stage.
  • It can be assumed that the bounding boxes obey a certain probability distribution (such as a Gaussian distribution or a Cauchy distribution). Under this assumption, the greater the number N of anchor boxes extracted by the preset region of interest extraction model, the more helpful it is for the target detection model to be trained to learn the bounding box distribution and detect bounding boxes accurately.
  • Therefore, the preset region of interest extraction model is used to extract N anchor boxes in advance; then, in each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as the first reference bounding boxes and input into the target detection model to be trained. This not only keeps the data processing volume of each round of model training under control, but also ensures that the model can better learn the bounding box distribution; that is, it balances the data processing volume during training with the learning of the bounding box distribution. Based on this, the above-mentioned second specified number is greater than the above-mentioned first specified number.
  • The above step S102 of obtaining the first bounding box subset from the first candidate bounding box set specifically includes: randomly selecting, from the above second specified number of candidate bounding boxes, a first specified number of candidate bounding boxes as the first reference bounding boxes to obtain the first bounding box subset. That is, the preset region of interest extraction model is used in advance to extract regions of interest from the multiple sample images in the sample image data set to obtain a second specified number of candidate bounding boxes; then, for each round of model training, a first specified number of first reference bounding boxes are randomly sampled from the second specified number of candidate bounding boxes, as sketched in the example below.
  • Here, the N anchor boxes correspond to the second specified number of candidate bounding boxes, and the m anchor boxes correspond to the first specified number of first reference bounding boxes.
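  • The following minimal sketch illustrates this per-round sampling step; it assumes candidate boxes are stored as (x, y, w, h) tuples, and the values of N, m and the data layout are illustrative choices, not requirements of the application.

```python
import random

def sample_reference_boxes(candidate_boxes, m, rng=random):
    """Randomly sample m reference bounding boxes (the first bounding box subset)
    from the N candidate bounding boxes extracted in advance by the
    region-of-interest extraction model."""
    if m > len(candidate_boxes):
        raise ValueError("m must not exceed the number of candidate boxes N")
    return rng.sample(candidate_boxes, m)

# Example: N = 1000 candidate anchor boxes, m = 128 reference boxes per round.
candidates = [(i % 50, i // 50, 32.0, 32.0) for i in range(1000)]  # (x, y, w, h)
reference_boxes = sample_reference_boxes(candidates, m=128)
```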
  • the preset model iterative training termination conditions may include: the current number of model training rounds is equal to the total number of training rounds, or the model loss function converges.
  • For step S104, the specific implementation of the model iterative training is explained below. Since the processing of each round of model training during iterative training is the same, one round of model training is taken as an example for detailed explanation. Specifically, the target detection model to be trained includes a bounding box prediction sub-model; as shown in Figure 2, each round of model training includes the following steps S1042 to S1046:
  • S1042: for each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain the first predicted bounding box; based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box, a set of bounding box comparison results is generated, where the set includes a first comparison result that characterizes the similarity of the bounding box distributions and a second comparison result that characterizes the degree of coincidence of the bounding box coordinates.
  • The first comparison result can be obtained by calculating the relative entropy, i.e., the Kullback-Leibler (KL) divergence, between the actual bounding box and the corresponding first predicted bounding box.
  • The KL divergence characterizes the distribution similarity between the actual bounding box and the corresponding first predicted bounding box, so the distribution similarity between the first predicted bounding box corresponding to a first reference bounding box and the corresponding actual bounding box can be determined from this divergence; the lower that similarity, the greater the first regression loss component under the comparison dimension of bounding box distribution similarity. Therefore, the first comparison result can be generated based on the KL divergence, so that it characterizes the similarity of the bounding box distributions, and the first regression loss component corresponding to the distribution-similarity comparison dimension can then be determined from the KL divergence contained in the first comparison result, as sketched below.
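  • As a concrete illustration (not part of the application's own formulas), if each bounding box coordinate is modeled with a Gaussian distribution as suggested above, the KL divergence between the actual and predicted box distributions has a closed form per coordinate; the function names and the per-coordinate independence assumption below are illustrative.

```python
import math

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL divergence KL(P || Q) between two univariate Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def box_distribution_kl(actual_box, predicted_box, sigma_actual, sigma_pred):
    """First-comparison-result sketch: sum the per-coordinate KL divergences
    between the actual box distribution and the predicted box distribution,
    treating the (x, y, w, h) coordinates as independent Gaussians."""
    return sum(
        gaussian_kl(a, sigma_actual, p, sigma_pred)
        for a, p in zip(actual_box, predicted_box)
    )

# Example: actual box vs. predicted box, both given as (x, y, w, h).
kl_value = box_distribution_kl((10.0, 20.0, 50.0, 80.0),
                               (12.0, 19.0, 48.0, 83.0),
                               sigma_actual=1.0, sigma_pred=1.5)
```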
  • For the second comparison result, the target intersection-over-union (IoU) loss may be obtained by considering only the IoU loss between a given actual bounding box and the corresponding first predicted bounding box; alternatively, it may be determined by considering both the IoU loss between the actual bounding box and the corresponding first predicted bounding box and the IoU losses between that actual bounding box and the first predicted bounding boxes corresponding to other actual bounding boxes. Since the size of the target IoU loss represents the degree of coordinate coincidence between the actual bounding box and the corresponding first predicted bounding box, the second comparison result can be generated based on the target IoU loss so that it characterizes the degree of coincidence of the bounding box coordinates; the second regression loss component corresponding to the coordinate-coincidence comparison dimension can then be determined from the IoU loss contained in the second comparison result. A basic IoU computation is sketched below.
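  • The following is a minimal, self-contained sketch of the IoU computation underlying the second comparison result; boxes are assumed here to be (x1, y1, x2, y2) corner tuples, which is an illustrative convention rather than one fixed by the application.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(actual_box, predicted_box):
    """A common IoU-based loss: the lower the coordinate coincidence, the higher the loss."""
    return 1.0 - iou(actual_box, predicted_box)
```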
  • S1044: Determine the bounding box regression loss value of the target detection model to be trained based on the first comparison result and the second comparison result respectively corresponding to each first reference bounding box in the first bounding box subset.
  • Specifically, the sub-regression loss value corresponding to each first reference bounding box can be obtained first, and this sub-regression loss value at least reflects the perspective of bounding box distribution similarity.
  • The sub-regression loss value corresponding to a first reference bounding box may be determined by considering the similarity of the bounding box distributions and the degree of coincidence of the bounding box coordinates at the same time, or by considering only the similarity of the bounding box distributions. In the latter case, the set of bounding box comparison results corresponding to the first reference bounding box includes the first comparison result, and the sub-regression loss value corresponding to the first reference bounding box is determined based on the first regression loss component corresponding to the first comparison result.
  • S1046: update the parameters of the bounding box prediction sub-model based on the above bounding box regression loss value; for example, the gradient descent method is used to adjust the parameters of the bounding box prediction sub-model based on that loss value. Because the sub-regression loss values reflect the first regression loss component (corresponding to the comparison dimension based on the similarity of the bounding box distributions) and the second regression loss component (corresponding to the comparison dimension based on the degree of coincidence of the bounding box coordinates), the bounding box regression loss value used to adjust the model parameters reflects the regression loss components of both comparison dimensions. As a result, the finally trained target detection model ensures both that the probability distribution of the predicted first predicted bounding box is closer to that of the actual bounding box and that the coordinates of the first predicted bounding box coincide with those of the actual bounding box to a higher degree.
  • The process of iteratively training the model parameters based on the bounding box regression loss value to obtain the target detection model follows the existing process of backpropagating gradients and updating the model parameters with gradient descent, and is not described again here; a generic sketch of such an update step is given below.
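  • For illustration only, a generic parameter-update step of this kind might look like the following PyTorch-style sketch; the tiny stand-in sub-model and the smooth L1 stand-in loss are assumptions, since the application only specifies that gradient descent is applied to the bounding box regression loss.

```python
import torch
from torch import nn

# Minimal stand-in for the bounding box prediction sub-model: it regresses
# (x, y, w, h) values from a feature vector. Architecture is illustrative only.
bbox_prediction_submodel = nn.Linear(in_features=256, out_features=4)
optimizer = torch.optim.SGD(bbox_prediction_submodel.parameters(), lr=1e-3)

features = torch.randn(8, 256)      # features of 8 sampled reference boxes
actual_boxes = torch.randn(8, 4)    # their (expanded) actual bounding boxes
predicted_boxes = bbox_prediction_submodel(features)

# Stand-in regression loss; in the application it combines the first
# (distribution similarity) and second (coordinate coincidence) components.
bbox_regression_loss = nn.functional.smooth_l1_loss(predicted_boxes, actual_boxes)

optimizer.zero_grad()            # clear gradients from the previous round
bbox_regression_loss.backward()  # backpropagate the bounding box regression loss
optimizer.step()                 # gradient-descent update of the sub-model parameters
```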
  • the target detection model trained based on the model training method provided by the embodiment of the present application can be applied to any specific application scenario that requires target detection on the image to be detected.
  • Specific application scenario 1: target detection is performed on images to be detected collected by image collection equipment at the entrance of a certain public place (such as the entrance of a shopping mall, a subway entrance, an entrance to a scenic spot, or the entrance to a performance site).
  • Specific application scenario 2: target detection is performed on images to be detected collected by image acquisition equipment at each monitoring point of a certain breeding base.
  • For different specific application scenarios, the sample image data sets used in the model training process are also different.
  • For application scenario 1, the sample image data set can consist of historical sample images collected at the entrance of the designated public place within a preset historical time period; the target object enclosed by the first reference bounding box is the target user entering the designated public place in the historical sample image; and the actual category and the first predicted category can be the category to which the target user belongs, such as an age group.
  • For application scenario 2, the sample image data set can consist of historical sample images collected at each monitoring point of the designated breeding base within the preset historical time period; the target object enclosed by the first reference bounding box is the target breeding object in the historical sample image; and the actual category and the first predicted category can be the category to which the target breeding object belongs, such as at least one of living status and body size.
  • In a specific implementation, the training process is as follows: obtain a first specified number of first reference bounding boxes and the actual bounding box corresponding to each first reference bounding box; for each first reference bounding box, the above bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain the first predicted bounding box; the comparison result generation module then generates a bounding box comparison result set based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box; based on the first comparison result and the second comparison result corresponding to each first reference bounding box, the bounding box regression loss value of the target detection model to be trained is determined; and the model parameters of the target detection model to be trained are iteratively updated based on the above bounding box regression loss value until the results of the model iterative training meet the preset termination conditions, obtaining the target detection model.
  • The above step S1042 of generating a bounding box comparison result set based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box specifically includes: calculating the relative entropy (KL divergence) based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the first comparison result; and calculating the bounding box intersection-over-union loss based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the second comparison result.
  • In this way, the set of bounding box comparison results corresponding to the first reference bounding box includes not only the first comparison result, obtained from the perspective of the similarity of the bounding box distributions, but also the second comparison result, obtained from the perspective of the degree of coincidence of the bounding box coordinates. This improves the comprehensiveness of the bounding box comparison result set and thereby the accuracy of the bounding box regression loss determined based on it.
  • The specific implementation principle of the training process of another target detection model is as follows. First, the preset region of interest extraction model is used to extract target areas from the sample image data set to obtain N anchor boxes; the sample image data set includes multiple original sample images, and each original sample image includes at least one target object.
  • The feature information corresponding to each anchor box can include position information (x, y, w, h) and category information c, i.e., (x, y, w, h, c). During model training, the multiple parameter dimensions can be set to be mutually independent, so the iterative training processes for the model parameters of the individual dimensions are also independent of each other.
  • For each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as the first reference bounding boxes, together with the actual bounding box corresponding to each first reference bounding box; each target object in the sample image data set corresponds to one actual bounding box, so if the total number of target objects in the sample image data set is d, the number of actual bounding boxes before expansion is d.
  • The actual bounding boxes corresponding to multiple first reference bounding boxes that contain the same target object can be the same; that is, the actual bounding boxes are expanded according to the target objects enclosed by the first reference bounding boxes to obtain m actual bounding boxes (m > d). For example, if a certain original sample image contains cat A and cat A corresponds to actual bounding box A, and the number of first reference bounding boxes containing cat A is 4 (say, the first reference bounding boxes with serial numbers 6, 7, 8 and 9), then actual bounding box A is expanded into 4 copies of actual bounding box A (i.e., the actual bounding boxes with serial numbers 6, 7, 8 and 9). A small sketch of this expansion step follows below.
  • For each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain the first predicted bounding box; the first predicted bounding box is predicted by a bounding box prediction sub-model that continuously performs bounding box regression learning. Specifically, among the m first predicted bounding boxes output by the bounding box prediction sub-model, the target object enclosed by the first predicted bounding boxes with serial numbers 6, 7, 8 and 9 is cat A.
  • For each first reference bounding box, the first regression loss component is determined based on the first comparison result in its bounding box comparison result set, and the second regression loss component is determined based on the second comparison result in that set; the bounding box regression loss value of the target detection model to be trained is determined based on the first regression loss components and the second regression loss components; the stochastic gradient descent method is then used to adjust the model parameters of the above bounding box prediction sub-model based on the bounding box regression loss value, yielding a bounding box prediction sub-model with updated parameters. If the model iterative training results meet the preset termination conditions, the updated bounding box prediction sub-model is used as the trained bounding box prediction sub-model.
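  • To make the expansion step concrete, here is a small illustrative sketch; the data structures and function names are assumptions, not the application's notation. Each sampled reference box is assigned the actual bounding box of the target object it contains, so d ground-truth boxes become m expanded boxes.

```python
def expand_actual_boxes(reference_box_targets, actual_boxes_by_target):
    """Expand d actual bounding boxes to m actual bounding boxes.

    reference_box_targets: list of length m; entry i is the id of the target
        object enclosed by the first reference bounding box with serial number i.
    actual_boxes_by_target: dict mapping each of the d target object ids to its
        actual bounding box (x, y, w, h).
    """
    return [actual_boxes_by_target[target_id] for target_id in reference_box_targets]

# Example: several reference boxes all contain "cat_A", so its actual box is repeated.
actual_boxes_by_target = {"cat_A": (15.0, 30.0, 60.0, 40.0), "dog_B": (100.0, 80.0, 70.0, 90.0)}
reference_box_targets = ["dog_B", "cat_A", "cat_A", "cat_A", "cat_A"]
expanded = expand_actual_boxes(reference_box_targets, actual_boxes_by_target)  # m = 5 boxes
```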
  • The bounding box regression loss value of the target detection model to be trained is jointly determined based on the sub-regression loss values corresponding to the multiple first reference bounding boxes, and the sub-regression loss value corresponding to each first reference bounding box is determined based on its bounding box comparison result set.
  • The above step S1044 of determining the bounding box regression loss value based on the first comparison result and the second comparison result respectively corresponding to the first reference bounding boxes in the first bounding box subset specifically includes: determining the sub-regression loss value corresponding to each first reference bounding box in the first bounding box subset, where the sub-regression loss value corresponding to each first reference bounding box is determined based on target information, and the target information includes one or a combination of the following: the similarity of the bounding box distributions represented by the first comparison result corresponding to the first reference bounding box, and the degree of coincidence of the bounding box coordinates represented by the second comparison result; and determining the bounding box regression loss value of the target detection model currently to be trained based on the sub-regression loss values corresponding to the first reference bounding boxes in the first bounding box subset.
  • The sub-regression loss value corresponding to the first reference bounding box with serial number i can be written as a weighted combination of the two regression loss components, e.g., L_i = λ1·V_i1 + λ2·V_i2, where:
  • λ1 represents the first weight coefficient corresponding to the first regression loss component under the first comparison dimension;
  • V_i1 represents the first regression loss component under the first comparison dimension (that is, the regression loss component corresponding to the similarity of the bounding box distributions represented by the first comparison result);
  • λ2 represents the second weight coefficient corresponding to the second regression loss component under the second comparison dimension;
  • V_i2 represents the second regression loss component under the second comparison dimension (that is, the regression loss component corresponding to the coincidence degree of the bounding box coordinates represented by the second comparison result);
  • the first comparison dimension can be the regression loss comparison dimension based on the similarity of the bounding box distributions, and the second comparison dimension can be the regression loss comparison dimension based on the coincidence degree of the bounding box coordinates.
  • The first weight coefficient and the second weight coefficient may remain unchanged. However, the first regression loss component and the second regression loss component correspond to different regression loss comparison dimensions (namely the comparison dimension based on the similarity of the bounding box distributions and the comparison dimension based on the coincidence degree of the bounding box coordinates), and these comparison dimensions emphasize different aspects of the regression loss: the comparison dimension based on the similarity of the bounding box distributions focuses on the regression loss of first reference bounding boxes whose corresponding actual bounding boxes have blurred edges, while the comparison dimension based on the coincidence degree of the bounding box coordinates focuses on the regression loss caused by specific position deviations of the bounding boxes. The relative size of the first regression loss component and the second regression loss component therefore reflects, to a certain extent, which comparison dimension more accurately characterizes the regression loss between the actual bounding box and the first predicted bounding box.
  • Accordingly, the first weight coefficient and the second weight coefficient can be adjusted based on the size relationship between the first regression loss component and the second regression loss component corresponding to the first reference bounding box. Specifically: if the absolute value of the difference between the first regression loss component and the second regression loss component is not greater than a preset loss threshold, the first weight coefficient and the second weight coefficient remain unchanged; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is greater than the second regression loss component, the first weight coefficient is increased according to a first preset adjustment method; if the absolute value of the difference is greater than the preset loss threshold and the first regression loss component is less than the second regression loss component, the second weight coefficient is increased according to a second preset adjustment method. In this way, for each first reference bounding box during model training, the sub-regression loss value gives greater weight to the regression loss of the comparison dimension that better reflects the deviation between the predicted and actual bounding boxes. A sketch of this adjustment logic is given below.
  • The increase amplitude of the first weight coefficient corresponding to the first preset adjustment method and the increase amplitude of the second weight coefficient corresponding to the second preset adjustment method may be the same or different; the increase amplitude can be set according to actual needs, and this application does not limit this.
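  • A minimal sketch of the weight-adjustment rule described above, with the loss threshold and the fixed increase amplitude chosen purely for illustration:

```python
def adjust_weights(v1, v2, lambda1, lambda2, loss_threshold=0.5, step=0.1):
    """Adjust the weight coefficients of the two regression loss components.

    v1: first regression loss component (bounding box distribution similarity).
    v2: second regression loss component (bounding box coordinate coincidence).
    """
    if abs(v1 - v2) <= loss_threshold:
        return lambda1, lambda2                  # keep both weights unchanged
    if v1 > v2:
        return lambda1 + step, lambda2           # first preset adjustment method
    return lambda1, lambda2 + step               # second preset adjustment method

def sub_regression_loss(v1, v2, lambda1, lambda2):
    """Weighted combination of the two regression loss components (see above)."""
    return lambda1 * v1 + lambda2 * v2
```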
  • Calculating the relative entropy (KL divergence) based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the first comparison result specifically includes: Step A1, determining the first probability distribution of the actual bounding box corresponding to the first reference bounding box, and determining the second probability distribution of the first predicted bounding box corresponding to the first reference bounding box; Step A2, calculating the KL divergence value between the above first probability distribution and second probability distribution, where the KL divergence value is used to characterize the distribution similarity between the first predicted bounding box and the actual bounding box; and Step A3, determining the first comparison result corresponding to the first reference bounding box based on the above KL divergence value.
  • The KL divergence value represents the similarity between the distribution of the actual bounding box and that of the corresponding first predicted bounding box: the smaller the KL divergence value, the smaller the difference between the bounding box distributions and, correspondingly, the more similar they are; the greater the KL divergence value, the greater the difference between the distribution of the actual bounding box corresponding to the first reference bounding box and that of the first predicted bounding box. The first comparison result obtained in this way characterizes the similarity of the bounding box distributions, and the first regression loss component corresponding to the distribution-similarity comparison dimension can then be determined based on it.
  • The first probability distribution and the second probability distribution can be determined, for example, by modeling each bounding box as a probability distribution (such as the Gaussian distribution mentioned above), where σ1 represents the first variance, b_ground represents the mean of the actual bounding box, Θd represents the parameters related to the distribution of the real bounding box, and Θg represents the model parameters of the bounding box prediction sub-model.
  • The above bounding box regression loss value is equal to the sum of the sub-regression loss values corresponding to the first specified number of first reference bounding boxes; it can be expressed as L_reg = Σ_{i=1..N_reg} L_i, where N_reg represents the first specified number, i represents the serial number of the first reference bounding box, and i takes the values 1 to N_reg.
  • Calculating the bounding box intersection-over-union loss based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the second comparison result specifically includes:
  • Step B1: calculate the bounding box intersection-over-union loss between the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the first intersection-over-union loss.
  • Step B2: determine the second comparison result corresponding to the first reference bounding box based on the above first intersection-over-union loss; the bounding box intersection-over-union loss represents the degree of coincidence of the bounding box coordinates.
  • In this way, the second comparison result can be obtained based on the intersection-over-union loss between the actual bounding box and the first predicted bounding box, and the second regression loss component corresponding to the comparison dimension of coordinate coincidence can be determined based on the second comparison result, thereby prompting the model to perform bounding box regression learning.
  • Here, the positive example sample is the first predicted bounding box corresponding to the actual bounding box learned through bounding box regression, and the negative example samples are the first predicted bounding boxes corresponding to actual bounding boxes other than that actual bounding box; by comparing them under the comparison dimension of coordinate coincidence, the model learns the specific position representation of the actual bounding box, thereby better performing bounding box regression learning.
  • Step B2, determining the second comparison result corresponding to the first reference bounding box based on the above first intersection-over-union loss, specifically includes:
  • B21: determine a comparison bounding box set among the first predicted bounding boxes corresponding to the first specified number of first reference bounding boxes; the comparison bounding box set includes the first predicted bounding boxes other than the first predicted bounding box corresponding to this first reference bounding box, or the other first predicted bounding boxes that do not contain the target object enclosed by this first reference bounding box. That is, the first predicted bounding boxes that contain target objects different from that of the first reference bounding box serve as negative samples of the actual bounding box with serial number i.
  • Taking the first reference bounding box with serial number i as an example, for each other first predicted bounding box (with serial number k) in the comparison bounding box set, the intersection-over-union loss between the actual bounding box with serial number i and the first predicted bounding box with serial number k is calculated, giving the second intersection-over-union loss corresponding to the first predicted bounding box with serial number k.
  • In this way, the first intersection-over-union loss is calculated from the actual bounding box with serial number i and the first predicted bounding box with serial number i, and the second intersection-over-union losses are calculated from the actual bounding box with serial number i and the first predicted bounding boxes with serial numbers k (k ≠ i), so as to determine the second comparison result (that is, the second comparison result can include the first intersection-over-union loss and the second intersection-over-union losses), from which the second regression loss component related to the degree of coincidence of the bounding box coordinates can be determined.
  • The model parameters can then be adjusted based on the second regression loss component, so that the actual bounding box with serial number i has a higher degree of coordinate coincidence with the first predicted bounding box with serial number i and a lower degree of coordinate coincidence with the other first predicted bounding boxes, thereby enhancing the global nature of bounding box regression learning and further improving its accuracy.
  • The above second regression loss component is the logarithm of the target intersection-over-union loss; an illustrative combination is sketched below.
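  • The exact form of the target intersection-over-union loss is defined by the application's own formula, which is not reproduced here; the sketch below is one plausible contrastive-style combination of the first (positive) and second (negative) intersection-over-union values, and the ratio form and epsilon smoothing are assumptions for illustration only.

```python
import math

def second_regression_loss(pos_iou, neg_ious, eps=1e-7):
    """Illustrative second regression loss component.

    pos_iou: IoU between the actual box with serial number i and the first
        predicted box with serial number i (positive sample).
    neg_ious: IoUs between that actual box and the other first predicted boxes
        in the comparison set (negative samples, k != i).
    The component is the logarithm of an IoU-based ratio, so that a high
    positive IoU and low negative IoUs yield a small loss.
    """
    target = (pos_iou + eps) / (pos_iou + sum(neg_ious) + eps)
    return -math.log(target)

loss = second_regression_loss(pos_iou=0.8, neg_ious=[0.1, 0.05, 0.0])
```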
  • The target detection model needs to determine not only the location of the target object but also its specific category. During training, some first reference bounding boxes may therefore have low category identification accuracy. For a first reference bounding box whose category prediction accuracy is low, its corresponding first predicted bounding box may not truly reflect the bounding box prediction accuracy of the bounding box prediction sub-model, and the regression loss between that first predicted bounding box and the corresponding actual bounding box likewise cannot truly reflect the prediction accuracy of the sub-model.
  • Therefore, the first predicted category corresponding to the first predicted bounding box is taken into account, and the corresponding sub-regression loss value is considered only when the actual category matches the first predicted category; otherwise, only the corresponding sub-category loss value is considered. That is, the sub-regression loss values of first reference bounding boxes whose category prediction results do not meet the preset requirements are excluded.
  • The above target detection model also includes a bounding box classification sub-model, and each round of model training further includes: the bounding box classification sub-model classifies the first reference bounding box or the first predicted bounding box to obtain the first category prediction result. In a specific implementation, the bounding box classification sub-model performs category prediction on the above first reference bounding box or first predicted bounding box and outputs the first category prediction result, which includes the predicted probability that the target object enclosed by the first reference bounding box or the first predicted bounding box belongs to each candidate category.
  • The candidate category with the maximum predicted probability is the first predicted category; that is, the category of the target object enclosed by the first reference bounding box or the first predicted bounding box, i.e., the target object category of the image area within that bounding box, is predicted by the bounding box classification sub-model to be the first predicted category. A small sketch of this prediction step is given after this passage. In addition, in a specific implementation, it is considered that the position information of the first reference bounding box and that of the first predicted bounding box will not deviate greatly, and the image features within the first reference bounding box will likewise not deviate greatly from those within the first predicted bounding box.
  • Accordingly, the first predicted bounding box can be input into the bounding box classification sub-model for category prediction to obtain the corresponding first category prediction result; that is, the first predicted bounding box is obtained based on the first reference bounding box, and category prediction is then performed on the first predicted bounding box. Alternatively, when bounding box prediction and category prediction are executed simultaneously, the first reference bounding box can be input into the bounding box classification sub-model; that is, the first predicted bounding box is obtained based on the first reference bounding box while category prediction is performed on the first reference bounding box to obtain the first category prediction result.
  • The iterative training of the model parameters of the above bounding box classification sub-model can follow the existing classification model training process and is not described again here.
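  • A minimal illustrative sketch of how the first predicted category can be read off the first category prediction result (softmax probabilities over the candidate categories, then argmax); the candidate category names and logits below are made up for the example.

```python
import math

def predict_category(logits, candidate_categories):
    """Return the candidate category with the maximum predicted probability,
    together with the full probability vector (the first category prediction result)."""
    shifted = [z - max(logits) for z in logits]      # numerically stable softmax
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return candidate_categories[best], probs

category, probs = predict_category([2.1, 0.3, -1.0], ["cat", "dog", "background"])
```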
  • The above target information also includes the category matching result between the first predicted category represented by the first category prediction result corresponding to the first reference bounding box and the actual category of the first reference bounding box. In determining the sub-regression loss value corresponding to each first reference bounding box: if the category matching result is that the first predicted category does not match the actual category, the sub-regression loss value corresponding to the first reference bounding box is zero; if the category matching result is that the first predicted category matches the actual category, the sub-regression loss value corresponding to the first reference bounding box is determined based on at least one of the first regression loss component corresponding to the above bounding box distribution similarity and the second regression loss component corresponding to the above bounding box coordinate coincidence degree. A sketch of this gating follows.
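  • An illustrative sketch of gating each sub-regression loss value by the category matching result and summing over the sampled reference boxes; the per-box data layout (a list of tuples) is an assumption for the example.

```python
def gated_sub_regression_loss(match, v1, v2, lambda1=1.0, lambda2=1.0):
    """Sub-regression loss of one first reference bounding box, gated by the
    category matching result: zero if the predicted category does not match."""
    if not match:
        return 0.0
    return lambda1 * v1 + lambda2 * v2

def bounding_box_regression_loss(per_box_terms):
    """Total loss: sum of the sub-regression loss values over the N_reg boxes.
    per_box_terms: list of (match, v1, v2) tuples, one per reference box."""
    return sum(gated_sub_regression_loss(m, v1, v2) for m, v1, v2 in per_box_terms)

total = bounding_box_regression_loss([(True, 0.4, 0.2), (False, 1.3, 0.9), (True, 0.1, 0.05)])
```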
  • The preset category matching constraint used to determine whether the first predicted category corresponding to the first reference bounding box matches the actual category may be related to the first category prediction result, and may specifically be a constraint with a single matching method or a constraint with a changing matching method. For the constraint with a single matching method, the category matching constraint used in each round of model training remains unchanged (that is, it is independent of the current number of training rounds); for example, in each round of model training, if the actual category is the same as the first predicted category, it is determined that the first predicted category corresponding to the first reference bounding box matches the actual category. For the constraint with a changing matching method, the category matching constraint used in each round is related to the current number of training rounds; such constraints can be divided into staged category matching constraints and gradual category matching constraints.
  • The staged category matching constraint may be that, when the current number of training rounds is less than a first preset number of rounds, the actual category and the first predicted category are required to belong to the same category group, and when the current number of training rounds is greater than or equal to the first preset number of rounds, the actual category is required to be the same as the first predicted category; that is, based on the staged category matching constraint and the category prediction result corresponding to the first reference bounding box, staged category matching can be realized.
  • The gradual category matching constraint may be that the sum of a first constraint item and a second constraint item is greater than a preset probability threshold, where the first constraint item is the first predicted probability corresponding to the actual category in the category prediction probability subset, and the second constraint item is the product of the sum of the second predicted probabilities in the category prediction probability subset other than the first predicted probability and a preset adjustment factor; the preset adjustment factor gradually decreases as the current number of training rounds increases. That is, based on the gradual category matching constraint and the category prediction result corresponding to the first reference bounding box, gradual category matching can be realized.
  • The category prediction probability subset is determined based on the category prediction result corresponding to the first reference bounding box, and includes the predicted probabilities that the target object enclosed by the first predicted bounding box belongs to the candidate categories in the target group; the target group is the category group to which the actual category belongs.
  • The multiple candidate categories associated with the target detection task are predetermined. Since the first reference bounding box is obtained by extracting regions of interest with the preset region of interest extraction model, the area in which the first reference bounding box delineates the target object may not be accurate enough, which during model training can lead to inaccurate category recognition for the first predicted bounding box.
  • The category matching result between the first predicted category corresponding to the first reference bounding box and the actual category of the first reference bounding box is determined based on the above preset category matching constraint and is used to represent whether the first predicted category corresponding to the first reference bounding box matches the actual category.
  • the bounding box classification sub-model can be pre-trained, or the model parameters of the bounding box classification sub-model can be trained simultaneously during the training of the model parameters of the bounding box prediction sub-model, that is, based on the first
  • the predicted category and the actual category determine the classification loss value, and iteratively trains the model parameters of the bounding box classification sub-model based on the classification loss value.
  • the simultaneous training of the model parameters of the bounding box classification sub-model it is also considered that it may be due to In the early stage of model training, the accuracy of the model parameters in the bounding box classification sub-model of the target detection model to be trained is low, resulting in inaccurate category identification of the first predicted bounding box corresponding to the first reference bounding box.
  • the above preset category matching constraints may include the above-mentioned constraints that change the matching method over training (such as the category matching staged constraint, or the category matching gradient constraint).
  • the above preset category matching constraints include: category matching gradient constraints.
  • the category matching gradient constraint can be expressed as: p_{real_i} + γ · Σ_{f ∈ groups\real_i} p_f > τ, where:
  • groups represents the target group;
  • real_i represents the actual category of the first reference bounding box with serial number i in the target group groups;
  • f ∈ groups\real_i represents a non-actual category in the target group;
  • γ represents the preset adjustment factor;
  • p_f represents the second predicted probability of the non-actual category f, so that γ · Σ_{f ∈ groups\real_i} p_f represents the above-mentioned second constraint item;
  • τ represents the above-mentioned preset probability threshold;
  • p_{real_i} is the first constraint item, i.e., the first predicted probability under the actual category. After the current number of model training rounds reaches a certain number, the preset adjustment factor γ decays to zero and the second constraint item vanishes, so the condition reduces to p_{real_i} > τ; that is, when p_{real_i} is greater than the preset probability threshold, the bounding box classification sub-model determines the actual category as the first predicted category.
  • In other words, the above-mentioned preset adjustment factor decreases as the current number of model training rounds increases: if the current number of model training rounds is less than or equal to the target number of training rounds, the above-mentioned second constraint item is positively related to the preset adjustment factor, and the preset adjustment factor is negatively related to the current number of model training rounds; if the current number of model training rounds is greater than the target number of training rounds, the above-mentioned second constraint item is zero, where the target number of training rounds is less than the total number of training rounds.
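The condition above can be summarized in a short sketch. The symbols and the threshold value are assumptions chosen for illustration; the key point is that the first constraint item (the predicted probability of the actual category) is compared against the preset probability threshold after adding the preset adjustment factor times the remaining probability mass of the target group.

```python
def category_matches(pred_probs: dict, target_group: set,
                     actual_category: str, gamma: float,
                     tau: float = 0.5) -> bool:
    """Category matching gradient constraint (sketch).

    pred_probs:      candidate category -> predicted probability
    target_group:    category group containing the actual category
    actual_category: actual category of the first reference bounding box
    gamma:           preset adjustment factor, decaying towards 0 over training
    tau:             preset probability threshold (0.5 is an illustrative value)

    Condition: p(actual) + gamma * sum(p(c) for other c in the group) > tau.
    Early in training (large gamma) probability mass anywhere in the target
    group helps the match; once gamma reaches 0, only the actual category counts.
    """
    first_item = pred_probs[actual_category]
    second_item = gamma * sum(p for c, p in pred_probs.items()
                              if c in target_group and c != actual_category)
    return first_item + second_item > tau
```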
  • the determination process of the preset adjustment factor used in current model training is specifically as follows:
  • the first preset value can be set according to actual needs.
  • In the first round of model training, the above category matching gradient constraint amounts to:
  • determining whether the first predicted category corresponding to the first reference bounding box matches the actual category based on the sum of the first predicted probability and the second predicted probabilities corresponding to the target group.
  • the decreasing formula corresponding to the above factor decreasing adjustment method can be, for example, γ = 1 − (ε − 1)/Z, where
  • the first item 1 represents the first preset value (i.e., the preset adjustment factor γ used in the first round of training), ε represents the current model training round number, and Z represents the target training round number; the target training round number can be the total
  • number of training rounds reduced by 1, or it can be a specified number of training rounds, where
  • the specified number of training rounds is less than the total number of training rounds and
  • the difference between the total number of training rounds and the specified number of training rounds is a preset number of rounds Q.
  • Correspondingly, the above decreasing formula can be such that, in the last round of model training, the preset adjustment factor is set to 0; that is, the judgment condition used in the last round of model training reduces to the first constraint item p_{real_i} being greater than the preset probability threshold τ.
  • The decreasing formula shown above is only a relatively simple linear decreasing adjustment method; in the actual application process, the decreasing rate of the preset adjustment factor γ can be set according to actual needs. Therefore, the above decreasing formula does not constitute a limitation on the scope of protection of this application.
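A linear schedule of the kind described above can be written as follows. The exact formula is not reproduced here, so this is only one schedule consistent with the stated behaviour (the factor starts at the first preset value, assumed to be 1, and is 0 once the current round exceeds the target number of training rounds).

```python
def preset_adjustment_factor(current_round: int, target_rounds: int) -> float:
    """Linearly decreasing preset adjustment factor (illustrative schedule).

    current_round: current model training round number (1-based)
    target_rounds: target number of training rounds (e.g. total rounds - 1)
    """
    if current_round > target_rounds:
        return 0.0  # second constraint item vanishes from here on
    return 1.0 - (current_round - 1) / target_rounds
```

With target_rounds equal to the total number of training rounds minus 1, the factor used in the last round is exactly 0, matching the behaviour described above.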
  • the above-mentioned target detection model to be trained includes a bounding box prediction sub-model and a bounding box classification sub-model.
  • In Figure 4b, a schematic diagram of the specific implementation principle of the training process of another target detection model is given, which includes:
  • the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain the first predicted bounding box; then, the comparison result generation module generates a set of bounding box comparison results based on the actual bounding box corresponding to the first reference bounding box
  • and the corresponding first predicted bounding box;
  • the bounding box classification sub-model predicts the category of the first predicted bounding box to obtain the category prediction result; according to the preset category matching constraints,
  • the category matching result is determined based on the actual category of the actual bounding box corresponding to the first reference bounding box and the category prediction result of the first predicted bounding box corresponding to the first reference bounding box; if the category matching result represents that the first predicted category and the actual category do not meet the preset category matching constraint, the sub-regression loss value corresponding to the first reference bounding box is zero; if the category matching result represents that the first predicted category and the actual category satisfy the preset category matching constraint, the sub-regression loss value corresponding to the first reference bounding box is determined based on its bounding box comparison result set.
  • The determination of the above category matching result may alternatively be folded into the process of generating the bounding box comparison result set for a certain first reference bounding box, so that,
  • when the first predicted category and the actual category do not satisfy the preset category matching constraint, it is enough to directly determine that the corresponding bounding box comparison result set is empty or contains preset information, and there is no need to generate a bounding box comparison result set based on the actual bounding box corresponding to the first reference bounding box and the corresponding first predicted bounding box; in this way,
  • the model training efficiency can be further improved. Specifically, as shown in Figure 4b, the comparison result generation module generates a set of bounding box comparison results based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box, together with the actual category and the category prediction result corresponding to the first reference
  • bounding box. More specifically, the category matching result is first determined based on the actual category of the actual bounding box corresponding to the first reference bounding box and the category prediction result of the first predicted bounding box corresponding to the first reference bounding box;
  • if the category matching result indicates that the first predicted category and the actual category do not meet the preset category matching constraints, the corresponding bounding box comparison result set is empty or contains preset information, and
  • the sub-regression loss value determined from that bounding box comparison result set is zero; if the category matching result indicates that the first predicted category and the actual category satisfy the preset category matching constraints, a set of bounding box comparison results is generated based on the actual bounding box corresponding to the first reference bounding box and the corresponding first predicted bounding box; accordingly, the sub-regression loss value determined based on that set of bounding box comparison results is determined by the first regression loss component corresponding to the first comparison result and the second regression loss component corresponding to the second comparison result.
  • That is, in one option, a set of bounding box comparison results can be generated directly based on the actual bounding box corresponding to the first reference bounding box and the corresponding first predicted bounding box; the category matching result between the first predicted category and the actual category is then determined based on the category prediction result (i.e., the category matching result indicates whether the preset category matching constraints are satisfied between the first predicted category and the actual category); if the category matching result is a category mismatch, the corresponding sub-regression loss value is determined to be zero, and
  • if the category matching result is a category match, the corresponding sub-regression loss value is determined based on the multiple comparison results in the bounding box comparison result set. In another option, the category matching result between the first predicted category and the actual category is first determined based on the category prediction result; if the category matching result is a category mismatch, the corresponding bounding box comparison result set is determined to be empty or preset information and the corresponding sub-regression loss value is zero; if the category matching result is a category match, a bounding box comparison result set is generated based on the actual bounding box corresponding to the first reference bounding box and the corresponding first predicted bounding box, and the corresponding sub-regression loss value is determined based on the multiple comparison results in that bounding box comparison result set. A sketch of this gating of the sub-regression loss is given below.
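Both variants reduce to the same gating of the sub-regression loss, sketched below. Combining the two regression loss components by simple addition is an assumption; the embodiment only requires that the sub-regression loss be determined from the first and second comparison results when the categories match, and be zero otherwise.

```python
def sub_regression_loss(first_comparison: float, second_comparison: float,
                        category_matched: bool) -> float:
    """Sub-regression loss for one first reference bounding box (sketch).

    first_comparison:  first regression loss component, from the comparison
                       result characterizing bounding box distribution similarity
    second_comparison: second regression loss component, from the comparison
                       result characterizing bounding box coordinate coincidence
    category_matched:  result of the preset category matching constraint
    """
    if not category_matched:
        # Comparison result set treated as empty / preset information.
        return 0.0
    return first_comparison + second_comparison

# The bounding box regression loss value can then be taken, for example, as the
# mean of the sub-regression loss values over the first specified number of
# first reference bounding boxes.
```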
  • If the model iterative training result meets the preset model iterative training termination condition, the updated bounding box prediction sub-model is determined as the trained target detection model; if the model iterative training result does not meet the preset model iterative training termination condition, the updated bounding box prediction sub-model is used as the target detection model to be trained for the next round of model training, until the preset model iterative training termination condition is met.
  • the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box, and then, based on the first predicted bounding box and its corresponding actual bounding box,
  • prompts the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box is closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy and model generalization of the trained target detection model. Moreover,
  • the set of comparison results used to determine the bounding box regression loss value not only includes the first comparison result that characterizes the similarity of the bounding box distribution, but also includes the second comparison result that characterizes the degree of coincidence of the bounding box coordinates; the bounding box regression loss value is then obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box, so that the bounding box regression loss value includes both the regression loss obtained from the coarse-grained comparison dimension based on the similarity of the bounding box distribution
  • and the regression loss obtained from the fine-grained comparison dimension based on the coincidence degree of the bounding box coordinates, which can improve the accuracy of the bounding box regression loss value and further improve the accuracy of the model parameters updated based on the bounding box regression loss value.
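As one concrete illustration of the fine-grained, coordinate-coincidence comparison dimension, the second comparison result could be based on the intersection-over-union of the first predicted bounding box and the actual bounding box. The patent does not commit to IoU specifically, so this is an assumed realization.

```python
def iou(box_a, box_b) -> float:
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).

    A possible basis for the second comparison result (degree of coincidence of
    bounding box coordinates); 1 - iou(...) could then serve as the corresponding
    regression loss component.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```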
  • FIG. 5 is a schematic flow chart of the target detection method provided by the embodiment of the present application.
  • the method in Figure 5 can be executed by an electronic device equipped with a target detection device.
  • the electronic device can be a terminal device or a designated server, wherein the hardware device for target detection (i.e., the electronic device provided with the target detection device) and the hardware device for target detection model training (i.e., the electronic device provided with the model training device) may be the same or different; as shown in Figure 5, this method at least includes the following steps:
  • S502 Obtain a second subset of bounding boxes corresponding to the image to be detected from the second set of candidate bounding boxes; wherein the second subset of bounding boxes includes a third specified number of second reference bounding boxes, and the second set of candidate bounding boxes is obtained by extracting the target area of the above-mentioned image to be detected using a preset region of interest extraction model.
  • the process of obtaining the third specified number of second reference bounding boxes may refer to the above-mentioned process of obtaining the first specified number of first reference bounding boxes, which will not be described again here.
  • S504 Input the above-mentioned second reference bounding box into the target detection model for target detection, and obtain the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box; wherein, the target detection model is trained based on the above-mentioned model training method.
  • the above target detection model includes a bounding box classification sub-model and a bounding box prediction sub-model; for each second reference bounding box, during the target detection process, the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box and obtains the second predicted bounding box corresponding to the second reference bounding box, and the bounding box classification sub-model performs classification processing on the second reference bounding box or the second predicted bounding box to obtain a second predicted category corresponding to the second reference bounding box.
  • Specifically, the bounding box classification sub-model performs category prediction on the above-mentioned second reference bounding box or the above-mentioned second predicted bounding box, and the output result may be a second category prediction result, where the second category prediction result includes the predicted probabilities that the target object enclosed by the second reference bounding box or the second predicted
  • bounding box belongs to each candidate category; the candidate category corresponding to the maximum predicted probability is the second predicted category. That is, the category of the target object enclosed by the second reference bounding box or the second predicted bounding box is predicted by the bounding
  • box classification sub-model as the second predicted category, i.e., the target object category of the image area within the second reference bounding box or the second predicted bounding box is predicted as the second predicted category by the bounding box classification sub-model. In addition, during specific implementation, it is taken into consideration that
  • the position information of the second reference bounding box and the second predicted bounding box will not deviate greatly, and the image features in the second reference bounding box will not deviate greatly from the image features in the second predicted bounding box, so the identification of the target object category in the image area within the bounding box will not be affected.
  • Therefore, the second predicted bounding box can be input into the bounding box classification sub-model for category prediction, and the corresponding second category prediction result can be obtained;
  • that is, the second predicted bounding box is first obtained based on the second reference bounding box, and category prediction is then performed on the second predicted bounding box to obtain the second category prediction result. For the situation where bounding box prediction and category prediction are executed simultaneously, the second reference bounding box can also be input into the bounding box classification sub-model for category prediction to obtain the corresponding second category prediction result; that is, the second predicted bounding box is obtained based on the second reference bounding box, and at the same time the second reference bounding box is used for category prediction to obtain the second category prediction result.
  • S506 Generate a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
  • the number of target objects contained in the image to be detected and the category to which each target object belongs can be determined.
  • for example, the target detection result may indicate that the image to be detected contains a cat, a dog and a pedestrian.
  • the above target detection model includes a bounding box prediction sub-model and a bounding box classification sub-model.
  • a schematic diagram of the specific implementation principle of the target detection process is given, which specifically includes: extracting the target area from the image to be detected using the preset region of interest extraction model to obtain P anchor boxes; randomly sampling n anchor boxes from the P anchor boxes as second reference bounding boxes; for each second reference bounding box, performing bounding box prediction with the bounding box prediction sub-model based on the second reference bounding box to obtain the second predicted bounding box, and performing category prediction on the second predicted bounding box with the bounding box classification sub-model to obtain the second predicted category; and generating the target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second reference bounding box. A sketch of this flow is given below.
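The flow just described can be summarized in the following sketch. The region-of-interest extractor and the two sub-models are placeholders, and their call signatures are assumptions made for illustration only.

```python
import random

def detect(image, roi_extractor, bbox_predictor, bbox_classifier, n: int):
    """End-to-end target detection sketch following the flow described above.

    roi_extractor:   preset region of interest extraction model (image -> anchors)
    bbox_predictor:  trained bounding box prediction sub-model
    bbox_classifier: trained bounding box classification sub-model
    n:               third specified number of second reference bounding boxes
    """
    anchors = roi_extractor(image)                        # P candidate anchor boxes
    refs = random.sample(anchors, min(n, len(anchors)))   # second reference bounding boxes
    detections = []
    for ref in refs:
        pred_box = bbox_predictor(image, ref)             # second predicted bounding box
        pred_cat = bbox_classifier(image, pred_box)       # second predicted category
        detections.append((pred_box, pred_cat))
    return detections  # target detection result of the image to be detected
```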
  • The target detection model trained based on the above target detection model training method can be applied to any specific application scenario that requires target detection of an image to be detected, wherein the image to be detected can be collected by an image acquisition device installed at a certain on-site location; correspondingly, the target detection device can belong to the image acquisition device, and specifically can be an image processing device in the image acquisition device.
  • In that case, the image processing device receives the image to be detected collected by the image acquisition device and performs target detection on the image to be detected; the target detection device can also be independent of the image acquisition device, in which case
  • a separate target detection device receives the image to be detected from the image acquisition device and performs target detection on the image to be detected.
  • the image to be detected can be collected by an image collection device installed at the entrance of a certain public place (such as a shopping mall entrance, a subway entrance, an entrance to a scenic spot, or an entrance to a performance site, etc.).
  • the target object to be detected in the image to be detected is the target user who enters the public place.
  • the above target detection model is used to perform target detection on the image to be detected, so as to delineate, in the image to be detected, the second predicted bounding box containing the target user who enters the public place, and to determine
  • the second predicted category corresponding to the second predicted bounding box (i.e., the category of the target user included in the second predicted bounding box, such as at least one of age group, gender, height, and occupation), so that the target detection result of the image to be detected is obtained.
  • Then, a user group identification result is determined based on the target detection result (such as the flow of people entering the public place, or the attributes of the user group entering the public place, etc.), and corresponding business processing is executed based on the user group identification result (such as automatically triggering an admission restriction prompt operation, or pushing information to target users, etc.); here, the higher the accuracy of the model parameters of the above target detection model, the higher the accuracy of the target detection result of the image to be detected output by the target detection model, and therefore the higher the accuracy of triggering the corresponding business processing based on the target detection result.
  • the image to be detected can be collected by image acquisition equipment installed at each monitoring point in a certain breeding base.
  • the target object to be detected in the image to be detected is the target breeding object in the breeding monitoring point.
  • the above target detection model is used to perform target detection on the image to be detected, so as to delineate, in the image to be detected, the second predicted bounding box containing the target breeding object, and to determine the second predicted category corresponding to the second predicted bounding box
  • (i.e., the category of the target breeding object contained in the bounding box, such as at least one of living status and body size), so that the target detection result of the image to be detected is obtained.
  • Then, a breeding object group identification result is determined based on the target detection result (such as the survival rate of the target breeding objects at the breeding monitoring point, or the growth rate of the target breeding objects at the breeding monitoring point, etc.), and corresponding management and control operations are executed based on the breeding object group identification result (e.g., if a decrease in the survival rate is detected, an alarm message is automatically issued, or if a slowdown in the growth rate is detected, the feeding amount or the feeding frequency is automatically increased, etc.); here, the higher the accuracy of the model parameters of the above target detection model, the higher the accuracy of the target detection result of the image to be detected output by the target detection model, and therefore the higher the accuracy of triggering the corresponding control operations based on the target detection result.
  • According to the target detection method in the embodiment of the present application, during the target detection process, the preset region of interest extraction model is first used to extract multiple candidate bounding boxes, and a third specified number of candidate bounding boxes are randomly sampled from the candidate bounding boxes as second reference bounding boxes; for each second reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box to obtain the second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain the second predicted category; then, based on the second predicted bounding box and the second predicted category corresponding to each second reference bounding box, the target detection result of the image to be detected is generated. Since, in the model training stage, the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box and the target detection model to be trained is prompted, based on the first predicted bounding box and its corresponding actual bounding box, to continuously learn the bounding box distribution, the predicted first predicted bounding box is closer to the corresponding actual bounding box; moreover, the set of comparison results used to determine the bounding box regression loss value includes both the first comparison result that characterizes the similarity of the bounding box distribution and
  • the second comparison result that characterizes the degree of coincidence of the bounding box coordinates, and the bounding box regression loss value is obtained based on these comparison results, so that the bounding box regression loss value includes the regression loss obtained from the coarse-grained comparison dimension based on the similarity of the bounding box distribution and the regression loss obtained from the fine-grained comparison dimension based on the coincidence degree of the bounding box coordinates,
  • which can improve the accuracy of the bounding box regression loss value, thereby further improving the accuracy of the model parameters updated based on the bounding box regression loss value.
  • FIG. 7 is a schematic diagram of the module composition of the model training device provided by the embodiment of the present application. The device is used to perform the model training method described in Figures 1 to 4b.
  • the device includes: a bounding box acquisition module 702, configured to acquire a first bounding box subset from a first candidate bounding box set, and to acquire actual bounding boxes respectively corresponding to each first reference bounding box in the first bounding box subset; the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box
  • the set is obtained by extracting the target area from the sample image data set using a preset region of interest extraction model;
  • the model training module 704 is configured to input the first reference bounding box and the actual bounding box into the target detection model to be trained
  • for model iterative training, until the model iterative training result meets the preset model iterative training termination condition, so as to obtain the trained target detection model.
  • the target detection model includes a bounding box prediction sub-model; the specific implementation of each model training is: for each first reference bounding box: the bounding box prediction sub-model is based on the first reference bounding box Perform bounding box prediction to obtain a first predicted bounding box; generate a set of bounding box comparison results based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box;
  • the set of bounding box comparison results includes a first comparison result that represents the degree of similarity of the distribution of bounding boxes, and a second comparison result that represents the degree of coincidence of bounding box coordinates; based on the first reference boundary in the first subset of bounding boxes
  • the first comparison result and the second comparison result respectively corresponding to the box are determined to determine the bounding box regression loss value; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
  • the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box, and then, based on the first predicted bounding box and its corresponding actual bounding box, prompts the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box is closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization and data transferability of the trained target detection model; and the set of comparison results used to determine the bounding box regression loss
  • value not only includes the first comparison result that represents the similarity of the bounding box distribution, but also includes the second comparison result that represents the degree of coincidence of the bounding box coordinates, and the bounding box regression loss value is then obtained based on the first comparison
  • result and the second comparison result corresponding to each first reference bounding box, so that the bounding box regression loss value includes the regression loss obtained from the coarse-grained comparison dimension based on the similarity of the bounding box distribution and the regression loss obtained from the fine-grained comparison dimension based on the coincidence degree of the bounding box coordinates, which can improve the accuracy of the bounding box regression loss value and further improve the accuracy of the model parameters updated based on the bounding box regression loss value.
  • FIG. 8 is a schematic diagram of the module composition of the target detection device provided by the embodiment of the present application.
  • the device is used to perform the target detection method described in Figures 5 to 6.
  • the device includes: a bounding box acquisition module 802, configured to acquire, from the second candidate bounding box set, the second bounding box subset corresponding to the image to be detected;
  • the target detection module 804 is configured to input the second reference bounding box into the target detection model for target detection, and obtain the second predicted bounding box and the second category prediction result corresponding to each of the second reference bounding boxes;
  • the detection result generation module 806 is configured to generate a target detection result of the image to be detected based on the second predicted bounding box corresponding to each of the second reference bounding boxes and the second category prediction result.
  • According to the target detection device in the embodiment of the present application, during the target detection process, the preset region of interest extraction model is first used to extract multiple candidate bounding boxes, and a third specified
  • number of candidate bounding boxes are randomly sampled from the candidate bounding boxes as second reference bounding boxes; for each second reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain the second predicted category; then, based on the second predicted bounding box and the second predicted category corresponding to each second reference bounding box, a target detection result of the image to be detected is generated. Since, in the model training phase, the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box and, based on the first predicted bounding box and its corresponding actual bounding box, prompts the target detection model to be trained to continuously learn the bounding
  • box distribution, the predicted first predicted bounding box is closer to the corresponding actual bounding box; moreover, the bounding box regression loss value is obtained based on
  • the first comparison result and the second comparison result corresponding to each first reference bounding box, so that the bounding box regression loss value includes the regression loss obtained from the coarse-grained comparison dimension based on the similarity of the bounding box distribution and the regression loss obtained from the
  • fine-grained comparison dimension based on the coincidence degree of the bounding box coordinates, which can improve the accuracy of the bounding box regression loss value, thereby further improving the accuracy of the model parameters updated based on the bounding box regression loss value.
  • embodiments of the present application also provide a computer device, which is used to execute the above-mentioned model training method or target detection method, as shown in Figure 9.
  • The computer device may vary greatly due to different configurations or performance, and may include one or more processors 901 and a memory 902, and the memory 902 may store one or more application programs or data. The memory 902 may provide short-term storage or persistent storage.
  • the application program stored in memory 902 may include one or more modules (not shown), and each module may include a series of computer-executable instructions on a computer device.
  • the processor 901 may be configured to communicate with the memory 902 and execute a series of computer-executable instructions in the memory 902 on the computer device.
  • the computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input-output interfaces 905, one or more keyboards 906, etc.
  • The computer device includes a memory and one or more programs, wherein the one or more programs are stored in the memory, the one or more programs may include one or more modules, and each module may include a series of computer-executable instructions for the computer device;
  • configured to be executed by one or more processors, the one or more programs include computer-executable instructions for: obtaining a first subset of bounding boxes from a first set of candidate bounding boxes, and obtaining actual bounding boxes respectively corresponding to each first reference bounding box in the first bounding box subset; the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding
  • box set is obtained by extracting the target area from the sample image data set using a preset region of interest extraction model; and inputting the first reference bounding box and the actual bounding box into the target detection model to be trained for model iterative training, until the model iterative training result meets the preset model iterative training termination condition, so as to obtain the trained target detection model.
  • the target detection model includes a bounding box prediction sub-model; the specific implementation of each model training is: for each first reference bounding box: the bounding box prediction sub-model is based on the first reference bounding box Perform bounding box prediction to obtain a first predicted bounding box; generate a set of bounding box comparison results based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box;
  • the set of bounding box comparison results includes a first comparison result that represents the degree of similarity of the distribution of bounding boxes, and a second comparison result that represents the degree of coincidence of bounding box coordinates; based on the first reference boundary in the first subset of bounding boxes
  • the first comparison result and the second comparison result respectively corresponding to the box are determined to determine the bounding box regression loss value; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
  • The computer device includes a memory and one or more programs, wherein the one or more programs are stored in the memory, the one or more programs may include one or more modules, and each module may include
  • a series of computer-executable instructions for the computer device and is configured to be executed by one or more processors; the one or more programs include computer-executable instructions for: obtaining, from a second set of candidate bounding boxes, a second subset of bounding boxes corresponding to the image to be detected; the second subset of bounding boxes includes a third specified number of second reference bounding boxes, and the second set of candidate bounding boxes is obtained by performing target area extraction on the image to be detected using a preset region of interest extraction model; inputting the second reference bounding box into the target detection model for target detection to obtain the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box; and generating the target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
  • the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box, and then, based on the first predicted bounding box and its corresponding actual bounding box, prompts
  • the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box is closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization and data transferability of the trained target detection model; moreover, the set of comparison results used to determine the bounding box regression loss value not only includes the first comparison result that characterizes the similarity of the bounding box distribution, but also includes the second comparison result that characterizes the degree of coincidence of the bounding box coordinates, and then, based on
  • the first comparison result and the second comparison result respectively corresponding to each first reference bounding box, the bounding box regression loss value is obtained, so that the bounding box regression loss value includes the regression loss obtained from the coarse-grained comparison dimension based on the similarity of the bounding box distribution and the regression loss obtained from the fine-grained comparison dimension based on the coincidence degree of the bounding box coordinates, which can improve the accuracy of the bounding box regression loss value and further improve the accuracy of the model parameters updated based on the bounding box regression loss value.
  • embodiments of the present application also provide a storage medium for storing computer executable instructions.
  • The storage medium can be a U disk, an optical disk, a hard disk, etc.; when the computer-executable instructions stored in the storage medium are executed by the processor, the following process can be realized: obtaining the first bounding box subset from the first candidate bounding box set, and obtaining the actual bounding boxes respectively corresponding to each first reference bounding box in the first bounding box subset;
  • the first bounding box subset includes a first specified number of first reference bounding boxes
  • the first candidate bounding box set is extracted using a preset region of interest
  • extraction model to extract the target area from the sample image data set; inputting the first reference bounding box and the actual bounding box into the target detection model to be trained for model iterative training, until the model iterative training result meets the preset model iterative training termination condition, so as to obtain the trained target detection model.
  • the target detection model includes a bounding box prediction sub-model; the specific implementation of each model training is: for each first reference bounding box: the bounding box prediction sub-model is based on the first reference bounding box Perform bounding box prediction to obtain a first predicted bounding box; generate a set of bounding box comparison results based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box;
  • the set of bounding box comparison results includes a first comparison result that represents the degree of similarity of the distribution of bounding boxes, and a second comparison result that represents the degree of coincidence of bounding box coordinates; based on the first reference boundary in the first subset of bounding boxes
  • the first comparison result and the second comparison result respectively corresponding to the box are determined to determine the bounding box regression loss value; and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
  • the storage medium can be a U disk, an optical disk, a hard disk, etc.
  • the following process can be implemented: obtaining, from the second candidate bounding box set, the second bounding box subset corresponding to the image to be detected, where the second bounding box subset includes a third specified number of second reference bounding boxes and the second candidate bounding box set is obtained by performing target area extraction on the image to be detected using a preset region of interest extraction model; inputting the second reference bounding box into the target detection model for target detection to obtain the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box; and generating the target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each of the second reference bounding boxes.
  • the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box, and then, based on the first predicted
  • bounding box and its corresponding actual bounding box, prompts the target detection model to be trained to continuously learn the bounding box distribution, making the predicted first predicted bounding box closer to the corresponding actual bounding box, thus improving
  • the bounding box prediction accuracy, model generalization and data transferability of the trained target detection model; moreover, the set of comparison results used to determine the bounding box regression loss value not only includes the first comparison result that characterizes the similarity of the bounding box distribution, but also includes the second comparison result that characterizes the degree of coordinate coincidence, and the bounding box regression loss value is then obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box, so that the bounding box regression loss value includes
  • the regression loss obtained from the coarse-grained comparison dimension based on the degree of distribution similarity and the regression loss obtained from the fine-grained comparison dimension based on the coincidence degree of the bounding box coordinates,
  • which can improve the accuracy of the bounding box regression loss value and further improve the accuracy of the model parameters updated based on this bounding box regression loss value.
  • embodiments of the present application may be provided as methods, systems or computer program products. Therefore, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, optical memory, etc.) containing computer-usable program code.
  • These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction
  • device implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable
  • device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in computer-readable media, random access memory (RAM), and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, which can implement information storage by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic tape cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined in this article, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
  • One or more embodiments of the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • Each embodiment in this application is described in a progressive manner. The same and similar parts between the various embodiments can be referred to each other. Each embodiment focuses on its differences from other embodiments.
  • In particular, for the device and equipment embodiments, since they are basically similar to the method embodiments, the description is relatively simple; for relevant details, please refer to the partial description of the method embodiments.
  • the above are only examples of this document and are not intended to limit this document. Various modifications and variations of this document may occur to those skilled in the art. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of this document shall be included in the scope of the claims of this document.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application provide a model training method, a target detection method and apparatuses. The model training method comprises: in a model training stage, on the basis of a first reference bounding box and an actual bounding box corresponding thereto, prompting a target detection model to be trained to continuously learn bounding box distribution, so that a first predicted bounding box obtained by prediction is closer to the corresponding actual bounding box; moreover, a set of comparison results used for determining a regression loss value of a bounding box not only comprises a first comparison result that characterizes a degree of similarity of the bounding box distribution, but also comprises a second comparison result that characterizes a degree of coincidence of bounding box coordinates.

Description

Model training method, target detection method and device
Cross reference
This application claims the priority of the Chinese patent application filed with the China Patent Office on July 15, 2022, with application number 202210829603.7 and the invention name "Model Training Method, Target Detection Method and Device"; the entire content of that application is incorporated herein by reference.
Technical field
The present application relates to the field of target detection, and in particular, to a model training method, a target detection method and devices.
Background
At present, with the rapid development of artificial intelligence technology, there is an increasing need to perform target detection on an image through a pre-trained target detection model, so as to predict the coordinate information and classification information of the bounding box of each target contained in the image.
However, during the training of the target detection model in the related art, mainly the degree of similarity between the image features in the predicted bounding box and those in the real bounding box is learned. As a result, the accuracy of the model parameters of the trained target detection model is relatively high for the sample image data set, but decreases for the image to be detected, leading to poor generalization of the target detection model and, in turn, relatively low target detection accuracy in the model application stage.
发明内容Contents of the invention
本申请实施例的目的是提供一种模型训练方法、目标检测方法及装置,能够使得预测得到的第一预测边界框更加接近于对应的实际边界框,从而提高训练后的目标检测模型的边界框预测准确度、模型泛化性和数据迁移性; 并且使得基于第一比对结果和第二比对结果得到的边界框回归损失值的准确度更高,从而能够进一步提高基于该边界框回归损失值更新后的模型参数的准确度。The purpose of the embodiments of the present application is to provide a model training method, a target detection method and a device, which can make the predicted first predicted bounding box closer to the corresponding actual bounding box, thereby improving the bounding box of the trained target detection model. Prediction accuracy, model generalization and data transferability; Furthermore, the accuracy of the bounding box regression loss value obtained based on the first comparison result and the second comparison result is higher, thereby further improving the accuracy of the updated model parameters based on the bounding box regression loss value.
为了实现上述技术方案,本申请实施例是这样实现的:In order to realize the above technical solution, the embodiment of the present application is implemented as follows:
一方面,本申请实施例提供的一种模型训练方法,所述方法包括:从第一备选边界框集合中获取第一边界框子集,以及获取所述第一边界框子集中的各第一参考边界框分别对应的实际边界框;所述第一边界框子集包括第一指定数量的第一参考边界框,所述第一备选边界框集合是利用预设感兴趣区域提取模型对样本图像数据集进行目标区域提取得到的;将所述第一参考边界框和所述实际边界框输入待训练的目标检测模型进行模型迭代训练,直到本次模型迭代训练结果满足预设模型迭代训练终止条件,得到训练后的目标检测模型;其中,所述目标检测模型包括边界框预测子模型;每次模型训练的具体实现方式有:针对每个所述第一参考边界框:所述边界框预测子模型基于所述第一参考边界框进行边界框预测,得到第一预测边界框;基于所述第一参考边界框对应的实际边界框和所述第一参考边界框对应的第一预测边界框,生成边界框比对结果集合;所述边界框比对结果集合包括表征边界框分布相似程度的第一比对结果、以及表征边界框坐标重合程度的第二比对结果;基于所述第一边界框子集中的所述第一参考边界框分别对应的第一比对结果和第二比对结果,确定边界框回归损失值;基于所述边界框回归损失值对所述边界框预测子模型进行参数更新。On the one hand, an embodiment of the present application provides a model training method, which method includes: acquiring a first bounding box subset from a first candidate bounding box set, and acquiring each first reference in the first bounding box subset. The actual bounding boxes corresponding to the bounding boxes respectively; the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box set is based on the sample image data using a preset region of interest extraction model. The set is obtained by extracting the target area; input the first reference bounding box and the actual bounding box into the target detection model to be trained for model iterative training, until the model iterative training results meet the preset model iterative training termination conditions, Obtain a trained target detection model; wherein the target detection model includes a bounding box prediction sub-model; the specific implementation of each model training is: for each first reference bounding box: the bounding box prediction sub-model Perform bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box; based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box, generate A set of bounding box comparison results; the set of bounding box comparison results includes a first comparison result that represents the similarity of the distribution of bounding boxes, and a second comparison result that represents the degree of overlap of coordinates of the bounding boxes; based on the first bounding box Determine the bounding box regression loss value based on the first comparison result and the second comparison result respectively corresponding to the first reference bounding box in the set; perform parameter update on the bounding box prediction sub-model based on the bounding box regression loss value .
一方面,本申请实施例提供的一种目标检测方法,所述方法包括:从第二备选边界框集合中获取待检测图像对应的第二边界框子集;所述第二边界框子集包括第三指定数量的第二参考边界框,所述第二备选边界框集合是利用预设感兴趣区域提取模型对所述待检测图像进行目标区域提取得到的;将所述第二参考边界框输入目标检测模型进行目标检测,得到各所述第二参考边界框对应的第二预测边界框和第二类别预测结果;基于各所述第二参考边 界框对应的所述第二预测边界框和所述第二类别预测结果,生成所述待检测图像的目标检测结果。On the one hand, an embodiment of the present application provides a target detection method, which method includes: obtaining a second subset of bounding boxes corresponding to the image to be detected from a second set of candidate bounding boxes; the second subset of bounding boxes includes a Three specified number of second reference bounding boxes, the second set of candidate bounding boxes is obtained by extracting the target area of the image to be detected using a preset region of interest extraction model; input the second reference bounding box The target detection model performs target detection and obtains the second predicted bounding box and the second category prediction result corresponding to each of the second reference bounding boxes; based on each of the second reference edges The second predicted bounding box corresponding to the bounding box and the second category prediction result generate a target detection result of the image to be detected.
一方面,本申请实施例提供的一种模型训练装置,所述装置包括:边界框获取模块,被配置为从第一备选边界框集合中获取第一边界框子集,以及获取所述第一边界框子集中的各第一参考边界框分别对应的实际边界框;所述第一边界框子集包括第一指定数量的第一参考边界框,所述第一备选边界框集合是利用预设感兴趣区域提取模型对样本图像数据集进行目标区域提取得到的;模型训练模块,被配置为将所述第一参考边界框和所述实际边界框输入待训练的目标检测模型进行模型迭代训练,直到本次模型迭代训练结果满足预设模型迭代训练终止条件,得到训练后的目标检测模型;其中,所述目标检测模型包括边界框预测子模型;每次模型训练的具体实现方式有:针对每个所述第一参考边界框:所述边界框预测子模型基于所述第一参考边界框进行边界框预测,得到第一预测边界框;基于所述第一参考边界框对应的实际边界框和所述第一参考边界框对应的第一预测边界框,生成边界框比对结果集合;所述边界框比对结果集合包括表征边界框分布相似程度的第一比对结果、以及表征边界框坐标重合程度的第二比对结果;基于所述第一边界框子集中的所述第一参考边界框分别对应的第一比对结果和第二比对结果,确定边界框回归损失值;基于所述边界框回归损失值对所述边界框预测子模型进行参数更新。On the one hand, an embodiment of the present application provides a model training device. The device includes: a bounding box acquisition module configured to acquire a first bounding box subset from a first candidate bounding box set, and acquire the first bounding box subset. Actual bounding boxes respectively corresponding to each first reference bounding box in the bounding box subset; the first bounding box subset includes a first specified number of first reference bounding boxes, and the first set of candidate bounding boxes is generated using a preset sense The area of interest extraction model is obtained by extracting the target area from the sample image data set; the model training module is configured to input the first reference bounding box and the actual bounding box into the target detection model to be trained for model iterative training until The results of this model iterative training meet the preset model iterative training termination conditions, and the trained target detection model is obtained; wherein, the target detection model includes a bounding box prediction sub-model; the specific implementation methods of each model training are: for each The first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box; based on the actual bounding box corresponding to the first reference bounding box and the The first predicted bounding box corresponding to the first reference bounding box generates a set of bounding box comparison results; the set of bounding box comparison results includes a first comparison result that represents the similarity of the distribution of bounding boxes, and a first comparison result that represents the coincidence of coordinates of the bounding boxes. The second comparison result of the degree; determine the bounding box regression loss value based on the first comparison result and the second comparison result respectively corresponding to the first reference bounding box in the first bounding box subset; based on the boundary The box regression loss value updates the parameters of the bounding box prediction sub-model.
In one aspect, an embodiment of the present application provides a target detection apparatus. The apparatus includes: a bounding box acquisition module configured to obtain, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, where the second bounding box subset includes a third specified number of second reference bounding boxes and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected with a preset region-of-interest extraction model; a target detection module configured to input the second reference bounding boxes into a target detection model for target detection to obtain a second predicted bounding box and a second category prediction result corresponding to each second reference bounding box; and a detection result generation module configured to generate a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
In one aspect, an embodiment of the present application provides a computer device. The device includes a processor and a memory arranged to store computer-executable instructions configured to be executed by the processor, the executable instructions including instructions for performing the steps of the method described in the first aspect or the second aspect.
一方面,本申请实施例提供的一种存储介质,其中,所述存储介质用于存储计算机可执行指令,所述可执行指令使得计算机执行上述方法中的步骤。On the one hand, embodiments of the present application provide a storage medium, wherein the storage medium is used to store computer-executable instructions, and the executable instructions cause the computer to perform the steps in the above method.
附图说明Description of drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are merely some of the embodiments recorded in the present application, and a person of ordinary skill in the art may obtain other drawings from them without creative effort.
图1为本申请实施例提供的模型训练方法的流程示意图;Figure 1 is a schematic flow chart of a model training method provided by an embodiment of the present application;
图2为本申请实施例提供的模型训练方法中每次模型训练过程的流程示意图;Figure 2 is a schematic flow chart of each model training process in the model training method provided by the embodiment of the present application;
图3为本申请实施例提供的模型训练方法的第一种实现原理示意图;Figure 3 is a schematic diagram of the first implementation principle of the model training method provided by the embodiment of the present application;
图4a为本申请实施例提供的模型训练方法的第二种实现原理示意图;Figure 4a is a schematic diagram of the second implementation principle of the model training method provided by the embodiment of the present application;
图4b为本申请实施例提供的模型训练方法的第三种实现原理示意图;Figure 4b is a schematic diagram of the third implementation principle of the model training method provided by the embodiment of the present application;
图5为本申请实施例提供的目标检测方法的流程示意图;Figure 5 is a schematic flow chart of the target detection method provided by the embodiment of the present application;
图6为本申请实施例提供的目标检测方法的实现原理示意图;Figure 6 is a schematic diagram of the implementation principle of the target detection method provided by the embodiment of the present application;
图7为本申请实施例提供的模型训练装置的模块组成示意图;Figure 7 is a schematic diagram of the module composition of the model training device provided by the embodiment of the present application;
图8为本申请实施例提供的目标检测装置的模块组成示意图;Figure 8 is a schematic diagram of the module composition of the target detection device provided by the embodiment of the present application;
图9为本申请实施例提供的计算机设备的结构示意图。 Figure 9 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
具体实施方式Detailed Description
To enable those skilled in the art to better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
需要说明的是,在不冲突的情况下,本申请中的一个或多个实施例以及实施例中的特征可以相互组合。下面将参考附图并结合实施例来详细说明本申请实施例。It should be noted that, without conflict, one or more embodiments and features in the embodiments of the present application can be combined with each other. The embodiments of the present application will be described in detail below with reference to the accompanying drawings and embodiments.
If features are extracted with a deep network so that the model merely learns the image features inside the bounding boxes, continually learning how similar the image features in the predicted bounding box are to those in the actual bounding box and adjusting the model parameters accordingly, the trained target detection model becomes dependent on the sample data set used during training: its generalization and cross-data transfer ability are poor, so it detects targets in the sample data set accurately but performs poorly on new images to be detected. For this reason, during the training stage the bounding box prediction sub-model predicts a first predicted bounding box from the first reference bounding box, and the target detection model to be trained is then driven, based on the first predicted bounding box and its corresponding actual bounding box, to continually learn the bounding box distribution so that the predicted box moves closer to the corresponding actual box. This not only improves the accuracy with which the trained model predicts the bounding box of a target object in an image to be detected, but also improves the model's generalization and data transferability, ensuring detection accuracy on new images to be detected and improving the trained model's adaptability when migrated to new data. Furthermore, if the model regression loss were determined only from the coarse-grained comparison dimension of distribution similarity, precise position learning of the bounding box could not be taken into account; if it were determined only from the fine-grained comparison dimension of coordinate coincidence, the edge ambiguity of bounding boxes could not be handled. The comparison result set used to determine the bounding box regression loss value therefore includes both a first comparison result characterizing the similarity of the bounding box distributions and a second comparison result characterizing the coincidence of the bounding box coordinates, and the bounding box regression loss value is obtained from the first and second comparison results of each first reference bounding box. In this way the loss simultaneously accounts for the regression loss caused by boxes whose distributions are similar but whose positions deviate, and for the regression loss caused by first predicted bounding boxes whose corresponding actual boxes have ambiguous edges: it contains both the regression loss from the coarse-grained, distribution-similarity comparison dimension and the regression loss from the fine-grained, coordinate-coincidence comparison dimension. This makes the bounding box regression loss value more accurate and, in turn, makes the model parameters updated from it more accurate.
Figure 1 is a schematic flowchart of a model training method provided by one or more embodiments of the present application. The method in Figure 1 can be executed by an electronic device provided with a model training apparatus, and the electronic device may be a terminal device or a designated server. The hardware used for model training (the electronic device provided with the model training apparatus) and the hardware used for target detection (the electronic device provided with the target detection apparatus) may be the same or different. Specifically, as shown in Figure 1, the training process of the target detection model includes at least the following steps:
S102: obtain a first bounding box subset from a first candidate bounding box set, and obtain the actual bounding box corresponding to each first reference bounding box in the first bounding box subset; the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set with a preset region-of-interest extraction model.
The first specified number of first reference bounding boxes may be determined in either of two ways: for each round of model training, the step of extracting target regions from the sample image data set with the preset region-of-interest extraction model is executed once, yielding the first specified number of first reference bounding boxes; alternatively, that extraction step is executed in advance, and for each round of model training the first specified number of first reference bounding boxes is obtained by randomly sampling from the large set of pre-extracted candidate bounding boxes.
The sample image data set may contain multiple sample target objects, and each sample target object may correspond to multiple first reference bounding boxes; that is, the first specified number of first reference bounding boxes includes at least one first reference bounding box for each sample target object.
Before the first bounding box subset is obtained from the first candidate bounding box set in step S102, the method further includes: inputting the sample image data set into the preset region-of-interest extraction model for region-of-interest extraction to obtain the first candidate bounding box set, where the first candidate bounding box set includes a second specified number of candidate bounding boxes and the second specified number is greater than or equal to the first specified number. When the second specified number equals the first specified number, the preset region-of-interest extraction model is used in every round of model training to extract regions of interest from the multiple sample images in the sample image data set, directly yielding the first specified number of first reference bounding boxes; when the second specified number is greater than the first specified number, the first specified number of first reference bounding boxes is obtained for each round of model training by random sampling from the second specified number of candidate bounding boxes.
One purpose of model training is to learn the bounding box distribution through iterative updates of the model parameters, thereby improving the model's generalization and data transferability (i.e., the model parameters do not depend on the sample data used in training and adapt better to the data to be recognized when the model is applied). To help the target detection model to be trained learn the bounding box distribution, the extracted first reference bounding boxes fed into the model should follow a certain probability distribution (such as a Gaussian or Cauchy distribution), so the larger the number N of anchor boxes extracted by the preset region-of-interest extraction model, the better the model can learn that distribution. However, if the preset region-of-interest extraction model (for example, an ROI extraction algorithm) were run in real time every round to extract N anchor boxes as first reference bounding boxes and feed them into the model, the amount of data to be processed would be large and the hardware requirements high.
Instead, N anchor boxes are extracted in advance with the preset region-of-interest extraction model, and in each round of model training m of them are randomly sampled as first reference bounding boxes and input into the target detection model to be trained. This keeps the per-round data volume manageable while still allowing the model to learn the bounding box distribution well, i.e., it balances the amount of data processed per round against the distribution learning. On this basis, the second specified number is greater than the first specified number, and obtaining the first bounding box subset from the first candidate bounding box set in step S102 specifically includes: randomly selecting the first specified number of candidate bounding boxes from the second specified number of candidate bounding boxes as first reference bounding boxes to obtain the first bounding box subset. In other words, the preset region-of-interest extraction model is first used to extract regions of interest from the multiple sample images in the sample image data set to obtain the second specified number of candidate bounding boxes, and then, for each round of model training, the first specified number of first reference bounding boxes is obtained from them by random sampling.
That is, N anchor boxes (the second specified number of candidate bounding boxes) are extracted in advance, and for each round of model training m anchor boxes (the first specified number of first reference bounding boxes) are randomly sampled from them, after which the following step S104 is performed.
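A minimal sketch of this two-stage sampling strategy is given below. It is illustrative only: the array shapes, variable names, and the random stand-in for the region-of-interest output are assumptions rather than part of the original disclosure.

```python
import numpy as np

def sample_reference_boxes(candidate_boxes, m, rng=None):
    """Per training round, randomly draw m first reference bounding boxes
    (the first specified number) from the N pre-extracted candidate boxes
    (the second specified number)."""
    rng = rng if rng is not None else np.random.default_rng()
    indices = rng.choice(len(candidate_boxes), size=m, replace=False)
    return candidate_boxes[indices]

# N = 2000 candidate anchor boxes (x, y, w, h), standing in for the output
# of the preset region-of-interest extraction model run once in advance.
candidate_boxes = np.random.rand(2000, 4)
reference_boxes = sample_reference_boxes(candidate_boxes, m=128)  # one round
```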
S104: input the first reference bounding boxes and the actual bounding boxes into the target detection model to be trained for iterative model training until the result of the current training iteration satisfies a preset termination condition, thereby obtaining the trained target detection model. The preset termination condition may include: the current number of training rounds equals the total number of training rounds, or the model loss function has converged.
The iterative training process in step S104 is described below. Since each training iteration follows the same procedure, a single iteration is described in detail. Specifically, the target detection model to be trained includes a bounding box prediction sub-model, and as shown in Figure 2, each training iteration is implemented through the following steps S1042 to S1046:
S1042: for each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on that first reference bounding box to obtain a first predicted bounding box, and a bounding box comparison result set is generated based on the actual bounding box and the first predicted bounding box corresponding to that first reference bounding box; the comparison result set includes a first comparison result characterizing the similarity of the bounding box distributions and a second comparison result characterizing the coincidence of the bounding box coordinates.
The first comparison result, which characterizes the similarity of the bounding box distributions, may be obtained by computing the relative entropy (Kullback-Leibler divergence) between the actual bounding box and the corresponding first predicted bounding box. The magnitude of the KL divergence reflects how different the probability distributions of the two boxes (the actual bounding box and the corresponding first predicted bounding box) are: the larger the difference, the lower the distribution similarity. The KL divergence therefore characterizes the distribution similarity between the actual bounding box and the corresponding first predicted bounding box, and from it the first regression loss component, associated with the comparison dimension of distribution similarity, can be determined, which drives the model to perform bounding box regression learning. Specifically, for the actual bounding box and the first predicted bounding box corresponding to a given first reference bounding box, a larger KL divergence means a lower similarity between their probability distributions and hence a larger first regression loss component in the distribution-similarity comparison dimension. Since the distribution similarity between the first predicted bounding box and the corresponding actual bounding box is determined from their KL divergence, the first comparison result can be generated from the KL divergence so that it characterizes the distribution similarity, and the first regression loss component of the distribution-similarity comparison dimension can then be determined from the KL divergence contained in the first comparison result.
The second comparison result, which characterizes the coincidence of the bounding box coordinates, may be determined by considering only the intersection-over-union (IoU) loss between an actual bounding box and its corresponding first predicted bounding box to obtain a target IoU loss, or by jointly considering the IoU loss between the actual bounding box and its own first predicted bounding box and the IoU losses between the actual bounding box and the first predicted bounding boxes of other actual bounding boxes. Because the magnitude of the target IoU loss characterizes how well the coordinates of the actual bounding box and the corresponding first predicted bounding box coincide, the second regression loss component, associated with the comparison dimension of coordinate coincidence, can be determined from it, which again drives the model to perform bounding box regression learning. Specifically, for the actual bounding box and the first predicted bounding box corresponding to a given first reference bounding box, the larger the target IoU loss, the lower the coordinate coincidence between the first predicted bounding box and the actual bounding box, and the larger the corresponding second regression loss component. Since the coordinate coincidence between the first predicted bounding box and the actual bounding box is determined from the target IoU loss, the second comparison result can be generated from the target IoU loss so that it characterizes the coordinate coincidence, and the second regression loss component of the coordinate-coincidence comparison dimension can then be determined from the target IoU loss contained in the second comparison result.
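The sketch below illustrates, for a single first reference bounding box, how the two comparison results could be assembled. The (x1, y1, x2, y2) box parameterization, the fixed standard deviations, and the use of 1 - IoU as the overlap loss are assumptions made for the example, not details taken from the original disclosure.

```python
import numpy as np

def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) between univariate Gaussians, applied per box
    coordinate and summed; P models the actual box, Q the predicted box."""
    mu_p, mu_q = np.asarray(mu_p, float), np.asarray(mu_q, float)
    return float(np.sum(np.log(sigma_q / sigma_p)
                        + (sigma_p**2 + (mu_p - mu_q)**2) / (2.0 * sigma_q**2)
                        - 0.5))

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def comparison_results(actual_box, predicted_box, sigma_gt=1.0, sigma_pred=1.0):
    """Bounding box comparison result set for one first reference box:
    the first result reflects distribution similarity (KL divergence),
    the second result reflects coordinate coincidence (IoU loss = 1 - IoU)."""
    return {
        "first_result_kl": gaussian_kl(actual_box, sigma_gt, predicted_box, sigma_pred),
        "second_result_iou_loss": 1.0 - iou(actual_box, predicted_box),
    }

print(comparison_results((10, 10, 50, 60), (12, 8, 49, 58)))
```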
S1044,基于上述第一边界框子集中的第一参考边界框分别对应的第一比对结果和第二比对结果,确定待训练的目标检测模型的边界框回归损失值。S1044: Determine the bounding box regression loss value of the target detection model to be trained based on the first comparison result and the second comparison result respectively corresponding to the first reference bounding box in the first bounding box subset.
After the bounding box comparison result set is obtained for each first reference bounding box, the sub-regression loss value corresponding to each first reference bounding box can be obtained. This sub-regression loss value includes at least the first regression loss component associated with the first comparison dimension (considered from the angle of distribution similarity) and the second regression loss component associated with the second comparison dimension (considered from the angle of coordinate coincidence). The bounding box regression loss value used to adjust the model parameters is then determined from the sub-regression loss values of all the first reference bounding boxes.
It should be noted that, when determining the sub-regression loss value of a first reference bounding box, the distribution similarity and the coordinate coincidence of the bounding boxes may be considered together, or only the distribution similarity may be considered; in the latter case the bounding box comparison result set of the first reference bounding box includes the first comparison result, and the sub-regression loss value is determined from the first regression loss component corresponding to that first comparison result.
S1046,基于上述边界框回归损失值对上述边界框预测子模型进行参数更新。S1046: Update the parameters of the above bounding box prediction sub-model based on the above bounding box regression loss value.
After the bounding box regression loss value has been determined from the sub-regression loss values of the first reference bounding boxes, the parameters of the bounding box prediction sub-model are adjusted with a gradient descent method based on that loss value. Because each sub-regression loss value reflects at least the first regression loss component of the distribution-similarity comparison dimension and the second regression loss component of the coordinate-coincidence comparison dimension, the bounding box regression loss value used to adjust the model parameters also reflects the regression loss components of both comparison dimensions. As a result, the finally trained target detection model not only brings the probability distribution of the first predicted bounding box closer to that of the actual bounding box, but also ensures a higher degree of coordinate coincidence between the first predicted bounding box and the actual bounding box.
For the process of iteratively training the model parameters based on the bounding box regression loss value of the target detection model to be trained so as to obtain the target detection model, reference may be made to the existing procedure of tuning model parameters through back-propagation with gradient descent, which is not repeated here.
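A minimal PyTorch sketch of such a gradient-descent update is shown below. The linear layer and the smooth L1 loss are stand-ins for the bounding box prediction sub-model and the regression loss described above (which would combine the KL-based and IoU-based components); they are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Toy stand-in for the bounding box prediction sub-model: refines (x, y, w, h)
# from a first reference bounding box.
box_head = nn.Linear(4, 4)
optimizer = torch.optim.SGD(box_head.parameters(), lr=1e-3)

reference_boxes = torch.rand(128, 4)                          # m sampled reference boxes
actual_boxes = reference_boxes + 0.05 * torch.randn(128, 4)   # matched actual boxes

predicted_boxes = box_head(reference_boxes)                   # first predicted boxes
# Stand-in for the bounding box regression loss value.
bbox_regression_loss = nn.functional.smooth_l1_loss(predicted_boxes, actual_boxes)

optimizer.zero_grad()
bbox_regression_loss.backward()   # back-propagate the regression loss
optimizer.step()                  # gradient-descent update of the sub-model parameters
```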
It should also be noted that the target detection model trained with the model training method provided by the embodiments of the present application can be applied to any specific scenario that requires target detection on an image to be detected. For example, in application scenario 1, target detection is performed on images captured by image acquisition devices at the entrance of a public place (such as a shopping mall entrance, a subway entrance, a scenic spot entrance, or a performance venue entrance); in application scenario 2, target detection is performed on images captured by image acquisition devices at the monitoring points of a breeding base.
Because the specific application scenarios of the target detection model differ, the sample image data sets used for training also differ. For application scenario 1, the sample image data set may consist of historical sample images collected at the entrance of the designated public place during a preset historical period; correspondingly, the target object enclosed by a first reference bounding box is a target user entering that public place in the historical sample image, and the actual category and first predicted category may be attributes of the target user, such as at least one of age group, gender, height, and occupation. For application scenario 2, the sample image data set may consist of historical sample images collected at the monitoring points of the designated breeding base during a preset historical period; correspondingly, the target object enclosed by a first reference bounding box is a target breeding object in the historical sample image, and the actual category and first predicted category may be attributes of the target breeding object, such as at least one of its living state and body size.
如图3所示,给出了一种目标检测模型训练过程的具体实现原理示意图,具体包括:As shown in Figure 3, a schematic diagram of the specific implementation principles of the training process of a target detection model is given, including:
Specifically: a first specified number of first reference bounding boxes and their corresponding actual bounding boxes are obtained; for each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on that first reference bounding box to obtain a first predicted bounding box, and the comparison result generation module then generates a bounding box comparison result set based on the actual bounding box and the first predicted bounding box corresponding to that first reference bounding box; the bounding box regression loss value of the target detection model to be trained is determined from the first and second comparison results of all first reference bounding boxes; and the model parameters of the target detection model to be trained are iteratively updated based on that loss value until the result of the current training iteration satisfies the preset termination condition, yielding the target detection model.
Regarding the generation of the bounding box comparison result set, generating the set in step S1042 based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box specifically includes: computing the relative entropy (KL divergence) from the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the first comparison result; and computing the bounding box intersection-over-union loss from the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the second comparison result.
For each first reference bounding box, the corresponding bounding box comparison result set thus includes not only the first comparison result obtained from the perspective of distribution similarity but also the second comparison result obtained from the perspective of coordinate coincidence, which makes the comparison result set more comprehensive and in turn improves the accuracy of the bounding box regression loss derived from it.
As shown in Figure 4a, a schematic diagram of the implementation principle of another target detection model training process is given, which specifically includes the following. The preset region-of-interest extraction model is first used to extract target regions from the sample image data set, yielding N anchor boxes; the sample image data set includes multiple original sample images, each containing at least one target object, and the feature information of each anchor box may include position information (x, y, w, h) and category information c, i.e., (x, y, w, h, c). During model training the parameter dimensions may be assumed to be mutually independent, so the iterative training of the model parameters in each dimension is also independent. For each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as first reference bounding boxes, and the actual bounding box corresponding to each first reference bounding box is determined. Each target object in the sample image data set may correspond to one actual bounding box; for example, if the total number of target objects in the sample image data set is d, the number of actual bounding boxes before expansion is d. To make the actual bounding boxes correspond to the first predicted bounding boxes, the actual bounding boxes of multiple first reference bounding boxes that contain the same target object may be identical; that is, the actual bounding boxes are expanded according to the target object enclosed by each first reference bounding box, giving m actual bounding boxes (m > d). For example, if an original sample image contains a cat A whose annotation is actual bounding box A, and four first reference bounding boxes (say those numbered 6, 7, 8, and 9) contain cat A, then actual bounding box A is expanded into four actual bounding boxes A (the actual bounding boxes numbered 6, 7, 8, and 9). For each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on that first reference bounding box to obtain a first predicted bounding box, and the comparison result generation module generates a bounding box comparison result set based on the corresponding actual bounding box and first predicted bounding box. Each first reference bounding box thus corresponds to one actual bounding box and one first predicted bounding box, the latter being predicted by the bounding box prediction sub-model as it continually performs bounding box regression learning; in the example, among the m first predicted bounding boxes output by the sub-model, those numbered 6, 7, 8, and 9 enclose cat A. For each first reference bounding box, the first regression loss component is determined from the first comparison result in its comparison result set and the second regression loss component from the second comparison result; the bounding box regression loss value of the target detection model to be trained is determined from the first and second regression loss components of all first reference bounding boxes; and the model parameters of the bounding box prediction sub-model are adjusted with stochastic gradient descent based on that loss value, yielding a parameter-updated bounding box prediction sub-model. If the result of the current training iteration satisfies the preset termination condition, the updated bounding box prediction sub-model is taken as the trained target detection model; otherwise it is used as the target detection model to be trained in the next round, until the termination condition is satisfied.
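The expansion of the d annotated boxes into m actual bounding boxes can be pictured with the small sketch below; the object assignment and variable names are assumed for illustration, the disclosure only states that reference boxes enclosing the same object share the same actual bounding box.

```python
# One annotated actual box per target object (d objects in total); each sampled
# first reference bounding box reuses the annotation of the object it encloses,
# giving m actual boxes with m > d.
actual_box_by_object = {
    "cat_A": (10, 10, 50, 60),
    "dog_B": (80, 20, 140, 90),
}
reference_box_objects = ["cat_A", "cat_A", "dog_B", "cat_A", "dog_B"]  # per sampled box

expanded_actual_boxes = [actual_box_by_object[obj] for obj in reference_box_objects]
# Reference boxes 0, 1 and 3 all reuse cat_A's annotation, mirroring the example
# in which boxes 6 to 9 share actual bounding box A.
```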
The bounding box regression loss value of the target detection model to be trained is jointly determined by the sub-regression loss values of the multiple first reference bounding boxes, and the sub-regression loss value of each first reference bounding box is jointly determined by multiple regression loss components. On this basis, determining the bounding box regression loss value in step S1044 from the first and second comparison results of the first reference bounding boxes in the first bounding box subset specifically includes: determining the sub-regression loss value of each first reference bounding box in the first bounding box subset, where each sub-regression loss value is determined from target information including one or a combination of the degree of distribution similarity characterized by the first comparison result of that first reference bounding box and the degree of coordinate coincidence characterized by its second comparison result; and determining the bounding box regression loss value of the target detection model currently being trained from the sub-regression loss values of the first reference bounding boxes in the first bounding box subset.
When determining the sub-regression loss value of a first reference bounding box, only the first regression loss component corresponding to the first comparison result may be considered, or the first regression loss component corresponding to the first comparison result and the second regression loss component corresponding to the second comparison result may both be considered. Taking the case in which both comparison dimensions are considered as an example, the sub-regression loss value of each first reference bounding box equals the weighted sum of the two regression loss components, which can be expressed as:

V_i(D, G) = λ_1·V_i1 + λ_2·V_i2
Here λ_1 denotes the first weight coefficient of the first regression loss component in the first comparison dimension, V_i1 denotes the first regression loss component in the first comparison dimension (the regression loss component corresponding to the degree of distribution similarity characterized by the first comparison result), λ_2 denotes the second weight coefficient of the second regression loss component in the second comparison dimension, and V_i2 denotes the second regression loss component in the second comparison dimension (the regression loss component corresponding to the degree of coordinate coincidence characterized by the second comparison result). Specifically, the first comparison dimension may be a regression loss comparison dimension based on the similarity of the bounding box distributions, and the second comparison dimension may be a regression loss comparison dimension based on the coincidence of the bounding box coordinates.
For the multiple first reference bounding boxes, the first and second weight coefficients may be kept constant. However, the first and second regression loss components correspond to different regression loss comparison dimensions (the dimension based on distribution similarity and the dimension based on coordinate coincidence), and these dimensions emphasize different aspects of the regression loss: the distribution-similarity dimension focuses on the regression loss of first reference bounding boxes whose actual bounding boxes have ambiguous edges, while the coordinate-coincidence dimension focuses on the regression loss of first reference bounding boxes whose distributions are similar but whose positions deviate. The relative magnitude of the first and second regression loss components therefore reflects, to some extent, which comparison dimension more accurately characterizes the regression loss between the actual bounding box and the first predicted bounding box. On this basis, for each first reference bounding box, the first and second weight coefficients are adjusted according to the relative magnitude of its first and second regression loss components. Specifically, if the absolute difference between the first and second regression loss components is not greater than a preset loss threshold, both weight coefficients remain unchanged; if the absolute difference is greater than the preset loss threshold and the first regression loss component is larger, the first weight coefficient is increased according to a first preset adjustment mode; if the absolute difference is greater than the preset loss threshold and the first regression loss component is smaller, the second weight coefficient is increased according to a second preset adjustment mode. In this way, for each first reference bounding box during training, more weight is placed on the regression loss component of the comparison dimension that better reflects the bounding box regression loss, which further improves the accuracy of the model parameter optimization.
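A compact sketch of this weighting rule is given below; the additive step size is an assumption (the increase amplitude is left to actual requirements, as noted in the following paragraph), and the function names are not taken from the original.

```python
def sub_regression_loss(v1, v2, lam1, lam2):
    """Sub-regression loss of one first reference bounding box:
    V_i(D, G) = lam1 * V_i1 + lam2 * V_i2."""
    return lam1 * v1 + lam2 * v2

def adjust_weights(v1, v2, lam1, lam2, loss_threshold, step=0.1):
    """If the two regression loss components differ by more than the preset
    loss threshold, increase the weight of the larger component; otherwise
    keep both weights unchanged."""
    if abs(v1 - v2) <= loss_threshold:
        return lam1, lam2            # difference within threshold: no change
    if v1 > v2:
        return lam1 + step, lam2     # first preset adjustment mode
    return lam1, lam2 + step         # second preset adjustment mode

lam1, lam2 = adjust_weights(v1=0.9, v2=0.2, lam1=1.0, lam2=1.0, loss_threshold=0.3)
loss_i = sub_regression_loss(0.9, 0.2, lam1, lam2)
```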
The increase applied to the first weight coefficient under the first preset adjustment mode and the increase applied to the second weight coefficient under the second preset adjustment mode may be the same or different; the increase amplitude can be set according to actual requirements, and the present application does not limit this.
Regarding the process of obtaining the first comparison result from the distribution-similarity comparison dimension, computing the relative entropy (KL divergence) from the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the first comparison result specifically includes: step A1, determining a first probability distribution of the actual bounding box corresponding to the first reference bounding box and a second probability distribution of the first predicted bounding box corresponding to the first reference bounding box; step A2, computing the KL divergence value between the first probability distribution and the second probability distribution, where the KL divergence value characterizes the degree of distribution similarity between the first predicted bounding box and the actual bounding box; and step A3, determining the first comparison result of the first reference bounding box based on the KL divergence value.
For each first reference bounding box, the KL divergence value between the first probability distribution of the actual bounding box and the second probability distribution of the first predicted bounding box is computed from the perspective of distribution similarity. This value characterizes how similar the distributions of the actual bounding box and the corresponding first predicted bounding box are: the smaller the KL divergence, the smaller the distribution difference and hence the greater the distribution similarity. Once the KL divergence between the actual bounding box and the first predicted bounding box has been determined, the first comparison result, which characterizes the distribution similarity, is obtained. From the first comparison result, the first regression loss component of the distribution-similarity comparison dimension can then be determined: the larger the KL divergence, the lower the distribution similarity between the actual bounding box and the first predicted bounding box of a given first reference bounding box, and the larger that box's first regression loss component. The model parameters of the bounding box prediction sub-model are then updated based on the first regression loss component, improving the sub-model's bounding box prediction.
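For reference, the relative entropy used here is the standard definition (a textbook identity, not quoted from the original filing), with P_D the first probability distribution (actual bounding box) and P_G the second probability distribution (first predicted bounding box):

```latex
D_{\mathrm{KL}}\left(P_D \,\middle\|\, P_G\right)
  = \int P_D(x)\,\log\frac{P_D(x)}{P_G(x)}\,\mathrm{d}x
```

Smaller values indicate more similar bounding box distributions.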
Since the probabilities of occurrence of the actual bounding box and the predicted bounding box both follow a certain probability distribution (such as a Gaussian distribution), let the first probability distribution be P_D(x_i | θ_d) and the second probability distribution be P_G(x_i | θ_g); the KL divergence value can then be written as D_KL(P_D(x_i | θ_d) ‖ P_G(x_i | θ_g)). The first and second probability distributions can be determined as follows:

P_D(x_i | θ_d) = N(x_i; b_ground, σ1), a Gaussian with mean b_ground and variance σ1,

where x_i denotes the first reference bounding box with index i, σ1 denotes the first variance, b_ground denotes the mean of the actual bounding box, and θ_d denotes the parameters related to the distribution of the real bounding box; and

P_G(x_i | θ_g) = N(x_i; b_estimation, σ2), a Gaussian with mean b_estimation and variance σ2,

where x_i denotes the first reference bounding box with index i, σ2 denotes the second variance, b_estimation denotes the mean of the first predicted bounding box, and θ_g denotes the model parameters of the bounding box prediction sub-model.
The bounding box regression loss value equals the sum of the sub-regression loss values of the first specified number of first reference bounding boxes, which can be expressed as:

bounding box regression loss value = Σ_{i=1}^{N_reg} V_i(D, G)

where N_reg denotes the first specified number, i denotes the index of the first reference bounding box, and i takes the values 1 to N_reg.
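Under the Gaussian form given above, the KL divergence between the two box distributions has a standard closed form (again a textbook identity rather than a quotation from the original; σ1 and σ2 are taken as the variances defined above):

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(b_{\mathrm{ground}},\,\sigma_1)\,\middle\|\,
                        \mathcal{N}(b_{\mathrm{estimation}},\,\sigma_2)\right)
  = \frac{1}{2}\left[\ln\frac{\sigma_2}{\sigma_1}
    + \frac{\sigma_1 + \left(b_{\mathrm{ground}} - b_{\mathrm{estimation}}\right)^{2}}{\sigma_2}
    - 1\right]
```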
Regarding the process of obtaining the second comparison result from the coordinate-coincidence comparison dimension, computing the bounding box intersection-over-union loss from the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box to obtain the second comparison result specifically includes the following.
Step B1: perform a bounding box intersection-over-union loss calculation on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box, to obtain a first intersection-over-union loss.
Taking the first reference bounding box with index i as an example, the intersection-over-union loss between the actual bounding box with index i and the first predicted bounding box with index i is computed, yielding the first intersection-over-union loss of the first reference bounding box with index i.
步骤B2,基于上述第一交并比损失,确定第一参考边界框对应的第二比对结果;其中,边界框交并比损失能够表征边界框坐标重合程度。Step B2: Determine the second comparison result corresponding to the first reference bounding box based on the above-mentioned first intersection-union ratio loss; wherein the bounding box intersection-union ratio loss can represent the degree of coincidence of the bounding box coordinates.
由于两个边界框之间的交并比损失的大小能够表征边界框坐标重合程度,因此,可以基于实际边界框与第一预测边界框之间的交并比损失,得到第二 比对结果,从而基于第二比对结果确定从边界框坐标重合程度角度考量的比对维度对应的第二回归损失分量,进而促使模型进行边界框回归学习。Since the size of the intersection-union ratio loss between two bounding boxes can characterize the degree of overlap of the coordinates of the bounding boxes, the second can be obtained based on the intersection-union ratio loss between the actual bounding box and the first predicted bounding box. The comparison result is used to determine the second regression loss component corresponding to the comparison dimension considered from the perspective of the coincidence degree of the bounding box coordinates based on the second comparison result, thereby prompting the model to perform bounding box regression learning.
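As a concrete illustration of Step B1, the sketch below computes an intersection-over-union loss between an actual box and a predicted box. The (x1, y1, x2, y2) corner format and the 1 − IoU form of the loss are assumptions, since the patent does not fix a particular IoU-loss variant.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(actual_box, predicted_box) -> float:
    """Intersection-over-union loss: low when the two boxes coincide closely."""
    return 1.0 - iou(actual_box, predicted_box)
```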
Further, in determining the second comparison result, it would be possible to consider only the first intersection-over-union loss between the actual bounding box and its own corresponding first predicted bounding box. However, in order to improve the accuracy of the second comparison result, and thereby the accuracy of the second regression loss component corresponding to the comparison dimension of bounding box coordinate coincidence, and in turn the accuracy of the bounding box regression loss value used to adjust the model parameters, not only the first intersection-over-union loss between the actual bounding box and its own corresponding first predicted bounding box is considered, but also the second intersection-over-union losses between the actual bounding box and the other first predicted bounding boxes. In this way, the actual bounding box can be compared, in the comparison dimension of bounding box coordinate coincidence, both with positive samples (i.e., the first predicted bounding box obtained through bounding box regression learning for that actual bounding box itself) and with negative samples (i.e., the first predicted bounding boxes obtained through bounding box regression learning for actual bounding boxes other than that one), so as to learn the specific position representation of the actual bounding box and drive the model to better perform bounding box regression learning. On this basis, the above Step B2, determining the second comparison result corresponding to the first reference bounding box based on the first intersection-over-union loss, specifically includes:

B21: among the first predicted bounding boxes respectively corresponding to the first specified number of first reference bounding boxes, determine a set of contrast bounding boxes, where the set of contrast bounding boxes includes the first predicted bounding boxes other than the first predicted bounding box corresponding to the first reference bounding box, or the other first predicted bounding boxes that do not contain the target object enclosed by the first reference bounding box.

Still taking the first reference bounding box with serial number i as an example, the above set of contrast bounding boxes may include the first predicted bounding boxes other than the first predicted bounding box with serial number i (i.e., the first predicted bounding boxes with serial number k, k ≠ p, p = i); that is, all first predicted bounding boxes other than the first predicted bounding box with serial number i are taken as negative samples of the actual bounding box with serial number i. In order to further improve the accuracy of negative sample selection, the set of contrast bounding boxes may instead include the first predicted bounding boxes other than the first predicted bounding box with serial number i that do not contain the target object enclosed by the first reference bounding box with serial number i (i.e., the first predicted bounding boxes with serial number k, k ≠ p, where p = i or p = j, and the first predicted bounding box with serial number j encloses the same target object as the first reference bounding box with serial number i); that is, only the other first predicted bounding boxes containing target objects different from that of the first reference bounding box with serial number i are taken as negative samples of the actual bounding box with serial number i.

B22: perform bounding box intersection-over-union loss calculations on the actual bounding box corresponding to the first reference bounding box and each of the other first predicted bounding boxes, respectively, to obtain second intersection-over-union losses.

Still taking the first reference bounding box with serial number i as an example, for each other first predicted bounding box in the set of contrast bounding boxes, the intersection-over-union loss between the actual bounding box with serial number i and the first predicted bounding box with serial number k is calculated, yielding the second intersection-over-union loss corresponding to the first predicted bounding box with serial number k.

B23: determine the second comparison result corresponding to the first reference bounding box based on the above first intersection-over-union loss and second intersection-over-union losses.

In the process of determining the second comparison result, which characterizes the degree of coincidence of bounding box coordinates, the first intersection-over-union loss is calculated based on the actual bounding box with serial number i and the first predicted bounding box with serial number i, and the second intersection-over-union losses are calculated based on the actual bounding box with serial number i and the first predicted bounding boxes with serial number k (k ≠ p), so as to determine the second comparison result (that is, the second comparison result may include the first intersection-over-union loss and the second intersection-over-union losses). The second regression loss component related to the degree of coincidence of bounding box coordinates can then be determined based on the second comparison result. Adjusting the model parameters based on the second regression loss component makes the coordinates of the actual bounding box with serial number i coincide more closely with those of the first predicted bounding box with serial number i and less closely with those of the other first predicted bounding boxes, thereby enhancing the global nature of bounding box regression learning and further improving its accuracy.
The above second regression loss component is the logarithm of a target intersection-over-union loss, where the target intersection-over-union loss is the quotient of the exponential of the first intersection-over-union loss and the sum of the exponentials of the multiple second intersection-over-union losses. Taking p = i as an example, the second regression loss component can be expressed as:

$L^{reg,2}_i = \log \frac{\exp\!\big(L_{IoU}(b^{gt}_i,\, b^{pred}_i)/\omega\big)}{\sum_{k\neq i}\exp\!\big(L_{IoU}(b^{gt}_i,\, b^{pred}_k)/\omega\big)}$

where $b^{gt}_i$ denotes the actual bounding box corresponding to the first reference bounding box with serial number i, $x_i$ denotes the first reference bounding box with serial number i, $b^{pred}_i$ denotes the first predicted bounding box corresponding to the first reference bounding box with serial number i, $L_{IoU}(b^{gt}_i, b^{pred}_i)$ denotes the first intersection-over-union loss, $x_k$ denotes the first reference bounding box with serial number k, $b^{pred}_k$ denotes the first predicted bounding box corresponding to the first reference bounding box with serial number k, $L_{IoU}(b^{gt}_i, b^{pred}_k)$ denotes a second intersection-over-union loss, $\theta_g$ denotes the model parameters of the bounding box prediction sub-model, and $\omega$ denotes the preset adjustment factor.
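The contrastive form described above could be computed as in the sketch below, which reuses the iou_loss helper from the earlier sketch. The sign convention and the placement of the adjustment factor ω as a divisor inside the exponentials are assumptions, since the original formula image is not reproduced in this text.

```python
import math

def second_regression_loss(gt_box, pred_boxes, i, negative_indices, omega: float = 1.0) -> float:
    """Log of the quotient of exp(first IoU loss) over the sum of exp(second IoU losses),
    for the i-th actual box against its own prediction and the contrast (negative) predictions.
    Assumes iou_loss(actual_box, predicted_box) from the earlier sketch is in scope."""
    first = math.exp(iou_loss(gt_box, pred_boxes[i]) / omega)
    seconds = sum(math.exp(iou_loss(gt_box, pred_boxes[k]) / omega) for k in negative_indices)
    return math.log(first / seconds)
```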
Considering that, in the target detection process, the target detection model needs to determine not only the location of the target object but also its specific category, there may, during the training of the target detection model, be first reference bounding boxes for which category recognition accuracy is low. For a first reference bounding box whose category prediction accuracy is low, the corresponding first predicted bounding box may not truly reflect the bounding box prediction accuracy of the bounding box prediction sub-model, and consequently the regression loss between the first predicted bounding box and the actual bounding box corresponding to such a first reference bounding box also cannot truly reflect that accuracy. Therefore, in order to further improve the accuracy of the bounding box regression loss value, the first predicted category corresponding to the first predicted bounding box is taken into account when determining the sub-regression loss value corresponding to the first predicted bounding box: the corresponding sub-regression loss value is considered only when the actual category corresponding to the first predicted bounding box matches the first predicted category; otherwise, only the corresponding sub-classification loss value is considered, that is, the sub-regression loss values of first reference bounding boxes whose category prediction results do not meet the preset requirements are excluded.

On this basis, the above target detection model further includes a bounding box classification sub-model, and the specific implementation of each round of model training further includes: the bounding box classification sub-model performs classification processing on the first reference bounding box or the first predicted bounding box to obtain a first category prediction result. In specific implementations, the bounding box classification sub-model performs category prediction on the above first reference bounding box or the above first predicted bounding box, and the output may be the first category prediction result, where the first category prediction result includes the predicted probabilities that the target object enclosed by the first reference bounding box or the first predicted bounding box belongs to each candidate category; the candidate category with the largest predicted probability is the first predicted category, that is, the category of the target object enclosed by the first reference bounding box or the first predicted bounding box (equivalently, the target object category of the image region within that bounding box) is predicted by the bounding box classification sub-model to be the first predicted category. In addition, in specific implementations, since the position information of the first reference bounding box and the first predicted bounding box will not deviate greatly, the image features within the first reference bounding box will not deviate greatly from those within the first predicted bounding box, and the recognition of the target object category in the image region within the bounding box is therefore not affected. On this basis, where bounding box prediction and category prediction are performed sequentially, the first predicted bounding box may be input into the bounding box classification sub-model for category prediction to obtain the corresponding first category prediction result, that is, the first predicted bounding box is first obtained based on the first reference bounding box, and category prediction is then performed on the first predicted bounding box; where bounding box prediction and category prediction are performed in parallel, the first reference bounding box may instead be input into the bounding box classification sub-model for category prediction to obtain the corresponding first category prediction result, that is, the first predicted bounding box is obtained based on the first reference bounding box while category prediction is performed on the first reference bounding box.

The iterative training of the model parameters of the above bounding box classification sub-model may follow an existing classification model training process, which will not be described again here.

The above target information further includes a category matching result between the first predicted category, represented by the first category prediction result corresponding to the first reference bounding box, and the actual category of the first reference bounding box. In determining the sub-regression loss value corresponding to each first reference bounding box: if the corresponding category matching result indicates that the first predicted category does not match the actual category, the sub-regression loss value corresponding to that first reference bounding box is zero; if the corresponding category matching result indicates that the first predicted category matches the actual category, the sub-regression loss value corresponding to that first reference bounding box is determined based on at least one of the first regression loss component corresponding to the bounding box distribution similarity and the second regression loss component corresponding to the bounding box coordinate coincidence described above.
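A minimal sketch of this gating rule is given below. The function and variable names are illustrative, and the equal weighting of the two components is an assumption: the patent only requires that at least one of the two regression loss components be used when the categories match.

```python
def sub_regression_loss(kl_component: float, contrast_component: float, categories_match: bool) -> float:
    """Per-reference-box sub-regression loss: zero when the predicted and actual categories
    do not match, otherwise a combination of the two regression loss components."""
    if not categories_match:
        return 0.0
    return kl_component + contrast_component
```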
The preset category matching constraint used to determine whether the first predicted category corresponding to the first reference bounding box matches the actual category may be related to the first category prediction result, and may specifically include a single-matching-mode constraint or a varying-matching-mode constraint. For a single-matching-mode constraint, the category matching constraint used in each round of model training remains unchanged (i.e., it is independent of the current number of training rounds); for example, for each round of model training, if the actual category is the same as the first predicted category, it is determined that the first predicted category corresponding to the first reference bounding box matches the actual category. For a varying-matching-mode constraint, the category matching constraint used in each round of model training depends on the current number of training rounds; specifically, varying-matching-mode constraints can be further divided into staged category matching constraints and gradual category matching constraints.

The staged category matching constraint may be that, when the current number of training rounds is less than a first preset number of rounds, the actual category and the first predicted category belong to the same category group, and when the current number of training rounds is greater than or equal to the first preset number of rounds, the actual category is the same as the first predicted category; that is, a staged category matching constraint can be realized based on this constraint and the category prediction result corresponding to the first reference bounding box. The gradual category matching constraint may be that the sum of a first constraint term and a second constraint term is greater than a preset probability threshold, where the first constraint term is the first predicted probability corresponding to the actual category in a category prediction probability subset, the second constraint term is the product of a preset adjustment factor and the sum of the second predicted probabilities in the category prediction probability subset other than the first predicted probability, and the preset adjustment factor gradually decreases as the current number of training rounds increases; that is, a gradual category matching constraint can be realized based on this constraint and the category prediction result corresponding to the first reference bounding box. The category prediction probability subset is determined based on the category prediction result corresponding to the first reference bounding box and includes the first predicted probability that the target object enclosed by the first predicted bounding box belongs to the actual category and the second predicted probabilities that it belongs to the non-actual categories in the target group; in other words, the category prediction probability subset includes the first predicted probability under the actual category in the target group and the second predicted probabilities under the non-actual categories in the target group (i.e., the candidate categories in the target group other than the actual category) obtained by the bounding box classification sub-model performing category prediction on the first reference bounding box or the first predicted bounding box, where the target group is the category group to which the actual category belongs. The multiple candidate categories associated with the target detection task are determined in advance, and the multiple candidate categories are divided into groups based on the semantic information of each candidate category, yielding multiple category groups.
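The staged constraint just described can be sketched as follows; the names and the string-based category representation are illustrative only.

```python
def staged_category_matches(predicted: str, actual: str, group: set, current_round: int,
                            first_preset_rounds: int) -> bool:
    """Staged category matching constraint: before the first preset number of rounds the
    predicted category only needs to fall in the actual category's group; from then on it
    must equal the actual category."""
    if current_round < first_preset_rounds:
        return predicted in group
    return predicted == actual

# Example: early rounds accept any class in the {"cat", "dog"} group, later rounds require "cat".
print(staged_category_matches("dog", "cat", {"cat", "dog"}, current_round=2, first_preset_rounds=5))   # True
print(staged_category_matches("dog", "cat", {"cat", "dog"}, current_round=8, first_preset_rounds=5))   # False
```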
Since the first reference bounding box is obtained by performing region-of-interest extraction with the preset region-of-interest extraction model, the region of the target object enclosed by the first reference bounding box may not be sufficiently accurate, which can lead, in the early stage of model training, to inaccurate category recognition for the first predicted bounding box corresponding to such a first reference bounding box. On this basis, in determining the sub-regression loss value corresponding to the first reference bounding box, reference is made to the category matching result between the first predicted category corresponding to the first reference bounding box and the actual category of the first reference bounding box, i.e., the category matching result characterizing whether the first predicted category corresponding to the first reference bounding box matches the actual category is determined based on the above preset category matching constraint.

Further, the bounding box classification sub-model may be trained in advance, or its model parameters may be trained synchronously while the model parameters of the bounding box prediction sub-model are trained, i.e., a classification loss value is determined based on the first predicted category and the actual category, and the model parameters of the bounding box classification sub-model are iteratively trained based on the classification loss value. In the case of synchronously training the model parameters of the bounding box classification sub-model, inaccurate category recognition for the first predicted bounding box corresponding to the first reference bounding box may also be caused by the low accuracy of the model parameters of the bounding box classification sub-model in the target detection model to be trained in the early stage of training. Therefore, in the early stage of model training, the requirement on category accuracy is relaxed: as long as the actual category corresponding to the first predicted bounding box and the first predicted category belong to the same category group, the corresponding sub-regression loss value is considered. In the later stage of model training, the requirement on category accuracy is tightened: the corresponding sub-regression loss value is considered only when the actual category corresponding to the first predicted bounding box is the same as the first predicted category. On this basis, the preset category matching constraint may include the varying-matching-mode constraints described above (such as the staged category matching constraint or the gradual category matching constraint).

In order to make the transition smoother between the two category matching branches under which the preset category matching constraint requires the first predicted category and the actual category to satisfy the category matching result (i.e., the first predicted category belongs to the target group, or the first predicted category is the same as the actual category), so that as the number of training rounds increases the preset category matching constraint gradually shifts from requiring the first predicted category to fall within the target group to requiring the first predicted category to be the same as the actual category, the preset category matching constraint preferably includes the gradual category matching constraint.
For the case in which the above preset category matching constraint is the gradual category matching constraint, still taking the first reference bounding box with serial number i as an example, the gradual category matching constraint can be expressed as:

$P_i(real_i) + \beta \sum_{f \in groups \setminus real_i} P_i(f) > \mu$

where $groups$ denotes the target group, $real_i$ denotes the actual category of the first reference bounding box with serial number i in the target group $groups$, $f \in groups \setminus real_i$ denotes a non-actual category in the target group, $\beta$ denotes the preset adjustment factor, $P_i(real_i)$ denotes the first predicted probability (i.e., the above first constraint term), $P_i(f)$ denotes a second predicted probability, $\beta \sum_{f \in groups \setminus real_i} P_i(f)$ denotes the above second constraint term, and $\mu$ denotes the above preset probability threshold. Specifically, the larger the left-hand side is, the closer the first predicted category is to the actual category. Since the preset adjustment factor decreases as the current number of training rounds increases, the contribution of the second constraint term gradually shrinks, so that in the later stage of model training it is mainly the first constraint term (i.e., the first predicted probability under the actual category) that decides whether the first predicted category matches the actual category; once the current number of training rounds reaches a certain number, the second constraint term becomes zero, i.e., when $P_i(real_i)$ is greater than the preset probability threshold, the bounding box classification sub-model has determined the actual category as the first predicted category.
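A sketch of checking this gradual constraint is shown below; the dictionary-based representation of the class probabilities and the names are illustrative rather than the patent's notation.

```python
def category_matches(probs: dict, actual: str, group: set, beta: float, mu: float) -> bool:
    """Gradual category matching constraint: the first predicted probability of the actual class
    plus beta times the summed probabilities of the other classes in its group must exceed mu."""
    first_term = probs[actual]
    second_term = beta * sum(probs[c] for c in group if c != actual)
    return first_term + second_term > mu

# Example: "cat" sits in a group with "dog". Early in training (beta near 1) high probability on
# "dog" still counts toward the match; later (beta near 0) it no longer does.
probs = {"cat": 0.4, "dog": 0.35, "car": 0.25}
print(category_matches(probs, "cat", {"cat", "dog"}, beta=1.0, mu=0.5))   # True
print(category_matches(probs, "cat", {"cat", "dog"}, beta=0.0, mu=0.5))   # False
```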
As for the preset adjustment factor, it decreases as the current number of training rounds increases: if the current number of training rounds is less than or equal to the target number of training rounds, the second constraint term is positively correlated with the preset adjustment factor and the preset adjustment factor is negatively correlated with the current number of training rounds; if the current number of training rounds is greater than the target number of training rounds, the second constraint term is zero, where the target number of training rounds is smaller than the total number of training rounds.

In order to ensure that the preset adjustment factor is adjusted smoothly, a linearly decreasing adjustment may be used to gradually reduce the value of the preset adjustment factor β. The process of determining the preset adjustment factor used in the current round of model training is therefore specifically as follows:

(1) For the first round of model training, the first preset value is determined as the preset adjustment factor used in the current round of model training.
The first preset value may be set according to actual needs. To simplify adjustment, the first preset value may be set to 1, i.e., the preset adjustment factor β = 1, so that in the first round of model training the gradual category matching constraint becomes:

$P_i(real_i) + \sum_{f \in groups \setminus real_i} P_i(f) > \mu$

That is, for the first round of model training, whether the first predicted category corresponding to the first reference bounding box matches the actual category is determined based on the sum of the first predicted probability and the second predicted probabilities corresponding to the target group.
(2) For rounds of model training other than the first, the preset adjustment factor used in the current round of model training is determined in a factor-decreasing manner based on the current number of training rounds, the target number of training rounds and the above first preset value.

If the preset adjustment factor corresponding to the first round of model training is β = 1, then for rounds other than the first, the gradual category matching constraint may be:

$P_i(real_i) + \beta \sum_{f \in groups \setminus real_i} P_i(f) > \mu, \quad 0 \le \beta < 1$

For rounds other than the first, the preset adjustment factor β in the above gradual category matching constraint is smaller than the first preset value, and as the number of training rounds increases, the participation of the second constraint term gradually decreases.
For example, the decreasing formula corresponding to the above factor-decreasing adjustment may be:

$\beta = \max\!\left(1 - \frac{\delta}{Z},\; 0\right)$

where $\max(\cdot,\,0)$ denotes taking the maximum of the enclosed value and 0, the first term 1 denotes the first preset value (i.e., the preset adjustment factor β used in the first round of training), δ denotes the current number of training rounds, and Z denotes the target number of training rounds. The target number of training rounds may be the total number of training rounds minus 1, or may be a specified number of training rounds smaller than the total number of training rounds, where the difference between the total number of training rounds and the specified number of training rounds is a preset number of rounds Q, with Q greater than 2; in other words, the preset adjustment factor β is set to 0 starting from a certain round in the later stage of training (not the last round), so that from round δ = Z + 1 to the last round the judgment condition used for model training is $P_i(real_i) > \mu$.

For the case in which the target number of training rounds Z is the total number of training rounds minus 1, the above decreasing formula remains $\beta = \max\!\left(1 - \frac{\delta}{Z},\, 0\right)$ with Z equal to the total number of training rounds minus 1; that is, the preset adjustment factor is set to 0 in the last round of model training, and the judgment condition used in the last round is $P_i(real_i) > \mu$. In addition, the decreasing formula shown above is only a relatively simple linearly decreasing adjustment; in practical applications, the decreasing rate of the preset adjustment factor β may be set according to actual needs, and therefore the above decreasing formula does not limit the scope of protection of this application.
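The linearly decreasing schedule can be sketched as follows; the helper name and the special-casing of the first round are illustrative.

```python
def preset_adjustment_factor(current_round: int, target_rounds: int, first_preset_value: float = 1.0) -> float:
    """Linearly decreasing preset adjustment factor: equals the first preset value in round 1,
    then decays as max(1 - round/target_rounds, 0), reaching 0 once the current round exceeds
    the target number of training rounds."""
    if current_round <= 1:
        return first_preset_value
    return max(first_preset_value - current_round / target_rounds, 0.0)

# Example: with 10 target rounds, beta decays toward 0 and is exactly 0 from round 10 onward.
for r in (1, 3, 6, 10, 12):
    print(r, preset_adjustment_factor(r, target_rounds=10))
```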
The above target detection model to be trained includes the bounding box prediction sub-model and the bounding box classification sub-model. As shown in Fig. 4b, a schematic diagram of the specific implementation principle of another target detection model training process is given, which specifically includes the steps below (a condensed code sketch follows the list):

(1) The preset region-of-interest extraction model is used in advance to perform target region extraction on the sample image data set, yielding N anchor boxes.

(2) For each round of model training, m anchor boxes are randomly sampled from the N anchor boxes as the first reference bounding boxes, and the actual bounding box corresponding to each first reference bounding box is determined.

(3) For each first reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on that first reference bounding box to obtain the first predicted bounding box; the comparison result generation module then generates a set of bounding box comparison results based on the actual bounding box and the first predicted bounding box corresponding to that first reference bounding box; the bounding box classification sub-model performs category prediction on the first predicted bounding box to obtain a category prediction result; and the category matching result is determined according to the preset category matching constraint, the actual category of the actual bounding box corresponding to the first reference bounding box, and the category prediction result of the first predicted bounding box corresponding to the first reference bounding box. If the category matching result indicates that the first predicted category and the actual category do not satisfy the preset category matching constraint, the sub-regression loss value corresponding to that first reference bounding box is zero; if the category matching result indicates that they do satisfy the constraint, the first regression loss component is determined based on the first comparison result in the set of bounding box comparison results of the first reference bounding box, the second regression loss component is determined based on the second comparison result in that set, and the sub-regression loss value corresponding to the first reference bounding box is then determined based on the first regression loss component and the second regression loss component.

The above category matching result may be taken into account either in the process of determining the bounding box regression loss value based on the sets of bounding box comparison results, or in the process of generating the set of bounding box comparison results for a given first reference bounding box. In the latter case, when the first predicted category and the actual category do not satisfy the preset category matching constraint, the corresponding set of bounding box comparison results can simply be determined to be empty or to contain preset information, without generating it from the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box, which can further improve model training efficiency. Specifically, as shown in Fig. 4b, the comparison result generation module generates the set of bounding box comparison results based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box as well as the actual category and the category prediction result corresponding to the first reference bounding box; specifically, the category matching result is determined according to the actual category of the actual bounding box corresponding to the first reference bounding box and the category prediction result of the first predicted bounding box corresponding to the first reference bounding box. If the category matching result indicates that the first predicted category and the actual category do not satisfy the preset category matching constraint, the corresponding set of bounding box comparison results is empty or contains preset information, and the sub-regression loss value determined based on it is therefore zero; if the category matching result indicates that they do satisfy the constraint, the set of bounding box comparison results is generated based on the actual bounding box and the first predicted bounding box corresponding to that first reference bounding box, and the sub-regression loss value determined based on it is obtained from the first regression loss component corresponding to the first comparison result and the second regression loss component corresponding to the second comparison result in the set.

In determining whether the sub-regression loss value corresponding to the first reference bounding box is zero, the set of bounding box comparison results may first be generated directly based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box, after which the category matching result between the first predicted category and the actual category (i.e., whether the preset category matching constraint is satisfied) is determined based on the category prediction result; if the categories do not match, the corresponding sub-regression loss value is determined to be zero, and if they match, the corresponding sub-regression loss value is determined based on the multiple comparison results in the set of bounding box comparison results. Alternatively, the category matching result between the first predicted category and the actual category may be determined first based on the category prediction result; if the categories do not match, the corresponding set of bounding box comparison results is determined to be empty or to contain preset information and the corresponding sub-regression loss value is determined to be zero, and if they match, the set of bounding box comparison results is generated based on the actual bounding box and the first predicted bounding box corresponding to the first reference bounding box, and the corresponding sub-regression loss value is determined based on the multiple comparison results in that set.

(4) The bounding box regression loss value of the target detection model to be trained is determined based on the sub-regression loss values respectively corresponding to the first reference bounding boxes; using the stochastic gradient descent method, the model parameters of the bounding box prediction sub-model are adjusted based on the bounding box regression loss value, yielding a bounding box prediction sub-model with updated parameters.

(5) If the result of this round of iterative model training satisfies the preset termination condition for iterative model training, the updated bounding box prediction sub-model is determined as the trained target detection model; if it does not, the updated bounding box prediction sub-model is determined as the target detection model to be trained in the next round of model training, until the preset termination condition for iterative model training is satisfied.
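The per-round flow of steps (2)–(4) above can be condensed into the sketch below. It reuses the helper sketches introduced earlier (box_kl_loss, second_regression_loss, category_matches), assumes each sampled anchor has already been matched to its actual bounding box and actual category, uses a single category group for simplicity, and treats predict_box and predict_probs as stand-ins for the bounding box prediction and classification sub-models.

```python
import random

def train_one_round(anchors, actual_boxes, actual_labels, predict_box, predict_probs,
                    group, m, beta, mu):
    """One round of steps (2)-(3): sample m reference boxes, gate each per-box sub-regression
    loss by the category matching result, and return the summed bounding box regression loss."""
    indices = random.sample(range(len(anchors)), m)
    preds = {i: predict_box(anchors[i]) for i in indices}
    total = 0.0
    for i in indices:
        probs = predict_probs(preds[i])              # first category prediction result
        if not category_matches(probs, actual_labels[i], group, beta, mu):
            continue                                 # mismatch: sub-regression loss value is zero
        total += box_kl_loss(actual_boxes[i], preds[i])                        # first component
        negatives = [k for k in indices if k != i]
        if negatives:
            total += second_regression_loss(actual_boxes[i], preds, i, negatives)  # second component
    return total  # step (4) would apply a stochastic gradient descent update using this value
```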
With the model training method in the embodiments of the present application, during the model training stage, the bounding box prediction sub-model predicts the first predicted bounding box based on the first reference bounding box, and then, based on the first predicted bounding box and its corresponding actual bounding box, the target detection model to be trained is driven to continuously learn the bounding box distribution, so that the predicted first predicted bounding box becomes closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization and data transferability of the trained target detection model. Moreover, the set of comparison results used to determine the bounding box regression loss value includes not only the first comparison result characterizing the bounding box distribution similarity but also the second comparison result characterizing the degree of coincidence of bounding box coordinates, and the bounding box regression loss value is obtained based on the first and second comparison results respectively corresponding to each first reference bounding box, so that the bounding box regression loss value contains both the regression loss obtained from the coarse-grained comparison dimension based on bounding box distribution similarity and the regression loss obtained from the fine-grained comparison dimension based on bounding box coordinate coincidence. This improves the accuracy of the bounding box regression loss value and thus further improves the accuracy of the model parameters updated based on that loss value.
Corresponding to the model training method described above with reference to Figs. 1 to 4b, and based on the same technical concept, an embodiment of the present application further provides a target detection method. Fig. 5 is a schematic flowchart of the target detection method provided by an embodiment of the present application. The method in Fig. 5 can be executed by an electronic device provided with a target detection apparatus, and the electronic device may be a terminal device or a designated server, where the hardware apparatus used for target detection (i.e., the electronic device provided with the target detection apparatus) and the hardware apparatus used for target detection model training (i.e., the electronic device provided with the target detection model training apparatus) may be the same or different. As shown in Fig. 5, the method includes at least the following steps:

S502: obtain a second subset of bounding boxes corresponding to the image to be detected from a second set of candidate bounding boxes, where the second subset of bounding boxes includes a third specified number of second reference bounding boxes, and the second set of candidate bounding boxes is obtained by performing target region extraction on the image to be detected using the preset region-of-interest extraction model.

The process of obtaining the third specified number of second reference bounding boxes may refer to the above process of obtaining the first specified number of first reference bounding boxes, which will not be described again here.

S504: input the above second reference bounding boxes into the target detection model for target detection, to obtain the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box, where the target detection model is trained based on the above model training method; for the specific training process of the target detection model, reference may be made to the above embodiments, which will not be described again here.
The above target detection model includes the bounding box classification sub-model and the bounding box prediction sub-model. For each second reference bounding box, during target detection, the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box to obtain the second predicted bounding box corresponding to the second reference bounding box, and the bounding box classification sub-model performs classification processing on the second reference bounding box or the second predicted bounding box to obtain the second predicted category corresponding to the second reference bounding box.

The bounding box classification sub-model performs category prediction on the above second reference bounding box or the above second predicted bounding box, and the output may be a second category prediction result, where the second category prediction result includes the predicted probabilities that the target object enclosed by the second reference bounding box or the second predicted bounding box belongs to each candidate category; the candidate category with the largest predicted probability is the second predicted category, that is, the category of the target object enclosed by the second reference bounding box or the second predicted bounding box (equivalently, the target object category of the image region within that bounding box) is predicted by the bounding box classification sub-model to be the second predicted category. In addition, in specific implementations, since the position information of the second reference bounding box and the second predicted bounding box will not deviate greatly, the image features within the second reference bounding box will not deviate greatly from those within the second predicted bounding box, and the recognition of the target object category in the image region within the bounding box is therefore not affected. On this basis, where bounding box prediction and category prediction are performed sequentially, the second predicted bounding box may be input into the bounding box classification sub-model for category prediction to obtain the corresponding second category prediction result, that is, the second predicted bounding box is first obtained based on the second reference bounding box, and category prediction is then performed on the second predicted bounding box; where bounding box prediction and category prediction are performed in parallel, the second reference bounding box may instead be input into the bounding box classification sub-model for category prediction to obtain the corresponding second category prediction result, that is, the second predicted bounding box is obtained based on the second reference bounding box while category prediction is performed on the second reference bounding box.
S506: generate a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.

Based on the second predicted bounding box and the second predicted category corresponding to each second reference bounding box, the number of target objects contained in the image to be detected and the category to which each target object belongs can be determined; for example, the image to be detected contains a cat, a dog and a pedestrian.

The above target detection model includes the bounding box prediction sub-model and the bounding box classification sub-model. As shown in Fig. 6, a schematic diagram of the specific implementation principle of a target detection process is given, which specifically includes: using the preset region-of-interest extraction model to perform target region extraction on the image to be detected, yielding P anchor boxes; randomly sampling n anchor boxes from the P anchor boxes as second reference bounding boxes; for each second reference bounding box, the bounding box prediction sub-model performing bounding box prediction based on that second reference bounding box to obtain the second predicted bounding box, and the bounding box classification sub-model performing category prediction on the second predicted bounding box to obtain the second predicted category; and generating the target detection result of the image to be detected based on the second predicted bounding box and the second predicted category corresponding to each second reference bounding box.
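A compact sketch of this inference flow follows; the sub-model interfaces (extract_rois, predict_box, predict_category) and the sampling count n are stand-ins for whatever the deployed model actually exposes.

```python
import random

def detect(image, extract_rois, predict_box, predict_category, n):
    """Inference flow of Fig. 6: extract P anchor boxes, sample n second reference boxes,
    refine each with the bounding box prediction sub-model, classify it, and collect the results."""
    anchors = extract_rois(image)                      # P anchor boxes from the ROI extraction model
    references = random.sample(anchors, min(n, len(anchors)))
    detections = []
    for ref in references:
        pred_box = predict_box(ref)                    # second predicted bounding box
        pred_category = predict_category(pred_box)     # second predicted category
        detections.append((pred_box, pred_category))
    return detections                                  # target detection result of the image
```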
The target detection model trained based on the above target detection model training method can be applied to any specific application scenario in which target detection needs to be performed on an image to be detected, where the image to be detected may be acquired by an image acquisition device deployed at a certain on-site location. Correspondingly, the target detection apparatus may belong to the image acquisition device, and may specifically be an image processing apparatus in the image acquisition device, which receives the image to be detected transmitted by the image acquisition unit of the image acquisition device and performs target detection on it; alternatively, the target detection apparatus may be a separate target detection device independent of the image acquisition device, which receives the image to be detected from the image acquisition device and performs target detection on it.

As a specific application scenario of target detection, for example, the image to be detected may be acquired by an image acquisition device deployed at the entrance of a public place (such as a shopping mall entrance, a subway entrance, a scenic spot entrance, or a performance venue entrance). Correspondingly, the target object to be detected in the image is a target user entering the public place. The above target detection model is used to perform target detection on the image to be detected, so as to delineate in it the second predicted bounding box containing the target user entering the public place and to determine the second predicted category corresponding to the second predicted bounding box (i.e., the category to which the target user contained in the second predicted bounding box belongs, such as at least one of age group, gender, height and occupation), yielding the target detection result of the image to be detected. A user group recognition result (such as the flow of people entering the public place, or the attributes of the user group entering the public place) is then determined based on the target detection result, and corresponding business processing is performed based on the user group recognition result (such as automatically triggering an admission restriction prompt, or pushing information to target users). The higher the accuracy of the model parameters of the above target detection model, the higher the accuracy of the target detection result output by the target detection model for the image to be detected, and therefore the higher the accuracy of the business processing triggered based on that result.

As another example, the image to be detected may be acquired by image acquisition devices deployed at monitoring points in a breeding base. Correspondingly, the target object to be detected in the image is a target breeding object at the breeding monitoring point. The above target detection model is used to perform target detection on the image to be detected, so as to delineate in it the second predicted bounding box containing the target breeding object and to determine the second predicted category corresponding to the second predicted bounding box (i.e., the category to which the target breeding object contained in the second predicted bounding box belongs, such as at least one of living status and body size), yielding the target detection result of the image to be detected. A breeding object group recognition result (such as the survival rate or the growth rate of the target breeding objects at the breeding monitoring point) is then determined based on the target detection result, and corresponding management and control operations are performed based on the breeding object group recognition result (for example, automatically issuing an alarm if a decrease in survival rate is detected, or automatically increasing the feeding amount or feeding frequency if a slowdown in growth rate is detected). The higher the accuracy of the model parameters of the above target detection model, the higher the accuracy of the target detection result output by the target detection model for the image to be detected, and therefore the higher the accuracy of the management and control operations triggered based on that result.
In the target detection method of the embodiments of the present application, during the target detection process, a preset region-of-interest extraction model is first used to extract multiple candidate bounding boxes, and a third specified number of candidate bounding boxes are then randomly sampled from the candidate bounding boxes as second reference bounding boxes. For each second reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. Then, the target detection result of the image to be detected is generated based on the second predicted bounding box and the second predicted category corresponding to each second reference bounding box. In the model training stage, the bounding box prediction sub-model predicts a first predicted bounding box based on a first reference bounding box, and the first predicted bounding box and its corresponding actual bounding box then drive the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box becomes closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization, and data transferability of the trained target detection model. Moreover, the set of comparison results used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of similarity between bounding box distributions, but also a second comparison result characterizing the degree of coincidence between bounding box coordinates, and the bounding box regression loss value is obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box. The bounding box regression loss value therefore contains both a regression loss obtained from the coarse-grained comparison dimension based on the similarity of bounding box distributions and a regression loss obtained from the fine-grained comparison dimension based on the coincidence of bounding box coordinates, which improves the accuracy of the bounding box regression loss value and further improves the accuracy of the model parameters updated based on that bounding box regression loss value.
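As an illustrative aid only, the following Python sketch outlines the inference flow just described; the names `roi_extraction_model`, `bbox_prediction_submodel`, `classification_submodel`, and `sample_size` are hypothetical placeholders introduced here, not identifiers defined by this application, and the sketch omits the pre-processing and post-processing a real detector would need.

```python
import random

def detect_targets(image, roi_extraction_model, bbox_prediction_submodel,
                   classification_submodel, sample_size):
    # Extract candidate bounding boxes with the preset region-of-interest extraction model.
    candidate_boxes = roi_extraction_model(image)

    # Randomly sample the third specified number of candidates as second reference bounding boxes.
    reference_boxes = random.sample(candidate_boxes, k=min(sample_size, len(candidate_boxes)))

    results = []
    for ref_box in reference_boxes:
        pred_box = bbox_prediction_submodel(image, ref_box)        # second predicted bounding box
        pred_category = classification_submodel(image, pred_box)   # second predicted category
        results.append((pred_box, pred_category))

    # The collected pairs form the target detection result of the image to be detected.
    return results
```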
It should be noted that this embodiment of the present application and the previous embodiment of the present application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, reference may be made to the implementation of the aforementioned model training method, and repeated details are not described again.
Corresponding to the model training method described in FIG. 1 to FIG. 4b, and based on the same technical concept, an embodiment of the present application further provides a model training apparatus. FIG. 7 is a schematic diagram of the module composition of the model training apparatus provided by the embodiment of the present application, and the apparatus is configured to perform the model training method described in FIG. 1 to FIG. 4b. As shown in FIG. 7, the apparatus includes: a bounding box acquisition module 702, configured to acquire a first bounding box subset from a first candidate bounding box set, and to acquire the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, where the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set using a preset region-of-interest extraction model; and a model training module 704, configured to input the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative model training until the result of the current iteration satisfies a preset termination condition for iterative model training, to obtain a trained target detection model.
The target detection model includes a bounding box prediction sub-model. Each round of model training is specifically implemented as follows. For each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box; a set of bounding box comparison results is generated based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box, where the set of bounding box comparison results includes a first comparison result characterizing the degree of similarity between bounding box distributions and a second comparison result characterizing the degree of coincidence between bounding box coordinates. A bounding box regression loss value is then determined based on the first comparison results and the second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset, and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
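Purely as a hedged illustration of how the two comparison results could be combined into a single regression loss, the sketch below models each box coordinate as a Gaussian with a fixed standard deviation for the distribution comparison and uses a standard IoU loss for the coordinate comparison; the Gaussian assumption, the equal weighting, and every function name are assumptions made here for readability and are not specified by this application.

```python
import math

def kl_divergence_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    # KL(p || q) between two univariate Gaussians; one possible way to compare the
    # probability distributions of an actual and a predicted bounding box coordinate.
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2) - 0.5)

def iou_loss(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); the IoU loss characterizes coordinate coincidence.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter + 1e-9)
    return 1.0 - iou

def bbox_regression_loss(pairs, sigma=1.0, w_coarse=1.0, w_fine=1.0):
    # pairs: list of (actual_box, predicted_box) for the first reference bounding boxes.
    total = 0.0
    for actual, predicted in pairs:
        first_cmp = sum(kl_divergence_gaussian(a, sigma, p, sigma)   # distribution similarity
                        for a, p in zip(actual, predicted))
        second_cmp = iou_loss(actual, predicted)                     # coordinate coincidence
        total += w_coarse * first_cmp + w_fine * second_cmp
    return total / max(len(pairs), 1)
```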
In the model training apparatus of the embodiments of the present application, during the model training stage, the bounding box prediction sub-model predicts a first predicted bounding box based on a first reference bounding box, and the first predicted bounding box and its corresponding actual bounding box then drive the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box becomes closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization, and data transferability of the trained target detection model. Moreover, the set of comparison results used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of similarity between bounding box distributions, but also a second comparison result characterizing the degree of coincidence between bounding box coordinates, and the bounding box regression loss value is obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box. The bounding box regression loss value therefore contains both a regression loss obtained from the coarse-grained comparison dimension based on the similarity of bounding box distributions and a regression loss obtained from the fine-grained comparison dimension based on the coincidence of bounding box coordinates, which improves the accuracy of the bounding box regression loss value and further improves the accuracy of the model parameters updated based on that bounding box regression loss value.
It should be noted that the embodiments of the model training apparatus in the present application and the embodiments of the model training method in the present application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, reference may be made to the implementation of the corresponding model training method described above, and repeated details are not described again.
Corresponding to the target detection method described in FIG. 5 to FIG. 6, and based on the same technical concept, an embodiment of the present application further provides a target detection apparatus. FIG. 8 is a schematic diagram of the module composition of the target detection apparatus provided by the embodiment of the present application, and the apparatus is configured to perform the target detection method described in FIG. 5 to FIG. 6. As shown in FIG. 8, the apparatus includes: a bounding box acquisition module 802, configured to acquire, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, where the second bounding box subset includes a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected using a preset region-of-interest extraction model; a target detection module 804, configured to input the second reference bounding boxes into a target detection model for target detection, to obtain a second predicted bounding box and a second category prediction result corresponding to each second reference bounding box; and a detection result generation module 806, configured to generate a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
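A minimal structural sketch of how the three modules of FIG. 8 could be composed is given below; the class name, attribute names, and the use of random sampling for acquisition are assumptions made here for illustration rather than the apparatus's actual implementation.

```python
# Structural sketch only: one way the modules of FIG. 8 might be composed in code.
import random

class TargetDetectionApparatus:
    def __init__(self, roi_extraction_model, detection_model, third_specified_number):
        self.roi_extraction_model = roi_extraction_model      # preset ROI extraction model
        self.detection_model = detection_model                # returns (pred_box, category) per box
        self.third_specified_number = third_specified_number

    def acquire_bounding_boxes(self, image):                   # bounding box acquisition module 802
        candidates = self.roi_extraction_model(image)
        return random.sample(candidates, k=min(self.third_specified_number, len(candidates)))

    def detect(self, image, reference_boxes):                  # target detection module 804
        return [self.detection_model(image, box) for box in reference_boxes]

    def generate_result(self, predictions):                    # detection result generation module 806
        return [(pred_box, category) for pred_box, category in predictions]

    def run(self, image):
        refs = self.acquire_bounding_boxes(image)
        return self.generate_result(self.detect(image, refs))
```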
In the target detection apparatus of the embodiments of the present application, during the target detection process, a preset region-of-interest extraction model is first used to extract multiple candidate bounding boxes, and a third specified number of candidate bounding boxes are then randomly sampled from the candidate bounding boxes as second reference bounding boxes. For each second reference bounding box, the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box to obtain a second predicted bounding box, and the classification sub-model performs category prediction on the second predicted bounding box to obtain a second predicted category. Then, the target detection result of the image to be detected is generated based on the second predicted bounding box and the second predicted category corresponding to each second reference bounding box. In the model training stage, the bounding box prediction sub-model predicts a first predicted bounding box based on a first reference bounding box, and the first predicted bounding box and its corresponding actual bounding box then drive the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box becomes closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization, and data transferability of the trained target detection model. Moreover, the set of comparison results used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of similarity between bounding box distributions, but also a second comparison result characterizing the degree of coincidence between bounding box coordinates, and the bounding box regression loss value is obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box. The bounding box regression loss value therefore contains both a regression loss obtained from the coarse-grained comparison dimension based on the similarity of bounding box distributions and a regression loss obtained from the fine-grained comparison dimension based on the coincidence of bounding box coordinates, which improves the accuracy of the bounding box regression loss value and further improves the accuracy of the model parameters updated based on that bounding box regression loss value.
It should be noted that the embodiments of the target detection apparatus in the present application and the embodiments of the target detection method in the present application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, reference may be made to the implementation of the corresponding target detection method described above, and repeated details are not described again.
Corresponding to the methods shown in FIG. 1 to FIG. 4b, and based on the same technical concept, an embodiment of the present application further provides a computer device, which is configured to perform the above model training method or target detection method, as shown in FIG. 9.
Computer devices may vary greatly in configuration or performance, and may include one or more processors 901 and a memory 902, where the memory 902 may store one or more application programs or data. The memory 902 may be a transient storage or a persistent storage. The application program stored in the memory 902 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the computer device. Furthermore, the processor 901 may be configured to communicate with the memory 902 and to execute, on the computer device, the series of computer-executable instructions in the memory 902. The computer device may also include one or more power supplies 903, one or more wired or wireless network interfaces 904, one or more input/output interfaces 905, one or more keyboards 906, and the like.
The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer-executable instructions for the computer device, and the one or more programs, configured to be executed by the one or more processors, include computer-executable instructions for: acquiring a first bounding box subset from a first candidate bounding box set, and acquiring the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, where the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set using a preset region-of-interest extraction model; and inputting the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative model training until the result of the current iteration satisfies a preset termination condition for iterative model training, to obtain a trained target detection model.
The target detection model includes a bounding box prediction sub-model. Each round of model training is specifically implemented as follows. For each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box; a set of bounding box comparison results is generated based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box, where the set of bounding box comparison results includes a first comparison result characterizing the degree of similarity between bounding box distributions and a second comparison result characterizing the degree of coincidence between bounding box coordinates. A bounding box regression loss value is then determined based on the first comparison results and the second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset, and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer-executable instructions for the computer device, and the one or more programs, configured to be executed by the one or more processors, include computer-executable instructions for: acquiring, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, where the second bounding box subset includes a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected using a preset region-of-interest extraction model; inputting the second reference bounding boxes into a target detection model for target detection, to obtain a second predicted bounding box and a second category prediction result corresponding to each second reference bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
In the computer device of the embodiments of the present application, during the model training stage, the bounding box prediction sub-model predicts a first predicted bounding box based on a first reference bounding box, and the first predicted bounding box and its corresponding actual bounding box then drive the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box becomes closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization, and data transferability of the trained target detection model. Moreover, the set of comparison results used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of similarity between bounding box distributions, but also a second comparison result characterizing the degree of coincidence between bounding box coordinates, and the bounding box regression loss value is obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box. The bounding box regression loss value therefore contains both a regression loss obtained from the coarse-grained comparison dimension based on the similarity of bounding box distributions and a regression loss obtained from the fine-grained comparison dimension based on the coincidence of bounding box coordinates, which improves the accuracy of the bounding box regression loss value and further improves the accuracy of the model parameters updated based on that bounding box regression loss value.
It should be noted that the embodiments of the computer device in the present application and the embodiments of the model training method in the present application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, reference may be made to the implementation of the corresponding model training method described above, and repeated details are not described again.
Further, corresponding to the methods shown in FIG. 1 to FIG. 4b, and based on the same technical concept, an embodiment of the present application further provides a storage medium for storing computer-executable instructions. The storage medium may be a USB flash drive, an optical disc, a hard disk, or the like. When the computer-executable instructions stored in the storage medium are executed by a processor, the following process can be implemented: acquiring a first bounding box subset from a first candidate bounding box set, and acquiring the actual bounding boxes respectively corresponding to the first reference bounding boxes in the first bounding box subset, where the first bounding box subset includes a first specified number of first reference bounding boxes, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set using a preset region-of-interest extraction model; and inputting the first reference bounding boxes and the actual bounding boxes into a target detection model to be trained for iterative model training until the result of the current iteration satisfies a preset termination condition for iterative model training, to obtain a trained target detection model.
The target detection model includes a bounding box prediction sub-model. Each round of model training is specifically implemented as follows. For each first reference bounding box: the bounding box prediction sub-model performs bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box; a set of bounding box comparison results is generated based on the actual bounding box corresponding to the first reference bounding box and the first predicted bounding box corresponding to the first reference bounding box, where the set of bounding box comparison results includes a first comparison result characterizing the degree of similarity between bounding box distributions and a second comparison result characterizing the degree of coincidence between bounding box coordinates. A bounding box regression loss value is then determined based on the first comparison results and the second comparison results respectively corresponding to the first reference bounding boxes in the first bounding box subset, and the parameters of the bounding box prediction sub-model are updated based on the bounding box regression loss value.
The storage medium may be a USB flash drive, an optical disc, a hard disk, or the like. When the computer-executable instructions stored in the storage medium are executed by a processor, the following process can be implemented: acquiring, from a second candidate bounding box set, a second bounding box subset corresponding to an image to be detected, where the second bounding box subset includes a third specified number of second reference bounding boxes, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected using a preset region-of-interest extraction model; inputting the second reference bounding boxes into a target detection model for target detection, to obtain a second predicted bounding box and a second category prediction result corresponding to each second reference bounding box; and generating a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result corresponding to each second reference bounding box.
When the computer-executable instructions stored in the storage medium of the embodiments of the present application are executed by a processor, during the model training stage, the bounding box prediction sub-model predicts a first predicted bounding box based on a first reference bounding box, and the first predicted bounding box and its corresponding actual bounding box then drive the target detection model to be trained to continuously learn the bounding box distribution, so that the predicted first predicted bounding box becomes closer to the corresponding actual bounding box, thereby improving the bounding box prediction accuracy, model generalization, and data transferability of the trained target detection model. Moreover, the set of comparison results used to determine the bounding box regression loss value includes not only a first comparison result characterizing the degree of similarity between bounding box distributions, but also a second comparison result characterizing the degree of coincidence between bounding box coordinates, and the bounding box regression loss value is obtained based on the first comparison result and the second comparison result corresponding to each first reference bounding box. The bounding box regression loss value therefore contains both a regression loss obtained from the coarse-grained comparison dimension based on the similarity of bounding box distributions and a regression loss obtained from the fine-grained comparison dimension based on the coincidence of bounding box coordinates, which improves the accuracy of the bounding box regression loss value and further improves the accuracy of the model parameters updated based on that bounding box regression loss value.
It should be noted that the embodiments of the storage medium in the present application and the embodiments of the model training method in the present application are based on the same inventive concept. Therefore, for the specific implementation of this embodiment, reference may be made to the implementation of the corresponding model training method described above, and repeated details are not described again.
In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous. Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-readable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code. The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram. These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and a memory. The memory may include a non-persistent memory, a random access memory (RAM), and/or a non-volatile memory among computer-readable media, such as a read-only memory (ROM) or a flash memory (flash RAM). The memory is an example of a computer-readable medium. Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, a phase-change memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), other types of random access memories (RAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or other memory technologies, a compact disc read-only memory (CD-ROM), a digital versatile disc (DVD) or other optical storage, a magnetic cassette, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by the computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element. The embodiments of the present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of the present application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media including storage devices. The embodiments of the present application are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively simply, and for relevant parts, reference may be made to the partial description of the method embodiment. The above descriptions are merely embodiments of this document and are not intended to limit this document. For those skilled in the art, this document may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this document shall be included within the scope of the claims of this document.

Claims (14)

  1. A model training method, the method comprising:
    acquiring a first bounding box subset from a first candidate bounding box set, and acquiring an actual bounding box of a first reference bounding box in the first bounding box subset, wherein the first bounding box subset comprises the first reference bounding box, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set using a preset region-of-interest extraction model;
    inputting the first reference bounding box and the actual bounding box into a target detection model to be trained for iterative model training until a result of the iterative model training satisfies a preset termination condition for the iterative model training, to obtain a trained target detection model, wherein the target detection model comprises a bounding box prediction sub-model, and each round of model training in the iterative model training comprises:
    performing, by the bounding box prediction sub-model, bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box, and generating a set of bounding box comparison results based on the actual bounding box of the first reference bounding box and the first predicted bounding box of the first reference bounding box, wherein the set of bounding box comparison results comprises a first comparison result characterizing a degree of similarity between bounding box distributions and a second comparison result characterizing a degree of coincidence between bounding box coordinates;
    determining a bounding box regression loss value based on the first comparison result and the second comparison result; and
    updating parameters of the bounding box prediction sub-model based on the bounding box regression loss value.
  2. The method according to claim 1, wherein the generating a set of bounding box comparison results based on the actual bounding box of the first reference bounding box and the first predicted bounding box of the first reference bounding box comprises:
    calculating a relative entropy (KL divergence) based on the actual bounding box and the first predicted bounding box of the first reference bounding box to obtain the first comparison result; and calculating a bounding box intersection-over-union loss based on the actual bounding box and the first predicted bounding box of the first reference bounding box to obtain the second comparison result.
  3. The method according to claim 2, wherein the determining a bounding box regression loss value based on the first comparison result and the second comparison result comprises:
    determining a sub-regression loss value of the first reference bounding box, wherein the sub-regression loss value is determined based on target information, and the target information comprises one or a combination of the following: the degree of similarity between bounding box distributions characterized by the first comparison result, and the degree of coincidence between bounding box coordinates characterized by the second comparison result; and
    determining the bounding box regression loss value based on the sub-regression loss value.
  4. The method according to claim 2, wherein the calculating a relative entropy (KL divergence) based on the actual bounding box and the first predicted bounding box of the first reference bounding box to obtain the first comparison result comprises:
    determining a first probability distribution of the actual bounding box, and determining a second probability distribution of the first predicted bounding box;
    calculating a KL divergence value between the first probability distribution and the second probability distribution, wherein the KL divergence value is used to characterize the degree of distribution similarity between the first predicted bounding box and the actual bounding box; and
    determining the first comparison result based on the KL divergence value.
  5. The method according to claim 2, wherein the calculating a bounding box intersection-over-union loss based on the actual bounding box and the first predicted bounding box of the first reference bounding box to obtain the second comparison result comprises:
    performing a bounding box intersection-over-union loss calculation on the actual bounding box and the first predicted bounding box of the first reference bounding box to obtain a first intersection-over-union loss; and
    determining the second comparison result of the first reference bounding box based on the first intersection-over-union loss, wherein the bounding box intersection-over-union loss is used to characterize the degree of coincidence between bounding box coordinates.
  6. The method according to claim 5, wherein the determining the second comparison result of the first reference bounding box based on the first intersection-over-union loss comprises:
    determining, among the first predicted bounding boxes of the first reference bounding boxes, multiple negative samples of the first reference bounding box with sequence number i, wherein the multiple negative samples comprise the first predicted bounding box with sequence number k, k is not equal to i, or k is not equal to i and k is not equal to j, wherein k, i, and j are all integers greater than zero, and the first predicted bounding box with sequence number j delineates the same target object as the first reference bounding box with sequence number i;
    performing bounding box intersection-over-union loss calculations on the actual bounding box corresponding to the first reference bounding box with sequence number i and each of the multiple negative samples, respectively, to obtain second intersection-over-union losses; and
    determining the second comparison result corresponding to the first reference bounding box based on the first intersection-over-union loss and the second intersection-over-union losses.
  7. The method according to claim 3, wherein the target detection model further comprises a bounding box classification sub-model, and each round of model training further comprises: performing, by the bounding box classification sub-model, classification processing on the first reference bounding box or the first predicted bounding box to obtain a first category prediction result; and
    the target information further comprises a category matching result between the first predicted category characterized by the first category prediction result and the actual category of the first reference bounding box, wherein if the category matching result is that the first predicted category does not match the actual category, the sub-regression loss value is zero; and if the category matching result is that the first predicted category matches the actual category, the sub-regression loss value is determined based on at least one of a first regression loss component corresponding to the degree of similarity between bounding box distributions and a second regression loss component corresponding to the degree of coincidence between bounding box coordinates.
  8. The method according to claim 1, wherein the method further comprises:
    inputting the sample image data set into the preset region-of-interest extraction model for region-of-interest extraction to obtain the first candidate bounding box set, wherein the first candidate bounding box set comprises a second specified number of candidate bounding boxes, the first reference bounding boxes are of a first specified number, and the second specified number is greater than the first specified number; and
    the acquiring a first bounding box subset from a first candidate bounding box set comprises: selecting, from the second specified number of candidate bounding boxes, the first specified number of candidate bounding boxes as the first reference bounding boxes, to obtain the first bounding box subset.
  9. A target detection method, the method comprising:
    acquiring a second bounding box subset of an image to be detected from a second candidate bounding box set, wherein the second bounding box subset comprises a second reference bounding box, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected using a preset region-of-interest extraction model;
    inputting the second reference bounding box into a target detection model for target detection, to obtain a second predicted bounding box and a second category prediction result of each second reference bounding box; and
    generating a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result.
  10. The method according to claim 9, wherein the target detection model comprises a bounding box prediction sub-model and a bounding box classification sub-model; and
    during the target detection, the bounding box prediction sub-model performs bounding box prediction based on the second reference bounding box to obtain the second predicted bounding box, and the bounding box classification sub-model performs classification processing on the second reference bounding box or the second predicted bounding box to obtain the second category prediction result of the second reference bounding box.
  11. A model training apparatus, the apparatus comprising:
    a bounding box acquisition module, configured to acquire a first bounding box subset from a first candidate bounding box set, and to acquire an actual bounding box of a first reference bounding box in the first bounding box subset, wherein the first bounding box subset comprises the first reference bounding box, and the first candidate bounding box set is obtained by performing target region extraction on a sample image data set using a preset region-of-interest extraction model; and
    a model training module, configured to input the first reference bounding box and the actual bounding box into a target detection model to be trained for iterative model training until a result of the iterative model training satisfies a preset termination condition for the iterative model training, to obtain a trained target detection model, wherein the target detection model comprises a bounding box prediction sub-model, and each round of model training in the iterative model training comprises:
    performing, by the bounding box prediction sub-model, bounding box prediction based on the first reference bounding box to obtain a first predicted bounding box; generating a set of bounding box comparison results based on the actual bounding box of the first reference bounding box and the first predicted bounding box of the first reference bounding box, wherein the set of bounding box comparison results comprises a first comparison result characterizing a degree of similarity between bounding box distributions and a second comparison result characterizing a degree of coincidence between bounding box coordinates; determining a bounding box regression loss value based on the first comparison result and the second comparison result; and updating parameters of the bounding box prediction sub-model based on the bounding box regression loss value.
  12. A target detection apparatus, the apparatus comprising:
    a bounding box acquisition module, configured to acquire a second bounding box subset of an image to be detected from a second candidate bounding box set, wherein the second bounding box subset comprises a second reference bounding box, and the second candidate bounding box set is obtained by performing target region extraction on the image to be detected using a preset region-of-interest extraction model;
    a target detection module, configured to input the second reference bounding box into a target detection model for target detection, to obtain a second predicted bounding box and a second category prediction result of each second reference bounding box; and
    a detection result generation module, configured to generate a target detection result of the image to be detected based on the second predicted bounding box and the second category prediction result.
  13. A computer device, the device comprising:
    a processor; and
    a memory arranged to store computer-executable instructions, the executable instructions being configured to be executed by the processor, wherein the executable instructions comprise instructions for performing the steps in the method according to any one of claims 1 to 8 or any one of claims 9 to 10.
  14. A storage medium, the storage medium being configured to store computer-executable instructions, wherein the executable instructions cause a computer to perform the method according to any one of claims 1 to 8 or any one of claims 9 to 10.
PCT/CN2023/102175 2022-07-15 2023-06-25 Model training method, target detection method and apparatuses WO2024012179A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210829603.7A CN117437394A (en) 2022-07-15 2022-07-15 Model training method, target detection method and device
CN202210829603.7 2022-07-15

Publications (1)

Publication Number Publication Date
WO2024012179A1 true WO2024012179A1 (en) 2024-01-18

Family

ID=89535495

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/102175 WO2024012179A1 (en) 2022-07-15 2023-06-25 Model training method, target detection method and apparatuses

Country Status (2)

Country Link
CN (1) CN117437394A (en)
WO (1) WO2024012179A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN111652216A (en) * 2020-06-03 2020-09-11 北京工商大学 Multi-scale target detection model method based on metric learning
CN112329873A (en) * 2020-11-12 2021-02-05 苏州挚途科技有限公司 Training method of target detection model, target detection method and device
CN113780270A (en) * 2021-03-23 2021-12-10 京东鲲鹏(江苏)科技有限公司 Target detection method and device
US20220004935A1 (en) * 2021-09-22 2022-01-06 Intel Corporation Ensemble learning for deep feature defect detection
CN114708185A (en) * 2021-10-28 2022-07-05 中国科学院自动化研究所 Target detection method, system and equipment based on big data enabling and model flow
CN114708462A (en) * 2022-04-29 2022-07-05 人民中科(北京)智能技术有限公司 Method, system, device and storage medium for generating detection model for multi-data training

Also Published As

Publication number Publication date
CN117437394A (en) 2024-01-23

Similar Documents

Publication Publication Date Title
US11468262B2 (en) Deep network embedding with adversarial regularization
EP3467723B1 (en) Machine learning based network model construction method and apparatus
US20180260531A1 (en) Training random decision trees for sensor data processing
CN110414550B (en) Training method, device and system of face recognition model and computer readable medium
JPWO2014136316A1 (en) Information processing apparatus, information processing method, and program
CN114549894A (en) Small sample image increment classification method and device based on embedded enhancement and self-adaptation
WO2010043954A1 (en) Method, apparatus and computer program product for providing pattern detection with unknown noise levels
CN111178533B (en) Method and device for realizing automatic semi-supervised machine learning
CN112446888A (en) Processing method and processing device for image segmentation model
CN112150497A (en) Local activation method and system based on binary neural network
KR20220116110A (en) Method for determining a confidence level of inference data produced by artificial neural network
CN114997287A (en) Model training and data processing method, device, equipment and storage medium
CN113821657A (en) Artificial intelligence-based image processing model training method and image processing method
WO2024012179A1 (en) Model training method, target detection method and apparatuses
CN116561319A (en) Text clustering method, text clustering device and text clustering system
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
WO2024012138A1 (en) Target detection model training method and apparatus, and target detection method and apparatus
CN115601629A (en) Model training method, image recognition method, medium, device and computing equipment
EP3971782A2 (en) Neural network selection
WO2024012217A1 (en) Model training method and device, and target detection method and device
CN117010480A (en) Model training method, device, equipment, storage medium and program product
CN111708745B (en) Cross-media data sharing representation method and user behavior analysis method and system
CN117437396A (en) Target detection model training method, target detection method and target detection device
US20240104915A1 (en) Long duration structured video action segmentation
CN111507128A (en) Face recognition method and device, electronic equipment and readable medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23838673

Country of ref document: EP

Kind code of ref document: A1