CN110910375A - Detection model training method, device, equipment and medium based on semi-supervised learning - Google Patents

Detection model training method, device, equipment and medium based on semi-supervised learning

Info

Publication number
CN110910375A
CN110910375A
Authority
CN
China
Prior art keywords
frame
amplification
detection model
identification area
labeling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911179177.1A
Other languages
Chinese (zh)
Inventor
徐菊婷
蒋伟伟
钟浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN201911179177.1A
Publication of CN110910375A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a detection model training method, device, equipment and medium based on semi-supervised learning, relating to the technical field of image processing. The method obtains the amplification labeling frames of a plurality of other identification areas by applying preset amplification rules to the existing labeling frame of the first identification area of a detection target in a sample image, and performs model training according to the existing labeling frame of the first identification area and the amplification labeling frames of the other identification areas to obtain a detection model. By deriving a plurality of amplification labeling frames from the existing labeling frames through preset amplification rules, the method effectively increases the amplification speed of training samples, improves the training efficiency of the detection model, and at the same time effectively enhances its generalization capability. In addition, an improved loss function is adopted to optimize the trained detection model, so that the detection accuracy of the detection model is higher.

Description

Detection model training method, device, equipment and medium based on semi-supervised learning
Technical Field
The invention relates to the technical field of image processing, and in particular to a detection model training method, device, equipment and medium based on semi-supervised learning.
Background
In recent years, with the continuous updating and development of image processing technology, deep learning has been applied ever more widely in computer vision and has rapidly achieved major breakthroughs in fields such as target detection, image classification, segmentation and image generation. Among these, target detection is the basis of many computer vision algorithms and draws the attention of many researchers. Target detection algorithms based on deep learning achieve better results than traditional target detection methods. However, training a deep learning model has an important precondition: a large amount of labeled training data is required. As labor costs rise, improving the efficiency and utilization of manually labeled data becomes increasingly important.
In the prior art, when training samples are obtained, a rectangular frame needs to be labeled on the target in each sample. When a new scene is encountered or the detection target is replaced, a new batch of training samples corresponding to the new detection target needs to be added, and target labeling must be performed on each newly added training sample.
Because a large number of targets need to be labeled continuously, the operation cost is relatively high and algorithm iteration is slow, which reduces the model training speed.
Disclosure of Invention
The invention aims to provide a detection model training method, device, equipment and medium based on semi-supervised learning that overcome the above defects in the prior art, so as to solve the problem that the large amount of frame-labeled sample data and the slow labeling operation result in low model training efficiency.
In order to achieve the above purpose, the technical solutions adopted in the embodiments of the present application are as follows:
in a first aspect, an embodiment of the present application provides a detection model training method based on semi-supervised learning, including:
according to the existing marking frame of the first identification area of the detection target in the sample image, respectively adopting at least one preset amplification rule to obtain an amplification marking frame of at least one second identification area of the detection target in the sample image; each preset amplification rule corresponds to one second identification region;
performing model training according to the sample image with the existing labeling frame and the amplification labeling frame to obtain a detection model; the detection model is used for detecting the first identification area and at least one second identification area of the detection target in the image to be detected.
Optionally, the obtaining, according to an existing labeling frame of a first identification region of a detection target in a sample image, an amplification labeling frame of at least one second identification region of the detection target in the sample image by respectively using at least one preset amplification rule includes:
obtaining the coordinates of the amplification marking frame of the second identification area according to the coordinates of the existing marking frame of the first identification area and a preset amplification proportion corresponding to the preset amplification rule;
and labeling the sample image according to the coordinates of the amplification labeling frame of at least one second identification area to obtain the amplification labeling frame of at least one second identification area.
Optionally, the method further comprises:
predicting the sample image by adopting the detection model to obtain a first prediction frame of the first identification area and a second prediction frame of at least one second identification area;
determining a loss function value of the detection model according to the first prediction frame, at least one second prediction frame, the existing labeling frame and at least one amplification labeling frame;
and updating the parameters of the detection model according to the loss function value until the updated loss function value of the detection model is minimum.
Optionally, the determining a loss function value of the detection model according to the first prediction box, the at least one second prediction box, the existing labeling box, and the at least one amplification labeling box includes:
determining a loss function value corresponding to the first identification area according to the first prediction frame and the existing marking frame;
determining a loss function value corresponding to one of said second identified regions based on one of said second prediction boxes and said corresponding amplification labeling box;
and determining the loss function value of the detection model according to the loss function value corresponding to the first identification area and the loss function value corresponding to at least one second identification area.
Optionally, each pixel point in the existing labeling frame corresponds to at least one first anchor point frame;
the determining a loss function value corresponding to the first identification area according to the first prediction box and the existing labeling box includes:
respectively converting the coordinates of the first prediction frame and the existing labeling frame according to the coordinates of at least one first anchor point frame;
and determining a loss function value corresponding to the first identification area according to the transformed coordinates of the first prediction frame and the coordinates of the existing labeling frame.
Optionally, each pixel point in the amplification labeling frame corresponds to at least one second anchor point frame;
said determining a loss function value for said second identified region based on said one of said second prediction boxes and said corresponding amplification tagging box, comprising:
respectively converting the coordinates of the second prediction frame and the corresponding amplification labeling frame according to the coordinates of at least one second anchor point frame;
and determining a loss function value corresponding to the second identification area according to the transformed coordinates of the second prediction frame and the transformed coordinates of the amplification labeling frame.
In a second aspect, an embodiment of the present application further provides a detection model training apparatus based on semi-supervised learning, including: an amplification module and a training module;
the amplification module is used for obtaining an amplification labeling frame of at least one second identification area of the detection target in the sample image by respectively adopting at least one preset amplification rule according to the existing labeling frame of the first identification area of the detection target in the sample image; each preset amplification rule corresponds to one second identification region;
the training module is used for carrying out model training according to the sample image with the existing labeling frame and the amplification labeling frame to obtain a detection model; the detection model is used for detecting the first identification area and at least one second identification area of the detection target in the image to be detected.
Optionally, the amplification module is specifically configured to obtain a coordinate of an amplification labeling frame of the second identification region according to the coordinate of an existing labeling frame of the first identification region and a preset amplification ratio corresponding to one preset amplification rule; and labeling the sample image according to the coordinates of the amplification labeling frame of at least one second identification area to obtain the amplification labeling frame of at least one second identification area.
Optionally, the apparatus further comprises: the device comprises a prediction module, a determination module and an updating module;
the prediction module is used for predicting the sample image by adopting the detection model to obtain a first prediction frame of the first identification area and a second prediction frame of at least one second identification area;
the determining module is configured to determine a loss function value of the detection model according to the first prediction box, the at least one second prediction box, the existing labeling box, and the at least one amplification labeling box;
and the updating module is used for updating the parameters of the detection model according to the loss function value until the updated loss function value of the detection model is minimum.
Optionally, the determining module is specifically configured to determine a loss function value corresponding to the first identification area according to the first prediction box and the existing labeling box; determining a loss function value corresponding to one of said second identified regions based on one of said second prediction boxes and said corresponding amplification labeling box; and determining the loss function value of the detection model according to the loss function value corresponding to the first identification area and the loss function value corresponding to at least one second identification area.
Optionally, each pixel point in the existing labeling frame corresponds to at least one first anchor point frame;
the determining module is specifically configured to convert, according to the coordinates of at least one first anchor point frame, the coordinates of the first prediction frame and the coordinates of the existing labeling frame respectively; and determining a loss function value corresponding to the first identification area according to the transformed coordinates of the first prediction frame and the coordinates of the existing labeling frame.
Optionally, each pixel point in the amplification labeling frame corresponds to at least one second anchor point frame;
the determining module is specifically configured to convert, according to the coordinates of at least one second anchor point frame, the coordinates of the second prediction frame and the corresponding amplification labeling frame respectively; and determining a loss function value corresponding to the second identification area according to the transformed coordinates of the second prediction frame and the transformed coordinates of the amplification labeling frame.
In a third aspect, an embodiment of the present application further provides a computing device, including: a processor, a storage medium and a bus, the storage medium storing program instructions executable by the processor, the processor and the storage medium communicating via the bus when the computing device is running, the processor executing the program instructions to perform the steps of the semi-supervised learning based detection model training method as described in the first aspect above.
In a fourth aspect, the present application further provides a storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the detection model training method based on semi-supervised learning as described in the first aspect above are performed.
The beneficial effect of this application is: the embodiment of the application provides a detection model training method, a detection model training device and a detection model training medium based on semi-supervised learning, wherein the method comprises the following steps: respectively adopting at least one preset amplification rule according to the existing labeling frame of the first identification area of the detection target in the sample image to obtain an amplification labeling frame of at least one second identification area of the detection target in the sample image; each preset amplification rule corresponds to a second identification area; performing model training according to the sample image with the existing labeling frame and the amplification labeling frame to obtain a detection model; the detection model is used for detecting a first identification area and at least one second identification area of a detection target in an image to be detected. The existing labeling frame of the first identification area of the detection target in the sample image is subjected to preset amplification rules to obtain other amplification labeling frames of the plurality of identification areas, and model training is carried out according to the existing labeling frame of the first identification area and the amplification labeling frames of the other identification areas to obtain a detection model. According to the method, a plurality of amplification labeling frames are obtained on the basis of the existing labeling frames by presetting amplification rules, so that the amplification speed of a training sample can be effectively increased, the training efficiency of a detection model is improved, and meanwhile, the generalization capability of the detection model is effectively enhanced. In addition, the improved loss function is adopted to optimize the detection model obtained by training, so that the detection accuracy of the detection model is higher.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a detection model training method based on semi-supervised learning according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an amplification labeling box according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present application;
Fig. 5 is a schematic view of overlapping sheep body labeling frames provided in the embodiment of the present application;
Fig. 6 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present application;
Fig. 7 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present application;
Fig. 8 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present application;
Fig. 9 is a schematic flowchart of a target detection method based on semi-supervised learning according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a detection model training apparatus based on semi-supervised learning according to an embodiment of the present application;
Fig. 11 is a schematic diagram of another detection model training apparatus based on semi-supervised learning according to an embodiment of the present application;
Fig. 12 is a schematic diagram of another detection model training apparatus based on semi-supervised learning according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention.
Fig. 1 is a schematic flowchart of a detection model training method based on semi-supervised learning according to an embodiment of the present application. The execution subject of the method can be a computer or a server with a data processing function. As shown in fig. 1, the method may include:
s101, respectively adopting at least one preset amplification rule according to the existing labeling frame of the first identification area of the detection target in the sample image to obtain an amplification labeling frame of at least one second identification area of the detection target in the sample image; each preset amplification rule corresponds to a second recognition area.
Optionally, in this embodiment, the detection target in the sample image may be of multiple types, and the shape and size of the corresponding existing labeling frame differ for different detection targets and for different identification areas of the same detection target. For example, when the detection target is a sheep, the identification area may be the sheep face, the sheep neck, the sheep body, or the like.
Optionally, taking the sheep face as an example of the first identification area, the existing labeling frame of the first identification area may be manually labeled in advance and therefore has a certain degree of confidence. To detect not only the sheep face but also other second recognition areas, such as the sheep neck or the sheep body, it is necessary to obtain in advance the labeling frame corresponding to each second recognition area, that is, its amplification labeling frame, and to train the detection model according to it.
In some embodiments, in order to improve the efficiency of obtaining the amplification labeling frames of other multiple identification regions, the method improved by this embodiment may be adopted, and on the basis of the existing labeling frame of the first identification region, the existing labeling frame of the first identification region is adjusted by using a preset amplification rule, so as to obtain the amplification labeling frames of other multiple identification regions.
It should be noted that different second identification regions correspond to different preset amplification rules; that is, the preset amplification rule used when the second identification area is the sheep neck differs from the rule used when the second identification area is the sheep body. Specifically, the preset amplification rule can be adapted according to the specific shape, size and other characteristics of the second identification region.
S102, performing model training according to the sample image with the existing labeling frame and the amplification labeling frame to obtain a detection model; the detection model is used for detecting a first identification area and at least one second identification area of a detection target in an image to be detected.
Optionally, in this embodiment, the sample image containing the existing labeling frame and the amplification labeling frame may be input into the training model as a sample to train the detection model. After the detection model is obtained through training, it can be used to detect the target object in any image containing detection targets of the same type, thereby realizing intelligent detection of the target. For example, after the detection model is obtained by training on images containing the sheep face labeling frame and the sheep body labeling frame as samples, the sheep face and the sheep body in any image containing a sheep can be detected by the trained detection model.
In summary, in the detection model training method based on semi-supervised learning provided in this embodiment, the existing labeling frame of the first identification region of the detection target in the sample image is subjected to the preset amplification rule to obtain the amplification labeling frames of other multiple identification regions, and the model training is performed according to the existing labeling frame of the first identification region and the amplification labeling frames of the other identification regions to obtain the detection model. According to the method, a plurality of amplification labeling frames are obtained on the basis of the existing labeling frames by presetting amplification rules, so that the amplification speed of a training sample can be effectively increased, the training efficiency of a detection model is improved, and meanwhile, the generalization capability of the detection model is effectively enhanced.
Fig. 2 is a schematic flow chart of another detection model training method based on semi-supervised learning according to an embodiment of the present application, and optionally, as shown in fig. 2, in step S101, obtaining an amplification labeling frame of at least one second identification region of a detection target in a sample image by respectively using at least one preset amplification rule according to an existing labeling frame of a first identification region of the detection target in the sample image, where the method may include:
s201, obtaining the coordinates of the amplification labeling frame of the second identification area according to the coordinates of the existing labeling frame of the first identification area and a preset amplification proportion corresponding to a preset amplification rule.
Optionally, before the labeling-frame amplification is performed, the coordinate information of the existing labeling frame of the first identification region may be obtained. The part corresponding to the second identification area is then determined, that is, whether the second identification area is the sheep neck or the sheep body; the preset amplification proportion corresponding to the preset amplification rule is determined according to the determined part information of the second identification area; and the coordinate information of the existing labeling frame of the first identification area is converted according to the preset amplification proportion to obtain the coordinate information of the amplification labeling frame of the second identification area. For different second recognition areas, the corresponding amplification proportion can be determined from prior knowledge or by manual measurement.
FIG. 3 is a schematic diagram of an amplification labeling box according to an embodiment of the present application. As shown in FIG. 3, assume the existing labeling frame of the first identification area is a rectangular frame, corresponding to labeling box a in FIG. 3, with coordinate information (x_f, y_f, w_f, h_f), where (x_f, y_f) is the center point of the rectangular frame and w_f, h_f are its width and height. If the determined second identification region is the sheep neck, the existing labeling frame is amplified according to the amplification proportion determined for that region, and the coordinate information of the obtained amplification labeling frame of the second identification region can be: x_n = x_f, y_n = y_f - 0.5·h_f + 0.5·β·h_f, w_n = α·w_f, h_n = β·h_f, where α and β are scaling factors that can be set according to the specific second recognition region. The amplification labeling frame of the sheep neck, (x_n, y_n, w_n, h_n), corresponds to labeling box b in FIG. 3. Similarly, the same calculation idea can be used to obtain the amplification labeling frame of the sheep body or other parts of the sheep; labeling box c in FIG. 3 corresponds to the amplification labeling frame of the sheep body. Of course, this embodiment takes a sheep as an example of the detection target; in practical applications, the above calculation idea can be applied to any detection target, such as a person, an automobile or another object, to amplify the labeling boxes.
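As an illustration of the coordinate computation above, the following Python sketch derives the sheep-neck amplification labeling frame from an existing sheep-face frame. The function name and the example α and β values are illustrative assumptions, not values specified by the patent.

```python
def augment_box(face_box, alpha, beta):
    """Derive a neck amplification labeling frame from an existing face frame.

    face_box: (x_f, y_f, w_f, h_f), where (x_f, y_f) is the center of the
    rectangular frame and w_f, h_f are its width and height.
    alpha, beta: scaling factors set per second identification region.
    """
    x_f, y_f, w_f, h_f = face_box
    x_n = x_f                                   # same horizontal center as the face frame
    y_n = y_f - 0.5 * h_f + 0.5 * beta * h_f    # center shifted by the new half-height
    w_n = alpha * w_f
    h_n = beta * h_f
    return (x_n, y_n, w_n, h_n)

# Illustrative values only: a 40x40 face frame centered at (120, 80).
neck_box = augment_box((120.0, 80.0, 40.0, 40.0), alpha=1.2, beta=1.5)
```

In this form, labeling box b in FIG. 3 would be the result of applying the function to labeling box a with the neck's scaling factors.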
S202, labeling the sample image according to the coordinates of the amplification labeling frame of the at least one second identification area to obtain the amplification labeling frame of the at least one second identification area.
Optionally, after the coordinate information of the amplification labeling frame of any second identification region is obtained through calculation in the above steps, the amplification labeling frame formed by the coordinate information may be used to label the second identification region in the sample image, so as to obtain a new training sample image, and further, the detection model may be trained according to the new training sample image.
In addition, since the posture and other attributes of the same detection target differ between sample images, the existing labeling frame and the amplification labeling frame of the same detection target also differ in size. For example, in two pictures containing sheep, the labeling frames of the sheep face, the sheep neck and the sheep body differ because the sheep's postures differ. When new training samples are obtained, labeling-frame amplification can therefore be performed on the same detection target in a plurality of sample images, and the plurality of sample images used as training samples to train the model, thereby improving the precision of model training.
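Building on the sketch above, amplification can be applied per region across each sample image's face frames. The per-region (α, β) table below is assumed for illustration, as is the assumption that the body frame follows the same formula shape as the neck frame; the text only says the same calculation idea applies.

```python
# Assumed per-region scaling factors; real values would come from prior
# knowledge or manual measurement, as noted in the text.
PRESET_RULES = {"neck": (1.2, 1.5), "body": (1.6, 3.0)}

def augment_sample(face_boxes):
    """Amplify every existing face frame in one sample image into frames
    for all second identification regions (reuses augment_box above)."""
    return {region: [augment_box(box, a, b) for box in face_boxes]
            for region, (a, b) in PRESET_RULES.items()}
```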
Fig. 4 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present application, and optionally, as shown in fig. 4, the method may further include:
s301, predicting the sample image by adopting a detection model to obtain a first prediction frame of the first identification area and a second prediction frame of at least one second identification area.
Optionally, after the existing labeling frame and the amplification labeling frame are used for model training to obtain the detection model, the detection model can be used to detect an input image containing the detection target. The image input into the detection model is an image to be detected that does not contain any labeling frame. After target detection, the detection result frames out the detected target object, yielding an image containing labeling frames; a labeling frame obtained through model detection is a prediction frame output by the model, and the prediction frames can include the prediction frame of the first identification area and the prediction frames of the second identification areas.
S302, determining a loss function value of the detection model according to the first prediction frame, the at least one second prediction frame, the existing labeling frame and the at least one amplification labeling frame.
And S303, updating the parameters of the detection model according to the loss function value until the updated loss function value of the detection model is minimum.
In some embodiments, the accuracy of the obtained prediction frames may be low. To optimize the detection model and improve its detection accuracy, a loss function value of the detection model can be calculated and the parameters in the model iteratively updated using that value until the prediction frames come closer to the existing labeling frames, yielding the optimized detection model.
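Procedurally, the loop of S301 to S303 is ordinary gradient-based optimization. The snippet below is a minimal, self-contained sketch assuming a PyTorch setting; the linear layer, the random data and the smooth-L1 loss are placeholders standing in for the detection network and the loss defined next, not the patent's actual components.

```python
import torch

model = torch.nn.Linear(4, 4)                      # placeholder for the detector
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

inputs = torch.randn(8, 4)                         # dummy inputs
phi_g = torch.randn(8, 4)                          # dummy encoded labeling frames

for step in range(100):
    phi_p = model(inputs)                          # encoded prediction frames
    loss = torch.nn.functional.smooth_l1_loss(phi_p, phi_g)
    optimizer.zero_grad()
    loss.backward()                                # gradients of the loss value
    optimizer.step()                               # update the model parameters
```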
In this embodiment, assuming that the amplification labeling frames of the sheep neck and the sheep body are obtained by amplifying the existing labeling frames of the sheep face, the sheep face, sheep neck and sheep body can all be detected by the new detection model trained on sample images carrying the labeling frames of these three regions. When using the loss function for model optimization, the corresponding loss function can be defined as: L = L_face + ω_n·L_neck + ω_b·ω_br·L_body, where L_face, L_neck and L_body are the loss functions corresponding to the prediction frames of the sheep face, the sheep neck and the sheep body, respectively. Considering that the amplification labeling frames of the sheep neck and the sheep body are calculated automatically according to the preset amplification rules and therefore carry a certain error, the loss functions of the sheep neck and the sheep body are given weights, and ω_n, ω_b are set relatively low (e.g. ω_n = 0.2, ω_b = 0.03), as long as they can still exert a certain weak-supervision effect. The weight parameter ω_br can be calculated as follows:
Generally, in a dense scene the body labeling boxes of two sheep occlude each other, making the calculated loss function L_body inaccurate. Therefore, for the loss function of the sheep body part, an occlusion ratio is calculated based on all sheep body labeling boxes to further reduce the weight of L_body.
Fig. 5 is a schematic view of overlapping sheep body labeling frames provided in the embodiment of the present application. As shown in FIG. 5, for two sheep body labeling frames R_A and R_B, their overlapping area ratio is recorded as a value between 0 and 1 (the formula appears only as an image in the published text). The overlapping proportion r_i of a given sheep body labeling frame R_i with all other sheep body labeling frames, and the loss attenuation weight associated with r_i, are computed by two further formulas that likewise appear only as images. Through this attenuation weight, the interference brought by uncertainty in semi-supervised learning is further weakened, and the model optimization accuracy is higher.
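Because the overlap-ratio and attenuation-weight formulas survive only as images in the published text, the sketch below substitutes one plausible reading: the overlap of two body frames is taken as intersection area over the smaller frame's area, r_i as a frame's largest overlap with any other body frame, and ω_br = 1 - r_i. Those three definitions, like all names here, are assumptions; only the composition L = L_face + ω_n·L_neck + ω_b·ω_br·L_body and the example weights come from the text above.

```python
def area(box):
    # box given as (x_min, y_min, x_max, y_max)
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])

def overlap_ratio(a, b):
    """Assumed: intersection area over the smaller frame's area, in [0, 1]."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min(area(a), area(b))
    return (ix * iy) / smaller if smaller > 0 else 0.0

def attenuation_weight(i, body_boxes):
    """Assumed: omega_br = 1 - r_i, where r_i is the largest overlap of
    frame i with any other sheep-body labeling frame."""
    r_i = max((overlap_ratio(body_boxes[i], b)
               for j, b in enumerate(body_boxes) if j != i), default=0.0)
    return 1.0 - r_i

def detection_loss(l_face, l_neck, l_body, w_br, w_n=0.2, w_b=0.03):
    # L = L_face + w_n * L_neck + w_b * w_br * L_body, per the text above
    return l_face + w_n * l_neck + w_b * w_br * l_body
```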
Fig. 6 is a schematic flowchart of another method for training a detection model based on semi-supervised learning according to an embodiment of the present application, and optionally, as shown in fig. 6, the determining a loss function value of the detection model according to the first prediction box, the at least one second prediction box, the existing label box, and the at least one amplification label box in step S302 may include:
s401, determining a loss function value corresponding to the first identification area according to the first prediction frame and the existing marking frame.
S402, determining a loss function value corresponding to the second identification area according to the second prediction frame and the corresponding amplification marking frame.
Optionally, the loss function value corresponding to the first identification area and the loss function value corresponding to the at least one second identification area can each be calculated with a conventional loss-function formula. The specific calculation can be understood with reference to the following.
And S403, determining a loss function value of the detection model according to the loss function value corresponding to the first identification area and the loss function value corresponding to the at least one second identification area.
Optionally, when the detection target in the present application is a sheep, the determined loss function corresponding to the first identification region (the sheep face) and the loss functions corresponding to the second identification regions (assuming two second identification regions: the sheep neck and the sheep body) can be weighted according to the optimized loss-function definition for the sheep in step S303, so as to obtain the loss function of the detection model.
It should be noted that the definition of the loss function of the detection model recited in step S303 is only an example. For different detection targets, for example, when the detection target is a person or an automobile, the loss function of the detection model defined in step S303 may also change correspondingly, wherein the plurality of weighted terms in the loss function of the detection model may also correspond to the loss functions of the first identification area and the at least one second identification area corresponding to the person or the automobile.
Fig. 7 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present application, where optionally, each pixel point in an existing labeling frame corresponds to at least one first anchor point frame; as shown in fig. 7, in the step S401, determining the loss function value corresponding to the first identification area according to the first prediction box and the existing label box may include:
s501, respectively converting the coordinates of the first prediction frame and the existing labeling frame according to the coordinates of at least one first anchor point frame.
S502, determining a loss function value corresponding to the first identification area according to the transformed coordinates of the first prediction frame and the coordinates of the existing labeling frame.
It should be noted that, in order to reduce the amount of calculation during model optimization, at least one anchor point frame is pre-assigned to each pixel in each existing labeling frame during model training, so that the coordinates of the existing labeling frame and of the prediction frame can both be transformed relative to the anchor point frame coordinates. This facilitates model learning and reduces the calculation amount of model optimization.
For the first identification region, assume the coordinates of a first anchor point frame are (x, y, w, h), the coordinates of the existing labeling frame are (x_G, y_G, w_G, h_G), and the coordinates of the first prediction frame are (x_p, y_p, w_p, h_p). Then, relative to the coordinates of the first anchor point frame, the transformed coordinates of the existing labeling frame are:
φ_G(x) = (x_G - x)/w
φ_G(y) = (y_G - y)/h
φ_G(w) = log(w_G/w)
φ_G(h) = log(h_G/h)
that is, the coordinates of the existing labeling frame transformed relative to the first anchor point frame.
Similarly, the transformed coordinates of the first prediction frame relative to the coordinates of the first anchor point frame are:
φ_p(x) = (x_p - x)/w
φ_p(y) = (y_p - y)/h
φ_p(w) = log(w_p/w)
φ_p(h) = log(h_p/h)
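The transform can be written directly from the formulas above: both the existing labeling frame and the prediction frame are encoded against the same anchor before the loss compares them. The function name and the example values are illustrative.

```python
import math

def encode_relative(box, anchor):
    """Encode a frame (x, y, w, h) relative to an anchor frame: center
    offsets normalized by the anchor size, plus log-scaled size ratios."""
    ax, ay, aw, ah = anchor
    bx, by, bw, bh = box
    return ((bx - ax) / aw, (by - ay) / ah,
            math.log(bw / aw), math.log(bh / ah))

anchor = (100.0, 100.0, 50.0, 50.0)                          # illustrative values
phi_g = encode_relative((110.0, 95.0, 60.0, 45.0), anchor)   # existing labeling frame
phi_p = encode_relative((105.0, 98.0, 55.0, 48.0), anchor)   # first prediction frame
```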
Optionally, as explained in the previous embodiment, in the present embodiment the first identification area is the sheep face. A loss function combining a position error L_loc and a class confidence error L_conf can then be used to calculate the loss corresponding to the first identification area (the full formulas appear only as images in the published text). In that formulation, one confidence term is the class prediction loss of the positive-sample first anchor boxes and another is the class prediction loss of the negative-sample first anchor boxes; α is the weight balancing L_loc against L_conf; N_m is the number of matches between the first prediction box and the anchor boxes during training (i.e. the number of positive-sample first anchor boxes); I_ijk is an indicator function whose value is 1 when the first anchor box is a positive sample and 0 when it is a negative sample; and c ∈ [1, C], where C is the number of object classes to be detected. When the first anchor box predicts the sheep face, the corresponding T_c is 1. Through this calculation, the loss function corresponding to the first identification area can be obtained.
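Since the loss formulas themselves appear only as images in the published text, the sketch below assumes the standard SSD-style composition that the description's symbols (L_loc, L_conf, α, N_m, I_ijk) suggest: a smooth-L1 position error over positive anchors plus a cross-entropy class confidence error, normalized by the match count N_m. It is a stand-in under that assumption, not the patent's exact formula.

```python
import torch
import torch.nn.functional as F

def region_loss(pred_offsets, gt_offsets, cls_logits, cls_targets,
                pos_mask, alpha=1.0):
    """Assumed SSD-style per-region loss.

    pred_offsets, gt_offsets: anchor-relative encodings, shape (A, 4).
    cls_logits: per-anchor class scores, shape (A, C + 1) with background.
    cls_targets: per-anchor class index (0 for negative anchors).
    pos_mask: boolean mask of positive anchors (the I_ijk = 1 cases).
    """
    n_m = pos_mask.sum().clamp(min=1)              # N_m, number of matched anchors
    l_loc = F.smooth_l1_loss(pred_offsets[pos_mask], gt_offsets[pos_mask],
                             reduction="sum")       # position error, positives only
    l_conf = F.cross_entropy(cls_logits, cls_targets, reduction="sum")
    return (l_conf + alpha * l_loc) / n_m
```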
Fig. 8 is a schematic flowchart of another detection model training method based on semi-supervised learning according to an embodiment of the present disclosure, where optionally, each pixel point in the augmented labeling frame corresponds to at least one second anchor point frame; as shown in fig. 8, the step S402 of determining a loss function value corresponding to a second identification region according to each second prediction box and the corresponding amplification labeling box may include:
s601, respectively converting the coordinates of the second prediction frame and the corresponding amplification labeling frame according to the coordinates of at least one second anchor point frame.
S602, determining a loss function value corresponding to the second identification area according to the transformed coordinates of the second prediction frame and the transformed coordinates of the amplification marking frame.
Optionally, for different second identification regions, each pixel point in the amplification labeling frame corresponding to each second identification region is also pre-assigned with at least one second anchor frame. Specifically, the calculation process of the loss function value corresponding to the second identification area is the same as the calculation method of the loss function value corresponding to the first identification area, and the coordinates of the labeling frame of the first identification area may be replaced with the coordinates of the labeling frame of the second identification area. For the specific calculation process, detailed description is omitted here.
Fig. 9 is a schematic flowchart of a target detection method based on semi-supervised learning according to an embodiment of the present application, and optionally, as shown in fig. 9, the method may include:
and S701, acquiring a detection image.
S702, detecting a first identification area and at least one second identification area of a detection target in a detection image according to a pre-trained detection model.
The detection model is obtained by training through the detection model training method based on semi-supervised learning provided by the embodiment of the application.
Optionally, the detection model can be used to detect the detection target in a qualifying image, so as to obtain a detection result. For example, if the trained detection model is a sheep detection model, a qualifying image is one containing a sheep; through detection, the sheep face, the sheep neck, the sheep body and so on in the image can be determined. Of course, the second identification regions listed in the above embodiments are the sheep neck and the sheep body, but in practice they are not limited to these two parts; amplification labeling boxes of more second identification regions can be obtained according to different amplification proportions.
In summary, the embodiment of the present application provides a detection model training method based on semi-supervised learning, and the method includes: respectively adopting at least one preset amplification rule according to the existing labeling frame of the first identification area of the detection target in the sample image to obtain an amplification labeling frame of at least one second identification area of the detection target in the sample image; each preset amplification rule corresponds to a second identification area; performing model training according to the sample image with the existing labeling frame and the amplification labeling frame to obtain a detection model; the detection model is used for detecting a first identification area and at least one second identification area of a detection target in an image to be detected. The existing labeling frame of the first identification area of the detection target in the sample image is subjected to preset amplification rules to obtain other amplification labeling frames of the plurality of identification areas, and model training is carried out according to the existing labeling frame of the first identification area and the amplification labeling frames of the other identification areas to obtain a detection model. According to the method, a plurality of amplification labeling frames are obtained on the basis of the existing labeling frames by presetting amplification rules, so that the amplification speed of a training sample can be effectively increased, the training efficiency of a detection model is improved, and meanwhile, the generalization capability of the detection model is effectively enhanced. In addition, the improved loss function is adopted to optimize the detection model obtained by training, so that the detection accuracy of the detection model is higher.
Fig. 10 is a schematic diagram of a detection model training apparatus based on semi-supervised learning according to an embodiment of the present application, where the apparatus may include: an amplification module 801, a training module 802;
the amplification module 801 is configured to obtain an amplification labeling frame of at least one second identification area of the detection target in the sample image by respectively using at least one preset amplification rule according to an existing labeling frame of the first identification area of the detection target in the sample image; each preset amplification rule corresponds to a second identification area;
a training module 802, configured to perform model training according to a sample image with an existing labeling frame and an amplification labeling frame to obtain a detection model; the detection model is used for detecting a first identification area and at least one second identification area of a detection target in an image to be detected.
Optionally, the amplifying module 801 is specifically configured to obtain a coordinate of an amplification labeling frame of the second identification region according to the coordinate of an existing labeling frame of the first identification region and a preset amplification ratio corresponding to a preset amplification rule; and labeling the sample image according to the coordinates of the amplification labeling frame of the at least one second identification area to obtain the amplification labeling frame of the at least one second identification area.
Optionally, as shown in fig. 11, the apparatus may further include: a prediction module 803, a determination module 804, and an update module 805;
the prediction module 803 is configured to predict the sample image by using the detection model, so as to obtain a first prediction frame of the first identification region and a second prediction frame of at least one second identification region;
a determining module 804, configured to determine a loss function value of the detection model according to the first prediction box, the at least one second prediction box, the existing labeling box, and the at least one amplification labeling box;
the updating module 805 is configured to update the parameters of the detection model according to the loss function value until the loss function value of the updated detection model is minimum.
Optionally, the determining module 804 is specifically configured to determine a loss function value corresponding to the first identification area according to the first prediction box and the existing labeling box; determining a loss function value corresponding to a second identification region according to a second prediction frame and a corresponding amplification marking frame; and determining the loss function value of the detection model according to the loss function value corresponding to the first identification area and the loss function value corresponding to the at least one second identification area.
Optionally, each pixel point in the existing labeling frame corresponds to at least one first anchor point frame;
a determining module 804, configured to convert coordinates of the first prediction frame and coordinates of an existing labeling frame according to coordinates of at least one first anchor point frame; and determining a loss function value corresponding to the first identification area according to the transformed coordinates of the first prediction frame and the coordinates of the existing marking frame.
Optionally, each pixel point in the amplification labeling frame corresponds to at least one second anchor point frame;
a determining module 804, configured to respectively convert coordinates of the second prediction frame and the corresponding amplification labeling frame according to coordinates of at least one second anchor point frame; and determining a loss function value corresponding to the second identification area according to the transformed coordinates of the second prediction frame and the transformed coordinates of the amplification marking frame.
The above-mentioned apparatus is used for executing the method provided by the foregoing embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
These modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more digital signal processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented by a processing element scheduling program code, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Fig. 12 is a schematic diagram of another detection model training apparatus based on semi-supervised learning according to an embodiment of the present application, where the apparatus may be integrated in a terminal device or a chip of the terminal device, and the terminal may be a computing device with a data processing function.
The device includes: a processor 901, a memory 902.
The memory 902 is used for storing programs, and the processor 901 calls the programs stored in the memory 902 to execute the above method embodiments. The specific implementation and technical effects are similar, and are not described herein again.
Optionally, the invention also provides a program product, for example a computer-readable storage medium, comprising a program which, when being executed by a processor, is adapted to carry out the above-mentioned method embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: a U disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Claims (10)

1. A detection model training method based on semi-supervised learning is characterized by comprising the following steps:
according to the existing marking frame of the first identification area of the detection target in the sample image, respectively adopting at least one preset amplification rule to obtain an amplification marking frame of at least one second identification area of the detection target in the sample image; each preset amplification rule corresponds to one second identification region;
performing model training according to the sample image with the existing labeling frame and the amplification labeling frame to obtain a detection model; the detection model is used for detecting the first identification area and at least one second identification area of the detection target in the image to be detected.
2. The method of claim 1, wherein obtaining an amplification labeling frame of at least one second identification region of the detection target in the sample image by respectively using at least one preset amplification rule according to an existing labeling frame of a first identification region of the detection target in the sample image comprises:
obtaining the coordinates of the amplification marking frame of the second identification area according to the coordinates of the existing marking frame of the first identification area and a preset amplification proportion corresponding to the preset amplification rule;
and labeling the sample image according to the coordinates of the amplification labeling frame of at least one second identification area to obtain the amplification labeling frame of at least one second identification area.
3. The method of claim 2, wherein the method further comprises:
predicting the sample image by adopting the detection model to obtain a first prediction frame of the first identification area and a second prediction frame of at least one second identification area;
determining a loss function value of the detection model according to the first prediction frame, at least one second prediction frame, the existing labeling frame and at least one amplification labeling frame;
and updating the parameters of the detection model according to the loss function value until the updated loss function value of the detection model is minimum.
4. The method of claim 3, wherein said determining a loss function value for said detection model based on said first prediction box, at least one of said second prediction box, said existing annotation box, and at least one of said augmented annotation box comprises:
determining a loss function value corresponding to the first identification area according to the first prediction frame and the existing marking frame;
determining a loss function value corresponding to one of said second identified regions based on one of said second prediction boxes and said corresponding amplification labeling box;
and determining the loss function value of the detection model according to the loss function value corresponding to the first identification area and the loss function value corresponding to at least one second identification area.
5. The method of claim 4, wherein each pixel point in the existing label box corresponds to at least one first anchor point box;
the determining a loss function value corresponding to the first identification area according to the first prediction box and the existing labeling box includes:
respectively converting the coordinates of the first prediction frame and the existing labeling frame according to the coordinates of at least one first anchor point frame;
and determining a loss function value corresponding to the first identification area according to the transformed coordinates of the first prediction frame and the coordinates of the existing labeling frame.
6. The method of claim 4, wherein each pixel point in the amplification labeling frame corresponds to at least one second anchor point frame;
the determining a loss function value corresponding to one second identification area according to the corresponding second prediction frame and the corresponding amplification labeling frame comprises:
respectively transforming the coordinates of the second prediction frame and of the corresponding amplification labeling frame according to the coordinates of the at least one second anchor point frame;
and determining a loss function value corresponding to the second identification area according to the transformed coordinates of the second prediction frame and the transformed coordinates of the amplification labeling frame.
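Claims 5 and 6 apply the same anchor-relative coordinate transform to the first and the second identification areas respectively. The claims do not specify the encoding; the sketch below uses the widely known Faster R-CNN-style parameterization purely as an illustrative stand-in:

```python
# Hypothetical sketch of the anchor-relative transform in claims 5-6,
# using the common Faster R-CNN box encoding (center offsets normalized by
# anchor size, log-scaled width/height) as an assumed stand-in.
import math

def encode_relative_to_anchor(box, anchor):
    """Encode an (x_min, y_min, x_max, y_max) box relative to an anchor frame."""
    bx, by = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    bw, bh = box[2] - box[0], box[3] - box[1]
    ax, ay = (anchor[0] + anchor[2]) / 2.0, (anchor[1] + anchor[3]) / 2.0
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    return ((bx - ax) / aw, (by - ay) / ah, math.log(bw / aw), math.log(bh / ah))

# The prediction frame and the labeling frame are encoded against the same
# anchor before the loss is computed, e.g. smooth-L1 over the two tuples:
anchor = (100.0, 100.0, 200.0, 200.0)
t_pred = encode_relative_to_anchor((110.0, 95.0, 215.0, 205.0), anchor)
t_true = encode_relative_to_anchor((120.0, 80.0, 220.0, 180.0), anchor)
```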
7. A detection model training device based on semi-supervised learning, comprising: an amplification module and a training module;
wherein the amplification module is configured to obtain an amplification labeling frame of at least one second identification area of the detection target in a sample image by respectively applying at least one preset amplification rule according to an existing labeling frame of a first identification area of the detection target in the sample image, each preset amplification rule corresponding to one second identification area;
and the training module is configured to perform model training according to the sample image carrying the existing labeling frame and the amplification labeling frame to obtain a detection model, wherein the detection model is used for detecting the first identification area and the at least one second identification area of the detection target in an image to be detected.
8. The device according to claim 7, wherein the amplification module is specifically configured to: obtain the coordinates of the amplification labeling frame of one second identification area according to the coordinates of the existing labeling frame of the first identification area and a preset amplification ratio corresponding to one preset amplification rule; and label the sample image according to the coordinates of the amplification labeling frame of the at least one second identification area to obtain the amplification labeling frame of the at least one second identification area.
9. A computing device, comprising: a processor, a storage medium and a bus, the storage medium storing program instructions executable by the processor, the processor and the storage medium communicating via the bus when the computing device is running, the processor executing the program instructions to perform the steps of the semi-supervised learning based detection model training method as claimed in any one of claims 1 to 6.
10. A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the semi-supervised learning based detection model training method according to any one of claims 1 to 6.
CN201911179177.1A 2019-11-26 2019-11-26 Detection model training method, device, equipment and medium based on semi-supervised learning Pending CN110910375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911179177.1A CN110910375A (en) 2019-11-26 2019-11-26 Detection model training method, device, equipment and medium based on semi-supervised learning

Publications (1)

Publication Number Publication Date
CN110910375A true CN110910375A (en) 2020-03-24

Family

ID=69820051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911179177.1A Pending CN110910375A (en) 2019-11-26 2019-11-26 Detection model training method, device, equipment and medium based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN110910375A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8495479B1 (en) * 2010-11-22 2013-07-23 Marvell International Ltd. Defect detection and correction via monitoring of syndromes and bit flips in decoder
CN107886512A (en) * 2016-09-29 2018-04-06 法乐第(北京)网络科技有限公司 A kind of method for determining training sample
CN109344762A (en) * 2018-09-26 2019-02-15 北京字节跳动网络技术有限公司 Image processing method and device
CN109859171A (en) * 2019-01-07 2019-06-07 北京工业大学 A kind of flooring defect automatic testing method based on computer vision and deep learning

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580684A (en) * 2020-11-17 2021-03-30 平安科技(深圳)有限公司 Target detection method and device based on semi-supervised learning and storage medium
CN112580684B (en) * 2020-11-17 2024-04-09 平安科技(深圳)有限公司 Target detection method, device and storage medium based on semi-supervised learning
CN112528995A (en) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112528995B (en) * 2020-12-22 2023-08-04 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112884054A (en) * 2021-03-03 2021-06-01 歌尔股份有限公司 Target labeling method and target labeling device
CN112884054B (en) * 2021-03-03 2022-12-09 歌尔股份有限公司 Target labeling method and target labeling device
CN113111709A (en) * 2021-03-10 2021-07-13 北京爱笔科技有限公司 Vehicle matching model generation method and device, computer equipment and storage medium
CN113111709B (en) * 2021-03-10 2023-12-29 北京爱笔科技有限公司 Vehicle matching model generation method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US20210407103A1 (en) Object tracking method and apparatus, storage medium, and electronic device
WO2022213879A1 (en) Target object detection method and apparatus, and computer device and storage medium
CN110910375A (en) Detection model training method, device, equipment and medium based on semi-supervised learning
WO2019011249A1 (en) Method, apparatus, and device for determining pose of object in image, and storage medium
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN113793370B (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
CN111739144A (en) Method and device for simultaneously positioning and mapping based on depth feature optical flow
CN111914756A (en) Video data processing method and device
CN115861715B (en) Knowledge representation enhancement-based image target relationship recognition algorithm
US9081800B2 (en) Object detection via visual search
CN112541902A (en) Similar area searching method, similar area searching device, electronic equipment and medium
JP2023511242A (en) METHOD, APPARATUS, DEVICE AND RECORDING MEDIUM FOR RELATED OBJECT DETECTION IN IMAGE
CN110956131A (en) Single-target tracking method, device and system
CN115457492A (en) Target detection method and device, computer equipment and storage medium
CN114140527A (en) Dynamic environment binocular vision SLAM method based on semantic segmentation
CN116645608A (en) Remote sensing target detection based on Yolox-Tiny biased feature fusion network
CN109492697B (en) Picture detection network training method and picture detection network training device
Kocur et al. Traffic camera calibration via vehicle vanishing point detection
CN111914809B (en) Target object positioning method, image processing method, device and computer equipment
CN116805387B (en) Model training method, quality inspection method and related equipment based on knowledge distillation
CN116069801B (en) Traffic video structured data generation method, device and medium
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
Guo et al. UDTIRI: An online open-source intelligent road inspection benchmark suite
CN111950517A (en) Target detection method, model training method, electronic device and storage medium
CN113610856B (en) Method and device for training image segmentation model and image segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200324