CN111414821B - Target detection method and related device - Google Patents

Target detection method and related device

Info

Publication number
CN111414821B
CN111414821B (application CN202010167999.4A)
Authority
CN
China
Prior art keywords: determining, loss function, suggestion frame, original, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010167999.4A
Other languages
Chinese (zh)
Other versions
CN111414821A (en)
Inventor
Song Guanglu (宋广录)
Liu Yu (刘宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202010167999.4A priority Critical patent/CN111414821B/en
Publication of CN111414821A publication Critical patent/CN111414821A/en
Application granted granted Critical
Publication of CN111414821B publication Critical patent/CN111414821B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a target detection method and a related device. The method includes: determining a first original region suggestion frame of a target image according to a region detection model; determining, according to the first original region suggestion frame, a first positioning area suggestion frame for the positioning task on the target image, and determining, according to the first original region suggestion frame, a first classification area suggestion frame for the classification task on the target image; determining a first feature according to the first positioning area suggestion frame, and determining a second feature according to the first classification area suggestion frame; and obtaining a target detection result according to the first feature and the second feature, so that the accuracy of target detection can be improved.

Description

Target detection method and related device
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a target detection method and a related device.
Background
General object detection is a fundamental problem in computer vision and intelligent video surveillance. Unlike a pure classification and recognition task, a general object detection task requires not only accurate classification of the objects present in an image but also accurate regression of their positions. Existing schemes generally use a shared region suggestion frame and a shared feature extractor to perform the classification task and the regression task simultaneously; because the two tasks have different optimization targets, the accuracy of the resulting detector is low.
Disclosure of Invention
The embodiment of the application provides a target detection method and a related device, which can improve the accuracy of target detection.
A first aspect of an embodiment of the present application provides a target detection method, including:
determining a first original region suggestion frame of the target image according to the region detection model;
determining a first positioning area suggestion frame of the positioning task on the target image according to the first original region suggestion frame, and determining a first classification area suggestion frame of the classification task on the target image according to the first original region suggestion frame;
determining a first feature according to the first positioning area suggestion frame, and determining a second feature according to the first classification area suggestion frame;
and obtaining a target detection result according to the first characteristic and the second characteristic.
With reference to the first aspect, in one possible implementation manner, determining, according to the first original region suggestion frame, a first positioning region suggestion frame of a positioning task on the target image includes:
determining a first offset according to the first original region suggestion frame;
and determining a first positioning area suggestion frame according to the first original area suggestion frame and the first offset.
In this example, the first positioning area suggestion frame is determined from the first original region suggestion frame and the first offset, so that a dedicated region suggestion frame is obtained for the positioning task; feature extraction can then be performed according to that suggestion frame, which improves the accuracy of feature extraction.
With reference to the first aspect, in one possible implementation manner, determining the first offset according to the first original region suggestion frame includes:
determining a third feature according to the first original region suggestion frame;
inputting the third feature into a first operation network for operation to obtain a first operation result;
and determining the first offset according to the width value and the height value of the first original region suggestion frame and the first operation result.
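The two steps above can be sketched as follows. The tiny fully connected layer, the tanh squashing, and the scaling factor `gamma` are all assumptions for illustration; the description only specifies that the first offset depends on the box's width value, height value, and the first operation result.

```python
import numpy as np

def localization_proposal(box, feat_vec, weight, bias, gamma=0.1):
    """Shift an original region suggestion frame for the positioning task.

    box: (x1, y1, x2, y2); feat_vec: pooled feature of the box (the
    "third feature"); (weight, bias): a tiny fully connected layer standing
    in for the "first operation network"; gamma: assumed scaling factor.
    """
    w, h = box[2] - box[0], box[3] - box[1]
    # "first operation result": a 2-vector predicted from the box feature
    dx, dy = np.tanh(feat_vec @ weight + bias)
    # the first offset scales the prediction by the box width/height
    off_x, off_y = dx * w * gamma, dy * h * gamma
    # translate the whole box -> first positioning area suggestion frame
    return np.array([box[0] + off_x, box[1] + off_y,
                     box[2] + off_x, box[3] + off_y])
```

With zero weights the predicted offset is zero and the positioning frame coincides with the original frame, which is a useful initialization property.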
With reference to the first aspect, in one possible implementation manner, determining, according to the first original region suggestion frame, a first classification area suggestion frame of the classification task on the target image includes:
determining M second offsets according to the first original region suggestion frame, where M is a positive integer;
and determining a first classification area suggestion frame according to the M second offsets.
In this example, M second offsets are determined according to the first original region suggestion frame, and the corresponding sub-regions obtained by dividing the first original region suggestion frame are shifted according to the M second offsets to obtain the first classification area suggestion frame, which improves the accuracy of the first classification area suggestion frame.
With reference to the first aspect, in one possible implementation manner, determining the M second offsets according to the first original region suggestion frame includes:
dividing the first original region suggestion frame into k×k sub-regions;
inputting the third feature into a second operation network for operation to obtain a second operation result;
and determining an offset for each of the k×k sub-regions according to the width value and the height value of the first original region suggestion frame and the second operation result, so as to obtain the M second offsets, where M is equal to k×k.
In this example, the first original region suggestion frame is divided into k×k sub-regions and the second operation result is obtained from the second operation network, so that a corresponding offset can be determined for each sub-region, which improves the accuracy of the first classification area suggestion frame.
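The division-and-shift step can be sketched as follows (pure-Python illustration). Treating each of the k×k cells as receiving its own second offset follows from the k×k division; the per-cell translation itself is an assumption about how the offsets are applied.

```python
def split_and_shift(box, offsets, k):
    """Divide a region suggestion frame into k*k sub-regions and shift
    each sub-region by its own second offset (M = k*k offsets in total).

    box: (x1, y1, x2, y2); offsets: list of M (dx, dy) pairs, row-major.
    Returns the shifted cells, whose union plays the role of the
    classification area suggestion frame.
    """
    x1, y1, x2, y2 = box
    cw, ch = (x2 - x1) / k, (y2 - y1) / k   # cell width / height
    cells = []
    for i in range(k):          # rows
        for j in range(k):      # columns
            dx, dy = offsets[i * k + j]
            cells.append((x1 + j * cw + dx,       y1 + i * ch + dy,
                          x1 + (j + 1) * cw + dx, y1 + (i + 1) * ch + dy))
    return cells
```

With all-zero offsets the cells simply tile the original frame, so the classification frame degenerates to the original region suggestion frame.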
With reference to the first aspect, in one possible implementation manner, the first layer of the first operation network and the first layer of the second operation network are the same layer.
In this example, since the first layer of the first operation network and the first layer of the second operation network are the same layer, the total number of layers required by the model can be reduced, thereby reducing the complexity of the model.
With reference to the first aspect, in one possible implementation manner, the target detection method is implemented by a target detection neural network, and the method further includes:
training the target detection neural network through the sample image and the sample label of the sample image to obtain the trained target detection neural network.
With reference to the first aspect, in one possible implementation manner, training the target detection neural network through the sample image and the sample label of the sample image to obtain a trained target detection neural network includes:
determining a second original region suggestion frame of the sample image according to the region detection model;
determining a second positioning area suggestion frame of the positioning task on the sample image according to the second original area suggestion frame, and determining a second classification area suggestion frame of the classification task on the sample image according to the second original area suggestion frame;
determining a fourth feature according to the second positioning area suggestion frame, and determining a fifth feature according to the second classification area suggestion frame; determining a target loss function according to at least the fourth feature, the fifth feature and the sample label;
training the target detection neural network according to the target loss function to obtain the trained target detection neural network.
In this example, a corresponding region suggestion frame is determined for each of the positioning task and the classification task, so that task-specific features can be obtained; the target detection neural network is then trained on these features to obtain the trained network. This improves the accuracy of the features obtained for the two tasks, and therefore the accuracy of the trained target detection neural network in target detection.
With reference to the first aspect, in one possible implementation manner, determining the objective loss function according to at least the fourth feature, the fifth feature, and the sample label includes:
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining a target loss function according to the first loss function and the second loss function.
In this example, the target loss function is determined from the first loss function of the positioning task and the second loss function of the classification task. Compared with the existing scheme, in which the loss functions are determined from features extracted with a region suggestion frame and a feature extractor shared by the two tasks, this improves the accuracy of the determined target loss function and thus of the trained target detection neural network.
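A minimal sketch of combining the two task losses follows. Smooth-L1 for the positioning loss and cross-entropy for the classification loss are conventional choices in detection, assumed here for illustration rather than taken from the description; the equal weighting is likewise an assumption.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 regression loss, a common choice for box positioning."""
    d = np.abs(pred - target)
    return float(np.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta).sum())

def cross_entropy(probs, label):
    """Negative log-likelihood of the ground-truth class."""
    return float(-np.log(probs[label] + 1e-12))

def target_loss(loc_pred, loc_target, cls_probs, cls_label):
    # first loss (positioning task) + second loss (classification task)
    return smooth_l1(loc_pred, loc_target) + cross_entropy(cls_probs, cls_label)
```

A perfect regression and a confident correct classification drive the combined loss toward zero, as expected.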
With reference to the first aspect, in one possible implementation manner, determining the target loss function according to at least the fourth feature and the fifth feature includes:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature and the sample label;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining an objective loss function according to the first original loss function, the second original loss function, the first loss function and the second loss function.
With reference to the first aspect, in one possible implementation manner, determining the target loss function according to at least the fourth feature and the fifth feature includes:
determining, according to the third feature, a first original loss function of the positioning task and a second original loss function of the classification task;
determining a first progressive constraint loss function of the positioning task according to the third feature and the fourth feature;
determining a second progressive constraint loss function of the classification task according to the third feature and the fifth feature;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining an objective loss function according to the first original loss function, the second original loss function, the first progressive constraint loss function, the second progressive constraint loss function, the first loss function and the second loss function.
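The six named terms can be combined as a weighted sum; equal unit weights are an assumption here, since the description does not fix a weighting scheme.

```python
def total_loss(orig_loc, orig_cls, pc_loc, pc_cls, loc, cls,
               weights=(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)):
    """Target loss = first/second original losses + first/second
    progressive constraint losses + first/second task losses."""
    terms = (orig_loc, orig_cls, pc_loc, pc_cls, loc, cls)
    return sum(w * t for w, t in zip(weights, terms))
```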
With reference to the first aspect, in one possible implementation manner, determining, according to the fourth feature, a first progressive constraint loss function of the positioning task includes:
determining a first accuracy of the positioning task according to the third feature;
determining a second accuracy of the positioning task according to the fourth feature;
and determining the first progressive constraint loss function according to the first accuracy and the second accuracy.
With reference to the first aspect, in a possible implementation manner, according to a fifth feature, determining a second progressive constraint loss function of the classification task includes:
determining a first confidence of the classification task according to the third feature;
determining a second confidence of the classification task according to the fifth feature;
and determining the second progressive constraint loss function according to the first confidence and the second confidence.
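One plausible form of such a constraint, comparing the two confidences (or accuracies), is a margin hinge that penalizes the task-specific branch whenever its score does not exceed that of the original branch; the hinge form and the margin value are assumptions, not specified by the description.

```python
def progressive_constraint(first_score, second_score, margin=0.2):
    """Zero once the second (task-specific) score beats the first
    (original-branch) score by at least `margin`; grows linearly otherwise."""
    return max(0.0, first_score - second_score + margin)
```

Under this form the gradient pushes the refined branch to be strictly better than the shared branch, which matches the "progressive" intent.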
A second aspect of embodiments of the present application provides an object detection apparatus, including:
a first determining unit, configured to determine a first original region suggestion frame of the target image according to the region detection model;
a second determining unit, configured to determine a first positioning area suggestion frame of the positioning task on the target image according to the first original region suggestion frame, and determine a first classification area suggestion frame of the classification task on the target image according to the first original region suggestion frame;
a third determining unit, configured to determine a first feature according to the first positioning area suggestion frame, and determine a second feature according to the first classification area suggestion frame;
and the detection unit is used for obtaining a target detection result according to the first characteristic and the second characteristic.
With reference to the second aspect, in one possible implementation manner, in determining a first positioning area suggestion frame of the positioning task on the target image according to the original area suggestion frame, the second determining unit is configured to:
determining a first offset according to the first original region suggestion frame;
and determining a first positioning area suggestion frame according to the first original area suggestion frame and the first offset.
With reference to the second aspect, in one possible implementation manner, in determining the first offset according to the first original region suggestion frame, the second determining unit is configured to:
determining a third feature according to the first original region suggestion frame;
inputting the third feature into a first operation network for operation to obtain a first operation result;
and determining the first offset according to the width value and the height value of the first original region suggestion frame and the first operation result.
With reference to the second aspect, in one possible implementation manner, in determining a first classification area suggestion frame of the classification task on the target image according to the first original area suggestion frame, the second determining unit is configured to:
determining M second offsets according to the first original region suggestion frame, where M is a positive integer;
and determining a first classification area suggestion frame according to the M second offsets.
With reference to the second aspect, in one possible implementation manner, in determining the M second offsets according to the first original region suggestion frame, the second determining unit is configured to:
dividing the first original region suggestion frame into k×k sub-regions;
inputting the third feature into a second operation network for operation to obtain a second operation result;
and determining an offset for each of the k×k sub-regions according to the width value and the height value of the first original region suggestion frame and the second operation result, so as to obtain the M second offsets, where M is equal to k×k.
With reference to the second aspect, in one possible implementation manner, the first layer of the first operation network and the first layer of the second operation network are the same layer.
With reference to the second aspect, in one possible implementation manner, the object detection device is implemented by an object detection neural network, and the device is further configured to:
training the target detection neural network through the sample image and the sample label of the sample image to obtain the trained target detection neural network.
With reference to the second aspect, in one possible implementation manner, in training the target detection neural network through the sample image and the sample label of the sample image to obtain a trained target detection neural network, the apparatus is further configured to:
determining a second original region suggestion frame of the sample image according to the region detection model;
determining a second positioning area suggestion frame of the positioning task on the sample image according to the second original area suggestion frame, and determining a second classification area suggestion frame of the classification task on the sample image according to the second original area suggestion frame;
determining a fourth feature according to the second positioning area suggestion frame, and determining a fifth feature according to the second classification area suggestion frame; determining a target loss function according to at least the fourth feature, the fifth feature and the sample label;
and adjusting the target detection neural network according to the target loss function to obtain the trained target detection neural network.
With reference to the second aspect, in one possible implementation manner, in determining the objective loss function according to at least the fourth feature, the fifth feature and the sample label, the apparatus is further configured to:
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining a target loss function according to the first loss function and the second loss function.
With reference to the second aspect, in one possible implementation manner, in determining the objective loss function according to at least the fourth feature and the fifth feature, the apparatus is further configured to:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature and the sample label;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining an objective loss function according to the first original loss function, the second original loss function, the first loss function and the second loss function.
With reference to the second aspect, in one possible implementation manner, in determining the objective loss function according to at least the fourth feature and the fifth feature, the apparatus is further configured to:
determining, according to the third feature, a first original loss function of the positioning task and a second original loss function of the classification task;
determining a first progressive constraint loss function of the positioning task according to the third feature and the fourth feature;
determining a second progressive constraint loss function of the classification task according to the third feature and the fifth feature;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining an objective loss function according to the first original loss function, the second original loss function, the first progressive constraint loss function, the second progressive constraint loss function, the first loss function and the second loss function.
With reference to the second aspect, in a possible implementation manner, in determining a first progressive constraint loss function of the positioning task according to the fourth feature, the apparatus is further configured to:
determining a first accuracy of the positioning task according to the third feature;
determining a second accuracy of the positioning task according to the fourth feature;
and determining the first progressive constraint loss function according to the first accuracy and the second accuracy.
With reference to the second aspect, in a possible implementation manner, in determining a second progressive constraint loss function of the classification task according to the fifth feature, the apparatus is further configured to:
determining a first confidence of the classification task according to the third feature;
determining a second confidence of the classification task according to the fifth feature;
and determining the second progressive constraint loss function according to the first confidence and the second confidence.
A third aspect of the embodiments of the present application provides an object detection device, including a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, and where the memory is configured to store a computer program, the computer program including program instructions, and the processor is configured to invoke the program instructions to execute the step instructions as in the first aspect of the embodiments of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform part or all of the steps as described in the first aspect of the embodiments of the present application.
A fifth aspect of the embodiments of the present application provides a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps as described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
The implementation of the embodiment of the application has at least the following beneficial effects:
according to the region detection model, a first original region suggestion frame of the target image is determined; according to the first original region suggestion frame, a first positioning area suggestion frame for the positioning task on the target image and a first classification area suggestion frame for the classification task on the target image are determined; a first feature is determined according to the first positioning area suggestion frame and a second feature according to the first classification area suggestion frame; and a target detection result is obtained according to the first feature and the second feature.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic diagram of a target detection method according to an embodiment of the present application;
fig. 2A is a schematic flow chart of a target detection method according to an embodiment of the present application;
fig. 2B is an application schematic diagram of a target detection method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of another object detection method according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart of another object detection method according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an object detection device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an object detection device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly understand that the embodiments described herein may be combined with other embodiments.
To better understand the target detection method provided in the embodiments of the present application, an application scenario of the method is first briefly described. Referring to fig. 1, fig. 1 is a schematic diagram of a target detection method according to an embodiment of the present application. As shown in fig. 1, taking an image containing an aircraft as an example, the target detection method may be implemented by a target detection model: the target image is input into the target detection model, and features are extracted separately for the positioning task and the classification task. For the aircraft, the positioning task concerns the position information of the aircraft, while the classification task concerns the appearance of the aircraft. Therefore, instead of extracting features for the two tasks with a shared feature extractor as in the existing scheme, the target detection model extracts features for the positioning task and the classification task respectively, so that an accurate target detection result can be obtained.
Referring to fig. 2A, fig. 2A is a schematic flow chart of a target detection method according to an embodiment of the present application. As shown in fig. 2A, the target detection method includes:
201. and determining a first original region suggestion frame of the target image according to the region detection model.
The target image may be an image to be detected for target detection, for example, an image including an airplane for detecting the airplane, or the like.
The region detection model may be a region detection model from an existing scheme, for example the region proposal network (RPN) that determines region suggestion frames in the Faster R-CNN scheme. The first original region suggestion frame may be the region suggestion frame shared by the positioning task and the classification task in the existing scheme, which may be understood as follows: in the existing scheme (such as Faster R-CNN), feature extraction for the positioning task and the classification task is performed through the first original region suggestion frame to obtain corresponding features, and the detection result is determined from those features.
The region suggestion box may also be understood as a labeling box or the like for performing feature extraction.
202. Determining a first positioning region suggestion frame of the positioning task on the target image according to the first original region suggestion frame, and determining a first classification region suggestion frame of the classification task on the target image according to the first original region suggestion frame.
The first offset and the second offset may be determined according to the original region suggestion frame, and the first positioning region suggestion frame and the first classification region suggestion frame may be determined according to the first offset and the second offset, respectively.
203. Determining a first feature according to the first positioning area suggestion frame, and determining a second feature according to the first classification area suggestion frame.
Feature extraction may be performed on the first positioning region suggestion frame by a feature extraction algorithm to obtain the first feature, and feature extraction may be performed on the first classification region suggestion frame to obtain the second feature, where the feature extraction algorithm may be, for example, bilinear interpolation or the like.
The first feature may be used, for example, to determine the accuracy of the positioning, and the second feature may be used, for example, to determine the classification confidence.
204. Obtaining a target detection result according to the first feature and the second feature.
The accuracy of the positioning task may be calculated from the first feature, and the classification confidence may be calculated from the second feature. The target detection result includes the accuracy of the positioning task and the classification confidence of the classification task. Both the accuracy of the positioning task and the classification confidence may be obtained through corresponding fully connected networks.
According to the region detection model, the first original region suggestion frame of the target image is determined; the first positioning region suggestion frame of the positioning task on the target image and the first classification region suggestion frame of the classification task on the target image are determined according to the first original region suggestion frame; the first feature is determined according to the first positioning region suggestion frame, and the second feature is determined according to the first classification region suggestion frame; and the target detection result is obtained according to the first feature and the second feature.
In one possible embodiment, a possible method for determining a first positioning area suggestion box of a positioning task on a target image includes the following steps A1-A2:
a1, determining a first offset according to a first original region suggestion frame;
a2, determining a first positioning area suggestion frame according to the first original area suggestion frame and the first offset.
The first offset may be determined according to the result obtained by inputting the third feature of the first original region suggestion frame into the first operation network for operation.
The first original region suggestion box may be translated by a first offset to obtain a first localized region suggestion box. The first positioning area suggestion frame has the same size as the first original area suggestion frame, and it can be understood that the first positioning area suggestion frame and the first original area suggestion frame are area suggestion frames with the same size and different positions.
In this example, the first original region suggestion frame and the first offset are used to determine the first positioning region suggestion frame, so that the corresponding region suggestion frame can be determined for the positioning task, and further feature extraction can be performed according to the region suggestion frame, and accuracy in feature extraction can be improved.
In one possible embodiment, one possible method of determining the first offset includes steps B1-B3, as follows:
b1, determining a third characteristic according to the first original region suggestion frame;
b2, inputting the third characteristic into a first operation network for operation so as to obtain a first operation result;
and B3, determining a first offset according to the width value, the height value and the first operation result of the first original region suggestion frame.
Features of the first original region suggestion frame may be extracted through a feature extraction algorithm to obtain the third feature corresponding to the first original region suggestion frame. The feature extraction algorithm may be an existing general feature extraction algorithm or the like.
The first operation network may be a three-layer fully connected network, where the output dimension of the first layer of the first operation network may be 256 and the output dimension of the last layer of the first operation network is 2. The first operation result may be a normalized offset.
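As an illustration, a three-layer fully connected operation network of this shape can be sketched in plain NumPy. This is a minimal sketch with untrained random placeholder weights, and the 7×7×256 pooled input shape is an assumption for illustration, not something fixed by this embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def three_layer_fc(feature, sizes=(256, 256, 2)):
    # Sketch of the three-layer fully connected operation network:
    # two 256-unit hidden layers, then a 2-unit output holding the
    # normalized (dx, dy) offset. Weights are random placeholders.
    x = feature
    for i, out_dim in enumerate(sizes):
        w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
        b = np.zeros(out_dim)
        x = x @ w + b
        if i < len(sizes) - 1:   # hidden layers pass through a ReLU
            x = np.maximum(x, 0.0)
    return x

# An assumed 7x7x256 pooled region feature, flattened into one vector.
third_feature = rng.standard_normal(7 * 7 * 256)
first_result = three_layer_fc(third_feature)
print(first_result.shape)  # (2,)
```

The two output values are the normalized x/y offsets before scaling by the frame size.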
The first offset may be determined according to the width value and height value of the first original region suggestion frame and the first operation result, for example with reference to the method shown in the following formula:

ΔR = γ · F_r(F; θ_r) · (w, h),

where w is the width value of the first original region suggestion frame, h is the height value of the first original region suggestion frame, γ is a predefined scaling factor, F_r(F; θ_r) is the first operation result, and ΔR is the first offset. The predefined scaling factor ensures the stability of training.
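The offset computation and the translation of the suggestion frame can be sketched in a few lines of Python. The γ value, the toy (x1, y1, x2, y2) box, and the operation-result values below are illustrative assumptions only.

```python
import numpy as np

def first_offset(op_result, w, h, gamma=0.5):
    # Sketch of dR = gamma * F_r(F; theta_r) * (w, h): the normalized
    # network output is scaled by the predefined factor gamma and by
    # the frame width/height. gamma = 0.5 is an assumed value.
    return gamma * np.asarray(op_result, dtype=float) * np.array([w, h], dtype=float)

def translate_box(box, dr):
    # Shift the (x1, y1, x2, y2) suggestion frame by (dx, dy); the
    # frame size is unchanged, as stated above.
    dx, dy = float(dr[0]), float(dr[1])
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

box = (10.0, 20.0, 110.0, 70.0)               # w = 100, h = 50
dr = first_offset([0.25, -0.5], w=100, h=50)  # -> (12.5, -12.5)
print(translate_box(box, dr))                 # (22.5, 7.5, 122.5, 57.5)
```

Because only a translation is applied, the resulting first positioning region suggestion frame has the same size as the first original region suggestion frame.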
In one possible embodiment, the embodiments further provide a method of determining the first feature from the first location area suggestion box, for example by bilinear interpolation.
In one possible embodiment, a possible method for determining a first classification area suggestion box of a classification task on a target image includes steps C1-C2, specifically as follows:
c1, determining M second offset values according to a first original region suggestion frame, wherein M is a positive integer;
and C2, determining a first classification area suggestion frame according to the M second offset values.
The first original region suggestion frame may be divided into a plurality of sub-regions, and the offset corresponding to each sub-region may be determined separately, so as to obtain M second offsets. M is the number of subregions.
The plurality of sub-regions divided by the first original region suggestion frame may be shifted by a corresponding second shift amount to determine the first classification region suggestion frame.
In this example, M second offsets are determined according to the first original region suggestion frame, and corresponding sub-regions in the plurality of sub-regions divided by the first original region suggestion frame are offset according to the M second offsets, so as to obtain a first classification region suggestion frame, so that accuracy in determining the first classification region suggestion frame can be improved.
In one possible embodiment, a possible method for determining M second offsets according to the first original region suggestion box includes steps D1-D3, specifically as follows:
D1, dividing the first original region suggestion frame into k×k sub-regions;

D2, inputting the third feature into a second operation network for operation to obtain a second operation result;

and D3, determining the offset of each of the k×k sub-regions according to the width value and height value of the first original region suggestion frame and the second operation result, so as to obtain M second offsets, where M is equal to k×k.
The dividing of the first original region suggestion box into k×k sub-regions may be performed in a uniform dividing manner, which may be specifically understood that the area of each sub-region is the same (or the number of feature points included is the same), and of course, may also be performed in a non-uniform dividing manner, which is not specifically limited herein.
The second operation network is a three-layer fully connected network, where the output dimension of the first layer is 256, the output dimension of the second layer is 256, and the output dimension of the third layer is k×k×2.
The offset of each of the k×k sub-regions may be determined according to the width value and height value of the first original region suggestion frame and the second operation result, so as to obtain the M second offsets, for example by the method shown in the following formula:

ΔC = γ · F_c(F; θ_c) · (w, h),

where w is the width value of the first original region suggestion frame, h is the height value of the first original region suggestion frame, γ is a predefined scaling factor, F_c(F; θ_c) is the second operation result, and ΔC is a second offset. F_c(F; θ_c) corresponds to each sub-region.
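The per-sub-region offsets can be sketched the same way; the k×k×2 network output is reshaped so that each sub-region receives its own (dx, dy) pair. Here k = 7, γ = 0.5, and the placeholder operation result are illustrative assumptions.

```python
import numpy as np

def second_offsets(op_result, w, h, k, gamma=0.5):
    # Sketch of dC = gamma * F_c(F; theta_c) * (w, h): the second
    # operation network emits k*k*2 normalized values, one (dx, dy)
    # pair per sub-region, scaled by gamma and the frame size.
    out = np.asarray(op_result, dtype=float).reshape(k, k, 2)
    return gamma * out * np.array([w, h], dtype=float)

k = 7
raw = np.linspace(-1.0, 1.0, k * k * 2)   # placeholder operation result
delta_c = second_offsets(raw, w=100, h=50, k=k)
print(delta_c.shape)  # (7, 7, 2)
```

The (k, k, 2) array holds the M = k×k second offsets; each sub-region of the first original region suggestion frame is then shifted by its own pair.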
In this example, the first original region suggestion frame is divided into k×k sub-regions, and the second operation result is obtained according to the second operation network, so that a corresponding offset can be determined for each sub-region, and accuracy of determining the first classification region suggestion frame is improved.
In one possible embodiment, a possible method for determining the second feature according to the first classification region suggestion box may include the steps of:
e1, acquiring a first number of sampling points of a first subarea, and acquiring a characteristic corresponding to each sampling point of the first subarea, wherein the first subarea is any one of k subareas;
e2, determining a first sub-feature of the first sub-region according to the features of the sampling points in the first sub-region and the first number;
and E3, acquiring the sub-feature corresponding to each sub-region in the k sub-regions by a method for acquiring the first sub-feature of the first sub-region so as to acquire the second feature.
The feature corresponding to each sampling point may be determined by a bilinear interpolation function.
The mean value of the features of the sampling points in the first sub-region may be obtained from the features of each sampling point and the first number, and the mean value is determined as the first sub-feature of the first sub-region.
The method for acquiring the first sub-feature of the first sub-region includes the steps E1 and E2 described above, so that the first sub-feature can be determined. And determining the corresponding sub-feature for each sub-region by the method, so as to obtain the second feature. Specifically, a set of sub-features corresponding to each sub-region may be determined as the second feature.
Of course, the maximum value among the features corresponding to the sampling points may instead be determined as the first sub-feature of the first sub-region. This is by way of example only and is not intended to be limiting.
In one possible embodiment, the second feature may be determined by a method as shown in the following formula:

F̂(x, y) = ( Σ_{(p_x, p_y) ∈ G(x, y)} F_B(p_x + ΔC(x, y, 1), p_y + ΔC(x, y, 2)) ) / |G(x, y)|,

where G(x, y) is the (x, y)-th sub-region, |G(x, y)| is the number of sampling points of the (x, y)-th sub-region during pooling, (p_x, p_y) are the coordinates of a sampling point in the grid G(x, y), F_B(·) is a bilinear interpolation function, ΔC(x, y, 1) is the offset in the x direction, and ΔC(x, y, 2) is the offset in the y direction.
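The pooling step for one shifted sub-region can be sketched as follows: each sampling point is moved by the sub-region's offset, sampled from the feature map by bilinear interpolation, and the sampled values are averaged. The 4×4 toy feature map and the four sampling points are illustrative assumptions.

```python
import numpy as np

def bilinear(feature_map, x, y):
    # F_B(.): bilinear interpolation of a 2-D feature map at a
    # fractional coordinate (x, y).
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, feature_map.shape[1] - 1)
    y1 = min(y0 + 1, feature_map.shape[0] - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feature_map[y0, x0] + ax * feature_map[y0, x1]
    bot = (1 - ax) * feature_map[y1, x0] + ax * feature_map[y1, x1]
    return (1 - ay) * top + ay * bot

def subregion_feature(feature_map, points, dx, dy):
    # Mean of the bilinearly sampled values of one shifted sub-region,
    # as in steps e1-e2: each sampling point (px, py) is moved by the
    # sub-region's offset (dx, dy) before sampling.
    vals = [bilinear(feature_map, px + dx, py + dy) for px, py in points]
    return sum(vals) / len(vals)

fmap = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 feature map
pts = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
print(subregion_feature(fmap, pts, dx=0.5, dy=0.0))  # 8.0
```

Replacing the mean with `max` over `vals` would give the max-pooling variant mentioned above.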
In one possible embodiment, the first layer of the first operation network and the first layer of the second operation network are the same layer, which can be specifically understood that the first operation network and the second operation network share the first layer.
In this example, since the first layer of the first operation network and the first layer of the second operation network are the same layer, the total number of layers required by the model can be reduced, thereby reducing the complexity of the model.

In one possible embodiment, the foregoing implementation manner may be implemented through a neural network, so the embodiments of the present application further provide a training method for the target detection method. Specifically, the target detection method may be implemented through a target detection neural network; for example, the target detection neural network is trained through a sample image and a sample label of the sample image, so as to obtain a trained target detection neural network.
The sample image may be, for example, an image labeled manually, so as to obtain the sample label of the sample image. The trained target detection neural network is the detection network obtained after training is completed; it can perform target detection on an input image by the target detection method provided in the embodiments of the present application, so as to obtain a target detection result.
In one possible embodiment, a possible method for training the target detection neural network through the sample image and the sample label of the sample image to obtain the trained target detection neural network includes steps F1-F5, specifically as follows:
f1, determining a second original region suggestion frame of the sample image according to the region detection model;
f2, determining a second positioning area suggestion frame of the positioning task on the sample image according to the second original area suggestion frame, and determining a second classification area suggestion frame of the classification task on the sample image according to the second original area suggestion frame;
f3, determining a fourth characteristic according to the second positioning area suggestion frame, and determining a fifth characteristic according to the second classification area suggestion frame;
f4, determining a target loss function at least according to the fourth characteristic, the fifth characteristic and the sample label;
and F5, adjusting the target detection neural network according to the target loss function to obtain the trained target detection neural network.
The steps F1 to F3 may refer to the implementation methods of the steps corresponding to the steps 201 to 203 in the foregoing embodiments, and are not repeated here.
The methods of the foregoing embodiments are also applicable to the present example. For example, the method of determining the fourth feature according to the second positioning area suggestion frame may refer to the method of determining the first feature in the foregoing embodiment, and the method of determining the second positioning area suggestion frame of the positioning task on the sample image according to the second original area suggestion frame may refer to the method of determining the first positioning area suggestion frame in the foregoing embodiment. Other methods are likewise applicable to the present example and are not illustrated here one by one.
The training may be performed before the target detection method is applied; the target detection method may then, of course, be applied through the trained target detection neural network.
The objective loss function may be determined according to at least the fourth feature, the fifth feature, and the sample label, or may be determined according to the fourth feature, the fifth feature, the third feature, the sample label, and the like. Of course, the objective loss function may also be determined by other schemes including at least the fourth feature, the fifth feature and the sample annotation. The sample labels may be label boxes, class labels, and the like.
And adjusting the target detection neural network according to the target loss function until the target detection neural network converges to obtain the trained target detection neural network.
In this example, corresponding region suggestion frames are respectively determined for the positioning task and the classification task, so that the features corresponding to the positioning task and the classification task can be obtained, and the target detection neural network is trained based on these features to obtain the trained target detection neural network. The accuracy of the features obtained for the positioning task and the classification task can thus be improved, and the accuracy of the trained target detection neural network in performing target detection is improved.
In one possible embodiment, one possible method of determining the target loss function includes G10-G12, as follows:
g10, determining a first loss function of the positioning task according to the fourth characteristic and the sample label;
g11, determining a second loss function of the classification task according to the fifth feature and the sample label;
and G12, determining a target loss function according to the first loss function and the second loss function.
The first loss function may be determined according to the fourth feature and the sample label, for example by the method shown in the following formula:

L1 = L_reg( R( f_r(F_l, B̂_r) ), B ),

where L1 is the first loss function, L_reg(·,·) is the regression loss, f_r(·) is the feature extractor of the positioning task, R(·) is the operation unit of the positioning task, B is the regression supervision, B̂_r is the second positioning area suggestion frame of the positioning task, and F_l is the third feature. The regression supervision is determined by the sample label.
The second loss function may be determined according to the fifth feature and the sample label, for example by the method shown in the following formula:

L2 = L_cls( C( f_c(F_l, Ĉ_c) ), y ),

where L2 is the second loss function, L_cls(·,·) is the classification loss, Ĉ_c is the second classification region suggestion frame, f_c(·) is the feature extractor of the classification task, y is the classification label, F_l is the third feature, and C(·) is the operation unit of the classification task. The classification label is determined by the sample label.
The sum of the first loss function and the second loss function may be determined as the target loss function.
In this example, the target loss function is determined through the first loss function corresponding to the positioning task and the second loss function corresponding to the classifying task, and compared with the corresponding loss function determined by adopting the region suggestion frame shared by the positioning task and the classifying task and the features extracted by the feature extractor in the existing scheme, the accuracy of determining the target loss function can be improved, and the accuracy of the trained target detection neural network is improved.
In one possible embodiment, another possible method of determining the objective loss function includes G20-G23, as follows:
g20, determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third characteristics and the sample labels;
g21, determining a first loss function of the positioning task according to the fourth characteristic and the sample label;
g22, determining a second loss function of the classification task according to the fifth characteristic and the sample label;
and G23, determining a target loss function according to the first original loss function, the second original loss function, the first loss function and the second loss function.
According to the third feature and the sample label, the first original loss function and the second original loss function are determined, and the method for determining the first loss function and the second loss function in the foregoing embodiment may be referred to, which is not described herein.
The first original loss function, the second original loss function, the sum of the first loss function and the second loss function may be determined as the target loss function.
In one possible embodiment, another possible method of determining the objective loss function includes G30-G35, as follows:
g30, determining a first original loss function of the positioning task and a second original loss function of the classifying task according to the third characteristic;
g31, determining a first progressive constraint loss function of the positioning task according to the third characteristic and the fourth characteristic;
g32, determining a second progressive constraint loss function of the classification task according to the third characteristic and the fifth characteristic;
g33, determining a first loss function of the positioning task according to the fourth characteristic and the sample label;
g34, determining a second loss function of the classification task according to the fifth characteristic and the sample label;
and G35, determining an objective loss function according to the first original loss function, the second original loss function, the first gradual constraint loss function, the second gradual constraint loss function, the first loss function and the second loss function.
The steps G30, G33, G34 may refer to the implementation manners of the corresponding steps G20, G21, G22 in the foregoing embodiments, which are not described herein again.
The first original loss function, the second original loss function, the first progressively constrained loss function, the second progressively constrained loss function, the sum of the first loss function and the second loss function may be determined as the target loss function.
Of course, the sum of the first original loss function, the second original loss function, the first progressive constraint loss function, the second progressive constraint loss function, the sum of the first loss function and the second loss function and the loss function of the region detection model can be determined as the target loss function.
One possible method for determining the first progressive constraint loss function of the positioning task according to the third feature and the fourth feature may be:
g311, determining the first accuracy of the positioning task according to the third characteristic;
g312, determining a second accuracy of the positioning task according to the fourth characteristic;
g313, determining a first progressive constraint loss function according to the first accuracy and the second accuracy.
In determining the first accuracy and the second accuracy, the determination needs to be made through the sample label. The accuracy reflects the agreement between the corresponding feature and the sample label.
The first accuracy, the second accuracy, and a manually set hyperparameter may be combined and passed through an activation function to determine the first progressive constraint loss function. The activation function has the same effect as the ReLU activation function.
In one possible implementation, the first progressive constraint loss function may be determined by a method as shown in the following formula:
M_loc = | A_1 - A_2 + m_r |_+,

where M_loc is the first progressive constraint loss function, A_1 is the first accuracy, A_2 is the second accuracy, |·|_+ has the same effect as the ReLU activation function, and m_r is a manually set hyperparameter serving as an optimization margin.
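The margin form of the progressive constraint can be sketched in a few lines of Python; the score values and the margin below are made-up illustrations, and the same shape applies to the second progressive constraint loss function with confidences in place of accuracies.

```python
def progressive_constraint(shared_score, disentangled_score, margin):
    # |shared - disentangled + m|_+ : the |.|_+ operator acts like
    # ReLU, so the loss is zero once the disentangled branch beats
    # the shared branch by at least the margin m.
    return max(shared_score - disentangled_score + margin, 0.0)

# Disentangled branch already exceeds the shared branch by more than
# the margin -> the constraint is satisfied and contributes no loss.
print(progressive_constraint(0.70, 0.85, margin=0.10))  # 0.0
# Not yet good enough -> a positive loss pushes training further.
print(progressive_constraint(0.70, 0.72, margin=0.10))
```

The margin thus encodes the requirement that the task-specific branch should outperform the shared branch by at least the set hyperparameter.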
In one possible embodiment, a possible method for determining a second progressive constraint loss function of a classification task according to a fifth feature comprises the steps of:
g321, determining a first confidence coefficient of the classification task according to the third characteristic;
g322, determining a second confidence coefficient of the classification task according to the fifth characteristic;
g323, determining a second progressive constraint loss function according to the first confidence coefficient and the second confidence coefficient.
In determining the first confidence level and the second confidence level, the determination needs to be made through sample labeling.
The first confidence, the second confidence, and a manually set hyperparameter may be combined and passed through an activation function to determine the second progressive constraint loss function. The activation function has the same effect as the ReLU activation function.
In one possible implementation, the second progressive constraint loss function may be determined by a method as shown in the following formula:
M_cls = | H_1(y) - H_2(y) + m_c |_+,

where M_cls is the second progressive constraint loss function, H_1(y) is the first confidence of the prediction for the y-th class, H_2(y) is the second confidence of the prediction for the y-th class, obtained from the fifth feature extracted from the second classification region suggestion frame output by the generating function τ_c applied with the second offset ΔC, m_c is a hyperparameter, and |·|_+ has the same effect as the ReLU activation function.
Referring to fig. 2B, fig. 2B is a schematic application diagram of a target detection method according to an embodiment of the present application. As shown in fig. 2B, fig. 2B may show that the target detection method detects the target, or may show that the target detection neural network is trained, so as to obtain the trained target detection neural network.
First, the process of training the target detection neural network is described. The target image (a sample image; its sample label is not shown in the figure) is input into the target detection neural network, and the target image is analyzed and processed through the backbone and the RPN of the target detection neural network to obtain the second original region suggestion frame P on the target image. Then the second positioning region suggestion frame of the positioning task and the second classification region suggestion frame of the classification task are determined according to the second original region suggestion frame; part (b) of the figure shows this process of determining the region suggestion frames (shown as process 1 and process 2 in the figure).
For determining the second positioning region suggestion frame, the determination may be performed according to the first offset between the second positioning region suggestion frame and the second original region suggestion frame. Specifically, the second original region suggestion frame may be offset according to the corresponding first offset to obtain the second positioning region suggestion frame, where τ_r may be understood as a function or network that offsets the second original region suggestion frame, as shown in process 2 in the figure.
For determining the second classification area suggestion frame, the second offsets between the second classification area suggestion frame and the second original area suggestion frame may also be determined, where there may be M second offsets. Specifically, the second original region suggestion frame may be divided into k×k sub-regions, and the second offset corresponding to each sub-region may be determined respectively, where M = k×k. The division of the region suggestion frame may refer to the specific method described in the foregoing embodiment and is not described herein again. After the M second offsets are determined, each sub-region is offset according to its corresponding offset, so as to obtain the second classification area suggestion frame, as shown in process 2 in the figure.
After the second positioning area suggestion frame and the second classification area suggestion frame are obtained, the features of the second positioning area suggestion frame and the second classification area suggestion frame may be extracted respectively to obtain the fourth feature and the fifth feature. Specifically, the fourth feature may be obtained through the corresponding feature extractor f_r(·) and operation unit R(·), and the fifth feature may be obtained through the feature extractor f_c(·) and operation unit C(·), as shown in process 2. Meanwhile, the third feature may be obtained by performing feature extraction according to the second original region suggestion frame, specifically through the shared feature extractor f and the corresponding operation unit shown in process 1.
An objective loss function is determined based at least on the fourth feature, the fifth feature, and the sample label. The method for determining the target loss function may be, for example: the objective loss function may be determined according to at least the fourth feature, the fifth feature, and the sample label, or may be determined according to the fourth feature, the fifth feature, the third feature, the sample label, and the like. Of course, the objective loss function may also be determined by other schemes including at least the fourth feature, the fifth feature and the sample annotation. The sample labels may be label boxes, class labels, and the like. The specific method for determining the objective loss function may refer to the method shown in the foregoing embodiment, and will not be described herein.
The PC shown in part (c) of fig. 2B is the progressive constraint loss function. In the first part (1) of (c), D is the classification confidence obtained according to the second original region suggestion frame, and m_c is a hyperparameter set to achieve more accurate classification; in the second part (2) of (c), D is the regression accuracy obtained according to the second original region suggestion frame, and m_r is a hyperparameter set to achieve more accurate positioning. The method for determining the progressive constraint loss function may refer to the method shown in the foregoing embodiment and is not described herein again.
And after determining the target loss function, adjusting the target detection neural network to obtain the trained target detection neural network.
In fig. 2B, the w/ TSD portion of the test results shows the detection result of the trained target detection neural network obtained by training with the above method, and the w/o TSD portion shows the detection result of detecting the same target images with the existing scheme.
The following describes a process of performing object detection on the object detection neural network after training:
The target image is input into the trained target detection neural network, P is determined through the RPN and the corresponding backbone in the network (the process of determining P in process 1), and then the whole of process 2 is executed, which may specifically be: according to P, the first classification area suggestion frame corresponding to the classification task and the first positioning area suggestion frame corresponding to the positioning task are determined respectively; the corresponding first feature and second feature are obtained according to the corresponding feature extractors and operation units; and the target detection result is obtained through operation, as shown by w/ TSD in the figure. For the specific methods used in target detection, reference may be made to the methods shown in the foregoing embodiments, which are not repeated here.
Referring to fig. 3, fig. 3 is a flowchart of another object detection method according to an embodiment of the present application. As shown in fig. 3, the target detection method includes:
301. determining a first original region suggestion frame of the target image according to the region detection model;
302. determining a first offset according to the first original region suggestion frame;
303. determining a first positioning area suggestion frame according to the first original area suggestion frame and the first offset;
304. determining M second offsets according to the first original region suggestion frame, wherein M is a positive integer;
305. determining a first classification area suggestion frame according to the M second offset values;
306. determining a first feature according to the first positioning area suggestion frame, and determining a second feature according to the first classification area suggestion frame;
307. and obtaining a target detection result according to the first characteristic and the second characteristic.
In this example, the first positioning area suggestion frame is determined from the first original area suggestion frame and the first offset, so that a dedicated area suggestion frame is obtained for the positioning task; feature extraction can then be performed according to that area suggestion frame, which improves the accuracy of feature extraction. Likewise, M second offsets are determined according to the first original area suggestion frame, and the corresponding sub-regions among the plurality of sub-regions into which the first original area suggestion frame is divided are shifted by the M second offsets to obtain the first classification area suggestion frame, which improves the accuracy with which the first classification area suggestion frame is determined.
Referring to fig. 4, fig. 4 is a flowchart of another object detection method according to an embodiment of the present application. As shown in fig. 4, the target detection method includes:
401. determining a first original region suggestion frame of the target image according to the region detection model;
402. determining a first positioning area suggestion frame of the positioning task on the target image according to the first original area suggestion frame, and determining a first classification area suggestion frame of the classification task on the target image according to the first original area suggestion frame;
403. determining a first feature according to the first positioning area suggestion frame, and determining a second feature according to the first classification area suggestion frame;
404. obtaining a target detection result according to the first characteristic and the second characteristic;
405. the target detection method is realized by a target detection neural network, and the method further comprises the following steps: training the target detection neural network through the sample image and the sample label of the sample image to obtain the trained target detection neural network.
In this example, the target detection method is implemented by a target detection neural network. The network is trained on sample images, with features extracted as the first feature and the second feature determined by the above method, to obtain the trained target detection neural network; training in this way improves the accuracy of the trained target detection neural network.
In accordance with the foregoing embodiments, referring to fig. 5, fig. 5 is a schematic structural diagram of an object detection device provided in an embodiment of the present application, where the object detection device includes a processor, an input device, an output device, and a memory, and the processor, the input device, the output device, and the memory are connected to each other, where the memory is configured to store a computer program, the computer program includes program instructions, and the processor is configured to invoke the program instructions, and the program includes instructions for performing the following steps;
Determining a first original region suggestion frame of the target image according to the region detection model;
determining a first positioning area suggestion frame of the positioning task on the target image according to the first original area suggestion frame, and determining a first classification area suggestion frame of the classification task on the target image according to the first original area suggestion frame;
determining a first feature according to the first positioning area suggestion frame, and determining a second feature according to the first classification area suggestion frame;
and obtaining a target detection result according to the first characteristic and the second characteristic.
In one possible implementation, determining a first positioning region suggestion box of the positioning task on the target image according to the first original region suggestion box includes:
determining a first offset according to the first original region suggestion frame;
and determining a first positioning area suggestion frame according to the first original area suggestion frame and the first offset.
In this example, the first original region suggestion frame and the first offset are used to determine the first positioning region suggestion frame, so that the corresponding region suggestion frame can be determined for the positioning task, and further feature extraction can be performed according to the region suggestion frame, and accuracy in feature extraction can be improved.
In one possible implementation, determining the first offset according to the first original region suggestion box includes:
determining a third feature according to the first original region suggestion box;
inputting the third characteristic into a first operation network for operation to obtain a first operation result;
and determining a first offset according to the width value and the height value of the first original region suggestion frame and the first operation result.
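A minimal sketch of this step, assuming the first operation network outputs a raw (dx, dy) pair that is then scaled by the proposal's width and height; the exact scaling rule is an assumption, not taken from the patent:

```python
import numpy as np

def localization_offset(box, net_output):
    """Scale the first operation network's raw output by the proposal's
    width and height to obtain the first offset (scaling is assumed)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx, dy = net_output
    return np.array([dx * w, dy * h])

def shift_box(box, offset):
    """Translate the whole box by the first offset, yielding the first
    positioning area suggestion frame."""
    dx, dy = offset
    x1, y1, x2, y2 = box
    return np.array([x1 + dx, y1 + dy, x2 + dx, y2 + dy])

box = np.array([10.0, 20.0, 50.0, 60.0])   # toy proposal (x1, y1, x2, y2), w = h = 40
first_offset = localization_offset(box, net_output=(0.25, -0.5))
loc_box = shift_box(box, first_offset)
print(loc_box)  # whole box translated by (10, -20)
```

Note the whole box is moved rigidly, in contrast to the per-sub-region offsets used for the classification branch below.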
In one possible implementation, determining a first classification region suggestion box of the classification task on the target image according to the first original region suggestion box includes:
determining M second offsets according to the first original region suggestion frame, wherein M is a positive integer;
and determining a first classification area suggestion frame according to the M second offsets.
In this example, M second offsets are determined according to the first original region suggestion frame, and corresponding sub-regions in the plurality of sub-regions divided by the first original region suggestion frame are offset according to the M second offsets, so as to obtain a first classification region suggestion frame, so that accuracy in determining the first classification region suggestion frame can be improved.
In one possible implementation, determining M second offsets according to the first original region suggestion box includes:
dividing the first original region suggestion frame into k×k sub-regions;
inputting the third feature into a second operation network for operation to obtain a second operation result;
and determining the offset of each of the k×k sub-regions according to the width value and the height value of the first original region suggestion frame and the second operation result, so as to obtain M second offsets, wherein M is equal to k×k.
In this example, the first original region suggestion frame is divided into k×k sub-regions, and the second operation result is obtained according to the second operation network, so that a corresponding offset can be determined for each sub-region, and accuracy of determining the first classification region suggestion frame is improved.
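A sketch of the per-sub-region offsets, assuming the second operation network emits one raw (dx, dy) pair per sub-region that is scaled by the proposal size (k = 2 here for brevity; the names and scaling are illustrative assumptions):

```python
import numpy as np

def subregion_offsets(box, raw_output):
    """Scale each sub-region's raw (dx, dy) by the proposal's width and
    height, giving the M = k*k second offsets (scaling is assumed)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return np.asarray(raw_output) * np.array([w, h])

box = np.array([0.0, 0.0, 40.0, 80.0])   # toy proposal, w = 40, h = 80
raw = np.array([[0.25, 0.0],             # one row per sub-region, k*k = 4 rows
                [0.0, 0.25],
                [-0.25, 0.0],
                [0.0, -0.25]])
offsets = subregion_offsets(box, raw)
print(offsets.shape)  # (4, 2): M = k*k second offsets
```

Each of the k×k sub-regions is then shifted by its own offset to assemble the first classification area suggestion frame, similar in spirit to deformable RoI pooling.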
In one possible implementation, the first layer of the first operation network and the first layer of the second operation network are the same layer.
In this example, since the first operation network and the second operation network share the same first layer, the total number of layers required by the model can be reduced, thereby reducing the complexity of the model.
In one possible implementation, the target detection method is implemented by a target detection neural network, and the method further includes:
and adjusting the target detection neural network through the sample image and the sample label of the sample image to obtain the trained target detection neural network.
In one possible implementation, training the target detection neural network through the sample image and the sample label of the sample image to obtain a trained target detection neural network includes:
determining a second original region suggestion frame of the sample image according to the region detection model;
determining a second positioning area suggestion frame of the positioning task on the sample image according to the second original area suggestion frame, and determining a second classification area suggestion frame of the classification task on the sample image according to the second original area suggestion frame;
determining a fourth feature according to the second positioning area suggestion frame, and determining a fifth feature according to the second classification area suggestion frame; determining a target loss function according to at least the fourth feature, the fifth feature and the sample label;
training the target detection neural network according to the target loss function to obtain the trained target detection neural network.
In this example, corresponding region suggestion frames are determined for the positioning task and the classification task respectively, so that features corresponding to each task can be obtained, and the target detection neural network is trained based on these features to obtain the trained target detection neural network. This improves the accuracy with which the positioning task and the classification task obtain their features, and thus the accuracy of the trained target detection neural network in performing target detection.
In one possible implementation, determining the target loss function based at least on the fourth feature, the fifth feature, and the sample label includes:
determining a first loss function of the positioning task according to the fourth characteristic and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining a target loss function according to the first loss function and the second loss function.
In this example, the target loss function is determined from the first loss function corresponding to the positioning task and the second loss function corresponding to the classification task. Compared with the existing scheme, in which the two tasks share one region suggestion frame and the features extracted by one feature extractor, this improves the accuracy of the target loss function and hence the accuracy of the trained target detection neural network.
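The patent text here does not fix the concrete form of the two loss functions. As a hedged illustration only, a smooth-L1 term is a common choice for a positioning (first) loss and cross-entropy for a classification (second) loss:

```python
import math

def smooth_l1(pred, target, beta=1.0):
    """Common localization loss; using it as the first loss function is an
    assumption, not something the patent specifies."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def cross_entropy(p_true):
    """Common classification loss; the same caveat applies to the second
    loss function."""
    return -math.log(p_true)

first_loss = smooth_l1(0.5, 0.0)        # fourth feature's box regression vs label
second_loss = cross_entropy(0.8)        # fifth feature's confidence for the labeled class
target_loss = first_loss + second_loss  # unweighted sum is also an assumption
print(round(target_loss, 4))  # 0.3481
```

Any weighting between the two terms is a design choice the passage above leaves open.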
In one possible implementation, determining the target loss function according to at least the fourth feature and the fifth feature includes:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature and the sample label;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining the target loss function according to the first original loss function, the second original loss function, the first loss function and the second loss function.
In one possible implementation, determining the target loss function according to at least the fourth feature and the fifth feature includes:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature;
determining a first progressive constraint loss function of the positioning task according to the third feature and the fourth feature;
determining a second progressive constraint loss function of the classification task according to the third feature and the fifth feature;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining the target loss function according to the first original loss function, the second original loss function, the first progressive constraint loss function, the second progressive constraint loss function, the first loss function and the second loss function.
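In this variant the target loss aggregates six terms. A plain unweighted sum is sketched below; any weighting between the terms is an assumption the passage does not settle:

```python
def target_loss(l_orig_loc, l_orig_cls, l_pc_loc, l_pc_cls, l_loc, l_cls):
    """Combine the two original losses, the two progressive constraint
    losses, and the two task losses (unweighted sum assumed)."""
    return l_orig_loc + l_orig_cls + l_pc_loc + l_pc_cls + l_loc + l_cls

total = target_loss(0.5, 0.4, 0.1, 0.2, 0.3, 0.6)
print(round(total, 2))  # 2.1
```

The original losses keep the shared branch trained, while the progressive constraint terms push the disentangled branches to outperform it.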
In one possible implementation, determining the first progressive constraint loss function of the positioning task according to the fourth feature includes:
determining a first accuracy of the positioning task according to the third characteristic;
determining a second accuracy of the positioning task according to the fourth characteristic;
and determining a first progressive constraint loss function according to the first precision and the second precision.
In one possible implementation, determining a second progressive constraint loss function of the classification task according to the fifth feature includes:
determining a first confidence coefficient of the classification task according to the third characteristic;
determining a second confidence coefficient of the classification task according to the fifth characteristic;
and determining a second progressive constraint loss function according to the first confidence coefficient and the second confidence coefficient.
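Both progressive constraint terms can be read as a hinge that penalizes the network unless the disentangled branch improves on the original branch by at least a margin (the m_r / m_c hyper-parameters described earlier). The exact form is an assumption; this sketch only illustrates that reading:

```python
def progressive_constraint(original, disentangled, margin):
    """Hinge-style term: zero once the disentangled branch beats the
    original branch by at least `margin`, positive otherwise."""
    return max(0.0, original - disentangled + margin)

# positioning: first/second accuracy (e.g. IoU with the ground-truth box)
loss_loc = progressive_constraint(original=0.60, disentangled=0.70, margin=0.2)
# classification: first/second confidence for the labeled class
loss_cls = progressive_constraint(original=0.55, disentangled=0.80, margin=0.2)
print(round(loss_loc, 2), loss_cls)  # 0.1 0.0
```

In the first case the gain (0.10) is below the margin, so a residual penalty remains; in the second the gain (0.25) exceeds the margin and the term vanishes.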
The foregoing description of the embodiments of the present application has been presented primarily in terms of a method-side implementation. It will be appreciated that the object detection device, in order to achieve the above-described functions, comprises corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the present application may divide the functional units of the object detection device according to the above method example, for example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated in one processing unit. The integrated units may be implemented in hardware or in software functional units. It should be noted that, in the embodiment of the present application, the division of the units is schematic, which is merely a logic function division, and other division manners may be implemented in actual practice.
In accordance with the foregoing, referring to fig. 6, fig. 6 is a schematic structural diagram of an object detection device according to an embodiment of the present application. As shown in fig. 6, the object detection device includes:
a first determining unit 601, configured to determine a first original region suggestion frame of the target image according to the region detection model;
a second determining unit 602, configured to determine a first positioning area suggestion frame of the positioning task on the target image according to the first original area suggestion frame, and determine a first classification area suggestion frame of the classification task on the target image according to the first original area suggestion frame;
a third determining unit 603, configured to determine a first feature according to the first positioning area suggestion frame, and determine a second feature according to the first classification area suggestion frame;
And the detection unit 604 is configured to obtain a target detection result according to the first feature and the second feature.
In one possible implementation, in determining a first positioning region suggestion frame of the positioning task on the target image according to the first original region suggestion frame, the second determining unit 602 is configured to:
determining a first offset according to the first original region suggestion frame;
and determining a first positioning area suggestion frame according to the first original area suggestion frame and the first offset.
In one possible implementation, in determining the first offset according to the first original region suggestion frame, the second determining unit 602 is configured to:
determining a third feature according to the first original region suggestion box;
inputting the third characteristic into a first operation network for operation to obtain a first operation result;
and determining a first offset according to the width value and the height value of the first original region suggestion frame and the first operation result.
In one possible implementation manner, in determining a first classification region suggestion box of the classification task on the target image according to the first original region suggestion box, the second determining unit 602 is configured to:
determining M second offsets according to the first original region suggestion frame, wherein M is a positive integer;
And determining a first classification area suggestion frame according to the M second offsets.
In one possible implementation, in determining M second offsets according to the first original region suggestion frame, the second determining unit 602 is configured to:
dividing the first original region suggestion frame into k×k sub-regions;
inputting the third feature into a second operation network for operation to obtain a second operation result;
and determining the offset of each of the k×k sub-regions according to the width value and the height value of the first original region suggestion frame and the second operation result, so as to obtain M second offsets, wherein M is equal to k×k.
In one possible implementation, the first layer of the first operation network and the first layer of the second operation network are the same layer.
In one possible implementation, the object detection device is implemented by an object detection neural network, and the device is further configured to:
training the target detection neural network through the sample image and the sample label of the sample image to obtain the trained target detection neural network.
In one possible implementation, in training the target detection neural network by the sample image and the sample label of the sample image to obtain a trained target detection neural network, the apparatus is further configured to:
Determining a second original region suggestion frame of the sample image according to the region detection model;
determining a second positioning area suggestion frame of the positioning task on the sample image according to the second original area suggestion frame, and determining a second classification area suggestion frame of the classification task on the sample image according to the second original area suggestion frame;
determining a fourth feature according to the second positioning area suggestion frame, and determining a fifth feature according to the second classification area suggestion frame; determining a target loss function according to at least the fourth feature, the fifth feature and the sample label;
training the target detection neural network according to the target loss function to obtain the trained target detection neural network.
In one possible implementation, in determining the target loss function based at least on the fourth feature, the fifth feature, and the sample label, the apparatus is further configured to:
determining a first loss function of the positioning task according to the fourth characteristic and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining a target loss function according to the first loss function and the second loss function.
In one possible implementation, in determining the target loss function according to at least the fourth feature and the fifth feature, the apparatus is further configured to:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature and the sample label;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining the target loss function according to the first original loss function, the second original loss function, the first loss function and the second loss function.
In one possible implementation, in determining the target loss function according to at least the fourth feature and the fifth feature, the apparatus is further configured to:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature;
determining a first progressive constraint loss function of the positioning task according to the third feature and the fourth feature;
determining a second progressive constraint loss function of the classification task according to the third feature and the fifth feature;
determining a first loss function of the positioning task according to the fourth feature and the sample label;
determining a second loss function of the classification task according to the fifth feature and the sample label;
and determining the target loss function according to the first original loss function, the second original loss function, the first progressive constraint loss function, the second progressive constraint loss function, the first loss function and the second loss function.
In one possible implementation, in determining the first progressive constraint loss function of the positioning task according to the fourth feature, the apparatus is further configured to:
determining a first accuracy of the positioning task according to the third characteristic;
determining a second accuracy of the positioning task according to the fourth characteristic;
and determining a first progressive constraint loss function according to the first precision and the second precision.
In one possible implementation, in determining the second progressive constraint loss function of the classification task according to the fifth feature, the apparatus is further configured to:
determining a first confidence coefficient of the classification task according to the third characteristic;
determining a second confidence coefficient of the classification task according to the fifth characteristic;
and determining a second progressive constraint loss function according to the first confidence coefficient and the second confidence coefficient.
The present application also provides a computer storage medium storing a computer program for electronic data exchange, the computer program causing a computer to execute part or all of the steps of any one of the object detection methods described in the above method embodiments.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program that causes a computer to perform some or all of the steps of any one of the object detection methods described in the method embodiments above.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, such as the division of the units, merely a logical function division, and there may be additional manners of dividing the actual implementation, such as multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, or may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.
The integrated units, if implemented in the form of software program modules, may be stored in a computer-readable memory for sale or use as a stand-alone product. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application. And the aforementioned memory includes: a U-disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in the various methods of the above embodiments may be implemented by a program that instructs associated hardware, and the program may be stored in a computer readable memory, which may include: flash disk, read-only memory, random access memory, magnetic or optical disk, etc.
The foregoing has described the embodiments of the present application in detail. Specific examples are used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those skilled in the art may make modifications to the specific implementations and the application scope according to the idea of the present application; in view of the above, the content of this description should not be construed as limiting the present application.

Claims (12)

1. A method of target detection, the method comprising:
determining a first original region suggestion frame of the target image according to the region detection model;
determining a first positioning area suggestion frame of a positioning task on the target image according to the first original area suggestion frame, and determining a first classification area suggestion frame of a classification task on the target image according to the first original area suggestion frame;
Determining a first feature according to the first positioning area suggestion frame, and determining a second feature according to the first classification area suggestion frame;
obtaining a target detection result according to the first characteristic and the second characteristic;
the determining a first positioning area suggestion frame of the positioning task on the target image according to the first original area suggestion frame comprises the following steps:
determining a first offset according to the first original region suggestion frame;
determining the first positioning area suggestion frame according to the first original area suggestion frame and the first offset;
the determining a first offset according to the first original area suggestion box includes:
determining a third feature according to the first original region suggestion box;
inputting the third characteristic into a first operation network for operation to obtain a first operation result;
determining the first offset according to the width value and the height value of the first original region suggestion frame and the first operation result;
the determining a first classification area suggestion frame of the classification task on the target image according to the first original area suggestion frame comprises the following steps:
according to the first original region suggestion frame, M second offset values are determined, wherein M is a positive integer;
Determining the first classification area suggestion frame according to the M second offset values;
determining M second offsets according to the first original region suggestion frame, including:
dividing the first original region suggestion frame into k×k sub-regions;
inputting the third feature into a second operation network for operation to obtain a second operation result;
and determining the offset of each of the k×k sub-regions according to the width value and the height value of the first original region suggestion frame and the second operation result, so as to obtain the M second offsets, wherein M is equal to k×k.
2. The method of claim 1, wherein the first layer of the first operation network and the first layer of the second operation network are the same layer.
3. The method according to claim 1 or 2, wherein the target detection method is implemented by a target detection neural network, the method further comprising:
training the target detection neural network through a sample image and a sample label of the sample image to obtain the trained target detection neural network.
4. The method of claim 3, wherein the training the target detection neural network through the sample image and the sample annotation of the sample image to obtain the trained target detection neural network comprises:
determining a second original region suggestion frame of the sample image according to the region detection model;
determining a second positioning region suggestion frame of the positioning task on the sample image according to the second original region suggestion frame, and determining a second classification region suggestion frame of the classification task on the sample image according to the second original region suggestion frame;
determining a fourth feature according to the second positioning region suggestion frame, and determining a fifth feature according to the second classification region suggestion frame; and determining a target loss function at least according to the fourth feature, the fifth feature and the sample annotation;
and adjusting the target detection neural network according to the target loss function to obtain the trained target detection neural network.
5. The method of claim 4, wherein the determining a target loss function at least according to the fourth feature, the fifth feature and the sample annotation comprises:
determining a first loss function of the positioning task according to the fourth feature and the sample annotation;
determining a second loss function of the classification task according to the fifth feature and the sample annotation;
and determining the target loss function according to the first loss function and the second loss function.
6. The method of claim 5, wherein the determining a target loss function at least according to the fourth feature and the fifth feature comprises:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature and the sample annotation;
determining a first loss function of the positioning task according to the fourth feature and the sample annotation;
determining a second loss function of the classification task according to the fifth feature and the sample annotation;
and determining the target loss function according to the first original loss function, the second original loss function, the first loss function and the second loss function.
7. The method of claim 4, wherein the determining a target loss function at least according to the fourth feature and the fifth feature comprises:
determining a first original loss function of the positioning task and a second original loss function of the classification task according to the third feature;
determining a first progressive constraint loss function of the positioning task according to the third feature and the fourth feature;
determining a second progressive constraint loss function of the classification task according to the third feature and the fifth feature;
determining a first loss function of the positioning task according to the fourth feature and the sample annotation;
determining a second loss function of the classification task according to the fifth feature and the sample annotation;
and determining the target loss function according to the first original loss function, the second original loss function, the first progressive constraint loss function, the second progressive constraint loss function, the first loss function and the second loss function.
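The six terms of claim 7 can be combined, for example, as a plain or weighted sum; the claim only states that the target loss function is determined from these terms, so the summation form and the weight `w_pc` below are assumptions.

```python
def target_loss(orig_loc, orig_cls, pc_loc, pc_cls, loc, cls, w_pc=1.0):
    """Hypothetical combination of the six loss terms of claim 7.

    orig_loc, orig_cls -- original loss functions computed from the third feature
    pc_loc, pc_cls     -- progressive constraint loss functions
    loc, cls           -- loss functions from the fourth and fifth features
    w_pc               -- assumed weight on the progressive constraint terms
    """
    # A simple weighted sum; the claims do not fix the combination rule.
    return orig_loc + orig_cls + w_pc * (pc_loc + pc_cls) + loc + cls
```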
8. The method of claim 7, wherein the determining a first progressive constraint loss function of the positioning task according to the third feature and the fourth feature comprises:
determining a first accuracy of the positioning task according to the third feature;
determining a second accuracy of the positioning task according to the fourth feature;
and determining the first progressive constraint loss function according to the first accuracy and the second accuracy.
9. The method of claim 7 or 8, wherein the determining a second progressive constraint loss function of the classification task according to the third feature and the fifth feature comprises:
determining a first confidence of the classification task according to the third feature;
determining a second confidence of the classification task according to the fifth feature;
and determining the second progressive constraint loss function according to the first confidence and the second confidence.
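One natural reading of claims 8 and 9 — an assumption, since the claims do not fix the functional form — is a hinge loss that penalizes the refined branch whenever its quality measure (accuracy for the positioning task, confidence for the classification task) fails to exceed that of the original proposal:

```python
def progressive_constraint(first, second, margin=0.0):
    """Hypothetical hinge form of the progressive constraint loss.

    first  -- accuracy/confidence computed from the third feature (original branch)
    second -- accuracy/confidence computed from the fourth/fifth feature (refined branch)
    margin -- assumed slack; not specified in the claims
    """
    # Zero once the refined branch beats the original by at least `margin`,
    # so the constraint is only active while the refinement lags behind.
    return max(0.0, first - second + margin)
```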
10. A target detection device, the device comprising:
a first determining unit, configured to determine a first original region suggestion frame of a target image according to a region detection model;
a second determining unit, configured to determine a first positioning region suggestion frame of a positioning task on the target image according to the first original region suggestion frame, and determine a first classification region suggestion frame of a classification task on the target image according to the first original region suggestion frame;
a third determining unit, configured to determine a first feature according to the first positioning region suggestion frame, and determine a second feature according to the first classification region suggestion frame;
a detection unit, configured to obtain a target detection result according to the first feature and the second feature;
in determining the first positioning region suggestion frame of the positioning task on the target image according to the first original region suggestion frame, the second determining unit is configured to:
determine a first offset according to the first original region suggestion frame;
determine the first positioning region suggestion frame according to the first original region suggestion frame and the first offset;
in determining the first offset according to the first original region suggestion frame, the second determining unit is configured to:
determine a third feature according to the first original region suggestion frame;
input the third feature into a first operation network for operation to obtain a first operation result;
determine the first offset according to the width value, the height value and the first operation result of the first original region suggestion frame;
in determining the first classification region suggestion frame of the classification task on the target image according to the first original region suggestion frame, the second determining unit is configured to:
determine M second offset values according to the first original region suggestion frame, wherein M is a positive integer;
determine the first classification region suggestion frame according to the M second offset values;
in determining the M second offset values according to the first original region suggestion frame, the second determining unit is configured to:
divide the first original region suggestion frame into k sub-regions;
input the third feature into a second operation network for operation to obtain a second operation result;
and determine the offset of each of the k sub-regions according to the width value, the height value and the second operation result of the first original region suggestion frame, so as to obtain the M second offset values, wherein M is equal to k.
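The first-offset refinement performed by the second determining unit can be sketched as a whole-frame translation. The parameterization below is an assumption (a standard box-delta form): the first operation result is taken to be a normalized (dx, dy) pair, scaled by the frame's width and height; `gamma` and the function name are hypothetical.

```python
def apply_first_offset(box, raw, gamma=0.1):
    """Hypothetical sketch: shift the first original region suggestion
    frame by an offset derived from its width value, height value and
    the first operation result `raw` (assumed to be normalized (dx, dy)).
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx, dy = gamma * w * raw[0], gamma * h * raw[1]
    # Translate the whole frame to obtain the first positioning region
    # suggestion frame.
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```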
11. A target detection device comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the method of any of claims 1-9.
12. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1-9.
CN202010167999.4A 2020-03-11 2020-03-11 Target detection method and related device Active CN111414821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010167999.4A CN111414821B (en) 2020-03-11 2020-03-11 Target detection method and related device

Publications (2)

Publication Number Publication Date
CN111414821A CN111414821A (en) 2020-07-14
CN111414821B true CN111414821B (en) 2023-12-19

Family

ID=71491041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010167999.4A Active CN111414821B (en) 2020-03-11 2020-03-11 Target detection method and related device

Country Status (1)

Country Link
CN (1) CN111414821B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287947B (en) * 2020-09-27 2023-10-13 深圳大学 Regional suggestion frame detection method, terminal and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875903A (en) * 2018-01-02 2018-11-23 北京迈格威科技有限公司 Method, apparatus, system and the computer storage medium of image detection
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN109409517A (en) * 2018-09-30 2019-03-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN110516605A (en) * 2019-08-28 2019-11-29 北京观微科技有限公司 Any direction Ship Target Detection method based on cascade neural network
CN110826558A (en) * 2019-10-28 2020-02-21 桂林电子科技大学 Image classification method, computer device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN108875903A (en) * 2018-01-02 2018-11-23 北京迈格威科技有限公司 Method, apparatus, system and the computer storage medium of image detection
CN109409517A (en) * 2018-09-30 2019-03-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN110516605A (en) * 2019-08-28 2019-11-29 北京观微科技有限公司 Any direction Ship Target Detection method based on cascade neural network
CN110826558A (en) * 2019-10-28 2020-02-21 桂林电子科技大学 Image classification method, computer device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TFT-LCD circuit defect detection method based on regional neural network; He Junjie; Xiao Ke; Liu Chang; Chen Songyan; Computer and Modernization (07); full text *

Also Published As

Publication number Publication date
CN111414821A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN111611643B (en) Household vectorization data acquisition method and device, electronic equipment and storage medium
CN109658433B (en) Image background modeling and foreground extracting method and device and electronic equipment
WO2019015785A1 (en) Method and system for training a neural network to be used for semantic instance segmentation
CN111738045B (en) Image detection method and device, electronic equipment and storage medium
KR102399025B1 (en) Improved data comparison method
CN106558072A (en) A kind of method based on SIFT feature registration on remote sensing images is improved
CN103729643A (en) Recognition and pose determination of 3d objects in multimodal scenes
CN109598234A (en) Critical point detection method and apparatus
CN110942515A (en) Point cloud-based target object three-dimensional computer modeling method and target identification method
US9129152B2 (en) Exemplar-based feature weighting
CN107066961B (en) Fingerprint method for registering and device
CN111091101B (en) High-precision pedestrian detection method, system and device based on one-step method
CN106340010B (en) A kind of angular-point detection method based on second order profile difference
CN109410246B (en) Visual tracking method and device based on correlation filtering
CN111414821B (en) Target detection method and related device
CN111127558B (en) Method and device for determining assembly detection angle, electronic equipment and storage medium
KR102166117B1 (en) Semantic matchaing apparatus and method
CN113139549B (en) Parameter self-adaptive panoramic segmentation method based on multitask learning
CN105427341A (en) Multi-variation level set-based multi-target detection method for complicated background video images
CN104200460A (en) Image registration method based on images characteristics and mutual information
CN116503282A (en) Manifold-based excavator construction environment site point cloud denoising method and system
CN115272856B (en) Ship target fine-grained identification method and equipment
CN116310837A (en) SAR ship target rotation detection method and system
CN116503733A (en) Remote sensing image target detection method, device and storage medium
JP6127958B2 (en) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant