CN112613570B - Image detection method, image detection device, equipment and storage medium - Google Patents

Image detection method, image detection device, equipment and storage medium

Info

Publication number
CN112613570B
CN112613570B CN202011601355.8A CN202011601355A CN 112613570 B CN 202011601355 A
Authority
CN
China
Prior art keywords
image
detection
confidence coefficient
frame
width
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011601355.8A
Other languages
Chinese (zh)
Other versions
CN112613570A (en)
Inventor
黄德威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202011601355.8A priority Critical patent/CN112613570B/en
Publication of CN112613570A publication Critical patent/CN112613570A/en
Application granted granted Critical
Publication of CN112613570B publication Critical patent/CN112613570B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of image processing and in particular provides an image detection method, an image detection apparatus, a computer device and a readable storage medium. The image detection method comprises the following steps: stitching a scaled image and a fill image to obtain a stitched image; inputting the stitched image as an input image into a preset target detection model to obtain candidate frames of a target object, the confidence of each candidate frame position, and the reliability corresponding to that confidence; and determining a final detection frame of the target object from the candidate frames according to the confidence of the candidate frame positions and the reliability corresponding to the confidence. The invention improves the degree to which the target detection model can recognise the stitched image on the one hand, and improves the accuracy of the final detection frame on the other, thereby reducing false detections.

Description

Image detection method, image detection device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image detection method, an image detection apparatus, a computer device, and a readable storage medium.
Background
In the prior art, the neural network model is generally trained as follows: one branch of the network outputs a confidence of the detection frame position, and the other branch outputs a confidence of the detection frame class. In one application scenario, suppose the confidence of the detection frame class is A, the confidence of the detection frame position is B, and the final confidence is the product A·B. This easily causes false detection: if the class confidence A is small while the position confidence B is large, the product A·B may no longer reflect, relative to the confidence threshold actually in use, how trustworthy the frame really is, so the final detection frame can end up on a non-target area; for face detection, for example, the background may be taken as a face, and a false detection occurs.
Disclosure of Invention
The invention provides an image detection method, an image detection apparatus, a computer device and a readable storage medium, which address the false-detection problem that arises when a conventional neural network model is used to detect an image.
The first aspect of the present invention provides an image detection method, comprising:
obtaining a scaled image and a fill image from the detected image;
stitching the scaled image and the fill image to obtain a stitched image;
inputting the stitched image as an input image into a preset target detection model to obtain candidate frames of a target object, the confidence of each candidate frame position, and the reliability corresponding to the confidence; and
determining a final detection frame of the target object from the candidate frames according to the confidence of the candidate frame positions and the reliability corresponding to the confidence.
Optionally, the obtaining a scaled image and a fill image from the detected image includes:
determining whether the aspect ratio of the detected image matches the aspect ratio of the input image of the target detection model;
if the aspect ratio of the detected image does not match the aspect ratio of the input image, scaling the detected image proportionally to obtain the scaled image, wherein the scaling factor of the proportional scaling is determined by the aspect ratio of the detected image and the aspect ratio of the input image;
copying a preset area in the scaled image to obtain a copied image; and
determining the length and width of a fill image, and enlarging the copied image to the length and width of the fill image to obtain the fill image.
Optionally, the scaling the detected image proportionally to obtain the scaled image includes:
dividing the length of the detected image by the length of the input image to obtain the scaling factor; and
scaling the detected image proportionally according to the scaling factor to obtain the scaled image.
Optionally, the determining the length and width of the fill image includes:
taking the length of the input image as the length of the fill image; and
subtracting the width of the scaled image from the width of the input image to determine the width of the fill image.
Optionally, the stitching the scaled image and the fill image to obtain a stitched image includes:
obtaining a weight ratio between the preset area of the scaled image and the fill image; and
performing weighted fusion on the scaled image and the fill image according to the weight ratio to obtain the stitched image.
Optionally, the obtaining the weight ratio between the preset area of the scaled image and the fill image includes:
determining the ratio of the width of the copied image to the width of the fill image; and
taking the ratio of the width of the copied image to the width of the fill image as the weight ratio.
Optionally, the target detection model is obtained by training with labeled training images, and the training includes:
acquiring a training image set and a verification image set, wherein the training image set comprises the labeled training images;
training a preset neural network model with the labeled training images;
inputting the verification images of the verification image set into the preset neural network model to obtain, for each detection target, the confidence of the detection frame position output by the preset neural network model and the reliability of the confidence of the detection frame position;
obtaining a loss function value from the confidence of the detection frame position and the reliability of the confidence of the detection frame position; and
adjusting model parameters of the preset neural network model according to the loss function value and continuing training until the preset neural network model meets a convergence condition, so as to obtain the target detection model.
Optionally, the determining a final detection frame of the target object from the candidate frames according to the confidence of the candidate frame positions and the reliability corresponding to the confidence includes:
determining whether there is, among the candidate frames, a target candidate frame whose position confidence is greater than a first preset threshold and whose corresponding reliability is greater than a second preset threshold; and
if there is a target candidate frame whose position confidence is greater than the first preset threshold and whose corresponding reliability is greater than the second preset threshold, taking the target candidate frame as the final detection frame of the target object.
A second aspect of the present invention provides an image detection apparatus, comprising:
an acquisition module, configured to obtain a scaled image and a fill image from a detected image;
a stitching module, configured to stitch the scaled image and the fill image to obtain a stitched image;
an input module, configured to input the stitched image as an input image into a preset target detection model to obtain candidate frames of a target object, the confidence of each candidate frame position, and the reliability corresponding to the confidence; and
a detection module, configured to determine a final detection frame of the target object from the candidate frames according to the confidence of the candidate frame positions and the reliability corresponding to the confidence.
A third aspect of the invention provides a computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the image detection method as in the first aspect of the invention when executing the computer program.
A fourth aspect of the present invention provides a computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the image detection method according to the first aspect of the present invention.
On the one hand, the invention does not fill the scaled image directly with fixed pixel values, so the part of the stitched image that has to be filled carries features of the detected image; unlike the conventional scheme, no computing resources are wasted analysing directly filled, useless pixels, the fill image provides effective image features, and the degree to which the target detection model can recognise the stitched image is improved. On the other hand, the target detection model is trained with the confidence of the detection frame position of the annotation frame of the detection target and the reliability of that confidence, so that after the stitched image is input into the target detection model, the model can determine the final detection frame of the target object from the confidence of the candidate frame positions and the reliability of that confidence, which improves the accuracy of the final detection frame and reduces false detections.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic view of an application environment of an image detection method according to an embodiment of the invention;
FIG. 2 is a flow chart of steps S10-S40 of the image detection method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a detected image according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of stitching images according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of filling a detected image with fixed pixel values in the prior art;
FIG. 6 is another schematic diagram of stitching images in accordance with an embodiment of the present invention;
FIG. 7 is a schematic diagram of a model training scheme in the prior art;
FIG. 8 is a schematic diagram of a model training mode according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an image detection apparatus according to an embodiment of the invention;
FIG. 10 is a schematic diagram of a computer device according to an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
Example 1
The image detection method provided by the embodiment of the invention can be applied to the application environment shown in fig. 1. Specifically, the image detection method is applied to an image detection system, and the image detection system may comprise a client, a server and an image acquisition device, as shown in fig. 1.
The client, the server and the image acquisition device communicate over a network and together implement image detection. Specifically, the client refers to a program that corresponds to the server and provides local services to the user; it may be installed on various personal computers, notebook computers, smartphones, tablet computers and portable wearable devices, and is used, for example, to display the image detection results. The image acquisition device acquires images or video in real time and transmits them to the server, so that the server can adjust the detected image according to the aspect ratio of the received detected image. The image acquisition device may include, but is not limited to, a camera, a pan-tilt unit, a decoder, a video distributor and the like. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers, and is likewise not limited here.
In one application scenario, real-time video or images may be captured by the image acquisition device, and the captured video or images may include people, vehicles or other types of objects, without limitation. Specifically, image acquisition devices may be deployed in specific areas, including public places such as, but not limited to, schools, museums, intersections, pedestrian streets, office buildings, garages, airports, hospitals, subway stations, bus stations, supermarkets, hotels or entertainment venues; the real-time video or images captured by the devices at different places can then be taken as the detected images.
In one embodiment, as shown in fig. 2, the image detection method includes steps S10-S40:
S10: a scaled image and a fill image are acquired from the detected image.
Here, the image acquisition device obtains the detected image through real-time image or video capture. Based on the acquired detected image, a scaled image and a fill image can then be obtained from it.
In one embodiment, acquiring the scaled image and the fill image from the detected image may specifically include:
S101: determining whether the aspect ratio of the detected image matches the aspect ratio of the input image of the target detection model.
After the detected image is acquired, it may be determined whether its aspect ratio matches the aspect ratio of the input image of the target detection model.
It will be appreciated that the aspect ratio of the images captured by the image acquisition device is generally 16:9, whereas the trained target detection model requires an input image with an aspect ratio of 1:1; for example, the current target detection model requires an input image of 416×416, i.e. an aspect ratio of 1:1. The acquired detected image may therefore not be suitable for training or prediction if fed directly into the target detection model, so it must first be confirmed whether the aspect ratio of the detected image matches the aspect ratio of the model's input image; if it does not, the detected image acquired in real time needs to be pre-processed. Illustratively, as shown in fig. 3, the length and width of the current detected image may be 1920×1080, so its aspect ratio does not match that of the input image of the target detection model.
S102: if the aspect ratio of the detected image does not match the aspect ratio of the input image, scaling the detected image proportionally to obtain a scaled image; wherein the scaling factor of the proportional scaling is determined by the aspect ratio of the detected image and the aspect ratio of the input image.
In this step, based on step S101, if the aspect ratio of the detected image does not match the aspect ratio of the input image of the target detection model, the detected image needs to be pre-processed so that it meets the image requirements of the target detection model. Specifically, when the length and width of the detected image are larger than those of the input image of the target detection model, the detected image may be scaled down proportionally to obtain the scaled image.
For example, the input image of the target detection model is 416×416 while the detected image is 1920×1080; the detected image therefore does not match the aspect ratio of the input image, and its length and width are larger than those of the input image, so it may be scaled proportionally to obtain the scaled image corresponding to the letter A shown in fig. 4.
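A minimal sketch of the aspect-ratio check of S101 follows; the (height, width) tuple convention and the function name are assumptions for illustration only.

```python
def aspect_ratio_matches(detected_hw, input_hw=(416, 416)):
    """S101: compare aspect ratios by cross-multiplying, e.g. a 1080 x 1920
    detected image (16:9) does not match a 416 x 416 input (1:1)."""
    dh, dw = detected_hw
    ih, iw = input_hw
    return dw * ih == dh * iw
```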
Wherein the scaling factor of the proportional scaling is determined by the aspect ratio of the detected image and the aspect ratio of the input image of the target detection model. In one embodiment, the scaling factor is obtained as follows:
S1021: dividing the length of the detected image by the input image length of the target detection model to obtain the scaling factor.
Illustratively, the length of the detected image is 1920 and the input image length of the target detection model is 416, so the scaling factor equals 1920/416 ≈ 4.6.
S1022: scaling the detected image proportionally according to the scaling factor to obtain the scaled image.
Here 1920/416 is taken as the scaling factor, and both the length and the width of the detected image are scaled by it.
After the scaling factor of the proportional scaling is determined, the detected image can be scaled proportionally to obtain the scaled image; specifically, the length and width of the detected image (1920×1080) are divided by the scaling factor, and the resulting scaled image is approximately 416×234.
S103: copying a preset area in the scaled image to obtain a copied image.
It can be understood that, for images captured in real time by the image acquisition device, the upper part of the image is usually a distant-view area in which targets are smaller, so an area near the top of the image is typically selected for copying. In one embodiment, the preset area may be the area occupied by the top third of the scaled image, the area occupied by the top half of the scaled image, and so on.
In practical applications, the preset area may be chosen according to the actual scene and is not limited here.
Illustratively, the area occupied by the top third of the 416×234 scaled image may be copied to obtain a copied image whose length and width are 416×78. Alternatively, the area occupied by the top half of the scaled image may be copied to obtain the copied image; this is not described again here to avoid redundancy.
S104: determining the length and width of the fill image, and enlarging the copied image to the length and width of the fill image to obtain the fill image.
In one embodiment, the length and width of the fill image may be determined from the input image size of the target detection model and the size of the scaled image, through the following steps:
S1041: taking the input image length of the target detection model as the length of the fill image.
S1042: subtracting the width of the scaled image from the input image width of the target detection model to determine the width of the fill image.
In this step, since the short side of the scaled image is to be filled, the input image length of the target detection model can be taken as the length of the fill image, and the width of the scaled image is subtracted from the input image width of the target detection model, i.e. 416 - 234 = 182, so the width of the fill image is determined to be 182.
It will be appreciated that once the length and width of the fill image are determined, the aspect ratio of the fill image is determined as well. Based on steps S1041 and S1042, the fill image is determined to be 416×182.
After the length and width of the fill image are determined, the copied image may further be enlarged to that size to obtain the fill image. Specifically, with a 416×416 model input and a 416×234 scaled image, the fill image is determined to be 416×182; the 416×78 copied image is then enlarged to the 416×182 fill image corresponding to the letter B shown in fig. 4.
In this step, in order to fill the scaled image, the length and width of the fill image are determined in advance from the input image length and width of the target detection model and the length and width of the scaled image, and the copied image is then enlarged to the length and width of the fill image, so that the fill image is obtained.
S20: stitching the scaled image and the fill image to obtain a stitched image.
Based on the scaled image obtained by proportional scaling in step S1022 and the fill image obtained in step S104, the scaled image and the fill image may be stitched to obtain the stitched image.
For example, the scaled image is 416×234 and the fill image is 416×182, so the stitched image is 416×416, i.e. the stitched image now meets the size requirement of the input image of the target detection model.
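As a rough sketch of the preprocessing described in steps S1021-S20, assuming OpenCV and numpy are available, images are arrays in (height, width) order, and the preset area is the top third; the function names are illustrative only, not the patent's code.

```python
import cv2
import numpy as np

def scale_to_model_length(detected_img, input_len=416):
    """S1021/S1022: divide the detected-image length by the model input length
    to get the scaling factor, then scale both sides of the image by it."""
    h, w = detected_img.shape[:2]                      # e.g. 1080, 1920
    scale = w / input_len                              # 1920 / 416 ≈ 4.6
    return cv2.resize(detected_img, (input_len, round(h / scale)))  # ≈ 416 x 234

def build_fill_image(scaled_img, input_len=416, top_fraction=1/3):
    """S103/S104: copy the preset (top) area of the scaled image and enlarge
    it to the fill size, input_len x (input_len - scaled height)."""
    h, w = scaled_img.shape[:2]                        # e.g. 234, 416
    copied = scaled_img[:round(h * top_fraction), :]   # 416 x 78 copied image
    return cv2.resize(copied, (w, input_len - h))      # enlarged to 416 x 182

def stitch(scaled_img, fill_img):
    """S20: append the fill image below the scaled image to obtain the
    416 x 416 stitched input, as laid out in Figs. 4 and 6."""
    return np.concatenate([scaled_img, fill_img], axis=0)
```

On a 1920×1080 input these steps reproduce the 416×234, 416×78, 416×182 and 416×416 sizes used in the example above.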
For better illustration, a complete example follows. In one application scenario, the length and width of the current detected image are 1920×1080 and the input image of the target detection model is 416×416. The detected image is scaled proportionally so that its length becomes the input image length, turning the 1920×1080 detected image into a 416×234 scaled image; the length and width of the fill then need to be determined. With a 416×416 model input and a 416×234 scaled image, the fill image is determined to be 416×182. A preset area of the scaled image (for example, the area occupied by its top third) is copied to obtain a 416×78 copied image, which is enlarged from 416×78 to 416×182 to obtain the fill image. The scaled image and the fill image can then be stitched, yielding a stitched image that matches the aspect ratio of the input image of the target detection model.
In the prior art, in order to convert a detected image that does not match the aspect ratio of the input image of the target detection model, the usual approach is to pad the shorter side of the detected image with fixed pixel values until the short side equals the long side. As shown in fig. 5, the detected image is 1920×1080 and the input image of the target detection model is 416×416. In the conventional manner, the 1920×1080 detected image is scaled proportionally to a 416×234 scaled image, and the remaining empty part is then filled with the pixel value 0 (the area corresponding to the letter C1 in fig. 5 represents this all-zero region), so that the scaled image is padded to 416×416 and the 1920×1080 detected image is converted into a 416×416 input image. With this prior-art approach, the inventors found that the simply padded pixels not only fail to provide effective information but also add to the computation of the network.
In order to solve this technical problem, the otherwise wasted computation on that part of the image is put to use while the target detection performance is improved. In the embodiment of the present invention, as shown in fig. 6, the 1920×1080 detected image is first scaled to 416×234; the top third of the image is then copied (see the area corresponding to the letter C2 in fig. 6) to obtain a 416×78 copied image, which is enlarged to a 416×182 fill image (see the area corresponding to the letter C3 in fig. 6). When the scaled image and the fill image are stitched, the 1920×1080 detected image is thus converted into an input image that matches the 416×416 target detection model. It will be appreciated that, with the invention, the filled part of the stitched image not only carries the image features of the copied image, but the copied distant-view region is also enlarged, which is equivalent to increasing its resolution.
With the image detection method provided by the embodiment of the invention, through the above steps, the detected image is scaled proportionally after it is determined that its aspect ratio does not match that of the input image of the target detection model, the preset area in the resulting scaled image is copied, and the copied image is enlarged to the length and width of the fill image. The scaled image is thus padded without directly using fixed pixels, so the part of the stitched image that has to be filled carries features of the detected image; the fill image therefore provides effective image features, and the degree to which the target detection model can recognise the stitched image is improved when the stitched image is input into the model.
In the above embodiment, as shown in figs. 4 and 6, the stitched image contains a scaled image and a fill image, so feature extraction actually covers two parts; the features of the two parts have a certain similarity and correspond to image features at different resolutions. To make better use of both sets of features and let them complement each other, the two parts of features may be fused with weights. Accordingly, in one embodiment, step S20, i.e. stitching the scaled image and the fill image to obtain the stitched image, includes:
S201: obtaining the weight ratio between the preset area of the scaled image and the fill image.
In one embodiment, the weight ratio may be obtained as follows: the ratio of the width of the copied image to the width of the fill image is determined, and this ratio is taken as the weight ratio.
In connection with the above embodiment, since the width of the copied image is 78 and the width of the fill image is 182, the ratio of the width of the copied image to the width of the fill image is 78:182 ≈ 1:2.3. After this ratio is determined, 1:2.3 can be taken as the weight ratio.
S202: performing weighted fusion on the scaled image and the fill image according to the weight ratio to obtain the stitched image.
The weighted fusion may be linear or nonlinear. In linear fusion, each pixel of the two images is weighted linearly based on the weighting coefficients and fused to obtain the stitched image. In nonlinear fusion, the cutting proportion of the two images is determined based on the weighting coefficients, the two images are cut in a preset cutting mode, and the cut parts are stitched together to obtain the stitched image; the preset cutting mode can take various forms, such as vertical cutting, horizontal cutting, spaced vertical cutting, spaced horizontal cutting or local-area cutting. For example, nonlinear fusion may be adopted in step S202: after the weight ratio is determined, the 416×234 scaled image and the 416×182 fill image may be stitched according to the weight ratio to obtain a 416×416 image; specifically, the 416×234 scaled image may be weighted as 1 part and the 416×182 fill image as 2.3 parts according to the weight ratio, so that the stitched image is obtained by weighted fusion.
In the above embodiment, it can be understood that since the fill image is obtained by enlarging the copied image, the enlarged fill image has a higher resolution and can be given the larger weight in the weighted fusion.
In the above embodiment, the feature extractions over the two regions of the stitched image have a certain similarity and correspond to features at different resolutions. Performing weighted fusion on the scaled image and the fill image lets the fused features complement each other better and makes the stitched image more accurate, which further improves how well the target detection model recognises the detected image and thus the accuracy of its target detection.
In one embodiment, if the aspect ratio of the detected image matches the aspect ratio of the input image of the target detection model, the method further comprises:
S105: determining whether the length of the detected image equals the length of the input image of the target detection model, and whether the width of the detected image equals the width of the input image of the target detection model.
In one application scenario, if the aspect ratio of the detected image matches the aspect ratio of the input image of the target detection model, it may further be determined whether the length of the detected image equals the length of the input image of the target detection model and whether the width of the detected image equals the width of the input image of the target detection model.
S106: if the length of the detected image is not equal to the length of the input image of the target detection model, the width of the detected image is not equal to the width of the input image of the target detection model, and the length and width of the detected image are both larger than those of the input image of the target detection model, scaling the detected image proportionally so that its length equals the length of the input image of the target detection model and its width equals the width of the input image of the target detection model; the scaling factor of the proportional scaling is determined from the length and width of the detected image and the length and width of the input image of the target detection model.
For example, if the length and width of the currently acquired detected image are 1080×1080, the detected image may be scaled proportionally, i.e. the 1080×1080 detected image is scaled to a 416×416 input image; for the specific process, refer to the embodiment of step S1022, which is not repeated here to avoid redundancy.
The scaling factor of the proportional scaling is determined from the length and width of the detected image and the length of the input image of the target detection model; reference may be made to the procedure of step S1021 above, which is likewise not repeated here.
In step S106, it should be noted that if it is determined that the length of the detected image is not equal to the length of the input image of the target detection model, the width of the detected image is not equal to the width of the input image of the target detection model, and the length and width of the detected image are both smaller than those of the input image of the target detection model, the detected image may be enlarged proportionally; the proportional scaling process above may be referred to, and is not repeated here to avoid redundancy.
In the above embodiment, if the aspect ratio of the detected image matches the aspect ratio of the input image of the target detection model but the length and width of the detected image are larger than, and not equal to, those of the input image, the detected image is scaled proportionally so that its length and width equal those of the input image of the target detection model; the detected images are thus processed uniformly, which improves the efficiency with which the target detection model processes them.
S30: inputting the stitched image as an input image into a preset target detection model to obtain candidate frames of the target object, the confidence of each candidate frame position, and the reliability corresponding to the confidence.
Based on the obtained stitched image, the stitched image may be taken as the input image and fed into the preset target detection model. The preset target detection model is obtained by training with labeled training images, and in one embodiment its training process includes steps S01-S05:
S01: acquiring a training image set and a verification image set, wherein the training image set comprises the labeled training images; the annotation frame of the detection target in a labeled training image carries annotation frame parameters, and the annotation frame parameters include the confidence of the detection frame position and the reliability of the confidence of the detection frame position.
It can be understood that the training image set is the set of training images used for model training, and the verification image set is the set of verification images used to verify the trained neural network model. The training image set and the verification image set may each contain positive-sample and negative-sample training images in a preset proportion.
In the prior art, non-maximum suppression (NMS, Non-Maximum Suppression) is generally used to de-duplicate the plurality of output detection frames. Specifically, for each detection frame, the confidence of the detection frame position is multiplied by the confidence of the detection frame class, the product is taken as the final confidence of that frame, frames with low final confidence are deleted in turn, and the frames with higher final confidence are retained. Under this processing mode, the detector outputs, for each detection frame, a confidence of the detection frame position and a confidence of the detection frame class. In the usual training manner of the neural network model, as shown in fig. 7, one branch of the network outputs the confidence of the detection frame position and the other branch outputs the confidence of the detection frame class; suppose the class confidence is A and the position confidence is B, so that the final confidence is A·B. This easily causes false detection: when the class confidence A is small and the position confidence B is large, the product A·B no longer reflects, relative to the confidence threshold actually in use, how trustworthy the frame really is, so the final detection frame can land on a non-target area; for face detection, for example, the background may be taken as a face, and a false detection occurs.
In order to solve this false-detection problem, in step S01 the network output is modified: as shown in fig. 8, the class branch no longer outputs the confidence of the detection frame class but instead outputs the reliability of the confidence of the detection frame position. Based on the obtained training image set, each training image can be labeled manually, so that the detection target in each training image has annotation frame parameters corresponding to its annotation frame, where the detection target is the object to be detected, the detection target is marked by the annotation frame, and the annotation frame parameters describe the annotation frame. Specifically, the annotation frame parameters may include the confidence of the detection frame position in the annotation frame and the reliability of that confidence.
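To make the change from fig. 7 to fig. 8 concrete, a small sketch follows; the function names and the 0.9 default thresholds (anticipating the first and second preset thresholds of step S401) are illustrative assumptions, not the patent's code.

```python
def final_confidence_prior_art(cls_conf, pos_conf):
    """Fig. 7 style: one branch outputs the class confidence A, the other the
    position confidence B, and the final confidence is the product A * B."""
    return cls_conf * pos_conf

def keep_box_this_invention(pos_conf, reliability, conf_thresh=0.9, rel_thresh=0.9):
    """Fig. 8 style: the class branch is replaced by the reliability of the
    position confidence, and a box is kept only when the position confidence
    and its reliability each clear their own threshold."""
    return pos_conf > conf_thresh and reliability > rel_thresh
```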
S02: and training the preset neural network model through the marked training image.
Specifically, the neural network model may be constructed in advance according to the actual situation to obtain the preset neural network model, and the preset neural network model is then trained with the labeled training images.
S03: inputting the verification images of the verification image set into the preset neural network model to obtain the confidence of the detection frame position, output by the preset neural network model for each detection target, and the reliability of the confidence of the detection frame position.
It can be understood that a detection frame is the detection result of the preset neural network model for a detection target in an image, and the confidence of the detection frame position together with the reliability of that confidence describes the detection frame: the confidence refers to the detection frame position itself, while the reliability refers to the confidence. In a conventional detector, a judgment is generally made directly on the confidence of the detection frame: if the confidence is greater than a certain threshold, the detection frame is considered to contain the target. However, the confidence given by the network is not necessarily trustworthy, so a prediction also needs to be made about that confidence to determine whether it is reliable; if it is not, the confidence is not adopted. In other words, the confidence output by the network is used with reference to the reliability of the confidence of the detection frame position.
Specifically, after the preset neural network model has been trained with the labeled training images, each verification image of the verification image set may further be input into the preset neural network model; the model detects the detection targets in each verification image and outputs, for each detection target, the detection result, namely the confidence of the detection frame position and the reliability of that confidence.
S04: and acquiring a loss function value according to the confidence coefficient of the position of the detection frame and the reliability degree of the confidence coefficient of the position of the detection frame. Specifically, in one embodiment, step S04, that is, obtaining the loss function value according to the confidence of the detection frame position and the reliability of the confidence of the detection frame position, may specifically be implemented by the following formula:
loss = -|p - σ|² · ((1 - p)·log(1 - σ) + p·log σ)
where p denotes the confidence of the detection frame position, with a value range between 0 and 1, and σ denotes the reliability of the confidence of the detection frame position.
Based on the above formula, it can be understood that when the confidence p of the detection frame position and the reliability σ of that confidence are substituted into it, the corresponding loss function value is obtained, so that the preset neural network model can be trained according to this loss function value.
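A minimal numeric sketch of the formula above follows, assuming scalar inputs; a real implementation would operate on framework tensors and batch over detection frames.

```python
import math

def confidence_reliability_loss(p, sigma, eps=1e-7):
    """loss = -|p - sigma|^2 * ((1 - p)*log(1 - sigma) + p*log(sigma)),
    where p is the confidence of the detection frame position (0..1) and
    sigma is the predicted reliability of that confidence."""
    sigma = min(max(sigma, eps), 1.0 - eps)   # keep the logarithms finite
    ce = (1.0 - p) * math.log(1.0 - sigma) + p * math.log(sigma)
    return -abs(p - sigma) ** 2 * ce
```

Note that the |p - σ|² factor makes the loss vanish when the predicted reliability already agrees with the position confidence.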
S05: and adjusting model parameters of the preset neural network model according to the loss function value, and continuing training until the preset neural network model meets the convergence condition, ending training, so as to obtain the target detection model.
Based on the loss function value obtained in step S04, the model parameters of the preset neural network model can be adjusted according to the loss function value, and the above steps are repeated to continue training the adjusted model until the preset neural network model meets the convergence condition, for example when the loss function value becomes smaller than a preset threshold. Alternatively, a maximum number of iterations may be set, and training stops once the number of training iterations exceeds it, yielding the target detection model. After the target detection model is determined, target detection can be performed on the input images to be detected.
In the above embodiment, through steps S01-S05 the confidence of the detection frame position and the reliability of that confidence are treated as a whole, instead of training the position confidence and the class confidence independently; the class-branch output is changed from the class confidence to the reliability of the confidence of the detection frame position, i.e. the loss function value is obtained from the position confidence and its reliability. Training the target detection model in this way avoids the situation where one of the position confidence and the class confidence is small while the other is large and their product gives a misleading result, thereby reducing false detections.
S40: and determining a final detection frame of the target object from the candidate frame according to the confidence coefficient of the position of the candidate frame and the reliability corresponding to the confidence coefficient.
In one application scenario, based on step S30, when the stitched image is input into the target detection model for detection, a plurality of candidate frames are output; the post-processing stage of target detection must then screen them to determine the more accurate candidate frames.
Specifically, after the confidence of the candidate frame positions of the target object and the reliability corresponding to that confidence are obtained, the confidence of each candidate frame position and the reliability of that confidence may be compared with preset thresholds to determine the final detection frame of the target object.
In one embodiment, step S40, i.e. determining the final detection frame of the target object from the candidate frames according to the confidence of the candidate frame positions and the corresponding reliability, may specifically include:
S401: determining whether there is, among the candidate frames, a target candidate frame whose position confidence is greater than a first preset threshold and whose corresponding reliability is greater than a second preset threshold.
Based on the acquired candidate frames, of which there are several, the part of the candidate frames that satisfies the preset thresholds needs to be selected.
Specifically, among the candidate frames, it may be determined whether there is a target candidate frame that simultaneously satisfies the following conditions: the confidence of the candidate frame position is greater than a first preset threshold, which may be 0.9, and the reliability of the confidence of the candidate frame position is greater than a second preset threshold, which may also be 0.9.
S402: if there is a target candidate frame whose position confidence is greater than the first preset threshold and whose corresponding reliability is greater than the second preset threshold, taking the target candidate frame as the final detection frame of the target object.
If the position confidence of a target candidate frame is determined to be greater than 0.9 and the reliability of that confidence is also greater than 0.9, the target candidate frame can be taken as the final detection frame of the target object.
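A compact sketch of this screening step follows; the tuple layout of `candidates` and the default thresholds are assumptions for illustration.

```python
def select_final_detection_frames(candidates, conf_thresh=0.9, rel_thresh=0.9):
    """Steps S401/S402: keep only candidate frames whose position confidence
    exceeds the first preset threshold and whose reliability exceeds the
    second.  `candidates` is an iterable of (box, confidence, reliability)."""
    return [box for box, conf, rel in candidates
            if conf > conf_thresh and rel > rel_thresh]
```

With these defaults, a candidate whose position confidence is 0.9 but whose reliability is only 0.1 is discarded, which is exactly the case discussed below.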
In the prior art, the detector generally makes its judgment on the confidence of the candidate frame position alone: if it is greater than a certain threshold, the current candidate frame is considered the target detection frame. However, the confidence of the candidate frame position given by the network is not necessarily trustworthy, so a prediction needs to be made about it to determine whether it is reliable; if it is not, the confidence is not adopted. For example, if the network gives a confidence of 0.9 for a target candidate frame position, a conventional detection network would treat the current target candidate frame as the final detection frame. If, however, the reliability the network assigns to that confidence is only 0.1, the reliability of the confidence of the current candidate frame position is smaller than the second preset threshold, i.e. the confidence of the current candidate frame position is not trustworthy.
In the above embodiment, the confidence of the detection frame position and the reliability of that confidence are trained as a whole, and the confidence of the detection frame class is changed to the reliability of the confidence of the detection frame position. Therefore, after the detected image is converted into an image that matches the required aspect ratio and input into the target detection model, the reliability of the confidence of the candidate frame position can be predicted in steps S401 and S402; the situation where one of the two quantities is small while the other is large and yields a misleading result no longer occurs, so false detections are reduced.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Example 2
In one embodiment, the present invention further provides an image detection apparatus whose functions correspond one-to-one to the steps of the image detection method in the foregoing embodiment. Specifically, as shown in fig. 9, the image detection apparatus may include an acquisition module 10, a stitching module 20, an input module 30 and a detection module 40. Each functional module is described in detail as follows:
an acquisition module 10, configured to obtain a scaled image and a fill image from the detected image;
a stitching module 20, configured to stitch the scaled image and the fill image to obtain a stitched image;
an input module 30, configured to input the stitched image as an input image into a preset target detection model to obtain candidate frames of a target object, the confidence of each candidate frame position, and the reliability corresponding to the confidence;
a detection module 40, configured to determine the final detection frame of the target object from the candidate frames according to the confidence of the candidate frame positions and the reliability corresponding to the confidence.
In one embodiment, the acquisition module 10 is further configured to:
Determining whether the aspect ratio of the detected image accords with the aspect ratio of the input image of the target detection model;
If the aspect ratio of the detected image is not in accordance with the aspect ratio of the input image, scaling the detected image in equal proportion to obtain a scaled image; wherein the scaling factor of the equal-scale scaling is determined by the aspect ratio of the detected image and the aspect ratio of the input image;
copying a preset area in the scaled image to obtain a copied image;
and determining the length and width of a filling image, and amplifying the copied image to the length and width of the filling image so as to acquire the filling image.
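As a minimal sketch of this acquisition step, assuming OpenCV/NumPy images, reading the patent's "length" as the horizontal pixel dimension and its "width" as the vertical one, and copying the upper half of the scaled image; the function name, argument names and the copy fraction are the editor's assumptions:

```python
import cv2

def acquire_scaled_and_fill(detected, input_w, input_h, copy_fraction=0.5):
    """Build the scaled image and the fill image for a detected image whose
    aspect ratio does not match the model input (illustrative sketch)."""
    det_h, det_w = detected.shape[:2]

    # scaling factor: detected image length divided by the input image length
    factor = det_w / input_w
    scaled_h = int(round(det_h / factor))
    scaled = cv2.resize(detected, (input_w, scaled_h))   # equal-proportion scaling

    # copy a preset region of the scaled image (here: its upper part)
    copied = scaled[: int(scaled_h * copy_fraction), :]

    # fill image: length = input length, width = input width - scaled width
    fill_h = input_h - scaled_h
    fill = cv2.resize(copied, (input_w, fill_h))          # enlarge the copied image
    return scaled, fill
```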
In one embodiment, the acquisition module 10 is further configured to:
Dividing the length of the detected image by the length of the input image to obtain the scaling factor;
and scaling the detected image in equal proportion according to the scaling multiple so as to obtain the scaled image.
In one embodiment, the acquisition module 10 is further configured to (a worked example follows this list):
taking the input image length as the length of the filling image;
the width of the scaled image is subtracted from the input image width to determine the width of the fill image.
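For a purely illustrative set of figures, not taken from the patent: with a hypothetical 416×416 model input and a 1920×1080 detected image, the scaling factor is 1920 / 416 ≈ 4.6, so the scaled image is 416 pixels long and about 234 pixels wide; the fill image is then 416 pixels long and 416 - 234 = 182 pixels wide.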
In one embodiment, the acquisition module 10 is further configured to:
Copying the region where the upper third of the scaled image is located to obtain a copied image;
Or copying the area where the upper half of the scaled image is located so as to obtain a copied image.
In one embodiment, the acquisition module 10 is further configured to:
acquiring the weight ratio of the preset area of the scaled image and the filling image;
and carrying out weighted fusion on the scaled image and the filling image according to the weight proportion so as to acquire the spliced image.
In one embodiment, the acquisition module 10 is further configured to (a stitching sketch follows this list):
Determining a ratio of the width of the duplicate image to the width of the fill image;
and taking the ratio of the width of the copied image to the width of the filled image as the weight proportion.
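A minimal sketch of the weighted stitching described in the two lists above; applying the weight as a simple intensity scaling of the fill portion before vertical concatenation is only one possible reading of the "weighted fusion", and the function and argument names are the editor's assumptions.

```python
import numpy as np

def stitch(scaled, fill, copied_height):
    """Stitch the scaled image and the fill image into one input-sized image.

    The weight proportion is the width of the copied region divided by the
    width of the fill image, as described above (with "width" read as the
    vertical pixel dimension)."""
    weight = copied_height / fill.shape[0]                   # weight proportion
    weighted_fill = np.clip(fill.astype(np.float32) * weight, 0, 255)
    return np.vstack([scaled, weighted_fill.astype(scaled.dtype)])
```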
In one embodiment, the input module 30 is further configured to (a training-loop sketch follows this list):
acquiring a training image set and a verification image set, wherein the training image set comprises the marked training image;
training a preset neural network model through the marked training image;
Inputting the verification image of the verification image set into the preset neural network model to obtain the confidence coefficient of the detection frame position corresponding to the detection target output by the preset neural network model and the reliability degree of the confidence coefficient of the detection frame position;
Acquiring a loss function value according to the confidence coefficient of the detection frame position and the reliability degree of the confidence coefficient of the detection frame position;
And adjusting model parameters of the preset neural network model according to the loss function value, and continuing training until the preset neural network model meets convergence conditions, so as to obtain the target detection model.
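The training flow can be pictured with the skeleton below; the model interface (fit_one_epoch, predict, loss, adjust_parameters), the convergence test, and all numeric defaults are the editor's assumptions and are not fixed by the patent at this level of detail.

```python
def train_target_detection_model(model, train_set, val_set,
                                 max_epochs=100, tol=1e-4):
    """Train until convergence; the loss is computed from the detection-frame
    position confidence and the reliability of that confidence (illustrative)."""
    prev_loss = float("inf")
    for _ in range(max_epochs):
        model.fit_one_epoch(train_set)                # train on labeled images

        # validation: collect the loss over the verification image set
        losses = []
        for image, labels in val_set:
            preds = model.predict(image)              # confidence + reliability per frame
            losses.append(model.loss(preds, labels))  # loss uses both quantities
        val_loss = sum(losses) / len(losses)

        model.adjust_parameters(val_loss)             # update model parameters

        if abs(prev_loss - val_loss) < tol:           # assumed convergence condition
            return model
        prev_loss = val_loss
    return model
```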
In one embodiment, the detection module 40 is further configured to:
Determining whether a target candidate frame with the confidence degree of the position of the candidate frame being larger than a first preset threshold value and the reliability degree corresponding to the confidence degree being larger than a second preset threshold value exists in the candidate frames;
and if such a target candidate frame exists, taking the target candidate frame as the final detection frame of the target object.
For specific limitations of the image detection apparatus, reference may be made to the above limitations of the image detection method, which are not repeated here. The modules in the above image detection apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware in, or independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
Example 3
In one embodiment, a computer readable storage medium is provided, storing a computer program that, when executed by a processor, implements the steps of the image detection method of the foregoing embodiment, or implements the functions of the modules in the image detection apparatus of the foregoing embodiment; to avoid repetition, details are not repeated here. It will be appreciated that the computer readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and so forth.
In an embodiment, a computer readable storage medium is provided, the computer readable storage medium storing a computer program which when executed by a processor performs the steps of:
obtaining a scaled image and a filled image according to the detected image;
Performing stitching processing on the scaled image and the filling image to obtain a stitched image;
Inputting the spliced image serving as an input image into a preset target detection model to obtain a candidate frame of a target object, the confidence coefficient of the position of the candidate frame and the reliability degree corresponding to the confidence coefficient;
And determining a final detection frame of the target object from the candidate frame according to the confidence coefficient of the candidate frame position of the target object and the reliability degree corresponding to the confidence coefficient.
Example 4
In one embodiment, as shown in FIG. 10, a computer device is provided. Specifically, the computer device 60 of this embodiment includes: a processor 61, a memory 62, and a computer program 63 stored in the memory 62 and executable on the processor 61. When the processor 61 executes the computer program 63, the steps of the image detection method of the above embodiment are implemented, or the functions of the modules in the image detection apparatus of the above embodiment are implemented; to avoid repetition, details are not repeated here.
In one embodiment, a computer device is provided comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
obtaining a scaled image and a filled image according to the detected image;
Performing stitching processing on the scaled image and the filling image to obtain a stitched image;
Inputting the spliced image serving as an input image into a preset target detection model to obtain a candidate frame of a target object, the confidence coefficient of the position of the candidate frame and the reliability degree corresponding to the confidence coefficient;
And determining a final detection frame of the target object from the candidate frame according to the confidence coefficient of the candidate frame position of the target object and the reliability degree corresponding to the confidence coefficient.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It will be apparent to those skilled in the art that the division into the foregoing functional units and modules is merely an illustration adopted for convenience and brevity of description. In practical applications, the above functions may be allocated to different functional modules, sub-modules and units as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (10)

1. An image detection method, the method comprising:
Determining whether the aspect ratio of the detected image accords with the aspect ratio of the input image of the target detection model;
If the aspect ratio of the detected image is not in accordance with the aspect ratio of the input image, scaling the detected image in equal proportion to obtain a scaled image; wherein the scaling factor of the equal-scale scaling is determined by the aspect ratio of the detected image and the aspect ratio of the input image;
copying a preset area in the scaled image to obtain a copied image;
Determining the length and width of a filling image, and amplifying the copied image to the length and width of the filling image to acquire the filling image;
Performing stitching processing on the scaled image and the filling image to obtain a stitched image, wherein the stitched image is an image meeting the input image proportion requirement of an input target detection model;
Inputting the spliced image as an input image into a preset target detection model to obtain a candidate frame of a target object, confidence coefficient of the position of the candidate frame and reliability degree corresponding to the confidence coefficient, wherein the confidence coefficient of the position of the candidate frame and the reliability degree corresponding to the confidence coefficient are used for describing a detection frame, the confidence coefficient of the position of the candidate frame is specific to the detection frame, and the reliability degree corresponding to the confidence coefficient is specific to the confidence coefficient of the detection frame;
And determining a final detection frame of the target object from the candidate frame according to the confidence coefficient of the position of the candidate frame and the reliability degree corresponding to the confidence coefficient.
2. The image detection method according to claim 1, wherein the scaling the detected image equally to obtain a scaled image includes:
Dividing the length of the detected image by the length of the input image to obtain the scaling factor;
and scaling the detected image in equal proportion according to the scaling multiple so as to obtain the scaled image.
3. The image detection method according to claim 1, wherein the determining the length and width of the filled image includes:
Taking the input image length as the length of the filling image; the width of the scaled image is subtracted from the input image width to determine the width of the fill image.
4. The image detection method according to claim 1, wherein the stitching the scaled image and the filler image to obtain a stitched image includes:
acquiring the weight ratio of the preset area of the scaled image and the filling image;
and carrying out weighted fusion on the scaled image and the filling image according to the weight proportion so as to acquire the spliced image.
5. The image detection method according to claim 4, wherein the acquiring the weight ratio of the preset area of the scaled image to the fill image includes:
Determining a ratio of the width of the duplicate image to the width of the fill image;
and taking the ratio of the width of the copied image to the width of the filled image as the weight proportion.
6. The image detection method according to any one of claims 1 to 5, wherein the target detection model is obtained by training with labeled training images, the training comprising:
acquiring a training image set and a verification image set, wherein the training image set comprises the marked training image;
training a preset neural network model through the marked training image;
Inputting the verification image of the verification image set into the preset neural network model to obtain the confidence coefficient of the detection frame position corresponding to the detection target output by the preset neural network model and the reliability degree of the confidence coefficient of the detection frame position;
Acquiring a loss function value according to the confidence coefficient of the detection frame position and the reliability degree of the confidence coefficient of the detection frame position;
And adjusting model parameters of the preset neural network model according to the loss function value, and continuing training until the preset neural network model meets convergence conditions, so as to obtain the target detection model.
7. The image detection method according to any one of claims 1 to 5, wherein the determining the final detection frame of the target object from the candidate frame according to the confidence level of the candidate frame position and the reliability level corresponding to the confidence level includes:
Determining whether a target candidate frame with the confidence degree of the position of the candidate frame being larger than a first preset threshold value and the reliability degree corresponding to the confidence degree being larger than a second preset threshold value exists in the candidate frames;
and if such a target candidate frame exists, taking the target candidate frame as a final detection frame of the target object.
8. An image detection apparatus, characterized in that the image detection apparatus comprises:
the acquisition module is used for acquiring a zoom image and a filling image according to the detected image;
the splicing module is used for carrying out splicing processing on the scaled image and the filling image so as to obtain a spliced image;
The input module is used for taking the spliced image as an input image and inputting the input image into a preset target detection model so as to acquire a candidate frame of a target object, the confidence coefficient of the position of the candidate frame and the reliability degree corresponding to the confidence coefficient;
The detection module is used for determining a final detection frame of the target object from the candidate frame according to the confidence coefficient of the position of the candidate frame and the reliability degree corresponding to the confidence coefficient;
the image detection device is adapted to implement the method of claim 1.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the image detection method according to any of claims 1-5 when executing the computer program.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the image detection method according to any one of claims 1-5.
CN202011601355.8A 2020-12-29 2020-12-29 Image detection method, image detection device, equipment and storage medium Active CN112613570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011601355.8A CN112613570B (en) 2020-12-29 2020-12-29 Image detection method, image detection device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112613570A CN112613570A (en) 2021-04-06
CN112613570B true CN112613570B (en) 2024-06-11

Family

ID=75249062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011601355.8A Active CN112613570B (en) 2020-12-29 2020-12-29 Image detection method, image detection device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112613570B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344782B (en) * 2021-05-31 2023-07-18 浙江大华技术股份有限公司 Image stitching method and device, storage medium and electronic device
CN113378834B (en) * 2021-06-28 2023-08-04 北京百度网讯科技有限公司 Object detection method, device, apparatus, storage medium, and program product
CN113344957B (en) * 2021-07-19 2022-03-01 北京城市网邻信息技术有限公司 Image processing method, image processing apparatus, and non-transitory storage medium
CN115063299B (en) * 2022-08-19 2022-11-18 北京睿芯高通量科技有限公司 Image preprocessing method and device, electronic equipment and storage medium
CN115661131B (en) * 2022-11-17 2023-03-10 菲特(天津)检测技术有限公司 Image identification method and device, electronic equipment and storage medium
CN116820370A (en) * 2023-05-29 2023-09-29 深圳市视景达科技有限公司 Picture display proportion compensation method, device, equipment and storage medium

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109035292A (en) * 2018-08-31 2018-12-18 北京智芯原动科技有限公司 Moving target detecting method and device based on deep learning
CN109344789A (en) * 2018-10-16 2019-02-15 北京旷视科技有限公司 Face tracking method and device
CN109977943A (en) * 2019-02-14 2019-07-05 平安科技(深圳)有限公司 A kind of images steganalysis method, system and storage medium based on YOLO
CN110047095A (en) * 2019-03-06 2019-07-23 平安科技(深圳)有限公司 Tracking, device and terminal device based on target detection
CN110163197A (en) * 2018-08-24 2019-08-23 腾讯科技(深圳)有限公司 Object detection method, device, computer readable storage medium and computer equipment
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN110879972A (en) * 2019-10-24 2020-03-13 深圳云天励飞技术有限公司 Face detection method and device
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium
CN111241947A (en) * 2019-12-31 2020-06-05 深圳奇迹智慧网络有限公司 Training method and device of target detection model, storage medium and computer equipment
CN111259783A (en) * 2020-01-14 2020-06-09 深圳市奥拓电子股份有限公司 Video behavior detection method and system, highlight video playback system and storage medium
CN111291717A (en) * 2020-02-28 2020-06-16 深圳前海微众银行股份有限公司 Image-based object detection method, device, equipment and readable storage medium
CN111368698A (en) * 2020-02-28 2020-07-03 Oppo广东移动通信有限公司 Subject recognition method, subject recognition device, electronic device, and medium
CN111709407A (en) * 2020-08-18 2020-09-25 眸芯科技(上海)有限公司 Method and device for improving video target detection performance in monitoring edge calculation
CN111723860A (en) * 2020-06-17 2020-09-29 苏宁云计算有限公司 Target detection method and device
CN111739016A (en) * 2020-07-20 2020-10-02 平安国际智慧城市科技股份有限公司 Target detection model training method and device, electronic equipment and storage medium
CN112052787A (en) * 2020-09-03 2020-12-08 腾讯科技(深圳)有限公司 Target detection method and device based on artificial intelligence and electronic equipment
CN112084886A (en) * 2020-08-18 2020-12-15 眸芯科技(上海)有限公司 Method and device for improving detection performance of neural network target detection

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467459B2 (en) * 2016-09-09 2019-11-05 Microsoft Technology Licensing, Llc Object detection based on joint feature extraction
CN109508580B (en) * 2017-09-15 2022-02-25 阿波罗智能技术(北京)有限公司 Traffic signal lamp identification method and device

Also Published As

Publication number Publication date
CN112613570A (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN112613570B (en) Image detection method, image detection device, equipment and storage medium
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
US11195284B2 (en) Target object tracking method and apparatus, and storage medium
CN108885699B (en) Character recognition method, device, storage medium and electronic equipment
EP3579180A1 (en) Image processing method and apparatus, electronic device and non-transitory computer-readable recording medium for selective image enhancement
US11481862B2 (en) System and method for real-time, simultaneous object detection and semantic segmentation
US9811887B2 (en) Apparatus and method for image stabilization using image blur correction
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN111242167B (en) Distributed image labeling method, device, computer equipment and storage medium
CN112052702A (en) Method and device for identifying two-dimensional code
CN112487396A (en) Picture processing method and device, computer equipment and storage medium
WO2022252640A1 (en) Image classification pre-processing method and apparatus, image classification method and apparatus, and device and storage medium
CN111428740A (en) Detection method and device for network-shot photo, computer equipment and storage medium
US20200065617A1 (en) Unsupervised domain adaptation for video classification
CN112991349A (en) Image processing method, device, equipment and storage medium
KR101931271B1 (en) Face Recognition Method and Apparatus Using Single Forward Pass
CN113705650A (en) Processing method, device, medium and computing equipment for face picture set
CN110796003B (en) Lane line detection method and device and electronic equipment
CN111652152A (en) Crowd density detection method and device, computer equipment and storage medium
CN111353361A (en) Face recognition method and device and electronic equipment
KR102421033B1 (en) Apparatus for Deep Learning-based Object Detection with Moving ROI Method and Driving Method Thereof
CN112188283B (en) Method, device and equipment for cutting video and storage medium
KR102426594B1 (en) System and method for estimating the location of object in crowdsourcing environment
CN112084445B (en) Information processing system, method, apparatus, and storage medium
Nixon et al. Spn dash-fast detection of adversarial attacks on mobile via sensor pattern noise fingerprinting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant