CN114170642A - Image detection processing method, device, equipment and storage medium - Google Patents

Image detection processing method, device, equipment and storage medium

Info

Publication number
CN114170642A
Authority
CN
China
Prior art keywords
image
detection
training
task
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010940806.4A
Other languages
Chinese (zh)
Inventor
徐绍君
邱书豪
祝闯
刘军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Chengdu TD Tech Ltd
Original Assignee
Beijing University of Posts and Telecommunications
Chengdu TD Tech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications, Chengdu TD Tech Ltd filed Critical Beijing University of Posts and Telecommunications
Priority to CN202010940806.4A priority Critical patent/CN114170642A/en
Publication of CN114170642A publication Critical patent/CN114170642A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image detection processing method, device, equipment and storage medium. The method includes: acquiring an original image; scaling the original image to obtain an image to be detected; obtaining a detection result from the image to be detected using a trained multi-task detection model, where the multi-task detection model is obtained through joint optimization training of a detection task and an illumination classification task; and determining the target object image in the image to be detected according to the detection result. The method effectively improves the accuracy of detecting target objects in images captured in dim-light environments.

Description

Image detection processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing image detection.
Background
With the advent of high-performance computers, face detection technology has made tremendous progress. In recent years, with the rapid development of deep learning, Convolutional Neural Networks (CNNs) have been widely used in computer vision thanks to their excellent performance and end-to-end training, and have been applied to face detection scenes.
However, in the prior art, the accuracy of face detection drops sharply in dim-light environments, and detection may fail entirely. A common workaround is to first apply low-light enhancement to the image and then run detection with a deep learning method, but this approach yields inaccurate detection results and poor overall performance.
Disclosure of Invention
The application provides an image detection processing method, device, equipment and storage medium, which aim to overcome defects of the prior art such as inaccurate detection results on dim-light images.
The first aspect of the present application provides a processing method for image detection, including:
acquiring an original image;
scaling the original image to obtain an image to be detected;
based on the image to be detected, a trained multi-task detection model is adopted to obtain a detection result, and the multi-task detection model is obtained by performing optimization training by combining a detection task and an illumination classification task;
and determining the target object image in the image to be detected according to the detection result.
Optionally, the scaling the original image to obtain an image to be detected includes:
scaling the original image to the size required by the input of the multi-task detection model.
Optionally, the network architecture of the multitask detection model includes a shared network module, a detection network module, and an illumination classification network module;
the method for obtaining the detection result by adopting the trained multi-task detection model based on the image to be detected comprises the following steps:
performing image feature extraction on the image to be detected through the convolution layer of the shared network module to obtain a first feature map;
and detecting the first feature map by adopting a preset convolution kernel to obtain a detection result, wherein the detection result comprises a probability value of whether each pixel in the first feature map is a target object and a prediction box position.
Optionally, the determining, according to the detection result, the target object image in the image to be detected includes:
determining a bounding box of the target object image in the image to be detected according to the prediction box position corresponding to each pixel in the first feature map, and determining the pixels of the target object in the image to be detected according to the probability value of whether each pixel in the first feature map corresponds to the target object;
and determining the target object image in the image to be detected according to the bounding box of the target object image in the image to be detected and the pixels of the target object.
Optionally, after performing image feature extraction on the image to be detected through the convolution layer of the shared network module to obtain a first feature map, the method further includes:
performing convolution on the first feature map to obtain at least one second feature map;
the detecting the first feature map by adopting a preset convolution kernel to obtain a detection result comprises:
and detecting the first feature map and the at least one second feature map by adopting a preset convolution kernel to obtain a detection result, wherein the detection result comprises a probability value of whether each pixel in the first feature map is a target object and a first prediction box position, and a probability value of whether each pixel in each second feature map is the target object and a second prediction box position.
Optionally, before obtaining a detection result by using a trained multi-task detection model based on the image to be detected, the method further includes:
acquiring training images and label data corresponding to the training images, wherein the training images comprise normal illumination images and dim light images;
and training a pre-established multi-task detection network based on the training image to obtain the multi-task detection model, wherein the multi-task detection network comprises a shared network module, a detection network module and an illumination classification network module.
Optionally, the training a pre-established multi-task detection network based on the training image to obtain the multi-task detection model includes:
alternately taking a normal-illumination image and a dim-light image from the training images as input images;
for each input image, performing image feature extraction on the input image through the convolutional layer of the shared network module to obtain a first training feature map, and determining at least one second training feature map according to the first training feature map;
detecting the first training feature map and the second training feature map by adopting a preset convolution kernel of the detection network module to obtain a training detection result, and determining an illumination classification result corresponding to the first training feature map by adopting the illumination classification network module;
adjusting network parameters of the multi-task detection network based on the training detection result, the illumination classification result and a preset weighted loss function, wherein the weighted loss function comprises a weighting of the detection network module's loss function and the illumination classification network module's loss function;
and obtaining the multi-task detection model when the final weighted loss function meets a preset requirement.
A second aspect of the present application provides a processing apparatus for image detection, including:
the acquisition module is used for acquiring an original image;
the preprocessing module is used for carrying out scaling processing on the original image to obtain an image to be detected;
the detection module is used for obtaining a detection result by adopting a trained multi-task detection model based on the image to be detected, and the multi-task detection model is obtained by performing optimization training by combining a detection task and an illumination classification task;
and the determining module is used for determining the target object image in the image to be detected according to the detection result.
Optionally, the preprocessing module is specifically configured to:
scaling the original image to the size required by the input of the multi-task detection model.
Optionally, the network architecture of the multitask detection model includes a shared network module, a detection network module, and an illumination classification network module;
the detection module is specifically configured to:
performing image feature extraction on the image to be detected through the convolution layer of the shared network module to obtain a first feature map;
and detecting the first feature map by adopting a preset convolution kernel to obtain a detection result, wherein the detection result comprises a probability value of whether each pixel in the first feature map is a target object and a prediction box position.
Optionally, the determining module is specifically configured to:
determining a bounding box of the target object image in the image to be detected according to the prediction box position corresponding to each pixel in the first feature map, and determining the pixels of the target object in the image to be detected according to the probability value of whether each pixel in the first feature map corresponds to the target object;
and determining the target object image in the image to be detected according to the bounding box of the target object image in the image to be detected and the pixels of the target object.
Optionally, the detection module is further configured to:
performing convolution on the first feature map to obtain at least one second feature map;
the detecting the first feature map by adopting a preset convolution kernel to obtain a detection result comprises:
and detecting the first feature map and the at least one second feature map by adopting a preset convolution kernel to obtain a detection result, wherein the detection result comprises a probability value of whether each pixel in the first feature map is a target object and a first prediction box position, and a probability value of whether each pixel in each second feature map is the target object and a second prediction box position.
Optionally, the obtaining module is further configured to acquire training images and label data corresponding to each training image, where the training images include normal-illumination images and dim-light images;
the detection module is further used for training a pre-established multi-task detection network based on the training image to obtain the multi-task detection model, and the multi-task detection network comprises a shared network module, a detection network module and an illumination classification network module.
Optionally, the detection module is specifically configured to:
alternately taking a normal-illumination image and a dim-light image from the training images as input images;
for each input image, performing image feature extraction on the input image through the convolutional layer of the shared network module to obtain a first training feature map, and determining at least one second training feature map according to the first training feature map;
detecting the first training feature map and the second training feature map by adopting a preset convolution kernel of the detection network module to obtain a training detection result, and determining an illumination classification result corresponding to the first training feature map by adopting the illumination classification network module;
adjusting network parameters of the multi-task detection network based on the training detection result, the illumination classification result and a preset weighted loss function, wherein the weighted loss function comprises a weighting of the detection network module's loss function and the illumination classification network module's loss function;
and obtaining the multi-task detection model when the final weighted loss function meets a preset requirement.
A third aspect of the present application provides an electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the method as set forth in the first aspect above and in various possible designs of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon computer-executable instructions that, when executed by a processor, implement a method as set forth in the first aspect and various possible designs of the first aspect.
According to the image detection processing method, device, equipment and storage medium provided by the application, an original image is acquired and scaled to obtain an image to be detected that meets the input requirement of the multi-task detection model. Based on the image to be detected, the trained multi-task detection model produces a detection result, and the target object image in the image to be detected is determined according to that result, so the target object in the image can be detected accurately. Because the multi-task detection model is trained and optimized jointly on the two tasks of detection and illumination classification, it generalizes better to target object detection in dim-light images, can accurately detect target objects in dim-light images, and effectively improves the accuracy and effect of target object detection in dim-light scenes.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a schematic diagram of an architecture of a processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a processing method for image detection according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a processing method for image detection according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a network architecture of a multitasking detection model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network architecture of a multitasking detection model according to an embodiment of the present application;
fig. 6 is a schematic diagram of a network detection process according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a processing apparatus for image detection according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms referred to in this application are explained first:
Multi-task learning: a detection task and an illumination-condition classification task are trained and learned jointly. The loss functions of the tasks are weighted, multiple tasks are optimized simultaneously, and representations are shared among related tasks, so that the detection model can accurately detect a target object (such as a human face) under both normal illumination and dim-light conditions, improving the generalization ability of the detection model.
The image detection processing method provided by the embodiments of the application is suitable for image detection in dim-light environments, such as face detection in the dark. Examples include access control systems based on face detection, payment systems based on face recognition, authentication systems, security check systems, and the like. Fig. 1 is a schematic diagram of the architecture of a processing system according to an embodiment of the application. The processing system may include an electronic device and may also include a terminal. The user provides the original image through the terminal, or triggers the start of detection or training. The terminal may be any terminal that captures images. After the electronic device receives a detection start instruction, it acquires the original image and scales it to obtain an image to be detected that meets the input requirement of the multi-task detection model; based on the image to be detected, the trained multi-task detection model obtains a detection result, and the target object image in the image to be detected is determined according to that result. In the embodiments of the application, the original image may be a normal-light image or a dim-light image. Because the multi-task detection model is trained and optimized jointly on the two tasks of detection and illumination classification, it generalizes better to dim-light images: the target object in a dim-light image can be detected accurately, the detected target object can be output, and corresponding processing can be performed based on it, effectively improving the accuracy and effect of target object detection in dim-light scenes.
For example, if the target object is a face and the application scene is security inspection, the terminal is the face acquisition device of a security check channel, and security check processing is performed based on the detected face to determine whether the pedestrian may pass. For another example, if the application scene is an access control system, the terminal may be an image acquisition device at the doorway that captures the face image of a user requesting entry. Face detection can then accurately locate the user's face image even in a dark environment, the user is verified based on the detected face image, and it is determined whether a door-opening action may be triggered.
Optionally, the processing method for image detection provided by the embodiment of the present application may also be applied to detection of other objects, which is not limited to face detection, such as animal image detection.
The image detection processing method provided by the embodiment of the application can be suitable for any computer vision application scene, such as the application fields of video monitoring, unmanned driving and the like.
Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. In the description of the following examples, "plurality" means two or more unless specifically limited otherwise.
The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
An embodiment of the present application provides a processing method for image detection, which is used for performing image detection in a normal illumination or dark environment. The execution subject of the present embodiment is a processing apparatus for image detection, which may be provided in an electronic device.
As shown in fig. 2, a schematic flow chart of a processing method for image detection provided in this embodiment is shown, where the method includes:
step 101, acquiring an original image.
Specifically, when image detection is required, the image (referred to as the original image) is acquired. The original image may have any resolution; that is, the acquired original image is image data consisting of the pixel data that make up the image. For example, a 500 × 500 original image is acquired.
Step 102, scaling the original image to obtain an image to be detected.
Specifically, after the original image is obtained, it is scaled to obtain the image to be detected, which is an image meeting the input requirement of the multi-task detection model. For example, if the input of the multi-task detection model is set to 300 × 300, the original image is scaled to 300 × 300. The size of the image to be detected can be set according to the actual multi-task detection model; this embodiment does not limit it.
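As an illustration of this preprocessing step, the sketch below scales an arbitrary image to a fixed model input size with nearest-neighbour sampling. The patent does not specify a particular scaling algorithm, so the function name and the use of nearest-neighbour sampling are assumptions; the 500 × 500 source and 300 × 300 target sizes follow the examples in the text.

```python
# Illustrative sketch only: the patent does not specify a scaling algorithm.
# Nearest-neighbour resize of a 2-D image (list of rows of pixel values)
# to the fixed input size of the multi-task detection model.

def resize_nearest(image, out_h, out_w):
    """Scale a 2-D image to out_h x out_w by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# A 500 x 500 "original image" scaled to the assumed 300 x 300 model input.
original = [[(r + c) % 256 for c in range(500)] for r in range(500)]
to_detect = resize_nearest(original, 300, 300)
```

In practice a library resizer with interpolation (e.g. bilinear) would be used; the point here is only that every original image is mapped onto the fixed input resolution before detection.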
Step 103, obtaining a detection result using the trained multi-task detection model based on the image to be detected, where the multi-task detection model is obtained through joint optimization training of a detection task and an illumination classification task.
Specifically, a multi-task learning detection network can be established in advance and optimized with training images. The multi-task learning detection network can include a detection task, an illumination classification task, and a part shared by the two tasks. Each task has a corresponding loss function; the individual loss functions are weighted, and the network is optimized and trained according to the weighted loss function, finally yielding the multi-task detection model. The training images include normal-illumination images and dim-light images, which can be fed into the network alternately during training. This improves the generalization ability of the network, so that the trained multi-task detection model can accurately detect target objects in both normal-illumination and dim-light environments.
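The weighted loss described above can be sketched as follows. The patent does not disclose the concrete per-task loss functions or the weights, so `detection_loss`, `illumination_loss`, `w_det` and `w_cls` below are illustrative placeholders; only the structure, a weighted sum of a detection loss and an illumination classification loss, follows the text.

```python
# Sketch of the weighted multi-task loss: placeholder per-task losses,
# combined as a weighted sum that both tasks are optimized against.
import math

def detection_loss(pred_boxes, true_boxes):
    # Placeholder detection loss: mean absolute error over box coordinates.
    return sum(abs(p - t) for p, t in zip(pred_boxes, true_boxes)) / len(true_boxes)

def illumination_loss(pred_prob, true_label):
    # Placeholder classification loss: binary cross-entropy for
    # normal-light (label 0) vs dim-light (label 1).
    eps = 1e-7
    return -(true_label * math.log(pred_prob + eps)
             + (1 - true_label) * math.log(1 - pred_prob + eps))

def multitask_loss(pred_boxes, true_boxes, pred_prob, true_label,
                   w_det=1.0, w_cls=0.5):
    # Weighted sum jointly optimized for both tasks (weights are assumed).
    return (w_det * detection_loss(pred_boxes, true_boxes)
            + w_cls * illumination_loss(pred_prob, true_label))

# Example: perfect box prediction, dim-light image classified with p = 0.9.
total = multitask_loss([0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5], 0.9, 1)
```

Because the shared part of the network receives gradients from both weighted terms, the features it learns must serve detection under both illumination conditions, which is what the text credits for the improved generalization.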
After the image to be detected is obtained, the image to be detected can be detected by adopting a multi-task detection model obtained by training based on the image to be detected, so that a detection result is obtained.
The detection result may include a probability value that each pixel belongs to the target object and a prediction result of the target object bounding box.
Step 104, determining the target object image in the image to be detected according to the detection result.
Specifically, after the detection result is obtained, the target object image in the image to be detected can be determined from it. The target pixels belonging to the target object may be determined from the per-pixel probability values in the detection result, and the bounding box of the target object may be determined from the bounding box prediction. Combining the target pixels and the bounding box gives the image region of the target object in the image to be detected. Alternatively, the target region in the original image can be determined through a corresponding calculation from the target pixels and the bounding box, thereby achieving the purpose of target detection.
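A minimal sketch of this post-processing, under the assumption that the detection result is a flat list of per-pixel probabilities with one prediction box per pixel; the thresholding rule and the choice of the highest-probability box are illustrative assumptions, not taken from the patent.

```python
# Illustrative post-processing sketch: keep the pixels whose target-object
# probability exceeds a threshold, and take the prediction box of the
# highest-probability pixel as the target bounding box.

def select_target(probs, boxes, threshold=0.5):
    """probs: per-pixel probabilities; boxes: per-pixel (x1, y1, x2, y2)."""
    target_pixels = [i for i, p in enumerate(probs) if p >= threshold]
    if not target_pixels:
        return [], None  # no target object detected
    best = max(target_pixels, key=lambda i: probs[i])
    return target_pixels, boxes[best]

probs = [0.1, 0.8, 0.95, 0.3]
boxes = [(0, 0, 10, 10), (5, 5, 60, 60), (6, 4, 58, 62), (0, 0, 5, 5)]
pixels, bbox = select_target(probs, boxes)
# pixels -> [1, 2]; bbox -> (6, 4, 58, 62)
```

A real detector would typically apply non-maximum suppression across overlapping boxes; the single-best-box rule above is only the simplest way to combine the two outputs the text describes.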
According to the image detection processing method provided by this embodiment, the original image is acquired and scaled to obtain an image to be detected that meets the input requirement of the multi-task detection model. Based on the image to be detected, the trained multi-task detection model produces a detection result, and the target object image in the image to be detected is determined from it, so the target object in the image can be detected accurately. Because the multi-task detection model is trained and optimized jointly on the two tasks of detection and illumination classification, it generalizes better to dim-light images, can accurately detect target objects in them, and effectively improves the accuracy and effect of target object detection in dim-light scenes.
The method provided by the above embodiment is further described in an additional embodiment of the present application.
As shown in fig. 3, a schematic flow chart of the processing method for image detection provided in this embodiment is shown.
As an implementable manner, on the basis of the foregoing embodiment, optionally, the scaling processing is performed on the original image to obtain an image to be detected, which specifically includes:
step 1021, scaling the original image to the size required for input of the multitask detection model.
Specifically, the image to be detected is an image that meets the input requirement of the multi-task detection model. For example, if the input of the multi-task detection model is set to 300 × 300, the original image is scaled to 300 × 300 after it is acquired, yielding the image to be detected. The size of the image to be detected can be set according to the actual multi-task detection model; this embodiment does not limit it.
As another implementable manner, on the basis of the foregoing embodiment, optionally, the network architecture of the multitasking detection model includes a shared network module, a detection network module, and an illumination classification network module; based on the image to be detected, a trained multi-task detection model is adopted to obtain a detection result, and the method specifically comprises the following steps:
and step 1031, performing image feature extraction on the image to be detected through the convolution layer of the shared network module to obtain a first feature map.
Step 1032, detecting the first feature map using a preset convolution kernel to obtain a detection result, where the detection result includes a probability value of whether each pixel in the first feature map is a target object and a prediction box position.
Specifically, the multi-task detection model is obtained by jointly training a detection task and an illumination classification task. Its network architecture includes a shared network module for the two tasks, a detection network module for the detection task, and an illumination classification network module for the illumination classification task. The detection task uses the loss function corresponding to the detection network module, the illumination classification task uses the loss function corresponding to the illumination classification network module, and the multi-task detection model (i.e. the trained multi-task detection model) is obtained through comprehensive optimization training. Specifically, image feature extraction is performed on the image to be detected through the convolution layer of the shared network module to obtain a first feature map, and the first feature map is detected with a preset convolution kernel to obtain a detection result, which includes a probability value of whether each pixel in the first feature map is the target object and a prediction box position.
The first feature map may include at least one pixel feature map; for example, the obtained first feature map may be a 38 × 38 feature map, or include 38 × 38 and 19 × 19 feature maps, or may include more feature maps, which can be set according to actual requirements.
Optionally, the shared network module and the detection network module may adopt an SSD (Single Shot MultiBox Detector) network as the backbone detection architecture; the backbone adds convolution layers on the basis of VGG16 (Oxford Visual Geometry Group 16-layer network) to obtain more feature maps for detection. The illumination classification network module and the detection network module share the front-end feature extraction layers (i.e. the shared network module), realizing hard parameter sharing for multi-task learning, so that the network can adjust its weights according to the classification loss function of the image discrimination task.
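As a rough illustration of why such a backbone yields a 38 × 38 first feature map from a 300 × 300 input (assuming VGG16-style halving pool layers with ceil rounding, as in the standard SSD design; this is an assumption about the backbone, not the patent's exact layer list):

```python
import math

# Why the first feature map is 38x38: each VGG16-style pooling stage halves
# the spatial size, and SSD uses ceil-mode pooling so odd sizes round up.
# A back-of-envelope sketch under the stated assumptions.

def pooled_sizes(input_size, num_pools):
    sizes = [input_size]
    for _ in range(num_pools):
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes

# 300 -> 150 -> 75 -> 38 after the first three pooling stages
print(pooled_sizes(300, 3))  # [300, 150, 75, 38]
```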
Illustratively, as shown in fig. 4, a network architecture diagram of the multitask detection model provided for this embodiment is shown.
Optionally, determining an image of the target object in the image to be detected according to the detection result, specifically including:
step 1041, determining a bounding box of the target object image in the image to be detected according to the predicted box position corresponding to each pixel in the first feature map, and determining the pixel which is the target object in the image to be detected according to the probability value of whether each pixel in the first feature map corresponds to the target object.
Step 1042, determining the target object image in the image to be detected according to the bounding box of the target object image in the image to be detected and the pixels of the target object.
Specifically, after the detection result is obtained, the bounding box of the target object image in the image to be detected can be determined according to the prediction box position corresponding to each pixel in the first feature map included in the detection result, and the pixels belonging to the target object in the image to be detected can be determined according to the probability value of whether each pixel in the first feature map corresponds to the target object. The target object image in the image to be detected is then determined according to the bounding box of the target object image and the pixels of the target object.
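A hedged sketch of the box-determination step: SSD-style detectors output prediction-box offsets relative to prior boxes, and an absolute bounding box is recovered as below. The offset convention (and the omission of SSD's variance scaling) follows the common SSD formulation and is illustrative, not the patent's exact procedure.

```python
import math

# Sketch of turning a per-pixel prediction-box output into an absolute
# bounding box. Offsets are interpreted relative to a prior box in the
# common SSD convention; variance terms are omitted for brevity.
# Illustrative only, not the patent's exact decoding code.

def decode_box(prior, offsets):
    """prior/offsets given as (cx, cy, w, h); returns (x1, y1, x2, y2)."""
    pcx, pcy, pw, ph = prior
    dcx, dcy, dw, dh = offsets
    cx = pcx + dcx * pw          # shift the prior centre
    cy = pcy + dcy * ph
    w = pw * math.exp(dw)        # scale the prior size
    h = ph * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```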
Optionally, after performing image feature extraction on the image to be detected through a convolution layer of the shared network module to obtain a first feature map, the method further includes:
Step 2011, convolving the first feature map to obtain at least one second feature map.
Correspondingly, detecting the first feature map by using a preset convolution kernel to obtain a detection result, which may specifically include:
step 2012, detecting the first feature map and the at least one second feature map by using a preset convolution kernel to obtain a detection result, where the detection result includes a probability value of whether each pixel in the first feature map is the target object and a first prediction box position, and a probability value of whether each pixel in each second feature map is the target object and a second prediction box position.
Specifically, in the detection network module, second feature maps may further be obtained from the first feature map; for example, from a 38 × 38 first feature map, second feature maps of 19 × 19, 10 × 10, 5 × 5, 3 × 3, 1 × 1, etc. may be obtained, so that more feature maps are available for detection. Different feature maps capture different information in the original image, further improving the accuracy of image detection.
After the second feature map is obtained, the first feature map and the at least one second feature map are detected with the preset convolution kernel to obtain the detection result.
After the detection result is obtained, the target object image in the image to be detected is determined based on the detection result.
As another implementable manner, on the basis of the foregoing embodiment, optionally before obtaining the detection result based on the image to be detected by using a trained multi-task detection model, the method further includes:
step 2021, acquiring training images and labeling data corresponding to each training image, where the training images include normal light images and dim light images.
Step 2022, training the pre-established multitask detection network based on the training image to obtain a multitask detection model, wherein the multitask detection network comprises a shared network module, a detection network module and an illumination classification network module.
Specifically, the multi-task detection model needs to be obtained before it is used, which can be done through training: training images and the annotation data corresponding to each training image can be obtained, where the training images include normal illumination images and dim light images, so that the trained multi-task detection model has better generalization capability and can accurately detect a target object in dim light images.
After the training image and the annotation data are obtained, a pre-established multi-task detection network can be trained based on the training image and the annotation data to obtain a multi-task detection model.
Optionally, training a pre-established multi-task detection network based on the training image to obtain a multi-task detection model, which may specifically include:
step 2031, the normal light image and the dim light image in the training image are alternately used as input images.
Step 2032, for each input image, extracting image features of the input image through the convolution layer of the shared network module to obtain a first training feature map, and determining at least one second training feature map according to the first training feature map.
Step 2033, detecting the first training feature map and the second training feature map with the preset convolution kernel of the detection network module to obtain a training detection result; and determining the illumination classification result corresponding to the first training feature map with the illumination classification network module.
Step 2034, adjusting the network parameters of the multi-task detection network based on the training detection result, the illumination classification result and a preset weighted loss function, wherein the weighted loss function is a weighting of the detection network module loss function and the illumination classification network module loss function.
Step 2035, obtaining the multi-task detection model when the final weighted loss function meets the preset requirement.
Specifically, a multi-task detection network can be established in advance and training data obtained, where the training data may include training images and the annotation data corresponding to the training images. To improve the generalization capability of the multi-task detection network, so that the trained multi-task detection model can detect a target object both in normal illumination images and in dim light images, the training images may include normal illumination images and dim light images. During training optimization, the multi-task detection network can learn to discriminate the illumination class of the images, tends to extract features common to dim light and normal illumination scenes when extracting image features, and avoids overfitting to normal illumination scenes.
For the training images, the normal illumination images and the dim light images can alternately be used as input images to train the multi-task detection network. The specific alternation rule may be set according to actual requirements; for example, one normal illumination image and one dim light image may alternate, two of each may alternate, or other alternation modes may be adopted.
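The alternation rule described above can be sketched as follows (the `block` parameter generalizes from one-for-one to two-for-two alternation; all names are illustrative):

```python
# A minimal sketch of the alternation rule for training input: interleave
# normal-light and dim-light images in blocks (block=1 gives one-for-one
# alternation, block=2 two-for-two). Names are illustrative.

def alternate(normal_imgs, dim_imgs, block=1):
    """Return block normal-light images, then block dim-light images, repeating."""
    out = []
    n, d = list(normal_imgs), list(dim_imgs)
    i = j = 0
    while i < len(n) or j < len(d):
        out.extend(n[i:i + block]); i += block
        out.extend(d[j:j + block]); j += block
    return out

print(alternate(["n1", "n2"], ["d1", "d2"]))  # ['n1', 'd1', 'n2', 'd2']
```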
For each input image, image feature extraction may be performed on the input image through the convolution layer of the shared network module to obtain a training feature map (called the first training feature map for distinction), and at least one second training feature map may be determined according to the first training feature map. Alternatively, when the shared network module extracts image features, multiple training feature maps may be extracted, that is, the first training feature map may include multiple training feature maps for subsequent training.
After the first training feature map and the second training feature maps are obtained, detection can be performed with the preset convolution kernel of the detection network module to obtain a training detection result; if only the first training feature map is obtained, the first training feature map is detected to obtain the training detection result. The illumination classification network module is used to determine the illumination classification result corresponding to the first training feature map; the illumination classification result may be normal illumination or dim light. The annotation data corresponding to each training image may include the actually labeled real bounding box of the target object in the training image and the actual illumination classification of the training image. After the training detection result is obtained, a weighted loss function is determined according to the annotation data, the training detection result and the illumination classification result corresponding to each training image, where the weighted loss function is a weighting of the detection network module loss function and the illumination classification network module loss function. The network parameters of the multi-task detection network are adjusted according to the weighted loss function, and the training process is repeated until the weighted loss function finally meets the preset requirement, so that the trained multi-task detection model is obtained. The weights of the two loss functions in the weighted loss function may be set according to actual requirements, and the embodiment of the present application is not limited thereto.
As an exemplary implementation, as shown in fig. 5, a schematic diagram of the network architecture of the multi-task detection model provided in this embodiment is shown. The shared convolution layers form the shared network module, the bounding box detection branch forms the detection network module, and the illumination condition classification branch forms the illumination classification network module. Here, taking a human face as the target object, the training process of the multi-task detection model for face detection is described in detail as follows:
1. face detection network (detection network module)
An SSD network is adopted as the face detection framework, and the backbone network adds convolution layers on the basis of VGG16 to obtain more feature maps for detection. The training image is scaled to 300 × 300 and fed into the neural network; image features are extracted through convolution layers, and a 38 × 38 feature map (i.e. the first training feature map) is obtained through three convolution modules. A 3 × 3 convolution kernel (i.e. the preset convolution kernel) is used to detect the first training feature map: for a feature map of size m × n, k prior boxes (k is 4 or 6) are preset for each cell, and with s classes (s = 2 in the face detection task), (s + 4) × k convolution kernels are needed to complete the detection of the first training feature map. Similarly, the network uses convolution kernels to make predictions on the 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1 feature maps (i.e. the second training feature maps), generating the prediction boxes corresponding to the respective prior boxes. The network then matches the generated prediction boxes to the real bounding boxes: for each real bounding box in the training image, the prediction box with the largest IoU (Intersection over Union) is selected to match it, and among the remaining prediction boxes, any whose IoU exceeds the threshold of 0.5 is also retained.
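The IoU computation and the matching rule described above (best-IoU prediction box per real bounding box, plus any further prediction boxes whose IoU exceeds 0.5) can be sketched as follows; boxes are assumed to be (x1, y1, x2, y2) tuples, an illustrative convention:

```python
# Sketch of the IoU (Intersection over Union) computation and the
# prediction-box/real-box matching rule; boxes are (x1, y1, x2, y2).
# Illustrative names, not the patent's code.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def match(pred_boxes, gt_box, keep_thresh=0.5):
    """Index of the best-IoU prediction, plus all others above the threshold."""
    scores = [iou(p, gt_box) for p in pred_boxes]
    best = max(range(len(scores)), key=scores.__getitem__)
    kept = [i for i, s in enumerate(scores) if s > keep_thresh and i != best]
    return best, kept
```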
Meanwhile, because the number of prediction boxes matched to real bounding boxes is too small, the positive and negative samples are unbalanced, so the negative samples are sampled: they are sorted in descending order of confidence error, and only the prediction boxes with the largest errors (the specific number can be set according to actual requirements) are selected as training negative samples, keeping the ratio of positive to negative samples close to 1:3 (the specific ratio can be set according to actual requirements).
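A minimal sketch of this hard-negative sampling step, under the stated 1:3 positive-to-negative ratio (the tuple representation of negatives is an illustrative assumption):

```python
# Sketch of the hard-negative sampling step: negatives are ranked by
# confidence error (descending) and only enough are kept to hold the
# positive:negative ratio near 1:3. Purely illustrative names.

def mine_hard_negatives(neg_errors, num_pos, ratio=3):
    """neg_errors: list of (index, confidence_error) for negative prediction
    boxes. Returns the indices of the hardest negatives, at most
    ratio * num_pos of them."""
    ranked = sorted(neg_errors, key=lambda t: t[1], reverse=True)
    return [i for i, _ in ranked[: ratio * num_pos]]

# 1 positive -> keep the 3 negatives with the largest confidence error
print(mine_hard_negatives([(0, .9), (1, .1), (2, .8), (3, .4)], num_pos=1))
# [0, 2, 3]
```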
The loss function of the face detection network is as follows:

$$L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g)\Big)$$

wherein x is the matching indicator in the output result (i.e. the training detection result, which may include s + 4 values corresponding to each pixel), c is the category confidence prediction value (i.e. the probability value of whether each pixel belongs to the face), l is the position prediction value of the prediction box corresponding to the prior box, g is the position parameter of the real labeling box (i.e. the real face bounding box), and N is the number of positive prior-box samples; $L_{conf}(x, c)$ is the confidence error, and $L_{loc}(x, l, g)$ is the position error. α can be set to 1 through cross validation, or set to another value according to actual requirements. Wherein:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\big(\hat{c}_i^{p}\big) - \sum_{i \in Neg} \log\big(\hat{c}_i^{0}\big)$$

$$\hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$

wherein $\hat{c}_i^{p}$ represents the probability that the i-th prior box predicts the p-th class, $x_{ij}^{p} \in \{0, 1\}$ is an indicator parameter, with $x_{ij}^{p} = 1$ indicating that the i-th prior box matches the j-th true bounding box, Pos denotes the positive samples, and Neg denotes the negative samples.
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\big(l_i^{m} - \hat{g}_j^{m}\big)$$

wherein $l_i^{m}$ is the predicted value of the bounding box (i.e. the position coordinate value of the prediction box), $\hat{g}_j^{m}$ is the real value of the bounding box (i.e. the position coordinate value of the real bounding box), and {cx, cy, w, h} are the center coordinates, width and height of the bounding box or prediction box; $\mathrm{smooth}_{L1}$ denotes the Smooth L1 loss, defined as follows:

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\, x^{2} & |x| < 1 \\ |x| - 0.5 & \text{otherwise} \end{cases}$$
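The Smooth L1 position error used above, written out as a minimal pure-Python sketch for concreteness: quadratic near zero, linear for large errors.

```python
# The Smooth L1 loss from the position-error term: quadratic inside
# |x| < 1, linear outside, so large localization errors are penalized
# less harshly than with a squared loss.

def smooth_l1(x):
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

# quadratic region vs. linear region
print(smooth_l1(0.5), smooth_l1(2.0))  # 0.125 1.5
```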
2. illumination condition discrimination network (i.e. illumination classification network module)
The illumination condition discrimination network and the face detection network share the feature extraction layers at the front end of the network, realizing hard parameter sharing for multi-task learning. The network can adjust its weights according to the classification loss function of the image discrimination task, tends to extract features common to dim light and normal illumination scenes when extracting image features, and avoids overfitting to normal illumination scenes. The illumination condition discrimination network branches off from the third convolution module of the SSD network, obtains 19 × 19 and 10 × 10 feature maps through convolution, and then realizes binary classification through two fully connected layers. To balance the difficulty between the two tasks, a temperature coefficient σ = 2 can be added to the network output layer, and the softened probability distribution is then obtained through a softmax layer.
The loss function of the lighting condition discrimination network is as follows:

$$\hat{y}_t = \frac{\exp(z_t / \sigma)}{\sum_{t} \exp(z_t / \sigma)}$$

$$L_{cls} = -\sum_{t} y_t \log(\hat{y}_t)$$

wherein $z_t$ is the original output of the network (i.e. the logits for normal light and dim light), $\hat{y}_t$ is the prediction result (i.e. the normalized prediction probability), $y_t$ is the real label (i.e. the labeled illumination classification label), and t = 0, 1.
3. Loss function weighting
In multi-task learning, gradient problems caused by imbalanced convergence rates among the tasks easily arise, so the weights among tasks need to be set appropriately. In this example, the face detection task is harder and converges more slowly than the illumination condition discrimination task, so a higher weight can be given to the face detection loss:

$$L = \lambda_1 L(x, c, l, g) + \lambda_2 L_{cls}$$

Illustratively, $\lambda_1 = 0.7$, $\lambda_2 = 0.3$.
The processing method for image detection provided by the embodiment of the present application builds on the strong ability of convolutional neural networks to extract image features: multi-scale feature information of the image is extracted through a CNN to detect faces in dim light, realizing an end-to-end neural network model. Because multi-task learning is adopted, the face detection task and the illumination condition discrimination task are trained simultaneously and share the feature extraction weights, which prevents the network from overfitting to the face data set under normal illumination and causes it to learn face features common to dim light and normal illumination, thereby achieving face detection under dim light conditions. Because hard parameter sharing is used, the number of parameters is greatly reduced, computation is saved, and inference speed is improved. In actual detection, only the face detection network is used for inference to obtain the corresponding bounding boxes; the result of the illumination condition discrimination network does not need to be computed, and NMS (Non-Maximum Suppression) is appended at the end of the network.
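The NMS step appended at the end of the network can be sketched as follows (greedy NMS with an illustrative IoU threshold of 0.5; not necessarily the patent's exact post-processing):

```python
# A minimal sketch of the Non-Maximum Suppression step used at inference
# time: keep the highest-scoring box, drop overlapping boxes above an IoU
# threshold, repeat. Boxes are (x1, y1, x2, y2); names are illustrative.

def nms(boxes, scores, iou_thresh=0.5):
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        ua = ((a[2] - a[0]) * (a[3] - a[1])
              + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / ua if ua > 0 else 0.0

    order = sorted(range(len(boxes)), key=scores.__getitem__, reverse=True)
    keep = []
    while order:
        i = order.pop(0)          # highest remaining score
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```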
Illustratively, as shown in fig. 6, a schematic diagram of a network detection process provided for this embodiment is shown.
It should be noted that the respective implementable modes in the present embodiment may be implemented individually, or may be implemented in combination in any combination without conflict, and the present application is not limited thereto.
According to the processing method for image detection provided by this embodiment, an original image is obtained and scaled to obtain an image to be detected that meets the input requirement of the multi-task detection model. Based on the image to be detected, a detection result is obtained with the trained multi-task detection model, and the target object image in the image to be detected is determined according to the detection result, so that the target object in the image can be accurately detected. Because the multi-task detection model is obtained by jointly training and optimizing the two tasks of detection and illumination classification, it generalizes better to target object detection in dim-light images, can accurately detect the target object in a dim-light image, and effectively improves the accuracy and effect of target object detection in dim-light scenes.
Still another embodiment of the present application provides an image detection processing apparatus, configured to execute the method of the foregoing embodiment.
As shown in fig. 7, a schematic structural diagram of the image detection processing apparatus provided in this embodiment is shown. The processing device 30 for image detection comprises an acquisition module 31, a preprocessing module 32, a detection module 33 and a determination module 34.
The acquisition module is used for acquiring an original image; the preprocessing module is used for carrying out scaling processing on the original image to obtain an image to be detected; the detection module is used for obtaining a detection result by adopting a trained multi-task detection model based on the image to be detected, and the multi-task detection model is obtained by performing optimization training by combining a detection task and an illumination classification task; and the determining module is used for determining the target object image in the image to be detected according to the detection result.
The specific manner in which the respective modules perform operations has been described in detail in the embodiments of the related method, and will not be elaborated upon here.
According to the processing device for image detection provided by this embodiment, an original image is obtained and scaled to obtain an image to be detected that meets the input requirement of the multi-task detection model. Based on the image to be detected, a detection result is obtained with the trained multi-task detection model, and the target object image in the image to be detected is determined according to the detection result, so that the target object in the image can be accurately detected. Because the multi-task detection model is obtained by jointly training and optimizing the two tasks of detection and illumination classification, it generalizes better to target object detection in dim-light images, can accurately detect the target object in a dim-light image, and effectively improves the accuracy and effect of target object detection in dim-light scenes.
The present application further provides a supplementary description of the apparatus provided in the above embodiments.
As an implementable manner, on the basis of the foregoing embodiment, optionally, the preprocessing module is specifically configured to:
the original image is scaled to the size required by the input of the multitask detection model.
As another implementable manner, on the basis of the foregoing embodiment, optionally, the network architecture of the multitasking detection model includes a shared network module, a detection network module, and an illumination classification network module; the detection module is specifically configured to:
performing image feature extraction on the image to be detected through the convolution layer of the shared network module to obtain a first feature map; and detecting the first feature map with a preset convolution kernel to obtain a detection result, wherein the detection result includes a probability value of whether each pixel in the first feature map is the target object and a prediction box position.
Optionally, the determining module is specifically configured to:
determining a boundary frame of a target object image in the image to be detected according to the position of a prediction frame corresponding to each pixel in the first characteristic diagram, and determining the pixel of the target object in the image to be detected according to the probability value of whether each pixel in the first characteristic diagram corresponds to the target object; and determining the target object image in the image to be detected according to the boundary frame of the target object image in the image to be detected and the pixels of the target object.
Optionally, the detection module is further configured to:
performing convolution on the first feature map to obtain at least one second feature map; correspondingly, detecting the first feature map with a preset convolution kernel to obtain a detection result specifically includes:
and detecting the first feature map and the at least one second feature map with a preset convolution kernel to obtain a detection result, wherein the detection result includes a probability value of whether each pixel in the first feature map is the target object and a first prediction box position, and a probability value of whether each pixel in each second feature map is the target object and a second prediction box position.
As another implementable manner, on the basis of the above embodiment, optionally, the obtaining module is further configured to obtain training images and annotation data corresponding to each training image, where the training images include normal light images and dark light images; the detection module is also used for training a pre-established multi-task detection network based on the training image to obtain a multi-task detection model, and the multi-task detection network comprises a shared network module, a detection network module and an illumination classification network module.
Optionally, the detection module is specifically configured to:
alternately taking a normal illumination image and a dim light image in the training image as input images; for each input image, performing image feature extraction on the input image through a convolutional layer of a shared network module to obtain a first training feature map, and determining at least one second training feature map according to the first training feature map; detecting the first training characteristic diagram and the second training characteristic diagram by adopting a preset convolution kernel of the detection network module to obtain a training detection result; determining an illumination classification result corresponding to the first training feature map by adopting an illumination classification network module; adjusting network parameters of the multi-task detection network based on the training detection result, the illumination classification result and a preset weighting loss function, wherein the weighting loss function comprises the weighting of the loss function of the detection network module and the loss function of the illumination classification network module; and obtaining the multi-task detection model until the final weighting loss function meets the preset requirement.
The specific manner in which the respective modules perform operations has been described in detail in the embodiments of the related method, and will not be elaborated upon here.
It should be noted that the respective implementable modes in the present embodiment may be implemented individually, or may be implemented in combination in any combination without conflict, and the present application is not limited thereto.
According to the processing device for image detection of this embodiment, an original image is obtained and scaled to obtain an image to be detected that meets the input requirement of the multi-task detection model. Based on the image to be detected, a detection result is obtained with the trained multi-task detection model, and the target object image in the image to be detected is determined according to the detection result, so that the target object in the image can be accurately detected. Because the multi-task detection model is obtained by jointly training and optimizing the two tasks of detection and illumination classification, the device generalizes better to target object detection in dim-light images, can accurately detect the target object in a dim-light image, and effectively improves the accuracy and effect of target object detection in dim-light scenes.
Yet another embodiment of the present application provides an electronic device for performing the method provided by the foregoing embodiment.
As shown in fig. 8, is a schematic structural diagram of the electronic device provided in this embodiment. The electronic device 50 includes: at least one processor 51 and memory 52;
the memory stores computer-executable instructions; the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform a method as provided by any of the embodiments above.
According to the electronic device of this embodiment, an original image is obtained and scaled to obtain an image to be detected that meets the input requirement of the multi-task detection model. Based on the image to be detected, a detection result is obtained with the trained multi-task detection model, and the target object image in the image to be detected is determined according to the detection result, so that the target object in the image can be accurately detected. Because the multi-task detection model is obtained by jointly training and optimizing the two tasks of detection and illumination classification, the electronic device generalizes better to target object detection in dim-light images, can accurately detect the target object in a dim-light image, and effectively improves the accuracy and effect of target object detection in dim-light scenes.
Yet another embodiment of the present application provides a computer-readable storage medium, in which computer-executable instructions are stored, and when the processor executes the computer-executable instructions, the method provided in any one of the above embodiments is implemented.
According to the computer-readable storage medium of this embodiment, an original image is obtained and scaled to obtain an image to be detected that meets the input requirement of the multi-task detection model. Based on the image to be detected, a detection result is obtained with the trained multi-task detection model, and the target object image in the image to be detected is determined according to the detection result, so that the target object in the image can be accurately detected. Because the multi-task detection model is obtained by jointly training and optimizing the two tasks of detection and illumination classification, target object detection in dim-light images generalizes better, the target object in a dim-light image can be accurately detected, and the accuracy and effect of target object detection in dim-light scenes are effectively improved.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice; for instance, multiple units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the division into the functional modules described above is merely an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A processing method for image detection, comprising:
acquiring an original image;
scaling the original image to obtain an image to be detected;
based on the image to be detected, a trained multi-task detection model is adopted to obtain a detection result, and the multi-task detection model is obtained by performing optimization training by combining a detection task and an illumination classification task;
and determining the target object image in the image to be detected according to the detection result.
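The four steps of claim 1 can be sketched end to end as follows; the model itself is a stand-in stub (a real multi-task detection model would be a trained network), so every name, shape, and value here is illustrative only, not taken from the patent:

```python
import numpy as np

def stub_multitask_model(image):
    """Stand-in for the trained multi-task detection model: returns a per-pixel
    objectness probability map and per-pixel box positions (both fabricated)."""
    h, w = image.shape[:2]
    probs = np.full((h, w), 0.1)
    probs[h // 2, w // 2] = 0.9                      # pretend one object was found
    boxes = np.zeros((4, h, w))
    return probs, boxes

def detect(original_image, input_hw=(300, 300), threshold=0.5):
    """Claim 1 pipeline: acquire -> scale -> run model -> pick target pixels."""
    h, w = original_image.shape[:2]
    th, tw = input_hw
    # nearest-neighbour scaling to the model's (assumed) input size
    scaled = original_image[np.arange(th) * h // th][:, np.arange(tw) * w // tw]
    probs, boxes = stub_multitask_model(scaled)
    ys, xs = np.nonzero(probs > threshold)           # pixels judged to be the target
    return list(zip(ys.tolist(), xs.tolist()))
```

The 300x300 input size and 0.5 threshold are assumptions; the claims fix neither.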
2. The method according to claim 1, wherein the scaling the original image to obtain the image to be detected comprises:
scaling the original image to the size required by the input of the multi-task detection model.
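As a minimal sketch of the scaling step in claim 2, the resize below uses nearest-neighbour sampling in NumPy; the 300x300 default is an assumed input size (the claim does not fix one), and a real implementation would typically use a library resize with interpolation:

```python
import numpy as np

def scale_to_model_input(image: np.ndarray, target_hw=(300, 300)) -> np.ndarray:
    """Resize an H x W x C image to the model's required input size.

    Nearest-neighbour sampling keeps the sketch self-contained; the
    300x300 default is an assumed input size, not one stated in the claims.
    """
    h, w = image.shape[:2]
    th, tw = target_hw
    rows = np.arange(th) * h // th   # source row index for each target row
    cols = np.arange(tw) * w // tw   # source column index for each target column
    return image[rows][:, cols]
```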
3. The method of claim 1, wherein the network architecture of the multi-task detection model comprises a shared network module, a detection network module and an illumination classification network module;
the obtaining a detection result by using the trained multi-task detection model based on the image to be detected comprises:
performing image feature extraction on the image to be detected through the convolution layer of the shared network module to obtain a first feature map;
and detecting the first feature map by using a preset convolution kernel to obtain a detection result, wherein the detection result comprises, for each pixel in the first feature map, a probability value of whether the pixel is a target object and a prediction box position.
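The per-pixel detection step can be sketched as follows, assuming a 1x1 convolution kernel (so the convolution reduces to a per-pixel linear map over the channels) and a sigmoid for the objectness probability; the weight shapes are illustrative, not taken from the patent:

```python
import numpy as np

def detection_head(feature_map, w_cls, w_box):
    """Apply a preset 1x1 convolution kernel to a C x H x W feature map.

    Returns, for every pixel, an objectness probability (H x W) and a
    predicted box position (4 x H x W). w_cls is 1 x C and w_box is 4 x C;
    both stand in for learned weights.
    """
    c, h, w = feature_map.shape
    flat = feature_map.reshape(c, h * w)             # C x (H*W)
    probs = 1.0 / (1.0 + np.exp(-(w_cls @ flat)))    # sigmoid objectness per pixel
    boxes = (w_box @ flat).reshape(4, h, w)          # per-pixel box regression
    return probs.reshape(h, w), boxes
```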
4. The method according to claim 3, wherein the determining the target object image in the image to be detected according to the detection result comprises:
determining a bounding box of the target object image in the image to be detected according to the prediction box position corresponding to each pixel in the first feature map, and determining the pixels of the target object in the image to be detected according to the probability value of whether each pixel in the first feature map is the target object;
and determining the target object image in the image to be detected according to the bounding box of the target object image and the pixels of the target object.
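The decoding described in claim 4 can be sketched as thresholding the per-pixel probabilities and keeping the prediction box at each surviving pixel; the 0.5 threshold is an assumption, and non-maximum suppression (which the claim does not mention) is omitted:

```python
import numpy as np

def select_object_boxes(probs, boxes, threshold=0.5):
    """Return the predicted box for every pixel whose objectness probability
    exceeds the (assumed) 0.5 threshold. probs is H x W, boxes is 4 x H x W."""
    ys, xs = np.nonzero(probs > threshold)
    return [tuple(boxes[:, y, x]) for y, x in zip(ys, xs)]
```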
5. The method according to claim 3, wherein after the image feature extraction is performed on the image to be detected through the convolution layer of the shared network module to obtain the first feature map, the method further comprises:
performing convolution on the first feature map to obtain at least one second feature map;
and the detecting the first feature map by using a preset convolution kernel to obtain a detection result comprises:
detecting the first feature map and the at least one second feature map by using a preset convolution kernel to obtain a detection result, wherein the detection result comprises, for each pixel in the first feature map, a probability value of whether the pixel is a target object and a first prediction box position, and, for each pixel in each second feature map, a probability value of whether the pixel is the target object and a second prediction box position.
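A sketch of claim 5's multi-scale step, where each second feature map is derived from the previous one at half the spatial resolution; a 2x2 max pool stands in for the convolution the claim actually uses, which is an assumption:

```python
import numpy as np

def halve_resolution(feature_map):
    """Produce a feature map at half the spatial resolution of a C x H x W
    input. A 2x2 max pool stands in for the stride-2 convolution."""
    c, h, w = feature_map.shape
    fm = feature_map[:, : h - h % 2, : w - w % 2]       # crop to even size
    return fm.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def multi_scale_feature_maps(first_map, num_scales=2):
    """The first feature map plus num_scales successively halved second maps;
    the detection kernel would then be applied to every map in the list."""
    maps = [first_map]
    for _ in range(num_scales):
        maps.append(halve_resolution(maps[-1]))
    return maps
```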
6. The method according to any one of claims 1 to 5, wherein before obtaining the detection result based on the image to be detected by using the trained multi-task detection model, the method further comprises:
acquiring training images and label data corresponding to the training images, wherein the training images comprise normal illumination images and dim light images;
and training a pre-established multi-task detection network based on the training image to obtain the multi-task detection model, wherein the multi-task detection network comprises a shared network module, a detection network module and an illumination classification network module.
7. The method of claim 6, wherein the training a pre-established multi-task detection network based on the training images to obtain the multi-task detection model comprises:
alternately taking a normal illumination image and a dim light image from the training images as input images;
for each input image, performing image feature extraction on the input image through the convolution layer of the shared network module to obtain a first training feature map, and determining at least one second training feature map according to the first training feature map;
detecting the first training feature map and the second training feature map by using a preset convolution kernel of the detection network module to obtain a training detection result, and determining, by using the illumination classification network module, an illumination classification result corresponding to the first training feature map;
adjusting network parameters of the multi-task detection network based on the training detection result, the illumination classification result and a preset weighted loss function, wherein the weighted loss function comprises a weighting of the detection network module loss function and the illumination classification network module loss function;
and obtaining the multi-task detection model when the weighted loss function meets the preset requirement.
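The weighted loss in claim 7 can be sketched as a fixed convex combination of the two branch losses; the weight alpha = 0.7 and the use of binary cross-entropy for the illumination branch are assumptions, since the claim only states that the total loss weights the detection loss and the illumination classification loss:

```python
import numpy as np

def illumination_bce(prob_normal, label_normal):
    """Binary cross-entropy for the illumination classification branch
    (label 1: normal-light image, label 0: dim-light image)."""
    eps = 1e-7
    p = np.clip(prob_normal, eps, 1.0 - eps)
    return -(label_normal * np.log(p) + (1 - label_normal) * np.log(1 - p))

def weighted_multitask_loss(detection_loss, prob_normal, label_normal, alpha=0.7):
    """L = alpha * L_det + (1 - alpha) * L_illum, with an assumed alpha."""
    return alpha * detection_loss + (1.0 - alpha) * illumination_bce(prob_normal, label_normal)
```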
8. A processing apparatus for image detection, comprising:
the acquisition module is used for acquiring an original image;
the preprocessing module is used for carrying out scaling processing on the original image to obtain an image to be detected;
the detection module is used for obtaining a detection result by adopting a trained multi-task detection model based on the image to be detected, and the multi-task detection model is obtained by performing optimization training by combining a detection task and an illumination classification task;
and the determining module is used for determining the target object image in the image to be detected according to the detection result.
9. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of any one of claims 1-7.
10. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-7.
CN202010940806.4A 2020-09-09 2020-09-09 Image detection processing method, device, equipment and storage medium Pending CN114170642A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010940806.4A CN114170642A (en) 2020-09-09 2020-09-09 Image detection processing method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN114170642A true CN114170642A (en) 2022-03-11

Family

ID=80475510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010940806.4A Pending CN114170642A (en) 2020-09-09 2020-09-09 Image detection processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114170642A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115272682A (en) * 2022-07-29 2022-11-01 上海弘玑信息技术有限公司 Target object detection method, target detection model training method and electronic equipment
CN115439700A (en) * 2022-11-03 2022-12-06 深圳比特微电子科技有限公司 Image processing method and device and machine-readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886117A (en) * 2017-10-30 2018-04-06 国家新闻出版广电总局广播科学研究院 The algorithm of target detection merged based on multi-feature extraction and multitask
WO2019084189A1 (en) * 2017-10-26 2019-05-02 Magic Leap, Inc. Gradient normalization systems and methods for adaptive loss balancing in deep multitask networks
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
WO2019183758A1 (en) * 2018-03-26 2019-10-03 Intel Corporation Methods and apparatus for multi-task recognition using neural networks
CN111444828A (en) * 2020-03-25 2020-07-24 腾讯科技(深圳)有限公司 Model training method, target detection method, device and storage medium
CN111582043A (en) * 2020-04-15 2020-08-25 电子科技大学 High-resolution remote sensing image ground object change detection method based on multitask learning
CN111626176A (en) * 2020-05-22 2020-09-04 中国科学院空天信息创新研究院 Ground object target detection method and system of remote sensing image


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUN-CHENG CHEN ET AL.: "A Real-Time Multi-Task Single Shot Face Detector" *
XI YIN ET AL.: "Multi-Task Convolutional Neural Network for Pose-Invariant Face Recognition" *
GONG LIMING: "Design of a Face Recognition Experimental Platform and Implementation of Face Recognition Algorithms" *


Similar Documents

Publication Publication Date Title
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN109478239B (en) Method for detecting object in image and object detection system
CN111178183B (en) Face detection method and related device
US20220051405A1 (en) Image processing method and apparatus, server, medical image processing device and storage medium
JP2021520530A (en) Biological detection methods and devices, electronic devices and storage media
CN109871780B (en) Face quality judgment method and system and face identification method and system
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN108229673B (en) Convolutional neural network processing method and device and electronic equipment
KR102476022B1 (en) Face detection method and apparatus thereof
CN111626163B (en) Human face living body detection method and device and computer equipment
CN111428664B (en) Computer vision real-time multi-person gesture estimation method based on deep learning technology
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN112884782B (en) Biological object segmentation method, apparatus, computer device, and storage medium
CN109508636A (en) Vehicle attribute recognition methods, device, storage medium and electronic equipment
CN114170642A (en) Image detection processing method, device, equipment and storage medium
CN111310837A (en) Vehicle refitting recognition method, device, system, medium and equipment
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN115147936A (en) Living body detection method, electronic device, storage medium, and program product
CN112818774A (en) Living body detection method and device
CN116994049A (en) Full-automatic flat knitting machine and method thereof
CN115311723A (en) Living body detection method, living body detection device and computer-readable storage medium
JP2024516642A (en) Behavior detection method, electronic device and computer-readable storage medium
CN114120208A (en) Flame detection method, device, equipment and storage medium
CN116958615A (en) Picture identification method, device, equipment and medium
CN111191575B (en) Naked flame detection method and system based on flame jumping modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220311
