CN111738934B - Automatic red eye repairing method based on MTCNN - Google Patents

Automatic red eye repairing method based on MTCNN

Info

Publication number: CN111738934B
Application number: CN202010413910.8A
Authority: CN (China)
Prior art keywords: red, face, eye, eyes, image
Legal status: Active (granted)
Other versions: CN111738934A
Other languages: Chinese (zh)
Inventors: 苏雪平; 高蒙; 陈宁; 任劼; 李云红; 朱丹尧; 段嘉伟
Current assignee: Xian Polytechnic University
Original assignee: Xian Polytechnic University
Application filed by Xian Polytechnic University
Priority to CN202010413910.8A
Publication of CN111738934A
Application granted; publication of CN111738934B

Classifications

    • G06T 5/77: Image enhancement or restoration; retouching; inpainting; scratch removal
    • G06T 5/70: Image enhancement or restoration; denoising; smoothing
    • G06N 3/045: Neural network architectures; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/25: Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 40/171: Human faces; local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V 40/193: Eye characteristics, e.g. of the iris; preprocessing; feature extraction
    • G06T 2207/20028: Special algorithmic details; filtering details; bilateral filtering
    • G06T 2207/30201: Subject of image; human being; face


Abstract

The invention discloses an automatic red eye repair method based on MTCNN, implemented according to the following steps: Step 1, input a red eye image into an MTCNN network, which detects the human face and returns the face position together with the horizontal and vertical coordinates of the pupils of the two eyes, the nose tip, and the left and right mouth corners; Step 2, calculate the pupil distance of the two eyes from the pupil coordinates obtained in Step 1, then expand it by a set proportion and adjust the parameters to obtain the ROI; Step 3, perform red eye masking, pupil mask cleaning and red eye repair on the ROI obtained in Step 2, and finally copy the processed image back to the eye region of the original image to obtain the repaired face image. The method is fully automatic, has a low false detection rate, and repairs red eye quickly.

Description

Automatic red eye repairing method based on MTCNN
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an automatic red eye repairing method based on MTCNN.
Background
Red eye is the red spot that appears at the pupil of a human eye in a photograph, caused by the camera flash. In dim ambient light the pupil dilates, and when the eye suddenly receives intense flash light, the light reflects off the blood-rich tissue at the back of the eye, so the pupil appears red in the photo. Red eye contrasts strongly with a person's natural eye color, which lowers the quality of the photo. Red eye is a common discordant factor in photography, and many scholars have proposed red eye repair methods, falling mainly into two categories: fully automatic repair and semi-automatic repair. The principle of a semi-automatic red eye repair algorithm is as follows: the ROI (Region of Interest) containing the red eye is first selected manually, then the eye position is determined with a corresponding algorithm, and finally the eye pixels are adjusted to repair the red eye. Although semi-automatic red eye repair is accurate, it requires manual processing and cannot scale to large amounts of data. The basic principle of an automatic red eye repair algorithm is as follows: certain characteristics of the eyes are used to locate the red eye automatically with a corresponding method, and the red eye is then repaired. Although automatic red eye repair needs no manual processing, existing methods are inefficient, slow, easily disturbed by noise, and not robust. Overall, existing red eye repair methods therefore suffer from relatively low repair speed and a relatively high false detection rate.
Disclosure of Invention
The invention aims to provide an automatic red eye repairing method based on MTCNN, which solves the problems of relatively low red eye repairing speed and relatively high false detection rate in the red eye repairing method in the prior art.
The technical solution adopted by the invention is as follows.
the automatic red eye repairing method based on the MTCNN is implemented according to the following steps:
step1, inputting a red eye image into an MTCNN network, wherein the MTCNN network detects a human face and returns the position of the human face and the horizontal and vertical coordinates of the pupils, the nasal tips, the left mouth corner and the right mouth corner of the human face;
step2, calculating the pupil distance of the eyes according to the pupil coordinates of the eyes of the face obtained in the step1, and then expanding the proportion to obtain the ROI after parameter adjustment;
and 3, performing operations of shielding red eyes, cleaning pupil masks and repairing the red eyes on the ROI obtained in the step2, and finally copying the processed image to an eye area of the original image to obtain a repaired face image.
The present invention is also characterized in that,
the step1 is specifically implemented according to the following steps:
step 1.1, creating an image pyramid according to the set size of an input red eye image, and performing multi-stage scaling on the red eye image to obtain a group of input images with different sizes;
step 1.2, inputting a group of images with different sizes into a P-Net, generating a feature map through a convolution layer and a pooling layer with different sizes in sequence, judging face contour points through the feature map, generating face candidate frames and frame regression vectors after the images are analyzed and processed by the P-Net, and obtaining a plurality of face candidate frames after recalibration;
step 1.3, inputting the plurality of face candidate boxes obtained in the step 1.2 into R-Net for further training; continuously removing the face candidate frames which do not reach the standard through the set threshold value, and inhibiting and removing the face candidate frames with high overlapping by using a non-maximum value to obtain a plurality of face candidate frames after further training;
and 1.4, inputting the plurality of face candidate boxes obtained in the step 1.3 after further training into an O-Net network, and finally outputting the face position and characteristic points of the horizontal and vertical coordinates of the pupils, the nasal tips, the left and right mouth corners of the eyes of the face after the O-Net network further accurately positions the face position.
In Step 2, the pupil distance of the two eyes is calculated specifically as follows:
from the binocular pupil coordinates returned by face detection, the distance between the pupils is calculated with formula (6):

$D_{lr} = \sqrt{(x_r - x_l)^2 + (y_r - y_l)^2}$    (6)

where $D_{lr}$ is the distance between the pupils of the left and right eyes of the face, $(x_l, y_l)$ are the horizontal and vertical coordinates of the left pupil, and $(x_r, y_r)$ are the horizontal and vertical coordinates of the right pupil.
In Step 2, the proportional expansion is specifically implemented as follows:
using the pupil distance of the two eyes, the ROIs of the left and right eyes of the face are marked with rectangular boxes at a set proportion, the corner coordinates being computed from the pupil coordinates and the pupil distance,
where $LEL_{x,y}$ are the upper-left corner coordinates of the left-eye rectangular box, $LER_{x,y}$ the lower-right corner coordinates of the left-eye rectangular box, $REL_{x,y}$ the upper-left corner coordinates of the right-eye rectangular box, and $RER_{x,y}$ the lower-right corner coordinates of the right-eye rectangular box; imw and imh denote the width and height of the face image, respectively.
Step 3 is specifically implemented according to the following steps:
Step 3.1, splitting the ROI into its red, green and blue channels, then creating a mask so that only the red pupil region is processed; finally, setting the extracted red pupil region to white and all other regions to black;
Step 3.2, performing contour detection on the created mask, extracting the white regions that may be red eye, computing the area enclosed by each white contour, and keeping the contour with the largest area and its pixels, thereby locating the red eye precisely; then performing a closing operation on the red eye region to remove the noise points inside it;
Step 3.3, creating an average channel by averaging the green and blue channels, replacing the pixel values of all three channels inside the red eye region with the average-channel values, merging the red, green and blue channels, smoothing and denoising the repaired region with bilateral filtering, and finally obtaining the repaired face image.
In Step 3.3, the smoothing and denoising of the repaired region by bilateral filtering is specifically performed according to formula (13):

$g(k,l) = \dfrac{\sum_{i,j} f(i,j)\, w(i,j,k,l)}{\sum_{i,j} w(i,j,k,l)}$    (13)

where the weight $w(i,j,k,l)$ is the product of the spatial-domain kernel $w_d(i,j,k,l)$ and the range kernel $w_r(i,j,k,l)$, as in formula (14):

$w(i,j,k,l) = w_d(i,j,k,l) \cdot w_r(i,j,k,l)$, with $w_d(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right)$ and $w_r(i,j,k,l) = \exp\left(-\frac{\|f(i,j) - f(k,l)\|^2}{2\sigma_r^2}\right)$    (14)

where q(i,j) is the coordinate of another coefficient of the template window; p(k,l) is the center coordinate of the template window; $\sigma_d$ and $\sigma_r$ are the standard deviations of the Gaussian functions; f(i,j) denotes the pixel value of the image at point q(i,j); and f(k,l) denotes the pixel value of the image at point p(k,l).
The beneficial effects of the invention are as follows: the automatic red eye repair method based on MTCNN builds on recent face detection research with convolutional neural networks and combines the advantages of MTCNN, improving both the face detection rate and the detection speed; it removes the discordant red eye factor from images and repairs red eyes in face images. The method is fully automatic and achieves a human eye detection rate of 94.74%, a human eye false detection rate of 3.57%, a red eye repair rate of 84.11%, and an average repair time of 347.51 milliseconds per red eye image.
Drawings
Fig. 1 is a schematic diagram of an automatic red eye repair method based on MTCNN according to the present invention;
FIG. 2 is a P-Net network diagram of the automatic red eye repair method based on the MTCNN of the present invention;
FIG. 3 is an R-Net network diagram of the automatic red eye repair method based on the MTCNN of the present invention;
fig. 4 is an O-Net network diagram of the automatic red eye repair method based on MTCNN of the present invention.
Detailed Description
The invention relates to an automatic red eye repairing method based on MTCNN, which is described in detail below with reference to the accompanying drawings and the detailed description.
As shown in fig. 1, the automatic red eye repair method based on MTCNN is specifically implemented according to the following steps:
Step 1, inputting a red eye image into an MTCNN network, wherein the MTCNN network detects the human face and returns the face position and the horizontal and vertical coordinates of the pupils of the two eyes, the nose tip, and the left and right mouth corners of the face;
Step 2, calculating the pupil distance of the two eyes according to the pupil coordinates of the face obtained in Step 1, then expanding it by a set proportion and adjusting the parameters to obtain the ROI (Region of Interest);
Step 3, performing red eye masking, pupil mask cleaning and red eye repair on the ROI obtained in Step 2, and finally copying the processed image back to the eye region of the original image to obtain the repaired face image.
Step 1.1, creating an image pyramid according to the set size of the input red eye image, and scaling the red eye image over multiple levels to obtain a group of input images of different sizes;
Step 1.2, inputting the group of images of different sizes into a fully convolutional network (P-Net), where they pass in turn through convolution and pooling layers of different sizes to generate feature maps from which face contour points are judged; after the images are analyzed and processed by P-Net, face candidate boxes and box regression vectors are generated, and a number of face candidate boxes are obtained after recalibration;
Step 1.3, inputting the face candidate boxes obtained in Step 1.2 into R-Net for further refinement; candidate boxes that do not reach the set threshold are removed, and highly overlapping candidate boxes are removed by non-maximum suppression (NMS), yielding a refined set of face candidate boxes;
Step 1.4, inputting the refined face candidate boxes obtained in Step 1.3 into the O-Net network, which further localizes the face position precisely and finally outputs the face position and the feature points, namely the horizontal and vertical coordinates of the pupils of the two eyes, the nose tip, and the left and right mouth corners.
Further, in Step 2, the pupil distance of the two eyes is calculated specifically as follows:
from the binocular pupil coordinates returned by face detection, the distance between the pupils is calculated with formula (6):

$D_{lr} = \sqrt{(x_r - x_l)^2 + (y_r - y_l)^2}$    (6)

where $D_{lr}$ is the distance between the pupils of the left and right eyes of the face, $(x_l, y_l)$ are the horizontal and vertical coordinates of the left pupil, and $(x_r, y_r)$ are the horizontal and vertical coordinates of the right pupil.
Further, in Step 2, the proportional expansion is specifically implemented as follows:
using the pupil distance of the two eyes, the ROIs of the left and right eyes of the face are marked with rectangular boxes at a set proportion, the corner coordinates being computed from the pupil coordinates and the pupil distance,
where $LEL_{x,y}$ are the upper-left corner coordinates of the left-eye rectangular box, $LER_{x,y}$ the lower-right corner coordinates of the left-eye rectangular box, $REL_{x,y}$ the upper-left corner coordinates of the right-eye rectangular box, and $RER_{x,y}$ the lower-right corner coordinates of the right-eye rectangular box; imw and imh denote the width and height of the face image, respectively.
Further, Step 3 is specifically implemented according to the following steps:
Step 3.1, splitting the ROI into its red, green and blue channels, then creating a mask so that only the red pupil region is processed; finally, setting the extracted red pupil region to white and all other regions to black;
Step 3.2, performing contour detection on the created mask, extracting the white regions that may be red eye, computing the area enclosed by each white contour, and keeping the contour with the largest area and its pixels, thereby locating the red eye precisely; then performing a closing operation on the red eye region to remove the noise points inside it;
Step 3.3, creating an average channel by averaging the green and blue channels, replacing the pixel values of all three channels inside the red eye region with the average-channel values, merging the red, green and blue channels, smoothing and denoising the repaired region with bilateral filtering, and finally obtaining the repaired face image.
In Step 3.3, the smoothing and denoising of the repaired region by bilateral filtering is specifically performed according to formula (13):

$g(k,l) = \dfrac{\sum_{i,j} f(i,j)\, w(i,j,k,l)}{\sum_{i,j} w(i,j,k,l)}$    (13)

where the weight $w(i,j,k,l)$ is the product of the spatial-domain kernel $w_d(i,j,k,l)$ and the range kernel $w_r(i,j,k,l)$, as in formula (14):

$w(i,j,k,l) = w_d(i,j,k,l) \cdot w_r(i,j,k,l)$, with $w_d(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right)$ and $w_r(i,j,k,l) = \exp\left(-\frac{\|f(i,j) - f(k,l)\|^2}{2\sigma_r^2}\right)$    (14)

where q(i,j) is the coordinate of another coefficient of the template window; p(k,l) is the center coordinate of the template window; $\sigma_d$ and $\sigma_r$ are the standard deviations of the Gaussian functions; f(i,j) denotes the pixel value of the image at point q(i,j); and f(k,l) denotes the pixel value of the image at point p(k,l).
The automatic red eye repair method based on the MTCNN is further described in detail through specific examples.
Examples
The automatic red eye repair method based on MTCNN of the invention comprises the following parts:
(1) MTCNN-based face detection
An input red eye image is first fed into the MTCNN network, which detects the human face and returns the face position and the key point coordinates of the face. The specific steps are as follows:
step1: for a given input Image, an Image pyramid (image_pychlamid) is first created according to a set size (minsize), and the Image is subjected to a multi-level scaling (scale) operation, so as to obtain a set of input images with different sizes. Scale=0.7 and minsize=12, as chosen herein.
Step2: the set of images of different sizes from the image pyramid in Step1 are input into a full convolutional neural network (P-Net), as shown in fig. 2. The input layer size of the P-Net network is 12 x 3, the first convolution layer size is 3 x 10, and the maximum pooling layer size is 2 x 2, so as to generate 10 5*5 feature maps; the second convolution layer has a size of 3 x 16, and generates 16 3*3 feature maps; the third convolution layer has a size of 3 x 32, generating 32 signature graphs of 1*1. Finally, for 32 feature maps 1*1, firstly, generating 2 feature maps 1*1 for face classification through 2 convolution kernels of 1 x 32; secondly, generating 4 1*1 feature maps for judging a regression frame through 4 convolution kernels of 1 x 32; finally, 10 characteristic graphs 1*1 are generated through 10 convolution kernels of 1 x 32 and are used for judging the face contour points. The image is analyzed and processed by P-Net to generate face candidate frames and frame regression vectors, the layer network is firstly calibrated according to a set threshold (threshold), the face candidate frames which do not reach standards are removed, and Non-maximum suppression (Non-Maximum Suppression, NMS) is used for removing the face candidate frames which are highly overlapped.
Step3: inputting the candidate frames generated in Step2 into R-Net for further training, continuously removing the non-standard face candidate frames through the set threshold value, and removing the highly overlapped face candidate frames by NMS. As shown in fig. 3, the R-Net network has an R-Net input layer size of 24×24×3, a first convolution layer size of 3×3×28, and a maximum pooling layer size of 3*3, so as to generate 28 feature maps of 11×11. The second convolution layer size was 3 x 48, and the largest pooling layer size was 3*3, yielding 48 4*4 feature maps. The third convolution layer size is 2 x 64, generating 64 3*3 feature maps. The 64 3*3 feature maps are input to a 128-dimensional fully connected layer. Unlike step2, finally, face classification is performed using a full-connection layer with dimension 2, bounding box regression is performed using a full-connection layer with dimension 4, and face key point positioning is performed using a full-connection layer with dimension 10.
Step4: the several candidate boxes generated in Step3 are input into the O-Net network as shown in fig. 4. The size of the O-Net input layer is 48 x 3, the size of the first convolution layer is 3 x 32, and 32 feature maps of 23 x 23 are generated by adopting a maximum pooling layer of 3*3 size; the second convolution layer is 3 x 63, and the maximum pooling layer of 3*3 is adopted to generate 64 10 x 10 feature maps; the third convolution layer is 3 x 64, and the largest pooling layer with the size of 2 x 2 is adopted to generate 64 4*4 characteristic diagrams; the fourth convolution layer size is 2 x 128, and 128 feature maps of 3*3 size are generated; finally, 128 3*3-sized feature maps are connected to a 256-dimensional full connection layer. And finally, respectively carrying out face classification, bounding box regression and face key point positioning by using full-connection layers with dimensions of 2, 4 and 10. The O-Net is similar to the former two steps in removing the face candidate frame, and the face candidate frame is different from the two networks in further precisely positioning the face position, and finally outputting 5 characteristic points (pupil of eyes, nose tip, left and right mouth corners) of the face.
The thresholds selected for the three networks are 0.6, 0.7 and 0.7, respectively. The convolution layers use a sliding stride of 1 without zero padding, and the pooling layers use a sliding stride of 2 with zero padding. The activation function is PReLU, whose expression is

$\mathrm{PReLU}(x) = \begin{cases} x, & x > 0 \\ a x, & x \le 0 \end{cases}$

where a is a learnable slope for negative inputs.
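As a quick illustration, PReLU can be written in a couple of lines of numpy; the fixed default slope `a = 0.25` below stands in for the learned per-channel parameter:

```python
import numpy as np

def prelu(x, a=0.25):
    """Parametric ReLU: identity for positive inputs, slope `a` otherwise."""
    return np.where(x > 0, x, a * x)
```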
for sample x i The judgment cross entropy loss function of the face is as follows:
wherein the method comprises the steps ofA true class label representing a face, 0 represents a non-face,1 represents a human face, p i Represents x i Probability of being a human face.
The face candidate box regression adopts a Euclidean distance loss:

$L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2$

where $y_i^{box}$ denotes the real coordinates of the face candidate box and $\hat{y}_i^{box}$ denotes the candidate box predicted by the network; the box coordinates comprise the abscissa and ordinate of the upper-left corner and the height and width of the candidate box.
The face feature point localization adopts a Euclidean distance loss:

$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2$

where $y_i^{landmark}$ denotes the real coordinates of the 5 feature points of the face and $\hat{y}_i^{landmark}$ denotes the coordinates predicted by the network; they comprise the horizontal and vertical coordinates of the pupils of the two eyes, of the nose tip, and of the left and right mouth corners.
The final objective function of the MTCNN network is:

$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}$

where N denotes the total number of samples, $\alpha_j$ denotes the weight of face classification, candidate box regression and feature point localization in the current stage network, and $\beta_i^{j} \in \{0, 1\}$ denotes the sample type indicator. In the P-Net and R-Net networks, the α values for face, box and point are 1, 0.5 and 0.5, respectively, while in the O-Net network the α values for face, box and point are 1, 0.5 and 1, respectively.
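The following numpy sketch assembles the three per-sample losses into the weighted objective described above; the array shapes and dict-based interface are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def det_loss(p, y_det):
    """Cross-entropy loss for face/non-face classification."""
    return -(y_det * np.log(p) + (1 - y_det) * np.log(1 - p))

def l2_loss(pred, target):
    """Squared Euclidean loss for box regression or landmark localization."""
    return np.sum((pred - target) ** 2, axis=-1)

def mtcnn_objective(losses, alphas, betas):
    """losses/alphas/betas are dicts keyed by 'det', 'box', 'landmark';
    losses[k] is an (N,) array and betas[k] an (N,) 0/1 sample-type
    indicator, so each sample contributes only the losses that apply."""
    return sum(alphas[k] * np.sum(betas[k] * losses[k])
               for k in ('det', 'box', 'landmark'))
```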
(2) Human eye positioning
Using the pupil coordinates of the two eyes obtained by the face detection in the previous step, the pupil distance is calculated and then expanded by a certain proportion; with suitable parameter adjustment a good eye region (namely the ROI for red eye repair) is obtained, which reduces the amount of computation and improves robustness. The specific steps are as follows:
step1: the binocular coordinates returned by face detection calculate the distance of the pupils of the eyes using the following formula (6):
wherein D is lr Is the distance between the pupils of the left eye and the right eye of the human face,and->Is the abscissa of the left eye, +.>Andthe abscissa of the right eye.
Step2: the pupil distance of the eyes of the face calculated in Step1 is adjusted according to a certain proportion, and the ROIs of the left eye and the right eye of the face are marked by rectangular frames respectively, and the calculation formula is as follows:
wherein LEL x,y For the upper left corner of the left-eye rectangular frame, LER x,y REL is the right lower corner coordinate of the left-eye rectangular frame x,y Right eye rectangular frame upper left corner coordinates, RER x,y The lower right corner coordinates of the right-eye rectangular box imw and imh represent the width and height of the face image, respectively.
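A sketch of this human eye positioning step; since the patent's exact expansion ratios are not reproduced here, the half-width of 0.4 · $D_{lr}$ used below is an assumed placeholder, clamped to the image bounds imw × imh:

```python
import math

def pupil_distance(left, right):
    """Euclidean pupil distance of formula (6); points are (x, y)."""
    return math.hypot(right[0] - left[0], right[1] - left[1])

def eye_roi(pupil, d_lr, imw, imh, ratio=0.4):  # `ratio` is an assumption
    """Rectangular eye ROI around one pupil, expanded in proportion to
    the pupil distance and clipped to the image bounds."""
    half = ratio * d_lr
    x0, y0 = max(int(pupil[0] - half), 0), max(int(pupil[1] - half), 0)
    x1 = min(int(pupil[0] + half), imw - 1)
    y1 = min(int(pupil[1] + half), imh - 1)
    return (x0, y0), (x1, y1)  # upper-left and lower-right corners
```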
(3) Red eye repair
The red eye repair method of the invention comprises 3 steps: red eye masking, pupil mask cleaning and red eye repair. The specific steps are as follows:
step1: firstly, dividing a human eye ROI marked by a rectangular frame into R, G, B channels (namely red, green and blue channels); secondly, creating a red eye detector, namely creating a mask with a red channel pixel value larger than 50 and larger than the sum of the blue channel pixel value and the green channel pixel value, wherein the purpose is to use the mask as shielding, and only process the red pupil area; finally, the extracted red pupil area is set to be white, and other areas are set to be black. The calculation formula is as follows:
where mask represents a mask, N represents an image size, r i 、b i And g i The pixel values of pixel i in the red, blue and green channels are represented, respectively. This step may initially locate the red eye region, but noise interference points may exist around or within the red eye region, so further accurate location and denoising are required.
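This masking rule translates directly into a few lines of OpenCV/numpy; a minimal sketch (the int casts avoid uint8 overflow when summing the blue and green channels):

```python
import cv2
import numpy as np

def redeye_mask(roi_bgr):
    """White (255) where red > 50 and red > blue + green, black elsewhere."""
    b, g, r = cv2.split(roi_bgr)
    cond = (r > 50) & (r.astype(int) > b.astype(int) + g.astype(int))
    return cond.astype(np.uint8) * 255
```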
Step2: it is known from Step1 that the red eye region is set to white and the other regions are set to black, so that the red eye region is positioned for further accuracy. Firstly, performing contour detection on the created mask, extracting white areas which are possibly red eyes in the mask, then calculating the area formed by the contour of each white area, and storing the contour area with the largest area and pixel points, so that the red eyes area can be accurately positioned. Since there may be interference of noise points inside and outside the precisely located red eye region, denoising processing is required. And a cross structure with the size of 5*5 is adopted to perform closed operation on the red eye region, so that noise points in the red eye region are removed, and meanwhile, the pupil region is more round.
Step3: through the above steps, each eye has a mask containing red portions, since red eyes fill the red channel in the image, saturate it, and red eyes break the texture only in the red channel, and still perform well in the green and blue channels, a reasonable texture should be found to repair it. The average channel is first created by averaging the green and blue channels, the formula:
All pixel values of the three channels inside the red eye region are then replaced by the average channel pixel values, and finally the R, G and B channels are merged. After this operation the boundary of the repaired eye region differs noticeably from the surrounding pixels, so to make the repaired eye look more natural, the repaired region is smoothed and denoised with bilateral filtering, whose calculation formula is:

$g(k,l) = \dfrac{\sum_{i,j} f(i,j)\, w(i,j,k,l)}{\sum_{i,j} w(i,j,k,l)}$    (13)
where the weight $w(i,j,k,l)$ is the product of the spatial-domain kernel $w_d(i,j,k,l)$ and the range kernel $w_r(i,j,k,l)$, as follows:

$w(i,j,k,l) = w_d(i,j,k,l) \cdot w_r(i,j,k,l)$, with $w_d(i,j,k,l) = \exp\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right)$ and $w_r(i,j,k,l) = \exp\left(-\frac{\|f(i,j) - f(k,l)\|^2}{2\sigma_r^2}\right)$    (14)

where q(i,j) is the coordinate of another coefficient of the template window; p(k,l) is the center coordinate of the template window; $\sigma_d$ and $\sigma_r$ are the standard deviations of the Gaussian functions; f(i,j) denotes the pixel value of the image at point q(i,j); and f(k,l) denotes the pixel value of the image at point p(k,l).
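A sketch of this repair step under the description above; the bilateral filter parameters (diameter 9, sigmaColor = sigmaSpace = 75) are assumed values, as the patent does not state them:

```python
import cv2
import numpy as np

def repair_redeye(roi_bgr, mask):
    """Replace red-eye pixels with the blue/green average, then smooth."""
    b, g, r = cv2.split(roi_bgr)
    mean = ((b.astype(np.float32) + g.astype(np.float32)) / 2).astype(np.uint8)
    sel = mask > 0
    for ch in (b, g, r):
        ch[sel] = mean[sel]  # neutral texture recovered from G and B
    repaired = cv2.merge([b, g, r])
    return cv2.bilateralFilter(repaired, 9, 75, 75)  # smooth the boundary
```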
Finally, the processed image is copied back to the eye region of the original image, and the repaired face image is output and saved.
The automatic red eye repair method based on MTCNN benefits from the high face detection speed of MTCNN, its good robustness under unconstrained conditions, and its ability to obtain the coordinates of the eye key points by regression; the method is fully automatic, has a low false detection rate, and repairs red eye quickly.
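Putting the pieces together, the sketch below runs the whole pipeline end to end. The facenet-pytorch MTCNN is used here only as a stand-in for the trained detector described above, and the helper functions sketched earlier (pupil_distance, eye_roi, redeye_mask, clean_mask, repair_redeye) are assumed to be in scope:

```python
import cv2
from facenet_pytorch import MTCNN  # stand-in MTCNN implementation

def remove_redeye(path_in, path_out):
    img = cv2.imread(path_in)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # detect() with landmarks=True returns boxes, scores and 5 points
    # per face (left eye, right eye, nose, mouth corners)
    boxes, probs, points = MTCNN(keep_all=True).detect(rgb, landmarks=True)
    if points is None:
        return
    for lm in points:
        d = pupil_distance(lm[0], lm[1])  # lm[0]/lm[1] are the two pupils
        for pupil in (lm[0], lm[1]):
            (x0, y0), (x1, y1) = eye_roi(pupil, d, img.shape[1], img.shape[0])
            roi = img[y0:y1, x0:x1]
            mask = clean_mask(redeye_mask(roi))
            img[y0:y1, x0:x1] = repair_redeye(roi, mask)  # copy back
    cv2.imwrite(path_out, img)
```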

Claims (4)

1. The automatic red eye repair method based on MTCNN is characterized by being implemented according to the following steps:
Step 1, inputting a red eye image into an MTCNN network, wherein the MTCNN network detects the human face and returns the face position and the horizontal and vertical coordinates of the pupils of the two eyes, the nose tip, and the left and right mouth corners of the face;
Step 2, calculating the pupil distance of the two eyes according to the pupil coordinates of the face obtained in Step 1, then expanding it by a set proportion and adjusting the parameters to obtain the ROI;
Step 3, performing red eye masking, pupil mask cleaning and red eye repair on the ROI obtained in Step 2, and finally copying the processed image back to the eye region of the original image to obtain the repaired face image;
the step3 is specifically implemented according to the following steps:
step 3.1, dividing the ROI into three channels of red, green and blue, then creating a mask, and only processing the red pupil area; finally, the extracted red pupil area is set to be white, and other areas are set to be black;
where mask represents a mask, N represents an image size, r i 、b i And g i Respectively representing pixel values of a pixel point i in a red channel, a blue channel and a green channel;
step 3.2, performing contour detection on the created mask, extracting white areas which are possibly red eyes in the mask, calculating the area formed by the contour of each white area, storing the contour area with the largest area and pixel points, accurately positioning the red eyes, performing closed operation on the red eyes, and removing noise points in the red eyes;
wherein mean represents the average value of pixel values of pixel points i in the blue channel and the blue channel;
step 3.3, creating an average channel through the average green channel and the blue channel, replacing all pixel values of the red, green and blue channels in the red eye region with the pixel values of the average channel, merging the red, green and blue channels, smoothing and denoising a repaired region by adopting bilateral filtering, and finally obtaining a repaired face image;
in the step 3.3, the smoothing denoising treatment of the repair area by the bilateral filtering is specifically performed according to the following formula (13):
wherein w (i, j, k, l) is defined by a spatial domain kernel w d (i, j, k, l) and value range kernel w r (i, j, k, l) by multiplying, specifically, the following formula (14):
w(i,j,k,l)=w d (i,j,k,l)*w r (i,j,k,l)
where q (i, j) is the coordinates of the other coefficients of the template window; p (k, l) is the center coordinate point of the template window; sigma (sigma) d Sum sigma r Standard deviation as gaussian function; f (i, j) represents the pixel value of the image at point q (i, j); f (k, l) represents the pixel value of the image at point p (k, l).
2. The automatic red eye repair method based on MTCNN according to claim 1, characterized in that Step 1 is specifically implemented according to the following steps:
Step 1.1, creating an image pyramid according to the set size of the input red eye image, and scaling the red eye image over multiple levels to obtain a group of input images of different sizes;
Step 1.2, inputting the group of images of different sizes into P-Net, where they pass in turn through convolution and pooling layers of different sizes to generate feature maps from which face contour points are judged; after the images are analyzed and processed by P-Net, face candidate boxes and box regression vectors are generated, and a number of face candidate boxes are obtained after recalibration;
Step 1.3, inputting the face candidate boxes obtained in Step 1.2 into R-Net for further refinement; candidate boxes that do not reach the set threshold are removed, and highly overlapping candidate boxes are removed by non-maximum suppression, yielding a refined set of face candidate boxes;
Step 1.4, inputting the refined face candidate boxes obtained in Step 1.3 into the O-Net network, which further localizes the face position precisely and finally outputs the face position and the feature points, namely the horizontal and vertical coordinates of the pupils of the two eyes, the nose tip, and the left and right mouth corners.
3. The automatic red eye repair method based on MTCNN according to claim 1, characterized in that in Step 2 the pupil distance of the two eyes is calculated specifically as follows:
from the binocular pupil coordinates returned by face detection, the distance between the pupils is calculated with formula (6):

$D_{lr} = \sqrt{(x_r - x_l)^2 + (y_r - y_l)^2}$    (6)

where $D_{lr}$ is the distance between the pupils of the left and right eyes of the face, $(x_l, y_l)$ are the horizontal and vertical coordinates of the left pupil, and $(x_r, y_r)$ are the horizontal and vertical coordinates of the right pupil.
4. The automatic red eye repair method based on MTCNN according to claim 3, characterized in that in Step 2 the proportional expansion is specifically implemented as follows:
using the pupil distance of the two eyes and a set proportion, the ROIs of the left and right eyes of the face are marked with rectangular boxes whose corner coordinates are computed from the pupil coordinates and the pupil distance,
where $LEL_{x,y}$ are the upper-left corner coordinates of the left-eye rectangular box, $LER_{x,y}$ the lower-right corner coordinates of the left-eye rectangular box, $REL_{x,y}$ the upper-left corner coordinates of the right-eye rectangular box, and $RER_{x,y}$ the lower-right corner coordinates of the right-eye rectangular box; imw and imh denote the width and height of the face image, respectively.
CN202010413910.8A 2020-05-15 2020-05-15 Automatic red eye repairing method based on MTCNN Active CN111738934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010413910.8A CN111738934B (en) 2020-05-15 2020-05-15 Automatic red eye repairing method based on MTCNN


Publications (2)

Publication Number Publication Date
CN111738934A CN111738934A (en) 2020-10-02
CN111738934B true CN111738934B (en) 2024-04-02

Family

ID=72647320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010413910.8A Active CN111738934B (en) 2020-05-15 2020-05-15 Automatic red eye repairing method based on MTCNN

Country Status (1)

Country Link
CN (1) CN111738934B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113989884B (en) * 2021-10-21 2024-05-14 武汉博视电子有限公司 Facial skin image based ultraviolet deep and shallow color spot identification method


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7155058B2 (en) * 2002-04-24 2006-12-26 Hewlett-Packard Development Company, L.P. System and method for automatically detecting and correcting red eye
US7567707B2 (en) * 2005-12-20 2009-07-28 Xerox Corporation Red eye detection and correction
US8811683B2 (en) * 2011-06-02 2014-08-19 Apple Inc. Automatic red-eye repair using multiple recognition channels
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1750017A (en) * 2005-09-29 2006-03-22 上海交通大学 Red eye moving method based on human face detection
EP3531377A1 (en) * 2018-02-23 2019-08-28 Samsung Electronics Co., Ltd. Electronic device for generating an image including a 3d avatar reflecting face motion through a 3d avatar corresponding to a face
DE102019114666A1 (en) * 2018-06-01 2019-12-05 Apple Inc. RED-EYE CORRECTION TECHNIQUES
CN109389562A (en) * 2018-09-29 2019-02-26 深圳市商汤科技有限公司 Image repair method and device
CN109409303A (en) * 2018-10-31 2019-03-01 南京信息工程大学 A kind of cascade multitask Face datection and method for registering based on depth
CN110175504A (en) * 2019-04-08 2019-08-27 杭州电子科技大学 A kind of target detection and alignment schemes based on multitask concatenated convolutional network
CN110619319A (en) * 2019-09-27 2019-12-27 北京紫睛科技有限公司 Improved MTCNN model-based face detection method and system
CN110969109A (en) * 2019-11-26 2020-04-07 华中科技大学 Blink detection model under non-limited condition and construction method and application thereof

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MTCNN: explanations of some common issues; 薛定谔的炼丹炉!; CSDN Blog; full text *
Application of an artificial intelligence multi-task deep learning model of the optic disc region in glaucoma classification; 张悦; 余双; 马锴; 初春燕; 张莉; 庞睿奇; 王宁利; 刘含若; Chinese Journal of Ophthalmologic Medicine (Electronic Edition); 2020-04-28 (02); full text *
Research on the Levy-DNA-ACO algorithm for edge detection in medical images; 张经宇 et al.; Computer Engineering and Applications; 2018-12-15 (24); full text *
Face recognition based on MTCNN and Facenet; 刘长伟; Designing Techniques of Posts and Telecommunications; (02); full text *
Face detection and facial key point localization based on an improved MTCNN model; 陈雨薇; China Masters' Theses Full-text Database (Electronic Journal); (01); full text *

Also Published As

Publication number Publication date
CN111738934A (en) 2020-10-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant