CN109993086A - Face detection method, device, system and terminal device - Google Patents

Face detection method, device, system and terminal device

Info

Publication number
CN109993086A
CN109993086A CN201910215573.9A
Authority
CN
China
Prior art keywords
depth
information
image
candidate frame
face
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910215573.9A
Other languages
Chinese (zh)
Other versions
CN109993086B (en)
Inventor
李江
王行
李骊
周晓军
盛赞
李朔
杨淼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd filed Critical Beijing HJIMI Technology Co Ltd
Priority to CN201910215573.9A (granted as CN109993086B)
Publication of CN109993086A
Application granted
Publication of CN109993086B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

This specification provides a face detection method, device, system and terminal device. The method comprises: obtaining a depth image and a color image registered with the depth image; jointly normalizing the depth information of the depth image and the RGB information of the color image; obtaining a set size for a face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image; determining, by means of a pre-trained neural network model, face candidate frames whose score exceeds a set threshold; and determining, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region. In the embodiments of the present application, the depth information of a depth image registered with a color image is used as input data for the neural network model, which improves detection precision and robustness; meanwhile, the size of the candidate frame is set from the depth information and the camera's internal parameters, which speeds up detection and further improves detection accuracy.

Description

Face detection method, device, system and terminal device
Technical field
This specification relates to the field of face detection technology, and in particular to a face detection method, device, system and terminal device.
Background art
With the development of face detection technology, its application value in fields such as security access control, vision-based inspection and content-based image retrieval continues to grow.
Most current face detection algorithms are based on color images and use Multi-Task Cascaded Convolutional Networks (MTCNN).
Because color images are strongly affected by factors such as lighting conditions, resolution and color, the training data requirements are demanding, and the resulting algorithm models are not robust; moreover, MTCNN involves multiple models, its algorithmic logic is complex, and its response speed is slow.
Summary of the invention
To overcome the problems in the related art, this specification provides a face detection method, device, system and terminal device.
Specifically, the present application is realized through the following technical solutions:
According to a first aspect of the embodiments of this specification, a face detection method is provided, comprising:
obtaining a depth image and a color image registered with the depth image;
jointly normalizing the depth information of the depth image and the RGB information of the color image, and outputting normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information;
obtaining a set size for a face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image;
determining, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model;
determining, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
According to a second aspect of the embodiments of this specification, a face detection device is provided, comprising:
an image acquisition unit, configured to obtain a depth image and a color image registered with the depth image;
a normalization unit, configured to jointly normalize the depth information of the depth image and the RGB information of the color image, and output normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information;
a size acquisition unit, configured to obtain a set size for a face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image;
a first determination unit, configured to determine, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model;
a second determination unit, configured to determine, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
According to a third aspect of the embodiments of this specification, a terminal device is provided, comprising: an internal bus, and a memory, a processor and an external interface connected through the internal bus; wherein
the external interface is configured to obtain a depth image and a color image registered with the depth image;
the memory is configured to store machine-readable instructions corresponding to face detection;
the processor is configured to read the machine-readable instructions on the memory and execute them to realize the following operations:
jointly normalizing the depth information of the depth image and the RGB information of the color image, and outputting normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information;
obtaining a set size for a face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image;
determining, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model;
determining, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
According to a fourth aspect of the embodiments of this specification, a face detection system is provided, comprising: a depth camera, a color camera and a terminal device, wherein
the depth camera is configured to capture a depth image;
the color camera is configured to capture a color image, the depth camera and the color camera having been registered;
the terminal device is configured to obtain the depth image and the color image registered with the depth image; jointly normalize the depth information of the depth image and the RGB information of the color image, and output normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information; obtain a set size for a face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image; determine, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model; and determine, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
According to a fifth aspect of the embodiments of this specification, a face detection system is provided, comprising: a camera with depth information and a terminal device, wherein
the camera with depth information is configured to capture a depth image and a color image registered with the depth image;
the terminal device is configured to obtain the depth image and the color image registered with the depth image; jointly normalize the depth information of the depth image and the RGB information of the color image, and output normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information; obtain a set size for a face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image; determine, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model; and determine, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
With the face detection embodiments provided by the present application, the depth information of a depth image registered with a color image is used as input data for the neural network model, which improves detection precision and robustness; meanwhile, the size of the candidate frame is set from the depth information and the camera's internal parameters, which speeds up detection and further improves detection accuracy.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only, and do not limit this specification.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this specification and, together with the description, serve to explain its principles.
Fig. 1 is a flowchart of a face detection method of the present application according to an exemplary embodiment.
Fig. 2 is a flowchart of a method for training a neural network model of the present application according to an exemplary embodiment.
Fig. 3 is a structural schematic diagram of a face detection device of the present application according to an exemplary embodiment.
Fig. 4 is a structural diagram of a terminal device of the present application according to an exemplary embodiment.
Fig. 5 is a structural schematic diagram of a face detection system of the present application according to an exemplary embodiment.
Fig. 6 is a structural schematic diagram of another face detection system of the present application according to an exemplary embodiment.
Detailed description of the embodiments
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.
Referring to Fig. 1, which is a flowchart of the face detection method in one example of the present application, the method may include the following steps:
In step 101, a depth image and a color image registered with the depth image are obtained.
The depth image may be captured by a depth camera and the color image by a color camera; alternatively, both may be captured by a camera with depth information.
In this embodiment, the depth image and the color image should be registered, to ensure that each pixel in the depth image can find its counterpart pixel in the color image, the two corresponding pixels being measurements of the same position in space.
In one example, the depth image and the color image may be registered in the following manner:
If the depth image and the color image are captured by a camera with depth information at the same angle and position, or are two images obtained from the same single shot, then they are registered.
If the depth image and the color image are captured by a depth camera and a color camera respectively, then after the two cameras have been calibrated with the same method under the same scene, a depth image and a color image captured by them at the same angle and position are registered.
For example, the depth camera and the color camera may each be calibrated by Zhang's calibration method, with an identical scene used for calibrating both cameras. By calibrating the depth camera and the color camera, their internal parameters can be obtained.
The internal parameters of a camera include the focal length, the position of the principal point (its position on the image plane), and the mapping between pixel size and real-world size; they are intrinsic properties of the camera, used for conversion between the camera coordinate system and the image plane coordinate system.
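As a sketch of the coordinate conversion these internal parameters enable, the following illustrates projection between the camera coordinate system and the pixel plane under a pinhole model; the intrinsic values in `K` and the function names are illustrative assumptions, not calibration results or code from the patent:

```python
import numpy as np

# Hypothetical pinhole intrinsics for illustration (fx, fy: focal lengths
# in pixels; cx, cy: principal point). Not the patent's calibration values.
K = np.array([[525.0,   0.0, 319.5],
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def camera_to_pixel(point_cam, K):
    """Project a 3-D point (x, y, z) in camera coordinates onto the pixel plane."""
    x, y, z = point_cam
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return u, v

def pixel_to_camera(u, v, depth, K):
    """Back-project pixel (u, v) with a known depth into camera coordinates."""
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return x, y, depth
```

The two functions are inverses for a fixed depth, which is what lets depth values and intrinsics translate real-world sizes into pixel sizes later in the method.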
In step 102, the depth information of the depth image and the RGB information of the color image are jointly normalized, and normalized four-channel information is output.
The normalized four-channel information includes normalized RGB information and normalized depth information.
In one example, the depth information of the depth image and the RGB information of the color image may be jointly normalized by the following method:
the depth information is normalized into the interval 0-255, yielding four-channel information in the same range;
the four-channel information in the same range is then normalized into the range [0, 1] or [-1, 1].
The color image is in RGB format, and the pixel value of each pixel contains the information of the three RGB channels, that is, R, G and B components, each in the range [0, 255]. To jointly normalize the depth information and the RGB information, the depth information is first normalized into the interval 0-255 so that it too falls within [0, 255], yielding four-channel information in the same range: the R, G, B information and the depth information. The R, G, B information and the depth information are then normalized again into the range [0, 1] or [-1, 1], yielding the normalized four-channel information.
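The two-stage normalization above can be sketched as follows; the depth range `depth_max` is an assumed sensor parameter, since the embodiment does not fix how raw depth values are mapped into [0, 255]:

```python
import numpy as np

def normalize_rgbd(rgb, depth, depth_max=4000.0, out_range=(0.0, 1.0)):
    """Jointly normalize an RGB image (uint8, HxWx3) and a depth map (HxW,
    e.g. millimetres) into one four-channel array.

    depth_max is an illustrative assumption for the sensor's maximum depth.
    """
    # Stage 1: bring depth into the same [0, 255] range as the RGB channels.
    depth_255 = np.clip(depth, 0, depth_max) / depth_max * 255.0
    four = np.dstack([rgb.astype(np.float32), depth_255.astype(np.float32)])
    # Stage 2: normalize all four channels into [0, 1] (or [-1, 1]).
    lo, hi = out_range
    return four / 255.0 * (hi - lo) + lo
```

Passing `out_range=(-1.0, 1.0)` selects the alternative [-1, 1] target range mentioned above.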
In step 103, a set size for the face candidate frame is obtained based on the depth information of the depth image and the internal parameters of the camera that captured the depth image.
The depth value of the depth image directly reflects the distance from an object to the camera, and the internal parameters of the camera encode the mapping between pixel size and real-world size. Therefore, based on the range of actual human head sizes, a set size for the face candidate frame can be obtained for the depth image captured by the camera and the registered color image.
Those skilled in the art will appreciate that the set size of the face candidate frame can be adjusted according to actual circumstances and needs.
In the related art, when face detection is performed with an MTCNN model, candidate frames must be extracted over the full image at multiple scales before detection; the large number of face candidate frames generated leads to many unnecessary operations and reduces the response speed of the algorithm.
In this embodiment, a single-scale face candidate frame can be set based on the depth information of the depth image and the internal parameters of the camera, avoiding unnecessary operations and improving the response speed of the algorithm.
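Under the pinhole model, the single-scale frame size follows directly from the depth and the focal length; the head size of 0.25 m below is an illustrative assumption for the "range of actual human head sizes" mentioned above, not a value fixed by the patent:

```python
def face_box_size(depth_m, fx, head_size_m=0.25):
    """Estimate the face candidate frame side length in pixels from the depth
    at a pixel and the camera's focal length in pixels (pinhole model).

    head_size_m is an assumed physical head extent for illustration.
    """
    return head_size_m * fx / depth_m
```

For example, with an assumed focal length of 500 px, a face at 1 m would get a 125-px frame and the same face at 2 m a 62.5-px frame, which is why a single scale per depth suffices.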
In step 104, face candidate frames whose score exceeds the set threshold are determined based on the normalized four-channel information and the set size of the face candidate frame, by means of a pre-trained neural network model.
In one example, the neural network model can be trained by the following method. As shown in Fig. 2, the method comprises the following steps:
In step 201, a sample depth image and a sample color image registered with the sample depth image are obtained.
In this step, the registration method for the sample depth image and the sample color image may be the same as that for the depth image and the color image in step 101.
In step 202, the regions where faces are located in the sample depth image and the sample color image are labeled.
When labeling, only the sample depth image or only the sample color image may be labeled, with the corresponding labels then generated on the corresponding pixels in the registered sample image; alternatively, the sample depth image and the sample color image may be labeled simultaneously. This generates label data containing a label value for each pixel coordinate; a label value may be 1 or 0, where 1 denotes a face pixel and 0 a non-face pixel (or vice versa). That is, in the label data every pixel carries a label indicating whether it is a face pixel.
In step 203, the depth information of the sample depth image and the RGB information of the sample color image are jointly normalized, and normalized sample four-channel information is output.
The normalized sample four-channel information includes normalized sample depth information and normalized sample RGB information.
In this step, the same normalization method as in step 102 can be used.
In step 204, the normalized sample four-channel information and the label data are input into the neural network model for training, until the set number of iterations is reached or the loss converges.
In one example, the neural network model may be a convolutional neural network model.
Through training, the neural network model can mark, in the data of the depth image and of the registered color image, the pixels belonging to faces. That is, when the normalized four-channel information is input into the pre-trained neural network model, it can output pixel data in which every pixel carries a label indicating whether it is a face pixel.
Since performing face-pixel detection over the entire pixel information of the depth image and the color image is inefficient, in this embodiment face-pixel detection is combined with the candidate frame of the set size:
Each time, the face candidate frame of the set size selects the normalized four-channel information of the corresponding range, and the selected normalized four-channel information is input into the pre-trained neural network model, which determines from the input pixel data the pixels belonging to faces; that is, the model outputs pixel data in which every pixel carries a label indicating whether it is a face pixel.
The face candidate frame is slid with a set stride, changing its selection range so that a different range of the normalized four-channel information is selected each time, until all the normalized four-channel information has been traversed and the data corresponding to each selection has been output. The stride can be set according to the required face detection accuracy.
From the data output for each selection, the face candidate frames whose face-pixel score exceeds the set threshold are determined. For each selection, the score of the face candidate frame can be obtained from the number (or proportion) of face pixels it contains: the more face pixels the selected frame contains, the higher its score, and vice versa. When the score of a selection exceeds the set threshold, the face candidate frame of that selection is considered to contain a face; otherwise it is not. The threshold can be set according to the required face detection accuracy.
In this embodiment, as described above, the number (or proportion) of face pixels contained in the data output for each selection can be used to score the face candidate frame of that selection, and the score confirms whether the frame contains a face. Those skilled in the art will appreciate that other factors can also be used to score a face candidate frame.
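The traversal and scoring described above might be sketched as follows, assuming the network's output has already been reduced to a per-pixel 0/1 face mask; the function names and the fraction-based score are illustrative choices, not the patent's exact implementation:

```python
import numpy as np

def score_candidates(face_mask, box, stride):
    """Slide a fixed-size candidate frame over a per-pixel face mask
    (0 = non-face pixel, 1 = face pixel) with the given stride, scoring
    each position by the fraction of face pixels inside the frame."""
    h, w = face_mask.shape
    scores = []
    for top in range(0, h - box + 1, stride):
        for left in range(0, w - box + 1, stride):
            window = face_mask[top:top + box, left:left + box]
            scores.append((window.mean(), top, left))
    return scores

def keep_above(scores, threshold):
    """Keep only candidate frames whose score exceeds the set threshold."""
    return [s for s in scores if s[0] > threshold]
```

A smaller stride yields denser candidates and higher localization accuracy at the cost of more model evaluations, matching the accuracy/speed trade-off noted above.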
In step 105, a target candidate frame is determined as the target face region from the face candidate frames whose score exceeds the set threshold.
There may be one face candidate frame whose score exceeds the set threshold, or there may be several. When there are several, since the face may be located differently within each candidate frame, the best of these face candidate frames can be determined as the target candidate frame.
In one example, a non-maximum suppression (NMS) algorithm is used to determine the target candidate frame from the face candidate frames whose score exceeds the set threshold. Those skilled in the art will appreciate that the method for determining the target candidate frame is not limited to the above; other methods may also be used, for example selecting the face candidate frame in which the face is most centered.
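A minimal sketch of greedy NMS as commonly implemented (the patent does not specify its exact variant), with candidates given as (score, x1, y1, x2, y2) tuples:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it
    above iou_thresh, and repeat on the remainder."""
    boxes = sorted(boxes, reverse=True)  # highest score first
    kept = []
    for cand in boxes:
        if all(iou(cand[1:], k[1:]) <= iou_thresh for k in kept):
            kept.append(cand)
    return kept
```

Overlapping candidates around the same face collapse to the single best-scoring frame, which then serves as the target candidate frame.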
With the determined target candidate frame as the target face region, face detection is realized. The target face region may be displayed on the color image or on the depth image.
Corresponding to the foregoing method embodiments, this specification also provides embodiments of a device, a system and a terminal device.
Referring to Fig. 3, which is a block diagram of an embodiment of the face detection device of the present application, the device includes: an image acquisition unit 310, a normalization unit 320, a size acquisition unit 330, a first determination unit 340 and a second determination unit 350.
The image acquisition unit 310 is configured to obtain a depth image and a color image registered with the depth image;
the normalization unit 320 is configured to jointly normalize the depth information of the depth image and the RGB information of the color image, and output normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information;
the size acquisition unit 330 is configured to obtain a set size for the face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image;
the first determination unit 340 is configured to determine, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model;
the second determination unit 350 is configured to determine, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
Referring to Fig. 4, which is a block diagram of an embodiment of the terminal device of the present application, the terminal device includes:
an internal bus 410, and a memory 420, a processor 430 and an external interface 440 connected through the internal bus.
The external interface 440 is configured to obtain a depth image and a color image registered with the depth image;
the memory 420 is configured to store machine-readable instructions corresponding to face detection;
the processor 430 is configured to read the machine-readable instructions on the memory and execute them to realize the following operations:
jointly normalizing the depth information of the depth image and the RGB information of the color image, and outputting normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information;
obtaining a set size for the face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image;
determining, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model;
determining, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
Referring to Fig. 5, which is a block diagram of an embodiment of the face detection system of the present application, the system may include: a depth camera 510, a color camera 520 and a terminal device 530.
The depth camera 510 is configured to capture a depth image;
the color camera 520 is configured to capture a color image, the depth camera and the color camera having been registered;
the terminal device 530 is configured to obtain the depth image and the color image registered with the depth image; jointly normalize the depth information of the depth image and the RGB information of the color image, and output normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information; obtain a set size for the face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image; determine, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model; and determine, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
Referring to Fig. 6, which is a block diagram of another embodiment of the face detection system of the present application, this embodiment differs from the system shown in Fig. 5 in that the depth image and the color image registered with it are captured by a camera 610 with depth information.
In the embodiments of the present application, the computer-readable storage medium may take many forms; in different examples, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid-state drive, any type of storage disc (such as a CD or DVD), a similar storage medium, or a combination thereof. In particular, the computer-readable medium may also be paper or another suitable medium on which a program can be printed; from such media the program can be obtained electronically (for example, by optical scanning), compiled, interpreted and processed in an appropriate manner, and then stored in a computer medium.
The above are merely preferred embodiments of the present application and are not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (10)

1. A face detection method, characterized by comprising:
obtaining a depth image and a color image registered with the depth image;
jointly normalizing the depth information of the depth image and the RGB information of the color image, and outputting normalized four-channel information, wherein the normalized four-channel information includes normalized RGB information and normalized depth information;
obtaining a set size for a face candidate frame based on the depth information of the depth image and the internal parameters of the camera that captured the depth image;
determining, based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose score exceeds a set threshold by means of a pre-trained neural network model;
determining, from the face candidate frames whose score exceeds the set threshold, a target candidate frame as the target face region.
2. The method according to claim 1, characterized in that the depth image and the color image are registered by the following method:
the depth image and the color image are captured by a camera with depth information; or
under the same scene, a depth camera and a color camera are calibrated using the same method, wherein the depth camera is used to capture the depth image and the color camera is used to capture the color image.
3. The method according to claim 1, characterized in that jointly normalizing the depth information of the depth image and the RGB information of the color image comprises:
normalizing the depth information into the interval 0-255 to obtain four-channel information in the same range, the four-channel information including the RGB information and the depth information;
normalizing the four-channel information into the range [0, 1] or [-1, 1].
4. The method according to claim 1, characterized in that the neural network model is trained in the following manner:
obtaining a sample depth image and a sample color image registered with the sample depth image;
labeling the regions where faces are located in the sample depth image and the sample color image, and generating label data;
jointly normalizing the depth information of the sample depth image and the RGB information of the sample color image, and outputting normalized sample four-channel information, the normalized sample four-channel information including normalized sample depth information and normalized sample RGB information;
inputting the normalized sample four-channel information and the label data into the neural network model for training, until the set number of iterations is reached or the loss converges.
5. The method according to claim 1, wherein determining, through the pre-trained neural network model and based on the normalized four-channel information and the set size of the face candidate frame, the face candidate frames whose scores are higher than the set threshold comprises:
selecting, with a face candidate frame of the set size, the normalized four-channel information of the corresponding range, and inputting the selected normalized four-channel information into the pre-trained neural network model, the neural network model outputting pixel data labeled as face pixel or non-face pixel;
sliding the face candidate frame with a set step size so as to traverse all the normalized four-channel information, and outputting the data selected at each position;
determining, based on the data output at each position, the face candidate frames whose scores are higher than the set threshold, wherein the score of a face candidate frame is obtained from the number of face pixels it contains.
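A sketch of the sliding-window scoring in claim 5, assuming the model's per-pixel output is available as a binary face mask. The function name, return format, and strict `>` comparison are illustrative assumptions.

```python
import numpy as np

def score_candidate_frames(face_mask, box, step, threshold):
    """Slide a fixed-size candidate frame over the per-pixel face mask.

    face_mask: (H, W) 0/1 array, the model's face/non-face pixel labels.
    box:       (box_h, box_w) set size of the candidate frame.
    step:      set step size (stride in pixels) for the traversal.
    threshold: set score threshold; the score of a frame is the number
               of face pixels it contains.
    Returns a list of (x, y, w, h, score) for frames above the threshold.
    """
    h, w = face_mask.shape
    bh, bw = box
    kept = []
    for y in range(0, h - bh + 1, step):
        for x in range(0, w - bw + 1, step):
            score = int(face_mask[y:y + bh, x:x + bw].sum())
            if score > threshold:
                kept.append((x, y, bw, bh, score))
    return kept
```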
6. The method according to claim 1, wherein determining a target candidate frame as a target face region based on the face candidate frames whose scores are higher than the set threshold comprises:
determining the target candidate frame from the face candidate frames whose scores are higher than the set threshold using a non-maximum suppression (NMS) algorithm.
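A minimal sketch of the NMS step named in claim 6. The greedy formulation and the IoU threshold value are conventional assumptions; the patent only names the algorithm.

```python
def nms(boxes, iou_threshold=0.5):
    """Non-maximum suppression over scored candidate frames.

    boxes: list of (x, y, w, h, score). Greedily keeps the highest-scoring
    frame and discards any remaining frame that overlaps it too much.
    """
    def iou(a, b):
        ax, ay, aw, ah, _ = a
        bx, by, bw, bh, _ = b
        ix1, iy1 = max(ax, bx), max(ay, by)
        ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = aw * ah + bw * bh - inter
        return inter / union if union else 0.0

    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) < iou_threshold]
    return kept
```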
7. A face detection apparatus, comprising:
an image acquisition unit, configured to obtain a depth image and a color image registered with the depth image;
a normalization unit, configured to jointly normalize the depth information of the depth image and the RGB information of the color image and to output normalized four-channel information, wherein the normalized four-channel information comprises normalized RGB information and normalized depth information;
a size acquisition unit, configured to obtain a set size of a face candidate frame based on the depth information of the depth image and the intrinsic parameters of the camera that captured the depth image;
a first determination unit, configured to determine, through a pre-trained neural network model and based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose scores are higher than a set threshold;
a second determination unit, configured to determine a target candidate frame as a target face region based on the face candidate frames whose scores are higher than the set threshold.
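The patent does not give the formula the size acquisition unit uses; a plausible pinhole-camera sketch follows, in which the average face width is an assumed constant and all names are illustrative.

```python
def candidate_frame_size(depth_mm, focal_px, face_width_mm=160.0):
    """Estimate the face candidate frame size in pixels from depth.

    Pinhole model: an object of real width W at distance Z projects to
    roughly f * W / Z pixels, where f is the focal length in pixels
    (an intrinsic parameter of the camera). face_width_mm is an assumed
    average face width, not a value from the patent.
    """
    if depth_mm <= 0:
        raise ValueError("depth must be positive")
    return max(1, round(focal_px * face_width_mm / depth_mm))
```

Under this model the frame shrinks proportionally as the subject moves away, which is what lets a depth-derived size replace the usual multi-scale image pyramid.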
8. A terminal device, comprising: an internal bus, and a memory, a processor, and an external interface connected through the internal bus, wherein:
the external interface is configured to obtain a depth image and a color image registered with the depth image;
the memory is configured to store machine-readable instructions corresponding to face detection;
the processor is configured to read the machine-readable instructions from the memory and execute the instructions to perform the following operations:
jointly normalizing the depth information of the depth image and the RGB information of the color image, and outputting normalized four-channel information, wherein the normalized four-channel information comprises normalized RGB information and normalized depth information;
obtaining a set size of a face candidate frame based on the depth information of the depth image and the intrinsic parameters of the camera that captured the depth image;
determining, through a pre-trained neural network model and based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose scores are higher than a set threshold;
determining a target candidate frame as a target face region based on the face candidate frames whose scores are higher than the set threshold.
9. A face detection system, comprising: a depth camera, a color camera, and a terminal device, wherein:
the depth camera is configured to capture a depth image;
the color camera is configured to capture a color image, the depth camera being registered with the color camera;
the terminal device is configured to: obtain the depth image and the color image registered with the depth image; jointly normalize the depth information of the depth image and the RGB information of the color image, and output normalized four-channel information, wherein the normalized four-channel information comprises normalized RGB information and normalized depth information; obtain a set size of a face candidate frame based on the depth information of the depth image and the intrinsic parameters of the camera that captured the depth image; determine, through a pre-trained neural network model and based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose scores are higher than a set threshold; and determine a target candidate frame as a target face region based on the face candidate frames whose scores are higher than the set threshold.
10. A face detection system, comprising a camera with depth information and a terminal device, wherein:
the camera with depth information is configured to capture a depth image and a color image registered with the depth image;
the terminal device is configured to: obtain the depth image and the color image registered with the depth image; jointly normalize the depth information of the depth image and the RGB information of the color image, and output normalized four-channel information, wherein the normalized four-channel information comprises normalized RGB information and normalized depth information; obtain a set size of a face candidate frame based on the depth information of the depth image and the intrinsic parameters of the camera that captured the depth image; determine, through a pre-trained neural network model and based on the normalized four-channel information and the set size of the face candidate frame, face candidate frames whose scores are higher than a set threshold; and determine a target candidate frame as a target face region based on the face candidate frames whose scores are higher than the set threshold.
CN201910215573.9A 2019-03-21 2019-03-21 Face detection method, device and system and terminal equipment Active CN109993086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910215573.9A CN109993086B (en) 2019-03-21 2019-03-21 Face detection method, device and system and terminal equipment


Publications (2)

Publication Number Publication Date
CN109993086A true CN109993086A (en) 2019-07-09
CN109993086B CN109993086B (en) 2021-07-27

Family

ID=67129527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910215573.9A Active CN109993086B (en) 2019-03-21 2019-03-21 Face detection method, device and system and terminal equipment

Country Status (1)

Country Link
CN (1) CN109993086B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991412A (en) * 2019-12-20 2020-04-10 北京百分点信息科技有限公司 Face recognition method and device, storage medium and electronic equipment
CN111160291A (en) * 2019-12-31 2020-05-15 上海易维视科技有限公司 Human eye detection method based on depth information and CNN
CN111222468A (en) * 2020-01-08 2020-06-02 浙江光珀智能科技有限公司 People stream detection method and system based on deep learning
CN111680574A (en) * 2020-05-18 2020-09-18 北京的卢深视科技有限公司 Face detection method and device, electronic equipment and storage medium
CN111738995A (en) * 2020-06-10 2020-10-02 苏宁云计算有限公司 RGBD image-based target detection method and device and computer equipment
CN111753658A (en) * 2020-05-20 2020-10-09 高新兴科技集团股份有限公司 Post sleep warning method and device and computer equipment
CN111768511A (en) * 2020-07-07 2020-10-13 湖北省电力装备有限公司 Staff information recording method and device based on cloud temperature measurement equipment
CN112115913A (en) * 2020-09-28 2020-12-22 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment and storage medium
WO2021135321A1 (en) * 2019-12-30 2021-07-08 苏宁云计算有限公司 Object positioning method and apparatus, and computer system
CN113221812A (en) * 2021-05-26 2021-08-06 广州织点智能科技有限公司 Training method of face key point detection model and face key point detection method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104134235A (en) * 2014-07-25 2014-11-05 深圳超多维光电子有限公司 Real space and virtual space fusion method and system
CN104335588A (en) * 2012-07-04 2015-02-04 英特尔公司 A region of interest based framework for 3D video coding
CN106874830A (en) * 2016-12-12 2017-06-20 杭州视氪科技有限公司 Assistance method for visually impaired people based on RGB-D cameras and face recognition
CN107045631A (en) * 2017-05-25 2017-08-15 北京华捷艾米科技有限公司 Facial feature point detection method, apparatus, and device
CN107145833A (en) * 2017-04-11 2017-09-08 腾讯科技(上海)有限公司 Face region determination method and apparatus
CN107368810A (en) * 2017-07-20 2017-11-21 北京小米移动软件有限公司 Face detection method and device
US20170345146A1 (en) * 2016-05-30 2017-11-30 Beijing Kuangshi Technology Co., Ltd. Liveness detection method and liveness detection system
CN107688786A (en) * 2017-08-30 2018-02-13 南京理工大学 Face detection method based on cascaded convolutional neural networks
CN107851192A (en) * 2015-05-13 2018-03-27 北京市商汤科技开发有限公司 Apparatus and method for detecting facial parts and faces
CN107977650A (en) * 2017-12-21 2018-05-01 北京华捷艾米科技有限公司 Face detection method and device
CN108304820A (en) * 2018-02-12 2018-07-20 腾讯科技(深圳)有限公司 Face detection method, device, and terminal device
CN108596947A (en) * 2018-03-27 2018-09-28 南京邮电大学 Fast moving-target tracking method suitable for RGB-D cameras


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO, Yalong: "Research on 3D Face Recognition Based on Convolutional Neural Networks", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991412A (en) * 2019-12-20 2020-04-10 北京百分点信息科技有限公司 Face recognition method and device, storage medium and electronic equipment
WO2021135321A1 (en) * 2019-12-30 2021-07-08 苏宁云计算有限公司 Object positioning method and apparatus, and computer system
CN111160291A (en) * 2019-12-31 2020-05-15 上海易维视科技有限公司 Human eye detection method based on depth information and CNN
CN111160291B (en) * 2019-12-31 2023-10-31 上海易维视科技有限公司 Human eye detection method based on depth information and CNN
CN111222468A (en) * 2020-01-08 2020-06-02 浙江光珀智能科技有限公司 People stream detection method and system based on deep learning
CN111680574B (en) * 2020-05-18 2023-08-04 合肥的卢深视科技有限公司 Face detection method and device, electronic equipment and storage medium
CN111680574A (en) * 2020-05-18 2020-09-18 北京的卢深视科技有限公司 Face detection method and device, electronic equipment and storage medium
CN111753658A (en) * 2020-05-20 2020-10-09 高新兴科技集团股份有限公司 Post sleep warning method and device and computer equipment
CN111738995A (en) * 2020-06-10 2020-10-02 苏宁云计算有限公司 RGBD image-based target detection method and device and computer equipment
CN111768511A (en) * 2020-07-07 2020-10-13 湖北省电力装备有限公司 Staff information recording method and device based on cloud temperature measurement equipment
CN112115913B (en) * 2020-09-28 2023-08-25 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment and storage medium
CN112115913A (en) * 2020-09-28 2020-12-22 杭州海康威视数字技术股份有限公司 Image processing method, device and equipment and storage medium
CN113221812A (en) * 2021-05-26 2021-08-06 广州织点智能科技有限公司 Training method of face key point detection model and face key point detection method

Also Published As

Publication number Publication date
CN109993086B (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN109993086A (en) Face detection method, device, system, and terminal device
CN111667520B (en) Registration method and device for infrared image and visible light image and readable storage medium
WO2021000702A1 (en) Image detection method, device, and system
WO2020258793A1 (en) Target detection and training of target detection network
CN103914802B (en) For the image selection using the depth information imported and the System and method for of masking
CN109754427A (en) A kind of method and apparatus for calibration
CN100380393C (en) Precise location method of QR code image symbol region at complex background
US8385638B2 (en) Detecting skin tone in images
CN108885699A Character recognition method, apparatus, storage medium, and electronic device
CN109584307B (en) System and method for improving calibration of intrinsic parameters of a camera
CN108627092A Package volume measurement method, system, storage medium, and mobile terminal
CN109344701A Dynamic gesture recognition method based on Kinect
CN106416226A (en) Image processing system, imaging apparatus, image processing method, and computer-readable storage medium
CN109034017A (en) Head pose estimation method and machine readable storage medium
CN111401387B (en) Abnormal sample construction method, device, computer equipment and storage medium
CN109559349A (en) A kind of method and apparatus for calibration
CN106705849A (en) Calibration method of linear-structure optical sensor
CN108759788A Unmanned aerial vehicle image positioning and orientation method and unmanned aerial vehicle
CN107622497A (en) Image cropping method, apparatus, computer-readable recording medium and computer equipment
CN110276831A Three-dimensional model construction method, apparatus, device, and computer-readable storage medium
CN111860498B Method, device, and storage medium for generating adversarial license plate samples
CN113096016A (en) Low-altitude aerial image splicing method and system
US11748908B1 (en) Systems and methods for generating point-accurate three-dimensional models with point-accurate color information from a non-cosited capture
Wu et al. Detecting image forgeries using metrology
JP2014099055A (en) Detector, detection method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant