CN109840476B - Face shape detection method and terminal equipment - Google Patents

Face shape detection method and terminal equipment

Info

Publication number
CN109840476B
Authority
CN
China
Prior art keywords
image
target object
face
processing
region
Prior art date
Legal status
Active
Application number
CN201811635950.6A
Other languages
Chinese (zh)
Other versions
CN109840476A (en)
Inventor
董江凯
Current Assignee
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN201811635950.6A
Publication of CN109840476A
Application granted
Publication of CN109840476B


Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention discloses a face shape detection method and a terminal device, relates to the field of terminal technology, and aims to solve the problem that existing face shape detection techniques are not robust. The method comprises the following steps: acquiring a first image and a second image of a target object, wherein the first image is a depth image, the second image is a two-dimensional image, and the target object comprises a face region; acquiring contour information of the target object according to the first image; processing the target object in the second image according to the contour information to obtain a third image; and generating a face shape detection result according to the feature information of the target object in the third image. The scheme is particularly applicable to face shape detection scenarios.

Description

Face shape detection method and terminal equipment
Technical Field
The embodiment of the invention relates to the technical field of terminals, in particular to a face shape detection method and terminal equipment.
Background
With the continuous development of terminal technology, terminal devices are used in an increasingly wide range of applications. For example, many scenarios require face attribute analysis, and face shape detection is one of the common algorithms used for face attribute analysis.
At present, the more common face shape detection methods include geometric local-feature template matching and parameter-optimization-based methods. However, when a two-dimensional image is captured, illumination and background may weaken the edge features of the portrait (strong or weak illumination blurs the portrait region, and a portrait-like background interferes with it), which greatly affects feature modeling. As a result, existing face shape detection techniques are not robust.
Disclosure of Invention
The embodiments of the present invention provide a face shape detection method and a terminal device, and aim to solve the problem that existing face shape detection techniques are not robust.
To solve the above technical problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a face shape detection method, including: acquiring a first image and a second image of a target object, wherein the first image is a depth image, the second image is a two-dimensional image, and the target object comprises a face region; acquiring contour information of the target object according to the first image; processing the target object in the second image according to the contour information to obtain a third image; and generating a face shape detection result according to the feature information of the target object in the third image.
In a second aspect, an embodiment of the present invention provides a terminal device, where the terminal device includes: an acquisition module, a processing module, and a generation module. The acquisition module is used for acquiring a first image and a second image of a target object, wherein the first image is a depth image, the second image is a two-dimensional image, and the target object comprises a face region, and for acquiring contour information of the target object according to the first image. The processing module is used for processing the target object in the second image according to the contour information acquired by the acquisition module to obtain a third image. The generation module is configured to generate a face shape detection result according to the feature information of the target object in the third image obtained by the processing module.
In a third aspect, an embodiment of the present invention provides a terminal device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the face shape detection method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the face shape detection method according to the first aspect.
In the embodiment of the present invention, the terminal device may first obtain the contour information of the target object from the depth image containing the target object, then process the two-dimensional image containing the target object according to the contour information to obtain a third image, and finally perform face shape detection on the face region in the third image to generate a face shape detection result. In this scheme, although factors such as illumination and a portrait-like background greatly affect the portrait features in the two-dimensional image, they hardly affect the portrait features of the depth image. Therefore, a face shape detection result obtained by processing the second image in combination with the first image and then performing face shape detection on the resulting third image is more accurate than one obtained by performing face shape detection on the second image directly, which improves the robustness of the face shape detection technique.
Drawings
Fig. 1 is a schematic diagram of the architecture of a possible android operating system according to an embodiment of the present invention;
Fig. 2 is a flowchart of a face shape detection method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present invention;
Fig. 4 is a hardware schematic diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," and "fourth," etc. in the description and in the claims of the present invention are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first image, the second image, the third image, the fourth image, and so on are for distinguishing different images, not for describing a specific order of the images.
In the embodiments of the present invention, words such as "exemplary" or "for example" are used to mean serving as an example, illustration, or description. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present invention should not be construed as being preferred or advantageous over other embodiments or designs. Rather, use of the words "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present invention, unless otherwise specified, "a plurality" means two or more, for example, a plurality of processing units means two or more processing units; plural elements means two or more elements, and the like.
The embodiment of the present invention provides a face shape detection method in which the terminal device may first obtain the contour information of the target object from the depth image containing the target object, then process the two-dimensional image containing the target object according to the contour information to obtain a third image, and finally perform face shape detection on the face region in the third image to generate a face shape detection result. In this scheme, although factors such as illumination and a portrait-like background greatly affect the portrait features in the two-dimensional image, they hardly affect the portrait features of the depth image. Therefore, a face shape detection result obtained by processing the second image in combination with the first image and then performing face shape detection on the resulting third image is more accurate than one obtained by performing face shape detection on the second image directly, which improves the robustness of the face shape detection technique.
The following describes the software environment to which the face shape detection method provided by the embodiment of the present invention applies, taking the android operating system as an example.
Fig. 1 is a schematic diagram of an architecture of a possible android operating system according to an embodiment of the present invention. In fig. 1, the architecture of the android operating system includes 4 layers, which are respectively: an application layer, an application framework layer, a system runtime layer, and a kernel layer (specifically, a Linux kernel layer).
The application program layer comprises various application programs (including system application programs and third-party application programs) in an android operating system.
The application framework layer is the framework of applications; developers can develop applications based on the application framework layer, provided they comply with its development principles.
The system runtime layer includes libraries (also called system libraries) and android operating system runtime environments. The library mainly provides various resources required by the android operating system. The android operating system running environment is used for providing a software environment for the android operating system.
The kernel layer is an operating system layer of an android operating system and belongs to the bottommost layer of an android operating system software layer. The kernel layer provides kernel system services and hardware-related drivers for the android operating system based on the Linux kernel.
Taking the android operating system as an example, in the embodiment of the present invention a developer may develop a software program implementing the face shape detection method provided by the embodiment of the present invention based on the system architecture of the android operating system shown in fig. 1, so that the face shape detection method can run on the android operating system shown in fig. 1. That is, the processor or the terminal device can implement the face shape detection method provided by the embodiment of the present invention by running the software program in the android operating system.
The terminal device in the embodiment of the present invention may be a mobile terminal device or a non-mobile terminal device. The mobile terminal device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted terminal, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), or the like; the non-mobile terminal device may be a Personal Computer (PC), a Television (TV), a teller machine, a self-service machine, or the like; the embodiment of the present invention is not particularly limited.
The execution subject of the face shape detection method provided by the embodiment of the present invention may be the terminal device described above (mobile or non-mobile), or a functional module and/or functional entity in the terminal device capable of implementing the method; this may be determined according to actual use requirements, and the embodiment of the present invention is not limited. The following takes a terminal device as an example to describe the face shape detection method provided by the embodiment of the present invention.
Referring to fig. 2, an embodiment of the present invention provides a face shape detection method, which may include steps 201 to 204 described below.
Step 201, the terminal device acquires a first image and a second image of a target object.
The first image is a depth image, the second image is a two-dimensional image, and the target object comprises a face region.
Specifically, the first image is a depth image of a target image, the second image is a two-dimensional image of the target image, the target image includes the target object, and the target object includes a face region.
The target image may be understood as the image content of the first image and the second image, i.e. the first image and the second image are images taken of the same scene with the same content.
The depth image stores the depth information of the target image, and the two-dimensional image stores the two-dimensional information of the target image. The two-dimensional image is a planar image without depth information, and includes grayscale images, color images, and the like. Color images further include RGB (red, green, blue primaries) color images, YUV (luminance, chrominance) color images, and the like; the embodiment of the present invention is not limited. At present, the commonly used pairing of a depth image and a two-dimensional image is a depth map together with an RGB color image, also called an RGB-D image.
The terminal device may obtain the depth image and the two-dimensional image from other devices (e.g., downloaded from a network or captured by another camera), or may capture them with cameras on the terminal device (typically a combination of two cameras: one for capturing the depth image, such as an infrared or structured-light camera, and one for capturing the two-dimensional image, such as a color camera). It should be noted that in the embodiment of the present invention the depth image and the two-dimensional image are registered, that is, the coordinates of the pixels in the depth image and the two-dimensional image correspond one-to-one; for the specific registration process, refer to the related art, which is not repeated here.
The target object includes a face region, that is, the face region of a person. The target object may be understood as only the face region, or it may include other regions in addition to the face region, such as a neck region, a torso region, and limb regions; the embodiment of the present invention is not limited.
In the embodiment of the invention, the depth image comprises depth information of a face area, and the two-dimensional image comprises two-dimensional information of the face area.
Step 202, according to the first image, the terminal device obtains the contour information of the target object.
Optionally, the contour information may be a contour image, for example an image in which the background other than the target object has been removed. The contour information includes at least the contour information of the face region, and may further include at least one of the following: contour information of the head region, the neck region, the torso region, the limb regions, and the like; the embodiment of the present invention is not limited.
Optionally, the terminal device may obtain the pixels of the boundary contour of the target object from information such as the depth value and depth gradient value of each pixel in the depth image, and then map those boundary-contour pixels to the two-dimensional image (i.e., pixel matching) to obtain the contour information of the target object in the two-dimensional image (a contour image, i.e., the area enclosed by the contour-boundary pixels). For the specific implementation, refer to the related art; the embodiment of the present invention is not limited.
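To make this concrete, the following is a minimal OpenCV sketch of deriving a boundary contour from depth values and depth gradient values. It is an illustration under stated assumptions, not the patent's algorithm (which is left to the related art): the gradient threshold, the closing kernel, and the largest-contour heuristic are all assumptions.

```python
import cv2
import numpy as np

def contour_from_depth(depth_u8: np.ndarray) -> np.ndarray:
    """Derive the target object's boundary contour from an 8-bit depth image.

    Strong depth gradients mark the boundary between the target object and
    the background; the threshold (40) and the 5x5 closing kernel are
    illustrative assumptions, not values from the patent.
    """
    gx = cv2.Sobel(depth_u8, cv2.CV_32F, 1, 0)          # depth gradient, x
    gy = cv2.Sobel(depth_u8, cv2.CV_32F, 0, 1)          # depth gradient, y
    grad = cv2.magnitude(gx, gy)
    edges = (grad > 40).astype(np.uint8) * 255          # keep strong depth edges
    edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE,
                             np.ones((5, 5), np.uint8)) # close small gaps
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)           # largest contour = target object
```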
Optionally, the terminal device may obtain the contour information of the target object by performing normalization processing on the depth image.
Illustratively, this step 202 may be specifically realized by the step 202a described below.
Step 202a, the terminal device performs normalization processing on the first image to obtain the contour information of the target object.
In the embodiment of the present invention, normalizing the first image may also be referred to as mapping the first image, that is, converting the depth image into a grayscale image, with a formula of the following form:
t_{i,j} = round( d_{i,j} / range(d) × 4096 )

where d_{i,j} is the depth value of pixel (i, j) in the depth map, t_{i,j} is the pixel value of pixel (i, j) in the grayscale image obtained after the mapping (here scaled to the example range 0–4096 mentioned below), and range(d) is the maximum depth value over all pixels in the depth map. t_{i,j} is an integer; if the value computed by the formula is not an integer, it is rounded (the rounding step is not shown in the formula; for details refer to the related art, which is not repeated here).
However, after the mapping, the value range of the pixel values is generally large (for example, 0 to 4096), and the values near the extremes are too dense and carry no useful information. Therefore, to simplify subsequent processing, the embodiment of the present invention truncates the mapped image to remove the useless information and restricts the pixel values to the range 0 to 255. The truncation takes the following form:
p_{i,j} = 0,                                if t_{i,j} ≤ α
p_{i,j} = 255 × (t_{i,j} − α) / (β − α),    if α < t_{i,j} < β
p_{i,j} = 255,                              if t_{i,j} ≥ β

where p_{i,j} is the pixel value of pixel (i, j) in the grayscale image obtained after truncation, and α and β are empirical values that those skilled in the art can choose based on experience.
After normalizing the first image according to the above method, the terminal device obtains the contour information of the target object.
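For illustration, here is a minimal NumPy sketch of the two-step mapping above, assuming the formulas as reconstructed; the α and β values (on the 0–4096 intermediate scale) are illustrative assumptions.

```python
import numpy as np

def normalize_depth(depth: np.ndarray, alpha: float = 200.0,
                    beta: float = 3800.0) -> np.ndarray:
    """Map a raw depth map to an 8-bit grayscale image.

    First formula: scale depth values to the example 0-4096 range.
    Second formula: truncate the over-dense extremes down to 0-255.
    alpha and beta are the empirical bounds on the 0-4096 scale; the
    values used here are illustrative assumptions.
    """
    d = depth.astype(np.float64)
    t = np.rint(d / d.max() * 4096)                   # t_{i,j}, rounded to an integer
    p = np.clip((t - alpha) / (beta - alpha), 0, 1)   # linear stretch between alpha and beta
    return np.rint(p * 255).astype(np.uint8)          # p_{i,j} in 0..255
```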
Optionally, the terminal device may obtain the contour information of the target object by performing depth-consistency division processing on the depth image. Specifically, the terminal device first truncates (pre-processes) the depth image, that is, removes the pixels whose depth value is greater than a third threshold, and then normalizes the truncated depth image; for the specific process, refer to the related art, which is not repeated here. The value of the third threshold can be chosen empirically by those skilled in the art, and performing the truncation pre-processing before the normalization can simplify the processing.
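A minimal sketch of this pre-processing; zeroing the removed pixels is an assumption, since the patent leaves the details to the related art.

```python
import numpy as np

def truncate_depth(depth: np.ndarray, third_threshold: float) -> np.ndarray:
    """Depth-consistency pre-processing: drop pixels farther than the third
    threshold before normalization. Zeroing dropped pixels is an
    illustrative assumption; the threshold is an empirical value."""
    out = depth.copy()
    out[out > third_threshold] = 0
    return out
```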
Although factors such as illumination and a portrait-like background greatly affect the portrait features (such as brightness information) in the two-dimensional image, they hardly affect the portrait features (depth information) of the depth image. The contour information obtained from the depth image of the target object is therefore relatively accurate, which improves the robustness of face shape detection.
Step 203, processing the target object in the second image according to the contour information to obtain a third image.
The contrast of the face region of the target object in the third image is greater than the contrast of the face region of the target object in the second image. That is, processing the target object in the second image increases the contrast of the face region of the target object, which improves the accuracy of face shape detection and thus the robustness of existing face shape detection techniques.
It should be noted that in the embodiment of the present invention, the third image obtained by processing the target object in the second image has enhanced contrast between the face region of the target object and the region around the face, so performing face shape detection on the third image can accurately determine the face shape of the face region.
Illustratively, in step 203 the terminal device performs at least one of the following processes on the target object in the second image according to the contour information: image enhancement processing and target-object extraction processing. Step 203 may be implemented by steps 203a to 203b, steps 203c to 203d, step 203e, or step 203f described below.
Step 203a, the terminal device performs image enhancement processing on the target object in the second image according to the contour information.
After the first image is normalized in step 202, the terminal device maps each pixel of the contour boundary in the obtained contour information to the second image and performs image enhancement on the region of the second image corresponding to the area enclosed by the contour-boundary pixels.
The image enhancement may increase the pixel values (brightness values) of that region, thereby improving the contrast, brightness, and image quality of the region where the target object is located in the second image.
The image enhancement may also be histogram-equalization enhancement, Laplacian-based image enhancement, or the like; for the specific process, refer to the related art, which is not limited in the embodiment of the present invention.
Further, performing the image enhancement on the target object in the second image according to the contour information may also include: the terminal device performs image enhancement on the edge region of the face of the target object in the second image according to the contour information. Here, the edge region of the face is the region whose outer boundary is the face contour edge and whose inner boundary consists of the pixels whose shortest distance to the face contour edge equals a fourth threshold. The value of the fourth threshold can be chosen empirically by those skilled in the art.
The image enhancement can increase the brightness of the face region of the target object in the second image, which improves the accuracy of face shape detection to a certain extent.
Step 203b, the terminal device extracts the target object from the second image after the image enhancement processing according to the contour information.
Extracting the target object from the image-enhanced second image means removing the background other than the target object from the enhanced second image, obtaining a third image that includes the target object but not the background. The target object may be extracted from the enhanced second image by, for example, extracting the image indicated (enclosed) by the pixels corresponding to the contour information; other methods may also be used.
Extracting the target object removes the interference of a portrait-like background, which improves the accuracy of face shape detection to a certain extent.
After the processing of steps 203a to 203b, the contrast between the face region of the target object and the region around the face is enhanced, which improves the accuracy of face shape detection.
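The following sketch illustrates the steps 203a to 203b pipeline under stated assumptions: CLAHE on the luma channel stands in for the unspecified image enhancement, and the contour is assumed to be given (e.g., by a function like contour_from_depth above).

```python
import cv2
import numpy as np

def enhance_and_extract(second_image: np.ndarray,
                        contour: np.ndarray) -> np.ndarray:
    """Steps 203a-203b sketch: enhance the region enclosed by the contour
    in the two-dimensional image, then remove the background outside it."""
    mask = np.zeros(second_image.shape[:2], np.uint8)
    cv2.drawContours(mask, [contour], -1, 255, thickness=cv2.FILLED)

    # 203a: enhance only the target-object region (CLAHE on the luma channel)
    ycrcb = cv2.cvtColor(second_image, cv2.COLOR_BGR2YCrCb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    y_eq = clahe.apply(ycrcb[:, :, 0])
    ycrcb[:, :, 0] = np.where(mask == 255, y_eq, ycrcb[:, :, 0])
    enhanced = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

    # 203b: extract the target object (zero out the background)
    return cv2.bitwise_and(enhanced, enhanced, mask=mask)
```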
Step 203c, the terminal device extracts the target object from the second image according to the contour information.
According to the contour information obtained in step 202, the terminal device maps each pixel of the contour boundary in the contour information to the second image (pixel matching) to extract the target object from the second image, obtaining an intermediate image that includes the target object but not the background. The target object may be extracted from the second image by, for example, extracting the image indicated (enclosed) by the pixels corresponding to the contour information; other methods may also be used.
Extracting the target object removes the interference of a portrait-like background, which improves the accuracy of face shape detection to a certain extent.
Step 203d, the terminal device performs image enhancement processing on the target object extracted from the second image according to the contour information.
The terminal device performs image enhancement on the target object extracted from the second image (the target object in the intermediate image) according to the contour information to obtain a third image; for a detailed description, refer to the related description of step 203a, which is not repeated here.
The image enhancement can increase the brightness of the face region of the target object in the second image, which improves the accuracy of face shape detection to a certain extent.
Performing both image enhancement and target-object extraction on the target object in the second image enhances the brightness of the face region of the target object and removes the interference of a portrait-like background, which improves the accuracy of face shape detection.
Step 203e, the terminal device performs image enhancement processing on the target object in the second image according to the contour information.
The terminal device performs image enhancement on the target object in the second image according to the contour information to obtain a third image; for a detailed description, refer to the related description of step 203a, which is not repeated here.
The image enhancement can increase the brightness of the face region of the target object in the second image, which improves the accuracy of face shape detection to a certain extent.
Step 203e may be specifically implemented by step 203e1 described below.
Step 203e1, when detecting that the brightness of the face region in the second image is less than or equal to a first threshold, the terminal device performs image enhancement on the target object in the second image according to the contour information.
The first threshold may be preset and is determined according to actual use conditions; the embodiment of the present invention is not limited.
The terminal device may obtain the brightness of the face region of the second image using any image brightness evaluation algorithm in the related art, and performs image enhancement on the target object in the second image according to the contour information when that brightness is less than or equal to the first threshold.
For the image brightness evaluation algorithm, refer to the related art; the embodiment of the present invention does not describe it in detail. For the process of performing image enhancement on the target object in the second image, refer to the related description of step 203a, which is not repeated here.
In this way, the target object in the second image is image-enhanced when the brightness of the face region of the target object in the second image does not meet the requirement.
Illustratively, when the terminal device determines that the sharpness of the face region in the second image is poor (the facial features are blurred), the terminal device performs image enhancement on the target object in the second image.
In this way, the terminal device can process the image in a targeted manner according to the image's specific problems, making the features of the face region more distinct, thereby improving the success rate of face shape detection and the robustness of the face shape detection technique.
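A minimal sketch of this gating, assuming mean gray level as the brightness evaluation and a simple gain/offset as the enhancement; the threshold value is an assumption.

```python
import cv2
import numpy as np

FIRST_THRESHOLD = 80  # assumed brightness threshold on a 0-255 scale

def enhance_if_dark(second_image: np.ndarray,
                    face_mask: np.ndarray) -> np.ndarray:
    """Step 203e1 sketch: enhance the second image only when the face
    region's brightness is at or below the first threshold."""
    gray = cv2.cvtColor(second_image, cv2.COLOR_BGR2GRAY)
    brightness = cv2.mean(gray, mask=face_mask)[0]   # mean gray level of the face region
    if brightness <= FIRST_THRESHOLD:
        # placeholder enhancement: simple linear gain and offset
        return cv2.convertScaleAbs(second_image, alpha=1.3, beta=25)
    return second_image
```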
Step 203f, the terminal device extracts the target object from the second image according to the contour information.
The terminal device extracts the target object from the second image according to the contour information to obtain a third image; for a detailed description, refer to the related description of step 203c, which is not repeated here.
Extracting the target object removes the interference of a portrait-like background, which improves the accuracy of face shape detection to a certain extent.
Step 203f may be specifically implemented by step 203f1 described below.
Step 203f1, when detecting that the similarity between the face region and a target region in the second image is greater than or equal to a second threshold, the terminal device extracts the target object from the second image according to the contour information.
The target region is a region around the face region in the second image. The target region may be a background region of the face region, or another region whose distance from the boundary of the face region is within a preset range; for example, the target region may be the neck region.
The second threshold may be preset and is determined according to actual use conditions; the embodiment of the present invention is not limited.
The terminal device may obtain the similarity between the face region of the second image and the target region using any image similarity evaluation algorithm in the related art, and extracts the target object from the second image according to the contour information when that similarity is greater than or equal to the second threshold.
For the image similarity evaluation algorithm, refer to the related art; the embodiment of the present invention does not repeat it. For the process of extracting the target object from the second image according to the contour information, refer to the above description, which is not repeated here.
For example, when the terminal device determines that the similarity between the face region and the target region in the second image is large (the face region and the target region are hard to distinguish), the terminal device extracts the target object from the second image, that is, separates the target object in the second image from the background in which it is located.
In this way, the terminal device can process the image in a targeted manner according to the image's specific problems, making the features of the face region more distinct, thereby improving the success rate of face shape detection and the robustness of the face shape detection technique.
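A sketch of this gating under stated assumptions: gray-level histogram correlation stands in for the unspecified similarity evaluation algorithm, and the threshold value and region masks are assumptions.

```python
import cv2
import numpy as np

SECOND_THRESHOLD = 0.8  # assumed similarity threshold (histogram correlation)

def face_background_similarity(gray: np.ndarray, face_mask: np.ndarray,
                               target_mask: np.ndarray) -> float:
    """Step 203f1 sketch: similarity between the face region and the
    target region, measured as gray-level histogram correlation."""
    h_face = cv2.calcHist([gray], [0], face_mask, [64], [0, 256])
    h_target = cv2.calcHist([gray], [0], target_mask, [64], [0, 256])
    cv2.normalize(h_face, h_face)
    cv2.normalize(h_target, h_target)
    return cv2.compareHist(h_face, h_target, cv2.HISTCMP_CORREL)

# Extract the target object only when the two regions are hard to tell apart:
# if face_background_similarity(gray, face_mask, neck_mask) >= SECOND_THRESHOLD:
#     third_image = cv2.bitwise_and(second_image, second_image, mask=object_mask)
```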
Step 204, the terminal device generates a face shape detection result according to the feature information of the target object in the third image.
Specifically, the terminal device detects the face region in the third image using a face shape detection algorithm to generate a face shape detection result. The face shape detection result indicates the face shape of the target object; for example, it may be a melon-seed (oval) face, a long face, a round face, a goose-egg face, or a pear-shaped face.
For the face shape detection algorithm, refer to the related art, which is not described here.
Illustratively, step 204 may be specifically implemented by steps 204a to 204d described below.
Step 204a, the terminal device performs image alignment on the third image to obtain a fourth image.
The specific image alignment processing method may refer to the related art, and the embodiment of the present invention is not limited.
Step 204b, the terminal device divides the fourth image into N blocks.
Optionally, the terminal device divides the face area in the fourth image into N blocks.
N is a positive integer. Each of the N blocks corresponds to a weight, where the weight of a first block among the N blocks is greater than the weight of a second block; the first block is a block located in the contour region of the face region, and the second block is a block located in a non-contour region of the face region.
For example, N may take a value such as 5 × 5 or 8 × 8, determined according to actual use; the embodiment of the present invention is not limited. Each block is assigned a weight: the boundary blocks have larger weights and the other blocks have smaller weights.
In this way, the weight of the blocks in the contour region of the face region is greater than that of the other regions, which further strengthens the feature information of the contour region of the face region; this improves the accuracy of face shape detection and the robustness of existing face shape detection techniques.
Step 204c, the terminal device extracts the feature information of each of the N blocks to obtain N groups of feature information.
Illustratively, the terminal device extracts local circular-domain random pixel-difference features from each block. Each block's feature is d-dimensional, and the blocks' features are combined into a d × N-dimensional feature (i.e., a set of feature information), where d may take values such as 72 or 128; the embodiment of the present invention is not limited.
The specific process of extracting the feature information of each block may refer to the related art, and the embodiment of the present invention is not limited.
Step 204d, the terminal device performs classification training on the N groups of feature information and generates a face shape detection result.
The specific classification training process may refer to the related art, and the embodiment of the present invention is not limited.
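To make steps 204b to 204d concrete, here is a sketch that divides the aligned (fourth) image into an n × n grid, weights contour-region blocks more heavily, and concatenates per-block features. Block mean and standard deviation stand in for the patent's d-dimensional local circular-domain random pixel-difference features, and treating the outer ring of blocks as the contour region is an assumption.

```python
import numpy as np

def block_features(fourth_image: np.ndarray, n: int = 5,
                   contour_weight: float = 2.0,
                   inner_weight: float = 1.0) -> np.ndarray:
    """Steps 204b-204c sketch: split the aligned image into n x n blocks,
    weight contour-region blocks more heavily, and concatenate the
    per-block features into one d*N-dimensional vector (here d = 2)."""
    h, w = fourth_image.shape[:2]
    feats = []
    for r in range(n):
        for c in range(n):
            block = fourth_image[r * h // n:(r + 1) * h // n,
                                 c * w // n:(c + 1) * w // n]
            # assumption: the outer ring of blocks covers the face contour region
            on_contour = r in (0, n - 1) or c in (0, n - 1)
            weight = contour_weight if on_contour else inner_weight
            feats.append(weight * np.array([block.mean(), block.std()]))
    return np.concatenate(feats)   # fed to classification training (step 204d)
```

The resulting vectors would then be fed to a conventional classifier for the classification training of step 204d, which the patent leaves to the related art.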
Alternatively, the terminal device may detect the face region in the third image using other face shape detection algorithms to generate the face shape detection result, for example a method based on geometric local-feature template matching or a method based on parameter optimization. For the specific implementation, refer to the related art; the embodiment of the present invention is not limited.
The embodiment of the present invention provides a face shape detection method in which the terminal device may first obtain the contour information of the target object from the depth image containing the target object, then process the two-dimensional image containing the target object according to the contour information to obtain a third image, and finally perform face shape detection on the face region in the third image to generate a face shape detection result. In this scheme, although factors such as illumination and a portrait-like background greatly affect the portrait features in the two-dimensional image, they hardly affect the portrait features of the depth image. Therefore, a face shape detection result obtained by processing the second image in combination with the first image and then performing face shape detection on the resulting third image is more accurate than one obtained by performing face shape detection on the second image directly, which improves the robustness of the face shape detection technique.
As shown in fig. 3, an embodiment of the present invention provides a terminal device 120, where the terminal device 120 includes: an acquisition module 121, a processing module 122, and a generation module 123. The acquisition module 121 is configured to acquire a first image and a second image of a target object, where the first image is a depth image, the second image is a two-dimensional image, and the target object includes a face region, and to acquire contour information of the target object according to the first image. The processing module 122 is configured to process the target object in the second image according to the contour information acquired by the acquisition module 121 to obtain a third image. The generation module 123 is configured to generate a face shape detection result according to the feature information of the target object in the third image obtained by the processing module 122.
Optionally, the processing module 122 is specifically configured to perform, according to the contour information acquired by the acquisition module 121, at least one of the following processes on the target object in the second image: image enhancement processing and target-object extraction processing.
Optionally, the processing module 122 is specifically configured to: when detecting that the brightness of the face region in the second image is less than or equal to a first threshold, perform image enhancement on the target object in the second image according to the contour information acquired by the acquisition module 121; and when detecting that the similarity between the face region in the second image and a target region is greater than or equal to a second threshold, extract the target object from the second image according to the contour information acquired by the acquisition module 121, where the target region is a region around the face region in the second image.
Optionally, the generation module 123 is specifically configured to perform image alignment on the third image to obtain a fourth image; divide the fourth image into N blocks; extract the feature information of each of the N blocks to obtain N groups of feature information; and perform classification training on the N groups of feature information to generate a face shape detection result, where N is a positive integer.
Optionally, each of the N blocks corresponds to a weight, where the weight of a first block among the N blocks is greater than the weight of a second block; the first block is a block located in the contour region of the face region among the N blocks, and the second block is a block located in a non-contour region of the face region among the N blocks.
Optionally, the acquisition module 121 is specifically configured to perform normalization processing on the first image to obtain the contour information of the target object.
The terminal device provided by the embodiment of the present invention can implement the processes shown in fig. 2 in the above method embodiment; to avoid repetition, they are not described here again.
The embodiment of the present invention provides a terminal device that may first obtain the contour information of the target object from the depth image containing the target object, then process the two-dimensional image containing the target object according to the contour information to obtain a third image, and finally perform face shape detection on the face region in the third image to generate a face shape detection result. In this scheme, although factors such as illumination and a portrait-like background greatly affect the portrait features in the two-dimensional image, they hardly affect the portrait features of the depth image. Therefore, a face shape detection result obtained by processing the second image in combination with the first image and then performing face shape detection on the resulting third image is more accurate than one obtained by performing face shape detection on the second image directly, which improves the robustness of the face shape detection technique.
Fig. 4 is a schematic diagram of a hardware structure of a terminal device for implementing various embodiments of the present invention. As shown in fig. 4, the terminal device 100 includes but is not limited to: radio frequency unit 101, network module 102, audio output unit 103, input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the terminal device configuration shown in fig. 4 does not constitute a limitation of the terminal device, and that the terminal device may include more or fewer components than shown, or combine certain components, or a different arrangement of components. In the embodiment of the present invention, the terminal device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal device, a wearable device, a pedometer, and the like.
The processor 110 is configured to acquire a first image and a second image of a target object, where the first image is a depth image, the second image is a two-dimensional image, and the target object includes a face region; acquire contour information of the target object according to the first image; process the target object in the second image according to the contour information to obtain a third image; and generate a face shape detection result according to the feature information of the target object in the third image.
According to the terminal device provided by the embodiment of the present invention, the terminal device may first obtain the contour information of the target object from the depth image containing the target object, then process the two-dimensional image containing the target object according to the contour information to obtain a third image, and finally perform face shape detection on the face region in the third image to generate a face shape detection result. In this scheme, although factors such as illumination and a portrait-like background greatly affect the portrait features in the two-dimensional image, they hardly affect the portrait features of the depth image. Therefore, a face shape detection result obtained by processing the second image in combination with the first image and then performing face shape detection on the resulting third image is more accurate than one obtained by performing face shape detection on the second image directly, which improves the robustness of the face shape detection technique.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 101 may be used for receiving and sending signals during a message transmission or call process, and specifically, after receiving downlink data from a base station, the downlink data is processed by the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through a wireless communication system.
The terminal device provides wireless broadband internet access to the user through the network module 102, such as helping the user send and receive e-mails, browse webpages, access streaming media, and the like.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the network module 102 or stored in the memory 109 into an audio signal and output as sound. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the terminal device 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 includes a speaker, a buzzer, a receiver, and the like.
The input unit 104 is used to receive audio or video signals. The input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042; the graphics processor 1041 processes image data of still pictures or video obtained by an image capture device (e.g., a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the network module 102. The microphone 1042 may receive sound and can process it into audio data. In a phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 101.
The terminal device 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 1061 and/or the backlight when the terminal device 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the terminal device posture (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration identification related functions (such as pedometer, tapping), and the like; the sensors 105 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the terminal device. Specifically, the user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, can collect the user's touch operations on or near it (e.g., operations performed on or near the touch panel 1071 with a finger, a stylus, or any suitable object or attachment). The touch panel 1071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch-point coordinates, sends them to the processor 110, and receives and executes commands sent by the processor 110. The touch panel 1071 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Specifically, the other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here.
Further, the touch panel 1071 may be overlaid on the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in fig. 4, the touch panel 1071 and the display panel 1061 are two independent components to implement the input and output functions of the terminal device, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the terminal device, and is not limited herein.
The interface unit 108 is an interface for connecting an external device to the terminal apparatus 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the terminal apparatus 100 or may be used to transmit data between the terminal apparatus 100 and the external device.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the terminal device, connects various parts of the entire terminal device by using various interfaces and lines, and performs various functions of the terminal device and processes data by running or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the terminal device. Processor 110 may include one or more processing units; alternatively, the processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The terminal device 100 may further include a power supply 111 (such as a battery) for supplying power to each component, and optionally, the power supply 111 may be logically connected to the processor 110 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
In addition, the terminal device 100 includes some functional modules that are not shown, and are not described in detail here.
Optionally, an embodiment of the present invention further provides a terminal device, which may include the processor 110 and the memory 109 shown in fig. 4 and a computer program stored in the memory 109 and executable on the processor 110, where the computer program, when executed by the processor 110, implements the processes of the face shape detection method shown in fig. 2 in the above method embodiment and can achieve the same technical effects; to avoid repetition, they are not described here again.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the processes of the face shape detection method shown in fig. 2 in the above method embodiment and can achieve the same technical effects; to avoid repetition, they are not described here again. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or the like.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A face detection method, comprising:
acquiring a first image and a second image of a target object, wherein the first image is a depth image, the second image is a two-dimensional image, and the target object comprises a face area;
acquiring contour information of the target object according to the first image;
processing the target object in the second image according to the contour information to obtain a third image, wherein the contrast of the face area of the target object in the third image is greater than that of the face area of the target object in the second image;
generating a face detection result according to the characteristic information of the target object in the third image;
wherein the contour information is obtained by processing pixel values corresponding to contours in the first image, and the pixel values corresponding to the processed contour information fall within a preset range.
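By way of illustration only, a minimal Python sketch of the flow of claim 1, assuming OpenCV and NumPy, an 8-bit grayscale second image, and a depth map aligned with it; the edge detector, the equalization step, the histogram descriptor, and all function names are placeholder choices, not the patented implementation:

```python
import cv2
import numpy as np

def face_shape_pipeline(depth_img, gray_img, face_rect):
    """Hypothetical sketch of the claimed flow; names are illustrative."""
    x, y, w, h = face_rect

    # First image -> contour information: rescale depth pixel values into
    # a preset range (0..255 here), then take edges as the contour.
    depth_norm = cv2.normalize(depth_img, None, 0, 255,
                               cv2.NORM_MINMAX).astype(np.uint8)
    contour_mask = cv2.Canny(depth_norm, 50, 150)

    # Second image -> third image: raise the contrast of the face region
    # (plain histogram equalization stands in for contour-guided
    # enhancement).
    third_img = gray_img.copy()
    third_img[y:y+h, x:x+w] = cv2.equalizeHist(third_img[y:y+h, x:x+w])

    # Feature information of the target object in the third image (a
    # grayscale histogram stands in for a real descriptor); the features
    # would feed a trained face shape classifier.
    feats = cv2.calcHist([third_img[y:y+h, x:x+w]], [0], None,
                         [64], [0, 256]).flatten()
    return feats, contour_mask
```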
2. The method of claim 1, wherein said processing the target object in the second image according to the contour information comprises:
performing at least one of the following processes on the target object in the second image according to the contour information: image enhancement processing and target object extraction processing.
3. The method of claim 2, wherein the performing at least one of the image enhancement processing and the target object extraction processing on the target object in the second image according to the contour information comprises:
in a case that the brightness of the face region in the second image is detected to be less than or equal to a first threshold, performing image enhancement processing on the target object in the second image according to the contour information; and
in a case that the similarity between the face region in the second image and a target region is detected to be greater than or equal to a second threshold, extracting the target object from the second image according to the contour information, wherein the target region is a region around the face region in the second image.
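A hedged sketch of the two branches of claim 3: the thresholds, the histogram-correlation stand-in for the similarity measure, and the padded window taken as the "region around the face" are all illustrative assumptions, not values from the patent:

```python
import cv2
import numpy as np

FIRST_THRESHOLD = 60.0   # placeholder brightness threshold
SECOND_THRESHOLD = 0.8   # placeholder similarity threshold

def process_target_object(gray_img, contour_mask, face_rect):
    x, y, w, h = face_rect
    face = gray_img[y:y+h, x:x+w]

    # Enhancement branch: the face region is too dark, so enhance it.
    if face.mean() <= FIRST_THRESHOLD:
        out = gray_img.copy()
        out[y:y+h, x:x+w] = cv2.equalizeHist(face)
        return out

    # Extraction branch: the region around the face resembles the face,
    # so lift the target object out of the background via the contour
    # mask (histogram correlation stands in for the similarity measure).
    pad = max(1, w // 4)
    around = gray_img[max(0, y - pad):y + h + pad,
                      max(0, x - pad):x + w + pad]
    h_face = cv2.calcHist([face], [0], None, [64], [0, 256])
    h_around = cv2.calcHist([around], [0], None, [64], [0, 256])
    if cv2.compareHist(h_face, h_around,
                       cv2.HISTCMP_CORREL) >= SECOND_THRESHOLD:
        mask = (contour_mask > 0).astype(np.uint8)
        return cv2.bitwise_and(gray_img, gray_img, mask=mask)

    return gray_img  # neither condition met: leave the image unchanged
```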
4. The method according to any one of claims 1 to 3, wherein the generating a face detection result from the feature information of the target object in the third image comprises:
carrying out image alignment processing on the third image to obtain a fourth image;
dividing the fourth image into N blocks;
extracting feature information of each block in the N blocks to obtain N groups of feature information;
and performing classification training on the N groups of feature information to generate a face detection result, wherein N is a positive integer.
5. The method according to claim 4, wherein each of the N blocks corresponds to a weight, and wherein a weight of a first block of the N blocks is greater than a weight of a second block, the first block being a block of the N blocks located in a contour region of the face region, and the second block being a block of the N blocks located in a non-contour region of the face region.
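The block-wise scheme of claims 4 and 5 might look as follows; the 4x4 grid, the 16-bin histograms, and the 2:1 weighting of contour blocks over non-contour blocks are illustrative assumptions only:

```python
import numpy as np

def blockwise_features(fourth_img, contour_mask, grid=4):
    """Split the aligned (fourth) image into N = grid*grid blocks,
    extract one feature group per block, and weight contour blocks
    more heavily; histograms and weights here are placeholders."""
    h, w = fourth_img.shape
    bh, bw = h // grid, w // grid
    groups = []
    for i in range(grid):
        for j in range(grid):
            block = fourth_img[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            mask = contour_mask[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            hist, _ = np.histogram(block, bins=16, range=(0, 256))
            # A first block (on the face contour) outweighs a second
            # block (off the contour), per claim 5.
            weight = 2.0 if mask.any() else 1.0
            groups.append(weight * hist.astype(np.float32))
    # The N groups of feature information are then fed to a classifier
    # trained to output the face shape category.
    return np.concatenate(groups)
```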
6. The method according to any one of claims 1 to 3, wherein the obtaining contour information of the target object according to the first image comprises:
performing normalization processing on the first image to obtain the contour information of the target object.
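A minimal sketch of the normalization of claim 6, assuming NumPy; the preset range [lo, hi] and the function name are placeholders:

```python
import numpy as np

def normalize_depth(depth_img, lo=0.0, hi=1.0):
    """Map raw depth values into a preset range [lo, hi]; the rescaled
    values serve as the contour information of the target object."""
    d = depth_img.astype(np.float32)
    d_min, d_max = float(d.min()), float(d.max())
    if d_max == d_min:            # guard against a flat depth map
        return np.full_like(d, lo)
    return lo + (d - d_min) * (hi - lo) / (d_max - d_min)
```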
7. A terminal device, characterized in that the terminal device comprises: an acquisition module, a processing module, and a generating module;
the acquisition module is used for acquiring a first image and a second image of a target object, wherein the first image is a depth image, the second image is a two-dimensional image, and the target object comprises a face area; acquiring contour information of the target object according to the first image;
the processing module is configured to process the target object in the second image according to the contour information acquired by the acquisition module to obtain a third image, where a contrast of a face region of the target object in the third image is greater than a contrast of a face region of the target object in the second image;
the generating module is used for generating a face detection result according to the feature information of the target object in the third image obtained by the processing module;
wherein the contour information is obtained by processing pixel values corresponding to contours in the first image, and the pixel values corresponding to the processed contour information fall within a preset range.
8. The terminal device according to claim 7, wherein the processing module is specifically configured to perform, according to the contour information acquired by the acquisition module, at least one of the following processes on the target object in the second image: image enhancement processing and target object extraction processing.
9. The terminal device according to claim 8, wherein the processing module is specifically configured to: in a case that the brightness of the face region in the second image is detected to be less than or equal to a first threshold, perform image enhancement processing on the target object in the second image according to the contour information acquired by the acquisition module; and in a case that the similarity between the face region in the second image and a target region is detected to be greater than or equal to a second threshold, extract the target object from the second image according to the contour information acquired by the acquisition module, wherein the target region is a region around the face region in the second image.
10. The terminal device according to any one of claims 7 to 9, wherein the generating module is specifically configured to: perform image alignment processing on the third image to obtain a fourth image; divide the fourth image into N blocks; extract feature information of each of the N blocks to obtain N groups of feature information; and perform classification training on the N groups of feature information to generate a face detection result, wherein N is a positive integer.
CN201811635950.6A 2018-12-29 2018-12-29 Face shape detection method and terminal equipment Active CN109840476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811635950.6A CN109840476B (en) 2018-12-29 2018-12-29 Face shape detection method and terminal equipment

Publications (2)

Publication Number Publication Date
CN109840476A CN109840476A (en) 2019-06-04
CN109840476B (en) 2021-12-21

Family

ID=66883499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811635950.6A Active CN109840476B (en) 2018-12-29 2018-12-29 Face shape detection method and terminal equipment

Country Status (1)

Country Link
CN (1) CN109840476B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401463B (en) * 2020-03-25 2024-04-30 维沃移动通信有限公司 Method for outputting detection result, electronic equipment and medium
CN112991210B (en) * 2021-03-12 2024-07-12 Oppo广东移动通信有限公司 Image processing method and device, computer readable storage medium and electronic equipment
CN115170536B (en) * 2022-07-22 2023-05-05 北京百度网讯科技有限公司 Image detection method, training method and device of model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339612A (en) * 2008-08-19 2009-01-07 陈建峰 Face contour checking and classification method
CN106648042A (en) * 2015-11-04 2017-05-10 重庆邮电大学 Identification control method and apparatus
CN107480613A (en) * 2017-07-31 2017-12-15 广东欧珀移动通信有限公司 Face identification method, device, mobile terminal and computer-readable recording medium
CN108053210A (en) * 2017-11-20 2018-05-18 胡研 A kind of shape of face method of payment, shape of face reserving method and traction equipment
CN108734676A (en) * 2018-05-21 2018-11-02 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment, computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010098567A (en) * 2008-10-17 2010-04-30 Seiko Epson Corp Head mount full-face type image display device
CN106203263A (en) * 2016-06-27 2016-12-07 辽宁工程技术大学 A kind of shape of face sorting technique based on local feature
CN106909875B (en) * 2016-09-12 2020-04-10 湖南拓视觉信息技术有限公司 Face type classification method and system


Similar Documents

Publication Publication Date Title
CN108234882B (en) Image blurring method and mobile terminal
CN108111754B (en) Method for determining image acquisition mode and mobile terminal
CN107977652B (en) Method for extracting screen display content and mobile terminal
CN107566749B (en) Shooting method and mobile terminal
CN111602139A (en) Image processing method and device, control terminal and mobile device
CN107730460B (en) Image processing method and mobile terminal
CN110969981A (en) Screen display parameter adjusting method and electronic equipment
CN107644396B (en) Lip color adjusting method and device
CN109840476B (en) Face shape detection method and terminal equipment
CN109241832B (en) Face living body detection method and terminal equipment
JP7467667B2 (en) Detection result output method, electronic device and medium
CN109544445B (en) Image processing method and device and mobile terminal
CN110930329A (en) Starry sky image processing method and device
CN110807405A (en) Detection method of candid camera device and electronic equipment
CN109819166B (en) Image processing method and electronic equipment
CN109727212B (en) Image processing method and mobile terminal
CN109246351B (en) Composition method and terminal equipment
CN111145087A (en) Image processing method and electronic equipment
CN111080747B (en) Face image processing method and electronic equipment
CN109639981B (en) Image shooting method and mobile terminal
CN110944163A (en) Image processing method and electronic equipment
CN110944112A (en) Image processing method and electronic equipment
CN109104573B (en) Method for determining focusing point and terminal equipment
CN107798662B (en) Image processing method and mobile terminal
CN111432122B (en) Image processing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant