JP7335018B2 - A Fast Face Detection Method Based on Multilayer Preprocessing

Info

Publication number: JP7335018B2
Application number: JP2022512825A
Authority: JP
Inventors: 暉張; 子皓叶; 海涛趙; 雁飛孫; 洪波朱
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2021-03-25
Filing date: 2021-04-29
Publication date: 2023-08-29
Anticipated expiration: 2041-04-29
Also published as: WO2022198751A1; CN113204991B; CN113204991A; JP2023522501A

Description

The present application relates to the field of target detection, and in particular to a method for fast and accurate face detection using multilayer preprocessing.

This application claims priority to Chinese Patent Application No. 2021103222047, entitled "Fast Face Detection Method Based on Multilayer Preprocessing" and filed with the Chinese Patent Office on March 25, 2021, which is incorporated herein by reference in its entirety.

Face recognition technology is an important technology widely used in fields such as surveillance, security, personnel management, and image production. It consists of two parts: face detection and face identification. Face detection locates every face that appears in an image, whereas face identification determines whether two faces belong to the same person. Face detection is the foundation of face recognition technology, because the next step can be performed only once all face positions have been detected.

As a subfield of target detection, face detection has many mature algorithms, such as the Haar cascade classifier, which combines digital image features with a classification algorithm, and convolutional neural networks from the field of deep learning. Among them, the convolutional neural network, currently one of the most advanced algorithms, performs very well on face detection. Well-designed and fully trained convolutional neural networks can detect faces with high accuracy under varying illumination and angles, and even when faces are partially occluded.

An exemplary embodiment of the present application combines multiple image processing methods with convolutional neural network technology to provide a fast face detection method based on multilayer preprocessing, which aims to solve the problem that convolutional neural networks are slow to compute.

In one aspect of the present application, a fast face detection method based on multilayer preprocessing is provided, whose specific operation steps include:
S101: converting the image to be detected from the RGB color space to the YCbCr color space;
S102: using an elliptical skin-color model to determine, for each pixel of the image obtained in S101, whether it is a skin-color pixel, and obtaining a skin-color region, wherein a pixel is determined to be a skin-color pixel if its blue chromaticity and red chromaticity components meet the requirement of the elliptical skin-color model;
S103: morphologically processing the skin-color region obtained in S102 to obtain a processed skin-color region;
S104: performing effective search position filtering on the processed skin-color region obtained in S103 to obtain effective search positions, extracting the contours of the effective search positions using a contour extraction technique, and generating one test frame for each contour;
S105: using a convolutional neural network with a face detection function to examine the test frames obtained in S104 one by one and output the face positioning coordinates within each test frame; and
S106: determining the coordinates of the face positioning frame based on the coordinates of the test frame and the face positioning coordinates within the test frame.

In one embodiment, the requirement of the elliptical skin-color model is expressed as a condition on Cb and Cr, where Cb represents the blue chromaticity component of the pixel and Cr represents the red chromaticity component of the pixel.

In one embodiment, performing effective search position filtering on the processed skin-color region comprises:
performing effective search position filtering on the processed skin-color region using a filter matrix, wherein the pixel values of the processed skin-color region, of the filter matrix, and of the effective search positions satisfy

$$dst(i,j) = \begin{cases} 1, & \dfrac{\sum_{x=-a}^{a}\sum_{y=-b}^{b} src(i+x,\,j+y)\, f(x,y)}{area} \ge t \\ 0, & \text{otherwise} \end{cases}$$

where dst(i, j) is the pixel value at coordinates (i, j) of the effective search positions dst, src(i+x, j+y) is the pixel value at coordinates (i+x, j+y) of the skin-color region src, f(x, y) is the pixel value at coordinates (x, y) of the filter matrix f, the size of the filter matrix f is (2a+1)×(2b+1) with its center at coordinates (0, 0), t is a preset threshold on the effective search rate (ESR), and area is the number of pixels of the filter matrix f whose value is 1.

In one embodiment, the coordinates of the upper-left corner (left, top) and the lower-right corner (right, bottom) of the test frame are, respectively,

$$(\mathit{left}, \mathit{top}) = (x_0 - a,\; y_0 - b), \qquad (\mathit{right}, \mathit{bottom}) = (x_1 + a,\; y_1 + b)$$

where $(x_0, y_0)$ and $(x_1, y_1)$ are the coordinates of the upper-left and lower-right corners of the contour bounding rectangle, respectively.

In one embodiment, the effective search rate is defined as the ratio of the area of the skin-color region within the test frame to the area of the test frame.

In one embodiment, converting the image to be detected from the RGB color space to the YCbCr color space comprises performing the color space conversion on the image using a conversion formula, where Y, Cb, and Cr represent the luminance, blue chromaticity component, and red chromaticity component of a pixel, respectively, and R, G, and B represent its red, green, and blue components, respectively.

In one embodiment, morphologically processing the skin-color region includes removing stray skin-color points and thin-line structures with an opening operation.

In one embodiment, morphologically processing the skin-color region further includes filling holes and closing gaps with a closing operation.

In one embodiment, the test frames include at least a test frame A and a test frame B, and S104 further includes:
merging test frames A and B if the area of the test frame C obtained by merging them is less than or equal to the sum of the areas of test frames A and B, and otherwise not merging them.

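In one embodiment, the coordinates of the upper-left and lower-right corners of test frame C are, respectively,

$$\bigl(\min(\mathit{left}_A, \mathit{left}_B),\ \min(\mathit{top}_A, \mathit{top}_B)\bigr), \qquad \bigl(\max(\mathit{right}_A, \mathit{right}_B),\ \max(\mathit{bottom}_A, \mathit{bottom}_B)\bigr)$$

where $(\mathit{left}_A, \mathit{top}_A)$ and $(\mathit{right}_A, \mathit{bottom}_A)$ are the coordinates of the upper-left and lower-right corners of test frame A, and $(\mathit{left}_B, \mathit{top}_B)$ and $(\mathit{right}_B, \mathit{bottom}_B)$ are the coordinates of the upper-left and lower-right corners of test frame B, respectively.

In one embodiment, the coordinates of the upper-left and lower-right corners of the face positioning frame in S106 are, respectively,

$$(\mathit{left}_C + x'_0,\ \mathit{top}_C + y'_0), \qquad (\mathit{left}_C + x'_1,\ \mathit{top}_C + y'_1)$$

where $(\mathit{left}_C, \mathit{top}_C)$ is the upper-left corner of test frame C, and $(x'_0, y'_0)$ and $(x'_1, y'_1)$ are the coordinates, output by the convolutional neural network, of the upper-left and lower-right corners locating a face within test frame C, respectively.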

In another aspect of the present application, a computer device is provided that includes a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method of any of the above embodiments.

In yet another aspect of the present application, a computer-readable storage medium is provided that stores a computer program which, when executed by a processor, implements the steps of the method of any of the above embodiments.

The beneficial effects are as follows. While maintaining the high accuracy of a face detection convolutional neural network, the present application uses multilayer preprocessing to reduce the size of the region that must be searched, thereby greatly increasing execution speed.

FIG. 1 is a flowchart of a fast face detection method based on multilayer preprocessing according to one embodiment of the present application.
FIG. 2 is a schematic diagram of effective search position filtering (ESPF) according to one embodiment of the present application.
FIG. 3 is a schematic diagram of the generation of test frames according to one embodiment of the present application.
FIG. 4 is a schematic diagram of the merging of test frames according to one embodiment of the present application.

As mentioned above, well-designed and fully trained convolutional neural networks can detect faces with high accuracy under varying illumination and angles, and even when faces are partially occluded. However, convolutional neural networks have a drawback of their own: fast computation depends heavily on GPUs with powerful floating-point capability. Because of cost, size, and power constraints, it is difficult for small edge terminals to support fast convolutional neural network computation.

To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the application and do not limit it.

The technical solutions of the present application are described in more detail below in conjunction with the drawings and specific embodiments.

In the embodiment shown in FIG. 1, the fast face detection method based on multilayer preprocessing includes the following operation steps.

S101: Convert the input image (the image to be detected) from the default RGB color space to the YCbCr color space. YCbCr separates the luminance of a color from its chromaticity, which makes it suitable for classifying colors under varying lighting conditions.

In the computer field, most image and video encodings are based on the RGB color space, so to use YCbCr the image must first be converted from RGB to YCbCr. Because the human eye is not equally sensitive to red, green, and blue, the three components must be given different weights when computing the luminance Y. The specific conversion formula is as follows.
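The conversion formula itself does not survive in this text. As a minimal sketch, assuming the widely used full-range BT.601 (JPEG/JFIF) weights rather than whatever coefficients the patent's own equation uses, a per-pixel conversion could look like the following. The function name `rgb_to_ycbcr` is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Assumption: the coefficients are the common BT.601 / JFIF ones,
    not necessarily those of the patent's equation.
    """
    # Luminance: green is weighted most, blue least.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chromaticity components, offset so that neutral gray maps to 128.
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr
```

Under these weights, any gray pixel (R = G = B) has Cb = Cr = 128, which illustrates how the chromaticity plane is decoupled from brightness.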

S102: Use the elliptical skin-color model to determine, for each pixel of the image obtained in S101, whether it is a skin-color pixel, and obtain the skin-color region. A pixel is determined to be a skin-color pixel if its blue chromaticity and red chromaticity components meet the requirement of the elliptical skin-color model.

Statistics over a large number of skin colors show that, in the YCbCr space, skin colors are distributed approximately as an elliptical cylinder; that is, in the CbCr plane the skin-color distribution is close to an ellipse. According to statistical studies, when a planar rectangular coordinate system is set up with Cr as the horizontal axis and Cb as the vertical axis, the skin-color ellipse is centered at (155, 113), with a major-axis length of 30, a minor-axis length of 20, and a tilt angle of 45° (counterclockwise). Therefore, the equation of the skin-color ellipse is as follows.
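The ellipse equation itself does not survive in this text. As a worked sketch under stated assumptions (the tilt is measured counterclockwise from the Cr axis in a standard right-handed Cr-Cb plane, and $A$ and $B$ are symbols introduced here for the semi-major and semi-minor axes), rotating the coordinates about the center by 45° gives:

```latex
\begin{aligned}
x &= C_r - 155, \qquad y = C_b - 113,\\
u &= x\cos 45^\circ + y\sin 45^\circ = \frac{x+y}{\sqrt{2}}, \qquad
v = -x\sin 45^\circ + y\cos 45^\circ = \frac{y-x}{\sqrt{2}},\\
\frac{u^2}{A^2} + \frac{v^2}{B^2} &= 1
\;\Longleftrightarrow\;
\frac{(x+y)^2}{2A^2} + \frac{(y-x)^2}{2B^2} = 1 .
\end{aligned}
```

A point lies inside the ellipse when the left-hand side is at most 1. If the stated lengths 30 and 20 are full axis lengths, then $A = 15$ and $B = 10$; if they are semi-axis lengths, then $A = 30$ and $B = 20$. The text does not settle which reading is intended.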

Once the skin-color ellipse model has been established, a pixel is judged to be a skin-color pixel if the point formed by its blue chromaticity Cb and red chromaticity Cr lies inside the skin-color ellipse; otherwise it is a non-skin-color pixel. Simplifying Equation 2, the condition for a pixel to be a skin-color pixel is as follows.

After the RGB image has been converted to the YCbCr space in S101, a pixel whose Cb and Cr components satisfy Equation 3 can be regarded as a skin-color pixel. Evaluating Equation 3 for every pixel of the input image yields the skin-color region (or skin-color mask).
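The per-pixel test described above can be sketched as follows. This is an illustration under assumptions, not the patent's Equation 3: the stated axis lengths 30 and 20 are read as full lengths (semi-axes 15 and 10), the tilt is applied counterclockwise in a standard Cr-horizontal, Cb-vertical plane, and the names `is_skin_pixel` and `skin_mask` are hypothetical.

```python
import math


def is_skin_pixel(cb, cr, center=(155.0, 113.0), semi_major=15.0,
                  semi_minor=10.0, angle_deg=45.0):
    """Return True if the point (Cr, Cb) lies inside the tilted ellipse.

    Assumptions: `center` is given as (Cr, Cb); semi-axes of 15 and 10
    are one possible reading of the stated axis lengths 30 and 20.
    """
    x = cr - center[0]                      # offset along the Cr axis
    y = cb - center[1]                      # offset along the Cb axis
    t = math.radians(angle_deg)
    u = x * math.cos(t) + y * math.sin(t)   # coordinate along the major axis
    v = -x * math.sin(t) + y * math.cos(t)  # coordinate along the minor axis
    return (u / semi_major) ** 2 + (v / semi_minor) ** 2 <= 1.0


def skin_mask(cb_plane, cr_plane):
    """Build a binary mask (lists of 0/1) from equally sized Cb and Cr planes."""
    return [[1 if is_skin_pixel(cb, cr) else 0
             for cb, cr in zip(cb_row, cr_row)]
            for cb_row, cr_row in zip(cb_plane, cr_plane)]
```

Only Cb and Cr enter the test; the luminance Y is ignored, which is the reason for converting to YCbCr in the first place.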

S103: Morphologically process the skin-color region obtained in S102 to obtain a processed skin-color region.

Morphological operations are a family of image processing techniques for processing the shape features of binarized images. The basic idea is to modify the pixel values of an image using a structuring element of a specific shape together with a rule, thereby removing noise, filling holes and gaps, trimming burrs, and smoothing edges, which facilitates further image analysis and target recognition. The basic morphological operations are erosion and dilation. Erosion is used to remove fine structures such as noise and burrs, while dilation is used to fill holes and gaps. In erosion, the structuring element is slid over the input image pixel by pixel; the input image pixels that lie under the 1-valued entries of the structuring element are called the corresponding pixels, and at each position the minimum of the corresponding pixels is written to the output image pixel that lies under the anchor point of the structuring element. This is expressed as:

$$dst(i,j) = \min_{(x,y):\,E(x,y)=1} src(i+x,\; j+y) \tag{4}$$

Here, dst, src, and E denote the output image, the input image, and the structuring element, respectively; the structuring element takes its anchor point as the coordinate origin, (i, j) is the current position of the anchor point, and (x, y) is an offset within the structuring element relative to the anchor point. Equation 4 shows that, during erosion, the output pixel at the anchor position is 1 only if the 1-valued region of the structuring element is completely covered by the 1-valued region of the input image. As a result, the outline of the 1-valued regions shrinks; visually, the 1-valued regions appear eroded. Dilation is the same as erosion except that the minimum is replaced by the maximum:

$$dst(i,j) = \max_{(x,y):\,E(x,y)=1} src(i+x,\; j+y) \tag{5}$$

Equation 5 shows that, during dilation, the output pixel at the anchor position is 0 only if the 1-valued region of the structuring element is completely covered by the 0-valued region of the input image. As a result, the outline of the 1-valued regions expands; visually, the 1-valued regions appear dilated. Erosion and dilation each change the area of the skin-color region considerably.

To remove noise and fill holes and gaps without affecting the size of the skin-color region, opening and closing operations are needed. Opening means eroding and then dilating the image with the same structuring element; it breaks thin connections and removes noise. Closing means dilating first and then eroding; it joins adjacent regions and fills holes and gaps. The skin-color region obtained above is processed morphologically: the opening operation removes stray skin-color points and thin-line structures, and the closing operation fills small holes and small gaps in the skin-color region. Opening and closing remove noise and fill holes and gaps while having almost no effect on the area of the skin-color region. Applying the opening and closing operations to the skin-color mask obtained in S102 yields the final skin-color mask.
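The four operations described above can be sketched in plain Python on binary masks stored as lists of rows. This is an illustrative sketch, not the patent's implementation: the anchor is assumed to be the center of the structuring element, pixels outside the image are simply skipped (a border policy chosen here), and all function names are hypothetical.

```python
def _apply(src, elem, pick):
    """Slide elem over src; pick (min or max) combines the pixels under elem's 1s."""
    h, w = len(src), len(src[0])
    ay, ax = len(elem) // 2, len(elem[0]) // 2  # anchor = center of elem (assumption)
    dst = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [
                src[i + y - ay][j + x - ax]
                for y, row in enumerate(elem)
                for x, e in enumerate(row)
                # Out-of-image positions are skipped (assumed border policy).
                if e == 1 and 0 <= i + y - ay < h and 0 <= j + x - ax < w
            ]
            dst[i][j] = pick(vals) if vals else src[i][j]
    return dst


def erode(src, elem):
    """Erosion: minimum over the corresponding pixels."""
    return _apply(src, elem, min)


def dilate(src, elem):
    """Dilation: maximum over the corresponding pixels."""
    return _apply(src, elem, max)


def opening(src, elem):
    """Erode, then dilate: removes isolated points and thin structures."""
    return dilate(erode(src, elem), elem)


def closing(src, elem):
    """Dilate, then erode: fills small holes and gaps."""
    return erode(dilate(src, elem), elem)
```

With a 3×3 all-ones structuring element, `opening` deletes an isolated skin-color pixel while leaving a 3×3 block unchanged, and `closing` fills a one-pixel hole in that block, which matches the claim that the area of larger regions is barely affected.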

S104: Perform effective search position filtering on the processed skin-color region obtained in S103 to obtain the effective search positions, extract the contours of the effective search positions using a contour extraction technique, and generate one test frame for each contour.

Effective search position filtering (ESPF) is applied to the final skin-color region to obtain all effective-search-position pixel regions. ESPF is a special image filtering operation that uses an elliptical filter matrix and a filtering computation based on the effective search rate (ESR). The effective search rate is defined as the ratio of the skin-color area $A_s$ within a test frame to the area $A_r$ of the test frame:

$$ESR = \frac{A_s}{A_r}$$
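The ESR of a single frame follows directly from that definition. The sketch below assumes a binary mask indexed as `mask[row][col]` and a frame given by inclusive pixel corners; both conventions and the function name `effective_search_rate` are assumptions made for illustration.

```python
def effective_search_rate(mask, left, top, right, bottom):
    """ESR = skin-color area inside the frame / area of the frame.

    Assumptions: mask holds 0/1 values, is indexed mask[row][col],
    and the frame corners are inclusive pixel coordinates.
    """
    frame_area = (right - left + 1) * (bottom - top + 1)
    skin_area = sum(mask[r][c]
                    for r in range(top, bottom + 1)
                    for c in range(left, right + 1))
    return skin_area / frame_area
```

A frame that tightly encloses a compact skin-color blob scores close to 1, while a frame around a thin or sparse region scores low.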

The ESPF computation can be expressed as:

$$dst(i,j) = \begin{cases} 1, & \dfrac{\sum_{x=-a}^{a}\sum_{y=-b}^{b} src(i+x,\,j+y)\, f(x,y)}{area} \ge t \\ 0, & \text{otherwise} \end{cases}$$

Here, dst, src, and f are the output image, the input image, and the filter matrix, respectively. The size of the filter matrix is (2a+1)×(2b+1) with its center at coordinates (0, 0), t is a preset ESR threshold, and area is the number of 1-valued pixels in the filter matrix. The filter matrix used in ESPF is an elliptical matrix: as the filter matrix in FIG. 2 shows, its 1-valued entries are arranged as a standard ellipse inscribed in the rectangle.
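A minimal sketch of ESPF as described above: build an elliptical 0/1 filter matrix, then mark a pixel as an effective search position when the fraction of filter cells that cover skin-color pixels reaches the threshold t. Several details are assumptions: the ellipse is rasterized with semi-axes a + 0.5 and b + 0.5, the comparison is taken as "greater than or equal", pixels outside the image count as 0, and the function names are hypothetical.

```python
def elliptical_filter(a, b):
    """(2a+1) x (2b+1) matrix of 0/1 with the 1s inside an inscribed ellipse.

    Assumption: semi-axes a + 0.5 and b + 0.5, so the ellipse touches the
    edges of the bounding rectangle of the matrix.
    """
    return [[1 if (x / (a + 0.5)) ** 2 + (y / (b + 0.5)) ** 2 <= 1.0 else 0
             for y in range(-b, b + 1)]
            for x in range(-a, a + 1)]


def espf(src, f, t):
    """Return the binary map of effective search positions."""
    a, b = len(f) // 2, len(f[0]) // 2
    area = sum(map(sum, f))                 # number of 1-valued filter cells
    h, w = len(src), len(src[0])
    dst = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0
            for x in range(-a, a + 1):
                for y in range(-b, b + 1):
                    # Pixels outside the image contribute 0 (assumption).
                    if 0 <= i + x < h and 0 <= j + y < w:
                        s += src[i + x][j + y] * f[x + a][y + b]
            dst[i][j] = 1 if s / area >= t else 0
    return dst
```

Because the quotient is the skin-color fraction under an ellipse centered on the pixel, compact blobs survive while thin lines and small specks fall below the threshold.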

As shown in FIG. 2, the output image of ESPF consists of the effective search positions. A contour extraction technique is then used to extract the contours of the effective search positions, and one test frame is generated for each contour. A test frame is obtained by expanding the contour bounding rectangle outward by a fixed distance on every side; each of the four sides of the bounding rectangle touches the contour and is parallel to a side of the image. The expansion distance equals half the size of the filter matrix. If the coordinates of the upper-left and lower-right corners of the contour bounding rectangle are

$$(x_0, y_0), \qquad (x_1, y_1)$$

and the size of the filter matrix is (2a+1)×(2b+1), then the coordinates of the upper-left and lower-right corners of the expanded test frame are

$$(\mathit{left}, \mathit{top}) = (x_0 - a,\; y_0 - b), \qquad (\mathit{right}, \mathit{bottom}) = (x_1 + a,\; y_1 + b)$$
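The expansion step can be sketched as a small helper. The half-sizes `dx` and `dy` stand for the horizontal and vertical half-sizes of the filter matrix; which of a and b is horizontal is not stated in this text, so the caller decides. Clipping the result to the image bounds is an addition made here for safety, not something the description states, and the function name is hypothetical.

```python
def expand_to_test_frame(x0, y0, x1, y1, dx, dy, width, height):
    """Expand a contour bounding rectangle into a test frame.

    (x0, y0) and (x1, y1) are the upper-left and lower-right corners of the
    bounding rectangle; dx and dy are the horizontal and vertical expansion
    distances. Clipping to the image is an assumption of this sketch.
    """
    left = max(0, x0 - dx)
    top = max(0, y0 - dy)
    right = min(width - 1, x1 + dx)
    bottom = min(height - 1, y1 + dy)
    return left, top, right, bottom
```

Expanding by the filter half-size restores the margin that ESPF trims from the edge of a skin-color blob, so the face is not cut off at the frame border.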

The resulting test frames are shown in FIG. 3; every test frame obtained after ESPF has a high ESR. ESPF removes non-face skin-color parts, such as skin-color regions of small area and elongated skin-color regions, and also resolves the problem of skin-color regions being connected to one another.

S105: Use a convolutional neural network with a face detection function to examine the test frames obtained in S104 one by one and output the face positioning coordinates within each test frame.

Check whether any test frames can be merged, and merge all that can, to obtain the final test frames. Merging replaces two test frames A and B with a single larger test frame C. Test frame C must cover A and B completely while having the smallest possible area, so the coordinates of its upper-left and lower-right corners are:

$$\bigl(\min(\mathit{left}_A, \mathit{left}_B),\ \min(\mathit{top}_A, \mathit{top}_B)\bigr), \qquad \bigl(\max(\mathit{right}_A, \mathit{right}_B),\ \max(\mathit{bottom}_A, \mathit{bottom}_B)\bigr)$$

In addition, a merge must satisfy the condition that the total area does not increase, that is,

$$S_C \le S_A + S_B$$

where $S_A$, $S_B$, and $S_C$ are the areas of test frames A, B, and C. FIG. 4 shows the effect of merging test frames: two pairs of test frames with large overlaps have been merged, which further reduces the area the convolutional neural network has to search and improves search efficiency.
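The merge rule can be sketched as follows. Frames are `(left, top, right, bottom)` tuples, and area is computed as width times height of the coordinate differences. The repeated pairwise scan in `merge_all` is one simple way to "merge all that can be merged" and is an assumption of this sketch, as are the function names.

```python
def area(frame):
    left, top, right, bottom = frame
    return (right - left) * (bottom - top)


def merged_frame(a, b):
    """Smallest axis-aligned frame that covers both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def try_merge(a, b):
    """Return the merged frame if the total area does not increase, else None."""
    c = merged_frame(a, b)
    return c if area(c) <= area(a) + area(b) else None


def merge_all(frames):
    """Greedily merge pairs until no pair satisfies the area condition."""
    frames = list(frames)
    changed = True
    while changed:
        changed = False
        for i in range(len(frames)):
            for j in range(i + 1, len(frames)):
                c = try_merge(frames[i], frames[j])
                if c is not None:
                    frames[i] = c       # keep the merged frame in slot i
                    del frames[j]       # drop the absorbed frame
                    changed = True
                    break
            if changed:
                break
    return frames
```

The area condition only fires when two frames overlap enough that their union rectangle is no larger than the two searched separately, so distant frames are left alone.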

S106: Determine the coordinates of the face positioning frame based on the coordinates of the test frame and the face positioning coordinates within the test frame, and obtain the face detection result.

A convolutional neural network with a face detection function examines the final test frames one by one and outputs the face positioning coordinates within each; the positioning coordinates output here are relative to the test frame.

Step 7: The convolutional neural network outputs the coordinates of all face positioning frames within a test frame, relative to that test frame. If the coordinates of the upper-left and lower-right corners of the test frame are

$$(\mathit{left}, \mathit{top}), \qquad (\mathit{right}, \mathit{bottom})$$

and the coordinates of the upper-left and lower-right corners of a face positioning frame output by the convolutional neural network are

$$(x'_0, y'_0), \qquad (x'_1, y'_1)$$

then the actual coordinates of the upper-left and lower-right corners of that face positioning frame are, respectively:

$$(\mathit{left} + x'_0,\ \mathit{top} + y'_0), \qquad (\mathit{left} + x'_1,\ \mathit{top} + y'_1)$$

The actual coordinates of each face positioning frame in the image are computed from the coordinates of the test frame and the face positioning coordinates within it, and are output as the final face detection result.
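Since the network reports face boxes relative to the test frame it was given, mapping them back to image coordinates is a translation by the frame's upper-left corner. A minimal sketch, with the hypothetical name `to_image_coords` and both boxes given as `(left, top, right, bottom)` tuples:

```python
def to_image_coords(frame, face_box):
    """Translate a face box from test-frame coordinates to image coordinates.

    frame:    (left, top, right, bottom) of the test frame in the image.
    face_box: (x0, y0, x1, y1) of the face, relative to the test frame.
    """
    left, top, _, _ = frame
    x0, y0, x1, y1 = face_box
    return (left + x0, top + y0, left + x1, top + y1)
```

For example, a face at (10, 20, 40, 60) inside a test frame whose upper-left corner is (100, 50) lies at (110, 70, 140, 110) in the full image.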

It should be understood that, although the steps in the flowchart of FIG. 1 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages. These sub-steps or stages need not be performed and completed at the same moment and may be performed at different times; nor must they be performed sequentially, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the above method embodiments. Any reference to memory, storage, a database, or another medium in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of example and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this specification.

The above embodiments represent only some embodiments of the present application. Although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. Those skilled in the art can make modifications and improvements without departing from the concept of the present application, and all of these fall within its scope of protection. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims

A face detection method, comprising:
step S101 of converting an image to be detected from the RGB color space to the YCbCr color space;
step S102 of determining, using an elliptical skin-color model, whether each pixel of the image obtained in step S101 is a skin-color pixel, and obtaining a skin-color region, wherein a pixel is determined to be a skin-color pixel if its blue chromaticity and red chromaticity components satisfy the requirement of the elliptical skin-color model;
step S103 of morphologically processing the skin-color region obtained in step S102 to obtain a processed skin-color region,
wherein the morphological processing is a family of image-processing techniques for processing the shape features of a binarized image, and includes erosion and dilation;
step S104 of performing effective search position filtering on the processed skin-color region obtained in step S103 to obtain effective search positions, extracting the contours of the effective search positions using a contour extraction technique, and generating one test frame for each contour,
wherein the effective search positions are the output image of the effective search position filtering;
step S105 of detecting the test frames obtained in step S104 one by one using a convolutional neural network having a face detection function, and giving face positioning coordinates within each test frame,
wherein the face positioning coordinates are the coordinates of a region that the convolutional neural network having the face detection function recognizes as a face; and
step S106 of determining the coordinates of a face positioning frame in the image based on the coordinates of the test frame and the face positioning coordinates within the test frame,
wherein performing the effective search position filtering on the processed skin-color region in step S104 comprises:
performing the effective search position filtering on the processed skin-color region using a filter matrix, wherein the pixel values of the processed skin-color region, the pixel values of the filter matrix, and the pixel values of the effective search positions satisfy the following expression:

where dst(i, j) represents the pixel value at coordinates (i, j) of the effective search positions dst, src(i+x, j+y) represents the pixel value at coordinates (i+x, j+y) of the skin-color region src, f(x, y) represents the pixel value at coordinates (x, y) of the filter matrix f, the size of the filter matrix f is (2a+1)×(2b+1) with center coordinates (0, 0), t represents a preset threshold of the effective search rate ESR, and area represents the number of pixels in the filter matrix f whose value is 1; and
wherein the effective search rate ESR is defined as the ratio of the area of the skin-color region within the test frame to the area of the test frame.
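
The effective search position filtering of step S104 can be illustrated with a minimal sketch. The expression itself is not reproduced in the text above, so the rule used here is an assumption inferred from the definitions of dst, src, f, t, and area: an output pixel is set to 1 when the filter-weighted count of skin-color pixels in its neighborhood reaches t × area. The function name, the zero padding at the image border, and the row/column index order are likewise illustrative choices, not taken from the patent.

```python
def effective_search_filter(src, f, t):
    """Sketch of effective search position filtering on a binary mask.

    src: 2D list of 0/1 values (processed skin-color region).
    f:   2D list of 0/1 values of size (2a+1) x (2b+1), centered at (0, 0).
    t:   threshold on the effective search rate (0..1).
    Assumption: dst(i, j) = 1 when sum(src(i+x, j+y) * f(x, y)) >= t * area.
    """
    rows, cols = len(src), len(src[0])
    a, b = len(f) // 2, len(f[0]) // 2
    # area = number of filter entries equal to 1
    area = sum(v for row in f for v in row)
    dst = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for x in range(-a, a + 1):
                for y in range(-b, b + 1):
                    ii, jj = i + x, j + y
                    # Pixels outside the image count as 0 (assumed padding).
                    if 0 <= ii < rows and 0 <= jj < cols:
                        total += src[ii][jj] * f[x + a][y + b]
            dst[i][j] = 1 if total >= t * area else 0
    return dst


# A 3x3 skin-color block inside a 5x5 mask, filtered with a 3x3 all-ones matrix.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
ones3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
result = effective_search_filter(mask, ones3, 0.6)
```

Under this reading, positions whose neighborhood is mostly background are dropped, so a test frame built around the surviving positions tends to have a skin-color ratio near or above t.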

The method according to claim 1, wherein the requirement of the elliptical skin-color model is

where Cb represents the blue chromaticity component of the pixel and Cr represents the red chromaticity component of the pixel.

The method according to claim 1, wherein the coordinates (left, top) of the upper-left corner and the coordinates (right, bottom) of the lower-right corner of the test frame are, respectively,

which respectively represent the coordinates of the upper-left corner and the lower-right corner of the bounding rectangle of the contour.

The method according to claim 1, wherein converting the image to be detected from the RGB color space to the YCbCr color space comprises:
performing the color space conversion on the image to be detected using the following expression:

where Y, Cb, and Cr represent the luminance, the blue chromaticity component, and the red chromaticity component of the pixel, respectively, and R, G, and B represent the red, green, and blue components of the pixel, respectively.
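
As an illustration of the color space conversion, the sketch below uses the widely used full-range ITU-R BT.601 (JPEG/JFIF) coefficients. This is an assumption: the conversion expression of the claim is not reproduced in the text above, and the patent may use different coefficients or offsets (for example a studio-range variant). The function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr.

    Assumption: full-range BT.601 coefficients as used by JPEG/JFIF;
    the coefficients of the claimed expression may differ.
    Returns unrounded, unclamped floats.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


white = rgb_to_ycbcr(255, 255, 255)
black = rgb_to_ycbcr(0, 0, 0)
blue = rgb_to_ycbcr(0, 0, 255)
```

Separating luminance Y from the chromaticity components Cb and Cr is what lets the skin-color test of step S102 depend only on Cb and Cr.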

The method according to claim 1, wherein morphologically processing the skin-color region comprises:
removing skin-color points and thin-line structures with an opening operation, the opening operation comprising successively eroding and then dilating the image with the same structuring element.

The method according to claim 5, wherein morphologically processing the skin-color region further comprises:
filling holes and closing gaps with a closing operation, the closing operation comprising dilating first and then eroding.
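
The opening and closing operations can be sketched in plain Python on a binary mask. The square (2k+1)×(2k+1) structuring element, the treatment of pixels outside the image as background, and the function names are assumptions made for this illustration; the claims only require that erosion and dilation be applied in the stated order.

```python
def _morph(img, k, combine):
    """Apply erosion (combine=all) or dilation (combine=any) with a
    (2k+1) x (2k+1) square structuring element (an assumed shape)."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            out[i][j] = int(combine(
                # Pixels outside the image are treated as background (0).
                0 <= i + x < rows and 0 <= j + y < cols and img[i + x][j + y] == 1
                for x in range(-k, k + 1)
                for y in range(-k, k + 1)
            ))
    return out


def erode(img, k=1):
    return _morph(img, k, all)


def dilate(img, k=1):
    return _morph(img, k, any)


def opening(img, k=1):
    # Erode first, then dilate: removes isolated points and thin lines.
    return dilate(erode(img, k), k)


def closing(img, k=1):
    # Dilate first, then erode: fills small holes and narrow gaps.
    return erode(dilate(img, k), k)


def blank(n):
    return [[0] * n for _ in range(n)]


# A 3x3 block plus one isolated pixel: opening keeps the block only.
noisy = blank(7)
for i in range(2, 5):
    for j in range(2, 5):
        noisy[i][j] = 1
noisy[0][6] = 1
opened = opening(noisy)

# A 3x3 block with a one-pixel hole: closing fills the hole.
holed = blank(7)
for i in range(2, 5):
    for j in range(2, 5):
        holed[i][j] = 1
holed[3][3] = 0
closed = closing(holed)
```

Running opening before closing first discards small noise and then consolidates the remaining skin-color regions, which reduces the number of contours passed to the later steps.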

The method according to claim 1, wherein the test frames include at least a test frame A and a test frame B, and step S104 further comprises:
merging the test frames A and B if the area of a test frame C obtained by merging the test frames A and B is less than or equal to the sum of the areas of the test frames A and B, and otherwise not merging the test frames A and B.
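
The merge rule can be sketched as follows. It is assumed here that the merged test frame C is the smallest axis-aligned rectangle enclosing both A and B, that a frame is a (left, top, right, bottom) tuple, and that area is width × height; the function names are hypothetical.

```python
def frame_area(frame):
    """Area of a (left, top, right, bottom) frame, assuming right >= left
    and bottom >= top."""
    left, top, right, bottom = frame
    return (right - left) * (bottom - top)


def merge_if_compact(frame_a, frame_b):
    """Return the merged frame C if area(C) <= area(A) + area(B), else None.

    Assumption: C is the smallest axis-aligned rectangle enclosing A and B.
    """
    frame_c = (
        min(frame_a[0], frame_b[0]),
        min(frame_a[1], frame_b[1]),
        max(frame_a[2], frame_b[2]),
        max(frame_a[3], frame_b[3]),
    )
    if frame_area(frame_c) <= frame_area(frame_a) + frame_area(frame_b):
        return frame_c
    return None


# Overlapping frames: C has area 150 <= 100 + 100, so they are merged.
merged = merge_if_compact((0, 0, 10, 10), (5, 0, 15, 10))
# Distant frames: C has area 900 > 200, so they are kept separate.
not_merged = merge_if_compact((0, 0, 10, 10), (20, 20, 30, 30))
```

The area test means that merging never increases the total number of pixels the convolutional neural network has to process, while reducing the number of separate network invocations.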

The method according to claim 7, wherein the coordinates of the upper-left corner of the test frame C

which are respectively the coordinates of the upper-left corner and the lower-right corner of the test frame B.

The method according to claim 7, wherein the coordinates (l, t) of the upper-left corner and the coordinates (r, b) of the lower-right corner of the face positioning frame are, respectively,

which are respectively the coordinates of the upper-left corner and the lower-right corner of any face positioning frame of the test frame C output by the convolutional neural network.
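
Step S106 maps a face detected inside a test frame back to the full image. The sketch below assumes this is a plain translation by the upper-left corner of the test frame, with the network reporting coordinates relative to that corner; the expression of the claim is not reproduced in the text above, so this is an illustrative reading, and the function name is hypothetical.

```python
def to_image_coords(test_frame, face_box):
    """Translate a face box from test-frame coordinates to image coordinates.

    test_frame: (left, top, right, bottom) of the test frame in the image.
    face_box:   (l0, t0, r0, b0) relative to the test frame's upper-left corner.
    Assumption: the mapping is a pure offset by (left, top).
    """
    left, top, _, _ = test_frame
    l0, t0, r0, b0 = face_box
    return (left + l0, top + t0, left + r0, top + b0)


face_in_image = to_image_coords((100, 50, 300, 250), (10, 20, 60, 80))
```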

A computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.

A computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.