CN113111850B - Human body key point detection method, device and system based on region-of-interest transformation

Info

Publication number: CN113111850B
Application number: CN202110478213.5A
Authority: CN
Inventors: 杨帆; 郝强; 潘鑫淼; 胡建国
Original assignee: Nanjing Zhenshi Intelligent Technology Co Ltd
Current assignee: Xiaoshi Technology Jiangsu Co ltd
Priority date: 2021-04-30
Filing date: 2021-04-30
Publication date: 2022-08-16
Anticipated expiration: 2041-04-30
Also published as: CN113111850A

Abstract

The invention provides a method, a device and a system for detecting human body key points based on region-of-interest transformation. During model training, a region-of-interest transformation is applied to the human body key point data, and the human body key point model is trained on the transformed data. During detection, human body key points are detected with the trained model and then inversely transformed to obtain the human body key points of the image before transformation. The invention standardizes the data to a uniform form, which mitigates the large data variation found in open scenes and reduces training difficulty. The region-of-interest transformation also increases the proportion of the image occupied by the face, which benefits face key point prediction and thereby improves the overall precision of the human body key points. Compared with methods that predict body and face key points separately, the method needs only one face detector and one key point detector, so its computational cost is low.

Description

Human body key point detection method, device and system based on region-of-interest transformation

Technical Field

The invention relates to the technical field of image processing, in particular to human face detection and recognition, and specifically relates to a human body key point detection method, device and system based on region-of-interest transformation.

Background

The task of human body key point detection is to detect the positions of the key points of the face and the limbs in a human body image. Human body image data in uncontrolled scenes vary widely: people, clothing, postures, occlusions and background environments differ greatly, and the face occupies a small proportion of the image, all of which makes it difficult to train a human body key point detection model.

Existing human body key point detection methods fall into two main types. The first detects the human body position in the image, crops the human body image, and then detects the key points in the cropped image. Because the face occupies a small proportion of the image, face key point prediction is inaccurate; and because face key points are usually numerous while limb key points are few, the overall precision suffers.

The second detects both the human body position and the face position: images of the human body and of the face are cropped first, and limb key points and face key points are then detected separately. Although this method is precise, it requires predictions from several models and is computationally time-consuming.

Disclosure of Invention

The invention aims to provide a method, a device and a system for detecting human key points based on region-of-interest transformation.

In order to achieve the above object, a first aspect of the present invention provides a method for detecting human key points based on region of interest transformation, including the following steps:

step 1, obtaining M color images containing a human body, wherein M is a natural number greater than 1000;

step 2, labeling N human body key points on each color image to obtain labeling data; the human body key points comprise face key points and limb key points, and the number of face key points is greater than the number of limb key points;

step 3, determining a face bounding box of the color image according to the coordinates of the labeled face key points;

step 4, performing region-of-interest transformation on each color image and the labeling data according to the face center point and the face size to obtain transformed images and transformed human body key point coordinates; the face center point and the face size are determined according to the face bounding box;

step 5, training a human body key point detection model for detecting human body key points based on the region-of-interest-transformed images and the transformed human body key point coordinates;

step 6, for an input image to be detected containing a human body, detecting a face bounding box by using a face detector, and then performing region-of-interest transformation according to the method in step 4, so as to increase the proportion of the face in the image and obtain a transformed image;

step 7, detecting the human body key points in the transformed image by using the human body key point detection model trained in step 5; and

step 8, performing region-of-interest inverse transformation on the human body key points in the transformed image to obtain the human body key points of the image to be detected before transformation.

The second aspect of the present invention further provides a human body key point detection device based on region of interest transformation, including:

a module for acquiring M color images including a human body, M being a natural number greater than 1000;

a module for labeling N human body key points on each color image to obtain labeling data; the human body key points comprise face key points and limb key points, and the number of the face key points is more than that of the limb key points;

a module for determining a face bounding box of the color image according to the coordinates of the labeled face key points;

a module for performing region-of-interest transformation on each color image and the labeled data according to the face center point and the face size to obtain a transformed image and transformed coordinates of key points of the human body; the face central point and the face size are determined according to the face bounding box;

a module for training a human body key point detection model for detecting human body key points based on the image after the transformation of the region of interest and the transformed human body key point coordinates;

a module for detecting a human face boundary box by using a human face detector for an input image to be detected containing a human body, then carrying out region-of-interest transformation, improving the proportion of the human face in the image and obtaining a transformed image;

a module for detecting human key points in the transformed image using a trained human key point detection model; and

a module for performing region-of-interest inverse transformation on the human body key points in the transformed image to obtain the human body key points of the image to be detected before transformation.

The third aspect of the present invention further provides a system for human body keypoint detection based on region of interest transformation, comprising:

one or more processors;

a memory storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising a flow of a human keypoint detection method based on region of interest transformation as previously described.

Compared with the prior art, the technical scheme of the invention has the following remarkable beneficial effects:

To address the detection difficulties caused by large scene variation and a small face proportion in human body images captured in open environments, the invention applies a face-centered region-of-interest transformation to the data and trains the human body key point detection model on the transformed data. During actual detection, the face is first detected by a face detector, the region-of-interest transformation is applied to the image, the human body key points are detected, and an inverse transformation finally yields the key point data of the original image to be detected. On the one hand, this adjusts the data to a uniform form and reduces training difficulty. On the other hand, because face key points far outnumber body key points, increasing the face proportion in the image through the transformation allows the face key points to be predicted more accurately, which improves the overall accuracy of the human body key point detection model. In addition, the method needs only one face detector and one key point detector, so its computational cost is low.

It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail below can be considered as part of the inventive subject matter of this disclosure unless such concepts are mutually inconsistent. In addition, all combinations of claimed subject matter are considered a part of the presently disclosed subject matter.

The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of specific embodiments in accordance with the teachings of the present invention.

Drawings

The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:

FIG. 1 is a schematic diagram of a training process of a human key point detection model based on region of interest transformation according to an exemplary embodiment of the present invention.

FIG. 2 is a schematic diagram of a model structure of the human body key point detection model of the present invention.

FIG. 3 is a schematic diagram illustrating a process of detecting key points of a human body by using the model shown in FIG. 1 according to the embodiment of the invention.

Detailed Description

In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.

In this disclosure, aspects of the present invention are described with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. Embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be appreciated that the various concepts and embodiments described above, as well as those described in greater detail below, may be implemented in any of numerous ways, as the disclosed concepts and embodiments are not limited to any one implementation. In addition, some aspects of the present disclosure may be used alone, or in any suitable combination with other aspects of the present disclosure.

Referring to FIGS. 1, 2 and 3, the human body key point detection method based on region-of-interest transformation provided by the invention comprises a model training process and a model detection process. During model training, region-of-interest transformation is applied to the human body key point data, and the human body key point model is trained on the transformed data. During detection, human body key points are detected with the trained model and then inversely transformed to obtain the human body key points of the image before transformation.

In the model training process, standardizing the data to a uniform form mitigates the large data variation of open scenes and reduces training difficulty; at the same time, the region-of-interest transformation increases the face proportion in the image, which benefits face key point prediction and further improves the overall precision of the human body key points. Compared with methods that predict body and face key points separately, the method needs only one face detector and one key point detector, so the computational cost is low.

As shown in fig. 1 and 3, the method for detecting human key points based on region of interest transformation according to the embodiment of the present disclosure includes the following steps:

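step 1, obtaining M color images containing a human body, wherein M is a natural number greater than 1000;

step 2, labeling N human body key points on each color image to obtain labeling data;

step 3, determining a face bounding box of the color image according to the coordinates of the labeled face key points;

step 4, performing region-of-interest transformation on each color image and the labeling data according to the face center point and the face size to obtain transformed images and transformed human body key point coordinates;

step 5, training a human body key point detection model for detecting human body key points based on the region-of-interest-transformed images and the transformed human body key point coordinates;

step 6, for an input image to be detected containing a human body, detecting a face bounding box by using a face detector, and then performing region-of-interest transformation according to the method in step 4, so as to increase the proportion of the face in the image and obtain a transformed image;

step 7, detecting the human body key points in the transformed image by using the human body key point detection model trained in step 5; and

step 8, performing region-of-interest inverse transformation on the human body key points in the transformed image to obtain the human body key points of the image to be detected before transformation.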

Human body key point data acquisition and labeling

In step 1, the base images of the training set are constructed by acquiring M color images containing a human body, where M is greater than 1000. In particular, the image data should cover as many scenes as possible, including different people, clothing, poses, occlusions and background environments.

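In step 2, N human body key points are labeled on each color image, and the resulting labeling data are expressed as:

$$\{[I^{src}_0,(p^{src}_{0,0},p^{src}_{0,1},\ldots,p^{src}_{0,N-1})],[I^{src}_1,(p^{src}_{1,0},p^{src}_{1,1},\ldots,p^{src}_{1,N-1})],\ldots,[I^{src}_{M-1},(p^{src}_{M-1,0},p^{src}_{M-1,1},\ldots,p^{src}_{M-1,N-1})]\}$$

where $p^{src}_{m,n}=(x^{src}_{m,n},y^{src}_{m,n})$ is the $n$-th key point coordinate of the $m$-th image $I^{src}_m$, with $m=0,1,2,\ldots,M-1$ and $n=0,1,2,\ldots,N-1$.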

The labeled human key points comprise face key points and limb key points, and the number of the face key points is more than that of the limb key points. From the face key points, a face bounding box for the face can be determined.
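The text does not spell out how the bounding box is derived from the face key points. A minimal sketch, assuming the box is simply the axis-aligned extent (minimum and maximum coordinates) of the labeled face key points, is shown below; the function name `face_box_from_keypoints` is hypothetical and used only for illustration. It also returns the box center and long-side length, which the later transformation uses as the face center point and face size.

```python
import numpy as np

def face_box_from_keypoints(face_points):
    """Axis-aligned box around face key points (assumed min/max rule).

    Returns ((x_min, y_min, x_max, y_max), (center_x, center_y), long_side).
    """
    pts = np.asarray(face_points, dtype=float)
    x_min, y_min = (float(v) for v in pts.min(axis=0))
    x_max, y_max = (float(v) for v in pts.max(axis=0))
    center = ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)
    long_side = max(x_max - x_min, y_max - y_min)  # face size
    return (x_min, y_min, x_max, y_max), center, long_side

# Three illustrative face key points given as (x, y) pixel coordinates.
box, center, a = face_box_from_keypoints([(40, 30), (60, 30), (50, 58)])
```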

Region of interest transformation

In step 4, performing region-of-interest transformation on each color image and the labeled data according to the face center point and the face size to obtain transformed images and transformed key point coordinates, including:

the center point $(x_{face,m}, y_{face,m})$ of the face bounding box is taken as the face center point, and the length $a_m$ of the long side of the bounding box is taken as the face size; according to the face center point and the face size, the region-of-interest transformation is applied to the image $I^{src}_m$ and its corresponding human body key points, and the transformed data are expressed as:

$$\{[I_0,(p_{0,0},p_{0,1},\ldots,p_{0,N-1})],[I_1,(p_{1,0},p_{1,1},\ldots,p_{1,N-1})],\ldots,[I_{M-1},(p_{M-1,0},p_{M-1,1},\ldots,p_{M-1,N-1})]\}$$

where $p_{m,n}=(x_{m,n},y_{m,n})$ is the $n$-th transformed human body key point coordinate of the $m$-th transformed image $I_m$; the side length of the transformed image is $L$, a positive integer; in a preferred example, $L \geq 64$; in the present example, $L$ is 64 or 128;

each pixel value of the transformed image $I_m$ is sampled from the image $I^{src}_m$ (i.e. the image before transformation); $x_{indices,m}$ is the list of abscissas of the sampling positions in $I^{src}_m$, and $y_{indices,m}$ is the list of ordinates of the sampling positions in $I^{src}_m$, obtained as follows:

$$x_{indices,m}=(x_{face,m}+warp_{RoI,m}(0),\ x_{face,m}+warp_{RoI,m}(1),\ \ldots,\ x_{face,m}+warp_{RoI,m}(L-1))$$

$$y_{indices,m}=(y_{face,m}+warp_{RoI,m}(0),\ y_{face,m}+warp_{RoI,m}(1),\ \ldots,\ y_{face,m}+warp_{RoI,m}(L-1))$$

$$warp_{RoI,m}(t)=\frac{a_m}{2}\cdot\operatorname{arctanh}\left(\frac{2t}{L}-0.9\right)$$

where $warp_{RoI,m}(t)$ is the region-of-interest transformation function of the $m$-th image and $t$ is the function input, $t=0,1,2,\ldots,L-1$.

The region-of-interest transformation of the image uses the remap method of the OpenCV image processing library, with the parameter map1 set to $x_{indices,m}$ and the parameter map2 set to $y_{indices,m}$.
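The sampling lists and the remap call described above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. The helper names (`warp_roi`, `sampling_indices`, `roi_transform`) are hypothetical. Two choices are assumptions of the sketch: the argument of arctanh is clipped into the open interval (-1, 1), because with the formula as printed the argument 2t/L - 0.9 reaches 1 for t near 0.95·L, where arctanh is undefined; and the one-dimensional index lists are expanded into two-dimensional float32 maps, which is the form `cv2.remap` accepts. When OpenCV is not installed, a NumPy nearest-neighbor lookup stands in for `cv2.remap`.

```python
import numpy as np

try:  # OpenCV is optional so that the sketch also runs without it
    import cv2
except ImportError:
    cv2 = None

def warp_roi(t, a, L, eps=1e-3):
    """warp(t) = a/2 * arctanh(2t/L - 0.9); argument clipped (assumption)."""
    arg = 2.0 * np.asarray(t, dtype=float) / L - 0.9
    arg = np.clip(arg, -1.0 + eps, 1.0 - eps)  # keep arctanh finite
    return a / 2.0 * np.arctanh(arg)

def sampling_indices(x_face, y_face, a, L):
    """Lists of source abscissas and ordinates for t = 0..L-1."""
    offsets = warp_roi(np.arange(L), a, L)
    return x_face + offsets, y_face + offsets

def roi_transform(image, x_face, y_face, a, L):
    """Resample `image` into an L x L region-of-interest-transformed image."""
    x_idx, y_idx = sampling_indices(x_face, y_face, a, L)
    # Output pixel (row i, column j) samples the source at (x_idx[j], y_idx[i]).
    map_x = np.tile(x_idx.astype(np.float32), (L, 1))
    map_y = np.tile(y_idx.astype(np.float32)[:, None], (1, L))
    if cv2 is not None:
        return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)
    # Fallback: nearest-neighbor sampling, clamped to the image border.
    h, w = image.shape[:2]
    xs = np.clip(np.rint(map_x).astype(int), 0, w - 1)
    ys = np.clip(np.rint(map_y).astype(int), 0, h - 1)
    return image[ys, xs]

# Illustrative values: face centered at (100, 80), long side 40, L = 64.
x_idx, y_idx = sampling_indices(100, 80, 40, 64)
warped = roi_transform(np.zeros((200, 200, 3), np.uint8), 100, 80, 40, 64)
```

Because arctanh grows slowly near zero and quickly near its limits, consecutive sampling positions are dense around the face center and sparse far from it, which is what enlarges the face in the transformed image.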

The human body key point $(x_{m,n}, y_{m,n})$ after the region-of-interest transformation is calculated by a traversal method, as follows:

traverse $t$ over $t=0,1,2,\ldots,L-1$ to find the value satisfying the abscissa condition [formula not reproduced in the source text]; that value of $t$ is the abscissa $x_{m,n}$ of the transformed key point; and

traverse $t$ in the same way to find the value satisfying the ordinate condition [formula not reproduced in the source text]; that value of $t$ is the ordinate $y_{m,n}$ of the transformed key point.
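The matching condition used in the traversal is not reproduced in this text. One plausible reading, consistent with the inverse transformation given later (source coordinate = face center + warp(t)), is to pick the t whose sampling position lies closest to the key point's coordinate before transformation. The sketch below implements that reading as an assumption; `forward_keypoint` is a hypothetical name, and the clipping inside `warp_roi` is likewise an assumption to keep arctanh finite.

```python
import numpy as np

def warp_roi(t, a, L, eps=1e-3):
    """warp(t) = a/2 * arctanh(2t/L - 0.9); argument clipped (assumption)."""
    arg = np.clip(2.0 * np.asarray(t, dtype=float) / L - 0.9, -1.0 + eps, 1.0 - eps)
    return a / 2.0 * np.arctanh(arg)

def forward_keypoint(x_src, y_src, x_face, y_face, a, L):
    """Map a source key point to transformed coordinates by scanning all t.

    Assumed criterion: choose the t whose sampling position is nearest to
    the source coordinate, separately for the abscissa and the ordinate.
    """
    offsets = warp_roi(np.arange(L), a, L)  # evaluates every t = 0..L-1
    x_t = int(np.argmin(np.abs(x_face + offsets - x_src)))
    y_t = int(np.argmin(np.abs(y_face + offsets - y_src)))
    return x_t, y_t

# A key point located exactly at the face center (100, 80), with a = 40, L = 64.
center_t = forward_keypoint(100, 80, 100, 80, 40, 64)
# A key point 10 pixels to the left of the face center.
left_t = forward_keypoint(90, 80, 100, 80, 40, 64)
```

With this formula the face center maps near t = 0.45·L rather than the middle of the transformed image, since the warp offset is zero where 2t/L equals 0.9.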

Human body key point training detection model

In step 5, the human body key point detection model for detecting human body key points is implemented as a CNN. As shown by the model structure in FIG. 2, it consists of convolutional layers, max pooling layers and fully connected layers.

Each convolutional layer has a 3 × 3 kernel, a stride of 1 and Same Padding (zero padding); the number of convolution kernels is given in parentheses for each convolutional layer in FIG. 2.

Each max pooling layer has a 2 × 2 pooling window and a stride of 2.

The first fully connected layer has 1024 neurons and the second fully connected layer has 2N neurons.

Each convolutional layer and the first fully connected layer are followed by a ReLU activation function.

During training, the loss function of the m-th data sample is expressed as:

[formula not reproduced in the source text]

where $(x_{m,n}, y_{m,n})$ is the $n$-th human body key point of the $m$-th training sample in the data set after the region-of-interest transformation, and $(x'_{m,n}, y'_{m,n})$ is the model's prediction of the $n$-th human body key point for the $m$-th region-of-interest-transformed training image.

In this way, a detection model for detecting key points in region-of-interest-transformed human body images is trained from the transformed images and the transformed human body key point coordinates.

Human body key point detection application

As an example shown in fig. 3, the human key point detection process for an input image to be detected containing a human body includes:

firstly, detecting a human face boundary frame by using a human face detector, then carrying out region-of-interest transformation according to the method in the step 4, and improving the proportion of the human face in the image to obtain a transformed image;

then, detecting the human key points in the transformed image by using a human key point detection model obtained by training; and

and finally, carrying out region-of-interest inverse transformation on the human body key points in the transformed image to obtain the human body key points of the image to be detected before transformation.

The face detector may be, for example, the Dlib toolkit, which is used to process the human body image and determine the bounding box of the face. It should be understood that, in implementations of the invention, face detection is not limited to Dlib; other pre-trained face detection models may also be used.

According to the center point $(x_{test,face}, y_{test,face})$ of the face bounding box and the length $a_{test}$ of its long side, the region-of-interest transformation is applied to the image to be detected using the remap method of the OpenCV image processing library, with the parameter map1 set to $x_{test,indices}$ and the parameter map2 set to $y_{test,indices}$, calculated as follows:

$$x_{test,indices}=(x_{test,face}+warp_{RoI,test}(0),\ x_{test,face}+warp_{RoI,test}(1),\ \ldots,\ x_{test,face}+warp_{RoI,test}(L-1))$$

$$y_{test,indices}=(y_{test,face}+warp_{RoI,test}(0),\ y_{test,face}+warp_{RoI,test}(1),\ \ldots,\ y_{test,face}+warp_{RoI,test}(L-1))$$

$$warp_{RoI,test}(t)=\frac{a_{test}}{2}\cdot\operatorname{arctanh}\left(\frac{2t}{L}-0.9\right)$$

where $warp_{RoI,test}(t)$ is the region-of-interest transformation function of the image to be detected and $t$ is the function input, $t=0,1,2,\ldots,L-1$.

The key points $(x_{test,n}, y_{test,n})$ in the transformed image are detected using the human body key point detection model trained in step 5.

Then, the region-of-interest inverse transformation is applied to the key points in the transformed image to obtain the human body key points $(x_{src,test,n}, y_{src,test,n})$ of the image to be detected before transformation:

$$x_{src,test,n}=x_{test,face}+warp_{RoI,test}(x_{test,n})$$

$$y_{src,test,n}=y_{test,face}+warp_{RoI,test}(y_{test,n})$$

This yields the human body key point data of the image to be detected before transformation.
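The inverse transformation above amounts to evaluating the warp function at each predicted coordinate and adding the face center. A minimal sketch follows; `inverse_keypoints` is a hypothetical name, and clipping the arctanh argument is an assumption added so that predictions near the image border stay finite.

```python
import numpy as np

def warp_roi(t, a, L, eps=1e-3):
    """warp(t) = a/2 * arctanh(2t/L - 0.9); argument clipped (assumption)."""
    arg = np.clip(2.0 * np.asarray(t, dtype=float) / L - 0.9, -1.0 + eps, 1.0 - eps)
    return a / 2.0 * np.arctanh(arg)

def inverse_keypoints(points_t, x_face, y_face, a, L):
    """Map predicted key points from the transformed image back to the source.

    points_t: sequence of (x_test_n, y_test_n) in transformed coordinates.
    Returns an array of (x_src, y_src) rows in source-image coordinates.
    """
    pts = np.asarray(points_t, dtype=float)
    x_src = x_face + warp_roi(pts[:, 0], a, L)
    y_src = y_face + warp_roi(pts[:, 1], a, L)
    return np.stack([x_src, y_src], axis=1)

# Face centered at (100, 80) with long side 40 and L = 64 (illustrative values).
restored = inverse_keypoints([(28.8, 28.8), (0.0, 0.0)], 100, 80, 40, 64)
```

Predicted coordinates need not be integers here, since the warp function is evaluated directly rather than looked up in the sampling lists.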

It should be understood that in step 4 and step 6, the side length L of the image after the region of interest transformation has the same value.

Test procedure

Following steps 1 and 2, 12000 groups of labeled human body key point data were prepared, comprising 10000 groups of training data and 2000 groups of test data. The data cover a variety of people, clothing, poses, occlusions and background environments. The region-of-interest transformation was applied to the 10000 groups of training data, a detection model was trained on them, and the trained human body key point model was verified on the test data. For comparison, a human body key point detection model trained directly on the original data was also evaluated on the test data.

The normalized mean error is used as the evaluation index, that is, the Euclidean distance between the predicted coordinate and the labeled coordinate divided by the diagonal length of the human body bounding box. The comparison results are shown in Table 1.

TABLE 1 Comparison of test results of the existing method and the method of the invention

Method	Normalized mean error
Existing method	6.32%
Method of the invention	4.94%

The comparison shows that the model training method effectively improves model precision: relative to the existing method, the test error is reduced by 1.38 percentage points (from 6.32% to 4.94%).
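The evaluation index described above can be sketched for a single image as follows. The text defines the per-point quantity (Euclidean distance divided by the body bounding-box diagonal); averaging over the N key points of an image is an assumption of this sketch, and `normalized_mean_error` is a hypothetical name. The numbers in the example are made up for illustration and are unrelated to Table 1.

```python
import numpy as np

def normalized_mean_error(pred, gt, body_box):
    """Mean key point distance divided by the body-box diagonal.

    pred, gt: arrays of shape (N, 2) with (x, y) coordinates.
    body_box: (x_min, y_min, x_max, y_max) of the human body bounding box.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    x_min, y_min, x_max, y_max = body_box
    diagonal = np.hypot(x_max - x_min, y_max - y_min)
    distances = np.linalg.norm(pred - gt, axis=1)  # one distance per key point
    return float(distances.mean() / diagonal)

# Two key points, one of them off by a 3-4-5 triangle; body box diagonal is 50.
nme = normalized_mean_error([(3, 4), (0, 0)], [(0, 0), (0, 0)], (0, 0, 30, 40))
```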

Human key point detection device based on region of interest transformation

According to the disclosure of the present invention, there is also provided a human body key point detection device based on region of interest transformation, comprising:

a module for acquiring M color images containing a human body, M being a natural number greater than 1000;

It should be understood that the functions and implementation of the modules of the human body key point detection apparatus based on region of interest transformation of the present embodiment can be implemented based on the specific operations of the aforementioned human body key point detection method based on region of interest transformation.

System for human body key point detection based on region of interest transformation

According to the disclosure of the present invention, there is also provided a system for human keypoint detection based on region of interest transformation, comprising:

one or more processors;

a memory storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising a flow of a region of interest transformation based human keypoint detection method as previously described, in particular the procedures of the detection method as implemented in connection with fig. 1, 3.

Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.

Claims

1. A human body key point detection method based on region of interest transformation is characterized by comprising the following steps:

step 6, for the input image to be detected containing the human body, detecting a human face bounding box by using a human face detector, and then carrying out region-of-interest transformation on the image to be detected according to the method in the step 4, so as to improve the proportion of the human face in the image and obtain a transformed image;

step 8, performing region-of-interest inverse transformation on the human body key points in the transformed image to obtain human body key points of the image to be detected before transformation;

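in step 2, N human body key points are labeled on each color image, and the obtained labeling data are expressed as:

$$\{[I^{src}_0,(p^{src}_{0,0},p^{src}_{0,1},\ldots,p^{src}_{0,N-1})],[I^{src}_1,(p^{src}_{1,0},p^{src}_{1,1},\ldots,p^{src}_{1,N-1})],\ldots,[I^{src}_{M-1},(p^{src}_{M-1,0},p^{src}_{M-1,1},\ldots,p^{src}_{M-1,N-1})]\}$$

wherein $p^{src}_{m,n}=(x^{src}_{m,n},y^{src}_{m,n})$ is the $n$-th key point coordinate of the $m$-th image $I^{src}_m$, $m=0,1,2,\ldots,M-1$, $n=0,1,2,\ldots,N-1$;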

in step 4, performing region-of-interest transformation on each color image and label data according to the face center point and the face size to obtain transformed images and transformed key point coordinates, including:

taking the center point $(x_{face,m}, y_{face,m})$ of the face bounding box as the face center point and the length $a_m$ of the long side of the bounding box as the face size, and performing, according to the face center point and the face size, region-of-interest transformation on the image $I^{src}_m$ and its corresponding human body key points to obtain transformed data expressed as:

$$\{[I_0,(p_{0,0},p_{0,1},\ldots,p_{0,N-1})],[I_1,(p_{1,0},p_{1,1},\ldots,p_{1,N-1})],\ldots,[I_{M-1},(p_{M-1,0},p_{M-1,1},\ldots,p_{M-1,N-1})]\}$$

wherein $p_{m,n}=(x_{m,n},y_{m,n})$ is the $n$-th transformed human body key point coordinate of the $m$-th transformed image $I_m$, the side length of the transformed image is $L$, and $L$ is a positive integer;

each pixel value of the transformed image $I_m$ is sampled from the image $I^{src}_m$ before transformation; $x_{indices,m}$ is the list of abscissas of the sampling positions in the image $I^{src}_m$, and $y_{indices,m}$ is the list of ordinates of the sampling positions in the image $I^{src}_m$, obtained in the following specific manner:

$$x_{indices,m}=(x_{face,m}+warp_{RoI,m}(0),\ x_{face,m}+warp_{RoI,m}(1),\ \ldots,\ x_{face,m}+warp_{RoI,m}(L-1))$$

$$y_{indices,m}=(y_{face,m}+warp_{RoI,m}(0),\ y_{face,m}+warp_{RoI,m}(1),\ \ldots,\ y_{face,m}+warp_{RoI,m}(L-1))$$

$$warp_{RoI,m}(t)=\frac{a_m}{2}\cdot\operatorname{arctanh}\left(\frac{2t}{L}-0.9\right)$$

wherein $warp_{RoI,m}(t)$ is the region-of-interest transformation function of the $m$-th image, $t$ is the function input, and $t=0,1,2,\ldots,L-1$;

the region-of-interest transformation of the image uses the remap method of the OpenCV image processing library, with the parameter map1 set to $x_{indices,m}$ and the parameter map2 set to $y_{indices,m}$;

the human body key points $(x_{m,n}, y_{m,n})$ after the region-of-interest transformation are calculated by a traversal method.

2. The method for detecting human body key points based on region-of-interest transformation according to claim 1, wherein in step 4, the human body key points $(x_{m,n}, y_{m,n})$ after the region-of-interest transformation are calculated by a traversal method, and the traversal calculation process comprises:

traversing $t$ over $t=0,1,2,\ldots,L-1$ to find the value satisfying the abscissa condition [formula not reproduced in the source text]; and

traversing $t$ to find the value satisfying the ordinate condition [formula not reproduced in the source text].

3. The method for detecting human body key points based on region-of-interest transformation according to claim 1, wherein in step 5, the human body key point detection model for detecting human body key points is implemented as a CNN, and in the training process, the loss function of the m-th data sample is expressed as:

[formula not reproduced in the source text]

4. The method for detecting key points of a human body based on region-of-interest transformation according to claim 1, wherein in the step 8, the inverse region-of-interest transformation is performed on the key points of the human body in the transformed image to obtain the key points of the human body in the image to be detected before transformation, and the method comprises the following steps:

for the human body key points $(x_{test,n}, y_{test,n})$ output by the human body key point detection model of step 5, obtaining the human body key points $(x_{src,test,n}, y_{src,test,n})$ of the image to be detected before transformation by using the following region-of-interest inverse transformation formulas:

$$x_{src,test,n}=x_{test,face}+warp_{RoI,test}(x_{test,n})$$

$$y_{src,test,n}=y_{test,face}+warp_{RoI,test}(y_{test,n})$$

wherein $(x_{test,face}, y_{test,face})$ represents the center point of the face bounding box, and $a_{test}$ represents the length of the long side of the face bounding box; $x_{test,indices}$ and $y_{test,indices}$ respectively represent the sampling positions in the image to be detected before transformation when the region-of-interest transformation is applied to it, $x_{test,indices}$ being the list of sampling position abscissas and $y_{test,indices}$ being the list of sampling position ordinates;

wherein the region-of-interest transformation of the image to be detected before transformation uses the remap method of the OpenCV image processing library, with the parameter map1 set to $x_{test,indices}$ and the parameter map2 set to $y_{test,indices}$;

$$warp_{RoI,test}(t)=\frac{a_{test}}{2}\cdot\operatorname{arctanh}\left(\frac{2t}{L}-0.9\right)$$

wherein $warp_{RoI,test}(t)$ is the region-of-interest transformation function of the image to be detected, $t$ is the function input, and $t=0,1,2,\ldots,L-1$.

5. The method for detecting human body key points based on region of interest transformation according to claim 1, wherein in the step 4 and the step 6, the side lengths L of the images after the region of interest transformation have the same value.

6. The method for detecting human key points based on region of interest transformation according to claim 5, wherein the side length L of the image after the region of interest transformation is 64 or 128.

7. A system for human keypoint detection based on region of interest transformations, comprising:

one or more processors;

a memory storing instructions that are operable, when executed by the one or more processors, to cause the one or more processors to perform operations comprising a flow of a region of interest transform based human keypoint detection method according to any of claims 1-6.