JP2022510963A

JP2022510963A - Human body orientation detection method, device, electronic device and computer storage medium

Info

Publication number: JP2022510963A
Application number: JP2021531125A
Authority: JP
Inventors: 李逍; 許經緯; 程光亮
Original assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Current assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Priority date: 2019-11-20
Filing date: 2020-09-08
Publication date: 2022-01-28
Also published as: KR20210087494A; CN112825145A; CN112825145B; WO2021098346A1

Abstract

The embodiments of the present application disclose a human body orientation detection method and apparatus, an electronic device, and a computer storage medium. The method includes: performing feature extraction on an image to be processed to obtain features of the image; determining human body key points and a preliminary human body orientation based on those features; and determining a final human body orientation based on the determined human body key points and preliminary human body orientation. Because the final human body orientation is obtained by jointly considering the human body key points and the preliminary human body orientation, the key points can be used to improve the accuracy and usability of the final orientation.

Description

(Cross-reference to related applications)
The present application claims priority to Chinese patent application No. 201911143057.6, filed on November 20, 2019, the entire contents of which are incorporated herein by reference.

The present application relates to computer vision processing technology, and in particular to a human body orientation detection method and apparatus, an electronic device, and a computer storage medium.

With advances in computer vision processing technology, pedestrian orientation detection has gradually become an important research direction in computer vision. In the related art, pedestrian orientation is detected by processing an image captured by a camera to predict the orientation of each person's body and/or face in the image; however, the accuracy and usability of pedestrian orientations detected in this way cannot be guaranteed.

The embodiments of the present application are intended to provide a technical solution for human body orientation detection.

The embodiments of the present application provide a human body orientation detection method. The method includes:
performing feature extraction on an image to be processed to obtain features of the image to be processed;
determining human body key points and a preliminary human body orientation based on the features of the image to be processed; and
determining a final human body orientation based on the determined human body key points and preliminary human body orientation.

In some embodiments, determining the final human body orientation based on the determined human body key points and preliminary human body orientation includes:
determining the preliminary human body orientation as the final human body orientation in response to the human body orientation indicated by the determined human body key points matching the preliminary human body orientation. In this way, when the two agree, the preliminary human body orientation is taken as the final human body orientation, so the final orientation is obtained accurately.

In some embodiments, determining the final human body orientation based on the determined human body key points and preliminary human body orientation includes:
determining the human body orientation indicated by the determined human body key points as the final human body orientation in response to that orientation not matching the preliminary human body orientation. In this way, a mismatch is taken as an indication that the preliminary human body orientation has low accuracy, and using the orientation indicated by the key points as the final orientation improves the accuracy of the final human body orientation.
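The consistency check in the two embodiments above can be sketched as follows. The patent does not say how an orientation is read off the key points, so `orientation_from_keypoints`, the key-point names, the three-label vocabulary ("front", "back", "side"), and the `side_ratio` threshold are all hypothetical. The sketch also assumes, which the source does not state, that the preliminary orientation is kept when the key points are too incomplete to indicate any orientation.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image pixel coordinates


def orientation_from_keypoints(kps: Dict[str, Point], side_ratio: float = 0.3) -> Optional[str]:
    """Hypothetical rule: infer a coarse orientation from shoulder and hip key points.

    Returns "front", "back", "side", or None when the needed key points are missing.
    """
    ls, rs = kps.get("left_shoulder"), kps.get("right_shoulder")
    hip = kps.get("left_hip") or kps.get("right_hip")
    if ls is None or rs is None or hip is None:
        return None
    shoulder_width = abs(ls[0] - rs[0])
    torso_height = abs(hip[1] - (ls[1] + rs[1]) / 2) or 1.0  # avoid division by zero
    if shoulder_width / torso_height < side_ratio:
        return "side"  # shoulders nearly overlap horizontally: person seen side-on
    # When a person faces the camera, their left shoulder appears on the image's right.
    return "front" if ls[0] > rs[0] else "back"


def fuse_orientation(preliminary: str, kps: Dict[str, Point]) -> str:
    """Keep the preliminary orientation if the key points agree; otherwise trust the key points."""
    kp_orientation = orientation_from_keypoints(kps)
    if kp_orientation is None or kp_orientation == preliminary:
        return preliminary
    return kp_orientation
```

For example, if the network's preliminary estimate is "back" but the shoulders are clearly ordered as for a person facing the camera, `fuse_orientation` returns "front".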

In some embodiments, the steps of performing feature extraction on the image to be processed to obtain its features, and of determining the human body key points and the preliminary human body orientation based on those features, are performed by a neural network. The neural network is trained with first sample images and second sample images: each first sample image contains a first human body image and annotated human body key points, and each second sample image contains a second human body image and an annotated human body orientation.

In some embodiments, training the neural network with the first sample image and the second sample image includes:
performing feature extraction on the first sample image and the second sample image to obtain their features; performing pedestrian key point detection based on the features of the first sample image to obtain the human body key points of the first sample image; and performing orientation detection based on the features of the second sample image to obtain the human body orientation of the second sample image; and
adjusting network parameter values of the neural network based on the detected human body key points, the annotated human body key points, the detected human body orientation, and the annotated human body orientation. Adjusting the network parameter values in this way improves the performance of the trained neural network.

In some embodiments, performing feature extraction on the first sample image and the second sample image to obtain their features includes:
stitching the first sample image and the second sample image, and performing feature extraction on the stitched image data to obtain features of the stitched image data; and
splitting the features of the stitched image data into the features of the first sample image and the features of the second sample image according to the way in which the two images were stitched. Splitting the features of the stitched data makes it straightforward to perform human body key point detection on the features of the first sample image and human body orientation detection on the features of the second sample image, with low implementation complexity.

In some embodiments, stitching the first sample image and the second sample image includes stitching them along the batch dimension.
Before the first sample image and the second sample image are stitched, the method further includes:
adjusting the first sample image and the second sample image to have the same channel, height, and width dimensions, so that the image data can be stitched along the batch dimension.
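A minimal NumPy sketch of stitching along the batch dimension and of the later split. Nearest-neighbour resizing, repeating a single grey channel to fill three channels, the 3 x 64 x 32 target shape, and all helper names are assumptions for illustration; in the usage lines, a mean over the spatial axes stands in for the real shared backbone.

```python
import numpy as np


def resize_nearest(img: np.ndarray, height: int, width: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) array to (C, height, width)."""
    _, h, w = img.shape
    rows = np.arange(height) * h // height
    cols = np.arange(width) * w // width
    return img[:, rows[:, None], cols[None, :]]


def to_common_shape(batch: np.ndarray, channels: int, height: int, width: int) -> np.ndarray:
    """Bring an (N, C, H, W) batch to (N, channels, height, width)."""
    out = []
    for img in batch:
        if img.shape[0] == 1 and channels > 1:
            img = np.repeat(img, channels, axis=0)  # grey -> multi-channel (assumption)
        if img.shape[0] != channels:
            raise ValueError(f"cannot map {img.shape[0]} channels to {channels}")
        out.append(resize_nearest(img, height, width))
    return np.stack(out)


def stitch(first: np.ndarray, second: np.ndarray, shape=(3, 64, 32)):
    """Concatenate key-point samples and orientation samples along the batch axis."""
    c, h, w = shape
    a = to_common_shape(first, c, h, w)
    b = to_common_shape(second, c, h, w)
    return np.concatenate([a, b], axis=0), len(a)


def split_features(features: np.ndarray, n_first: int):
    """Undo the stitching: the first n_first rows belong to the first sample images."""
    return features[:n_first], features[n_first:]


# Usage: 2 key-point samples (RGB) and 5 orientation samples (grey) of different sizes.
first = np.zeros((2, 3, 128, 64))
second = np.ones((5, 1, 96, 48))
batch, n_first = stitch(first, second)
features = batch.mean(axis=(2, 3))  # stand-in for the shared backbone
kp_feats, ori_feats = split_features(features, n_first)
```

Because both sets share one forward pass, the split only needs to remember how many rows came from the first set.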

In some embodiments, adjusting the network parameter values of the neural network based on the detected human body key points, the annotated human body key points, the detected human body orientation, and the annotated human body orientation includes:
obtaining a first loss value of the neural network based on the detected human body key points and the annotated human body key points, the first loss value representing the difference between the detected and the annotated human body key points;
obtaining a second loss value of the neural network based on the detected human body orientation and the annotated human body orientation, the second loss value representing the difference between the detected and the annotated human body orientation; and
adjusting the network parameter values of the neural network based on the first loss value and the second loss value. Adjusting the network parameter values using these loss values can improve the robustness of the neural network.
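One way to compute and combine the two loss values, as a sketch. The source does not name the loss functions: the mean squared error on key-point heatmaps, the softmax cross-entropy on orientation classes, and the `orientation_weight` factor are assumptions. In the stitched-batch scheme, each loss would see only its own part of the batch, so the key-point loss uses the first sample images and the orientation loss uses the second.

```python
import numpy as np


def keypoint_loss(pred_heatmaps: np.ndarray, target_heatmaps: np.ndarray) -> float:
    """First loss value (assumed form): MSE between predicted and annotated heatmaps."""
    return float(np.mean((pred_heatmaps - target_heatmaps) ** 2))


def orientation_loss(logits: np.ndarray, labels: np.ndarray) -> float:
    """Second loss value (assumed form): softmax cross-entropy over orientation classes.

    logits: (N, K) class scores; labels: (N,) annotated class indices.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def total_loss(pred_hm, gt_hm, logits, labels, orientation_weight: float = 1.0) -> float:
    """Weighted sum that a gradient-based optimizer would minimize."""
    return keypoint_loss(pred_hm, gt_hm) + orientation_weight * orientation_loss(logits, labels)
```

A weighted sum is the simplest way to let both heads update the shared backbone; the weight balances the two tasks and would need tuning.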

The embodiments of the present application further provide a human body orientation detection apparatus. The apparatus includes an extraction module and a processing module, where:
the extraction module is configured to perform feature extraction on an image to be processed to obtain features of the image to be processed; and
the processing module is configured to determine human body key points and a preliminary human body orientation based on the features of the image to be processed, and to determine a final human body orientation based on the determined human body key points and preliminary human body orientation.

In some embodiments, the processing module is configured to determine the final human body orientation based on the determined human body key points and preliminary human body orientation by determining the preliminary human body orientation as the final human body orientation in response to the human body orientation indicated by the determined human body key points matching the preliminary human body orientation.

In some embodiments, the processing module is configured to determine the final human body orientation based on the determined human body key points and preliminary human body orientation by determining the human body orientation indicated by the determined human body key points as the final human body orientation in response to that orientation not matching the preliminary human body orientation.

In some embodiments, the apparatus further includes a training module configured to train the neural network with first sample images and second sample images by:
performing feature extraction on the first sample image and the second sample image to obtain their features; performing pedestrian key point detection based on the features of the first sample image to obtain the human body key points of the first sample image; and performing orientation detection based on the features of the second sample image to obtain the human body orientation of the second sample image; and
adjusting the network parameter values of the neural network based on the detected human body key points, the annotated human body key points, the detected human body orientation, and the annotated human body orientation.

In some embodiments, the training module is configured to perform feature extraction on the first sample image and the second sample image to obtain their features by:
stitching the first sample image and the second sample image, and performing feature extraction on the stitched image data to obtain features of the stitched image data; and
splitting the features of the stitched image data into the features of the first sample image and the features of the second sample image according to the way in which the two images were stitched.

In some embodiments, the training module is configured to stitch the first sample image and the second sample image along the batch dimension, and is further configured to adjust, before the stitching, the first sample image and the second sample image to have the same channel, height, and width dimensions.

In some embodiments, the training module is configured to adjust the network parameter values of the neural network based on the detected human body key points, the annotated human body key points, the detected human body orientation, and the annotated human body orientation by:
obtaining a first loss value of the neural network based on the detected human body key points and the annotated human body key points, the first loss value representing the difference between the detected and the annotated human body key points;
obtaining a second loss value of the neural network based on the detected human body orientation and the annotated human body orientation, the second loss value representing the difference between the detected and the annotated human body orientation; and
adjusting the network parameter values of the neural network based on the first loss value and the second loss value.

The embodiments of the present application further provide an electronic device, which includes a processor and a memory configured to store a computer program executable by the processor, where
the processor is configured to run the computer program to perform any one of the human body orientation detection methods described above.

The embodiments of the present application further provide a computer storage medium storing a computer program which, when executed by a processor, implements any one of the human body orientation detection methods described above.

In the human body orientation detection method and apparatus, electronic device, and computer storage medium provided in the embodiments of the present application, feature extraction is performed on an image to be processed to obtain its features; human body key points and a preliminary human body orientation are determined based on those features; and a final human body orientation is determined based on the determined key points and preliminary orientation. Because the final human body orientation is obtained by jointly considering the human body key points and the preliminary human body orientation, the key points can be used to improve its accuracy and usability.

It should be understood that the general description above and the detailed description below are exemplary and explanatory only and do not limit the present application.

FIG. 1 is a flowchart of a human body orientation detection method according to an embodiment of the present application.
FIG. 2 is a schematic diagram of the architecture of a trained neural network according to an embodiment of the present application.
FIG. 3 is a schematic diagram of human body key points according to an embodiment of the present application.
FIG. 4 is a schematic diagram of human body orientations according to an embodiment of the present application.
FIG. 5 is a flowchart of a neural network training method according to an embodiment of the present application.
FIG. 6 is a schematic diagram of the architecture used for neural network training according to an embodiment of the present application.
FIG. 7 is a schematic diagram of the stitching of image data according to an embodiment of the present application.
FIG. 8 is a schematic diagram of image feature splitting according to an embodiment of the present application.
FIG. 9 is a schematic structural diagram of a human body orientation detection apparatus according to an embodiment of the present application.
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the specification, serve to explain the technical solutions of the present application.

The embodiments of the present application are described in further detail below with reference to the drawings and examples. It should be understood that the examples provided here serve only to explain, not to limit, the embodiments of the present application. Moreover, the examples below illustrate some, not all, of the embodiments of the present application. Unless they conflict, the technical solutions described in the embodiments may be combined in any manner.

In the embodiments of the present application, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a method or apparatus including a series of elements includes not only the elements expressly listed but also other elements not expressly listed, or elements inherent to such a method or apparatus. In the absence of further limitations, an element introduced by the phrase "including a ..." does not exclude the presence of additional related elements in the method or apparatus that includes it (for example, a step in a method or a unit in an apparatus, where a unit may be part of a circuit, part of a processor, part of a program or software, and so on).

For example, the human body orientation detection method provided in the embodiments of the present application includes a series of steps but is not limited to the steps described. Similarly, the human body orientation detection apparatus provided in the embodiments includes a series of modules but is not limited to the modules expressly described, and may further include modules needed to acquire relevant information or to perform processing based on that information.

As used herein, the term "and/or" describes an association between related objects and indicates that three relationships may exist. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. Also, as used herein, the term "at least one" means any one of a plurality of items, or any combination of at least two of them. For example, "including at least one of A, B, and C" means including any one or more elements selected from the set consisting of A, B, and C.

The embodiments of the present application can be applied to a computer system consisting of terminals and/or servers, and can operate with numerous other general-purpose or special-purpose computing system environments or configurations. The terminal may be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, programmable consumer electronics, a networked personal computer, a small computer system, and the like. The server may be a server computer system, a small computer system, a large computer system, a distributed cloud computing environment including any of the above systems, and the like.

Electronic devices such as terminals and servers may be described in the general context of computer-system-executable instructions (for example, program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network and program modules may be located on local or remote computing system storage media, including storage devices.

Based on the above, some embodiments of the present application provide a technical solution for human body orientation detection. Application scenarios include, but are not limited to, autonomous driving and robot navigation.

FIG. 1 is a flowchart of a human body orientation detection method according to an embodiment of the present application. As shown in FIG. 1, the process may include the following steps.

Step 101: perform feature extraction on the image to be processed to obtain features of the image to be processed.

In practice, the image to be processed may be acquired from a local storage area or from a network. Its format may be Joint Photographic Experts Group (JPEG), bitmap (BMP), Portable Network Graphics (PNG), or another format. These formats and sources are given only as examples; the embodiments of the present application do not limit the format or source of the image to be processed.
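As a sketch of how a loader might accept the sources and formats listed above, the functions below read raw bytes from a URL or a local path and identify JPEG, BMP, and PNG data by their standard file signatures (magic bytes). The helper names are hypothetical, and decoding the pixels is left to an image library.

```python
import urllib.request
from pathlib import Path

# Standard leading bytes (file signatures) of the formats named in the text.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",          # SOI marker followed by another marker
    b"BM": "BMP",                     # bitmap file header type field
    b"\x89PNG\r\n\x1a\n": "PNG",      # 8-byte PNG signature
}


def detect_image_format(data: bytes) -> str:
    """Return "JPEG", "BMP", "PNG", or "unknown" based on the leading bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


def load_image_bytes(source: str) -> bytes:
    """Read raw image bytes from a URL (network) or a local path (local storage)."""
    if source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as response:
            return response.read()
    return Path(source).read_bytes()
```

Checking the signature rather than the file extension avoids misreading files whose names do not match their content.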

In practice, the image to be processed may be input to a feature extraction network, which performs feature extraction on it to obtain its features. In the embodiments of the present application, the feature extraction network is a neural network for extracting image features and may include structures such as convolutional layers. The type of feature extraction network is not limited; for example, it may be a deep residual network (ResNet) or another neural network for image feature extraction.
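To illustrate what a deep residual network is built from, here is a single-channel residual block in NumPy: two 3 x 3 convolutions with an identity shortcut, y = ReLU(x + F(x)). This is a simplification (real ResNet blocks are multi-channel and use batch normalization), and the kernels are arbitrary illustrative values, not weights from the patent.

```python
import numpy as np


def conv2d_same(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size cross-correlation of an (H, W) map with an odd-sized kernel, zero padding."""
    kh, kw = kernel.shape
    padded = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Shifted window of the padded input, weighted by one kernel tap.
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """y = ReLU(x + conv(ReLU(conv(x)))): the identity shortcut characteristic of ResNet."""
    return relu(x + conv2d_same(relu(conv2d_same(x, k1)), k2))
```

The shortcut means the block only has to learn a correction F(x) to its input, which is what makes very deep stacks of such blocks trainable.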

The embodiments of the present application do not limit how the features of the image to be processed are represented; for example, they may be represented as a feature map or in another form.

Step 102: determine human body key points and a preliminary human body orientation based on the features of the image to be processed.

As an example implementation of this step, human body key point detection is performed on the features of the image to be processed to obtain the human body key points, and human body orientation detection is performed on the same features to obtain the preliminary human body orientation.

As an example implementation of human body key point detection, the human body key points may be obtained by applying convolution and upsampling to the features of the image to be processed.
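A sketch of the convolution-plus-upsampling route to key points, assuming (the text does not say so) that the head outputs one heatmap per key point and that each key point is read off as its heatmap's peak. A 1 x 1 convolution, written with `np.einsum`, maps C feature channels to K heatmaps, and nearest-neighbour upsampling by a hypothetical `stride` of 4 restores a finer resolution.

```python
import numpy as np


def keypoint_head(features: np.ndarray, weights: np.ndarray, stride: int = 4):
    """Decode K key points from (C, h, w) features.

    weights: (K, C) kernel of a 1 x 1 convolution yielding one heatmap per key point.
    Returns a list of K (x, y) pixel coordinates on the upsampled grid.
    """
    heatmaps = np.einsum("kc,chw->khw", weights, features)             # 1 x 1 convolution
    heatmaps = heatmaps.repeat(stride, axis=1).repeat(stride, axis=2)  # nearest upsampling
    k, height, width = heatmaps.shape
    peak = heatmaps.reshape(k, -1).argmax(axis=1)                      # flat index of each peak
    ys, xs = np.unravel_index(peak, (height, width))
    return list(zip(xs.tolist(), ys.tolist()))
```

With nearest-neighbour upsampling, `argmax` returns the top-left pixel of the peak block; learned deconvolution or bilinear upsampling, common in practice, would localize peaks more smoothly.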

In one specific example, after the features of the image to be processed are obtained, they are input to a Feature Pyramid Network (FPN), which processes them to obtain the human body key points. FPN-based feature processing extracts features from feature maps of different sizes and then fuses those feature maps, yielding multi-scale features; fusing these multi-scale features allows the human body key points to be obtained accurately.
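The FPN's top-down fusion can be sketched as below, assuming the backbone yields feature maps at several scales, each half the size of the previous one. Each level is projected to a common channel count by a 1 x 1 lateral convolution, and each coarser result is upsampled by 2 and added to the next finer level. The 3 x 3 smoothing convolutions of the original FPN design are omitted, and all weights are illustrative.

```python
import numpy as np


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling of the last two (spatial) axes by a factor of 2."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)


def fpn_top_down(features, lateral_weights):
    """Fuse multi-scale maps FPN-style.

    features: list of (C_i, H_i, W_i) maps ordered fine -> coarse, each level half the size.
    lateral_weights: list of (D, C_i) 1 x 1 convolution kernels projecting every level to D channels.
    Returns fused (D, H_i, W_i) maps, fine -> coarse.
    """
    laterals = [np.einsum("dc,chw->dhw", w, f) for w, f in zip(lateral_weights, features)]
    outputs = [laterals[-1]]  # start from the coarsest, most semantic level
    for lateral in reversed(laterals[:-1]):
        outputs.append(lateral + upsample2x(outputs[-1]))  # inject coarse context into finer map
    return outputs[::-1]
```

The finest fused map keeps high spatial resolution while carrying context from the coarser levels, which is why FPN outputs suit precise key-point localization.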

As an example implementation of orientation detection, convolution may be applied to the features of the image to be processed to obtain the rudimentary human body orientation. In practice, once the features are obtained, they are input to a neural network composed of at least one convolutional layer, whose convolution operations convert them into a rudimentary human body orientation detection result.
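
One simple way a convolutional head can turn a C x H x W feature map into one of the eight orientation classes of FIG. 4 is a 1x1 convolution with one output channel per class, followed by global average pooling and an argmax. This is a minimal sketch under that assumption; the source does not specify the head's layers, and `weights`/`bias` are hypothetical learned parameters.

```python
from typing import List

Features = List[List[List[float]]]  # C x H x W


def orientation_head(features: Features, weights: List[List[float]],
                     bias: List[float]) -> int:
    """Map C x H x W features to an orientation class numbered 1..len(weights).

    weights[k][c] is a hypothetical 1x1-convolution kernel for class k and
    bias[k] its bias; the per-class maps are averaged (global average pooling).
    """
    num_classes = len(weights)
    channels = len(features)
    h, w = len(features[0]), len(features[0][0])
    logits = []
    for k in range(num_classes):
        total = 0.0
        for i in range(h):
            for j in range(w):
                # 1x1 convolution at pixel (i, j): weighted sum over channels
                total += sum(weights[k][c] * features[c][i][j] for c in range(channels))
        logits.append(total / (h * w) + bias[k])  # global average pooling + bias
    best = max(range(num_classes), key=lambda k: logits[k])
    return best + 1  # classes are numbered from 1, as in FIG. 4
```

A softmax over `logits` would give per-class probabilities, but it does not change the argmax, so it is omitted here.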

In practice, steps 101 and 102 may be implemented by a trained neural network. FIG. 2 is a schematic diagram of the architecture of a trained neural network according to an embodiment of the present application. As shown in FIG. 2, the trained neural network has two parts: a lower-layer network and upper-layer networks. The lower-layer network is the feature extraction network described above. At run time, its input is the image to be processed; performing feature extraction with it yields relatively high-level features whose expressive power is stronger than that of the raw image. The upper-layer networks comprise one for human body keypoint detection and one for human body orientation detection. The keypoint network processes the features of the image to be processed to obtain the human body keypoints, and the orientation network processes the same features to obtain the rudimentary human body orientation.
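
The FIG. 2 arrangement can be sketched as a small wrapper in which the lower-layer network runs once and both upper-layer networks consume its output. This is an illustrative sketch, not the source's implementation: the class name `OrientationNet` is hypothetical, and the three components are left as arbitrary callables so that any backbone or head can be plugged in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class OrientationNet:
    """Shared lower-layer network feeding two upper-layer heads (cf. FIG. 2)."""
    backbone: Callable[[Any], Any]          # lower-layer feature extraction network
    keypoint_head: Callable[[Any], Any]     # upper-layer network for keypoints
    orientation_head: Callable[[Any], Any]  # upper-layer network for orientation

    def __call__(self, image: Any) -> Tuple[Any, Any]:
        features = self.backbone(image)                # step 101: extracted once
        keypoints = self.keypoint_head(features)       # step 102: keypoints
        orientation = self.orientation_head(features)  # step 102: rudimentary orientation
        return keypoints, orientation
```

Because the features are computed once and shared, the cost of the second task is only that of its head, which is the resource saving the description attributes to the shared extraction network. At inference time no batch stitching is needed; a single image goes straight through.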

In step 103, the final human body orientation is determined based on the determined human body keypoints and the rudimentary human body orientation.

In practice, steps 101 to 103 may be implemented by a processor in an electronic device. The processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field-Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.

In the related art, the orientation of a human body is determined from orientation detection alone, so the resulting orientation has low accuracy. In the embodiments of the present application, the final human body orientation is obtained by jointly considering the human body keypoints and the rudimentary human body orientation, so the keypoints improve the accuracy and usability of the final orientation.

Further, in the embodiments of the present application, the keypoint detection task and the orientation detection task share the same image feature extraction network. The two tasks can therefore be performed simultaneously with few computing resources, which helps meet their real-time requirements. In addition, the orientation is determined from both the keypoint detection result and the orientation detection result, which improves the accuracy of orientation detection.

In one example implementation of step 103, in response to the human body orientation indicated by the determined keypoints matching the rudimentary human body orientation, the rudimentary human body orientation is determined as the final human body orientation.

In practice, it can be determined whether the orientation indicated by the determined keypoints matches the rudimentary orientation. If it does, the rudimentary orientation is considered relatively accurate, and taking it as the final orientation yields an accurate final human body orientation.

The effects of the embodiments of the present application are described below by way of example with reference to the drawings.

FIG. 3 is a schematic diagram of human body keypoints according to an embodiment of the present application. As shown in FIG. 3, the numbers 0 to 17 denote the human body keypoints obtained by keypoint detection. If all keypoints are detected, the human body faces forward or backward; if only the left-side keypoints are detected, the body faces left; if only the right-side keypoints are detected, the body faces right. FIG. 4 is a schematic diagram of human body orientations according to an embodiment of the present application, in which the numbers 1 to 8 denote different orientations: orientation detection divides the orientation into eight directions. Orientation detection thus gives a finer-grained orientation than the keypoints, which only distinguish front/back, left and right, while the keypoints provide an independent check; correcting the orientation detection result with the keypoint detection result therefore improves its accuracy.
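
The keypoint rule above (both sides detected means front or back, one side only means that side) can be written as a small classifier over the set of detected keypoint indices. The sketch assumes an OpenPose-style 18-keypoint ordering to decide which indices are "left" and "right"; the source does not list the indices in FIG. 3, so `LEFT_KEYPOINTS` and `RIGHT_KEYPOINTS` are hypothetical. Following the example in the next paragraph, a fully visible side with the other side partly visible still counts as that side.

```python
from typing import Optional, Set

# Hypothetical left/right split of keypoints 0-17 (OpenPose-style ordering
# assumed: 0 nose and 1 neck are central; the source does not list FIG. 3).
LEFT_KEYPOINTS = {5, 6, 7, 11, 12, 13, 15, 17}
RIGHT_KEYPOINTS = {2, 3, 4, 8, 9, 10, 14, 16}


def orientation_from_keypoints(detected: Set[int]) -> Optional[str]:
    """Coarse orientation implied by which keypoint indices were detected."""
    left_all = LEFT_KEYPOINTS <= detected    # every left-side keypoint found
    right_all = RIGHT_KEYPOINTS <= detected  # every right-side keypoint found
    if left_all and right_all:
        return "front/back"  # both sides visible
    if left_all:
        return "left"        # right side only partly visible or not at all
    if right_all:
        return "right"       # left side only partly visible or not at all
    return None              # no clear cue (case not covered by the source)
```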

As FIGS. 3 and 4 show, the number and positions of detectable keypoints vary with the orientation of the body. For example, if all keypoints on the left side of the body are detected while only some or none of the right-side keypoints are detected, and the rudimentary orientation is also leftward, the rudimentary orientation can be judged accurate. Taking it as the final orientation then keeps the accuracy of the final orientation high.

In another example implementation of step 103, in response to the orientation indicated by the determined keypoints not matching the rudimentary orientation, the orientation indicated by the keypoints is determined as the final human body orientation.

As the above shows, if the orientation indicated by the keypoints does not match the rudimentary orientation, the rudimentary orientation is considered to have low accuracy. Taking the keypoint-indicated orientation as the final orientation then improves the accuracy of the final human body orientation.

For example, as shown in FIGS. 3 and 4, if only some keypoints on one side of the body are valid but the rudimentary orientation is front or back, the rudimentary orientation can be judged inaccurate. In other words, the keypoints make it possible to check the validity and accuracy of the orientation effectively, and using them to refine the rudimentary orientation improves the accuracy and usability of the final human body orientation.
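
Both branches of step 103 can be combined into one decision function: keep the rudimentary orientation if it is consistent with the keypoint cue, otherwise fall back to the keypoint cue. This is a sketch under stated assumptions: the labels assigned to directions 1 to 8 and the table of which directions count as consistent with each keypoint cue are hypothetical (FIG. 4's actual numbering may differ), and keeping the classifier output when the keypoints give no cue is an assumption the source does not address.

```python
from typing import Optional

# Hypothetical labels for directions 1-8 of FIG. 4.
DIRECTIONS = {1: "front", 2: "front-left", 3: "left", 4: "back-left",
              5: "back", 6: "back-right", 7: "right", 8: "front-right"}

# Assumed consistency table between keypoint cues and fine directions.
COMPATIBLE = {
    "front/back": {1, 5},
    "left": {2, 3, 4},
    "right": {6, 7, 8},
}


def final_orientation(rudimentary: int, keypoint_cue: Optional[str]) -> str:
    """Step 103: keep the rudimentary orientation if the keypoints agree,
    otherwise use the orientation indicated by the keypoints."""
    if keypoint_cue is None:
        # Assumption (not in the source): no keypoint cue, keep classifier output.
        return DIRECTIONS[rudimentary]
    if rudimentary in COMPATIBLE[keypoint_cue]:
        return DIRECTIONS[rudimentary]  # match: the rudimentary orientation is kept
    return keypoint_cue                 # mismatch: the keypoint orientation is used
```

For instance, with only left-side keypoints valid and a rudimentary "back" (5), the function returns "left", mirroring the example above.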

In some embodiments, steps 101 and 102 may be implemented by a neural network trained on first sample images and second sample images. A first sample image contains a first human body image with annotated human body keypoints, and a second sample image contains a second human body image with an annotated human body orientation.

In practice, the first and second sample images may be obtained from a local storage area or from a network, and may be in JPEG, BMP, PNG or another format. These formats and sources are given only as examples; the embodiments of the present application do not limit the format or source of the sample images.

In one specific example, the first and second sample images may be obtained from different datasets, and the two datasets need not overlap.

As the above shows, the embodiments of the present application obtain the human body keypoints and the rudimentary orientation with a neural network, which is easy to implement.

The training process of the above neural network is described below by way of example with reference to the drawings.

FIG. 5 is a flowchart of a neural network training method according to an embodiment of the present application. As shown in FIG. 5, the process may include the following steps.

In step 501, first sample images and second sample images are acquired.

This step has been described above, so its details are omitted here.

In step 502, the first and second sample images are input to the neural network, which performs the following: feature extraction on the first and second sample images to obtain their features; pedestrian (human body) keypoint detection based on the features of the first sample images to obtain their human body keypoints; and orientation detection based on the features of the second sample images to obtain their human body orientations.

In practice, the first and second sample images are input to the feature extraction network, which extracts their features.

The embodiments of the present application do not limit the representation form of the features of the first and second sample images; for example, they may be feature maps or take another form.

As an example implementation of this feature extraction, the image data of the first and second sample images are stitched together, features are extracted from the stitched image data, and the resulting features are then split into the features of the first sample images and those of the second sample images according to how the image data were stitched.

Stitching the image data of the first and second sample images allows features to be extracted from all of them in a single pass, which is easy to implement. Splitting the features of the stitched data then allows keypoint detection and orientation detection to be applied to the features of the first and second sample images respectively.

As an example of stitching, the first and second sample images may be stitched along the batch dimension. Before stitching, they may be adjusted to have the same size in the three dimensions of channel, height and width; the adjusted first and second sample images can then be stitched along the batch dimension.

Here, the number of channels of an image is the number of channels over which image features are extracted, and the batch dimension indexes the images. In the embodiments of the present application, once the first and second sample images have been adjusted to the same number of channels, height and width, different numbers of adjusted first and second sample images can be stitched along the batch dimension.

FIG. 6 is a schematic diagram of the neural network training architecture according to an embodiment of the present application, and FIG. 7 is a schematic diagram of image data stitching according to an embodiment of the present application. In FIG. 7, the solid rectangular frame represents the first sample images 602 and the dotted rectangular frame represents the second sample images 603. In the embodiments of the present application, the data format of the first sample images 602 and the second sample images 603 may be written as [B, C, H, W], where B is the size of the batch dimension, C the size of the channel dimension, H the height and W the width. Because convolution and the other operations in the feature extraction process act over the channel, height and width dimensions, the first sample images 602 and the second sample images 603 can be stitched along the batch dimension, as shown in FIGS. 6 and 7.
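
Stitching along the batch dimension amounts to concatenating two [B, C, H, W] batches into one, provided C, H and W already agree. The pure-Python sketch below represents a batch as a list of C x H x W nested lists (an assumption made to stay dependency-free; a tensor library would use a single concatenate call along axis 0). It also returns how many leading samples came from the first batch, since that count is what later allows the features to be split back. The helper names are hypothetical.

```python
from typing import List, Tuple

Sample = List[List[List[float]]]  # C x H x W
Batch = List[Sample]              # B x C x H x W


def chw(sample: Sample) -> Tuple[int, int, int]:
    """(channels, height, width) of one sample."""
    return len(sample), len(sample[0]), len(sample[0][0])


def stitch_batches(first: Batch, second: Batch) -> Tuple[Batch, int]:
    """Concatenate two image batches along the batch dimension.

    All samples must already share the same (C, H, W), i.e. the resizing
    described above has been done. Returns the stitched batch and the number
    of leading samples that came from `first`.
    """
    shapes = {chw(s) for s in first + second}
    if len(shapes) > 1:
        raise ValueError(f"C, H, W must match before stitching, got {shapes}")
    return first + second, len(first)
```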

As shown in FIG. 6, the lower-layer network 601 extracts features from the stitched image data to obtain the corresponding image features, which must then be split.

FIG. 8 is a schematic diagram of image feature splitting according to an embodiment of the present application. In FIG. 8, the solid rectangular frame (C1) represents the image features of the first sample images and the dotted rectangular frame (C2) represents the image features of the second sample images. In the embodiments of the present application, the features of the stitched image data are split along the batch dimension, according to how the first and second sample images were stitched, to obtain the image features 801 of the first sample images and the image features 802 of the second sample images, both represented as feature maps.
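
Because stitching only concatenated along the batch dimension, splitting the lower-layer network's output is a slice at the number of first sample images recorded at stitching time. A minimal sketch, assuming the features are indexed by batch position; `split_features` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_features(features: Sequence[T], num_first: int) -> Tuple[List[T], List[T]]:
    """Undo batch-dimension stitching on the lower-layer network output.

    The first `num_first` entries belong to the first sample images (keypoint
    branch, C1 in FIG. 8); the rest belong to the second sample images
    (orientation branch, C2 in FIG. 8).
    """
    if not 0 <= num_first <= len(features):
        raise ValueError("num_first out of range")
    return list(features[:num_first]), list(features[num_first:])
```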

As shown in FIG. 6, the image features of the first sample images can be input to the upper-layer network 604 for human body keypoint detection, which processes them and outputs the human body keypoints 641 of the first sample images. The image features of the second sample images can be input to the upper-layer network 605 for human body orientation detection, which processes them and outputs the human body orientations 651 of the second sample images.

Further, as shown in FIG. 6, once the human body keypoints of the first sample images are obtained, a first loss 642 of the neural network can be computed, representing the difference between the detected keypoints and the annotated keypoints. Once the human body orientations of the second sample images are obtained, a second loss 652 can be computed, representing the difference between the detected orientations and the annotated orientations.

In the embodiments of the present application, keypoint detection based on the features of the first sample images is implemented in the same way as keypoint detection based on the features of the image to be processed in step 102, and orientation detection based on the features of the second sample images is implemented in the same way as orientation detection in step 102; their details are omitted here.

As the above shows, applying and testing the neural network (steps 101 to 103), unlike training it, requires no stitching of image data or splitting of image features: passing the image to be processed through the lower-layer network and the two upper-layer networks yields its human body keypoints and rudimentary human body orientation.

In step 503, the network parameter values of the neural network are adjusted based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientations and the annotated human body orientations.

As an example implementation of this step, a first loss of the neural network is obtained from the detected keypoints (i.e., the keypoints of the first sample images) and the annotated keypoints, a second loss is obtained from the detected orientations (i.e., the orientations of the second sample images) and the annotated orientations, and the network parameter values are adjusted based on the first and second losses.

In a concrete implementation, the total loss of the neural network may be either the sum or a weighted sum of the first and second losses; the weights may be preset according to the needs of the actual application.
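
The source does not fix the form of either loss. As an assumption for illustration, the sketch below uses a mean squared keypoint distance for the first loss and an 8-class cross-entropy for the second, then combines them as a plain or weighted sum as described above. All function names and the default weights are hypothetical.

```python
import math
from typing import Sequence, Tuple


def keypoint_loss(pred: Sequence[Tuple[float, float]],
                  target: Sequence[Tuple[float, float]]) -> float:
    """First loss (assumed form): mean squared distance between keypoints."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, target)) / len(pred)


def orientation_loss(logits: Sequence[float], label: int) -> float:
    """Second loss (assumed form): cross-entropy; label is a class 1..8."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label - 1]


def total_loss(loss1: float, loss2: float, w1: float = 1.0, w2: float = 1.0) -> float:
    """Plain sum when w1 == w2 == 1, otherwise a weighted sum."""
    return w1 * loss1 + w2 * loss2
```

Keeping the two losses separate until the final sum is what allows each to be computed only on the sample images that carry the matching annotation.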

Once the total loss is obtained, the network parameter values of the neural network can be adjusted based on it.

In step 504, it is determined whether image processing by the neural network with the adjusted parameter values meets a set accuracy requirement. If not, steps 501 to 504 are executed again; if so, step 505 is executed.

In the embodiments of the present application, the accuracy requirement may be preset. For example, it may depend on the first and second losses: in a first example, the requirement is that the total loss be less than a first predetermined threshold; in a second example, it is that the first loss be less than a second predetermined threshold and the second loss be less than a third predetermined threshold.

In practice, the first, second and third predetermined thresholds may be preset according to the needs of the actual application.
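
The two example accuracy requirements of step 504 can be expressed as one check. This sketch takes the thresholds as parameters because the source leaves their values to the application; the function name and argument names are hypothetical.

```python
from typing import Optional


def meets_accuracy_requirement(loss1: float, loss2: float,
                               total_threshold: Optional[float] = None,
                               loss1_threshold: Optional[float] = None,
                               loss2_threshold: Optional[float] = None,
                               w1: float = 1.0, w2: float = 1.0) -> bool:
    """Step 504 check.

    Variant 1: total loss w1*loss1 + w2*loss2 < first threshold.
    Variant 2: loss1 < second threshold and loss2 < third threshold.
    """
    if total_threshold is not None:
        return w1 * loss1 + w2 * loss2 < total_threshold
    if loss1_threshold is None or loss2_threshold is None:
        raise ValueError("give total_threshold, or both per-loss thresholds")
    return loss1 < loss1_threshold and loss2 < loss2_threshold
```

Variant 2 is stricter in one respect: a very low loss on one task cannot compensate for a high loss on the other.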

In step 505, the neural network with the adjusted network parameter values is taken as the trained neural network.

In practice, steps 501 to 505 may be implemented by a processor in an electronic device. The processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.

As the above shows, in the embodiments of the present application, training the neural network does not require running separate keypoint-detection and orientation-detection pipelines on the first and second sample images: both tasks are built on the same image feature extraction process. The trained neural network can therefore perform both tasks simultaneously while consuming few computing resources, which helps meet their real-time requirements.

In one example, training fully exploits the data similarity between the first and second sample images (both contain human body images): stitching their image data allows features to be extracted from all of them in one pass, which is easy to implement. It also exploits the similarity between the keypoint detection network and the orientation detection network (both must extract features from human body images): the lower-layer network common to both is extracted and used for shared image feature extraction. As a result, a single trained neural network can perform keypoint detection and orientation detection simultaneously.

Those skilled in the art will understand that, in the above method of the specific embodiments, the order in which the steps are written does not imply a strict execution order or limit the implementation; the actual execution order of the steps is determined by their functions and possible internal logic.

Based on the human body orientation detection method provided in the above embodiments, the embodiments of the present application provide a human body orientation detection device.

FIG. 9 is a schematic diagram of the structure of a human body orientation detection device according to an embodiment of the present application. As shown in FIG. 9, the device may include an extraction module 901 and a processing module 902, wherein:
the extraction module 901 is configured to perform feature extraction on an image to be processed to obtain features of the image to be processed; and
the processing module 902 is configured to determine human body keypoints and a rudimentary human body orientation based on the features of the image to be processed, and to determine a final human body orientation based on the determined human body keypoints and the rudimentary human body orientation.
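The division of labour between the two modules can be sketched as plain Python composition. This is a hypothetical sketch and not the device's actual implementation. The class names and the `backbone`, `keypoint_head`, `orientation_head` and `fuse` callables are invented for illustration and stand in for trained networks and the decision rule. The sketch shows only that both outputs of the processing module are derived from the same extracted features, and that the final orientation is computed from both.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Keypoints = List[Tuple[float, float]]  # hypothetical (x, y) pixel positions


@dataclass
class ExtractionModule:
    """Role of extraction module 901: image -> features."""
    backbone: Callable[[Any], Any]

    def __call__(self, image: Any) -> Any:
        return self.backbone(image)


@dataclass
class ProcessingModule:
    """Role of processing module 902: features -> final orientation."""
    keypoint_head: Callable[[Any], Keypoints]
    orientation_head: Callable[[Any], str]
    fuse: Callable[[Keypoints, str], str]

    def __call__(self, features: Any) -> str:
        keypoints = self.keypoint_head(features)       # human body keypoints
        rudimentary = self.orientation_head(features)  # rudimentary orientation
        return self.fuse(keypoints, rudimentary)       # final orientation


@dataclass
class OrientationDetectionDevice:
    extraction: ExtractionModule
    processing: ProcessingModule

    def detect(self, image: Any) -> str:
        return self.processing(self.extraction(image))


# Usage with placeholder callables standing in for trained networks.
device = OrientationDetectionDevice(
    ExtractionModule(backbone=lambda img: img),
    ProcessingModule(
        keypoint_head=lambda f: [(0.0, 0.0)],
        orientation_head=lambda f: "front",
        fuse=lambda kps, rudimentary: rudimentary,
    ),
)
print(device.detect("image"))  # front
```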

In some embodiments, the processing module 902 determines the final human body orientation from the determined human body keypoints and the rudimentary human body orientation as follows. If the human body orientation indicated by the determined human body keypoints matches the rudimentary human body orientation, the processing module 902 determines the rudimentary human body orientation as the final human body orientation.

In some embodiments, the processing module 902 determines the final human body orientation from the determined human body keypoints and the rudimentary human body orientation as follows. If the human body orientation indicated by the determined human body keypoints does not match the rudimentary human body orientation, the processing module 902 determines the human body orientation indicated by the determined human body keypoints as the final human body orientation.
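The two decision rules above can be sketched together. This section does not say how an orientation is read off the keypoints or what "matching" means, so both are assumptions here. If "match" meant exact label equality, the two rules would always yield the keypoint-derived orientation. The sketch therefore assumes that the rudimentary orientation is finer-grained (eight hypothetical labels such as `"front-left"`) and that the keypoints only give a coarse front/back/side reading. "Matching" is taken to mean that the fine label falls into the coarse class. The shoulder-based heuristic, the label set and the `min_width` threshold are all hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels, x grows to the right


def orientation_from_keypoints(keypoints: Dict[str, Point],
                               min_width: float = 10.0) -> str:
    """Hypothetical heuristic: coarse orientation from the two shoulders.

    When a person faces the camera, their left shoulder appears on the
    image's right, so x(left) > x(right). A very small horizontal gap
    between the shoulders is treated as a side view.
    """
    left_x = keypoints["left_shoulder"][0]
    right_x = keypoints["right_shoulder"][0]
    if abs(left_x - right_x) < min_width:
        return "side"
    return "front" if left_x > right_x else "back"


def coarse_class(label: str) -> str:
    """Map a fine label (e.g. 'front-left', 'back', 'right') to front/back/side."""
    if label.startswith("front"):
        return "front"
    if label.startswith("back"):
        return "back"
    return "side"


def final_orientation(keypoints: Dict[str, Point], rudimentary: str) -> str:
    """Keep the rudimentary orientation when the keypoints agree with it;
    otherwise use the orientation indicated by the keypoints."""
    from_keypoints = orientation_from_keypoints(keypoints)
    if coarse_class(rudimentary) == from_keypoints:
        return rudimentary
    return from_keypoints


facing = {"left_shoulder": (120.0, 50.0), "right_shoulder": (80.0, 52.0)}
print(final_orientation(facing, "front-left"))  # front-left (keypoints agree)
print(final_orientation(facing, "back-right"))  # front (keypoints override)
```

Under this assumption, the rudimentary prediction supplies detail when it agrees with the keypoints, and the keypoints correct it when it is grossly wrong (for example, front and back confused).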

In practical applications, both the extraction module 901 and the processing module 902 may be implemented by a processor in an electronic device. The processor may be at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a central processing unit (CPU), a controller, a microcontroller, or a microprocessor.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.

If the integrated unit is implemented in the form of a software functional module and is sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solutions of the present application may be embodied in the form of a software product. This covers the technical solutions in essence, the part that contributes to the prior art, or all or part of the technical solutions. The computer software product is stored in a storage medium. It includes instructions that cause a computer device (such as a personal computer, a server, or a network device) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Specifically, computer program instructions corresponding to the human body orientation detection method in this embodiment may be stored in a storage medium such as an optical disc, a hard disk, or a USB flash drive. When an electronic device reads or executes these instructions from the storage medium, it implements any one of the human body orientation detection methods of the foregoing embodiments.

Based on the same technical concept as the foregoing embodiments, FIG. 10 shows an electronic device 10 according to an embodiment of the present application. The electronic device 10 may include a memory 1001 and a processor 1002, wherein:
the memory 1001 is configured to store a computer program; and
the processor 1002 is configured to execute the computer program stored in the memory to implement any one of the human body orientation detection methods of the foregoing embodiments.

In practical applications, the memory 1001 may be one of the following:
a volatile memory such as RAM;
a non-volatile memory such as ROM, flash memory, a hard disk drive (HDD), or a solid-state drive (SSD);
or a combination of the above types of memory.
The memory provides instructions and data to the processor 1002.

The processor 1002 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, or a microprocessor. It should be understood that, for different devices, the electronic component that implements the above processor functions may be a different one. The embodiments of the present application do not specifically limit this.

In some embodiments, the functions or modules of the device provided in the embodiments of the present application may be used to perform the methods described in the method embodiments above. For their specific implementation, refer to the description of those method embodiments. For brevity, the details are not repeated here.

The descriptions of the embodiments above tend to emphasize the differences between them. For parts that are the same or similar, the embodiments may be cross-referenced. For brevity, the details are not repeated here.

Provided there is no conflict, the features disclosed in the method embodiments of the present application may be combined arbitrarily to obtain new method embodiments.

Provided there is no conflict, the features disclosed in the product embodiments of the present application may be combined arbitrarily to obtain new product embodiments.

Provided there is no conflict, the features disclosed in the method or device embodiments of the present application may be combined arbitrarily to obtain new method or device embodiments.

From the description of the above embodiments, those skilled in the art will understand that the methods of the above embodiments can be implemented in two ways. They can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone. In many cases, the former is the preferred implementation. On this understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product. This covers the technical solutions in essence or the part that contributes to the prior art. The computer software product is stored in a storage medium (for example, ROM/RAM, a magnetic disk, or an optical disc). It includes instructions that cause a computer device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.

The embodiments of the present application have been described above with reference to the drawings. The present application is not limited to the specific embodiments described above, which are merely illustrative and not restrictive. Guided by the present application, those skilled in the art may devise many other forms without departing from the spirit of the present application and the scope of protection of the claims. All such forms fall within the scope of protection of the present application.

The present application provides a human body orientation detection method and device, an electronic device, and a computer storage medium. The method includes the following steps:
performing feature extraction on an image to be processed to obtain features of the image;
determining human body keypoints and a rudimentary human body orientation based on those features; and
determining a final human body orientation based on the determined human body keypoints and the rudimentary human body orientation.

It should be understood that the general description above and the detailed description below are exemplary and explanatory only and do not limit the present application.
For example, the present application provides the following items.
(Item 1)
A human body orientation detection method, comprising:
performing feature extraction on an image to be processed to obtain features of the image to be processed;
determining human body keypoints and a rudimentary human body orientation based on the features of the image to be processed; and
determining a final human body orientation based on the determined human body keypoints and the rudimentary human body orientation.
(Item 2)
The method according to item 1, wherein determining the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation comprises:
in response to the human body orientation indicated by the determined human body keypoints matching the rudimentary human body orientation, determining the rudimentary human body orientation as the final human body orientation.
(Item 3)
The method according to item 1, wherein determining the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation comprises:
in response to the human body orientation indicated by the determined human body keypoints not matching the rudimentary human body orientation, determining the human body orientation indicated by the determined human body keypoints as the final human body orientation.
(Item 4)
The method according to any one of items 1 to 3, wherein a neural network executes the steps of performing feature extraction on the image to be processed to obtain its features and of determining the human body keypoints and the rudimentary human body orientation based on those features; the neural network is obtained by training with a first sample image and a second sample image; the first sample image contains a first human body image and annotated human body keypoints; and the second sample image contains a second human body image and an annotated human body orientation.
(Item 5)
The method according to item 4, wherein obtaining the neural network by training with the first sample image and the second sample image comprises:
performing feature extraction on the first sample image and the second sample image to obtain features of the first sample image and of the second sample image;
performing pedestrian keypoint detection based on the features of the first sample image to obtain human body keypoints of the first sample image;
performing orientation detection based on the features of the second sample image to obtain a human body orientation of the second sample image; and
adjusting network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation, and the annotated human body orientation.
(Item 6)
The method according to item 5, wherein performing feature extraction on the first sample image and the second sample image to obtain their features comprises:
stitching the first sample image and the second sample image, and performing feature extraction on the stitched image data to obtain features of the stitched image data; and
splitting the features of the stitched image data into features of the first sample image and features of the second sample image according to the manner in which the two sample images were stitched.
(Item 7)
The method according to item 6, wherein stitching the first sample image and the second sample image comprises:
stitching the first sample image and the second sample image along a batch dimension;
and wherein, before the first sample image and the second sample image are stitched, the method further comprises:
adjusting the first sample image and the second sample image to have the same size in each of the three dimensions of channel, height, and width.
(Item 8)
The method according to item 5, wherein adjusting the network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation, and the annotated human body orientation comprises:
obtaining a first loss value of the neural network based on the detected human body keypoints and the annotated human body keypoints, the first loss value representing the difference between the detected and the annotated human body keypoints;
obtaining a second loss value of the neural network based on the detected human body orientation and the annotated human body orientation, the second loss value representing the difference between the detected and the annotated human body orientations; and
adjusting the network parameter values of the neural network based on the first loss value and the second loss value.
(Item 9)
A human body orientation detection device, comprising an extraction module and a processing module, wherein:
the extraction module is configured to perform feature extraction on an image to be processed to obtain features of the image to be processed; and
the processing module is configured to determine human body keypoints and a rudimentary human body orientation based on the features of the image to be processed, and to determine a final human body orientation based on the determined human body keypoints and the rudimentary human body orientation.
(Item 10)
The device according to item 9, wherein, in determining the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation, the processing module is configured to determine the rudimentary human body orientation as the final human body orientation in response to the human body orientation indicated by the determined human body keypoints matching the rudimentary human body orientation.
(Item 11)
The device according to item 9, wherein, in determining the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation, the processing module is configured to determine the human body orientation indicated by the determined human body keypoints as the final human body orientation in response to that orientation not matching the rudimentary human body orientation.
(Item 12)
The device according to any one of items 9 to 11, wherein a neural network executes the steps of performing feature extraction on the image to be processed to obtain its features and of determining the human body keypoints and the rudimentary human body orientation based on those features; the neural network is obtained by training with a first sample image and a second sample image; the first sample image contains a first human body image and annotated human body keypoints; and the second sample image contains a second human body image and an annotated human body orientation.
(Item 13)
The device according to item 12, further comprising a training module configured to train the neural network with the first sample image and the second sample image by:
performing feature extraction on the first sample image and the second sample image to obtain features of the first sample image and of the second sample image;
performing pedestrian keypoint detection based on the features of the first sample image to obtain human body keypoints of the first sample image;
performing orientation detection based on the features of the second sample image to obtain a human body orientation of the second sample image; and
adjusting network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation, and the annotated human body orientation.
(Item 14)
The device according to item 13, wherein the training module is configured to perform feature extraction on the first sample image and the second sample image to obtain their features by:
stitching the first sample image and the second sample image, and performing feature extraction on the stitched image data to obtain features of the stitched image data; and
splitting the features of the stitched image data into features of the first sample image and features of the second sample image according to the manner in which the two sample images were stitched.
(Item 15)
The device according to item 14, wherein the training module is configured to stitch the first sample image and the second sample image along a batch dimension; and
the training module is further configured to adjust the first sample image and the second sample image, before stitching them, to have the same size in each of the three dimensions of channel, height, and width.
(Item 16)
The device according to item 13, wherein the training module is configured to adjust the network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation, and the annotated human body orientation by:
obtaining a first loss value of the neural network based on the detected human body keypoints and the annotated human body keypoints, the first loss value representing the difference between the detected and the annotated human body keypoints;
obtaining a second loss value of the neural network based on the detected human body orientation and the annotated human body orientation, the second loss value representing the difference between the detected and the annotated human body orientations; and
adjusting the network parameter values of the neural network based on the first loss value and the second loss value.
(Item 17)
An electronic device, comprising a processor and a memory configured to store a computer program executable by the processor, wherein
the processor is configured to execute the computer program to perform the method according to any one of items 1 to 8.
(Item 18)
A computer storage medium storing a computer program that, when executed by a processor, implements the method according to any one of items 1 to 8.
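Items 5 and 8 require a first loss value for the keypoints and a second loss value for the orientation, but they do not fix the form of either loss or how the two are combined. The sketch below shows one common choice, which is an assumption and is not stated in the source. It uses mean squared keypoint distance for the first loss, softmax cross-entropy over orientation classes for the second loss, and a weighted sum with a hypothetical `weight` factor to combine them. With batch stitching, the first loss would be computed only on the first-sample split and the second loss only on the second-sample split. Backpropagating the combined loss to update the parameters would be left to an automatic-differentiation framework and is not shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def keypoint_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """First loss value (assumed form): mean squared keypoint distance."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need equally many predicted and annotated keypoints")
    sq = [(px - tx) ** 2 + (py - ty) ** 2
          for (px, py), (tx, ty) in zip(pred, target)]
    return sum(sq) / len(sq)


def orientation_loss(logits: Sequence[float], label: int) -> float:
    """Second loss value (assumed form): softmax cross-entropy.

    Subtracting the max logit keeps exp() numerically stable.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(first: float, second: float, weight: float = 1.0) -> float:
    """Combine both losses; `weight` is a hypothetical balancing factor."""
    return first + weight * second


l1 = keypoint_loss([(1.0, 1.0), (3.0, 4.0)], [(1.0, 1.0), (0.0, 0.0)])
l2 = orientation_loss([0.0, 0.0], label=0)  # two equally scored classes
print(l1, round(l2, 4), round(joint_loss(l1, l2, weight=0.5), 4))
# 12.5 0.6931 12.8466
```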

Claims

A human body orientation detection method, comprising:
performing feature extraction on an image to be processed to obtain features of the image to be processed;
determining human body keypoints and a rudimentary human body orientation based on the features of the image to be processed; and
determining a final human body orientation based on the determined human body keypoints and the rudimentary human body orientation.

The method according to claim 1, wherein determining the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation comprises:
in response to the human body orientation indicated by the determined human body keypoints matching the rudimentary human body orientation, determining the rudimentary human body orientation as the final human body orientation.

The method according to claim 1, wherein determining the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation comprises:
in response to the human body orientation indicated by the determined human body keypoints not matching the rudimentary human body orientation, determining the human body orientation indicated by the determined human body keypoints as the final human body orientation.

The method according to any one of claims 1 to 3, wherein a neural network executes the steps of performing feature extraction on the image to be processed to obtain its features and of determining the human body keypoints and the rudimentary human body orientation based on those features; the neural network is obtained by training with a first sample image and a second sample image; the first sample image contains a first human body image and annotated human body keypoints; and the second sample image contains a second human body image and an annotated human body orientation.

The method according to claim 4, wherein obtaining the neural network by training with the first sample image and the second sample image comprises:
performing feature extraction on the first sample image and the second sample image to obtain features of the first sample image and of the second sample image;
performing pedestrian keypoint detection based on the features of the first sample image to obtain human body keypoints of the first sample image;
performing orientation detection based on the features of the second sample image to obtain a human body orientation of the second sample image; and
adjusting network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation, and the annotated human body orientation.

The method according to claim 5, wherein performing feature extraction on the first sample image and the second sample image to obtain their features comprises:
stitching the first sample image and the second sample image, and performing feature extraction on the stitched image data to obtain features of the stitched image data; and
splitting the features of the stitched image data into features of the first sample image and features of the second sample image according to the manner in which the two sample images were stitched.

The method according to claim 6, wherein stitching the first sample image and the second sample image comprises:
stitching the first sample image and the second sample image along a batch dimension;
and wherein, before the first sample image and the second sample image are stitched, the method further comprises:
adjusting the first sample image and the second sample image so that they are identical in the three dimensions of channel, height and width.
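Stitching along the batch dimension only works when every image has the same channel, height and width, because the result must form a single N x C x H x W array. The claim does not say how the sizes are equalized; the sketch below assumes nearest-neighbour resizing for height and width, and replicating a single-channel image (or dropping surplus channels) for the channel count. Images are nested lists indexed as [channel][row][column], and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[List[float]]]  # indexed as [channel][row][column], i.e. C x H x W


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize of every channel to out_h x out_w."""
    in_h, in_w = len(img[0]), len(img[0][0])
    return [
        [[ch[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
         for r in range(out_h)]
        for ch in img
    ]


def match_channels(img: Image, out_c: int) -> Image:
    """Assumed policy: replicate a single channel, or keep the first out_c channels."""
    if len(img) == out_c:
        return img
    if len(img) == 1:
        return [[row[:] for row in img[0]] for _ in range(out_c)]
    if len(img) > out_c:
        return img[:out_c]
    raise ValueError(f"cannot map {len(img)} channels to {out_c}")


def to_common_shape(img: Image, c: int, h: int, w: int) -> Image:
    """Bring an image to C x H x W so it can be stacked with the other batch."""
    return resize_nearest(match_channels(img, c), h, w)


def shape(img: Image) -> Tuple[int, int, int]:
    return (len(img), len(img[0]), len(img[0][0]))
```

Padding or cropping would satisfy the claim equally well; resizing is chosen here only because it keeps the whole person in frame.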

The method according to claim 5, wherein adjusting the network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation and the annotated human body orientation comprises:
obtaining a first loss value of the neural network based on the detected human body keypoints and the annotated human body keypoints, the first loss value representing the difference between the detected human body keypoints and the annotated human body keypoints;
obtaining a second loss value of the neural network based on the detected human body orientation and the annotated human body orientation, the second loss value representing the difference between the detected human body orientation and the annotated human body orientation; and
adjusting the network parameter values of the neural network based on the first loss value and the second loss value.
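The two loss values and their combination can be sketched as follows. The claim only requires that each loss measure the difference between detection and annotation; the concrete choices here are assumptions: mean squared keypoint distance for the first loss, softmax cross-entropy over discrete orientation classes for the second, and a weighted sum (with hypothetical weights `w_kp` and `w_ori`) as the quantity that drives the parameter update.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def keypoint_loss(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """First loss (assumed): mean squared distance between matching keypoints."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, target)) / len(target)


def orientation_loss(logits: Sequence[float], label: int) -> float:
    """Second loss (assumed): softmax cross-entropy over orientation classes."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(first: float, second: float, w_kp: float = 1.0, w_ori: float = 1.0) -> float:
    """Combined objective used to adjust the network parameters."""
    return w_kp * first + w_ori * second
```

Heatmap-based keypoint losses are a common alternative to coordinate regression; the weights let the two tasks be balanced when one loss is numerically much larger than the other.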

A human body orientation detection device, comprising an extraction module and a processing module, wherein:
the extraction module is configured to perform feature extraction on an image to be processed to obtain features of the image to be processed; and
the processing module is configured to determine human body keypoints and a rudimentary human body orientation based on the features of the image to be processed, and to determine a final human body orientation based on the determined human body keypoints and the rudimentary human body orientation.

The device according to claim 9, wherein the processing module being configured to determine the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation comprises: determining, in response to the human body orientation represented by the determined human body keypoints matching the rudimentary human body orientation, the rudimentary human body orientation as the final human body orientation.

The device according to claim 9, wherein the processing module being configured to determine the final human body orientation based on the determined human body keypoints and the rudimentary human body orientation comprises: determining, in response to the human body orientation represented by the determined human body keypoints not matching the rudimentary human body orientation, the human body orientation represented by the determined human body keypoints as the final human body orientation.
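The decision rule of the two preceding claims can be sketched as follows. If "matching" meant strict equality, both branches would return the keypoint-derived orientation, so this sketch assumes a looser notion: the rudimentary orientation is a fine-grained angle from the network, the keypoints give only a coarse sector, and the two match when the angle falls inside that sector. The angle convention, the shoulder/hip/nose heuristic, `side_ratio` and the 45-degree tolerance are all hypothetical; the claims do not specify how an orientation is read from keypoints.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates; y grows downward

# Hypothetical convention: 0 = facing the camera, 90 = facing image-right,
# 180 = back to the camera, 270 = facing image-left.
SECTOR_CENTERS = {"front": 0.0, "right": 90.0, "back": 180.0, "left": 270.0}


def orientation_from_keypoints(kp: Dict[str, Point], side_ratio: float = 0.3) -> str:
    """Hypothetical heuristic: coarse orientation from shoulders, hips and nose."""
    ls, rs = kp["left_shoulder"], kp["right_shoulder"]
    lh, rh = kp["left_hip"], kp["right_hip"]
    dx = ls[0] - rs[0]  # > 0 when the person's left shoulder is on the image right
    torso_len = abs((lh[1] + rh[1]) / 2 - (ls[1] + rs[1]) / 2)
    if abs(dx) < side_ratio * torso_len:
        # Shoulders almost overlap: side view; the nose shows which way the person faces.
        shoulder_mid_x = (ls[0] + rs[0]) / 2
        return "right" if kp["nose"][0] > shoulder_mid_x else "left"
    return "front" if dx > 0 else "back"


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def final_orientation(kp: Dict[str, Point], rudimentary_deg: float) -> float:
    center = SECTOR_CENTERS[orientation_from_keypoints(kp)]
    if angular_distance(rudimentary_deg, center) < 45.0:
        return rudimentary_deg  # orientations match: keep the network estimate
    return center               # mismatch: use the keypoint-derived orientation
```

Under this reading the keypoints act as a sanity check: a consistent network estimate keeps its finer resolution, while a gross error (for example, front and back confused) is overridden.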

The device according to any one of claims 9 to 11, wherein the steps of performing feature extraction on the image to be processed to obtain its features, and of determining the human body keypoints and the rudimentary human body orientation based on those features, are executed by a neural network; the neural network is obtained by training with a first sample image and a second sample image, the first sample image containing a first human body image and annotated human body keypoints, and the second sample image containing a second human body image and an annotated human body orientation.

The device according to claim 12, further comprising a training module configured to train the neural network with the first sample image and the second sample image, the training comprising:
performing feature extraction on the first sample image and the second sample image to obtain features of the first sample image and features of the second sample image; performing pedestrian keypoint detection based on the features of the first sample image to obtain human body keypoints of the first sample image; and performing orientation detection based on the features of the second sample image to obtain a human body orientation of the second sample image; and
adjusting network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation and the annotated human body orientation.

The device according to claim 13, wherein the training module being configured to perform feature extraction on the first sample image and the second sample image to obtain the features of the first sample image and the second sample image comprises:
stitching the first sample image and the second sample image, and performing feature extraction on the stitched image data to obtain features of the stitched image data; and
dividing, according to the manner in which the first sample image and the second sample image were stitched, the features of the stitched image data into the features of the first sample image and the features of the second sample image.

The device according to claim 14, wherein the training module being configured to stitch the first sample image and the second sample image comprises stitching the first sample image and the second sample image along a batch dimension; and
the training module is further configured to adjust, before the first sample image and the second sample image are stitched, the first sample image and the second sample image so that they are identical in the three dimensions of channel, height and width.

The device according to claim 13, wherein the training module being configured to adjust the network parameter values of the neural network based on the detected human body keypoints, the annotated human body keypoints, the detected human body orientation and the annotated human body orientation comprises:
obtaining a first loss value of the neural network based on the detected human body keypoints and the annotated human body keypoints, the first loss value representing the difference between the detected human body keypoints and the annotated human body keypoints;
obtaining a second loss value of the neural network based on the detected human body orientation and the annotated human body orientation, the second loss value representing the difference between the detected human body orientation and the annotated human body orientation; and
adjusting the network parameter values of the neural network based on the first loss value and the second loss value.

An electronic device, comprising a processor and a memory configured to store a computer program executable by the processor, wherein
the processor is configured to execute the computer program to perform the method according to any one of claims 1 to 8.

A computer storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.