JP2022526347A

JP2022526347A - Image processing methods, equipment, electronic devices and computer program products

Info

Publication number: JP2022526347A
Application number: JP2021557462A
Authority: JP
Inventors: フェイワン; チェンチエン
Original assignee: シャンハイセンスタイムリンガンインテリジェントテクノロジーカンパニーリミテッド
Priority date: 2020-02-18
Filing date: 2020-12-14
Publication date: 2022-05-24
Anticipated expiration: 2040-12-14
Also published as: WO2021164395A1; JP7235892B2; KR20210140758A; CN111275002A

Abstract

Embodiments of the present invention provide an image processing method, an apparatus, and an electronic device. The image processing method includes: acquiring an image to be detected; determining, in the image, first detection frames representing the faces of target objects and second detection frames representing the bodies of target objects, where the number of first detection frames is M, the number of second detection frames is N, and M and N are both non-negative integers; determining, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy a matching relationship, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; and determining the number of target objects in the image based on M, N, and K. [Selected drawing] Fig. 1

Description

[Cross-reference to related applications]
This application is based on and claims priority to Chinese Patent Application No. 202010098809.8, filed with the Chinese Patent Office on February 18, 2020, the entire contents of which are incorporated herein by reference.
[Technical field]
The present invention relates to image analysis technology, and specifically to an image processing method, an apparatus, and an electronic device.

Currently, the number of people in a vehicle can be counted by detecting faces. However, when a person in the vehicle is occluded by a seat, or when the rotation angle of the face is too large, detections are missed and the accuracy of the count decreases.

Embodiments of the present invention provide an image processing method, an apparatus, an electronic device, and a computer program product.

An embodiment of the present invention provides an image processing method, the method including:
acquiring an image to be detected;
determining, in the image, first detection frames representing the faces of target objects and second detection frames representing the bodies of target objects, where the number of first detection frames is M, the number of second detection frames is N, and M and N are both non-negative integers;
determining, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy a matching relationship, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; and
determining the number of target objects based on M, N, and K.

In some exemplary embodiments of the present invention, determining, among the M first detection frames and the N second detection frames, the K pairs of first and second detection frames that satisfy the matching relationship includes:
traversing the M first detection frames to determine the IoU (Intersection over Union) between each first detection frame and each second detection frame; and
determining, based on the IoU between each first detection frame and each second detection frame, the first and second detection frames that satisfy the matching relationship.

In some embodiments of the present invention, determining the first and second detection frames that satisfy the matching relationship based on the IoU between each first detection frame and each second detection frame includes:
determining the maximum IoU among the IoUs between each first detection frame and each second detection frame;
determining whether the maximum IoU is greater than a preset threshold; and
in response to the maximum IoU being greater than the preset threshold, determining that the first detection frame and the second detection frame corresponding to the maximum IoU satisfy the matching relationship.

In some embodiments of the present invention, determining the number of target objects in the image based on M, N, and K includes determining that the number of target objects is K + (M - K) + (N - K).

In some embodiments of the present invention, the image processing method further includes:
acquiring the body key points of each target object in the image;
determining a position classification category corresponding to each body key point, where the position classification category indicates that the body key point is located in one of a plurality of specific regions in the image; and
determining the region in which each target object is located based on the position classification category corresponding to each body key point.

In some embodiments of the present invention,
when each specific region is a seat in a cabin, the position classification category corresponding to a body key point is the seat corresponding to that body key point, and determining the region in which each target object is located based on the position classification category corresponding to each body key point includes determining the seat in which a target object is located based on the seats corresponding to the body key points of that target object; and
the image processing method further includes determining the state of each seat in the cabin according to the seat in which each target object in the cabin is located.
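
The step of deriving per-seat states from per-target seats can be sketched as follows. This is only an illustration: the text does not enumerate the possible seat states, so the two labels "occupied" and "empty", the seat names, and the function name `seat_states` are all assumptions made for the example.

```python
from typing import Dict, Iterable


def seat_states(all_seats: Iterable[str], target_seats: Iterable[str]) -> Dict[str, str]:
    """Map every seat in the cabin to a state.

    Assumption: a seat is "occupied" if at least one target object was
    assigned to it, and "empty" otherwise. The source does not list the
    actual states, so these labels are hypothetical.
    """
    occupied = set(target_seats)
    return {seat: ("occupied" if seat in occupied else "empty") for seat in all_seats}


# Hypothetical seat names for a small cabin.
cabin = ["driver", "front_passenger", "rear_left"]
# Seats assigned to the detected target objects (one entry per target).
assigned = ["driver", "rear_left"]
states = seat_states(cabin, assigned)
```

Using a set for the assigned seats keeps the lookup independent of how many target objects share the same seat label.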

In some embodiments of the present invention, determining the seat in which a target object is located based on the seats corresponding to the body key points of that target object includes:
counting, among the plurality of body key points of the target object, the number of body key points corresponding to each seat; and
determining that the seat with the largest number of body key points is the seat of the target object.
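
The counting rule above amounts to a majority vote over the seat labels of one target object's body key points. A minimal sketch, assuming each key point has already been classified to a seat label (the label strings and the function name `seat_of_target` are hypothetical):

```python
from collections import Counter
from typing import Hashable, Sequence


def seat_of_target(keypoint_seats: Sequence[Hashable]) -> Hashable:
    """Return the seat label that the most body key points fall into.

    keypoint_seats holds one seat label per body key point of a single
    target object. How ties are broken is not specified in the source;
    here the label encountered first wins, which is an assumption.
    """
    if not keypoint_seats:
        raise ValueError("at least one classified key point is required")
    counts = Counter(keypoint_seats)
    seat, _ = counts.most_common(1)[0]
    return seat


# Three key points fall in the driver seat, one spills into the rear-left seat.
labels = ["driver", "driver", "rear_left", "driver"]
chosen = seat_of_target(labels)  # "driver"
```

Voting over all key points makes the assignment tolerant of a few key points (an outstretched arm, for example) landing in a neighbouring seat region.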

An embodiment of the present invention further provides an image processing apparatus, the image processing apparatus including an acquisition unit, a first determination unit, a second determination unit, and a matching unit, where:
the acquisition unit is configured to acquire an image to be detected;
the first determination unit is configured to determine, in the image, first detection frames representing the faces of target objects, the number of first detection frames being M;
the second determination unit is configured to determine, in the image, second detection frames representing the bodies of target objects, the number of second detection frames being N, where M and N are both non-negative integers; and
the matching unit is configured to determine, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy a matching relationship, and to determine the number of target objects in the image based on M, N, and K, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N.

In some embodiments of the present invention, the matching unit is configured to traverse the M first detection frames to determine the IoU (Intersection over Union) between each first detection frame and each second detection frame, and to determine, based on the IoU between each first detection frame and each second detection frame, the first and second detection frames that satisfy the matching relationship.

In some embodiments of the present invention, the matching unit is configured to determine the maximum IoU among the IoUs between each first detection frame and each second detection frame, determine whether the maximum IoU is greater than a preset threshold, and, in response to the maximum IoU being greater than the preset threshold, determine that the first detection frame and the second detection frame corresponding to the maximum IoU satisfy the matching relationship.

In some embodiments of the present invention, the matching unit is configured to determine that the number of target objects is K + (M - K) + (N - K).

In some embodiments of the present invention, the image processing apparatus further includes a classification unit and a third determination unit, where:
the second determination unit is further configured to acquire the body key points of each target object in the image;
the classification unit is configured to determine a position classification category corresponding to each body key point, where the position classification category indicates that the body key point is located in one of a plurality of specific regions in the image; and
the third determination unit is configured to determine the region in which each target object is located based on the position classification category corresponding to each body key point.

In some embodiments of the present invention, when each specific region is a seat in a cabin, the position classification category corresponding to a body key point is the seat corresponding to that body key point, and the third determination unit is configured to determine the seat in which a target object is located based on the seats corresponding to the body key points of that target object, and to determine the state of each seat in the cabin according to the seat in which each target object in the cabin is located.

In some embodiments of the present invention, the third determination unit is configured to count, among the plurality of body key points of one target object, the number of body key points corresponding to each seat, and to determine that the seat with the largest number of body key points is the seat of the target object.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image processing method described in the embodiments of the present invention.

An embodiment of the present invention further provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable by the processor, where the processor, when executing the program, implements the steps of the image processing method described in the embodiments of the present invention.

An embodiment of the present invention further provides a computer program product including computer-readable code which, when executed by an electronic device, causes a processor of the electronic device to perform the image processing method described in the embodiments of the present invention.

Embodiments of the present invention provide an image processing method, an apparatus, an electronic device, and a computer program product. The image processing method includes: acquiring an image to be detected; determining, in the image, first detection frames representing the faces of target objects and second detection frames representing the bodies of target objects, where the number of first detection frames is M, the number of second detection frames is N, and M and N are both non-negative integers; determining, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy a matching relationship, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; and determining the number of target objects in the image based on M, N, and K. According to the technical solution of the embodiments of the present invention, the number of faces in the image is detected by face detection, the number of bodies in the image is detected by body detection, and the number of people in the image is determined by matching faces to bodies. This addresses the problem of missed detections when a target object in the image is occluded or the rotation angle of its face is too large, and improves the accuracy of the people count for the image.

A first exemplary flowchart of the image processing method according to an embodiment of the present invention.
A second exemplary flowchart of the image processing method according to an embodiment of the present invention.
A schematic diagram of the network structure of the image processing method according to an embodiment of the present invention.
A first schematic structural diagram of the image processing apparatus according to an embodiment of the present invention.
A second schematic structural diagram of the image processing apparatus according to an embodiment of the present invention.
A schematic structural diagram of the electronic device according to an embodiment of the present invention.

Hereinafter, the present invention is described in more detail with reference to the drawings and specific embodiments.

An embodiment of the present invention provides an image processing method. FIG. 1 is a first exemplary flowchart of the image processing method according to an embodiment of the present invention. As shown in FIG. 1, the image processing method includes the following steps.

In step 101, an image to be detected is acquired.

In step 102, first detection frames representing the faces of target objects and second detection frames representing the bodies of target objects are determined in the image, where the number of first detection frames is M, the number of second detection frames is N, and M and N are both non-negative integers.

In step 103, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy a matching relationship are determined, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N.

In step 104, the number of target objects in the image is determined based on M, N, and K.

In this embodiment, the image processing method is applied to an image processing apparatus. The image processing apparatus may be deployed in a mobile terminal such as a mobile phone, a tablet computer, or a notebook computer, or in an electronic device such as a desktop computer, an all-in-one computer, or a server.

In this embodiment, the image to be detected (hereinafter referred to as the image) contains target objects. A target object may be a real person; in other embodiments, a target object may be a virtual person such as an animated character. Of course, a target object may also be another type of object, which is not limited in this embodiment.

In some exemplary embodiments, the target object is a target object in the interior environment of a vehicle. For example, if the vehicle is a five-seater carrying three people and a picture of the vehicle interior is taken from the front of the vehicle, the captured image may include part of the interior environment and the three people sitting in the seats. In this case, the captured image can be used as the image in this embodiment, and the three people in it can be treated as the target objects in this embodiment.

In some exemplary embodiments of the present invention, determining, in the image, the first detection frames representing the faces of target objects and the second detection frames representing the target objects includes: performing feature extraction on the image via a first network and determining, based on the extracted features, the first detection frames representing the faces of target objects in the image; and performing feature extraction on the image via a second network and determining, based on the extracted features, the second detection frames representing the target objects in the image.

In this embodiment, the M first detection frames in the image can be determined by detecting the faces in the image via the first network. The first network may adopt any network structure capable of detecting faces, which is not limited in this embodiment.

In this embodiment, the target objects in the image can be detected via the second network; for example, the bodies in the image are detected to determine the N second detection frames in the image. The second network may adopt any network structure capable of detecting target objects (for example, a human body detection network), which is not limited in this embodiment.

In some exemplary embodiments, determining a second detection frame representing the body of a target object in the image includes: performing feature extraction on the image via the second network and determining the key points of the target object based on the extracted features (that is, determining the position information of the key points); and determining the second detection frame representing the target object based on the determined key points. The position information of a key point can be expressed as the coordinates of the key point. All key points belonging to the same target object can be determined, and the second detection frame of that target object is determined from the position information of all of them, so that the region of the second detection frame contains all key points of the target object and is the smallest region that does so. As an example, the second detection frame may be a rectangular frame.
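
For the rectangular case, the smallest frame containing all key points of one target object can be computed from the coordinate extremes. The sketch below assumes axis-aligned frames stored as (x_min, y_min, x_max, y_max) tuples in pixel coordinates; that representation and the function name `enclosing_box` are illustrative choices, not taken from the text.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def enclosing_box(keypoints: Iterable[Point]) -> Box:
    """Smallest axis-aligned rectangle containing every key point."""
    pts = list(keypoints)
    if not pts:
        raise ValueError("at least one key point is required")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# Made-up key point coordinates for a single target object.
body_keypoints = [(120.0, 80.0), (150.0, 60.0), (135.0, 200.0), (100.0, 140.0)]
box = enclosing_box(body_keypoints)  # (100.0, 60.0, 150.0, 200.0)
```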

Here, the key points of the target object may include skeletal key points and/or contour key points. The contour key points represent the contour edge of the target object; the contour edge can be formed from the position information of the contour key points. The skeletal key points represent key points of the skeleton of the target object; the main skeleton of the target object can be formed from the position information of the skeletal key points. The contour key points may include at least one of arm, hand, shoulder, leg, foot, waist, head, hip, and chest contour key points. The skeletal key points may include at least one of arm, hand, shoulder, leg, foot, waist, head, hip, and chest skeletal key points.

In some exemplary embodiments, determining a second detection frame representing the body of a target object in the image includes: performing feature extraction on the image via the second network; determining, based on the extracted features, the center point of the target object and the length and width of the second detection frame corresponding to the target object; and determining the second detection frame of the body of the target object according to the center point, the length, and the width.
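
Converting a predicted center point and frame size into corner coordinates is a small arithmetic step. The sketch below assumes an axis-aligned frame, treats the two predicted dimensions as a horizontal `width` and a vertical `height`, and returns (x_min, y_min, x_max, y_max); these conventions and the function name `box_from_center` are assumptions for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def box_from_center(cx: float, cy: float, width: float, height: float) -> Box:
    """Build an axis-aligned frame from its center point and its two side lengths."""
    if width < 0 or height < 0:
        raise ValueError("width and height must be non-negative")
    half_w = width / 2.0
    half_h = height / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# A frame centered at (100, 50) that is 40 wide and 20 tall.
frame = box_from_center(100.0, 50.0, 40.0, 20.0)  # (80.0, 40.0, 120.0, 60.0)
```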

In some exemplary embodiments of the present invention, determining, among the M first detection frames and the N second detection frames, the K pairs of first and second detection frames that satisfy the matching relationship includes: traversing the M first detection frames to determine the IoU (Intersection over Union) between each first detection frame and each second detection frame; and determining, based on the IoU between each first detection frame and each second detection frame, the first and second detection frames that satisfy the matching relationship.

In this embodiment, for each first detection frame, the IoU between that first detection frame and each second detection frame is determined. The IoU is the ratio of the intersection to the union of the region occupied by the first detection frame and the region occupied by the second detection frame. The IoU thus expresses the degree of association between the corresponding first and second detection frames, that is, between the corresponding face and target object. The larger the IoU, the higher the degree of association between the first and second detection frames, and hence between the face and the target object; the smaller the IoU, the lower that degree of association. In this embodiment, the first and second detection frames that satisfy the matching relationship can be determined based on the IoU between each first detection frame and each second detection frame; a first detection frame and a second detection frame that satisfy the matching relationship belong to the same target object.
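
As a concrete illustration of the ratio described above, the following sketch computes the IoU of two axis-aligned rectangular frames. It assumes frames are (x_min, y_min, x_max, y_max) tuples; the function name `iou` and the sample coordinates are made up for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned frames."""
    # Overlap extents are clamped at zero so disjoint frames give no intersection.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


face = (0.0, 0.0, 2.0, 2.0)  # a face frame lying inside the body frame
body = (0.0, 0.0, 2.0, 4.0)
overlap = iou(face, body)    # 4 / 8 = 0.5
```

When a face frame lies entirely inside a body frame, the IoU reduces to the ratio of the two areas, so a true face and body pair can still score well below 1; a preset threshold would have to be chosen with that in mind.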

In some exemplary embodiments of the present invention, determining the first and second detection frames that satisfy the matching relationship based on the IoU between each first detection frame and each second detection frame includes: determining the maximum IoU among the IoUs between each first detection frame and each second detection frame; determining whether the maximum IoU is greater than a preset threshold; and, in response to the maximum IoU being greater than the preset threshold, determining that the first detection frame and the second detection frame corresponding to the maximum IoU satisfy the matching relationship.

In this embodiment, for the IoUs between a first detection frame and each second detection frame, the second detection frame corresponding to the maximum IoU is determined. If the maximum IoU is greater than the preset threshold, the first detection frame and the second detection frame corresponding to the maximum IoU can be determined to satisfy the matching relationship, that is, to belong to the same target object.
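
The per-face matching rule can be sketched as a loop over the first detection frames. The sketch assumes frames are (x_min, y_min, x_max, y_max) tuples, and it additionally assumes that a second detection frame may be matched at most once so that K cannot exceed N; the text does not spell out how repeated matches or ties are handled. The names `iou` and `match_faces_to_bodies` and the threshold value are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned frames."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_faces_to_bodies(faces: List[Box], bodies: List[Box],
                          threshold: float) -> List[Tuple[int, int]]:
    """Return (face_index, body_index) pairs that satisfy the matching rule."""
    matches: List[Tuple[int, int]] = []
    used: Set[int] = set()  # assumption: each body frame is matched at most once
    for fi, face in enumerate(faces):
        best_iou, best_bi = 0.0, None  # type: float, Optional[int]
        for bi, body in enumerate(bodies):
            if bi in used:
                continue
            score = iou(face, body)
            if score > best_iou:
                best_iou, best_bi = score, bi
        # Only the maximum-IoU candidate is tested against the preset threshold.
        if best_bi is not None and best_iou > threshold:
            matches.append((fi, best_bi))
            used.add(best_bi)
    return matches


faces = [(10.0, 10.0, 30.0, 30.0), (200.0, 10.0, 220.0, 30.0)]
bodies = [(0.0, 0.0, 40.0, 100.0), (300.0, 0.0, 340.0, 100.0)]
pairs = match_faces_to_bodies(faces, bodies, threshold=0.05)  # [(0, 0)]
```

In this example the second face overlaps no body frame and the second body frame contains no detected face, so both remain unmatched and K is 1.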

In step 104 of this embodiment, the number of target objects may be determined as K + (M - K) + (N - K).

In this embodiment, when K pairs of first and second detection frames satisfy the matching relationship, the number of first detection frames that do not satisfy the matching relationship is M - K, and the number of second detection frames that do not satisfy the matching relationship is N - K. These M - K first detection frames and N - K second detection frames may arise because a target object is occluded or its rotation angle is too large. In this embodiment, the first and second detection frames arising from these causes are still included in the total count.
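
The count combines matched pairs, unmatched faces, and unmatched bodies, and the expression simplifies algebraically to M + N - K. A minimal sketch of the arithmetic (the function name `count_targets` and the sample numbers are illustrative):

```python
def count_targets(m: int, n: int, k: int) -> int:
    """Number of target objects from M face frames, N body frames, and K matched pairs."""
    if min(m, n, k) < 0:
        raise ValueError("M, N and K must be non-negative")
    if k > min(m, n):
        raise ValueError("K cannot exceed M or N")
    matched = k               # people seen with both face and body
    faces_only = m - k        # faces with no matching body frame
    bodies_only = n - k       # bodies with no matching face frame
    return matched + faces_only + bodies_only  # equals m + n - k


# 3 faces, 4 bodies, 2 matched pairs: 2 + 1 + 2 = 5 people.
total = count_targets(3, 4, 2)
```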

According to the technical solution of the embodiments of the present invention, the number of faces in the image is detected by face detection, the number of bodies in the image is detected by body detection, and the number of people in the image is determined by matching faces with bodies. This solves the problem of missed detections that occur when a target object in the image is occluded or the rotation angle of its face is too large, and improves the accuracy of counting the people in the image.

Based on the above embodiments, the embodiments of the present invention further provide an image processing method. FIG. 2 is a second exemplary flowchart of the image processing method according to an embodiment of the present invention. As shown in FIG. 2, the image processing method includes the following steps.

In step 201, an image to be detected is acquired.

In step 202, the body keypoints of each target object in the image are acquired.

In step 203, the position classification category corresponding to each body keypoint is determined, where the position classification category indicates that the body keypoint is located in one specific region among a plurality of specific regions in the image.

In step 204, the region in which each target object is located is determined based on the position classification category corresponding to each body keypoint.

In some exemplary embodiments of the present invention, acquiring the body keypoints of each target object in the image includes: performing feature extraction on the image via the second network and, based on the extracted features, determining the second detection frames representing the bodies of the target objects in the image and the body keypoints of each target object.

In this embodiment, the body keypoints of each target object in the image can be acquired via the second network of the foregoing embodiment. It can be understood that feature extraction is performed on the image via the second network, and the extracted features can be used both to acquire the body keypoints of each target object and to determine the second detection frames representing the bodies of the target objects. That is, when the image is input to the second network, the network outputs the body keypoints of each target object and, at the same time, the second detection frame of each target object's body; alternatively, the network extracts the position information of each target object's body keypoints and, at the same time, the position information of the second detection frame of each target object's body.

In some exemplary embodiments, feature extraction is performed on the image via the second network, and the body keypoints are acquired based on the extracted features. The extracted features are also used to determine the center point of each target object and the length and width of the corresponding second detection frame, and the second detection frame of each target object's body is determined from that center point, length, and width. Then, from the region in which each second detection frame is located and the positions of the body keypoints, the body keypoints lying within each second detection frame's region are determined and taken as the body keypoints of the target object corresponding to that second detection frame.
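The grouping of keypoints by detection frame can be sketched as below. The function names, the `(x, y)` keypoint format, and the reading of "length and width" as vertical and horizontal extents are assumptions made for this illustration. The source does not say how a keypoint inside two overlapping frames is handled; this sketch simply assigns it to every frame that contains it.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # hypothetical (x, y) keypoint
Box = Tuple[float, float, float, float]   # hypothetical (x1, y1, x2, y2)


def box_from_center(cx: float, cy: float, width: float, height: float) -> Box:
    """Build a body detection frame from its center point and its size."""
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


def assign_keypoints(keypoints: List[Point], boxes: List[Box]) -> List[List[Point]]:
    """Return, for each box, the keypoints that lie inside it (edges included)."""
    grouped: List[List[Point]] = [[] for _ in boxes]
    for x, y in keypoints:
        for i, (x1, y1, x2, y2) in enumerate(boxes):
            if x1 <= x <= x2 and y1 <= y <= y2:
                grouped[i].append((x, y))
    return grouped
```

A keypoint that falls outside every detection frame is left unassigned, which matches the text's description of keeping only the keypoints within each frame's region.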

In some exemplary embodiments of the present invention, determining the position classification category corresponding to the body keypoint includes determining that category via a third network, where the third network is obtained by training on sample images that include position information of body keypoints and annotation information of the specific regions.

In this embodiment, the position classification category corresponding to each body keypoint can be determined via the third network. It can be understood that the third network may be any classification network, and that the determined position classification category indicates that the keypoint is located in one specific region among the plurality of specific regions in the image.

In this embodiment, the image may include one or more specific regions, and the specific regions are related to the classification task of the third network.

As an example, when the target objects are located in the interior of a vehicle, the classification task of the third network is to determine whether the body keypoints of each target object lie in a seat region of the vehicle, in which case the specific regions may be the seats in the vehicle. For example, when the vehicle is a five-seater, the third network can determine in which seat region each body keypoint is located, and the state of each seat can be determined accordingly.

In some exemplary embodiments of the present invention, when the specific regions are the seats in a cabin, the position classification category corresponding to a body keypoint is the seat corresponding to that body keypoint. Determining the region in which each target object is located, based on the position classification category corresponding to each body keypoint, includes determining the seat in which a target object is located based on the seats corresponding to that target object's body keypoints. The image processing method further includes determining the state of each seat in the cabin according to the seat in which each target object in the cabin is located.

Here, the state of a seat may be an idle state or a non-idle state. The idle state indicates that there is no target object in the corresponding seat, that is, the seat is unoccupied; correspondingly, the non-idle state indicates that there is a target object in the corresponding seat, that is, the seat is occupied.

In this embodiment, the specific region corresponding to the body keypoints of each target object (that is, the seat corresponding to the body keypoints of each target object) is determined based on the position classification categories of those body keypoints. When the body keypoints of a target object correspond to a seat, that seat is in the non-idle state, that is, the seat is occupied. When a seat does not correspond to the body keypoints of any target object, that seat is in the idle state, that is, the seat is unoccupied.
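The seat-state rule can be illustrated with a short sketch. It assumes that seats are identified by integer labels `0 .. num_seats - 1` and that the seat of each target object has already been determined; the function name and the string values for the two states are hypothetical.

```python
from typing import Iterable, List


def seat_states(num_seats: int, target_seats: Iterable[int]) -> List[str]:
    """Return "non-idle" for every seat holding a target object, else "idle"."""
    occupied = set(target_seats)
    return ["non-idle" if seat in occupied else "idle" for seat in range(num_seats)]
```

With five seats and target objects located in seats 0, 1 and 2, the first three seats are reported as non-idle and the remaining two as idle.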

In some exemplary embodiments of the present invention, determining the seat in which a target object is located, based on the seats corresponding to that target object's body keypoints, includes: counting, among the plurality of body keypoints of the target object, the number of body keypoints corresponding to each seat; and determining that the seat with the largest number of body keypoints is the seat of the target object.

In practice, a target object may have multiple body keypoints, and the target object does not necessarily lie entirely within one specific region (that is, one seat). Therefore, in some exemplary embodiments, the position classification categories of all body keypoints belonging to the same target object are determined, and the number of body keypoints in each position classification category is counted (that is, the number of body keypoints corresponding to each seat is counted). The largest count is then identified, and the specific region of the position classification category with the largest count is used as the region corresponding to the target object (that is, the seat with the largest number of body keypoints is used as the seat in which the target object is located). Correspondingly, the state of that seat is determined to be the non-idle state, that is, the seat is occupied.
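The majority vote over keypoint categories can be sketched with `collections.Counter`. The function name is hypothetical, and the handling of ties and of an empty keypoint list is an assumption of this sketch, since the source does not address either case.

```python
from collections import Counter
from typing import Optional, Sequence


def seat_of_target(keypoint_seats: Sequence[int]) -> Optional[int]:
    """Return the seat label that most of a target object's keypoints fall into.

    keypoint_seats holds one seat label per body keypoint of a single
    target object. Returns None if the target object has no keypoints.
    On a tie, the label encountered first wins (an assumption).
    """
    if not keypoint_seats:
        return None
    seat, _count = Counter(keypoint_seats).most_common(1)[0]
    return seat
```

For a person leaning toward a neighboring seat, keypoint labels such as `[2, 2, 3, 2, 3]` still resolve to seat 2, because three of the five keypoints fall into that seat region.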

Compared with the conventional technique, in which whether a seat is occupied is determined by installing a sensor in the seat, the technical solution of this embodiment does not require a sensor in the seat, which reduces the detection cost. Moreover, when occupancy is judged with a pressure sensor, a seat with an object placed on it is regarded as occupied; the technical solution of this embodiment avoids such misjudgments and can thereby greatly improve detection accuracy.

The image processing method of the embodiments of the present invention is described below with reference to a specific example.

FIG. 3 is a schematic diagram of the network structure of the image processing method according to an embodiment of the present invention. As shown in FIG. 3, the image is input to a first network and a second network, where the first network may be a face detection network and the second network may be a body detection network.

In this embodiment, the image may be an image of the interior of a vehicle. For example, when the vehicle is a five-seater, the image may include five specific regions, namely the regions in which the five seats are located: the driver's seat region, the front passenger seat region, the rear left seat region, the rear middle seat region, and the rear right seat region. Following this order, the labels of the specific regions can be defined as 0, 1, 2, 3, and 4, respectively. Assume that there are people in the driver's seat region, the front passenger seat region, and the rear left seat region of the image.

In a first aspect, feature extraction is performed on the image via the first network, and the face detection frames in the image (that is, the first detection frames of the above embodiments) are acquired based on the extracted features; feature extraction is performed on the image via the second network, and the body detection frames in the image (that is, the second detection frames of the above embodiments) are acquired based on the extracted features. Assume that M face detection frames and N body detection frames are extracted. Because a target object may be occluded or its rotation angle may be too large, M is less than or equal to 3 and N is less than or equal to 3; that is, fewer than three face detection frames and/or fewer than three body detection frames may be detected. The IoU between each face detection frame and each body detection frame is calculated. For each face detection frame, the body detection frame with the maximum IoU is determined, and it is judged whether the maximum IoU is greater than the preset threshold. If so, the face detection frame can be judged to match the body detection frame with the maximum IoU; that is, the two frames can be judged to belong to the same person. The number K of body detection frames matched with face detection frames is then determined. The number of people in the vehicle is determined from M, N, and K: the result of K + (M - K) + (N - K) is taken as the number of people in the vehicle.

In a second aspect, feature extraction is performed on the image via the second network, and the body detection frames in the image are acquired based on the extracted features. Body keypoint information in the image can also be acquired, and this information may include the coordinates of each body keypoint.

As an example, feature extraction is performed on the image via the second network, and the body keypoints are acquired based on the extracted features. The extracted features are also used to determine the center point of each target object and the length and width of the corresponding body detection frame, and the body detection frame of each target object is determined from that center point, length, and width. Then, from the region in which each body detection frame is located and the positions of the body keypoints, the body keypoints lying within each body detection frame's region are determined and taken as the body keypoints of the target object corresponding to that body detection frame.

As an example, assume that the image is represented as 3×I×I, where 3 is the number of channels and I×I is the size of the image. In this example the image may be an RGB color image, in which case one of the three channels carries the red data, one carries the green data, and one carries the blue data. Feature extraction is performed on the image via the second network to obtain a C×F×F feature map, where C is the number of channels and F×F is the size of the feature map. Next, a convolution layer of a specific size (for example, a 1×1 convolution layer) is applied to the feature map to obtain an H×F×F feature map, where H is the number of channels. Each channel determines one keypoint, so H keypoints can be obtained. Specifically, the Gaussian peak in each channel of the H×F×F feature map is identified, and the coordinates of the apex of that peak are used as the coordinates of the keypoint; in this way, the H keypoints are determined.
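Reading keypoints out of the H×F×F feature map can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: the feature map is given as nested lists, each channel contains a single peak, and the peak is taken to be the location of the maximum value. The function name and the `(row, column)` output convention are hypothetical, and the coordinates are on the F×F grid; mapping them back to the I×I image (for example by scaling with I/F) is not covered by the source.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # one F x F channel as nested lists


def heatmap_peaks(heatmaps: List[Heatmap]) -> List[Tuple[int, int]]:
    """Return the (row, column) of the maximum value in each of the H channels."""
    peaks: List[Tuple[int, int]] = []
    for channel in heatmaps:
        best_value, best_pos = float("-inf"), (0, 0)
        for r, row in enumerate(channel):
            for c, value in enumerate(row):
                if value > best_value:  # the first maximum wins on ties
                    best_value, best_pos = value, (r, c)
        peaks.append(best_pos)
    return peaks
```

With several people in the image, each channel would contain several peaks, and a local-maximum search would be needed instead of a single global maximum; that extension is outside this sketch.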

Further, the acquired body keypoints are input to the third network to determine the position classification category of each body keypoint. Taking the five specific regions above as an example, in this embodiment the third network determines the label of the specific region corresponding to each keypoint.

The embodiments of the present invention further provide an image processing apparatus. FIG. 4 is a first schematic structural diagram of the image processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the image processing apparatus includes an acquisition unit 31, a first determination unit 32, a second determination unit 33, and a matching unit 34.

The acquisition unit 31 is configured to acquire an image to be detected.

The first determination unit 32 is configured to determine the first detection frames representing the faces of target objects in the image, where the number of first detection frames is M.

The second determination unit 33 is configured to determine the second detection frames representing the bodies of target objects in the image, where the number of second detection frames is N, and M and N are both non-negative integers.

The matching unit 34 is configured to determine, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy the matching relationship, and to determine the number of target objects in the image based on M, N, and K, where K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N.

In some exemplary embodiments of the present invention, the matching unit 34 is configured to traverse the M first detection frames to determine the IoU (Intersection over Union) between each first detection frame and each second detection frame, and to determine the first and second detection frames that satisfy the matching relationship based on those IoUs.

In some exemplary embodiments of the present invention, the matching unit 34 is configured to determine the maximum IoU among the IoUs between each first detection frame and each second detection frame, to judge whether the maximum IoU is greater than a preset threshold, and, in response to the maximum IoU being greater than the preset threshold, to determine that the first detection frame and the second detection frame corresponding to the maximum IoU satisfy the matching relationship.

In some exemplary embodiments of the present invention, the matching unit 34 is configured to determine that the number of target objects is K + (M - K) + (N - K).

In some exemplary embodiments of the present invention, the first determination unit 32 is configured to perform feature extraction on the image via the first network and, based on the extracted features, to determine the first detection frames representing the faces of target objects in the image.

The second determination unit 33 is configured to perform feature extraction on the image via the second network and, based on the extracted features, to determine the second detection frames representing the bodies of target objects in the image.

In some exemplary embodiments of the present invention, as shown in FIG. 5, the image processing apparatus further includes a classification unit 35 and a third determination unit 36, where:
the second determination unit 33 is further configured to acquire the body keypoints of each target object in the image;
the classification unit 35 is configured to determine the position classification category corresponding to each body keypoint, the position classification category indicating that the body keypoint is located in one specific region among a plurality of specific regions in the image; and
the third determination unit 36 is configured to determine the region in which each target object is located, based on the position classification category corresponding to each body keypoint.

In some exemplary embodiments of the present invention, the second determination unit 33 is configured to perform feature extraction on the image via the second network and, based on the extracted features, to determine the second detection frames representing the bodies of target objects in the image and the body keypoints of each target object.

In some exemplary embodiments of the present invention, the classification unit 35 is configured to determine the position classification category corresponding to the keypoint via the third network, where the third network is obtained by training on sample images that include position information of body keypoints and annotation information of the specific regions.

In some exemplary embodiments of the present invention, when the specific regions are the seats in a cabin, the position classification category corresponding to a body keypoint is the seat corresponding to that body keypoint. The third determination unit 36 is configured to determine the seat in which a target object is located based on the seats corresponding to that target object's body keypoints, and to determine the state of each seat in the cabin according to the seat in which each target object in the cabin is located.

In some exemplary embodiments of the present invention, the third determination unit 36 is configured to count, among the plurality of body keypoints of a target object, the number of body keypoints corresponding to each seat, and to determine that the seat with the largest number of body keypoints is the seat of the target object.

In the embodiments of the present invention, the acquisition unit 31, the first determination unit 32, the second determination unit 33, the matching unit 34, the classification unit 35, and the third determination unit 36 of the image processing apparatus can all, in actual application, be implemented by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA) of the apparatus.

It should be noted that the image processing performed by the image processing apparatus of the above embodiment is described using only the above division into program modules as an example. In actual application, the processing may be assigned to different program modules as needed; that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. Furthermore, the image processing apparatus embodiments provided above belong to the same concept as the image processing method embodiments; for the specific implementation process, refer to the method embodiments, which is not repeated here.

The embodiments of the present invention further provide an electronic device. FIG. 6 is a schematic structural diagram of the electronic device according to an embodiment of the present invention. As shown in FIG. 6, the electronic device 40 includes a memory 42, a processor 41, and a computer program that is stored in the memory 42 and executable by the processor 41. When the processor 41 executes the program, the steps of the image processing method described in the embodiments of the present invention are implemented.

As an example, the components of the electronic device 40 can be coupled via a bus system 43. It can be understood that the bus system 43 is used to implement connection and communication between these components. In addition to a data bus, the bus system 43 further includes a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all labeled as the bus system 43 in FIG. 6.

It should be understood that the memory 42 may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic memory, a compact disc, or a compact disc read-only memory (CD-ROM); the magnetic memory may be a magnetic disk memory or a magnetic tape memory.
The volatile memory may be a random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM can be used, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and Direct Rambus random access memory (DRRAM). The memory 42 described in the embodiments of the present invention is intended to include, without being limited to, these and any other suitable types of memory.

The methods disclosed in the above embodiments of the present invention may be applied to the processor 41 or implemented by the processor 41. The processor 41 may be an integrated circuit chip with signal processing capability. During implementation, each step of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 41 or by instructions in the form of software. The processor 41 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 41 can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor, any conventional processor, or the like. The steps of the methods disclosed in connection with the embodiments of the present invention may be embodied directly as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, the storage medium is located in the memory 42, and the processor 41 reads the information in the memory 42 and completes the steps of the foregoing methods in combination with its hardware.

In an exemplary embodiment, the electronic device may be implemented by one or more application-specific integrated circuits (ASIC: Application Specific Integrated Circuit), digital signal processors (DSP), programmable logic devices (PLD), complex programmable logic devices (CPLD: Complex Programmable Logic Device), FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic elements, in order to perform the foregoing method.

In an exemplary embodiment, the embodiments of the present invention further provide a computer-readable storage medium, such as the memory 42, that includes computer program instructions; the computer program can be executed by the processor 41 of the electronic device 40 to complete the foregoing method. The computer storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or a CD-ROM; it may also be any of various devices that include one or any combination of the above memories, such as a mobile phone, a computer, a tablet computer, or a personal digital assistant.

The embodiments of the present invention provide a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the image processing method in the embodiments of the present invention are implemented.

The embodiments of the present invention further provide a computer program product including computer-readable code; when the computer-readable code is executed by an electronic device, it causes a processor of the electronic device to perform the image processing method in the embodiments of the present invention.

The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily, provided there is no conflict, to obtain new method embodiments.

The technical features disclosed in the several product embodiments provided in the present application may be combined arbitrarily, provided there is no conflict, to obtain new product embodiments.

The features disclosed in the several method or device embodiments provided in the present application may be combined arbitrarily, provided there is no conflict, to obtain new method embodiments or new device embodiments.

It should be understood that, in the several embodiments provided in the present application, the disclosed devices and methods may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other ways of dividing. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the technical solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may serve separately as one unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person skilled in the art will appreciate that all or some of the steps of the foregoing method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes various media capable of storing program code, such as a mobile storage device, a ROM, a RAM, a magnetic memory, or an optical disc.

Alternatively, if the above integrated unit of the present invention is implemented in the form of a software functional module and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solutions of the embodiments of the present invention, that is, the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present disclosure. The foregoing storage medium includes various media capable of storing program code, such as removable storage, a ROM, a RAM, a magnetic memory, or an optical disc.

The above are merely specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present invention shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

An image processing method, comprising:
acquiring an image to be detected;
determining, in the image, first detection frames each representing a face of a target object and second detection frames each representing a body of a target object, wherein the number of the first detection frames is M, the number of the second detection frames is N, and M and N are both non-negative integers;
determining, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy a matching relationship, wherein K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N; and
determining the number of target objects in the image based on M, N, and K.

The image processing method according to claim 1, wherein determining, among the M first detection frames and the N second detection frames, the K pairs of first and second detection frames that satisfy the matching relationship comprises:
traversing the M first detection frames to determine the intersection over union (IoU) between each first detection frame and each second detection frame; and
determining, based on the IoU between each first detection frame and each second detection frame, the first detection frames and second detection frames that satisfy the matching relationship.
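For illustration only, the IoU computation recited above can be sketched as follows. The claim does not fix a coordinate convention; this sketch assumes each detection frame is an axis-aligned box given as corner coordinates `(x1, y1, x2, y2)`, and the names `Box`, `iou`, and `iou_matrix` are hypothetical.

```python
from typing import List, Tuple

# Assumed box format (not specified in the source): (x1, y1, x2, y2) corners.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Degenerate (zero-area) boxes give IoU 0 rather than dividing by zero.
    return inter / union if union > 0 else 0.0


def iou_matrix(faces: List[Box], bodies: List[Box]) -> List[List[float]]:
    """Traverse the M face frames; row i holds the IoU of face i with each body frame."""
    return [[iou(f, b) for b in bodies] for f in faces]
```

The result is an M x N table of IoU values, one row per first detection frame.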

The image processing method according to claim 2, wherein determining, based on the IoU between each first detection frame and each second detection frame, the first detection frames and second detection frames that satisfy the matching relationship comprises:
determining the maximum IoU among the IoUs between each first detection frame and each second detection frame;
determining whether the maximum IoU is greater than a preset threshold; and
in response to the maximum IoU being greater than the preset threshold, determining that the first detection frame and the second detection frame corresponding to the maximum IoU satisfy the matching relationship.
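One possible reading of this maximum-IoU test is sketched below. It takes a precomputed M x N table of IoU values (one row per first detection frame) and, for each row, keeps the largest entry only if it exceeds the threshold. Two points are assumptions of this sketch rather than statements of the claim: the maximum is taken per first detection frame, and each second detection frame is matched at most once so that K cannot exceed N. The name `match_frames` is hypothetical.

```python
from typing import List, Tuple


def match_frames(iou_rows: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (face_index, body_index) pairs whose maximum IoU exceeds the threshold."""
    matches: List[Tuple[int, int]] = []
    used_bodies = set()  # assumption: a body frame is paired with at most one face
    for i, row in enumerate(iou_rows):
        candidates = [(value, j) for j, value in enumerate(row) if j not in used_bodies]
        if not candidates:
            continue  # no unmatched body frames remain
        best_iou, best_j = max(candidates, key=lambda c: c[0])
        if best_iou > threshold:  # strictly greater, as the claim recites
            matches.append((i, best_j))
            used_bodies.add(best_j)
    return matches
```

The number of returned pairs plays the role of K; because a pair consumes one face row and one body column, K never exceeds M or N under these assumptions.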

The image processing method according to any one of claims 1 to 3, wherein determining the number of target objects in the image based on M, N, and K comprises:
determining that the number of target objects is K + (M - K) + (N - K).
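The counting formula can be read as the sum of three disjoint groups: K target objects with a matched face and body, M - K with only a face frame, and N - K with only a body frame, which simplifies to M + N - K. A minimal sketch follows; the function name `count_targets` and the input validation are illustrative additions, not part of the source.

```python
def count_targets(m: int, n: int, k: int) -> int:
    """Number of target objects given M face frames, N body frames, and K matched pairs."""
    if m < 0 or n < 0 or not (0 <= k <= min(m, n)):
        raise ValueError("require M, N >= 0 and 0 <= K <= min(M, N)")
    matched = k        # face and body both detected
    face_only = m - k  # face detected, body not matched
    body_only = n - k  # body detected, face not matched
    return matched + face_only + body_only  # equals m + n - k
```

For example, 3 face frames and 4 body frames with 2 matched pairs give 2 + 1 + 2 = 5 target objects.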

The image processing method according to any one of claims 1 to 4, further comprising:
acquiring body key points of each target object in the image;
determining a position classification category corresponding to each body key point, wherein the position classification category indicates that the body key point is located within one specific region among a plurality of specific regions in the image; and
determining, based on the position classification category corresponding to each body key point, the region in which each target object is located.

The image processing method according to claim 5, wherein, when each specific region is a seat in a cabin, the position classification category corresponding to a body key point is the seat corresponding to that body key point;
determining, based on the position classification category corresponding to each body key point, the region in which each target object is located comprises:
determining the seat in which a target object is located based on the seats corresponding to the body key points of that target object; and
the image processing method further comprises:
determining the state of each seat in the cabin according to the seat in which each target object in the cabin is located.

The image processing method according to claim 6, wherein determining the seat in which a target object is located based on the seats corresponding to the body key points of that target object comprises:
counting, among the plurality of body key points of the target object, the number of body key points corresponding to each seat; and
determining that the seat corresponding to the largest number of body key points is the seat of the target object.
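The key-point vote described above, together with the seat-state step of the preceding claim, can be sketched as follows. The seat labels, the `"occupied"` and `"vacant"` state values, the tie-breaking behavior, and the function names are all assumptions made for illustration; the source specifies only that the seat with the most body key points is chosen.

```python
from collections import Counter
from typing import Dict, List, Optional


def seat_for_target(keypoint_seats: List[str]) -> Optional[str]:
    """Pick the seat that the largest number of one target's body key points fall in."""
    if not keypoint_seats:
        return None  # no key points, so no seat can be assigned
    # most_common(1) returns the seat with the highest count; on a tie the
    # seat seen first wins (an arbitrary choice of this sketch).
    return Counter(keypoint_seats).most_common(1)[0][0]


def seat_states(all_seats: List[str], targets: Dict[str, List[str]]) -> Dict[str, str]:
    """Derive a state per seat from the seat assigned to each target object."""
    states = {seat: "vacant" for seat in all_seats}
    for seats in targets.values():
        seat = seat_for_target(seats)
        if seat in states:
            states[seat] = "occupied"
    return states
```

Here `targets` maps a hypothetical target identifier to the list of seat labels produced by classifying each of that target's body key points.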

An image processing apparatus, comprising an acquisition unit, a first determination unit, a second determination unit, and a matching unit, wherein:
the acquisition unit is configured to acquire an image to be detected;
the first determination unit is configured to determine, in the image, first detection frames each representing a face of a target object, the number of the first detection frames being M;
the second determination unit is configured to determine, in the image, second detection frames each representing a body of a target object, the number of the second detection frames being N, and M and N both being non-negative integers; and
the matching unit is configured to determine, among the M first detection frames and the N second detection frames, K pairs of first and second detection frames that satisfy a matching relationship, and to determine the number of target objects in the image based on M, N, and K, wherein K is a non-negative integer, K is less than or equal to M, and K is less than or equal to N.

The image processing apparatus according to claim 8, wherein the matching unit is configured to traverse the M first detection frames to determine the intersection over union (IoU) between each first detection frame and each second detection frame, and to determine, based on the IoU between each first detection frame and each second detection frame, the first detection frames and second detection frames that satisfy the matching relationship.

The image processing apparatus according to claim 9, wherein the matching unit is configured to determine the maximum IoU among the IoUs between each first detection frame and each second detection frame, to determine whether the maximum IoU is greater than a preset threshold, and, in response to the maximum IoU being greater than the preset threshold, to determine that the first detection frame and the second detection frame corresponding to the maximum IoU satisfy the matching relationship.

The image processing apparatus according to any one of claims 8 to 10, wherein the matching unit is configured to determine that the number of target objects is K + (M - K) + (N - K).

The image processing apparatus according to any one of claims 8 to 11, further comprising a classification unit and a third determination unit, wherein:
the second determination unit is further configured to acquire body key points of each target object in the image;
the classification unit is configured to determine a position classification category corresponding to each body key point, the position classification category indicating that the body key point is located within one specific region among a plurality of specific regions in the image; and
the third determination unit is configured to determine, based on the position classification category corresponding to each body key point, the region in which each target object is located.

The image processing apparatus according to claim 12, wherein, when each specific region is a seat in a cabin, the position classification category corresponding to a body key point is the seat corresponding to that body key point; and
the third determination unit is configured to determine the seat in which a target object is located based on the seats corresponding to the body key points of that target object, and to determine the state of each seat in the cabin according to the seat in which each target object in the cabin is located.

The image processing apparatus according to claim 13, wherein the third determination unit is configured to count, among the plurality of body key points of one target object, the number of body key points corresponding to each seat, and to determine that the seat corresponding to the largest number of body key points is the seat of the target object.

A computer-readable storage medium on which a computer program is stored, wherein, when the program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.

An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.

A computer program product comprising computer-readable code, wherein, when the computer-readable code is executed on an electronic device, it causes a processor of the electronic device to perform the method according to any one of claims 1 to 7.