WO2024116444A1 - Image processing device and image processing program - Google Patents

Image processing device and image processing program

Info

Publication number
WO2024116444A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
area
person
image
depth value
Prior art date
Application number
PCT/JP2023/022307
Other languages
French (fr)
Japanese (ja)
Inventor
悦郎 籾山
Original Assignee
JVCKENWOOD Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JVCKENWOOD Corporation
Publication of WO2024116444A1 publication Critical patent/WO2024116444A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images

Definitions

  • the present invention relates to an image processing device and an image processing program that correct a distance image.
  • In recent years, RGB cameras equipped with ToF (Time of Flight) sensors have become popular.
  • With a camera equipped with a ToF sensor, depth data can be acquired from the ToF sensor, a color image can be acquired from the RGB camera, and 3D point cloud data can be generated by combining the two (see, for example, Patent Document 1).
  • the depth data obtained by the ToF sensor, which indicates the distance to the subject, uses reflected light from a laser diode (LD).
  • when the subject is a person, the LD light can be absorbed by black hair, or reflected by eyeglass lenses, by accessories made of glass or metal such as earrings and brooches, or by buttons. In such cases, accurate depth data may not be obtained. If 3D point cloud data is generated based on inaccurate depth data, missing or distorted data is likely to occur in areas of interest such as hair, glasses, or accessories.
  • This embodiment was made in consideration of these circumstances, and its purpose is to provide a technology for creating 3D point cloud data of a person with high accuracy.
  • an image processing device includes a detection unit that detects a person area and an attention area within the person area from a color image captured by a visible light imaging unit, and a correction unit that corrects, in a distance image generated based on the output of a distance measuring sensor unit, abnormal depth values contained in an attention area within the person area corresponding to the person area of the color image, using normal depth values within the person area of the distance image.
  • FIG. 1 is a functional block diagram showing an example of the configuration of an imaging device and an image processing device according to an embodiment.
  • FIG. 2 is a diagram illustrating an example of a color image and a distance image.
  • FIG. 3 is a flowchart showing the flow of distance image correction processing by the image processing device according to the embodiment.
  • FIG. 4 is a diagram showing a specific image of a process for extracting depth values of a person region in a distance image.
  • FIG. 5 is a diagram showing a specific image of a process for detecting a face region and an eyeglasses region from a color image.
  • FIG. 6 is a diagram showing a specific example of an interpolation process using adjacent pixels.
  • FIGS. 7(a) and 7(b) are diagrams showing a specific example of spline interpolation processing.
  • FIG. 8 is a diagram illustrating a specific example of an interpolation process from pixels in the vertical direction.
  • FIG. 9 is a diagram showing a distance image of a person area after eyeglass reflections and missing hair have been corrected.
  • FIG. 10 is a functional block diagram showing a configuration example of an imaging device according to a modified example.
  • the imaging device 1 is a device equipped with a visible light camera with a distance measuring sensor, and corresponds to a video camera, a surveillance camera, an in-vehicle camera, a camera mounted on a multicopter (drone), etc.
  • the image processing device 2 is a general-purpose information processing device (e.g., a PC, a server, a tablet, a smartphone) having an image processing function, or a dedicated device having an image processing function.
  • the imaging device 1 and the image processing device 2 are connected via a network such as the Internet or a dedicated line.
  • the imaging device 1 comprises a visible light imaging section 11, a distance measurement sensor section 12, and a processing section 13.
  • the visible light imaging section 11 includes a lens, a color filter of the three primary colors (RGB), and a solid-state imaging element.
  • a CMOS (Complementary Metal Oxide Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor can be used as the solid-state imaging element.
  • the solid-state imaging element converts light that is collected by the lens and transmitted through the color filter into an electrical color image signal and outputs it to the processing section 13.
  • the distance measurement sensor unit 12 acquires distance information from the distance measurement sensor unit 12 to the subject and outputs it to the processing unit 13. In this embodiment, the distance measurement sensor unit 12 acquires distance information using the ToF method.
  • the distance measurement sensor unit 12 includes a light source (e.g., LD) and a light receiving sensor (e.g., photodiode), and measures the distance to the subject based on the time difference between the emission timing of an infrared laser containing a predetermined pulse irradiated from the light source and the reception timing of the reflected light from the subject by the light receiving sensor.
  • IR pixels may be provided in the solid-state imaging element of the visible light imaging unit 11, and the IR pixels may be used as light receiving sensors.
  • the distance measurement sensor unit 12 obtains distance information within the angle of view of the visible light imaging unit 11 by optically or mechanically changing the irradiation angle of the laser light, or by diffusing the irradiation range of the laser light.
  • LiDAR (Laser Imaging Detection and Ranging) may be used as the distance measurement sensor unit 12.
  • the processing unit 13 includes a color image data generating unit 14, a depth data generating unit 15, a compression encoding unit 16, and a transmission unit 17.
  • the processing unit 13 is realized by a combination of hardware resources and software resources, or by hardware resources alone.
  • the hardware resources that can be used include a CPU, ROM, RAM, GPU (Graphics Processing Unit), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array), high-frequency circuits, and other LSIs.
  • the software resources that can be used include programs such as firmware.
  • the color image data generating unit 14 generates color image data based on the color image signal output from the visible light imaging unit 11.
  • the depth data generating unit 15 acquires the distance information output from the distance measuring sensor unit 12, and arranges it in the pixel array of the color image to generate a distance image (also called a depth map).
  • the compression-encoding unit 16 compresses and encodes the color image data generated by the color image data generation unit 14 and the depth data generated by the depth data generation unit 15 using a predetermined compression-encoding method (e.g., a compression-encoding method based on MPEG).
  • the transmission unit 17 is connected to a network via a wired or wireless connection, and transmits the compressed and encoded data to the image processing device 2 connected to the network.
  • the image processing device 2 includes a receiving unit 21, a decompression/decoding unit 22, a separating unit 23, a detecting unit 24, a correcting unit 25, and a point cloud data generating unit 26.
  • the functional blocks in the image processing device 2 are realized by a combination of hardware resources and software resources, or by hardware resources alone.
  • as hardware resources, a CPU, ROM, RAM, GPU, ASIC, FPGA, high-frequency circuits, and other LSIs can be used.
  • as software resources, programs such as an operating system and application programs can be used.
  • the receiving unit 21 receives the compressed and encoded data transmitted from the imaging device 1.
  • the decompression and decoding unit 22 decompresses and decodes the received compressed and encoded data.
  • the separating unit 23 outputs color image data and depth data, synchronizing with each frame.
  • the detection unit 24 detects a human region from within a color image.
  • the detection unit 24 can detect a human region using an open source object detection algorithm.
  • Detectron, which uses a deep neural network, can be used as a library of object detection algorithms.
  • the detection unit 24 detects a face region from within the person region of the detected color image.
  • the detection unit 24 detects a face region from within the person region, for example, using a cascade classifier.
  • for example, OpenCV provides a cascade classifier for human face detection that uses Haar-like features.
  • the detection unit 24 detects the glasses area from within the face area of the detected color image.
  • the detection unit 24 can detect the glasses area using an existing detection algorithm that uses deep learning to detect information about the eyes, such as the presence or absence of glasses and the position of the pupils.
  • the detection unit 24 can also detect the glasses area using a polarized camera.
  • a polarizing plate array is incorporated on top of a solid-state imaging element.
  • the polarizing plate array is composed of a matrix of unit pixel groups, each of which includes four pixels in four different orientations (0°, 45°, 90°, 135°).
  • the detection unit 24 obtains four types of light intensity from the polarized camera for each frame, and determines that an area within the face area that exhibits a uniform, flat light intensity is the glasses area.
  • the detection unit 24 can detect a hair region from within the face region of the detected color image. For example, the detection unit 24 detects the hair region using a hair region classifier. Note that the upper body part of the person region (excluding the glasses region) may be set as a hair region as it is a region where hair may be present.
  • the correction unit 25 corrects abnormal depth values contained in the glasses area or hair area in the person area of the distance image that corresponds to the person area of the color image, using normal depth values in the person area of the distance image.
  • the point cloud data generating unit 26 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 with the color image.
  • the three-dimensional point cloud data is composed of three-dimensional coordinate values (X, Y, Z) including a depth value (Z) and color information (R, G, B) for each pixel.
  • the three-dimensional image based on the generated three-dimensional point cloud data can be displayed on a VR device or an AR device, for example. It is also possible to display a three-dimensional image in real time that shows a person being photographed by the imaging device 1.
  • note that in FIG. 1, the imaging device 1 transmits compressed and encoded data of color image data and depth data to the image processing device 2, but the imaging device 1 may generate three-dimensional point cloud data and transmit the compressed and encoded data of the three-dimensional point cloud data to the image processing device 2.
  • the image processing device 2 decompresses and decodes the three-dimensional point cloud data, and then restores the color image data and depth data from the three-dimensional point cloud data.
  • reflection of LD light by eyeglasses and absorption of LD light by black hair can prevent normal 3D point cloud data from being obtained, resulting in reproducibility that differs from the original shape.
  • FIG. 2 shows an example of a color image C1 and a distance image D1.
  • in the distance image D1, a portion G1 of the eyeglass lens reflects strongly.
  • in a three-dimensional image based on this distance image, the portion corresponding to the lens portion G1 either protrudes forward or is missing. Also, part of the hair portion H1 may be missing.
  • FIG. 3 is a flowchart showing the flow of distance image correction processing by the image processing device 2 according to the embodiment.
  • the detection unit 24 searches for a person area in the color image (step S10). If a person area is not detected (step S11: NO), the correction processing for the target frame ends. If a person area is detected (step S11: YES), the detection unit 24 extracts the person area from the color image (step S12). The correction unit 25 extracts depth values of pixels included in an area corresponding to the person area in the color image from the distance image (step S13).
  • Figure 4 shows a specific image diagram of the process of extracting depth values of a person region Pd in a distance image.
  • a person region P1 is detected in a color image C1, the background is separated, and only the person region P1 is extracted (see C1a). From the distance image D1, the depth value of only the corresponding person region Pd is extracted (see D1a).
  • the detection unit 24 detects a face region from the color image (step S14).
  • the detection unit 24 detects a glasses region from the face region (step S15).
  • FIG. 5 shows a specific image diagram of the process of detecting the face region F1 and glasses regions Er and El from the color image C1. Note that if glasses cannot be detected, the process of steps S15 to S22 is skipped. Note that if glasses cannot be detected, the pupil region may be detected instead of detecting glasses.
  • the correction unit 25 identifies the maximum and minimum depth values of the pixels contained in the corresponding face region (excluding the glasses region) of the distance image (step S16).
  • for example, the depth value takes a value in the range of 0 to 255 or 0 to 65535, where 0 indicates the closest distance and a maximum value such as 255 or 65535 indicates the furthest distance.
  • pixels with missing data also have a depth value of 0.
  • when the distance image is displayed as a grayscale image, 0 is displayed as black and the maximum value is displayed as white. Note that the value indicating the closest distance may be assigned to 0, and the value indicating the furthest distance may be assigned to the maximum value.
  • the correction unit 25 sets an initial value to the target pixel in the glasses region (step S17). Usually, the initial value is set to the top left pixel of the glasses region.
  • the correction unit 25 determines whether checking of all pixels in the glasses region has been completed (step S18). If checking of all pixels in the glasses region has been completed (step S18: YES), the process proceeds to step S23.
  • if checking of all pixels in the glasses area has not been completed (step S18: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S19). If the depth value of the target pixel is 0 (step S19: YES), the process proceeds to step S21.
  • if the depth value of the target pixel is not 0 (step S19: NO), the correction unit 25 determines whether the depth value of the target pixel falls within the range between the maximum and minimum depth values of the face area (excluding the glasses area) (hereinafter referred to as the normal range) (step S20). If it falls within the normal range (step S20: YES), the process proceeds to step S22.
  • if it does not fall within the normal range (step S20: NO), or if the depth value is 0 (step S19: YES), the correction unit 25 corrects the depth value of the target pixel using normal depth values of the glasses area (step S21). Pixels with a depth value smaller than the minimum value protrude forward from the surface of the face, causing the eyes to pop out in the 3D image. Conversely, pixels with a depth value larger than the maximum value sink back behind the surface of the face, causing the eyes to appear sunken in the 3D image. In the distance image D1a shown in FIG. 4, the depth values of the pixels corresponding to the lens portion Gd are smaller than the minimum value, and a 3D image generated from this 3D point cloud data as is would show the eyes popping out.
  • the correction unit 25 interpolates the depth value of a target pixel in the glasses area that is outside the normal range based on the depth values of multiple adjacent pixels.
  • Figure 6 is a diagram showing a specific example of the interpolation process from adjacent pixels.
  • in FIG. 6, pixels G, H, I, L, M, and N indicate pixels whose depth values are not within the normal range.
  • the correction unit 25 assigns to the target pixel G the average value of the depth value of pixel A at the top left, the depth value of pixel B directly above, the depth value of pixel C at the top right, the depth value of pixel F on the left, and the depth value of pixel K at the bottom left.
  • the median may be assigned instead of the average value.
  • the correction unit 25 assigns to the target pixel H the average of the depth value of pixel B at the top left, the depth value of pixel C directly above, the depth value of pixel D at the top right, and the depth value of pixel G on the left.
  • the interpolated pixel G is used as the reference, and the non-interpolated pixel L is excluded from the reference. Note that the range of pixels to be referenced is not limited to the example shown in FIG. 6, and pixels to the right and below may also be included in the reference source. Furthermore, if a pixel at the same position in the glasses area of a frame adjacent in the time direction is normal, the correction unit 25 may assign the depth value of that normal pixel to the target pixel.
  • the correction unit 25 may also correct the depth values of target pixels in the glasses region that are outside the normal range by performing spline interpolation in the horizontal direction.
  • Figures 7(a) and 7(b) are diagrams showing a specific example of spline interpolation processing.
  • Figure 7(a) is a diagram plotting depth values of a horizontal line in the glasses region.
  • Figure 7(a) includes a portion where the depth value is outside the normal range (a part of the lens Gd).
  • the correction unit 25 calculates the function of the section connecting the front and back of the portion that is outside the normal range by spline interpolation, and assigns the depth value of the portion that is outside the normal range onto the calculated spline curve function (generally calculated using a cubic polynomial).
  • Figure 7(b) shows how the depth value that is outside the normal range is assigned onto the spline curve.
  • when correction of the target pixel is completed, or when its depth value is normal, the correction unit 25 increments the target pixel in the glasses area (step S22). Specifically, the address of the target pixel in the glasses area advances by one pixel in the scanning direction, and the process returns to step S18.
  • the detection unit 24 extracts the hair region from the person region of the color image (step S23).
  • the correction unit 25 sets an initial value to the target pixel in the hair region (step S24). Typically, the initial value is set to the top left pixel of the hair region.
  • the correction unit 25 determines whether checking of all pixels in the hair region has been completed (step S25).
  • if checking of all pixels in the hair region has not been completed (step S25: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S26). If the depth value of the target pixel is not 0 (step S26: NO), the process proceeds to step S28. If the depth value of the target pixel is 0 (step S26: YES), the correction unit 25 corrects the depth value of the target pixel using a normal depth value in the hair region (step S27).
  • Figure 8 shows a specific example of the interpolation process from pixels in the vertical direction.
  • in FIG. 8, the top group of darkest pixels indicates an area where no depth value exists (background area B1), the white pixels below indicate the missing portion Hm of the hair area, and the bottom group of medium-colored pixels indicates the hair area Hn that has depth values.
  • the correction unit 25 assigns the depth value of a pixel that has a non-missing depth value to a pixel that has a missing depth value in each vertical line of the hair region.
  • the depth values of pixels A, B, C, D, E, and F are missing, and the correction unit 25 assigns the depth value of pixel G, which has a non-missing depth value, to pixels A, B, C, D, E, and F.
  • the correction unit 25 assigns the depth value of the non-missing pixel on the top of the head to the missing pixel on the forehead.
  • the correction unit 25 may apply a filter (e.g., a moving average) to each horizontal line to smooth out changes in horizontal depth values.
  • the correction unit 25 may correct the missing target pixel by assigning the depth value of the non-missing pixel to the missing target pixel.
  • even a pixel in the hair region whose depth value is not missing may be subject to correction if it has an abnormal depth value.
  • pixels with depth values that do not fall within the normal range of depth values in the face region may also be included as subjects of correction.
  • when correction of the target pixel is completed, or when its depth value is not missing, the correction unit 25 increments the target pixel in the hair region (step S28). Specifically, the address of the target pixel in the hair region advances by one pixel in the scanning direction, and the process returns to step S25. When checking of all pixels in the hair region has been completed (step S25: YES), the correction processing for the target frame ends.
  • Figure 9 shows a distance image D1b of the person area Pd after the reflection from the glasses and missing hair have been corrected. In a 3D image based on the corrected distance image, popping out eyes and missing hair are avoided.
  • FIG. 10 is a functional block diagram showing an example configuration of an imaging device 1 according to a modified example.
  • the functions of the detection unit 24 and correction unit 25 of the image processing device 2 shown in FIG. 1 are incorporated into the processing unit 13 of the imaging device 1.
  • the processing unit 13 of the imaging device 1 has the same functions as the image processing device 2.
  • a point cloud data generation unit 18 is provided in the processing unit 13 of the imaging device 1, and the point cloud data generation unit 18 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 and a color image.
  • an image processing program may be pre-installed in the processing unit 13, or may be installed later.
  • the image processing program is downloaded from an application program store via a network to the imaging device 1 and installed. For example, if the imaging device 1 is a smartphone, it sufficiently meets the hardware requirements for executing the image processing program.
  • FIG. 3 describes an example in which both a process for correcting abnormal depth values in the glasses region and a process for correcting abnormal depth values in the hair region are executed.
  • an embodiment of the present invention also includes a form in which only one of the correction processes is executed.
  • the present invention can be used to correct 3D point cloud data of a person.
  • 1 imaging device 11 visible light imaging section, 12 distance measurement sensor section, 13 processing section, 14 color image data generation section, 15 depth data generation section, 16 compression encoding section, 17 transmission section, 18 point cloud data generation section, 2 image processing device, 21 receiving section, 22 decompression decoding section, 23 separation section, 24 detection section, 25 correction section, 26 point cloud data generation section.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A detection unit 24 of an image processing device 2 detects a person area and an area of interest within the person area from a color image captured by a visible light imaging unit 11. A correction unit 25 corrects an abnormal depth value contained in an area of interest within a person area of a distance image generated on the basis of the output of a range sensor unit 12, the person area corresponding to the person area of the color image, by using a normal depth value within the person area of the distance image.

Description

Image processing device and image processing program

The present invention relates to an image processing device and an image processing program that correct a distance image.

In recent years, RGB cameras equipped with ToF (Time of Flight) sensors have become popular. With a camera equipped with a ToF sensor, depth data can be acquired from the ToF sensor, a color image can be acquired from the RGB camera, and 3D point cloud data can be generated by combining the two (see, for example, Patent Document 1).

JP 2014-157044 A

The depth data obtained by the ToF sensor, which indicates the distance to the subject, uses reflected light from a laser diode (LD). When the subject is a person, the LD light can be absorbed by black hair, or reflected by eyeglass lenses, by accessories made of glass or metal such as earrings and brooches, or by buttons. In such cases, accurate depth data may not be obtained. If 3D point cloud data is generated based on inaccurate depth data, missing or distorted data is likely to occur in areas of interest such as hair, glasses, or accessories.

This embodiment was made in consideration of these circumstances, and its purpose is to provide a technology for creating 3D point cloud data of a person with high accuracy.

In order to solve the above problem, an image processing device according to one aspect of this embodiment includes a detection unit that detects a person area and an area of interest within the person area from a color image captured by a visible light imaging unit, and a correction unit that corrects, in a distance image generated based on the output of a distance measurement sensor unit, abnormal depth values contained in the area of interest within a person area corresponding to the person area of the color image, using normal depth values within the person area of the distance image.

In addition, any combination of the above components, and conversions of the expressions of this embodiment between methods, devices, systems, recording media, computer programs, and the like, are also valid aspects of this embodiment.

According to this embodiment, it is possible to create 3D point cloud data of a person with high accuracy.
FIG. 1 is a functional block diagram showing an example of the configuration of an imaging device and an image processing device according to an embodiment.
FIG. 2 is a diagram showing an example of a color image and a distance image.
FIG. 3 is a flowchart showing the flow of distance image correction processing by the image processing device according to the embodiment.
FIG. 4 is a diagram showing a specific image of the process of extracting depth values of a person region in a distance image.
FIG. 5 is a diagram showing a specific image of the process of detecting a face region and eyeglass regions from a color image.
FIG. 6 is a diagram showing a specific example of an interpolation process from adjacent pixels.
FIGS. 7(a) and 7(b) are diagrams showing a specific example of spline interpolation processing.
FIG. 8 is a diagram showing a specific example of an interpolation process from pixels in the vertical direction.
FIG. 9 is a diagram showing a distance image of a person region after eyeglass reflections and missing hair have been corrected.
FIG. 10 is a functional block diagram showing a configuration example of an imaging device according to a modified example.
In the following, glasses and hair are used as examples of areas of interest.

FIG. 1 is a functional block diagram showing an example of the configuration of an imaging device 1 and an image processing device 2 according to an embodiment. The imaging device 1 is a device equipped with a visible light camera with a distance measuring sensor, and corresponds to a video camera, a surveillance camera, an in-vehicle camera, a camera mounted on a multicopter (drone), and the like. The image processing device 2 is a general-purpose information processing device (e.g., a PC, a server, a tablet, a smartphone) having an image processing function, or a dedicated device having an image processing function. The imaging device 1 and the image processing device 2 are connected via a network such as the Internet or a dedicated line.
The imaging device 1 includes a visible light imaging unit 11, a distance measurement sensor unit 12, and a processing unit 13. The visible light imaging unit 11 includes a lens, color filters of the three primary colors (RGB), and a solid-state imaging element. For example, a CMOS (Complementary Metal Oxide Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor can be used as the solid-state imaging element. The solid-state imaging element converts light that is collected by the lens and transmitted through the color filters into an electrical color image signal and outputs it to the processing unit 13.
The distance measurement sensor unit 12 acquires distance information from the distance measurement sensor unit 12 to the subject and outputs it to the processing unit 13. In this embodiment, the distance measurement sensor unit 12 acquires the distance information using the ToF method. The distance measurement sensor unit 12 includes a light source (e.g., an LD) and a light receiving sensor (e.g., a photodiode), and measures the distance to the subject based on the time difference between the emission timing of an infrared laser containing a predetermined pulse emitted from the light source and the reception timing, at the light receiving sensor, of the reflected light from the subject.
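As a reference, the relationship between that time difference and the measured distance can be sketched as follows. This is a minimal illustration of the pulse ToF calculation; the function and variable names are illustrative and do not appear in the publication.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # propagation speed of the laser pulse (approx., in vacuum/air)

def tof_distance_m(emit_time_s: float, receive_time_s: float) -> float:
    """Distance to the subject derived from the round-trip time of a reflected pulse."""
    round_trip_s = receive_time_s - emit_time_s
    # The pulse travels to the subject and back, so half of the round-trip path is the distance.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

# Example: a round trip of 10 ns corresponds to roughly 1.5 m.
print(tof_distance_m(0.0, 10e-9))
```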
Note that IR pixels may be provided in the solid-state imaging element of the visible light imaging unit 11, and the IR pixels may be used as the light receiving sensor. The distance measurement sensor unit 12 obtains distance information within the angle of view of the visible light imaging unit 11 by optically or mechanically changing the irradiation angle of the laser light, or by diffusing the irradiation range of the laser light. LiDAR (Laser Imaging Detection and Ranging) may also be used as the distance measurement sensor unit 12.

The processing unit 13 includes a color image data generation unit 14, a depth data generation unit 15, a compression encoding unit 16, and a transmission unit 17. The processing unit 13 is realized by cooperation of hardware resources and software resources, or by hardware resources alone. As hardware resources, a CPU, ROM, RAM, GPU (Graphics Processing Unit), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array), high-frequency circuits, and other LSIs can be used. As software resources, programs such as firmware can be used.

The color image data generation unit 14 generates color image data based on the color image signal output from the visible light imaging unit 11. The color image data is composed of, for example, 24-bit color data (R = 8 bits, G = 8 bits, B = 8 bits) for each pixel. The depth data generation unit 15 acquires the distance information output from the distance measurement sensor unit 12, and arranges it in the pixel array of the color image to generate a distance image (also called a depth map).

The compression encoding unit 16 compresses and encodes the color image data generated by the color image data generation unit 14 and the depth data generated by the depth data generation unit 15 using a predetermined compression encoding method (e.g., a compression encoding method based on MPEG). The transmission unit 17 connects to a network via a wired or wireless connection, and transmits the compressed and encoded data to the image processing device 2 connected to the network.
The image processing device 2 includes a receiving unit 21, a decompression/decoding unit 22, a separation unit 23, a detection unit 24, a correction unit 25, and a point cloud data generation unit 26. These functional blocks of the image processing device 2 are realized by cooperation of hardware resources and software resources, or by hardware resources alone. As hardware resources, a CPU, ROM, RAM, GPU, ASIC, FPGA, high-frequency circuits, and other LSIs can be used. As software resources, programs such as an operating system and application programs can be used.

The receiving unit 21 receives the compressed and encoded data transmitted from the imaging device 1. The decompression/decoding unit 22 decompresses and decodes the received compressed and encoded data. The separation unit 23 outputs color image data and depth data, synchronized for each frame.

The detection unit 24 detects a person region from within the color image. The detection unit 24 can detect the person region using an open-source object detection algorithm. For example, Detectron, which uses a deep neural network, can be used as a library of object detection algorithms.
The detection unit 24 detects a face region from within the person region of the color image. The detection unit 24 detects the face region from within the person region using, for example, a cascade classifier. For example, OpenCV provides a cascade classifier for human face detection that uses Haar-like features.
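As a rough illustration of this kind of cascade-classifier face detection, the following sketch uses OpenCV's bundled Haar cascade for frontal faces. It is not the publication's implementation; the function name and the assumption that a cropped person image is already available are ours.

```python
import cv2

def detect_face_regions(person_bgr):
    """Return (x, y, w, h) boxes of candidate face regions in a cropped person image."""
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)
    gray = cv2.cvtColor(person_bgr, cv2.COLOR_BGR2GRAY)
    # Haar-like features are evaluated over a sliding window at multiple scales.
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```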
The detection unit 24 detects a glasses region from within the face region of the color image. The detection unit 24 can detect the glasses region using an existing detection algorithm that uses deep learning to detect information about the eyes, such as the presence or absence of glasses and the position of the pupils.

The detection unit 24 can also detect the glasses region using a polarization camera. In a polarization camera, a polarizer array is incorporated on top of the solid-state imaging element. The polarizer array is composed of unit pixel groups arranged in a matrix, each group including four pixels with four different polarization orientations (0°, 45°, 90°, 135°). The detection unit 24 obtains the four types of light intensity from the polarization camera for each frame, and determines that an area within the face region exhibiting a uniform, flat light intensity is the glasses region.

The detection unit 24 can detect a hair region from within the face region of the color image. For example, the detection unit 24 detects the hair region using a hair region classifier. Note that the upper body part of the person region (excluding the glasses region) may be set as the hair region, as a region where hair may be present.

The correction unit 25 corrects abnormal depth values contained in the glasses region or the hair region within the person region of the distance image that corresponds to the person region of the color image, using normal depth values within the person region of the distance image.
The point cloud data generation unit 26 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 with the color image. The three-dimensional point cloud data is composed of three-dimensional coordinate values (X, Y, Z) including a depth value (Z), and color information (R, G, B) for each pixel. A three-dimensional image based on the generated three-dimensional point cloud data can be displayed on a VR device or an AR device, for example. It is also possible to display, in real time, a three-dimensional image showing a person being photographed by the imaging device 1. Note that FIG. 1 describes an example in which the imaging device 1 transmits compressed and encoded color image data and depth data to the image processing device 2, but the imaging device 1 may instead generate the three-dimensional point cloud data and transmit compressed and encoded three-dimensional point cloud data to the image processing device 2. In that case, the image processing device 2 decompresses and decodes the three-dimensional point cloud data, and then restores the color image data and depth data from the three-dimensional point cloud data.
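The combination of a corrected depth map and a color image into (X, Y, Z, R, G, B) points can be sketched as follows, assuming a pinhole camera model. The intrinsic parameters fx, fy, cx, cy are illustrative assumptions and are not specified in the publication.

```python
import numpy as np

def depth_to_point_cloud(depth, color_rgb, fx, fy, cx, cy):
    """Back-project a depth map into an (N, 6) array of (X, Y, Z, R, G, B) points.

    A pinhole camera model is assumed; fx, fy, cx, cy are illustrative intrinsics.
    Pixels with depth 0 (missing data) are skipped.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    valid = z > 0
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    xyz = np.stack([x[valid], y[valid], z[valid]], axis=1)
    rgb = color_rgb[valid].astype(np.float64)
    return np.hstack([xyz, rgb])
```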
When measuring the distance to a person with a ToF sensor, reflection of the LD light by eyeglasses and absorption of the LD light by black hair can prevent normal 3D point cloud data from being obtained, so that the reproduced shape may differ from the original shape.

FIG. 2 shows an example of a color image C1 and a distance image D1. In the distance image D1, a portion G1 of the eyeglass lens reflects strongly. In a three-dimensional image based on this distance image D1, the portion corresponding to the lens portion G1 either protrudes forward or is missing. Also, part of the hair portion H1 may be missing.

FIG. 3 is a flowchart showing the flow of the distance image correction processing by the image processing device 2 according to the embodiment. The detection unit 24 searches for a person region in the color image (step S10). If a person region is not detected (step S11: NO), the correction processing for the target frame ends. If a person region is detected (step S11: YES), the detection unit 24 extracts the person region from the color image (step S12). The correction unit 25 extracts, from the distance image, the depth values of the pixels included in the region corresponding to the person region of the color image (step S13).
FIG. 4 shows a specific image of the process of extracting the depth values of the person region Pd in the distance image. A person region P1 is detected in the color image C1, the background is separated, and only the person region P1 is extracted (see C1a). From the distance image D1, the depth values of only the corresponding person region Pd are extracted (see D1a).
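A minimal sketch of the extraction in step S13, assuming the person region is available as a boolean mask aligned pixel-for-pixel with the distance image; the mask itself would come from the segmentation of the color image, and the names are illustrative.

```python
import numpy as np

def extract_person_depth(depth, person_mask):
    """Keep depth values only inside the person region; everything else becomes 0.

    `person_mask` is assumed to be a boolean array aligned with the depth map.
    """
    person_depth = np.zeros_like(depth)
    person_depth[person_mask] = depth[person_mask]
    return person_depth
```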
Returning to FIG. 3, the detection unit 24 detects a face region from the color image (step S14). The detection unit 24 detects a glasses region from the face region (step S15). FIG. 5 shows a specific image of the process of detecting the face region F1 and the glasses regions Er and El from the color image C1. Note that if glasses cannot be detected, the processing of steps S15 to S22 is skipped. Alternatively, if glasses cannot be detected, a pupil region may be detected instead of the glasses.
Returning to FIG. 3, the correction unit 25 identifies the maximum and minimum depth values of the pixels contained in the corresponding face region (excluding the glasses region) of the distance image (step S16). For example, the depth value takes a value in the range of 0 to 255 or 0 to 65535, where 0 indicates the closest distance and a maximum value such as 255 or 65535 indicates the furthest distance. Pixels with missing data also have a depth value of 0. When the distance image is displayed as a grayscale image, 0 is displayed as black and the maximum value is displayed as white. Note that the value indicating the closest distance may be assigned to 0, and the value indicating the furthest distance may be assigned to the maximum value.
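The normal range of step S16 and the checks of steps S19 and S20 can be sketched as follows; the boolean masks and function names are assumptions for illustration, not the publication's code.

```python
import numpy as np

def face_depth_range(depth, face_mask, glasses_mask):
    """Min/max ('normal range') of valid depth values in the face region, glasses excluded."""
    reference = face_mask & ~glasses_mask & (depth > 0)
    values = depth[reference]
    return int(values.min()), int(values.max())

def is_abnormal(depth_value, normal_min, normal_max):
    # Depth 0 means missing data (step S19); values outside the face range would
    # protrude from or sink behind the face surface (step S20).
    return depth_value == 0 or not (normal_min <= depth_value <= normal_max)
```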
The correction unit 25 sets an initial value for the target pixel of the glasses region (step S17). Usually, the initial value is set to the top-left pixel of the glasses region. The correction unit 25 determines whether checking of all pixels in the glasses region has been completed (step S18). If checking of all pixels in the glasses region has been completed (step S18: YES), the process proceeds to step S23.

If checking of all pixels in the glasses region has not been completed (step S18: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S19). If the depth value of the target pixel is 0 (step S19: YES), the process proceeds to step S21.

If the depth value of the target pixel is not 0 (step S19: NO), the correction unit 25 determines whether the depth value of the target pixel falls within the range between the maximum and minimum depth values of the face region (excluding the glasses region) (hereinafter referred to as the normal range) (step S20). If it falls within the normal range (step S20: YES), the process proceeds to step S22.

If it does not fall within the normal range (step S20: NO), or if the depth value is 0 (step S19: YES), the correction unit 25 corrects the depth value of the target pixel using normal depth values of the glasses region (step S21). Pixels with a depth value smaller than the minimum value protrude forward from the surface of the face, causing the eyes to pop out in the 3D image. Conversely, pixels with a depth value larger than the maximum value sink back behind the surface of the face, causing the eyes to appear sunken in the 3D image. In the distance image D1a shown in FIG. 4, the depth values of the pixels corresponding to the lens portion Gd are smaller than the minimum value, and a 3D image generated from this 3D point cloud data as is would show the eyes popping out.
For example, the correction unit 25 interpolates the depth value of a target pixel in the glasses region that is outside the normal range based on the depth values of multiple adjacent pixels. FIG. 6 is a diagram showing a specific example of the interpolation process from adjacent pixels. In FIG. 6, pixels G, H, I, L, M, and N are pixels whose depth values are not within the normal range. The correction unit 25 assigns to the target pixel G the average of the depth value of pixel A at the top left, the depth value of pixel B directly above, the depth value of pixel C at the top right, the depth value of pixel F on the left, and the depth value of pixel K at the bottom left. Note that the median may be assigned instead of the average.

The correction unit 25 assigns to the target pixel H the average of the depth value of pixel B at the top left, the depth value of pixel C directly above, the depth value of pixel D at the top right, and the depth value of pixel G on the left. In this example, the already-interpolated pixel G is used as a reference, and the not-yet-interpolated pixel L is excluded from the reference. Note that the range of pixels to be referenced is not limited to the example shown in FIG. 6, and pixels to the right and below may also be included as reference sources. Furthermore, if the pixel at the same position in the glasses region of a temporally adjacent frame is normal, the correction unit 25 may assign the depth value of that normal pixel to the target pixel.
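A sketch of this neighbor-averaging interpolation, scanning the glasses region in raster order and referring to the upper-left, upper, upper-right, left, and lower-left neighbors as in the example of FIG. 6. Already-corrected neighbors count as valid references; the function and mask names are illustrative assumptions.

```python
import numpy as np

# Offsets of the referenced neighbors (dy, dx): upper-left, up, upper-right, left, lower-left.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (1, -1)]

def interpolate_glasses_region(depth, glasses_mask, normal_min, normal_max):
    """Replace abnormal depth values in the glasses region by the mean of valid neighbors."""
    corrected = depth.copy()
    h, w = depth.shape
    for y, x in np.argwhere(glasses_mask):  # argwhere yields pixels in raster (scanning) order
        v = corrected[y, x]
        if v != 0 and normal_min <= v <= normal_max:
            continue  # depth value is already normal
        samples = []
        for dy, dx in NEIGHBOR_OFFSETS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nv = corrected[ny, nx]
                if nv != 0 and normal_min <= nv <= normal_max:
                    samples.append(nv)
        if samples:
            corrected[y, x] = int(np.mean(samples))  # np.median(samples) is the alternative
    return corrected
```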
The correction unit 25 may also correct the depth values of target pixels in the glasses region that are outside the normal range by spline interpolation in the horizontal direction. FIGS. 7(a) and 7(b) are diagrams showing a specific example of the spline interpolation processing. FIG. 7(a) is a plot of the depth values of one horizontal line in the glasses region, and it includes a portion where the depth values are outside the normal range (the lens portion Gd). The correction unit 25 calculates, by spline interpolation, the function of the section connecting the points before and after the out-of-range portion (generally obtained as a cubic polynomial), and assigns depth values on the calculated spline curve to the out-of-range portion. FIG. 7(b) shows how the out-of-range depth values are assigned onto the spline curve.
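A sketch of the horizontal spline correction for one line, using SciPy's cubic spline as a stand-in for the cubic-polynomial spline mentioned above; the array names and the minimum-support guard are assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_correct_line(line_depth, abnormal):
    """Re-assign abnormal depth values on one horizontal line from a cubic spline.

    `line_depth` is a 1-D array of depth values; `abnormal` is a boolean array of the
    same length marking out-of-range or missing pixels. The spline is fitted to the
    remaining (normal) samples and evaluated at the abnormal positions.
    """
    x = np.arange(len(line_depth))
    good = ~abnormal
    if good.sum() < 4:
        return line_depth  # not enough support points to fit a cubic spline
    spline = CubicSpline(x[good], line_depth[good])
    corrected = line_depth.astype(np.float64)
    corrected[abnormal] = spline(x[abnormal])
    return corrected
```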
Returning to FIG. 3, when correction of the target pixel in the glasses region is completed, or when the depth value of the target pixel is normal, the correction unit 25 increments the target pixel of the glasses region (step S22). Specifically, the address of the target pixel in the glasses region advances by one pixel in the scanning direction, and the process returns to step S18.

When checking of all pixels in the glasses region has been completed (step S18: YES), the detection unit 24 extracts the hair region from the person region of the color image (step S23). The correction unit 25 sets an initial value for the target pixel of the hair region (step S24). Typically, the initial value is set to the top-left pixel of the hair region. The correction unit 25 determines whether checking of all pixels in the hair region has been completed (step S25).

If checking of all pixels in the hair region has not been completed (step S25: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S26). If the depth value of the target pixel is not 0 (step S26: NO), the process proceeds to step S28. If the depth value of the target pixel is 0 (step S26: YES), the correction unit 25 corrects the depth value of the target pixel using normal depth values in the hair region (step S27).

FIG. 8 shows a specific example of the interpolation process from pixels in the vertical direction. In FIG. 8, the top group of darkest pixels indicates an area where no depth value exists (background area B1), the white pixels below indicate the missing portion Hm of the hair region (see FIG. 4), and the bottom group of medium-colored pixels indicates the hair region Hn that has depth values.
The correction unit 25 assigns, in each vertical line of the hair region, the depth value of a pixel whose depth value is not missing to the pixels whose depth values are missing. In the example shown in FIG. 8, the depth values of pixels A, B, C, D, E, and F are missing, and the correction unit 25 assigns the depth value of pixel G, whose depth value is not missing, to pixels A, B, C, D, E, and F. Note that if the depth values on the crown side are not missing and the depth values on the forehead side are missing, the correction unit 25 assigns the depth value of a non-missing pixel on the crown side to the missing pixels on the forehead side.
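A sketch of this vertical fill, assigning to each missing hair pixel the depth value of the nearest non-missing hair pixel below it in the same column, as in the example of FIG. 8; the hair mask and the fill direction are assumptions for illustration.

```python
import numpy as np

def fill_hair_column_gaps(depth, hair_mask):
    """Fill missing (0) depth values in each vertical line of the hair region.

    For every column, missing hair pixels receive the depth value of the nearest
    non-missing hair pixel below them.
    """
    corrected = depth.copy()
    h, w = depth.shape
    for x in range(w):
        last_valid = 0
        for y in range(h - 1, -1, -1):  # scan each column from the bottom upward
            if hair_mask[y, x]:
                if corrected[y, x] != 0:
                    last_valid = corrected[y, x]
                elif last_valid != 0:
                    corrected[y, x] = last_valid
    return corrected
```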
After correction of all vertical lines in the hair region of the target frame is completed, the correction unit 25 may apply a filter (e.g., a moving average) to each horizontal line to smooth the changes in depth values in the horizontal direction.
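A sketch of this optional horizontal smoothing with a simple moving average; the window size is an arbitrary choice and not taken from the publication.

```python
import numpy as np

def smooth_rows(depth, window=5):
    """Apply a simple moving average along each horizontal line of a depth map."""
    kernel = np.ones(window) / window
    smoothed = depth.astype(np.float64)
    for y in range(depth.shape[0]):
        # mode="same" keeps the line length; values near the edges are influenced by zero padding.
        smoothed[y] = np.convolve(smoothed[y], kernel, mode="same")
    return smoothed
```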
 なお、時間方向に隣接するフレームの毛髪領域内の同一位置の画素が欠落していない場合、補正部25は、当該欠落していない画素のデプス値を、欠落している対象画素に割り当てることで、対象画素を補正してもよい。 In addition, if there is no missing pixel at the same position in the hair region of an adjacent frame in the time direction, the correction unit 25 may correct the missing target pixel by assigning the depth value of the non-missing pixel to the missing target pixel.
 なお、毛髪領域内においてデプス値が欠落していない画素であっても、異常なデプス値を有する画素である場合、補正の対象としてもよい。例えば、眼鏡領域と同様に、顔領域のデプス値の正常範囲に収まらないデプス値を有する画素も補正の対象に含めてもよい。 Note that even if a pixel in the hair region has no missing depth values, it may be subject to correction if it has an abnormal depth value. For example, like the glasses region, pixels with depth values that do not fall within the normal range of depth values in the face region may also be included as subjects of correction.
 図3に戻る。毛髪領域の対象画素の補正が終了、または対象画素のデプス値が欠落していない場合、補正部25は毛髪領域の対象画素をインクリメントする(ステップS28)。具体的には毛髪領域の対象画素のアドレスを走査方向に一画素、進める。ステップS25に遷移する。毛髪領域の全画素のチェックが終了すると(ステップS25:YES)、対象フレームの補正処理を終了する。 Return to FIG. 3. When correction of the target pixel in the hair region is complete, or when the depth value of the target pixel is not missing, correction unit 25 increments the target pixel in the hair region (step S28). Specifically, the address of the target pixel in the hair region advances by one pixel in the scanning direction. Then, proceed to step S25. When checking of all pixels in the hair region is complete (step S25: YES), the correction process for the target frame ends.
 図9は、眼鏡の反射と毛髪の欠落が補正された後の人物領域Pdの距離画像D1bを示す。補正後の距離画像に基づく3次元画像では、目の飛び出しや毛髪の一部が欠落することが回避される。 Figure 9 shows a distance image D1b of the person area Pd after the reflection from the glasses and missing hair have been corrected. In a 3D image based on the corrected distance image, popping out eyes and missing hair are avoided.
 以上説明したように本実施形態によれば、眼鏡領域または毛髪領域に含まれる異常なデプス値を補正することで、人物の3次元点群データを高精度に生成することができる。 As described above, according to this embodiment, by correcting abnormal depth values contained in the glasses region or hair region, it is possible to generate 3D point cloud data of a person with high accuracy.
 以上、本発明を実施形態に基づき説明した。この実施形態は例示であり、それらの各構成要素や各処理プロセスの組合せにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解されるところである。 The present invention has been described above based on an embodiment. This embodiment is merely an example, and those skilled in the art will understand that various modifications are possible in the combinations of its components and processing steps, and that such modifications also fall within the scope of the present invention.
 図10は、変形例に係る撮像装置1の構成例を示す機能ブロック図である。変形例では、図1に示した画像処理装置2の検出部24および補正部25の機能が、撮像装置1の処理部13内に組み込まれている。この場合、撮像装置1の処理部13は、画像処理装置2と同様の機能を有していると言える。また変形例では、撮像装置1の処理部13に点群データ生成部18が設けられ、点群データ生成部18は、補正部25により補正された距離画像と、カラー画像を組み合わせて3次元点群データを生成する。 FIG. 10 is a functional block diagram showing an example configuration of an imaging device 1 according to a modified example. In this modified example, the functions of the detection unit 24 and correction unit 25 of the image processing device 2 shown in FIG. 1 are incorporated into the processing unit 13 of the imaging device 1. In this case, it can be said that the processing unit 13 of the imaging device 1 has the same functions as the image processing device 2. Also, in this modified example, a point cloud data generation unit 18 is provided in the processing unit 13 of the imaging device 1, and the point cloud data generation unit 18 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 and a color image.
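The point cloud generation in this modified example could be sketched as a standard pinhole back-projection, assuming the corrected distance image is registered pixel-for-pixel to the color image and the camera intrinsics (fx, fy, cx, cy) are known; these assumptions and the function name are for illustration only.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, color: np.ndarray,
                         fx: float, fy: float, cx: float, cy: float):
    """Back-project a corrected depth image into a colored 3D point cloud.

    Returns an (N, 3) array of XYZ coordinates and an (N, 3) array of RGB values
    for every pixel with a valid (non-zero) depth value.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid].astype(np.float32)
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    xyz = np.stack([x, y, z], axis=1)
    rgb = color[valid]                           # colors aligned with the points
    return xyz, rgb
```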
 検出部24および補正部25の機能がソフトウェアで実装される場合、距離画像の補正処理を含む実施形態に係る画像処理プログラムが予め処理部13内にインストールされていてもよいし、事後的にインストールされてもよい。後者の場合、当該画像処理プログラムは、アプリケーションプログラムストアからネットワークを介して撮像装置1にダウンロードされてインストールされる。例えば、撮像装置1がスマートフォンの場合、当該画像処理プログラムを実行するハードウェア要件を十分に満たしている。 When the functions of the detection unit 24 and the correction unit 25 are implemented by software, an image processing program according to an embodiment including distance image correction processing may be pre-installed in the processing unit 13, or may be installed later. In the latter case, the image processing program is downloaded from an application program store via a network to the imaging device 1 and installed. For example, if the imaging device 1 is a smartphone, it sufficiently meets the hardware requirements for executing the image processing program.
 図3のフローチャートでは、眼鏡領域の異常なデプス値を補正する処理と、毛髪領域の異常なデプス値を補正する処理を両方、実行する例を説明した。この点、いずれか一方の補正処理のみを実行する形態も、本発明の一実施形態に含まれる。 The flowchart in FIG. 3 describes an example in which both a process for correcting abnormal depth values in the glasses region and a process for correcting abnormal depth values in the hair region are executed. In this regard, an embodiment of the present invention also includes a form in which only one of the correction processes is executed.
 本発明は、人物の3次元点群データの補正に利用可能である。 The present invention can be used to correct 3D point cloud data of a person.
1 撮像装置、 11 可視光撮像部、 12 測距センサ部、 13 処理部、 14 カラー画像データ生成部、 15 デプスデータ生成部、 16 圧縮符号化部、 17 送信部、 18 点群データ生成部、 2 画像処理装置、 21 受信部、 22 伸張復号部、 23 分離部、 24 検出部、 25 補正部、 26 点群データ生成部。 1 imaging device, 11 visible light imaging section, 12 distance measurement sensor section, 13 processing section, 14 color image data generation section, 15 depth data generation section, 16 compression encoding section, 17 transmission section, 18 point cloud data generation section, 2 image processing device, 21 receiving section, 22 decompression decoding section, 23 separation section, 24 detection section, 25 correction section, 26 point cloud data generation section.

Claims (5)

  1.  可視光撮像部で撮像されたカラー画像から人物領域と、前記人物領域内の注目領域を検出する検出部と、
     測距センサ部の出力をもとに生成された距離画像における、前記カラー画像の人物領域に対応する人物領域内の注目領域に含まれる異常なデプス値を、前記距離画像の人物領域内の正常なデプス値を使用して補正する補正部と、
     を備える画像処理装置。
    a detection unit that detects a person area and an attention area within the person area from a color image captured by a visible light imaging unit;
    a correction unit that corrects abnormal depth values included in an attention area in a person area corresponding to the person area of the color image in a distance image generated based on an output of a distance measurement sensor unit, by using normal depth values in the person area of the distance image;
    An image processing device comprising:
  2.  前記検出部は、前記カラー画像の前記人物領域内の注目領域として顔領域を検出し、当該顔領域内の眼鏡領域を検出し、
     前記補正部は、前記眼鏡領域を除いた前記顔領域のデプス値の最大値と最小値の範囲に収まらないデプス値を、前記異常なデプス値と判定する、
     請求項1に記載の画像処理装置。
    the detection unit detects a face area as a region of interest within the person area of the color image, and detects a glasses area within the face area;
    The correction unit determines, as the abnormal depth value, a depth value that does not fall within a range between a maximum value and a minimum value of the depth value of the face region excluding the glasses region.
    The image processing device according to claim 1 .
  3.  前記補正部は、前記眼鏡領域内の異常なデプス値を、隣接する複数の画素のデプス値をもとに補間、または水平方向にスプライン補間して補正する、
     請求項2に記載の画像処理装置。
    The correction unit corrects abnormal depth values in the glasses region by interpolating based on depth values of a plurality of adjacent pixels or by performing spline interpolation in a horizontal direction.
    The image processing device according to claim 2 .
  4.  前記検出部は、前記カラー画像の前記人物領域内の注目領域として毛髪領域を検出し、
     前記補正部は、前記毛髪領域内の異常なデプス値を、前記毛髪領域内の垂直方向の正常なデプス値をもとに補正する、
     請求項1に記載の画像処理装置。
    the detection unit detects a hair region as a region of interest within the person region of the color image;
    The correction unit corrects the abnormal depth value in the hair region based on a normal depth value in the vertical direction in the hair region.
    The image processing device according to claim 1 .
  5.  可視光撮像部で撮像されたカラー画像から人物領域と、前記人物領域内の注目領域を検出する処理と、
     測距センサ部の出力をもとに生成された距離画像における、前記カラー画像の人物領域に対応する人物領域内の注目領域に含まれる異常なデプス値を、前記距離画像の人物領域内の正常なデプス値を使用して補正する処理と、
     をコンピュータに実行させる画像処理プログラム。
    A process of detecting a person area and an attention area within the person area from a color image captured by a visible light imaging unit;
    a process of correcting an abnormal depth value included in a region of interest in a person region of the color image, in a distance image generated based on an output from a distance measurement sensor unit, by using a normal depth value in the person region of the distance image;
    An image processing program that causes a computer to execute the above.
PCT/JP2023/022307 2022-11-28 2023-06-15 Image processing device and image processing program WO2024116444A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022189136 2022-11-28
JP2022-189136 2022-11-28

Publications (1)

Publication Number Publication Date
WO2024116444A1 true WO2024116444A1 (en) 2024-06-06

Family

ID=91323218


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014070936A (en) * 2012-09-28 2014-04-21 Dainippon Screen Mfg Co Ltd Error pixel detecting apparatus, error pixel detecting method, and error pixel detecting program
US20170069071A1 (en) * 2015-09-04 2017-03-09 Electronics And Telecommunications Research Institute Apparatus and method for extracting person region based on red/green/blue-depth image
WO2020026677A1 (en) * 2018-07-31 2020-02-06 株式会社ニコン Detection device, processing device, detection method, and processing program
WO2020066637A1 (en) * 2018-09-28 2020-04-02 パナソニックIpマネジメント株式会社 Depth acquisition device, depth acquisition method, and program
CN111626086A (en) * 2019-02-28 2020-09-04 北京市商汤科技开发有限公司 Living body detection method, living body detection device, living body detection system, electronic device, and storage medium
