WO2024116444A1 - Image processing device and image processing program - Google Patents

Image processing device and image processing program

Info

Publication number
WO2024116444A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
area
person
image
depth value
Prior art date
Application number
PCT/JP2023/022307
Other languages
French (fr)
Japanese (ja)
Inventor
悦郎 籾山
Original Assignee
JVCKENWOOD Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JVCKENWOOD Corporation
Publication of WO2024116444A1 publication Critical patent/WO2024116444A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images

Definitions

  • the present invention relates to an image processing device and an image processing program that correct a distance image.
  • In recent years, RGB cameras equipped with ToF (Time of Flight) sensors have become popular.
  • With a camera equipped with a ToF sensor, depth data can be acquired from the ToF sensor, a color image can be acquired from the RGB camera, and 3D point cloud data can be generated by combining the two (see, for example, Patent Document 1).
  • the depth data obtained by the ToF sensor, which indicates the distance to the subject, uses reflected light from a laser diode (LD).
  • when the subject is a person, the LD light can be absorbed by black hair, or reflected by eyeglass lenses, by accessories made of glass or metal such as earrings and brooches, or by buttons. In such cases, accurate depth data may not be obtained. If 3D point cloud data is generated based on inaccurate depth data, missing or distorted data is likely to occur in areas of interest such as hair, glasses, or accessories.
  • This embodiment was made in consideration of these circumstances, and its purpose is to provide a technology for creating 3D point cloud data of a person with high accuracy.
  • an image processing device includes a detection unit that detects a person area and an attention area within the person area from a color image captured by a visible light imaging unit, and a correction unit that corrects, in a distance image generated based on the output of a distance measuring sensor unit, abnormal depth values contained in an attention area within the person area corresponding to the person area of the color image, using normal depth values within the person area of the distance image.
  • FIG. 1 is a functional block diagram showing an example of the configuration of an imaging device and an image processing device according to an embodiment.
  • FIG. 2 is a diagram illustrating an example of a color image and a distance image.
  • FIG. 3 is a flowchart showing the flow of distance image correction processing by the image processing device according to the embodiment.
  • FIG. 4 is a diagram showing a specific image of a process for extracting depth values of a person region in a distance image.
  • FIG. 5 is a diagram showing a specific image of a process for detecting a face region and an eyeglasses region from a color image.
  • FIG. 6 is a diagram showing a specific example of an interpolation process using adjacent pixels.
  • FIGS. 7(a) and 7(b) are diagrams showing a specific example of spline interpolation processing.
  • FIG. 8 is a diagram illustrating a specific example of an interpolation process from pixels in the vertical direction.
  • FIG. 9 is a diagram showing a distance image of a person area after eyeglass reflections and missing hair have been corrected.
  • FIG. 10 is a functional block diagram showing a configuration example of an imaging device according to a modified example.
  • the imaging device 1 is a device equipped with a visible light camera with a distance measuring sensor, and corresponds to a video camera, a surveillance camera, an in-vehicle camera, a camera mounted on a multicopter (drone), etc.
  • the image processing device 2 is a general-purpose information processing device (e.g., a PC, a server, a tablet, a smartphone) having an image processing function, or a dedicated device having an image processing function.
  • the imaging device 1 and the image processing device 2 are connected via a network such as the Internet or a dedicated line.
  • the imaging device 1 comprises a visible light imaging section 11, a distance measurement sensor section 12, and a processing section 13.
  • the visible light imaging section 11 includes a lens, a color filter of the three primary colors (RGB), and a solid-state imaging element.
  • a CMOS (Complementary Metal Oxide Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor can be used as the solid-state imaging element.
  • the solid-state imaging element converts light that is collected by the lens and transmitted through the color filter into an electrical color image signal and outputs it to the processing section 13.
  • the distance measurement sensor unit 12 acquires distance information from the distance measurement sensor unit 12 to the subject and outputs it to the processing unit 13. In this embodiment, the distance measurement sensor unit 12 acquires distance information using the ToF method.
  • the distance measurement sensor unit 12 includes a light source (e.g., LD) and a light receiving sensor (e.g., photodiode), and measures the distance to the subject based on the time difference between the emission timing of an infrared laser containing a predetermined pulse irradiated from the light source and the reception timing of the reflected light from the subject by the light receiving sensor.
  • IR pixels may be provided in the solid-state imaging element of the visible light imaging unit 11, and the IR pixels may be used as light receiving sensors.
  • the distance measurement sensor unit 12 obtains distance information within the angle of view of the visible light imaging unit 11 by optically or mechanically changing the irradiation angle of the laser light, or by diffusing the irradiation range of the laser light.
  • LiDAR (Laser Imaging Detection and Ranging) may be used as the distance measurement sensor unit 12.
  • the processing unit 13 includes a color image data generating unit 14, a depth data generating unit 15, a compression encoding unit 16, and a transmission unit 17.
  • the processing unit 13 is realized by a combination of hardware resources and software resources, or by hardware resources alone.
  • the hardware resources that can be used include a CPU, ROM, RAM, GPU (Graphics Processing Unit), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array), high-frequency circuits, and other LSIs.
  • the software resources that can be used include programs such as firmware.
  • the color image data generating unit 14 generates color image data based on the color image signal output from the visible light imaging unit 11.
  • the depth data generating unit 15 acquires the distance information output from the distance measuring sensor unit 12, and arranges it in the pixel array of the color image to generate a distance image (also called a depth map).
  • the compression-encoding unit 16 compresses and encodes the color image data generated by the color image data generation unit 14 and the depth data generated by the depth data generation unit 15 using a predetermined compression-encoding method (e.g., a compression-encoding method based on MPEG).
  • the transmission unit 17 is connected to a network via a wired or wireless connection, and transmits the compressed and encoded data to the image processing device 2 connected to the network.
  • the image processing device 2 includes a receiving unit 21, a decompression/decoding unit 22, a separating unit 23, a detecting unit 24, a correcting unit 25, and a point cloud data generating unit 26.
  • the functional blocks in the image processing device 2 are realized by a combination of hardware resources and software resources, or by hardware resources alone.
  • as hardware resources, a CPU, ROM, RAM, GPU, ASIC, FPGA, high-frequency circuits, and other LSIs can be used.
  • as software resources, programs such as an operating system and application programs can be used.
  • the receiving unit 21 receives the compressed and encoded data transmitted from the imaging device 1.
  • the decompression and decoding unit 22 decompresses and decodes the received compressed and encoded data.
  • the separating unit 23 outputs color image data and depth data, synchronizing with each frame.
  • the detection unit 24 detects a human region from within a color image.
  • the detection unit 24 can detect a human region using an open source object detection algorithm.
  • Detectron, which uses a deep neural network, can be used as a library of object detection algorithms.
  • the detection unit 24 detects a face region from within the person region of the detected color image.
  • the detection unit 24 detects a face region from within the person region, for example, using a cascade classifier.
  • for example, OpenCV provides a cascade classifier for human face detection that uses Haar-like features.
  • the detection unit 24 detects the glasses area from within the face area of the detected color image.
  • the detection unit 24 can detect the glasses area using an existing detection algorithm that uses deep learning to detect information about the eyes, such as the presence or absence of glasses and the position of the pupils.
  • the detection unit 24 can also detect the glasses area using a polarized camera.
  • a polarizing plate array is incorporated on top of a solid-state imaging element.
  • the polarizing plate array is composed of a matrix of unit pixel groups, each of which includes four pixels in four different orientations (0°, 45°, 90°, 135°).
  • the detection unit 24 obtains four types of light intensity from the polarized camera for each frame, and determines that an area within the face area that exhibits a uniform, flat light intensity is the glasses area.
  • the detection unit 24 can detect a hair region from within the face region of the detected color image. For example, the detection unit 24 detects the hair region using a hair region classifier. Note that the upper body part of the person region (excluding the glasses region) may be set as a hair region as it is a region where hair may be present.
  • the correction unit 25 corrects abnormal depth values contained in the glasses area or hair area in the person area of the distance image that corresponds to the person area of the color image, using normal depth values in the person area of the distance image.
  • the point cloud data generating unit 26 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 with the color image.
  • the three-dimensional point cloud data is composed of three-dimensional coordinate values (X, Y, Z) including a depth value (Z) and color information (R, G, B) for each pixel.
  • the three-dimensional image based on the generated three-dimensional point cloud data can be displayed on a VR device or an AR device, for example. It is also possible to display a three-dimensional image in real time that shows a person being photographed by the imaging device 1.
  • note that in FIG. 1, the imaging device 1 transmits compressed and encoded data of color image data and depth data to the image processing device 2, but the imaging device 1 may generate three-dimensional point cloud data and transmit the compressed and encoded data of the three-dimensional point cloud data to the image processing device 2.
  • the image processing device 2 decompresses and decodes the three-dimensional point cloud data, and then restores the color image data and depth data from the three-dimensional point cloud data.
  • reflection of LD light by eyeglasses and absorption of LD light by black hair can prevent normal 3D point cloud data from being obtained, resulting in reproducibility that differs from the original shape.
  • FIG. 2 shows an example of a color image C1 and a distance image D1.
  • in the distance image D1, a portion G1 of the eyeglass lens reflects strongly.
  • in a three-dimensional image based on this distance image, the portion corresponding to the lens portion G1 either protrudes forward or is missing. Also, part of the hair portion H1 may be missing.
  • FIG. 3 is a flowchart showing the flow of distance image correction processing by the image processing device 2 according to the embodiment.
  • the detection unit 24 searches for a person area in the color image (step S10). If a person area is not detected (step S11: NO), the correction processing for the target frame ends. If a person area is detected (step S11: YES), the detection unit 24 extracts the person area from the color image (step S12). The correction unit 25 extracts depth values of pixels included in an area corresponding to the person area in the color image from the distance image (step S13).
  • Figure 4 shows a specific image diagram of the process of extracting depth values of a person region Pd in a distance image.
  • a person region P1 is detected in a color image C1, the background is separated, and only the person region P1 is extracted (see C1a). From the distance image D1, the depth value of only the corresponding person region Pd is extracted (see D1a).
  • the detection unit 24 detects a face region from the color image (step S14).
  • the detection unit 24 detects a glasses region from the face region (step S15).
  • FIG. 5 shows a specific image diagram of the process of detecting the face region F1 and glasses regions Er and El from the color image C1. Note that if glasses cannot be detected, the process of steps S15 to S22 is skipped. Note that if glasses cannot be detected, the pupil region may be detected instead of detecting glasses.
  • the correction unit 25 identifies the maximum and minimum depth values of the pixels contained in the corresponding face region (excluding the glasses region) of the distance image (step S16).
  • for example, the depth value takes a value in the range of 0 to 255 or 0 to 65535, where 0 indicates the closest distance and a maximum value such as 255 or 65535 indicates the furthest distance.
  • pixels with missing data also have a depth value of 0.
  • when the distance image is displayed as a grayscale image, 0 is displayed as black and the maximum value is displayed as white. Note that the value indicating the closest distance may be assigned to 0, and the value indicating the furthest distance may be assigned to the maximum value.
  • the correction unit 25 sets an initial value to the target pixel in the glasses region (step S17). Usually, the initial value is set to the top left pixel of the glasses region.
  • the correction unit 25 determines whether checking of all pixels in the glasses region has been completed (step S18). If checking of all pixels in the glasses region has been completed (step S18: YES), the process proceeds to step S23.
  • if checking of all pixels in the glasses area has not been completed (step S18: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S19). If the depth value of the target pixel is 0 (step S19: YES), the process proceeds to step S21.
  • if the depth value of the target pixel is not 0 (step S19: NO), the correction unit 25 determines whether the depth value of the target pixel falls within the range between the maximum and minimum depth values of the face area (excluding the glasses area) (hereinafter referred to as the normal range) (step S20). If it falls within the normal range (step S20: YES), the process proceeds to step S22.
  • if it does not fall within the normal range (step S20: NO), or if the depth value is 0 (step S19: YES), the correction unit 25 corrects the depth value of the target pixel using normal depth values of the glasses area (step S21). Pixels with a depth value smaller than the minimum value protrude forward from the surface of the face, causing the eyes to pop out in the 3D image. Conversely, pixels with a depth value larger than the maximum value sink back behind the surface of the face, causing the eyes to appear sunken in the 3D image. In the distance image D1a shown in FIG. 4, the depth values of the pixels corresponding to the lens portion Gd are smaller than the minimum value, and a 3D image generated from this 3D point cloud data as is would show the eyes popping out.
  • the correction unit 25 interpolates the depth value of a target pixel in the glasses area that is outside the normal range based on the depth values of multiple adjacent pixels.
  • Figure 6 is a diagram showing a specific example of the interpolation process from adjacent pixels.
  • in FIG. 6, pixels G, H, I, L, M, and N indicate pixels whose depth values are not within the normal range.
  • the correction unit 25 assigns to the target pixel G the average value of the depth value of pixel A at the top left, the depth value of pixel B directly above, the depth value of pixel C at the top right, the depth value of pixel F on the left, and the depth value of pixel K at the bottom left.
  • the median may be assigned instead of the average value.
  • the correction unit 25 assigns to the target pixel H the average of the depth value of pixel B at the top left, the depth value of pixel C directly above, the depth value of pixel D at the top right, and the depth value of pixel G on the left.
  • the interpolated pixel G is used as the reference, and the non-interpolated pixel L is excluded from the reference. Note that the range of pixels to be referenced is not limited to the example shown in FIG. 6, and pixels to the right and below may also be included in the reference source. Furthermore, if a pixel at the same position in the glasses area of a frame adjacent in the time direction is normal, the correction unit 25 may assign the depth value of that normal pixel to the target pixel.
  • the correction unit 25 may also correct the depth values of target pixels in the glasses region that are outside the normal range by performing spline interpolation in the horizontal direction.
  • Figures 7(a) and 7(b) are diagrams showing a specific example of spline interpolation processing.
  • Figure 7(a) is a diagram plotting depth values of a horizontal line in the glasses region.
  • Figure 7(a) includes a portion where the depth value is outside the normal range (a part of the lens Gd).
  • the correction unit 25 calculates the function of the section connecting the front and back of the portion that is outside the normal range by spline interpolation, and assigns the depth value of the portion that is outside the normal range onto the calculated spline curve function (generally calculated using a cubic polynomial).
  • Figure 7(b) shows how the depth value that is outside the normal range is assigned onto the spline curve.
  • when correction of the target pixel is completed, or when its depth value is normal, the correction unit 25 increments the target pixel in the glasses area (step S22). Specifically, the address of the target pixel in the glasses area advances by one pixel in the scanning direction, and the process returns to step S18.
  • the detection unit 24 extracts the hair region from the person region of the color image (step S23).
  • the correction unit 25 sets an initial value to the target pixel in the hair region (step S24). Typically, the initial value is set to the top left pixel of the hair region.
  • the correction unit 25 determines whether checking of all pixels in the hair region has been completed (step S25).
  • if checking of all pixels in the hair region has not been completed (step S25: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S26). If the depth value of the target pixel is not 0 (step S26: NO), the process proceeds to step S28. If the depth value of the target pixel is 0 (step S26: YES), the correction unit 25 corrects the depth value of the target pixel using a normal depth value in the hair region (step S27).
  • Figure 8 shows a specific example of the interpolation process from pixels in the vertical direction.
  • in FIG. 8, the top group of darkest pixels indicates an area where no depth value exists (background area B1), the white pixels below indicate the missing portion Hm of the hair area, and the bottom group of medium-colored pixels indicates the hair area Hn that has depth values.
  • the correction unit 25 assigns the depth value of a pixel that has a non-missing depth value to a pixel that has a missing depth value in each vertical line of the hair region.
  • the depth values of pixels A, B, C, D, E, and F are missing, and the correction unit 25 assigns the depth value of pixel G, which has a non-missing depth value, to pixels A, B, C, D, E, and F.
  • the correction unit 25 assigns the depth value of the non-missing pixel on the top of the head to the missing pixel on the forehead.
  • the correction unit 25 may apply a filter (e.g., a moving average) to each horizontal line to smooth out changes in horizontal depth values.
  • the correction unit 25 may correct the missing target pixel by assigning the depth value of the non-missing pixel to the missing target pixel.
  • even a pixel in the hair region whose depth value is not missing may be subject to correction if it has an abnormal depth value.
  • pixels with depth values that do not fall within the normal range of depth values in the face region may also be included as subjects of correction.
  • when correction of the target pixel is completed, or when its depth value is not missing, the correction unit 25 increments the target pixel in the hair region (step S28). Specifically, the address of the target pixel in the hair region advances by one pixel in the scanning direction, and the process returns to step S25. When checking of all pixels in the hair region has been completed (step S25: YES), the correction processing for the target frame ends.
  • Figure 9 shows a distance image D1b of the person area Pd after the reflection from the glasses and missing hair have been corrected. In a 3D image based on the corrected distance image, popping out eyes and missing hair are avoided.
  • FIG. 10 is a functional block diagram showing an example configuration of an imaging device 1 according to a modified example.
  • the functions of the detection unit 24 and correction unit 25 of the image processing device 2 shown in FIG. 1 are incorporated into the processing unit 13 of the imaging device 1.
  • the processing unit 13 of the imaging device 1 has the same functions as the image processing device 2.
  • a point cloud data generation unit 18 is provided in the processing unit 13 of the imaging device 1, and the point cloud data generation unit 18 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 and a color image.
  • an image processing program may be pre-installed in the processing unit 13, or may be installed later.
  • the image processing program is downloaded from an application program store via a network to the imaging device 1 and installed. For example, if the imaging device 1 is a smartphone, it sufficiently meets the hardware requirements for executing the image processing program.
  • FIG. 3 describes an example in which both a process for correcting abnormal depth values in the glasses region and a process for correcting abnormal depth values in the hair region are executed.
  • an embodiment of the present invention also includes a form in which only one of the correction processes is executed.
  • the present invention can be used to correct 3D point cloud data of a person.
  • 1 imaging device 11 visible light imaging section, 12 distance measurement sensor section, 13 processing section, 14 color image data generation section, 15 depth data generation section, 16 compression encoding section, 17 transmission section, 18 point cloud data generation section, 2 image processing device, 21 receiving section, 22 decompression decoding section, 23 separation section, 24 detection section, 25 correction section, 26 point cloud data generation section.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A detection unit 24 of an image processing device 2 detects a person area and an area of interest within the person area from a color image captured by a visible light imaging unit 11. A correction unit 25 corrects an abnormal depth value contained in an area of interest within a person area of a distance image generated on the basis of the output of a range sensor unit 12, the person area corresponding to the person area of the color image, by using a normal depth value within the person area of the distance image.

Description

Image processing device and image processing program

The present invention relates to an image processing device and an image processing program that correct a distance image.

In recent years, RGB cameras equipped with ToF (Time of Flight) sensors have become popular. With a camera equipped with a ToF sensor, depth data can be acquired from the ToF sensor, a color image can be acquired from the RGB camera, and 3D point cloud data can be generated by combining the two (see, for example, Patent Document 1).

JP 2014-157044 A

The depth data obtained by the ToF sensor, which indicates the distance to the subject, uses reflected light from a laser diode (LD). When the subject is a person, the LD light can be absorbed by black hair, or reflected by eyeglass lenses, by accessories made of glass or metal such as earrings and brooches, or by buttons. In such cases, accurate depth data may not be obtained. If 3D point cloud data is generated based on inaccurate depth data, missing or distorted data is likely to occur in areas of interest such as hair, glasses, or accessories.

This embodiment was made in consideration of these circumstances, and its purpose is to provide a technology for creating 3D point cloud data of a person with high accuracy.

In order to solve the above problem, an image processing device according to one aspect of this embodiment includes a detection unit that detects a person area and an area of interest within the person area from a color image captured by a visible light imaging unit, and a correction unit that corrects, in a distance image generated based on the output of a distance measurement sensor unit, abnormal depth values contained in the area of interest within a person area corresponding to the person area of the color image, using normal depth values within the person area of the distance image.

In addition, any combination of the above components, and conversions of the expressions of this embodiment between methods, devices, systems, recording media, computer programs, and the like, are also valid aspects of this embodiment.

According to this embodiment, it is possible to create 3D point cloud data of a person with high accuracy.
FIG. 1 is a functional block diagram showing an example of the configuration of an imaging device and an image processing device according to an embodiment.
FIG. 2 is a diagram showing an example of a color image and a distance image.
FIG. 3 is a flowchart showing the flow of distance image correction processing by the image processing device according to the embodiment.
FIG. 4 is a diagram showing a specific image of the process of extracting depth values of a person region in a distance image.
FIG. 5 is a diagram showing a specific image of the process of detecting a face region and eyeglass regions from a color image.
FIG. 6 is a diagram showing a specific example of an interpolation process from adjacent pixels.
FIGS. 7(a) and 7(b) are diagrams showing a specific example of spline interpolation processing.
FIG. 8 is a diagram showing a specific example of an interpolation process from pixels in the vertical direction.
FIG. 9 is a diagram showing a distance image of a person region after eyeglass reflections and missing hair have been corrected.
FIG. 10 is a functional block diagram showing a configuration example of an imaging device according to a modified example.
In the following, glasses and hair are used as examples of areas of interest.

FIG. 1 is a functional block diagram showing an example of the configuration of an imaging device 1 and an image processing device 2 according to an embodiment. The imaging device 1 is a device equipped with a visible light camera with a distance measuring sensor, and corresponds to a video camera, a surveillance camera, an in-vehicle camera, a camera mounted on a multicopter (drone), and the like. The image processing device 2 is a general-purpose information processing device (e.g., a PC, a server, a tablet, a smartphone) having an image processing function, or a dedicated device having an image processing function. The imaging device 1 and the image processing device 2 are connected via a network such as the Internet or a dedicated line.
The imaging device 1 includes a visible light imaging unit 11, a distance measurement sensor unit 12, and a processing unit 13. The visible light imaging unit 11 includes a lens, color filters of the three primary colors (RGB), and a solid-state imaging element. For example, a CMOS (Complementary Metal Oxide Semiconductor) image sensor or a CCD (Charge Coupled Device) image sensor can be used as the solid-state imaging element. The solid-state imaging element converts light that is collected by the lens and transmitted through the color filters into an electrical color image signal and outputs it to the processing unit 13.
The distance measurement sensor unit 12 acquires distance information from the distance measurement sensor unit 12 to the subject and outputs it to the processing unit 13. In this embodiment, the distance measurement sensor unit 12 acquires the distance information using the ToF method. The distance measurement sensor unit 12 includes a light source (e.g., an LD) and a light receiving sensor (e.g., a photodiode), and measures the distance to the subject based on the time difference between the emission timing of an infrared laser containing a predetermined pulse emitted from the light source and the reception timing, at the light receiving sensor, of the reflected light from the subject.
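As a reference, the relationship between that time difference and the measured distance can be sketched as follows. This is a minimal illustration of the pulse ToF calculation; the function and variable names are illustrative and do not appear in the publication.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0  # propagation speed of the laser pulse (approx., in vacuum/air)

def tof_distance_m(emit_time_s: float, receive_time_s: float) -> float:
    """Distance to the subject derived from the round-trip time of a reflected pulse."""
    round_trip_s = receive_time_s - emit_time_s
    # The pulse travels to the subject and back, so half of the round-trip path is the distance.
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0

# Example: a round trip of 10 ns corresponds to roughly 1.5 m.
print(tof_distance_m(0.0, 10e-9))
```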
Note that IR pixels may be provided in the solid-state imaging element of the visible light imaging unit 11, and the IR pixels may be used as the light receiving sensor. The distance measurement sensor unit 12 obtains distance information within the angle of view of the visible light imaging unit 11 by optically or mechanically changing the irradiation angle of the laser light, or by diffusing the irradiation range of the laser light. LiDAR (Laser Imaging Detection and Ranging) may also be used as the distance measurement sensor unit 12.

The processing unit 13 includes a color image data generation unit 14, a depth data generation unit 15, a compression encoding unit 16, and a transmission unit 17. The processing unit 13 is realized by cooperation of hardware resources and software resources, or by hardware resources alone. As hardware resources, a CPU, ROM, RAM, GPU (Graphics Processing Unit), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array), high-frequency circuits, and other LSIs can be used. As software resources, programs such as firmware can be used.

The color image data generation unit 14 generates color image data based on the color image signal output from the visible light imaging unit 11. The color image data is composed of, for example, 24-bit color data (R = 8 bits, G = 8 bits, B = 8 bits) for each pixel. The depth data generation unit 15 acquires the distance information output from the distance measurement sensor unit 12, and arranges it in the pixel array of the color image to generate a distance image (also called a depth map).

The compression encoding unit 16 compresses and encodes the color image data generated by the color image data generation unit 14 and the depth data generated by the depth data generation unit 15 using a predetermined compression encoding method (e.g., a compression encoding method based on MPEG). The transmission unit 17 connects to a network via a wired or wireless connection, and transmits the compressed and encoded data to the image processing device 2 connected to the network.
The image processing device 2 includes a receiving unit 21, a decompression/decoding unit 22, a separation unit 23, a detection unit 24, a correction unit 25, and a point cloud data generation unit 26. These functional blocks of the image processing device 2 are realized by cooperation of hardware resources and software resources, or by hardware resources alone. As hardware resources, a CPU, ROM, RAM, GPU, ASIC, FPGA, high-frequency circuits, and other LSIs can be used. As software resources, programs such as an operating system and application programs can be used.

The receiving unit 21 receives the compressed and encoded data transmitted from the imaging device 1. The decompression/decoding unit 22 decompresses and decodes the received compressed and encoded data. The separation unit 23 outputs color image data and depth data, synchronized for each frame.

The detection unit 24 detects a person region from within the color image. The detection unit 24 can detect the person region using an open-source object detection algorithm. For example, Detectron, which uses a deep neural network, can be used as a library of object detection algorithms.
The detection unit 24 detects a face region from within the person region of the color image. The detection unit 24 detects the face region from within the person region using, for example, a cascade classifier. For example, OpenCV provides a cascade classifier for human face detection that uses Haar-like features.
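As a rough illustration of this kind of cascade-classifier face detection, the following sketch uses OpenCV's bundled Haar cascade for frontal faces. It is not the publication's implementation; the function name and the assumption that a cropped person image is already available are ours.

```python
import cv2

def detect_face_regions(person_bgr):
    """Return (x, y, w, h) boxes of candidate face regions in a cropped person image."""
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    face_cascade = cv2.CascadeClassifier(cascade_path)
    gray = cv2.cvtColor(person_bgr, cv2.COLOR_BGR2GRAY)
    # Haar-like features are evaluated over a sliding window at multiple scales.
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```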
The detection unit 24 detects a glasses region from within the face region of the color image. The detection unit 24 can detect the glasses region using an existing detection algorithm that uses deep learning to detect information about the eyes, such as the presence or absence of glasses and the position of the pupils.

The detection unit 24 can also detect the glasses region using a polarization camera. In a polarization camera, a polarizer array is incorporated on top of the solid-state imaging element. The polarizer array is composed of unit pixel groups arranged in a matrix, each group including four pixels with four different polarization orientations (0°, 45°, 90°, 135°). The detection unit 24 obtains the four types of light intensity from the polarization camera for each frame, and determines that an area within the face region exhibiting a uniform, flat light intensity is the glasses region.

The detection unit 24 can detect a hair region from within the face region of the color image. For example, the detection unit 24 detects the hair region using a hair region classifier. Note that the upper body part of the person region (excluding the glasses region) may be set as the hair region, as a region where hair may be present.

The correction unit 25 corrects abnormal depth values contained in the glasses region or the hair region within the person region of the distance image that corresponds to the person region of the color image, using normal depth values within the person region of the distance image.
The point cloud data generation unit 26 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 with the color image. The three-dimensional point cloud data is composed of three-dimensional coordinate values (X, Y, Z) including a depth value (Z), and color information (R, G, B) for each pixel. A three-dimensional image based on the generated three-dimensional point cloud data can be displayed on a VR device or an AR device, for example. It is also possible to display, in real time, a three-dimensional image showing a person being photographed by the imaging device 1. Note that FIG. 1 describes an example in which the imaging device 1 transmits compressed and encoded color image data and depth data to the image processing device 2, but the imaging device 1 may instead generate the three-dimensional point cloud data and transmit compressed and encoded three-dimensional point cloud data to the image processing device 2. In that case, the image processing device 2 decompresses and decodes the three-dimensional point cloud data, and then restores the color image data and depth data from the three-dimensional point cloud data.
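The combination of a corrected depth map and a color image into (X, Y, Z, R, G, B) points can be sketched as follows, assuming a pinhole camera model. The intrinsic parameters fx, fy, cx, cy are illustrative assumptions and are not specified in the publication.

```python
import numpy as np

def depth_to_point_cloud(depth, color_rgb, fx, fy, cx, cy):
    """Back-project a depth map into an (N, 6) array of (X, Y, Z, R, G, B) points.

    A pinhole camera model is assumed; fx, fy, cx, cy are illustrative intrinsics.
    Pixels with depth 0 (missing data) are skipped.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    valid = z > 0
    x = (us - cx) * z / fx
    y = (vs - cy) * z / fy
    xyz = np.stack([x[valid], y[valid], z[valid]], axis=1)
    rgb = color_rgb[valid].astype(np.float64)
    return np.hstack([xyz, rgb])
```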
When measuring the distance to a person with a ToF sensor, reflection of the LD light by eyeglasses and absorption of the LD light by black hair can prevent normal 3D point cloud data from being obtained, so that the reproduced shape may differ from the original shape.

FIG. 2 shows an example of a color image C1 and a distance image D1. In the distance image D1, a portion G1 of the eyeglass lens reflects strongly. In a three-dimensional image based on this distance image D1, the portion corresponding to the lens portion G1 either protrudes forward or is missing. Also, part of the hair portion H1 may be missing.

FIG. 3 is a flowchart showing the flow of the distance image correction processing by the image processing device 2 according to the embodiment. The detection unit 24 searches for a person region in the color image (step S10). If a person region is not detected (step S11: NO), the correction processing for the target frame ends. If a person region is detected (step S11: YES), the detection unit 24 extracts the person region from the color image (step S12). The correction unit 25 extracts, from the distance image, the depth values of the pixels included in the region corresponding to the person region of the color image (step S13).
FIG. 4 shows a specific image of the process of extracting the depth values of the person region Pd in the distance image. A person region P1 is detected in the color image C1, the background is separated, and only the person region P1 is extracted (see C1a). From the distance image D1, the depth values of only the corresponding person region Pd are extracted (see D1a).
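A minimal sketch of the extraction in step S13, assuming the person region is available as a boolean mask aligned pixel-for-pixel with the distance image; the mask itself would come from the segmentation of the color image, and the names are illustrative.

```python
import numpy as np

def extract_person_depth(depth, person_mask):
    """Keep depth values only inside the person region; everything else becomes 0.

    `person_mask` is assumed to be a boolean array aligned with the depth map.
    """
    person_depth = np.zeros_like(depth)
    person_depth[person_mask] = depth[person_mask]
    return person_depth
```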
Returning to FIG. 3, the detection unit 24 detects a face region from the color image (step S14). The detection unit 24 detects a glasses region from the face region (step S15). FIG. 5 shows a specific image of the process of detecting the face region F1 and the glasses regions Er and El from the color image C1. Note that if glasses cannot be detected, the processing of steps S15 to S22 is skipped. Alternatively, if glasses cannot be detected, a pupil region may be detected instead of the glasses.
Returning to FIG. 3, the correction unit 25 identifies the maximum and minimum depth values of the pixels contained in the corresponding face region (excluding the glasses region) of the distance image (step S16). For example, the depth value takes a value in the range of 0 to 255 or 0 to 65535, where 0 indicates the closest distance and a maximum value such as 255 or 65535 indicates the furthest distance. Pixels with missing data also have a depth value of 0. When the distance image is displayed as a grayscale image, 0 is displayed as black and the maximum value is displayed as white. Note that the value indicating the closest distance may be assigned to 0, and the value indicating the furthest distance may be assigned to the maximum value.
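The normal range of step S16 and the checks of steps S19 and S20 can be sketched as follows; the boolean masks and function names are assumptions for illustration, not the publication's code.

```python
import numpy as np

def face_depth_range(depth, face_mask, glasses_mask):
    """Min/max ('normal range') of valid depth values in the face region, glasses excluded."""
    reference = face_mask & ~glasses_mask & (depth > 0)
    values = depth[reference]
    return int(values.min()), int(values.max())

def is_abnormal(depth_value, normal_min, normal_max):
    # Depth 0 means missing data (step S19); values outside the face range would
    # protrude from or sink behind the face surface (step S20).
    return depth_value == 0 or not (normal_min <= depth_value <= normal_max)
```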
The correction unit 25 sets an initial value for the target pixel of the glasses region (step S17). Usually, the initial value is set to the top-left pixel of the glasses region. The correction unit 25 determines whether checking of all pixels in the glasses region has been completed (step S18). If checking of all pixels in the glasses region has been completed (step S18: YES), the process proceeds to step S23.

If checking of all pixels in the glasses region has not been completed (step S18: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S19). If the depth value of the target pixel is 0 (step S19: YES), the process proceeds to step S21.

If the depth value of the target pixel is not 0 (step S19: NO), the correction unit 25 determines whether the depth value of the target pixel falls within the range between the maximum and minimum depth values of the face region (excluding the glasses region) (hereinafter referred to as the normal range) (step S20). If it falls within the normal range (step S20: YES), the process proceeds to step S22.

If it does not fall within the normal range (step S20: NO), or if the depth value is 0 (step S19: YES), the correction unit 25 corrects the depth value of the target pixel using normal depth values of the glasses region (step S21). Pixels with a depth value smaller than the minimum value protrude forward from the surface of the face, causing the eyes to pop out in the 3D image. Conversely, pixels with a depth value larger than the maximum value sink back behind the surface of the face, causing the eyes to appear sunken in the 3D image. In the distance image D1a shown in FIG. 4, the depth values of the pixels corresponding to the lens portion Gd are smaller than the minimum value, and a 3D image generated from this 3D point cloud data as is would show the eyes popping out.
For example, the correction unit 25 interpolates the depth value of a target pixel in the glasses region that is outside the normal range based on the depth values of multiple adjacent pixels. FIG. 6 is a diagram showing a specific example of the interpolation process from adjacent pixels. In FIG. 6, pixels G, H, I, L, M, and N are pixels whose depth values are not within the normal range. The correction unit 25 assigns to the target pixel G the average of the depth value of pixel A at the top left, the depth value of pixel B directly above, the depth value of pixel C at the top right, the depth value of pixel F on the left, and the depth value of pixel K at the bottom left. Note that the median may be assigned instead of the average.

The correction unit 25 assigns to the target pixel H the average of the depth value of pixel B at the top left, the depth value of pixel C directly above, the depth value of pixel D at the top right, and the depth value of pixel G on the left. In this example, the already-interpolated pixel G is used as a reference, and the not-yet-interpolated pixel L is excluded from the reference. Note that the range of pixels to be referenced is not limited to the example shown in FIG. 6, and pixels to the right and below may also be included as reference sources. Furthermore, if the pixel at the same position in the glasses region of a temporally adjacent frame is normal, the correction unit 25 may assign the depth value of that normal pixel to the target pixel.
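A sketch of this neighbor-averaging interpolation, scanning the glasses region in raster order and referring to the upper-left, upper, upper-right, left, and lower-left neighbors as in the example of FIG. 6. Already-corrected neighbors count as valid references; the function and mask names are illustrative assumptions.

```python
import numpy as np

# Offsets of the referenced neighbors (dy, dx): upper-left, up, upper-right, left, lower-left.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (1, -1)]

def interpolate_glasses_region(depth, glasses_mask, normal_min, normal_max):
    """Replace abnormal depth values in the glasses region by the mean of valid neighbors."""
    corrected = depth.copy()
    h, w = depth.shape
    for y, x in np.argwhere(glasses_mask):  # argwhere yields pixels in raster (scanning) order
        v = corrected[y, x]
        if v != 0 and normal_min <= v <= normal_max:
            continue  # depth value is already normal
        samples = []
        for dy, dx in NEIGHBOR_OFFSETS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nv = corrected[ny, nx]
                if nv != 0 and normal_min <= nv <= normal_max:
                    samples.append(nv)
        if samples:
            corrected[y, x] = int(np.mean(samples))  # np.median(samples) is the alternative
    return corrected
```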
The correction unit 25 may also correct the depth values of target pixels in the glasses region that are outside the normal range by spline interpolation in the horizontal direction. FIGS. 7(a) and 7(b) are diagrams showing a specific example of the spline interpolation processing. FIG. 7(a) is a plot of the depth values of one horizontal line in the glasses region, and it includes a portion where the depth values are outside the normal range (the lens portion Gd). The correction unit 25 calculates, by spline interpolation, the function of the section connecting the points before and after the out-of-range portion (generally obtained as a cubic polynomial), and assigns depth values on the calculated spline curve to the out-of-range portion. FIG. 7(b) shows how the out-of-range depth values are assigned onto the spline curve.
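A sketch of the horizontal spline correction for one line, using SciPy's cubic spline as a stand-in for the cubic-polynomial spline mentioned above; the array names and the minimum-support guard are assumptions.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def spline_correct_line(line_depth, abnormal):
    """Re-assign abnormal depth values on one horizontal line from a cubic spline.

    `line_depth` is a 1-D array of depth values; `abnormal` is a boolean array of the
    same length marking out-of-range or missing pixels. The spline is fitted to the
    remaining (normal) samples and evaluated at the abnormal positions.
    """
    x = np.arange(len(line_depth))
    good = ~abnormal
    if good.sum() < 4:
        return line_depth  # not enough support points to fit a cubic spline
    spline = CubicSpline(x[good], line_depth[good])
    corrected = line_depth.astype(np.float64)
    corrected[abnormal] = spline(x[abnormal])
    return corrected
```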
Returning to FIG. 3, when correction of the target pixel in the glasses region is completed, or when the depth value of the target pixel is normal, the correction unit 25 increments the target pixel of the glasses region (step S22). Specifically, the address of the target pixel in the glasses region advances by one pixel in the scanning direction, and the process returns to step S18.

When checking of all pixels in the glasses region has been completed (step S18: YES), the detection unit 24 extracts the hair region from the person region of the color image (step S23). The correction unit 25 sets an initial value for the target pixel of the hair region (step S24). Typically, the initial value is set to the top-left pixel of the hair region. The correction unit 25 determines whether checking of all pixels in the hair region has been completed (step S25).

If checking of all pixels in the hair region has not been completed (step S25: NO), the correction unit 25 determines whether the depth value of the target pixel is 0 (step S26). If the depth value of the target pixel is not 0 (step S26: NO), the process proceeds to step S28. If the depth value of the target pixel is 0 (step S26: YES), the correction unit 25 corrects the depth value of the target pixel using normal depth values in the hair region (step S27).

FIG. 8 shows a specific example of the interpolation process from pixels in the vertical direction. In FIG. 8, the top group of darkest pixels indicates an area where no depth value exists (background area B1), the white pixels below indicate the missing portion Hm of the hair region (see FIG. 4), and the bottom group of medium-colored pixels indicates the hair region Hn that has depth values.
The correction unit 25 assigns, in each vertical line of the hair region, the depth value of a pixel whose depth value is not missing to the pixels whose depth values are missing. In the example shown in FIG. 8, the depth values of pixels A, B, C, D, E, and F are missing, and the correction unit 25 assigns the depth value of pixel G, whose depth value is not missing, to pixels A, B, C, D, E, and F. Note that if the depth values on the crown side are not missing and the depth values on the forehead side are missing, the correction unit 25 assigns the depth value of a non-missing pixel on the crown side to the missing pixels on the forehead side.
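A sketch of this vertical fill, assigning to each missing hair pixel the depth value of the nearest non-missing hair pixel below it in the same column, as in the example of FIG. 8; the hair mask and the fill direction are assumptions for illustration.

```python
import numpy as np

def fill_hair_column_gaps(depth, hair_mask):
    """Fill missing (0) depth values in each vertical line of the hair region.

    For every column, missing hair pixels receive the depth value of the nearest
    non-missing hair pixel below them.
    """
    corrected = depth.copy()
    h, w = depth.shape
    for x in range(w):
        last_valid = 0
        for y in range(h - 1, -1, -1):  # scan each column from the bottom upward
            if hair_mask[y, x]:
                if corrected[y, x] != 0:
                    last_valid = corrected[y, x]
                elif last_valid != 0:
                    corrected[y, x] = last_valid
    return corrected
```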
After correction of all vertical lines in the hair region of the target frame is completed, the correction unit 25 may apply a filter (e.g., a moving average) to each horizontal line to smooth the changes in depth values in the horizontal direction.
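A sketch of this optional horizontal smoothing with a simple moving average; the window size is an arbitrary choice and not taken from the publication.

```python
import numpy as np

def smooth_rows(depth, window=5):
    """Apply a simple moving average along each horizontal line of a depth map."""
    kernel = np.ones(window) / window
    smoothed = depth.astype(np.float64)
    for y in range(depth.shape[0]):
        # mode="same" keeps the line length; values near the edges are influenced by zero padding.
        smoothed[y] = np.convolve(smoothed[y], kernel, mode="same")
    return smoothed
```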
 なお、時間方向に隣接するフレームの毛髪領域内の同一位置の画素が欠落していない場合、補正部25は、当該欠落していない画素のデプス値を、欠落している対象画素に割り当てることで、対象画素を補正してもよい。 In addition, if there is no missing pixel at the same position in the hair region of an adjacent frame in the time direction, the correction unit 25 may correct the missing target pixel by assigning the depth value of the non-missing pixel to the missing target pixel.
 なお、毛髪領域内においてデプス値が欠落していない画素であっても、異常なデプス値を有する画素である場合、補正の対象としてもよい。例えば、眼鏡領域と同様に、顔領域のデプス値の正常範囲に収まらないデプス値を有する画素も補正の対象に含めてもよい。 Note that even if a pixel in the hair region has no missing depth values, it may be subject to correction if it has an abnormal depth value. For example, like the glasses region, pixels with depth values that do not fall within the normal range of depth values in the face region may also be included as subjects of correction.
 図3に戻る。毛髪領域の対象画素の補正が終了、または対象画素のデプス値が欠落していない場合、補正部25は毛髪領域の対象画素をインクリメントする(ステップS28)。具体的には毛髪領域の対象画素のアドレスを走査方向に一画素、進める。ステップS25に遷移する。毛髪領域の全画素のチェックが終了すると(ステップS25:YES)、対象フレームの補正処理を終了する。 Return to FIG. 3. When correction of the target pixel in the hair region is complete, or when the depth value of the target pixel is not missing, correction unit 25 increments the target pixel in the hair region (step S28). Specifically, the address of the target pixel in the hair region advances by one pixel in the scanning direction. Then, proceed to step S25. When checking of all pixels in the hair region is complete (step S25: YES), the correction process for the target frame ends.
 図9は、眼鏡の反射と毛髪の欠落が補正された後の人物領域Pdの距離画像D1bを示す。補正後の距離画像に基づく3次元画像では、目の飛び出しや毛髪の一部が欠落することが回避される。 Figure 9 shows a distance image D1b of the person area Pd after the reflection from the glasses and missing hair have been corrected. In a 3D image based on the corrected distance image, popping out eyes and missing hair are avoided.
 以上説明したように本実施形態によれば、眼鏡領域または毛髪領域に含まれる異常なデプス値を補正することで、人物の3次元点群データを高精度に生成することができる。 As described above, according to this embodiment, by correcting abnormal depth values contained in the glasses region or hair region, it is possible to generate 3D point cloud data of a person with high accuracy.
 以上、本発明を実施形態に基づき説明した。この実施形態は例示であり、それらの各構成要素や各処理プロセスの組合せにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解されるところである。 The present invention has been described above based on an embodiment. This embodiment is merely an example, and those skilled in the art will understand that various modifications are possible in the combinations of its components and processing steps, and that such modifications also fall within the scope of the present invention.
 図10は、変形例に係る撮像装置1の構成例を示す機能ブロック図である。変形例では、図1に示した画像処理装置2の検出部24および補正部25の機能が、撮像装置1の処理部13内に組み込まれている。この場合、撮像装置1の処理部13は、画像処理装置2と同様の機能を有していると言える。また変形例では、撮像装置1の処理部13に点群データ生成部18が設けられ、点群データ生成部18は、補正部25により補正された距離画像と、カラー画像を組み合わせて3次元点群データを生成する。 FIG. 10 is a functional block diagram showing an example configuration of an imaging device 1 according to a modified example. In this modified example, the functions of the detection unit 24 and correction unit 25 of the image processing device 2 shown in FIG. 1 are incorporated into the processing unit 13 of the imaging device 1. In this case, it can be said that the processing unit 13 of the imaging device 1 has the same functions as the image processing device 2. Also, in this modified example, a point cloud data generation unit 18 is provided in the processing unit 13 of the imaging device 1, and the point cloud data generation unit 18 generates three-dimensional point cloud data by combining the distance image corrected by the correction unit 25 and a color image.
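The point cloud generation in this modified example could be sketched as a standard pinhole back-projection, assuming the corrected distance image is registered pixel-for-pixel to the color image and the camera intrinsics (fx, fy, cx, cy) are known; these assumptions and the function name are for illustration only.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, color: np.ndarray,
                         fx: float, fy: float, cx: float, cy: float):
    """Back-project a corrected depth image into a colored 3D point cloud.

    Returns an (N, 3) array of XYZ coordinates and an (N, 3) array of RGB values
    for every pixel with a valid (non-zero) depth value.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid].astype(np.float32)
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    xyz = np.stack([x, y, z], axis=1)
    rgb = color[valid]                           # colors aligned with the points
    return xyz, rgb
```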
 検出部24および補正部25の機能がソフトウェアで実装される場合、距離画像の補正処理を含む実施形態に係る画像処理プログラムが予め処理部13内にインストールされていてもよいし、事後的にインストールされてもよい。後者の場合、当該画像処理プログラムは、アプリケーションプログラムストアからネットワークを介して撮像装置1にダウンロードされてインストールされる。例えば、撮像装置1がスマートフォンの場合、当該画像処理プログラムを実行するハードウェア要件を十分に満たしている。 When the functions of the detection unit 24 and the correction unit 25 are implemented by software, an image processing program according to an embodiment including distance image correction processing may be pre-installed in the processing unit 13, or may be installed later. In the latter case, the image processing program is downloaded from an application program store via a network to the imaging device 1 and installed. For example, if the imaging device 1 is a smartphone, it sufficiently meets the hardware requirements for executing the image processing program.
 図3のフローチャートでは、眼鏡領域の異常なデプス値を補正する処理と、毛髪領域の異常なデプス値を補正する処理を両方、実行する例を説明した。この点、いずれか一方の補正処理のみを実行する形態も、本発明の一実施形態に含まれる。 The flowchart in FIG. 3 describes an example in which both a process for correcting abnormal depth values in the glasses region and a process for correcting abnormal depth values in the hair region are executed. In this regard, an embodiment of the present invention also includes a form in which only one of the correction processes is executed.
 本発明は、人物の3次元点群データの補正に利用可能である。 The present invention can be used to correct 3D point cloud data of a person.
1 撮像装置、 11 可視光撮像部、 12 測距センサ部、 13 処理部、 14 カラー画像データ生成部、 15 デプスデータ生成部、 16 圧縮符号化部、 17 送信部、 18 点群データ生成部、 2 画像処理装置、 21 受信部、 22 伸張復号部、 23 分離部、 24 検出部、 25 補正部、 26 点群データ生成部。 1 imaging device, 11 visible light imaging section, 12 distance measurement sensor section, 13 processing section, 14 color image data generation section, 15 depth data generation section, 16 compression encoding section, 17 transmission section, 18 point cloud data generation section, 2 image processing device, 21 receiving section, 22 decompression decoding section, 23 separation section, 24 detection section, 25 correction section, 26 point cloud data generation section.

Claims (5)

  1.  可視光撮像部で撮像されたカラー画像から人物領域と、前記人物領域内の注目領域を検出する検出部と、
     測距センサ部の出力をもとに生成された距離画像における、前記カラー画像の人物領域に対応する人物領域内の注目領域に含まれる異常なデプス値を、前記距離画像の人物領域内の正常なデプス値を使用して補正する補正部と、
     を備える画像処理装置。
    a detection unit that detects a person area and an attention area within the person area from a color image captured by a visible light imaging unit;
    a correction unit that corrects abnormal depth values included in an attention area in a person area corresponding to the person area of the color image in a distance image generated based on an output of a distance measurement sensor unit, by using normal depth values in the person area of the distance image;
    An image processing device comprising:
  2.  前記検出部は、前記カラー画像の前記人物領域内の注目領域として顔領域を検出し、当該顔領域内の眼鏡領域を検出し、
     前記補正部は、前記眼鏡領域を除いた前記顔領域のデプス値の最大値と最小値の範囲に収まらないデプス値を、前記異常なデプス値と判定する、
     請求項1に記載の画像処理装置。
    the detection unit detects a face area as a region of interest within the person area of the color image, and detects a glasses area within the face area;
    The correction unit determines, as the abnormal depth value, a depth value that does not fall within a range between a maximum value and a minimum value of the depth value of the face region excluding the glasses region.
    The image processing device according to claim 1 .
  3.  前記補正部は、前記眼鏡領域内の異常なデプス値を、隣接する複数の画素のデプス値をもとに補間、または水平方向にスプライン補間して補正する、
     請求項2に記載の画像処理装置。
    The correction unit corrects abnormal depth values in the glasses region by interpolating based on depth values of a plurality of adjacent pixels or by performing spline interpolation in a horizontal direction.
    The image processing device according to claim 2 .
  4.  前記検出部は、前記カラー画像の前記人物領域内の注目領域として毛髪領域を検出し、
     前記補正部は、前記毛髪領域内の異常なデプス値を、前記毛髪領域内の垂直方向の正常なデプス値をもとに補正する、
     請求項1に記載の画像処理装置。
    the detection unit detects a hair region as a region of interest within the person region of the color image;
    The correction unit corrects the abnormal depth value in the hair region based on a normal depth value in the vertical direction in the hair region.
    The image processing device according to claim 1 .
  5.  可視光撮像部で撮像されたカラー画像から人物領域と、前記人物領域内の注目領域を検出する処理と、
     測距センサ部の出力をもとに生成された距離画像における、前記カラー画像の人物領域に対応する人物領域内の注目領域に含まれる異常なデプス値を、前記距離画像の人物領域内の正常なデプス値を使用して補正する処理と、
     をコンピュータに実行させる画像処理プログラム。
    A process of detecting a person area and an attention area within the person area from a color image captured by a visible light imaging unit;
    a process of correcting an abnormal depth value included in a region of interest in a person region of the color image, in a distance image generated based on an output from a distance measurement sensor unit, by using a normal depth value in the person region of the distance image;
    An image processing program that causes a computer to execute the above.
PCT/JP2023/022307 2022-11-28 2023-06-15 Image processing device and image processing program WO2024116444A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022189136 2022-11-28
JP2022-189136 2022-11-28

Publications (1)

Publication Number Publication Date
WO2024116444A1 true WO2024116444A1 (en) 2024-06-06

Family

ID=91323218


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014070936A (en) * 2012-09-28 2014-04-21 Dainippon Screen Mfg Co Ltd Error pixel detecting apparatus, error pixel detecting method, and error pixel detecting program
US20170069071A1 (en) * 2015-09-04 2017-03-09 Electronics And Telecommunications Research Institute Apparatus and method for extracting person region based on red/green/blue-depth image
WO2020026677A1 (en) * 2018-07-31 2020-02-06 株式会社ニコン Detection device, processing device, detection method, and processing program
WO2020066637A1 (en) * 2018-09-28 2020-04-02 パナソニックIpマネジメント株式会社 Depth acquisition device, depth acquisition method, and program
CN111626086A (en) * 2019-02-28 2020-09-04 北京市商汤科技开发有限公司 Living body detection method, living body detection device, living body detection system, electronic device, and storage medium
