JPH1042273A

JPH1042273A - Three-dimensional position recognition utilization system

Info

Publication number: JPH1042273A
Application number: JP8195081A
Authority: JP
Inventors: Yukinori Matsumoto; 幸則松本; Hajime Terasaki; 肇寺崎; Kazuhide Sugimoto; 和英杉本; Tsutomu Arakawa; 勉荒川
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1996-07-24
Filing date: 1996-07-24
Publication date: 1998-02-13

Abstract

PROBLEM TO BE SOLVED: To first propose a technique for accurately obtaining the depth of a subject from two-dimensional moving images (moving images captured by a monocular camera) or three-dimensional images (images captured by a stereo camera), and then to provide various three-dimensional position recognition utilization systems that make effective use of that depth information. SOLUTION: Motion information of the subject on the screen is extracted from the input video (process 1), using a method such as block matching. The actual motion of the subject in three-dimensional space is then calculated (process 2). Because the input video is a projection of the original three-dimensional motion, the three-dimensional motion is obtained by inverse transformation from the motion of a plurality of representative points. Since the coordinates of the subject are thereby determined, the depth information of the subject is acquired (process 3). Various systems are then built using this depth (process 4).

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a system that recognizes and uses the three-dimensional position of an object. In particular, it relates to various systems that recognize the three-dimensional position of a subject shown in a two-dimensional image captured by a monocular camera, or in a three-dimensional image captured by a stereo camera, and make use of that position.

[0002]

2. Description of the Related Art

In the field of television technology, techniques for generating a three-dimensional image, that is, a pseudo-stereoscopic image, based on the depth of a two-dimensional image have long been known. As one example, Japanese Patent Publication No. 55-36240 discloses a stereoscopic image display device that uses externally supplied depth information. In addition, pages 97 to 102 of the magazine PIXEL (No. 128, issued May 1, 1993) propose a pseudo-stereoscopic image system using depth information. Furthermore, Japanese Unexamined Patent Publication No. Hei 4-504333 (WO 88/04804) likewise discloses a method of achieving pseudo-stereoscopic vision using depth information.

[0003] The idea of generating depth information by determining the correspondence between frames is itself also known. For example, Japanese Patent Application Laid-Open No. 7-71940 points out, as "conventional technology", the existence of (1) a technique that associates points and lines between two images captured by a stereo camera and estimates the positions of those points and lines in the actual scene space (three-dimensional space), and (2) a technique that continuously photographs a subject while moving the camera and tracks feature points on the images to estimate the actual position of each feature point in the scene space.

[0004]

[Problems to Be Solved by the Invention] As described above, the technique of generating a pseudo-stereoscopic image using depth information is itself known, but there have been few proposals for using depth information for image processing other than pseudo-stereoscopic image generation.

[0005] Accordingly, an object of the present invention is first to propose a technique for accurately obtaining the depth of an object, that is, a subject, from a two-dimensional moving image (a moving image captured by a monocular camera) or a three-dimensional image (images captured by a stereo camera), and then to provide various three-dimensional position recognition utilization systems that make effective use of this depth information.

[0006]

[Means for Solving the Problems]

(1) The three-dimensional position recognition utilization system of the present invention is a system that recognizes and uses the three-dimensional position of an object. It comprises extraction means for extracting the depth of a photographed object, and performs security monitoring based on the extracted depth.

[0007] With this configuration, when an object is photographed, the extraction means extracts the depth of that object. The extraction means may be implemented in hardware, in software, or in a combination of the two. "Extraction" covers concepts such as detection and calculation, and means recognizing or identifying the depth. Once the depth of an object is known, its movement is also known, and this is used for security.

[0008] (2) In (1), predetermined processing may additionally be performed when an object approaches. The predetermined processing may be any security-related processing, for example photographing the object at magnification, issuing a warning to security personnel, or brightening the lighting in the guarded area.

[0009] (3) In another system of the present invention, preprocessing for segment matching in computer vision is performed based on the extracted depth. "Segment matching" is an image comparison process that focuses on some feature of regions in order to determine the correspondence between regions. "Preprocessing" means preparation that allows segment matching to proceed smoothly.

[0010] (4) In (3), the preprocessing may be processing that uses the depth of an object to narrow the search area in segment matching. Segment matching tries to find image regions that correspond between two frames, and the regions likely to correspond can be inferred in advance from the depth. This inference is used to narrow down the area in which correspondences are searched.

[0011] (5) In another system of the present invention, an instruction from a user is recognized and accepted based on the extracted depth. Because a user's instruction is expected to involve some motion, the content of the instruction is identified from the depth of the location where that motion occurs.

[0012] (6) In (5), the movement of the part of the photographed object with the smallest depth may be judged to be the user's instruction and accepted.

[0013] (7) In yet another system of the present invention, display of a pseudo-stereoscopic image is controlled based on the extracted depth. One example of display control is adjusting the position at which the pseudo-stereoscopic image is correctly perceived.

[0014] (8) In (7), it may further be judged that a viewer of the pseudo-stereoscopic image is present at the position of the photographed object, and display position control may be performed so that the pseudo-stereoscopic image is displayed well at this position. In particular, if the position of the viewer's head is identified by combining the depth with image processing techniques such as contour detection, the display can be controlled so that the pseudo-stereoscopic image appears most effectively at that head position.

[0015]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Preferred embodiments of the present invention will be described with reference to the drawings as appropriate.

[0016] Embodiment 1 describes a technique in which depth information is extracted from a two-dimensional moving image captured by a monocular camera and used as a distance sensor for a security system.

[0017] Embodiment 2 performs the same processing as Embodiment 1 on video from a multi-lens camera, that is, a stereo camera. Embodiments 3 to 5 each extract depth information as in Embodiment 1 or 2 and use it for computer vision, for the user interface of a computer system, and for control of pseudo-stereoscopic display, respectively.

[0018] Embodiment 1.

This embodiment first describes a technique that applies computer vision methods to image processing fields such as television and generates a correct stereoscopic image based on depth information.

[0019] In this embodiment, the motion in a two-dimensional moving image is detected, and the relative three-dimensional motion between the scene of the moving image and the shooting viewpoint is calculated from that motion. Depth information is then derived by calculating the relative distance from the shooting viewpoint to each image part, based on this relative three-dimensional motion and the motion of each image part.

[0020] Put another way, this technique selects a plurality of video frames (hereinafter simply "frames") from the two-dimensional moving image to be processed, derives from the two-dimensional positional displacement between these frames the relative positional relationship that each image part occupies in real three-dimensional space, and determines the depth from the result. That is, the three-dimensional motion of each image part is calculated from the two-dimensional positional displacement, the position coordinates of each image part in three-dimensional space are calculated from this motion by the principle of triangulation, and the depth is determined from the result. Here a "frame" is one unit of image processing and is a concept that includes, for example, the frame pictures and field pictures of MPEG.
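As a rough illustration of the triangulation principle mentioned above, the sketch below covers only a deliberately simplified special case that the specification does not single out: a pinhole camera that translates purely sideways between the two frames. Under that assumption a point at depth Z shifts on the image by d = f·b/Z, so Z = f·b/d. The function name `depth_from_parallax` and its parameters (focal length in pixels, sideways translation `baseline`, image shift `disparity_px`) are hypothetical and are not taken from the patent.

```python
def depth_from_parallax(focal_px: float, baseline: float, disparity_px: float) -> float:
    """Depth of a point seen by a pinhole camera that moved sideways by `baseline`.

    Assumption (not from the specification): pure lateral translation, no rotation.
    The result is in the same unit as `baseline`.
    """
    if disparity_px <= 0:
        # A point that does not shift on the image gives no depth information.
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px
```

A larger image shift therefore means a nearer point, which is why small or slow motion makes depth hard to recover.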

[0021] For a two-dimensional moving image, the "plurality of video frames" are frames captured at different times. They are therefore called "different-time frames" hereinafter, to distinguish them from "same-time frames", that is, multiple frames captured simultaneously by the multi-lens camera described later. "Two-dimensional positional displacement" means displacement of position on the frame plane. Embodiment 1 deals with different-time frames, so "two-dimensional positional displacement" refers to the change in position over time, that is, motion. Note that for the same-time frames described later, "two-dimensional positional displacement" refers to the positional offset between the frames.

[0022] FIG. 1 shows the main steps for generating a three-dimensional display image according to this embodiment. This embodiment extracts depth information from the original two-dimensional video and uses it, following the steps below. Steps 1 to 3 concern the extraction of depth information by video analysis, and Step 4 concerns its use, which in Embodiment 1 is use in a security system. An outline of each step is given first.

[0023] [Step 1] Extraction of two-dimensional motion information

Motion information of the subject contained in the video is extracted. This motion information is two-dimensional: the display screen is taken as a coordinate plane, and the motion of the subject on the screen is described in two-dimensional coordinates.

[0024] In this step, the correspondence between video frames is detected in order to capture the motion of the subject. A plurality of representative points are set in advance in the video frame at time t (hereinafter "frame t"), which is the reference frame, and the point corresponding to each representative point is tracked in the target frame at another time t' (hereinafter "frame t'"). Frames t and t' are different-time frames, but they need not be temporally adjacent. A feature of this step is that two-dimensional motion information can be extracted from motion of the subject in any direction, not only the horizontal direction. Hereinafter in this specification, t and t' each denote a time.

[0025] Hereinafter in this specification, "frame" refers broadly to video constituent units in general, including fields. It is not limited to, for example, one frame of a television receiver composed of 525 scanning lines or one screen of a personal computer composed of 640 × 480 pixels. Representative points may be set not only in frame t but in both frames t and t'.

[0026] [Step 2] Calculation of three-dimensional motion information

Once the two-dimensional motion of the subject is known, the actual motion information of the subject in three-dimensional space (hereinafter also "three-dimensional motion information") is calculated. By taking many pairs of representative points and corresponding points, the motion the subject actually makes is described by both translational and rotational components.

[0027] [Step 3] Acquisition of depth information

Once the actual motion of the subject is known, the relative positional relationship of the subject at each time becomes clear. From this relationship, depth information of the subject or of each of its parts (hereinafter also simply "each image part") is obtained.

[0028] [Step 4] Use of depth information

A security system is built on the depth information. For example, if an object approaches to within a certain distance of a given location, processing such as issuing a warning is performed.

[0029] The above is the outline. Each step is described in detail below.

[0030] [Step 1] Extraction of two-dimensional motion information

FIG. 2 is a flowchart for detecting the correspondence between video frames. Each step shown in the figure is described below.

[0031] (S10) Setting representative points in frame t

As shown in FIG. 3, representative points are first set in the reference frame t. In the figure, frame t is divided by a mesh every 8 × 8 pixels, and a representative point is placed at each grid point. The representative point that is i-th from the left and j-th from the top is written Pt(i, j), and the point corresponding to Pt(i, j) at time t' is written Pt'(i, j). Where needed, the x and y coordinates of Pt(i, j) are written Pt(i, j)x and Pt(i, j)y, respectively.
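The grid placement in S10 can be sketched as follows. This is only a minimal illustration: the function name `representative_points`, the zero-based (i, j) indexing, and the choice to start the mesh at pixel (0, 0) are assumptions made here, since the specification fixes only the 8 × 8 pixel spacing.

```python
def representative_points(width: int, height: int, step: int = 8) -> dict[tuple[int, int], tuple[int, int]]:
    """Map grid index (i, j) to the pixel coordinates (x, y) of representative point Pt(i, j).

    i counts from the left and j from the top; both start at 0 here (an assumption).
    """
    points = {}
    for j, y in enumerate(range(0, height, step)):
        for i, x in enumerate(range(0, width, step)):
            points[(i, j)] = (x, y)
    return points
```

Keeping the points in a dictionary keyed by (i, j) makes it easy to look up the four grid neighbours of a point later, when relative positions are evaluated.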

[0032] In this step, the representative points need not be at grid points and may be placed anywhere. In the extreme case, every pixel can be a representative point.

[0033] (S11) Setting the corresponding-point candidate region

Taking the representative point Pt(6, 4) shown in FIG. 3 as an example, the region in which Pt'(6, 4) can exist is set in advance. This rests on the assumption that Pt'(6, 4) lies near Pt(6, 4) unless the motion in the video is more abrupt than some limit. In this embodiment, for example, Pt'(6, 4) is assumed to fall within a 100 × 60 pixel region around Pt(6, 4), which reduces the amount of computation needed to detect Pt'(6, 4).
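One way to express the candidate region of S11 in code is shown below. It is a sketch under stated assumptions: the region is centred on the representative point and clipped to the frame, neither of which the specification spells out, and the name `candidate_region` is hypothetical. Because the half-sizes are used on both sides, the window is roughly (not exactly) 100 × 60 pixels.

```python
def candidate_region(px: int, py: int, frame_w: int, frame_h: int,
                     region_w: int = 100, region_h: int = 60) -> tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1), inclusive bounds of the search window around (px, py).

    The window is centred on the representative point and clipped to the frame
    (both are assumptions of this sketch).
    """
    half_w, half_h = region_w // 2, region_h // 2
    x0 = max(0, px - half_w)
    y0 = max(0, py - half_h)
    x1 = min(frame_w - 1, px + half_w)
    y1 = min(frame_h - 1, py + half_h)
    return x0, y0, x1, y1
```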

[0034] The following applications of this step are possible.

[0035] 1. When the video is moving relatively rapidly, t' is chosen so that frames t and t' are adjacent. This keeps the change in position of the representative point to a minimum and also minimizes the risk that the corresponding point falls outside the region. Alternatively, of course, the candidate region can be set to the whole screen from the outset. In that case the amount of computation increases, but the risk of missing a corresponding point because of large motion in the video decreases.

[0036] 2. This embodiment simply assumes that Pt'(6, 4) is near Pt(6, 4), but once the trajectory of Pt(6, 4) over several frames is known, the candidate region can be placed on the extension of that trajectory. When the motion in the video is reasonably steady, narrowing the candidate region in this way is very effective.
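Placing the candidate region "on the extension of the trajectory" can be illustrated with a constant-velocity extrapolation. The linear model and the name `predict_next_position` are assumptions of this sketch; the specification does not say how the trajectory is extended.

```python
def predict_next_position(track: list[tuple[float, float]]) -> tuple[float, float]:
    """Extrapolate the next position of a tracked point from its last two positions.

    Assumes equally spaced frames and constant velocity (an assumption of this sketch).
    """
    if len(track) < 2:
        raise ValueError("need at least two past positions")
    (x_prev, y_prev), (x_last, y_last) = track[-2], track[-1]
    # Next position = last position + last displacement.
    return (2 * x_last - x_prev, 2 * y_last - y_prev)
```

The predicted position would then serve as the centre of a smaller candidate region than the default one.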

[0037] (S12) Calculating dissimilarity in the corresponding-point candidate region

Next, the position of the corresponding point is determined within the candidate region. In this step, in contrast to the previous step, a problem arises when the motion in the video is too slow. With little motion, motion information is difficult to extract, and the extracted information may contain large errors.

[0038] In such cases, t' is chosen in advance so that frames t and t' are some distance apart. The amount of change in each image part is processed statistically, and t' is chosen so that, for example, the magnitude of the change or the variance of the amount of change exceeds a predetermined value. Alternatively, t' may be chosen so that the total motion of at least a predetermined number of feature points (described later) exceeds a predetermined value, or so that the variance of the feature-point motion exceeds a predetermined value.
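The variance criterion for choosing t' might look like the sketch below. The input layout (one list of displacement magnitudes per candidate frame offset), the use of the population variance, and the name `choose_target_frame` are all assumptions made for illustration.

```python
from statistics import pvariance


def choose_target_frame(displacements_by_offset: list[list[float]], min_variance: float):
    """Return the smallest frame offset whose motion variance exceeds `min_variance`.

    displacements_by_offset[k] holds the displacement magnitudes of the tracked
    image parts between frame t and frame t + (k + 1). Returns None if no
    candidate frame shows enough motion.
    """
    for offset, magnitudes in enumerate(displacements_by_offset, start=1):
        if pvariance(magnitudes) > min_variance:
            return offset
    return None
```

Choosing the smallest qualifying offset keeps the frames as close as possible, which in turn keeps the candidate region of S11 small.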

[0039] In this step, dissimilarity is calculated by block matching between frames t and t' in order to determine the position of the corresponding point. The sum of squared density errors, that is, the dissimilarity, is computed between the block around a point in the candidate region and the block around the representative point, and the point where it is smallest is taken as the computed corresponding point.

[0040] FIG. 4 shows how block matching is performed. In this embodiment one block is defined as nine pixels, with the center pixel as the representative point.

[0041] In the figure, block 1 containing Pt(i, j) is first taken on frame t, and block 2 containing a provisional candidate Pt'(i, j) for the corresponding point is taken on frame t'. In general, if the pixel value of pixel (x, y) at time t is written It(x, y), the dissimilarity (denoted E1) is given by

[Math. 1]

$$E_1 = \sum_{u}\sum_{v}\Bigl\{ I_t\bigl(P_t(i,j)_x+u,\ P_t(i,j)_y+v\bigr) - I_{t'}\bigl(P_{t'}(i,j)_x+u,\ P_{t'}(i,j)_y+v\bigr) \Bigr\}^2 \qquad \text{(Equation 1)}$$

Here the two sums run over u and v, which take the values u = -1, 0, 1 and v = -1, 0, 1, so that the sum of squared errors over a total of nine pixels is computed for the provisional Pt'(i, j). Pt'(i, j) is then moved little by little within the candidate region, and the point where E1 is smallest is taken as the corresponding point.
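Equation 1 and the search over the candidate region can be sketched in plain Python as follows. Frames are assumed here to be lists of rows of grayscale values indexed as `frame[y][x]`, and the names `block_ssd` and `find_corresponding_point` are hypothetical. The caller is assumed to keep both the representative point and the search region at least `radius` pixels inside the frame, because negative indices would silently wrap around in Python.

```python
def block_ssd(frame_t, frame_t2, p, q, radius=1):
    """E1: sum of squared differences between the block around p in frame t
    and the block around q in frame t'. radius=1 gives the 3 x 3 (nine-pixel) block."""
    (px, py), (qx, qy) = p, q
    total = 0
    for v in range(-radius, radius + 1):
        for u in range(-radius, radius + 1):
            diff = frame_t[py + v][px + u] - frame_t2[qy + v][qx + u]
            total += diff * diff
    return total


def find_corresponding_point(frame_t, frame_t2, p, region, radius=1):
    """Exhaustively search region (x0, y0, x1, y1, inclusive) for the point minimizing E1.

    Returns (best_point, best_e1). Ties keep the first point found in scan order.
    """
    x0, y0, x1, y1 = region
    best_point, best_e1 = None, None
    for qy in range(y0, y1 + 1):
        for qx in range(x0, x1 + 1):
            e1 = block_ssd(frame_t, frame_t2, p, (qx, qy), radius)
            if best_e1 is None or e1 < best_e1:
                best_point, best_e1 = (qx, qy), e1
    return best_point, best_e1
```

The exhaustive scan is the simplest reading of "moved little by little"; the cost grows with the area of the candidate region, which is why S11 narrows that region first.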

[0042] FIG. 5 is a schematic diagram plotting the value of E1 vertically for each Pt'(i, j). In this figure, point Q, where the dissimilarity shows a sharp peak, is determined as the corresponding point. Corresponding points are then determined for the other representative points in the same way.

[0043] This step has the following applications and variations.

[0044] 1. The squared density error was calculated here on the assumption of a grayscale image, but for a color image the sum of the squared errors of the R, G and B densities, that is, E1_R + E1_G + E1_B, may be used as the dissimilarity. Densities in another color space, for example HVC densities, may also be used. Instead of the squared error, the plain absolute value of the error, that is, the sum of residuals, may be used.

2. One block was nine pixels in this step, but it is normally desirable to define a block with a reasonably large number of pixels. For example, for the high-resolution screen of an ordinary personal computer or workstation, experiments have given good results with blocks of about 16 × 16 pixels.
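The color and absolute-error variants in item 1 differ from Equation 1 only in the per-pixel term, which the following sketch isolates. The function name `pixel_dissimilarity` and the representation of a pixel as a tuple of channel values are assumptions for illustration; summing this term over the block gives E1_R + E1_G + E1_B (or its absolute-error counterpart).

```python
def pixel_dissimilarity(a, b, use_abs=False):
    """Per-pixel error between two pixels given as tuples of channel values.

    With use_abs=False the squared errors of all channels are summed
    (R, G and B for a color pixel, a single channel for grayscale).
    With use_abs=True absolute errors are summed instead.
    """
    if use_abs:
        return sum(abs(ca - cb) for ca, cb in zip(a, b))
    return sum((ca - cb) ** 2 for ca, cb in zip(a, b))
```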

[0045] (S13) Determining the initial positions of the corresponding points

The previous step yields tentative corresponding points. At this stage, however, their positions are not necessarily correct. Corresponding points on subject boundaries and edges are found fairly accurately, but in image parts with little variation, for example, the positions should be regarded as containing considerable error. In terms of FIG. 5, this is the state in which E1 has no clear peak. FIG. 6 shows the relationship between the representative points and the corresponding points obtained in the previous step. As the figure shows, corresponding points are found well at characteristic points such as the house and the tree, especially their contours, but the error is large for the sky and the ground.

[0046] This step and the next therefore correct the positions of the corresponding points. This step first introduces the concept of an initial position for each corresponding point and determines those initial positions. The next step then improves positional accuracy by iterative calculation.

[0047] The following policies are possible for determining the initial positions of the corresponding points in this step.

[0048] 1. Treat all corresponding points found in the previous step equally

The positions of all corresponding points are passed to the next step unchanged as their initial positions.

[0049] 2. Treat corresponding points differently

The positions of corresponding points that are likely to be reasonably correct from the outset (hereinafter "feature points") are used as their initial positions unchanged, and the initial positions of the other corresponding points (hereinafter "non-feature points") are determined from those of the feature points. The following kinds of point can be taken as feature points, and in practice they often coincide. In this specification, the original representative points that correspond to these corresponding points are also called feature points.

[0050] (1) Corresponding points for which E1 in the previous step showed a clear peak

The positional accuracy of such corresponding points is generally high.

[0051] (2) Corresponding points located where many orthogonal edge components exist

At places such as the corners of buildings, the position of the corresponding point is likely to be fairly accurate.

(3) Corresponding points whose positions change stably over frames t, t', ...

Stability of change here means constancy of the motion vector. Corresponding points whose direction and distance of movement stay constant as the frames advance are selected. Specifically, for example, corresponding points whose motion-vector variation is at or below a predetermined value are selected. Such corresponding points should have been tracked accurately and can be judged to be in correct correspondence with their representative points. If, for example, the camera capturing the video moved irregularly, that effect is taken into account in the judgment.
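Criterion (3) can be sketched as a test on the spread of a point's frame-to-frame motion vectors. Measuring the spread as the sum of the population variances of the x and y components is an assumption of this sketch, as are the name `is_stable_track` and the requirement of equally spaced frames; camera-motion compensation is not modelled.

```python
from statistics import pvariance


def is_stable_track(track, max_variance):
    """Return True if the motion vectors along `track` vary by at most `max_variance`.

    track: positions (x, y) of one corresponding point over frames t, t', ...
    At least three positions (two motion vectors) are needed to judge stability.
    """
    vectors = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
    if len(vectors) < 2:
        return False
    spread = pvariance([vx for vx, _ in vectors]) + pvariance([vy for _, vy in vectors])
    return spread <= max_variance
```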

[0052] Once feature points have been selected in this way, they are used as initial positions unchanged, and the initial positions of non-feature points can be obtained by interpolating the positions of feature points or determined in order starting from the neighborhood of feature points. That is, because the positional accuracy of non-feature points from the previous step is low, their initial positions are given geometrically from the more accurate feature points. Naturally, the method of the previous step can also be put to good use in finding the feature points of (3).
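The specification says only that non-feature initial positions are obtained by interpolation from feature points. One possible reading, shown purely as an assumption, is inverse-distance weighting of the feature points' displacement vectors; the name `interpolate_initial_position` and the weighting scheme are not from the patent.

```python
def interpolate_initial_position(point, features):
    """Initial position of the corresponding point for a non-feature representative point.

    point: (x, y) of the non-feature representative point in frame t.
    features: list of ((x, y), (x2, y2)) pairs, i.e. feature representative point
              in frame t and its corresponding point in frame t'.
    The displacement at `point` is the inverse-squared-distance weighted mean of
    the feature displacements (an assumed interpolation scheme).
    """
    px, py = point
    weight_sum = dx = dy = 0.0
    for (fx, fy), (cx, cy) in features:
        d2 = (fx - px) ** 2 + (fy - py) ** 2
        if d2 == 0:
            return (cx, cy)  # the point coincides with a feature point
        w = 1.0 / d2
        weight_sum += w
        dx += w * (cx - fx)
        dy += w * (cy - fy)
    if weight_sum == 0:
        raise ValueError("need at least one feature point")
    return (px + dx / weight_sum, py + dy / weight_sum)
```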

[0053] The above describes determining the initial positions of corresponding points based on the selection of feature points. Alternatively, the initial values of the corresponding points may be obtained by dynamic programming.

【００５４】（Ｓ１４）対応点の改善処理対応点の位置の妥当性を評価するために式を導入し、繰
り返し計算によって位置精度を改善する。Ｓ１２ステッ
プでは非類似度を評価する式１を導入したが、ここでは
さらに、対応点間の相対位置関係の妥当性を評価する式
を導入し、これら２つの評価結果を統合して位置の改善
を図る。(S14) Corresponding point improvement processing: An equation is introduced to evaluate the validity of the position of each corresponding point, and the position accuracy is improved by iterative calculation. Step S12 introduced Equation 1 for evaluating the degree of dissimilarity. Here, an equation for evaluating the validity of the relative positional relationship between corresponding points is additionally introduced, and the two evaluation results are integrated to improve the positions.

【００５５】図７は相対位置を評価する原理を説明する
図である。同図において、各点はそれぞれ対応点を表し
ている。このうち、図中のＰt'（ｉ, ｊ）を中心に考え
ると、これには以下の４つの対応点、FIG. 7 is a diagram for explaining the principle of evaluating the relative position. In the figure, each point represents a corresponding point. Considering Pt '(i, j) in the figure, the following four corresponding points

【数２】Ｐt'（i-1,j ）、Ｐt'（i+1,j ）、Ｐt'（i,j-
1 ）、Ｐt'（i,j+1 ）が隣接している。Ｐt'（ｉ, ｊ）は通常、これら４つの
点の重心付近に存在すると考えるのが妥当である。これ
は、映像各部位が動いても画素単位の微視的な見方をす
れば相対位置関係がほぼ保たれるという経験則に基づい
ている。なお、この性質を数学的にいえば、（ｉ, ｊ）
の関数Ｐt'（ｉ, ｊ）の２次微分がほぼ０であるという
ことにほかならない。[Equation 2] Pt'(i-1, j), Pt'(i+1, j), Pt'(i, j-1), and Pt'(i, j+1) are adjacent. It is reasonable to consider that Pt'(i, j) usually lies near the centroid of these four points. This rests on the empirical rule that, even when each part of the image moves, the relative positional relationship is substantially maintained when viewed microscopically in pixel units. Mathematically, this property amounts to saying that the second derivative of the function Pt'(i, j) with respect to (i, j) is approximately zero.

【００５６】従って上記４点の重心を（St'(i,j)x ，S
t'(i,j)y ）と表記すれば、Therefore, if the centroid of the above four points is written as (St'(i,j)x, St'(i,j)y), then

【数３】Ｅ２＝｛Pt'(i,j)x-St'(i,j)x ｝²＋｛Pt'(i,j)y-St'(i,j)y ｝²（式２）が相対位置の妥当性評価式となる。この式だけを考えれ
ば、Ｅ２が最小値になるときに対応点の位置が最も望ま
しい状態となる。[Equation 3] E2 = {Pt'(i,j)x − St'(i,j)x}² + {Pt'(i,j)y − St'(i,j)y}² (Equation 2) is the formula for evaluating the validity of the relative position. Considering this equation alone, the position of the corresponding point is most desirable when E2 takes its minimum value.

【００５７】本ステップでは、式１および式２の評価結
果を適当な結合定数ｋで加算し、Ｅ＝Ｅ１／Ｎ＋ｋ・Ｅ２（式３）で表されるＥを最終的な評価式とする（Ｎはブロックマ
ッチングの際に定義された１つのブロックに含まれる画
素数である）。すなわち、まず各対応点についてＥを計
算し、続いて全対応点のＥの総和ΣＥを計算し、ΣＥが
最小値となるよう、各対応点の位置をすこしずつ変化さ
せる。ΣＥの値が収束するか、または繰り返し計算を所
定の上限回数に達するまで行い、改善処理を施す。より
具体的には、各対応点の位置を変化させるとき、以下の
いずれかの方法を実施すればよい。In this step, the evaluation results of Expressions 1 and 2 are added with an appropriate coupling constant k, and E represented by E = E1 / N + kE2 (Expression 3) is used as the final evaluation expression ( N is the number of pixels included in one block defined at the time of block matching). That is, first, E is calculated for each corresponding point, then the sum ΣE of E of all corresponding points is calculated, and the position of each corresponding point is changed little by little so that ΣE becomes the minimum value. The improvement processing is performed until the value of ΣE converges or until the repetition calculation reaches a predetermined upper limit number. More specifically, when changing the position of each corresponding point, one of the following methods may be performed.

【００５８】（１）オイラー方程式を解く方法 ΣＥが極値、ここでは極小値をとる条件を示すオイラー
方程式を数値的に解くことによって対応点を得る。この
手法自体は既知である。これは、各代表点を含むブロッ
クでの画像傾き情報と、対応ブロック間の画素差分情報
から改善すべき方向を見い出し、これに基づいて対応点
の位置を初期位置から徐々に動かしていき、最終解を求
める。(1) Method of solving the Euler equation: Corresponding points are obtained by numerically solving the Euler equation, which expresses the condition for ΣE to take an extreme value, here a local minimum. This technique itself is known. The direction of improvement is found from the image gradient information in the block containing each representative point and the pixel difference information between corresponding blocks; based on this, the position of the corresponding point is moved gradually from its initial position until the final solution is obtained.

【００５９】（２）固定探索手法まず、対応点候補領域において、改善対象の対応点のＥ
が最小になる点を探し、これを新たな対応点とする。こ
のとき、他の点の位置を不動とみなして探索を行う点に
特徴がある。この処理を順次全対応点に対して行う。(2) Fixed search method: First, within the corresponding point candidate area, the point at which E of the corresponding point to be improved is minimized is searched for and taken as the new corresponding point. The characteristic of this method is that the search is performed with the positions of all other points regarded as fixed. This process is performed sequentially for all corresponding points.
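The fixed search of method (2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `points` is a grid of current corresponding-point positions, `e1` is a caller-supplied (hypothetical) function returning the block dissimilarity of Equation 1 for a candidate position, and the search window `radius` is an illustrative parameter. Only interior grid points, which have all four neighbours, are updated.

```python
def e2(points, i, j, cand):
    """Equation 2: squared distance from cand to the centroid of the four neighbours."""
    nbrs = (points[i - 1][j], points[i + 1][j], points[i][j - 1], points[i][j + 1])
    sx = sum(p[0] for p in nbrs) / 4.0
    sy = sum(p[1] for p in nbrs) / 4.0
    return (cand[0] - sx) ** 2 + (cand[1] - sy) ** 2


def fixed_search_pass(points, e1, n_pixels, k, radius=1):
    """One sweep of the fixed search over the interior grid points (in place).

    For each point, every candidate in a (2*radius+1)^2 window is scored with
    E = E1/N + k*E2 (Equation 3) while all other points are held fixed.
    Returns the sum of E over the interior points after the sweep.
    """
    rows, cols = len(points), len(points[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            cx, cy = points[i][j]
            best, best_cost = (cx, cy), None
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    cand = (cx + dx, cy + dy)
                    cost = e1(i, j, cand) / n_pixels + k * e2(points, i, j, cand)
                    if best_cost is None or cost < best_cost:
                        best, best_cost = cand, cost
            points[i][j] = best
    total = 0.0
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            p = points[i][j]
            total += e1(i, j, p) / n_pixels + k * e2(points, i, j, p)
    return total
```

Calling `fixed_search_pass` repeatedly until the returned sum stops decreasing, or until a fixed number of sweeps is reached, corresponds to the termination rule described for this step.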

【００６０】（３）混合手法（２）の手法によれば、画素単位の精度で対応点の位置
が求まる。一方、（１）によれば、理論上画素単位以下
の精度で位置を求めることができる。そこで、まず
（２）の手法によって画素単位の精度で対応関係を求
め、しかる後にオイラー方程式を適用して精度を高める
ことも可能である。(3) Mixing Method According to the method (2), the position of the corresponding point can be determined with the accuracy of the pixel unit. On the other hand, according to (1), the position can be theoretically obtained with an accuracy of a pixel unit or less. Therefore, it is also possible to first obtain the correspondence with the accuracy of the pixel unit by the method (2), and thereafter to improve the accuracy by applying the Euler equation.

【００６１】なお実験によれば、同じ精度で比較した場
合、（２）の手法によって（１）よりも短時間で好まし
い解が得られている。According to the experiment, when compared with the same precision, a preferable solution is obtained in a shorter time than the method (1) by the method (2).

【００６２】図８は図６の対応点候補に対して本ステッ
プの改善処理を行った結果を示す図である。実験によれ
ば、カラー画像の場合、ｋは５〜２００程度で良好な結
果が得られることがわかった。図６と図８はともに模式
的な図であるが、実験の結果、実際にこれらの図に近い
改善が見られた。FIG. 8 is a diagram showing the result of performing the improvement processing of this step on the corresponding point candidates in FIG. According to an experiment, in the case of a color image, k was about 5 to 200, and a good result was obtained. FIG. 6 and FIG. 8 are schematic diagrams, but as a result of the experiment, improvements close to those in these figures were actually observed.

【００６３】以上が工程１の詳細である。本工程の特徴
は被写体の任意方向の動きから二次元動き情報を抽出で
きる点にある。これは代表点と対応点という概念で動き
を把握する利点であり、水平方向の動きを検出して時間
差を決定する従来の技術に比べ、広い応用を可能とする
ものである。The above is the details of the step 1. The feature of this step is that two-dimensional motion information can be extracted from the motion of the subject in any direction. This is an advantage of grasping the movement based on the concept of the representative point and the corresponding point, and enables a wider application than the conventional technique of detecting the movement in the horizontal direction and determining the time difference.

【００６４】なお本工程には、以下の応用または変形が
ある。This step has the following applications or modifications.

【００６５】１．Ｅ２導出の際、上下左右の４点のみな
らず、斜め方向の４点を加えた計８点の重心を考える。
いかなる組合せが最適であるかは映像の種類にも依存す
るため、適宜実験によって決めていくことが望ましい。1. When deriving E2, consider not only the four points in the up, down, left, and right, but also the center of gravity of a total of eight points including four points in the oblique direction.
Which combination is optimal depends on the type of video, so it is desirable to appropriately determine the combination by experiment.

【００６６】２．式３による評価は、Ｅ１のみによる評
価結果が思わしくなかった対応点から優先的に行う。こ
れはＥ１の結果が悪い対応点は一般に位置の誤差が大き
いと考えられるためであり、こうした対応点の位置を早
期に、かつ大幅に改善することが望ましいためである。2. The evaluation based on Expression 3 is preferentially performed from the corresponding point where the evaluation result based on only E1 is not good. This is because a corresponding point having a poor result of E1 is generally considered to have a large position error, and it is desirable to improve the position of such a corresponding point early and significantly.

【００６７】３．位置改善の際、幾何情報も利用する。
フレームｔにおいて幾何的に特徴のある領域、例えば直
線を形成していた複数の代表点については、それらの対
応点も直線を形成するように位置を補正する。これは映
像上直線に見える部分は現実の三次元空間でも直線であ
る可能性が高く、一方、三次元空間の直線はフレーム
ｔ' でも直線となるべきだからである。本来奥行きは直
線に沿って一様に変化するものであり、直線に沿う変化
は視覚的に容易に把握されるため、この方法による改善
効果は大きい。なお、他の幾何情報として、画像領域の
エッジなどが考えられる。3. Geometric information is also used for position improvement.
For a region having a geometric characteristic in the frame t, for example, a plurality of representative points that have formed a straight line, the positions are corrected so that their corresponding points also form a straight line. This is because a portion that appears to be a straight line on an image is likely to be a straight line even in the actual three-dimensional space, while a straight line in the three-dimensional space should be a straight line even in the frame t ′. Originally, the depth changes uniformly along the straight line, and the change along the straight line can be easily grasped visually, so that the improvement effect by this method is great. The other geometric information may be an edge of an image area.

【００６８】４．さらに別のフレームについても対応点
を求める。本工程ではフレームｔに対するフレームｔ'
の対応点を求めたが、さらに第三のフレームｔ''におけ
る対応点も求め、映像各部位の平均化された動きを求め
ることができる。この方法は、フレームｔ' における対
応点位置を改善していくのではない。多くのフレームで
対応点をとることにより、対応点の位置とそのフレーム
が撮影された時間から映像各部位の動きを統計的に決め
ていくものである。4. Corresponding points are also obtained for further frames. In this step, the corresponding points in frame t' were obtained for frame t; corresponding points in a third frame t'' can also be obtained, and the averaged motion of each part of the video can be determined. This method does not improve the corresponding point positions in frame t'. Rather, by taking corresponding points in many frames, the motion of each part of the video is determined statistically from the positions of the corresponding points and the times at which the frames were captured.

【００６９】［工程２］三次元動き情報の算出工程１により、映像各部位の画面上の二次元的な動きが
判明した。工程２ではこの情報から各部位の三次元的な
動きを算出する。映像は被写体の現実の動きを平面に投
影したものであり、本工程では代表点と対応点の位置関
係からもとの動きを導出する。[Step 2] Calculation of Three-Dimensional Motion Information In step 1, two-dimensional motion of each part of the image on the screen was determined. In step 2, a three-dimensional movement of each part is calculated from this information. The image is obtained by projecting the actual movement of the subject on a plane, and in this step, the original movement is derived from the positional relationship between the representative point and the corresponding point.

【００７０】一般に被写体の三次元空間における動き
は、並進運動と回転運動の合成として記述することがで
きる。ここではまず、動きが並進運動のみで構成される
場合の計算方法を説明し、後に一般化された方法を概説
する。In general, the motion of a subject in a three-dimensional space can be described as a combination of a translational motion and a rotational motion. Here, first, a calculation method in the case where the motion is composed of only translational motion will be described, and then a generalized method will be outlined later.

【００７１】１．動きが並進運動のみの場合図９はある点Ｐの画面上の移動と三次元空間での現実の
移動の対応を示す図である。同図では画面上の二次元座
標を大文字Ｘ等で、現実の三次元座標を小文字ｘ等で表
記するものとし、三次元座標のうちｘ、ｙ軸を画面上
に、ｚ軸を奥行き方向にとっている。また、視点から画
面までの距離を１とする。1. When the motion consists only of translation: FIG. 9 is a diagram showing the correspondence between the movement of a point P on the screen and its actual movement in three-dimensional space. In the figure, two-dimensional coordinates on the screen are written with capital letters such as X, and actual three-dimensional coordinates with lowercase letters such as x. Of the three-dimensional coordinates, the x and y axes lie in the screen plane and the z axis is taken in the depth direction. The distance from the viewpoint to the screen is set to 1.

【００７２】この図に示す通り、Ｐ（Ｘ, Ｙ）は画面上
をＰ' （Ｘ',Ｙ' ）へ移動するが、この間、この点は三
次元空間においてＳ（ｘ, ｙ, ｚ）からＳ（ｘ',ｙ',
ｚ' ）へと移動する。ここで、As shown in this figure, P(X, Y) moves on the screen to P'(X', Y'); during this time, the point moves in three-dimensional space from S(x, y, z) to S(x', y', z'). Here,

【数４】（ｘ',ｙ',ｚ' ）＝（ｘ, ｙ, ｚ）＋（ａ, ｂ, ｃ）とすれば、画面までの距離が１なので、Ｘ＝ｘ／ｚ，Ｙ＝ｙ／ｚＸ' ＝ｘ' ／ｚ' ，Ｙ' ＝ｙ' ／ｚ' となる。これを解けば、Ｘ' ＝（Ｘｚ＋ａ）／（ｚ＋ｃ）Ｙ' ＝（Ｙｚ＋ｂ）／（ｚ＋ｃ）となるため、ｚを消去し、次式が求められる。If (x ′, y ′, z ′) = (x, y, z) + (a, b, c), the distance to the screen is 1, so that X = x / z, Y = y / z X ′ = x ′ / z ′ and Y ′ = y ′ / z ′. If this is solved, X '= (Xz + a) / (z + c) Y' = (Yz + b) / (z + c), so z is eliminated and the following equation is obtained.

【００７３】[0073]

【数５】（ａ−Ｘ' ｃ）（Ｙ' −Ｙ）＝（ｂ−Ｙ' ｃ）（Ｘ' −Ｘ）（式４）式４は画面上の動き情報で表現されているため、工程１
で得られた情報によって未知数ａ, ｂ, ｃを決めること
ができる。しかしこの際、現実にはｋ倍の大きさの物体
がｋ倍離れたところをｋ倍の速さで移動するケースにお
いて、このｋの値（スケールファクター）を決めること
はできず、ａ, ｂ, ｃについてはそれらの比のみを求め
ることが可能となる。数学的にいえば、（Ｘ, Ｙ）と
（Ｘ',Ｙ' ）の対応を３組与えても、この連立方程式を
行列表示した際の係数行列のランク（階数）は高々２で
あり、ａ, ｂ, ｃは相対値としてしか決まらない。そこ
で本工程では、仮にｃ＝１と正規化してａ, ｂを表すこ
とにする。比のみでも、次工程による処理が可能なため
である。(a − X'c)(Y' − Y) = (b − Y'c)(X' − X) (Equation 4). Since Equation 4 is expressed in terms of motion information on the screen, the unknowns a, b, and c can be determined from the information obtained in Step 1. In reality, however, an object k times as large, k times as far away, and moving k times as fast produces the same motion on the screen, so the value of k (the scale factor) cannot be determined; only the ratios among a, b, and c can be obtained. Mathematically speaking, even if three pairs of (X, Y) and (X', Y') are given, the rank of the coefficient matrix of the resulting simultaneous equations is at most 2, so a, b, and c are determined only as relative values. In this step, therefore, c is provisionally normalized to 1 and a and b are expressed relative to it. The ratios alone suffice for the processing in the next step.

【００７４】並進運動の別の解法として、式４から誤差
ｅを、As another solution for translational motion, an error e is defined from Equation 4 as

【数６】 e ＝｛(a-X'c)(Y'-Y) −(b-Y'c)(X'-X) ｝² ＝｛(Y'-Y)a-（X'-X)b−(XY'-X'Y)c｝² （式５）と定義し、代表点と対応点の全対応関係についてｅの総
和Σｅをとり、この値を最小にするａ, ｂ, ｃを次の式
から求めてもよい。e = {(a − X'c)(Y' − Y) − (b − Y'c)(X' − X)}² = {(Y' − Y)a − (X' − X)b − (XY' − X'Y)c}² (Equation 5). The sum Σe of e is taken over all correspondences between representative points and corresponding points, and the a, b, and c that minimize this value may be obtained from the following equations.

【００７５】ｄ（Σｅ）／ｄａ＝０（式６）ｄ（Σｅ）／ｄｂ＝０（式７）ｄ（Σｅ）／ｄｃ＝０（式８）より具体的には、式６〜８はそれぞれ次の形に展開され
る。d(Σe)/da = 0 (Equation 6), d(Σe)/db = 0 (Equation 7), d(Σe)/dc = 0 (Equation 8). More specifically, Equations 6 to 8 expand into the following forms, respectively.

【００７６】[0076]

【数７】 a Σ(Y'-Y)²-bΣ(X'-X)(Y'-Y)-cΣ(Y'-Y)(XY'-X'Y)=0 （式９） -a Σ(X'-X)(Y'-Y)+bΣ(X'-X)²+cΣ(X'-X)(XY'-X'Y)=0 （式１０） -a Σ(Y'-Y)(XY'-X'Y)+b Σ(X'-X)(XY'-X'Y)+c Σ(XY'-X'Y) ²=0 （式１１）以上が並進運動に関する計算方法の例である。[Equation 7] aΣ(Y'−Y)² − bΣ(X'−X)(Y'−Y) − cΣ(Y'−Y)(XY'−X'Y) = 0 (Equation 9); −aΣ(X'−X)(Y'−Y) + bΣ(X'−X)² + cΣ(X'−X)(XY'−X'Y) = 0 (Equation 10); −aΣ(Y'−Y)(XY'−X'Y) + bΣ(X'−X)(XY'−X'Y) + cΣ(XY'−X'Y)² = 0 (Equation 11). The above is an example of a calculation method for translational motion.
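As an illustration of the least-squares approach, the following sketch estimates a and b with c normalized to 1, as the text suggests. With c = 1, Equations 9 and 10 form a 2×2 linear system in a and b, which is solved here by Cramer's rule. The function name and the input format (a list of ((X, Y), (X', Y')) pairs) are assumptions made for this example; the normalization fails if the true c is zero.

```python
def estimate_translation(pairs):
    """Estimate (a, b) with c normalised to 1 from screen correspondences.

    pairs: list of ((X, Y), (X2, Y2)), where (X2, Y2) is the corresponding
    point of the representative point (X, Y).
    Minimises the sum of e = {(Y2-Y)a - (X2-X)b - (X*Y2 - X2*Y)c}^2 with c = 1.
    """
    syy = sxy = sxx = syq = sxq = 0.0
    for (X, Y), (X2, Y2) in pairs:
        dx, dy = X2 - X, Y2 - Y
        q = X * Y2 - X2 * Y
        syy += dy * dy
        sxy += dx * dy
        sxx += dx * dx
        syq += dy * q
        sxq += dx * q
    # Equations 9 and 10 with c = 1:
    #    a*syy - b*sxy =  syq
    #   -a*sxy + b*sxx = -sxq
    det = syy * sxx - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("motion vectors are all parallel; a and b are not determined")
    a = (syq * sxx - sxy * sxq) / det
    b = (sxy * syq - syy * sxq) / det
    return a, b
```

For noise-free correspondences generated by a translation (1, −0.5, 2), the function returns approximately (0.5, −0.25), that is, the translation divided by its z component.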

【００７７】２．動きが回転運動を含む場合回転運動はｘ, ｙ, ｚ方向の３つの変位と各軸を中心と
する３つの回転角、例えばα, β, γによって記述する
ことができる。回転角はオイラー角またはロールピッチ
法などによって表現することができる。2. When Motion Includes Rotational Motion Rotational motion can be described by three displacements in the x, y, and z directions and three angles of rotation about each axis, for example, α, β, γ. The rotation angle can be expressed by an Euler angle or a roll pitch method.

【００７８】ここで上記合計６つの変数を決定すればよ
いが、ここでも上述のごとくスケールファクターが決ま
らないため、ある変数を１として各変数の比を求める。
理論的上、代表点と対応点を５組とれば運動を記述する
ことができる。Here, the total of six variables may be determined. However, since the scale factor is not determined as described above, a certain variable is set to 1 and the ratio of each variable is obtained.
Theoretically, the movement can be described if there are five sets of representative points and corresponding points.

【００７９】ここで注意すべきは、組のとりかたによっ
ては動きの様子が線形解法によって求まらないことがあ
る点である。こうした場合を考慮する際、組を８以上と
ればよいことが知られている。８組の変化から線形解法
によって回転運動を記述しうる根拠については、例えば
「動きからの単眼立体視による形状認識の線形解法につ
いて」（出口・秋場、計測自動制御学会論文集vol.26，
No.6，714/720 （1990））などに示されている。It should be noted here that, depending on how the pairs are chosen, the motion may not be obtainable by a linear solution method. It is known that taking eight or more pairs suffices to handle such cases. The grounds on which rotational motion can be described by a linear solution from eight pairs of changes are given in, for example, "About the linear solution of shape recognition by monocular stereoscopic vision from motion" (Deguchi and Akiba, Transactions of the Society of Instrument and Control Engineers, vol. 26, No. 6, 714/720 (1990)).

【００８０】［工程３］奥行き情報の獲得工程２によって映像各部位の三次元的な動きの相対量が
わかった。工程３では、この相対量から各部位の奥行き
情報を導出する。本工程では説明のために、被写体は静
止しており、それを撮影するカメラの側が動くものと仮
定する。映像処理の際には被写体とカメラの相対運動が
問題となるため、この仮定によって良好な結果が得られ
る。[Step 3] Acquisition of Depth Information In step 2, the relative amount of three-dimensional movement of each part of the image was determined. In step 3, depth information of each part is derived from the relative amount. In this step, for the sake of explanation, it is assumed that the subject is stationary and the camera that captures the subject moves. Since the relative motion between the subject and the camera becomes a problem during image processing, a good result can be obtained by this assumption.

【００８１】映像のある部位の動きを回転行列Ｒと並進
ベクトル（ａ, ｂ, ｃ）により、（ｘ',ｙ',ｚ' ）＝Ｒ（ｘ, ｙ, ｚ）＋（ａ, ｂ, ｃ）と表す場合、この逆変換、The motion of a certain part of the image is represented by (x ′, y ′, z ′) = R (x, y, z) + (a, b, c) using the rotation matrix R and the translation vector (a, b, c). c) this inverse transformation,

【数８】（ｘ, ｙ, ｚ）＝Ｒ^-1｛（ｘ',ｙ',ｚ' ）−（ａ, ｂ, ｃ）｝（式１２）をカメラの動きと考える。(X, y, z) = R ⁻¹ {(x ′, y ′, z ′) − (a, b, c)} (Equation 12) is considered as the motion of the camera.

【００８２】図１０はカメラの三次元移動とある点Ｐの
画面上の移動から点Ｐの三次元座標を導く原理を説明す
る図である。同図からわかるように、この原理は一般に
三角測量の原理として知られるもので、位置の異なる２
点から点Ｐの方向を見たとき、点Ｐの現実の位置（図中
の点Ｓ）はそれら２つの視線の交点に存在するというも
のである。FIG. 10 is a diagram for explaining the principle of deriving the three-dimensional coordinates of the point P from the three-dimensional movement of the camera and the movement of the point P on the screen. As can be seen from the figure, this is what is generally known as the principle of triangulation: when the direction of the point P is observed from two points at different positions, the actual position of the point P (point S in the figure) lies at the intersection of the two lines of sight.

【００８３】同図では、時刻ｔ〜ｔ' の間にカメラが矢
印で示すように式１２に従って移動したとする。フレー
ムｔでは点Ｓが点Ｐt に、ｔ' では点Ｐt'にそれぞれ投
影されている。点Ｓは図中の２つの直線Ｌｔ、Ｌｔ' の
交点にある。In the figure, it is assumed that the camera has moved in accordance with Expression 12 between the times t and t 'as shown by arrows. In frame t, point S is projected on point Pt, and in t ', point S is projected on point Pt'. Point S is located at the intersection of two straight lines Lt and Lt 'in the figure.

【００８４】ここでカメラの方向とＬｔ、Ｌｔ' のなす
角θｔ、θｔ' は既知であり、一方カメラの移動方向と
距離が判明しているため、点Ｓの三次元座標を求めるこ
とが可能となる。この座標により、映像各部位の奥行き
情報が判明する。Here, the angles θt and θt' that Lt and Lt' form with the camera direction are known, and the moving direction and distance of the camera are also known, so the three-dimensional coordinates of the point S can be obtained. From these coordinates, the depth information of each part of the image is determined.

【００８５】ここで注意すべきは、前述のごとくｃ＝１
という正規化のため、求められた座標も一定の割合で拡
大または圧縮されていることである。しかしこの場合で
も、奥行き情報は一様に拡大圧縮されているため、奥行
きの相互関係は正しい。It should be noted here that, because of the normalization c = 1 described above, the obtained coordinates are also enlarged or compressed by a constant ratio. Even so, the depth information is scaled uniformly, so the relative relationships among depths are correct.

【００８６】以上が本工程の概要であるが、本工程では
前工程までの誤差を考慮する必要がある。誤差により、
通常は前記Ｌｔ、Ｌｔ' が計算上交わらないためであ
る。こうした事情に配慮し、本工程では両直線の最接近
点の中点のｚ座標を点Ｓの奥行き値と近似する。これを
数式によって説明する。The above is the outline of the present step. In this step, it is necessary to consider an error up to the previous step. Due to the error,
This is because Lt and Lt 'do not normally intersect in calculation. In consideration of such circumstances, in this step, the z coordinate of the midpoint between the closest points of both straight lines is approximated to the depth value of the point S. This will be described using mathematical expressions.

【００８７】上記Ｌｔ、Ｌｔ' の方向ベクトルをそれぞ
れ（ｕ, ｖ, ｗ）、（ｕ',ｖ',ｗ' ）とする。ここで実
数パラメータα、βにより両直線は、Ｌｔ：（ｘ, ｙ, ｚ）＋α（ｕ, ｖ, ｗ）Ｌｔ' ：（ｘ',ｙ',ｚ' ）＋β（ｕ',ｖ',ｗ' ）（式１３）と表すことができる。従って、The direction vectors of Lt and Lt ′ are (u, v, w) and (u ′, v ′, w ′), respectively. Here, both straight lines are expressed by the real number parameters α and β as follows: Lt: (x, y, z) + α (u, v, w) Lt ′: (x ′, y ′, z ′) + β (u ′, v ′, w ′) (Equation 13). Therefore,

【数９】e = {(x+βu)-(x'+ αu')}²+{(y+ βv)-(y'+
αv')}²+{(z+βw)-(z'+ αw')}² とし、ｅを最小にするα、βをｄｅ／ｄα＝０、ｄｅ／
ｄβ＝０より求める。すなわち、[Equation 9] e = {(x + αu) − (x' + βu')}² + {(y + αv) − (y' + βv')}² + {(z + αw) − (z' + βw')}², and the α and β that minimize e are determined from de/dα = 0 and de/dβ = 0. That is,

【数１０】 (u²+v²+w²) α-(uu'+vv'+ww')β+(x-x')u+(y-y')v+(z-z')w=0 (u' ²+v' ²+w' ²) β-(uu'+vv'+ww')α+(x-x')u'+(y-y')v'+(z-z')w'=0 を解いてα、βを求め、最終的に点Ｓの奥行き値を、the equations (u² + v² + w²)α − (uu' + vv' + ww')β + (x − x')u + (y − y')v + (z − z')w = 0 and (u'² + v'² + w'²)β − (uu' + vv' + ww')α − (x − x')u' − (y − y')v' − (z − z')w' = 0 are solved for α and β, and the depth value of the point S is finally taken as

【数１１】{(z+αw)+(z'+ βw')}/2 とすればよい。ここで仮に誤差が０だったとすれば、こ
の座標は両直線の交点のｚ座標に一致する。[Equation 11] {(z + αw) + (z ′ + βw ′)} / 2 Here, if the error is 0, this coordinate coincides with the z coordinate of the intersection of both straight lines.
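The midpoint construction can be sketched as below. The sketch takes α as the parameter of Lt and β as the parameter of Lt', as in Equation 13, and derives the two normal equations directly from de/dα = 0 and de/dβ = 0 for e = |(p + αd) − (q + βe)|². The function name and argument layout are assumptions made for this example.

```python
def closest_point_depth(p, d, q, e):
    """Depth (z) of the midpoint between the closest points of two 3-D lines.

    Line Lt  : p + alpha * d
    Line Lt' : q + beta  * e
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    r = [pi - qi for pi, qi in zip(p, q)]  # p - q
    dd, ee, de = dot(d, d), dot(e, e), dot(d, e)
    rd, re = dot(r, d), dot(r, e)
    # Normal equations:
    #    dd*alpha - de*beta = -rd
    #   -de*alpha + ee*beta =  re
    det = dd * ee - de * de
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel; closest points are not unique")
    alpha = (de * re - rd * ee) / det
    beta = (dd * re - de * rd) / det
    z_on_lt = p[2] + alpha * d[2]
    z_on_lt2 = q[2] + beta * e[2]
    return (z_on_lt + z_on_lt2) / 2.0
```

When the two lines actually intersect, both closest points coincide with the intersection and the returned value is its z coordinate.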

【００８８】また別の方法として、これら両直線を一旦
フレームｔの画面に透視投影し、投影後の最近接点のｚ
座標を求めることもできる。ここでＬｔは代表点である
一点に投影され、一方Ｌｔ' は一般に直線に投影され
る。Ｌｔ' が式１３で表記されるならば、投影後の各点
のｘ、ｙ座標は、Ｌｔ' 上の各点のそれらをそのｚ座標
で割ることにより、 x = f(x'+ βu')/(z'+βw') （式１４） y = f(y'+ βv')/(z'+βw') （式１５）と書くことができる。ここでｆは視点からフレームｔの
画面までの距離で、実際にはｆ＝１などとして扱えばよ
い。式１４、１５からβを消去すれば投影後の直線（以
下Ｌｉという）が以下のように求まる。As another method, both straight lines may first be perspectively projected onto the screen of frame t, and the z coordinate may be obtained from the closest point after projection. Here, Lt is projected onto a single point, namely the representative point, while Lt' is in general projected onto a straight line. If Lt' is expressed as in Equation 13, the x and y coordinates of each projected point are obtained by dividing those of each point on Lt' by its z coordinate: x = f(x' + βu')/(z' + βw') (Equation 14), y = f(y' + βv')/(z' + βw') (Equation 15). Here f is the distance from the viewpoint to the screen of frame t, and in practice it may be treated as f = 1. Eliminating β from Equations 14 and 15 gives the projected straight line (hereinafter Li) as follows.

【００８９】ｋｘ＋ｍｙ＋f ｎ＝0 ただしここで、ｋ＝v'z'-w'y' 、ｍ＝w'x'-u'z' 、ｎ＝u'y'-v'x' とおいている。Kx + my + f n = 0 where k = v'z'-w'y ', m = w'x'-u'z', and n = u'y'-v'x '.

【００９０】求めるべき最近接点は、代表点ＰｔからＬ
ｉに下ろした垂線とＬｉの交点（以下Ｄとする）であ
り、その座標は、ｘ＝（m ²X-kn-kmY）／（k ²+m²）（式１６）ｙ＝（k ²Y-mn-kmX）／（k ²+m²）となる。ここで点Ｔに対応するもとのＬｔ' 上の点をＥ
（x'',y'',z'' ）とすれば、点Ｅは、式１６を式１４に
代入してβを求め、これをＬｔ' の式に代入することよ
って求められる。ここでβは、 β＝（xz'-fx' ）／(fu'-xw'）であるため、これを式１３へ代入し、点Ｅのｚ座標 z''
は、 z'' ＝z'+ w'（xz'-fx' ）／（fu'-xw' ）と求まる。これを点Ｓの奥行き値とすればよい。The closest point sought is the intersection (hereinafter D) of Li with the perpendicular dropped from the representative point Pt onto Li, and its coordinates are x = (m²X − kn − kmY)/(k² + m²) (Equation 16), y = (k²Y − mn − kmX)/(k² + m²). If the point on the original Lt' corresponding to the point D is denoted E(x'', y'', z''), the point E is found by substituting Equation 16 into Equation 14 to obtain β and substituting this into the expression for Lt'. Since β = (xz' − fx')/(fu' − xw'), substituting this into Equation 13 gives the z coordinate z'' of the point E as z'' = z' + w'(xz' − fx')/(fu' − xw'). This may be taken as the depth value of the point S.
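The projection-based variant can be sketched as follows. The foot of the perpendicular is computed for a general f, which reduces to Equation 16 when f = 1. The fallback through Equation 15 when the denominator fu' − xw' vanishes is an assumption added for this example and is not described in the text; the function name is likewise hypothetical.

```python
def depth_by_projection(rep, p2, d2, f=1.0):
    """Depth of point S via projection of Lt' onto the screen of frame t.

    rep : (X, Y), representative point Pt on the screen of frame t
    p2  : (x', y', z'), a point on Lt'
    d2  : (u', v', w'), direction vector of Lt'
    """
    X, Y = rep
    x2, y2, z2 = p2
    u2, v2, w2 = d2
    # Projected line Li: k*x + m*y + f*n = 0
    k = v2 * z2 - w2 * y2
    m = w2 * x2 - u2 * z2
    n = u2 * y2 - v2 * x2
    denom = k * k + m * m
    if denom == 0.0:
        raise ValueError("Lt' projects to a single point")
    # Foot D of the perpendicular from (X, Y) onto Li
    s = (k * X + m * Y + f * n) / denom
    x = X - k * s
    y = Y - m * s
    # Solve Equation 14 for beta; use Equation 15 if Equation 14 is degenerate
    den_x = f * u2 - x * w2
    den_y = f * v2 - y * w2
    if abs(den_x) > 1e-12:
        beta = (x * z2 - f * x2) / den_x
    elif abs(den_y) > 1e-12:
        beta = (y * z2 - f * y2) / den_y
    else:
        raise ValueError("D corresponds to the vanishing point of Lt'")
    return z2 + beta * w2
```

If the representative point lies exactly on Li, D coincides with it and the result equals the depth of the intersection of the two lines of sight.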

【００９１】なお、画像処理の際の誤差によって上記の
奥行きが負の値となる場合（点Ｓがカメラの後方に存在
することになる場合）、この計算結果は信頼することが
できない。このときは正の奥行き値を持つ近傍の代表点
から補間するなどの処理を行う。If the above-mentioned depth has a negative value due to an error in the image processing (when the point S is located behind the camera), the calculation result cannot be relied on. At this time, processing such as interpolation from a nearby representative point having a positive depth value is performed.
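One simple way to carry out such interpolation is sketched below: each non-positive depth is replaced by the average of its positive four-neighbours. The choice of four-neighbour averaging and the use of `None` when no valid neighbour exists are assumptions made for this example; the text only states that interpolation from nearby representative points is performed.

```python
def fill_invalid_depths(depth):
    """Replace non-positive depths by the mean of positive 4-neighbour depths.

    depth: 2-D list of depth values, one per representative point.
    Returns a new 2-D list; entries with no positive neighbour become None.
    """
    rows, cols = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for i in range(rows):
        for j in range(cols):
            if depth[i][j] > 0:
                continue
            nbrs = [
                depth[r][c]
                for r, c in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                if 0 <= r < rows and 0 <= c < cols and depth[r][c] > 0
            ]
            out[i][j] = sum(nbrs) / len(nbrs) if nbrs else None
    return out
```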

【００９２】以上、いずれの方法をとるかは別として、
求められた映像各部位の奥行きは、例えば代表点ごとに
数値として与えればよい。図１１はフレームｔにおいて
各代表点に数値が与えられた状態を示す図で、例えばＰ
t （２, ３）、Ｐt （４, ３）それぞれの奥行きは１０
０、２００となり、後者の実際の位置は前者よりも２倍
離れたところにあることがわかる。Whichever of the above methods is used, the depth obtained for each part of the image may be given, for example, as a numerical value for each representative point. FIG. 11 shows a state in which a numerical value has been assigned to each representative point in frame t. For example, the depths of Pt(2, 3) and Pt(4, 3) are 100 and 200, respectively, showing that the actual position of the latter is twice as far away as that of the former.

【００９３】［工程４］奥行き情報の利用工程３で求められた奥行き情報に応じて警備システムを
構築する。一例として、監視カメラおよびＰＣによるシ
ステムが考えられる。すなわち、人やその他任意の物体
が近寄ることが禁止されている区域に監視カメラを設置
し、この監視カメラで撮影された映像をＰＣに送って解
析することにより、被写体の奥行きを抽出する。被写体
が近づくことにより、その奥行きが所定の値以下になっ
たとき、ＰＣ経由で警告音を発する、警備員に通報す
る、その区域の照明を明るくする、被写体の映像を記録
する、などの処理を行う。[Step 4] Use of depth information: A security system is constructed according to the depth information obtained in Step 3. As an example, a system using a surveillance camera and a PC can be considered. A surveillance camera is installed in an area that people or any other objects are prohibited from approaching, and the video captured by the camera is sent to the PC and analyzed to extract the depth of the subject. When the subject approaches and its depth falls to or below a predetermined value, processing is performed such as sounding a warning via the PC, notifying a security guard, brightening the lighting of the area, or recording video of the subject.

【００９４】警備システムには、レーザや超音波を対象
物体に当ててその反射から距離を測定するシステムがあ
るが、その場合レーザ等を振るためのスキャン機構が必
要になる。本実施形態ではそうしたスキャン機構が不要
である。また別の警備システムとして、物体の赤外線を
感知するものもあるが、そのシステムでは熱を発しない
物体を見つけることができない。その意味でも、熱に関
係のない本実施形態は有利である。The security system includes a system for measuring the distance from the reflection of a target object by applying a laser or ultrasonic wave to the object. In that case, a scanning mechanism for shaking the laser or the like is required. In the present embodiment, such a scanning mechanism is unnecessary. Other security systems detect infrared light from objects, but cannot detect objects that do not produce heat. In this sense, the present embodiment, which is not related to heat, is advantageous.

【００９５】なお、本警備システムでは、例えばある距
離Ｄ以内に近づいた物体の画像領域のみを画像全体から
切り出すことも可能である。これは画像全体の中から、
奥行きがＤ以内である領域を選択すればよい。こうして
切り出した領域を拡大したり、その領域の動きに追従し
て観察するなどの処理も可能である。In this security system, it is also possible, for example, to cut out from the entire image only the image area of an object that has come within a certain distance D. This is done by selecting, from the entire image, the areas whose depth is within D. The area cut out in this way can also be enlarged, or tracked and observed as it moves.
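Selecting the region whose depth is within D can be sketched as below, assuming the depth is stored as one value per representative point in a 2-D list (as in FIG. 11). The function name, the bounding-box output, and the treatment of `None` or non-positive entries as invalid are assumptions made for this example. A non-empty result could also serve as the trigger for the warning processing.

```python
def region_within(depth, d_max):
    """Return the grid cells whose depth is within d_max, and their bounding box.

    depth: 2-D list of depth values (None or non-positive values are ignored).
    Returns (cells, bbox), where cells is a list of (row, col) and bbox is
    (row_min, col_min, row_max, col_max), or None if no cell qualifies.
    """
    cells = [
        (i, j)
        for i, row in enumerate(depth)
        for j, z in enumerate(row)
        if z is not None and 0 < z <= d_max
    ]
    if not cells:
        return [], None
    rows = [i for i, _ in cells]
    cols = [j for _, j in cells]
    return cells, (min(rows), min(cols), max(rows), max(cols))
```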

【００９６】実施形態２．実施形態１では、入力映像が
単眼カメラによって撮影されるものとした。ここでは、
多眼カメラによるステレオ映像を入力映像とする場合を
実施形態１との相違点を中心に説明する。 Embodiment 2. In Embodiment 1, the input video was assumed to be captured by a monocular camera. Here, the case in which stereo video from a multi-lens camera is used as the input video is described, focusing on the differences from Embodiment 1.

【００９７】図１２は実施形態２の主な工程を示す。同
図と実施形態１の図１との主な違いは以下の通りであ
る。FIG. 12 shows the main steps of Embodiment 2. The main differences between this figure and FIG. 1 of Embodiment 1 are as follows.

【００９８】１．工程１の「動き情報」が「変位情報」
に変更される実施形態１では異時刻フレームを扱ったが、実施形態２
では基本的に同時刻フレームを扱う。同時刻の場合、被
写体に動きを定義することはできないため、代わりに同
時刻フレーム間の被写体の位置のずれ、すなわち変位の
情報を抽出する。1. The "motion information" of Step 1 is changed to "displacement information": Embodiment 1 dealt with frames at different times, whereas Embodiment 2 basically deals with frames at the same time. At the same time instant, motion cannot be defined for the subject, so the positional offset of the subject between the same-time frames, that is, displacement information, is extracted instead.

【００９９】２．工程２が不要となる図１の工程２「三次元動き情報の算出」に対応するステ
ップがない。多眼の場合、はじめから図１０の状態で撮
影がなされるため、三角測量の原理によって直接奥行き
情報が獲得できるためである。2. Step 2 becomes unnecessary There is no step corresponding to step 2 “calculation of three-dimensional motion information” in FIG. This is because, in the case of a multi-view, since the shooting is performed in the state of FIG. 10 from the beginning, depth information can be directly obtained by the principle of triangulation.

【０１００】なお、複数カメラの相対位置関係に狂いが
発生しうる多眼カメラシステムを用いる場合、この狂い
を補正するセルフキャリブレーションを行ったほうがよ
い。この場合、工程２をセルフキャリブレーション工程
として利用する。セルフキャリブレーションの手法につ
いては、例えば、富田、高橋「ステレオカメラのセルフ
キャリブレーション」（情報処理Vol.31，No.5（1990）
650 〜659 ページ）、特開平０２−１３８６７１号公
報、特開平０２−１３８６７２号公報などに示されてい
る。以下、実施形態２の工程１〜３を説明する。When using a multi-lens camera system in which the relative positional relationship between a plurality of cameras may be out of order, it is better to perform self-calibration to correct the out of order. In this case, step 2 is used as a self-calibration step. For the self-calibration method, see, for example, Tomita and Takahashi "Self-calibration of stereo camera" (Information Processing Vol.31, No.5 (1990)
650-659), JP-A-02-138671, JP-A-02-138672 and the like. Hereinafter, steps 1 to 3 of the second embodiment will be described.

【０１０１】［工程１］二次元変位情報の抽出実施形態１の説明において、「動き」を「変位」に置き
換える他、フレームｔ、ｔ' の組をフレーム１、２に置
き換えればよい。フレーム１、２はそれぞれステレオカ
メラを構成する左右のカメラ１、２から撮影された映像
を指し、撮影時刻はｔで固定とする。実施形態２では、
最低これら２枚のフレームのみから最終画像を得ること
ができる。すなわち多眼撮影の場合は、入力は静止映像
であってもよい。その他、実施形態１の工程１との相違
は以下の通りである。[Step 1] Extraction of Two-Dimensional Displacement Information In the description of the first embodiment, in addition to replacing “movement” with “displacement”, a set of frames t and t ′ may be replaced with frames 1 and 2. Frames 1 and 2 respectively indicate images taken from left and right cameras 1 and 2 constituting a stereo camera, and the shooting time is fixed at t. In the second embodiment,
The final image can be obtained only from at least these two frames. That is, in the case of multi-view photography, the input may be a still image. In addition, the differences from Step 1 of Embodiment 1 are as follows.

【０１０２】（１）実施形態１のＳ１１（対応点候補領
域の設定）では、映像の動きの激しさまたは各部位の移
動軌跡に基づき、異時刻フレームの選択または対応点候
補領域を絞り込んで対応点検出処理の計算量削減を行っ
た。実施形態２では絞り込みの方法を以下のように変更
し、同様に有効な計算量削減を実現する。(1) In S11 (setting of corresponding point candidate area) in the first embodiment, selection of a different time frame or narrowing of the corresponding point candidate area is performed based on the intensity of the motion of the image or the movement trajectory of each part. The calculation amount of point detection processing was reduced. In the second embodiment, the method of narrowing down is changed as follows, and an effective calculation amount is similarly reduced.

【０１０３】まず、通常どおり多眼カメラが水平に設置
されると仮定する。このとき、対応点のｙ座標（上下方
向の座標）はほぼ等くなる。この仮定と画像処理に伴う
誤差およびカメラの設置誤差を考慮し、対応点候補領域
を横長の帯状領域に限定する。さらに、フレームｔ'
（ｔ' ＝ｔ−１）において対応する代表点の位置の差が
ｘであれば、フレームｔにおける対応点探索領域も、や
はり差がｘとなる近傍に限定することができる。First, assume that the multi-lens camera is installed horizontally, as is usual. In this case, the y coordinates (vertical coordinates) of corresponding points are substantially equal. Taking this assumption into account together with the errors associated with image processing and camera installation, the corresponding point candidate area is limited to a horizontally long band-shaped area. Furthermore, if the positional difference between corresponding representative points in frame t' (t' = t − 1) is x, the corresponding point search area in frame t can likewise be limited to the neighborhood in which the difference is x.
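The restriction of the candidate area can be sketched as follows. The tolerances `y_tol` and `x_tol`, the pixel-coordinate convention, and the function name are hypothetical values chosen for this example; the text does not specify how wide the band or the neighborhood should be.

```python
def candidate_band(rep, width, height, y_tol=2, prev_disparity=None, x_tol=8):
    """Search window (x0, y0, x1, y1), inclusive, for the corresponding point.

    rep            : (x, y) of the representative point in frame 1
    width, height  : size of frame 2 in pixels
    y_tol          : half-height of the horizontal band (installation/processing error)
    prev_disparity : horizontal offset found for this point in the previous
                     time frame, or None if unknown
    x_tol          : half-width of the neighbourhood around the previous offset
    """
    x, y = rep
    y0 = max(0, y - y_tol)
    y1 = min(height - 1, y + y_tol)
    if prev_disparity is None:
        # No history: search the whole horizontal band
        x0, x1 = 0, width - 1
    else:
        # History available: search only near the previous offset
        cx = x + prev_disparity
        x0 = max(0, cx - x_tol)
        x1 = min(width - 1, cx + x_tol)
    return x0, y0, x1, y1
```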

【０１０４】（２）実施形態１のＳ１２（対応点候補領
域における非類似度の計算）では、映像の動きが緩慢過
ぎる場合に統計処理を導入したが、実施形態２ではこの
作業も不要である。(2) In S12 (calculation of dissimilarity in the corresponding point candidate area) of the first embodiment, statistical processing is introduced when the motion of the video is too slow. However, this operation is not required in the second embodiment. .

[0105] (3) As in S12 of the first embodiment, block matching is performed in the second embodiment to determine the positions of the corresponding points, but here it is sometimes better to use biased block matching. Biased block matching works well when the cameras making up the multi-lens camera have different characteristics. For example, if camera 2 outputs video that is more bluish than that of camera 1, block matching should be performed after subtracting a fixed amount of the blue (B) component from the color density of frame 2, that is, after subtracting a color deflection constant α_B. Without such processing, adding E1 and E2 in Equation 3 may lose its meaning. In practice, when color density is expressed in RGB, for example, the color deflection constants α_R and α_G should be subtracted for red (R) and green (G) as well as for blue (B). In quantifying the similarity of images, biased block matching acts as a calibration that matches the characteristics of the two cameras. It thereby ensures that evaluating image similarity and the validity of relative positions in the same stage is appropriate.

[0106] Biased block matching is now described with equations, based on FIG. 4 and Equation 1. Here, Pt(i, j) used in the first embodiment is written simply as P1 and P2 for frames 1 and 2, and It(i, j) is likewise written as I1 and I2. Equation 1 then simplifies to

[Math 12]
$$E_1 = \sum\sum \left\{ I_1(P_{1x}+u,\ P_{1y}+v) - I_2(P_{2x}+u,\ P_{2y}+v) \right\}^2 \qquad \text{(Equation 17)}$$
This expression represents ordinary block matching for a grayscale image.

[0107] In biased block matching, on the other hand, Equation 17 is replaced by

[Math 13]
$$E_1 = \sum\sum \left\{ I_1(P_{1x}+u,\ P_{1y}+v) - I_2(P_{2x}+u,\ P_{2y}+v) - \alpha \right\}^2 \qquad \text{(Equation 18)}$$
For a color image, α is α_R, α_G, or α_B, and matching is performed using the sum of the E1 values obtained for the R, G, and B images, that is, E1_R + E1_G + E1_B. For readability, writing I1(P1x+u, P1y+v) simply as I1 and I2(P2x+u, P2y+v) simply as I2, Equation 18 becomes
$$E_1 = \sum\sum (I_1 - I_2 - \alpha)^2 \qquad \text{(Equation 19)}$$
I1 and I2 are functions of u and v, whereas α is a constant.

[0108] Consider the optimal value of α. Since cameras 1 and 2 should be shooting the same subject, the images of frames 1 and 2 contain substantially the same content, apart from the displacement of each part of the image. That is, the closer the characteristics of the cameras are, the smaller the value of E1 in Equation 19 becomes. Conversely, this shows that α should be the value that minimizes E1. Equation 19 expands to

[Math 14]
$$E_1 = \sum\sum \left\{ (I_1 - I_2)^2 - 2\alpha (I_1 - I_2) + \alpha^2 \right\} = \sum\sum (I_1 - I_2)^2 - 2\alpha \sum\sum (I_1 - I_2) + \sum\sum \alpha^2 \qquad \text{(Equation 20)}$$
If the total number of pixels in the region is N, then ΣΣ1 = N, so Equation 20 becomes

[Math 15]
$$E_1 = \sum\sum (I_1 - I_2)^2 - 2\alpha \sum\sum (I_1 - I_2) + N\alpha^2 \qquad \text{(Equation 21)}$$
Therefore, since
$$\frac{dE_1}{d\alpha} = -2 \sum\sum (I_1 - I_2) + 2N\alpha,$$
E1 is minimized when
$$\alpha = \frac{\sum\sum (I_1 - I_2)}{N} \qquad \text{(Equation 22)}$$
In other words, this α is the mean of the color density differences between the pixels of the two regions subjected to block matching. Substituting Equation 22 into Equation 21 gives

[Math 16]
$$E_1 = \sum\sum (I_1 - I_2)^2 - \frac{\left\{ \sum\sum (I_1 - I_2) \right\}^2}{N} \qquad \text{(Equation 23)}$$
so biased block matching ultimately requires only the calculation of Equation 23. With Equation 23, if cameras 1 and 2 capture exactly the same subject, E1 becomes almost 0. Since E2 from Equation 2 is then also almost 0, biased block matching has the effect of aligning the zero points of the image-similarity judgment and the relative-position validity judgment. Thereafter, the best match can be searched for through the same processing as in the first embodiment.
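As a minimal sketch of Equations 17, 22, and 23, the following code compares ordinary and biased block matching on a toy pair of images in which the second camera is uniformly brighter. The helper names (`block_diffs`, `plain_e1`, `biased_e1`), the block half-width, and the test images are illustrative assumptions, not part of the source.

```python
def block_diffs(img1, img2, p1, p2, half):
    """Differences I1 - I2 over a (2*half+1)^2 block centred on p1 in img1
    and on p2 in img2. Images are lists of rows; points are (x, y)."""
    (x1, y1), (x2, y2) = p1, p2
    return [img1[y1 + v][x1 + u] - img2[y2 + v][x2 + u]
            for v in range(-half, half + 1)
            for u in range(-half, half + 1)]


def plain_e1(diffs):
    """Equation 17: sum of squared differences."""
    return sum(d * d for d in diffs)


def biased_e1(diffs):
    """Equation 23, returned together with the optimal offset of Equation 22."""
    n = len(diffs)
    s = sum(diffs)
    alpha = s / n                                   # mean difference
    return sum(d * d for d in diffs) - s * s / n, alpha


# Toy data: camera 2 sees the same pattern, 10 levels brighter everywhere.
img1 = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
img2 = [[value + 10 for value in row] for row in img1]

diffs = block_diffs(img1, img2, (4, 4), (4, 4), half=2)
e1, alpha = biased_e1(diffs)
print(plain_e1(diffs), e1, alpha)
```

Ordinary matching reports a large error for this pair purely because of the brightness offset, whereas the biased value is zero, which is the calibration effect the text describes. For a color image the same function would be applied to each of the R, G, and B planes and the three results summed.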

[0109] Naturally, densities in a color space other than RGB, such as HVC, may be used here. Block matching may also be based on the first-power error, that is, the residual, instead of the squared error. If the correction value α given by Equation 22 exceeds a certain range, biased block matching may be discontinued. Although cameras 1 and 2 shoot the same subject, they do so from different angles, so the captured images naturally differ to some extent even if the characteristics of the cameras are exactly the same. If all of this difference is corrected, the value of E1 becomes unnecessarily small and a correct evaluation may not be possible.

[0110] When biased block matching is discontinued, the evaluation value from ordinary block matching may be used as the image-similarity evaluation value. Alternatively, the value obtained after correcting by only the upper limit of the permitted correction range (denoted T) may be used. In that case the evaluation value is calculated by the following equation.

[0111]
$$E_1 = \sum\sum (I_1 - I_2)^2 - \frac{\left\{ \sum\sum (I_1 - I_2) \right\}^2}{N} + N x^2, \qquad x = \left| \frac{\sum\sum (I_1 - I_2)}{N} \right| - T$$
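The limited correction can be sketched as below. The function name and the sample values are assumptions; applying the N·x² term only when the mean difference exceeds T is one reading of the text, which gives the formula for the case in which biased block matching has been discontinued.

```python
def clamped_biased_e1(diffs, t):
    """Evaluation value when the offset correction is limited to magnitude t.

    diffs holds the per-pixel differences I1 - I2 inside the block."""
    n = len(diffs)
    s = sum(diffs)
    e1 = sum(d * d for d in diffs) - s * s / n     # Equation 23
    x = abs(s / n) - t                             # excess over the limit T
    if x > 0:
        e1 += n * x * x                            # uncorrected remainder
    return e1


print(clamped_biased_e1([-10] * 25, t=4))
print(clamped_biased_e1([-10] * 25, t=20))
```

In the first call the offset of 10 exceeds the limit of 4, so a residual offset of 6 per pixel remains and contributes 25 × 6² to the result, which equals the sum of squared differences obtained by subtracting an offset of magnitude 4 directly.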

[0112] (4) In S13 (determination of the initial positions of the corresponding points) of the first embodiment, points whose positions change stably across the different-time frames t, t', ... were selected as feature points. Here a further selection criterion is imposed.

[0113] FIG. 13 shows the feature-point selection criteria introduced in the second embodiment. In the figure, the three frames F10 to F12 are different-time frames shot by camera 1, and the three frames F20 to F22 are different-time frames shot by camera 2. Each left-right pair shows same-time frames. Focusing on a certain point P, the motion of the position of P between different-time frames is represented by vector An, and the displacement of P between same-time frames by vector Bn (n: a natural number).

[0114] With the above setup, the second embodiment selects as feature points those points that satisfy the following criterion.

[0115] (a) The vector Bn is substantially constant, or changes by a substantially constant amount.
Alternatively, the following criterion may be added:
(b) The vector An is substantially constant, or changes by a substantially constant amount,
and points satisfying both (a) and (b) may be selected as feature points.

[0116] Criterion (b) corresponds to the condition introduced in the first embodiment. As already described, in multi-lens shooting depth information can be obtained from same-time frames alone. However, accurately determining the correspondence between images, which is the prerequisite for doing so, is a separate problem, and information between different-time frames should also be actively used. Points satisfying both of the above conditions are considered to be tracked quite accurately, and therefore provide important clues for extracting two-dimensional displacement information. When the input is a still image, however, corresponding points can also be found by the known dynamic programming method.
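Criteria (a) and (b) can be sketched as a simple test on the sequences of vectors Bn and An. The source does not quantify "substantially constant", so the per-component tolerance `tol` and the function names are assumptions made for illustration.

```python
def nearly_constant(vectors, tol):
    """True if every vector is within tol (per component) of the first one."""
    x0, y0 = vectors[0]
    return all(abs(x - x0) <= tol and abs(y - y0) <= tol for x, y in vectors)


def steady(vectors, tol=1.0):
    """Vectors are nearly constant, or their frame-to-frame change is."""
    if len(vectors) < 2 or nearly_constant(vectors, tol):
        return True
    steps = [(bx - ax, by - ay)
             for (ax, ay), (bx, by) in zip(vectors, vectors[1:])]
    return nearly_constant(steps, tol)


def is_feature_point(b_vectors, a_vectors=None, tol=1.0):
    """Criterion (a) on the same-time displacements Bn and, when a_vectors
    is given, criterion (b) on the different-time motions An as well."""
    ok = steady(b_vectors, tol)
    if a_vectors is not None:
        ok = ok and steady(a_vectors, tol)
    return ok


print(is_feature_point([(12, 0), (12.4, 0.2), (11.8, -0.1)]))   # steady Bn
print(is_feature_point([(10, 0), (20, 0), (5, 0)]))             # erratic Bn
```

Passing `a_vectors` as well corresponds to requiring both (a) and (b), which the text suggests gives points that are tracked more reliably.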

[0117] [Step 2] Acquisition of depth information
Depth information for each part is derived from the displacement of each part of the image obtained in step 1. In the multi-lens case, the state of FIG. 10 is realized at a given time t, so the depth information can be obtained by the method of step 3 of the first embodiment.

[0118] Note that because the positional relationship between the cameras is fixed, if this relationship and the camera magnification or focal length are known, the second embodiment yields correct depth information, including the scale factor c that could not be determined in the first embodiment.
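To illustrate why a known camera arrangement fixes the absolute scale, the sketch below uses the standard triangulation relation for two parallel pinhole cameras, z = f·B/d. This textbook relation is an assumption made here for illustration; the source itself refers to the method of step 3 of the first embodiment, and the parameter names and values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Depth of a point from its horizontal disparity between two parallel
    cameras. focal_px is the focal length in pixels; the result has the
    same unit as baseline."""
    if disparity_px <= 0:
        return float("inf")        # no parallax: point at infinity
    return focal_px * baseline / disparity_px


# Hypothetical rig: focal length 800 px, cameras 0.125 m apart.
print(depth_from_disparity(20, focal_px=800, baseline=0.125))
```

Because the baseline is a physical length, the result is a metric depth rather than a depth known only up to a scale factor.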

[0119] [Step 3] Use of depth information
Processing equivalent to step 4 (use of depth information) of the first embodiment may be performed.

[0120] Embodiment 3. In the first and second embodiments a security system was built using the extracted depth information. In the third embodiment the depth information is used as part of a computer vision technique.

[0121] In the research field called computer vision, techniques for estimating the three-dimensional structure and three-dimensional motion of a target are studied, mainly for the purpose of automatic robot control. Specifically, these include techniques for correctly determining the distance to an object, for autonomous travel of a robot, by shooting the object with a stereo camera or by shooting it while moving a monocular camera. Several aspects of these techniques are described, for example, on page 57 of "1990 Image Coding Symposium (PCSJ90)".

[0122] When the distance to an object is determined in computer vision, segment matching is often performed. Segment matching resembles block matching in that it matches image regions, but differs from ordinary block matching in that it matches on some characteristic region of the image. Processing characteristic regions allows the three-dimensional position of an object to be determined with higher accuracy than ordinary block matching.

[0123] FIG. 14 shows a method of performing segment matching on edge segments. FIG. 14(a) is the image shot by the left camera of a stereo camera, in which edge 120 is the search target. FIG. 14(b) is the image from the right camera, in which a plurality of edges 122 to 126 are present. It is easy to imagine that real images contain many more edges. In actual processing, regions containing edges are first selected from the right camera image, and detailed segment matching is performed on each of them. For robot control, very high accuracy is required in recognizing the position of an object, so segment matching generally takes a very long time.

[0124] The present embodiment aims to solve this problem. First, steps 1 to 3 of the first embodiment are performed to obtain depth information for each part of the image. Because no characteristic regions such as edges need to be selected at this stage, the processing time is relatively short. Once the depth is known to a certain degree of accuracy, this depth information can be used to narrow the search area for segment matching.

[0125] FIG. 15 illustrates the principle of narrowing the search area. Edge 130 in the left camera image shown in FIG. 15(a) is the search target, and it is to be found in the right camera image of FIG. 15(b). Edge 130 is assumed to be located at x0 from the right end of the left camera image. The cameras are assumed to be installed in parallel.

[0126] Under this assumption, the value of the depth z of edge 130 mathematically and uniquely determines where edge 130 should appear in the right camera image. For example, if z is infinite there should be no parallax, so in FIGS. 15(a) and 15(b) the edge appears at the same position offset by the camera spacing, that is, at x0 + l from the right end of the image (where l is the camera spacing on the image). In FIG. 15(b) this position is indicated by straight line L1. As z decreases from there, edge 130 gradually shifts to the left in FIG. 15(b). This is shown schematically by straight lines L2, L3, and L4 for the example values z = 100, 50, and 10, respectively.

[0127] Apart from these mathematical considerations, the depth of each part of the image is already known from steps 1 to 3 of the first embodiment. Therefore, by finding the positions at which the depth distribution obtained by the method of the first embodiment agrees with the distribution in FIG. 15(b), one obtains positions that are likely to be the segment corresponding to edge 130. If a region 132 in FIG. 15(b) lies on the straight line L3 for z = 50 and the depth of that region is found to be about 50 by steps 1 to 3, the region corresponding to edge 130 is likely to exist near region 132. Detailed segment matching then need only be performed in the vicinity of region 132.
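The narrowing principle can be sketched as follows for one image row. The sketch assumes parallel pinhole cameras with disparity f·B/z and measures columns from the left edge of each image, so it does not reproduce the figure's own convention of x0 + l measured from the right end. The function names, the relative tolerance, and the sample values are hypothetical.

```python
def expected_right_x(x_left, z, focal_px, baseline):
    """Column at which a point of depth z, seen at x_left in the left image,
    should appear in the right image. It moves left as z decreases."""
    return x_left - focal_px * baseline / z


def narrowed_columns(x_left, depth_row, focal_px, baseline, rel_tol=0.2):
    """Columns of the right image whose measured depth agrees with the depth
    that their position would imply for the edge at x_left."""
    keep = []
    for x, z_measured in enumerate(depth_row):
        disparity = x_left - x
        if disparity <= 0:
            continue                      # would lie beyond infinity
        z_implied = focal_px * baseline / disparity
        if abs(z_measured - z_implied) <= rel_tol * z_implied:
            keep.append(x)
    return keep


# Coarse depths for one row of the right image: far background except
# around column 18, where steps 1 to 3 reported a depth of about 50.
depth_row = [1000.0] * 20
depth_row[18] = 48.0
print(narrowed_columns(x_left=20, depth_row=depth_row, focal_px=100, baseline=1))
```

Only the surviving columns would then be passed to the detailed segment matching, which is how both the processing time and the number of false matches among similar-looking edges can be reduced.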

[0128] As described above, the present embodiment not only greatly shortens the processing time of segment matching but also reduces false correspondences in images containing many similar segments.

[0129] Embodiment 4. In this embodiment, the depth information extracted in the first and second embodiments is used for the user interface of a computer system, specifically for instruction input.

[0130] As is well known, a user who inputs instructions to a computer usually uses a keyboard, mouse, touch panel, or the like. User interfaces that do not use such input devices have also been studied, but these have generally held a model of the shape of a human face or hand in the computer and tried to read instructions by comparing this model with the actual user's facial expression or hand shape. In practice, however, it is difficult to hold an accurate model, and matching so-called "soft models" such as hands poses problems of accuracy and processing time, so practical application has been difficult. In this embodiment, a relatively simple user interface for a computer system is realized on the basis of depth information.

[0131] FIG. 16 is a schematic diagram showing the configuration of the system of this embodiment and how it is used. In the figure, user 140 gives instructions to PC 142. A camera 144 that shoots user 140 is installed on part of PC 142. With this configuration, user 140 makes simple movements such as up, down, left, and right with a fingertip 146. For the video of user 140 shot by camera 144, depth information is calculated inside PC 142 according to steps 1 to 3 of the first or second embodiment.

[0132] Because the user's fingertip 146 is the point closest to camera 144, fingertip 146 can be recognized by selecting the image portion with the smallest depth. Thereafter, when fingertip 146 moves, its movement can be followed by tracking the image portion with the smallest depth. If PC 142 finds that fingertip 146 has moved in a certain direction, it may, for example, move the on-screen cursor in that direction. If fingertip 146 draws a circle, for example, PC 142 may execute some processing, treating the gesture as an "OK" sign.
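A minimal sketch of this fingertip tracking follows. It assumes the depth information is available as a small two-dimensional grid per frame, with image rows increasing downward; the function names, the `min_move` threshold, and the direction labels are illustrative assumptions.

```python
def nearest_point(depth_map):
    """(x, y) of the smallest-depth cell in a depth map given as a list of rows."""
    best = None
    for y, row in enumerate(depth_map):
        for x, z in enumerate(row):
            if best is None or z < best[0]:
                best = (z, x, y)
    return best[1], best[2]


def cursor_direction(prev_xy, cur_xy, min_move=1):
    """Dominant direction of the fingertip movement between two frames."""
    dx = cur_xy[0] - prev_xy[0]
    dy = cur_xy[1] - prev_xy[1]
    if max(abs(dx), abs(dy)) < min_move:
        return "none"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"


frame0 = [[9, 9, 9], [9, 2, 9], [9, 9, 9]]   # fingertip in the centre
frame1 = [[9, 9, 9], [9, 9, 2], [9, 9, 9]]   # fingertip one cell to the right
print(cursor_direction(nearest_point(frame0), nearest_point(frame1)))
```

No model of the hand is needed: the only cue is which image portion is nearest to the camera, which is what keeps this interface comparatively simple.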

[0133] Embodiment 5. In this embodiment, the depth information extracted in the first and second embodiments is used for display control of a three-dimensional display device.

[0134] Stereoscopic display of images is realized in several ways. A typical example is a display device with a lenticular lens laid over the screen. A lenticular lens has a large number of fine arc-shaped convex portions formed continuously on its surface, and the refraction of this lens ensures that only the pixels making up the right-eye image are visible to the right eye and only the pixels making up the left-eye image are visible to the left eye. The right-eye and left-eye images are given parallax, which makes stereoscopic viewing possible.

[0135] A problem with display devices using a lenticular lens is that if the viewer's head moves even slightly, the viewer falls into a state of reverse stereoscopic vision (also called reverse vision). Reverse stereoscopic vision is the state in which the left-eye pixels are seen by the right eye and the right-eye pixels by the left eye. Head tracking technology is known as a way of dealing with this problem. As introduced on pages 154 and 155 of "Basics of 3D Video" (supervised by Takehiro Izumi, edited by the NHK Broadcasting Technology Research Institute, Ohmsha), head tracking detects the movement of the viewer's head using infrared rays, magnetism, ultrasonic waves, or the like, and swaps the right-eye and left-eye pixels as appropriate in step with the head movement. The same source also describes a device that places an infrared sensor at the top of the display device, detects the movement of the viewer's head, and mechanically changes the position of the lenticular screen.

[0136] With these head tracking techniques, however, infrared and ultrasonic detection require a scanning mechanism or an array structure, and magnetic detection requires the viewer to wear a sensor on the head.

[0137] The system of this embodiment solves these problems. As in FIG. 16 of the fourth embodiment, a camera is attached to the display device that performs stereoscopic display, and the viewer is shot with this camera. Because the viewer's three-dimensional position, including depth, is determined from the camera image, display control suited to the viewer's position is performed.

[0138] As an example of the control method, when the viewer's depth is small, that is, when the viewer is close to the display device, the parallax of the left and right images to be displayed stereoscopically is reduced overall so that subjects form their images behind the display surface (that is, the screen position) as seen by the viewer. Conversely, when the viewer's depth is large, measures can be taken such as adjusting the images to form in front of the display surface, or widening the range of parallax applied so that differences in perceived depth between subjects become clear.

[0139] FIG. 17 is a configuration diagram of this system. As shown in the figure, the system uses as display devices a left display panel 158 to be viewed with the left eye and a right display panel 160 to be viewed with the right eye. It also includes a camera 152 that shoots the viewer and a depth extraction unit 162 that extracts the depth of the viewer's head from the camera image.

[0140] The image to be displayed is supplied to input terminal 150. This image is the left-eye image, and the system generates the right-eye image by displacing pixels of this image. The pixel displacement produces parallax and enables stereoscopic viewing. The image input from input terminal 150 is supplied in parallel to buffer memory 154 and right-eye image generation unit 156. Buffer memory 154 serves to absorb the processing delay of right-eye image generation unit 156. Left display panel 158 displays the image output from buffer memory 154, and right display panel 160 displays the image to which right-eye image generation unit 156 has applied the displacement.

[0141] The depth of the viewer's head extracted by depth extraction unit 162 is input to right-eye image generation unit 156. When this depth is small the pixel displacement is made small, and when the depth is large it is made large. As a result, good stereoscopic viewing is achieved according to the viewer's position. Although only the right-eye image is generated here, the left-eye image may also be generated by displacing the pixels of the original image.
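The dependence of the pixel displacement on the viewer's depth can be sketched as below. The source states only that the displacement is small for a small depth and large for a large depth; the linear mapping, its limits, and the uniform shift of a whole row are simplifying assumptions, and the function names are hypothetical.

```python
def displacement_for_viewer(depth, near, far, min_shift, max_shift):
    """Pixel displacement that grows linearly with the viewer's depth and is
    clamped to the range [min_shift, max_shift]."""
    ratio = (depth - near) / (far - near)
    ratio = min(1.0, max(0.0, ratio))
    return round(min_shift + ratio * (max_shift - min_shift))


def shift_row(row, shift, fill=0):
    """Shift one row of pixels right by shift columns (left if negative),
    padding the vacated columns with fill."""
    shift = max(-len(row), min(len(row), shift))
    if shift == 0:
        return list(row)
    if shift > 0:
        return [fill] * shift + list(row[:-shift])
    return list(row[-shift:]) + [fill] * (-shift)


left_row = [1, 2, 3, 4, 5, 6, 7, 8]
for viewer_depth in (0.5, 1.25, 2.0):        # hypothetical distances
    shift = displacement_for_viewer(viewer_depth, 0.5, 2.0, 1, 3)
    print(viewer_depth, shift, shift_row(left_row, shift))
```

A fuller right-eye image generator would presumably vary the displacement from pixel to pixel according to the scene; the uniform shift here is meant only to show how the viewer's depth scales the parallax.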

【０１４２】[0142]

[Effects of the Invention] When the three-dimensional position recognition utilization system of the present invention is used as a security system, the scanning mechanism required by security systems that observe reflected infrared rays or ultrasonic waves is unnecessary, and, unlike sensors that detect infrared radiation, the system can detect objects that emit no heat.

[0143] When the three-dimensional position recognition utilization system of the present invention is used for computer vision preprocessing, the time required for segment matching can be shortened and false correspondences can be reduced.

[0144] When the three-dimensional position recognition utilization system of the present invention is used for a computer user interface, input devices such as a keyboard are unnecessary. There is also no need to hold facial expressions, hand shapes, or the like as templates or models. Furthermore, the processing time is shorter than for recognition of facial expressions and the like.

[0145] When the three-dimensional position recognition utilization system of the present invention is used for display control of pseudo three-dimensional images, the hardware configuration for determining the viewer's position is simplified.

[Brief Description of the Drawings]

[FIG. 1] A diagram showing the main steps for generating a three-dimensional display image according to the first embodiment.

[FIG. 2] A flowchart for detecting the correspondence between video frames.

[FIG. 3] A diagram showing how representative points are set in a reference frame t.

[FIG. 4] A diagram showing block matching.

[FIG. 5] A schematic diagram showing the value of E1 in the vertical direction for each provisional corresponding point Pt'(i, j).

[FIG. 6] A diagram showing the relationship between the corresponding points obtained as a result of step S12 and the representative points.

[FIG. 7] A diagram explaining the principle of evaluating the relative positions of corresponding points.

[FIG. 8] A diagram showing the result of applying the improvement processing of this step to the corresponding point candidates of FIG. 6.

[FIG. 9] A diagram showing the correspondence between the movement of a point P on the screen and its movement in three-dimensional space.

[FIG. 10] A diagram explaining the principle of deriving the three-dimensional coordinates of a point P from the three-dimensional movement of the camera and the movement of P on the screen.

[FIG. 11] A diagram showing a state in which a numerical value has been assigned to each representative point in frame t.

[FIG. 12] A diagram showing the main steps of the second embodiment.

[FIG. 13] A diagram showing the feature-point selection criteria introduced in the second embodiment.

[FIG. 14] A diagram showing a method of performing segment matching on edge segments.

[FIG. 15] A diagram explaining the principle of narrowing the search area according to the third embodiment.

[FIG. 16] A schematic diagram showing the configuration of the system of the fourth embodiment and how it is used.

[FIG. 17] A configuration diagram of the system of the fifth embodiment.

[Explanation of Symbols]

120, 130: edge to be searched for; 122 to 126: edges; 132: region; 140: user; 142: PC; 144: camera; 146: fingertip; 150: input terminal; 152: camera; 154: buffer memory; 156: right-eye image generation unit; 158: left display panel; 160: right display panel; 162: depth extraction unit.

Continuation of the front page: (72) Inventor: Tsutomu Arakawa, c/o Sanyo Electric Co., Ltd., 2-5-5 Keihan-Hondori, Moriguchi-shi, Osaka

Claims

【特許請求の範囲】[Claims]

【請求項１】物体の三次元位置を認識して利用するシ
ステムであって、撮影された物体の奥行きを抽出する抽
出手段を備え、抽出された奥行きをもとに警備を行うこ
とを特徴とするシステム。1. A system for recognizing and using a three-dimensional position of an object, comprising: extracting means for extracting the depth of a photographed object, wherein security is performed based on the extracted depth. System to do.

【請求項２】前記システムは、物体が近づいたときに
所定の処理を行う請求項１に記載のシステム。2. The system according to claim 1, wherein the system performs a predetermined process when an object approaches.

【請求項３】物体の三次元位置を認識して利用するシ
ステムであって、撮影された物体の奥行きを抽出する抽
出手段を備え、抽出された奥行きをもとにコンピュータ
ビジョン技術におけるセグメント・マッチングのための
前処理を行うことを特徴とするシステム。3. A system for recognizing and using a three-dimensional position of an object, comprising extraction means for extracting the depth of a photographed object, wherein preprocessing for segment matching in computer vision technology is performed based on the extracted depth.

【請求項４】前記前処理は、物体の奥行きを利用して
セグメント・マッチングにおける探索エリアを狭める処
理である請求項３に記載のシステム。4. The system according to claim 3, wherein the preprocessing is processing for narrowing a search area in segment matching using a depth of an object.
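
One way to picture how depth can narrow a matching search area is the following sketch. It is an illustration only and not necessarily the principle used in the embodiment. It assumes a rectified stereo pair under the standard pinhole relation, disparity = focal length × baseline / depth. The function name `disparity_search_range` and all parameter names are hypothetical.

```python
import math


def disparity_search_range(z_min, z_max, focal_px, baseline):
    """Map a coarse depth interval to a disparity interval in pixels.

    Assumes rectified stereo: disparity = focal_px * baseline / depth.
    z_min, z_max: bounds on the depth estimate (same unit as baseline).
    Returns (d_low, d_high); only these disparities need to be searched.
    """
    if not (0 < z_min <= z_max):
        raise ValueError("require 0 < z_min <= z_max")
    fb = focal_px * baseline
    # A larger depth gives a smaller disparity, so the bounds swap.
    d_low = math.floor(fb / z_max)
    d_high = math.ceil(fb / z_min)
    return d_low, d_high


if __name__ == "__main__":
    # A depth known to lie between 2 and 4 units limits the search
    # to disparities from 100 to 200 pixels for these parameters.
    print(disparity_search_range(2.0, 4.0, focal_px=800, baseline=0.5))
```

The tighter the prior depth interval, the shorter the disparity interval that a matcher has to scan.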

【請求項５】物体の三次元位置を認識して利用するシ
ステムであって、撮影された物体の奥行きを抽出する抽
出手段を備え、抽出された奥行きをもとにユーザからの
指示を認識してこれを受け付けることを特徴とするシス
テム。5. A system for recognizing and using a three-dimensional position of an object, comprising extraction means for extracting the depth of a photographed object, wherein an instruction from a user is recognized and accepted based on the extracted depth.

【請求項６】撮影された物体のうち最も奥行きの小さ
な箇所の動きをユーザの指示と判断してこれを受け付け
る請求項５に記載のシステム。6. The system according to claim 5, wherein a movement of a portion having the smallest depth among the photographed objects is determined as a user's instruction and accepted.
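
The selection described in claim 6 can be sketched as follows, purely as an illustration. The helper names `nearest_point` and `pointer_motion`, the list-of-rows depth map, and the use of `None` for missing depth are assumptions, not details from the source.

```python
def nearest_point(depth_map):
    """Return (row, col) of the smallest valid depth, or None if there is none."""
    best = None
    for r, row in enumerate(depth_map):
        for c, d in enumerate(row):
            if d is None:
                continue  # no depth extracted for this pixel (assumption)
            if best is None or d < best[0]:
                best = (d, r, c)
    if best is None:
        return None
    return best[1], best[2]


def pointer_motion(prev_map, curr_map):
    """Return the (d_row, d_col) displacement of the nearest point between frames."""
    p = nearest_point(prev_map)
    q = nearest_point(curr_map)
    if p is None or q is None:
        return None
    return q[0] - p[0], q[1] - p[1]


if __name__ == "__main__":
    prev_frame = [[2.0, 2.0, 2.0], [2.0, 0.5, 2.0]]
    curr_frame = [[2.0, 2.0, 0.4], [2.0, 2.0, 2.0]]
    # The closest point (for example a fingertip) moved up one row and right one column.
    print(pointer_motion(prev_frame, curr_frame))
```

A practical system would likely smooth the depth map first, since a single noisy pixel could otherwise be mistaken for the nearest point.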

【請求項７】物体の三次元位置を認識して利用するシ
ステムであって、撮影された物体の奥行きを抽出する抽
出手段を備え、抽出された奥行きをもとに疑似立体画像
の表示制御を行うことを特徴とするシステム。7. A system for recognizing and using a three-dimensional position of an object, comprising extraction means for extracting the depth of a photographed object, wherein display control of a pseudo three-dimensional image is performed based on the extracted depth.

【請求項８】撮影された物体の位置に疑似立体画像の
観視者が存在すると判断し、この位置にて疑似立体画像
が良好に表示されるよう表示位置制御を行う請求項７に
記載のシステム。8. The system according to claim 7, wherein a viewer of the pseudo three-dimensional image is judged to be present at the position of the photographed object, and display position control is performed so that the pseudo three-dimensional image is displayed well at this position.