JP4136652B2 - Image processing apparatus, image processing method, image processing program, recording medium, and electronic apparatus

Info

Publication number: JP4136652B2
Application number: JP2002381504A
Authority: JP
Inventors: 雄史長谷川; 貴史北口; 憲彦村田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2002-12-27
Filing date: 2002-12-27
Publication date: 2008-08-20
Anticipated expiration: 2022-12-27
Also published as: JP2004213278A

Description

【０００１】
【発明の属する技術分野】
本発明は、画像処理装置、画像処理方法、画像処理プログラム、記録媒体及び電子機器に関する。
【０００２】
【従来の技術】
現在、デジタルカメラの普及は急速に進んでおり、デジタルカメラを利用したアプリケーションの開発も盛んである。この例としては「画像のパノラマ合成」「画像の歪補正」「三次元形状の算出」「ロボットの自律移動」「車両の自動運転」等が挙げられる。この中でも「三次元形状の算出」は、電子商取引のために商品の三次元情報を提供したいというようなニーズが存在することから、その進展が期待されている。
【０００３】
【特許文献１】
特開平１１−１３６５７５号公報
【０００４】
【特許文献２】
特開平１１−３０６３６３号公報
【０００５】
【特許文献３】
特開２０００−１１５６３９号公報
【０００６】
【特許文献４】
特開２０００−１３４５３７号公報
【０００７】
【特許文献５】
特開２０００−２２８７４８号公報
【０００８】
【特許文献６】
特開２００１−９２９４４号公報
【０００９】
【特許文献７】
特開２００１−１４８０２５号公報
【００１０】
【特許文献８】
特開２００１−３２５０６９号公報
【００１１】
【非特許文献１】
Ｑ．−Ｔ．Ｌｕｏｎｇ，Ｏ．Ｄ．Ｆａｕｇｅｒａｓ「ＤｅｔｅｒｍｉｎｉｎｇｔｈｅＦｕｎｄａｍｅｎｔａｌＭａｔｒｉｘｗｉｔｈＰｌａｎｅｓ：ＩｎｓｔａｂｉｌｉｔｙａｎｄＮｅｗＡｌｇｏｒｉｔｈｍｓ」Ｐｒｏｃ．ｏｆＣｏｍｐｕｔｅｒＶｉｓｉｏｎａｎｄＰａｔｔｅｒｎＲｅｃｏｇｎｉｔｉｏｎ，ｐ４８９−４９４，１９９３
【００１２】
【非特許文献２】
芋生周作，浪江宏宗，安田明生著「ＲＴＫ−ＧＰＳによる船体の三次元姿勢測定：３−ＤｉｍｅｎｓｉｏｎａｌＡｔｉｔｕｄｅＭｅａｓｕｒｅｍｅｎｔｏｆａＳｈｉｐｂｙＲＴＫ−ＧＰＳＰｏｓｉｔｉｏｎｉｎｇ」東京商船大学
【００１３】
【非特許文献３】
浪江宏宗著「ＤＧＰＳ及びＲＴＫ−ＧＰＳの実用化に関する研究」東京商船大学
【００１４】
【非特許文献４】
徐剛，辻三郎著「三次元ビジョン」共立出版，第３章
【００１５】
【非特許文献５】
岡谷貴之，出口光一郎著「３次元向きセンサを取り付けたカメラを用いた投票によるカメラの並進運動の推定」ヒューマンインタフェースコンピュータビジョンとイメージメディア，２００１／９／１３
【００１６】
【非特許文献６】
ＲｉｃｈａｒｄＩ．Ｈａｒｔｌｅｙ，ＣｏｍｐｕｔｅｒＶｉｓｉｏｎＡｎｄＩｍａｇｅＵｎｄｅｒｓｔａｎｄｉｎｇ，ｖｏｌ．６８，Ｎｏ．２，Ｎｏｖｅｍｂｅｒ，ｐｐ１４６−１５７，１９９７
【００１７】
【非特許文献７】
森尻智昭，小野寺康浩，金谷健一著「２画像からの平面の３次元運動の計算」情報処理学会研究報告，９４−ＣＶ−９０，１９９４
【００１８】
【非特許文献８】
徐剛，辻三郎著「三次元ビジョン」共立出版，第６章
【００１９】
【非特許文献９】
金谷健一著「画像理解」司巧社，第４章
【００２０】
【発明が解決しようとする課題】
デジタルカメラ等の撮像装置により撮像された画像を使用して被写体の三次元形状を算出するためには、一般的に「ＳｔｒｕｃｔｕｒｅｆｒｏｍＭｏｔｉｏｎ」という手法が使用される。しかし、この手法では、被写体形状が平面形状に近いと遠近差が小さいため算出精度が低下する（非特許文献１）という。そのため、この手法では、被写体形状が紙や平板のような平面形状である場合には、被写体の三次元形状を精度よく算出することができないという問題がある。
【００２１】
被写体形状が平面形状である場合には、特許文献７に記載されているように、射影変換行列を算出して被写体の三次元形状を算出することができる。しかし、この手法では逆に、被写体形状が立体形状である場合には、被写体の三次元形状を精度よく算出することができないという問題がある。
【００２２】
したがって、本発明は、被写体形状が平面形状である場合にも立体形状である場合にも、被写体の三次元形状を精度よく算出することを課題とする。
【００２３】
【課題を解決するための手段】
本発明の画像処理装置は、撮像装置により撮像された画像について画像処理を実行して三次元形状を算出する画像処理装置であって、2 箇所の視点から撮像された画像上の４点以上の対応点の位置情報から射影変換行列を算出する変換行列算出手段と、前記射影変換行列算出手段により算出された射影変換行列により前記画像の被写体の形状判定に用いる値を抽出する奥行抽出手段と、前記形状判定に用いる値から前記画像の被写体が平面形状であるか立体形状であるかを判定する形状判定手段を備え、前記奥行抽出手段は、互いに異なる組合せの対応点から算出された２つの射影変換行列の差をもって前記形状判定に用いる値とし、平面形状である場合は、前記画像について平面形状用の画像処理を実行して三次元形状を算出して、立体形状である場合は、前記画像について立体形状用の画像処理を実行して三次元形状を算出することを特徴とする。
【００３４】
本発明の画像処理装置又は画像形成方法は、前記画像の被写体が平面形状であるか立体形状であるかを判定して、平面形状である場合は、前記画像について平面形状用の画像処理を実行して三次元形状を算出して、立体形状である場合は、前記画像について立体形状用の画像処理を実行して三次元形状を算出するため、被写体形状が平面形状である場合にも立体形状である場合にも、被写体形状の判定のおかげで、被写体の三次元形状を精度よく算出することができる。
【００３５】
本発明の他の局面における発明の画像処理装置又は画像形成方法は、前記画像の被写体の形状判定に用いる値を抽出して、前記形状判定に用いる値から前記画像の被写体が平面形状であるか立体形状であるかを判定して、平面形状である場合は、前記画像について平面形状用の画像処理を実行して三次元形状を算出して、立体形状である場合は、立体形状用の画像処理を実行して三次元形状を算出するため、被写体形状が平面形状である場合にも立体形状である場合にも、形状判定に用いる値の抽出を介した被写体形状の判定のおかげで、被写体の三次元形状を精度よく算出することができる。
【００３６】
本発明の他の局面における発明の画像処理装置又は画像形成方法は、前記画像の被写体の三次元形状を概算して、前記三次元形状から前記画像の被写体の形状判定に用いる値を抽出して、前記形状判定に用いる値から前記画像の被写体が平面形状であるか立体形状であるかを判定して、平面形状である場合は、前記画像について平面形状用の画像処理を実行して三次元形状を算出して、立体形状である場合は、前記画像について立体形状用の画像処理を実行して三次元形状を算出するため、被写体形状が平面形状である場合にも立体形状である場合にも、被写体の三次元形状の概算を介した被写体の形状判定に用いる値の抽出を介した被写体形状の判定のおかげで、被写体の三次元形状を精度よく算出することができる。
【００３７】
【発明の実施の形態】
本発明の実施の形態について説明する。
【００３８】
図１は、撮像装置１０１と画像処理装置１０２を表す。撮像装置１０１は、画像を撮像する装置であり、画像処理装置１０２は、その画像について画像処理を実行して三次元形状を算出する装置である。撮像装置１０１は、ここではデジタルカメラであり、画像処理装置１０２は、ここではパソコンであるとする。両者の間での画像の授受は、両者をコードで接続するなどして実行する。
【００３９】
図２は、撮像装置１０１のハードウェア構成を表す。被写体の画像は、固定レンズ２０１・ズームレンズ２０２・絞り機構２０３・シャッタ２０４・フォーカスレンズ２０５を通して、撮像素子２０６に形成される。撮像素子２０６からの画像信号は、ＣＤＳ回路２０７（ＣｏｒｒｅｌａｔｅｄＤｏｕｂｌｅＳａｍｐｌｉｎｇ）でサンプリングされて、Ａ／Ｄ変換器２０８でデジタル信号化される。この際のタイミングは、ＴＧ２１１（ＴｉｍｉｎｇＧｅｎｅｒａｔｏｒ）により生成される。撮像素子２０６からの画像信号はその後、ＩＰＰ２０９（ＩｍａｇｅＰｒｅ−Ｐｒｏｃｅｓｓｏｒ）でアパーチャ補正や圧縮等の画像処理が実行されて、画像バッファメモリ２１０で一時的に保存される。各ユニットの動作は、ＭＰＵ２１２により制御される。画像バッファメモリ２１０で保存された画像信号は、最終的に外部通信機器２２１により外部機器に転送される。
【００４０】
図３は、画像処理装置１０２のハードウェア構成を表す。ＣＰＵ３０１は、各種の処理や制御を実行する。ＳＤＲＡＭ３０２は、ＣＰＵ３０１の作業領域として利用されると共に、各種の処理プログラムや制御プログラムの固定情報の記録領域として利用される。外部Ｉ／Ｆ３２１は、インターネット等の電気通信回線や外部機器と接続するためのインタフェースであり、表示Ｉ／Ｆ３２２は、ディスプレイ等の表示装置と接続するためのインタフェースであり、入力Ｉ／Ｆ３２３は、キーボードやマウス等の入力装置と接続するためのインタフェースである。撮像装置１０１からの画像信号や、各種の処理プログラムや制御プログラムは、ＣＤ−ＲＷドライブ等の記録装置３１１や外部Ｉ／Ｆ３２１を介してＳＤＲＡＭ３０２にロードされる。この際、ＨＤＤ３１２に一旦セーブしてＳＤＲＡＭ３０２に適宜ロードしてもよい。
【００４１】
図４は、画像処理装置１０２の機能ブロックを表す。画像処理装置１０２は、形状判定部４０１と、第１処理部４０２と、第２処理部４０３と、記録部４０４を備える。これらの機能ブロックにより画像処理装置１０２は、撮像装置１０１により撮像された画像について画像処理を実行して三次元形状を算出する。
【００４２】
形状判定部４０１は、この画像の被写体が平面形状であるか立体形状であるかを自動的に判定する。例えば、この画像の被写体の奥行を抽出して、この奥行からこの画像の被写体が平面形状であるか立体形状であるかを自動的に判定する。これについては第１実施例として後述する。例えば、この画像の被写体の三次元形状を概算して、この三次元形状からこの画像の被写体の奥行を抽出して、この奥行からこの画像の被写体が平面形状であるか立体形状であるかを自動的に判定する。これについては第２実施例として後述する。
【００４３】
第１処理部４０２は、形状判定部４０１によりこの画像の被写体が平面形状であると判定された場合に、この画像について平面形状用の画像処理を実行して三次元形状を算出する。この詳細については後述することにする。第２処理部４０３は、形状判定部４０１によりこの画像の被写体が立体形状であると判定された場合に、この画像について立体形状用の画像処理を実行して三次元形状を算出する。この詳細については後述することにする。記録部４０４は、画像や判定結果や算出結果等を記録する。
【００４４】
このように、画像処理装置１０２は、上記の画像の被写体が平面形状であるか立体形状であるかを自動的に判定して、平面形状である場合は、上記の画像について平面形状用の画像処理を実行して三次元形状を算出して、立体形状である場合は、上記の画像について立体形状用の画像処理を実行して三次元形状を算出するため、被写体形状が平面形状である場合にも立体形状である場合にも、被写体形状の判定のおかげで、被写体の三次元形状を精度よく算出することができる。
【００４５】
（１）第１実施例
図５は、第１実施例の画像処理装置１０２の機能ブロックを表す。第１実施例の画像処理装置１０２は、図４の形状判定部４０１を奥行抽出部５０１と形状判定部５０２により構成したものである。これらの機能ブロックにより第１実施例の画像処理装置１０２は、撮像装置１０１により撮像された画像について画像処理を実行して三次元形状を算出する。
【００４６】
奥行抽出部５０１は、この画像の被写体の奥行を自動的に抽出して、形状判定部５０２は、この奥行からこの画像の被写体が平面形状であるか立体形状であるかを自動的に判定する。ここでは、２箇所以上の視点から撮像された画像の射影変換により「被写体奥行」を抽出して、抽出された被写体奥行の閾値処理により「被写体形状」を判定するものとする。
【００４７】
なお、この「被写体奥行」とは、図６のように、被写体を正面方向（被写体の概形を直方体で置き換えたときの最大面）から見たときの被写体の厚みのことを意味する。被写体奥行が小さいことは被写体形状が平面形状であることを、被写体奥行が大きいことは被写体形状が立体形状であることを意味する。
【００４８】
さて、射影変換行列は、２箇所の視点から撮像された画像上の４点以上の対応点の位置情報から算出することができる。ただし、被写体形状が平面形状であることが射影変換を行う上での前提となるため、被写体形状が立体形状である場合には射影変換を行うことができず、射影変換行列を精度よく算出することができない。別の見方をすると、互いに異なる組み合わせの対応点から射影変換行列Ｂ１と射影変換行列Ｂ２を算出した場合、被写体形状が平面形状から離れるほどＢ１とＢ２の差が広がることになる。すなわち、Ｂ１とＢ２の差をもって被写体奥行とすることができる。よってここでは、奥行抽出部５０１は、２箇所以上の視点から撮像された画像の射影変換により被写体奥行（Ｂ１とＢ２の差）を抽出して、形状判定部５０２は、抽出された被写体奥行（Ｂ１とＢ２の差）の閾値処理により被写体形状を判定するものとする。
【００４９】
なお、この「２箇所以上の視点から撮像された画像」は、同一の撮像装置１０１により撮像された画像同士でも、別個の撮像装置１０１により撮像された画像同士でもよい。
【００５０】
図７は、第１実施例の画像処理装置１０２に係るフローチャートである。
【００５１】
Ｓ７０１では、２箇所以上の視点から撮像された画像をメモリに記録する。
【００５２】
Ｓ７０２では、１枚目の画像から特徴点を検出する。具体的には、▲１▼各画素を中心位置にして特徴点検出ブロックを作成し、各特徴点検出ブロック内の輝度値分布（又はＲＧＢ値分布）を検出して、▲２▼画像を領域分割し、各領域内で輝度値分布の差が顕著である特徴点検出ブロックを検出して、▲３▼この特徴点検出ブロックの中心位置（特徴点）の位置情報と輝度値分布をメモリに記録する。
【００５３】
Ｓ７０３では、２枚目の画像から特徴点の対応点を検出する。具体的には、▲１▼各画素を中心位置にして対応点検出ブロックを作成し、各対応点検出ブロック内の輝度値分布を検出して、▲２▼この輝度値分布と各特徴点の輝度値分布（Ｓ７０２でメモリに記録した輝度値分布）とのマッチング行い、▲３▼一致度が最良である対応点検出ブロックの中心位置（対応点）の位置情報をメモリに記録する。一致度が閾値より良い対応点検出ブロックが存在しない場合には、その特徴点をメモリから消去する。
【００５４】
Ｓ７０４では、各対応点の正否を判断する。具体的には、▲１▼各特徴点から縦横５ピクセルの画素を中心位置にして特徴点検出ブロックを作成して、▲２▼各対応点から縦横５ピクセルの画素を中心位置にして対応点検出ブロックを作成して、▲３▼この特徴点検出ブロックとこの対応点検出ブロックとのマッチングを行い、▲４▼一致度が閾値より悪い場合には、間違えて検出されたものと判断し、その対応点をメモリから消去する。一致度に応じて対応点の順位付けをして、メモリに記録しておいてもよい。
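The matching in S703 and S704 can be illustrated with a minimal sketch. It assumes grayscale images held as NumPy arrays and uses normalized cross-correlation as the degree of match; the function names, the block half-width `half`, and the threshold `min_score` are hypothetical choices for illustration, not values taken from this document.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized blocks (1.0 = identical up to gain/offset)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_block(img1, img2, point, half=3, min_score=0.8):
    """Find the pixel in img2 whose surrounding block best matches the block around `point` in img1.

    `point` is (row, col) and must lie at least `half` pixels from the border of img1.
    Returns (row, col) in img2, or None when no block reaches `min_score`
    (the case in which the feature point would be discarded).
    """
    r, c = point
    ref = img1[r - half:r + half + 1, c - half:c + half + 1]
    best, best_pos = -1.0, None
    for i in range(half, img2.shape[0] - half):
        for j in range(half, img2.shape[1] - half):
            score = ncc(ref, img2[i - half:i + half + 1, j - half:j + half + 1])
            if score > best:
                best, best_pos = score, (i, j)
    return best_pos if best >= min_score else None
```

The exhaustive search is kept for clarity; a practical implementation would restrict the search window.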
【００５５】
Ｓ７０５では、任意に選択した４点以上の対応点の位置情報から射影変換行列Ｂ１を算出する。Ｓ７０６では、任意に選択した４点以上の対応点の位置情報から射影変換行列Ｂ２を算出する。Ｓ７０７では、Ｂ１とＢ２の差の閾値処理を行う。なお、Ｓ７０５で選択する対応点とＳ７０６で選択する対応点を別個のものとするために、Ｓ７０５とＳ７０６との間で、Ｓ７０５で選択された対応点をメモリから消去する。
【００５６】
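If the difference between B1 and B2 is less than or equal to the threshold, image processing for planar shapes is executed on the above images in S708 to calculate the three-dimensional shape. If the difference between B1 and B2 is not less than or equal to the threshold, image processing for three-dimensional shapes is executed on the above images in S709 to calculate the three-dimensional shape. In either case, the calculated three-dimensional shape is recorded in memory in S710.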
【００５７】
（２）第２実施例
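FIG. 8 shows the functional blocks of the image processing apparatus 102 of the second embodiment. In the image processing apparatus 102 of the second embodiment, the shape determination unit 401 of FIG. 4 is composed of a three-dimensional shape approximation unit 801, a depth extraction unit 802, and a shape determination unit 803. With these functional blocks, the image processing apparatus 102 of the second embodiment performs image processing on the images captured by the imaging device 101 to calculate a three-dimensional shape.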
【００５８】
三次元形状概算部８０１は、この画像の被写体の大まかな三次元形状を自動的に概算して、奥行抽出部８０２は、この三次元形状からこの画像の被写体の奥行を自動的に抽出して、形状判定部８０３は、この奥行からこの画像の被写体が平面形状であるか立体形状であるかを自動的に判定する。ここでは、後述する手法により「被写体の三次元形状」を概算して、概算された被写体の三次元形状の点群データにより「被写体奥行」を抽出して、抽出された被写体奥行の閾値処理により「被写体形状」を判定するものとする。
【００５９】
被写体の大まかな三次元形状を概算する手法としては、▲１▼因子分解法を使用して概算する手法、▲２▼対応点からオプティカルフローを算出して概算する手法、▲３▼被写体の輪郭を抽出して概算する手法、▲４▼被写体にパタン光を照射して三角測定により概算する手法、▲５▼被写体からの反射光の強度比を測定して概算する手法、▲６▼焦点距離の違いによる画像ボケを利用して概算する手法等が挙げられる。よってここでは、三次元形状概算部８０１は、いずれかの手法により被写体の三次元形状を概算するものとする。
【００６０】
なお、手法▲４▼や手法▲５▼により被写体の三次元形状を概算する場合には、被写体に光を照射するための光照射装置を撮像装置１０１に設置してもよい。
【００６１】
以下の例のように、概算された被写体の三次元形状を利用すれば、被写体奥行を抽出することができる。まず、概算された被写体の三次元形状から、この三次元形状を構成する点群データを抽出する。次に、５点以上の点群データを平面方程式「ａｘ＋ｂｙ＋ｃｚ＝ｄ」に代入して、４個の定数「ａ，ｂ，ｃ，ｄ」を算出する。５点以上の点群データを代入して４個の定数を算出するため、定数解が一意に決定されず分布を持つ。そこで、最小二乗法を使用して、各点群データと平面方程式との距離の二乗和が最小になるようにして、定数解を決定することにする。ここで、被写体形状が平面形状であれば、点群データは平面をなすのであるから、この最小二乗和は小さい値になるはずであるが、被写体形状が立体形状であれば、点群データは平面をなさないのであるから、この最小二乗和は大きい値になるはずである。すなわち、この最小二乗和をもって被写体奥行とすることができる。よってここでは、奥行抽出部８０２は、概算された被写体の三次元形状の点群データにより被写体奥行（各点群データと平面方程式との距離の二乗和の最小値）を抽出するものとする。
【００６２】
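As in the following example, the subject shape can be determined from the extracted subject depth. The subject shape can be determined by thresholding the extracted subject depth D against a fixed threshold J. Strictly speaking, however, a subject with a small depth may still be three-dimensional if its size is also small, and a subject with a large depth may still be planar if its size is also large. Here, therefore, the shape determination unit 803 determines the subject shape by thresholding the quotient D/S of the extracted subject depth D and the subject size S against a fixed threshold J.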
【００６３】
なお、被写体サイズは、概算された三次元形状を利用して算出するようにしてもよい。例えば、この三次元形状を構成する点群データのうちで最も離れたもの同士の距離を算出して、この距離をもって被写体サイズとする。また、手法▲３▼のように被写体の輪郭を抽出する場合は、被写体サイズは、この輪郭線を利用して算出するようにしてもよい。
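The depth extraction and D/S thresholding described above can be sketched as follows. This is a minimal illustration, assuming the point cloud is an (N, 3) NumPy array; the function names and the default value of the threshold `J` are hypothetical. The least-squares plane is obtained here from the singular value decomposition of the centered points, which minimizes the sum of squared point-to-plane distances.

```python
import numpy as np

def subject_depth(points):
    """D: minimum sum of squared distances from the points to a fitted plane.

    For centered data, the smallest singular value squared equals the residual
    of the best-fit plane through the centroid.
    """
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return float(singular_values[-1] ** 2)

def subject_size(points):
    """S: distance between the two most distant points of the cloud."""
    diffs = points[:, None, :] - points[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())

def is_planar(points, J=1e-3):
    """True when D/S is at or below the threshold J (J is an illustrative value)."""
    return subject_depth(points) / subject_size(points) <= J
```

Because D is a squared length and S is a length, a suitable J depends on the units of the point cloud; it would have to be tuned for the data at hand.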
【００６４】
図９は、第２実施例の画像処理装置１０２に係るフローチャートである。
【００６５】
Ｓ９０１では、撮像された画像をメモリに記録する。
【００６６】
Ｓ９０２では、被写体の三次元形状を概算する。Ｓ９０３では、被写体奥行を抽出する。Ｓ９０４では、被写体形状を判定する。ここでは、上述した手法により被写体の三次元形状を概算して、概算された被写体の三次元形状の点群データにより被写体奥行Ｄ（各点群データと平面方程式との距離の二乗和の最小値）を抽出して、抽出された被写体奥行Ｄと被写体サイズＳとの商Ｄ／Ｓの一定閾値Ｊによる閾値処理により被写体形状を判定する。
【００６７】
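If D/S is less than or equal to J (the subject is planar), image processing for planar shapes is executed on the above image in S905 to calculate the three-dimensional shape. If D/S is not less than or equal to J (the subject is three-dimensional), image processing for three-dimensional shapes is executed on the above image in S906 to calculate the three-dimensional shape. In either case, the calculated three-dimensional shape is recorded in memory in S907.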
【００６８】
（３）第３実施例
図１０は、第３実施例の撮像装置１０１のハードウェア構成を表す。第３実施例の撮像装置１０１は、図２の構成要素に加えて、当該撮像装置１０１の姿勢を算出するための物理量（ここでは重力加速度と地磁気）を検出する検出装置１００１を備える。図２により説明したように、画像バッファメモリ２１０で保存された画像信号は、最終的に外部通信機器２２１により外部機器に転送されるが、ここでは、画像バッファメモリ２１０で保存された画像信号は、その前にＭＰＵ２１２内の記憶部２１５に記憶される。
【００６９】
以上のような「撮像動作」は電源スイッチ２１３がＯＮの状態で撮像スイッチ２１４が押されると開始されるのであるが、これと同時に、以下のような「検出動作」も開始される。すなわち、重力加速度と地磁気（詳細にはこれらのＸ成分とＹ成分とＺ成分）が、三軸加速度センサ１００２と三軸磁気センサ１００３でそれぞれ電圧信号として検出されて、Ａ／Ｄ変換器１００４でデジタル信号化されて、ＭＰＵ２１２内の記憶部２１５に画像信号に添付されて記憶される。重力加速度信号と地磁気信号が添付された画像信号は、最終的に外部通信機器２２１により外部機器に転送される。
【００７０】
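A three-axis angular velocity sensor may be used in place of the three-axis magnetic sensor 1003, or together with it. Alternatively, a level sensor may be used in place of the three-axis acceleration sensor 1002 and the three-axis magnetic sensor 1003. Any sensor may be used in the detection device 1001 as long as it can detect a physical quantity for calculating the attitude of the imaging device 101 (here, gravitational acceleration and geomagnetism). For example, when the imaging device 101 is very large, a GPS sensor could be used (for research examples, see Non-Patent Documents 2 and 3).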
【００７１】
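FIG. 11 shows the functional blocks of the image processing apparatus 102 of the third embodiment. In the image processing apparatus 102 of the third embodiment, the three-dimensional shape approximation unit 801 of FIG. 8 is composed of a corresponding point detection unit 1101, an attitude calculation unit 1102, a translation component calculation unit 1103, and a three-dimensional shape approximation unit 1104. With these functional blocks, the image processing apparatus 102 of the third embodiment performs image processing on the images captured by the imaging device 101 to calculate a three-dimensional shape.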
【００７２】
対応点検出部１１０１は、撮像装置１０１により撮像された画像から対応点を検出する。画像から対応点を検出する手法としては、ブロックマッチング（理論的詳細は非特許文献４）が挙げられる。ブロックマッチングでは、２箇所以上の視点から撮像された画像を使用して、輝度値分布やＲＧＢ値分布により、１枚目の画像から特徴点を検出して、輝度値分布やＲＧＢ値分布のマッチング（相互相関の最大化）により、２枚目の画像から特徴点の対応点を検出する。
【００７３】
姿勢算出部１１０２は、検出装置１００１により検出された物理量（ここでは重力加速度と地磁気）から撮像装置１０１の姿勢を算出する。撮像装置１０１の姿勢は、ワールド座標系（図１２のＸＹＺ座標系：地面に固定された座標系）に対する撮像装置座標系（図１３のｘｙｚ座標系：撮像装置１０１に固定された座標系）の向きにより表現される。ワールド座標系に対する撮像装置座標系の向きは、式（１）のような、ワールド座標系を回転して撮像装置座標系にするための回転行列Ｒにより表現される。Ｒは、▲１▼Ｚ軸回りに角度γだけ回転して、▲２▼Ｘ軸回りに角度αだけ回転して、▲３▼Ｙ軸回りに角度βだけ回転することを意味する。
【００７４】
【数１】

さて、ここでは計算の便宜上、地磁気の伏角は無視することにする。これにより、ワールド座標系における重力加速度ｇと地磁気ｍは、式（２）のように表現できる。また、撮像装置座標系における重力加速度Ｇと地磁気Ｍは、式（３）のように表現される。
【００７５】
【数２】

【００７６】
【数３】

ここで、ｇとＧとの間には式（４）のような関係式が成立して、ｍとＭとの間には式（５）のような関係式が成立する。
【００７７】
【数４】

【００７８】
【数５】

式（４）からは、角度αが式（６）のように導出されて、角度γが式（７）のように導出される。式（５）からは、角度βが式（８）のように導出される。
【００７９】
【数６】

【００８０】
【数７】

【００８１】
【数８】

ここで、重力加速度以外の加速度（慣性力加速度）や地磁気以外の磁気は無視できるとすると、三軸加速度センサ１００２で検出される加速度は重力加速度Ｇとなり、三軸磁気センサ１００３で検出される磁気は地磁気Ｍとなる。よって、姿勢算出部１１０２は、検出装置１００１により検出された重力加速度Ｇと地磁気Ｍを、式（６）と式（７）と式（８）に代入して、角度αと角度βと角度γを算出する。こうして、撮像装置１０１の姿勢が算出されたことになる。
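Equations (1) to (8) are not reproduced in this text, so the following sketch does not implement them. Instead it shows a commonly used alternative for the same task: building a rotation matrix directly from the measured gravity vector G and magnetic vector M by orthonormalization. The world frame (X north, Y east, Z down), the sign convention that G points toward the ground, and the function name are all assumptions made for this illustration.

```python
import numpy as np

def attitude_from_sensors(G, M):
    """Rotation matrix taking camera-frame vectors to an assumed north-east-down world frame.

    G: gravity measured in the camera frame (assumed to point toward the ground).
    M: magnetic field measured in the camera frame (must not be parallel to G).
    """
    G = np.asarray(G, dtype=float)
    M = np.asarray(M, dtype=float)
    down = G / np.linalg.norm(G)
    east = np.cross(down, M)          # horizontal, perpendicular to magnetic north
    east = east / np.linalg.norm(east)
    north = np.cross(east, down)      # completes the right-handed triad
    return np.vstack([north, east, down])
```

Because the east axis is taken from a cross product with gravity, the vertical component of M (the dip angle) drops out, which is in line with the simplification of ignoring the dip angle stated above.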
【００８２】
並進成分算出部１１０３は、上記の対応点と上記の姿勢から撮像装置１０１の並進成分を算出する。ここで、この「並進成分」とは、図１４のように、対応点検出の際に使用された画像に関して、１枚目の画像を撮像した際の光学中心から２枚目の画像を撮像した際の光学中心までのベクトル成分ｂのことを意味する。
【００８３】
さて、図１４中のオブジェクトに関して対応点検出が実行されたとする場合、１枚目の画像と２枚目の画像を撮像した際の光学中心からオブジェクトまでのベクトル成分をそれぞれＰとＰ’とすると、ＰとＰ’とｂとは同一平面上にあるため、ＰとＰ’とｂとのスカラ３重積（平行六面体の体積にあたる）は０になる。正確には、ＰとＰ’との座標系が異なるため、１枚目の画像を撮像した際から２枚目の画像を撮像した際までの撮像装置１０１の姿勢の変化ＲをＰ’に掛けて、ＰとＲＰ’とｂとのスカラ３重積「Ｐ・ＲＰ’×ｂ」が０になる。よって、並進成分算出部１１０３は、上記の対応点と上記の姿勢を、このスカラ３重積の式に代入して、撮像装置１０１の並進成分を算出する。
【００８４】
なお、必要となるのは並進成分の向きであるから、算出すべき変数は最低２個であり、必要となる対応点は最低２点である。そのため、３点以上の対応点検出を実行した場合には、相互相関の高い対応点のみを使用してもいいし、最小二乗法を使用してもいいし、多数の並進成分を導出して並進成分群を投票空間に投影して最適な並進成分を算出（研究例は非特許文献５）してもいい。
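The least-squares option mentioned above can be sketched as follows. Since P · (RP' × b) = b · (P × RP'), each correspondence yields a vector P × RP' that is perpendicular to b, and the direction of b is the vector most nearly perpendicular to all of them. The sketch assumes P and P' are given as (N, 3) arrays of ray vectors and that R maps second-view coordinates into first-view coordinates; the function name is hypothetical.

```python
import numpy as np

def translation_direction(P, P2, R):
    """Unit direction of the translation b (sign undetermined) from N >= 2 correspondences.

    P:  (N, 3) rays from the first optical center, in first-view coordinates.
    P2: (N, 3) rays from the second optical center, in second-view coordinates.
    R:  (3, 3) rotation taking second-view vectors into first-view coordinates.
    """
    rotated = P2 @ R.T                 # each row is R @ P'_i
    normals = np.cross(P, rotated)     # each row is perpendicular to b
    _, _, vt = np.linalg.svd(normals)
    return vt[-1]                      # right singular vector of the smallest singular value
```

Only the direction is recovered; the scale and sign of b have to come from elsewhere, for example from requiring triangulated points to lie in front of both views.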
【００８５】
三次元形状概算部１１０４は、上記の対応点と上記の姿勢と上記の並進成分から、撮像装置１０１により撮像された画像の被写体の三次元形状を概算する。対応点と姿勢と並進成分から被写体の三次元形状を概算する手法としては、三角測定原理により概算する手法が代表例として挙げられる。加えて、三角測定の精度向上のための対策（研究例は非特許文献６）を実行してもいい。
【００８６】
このように、第３実施例の画像処理装置１０２は、検出装置１００１により検出された物理量（ここでは重力加速度と地磁気）から算出される撮像装置１０１の姿勢に基づいて、上記の画像の被写体の三次元形状を概算する。
【００８７】
図１５は、第３実施例の画像処理装置１０２に係るフローチャートである。
【００８８】
Ｓ１５０１では、２箇所以上の視点から撮像された画像と共に、これらの画像を撮像した際に検出された重力加速度と地磁気をメモリに記録する。
【００８９】
Ｓ１５０２では、１枚目の画像から特徴点を検出する。具体的には、▲１▼各画素を中心位置にして特徴点検出ブロックを作成し、各特徴点検出ブロック内の輝度値分布（又はＲＧＢ値分布）を検出して、▲２▼画像を領域分割し、各領域内で輝度値分布の差が顕著である特徴点検出ブロックを検出して、▲３▼この特徴点検出ブロックの中心位置（特徴点）の位置情報と輝度値分布をメモリに記録する。
【００９０】
Ｓ１５０３では、２枚目の画像から特徴点の対応点を検出する。具体的には、▲１▼各画素を中心位置にして対応点検出ブロックを作成し、各対応点検出ブロック内の輝度値分布を検出して、▲２▼この輝度値分布と各特徴点の輝度値分布（Ｓ１５０２でメモリに記録した輝度値分布）とのマッチング行い、▲３▼一致度が最良である対応点検出ブロックの中心位置（対応点）の位置情報をメモリに記録する。一致度が閾値より良い対応点検出ブロックが存在しない場合には、その特徴点をメモリから消去する。
【００９１】
Ｓ１５０４では、各対応点の正否を判断する。具体的には、▲１▼各特徴点から縦横５ピクセルの画素を中心位置にして特徴点検出ブロックを作成して、▲２▼各対応点から縦横５ピクセルの画素を中心位置にして対応点検出ブロックを作成して、▲３▼この特徴点検出ブロックとこの対応点検出ブロックとのマッチングを行い、▲４▼一致度が閾値より悪い場合には、間違えて検出されたものと判断し、その対応点をメモリから消去する。一致度に応じて対応点の順位付けをして、メモリに記録しておいてもよい。
【００９２】
Ｓ１５０５では、１枚目の画像を撮像した際に検出された重力加速度と地磁気から、１枚目の画像を撮像した際の撮像装置１０１の姿勢を算出して、２枚目の画像を撮像した際に検出された重力加速度と地磁気から、２枚目の画像を撮像した際の撮像装置１０１の姿勢を算出する。
【００９３】
Ｓ１５０６では、上記の画像と上記の姿勢から、１枚目の画像を撮像した際から２枚目の画像を撮像した際までの撮像装置１０１の並進成分を算出する。
【００９４】
Ｓ１５０７では、上記の画像と上記の姿勢と上記の並進成分から、撮像装置１０１により撮像された画像の被写体の三次元形状を概算する。Ｓ１５０８では、被写体奥行を抽出する。Ｓ１５０９では、被写体形状を判定する。ここでは、三角測定原理により被写体の三次元形状を概算して、概算された被写体の三次元形状の点群データにより被写体奥行Ｄ（各点群データと平面方程式との距離の二乗和の最小値）を抽出して、抽出された被写体奥行Ｄと被写体サイズＳとの商Ｄ／Ｓの一定閾値Ｊによる閾値処理により被写体形状を判定する。
【００９５】
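If D/S is less than or equal to J (the subject is planar), image processing for planar shapes is executed on the above images in S1510 to calculate the three-dimensional shape. If D/S is not less than or equal to J (the subject is three-dimensional), image processing for three-dimensional shapes is executed on the above images in S1511 to calculate the three-dimensional shape. In either case, the calculated three-dimensional shape is recorded in memory in S1512.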
【００９６】
（第１処理部４０２）
上述のように、第１処理部４０２は、上記の画像の被写体が平面形状であると判定された場合に、上記の画像について平面形状用の画像処理（平面形状の三次元形状を算出するのに適した画像処理）を実行して三次元形状を算出する。この具体例としてここでは、２箇所以上の視点から撮像された画像の射影変換により撮像装置１０１の姿勢と並進成分を算出して、これらに基づいて三角測定原理等により三次元形状を算出する場合について説明する。
【００９７】
被写体形状が平面形状である場合には、２箇所以上の視点から撮像された画像から射影変換行列Ｂを算出することにより、撮像装置１０１の姿勢と並進成分を算出することができる。１枚目の画像の特徴点の位置を（ｕ，ｖ）として、２枚目の画像の対応点の位置を（Ｕ，Ｖ）とすると、射影変換は式（９）のように表現される。式（９）の未知数ｂ１からｂ８を式（１０）のようにまとめたものが射影変換行列Ｂである。
【００９８】
【数９】

【００９９】
【数１０】

この射影変換行列Ｂは、２箇所以上の視点から撮像された画像の４点以上の対応点の位置情報から算出することができる。５点以上の対応点検出を実行した場合には、相互相関の高い対応点のみを使用してもいいし、最小二乗法を使用してもいい。
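Equations (9) and (10) are not reproduced in this text. As a reading aid only, the block below writes out the standard eight-parameter projective transformation, which is one plausible form consistent with the description (eight unknowns b1 to b8, feature point (u, v), corresponding point (U, V)). The assignment of indices to matrix entries is an assumption and may differ from the original figures.

```latex
% Assumed standard form; index placement is illustrative, not confirmed by the source.
U = \frac{b_1 u + b_2 v + b_3}{b_7 u + b_8 v + 1}, \qquad
V = \frac{b_4 u + b_5 v + b_6}{b_7 u + b_8 v + 1}

B = \begin{pmatrix} b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \\ b_7 & b_8 & 1 \end{pmatrix}

% Multiplying out gives two linear equations per correspondence:
b_1 u + b_2 v + b_3 - b_7 u U - b_8 v U = U, \qquad
b_4 u + b_5 v + b_6 - b_7 u V - b_8 v V = V
```

Under this form, four correspondences give eight linear equations in the eight unknowns, which matches the statement that four or more corresponding points are needed; with five or more, the system is overdetermined and can be solved by least squares.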
【０１００】
この射影変換行列Ｂは、撮像装置座標系で表現されているので、式（１１）のような撮像装置内部行列Ａによる変換式（１２）を使用して、撮像装置座標系で表現された射影変換行列Ｂをワールド座標系で表現された射影変換行列Ｈに変換する。ただし、ｚは焦点距離であり、ｕ０とｖ０は画面中心であり、ｋｕとｋｖは画素密度である。
【０１０１】
【数１１】

【０１０２】
【数１２】

この射影変換行列Ｈと姿勢Ｒと並進成分ｂとの間には式（１３）のような関係式（理論的詳細は非特許文献７）が成立する。ただし、ｓはスケール調整ための定数であり、ｎは被写体平面の法線ベクトルである。
【０１０３】
【数１３】

よってここでは、第１処理部４０２は、２箇所以上の視点から撮像された画像から射影変換行列Ｂや射影変換行列Ｈを算出することにより、撮像装置１０１の姿勢Ｒと並進成分ｂを算出して、これらに基づいて三角測定原理等により三次元形状を算出するものとする。
【０１０４】
（第２処理部４０３）
上述のように、第２処理部４０３は、上記の画像の被写体が立体形状であると判定された場合に、上記の画像について立体形状用の画像処理（立体形状の三次元形状を算出するのに適した画像処理）を実行して三次元形状を算出する。この具体例としてここでは、２箇所以上の視点から撮像された画像に関しての８点アルゴリズム（ＳｔｒｕｃｔｕｒｅｆｒｏｍＭｏｔｉｏｎの一種。理論的詳細は非特許文献８や非特許文献９）により撮像装置１０１の姿勢と並進成分を算出して、これらに基づいて三角測定原理等により三次元形状を算出する場合について説明する。
【０１０５】
被写体形状が立体形状である場合には、２箇所以上の視点から撮像された画像に関しての８点アルゴリズムにより、撮像装置１０１の姿勢と並進成分を算出することができる。８点アルゴリズムでは、上述したＰとＰ’とｂとの間の関係式「Ｐ・ＲＰ’×ｂ＝０」を使用して、姿勢Ｒと並進成分ｂを、２箇所以上の視点から撮像された画像の８点以上の対応点の位置情報から算出する。
【０１０６】
具体的には、３次行列Ｅ（＝ｂ×Ｒ）を上記の関係式に代入して、３次行列Ｅの各成分を算出する。よって、算出すべき変数は９個であり、必要となる対応点は９点である。しかし、並進成分に関して、必要となるのは並進成分の向きであるから、算出すべき変数は１個少ない最低８個であり、必要となる対応点は１点少ない最低８点である。そのため、９点以上の対応点検出を実行した場合には、相互相関の高い対応点のみを使用してもいいし、最小二乗法を使用してもいい。
【０１０７】
よってここでは、第２処理部４０３は、２箇所以上の視点から撮像された画像から８点アルゴリズムにより３次行列Ｅを算出することにより、撮像装置１０１の姿勢Ｒと並進成分ｂを算出して、これらに基づいて三角測定原理等により三次元形状を算出するものとする。
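The linear step of the eight-point algorithm can be sketched as follows. With E = [b]× R, the relation P · (RP' × b) = 0 is equivalent to Pᵀ E P' = 0, which is linear in the nine entries of E. The sketch assumes rays given as (N, 3) arrays with N ≥ 8 and the same conventions for R and b as in the text; the function names are hypothetical, and the decomposition of E into R and b is not shown.

```python
import numpy as np

def skew(b):
    """Cross-product matrix [b]x, so that skew(b) @ x == np.cross(b, x)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def essential_from_rays(P, P2):
    """Estimate E (up to sign) from N >= 8 ray pairs satisfying P_i^T E P2_i = 0."""
    # Each row holds the 9 products P_i[a] * P2_i[b], matching E flattened row by row.
    A = np.einsum('ni,nj->nij', P, P2).reshape(len(P), 9)
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Enforce the structure of an essential matrix: two equal singular values and one zero.
    u, _, vt2 = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt2
```

When all points lie on one plane the linear system becomes degenerate, which is the situation the planar-shape processing is meant to handle instead.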
【０１０８】
（変形例）
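As a modification of the first processing unit 402, the unit may (1) re-determine whether the subject of the above images has a planar shape or a three-dimensional shape, (2) if the subject is re-determined to be planar, execute image processing for planar shapes on the above images to calculate the three-dimensional shape, and (3) if the subject is re-determined to be three-dimensional, execute image processing for three-dimensional shapes on the above images to calculate the three-dimensional shape. The functional blocks that execute (1), (2), and (3) are called the shape re-determination unit, the re-first processing unit, and the re-second processing unit, respectively.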
【０１０９】
第３実施例に関しては、再形状判定部は、▲１▼撮像装置１０１により撮像された画像から対応点を再検出して、▲２▼この対応点等から被写体の三次元形状を再概算して、▲３▼この三次元形状から被写体奥行を再抽出して、▲４▼この被写体奥行から被写体形状を再判定するようにしてもよい。このようにする理由としては例えば、図１６のように、１平面からしか対応点を検出できなかったために、実際は立体形状であるのに平面形状であると誤判定される場合があることが挙げられる。
【０１１０】
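Corresponding point re-detection should therefore be performed by a method that finds corresponding points more easily. One such method calculates the epipolar constraint from the attitude and the translation component, projects the epipolar line onto the imaging plane, and performs the re-detection along the epipolar line, which allows the cross-correlation threshold to be lowered. The epipolar constraint is the relation obtained by substituting the attitude R and the translation component b into the relation "P·RP'×b = 0" between P, P', and b described above. The epipolar line is the straight-line equation "aU + bV + c = 0" in the corresponding point (U, V), obtained by substituting the feature point (u, v) and the focal length z into this relation. Performing the re-detection along this epipolar line makes corresponding points easier to detect. Specifically, the cross-correlation threshold can be lowered so that more corresponding points are detected with fewer false matches.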
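The epipolar line coefficients can be computed as in the following sketch. It assumes P = (u, v, z) and P' = (U, V, z) in image coordinates measured from the image center, the matrix E = [b]× R, and the same conventions for R and b as in the relation P · (RP' × b) = 0; the function names are hypothetical. The line coefficients are named `a`, `b_coef`, and `c` in the code to avoid a clash with the translation vector `b`.

```python
import numpy as np

def skew(b):
    """Cross-product matrix [b]x, so that skew(b) @ x == np.cross(b, x)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def epipolar_line(u, v, z, R, b):
    """Coefficients (a, b_coef, c) of the line a*U + b_coef*V + c = 0 in the second image."""
    E = skew(np.asarray(b, dtype=float)) @ R
    l = E.T @ np.array([u, v, z], dtype=float)   # P^T E P' = l . P'
    return l[0], l[1], l[2] * z                  # P' = (U, V, z), so the constant term is l[2] * z
```

A re-detection step would then evaluate candidate blocks only at pixels (U, V) close to this line.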
【０１１１】
なお、平面形状であると再判定された場合は、直ちに三次元形状を算出するようにしてもいいし、平面形状であるか立体形状であるかの再判定を何度もループしてもよい。上述した第３実施例に関する具体例で言うと、対応点再検出を何度もループすることになる。このループを終了する手法としては、各対応点再検出で検出された各対応点とエピポーラ線との距離の和を算出しておいて、今回の和Ｌ（ｎ）と前回の和Ｌ（ｎ−１）との差が閾値以下になったら、今回でループを終了するという手法が挙げられる。ループが終了したら、三次元形状を算出するようにする。
【０１１２】
（変形例）
図１の画像処理装置１０２は、本発明に係る「画像処理装置」の実施の形態の例であり、図１の画像処理装置１０２により実行される画像処理方法は、本発明に係る「画像処理方法」の実施の形態の例である。この画像処理方法をコンピュータに実行させる画像処理プログラムは、本発明に係る「画像処理プログラム」の実施の形態の例であり、この画像処理方法をコンピュータに実行させる画像処理プログラムが記録された記録媒体（ＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭ、スマートメディア等）は、本発明に係る「記録媒体」の実施の形態の例である。
【０１１３】
図１の撮像装置１０１と画像処理装置１０２は、個別の電子機器ではなく一体の電子機器でもよい。このような電子機器の例としては、カメラ付き携帯電話機や、カメラ付きノート型パソコンや、カメラ付きＰＤＡや、画像処理機能付きのデジタルカメラが挙げられる。
【０１１４】
【発明の効果】
このように、本発明は、被写体形状が平面形状である場合にも立体形状である場合にも、被写体の三次元形状を精度よく算出することを可能にする。
【図面の簡単な説明】
【図１】撮像装置と画像処理装置を表す。
【図２】撮像装置のハードウェア構成を表す。
【図３】画像処理装置のハードウェア構成を表す。
【図４】画像処理装置の機能ブロックを表す。
【図５】第１実施例の画像処理装置の機能ブロックを表す。
【図６】被写体奥行について説明するための図である。
【図７】第１実施例の画像処理装置に係るフローチャートである。
【図８】第２実施例の画像処理装置の機能ブロックを表す。
【図９】第２実施例の画像処理装置に係るフローチャートである。
【図１０】第３実施例の撮像装置のハードウェア構成を表す。
【図１１】第３実施例の画像処理装置の機能ブロックを表す。
【図１２】ワールド座標系を表す。
【図１３】撮像装置座標系を表す。
【図１４】並進成分について説明するための図である。
【図１５】第３実施例の画像処理装置に係るフローチャートである。
【図１６】第１処理部の変形例について説明するための図である。
【符号の説明】
１０１撮像装置
１０２画像処理装置
４０１形状判定部
４０２第１処理部
４０３第２処理部
４０４記録部
５０１奥行抽出部
５０２形状判定部
８０１三次元形状概算部
８０２奥行抽出部
８０３形状判定部
１００１検出装置
１１０１対応点検出部
１１０２姿勢算出部
１１０３並進成分算出部
１１０４三次元形状概算部

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image processing apparatus, an image processing method, an image processing program, a recording medium, and an electronic apparatus.
[0002]
[Prior art]
Currently, the spread of digital cameras is rapidly progressing, and the development of applications using digital cameras is also active. Examples of this include “image panorama synthesis”, “image distortion correction”, “three-dimensional shape calculation”, “autonomous movement of a robot”, and “automatic driving of a vehicle”. Among these, “calculation of three-dimensional shape” is expected to progress because there is a need to provide three-dimensional information of goods for electronic commerce.
[0003]
[Patent Document 1]
JP 11-136575 A
[0004]
[Patent Document 2]
Japanese Patent Laid-Open No. 11-306363
[0005]
[Patent Document 3]
Japanese Patent Laid-Open No. 2000-115639
[0006]
[Patent Document 4]
JP 2000-134537 A
[0007]
[Patent Document 5]
JP 2000-228748 A
[0008]
[Patent Document 6]
JP 2001-92944 A
[0009]
[Patent Document 7]
JP 2001-148025 A
[0010]
[Patent Document 8]
JP 2001-325069 A
[0011]
[Non-Patent Document 1]
Q.-T. Luong, O. D. Faugeras, “Determining the Fundamental Matrix with Planes: Instability and New Algorithms”, Proc. of Computer Vision and Pattern Recognition, pp. 489-494, 1993.
[0012]
[Non-Patent Document 2]
Shusaku Imou, Hiromune Namie, Akio Yasuda, “Three-Dimensional Attitude Measurement of a Ship by RTK-GPS Positioning”, Tokyo University of Mercantile Marine
[0013]
[Non-Patent Document 3]
Hiromune Namie, “Study on the Practical Use of DGPS and RTK-GPS”, Tokyo University of Mercantile Marine
[0014]
[Non-Patent Document 4]
Gang Xu and Saburo Tsuji, “Three-Dimensional Vision”, Kyoritsu Shuppan, Chapter 3
[0015]
[Non-Patent Document 5]
Takayuki Okatani, Koichiro Deguchi, “Estimation of Camera Translation by Voting Using a Camera Equipped with a Three-Dimensional Orientation Sensor”, Human Interface / Computer Vision and Image Media, 2001/9/13
[0016]
[Non-Patent Document 6]
Richard I. Hartley, Computer Vision And Image Understanding, vol. 68, no. 2, November, pp146-157, 1997
[0017]
[Non-Patent Document 7]
Tomoaki Morijiri, Yasuhiro Onodera, Kenichi Kanatani, “Computation of Three-Dimensional Motion of a Plane from Two Images”, IPSJ SIG Technical Report, 94-CV-90, 1994
[0018]
[Non-Patent Document 8]
Gang Xu and Saburo Tsuji, “Three-Dimensional Vision”, Kyoritsu Shuppan, Chapter 6
[0019]
[Non-patent document 9]
Kenichi Kanatani, “Image Understanding”, Shukusha, Chapter 4
[0020]
[Problems to be solved by the invention]
In order to calculate the three-dimensional shape of a subject using an image captured by an imaging device such as a digital camera, a technique called “Structure from Motion” is generally used. However, according to this method, when the subject shape is close to a planar shape, the calculation accuracy decreases because the perspective difference is small (Non-Patent Document 1). For this reason, this method has a problem that the three-dimensional shape of the subject cannot be accurately calculated when the subject shape is a planar shape such as paper or a flat plate.
[0021]
When the subject shape is a planar shape, a three-dimensional shape of the subject can be calculated by calculating a projective transformation matrix as described in Patent Document 7. However, this method has a problem that, if the subject shape is a three-dimensional shape, the three-dimensional shape of the subject cannot be accurately calculated.
[0022]
Therefore, an object of the present invention is to accurately calculate the three-dimensional shape of a subject regardless of whether the subject shape is a planar shape or a three-dimensional shape.
[0023]
[Means for Solving the Problems]
The image processing apparatus of the present invention is an image processing apparatus that performs image processing on images captured by an imaging device to calculate a three-dimensional shape, and comprises: projective transformation matrix calculation means for calculating a projective transformation matrix from position information of four or more corresponding points on images captured from two viewpoints; depth extraction means for extracting, using the projective transformation matrix calculated by the projective transformation matrix calculation means, a value used to determine the shape of the subject of the images; and shape determination means for determining, from the value used for the shape determination, whether the subject of the images has a planar shape or a three-dimensional shape. The depth extraction means takes the difference between two projective transformation matrices calculated from mutually different combinations of corresponding points as the value used for the shape determination. If the subject has a planar shape, image processing for planar shapes is executed on the images to calculate the three-dimensional shape; if the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed on the images to calculate the three-dimensional shape.
[0034]
The image processing apparatus or image processing method of the present invention determines whether the subject of the image has a planar shape or a three-dimensional shape. If the subject has a planar shape, image processing for planar shapes is executed on the image to calculate the three-dimensional shape; if the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed on the image to calculate the three-dimensional shape. Thanks to this determination of the subject shape, the three-dimensional shape of the subject can be calculated accurately whether the subject shape is planar or three-dimensional.
[0035]
An image processing apparatus or image processing method according to another aspect of the present invention extracts a value used to determine the shape of the subject of the image and determines from that value whether the subject has a planar shape or a three-dimensional shape. If the subject has a planar shape, image processing for planar shapes is executed on the image to calculate the three-dimensional shape; if the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed to calculate the three-dimensional shape. Thanks to this determination of the subject shape through extraction of the value used for the shape determination, the three-dimensional shape of the subject can be calculated accurately whether the subject shape is planar or three-dimensional.
[0036]
An image processing apparatus or image processing method according to another aspect of the present invention approximates the three-dimensional shape of the subject of the image, extracts from that approximate shape a value used to determine the shape of the subject, and determines from that value whether the subject has a planar shape or a three-dimensional shape. If the subject has a planar shape, image processing for planar shapes is executed on the image to calculate the three-dimensional shape; if the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed on the image to calculate the three-dimensional shape. Thanks to this determination of the subject shape, made through approximation of the three-dimensional shape and extraction of the value used for the shape determination, the three-dimensional shape of the subject can be calculated accurately whether the subject shape is planar or three-dimensional.
[0037]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described.
[0038]
FIG. 1 shows an imaging device 101 and an image processing device 102. The imaging device 101 is a device that captures images, and the image processing device 102 is a device that performs image processing on those images to calculate a three-dimensional shape. Here, the imaging device 101 is a digital camera and the image processing device 102 is a personal computer. Images are transferred between the two by, for example, connecting them with a cable.
[0039]
FIG. 2 shows a hardware configuration of the imaging apparatus 101. The image of the subject is formed on the image sensor 206 through the fixed lens 201, zoom lens 202, aperture mechanism 203, shutter 204, and focus lens 205. An image signal from the image sensor 206 is sampled by a CDS circuit 207 (Correlated Double Sampling) and converted into a digital signal by an A / D converter 208. The timing at this time is generated by a TG 211 (Timing Generator). Thereafter, the image signal from the image sensor 206 is subjected to image processing such as aperture correction and compression by an IPP 209 (Image Pre-Processor), and is temporarily stored in the image buffer memory 210. The operation of each unit is controlled by the MPU 212. The image signal stored in the image buffer memory 210 is finally transferred to the external device by the external communication device 221.
[0040]
FIG. 3 shows the hardware configuration of the image processing apparatus 102. The CPU 301 executes various kinds of processing and control. The SDRAM 302 is used as a work area for the CPU 301 and as a recording area for fixed information of the various processing programs and control programs. The external I/F 321 is an interface for connecting to a telecommunication line such as the Internet or to an external device, the display I/F 322 is an interface for connecting to a display device such as a monitor, and the input I/F 323 is an interface for connecting to an input device such as a keyboard or a mouse. Image signals from the imaging device 101 and the various processing programs and control programs are loaded into the SDRAM 302 via a recording device 311 such as a CD-RW drive or via the external I/F 321. They may also be saved temporarily on the HDD 312 and loaded into the SDRAM 302 as needed.
[0041]
FIG. 4 shows functional blocks of the image processing apparatus 102. The image processing apparatus 102 includes a shape determination unit 401, a first processing unit 402, a second processing unit 403, and a recording unit 404. With these functional blocks, the image processing apparatus 102 performs image processing on the image captured by the imaging apparatus 101 and calculates a three-dimensional shape.
[0042]
The shape determination unit 401 automatically determines whether the subject of the image has a planar shape or a three-dimensional shape. In one example, the depth of the subject is extracted, and whether the subject has a planar shape or a three-dimensional shape is determined automatically from this depth. This is described later as the first embodiment. In another example, the three-dimensional shape of the subject is approximated, the depth of the subject is extracted from this approximate shape, and whether the subject has a planar shape or a three-dimensional shape is determined automatically from this depth. This is described later as the second embodiment.
[0043]
When the shape determination unit 401 determines that the subject of the image has a planar shape, the first processing unit 402 executes image processing for planar shapes on the image to calculate the three-dimensional shape. Details are given later. When the shape determination unit 401 determines that the subject of the image has a three-dimensional shape, the second processing unit 403 executes image processing for three-dimensional shapes on the image to calculate the three-dimensional shape. Details are given later. The recording unit 404 records images, determination results, calculation results, and the like.
[0044]
As described above, the image processing apparatus 102 automatically determines whether the subject of the image has a planar shape or a three-dimensional shape. If the subject has a planar shape, it executes image processing for planar shapes on the image to calculate the three-dimensional shape; if the subject has a three-dimensional shape, it executes image processing for three-dimensional shapes on the image to calculate the three-dimensional shape. Thanks to this determination of the subject shape, the three-dimensional shape of the subject can be calculated accurately whether the subject shape is planar or three-dimensional.
[0045]
(1) First embodiment
FIG. 5 shows the functional blocks of the image processing apparatus 102 of the first embodiment. In the image processing apparatus 102 of the first embodiment, the shape determination unit 401 of FIG. 4 is composed of a depth extraction unit 501 and a shape determination unit 502. With these functional blocks, the image processing apparatus 102 of the first embodiment performs image processing on the images captured by the imaging device 101 to calculate a three-dimensional shape.
[0046]
The depth extraction unit 501 automatically extracts the depth of the subject of the image, and the shape determination unit 502 automatically determines from this depth whether the subject has a planar shape or a three-dimensional shape. Here, the “subject depth” is extracted by projective transformation of images captured from two or more viewpoints, and the “subject shape” is determined by threshold processing of the extracted subject depth.
[0047]
The “subject depth” means the thickness of the subject when the subject is viewed from the front direction (the maximum surface when the outline of the subject is replaced with a rectangular parallelepiped) as shown in FIG. A small subject depth means that the subject shape is a planar shape, and a large subject depth means that the subject shape is a three-dimensional shape.
[0048]
The projective transformation matrix can be calculated from the position information of four or more corresponding points in images captured from two viewpoints. However, projective transformation presupposes that the subject shape is planar; when the subject shape is three-dimensional, the projective transformation does not hold and the projective transformation matrix cannot be calculated accurately. Viewed another way, when a projective transformation matrix B1 and a projective transformation matrix B2 are calculated from different combinations of corresponding points, the difference between B1 and B2 grows as the subject shape departs from a planar shape. That is, the subject depth can be evaluated by the difference between B1 and B2. Therefore, here, the depth extraction unit 501 extracts the subject depth (the difference between B1 and B2) by projective transformation of images captured from two or more viewpoints, and the shape determination unit 502 determines the subject shape by threshold processing of the subject depth (the difference between B1 and B2).
[0049]
The “images captured from two or more viewpoints” may be images captured by the same image capturing apparatus 101 or images captured by separate image capturing apparatuses 101.
[0050]
FIG. 7 is a flowchart according to the image processing apparatus 102 of the first embodiment.
[0051]
In S701, images taken from two or more viewpoints are recorded in the memory.
[0052]
In S702, feature points are detected from the first image. Specifically, (1) a feature point detection block is created centered on each pixel, and the luminance value distribution (or RGB value distribution) within each feature point detection block is detected; (2) the image is divided into regions, and within each region the feature point detection block whose luminance value distribution varies most markedly is detected; and (3) the position of the center (the feature point) of that feature point detection block and its luminance value distribution are recorded in the memory.
[0053]
In S703, the corresponding point of each feature point is detected from the second image. Specifically, (1) a corresponding point detection block is created centered on each pixel, and the luminance value distribution within each corresponding point detection block is detected; (2) this luminance value distribution is matched against the luminance value distribution of each feature point (the luminance value distribution recorded in the memory in S702); and (3) the position of the center (the corresponding point) of the corresponding point detection block with the best degree of matching is recorded in the memory. If no corresponding point detection block has a degree of matching better than the threshold, the feature point is deleted from the memory.
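The matching of S703 can be sketched as follows. This is a minimal illustration only, assuming grayscale images held as lists of rows and normalized cross-correlation as the degree of matching; the function names, the block half-width `half`, and the threshold `min_score` are hypothetical and do not appear in the original.

```python
import math

def block(image, cx, cy, half):
    """Pixel values of the (2*half+1) x (2*half+1) block centered on (cx, cy)."""
    return [image[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized blocks."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0.0:
        return 0.0  # a flat block carries no usable texture
    return sum(x * y for x, y in zip(da, db)) / denom

def find_corresponding_point(img1, img2, feature, half=1, min_score=0.8):
    """Return the (x, y) in img2 that best matches the feature block of img1,
    or None when no block reaches min_score (the feature point is then dropped)."""
    fx, fy = feature
    template = block(img1, fx, fy, half)
    height, width = len(img2), len(img2[0])
    best_score, best_pos = -1.0, None
    for cy in range(half, height - half):
        for cx in range(half, width - half):
            score = ncc(template, block(img2, cx, cy, half))
            if score > best_score:
                best_score, best_pos = score, (cx, cy)
    return best_pos if best_score >= min_score else None
```

The exhaustive search over every pixel is chosen for clarity; a practical implementation would restrict the search window.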
[0054]
In S704, it is determined whether each corresponding point is correct. Specifically, (1) feature point detection blocks are created centered on positions offset by 5 pixels vertically and horizontally from each feature point; (2) corresponding point detection blocks are created centered on positions offset by 5 pixels vertically and horizontally from each corresponding point; (3) these feature point detection blocks are matched against these corresponding point detection blocks; and (4) if the degree of matching is worse than the threshold, the corresponding point is judged to have been detected erroneously and is deleted from the memory. The corresponding points may also be ranked by degree of matching and recorded in the memory.
[0055]
In S705, a projective transformation matrix B1 is calculated from position information of four or more corresponding points that are arbitrarily selected. In S706, a projective transformation matrix B2 is calculated from position information of four or more corresponding points that are arbitrarily selected. In S707, threshold processing for the difference between B1 and B2 is performed. Note that the corresponding point selected in S705 is deleted from the memory between S705 and S706 in order to make the corresponding point selected in S705 and the corresponding point selected in S706 distinct.
[0056]
If the difference between B1 and B2 is less than or equal to the threshold, in S708 the three-dimensional shape is calculated by executing the image processing for planar shapes on the image. If the difference between B1 and B2 exceeds the threshold, in S709 the three-dimensional shape is calculated by executing the image processing for three-dimensional shapes on the image. In either case, the calculation result of the three-dimensional shape is recorded in the memory in S710.
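Steps S705 to S707 can be sketched as follows. This is an illustrative sketch under stated assumptions: the projective transformation is parameterized by eight unknowns b1 to b8 with the ninth element fixed to 1, exactly four corresponding points are used per matrix, and the "difference between B1 and B2" is taken as the sum of absolute differences of the eight coefficients. The text does not specify the difference measure, so that choice and all function names are hypothetical.

```python
def solve_linear(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(map(float, row)) + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corresponding points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def homography_from_4(pairs):
    """pairs: four ((u, v), (U, V)) correspondences. Returns [b1, ..., b8]."""
    a, rhs = [], []
    for (u, v), (U, V) in pairs:
        a.append([u, v, 1, 0, 0, 0, -u * U, -v * U])
        rhs.append(U)
        a.append([0, 0, 0, u, v, 1, -u * V, -v * V])
        rhs.append(V)
    return solve_linear(a, rhs)

def apply_homography(b, point):
    """Map (u, v) to (U, V) with coefficients b1..b8."""
    u, v = point
    w = b[6] * u + b[7] * v + 1.0
    return ((b[0] * u + b[1] * v + b[2]) / w,
            (b[3] * u + b[4] * v + b[5]) / w)

def homography_difference(b1, b2):
    """Assumed difference measure: sum of absolute coefficient differences."""
    return sum(abs(x - y) for x, y in zip(b1, b2))

def subject_is_planar(pairs_a, pairs_b, threshold):
    """True when B1 and B2 from two disjoint point sets agree within threshold."""
    diff = homography_difference(homography_from_4(pairs_a),
                                 homography_from_4(pairs_b))
    return diff <= threshold
```

Because the coefficients have different magnitudes (b3 and b6 are in pixels, b7 and b8 are small), a practical measure would probably normalize them before comparison.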
[0057]
(2) Second embodiment
FIG. 8 illustrates functional blocks of the image processing apparatus 102 according to the second embodiment. In the image processing apparatus 102 of the second embodiment, the shape determination unit 401 of FIG. 4 is configured by a three-dimensional shape estimation unit 801, a depth extraction unit 802, and a shape determination unit 803. With these functional blocks, the image processing apparatus 102 according to the second embodiment performs image processing on the image captured by the imaging apparatus 101 to calculate a three-dimensional shape.
[0058]
The three-dimensional shape estimation unit 801 automatically estimates the approximate three-dimensional shape of the subject of this image, the depth extraction unit 802 automatically extracts the depth of the subject from this three-dimensional shape, and the shape determination unit 803 automatically determines from this depth whether the subject of this image has a planar shape or a three-dimensional shape. Here, the “three-dimensional shape of the subject” is estimated by a method described later, the “subject depth” is extracted from the point cloud data of the estimated three-dimensional shape, and the “subject shape” is determined by threshold processing of the extracted subject depth.
[0059]
The approximate three-dimensional shape of the subject can be estimated by, for example, (1) a method using the factorization method, (2) a method of calculating optical flow from corresponding points, (3) a method of estimating from the contour of the subject, (4) a method of irradiating the subject with pattern light and estimating by triangulation, (5) a method of measuring the intensity ratio of light reflected from the subject, or (6) a method of rough estimation using image blur caused by differences in focal length. Therefore, it is assumed here that the three-dimensional shape estimation unit 801 estimates the approximate three-dimensional shape of the subject by any of these methods.
[0060]
Note that when the three-dimensional shape of the subject is approximated by the methods (4) and (5), a light irradiation device for irradiating the subject with light may be installed in the imaging device 101.
[0061]
The subject depth can be extracted from the estimated three-dimensional shape of the subject as in the following example. First, the point cloud data constituting the three-dimensional shape is extracted from the estimated three-dimensional shape. Next, the four constants a, b, c, and d are calculated by substituting five or more points of the point cloud data into the plane equation “ax + by + cz = d”. Because four constants are calculated from five or more points, the system is overdetermined and generally has no exact solution. The constants are therefore determined by the least squares method so that the sum of squares of the distances between the points and the plane is minimized. If the subject shape is planar, the point cloud lies close to a plane, so this minimum sum of squares should be small; if the subject shape is three-dimensional, the point cloud does not lie on a plane, so the minimum sum of squares should be large. In other words, the minimum sum of squares can serve as the subject depth. Therefore, here, the depth extraction unit 802 extracts the subject depth (the minimum value of the sum of squares of the distances between the points and the plane) from the point cloud data of the estimated three-dimensional shape.
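The least squares step above can be sketched as follows. This is one possible realization, not necessarily the one intended by the original: it uses the fact that the minimum sum of squared orthogonal distances to a plane equals the smallest eigenvalue of the 3 × 3 scatter matrix of the centered points, and it obtains that eigenvalue with a standard closed-form solution for symmetric 3 × 3 matrices. The function names are hypothetical.

```python
import math

def smallest_eigenvalue_sym3(a):
    """Smallest eigenvalue of a symmetric 3x3 matrix (closed-form trigonometric solution)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:
        return min(a[0][0], a[1][1], a[2][2])  # matrix is already diagonal
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    return q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)

def subject_depth(points):
    """Minimum sum of squared distances between the points and a fitted plane."""
    n = len(points)
    if n < 5:
        raise ValueError("the text calls for five or more points")
    c = [sum(pt[k] for pt in points) / n for k in range(3)]
    scatter = [[sum((pt[i] - c[i]) * (pt[j] - c[j]) for pt in points)
                for j in range(3)] for i in range(3)]
    return max(0.0, smallest_eigenvalue_sym3(scatter))
```

For points lying exactly on a plane the result is zero up to rounding, and it grows as the point cloud thickens in the direction normal to the best-fit plane.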
[0062]
The subject shape can be determined from the extracted subject depth as in the following example. The subject shape could be determined by comparing the extracted subject depth D with a fixed threshold J. Strictly speaking, however, a subject with a small depth may still be three-dimensional if the subject itself is small, and a subject with a large depth may still be planar if the subject itself is large. Therefore, here, the shape determination unit 803 determines the subject shape by comparing the quotient D / S of the extracted subject depth D and the subject size S with a fixed threshold J.
[0063]
Note that the subject size may be calculated from the estimated three-dimensional shape. For example, the distance between the two most distant points of the point cloud data constituting this three-dimensional shape is calculated, and this distance is used as the subject size. When the contour of the subject is extracted as in method (3), the subject size may be calculated from this contour.
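The size calculation and the D / S threshold processing can be sketched as follows. The function names and the returned labels are illustrative assumptions; the depth D is taken as an input computed elsewhere.

```python
import math
from itertools import combinations

def subject_size(points):
    """Subject size S: distance between the two most distant points."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))

def classify_shape(depth_d, size_s, threshold_j):
    """Threshold processing of the quotient D / S with a fixed threshold J."""
    if size_s <= 0.0:
        raise ValueError("subject size must be positive")
    return "planar" if depth_d / size_s <= threshold_j else "three-dimensional"
```

For example, `classify_shape(0.1, 5.0, 0.05)` compares 0.02 with 0.05 and returns `"planar"`.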
[0064]
FIG. 9 is a flowchart according to the image processing apparatus 102 of the second embodiment.
[0065]
In step S901, the captured image is recorded in the memory.
[0066]
In S902, the three-dimensional shape of the subject is estimated. In S903, the subject depth is extracted. In S904, the subject shape is determined. Here, the approximate three-dimensional shape of the subject is estimated by one of the methods described above, the subject depth D (the minimum value of the sum of squares of the distances between the points and the plane) is extracted from the point cloud data of the estimated three-dimensional shape, and the subject shape is determined by comparing the quotient D / S of the extracted subject depth D and the subject size S with a fixed threshold J.
[0067]
If D / S is less than or equal to J (the planar case), in S905 the three-dimensional shape is calculated by executing the image processing for planar shapes on the image. If D / S exceeds J (the three-dimensional case), in S906 the three-dimensional shape is calculated by executing the image processing for three-dimensional shapes on the image. In either case, the calculation result of the three-dimensional shape is recorded in the memory in S907.
[0068]
(3) Third embodiment
FIG. 10 illustrates the hardware configuration of the imaging apparatus 101 according to the third embodiment. In addition to the components shown in FIG. 2, the imaging apparatus 101 according to the third embodiment includes a detection apparatus 1001 that detects physical quantities (here, gravitational acceleration and geomagnetism) for calculating the attitude of the imaging apparatus 101. As described with reference to FIG. 2, the image signal stored in the image buffer memory 210 is finally transferred to an external device by the external communication device 221; here, before that transfer, the image signal is stored in the storage unit 215 in the MPU 212.
[0069]
The “imaging operation” described above starts when the imaging switch 214 is pressed while the power switch 213 is ON. The following “detection operation” starts at the same time. That is, gravitational acceleration and geomagnetism (specifically, their X, Y, and Z components) are detected as voltage signals by the three-axis acceleration sensor 1002 and the three-axis magnetic sensor 1003, respectively, converted into digital signals by the A/D converter 1004, and stored in the storage unit 215 in the MPU 212 as attachments to the image signal. The image signal with the gravitational acceleration signal and the geomagnetic signal attached is finally transferred to the external device by the external communication device 221.
[0070]
A “three-axis angular velocity sensor” may be used instead of the three-axis magnetic sensor 1003, or together with it. A “horizontal sensor” may also be used instead of the three-axis acceleration sensor 1002 and the three-axis magnetic sensor 1003. Any sensor may be used in the detection apparatus 1001 as long as it can detect physical quantities (here, gravitational acceleration and geomagnetism) for calculating the attitude of the imaging apparatus 101. For example, when the imaging apparatus 101 is very large, a “GPS sensor” could be used (Non-Patent Document 2 and Non-Patent Document 3 are research examples).
[0071]
FIG. 11 illustrates functional blocks of the image processing apparatus 102 according to the third embodiment. In the image processing apparatus 102 of the third embodiment, the three-dimensional shape estimation unit 801 in FIG. 8 is configured by a corresponding point detection unit 1101, an attitude calculation unit 1102, a translation component calculation unit 1103, and a three-dimensional shape estimation unit 1104. With these functional blocks, the image processing apparatus 102 according to the third embodiment performs image processing on the image captured by the imaging apparatus 101 to calculate a three-dimensional shape.
[0072]
The corresponding point detection unit 1101 detects corresponding points from the images captured by the imaging apparatus 101. One method of detecting corresponding points from images is block matching (see Non-Patent Document 4 for theoretical details). In block matching, using images captured from two or more viewpoints, feature points are detected from the first image on the basis of the luminance value distribution or RGB value distribution, and the corresponding points of the feature points are detected from the second image by matching the luminance value distributions or RGB value distributions (maximizing the cross-correlation).
[0073]
The attitude calculation unit 1102 calculates the attitude of the imaging apparatus 101 from the physical quantities (here, gravitational acceleration and geomagnetism) detected by the detection apparatus 1001. The attitude of the imaging apparatus 101 is expressed as the orientation of the imaging apparatus coordinate system (the xyz coordinate system in FIG. 13, fixed to the imaging apparatus 101) relative to the world coordinate system (the XYZ coordinate system in FIG. 12, fixed to the ground). This orientation is expressed by the rotation matrix R of Equation (1), which rotates the world coordinate system into the imaging apparatus coordinate system. R means (1) a rotation about the Z axis by an angle γ, (2) a rotation about the X axis by an angle α, and (3) a rotation about the Y axis by an angle β.
[0074]
[Equation 1]

Here, for the convenience of calculation, the dip angle of geomagnetism is ignored. Thereby, the gravitational acceleration g and the geomagnetism m in the world coordinate system can be expressed as shown in Expression (2). Further, the gravitational acceleration G and the geomagnetism M in the image pickup apparatus coordinate system are expressed as in Expression (3).
[0075]
[Equation 2]

[0076]
[Equation 3]

Here, a relational expression such as Expression (4) is established between g and G, and a relational expression such as Expression (5) is established between m and M.
[0077]
[Equation 4]

[0078]
[Equation 5]

From equation (4), angle α is derived as in equation (6), and angle γ is derived as in equation (7). From equation (5), the angle β is derived as in equation (8).
[0079]
[Equation 6]

[0080]
[Equation 7]

[0081]
[Equation 8]

If accelerations other than gravitational acceleration (inertial accelerations) and magnetism other than geomagnetism can be ignored, the acceleration detected by the three-axis acceleration sensor 1002 is the gravitational acceleration G, and the magnetism detected by the three-axis magnetic sensor 1003 is the geomagnetism M. Therefore, the attitude calculation unit 1102 calculates the angles α, β, and γ by substituting the gravitational acceleration G and the geomagnetism M detected by the detection apparatus 1001 into Equations (6), (7), and (8). The attitude of the imaging apparatus 101 is thus calculated.
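Equations (1) to (8) are not reproduced in this text, so the following sketch does not implement them. As a substitute technique it builds a rotation matrix directly from G and M by constructing an orthonormal triad, which recovers the same kind of attitude information without fixing an Euler angle convention. The axis convention (world X = east, Y = magnetic north, Z = up, with G pointing downward in the imaging apparatus frame) and the function names are assumptions made for this illustration.

```python
import math

def cross(a, b):
    """Vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(v):
    """Scale v to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return [x / n for x in v]

def attitude_from_sensors(gravity, magnetism):
    """Return a 3x3 matrix whose rows are the world east, north, and up axes
    expressed in the imaging apparatus frame (assumed convention).
    Multiplying it by a vector in the apparatus frame gives world coordinates."""
    up = unit([-x for x in gravity])       # gravity is assumed to point down
    east = unit(cross(magnetism, up))      # the vertical part of M drops out here
    north = cross(up, east)
    return [east, north, up]
```

Because the cross product with the vertical removes any vertical part of M, the dip angle of geomagnetism does not affect the result in this formulation.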
[0082]
The translation component calculation unit 1103 calculates the translation component of the imaging apparatus 101 from the corresponding points and the attitude. Here, as shown in FIG. 14, the “translation component” means the vector b from the optical center at the time the first image is captured to the optical center at the time the second image is captured, for the images used in corresponding point detection.
[0083]
Now, suppose that corresponding point detection is performed for the object in FIG. 14, and let P and P′ be the vectors from the optical center to the object when the first image and the second image are captured, respectively. Because P, P′, and b lie in the same plane, the scalar triple product of P, P′, and b (which corresponds to the volume of a parallelepiped) is zero. Strictly speaking, P and P′ are expressed in different coordinate systems, so P′ is multiplied by R, the change in attitude of the imaging apparatus 101 between the capture of the first image and the capture of the second image; the scalar triple product “P · RP′ × b” of P, RP′, and b is then zero. Therefore, the translation component calculation unit 1103 calculates the translation component of the imaging apparatus 101 by substituting the corresponding points and the attitude into this scalar triple product equation.
[0084]
Note that, since it is the direction of the translation component that is determined, there are two variables to be calculated, and at least two corresponding points are required. When three or more corresponding points are detected, only the corresponding points with high cross-correlation may be used, the least squares method may be used, or many translation components may be derived and projected onto a voting space to calculate an optimal translation component (Non-Patent Document 5 is a research example).
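The two-point case can be sketched as follows. Since P · (RP′ × b) = b · (P × RP′), each corresponding point constrains b to be perpendicular to the vector P × RP′, so two corresponding points determine the direction of b as the cross product of the two such vectors. This is a minimal sketch under the assumption that R is already known; the function names are hypothetical, and the sign of b is left undetermined.

```python
import math

def cross(a, b):
    """Vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def mat_vec(r, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]

def translation_direction(pairs, r):
    """pairs: two (P, P') vector pairs; r: 3x3 attitude change R.
    Returns a unit vector parallel to b (sign not determined)."""
    n1, n2 = [cross(p, mat_vec(r, p_prime)) for p, p_prime in pairs]
    b = cross(n1, n2)
    norm = math.sqrt(sum(x * x for x in b))
    if norm < 1e-12:
        raise ValueError("the two constraints are not independent")
    return [x / norm for x in b]
```

With more than two corresponding points, the same constraints could be stacked and solved in the least squares sense, as the text notes.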
[0085]
The three-dimensional shape estimation unit 1104 estimates the three-dimensional shape of the subject of the image captured by the imaging device 101 from the corresponding points, the posture, and the translation component. A representative example of a method for approximating the three-dimensional shape of a subject from corresponding points, postures, and translational components is based on the principle of triangulation measurement. In addition, measures for improving the accuracy of triangulation measurement (non-patent document 6 is a research example) may be executed.
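Triangulation from one corresponding point can be sketched as follows. This is an illustrative midpoint method, one common way of realizing triangulation and not necessarily the one intended here: the first ray starts at the origin with direction P, the second starts at b with direction RP′, and the estimate is the midpoint of the shortest segment between the two rays. The function names are hypothetical. Because only the direction of b is determined, the reconstructed shape is defined up to scale.

```python
def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(p, rp_prime, b):
    """p: ray direction from the first optical center (at the origin);
    rp_prime: ray direction from the second optical center, already rotated
    into the first frame; b: position of the second optical center.
    Returns the midpoint of the closest points on the two rays."""
    w0 = [-x for x in b]                       # first origin minus second origin
    a = dot(p, p)
    bb = dot(p, rp_prime)
    c = dot(rp_prime, rp_prime)
    d = dot(p, w0)
    e = dot(rp_prime, w0)
    denom = a * c - bb * bb
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel")
    s = (bb * e - c * d) / denom               # parameter along the first ray
    t = (a * e - bb * d) / denom               # parameter along the second ray
    on_ray1 = [s * x for x in p]
    on_ray2 = [bi + t * x for bi, x in zip(b, rp_prime)]
    return [(x + y) / 2.0 for x, y in zip(on_ray1, on_ray2)]
```

Applying this to every corresponding point yields the point cloud data from which the subject depth is later extracted.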
[0086]
As described above, the image processing apparatus 102 according to the third embodiment is based on the posture of the imaging apparatus 101 calculated from the physical quantities (here, gravitational acceleration and geomagnetism) detected by the detection apparatus 1001. Estimate the three-dimensional shape.
[0087]
FIG. 15 is a flowchart according to the image processing apparatus 102 of the third embodiment.
[0088]
In S1501, together with images taken from two or more viewpoints, the gravitational acceleration and geomagnetism detected when these images are taken are recorded in the memory.
[0089]
In S1502, feature points are detected from the first image. Specifically, (1) a feature point detection block is created centered on each pixel, and the luminance value distribution (or RGB value distribution) within each feature point detection block is detected; (2) the image is divided into regions, and within each region the feature point detection block whose luminance value distribution varies most markedly is detected; and (3) the position of the center (the feature point) of that feature point detection block and its luminance value distribution are recorded in the memory.
[0090]
In S1503, the corresponding point of each feature point is detected from the second image. Specifically, (1) a corresponding point detection block is created centered on each pixel, and the luminance value distribution within each corresponding point detection block is detected; (2) this luminance value distribution is matched against the luminance value distribution of each feature point (the luminance value distribution recorded in the memory in S1502); and (3) the position of the center (the corresponding point) of the corresponding point detection block with the best degree of matching is recorded in the memory. If no corresponding point detection block has a degree of matching better than the threshold, the feature point is deleted from the memory.
[0091]
In S1504, it is determined whether each corresponding point is correct. Specifically, (1) feature point detection blocks are created centered on positions offset by 5 pixels vertically and horizontally from each feature point; (2) corresponding point detection blocks are created centered on positions offset by 5 pixels vertically and horizontally from each corresponding point; (3) these feature point detection blocks are matched against these corresponding point detection blocks; and (4) if the degree of matching is worse than the threshold, the corresponding point is judged to have been detected erroneously and is deleted from the memory. The corresponding points may also be ranked by degree of matching and recorded in the memory.
[0092]
In S1505, the attitude of the imaging apparatus 101 at the time the first image is captured is calculated from the gravitational acceleration and geomagnetism detected at that time, and the attitude of the imaging apparatus 101 at the time the second image is captured is calculated from the gravitational acceleration and geomagnetism detected at that time.
[0093]
In S1506, the translation component of the imaging apparatus 101 from when the first image is captured until when the second image is captured is calculated from the above images and the above attitudes.
[0094]
In S1507, the three-dimensional shape of the subject of the images captured by the imaging apparatus 101 is estimated from the above images, the above attitudes, and the above translation component. In S1508, the subject depth is extracted. In S1509, the subject shape is determined. Here, the three-dimensional shape of the subject is estimated by the principle of triangulation, the subject depth D (the minimum value of the sum of squares of the distances between the points and the plane) is extracted from the point cloud data of the estimated three-dimensional shape, and the subject shape is determined by comparing the quotient D / S of the extracted subject depth D and the subject size S with a fixed threshold J.
[0095]
If D / S is less than or equal to J (the planar case), in S1510 the three-dimensional shape is calculated by executing the image processing for planar shapes on the image. If D / S exceeds J (the three-dimensional case), in S1511 the three-dimensional shape is calculated by executing the image processing for three-dimensional shapes on the image. In either case, the calculation result of the three-dimensional shape is recorded in the memory in S1512.
[0096]
(First processing unit 402)
As described above, when the subject of the image is determined to have a planar shape, the first processing unit 402 calculates the three-dimensional shape by executing, on the image, the image processing for planar shapes (image processing suited to calculating the three-dimensional shape of a planar subject). As a specific example, a case will be described here in which the attitude and the translation component of the imaging apparatus 101 are calculated by projective transformation of images captured from two or more viewpoints, and the three-dimensional shape is calculated from these by the principle of triangulation or the like.
[0097]
When the subject shape is planar, the attitude and the translation component of the imaging apparatus 101 can be calculated by calculating the projective transformation matrix B from images captured from two or more viewpoints. When the position of a feature point in the first image is (u, v) and the position of the corresponding point in the second image is (U, V), the projective transformation is expressed as in Equation (9). The projective transformation matrix B collects the unknowns b1 to b8 of Equation (9) as in Equation (10).
[0098]
[Equation 9]

[0099]
[Equation 10]

This projective transformation matrix B can be calculated from position information of four or more corresponding points of an image captured from two or more viewpoints. When five or more corresponding points are detected, only corresponding points with high cross-correlation may be used, or the least square method may be used.
[0100]
Because the projective transformation matrix B is expressed in the imaging apparatus coordinate system, it is converted into a projective transformation matrix H expressed in the world coordinate system by conversion formula (12), which uses the internal matrix A of the imaging apparatus shown in Equation (11). Here, z is the focal length, u0 and v0 are the image center, and ku and kv are the pixel densities.
[0101]
[Equation 11]

[0102]
[Equation 12]

A relational expression such as Equation (13) holds among the projective transformation matrix H, the attitude R, and the translation component b (see Non-Patent Document 7 for theoretical details). Here, s is a constant for adjusting the scale, and n is the normal vector of the subject plane.
[0103]
[Equation 13]

Therefore, here, the first processing unit 402 calculates the attitude R and the translation component b of the imaging apparatus 101 by calculating the projective transformation matrix B and the projective transformation matrix H from images captured from two or more viewpoints, and calculates the three-dimensional shape from these by the principle of triangulation or the like.
[0104]
(Second processing unit 403)
As described above, when the subject of the image is determined to have a three-dimensional shape, the second processing unit 403 calculates the three-dimensional shape by executing, on the image, the image processing for three-dimensional shapes (image processing suited to calculating the three-dimensional shape of a three-dimensional subject). As a specific example, a case will be described here in which the attitude and the translation component of the imaging apparatus 101 are calculated by applying the eight-point algorithm (a kind of Structure from Motion; see Non-Patent Document 8 and Non-Patent Document 9) to images captured from two or more viewpoints, and the three-dimensional shape is calculated from these by the principle of triangulation or the like.
[0105]
When the subject shape is three-dimensional, the attitude and the translation component of the imaging apparatus 101 can be calculated by applying the eight-point algorithm to images captured from two or more viewpoints. In the eight-point algorithm, the attitude R and the translation component b are calculated from the position information of eight or more corresponding points in images captured from two or more viewpoints, using the relational expression “P · RP′ × b = 0” between P, P′, and b described above.
[0106]
Specifically, a 3 × 3 matrix E (= b × R) is substituted into the above relational expression, and each component of E is calculated. There are therefore nine variables to be calculated, which would call for nine corresponding points. However, since it is only the direction of the translation component that is determined, the number of variables to be calculated is one fewer, namely eight, and the number of corresponding points required is likewise one fewer, namely at least eight. When nine or more corresponding points are detected, only the corresponding points with high cross-correlation may be used, or the least squares method may be used.
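The linear structure behind the eight-point algorithm can be sketched as follows. With E = b × R written as the product of the skew-symmetric matrix of b and R, the relational expression becomes a linear equation in the nine components of E, one equation per corresponding point. The sketch only builds E from a known attitude and translation and forms the coefficient row for one corresponding point; it stops short of solving the stacked system, which would normally need a singular value decomposition or least squares routine. All function names are hypothetical.

```python
def skew(b):
    """Skew-symmetric matrix [b]x such that [b]x v equals b x v."""
    return [[0.0, -b[2], b[1]],
            [b[2], 0.0, -b[0]],
            [-b[1], b[0], 0.0]]

def mat_mul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(r, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]

def essential_from_pose(b, r):
    """E = [b]x R, so that P . (E P') = 0 for every corresponding point."""
    return mat_mul(skew(b), r)

def constraint_row(p, p_prime):
    """Coefficients of the nine components of E (row-major) in P . (E P') = 0."""
    return [p[i] * p_prime[j] for i in range(3) for j in range(3)]

def residual(e, p, p_prime):
    """Value of P . (E P'); zero for an exact correspondence."""
    flat = [e[i][j] for i in range(3) for j in range(3)]
    return sum(c * x for c, x in zip(constraint_row(p, p_prime), flat))
```

Stacking `constraint_row` for eight or more corresponding points gives the homogeneous system whose solution, up to scale, is E.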
[0107]
Therefore, here, the second processing unit 403 calculates the attitude R and the translation component b of the imaging apparatus 101 by calculating the 3 × 3 matrix E from images captured from two or more viewpoints with the eight-point algorithm, and calculates the three-dimensional shape from these by the principle of triangulation or the like.
[0108]
(Modification)
As a modification of the first processing unit 402, (1) whether the subject of the image has a planar shape or a three-dimensional shape may be redetermined; (2) if it is redetermined to be planar, the three-dimensional shape may be calculated by executing the image processing for planar shapes on the image; and (3) if it is redetermined to be three-dimensional, the three-dimensional shape may be calculated by executing the image processing for three-dimensional shapes on the image. The functional blocks that execute (1), (2), and (3) are referred to as the reshape determination unit, the first reprocessing unit, and the second reprocessing unit, respectively.
[0109]
With respect to the third embodiment, the reshape determination unit may (1) redetect corresponding points from the images captured by the imaging apparatus 101, (2) re-estimate the three-dimensional shape of the subject from the corresponding points, (3) re-extract the subject depth from the three-dimensional shape, and (4) redetermine the subject shape from the subject depth. The reason is that, for example, as shown in FIG. 16, corresponding points may have been detected on only one plane of the subject, so that the subject is erroneously determined to be planar even though it is actually three-dimensional.
[0110]
Therefore, it is desirable that corresponding point redetection be performed by a method that makes corresponding points easier to detect. One such method is to calculate the epipolar constraint from the attitude and the translation component, project the epipolar line onto the imaging surface, and redetect corresponding points on the epipolar line, which allows the cross-correlation threshold to be lowered. The epipolar constraint is the relational expression obtained by substituting the attitude R and the translation component b into the relational expression “P · RP′ × b = 0” between P, P′, and b described above. The epipolar line is the linear equation “aU + bV + c = 0” in the corresponding point (U, V), obtained by substituting the feature point (u, v) and the focal length z into this relational expression. Redetecting corresponding points on the epipolar line makes them easier to detect. More specifically, because the search is confined to the epipolar line, the cross-correlation threshold can be lowered and a greater number of corresponding points can be detected without erroneous correspondences.
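The epipolar line coefficients can be sketched as follows, assuming P = (u, v, z) and P′ = (U, V, z). Rearranging the scalar triple product gives P′ · (Rᵀ(b × P)) = 0, so the line coefficients follow from the vector Rᵀ(b × P). The function names are hypothetical, and the third return value is named `c` while the second is the line coefficient of V, not the translation component.

```python
import math

def cross(a, b):
    """Vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def epipolar_line(feature, r, b, z):
    """Return (a, b_coef, c) of the line a*U + b_coef*V + c = 0 in the second image
    for the feature point (u, v) of the first image, attitude change r,
    translation component b, and focal length z."""
    u, v = feature
    bxp = cross(b, (u, v, z))
    # l = R^T (b x P): multiply by the transpose of r
    l = [sum(r[k][i] * bxp[k] for k in range(3)) for i in range(3)]
    return l[0], l[1], l[2] * z

def distance_to_line(point, line):
    """Distance from the image point (U, V) to the line a*U + b_coef*V + c = 0."""
    a, b_coef, c = line
    return abs(a * point[0] + b_coef * point[1] + c) / math.hypot(a, b_coef)
```

For a pure sideways translation with no rotation, the line reduces to V = v, which matches the familiar horizontal epipolar lines of a rectified stereo pair.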
[0111]
If the shape is redetermined to be planar, the three-dimensional shape may be calculated immediately, or the redetermination of whether the shape is planar or three-dimensional may be repeated in a loop. In the specific example for the third embodiment described above, this means that corresponding point redetection is repeated in a loop. One way to end the loop is to calculate the sum of the distances between the corresponding points found by redetection and their epipolar lines, and to end the loop when the difference between the current sum L(n) and the previous sum L(n-1) is less than or equal to a threshold. When the loop ends, the three-dimensional shape is calculated.
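The loop termination rule can be sketched as follows. The callback `redetect`, which stands in for one round of corresponding point redetection and returns the detected points with their epipolar lines, is hypothetical, as is the safety limit `max_iterations`, which the original does not mention.

```python
import math

def distance_sum(points, lines):
    """Sum L of the distances between corresponding points (U, V)
    and their epipolar lines (a, b, c) with a*U + b*V + c = 0."""
    return sum(abs(a * U + b * V + c) / math.hypot(a, b)
               for (U, V), (a, b, c) in zip(points, lines))

def run_redetection_loop(redetect, threshold, max_iterations=50):
    """Repeat redetection until |L(n) - L(n-1)| <= threshold.
    Returns (n, L(n)) for the iteration at which the loop ended."""
    previous = None
    for n in range(1, max_iterations + 1):
        points, lines = redetect()
        current = distance_sum(points, lines)
        if previous is not None and abs(current - previous) <= threshold:
            return n, current
        previous = current
    return max_iterations, previous
```

The three-dimensional shape would then be calculated from the corresponding points of the final iteration.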
[0112]
(Modification)
The image processing apparatus 102 in FIG. 1 is an example of an embodiment of the "image processing apparatus" according to the present invention, and the image processing method executed by the image processing apparatus 102 in FIG. 1 is an example of an embodiment of the "image processing method" according to the present invention. An image processing program for causing a computer to execute this image processing method is an example of an embodiment of the "image processing program" according to the present invention, and a recording medium (CD-ROM, DVD-ROM, SmartMedia, etc.) on which an image processing program for causing a computer to execute this image processing method is recorded is an example of an embodiment of the "recording medium" according to the present invention.
[0113]
The imaging device 101 and the image processing apparatus 102 in FIG. 1 may be integrated into a single electronic device instead of being separate electronic devices. Examples of such an electronic device include mobile phones with cameras, notebook computers with cameras, PDAs with cameras, and digital cameras with image processing functions.
[0114]
[Effect of the invention]
As described above, the present invention makes it possible to accurately calculate the three-dimensional shape of a subject regardless of whether the subject shape is a planar shape or a three-dimensional shape.
[Brief description of the drawings]
FIG. 1 illustrates an imaging apparatus and an image processing apparatus.
FIG. 2 illustrates a hardware configuration of the imaging apparatus.
FIG. 3 illustrates a hardware configuration of the image processing apparatus.
FIG. 4 illustrates functional blocks of the image processing apparatus.
FIG. 5 illustrates functional blocks of the image processing apparatus according to the first embodiment.
FIG. 6 is a diagram for explaining subject depth.
FIG. 7 is a flowchart according to the image processing apparatus of the first embodiment.
FIG. 8 illustrates functional blocks of the image processing apparatus according to the second embodiment.
FIG. 9 is a flowchart according to the image processing apparatus of the second embodiment.
FIG. 10 illustrates a hardware configuration of an imaging apparatus according to a third embodiment.
FIG. 11 illustrates functional blocks of the image processing apparatus according to the third embodiment.
FIG. 12 represents a world coordinate system.
FIG. 13 illustrates an imaging apparatus coordinate system.
FIG. 14 is a diagram for explaining a translational component.
FIG. 15 is a flowchart according to the image processing apparatus of the third embodiment.
FIG. 16 is a diagram for explaining a modification of the first processing unit.
[Explanation of symbols]
101 Imaging device
102 Image processing apparatus
401 Shape determination unit
402 First processing unit
403 Second processing unit
404 Recording section
501 Depth extraction unit
502 Shape determination unit
801 Three-dimensional shape estimation unit
802 Depth extraction unit
803 Shape determination unit
1001 Detection device
1101 Corresponding point detector
1102 Attitude calculation unit
1103 Translation component calculation unit
1104 Three-dimensional shape estimation unit

Claims

An image processing apparatus that executes image processing on an image captured by an imaging device to calculate a three-dimensional shape, comprising:
transformation matrix calculation means for calculating a projective transformation matrix from position information of four or more corresponding points on images captured from two viewpoints;
depth extraction means for extracting, by means of the projective transformation matrix calculated by the projective transformation matrix calculation means, a value used for shape determination of the subject of the image; and
shape determination means for determining, from the value used for the shape determination, whether the subject of the image has a planar shape or a three-dimensional shape,
wherein the depth extraction means takes, as the value used for the shape determination, the difference between two projective transformation matrices calculated from mutually different combinations of corresponding points, and
wherein, when the subject has a planar shape, image processing for planar shapes is executed on the image to calculate the three-dimensional shape, and when the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed on the image to calculate the three-dimensional shape.

An image processing apparatus that executes image processing on an image captured by an imaging device to calculate a three-dimensional shape, comprising:
three-dimensional shape estimation means for roughly estimating the three-dimensional shape of the subject of the image;
depth extraction means for extracting point cloud data constituting the three-dimensional shape, and extracting, as a value used for shape determination of the subject of the image, the minimum value of the sum of squares of the distances between a plurality of the point cloud data and a plane equation; and
shape determination means for determining, from the value used for the shape determination, whether the subject of the image has a planar shape or a three-dimensional shape,
wherein, when the subject has a planar shape, image processing for planar shapes is executed to calculate the three-dimensional shape, and when the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed to calculate the three-dimensional shape.

The image processing apparatus according to claim 2, wherein the imaging device comprises a detection device that detects a physical quantity for calculating the posture of the imaging device, and the three-dimensional shape estimation means roughly estimates the three-dimensional shape of the subject of the image based on the posture of the imaging device calculated from the physical quantity.

An image processing method that executes image processing on an image captured by an imaging device to calculate a three-dimensional shape, comprising:
a transformation matrix calculation step of calculating a projective transformation matrix from position information of four or more corresponding points on images captured from two viewpoints;
a depth extraction step of extracting, by means of the projective transformation matrix calculated by the projective transformation matrix calculation means, a value used for shape determination of the subject of the image; and
a shape determination step of determining, from the value used for the shape determination, whether the subject of the image has a planar shape or a three-dimensional shape,
wherein the depth extraction step takes, as the value used for the shape determination, the difference between two projective transformation matrices calculated from mutually different combinations of corresponding points, and
wherein, when the subject has a planar shape, image processing for planar shapes is executed on the image to calculate the three-dimensional shape, and when the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed on the image to calculate the three-dimensional shape.

An image processing method that executes image processing on an image captured by an imaging device to calculate a three-dimensional shape, comprising:
a three-dimensional shape estimation step of roughly estimating the three-dimensional shape of the subject of the image;
a depth extraction step of extracting point cloud data constituting the three-dimensional shape, and extracting, as a value used for shape determination of the subject of the image, the minimum value of the sum of squares of the distances between a plurality of the point cloud data and a plane equation; and
a shape determination step of determining, from the value used for the shape determination, whether the subject of the image has a planar shape or a three-dimensional shape,
wherein, when the subject has a planar shape, image processing for planar shapes is executed to calculate the three-dimensional shape, and when the subject has a three-dimensional shape, image processing for three-dimensional shapes is executed to calculate the three-dimensional shape.

The image processing method according to claim 5, wherein the imaging device comprises a detection device that detects a physical quantity for calculating the posture of the imaging device, and the three-dimensional shape estimation step roughly estimates the three-dimensional shape of the subject of the image based on the posture of the imaging device calculated from the physical quantity.

An image processing program for causing a computer to execute the image processing method according to any one of claims 4 to 6.

A recording medium on which is recorded an image processing program for causing a computer to execute the image processing method according to any one of claims 4 to 6.

An electronic apparatus comprising the imaging device and the image processing apparatus according to any one of claims 1 to 3.