JP2013145514A

JP2013145514A - Video processing device and program therefor

Info

Publication number: JP2013145514A
Application number: JP2012006258A
Authority: JP
Inventors: Tatsuhiro Tajima; 達裕田嶋
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2012-01-16
Filing date: 2012-01-16
Publication date: 2013-07-25

Abstract

PROBLEM TO BE SOLVED: To reduce the loss of information that viewers with various visual characteristics receive from a video, as compared with the information that general viewers receive from it.
SOLUTION: A video processing device comprises: a structural analysis processing unit 10 that performs parallel structural analysis on an input video on the basis of a plurality of different visual characteristic models and calculates, as the result of the analysis, saliency maps s_A, s_B1, s_B2, ... corresponding to those models; a comparison calculation unit 20 that compares the saliency maps s_A, s_B1, s_B2, ... and outputs the comparison result as an information loss map L; an integration calculation unit 21 that integrates the information loss map to calculate one or more information loss indices v_L; and a comparison unit 22 that compares the information loss index v_L with a reference information loss index v_L^# and outputs the comparison result as a diagnosis result.

Description

The present invention relates to a video processing device and a program therefor, and in particular to a video processing device and program that support universal design across viewers with different visual characteristics.

When creating visual information such as videos and graphic designs, care is needed so that equivalent information reaches not only general viewers but also viewers with particular visual characteristics (for example, people with color vision deficiency, cataracts, or amblyopia). In particular, for important information that bears on viewers' daily lives and safety, such as living-related information and disaster information, close attention must be paid to ensure that the intended information is reliably conveyed.

This requires a means of predicting, with the visual characteristics of various viewers taken into account, what video each of them will perceive, and of creating or correcting the video so that the informational value received does not differ greatly between viewers.

In color vision deficiency, the distribution of cone cells in the retina differs from that of general viewers. For example, in type 1, type 2, and type 3 dichromacy, the L cones, M cones, and S cones, respectively, are absent or few in number, so that it becomes difficult to distinguish red from green or blue from yellow, or to recognize certain colors against a dark background.

A cataract is a condition in which degeneration of the proteins that make up the crystalline lens produces a yellowish-white cloudiness inside the lens. Besides scattering light within the lens, it attenuates the light reaching the retina, mainly in the short-wavelength components, so that perceptually it becomes hard to tell black from blue or white from yellow.

Amblyopia takes several forms, including defocus on the retina caused by strabismus or refractive error, and reduced contrast sensitivity resulting from abnormal development of photoreceptor cells due to genetic or environmental factors.

Regarding the visual characteristics described above, medical and psychological research to date has examined which specific color combinations and visual patterns cause recognition problems, and a large body of data has been accumulated. Several simulation tools based on these data have also been devised that let a user approximately experience the perception of viewers with various visual characteristics. Video creators are expected to use this knowledge and the output of such tools to predict the video that diverse viewers will perceive, and to choose color combinations and visual patterns that convey the information properly.

Several systems have been proposed to support the universal design of video. For color vision, for example, there is a technique that uses a psychophysically measured model to simulate the color space perceived by dichromats (see, for example, Non-Patent Document 1).

In addition, to automatically create designs that are easy for dichromats to see, methods have been proposed that apply a color conversion which widens the distance between the color coordinates of adjacent regions in the simulated dichromat color space (see, for example, Patent Documents 1 and 2).

Vienot F, Brettel H, Mollon JD. Digital video colourmaps for checking the legibility of displays by dichromats. Color Research & Application. 1999; 24(4): 243-252.

Japanese Patent No. 4724887
JP 2008-129162 A

The technique described in Non-Patent Document 1 can simulate the color space perceived by dichromats using a psychophysical model. In practice, however, when it is applied to universal design in video production and the like, interpreting the simulation results depends largely on subjective judgment. That is, it is very difficult to look at the simulated image and decide objectively where and how the original image should be corrected; in the end, advanced expertise in the algorithms used in the simulation and in the parameters of the visual characteristics is required. It has therefore been difficult for video creators without such expertise to create video that takes universal design into account.

To address the above problem, the invention described in Patent Document 1 proposes a method of automatically creating designs that are easy for dichromats to see. It applies a color conversion that widens the distance between color coordinates in the simulated dichromat color space so that, for example, the background and the characters in a design become easier to distinguish. However, a method that uses the distance between color coordinates as its evaluation criterion has the problem that a valid distance criterion is difficult to set. This is because the structure of the dichromat's color perception space is still debated, and it has not been demonstrated that projection onto a "degenerate plane" in the color space, as proposed in that method, preserves perceptual color distances between general viewers and dichromats.

Moreover, the method of Patent Document 1 cannot take into account information expressed in dimensions other than color, such as texture. It also assumes structured visual information such as HTML documents as its main target; for more general visual information such as images and graphic designs, it goes no further than stating that "the visual features and attributes of the divided regions can be estimated by using, for example, a Bayesian network based on Bayesian estimation", and specifies no concrete method for figure-ground discrimination. In realistic settings, separating the background from objects in a multi-tone image such as a photograph is itself a difficult task.

Patent Document 2 proposes a method that measures the viewer's visual model as needed, with color-deficient viewers also taken into account, and displays an image optimized for the lighting environment and other conditions. However, it does not aim at universal design that reduces the difference in appearance between color-deficient viewers and general viewers; rather, it is a system that suppresses the difference in how an image looks before and after conversion for a single viewer. Universal design, by contrast, requires suppressing differences in appearance between multiple viewers in the converted visual information, and therefore needs a system based on a different principle.

Furthermore, none of the conventional methods described above contemplates producing video that is easy for all viewers to see when two or more types of color vision deficiency are involved. More fundamentally, these conventional methods are specialized for dichromatic color vision deficiency and cannot be applied to visual characteristics other than color vision deficiency, such as cataracts and amblyopia.

For these reasons, the conventional techniques described above have difficulty achieving universal design of unstructured visual information, including general images and graphic designs, for viewers with a wide range of visual characteristics, including viewers who belong to visual-characteristic minorities such as those with color vision deficiency, cataracts, or amblyopia.

The present invention has been made in view of these circumstances, and aims to provide a video processing device and a program therefor that can reduce, for viewers with various particular visual characteristics, the loss of information relative to the information that general viewers receive from a video.

[1] To solve the above problem, one aspect of the present invention is a video processing device comprising: a structural analysis processing unit that performs parallel structural analysis on an input video based on a plurality of different visual characteristic models and calculates, as the result of the structural analysis, saliency maps corresponding to the plurality of different visual characteristic models; a comparison calculation unit that compares the saliency maps calculated by the structural analysis processing unit for each of the plurality of different visual characteristic models and outputs the comparison result as an information loss map; an integration calculation unit that integrates the information loss map from the comparison calculation unit to calculate one or more information loss indices; and a comparison unit that compares the information loss index calculated by the integration calculation unit with a reference information loss index and outputs the comparison result as a diagnosis result.

[2] In another aspect of the present invention, the above device further comprises a correction unit that corrects the input video based on the diagnosis result from the comparison unit.

[3] In another aspect of the present invention, in the above device, the structural analysis processing unit calculates the saliency map by simulating the distribution of salient regions in the input video, based on visual characteristic data of the target viewer, using a lens spectral transmission characteristic model, a photoreceptor distribution model, an opponent-color response model, and a saliency calculation model.

[4] In another aspect of the present invention, the correction unit corrects the input video so as to minimize the value of the information loss index.

[5] Another aspect of the present invention is a program that causes a computer to execute: a structural analysis processing step of performing parallel structural analysis on an input video based on a plurality of different visual characteristic models and calculating, as the result of the structural analysis, saliency maps corresponding to the plurality of different visual characteristic models; a comparison calculation step of comparing the saliency maps for each of the plurality of different visual characteristic models and outputting the comparison result as an information loss map; an integration calculation step of integrating the information loss map to calculate one or more information loss indices; and a comparison step of comparing the calculated information loss index with a reference information loss index and outputting the comparison result as a diagnosis result.

According to the present invention, the loss of information for viewers who belong to various visual-characteristic minorities, relative to the information that general viewers receive from a video, can be reduced.

A block diagram showing the configuration of the video processing device according to the first embodiment of the present invention.
A block diagram showing the configuration of the viewer model X (X = A, B1, B2, ...) in the structural analysis processing unit according to the first embodiment, and a conceptual diagram showing the flow of processing in the saliency model.
A block diagram showing the configuration of the video processing device according to the second embodiment of the present invention.
A conceptual diagram explaining a calculation example of the information loss map generated in the first and second embodiments of the present invention.
A conceptual diagram explaining a calculation example of the information loss index generated in the first and second embodiments of the present invention.
A conceptual diagram explaining a first example to which the video processing device according to the first embodiment of the present invention is applied.

In the embodiments of the present invention, image processing techniques are used to simulate how a program video appears to color-deficient viewers, and an algorithm and software are provided that evaluate the degree to which visibility is reduced relative to the video seen by general viewers. In particular, by focusing on information loss at the locations that general viewers attend to in the video, evaluation and correction can be performed in line with the purpose of conveying information to viewers.

The following notation is used hereafter. "Viewer A" denotes a viewer with typical visual characteristics. "Viewer B" denotes a viewer who belongs to a visual-characteristic minority (including people with color vision deficiency, cataracts, amblyopia, and so on).

When several different types of minority viewers are targeted (for example, viewers with color vision deficiency and viewers with cataracts), they are distinguished as "viewer B1", "viewer B2", and so on. When processing common to all viewers is described below, the viewer is written abstractly as "viewer X" (X = A, B1, B2, ...).

The video processing device according to the embodiments of the present invention is configured to perform image evaluation and image correction based on the following principle. Achieving universal design of visual information imposes two requirements.

First, differences in appearance between viewers with different visual characteristics must be evaluated on a unified scale. A simple measure such as the distance between color coordinates is not suitable for this, for the reasons given above.

Second, for visual information that is generally given as unstructured data, such as photographs, structural analysis processing such as region segmentation must be built into the system, and the result of the structural analysis must not contradict the result of the visibility evaluation. If, as in previously proposed methods, structural analysis and visibility evaluation are handled in separate steps, meeting these requirements means solving a complex problem with many constraints.

The embodiments of the present invention therefore exploit the fact that the result of structural analysis of visual information can itself be used for visibility evaluation, and perform structural analysis and visibility evaluation as one integrated process. That is, at the structural analysis stage, a plurality of different visual characteristic models are used together to simulate perceptual processing in the various viewer models. By making the structural analysis extract the highly salient parts of the visual information, its result can be used directly as the visibility evaluation.

The video processing device according to the embodiments of the present invention performs its computation with an algorithm that has the following input and outputs. The input is a video (a still image or a moving image). The output is (i) a map showing in which parts of the video visibility is substantially lower for viewer B than for viewer A (information loss map L), and/or (ii) an evaluation value indicating the reduction in visibility of the video as a whole (information loss index v_L).

To calculate the information loss map L and the information loss index v_L, the embodiments of the present invention use processing that extends the concept of the "saliency map" proposed in the field of visual processing.

A saliency map is an effective means of automatically extracting the regions of an image to which a viewer's attention is likely to be directed, and has been shown to correlate well with the distribution of gaze points (see Itti L, Koch C. "A model of saliency-based visual attention for rapid scene analysis." IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998; 20(11): 1254-1259). However, conventional saliency map calculations assume the visual characteristics of a typical viewer and were not designed to reflect the photoreceptor distribution characteristics, lens spectral transmission characteristics, and so on of viewers with color vision deficiency, cataracts, and the like.

The embodiments of the present invention therefore extend the conventional saliency map calculation into a physiologically more faithful model. Specifically, lens spectral characteristics, photoreceptor distribution characteristics, spatiotemporal frequency decomposition characteristics, and so on are set from data, so that the processing also yields valid results for the perception of viewers who belong to visual-characteristic minorities.

Embodiments of the present invention are described in detail below with reference to the drawings.

[First Embodiment]
First, the first embodiment of the present invention (video evaluation system) is described.
FIG. 1 is a block diagram showing the configuration of the video processing device according to the first embodiment of the present invention. In the figure, the video processing device 1 includes a structural analysis processing unit 10, a comparison calculation unit 20, an integration calculation unit 21, and a comparison unit 22.

The structural analysis processing unit 10 performs parallel structural analysis on the input video I in viewer model units 11, 12, 13, ... corresponding to a plurality of different visual characteristic models A, B1, B2, ..., and outputs, as the result of the structural analysis, saliency maps s_A, s_B1, s_B2, ... corresponding to the visual characteristic models A, B1, B2, .... Each of the saliency maps s_A, s_B1, s_B2, ... is a video showing the parts of the image that the corresponding viewer is likely to attend to. As described above, visual characteristic model A represents a trichromat (a viewer with normal color vision), and visual characteristic models B1 and B2 represent, for example, dichromats (color-deficient viewers).

The comparison calculation unit 20 compares the saliency maps calculated by the structural analysis processing unit 10 for each of the plurality of different visual characteristic models and outputs the comparison result as an information loss map. More specifically, the comparison calculation unit 20 compares the saliency maps s_A, s_B1, s_B2, ... obtained from the structural analysis and, based on the comparison result, outputs the information loss map L as a video. In doing so, it combines the viewer models B1, B2, ... by a weighted sum with weights W_k (Σ_k W_k s_Bk). The integration calculation unit 21 integrates the information loss map L to calculate one or more information loss indices v_L. The comparison unit 22 compares the information loss index v_L with a reference information loss index v_L^# and outputs the comparison result as a diagnosis result.
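The comparison, integration, and diagnosis steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says that L captures the reduction in saliency of the weighted B maps relative to s_A, so the clipped difference used for L, the spatial mean used for v_L, and the function names and diagnosis strings are all assumptions made for this example.

```python
import numpy as np


def information_loss_map(s_a, s_b_list, weights):
    """Per-pixel saliency lost by the B viewers relative to viewer A."""
    # Weighted sum over the minority viewer models: sum_k W_k * s_Bk
    s_b = sum(w * s for w, s in zip(weights, s_b_list))
    # Assumption: only reductions count as loss, so negative values are clipped.
    return np.maximum(s_a - s_b, 0.0)


def information_loss_index(loss_map):
    """Integrate the loss map into one scalar (assumption: spatial mean)."""
    return float(loss_map.mean())


def diagnose(v_l, v_ref):
    """Compare v_L with the reference index v_L^# (labels are hypothetical)."""
    return "ok" if v_l <= v_ref else "needs correction"


# Toy 2x2 saliency maps for viewer A and two minority viewers B1, B2.
s_a = np.array([[1.0, 0.2], [0.0, 0.6]])
s_b1 = np.array([[0.2, 0.2], [0.0, 0.6]])
s_b2 = np.array([[0.6, 0.4], [0.0, 0.2]])

L = information_loss_map(s_a, [s_b1, s_b2], [0.5, 0.5])
v_L = information_loss_index(L)
result = diagnose(v_L, v_ref=0.1)
```

With these toy values the loss concentrates in the top-left pixel, where viewer A finds the image salient but both B viewers do not.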

FIG. 2 is a block diagram showing the configuration of the viewer model X (X = A, B1, B2, ...) in the structural analysis processing unit 10 according to the first embodiment, together with a schematic diagram showing the flow of processing in the saliency model unit 103. In the figure, the structural analysis processing unit 10 is configured to simulate, based on visual characteristic data of the target viewer, the distribution of salient regions in the input video I through a lens spectral transmission characteristic model Λ_X, a photoreceptor distribution model Φ_X, an opponent-color response model Ω_X, and a saliency calculation model S_X.

More specifically, the structural analysis processing unit 10 calculates the saliency map s_X by simulating the distribution of salient regions in the input video I using a lens model unit (Λ_X) 100 corresponding to the lens spectral transmission characteristic model Λ_X, a photoreceptor model unit (Φ_X) 101 corresponding to the photoreceptor distribution model Φ_X, a low-order response model unit (Ω_X) 102 corresponding to the opponent-color response model Ω_X, and a saliency model unit (S_X) 103 corresponding to the saliency calculation model S_X.

The lens model unit (Λ_X) 100 simulates the crystalline lens for the input video I and outputs the retinal input. The photoreceptor model unit (Φ_X) 101 simulates the photoreceptors for the retinal input and outputs the photoreceptor output. The low-order response model unit (Ω_X) 102 simulates the low-order response for the photoreceptor output and outputs the low-order response. The saliency model unit (S_X) 103 simulates saliency for the low-order response and outputs the saliency map s_X. The processing in the saliency model unit (S_X) 103, shown in the lower part of FIG. 2, is described later.
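The four-unit chain of a viewer model can be pictured as a simple function composition. The sketch below only illustrates the data flow (lens, photoreceptor, low-order response, saliency); the `ViewerModel` class, its field names, and the toy stage functions are hypothetical and stand in for the actual models Λ_X, Φ_X, Ω_X, and S_X.

```python
from dataclasses import dataclass
from typing import Callable

import numpy as np

Stage = Callable[[np.ndarray], np.ndarray]


@dataclass
class ViewerModel:
    """One viewer model X as a chain of four stages (names are hypothetical)."""

    lens: Stage           # stands in for the lens model unit (Lambda_X) 100
    photoreceptor: Stage  # stands in for the photoreceptor model unit (Phi_X) 101
    low_order: Stage      # stands in for the low-order response model unit (Omega_X) 102
    saliency: Stage       # stands in for the saliency model unit (S_X) 103

    def saliency_map(self, video: np.ndarray) -> np.ndarray:
        retinal_input = self.lens(video)
        receptor_output = self.photoreceptor(retinal_input)
        c_x = self.low_order(receptor_output)  # low-order response c_X
        return self.saliency(c_x)              # saliency map s_X


# Toy stages, chosen only so that the data flow is easy to follow.
demo = ViewerModel(
    lens=lambda v: 0.5 * v,            # uniform attenuation
    photoreceptor=lambda v: v,         # identity
    low_order=lambda v: v - v.mean(),  # deviation from the mean
    saliency=np.abs,                   # magnitude of the deviation
)
s_x = demo.saliency_map(np.array([0.0, 2.0, 4.0]))
```

Running several such models with different parameters on the same input corresponds to the parallel analysis in viewer model units 11, 12, 13, ....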

Next, the operation of the first embodiment is described. The video processing device 1 performs the following three stages of processing. Apart from parameter settings, each stage is common to all viewers X (X = A, B1, B2, ...).

In the first stage, the structural analysis processing unit 10 calculates the simulated low-order response c_X of viewer X from the input video I, using the lens model unit (Λ_X) 100, the photoreceptor model unit (Φ_X) 101, and the low-order response model unit (Ω_X) 102 of viewer X.
In the second stage, the structural analysis processing unit 10 calculates, from the low-order response c_X, the video (saliency map) s_X showing the parts of the image that viewer X is likely to attend to.
In the third stage, the comparison calculation unit 20 compares the saliency maps s_A, s_B1, s_B2, ... of the viewers and outputs the reduction in saliency of the saliency maps s_Bk (k = 1, 2, ...) relative to the saliency map s_A as the information loss map L, and the integration calculation unit 21 integrates the information loss map L and outputs the information loss index v_L for the video as a whole.
Each of these three stages is described in detail below with reference to FIGS. 1 and 2.

(First stage)
The structural analysis processing unit 10 calculates the simulated low-order response c_X of viewer X from the input video I, using the lens model unit (Λ_X) 100, the photoreceptor model unit (Φ_X) 101, and the low-order response model unit (Ω_X) 102 of viewer X. Letting I(λ, r, t) be the intensity of the input video at frame t, pixel r, and wavelength λ, the low-order response c_X(r, t) at frame t and pixel r is calculated as in equation (1) below.

In equation (1), Λ_X(λ) represents the spectral transmission intensity of the lens at wavelength λ. Φ_X(λ) = (Φ_X^L(λ), Φ_X^M(λ), Φ_X^S(λ)) is a vector whose elements are the products of the spectral absorption sensitivity and the cell body distribution intensity of each photoreceptor (L cone, M cone, S cone). The integral on the right-hand side of equation (1) can therefore be expressed, as in equation (2) below, as a vector (L(r, t), M(r, t), S(r, t)) representing the photoreceptor response at frame t and pixel r.
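The formula images for equations (1) and (2) are not reproduced in this text. Read from the definitions above, one consistent reconstruction is the following; the exact form and typesetting used in the original publication are an assumption.

```latex
\begin{align}
c_X(r,t) &= \Omega_X\!\left( \int \Lambda_X(\lambda)\,\Phi_X(\lambda)\,I(\lambda,r,t)\,d\lambda \right) \tag{1}\\
\bigl(L(r,t),\,M(r,t),\,S(r,t)\bigr) &= \int \Lambda_X(\lambda)\,\Phi_X(\lambda)\,I(\lambda,r,t)\,d\lambda \tag{2}
\end{align}
```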

When the input video is produced by a finite combination of light sources with known spectral characteristics, such as a three-primary-color display, this integral can also be simplified to a matrix operation, as in equation (3) below.

The parameters used for Λ_X and Φ_X are determined according to the visual characteristics of the viewer. In the case of cataract, for example, the spectral transmittance of the lens decreases on the short-wavelength side; this can be expressed by reducing the value of Φ_X(λ) for small λ in the integral calculation, or by reducing the value of Λ_X^B in the matrix calculation. In the case of strong P-type (protan) color deficiency, there are no L cones; this can be expressed by setting Φ_X^L(λ) in the integral calculation, or Φ_X^{L,R}, Φ_X^{L,G}, and Φ_X^{L,B} in the matrix calculation, to 0.
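The matrix form and the two parameter adjustments described above can be sketched as follows. This is a minimal illustration, assuming the matrix form is LMS = Φ_X · diag(Λ_X) · RGB; the function name `photoreceptor_response` and every numeric value are hypothetical placeholders, not measured cone or lens data.

```python
# Sketch of the matrix form: LMS = Phi * diag(Lambda) * RGB (assumed form).
# All numeric values below are illustrative placeholders, not measured data.

def photoreceptor_response(rgb, phi, lam):
    """Return the (L, M, S) cone responses for one pixel.

    rgb: (R, G, B) primary intensities of the display
    phi: 3x3 matrix, rows = cones L, M, S; columns = primaries R, G, B
    lam: (Lambda_R, Lambda_G, Lambda_B) lens transmittance per primary
    """
    filtered = [t * v for t, v in zip(lam, rgb)]  # light after the lens
    return tuple(sum(p * f for p, f in zip(row, filtered)) for row in phi)

PHI_NORMAL = [
    [0.60, 0.35, 0.05],  # L cone
    [0.30, 0.60, 0.10],  # M cone
    [0.02, 0.10, 0.88],  # S cone
]
LAMBDA_NORMAL = (1.0, 1.0, 1.0)

# Cataract: lower transmittance for the short-wavelength (blue) primary.
LAMBDA_CATARACT = (1.0, 0.9, 0.5)

# Strong protan deficiency: no L cones, so the L row is all zeros.
PHI_PROTAN = [[0.0, 0.0, 0.0]] + PHI_NORMAL[1:]

blue_pixel = (0.0, 0.0, 1.0)
normal = photoreceptor_response(blue_pixel, PHI_NORMAL, LAMBDA_NORMAL)
cataract = photoreceptor_response(blue_pixel, PHI_NORMAL, LAMBDA_CATARACT)
protan = photoreceptor_response((1.0, 0.0, 0.0), PHI_PROTAN, LAMBDA_NORMAL)
```

With these placeholder values, the cataract model yields a weaker S-cone response to the blue pixel than the normal model, and the protan model yields no L-cone response to a red pixel.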

Ω_X is a function for obtaining the color representation c_X in the low-order nervous system from the photoreceptor response. A representative color representation is the opponent-color representation with three variables: luminance, red-green, and blue-yellow. Several concrete conversion formulas have been proposed, and an appropriate one can be selected. For example, the DKL color space is known as a model that describes well the cell responses of the lateral geniculate nucleus, and the values of luminance l, red-green rg, and blue-yellow by shown in FIG. 2 are calculated by a matrix operation as in equation (4) below.

In equation (4), L_0, M_0, and S_0 represent the responses of the L, M, and S cones to the background color.
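Equation (4) is not reproduced in this text, so the sketch below only illustrates the general shape of an opponent-color transform relative to a background (L_0, M_0, S_0). The cone-contrast combinations used here are an assumption chosen for readability; they are not the coefficients of the DKL matrix or of the patent's equation (4), and the function name `opponent_response` is hypothetical.

```python
def opponent_response(lms, lms_bg):
    """Map cone responses to (luminance, red-green, blue-yellow) contrasts.

    lms:    (L, M, S) responses at a pixel
    lms_bg: (L0, M0, S0) responses to the background color
    The combinations below are illustrative, not the DKL coefficients.
    """
    l0, m0, s0 = lms_bg
    dl, dm, ds = (v - b for v, b in zip(lms, lms_bg))
    lum = (dl + dm) / (l0 + m0)            # luminance: L + M increment
    rg = dl / l0 - dm / m0                 # red-green: L versus M contrast
    by = ds / s0 - (dl + dm) / (l0 + m0)   # blue-yellow: S versus (L + M)
    return lum, rg, by

background = (0.5, 0.4, 0.2)
same_as_background = opponent_response(background, background)
more_l = opponent_response((0.6, 0.4, 0.2), background)
```

A pixel identical to the background gives zero on all three channels, while raising only the L-cone response moves the red-green channel positive and the blue-yellow channel negative.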

(Second stage)
In the saliency model unit (S_X) 103, the structural analysis processing unit 10 calculates, from the low-order response c_X, a video (saliency map) s_X indicating the parts of the image that viewer X is likely to attend to (see FIG. 2). First, the video (map) l, which contains only the luminance modality, is passed through band-pass filters centered at various spatiotemporal frequencies to generate a group of frequency maps o. The resulting maps for each modality, namely the luminance map l, the frequency maps o, the red-green map rg, and the blue-yellow map by, are processed in parallel by the feature model unit (F_X) 200 and the normalization model unit (N_X) 201, and are then integrated into a single video, the saliency map s_X, by the integration model unit (U_X) 202. In the formulas below, the symbol m denotes a map when describing processing common to all modalities.

The feature model unit (F_X) 200 calculates a map (feature map) that characterizes each point of a map by contrasting it with its spatiotemporal context. That is, it obtains as a feature value the degree to which the distribution of values in the neighborhood of a point differs from the distribution of values in a wider surrounding region. The simplest concrete model computes the difference between the mean of the neighborhood and the mean of the surround. As shown in equation (5) below, this can be calculated by convolving the map with the difference of two Gaussian filters of different widths (DoG, Difference of Gaussians).

By varying the combination of Gaussian filter widths used for the DoG, multiple maps are generated for each modality.
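The DoG feature computation described above can be sketched in plain Python as the difference of two Gaussian-blurred copies of a map. This is a spatial-only sketch under assumptions: the temporal dimension is ignored, edges are handled by clamping, and the function names and the widths 1.0 and 3.0 are hypothetical choices for illustration.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalized to sum 1."""
    radius = max(1, int(3 * sigma))
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(k * row[min(max(x + j - radius, 0), n - 1)]
                for j, k in enumerate(kernel))
            for x in range(n)
        ])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma):
    """Separable 2-D blur: rows first, then columns."""
    k = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(img, k)), k))

def dog_feature_map(img, sigma_center, sigma_surround):
    """Neighborhood mean minus surround mean (difference of Gaussians)."""
    center = gaussian_blur(img, sigma_center)
    surround = gaussian_blur(img, sigma_surround)
    return [[c - s for c, s in zip(rc, rs)] for rc, rs in zip(center, surround)]

size = 9
spot = [[0.0] * size for _ in range(size)]
spot[4][4] = 1.0  # a single bright point on a dark map
feature = dog_feature_map(spot, sigma_center=1.0, sigma_surround=3.0)
flat = dog_feature_map([[0.5] * size for _ in range(size)], 1.0, 3.0)
```

The bright point receives a positive feature value because it differs from its surround, while a uniform map produces values that are zero up to rounding. Calling `dog_feature_map` with several width pairs would give the multiple maps per modality mentioned above.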

The normalization model unit (N_X) 201 normalizes the value at each point based on the distribution of values over the entire feature map, as shown in equation (6) below.

Here, z(m') is a normalization constant; for example, the mean or the maximum value of the entire feature map can be used.

The integration model unit (U_X) 202 sums the multiple maps generated by the feature model unit (F_X) 200 and the normalization model unit (N_X) 201 with appropriate weights, as shown in equation (7) below.

In equation (7), the notation {} denotes a set of maps, and {m_i^*} = {{l^*}, {o^*}, {rg^*}, {by^*}}. For example, in the case of low vision due to defocus, the influence of the visual characteristic can be expressed by reducing the weights U_{X,i} of the maps that were processed in the feature model unit (F_X) 200 by filters centered in the high-spatial-frequency region.
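The normalization and weighted integration steps can be sketched as below. This is a minimal illustration under assumptions: normalization divides by z(m'), taken here as the maximum or mean of absolute values (the use of absolute values is an assumption to cope with negative feature values), and the saliency map is the weighted sum of normalized maps. The map contents and weights are hypothetical.

```python
def normalize(feature_map, mode="max"):
    """Divide every value by z(m'): the max or mean of absolute values."""
    values = [abs(v) for row in feature_map for v in row]
    z = max(values) if mode == "max" else sum(values) / len(values)
    if z == 0:
        return [[0.0 for _ in row] for row in feature_map]
    return [[v / z for v in row] for row in feature_map]

def integrate(maps, weights):
    """Weighted sum of same-sized maps into one saliency map."""
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [sum(w * m[y][x] for w, m in zip(weights, maps)) for x in range(width)]
        for y in range(height)
    ]

# Two hypothetical feature maps: one from a low-frequency (coarse) filter,
# one from a high-frequency (fine) filter.
coarse = normalize([[0.0, 2.0], [4.0, 0.0]])
fine = normalize([[0.0, 0.0], [0.0, 8.0]])

s_typical = integrate([coarse, fine], weights=[0.5, 0.5])
# Defocus-type low vision: reduce the weight of the high-frequency map.
s_low_vision = integrate([coarse, fine], weights=[0.9, 0.1])
```

The point that is salient only in the fine map (bottom right) drops from 0.5 to 0.1 in the low-vision saliency map, which is the kind of difference the third stage then measures.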

(Third stage)
The comparison operation unit 20 compares the saliency maps s_A, s_B1, s_B2, ... of the viewers and, as shown in equation (8) below, outputs the reduction in saliency of s_Bk (k = 1, 2, ...) relative to the saliency map s_A as a single video (information loss map) L.

Here, f is an arbitrary function. For example, to consider simply the difference between saliency map values, a linear function can be used as f. When differences in the range of small saliency values are of interest, a nonlinear function such as a logarithm can be used as f.
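Equation (8) is not reproduced in this text. The sketch below assumes the form L = Σ_k W_k (f(s_A) − f(s_Bk)), evaluated per pixel, where the weights W_k are those the description later attributes to the comparison operation unit 20. The function names, the offset `eps`, and the sample values are hypothetical.

```python
import math

def information_loss_map(s_a, s_b_maps, weights, f=lambda v: v):
    """Weighted saliency reduction of each s_Bk relative to s_A, per pixel."""
    height, width = len(s_a), len(s_a[0])
    loss = [[0.0] * width for _ in range(height)]
    for s_bk, w_k in zip(s_b_maps, weights):
        for y in range(height):
            for x in range(width):
                loss[y][x] += w_k * (f(s_a[y][x]) - f(s_bk[y][x]))
    return loss

def log_response(v, eps=1e-3):
    """Nonlinear f that emphasizes differences among small saliency values."""
    return math.log(v + eps)

s_a = [[0.8, 0.1], [0.02, 0.0]]    # reference viewer A
s_b1 = [[0.4, 0.1], [0.01, 0.0]]   # viewer B1

linear_loss = information_loss_map(s_a, [s_b1], [1.0])
log_loss = information_loss_map(s_a, [s_b1], [1.0], f=log_response)
```

With a linear f, the halving of saliency from 0.02 to 0.01 contributes only 0.01 to the loss map; with the logarithmic f it contributes roughly as much as the halving from 0.8 to 0.4.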

Next, the integration calculation unit 21 calculates an evaluation value for the entire video by integrating the values of the information loss map L over the whole video, and outputs it as the information loss index v_L. Several integration methods are possible, such as using the mean or the maximum of the information loss map, and multiple indices may be used together.

The comparison unit 22 compares the information loss index v_L with the reference information loss index value v_L^#, and if v_L > v_L^#, issues a warning as the diagnosis result that the information loss is outside the acceptable range.
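The integration and comparison steps can be sketched as follows, assuming a single-frame loss map stored as nested lists. The function names, the sample loss map, and the reference value 0.2 are hypothetical.

```python
def information_loss_index(loss_map, method="mean"):
    """Collapse an information loss map into one index v_L."""
    values = [v for row in loss_map for v in row]
    if method == "mean":
        return sum(values) / len(values)
    if method == "max":
        return max(values)
    raise ValueError(f"unknown method: {method}")

def diagnose(v_l, v_ref):
    """Warn when v_L exceeds the reference value v_L^#."""
    if v_l > v_ref:
        return "warning: information loss outside acceptable range"
    return "ok"

loss_map = [[0.4, 0.0], [0.1, 0.1]]
v_mean = information_loss_index(loss_map, "mean")
v_max = information_loss_index(loss_map, "max")
result_mean = diagnose(v_mean, v_ref=0.2)
result_max = diagnose(v_max, v_ref=0.2)
```

Here the mean-based index passes while the max-based index triggers the warning, which is one reason the description allows several indices to be used together: a small but severe local loss is invisible to the mean.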

[Second Embodiment]
Next, a second embodiment of the present invention, a video correction system, is described.
FIG. 3 is a block diagram showing the configuration of the video processing apparatus 1 according to the second embodiment. Parts corresponding to those in FIG. 1, described in the first embodiment, are given the same reference signs and their description is omitted. In FIG. 3, the correction unit 23 corrects the input video I and outputs the corrected video, based on the information loss map L, which is the output of the integration calculation unit 21, or on the diagnosis result, which is the output of the comparison unit 22 and is the result of comparing the information loss index v_L with the reference information loss index v_L^#. More specifically, the correction unit 23 corrects the input video I so as to minimize the value of the information loss index v_L, in other words, so that the diagnosis result falls within the acceptable range. The configuration of the structural analysis processing unit 10 is the same as that shown in FIG. 2, and its description is omitted.

The process of calculating the information loss index v_L from the input video I can be expressed as a single function, v_L = F(I). In the second embodiment, the correction process corresponds to solving an optimization problem in which the input video I is modified with the goal of minimizing the information loss index v_L. This reduces to a general nonlinear optimization problem, to which any optimization method can in principle be applied, so the most suitable method can be chosen according to the purpose.

For example, when time constraints are loose, the best of the possible corrections can be found by optimizing over many iterations using a reinforcement learning algorithm, a genetic algorithm, or the like.
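The idea of treating correction as iterative minimization of v_L = F(I) can be sketched with a much simpler method than those named above. The code below uses random-mutation hill climbing, not a genetic or reinforcement learning algorithm, and `toy_loss` is a hypothetical stand-in for F so that the loop can run; it is not the saliency pipeline of the description.

```python
import random

def toy_loss(image):
    """Hypothetical stand-in for v_L = F(I); NOT the patent's pipeline."""
    return sum((p - 0.5) ** 2 for p in image) / len(image)

def hill_climb(image, loss_fn, iterations=500, step=0.1, seed=0):
    """Repeatedly perturb one pixel and keep the change if the loss drops."""
    rng = random.Random(seed)
    best = list(image)
    best_loss = loss_fn(best)
    for _ in range(iterations):
        candidate = list(best)
        i = rng.randrange(len(candidate))
        # Perturb one pixel and clamp it to the valid range [0, 1].
        candidate[i] = min(1.0, max(0.0, candidate[i] + rng.uniform(-step, step)))
        candidate_loss = loss_fn(candidate)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss

original = [0.0, 1.0, 0.2, 0.9]   # a tiny four-pixel "image"
v_before = toy_loss(original)
corrected, v_after = hill_climb(original, toy_loss)
```

Because a candidate is accepted only when it lowers the loss, the index never increases. A practical objective would presumably also need a term that keeps the corrected video close to the original, which this toy omits.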

On the other hand, when fast processing is required, as in real-time video correction, an optimization of moderate accuracy can be performed using processing that needs only a small number of operations. As a specific example, the sign of the values of the information loss map L is inverted, the result is added, with appropriate feedback weights, to the output of the normalization model unit (N_X) 201 of viewer Bk (FIG. 2), and the processing of FIG. 2 and FIG. 3 is then traced in reverse and the results are summed to obtain the corrected input video. As the feedback weight, the product W_k U_ki of the weight W_k in the comparison operation unit 20 shown in FIG. 3 and the weight U_ki in the integration model unit (U_X) 202 shown in FIG. 2 can be used.

This brings the shapes of the saliency maps of viewer A and viewer Bk closer together and reduces the information loss.

FIGS. 4(a) to 4(c) are conceptual diagrams illustrating a calculation example of the information loss map generated in the first and second embodiments of the present invention. FIG. 4(a) shows the input image (video), which is actually a color image (video). FIG. 4(b) shows the trichromatic saliency map s1 output from a given viewer model (normal color vision) of the structural analysis processing unit 10 and the dichromatic saliency map s2 output from a given viewer model (color-deficient vision). Differences in color are keyed by the vertical bar beside the figure; whiter areas are more conspicuous. In the illustrated example, note the image regions enclosed by the broken lines L1 and L2: they stand out more in the trichromatic saliency map s1 than in the dichromatic saliency map s2.

FIG. 4(c) shows the difference between the trichromatic saliency map s1 and the dichromatic saliency map s2 shown in FIG. 4(b). The figure is actually a color image ranging from blue to red, and the colors are keyed by the vertical bar beside the figure. The region enclosed by the broken line L3 is light blue with some yellow to red, and is a part that is easy to see for trichromats (normal color vision) but hard to see for dichromats (color-deficient vision).

FIG. 5 is a conceptual diagram illustrating a calculation example of the information loss index v_L generated in the first and second embodiments of the present invention. The figure shows the information loss index v_L calculated for various kinds of sample images; a larger value of v_L indicates that the image is harder to see (worse). The sample images are, from the left: a landscape image 300 with a window and flowers; an image 301 of a woman's face; an image 302 of a bird with a colorful pattern; a color vision test camouflage character pattern 303 (No. 1: a greenish "7" among reddish dots); a color vision test camouflage character pattern 304 (No. 2: a reddish "8" among greenish dots); a color vision test camouflage character pattern 305 (No. 3: a reddish "42" among grayish dots); an image 306 of colorful fruit in a bowl; and an image 307 of a map (weather chart) of the area around the Japanese archipelago. Even with the same viewer model (in other words, the same television viewer), the degree of difficulty in viewing differs depending on the kind of video.

According to the first and second embodiments described above, multiple different visual characteristic models are used together at the structural analysis stage to simulate perceptual processing in the various viewer models. In the structural analysis, the highly salient parts of the visual information are extracted and the result is used as a visibility evaluation. This makes it possible to reduce the loss of information received from the video by viewers belonging to minorities in terms of various visual characteristics, relative to the information received by general viewers. In other words, visual information that takes universal design into account can be provided more simply and more reliably.

<First Example>
As a first example of the present invention, a pre-check system used when creating designs with computer graphics or the like is described. At the video creation stage, the designer in charge of graphics usually creates a design based on specialized design skills and artistic judgment. To create video that takes universal design into account without compromising those design skills and artistic judgment, it is desirable to use the results of the "video evaluation system" of the present invention for a pre-check at the time of video creation.

FIG. 6 is a conceptual diagram illustrating a first example to which the video processing apparatus according to the first embodiment of the present invention is applied. The material registration PC 400 is a database in which a large number of material videos are recorded. The precision preview PC 401 is a computer for previewing material videos. The material registration PC 400 and the precision preview PC 401 are connected by a LAN (local area network) 403 and are further connected to a network 404 (for example, a wide area network such as the Internet). Material video is sent from the precision preview PC 401 into the studio via an SDI (Serial Digital Interface) 405.

The preview monitor 406 is a monitor for previewing the material video sent from the precision preview PC 401. The checker PC 407 is a computer to which the "video evaluation system" of the present invention, that is, the video processing apparatus 1 according to the first embodiment, is applied, and it displays the evaluation results for the material video sent from the precision preview PC 401. The designer performs the pre-check at the time of video creation by reviewing the evaluation results for the material video displayed on the checker PC 407.

Using the checker PC 407, the designer can identify the locations where information loss occurs from the evaluation results and manually correct the design accordingly. In this way, video that takes universal design into account can be created without compromising the designer's own artistic sensibility or design intent.

<Second Example>
As a second example of the present invention, a system for checking and correcting existing designs and videos is described. For most archived video, the electronic data of the designs used has already been discarded, or the designer cannot be directly involved in correcting them. It may also be difficult to check and correct a vast archive entirely by hand.

When correcting such video to conform to universal design, the "video evaluation system" of the present invention and the automatic "video correction system" based on it can be used. Even when manual correction is difficult, existing video can thereby be converted into video that conveys its information to many viewers.

<Third Example>
As a third example of the present invention, an automatic video correction system on the video receiving side is described. Some video reaches viewers without having been processed with universal design in mind. By incorporating the "video evaluation system" of the present invention and the automatic "video correction system" based on it into the viewing environment on the receiver side, such video can also be converted appropriately to aid the conveyance of information.

In this case, the conversion need not take into account all viewers with their various visual characteristics; only the visual characteristics of the viewer who actually watches the video on the receiving side need to be considered. This is easily achieved in the "video evaluation system" of the present invention by weighting appropriately so that only the relevant visual characteristics are reflected in the output. Not only can the processing be optimized for each individual viewer, but narrowing down the visual characteristics to be considered also reduces the computational load.
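One way to read "weighting appropriately" is to set the comparison weight W_k to 1 for the viewer model that matches the actual viewer and to 0 for all others, so that zero-weight models need not be evaluated at all. The sketch below assumes that reading; the function names and the model labels are hypothetical.

```python
def receiver_weights(viewer_models, active_model):
    """W_k = 1 for the viewer actually watching, 0 for every other model."""
    if active_model not in viewer_models:
        raise ValueError(f"unknown viewer model: {active_model}")
    return {name: (1.0 if name == active_model else 0.0) for name in viewer_models}

def models_to_evaluate(weights):
    """Models with zero weight cannot affect the output, so skip them."""
    return [name for name, w in weights.items() if w != 0.0]

weights = receiver_weights(["B1", "B2", "B3"], active_model="B2")
active = models_to_evaluate(weights)
```

Only the structural analysis for the listed models (plus the reference viewer A) then has to run on the receiver, which is where the reduction in computational load would come from.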

DESCRIPTION OF SYMBOLS
1 Video processing apparatus
10 Structural analysis processing unit
11 Viewer model A
12 Viewer model B1
13 Viewer model B2
20 Comparison operation unit
21 Integration calculation unit
22 Comparison unit
23 Correction unit
100 Lens model unit
101 Photoreceptor model unit
102 Low-order response model unit
103 Saliency model unit
200 Feature model unit
201 Normalization model unit
202 Integration model unit
400 Material registration PC
401 Precision preview PC
403 LAN
404 Network
405 SDI
406 Preview monitor
407 Checker PC

Claims

A video processing apparatus comprising:
a structural analysis processing unit that performs parallel structural analysis processing on an input video based on a plurality of different visual characteristic models and calculates, as a result of the structural analysis processing, saliency maps corresponding to the plurality of different visual characteristic models;
a comparison operation unit that compares the saliency maps calculated by the structural analysis processing unit for each of the plurality of different visual characteristic models and outputs the comparison result as an information loss map;
an integration calculation unit that integrates the information loss map from the comparison operation unit to calculate a single or a plurality of information loss indices; and
a comparison unit that compares the information loss index calculated by the integration calculation unit with a reference information loss index and outputs the comparison result as a diagnosis result.

The video processing apparatus according to claim 1, further comprising a correction unit that corrects the input video based on the diagnosis result from the comparison unit.

The video processing apparatus according to claim 1 or 2, wherein the structural analysis processing unit calculates the saliency map by simulating the distribution of salient regions in the input video using a lens spectral transmission characteristic model, a photoreceptor distribution model, an opponent-color response model, and a saliency calculation model, based on visual characteristic data of the target viewer.

The video processing apparatus according to claim 2, wherein the correction unit corrects the input video so as to minimize the value of the information loss index.

A program that causes a computer to execute:
a structural analysis processing step of performing parallel structural analysis on an input video based on a plurality of different visual characteristic models and calculating, as a result of the structural analysis, saliency maps corresponding to the plurality of different visual characteristic models;
a comparison operation step of comparing the saliency maps for each of the plurality of different visual characteristic models and outputting the comparison result as an information loss map;
an integration calculation step of integrating the information loss map to calculate a single or a plurality of information loss indices; and
a comparison step of comparing the calculated information loss index with a reference information loss index and outputting the comparison result as a diagnosis result.