JP5067310B2

JP5067310B2 - Subtitle area extraction apparatus, subtitle area extraction method, and subtitle area extraction program

Info

Publication number: JP5067310B2
Application number: JP2008206289A
Authority: JP
Inventors: 洪亮白; 俊孫; 裕勝山; 克仁藤本; 聡直井
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2007-08-09
Filing date: 2008-08-08
Publication date: 2012-11-07
Anticipated expiration: 2028-08-08
Also published as: JP2009043265A; CN101365072A; CN100589532C

Description

The present invention relates to a subtitle area extraction apparatus, a subtitle area extraction method, and a subtitle area extraction program for extracting subtitle areas from video.

In recent years, large volumes of video have been produced as broadcasting, television, and film have advanced, and the spread of digital still cameras and digital video cameras has led the general public to create a wide variety of videos. As a result, many people now need to analyze and search these videos. Video usually contains subtitle information; extracting it with conventional image and video processing techniques and optical character recognition (OCR) provides important information for effective video analysis and search. Subtitles include the time and place of an event, the parties involved, sports scores, weather forecasts, product prices, and so on. Before subtitle information can be extracted, the subtitle area must first be extracted from the video. Many subtitle area extraction methods have been proposed, for example in Patent Documents 1 and 2 and Non-Patent Documents 1 to 4.

For example, subtitle areas can be extracted using various features, such as connected-region features, edge features (see Patent Documents 1 and 2 and Non-Patent Document 1), and texture features (see Non-Patent Document 2). Methods based on connected-region features use simple algorithms and run fast, but are difficult to adapt to complex backgrounds. Methods based on edge features make it complicated to compute effective statistics of the edge features. Methods based on texture features are computationally expensive; they usually rely on motion vector information obtained by the video decoding algorithm, but extracting motion vector information is itself difficult, so few methods using texture features have been proposed.

To increase processing speed, subtitle area identification can be performed on a single image of the video. For example, the methods of Non-Patent Documents 1 and 2 process only one specific frame of the video, for example only the I-frames of video encoded according to Moving Picture Experts Group (MPEG) standards, and then apply image processing techniques to extract the subtitle area. However, when subtitles appear against a complex background, robust subtitle detection is not possible. Moreover, the method of Non-Patent Document 1 considers only a single scale in subtitle detection and therefore cannot handle different font sizes effectively.

Patent Document 1: Japanese Patent Laid-Open No. 2006-53802
Patent Document 2: Japanese Patent Laid-Open No. 9-16769
Non-Patent Document 1: Rainer Lienhart et al., "Localizing and Segmenting Text in Image and Videos", IEEE Transactions on Circuits and System for Video Technology, Vol. 12, No. 4, pp. 256-268, 2002
Non-Patent Document 2: Yu Zhong et al., "Automatic Caption Localization in Compressed Video", IEEE Transaction on Pattern Analysis and Machine Intelligence, Vol. 22, No. 4, pp. 385-392, 2000
Non-Patent Document 3: Xiaoou Tang et al., "A Spatial-Temporal Approach for Video Caption Detection and Recognition", IEEE Transactions on Neural Network, Vol. 13, No. 4, pp. 961-971, 2002
Non-Patent Document 4: Toshio Sato et al., "Video OCR for Digital News Archive", Workshop on Content-Based Access of Image and Video Databases, pp. 52-60, 1998

Because subtitle backgrounds are complex, using a single image alone cannot meet practical requirements. Subtitle identification can be improved significantly by exploiting the fact that a subtitle remains in the video for a certain period and that its position generally does not change. Methods that take the temporal information of subtitles into account have been proposed; see, for example, Non-Patent Documents 3 and 4. However, these methods use every frame without selection, which is not satisfactory in terms of extraction efficiency.

The present invention has been made in view of these problems of the prior art, and its object is to provide a subtitle area extraction apparatus, a subtitle area extraction method, and a subtitle area extraction program that can extract subtitle areas from video efficiently and correctly.

To solve the problems described above and achieve this object, a subtitle area extraction apparatus according to the present invention, which extracts a subtitle area from a video signal, comprises: video decoding means for decoding the video signal to generate a plurality of images; image selection means for dividing the images output by the video decoding means, in time order, into image groups of a predetermined size and selecting a predetermined number of images from each image group on the basis of edge features; averaging means for averaging, for each image group, the images selected by the image selection means to obtain an averaged image; and extraction means for extracting corners from the averaged image, extracting feature lines on the basis of the density of the corners, and extracting a subtitle area from the averaged image on the basis of the density of the feature lines.

In the strokes of East Asian characters such as Japanese and Chinese, corner features are very prominent: corners often occur at the start points, intersections, and end points of strokes, whereas corner features in the background are much less prominent. The present invention therefore extracts subtitle areas on the basis of corner features.

Further, in the above subtitle area extraction apparatus according to the present invention, the extraction means extracts horizontal subtitle areas and vertical subtitle areas in the horizontal and vertical directions, respectively, at a plurality of scales, and clusters the subtitle areas extracted at the plurality of scales.

Further, in the above subtitle area extraction apparatus according to the present invention, the extraction means extracts Harris corners from the averaged image, counts, for each Harris corner, the number of Harris corners within a window of a predetermined size centered on that Harris corner, and discards the Harris corner if the count is less than a predetermined threshold.

Further, in the above subtitle area extraction apparatus according to the present invention, the extraction means extracts horizontal subtitle areas and vertical subtitle areas in the horizontal and vertical directions, respectively, and the apparatus further comprises post-processing means for processing the horizontal and vertical subtitle areas extracted by the extraction means so that the horizontal subtitle areas and the vertical subtitle areas do not overlap one another.

A subtitle area extraction method according to the present invention, which extracts a subtitle area from a video signal, includes the steps of: decoding the video signal to generate a plurality of images; dividing the plurality of images, in time order, into image groups of a predetermined size and selecting a predetermined number of images from each image group on the basis of edge features; averaging, for each image group, the selected images to obtain an averaged image; and extracting corners from the averaged image, extracting feature lines on the basis of the density of the corners, and extracting a subtitle area from the averaged image on the basis of the density of the feature lines.

Further, in the above subtitle area extraction method according to the present invention, the step of extracting the subtitle area extracts horizontal subtitle areas and vertical subtitle areas in the horizontal and vertical directions, respectively, at a plurality of scales, and clusters the subtitle areas extracted at the plurality of scales.

Further, in the above subtitle area extraction method according to the present invention, the step of extracting the subtitle area extracts Harris corners from the averaged image, counts, for each Harris corner, the number of Harris corners within a window of a predetermined size centered on that Harris corner, and discards the Harris corner if the count is less than a predetermined threshold.

Further, in the above subtitle area extraction method according to the present invention, the step of extracting the subtitle area extracts horizontal subtitle areas and vertical subtitle areas in the horizontal and vertical directions, respectively, and the method further includes a post-processing step of processing the horizontal and vertical subtitle areas so that the horizontal subtitle areas and the vertical subtitle areas do not overlap one another.

A subtitle area extraction program according to the present invention causes an information processing device to extract a subtitle area from a video signal by executing the steps of: decoding the video signal to generate a plurality of images; dividing the plurality of images, in time order, into image groups of a predetermined size and selecting a predetermined number of images from each image group on the basis of edge features; averaging, for each image group, the selected images to obtain an averaged image; and extracting corners from the averaged image, extracting feature lines on the basis of the density of the corners, and extracting a subtitle area from the averaged image on the basis of the density of the feature lines.

Further, in the above subtitle area extraction program according to the present invention, the step of extracting the subtitle area extracts horizontal subtitle areas and vertical subtitle areas in the horizontal and vertical directions, respectively, at a plurality of scales, and clusters the subtitle areas extracted at the plurality of scales.

Further, in the above subtitle area extraction program according to the present invention, the step of extracting the subtitle area extracts Harris corners from the averaged image, counts, for each Harris corner, the number of Harris corners within a window of a predetermined size centered on that Harris corner, and discards the Harris corner if the count is less than a predetermined threshold.

Further, in the above subtitle area extraction program according to the present invention, the step of extracting the subtitle area extracts horizontal subtitle areas and vertical subtitle areas in the horizontal and vertical directions, respectively, and the program further causes the information processing device to execute a post-processing step of processing the horizontal and vertical subtitle areas so that the horizontal subtitle areas and the vertical subtitle areas do not overlap one another.

According to the present invention, extracting subtitle areas using information from multiple video frames increases the contrast between background and subtitles even for video with a dynamic background, enabling correct and robust subtitle area detection. In addition, using image frames selectively allows subtitle areas to be extracted still more efficiently and correctly.

Preferred embodiments of the subtitle area extraction apparatus, subtitle area extraction method, and subtitle area extraction program according to the present invention are described in detail below with reference to the accompanying drawings.

First, Embodiment 1 of the present invention is described. FIG. 1 is a schematic diagram showing an example of a screen containing subtitles, and FIG. 2 is a schematic block diagram of a subtitle area extraction apparatus 10 according to Embodiment 1. The screen shown in FIG. 1 contains three typical subtitles reading "富士通ふじつ" (Fujitsu): a is in a small font, b is set vertically, and c is set horizontally.

As shown in FIG. 2, the subtitle area extraction apparatus 10 according to this embodiment includes video decoding means 11, image selection means 12, averaging means 13, extraction means 14, and post-processing means 15. Of these, the video decoding means 11 decodes a video signal, which may be a video file or a video stream (a data stream from a video capture device), to generate a plurality of images.

The image selection means 12 divides the images output by the video decoding means 11, in time order, into image groups of a predetermined size, and selects a predetermined number of images from each image group on the basis of edge features.

The averaging means 13 averages the images selected by the image selection means 12 to obtain an averaged image. The extraction means 14 extracts subtitle areas from the averaged image on the basis of Harris corner features.

The post-processing means 15 processes the subtitle areas extracted by the extraction means 14, on the basis of dominant-color features, so that horizontal subtitle areas and vertical subtitle areas do not overlap one another. Each means is described in detail below, following the operation flow of the subtitle area extraction apparatus 10.

FIG. 3 is a schematic block diagram of the video decoding means 11 in the subtitle area extraction apparatus according to Embodiment 1 of the present invention. In Embodiment 1, the video decoding means 11 uses Microsoft(R) DirectShow(R) technology.

DirectShow is Microsoft's streaming media architecture for the Windows(R) platform and provides capture and playback of multimedia video and audio. The DirectShow framework contains many components; Embodiment 1 uses only some of its modules, namely an audio/video splitter 111 and a video decoder (FFMPEG decoder) 112. The present invention is of course not limited to DirectShow; a decoder suited to the particular video encoding format may be used, provided that bitmap images can be output from the video signal.

In this way, the video decoding means 11 processes the input video signal and outputs an image sequence. The image sequence is input to the image selection means 12, which selects from among the images output by the video decoding means 11.

FIG. 4 is a schematic diagram for explaining the processing performed by the image selection means 12 of the subtitle area extraction apparatus. As shown in FIG. 4, in step 121 the image selection means 12 takes images, in order, from the image sequence output by the video decoding means 11 in units of a predetermined minimal video duration (MVD), which is 20 in this embodiment; that is, MVD = 20 images are taken at a time.
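The grouping in step 121 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_groups` is hypothetical, and dropping a trailing group shorter than MVD is an assumption, since the text does not say how leftover frames are handled.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MVD = 20  # minimal video duration in frames (the value used in this embodiment)


def split_into_groups(frames: Sequence[T], group_size: int = MVD) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping groups of `group_size` frames in time order.

    Assumption: a trailing group with fewer than `group_size` frames is dropped.
    """
    for start in range(0, len(frames) - group_size + 1, group_size):
        yield list(frames[start:start + group_size])
```

Each yielded group would then be passed on to the pairwise edge comparison of step 122.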

Next, in step 122, the image selection means 12 computes edge images for every pairwise combination of the MVD = 20 images, obtaining 190 pairs of edge images.
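The figure of 190 is the number of unordered pairs that can be formed from 20 images, that is, 20 × 19 / 2. A short check, using frame indices in place of actual images:

```python
from itertools import combinations
from math import comb

MVD = 20  # images per group in this embodiment

# All unordered pairs (i, j) with i < j of frame indices within one group.
pairs = list(combinations(range(MVD), 2))

print(len(pairs), comb(MVD, 2))  # 190 190
```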

In step 123, for two images A and B, for example, the image selection means 12 has the corresponding pair of edge images EA and EB. In an edge image, each pixel value is either 0 or 255. As shown in Expression (1), the gray-level variation D_A,B between the edge images and the original images is computed at the edge locations.

Here, (x, y) ∈ {(x, y) | I_EA(x, y) = 255 || I_EB(x, y) = 255} denotes the edge points of the edge images EA and EB; I_EA, I_EB, I_A, and I_B are the gray-level values of the images EA, EB, A, and B, respectively; W is the image width and H is the image height.

For the i-th image, the accumulated variation D_i is calculated as shown in Expression (2).

Here, D_ij in Expression (2) is the variation computed for the pair formed by the i-th and j-th images.

In step 124, all 20 values D_i are sorted, the 10 images with the largest accumulated variation D are selected, and these 10 images are passed to the averaging means 13.
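The selection in step 124 can be sketched as below. The sketch assumes that the pairwise variations D_ij are already available as a square matrix (how each D_ij is computed is given by Expression (1) and is not reproduced here), and that the accumulated variation D_i is the sum of D_ij over all j ≠ i. The function name `select_frames` and the choice to return the indices in time order are illustrative assumptions.

```python
import numpy as np


def select_frames(pairwise: np.ndarray, keep: int = 10) -> np.ndarray:
    """Return the indices (in time order) of the `keep` frames with the largest D_i.

    `pairwise[i, j]` is assumed to hold the variation D_ij between frames i and j.
    """
    d = np.asarray(pairwise, dtype=float)
    # Assumed form of the accumulated variation: D_i = sum over j != i of D_ij.
    accumulated = d.sum(axis=1) - np.diag(d)
    # Indices sorted by descending D_i; keep the first `keep` of them.
    top = np.argsort(accumulated, kind="stable")[::-1][:keep]
    return np.sort(top)
```

With MVD = 20 the input would be a 20 × 20 matrix and the result the 10 frame indices handed to the averaging means 13.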

For each pixel position, the averaging means 13 averages the 10 selected images as shown in Expression (3) to obtain an averaged image.

Here, I_average(x, y) is the pixel value of the averaged image at coordinates (x, y), and I_i(x, y) is the pixel value at coordinates (x, y) of the i-th of the 10 images selected by the image selection means 12.
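The averaging step amounts to a pixel-wise arithmetic mean over the selected images. A minimal sketch with NumPy, assuming all frames have the same shape (for example height × width × 3 for color images); the function name `average_frames` is hypothetical:

```python
import numpy as np


def average_frames(frames):
    """Pixel-wise mean of equally shaped frames, returned as a float array."""
    # Convert to float first so that 8-bit pixel values cannot overflow when summed.
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames], axis=0)
    return stack.mean(axis=0)
```

Because a static subtitle has the same pixel values in every selected frame while a moving background does not, averaging tends to keep the subtitle sharp and blur the background, which is the contrast gain the description refers to.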

In this way, the averaging means 13 averages the 10 images selected within one MVD to generate one averaged image. The averaged image is passed to the extraction means 14, which extracts a plurality of subtitle areas from it.

FIG. 5 is a schematic flowchart of the processing performed by the extraction means 14 of the subtitle area extraction apparatus. As shown in FIG. 5, Harris corners are extracted from the averaged image in step 141, and in step 142 refined corners are selected from the corners extracted in step 141.

Then, in step 143, feature lines are extracted on the basis of the refined Harris corners; in step 144, feature rectangles are extracted on the basis of the feature lines; and in step 145, subtitle areas are identified from the feature rectangles extracted in step 144.

The extraction means 14 extracts horizontal subtitle areas and vertical subtitle areas separately, each according to the above flow. The processing by which the extraction means 14 extracts subtitle areas is described in detail below.

First, the processing by which the extraction means 14 extracts Harris corners from the averaged image is described. Harris corners are an important feature for detecting points of interest in an image and are invariant to changes in rotation, scale, and illumination. Details of Harris corner extraction are given in the following reference.
C. Harris and M. J. Stephens, "A combined corner and edge detector", in Alvey Vision Conference, pp. 147-152, 1988

In East Asian characters such as Japanese and Chinese, corner features are very prominent because of the nature of the strokes. Corners often occur at the start points, intersections, and end points of strokes, whereas corner features in the background are much less prominent.

Although this specification describes the present invention in terms of Harris corners, the invention is not limited to Harris corners, and other types of corners may be used. FIG. 6 is a flowchart of the operation by which the extraction means 14 identifies Harris corners.

Because the image output from the averaging means 13 is a color image, the extraction means 14 first converts the color image into a grayscale image in step 1411. Any well-known technique may be used for this conversion, and it is not described in detail here.

Next, in step 1412, the extraction means 14 calculates the horizontal and vertical gradients of the grayscale image to obtain gradient images. Any well-known technique may likewise be used to obtain the gradient images, and it is not described in detail here.

In step 1413, on the basis of the obtained gradient images, the extraction means 14 calculates the autocorrelation matrix M for each point A(x, y) in the averaged image.

Here, as shown in Expression (4), I_x(x, y) and I_y(x, y) are the horizontal and vertical gradients, respectively, at (x, y) in the averaged image, and W is a window centered on (x, y).
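For reference, the autocorrelation matrix in the Harris and Stephens formulation, written with the symbols defined above, has the following form. It is given here as an assumed reading of Expression (4), with an unweighted sum over the window W; the original detector weights the terms with a Gaussian window, and the text does not say which weighting is used.

```latex
M(x, y) = \sum_{(u, v) \in W}
\begin{pmatrix}
I_x(u, v)^2 & I_x(u, v)\, I_y(u, v) \\
I_x(u, v)\, I_y(u, v) & I_y(u, v)^2
\end{pmatrix}
```

M is symmetric and positive semidefinite, so both of its eigenvalues are real and non-negative.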

After obtaining the matrix M, the extraction means 14 calculates its eigenvalues. The calculation of matrix eigenvalues is well known and is described, for example, in the Handbook of Modern Mathematics, Classical Mathematics volume (現代数学手册.経典数学卷), p. 183, Wuhan Huazhong University Press, 2000.

Then, in step 1414, the extraction means 14 compares the obtained eigenvalues with a predetermined threshold. If they exceed the threshold, the point is determined in step 1415 to be a Harris corner; otherwise, the point is determined in step 1416 not to be a Harris corner but some other kind of point, for example a boundary point.

In step 1417, the extraction means 14 determines whether every point in the averaged image has been processed. If not, it processes the next point; if so, the processing of step 141 ends.
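Steps 1412 to 1417 can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: gradients come from `np.gradient` (central differences), the window W is an unweighted square of side 2 × `radius` + 1, and a point is accepted when the smaller eigenvalue of M exceeds the threshold, which is one reading of "compares the eigenvalues with a threshold". The names `box_sum` and `harris_corners` and the default parameter values are hypothetical.

```python
import numpy as np


def box_sum(a: np.ndarray, radius: int) -> np.ndarray:
    """Sum of `a` over a (2*radius+1)-square window around each pixel (zero padded)."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="constant")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]


def harris_corners(gray, radius: int = 1, threshold: float = 1000.0):
    """Return (x, y) points whose smaller eigenvalue of M exceeds `threshold`."""
    g = np.asarray(gray, dtype=np.float64)
    iy, ix = np.gradient(g)  # vertical and horizontal gradients
    sxx = box_sum(ix * ix, radius)
    syy = box_sum(iy * iy, radius)
    sxy = box_sum(ix * iy, radius)
    # Closed-form smaller eigenvalue of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = 0.5 * (sxx + syy)
    root = np.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2)
    lam_min = half_trace - root
    ys, xs = np.nonzero(lam_min > threshold)
    return list(zip(xs.tolist(), ys.tolist()))
```

On a straight edge only one gradient direction is present, so the smaller eigenvalue stays near zero and the point is rejected, which matches the distinction between corners and boundary points drawn in steps 1415 and 1416.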

In this way, the extraction means 14 extracts all Harris corners from the averaged image. Processing then proceeds to step 142, in which refined corners are selected from the extracted Harris corners. FIG. 7 is a flowchart of the operation by which the extraction means 14 selects corners.

As shown in FIG. 7, in step 1421 the extraction means 14 selects one corner from the set of Harris corners obtained in step 141, and in step 1422 it determines the number Num_Corner of Harris corners within a window of a predetermined size centered on that corner.

In step 1423, the extraction means 14 determines whether Num_Corner is greater than a predetermined threshold Nth. If it is, the Harris corner is determined in step 1424 to be a refined corner; if not, the Harris corner is deleted in step 1425.

Then, in step 1426, it is determined whether every corner in the Harris corner set has been processed. If not, processing returns to step 1421 and the subsequent steps are repeated; if so, the processing of step 142 ends.
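The corner refinement of steps 1421 to 1426 is a local density filter. A minimal sketch follows, assuming a square window, a count that includes the corner itself, and illustrative values for the window size and for Nth (the text gives neither); the function name `refine_corners` is hypothetical.

```python
def refine_corners(corners, window: int = 15, n_th: int = 3):
    """Keep each (x, y) corner that has more than `n_th` corners in its window.

    The window is a square of side `window` centered on the corner; the count
    (Num_Corner in the text) is assumed to include the corner itself.
    """
    half = window // 2
    refined = []
    for x, y in corners:
        num_corner = sum(
            1 for u, v in corners if abs(u - x) <= half and abs(v - y) <= half
        )
        if num_corner > n_th:
            refined.append((x, y))
    return refined
```

This brute-force version is quadratic in the number of corners; for large corner sets a grid or integral-image count would be the usual substitute.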

The extraction means 14 then proceeds to step 143 and extracts feature lines on the basis of the refined Harris corners identified in step 142. In Embodiment 1, the extraction means 14 extracts both horizontal subtitle areas and vertical subtitle areas. Each is described below.

FIG. 8 is a flowchart of the operation by which the extraction means 14 extracts horizontal feature lines. As shown in FIG. 8, first, in step 14301H, one corner is chosen arbitrarily from the refined Harris corners identified in step 142; this Harris corner is designated C0 and is also taken as the start point ST.

Next, in step 14302H, the extraction means 14 searches in the horizontal direction for the next refined Harris corner and designates it C1. In step 14303H, the distance Dist_Refine_Corner between the two corners C1 and C0 is calculated, and in step 14304H it is determined whether Dist_Refine_Corner is smaller than a predetermined threshold DH0.

If it is smaller, the extraction means 14 connects the two Harris corners C1 and C0 with a straight line in step 14305H, sets corner C1 as the new C0 in step 14306H, and returns to step 14302H to search in the horizontal direction for the next refined Harris corner and repeat the subsequent processing.

If the determination in step 14304H is negative, that is, if Dist_Refine_Corner is greater than or equal to the threshold DH0, the extraction means 14 proceeds to step 14307H and calculates Num_Refine_Corner, the number of refined Harris corners between the starting point ST and corner C1.

In step 14308H, the extraction means 14 determines whether Num_Refine_Corner is greater than a predetermined threshold NH1. If it is, the straight line connecting the starting point ST and corner C1 is determined to be a feature line in step 14309H; otherwise, all corners involved in this processing cycle are discarded in step 14310H.

Then, in step 14311H, the extraction means 14 determines whether the above processing has been performed on all refined Harris corners. If it has, the processing of step 143 ends. If any refined Harris corner remains unprocessed, the process returns to step 14301H, one corner is selected from the unprocessed refined Harris corners, and the subsequent processing is performed.

By repeating this processing until all refined Harris corners have been processed, the extraction means 14 obtains a series of horizontal feature lines. In step 144, horizontal feature rectangles are then extracted based on these horizontal feature lines. FIG. 9 is a flowchart of the operation in which the extraction means 14 extracts horizontal feature rectangles.
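The chaining of refined corners into horizontal feature lines (steps 14301H to 14311H) can be sketched as below. The sketch rests on several assumptions that the text does not state: corners are chained only when they lie on exactly the same pixel row, Num_Refine_Corner is taken as the number of corners in the chain including the starting point, and each line ends at the last chained corner. The names `horizontal_feature_lines`, `dh0`, and `nh1` are hypothetical stand-ins for the procedure and its thresholds DH0 and NH1.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[int, int]      # (x, y)
Line = Tuple[int, int, int]  # (y, x_start, x_end)


def horizontal_feature_lines(corners: List[Point], dh0: int, nh1: int) -> List[Line]:
    """Chain corners on the same row whose gaps are below dh0 into lines."""
    rows: Dict[int, List[int]] = defaultdict(list)
    for x, y in corners:
        rows[y].append(x)

    lines: List[Line] = []
    for y, xs in sorted(rows.items()):
        xs.sort()
        start = 0  # index of the starting point ST of the current chain
        for i in range(1, len(xs) + 1):
            # The chain breaks at the end of the row or when the gap to the
            # next corner is at least DH0 (negative result of step 14304H).
            chain_broken = i == len(xs) or xs[i] - xs[i - 1] >= dh0
            if chain_broken:
                num_refine_corner = i - start  # corners in this chain (assumption)
                if num_refine_corner > nh1:    # step 14308H
                    lines.append((y, xs[start], xs[i - 1]))  # step 14309H
                # step 14310H: shorter chains are discarded
                start = i
    return lines
```

A practical detector would probably tolerate a few pixels of vertical offset between neighbouring corners; exact row matching is used here only to keep the sketch short.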

As shown in FIG. 9, first, in step 14401H, the extraction means 14 arbitrarily selects one horizontal feature line from the set of horizontal feature lines extracted in the preceding step 143, and sets it as L0 and also as the starting line STL.

Next, in step 14402H, the extraction means 14 searches in the vertical direction for the next feature line and sets it as L1. In step 14403H, the distance Dist_Line between the two feature lines L1 and L0 is calculated, and in step 14404H it is determined whether Dist_Line is smaller than a predetermined threshold DH1.

If it is smaller, the extraction means 14 combines the two feature lines L1 and L0 into one rectangle in step 14405H, sets feature line L1 as the new L0 in step 14406H, and returns to step 14402H to search in the vertical direction for the next feature line and repeat the subsequent processing.

If the determination in step 14404H is negative, that is, if Dist_Line is greater than or equal to the threshold DH1, the extraction means 14 proceeds to step 14407H and calculates Num_Line, the number of feature lines between the starting line STL and line L1.

In step 14408H, the extraction means 14 determines whether Num_Line is greater than a predetermined threshold NH2. If it is, the smallest rectangle containing all feature lines between the starting line STL and line L1 is extracted in step 14409H; otherwise, all feature lines involved in this processing cycle are discarded in step 14410H.

Then, in step 14411H, the extraction means 14 determines whether the above processing has been performed on all feature lines. If it has, the processing of step 144 ends. If any feature line remains unprocessed, the process returns to step 14401H, one line is selected from the unprocessed feature lines, and the subsequent processing is repeated.

By repeating this processing until all feature lines have been processed, the extraction means 14 obtains a series of horizontal feature rectangles. The process then proceeds to step 145, in which caption areas are identified from these horizontal feature rectangles.
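Grouping horizontal feature lines into feature rectangles (steps 14401H to 14411H) follows the same chaining pattern, this time in the vertical direction. In the sketch below, the function name and the parameters `dh1` and `nh2` are hypothetical stand-ins for DH1 and NH2, Num_Line is assumed to count every line in the chain, and horizontal overlap between lines is ignored for brevity.

```python
from typing import List, Tuple

Line = Tuple[int, int, int]       # (y, x_start, x_end)
Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)


def horizontal_feature_rectangles(lines: List[Line], dh1: int, nh2: int) -> List[Rect]:
    """Group vertically adjacent feature lines into bounding rectangles."""
    ordered = sorted(lines)  # top to bottom
    rects: List[Rect] = []
    start = 0  # index of the starting line STL of the current chain
    for i in range(1, len(ordered) + 1):
        # The chain breaks at the last line or when the vertical gap to the
        # next line is at least DH1 (negative result of step 14404H).
        broken = i == len(ordered) or ordered[i][0] - ordered[i - 1][0] >= dh1
        if broken:
            group = ordered[start:i]
            if len(group) > nh2:  # step 14408H
                # step 14409H: smallest rectangle containing every line in the chain
                rects.append((
                    min(line[1] for line in group),
                    group[0][0],
                    max(line[2] for line in group),
                    group[-1][0],
                ))
            # step 14410H: otherwise the lines of this chain are discarded
            start = i
    return rects
```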

Specifically, for each feature rectangle, the extraction means 14 calculates four features, namely its aspect ratio, area, width, and height, and determines whether each of them falls within a predetermined range. If all four features are within their respective ranges, the feature rectangle is determined to be a caption area; otherwise, the feature rectangle is discarded.
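The four-feature test of step 145 amounts to a set of range checks. The sketch below is illustrative only: the `CaptionLimits` container, the function name, and any concrete limit values passed to it are hypothetical, since the text gives no numeric ranges.

```python
from dataclasses import dataclass
from typing import Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)
Range = Tuple[float, float]       # inclusive (low, high)


@dataclass
class CaptionLimits:
    """Allowed ranges for the four rectangle features (values are up to the user)."""
    aspect: Range
    area: Range
    width: Range
    height: Range


def is_caption_area(rect: Rect, limits: CaptionLimits) -> bool:
    """Return True only if all four features lie within their ranges."""
    left, top, right, bottom = rect
    width = right - left
    height = bottom - top
    if width <= 0 or height <= 0:
        return False  # degenerate rectangle, no aspect ratio defined
    features = (
        (width / height, limits.aspect),
        (width * height, limits.area),
        (width, limits.width),
        (height, limits.height),
    )
    return all(low <= value <= high for value, (low, high) in features)
```

For vertical caption areas the same check applies with limits chosen for tall, narrow rectangles.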

Similarly, the extraction means 14 can extract vertical caption areas based on Harris corners by following the flowchart shown in FIG. 5. FIG. 10 is a flowchart of the operation in which the extraction means 14 extracts vertical feature lines, and FIG. 11 is a flowchart of the operation in which the extraction means 14 extracts vertical feature rectangles.

As shown in FIG. 10, first, in step 14301V, the extraction means 14 arbitrarily selects one corner from the refined Harris corners identified in step 142; this Harris corner is set as C0 and also as the starting point ST.

Next, in step 14302V, the extraction means 14 searches in the vertical direction for the next refined Harris corner and sets it as C1. In step 14303V, the distance Dist_Refine_Corner between the two corners C1 and C0 is calculated, and in step 14304V it is determined whether Dist_Refine_Corner is smaller than a predetermined threshold DH0.

If it is smaller, the extraction means 14 connects the two Harris corners C1 and C0 with a straight line in step 14305V, sets corner C1 as the new C0 in step 14306V, and returns to step 14302V to search in the vertical direction for the next refined Harris corner and repeat the subsequent processing.

If the determination in step 14304V is negative, that is, if Dist_Refine_Corner is greater than or equal to the threshold DH0, the extraction means 14 proceeds to step 14307V and calculates Num_Refine_Corner, the number of refined Harris corners between the starting point ST and corner C1.

In step 14308V, the extraction means 14 determines whether Num_Refine_Corner is greater than a predetermined threshold NH1. If it is, the straight line connecting the starting point ST and corner C1 is determined to be a feature line in step 14309V; otherwise, all corners involved in this processing cycle are discarded in step 14310V.

Then, in step 14311V, the extraction means 14 determines whether the above processing has been performed on all refined Harris corners. If it has, the processing of step 143 ends. If any refined Harris corner remains unprocessed, the process returns to step 14301V, one corner is selected from the unprocessed refined Harris corners, and the subsequent processing is performed.

By repeating this processing until all refined Harris corners have been processed, the extraction means 14 obtains a series of vertical feature lines. The process then proceeds to step 144, in which vertical feature rectangles are extracted based on these vertical feature lines.

Turning to FIG. 11, first, in step 14401V, the extraction means 14 arbitrarily selects one vertical feature line from the set of vertical feature lines extracted in the preceding step 143, and sets it as L0 and also as the starting line STL.

Next, in step 14402V, the extraction means 14 searches in the horizontal direction for the next feature line and sets it as L1. In step 14403V, the distance Dist_Line between the two feature lines L1 and L0 is calculated, and in step 14404V it is determined whether Dist_Line is smaller than a predetermined threshold DH1.

If it is smaller, the extraction means 14 combines the two feature lines L1 and L0 into one rectangle in step 14405V, sets feature line L1 as the new L0 in step 14406V, and returns to step 14402V to search in the horizontal direction for the next feature line and repeat the subsequent processing.

If the determination in step 14404V is negative, that is, if Dist_Line is greater than or equal to the threshold DH1, the extraction means 14 proceeds to step 14407V and calculates Num_Line, the number of feature lines between the starting line STL and line L1.

In step 14408V, the extraction means 14 determines whether Num_Line is greater than a predetermined threshold NH2. If it is, the smallest rectangle containing all feature lines between the starting line STL and line L1 is extracted in step 14409V; otherwise, all feature lines involved in this processing cycle are discarded in step 14410V.

Then, in step 14411V, the extraction means 14 determines whether the above processing has been performed on all feature lines. If it has, the processing of step 144 ends. If any feature line remains unprocessed, the process returns to step 14401V, one line is selected from the unprocessed feature lines, and the subsequent processing is repeated.

By repeating this processing until all vertical feature lines have been processed, the extraction means 14 obtains a series of vertical feature rectangles. The process then proceeds to step 145, in which caption areas are identified from these feature rectangles.

Specifically, for each feature rectangle, the extraction means 14 calculates four features, namely its aspect ratio, area, width, and height, and determines whether each of them falls within a predetermined range. If all four features are within their respective ranges, the feature rectangle is determined to be a vertical caption area; otherwise, the feature rectangle is discarded.

As a result, a plurality of horizontal caption areas and vertical caption areas are obtained. These caption areas can be output as they are. Depending on the situation, however, a detected horizontal caption area and a detected vertical caption area may overlap each other. In this embodiment, therefore, the post-processing means 15 may process the horizontal and vertical caption areas so that they are completely separated and no longer overlap.

FIG. 12 is a flowchart of the processing performed by the post-processing means 15. As shown in FIG. 12, in step 151 the post-processing means 15 arbitrarily selects one caption area from the horizontal candidate caption areas and sets it as A, and arbitrarily selects one caption area from the vertical candidate caption areas and sets it as B.

In step 152, the post-processing means 15 determines whether the two areas A and B intersect, that is, whether A and B have an overlapping portion. If the determination is negative, the process proceeds to step 158, in which it is determined whether every possible pair of a horizontal caption area and a vertical caption area has been processed.

If it is determined in step 152 that A and B have an overlapping portion, the post-processing means 15 proceeds to step 153 and divides areas A and B into three areas C, A0, and B0. Here, C is the portion where A and B overlap, A0 is the part of area A outside the overlapping area, and B0 is the part of area B outside the overlapping area.

Then, in step 154, the post-processing means 15 calculates the main color of each of the three areas C, A0, and B0. To determine the main color, a predetermined number of colors is first fixed, for example the ten colors red, green, yellow, blue, purple, brown, white, black, gray, and cyan. Each pixel in an area is mapped to the one of these ten colors that is closest to it. For each area, the color to which the most pixels are mapped is identified and determined to be the main color of that area.
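The main-color computation of step 154 can be sketched as a nearest-palette vote. The RGB values assigned to the ten color names below are hypothetical, and "closest" is assumed to mean the smallest squared Euclidean distance in RGB space; the text specifies neither.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]

# Hypothetical RGB values for the ten reference colors named in the text.
PALETTE: Dict[str, RGB] = {
    "red": (255, 0, 0),
    "green": (0, 255, 0),
    "yellow": (255, 255, 0),
    "blue": (0, 0, 255),
    "purple": (128, 0, 128),
    "brown": (139, 69, 19),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "gray": (128, 128, 128),
    "cyan": (0, 255, 255),
}


def nearest_color(pixel: RGB) -> str:
    """Map one pixel to the palette color with the smallest squared RGB distance."""
    return min(
        PALETTE,
        key=lambda name: sum((p - q) ** 2 for p, q in zip(pixel, PALETTE[name])),
    )


def main_color(pixels: Iterable[RGB]) -> str:
    """Return the palette color that receives the most pixels of an area."""
    counts = Counter(nearest_color(p) for p in pixels)
    if not counts:
        raise ValueError("the area contains no pixels")
    return counts.most_common(1)[0][0]
```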

In step 155, the post-processing means 15 compares the color-space distances Dist_Color(A0,C) and Dist_Color(B0,C) between area C and the other areas A0 and B0, and determines whether Dist_Color(A0,C) is greater than Dist_Color(B0,C). The color distance is calculated by equation (5).

[Equation (5), the definition of Dist_Color, is missing from this text.]

In equation (5), A and B are the two colors whose distance is calculated; R_A, G_A, and B_A are the values of the red, green, and blue channels of A, and R_B, G_B, and B_B are the values of the red, green, and blue channels of B.
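Because the body of equation (5) is not available here, the following sketch assumes the common choice of Euclidean distance over the three RGB channels. The original formula may differ (for example, a sum of absolute channel differences), so `dist_color` should be read as a placeholder rather than as the patent's definition.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]  # (R, G, B) channel values of a main color


def dist_color(a: RGB, b: RGB) -> float:
    """Distance between two colors.

    Assumption: Euclidean distance over the R, G, and B channels; the text
    names these six channel values but the formula itself is not shown.
    """
    return math.sqrt(sum((ca - cb) ** 2 for ca, cb in zip(a, b)))
```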

If the color distance Dist_Color(A0,C) between areas A0 and C is determined to be smaller than the color distance Dist_Color(B0,C) between areas B0 and C, the post-processing means 15 considers it preferable for the overlapping area C to belong to area A. In step 156, it therefore sets area B to the area obtained by removing the overlapping area C from the original vertical caption area B, and leaves the horizontal caption area A unchanged.

If, on the other hand, the color distance Dist_Color(A0,C) between areas A0 and C is determined to be greater than or equal to the color distance Dist_Color(B0,C) between areas B0 and C, it is considered preferable for the overlapping area C to belong to area B. In step 157, area A is therefore set to the area obtained by removing the overlapping area C from the original horizontal caption area A, and the vertical caption area B is left unchanged. This yields a horizontal caption area A and a vertical caption area B that do not overlap.

Then, in step 158, the post-processing means 15 determines whether every possible pair of a horizontal caption area and a vertical caption area has been processed. If not, the process returns to step 151 and the above processing is performed on another pair. If every possible pair has been processed, the processing of the post-processing means 15 ends and the extracted horizontal and vertical caption areas are output.
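The overlap test of step 152 and the ownership decision of steps 155 to 157 can be sketched as two small functions. Rectangles are assumed to be axis-aligned with exclusive right and bottom edges, and the two color distances are taken as already computed; the function names are hypothetical.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the overlapping area C of A and B, or None if they do not overlap."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if left < right and top < bottom:
        return (left, top, right, bottom)
    return None


def owner_of_overlap(dist_a0_c: float, dist_b0_c: float) -> str:
    """Decide which area keeps the overlap C.

    'A' means C stays with horizontal area A and is removed from B (step 156);
    'B' means C stays with vertical area B and is removed from A (step 157).
    Ties go to B, matching the "greater than or equal" branch in the text.
    """
    return "A" if dist_a0_c < dist_b0_c else "B"
```

Removing C from the losing area generally leaves a non-rectangular region, so an implementation would need a mask or a list of sub-rectangles to represent the result; the text does not say which representation is used.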

As described above, according to the first embodiment of the present invention, a video signal is decoded to generate a plurality of images; the images are divided in time order into image groups of a predetermined size; for each image group, a predetermined number of images is selected based on edge features; the selected images are averaged to obtain an averaged image; and caption areas are extracted from the averaged image based on corner features. Because multiple images are used, and are selected based on edge features before caption areas are extracted, the contrast between background and captions can be increased for videos with dynamic backgrounds, which enables more accurate and robust caption area detection. In addition, because caption areas are extracted based on corner features, the method suits the stroke characteristics of East Asian characters and can detect captions in scripts such as Chinese and Japanese more accurately.

In the second embodiment, given as an example of the present invention, caption areas are extracted at a plurality of scales.

The basic configuration of the caption area extraction apparatus according to the second embodiment is the same as that of the first embodiment: it includes video decoding means, image selecting means, averaging processing means, and extraction means, and may include the same post-processing means as the first embodiment. The difference lies in the processing performed by the extraction means, which is described in detail below. In the following description, parts that are the same as or correspond to those of the first embodiment are given the same reference numerals, and duplicate description is omitted.

In the second embodiment, if the images obtained by decoding the video signal are 720×480 pixels, the extraction means 14 extracts horizontal and vertical caption areas at three scales, 720×480, 360×240, and 180×120, normalizes each extracted area to the 720×480 scale, and merges the areas using the K-means clustering algorithm.

FIG. 13 is a schematic flowchart of the processing performed by the extraction means 14 according to the second embodiment of the present invention. As shown in FIG. 13, the extraction means 14 operates in parallel at three scales: 720×480, 360×240, and 180×120, where 720×480 is the original image size. When processing at 720×480, in step 201a horizontal and vertical candidate caption areas are extracted based on Harris corners from the averaged image at its original 720×480 size. When processing at 360×240, in step 201b the extraction means 14 first reduces the original averaged image to 360×240.

Then, in step 202b, the extraction means 14 extracts horizontal and vertical caption areas based on Harris corners from the reduced 360×240 image, and in step 203b it enlarges the extracted horizontal and vertical caption areas by a factor of two, that is, back to the original 720×480 scale.

Similarly, when processing at 180×120, the extraction means 14 first reduces the original averaged image to 180×120 in step 201c, and in step 202c extracts horizontal and vertical caption areas based on Harris corners from the reduced 180×120 image. Then, in step 203c, it enlarges the extracted horizontal and vertical caption areas by a factor of four, that is, back to the original 720×480 scale.
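Mapping areas found at a reduced scale back to the original scale (steps 203b and 203c) is a multiplication of the rectangle coordinates by the reduction factor. In this minimal sketch, the `SCALE_FACTORS` table and the function name are hypothetical, and areas are assumed to be stored as corner coordinates.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom)

# Enlargement factor needed to return from each working size to 720x480.
SCALE_FACTORS: Dict[Tuple[int, int], int] = {
    (720, 480): 1,
    (360, 240): 2,
    (180, 120): 4,
}


def to_original_scale(rects: List[Rect], size: Tuple[int, int]) -> List[Rect]:
    """Scale rectangles detected at `size` back to the 720x480 coordinate system."""
    factor = SCALE_FACTORS[size]
    return [
        (left * factor, top * factor, right * factor, bottom * factor)
        for (left, top, right, bottom) in rects
    ]
```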

The processing by which the extraction means 14 according to the second embodiment extracts horizontal and vertical caption areas based on Harris corners is the same as the processing described for the first embodiment with reference to FIGS. 6 to 11, so duplicate description is omitted.

The guideline for selecting parameters at the different sizes is to detect captions that are as small as possible at the large size and captions that are as large as possible at the small size. Accordingly, the thresholds for feature lines and feature rectangles can be set relatively small at the large size and relatively large at the small size.

The above yields six groups of caption areas, one for each combination of size and direction. Next, in step 204, the extraction means 14 merges these six groups using the K-means clustering algorithm to obtain clustered caption areas.

The K-means clustering algorithm is a well-known, simple unsupervised learning algorithm. Its basic procedure comprises: (1) arbitrarily selecting K (for example, 5) rectangles from all caption areas as the initial sets and obtaining the center of each set; (2) assigning every caption area to one of these five sets on the principle that the Euclidean distance between the center of the caption area and the center of its set is minimized; (3) for each set, averaging all caption areas in the set and taking the averaged center as the center of the set; and (4) repeating steps (2) and (3) until the centers of the sets no longer change. The caption areas extracted at the different sizes are merged in this way. Because the K-means clustering algorithm is well known in this technical field, it is not described in further detail.
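Steps (1) to (4) can be sketched in plain Python as follows. Two choices in the sketch are assumptions rather than statements from the text: the first K rectangles are used as the initial sets so that the result is deterministic, and each cluster is reported as the coordinate-wise average of its member rectangles. The names `kmeans_merge` and `max_iter` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def center(r: Rect) -> Tuple[float, float]:
    """Center point of a rectangle."""
    return ((r[0] + r[2]) / 2.0, (r[1] + r[3]) / 2.0)


def kmeans_merge(rects: List[Rect], k: int, max_iter: int = 100) -> List[Rect]:
    """Cluster rectangles by center and return one averaged rectangle per cluster."""
    if not 0 < k <= len(rects):
        raise ValueError("k must be between 1 and len(rects)")
    centers = [center(r) for r in rects[:k]]  # step (1), deterministic choice
    clusters: List[List[Rect]] = []
    for _ in range(max_iter):
        # Step (2): assign each rectangle to the nearest cluster center.
        clusters = [[] for _ in range(k)]
        for r in rects:
            cx, cy = center(r)
            nearest = min(
                range(k),
                key=lambda j: (cx - centers[j][0]) ** 2 + (cy - centers[j][1]) ** 2,
            )
            clusters[nearest].append(r)
        # Step (3): recompute each center as the mean of its members' centers.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                xs = [center(m)[0] for m in members]
                ys = [center(m)[1] for m in members]
                new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # Step (4): stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    merged: List[Rect] = []
    for members in clusters:
        if members:
            n = len(members)
            merged.append(tuple(sum(m[i] for m in members) / n for i in range(4)))
    return merged
```

Comparing squared distances in step (2) selects the same cluster as comparing Euclidean distances, since the square root is monotonic.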

The processing performed by the means other than the extraction means 14 is the same as that described for the first embodiment, so duplicate description is omitted.

According to the second embodiment of the present invention, caption areas are extracted at a plurality of sizes, and the resulting groups of caption areas are clustered and merged. In addition to the advantages of the first embodiment, therefore, all captions present in the video can be extracted more accurately.

The present invention has been described above by way of exemplary embodiments. As can be seen, the gist of the present invention is to provide processing for extracting caption areas from a video signal, comprising: decoding the video signal to generate a plurality of images; dividing the images in time order into image groups of a predetermined size and selecting, for each image group, a predetermined number of images based on edge features; averaging the selected images of each image group to obtain an averaged image; and extracting corners from the averaged image and extracting caption areas from the averaged image based on those corners. The present invention is, of course, not limited to the details described above.

For example, the above description extracts caption areas based on Harris corners, but the Harris corner is only one example. The present invention is not limited to Harris corners, and other corner detectors may be used.

Likewise, the above description merges the caption areas obtained at a plurality of sizes using the K-means clustering algorithm, but other clustering algorithms may be used.

Further, in the second embodiment above, the extraction means extracts caption areas at three sizes. The invention is not limited to three sizes, however; caption areas may be extracted at two sizes or at four sizes and then clustered.

The present invention has been described above in terms of a caption area extraction apparatus. The present invention can, however, also be implemented as the caption area extraction method realized by this apparatus. It can further be implemented as a program that causes a computer to execute this caption area extraction method, and as a computer-readable medium on which the program is recorded.

The following supplementary notes are further disclosed with respect to the embodiments, including Embodiments 1 and 2 above.

A schematic diagram showing an example of a caption screen.
A schematic block diagram of the caption area extraction device according to Embodiment 1 of the present invention.
A schematic block diagram of the video decoding means in the caption area extraction device according to Embodiment 1 of the present invention.
A schematic diagram for explaining the processing performed by the image selection means in the caption area extraction device.
A schematic flowchart of the processing performed by the extraction means in the caption area extraction device.
A flowchart of the operation by which the extraction means identifies Harris corners.
A flowchart of the operation by which the extraction means selects corners.
A flowchart of the operation by which the extraction means extracts horizontal feature lines.
A flowchart of the operation by which the extraction means extracts horizontal feature rectangles.
A flowchart of the operation by which the extraction means extracts vertical feature lines.
A flowchart of the operation by which the extraction means extracts vertical feature rectangles.
A flowchart of the processing performed by the post-processing means.
A schematic flowchart of the processing performed by the extraction means according to Embodiment 2 of the present invention.

Explanation of symbols

10 Caption area extraction device
11 Video decoding means
12 Image selection means
13 Averaging means
14 Extraction means
15 Post-processing means
111 Audio/video separator
112 Video decoder

Claims

A caption area extraction device for extracting a caption area from a video signal, comprising:
video decoding means for decoding the video signal to generate a plurality of images;
image selection means for dividing the plurality of images output by the video decoding means, in temporal order, into image groups of a predetermined size, and selecting a predetermined number of images from each image group based on edge features;
averaging means for averaging, for each of the image groups, the images selected by the image selection means to obtain an averaged image; and
extraction means for extracting corners from the averaged image, extracting feature lines based on the density of the corners, and extracting the caption area from the averaged image based on the density of the feature lines.

The caption area extraction device according to claim 1, wherein the extraction means extracts horizontal caption areas and vertical caption areas in the horizontal and vertical directions, respectively, at a plurality of scales, and performs clustering on the caption areas extracted at the plurality of scales.

The caption area extraction device according to claim 1, wherein the extraction means extracts Harris corners from the averaged image, calculates, for each Harris corner, the number of Harris corners within a window of a predetermined size centered on that Harris corner, and discards the Harris corner if the number is less than a predetermined threshold.
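The corner filtering recited in the preceding claim can be sketched as follows. This is an illustrative reading only: the square window measured in pixels, the inclusion of the corner itself in its own count, and the name `filter_sparse_corners` are assumptions not fixed by the claim language.

```python
from typing import List, Tuple

Corner = Tuple[int, int]  # (x, y) pixel coordinates


def filter_sparse_corners(corners: List[Corner], window_size: int,
                          min_count: int) -> List[Corner]:
    """Keep only corners that have at least min_count corners (counting
    the corner itself, by assumption) inside a square window of
    window_size pixels centered on them; the rest are discarded."""
    half = window_size // 2
    kept = []
    for (x, y) in corners:
        count = sum(1 for (u, v) in corners
                    if abs(u - x) <= half and abs(v - y) <= half)
        if count >= min_count:
            kept.append((x, y))
    return kept
```

Caption text typically produces many closely spaced corners, so isolated corners removed by such a filter are more likely to belong to the background.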

The caption area extraction device according to claim 1, wherein the extraction means extracts a horizontal caption area and a vertical caption area in the horizontal and vertical directions, respectively, the device further comprising post-processing means for processing the horizontal caption area and the vertical caption area extracted by the extraction means so that the horizontal caption area and the vertical caption area do not overlap each other.

A caption area extraction method for extracting a caption area from a video signal, comprising the steps of:
decoding the video signal to generate a plurality of images;
dividing the plurality of images, in temporal order, into image groups of a predetermined size, and selecting a predetermined number of images from each image group based on edge features;
averaging, for each of the image groups, the selected images to obtain an averaged image; and
extracting corners from the averaged image, extracting feature lines based on the density of the corners, and extracting the caption area from the averaged image based on the density of the feature lines.

The caption area extraction method according to claim 5, wherein, in the step of extracting the caption area, horizontal caption areas and vertical caption areas are extracted in the horizontal and vertical directions, respectively, at a plurality of scales, and clustering is performed on the caption areas extracted at the plurality of scales.

The caption area extraction method according to claim 5, wherein, in the step of extracting the caption area, Harris corners are extracted from the averaged image, the number of Harris corners within a window of a predetermined size centered on each Harris corner is calculated for that Harris corner, and the Harris corner is discarded if the number is less than a predetermined threshold.

The caption area extraction method according to claim 5, wherein, in the step of extracting the caption area, a horizontal caption area and a vertical caption area are extracted in the horizontal and vertical directions, respectively, the method further comprising a post-processing step of processing the horizontal caption area and the vertical caption area so that the horizontal caption area and the vertical caption area do not overlap each other.

A caption area extraction program that causes an information processing device to execute the steps of:
decoding a video signal to generate a plurality of images;
dividing the plurality of images, in temporal order, into image groups of a predetermined size, and selecting a predetermined number of images from each image group based on edge features;
averaging, for each of the image groups, the selected images to obtain an averaged image; and
extracting corners from the averaged image, extracting feature lines based on the density of the corners, and extracting a caption area from the averaged image based on the density of the feature lines.

The caption area extraction program according to claim 9, wherein, in the step of extracting the caption area, horizontal caption areas and vertical caption areas are extracted in the horizontal and vertical directions, respectively, at a plurality of scales, and clustering is performed on the caption areas extracted at the plurality of scales.

The caption area extraction program according to claim 9, wherein, in the step of extracting the caption area, Harris corners are extracted from the averaged image, the number of Harris corners within a window of a predetermined size centered on each Harris corner is calculated for that Harris corner, and the Harris corner is discarded if the number is less than a predetermined threshold.

The caption area extraction program according to claim 9, wherein, in the step of extracting the caption area, a horizontal caption area and a vertical caption area are extracted in the horizontal and vertical directions, respectively, and the program further causes the information processing device to execute a post-processing step of processing the horizontal caption area and the vertical caption area so that the horizontal caption area and the vertical caption area do not overlap each other.