JP2012032885A

JP2012032885A - Apparatus, program and method for recognizing character

Info

Publication number: JP2012032885A
Application number: JP2010169740A
Authority: JP
Inventors: Yoshinobu Hotta; 悦伸堀田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2010-07-28
Filing date: 2010-07-28
Publication date: 2012-02-16
Anticipated expiration: 2030-07-28
Also published as: JP5691281B2

Abstract

PROBLEM TO BE SOLVED: To improve character recognition rates. SOLUTION: The character recognition apparatus includes: a first character recognition unit 214 that performs character recognition on image information to acquire a plurality of recognition result candidate characters for an image region recognized as one character in the image information; a mask image storing unit 222 that stores mask images as character image information; a mask processing unit 215 that refers to the mask image storing unit, acquires the mask image corresponding to each of the recognition result candidate characters, and generates logical product images by calculating the logical product of each mask image and the image region recognized as one character; a second character recognition unit 216 that performs character recognition on each of the logical product images to determine a recognition character, which is the character corresponding to the logical product image, and the similarity between the recognition character and the logical product image; and a deciding unit 217 that decides the character corresponding to the image region recognized as one character from among the recognition characters, on the basis of the similarity between each recognition character and its logical product image as determined by the second character recognition unit.

Description

The present application relates to a character recognition device, a character recognition program, and a character recognition method.

In recent years, DVD (Digital Versatile Disc) devices and HDD (Hard Disk Drive) video devices capable of recording a large number of programs have become widespread. As a result, there is a growing need to search for content within large amounts of recorded data. Broadcasting stations and similar organizations likewise have a growing need to attach searchable text data to previously broadcast video data, so that video can be managed efficiently and searched easily.

One way to make video data searchable is to use character information appearing in the video as a search key. Some videos carry subtitle information known as closed captions, but most videos hold no text information. To attach searchable text data to video data, it is therefore necessary to extract the character information in the video and perform character recognition on it. If the recognition result is to be used as a search key, high recognition accuracy is important. However, the background behind characters appearing in video is often complex rather than uniform, which makes characters displayed on such a background difficult to recognize. Techniques for improving the recognition rate of characters on complex backgrounds have therefore been proposed (for example, Patent Document 1).

Patent Document 1: JP 2008-191906 A

In general character recognition, recognition processing is performed on a binarized image. However, when an image in which characters lie on a complex background is binarized, part of the background may remain in the binarized image. In that case, the remaining part of the background (hereinafter referred to as noise) is treated as part of a character stroke during character recognition and causes erroneous recognition. With the technique of Patent Document 1, noise may still be mixed into the binarized image, so erroneous recognition due to noise can occur.

The present application has been made in view of the above circumstances, and its object is to provide a character recognition device, a character recognition program, and a character recognition method that improve the character recognition rate.

To solve the above problem, a character recognition device disclosed in the specification includes: a first recognition unit that performs character recognition on image information and acquires a plurality of recognition result candidate characters for an image region recognized as one character in the image information; a mask image storage unit that stores mask images, which are image information of characters; a generation unit that refers to the mask image storage unit, acquires, for each of the recognition result candidate characters, the mask image corresponding to that candidate character, and generates a logical product image by taking the logical product of the mask image and the image region recognized as one character; a second recognition unit that performs character recognition on each of the logical product images and determines a recognized character, which is the character corresponding to the logical product image, and the similarity between the recognized character and the logical product image; and a determination unit that determines, from among the plurality of recognized characters, the character corresponding to the image region recognized as one character, based on the similarity between each recognized character determined by the second recognition unit and its logical product image.

To solve the above problem, a character recognition program disclosed in the specification causes a computer to execute: a first recognition step of performing character recognition on image information and acquiring a plurality of recognition result candidate characters for an image region recognized as one character in the image information; a generation step of referring to a mask image storage unit that stores mask images, which are image information of characters, acquiring, for each of the recognition result candidate characters, the mask image corresponding to that candidate character, and generating a logical product image by taking the logical product of the mask image and the image region recognized as one character; a second recognition step of performing character recognition on each of the logical product images and determining a recognized character, which is the character corresponding to the logical product image, and the similarity between the recognized character and the logical product image; and a determination step of determining, from among the plurality of recognized characters, the character corresponding to the image region recognized as one character, based on the similarity between each recognized character determined in the second recognition step and its logical product image.

To solve the above problem, in a character recognition method disclosed in the specification, a computer executes: a first recognition step of performing character recognition on image information and acquiring a plurality of recognition result candidate characters for an image region recognized as one character; a generation step of referring to a mask image storage unit that stores mask images, which are image information of characters, acquiring, for each of the recognition result candidate characters, the mask image corresponding to that candidate character, and generating a logical product image by taking the logical product of the mask image and the image region recognized as one character; a second recognition step of performing character recognition on each of the logical product images and determining a recognized character, which is the character corresponding to the logical product image, and the similarity between the recognized character and the logical product image; and a determination step of determining, from among the plurality of recognized characters, the character corresponding to the image region recognized as one character, based on the similarity between each recognized character determined in the second recognition step and its logical product image.

The character recognition device, character recognition program, and character recognition method disclosed in the specification improve the character recognition rate.

FIG. 1 is a diagram showing an example of the system configuration of a video management system including the character recognition device of the present application.
FIG. 2 is a diagram showing an example of the hardware configuration of the character recognition device.
FIG. 3 is a functional block diagram showing an example of the functions of the character recognition device.
FIG. 4 is a diagram showing an example of the images obtained by each process performed by the preprocessing unit.
FIG. 5 is a diagram showing an example of a Laplacian filter.
FIG. 6 is a diagram outlining the cutting out of character images.
FIG. 7 is a diagram showing an example of mask images corresponding to recognition result candidate characters.
FIG. 8 is a flowchart showing an example of the processing executed by the character recognition device.
FIG. 9 is a diagram showing an example of a character recognition result produced by the first character recognition unit.
FIG. 10 is a flowchart showing an example of the detailed mask processing.
FIG. 11 is a diagram showing the rules of the logical product (AND) operation.
FIG. 12 is a diagram showing an example of logical product images obtained by taking the logical product of a character image and the mask image corresponding to each recognition result candidate character.
FIG. 13 is a diagram showing an example of the character recognition result of each logical product image produced by the second character recognition unit.
FIG. 14 is a diagram showing an example of the character recognition result of a character image containing "犬" (dog).

Hereinafter, an embodiment of the present application will be described with reference to the accompanying drawings.

First, an example of the system configuration of a video management system including the character recognition device of the present application will be described with reference to FIG. 1. As shown in FIG. 1, the video management system 100 includes a video input device 10, a character recognition device 20, an operation input device 30, and a video data storage unit 40.

The video input device 10 is, for example, a receiver that receives television video. The video input device 10 inputs the video from which character information is to be extracted to the character recognition device 20. The video input device 10 also stores the video data in the video data storage unit 40.

The character recognition device 20 is provided in, for example, an HDD video device. The character recognition device 20 performs character recognition on the video data input from the video input device 10 and acquires character information. The character recognition device 20 outputs the character recognition result as text data to a display device included in the operation input device 30. When an input adopting the recognition result is received from the operation input device 30, the character recognition device 20 stores the recognition result in the video data storage unit 40.

The operation input device 30 receives the character recognition result from the character recognition device 20 and displays it on the display device, thereby presenting the result of character recognition to the user. The operation input device 30 also receives predetermined operation inputs from the user. Specifically, the operation input device 30 receives the user's decision to adopt or reject the character recognition result and outputs that decision to the character recognition device 20. When the operation input device 30 receives a correction to the recognition result from the user, it outputs the correction data to the character recognition device 20.

The video data storage unit 40 is, for example, a hard disk drive provided in an HDD video device. The video data storage unit 40 receives, from the character recognition device 20, text data obtained by converting the character information contained in the video data into text. The video data storage unit 40 stores the received text data in association with the video data input from the video input device 10. As a result, the character information contained in the video data is made available to the user as a search key for searching the video data stored in the video data storage unit 40.

Next, an example of the hardware configuration of the character recognition device 20 will be described with reference to FIG. 2. As its hardware configuration, the character recognition device 20 includes, for example, an input/output unit 201, a ROM (Read Only Memory) 202, a central processing unit (CPU) 203, a RAM (Random Access Memory) 204, and an HDD 205.

The input/output unit 201 exchanges data with the video input device 10, the operation input device 30, and the video data storage unit 40. The ROM 202 stores, among other things, a program for executing the character recognition processing. The CPU 203 reads and executes the program stored in the ROM 202. The RAM 204 holds temporary data used while the program is executed. The HDD 205 stores the dictionary and the mask images (described in detail later) used for the character recognition processing.

Next, an example of the functions of the character recognition device 20, which are realized by the CPU 203 executing the program stored in the ROM 202, will be described. FIG. 3 is a functional block diagram showing an example of the functions of the character recognition device 20.

The character recognition device 20 includes a video reception unit 211, a preprocessing unit 212, a recognition target image input unit 213, a first character recognition unit 214, a mask processing unit 215, a second character recognition unit 216, a determination unit 217, and an output unit 218. The video reception unit 211 through the output unit 218 are realized by the CPU 203 executing the program stored in the ROM 202. The character recognition device 20 also includes a dictionary storage unit 221 and a mask image storage unit 222.

The dictionary storage unit 221 and the mask image storage unit 222 are storage devices such as the HDD 205. The dictionary storage unit 221 stores a reference feature vector for each character (the dictionary) used for character recognition. The mask image storage unit (mask image storing unit) 222 stores the mask images (described in detail later) used for mask processing.

The video reception unit 211 receives video data containing character information from the video input device 10. The character information contained in the video is, for example, a telop (on-screen caption) that renders a performer's comment or a news item as text. The video reception unit 211 cuts out a telop area image containing the telop from the received video data and outputs the cut-out telop area image to the preprocessing unit 212.

The preprocessing unit 212 receives the telop area image from the video reception unit 211. The preprocessing unit 212 performs contour detection, black-and-white inversion, blurring, binarization, and noise removal on the telop area image. The processes executed by the preprocessing unit 212 and the images they produce are described below.

FIG. 4 is a diagram showing an example of the images obtained by each process executed by the preprocessing unit 212. In the following description, it is assumed that the video reception unit 211 has received the video shown in FIG. 4(A) from the video input device 10. The video shown in FIG. 4(A) contains the telop "浮いてた" ("was floating"). In this case, the video reception unit 211 outputs the image shown in FIG. 4(B) to the preprocessing unit 212 as the telop area image.

The preprocessing unit 212 first performs contour detection. The preprocessing unit 212 detects the edges of the characters using, for example, a Laplacian filter, and thereby detects the character contour lines. Specifically, the preprocessing unit 212 weights the pixel of interest using the filter shown in FIG. 5. Subtracting the surrounding pixel values from the weighted value of the pixel of interest emphasizes that pixel, which makes it possible to detect character edges. FIG. 4(C) shows an example of the image after contour detection. The filter used to detect contour lines is not limited to a Laplacian filter; a Sobel filter or the like may be used instead.
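The contour detection step can be sketched as a 3 × 3 convolution. The following is a minimal illustration only: the kernel weights (8 at the centre, -1 for each neighbour) are an assumption, since the coefficients of FIG. 5 are not reproduced in the text, and the function name `detect_edges` and the choice to leave border pixels at 0 are likewise hypothetical.

```python
# Assumed 3x3 Laplacian kernel (8-neighbour form); the filter of FIG. 5 may differ.
LAPLACIAN = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def detect_edges(gray):
    """Return the clipped absolute Laplacian response of a grayscale image.

    gray is a list of rows of integers in 0..255. Border pixels are left
    at 0 because the 3x3 window does not fit there (a simplification).
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Weighted centre pixel minus the surrounding pixels.
                    acc += LAPLACIAN[dy + 1][dx + 1] * gray[y + dy][x + dx]
            out[y][x] = min(255, abs(acc))
    return out
```

In a uniform area the weighted centre and the neighbours cancel, so the response is 0; only pixels next to an intensity change produce a non-zero value.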

Next, the preprocessing unit 212 inverts black and white in the contour-detected image shown in FIG. 4(C). FIG. 4(D) shows an example of the image after black-and-white inversion. The preprocessing unit 212 then blurs the inverted image (FIG. 4(D)). Specifically, the preprocessing unit 212 applies a Gaussian filter to every pixel in the image. The blurring emphasizes the character portions, where black pixels are dense, relative to the background portions, where black pixels are sparse. FIG. 4(E) shows an example of the image after blurring.
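A rough sketch of the inversion and blurring steps follows. The 3 × 3 Gaussian kernel, the function names, and the handling of border pixels (copied unchanged) are assumptions made for illustration; the text does not specify the kernel size or weights.

```python
# Assumed 3x3 Gaussian kernel; the weights sum to 16.
GAUSS_3X3 = [
    [1, 2, 1],
    [2, 4, 2],
    [1, 2, 1],
]


def invert(img):
    """Swap black and white in an 8-bit grayscale image (list of rows)."""
    return [[255 - v for v in row] for row in img]


def gaussian_blur(img):
    """Blur interior pixels with GAUSS_3X3; border pixels are copied as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(
                GAUSS_3X3[dy + 1][dx + 1] * img[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            out[y][x] = acc // 16
    return out
```

With this kernel an isolated black pixel (0) surrounded by white (255) is lifted to 191, whereas a black pixel inside a dense black area stays black. This is the effect the description relies on to separate dense character strokes from sparse background residue.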

Next, the preprocessing unit 212 binarizes the blurred image. Specifically, the preprocessing unit 212 sets a density threshold, turns pixels whose density is below the threshold into white pixels, and turns pixels whose density is at or above the threshold into black pixels. FIG. 4(F) shows an example of the image after binarization.
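The thresholding rule can be written in a few lines. In this sketch, density is taken to be 255 minus the luminance, black is encoded as 1 and white as 0, and the default threshold of 128 is arbitrary; all three are assumptions, as the text does not give concrete values.

```python
def binarize(gray, density_threshold=128):
    """Binarize an 8-bit grayscale image given as a list of rows.

    Density is assumed to be 255 - luminance. Pixels with density below
    the threshold become white (0); all others become black (1).
    """
    return [
        [1 if (255 - v) >= density_threshold else 0 for v in row]
        for row in gray
    ]
```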

Finally, the preprocessing unit 212 removes noise from the binarized image of FIG. 4(F). Specifically, the preprocessing unit 212 treats any group of connected black pixels whose pixel count is at or below a threshold as isolated noise and removes it. FIG. 4(G) shows an example of the image after noise removal.
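One way to realize this noise removal is connected-component labelling with a breadth-first search, sketched below. The use of 8-connectivity, the default threshold of 2 pixels, and the function name are assumptions; the text states only that connected black pixel groups at or below a threshold size are removed.

```python
from collections import deque


def remove_isolated_noise(binary, max_noise_size=2):
    """Erase connected groups of black pixels (1) with at most max_noise_size pixels.

    binary is a list of rows of 0/1 values. 8-connectivity is assumed.
    """
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if binary[sy][sx] != 1 or seen[sy][sx]:
                continue
            # Collect every pixel of the component containing (sy, sx).
            component = []
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) <= max_noise_size:
                for y, x in component:
                    out[y][x] = 0
    return out
```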

The preprocessing unit 212 outputs the noise-removed image (FIG. 4(G)) to the recognition target image input unit 213 as the image to be subjected to character recognition (hereinafter referred to as the recognition target image).

The recognition target image input unit 213 receives the recognition target image from the preprocessing unit 212. As indicated by the solid rectangles in FIG. 6, the recognition target image input unit 213 treats the height of the character string as the character size (both height and width) and cuts out the characters contained in the recognition target image. The recognition target image input unit 213 then outputs each cut-out single-character image region (hereinafter referred to as a character image) to the first character recognition unit 214.
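The cutting-out step can be illustrated as follows, under simplifying assumptions not stated in the text: the recognition target image contains exactly one line of text, its height equals the line height, the characters sit on a fixed pitch starting at the left edge, and completely blank cells are skipped. The function name `cut_characters` is hypothetical.

```python
def cut_characters(binary):
    """Split a one-line binary text image into square character cells.

    The side of each cell equals the image height, which is taken as the
    character size in both directions. Blank cells are dropped.
    """
    height = len(binary)
    width = len(binary[0])
    cells = []
    for left in range(0, width, height):
        cell = [row[left:left + height] for row in binary]
        if any(any(row) for row in cell):
            cells.append(cell)
    return cells
```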

The first character recognition unit (first recognition unit) 214 performs character recognition on the character image using the dictionary stored in the dictionary storage unit 221. As its recognition method, the first character recognition unit 214 can use, for example, the weighted direction index histogram method ("Handwritten Kanji and Hiragana Recognition by the Weighted Direction Index Histogram Method," IEICE Journal (D), vol. J70-D/7, pp. 1390-1397, July 1987). The weighted direction index histogram method is robust against character deformation and changes in stroke thickness.

In the weighted direction index histogram method, the input image is normalized to a fixed size, for example 48 × 48 pixels. The normalized image is then divided into 16 × 16 small regions, and for each small region the frequency (histogram) of black pixel runs in the vertical, horizontal, diagonally upward, and diagonally downward directions is examined. The histograms for these four directions, arranged in sequence, form the feature vector. The first character recognition unit 214 compares this feature vector with the reference feature vector of each character stored in the dictionary storage unit 221 (the dictionary) and obtains the distance between the vectors. The distance between vectors represents the similarity between the character contained in the character image and the character corresponding to the reference feature vector: a small distance means that the similarity is high. The first character recognition unit 214 outputs, for example, the characters corresponding to a predetermined number of reference feature vectors, taken in ascending order of distance, to the mask processing unit 215 as recognition result candidate characters.
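The idea of a per-region directional histogram compared against a dictionary by distance can be sketched as below. This is a heavily simplified stand-in rather than the published weighted direction index histogram method: it omits size normalization and weighting, counts only whether a black pixel has a black neighbour in each of four directions, and uses Euclidean distance. The function names and the dictionary layout (character mapped to reference vector) are assumptions, and the image side must be divisible by `grid`.

```python
import math

# (dy, dx) steps: horizontal, vertical, diagonally upward, diagonally downward.
DIRECTIONS = [(0, 1), (1, 0), (-1, 1), (1, 1)]


def direction_features(binary, grid):
    """Build a feature vector of per-region direction counts.

    binary is a square 0/1 image whose side is divisible by grid
    (for example 48 x 48 with grid=16).
    """
    size = len(binary)
    cell = size // grid
    hist = [0] * (grid * grid * len(DIRECTIONS))
    for y in range(size):
        for x in range(size):
            if not binary[y][x]:
                continue
            region = (y // cell) * grid + (x // cell)
            for d, (dy, dx) in enumerate(DIRECTIONS):
                ny, nx = y + dy, x + dx
                if 0 <= ny < size and 0 <= nx < size and binary[ny][nx]:
                    hist[region * len(DIRECTIONS) + d] += 1
    return hist


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def rank_candidates(feature, dictionary):
    """Return (character, distance) pairs sorted by ascending distance."""
    ranked = sorted((distance(feature, ref), ch) for ch, ref in dictionary.items())
    return [(ch, dist) for dist, ch in ranked]
```

The first entries of the ranked list play the role of the recognition result candidate characters: the smaller the distance, the higher the similarity.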

The mask processing unit (logical product image generation unit) 215 acquires the mask image corresponding to each recognition result candidate character from the mask image storage unit 222. The mask image corresponding to a recognition result candidate character is an image of that character in a predetermined font. In this embodiment, the mask image is an image of the character in a Gothic typeface, as one example of a predetermined font, on a plain background. Suppose, for example, that for the character image containing "浮" shown in FIG. 6 the recognition result candidate characters are "湾", "溝", and "浮". In this case, the mask images corresponding to the candidate characters are, for example, the images shown in FIG. 7. For each recognition result candidate character, the mask processing unit 215 obtains a logical product image by taking the logical product of the character image and the mask image. The mask processing unit 215 outputs the logical product images to the second character recognition unit 216.
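The logical product itself is a pixel-wise AND. The sketch below assumes that both images are binary, have the same dimensions, and encode black as 1, so that a pixel stays black only where both the character image and the mask are black; the function name is hypothetical and the exact rule table of the specification is not reproduced here.

```python
def and_image(char_image, mask_image):
    """Return the pixel-wise logical product of two equally sized 0/1 images."""
    return [
        [c & m for c, m in zip(char_row, mask_row)]
        for char_row, mask_row in zip(char_image, mask_image)
    ]
```

Black pixels of the character image that fall outside the strokes of the mask, such as background residue, are cleared, while pixels under the mask strokes are kept. When the mask matches the true character, most of the character survives and most of the noise does not.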

The second character recognition unit (second recognition unit) 216 performs character recognition on each logical product image. The recognition method used by the second character recognition unit 216 is the same as that of the first character recognition unit 214, so its description is omitted. The second character recognition unit 216 outputs the character recognition result for each logical product image (the recognition result candidate character of that logical product image) to the determination unit 217. Unlike the first character recognition unit 214, however, the second character recognition unit 216 determines only the candidate character with the smallest distance value as the recognition result candidate character of each logical product image. The second character recognition unit 216 also outputs that distance value to the determination unit 217 together with the candidate character. The recognition method used by the first character recognition unit 214 and the one used by the second character recognition unit 216 may differ.

The determination unit 217 receives the character recognition result and distance value for each logical product image from the second character recognition unit 216. From the received recognition results, the determination unit 217 determines the character to be adopted as the recognition result of the character image. The determination unit 217 outputs the character determined as the recognition result to the output unit 218.
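The passage above does not spell out the decision rule. One plausible reading, consistent with the statement that the decision is based on similarity, is to adopt the recognized character with the smallest distance value. The sketch below implements only that assumed rule; the function name and the distance values in the usage example are hypothetical.

```python
def decide(results):
    """Pick the recognized character with the smallest distance value.

    results holds one (recognized_char, distance) pair per logical
    product image. Choosing the minimum distance is an assumed rule.
    """
    best_char, _ = min(results, key=lambda r: r[1])
    return best_char


# Hypothetical distance values for three logical product images.
example = [("湾", 310.0), ("溝", 295.0), ("浮", 120.0)]
chosen = decide(example)
```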

The output unit 218 receives the character determined as the recognition result of a character image from the determination unit 217. Once the output unit 218 has received the recognition results for all character images cut out by the recognition target image input unit 213, it merges them. In the case of FIG. 6, for example, the output unit 218 receives and merges the recognition results for the character images containing "浮", "い", "て", and "た". The output unit 218 outputs the merged recognition result ("浮いてた") to the display device of the operation input device 30 as the recognition result of the recognition target image. The output unit 218 receives the decision to adopt or reject the recognition result from the operation input device 30. When the recognition result is adopted, the output unit 218 stores it in the video data storage unit 40. The telop in the video is thereby converted into text data and becomes usable as a search key.

Next, an example of the processing executed by the character recognition device 20 will be described with reference to a specific example. FIG. 8 is a flowchart showing an example of the processing executed by the character recognition device 20.

映像受付部２１１は、映像入力装置１０から映像を受付け(ステップＳ１１)、テロップ領域画像を前処理部２１２に出力する。 The video reception unit 211 receives a video from the video input device 10 (step S11), and outputs a telop area image to the preprocessing unit 212.

前処理部２１２は、テロップ領域画像に、図４で説明した前処理を施す（ステップＳ１３）。次に、認識対象画像入力部２１３は、前処理を施した認識対象画像を文字毎に切り出し（ステップＳ１５）、文字画像を取得する（ステップＳ１７）。認識対象画像入力部２１３は、取得した文字画像を第１の文字認識部２１４に出力する。 The preprocessing unit 212 performs the preprocessing described with reference to FIG. 4 on the telop area image (step S13). Next, the recognition target image input unit 213 cuts out the recognition target image subjected to the preprocessing for each character (step S15), and acquires a character image (step S17). The recognition target image input unit 213 outputs the acquired character image to the first character recognition unit 214.

第１の文字認識部２１４は、文字画像に対して文字認識を行う（ステップＳ１９）。第１の文字認識部２１４は、認識結果候補文字のうち距離値が小さい方からＸ番目までの文字（上位Ｘ位までの文字）を、マスク処理部２１５に出力する。 The first character recognition unit 214 performs character recognition on the character image (step S19). From among the recognition result candidate characters, the first character recognition unit 214 outputs to the mask processing unit 215 the characters from the smallest distance value up to the X-th smallest (the top X characters).

例えば、第１の文字認識部２１４が、図６に示される「浮」を含む文字画像に対して文字認識を行ったとする。図９は、第１の文字認識部２１４による文字認識結果の一例を示している。第１の文字認識部２１４は、図９に示す文字認識結果において、例えば、上位３位までの文字をマスク処理部２１５に出力する。第１の文字認識部２１４は、図９の認識結果候補文字において距離値が小さい方から３番目までの文字「湾」、「溝」及び「浮」を、マスク処理部２１５に出力する。なお、第１の文字認識部２１４は、上位Ｘ位までの文字ではなく、例えば、距離値がしきい値以下（例えば、２００以下）である文字をマスク処理部２１５に出力してもよい。 For example, suppose that the first character recognition unit 214 performs character recognition on the character image containing “浮” shown in FIG. 6. FIG. 9 shows an example of the character recognition result produced by the first character recognition unit 214. From the result shown in FIG. 9, the first character recognition unit 214 outputs, for example, the top three characters to the mask processing unit 215; that is, it outputs the three recognition result candidate characters of FIG. 9 with the smallest distance values, “湾”, “溝”, and “浮”. Note that, instead of the top X characters, the first character recognition unit 214 may output to the mask processing unit 215 the characters whose distance value is at or below a threshold (for example, 200 or less).
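The two candidate-selection rules described above (top X, or distance value at or below a threshold) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names `top_x` and `within_threshold` are hypothetical, and the distance values are made up, since the numbers in FIG. 9 are not reproduced in the text.

```python
from typing import List, Tuple

# (character, distance value); a smaller distance value means a closer match.
Candidate = Tuple[str, int]


def top_x(candidates: List[Candidate], x: int) -> List[Candidate]:
    """Return the x candidates with the smallest distance values."""
    return sorted(candidates, key=lambda c: c[1])[:x]


def within_threshold(candidates: List[Candidate], threshold: int) -> List[Candidate]:
    """Return every candidate whose distance value is at or below the threshold."""
    ranked = sorted(candidates, key=lambda c: c[1])
    return [c for c in ranked if c[1] <= threshold]


# Hypothetical first-pass result (values are assumptions, not taken from FIG. 9).
first_pass = [("浮", 180), ("湾", 150), ("溝", 165), ("港", 240)]

print(top_x(first_pass, 3))
print(within_threshold(first_pass, 200))
```

With these assumed values both rules select “湾”, “溝”, and “浮”, in that order, which mirrors the situation in the text where the noisy image ranks “湾” ahead of the correct “浮”.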

次に、マスク処理部２１５は、文字画像に対してマスク処理を行う（ステップＳ２１）。ここで、マスク処理の詳細について説明する。図１０は、マスク処理の詳細な処理の一例を示すフローチャートである。 Next, the mask processing unit 215 performs mask processing on the character image (step S21). Here, details of the mask processing will be described. FIG. 10 is a flowchart illustrating an example of detailed processing of the mask processing.

マスク処理部２１５は、第１の文字認識部２１４から受付けた認識結果候補文字に対応するマスク画像を、マスク画像格納部２２２から取得する（ステップＳ２１１）。例えば、マスク処理部２１５は、「湾」、「溝」及び「浮」と対応するマスク画像（図７）をマスク画像格納部２２２から取得する。 The mask processing unit 215 acquires, from the mask image storage unit 222, the mask image corresponding to a recognition result candidate character received from the first character recognition unit 214 (step S211). For example, the mask processing unit 215 acquires the mask images (FIG. 7) corresponding to “湾”, “溝”, and “浮” from the mask image storage unit 222.

マスク処理部２１５は、文字画像とマスク画像との位置合わせを行う（ステップＳ２１３）。実施の一例では、マスク処理部２１５は、文字画像とマスク画像に外接する矩形をそれぞれ抽出する。マスク処理部２１５は、文字画像とマスク画像の外接矩形が同じ大きさになるように線形正規化（縦方向、横方向に伸縮）をする。マスク処理部２１５は、正規化後の文字画像とマスク画像とを重ね合わせる、すなわち文字画像とマスク画像との論理積をとり、論理積画像を生成する（ステップＳ２１５）。具体的には、マスク処理部２１５は、文字画像とマスク画像との間で、画素単位の論理積演算を行う。論理積演算は、図１１（Ａ）に示すルールに基づいて行われる。ここで、表中の「１」は文字部分、すなわち黒画素を示し、「０」は文字以外の背景部分、すなわち白画素を表す。なお、背景部分が白以外の色からなる場合（例えば、青や赤）の場合には、「０」は文字以外の背景部分である青画素や赤画素を表す。 The mask processing unit 215 aligns the character image with the mask image (step S213). In one example, the mask processing unit 215 extracts the circumscribed rectangle of the character image and that of the mask image. The mask processing unit 215 then performs linear normalization (stretching or shrinking in the vertical and horizontal directions) so that the two circumscribed rectangles have the same size. The mask processing unit 215 superimposes the normalized character image and mask image, that is, takes the logical product of the character image and the mask image, to generate a logical product image (step S215). Specifically, the mask processing unit 215 performs a pixel-by-pixel logical product operation between the character image and the mask image. The logical product operation follows the rule shown in FIG. 11(A). In the table, “1” denotes a character portion, that is, a black pixel, and “0” denotes a background portion other than the character, that is, a white pixel. When the background is a color other than white (for example, blue or red), “0” denotes a blue or red pixel belonging to that background.
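The alignment in step S213 (extract each circumscribed rectangle, then stretch both to a common size) might look like the sketch below. It assumes binary images stored as lists of rows with 1 for a character pixel and 0 for background, and it uses nearest-neighbor sampling for the stretch; the text specifies only "linear normalization", so the sampling rule and the names `bounding_box` and `normalize` are assumptions made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = character (black) pixel, 0 = background pixel


def bounding_box(img: Image) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) of the character pixels; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(img) if any(row)]
    cols = [c for c in range(len(img[0])) if any(row[c] for row in img)]
    if not rows:
        raise ValueError("image contains no character pixels")
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def normalize(img: Image, height: int, width: int) -> Image:
    """Crop to the circumscribed rectangle and stretch it linearly to height x width.

    Nearest-neighbor sampling is an assumption; the source only says the
    rectangles are scaled vertically and horizontally to the same size.
    """
    top, left, bottom, right = bounding_box(img)
    src_h, src_w = bottom - top, right - left
    return [
        [img[top + (y * src_h) // height][left + (x * src_w) // width] for x in range(width)]
        for y in range(height)
    ]


small = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
for row in normalize(small, 4, 4):
    print(row)
```

Applying `normalize` with the same target size to both the character image and the mask image gives two images whose circumscribed rectangles coincide, which is the precondition for the pixel-wise logical product.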

図１１（Ａ）によれば、文字画像とマスク画像とにおいて、同じ位置に黒画素が存在する、すなわち、両画像の画素値が「１」の場合にのみ、論理積画像の画素が黒画素（画素値「１」）となる。つまり、文字画像において図１１（Ｂ）に示すようなノイズが存在していたとしても、マスク画像の背景部との論理積をとることによってノイズが除去される。図１２は、文字画像と、各認識結果候補文字と対応するマスク画像との論理積をとった論理積画像を表している。図１２に示すように、認識結果候補文字の「浮」と対応するマスク画像と、文字画像との論理積をとった場合、文字画像中に存在するノイズが除去されている。 According to FIG. 11(A), a pixel of the logical product image is a black pixel (pixel value “1”) only when a black pixel exists at the same position in both the character image and the mask image, that is, only when the pixel values of both images are “1”. Consequently, even if noise such as that shown in FIG. 11(B) is present in the character image, it is removed by taking the logical product with the background portion of the mask image. FIG. 12 shows the logical product images obtained by taking the logical product of the character image and the mask image corresponding to each recognition result candidate character. As shown in FIG. 12, when the logical product of the character image and the mask image corresponding to the recognition result candidate character “浮” is taken, the noise present in the character image is removed.
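The pixel-wise rule of FIG. 11(A) is an ordinary bitwise AND on binary images. The following sketch assumes both images are already the same size and use 1 for character pixels and 0 for background; the 3x3 patterns are toy data chosen for illustration, not the glyphs in the figures.

```python
from typing import List

Image = List[List[int]]  # 1 = character (black) pixel, 0 = background pixel


def logical_product(char_img: Image, mask_img: Image) -> Image:
    """Pixel-wise AND: the output pixel is 1 only where both inputs are 1."""
    same_shape = len(char_img) == len(mask_img) and all(
        len(a) == len(b) for a, b in zip(char_img, mask_img)
    )
    if not same_shape:
        raise ValueError("images must have the same size; normalize them first")
    return [[c & m for c, m in zip(crow, mrow)] for crow, mrow in zip(char_img, mask_img)]


# Toy data: the character image has one noise pixel at the bottom-left
# that falls on the background portion of the mask image.
char_img = [[1, 1, 1], [0, 1, 0], [1, 1, 0]]
mask_img = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]

for row in logical_product(char_img, mask_img):
    print(row)
```

Because the noise pixel coincides with a 0 in the mask, it is cleared in the result while every pixel of the character stroke survives.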

マスク処理部２１５は、第１の文字認識部２１４から受付けた、全ての認識結果候補文字のマスク画像と、文字画像との論理積画像を取得したか否か判定する（ステップＳ２１７）。 The mask processing unit 215 determines whether or not the logical product image of the mask images of all recognition result candidate characters and the character image received from the first character recognition unit 214 has been acquired (step S217).

論理積画像を取得していない認識結果候補文字があれば（ステップＳ２１７の判定がＮＯの場合）、マスク処理部２１５は、次の認識結果候補文字について、ステップＳ２１１〜Ｓ２１５の処理を実行する。全ての認識結果候補文字について論理積画像を取得すれば（ステップＳ２１７の判定がＹＥＳの場合）、マスク処理部２１５は、本処理を終了し、取得した論理積画像を第２の文字認識部２１６に出力する。 If there is a recognition result candidate character for which a logical product image has not yet been acquired (NO in step S217), the mask processing unit 215 executes steps S211 to S215 for the next recognition result candidate character. Once logical product images have been acquired for all the recognition result candidate characters (YES in step S217), the mask processing unit 215 ends this processing and outputs the acquired logical product images to the second character recognition unit 216.

図８に戻り、説明を続ける。第２の文字認識部２１６は、論理積画像のそれぞれに対して、文字認識処理を実行する（ステップＳ２３）。第２の文字認識部２１６は、各論理積画像に対する文字認識の結果（各論理積画像の認識結果候補文字）を決定部２１７に出力する。実施の一例では、各論理積画像（図１３左欄）に対して、第１の文字認識部と同様の文字認識を行う。文字認識によって、文字認識の結果の文字（図１３中欄）と、論理積画像と認識結果の文字との類似度を示す距離値（図１３右欄）を取得し、決定部２１７に出力する。但し、第２の文字認識部２１６は、各論理積画像に対して、距離値の最も小さい認識結果の文字のみを決定部２１７に出力する。 Returning to FIG. 8, the description continues. The second character recognition unit 216 executes character recognition processing on each of the logical product images (step S23). The second character recognition unit 216 outputs the result of character recognition for each logical product image (the recognition result candidate character of each logical product image) to the determination unit 217. In one example, the same character recognition as in the first character recognition unit is performed on each logical product image (left column of FIG. 13). Through this character recognition, the second character recognition unit 216 acquires the recognized character (middle column of FIG. 13) and a distance value indicating the similarity between the logical product image and the recognized character (right column of FIG. 13), and outputs them to the determination unit 217. For each logical product image, however, the second character recognition unit 216 outputs only the recognized character with the smallest distance value to the determination unit 217.

決定部２１７は、第２の文字認識部２１６から文字認識の結果を受付け、受付けた認識結果候補文字の中から、認識結果として出力する文字を決定する（ステップＳ２５）。例えば、第２の文字認識部２１６による各論理積画像の文字認識結果が、図１３に示すとおりであったとする。この場合、決定部２１７は、距離値が最も小さい「浮」を、「浮」を含む文字画像の認識結果として決定する。なお、図１３の距離値は、論理積画像と認識結果候補文字との距離値を示している。 The determination unit 217 receives the character recognition results from the second character recognition unit 216 and determines, from among the received recognition result candidate characters, the character to be output as the recognition result (step S25). For example, suppose that the character recognition results of the second character recognition unit 216 for the logical product images are as shown in FIG. 13. In this case, the determination unit 217 determines “浮”, which has the smallest distance value, as the recognition result for the character image containing “浮”. Note that the distance values in FIG. 13 are the distance values between each logical product image and its recognition result candidate character.
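Step S25 reduces to picking the entry with the smallest distance value among the second-pass results, one entry per logical product image. A minimal sketch follows; the function name `decide` and the distance values are hypothetical, since the numbers in FIG. 13 are not given in the text.

```python
from typing import List, Tuple

# One (recognized character, distance value) pair per logical product image.
SecondPassResult = Tuple[str, int]


def decide(results: List[SecondPassResult]) -> str:
    """Return the recognized character with the smallest distance value."""
    if not results:
        raise ValueError("no second-pass results to choose from")
    return min(results, key=lambda r: r[1])[0]


# Hypothetical second-pass results for the masks of 湾, 溝 and 浮 (assumed values).
second_pass = [("湾", 210), ("溝", 230), ("浮", 90)]
print(decide(second_pass))
```

With these assumed values the call returns “浮”, matching the outcome the text describes for FIG. 13.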

出力部２１８は、切り出された文字画像の全てについて、認識結果として出力する文字を決定したか否か判定する（ステップＳ２７）。 The output unit 218 determines whether or not the character to be output as the recognition result has been determined for all of the clipped character images (step S27).

全ての文字画像について出力する文字を決定していない場合（ステップＳ２７の判定がＮＯの場合）、文字認識装置２０は、次の文字画像について、ステップＳ１７からの処理を実行する。全ての文字画像について文字を決定した場合（ステップＳ２７の判定がＹＥＳの場合）、出力部２１８は、認識結果を、操作入力装置３０の表示装置に出力する（ステップＳ２９）。出力部２１８は、認識結果が採用されたか否か判定する（ステップＳ３１）。出力部２１８は、認識結果が採用された場合には、認識結果を映像データ蓄積部４０に保存し（ステップＳ３３）、本処理を終了する。認識結果が採用されなかった場合には、出力部２１８は、操作入力装置３０から認識結果の修正データを受付け、修正された認識結果を映像データ蓄積部４０に保存し（ステップＳ３５）、本処理を終了する。 If the character to be output has not yet been determined for every character image (NO in step S27), the character recognition device 20 executes the processing from step S17 for the next character image. When characters have been determined for all the character images (YES in step S27), the output unit 218 outputs the recognition result to the display device of the operation input device 30 (step S29). The output unit 218 determines whether the recognition result has been adopted (step S31). When the recognition result is adopted, the output unit 218 stores it in the video data storage unit 40 (step S33) and ends this processing. When the recognition result is not adopted, the output unit 218 receives correction data for the recognition result from the operation input device 30, stores the corrected recognition result in the video data storage unit 40 (step S35), and ends this processing.

以上の説明から明らかなように、本実施形態によれば、認識対象画像入力部２１３が取得した文字画像に対して、第１の文字認識部２１４が文字認識を行い、複数の認識結果候補文字を取得する。マスク処理部２１５が、認識結果候補文字と対応するマスク画像と文字画像との論理積をとった論理積画像を生成し、第２の文字認識部２１６が、各論理積画像に対して文字認識を行う。そして、決定部２１７は、第２の文字認識部２１６による各論理積画像の認識結果候補文字の中から、距離値が最小のものを、文字画像に含まれる文字の認識結果とする。図９で示したように、第１の文字認識部２１４による文字認識では、文字画像に存在するノイズの影響により、文字画像に含まれる「浮」ではなく、「湾」の距離値が最も小さくなっている。しかし、図１２に示したように、文字画像に含まれる文字と対応するマスク画像（図１２の場合「浮」）と文字画像との論理積をとることによって、文字画像の背景部に含まれるノイズを除去できる。その結果、各論理積画像の文字認識においては、「浮」の距離値が最も小さくなり、文字画像に含まれる「浮」が認識結果として取得される。このように、文字画像と認識結果候補文字を含むマスク画像との論理積をとることによって文字画像内のノイズを除去し、ノイズに起因する誤認識の可能性を低減できる。その結果、文字認識の認識率を向上させることができる。 As is clear from the above description, according to the present embodiment, the first character recognition unit 214 performs character recognition on the character image acquired by the recognition target image input unit 213 and acquires a plurality of recognition result candidate characters. The mask processing unit 215 generates logical product images by taking the logical product of the character image and the mask image corresponding to each recognition result candidate character, and the second character recognition unit 216 performs character recognition on each logical product image. The determination unit 217 then takes, from among the recognition result candidate characters that the second character recognition unit 216 obtained for the logical product images, the one with the smallest distance value as the recognition result for the character contained in the character image. As shown in FIG. 9, in the character recognition by the first character recognition unit 214, noise in the character image causes “湾”, rather than the “浮” actually contained in the character image, to have the smallest distance value. However, as shown in FIG. 12, taking the logical product of the character image and the mask image corresponding to the character contained in the character image (“浮” in the case of FIG. 12) removes the noise contained in the background portion of the character image. As a result, in the character recognition of the logical product images, “浮” has the smallest distance value, and the “浮” contained in the character image is obtained as the recognition result. In this way, taking the logical product of the character image and a mask image containing a recognition result candidate character removes noise from the character image and reduces the possibility of misrecognition caused by noise. As a result, the recognition rate of character recognition can be improved.
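The whole two-pass flow summarized above can be exercised end to end on toy data. The sketch below is illustrative only: it swaps in a plain Hamming distance over 3x3 binary templates for the embodiment's recognizer, reuses the templates as mask images, and uses the hypothetical labels "A" (playing the role of “湾”) and "B" (playing the role of “浮”).

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # 1 = character pixel, 0 = background pixel

# Toy 3x3 templates (assumptions); they serve as both dictionary and mask images.
TEMPLATES: Dict[str, Image] = {
    "A": [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    "B": [[1, 1, 0], [1, 0, 0], [1, 1, 0]],
}


def distance(a: Image, b: Image) -> int:
    """Hamming distance (count of differing pixels); a stand-in for the real distance value."""
    return sum(p != q for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def recognize(img: Image) -> List[Tuple[str, int]]:
    """Return (character, distance) pairs sorted by ascending distance."""
    scored = ((ch, distance(img, tpl)) for ch, tpl in TEMPLATES.items())
    return sorted(scored, key=lambda r: r[1])


def logical_product(a: Image, b: Image) -> Image:
    """Pixel-wise AND of two same-sized binary images."""
    return [[p & q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def two_pass(img: Image, top_x: int = 2) -> str:
    """First recognition, masking per candidate, second recognition, then decision."""
    candidates = [ch for ch, _ in recognize(img)[:top_x]]
    second = [recognize(logical_product(img, TEMPLATES[ch]))[0] for ch in candidates]
    return min(second, key=lambda r: r[1])[0]


# "B" plus two noise pixels in the right-hand column.
noisy_b = [[1, 1, 1], [1, 0, 1], [1, 1, 0]]

print(recognize(noisy_b))  # first pass ranks "A" ahead of "B"
print(two_pass(noisy_b))   # second pass on the masked images selects "B"
```

On this input the first pass prefers "A" because the noise pixels make the image look more like it, while the image masked with "B" matches its template exactly, so the decision step selects "B". Real distance values depend on the feature extraction used and will not behave as cleanly as this toy metric.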

特開２００８−１９１９０６号公報では、文字画像に含まれる文字のエッジ領域を膨張させた画像をマスク画像として用いている。しかしながら、エッジ領域を膨張させたマスク画像と文字画像との論理積をとっても、膨張させた箇所にノイズが存在する場合にはノイズを除去することができず、２値化した画像内にノイズが混入してしまう可能性があった。しかしながら、本実施形態によれば、文字画像に含まれる文字を含むマスク画像を用いれば、文字部分を残存させつつ、文字以外の部分に含まれるノイズを除去した論理積画像を生成できる。他方、ノイズを誤認識して選ばれた認識結果候補文字のマスク画像と文字画像では、論理積の画素が減少するので最終的な文字に選択される可能性は格段に低くなる。このように、ノイズの少ない論理積画像を文字認識することによって、ノイズに起因する誤認識の可能性を低減できるため、認識率が向上する。 In Japanese Patent Laid-Open No. 2008-191906, an image obtained by dilating the edge region of a character contained in a character image is used as the mask image. However, even if the logical product of such a dilated-edge mask image and the character image is taken, noise lying within the dilated area cannot be removed, so noise could remain in the binarized image. In contrast, according to the present embodiment, using a mask image that contains the character contained in the character image makes it possible to generate a logical product image in which the character portion is preserved while noise in the non-character portions is removed. On the other hand, for a recognition result candidate character that was selected because noise was misrecognized, the logical product of its mask image and the character image has fewer black pixels, so that candidate is far less likely to be selected as the final character. In this way, performing character recognition on a logical product image with little noise reduces the possibility of misrecognition caused by noise, and the recognition rate therefore improves.

上述の実施形態では、決定部２１７は、第２の文字認識部２１６が出力した認識結果候補文字のうち、距離値が最も小さい認識結果候補文字を認識結果とした。しかしながら、決定部２１７は、認識結果候補文字同士の距離値の差がしきい値以下（例えば、２０以下）である場合には、認識結果とした距離値が最小の認識結果候補文字を選択しても誤っている可能性が高くなる。その場合は、他の認識結果と区別できる表示処理（リジェクト処理）をおこなってもよい。 In the above-described embodiment, the determination unit 217 takes, from among the recognition result candidate characters output by the second character recognition unit 216, the one with the smallest distance value as the recognition result. However, when the difference between the distance values of the recognition result candidate characters is at or below a threshold (for example, 20 or less), the candidate with the smallest distance value is more likely to be wrong even if it is selected as the recognition result. In that case, the determination unit 217 may perform display processing that distinguishes the result from other recognition results (reject processing).

例えば、図１４に示すように、「犬」を含む文字画像の文字認識を文字認識装置２０で行ったとする。「犬」と「大」にそれぞれ対応するマスク画像と、文字画像との論理積画像に対して第２の文字認識部２１６が文字認識処理を行った結果、各論理積画像に対して、「犬」と「大」とが認識結果候補文字として出力されたとする。図１４では、「犬」の距離値は１００であり、「大」の距離値は１２０であり、両文字の距離値の差は２０となっており、しきい値以下である。この場合、決定部２１７は、リジェクト処理として、距離値が最も小さい「犬」を認識結果とはせずに、例えば、リジェクトを表す「Ｒ」を出力部２１８に出力する。あるいは、決定部２１７は、距離値が最も小さい「犬」を認識結果とするが、リジェクト処理として、出力部２１８に対し、文字の認識結果を表示する際に、文字の色を例えば黒ではなく赤に変えるよう指示する。これにより、ユーザは、誤認識されているおそれのある文字を重点的に確認すればよいため、認識結果の確認に必要な時間を短縮できる。また、認識結果の確認時において、確認が必要な文字の見落とし等を低減できるため、映像データの検索のキーとなるテキストデータの正確性を向上できる。 For example, as shown in FIG. 14, suppose that the character recognition device 20 performs character recognition on a character image containing “犬”. Suppose further that the second character recognition unit 216 performs character recognition processing on the logical product images of the character image and the mask images corresponding to “犬” and “大”, and outputs “犬” and “大” as the recognition result candidate characters for the respective logical product images. In FIG. 14, the distance value of “犬” is 100 and the distance value of “大” is 120, so the difference between the two is 20, which is at or below the threshold. In this case, as the reject processing, the determination unit 217 does not adopt “犬”, which has the smallest distance value, as the recognition result, and instead outputs, for example, “R” representing a rejection to the output unit 218. Alternatively, the determination unit 217 adopts “犬”, which has the smallest distance value, as the recognition result but, as the reject processing, instructs the output unit 218 to display the recognized character in a different color, for example red instead of black. The user then only needs to focus on the characters that may have been misrecognized, which shortens the time required to check the recognition result. In addition, because characters that need checking are less likely to be overlooked, the accuracy of the text data used as the key for searching video data can be improved.
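The reject processing can be sketched as a margin check between the best and second-best distance values. The text only says "the difference between the distance values of the recognition result candidate characters", so comparing the two smallest values is an interpretation; the function name `decide_with_reject` and its defaults are hypothetical, while the threshold of 20, the mark "R", and the 犬/大 values come from the FIG. 14 example.

```python
from typing import List, Tuple


def decide_with_reject(
    results: List[Tuple[str, int]], threshold: int = 20, reject_mark: str = "R"
) -> str:
    """Return the best character, or reject_mark when the top two are too close.

    Assumption: the margin is measured between the smallest and the
    second-smallest distance values.
    """
    ranked = sorted(results, key=lambda r: r[1])
    if len(ranked) >= 2 and ranked[1][1] - ranked[0][1] <= threshold:
        return reject_mark
    return ranked[0][0]


print(decide_with_reject([("犬", 100), ("大", 120)]))  # margin 20, rejected
print(decide_with_reject([("犬", 100), ("大", 150)]))  # margin 50, accepted
```

The first call yields "R" because the margin of 20 is not above the threshold; the second yields “犬”. The alternative described in the text, keeping “犬” but flagging it for display in red, would return the character together with a flag instead of replacing it.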

以上、本件の実施形態について詳述したが、本件は係る特定の実施形態に限定されるものではなく、特許請求の範囲に記載された本発明の要旨の範囲内において、種々の変形・変更が可能である。 The embodiment of the present case has been described in detail above; however, the present case is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.

上述の実施形態では、マスク画像は、ゴシック体からなる文字を含んでいたが、マスク画像に含まれる文字のフォントは、ゴシック体に限られない。例えば、明朝体、ポップ体等からなる文字を含む画像をマスク画像に用いてもよい。また、マスク処理部２１５は、特開平７−１５２８５５や特開平８−２３５３１４などに示される周知のフォントを識別する技術を用いてテロップに使用されている文字のフォントを識別し、識別をする文字と同一のフォントのマスク画像を用いて、論理積画像を生成してもよい。テロップに使用されている文字のフォントと、マスク画像が含む文字のフォントとを同一にすることによって、ノイズを除去しつつ、文字部分の再現性が高い論理積画像を生成できる。その結果、文字認識の認識率をさらに向上できる。なお、例えば、第１の文字認識部２１４において、文字／フォント別の辞書を用いて文字認識を行い、距離値が最小となるフォントを、テロップの文字のフォントであると決定できる。この場合、辞書格納部２２１は、辞書として、文字／フォント別に参照用特徴ベクトルを格納する。 In the above-described embodiment, the mask images contain characters in a Gothic typeface, but the font of the characters in the mask images is not limited to Gothic. For example, images containing characters in Mincho, pop, or other typefaces may be used as mask images. The mask processing unit 215 may also identify the font of the characters used in the telop by using a well-known font identification technique, such as those disclosed in Japanese Patent Laid-Open Nos. 7-152855 and 8-235314, and generate the logical product image using a mask image in the same font as the character being recognized. Making the font of the characters in the mask images the same as the font used in the telop makes it possible to generate a logical product image that reproduces the character portion faithfully while removing noise. As a result, the recognition rate of character recognition can be improved further. Note that, for example, the first character recognition unit 214 can perform character recognition using a dictionary organized by character and font, and determine the font that yields the smallest distance value to be the font of the telop characters. In this case, the dictionary storage unit 221 stores, as the dictionary, reference feature vectors for each character and font.

また、上述の実施形態では、映像中のテロップについて文字認識を行ったが、携帯電話やパーソナルコンピュータに付属するカメラで撮った画像に含まれる文字（看板等の文字）の認識にも本件の文字認識装置を使用することができる。また、背景に透かし画像のある帳票に印字された文字や、雑誌等において写真上に印字された文字の認識にも、本件の文字認識装置を使用できる。この場合、文字認識装置２０は、スキャナ等を用いて、文字認識の対象となる画像を入力することができる。 In the above-described embodiment, character recognition is performed on telops in video; however, the character recognition device of the present case can also be used to recognize characters (such as characters on signboards) contained in images taken with a camera attached to a mobile phone or a personal computer. It can also be used to recognize characters printed on a form with a watermark image in the background, or characters printed over a photograph in a magazine or the like. In this case, the character recognition device 20 can take in the image to be recognized by using a scanner or the like.

また、上述の実施形態では、テレビ受像機で受信した映像中の文字認識を例にして説明を行ったが、映像管理システム１００は、放送局等に導入することも可能である。 Further, in the above-described embodiment, description has been made by taking as an example character recognition in video received by a television receiver, but the video management system 100 can also be introduced into a broadcasting station or the like.

また、上述の実施形態では、文字認識の方法として、加重方向指数ヒストグラム法を用いたが、例えば下記のような方法を用いてもよい。
（１）孫寧,田原透,阿曽弘具,木村正行,“方向線素特徴量を用いた高精度文字認識”電子情報通信学会論文誌(D-II) vol.J74-D-II no.3，pp.330-339，Mar. 1991.
（２）萩田他、“外郭方向寄与度特徴による手書き漢字の識別” 電子通信学会論文誌 '83/10 Vol.J66-D No.10, pp.1185-1192
（３）▲裴▼他、“手書き漢字認識の一手法 −多元圧縮法と部分パターン法による認識−”電子通信学会論文誌 '85/4 Vol.J68-D No.4, pp.773-780
（４）斎藤他、“手書漢字の方向パターン・マッチング法による解析”電子通信学会論文誌 '82/5 Vol.J65-D No.5, pp.550-557
In the above-described embodiment, the weighted direction index histogram method is used as the character recognition method; however, methods such as the following may be used instead.
(1) Sun Ning, Toru Tahara, Hirotomo Aso, Masayuki Kimura, “High-Precision Character Recognition Using Directional Line Element Features,” IEICE Transactions (D-II), vol. J74-D-II, no. 3, pp. 330-339, Mar. 1991.
(2) Hagita et al., “Identification of Handwritten Kanji by Outer Direction Contribution Features,” IECE Transactions, '83/10, Vol. J66-D, No. 10, pp. 1185-1192.
(3) 裴 et al., “A Method for Handwritten Kanji Recognition: Recognition by Multi-element Compression Method and Partial Pattern Method,” IECE Transactions, '85/4, Vol. J68-D, No. 4, pp. 773-780.
(4) Saito et al., “Analysis of Handwritten Kanji by Direction Pattern Matching Method,” IECE Transactions, '82/5, Vol. J65-D, No. 5, pp. 550-557.

なお、上記の文字認識装置２０が有する機能は、ＣＰＵ、ＲＯＭ、ＲＡＭ等を備えるコンピュータによって実現することができる。その場合、文字認識装置２０が有すべき機能の処理内容を記述したプログラムが提供される。そのプログラムをコンピュータで実行することにより、上記処理機能がコンピュータ上で実現される。処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録するようにしてもよい。 Note that the functions of the character recognition device 20 described above can be realized by a computer including a CPU, a ROM, a RAM, and the like. In that case, a program describing the processing contents of the functions that the character recognition device 20 should have is provided. By executing the program on a computer, the above processing functions are realized on the computer. The program describing the processing content may be recorded on a computer-readable recording medium.

プログラムを流通させる場合には、例えば、そのプログラムが記録されたＤＶＤ（Digital Versatile Disc）、ＣＤ−ＲＯＭ（Compact Disc Read Only Memory）などの可搬型記録媒体の形態で販売される。また、プログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送してもよい。 When the program is distributed, for example, it is sold in the form of a portable recording medium such as a DVD (Digital Versatile Disc) or a CD-ROM (Compact Disc Read Only Memory) on which the program is recorded. Alternatively, the program may be stored in a storage device of the server computer, and the program may be transferred from the server computer to another computer via a network.

プログラムを実行するコンピュータは、例えば、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、自己の記憶装置に格納する。そして、コンピュータは、自己の記憶装置からプログラムを読み取り、プログラムに従った処理を実行する。なお、コンピュータは、可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行してもよい。また、コンピュータは、サーバコンピュータからプログラムが転送されるごとに、逐次、受け取ったプログラムに従った処理を実行してもよい。 The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. Then, the computer reads the program from its own storage device and executes processing according to the program. Note that the computer may read the program directly from the portable recording medium and execute processing according to the program. Further, each time the program is transferred from the server computer, the computer may sequentially execute processing according to the received program.

また、インターネット等の通信網に接続されたサーバコンピュータを本件の文字認識装置２０とし、これに接続されたパーソナルコンピュータ等からの入力映像に対して文字認識を行うサービスをサーバコンピュータから提供するようにしてもよい（ＡＳＰ(Application Service Provider)）。 Alternatively, a server computer connected to a communication network such as the Internet may serve as the character recognition device 20 of the present case and provide, as an ASP (Application Service Provider), a service that performs character recognition on video input from a personal computer or the like connected to it.

２０…文字認識装置
２１３…認識対象画像入力部
２１４…第１の文字認識部
２１５…マスク処理部
２１６…第２の文字認識部
２１７…決定部
DESCRIPTION OF SYMBOLS
20 ... Character recognition device
213 ... Recognition target image input unit
214 ... First character recognition unit
215 ... Mask processing unit
216 ... Second character recognition unit
217 ... Determination unit

Claims

画像情報について文字認識を行い、該画像情報のうち一文字として認識された画像領域に対して複数の認識結果候補文字を取得する第１の認識部と、
文字の画像情報であるマスク画像が記憶されたマスク画像記憶部と、
前記マスク画像記億部を参照し、前記認識結果候補文字のそれぞれに対して、認識結果候補文字に対応するマスク画像を取得し、該マスク画像と前記一文字として認識された画像領域との論理積をとった論理積画像を生成する生成部と、
前記論理積画像のそれぞれに対して、文字認識を行い、論理積画像に対応する文字である認識文字、および、該認識文字と該論理積画像との類似度を決定する第２の認識部と、
前記第２の認識部が決定した前記認識文字それぞれの前記論理積画像との類似度に基づいて、複数の該認識文字の中から、前記一文字として認識された画像領域に対応する文字を決定する決定部と、
を備えることを特徴とする文字認識装置。
A character recognition device comprising:
a first recognition unit that performs character recognition on image information and acquires a plurality of recognition result candidate characters for an image region recognized as one character in the image information;
a mask image storage unit that stores mask images, each being image information of a character;
a generation unit that refers to the mask image storage unit, acquires, for each of the recognition result candidate characters, the mask image corresponding to that recognition result candidate character, and generates a logical product image by taking the logical product of the mask image and the image region recognized as one character;
a second recognition unit that performs character recognition on each of the logical product images and determines a recognized character, which is the character corresponding to the logical product image, and a similarity between the recognized character and the logical product image; and
a determination unit that determines the character corresponding to the image region recognized as one character from among the plurality of recognized characters, based on the similarity between each recognized character determined by the second recognition unit and its logical product image.

画像情報について文字認識を行い、該画像情報のうち一文字として認識された画像領域に対して複数の認識結果候補文字を取得する第１の認識ステップと、
文字の画像情報であるマスク画像が記憶されたマスク画像記億部を参照し、前記認識結果候補文字のそれぞれに対して、認識結果候補文字に対応するマスク画像を取得し、該マスク画像と前記一文字として認識された画像領域との論理積をとった論理積画像を生成する生成ステップと、
前記論理積画像のそれぞれに対して、文字認識を行い、論理積画像に対応する文字である認識文字、および、該認識文字と該論理積画像との類似度を決定する第２の認識ステップと、
前記第２の認識ステップで決定された前記認識文字それぞれの論理積画像との類似度に基づいて、複数の該認識文字の中から、前記一文字として認識された画像領域に対応する文字を決定する決定ステップと、
をコンピュータに実行させる文字認識プログラム。
A character recognition program that causes a computer to execute:
a first recognition step of performing character recognition on image information and acquiring a plurality of recognition result candidate characters for an image region recognized as one character in the image information;
a generation step of referring to a mask image storage unit that stores mask images, each being image information of a character, acquiring, for each of the recognition result candidate characters, the mask image corresponding to that recognition result candidate character, and generating a logical product image by taking the logical product of the mask image and the image region recognized as one character;
a second recognition step of performing character recognition on each of the logical product images and determining a recognized character, which is the character corresponding to the logical product image, and a similarity between the recognized character and the logical product image; and
a determination step of determining the character corresponding to the image region recognized as one character from among the plurality of recognized characters, based on the similarity between each recognized character determined in the second recognition step and its logical product image.

コンピュータが、
画像情報について文字認識を行い、一文字として認識された画像領域に対して複数の認識結果候補文字を取得する第１の認識ステップと、
文字の画像情報であるマスク画像が記憶されたマスク画像記憶部を参照し、前記認識結果候補文字のそれぞれに対して、認識結果候補文字に対応するマスク画像を取得し、該マスク画像と前記一文字として認識された画像領域との論理積をとった論理積画像を生成する生成ステップと、
前記論理積画像のそれぞれに対して、文字認識を行い、論理積画像に対応する文字である認識文字、および、該認識文字と該論理積画像との類似度を決定する第2の認識ステップと、
前記第２の認識ステップで決定された前記認識文字それぞれの論理積画像との類似度に基づいて、複数の該認識文字の中から、前記一文字として認識された画像領域に対応する文字を決定する決定ステップと、
を実行する文字認識方法。
A character recognition method in which a computer executes:
a first recognition step of performing character recognition on image information and acquiring a plurality of recognition result candidate characters for an image region recognized as one character;
a generation step of referring to a mask image storage unit that stores mask images, each being image information of a character, acquiring, for each of the recognition result candidate characters, the mask image corresponding to that recognition result candidate character, and generating a logical product image by taking the logical product of the mask image and the image region recognized as one character;
a second recognition step of performing character recognition on each of the logical product images and determining a recognized character, which is the character corresponding to the logical product image, and a similarity between the recognized character and the logical product image; and
a determination step of determining the character corresponding to the image region recognized as one character from among the plurality of recognized characters, based on the similarity between each recognized character determined in the second recognition step and its logical product image.