JPS60207983A

JPS60207983A - Production system of dictionary for recognizing character

Info

Publication number: JPS60207983A
Application number: JP59063560A
Authority: JP
Inventors: Masaki Komiya; 小宮　雅紀
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1984-03-31
Filing date: 1984-03-31
Publication date: 1985-10-19

Abstract

PURPOSE: To efficiently create the individual character recognition dictionary and the post-processing dictionary by taking the words targeted by OCR post-processing from a Japanese input device, eliminating duplicate words and characters, and expanding the words to the individual character level. CONSTITUTION: Words, such as kanji words, that are targets of the post-processing of an OCR 13 are input from a Japanese I/O device 10 to a controller 12. Based on the input words and character groups, the controller eliminates duplicate words and characters, extracts characters that cannot be recognized, and expands each word to the individual character level. This yields the basic data needed to create the post-processing dictionary and the individual character recognition dictionary. The controller 12 creates the post-processing dictionary from the basic data, and creates the individual character recognition dictionary by extracting and combining the data for the required character types from an all-character-type dictionary memory 14. The resulting dictionaries are stored in a post-processing dictionary memory 15 and an individual character recognition dictionary memory 17, respectively.

Description

DETAILED DESCRIPTION OF THE INVENTION [Technical Field of the Invention] The present invention relates to a character recognition dictionary creation system for creating dictionaries for kanji OCR.

[Technical Background of the Invention and Problems Therewith] In recent years, advances in pattern recognition technology have led to the development of OCR (optical character reader) devices that read Japanese text including kanji. In such a kanji OCR, two character recognition dictionaries are particularly important: the individual character recognition dictionary and the post-processing dictionary. The individual character recognition dictionary consists of data combining the standard patterns of the characters that the OCR is to read. The post-processing dictionary is a dictionary for collation and editing that exploits context and the relationships among hierarchically structured data.

In such an OCR, however, the number of kanji character types is very large, so it is difficult to efficiently create the character recognition dictionaries described above for recognizing those kanji.

[Object of the Invention] The present invention has been made in view of the above, and its object is to provide a character recognition dictionary creation system that can efficiently create the individual character recognition dictionary and the post-processing dictionary used in kanji OCR.

[Summary of the Invention] The present invention provides Japanese input means that converts Japanese information into code data and inputs it. It also provides an all-character-type dictionary memory that stores all Japanese character types in advance. Post-processing dictionary creation means performs Japanese-language processing, such as eliminating duplicate words and characters and expanding words into individual characters, on the words and individual characters that are input from the Japanese input means as targets of OCR post-processing, and thereby creates a post-processing dictionary for OCR. The post-processing dictionary created by this means is stored in a post-processing dictionary memory.

Individual character recognition dictionary creation means extracts and combines the data for the required character types from the post-processing dictionary and the all-character-type dictionary to create an individual character recognition dictionary. The individual character recognition dictionary created by this means is stored in an individual character recognition dictionary memory.

With this configuration, character recognition dictionaries can be created easily by entering the necessary Japanese information through the Japanese input means.

[Embodiment of the Invention] An embodiment of the present invention will be described below with reference to the drawing. The figure is a block diagram showing the configuration of the embodiment. In the figure, the Japanese input/output device 10 is a terminal consisting of a keyboard and a CRT display device equipped with a Japanese input function such as that used in Japanese word processors. Here, the Japanese input function is either a function that converts entered kuten (row-cell) codes, on/kun readings, or abbreviations into kanji, or a function that uses a built-in dictionary and context information to convert text entered in kana into mixed kanji-kana text. The built-in dictionary in this case is the Japanese processing dictionary 11 shown in the figure.
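The kuten-code input mentioned above can be illustrated with a short sketch. It assumes that the kuten codes correspond to JIS X 0208 row and cell numbers and relies on Python's `euc_jp` codec, in which a JIS X 0208 character is encoded as the two bytes (row + 0xA0, cell + 0xA0). The function names `kuten_to_char` and `char_to_kuten` are hypothetical and do not come from the patent, which does not specify how the conversion is implemented.

```python
from __future__ import annotations


def kuten_to_char(ku: int, ten: int) -> str:
    """Convert a kuten (row, cell) pair to a character via EUC-JP."""
    if not (1 <= ku <= 94 and 1 <= ten <= 94):
        raise ValueError("ku and ten must each be in 1..94")
    # In EUC-JP, a JIS X 0208 character is the byte pair (ku + 0xA0, ten + 0xA0).
    return bytes([ku + 0xA0, ten + 0xA0]).decode("euc_jp")


def char_to_kuten(ch: str) -> tuple[int, int]:
    """Return the kuten (row, cell) pair of a JIS X 0208 character."""
    raw = ch.encode("euc_jp")
    # ASCII is one byte; half-width kana and JIS X 0212 start with 0x8E / 0x8F.
    if len(raw) != 2 or raw[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return raw[0] - 0xA0, raw[1] - 0xA0
```

For example, `kuten_to_char(16, 1)` yields the first level-1 kanji, and `char_to_kuten` gives a key by which characters could be grouped by row.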

Based on the data input from the Japanese input/output device 10, the control device 12 creates the individual character recognition dictionary and the post-processing dictionary needed for the character recognition processing of the kanji OCR 13, and stores them in the dictionary memories 17 and 15, respectively.

In addition, the control device 12 outputs to the OCR 13 the various dictionaries needed for character recognition processing, according to the operation of the OCR 13.

The dictionary memory 14 is a memory that stores the all-character-type dictionary in advance. When the post-processing of the OCR 13 requires processing such as the addition of data, the editing unit 16 performs editing operations using the post-processing dictionary in the dictionary memory 15.

The operation of the embodiment with the above configuration will now be described. First, by operator action, words such as kanji words that are targets of the post-processing of the OCR 13 are input from the Japanese input/output device 10 to the control device 12.

At this time, individual characters other than words (for example, kanji, alphanumeric characters, kana, and symbols) are also input if necessary. Based on the input words and character groups, the control device 12 performs the following processes: elimination of duplicate words and duplicate characters, extraction of unrecognizable characters, expansion of words into individual characters, and classification of the individual characters by kuten code or the like. This processing produces the basic data needed to create the post-processing dictionary and the individual character recognition dictionary. The control device 12 creates the post-processing dictionary from this basic data, and then creates the individual character recognition dictionary by extracting and combining the data for the required character types from the post-processing dictionary and the all-character-type dictionary in the dictionary memory 14. The post-processing dictionary and the individual character recognition dictionary created in this way are stored in the dictionary memories 15 and 17, respectively.
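The dictionary creation steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the all-character-type dictionary is modelled as a mapping from a character to its standard pattern, an "unrecognizable" character is taken to mean one with no entry in that mapping, and the post-processing dictionary is reduced to a set of words. The names `BasicData`, `build_basic_data`, and `build_dictionaries` are hypothetical, and classification by kuten code is omitted.

```python
from dataclasses import dataclass


@dataclass
class BasicData:
    words: list           # unique words, input order preserved
    chars: list           # unique characters that have a standard pattern
    unrecognizable: list  # unique characters with no standard pattern


def build_basic_data(words, extra_chars, full_dictionary):
    """Deduplicate words, expand them into characters, and split off
    characters missing from the all-character-type dictionary."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_words = list(dict.fromkeys(words))
    expanded = [ch for word in unique_words for ch in word]
    unique_chars = list(dict.fromkeys(expanded + list(extra_chars)))
    chars = [c for c in unique_chars if c in full_dictionary]
    unrecognizable = [c for c in unique_chars if c not in full_dictionary]
    return BasicData(unique_words, chars, unrecognizable)


def build_dictionaries(basic, full_dictionary):
    """Return (post-processing dictionary, individual character dictionary)."""
    post_dict = set(basic.words)
    # Extract only the required character types from the full dictionary.
    char_dict = {c: full_dictionary[c] for c in basic.chars}
    return post_dict, char_dict
```

With the words 東京, 京都, 東京 and an extra character A, the sketch keeps two words and four characters, so the individual character recognition dictionary holds only the patterns the OCR actually needs.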

Next, when the reading operation of the OCR 13 starts, the control device 12 reads the individual character recognition dictionary from the dictionary memory 17 and outputs it to the OCR 13. The OCR 13 then performs character recognition for a specific range of character types, based on the individual character recognition dictionary. If the characters to be read cannot be limited to a specific range of character types, the control device 12 instead reads the all-character-type dictionary from the dictionary memory 14 and outputs it to the OCR 13. When the OCR 13 then starts post-processing, the control device 12 reads the post-processing dictionary from the dictionary memory 15 and outputs it to the OCR 13. If the post-processing of the OCR 13 requires not only simple processing, such as checking whether a word is present, but also editing operations such as the addition of data, the editing unit 16 performs the editing operation using the post-processing dictionary. The editing unit 16 stores related information, for example postal codes and addresses, or municipality names and prefecture names, and performs editing operations that combine this information with the recognition results of the OCR 13.
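The run-time behaviour described above, choosing a recognition dictionary and then collating and editing the result, might look like the sketch below. It assumes the post-processing dictionary is a set of words and that the related information held by the editing unit is a mapping from a recognized word to the data to be added; both representations, and the names `select_recognition_dictionary` and `post_process`, are hypothetical.

```python
def select_recognition_dictionary(restricted, char_dict, full_dictionary):
    """Use the restricted dictionary when the character range is known,
    otherwise fall back to the all-character-type dictionary."""
    return char_dict if restricted else full_dictionary


def post_process(recognized, post_dict, related_info=None):
    """Collate a recognized word against the post-processing dictionary and,
    if related information is supplied, prepend the associated data."""
    if recognized not in post_dict:
        return None  # word is not in the dictionary: collation fails
    if related_info is None:
        return recognized  # simple presence check only
    extra = related_info.get(recognized)
    return recognized if extra is None else f"{extra}{recognized}"
```

For instance, with related information mapping the municipality 横浜市 to the prefecture 神奈川県, the editing step would turn a recognized 横浜市 into 神奈川県横浜市.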

Note that the individual character recognition dictionary need not have the same data structure as the all-character-type dictionary; it may instead be a table that makes it possible to check whether or not each entry of the all-character-type dictionary is a lookup target.
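The table-based alternative can be sketched as a flag per entry of the all-character-type dictionary. This is one possible reading of the text, not a structure given in the patent; `build_lookup_table` and `candidate_patterns` are hypothetical names, and the all-character-type dictionary is again modelled as a mapping from character to standard pattern.

```python
def build_lookup_table(full_dictionary, required_chars):
    """Map every character in the full dictionary to a flag saying
    whether it is a lookup target."""
    required = set(required_chars)
    return {ch: ch in required for ch in full_dictionary}


def candidate_patterns(full_dictionary, table):
    """Return only the patterns whose characters are flagged in the table."""
    return {ch: pat for ch, pat in full_dictionary.items() if table.get(ch, False)}
```

The design choice here is that the standard patterns are stored once, in the all-character-type dictionary, while the table only records which of them to consult.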

[Effects of the Invention] As described in detail above, according to the present invention, the post-processing dictionary and the individual character recognition dictionary, which are the character recognition dictionaries of a kanji OCR, can be created efficiently by operating a Japanese input/output device. Consequently, for a Japanese OCR whose reading targets include kanji, with their very large number of character types, the necessary character recognition dictionaries can be created efficiently and reliable OCR operation can be achieved.

[Brief Description of the Drawing]

The figure is a block diagram showing the configuration of a character recognition dictionary creation system according to an embodiment of the present invention. 10 ... Japanese input/output device, 12 ... control device, 13 ... OCR, 14 ... all-character-type dictionary memory, 15 ... post-processing dictionary memory, 17 ... individual character recognition dictionary memory. Applicant's agent: Patent attorney Takehiko Suzue

Claims

[Claims]

A character recognition dictionary creation system comprising: Japanese input means for converting Japanese information into code data and inputting it; an all-character-type dictionary memory that stores all Japanese character types in advance; post-processing dictionary creation means for creating a post-processing dictionary for OCR by performing Japanese-language processing, such as eliminating duplicate words and characters and expanding words into individual characters, based on the words and individual characters that are input from the Japanese input means as targets of OCR post-processing; a post-processing dictionary memory that stores the post-processing dictionary created by the post-processing dictionary creation means; individual character recognition dictionary creation means for creating an individual character recognition dictionary by extracting and combining the data for the required character types from the post-processing dictionary and the all-character-type dictionary; and an individual character recognition dictionary memory that stores the individual character recognition dictionary created by the individual character recognition dictionary creation means.