JPS60207983A - Production system of dictionary for recognizing character - Google Patents

Production system of dictionary for recognizing character

Info

Publication number
JPS60207983A
Authority
JP
Japan
Prior art keywords
dictionary
character
processing
post
individual
Prior art date
1984-03-31
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP59063560A
Other languages
Japanese (ja)
Inventor
Masaki Komiya
小宮 雅紀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1984-03-31
Filing date
1984-03-31
Publication date
1985-10-19
Application filed by Toshiba Corp
Priority to JP59063560A
Publication of JPS60207983A
Legal status: Pending


Landscapes

  • Character Discrimination (AREA)

Abstract

PURPOSE: To create the individual character recognition dictionary and the post-processing dictionary efficiently, by entering the words subject to OCR post-processing through a Japanese input device, deleting duplicate words and characters, and expanding the words down to the individual-character level. CONSTITUTION: Words, such as kanji words, that are to be subject to post-processing by the OCR 13 are entered from a Japanese input/output device 10 into a controller 12. Based on the entered words and character groups, the controller 12 deletes duplicate words and characters, extracts unrecognizable characters, and expands each word to the individual-character level. This produces the basic data needed to create the post-processing dictionary and the individual character recognition dictionary. The controller 12 builds the post-processing dictionary from the basic data, and creates the individual character recognition dictionary by extracting and synthesizing the required character-type data from an all-character-type dictionary memory 14. The resulting dictionaries are stored in the post-processing dictionary memory 15 and the individual character recognition dictionary memory 17, respectively.

Description

DETAILED DESCRIPTION OF THE INVENTION [Technical Field of the Invention] The present invention relates to a character recognition dictionary creation system for creating the dictionaries used by a kanji OCR.

[Technical Background of the Invention and Its Problems] In recent years, advances in pattern recognition technology have led to the development of OCRs (optical character readers) that read Japanese text, including kanji. In such kanji OCRs, two character recognition dictionaries are especially important: the individual character recognition dictionary and the post-processing dictionary. The individual character recognition dictionary consists of data combining the standard patterns of the characters the OCR is to read. The post-processing dictionary is a dictionary for collation and editing that exploits context and the relationships among data in a hierarchical structure.

However, because such an OCR must handle a very large number of kanji character types, it is difficult to create the above character recognition dictionaries for those kanji efficiently.

[Object of the Invention] The present invention was made in view of the above, and its object is to provide a character recognition dictionary creation system that can efficiently create the individual character recognition dictionary and the post-processing dictionary used in a kanji OCR.

[Summary of the Invention] The present invention provides Japanese input means that converts Japanese information into code data and inputs it, and an all-character-type dictionary memory that stores every Japanese character type in advance. Post-processing dictionary creation means takes the words and individual characters that are subject to OCR post-processing, as entered through the Japanese input means, and applies Japanese-language processing to them, such as eliminating duplicate words and characters and expanding words into individual characters, to create a post-processing dictionary for the OCR. The post-processing dictionary created in this way is stored in a post-processing dictionary memory.
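The two Japanese-processing steps named above, eliminating duplicate words and characters and expanding words into individual characters, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_post_processing_dictionary` is hypothetical, the post-processing dictionary is modeled as a plain list of words, and any entry of length one is treated as an individual character rather than a word.

```python
from typing import Dict, Iterable, List, Tuple


def build_post_processing_dictionary(entries: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (words, characters) derived from operator-entered entries.

    Hypothetical helper. Multi-character entries are treated as words; every
    entry is also expanded into its individual characters. Duplicates are
    removed while keeping first-seen order.
    """
    words: Dict[str, None] = {}   # insertion-ordered set of words
    chars: Dict[str, None] = {}   # insertion-ordered set of characters
    for entry in entries:
        entry = entry.strip()
        if not entry:
            continue
        if len(entry) > 1:
            words.setdefault(entry, None)   # eliminate duplicate words
        for ch in entry:                    # expand the entry into characters
            chars.setdefault(ch, None)      # eliminate duplicate characters
    return list(words), list(chars)
```

For the entries 東京都, 京都府, 東京都 and 〒, the repeated 東京都 is dropped from the word list and the character list becomes 東, 京, 都, 府, 〒. An insertion-ordered `dict` is used as a set so that de-duplication preserves the operator's entry order.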

Individual character recognition dictionary creation means extracts and synthesizes the data of the required character types from the post-processing dictionary and the all-character-type dictionary to create an individual character recognition dictionary. The individual character recognition dictionary created in this way is stored in an individual character recognition dictionary memory.
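A minimal sketch of the extraction step, assuming the all-character-type dictionary is a mapping from each character to its standard pattern. The pattern is modeled here as a tuple of numbers because the patent does not describe its format, and the name `build_individual_dictionary` is hypothetical. Characters missing from the full dictionary are simply skipped in this sketch.

```python
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical stand-in for a character's standard pattern (template data).
Pattern = Tuple[float, ...]


def build_individual_dictionary(
    all_types: Mapping[str, Pattern],
    required_chars: Iterable[str],
) -> Dict[str, Pattern]:
    """Assemble a reduced dictionary holding only the required character types.

    Characters absent from the all-character-type dictionary are skipped,
    and repeated characters are copied only once.
    """
    individual: Dict[str, Pattern] = {}
    for ch in required_chars:
        if ch in all_types and ch not in individual:
            individual[ch] = all_types[ch]  # copy the standard pattern over
    return individual
```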

With this configuration, character recognition dictionaries can be created easily, simply by entering the required Japanese information through the Japanese input means.

[Embodiment of the Invention] An embodiment of the present invention is described below with reference to the drawing. The figure is a block diagram showing the configuration of the embodiment. In the figure, the Japanese input/output device 10 is a terminal device consisting of a CRT display and a keyboard, equipped with the Japanese input function used in, for example, Japanese word processors. Here, the Japanese input function means either a function that converts input into kanji when the operator enters a kuten code, an on or kun reading, or an abbreviation, or a function that converts a sentence entered in kana into mixed kanji-kana text by using a built-in dictionary and context information. The built-in dictionary in the latter case is the Japanese processing dictionary 11 shown in the figure.
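A kuten code (区点コード) identifies a character by its row (区) and cell (点) in the JIS X 0208 table, each numbered 1 to 94. The patent does not say how the terminal performs this conversion; one way to illustrate it is through EUC-JP, which encodes a JIS X 0208 character as the two bytes 0xA0 + row and 0xA0 + cell, so Python's built-in `euc_jp` codec can convert in both directions. The function names `kuten_to_char` and `char_to_kuten` are hypothetical.

```python
from typing import Tuple


def kuten_to_char(row: int, cell: int) -> str:
    """Convert a JIS X 0208 kuten code (row and cell, each 1-94) to a character."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    # EUC-JP stores a JIS X 0208 character as (0xA0 + row, 0xA0 + cell).
    return bytes([0xA0 + row, 0xA0 + cell]).decode("euc_jp")


def char_to_kuten(ch: str) -> Tuple[int, int]:
    """Convert a JIS X 0208 character back to its (row, cell) kuten code."""
    encoded = ch.encode("euc_jp")
    # ASCII is one byte; half-width kana start with 0x8E; JIS X 0212 uses three bytes.
    if len(encoded) != 2 or encoded[0] < 0xA1:
        raise ValueError(f"{ch!r} is not a JIS X 0208 character")
    return encoded[0] - 0xA0, encoded[1] - 0xA0
```

For example, `kuten_to_char(16, 1)` returns 亜, the first kanji in the table, and `char_to_kuten("あ")` returns `(4, 2)`.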

The control device 12 creates, from the data entered through the Japanese input/output device 10, the individual character recognition dictionary and the post-processing dictionary that the kanji OCR 13 needs for character recognition, and stores them in the dictionary memories 17 and 15, respectively.

In addition, the control device 12 supplies the OCR 13 with the various dictionaries required for character recognition, according to the operation of the OCR 13.

The dictionary memory 14 stores the all-character-type dictionary in advance. When the post-processing of the OCR 13 requires operations such as adding data, the editing unit 16 performs an editing operation using the post-processing dictionary in the dictionary memory 15.

The operation of this embodiment, in the configuration described above, is as follows. First, the operator enters words, such as kanji words, that are to be subject to post-processing by the OCR 13, from the Japanese input/output device 10 into the control device 12.

At this time, individual characters other than words (for example kanji, alphanumerics, kana, and symbols) are also entered if necessary. Based on the entered words and character groups, the control device 12 eliminates duplicate words and characters, extracts unrecognizable characters, expands words into individual characters, and classifies the individual characters by kuten code or similar keys. This processing produces the basic data needed to create the post-processing dictionary and the individual character recognition dictionary. The control device 12 creates the post-processing dictionary from this basic data, and then creates the individual character recognition dictionary by extracting and synthesizing the data for the required character types from the post-processing dictionary and the all-character-type dictionary in the dictionary memory 14. The post-processing dictionary and the individual character recognition dictionary thus created are stored in the dictionary memories 15 and 17, respectively.
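Two of the preparatory steps listed above, extracting unrecognizable characters and classifying individual characters by kuten code, could look like the sketch below. The assumptions are ours, not the patent's: a character counts as unrecognizable when it is absent from a caller-supplied set of readable character types (for instance the keys of the all-character-type dictionary), classification groups characters by their JIS X 0208 row obtained through Python's `euc_jp` codec, and characters outside JIS X 0208, such as ASCII letters, are grouped under the key `None`. `kuten_row` and `prepare_basic_data` are hypothetical names.

```python
from collections import defaultdict
from typing import Container, Dict, Iterable, List, Optional, Tuple


def kuten_row(ch: str) -> Optional[int]:
    """JIS X 0208 row (区) of ch, or None if ch is not a JIS X 0208 character."""
    try:
        encoded = ch.encode("euc_jp")
    except UnicodeEncodeError:
        return None
    if len(encoded) == 2 and encoded[0] >= 0xA1:
        return encoded[0] - 0xA0
    return None


def prepare_basic_data(
    characters: Iterable[str],
    recognizable: Container[str],
) -> Tuple[Dict[Optional[int], List[str]], List[str]]:
    """Split characters into kuten-row groups and a list of unrecognizable ones."""
    by_row: Dict[Optional[int], List[str]] = defaultdict(list)
    unrecognizable: List[str] = []
    seen = set()
    for ch in characters:
        if ch in seen:                        # eliminate duplicate characters
            continue
        seen.add(ch)
        if ch not in recognizable:
            unrecognizable.append(ch)         # no readable character type for it
        else:
            by_row[kuten_row(ch)].append(ch)  # key None collects non-JIS X 0208 characters
    return dict(by_row), unrecognizable
```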

Next, when the OCR 13 starts a reading operation, the control device 12 reads the individual character recognition dictionary from the dictionary memory 17 and outputs it to the OCR 13. The OCR 13 then recognizes characters within the specific range of character types covered by that dictionary. If the characters to be read cannot be limited to a specific range of character types, the control device 12 instead reads the all-character-type dictionary from the dictionary memory 14 and outputs it to the OCR 13. When the OCR 13 then starts post-processing, the control device 12 reads the post-processing dictionary from the dictionary memory 15 and outputs it to the OCR 13.

If the post-processing of the OCR 13 requires not only simple processing, such as checking whether a word is present, but also editing operations such as adding data, the editing unit 16 performs those edits using the post-processing dictionary. The editing unit 16 holds related information, for example postal codes and addresses, or municipality names and prefecture names, and performs editing operations that combine this information with the recognition results of the OCR 13.
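The post-processing described above can be illustrated with a few lookups. This is a hypothetical sketch: the patent names the kinds of related information the editing unit 16 holds (postal codes and addresses, municipality and prefecture names) but not their format, so plain Python mappings are assumed, and `word_exists`, `address_for_postal_code` and `add_prefecture` are illustrative names.

```python
from typing import AbstractSet, Mapping, Optional


def word_exists(word: str, word_dictionary: AbstractSet[str]) -> bool:
    """Simple post-processing: is the recognized word in the dictionary?"""
    return word in word_dictionary


def address_for_postal_code(
    postal_code: str, postal_to_address: Mapping[str, str]
) -> Optional[str]:
    """Editing: return the stored address for an OCR-read postal code, if any."""
    return postal_to_address.get(postal_code)


def add_prefecture(address: str, city_to_prefecture: Mapping[str, str]) -> str:
    """Editing: prepend the prefecture when the OCR result starts with a
    known municipality name; otherwise return the address unchanged."""
    # Try longer names first so a shorter name that is a prefix cannot win.
    for city in sorted(city_to_prefecture, key=len, reverse=True):
        if address.startswith(city):
            return city_to_prefecture[city] + address
    return address
```

With `{"横浜市": "神奈川県"}` as the municipality table, `add_prefecture("横浜市中区", ...)` yields 神奈川県横浜市中区, while an address that matches no known municipality is passed through unchanged.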

Note that the individual character recognition dictionary need not have the same data structure as the all-character-type dictionary; it may instead be a table indicating, for each entry of the all-character-type dictionary, whether that entry is a lookup target.
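The table variant mentioned in the note above, together with the fallback to the all-character-type dictionary, might be sketched like this. Everything here is an assumption for illustration: the full dictionary is modeled as parallel lists of characters and feature tuples, the individual dictionary is a `bytearray` with one flag per entry of the full dictionary, and recognition is a toy nearest-pattern search by squared Euclidean distance, not the OCR's actual matching method. Passing `None` as the table stands for the case in which the characters to be read cannot be limited to a specific range. `AllCharacterDictionary`, `make_index_table` and `recognize` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Tuple[float, ...]  # hypothetical stand-in for a standard pattern


class AllCharacterDictionary:
    """Hypothetical layout of the all-character-type dictionary."""

    def __init__(self, chars: Sequence[str], patterns: Sequence[Pattern]) -> None:
        if len(chars) != len(patterns):
            raise ValueError("chars and patterns must have the same length")
        self.chars: List[str] = list(chars)
        self.patterns: List[Pattern] = list(patterns)
        self.position: Dict[str, int] = {ch: i for i, ch in enumerate(self.chars)}


def make_index_table(full: AllCharacterDictionary, required: Sequence[str]) -> bytearray:
    """Individual dictionary as a table: flag 1 marks a lookup target."""
    table = bytearray(len(full.chars))  # one zero-initialized flag per entry
    for ch in required:
        i = full.position.get(ch)
        if i is not None:
            table[i] = 1
    return table


def recognize(full: AllCharacterDictionary, feature: Pattern,
              table: Optional[bytearray]) -> str:
    """Return the nearest character among the lookup targets.

    table=None falls back to searching the whole dictionary.
    """
    if table is None:
        candidates = list(range(len(full.chars)))
    else:
        candidates = [i for i, flag in enumerate(table) if flag]
    if not candidates:
        raise ValueError("no lookup targets")
    best = min(
        candidates,
        key=lambda i: sum((a - b) ** 2 for a, b in zip(full.patterns[i], feature)),
    )
    return full.chars[best]
```

With the table restricted to あ and A, a feature close to 亜's pattern is matched to the nearest allowed character, あ; with `None`, the search falls back to the full dictionary and returns 亜. This mirrors the embodiment: the restricted dictionary is used when the characters to be read fall within a known range, and the full dictionary otherwise.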

[Effects of the Invention] As described in detail above, the present invention makes it possible to create the post-processing dictionary and the individual character recognition dictionary, which are the character recognition dictionaries of a kanji OCR, efficiently by operating a Japanese input/output device. The character recognition dictionaries needed by a Japanese OCR, whose reading targets include kanji with their very large number of character types, can therefore be created efficiently, and reliable OCR operation can be achieved.

[Brief Description of the Drawings]

The figure is a block diagram showing the configuration of a character recognition dictionary creation system according to an embodiment of the present invention. 10: Japanese input/output device; 12: control device; 13: OCR; 14: all-character-type dictionary memory; 15: post-processing dictionary memory; 17: individual character recognition dictionary memory. Applicant's agent: Patent attorney Takehiko Suzue

Claims (1)

[Claims]
A character recognition dictionary creation system comprising: Japanese input means for converting Japanese information into code data and inputting the code data; an all-character-type dictionary memory that stores all Japanese character types in advance; post-processing dictionary creation means for creating a post-processing dictionary for an OCR by performing Japanese-language processing, such as eliminating duplicate words and characters and expanding words into individual characters, based on the words and individual characters that are subject to OCR post-processing and are input from the Japanese input means; a post-processing dictionary memory that stores the post-processing dictionary created by the post-processing dictionary creation means; individual character recognition dictionary creation means for creating an individual character recognition dictionary by extracting and synthesizing data of the required character types from the post-processing dictionary and the all-character-type dictionary; and an individual character recognition dictionary memory that stores the individual character recognition dictionary created by the individual character recognition dictionary creation means.

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP59063560A JPS60207983A (en) 1984-03-31 1984-03-31 Production system of dictionary for recognizing character

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP59063560A JPS60207983A (en) 1984-03-31 1984-03-31 Production system of dictionary for recognizing character

Publications (1)

Publication Number Publication Date
JPS60207983A 1985-10-19

Family

ID=13232729

Family Applications (1)

Application Number Title Priority Date Filing Date
JP59063560A Pending JPS60207983A (en) 1984-03-31 1984-03-31 Production system of dictionary for recognizing character

Country Status (1)

Country Link
JP (1) JPS60207983A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02230488A (en) * 1989-03-03 1990-09-12 Nec Corp Character recognizing device
JPH02302888A (en) * 1989-05-18 1990-12-14 Nec Corp Word dictionary collating device


Similar Documents

Publication Publication Date Title
JP2726568B2 (en) Character recognition method and device
JPS60207983A (en) Production system of dictionary for recognizing character
JP3253657B2 (en) Document search method
JPS592191A (en) Recognizing and processing system of handwritten japanese sentence
JP2765712B2 (en) Character recognition input device
JPH0256086A (en) Method for postprocessing for character recognition
JPS58123126A (en) Dictionary retrieving device
JPS6395573A (en) Method for processing unknown word in analysis of japanese sentence morpheme
JPS6154559A (en) Japanese word processor
JP2570784B2 (en) Document reader post-processing device
Marukawa et al. A post-processing method for handwritten Kanji name recognition using Furigana information
JPS58123125A (en) Documentation device
JPH0574867B2 (en)
Segert et al. A Computer Program for Analysis of Words According to Their Meaning (Conceptual analysis of Latin equivalents for the comparative dictionary of Semitic languages)
JPH01114976A (en) Dictionary structure for document processor
JPH0262659A (en) Extracting device for correction candidate character of japanese sentence
JP2917310B2 (en) Word dictionary search method for word matching
JPS588379A (en) Kana (japanese syllabary)-kanji (chinese character) converting system
JPS5757379A (en) Character information input device
JPH0746374B2 (en) Character recognition method
JPS6175467A (en) Kana and kanji converting device
KR920001375A (en) How to handle Hangul character input
JPH02155073A (en) Unknown word qualifying device
JPH04167051A (en) Document editing device
JPH0778155A (en) Document recognizing device