KR100834602B1

KR100834602B1 - Character recognition apparatus and character recognition method

Info

Publication number: KR100834602B1
Application number: KR1020060078850A
Authority: KR
Inventors: Takuma Akagi; Toshio Sato
Original assignee: Kabushiki Kaisha Toshiba
Priority date: 2005-08-22
Filing date: 2006-08-21
Publication date: 2008-06-02
Also published as: KR20070022607A

Abstract

A character recognition apparatus generates character candidates from a character string written in a plurality of character input boxes printed on paper such as a business form, and performs character recognition on those candidates. A row image separating unit 32 obtains an image containing the character input boxes and the character string. Character extraction processing units 33 to 37 detect each point in the character string at which lines touch or intersect. From the positional relationship between each such point and the corresponding character input box, they determine where the character string should be divided or recombined, and they generate the character candidates that make up the string. A character recognition unit 38 performs character recognition on each character candidate generated by the character extraction processing units.

Character recognition, character extraction, character candidate, image acquisition

Description

CHARACTER RECOGNITION APPARATUS AND CHARACTER RECOGNITION METHOD

FIG. 1 is a block diagram showing the basic flow of processing applied to the character recognition apparatus according to an embodiment of the present invention.

FIGS. 2A and 2B illustrate an extraction technique that treats the image inside one character input box as a one-character image, based on the position of that box.

FIG. 3 illustrates the types of character contact found in freely handwritten characters.

FIGS. 4A and 4B illustrate how the character contact position determination processing and character position extraction processing used for freely handwritten character strings can be applied to a character string written in character input boxes.

FIG. 5 is a block diagram showing the functional configuration of the character recognition apparatus according to the embodiment.

FIGS. 6A, 6B, and 6C illustrate the first character extraction technique.

FIG. 7 is a flowchart showing an example of operation based on the first character extraction technique.

FIGS. 8A, 8B, and 8C illustrate the second character extraction technique.

FIG. 9 is a flowchart showing an example of operation based on the second character extraction technique.

FIGS. 10A, 10B, 10C, 10D, and 10E show various shapes of character input boxes.

FIGS. 11A, 11B, and 11C illustrate the third character extraction technique.

FIG. 12 is a flowchart showing an example of operation in the attribute attachment processing based on the third character extraction technique.

FIG. 13 is a flowchart showing an example of operation in the character candidate generation processing based on the third character extraction technique.

FIGS. 14A, 14B, and 14C show an example of applying the third character extraction technique to a character string containing a ligature.

FIG. 15 shows an example of the images obtained when a first character candidate generation rule is applied to a character string containing a ligature.

FIG. 16 shows an example of the images obtained when a second character candidate generation rule is applied to a character string containing a ligature.

FIG. 17 is a flowchart showing an example of operation when the rules corresponding to FIG. 15 are applied to a character string containing a ligature.

FIG. 18 is a flowchart showing an example of operation when the rules corresponding to FIG. 16 are applied to a character string containing a ligature.

FIG. 19 is a flowchart showing an example of operation when another character candidate generation rule is used.

FIG. 20 is a flowchart showing an example of operation when another character candidate generation rule is used.

<Description of reference numerals for the main parts of the drawings>

32: row image separating unit

33: character division candidate point detection unit

34: character division point determination unit

35: character part generation unit

36: character part attribute attachment unit

37: character candidate generation unit

38: character recognition unit

39: character string recognition result editing unit

41: full image memory

42: row image memory

43: character part image memory

44: character candidate image memory

The present invention relates to a character recognition apparatus and a character recognition method that perform character recognition on a character string written in a plurality of character input boxes printed on paper such as a business form.

One example of paper processed by a character recognition apparatus is a business form on which a plurality of character input boxes are printed in advance. Characters written on such a form do not always stay inside their boxes. Part of a character written in one box may protrude from that box, and the protruding part may touch part of another character in an adjacent box. In such cases, extracting the character inside each box and running character recognition on it often fails to produce correct recognition results.

When a written character string contains "00", it often also contains a ligature that protrudes from the character input boxes. An example is a horizontal stroke drawn over the two zeros when 0 and 0 are written consecutively. Techniques that detect such a ligature from its positional relationship with the character input boxes and separate it out have begun to be used. However, these techniques do not work well when characters touch each other in complex ways. In addition, whenever the shape or size of the character input boxes changes, the method or algorithm that decides where to divide the string must be largely rewritten.

There are various techniques for character extraction from character strings written in multiple character input boxes. For example, Japanese Patent Application Laid-Open No. 2000-113101 (FIG. 1, etc.) discloses the following technique:
- obtain the vertical and horizontal projections of a character string;
- compare them with thresholds to determine whether a continuous line runs between characters;
- perform character extraction according to the result.

The cited technique is designed to detect correction lines. It therefore cannot extract characters accurately when they touch each other in complex ways.

Under these circumstances, a technique is desired that achieves accurate character extraction for character strings written in a plurality of character input boxes printed on paper.

According to one aspect of the present invention, there is provided a character recognition apparatus that generates character candidates from a character string written in a plurality of character input boxes printed on paper and performs character recognition. The apparatus comprises:
- an image acquisition unit that acquires an image containing the plurality of character input boxes and the character string;
- a character extraction processing unit that detects each point at which a plurality of lines touch or intersect in the character string contained in the image, determines from the positional relationship between each point and the corresponding character input box the points at which the character string should be divided or recombined, and performs that division or recombination to generate the character candidates constituting the character string; and
- a character recognition unit that performs character recognition on each character candidate generated by the character extraction processing unit.

According to another aspect of the present invention, there is provided a character recognition method that generates character candidates from a character string written in a plurality of character input boxes printed on paper and performs character recognition. The method comprises:
- acquiring an image containing the plurality of character input boxes and the character string;
- performing character extraction processing, which detects each point at which a plurality of lines touch or intersect in the character string contained in the image, determines from the positional relationship between each point and the corresponding character input box the points at which the character string should be divided or recombined, and performs that division or recombination to generate the character candidates constituting the character string; and
- performing character recognition on each character candidate generated in the character extraction processing.

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention. Together with the general description given above and the detailed description of the embodiments given below, they serve to explain the principles of the invention.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram showing the basic flow of processing applied to the character recognition apparatus according to an embodiment of the present invention.

A plurality of character input boxes are printed on a business form, and characters are written in the boxes. The processing proceeds as follows:
- Reading processing 11 is performed with a scanner to obtain image data of the form.
- Box removal processing 12 removes the information of the character input boxes from the image data, leaving the information of the character string.
- When the character input boxes are printed in a dropout color, the box information is removed using color information prepared in advance, so that only the character string remains as black pixel information.
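The dropout-color step can be sketched as follows. This is a minimal illustration under assumptions, not the method of the patent:
- the image is a nested list of RGB tuples;
- a pixel counts as box ink when every channel lies within a tolerance `tol` of the registered dropout color;
- `tol`, the darkness threshold `dark`, and the name `remove_dropout` are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def remove_dropout(image: List[List[RGB]], dropout: RGB,
                   tol: int = 40, dark: int = 128) -> List[List[int]]:
    """Return a binary image (1 = black character ink) with box pixels removed.

    Assumed rule: a pixel is box ink if each channel is within `tol` of the
    registered dropout color; otherwise it is character ink if its mean
    intensity is below `dark`.
    """
    out = []
    for row in image:
        out_row = []
        for r, g, b in row:
            is_box = all(abs(c - d) <= tol for c, d in zip((r, g, b), dropout))
            is_ink = (r + g + b) / 3 < dark
            out_row.append(1 if is_ink and not is_box else 0)
        out.append(out_row)
    return out

# White paper, a red box pixel, and a dark pen stroke.
img = [[(255, 255, 255), (200, 40, 40), (20, 20, 20)]]
print(remove_dropout(img, dropout=(200, 40, 40)))  # [[0, 0, 1]]
```

Only the pen stroke survives as a black pixel. Real systems usually work in a color space better suited to this test, but the principle is the same.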

Extraction processing 13 is then performed on the information remaining after box removal. It extracts the image corresponding to each entry on the business form, using the position information of each entry registered in advance.

Next, processing 20 is executed on each image corresponding to an entry on the business form. It consists of three steps:
- Character extraction processing 14 extracts each character using the position information of each character input box registered in advance for that entry.
- Character recognition processing 15 is performed on each extracted character.
- Intra-entry result determination processing 16 determines the content of the character string in that entry from the character recognition results.

Finally, entry editing and output processing 17 edits the content of the character string in each entry and outputs the edited content to a file or to paper.

In general character extraction processing, the image in one character input box is extracted based on the position of that box, on the assumption that it is a one-character image. FIGS. 2A and 2B show an example of character image extraction in this case. FIG. 2A shows the three-digit string "180" written in three consecutive character input boxes. Parts of the character "8" clearly protrude from its box into the adjacent boxes. Because the image inside each individual box is extracted, parts of "8" end up in the character images on both sides. As a result, correct recognition results cannot be obtained even if character recognition is performed on each of these character images.
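To make this failure mode concrete, the following hypothetical sketch crops a binary row image strictly by box columns. The representation (lists of 0/1 rows) and the helper names are assumptions for illustration. Any stroke that crosses a box boundary is split between two crops.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image, 1 = ink

def crop_by_boxes(row: Image, boxes: List[Tuple[int, int]]) -> List[Image]:
    """Cut a row image into one sub-image per box, using columns [x_left, x_right)."""
    return [[line[x0:x1] for line in row] for x0, x1 in boxes]

def ink(img: Image) -> int:
    """Count ink pixels in an image."""
    return sum(sum(line) for line in img)

# A character written in the second box (columns 3-5) whose stroke leaks into column 2.
row = [[0, 0, 1, 1, 1, 0],
       [0, 0, 0, 1, 1, 0]]
crops = crop_by_boxes(row, [(0, 3), (3, 6)])
print([ink(c) for c in crops])  # [1, 4]: one stray pixel lands in the first box
```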

Under these circumstances, a technique is desired that can divide characters correctly even when they touch each other in complex ways, without requiring substantial changes to the character division algorithm.

The inventors analyzed contacts between characters written on business forms that have a plurality of character input boxes. They found that most such contacts closely resemble contacts between freely handwritten characters.

Free handwriting means either of the following:
- writing a character string freely on paper without input boxes, as when writing an address on mail; or
- writing a character string in an entry field that has no dividing lines or character input boxes.

Contacts between freely handwritten characters fall roughly into two types, "contact" and "intersection", as shown for example in FIG. 3. Such contacts are common in numerals, alphabetic characters (other than cursive script), kana, and kanji.

As described above, contacts between characters written in a plurality of character input boxes closely resemble those between freely handwritten characters. The character contact position extraction processing and character position extraction processing used for freely handwritten strings can therefore also be applied to strings written in character input boxes. FIGS. 4A and 4B show an example of this case.

FIG. 4A shows, as in FIG. 2A, the three-digit string "180" written in three consecutive character input boxes, with parts of the character "8" protruding from its box into the adjacent boxes. The contact positions between "1" and "8" and between "8" and "0" can be detected without using any position information of the boxes, by the character position extraction processing used for freely handwritten strings. The string can then be separated at these positions.

However, if this technique is used naively, portions that are not actually contacts between characters are also detected as contacts. Invalid character candidates are therefore generated, and the number of unnecessary candidates grows. From the string in FIG. 4A, the character parts "1", "o", "o", and "0" are generated. Combining these parts yields a total of 10 character candidates, as shown in FIG. 4B.

This has three consequences:
- The load on character recognition processing, one of the most time-consuming steps in business form recognition, increases.
- The time that character recognition takes increases accordingly.
- Recognition errors become more likely. For example, the "o" and "o" produced by splitting "8" may be misrecognized as "0" and "0".
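The count of 10 follows if every run of consecutive parts is taken as a candidate. With n parts there are n(n+1)/2 such runs, and 4·5/2 = 10. The sketch below enumerates them. The contiguity rule is an assumption inferred from the count given for FIG. 4B rather than something the text states, and the function name is hypothetical.

```python
from typing import List

def contiguous_candidates(parts: List[str]) -> List[str]:
    """Return every run of consecutive parts, joined with '+' (n*(n+1)/2 runs)."""
    n = len(parts)
    return ["+".join(parts[i:j]) for i in range(n) for j in range(i + 1, n + 1)]

cands = contiguous_candidates(["1", "o", "o", "0"])
print(len(cands))  # 10
print(cands)
```

Each run would be passed to the recognizer, so every spurious division point inflates recognition work roughly in proportion to the number of runs it creates.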

This embodiment therefore proposes a technique that extracts characters accurately without causing the above problems. The details of this character extraction technique are described later.

FIG. 5 is a block diagram showing the functional configuration of the character recognition apparatus according to this embodiment.

The character recognition apparatus according to this embodiment performs character recognition by generating character candidates from a character string written in a plurality of character input boxes printed on a business form. It includes:
- an information database 30;
- a scanner 31;
- a row image separating unit (image acquisition means) 32;
- a character division candidate point detection unit (detection means) 33;
- a character division point determination unit (determination means) 34;
- a character part generation unit (generation means) 35;
- a character part attribute attachment unit (attribute attachment means) 36;
- a character candidate generation unit (character candidate generation means) 37;
- a character recognition unit (character recognition means) 38;
- a character string recognition result editing unit 39;
- a business form recognition result output unit 40;
- a full image memory 41;
- a row image memory 42;
- a character part image memory 43;
- a character candidate image memory 44; and
- a printer 45.

Note that processing units 33 to 37 together constitute a character extraction processing unit (character extraction processing means) that implements character extraction processing.

The information database 30 stores two kinds of information:
- box information indicating the placement of each character input box, including ruled line information and coordinate information;
- character candidate generation rules, which are rules for generating character candidates from character parts.

The scanner 31 reads information, including the character input boxes and the character string, from an input business form and generates a full image. This image is stored in the full image memory 41.

The row image separating unit 32 obtains a row image for each entry on the business form by separating the region at that entry's row position from the full image stored in the full image memory 41. Each row image obtained in this way is stored in the row image memory 42.

The character division candidate point detection unit 33 detects points at which a plurality of lines touch or intersect in the character string contained in each row image stored in the row image memory 42. These points are treated as division candidate points. Not every detected candidate point is necessarily used as an actual division point.
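The text does not say how touching or intersecting points are detected. One common approach, swapped in here purely as an illustration, works on strokes already thinned to a one-pixel skeleton:
- read each skeleton pixel's eight neighbours as a clockwise ring;
- count the background-to-ink transitions around the ring (the crossing-number test);
- treat a pixel with three or more transitions as a branch (touch) or crossing point.

The skeleton is assumed to be given, and the function name is hypothetical.

```python
from typing import List, Tuple

# 8-neighbour offsets (row, col), clockwise starting from north.
_RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def junction_points(skel: List[List[int]]) -> List[Tuple[int, int]]:
    """Return skeleton pixels whose crossing number is >= 3 (branch or crossing points)."""
    h, w = len(skel), len(skel[0])

    def px(r: int, c: int) -> int:
        # Pixels outside the image count as background.
        return skel[r][c] if 0 <= r < h and 0 <= c < w else 0

    points = []
    for r in range(h):
        for c in range(w):
            if not skel[r][c]:
                continue
            ring = [px(r + dr, c + dc) for dr, dc in _RING]
            # Count 0 -> 1 transitions around the closed ring of neighbours.
            crossings = sum(1 for k in range(8) if ring[k] == 0 and ring[(k + 1) % 8] == 1)
            if crossings >= 3:
                points.append((r, c))
    return points

# Two strokes crossing each other ("intersection" type contact).
plus = [[0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0]]
print(junction_points(plus))  # [(2, 2)]
```

A stroke that merely touches another (a T shape) gives a crossing number of 3, and a true crossing gives 4. Both therefore become division candidate points, to be filtered later.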

For each point detected by the character division candidate point detection unit 33, the character division point determination unit 34 examines the positional relationship between that point and the character input box closest to it. From this relationship it decides whether the character string should be divided at that point. In other words, the candidate points obtained by unit 33 are narrowed down to those at which division should actually be performed.

The character division point determination unit 34 may, for example, apply a distance rule. It measures the distance between a point detected by the character division candidate point detection unit 33 and the vertical line of the character input box nearest to that point:
- If the distance is greater than or equal to a predetermined value, unit 34 determines that division should not be performed at that point.
- If the distance is less than the predetermined value, it determines that division should be performed there.
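The distance rule reduces to a one-line predicate. This sketch assumes the following conventions:
- candidate points and vertical box lines are given as x-coordinates in pixels;
- `max_dist` stands in for the unspecified predetermined value;
- the function name is hypothetical.

```python
from typing import Sequence

def split_at_point(x: float, box_lines: Sequence[float], max_dist: float) -> bool:
    """True if the candidate point at x is close enough to a vertical box line to split there."""
    nearest = min(abs(x - line_x) for line_x in box_lines)
    # A distance >= max_dist means the point lies well inside a box, so do not split.
    return nearest < max_dist

lines = [0, 30, 60, 90]  # vertical lines of three 30-px-wide boxes
print(split_at_point(31, lines, 5))  # True: 1 px from the line at x=30
print(split_at_point(45, lines, 5))  # False: middle of the second box
```

The design intuition is that genuine contacts between neighbouring characters tend to occur near box boundaries. A junction in the middle of a box, such as the waist of an "8", is more likely internal to one character.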

The character division point determination unit 34 may also apply a rule based on the character part that would result from dividing at a position detected by the character division candidate point detection unit 33. It determines that division should be performed at that position when all of the following conditions hold for that character part:
1) its center coordinates fall within a predetermined range inside the corresponding character input box;
2) its size is greater than or equal to a predetermined value; and
3) the distance between its center coordinates and the nearest vertical line of a character input box is greater than or equal to a predetermined value.
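The three conditions can be sketched as a single predicate over the would-be character part. Several details are assumptions, since the text leaves them unspecified:
- the "predetermined range" is modeled as the box shrunk by a `margin`;
- "size" is taken as the larger of width and height;
- all thresholds and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Part:
    cx: float      # centre x of the part the split would produce
    cy: float      # centre y
    width: float
    height: float

@dataclass
class Box:
    left: float
    right: float
    top: float
    bottom: float

def accept_split(part: Part, box: Box, box_lines: Sequence[float],
                 margin: float, min_size: float, min_line_dist: float) -> bool:
    # 1) Centre lies inside the box shrunk by `margin` (assumed form of the "predetermined range").
    inside = (box.left + margin <= part.cx <= box.right - margin
              and box.top + margin <= part.cy <= box.bottom - margin)
    # 2) The part is not a tiny fragment (size taken as its larger side, an assumption).
    big_enough = max(part.width, part.height) >= min_size
    # 3) Centre keeps its distance from every vertical box line.
    far_from_lines = min(abs(part.cx - x) for x in box_lines) >= min_line_dist
    return inside and big_enough and far_from_lines

box = Box(left=30, right=60, top=0, bottom=40)
lines = [0, 30, 60, 90]
print(accept_split(Part(45, 20, 12, 25), box, lines, 3, 10, 8))  # True
print(accept_split(Part(45, 20, 3, 4), box, lines, 3, 10, 8))    # False: too small
```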

The character part generation unit 35 divides the character string at the division candidate points obtained from the character division candidate point detection unit 33 and generates the character parts that make up the string. Each character part image extracted by unit 35 is stored in the character part image memory 43.

The character part attribute attachment unit 36 attaches attribute information to each character part stored in the character part image memory 43. This includes a box attribute, such as a box number, that identifies the character input box to which the part belongs. That box is determined from the box information stored in the information database 30, which includes box type information and placement position information. Character parts with attached attributes are stored in the character part image memory 43.
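One simple way to attach a box attribute, assumed here for illustration only, works on the part's horizontal extent:
- give the part the number of the box that contains most of that extent;
- mark it as unknown (`None`) when no box holds at least a fraction `ratio` of it.

The overlap criterion, the 0.7 default, and the function name are not taken from the source.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (x_left, x_right) in pixels

def box_attribute(part: Span, boxes: List[Span], ratio: float = 0.7) -> Optional[int]:
    """Return the index of the box holding >= `ratio` of the part's width, else None."""
    x0, x1 = part
    width = max(x1 - x0, 1e-9)  # guard against zero-width parts
    for i, (b0, b1) in enumerate(boxes):
        overlap = max(0.0, min(x1, b1) - max(x0, b0))
        if overlap / width >= ratio:
            return i
    return None  # straddles a box line: box unknown

boxes = [(0, 30), (30, 60), (60, 90)]
print(box_attribute((5, 25), boxes))   # 0
print(box_attribute((25, 40), boxes))  # None: only two thirds lie in box 1
```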

The character candidate generation unit 37 generates character candidates from the character parts stored in the character part image memory 43, according to the character candidate generation rules stored in the information database 30. Where needed, it also refers to the box information stored there. Each character candidate image generated by unit 37 is stored in the character candidate image memory 44.

The character candidate generation unit 37 has, among others, the following functions:
- When a plurality of character parts carry the same box attribute, it recombines them into a new character part that serves as one character candidate.
- When the character part attribute attachment unit 36 could not attach any box attribute to a character part (for example, a part whose attribute indicates that its box is unknown), it recombines that part with an adjacent character part to generate a new character part as one character candidate.
- When such a part without a box attribute has a vertical size smaller than a predetermined value, it regards the part as a ligature or the like and discards it.
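The three rules above can be sketched as one pass over the character parts. The sketch rests on the following assumptions, and all names are hypothetical:
- parts arrive in left-to-right order as `(label, box attribute or None, height)`, with string labels standing in for part images and concatenation standing in for image recombination;
- parts sharing a box attribute are assumed to be consecutive;
- a box-less part is recombined with each of its neighbours in turn.

```python
from typing import List, Optional, Tuple

# (label, box attribute or None, height in px); labels stand in for part images.
Part = Tuple[str, Optional[int], float]

def generate_candidates(parts: List[Part], min_height: float) -> List[str]:
    # Box-less parts shorter than min_height are treated as ligatures and discarded.
    kept = [p for p in parts if not (p[1] is None and p[2] < min_height)]
    # Consecutive parts with the same box attribute are recombined into one part.
    groups: List[Tuple[str, Optional[int]]] = []
    for label, box, _height in kept:
        if box is not None and groups and groups[-1][1] == box:
            groups[-1] = (groups[-1][0] + label, box)
        else:
            groups.append((label, box))
    candidates = [label for label, box in groups if box is not None]
    # A box-less part is recombined with each adjacent group.
    for i, (label, box) in enumerate(groups):
        if box is None:
            if i > 0:
                candidates.append(groups[i - 1][0] + label)
            if i + 1 < len(groups):
                candidates.append(label + groups[i + 1][0])
    return candidates

# "1", an "8" split into two halves in box 1, a short overline stroke, and "0".
parts = [("1", 0, 40), ("o", 1, 20), ("o", 1, 20), ("-", None, 3), ("0", 2, 40)]
print(generate_candidates(parts, min_height=10))  # ['1', 'oo', '0']
```

In this example the two halves of the "8" share box 1 and are recombined, and the short stroke is discarded as a ligature. Only three candidates therefore reach the recognizer.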

The character recognition unit 38 performs character recognition on each character candidate image stored in the character candidate image memory 44.

The character string recognition result editing unit 39 edits the recognition results obtained by the character recognition unit 38 for each entry into a result associated with one business form.

The business form recognition result output unit 40 outputs the result generated by the character string recognition result editing unit 39 through an output device such as a printer 45.

The character extraction techniques of this embodiment are described in detail below.

<First character extraction technique>

The first character extraction technique is described with reference to FIGS. 6A to 6C and FIG. 7. FIG. 5 is also referred to as needed.

As shown in FIG. 6A, assume that the row image memory 42 stores an image formed by writing the three-digit string "180" in three consecutive character input boxes. Part of the character "8" protrudes leftward out of its character input box and touches the character "1". Another part of the "8" protrudes rightward out of its box and touches the character "0".

The character division candidate point detection unit 33 detects the points at which lines of this character string touch or cross each other (division candidate points). As shown in FIG. 6B, this yields three division candidate points P11, P12, and P13.

Next, the character division point determination unit 34 verifies the position of each division candidate point. Specifically, for each of the three division candidate points P11, P12, and P13 detected by the character division candidate point detection unit 33, the unit 34 computes the distance between the point and the vertical line of the character input box closest to it. If this distance is greater than or equal to a predetermined value, the unit 34 decides that the string must not be divided at that point. If the distance is smaller than the predetermined value, it decides that the string must be divided there.

That is, for division candidate point P11, the distance to the vertical line of the character input box on its right is computed. For P12, the distance to the vertical line of the character input box on its right is computed. For P13, the distance to the vertical line of the character input box on its left is computed.

Of the three points, P12 lies at a distance from the nearest vertical box line that is greater than or equal to the predetermined value, so P12 is excluded. As a result, only P11 and P13 remain as points at which division must be performed.

The character part generation unit 35 then divides the character string at P11 and P13, which yields the three character parts shown in FIG. 6C. The character candidate generation unit 37 sets these three parts as character candidates, and the character recognition unit 38 obtains the characters "1", "8", and "0" as the recognition result.
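The distance test of the first technique can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SplitPoint`, `nearest_line_distance`, and `select_split_points`, the box width of 40 px, the point coordinates, and the threshold of 8 px are all hypothetical values chosen to mirror the "180" example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitPoint:
    name: str
    x: float  # horizontal pixel position of the touching/crossing point (assumed)


def nearest_line_distance(x: float, line_xs: List[float]) -> float:
    """Distance from x to the closest vertical ruled line of the input boxes."""
    return min(abs(x - lx) for lx in line_xs)


def select_split_points(points: List[SplitPoint], line_xs: List[float],
                        threshold: float) -> List[SplitPoint]:
    """Keep only points closer than `threshold` to a vertical box line."""
    return [p for p in points if nearest_line_distance(p.x, line_xs) < threshold]


# Hypothetical layout: three 40-px-wide boxes starting at x = 0.
line_xs = [0, 40, 80, 120]
points = [SplitPoint("P11", 37), SplitPoint("P12", 58), SplitPoint("P13", 84)]
kept = select_split_points(points, line_xs, threshold=8)
print([p.name for p in kept])  # ['P11', 'P13']
```

With these assumed coordinates, P12 (the crossing in the middle of the "8") is 18 px from the nearest line and is dropped, while P11 and P13 sit 3 px and 4 px from a box line and are kept.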

When verifying the position of each division candidate point, the offset (shift width) from the center of the character input box, or a similar measure, may be used instead of the distance from the box's vertical line.
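The center-offset alternative can be sketched in the same spirit. The function names, the box centers, and the `min_offset` of 12 px are hypothetical; the only idea taken from the text is that a point far from every box center lies near a box boundary and is therefore a plausible place to divide.

```python
from typing import List, Tuple


def center_offset(x: float, box_centers: List[float]) -> float:
    """Horizontal offset of x from the nearest box center."""
    return min(abs(x - c) for c in box_centers)


def select_by_center_offset(points: List[Tuple[str, float]], box_centers: List[float],
                            min_offset: float) -> List[str]:
    # A point far from every box center sits near a box boundary, so divide there;
    # a point close to a center most likely lies inside one character.
    return [name for name, x in points if center_offset(x, box_centers) >= min_offset]


box_centers = [20, 60, 100]  # hypothetical centers of three 40-px boxes
points = [("P11", 37), ("P12", 58), ("P13", 84)]
print(select_by_center_offset(points, box_centers, min_offset=12))  # ['P11', 'P13']
```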

Next, an example of operation based on the first character extraction technique is described with reference to FIG. 7.

All division candidate points (points where lines touch or cross each other) in the character string written in the plurality of character input boxes are detected (step S11), and the detected points are verified one by one (step S12).

For each target division candidate point, it is determined whether the distance between the point and the vertical line of the character input box closest to it is greater than or equal to the predetermined value (step S13). If it is, no division is performed at that point. If it is smaller, the string is divided at that point, and character candidates are generated from the resulting character parts (step S14).

Once all character candidates have been generated in this way, character recognition is performed on each of them.

<Second character extraction technique>

The second character extraction technique is described with reference to FIGS. 8A to 8C and FIG. 9. FIG. 5 is also referred to as needed.

As shown in FIG. 8A, the row image memory 42 stores an image formed by writing the three-digit string "145" in three consecutive character input boxes. Part of the character "4" protrudes leftward out of its character input box and touches the character "1". In addition, part of the character "5" protrudes leftward out of its box and touches the character "4".

The character division candidate point detection unit 33 detects the points at which lines of this character string touch or cross each other (division candidate points). As shown in FIG. 8B, this yields three division candidate points P21, P22, and P23.

Next, the character division point determination unit 34 verifies the position or size of the character part that would be produced by dividing at each division candidate point. Specifically, taking P21, P22, and P23 in turn, the unit 34 decides whether to divide at each point on the basis of the center coordinates or the vertical size of the character part that would result from dividing there.

If the string is divided at P21, character part 1 shown in FIG. 8C is produced. Verifying character part 1 gives the following results.

● The center of character part 1 lies near the center of a character input box (within a predetermined rectangular region).

● The vertical size of character part 1 is not small compared with the size of the character input box (it is greater than or equal to a predetermined value).

● The center of character part 1 is away from the vertical line of the nearest character input box (by at least a predetermined distance).

The character division point determination unit 34 therefore regards character part 1 as a possible character candidate and decides that division must be performed at P21. The character part generation unit 35 actually performs this division to produce character part 1, and the character candidate generation unit 37 sets character part 1 as a character candidate.

If the string is divided at P22, character part 2 shown in FIG. 8C is produced. Verifying character part 2 gives the following results.

● The center of character part 2 does not lie near the center of a box but between the first and second character input boxes (outside the predetermined rectangular region).

● The vertical size of character part 2 is small compared with the box size (smaller than the predetermined value).

● The center of character part 2 lies near the vertical line of the nearest character input box (within the predetermined distance).

The character division point determination unit 34 therefore regards character part 2 as unable to be a character candidate and decides that division must not be performed at P22. Character part 2 thus does not become a character candidate on its own.

If the string is divided at P23, character part 3 shown in FIG. 8C is produced. Verifying character part 3 gives the same results as for character part 1.

The character division point determination unit 34 therefore regards character part 3 as a possible character candidate and decides that division must be performed at P23. The character part generation unit 35 actually performs this division to produce character part 3, and the character candidate generation unit 37 sets character part 3 as a character candidate.

When character part 4 is verified in the same way, it gives the same results as character parts 1 and 3, so character part 4 also becomes a character candidate.

As shown in FIG. 8C, the character candidate generation unit 37 thus generates the three character parts 1, 3, and 4 as character candidates, and the character recognition unit 38 obtains the characters "1", "4", and "5" as the recognition result.

Next, an example of operation based on the second character extraction technique is described with reference to FIG. 9.

All division candidate points (points where lines touch or cross each other) in the character string written in the plurality of character input boxes are detected (step S21). The character parts that would be produced by dividing at each detected point are then verified one by one (step S22).

That is, the following are determined (steps S23 to S25): 1) whether the center of the character part that would be produced by dividing at the target point lies near the center of a character input box (within a predetermined rectangular region); 2) whether the vertical size of the character part is not small compared with the size of the character input box (greater than or equal to a predetermined value); and 3) whether the center of the character part is away from the vertical line of the nearest character input box (by at least a predetermined distance).

If any of the three conditions is not satisfied, no division is performed at the corresponding division candidate point. If all three conditions are satisfied, the string is divided at that point, and a character candidate is generated from the resulting character part (step S26).
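Conditions 1) to 3) can be sketched as a single predicate over the bounding box of the prospective character part. This is an illustrative sketch only: the `Rect` type, the choice of the horizontally nearest box as "the" box, the tolerances `dx`, `dy`, `min_height_ratio`, `min_line_dist`, and the part coordinates for "145" are all assumptions. The mapping from each division candidate point to the part it would produce is supplied as precomputed input rather than derived from an image.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rect:
    left: float
    top: float
    right: float
    bottom: float

    @property
    def cx(self) -> float:
        return (self.left + self.right) / 2

    @property
    def cy(self) -> float:
        return (self.top + self.bottom) / 2

    @property
    def height(self) -> float:
        return self.bottom - self.top


def passes_checks(part: Rect, boxes: List[Rect], dx: float, dy: float,
                  min_height_ratio: float, min_line_dist: float) -> bool:
    """Apply conditions 1) to 3) to one prospective character part."""
    box = min(boxes, key=lambda b: abs(b.cx - part.cx))  # horizontally nearest box
    centred = abs(part.cx - box.cx) <= dx and abs(part.cy - box.cy) <= dy   # 1)
    tall_enough = part.height >= min_height_ratio * box.height                 # 2)
    line_xs = sorted({b.left for b in boxes} | {b.right for b in boxes})
    far_from_line = min(abs(part.cx - x) for x in line_xs) >= min_line_dist  # 3)
    return centred and tall_enough and far_from_line


boxes = [Rect(0, 0, 40, 50), Rect(40, 0, 80, 50), Rect(80, 0, 120, 50)]
# Hypothetical part produced by dividing "145" at each candidate point.
part_at: Dict[str, Rect] = {
    "P21": Rect(15, 5, 25, 45),   # character part 1: the "1"
    "P22": Rect(34, 20, 46, 30),  # character part 2: stroke of "4" poking into box 0
    "P23": Rect(48, 5, 72, 45),   # character part 3: the "4"
}
decisions = {name: passes_checks(r, boxes, dx=10, dy=12, min_height_ratio=0.5,
                                 min_line_dist=8)
             for name, r in part_at.items()}
print(decisions)  # {'P21': True, 'P22': False, 'P23': True}
```

Division is performed only where the predicate is true (step S26); with these assumed numbers, P22 fails all three conditions, matching the verification results listed for character part 2.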

Once all character candidates have been generated in this way, character recognition is performed on each of them.

<Third character extraction technique>

The third character extraction technique is described with reference to FIGS. 10A to 13. FIG. 5 is also referred to as needed.

The character extraction processing described above (specifically, the processing that determines where division must be performed and the processing that generates character candidates) depends on the type of character input box. Character input boxes come in various shapes, as shown in FIGS. 10A to 10E. Whenever the shape of the character input boxes changes, both kinds of processing must be modified, and such modifications generally require many development man-hours. Making them for every new box shape therefore increases development cost. The character extraction technique described below solves this problem.

As shown in FIG. 11A, assume that the row image memory 42 stores an image formed by writing the three-digit string "180" in three consecutive character input boxes. Part of the character "8" protrudes leftward out of its character input box and touches the character "1", and another part of the "8" protrudes rightward out of its box and touches the character "0". Assume also that each of the three character input boxes carries a box number for identification; here the boxes are numbered "0", "1", and "2" from the left.

As in the second technique described above, the character division candidate point detection unit 33 detects the points at which lines of this character string touch or cross each other (division candidate points), which yields three division candidate points. In this case, however, the character division point determination unit 34 does not narrow down the division candidate points.

Next, the character part generation unit 35 divides the character string at every division candidate point detected by the character division candidate point detection unit 33 to produce the character parts that make up the string. That is, as shown in FIG. 11B, dividing without regard to the positions of the character input boxes produces four character parts: "1", "o", "o", and "0".

The character part attribute attachment unit 36 attaches the corresponding attributes to the four generated character parts "1", "o", "o", and "0".

Examples of attributes attached to each part are as follows.

● Information indicating the relative relationship between the position of the part and the position of the corresponding character input box;

● Information indicating which character input box the part belongs to;

● Information indicating that it is unknown which character input box the part belongs to;

● Information indicating the size of the part;

● Information indicating the shape of the part;

● Information indicating whether the part was produced by division at a division candidate point;

● Information indicating the position of the division candidate point at which division was performed to produce the part; and

● Information obtained by selectively combining the above.
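One way to hold the attributes listed above is a small record per character part. The sketch below is hypothetical: the class `CharPart`, its field names, and the use of `None` for "box unknown" are illustrative choices, not a data layout described in the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CharPart:
    """Hypothetical record for one character part and the attributes listed above."""
    bbox: Tuple[int, int, int, int]                # (left, top, right, bottom) in pixels
    box_no: Optional[int] = None                   # owning input box; None = unknown box
    dist_to_box_line: Optional[float] = None       # relative position w.r.t. the box
    dist_to_box_center: Optional[float] = None
    shape: str = ""                                # free-form shape descriptor
    from_split: bool = False                       # produced by division at a candidate point?
    split_point: Optional[Tuple[int, int]] = None  # (x, y) of that division point

    @property
    def width(self) -> int:
        return self.bbox[2] - self.bbox[0]

    @property
    def height(self) -> int:
        return self.bbox[3] - self.bbox[1]

    @property
    def box_unknown(self) -> bool:
        return self.box_no is None


stub = CharPart(bbox=(36, 22, 44, 28), from_split=True, split_point=(36, 25))
print(stub.box_unknown, stub.width, stub.height)  # True 8 6
```

Keeping "unknown box" as the absence of a box number, rather than as a separate flag, means the two attributes can never contradict each other.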

In the example shown in FIG. 11B, the box numbers "0", "1", "1", and "2" of the character input boxes to which the individual character parts belong are attached to those parts as attributes (box attributes). Because the center of the part "1" lies within the character input box numbered "0", box number "0" is attached to it. Because the centers of the two parts "o" and "o" obtained by splitting the character "8" lie within the box numbered "1", box number "1" is attached to each of them. Because the center of the part "0" lies within the box numbered "2", box number "2" is attached to it.

Character candidate generation rules, which serve as the criteria for deciding whether parts are combined to generate a character candidate and whether the combined parts are used as a character candidate, are stored in advance in the information database 30. This allows the character candidate generation unit 37 to generate each character candidate according to those rules.

For example, the rule "parts having the same box number are combined, and the parts as they were before combining are excluded from the character candidates" is prepared, and each character candidate is generated according to this rule.

As a result, as shown in FIG. 11C, the character candidate generation unit 37 generates three character parts as character candidates, and the character recognition unit 38 obtains the characters "1", "8", and "0" as the recognition result.

Compared with the technique shown in FIGS. 4A and 4B, simply providing generation rules and simple attributes derived from the box information reduces the number of character candidates substantially, from 10 to 3, and prevents ambiguous character candidates such as those in FIGS. 4A and 4B (candidates that could make the recognition result "1000") from being generated.

Examples of attributes and character candidate generation rules are as follows.

● A part of small size is not kept as a character candidate unless it is combined with another character candidate.

● A new part formed by combining parts that have the same box attribute is given that same box attribute.

● A part not known to belong to a particular box is not kept as a character candidate unless it is combined with a part that has a box attribute.

● A part that is not known to belong to a particular box and whose size is smaller than a predetermined size is either combined with both its left and right neighbors to generate a character candidate, or removed without being combined with either.
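The two rules for parts without a box attribute can be sketched as a function that lists the admissible ways of absorbing such a part. Everything here is an assumption made for illustration: the `(label, box_no, height)` tuple, the name `options_for_unknown`, the `min_height` threshold, and the convention that an empty tuple means "discard the part".

```python
from typing import List, Optional, Tuple

# Hypothetical part representation: (label, box_no or None if unknown, height in px).
Part = Tuple[str, Optional[int], int]


def options_for_unknown(parts: List[Part], i: int, min_height: int) -> List[Tuple[int, ...]]:
    """Return the index groups into which the box-less part parts[i] may be merged.

    An empty tuple () stands for "discard the part"."""
    _, box_no, height = parts[i]
    if box_no is not None:
        raise ValueError("parts[i] already has a box attribute")
    left_ok = i > 0 and parts[i - 1][1] is not None
    right_ok = i + 1 < len(parts) and parts[i + 1][1] is not None
    options: List[Tuple[int, ...]] = []
    if height < min_height:
        # Small part: join BOTH neighbours at once, or drop it altogether.
        if left_ok and right_ok:
            options.append((i - 1, i, i + 1))
        options.append(())
        return options
    # Larger part: never kept alone, so it must join a neighbour with a box attribute.
    if left_ok:
        options.append((i - 1, i))
    if right_ok:
        options.append((i, i + 1))
    return options


row = [("0", 0, 40), ("-", None, 4), ("0", 1, 40)]
print(options_for_unknown(row, 1, min_height=10))   # [(0, 1, 2), ()]
row2 = [("1", 0, 40), ("?", None, 30), ("8", 1, 40)]
print(options_for_unknown(row2, 1, min_height=10))  # [(0, 1), (1, 2)]
```

Returning every admissible option, instead of picking one, leaves the final choice to the recognizer, which is consistent with generating several character candidates per region.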

Next, an example of operation based on the third character extraction technique is described with reference to FIGS. 12 and 13. The operation is described in two stages: attribute attachment processing, which attaches attributes to each character part, and character candidate generation processing, which generates character candidates after the attributes have been attached.

First, an example of the attribute attachment processing is described with reference to FIG. 12.

All division candidate points (points where lines touch or cross each other) in the character string written in the plurality of character input boxes are detected, and the string is divided at every one of them to produce a plurality of character parts (step S31).

Once the character parts have been produced, attributes are attached to them one at a time (step S32).

The position of the target character part is compared with the position of each character input box (step S33) to detect a character input box whose center lies within a predetermined distance of the target part, that is, a box close to the target part (step S34). An attribute indicating the relative relationship between the positions of the character part and that box (including the box number identifying the box) is attached to the target part, and information such as the distance between the box's vertical ruled line and the center of the target part, and the distance between the box's center and the center of the target part, is registered (step S35).

If no character input box is detected near the target character part, an attribute indicating that the part does not belong to any box is attached to it (steps S36 and S37).

In addition, information on the size and shape of the target character part is attached as attributes (steps S38 and S39). If the target part was produced by division, the division candidate point used is registered; if the target part was separate from the outset, an attribute indicating this is attached to it (steps S41 to S43).
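The attribute attachment flow of FIG. 12 can be sketched as below. This is a hedged illustration, not the patented procedure: bounding boxes as `(left, top, right, bottom)` tuples, the Euclidean center distance, the `max_center_dist` of 15 px, the dictionary keys, and the "180" coordinates are all assumptions, and the shape attribute (step S39) is omitted for brevity.

```python
import math
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (left, top, right, bottom)


def center(r: Rect) -> Tuple[float, float]:
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def attach_attributes(parts: List[Rect], from_split: List[bool], boxes: List[Rect],
                      max_center_dist: float) -> List[Dict[str, object]]:
    line_xs = sorted({b[0] for b in boxes} | {b[2] for b in boxes})
    out = []
    for part, split in zip(parts, from_split):
        pc = center(part)
        # S33/S34: compare with every box and take the one whose center is closest.
        dists = [math.dist(pc, center(b)) for b in boxes]
        best = min(range(len(boxes)), key=lambda k: dists[k])
        attrs: Dict[str, object] = {
            "width": part[2] - part[0],   # S38: size (shape, S39, omitted here)
            "height": part[3] - part[1],
            "from_split": split,          # S41 to S43
        }
        if dists[best] < max_center_dist:
            attrs["box_no"] = best        # S35: box number and relative position
            attrs["dist_to_center"] = dists[best]
            attrs["dist_to_line"] = min(abs(pc[0] - x) for x in line_xs)
        else:
            attrs["box_no"] = None        # S36/S37: no nearby box
        out.append(attrs)
    return out


boxes = [(0, 0, 40, 50), (40, 0, 80, 50), (80, 0, 120, 50)]
# Hypothetical parts of "180": "1", upper loop of "8", lower loop of "8", "0".
parts = [(15, 5, 25, 45), (46, 5, 74, 24), (44, 24, 76, 45), (84, 5, 112, 45)]
attrs = attach_attributes(parts, [True] * 4, boxes, max_center_dist=15)
print([a["box_no"] for a in attrs])  # [0, 1, 1, 2]
```

A short connecting stroke centered on a box boundary, such as `(36, 22, 44, 28)`, is 20 px from both neighbouring box centers under these assumptions and therefore receives `box_no = None`.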

Once attributes have been attached to all character parts in this way, the character candidate generation processing is performed.

Next, an example of the character candidate generation processing is described with reference to FIG. 13.

After all character parts have been produced by division and given attributes, the parts are verified in order, starting from the leftmost one, to decide whether each should be combined with the character part adjacent on its right (step S51).

That is, it is determined whether the target part and the character part adjacent on its right have the same box number as an attribute. If they do, the two parts are combined into a new character part, and the original parts used in the combination are discarded (steps S52 to S54). If they do not, no combining is performed. If another character part exists to the right, processing moves to that part and the verification is repeated (steps S55 and S56).
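The left-to-right scan of FIG. 13 can be sketched in a few lines. The `(label, box_no)` tuples and the `"+"` used to label combined parts are hypothetical; the logic follows the text: a part is merged into its left neighbour when both carry the same box number, the originals are discarded, and the scan moves on to the right.

```python
from typing import List, Optional, Tuple

Part = Tuple[str, Optional[int]]  # (label, box_no); None = box unknown (hypothetical)


def merge_same_box(parts: List[Part]) -> List[Part]:
    """Scan left to right, merging each part into its left neighbour when both carry
    the same box number (steps S51 to S56)."""
    merged: List[Part] = []
    for label, box_no in parts:
        if merged and box_no is not None and merged[-1][1] == box_no:
            prev_label, _ = merged.pop()                       # discard both originals...
            merged.append((prev_label + "+" + label, box_no))  # ...keep the combination
        else:
            merged.append((label, box_no))
    return merged


parts = [("1", 0), ("o", 1), ("o", 1), ("0", 2)]
print(merge_same_box(parts))  # [('1', 0), ('o+o', 1), ('0', 2)]
```

Because the newly combined part keeps the shared box number, a third part in the same box would also be absorbed on the next step. Parts whose box is unknown are never merged by this rule.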

Each character part reconstructed in this way is set as a character candidate, which completes the generation of all character candidates. Character recognition is then performed on each character candidate.

Next, an example of applying the third character extraction technique to a character string that contains ligature marks is described with reference to FIGS. 14A to 18.

As shown in FIG. 14A, assume that the row image memory 42 stores an image formed by writing the three-digit string "000" in three consecutive character input boxes. In this string, adjacent "0" characters are joined to each other by a ligature mark "-".

The character division candidate point detection unit 33 detects the points at which lines of this character string touch or intersect each other (division candidate points). As shown in FIG. 14B, this yields three division candidate points P31, P32 and P33.

The character part generation unit 35 divides the character string at the positions of the individual division candidate points detected by the character division candidate point detection unit 33, generating the character parts that make up the string. That is, as shown in FIG. 14C, five character parts "0", "-", "0", "-" and "0" are generated, two of which are ligature symbols.

Next, the character part attribute attachment unit 36 attaches the corresponding attributes to the five generated character parts "0", "-", "0", "-" and "0". In this case, each character part "-" corresponding to a ligature symbol is given, as its attribute, "information indicating that it is unknown which character input box the part belongs to".

Next, the character candidate generation unit 37 generates the character candidates according to character candidate generation rules.

The following is an example of the character candidate generation rules used in this case.

● Parts belonging to the same box are combined.

● Any part not known to belong to a specific box is combined with a part that has a box attribute.

● No part that is not known to belong to a specific box can become a character candidate by itself.

When these rules are used, the character candidate generation unit 37 generates character candidates containing character parts formed by combining each ligature part "-" with each adjacent character part "0", as shown in FIG. 15. Because the character recognition unit 38 recognizes a "0" with an attached ligature symbol as an ordinary "0", the characters "0", "0" and "0" are obtained as the recognition result.

If, for example, a "0" with an attached ligature symbol is likely to be misrecognized, the following character candidate generation rules are set instead.

● A part that is not known to belong to a specific box and whose height is small (less than a predetermined value) is not combined with anything.

● No part that is not known to belong to a specific box can become a character candidate by itself.

When these rules are used, the character candidate generation unit 37 discards the character parts "-" corresponding to the ligature symbols, leaving only the three character parts "0", "0" and "0", as shown in FIG. 16. The character recognition unit 38 therefore obtains the characters "0", "0" and "0" as the recognition result.

Next, an example of the operation when the rules corresponding to FIG. 15 are applied to a character string containing ligature symbols is described with reference to FIG. 17.

After all character parts have been generated by division and their attributes have been attached, verification proceeds sequentially from the leftmost character part to decide whether each part should be combined with an adjacent character part (step S61).

A character part without a box attribute is set as the verification target (step S62). If there is a character part with a box attribute immediately to its left, a new character part is generated by combining that part with the target part, and the new part is given the same box attribute as the left neighbor (steps S63 and S64). If there is a character part with a box attribute immediately to its right, a new character part is generated by combining that part with the target part, and the new part is given the same box attribute as the right neighbor (steps S65 and S66). The part without a box attribute is then discarded (step S67).

Then, if there is a character part immediately to the right of the target character part, the processing target moves to that part and verification is repeated (steps S68 and S69).

Each character part reconstructed in this way is set as a character candidate, which completes the generation of all character candidates. Character recognition is then performed on each character candidate.
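The FIG. 17 flow (steps S61 to S69) can be sketched as below. This is a hedged illustration, not the patent's code. `CharPart`, `label` and `attach_to_neighbours` are hypothetical names, a box value of `None` stands for "unknown which box", and combining parts is modeled as string concatenation. Note that a ligature is copied into both neighbors, as in FIG. 15.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharPart:
    label: str          # hypothetical stand-in for the part's pixels/strokes
    box: Optional[int]  # box number; None = unknown which box the part belongs to


def attach_to_neighbours(parts: List[CharPart]) -> List[CharPart]:
    """Fold every part without a box attribute into its box-attributed neighbours."""
    out = list(parts)  # working list; elements are replaced, never mutated in place
    for i, part in enumerate(parts):
        if part.box is not None:
            continue  # only parts without a box attribute are verification targets (S62)
        if i > 0 and parts[i - 1].box is not None:
            # S63-S64: combine with the left neighbour and inherit its box attribute
            out[i - 1] = CharPart(out[i - 1].label + part.label, out[i - 1].box)
        if i + 1 < len(parts) and parts[i + 1].box is not None:
            # S65-S66: combine with the right neighbour and inherit its box attribute
            out[i + 1] = CharPart(part.label + out[i + 1].label, out[i + 1].box)
    # S67: the box-less part itself is discarded, so it never stands alone
    return [p for p in out if p.box is not None]
```

Applied to the FIG. 14C parts "0", "-", "0", "-", "0" with boxes 0, None, 1, None, 2, the sketch returns `0-`, `-0-` and `-0` with boxes 0, 1 and 2, one candidate per input box.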

Next, an example of the operation when the rules corresponding to FIG. 16 are applied to a character string containing ligature symbols is described with reference to FIG. 18.

After all character parts have been generated by division and their attributes have been attached, verification proceeds sequentially from the leftmost character part to decide whether each part should be combined with an adjacent character part (step S71).

A character part without a box attribute is set as the verification target (step S72). If the height of the target character part is smaller than a predetermined value, the part is discarded (steps S73 and S74).

Then, if there is a character part immediately to the right of the target character part, the processing target moves to that part and verification is repeated (steps S75 and S76).
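Because the FIG. 18 flow only discards short parts that have no box attribute, it reduces to a single filter. The sketch below is an assumption-laden illustration: `CharPart`, `height` (in pixels) and `drop_short_unassigned` are hypothetical names, and `min_height` stands for the unspecified "predetermined value". The flow as described does not say what happens to tall parts without a box attribute, so this sketch leaves them untouched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharPart:
    label: str          # hypothetical stand-in for the part's pixels/strokes
    box: Optional[int]  # box number; None = unknown which box the part belongs to
    height: int         # hypothetical: bounding-box height in pixels


def drop_short_unassigned(parts: List[CharPart], min_height: int) -> List[CharPart]:
    """Steps S72-S74: discard parts that have no box attribute and are shorter than min_height."""
    return [p for p in parts if p.box is not None or p.height >= min_height]
```

With the FIG. 14C parts, giving each "0" a height of 30 and each ligature a height of 3 or 4, a threshold of 10 leaves only the three "0" parts, matching FIG. 16.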

Changes in the type of character input box can easily be handled by changing the character candidate generation rules. For example, when adjacent character input boxes are close to each other, or when characters are separated only by a single box line, the characters are likely to touch each other and to protrude into the adjacent box. In such cases, using the following rules makes it easier to generate correct candidates even for characters that lie between boxes.

● Parts belonging to the same box are combined.

● A part generated by combining parts that belong to the same box is given that same box attribute.

● Even parts with different box attributes are combined if they are adjacent to each other, and both box attributes are attached to the result.

● If the parts to be combined would carry three or more box attributes, they are not combined.

Next, an example of the operation when these rules are used is described with reference to FIG. 19.

After all character parts have been generated by division and their attributes have been attached, verification proceeds sequentially from the leftmost character part to decide whether each part should be combined with the character part to its right (step S81).

First, it is determined whether the target character part lacks a box attribute (step S82). If it does, the following processing is performed. If there is a character part with a box attribute immediately to the left, a new character part is generated by combining that part with the target part, and the new part is given the same box attribute as the left neighbor (steps S83 and S84). If there is a character part with a box attribute immediately to the right, a new character part is generated by combining that part with the target part, and the new part is given the same box attribute as the right neighbor (steps S85 and S86).

If the target character part has a box attribute, it is determined whether it has the same box attribute as the character part immediately to its right (step S89). If it does, the two character parts are combined and all box attributes of the combined parts are attached (step S91). If it does not, but the total number of box attributes of the target part and its right neighbor is less than three, the two parts are likewise combined and all box attributes of the combined parts are attached (step S91). This processing is repeated until the total number of box attributes of the target part and its right neighbor reaches three or more.

If there is a character part immediately to the right of the target character part, the processing target moves to that part and verification is repeated (steps S87 and S88).
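The cross-box merging in steps S89 to S91 can be sketched as a greedy left-to-right pass. This is one possible reading, not the patent's implementation. It assumes that "the total number of box attributes" means the number of distinct boxes in the union, and that every part already carries at least one box attribute (steps S82 to S86 are omitted). `CharPart`, `boxes`, `MAX_BOXES` and `merge_across_boxes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

MAX_BOXES = 2  # rule: a merge that would carry three or more box attributes is refused


@dataclass
class CharPart:
    label: str             # hypothetical stand-in for the part's pixels/strokes
    boxes: FrozenSet[int]  # box attributes; two of them after a cross-box merge


def merge_across_boxes(parts: List[CharPart]) -> List[CharPart]:
    """Merge each part into its left neighbour while the union of boxes stays within MAX_BOXES."""
    result: List[CharPart] = []
    for part in parts:
        if result:
            union = result[-1].boxes | part.boxes
            if len(union) <= MAX_BOXES:  # same box, or two different boxes in total
                prev = result.pop()
                result.append(CharPart(prev.label + part.label, union))  # all attributes kept
                continue
        result.append(part)  # three or more boxes: start a new part
    return result
```

For parts `a`, `b`, `c`, `d` in boxes 0, 0, 1, 2, the sketch yields `abc` with boxes {0, 1} and `d` with box {2}. A production system might instead keep merged and unmerged variants side by side as competing candidates; the text does not say, so the greedy version is only an illustration.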

Depending on the shape of the character input boxes, characters may rarely touch each other. For example, when the character input boxes are arranged at predetermined intervals, characters seldom touch. In such cases, using the following rules reduces the number of unnecessary character candidates that are generated. As a result, reading errors can be reduced and processing time can be shortened.

● No part that is not known to belong to a specific box is set as a character candidate by itself.

● Parts with different box attributes are not combined.

● Any small part that is not known to belong to a specific box is removed without being combined.

Next, an example of the operation when these rules are used is described with reference to FIG. 20.

After all character parts have been generated by division and their attributes have been attached, verification proceeds sequentially from the leftmost character part to decide whether each part should be combined with the character part to its right (step S101).

First, it is determined whether the target character part has no box attribute and is smaller than a predetermined size (step S102). If both conditions hold, the part is discarded (step S111). Otherwise, it is determined whether the target character part lacks a box attribute (step S103).

If the target character part lacks a box attribute, the following processing is performed. If there is a character part with a box attribute immediately to the left, a new character part is generated by combining that part with the target part, and the new part is given the same box attribute as the left neighbor (steps S104 and S105). If there is a character part with a box attribute immediately to the right, a new character part is generated by combining that part with the target part, and the new part is given the same box attribute as the right neighbor (steps S106 and S107). The part without a box attribute is then discarded (step S108).

If the target character part has a box attribute, it is determined whether it has the same box attribute as the character part immediately to its right (step S112). If it does, the two parts are combined and the box attribute of the combined parts is attached (step S113). This processing is repeated until the character part to the right has a different box attribute.

Then, if there is a character part immediately to the right of the target character part, the processing target moves to that part and verification is repeated (steps S109 and S110).

Each character part reconstructed in this way is set as a character candidate, which completes the generation of all character candidates. Character recognition is then performed on each character candidate.
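The FIG. 20 flow (steps S101 to S113) can be approximated by three batch passes, sketched below. This is a hedged reformulation rather than the patent's code. The flowchart interleaves these checks in a single scan, and regrouping them into passes is an assumption of this sketch. `CharPart`, `size`, `join`, `candidates_for_spaced_boxes` and `min_size` (the "predetermined value") are hypothetical names, and combining is modeled as string concatenation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharPart:
    label: str          # hypothetical stand-in for the part's pixels/strokes
    box: Optional[int]  # box number; None = unknown which box the part belongs to
    size: int           # hypothetical size measure, e.g. bounding-box height in pixels


def join(left: CharPart, right: CharPart, box: Optional[int]) -> CharPart:
    """Combine two parts; the merged part keeps the larger size."""
    return CharPart(left.label + right.label, box, max(left.size, right.size))


def candidates_for_spaced_boxes(parts: List[CharPart], min_size: int) -> List[CharPart]:
    # S102/S111: small parts without a box attribute are removed outright
    kept = [p for p in parts if p.box is not None or p.size >= min_size]
    # S103-S108: remaining box-less parts are folded into box-attributed neighbours
    out = list(kept)
    for i, part in enumerate(kept):
        if part.box is not None:
            continue
        if i > 0 and kept[i - 1].box is not None:
            out[i - 1] = join(out[i - 1], part, out[i - 1].box)
        if i + 1 < len(kept) and kept[i + 1].box is not None:
            out[i + 1] = join(part, out[i + 1], out[i + 1].box)
    assigned = [p for p in out if p.box is not None]  # box-less parts never stand alone
    # S112-S113: only neighbours with the same box attribute are merged
    result: List[CharPart] = []
    for part in assigned:
        if result and result[-1].box == part.box:
            result.append(join(result.pop(), part, part.box))
        else:
            result.append(part)
    return result
```

For example, with parts `a` (box 0), a small stray dot (no box), `b` (box 0), a larger box-less stroke `t`, and `c` (box 1), and `min_size=5`, the dot is dropped and the sketch returns `abt` for box 0 and `tc` for box 1. Parts in different boxes are never merged.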

As described above, this embodiment generates a plurality of character division candidate points that are not constrained by any character input box, and then narrows these candidate points down based on the character input box information. This makes it possible to extract and recognize intricately touching characters that conventional character recognition techniques cannot recognize. Because the technique does not generate a large number of character candidates for extraction, processing time can be shortened, and reading errors caused by recognizing wrong character candidates can be reduced. Furthermore, after a character string is divided into character parts at the division candidate points, attributes are attached to the individual parts based on the character input box information, and the parts are combined according to preset generation rules, so correct character candidates can be generated efficiently. Even when the shape of the character input boxes changes, optimal character candidates for each box shape can be generated by changing the character candidate generation rules.

As described in detail above, according to the present invention, character extraction can be performed with high accuracy on a character string written in a plurality of character input boxes provided on paper.

Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.

Claims

A character recognition apparatus that generates character candidates from a character string written in a plurality of character input boxes provided on paper and performs character recognition, the apparatus comprising:

an image acquisition unit that acquires an image including the plurality of character input boxes and the character string;

a character extraction processing unit that detects each point at which a plurality of lines of the character string included in the image touch or intersect each other, determines the points at which the character string is to be divided or recombined based on the positional relationship between each point and the corresponding character input box, and generates the character candidates constituting the character string by performing the division or recombination; and

a character recognition unit that performs character recognition on each character candidate generated by the character extraction processing unit.

The character recognition apparatus according to claim 1, wherein, when the distance between a detected point and the vertical line of the character input box closest to the point is not less than a predetermined value, the character extraction processing unit determines that division is not to be performed at the position of the point, and when the distance is less than the predetermined value, the character extraction processing unit determines that division is to be performed at the position of the point.

The character recognition apparatus according to claim 1, wherein the character extraction processing unit determines that division is to be performed at the position of a detected point when 1) the center coordinates of the character part that would be generated by dividing at that position fall within a predetermined range of the corresponding character input box, 2) the size of the character part is not less than a predetermined value, and 3) the distance between the center coordinates of the character part and the vertical line of the character input box closest to those coordinates is not less than a predetermined value.

The character recognition apparatus according to claim 1, wherein the character extraction processing unit comprises:

an attribute attaching unit that attaches to each character part a box attribute identifying the character input box to which the part belongs, each character part being generated by dividing the character string at the points where a plurality of its lines touch or intersect each other; and

a character candidate generation unit that, when there are a plurality of character parts to which the same box attribute has been attached by the attribute attaching unit, generates a new character part formed by recombining the plurality of character parts as one character candidate.

The character recognition apparatus according to claim 4, wherein, when there is a character part to which the attribute attaching unit has not attached a box attribute, the character candidate generation unit generates, as one character candidate, a new character part formed by combining that character part with an adjacent character part.

The character recognition apparatus according to claim 4, wherein, when there is a character part to which the attribute attaching unit has not attached a box attribute, the character candidate generation unit discards that character part if its vertical size is smaller than a predetermined value.

A character recognition method of generating character candidates from a character string written in a plurality of character input boxes provided on paper and performing character recognition, the method comprising:

acquiring an image including the plurality of character input boxes and the character string;

performing character extraction processing that detects each point at which a plurality of lines of the character string included in the image touch or intersect each other, determines the points at which the character string is to be divided or recombined based on the positional relationship between each point and the corresponding character input box, and generates the character candidates constituting the character string by performing the division or recombination; and

performing character recognition on each character candidate generated in the character extraction processing.

The character recognition method according to claim 7, wherein the character extraction processing includes determining that division is not to be performed at the position of a detected point when the distance between the point and the vertical line of the character input box closest to the point is not less than a predetermined value, and determining that division is to be performed at the position of the point when the distance is less than the predetermined value.

The character recognition method according to claim 7, wherein the character extraction processing includes determining that division is to be performed at the position of a detected point when 1) the center coordinates of the character part that would be generated by dividing at that position fall within a predetermined range of the corresponding character input box, 2) the size of the character part is not less than a predetermined value, and 3) the distance between the center coordinates of the character part and the vertical line of the character input box closest to those coordinates is not less than a predetermined value.

The character recognition method according to claim 7, wherein the character extraction processing comprises:

attaching to each character part a box attribute identifying the character input box to which the part belongs, each character part being generated by dividing the character string at the points where a plurality of its lines touch or intersect each other; and

performing character candidate generation that, when the attribute attachment has given the same box attribute to a plurality of character parts, generates a new character part formed by recombining the plurality of character parts as one character candidate.

The character recognition method according to claim 10, wherein, when there is a character part to which no box attribute has been attached in the attribute attachment, the character candidate generation includes generating, as one character candidate, a new character part formed by combining that character part with an adjacent character part.

The character recognition method according to claim 10, wherein the character candidate generation includes, when there is a character part to which no box attribute has been attached in the attribute attachment, discarding that character part if its vertical size is smaller than a predetermined value.