JPS6028027B2

JPS6028027B2 - Korean sort control method

Info

Publication number: JPS6028027B2
Application number: JP56048283A
Authority: JP
Inventors: 澄佐々木; 正太郎喜柳; 敏夫斎藤; 秀昭柳
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-03-31
Filing date: 1981-03-31
Publication date: 1985-07-02
Also published as: JPS57162018A

Description

[Detailed Description of the Invention] The present invention relates to a Korean sort control method. More specifically, it relates to a Korean sort control system for ordering Korean word strings that contain Hangul characters, Chinese characters, and alphanumeric characters according to their readings. When the dictionary used for this purpose assigns reading-based numeric codes that distinguish ordinary consonants from double consonants, the system gives special treatment to the double consonants so that the ordering result is obtained correctly.

In general, a Hangul character can be regarded as a phonetic character built from a "consonant, vowel, consonant" combination of graphemes, drawn from the vowel group shown in FIG. 1A and the consonant group shown in FIG. 1B.

In FIGS. 1A and 1B, the upper row of each table shows a Hangul grapheme and the lower row shows the corresponding pronunciation. For example, assigning a pronunciation to each grapheme of the two characters shown in FIG. 2A, as in FIG. 2B, shows that they are pronounced as given in FIG. 2C (the romanization is partly illegible in the source text and reads "han..el").

The KEF (Korean Processing Extended Feature) code has recently been established for these frequently used Hangul characters, as well as for the Chinese characters and alphanumeric characters mentioned above.

However, the KEF code is not tied to the reading of each character. Ordering Korean words that contain Hangul characters according to their readings, for example to compile a name list, is therefore extremely cumbersome. In addition, the ordering must take into account that Korean has double consonants, such as the Hangul grapheme ㄲ (KK), which corresponds to the consonant ㄱ (K), and ㅆ (SS), which corresponds to the consonant ㅅ (S).

That is, as described later with reference to FIG. 11, the Hangul characters 가 (ka), 까 (kka), and 기 (ki) should be arranged in the order 가 (ka), 까 (kka), 기 (ki). However, 가 and 기 share the same consonant ㄱ, so if the consonant ㄱ (K) and the double consonant ㄲ (KK) are given different reading numeric codes, the result becomes 가 (ka), 기 (ki), 까 (kka). This point must therefore be dealt with as well.

The present invention aims to solve the above problem. The Korean sort control method of the present invention orders word strings that contain Hangul characters, in which Hangul characters, Chinese characters, and alphanumeric characters are each assigned predetermined codes, by using codes ordered according to the readings of the Hangul characters and Chinese characters. The method comprises: a character attribute dictionary that stores, for each Hangul character, Chinese character, and alphanumeric character, at least the reading of the character as a numeric code, using numeric codes that distinguish ordinary consonants from double consonants; a sorting character attribute code module in which a consonant and its double consonant are assigned the same numeric code and a double consonant indication code is provided to distinguish them; a numeric code generation unit that, for each word in the word string, looks up the characters of the word in the character attribute dictionary, or in a sorting character attribute load module prepared from that dictionary, and generates a numeric code for the word; and a sort processing unit that arranges the words of the word string in ascending and/or descending order of the generated numeric codes.

The method is characterized in that, when the numeric code generation unit generates a numeric code by looking up the character attribute dictionary, which distinguishes ordinary consonants from double consonants, it also looks up the contents of the sorting character attribute code module. Based on those contents, it converts the dictionary's numeric code for a double consonant so that it matches the numeric code of the corresponding ordinary consonant and writes the converted code into the generated numeric code. It also provides a double consonant indication section in the generated numeric code to indicate whether a double consonant is present, and then outputs the generated numeric code.

The invention is described below with reference to the drawings. FIG. 3 is an overall block diagram of one embodiment of the system of the present invention. FIGS. 4A to 4E illustrate the forms of the numeric codes used in the present invention. FIG. 5 shows one embodiment of the numeric codes stored in the character attribute dictionary shown in FIG. 3. FIG. 6 shows one embodiment of the numeric codes stored in the sorting character attribute load module shown in FIG. 3. FIG. 7 illustrates sort data in which a numeric code has been attached to a given sort target word. FIG. 8 shows one embodiment of the character attribute dictionary. FIG. 9 shows one embodiment of the sorting character attribute load module. FIGS. 10A to 10D illustrate the initial sound rule referred to in the present invention. FIGS. 11A to 11C illustrate the handling of the double consonants that exist in Korean. FIG. 12 illustrates the sort result obtained when double consonants are handled correctly. FIG. 13 illustrates the sort results obtained without and with the initial sound rule. In FIG. 3, 1 denotes one of the given words (X), 2 the numeric code generation unit, 3 the sort processing unit, 4 the sort data, 5 the sort result, 6 the character attribute dictionary, and 7 the sorting character attribute load module.

The embodiment shown in FIG. 3 is described in more detail later. Conceptually, the ordering process can be thought of as proceeding as follows.

(1) Suppose that word (X) 1 is input as a sort target word. For each of its characters (陸, 까, and so on), the character attribute dictionary 6 and/or the sorting character attribute load module 7 is looked up using the KEF code as the key.

(2) The reading numeric codes "265602", "023400", and so on, corresponding to the characters 陸, 까, and so on, are extracted to form a reading indication section. A numeric code is then generated that consists of this reading indication section, a double consonant indication section that indicates the presence of double consonants (described later), and a stroke count indication section that gives the stroke counts of the Chinese characters.

(3) The numeric code is attached to the head of word (X) 1 to create the sort data 4.

(4) The sort processing unit 3 treats the sort data 4 as a weighted numeric value and orders it in ascending or descending order in the conventional manner.

(5) The sort processing unit 3 deletes the numeric code attached to the head of each item of the ordered sort data and outputs the sort result 5, which contains only the words.
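Steps (1) to (5) follow a decorate, sort, undecorate pattern. The Python sketch below illustrates only that control flow. The `make_code` callable is a hypothetical stand-in for the numeric code generation unit 2, and `toy_code` is a placeholder that has nothing to do with the patent's actual coding.

```python
from typing import Callable, Iterable, List


def sort_words(words: Iterable[str],
               make_code: Callable[[str], bytes],
               descending: bool = False) -> List[str]:
    """Attach a numeric code to each word, sort on it, then strip it again."""
    # Steps (1)-(3): build the sort data by pairing each word with its code.
    sort_data = [(make_code(word), word) for word in words]
    # Step (4): bytes compare left to right, so the leftmost byte carries
    # the highest weight, as with a multi-digit number.
    sort_data.sort(key=lambda item: item[0], reverse=descending)
    # Step (5): drop the codes and return only the words.
    return [word for _, word in sort_data]


def toy_code(word: str) -> bytes:
    """Placeholder code generator (assumption): just the UTF-8 bytes."""
    return word.encode("utf-8")


ascending = sort_words(["b", "c", "a"], toy_code)
descending = sort_words(["b", "c", "a"], toy_code, descending=True)
```

Because the sort itself only compares byte strings, all language-specific knowledge stays inside the code generator, which is the division of labor the text describes.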

The way in which a word is converted into a numeric code is described below.

FIG. 4A shows how the readings of Hangul characters and Chinese characters are converted into numeric codes. As noted above, a Hangul character generally consists of a combination of three graphemes, called the initial sound, the medial sound, and the final sound. The reading of a Chinese character can also be mapped to a Hangul character. To encode a reading, one byte is provided for the initial sound, one byte for the medial sound, and one byte for the final sound, each holding a predetermined code. When there is no final sound, hexadecimal 00 is given.

FIG. 4B shows how a blank is converted into a numeric code. Three bytes are provided in this case as well: the first byte is hexadecimal 00, and the following two bytes hold the KEF code 4040, which denotes a blank.

FIG. 4C shows how alphanumeric characters are converted into numeric codes. Again three bytes are provided: the first byte is hexadecimal FF, and the following two bytes hold the KEF code of the alphanumeric character.

Korean has the double consonants ㄲ, ㄸ, ㅃ, ㅆ, and ㅉ shown in FIG. 1B. Their handling is described later, but a double consonant can occur in the initial sound and in the final sound of a character. For this reason, when a word consists of N characters, a number of bytes equal to the quotient of (N+3)/4 is provided as the double consonant indication section, which gives two bits for each character of the word.
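The three-byte layouts of FIGS. 4A to 4C and the size rule for the double consonant indication section can be sketched as follows. The function names are illustrative and do not come from the patent, and the KEF code `0x1234` used for the alphanumeric example is a made-up placeholder, not a real KEF value.

```python
def reading_bytes_syllable(initial: int, medial: int, final: int = 0x00) -> bytes:
    """FIG. 4A: one byte each for initial, medial, and final sound (00 = no final)."""
    return bytes([initial, medial, final])


def reading_bytes_blank() -> bytes:
    """FIG. 4B: 00 followed by the two-byte KEF code 4040 for a blank."""
    return bytes([0x00, 0x40, 0x40])


def reading_bytes_alnum(kef_code: int) -> bytes:
    """FIG. 4C: FF followed by the two-byte KEF code of the character."""
    return bytes([0xFF]) + kef_code.to_bytes(2, "big")


def double_consonant_section_size(n_chars: int) -> int:
    """Quotient of (N + 3) / 4: two bits per character, rounded up to whole bytes."""
    return (n_chars + 3) // 4


kka = reading_bytes_syllable(0x02, 0x34)          # codes quoted in the text
riku = reading_bytes_syllable(0x26, 0x56, 0x02)   # codes quoted in the text
blank = reading_bytes_blank()
alnum = reading_bytes_alnum(0x1234)               # placeholder KEF code (assumption)
```

Under this layout a blank (first byte 00) sorts before any Hangul or Chinese character, and an alphanumeric character (first byte FF) sorts after them, provided no initial sound code uses 00 or FF.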

As shown in FIG. 4D, when the initial sound of a character is a double consonant, the first of its two bits is set to 1, and when the final sound is a double consonant, the second bit is set to 1. The two bits for a character therefore take one of the values 00, 10, 01, or 11. In the case of Chinese characters, moreover, there are homophones: different characters that share the same reading.

To account for this, a one-byte stroke count indication section is provided for each character of the sort target word, as shown in FIG. 4E. For a character that is not a Chinese character, hexadecimal 00 is given; for a Chinese character, its stroke count is given in hexadecimal. In the sort data 4 shown in FIG. 3, the input word (X) 1 is coded as follows. (A) In the reading indication section, the Chinese character 陸 is given the reading numeric code "265602", the Hangul character 까 is given "023400", the blank is given "004040", and so on. The letter "A" and the digit "1" are each given a six-digit code that begins with FF and continues with the KEF code of the character (the remaining digits of these two codes are partly illegible in the source text).

(B) In the double consonant indication section, the numeric code "230000" is given, for the following reason.

The input word (X) 1 contains 9 characters, so (9+3)/4 = 3 and a 3-byte double consonant indication section is provided.

The two bits of FIG. 4D are then set as follows: 00 for the character 陸, which has no double consonant; 10 for the character 까, whose initial sound is a double consonant; 00 for the blank, which has no double consonant; 11 for the following character, whose initial and final sounds are both double consonants; and so on. Written in order, the bits are "0010, 0011, 0000, ...", which is "230000" in hexadecimal. (C) In the stroke count indication section, the Chinese characters 陸 and 肉 in word (X) are given the stroke counts "0B" and "06", respectively.
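The worked value "230000" can be reproduced with a small bit-packing routine. This is a sketch under one inferred assumption: the text gives the flags of the first four characters only, and the result "230000" implies that the remaining five characters have no double consonants. The function name is illustrative.

```python
from typing import List, Tuple


def pack_double_consonant_flags(flags: List[Tuple[bool, bool]]) -> bytes:
    """Pack (initial_is_double, final_is_double) pairs, two bits per character.

    Bits are filled from the most significant end; unused trailing bits are 0.
    """
    n_bytes = (len(flags) + 3) // 4
    value = 0
    for initial_double, final_double in flags:
        value = (value << 2) | (int(initial_double) << 1) | int(final_double)
    # Left-align the bit pairs inside the whole-byte section.
    value <<= 2 * (4 * n_bytes - len(flags))
    return value.to_bytes(n_bytes, "big")


word_x_flags = [
    (False, False),  # 陸: no double consonant          -> 00
    (True, False),   # 까: double initial               -> 10
    (False, False),  # blank                            -> 00
    (True, True),    # double initial and double final  -> 11
] + [(False, False)] * 5  # remaining five characters, inferred from "230000"

section = pack_double_consonant_flags(word_x_flags)
```

Here `section.hex()` gives `230000`, matching the value quoted for word (X).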

FIG. 5 shows one embodiment of the numeric codes stored in the character attribute dictionary shown in FIG. 3.

As described above, each Hangul character is assembled from graphemes. Each grapheme, or group of graphemes, is therefore assigned a numeric code whose order follows the reading: 02 for ㄱ, 03 for ㄲ, and so on, as shown in the figure. Because the character attribute dictionary is not used solely for the reading-based sort that is the subject of the present invention, double consonants are given odd codes, as shown. FIG. 6, on the other hand, shows the corresponding numeric codes stored in the sorting character attribute load module shown in FIG. 3.

Because the load module is used only for sorting, it differs from FIG. 5 in that a double consonant is given the same code as its base consonant. That is, the same code 02 is given both to the consonant ㄱ and to the double consonant ㄲ; the reason is explained later. For the character 까 in the sort data shown in FIG. 3, the numeric code 02 for the double consonant ㄲ, the numeric code 34 for the vowel ㅏ, and the code 00 indicating that there is no final sound combine into the single numeric code 023400.

FIG. 7 shows the format of the sort data generated when, for example, the sort target word is given as a four-character (8-byte) "user-specified sort key".

Each of the four characters receives a 3-byte numeric code of the kind described with reference to FIGS. 4A to 4C, for a total of 12 bytes. Since (4+3)/4 = 1 with a remainder of 3, one byte is provided as the double consonant indication section.

In addition, 4 bytes are provided as the stroke count indication section. The numeric code generated in this way is attached to the head of the input word (X), and the whole becomes the sort data 4. The sort data 4 is a weighted numeric code whose leftmost position in the figure is the most significant digit. The sort processing unit 3 shown in FIG. 3 arranges the items of sort data in ascending or descending order according to the magnitude of these values, and only the word (X) portion is then extracted from the result to give the sort result 5 shown in FIG. 3.

FIG. 8 shows one embodiment of the character attribute dictionary. In the dictionary 6, an on-reading numeric code and a kun-reading numeric code are provided for each character, and double consonants are stored with odd-valued codes. Hangul characters have no distinction between on-readings and kun-readings. Korean does, however, have an initial sound rule, described later, and the "reading numeric code" field shown in the figure stores the code of the reading that results when the initial sound rule is applied. For characters that the initial sound rule does not affect, the on-reading numeric code field and the reading numeric code field naturally hold the same numeric code. Consequently, as described later, when a character appears at the head of a word, the reading numeric code shown in the figure can simply be extracted as the code for that character, whether or not the initial sound rule affects it.

FIG. 9 shows one embodiment of the sorting character attribute load module.

As shown in the figure, the module 7 stores, for each character, "total stroke count information", an "on-reading numeric code", "double consonant information", a "reading numeric code (initial sound rule applied)", and so on. The total stroke count information is also present in the character attribute dictionary 6 shown in FIG. 8 and is the information described with reference to FIG. 4E. The on-reading numeric code and the reading numeric code were described in connection with the character attribute dictionary 6 of FIG. 8; for Hangul characters, the numeric code of the reading with the initial sound rule applied is stored as the "reading numeric code". In the module 7, however, the reading codes for double consonants have been rewritten to even-valued codes, as described with reference to FIG. 6. To compensate for this, the module 7 provides one byte of "double consonant information" for each character. Depending on where a double consonant occurs, one of the patterns "00000000", "00000010", "00000001", or "00000011" is stored as the double consonant information, as shown in the figure.
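The one-byte double consonant information can be turned into the two flags of FIG. 4D by testing its two low-order bits. The sketch below assumes, by analogy with FIG. 4D, that the higher of the two bits marks a double initial sound and the lower one a double final sound; the text lists the four patterns but does not state this mapping explicitly, and the function name is illustrative.

```python
from typing import Tuple


def flags_from_info_byte(info: int) -> Tuple[bool, bool]:
    """Return (initial_is_double, final_is_double) from the module's info byte.

    Assumed mapping: 0b10 -> double initial sound, 0b01 -> double final sound.
    """
    initial_double = bool(info & 0b10)
    final_double = bool(info & 0b01)
    return initial_double, final_double


patterns = {
    0b00000000: flags_from_info_byte(0b00000000),
    0b00000010: flags_from_info_byte(0b00000010),
    0b00000001: flags_from_info_byte(0b00000001),
    0b00000011: flags_from_info_byte(0b00000011),
}
```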

Hereinafter, characters stored in the sorting character attribute load module 7 shown in FIG. 9 are called first-level characters, and characters stored in the character attribute dictionary 6 shown in FIG. 8 are called second-level characters.

When the numeric code generation unit 2 shown in FIG. 3 looks up a second-level character in the character attribute dictionary 6, it examines the numeric codes of the reading from the head. If a code is odd, the unit subtracts 1 from its value and sets the corresponding bit of the double consonant information to 1. The reason is explained later with reference to FIG. 11. The initial sound rule mentioned above is explained here with reference to FIG. 10.
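The odd-code conversion for second-level characters can be sketched as below. The sketch assumes that only the initial and final sound bytes are tested, since those are the positions where the text says a double consonant can occur; the function names are illustrative.

```python
from typing import Tuple


def fold_consonant(code: int) -> Tuple[int, bool]:
    """Map a dictionary consonant code to (sort code, is_double_consonant)."""
    if code % 2 == 1:           # odd code marks a double consonant in dictionary 6
        return code - 1, True   # fold onto the base consonant's code
    return code, False


def fold_syllable(initial: int, medial: int, final: int) -> Tuple[bytes, Tuple[bool, bool]]:
    """Fold a three-byte dictionary reading and report the FIG. 4D flag pair."""
    folded_initial, initial_double = fold_consonant(initial)
    folded_final, final_double = fold_consonant(final)
    reading = bytes([folded_initial, medial, folded_final])
    return reading, (initial_double, final_double)


# 까 is stored as 03 34 00 in the dictionary and becomes 02 34 00 plus a flag.
kka_reading, kka_flags = fold_syllable(0x03, 0x34, 0x00)
ka_reading, ka_flags = fold_syllable(0x02, 0x34, 0x00)
```

After folding, 가 and 까 share the reading 023400 and differ only in their flag bits.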

In Korean, when one of the Hangul characters shown in the upper rows of FIGS. 10A, 10B, and 10C appears at the head of a word, the Hangul character shown in the corresponding lower row is used instead. In other words, the pronunciation changes at the head of a word. For example, as shown in FIG. 10D, the word 良心 ("conscience") would, by the readings of its characters, be written with the Hangul shown in the "wrong" column, but it is changed to the form shown in the "right" column. The reading numeric code field shown in FIGS. 8 and 9 can therefore be thought of as storing, for a character subject to the rule, the numeric code of the character that replaces it at the head of a word.
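One way to picture the two stored codes is a per-character record with a plain reading and a rule-applied reading. The sketch below is an interpretation, not the patent's stated logic: the record type, field names, and the `apply_initial_rule` switch are hypothetical, and the byte values are placeholders rather than real codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharEntry:
    on_reading: bytes       # code of the plain reading
    initial_reading: bytes  # code of the reading with the initial sound rule applied


def select_reading(entry: CharEntry, position: int, apply_initial_rule: bool) -> bytes:
    """Use the rule-applied code only for the first character of a word."""
    if apply_initial_rule and position == 0:
        return entry.initial_reading
    return entry.on_reading


# Placeholder codes (assumption), chosen only so the two fields differ.
affected = CharEntry(on_reading=bytes.fromhex("aa0000"),
                     initial_reading=bytes.fromhex("bb0000"))
# A character the rule does not affect holds the same code in both fields.
unaffected = CharEntry(on_reading=bytes.fromhex("cc0000"),
                       initial_reading=bytes.fromhex("cc0000"))

head_with_rule = select_reading(affected, 0, apply_initial_rule=True)
head_without_rule = select_reading(affected, 0, apply_initial_rule=False)
inner_with_rule = select_reading(affected, 1, apply_initial_rule=True)
```

Because unaffected characters store identical codes in both fields, the generator does not need to know which characters the rule touches.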

The numeric code to be extracted is then selected according to whether the initial sound rule is applied. FIGS. 11A to 11C illustrate how double consonants are handled in the sort process.

When the characters 가, 까, and 기 are sorted, they should be arranged as shown in FIG. 11B. In the character attribute dictionary 6, however, as is clear from FIG. 5, the character 가 is given 023400, the character 까 is given 033400, and the character 기 is given a code of the form 02xx00 (the vowel byte xx is illegible in the source text). If only the reading numeric codes are compared, then, since 023400 < 02xx00 < 033400, the characters are arranged as shown in FIG. 11C.
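The misordering can be reproduced by sorting on dictionary-style codes alone. In this sketch the codes for 가 and 까 are the ones quoted in the text, while the vowel byte `0x48` for 기 is an assumed placeholder; the only property relied on is that it is greater than `0x34`.

```python
# Dictionary-6 style codes: the double consonant ㄲ has its own odd code 03.
dictionary_codes = {
    "가": bytes.fromhex("023400"),  # ka
    "까": bytes.fromhex("033400"),  # kka
    "기": bytes.fromhex("024800"),  # ki; vowel byte 0x48 is a placeholder (assumption)
}

# Comparing the reading codes alone puts 까 after 기.
naive_order = sorted(dictionary_codes, key=dictionary_codes.get)
```

The result is 가, 기, 까, which is the arrangement the text attributes to FIG. 11C rather than the desired one of FIG. 11B.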

To remedy this, the sort data 4 is provided with a double consonant information field, and the numeric code for a double consonant is made the same as that of the ordinary consonant.

That is, the character 가 is given 023400 ... 0 (double consonant field), the character 까 is given 023400 ... 2 (double consonant field), and the character 기 is given 02xx00 ... 0 (double consonant field), where xx is the vowel byte of 기, which is illegible in the source text. As a result, 023400...0 < 023400...2 < 02xx00...0, and the characters can be arranged correctly as shown in FIG. 11B.
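The corrected ordering can be checked with a small sketch that folds odd consonant codes and appends the flag value after the reading. The vowel byte `0x48` for 기 is again an assumed placeholder, and storing the two flag bits as a plain byte value (0 or 2) mirrors the schematic values in the text rather than the packed layout of FIG. 4D. The function name is illustrative.

```python
def to_sort_key(initial: int, medial: int, final: int) -> bytes:
    """Fold odd (double consonant) codes and append the two flag bits as one byte."""
    flags = 0
    if initial % 2 == 1:   # double consonant in the initial sound
        initial -= 1
        flags |= 0b10
    if final % 2 == 1:     # double consonant in the final sound
        final -= 1
        flags |= 0b01
    return bytes([initial, medial, final, flags])


keys = {
    "가": to_sort_key(0x02, 0x34, 0x00),  # 02 34 00 | 0
    "까": to_sort_key(0x03, 0x34, 0x00),  # 02 34 00 | 2
    "기": to_sort_key(0x02, 0x48, 0x00),  # vowel byte 0x48 is a placeholder (assumption)
}

fixed_order = sorted(keys, key=keys.get)
```

With the flag placed after the reading, the vowel is compared before the double consonant distinction, so 까 falls between 가 and 기.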

FIG. 12 illustrates the sort result obtained when double consonants are handled correctly as described above.

When the words shown in the left column of the figure are the sort target words, they are arranged correctly as shown in the right column. To achieve this, as described with reference to FIGS. 8 and 9, when a character held in the character attribute dictionary 6 (that is, a second-level character) is looked up and a numeric code of its reading has an odd value, 1 is subtracted from that code and a 1 bit is set in the double consonant information field.

FIG. 13 shows the sort results according to the present invention when the initial sound rule is not applied and when it is applied.

When the words shown in the left column of the figure are the sort target words, they are arranged as shown in the upper part of the right column if the initial sound rule is not applied, and as shown in the lower part of the right column if it is applied. As described above, the present invention generates the sort data in a way that handles the double consonants of Korean appropriately. The sort processing unit therefore only needs to treat the sort data as a numeric value with positional weights and arrange it in ascending or descending order.
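As a recap of the sort data layout described with FIG. 7, the sketch below concatenates the reading section, the double consonant section, the stroke count section, and the word itself. The section order follows the order in which the text lists them, which is an assumption about the figure. The `CharCode` record and `build_sort_data` function are hypothetical names, and the 8-byte word is a zero-filled placeholder rather than real KEF data.

```python
from typing import List, NamedTuple


class CharCode(NamedTuple):
    reading: bytes        # 3-byte reading code (FIGS. 4A to 4C)
    initial_double: bool  # double consonant in the initial sound
    final_double: bool    # double consonant in the final sound
    strokes: int          # stroke count, 0 for non-Chinese characters


def build_sort_data(word: bytes, chars: List[CharCode]) -> bytes:
    """Prefix the word with reading, double consonant, and stroke count sections."""
    reading = b"".join(c.reading for c in chars)
    n_flag_bytes = (len(chars) + 3) // 4
    bits = 0
    for c in chars:
        bits = (bits << 2) | (int(c.initial_double) << 1) | int(c.final_double)
    bits <<= 2 * (4 * n_flag_bytes - len(chars))  # left-align unused bit pairs
    flags = bits.to_bytes(n_flag_bytes, "big")
    strokes = bytes(c.strokes for c in chars)
    return reading + flags + strokes + word


plain = CharCode(bytes.fromhex("023400"), False, False, 0)
doubled = CharCode(bytes.fromhex("023400"), True, False, 0)
four_chars = [plain, doubled, plain, plain]
sort_data = build_sort_data(bytes(8), four_chars)  # bytes(8): placeholder 8-byte word
```

For four characters the prefix is 12 + 1 + 4 = 17 bytes, so the sort data is 25 bytes long with the 8-byte word attached.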


[Brief Description of the Drawings]

FIGS. 1 and 2 illustrate the problem underlying the present invention. FIG. 3 is an overall block diagram of one embodiment of the system of the present invention. FIGS. 4A to 4E illustrate the forms of the numeric codes used in the present invention. FIG. 5 shows one embodiment of the numeric codes stored in the character attribute dictionary shown in FIG. 3. FIG. 6 shows one embodiment of the numeric codes stored in the sorting character attribute load module shown in FIG. 3. FIG. 7 illustrates sort data in which a numeric code has been attached to a given sort target word. FIG. 8 shows one embodiment of the character attribute dictionary. FIG. 9 shows one embodiment of the sorting character attribute load module. FIGS. 10A to 10D illustrate the initial sound rule referred to in the present invention. FIGS. 11A to 11C illustrate the handling of the double consonants that exist in Korean. FIG. 12 illustrates the sort result obtained when double consonants are handled correctly. FIG. 13 illustrates the sort results obtained without and with the initial sound rule. In the figures, 1 denotes a word, 2 the numeric code generation unit, 3 the sort processing unit, 4 the sort data, 5 the sort result, 6 the character attribute dictionary, and 7 the sorting character attribute load module.

Claims

[Claims]

1. A Korean sort control method for ordering a word string that contains Hangul characters, in which Hangul characters, Chinese characters, and alphanumeric characters are each assigned predetermined codes, by using codes ordered according to the readings of the Hangul characters and Chinese characters, the method comprising: a character attribute dictionary that stores, for each Hangul character, Chinese character, and alphanumeric character, at least the reading of the character as a numeric code, using numeric codes that distinguish ordinary consonants from double consonants with respect to the reading; a sorting character attribute code module in which a consonant and its double consonant are assigned the same numeric code and a double consonant indication code is provided to distinguish the consonant from the double consonant; a numeric code generation unit that, for each word in the word string, looks up each character contained in the word in the character attribute dictionary, or in a sorting character attribute load module prepared from the character attribute dictionary, and generates a numeric code corresponding to the word; and a sort processing unit that arranges the words in the word string in ascending and/or descending order of the generated numeric codes produced by the numeric code generation unit; wherein the numeric code generation unit, when generating the numeric code by looking up the character attribute dictionary, which stores character attribute information that distinguishes ordinary consonants from double consonants, also looks up the contents of the sorting character attribute code module and, based on those contents, converts the numeric code corresponding to a double consonant in the character attribute dictionary so that it matches the numeric code corresponding to the ordinary consonant, writes the converted code into the generated numeric code, provides a double consonant indication section in the generated numeric code to indicate whether a double consonant is present, and outputs the generated numeric code.