JPH01137367A - Abbreviation file production system

Info

Publication number: JPH01137367A
Application number: JP62296673A
Authority: JP
Inventors: Koji Hashiguchi; 幸治橋口
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-11-25
Filing date: 1987-11-25
Publication date: 1989-05-30

Abstract

PURPOSE: To automatically produce an abbreviation file database by extracting, from an input document, abbreviation/original word pairs that match a designated abbreviation/original word pair pattern. CONSTITUTION: A pair extracting part 2 extracts, from the character strings of an input document, the abbreviation/original word pairs that match the pair pattern designated by a pair retrieving/designating part 1. The extracted pairs are stored in an abbreviation file 3 as an abbreviation file database. In this way, the abbreviation file database is produced automatically from the input document.

Description

[Detailed Description of the Invention]

[Summary]

The invention relates to an abbreviation collection creation method that searches a document for pairs of abbreviations and original words and creates an abbreviation collection database, and aims to create that database quickly and automatically. It comprises a pair search specification unit that specifies abbreviation/original word pair patterns for extracting abbreviation/original word pairs from the character strings of an input document, and a pair extraction unit that extracts, from those character strings, the items matching the pair patterns specified by the pair search specification unit. The abbreviation/original word pairs extracted by the pair extraction unit are stored in the abbreviation collection database.

[Field of Industrial Application]

The present invention relates to an abbreviation collection creation method that searches a document for pairs of abbreviations and original words and creates an abbreviation collection database.

[Prior Art and Problems to Be Solved by the Invention]

When a collection of English abbreviations is created for a manual, the work can be processed efficiently if an abbreviation file (abbreviation collection database) exists on the computer.

Conventionally, however, abbreviation collections have been created either (1) entirely by hand or (2) partly by computer processing.

In both cases, only the abbreviations themselves are searched for in the document. To create a correspondence list of abbreviations and original words (full spellings), it is therefore necessary to refer to a correspondence file prepared separately by hand in advance. Moreover, whenever an extracted abbreviation is not found in the correspondence file, the corresponding original word (full spelling) must be entered into the correspondence file by hand. This makes the processing cumbersome and makes it difficult to create abbreviation/original word pairs quickly.

In addition, when the correspondence file is shared among multiple documents, its size gradually increases. This lowers processing speed, and a unique correspondence between an abbreviation and its original word can no longer be expected.

The object of the present invention is to create an abbreviation collection database quickly and automatically.

[Means for Solving the Problems]

The means for solving the problems are described with reference to FIG. 1.

In FIG. 1, the pair search specification unit 1 specifies the abbreviation/original word pair patterns used to extract abbreviation/original word pairs from the character strings of an input document.

The pair extraction unit 2 extracts, from the character strings of the input document, the abbreviation/original word pairs that match the abbreviation/original word pair patterns.

The abbreviation file 3 stores the extracted abbreviation/original word pairs. The stored pairs form the abbreviation collection database.

[Operation]

As shown in FIG. 1, in the present invention the pair extraction unit 2 extracts, from the character strings of the input document, the abbreviation/original word pairs that match the pair patterns specified by the pair search specification unit 1, and stores them in the abbreviation file 3 as the abbreviation collection database.

This makes it possible to create the abbreviation collection database automatically by extracting, from the input document, the abbreviation/original word pairs that match the specified pair patterns.

[Embodiment]

Next, the configuration and operation of one embodiment of the present invention are described in detail, in order, with reference to FIGS. 1 to 5.

In FIG. 1, the file editor 4 edits (sorts, merges, deletes, and so on) the abbreviation/original word pairs read from the abbreviation file 3 and stores the result in the abbreviation file 3 (or, if necessary, in the output file 5). This produces, for example, an abbreviation collection consisting of abbreviation/original word pairs arranged in alphabetical order.
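The editing operations named above (sort, merge, delete) can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the source does not describe how the file editor 4 is implemented, and the function names `merge_pairs`, `delete_abbreviation`, and `sort_pairs`, as well as the representation of a pair as an `(abbreviation, original)` tuple, are hypothetical.

```python
# Hypothetical sketch of the file editor's operations on a list of
# (abbreviation, original word) tuples. Names and data layout are assumptions.

def merge_pairs(existing, new):
    """Merge two pair lists, dropping exact duplicates and keeping first-seen order."""
    seen = set()
    merged = []
    for pair in existing + new:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged


def delete_abbreviation(pairs, abbreviation):
    """Return the pairs with every entry for the given abbreviation removed."""
    return [pair for pair in pairs if pair[0] != abbreviation]


def sort_pairs(pairs):
    """Sort pairs alphabetically by abbreviation, then by original word."""
    return sorted(pairs, key=lambda pair: (pair[0].lower(), pair[1].lower()))
```

Sorting case-insensitively is a design choice made here so that mixed-case abbreviations interleave in the alphabetical listing; the source only states that the result is in alphabetical order.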

The operation of the configuration of FIG. 1 is described in detail with reference to FIG. 2.

In FIG. 2, step ■ shows the state in which a document, for example a manual, is read from a document file and input to the pair extraction unit 2. This means, for example, that a text containing the string given as the example for PATTERN 1 in FIG. 3, "CPU: central processing unit", is input.

Step ■ in the figure shows the state in which a pair search rank is set and abbreviation/original word pairs are searched for. One of the ranks shown in FIG. 4, described later, is set; in step ■, the abbreviation/original word pair pattern corresponding to the set rank is taken from PATTERN 1 to PATTERN n of FIG. 3, and the search of the document for abbreviation/original word pairs matching that pattern begins. These pair patterns are normally built into the search program, but they may instead be stored in a separate file.

Step ■ in the figure shows the state in which abbreviation/original word pairs are extracted. The pairs that match the pattern taken out in step ■ are extracted from the document and transferred to the abbreviation file 3 for storage. This extraction is performed for every abbreviation/original word pair pattern corresponding to the pair search rank set in step ■, and all matching pairs are stored in the abbreviation file 3.

Step ■ in the figure shows the state in which the pairs are stored in the abbreviation file 3. The abbreviation collection database is created in this way.

■ in the figure is the file editor, which edits (sorts, merges, deletes, and so on) the abbreviation/original word pairs read from the abbreviation file 3 and stores the result in the abbreviation file 3 (or, if necessary, in the output file 5).

■ in the figure is an output utility, which dumps the edited abbreviation/original word pairs (the abbreviation collection database) stored in the abbreviation file to various output media, for example a floppy disk.

■ in the figure is an automatic terminology processing system, which performs various creation processes such as abbreviation collection creation and glossary creation.

By the above procedure, the abbreviation/original word pairs that match the pair patterns corresponding to the specified pair search rank are extracted from the input document, and the abbreviation collection database can be created automatically.

FIG. 3 shows examples of abbreviation/original word pair patterns.

These are the patterns used to extract abbreviation/original word pairs from a document, and they consist of "PATTERN 1" to "PATTERN n".

In the figure, "x" represents any character. The symbols ":", "(", ")", and " " (space) in the figure apply when the corresponding symbol is present in the document. Each "(example)" in the figure gives a concrete instance of the corresponding pattern. For example, PATTERN 1 has the form "xxx : xx...xx", and the example "CPU: central processing unit" is extracted by the pair extraction unit 2 of FIG. 1 as a matching abbreviation/original word pair.

Setting abbreviation/original word pair patterns in this way makes it possible to extract from a document the abbreviation/original word pairs that match those patterns.
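One way to picture pattern-based pair extraction is with regular expressions, as in the Python sketch below. It is a sketch under stated assumptions, not the method of the source: the source does not say how patterns are encoded, the regular expressions are guesses at a colon-delimited form and a parenthesized form, and the names `PATTERNS` and `extract_pairs` are hypothetical. The sketch also assumes abbreviations are uppercase and original words are lowercase.

```python
import re

# Hypothetical encodings of pair patterns (assumed, not taken from the source).
# Pattern 1: "XXX : xx...xx"   abbreviation, colon, original word
# Pattern 2: "XXX (xx...xx)"   abbreviation, original word in parentheses
PATTERNS = {
    1: re.compile(r"\b([A-Z][A-Z0-9]+)\s*:\s*([a-z][a-z ]*[a-z])"),
    2: re.compile(r"\b([A-Z][A-Z0-9]+)\s*\(\s*([a-z][a-z ]*[a-z])\s*\)"),
}


def extract_pairs(text, pattern_numbers):
    """Return (abbreviation, original word) tuples for every selected pattern."""
    pairs = []
    for number in pattern_numbers:
        for match in PATTERNS[number].finditer(text):
            pairs.append((match.group(1), match.group(2)))
    return pairs
```

With this sketch, `extract_pairs("CPU: central processing unit", [1])` returns `[("CPU", "central processing unit")]`. Because the original word is taken as the longest run of lowercase words, the sketch suits glossary-style lines; in running prose it would also capture the words that follow the full spelling.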

FIG. 4 shows examples of pair search ranks. The pair search specification unit 1 of FIG. 1 interprets the specified search rank and notifies the pair extraction unit 2 of the corresponding abbreviation/original word pair patterns shown in FIG. 3. In the figure, "RANK Sn" makes only a single pattern the search target.

For example, with "RANK S2", the pair search specification unit 1 of FIG. 1 notifies the pair extraction unit 2 to search using PATTERN 2.

In the figure, "RANK Mn" makes the multiple patterns numbered at or above the specified value the search target. For example, with "RANK M2", the pair search specification unit 1 of FIG. 1 notifies the pair extraction unit 2 to search using PATTERN 2 to n.

In the figure, "RANK Ln" makes the multiple patterns numbered at or below the specified value the search target. For example, with "RANK L3", the pair search specification unit 1 of FIG. 1 notifies the pair extraction unit 2 to search using PATTERN 1 to 3.

In the figure, "RANK ALL" makes all registered patterns the search target. In this case the pair search specification unit 1 of FIG. 1 notifies the pair extraction unit 2 to search using PATTERN 1 to n.

As described above, providing the pair search ranks of FIG. 4 makes it possible to specify which of PATTERN 1 to n in FIG. 3 are used when extracting matching abbreviation/original word pairs from the document.
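The four kinds of rank described above (a single pattern, patterns at or above a value, patterns at or below a value, and all patterns) amount to selecting a subset of the pattern numbers 1 to n. The Python sketch below shows that selection. The function name `select_patterns`, the rank codes `"S"`, `"M"`, `"L"`, and `"ALL"` as plain strings, and the default of five registered patterns are assumptions for illustration; the source does not specify how a rank is encoded.

```python
def select_patterns(rank, value=None, total=5):
    """Return the pattern numbers selected by a pair search rank.

    rank  -- "S" (single), "M" (value or higher), "L" (value or lower), "ALL"
    value -- the pattern number attached to the rank (unused for "ALL")
    total -- the number n of registered patterns (assumed to be 5 here)
    """
    all_patterns = list(range(1, total + 1))
    if rank == "ALL":
        return all_patterns
    if value is None or not 1 <= value <= total:
        raise ValueError("ranks S, M and L need a value between 1 and total")
    if rank == "S":   # only the single pattern with this number
        return [value]
    if rank == "M":   # every pattern numbered value or higher
        return [n for n in all_patterns if n >= value]
    if rank == "L":   # every pattern numbered value or lower
        return [n for n in all_patterns if n <= value]
    raise ValueError(f"unknown rank: {rank}")
```

The returned list could then be handed to whatever component performs the matching, which corresponds to the notification from the pair search specification unit 1 to the pair extraction unit 2.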

FIG. 5 shows an example of the abbreviation file (abbreviation collection database). It is an example of an abbreviation collection database created by the procedure shown in the flowchart of FIG. 2. "M-800 Operation Manual" in the second line is the title of the document.

The extracted abbreviation/original word pairs are shown from the third line onward. Each pair is written with ":" separating the abbreviation from the original word (full spelling). In the document before extraction, the two may instead be delimited by "(" and ")" or similar symbols. FIG. 5 shows the file after it has been sorted alphabetically by the file editor 4 of FIG. 1.
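The file layout described above (a title on the second line, then one colon-separated pair per line in alphabetical order) can be sketched in Python as follows. The function name `format_abbreviation_file` and the `header` parameter are hypothetical, and the content of the first line is an assumption: the source states only what the second and later lines contain.

```python
def format_abbreviation_file(title, pairs, header="ABBREVIATION FILE"):
    """Render pairs as text: header line, title line, then "ABBR: original" lines.

    The header text is a placeholder; the source does not say what the
    first line of the file contains.
    """
    lines = [header, title]
    # Normalize every pair to the colon-separated form, sorted by abbreviation.
    for abbreviation, original in sorted(pairs, key=lambda pair: pair[0].lower()):
        lines.append(f"{abbreviation}: {original}")
    return "\n".join(lines) + "\n"
```

Writing every pair with ":" regardless of how it was delimited in the input document mirrors the normalization the description attributes to the stored file.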

[Effects of the Invention]

As described above, the present invention extracts from an input document the abbreviation/original word pairs that match the specified pair patterns and creates an abbreviation collection database from them, so the abbreviation collection database can be created automatically. By editing this automatically created database, abbreviation collections, glossaries, and the like for manuals can also be created automatically. This reduces the man-hours needed to create indexes for manuals and the like, improves quality, and promotes conversion to electronic files.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram of one embodiment of the present invention, FIG. 2 is a flowchart explaining the operation of the present invention, FIG. 3 shows examples of abbreviation/original word pair patterns, FIG. 4 shows examples of pair search ranks, and FIG. 5 shows an example of the abbreviation file.

In the figures, 1 denotes the pair search specification unit, 2 the pair extraction unit, 3 the abbreviation file, and 4 the file editor.

Claims

[Claims]

An abbreviation collection creation method in which a document is searched for pairs of abbreviations and original words to create an abbreviation collection database, characterized by comprising:

a pair search specification unit (1) that specifies abbreviation/original word pair patterns for extracting abbreviation/original word pairs from the character strings of an input document; and

a pair extraction unit (2) that extracts, from the character strings of the input document, the items matching the abbreviation/original word pair patterns specified by the pair search specification unit (1),

wherein the abbreviation/original word pairs extracted by the pair extraction unit (2) are stored in the abbreviation collection database.