JP6210865B2

JP6210865B2 - Data search system and data search method

Info

Publication number: JP6210865B2
Application number: JP2013249341A
Authority: JP
Inventors: 由美子横張; 及川　道雄; 加藤　千昭; 中江　達哉; 啓成藤原; 木戸　邦彦
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2013-12-02
Filing date: 2013-12-02
Publication date: 2017-10-11
Anticipated expiration: 2033-12-02
Also published as: JP2015106361A

Description

The present invention relates to a data search system and a data search method. More specifically, it relates to a technique for improving the comprehensiveness of search processing over data that contains notation fluctuation, that is, the same term written in different ways.

As medical information has become increasingly computerized, there is a growing movement to use the accumulated information to improve the quality of care and the management of medical institutions. However, when such medical information is registered in a management system, the same event is frequently registered under different notations. This is so-called notation fluctuation, and it arises from factors such as the recognition, thinking, and skill of the individual staff member doing the registration. Notation fluctuation hinders searches for the relevant medical information and makes efficient secondary use of that information difficult.

One way to use medical information despite such notation fluctuation is a synonym dictionary. For each event subject to notation fluctuation, such a dictionary defines the correspondences between the terms that are synonyms of one another. One prior-art technique related to synonym dictionaries extracts a character string pattern common to a plurality of existing synonyms as a synonym rule. It then automatically generates synonyms by applying that rule to the character string information in a database (see Patent Document 1).

Patent Document 1: JP 2009-128968 A

The prior art extracts common patterns that exist between synonyms already in the synonym dictionary as synonym rules, and thereby generates synonyms with high accuracy. However, because synonyms are generated from the extracted rules, some character strings never become targets for rule extraction at all. These are strings for which no common pattern can exist in the first place, such as a string that appears only once in the synonym dictionary. Consequently, even if such a string is a valid synonym for the term denoting some event, the medical information linked to that string is excluded from the search.

Data analysis in the medical field includes tasks such as drug efficacy evaluation and adverse-effect investigation. In such analysis, comprehensively extracting the relevant medical information from the database without omission is particularly important for keeping the results accurate. Conversely, as described above, medical information recorded under a variant notation may be impossible to extract and therefore excluded from the evaluation. In that case, the accuracy of the analysis inevitably declines.

The present invention was made in view of the above problems. Its object is to provide a technique that can improve the comprehensiveness of search processing over data containing notation fluctuation.

A data search system of the present invention that solves the above problems comprises two main components. The first is a storage device that stores, for each event, master information in which a plurality of notations denoting the same event are associated with one another. The second is an arithmetic device that executes the following processes:
(1) reading the master information for each event from the storage device, dividing the character string of each notation in the master information into words, and thereby identifying the partial character strings of each notation for each event;
(2) determining synonym relationships between the identified partial character strings according to their positions within the notations of the same event, and storing the partial character strings judged synonymous in the storage device in association with one another, thereby generating a partial notation fluctuation dictionary;
(3) collating each partial character string of a notation against the partial notation fluctuation dictionary to identify the other partial character strings associated with it, generating combination patterns from the notation's partial character strings and the other partial character strings so obtained, and storing the character string represented by each pattern in the storage device in association with the event denoted by the notation, thereby generating a notation fluctuation dictionary; and
(4) accepting a search request from an input device or a predetermined terminal, collating the search word in the request against the notation fluctuation dictionary to identify the patterns registered for the event corresponding to the search word, searching a predetermined database with the character strings represented by those patterns, and outputting the information extracted by the search to an output device or the predetermined terminal.

The data search method of the present invention is performed by an information processing device. That device has a storage device that stores, for each event, master information in which a plurality of notations denoting the same event are associated with one another. The information processing device executes the following processes:
(1) reading the master information for each event from the storage device, dividing the character string of each notation in the master information into words, and thereby identifying the partial character strings of each notation for each event;
(2) determining synonym relationships between the identified partial character strings according to their positions within the notations of the same event, and storing the partial character strings judged synonymous in the storage device in association with one another, thereby generating a partial notation fluctuation dictionary;
(3) collating each partial character string of a notation against the partial notation fluctuation dictionary to identify the other partial character strings associated with it, generating combination patterns from the notation's partial character strings and the other partial character strings so obtained, and storing the character string represented by each pattern in the storage device in association with the event denoted by the notation, thereby generating a notation fluctuation dictionary; and
(4) accepting a search request from an input device or a predetermined terminal, collating the search word in the request against the notation fluctuation dictionary to identify the patterns registered for the event corresponding to the search word, searching a predetermined database with the character strings represented by those patterns, and outputting the information extracted by the search to an output device or the predetermined terminal.

According to the present invention, the comprehensiveness of search processing over data containing notation fluctuation is improved.

FIG. 1 is a network configuration diagram including the data search system of the present embodiment.
FIG. 2 is a diagram showing an example hardware configuration of the data search system of the present embodiment.
FIG. 3 is a diagram showing an example data structure of the master information table of the present embodiment.
FIG. 4 is a diagram showing an example hardware configuration of the user terminal of the present embodiment.
FIG. 5 is a flowchart showing processing procedure example 1 of the data search method of the present embodiment.
FIG. 6 is a diagram showing an example data structure of the partial notation fluctuation dictionary of the present embodiment.
FIG. 7 is a diagram showing an example data structure of the notation fluctuation dictionary of the present embodiment.
FIG. 8 is a diagram showing pattern generation example 1 for the notation fluctuation dictionary of the present embodiment.
FIG. 9 is a diagram showing pattern generation example 2 for the notation fluctuation dictionary of the present embodiment.
FIG. 10 is a diagram showing pattern generation example 3 for the notation fluctuation dictionary of the present embodiment.
FIG. 11 is a flowchart showing processing procedure example 2 of the data search method of the present embodiment.

--- System configuration ---
Embodiments of the present invention will be described below in detail with reference to the drawings. FIG. 1 is a network configuration diagram including the data search system 100 of the present embodiment. The data search system 100 shown in FIG. 1 is a computer system that improves the comprehensiveness of search processing over data containing notation fluctuation. It is connected to user terminals 200 and the like via a network 20. As an information processing device, it receives a search request containing a search word 107 from a user terminal 200 and extracts matching data from clinical data 105 in response. It then returns the extracted data as a search result 106.

In the present embodiment, the search target of the data search system 100 is, as an example, clinical data 105 accumulated at a medical institution. The search system 100 can also target other data groups with two properties: they are huge in volume, and the same event (e.g., a disease name, examination name, or drug name) in them is prone to notation fluctuation (e.g., がん, ガン, and 癌, three ways of writing "cancer"). One such group is medical information in general, for example various medical examination data, experimental data from medical research institutions, and statistical data compiled by public agencies.

Medical institutions have standard masters of terms denoting events such as examinations and diseases. These masters are consulted when recording in electronic medical records and when the information is later used. In addition, many institutions define, in a table or the like, the correspondence between terms they use independently (that is, notation variants) and the standard master. The present embodiment draws on the notation-variant information of terms defined as synonyms of one another, either in the standard masters used by medical institutions or in the correspondence tables that institutions create themselves. This information is used to efficiently expand the set of notation fluctuation patterns and thereby improve the comprehensiveness of searches over the corresponding clinical data 105.

The configuration of the data search system 100 is described next. FIG. 2 is a diagram showing an example hardware configuration of the data search system 100 of the present embodiment. The data search system 100 comprises the following:
- a storage device 11 composed of a suitable nonvolatile storage device such as a hard disk drive;
- a memory 13 composed of a volatile storage device such as RAM;
- an arithmetic device 14 such as a CPU, which executes a program 12 held in the storage device 11 (for example, by reading it into the memory 13) to perform overall control of the device as well as various determination, calculation, and control processes; and
- a communication device 17 that connects to the network 20 and handles communication with other devices.

The storage device 11 stores at least the following:
- the program 12, which implements the functions required of the data search system 100 of the present embodiment;
- a master information table 101, which stores, for each event, master information associating a plurality of notations denoting the same event; and
- a normalization rule 104, which is a conversion rule for converting the characters of a character string into a prescribed character type.

The data search system 100 also generates the partial notation fluctuation dictionary 102 and the notation fluctuation dictionary 103, described later. Once generated, these are also stored in the storage device 11.

The clinical data 105 to be searched may be held by the data search system 100 in its storage device 11. This is shown with a broken line in FIG. 2 because it is optional. Alternatively, as illustrated in FIG. 1, the clinical data 105 may be held in a searchable form by another device connected via the network 20. In the present embodiment, the master information table 101 is assumed to hold master information that defines the examination names corresponding to medical examination codes. It is not limited to this, however, and any master information, such as an injury-and-disease master or a drug master, may be used.

FIG. 3 shows an example of the master information table 101. The master information table 101 consists of a plurality of records, each storing information about one examination event. Each record stores one examination code and at least one examination name denoting the examination identified by that code. When an examination code has several examination names, that is, several synonyms, the single record for that code stores all of them. For example, record 201 of the master information table 101 in FIG. 3 stores two notations, "β‐ＴＧ" and "ｂ−トロンボグロブリン" (b-thromboglobulin), as examination names for examination code "1000". A record in the master information table 101 is not limited to these two items (examination code and examination name). It may also include items for other arbitrary attributes that are subject to notation fluctuation.
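For illustration only, the master information table 101 can be modelled in memory as a list of records. Each record holds one examination code and one or more synonymous examination names. The class and function names below are hypothetical. The code also uses ASCII hyphens and half-width letters instead of the full-width characters shown in FIG. 3, for readability.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    """One row of the master information table: a single examination event."""
    exam_code: str
    # One or more notations that all denote the same examination.
    exam_names: list[str] = field(default_factory=list)


# Hypothetical in-memory contents mirroring the records described in the text.
MASTER_TABLE = [
    MasterRecord("1000", ["β-TG", "b-トロンボグロブリン"]),
    MasterRecord("1001", ["CKアイソ", "CKアイソザイム"]),
]


def synonyms_for(code: str, table: list[MasterRecord]) -> list[str]:
    """Return every notation registered for an examination code (empty if unknown)."""
    for rec in table:
        if rec.exam_code == code:
            return list(rec.exam_names)
    return []
```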

The data search system 100 need not receive search requests from a user terminal 200 via the network 20 and the communication device 17. It may instead accept search requests directly from a user. In that case, the data search system 100 includes an input device 15 that accepts key or voice input from the user, and an output device 16, such as a display, that displays processed data. Both are shown with broken lines in FIG. 2 because they are optional.

As illustrated in FIG. 4, the user terminal 200, like the data search system 100, has the hardware configuration of a general information processing device. It comprises the following:
- a storage device 21 composed of a suitable nonvolatile storage device such as a hard disk drive;
- a memory 23 composed of a volatile storage device such as RAM;
- an arithmetic device 24 such as a CPU, which executes a program 22 held in the storage device 21 (for example, by reading it into the memory 23) to perform overall control of the device and various determination, calculation, and control processes;
- an input device 25 that accepts key or voice input from the user; and
- an output device 26, such as a display, that displays processed data.

The user terminal 200 is operated by a user who wants to search the clinical data 105 for desired data and use it.

Next, the functions of the data search system 100 of the present embodiment are described. As noted above, these functions are implemented, for example, by executing the program 12 of the data search system 100. In the example of FIG. 2, executing the program 12 on the arithmetic device 14 implements the following units:
- a master information reading unit 111;
- a character string normalization unit 112;
- a character string dividing unit 113;
- a partial notation fluctuation dictionary generation unit 114;
- a notation fluctuation dictionary generation unit 115;
- a notation fluctuation pattern generation unit 116; and
- a data extraction unit 117.

The data search system 100 reads the master information of each event from the master information table 101 in the storage device 11 (function of the master information reading unit 111). It then divides the character string of each notation in the master information into words to identify the partial character strings of each notation for each event (function of the character string dividing unit 113).

The data search system 100 also determines synonym relationships between the partial character strings identified above, according to their positions within the notations of the same event. It stores the partial character strings judged synonymous in the storage device 11 in association with one another, which generates the partial notation fluctuation dictionary 102 (function of the partial notation fluctuation dictionary generation unit 114).
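Below is a minimal sketch of how the partial notation fluctuation dictionary 102 might be held in memory. It assumes synonym relations are symmetric: if "β" is a variant of "b", then "b" is a variant of "β". The class name `PartialVariantDict` is hypothetical, and the source does not specify a storage format.

```python
from collections import defaultdict


class PartialVariantDict:
    """Hypothetical in-memory partial notation fluctuation dictionary.

    Each partial string maps to the set of partial strings judged synonymous
    with it. Pairs are stored in both directions so either side can be looked up.
    """

    def __init__(self) -> None:
        self._table: dict[str, set[str]] = defaultdict(set)

    def add_pair(self, a: str, b: str) -> None:
        """Register a and b as synonymous partial strings."""
        if a == b:
            return  # identical strings carry no variant information
        self._table[a].add(b)
        self._table[b].add(a)

    def variants(self, part: str) -> set[str]:
        """Return the other partial strings associated with `part` (empty if none)."""
        # .get() avoids creating an empty entry in the defaultdict on lookup.
        return set(self._table.get(part, ()))
```

For the pair "β‐ＴＧ" / "ｂ−トロンボグロブリン", the sketch would receive `add_pair("β", "b")` and `add_pair("TG", "トロンボグロブリン")`.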

The data search system 100 also generates the notation fluctuation dictionary 103 (function of the notation fluctuation dictionary generation unit 115). It first collates each partial character string of a notation against the partial notation fluctuation dictionary 102 to identify the other partial character strings associated with it. It then generates combination patterns from the notation's partial character strings and the other partial character strings so obtained. Finally, it stores the character string represented by each pattern in the storage device 11, associated with the event denoted by the notation.
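The combination-pattern step amounts to a Cartesian product over the positions of a split notation. Each partial string is replaced either by itself or by one of its registered variants. The sketch below makes three assumptions: the partial dictionary is a plain mapping from a partial string to a set of variants, notations are already split, and confidence levels are left out. `expand_patterns` and `build_notation_dict` are hypothetical names.

```python
from itertools import product


def expand_patterns(parts: list[str], partial_dict: dict[str, set[str]]) -> list[str]:
    """Return every string formed by keeping each partial string or swapping in
    one of its registered variants (Cartesian product over positions)."""
    choices = [[p] + sorted(partial_dict.get(p, set()) - {p}) for p in parts]
    return ["".join(combo) for combo in product(*choices)]


def build_notation_dict(
    master_parts: dict[str, list[list[str]]],
    partial_dict: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each event code to all patterns generated from its split notations."""
    return {
        code: {pat for parts in notations for pat in expand_patterns(parts, partial_dict)}
        for code, notations in master_parts.items()
    }
```

For `["β", "-", "TG"]` with variants β↔b and TG↔トロンボグロブリン, this yields four patterns: "β-TG", "β-トロンボグロブリン", "b-TG", and "b-トロンボグロブリン". Two of them never appear in the master table.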

The data search system 100 also handles search requests (function of the data extraction unit 117). It accepts a search request from the input device 15 or the user terminal 200 (the predetermined terminal) and collates the search word in the request against the notation fluctuation dictionary 103. This identifies the patterns registered for the event corresponding to the search word. The system then searches the clinical data 105 (the predetermined database) with the character strings represented by those patterns and outputs the extracted information to the output device 16 or the user terminal 200.
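A sketch of the lookup-then-search flow follows. A list of strings stands in for the clinical data 105, and substring matching stands in for the database query. Two details are assumptions not stated in the source: the function name `search`, and the fallback to a plain search when the word is not in the dictionary.

```python
def search(query: str, notation_dict: dict[str, set[str]], records: list[str]) -> list[str]:
    """Expand the query to every pattern of every event that lists it, then
    return the records containing any of those patterns as a substring."""
    patterns: set[str] = set()
    for pats in notation_dict.values():
        if query in pats:
            patterns |= pats
    if not patterns:
        # Assumption: an unknown word is searched as-is.
        patterns = {query}
    return [r for r in records if any(p in r for p in patterns)]
```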

When identifying the partial character strings, the data search system 100 first reads the master information of each event from the master information table 101 in the storage device 11. It then applies the normalization rule 104, a predetermined conversion rule, to the character string of each notation in the master information. This converts the characters into a prescribed character type suited to subsequent processing (function of the character string normalization unit 112). Finally, it divides each normalized character string into words to identify the partial character strings of each notation for each event.

When generating the notation fluctuation dictionary 103, the data search system 100 also determines a confidence level for each generated pattern (function of the notation fluctuation dictionary generation unit 115). The confidence level serves as a reliability index of the pattern. It is determined for any pairing the pattern represents: a partial character string with another partial character string, a partial character string with a partial character string, or another partial character string with another partial character string. The determination depends on how the corresponding character strings are registered in the master information table 101 or how much the patterns have in common. The confidence level value is then registered in the notation fluctuation dictionary 103 in association with the pattern.

The data search system 100 can also expand the search word itself when searching the clinical data 105 in response to a search request. It divides the search word in the request into words to identify its partial character strings. It then collates those partial character strings against the partial notation fluctuation dictionary 102 to identify the other partial character strings associated with them. From the search word's partial character strings and the other partial character strings so obtained, it generates combination patterns (function of the notation fluctuation pattern generation unit 116). Finally, it searches the clinical data 105 with the character strings represented by those patterns and outputs the extracted information to the output device 16 or the user terminal 200.

--- Processing procedure example 1 ---
The actual procedure of the data search method of the present embodiment is described below with reference to the drawings. The operations corresponding to the data search method described below are implemented by a program that the data search system 100 reads into memory or the like and executes. The program consists of code for performing each of the operations described below.

FIG. 5 is a flowchart showing processing procedure example 1 of the data search method of the present embodiment. This section first describes how the data search system 100 generates the partial notation fluctuation dictionary 102 and the notation fluctuation dictionary 103. The master information reading unit 111 of the data search system 100 reads each record of the master information table 101 in the storage device 11 into the memory 13 (s100). It then extracts the examination code and examination name values contained in each record (s101). In the example of FIG. 3, this extraction runs record by record:
- for examination code "1000", the values of examination name 1 "β−ＴＧ" and examination name 2 "ｂ−トロンボグロブリン" are extracted;
- for examination code "1001", the values of examination name 1 "ＣＫアイソ" and examination name 2 "ＣＫアイソザイム" are extracted;
- and so on for the remaining records.

Next, the character string normalization unit 112 of the data search system 100 converts the characters of the examination name values according to the normalization rule 104 (s102). These values are the character strings extracted from each record in step s101. Examples of the normalization rule 104 are rules that unify character types, such as converting lowercase letters to uppercase, converting half-width characters to full-width characters, and converting 「」 brackets to parentheses （）.
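Here is one possible way to implement the three example rules with Python's standard library. Several details are assumptions, since the source lists the rules but not how they are applied:
- NFKC normalization is applied first, to fold half-width katakana into full-width form.
- Uppercasing is restricted to ASCII letters, so Greek "β" is left alone.
- `normalize` is a hypothetical name.

```python
import unicodedata


def normalize(text: str) -> str:
    """Sketch of the normalization rule: unify character types before splitting."""
    # NFKC turns half-width katakana (e.g. "ｶﾞ") into full-width ("ガ") and
    # full-width ASCII into ASCII; the loop below then widens ASCII again.
    text = unicodedata.normalize("NFKC", text)
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            ch = ch.upper()              # lowercase ASCII -> uppercase
        if ch == "「":
            ch = "("                     # 「 -> (  (widened just below)
        elif ch == "」":
            ch = ")"
        code = ord(ch)
        if 0x21 <= code <= 0x7E:
            ch = chr(code + 0xFEE0)      # half-width ASCII -> full-width form
        elif ch == " ":
            ch = "\u3000"                # half-width space -> ideographic space
        out.append(ch)
    return "".join(out)
```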

Next, the character string dividing unit 113 of the data search system 100 divides each character string converted in step s102 into partial character strings called elements (s103). Each of these strings represents an examination name for an examination code. In the present embodiment, the division uses differences in character type together with existing morphological analysis techniques that rely on a predetermined dictionary.
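The character-type part of the division can be sketched as grouping consecutive characters of the same script. Every symbol, such as a hyphen, forms its own element. The dictionary-based morphological analysis the source also mentions is not reproduced here, so a compound such as "アイソザイム" stays whole. The script categories and function names are assumptions.

```python
import unicodedata


def char_class(ch: str) -> str:
    """Coarse script class of one character, based on its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith(("LATIN", "FULLWIDTH LATIN")):
        return "latin"
    if ch.isdigit():
        return "digit"
    if name.startswith("GREEK"):
        return "greek"
    if name.startswith(("KATAKANA", "HALFWIDTH KATAKANA")):
        return "katakana"  # also covers the prolonged sound mark "ー"
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "symbol"


def split_elements(text: str) -> list[str]:
    """Split at every change of character class; each symbol is its own element."""
    elements: list[str] = []
    prev = None
    for ch in text:
        cls = char_class(ch)
        if elements and cls == prev and cls != "symbol":
            elements[-1] += ch
        else:
            elements.append(ch)
        prev = cls
    return elements
```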

Next, the partial notation fluctuation dictionary generation unit 114 of the data search system 100 looks at the record processed in the preceding steps. It identifies the plurality of examination names associated with that record's single examination event as synonyms and selects each of them (s104). The unit then compares the elements obtained in step s103 for each of the selected examination names and determines whether any element matches between them (s105).

For example, suppose examination name 1 "β-TG" and examination name 2 "b-thromboglobulin" are selected in step s104. Step s103 yielded the elements "β", "-", "TG", "b", "-", and "thromboglobulin". Among these, the partial notation fluctuation dictionary generation unit 113 identifies "-" as the element that appears in both names.

If the examination names share an element (s105: Y), the partial notation fluctuation dictionary generation unit 113 identifies two element sets in each examination name: the elements before the shared element and the elements after it (s106).

In the example above, examination name 1 "β-TG" and examination name 2 "b-thromboglobulin" share the element "-". The unit therefore identifies the front string "β" for examination name 1 and the front string "b" for examination name 2. Likewise, it identifies the rear string "TG" for examination name 1 and the rear string "thromboglobulin" for examination name 2.

If at least one of the examination names has zero elements before, or zero elements after, the shared element, that examination name is excluded from subsequent processing.
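Steps s105 and s106 amount to locating a shared element and slicing both element lists around it. The function name in this sketch is hypothetical. Two interpretive choices are also assumptions, because the text does not settle them:

- only the first shared element is used;
- the exclusion rule is read as dropping a front or rear pair when either side of that pair is empty.

```python
def split_at_common(elems1, elems2):
    """s105-s106 sketch: return (shared element, {side: (set1, set2)}) or None."""
    common = [e for e in elems1 if e in elems2]
    if not common:
        return None  # s105: N, the names share no element
    pivot = common[0]  # assumption: use the first shared element only
    i, j = elems1.index(pivot), elems2.index(pivot)
    sides = {
        "front": (elems1[:i], elems2[:j]),
        "back": (elems1[i + 1:], elems2[j + 1:]),
    }
    # A side with zero elements in either name is dropped from further processing.
    kept = {side: pair for side, pair in sides.items() if pair[0] and pair[1]}
    return pivot, kept
```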

Next, the partial notation fluctuation dictionary generation unit 113 compares the number of elements in the element sets identified in step s106. If the counts differ between the examination names (s107: N), it merges elements (partial character strings) in the larger set, which is the set before or after the shared element. Merging reduces that set's element count so that both examination names have the same number of elements (s108).

The unit decides which elements to merge by checking, among other things, whether each element, or a string formed by combining elements, already exists in the partial notation fluctuation dictionary 102. For example, when three elements "A", "B", and "C" are to be merged into two, it checks whether each of "A", "B", and "C" exists in the partial notation fluctuation dictionary 102. It also checks whether strings formed by joining adjacent elements without breaking the continuity of the original string, such as "AB" and "BC", exist there.

If an element (partial character string), or a string formed by merging elements, exists in the partial notation fluctuation dictionary 102, the unit performs the merge so as to favor notations already in the dictionary. For example, if "AB" exists in the partial notation fluctuation dictionary 102, the unit merges "A" and "B" into "AB", yielding the two elements "AB" and "C".

If neither the elements nor any merged string exists in the dictionary, the unit instead generates multiple merge patterns, each of which preserves the continuity of the original string. For example, if none of "A", "B", "C", "AB", "BC", or "ABC" exists in the partial notation fluctuation dictionary 102, it generates the two patterns "AB, C" and "A, BC" and carries both into subsequent processing.
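The merge in step s108 can be modelled as choosing how to cut an element list into k contiguous groups. The sketch below works in three stages:

1. It enumerates every possible cut.
2. It scores each cut by how many of its pieces are already in the partial notation fluctuation dictionary.
3. It keeps the best-scoring cuts. If nothing is known, it keeps all cuts, as in the "AB, C" / "A, BC" example.

The scoring rule and all names are assumptions.

```python
from itertools import combinations

def contiguous_partitions(elems, k):
    """Yield every split of elems into k adjacent groups; joining each group
    keeps the original character order (continuity)."""
    n = len(elems)
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield ["".join(elems[a:b]) for a, b in zip(bounds, bounds[1:])]

def merge_to_length(elems, k, dictionary):
    """s108 sketch: reduce elems to k elements, preferring known pieces."""
    candidates = list(contiguous_partitions(elems, k))
    scores = [sum(piece in dictionary for piece in c) for c in candidates]
    best = max(scores)
    if best == 0:
        return candidates  # nothing known: keep every merge pattern
    return [c for c, s in zip(candidates, scores) if s == best]

def equalize(set1, set2, dictionary):
    """s107-s108 sketch: candidate (set1, set2) pairs with equal element counts."""
    k = min(len(set1), len(set2))
    c1 = merge_to_length(set1, k, dictionary) if len(set1) > k else [list(set1)]
    c2 = merge_to_length(set2, k, dictionary) if len(set2) > k else [list(set2)]
    return [(a, b) for a in c1 for b in c2]
```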

Next, the partial notation fluctuation dictionary generation unit 113 determines partial notation fluctuations from the positions of the elements in the examination names obtained through step s108. These are the examination names that were selected as synonyms and whose element counts have been equalized. The unit stores the elements judged synonymous by this determination, that is, the partial character strings before or after the shared element, in the storage device 11 in association with each other. This generates the partial notation fluctuation dictionary 102 (s109). Before step s109 is described in detail, the partial notation fluctuation dictionary 102 itself is described.

FIG. 6 shows an example of the partial notation fluctuation dictionary 102. For each partial character string corresponding to an element described above, the dictionary stores three things:

- the string itself;
- its partial character string ID;
- the IDs of the other partial character strings judged to be its synonyms (synonymous partial character strings).

Each record stores at least one synonymous partial character string ID. For example, record 301 indicates that "b" (partial character string ID "0002") and "beta" (partial character string ID "0015") have been judged synonyms of the partial character string "β". The partial notation fluctuation dictionary 102 may also contain partial notation fluctuation information obtained by running the data search method of this embodiment on other master information.

The processing in step s109 is described using the case where step s108 yields the two element sets "AB, C" and "D, E". The partial notation fluctuation dictionary generation unit 113 pairs elements by their position within each set, that is, within each string: first element with first element, last element with last element. On that basis it judges that "AB" and "D" are synonyms and that "C" and "E" are synonyms, and adds the corresponding strings to the partial notation fluctuation dictionary 102.

At this point, the partial notation fluctuation dictionary generation unit 113 determines whether the partial character strings "AB" and "D" are registered in the partial notation fluctuation dictionary 102. If a string is not yet registered, the unit registers it. If it is already registered, the unit checks whether the partial character string judged synonymous with it is already listed as its synonym, and registers it if not.

For example, the unit checks whether the partial character string ID of "D" is registered as a synonym ID of "AB". If not, it adds the ID of "D" to the synonym IDs of "AB". At the same time, it checks whether the partial character string ID of "AB" is registered as a synonym ID of "D". If not, it adds the ID of "AB" to the synonym IDs of "D". The unit performs the same processing for the strings "C" and "E", which were also judged synonymous.
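The sketch below is a hypothetical in-memory model of the dictionary in FIG. 6. It also covers the position-based pairing and two-way registration of step s109. The class and method names, and the sequential four-digit ID scheme, are assumptions.

```python
class PartialVariantDictionary:
    """Hypothetical model of the partial notation fluctuation dictionary 102."""

    def __init__(self):
        self.id_of = {}     # partial character string -> partial character string ID
        self.text_of = {}   # ID -> partial character string
        self.synonyms = {}  # ID -> set of synonymous partial character string IDs

    def register(self, text):
        """Register text if it is not present yet and return its ID."""
        if text not in self.id_of:
            new_id = f"{len(self.id_of) + 1:04d}"  # sequential IDs are an assumption
            self.id_of[text] = new_id
            self.text_of[new_id] = text
            self.synonyms[new_id] = set()
        return self.id_of[text]

    def add_synonym_pair(self, a, b):
        """Link a and b in both directions; set.add ignores IDs already listed."""
        ia, ib = self.register(a), self.register(b)
        if ia != ib:
            self.synonyms[ia].add(ib)
            self.synonyms[ib].add(ia)

def register_aligned(dictionary, set1, set2):
    """s109 sketch: after s108 both sets have equal length, so elements at
    the same position are treated as synonyms."""
    if len(set1) != len(set2):
        raise ValueError("element counts must be equalized first (s108)")
    for a, b in zip(set1, set2):
        dictionary.add_synonym_pair(a, b)
```

For the example in the text, `register_aligned(d, ["AB", "C"], ["D", "E"])` links "AB" with "D" and "C" with "E" in both directions. Repeating the call adds nothing new.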

The partial notation fluctuation dictionary generation unit 113 performs the processing of s104 to s109 for every combination of synonyms. When three or more examination names share the same examination code, that processing is executed for every pair that can be chosen from the set of names. For example, suppose one record yields three examination names with the same meaning: "TG", "triglyceride", and "neutral fat". The unit then processes three pairs: "TG" with "triglyceride", "triglyceride" with "neutral fat", and "TG" with "neutral fat".
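Enumerating every pair of names that share an examination code is a direct use of `itertools.combinations`. The record layout `(code, name)` and the function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def synonym_pairs(records):
    """Yield (code, name_a, name_b) for every pair of distinct names sharing a code."""
    by_code = defaultdict(list)
    for code, name in records:
        if name not in by_code[code]:
            by_code[code].append(name)
    for code, names in by_code.items():
        for a, b in combinations(names, 2):  # n names -> n*(n-1)/2 pairs
            yield code, a, b
```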

Next, the generation of the notation fluctuation dictionary 103 by the notation fluctuation dictionary generation unit 115 of the data search system 100 is described. The unit proceeds as follows:

1. It looks up each partial character string (element) that makes up an examination name in the partial notation fluctuation dictionary 102.
2. It identifies the other partial character strings associated with each one, that is, the elements corresponding to its synonymous partial character string IDs.
3. It generates combination patterns from the partial character strings of the examination name and the other partial character strings obtained for them.
4. It stores the string represented by each pattern in association with the examination code of the corresponding examination name, which produces the notation fluctuation dictionary 103.

As an example, consider generating patterns for the notation fluctuations of "β-TG" and "b-thromboglobulin", which are synonyms in the master information table 101. These are referred to below as the target terms. The notation fluctuation dictionary generation unit 115 looks up the IDs of the partial character strings that make up each target term in the partial notation fluctuation dictionary 102, and acquires the synonymous partial character string IDs associated with them (s110).

For the partial character string "β" of the target term "β-TG", three IDs are acquired: "0001", which is the ID of "β" itself, and "0002" and "0015", which are the synonymous partial character string IDs associated with "β". Similarly, the IDs obtained for the partial character string "TG" are "0003", "0004", and "0039".
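Step s110 is a lookup from each element to its own ID plus its synonym IDs. The toy tables below mirror the IDs quoted in the text. Mapping ID "0039" to "triglyceride" is an assumption, inferred from the level 3 examples given later for "β-TG". All names are hypothetical, and so is the choice to pass unregistered elements such as "-" through verbatim.

```python
# Toy tables modelled on FIG. 6; the string for "0039" is an assumption.
TEXT_BY_ID = {"0001": "β", "0002": "b", "0003": "TG",
              "0004": "thromboglobulin", "0015": "beta", "0039": "triglyceride"}
SYNONYM_IDS = {"0001": ["0002", "0015"], "0003": ["0004", "0039"]}
ID_BY_TEXT = {text: i for i, text in TEXT_BY_ID.items()}

def alternative_ids(element):
    """s110 sketch: the element's own ID followed by its synonym IDs."""
    own = ID_BY_TEXT.get(element)
    if own is None:
        return []  # not registered in the partial notation fluctuation dictionary
    return [own] + SYNONYM_IDS.get(own, [])

def alternatives(element):
    """Strings for those IDs; an unregistered element is kept as-is."""
    ids = alternative_ids(element)
    return [TEXT_BY_ID[i] for i in ids] if ids else [element]
```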

Next, the notation fluctuation dictionary generation unit 115 generates the notation fluctuation patterns to be registered in the notation fluctuation dictionary 103 (s111). It creates every possible pattern by combining, in all possible ways, the partial character string IDs and synonymous partial character string IDs obtained for the target term in step s110. Each pattern is an examination name formed by concatenating the strings that correspond to the combined IDs. In other words, the data search system 100 generates examination names that could be derived from the existing examination names registered in the master information table 101.

The number of notation fluctuation patterns created is therefore the product of the ID counts acquired for each partial character string of the target term, where each count includes the string's own ID and its synonymous partial character string IDs. For example, if three IDs are acquired for each of the partial character strings "β" and "TG" that make up the target term, 3 × 3 = 9 patterns are created, as shown in table 601 of FIG. 8. Applying the same processing to "b-thromboglobulin", the synonym of the target term "β-TG", creates six notation fluctuation patterns, as shown in table 602 of FIG. 9.
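Each pattern picks one alternative per element, so step s111 is a Cartesian product and the pattern count is the product of the alternative counts. The sketch below uses `itertools.product` with the three alternatives per element described for "β-TG". Treating the hyphen as a fixed element with a single alternative is an assumption.

```python
from itertools import product
from math import prod

def variant_patterns(alternatives_per_element):
    """s111 sketch: concatenate one alternative from every element position."""
    return ["".join(parts) for parts in product(*alternatives_per_element)]

beta_tg = [["β", "b", "beta"], ["-"], ["TG", "thromboglobulin", "triglyceride"]]
patterns = variant_patterns(beta_tg)
assert len(patterns) == prod(len(a) for a in beta_tg)  # 3 x 1 x 3 = 9
```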

Next, the notation fluctuation dictionary generation unit 115 determines an accuracy level for each notation fluctuation pattern obtained in step s111 and registers the pattern in the notation fluctuation dictionary 103 (s112). The accuracy level serves as a reliability index for the pattern. It is determined for each pairing indicated by the pattern, which may be:

- a partial character string of a target term with one of its synonymous partial character strings;
- two partial character strings of target terms;
- two synonymous partial character strings.

The level depends on the registration status in the master information table 101 or on the commonality among the notation fluctuation patterns. The unit registers the accuracy level value in the notation fluctuation dictionary 103 in association with the pattern.

For example, consider two of the notation fluctuation patterns: "β-TG", which corresponds to "0001, 0003" in table 601, and "b-thromboglobulin", which corresponds to "0002, 0004" in table 602. The notation fluctuation dictionary generation unit 115 recognizes that both are names contained in records of the master information table 101. It therefore sets the accuracy level of their records in the notation fluctuation dictionary 103 to "1". Table 603 in FIG. 10 collects the notation fluctuation patterns with accuracy level "1".

Some notation fluctuation patterns appear in the pattern sets created from every one of the target terms. For such a pattern, the notation fluctuation dictionary generation unit 115 sets the accuracy level of its record in the notation fluctuation dictionary 103 to "2". For example, "β-thromboglobulin" and "b-TG" appear both among the patterns created from "β-TG" and among those created from "b-thromboglobulin". They are therefore registered in the notation fluctuation dictionary 103 with accuracy level "2". Table 604 in FIG. 10 collects the notation fluctuation patterns with accuracy level "2".

For a notation fluctuation pattern that appears in the pattern set created from only one of the target terms, the notation fluctuation dictionary generation unit 115 sets the accuracy level of its record in the notation fluctuation dictionary 103 to "3". Tables 601 and 602 show the patterns created from the target terms "β-TG" and "b-thromboglobulin". Seven of them, including "β-triglyceride", "b-triglyceride", and "beta-TG", appear only among the patterns for one of the target terms, so they receive accuracy level "3". Table 605 in FIG. 10 collects the notation fluctuation patterns with accuracy level "3".
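The three-level rule of step s112 can be sketched as follows, checking level 1 first. The function name is hypothetical. With three or more target terms, a pattern may appear in more than one but not all pattern sets. The text does not address that case, and this sketch assigns it level 3 as an assumption.

```python
def assign_levels(master_names, pattern_sets):
    """s112 sketch: map each generated pattern to accuracy level 1, 2, or 3."""
    levels = {}
    for pattern in set().union(*pattern_sets):
        if pattern in master_names:
            levels[pattern] = 1  # registered in the master information table
        elif all(pattern in s for s in pattern_sets):
            levels[pattern] = 2  # generated from every target term
        else:
            levels[pattern] = 3  # generated from only some target terms
    return levels
```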

FIG. 7 shows a concrete example of the notation fluctuation dictionary 103 generated in this way. In this embodiment, the dictionary is a collection of records, each holding exactly three attribute values: an examination code, an examination name, and an accuracy level. The accuracy level indicates how plausible a notation fluctuation pattern created as described above is. This embodiment defines three accuracy levels:

- Level 1: the examination name is contained in a record of the master information table 101.
- Level 2: the pattern appears in the pattern sets created from every target term treated as a synonym in the master information table 101.
- Level 3: the pattern appears in the pattern set created from only one of the synonymous target terms.

The notation fluctuation dictionary 103 may also contain notation fluctuation information obtained by running the data search method of this embodiment on other master information. It may also contain synonym information acquired by other means, for example the technique of Patent Document 1.

--- Processing procedure example 2 ---
The description so far has covered the processing up to the generation of the partial notation fluctuation dictionary 102 and the notation fluctuation dictionary 103. The following describes a search process that uses these dictionaries to run an exhaustive search in response to a user's search request. The process outputs, from the clinical data 105, not only data matching the search word but also data matching its synonyms, that is, its notation fluctuations. FIG. 11 is a flowchart showing processing procedure example 2 of the data search method of this embodiment. The example search task is to extract, from the clinical data 105, information on patients who underwent a "β-TG" examination.

In this case, the character string normalization unit 112 of the data search system 100 obtains the search word 107, for example from a search request received from the user terminal 200. It applies normalization rule 104 to the search word, performing the same character conversion as in step s102 above (s200). Processing that is the same as in processing procedure example 1 is not described again in detail; the same applies below. The character string dividing unit 113 then divides the search word normalized in step s200 into elements, in the same way as in step s103 above (s201).

The notation fluctuation pattern generation unit 116 of the data search system 100 then looks up, in the partial notation fluctuation dictionary 102, the partial character strings obtained by dividing the search word 107. It identifies the synonymous partial character strings associated with each one. It then generates combination patterns from the partial character strings of the search word 107 and the synonymous partial character strings obtained for them (s202). In this processing flow, the steps performed by the character string dividing unit 113 and the notation fluctuation pattern generation unit 116 can be skipped if the normalized search word is already a term in the notation fluctuation dictionary 103.

Next, the data extraction unit 117 of the data search system 100 searches the clinical data 105 using the search word normalized in step s200, and using the synonyms of the search word 107 indicated by the notation fluctuation patterns generated in step s202. It extracts the matching data and outputs it to the user terminal 200 as the search result (s203).

For example, if the search word 107 is "β-TG", the clinical data 105 is searched with each of the following as a search key:

- "β-TG" itself;
- the other examination names that share its examination code "1000", found by looking up "β-TG" in the notation fluctuation dictionary 103;
- the terms indicated by the notation fluctuation patterns obtained in step s202.

The result of this search is then output.
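The key expansion and lookup of step s203 can be sketched as follows. The `VariantEntry` record mirrors FIG. 7. Everything else is an assumption: the row layout of the clinical data, the exact-match comparison, and the toy entries apart from code "1000" for "β-TG".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantEntry:  # one record of the notation fluctuation dictionary 103 (FIG. 7)
    code: str
    name: str
    level: int

# Toy dictionary; only code "1000" for "β-TG" comes from the text.
VARIANT_DICT = [
    VariantEntry("1000", "β-TG", 1),
    VariantEntry("1000", "b-thromboglobulin", 1),
    VariantEntry("1000", "β-thromboglobulin", 2),
    VariantEntry("2000", "triglyceride", 1),  # hypothetical unrelated code
]

def search_keys(word, variant_dict, generated_patterns):
    """s203 sketch: the word, every name sharing its code, and the s202 patterns."""
    codes = {e.code for e in variant_dict if e.name == word}
    same_code = {e.name for e in variant_dict if e.code in codes}
    return {word} | same_code | set(generated_patterns)

def search(clinical_rows, keys):
    """Return rows whose examination name exactly matches any search key."""
    return [row for row in clinical_rows if row["test_name"] in keys]
```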

When executing step s203, the data extraction unit 117 may instead return a candidate list to the user terminal 200 and let the searcher choose the terms actually used for extraction. The list contains the search word 107 and its synonyms, both those obtained from the notation fluctuation dictionary 103 and those generated from the search word 107, ordered most reliable first by accuracy level.

The display order and display range of the list may be preset on the data search system 100 side. Alternatively, the searcher may set them by accessing the data search system 100 via the user terminal 200. For example, the displayed candidates can be restricted to terms registered in the master information table 101 only, or to those terms plus high-accuracy notation fluctuations. Such restrictions allow an efficient search that matches the searcher's intent.
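Ordering candidates most reliable first and limiting them to a display range comes down to a filter and a sort on the accuracy level. A smaller number means higher accuracy. The function, its `max_level` parameter, and the alphabetical tie-break are hypothetical.

```python
def candidates(entries, max_level):
    """Return (name, level) pairs with level <= max_level, most reliable first.

    max_level=1 shows only names registered in the master information table;
    max_level=2 adds the level 2 notation fluctuations as well.
    """
    kept = [(name, level) for name, level in entries if level <= max_level]
    # Sort by level, then by name so that equal levels have a stable order.
    return sorted(kept, key=lambda item: (item[1], item[0]))
```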

When the data search system 100 receives, as the search word 107, several search words separated by spaces, it performs an AND search over them. For example, suppose the user terminal 200 sends a search word 107 consisting of "トリグリセリド" (triglyceride) and "ＬＤＬコレステロール" (LDL cholesterol) separated by a space. The system executes steps s200 to s203 above for each of the two words and returns to the user terminal 200, as the search result, the data that contains both.
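The AND search can be sketched as running the single-word expansion for each word and intersecting the matches. Here "data that contains both" is interpreted as patients who have records for every word, which is an assumption. The `expand` callable is hypothetical and stands in for the key expansion of steps s200 to s202.

```python
def and_search(query, rows, expand):
    """Return the set of patients matching every space-separated word."""
    matched = None
    # str.split() with no argument also splits on the full-width space U+3000.
    for word in query.split():
        keys = expand(word)
        hits = {r["patient"] for r in rows if r["test_name"] in keys}
        matched = hits if matched is None else matched & hits
    return matched or set()
```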

The best mode for carrying out the present invention has been described above in concrete terms. However, the present invention is not limited to it and can be modified in various ways without departing from its gist.

According to this embodiment, a comprehensive synonym dictionary can be built efficiently, which helps extract the information to be analyzed accurately from medical information containing notation fluctuations. The accuracy of each registered synonym, that is, its plausibility or validity as a synonym, can also be presented to the user. Using this dictionary during search processing allows a search to include the synonyms of the user-specified search word, which makes the search results more comprehensive.

Searching with a synonym that is registered in the dictionary but never actually occurs does not harm the correctness of the results. Such a term is not contained in the database being searched, so nothing is extracted for it.

Therefore, this embodiment makes search processing over data containing notation fluctuations more comprehensive.

The description in this specification makes at least the following clear. When identifying the partial character strings, the arithmetic device may perform the following steps:

1. Read the master information of each event from the storage device.
2. Apply a predetermined conversion rule to the character strings making up each notation in the master information. This normalization converts their characters into a prescribed character type suited to subsequent processing.
3. Divide the normalized strings into words to identify the partial character strings of each notation for each event.

This allows variations in character type in the character strings of the master information, such as upper case versus lower case or half-width versus full-width characters, to be unified (normalized) in advance. This improves the accuracy and efficiency of the subsequent processing, such as identifying the partial character strings and generating the partial notation fluctuation dictionary and the notation fluctuation dictionary.
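A minimal sketch of the normalization step follows, assuming Unicode NFKC normalization plus case folding as a stand-in for the normalization rule (104), whose actual contents the text does not specify. NFKC folds full-width alphanumerics to half-width and half-width katakana to full-width, which covers the width and case variations mentioned above. The function names are hypothetical, and the whitespace split is a simplification: Japanese notations would normally need a morphological analyser to find word boundaries.

```python
import unicodedata


def normalize(text: str) -> str:
    # Assumption: the "predetermined conversion rule" is approximated by NFKC
    # (full-width ASCII -> half-width, half-width katakana -> full-width,
    # ideographic space U+3000 -> ordinary space) followed by case folding.
    text = unicodedata.normalize("NFKC", text)
    text = text.casefold()
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def split_into_words(text: str) -> list[str]:
    # Hypothetical word splitter: whitespace-delimited words after normalization.
    return normalize(text).split()


if __name__ == "__main__":
    print(normalize("ＡＣＵＴＥ\u3000Ｍｙｏｃａｒｄｉａｌ  Infarction"))
    print(normalize("ｶﾞﾝ"))
    print(split_into_words("Acute  MI"))
```

Applying normalization before word splitting means that, for example, "ＡＣＵＴＥ" and "acute" yield the same partial character string, so they are not mistaken for a notation fluctuation in the later steps.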

In the data search system of the present embodiment, the arithmetic device may further, in the process of generating the notation fluctuation dictionary, determine an accuracy level serving as a reliability index of each generated pattern, for any pair indicated by that pattern (a partial character string and another partial character string, a partial character string and a partial character string, or another partial character string and another partial character string), according to the pair's registration status in the master information or its commonality among patterns, and register the value of the accuracy level in the notation fluctuation dictionary in association with the pattern.

This makes it possible to show the user explicitly that notation fluctuations differ in reliability, and to indicate how certain the results of each search are.
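The criteria for the accuracy level are stated only as "registration status in the master information" and "commonality between patterns." The sketch below is one hypothetical way to turn those two signals into three levels; the level names, the `min_support` threshold, the `pair_support` mapping (how many events each substitution pair was observed in), and the function `accuracy_level` are assumptions for illustration only.

```python
def accuracy_level(
    original: list[str],
    pattern: list[str],
    registered: set[str],
    pair_support: dict[frozenset, int],
    min_support: int = 2,
) -> str:
    """Hypothetical scoring rule for one generated pattern.

    original     -- words of the registered notation the pattern came from
    pattern      -- words of the generated pattern (same length as original)
    registered   -- notations registered in the master information
    pair_support -- number of events in which each synonym pair was observed
    """
    if " ".join(pattern) in registered:
        return "high"  # the pattern itself is a registered notation
    substituted = [frozenset((a, b)) for a, b in zip(original, pattern) if a != b]
    if all(pair_support.get(p, 0) >= min_support for p in substituted):
        return "medium"  # every substitution is corroborated by several events
    return "low"  # the pattern relies on a substitution seen in too few events


if __name__ == "__main__":
    registered = {"acute myocardial infarction", "acute cardiac infarction"}
    support = {frozenset(("myocardial", "cardiac")): 1, frozenset(("tumour", "tumor")): 3}
    print(accuracy_level(["chronic", "myocardial", "ischemia"],
                         ["chronic", "cardiac", "ischemia"], registered, support))
```

Under this rule, a pattern that is itself a registered notation is rated "high", a pattern whose substitutions were each observed in at least `min_support` events is rated "medium", and any other pattern is rated "low".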

In the data search system of the present embodiment, the arithmetic device may also, in the process of searching the predetermined database in response to the search request and outputting information, divide the search word indicated by the search request into words to identify its partial character strings, collate the identified partial character strings with the partial notation fluctuation dictionary to identify the other partial character strings associated with them, generate combination patterns of the partial character strings of the search word and the other partial character strings obtained for them, search the predetermined database with the character string indicated by each pattern, and output the information extracted by the search to the output device or the predetermined terminal.

This makes it possible to generate synonyms for the search word itself, taking notation fluctuation into account, and to search not only for the search word but also for its synonyms, which further improves search accuracy. Even if a synonym derived from the search word does not occur in the database being searched, searching with it extracts no data, because the term is not contained in the database in the first place; the correctness of the search results is therefore unaffected.
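The search-word expansion described above can be sketched as follows, assuming the partial notation fluctuation dictionary is held as a mapping from a word to its synonymous words and that the "predetermined database" is simply a list of text records matched by substring. Both representations and the function names `expand_search_word` and `search` are hypothetical. The example also illustrates the note above: a variant that never occurs in the records ("acute heart infarction") matches nothing and therefore cannot add incorrect results.

```python
from itertools import product


def expand_search_word(word: str, partial: dict[str, set[str]]) -> list[str]:
    # Hypothetical: whitespace word splitting; each word may be kept or
    # replaced by any synonym listed in the partial dictionary.
    options = [[w, *sorted(partial.get(w, set()))] for w in word.split()]
    return [" ".join(combo) for combo in product(*options)]


def search(records: list[str], word: str, partial: dict[str, set[str]]) -> list[str]:
    """Return records containing the search word or any generated variant.

    Variants that never occur in the records simply match nothing.
    """
    variants = expand_search_word(word, partial)
    return [r for r in records if any(v in r for v in variants)]


if __name__ == "__main__":
    partial = {"myocardial": {"cardiac", "heart"}, "cardiac": {"myocardial"}}
    records = [
        "dx: acute cardiac infarction",
        "dx: acute myocardial infarction",
        "dx: fracture",
    ]
    print(expand_search_word("acute myocardial infarction", partial))
    print(search(records, "acute myocardial infarction", partial))
```

Because every variant is only used as a match condition, over-generation costs some extra comparisons but does not introduce records that contain none of the variants.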

DESCRIPTION OF SYMBOLS
11 Storage device
12 Program
13 Memory
14 Arithmetic device
17 Communication device
20 Network
100 Data search system
101 Master information table
102 Partial notation fluctuation dictionary
103 Notation fluctuation dictionary
104 Normalization rule
105 Clinical data
111 Master information reading unit
112 Character string normalization unit
113 Character string division unit
114 Partial notation fluctuation dictionary generation unit
115 Notation fluctuation dictionary generation unit
116 Notation fluctuation pattern generation unit
117 Data extraction unit
200 User terminal
21 Storage device
22 Program
23 Memory
24 Arithmetic device
25 Input device
26 Output device
27 Communication device

Claims

1. A data search system comprising:
a storage device that stores, for each event, master information in which a plurality of notations indicating the same event are associated with one another; and
an arithmetic device that executes:
a process of reading the master information of each event from the storage device, dividing the character string constituting each notation included in the master information into words, and identifying the partial character strings of each notation for each event;
a process of determining, for each pair of the identified partial character strings, whether a synonym relationship exists according to their positions in the notations relating to the same event, and storing in the storage device, in association with each other, the partial character strings determined to be synonymous, thereby generating a partial notation fluctuation dictionary;
a process of collating each partial character string constituting the character string of a notation with the partial notation fluctuation dictionary, identifying the other partial character strings associated with that partial character string, generating combination patterns of the partial character strings of the notation and the other partial character strings obtained for them, and storing the character string indicated by each pattern in the storage device in association with the event indicated by the notation, thereby generating a notation fluctuation dictionary; and
a process of receiving a search request from an input device or a predetermined terminal, collating the search word indicated by the search request with the notation fluctuation dictionary, identifying the patterns registered for the event corresponding to the search word, searching a predetermined database with the character strings indicated by the identified patterns, and outputting the information extracted by the search to an output device or the predetermined terminal.

2. The data search system according to claim 1, wherein the arithmetic device,
in the process of identifying the partial character strings,
reads the master information of each event from the storage device, applies a predetermined conversion rule to the character string constituting each notation included in the master information, executes a normalization process that converts the characters of that string into a prescribed character type suited to the subsequent processing, divides the normalized character string into words, and identifies the partial character strings of each notation for each event.

3. The data search system according to claim 2, wherein the arithmetic device,
in the process of generating the notation fluctuation dictionary,
further executes a process of determining, for any pair indicated by a generated pattern (a partial character string and another partial character string, a partial character string and a partial character string, or another partial character string and another partial character string), an accuracy level serving as a reliability index of that pattern according to the pair's registration status in the master information or its commonality among patterns, and registering the value of the accuracy level in the notation fluctuation dictionary in association with that pattern.

4. The data search system according to claim 3, wherein the arithmetic device,
in the process of searching the predetermined database in response to the search request and outputting information,
executes a process of dividing the search word indicated by the search request into words to identify its partial character strings, collating the identified partial character strings with the partial notation fluctuation dictionary, identifying the other partial character strings associated with each of them, generating combination patterns of the partial character strings of the search word and the other partial character strings obtained for them, searching the predetermined database with the character string indicated by each pattern, and outputting the information extracted by the search to the output device or the predetermined terminal.

5. A data search method, wherein an information processing apparatus provided with a storage device that stores, for each event, master information in which a plurality of notations indicating the same event are associated with one another executes:
a process of reading the master information of each event from the storage device, dividing the character string constituting each notation included in the master information into words, and identifying the partial character strings of each notation for each event;
a process of determining, for each pair of the identified partial character strings, whether a synonym relationship exists according to their positions in the notations relating to the same event, and storing in the storage device, in association with each other, the partial character strings determined to be synonymous, thereby generating a partial notation fluctuation dictionary;
a process of collating each partial character string constituting the character string of a notation with the partial notation fluctuation dictionary, identifying the other partial character strings associated with that partial character string, generating combination patterns of the partial character strings of the notation and the other partial character strings obtained for them, and storing the character string indicated by each pattern in the storage device in association with the event indicated by the notation, thereby generating a notation fluctuation dictionary; and
a process of receiving a search request from an input device or a predetermined terminal, collating the search word indicated by the search request with the notation fluctuation dictionary, identifying the patterns registered for the event corresponding to the search word, searching a predetermined database with the character strings indicated by the identified patterns, and outputting the information extracted by the search to an output device or the predetermined terminal.