JP6976178B2

JP6976178B2 - Extractor, extraction method, and extraction program

Info

Publication number: JP6976178B2
Application number: JP2018004684A
Authority: JP
Inventors: Masajiro Iwasaki (岩崎雅二郎)
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2018-01-16
Filing date: 2018-01-16
Publication date: 2021-12-08
Anticipated expiration: 2038-01-16
Also published as: JP2019125124A

Description

The present invention relates to an extraction device, an extraction method, and an extraction program.

Conventionally, techniques for extracting various kinds of information have been provided. For example, a technique has been provided that uses the search histories of other search users to extract, from among the patent documents browsed by the other search user whose user-to-user distance from the current search user is the smallest, the patent documents that the current search user has not yet browsed.

Japanese Unexamined Patent Publication No. 2007-213151

Masajiro Iwasaki, "Neighborhood Search Using an Approximate k-Nearest Neighbor Graph with a Tree-Structured Index", IPSJ Journal, Feb. 2011, Vol. 52, No. 2, pp. 817-828.

However, with the conventional technique described above, it may be difficult to appropriately extract similar patent documents. For example, similarity between users alone does not necessarily reflect similarity between patent documents, so it may be difficult to extract the desired patent documents.

The present application has been made in view of the above, and an object thereof is to provide an extraction device, an extraction method, and an extraction program that appropriately extract similar patent documents.

The extraction device according to the present application includes: an acquisition unit that acquires graph information, in which a plurality of nodes corresponding to respective ones of a plurality of patent documents are connected according to the similarity of the plurality of patent documents, and information relating to one invention; and an extraction unit that extracts, from among the plurality of patent documents, a similar patent document, which is a patent document similar to the one invention, by searching the graph information from a starting node that is determined based on a predetermined criterion from among the plurality of nodes of the graph information acquired by the acquisition unit and that serves as the starting point of the search of the graph information.

According to one aspect of the embodiment, similar patent documents can be appropriately extracted.

FIG. 1 is a diagram showing an example of the extraction process according to the embodiment.
FIG. 2 is a diagram showing a configuration example of the extraction system according to the embodiment.
FIG. 3 is a diagram showing a configuration example of the extraction device according to the embodiment.
FIG. 4 is a diagram showing an example of the patent information storage unit according to the embodiment.
FIG. 5 is a diagram showing an example of the index information storage unit according to the embodiment.
FIG. 6 is a diagram showing an example of the graph information storage unit according to the embodiment.
FIG. 7 is a diagram showing an example of the model information storage unit according to the embodiment.
FIG. 8 is a flowchart showing an example of the extraction process according to the embodiment.
FIG. 9 is a flowchart showing an example of the generation process according to the embodiment.
FIG. 10 is a diagram showing an example of feature amount extraction according to the embodiment.
FIG. 11 is a flowchart showing an example of a search process using graph information.
FIG. 12 is a diagram showing an example of the extraction process according to the embodiment.
FIG. 13 is a hardware configuration diagram showing an example of a computer that realizes the functions of the extraction device.

Hereinafter, modes for implementing the extraction device, the extraction method, and the extraction program according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. These embodiments do not limit the extraction device, the extraction method, or the extraction program according to the present application. In each of the following embodiments, the same parts are designated by the same reference numerals, and duplicate explanations are omitted.

(Embodiment)
[1. Extraction process]
An example of the extraction process according to the embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the extraction process according to the embodiment. FIG. 1 shows a case where the extraction device 100 extracts similar patent documents (hereinafter also referred to as "similar patents") by searching graph data (graph information) in which patent documents (also simply referred to as "patents") are organized into a graph structure. In FIG. 1, the extraction device 100 uses graph information in which the patents are graph-structured using vector data (also referred to as "vector information" or simply "vectors") corresponding to each patent. The information used by the extraction device 100 is not limited to vectors and may take any form as long as it can express the similarity between patent documents. For example, the extraction device 100 may use graph information in which the patents are graph-structured using predetermined data or values corresponding to each patent. For example, the extraction device 100 may use graph information in which the patents are graph-structured using predetermined numerical values (for example, binary or hexadecimal values) generated from each patent. The example of FIG. 1 targets the document type "abstract" among the documents of a patent, but the document is not limited to the abstract and may be a document of any type, such as the drawings, the specification, or the scope of claims (hereinafter sometimes referred to as "claims"). That is, the target patent (object) may be any patent (information) as long as the similarity between patents can be expressed.

[1-1. About graph information]
As shown by the graph information GR11 in FIG. 1, the extraction device 100 performs the extraction process on graph data in which vectors (nodes) are connected by directed edges. Graph information such as the graph information GR11 in FIG. 1 may be generated by the extraction device 100, or the extraction device 100 may acquire it from another external device such as the information providing device 50 (see FIG. 2). The graph information GR11 is graph information obtained by graph-structuring the summary information (summary data) corresponding to the document type "abstract" among the documents of each patent.

A directed edge here means an edge along which data can be traced in only one direction. In the following, the node from which an edge is followed, that is, its start point, is called the reference source, and the node to which the edge leads, that is, its end point, is called the reference destination. For example, a directed edge connected from a node "A" to a node "B" is an edge whose reference source is node "A" and whose reference destination is node "B". The edges connecting the nodes are not limited to directed edges and may be of various kinds. For example, they may be undirected edges, edges that allow mutual reference, or all bidirectional edges.

For example, an edge whose reference source is node "A" is called an output edge of node "A". Likewise, an edge whose reference destination is node "B" is called an input edge of node "B". That is, the terms output edge and input edge differ only in which of the two nodes connected by a directed edge is taken as the point of view, and a single directed edge is both an output edge and an input edge. In other words, output edge and input edge are relative concepts: a given directed edge is an output edge when viewed from its reference-source node and an input edge when viewed from its reference-destination node. Because this embodiment deals with directed edges such as output edges and input edges, a directed edge may be referred to simply as an "edge" below.

For example, the extraction device 100 processes nodes corresponding to millions to hundreds of millions of patent documents, but only some of them are shown in the drawings. In the example of FIG. 1, eight nodes are illustrated to simplify the explanation of the process outline. For example, as shown by the graph information GR11 in FIG. 1, the extraction device 100 acquires graph information including a plurality of nodes (vectors) such as nodes N1, N2, and N3. In the example of FIG. 1, each node in the graph information GR11 has edges (output edges) connected to a predetermined number of nodes, taken in ascending order of distance from that node. The predetermined number may be any of various values, such as 2, 5, 10, or 100, depending on the purpose and application. For example, when the predetermined number is 2, output edges are connected from node N1 to two nodes: the node closest to node N1 and the node second closest to node N1.
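The rule that each node has output edges to a predetermined number of its nearest nodes can be sketched as follows. This is a minimal brute-force illustration, not the construction method of the embodiment: the function name `build_knn_graph`, the two-dimensional coordinates, and the use of Euclidean distance are assumptions chosen for the example, and a graph over millions of documents would need an approximate method.

```python
import math

def build_knn_graph(vectors, k):
    """Return {node_id: [ids of the k nearest other nodes, closest first]}.

    Each list entry stands for one output edge of node_id.
    """
    graph = {}
    for node_id, vec in vectors.items():
        # Distance from this node to every other node (brute force).
        others = [(math.dist(vec, other_vec), other_id)
                  for other_id, other_vec in vectors.items()
                  if other_id != node_id]
        others.sort()
        graph[node_id] = [other_id for _, other_id in others[:k]]
    return graph

# Hypothetical 2-D vectors; real vectors would have far more dimensions.
vectors = {
    "N1": (0.0, 0.0),
    "N2": (1.0, 0.0),
    "N3": (0.0, 2.0),
    "N4": (5.0, 5.0),
}

# Predetermined number = 2: each node gets two output edges.
knn_graph = build_knn_graph(vectors, k=2)
print(knn_graph["N1"])
```

With these coordinates, node N1 receives output edges to N2 (closest) and N3 (second closest). Note that the relation is not symmetric: N4 points to N3 and N2, but no node points to N4.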

When a node is written as "node N*" (where * is an arbitrary number), it is the node identified by the node ID "N*". For example, "node N1" is the node identified by the node ID "N1".

In the graph information GR11 in FIG. 1, an edge E7, which is a directed edge toward node N7, is connected to node N10. That is, node N10 is connected to node N7 by edge E7. When an edge is written as "edge E*" (where * is an arbitrary number), it is the edge identified by the edge ID "E*". For example, "edge E11" is the edge identified by the edge ID "E11". Edge E7, which is connected with node N10 as the reference source and node N7 as the reference destination, makes it possible to trace from node N10 to node N7. In this case, the directed edge E7 is an output edge when viewed from node N10 and an input edge when viewed from node N7. A bidirectional arrow in the graph information GR11 of FIG. 1 indicates that directed edges are connected from each of the two nodes to the other. For example, the bidirectional arrow between node N2 and node N451 in the graph information GR11 indicates two edges: a directed edge from node N2 to node N451 and a directed edge from node N451 to node N2.
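The reference-source and reference-destination relationship can be illustrated with a small adjacency structure. This is only a sketch: the class `DirectedGraph` and the edge IDs `E_a` and `E_b` are hypothetical names introduced here, while edge E7 and nodes N10, N7, N2, and N451 follow the example in the text.

```python
from collections import defaultdict

class DirectedGraph:
    """Directed edges indexed by both reference source and reference destination."""

    def __init__(self):
        self.edges = {}                     # edge_id -> (source, destination)
        self.out_edges = defaultdict(list)  # node -> IDs of its output edges
        self.in_edges = defaultdict(list)   # node -> IDs of its input edges

    def add_edge(self, edge_id, source, destination):
        self.edges[edge_id] = (source, destination)
        # The same edge is an output edge of its source
        # and an input edge of its destination.
        self.out_edges[source].append(edge_id)
        self.in_edges[destination].append(edge_id)

    def successors(self, node):
        """Nodes reachable from `node` by following one output edge."""
        return [self.edges[e][1] for e in self.out_edges[node]]

graph = DirectedGraph()
graph.add_edge("E7", "N10", "N7")

# A bidirectional arrow is stored as two directed edges.
# The edge IDs E_a and E_b are hypothetical placeholders.
graph.add_edge("E_a", "N2", "N451")
graph.add_edge("E_b", "N451", "N2")

print(graph.successors("N10"))  # N7 can be reached from N10
print(graph.successors("N7"))   # nothing can be reached from N7 via E7
```

Storing both indexes makes the relative nature of the terms explicit: E7 appears in `out_edges["N10"]` and in `in_edges["N7"]`, and it can only be followed from N10 to N7.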

The graph information GR11 in FIG. 1 may be a Euclidean space. The graph information GR11 shown in FIG. 1 is a conceptual diagram for explaining, for example, the distances between vectors, and the graph information GR11 is a multidimensional space. For example, the graph information GR11 shown in FIG. 1 is drawn in two dimensions so that it can be shown on a plane, but it is assumed to be a multidimensional space of, for example, 100 or 1000 dimensions.

Here, the distance between vector data indicates the similarity of the patent documents: the shorter the distance, the more similar they are. In this embodiment, the distance between nodes in the graph information GR11 is treated as the degree of similarity between the corresponding objects. For example, it is assumed that the similarity of the targets (patent documents) corresponding to the nodes is mapped to the distance between the nodes in the graph information GR11. For example, it is assumed that the similarity between the concepts corresponding to the nodes is mapped to the distance between the nodes. In the example shown in FIG. 1, objects whose nodes are a short distance apart in the graph information GR11 have a high degree of similarity, and objects whose nodes are a long distance apart have a low degree of similarity. For example, in the graph information GR11 in FIG. 1, the node identified by the node ID "N35" and the node identified by the node ID "N693" are close to each other, that is, the distance between them is short. This indicates that the object corresponding to the node identified by the node ID "N35" and the object corresponding to the node identified by the node ID "N693" have a high degree of similarity.

Similarly, in the graph information GR11 in FIG. 1, the node identified by the node ID "N7" and the node identified by the node ID "N2" are far apart, that is, the distance between them is long. This indicates that the object corresponding to the node identified by the node ID "N7" and the object corresponding to the node identified by the node ID "N2" have a low degree of similarity.
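The mapping from distance to similarity can be made concrete with a short example. The sketch below assumes Euclidean distance, which the text permits but does not require, and the three-dimensional coordinates are invented for illustration; only the node IDs come from the text.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_by_similarity(query, vectors):
    """Node IDs ordered from most similar (shortest distance) to least similar."""
    return sorted(vectors, key=lambda node_id: euclidean_distance(query, vectors[node_id]))

# Hypothetical coordinates, chosen so that N35 and N693 are close
# and N7 and N2 are far apart.
vectors = {
    "N35": (1.0, 1.0, 0.0),
    "N693": (1.0, 1.5, 0.0),
    "N7": (9.0, 0.0, 4.0),
    "N2": (0.0, 8.0, 1.0),
}

d_near = euclidean_distance(vectors["N35"], vectors["N693"])  # short distance, high similarity
d_far = euclidean_distance(vectors["N7"], vectors["N2"])      # long distance, low similarity
ranking = rank_by_similarity(vectors["N35"], vectors)
print(d_near, d_far, ranking)
```

Ranking by ascending distance is all that is needed to turn the distance into a similarity ordering; no separate similarity score has to be computed.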

[1-2. Vector generation example]
Each node (vector) here corresponds to an object (patent). In the example of FIG. 1, the object may be a multidimensional (N-dimensional) vector generated from feature amounts extracted from the patent document data (patent document information) of each patent. In the example of FIG. 1, the extraction device 100 uses vectors generated for each type of document included in the patent document information. That is, for each patent, the extraction device 100 uses multidimensional (N-dimensional) vectors generated separately for the abstract, the drawings, the specification, and the claims. In this case, the extraction device 100 uses a plurality of pieces of graph information, one generated for each type. To simplify the explanation, the example of FIG. 1 shows a case where the extraction device 100 uses the graph information GR11, in which the multidimensional (N-dimensional) vectors generated from the feature amounts extracted from the abstract of each patent are graph-structured. The following description is based on the abstract, which is one of the document types of a patent. The extraction device 100 may also use, as an object, a multidimensional (N-dimensional) vector generated from feature amounts extracted from the patent document information as a whole.

For example, the extraction device 100 may generate an N-dimensional vector from each piece of patent document information using a model that extracts the features of patent document information. In the example of FIG. 1, the extraction device 100 generates vectors from the summary data of each patent using the model identified by the model ID "M1" (model M1), as shown in the model information storage unit 124 (see FIG. 7). When a model is written as "model M*" (where * is an arbitrary number), it is the model identified by the model ID "M*". For example, "model M1" is the model identified by the model ID "M1". As shown in the model information storage unit 124, the model M1 has the use "feature extraction (abstract)", that is, it is a model used for extracting features from the data of an abstract, and its concrete model data is "model data MDT1".

For example, by inputting the summary information of an abstract into the model M1, the extraction device 100 computes the value of each element (neuron) in the model M1 and outputs information similar to the input summary information. For example, the extraction device 100 may extract the values of the elements (neurons) of the intermediate layer as feature amounts and generate N-dimensional vector data corresponding to the abstract of each patent.

Here, an example of generating the vector data corresponding to the abstract of each patent is described with reference to FIG. 10. FIG. 10 is a diagram showing an example of feature amount extraction according to the embodiment. FIG. 10 is a conceptual diagram of the model M1. In FIG. 10, the lines indicating the connections between the elements (neurons) are omitted. As shown in FIG. 10, the model M1 includes an input layer IL, an intermediate layer CL, and an output layer OL. For example, the input layer IL of the model M1 is the layer into which the summary information of an abstract is input. The output layer OL is the layer that, in response to the input to the input layer IL, outputs information similar to the input summary information.

For example, the compression layer RP, which is the most compressed layer at the center of the intermediate layer CL, is the layer that expresses the features of the input summary information. In the intermediate layer CL of the model M1, the part from the input layer IL to the compression layer RP corresponds to the part that performs encoding, that is, the process of compressing the features of the input summary information. Likewise, the part from the compression layer RP to the output layer OL corresponds to the part that performs decoding, that is, the process of restoring the compressed summary information.

For example, the extraction device 100 may use information on the neurons included in the compression layer RP, such as neuron NL1 and neuron NL2, for the vector. For example, when the summary information of an abstract is input, the extraction device 100 may extract the computed value VE1 corresponding to neuron NL1 and the computed value VE2 corresponding to neuron NL2 as elements of the vector (the values of individual dimensions). For example, the extraction device 100 may extract the value VE1 corresponding to neuron NL1 as the first-dimension element of the vector of that abstract, and the value VE2 corresponding to neuron NL2 as the second-dimension element. In this way, the extraction device 100 may generate the vector corresponding to each abstract by inputting the summary information of each abstract into the model M1. The extraction device 100 may instead acquire the vector corresponding to each abstract from another external device such as the information providing device 50. As each element of the vector, the extraction device 100 may use the value corresponding to each neuron as is, or that value multiplied by a predetermined coefficient. To simplify the explanation, the example of FIG. 1 shows a case where each element (value) of a vector is an integer, but each element (value) of a vector may be a real number with a fractional part.
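Reading the compression-layer values out as a vector can be sketched with a tiny autoencoder forward pass. Everything specific in this sketch is an assumption made for illustration: the class `TinyAutoencoder`, the layer sizes (4 inputs, 2 compression neurons), the tanh activation, and the hand-written weights, which are untrained and therefore do not actually reproduce the input at the output layer.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer with tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

class TinyAutoencoder:
    """Input layer -> compression layer -> output layer (hypothetical sizes)."""

    def __init__(self, encoder, decoder):
        self.encoder = encoder  # (weights, biases) for input -> compression
        self.decoder = decoder  # (weights, biases) for compression -> output

    def encode(self, x):
        # Values of the compression-layer neurons for input x.
        return dense(x, *self.encoder)

    def forward(self, x):
        # Full pass: encode, then decode back to the input dimensionality.
        return dense(self.encode(x), *self.decoder)

def extract_vector(model, features, coefficient=1.0):
    """Use compression-layer values, optionally scaled, as the vector elements."""
    return [coefficient * value for value in model.encode(features)]

# Untrained, hand-written weights purely for illustration.
model = TinyAutoencoder(
    encoder=([[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, -0.4, 0.2]], [0.0, 0.0]),
    decoder=([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-0.5, 0.5]], [0.0, 0.0, 0.0, 0.0]),
)

summary_features = [1.0, 0.0, 2.0, 1.0]  # hypothetical numeric encoding of an abstract
vector = extract_vector(model, summary_features)
scaled = extract_vector(model, summary_features, coefficient=2.0)
print(vector, scaled)
```

Here the first element of `vector` plays the role of the value VE1 and the second the role of VE2. The `coefficient` argument corresponds to the option of multiplying each neuron value by a predetermined coefficient.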

The extraction device 100 is not limited to the elements (neurons) of the compression layer RP and may use information on other elements (neurons) in the intermediate layer CL for the vector. For example, the extraction device 100 may use information on neuron NL3 in the encoding part or neuron NL4 in the decoding part for the vector. For example, when the summary information of an abstract is input, the extraction device 100 may extract the computed value VE3 corresponding to neuron NL3 and the computed value VE4 corresponding to neuron NL4 as elements of the vector (the values of individual dimensions). The above is only an example; the extraction device 100 is not limited to an autoencoder and may extract features from the summary information using various models. For example, a model may be generated by a method that learns similarity, such as triplet loss. The extraction device 100 may also perform feature extraction without using a model. For example, the extraction device 100 may extract from the summary information the information corresponding to features set by, for example, the administrator of the extraction device 100, and generate a vector. For example, the extraction device 100 may extract from the summary information the information corresponding to features such as the technical field, the means for solving the problem, and the effects, and generate a vector.
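For reference, the triplet loss mentioned above can be written in a few lines. The sketch shows one common form of the loss using Euclidean distances; the margin value and the example vectors are assumptions, and the source does not specify which variant or training procedure would be used.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The loss is zero once the positive example is closer to the anchor
    than the negative example by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical embedded vectors of three documents.
anchor = (0.0, 0.0)
similar = (0.0, 1.0)     # distance 1 from the anchor
dissimilar = (3.0, 4.0)  # distance 5 from the anchor

loss_ok = triplet_loss(anchor, similar, dissimilar)   # already well separated
loss_bad = triplet_loss(anchor, dissimilar, similar)  # roles swapped, so penalized
print(loss_ok, loss_bad)
```

Minimizing this quantity over many triplets pushes vectors of similar documents together and vectors of dissimilar documents apart, which matches the premise that distance represents similarity.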

For example, the extraction device 100 may acquire the model M1 from another external device such as the information providing device 50. Alternatively, the extraction device 100 may generate the model M1 using as input the summary information AD1, AD2, AD451, and so on of the abstracts of the patents stored in the patent information storage unit 121 (see FIG. 4). For example, the summary information AD1, AD2, AD451, and so on may be the entire text of the abstract in the corresponding patent document. The summary information may also be information indicating elements extracted from the text of the abstract in the corresponding patent document, such as a list of the words contained in the abstract or the frequency of occurrence of each word. When the summary information AD1, AD2, AD451, and so on consists of sentences or groups of words, the extraction device 100 may generate the vectors using an algorithm such as Word2Vec or Doc2Vec. For example, the extraction device 100 may generate vectors from the summary information AD1, AD2, AD451, and so on using Doc2Vec.
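The case where the summary information is a word list with occurrence frequencies can be illustrated with a plain bag-of-words count. This is deliberately not Word2Vec or Doc2Vec: it is a simpler stand-in that only shows how word frequencies become a fixed-length vector. The sample text, the vocabulary, and the function names are hypothetical.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def to_vector(frequencies, vocabulary):
    """One vector element per vocabulary word: its occurrence count."""
    return [frequencies.get(word, 0) for word in vocabulary]

# Hypothetical abstract text and vocabulary.
summary_text = ("An extraction device searches graph information. "
                "The graph connects patent documents.")
vocabulary = ["graph", "patent", "search", "extraction"]

frequencies = word_frequencies(summary_text)
vector = to_vector(frequencies, vocabulary)
print(vector)  # → [2, 1, 0, 1]
```

The zero for "search" shows a limitation of exact token matching ("searches" is a different token), which is one reason learned embeddings such as those named in the text are often preferred.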

The extraction device 100 may generate the model M1, which includes: an input layer into which the summary information (summary data) of an abstract is input; an output layer; a first element belonging to any layer from the input layer to the output layer other than the output layer; and a second element whose value is calculated based on the first element and the weight of the first element. For the log information input to the input layer, the model M1 performs an operation based on the first element and the weight of the first element, with each element belonging to each layer other than the output layer taken as the first element, and thereby outputs from the output layer information similar to the information input to the input layer. The extraction device 100 also generates the other models M2 to M4 by similar processing. For example, by inputting the image information (image data) of a patent drawing (representative drawing) into the model M2, the extraction device 100 computes the value of each element (neuron) in the model M2 and outputs information similar to the input image information. For example, the extraction device 100 may extract the values of the elements (neurons) of the intermediate layer as feature amounts and generate N-dimensional vector data relating to the drawing of each patent. The extraction device 100 may also acquire the models used for vector generation from another external device such as the information providing device 50.

[1-3. Processing example]
Next, a case where the extraction device 100 acquires the abstract information of the abstract of one invention (patent) from a user and provides information on patents similar to that patent is described as an example. In the example of FIG. 1, the terminal device 10 is used by a user U1 who wishes to search for similar patents. For example, the user U1 wishes to search for patents similar to a patent application document (patent document) for an invention made by the user or by the business entity (company) to which the user belongs. To simplify the explanation, the example of FIG. 1 shows a search for similar patents that uses only the document type "abstract" among the patent documents. A search that uses a plurality of document types is described later with reference to FIG. 12.

First, by operating the terminal device 10, the user U1 transmits the abstract information AD11 of patent X from the terminal device 10 to the extraction device 100 as information on one invention (invention information). For example, the abstract information AD11 may be the entire text of the abstract of patent X. Alternatively, the abstract information AD11 may be information indicating elements extracted from the text of the abstract of patent X, such as a list of the words contained in the abstract or the frequency of appearance of each word.

Then, the extraction device 100 acquires patent document information on the one invention (step S11). In the example of FIG. 1, the extraction device 100 acquires the abstract information AD11 of patent X from the terminal device 10.

Then, the extraction device 100 generates, from the abstract information corresponding to the one invention, a vector used for searching the graph information. In the example of FIG. 1, the extraction device 100 generates the vector corresponding to patent X by the processing shown in the processing group PS11. The extraction device 100 inputs the abstract information AD11 of patent X, acquired from the terminal device 10, into the model M1 (step S12). The extraction device 100 then generates a vector using the information in the model M1 after the abstract information AD11 has been input (step S13). For example, the extraction device 100 generates vector data using the elements in the model M1 to which the abstract information AD11 has been input.

In the example of FIG. 1, the extraction device 100 generates vector data VD11 (also simply called "vector VD11") using the values of the elements in the model M1 to which the abstract information AD11 has been input. For example, the extraction device 100 generates the vector using the value VE1 corresponding to the neuron NL1 of the model M1 (see FIG. 10) and the value VE2 corresponding to the neuron NL2 (see FIG. 10), as computed when the abstract information AD11 of patent X is input. For example, the extraction device 100 may take the value VE1 computed for the neuron NL1 as the first-dimension element of the vector VD11 and the value VE2 computed for the neuron NL2 as the second-dimension element. In the example of FIG. 1, the extraction device 100 generates a vector VD11 whose first-dimension element is "35" and whose second-dimension element is "63".

Then, the extraction device 100 searches for patents similar to patent X (similar patents) (step S14). For example, the extraction device 100 may search for the similar patents of patent X by appropriately using various conventional techniques, such as the neighborhood search technique disclosed in Non-Patent Document 1.

In the example of FIG. 1, as shown in the information group INF11, the extraction device 100 searches for patents similar to patent X using the graph information GR11 and the index information IND11. For example, the extraction device 100 acquires the graph information GR11 for patent abstracts from the graph information storage unit 123 (see FIG. 6), specifically from the graph data set 123-1 (see FIG. 6). The extraction device 100 also acquires, from the index information storage unit 122 (see FIG. 5), the index information IND11 used to determine the node in the graph information GR11 that serves as the starting point of the search (hereinafter also called the "starting point vector"). Specifically, the extraction device 100 acquires the index information IND11 for patent abstracts from the index data set 122-1 (see FIG. 5). The index information IND11 may be generated by the extraction device 100, or the extraction device 100 may acquire it from another external device such as the information providing device 50.

The extraction device 100 uses the index information IND11 to determine (identify) the starting point vector corresponding to one abstract (the query). In the example of FIG. 1, the extraction device 100 uses the index information IND11 to determine the starting point vector corresponding to the vector VD11 of patent X. That is, the extraction device 100 determines the starting point vector in the graph information GR11 using the vector VD11 and the index information IND11.

The index information IND11 in FIG. 1 has the hierarchical structure shown for the index information storage unit 122 in FIG. 5. For example, the index information IND11 indicates that the first-level nodes (vectors) located immediately below the root RT are the internal nodes VT1, VT2, and so on. The index information IND11 also indicates that the second-level nodes immediately below the internal node VT2 are the internal nodes VT2-1 to VT2-4 (not shown). Further, the index information IND11 indicates that the third-level nodes immediately below the internal node VT2-2 are the nodes N35, N451, and N693, that is, nodes (vectors) in the graph information GR11.

For example, the extraction device 100 determines the starting point vector in the graph information GR11 using tree-structured index information such as the index information IND11 in FIG. 1 (step S15). In the example of FIG. 1, after generating the vector VD11, the extraction device 100 traces the index information IND11 from top to bottom to identify a starting point vector that serves as a neighbor candidate, and can thereby efficiently determine the starting point vector corresponding to the search query (the one abstract).

For example, the extraction device 100 may determine the starting point vector corresponding to the vector VD11 by tracing the index information IND11 from the root RT to a leaf node (a node (vector) in the graph information GR11). In the example of FIG. 1, the extraction device 100 traces the index information IND11 from the root RT to the node N451 and determines the node N451 as the starting point vector. For example, the extraction device 100 may trace the index information IND11 from the root RT to a leaf node by appropriately using various conventional techniques for tree structures, and determine the leaf node it reaches as the starting point vector. For example, the extraction device 100 may determine the starting point vector by tracing the index information IND11 downward based on the similarity to the vector VD11. For example, the extraction device 100 may decide which of the internal nodes VT1, VT2, etc. to follow from the root RT based on the similarity between the vector VD11 and each of the internal nodes VT1, VT2, etc.

For example, the extraction device 100 may decide to move from the root RT to the internal node VT2, which has the highest similarity to the vector VD11 among the internal nodes VT1, VT2, etc. Likewise, the extraction device 100 may decide to move from the internal node VT2 to the internal node VT2-2, which has the highest similarity to the vector VD11 among the internal nodes VT2-1 to VT2-4, and then from the internal node VT2-2 to the node N451, which has the highest similarity to the vector VD11 among the nodes N35, N451, N693, etc. Although the example of FIG. 1 shows a single starting point vector to simplify the explanation, the extraction device 100 may determine a plurality of starting point vectors. For example, the extraction device 100 may determine a plurality of vectors (nodes) such as the nodes N451, N35, N693, and N2 as starting point vectors. Alternatively, without using the index information IND11, one or more nodes may be selected at random from the graph information GR11 at the start of the search and used as starting point vectors, or one or more nodes specified in advance may be used as starting point vectors.
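The descent from the root RT to a leaf node can be sketched as follows. This is only an illustration under stated assumptions: the class `IndexNode`, the function `find_start_node`, and every coordinate are hypothetical, and "highest similarity" is interpreted here as smallest Euclidean distance, which the description does not prescribe. Only the node names and the query vector (35, 63) come from the example of FIG. 1.

```python
import math

class IndexNode:
    """One node of a tree-structured index; leaves stand for graph nodes."""
    def __init__(self, name, vector, children=()):
        self.name = name
        self.vector = vector
        self.children = list(children)

def find_start_node(root, query):
    """Descend from the root, always moving to the child closest to the query."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: math.dist(c.vector, query))
    return node

# Hypothetical coordinates, chosen only so that the walk ends at N451.
leaves = [
    IndexNode("N35", (30, 60)),
    IndexNode("N451", (36, 62)),
    IndexNode("N693", (45, 70)),
]
vt2_2 = IndexNode("VT2-2", (37, 64), leaves)
vt2_1 = IndexNode("VT2-1", (20, 80))  # subtree omitted in this sketch
vt2 = IndexNode("VT2", (35, 65), [vt2_1, vt2_2])
vt1 = IndexNode("VT1", (80, 10))      # subtree omitted in this sketch
root = IndexNode("RT", None, [vt1, vt2])

start = find_start_node(root, (35, 63))  # query plays the role of vector VD11
```

With these assumed coordinates the walk visits RT, VT2, VT2-2, and N451 in that order. Returning several of the closest leaves instead of one would correspond to the variant with a plurality of starting point vectors.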

Then, the extraction device 100 extracts the similar patents of patent X by searching the graph information GR11 (step S16). For example, the extraction device 100 extracts nodes located in the vicinity of the node N451, that is, nodes at a short distance from the node N451, as similar patents. For example, the extraction device 100 extracts nodes reachable from the node N451 by following edges starting at the node N451. For example, the extraction device 100 extracts a predetermined number of nodes (for example, two or ten) as similar patents. The extraction device 100 may extract the similar patents of patent X by the search processing shown in FIG. 11, the details of which are described later. In the example of FIG. 1, the extraction device 100 searches the graph information GR11 starting from the node N451 and extracts the node N451 and the node N35 as similar patents.
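Following edges from the starting node and keeping a predetermined number of nearby nodes can be sketched with a generic best-first search. This is not the search processing of FIG. 11, which is not reproduced in this section; the function `graph_search`, the edge lists, and all coordinates are hypothetical, and Euclidean distance is an assumed similarity measure.

```python
import heapq
import math

def graph_search(graph, vectors, start, query, k):
    """Best-first walk along edges from `start`; returns up to k nearest visited nodes."""
    def dist(node):
        return math.dist(vectors[node], query)

    visited = {start}
    frontier = [(dist(start), start)]  # min-heap ordered by distance to the query
    results = []                       # at most k (distance, node) pairs, kept sorted
    while frontier:
        d, node = heapq.heappop(frontier)
        # Approximate cutoff: stop once the closest unexpanded node is farther
        # than the current k-th result.
        if len(results) >= k and d > results[-1][0]:
            break
        results.append((d, node))
        results.sort()
        del results[k:]
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                heapq.heappush(frontier, (dist(neighbor), neighbor))
    return [node for _, node in results]

# Hypothetical vectors and directed edges for a few nodes of a graph like GR11.
vectors = {"N451": (36, 62), "N35": (30, 60), "N693": (45, 70),
           "N2": (10, 10), "N89": (70, 20)}
graph = {"N451": ["N35", "N693"], "N35": ["N451", "N2"], "N693": ["N89"]}

similar = graph_search(graph, vectors, "N451", (35, 63), k=2)
```

With these assumed values the search returns the nodes N451 and N35, which matches the outcome described for FIG. 1. Because the cutoff is approximate, a closer node hidden behind a distant neighbor can be missed; that trade-off is typical of graph-based neighborhood search.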

Then, the extraction device 100 provides information on the extracted similar patents (step S17). In the example of FIG. 1, the extraction device 100 provides patent #451, which corresponds to the node N451, and patent #35, which corresponds to the node N35, to the terminal device 10 used by the user U1 as patent documents similar to patent X.

As described above, the extraction device 100 extracts the similar patents of patent X, acquired from the user U1, using the graph information GR11 and the index information IND11. For example, the extraction device 100 uses the index information IND11 to determine the starting point vector in the graph information GR11 for extracting the similar patents of patent X. The extraction device 100 then extracts the similar patents of patent X by searching the graph information GR11 from the determined starting point vector. In this way, the extraction device 100 can appropriately extract similar patent documents. In addition, by searching the graph information GR11, the extraction device 100 can extract similar patent documents at high speed.

[1-4. Multiple documents]
[1-4-1. Processing example]
The example of FIG. 1 shows extraction based on a single document (the abstract), but the extraction device 100 may extract (determine) similar patents based on information from a plurality of documents. This point is described with reference to FIG. 12. FIG. 12 is a diagram showing an example of the extraction processing according to the embodiment. In the description of the extraction processing in FIG. 12, processing already described for FIG. 1 is omitted as appropriate.

FIG. 12 shows a case where a user (for example, the user U1) uses the terminal device 10. As an example, the extraction device 100 acquires from the terminal device 10 the information of a plurality of documents, such as the "abstract", "drawings", "specification", and "claims", and extracts (determines) similar patents based on each piece of information.

First, the terminal device 10 transmits to the extraction device 100 the information of a plurality of patent documents relating to patent Y, such as the "abstract", "drawings", "specification", and "claims" (step S51). In the example of FIG. 12, the terminal device 10 transmits to the extraction device 100 the patent documents relating to patent Y, including the abstract information AD51, the drawing information SID51, the specification information PSD51, and the claim information CLD51 of patent Y.

The abstract information AD51 may be the entire text of the abstract of patent Y. Alternatively, the abstract information AD51 may be information indicating elements extracted from the text of the abstract of patent Y, such as a list of the words contained in the abstract or the frequency of appearance of each word.

The drawing information SID51 may be, for example, image data of the representative drawing of patent Y. Alternatively, the drawing information SID51 may be FIG. 1 of patent Y or all of its drawings.

The specification information PSD51 may be the entire text of the problem section of patent Y, or information indicating elements extracted from that text, such as a list of the words it contains or the frequency of appearance of each word. Alternatively, the specification information PSD51 may be the entire text of the specification of patent Y, or information indicating elements extracted from that text, such as a list of the words contained in the specification or the frequency of appearance of each word.

The claim information CLD51 may be the entire text of the main claim of patent Y, or information indicating elements extracted from that text, such as a list of the words contained in the main claim or the frequency of appearance of each word. Alternatively, the claim information CLD51 may be the entire text of the claims of patent Y, or information indicating elements extracted from that text, such as a list of the words contained in the claims or the frequency of appearance of each word.

Having acquired the patent documents relating to patent Y from the terminal device 10, the extraction device 100 performs processing to extract the similar patents of patent Y based on the abstract information AD51 of the target document "abstract". First, the extraction device 100 generates, from the abstract information AD51 of patent Y, a vector used for searching the graph information. For example, the extraction device 100 inputs the abstract information AD51 into a model that generates a vector from abstract information, namely the model M1. The extraction device 100 then generates a vector using the information in the model M1 after the abstract information AD51 has been input (step S52). For example, the extraction device 100 generates vector data using the elements in the model M1 to which the abstract information AD51 has been input.

In the example of FIG. 12, the extraction device 100 generates vector data VD51 using the values of the elements in the model M1 to which the abstract information AD51 has been input. For example, the extraction device 100 generates the vector using the value VE1 corresponding to the neuron NL1 of the model M1 (see FIG. 10) and the value VE2 corresponding to the neuron NL2 (see FIG. 10), as computed when the abstract information AD51 of patent Y is input.

Then, the extraction device 100 extracts the corresponding similar patents using the vector data VD51 and the information group INF11, which includes the graph information GR11 and the index information IND11 corresponding to the target document "abstract" (step S53). For example, the extraction device 100 acquires the graph information GR11 corresponding to the target document "abstract" from the graph information storage unit 123 (see FIG. 6) and the index information IND11 from the index information storage unit 122 (see FIG. 5). By searching the graph information GR11, the extraction device 100 extracts the node N451 and the node N35 as similar patents of patent Y based on the abstract information AD51. Then, based on the information in the graph information storage unit 123 (see FIG. 6) that associates nodes with patents, the extraction device 100 extracts patent #451, corresponding to the node N451, and patent #35, corresponding to the node N35, as similar patents, as shown in the similar patent list PL51 in FIG. 12.

Next, the extraction device 100 performs processing to extract the similar patents of patent Y based on the drawing information SID51 of the target document "drawings". First, the extraction device 100 generates, from the drawing information SID51 of patent Y, a vector used for searching the graph information. For example, the extraction device 100 inputs the drawing information SID51 into a model that generates a vector from drawing information, namely the model M2. The extraction device 100 then generates a vector using the information in the model M2 after the drawing information SID51 has been input (step S54). For example, the extraction device 100 generates vector data using the elements in the model M2 to which the drawing information SID51 has been input.

In the example of FIG. 12, the extraction device 100 generates vector data VD52 using the values of the elements in the model M2 to which the drawing information SID51 has been input. For example, the extraction device 100 generates the vector using the values (not shown) of the neurons of the model M2 targeted for feature extraction, as computed when the drawing information SID51 of patent Y is input. For example, the extraction device 100 may generate the vector using the values of the neurons in the most compressed layer at the center of the intermediate layers of the model M2.

Then, the extraction device 100 extracts the corresponding similar patents using the vector data VD52 and the information group INF12, which includes the graph information GR12 and the index information IND12 corresponding to the target document "drawings" (step S55). For example, the extraction device 100 acquires the graph information GR12 corresponding to the target document "drawings" from the graph information storage unit 123 (see FIG. 6) and the index information IND12 from the index information storage unit 122 (see FIG. 5). By searching the graph information GR12, the extraction device 100 extracts the node N1 and the node N35 as similar patents of patent Y based on the drawing information SID51. Then, based on the information in the graph information storage unit 123 (see FIG. 6) that associates nodes with patents, the extraction device 100 extracts patent #1, corresponding to the node N1, and patent #35, corresponding to the node N35, as similar patents, as shown in the similar patent list PL52 in FIG. 12.

Then, the extraction device 100 performs processing to extract the similar patents of patent Y based on the specification information PSD51 of the target document "specification". First, the extraction device 100 generates, from the specification information PSD51 of patent Y, a vector used for searching the graph information. For example, the extraction device 100 inputs the specification information PSD51 into a model that generates a vector from specification information, namely the model M3. The extraction device 100 then generates a vector using the information in the model M3 after the specification information PSD51 has been input (step S56). For example, the extraction device 100 generates vector data using the elements in the model M3 to which the specification information PSD51 has been input.

In the example of FIG. 12, the extraction device 100 generates vector data VD53 using the values of the elements in the model M3 to which the specification information PSD51 has been input. For example, the extraction device 100 generates the vector using the values (not shown) of the neurons of the model M3 targeted for feature extraction, as computed when the specification information PSD51 of patent Y is input. For example, the extraction device 100 may generate the vector using the values of the neurons in the most compressed layer at the center of the intermediate layers of the model M3.

Then, the extraction device 100 extracts the corresponding similar patents using the vector data VD53 and the information group INF13, which includes the graph information GR13 and the index information IND13 corresponding to the target document "specification" (step S57). For example, the extraction device 100 acquires the graph information GR13 corresponding to the target document "specification" from the graph information storage unit 123 (see FIG. 6) and the index information IND13 from the index information storage unit 122 (see FIG. 5). By searching the graph information GR13, the extraction device 100 extracts the node N35 and the node N89 as similar patents of patent Y based on the specification information PSD51. Then, based on the information in the graph information storage unit 123 (see FIG. 6) that associates nodes with patents, the extraction device 100 extracts patent #35, corresponding to the node N35, and patent #89, corresponding to the node N89, as similar patents, as shown in the similar patent list PL53 in FIG. 12.

The extraction device 100 also performs processing to extract the similar patents of patent Y based on the claim information CLD51 of the target document "claims". First, the extraction device 100 generates, from the claim information CLD51 of patent Y, a vector used for searching the graph information. For example, the extraction device 100 inputs the claim information CLD51 into a model that generates a vector from claim information, namely the model M4. The extraction device 100 then generates a vector using the information in the model M4 after the claim information CLD51 has been input (step S58). For example, the extraction device 100 generates vector data using the elements in the model M4 to which the claim information CLD51 has been input.

In the example of FIG. 12, the extraction device 100 generates vector data VD54 using the values of the elements in the model M4 to which the claim information CLD51 has been input. For example, the extraction device 100 generates the vector using the values (not shown) of the neurons of the model M4 targeted for feature extraction, as computed when the claim information CLD51 of patent Y is input. For example, the extraction device 100 may generate the vector using the values of the neurons in the most compressed layer at the center of the intermediate layers of the model M4.

Then, the extraction device 100 extracts the corresponding similar patents using the vector data VD54 and the information group INF14, which includes the graph information GR14 and the index information IND14 corresponding to the target document "claims" (step S59). For example, the extraction device 100 acquires the graph information GR14 corresponding to the target document "claims" from the graph information storage unit 123 (see FIG. 6) and the index information IND14 from the index information storage unit 122 (see FIG. 5). For example, by searching the graph information GR14, the extraction device 100 extracts the node N571 and the node N35 as similar patents of patent Y based on the claim information CLD51. Then, based on the information in the graph information storage unit 123 (see FIG. 6) that associates nodes with patents, the extraction device 100 extracts patent #571, corresponding to the node N571, and patent #35, corresponding to the node N35, as similar patents, as shown in the similar patent list PL54 in FIG. 12.
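As a non-limiting illustration, the search of graph information followed by the node-to-patent lookup could be sketched as below. The sketch assumes a best-first search over a neighbor graph with Euclidean distance, which is one common approach and is not stated to be the method of the embodiment. The two-dimensional vectors, the edges, the node N2, and the function name `graph_search` are made up for illustration.

```python
import heapq
import math


def graph_search(graph, vectors, query, start_nodes, k):
    """Best-first search: expand the closest unvisited node until no candidate
    can improve on the k nearest nodes found so far."""
    visited = set(start_nodes)
    candidates = [(math.dist(vectors[n], query), n) for n in start_nodes]
    heapq.heapify(candidates)
    best = []  # max-heap of the k nearest nodes, stored as (-distance, node)
    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) == k and dist > -best[0][0]:
            break  # the closest remaining candidate is worse than the k-th best
        heapq.heappush(best, (-dist, node))
        if len(best) > k:
            heapq.heappop(best)  # drop the farthest of the kept nodes
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                heapq.heappush(
                    candidates, (math.dist(vectors[neighbor], query), neighbor))
    return [node for _, node in sorted((-neg, node) for neg, node in best)]


# Hypothetical graph: node -> neighboring nodes, and node -> vector.
graph_gr14 = {"N451": ["N35", "N2"], "N35": ["N571", "N451"],
              "N571": ["N35"], "N2": ["N451"]}
vectors_gr14 = {"N451": (1.0, 1.0), "N2": (0.0, 3.0),
                "N35": (3.0, 4.0), "N571": (4.0, 4.5)}
# Assumed node-to-patent association held alongside the graph information.
node_to_patent = {"N35": "#35", "N571": "#571", "N451": "#451", "N2": "#2"}

query_vd54 = (4.0, 4.0)  # stands in for the vector data VD54
nodes = graph_search(graph_gr14, vectors_gr14, query_vd54, ["N451"], k=2)
pl54 = [node_to_patent[n] for n in nodes]
```

Here the starting node is passed in directly; in the embodiment it would be chosen with the index information.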

Next, the extraction device 100 determines (extracts) the similar patents of patent Y based on the similar patents extracted for each of the four document types (step S60). In the example of FIG. 12, as shown in the similar patent lists PL51 to PL54 in FIG. 12, the extraction device 100 determines (extracts) patent #35, which is included in the similar patents for all four document types, as a similar patent of patent Y.

そして、抽出装置１００は、類似特許情報を端末装置１０へ送信する（ステップＳ６１）。図１２の例では、抽出装置１００は、特許＃３５が特許Ｙの類似特許であることを示す情報を端末装置１０へ送信する。これにより、抽出装置１００は、特許Ｙの複数の種類の類似性を加味した類似特許を抽出することができる。 Then, the extraction device 100 transmits similar patent information to the terminal device 10 (step S61). In the example of FIG. 12, the extraction device 100 transmits information indicating that the patent # 35 is a similar patent of the patent Y to the terminal device 10. Thereby, the extraction device 100 can extract a similar patent in consideration of a plurality of types of similarities of the patent Y.

[1-4-2. Other extraction examples]
The above example shows the case where the extraction device 100 determines (extracts) patent #35, which is included in the similar patents for all four document types, as a similar patent of patent Y, but the extraction device 100 may perform extraction based on other criteria. For example, when similar patents are extracted based on a plurality of documents, the extraction device 100 may determine (extract) similar patents according to the proportion of documents whose similar patents include a given patent. For example, if a patent (patent AA) is extracted as a similar patent in at least a predetermined threshold (for example, 50% or 80%) of the extractions corresponding to the plurality of documents of one invention (patent), the extraction device 100 may treat that patent (patent AA) as a similar patent of the one invention (patent).

For example, when the number of document types is four, the threshold is 50%, and a patent (patent AB) has been extracted as a similar patent for three of the document types of one invention (patent), that is, for 75% of them, the extraction device 100 may treat that patent (patent AB) as a similar patent of the one invention (patent). As described above, by determining the similar patents of one invention (patent) according to the proportion of its documents for which a patent was extracted, the extraction device 100 can extract similar patents more appropriately.
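As a non-limiting illustration, the proportion-based determination could be sketched as follows. The function name `select_similar`, the per-document lists, and the placeholder patent AC are hypothetical. A threshold of 1.0 reproduces the case where a patent must appear in the similar patents of every document type.

```python
from collections import Counter


def select_similar(per_document_results, threshold=1.0):
    """Keep patents extracted for at least `threshold` of the document types."""
    n_docs = len(per_document_results)
    if n_docs == 0:
        return []
    # Count each patent once per document type, even if it is listed twice.
    counts = Counter(patent
                     for patents in per_document_results.values()
                     for patent in set(patents))
    return sorted(p for p, c in counts.items() if c / n_docs >= threshold)


# Made-up extraction results for the four document types of one invention.
results = {
    "abstract": ["AA", "AB"],
    "drawing": ["AA", "AB"],
    "specification": ["AA", "AB", "AC"],
    "claims": ["AA", "AC"],
}

in_all_documents = select_similar(results, threshold=1.0)
at_least_half = select_similar(results, threshold=0.5)
```

With these made-up lists, patent AB appears for three of the four document types (75%), so it passes a 50% threshold but not a 100% threshold.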

[1-5. Index information]
The index information (index data) shown in the example of FIG. 1 is only an example, and the extraction device 100 may search the graph information using various kinds of index information. The extraction device 100 may also generate the index data used at search time. For example, the extraction device 100 generates, as index data, a search index for searching high-dimensional vectors. A high-dimensional vector here may be, for example, a vector with several hundred to several thousand dimensions, or a vector with even more dimensions.

For example, the extraction device 100 may generate, as index data, a search index based on a tree structure such as the one shown in FIG. 1. For example, the extraction device 100 may generate, as index data, a search index based on a k-d tree (k-dimensional tree). For example, the extraction device 100 may generate, as index data, a search index based on a VP tree (vantage-point tree).

また、例えば、抽出装置１００は、その他の木構造を有するインデックスデータとして生成してもよい。例えば、抽出装置１００は、木構造のインデックスデータのリーフがグラフデータに接続する種々のインデックスデータを生成してもよい。例えば、抽出装置１００は、木構造のインデックスデータのリーフがグラフデータ中のノードに対応する種々のインデックスデータを生成してもよい。また、抽出装置１００は、このようなインデックスデータを用いて検索を行う場合、インデックスデータを辿って到達したリーフ（ノード）からグラフデータを探索してもよい。 Further, for example, the extraction device 100 may generate index data having another tree structure. For example, the extraction device 100 may generate various index data in which the leaf of the index data of the tree structure is connected to the graph data. For example, the extraction device 100 may generate various index data in which the leaf of the index data of the tree structure corresponds to the node in the graph data. Further, when the extraction device 100 performs a search using such index data, the graph data may be searched from the leaf (node) reached by tracing the index data.

The index data described above is only an example, and the extraction device 100 may generate index data with any data structure as long as the query can be located in the graph data quickly. For example, as long as the node in the graph information corresponding to the query can be identified quickly, the extraction device 100 may generate index data by using various conventional techniques as appropriate, such as techniques related to binary space partitioning. For example, the extraction device 100 may generate index data with any data structure as long as the index supports searching high-dimensional vectors. For example, the extraction device 100 may use, as index information, information on a graph-type search index as described in Non-Patent Document 1. By using the index data and graph data described above, the extraction device 100 can enable more efficient searches for a given target.

[2. Configuration of the extraction system]
As shown in FIG. 2, the extraction system 1 includes a terminal device 10, an information providing device 50, and an extraction device 100. The terminal device 10, the information providing device 50, and the extraction device 100 are connected via a predetermined network N so that they can communicate by wire or wirelessly. FIG. 2 is a diagram showing a configuration example of the extraction system according to the embodiment. The extraction system 1 shown in FIG. 2 may include a plurality of terminal devices 10, a plurality of information providing devices 50, and a plurality of extraction devices 100.

The terminal device 10 is an information processing device used by a user. The terminal device 10 accepts various operations by the user. In the following, the terminal device 10 may be referred to as the user. That is, in the following, "user" can also be read as "terminal device 10". The terminal device 10 is realized by, for example, a smartphone, a tablet terminal, a notebook PC (Personal Computer), a desktop PC, a mobile phone, or a PDA (Personal Digital Assistant). For example, the terminal device 10 may be an information processing device used by an administrator of a predetermined server system.

The extraction device 100 is an information processing device that, by searching the graph information starting from a starting point vector, extracts from a plurality of patent documents similar patents, which are patent documents similar to one invention. For example, the extraction device 100 acquires abstract information about one invention and acquires the graph information corresponding to the abstract. For example, the extraction device 100 acquires abstract information about one invention and determines the starting point vector based on the patent document information and information about the starting point vector that serves as the starting point for searching the graph information.

The extraction device 100 is an information processing device that stores information for providing various information to users and the like. For example, upon acquiring invention information of one invention (hereinafter also referred to as "query information" or "query") from the terminal device 10, the extraction device 100 searches for patents (vector information or the like) similar to the query and provides the search results to the terminal device 10. In the example of FIG. 1, upon acquiring the abstract information of one invention (patent) from the terminal device 10, the extraction device 100 searches for patents similar to the one patent and provides the search results to the terminal device 10 as similar patents. Further, for example, the data that the extraction device 100 provides to the terminal device 10 may be the title of a patent or the patent document itself, or may be information for referring to the corresponding data, such as a URL (Uniform Resource Locator).

The information providing device 50 is an information processing device that stores information for providing various information to the extraction device 100. For example, the information providing device 50 may store patent information and the like collected from various external devices such as web servers. For example, the information providing device 50 is an information processing device that provides various information such as graph information, index information, and models to the extraction device 100.

[3. Configuration of the extraction device]
Next, the configuration of the extraction device 100 according to the embodiment will be described with reference to FIG. 3. FIG. 3 is a diagram showing a configuration example of the extraction device according to the embodiment. As shown in FIG. 3, the extraction device 100 includes a communication unit 110, a storage unit 120, and a control unit 130. The extraction device 100 may also have an input unit (for example, a keyboard or a mouse) that receives various operations from an administrator or the like of the extraction device 100, and a display unit (for example, a liquid crystal display) for displaying various information.

(Communication unit 110)
The communication unit 110 is realized by, for example, a NIC (Network Interface Card) or the like. The communication unit 110 is connected to a network (for example, the network N in FIG. 2) by wire or wirelessly, and transmits and receives information to and from the terminal device 10 and the information providing device 50.

(Storage unit 120)
The storage unit 120 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. As shown in FIG. 3, the storage unit 120 according to the embodiment includes a patent information storage unit 121, an index information storage unit 122, a graph information storage unit 123, and a model information storage unit 124.

(Patent information storage unit 121)
The patent information storage unit 121 according to the embodiment stores various information related to patent documents (objects). For example, the patent information storage unit 121 stores patent IDs and vector data. FIG. 4 is a diagram showing an example of the patent information storage unit according to the embodiment. The patent information storage unit 121 shown in FIG. 4 includes items such as "patent ID", "patent", and "patent document information". The "patent document information" includes the document information (patent document information) for each type of document contained in a patent document. In the example of FIG. 4, the "patent document information" includes items such as "abstract", "drawing", "specification", and "claims".

The "abstract" includes items such as "abstract information" and "vector information". The "drawing" includes items such as "drawing information (selected drawing)" and "vector information". The "specification" includes items such as "specification information (problem)" and "vector information". The "claims" include items such as "claim information (main claim)" and "vector information".

「特許ＩＤ」は、特許文献（オブジェクト）を識別するための識別情報を示す。また、「特許」は、特許ＩＤにより識別される特許文献の具体的な名称や内容等を示す。なお、図４の例では、特許を「特許＃１」といった抽象的な符号で示すが、各特許は、発明の名称や、出願番号や公開番号等が含まれてもよい。 The "patent ID" indicates identification information for identifying a patent document (object). Further, "patent" indicates a specific name, content, or the like of a patent document identified by a patent ID. In the example of FIG. 4, the patent is indicated by an abstract reference numeral such as “Patent # 1”, but each patent may include the title of the invention, the application number, the publication number, and the like.

「要約書」中の「要約情報」は、特許ＩＤにより識別される特許文献の要約書の情報を示す。なお、図４の例では、要約情報を「ＡＤ１」といった抽象的な符号で示すが、各要約情報は、要約書の文章全体や要約書の文章から抽出された各要素、例えば要約書に含まれる単語の一覧や単語の出現頻度等を示す情報等が含まれてもよい。「ベクトル情報」とは、特許ＩＤにより識別される特許文献（オブジェクト）の要約書に対応するベクトル情報を示す。すなわち、図４の例では、特許文献（オブジェクト）を識別する特許ＩＤに対して、オブジェクトに対応する要約書のベクトルデータ（ベクトル情報）が対応付けられて登録されている。 The "abstract information" in the "abstract" indicates information on the abstract of the patent document identified by the patent ID. In the example of FIG. 4, the summary information is indicated by an abstract code such as "AD1", but each summary information is included in the entire text of the abstract or in each element extracted from the text of the abstract, for example, the abstract. It may include a list of words to be used, information indicating the frequency of appearance of words, and the like. The "vector information" indicates the vector information corresponding to the abstract of the patent document (object) identified by the patent ID. That is, in the example of FIG. 4, the vector data (vector information) of the abstract corresponding to the object is associated and registered with the patent ID that identifies the patent document (object).

「図面」中の「図面情報（選択図）」は、特許ＩＤにより識別される特許文献の選択図の情報を示す。なお、図４の例では、図面情報を「ＳＩＤ１」といった抽象的な符号で示すが、各図面情報は、選択図の画像データ等が含まれてもよい。また、図面情報は、選択図に限らず、全図面が含まれてもよい。「ベクトル情報」とは、特許ＩＤにより識別される特許文献（オブジェクト）の選択図に対応するベクトル情報を示す。すなわち、図４の例では、特許文献（オブジェクト）を識別する特許ＩＤに対して、オブジェクトに対応する選択図のベクトルデータ（ベクトル情報）が対応付けられて登録されている。 "Drawing information (selection drawing)" in the "drawing" indicates information on the selection drawing of the patent document identified by the patent ID. In the example of FIG. 4, the drawing information is indicated by an abstract reference numeral such as “SID1”, but each drawing information may include image data or the like of the selected drawing. Further, the drawing information is not limited to the selected drawing, and may include all drawings. The "vector information" indicates the vector information corresponding to the selection diagram of the patent document (object) identified by the patent ID. That is, in the example of FIG. 4, the vector data (vector information) of the selection diagram corresponding to the object is associated and registered with the patent ID that identifies the patent document (object).

The "specification information (problem)" in the "specification" indicates information on the problem (the problem to be solved) described in the patent document identified by the patent ID. In the example of FIG. 4, the specification information is indicated by an abstract code such as "PSD1", but each piece of specification information may include the entire text of the problem or elements extracted from that text, for example, a list of the words contained in the problem and information indicating their frequencies of occurrence. The specification information is not limited to the problem and may include the entire specification. The "vector information" indicates the vector information corresponding to the problem of the patent document (object) identified by the patent ID. That is, in the example of FIG. 4, the vector data (vector information) of the problem corresponding to the object is registered in association with the patent ID that identifies the patent document (object).

「特許請求の範囲」中の「クレーム情報（メインクレーム）」は、特許ＩＤにより識別される特許文献のメインクレーム（請求項１）の情報を示す。なお、図４の例では、クレーム情報を「ＣＬＤ１」といった抽象的な符号で示すが、各クレーム情報は、メインクレームの文章全体やメインクレームの文章から抽出された各要素、例えばメインクレームに含まれる単語の一覧や単語の出現頻度等を示す情報等が含まれてもよい。また、クレーム情報は、メインクレームに限らず、クレーム全体が含まれてもよい。「ベクトル情報」とは、特許ＩＤにより識別される特許文献（オブジェクト）のメインクレームに対応するベクトル情報を示す。すなわち、図４の例では、特許文献（オブジェクト）を識別する特許ＩＤに対して、オブジェクトに対応するメインクレームのベクトルデータ（ベクトル情報）が対応付けられて登録されている。 The "claim information (main claim)" in the "claims" indicates the information of the main claim (claim 1) of the patent document identified by the patent ID. In the example of FIG. 4, the claim information is indicated by an abstract code such as "CLD1", but each claim information is included in the entire text of the main claim or each element extracted from the text of the main claim, for example, the main claim. It may include a list of words to be used, information indicating the frequency of appearance of words, and the like. Further, the claim information is not limited to the main claim, but may include the entire claim. The "vector information" indicates the vector information corresponding to the main claim of the patent document (object) identified by the patent ID. That is, in the example of FIG. 4, the vector data (vector information) of the main claim corresponding to the object is associated and registered with the patent ID that identifies the patent document (object).

For example, the example of FIG. 4 shows that the patent document (object) identified by the patent ID "IP1" is associated with the multidimensional (N-dimensional) vector information "10, 24, 54, 2..." for its abstract. For example, for patent #1, it shows that the multidimensional (N-dimensional) vector information "10, 24, 54, 2...", which represents the features of the abstract of patent #1, was extracted from the abstract information AD1 by the model M1 or the like.
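As a non-limiting illustration, one record of the patent information storage unit 121 could be represented as below. The class names `DocumentEntry` and `PatentRecord` and the helper `vector_for` are hypothetical. Only the leading components of the abstract vector given in the text are shown, and vectors not given in the text are left empty rather than filled in.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentEntry:
    """Information and vector for one document type of a patent."""
    info: str                       # code such as "AD1", "SID1", "PSD1", "CLD1"
    vector: list = field(default_factory=list)


@dataclass
class PatentRecord:
    """One row of the table: patent ID, patent, and per-document information."""
    patent_id: str
    patent: str
    abstract: DocumentEntry
    drawing: DocumentEntry
    specification: DocumentEntry
    claims: DocumentEntry


def vector_for(record, document_type):
    """Return the vector registered for the given document type."""
    return getattr(record, document_type).vector


patent_1 = PatentRecord(
    patent_id="IP1",
    patent="Patent #1",
    # Leading components only; the remaining dimensions are elided in the text.
    abstract=DocumentEntry("AD1", [10, 24, 54, 2]),
    # Vectors for the other document types are not given in the text.
    drawing=DocumentEntry("SID1"),
    specification=DocumentEntry("PSD1"),
    claims=DocumentEntry("CLD1"),
)

abstract_vector = vector_for(patent_1, "abstract")
```

Keeping one vector per document type mirrors the table layout, in which each document type carries its own "vector information" item.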

なお、特許情報記憶部１２１は、上記に限らず、目的に応じて種々の情報を記憶してもよい。 The patent information storage unit 121 is not limited to the above, and may store various information depending on the purpose.

(Index information storage unit 122)
The index information storage unit 122 according to the embodiment stores various information related to the index. FIG. 5 is a diagram showing an example of the index information storage unit according to the embodiment. Specifically, in the example of FIG. 5, the index information storage unit 122 holds index information having a tree structure. In the example of FIG. 5, the index information storage unit 122 stores information (a table) for each target document, such as the index data sets 122-1, 122-2, 122-3, and 122-4. The index data sets 122-1 to 122-4 and the like shown in FIG. 5 include items such as "target document", "root layer", "first layer", "second layer", and "third layer". The items are not limited to "first layer" through "third layer"; depending on the number of layers in the index, "fourth layer", "fifth layer", "sixth layer", and so on may also be included.

In the example of FIG. 5, the index data set 122-1 corresponds to the document "abstract" identified by the document ID "TID1", and the index data set 122-2 corresponds to the document "drawing (selected drawing)" identified by the document ID "TID2". For example, the document "abstract" indicates that the data set is index information for graph information targeting the abstract among the patent documents. For example, the document "drawing (selected drawing)" indicates that the data set is index information for graph information targeting the selected drawing among the drawings.

The index data set 122-3 corresponds to the document "specification (problem)" identified by the document ID "TID3", and the index data set 122-4 corresponds to the document "claims (main claim)" identified by the document ID "TID4". For example, the document "specification (problem)" indicates that the data set is index information for graph information targeting the problem in the specification. For example, the document "claims (main claim)" indicates that the data set is index information for graph information targeting the main claim (claim 1) among the claims.

Specifically, the index data set 122-1 stores information about the index corresponding to the target document "abstract" (index information IND11). The index data set 122-2 stores information about the index corresponding to the target document "drawing (selected drawing)" (index information IND12). The index data set 122-3 stores information about the index corresponding to the target document "specification (problem)" (index information IND13). The index data set 122-4 stores information about the index corresponding to the target document "claims (main claim)" (index information IND14).

The "target document" indicates the document targeted by the corresponding index data set. The "root layer" indicates the root (top-level) layer, which is the starting point for determining the starting node using the index. The "first layer" stores information that identifies the nodes (tree nodes or vectors in the graph information) belonging to the first layer of the index. The nodes stored in the "first layer" are the nodes of the layer directly connected to the root of the index.

「第２階層」は、インデックスの第２階層に属するノード（節点またはグラフ情報中のベクトル）を識別（特定）する情報が格納される。「第２階層」に格納されるノードは、第１階層のノードに結ばれる直下の階層に対応するノードとなる。「第３階層」は、インデックスの第３階層に属するノード（節点またはグラフ情報中のベクトル）を識別（特定）する情報が格納される。「第３階層」に格納されるノードは、第２階層のノードに結ばれる直下の階層に対応するノードとなる。 The "second layer" stores information for identifying (identifying) a node (node or vector in graph information) belonging to the second layer of the index. The node stored in the "second layer" is a node corresponding to the immediately lower layer connected to the node of the first layer. The "third layer" stores information for identifying (identifying) a node (node or vector in graph information) belonging to the third layer of the index. The node stored in the "third layer" is a node corresponding to the immediately lower layer connected to the node of the second layer.

For example, in the example shown in FIG. 5, the index data set 122-1 in the index information storage unit 122 stores the information corresponding to the index information IND11 in FIG. 1. For example, the index data set 122-1 indicates that the nodes of the first layer are the nodes VT1 to VT3 and the like. The numerical values in parentheses below each node indicate the values of the vector corresponding to that node.

Further, for example, the index data set 122-1 indicates that the second-layer nodes immediately below the node VT2 are the nodes VT2-1 to VT2-4. Further, for example, the index data set 122-1 indicates that the third-layer nodes immediately below the node VT2-2 are the node N35, the node N451, and the node N693, which are nodes (vectors) in the graph information GR11.
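As a non-limiting illustration, a tree-structured index of this shape could be traversed as below to reach the graph nodes from which a graph search may start. The sketch assumes a greedy descent that always follows the child whose vector is nearest to the query, without backtracking, which is a simplification and not stated to be the method of the embodiment. The class name `IndexNode`, the function `find_start_nodes`, and all vector values are hypothetical; only the node names and the parent-child relations come from the description above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    """A node of the tree index; leaves of the index point at graph nodes."""
    vector: tuple = ()
    children: dict = field(default_factory=dict)     # name -> IndexNode
    graph_nodes: list = field(default_factory=list)  # graph node IDs below it


def find_start_nodes(root, query):
    """Descend from the root, following the nearest child at each layer."""
    current = root
    while current.children:
        current = min(current.children.values(),
                      key=lambda child: math.dist(child.vector, query))
    return current.graph_nodes


# Children not described in the text are left without further descendants.
index_ind11 = IndexNode(children={
    "VT1": IndexNode((0.0, 0.0)),
    "VT2": IndexNode((5.0, 5.0), children={
        "VT2-1": IndexNode((4.0, 6.0)),
        "VT2-2": IndexNode((5.0, 4.0), graph_nodes=["N35", "N451", "N693"]),
        "VT2-3": IndexNode((6.0, 6.0)),
        "VT2-4": IndexNode((7.0, 5.0)),
    }),
    "VT3": IndexNode((10.0, 0.0)),
})

start_nodes = find_start_nodes(index_ind11, (5.0, 3.5))
```

With the made-up vectors above, the descent passes through VT2 and VT2-2 and returns the graph nodes listed under VT2-2.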

なお、インデックス情報記憶部１２２は、上記に限らず、目的に応じて種々の情報を記憶してもよい。 The index information storage unit 122 is not limited to the above, and may store various information depending on the purpose.

(Graph information storage unit 123)
The graph information storage unit 123 according to the embodiment stores various information related to the graph information. FIG. 6 is a diagram showing an example of the graph information storage unit according to the embodiment. In the example of FIG. 6, the graph information storage unit 123 stores information (a table) for each target document, such as the graph data sets 123-1, 123-2, 123-3, and 123-4. The graph data sets 123-1 to 123-4 and the like shown in FIG. 6 have items such as "target document", "node ID", "patent ID", and "edge information". The "edge information" includes information such as "edge ID" and "reference destination".

In the example of FIG. 6, the graph data set 123-1 corresponds to the document "abstract" identified by the document ID "TID1", and the graph data set 123-2 corresponds to the document "drawing (selected drawing)" identified by the document ID "TID2". For example, the document "abstract" indicates index information of graph information targeting the abstract among the patent documents. That is, the graph information stored in the graph data set 123-1 (the graph information GR11 in FIG. 1) is information in which the abstracts of the patent documents have been vectorized and structured as a graph. Further, for example, the document "drawing (selected drawing)" indicates index information of graph information targeting the selected drawing among the drawings. That is, the graph information stored in the graph data set 123-2 is information in which the selected drawings among the drawings of the patent documents have been vectorized and structured as a graph.

Further, the graph data set 123-3 corresponds to the document "specification (problem)" identified by the document ID "TID3", and the graph data set 123-4 corresponds to the document "claims (main claim)" identified by the document ID "TID4". For example, the document "specification (problem)" indicates graph information that targets the problem statement within the specification. That is, the graph information stored in the graph data set 123-3 is information in which the problem statements in the specifications of the patent documents have been vectorized and structured as a graph. Likewise, the document "claims (main claim)" indicates graph information that targets the main claim (claim 1) among the claims. That is, the graph information stored in the graph data set 123-4 is information in which claim 1 of the patent documents has been vectorized and structured as a graph.

Specifically, the graph data set 123-1 stores information on the graph corresponding to the target document "abstract" (graph information GR11). The graph data set 123-2 stores information on the graph corresponding to the target document "drawing (representative drawing)" (graph information GR12). The graph data set 123-3 stores information on the graph corresponding to the target document "specification (problem)" (graph information GR13). The graph data set 123-4 stores information on the graph corresponding to the target document "claims (main claim)" (graph information GR14).

The "target document" indicates the document that the corresponding graph data set covers. The "node ID" indicates identification information for identifying each node (object) in the graph data. The "patent ID" indicates identification information for identifying a patent.

The "edge information" indicates information about the edges connected to the corresponding node. In the example of FIG. 6, the edges are directed edges, and the "edge information" indicates information about the output edges that leave the corresponding node. The "edge ID" indicates identification information for identifying an edge that connects nodes. The "reference destination" indicates the node that the edge points to. That is, in the example of FIG. 6, each node ID is registered in association with information identifying the object corresponding to that node and with the reference destinations (nodes) reached by the directed edges (output edges) leaving that node.

For example, in the graph data set 123-1 of the graph information storage unit 123 in FIG. 6, the node (vector) identified by the node ID "N1" corresponds to the patent (object) identified by the patent ID "IP1". Further, in the graph data set 123-1, the edge identified by the edge ID "E11" connects the node identified by the node ID "N1" to the node (vector) identified by the node ID "N25". That is, in the graph data set 123-1, the node (vector) identified by the node ID "N25" can be reached from the node (vector) identified by the node ID "N1".
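The table layout described for FIG. 6 can be pictured as a small adjacency structure. The following is only a minimal sketch under stated assumptions: the class and method names (`GraphDataSet`, `add_node`, `add_edge`, `reachable_from`) are hypothetical, and the patent ID given to node N25 is a made-up placeholder, since the text does not state it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Edge:
    edge_id: str   # "edge ID", e.g. "E11"
    target: str    # "reference destination": node ID the edge points to


@dataclass
class Node:
    node_id: str    # "node ID", e.g. "N1"
    patent_id: str  # "patent ID", e.g. "IP1"
    out_edges: list[Edge] = field(default_factory=list)  # directed output edges


@dataclass
class GraphDataSet:
    target_document: str  # "target document", e.g. "abstract"
    nodes: dict[str, Node] = field(default_factory=dict)

    def add_node(self, node_id: str, patent_id: str) -> None:
        self.nodes[node_id] = Node(node_id, patent_id)

    def add_edge(self, edge_id: str, source: str, target: str) -> None:
        # The edge is stored only on its source node, as in FIG. 6.
        self.nodes[source].out_edges.append(Edge(edge_id, target))

    def reachable_from(self, node_id: str) -> list[str]:
        """Node IDs reachable by following one output edge."""
        return [edge.target for edge in self.nodes[node_id].out_edges]


gr11 = GraphDataSet("abstract")
gr11.add_node("N1", "IP1")
gr11.add_node("N25", "IP25")  # placeholder patent ID, not from the text
gr11.add_edge("E11", "N1", "N25")
```

Storing each edge on its source node mirrors the "output edge" wording above: following an edge is a lookup on the node ID, which is the operation a graph search repeats at every step.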

The graph information storage unit 123 is not limited to the above and may store various information depending on the purpose. For example, the graph information storage unit 123 may store the length of each edge connecting the nodes (vectors). That is, the graph information storage unit 123 may store information indicating the distance between the nodes (vectors).

(Model information storage unit 124)
The model information storage unit 124 according to the embodiment stores information about models. For example, the model information storage unit 124 stores the model information (model data) generated by the generation process. FIG. 7 is a diagram showing an example of the model information storage unit according to the embodiment. The model information storage unit 124 shown in FIG. 7 includes items such as "model ID", "use", and "model data". Although FIG. 7 shows only the models M1 to M4, a large number of pieces of model information may be stored according to each use (type of document). For example, the model information storage unit 124 may store a model M21 that covers the entire patent document.

The "model ID" indicates identification information for identifying a model. For example, the model identified by the model ID "M1" corresponds to the model M1 shown in the example of FIG. 1. The "use" indicates the use of the corresponding model. The "model data" indicates the data of the corresponding model. For example, the "model data" includes information on the nodes in each layer, the function adopted by each node, the connection relationships between the nodes, and the connection coefficients set for the connections between the nodes.

For example, in the example shown in FIG. 7, the model identified by the model ID "M1" (model M1) has the use "feature extraction (abstract)", which indicates that it is used to extract features from the summary information of an input patent abstract. The model data of the model M1 is the model data MDT1.

The model M1 (model data MDT1) is a model for causing a computer to function as follows. The model includes an input layer that receives the summary information (summary data) of a patent abstract, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element. For the summary information given to the input layer, the model treats each element belonging to each layer other than the output layer as a first element, performs an operation based on the first element and its weight, and outputs from the output layer information similar to the information given to the input layer.

Suppose that the models M1 to M4 and the like are realized by a neural network having one or more intermediate layers, such as a DNN (Deep Neural Network). In this case, for example, the first element included in the models M1 to M4 corresponds to any node of the input layer or an intermediate layer. The second element corresponds to a node in the next stage, that is, a node to which a value is transmitted from the node corresponding to the first element. The weight of the first element corresponds to the connection coefficient, which is the weight applied to the value transmitted from the node corresponding to the first element to the node corresponding to the second element.

Now suppose that the models M1 to M4 and the like are realized by a regression model expressed as "y = a1 * x1 + a2 * x2 + ... + ai * xi". In this case, for example, the first element included in the models M1 to M4 and the like corresponds to input data (xi) such as x1 and x2. The weight of the first element corresponds to the coefficient ai associated with xi. The regression model can be regarded as a simple perceptron having an input layer and an output layer. When each model is regarded as a simple perceptron, the first element corresponds to one of the nodes of the input layer, and the second element can be regarded as the node of the output layer.
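The regression form above, read as a simple perceptron, can be sketched as a single weighted sum. This is only an illustration: the function name `weighted_sum` and the numeric values are hypothetical and do not come from the embodiment.

```python
def weighted_sum(inputs, weights):
    """Compute y = a1*x1 + a2*x2 + ... + ai*xi.

    `inputs` plays the role of the first elements (x1, x2, ...),
    `weights` the role of their weights (a1, a2, ...), and the
    returned value the role of the second element (output node).
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(a * x for a, x in zip(weights, inputs))


x = [1.0, 2.0, 3.0]   # hypothetical input data x1..x3
a = [0.5, -1.0, 2.0]  # hypothetical coefficients a1..a3
y = weighted_sum(x, a)
```

In a network with intermediate layers, the same weighted sum is evaluated once per next-stage node, with each node's connection coefficients taking the place of `a`.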

The model information storage unit 124 is not limited to the above and may store various model information depending on the purpose.

(Control unit 130)
Returning to the description of FIG. 3, the control unit 130 is a controller and is realized, for example, when a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or the like executes various programs (corresponding to an example of the extraction program) stored in a storage device inside the extraction device 100, using a RAM as a work area. Further, the control unit 130 is a controller and is realized by, for example, an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). For example, the control unit 130 performs information processing according to the model M1 stored in the model information storage unit 124. The model M1 includes an input layer that receives the summary information (summary data) of a patent abstract, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element. For the log information given to the input layer, the control unit 130 treats each element belonging to each layer other than the output layer as a first element, performs an operation based on the first element and its weight, and thereby outputs from the output layer information similar to the information given to the input layer.

As shown in FIG. 3, the control unit 130 includes an acquisition unit 131, a generation unit 132, a determination unit 133, an extraction unit 134, and a provision unit 135, and realizes or executes the information processing functions and operations described below. The internal configuration of the control unit 130 is not limited to the configuration shown in FIG. 3 and may be any other configuration that performs the information processing described later.

(Acquisition unit 131)
The acquisition unit 131 acquires various information. For example, the acquisition unit 131 acquires various information from the storage unit 120, such as from the patent information storage unit 121, the index information storage unit 122, the graph information storage unit 123, and the model information storage unit 124. The acquisition unit 131 also acquires various information from external information processing devices, such as the terminal device 10 and the information providing device 50.

The acquisition unit 131 acquires graph information in which a plurality of nodes, each corresponding to one of a plurality of patent documents, are connected according to the similarity of the patent documents. The acquisition unit 131 acquires information on one invention together with graph information in which a plurality of vectors, each corresponding to one of the patent documents, are connected according to similarity. The acquisition unit 131 also acquires graph information in which a plurality of vectors indicating the features of the respective patent documents are connected according to similarity. The acquisition unit 131 also acquires graph information in which a plurality of vectors, whose elements are feature quantities extracted from the respective patent documents using a predetermined model, are connected according to similarity. The acquisition unit 131 acquires graph information in which a plurality of nodes are connected according to the similarity of a plurality of vectors whose elements are feature quantities extracted from the respective patent documents using a predetermined model. The acquisition unit 131 acquires graph information in which a plurality of nodes are connected according to the similarity of a plurality of vectors whose elements are the feature quantities of the respective patent documents, the feature quantities being extracted by inputting information on the patent documents into a predetermined model.

Further, the acquisition unit 131 acquires graph information in which a plurality of vectors, whose elements are the feature quantities of the respective patent documents extracted by inputting information on the patent documents into a predetermined model, are connected according to similarity. The acquisition unit 131 also acquires a plurality of pieces of graph information corresponding to the respective types of documents included in the patent documents. The acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to the abstracts are connected according to the similarity of the vectors corresponding to the abstracts included in the respective patent documents. The acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to the drawings are connected according to the similarity of the vectors corresponding to the drawings included in the respective patent documents. The acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to the specifications are connected according to the similarity of the vectors corresponding to the specifications included in the respective patent documents. The acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to the claims are connected according to the similarity of the vectors corresponding to the claims included in the respective patent documents.

Further, the acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which the vectors corresponding to the abstracts included in the respective patent documents are connected according to similarity. The acquisition unit 131 also acquires a plurality of pieces of graph information including graph information in which the vectors corresponding to the drawings included in the respective patent documents are connected according to similarity. The acquisition unit 131 also acquires a plurality of pieces of graph information including graph information in which the vectors corresponding to the specifications included in the respective patent documents are connected according to similarity. The acquisition unit 131 also acquires a plurality of pieces of graph information including graph information in which the vectors corresponding to the claims included in the respective patent documents are connected according to similarity.

Further, the acquisition unit 131 acquires information on one invention from the terminal device 10 used by the user. As the information on the one invention, the acquisition unit 131 acquires information on the patent document of the one invention. As the information on the one invention, the acquisition unit 131 may also acquire information on one type of document within the patent document of the one invention.

For example, the acquisition unit 131 acquires a plurality of nodes (vectors) that are the target of the data search. For example, the acquisition unit 131 acquires a plurality of nodes and a directed edge group including a plurality of directed edges that connect the nodes.

For example, the acquisition unit 131 acquires graph information (graph data) from an external information processing device. For example, the acquisition unit 131 acquires graph information from the graph information storage unit 123. In the example of FIG. 1, the acquisition unit 131 acquires the graph information GR11.

For example, the acquisition unit 131 acquires index information (index data) from an external information processing device. For example, the acquisition unit 131 acquires index information from the index information storage unit 122. For example, the acquisition unit 131 acquires tree-structured index information. In the example of FIG. 1, the acquisition unit 131 acquires the index information IND11.

Further, the acquisition unit 131 acquires patent information on one invention from the terminal device 10 used by the user. For example, the acquisition unit 131 acquires the summary information of the one invention as a search query.

The acquisition unit 131 acquires patent document information (summary information) on one invention. In the example of FIG. 1, the acquisition unit 131 acquires the summary information AD11 on the patent X from the terminal device 10.

In the example of FIG. 1, as shown in the information group INF11, the acquisition unit 131 searches for patents similar to the patent X by using the graph information GR11 and the index information IND11. For example, the acquisition unit 131 acquires the graph information GR11 on patent abstracts from the graph information storage unit 123 (see FIG. 6). Specifically, the acquisition unit 131 acquires the graph information GR11 on patent abstracts from the graph data set 123-1 (see FIG. 6). Further, for example, the acquisition unit 131 acquires, from the index information storage unit 122 (see FIG. 5), the index information IND11 used to determine the node that serves as the starting point of the search in the graph information GR11. Specifically, the acquisition unit 131 acquires the index information IND11 on patent abstracts from the index data set 122-1 (see FIG. 5). The index information IND11 may be generated by the acquisition unit 131, or the acquisition unit 131 may acquire the index information IND11 from another external device such as the information providing device 50.

(Generation unit 132)
The generation unit 132 generates various information. For example, the generation unit 132 uses the learning data (log information) stored in the patent information storage unit 121 to generate a model such as those shown in the model information storage unit 124. For example, based on the learning data acquired by the acquisition unit 131, the generation unit 132 generates a model (autoencoder) that outputs information similar to the input log information. For example, the generation unit 132 generates a model (autoencoder) that outputs information similar to the input log information by using the input log information itself as the correct answer information.

For example, the generation unit 132 generates the model M1 and the like, and stores the generated model M1 and the like in the model information storage unit 124. The generation unit 132 may generate the model M1 using any learning algorithm. For example, the generation unit 132 generates the model M1 using a learning algorithm such as a neural network. As an example, when the generation unit 132 generates the model M1 and the like using a neural network, the model M1 and the like have an input layer containing one or more neurons, an intermediate layer containing one or more neurons, and an output layer containing one or more neurons.

The generation unit 132 generates a model that includes an input layer that receives information on an invention or a patent, an output layer, a first element belonging to any layer from the input layer to the output layer other than the output layer, and a second element whose value is calculated based on the first element and the weight of the first element. For the log information given to the input layer, the model treats each element belonging to each layer other than the output layer as a first element, performs an operation based on the first element and its weight, and outputs from the output layer information similar to the information given to the input layer.

For example, the generation unit 132 generates a model based on the learning data. For example, the generation unit 132 generates a model by performing learning using the summary information AD1, AD2, and the like in the patent information storage unit 121 as learning data (teacher data).

For example, the generation unit 132 performs a learning process so that, when the summary information AD1 is input, the model M1 outputs information similar to the summary information AD1. Likewise, the generation unit 132 performs a learning process so that, when the summary information AD2 is input, the model M1 outputs information similar to the summary information AD2. The generation unit 132 generates a model and stores the generated model in the model information storage unit 124. When the extraction device 100 acquires the model from another external device such as the information providing device 50, the extraction device 100 need not have the generation unit 132.
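The learning process described here, in which the input itself serves as the correct answer, can be sketched as a tiny autoencoder. The following is a minimal sketch under several assumptions that the text does not state: the summary information is taken to be already encoded as small numeric vectors, both layers are linear with no bias or activation, training uses plain stochastic gradient descent on the squared error, and all function names and hyperparameters are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def reconstruct(w_enc, w_dec, x):
    """Return (intermediate-layer values, output-layer values) for input x."""
    hidden = matvec(w_enc, x)
    return hidden, matvec(w_dec, hidden)


def total_loss(w_enc, w_dec, data):
    """Sum of squared differences between each input and its reconstruction."""
    loss = 0.0
    for x in data:
        _, y = reconstruct(w_enc, w_dec, x)
        loss += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return loss


def train(data, hidden_dim=2, lr=0.05, epochs=500, seed=0):
    """Train a linear autoencoder so that its output resembles its input."""
    rng = random.Random(seed)
    dim = len(data[0])
    w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(hidden_dim)]
    w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(dim)]
    for _ in range(epochs):
        for x in data:
            hidden, y = reconstruct(w_enc, w_dec, x)
            err = [yi - xi for yi, xi in zip(y, x)]  # the input is the target
            # Back-propagate the error to the intermediate layer
            # before the decoder weights are changed.
            grad_hidden = [
                sum(w_dec[i][j] * err[i] for i in range(dim))
                for j in range(hidden_dim)
            ]
            for i in range(dim):
                for j in range(hidden_dim):
                    w_dec[i][j] -= lr * err[i] * hidden[j]
            for j in range(hidden_dim):
                for k in range(dim):
                    w_enc[j][k] -= lr * grad_hidden[j] * x[k]
    return w_enc, w_dec


# Hypothetical numeric encodings standing in for summary information AD1, AD2, ...
training_data = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]
w_enc, w_dec = train(training_data)
```

Because the intermediate layer is narrower than the input, the network has to compress each input in order to reproduce it, which is why the intermediate values can later be reused as a feature vector.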

The generation unit 132 is not limited to the model M1 and may generate a model M2 corresponding to patent drawings, a model M3 corresponding to patent specifications, a model M4 corresponding to patent claims, and the like. For example, the generation unit 132 may generate a model using various learning algorithms. For example, the generation unit 132 generates models M21, M22, and the like using learning algorithms such as neural networks, support vector machines (SVM), clustering, and reinforcement learning. As an example, when the generation unit 132 generates the models M21, M22, and the like using a neural network, the models M21, M22, and the like have an input layer containing one or more neurons, an intermediate layer containing one or more neurons, and an output layer containing one or more neurons.

In the example of FIG. 1, the generation unit 132 generates, from the summary information corresponding to one invention, a vector used for searching the graph information. In the example of FIG. 1, the generation unit 132 generates the vector corresponding to the patent X by the processing shown in the processing group PS11. The generation unit 132 inputs the summary information AD11 on the patent X into the model M1. Specifically, the generation unit 132 inputs the summary information AD11 of the patent X, acquired from the terminal device 10, into the model M1. The generation unit 132 then generates a vector using the information in the model M1 after the summary information AD11 has been input. For example, the generation unit 132 generates vector data using the elements in the model M1 to which the summary information AD11 has been input.

In the example of FIG. 1, the generation unit 132 generates the vector data VD11 using the values of the elements in the model M1 to which the summary information AD11 has been input. For example, the generation unit 132 generates the vector using the value VE1 corresponding to the neuron NL1 of the model M1 (see FIG. 10) and the value VE2 corresponding to the neuron NL2 (see FIG. 10), both obtained when the summary information AD11 of the patent X is input. For example, the generation unit 132 may extract the value VE1, calculated for the neuron NL1 when the summary information AD11 of the patent X is input, as the first-dimension element of the vector VD11. Further, for example, the generation unit 132 generates the vector VD11 by using the value VE2, calculated for the neuron NL2 when the summary information of the abstract is input, as the second-dimension element of the vector VD11. In the example of FIG. 1, the generation unit 132 generates the vector VD11 whose first-dimension element is "35" and whose second-dimension element is "63".
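Reading the values of intermediate neurons as vector elements can be sketched as follows. This is an illustration only: the numeric encoding of the summary information AD11, the weights, the ReLU activation, and the function names are all hypothetical, and the weights were picked by hand so that the toy output matches the "35" and "63" of the FIG. 1 example.

```python
def layer(weights, inputs):
    """Weighted sum for every neuron in a layer (one weight row per neuron)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def relu(values):
    """Assumed activation function; the text does not specify one."""
    return [max(0.0, v) for v in values]


def feature_vector(encoder_weights, summary_input):
    """Return the intermediate-layer values as the feature vector."""
    return relu(layer(encoder_weights, summary_input))


# Hand-picked, hypothetical weights into the two intermediate neurons.
ENCODER_WEIGHTS = [
    [5.0, 6.0, 6.0],     # neuron NL1 -> value VE1 -> first dimension
    [10.0, 10.0, 11.0],  # neuron NL2 -> value VE2 -> second dimension
]

ad11 = [1.0, 2.0, 3.0]  # hypothetical numeric encoding of summary information AD11
vd11 = feature_vector(ENCODER_WEIGHTS, ad11)
```

Only the layers up to the intermediate layer are evaluated here. The output layer matters during training but is not needed once the model is used purely for feature extraction.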

(Determination unit 133)
The determination unit 133 determines various information. The determination unit 133 determines the starting point node based on index information used for determining the starting point node, for example tree-structured index information. For example, the determination unit 133 determines the starting point vector based on the information regarding one invention acquired by the acquisition unit 131 and on information regarding the starting point vector that serves as the starting point for searching the graph information. The determination unit 133 likewise determines the starting point vector based on index information used for that purpose, such as tree-structured index information.

In the example of FIG. 1, the determination unit 133 uses the index information IND11 to determine (specify) the starting point vector corresponding to one abstract (the query), namely the vector VD11 of the patent X. That is, the determination unit 133 determines the starting point vector in the graph information GR11 by using the vector VD11 and the index information IND11.

For example, the determination unit 133 determines the starting point vector in the graph information GR11 by using tree-structured index information such as the index information IND11 in FIG. 1. In the example of FIG. 1, after the vector VD11 is generated, the determination unit 133 traces the index information IND11 from top to bottom to identify a starting point vector that is a neighborhood candidate, and can thereby efficiently determine the starting point vector corresponding to the search query (one abstract).

例えば、決定部１３３は、インデックス情報ＩＮＤ１１をルートＲＴからリーフノード（グラフ情報ＧＲ１１中のノード（ベクトル））まで辿ることにより、ベクトルＶＤ１１に対応する起点ベクトルを決定してもよい。図１の例では、例えば、決定部１３３は、インデックス情報ＩＮＤ１１をルートＲＴからノードＮ４５１まで辿ることにより、ノードＮ４５１を起点ベクトルとして決定する。 For example, the determination unit 133 may determine the origin vector corresponding to the vector VD11 by tracing the index information IND11 from the root RT to the leaf node (node (vector) in the graph information GR11). In the example of FIG. 1, for example, the determination unit 133 determines the node N451 as a starting point vector by tracing the index information IND11 from the root RT to the node N451.
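The root-to-leaf traversal described above might be sketched as follows. The description does not state how a branch is chosen at each level, so this example assumes that every index node carries a representative vector (`center`) and that the traversal follows the child whose representative is closest to the query in Euclidean distance. `IndexNode`, `find_start_node`, and all coordinates are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class IndexNode:
    center: Tuple[float, ...]              # representative vector (assumption)
    node_id: Optional[str] = None          # set only on leaves: a node of the graph
    children: List["IndexNode"] = field(default_factory=list)

def find_start_node(root, query):
    """Descend from the root to a leaf, following the closest child each time."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: math.dist(c.center, query))
    return node.node_id

# Hypothetical index: leaves correspond to nodes of the graph information.
leaf_a = IndexNode((30.0, 60.0), node_id="N451")
leaf_b = IndexNode((40.0, 70.0), node_id="N35")
leaf_c = IndexNode((90.0, 10.0), node_id="N7")
root = IndexNode((0.0, 0.0), children=[
    IndexNode((35.0, 65.0), children=[leaf_a, leaf_b]),
    IndexNode((90.0, 10.0), children=[leaf_c]),
])

start = find_start_node(root, (35.0, 63.0))
```

The leaf reached this way is only a candidate near the query, which is why the subsequent graph search is still needed to refine the result.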

(Extraction unit 134)
The extraction unit 134 extracts various information. The extraction unit 134 searches the graph information acquired by the acquisition unit 131, starting from a starting point node that is determined from among the plurality of nodes based on a predetermined criterion, and thereby extracts, from the plurality of patent documents, similar patent documents, that is, patent documents similar to the one invention. The extraction unit 134 extracts similar patents starting from the starting point node determined by the determination unit 133. Likewise, in terms of vectors, the extraction unit 134 searches the graph information starting from the starting point vector determined by the determination unit 133 from among the plurality of vectors of the graph information, and extracts the similar patents. For the search, the extraction unit 134 may use the edge length (distance) information stored in the graph information storage unit 123 for the edges connecting the nodes (vectors), or may calculate the edge lengths (distances) from the vector information of the nodes and use the calculated values.
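The two options for edge lengths, using a stored value or calculating it from the node vectors, can be sketched as below. Euclidean distance is an assumption here, since this passage does not name the distance function, and `vectors`, `stored_lengths`, and `edge_length` are hypothetical names with made-up data.

```python
import math

# Hypothetical node vectors.
vectors = {"N451": (30.0, 60.0), "N35": (40.0, 70.0), "N12": (20.0, 20.0)}

# Hypothetical stored edge lengths; only one edge has a precomputed value.
stored_lengths = {("N451", "N35"): math.dist(vectors["N451"], vectors["N35"])}

def edge_length(a, b):
    """Use the stored length when available; otherwise compute it from the vectors."""
    if (a, b) in stored_lengths:
        return stored_lengths[(a, b)]
    return math.dist(vectors[a], vectors[b])
```

Storing lengths trades memory for speed, while computing them on demand keeps the stored graph smaller.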

図１の例では、抽出部１３４は、グラフ情報ＧＲ１１を検索することにより、特許Ｘの類似特許を抽出する。例えば、抽出部１３４は、ノードＮ４５１の近傍に位置するノードを類似特許として抽出する。例えば、抽出部１３４は、ノードＮ４５１からの距離が近いノードを類似特許として抽出する。例えば、抽出部１３４は、ノードＮ４５１を起点として、エッジを辿ることにより、ノードＮ４５１から到達可能なノードを類似特許として抽出する。例えば、抽出部１３４は、所定数（例えば、２個や１０個等）のノードを類似特許として抽出する。例えば、抽出部１３４は、図１１に示すような検索処理により、特許Ｘの類似特許を抽出してもよいが、詳細は後述する。図１の例では、抽出部１３４は、ノードＮ４５１を起点として、グラフ情報ＧＲ１１を探索することにより、ノードＮ４５１やノードＮ３５を類似特許として抽出する。 In the example of FIG. 1, the extraction unit 134 extracts a similar patent of patent X by searching the graph information GR11. For example, the extraction unit 134 extracts a node located in the vicinity of the node N451 as a similar patent. For example, the extraction unit 134 extracts a node having a short distance from the node N451 as a similar patent. For example, the extraction unit 134 extracts a node reachable from the node N451 as a similar patent by tracing the edge starting from the node N451. For example, the extraction unit 134 extracts a predetermined number (for example, two, ten, etc.) of nodes as similar patents. For example, the extraction unit 134 may extract a similar patent of patent X by a search process as shown in FIG. 11, but the details will be described later. In the example of FIG. 1, the extraction unit 134 extracts the node N451 and the node N35 as similar patents by searching the graph information GR11 starting from the node N451.

(Providing unit 135)
The providing unit 135 provides various information. For example, the providing unit 135 transmits or distributes various information to the terminal device 10 and the information providing device 50. The providing unit 135 provides a predetermined service based on the similar patents extracted by the extraction unit 134. The providing unit 135 also provides an information providing service regarding similar patents, and provides the terminal device 10 with information regarding similar patents.

例えば、提供部１３５は、クエリに対応するオブジェクトＩＤを検索結果として提供する。例えば、提供部１３５は、抽出部１３４により選択されたオブジェクトＩＤを情報提供装置５０へ提供する。提供部１３５は、抽出部１３４により選択されたオブジェクトＩＤをクエリに対応するベクトルを示す情報として情報提供装置５０に提供する。また、提供部１３５は、生成部１３２により生成されたモデルを外部の情報処理装置へ提供してもよい。 For example, the providing unit 135 provides the object ID corresponding to the query as a search result. For example, the providing unit 135 provides the object ID selected by the extracting unit 134 to the information providing device 50. The providing unit 135 provides the information providing device 50 with the object ID selected by the extracting unit 134 as information indicating a vector corresponding to the query. Further, the providing unit 135 may provide the model generated by the generating unit 132 to an external information processing device.

In the example of FIG. 1, the providing unit 135 provides information regarding the similar patents extracted by the extraction unit 134. For example, the providing unit 135 provides the patent #451 corresponding to the node N451 and the patent #35 corresponding to the node N35 to the terminal device 10 used by the user U1 as patent documents similar to the patent X.

[4. Extraction process flow]
Next, the procedure of the extraction process performed by the extraction system 1 according to the embodiment will be described with reference to FIG. 8. FIG. 8 is a flowchart showing an example of the extraction process according to the embodiment.

図８に示すように、抽出装置１００は、一の発明に関する情報を取得する（ステップＳ１０１）。例えば、抽出装置１００は、一の発明に関する要約情報を取得する。図１の例では、抽出装置１００は、端末装置１０から特許Ｘに関する要約情報ＡＤ１１を取得する。 As shown in FIG. 8, the extraction device 100 acquires information regarding one invention (step S101). For example, the extraction device 100 acquires summary information about one invention. In the example of FIG. 1, the extraction device 100 acquires the summary information AD11 regarding the patent X from the terminal device 10.

The extraction device 100 acquires the graph information for the type of patent document corresponding to the information regarding the one invention (step S102). For example, the extraction device 100 acquires the graph information GR11 for the target document "abstract" from the graph data set 123-1 corresponding to the target document "abstract" in the graph information storage unit 123.

そして、抽出装置１００は、モデルを用いて一の発明に関する情報からベクトルを生成する（ステップＳ１０３）。例えば、抽出装置１００は、モデルを用いて一の発明の要約情報からベクトルを生成する。図１の例では、抽出装置１００は、モデル情報記憶部１２４に記憶されたモデルＭ１を用いて、要約情報ＡＤ１１からベクトルＶＤ１１を生成する。 Then, the extraction device 100 generates a vector from the information regarding one invention by using the model (step S103). For example, the extraction device 100 uses a model to generate a vector from the summary information of one invention. In the example of FIG. 1, the extraction device 100 generates the vector VD11 from the summary information AD11 by using the model M1 stored in the model information storage unit 124.

Then, the extraction device 100 determines the starting point vector using the generated vector and the index information (step S104). In the example of FIG. 1, the extraction device 100 determines the node N451 as the starting point vector by using the vector VD11 and the index information IND11 stored in the index information storage unit 122.

そして、抽出装置１００は、グラフ情報を検索することにより、一の発明の類似特許を抽出する（ステップＳ１０５）。図１の例では、抽出装置１００は、ノードＮ４５１を起点として、グラフ情報ＧＲ１１を探索することにより、ノードＮ４５１やノードＮ３５を類似特許として抽出する。 Then, the extraction device 100 extracts a similar patent of one invention by searching the graph information (step S105). In the example of FIG. 1, the extraction device 100 extracts the node N451 and the node N35 as similar patents by searching the graph information GR11 starting from the node N451.

Then, the extraction device 100 provides information regarding the extracted similar patents (step S106). In the example of FIG. 1, the extraction device 100 provides the patent #451 corresponding to the node N451 and the patent #35 corresponding to the node N35 to the terminal device 10 used by the user U1 as patent documents similar to the patent X.
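Steps S101 to S106 can be read as a short pipeline. The sketch below wires the steps together with injected callables so that the order of operations is visible; `run_extraction` and its parameters are hypothetical names, and the stand-in lambdas return fixed values purely so that the example runs.

```python
def run_extraction(summary, graphs, vectorize, pick_start, search, doc_type="abstract"):
    """Steps S102 to S105 in order; `summary` is the input acquired in S101."""
    graph = graphs[doc_type]             # S102: graph for this document type
    query = vectorize(summary)           # S103: model turns the input into a vector
    start = pick_start(query)            # S104: index lookup gives the starting point
    return search(graph, start, query)   # S105: graph search; the caller provides it (S106)

# Toy stand-ins so the sketch runs; none of these values come from the description.
graphs = {"abstract": {"N451": ["N35"], "N35": ["N451"]}}
result = run_extraction(
    "summary text of patent X",
    graphs,
    vectorize=lambda text: (35.0, 63.0),
    pick_start=lambda q: "N451",
    search=lambda g, start, q: [start] + g[start],
)
```

Keeping one graph per document type in `graphs` mirrors the graph data sets held in the graph information storage unit 123.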

[5. Generation process flow]
Next, the procedure of the generation process performed by the extraction system 1 according to the embodiment will be described with reference to FIG. 9. FIG. 9 is a flowchart showing an example of the generation process according to the embodiment.

As shown in FIG. 9, the extraction device 100 acquires learning data (step S201). For example, the extraction device 100 acquires the summary information AD1, AD2, and so on from the patent information storage unit 121 as learning data.

After that, the extraction device 100 generates a model based on the learning data (step S202). For example, the extraction device 100 generates the model M1 using the learning data from the patent information storage unit 121. For example, the extraction device 100 generates the model M1 so that the output layer outputs information (summary information) similar to the information (summary information) input to the input layer. For example, the extraction device 100 generates the model M1 as an autoencoder that takes as input the summary information (summary data) of the patent document type "abstract".
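To make the idea of training a model to reproduce its own input concrete, the following is a minimal sketch of a linear autoencoder trained by plain gradient descent with NumPy. The architecture (4 inputs, 2 hidden neurons, no activation function), the random toy data, the learning rate, and the iteration count are all assumptions for illustration; the description does not specify how the model M1 is structured or trained.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((8, 4))                       # 8 toy inputs with 4 features each
W_enc = 0.1 * rng.standard_normal((4, 2))    # input layer -> 2 hidden neurons
W_dec = 0.1 * rng.standard_normal((2, 4))    # hidden neurons -> output layer

def loss_and_grads(X, W_enc, W_dec):
    """Mean squared reconstruction error and its gradients for both weight matrices."""
    H = X @ W_enc                            # hidden values (later usable as vectors)
    err = H @ W_dec - X                      # output minus input
    d_out = 2.0 * err / err.size
    return (err ** 2).mean(), X.T @ (d_out @ W_dec.T), H.T @ d_out

losses = []
for _ in range(500):
    loss, g_enc, g_dec = loss_and_grads(X, W_enc, W_dec)
    losses.append(loss)
    W_enc -= 0.1 * g_enc                     # gradient descent step
    W_dec -= 0.1 * g_dec
```

After training, the hidden values `X @ W_enc` give a two-dimensional vector for each input, which is the kind of compact representation the search relies on.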

[6. Search example]
Here, an example of a search using the graph information described above is shown. The search using the graph information (graph data) is not limited to the following and may be performed by various procedures. FIG. 11, a flowchart showing an example of a search process using graph data (graph information), is used as an example. The objects referred to below may be read as vectors or nodes. In the following, the extraction device 100 is described as performing the search process, but the search process may be performed by another device. For example, the extraction device 100 uses, as the search query, vector data generated from the summary information (summary data) of one invention. For example, the extraction device 100 searches the graph data starting from the starting point vector determined based on the index information and the vector data generated from the summary information (summary data) of the one invention. In the example of FIG. 1, the extraction device 100 searches the graph information GR11 starting from the node N451, which is the starting point vector determined based on the vector VD11 of the patent X and the index information IND11.

ここでは、近傍オブジェクト集合Ｎ（Ｇ，ｙ）は、ノードｙに付与されているエッジにより関連付けられている近傍のオブジェクトの集合である。「Ｇ」は、所定のグラフデータ（例えば、グラフ情報ＧＲ１１等）であってもよい。例えば、抽出装置１００は、ｋ近傍検索処理を実行する。 Here, the neighborhood object set N (G, y) is a set of neighborhood objects associated with the edge assigned to the node y. “G” may be predetermined graph data (for example, graph information GR11 or the like). For example, the extraction device 100 executes the k-nearest neighbor search process.
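One simple way to picture the neighborhood object set N(G, y) is as a lookup in an adjacency list. In the sketch below, the dictionary layout of `G`, the node IDs, and the function name `neighborhood` are assumptions for illustration.

```python
# Hypothetical graph data G: node ID -> IDs of the nodes linked to it by an edge.
G = {"N451": ["N35", "N12"], "N35": ["N451"], "N12": ["N451"]}

def neighborhood(G, y):
    """Return N(G, y): the set of objects linked to node y by an edge."""
    return set(G.get(y, ()))
```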

例えば、抽出装置１００は、超球の半径ｒを∞（無限大）に設定し（ステップＳ３００）、既存のオブジェクト集合から部分集合Ｓを抽出する（ステップＳ３０１）。例えば、抽出装置１００は、ルートノード（起点ベクトル）として選択されたオブジェクト（ノード）を部分集合Ｓとして抽出してもよい。図１の例では、抽出装置１００は、起点ベクトルであるノードＮ４５１等を部分集合Ｓとして抽出してもよい。また、例えば、超球とは、検索範囲を示す仮想的な球である。なお、ステップＳ３０１において抽出されたオブジェクト集合Ｓに含まれるオブジェクトは、同時に検索結果のオブジェクト集合Ｒの初期集合にも含められる。 For example, the extraction device 100 sets the radius r of the hypersphere to ∞ (infinity) (step S300), and extracts the subset S from the existing object set (step S301). For example, the extraction device 100 may extract an object (node) selected as a root node (starting point vector) as a subset S. In the example of FIG. 1, the extraction device 100 may extract the node N451 or the like, which is a starting point vector, as a subset S. Further, for example, a hypersphere is a virtual sphere indicating a search range. The objects included in the object set S extracted in step S301 are also included in the initial set of the object set R of the search results at the same time.

Next, letting y denote the search query object, the extraction device 100 extracts, from the objects included in the object set S, the object with the shortest distance to the object y, and sets it as the object s (step S302). In the example of FIG. 1, the search query object is the vector VD11, so the extraction device 100 extracts, for example, the node N451, which has the shortest distance to the vector VD11 among the objects in the object set S, as the object s. When the object (node) selected as the root node (starting point vector) is the only element of the object set S, the root node (starting point vector) is consequently extracted as the object s. Next, the extraction device 100 excludes the object s from the object set S (step S303).

Next, the extraction device 100 determines whether or not the distance d(s, y) between the object s and the object y exceeds r(1 + ε) (step S304). Here, ε is an expansion factor, and r(1 + ε) is the radius of the exploration range: only nodes within this range are explored, and making it larger than the search range of radius r can improve accuracy. When the distance d(s, y) exceeds r(1 + ε) (step S304: Yes), the extraction device 100 outputs the object set R as the neighborhood object set of the object y (step S305) and ends the process.
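The relationship between the two radii can be written compactly as follows. This is a restatement under the assumption that ε is non-negative, which the text implies but does not state; the braces denote the sets of objects x within each radius of the query y.

```latex
\[
  \underbrace{\{\, x : d(x, y) \le r \,\}}_{\text{search range}}
  \;\subseteq\;
  \underbrace{\{\, x : d(x, y) \le r(1+\varepsilon) \,\}}_{\text{exploration range}},
  \qquad \varepsilon \ge 0 .
\]
```

For instance, with r = 10 and ε = 0.1, objects up to distance 11 from the query are still explored, while only objects within distance 10 can enter the result set.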

When the distance d(s, y) between the object s and the search query object y does not exceed r(1 + ε) (step S304: No), the extraction device 100 selects, from the objects that are elements of the neighborhood object set N(G, s) of the object s, one object that is not included in the object set C, and stores the selected object u in the object set C (step S306). The object set C is provided for convenience in order to avoid duplicate searches, and is set to the empty set at the start of processing.

次に、抽出装置１００は、オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ（１＋ε）以下であるか否かを判定する（ステップＳ３０７）。オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ（１＋ε）以下である場合（ステップＳ３０７：Ｙｅｓ）、抽出装置１００は、オブジェクトｕをオブジェクト集合Ｓに追加する（ステップＳ３０８）。 Next, the extraction device 100 determines whether or not the distance d (u, y) between the object u and the object y is r (1 + ε) or less (step S307). When the distance d (u, y) between the object u and the object y is r (1 + ε) or less (step S307: Yes), the extraction device 100 adds the object u to the object set S (step S308).

Next, the extraction device 100 determines whether or not the distance d(u, y) between the object u and the object y is r or less (step S309). When the distance d(u, y) exceeds r (step S309: No), the extraction device 100 proceeds to the determination of step S315.

オブジェクトｕとオブジェクトｙとの距離ｄ（ｕ，ｙ）がｒ以下である場合（ステップＳ３０９：Ｙｅｓ）、抽出装置１００は、オブジェクトｕをオブジェクト集合Ｒに追加する（ステップＳ３１０）。そして、抽出装置１００は、オブジェクト集合Ｒに含まれるオブジェクト数がｋｓを超えるか否かを判定する（ステップＳ３１１）。所定数ｋｓは、任意に定められる自然数である。例えば、ｋｓ＝２やｋｓ＝１０等の種々の設定であってもよい。 When the distance d (u, y) between the object u and the object y is r or less (step S309: Yes), the extraction device 100 adds the object u to the object set R (step S310). Then, the extraction device 100 determines whether or not the number of objects included in the object set R exceeds ks (step S311). The predetermined number ks is an arbitrarily determined natural number. For example, various settings such as ks = 2 and ks = 10 may be used.

When the number of objects included in the object set R exceeds ks (step S311: Yes), the extraction device 100 excludes from the object set R the object with the longest distance to (farthest from) the object y among the objects included in the object set R (step S312).

Next, the extraction device 100 determines whether or not the number of objects included in the object set R matches ks (step S313). When the number of objects included in the object set R matches ks (step S313: Yes), the extraction device 100 sets, as the new r, the distance between the object y and the object farthest from the object y among the objects included in the object set R (step S314).

Then, the extraction device 100 determines whether or not all the objects that are elements of the neighborhood object set N(G, s) of the object s have been selected and stored in the object set C (step S315). When not all of these objects have been selected and stored in the object set C (step S315: No), the extraction device 100 returns to step S306 and repeats the process.

When all the objects that are elements of the neighborhood object set N(G, s) of the object s have been selected and stored in the object set C (step S315: Yes), the extraction device 100 determines whether or not the object set S is an empty set (step S316). When the object set S is not an empty set (step S316: No), the extraction device 100 returns to step S302 and repeats the process. When the object set S is an empty set (step S316: Yes), the extraction device 100 outputs the object set R and ends the process (step S317). For example, the extraction device 100 may provide the objects (nodes) included in the object set R to the terminal device 10 or the like that performed the search, as the search result corresponding to the search query (input object y). In the example of FIG. 1, the extraction device 100 may provide the node N451 and the node N35 included in the object set R to the terminal device 10 or the like as the search result corresponding to the search query (the vector VD11 of the patent X). For example, the extraction device 100 provides the patent #451 corresponding to the node N451 and the patent #35 corresponding to the node N35 to the terminal device 10 used by the user U1 as patent documents similar to the patent X.
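Steps S300 to S317 can be collected into one function, sketched below with the step numbers as comments. Several details are assumptions made for this example rather than statements from the description: Euclidean distance is used for d, the set S is kept in a heap so that the closest object is popped first, the starting node is placed in the set C up front so that it is not re-added later (the text starts C empty), and the graph, vectors, and query values are made up.

```python
import heapq
import math

def knn_graph_search(graph, vectors, start, query, ks, eps=0.1):
    """Return up to ks node IDs near `query`, ordered from nearest to farthest."""
    dist = lambda n: math.dist(vectors[n], query)
    r = math.inf                                  # S300: search radius
    S = [(dist(start), start)]                    # S301: candidates to expand
    R = {start: dist(start)}                      # result set with distances
    C = {start}                                   # visited set (seeded: assumption)
    while S:                                      # S316: stop when S is empty
        d_s, s = heapq.heappop(S)                 # S302, S303: closest object in S
        if d_s > r * (1 + eps):                   # S304: outside exploration range
            break                                 # S305: output R
        for u in graph[s]:                        # S306, S315: each unvisited neighbor
            if u in C:
                continue
            C.add(u)
            d_u = dist(u)
            if d_u <= r * (1 + eps):              # S307
                heapq.heappush(S, (d_u, u))       # S308
            if d_u <= r:                          # S309
                R[u] = d_u                        # S310
                if len(R) > ks:                   # S311
                    del R[max(R, key=R.get)]      # S312: drop the farthest object
                if len(R) == ks:                  # S313
                    r = max(R.values())           # S314: shrink the radius
    return sorted(R, key=R.get)                   # S317

# Hypothetical data: node vectors and the edges linking them.
vectors = {"N451": (30.0, 60.0), "N35": (40.0, 70.0), "N12": (20.0, 20.0),
           "N7": (90.0, 10.0), "N2": (60.0, 90.0)}
graph = {"N451": ["N35", "N12"], "N35": ["N451", "N2"],
         "N12": ["N451", "N7"], "N7": ["N12"], "N2": ["N35"]}

similar = knn_graph_search(graph, vectors, "N451", (35.0, 63.0), ks=2)
```

With these toy values the search returns N451 and N35, and it never expands N12 or N2 because both fall outside the exploration range once r has shrunk. A larger `eps` would explore more of the graph at the cost of more distance calculations.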

[7. Effect]
As described above, the extraction device 100 according to the embodiment has the acquisition unit 131 and the extraction unit 134. The acquisition unit 131 acquires graph information in which a plurality of nodes, each corresponding to one of a plurality of patent documents, are linked according to the similarity of the patent documents, and information regarding one invention. The extraction unit 134 searches the graph information acquired by the acquisition unit 131, starting from a starting point node that is determined from among the plurality of nodes based on a predetermined criterion, and thereby extracts, from the plurality of patent documents, similar patent documents, that is, patent documents similar to the one invention.

As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents by searching the graph information from the starting point node and extracting, from the plurality of patent documents, similar patent documents that are similar to the one invention.

Further, the extraction device 100 according to the embodiment has the determination unit 133. The determination unit 133 determines the starting point node based on the index information used for determining the starting point node. The extraction unit 134 extracts similar patents starting from the starting point node determined by the determination unit 133.

As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents by determining the starting point node based on the index information used for determining the starting point node.

Further, in the extraction device 100 according to the embodiment, the determination unit 133 determines the starting point node based on tree-structured index information.

As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents by determining the starting point node based on tree-structured index information.

Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires graph information in which a plurality of vectors indicating the features of each of the plurality of patent documents are linked according to their similarity.

As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents by acquiring graph information in which a plurality of vectors indicating the features of each of the plurality of patent documents are linked according to their similarity.

また、実施形態に係る抽出装置１００において、取得部１３１は、複数の特許文献の各々の特徴を示す複数のベクトルの類似性に応じて、複数のノードが連結されたグラフ情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires graph information in which a plurality of nodes are connected according to the similarity of a plurality of vectors showing the characteristics of each of the plurality of patent documents.

As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents by acquiring graph information in which a plurality of nodes are connected according to the similarity of a plurality of vectors indicating the features of each of the plurality of patent documents.
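One simple way to obtain a graph whose nodes are connected according to vector similarity is to link every node to its k nearest neighbors by brute force, as sketched below. This is an assumption made for illustration and not the construction method of the embodiment, which is not described in this passage; `build_knn_graph`, Euclidean distance, and the toy vectors are hypothetical.

```python
import math

def build_knn_graph(vectors, k):
    """Link each node to the k other nodes whose vectors are closest to its own."""
    graph = {}
    for a, va in vectors.items():
        others = sorted((b for b in vectors if b != a),
                        key=lambda b: math.dist(va, vectors[b]))
        graph[a] = others[:k]
    return graph

# Hypothetical feature vectors, one per patent document.
vectors = {"N451": (30.0, 60.0), "N35": (40.0, 70.0),
           "N7": (90.0, 10.0), "N12": (20.0, 20.0)}
knn = build_knn_graph(vectors, k=1)
```

Brute force costs a distance calculation for every pair of nodes, so a large collection would call for an approximate construction instead.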

Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires graph information in which a plurality of nodes are connected according to the similarity of a plurality of vectors whose elements are feature quantities extracted from each of the plurality of patent documents using a predetermined model.

このように、実施形態に係る抽出装置１００は、所定のモデルを用いて複数の特許文献の各々から抽出された特徴量を要素とする複数のベクトルの類似性に応じて、複数のノードが連結されたグラフ情報を取得することにより、類似の特許文献を適切に抽出することができる。 As described above, in the extraction device 100 according to the embodiment, a plurality of nodes are connected according to the similarity of a plurality of vectors whose elements are the feature quantities extracted from each of the plurality of patent documents using a predetermined model. By acquiring the graph information obtained, similar patent documents can be appropriately extracted.

Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires graph information in which a plurality of nodes are connected according to the similarity of a plurality of vectors whose elements are the feature quantities of each of the plurality of patent documents, the feature quantities being extracted by inputting information regarding the plurality of patent documents into a predetermined model.

As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents by acquiring graph information in which a plurality of nodes are connected according to the similarity of a plurality of vectors whose elements are the feature quantities of each of the plurality of patent documents, extracted by inputting information regarding the plurality of patent documents into a predetermined model.

また、実施形態に係る抽出装置１００において、取得部１３１は、複数の特許文献に含まれる書類の各種類に対応する複数のグラフ情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of graph information corresponding to each type of documents included in the plurality of patent documents.

このように、実施形態に係る抽出装置１００は、複数の特許文献に含まれる書類の各種類に対応する複数のグラフ情報を取得することにより、書類の各種類に応じて類似の特許文献を適切に抽出することができる。 As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents according to each type of document by acquiring a plurality of pieces of graph information corresponding to each type of document included in the plurality of patent documents.

また、実施形態に係る抽出装置１００において、取得部１３１は、複数の特許文献の各々に含まれる各要約書に対応する複数のベクトルの類似性に応じて、各要約書に対応する複数のノードが連結されたグラフ情報を含む複数のグラフ情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each abstract are connected according to the similarity of a plurality of vectors corresponding to each abstract included in each of the plurality of patent documents.

このように、実施形態に係る抽出装置１００は、複数の特許文献の各々に含まれる各要約書に対応する複数のベクトルが類似性に応じて、各要約書に対応する複数のノードが連結されたグラフ情報を取得することにより、各特許文献の要約書に応じて類似の特許文献を適切に抽出することができる。 As described above, in the extraction device 100 according to the embodiment, a plurality of nodes corresponding to each abstract are connected to each other according to the similarity of the plurality of vectors corresponding to each abstract included in each of the plurality of patent documents. By acquiring the graph information, similar patent documents can be appropriately extracted according to the abstract of each patent document.

また、実施形態に係る抽出装置１００において、取得部１３１は、複数の特許文献の各々に含まれる各図面に対応する複数のベクトルの類似性に応じて、各図面に対応する複数のノードが連結されたグラフ情報を含む複数のグラフ情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each drawing are connected according to the similarity of a plurality of vectors corresponding to each drawing included in each of the plurality of patent documents.

このように、実施形態に係る抽出装置１００は、複数の特許文献の各々に含まれる各図面に対応する複数のベクトルの類似性に応じて、各図面に対応する複数のノードが連結されたグラフ情報を取得することにより、各特許文献の図面に応じて類似の特許文献を適切に抽出することができる。 As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents according to the drawings of each patent document by acquiring graph information in which a plurality of nodes corresponding to each drawing are connected according to the similarity of a plurality of vectors corresponding to each drawing included in each of the plurality of patent documents.

また、実施形態に係る抽出装置１００において、取得部１３１は、複数の特許文献の各々に含まれる各明細書に対応する複数のベクトルの類似性に応じて、各明細書に対応する複数のノードが連結されたグラフ情報を含む複数のグラフ情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each specification are connected according to the similarity of a plurality of vectors corresponding to each specification included in each of the plurality of patent documents.

このように、実施形態に係る抽出装置１００は、複数の特許文献の各々に含まれる各明細書に対応する複数のベクトルの類似性に応じて、各明細書に対応する複数のノードが連結されたグラフ情報を取得することにより、各特許文献の明細書に応じて類似の特許文献を適切に抽出することができる。 As described above, in the extraction device 100 according to the embodiment, a plurality of nodes corresponding to each specification are connected according to the similarity of the plurality of vectors corresponding to each specification included in each of the plurality of patent documents. By acquiring the graph information, similar patent documents can be appropriately extracted according to the specification of each patent document.

また、実施形態に係る抽出装置１００において、取得部１３１は、複数の特許文献の各々に含まれる各特許請求の範囲に対応する複数のベクトルの類似性に応じて、各特許請求の範囲に対応する複数のノードが連結されたグラフ情報を含む複数のグラフ情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires a plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each scope of claims are connected according to the similarity of a plurality of vectors corresponding to each scope of claims included in each of the plurality of patent documents.

このように、実施形態に係る抽出装置１００は、複数の特許文献の各々に含まれる各特許請求の範囲に対応する複数のベクトルの類似性に応じて、各特許請求の範囲に対応する複数のノードが連結されたグラフ情報を取得することにより、各特許文献の特許請求の範囲に応じて類似の特許文献を適切に抽出することができる。 As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents according to the claims of each patent document by acquiring graph information in which a plurality of nodes corresponding to each scope of claims are connected according to the similarity of a plurality of vectors corresponding to each scope of claims included in each of the plurality of patent documents.
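
One possible way to use several per-document-type graphs together is sketched below. It assumes one graph and one set of vectors per document type (here "abstract" and "claims"), searches each graph from a given origin node with a bounded best-first traversal, and then merges the candidates by counting how many graphs returned each document. The traversal rule, the Euclidean distance, the merge-by-count rule, and all names and sample values are illustrative assumptions; this publication does not fix how the candidates from each graph are combined.

```python
import heapq
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_graph(graph, vectors, query, start, k):
    """Best-first search from `start`; returns up to k document IDs, nearest first.

    The search stops once the closest unexplored node is farther from the
    query than the current k-th best result.
    """
    start_dist = distance(vectors[start], query)
    visited = {start}
    frontier = [(start_dist, start)]  # min-heap of nodes still to expand
    best = [(start_dist, start)]      # k nearest nodes seen so far, sorted
    while frontier:
        dist, node = heapq.heappop(frontier)
        if len(best) >= k and dist > best[-1][0]:
            break
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            neighbor_dist = distance(vectors[neighbor], query)
            heapq.heappush(frontier, (neighbor_dist, neighbor))
            best.append((neighbor_dist, neighbor))
            best.sort()
            del best[k:]
    return [node for _, node in best]


def merge_candidates(candidate_lists):
    """Rank documents by how many graphs returned them (ties broken by ID)."""
    counts = Counter(doc for docs in candidate_lists for doc in docs)
    return sorted(counts, key=lambda doc: (-counts[doc], doc))


# Hypothetical per-type data: vectors, neighbor graph, and query vector.
data = {
    "abstract": {
        "vectors": {"P1": [0, 0], "P2": [1, 0], "P3": [2, 0], "P4": [5, 5]},
        "graph": {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2", "P4"], "P4": ["P3"]},
        "query": [1.9, 0],
    },
    "claims": {
        "vectors": {"P1": [0, 0], "P2": [0, 1], "P3": [3, 3], "P4": [0, 2]},
        "graph": {"P1": ["P2"], "P2": ["P1", "P4"], "P3": ["P4"], "P4": ["P2", "P3"]},
        "query": [0, 1.8],
    },
}
candidates = {
    kind: search_graph(d["graph"], d["vectors"], d["query"], start="P1", k=2)
    for kind, d in data.items()
}
similar = merge_candidates(candidates.values())
```

In this toy data the abstract graph yields P3 and P2, the claims graph yields P4 and P2, and P2 ranks first because both graphs returned it. The origin node is fixed to P1 here for simplicity; choosing it from an index is a separate step.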

また、実施形態に係る抽出装置１００は、提供部１３５を有する。提供部１３５は、抽出部１３４により抽出された類似特許文献に基づいて、所定のサービスを提供する。 Further, the extraction device 100 according to the embodiment has a providing unit 135. The providing unit 135 provides a predetermined service based on the similar patent documents extracted by the extracting unit 134.

このように、実施形態に係る抽出装置１００は、抽出した類似特許文献に基づいて、所定のサービスを提供することにより、類似の特許文献に関する情報を用いたサービスを適切に提供することができる。 As described above, the extraction device 100 according to the embodiment can appropriately provide a service using information on similar patent documents by providing a predetermined service based on the extracted similar patent documents.

また、実施形態に係る抽出装置１００において、提供部１３５は、類似特許文献に関する情報提供サービスを提供する。 Further, in the extraction device 100 according to the embodiment, the providing unit 135 provides an information providing service regarding similar patent documents.

このように、実施形態に係る抽出装置１００は、類似特許文献に関する情報提供サービスを提供することにより、類似の特許文献に関する情報を用いたサービスを適切に提供することができる。 As described above, the extraction device 100 according to the embodiment can appropriately provide a service using information on similar patent documents by providing an information providing service on similar patent documents.

また、実施形態に係る抽出装置１００において、取得部１３１は、ユーザが利用する端末装置１０から一の発明に関する情報を取得する。提供部１３５は、端末装置１０に類似特許文献に関する情報を提供する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires information regarding one invention from the terminal device 10 used by the user. The providing unit 135 provides the terminal device 10 with information regarding similar patent documents.

このように、実施形態に係る抽出装置１００は、ユーザが利用する端末装置１０から一の発明に関する情報を取得する。提供部１３５は、端末装置１０に類似特許文献に関する情報を提供することにより、抽出した類似の特許文献に関する情報を適切にユーザに提供することができる。 As described above, the extraction device 100 according to the embodiment acquires information regarding one invention from the terminal device 10 used by the user. By providing the terminal device 10 with information on similar patent documents, the providing unit 135 can appropriately provide the user with information on the extracted similar patent documents.

また、実施形態に係る抽出装置１００において、取得部１３１は、一の発明に関する情報として、一の発明の特許文献に関する情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires information on the patent document of one invention as information on one invention.

このように、実施形態に係る抽出装置１００は、一の発明の特許文献に関する情報を取得することにより、一の発明の特許文献に応じて類似の特許文献を適切に抽出することができる。 As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents according to the patent documents of one invention by acquiring the information regarding the patent documents of one invention.

また、実施形態に係る抽出装置１００において、取得部１３１は、一の発明に関する情報として、一の発明の特許文献のうち、一の種類の書類に関する情報を取得する。 Further, in the extraction device 100 according to the embodiment, the acquisition unit 131 acquires information on one type of document in the patent document of one invention as information on one invention.

このように、実施形態に係る抽出装置１００は、一の発明に関する情報として、一の発明の特許文献のうち、一の種類の書類に関する情報を取得することにより、一の発明の一の種類の書類に関する情報に応じて類似の特許文献を適切に抽出することができる。 As described above, the extraction device 100 according to the embodiment can appropriately extract similar patent documents according to the information regarding one type of document of one invention by acquiring, as the information regarding the one invention, information regarding one type of document in the patent document of the one invention.

〔８．ハードウェア構成〕 [8. Hardware configuration]
上述してきた実施形態に係る抽出装置１００は、例えば図１３に示すような構成のコンピュータ１０００によって実現される。図１３は、抽出装置の機能を実現するコンピュータの一例を示すハードウェア構成図である。コンピュータ１０００は、ＣＰＵ１１００、ＲＡＭ１２００、ＲＯＭ（Read Only Memory）１３００、ＨＤＤ（Hard Disk Drive）１４００、通信インターフェイス（Ｉ／Ｆ）１５００、入出力インターフェイス（Ｉ／Ｆ）１６００、及びメディアインターフェイス（Ｉ／Ｆ）１７００を有する。 The extraction device 100 according to the above-described embodiment is realized by, for example, a computer 1000 having a configuration as shown in FIG. 13. FIG. 13 is a hardware configuration diagram showing an example of a computer that realizes the functions of the extraction device. The computer 1000 has a CPU 1100, a RAM 1200, a ROM (Read Only Memory) 1300, an HDD (Hard Disk Drive) 1400, a communication interface (I/F) 1500, an input/output interface (I/F) 1600, and a media interface (I/F) 1700.

ＣＰＵ１１００は、ＲＯＭ１３００またはＨＤＤ１４００に格納されたプログラムに基づいて動作し、各部の制御を行う。ＲＯＭ１３００は、コンピュータ１０００の起動時にＣＰＵ１１００によって実行されるブートプログラムや、コンピュータ１０００のハードウェアに依存するプログラム等を格納する。 The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400, and controls each part. The ROM 1300 stores a boot program executed by the CPU 1100 when the computer 1000 is started, a program depending on the hardware of the computer 1000, and the like.

ＨＤＤ１４００は、ＣＰＵ１１００によって実行されるプログラム、及び、かかるプログラムによって使用されるデータ等を格納する。通信インターフェイス１５００は、ネットワークＮを介して他の機器からデータを受信してＣＰＵ１１００へ送り、ＣＰＵ１１００が生成したデータをネットワークＮを介して他の機器へ送信する。 The HDD 1400 stores a program executed by the CPU 1100, data used by such a program, and the like. The communication interface 1500 receives data from another device via the network N and sends it to the CPU 1100, and transmits the data generated by the CPU 1100 to the other device via the network N.

ＣＰＵ１１００は、入出力インターフェイス１６００を介して、ディスプレイやプリンタ等の出力装置、及び、キーボードやマウス等の入力装置を制御する。ＣＰＵ１１００は、入出力インターフェイス１６００を介して、入力装置からデータを取得する。また、ＣＰＵ１１００は、生成したデータを入出力インターフェイス１６００を介して出力装置へ出力する。 The CPU 1100 controls an output device such as a display or a printer, and an input device such as a keyboard or a mouse via the input / output interface 1600. The CPU 1100 acquires data from the input device via the input / output interface 1600. Further, the CPU 1100 outputs the generated data to the output device via the input / output interface 1600.

メディアインターフェイス１７００は、記録媒体１８００に格納されたプログラムまたはデータを読み取り、ＲＡＭ１２００を介してＣＰＵ１１００に提供する。ＣＰＵ１１００は、かかるプログラムを、メディアインターフェイス１７００を介して記録媒体１８００からＲＡＭ１２００上にロードし、ロードしたプログラムを実行する。記録媒体１８００は、例えばＤＶＤ（Digital Versatile Disc）、ＰＤ（Phase change rewritable Disk）等の光学記録媒体、ＭＯ（Magneto-Optical disk）等の光磁気記録媒体、テープ媒体、磁気記録媒体、または半導体メモリ等である。 The media interface 1700 reads a program or data stored in the recording medium 1800 and provides it to the CPU 1100 via the RAM 1200. The CPU 1100 loads the program from the recording medium 1800 onto the RAM 1200 via the media interface 1700 and executes the loaded program. The recording medium 1800 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, a semiconductor memory, or the like.

例えば、コンピュータ１０００が実施形態に係る抽出装置１００として機能する場合、コンピュータ１０００のＣＰＵ１１００は、ＲＡＭ１２００上にロードされたプログラムまたはデータ（例えば、モデルＭ１（モデルデータＭＤＴ１））を実行することにより、制御部１３０の機能を実現する。コンピュータ１０００のＣＰＵ１１００は、これらのプログラムまたはデータ（例えば、モデルＭ１（モデルデータＭＤＴ１））を記録媒体１８００から読み取って実行するが、他の例として、他の装置からネットワークＮを介してこれらのプログラムを取得してもよい。 For example, when the computer 1000 functions as the extraction device 100 according to the embodiment, the CPU 1100 of the computer 1000 realizes the functions of the control unit 130 by executing a program or data (for example, model M1 (model data MDT1)) loaded on the RAM 1200. The CPU 1100 of the computer 1000 reads these programs or data (for example, model M1 (model data MDT1)) from the recording medium 1800 and executes them; as another example, it may acquire these programs from another device via the network N.

以上、本願の実施形態のいくつかを図面に基づいて詳細に説明したが、これらは例示であり、発明の開示の行に記載の態様を始めとして、当業者の知識に基づいて種々の変形、改良を施した他の形態で本発明を実施することが可能である。 Some embodiments of the present application have been described above in detail with reference to the drawings, but these are examples. The present invention can be carried out in other forms with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention section.

〔９．その他〕 [9. Others]
また、上記実施形態において説明した各処理のうち、自動的に行われるものとして説明した処理の全部または一部を手動的に行うこともでき、あるいは、手動的に行われるものとして説明した処理の全部または一部を公知の方法で自動的に行うこともできる。この他、上記文書中や図面中で示した処理手順、具体的名称、各種のデータやパラメータを含む情報については、特記する場合を除いて任意に変更することができる。例えば、各図に示した各種情報は、図示した情報に限られない。 Further, among the processes described in the above embodiment, all or part of the processes described as being performed automatically can be performed manually, and all or part of the processes described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters shown in the above document and drawings can be changed arbitrarily unless otherwise specified. For example, the various information shown in each figure is not limited to the illustrated information.

また、図示した各装置の各構成要素は機能概念的なものであり、必ずしも物理的に図示の如く構成されていることを要しない。すなわち、各装置の分散・統合の具体的形態は図示のものに限られず、その全部または一部を、各種の負荷や使用状況などに応じて、任意の単位で機能的または物理的に分散・統合して構成することができる。 Further, each component of each illustrated device is a functional concept and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

また、上述してきた各実施形態に記載された各処理は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 In addition, the processes described in the above-described embodiments can be appropriately combined as long as the processing contents do not contradict each other.

また、上述してきた「部（section、module、unit）」は、「手段」や「回路」などに読み替えることができる。例えば、取得部は、取得手段や取得回路に読み替えることができる。 Further, the above-mentioned "section, module, unit" can be read as "means" or "circuit". For example, the acquisition unit can be read as an acquisition means or an acquisition circuit.

１抽出システム 1 Extraction system
１００抽出装置 100 Extraction device
１２１特許情報記憶部 121 Patent information storage unit
１２２インデックス情報記憶部 122 Index information storage unit
１２３グラフ情報記憶部 123 Graph information storage unit
１２４モデル情報記憶部 124 Model information storage unit
１３０制御部 130 Control unit
１３１取得部 131 Acquisition unit
１３２生成部 132 Generation unit
１３３決定部 133 Determination unit
１３４抽出部 134 Extraction unit
１３５提供部 135 Providing unit
１０端末装置 10 Terminal device
５０情報提供装置 50 Information providing device
Ｎネットワーク N Network

Claims

複数の特許文献の各々に対応する複数のノードが、前記複数の特許文献の第１の要素の類似性に応じて連結されたグラフ情報と、前記複数の特許文献の各々に対応する他の複数のノードが、前記第１の要素とは異なる前記複数の特許文献の第２の要素の類似性に応じて連結された他のグラフ情報と、一の発明に関する情報を取得する取得部と、
前記取得部により取得された前記グラフ情報の前記複数のノードのうち、所定の基準に基づいて決定された前記グラフ情報の検索の起点となる起点ノードを起点として前記グラフ情報を検索することにより抽出した第１の候補特許文献と、前記他のグラフ情報を検索することにより抽出した第２の候補特許文献とを用いて、前記複数の特許文献のうち、前記一の発明に類似する特許文献である類似特許文献を抽出する抽出部と、
を備えたことを特徴とする抽出装置。
An extraction device comprising:
an acquisition unit that acquires graph information in which a plurality of nodes corresponding to each of a plurality of patent documents are connected according to the similarity of a first element of the plurality of patent documents, other graph information in which another plurality of nodes corresponding to each of the plurality of patent documents are connected according to the similarity of a second element of the plurality of patent documents different from the first element, and information regarding one invention; and
an extraction unit that extracts a similar patent document, which is a patent document similar to the one invention among the plurality of patent documents, by using a first candidate patent document extracted by searching the graph information starting from an origin node, the origin node being determined based on a predetermined criterion from among the plurality of nodes of the graph information acquired by the acquisition unit and serving as the starting point of the search of the graph information, and a second candidate patent document extracted by searching the other graph information.

前記起点ノードの決定に用いるインデックス情報に基づいて、前記起点ノードを決定する決定部、
をさらに備え、
前記抽出部は、
前記決定部により決定された前記起点ノードを起点として、前記特許文献を抽出する
ことを特徴とする請求項１に記載の抽出装置。
The extraction device according to claim 1, further comprising a determination unit that determines the origin node based on index information used to determine the origin node,
wherein the extraction unit extracts the patent document starting from the origin node determined by the determination unit.

前記決定部は、
木構造型の前記インデックス情報に基づいて、前記起点ノードを決定する
ことを特徴とする請求項２に記載の抽出装置。
The extraction device according to claim 2, wherein the determination unit determines the origin node based on the index information having a tree structure.

前記取得部は、
前記複数の特許文献の各々に対応する複数のベクトルの類似性に応じて、前記複数のノードが連結されたグラフ情報を取得する
ことを特徴とする請求項１〜３のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 1 to 3, wherein the acquisition unit acquires the graph information in which the plurality of nodes are connected according to the similarity of a plurality of vectors corresponding to each of the plurality of patent documents.

前記取得部は、
前記複数の特許文献の各々の特徴を示す前記複数のベクトルの類似性に応じて、前記複数のノードが連結された前記グラフ情報を取得する
ことを特徴とする請求項４に記載の抽出装置。
The extraction device according to claim 4, wherein the acquisition unit acquires the graph information in which the plurality of nodes are connected according to the similarity of the plurality of vectors indicating the features of each of the plurality of patent documents.

前記取得部は、
所定のモデルを用いて前記複数の特許文献の各々から抽出された特徴量を要素とする前記複数のベクトルの類似性に応じて、前記複数のノードが連結された前記グラフ情報を取得する
ことを特徴とする請求項４または請求項５に記載の抽出装置。
The extraction device according to claim 4 or 5, wherein the acquisition unit acquires the graph information in which the plurality of nodes are connected according to the similarity of the plurality of vectors whose elements are feature quantities extracted from each of the plurality of patent documents using a predetermined model.

前記取得部は、
前記複数の特許文献に関する情報を所定のモデルに入力することにより、抽出される前記複数の特許文献の各々の特徴量を要素とする前記複数のベクトルの類似性に応じて、前記複数のノードが連結された前記グラフ情報を取得する
ことを特徴とする請求項４〜６のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 4 to 6, wherein the acquisition unit acquires the graph information in which the plurality of nodes are connected according to the similarity of the plurality of vectors whose elements are the feature quantities of each of the plurality of patent documents, the feature quantities being extracted by inputting information regarding the plurality of patent documents into a predetermined model.

前記取得部は、
前記複数の特許文献に含まれる書類の各種類に対応する複数のグラフ情報を取得する
ことを特徴とする請求項１〜７のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 1 to 7, wherein the acquisition unit acquires a plurality of pieces of graph information corresponding to each type of document included in the plurality of patent documents.

前記取得部は、
前記複数の特許文献の各々に含まれる各要約書に対応する複数のベクトルの類似性に応じて、前記各要約書に対応する複数のノードが連結されたグラフ情報を含む前記複数のグラフ情報を取得する
ことを特徴とする請求項８に記載の抽出装置。
The extraction device according to claim 8, wherein the acquisition unit acquires the plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each abstract are connected according to the similarity of a plurality of vectors corresponding to each abstract included in each of the plurality of patent documents.

前記取得部は、
前記複数の特許文献の各々に含まれる各図面に対応する複数のベクトルの類似性に応じて、前記各図面に対応する複数のノードが連結されたグラフ情報を含む前記複数のグラフ情報を取得する
ことを特徴とする請求項８または請求項９に記載の抽出装置。
The extraction device according to claim 8 or 9, wherein the acquisition unit acquires the plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each drawing are connected according to the similarity of a plurality of vectors corresponding to each drawing included in each of the plurality of patent documents.

前記取得部は、
前記複数の特許文献の各々に含まれる各明細書に対応する複数のベクトルの類似性に応じて、前記各明細書に対応する複数のノードが連結されたグラフ情報を含む前記複数のグラフ情報を取得する
ことを特徴とする請求項８〜１０のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 8 to 10, wherein the acquisition unit acquires the plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each specification are connected according to the similarity of a plurality of vectors corresponding to each specification included in each of the plurality of patent documents.

前記取得部は、
前記複数の特許文献の各々に含まれる各特許請求の範囲に対応する複数のベクトルの類似性に応じて、前記各特許請求の範囲に対応する複数のノードが連結されたグラフ情報を含む前記複数のグラフ情報を取得する
ことを特徴とする請求項８〜１１のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 8 to 11, wherein the acquisition unit acquires the plurality of pieces of graph information including graph information in which a plurality of nodes corresponding to each scope of claims are connected according to the similarity of a plurality of vectors corresponding to each scope of claims included in each of the plurality of patent documents.

前記抽出部により抽出された前記類似特許文献に基づいて、所定のサービスを提供する提供部、
をさらに備えたことを特徴とする請求項１〜１２のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 1 to 12, further comprising a providing unit that provides a predetermined service based on the similar patent document extracted by the extraction unit.

前記提供部は、
前記類似特許文献に関する情報提供サービスを提供する
ことを特徴とする請求項１３に記載の抽出装置。
The extraction device according to claim 13, wherein the providing unit provides an information providing service regarding the similar patent document.

前記取得部は、
ユーザが利用する端末装置から前記一の発明に関する情報を取得し、
前記提供部は、
前記端末装置に前記類似特許文献に関する情報を提供する
ことを特徴とする請求項１３または請求項１４に記載の抽出装置。
The extraction device according to claim 13 or 14, wherein the acquisition unit acquires the information regarding the one invention from a terminal device used by a user, and
the providing unit provides the terminal device with information regarding the similar patent document.

前記取得部は、
前記一の発明に関する情報として、前記一の発明の特許文献に関する情報を取得する
ことを特徴とする請求項１〜１５のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 1 to 15, wherein the acquisition unit acquires, as the information regarding the one invention, information regarding a patent document of the one invention.

前記取得部は、
前記一の発明に関する情報として、前記一の発明の特許文献のうち、一の種類の書類に関する情報を取得する
ことを特徴とする請求項１〜１６のいずれか１項に記載の抽出装置。
The extraction device according to any one of claims 1 to 16, wherein the acquisition unit acquires, as the information regarding the one invention, information regarding one type of document in the patent document of the one invention.

コンピュータが実行する抽出方法であって、
複数の特許文献の各々に対応する複数のノードが、前記複数の特許文献の第１の要素の類似性に応じて連結されたグラフ情報と、前記複数の特許文献の各々に対応する他の複数のノードが、前記第１の要素とは異なる前記複数の特許文献の第２の要素の類似性に応じて連結された他のグラフ情報と、一の発明に関する情報を取得する取得工程と、
前記取得工程により取得された前記グラフ情報の前記複数のノードのうち、所定の基準に基づいて決定された前記グラフ情報の検索の起点となる起点ノードを起点として前記グラフ情報を検索することにより抽出した第１の候補特許文献と、前記他のグラフ情報を検索することにより抽出した第２の候補特許文献とを用いて、前記複数の特許文献のうち、前記一の発明に類似する特許文献である類似特許文献を抽出する抽出工程と、
を含んだことを特徴とする抽出方法。
An extraction method executed by a computer, the method comprising:
an acquisition step of acquiring graph information in which a plurality of nodes corresponding to each of a plurality of patent documents are connected according to the similarity of a first element of the plurality of patent documents, other graph information in which another plurality of nodes corresponding to each of the plurality of patent documents are connected according to the similarity of a second element of the plurality of patent documents different from the first element, and information regarding one invention; and
an extraction step of extracting a similar patent document, which is a patent document similar to the one invention among the plurality of patent documents, by using a first candidate patent document extracted by searching the graph information starting from an origin node, the origin node being determined based on a predetermined criterion from among the plurality of nodes of the graph information acquired in the acquisition step and serving as the starting point of the search of the graph information, and a second candidate patent document extracted by searching the other graph information.

複数の特許文献の各々に対応する複数のノードが、前記複数の特許文献の第１の要素の類似性に応じて連結されたグラフ情報と、前記複数の特許文献の各々に対応する他の複数のノードが、前記第１の要素とは異なる前記複数の特許文献の第２の要素の類似性に応じて連結された他のグラフ情報と、一の発明に関する情報を取得する取得手順と、
前記取得手順により取得された前記グラフ情報の前記複数のノードのうち、所定の基準に基づいて決定された前記グラフ情報の検索の起点となる起点ノードを起点として前記グラフ情報を検索することにより抽出した第１の候補特許文献と、前記他のグラフ情報を検索することにより抽出した第２の候補特許文献とを用いて、前記複数の特許文献のうち、前記一の発明に類似する特許文献である類似特許文献を抽出する抽出手順と、
をコンピュータに実行させることを特徴とする抽出プログラム。
An extraction program that causes a computer to execute:
an acquisition procedure of acquiring graph information in which a plurality of nodes corresponding to each of a plurality of patent documents are connected according to the similarity of a first element of the plurality of patent documents, other graph information in which another plurality of nodes corresponding to each of the plurality of patent documents are connected according to the similarity of a second element of the plurality of patent documents different from the first element, and information regarding one invention; and
an extraction procedure of extracting a similar patent document, which is a patent document similar to the one invention among the plurality of patent documents, by using a first candidate patent document extracted by searching the graph information starting from an origin node, the origin node being determined based on a predetermined criterion from among the plurality of nodes of the graph information acquired by the acquisition procedure and serving as the starting point of the search of the graph information, and a second candidate patent document extracted by searching the other graph information.