KR101686068B1 - Method and system for answer extraction using conceptual graph matching

Info

Publication number: KR101686068B1
Application number: KR1020150160472A
Authority: KR
Inventors: 맹성현; 류지희; 김경민
Original assignee: 한국과학기술원
Priority date: 2015-02-24
Filing date: 2015-11-16
Publication date: 2016-12-14
Also published as: KR20160103911A

Abstract

A question answering system performs natural language processing on a plurality of texts received from outside to generate result data, determines whether the result data is a definitional sentence, and extracts definitional-sentence data and non-definitional-sentence data. An expanded graph is generated from a first conceptual graph and a second conceptual graph, each of which is generated using the result data together with either the definitional-sentence data or the non-definitional-sentence data. A question graph is generated from an externally input question, and an answer node is then found based on the question graph and the expanded graph to derive the answer to the question.

Description

Method and system for answer extraction using conceptual graph matching

The present invention relates to a question answering method and system using conceptual graph matching.

As the amount of data it holds has grown exponentially, today's Web has reached a level at which it can be regarded as containing virtually all the information users want. To make good use of this property of the Web, the data on the Web must be extracted appropriately according to the user's needs.

Accordingly, information retrieval research on finding documents related to a user's query has been actively pursued, and many Web-based information retrieval systems have been built. However, when a user wants a direct answer, conventional Web-based information retrieval systems go no further than providing related documents.

A question answering system goes one step beyond an information retrieval system that merely provides related documents: it presents an answer to the user's query. Beginning with early, pre-Web question answering systems such as BASEBALL and LUNAR, which could answer questions only about a specific domain, question answering systems from the 2000s onward, such as LCC, QuASM, and Mulder, can answer a wide range of queries through knowledge bases built from Web data.

A typical question answering system operates as follows. It receives a query in natural language, determines the type of the query, and extracts keywords. It then collects documents containing the keywords and, from those documents, extracts the passage in which the keywords appear in the relationship that best fits the query type, and provides that passage as the answer.

Such existing question answering systems have the drawback that the knowledge base they rely on to find answers cannot fully capture the interrelationships among the pieces of information that appear in the original documents used to build it.

Accordingly, the present invention provides a method and system for performing question answering by converting a query sentence and answer-candidate documents into conceptual graphs and matching them.

A question answering system, which is one feature of the present invention for achieving the above technical object, includes:

a natural language processing unit that performs natural language processing on a plurality of texts received from outside and generates result data as the natural language processing result; a definitional-sentence determination unit that determines whether the result data generated by the natural language processing unit is a definitional sentence and extracts definitional-sentence data and non-definitional-sentence data; a first graph generation unit that generates a first conceptual graph based on the result data generated by the natural language processing unit and the definitional-sentence data extracted by the definitional-sentence determination unit; a second graph generation unit that generates a second conceptual graph based on the result data generated by the natural language processing unit and the non-definitional-sentence data extracted by the definitional-sentence determination unit; a knowledge expansion unit that generates an expanded graph by performing a join operation on the first conceptual graph and the second conceptual graph; and an answer derivation unit that finds an answer graph corresponding to a question based on the expanded graph generated by the knowledge expansion unit and a question graph generated from the externally input question, finds an answer node in the answer graph, and derives the answer to the question.

The system may include a data collection unit that collects data including a plurality of texts from outside and passes the collected texts to the natural language processing unit; and a question receiving unit that receives the externally input question, performs natural language processing on the received question to generate the question graph, and provides the question graph to the answer derivation unit.

A method by which a question answering system answers a question, which is another feature of the present invention for achieving the above technical object, includes:

performing natural language processing on a plurality of texts received from outside to generate result data, and determining whether the result data is a definitional sentence to extract definitional-sentence data and non-definitional-sentence data; generating an expanded graph using a first conceptual graph and a second conceptual graph, each generated using the result data together with either the definitional-sentence data or the non-definitional-sentence data; and generating a question graph from an externally input question, finding an answer node based on the generated question graph and the expanded graph, and deriving the answer to the question.

Generating the expanded graph may include: generating the first conceptual graph using the result data and the definitional-sentence data; generating the second conceptual graph using the result data and the non-definitional-sentence data; and generating the expanded graph by performing a join operation on the first conceptual graph and the second conceptual graph.

Deriving the answer to the question may include: generating a question graph by performing natural language processing on the externally input question; finding an answer graph by approximate matching between the question graph and the expanded graph using a similarity function; and finding an answer node in the answer graph and deriving it as the answer to the question.

According to the present invention, unlike existing knowledge representation schemes and application systems that express knowledge as numerous fragmentary tuples of relations and arguments, knowledge is represented and abstracted as conceptual graphs to generate answers. This makes it possible to obtain answers that could not be found from fragmentary tuples, either directly or through inference.

FIG. 1 is a structural diagram of a question answering system according to an embodiment of the present invention.
FIG. 2 is a flowchart of a question answering method according to an embodiment of the present invention.
FIG. 3 is an example of extracting an answer from a query according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Hereinafter, a question answering system and method using conceptual graph matching according to an embodiment of the present invention are described with reference to the drawings.

FIG. 1 is a structural diagram of a question answering system according to an embodiment of the present invention.

As shown in FIG. 1, the question answering system 100 includes a data collection unit 110, a natural language processing unit 120, a definitional-sentence determination unit 130, a first graph generation unit 140, a second graph generation unit 150, a knowledge expansion unit 160, a question receiving unit 170, and an answer derivation unit 180.

The data collection unit 110 collects a plurality of data items from outside. In this embodiment, the data collected by the data collection unit 110 is described using text as an example, specifically a plurality of texts contained in dictionaries, encyclopedias, and Web documents, but it is not necessarily limited to this.

The natural language processing unit 120 performs natural language processing on the texts in order to represent, as conceptual graphs, the various natural-language knowledge contained in the plurality of texts collected by the data collection unit 110. It then extracts the natural language analysis result as the outcome of that processing. Here, the result data is described, by way of example, as including verb tokens, all of the tokens that have a semantic role relation with each verb token together with their respective semantic roles, and the text anchored on the head-word token corresponding to each semantic role.
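As a non-limiting illustration, the result data described above could be held in a structure such as the following Python sketch. The class names, field names, and the role labels `ARG0` and `ARG1` are assumptions made for illustration and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    role: str   # semantic role label (illustrative PropBank-style label)
    head: str   # head-word token of the argument
    text: str   # phrase-level text anchored on the head word


@dataclass
class PredicateFrame:
    verb: str                                   # verb token
    arguments: List[Argument] = field(default_factory=list)

    def by_role(self, role: str) -> List[str]:
        """Return the phrase text of every argument carrying the given role."""
        return [a.text for a in self.arguments if a.role == role]


# One frame for the sentence "William Shakespeare wrote the play Hamlet".
frame = PredicateFrame(
    verb="wrote",
    arguments=[
        Argument(role="ARG0", head="Shakespeare", text="William Shakespeare"),
        Argument(role="ARG1", head="play", text="the play Hamlet"),
    ],
)
```

Keeping the head word and the phrase text side by side lets a later stage use the head for matching while preserving the full phrase for display.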

Because the data collection unit 110 collects data from various domains, a language processing pipeline is used for general, domain-independent natural language analysis: tokenization of the text, sentence recognition within the text, part-of-speech tagging, named-entity recognition, constituency parsing, dependency parsing, coreference resolution, and semantic role labeling. Accordingly, the natural language processing unit 120 according to this embodiment is described as using PropBank, a language resource used in automatic semantic role labeling through machine learning, and ClearNLP, an automatic semantic role labeler based on it, but it is not necessarily limited to these.

In addition, this embodiment uses the Stanford NLP tools in parallel to improve performance. By using a common token list when linking to the parse tree, merging coreference resolution results, and merging temporal-expression normalization results, the individual results can be integrated without gaps. The functions of the PropBank, ClearNLP, and Stanford NLP tools are well known, so a detailed description is omitted in this embodiment.

The definitional-sentence determination unit 130 receives the result data, that is, the natural language analysis result generated by the natural language processing unit 120, determines whether the received result data is a definitional sentence, and extracts result data that is a definitional sentence (hereinafter 'definitional-sentence data' for convenience of description) and result data that is a non-definitional sentence (hereinafter 'non-definitional-sentence data' for convenience of description). The purpose is to distinguish graph types according to the kind of knowledge generated so that question answering can be performed effectively. Because the determination of whether result data is definitional-sentence data or non-definitional-sentence data can be carried out by various methods, a detailed description is omitted in this embodiment.
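The specification leaves the determination method open. Purely as one possible illustration, and not as the method of the embodiment, the following Python sketch separates sentences with a simple English copula pattern ("X is a Y ..."). The pattern, the function name `split_definitional`, and the example sentences are all assumptions.

```python
import re

# Hypothetical surface pattern: "<Term> is/are/was/were a|an|the <definition>".
_DEFINITION = re.compile(
    r"^(?P<term>[A-Z][\w\- ]*?)\s+(?:is|are|was|were)\s+(?:a|an|the)\s+(?P<body>.+)$"
)


def split_definitional(sentences):
    """Split sentences into (term, sentence) definitions and other sentences."""
    definitional, non_definitional = [], []
    for sentence in sentences:
        match = _DEFINITION.match(sentence.strip())
        if match:
            definitional.append((match.group("term"), sentence))
        else:
            non_definitional.append(sentence)
    return definitional, non_definitional


definitions, facts = split_definitional([
    "Hamlet is a tragedy written by William Shakespeare.",
    "Shakespeare wrote Hamlet in London.",
])
```

A surface pattern like this over-generates (for example, "The bridge was a failure" would be treated as a definition), so a practical system would more likely rely on a trained classifier or on the parse structure.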

Because the definitional sentences extracted by the definitional-sentence determination unit 130 are assumed to contain definitional knowledge about terms, the first graph generation unit 140 generates a first conceptual graph from the natural language analysis result produced by the natural language processing unit 120 and from that definitional knowledge. In this embodiment the first conceptual graph is also referred to as a terminological conceptual graph, although it is not necessarily limited to this.

Because the non-definitional sentences identified by the definitional-sentence determination unit 130 are assumed to contain factual knowledge about the real world, the second graph generation unit 150 generates a second conceptual graph from the natural language analysis result produced by the natural language processing unit 120 and from that factual knowledge. In this embodiment the second conceptual graph is also referred to as an individual conceptual graph, although it is not necessarily limited to this.

Here, a conceptual graph represents knowledge as concept nodes connected through relation nodes that link a plurality of concept nodes. Depending on the kind of knowledge held in the concept nodes, graphs are stored separately as terminological conceptual graphs and individual conceptual graphs.
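The node structure described above can be sketched as follows, as a minimal illustration only. The class `ConceptualGraph`, its method names, and the example labels are assumptions and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ConceptualGraph:
    kind: str                                   # "terminological" or "individual"
    concepts: Dict[int, str] = field(default_factory=dict)
    # Each relation node is stored as (label, ids of the concept nodes it links).
    relations: List[Tuple[str, Tuple[int, ...]]] = field(default_factory=list)

    def add_concept(self, label: str) -> int:
        node_id = len(self.concepts)
        self.concepts[node_id] = label
        return node_id

    def add_relation(self, label: str, *concept_ids: int) -> None:
        unknown = [c for c in concept_ids if c not in self.concepts]
        if unknown:
            raise KeyError(f"unknown concept ids: {unknown}")
        self.relations.append((label, tuple(concept_ids)))

    def linked_concepts(self, concept_id: int) -> List[str]:
        """Labels of the concepts that share a relation node with concept_id."""
        linked = []
        for _, ids in self.relations:
            if concept_id in ids:
                linked.extend(self.concepts[i] for i in ids if i != concept_id)
        return linked


# Individual conceptual graph for the fact "Shakespeare wrote Hamlet".
graph = ConceptualGraph(kind="individual")
author = graph.add_concept("Shakespeare")
work = graph.add_concept("Hamlet")
graph.add_relation("write", author, work)
```

Storing relation nodes separately from concept nodes keeps the graph bipartite, so one relation node can link more than two concepts when needed.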

The knowledge expansion unit 160 links the first conceptual graph generated by the first graph generation unit 140 and the second conceptual graph generated by the second graph generation unit 150 to generate an expanded graph in which knowledge is expanded. The expanded graph is described, by way of example, as being generated by a join operation between the first conceptual graph and the second conceptual graph; because the join operation is well known, a detailed description is omitted in this embodiment.
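For readers unfamiliar with the join operation, the following Python sketch shows a deliberately simplified version in which concept nodes are identified by their labels, so that joining two graphs reduces to merging them on a shared label. The `Graph` type, the helper `make_graph`, and the example triples are assumptions for illustration.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]   # (source concept, relation, target concept)


class Graph(NamedTuple):
    concepts: FrozenSet[str]
    relations: FrozenSet[Triple]


def make_graph(*triples: Triple) -> Graph:
    concepts = frozenset(n for s, _, t in triples for n in (s, t))
    return Graph(concepts, frozenset(triples))


def join(first: Graph, second: Graph) -> Graph:
    """Merge two graphs on the concept nodes they have in common."""
    shared = first.concepts & second.concepts
    if not shared:
        raise ValueError("no common concept node to join on")
    return Graph(first.concepts | second.concepts,
                 first.relations | second.relations)


terminological = make_graph(("tragedy", "equivalent", "drama with an unhappy ending"))
individual = make_graph(("Shakespeare", "write", "tragedy"))
expanded = join(terminological, individual)
```

In the example, the fact graph and the definition graph share the concept `tragedy`, so the expanded graph carries both the fact and the definitional knowledge attached to that concept.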

The question receiving unit 170 receives a question input from outside, performs natural language processing on the received question, and generates a question graph. The generated question graph is passed to the answer derivation unit 180 so that an answer to the question can be derived.

The answer derivation unit 180 finds an answer graph by approximate matching, using a similarity function, between the expanded graph generated by the knowledge expansion unit 160 and the question graph. It then finds the answer node in the answer graph, derives it as the answer to the query, and provides it to the user.

A method of performing question answering using the question answering system 100 described above is described with reference to FIG. 2.

FIG. 2 is a flowchart of a question answering method according to an embodiment of the present invention.

As shown in FIG. 2, the natural language processing unit 120 of the question answering system 100 receives data from outside, performs natural language processing on the received data, that is, the input data, and generates the analysis result as output data (S100). Here, the input data means the texts contained in dictionaries, encyclopedias, and Web documents, and the output data includes verb tokens, the tokens that have a semantic role relation with each verb together with their semantic roles, and the text anchored on the head-word token corresponding to each semantic role.

In this embodiment, a three-step post-processing procedure is additionally performed so that the analysis results of the ClearNLP tool can be used by the question answering system 100. The first step finds the verb tokens that carry a PropBank identifier. The second step finds all tokens that have a semantic role relation with each verb, along with their semantic roles. The final step finds the phrase-level text anchored on the head-word token corresponding to each semantic role. For a preposition, the attached noun phrase can also be searched in order to find the actual argument.
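The three post-processing steps can be sketched as below. This is a minimal illustration over a hand-built token table, not a call into ClearNLP: the `Token` layout, the function names, and the role labels `A0`, `A1`, and `AM-LOC` are assumptions, and the phrase of an argument is approximated by the dependency subtree of its head token.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Token:
    idx: int
    form: str
    pos: str
    head: int                                   # dependency head index, 0 = root
    roleset: Optional[str] = None               # PropBank identifier on verbs
    srl: List[Tuple[int, str]] = field(default_factory=list)  # (verb idx, role)


def subtree(tokens, root_idx):
    """Indices of root_idx and all of its dependency descendants, in order."""
    ids = {root_idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in ids and t.idx not in ids:
                ids.add(t.idx)
                changed = True
    return sorted(ids)


def argument_head(tokens, tok):
    """For a preposition, use the attached noun as the actual argument head."""
    if tok.pos == "IN":
        for child in tokens:
            if child.head == tok.idx and child.pos.startswith("NN"):
                return child
    return tok


def extract_frames(tokens):
    by_idx = {t.idx: t for t in tokens}
    frames = []
    for verb in (t for t in tokens if t.roleset):              # step 1
        args = []
        for t in tokens:                                       # step 2
            for verb_idx, role in t.srl:
                if verb_idx != verb.idx:
                    continue
                phrase = " ".join(by_idx[i].form for i in subtree(tokens, t.idx))
                args.append((role, argument_head(tokens, t).form, phrase))  # step 3
        frames.append((verb.form, verb.roleset, args))
    return frames


sentence = [
    Token(1, "Shakespeare", "NNP", 2, srl=[(2, "A0")]),
    Token(2, "wrote", "VBD", 0, roleset="write.01"),
    Token(3, "Hamlet", "NNP", 2, srl=[(2, "A1")]),
    Token(4, "in", "IN", 2, srl=[(2, "AM-LOC")]),
    Token(5, "London", "NNP", 4),
]
frames = extract_frames(sentence)
```

For the locative argument, the role is attached to the preposition `in`, so the sketch reports `London` as the head while keeping `in London` as the phrase text.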

Once the output data has been generated in step S100, the definitional-sentence determination unit 130 identifies the definitional sentences in the output data. Then, based on the output data generated in step S100 and the determination result, the first graph generation unit 140 generates the first conceptual graph, which is a terminological conceptual graph, and the second graph generation unit 150 generates the second conceptual graph, which is an individual conceptual graph (S110).

Here, a conceptual graph represents knowledge as concept nodes connected through relation nodes that link a plurality of concept nodes. A terminological conceptual graph expresses definitional knowledge about a term by extracting knowledge elements from definitional sentences, and an individual conceptual graph expresses factual knowledge about the real world by extracting knowledge elements from non-definitional sentences.

Turning first to the generation of the first conceptual graph: in this embodiment, terminological conceptual graphs are extracted from the definitional sentences identified by the definitional-sentence determination unit 130 and the output data generated by the natural language processing unit 120. Each generated terminological conceptual graph is given an equivalence (equivalent) relation to the term that the definitional sentence defines and is then stored in the first graph generation unit 140 as a terminological conceptual graph. Although this embodiment describes the terminological conceptual graphs as being stored in the first graph generation unit 140, a separate storage unit for the first conceptual graph may also be implemented.

Turning next to the second conceptual graph: in this embodiment, individual conceptual graphs are extracted from the non-definitional sentences identified by the definitional-sentence determination unit 130 and the output data generated by the natural language processing unit 120. The procedure for generating the conceptual graph itself is the same in both cases. However, because these sentences have been determined to be non-definitional, they hold a different kind of knowledge and define no term, so each graph is stored in the second graph generation unit 150 as an individual conceptual graph without any additional linking step. Although this embodiment describes the individual conceptual graphs as being stored in the second graph generation unit 150, a separate storage unit for the second conceptual graph may also be implemented.

Generating the conceptual graphs needed in common for the first and second conceptual graphs requires first recognizing the parts of the text that correspond to concepts and then recognizing the relations between those concepts. In this embodiment, concepts are recognized by finding noun phrases, and relations comprise the semantic roles recognized around verbs and the relations recognized by OLLIE, a state-of-the-art system implementing the OIE (Open Information Extraction) technique. During relation recognition, PropBank semantic role identifiers are converted to VerbNet semantic roles; because the conversion method is well known, a detailed description is omitted in this embodiment. In addition, time, place, and agent information is recognized and reflected by rules that use the preposition type and the named-entity recognition results.
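The specification does not list the rules themselves. The following Python sketch only illustrates the general shape such rules might take, combining a preposition with a named-entity type to yield a time, place, or agent relation. The rule table, the entity type names, and the function `classify` are hypothetical.

```python
from typing import Optional

# Hypothetical rule table: (preposition, named-entity type) -> relation label.
RULES = {
    ("in", "DATE"): "time",
    ("on", "DATE"): "time",
    ("at", "TIME"): "time",
    ("in", "LOCATION"): "place",
    ("at", "LOCATION"): "place",
    ("by", "PERSON"): "agent",
    ("by", "ORGANIZATION"): "agent",
}


def classify(preposition: str, entity_type: str) -> Optional[str]:
    """Return the relation implied by a preposition and entity type, if any."""
    return RULES.get((preposition.lower(), entity_type.upper()))


# "in London" with London tagged as a location yields a place relation.
relation = classify("in", "LOCATION")
```

Keeping the rules in a lookup table makes it easy to see that the same preposition ("in") maps to different relations depending on the entity type that follows it.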

Once the first and second conceptual graphs have been generated in step S110, the knowledge expansion unit 160 links the first conceptual graph to the second conceptual graph to generate an expanded graph in which knowledge is expanded (S120). Among the concepts in the second conceptual graph, that is, the individual conceptual graph, there are often concept nodes that need to be understood through additional definitional knowledge rather than through the concept alone. The two kinds of conceptual graph are linked by joining such concept nodes with the already generated first conceptual graph, that is, the terminological conceptual graph.

In this embodiment, the knowledge expansion unit 160 reads each individual conceptual graph from the second graph generation unit 150 in CogXML format, searches the terminological conceptual graphs stored in the first graph generation unit 140 for the concept nodes present in the individual conceptual graph, and then performs the join operation. For this search, the terms of all generated terminological conceptual graphs are indexed in advance, the concept nodes appearing in the individual conceptual graph are looked up in that index, and the single top-ranked search result is used. The graph expanded in this way is the expanded graph output by the knowledge expansion unit 160.
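The index-then-search step can be illustrated with the following Python sketch, which builds a small inverted index over the defined terms and returns the single best match for a concept node label. The token-overlap (Jaccard) score and the function names `build_index` and `top1` are assumptions; the specification does not state which ranking function is used.

```python
from collections import defaultdict


def build_index(terms):
    """Map each lowercase word to the set of defined terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for word in term.lower().split():
            index[word].add(term)
    return index


def top1(index, concept_label):
    """Return the best-matching defined term for a concept label, or None."""
    query = set(concept_label.lower().split())
    candidates = set().union(*(index.get(word, set()) for word in query))
    if not candidates:
        return None

    def score(term):
        words = set(term.lower().split())
        return len(words & query) / len(words | query)   # Jaccard overlap

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=score)


index = build_index(["tragedy", "Greek tragedy", "sonnet"])
best = top1(index, "tragedy")
```

The term returned by `top1` identifies the terminological conceptual graph to join with the concept node; concept labels with no match are simply left unexpanded.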

The expanded graph generated by the knowledge expansion unit 160 is compared with the question graph generated by the question receiving unit 170 and used for answer extraction. Specifically, the answer derivation unit 180 performs approximate matching between the question graph and the expanded graph using a similarity function based on whether matching nodes exist and on the connection information of the matching nodes. It recognizes the graph containing the subgraph with the highest degree of match as the answer candidate and generates a result graph (S130).
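The specification does not give the similarity function. As a minimal sketch under stated assumptions, the following Python code scores a candidate graph by the number of question nodes it contains plus the number of question edges it reproduces, with a wildcard node standing for the unknown answer. The triple representation, the wildcard symbol, and the equal weighting of nodes and edges are all assumptions.

```python
WILDCARD = "?"   # placeholder for the node the question asks about


def similarity(question, candidate):
    """Score a candidate graph against a question graph.

    Both graphs are sets of (source, relation, target) triples. The score
    counts shared nodes plus matched edges; bindings collects the candidate
    nodes that line up with the wildcard.
    """
    q_nodes = {n for s, _, t in question for n in (s, t) if n != WILDCARD}
    c_nodes = {n for s, _, t in candidate for n in (s, t)}
    score = len(q_nodes & c_nodes)
    bindings = []
    for qs, qr, qt in sorted(question):
        for cs, cr, ct in sorted(candidate):
            if qr == cr and qs in (WILDCARD, cs) and qt in (WILDCARD, ct):
                score += 1
                if qs == WILDCARD:
                    bindings.append(cs)
                if qt == WILDCARD:
                    bindings.append(ct)
                break
    return score, bindings


def answer(question, candidates):
    """Pick the best-scoring candidate graph and return its answer node."""
    best = max(candidates, key=lambda c: similarity(question, c)[0])
    _, bindings = similarity(question, best)
    return bindings[0] if bindings else None


# "Who wrote Hamlet?" against two expanded graphs.
question = {(WILDCARD, "write", "Hamlet")}
graphs = [
    {("Marlowe", "write", "Doctor Faustus")},
    {("Shakespeare", "write", "Hamlet"), ("Hamlet", "equivalent", "tragedy")},
]
result = answer(question, graphs)
```

Because partial overlaps still contribute to the score, a candidate graph can be selected even when it does not reproduce every edge of the question graph, which is what makes the matching approximate rather than exact.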

An example of extracting an answer with the question answering system 100 according to the procedure described above is described with reference to FIG. 3.

FIG. 3 is an example of extracting an answer from a query according to an embodiment of the present invention.

As shown in FIG. 3, the answer to a Janghak Quiz question is extracted through conceptual graph matching against the Wikipedia document that contains the answer. Although Wikipedia is used as the information source for conceptual graph matching in this embodiment, the invention is not necessarily limited to this.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims

A question answering system comprising:
a natural language processing unit that performs natural language processing, through machine learning, on a plurality of texts received from outside and generates result data as the natural language processing result;
a definitional-sentence determination unit that determines whether the result data generated by the natural language processing unit is a definitional sentence and extracts definitional-sentence data and non-definitional-sentence data;
a first graph generation unit that generates a first conceptual graph based on the result data generated by the natural language processing unit and the definitional-sentence data extracted by the definitional-sentence determination unit;
a second graph generation unit that generates a second conceptual graph based on the result data generated by the natural language processing unit and the non-definitional-sentence data extracted by the definitional-sentence determination unit;
a knowledge expansion unit that generates an expanded graph by performing a join operation on the first conceptual graph and the second conceptual graph; and
an answer derivation unit that finds an answer graph corresponding to a question by approximate matching, using a similarity function, between the expanded graph generated by the knowledge expansion unit and a question graph generated by performing natural language processing on the externally input question, finds an answer node in the answer graph, and derives the answer to the question.

The question answering system of claim 1, further comprising:
a data collection unit that collects data including a plurality of texts from the outside and transmits the collected texts to the natural language processing unit; and
a question receiving unit that receives the question input from the outside, generates the question graph by natural language processing of the received question, and provides the question graph to the answer derivation unit.

The question answering system of claim 1,
wherein the result data includes a verb token, all of the plurality of tokens that have a semantic role relationship with the verb token together with their respective semantic roles, and text based on the head-word token corresponding to each of the plurality of semantic roles.
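For illustration, the result data recited above might be held in a structure like the following. This is a minimal sketch only: the class names, the PropBank-style role labels (`ARG0`, `ARG1`, `ARGM-TMP`), and the example sentence are assumptions, not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RoleArgument:
    role: str          # semantic role label (label set is an assumption)
    tokens: List[str]  # every token in the argument span
    head: str          # head-word token that represents the span


@dataclass
class ResultData:
    verb: str  # verb token that anchors the semantic roles
    arguments: List[RoleArgument] = field(default_factory=list)

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Project each argument onto a (verb, role, head word) edge."""
        return [(self.verb, arg.role, arg.head) for arg in self.arguments]


# Hypothetical analysis of "King Sejong created Hangul in 1443."
result = ResultData(
    verb="created",
    arguments=[
        RoleArgument("ARG0", ["King", "Sejong"], "Sejong"),
        RoleArgument("ARG1", ["Hangul"], "Hangul"),
        RoleArgument("ARGM-TMP", ["in", "1443"], "1443"),
    ],
)
triples = result.to_triples()
```

Keeping both the full token span and the head word lets a graph builder use the head word as the node label while retaining the original text.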

The question answering system of claim 1,
wherein the first conceptual graph is a terminological conceptual graph that expresses definitional knowledge about terms by extracting knowledge elements from the definition statement data, and the second conceptual graph is an individual conceptual graph that expresses factual knowledge about the real world by extracting knowledge elements from non-definition statements.

A method by which a question answering system answers a question, the method comprising:
generating result data by performing natural language processing, through machine learning, on a plurality of texts received from the outside, and extracting definition statement data and non-definition statement data by determining whether the result data is a definition statement;
generating an extended graph using a first conceptual graph and a second conceptual graph, each generated using the result data and either the definition statement data or the non-definition statement data;
generating a question graph by natural language processing of a question input from the outside;
finding an answer graph by approximate matching between the question graph and the extended graph using a similarity calculation function; and
finding an answer node in the answer graph and deriving the answer node as the correct answer to the question.

The method of claim 5,
wherein generating the extended graph comprises:
generating the first conceptual graph using the result data and the definition statement data;
generating the second conceptual graph using the result data and the non-definition statement data; and
generating the extended graph by performing a join operation on the first conceptual graph and the second conceptual graph.
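As an illustration of why joining the two graphs extends the available knowledge, the sketch below models each conceptual graph as a set of triples and the join as a union that unifies nodes with identical labels. The `join` and `reachable` functions, the relation names, and the example triples are assumptions made for this sketch; the specification's actual join operation may differ.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (source concept, relation, target concept)


def join(first: Iterable[Triple], second: Iterable[Triple]) -> Set[Triple]:
    """Merge two graphs; nodes that share a label become the same node."""
    return set(first) | set(second)


def reachable(graph: Set[Triple], start: str) -> Set[str]:
    """Collect every concept reachable from `start` along directed edges."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for head, _, tail in graph:
            if head == node and tail not in seen:
                seen.add(tail)
                stack.append(tail)
    return seen


# Hypothetical first (terminological) graph: definitional knowledge about a term.
terminological = {("king", "is_a", "ruler")}
# Hypothetical second (individual) graph: factual knowledge about an entity.
individual = {("Sejong", "instance_of", "king")}

extended = join(terminological, individual)
before = reachable(individual, "Sejong")
after = reachable(extended, "Sejong")
```

Here `before` contains only `king`, while `after` also contains `ruler`: the fact about the individual inherits the definitional knowledge once the graphs share the `king` node.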

The method of claim 6,
wherein the first conceptual graph is a terminological conceptual graph that expresses definitional knowledge about terms by extracting knowledge elements from the definition statement data, and the second conceptual graph is an individual conceptual graph that expresses factual knowledge about the real world by extracting knowledge elements from non-definition statements.

(Deleted)