KR101541170B1

KR101541170B1 - Apparatus and method for summarizing text

Info

Publication number: KR101541170B1
Application number: KR1020140142828A
Authority: KR
Inventors: 송도규
Original assignee: (주)센솔로지
Priority date: 2014-10-21
Filing date: 2014-10-21
Publication date: 2015-08-03

Abstract

The present invention relates to an apparatus for summarizing text based on the Resource Description Framework (RDF). The apparatus comprises: an RDF triple generation unit that converts an input text made up of a plurality of sentences into a plurality of RDF triples; a main triple determination unit that extracts one or more main triples from the RDF triples based on at least one of a feature and a vocabulary item included in the predicate; and a summary writing unit that writes a summary by extracting, from the input text, the sentences corresponding to the main triples.

Description

Apparatus and Method for Summarizing Text

The present invention relates to an apparatus and method for summarizing text.

We live in a big-data era in which the volume of text is exploding, not only in documents of every kind but also in news, blogs, social networking services, and other social media where posts appear continuously. As a result, merely picking out the texts of interest from among so many takes an excessive amount of time. If a computer could grasp the meaning of text on a person's behalf, analyze its sentiment as preference or non-preference, and summarize and report its main content, daily life would become more convenient and people's way of life could change for the better. Computers, however, cannot yet fully handle the flexibility and rich expressiveness of natural language. Most current computer-based summarization methods extract sentences that contain frequently mentioned words; such simple methods, which disregard meaning, can hardly provide users with a practical service.

Research on processing natural language automatically by computer has long been attempted, but it has not been sufficient for understanding the meaning of text. Recently, techniques have been studied that understand the meaning of text by converting language into Resource Description Framework (RDF) triples, a format a computer can process. The RDF triple, defined by an international standard administered by the World Wide Web Consortium (W3C), represents knowledge and information as a set of three elements: subject (resource), predicate (property), and object (literal). However, the methodologies proposed so far fall short of fully implementing sentiment analysis, summarization, and reporting of text.

Laid-open publication of international patent application, Publication No. 1997-7007499 (published December 1, 1997); Korean Laid-open Patent Publication No. 2003-0039575 (published May 22, 2003); Korean Laid-open Patent Publication No. 10-2009-0003090 (published January 9, 2009)

The problem to be solved by the present invention is to provide an apparatus and method that analyze the sentiment content of text based on RDF triples and then summarize and report the text.

An apparatus for summarizing text based on the Resource Description Framework (RDF) according to one embodiment of the present invention includes: an RDF triple generation unit that converts an input text composed of a plurality of sentences into a plurality of RDF triples; a main triple determination unit that extracts at least one main triple from the plurality of RDF triples based on at least one of a feature and a vocabulary item included in the predicate; and a summary writing unit that writes a summary by extracting, from the input text, the sentences corresponding to the at least one main triple.

The main triple determination unit may include a first triple extraction unit that extracts, as the main triple, a triple from the plurality of RDF triples whose predicate includes a designated semantic feature.

The designated semantic feature may be either a preference feature or a non-preference feature.

The main triple determination unit may further include a second triple extraction unit that extracts, as a main triple, a triple from the plurality of RDF triples whose predicate includes vocabulary of a designated form.

The vocabulary of the designated form may include at least one of the Korean predicate forms "~하", "~지", "~되", "~돼", "수 있" ("can"), and "수 없" ("cannot").

The summary writing unit may include: a main sentence extraction unit that extracts, from the input text, at least one main sentence corresponding to the at least one main triple; an unnecessary phrase removal unit that generates a refined sentence from each main sentence by removing designated phrases; and a summary generation unit that gathers the refined sentences generated by the unnecessary phrase removal unit into a summary.

The designated phrase may be any one of a conjunction, an adverbial, an adnominal, and an interjection.

The text summarizing apparatus may further include a summary reporting unit that reports the summary written by the summary writing unit to a user.

A method by which an apparatus summarizes text based on the Resource Description Framework (RDF) according to another embodiment of the present invention includes: converting an input text composed of a plurality of sentences into a plurality of RDF triples; extracting, as a main triple, a triple from the plurality of RDF triples whose predicate includes at least one of a designated semantic feature and vocabulary of a designated form; and writing a summary by extracting, from the input text, the sentences corresponding to the main triple.

The extracting of the main triple may extract, as the main triple, a triple from the plurality of RDF triples whose predicate includes either a preference feature or a non-preference feature.

The extracting of the main triple may extract, as the main triple, a triple from the plurality of RDF triples whose predicate includes any one of "~하", "~지", "~되", "~돼", "수 있" ("can"), and "수 없" ("cannot").

The writing of the summary may include: extracting, from the input text, at least one main sentence corresponding to each main triple; removing designated phrases from the at least one main sentence; and gathering the at least one sentence from which the designated phrases have been removed into the summary.

The designated phrase may be a phrase corresponding to any one of a conjunction, an adverbial, an adnominal, and an interjection.

The text summarizing method may further include reporting the summary to a user.

According to an embodiment of the present invention, the text summarizing apparatus can grasp the meaning of text on the user's behalf and summarize the text by extracting content that carries preference or non-preference sentiment. Because the computer reads a vast amount of text in the user's place, extracts the semantically important sentences, and summarizes and reports them, the user can identify important text quickly and easily without reading and sorting through all of it. The user is therefore also less likely to overlook text that is interesting or important.

FIG. 1 is a block diagram of a text summarizing apparatus according to one embodiment of the present invention.
FIG. 2 is a flowchart of a text summarizing method according to one embodiment of the present invention.
FIG. 3 is a flowchart of an RDF triple generation method according to one embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

A text summarizing apparatus and method according to embodiments of the present invention will now be described with reference to the drawings.

FIG. 1 is a block diagram of a text summarizing apparatus according to one embodiment of the present invention.

Referring to FIG. 1, the text summarizing apparatus 10 (hereinafter, the "summarizing apparatus") converts text composed of a plurality of sentences into Resource Description Framework (RDF) triples and writes a summary from the RDF triples that are selected by examining the features and vocabulary forms of their predicates.

The summarizing apparatus 10 includes a text input unit 100, an RDF triple generation unit 200, an RDF triple repository 300, a main triple determination unit 400, a summary writing unit 500, and a summary reporting unit 600.

The text input unit 100 receives text (the input text) composed of at least one sentence. The text input unit 100 can receive many kinds of text, such as e-mail, web documents including Internet news and social media, and word processor documents.

The RDF triple generation unit 200 converts the input text received by the text input unit 100 into RDF triples and stores them in the RDF triple repository 300. The RDF triple generation unit 200 includes a morpheme analysis unit 210, an eojeol generation unit 220, a sentence component analysis unit 230, and an RDF triple conversion unit 240.

The morpheme analysis unit 210 breaks the input text down into morphemes using a morphological analyzer and an electronic dictionary. A morpheme is the smallest meaningful unit among the elements that make up a sentence. The electronic dictionary uses morphemes as headwords and records the grammatical features and semantic features of each morpheme. The semantic features include preference and non-preference features, which can be regarded as sentiment-related features.

The eojeol generation unit 220 generates eojeols (word phrases) from the morphemes. An eojeol is a sentence constituent delimited by spaces in a sentence written according to orthographic rules. By part-of-speech character, eojeols are classified as substantive (NN), verbal (VV), adnominal (MM), adverbial (MA), interjection (IC), or conjunction (CONJ).

The sentence component analysis unit 230 analyzes the role each eojeol plays within the sentence, that is, its sentence component. Sentence components are classified as subject (SBJ), object (OBJ), sentence predicate (PRD), complement (CMP), modifier (MOD), adjunct (AJT), connective (CNJ), or independent word (INT).

The RDF triple conversion unit 240 converts each sentence in the input text into an RDF triple based on the sentence components and sentence segmentation information analyzed by the morpheme analysis unit 210, the eojeol generation unit 220, and the sentence component analysis unit 230. An RDF triple consists of three elements: a subject, a predicate, and an object.
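The description does not spell out the rules by which sentence components are mapped to a triple. As a minimal sketch only, the following Python code assumes the simplest case: the first SBJ, PRD, and OBJ components of a sentence become the subject, predicate, and object. The names `Triple` and `to_triple`, the placeholder variables `?x` and `?y` for missing elements, and the hand-labeled English components are all assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def to_triple(components: List[Tuple[str, str]]) -> Optional[Triple]:
    """Map (text, role) pairs for one sentence to a triple.

    Assumed simplification: the first SBJ, PRD, and OBJ components are used,
    and a missing subject or object becomes a placeholder variable.
    """
    first_by_role = {}
    for text, role in components:
        first_by_role.setdefault(role, text)  # keep the first text seen per role
    if "PRD" not in first_by_role:
        return None  # no sentence predicate, so no triple in this sketch
    return Triple(
        subject=first_by_role.get("SBJ", "?x"),
        predicate=first_by_role["PRD"],
        obj=first_by_role.get("OBJ", "?y"),
    )


# Hand-labeled components (hypothetical output of the analysis units).
analyzed = [
    ("A Electronics", "SBJ"),
    ("in the future", "AJT"),
    ("a strap with a new design and material", "OBJ"),
    ("plans to release", "PRD"),
]
print(to_triple(analyzed))
```

A real converter would also have to handle topic markers, embedded clauses, and coreference between placeholder variables, none of which this sketch attempts.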

The RDF triple repository 300 stores the RDF triples produced by the RDF triple conversion unit 240.

The main triple determination unit 400 selects the main RDF triples from among the RDF triples of the input text stored in the RDF triple repository 300. It does so by analyzing the predicate of each RDF triple. The main triple determination unit 400 includes a sentiment triple extraction unit 410, an additional main triple extraction unit 420, and an output unit 430.

The sentiment triple extraction unit 410 extracts, from the RDF triples of the input text, the RDF triples that have a designated semantic feature, for example a preference or non-preference feature. The predicate of an RDF triple can carry various features. If the features of the predicate include "preference", the RDF triple is extracted as a sentiment triple and can be tagged more specifically as a preference triple. Likewise, if the features of the predicate include "non-preference", the RDF triple is extracted as a sentiment triple and can be tagged more specifically as a non-preference triple. For example, the sentence "Smartphone A has good picture quality, but its responsiveness is poor" is converted into the RDF triples of Table 1. The predicate of Triple 1, "좋다" (good), has a preference feature, and the predicate of Triple 2, "나쁘다" (bad), has a non-preference feature, so the sentiment triple extraction unit 410 extracts Triple 1 and Triple 2 as sentiment triples.

Table 1
         | Subject      | Predicate    | Object
Triple 1 | Smartphone A | 좋다 (good)  | picture quality
Triple 2 | Smartphone A | 나쁘다 (bad) | responsiveness
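As a rough illustration of the feature-based extraction performed by the sentiment triple extraction unit 410, the following Python sketch tags the two triples of Table 1. The two-entry `SENTIMENT_LEXICON` stands in for the electronic dictionary, and the names `Triple` and `extract_sentiment_triples` are hypothetical; the patent does not describe the dictionary's actual format.

```python
from typing import Dict, List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical stand-in for the electronic dictionary:
# predicate headword -> sentiment-related semantic feature.
SENTIMENT_LEXICON: Dict[str, str] = {
    "좋다": "preference",        # "good"
    "나쁘다": "non-preference",  # "bad"
}


def extract_sentiment_triples(triples: List[Triple]) -> List[Tuple[Triple, str]]:
    """Return (triple, feature) pairs for triples whose predicate has a sentiment feature."""
    tagged = []
    for triple in triples:
        feature = SENTIMENT_LEXICON.get(triple.predicate)
        if feature is not None:
            tagged.append((triple, feature))
    return tagged


table_1 = [
    Triple("스마트폰A", "좋다", "화질"),      # Smartphone A / good / picture quality
    Triple("스마트폰A", "나쁘다", "반응감"),  # Smartphone A / bad / responsiveness
]
for triple, feature in extract_sentiment_triples(table_1):
    print(triple.predicate, "->", feature)
```

Exact lookup of the predicate is a simplification; in practice the feature would presumably come from the morpheme-level dictionary entries produced during analysis.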

The additional main triple extraction unit 420 extracts, from the RDF triples of the input text, RDF triples other than sentiment triples that contain vocabulary of a specific form. The specific forms designated for the additional main triple extraction unit 420 can vary, and each user can configure them to capture the information he or she wants. For example, the additional main triple extraction unit 420 may extract as main triples those triples whose predicate contains vocabulary of forms such as "~하", "~지", "~되(돼)", "수 있" ("can"), or "수 없" ("cannot").

The output unit 430 outputs the triples extracted by the sentiment triple extraction unit 410 and the additional main triple extraction unit 420 as the main RDF triples.

How the main triple determination unit 400 determines the main RDF triples is described next by way of example.

Suppose the input text is: "The smartwatch's screen uses a 2-inch curved display. The default display background is a picture of an analog clock with hands. For a start, the watch design shown on the crisp OLED display is impeccably excellent. In addition, both an ordinary rubber wristwatch strap and a bracelet-type strap are provided so that users can choose according to their taste. A Electronics plans to release additional straps with new designs and materials in the future." In this case, the RDF triples shown in Table 2 are stored in the RDF triple repository 300.

Table 2
         | Subject                    | Predicate                          | Object
Triple 1 | smartwatch's screen        | 사용되다 (be used)                 | 2-inch curved display
Triple 2 | default display background | 이다 (be)                          | picture of an analog clock with hands
Triple 3 | ?x                         | 훌륭하다 (be excellent)            | watch design
Triple 4 | ?x                         | 제공하다 (provide)                 | ordinary rubber wristwatch strap
Triple 5 | ?x                         | 제공하다 (provide)                 | bracelet-type strap
Triple 6 | ?y                         | 선택할 수 있다 (can choose)        | ?x
Triple 7 | A Electronics              | 출시할 계획이다 (plans to release) | strap with a new design and material

From the triples in Table 2, the main triple determination unit 400 extracts those whose predicate has a preference or non-preference feature or contains vocabulary of a specific form (for example, "~하", "~지", "~되(돼)", "수 있", "수 없"). As shown in Table 3, the main triple determination unit 400 determines the following to be main RDF triples: Triple 3, whose predicate has a preference feature (훌륭하다, "be excellent"), and Triple 1 (~되), Triple 4 (~하), Triple 5 (~하), and Triple 6 (수 있), whose predicates contain vocabulary of a specific form.

Table 3
         | Subject             | Predicate                   | Object
Triple 1 | smartwatch's screen | 사용되다 (be used)          | 2-inch curved display
Triple 3 | ?x                  | 훌륭하다 (be excellent)     | watch design
Triple 4 | ?x                  | 제공하다 (provide)          | ordinary rubber wristwatch strap
Triple 5 | ?x                  | 제공하다 (provide)          | bracelet-type strap
Triple 6 | ?y                  | 선택할 수 있다 (can choose) | ?x
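The selection from Table 2 to Table 3 can be reproduced with a short Python sketch. It assumes that a "form" is matched as a plain substring of the predicate and that sentiment features come from a small hand-made set; both are simplifications chosen for illustration, and the names `SENTIMENT_PREDICATES`, `DESIGNATED_FORMS`, and `is_main_triple` are hypothetical.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    label: str
    subject: str
    predicate: str
    obj: str


# Assumed mini-lexicon of predicates carrying a preference/non-preference feature.
SENTIMENT_PREDICATES = {"훌륭하다", "좋다", "나쁘다"}

# Designated vocabulary forms listed in the description.
DESIGNATED_FORMS = ("하", "지", "되", "돼", "수 있", "수 없")


def is_main_triple(triple: Triple) -> bool:
    """True if the predicate has a sentiment feature or contains a designated form."""
    has_feature = triple.predicate in SENTIMENT_PREDICATES
    has_form = any(form in triple.predicate for form in DESIGNATED_FORMS)
    return has_feature or has_form


TABLE_2: List[Triple] = [
    Triple("Triple 1", "스마트워치의 화면", "사용되다", "2인치 커브드 디스플레이"),
    Triple("Triple 2", "기본 디스플레이 바탕화면", "이다", "아날로그 바늘시계 그림"),
    Triple("Triple 3", "?x", "훌륭하다", "시계 디자인"),
    Triple("Triple 4", "?x", "제공하다", "러버 재질의 일반 손목시계 스트랩"),
    Triple("Triple 5", "?x", "제공하다", "팔찌 형태의 스트랩"),
    Triple("Triple 6", "?y", "선택할 수 있다", "?x"),
    Triple("Triple 7", "A전자", "출시할 계획이다", "새로운 디자인과 재질이 적용된 스트랩"),
]

main_labels = [t.label for t in TABLE_2 if is_main_triple(t)]
print(main_labels)
# ['Triple 1', 'Triple 3', 'Triple 4', 'Triple 5', 'Triple 6']
```

Substring matching on whole syllable blocks is crude: "출시할" does not match "하" because "할" is a different syllable, which happens to agree with Table 3, but a morpheme-level match would be more robust.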

The summary writing unit 500 writes a summary based on the main RDF triples extracted by the main triple determination unit 400. The summary writing unit 500 includes a main sentence extraction unit 510, an unnecessary phrase removal unit 520, and a summary generation unit 530.

The main sentence extraction unit 510 extracts, from the input text of the text input unit 100, the sentences corresponding to the main RDF triples. When the main RDF triples are those of Table 3, the main sentence extraction unit 510 extracts as main sentences the sentence corresponding to Triple 1 ("The smartwatch's screen uses a 2-inch curved display."), the sentence corresponding to Triple 3 ("For a start, the watch design shown on the crisp OLED display is impeccably excellent."), and the sentence corresponding to Triples 4 through 6 ("In addition, both an ordinary rubber wristwatch strap and a bracelet-type strap are provided so that users can choose according to their taste."). Gathered together, the main sentences are as follows.

<Main sentences> The smartwatch's screen uses a 2-inch curved display. For a start, the watch design shown on the crisp OLED display is impeccably excellent. In addition, both an ordinary rubber wristwatch strap and a bracelet-type strap are provided so that users can choose according to their taste.

The unnecessary phrase removal unit 520 refines each main sentence extracted by the main sentence extraction unit 510 by removing unnecessary phrases from it. The unnecessary phrases can be configured in various ways; for example, the unnecessary phrase removal unit 520 removes conjunctions, adverbials, adnominals, and interjections from the main sentences.
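The removal step can be pictured as a filter over tagged phrases. The Python sketch below assumes that each main sentence has already been segmented into phrases labeled with the category tags used by the eojeol generation unit 220 (NN, VV, MM, MA, IC, CONJ); the segmentation and tags shown are hand-assigned for illustration, and the names `REMOVABLE_TAGS` and `refine` are hypothetical.

```python
from typing import List, Tuple

# Categories treated as unnecessary: adnominal, adverbial, interjection, conjunction.
REMOVABLE_TAGS = {"MM", "MA", "IC", "CONJ"}


def refine(tagged_phrases: List[Tuple[str, str]]) -> str:
    """Drop phrases whose tag is in REMOVABLE_TAGS and rejoin the rest."""
    kept = [text for text, tag in tagged_phrases if tag not in REMOVABLE_TAGS]
    return " ".join(kept)


# Hand-tagged phrases of one main sentence (the tagging is an assumption).
main_sentence = [
    ("일단", "MA"),                            # "for a start"
    ("선명한 OLED 디스플레이를 통한", "MM"),   # "shown on the crisp OLED display"
    ("시계 디자인은", "NN"),                   # "the watch design"
    ("흠잡을 데 없을 만큼", "MA"),             # "impeccably"
    ("훌륭하다.", "VV"),                       # "is excellent."
]
print(refine(main_sentence))
# 시계 디자인은 훌륭하다.
```

Filtering at the phrase level keeps the subject and predicate intact, which is why the refined sentence still reads as a complete statement ("The watch design is excellent.").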

The summary generation unit 530 gathers the sentences refined by the unnecessary phrase removal unit 520 and outputs them as the summary. For example, the summary generation unit 530 may output the following summary, in which conjunctions, adverbials, adnominals, and interjections have been removed from the main sentences.

<Summary> The smartwatch's screen uses a 2-inch curved display. The watch design is excellent. An ordinary rubber wristwatch strap and a bracelet-type strap are provided so that users can choose according to their taste.

The summary reporting unit 600 reports the summary output by the summary writing unit 500 to the user. The summary can be reported in various ways; for example, the summary reporting unit 600 may transmit it to a terminal designated by the user.

FIG. 2 is a flowchart of a text summarizing method according to one embodiment of the present invention.

Referring to FIG. 2, the summarizing apparatus 10 converts the input text into a plurality of RDF triples (S110).

From the plurality of RDF triples, the summarizing apparatus 10 extracts as main triples those whose predicate includes a semantic feature of interest (S120). The semantic feature of interest can be set differently depending on the nature of the input text, the user's interests, and so on. For example, the summarizing apparatus 10 may set sentiment features such as preference and non-preference as the semantic features of interest, and may determine triples whose predicate contains a preference or non-preference expression such as "좋다" (good) or "나쁘다" (bad) to be main triples.

From the plurality of RDF triples, the summarizing apparatus 10 also extracts as main triples those whose predicate includes vocabulary of a designated form (S130). For example, the summarizing apparatus 10 selects as main triples those whose predicate contains "~하", "~지", "~되(돼)", "수 있" ("can"), or "수 없" ("cannot").

The summarizing apparatus 10 extracts from the input text the main sentences corresponding to the main triples (S140). The summarizing apparatus 10 may extract every sentence corresponding to a main triple as a main sentence, but it may instead select main sentences by weight from among those sentences, taking into account the number of sentences, their length, the character count, and the like. In doing so, the summarizing apparatus 10 may assign a higher weight to sentences corresponding to triples that include a semantic feature of interest.
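The description leaves the weighting scheme open. One possible reading, sketched below in Python under stated assumptions, gives sentences backed by a feature-bearing triple a higher weight and then fills a character budget in weight order. The weight values, the `max_chars` budget, and the names `Candidate` and `select_sentences` are all hypothetical choices for this illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    position: int      # index of the sentence in the input text
    text: str
    has_feature: bool  # True if a matching triple has a semantic feature of interest


def select_sentences(
    candidates: List[Candidate],
    max_chars: int,
    feature_weight: float = 2.0,  # assumed weight for feature-bearing sentences
    base_weight: float = 1.0,     # assumed weight for all other candidates
) -> List[str]:
    """Pick sentences by descending weight until the character budget is used up."""

    def weight(c: Candidate) -> float:
        return feature_weight if c.has_feature else base_weight

    # Higher weight first; ties are broken by original position.
    ranked = sorted(candidates, key=lambda c: (-weight(c), c.position))
    chosen: List[Candidate] = []
    used = 0
    for c in ranked:
        if used + len(c.text) <= max_chars:
            chosen.append(c)
            used += len(c.text)
    # Restore the original sentence order for readability.
    return [c.text for c in sorted(chosen, key=lambda c: c.position)]


candidates = [
    Candidate(0, "The screen uses a 2-inch curved display.", False),
    Candidate(2, "The watch design is excellent.", True),
    Candidate(3, "Two kinds of strap are provided.", False),
]
print(select_sentences(candidates, max_chars=75))
```

Restoring the original order after selection is a design choice made here so that the summary follows the flow of the source text.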

The summarizing apparatus 10 removes the designated unnecessary phrases from the main sentences (S150).

The summarizing apparatus 10 writes the summary from the sentences with the unnecessary phrases removed (S160).

The summarizing apparatus 10 reports the summary to the user (S170).

FIG. 3 is a flowchart of an RDF triple generation method according to one embodiment of the present invention.

Referring to FIG. 3, the summarizing apparatus 10 receives an input text composed of a plurality of sentences (S210).

The summarizing apparatus 10 analyzes the input text into morphemes (S220).

The summarizing apparatus 10 generates eojeols from the morphemes of the input text (S230).

The summarizing apparatus 10 analyzes the sentence components of the input text (S240).

The summarizing apparatus 10 converts each sentence into an RDF triple based on the sentence analysis information of the input text (S250). The RDF triples are stored in the RDF triple repository 300.

In this way, the summarizing apparatus 10 can grasp the meaning of text on the user's behalf and summarize the text by extracting its semantically important sentences. Because the summarizing apparatus 10 reads a vast amount of text in the user's place, extracts the semantically important sentences, and summarizes and reports them, the user can identify important text quickly and easily without reading and sorting through all of it. The user is also less likely to overlook text that is interesting or important.

The embodiments of the present invention described above are not implemented only through the apparatus and method; they may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them; various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims

An apparatus for summarizing text based on the Resource Description Framework (RDF), the apparatus comprising:
an RDF triple generation unit that converts an input text composed of a plurality of sentences into a plurality of RDF triples;
a main triple determination unit that extracts at least one main triple from the plurality of RDF triples based on at least one of a feature and a vocabulary item included in the predicate; and
a summary writing unit that writes a summary by extracting, from the input text, the sentences corresponding to the at least one main triple.

The text summarizing apparatus of claim 1,
wherein the main triple determination unit comprises
a first triple extraction unit that extracts, as the main triple, a triple from the plurality of RDF triples whose predicate includes a designated semantic feature.

The text summarizing apparatus of claim 2,
wherein the designated semantic feature is either a preference feature or a non-preference feature.

The text summarizing apparatus of claim 2,
wherein the main triple determination unit further comprises
a second triple extraction unit that extracts, as a main triple, a triple from the plurality of RDF triples whose predicate includes vocabulary of a designated form.

The text summarizing apparatus of claim 4,
wherein the vocabulary of the designated form
includes at least one of "~하", "~지", "~되", "~돼", "수 있", and "수 없".

The text summarizing apparatus of claim 1,
wherein the summary writing unit comprises:
a main sentence extraction unit that extracts, from the input text, at least one main sentence corresponding to the at least one main triple;
an unnecessary phrase removal unit that generates a refined sentence from each of the at least one main sentence by removing designated phrases; and
a summary generation unit that gathers the refined sentences generated by the unnecessary phrase removal unit into a summary.

The text summarizing apparatus of claim 6,
wherein the designated phrase is a phrase corresponding to any one of a conjunction, an adverbial, an adnominal, and an interjection.

The text summarizing apparatus of claim 1, further comprising
a summary reporting unit that reports the summary written by the summary writing unit to a user.

A method by which an apparatus summarizes text based on the Resource Description Framework (RDF), the method comprising:
converting an input text composed of a plurality of sentences into a plurality of RDF triples;
extracting, as a main triple, a triple from the plurality of RDF triples whose predicate includes at least one of a designated semantic feature and vocabulary of a designated form; and
writing a summary by extracting, from the input text, the sentences corresponding to the main triple.

The text summarizing method of claim 9,
wherein the extracting of the main triple
extracts, as the main triple, a triple from the plurality of RDF triples whose predicate includes either a preference feature or a non-preference feature.

The text summarizing method of claim 9,
wherein the extracting of the main triple
extracts, as the main triple, a triple from the plurality of RDF triples whose predicate includes any one of "~하", "~지", "~되", "~돼", "수 있", and "수 없".

The text summarizing method of claim 9,
wherein the writing of the summary comprises:
extracting, from the input text, at least one main sentence corresponding to each main triple;
removing designated phrases from the at least one main sentence; and
gathering the at least one sentence from which the designated phrases have been removed into the summary.

The text summarizing method of claim 12,
wherein the designated phrase is a phrase corresponding to any one of a conjunction, an adverbial, an adnominal, and an interjection.

The text summarizing method of claim 9, further comprising
reporting the summary to a user.