KR102365308B1

KR102365308B1 - Method and device for predicting risk in corporate internal documents using artificial neural networks

Info

Publication number: KR102365308B1
Application number: KR1020210107288A
Authority: KR
Inventors: 이송자
Original assignee: 주식회사 데이터아이
Priority date: 2021-08-13
Filing date: 2021-08-13
Publication date: 2022-02-23

Abstract

The present invention relates to a method and a device for predicting the risk of a corporate internal document using an artificial neural network. The device includes at least one processor and a memory storing instructions that cause the at least one processor to perform at least one operation. The at least one operation includes: receiving article information from an external server; generating keyword impact information for a risk detection keyword based on the article information; receiving a corporate internal document and company information from a corporate terminal; inputting the corporate internal document, the company information, and the keyword impact information into an artificial neural network trained in advance by supervised learning; and determining a risk score of the corporate internal document based on an output value of the artificial neural network. The risk score quantifies the risk, that is, the negative impact on the company if the corporate internal document is leaked.

Description

Method and device for predicting the risk of corporate internal documents using artificial neural networks {METHOD AND DEVICE FOR PREDICTING RISK IN CORPORATE INTERNAL DOCUMENTS USING ARTIFICIAL NEURAL NETWORKS}

The present invention relates to a technology for predicting the risk of a corporate internal document, and more particularly, to a method and apparatus for predicting the risk of a corporate internal document using an artificial neural network.

Unless otherwise indicated herein, the material described in this section is not prior art to the claims of this application, and inclusion in this section is not an admission that it is prior art.

A very large storage space is required to accumulate and store, every day, the various types of documents produced by companies. Such large volumes of stored documents can generally be searched only by document name or by a few keywords assigned to each document.

However, a document name or a few keywords make it difficult to determine exactly what a document contains. Moreover, corporate internal documents must be managed thoroughly, given the negative impact on the company if they are exposed externally.

Accordingly, there is a need to calculate in advance the negative impact (risk) a company would suffer if a corporate internal document were exposed externally, and to classify large volumes of documents according to that risk. However, having people review many documents directly incurs high costs in both time and labor.

Korean Patent Registration No. 10-2008707 (publication date: 2019.08.02.)

An object of the present invention, which addresses the above problems, is to provide a method and an apparatus for predicting the risk of a corporate internal document using an artificial neural network.

One aspect of the present invention for achieving the above object provides an apparatus for predicting the risk of a corporate internal document using an artificial neural network.

The apparatus for predicting the risk of a corporate internal document using an artificial neural network includes: at least one processor; and a memory storing instructions that direct the at least one processor to perform at least one operation.

The at least one operation includes: receiving article information from an external server; generating keyword impact information for a risk detection keyword based on the article information; receiving a corporate internal document and company information from a corporate terminal; inputting the corporate internal document, the company information, and the keyword impact information into an artificial neural network trained in advance by supervised learning; and determining a risk score of the corporate internal document based on an output value of the artificial neural network.

The risk score is a score that quantifies the risk, that is, the negative impact on the company if the corporate internal document is leaked.

The at least one operation may further include generating distribution information for the risk detection keyword using a language preprocessor.

The distribution information may include whether the risk detection keyword appears in the corporate internal document and the number of times the risk detection keyword appears in the corporate internal document.

The generating of the keyword impact information may include calculating the impact of the risk detection keyword based on the article information, and generating the keyword impact information including the calculated impact.

The impact may be determined according to the tier of the press outlet that published an article, the type of medium through which the article was released, and the manner in which the risk detection keyword is used in the article.

The artificial neural network may be trained by supervised learning using training data generated in advance.

The training data may include: a training input composed of a document risk vector generated based on the risk detection keyword and the company information, and an impact vector generated based on the keyword impact information; and a training output composed of a target risk vector generated based on the distribution information, the company information, and the keyword impact information.

The artificial neural network may be trained by supervised learning, based on a predefined loss function, so as to minimize the difference between the target risk vector and the output vector obtained when the training input is fed to the network.
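The training objective described above can be illustrated with a minimal sketch. It is not the network of the invention: a single linear layer stands in for the artificial neural network, squared error is an assumed choice of the "predefined loss function", and the names `train_linear`, `predict`, and the toy vectors are hypothetical. Each training input is taken to be the document risk vector concatenated with the impact vector.

```python
def predict(weights, x):
    """Output vector of a single linear layer (stand-in for the network)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def train_linear(inputs, targets, lr=0.1, epochs=500):
    """Stochastic gradient descent on squared error (assumed loss function)."""
    n_in, n_out = len(inputs[0]), len(targets[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, target in zip(inputs, targets):
            output = predict(weights, x)
            for j, row in enumerate(weights):
                error = output[j] - target[j]
                for i in range(n_in):
                    # Gradient of (output_j - target_j)^2 w.r.t. row[i]
                    row[i] -= lr * 2 * error * x[i]
    return weights


# Toy data: each input is [document-risk component, impact component];
# each target is a one-element target risk vector. Values are made up.
train_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [[0.2], [0.6], [0.8]]

weights = train_linear(train_x, train_y)
fitted = predict(weights, [1.0, 1.0])
```

After training, `fitted` should be close to the target `[0.8]`, which is the sense in which the difference between the output vector and the target risk vector is minimized.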

By providing companies (and/or users) with an online service (and/or online platform) that supports risk management, the present invention allows a company (and/or user) to identify the risk of its corporate internal documents.

In addition, because the present invention quantitatively calculates the risk of corporate internal documents and provides it to the user, the company (and/or user) can identify which corporate internal documents carry risk.

In addition, because the present invention uses the artificial neural network 10 to classify large volumes of documents according to risk, that is, the negative impact a company would suffer if a corporate internal document were exposed externally, it reduces the cost of reviewing many documents while still clearly identifying high-risk documents.

The effects obtainable from the embodiments are not limited to those mentioned above, and other effects not mentioned can be clearly derived and understood by those of ordinary skill in the art from the detailed description below.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are included as part of the detailed description to aid understanding of the embodiments, provide various embodiments and, together with the detailed description, explain technical features of the various embodiments.
FIG. 1 is a schematic diagram of a method and apparatus for predicting the risk of a corporate internal document using artificial intelligence according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the functional modules of the risk prediction server of FIG. 1.
FIG. 3 shows example scores assigned, according to predetermined tables, to the press tier score (PT), the exposure method score (EM), and the mention method score (MM) used to calculate the impact of a specific keyword for one article.
FIG. 4 is a conceptual diagram explaining the structure and operation of the artificial neural network used in the risk prediction server of FIG. 1.
FIG. 5 is a conceptual diagram explaining the document risk vector (Y_D) and the impact vector (Y_RE) of the risk detection keywords.
FIG. 6 is a diagram showing an example hardware configuration of a risk prediction server according to an embodiment of the present invention.

Since the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing each figure, like reference numerals are used for like elements.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

When an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to that other element, but it should be understood that intervening elements may also be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

In this specification, risk refers to the negative impact on a company if a corporate internal document is leaked.

FIG. 1 is a schematic diagram of a method and apparatus for predicting the risk of a corporate internal document using artificial intelligence according to an embodiment of the present invention.

Referring to FIG. 1, the method of predicting the risk of a corporate internal document using an artificial neural network may be performed by a risk prediction server 100 (which may be referred to interchangeably as the apparatus for predicting the risk of a corporate internal document using an artificial neural network) together with a corporate terminal 200, an external server 300, and a user terminal 400. The risk prediction server 100, the corporate terminal 200, the external server 300, and the user terminal 400 may collectively be referred to as a risk prediction system 10.

The corporate terminal 200 is a terminal of a company that wishes to predict the risk of its corporate internal documents through the risk prediction server 100, and may transmit corporate internal documents and company information to the risk prediction server 100. The corporate internal documents are electronic documents, and their types may include doc, pdf, hwp, ppt, txt, and the like.

For example, the company information is information about the company whose corporate internal documents are to be assessed for risk, and may include the company's fiscal year, sales, net profit, number of employees, and the like.

The user terminal 400 may receive the risk prediction result for a corporate internal document from the risk prediction server 100. The user terminal 400 and the corporate terminal 200 may be the same terminal. For example, the risk prediction result may include: a risk score, which quantifies the risk of a corporate internal document; information on which type of risk has the highest score for a specific corporate internal document; a list of multiple corporate internal documents sorted in ascending or descending order of risk score; and, where risk grades are assigned to corporate internal documents according to their risk scores, per-grade lists that gather the corporate internal documents of each risk grade. The types of risk may include owner risk, manufacturing risk, distribution risk, asset risk, service risk, construction risk, international tax risk, and the like.
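The sorted list and the per-grade lists mentioned above can be sketched as follows. The grade labels, the cutoff values, and the function names `grade` and `build_report` are hypothetical; the specification does not state how grades are derived from scores, so the thresholds here are placeholders only.

```python
def grade(score, cutoffs=((80, "high"), (50, "medium"))):
    """Map a risk score to a grade using assumed (hypothetical) cutoffs."""
    for minimum, label in cutoffs:
        if score >= minimum:
            return label
    return "low"


def build_report(scores):
    """Return documents sorted by descending score, plus per-grade lists."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    by_grade = {}
    for name, score in ranked:
        by_grade.setdefault(grade(score), []).append(name)
    return ranked, by_grade


# Made-up risk scores for three documents.
scores = {"a.pdf": 91, "b.hwp": 42, "c.doc": 67}
ranked, by_grade = build_report(scores)
```

Here `ranked` lists the documents from highest to lowest score, and `by_grade` groups file names under each grade; reversing the sort order gives the ascending list.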

Examples of the corporate terminal 200 and the user terminal 400 include communication-capable devices such as a desktop computer, a laptop computer, a notebook, a smartphone, a tablet PC, a mobile phone, a smart watch, smart glasses, an e-book reader, a portable multimedia player (PMP), a portable game console, a navigation device, a digital camera, a digital multimedia broadcasting (DMB) player, a digital audio recorder, a digital audio player, a digital video recorder, a digital video player, and a personal digital assistant (PDA).

The external server 300 may transmit article information to the risk prediction server 100. For example, the article information may include the body and title of articles published by one or more press outlets, whether each article was published online or offline, and the like.

The risk prediction server 100 may receive corporate internal documents and company information from the corporate terminal 200 and article information from the external server 300, predict the risk of the corporate internal documents using an artificial neural network, and transmit the resulting risk prediction result to the user terminal 400.

The risk prediction server 100 may correspond to a device that processes corporate internal documents (approval, sharing, modification, storage, transmission, and the like), such as an electronic document management system (EDMS), groupware (GW), a business process management (BPM) system, an enterprise resource planning (ERP) system, or an e-mail system. The risk prediction server 100 may be implemented as a central server, a management server, a cloud server, a web server, a client server, or the like.

The risk prediction server 100 may collect corporate internal documents and company information.

For example, the risk prediction server 100 may access the database of the corporate terminal 200 to collect corporate internal documents, or may receive corporate internal documents transmitted from the corporate terminal 200. Likewise, the risk prediction server 100 may access the database of the corporate terminal 200 to collect company information, may receive company information transmitted from the corporate terminal 200, or may receive company information entered by an administrator.

The risk prediction server 100 may store the artificial neural network 10 in internal storage and run it directly. Alternatively, it may communicate with an independent server that runs the artificial neural network 10, inputting corporate internal documents and company information to the artificial neural network 10 and receiving the risk prediction result as its output.

The risk prediction server 100 may generate training data using corporate internal documents, company information, and article information, and may train the artificial neural network 10 in advance by supervised learning using the generated training data.

The risk prediction server 100 may transmit the risk prediction result obtained as the output of the artificial neural network 10 to the user terminal 400.

The risk prediction server 100 according to an embodiment of the present invention trains the artificial neural network 10 in advance by supervised learning using corporate internal documents, company information, and article information, and then inputs the corporate internal documents to be assessed, together with company information, into the trained artificial neural network 10 to obtain a risk prediction result. In this way, risk prediction results can easily be obtained for the large volume of documents stored in a company's database.

Accordingly, the artificial neural network 10 makes it possible to classify large volumes of documents according to risk, that is, the negative impact a company would suffer if a corporate internal document were exposed externally, which reduces the cost of reviewing many documents while still clearly identifying high-risk documents.

FIG. 2 is a block diagram showing an example of the functional modules of the risk prediction server of FIG. 1.

Referring to FIG. 2, the risk prediction server 100 may include a corporate internal document preprocessing unit 101, a keyword impact extraction unit 102, a training data generation unit 103, a risk prediction model training unit 104, and a risk prediction model 105.

The corporate internal document preprocessing unit 101 may apply a language preprocessor, such as a morphological analyzer or an index term extractor, to the corporate internal documents obtained from the corporate terminal 200 in order to filter out meaningless text and to generate distribution information for the keywords contained in the documents. For example, the keyword distribution information may include whether a specific keyword appears in a corporate internal document and the number of times it appears.

In this regard, the corporate internal document preprocessing unit 101 may identify the title and/or body of at least one collected corporate internal document and extract the keyword counts based on the identified title and/or body.

In addition, the corporate internal document preprocessing unit 101 may use the language preprocessor to generate distribution information for risk detection keywords from a corporate internal document. Here, a risk detection keyword is a keyword used for detection when calculating the risk score, which quantifies the risk. Risk detection keywords may be set in advance by an administrator. In the following description, the total number of risk detection keywords is assumed to be N.

Meanwhile, the risks to be detected may be of various types, such as owner risk, manufacturing risk, distribution risk, asset risk, service risk, construction risk, and international tax risk. The types of risk detection keywords may vary correspondingly.

Specifically, the types of risk detection keywords may include owner risk detection keywords, manufacturing risk detection keywords, distribution risk detection keywords, asset risk detection keywords, service risk detection keywords, construction risk detection keywords, international tax risk detection keywords, and the like. For example, owner risk detection keywords may include inheritance, gift, management rights, bad news, artwork, antiques, and VIP; manufacturing risk detection keywords may include returns, storage, unrealized inventory, labor cost, and advance sales; distribution risk detection keywords may include swapping, internal transactions, unit price, and dumping; asset risk detection keywords may include off-balance-sheet assets, physical inventory count, disposal loss, shrinkage loss, unused assets, and loaned assets; service risk detection keywords may include employee discount, age analysis, overseas advertising, brand, and advertising; construction risk detection keywords may include sale, housing lease, and agency; and international tax risk detection keywords may include report, consulting, litigation, inspection, fraud, embezzlement, and unfair practices.

In this regard, the corporate internal document preprocessing unit 101 may use the language preprocessor to generate distribution information for the risk detection keywords in a corporate internal document. For example, the distribution information for a risk detection keyword may include whether the keyword appears in the corporate internal document and the number of times it appears. Hereinafter, "distribution information" refers to this distribution information for the risk detection keywords.
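As a rough illustration of the distribution information, the sketch below records presence and count for each risk detection keyword. It makes several simplifying assumptions: a regular-expression word split stands in for the morphological analyzer or index term extractor, only single-word keywords are matched, and the function name `keyword_distribution` and the sample text are hypothetical.

```python
import re
from collections import Counter


def keyword_distribution(document_text, risk_keywords):
    """Return, for each keyword, whether it occurs and how many times."""
    # Simplified tokenization; a real system would use a language preprocessor.
    tokens = Counter(re.findall(r"\w+", document_text.lower()))
    return {
        keyword: {
            "present": tokens[keyword.lower()] > 0,
            "count": tokens[keyword.lower()],
        }
        for keyword in risk_keywords
    }


# Made-up document text and a small subset of example keywords.
doc = "Dumping below unit price; the dumping plan covers embezzlement."
dist = keyword_distribution(doc, ["dumping", "embezzlement", "inheritance"])
```

With N risk detection keywords, the result has N entries, one per keyword, which matches the two items the distribution information is said to contain.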

The keyword impact extraction unit 102 may calculate the impact (ripple effect) of a risk detection keyword based on the article information obtained from the external server 300. For example, the article information may include the body and title of articles published by one or more press outlets, whether each article was published online or offline, and the like.

Here, the impact is a value that quantifies the media attention given to an article when a specific keyword appears in that article. The impact may be calculated per article. For multiple articles, the impact of a specific keyword is first calculated for each article, and the impact over the multiple articles is then calculated from those per-article values. For example, the impact over multiple articles may be calculated as the average of the per-article impact values of the keyword.

When the ripple effect of a risk detection keyword is calculated, it is assumed that the number of articles to be used is set in advance, and that the average of the keyword's ripple effects over that preset number of articles is used.

Meanwhile, the ripple effect of a specific keyword for a single article may be defined as in Equation 1 below.

In Equation 1, RE_n is the ripple effect for the n-th keyword (n is a natural number of 1 or more and no greater than the total number N of keywords), PT is the press tier score of the media company that published the article, EM is the exposure method score, and MM is the mention method score. Regarding the press tier score (PT), press tiers may be distinguished according to the value of (the media company's offline circulation) × (the online traffic to the media company's website): the larger this value, the higher the tier of the media company that published the article, and the higher the corresponding score (PT). The exposure method score (EM) is determined by the type of media in which the article containing the keyword was published. For example, it may depend on whether the article was exposed only online or both online and offline, and it is set higher when the article was published both online and offline than when it was published only online. The mention method score (MM) is determined by how the keyword is used in the article. It may depend on whether the keyword appears in the article's title, and it is set higher when the keyword is in the title than when it is not.

FIG. 3 shows an example of the scores assigned, according to a predetermined table, to the press tier score (PT), the exposure method score (EM), and the mention method score (MM) used to calculate the ripple effect of a specific keyword for a single article. Specifically, the higher the press tier, the higher the assigned press tier score (PT); the exposure method score (EM) is higher when the article was published both online and offline than when it was published only online; and the mention method score (MM) is higher when the keyword appears in the article's title than when it does not.
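
The per-article scoring and the averaging over a preset set of articles can be sketched in Python as follows. Two points are assumptions, not statements of the embodiment: the body of Equation 1 is not reproduced in this text, so the sketch combines PT, EM, and MM as a simple product, and the score tables are hypothetical stand-ins for the values of FIG. 3. They only respect the stated ordering (higher tier, combined online and offline exposure, and a title mention each score higher).

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

# Hypothetical score tables; the actual values of FIG. 3 are not given here.
PRESS_TIER_SCORE = {1: 3.0, 2: 2.0, 3: 1.0}  # tier 1 is the highest tier
EXPOSURE_SCORE = {"online_and_offline": 2.0, "online_only": 1.0}
MENTION_SCORE = {"in_title": 2.0, "body_only": 1.0}


@dataclass
class Article:
    press_tier: int
    exposure: str
    mention: str


def article_ripple_effect(article: Article) -> float:
    """Ripple effect of one keyword for one article.

    ASSUMPTION: PT, EM and MM are multiplied; Equation 1 may combine
    them differently.
    """
    return (PRESS_TIER_SCORE[article.press_tier]
            * EXPOSURE_SCORE[article.exposure]
            * MENTION_SCORE[article.mention])


def keyword_ripple_effect(articles: List[Article]) -> float:
    """Average the per-article ripple effects over the preset articles."""
    return mean(article_ripple_effect(a) for a in articles)


articles = [
    Article(press_tier=1, exposure="online_and_offline", mention="in_title"),
    Article(press_tier=3, exposure="online_only", mention="body_only"),
]
re_n = keyword_ripple_effect(articles)
```

Keeping the three score tables separate from the combining function makes it straightforward to swap in the real table values or a different form of Equation 1.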

Meanwhile, the risk detection keywords and their number (N) may be set in advance by an administrator. The keyword impact extraction unit 102 may calculate the ripple effect of each of the N risk detection keywords based on the article information obtained from the external server 300, and may generate keyword impact information that includes the calculated ripple effect of each risk detection keyword. The keyword impact extraction unit 102 may also periodically update the keyword impact information based on article information obtained from the external server 300.

The learning data generation unit 103 may generate training data for training the artificial neural network 10 using the distribution information, the company information, and the keyword impact information.

In one embodiment, the learning data generation unit 103 may generate training data for supervised learning of the artificial neural network 10, such that, given an input vector obtained from the distribution information of a corporate internal document received from the corporate terminal and from the company information, the network outputs a risk score reflecting the keyword impact information.

Here, the risk score is a score that quantifies, taking the company information into account, the negative impact on the company if the corporate internal document is leaked. That is, even for corporate internal documents with identical content, the risk score may differ from company to company.

Supervised learning refers to learning that uses data containing input values and their corresponding output values as training data in order to find the output for a given input; that is, learning performed when the correct answer is known. An artificial neural network is a predictive model, implemented in software or hardware, that mimics the computational capability of a biological system using a large number of artificial neurons (or nodes).

The learning data generation unit 103 may generate training data in which the distribution information of the corporate internal document, the company information, and the keyword impact information form the training input, and a target risk computed from the same distribution information, company information, and keyword impact information forms the training output.

Specifically, to generate the training input, the learning data generation unit 103 may convert the distribution information of the corporate internal document and the company information into a document risk vector (Y_D) based on Equation 2 below.

In Equation 2, ND_n is the number of times the n-th detection keyword (n is a natural number from 1 to N) among the N risk detection keywords (N is a natural number of 1 or more) appears in the document, and w_n is a weight for the n-th detection keyword that reflects the company information. The distribution information and the company information may be converted so that the document risk vector (Y_D) has the same number of components as the number of detection keywords (N).

Because the document risk vector (Y_D) reflects both the counts of the detection keywords and the per-keyword weights derived from the company information, it can represent the importance or influence that the internal document itself has for the company.

In addition, to generate the training input, the learning data generation unit 103 may convert the keyword impact information into a ripple effect vector (Y_RE) based on Equation 3 below.

In Equation 3, RE_n is the ripple effect of the n-th detection keyword among the N risk detection keywords.

Because the ripple effect vector (Y_RE) has the ripple effect of each keyword as its components, it can represent the extent to which an internal document propagates once it is leaked.

The document risk vector (Y_D) and the ripple effect vector (Y_RE) may together constitute the training input.
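
The construction of the two training input vectors can be sketched as follows. The component definitions follow the description (the n-th component of Y_D is ND_n multiplied by w_n, and the n-th component of Y_RE is RE_n); the function names and the sample numbers are hypothetical.

```python
from typing import List


def document_risk_vector(counts: List[int], weights: List[float]) -> List[float]:
    """Y_D: n-th component is ND_n * w_n (keyword count times company weight)."""
    if len(counts) != len(weights):
        raise ValueError("counts and weights must both have N components")
    return [nd * w for nd, w in zip(counts, weights)]


def ripple_effect_vector(ripple_effects: List[float]) -> List[float]:
    """Y_RE: n-th component is the ripple effect RE_n of the n-th keyword."""
    return list(ripple_effects)


# Hypothetical values for N = 3 detection keywords.
counts = [2, 0, 1]          # ND_n: occurrences in the internal document
weights = [1.5, 1.0, 0.5]   # w_n: weights derived from company information
y_d = document_risk_vector(counts, weights)
y_re = ripple_effect_vector([6.5, 2.0, 4.0])
```

Both vectors have exactly N components, which matches the number of input nodes described for the input layer.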

In addition, the learning data generation unit 103 may generate a target risk vector (Y) by performing a computation according to the predefined Equation 4 below, based on the distribution information of the corporate internal document, the company information, and the keyword impact information. Specifically, the learning data generation unit 103 may generate the target risk vector (Y) using the document risk vector (Y_D) and the ripple effect vector (Y_RE).

Specifically, the learning data generation unit 103 may transpose the document risk vector (Y_D) to obtain a transposed document risk vector (Y_D^T), and may generate a document risk matrix (MATY_D), a square matrix of size N×N that has the transposed document risk vector (Y_D^T) as a row, as in Equation 4 below.

Next, the learning data generation unit 103 may multiply each row of the document risk matrix (MATY_D) by the component of the ripple effect vector (Y_RE) located in the corresponding row, to generate a document ripple effect matrix (MATY_DRE), a square matrix of size N×N, as in Equation 5 below.

Next, the learning data generation unit 103 may calculate an eigenvector of the document ripple effect matrix (MATY_DRE) according to Equation 5 and use the calculated eigenvector as the target risk vector (Y). Because the process of calculating an eigenvector is well known to those skilled in the art, a detailed description is omitted.
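
The three steps above (document risk matrix, document ripple effect matrix, eigenvector) can be sketched in pure Python. The sketch rests on assumptions: Equations 4 and 5 are not reproduced in this text, so it assumes that every row of MATY_D equals Y_D^T, and it uses power iteration as one common way to obtain a dominant eigenvector. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def document_risk_matrix(y_d: List[float]) -> Matrix:
    # ASSUMPTION: each of the N rows is a copy of the transposed vector Y_D^T.
    return [list(y_d) for _ in range(len(y_d))]


def document_ripple_matrix(mat_y_d: Matrix, y_re: List[float]) -> Matrix:
    # Row i of MATY_D is scaled by the i-th component of Y_RE.
    return [[y_re[i] * value for value in row] for i, row in enumerate(mat_y_d)]


def dominant_eigenvector(m: Matrix, iterations: int = 50) -> List[float]:
    """Power iteration; returns a unit-length dominant eigenvector estimate."""
    n = len(m)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("iteration collapsed to the zero vector")
        v = [x / norm for x in w]
    return v


# Hypothetical vectors for N = 3.
y_d = [3.0, 1.0, 0.5]
y_re = [6.5, 2.0, 4.0]
mat_dre = document_ripple_matrix(document_risk_matrix(y_d), y_re)
target = dominant_eigenvector(mat_dre)
```

Under this reading, MATY_DRE is the outer product of Y_RE and Y_D, so its eigenvector with a non-zero eigenvalue is proportional to Y_RE, and the eigenvalue equals the inner product of Y_D and Y_RE. A different form of Equation 4 would change this result.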

The target risk vector (Y) generated here may serve as the training output.

The risk prediction model learning unit 104 may train the risk prediction model 105 using the training data composed of the generated training input and training output.

That is, because the ripple effect of a leaked internal document and the resulting corporate risk are difficult to collect experimentally or empirically, the learning data generation unit 103 according to an embodiment of the present invention constructs the training output through the computation described above and uses it as training data to train the risk prediction model 105 so that the model can be put into operation. Thereafter, the prediction results obtained from actual use of the risk prediction in the field are fine-tuned, and additional machine learning is performed by readjusting the weights through the error backpropagation algorithm.

In one embodiment, the risk prediction model learning unit 104 may perform supervised learning on the risk prediction model 105 so that a risk score is output when the distribution information of the corporate internal document, the company information, and the keyword impact information are input to the risk prediction model 105.

The risk prediction model 105 may be trained by the risk prediction model learning unit 104, through supervised learning, to receive the distribution information of the corporate internal document, the company information, and the keyword impact information and to output a risk score. An artificial neural network may be used as the risk prediction model 105; an artificial neural network is a predictive model, implemented in software or hardware, that mimics the computational capability of a biological system using a large number of artificial neurons (or nodes).

The operations of the learning data generation unit 103, the risk prediction model learning unit 104, and the risk prediction model 105 are described in detail with reference to FIG. 4.

FIG. 4 is a conceptual diagram illustrating the structure and operation of the artificial neural network used in the risk prediction server according to FIG. 1.

Referring to FIG. 4, the artificial neural network 10 may include an input layer 11, a hidden layer 12, and an output layer 13.

In an embodiment of the present invention, the artificial neural network 10 is trained by supervised learning to receive the training input of the training data and to output, from the hidden layer 12, the target risk vector (Y) that is the training output of the training data.

The input layer 11 may consist of as many input nodes (N) as the number of components (N) of each of the document risk vector (Y_D) and the ripple effect vector (Y_RE). For example, when each of the document risk vector (Y_D) and the ripple effect vector (Y_RE) has N components, the input layer 11 may consist of N input nodes.

For each of the document risk vector (Y_D) and the ripple effect vector (Y_RE), the input layer 11 may apply one or more connection strength values corresponding to the input nodes and pass the result to the hidden layer 12.

For example, the one or more connection strength values corresponding to the input nodes may be expressed as a first connection strength matrix (W_N×V) of size N×V. Here, N equals the number of input nodes, and V is advantageously set much smaller than N so that the very high-dimensional vector (where the dimension equals the number of components) can be projected onto a meaningful lower dimension. For example, V may be 1/10 of N. The first connection strength matrix (W_N×V) may be set to arbitrary initial values and then continuously updated through supervised learning.

In summary, the input layer 11 may pass to the hidden layer 12 a first intermediate vector (X1), obtained by matrix-multiplying the received document risk vector (Y_D) by the first connection strength matrix (W_N×V), and a second intermediate vector (X2), obtained by matrix-multiplying the received ripple effect vector (Y_RE) by the first connection strength matrix (W_N×V).

The hidden layer 12 may combine the first intermediate vector (X1) and the second intermediate vector (X2) received from the input layer 11 into an intermediate output vector (X`), apply one or more connection strengths corresponding to the hidden nodes to generate an output vector (Y`), and pass the output vector (Y`) to the output layer 13. The way the first intermediate vector (X1) and the second intermediate vector (X2) are combined is determined by a predefined reference formula of the hidden layer 12; various such formulas are known in the art, and a person skilled in the art may select among them. The one or more connection strength values corresponding to the hidden nodes may be expressed as a second connection strength matrix (U_V×N) of size V×N. That is, the second connection strength matrix (U_V×N) restores the intermediate output vector (X`), which was projected onto V dimensions, back to N dimensions.

Meanwhile, the second connection strength matrix (U_V×N) may be set to arbitrary initial values and then continuously updated so that the output vector (Y`), generated by matrix-multiplying the intermediate output vector (X`) by the second connection strength matrix (U_V×N), approaches the target risk vector (Y) that is the training output. That is, the second connection strength matrix (U_V×N) may be updated as supervised learning on the training data continues.
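
The forward pass through the input and hidden layers can be sketched as follows. The description leaves the combining formula for X1 and X2 open, so the element-wise sum used here is an assumption, as are the function names and the small fixed matrices (N = 4, V = 2) chosen so the arithmetic is easy to follow. In practice W and U would start from arbitrary values and be learned.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(v: Vector, m: Matrix) -> Vector:
    """Multiply a row vector (length R) by an R x C matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def forward(y_d: Vector, y_re: Vector, w: Matrix, u: Matrix) -> Vector:
    x1 = vec_mat(y_d, w)    # first intermediate vector X1 (V components)
    x2 = vec_mat(y_re, w)   # second intermediate vector X2 (V components)
    # ASSUMPTION: X1 and X2 are combined by element-wise addition.
    x_mid = [a + b for a, b in zip(x1, x2)]
    return vec_mat(x_mid, u)  # output vector Y` restored to N components


# Hypothetical connection strength matrices: W is N x V, U is V x N.
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
u = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
y_out = forward([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5], w, u)
```

Sharing W between both inputs mirrors the description, in which the same first connection strength matrix is applied to Y_D and to Y_RE.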

The output layer 13 may apply an activation function to the output vector (Y`) received from the hidden layer 12 and thereby output a probability (p) corresponding to the output vector (Y`). The activation function scales values of arbitrary range into values between 0 and 1, in effect converting them into probabilities. For example, the activation function may be a ReLU function or a softmax function, but is not limited thereto.
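
As an illustration of the output layer, the following sketch applies a softmax function, one of the activation functions named above, to an output vector. The subtraction of the maximum before exponentiation is a standard numerical-stability step added here; it is not taken from the description, and the input values are hypothetical.

```python
import math
from typing import List


def softmax(values: List[float]) -> List[float]:
    """Map arbitrary real values to probabilities in (0, 1) that sum to 1."""
    peak = max(values)  # shift by the maximum to avoid overflow in exp()
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output vector Y` received from the hidden layer.
p = softmax([5.0, 7.0, 2.5, 3.5])
```

The largest component of the input always receives the largest probability, so the ranking of components is preserved.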

The risk prediction server 100 may determine a risk score reflecting the keyword impact information based on the probability (p) output from the output layer 13.

Meanwhile, as described above, the artificial neural network 10 may be trained by supervised learning in which it receives the training input of the training data (the distribution information on detection keywords in a corporate internal document of a given company, the company information, and the keyword impact information) in vector form, and the first connection strength matrix (W_N×V) and the second connection strength matrix (U_V×N) are continuously updated so that the difference between the resulting output vector (Y`) and the target risk vector (Y) is minimized.

More specifically, the artificial neural network 10 may substitute the output vector (Y`) and the target risk vector (Y) into a loss function and continuously update the first connection strength matrix (W_N×V) and the second connection strength matrix (U_V×N) so that the value of the loss function is minimized.

For example, the loss function may be a cross-entropy function. The cross entropy H(Y, Y`) between the output vector (Y`) and the target risk vector (Y) may be defined as in Equation 6 below.

In Equation 6, Y_m is the m-th component (m is a natural number of 1 or more) of the target risk vector (Y), and Y`_m is the m-th component of the output vector (Y`).
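
The body of Equation 6 is not reproduced in this text. For orientation only, the standard form of the cross entropy, written with the symbols defined above, is shown below; the summation range 1 to N and the natural logarithm are assumptions, and Y'_m denotes the component written Y`_m in the description.

```latex
H(Y, Y') = -\sum_{m=1}^{N} Y_m \, \log Y'_m
```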

Meanwhile, in an embodiment of the present invention, the loss function (LF) may instead be defined as in Equation 7 below to improve accuracy.

Referring to Equation 7, the loss function (LF) may be defined as the larger of two values: the inner product of the target risk vector (Y) and the output vector (Y`) divided by the norm of the target risk vector (Y) and the norm of the output vector (Y`), and the value of the cross-entropy function (H).
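
The loss described for Equation 7 can be sketched in Python as follows. The sketch reads "divided by the norm of Y and the norm of Y`" as division by the product of the two Euclidean norms, and it uses the standard cross entropy with a natural logarithm; both readings, the small `eps` guard against log(0), and the function names are assumptions.

```python
import math
from typing import List


def cross_entropy(y: List[float], y_out: List[float], eps: float = 1e-12) -> float:
    """H(Y, Y`) = -sum_m Y_m * log(Y`_m); eps guards against log(0)."""
    return -sum(t * math.log(max(o, eps)) for t, o in zip(y, y_out))


def normalized_inner_product(y: List[float], y_out: List[float]) -> float:
    """Inner product of Y and Y` divided by the product of their norms."""
    dot = sum(t * o for t, o in zip(y, y_out))
    norms = math.sqrt(sum(t * t for t in y)) * math.sqrt(sum(o * o for o in y_out))
    if norms == 0.0:
        raise ValueError("both vectors must be non-zero")
    return dot / norms


def loss(y: List[float], y_out: List[float]) -> float:
    """LF: the larger of the normalized inner product and the cross entropy."""
    return max(normalized_inner_product(y, y_out), cross_entropy(y, y_out))


# Hypothetical target and output vectors.
y_target = [0.0, 1.0, 0.0]
y_output = [0.1, 0.8, 0.1]
lf = loss(y_target, y_output)
```

With these sample vectors the normalized inner product is the larger of the two terms, so it determines the value of LF.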

FIG. 5 is a conceptual diagram illustrating the document risk vector (Y_D) and the ripple effect vector (Y_RE) of the risk detection keywords.

Referring to FIG. 5, a total of N detection keywords are set in advance, and the figure shows the ripple effect assigned to each detection keyword and the number of times each detection keyword appears in a specific corporate internal document.

In this regard, the document risk vector (Y_D) is generated based on the distribution information on the detection keywords (N in total) in the corporate internal document obtained from the corporate terminal 200 and on the company information. The ripple effect vector (Y_RE) of the risk detection keywords may be generated based on keyword impact information produced by calculating the ripple effect of each of the N risk detection keywords from the article information obtained from the external server 300.

In one embodiment, the document risk vector (Y_D) is a vector of size 1×N whose n-th component is the product of the number of times (ND_n) the n-th detection keyword appears in the corporate internal document, as extracted using the language preprocessor, and the weight (w_n) assigned to that keyword based on the company information. The weight assigned to a detection keyword based on the company information may differ from company to company. For example, because the detection keyword 'inheritance' carries a higher risk for a large corporation than for a sole proprietor, the weight for 'inheritance' is set higher for a large corporation than for a sole proprietor. The weight assigned based on the company information may also differ from one detection keyword to another.

Likewise, the ripple effect vector (Y_RE) of the risk detection keywords is a vector of size 1×N whose n-th component is the ripple effect (RE_n) of the n-th detection keyword among the N detection keywords. The ripple effect of each detection keyword may be calculated by the keyword impact extraction unit 102 in the manner described above.

FIG. 6 is a diagram illustrating an example hardware configuration of the risk prediction server according to an embodiment of the present invention.

Referring to FIG. 6, the risk prediction server 100 may include at least one processor 110 and a memory storing instructions that direct the at least one processor 110 to perform at least one operation.

The at least one operation may include the operations of the risk prediction server 100 described with reference to FIGS. 1 to 5.

Here, the at least one processor 110 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed.

The memory 120 may include at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory 120 may include at least one of read-only memory (ROM) and random access memory (RAM).

The storage device 160 may store the artificial neural network 10 and may be, for example, a hard disk drive (HDD) or a solid state drive (SSD).

In addition, the risk prediction server 100 may include a transceiver 130 that communicates over a wireless network. The risk prediction server 100 may further include an input interface device 140, an output interface device 150, a storage device 160, and the like. The components of the risk prediction server 100 may be connected by a bus 170 and communicate with one another.

Examples of the risk prediction server 100 include a communication-capable desktop computer, laptop computer, notebook, smartphone, tablet PC, mobile phone, smart watch, smart glasses, e-book reader, portable multimedia player (PMP), portable game console, navigation device, digital camera, digital multimedia broadcasting (DMB) player, digital audio recorder, digital audio player, digital video recorder, digital video player, and personal digital assistant (PDA).

The methods according to the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software.

Examples of computer-readable media include hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as at least one software module in order to perform the operations of the present invention, and vice versa.

In addition, the method or apparatus described above may be implemented with all or part of its configurations or functions combined, or implemented separately.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention may be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

10: artificial neural network 11: input layer
12: hidden layer 13: output layer
100: risk prediction server 101: corporate internal document preprocessing unit
102: keyword impact extraction unit 103: training data generation unit
104: risk prediction model training unit 105: risk prediction model
110: processor 120: memory
130: transceiver device 140: input interface device
150: output interface device 160: storage device
170: bus 200: corporate terminal
300: external server 400: user terminal

Claims

A device for predicting the risk of a corporate internal document using an artificial neural network, the device comprising:
at least one processor; and
a memory storing instructions that direct the at least one processor to perform at least one operation,
wherein the at least one operation comprises:
receiving article information from an external server;
generating keyword impact information for a predefined risk detection keyword based on the article information;
receiving a corporate internal document and corporate information from a corporate terminal;
inputting the corporate internal document, the corporate information, and the keyword impact information into an artificial neural network trained in advance by supervised learning; and
determining a risk score of the corporate internal document based on an output value of the artificial neural network,
wherein the risk score is a score quantifying a risk, the risk being the negative impact on the corporation if the corporate internal document is leaked,
wherein the generating of the keyword impact information comprises:
calculating a ripple effect of the risk detection keyword based on the article information, and generating the keyword impact information including the calculated ripple effect,
wherein the ripple effect is determined based on the following equation,

where RE_n is the ripple effect for an arbitrary n-th risk detection keyword (n being a natural number greater than or equal to 1 and less than or equal to the total number N of risk detection keywords), PT is the press tier score of the press outlet in which the article was published, EM is the exposure method score, which varies depending on online or offline exposure, and MM is the mention method score, which varies depending on how the risk detection keyword is used in the article,
wherein the artificial neural network is trained by supervised learning using training data generated in advance,
wherein the training data comprises:
a training input value consisting of a document risk vector generated based on the risk detection keyword and the corporate information, and a ripple effect vector generated based on the keyword impact information; and
a training output value consisting of a target risk vector generated based on distribution information, the corporate information, and the keyword impact information,
wherein the document risk vector has, as its n-th component, the product of the number of times the n-th risk detection keyword among the N risk detection keywords appears in the corporate internal document and a weight set for the n-th risk detection keyword to reflect the corporate information,
wherein the ripple effect vector has, as its n-th component, the ripple effect for the n-th risk detection keyword, and
wherein the target risk vector is determined by:
generating a transposed document risk vector by transposing the document risk vector;
generating a document risk matrix, which is a square matrix having the generated transposed document risk vector as one row;
generating a document impact matrix by multiplying each row of the generated document risk matrix by the component value located in the corresponding row of the ripple effect vector; and
determining an eigenvector of the generated document impact matrix as the target risk vector.
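The vector and matrix steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names `document_risk_vector` and `target_risk_vector` are hypothetical, every row of the document risk matrix is assumed to be the transposed document risk vector, and the eigenvector with the largest-magnitude eigenvalue is assumed to be the one selected, since the claim does not say which eigenvector is used.

```python
import numpy as np

def document_risk_vector(counts, weights):
    """n-th component = (occurrences of keyword n) * (company-specific weight n)."""
    return np.asarray(counts, dtype=float) * np.asarray(weights, dtype=float)

def target_risk_vector(doc_risk, ripple):
    """Build the document impact matrix and return one of its eigenvectors.

    Assumptions (not stated in the claim): every row of the square matrix is
    the transposed document risk vector, and the dominant eigenvector is used.
    """
    doc_risk = np.asarray(doc_risk, dtype=float)
    ripple = np.asarray(ripple, dtype=float)
    n = doc_risk.size
    # Document risk matrix: N x N, each row is the transposed document risk vector.
    risk_matrix = np.tile(doc_risk, (n, 1))
    # Document impact matrix: row i scaled by the i-th ripple effect component.
    impact_matrix = risk_matrix * ripple[:, None]
    eigvals, eigvecs = np.linalg.eig(impact_matrix)
    # Pick the eigenvector whose eigenvalue has the largest magnitude.
    k = int(np.argmax(np.abs(eigvals)))
    vec = np.real(eigvecs[:, k])
    vec = vec / np.linalg.norm(vec)
    # Eigenvectors are defined up to sign; fix the sign for reproducibility.
    return -vec if vec.sum() < 0 else vec

# Illustrative values only (three hypothetical keywords).
d = document_risk_vector(counts=[2, 0, 1], weights=[1.0, 0.5, 2.0])
r = np.array([3.0, 1.0, 2.0])
t = target_risk_vector(d, r)
```

Under these assumptions the document impact matrix is the outer product of the ripple effect vector and the document risk vector, so its only nonzero eigenvalue is their dot product and the corresponding eigenvector is proportional to the ripple effect vector. If that dot product is zero, all eigenvalues are zero and the selection above is not meaningful.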

The device of claim 1,
wherein the at least one operation further comprises
generating distribution information for the risk detection keyword using a language preprocessor, and
wherein the distribution information includes whether the risk detection keyword is contained in the corporate internal document and the number of times the risk detection keyword appears in the corporate internal document.
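As a rough illustration of the distribution information described above, the following Python sketch records, for each keyword, whether it occurs in a document and how many times. The function name `keyword_distribution` is hypothetical, and plain case-insensitive substring counting stands in for the language preprocessor, whose actual behavior is not specified here.

```python
def keyword_distribution(document, keywords):
    """Return presence and occurrence count for each risk detection keyword.

    Assumption: simple case-insensitive substring matching replaces the
    language preprocessor (no tokenization or morphological analysis).
    """
    text = document.lower()
    distribution = {}
    for keyword in keywords:
        count = text.count(keyword.lower())  # non-overlapping occurrences
        distribution[keyword] = {"present": count > 0, "count": count}
    return distribution

# Illustrative document and keywords only.
sample = "The recall notice mentions a lawsuit. A second lawsuit is pending."
info = keyword_distribution(sample, ["lawsuit", "recall", "bribery"])
```

For Korean text, a morphological analyzer would likely be needed in place of substring matching so that inflected forms of a keyword are counted consistently.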

(Deleted)

The device of claim 1,
wherein the artificial neural network is trained by supervised learning, based on a predefined loss function, so as to minimize the difference between the output vector obtained when the training input value is input and the target risk vector.
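The supervised learning described above can be sketched in Python as follows. This is an assumption-laden illustration, not the network of the invention. Mean squared error is assumed as the predefined loss function, a single tanh hidden layer trained by full-batch gradient descent is assumed as the architecture, and the names `mse_loss` and `train` and all hyperparameters are hypothetical.

```python
import numpy as np

def mse_loss(output, target):
    """Assumed loss: mean squared difference between output and target vectors."""
    return float(np.mean((output - target) ** 2))

def train(inputs, targets, hidden=8, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer network so its outputs approach the targets.

    inputs:  (samples, 2N) rows, e.g. document risk vector + ripple effect vector
    targets: (samples, N) rows, e.g. target risk vectors
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = inputs.shape[1], targets.shape[1]
    w1 = rng.normal(0.0, 0.5, (n_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, n_out))
    b2 = np.zeros(n_out)
    history = []
    for _ in range(epochs):
        # Forward pass: input layer -> hidden layer -> output layer.
        h = np.tanh(inputs @ w1 + b1)
        out = h @ w2 + b2
        history.append(mse_loss(out, targets))
        # Backward pass: gradients of the mean squared error.
        grad_out = 2.0 * (out - targets) / out.size
        grad_w2 = h.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_h = (grad_out @ w2.T) * (1.0 - h ** 2)
        grad_w1 = inputs.T @ grad_h
        grad_b1 = grad_h.sum(axis=0)
        # Gradient descent update.
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return (w1, b1, w2, b2), history

# Synthetic data for illustration only (N = 3, so inputs have 6 columns).
rng = np.random.default_rng(1)
x = rng.random((4, 6))
y = 0.5 * x[:, :3]
params, history = train(x, y)
```

The recorded `history` lets the loss be inspected per epoch; in practice a deep learning framework with automatic differentiation would normally replace the hand-written gradients.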