KR102295868B1

KR102295868B1 - Network failure prediction system

Info

Publication number: KR102295868B1
Application number: KR1020210013985A
Authority: KR
Inventors: 임승환
Original assignee: (주)제스아이앤씨
Priority date: 2021-02-01
Filing date: 2021-02-01
Publication date: 2021-09-01
Also published as: KR102295868B9

Abstract

An objective of the present invention is to provide a network failure prediction system that minimizes human intervention and automatically applies preventive measures within five minutes to failure situations that conventionally take several hours to several days to resolve when a server fails. The network failure prediction system according to the present invention comprises: a micro agent that extracts raw data from each ICT infrastructure component (including servers, networks, DBMSs, applications, application platforms, and application solutions) and from a source code development system, senses the usage rate, number of connected users, connection status, resource occupancy, and response time corresponding to the extracted raw data, and labels them per source code running in the ICT infrastructure and per source code running in the source code development system; a data collection unit that receives the labeled data from the micro agent; an applied learning model in which learning data for predicting network failure is built; a machine learning unit that performs supervised learning on the applied learning model using the data collected from the source code development system, and updates and optimizes the applied learning model using the data collected from the ICT infrastructure and the source code development system; and a failure prediction unit that predicts failures using the data collected from the ICT infrastructure and the source code development system.

Description

Network failure prediction system

The present invention relates to a network failure prediction system and, more particularly, to a system that, in addition to the functions of an existing network failure prediction system, predicts failures caused by source code errors.

With the era of the Fourth Industrial Revolution and the rapid increase in demand for ICT services, substantial risk factors are latent in ICT service infrastructure across society. ICT infrastructure failures, typified by server failures, cause large costs and social losses the moment they occur, so proactive failure detection and prevention technology, rather than action taken after the failure, is needed here more than in any other industry.

However, while web- and mobile-based services are growing explosively, the shortage of technical personnel in the ICT industry is deepening, and the shortage of personnel for operating and managing ICT infrastructure in particular is becoming more severe.

In industry as well, the rapid increase in demand for ICT infrastructure has raised the frequency of failures, and the resulting losses have grown significantly. Since 2017 the government has strongly recommended identifying and implementing the improvements needed for effective prevention of and response to information system failures. Nevertheless, relative to this market environment and demand, the development of failure prediction and prevention technology in the ICT field has made little progress, and the application of AI technology to the field is even further off.

In addition, the shortage of specialized and experienced technical personnel in ICT failure management keeps deepening, while failures are increasing explosively with the growth of web and mobile services. Demand will grow sharply, now and in the future, for technology that analyzes the basis and process of AI predictions as a safeguard against wrong decisions caused by AI errors, and for automated failure prevention technology in which AI detects in advance source code errors introduced at the development stage and hard-to-identify failure causes, and acts on them within a short time.

By contrast, introducing the server-based ICT infrastructure failure prediction systems available so far, in Korea and abroad, has involved persistent difficulties such as costs that are high relative to the expected benefit and solution configurations that are hard to optimize for the expected demand. Small and medium-sized web and mobile service companies, including nearly all IT startups, have great difficulty operating and managing server systems because of their size. A single server failure frequently reduces brand value or damages trust. For lack of specialist personnel, a server failure often cannot be recovered immediately, which leads to loss of members, transaction and payment errors, and critical damage to sales, and in some cases even to closure of the business. The problem is very serious, yet realistic countermeasures are lacking.

With server system failures, there are many cases in which the actual cause is not disclosed to the outside or, from the company's point of view, must not be disclosed. Financial institutions and other companies that run revenue-generating businesses through web and mobile services often postpone system patches because they cannot interrupt service. As a result, although aware of the risk, they very often remain exposed to external attacks and failures until they halt the system and apply the patch. If such a cause were reported or became known externally, it would expose a critical weakness in the company's technical capability, so it is not disclosed, and the company remains exposed to repeated failures. Moreover, the reality of the ICT industry is that most small and medium-sized enterprises cannot apply patches or perform system inspections for lack of specialist personnel.

A company's complex ICT infrastructure environment is another typical reason that patching and updating are difficult. Databases are in most cases connected to middleware or applications, and patch and upgrade policies can range from simple recommendations to version upgrades. In particular, when a problem is found and a version upgrade is required, the patch very often has to be postponed because of dependencies among the DB, OS, applications, solutions, and other infrastructure. It is not easy to change middleware that is technically tightly coupled to other applications just to upgrade a DB or OS.

Furthermore, errors in source code itself can cause network failures, but there has been little research on network failure prediction systems that take this into account when analyzing and responding.

(Patent Document 0001) Korean Patent Laid-Open Publication No. 10-2001-0057820, 2001.07.05

An object of the present invention is to provide a system that minimizes human intervention so that failure situations which conventionally take several hours to several days to resolve when a server fails are handled by automatic preventive measures within five minutes, thereby helping companies protect profits and avoid losses, and that at the same time predicts unintended failures caused by source code as well as potential failures that have not yet been identified, and can apply policies to resolve them.

A network failure prediction system according to the present invention includes: a micro agent that extracts raw data from each ICT infrastructure component, including servers, networks, DBMSs, applications, application platforms, and application solutions, and from a source code development system, detects the usage rate, number of connected users, connection status, resource occupancy, and response time corresponding to the extracted raw data, and labels them per source code running in the ICT infrastructure and per source code running in the source code development system; a data collection unit that receives the labeled data from the micro agent; an applied learning model in which learning data for network failure prediction is built; a machine learning unit that performs supervised learning on the applied learning model using data collected from the source code development system, and updates and optimizes the applied learning model using data collected from the ICT infrastructure and the source code development system; and a failure prediction unit that performs failure prediction using data collected from the ICT infrastructure and the source code development system.

The system may further include a data preprocessing unit that explores the data collected by the data collection unit, defines variables, and labels the data.

The system may further include a source code type classification unit that groups, by type, the independent variables corresponding to a dependent variable according to the type of that dependent variable, the dependent variable being the result of failure prediction by the applied learning model.

The data preprocessing unit may receive from the source code type classification unit the type to which the source code of the labeled data collected through the micro agent belongs, among the data collected by the data collection unit, and pass that source code type to the failure prediction unit as an additional independent variable.

The system may further include a prediction result tracking unit that refers to the applied learning model for the weight that supervised learning assigned to each independent variable of each source code type, and transmits to the micro agent type-specific independent variable set data from which the independent variables whose weight is at or below a preset threshold have been excluded.

The micro agent may receive the type-specific independent variable set data and collect only the independent variables needed for each source code.

The micro agent may also search for the source code corresponding to the labeled data.

According to the present invention, failures of ICT infrastructure can be prevented automatically on the basis of AI; when a failure attributable to source code occurs, failure cause analysis and preventive measures are provided; and because the basis for the AI's failure judgment is provided by type, additional failure factors can be addressed in the future.

FIG. 1 is a block diagram illustrating a network failure prediction system according to an embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. Unless otherwise defined or stated, terms indicating direction in this description are based on the state shown in the drawings. The same reference numerals refer to the same members throughout the embodiments. The thickness or dimensions of each component shown in the drawings may be exaggerated for convenience of description, which does not mean that the component must actually be built with those dimensions or proportions.

A network failure prediction system according to an embodiment will be described with reference to FIG. 1.

The network failure prediction system according to the present invention collects various raw data from ICT infrastructure, including a web application server (WAS), a network, a database management system (DBMS), applications, application platforms, and application solutions, and at the same time extracts raw data from a source code development system. The micro agent 190 is responsible for extracting this raw data. The micro agent 190 detects the usage rate, number of connected users, connection status, resource occupancy, and response time corresponding to the extracted raw data, labels them per source code running in the ICT infrastructure and in the source code development system, and transmits the data to the data collection unit 110. The micro agent 190 takes over part of the work of the data collection unit 110 and the data preprocessing unit 120, which prevents load from concentrating on the main body of the network failure prediction system 100.

The micro agent 190 may also receive the set of independent variables that the source code type classification unit 170 has defined for each source code type, and collect only the element data corresponding to the independent variables needed for each source code. In addition, the micro agent 190 may search for the source code corresponding to the labeled data and provide it to the source code type classification unit 170 so that the type of that source code can be determined.
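The labeling and selective collection performed by the micro agent can be sketched as follows. This is a minimal illustration only: the record layout, the field names, and the functions `label_raw` and `to_payload` are hypothetical and do not come from the patent, which names the five sensed quantities but does not specify a data format.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Set


@dataclass(frozen=True)
class LabeledSample:
    """One labeled measurement emitted by a (hypothetical) micro agent."""
    source_code_id: str        # label: which source code the metrics belong to
    origin: str                # "ict_infra" or "dev_system" (assumed values)
    usage_rate: float
    connected_users: int
    connection_status: str
    resource_occupancy: float
    response_time_ms: float


# The five quantities the description says the micro agent senses.
METRIC_FIELDS = ("usage_rate", "connected_users", "connection_status",
                 "resource_occupancy", "response_time_ms")


def label_raw(raw: dict, source_code_id: str, origin: str) -> LabeledSample:
    """Attach a source code label to the sensed metrics, ignoring other keys."""
    metrics = {name: raw[name] for name in METRIC_FIELDS}
    return LabeledSample(source_code_id=source_code_id, origin=origin, **metrics)


def to_payload(sample: LabeledSample, wanted: Optional[Set[str]] = None) -> dict:
    """Serialize a sample; if a variable set is given, send only those metrics."""
    data = asdict(sample)
    if wanted is None:
        return data
    keep = {"source_code_id", "origin"} | wanted
    return {key: value for key, value in data.items() if key in keep}
```

Passing a `wanted` set models the case where the agent has received a type-specific independent variable set and therefore transmits fewer fields to the data collection unit.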

The data collection unit 110 receives the labeled data from the micro agent 190. In addition to the data received from the micro agent 190, the data collection unit 110 periodically and automatically collects various information, such as UI configuration status information and failure status information, from each host on a TCP/IP-based network for network management. The data collection unit 110 collects log data from security equipment (over TCP or UDP), stores the original of each host's local file system, inserts a host identification tag, and then publishes the data. For SNMP data collection, the data collection unit 110 sends a request to each device via SNMP and receives and stores the response. For SSH data collection, the data collection unit 110 connects to each device using SSH and collects the console log data of the connected device.
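The "insert a host identification tag, then publish" step can be illustrated with a short sketch. It assumes, purely for illustration, that publishing means placing tagged records on an in-process queue; the function name `tag_and_publish` and the record keys are hypothetical, and the actual transport (TCP, UDP, SNMP, SSH) is deliberately left out.

```python
import queue
from typing import Iterable


def tag_and_publish(host_id: str, lines: Iterable[str],
                    out: "queue.Queue[dict]") -> int:
    """Tag each non-empty log line with its host and publish it to a queue.

    Returns the number of records published.
    """
    published = 0
    for line in lines:
        text = line.rstrip("\n")
        if not text:
            continue  # skip blank lines rather than publishing empty records
        out.put({"host": host_id, "raw": text})
        published += 1
    return published
```

Keeping the raw line unchanged and adding the host as a separate key preserves the original log content, which matches the description's emphasis on storing the original before tagging.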

Learning data for network failure prediction is built in the applied learning model 150. The applied learning model 150 first holds a learning model trained by supervised learning on labeled data that was collected from the source code development system and delivered by the micro agent 190. Later, the machine learning unit 140 performs additional training on labeled data collected from both the ICT infrastructure and the source code development system, producing an updated learning model.

The machine learning unit 140 performs machine learning on the applied learning model 150, on which supervised learning has been performed as described above, using labeled data collected afterwards. That is, the machine learning unit 140 performs supervised learning on the applied learning model using data collected from the source code development system, and updates and optimizes the applied learning model 150 using data collected from the ICT infrastructure and the source code development system.
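The two-phase training described here (initial supervised learning on development-system data, followed by updates on later data) can be sketched with a small online model. The patent does not name a learning algorithm; logistic regression trained by stochastic gradient descent is used below only as a stand-in, and the class name, feature layout, and hyperparameters are assumptions.

```python
import math
from typing import Sequence


class OnlineLogisticModel:
    """Stand-in for the applied learning model: logistic regression with SGD."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        """Return the estimated probability of failure for one sample."""
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp()
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, X: Sequence[Sequence[float]], y: Sequence[int],
                    epochs: int = 1) -> None:
        """Run SGD over labeled samples; can be called repeatedly to update."""
        for _ in range(epochs):
            for x, target in zip(X, y):
                err = self.predict_proba(x) - target
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


# Phase 1: supervised learning on labeled development-system data.
# Features (assumed): [usage_rate, normalized response time]; label 1 = failure.
dev_X = [[0.1, 0.1], [0.2, 0.2], [0.9, 0.8], [0.95, 0.9]]
dev_y = [0, 0, 1, 1]
model = OnlineLogisticModel(n_features=2, lr=0.5)
model.partial_fit(dev_X, dev_y, epochs=1000)

# Phase 2: update the same model with data collected later from the
# ICT infrastructure and the development system.
model.partial_fit([[0.85, 0.95], [0.15, 0.05]], [1, 0], epochs=10)
```

Because `partial_fit` keeps the existing weights, the second call refines the model rather than retraining it from scratch, which is one plausible reading of "update and optimize".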

The failure prediction unit 130 performs failure prediction using data collected from the ICT infrastructure and the source code development system. Specifically, the failure prediction unit 130 selects independent variables from the collected data, provides them to the machine learning unit 140, and receives the dependent variable derived through the applied learning model 150 in order to predict a failure.

The data preprocessing unit 120 explores the data collected by the data collection unit 110, defines variables, and labels the data. In particular, the data preprocessing unit 120 performs this preprocessing on data other than the data collected from the micro agent 190. In addition, for the labeled data that was collected through the micro agent 190, the data preprocessing unit 120 receives from the source code type classification unit 170 the type to which the corresponding source code belongs, and passes that type to the failure prediction unit 130 as an additional independent variable, so that the source code type becomes one element of the data used for machine learning.
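One way to pass the source code type along as an independent variable is to append it to the metric vector. The sketch below uses one-hot encoding, which is an assumption made for illustration; the patent states only that the type is forwarded as an independent variable, not how it is encoded. The function name and the example type names are hypothetical.

```python
from typing import List, Sequence


def append_type_feature(metrics: Sequence[float], code_type: str,
                        known_types: Sequence[str]) -> List[float]:
    """Return the metric vector extended with a one-hot source code type.

    An unknown type yields an all-zero type block.
    """
    one_hot = [1.0 if code_type == known else 0.0 for known in known_types]
    return list(metrics) + one_hot


# Example: two metrics plus a type drawn from three assumed type names.
features = append_type_feature([0.9, 0.85], "db_access",
                               ["batch", "db_access", "web_handler"])
```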

The source code type classification unit 170 classifies source code into a specific type according to the dependent variable that results from failure prediction through the applied learning model 150 described above. This classification treats the source code itself and the dependent variables derived from it through machine learning as one set. The independent variable items belonging to each type are also grouped by type and stored together with that set. In other words, for each type the source code type classification unit 170 stores a direct classification of the source code, a type classification of the set of dependent variables obtained as results for that source code, and the set of independent variables corresponding to that source code and dependent variable set.

The prediction result tracking unit 160 analyzes the black box of the judgments and decisions made by the machine learning unit 140, and generates visualization data for the user display so that an administrator can understand the basis for the failure prediction (dependent variable) made by the machine learning unit 140 and the automated actions taken in the prevention process. In other words, the prediction result tracking unit 160 can generate graphic data that visualizes per-factor values and the like, giving practitioners and the staff responsible the clues they need about failure prediction and prevention by the machine learning unit 140.

The prediction result tracking unit 160 also refers to the applied learning model 150 for the weight that supervised learning assigned to each independent variable of each source code type, and transmits to the micro agent 190 type-specific independent variable set data from which the independent variables whose weight is at or below a preset threshold have been excluded. That is, from the sets of source code, dependent variables, and independent variables classified by the source code type classification unit 170, the prediction result tracking unit 160 removes the independent variables that fall at or below the threshold before sending them to the micro agent 190, which reduces the number of data elements that must be collected.
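The pruning rule can be sketched in a few lines. The sketch assumes that the learned weights are available as a mapping from source code type to per-variable weights, and that the comparison uses the absolute value of the weight; both are assumptions, since the patent says only that variables with a weight at or below a preset threshold are excluded. The function name and the example values are hypothetical.

```python
from typing import Dict, List


def prune_independent_variables(weights_by_type: Dict[str, Dict[str, float]],
                                threshold: float) -> Dict[str, List[str]]:
    """Keep, per source code type, only variables whose |weight| > threshold."""
    return {
        code_type: sorted(name for name, weight in weights.items()
                          if abs(weight) > threshold)
        for code_type, weights in weights_by_type.items()
    }


# Illustrative weights only; not taken from the patent.
weights = {
    "db_access": {"usage_rate": 0.8, "response_time_ms": 0.05, "connected_users": -0.4},
    "batch": {"usage_rate": 0.1, "resource_occupancy": 0.6},
}
variable_sets = prune_independent_variables(weights, threshold=0.1)
```

The resulting per-type lists are what the micro agent would use to decide which metrics to keep collecting for each source code type.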

Meanwhile, the policy decision unit 180 can generalize the types of failures predicted by the failure prediction unit 130, learn the failure factors, and set and control policies for those failures. It provides the administrator with the data set of source code, dependent variable classification, and independent variable items, together with the graphic data generated by the prediction result tracking unit 160, receives from the administrator the setting values needed to establish a failure policy, and performs control using them.

The administrator can also instruct, through the policy decision unit 180, that source code types be merged for data sets whose independent and dependent variables are similar. This merging of source code types may instead be performed automatically by the source code type classification unit 170 described above. As a merging criterion, for example, source code types may be combined into a single type when their dependent variables are identical and the items and number of their independent variables match to within 10%.
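The example merging criterion can be made concrete as follows. The patent does not define how "within 10%" is measured; this sketch assumes one possible reading, namely that the number of independent variables present in only one of the two types, divided by the size of the larger set, must not exceed the tolerance. The `CodeType` record and `can_merge` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class CodeType:
    """A source code type with its dependent variable and independent variables."""
    name: str
    dependent: str
    independents: FrozenSet[str]


def can_merge(a: CodeType, b: CodeType, tolerance: float = 0.10) -> bool:
    """Return True if two types meet the assumed merging criterion."""
    if a.dependent != b.dependent:
        return False
    larger = max(len(a.independents), len(b.independents))
    if larger == 0:
        return True  # both sets empty: nothing differs
    differing = len(a.independents ^ b.independents)  # symmetric difference
    return differing / larger <= tolerance
```

For instance, two types that share ten variables and differ by one extra variable would be merged under this reading (1/11 is below 0.10), while types differing by two variables out of ten would not.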

With this configuration, the system can recognize and automatically take preventive action against situations such as source code errors introduced at the service development stage, DB access errors, and system resource occupancy, whose causes cannot be identified without a system expert when a failure occurs. It can also report the causes of failures that stem from source code errors made when the monitored service was developed and from DBMS-related errors.

In addition, the micro agent 190 recognizes failure-inducing factors during data collection and performs the source code search, while the intelligent agents, namely the failure prediction unit 130, the prediction result tracking unit 160, and the source code type classification unit 170, perform secondary analysis and learning of the failure factors in the compared analysis data, predict the likelihood of failure, and carry out automatic prevention. This division of work enables efficient prediction and prevention of failures.

Although preferred embodiments of the present invention have been described above, the technical idea of the present invention is not limited to these embodiments and may be implemented in various ways without departing from the technical idea of the present invention as set out in the claims.

100: network failure prediction device
110: data collection unit
120: data preprocessing unit
130: failure prediction unit
140: machine learning unit
150: applied learning model
160: prediction result tracking unit
170: source code type classification unit

Claims

A network failure prediction system comprising:
a micro agent that extracts raw data from each ICT infrastructure component, including servers, networks, DBMSs, applications, application platforms, and application solutions, and from a source code development system, detects the usage rate, number of connected users, connection status, resource occupancy, and response time corresponding to the extracted raw data, and labels them per source code running in the ICT infrastructure and per source code running in the source code development system;
a data collection unit that receives the labeled data from the micro agent;
an applied learning model in which learning data for network failure prediction is built;
a machine learning unit that performs supervised learning on the applied learning model using data collected from the source code development system, and updates the learning model of the applied learning model using data collected from the ICT infrastructure and the source code development system;
a failure prediction unit that performs failure prediction using data collected from the ICT infrastructure and the source code development system;
a data preprocessing unit that explores the data collected by the data collection unit, defines variables, and labels the data; and
a source code type classification unit that stores, by type, a classification of the source code, a dependent variable that is the result of failure prediction by the applied learning model, and a set of independent variables corresponding to the source code and the dependent variable,
wherein the data preprocessing unit receives from the source code type classification unit the type to which the source code of the labeled data collected through the micro agent belongs, among the data collected by the data collection unit, and passes that source code type to the failure prediction unit as an additional independent variable.

(Deleted)

The network failure prediction system of claim 1, further comprising a prediction result tracking unit that refers to the applied learning model for the weight that the supervised learning assigned to each independent variable of each source code type, and transmits to the micro agent type-specific independent variable set data from which the independent variables whose weight is at or below a preset threshold have been excluded.

The network failure prediction system of claim 5, wherein the micro agent receives the type-specific independent variable set data and collects the independent variables needed for each source code.

The network failure prediction system of claim 1, wherein the micro agent searches for the source code corresponding to the labeled data.