KR101791021B1

KR101791021B1 - Monitoring Method by Automatic Threshold Setting and Automatic Monitoring System applying the same

Info

Publication number: KR101791021B1
Application number: KR1020150130140A
Authority: KR
Inventors: 김원철; 유영채; 손영섭
Original assignee: 에스케이 주식회사
Priority date: 2015-09-15
Filing date: 2015-09-15
Publication date: 2017-10-27
Also published as: WO2017047952A1; KR20170032610A

Abstract

A method for automatically monitoring a system by automatic threshold setting, and a monitoring system applying the method, are provided. A service monitoring method according to an embodiment of the present invention generates a normal value using past measurement values measured while providing a service, generates a threshold using the normal value, and determines, using the threshold, whether a failure has occurred in providing the service. Accordingly, an optimal threshold is set automatically and the system is monitored automatically using the set threshold, so that the accuracy of system failure determination can be improved.

Description

Method for automatically monitoring a system by automatic threshold setting, and monitoring system applying the same {Monitoring Method by Automatic Threshold Setting and Automatic Monitoring System applying the same}

The present invention relates to system monitoring technology and, more particularly, to a method of monitoring a system composed of services and the resources used to provide those services.

A system is required to provide business services. Such a system is generally composed of multiple resources, and system configurations are becoming increasingly complex.

Accordingly, monitoring for service failures is also inevitably very complex, and detecting and responding to service failures requires considerable manpower and enormous cost.

The most important and fundamental aspect of service failure monitoring is the accuracy of failure determination. Failure determination is currently automated and works by comparing collected measurement values against a configured threshold.

If a threshold is set incorrectly, failure determination becomes inaccurate. With complex environments and situations now commonplace, it is very difficult to set an appropriate threshold that covers all of them.

The present invention has been devised to solve the above problems. An object of the present invention is to improve the accuracy of system failure determination by providing a method of automatically setting an optimal threshold and automatically monitoring a system using the set threshold, and a system applying the method.

To achieve the above object, a service monitoring method according to an embodiment of the present invention includes: generating a normal value using past measurement values measured while providing a service; generating a threshold using the normal value; and determining, using the threshold, whether a failure has occurred in providing the service.

The normal value may be the median of the past measurement values.

The past measurement values may be values measured at specific points in time.

The specific points in time may be points at which any one of the month, day, day of the week, hour, minute, or second coincides.

Each past measurement value may be the average of the values measured over an interval that includes the specific point in time.

In the threshold generation step, the threshold may be generated by an operation using the normal value and a fixed value.

Alternatively, in the threshold generation step, the threshold may be generated by an operation using the normal value and a variation value.

The service monitoring method according to an embodiment of the present invention may further include determining a failure symptom by comparing a measured-value pattern with a normal-value pattern.

Meanwhile, a monitoring system according to another embodiment of the present invention includes: an analysis unit that generates a normal value using past measurement values measured while providing a service and generates a threshold using the normal value; and a collection unit that collects measurement values and determines, by referring to the threshold, whether a failure has occurred in providing the service.

As described above, according to embodiments of the present invention, an optimal threshold is set automatically and the system is monitored automatically using the set threshold, which improves the accuracy of system failure determination. This reduces the number of unnecessary failure responses and drastically reduces the manpower and cost they require.

In addition, according to embodiments of the present invention, system failures are determined earlier, that is, before the user (customer) notices them, which allows a faster initial response and prevents user complaints in advance.

FIG. 1 is a diagram provided to explain the concept of a system monitoring method according to an embodiment of the present invention;
FIG. 2 is a diagram provided to explain a failure determination/prediction method;
FIG. 3 is a diagram provided to explain a method of generating a normal value;
FIG. 4 is a diagram provided to explain a method of generating a threshold using the normal value; and
FIG. 5 is a block diagram of a monitoring system according to another embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings.

FIG. 1 is a diagram provided to explain the concept of a system monitoring method according to an embodiment of the present invention. The system monitoring method according to this embodiment automatically monitors a system used for an enterprise's business.

Specifically, as shown in FIG. 1, in an enterprise system, the services and the resources used to provide them, such as servers, networks, WEB/WAS, DBMS, storage, and applications, are the targets of monitoring.

To this end, the system monitoring system collects status information on services and resources, as indicated by "①". If a failure is determined from a single piece of status information through single-pattern analysis, the system monitoring system raises an alarm during the information collection process.

In addition, the system monitoring system may filter the collected status information on services and resources, standardize/normalize it, and then link/process it to generate base information for failure prediction analysis.

Meanwhile, as indicated by "②", the system monitoring system detects failure symptoms through complex-pattern analysis of multiple pieces of service status information and raises an alarm when a failure symptom is detected.

Furthermore, failure prediction may be performed by analyzing big data obtained by accumulating, over a long period, the base information linked/processed in "①".

Also, as indicated by "③", the system monitoring system generates, updates, and manages a relation map. For each service, the relation map records the associations between services and information on the resources used to provide the services.

The relation map is needed to determine the business/service impact of a failure and to present system monitoring results systematically.

As indicated by "④", the system monitoring system automatically monitors whether a service failure has occurred and raises an alarm to the operator when a failure occurs.

FIG. 2 is a diagram illustrating a method of determining a failure that has occurred in the system and of predicting a failure before it is determined.

As shown in FIG. 2, when the response time (solid line) measured from the system falls outside the threshold, that is, when it exceeds the upper threshold or falls below the lower threshold, it is determined that a failure has occurred in the system.

Here, the threshold (dotted line) serving as the baseline for failure determination is a variable threshold that changes over time. The method of setting the threshold used for failure determination is described in detail later.
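The band check described above can be sketched as a small predicate. This is a minimal illustration, assuming the threshold in effect at each point in time is already known as a `(lower, upper)` pair; the function name `is_failure` is hypothetical, and a bound of `None` is treated as absent so the same check also covers one-sided thresholds.

```python
from typing import Optional


def is_failure(value: float, lower: Optional[float], upper: Optional[float]) -> bool:
    """Return True when a measurement falls outside the threshold band.

    A bound of None means that side is not checked (one-sided threshold).
    """
    if upper is not None and value > upper:
        return True  # exceeds the upper threshold
    if lower is not None and value < lower:
        return True  # falls below the lower threshold
    return False


# Hypothetical response-time samples, each paired with the band in effect at that time.
samples = [(120.0, 80.0, 150.0), (160.0, 80.0, 150.0), (70.0, 80.0, 150.0)]
print([is_failure(v, lo, hi) for v, lo, hi in samples])  # [False, True, True]
```

Values exactly on a bound are treated as normal here, matching the "exceeds / falls below" wording.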

Even when a measurement falls outside the threshold, it may be treated as an exception and not as a failure, in consideration of the characteristics of the service.

Exceptional cases include, for example, i) specific times, dates, or periods (the settlement time for banking services, or the course-registration day or application period for university services), and ii) cases where change work is in progress on a resource used to provide the service.
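The exception handling above might be sketched as a check that runs before an out-of-threshold measurement is reported as a failure. Everything here is hypothetical: the `EXCEPTION_WINDOWS` table, the function name `is_exception`, and the idea of passing in the set of resources a service depends on (which in practice could come from the relation map).

```python
from datetime import datetime, time
from typing import List, Optional, Set, Tuple

# Hypothetical exception windows: (weekday or None for every day, start, end).
# Weekdays follow datetime.weekday(), where Monday is 0 and Sunday is 6.
EXCEPTION_WINDOWS: List[Tuple[Optional[int], time, time]] = [
    (None, time(23, 30), time(23, 59, 59)),  # e.g. a nightly settlement period
]


def is_exception(ts: datetime, resources_under_change: Set[str],
                 service_resources: Set[str]) -> bool:
    """True if an out-of-threshold measurement at ts should not count as a failure."""
    # Case i): the timestamp falls inside a configured exception window.
    for weekday, start, end in EXCEPTION_WINDOWS:
        if (weekday is None or ts.weekday() == weekday) and start <= ts.time() <= end:
            return True
    # Case ii): change work is in progress on a resource the service depends on.
    return bool(resources_under_change & service_resources)


print(is_exception(datetime(2015, 7, 23, 23, 45), set(), {"db01"}))  # True
```

Multi-day periods such as an application window could be handled the same way with date ranges instead of daily time ranges.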

A condition on the number of times the measurement falls outside the threshold may also be added. For example, a failure may be determined to have occurred if the measurement falls outside the threshold three or more times within 10 minutes.
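The count condition above ("three or more violations within 10 minutes") can be sketched with a sliding window of violation timestamps. The class name `ViolationCounter` is hypothetical, and the window is assumed to be half-open, so a violation exactly 10 minutes old no longer counts.

```python
from collections import deque
from datetime import datetime, timedelta


class ViolationCounter:
    """Declares a failure once `min_count` threshold violations fall within `window`."""

    def __init__(self, window: timedelta = timedelta(minutes=10), min_count: int = 3):
        self.window = window
        self.min_count = min_count
        self._violations: deque = deque()  # timestamps of recent violations, oldest first

    def record(self, ts: datetime, violated: bool) -> bool:
        """Record one sample and return True if the failure condition now holds."""
        if violated:
            self._violations.append(ts)
        # Drop violations that are too old relative to the current sample.
        while self._violations and ts - self._violations[0] >= self.window:
            self._violations.popleft()
        return len(self._violations) >= self.min_count


counter = ViolationCounter()
t0 = datetime(2015, 7, 23, 9, 0)
print([counter.record(t0 + timedelta(minutes=m), True) for m in (0, 4, 8)])  # [False, False, True]
```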

In addition, instead of determining a failure from a single measurement, a composite determination based on multiple measurements is possible. For example, a failure is determined to have occurred when the response time shown in FIG. 2 falls outside its threshold and, at the same time, the memory occupancy also falls outside its threshold.
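A composite determination like the one above can be sketched as "every monitored metric is outside its own band at the same time". The metric names and the function names `out_of_band` and `composite_failure` are hypothetical; each band is a `(lower, upper)` pair where `None` means that side is unchecked.

```python
from typing import Dict, Optional, Tuple

Band = Tuple[Optional[float], Optional[float]]  # (lower, upper); None means no bound


def out_of_band(value: float, band: Band) -> bool:
    lower, upper = band
    return (upper is not None and value > upper) or (lower is not None and value < lower)


def composite_failure(values: Dict[str, float], bands: Dict[str, Band]) -> bool:
    """Failure only if every monitored metric is outside its own threshold band."""
    if not bands:
        return False  # nothing monitored, so nothing to flag
    return all(out_of_band(values[name], bands[name]) for name in bands)


bands = {"response_time_ms": (80.0, 150.0), "memory_pct": (None, 85.0)}
print(composite_failure({"response_time_ms": 160.0, "memory_pct": 90.0}, bands))  # True
print(composite_failure({"response_time_ms": 160.0, "memory_pct": 70.0}, bands))  # False
```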

Meanwhile, as shown in FIG. 2, a failure symptom (anomaly symptom) was detected before the point at which the system was determined to have failed. A failure symptom refers to a case in which the pattern of the response time measured in real time differs significantly from the usual pattern.

The usual pattern is generated by analyzing big data obtained by accumulating, over a long period, the base information linked/processed in "①" of FIG. 1. The normal-value pattern may be used as the usual pattern.

Note that when a failure symptom occurs, the measurement does not fall outside the threshold. If the measurement falls outside the threshold, it constitutes a failure rather than a failure symptom.

When a failure symptom occurs, it is predicted that a system failure may occur in the future. Failure symptoms are preferably determined through complex-pattern analysis of multiple measurements rather than by relying only on pattern analysis of a single measurement.

For example, a failure symptom may be determined to have occurred when a response-time pattern such as that shown in FIG. 2 appears and the memory occupancy also differs significantly from its usual pattern.
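The distinction drawn above between a failure (outside the threshold) and a failure symptom (inside the threshold but unlike the usual pattern) could be sketched as follows. The source does not say how patterns are compared, so the deviation metric (mean absolute deviation relative to the normal values), the `tolerance` of 0.3, and the names `pattern_deviation` and `classify` are all assumptions for illustration.

```python
from typing import Dict, List, Tuple

# metric name -> (recent measured values, normal values for the same slots, lower, upper)
Window = Tuple[List[float], List[float], float, float]


def pattern_deviation(measured: List[float], normal: List[float]) -> float:
    """Mean absolute deviation from the normal-value pattern, relative to the normal
    values. This metric is an assumption; the source does not name one."""
    if not measured or len(measured) != len(normal):
        raise ValueError("patterns must be non-empty and of equal length")
    return sum(abs(m - n) / max(abs(n), 1e-9) for m, n in zip(measured, normal)) / len(measured)


def classify(windows: Dict[str, Window], tolerance: float = 0.3) -> str:
    """Return "failure", "symptom" or "normal" based on the latest sample of each metric."""
    deviated = []
    for measured, normal, lower, upper in windows.values():
        latest = measured[-1]
        if latest < lower or latest > upper:
            return "failure"  # outside the threshold: a failure, not a symptom
        deviated.append(pattern_deviation(measured, normal) > tolerance)
    # A symptom requires every metric to stray from its usual pattern while staying in band.
    return "symptom" if deviated and all(deviated) else "normal"


windows = {
    "response_time_ms": ([130.0, 140.0, 145.0], [100.0, 100.0, 100.0], 50.0, 150.0),
    "memory_pct": ([70.0, 75.0, 80.0], [50.0, 50.0, 50.0], 0.0, 85.0),
}
print(classify(windows))  # symptom
```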

The system monitoring system raises an alarm to notify the operator not only when a failure occurs but also when a failure symptom is detected.

Hereinafter, the process of setting the threshold that serves as the criterion for failure determination is described in detail with reference to FIGS. 3 and 4.

FIG. 3 is a diagram provided to explain a method of generating the normal value on which threshold generation is based. As described later with reference to FIG. 4, the threshold is generated from the normal value.

As shown in FIG. 3, the normal value is the median of past measurement values measured while providing a service through the system. The past measurement values are values measured at the same or equivalent points in time.

FIG. 3 shows a case in which, from the measurements of the past seven weeks, a median is selected for each minute of the Thursday measurements.

That is, it shows the median being selected from the measurements at 9:00 (on the hour) on Thursdays over the past seven weeks, and the median being selected from the measurements at 9:01 on Thursdays over the past seven weeks.
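The per-slot median described above can be sketched by grouping past per-minute values by (weekday, hour, minute) and taking the median of each group. This is a minimal in-memory sketch; the function name `normal_values` and the input shape (a list of `(timestamp, value)` pairs) are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, Iterable, Tuple

Slot = Tuple[int, int, int]  # (weekday, hour, minute); Monday == 0, Thursday == 3


def normal_values(history: Iterable[Tuple[datetime, float]]) -> Dict[Slot, float]:
    """Normal value per weekly minute slot: the median of past per-minute values
    that share the same weekday, hour and minute."""
    by_slot = defaultdict(list)
    for ts, value in history:
        by_slot[(ts.weekday(), ts.hour, ts.minute)].append(value)
    return {slot: median(values) for slot, values in by_slot.items()}


# Seven Thursdays at 09:00, ending on Thursday, July 23, 2015 (hypothetical values).
history = [(datetime(2015, 7, 23, 9, 0) - timedelta(weeks=k), v)
           for k, v in enumerate([100, 110, 95, 300, 105, 98, 102])]
print(normal_values(history)[(3, 9, 0)])  # 102
```

Using the median rather than the mean keeps a single spike (300 in this example) from pulling the normal value upward.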

Here, each past measurement value is the average over a measurement interval, that is, the interval that includes the measurement time point. For example, assuming measurements are taken every second, i) the measurement for 09:00 on Thursday, July 23, 2015 is the average of the 60 measurements taken from 09:00 up to (but not including) 09:01 on Thursday, July 23, 2015, and ii) the measurement for 09:01 on Thursday, July 9, 2015 is the average of the 60 measurements taken from 09:01 up to (but not including) 09:02 on Thursday, July 9, 2015.
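The interval averaging above can be sketched by truncating each raw sample's timestamp to the start of its minute and averaging each bucket. The function name `minute_averages` is hypothetical, and one-minute intervals are assumed as in the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, Tuple


def minute_averages(samples: Iterable[Tuple[datetime, float]]) -> Dict[datetime, float]:
    """Average raw samples over [hh:mm:00, hh:mm+1:00) and key them by the minute start."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts.replace(second=0, microsecond=0)].append(value)
    return {minute: mean(values) for minute, values in buckets.items()}


start = datetime(2015, 7, 23, 9, 0)
# 60 one-second samples between 09:00:00 and 09:00:59, plus one sample at 09:01:00.
samples = [(start + timedelta(seconds=s), float(s)) for s in range(60)]
samples.append((start + timedelta(minutes=1), 100.0))
avg = minute_averages(samples)
print(avg[start])  # 29.5
```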

FIG. 4 is a diagram provided to explain a method of generating the threshold. As shown in FIG. 4, the threshold can be generated in various ways.

The first way is to use a fixed threshold. Unlike the remaining methods below, this does not use the normal value calculated as in FIG. 3.

The second way is to generate the threshold using a fixed value. That is, the upper threshold (MAX) is generated by adding a fixed value to the normal value, and the lower threshold (MIN) is generated by subtracting the fixed value from the normal value.

Alternatively, the upper threshold (MAX) may be generated by multiplying the normal value by a fixed upper ratio (for example, 120%), and the lower threshold (MIN) by multiplying the normal value by a fixed lower ratio (for example, 80%).

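Because a fixed value or a fixed ratio is used, the band between the upper and lower thresholds has a fixed width: a constant absolute width with a fixed value, and a width that is a constant proportion of the normal value with a fixed ratio.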
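The two fixed-band variants above can be sketched as two small functions. The names `fixed_offset_band` and `fixed_ratio_band` are hypothetical; the default ratios mirror the 80%/120% example, and both return `(MIN, MAX)`.

```python
from typing import Tuple


def fixed_offset_band(normal: float, offset: float) -> Tuple[float, float]:
    """(MIN, MAX) = normal -/+ a fixed offset; the band width is always 2 * offset."""
    return normal - offset, normal + offset


def fixed_ratio_band(normal: float, lower_ratio: float = 0.8,
                     upper_ratio: float = 1.2) -> Tuple[float, float]:
    """(MIN, MAX) = normal scaled by fixed ratios; the width scales with the normal value."""
    return normal * lower_ratio, normal * upper_ratio


print(fixed_offset_band(70, 10))  # (60, 80)
```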

The third way is to generate the threshold using a variation value. That is, the upper threshold (MAX) is generated by adding a variation value to the normal value, and the lower threshold (MIN) is generated by subtracting the variation value from the normal value.

Alternatively, the upper threshold (MAX) may be generated by multiplying the normal value by a variable upper ratio, and the lower threshold (MIN) by multiplying the normal value by a variable lower ratio.

Because a variation value or variation ratio is used, the width between the upper and lower thresholds varies. The variation value and variation ratio can be determined in various ways.

For example, take the normal value (70), the three normal values at the preceding points in time (75, 70, 60), and the three normal values at the following points in time (80, 80, 65). The difference (20) between their maximum (80) and minimum (60) is calculated, and 80% of that difference (16) can be set as the variation value. In this case, the upper threshold is 86 (70 + 16) and the lower threshold is 54 (70 - 16).
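The variation-value rule above can be sketched for a list of consecutive per-slot normal values. The function name `variation_band` is hypothetical, `k=3` and `ratio=0.8` follow the worked example, and clipping the window at the start and end of the list is an assumption, since the source does not say how edge slots are handled.

```python
from typing import List, Tuple


def variation_band(normals: List[float], i: int, k: int = 3,
                   ratio: float = 0.8) -> Tuple[float, float]:
    """(MIN, MAX) for slot i: normal -/+ ratio * (max - min) over normals[i-k : i+k+1].

    The window is clipped at the ends of the list (an assumption).
    """
    window = normals[max(0, i - k): i + k + 1]
    variation = ratio * (max(window) - min(window))
    return normals[i] - variation, normals[i] + variation


# Worked example: preceding (75, 70, 60), current 70, following (80, 80, 65).
print(variation_band([75, 70, 60, 70, 80, 80, 65], 3))  # (54.0, 86.0)
```

A variable-ratio variant could be built the same way by dividing the variation by the normal value, but the source leaves that choice open.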

The thresholds described so far include both an upper and a lower limit. However, depending on the characteristics of the service or resource, the technical idea of the present invention can also be applied when a threshold with only one of the upper and lower limits is used.

FIG. 5 is a block diagram of an automatic monitoring system according to another embodiment of the present invention. For ease of understanding, FIG. 5 also shows the enterprise system 10 that is the target of automatic monitoring.

The automatic monitoring system 100 according to the embodiment of the present invention includes a collection unit 110, a relation-map storage unit 120, an analysis unit 130, a storage unit 140, and a monitoring screen generation unit 150.

The collection unit 110 collects service status information and resource status information and stores it in the storage unit 140. The collection unit 110 also determines whether a service or resource has failed through single-pattern analysis of each individual piece of collected information, and raises an alarm when a failure occurs.

In addition, the collection unit 110 filters the collected status information, standardizes/normalizes it, and then links/processes it to generate the information on which failure prediction analysis is based.

The relation-map storage unit 120 is a storage medium in which the relation map described above is stored. The stored relation map is updated whenever the enterprise system 10 changes.

The analysis unit 130 determines whether a service or resource has failed through complex-pattern analysis of multiple pieces of information stored in the storage unit 140, and raises an alarm when a failure occurs.

The analysis unit 130 also performs failure prediction by analyzing big data obtained by accumulating, over a long period, the base information generated by the collection unit 110.

The monitoring screen generation unit 150 generates a service/resource monitoring screen based on the collection results of the collection unit 110 and the analysis results of the analysis unit 130, and provides it to the operator.
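The division of work between the analysis unit and the collection unit can be sketched in memory as two cooperating classes. This is a simplified illustration, not the system's actual design: the class names are hypothetical, `history[w][s]` is assumed to hold the per-minute average for past week `w` and time slot `s`, the list `storage` merely stands in for storage unit 140, and the relation map and screen generation are omitted.

```python
from statistics import median
from typing import List, Tuple


class AnalysisUnit:
    """Hypothetical analysis unit (130): derives per-slot normal values and thresholds."""

    def __init__(self, k: int = 3, ratio: float = 0.8):
        self.k, self.ratio = k, ratio

    def thresholds(self, history: List[List[float]]) -> List[Tuple[float, float]]:
        # Normal value per slot: median of that slot across the past weeks.
        normals = [median(week[s] for week in history) for s in range(len(history[0]))]
        bands = []
        for i, n in enumerate(normals):
            window = normals[max(0, i - self.k): i + self.k + 1]
            v = self.ratio * (max(window) - min(window))  # variation value
            bands.append((n - v, n + v))
        return bands


class CollectionUnit:
    """Hypothetical collection unit (110): stores measurements and flags failures."""

    def __init__(self, bands: List[Tuple[float, float]]):
        self.bands = bands
        self.storage: List[Tuple[int, float]] = []  # stands in for storage unit 140

    def collect(self, slot: int, value: float) -> bool:
        self.storage.append((slot, value))
        lower, upper = self.bands[slot]
        return value < lower or value > upper  # True means raise an alarm


bands = AnalysisUnit().thresholds([[10, 20, 30], [12, 22, 28], [11, 21, 29]])
collector = CollectionUnit(bands)
print(collector.collect(1, 21), collector.collect(1, 50))  # False True
```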

Needless to say, the technical idea of the present invention can also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to the present embodiment. The technical idea according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or a program stored on a computer-readable recording medium may also be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described above. Various modifications can be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or prospect of the present invention.

100: Monitoring system
110: Collection unit 120: Relation-map storage unit
130: Analysis unit 140: Storage unit
150: Monitoring screen generation unit

Claims

A service monitoring method comprising:
generating a normal value using past measurement values measured while providing a service;
generating a threshold using the normal value; and
determining, using the threshold, whether a failure has occurred in providing the service,
wherein the normal value is a median of past measurement values,
each past measurement value is an average of the measurement values measured over an interval that includes a specific point in time, and
the generating of the threshold comprises:
selecting a maximum value and a minimum value from among the normal value, 'a specific number of normal values at points in time preceding the normal value', and 'a specific number of normal values at points in time following the normal value';
calculating a difference between the maximum value and the minimum value, and determining a specific ratio of the calculated difference as a variation value;
generating an upper threshold by adding the variation value to the normal value; and
generating a lower threshold by subtracting the variation value from the normal value.

(Deleted)

The service monitoring method according to claim 1, wherein the specific point in time is any one of a specific month, day, day of the week, hour, minute, and second.

(Deleted)

The service monitoring method according to claim 1, further comprising determining a failure symptom by comparing a measured-value pattern with a normal-value pattern.

A monitoring system comprising:
an analysis unit that generates a normal value using past measurement values measured while providing a service and generates a threshold using the normal value; and
a collection unit that collects measurement values and determines, by referring to the threshold, whether a failure has occurred in providing the service,
wherein the normal value is a median of past measurement values,
each past measurement value is an average of the measurement values measured over an interval that includes a specific point in time, and
the analysis unit:
selects a maximum value and a minimum value from among the normal value, 'a specific number of normal values at points in time preceding the normal value', and 'a specific number of normal values at points in time following the normal value';
calculates a difference between the maximum value and the minimum value, and determines a specific ratio of the calculated difference as a variation value;
generates an upper threshold by adding the variation value to the normal value; and
generates a lower threshold by subtracting the variation value from the normal value.