KR20100048738A

KR20100048738A - Method for classification and forecast of remote measuring power load patterns

Info

Publication number: KR20100048738A
Application number: KR1020080108023A
Authority: KR
Inventors: 신진호; 김영일; 이봉재; 양일권
Original assignee: 한국전력공사
Priority date: 2008-10-31
Filing date: 2008-10-31
Publication date: 2010-05-11
Also published as: CN101728868B; KR100987168B1; CN101728868A

Abstract

PURPOSE: A method for classifying and forecasting remotely measured power load patterns is provided that generates representative load patterns for each cluster, so that load characteristics and power usage behavior can be analyzed through the representative load patterns of customers. CONSTITUTION: Remote measurement data are collected (1), and errors and abnormal values are excluded from the sample data (2). The pre-processed data are normalized (3). The normalized data are clustered and representative load patterns are generated (4). The load patterns are classified based on the attribute information of measured facilities (7). A cluster is assigned based on the attributes of unmeasured facilities (9).

Description

METHOD FOR CLASSIFICATION AND FORECAST OF REMOTE MEASURING POWER LOAD PATTERNS

The present invention relates to a method for classifying and predicting remotely measured power load patterns.

Because of the nature of the power industry, power facilities and customer meters are scattered across the country, and remote measurement is carried out over wired and wireless communication networks. This includes monitoring of power quality problems such as facility overload and low voltage, control of power facilities for a stable power supply, and remote meter reading to acquire power consumption. As remote measurement of these facilities expands, technology for load profiles is being actively developed. In particular, various studies have attempted to use load analysis of remote meter reading data to improve tariff systems, innovate customer service and provide added value, establish sales strategies, develop power supply-and-demand and energy policies, and plan facility investment.

However, existing load profile analysis techniques distinguish only weekdays from holidays and build daily power consumption vectors that are analyzed by season and by year, so they identify only the overall load shape. In that case the load for a weekday at the start of a month and for one at the end of the month is generated identically, so it is impossible to analyze load patterns that change from day to day or to analyze or predict continuous loads by time slot. In addition, most load analysis stops at clustering to extract representative load patterns, and in prediction the mainstream is future forecasting that focuses on how an already measured load will change. In other words, techniques or methodologies for classification using the attributes of measured facilities and for predicting the load patterns of unmeasured facilities are hard to find.

The present invention is intended to solve the above problems and to provide a method for clustering and classifying continuous time-slot patterns of remotely measured power loads and for predicting the load patterns of unmeasured facilities.

The present invention relates to a method for classifying and predicting remotely measured power load patterns. In particular, it aims to cluster and classify data in which the load patterns of power facilities are periodically generated by remote measurement, such as remote meter reading or wireless transformer load monitoring, so that load characteristics can be analyzed and reflected in power supply and demand planning. It further aims to use the classified representative load patterns and facility attributes to predict the load patterns of power facilities that remain unmeasured because customers avoid installation, the measuring device is difficult to install, or the installation cost is burdensome, and thereby to reduce measurement costs.

To achieve the above object, the present invention applies large-volume data collection and processing techniques, temporal data mining techniques, advanced statistical processing techniques, and load simulation and analysis techniques. Representative load patterns are generated for each cluster by clustering continuous time-slot load patterns; classification is performed using the attribute information of remotely measured facilities together with the representative load patterns; and the attributes of unmeasured facilities are applied to the classification model so that their load patterns can be predicted.

The object of the present invention can be achieved by a method for classifying and predicting remotely measured power load patterns, comprising: a remote measurement data collection step; a data preprocessing step including extracting a stratified sample from the remote measurement data and excluding errors and outliers from the sample data; a normalization step of constructing vectors from the preprocessed data in units of a predetermined period to be analyzed and normalizing them so that the remotely measured values are distributed within a specific range; a clustering step of clustering the normalized data to generate representative load patterns; a step of classifying based on measured-facility attribute information; and a step of assigning a cluster based on unmeasured-facility attributes and predicting the load pattern of the unmeasured facility.

According to a preferred embodiment of the present invention, the clustering step performs cluster analysis by a reproducibility evaluation method in which the normalized data are divided into training data and test data, a value of k is input, a clustering model is applied to generate a cross-classification table, and the optimal value of k is determined.

According to a preferred embodiment of the present invention, the classification step may include classification with a decision tree structure using the clustering result and the measured-facility information.

According to a preferred embodiment of the present invention, for remote meter reading data the measured-facility attributes may include, as customer characteristic information, contract type, contract power, electricity usage purpose, industry classification, supply method, region, and monthly meter reading amount. For wireless transformer load monitoring data they may include, as transformer characteristic information, capacity, number of lighting customers, number of power customers, and load area characteristics, together with the contract type, contract power, electricity usage purpose, high/low voltage classification, and monthly usage of the customers supplied by the transformer.

According to a preferred embodiment of the present invention, the load pattern of an unmeasured facility may be predicted by applying the unmeasured-facility attributes, in the same form as the measured-facility attributes, to the decision tree produced by the classification model to assign a cluster, and then restoring the normalized representative load pattern of the assigned cluster to the original load amount.

As described above, according to the present invention, generating representative load patterns for each cluster by clustering continuous time-slot load patterns makes it possible, for remote meter reading data, to analyze load characteristics and power usage behavior through customers' representative load patterns. It also provides key knowledge for improving the accuracy of load forecasting, analyzing the sensitivity of each power demand group to external factors (weather, economic growth rate, and so on), simulating and improving pricing schemes, and providing differentiated customer services and establishing sales strategies according to load pattern. For wireless transformer load monitoring data, the characteristics of transformer load patterns can be analyzed, and the results can be used to establish transformer replacement criteria according to load characteristics and in lifetime assessment studies.

Classifying the clustered load patterns by facility attributes shows how load patterns are distributed across the existing facility categories. For remote meter reading data, load patterns are classified by the customer's contract type, contract power, industry classification, electricity usage purpose, monthly meter reading amount, and so on; for wireless transformer load monitoring data, they are classified by transformer capacity, regional characteristics, and the characteristic information of the customers supplied. This makes it possible to predict the load patterns of unmeasured power facilities, and it can also be used for continuous time-slot load forecasting in the future. The present invention is broadly applicable to load pattern data from remotely measured power facilities in general, including not only remote meter reading and wireless transformer load monitoring but also distribution automation and load monitoring of transmission and substation facilities.

FIG. 1 is a flowchart of the method for classifying and predicting remotely measured power load patterns according to the present invention.

Referring to FIG. 1, the present invention first links remote measurement data into a database or file in step 1 and performs preprocessing in step 2. Here, the remote measurement data may be customer remote meter reading data, wireless transformer load monitoring data, power quality monitoring data from automated switches, or load monitoring data from transmission and substation facilities. Preprocessing may include sampling and outlier exclusion. Sampling applies a stratified sampling technique so that different frequencies of the items of interest can be accommodated. If a large data set D is partitioned into mutually exclusive parts called strata, a stratified sample of D can be generated by drawing a simple random sample from each stratum. For example, for remote meter reading data the frequency of the data is skewed with respect to the items of interest, contract type and contract power, so the sample is drawn in proportion to the population frequencies.
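The stratified sampling described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `stratified_sample`, the proportional allocation with at least one record per stratum, and the made-up customer list are all assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw a simple random sample of `fraction` from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation; keep at least one record per stratum
        # (an assumption of this sketch, not stated in the text).
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: (customer_id, contract_type), heavily skewed.
customers = [(i, "residential") for i in range(90)]
customers += [(i, "industrial") for i in range(90, 100)]

picked = stratified_sample(customers, lambda c: c[1], fraction=0.2)
industrial = [c for c in picked if c[1] == "industrial"]
```

Because each stratum is sampled separately, the rare contract type keeps its share of the sample instead of being lost to chance.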

Errors and outliers in the collected data can greatly degrade the performance of cluster analysis, so preprocessing to clean the data is essential. For raw remote meter reading data, a day is excluded if its 96 readings are not all present or if the sum of its active power for the day is less than 1. The minimum reading in a 15-minute interval is 0.08, for the street-light contract type, so a daily total of 1 or less is regarded as not having been read normally. For outlier handling, the SOM (Self-Organizing feature Map) clustering algorithm is applied as a data cleaning technique. The map is configured as a 10 by 10 grid (100 clusters), and any cluster that contains one data object or fewer is regarded as an outlier and excluded.
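A minimal sketch of the two cleaning rules follows. It assumes the daily readings arrive as a list with `None` for missing values and that SOM cluster labels have already been produced by some SOM implementation (not shown). The names `is_valid_day` and `drop_sparse_clusters` are hypothetical, and the threshold is applied as "total less than 1 is dropped", which is one of the two wordings in the text.

```python
from collections import Counter

READINGS_PER_DAY = 96  # 24 hours of 15-minute readings

def is_valid_day(readings):
    """Keep a day only if all 96 readings exist and the daily total is at least 1."""
    if len(readings) != READINGS_PER_DAY:
        return False
    if any(r is None for r in readings):
        return False
    return sum(readings) >= 1

def drop_sparse_clusters(labels, min_size=2):
    """Return the indices of objects whose cluster has at least `min_size` members.

    `labels` holds one SOM cell label per object; objects that sit alone
    in a cell are treated as outliers and left out.
    """
    counts = Counter(labels)
    return [i for i, lab in enumerate(labels) if counts[lab] >= min_size]

kept = drop_sparse_clusters([0, 0, 1, 2, 2, 2])  # object 2 is alone in cell 1
```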

When preprocessing is complete, normalization is performed in step 3: the preprocessed data are assembled into vectors in units of the period to be analyzed, such as month, week, day, or hour, and normalized so that the remotely measured values are distributed within a specific range. For remote meter reading data, the unmeasured customers to be predicted are read monthly, so the 15-minute remote meter readings are assembled into one vector per customer per month. For wireless transformer load monitoring data, the 30-minute measurements are assembled into one vector per facility per month in order to analyze load pattern characteristics monthly and to predict unmeasured facilities. If the raw measurements were used as they are, clusters would form according to the distribution of power consumption rather than the shape of the load pattern. The vectors are therefore normalized so that the maximum value of each vector is 1.
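The max-normalization step can be illustrated with a short sketch. The function name `normalize_by_max` and the three-element vectors are illustrative assumptions; a real monthly vector would hold 96 readings per day for 15-minute data.

```python
def normalize_by_max(vector):
    """Scale a load vector so that its largest value becomes 1."""
    peak = max(vector)
    if peak <= 0:
        raise ValueError("vector has no positive reading")
    return [v / peak for v in vector]

# Two hypothetical customers with the same load shape at different scales.
small = normalize_by_max([2.0, 4.0, 1.0])
large = normalize_by_max([20.0, 40.0, 10.0])
```

After normalization both customers map to the same vector, so a clustering algorithm groups them by shape instead of by consumption level.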

Next, the normalized data are clustered in step 4, and the method proceeds to step 5 to generate representative load patterns.

Meanwhile, once cluster analysis has produced a number of clusters, a classification model for the classes is built in step 7. Classification serves as a descriptive tool for distinguishing objects of different classes and is used to predict the class labels of unknown records. That is, each cluster is classified according to the characteristics of the measured customers or facilities, and the class label is predicted when an unmeasured customer or facility is input.

At this point, the measured-facility attributes of step 6 may be supplied to the classification of step 7. In detail, for remote meter reading data the attributes may be customer characteristic information: contract type, contract power, electricity usage purpose, industry classification, supply method, region, and monthly meter reading amount. For wireless transformer load monitoring data they may be transformer characteristic information, namely capacity, number of lighting customers, number of power customers, and load area characteristics, together with the contract type, contract power, electricity usage purpose, high/low voltage classification, and monthly usage of the customers supplied by the transformer. If there are many attributes, entropy can be computed to reduce the set that is applied. Entropy is a measure of the purity of an interval, computed from the occurrence probabilities (proportions of values) of the classes in the interval. That is, if an interval contains only values belonging to one class (it is completely pure), its entropy is 0; if values of several classes occur equally often in an interval (it is as impure as possible), its entropy is at its maximum. Each attribute value therefore starts as its own interval, and adjacent intervals with similar statistical test results are merged to create larger intervals.
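The two boundary cases of entropy mentioned above can be checked with a small sketch. The text does not give the formula, so the usual Shannon entropy with a base-2 logarithm, H = sum of p * log2(1/p) over the classes in the interval, is assumed here; the function name `entropy` and the cluster labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of the class labels that fall in one interval."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())

pure = entropy(["cluster_1"] * 4)                                      # one class only
mixed = entropy(["cluster_1", "cluster_1", "cluster_2", "cluster_2"])  # two classes, equally frequent
```

A pure interval gives 0, and two equally frequent classes give 1 bit, the maximum for two classes.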

Classifiers that can be used for the classification of step 7 include decision trees, Bayesian classifiers, neural networks, SVMs (Support Vector Machines), and rule-based classifiers; in the present invention a decision tree is used on the basis of a performance evaluation. The performance of a classification model is evaluated using a confusion matrix expressed in terms of TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative), and the algorithm that yields the maximum accuracy, calculated as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

can be selected. A decision tree is an analysis method that performs classification and prediction by charting decision rules in a tree structure. Because the classification or prediction process is expressed as induction rules in a tree structure, this method has the advantage that the analyst can understand and explain the process more easily than with other methods.
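As a small illustration of selecting a classifier by accuracy, the sketch below computes accuracy as (TP + TN) / (TP + TN + FP + FN) from confusion-matrix counts and picks the candidate with the highest value. The candidate names and counts are invented placeholders for illustration only, not results from the patent.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified records in a confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical confusion-matrix counts (TP, TN, FP, FN) for two candidates.
candidates = {
    "classifier_a": (50, 40, 5, 5),
    "classifier_b": (45, 35, 10, 10),
}

# Select the candidate whose accuracy is largest.
best = max(candidates, key=lambda name: accuracy(*candidates[name]))
```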

The load pattern of an unmeasured facility is predicted by applying the unmeasured-facility attributes, in the same form as the measured-facility attributes of step 6, to the decision tree that results from the classification model of step 7, and assigning a cluster as in step 9. The assigned cluster (that is, its representative load pattern) is a normalized load pattern with values between 0 and 1, so it must be restored to the original load amount. When C is the representative load pattern (C_1, ..., C_k, ..., C_n) and T is the total monthly usage, the unmeasured-facility load pattern (10) for each time slot k, written here as L_k, is calculated as

$$L_k = T \cdot \frac{C_k}{\sum_{i=1}^{n} C_i}$$
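The restoration step can be sketched as below, under the assumption that the normalized pattern is rescaled so that its time-slot values sum to the monthly total T, that is, each slot receives T * C_k / (C_1 + ... + C_n). The function name `restore_load` and the three-slot pattern are illustrative only.

```python
def restore_load(pattern, monthly_total):
    """Rescale a normalized representative pattern so its values sum to the monthly total."""
    pattern_sum = sum(pattern)
    if pattern_sum <= 0:
        raise ValueError("representative pattern must contain positive values")
    return [monthly_total * c / pattern_sum for c in pattern]

# Hypothetical three-slot pattern and a monthly usage of 200 kWh.
predicted = restore_load([0.5, 1.0, 0.5], monthly_total=200.0)
```

Only the monthly total is needed for the unmeasured facility, which matches the monthly meter reading that such customers already have.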

FIG. 2 illustrates the clustering method for determining the number of clusters and generating representative load patterns according to the present invention. Determining the optimal number of clusters k is a very important factor because it strongly affects not only the clustering but also classification performance, and a heuristic methodology that draws on past experience should be applied. In the present invention the number of clusters is determined by a reproducibility evaluation method. A cluster analysis is said to be reproducible if clustering a new, independent data set generated by the same mechanism in the same way gives a result similar to the existing clustering. The reproducibility evaluation model uses the data partitioning technique employed in supervised learning, such as neural networks, decision tree classification, and regression analysis. Partitioning the data allows the same clustering method to be repeated, which makes the reproducibility evaluation possible.

Referring to FIG. 2, the normalized data of step S1 are randomly split into two parts in step S2: one is the training data (step S3) and the other is the test data (step S4). A simple way to split is to separate the remotely measured facilities into those with odd IDs and those with even IDs. Next, a value of k that is expected from experience to be meaningful is input (step S5), and the training data are applied to the clustering model (step S6). The clustering model is chosen in view of fast cluster construction on large data, user-driven choice of the number of clusters, and ease of applying the reproducibility measure that judges the suitability of the clusters; the present invention applies the k-means algorithm.
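The split and the k-means step can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: facility IDs are integers, distances are squared Euclidean, and the initial centroids are taken at evenly spaced positions in the input list, which is a simple deterministic choice made for this sketch and not something the text specifies. The names `split_by_id_parity` and `kmeans` are hypothetical.

```python
def split_by_id_parity(vectors_by_id):
    """Odd facility IDs become training data, even IDs become test data."""
    train = {i: v for i, v in vectors_by_id.items() if i % 2 == 1}
    test = {i: v for i, v in vectors_by_id.items() if i % 2 == 0}
    return train, test

def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    n = len(points)
    # Deterministic initialization: evenly spaced points (sketch assumption).
    centroids = [list(points[j * n // k]) for j in range(k)]
    labels = [0] * n
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical normalized vectors forming two well-separated shapes.
points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
centroids, labels = kmeans(points, k=2)
train, test = split_by_id_parity({1: points[0], 2: points[1], 3: points[3], 4: points[4]})
```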

Each object of the test data is then partitioned by applying it to the clustering model generated from the training data; that is, each test object is assigned to the cluster with the nearest centroid. The test data are also clustered in the same way to produce their own model, and each test object is assigned to one of that model's clusters. A cross-classification table is then generated from these two clusterings of the test data (S7), as described later with reference to FIG. 3. If the applied clustering is optimal, the rows and columns of this table will show strong correspondence; otherwise the correspondence will be weak.
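Building the cross-classification table can be sketched as follows, assuming both clustering models are represented by their centroid lists and that "nearest" means squared Euclidean distance. The function names and the tiny data set are illustrative assumptions.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )

def cross_table(test_points, train_centroids, test_centroids):
    """Rows: cluster under the training model; columns: cluster under the test model."""
    k = len(train_centroids)
    table = [[0] * k for _ in range(k)]
    for x in test_points:
        table[nearest(x, train_centroids)][nearest(x, test_centroids)] += 1
    return table

# Hypothetical test objects and two models that found the same clusters
# but numbered them in a different order.
test_points = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
train_centroids = [[0.0, 0.0], [1.0, 1.0]]
test_centroids = [[1.0, 1.0], [0.05, 0.05]]

table = cross_table(test_points, train_centroids, test_centroids)
```

Cluster numbering is arbitrary in each model, so strong correspondence shows up as one dominant cell per row, not necessarily on the diagonal.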

The method then proceeds to step S8, where the number and percentage of data that deviate from the main trend in the cross-classification table are obtained. If the cross-classification table generated for the input k is the first to yield the minimum off-trend percentage, that k is taken as the optimal value; otherwise the method returns to step S5, another value of k is input, and the cross-classification table is generated again. Once k has been determined, the normalized data before splitting (from step S1) are applied to the clustering model (S9). As a result, a representative load pattern is generated for each cluster (S10), as shown in FIG. 4 and described later.
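One possible reading of the k-selection loop is sketched below: try each candidate k in turn and keep the earliest one whose off-trend ratio is the smallest seen. The callable `off_trend_ratio` stands in for steps S5 to S8 (clustering both halves and reading the cross-classification table); here it is replaced by a lookup of made-up ratios, so the numbers are placeholders and not results from the patent.

```python
def choose_k(candidates, off_trend_ratio):
    """Return (k, ratio) for the first candidate that attains the minimum ratio."""
    best_k, best_ratio = None, float("inf")
    for k in candidates:
        ratio = off_trend_ratio(k)
        if ratio < best_ratio:  # strict comparison: ties keep the earlier k
            best_k, best_ratio = k, ratio
    return best_k, best_ratio

# Hypothetical off-trend ratios per k, standing in for steps S5 to S8.
hypothetical_ratios = {3: 0.20, 4: 0.10, 5: 0.10, 6: 0.15}

selected = choose_k([3, 4, 5, 6], hypothetical_ratios.get)
```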

FIG. 3 shows an example cross-classification table from the clustering of remote meter reading data.

Referring to FIG. 3, the table is the result of setting the number of clusters to 5 and cross-classifying the remote meter reading data by the training-data clusters and the test-data clusters. The shaded cells in FIG. 3 give the number of data belonging to the main trend, and the unshaded cells give the number of data deviating from it. Dividing the number of off-trend data by the total number of data gives the off-trend ratio.
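The off-trend ratio can be computed from a cross-classification table as sketched below. The figure's shading is not available in the text, so this sketch assumes that the main-trend cell of each row is simply its largest count; the 3 by 3 table is a made-up example, not the table of FIG. 3.

```python
def off_trend_ratio(table):
    """Share of objects that fall outside the dominant cell of their row."""
    total = sum(sum(row) for row in table)
    on_trend = sum(max(row) for row in table)  # assumption: row maximum = main trend
    return (total - on_trend) / total

# Hypothetical cross-classification table for k = 3 (150 test objects).
example = [
    [48, 2, 0],
    [3, 40, 7],
    [0, 5, 45],
]

ratio = off_trend_ratio(example)  # 17 of 150 objects are off the main trend
```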

FIG. 4 is an example graph showing the result of generating representative load patterns for remote meter reading customers.

Referring to FIG. 4, clustering was performed with k = 31, and the 31 representative load patterns are shown in 15-minute intervals over one particular day.

FIG. 5 shows an example decision tree for remote meter reading customers.

Referring to FIG. 5, part of the result generated by the decision tree classifier when customer characteristic information is applied to the clusters of remote meter reading data is shown. The customer characteristic information used here is contract type code, contract power, electricity usage purpose code, industry classification code, supply method code, region (dong) code, and monthly meter reading amount. The monthly meter reading amount can be seen to be an important classification criterion. When the characteristic information of an unmeasured customer is input, it is applied to this monthly decision tree to determine the cluster. For example, if in the input characteristic information the monthly meter reading amount (TOT_KWH) is greater than 1,426,536 kWh, the contract power (CNTR_PWR) is greater than 12,500 kW, the contract type code (CNTR_CD) is less than or equal to 228, and the industry classification code (INDU_CD) is greater than 67121, Cluster 23 is assigned.
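The example rule path can be written as a short sketch. Only the single branch quoted in the text is encoded; the rest of the tree in FIG. 5 is not reproduced here, so the function returns `None` for every other input. The function name `assign_cluster` and the sample customer values are hypothetical.

```python
def assign_cluster(tot_kwh, cntr_pwr, cntr_cd, indu_cd):
    """Follow the one decision path quoted in the text; other branches are unknown."""
    if (
        tot_kwh > 1_426_536      # monthly meter reading amount (kWh)
        and cntr_pwr > 12_500    # contract power (kW)
        and cntr_cd <= 228       # contract type code
        and indu_cd > 67121      # industry classification code
    ):
        return 23
    return None  # remaining branches of the tree are not given in the text

# Hypothetical unmeasured customer that satisfies every condition on the path.
cluster = assign_cluster(tot_kwh=1_500_000, cntr_pwr=13_000, cntr_cd=228, indu_cd=70000)
```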

FIG. 1 is a flowchart showing the method for classifying and predicting remotely measured power load patterns according to the present invention.

FIG. 2 is a flowchart showing the clustering method.

FIG. 3 is an example diagram showing a cross-classification table from the clustering of remote meter reading data.

FIG. 4 is an example diagram showing the result of generating representative load patterns for remote meter reading customers.

FIG. 5 is an example diagram showing a decision tree for remote meter reading customers.

Claims

A method for classifying and predicting remotely measured power load patterns, comprising:

a remote measurement data collection step;

a data preprocessing step including extracting a stratified sample from the remote measurement data and excluding errors and outliers from the sample data;

a normalization step of constructing vectors from the preprocessed data in units of a predetermined period to be analyzed and normalizing them so that the remotely measured values are distributed within a specific range;

a clustering step of clustering the normalized data to generate representative load patterns;

a step of classifying based on measured-facility attribute information; and

a step of assigning a cluster based on unmeasured-facility attributes and predicting the load pattern of the unmeasured facility.

The method of claim 1 for clustering remotely measured power load patterns, wherein the clustering step performs cluster analysis by a reproducibility evaluation method in which the normalized data are divided into training data and test data, a value of k is input, a clustering model is applied to generate a cross-classification table, and the optimal value of k is determined.

The method of claim 1, wherein the classification step includes classification with a decision tree structure using the clustering result and the measured-facility information.

The method of claim 3, wherein, for remote meter reading data, the measured-facility attributes include, as customer characteristic information, contract type, contract power, electricity usage purpose, industry classification, supply method, region, and monthly meter reading amount, and, for wireless transformer load monitoring data, the measured-facility attributes include, as transformer characteristic information, capacity, number of lighting customers, number of power customers, and load area characteristics, together with the contract type, contract power, electricity usage purpose, high/low voltage classification, and monthly usage of the customers supplied by the transformer.

The method of claim 1, wherein the load pattern of the unmeasured facility is predicted by applying the unmeasured-facility attributes, in the same form as the measured-facility attributes, to the decision tree produced by the classification model to assign a cluster, and restoring the normalized representative load pattern of the assigned cluster to the original load amount.