KR20020019997A

KR20020019997A - A data sorting method on spreadsheet using matrices

Info

Publication number: KR20020019997A
Application number: KR1020000052792A
Authority: KR
Inventors: 정석희; 한일
Original assignee: 민성기; 주식회사 타스테크
Priority date: 2000-09-06
Filing date: 2000-09-06
Publication date: 2002-03-14

Abstract

PURPOSE: A method for classifying data on a spreadsheet using a matrix is provided so that a user with no database knowledge can accurately and easily create a new classification, beyond the fields already defined in the database, through a visual environment called a matrix chart. CONSTITUTION: When analyzing a table having one or more text fields, the method comprises the steps of: listing the records of the original table with the field names displayed; selecting one text field name among the listed field names to use for classification; creating a scatter plot from the selected field using a matrix chart; setting up temporary classes based on the scatter plot; specifying the number and spacing of the temporary classes; creating the final classes from the number and spacing of the temporary classes; linking each final class to the temporary classes created in the matrix chart; specifying a new field name for the new classification in the original table; and creating a new field with the specified name in the original table and filling in its contents using the temporary and final classes.

Description

A data sorting method on spreadsheet using matrices

The present invention relates to a data classification method using a matrix. In particular, it relates to a method of creating a new classification field for data, that is, a method of classifying records that have numeric fields.

Prior art in this field includes IBM's US Patent No. 6,094,651, "Discovery-driven exploration of OLAP (On-line Analytical Processing) data cubes," filed on August 22, 1997 and granted on July 25, 2000.

That patent describes a method of finding data anomalies in a k-dimensional data cube. It comprises associating a surprise value with each cell of the data cube, the surprise value indicating the degree of anomaly of the cell's contents relative to the other cells of the data cube, and indicating a data anomaly when the surprise value associated with a cell exceeds a predetermined exception threshold.

That patent is intended for data mining and relates to automatic subspace clustering of high-dimensional data. With this prior art, however, it is very difficult for an ordinary person who does not know databases to derive new classifications from the data given in the course of work. As a result, users who lack database or statistical knowledge cannot classify data as they see fit.

Most other existing methods that use the matrix chart concept are carried out by drawing on paper. Another method in this field is statistical cluster analysis, which classifies records that include numeric fields. However, this method also has the drawback that the user must have statistical knowledge to understand and use the analysis results. In addition, it lacks flexibility because the user cannot specify the intervals and the number of intervals freely.

The present invention was devised to solve the above problems of the prior art. Its object is to provide a method that lets even a user who does not know databases easily and accurately create new classifications, beyond the fields already defined in the database, through a visual environment called a matrix chart, and that generates a new field so the classification can be carried over into further analyses.

FIG. 1 shows an example spreadsheet giving numerical distributions such as the number of households by district (gu) and administrative dong in Seoul.

FIG. 2 shows the starting step of the process of regrouping a text field.

FIG. 3 shows the process in which a new item list is created when the user enters a group name in the newly created item input box.

FIG. 4 shows the process of matching the existing items to be grouped to each new group item after the new group item list has been set up.

FIG. 5 shows the state after matching between the new group item list and the existing item list is complete.

FIG. 6 shows the screen output back to the spreadsheet after the text grouping process.

FIG. 7 shows an example of a spreadsheet with numeric data.

FIG. 8 shows the list of fields on a spreadsheet with numeric data.

FIG. 9 shows one user-selected numeric field rendered as a scatter plot using the Analyze button.

FIG. 10 shows the process of adding analysis lines with the value-move button and moving them into alignment.

FIG. 11 shows the initial step of grouping numeric data by characteristic.

FIG. 12 shows the screen after the new group names have been matched with the cell names classified on the scatter plot, linking each group name to the scatter-plot data values that belong to it.

FIG. 13 shows a spreadsheet reclassified under user-specified group names after the values of a single numeric field have been grouped.

FIG. 14 shows the spreadsheet that is the starting point for grouping two numeric fields.

FIG. 15 shows the screen in which two numeric fields have been selected from the field list on the matrix base sheet for analysis with the matrix chart.

FIG. 16 shows a scatter plot of the two selected fields plotted against each other.

FIG. 17 shows a screen in which additional analysis lines are created as the user wishes, improving the flexibility of the analysis.

FIG. 18 shows the process of aligning the added analysis lines with the distribution of the numeric field values.

FIG. 19 shows the initial screen for grouping, by name, the cells partitioned by analysis lines in the matrix chart into similar areas.

FIG. 20 shows a screen with the six newly defined group names and the matrix cells G7, G8, and G9 that belong to the group "Other regions".

FIG. 21 shows the data groups after classification, reclassified and output back to the spreadsheet.

FIG. 22 shows the spreadsheet that is the starting point of the analysis.

FIG. 23 shows the screen in which two fields to be plotted as a scatter plot have been selected.

FIG. 24 shows the screen in which two numeric fields are analyzed and rendered as a scatter plot.

FIG. 25 shows a screen in which the distribution of the data values is analyzed with diagonal lines instead of straight analysis lines along the x and y axes.

FIG. 26 shows an example of grouping on the matrix scatter plot using circles or free-form shapes according to where the data values are scattered.

FIG. 27 shows the step of entering new user-defined groups for the data items of the field to be classified.

FIG. 28 shows the grouping process through matching between group names and items.

FIG. 29 shows the spreadsheet output again, reclassified by the grouped lists.

FIG. 30 is a flowchart of the data classification method according to the present invention.

Methods of creating a new analysis field from data include a method of grouping text data, a method of grouping a single numeric field, and a method of grouping two numeric fields.

First, the method of grouping text data works as follows. Suppose the field extracted into the spreadsheet is an occupation field whose values are typically doctor, lawyer, pharmacist, office worker, civil servant, merchant, student, and homemaker. If, for a specific purpose, the user wants to regroup the field using its constituent data, doctors, lawyers, and pharmacists can be grouped as professionals, office workers and civil servants as salaried workers, merchants as the self-employed, and students and homemakers as others, so that the characteristics of each group can be analyzed on the spreadsheet.
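The occupation example above amounts to a lookup from existing items to user-defined group names, written into a new field. The following is a minimal Python sketch of that idea, not the patent's implementation; the function name `add_group_field`, the field names, and the sample records are all hypothetical.

```python
# Hypothetical mapping from existing occupation items to user-defined groups,
# following the example in the text.
OCCUPATION_GROUPS = {
    "doctor": "professionals",
    "lawyer": "professionals",
    "pharmacist": "professionals",
    "office worker": "salaried workers",
    "civil servant": "salaried workers",
    "merchant": "self-employed",
    "student": "others",
    "homemaker": "others",
}


def add_group_field(records, source_field, new_field, mapping, default="unassigned"):
    """Return copies of the records with a new field derived from a text field.

    Items missing from the mapping get the `default` label (an assumption;
    the source does not say how unmatched items are handled).
    """
    return [
        {**record, new_field: mapping.get(record[source_field], default)}
        for record in records
    ]


# Illustrative records only; not data from the patent.
records = [
    {"name": "A", "occupation": "doctor"},
    {"name": "B", "occupation": "merchant"},
    {"name": "C", "occupation": "student"},
]
grouped = add_group_field(records, "occupation", "occupation_group", OCCUPATION_GROUPS)
for row in grouped:
    print(row["name"], row["occupation_group"])
```

The original records are left untouched and the group is added as a separate field, which mirrors the idea of generating a new field rather than overwriting the existing one.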

The text-data grouping method is designed to make analysis easier by grouping the text data extracted into the spreadsheet according to the user's specific purpose. FIG. 1 shows numerical distributions such as the number of households by district and administrative dong in Seoul. The spreadsheet user can partition the constituent data by grouping the spreadsheet's text field under arbitrarily chosen group names and analyzing the result. When the constituent data are grouped, the numeric data linked to them are recalculated and partitioned according to the user's grouping field, so the user can quickly compare and analyze the numeric data for the grouped field.

FIG. 2 shows the starting step of regrouping a text field: it displays the items of text data held in the spreadsheet (corresponding to the group names in the middle of the drawing).

FIG. 3 shows the process in which the user specifies group names according to the characteristics to be entered in the new item list. That is, when the user enters a group name in the newly created item input box, a new item list is created.

FIG. 4 shows the process of matching the existing items to be grouped to each new group item after the new group names have been specified through the process of FIG. 3.

FIG. 5 shows the state in which the new group items specified during grouping, namely Other districts, Excellent districts, and General districts, have been matched with the items of the existing group.

FIG. 6 shows the screen output back to the spreadsheet after the text-data grouping is complete.
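Once existing items are matched to new groups, the numeric data linked to them can be recomputed per group. A minimal Python sketch of that recalculation follows, assuming a simple sum; the district names and household counts are placeholders rather than figures from the patent, and `totals_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def totals_by_group(rows, item_field, value_field, item_to_group):
    """Sum a numeric field over the user-defined groups of a text field."""
    totals = defaultdict(int)
    for row in rows:
        totals[item_to_group[row[item_field]]] += row[value_field]
    return dict(totals)


# Placeholder data for illustration only.
rows = [
    {"district": "District A", "households": 120},
    {"district": "District B", "households": 80},
    {"district": "District C", "households": 200},
    {"district": "District D", "households": 50},
]
# Matching of existing items to the new group names used in FIG. 5.
item_to_group = {
    "District A": "Excellent districts",
    "District B": "General districts",
    "District C": "Excellent districts",
    "District D": "Other districts",
}
print(totals_by_group(rows, "district", "households", item_to_group))
```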

Second, the method of grouping a single numeric field works as follows. Suppose the field extracted into the spreadsheet is sales, with values distributed from 1 to 1,000. If, for a specific purpose, the user wants to regroup the records using the sales values, 1 to 50 can be grouped as poor sales, 51 to 100 as average sales, 101 to 250 as good sales, 251 to 500 as excellent sales, and 501 or more as best sales, so that the characteristics of each group can be analyzed on the spreadsheet.
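The sales example can be read as binning on a sorted list of upper bounds. A minimal Python sketch using the standard `bisect` module is shown here; the label strings and the name `sales_group` are illustrative choices, not terms fixed by the patent.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four groups, from the example in the text.
UPPER_BOUNDS = [50, 100, 250, 500]
LABELS = ["poor sales", "average sales", "good sales", "excellent sales", "best sales"]


def sales_group(sales):
    """Return the group label for a sales value.

    bisect_left counts how many bounds are strictly below the value, so a
    value equal to a bound stays in the lower group (50 -> poor, 51 -> average).
    """
    return LABELS[bisect_left(UPPER_BOUNDS, sales)]


for value in (1, 50, 51, 250, 251, 501):
    print(value, sales_group(value))
```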

The single-numeric-field grouping method takes a spreadsheet field that holds numeric values and groups those values on the matrix sheet into groups with the same characteristics. FIG. 7 shows a spreadsheet with numeric data. From an ordinary spreadsheet, the user can obtain the desired scatter plot through the series of steps described below, and the resulting scatter plot is grouped according to the user's definitions and used for analysis. FIG. 8 shows the list of fields on the numeric spreadsheet. The user selects one numeric field from this list. FIG. 9 shows the single numeric field selected by the user rendered as a scatter plot using the Analyze button. At this point one analysis line, with a y value, is created. FIG. 10 shows the process of adding analysis lines with the value-move button and moving them into alignment. The numeric values specified by the user can thus be analyzed visually through the scatter plot, and by adding, moving, and aligning analysis lines the user can group the data and analyze the data that fall in particular value ranges. FIG. 11 shows the initial step of grouping the numeric data by characteristic: a screen in which the user, having chosen to divide the values into a group under 2.5, a group of 7 or more, and a group from 2.5 to 6.9, enters the new group names. FIG. 12 shows the screen after the new group names defined above have been matched with the cell names (G1, G2, and so on) classified on the scatter plot (the matrix cells not yet assigned in the drawing), linking each group name to the scatter-plot data values that belong to it. FIG. 13 shows the spreadsheet reclassified under the user-specified group names after the values of the single numeric field have been grouped. The user can reclassify numeric data under any desired group names and use the result for analysis.
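The sequence of FIGS. 9 to 12 (one initial analysis line, lines added and moved, cells named G1, G2, and so on, then cells matched to group names) can be sketched in Python as below. This is only an illustration under assumptions: the class `AnalysisLines`, the numbering of cells from the lowest interval upward, and the rule that a value lying exactly on a line belongs to the upper cell are not specified in the source.

```python
from bisect import bisect_right, insort


class AnalysisLines:
    """A sorted set of analysis-line positions on one axis (hypothetical helper)."""

    def __init__(self, positions=()):
        self.positions = sorted(positions)

    def add(self, position):
        """Add an analysis line, keeping the positions sorted."""
        insort(self.positions, position)

    def move(self, old, new):
        """Move an existing analysis line to a new value."""
        self.positions.remove(old)
        insort(self.positions, new)

    def cell(self, value):
        """Name of the cell containing the value: G1 is the lowest interval."""
        return f"G{bisect_right(self.positions, value) + 1}"


lines = AnalysisLines([5.0])  # one line is created initially
lines.add(7.0)                # the user adds a second line
lines.move(5.0, 2.5)          # and moves the first one to 2.5

# Matching of temporary cells to user-defined group names (as in FIG. 12).
cell_to_group = {"G1": "under 2.5", "G2": "2.5 to 6.9", "G3": "7 or more"}

values = [1.2, 2.5, 6.9, 7.0, 9.3]  # illustrative values only
groups = [cell_to_group[lines.cell(v)] for v in values]
print(groups)
```

Keeping the cell names separate from the final group names lets several cells share one group name, which the two-field case relies on.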

Third, the method of grouping two numeric fields works as follows. Suppose the fields extracted into the spreadsheet are sales and profit, with sales values distributed from 1 to 1,000 and profit values from 1 to 100. To regroup the records for a specific purpose, the user sets the reference line for the sales field at 250 and the reference line for the profit field at 50. Records with sales of 250 or less and profit of 50 or less are grouped as low in both sales and profit; records with sales of 251 or more and profit of 50 or less as high in sales but low in profit; records with sales of 250 or less and profit of 51 or more as low in sales but high in profit; and records with sales of 251 or more and profit of 51 or more as high in both sales and profit. The characteristics of each group can then be analyzed on the spreadsheet.
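With one reference line per field, the classification reduces to a quadrant test. A minimal Python sketch using the thresholds from the example follows; the function name and label wording are illustrative.

```python
SALES_LINE = 250   # reference line for the sales field (from the example)
PROFIT_LINE = 50   # reference line for the profit field (from the example)


def quadrant_group(sales, profit):
    """Classify a record by which side of each reference line it falls on.

    Values equal to a reference line count as "low", matching the
    "250 or less" and "50 or less" wording of the example.
    """
    high_sales = sales > SALES_LINE
    high_profit = profit > PROFIT_LINE
    if high_sales and high_profit:
        return "high sales, high profit"
    if high_sales:
        return "high sales, low profit"
    if high_profit:
        return "low sales, high profit"
    return "low sales, low profit"


# Illustrative (sales, profit) pairs only.
for sales, profit in [(250, 50), (251, 50), (250, 51), (800, 90)]:
    print(sales, profit, quadrant_group(sales, profit))
```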

The two-numeric-field grouping method is a matrix analysis technique in which two fields of the data extracted from the spreadsheet are selected so that the relationship between them can be analyzed visually and immediately. FIG. 14 shows the spreadsheet that is the starting point for grouping two numeric fields. FIG. 15 shows the screen in which two numeric fields have been selected from the field list on the matrix base sheet for analysis with the matrix chart. The user can select any two numeric fields as needed and analyze the relationship between them.

In this screen the user has selected two fields, number of housing units and average age. FIG. 16 shows the scatter plot of the two selected fields plotted against each other. Because two numeric fields are selected, one analysis line is set automatically on each of the x and y coordinates. FIG. 17 shows a screen in which additional analysis lines are created as the user wishes, improving the flexibility of the analysis. By adding analysis lines, the user can divide the numeric field values shown in the scatter plot into arbitrary intervals, and can move each analysis line according to the data values. FIG. 18 shows the process of aligning the added analysis lines with the distribution of the numeric field values. By aligning the analysis lines with the scatter of values, the user can group similar values on the matrix chart; the lines can also be aligned by entering numeric values through the "line move" function shown in FIG. 10. FIG. 19 shows the initial screen for grouping, by name, the cells partitioned by the analysis lines in the matrix chart into similar areas. The user enters the desired group names in the new-group input window and matches them with the unassigned matrix cells to form the groups. FIG. 20 shows the six newly defined group names and, for the group "Other regions", the matrix cells G7, G8, and G9 that it contains. Selecting a group name displays the matrix cells included in that group.
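With several analysis lines on each axis, the scatter plot is cut into a grid of matrix cells, and several cells can be matched to one group name, as with G7, G8, and G9 above. The Python sketch below shows one way to do this. The row-by-row cell numbering, the line positions, and the helper names are assumptions; the source does not state how cells are numbered.

```python
from bisect import bisect_right


def matrix_cell(x, y, x_lines, y_lines):
    """Name the grid cell containing (x, y).

    Cells are numbered row by row starting from the lowest y interval
    (an assumed convention): with two lines per axis, the bottom row is
    G1 to G3 and the top row is G7 to G9.
    """
    col = bisect_right(x_lines, x)
    row = bisect_right(y_lines, y)
    return f"G{row * (len(x_lines) + 1) + col + 1}"


# Hypothetical analysis-line positions for the two fields named in the text.
x_lines = [1000, 3000]  # number of housing units
y_lines = [30, 40]      # average age

# Several cells may be matched to the same user-defined group name.
cell_to_group = {"G7": "Other regions", "G8": "Other regions", "G9": "Other regions"}


def group_for(x, y):
    """Return the group name for a point, or the bare cell name if unmatched."""
    cell = matrix_cell(x, y, x_lines, y_lines)
    return cell_to_group.get(cell, cell)


print(group_for(500, 45))   # top-left cell, matched to "Other regions"
print(group_for(1500, 35))  # middle cell, not yet matched to a group
```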

The data groups produced by this classification process are output back to the spreadsheet in reclassified form, as in FIG. 21, giving the user the flexibility to analyze the data in whatever form is desired.

Fourth, the diagonal and free-form grouping method for numeric data distributions reclassifies into similar groups the data values plotted as a scatter plot on the matrix sheet by the single-field or two-field classification. In the single-field and two-field scatter-plot classifications, the x and y coordinates are used to set horizontal and vertical lines. In the diagonal and free-form method, in addition to analysis lines set from x and y coordinate values, the user visually checks the similarity of the plotted data values, groups them with a diagonal line or free-form shape of choice, and reclassifies the spreadsheet on the basis of the grouped values for analysis. For example, when the data are clustered closely in the center and on the left and right, the user can choose the diagonal analysis line from among the free-form classification tools and group the visually similar data with diagonal lines; the regions thus separated are then written back to the spreadsheet as a new classification and used for analysis.
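Grouping with a diagonal analysis line can be expressed as a test of which side of the line a point falls on. The Python sketch below uses the sign of a 2D cross product; the names G1 for points above the line and G2 for the rest are an assumed convention, and the sample points are illustrative.

```python
def side_of_line(point, a, b):
    """Return "G1" if the point lies above the line through a and b, else "G2".

    Assumes a is to the left of b (a[0] < b[0]), so a positive cross product
    means the point is above the line. Points exactly on the line go to "G2".
    """
    (px, py), (ax, ay), (bx, by) = point, a, b
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    return "G1" if cross > 0 else "G2"


# A diagonal analysis line from (0, 0) to (10, 10) and a few sample points.
line_start, line_end = (0, 0), (10, 10)
points = [(2, 8), (8, 2), (5, 9)]
labels = [side_of_line(p, line_start, line_end) for p in points]
print(labels)
```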

Likewise, when the plotted data values are distributed in a circular or free-form shape, or when the user wants to analyze a particular region in which the scattered values lie, the user selects a circular or free-region analysis line and marks the desired data values on the plot. The data values in the region of interest are then reclassified and grouped so that they can be analyzed on the spreadsheet.
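Selecting points with a circle or a free-form outline comes down to a containment test. The Python sketch below uses a distance check for the circle and the standard ray-casting test for a polygon approximating a free-form shape. The source does not say how the region test is performed, so both functions and the sample coordinates are illustrative assumptions.

```python
import math


def in_circle(point, center, radius):
    """True if the point lies inside or on a circular selection."""
    return math.dist(point, center) <= radius


def in_polygon(point, vertices):
    """Ray-casting test: True if the point lies inside the polygon.

    The polygon is a list of (x, y) vertices approximating a free-form
    outline. Points exactly on an edge may fall on either side.
    """
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal ray at height y matter.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Illustrative points and selections only.
points = [(1.0, 1.0), (3.0, 1.0), (2.0, 2.0), (5.0, 2.0)]
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

circle_labels = ["selected" if in_circle(p, (0.0, 0.0), 2.0) else "rest" for p in points]
polygon_labels = ["selected" if in_polygon(p, square) else "rest" for p in points]
print(circle_labels)
print(polygon_labels)
```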

FIG. 22 shows the spreadsheet that is the starting point of the analysis. FIG. 23 shows the screen in which two fields to be plotted as a scatter plot have been selected. The diagonal and free-form grouping method applies equally when only one numeric field is selected; here, a screen with two numeric fields selected is used as the example.

FIG. 24 shows the screen in which the two numeric fields are analyzed and rendered as a scatter plot. Because two numeric fields were selected, one analysis line was created automatically on each of the x and y axes. FIG. 25 shows a screen in which the distribution of the data values is analyzed with diagonal lines instead of straight analysis lines along the x and y axes. Depending on the shape of the data scatter, straight x and y analysis lines can be somewhat imprecise; diagonal lines overcome this limitation and improve the accuracy with which similar data are separated. Region names such as G1 and G2 are generated automatically: for the same x coordinate, the name depends on the height of the y value. FIG. 26 shows data values on the scatter plot being grouped, according to where they lie, with free-form shapes rather than straight or diagonal analysis lines. Depending on the data, straight and diagonal lines may not fit the shape of the distribution, in which case the free-form grouping method can be used to enclose the points. FIG. 27 shows the step of entering new user-defined groups for the data items of the field to be classified. FIG. 28 shows the grouping process through matching between group names and items. After these steps, the spreadsheet reclassified by the grouped lists is output again, as in FIG. 29, and can be used for analysis.

When classifying data, the text-data grouping method maps each text value to a new field name. The single-numeric-field grouping method creates a scatter plot, subdivides the data with lines at user-specified values, and maps each resulting line region to a new field name. The two-numeric-field grouping method creates a matrix scatter plot, subdivides the data with lines at user-specified values for each field, and maps each resulting region to a new field name.

FIG. 30 is a flowchart showing the data classification method according to the present invention as a whole.

When data are classified using the present invention, even a user unfamiliar with databases can easily classify data that include numeric fields. Moreover, because the user can set the classification criteria, with a suitable number of classes at suitable intervals, while viewing the scatter of the data visually on the matrix chart, the data can be classified optimally.

Furthermore, unlike existing analysis environments, which use only the fields already classified when the database was designed, the user can classify numeric fields at will. By generating the resulting classification as a new field and using it in derived analyses, the user can perform new analyses that were difficult in existing environments.

The present invention can be used widely in fields based on data classification technology, and is applicable to tools that analyze data on spreadsheets, such as EXCEL, statistical packages, and OLAP (On-line Analytical Processing) analysis.

Claims

A data classification and analysis method for classifying and analyzing records in an original table having a text data field already defined in a database, the method comprising:

creating text items into which the user arbitrarily groups data;

linking the records in the original table whose text data are to be classified and analyzed with the created text items; and

after linking the created text items with the records of the original table, comparing and analyzing the result against the spreadsheet.

A data classification method using a matrix, for classifying and analyzing records in an original table having one or more numeric fields already defined in a database, the method comprising:

listing the records in the original table with the field names displayed;

the user selecting, from among the listed field names, one or more numeric fields to use for classification;

creating a scatter plot of the user-selected fields using a matrix chart;

creating temporary classes (intervals) based on the scatter plot created using the matrix chart;

the user specifying the desired number of temporary classes (intervals) and the spacing of each temporary class (interval);

generating the final classes to be created from the specified number and spacing of the temporary classes (intervals);

linking each of the generated final classes with the temporary classes (intervals) generated in the matrix chart;

entering and specifying, for the original table, a new field name to be used for the new classification; and

creating a field in the original table with the new field name specified in the matrix chart, and filling in the contents of the field using the temporary classes and final classes.

The data classification method using a matrix of claim 2, wherein the number of temporary classes (sections) and the interval of each temporary class (section) are designated by dragging with a mouse or by entering a number.

The data classification method using a matrix of claim 2, wherein, when the user selects a field and then presses an analysis button, the analysis result is automatically drawn and displayed on the screen as the scatter plot of the selected field in the matrix chart.

The data classification method using a matrix of claim 2, wherein the records in the original table include statistical data and OLAP (On-line Analytical Processing) data.

The data classification method using a matrix of claim 2, wherein a GUI (Graphical User Interface) is used when the user designates the classification criteria.

The data classification method using a matrix of claim 2, wherein the user designates the classification criteria on the matrix cells using diagonal lines and free-form shapes.
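For illustration only, the numeric-field classification recited above (equal-width temporary sections, a link from each section to a final class, and a new field written back to the table) might be sketched in Python as follows. The names `make_sections`, `classify`, `sales`, and `sales_grade`, and the sample values, are hypothetical and do not appear in the specification; the claims do not prescribe any particular data structure.

```python
from bisect import bisect_right


def make_sections(start, interval, count):
    """Return the interior boundaries of `count` equal-width temporary sections."""
    return [start + interval * i for i in range(1, count)]


def classify(records, field, boundaries, section_to_class, new_field):
    """Return copies of the records with `new_field` set to the final class."""
    classified = []
    for record in records:
        # Index of the temporary section that the value falls into (0-based).
        section = bisect_right(boundaries, record[field])
        classified.append({**record, new_field: section_to_class[section]})
    return classified


# Hypothetical original table with one numeric field.
records = [
    {"name": "A", "sales": 5},
    {"name": "B", "sales": 17},
    {"name": "C", "sales": 29},
    {"name": "D", "sales": 34},
]

# Four temporary sections of width 10 starting at 0: boundaries 10, 20, 30.
boundaries = make_sections(start=0, interval=10, count=4)

# Several temporary sections may be linked to the same final class.
section_to_class = {0: "low", 1: "mid", 2: "mid", 3: "high"}

result = classify(records, "sales", boundaries, section_to_class, "sales_grade")
```

In this sketch the user-facing steps (dragging with the mouse or typing a number) reduce to choosing `interval` and `count`; the mapping from sections to final classes is kept separate so that it can be edited without recomputing the sections.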

A method for grouping character-type data, in which text data extracted into a spreadsheet is grouped according to the user's purpose using GIS, the method comprising:

loading the text data held by the spreadsheet;

selecting, from the loaded data, a data field to classify and extracting the names of its constituent data;

designating grouping names for the selected field;

matching the designated grouping names with the data items; and

re-outputting the spreadsheet after the designated grouping names have been matched with the data items.

The method for grouping character-type data of claim 8, further comprising quantifying the distribution, by administrative unit, of a selected item, the selected item being a list of items among the data items that can be designated for each administrative unit.
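As a rough sketch of the text-grouping steps above (extract the distinct names in a field, assign grouping names, match them to the data items, and output the table again), the following Python fragment may help. The field names `district` and `region`, the helper names, and the sample rows are assumptions made for the example; counting rows per district is only one possible reading of "quantifying the distribution by administrative unit".

```python
from collections import Counter


def distinct_values(rows, field):
    """Extract the distinct item names of `field`, in first-seen order."""
    seen = []
    for row in rows:
        if row[field] not in seen:
            seen.append(row[field])
    return seen


def apply_groups(rows, field, item_to_group, group_field, default="unassigned"):
    """Return copies of the rows with the grouping name added as a new field."""
    return [
        {**row, group_field: item_to_group.get(row[field], default)}
        for row in rows
    ]


# Hypothetical spreadsheet rows with one text field.
rows = [
    {"shop": "s1", "district": "Gangnam"},
    {"shop": "s2", "district": "Seocho"},
    {"shop": "s3", "district": "Mapo"},
    {"shop": "s4", "district": "Gangnam"},
]

items = distinct_values(rows, "district")

# Grouping names chosen by the user and matched to the extracted items.
item_to_group = {"Gangnam": "South", "Seocho": "South", "Mapo": "West"}
grouped = apply_groups(rows, "district", item_to_group, "region")

# Number of rows per administrative unit (assumed interpretation).
per_unit = Counter(row["district"] for row in rows)
```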

A method for grouping numeric single-field data, in which a single numeric field of data extracted into a spreadsheet is grouped according to the user's purpose using GIS, the method comprising:

loading the data on the spreadsheet;

selecting, by the user, one numeric field item from among the numeric field items;

displaying the numeric field item selected by the user as a scatter plot by means of an analysis button;

generating one basic analysis line when the scatter plot is displayed;

generating additional analysis lines by means of a value-move button;

moving and arranging the generated additional analysis lines;

generating the text items to be assigned;

linking the generated text items with the numeric data; and

re-outputting the spreadsheet after the text items and the numeric data have been linked.

The method for grouping numeric single-field data of claim 10, further comprising visually analyzing the numeric data values through the scatter plot.

The method for grouping numeric single-field data of claim 11, further comprising matching the new group names with the group names classified on the scatter plot, which are matrix cells not yet assigned, linking each group name with the scatter plot data values that belong to it so as to form groups, and then reclassifying the data under the group names designated by the user.
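One way to picture the analysis lines of the single-field method above is as a sorted list of cut positions on the value axis: a basic line exists from the start, further lines can be added and moved, and each interval between lines is linked to a text item. The Python sketch below assumes exactly that model; the class `AnalysisLines`, its methods, and the sample numbers are hypothetical and are not taken from the specification.

```python
from bisect import bisect_right


class AnalysisLines:
    """Sorted cut positions on one axis of the scatter plot."""

    def __init__(self, base):
        # One basic analysis line is created together with the plot.
        self.positions = [base]

    def add(self, position):
        """Create an additional analysis line at `position`."""
        self.positions.append(position)
        self.positions.sort()

    def move(self, old, new):
        """Move an existing analysis line from `old` to `new`."""
        self.positions.remove(old)
        self.add(new)

    def cell_of(self, value):
        """Index of the interval between lines that contains `value`."""
        return bisect_right(self.positions, value)


lines = AnalysisLines(base=50)
lines.add(80)        # additional line
lines.move(80, 90)   # user drags it to a new position

# Text items linked to the three intervals, in ascending order.
labels = ["low", "medium", "high"]

values = [12, 50, 75, 95]
groups = [labels[lines.cell_of(v)] for v in values]
```

A value that lies exactly on a line is placed in the interval above it here; the source does not say how such ties are resolved, so that choice is an assumption.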

A method for grouping numeric two-field data, in which two numeric fields of data extracted into a spreadsheet are grouped according to the user's purpose using GIS, the method comprising:

loading the data on the spreadsheet;

selecting two numeric fields from the list of fields that have been extracted from the matrix base sheet and are to be analyzed with the matrix chart;

classifying the two selected numeric field items according to their mutual correlation values and displaying them as a matrix scatter plot;

generating, when the scatter plot is displayed, one analysis line on each of the x and y coordinates corresponding to the respective fields;

generating additional analysis lines;

arranging the generated additional analysis lines; and

re-outputting the spreadsheet after the linking.

The method for grouping numeric two-field data of claim 13, further comprising visually analyzing the numeric data values through the scatter plot.

The method for grouping numeric two-field data of claim 14, further comprising: grouping the cells delimited by the analysis lines in the matrix chart by similar region, using text labels; and

entering, by the user, a desired group name and matching it with the items of the matrix cells not yet assigned so as to group them.
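For the two-field method above, the analysis lines on the x and y axes divide the scatter plot into rectangular matrix cells, and group names are attached to cells. A minimal Python sketch under that reading follows; the function `cell`, the field names `revenue` and `margin`, and all values are made up for the example.

```python
from bisect import bisect_right


def cell(x, y, x_lines, y_lines):
    """Return the (column, row) index of the matrix cell containing (x, y)."""
    return (bisect_right(x_lines, x), bisect_right(y_lines, y))


# One analysis line per axis gives a 2 x 2 matrix of cells.
x_lines = [100]   # line on the x axis (hypothetical field "revenue")
y_lines = [0.5]   # line on the y axis (hypothetical field "margin")

# Group names entered by the user; cells not listed remain unassigned.
cell_to_group = {(0, 0): "watch", (1, 1): "star"}

rows = [
    {"id": 1, "revenue": 40, "margin": 0.2},
    {"id": 2, "revenue": 150, "margin": 0.7},
    {"id": 3, "revenue": 150, "margin": 0.1},
]

labeled = [
    {
        **row,
        "segment": cell_to_group.get(
            cell(row["revenue"], row["margin"], x_lines, y_lines), "unassigned"
        ),
    }
    for row in rows
]
```

Adding further analysis lines only lengthens `x_lines` or `y_lines`; the cell lookup itself is unchanged.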

A method for diagonal and free-form grouping of a numeric data distribution, in which numeric field data extracted into a spreadsheet is grouped according to the user's purpose using GIS, the method comprising:

loading the data on the spreadsheet;

selecting one or two numeric fields;

analyzing the selected fields and displaying them as a matrix scatter plot;

generating a basic analysis line;

generating one diagonal/free-form analysis line on each coordinate axis corresponding to the number of numeric fields;

arranging the generated diagonal/free-form analysis lines;

linking the generated text items with the numeric data; and

re-outputting the spreadsheet after the linking.

The method for diagonal and free-form grouping of a numeric data distribution of claim 16, further comprising grouping the data values on the scatter plot in the form of a free-form shape according to the positions at which the data values are scattered.
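The claims do not state how membership is decided for a diagonal line or a free-form shape. As an assumption for illustration, a diagonal analysis line can be handled with a cross-product side test and a free-form shape can be approximated by a polygon with a ray-casting test, as in the Python sketch below. Both functions and the sample coordinates are hypothetical.

```python
def side_of_line(point, a, b):
    """Cross product sign: > 0 if `point` is left of the line a -> b, < 0 if right."""
    (px, py), (ax, ay), (bx, by) = point, a, b
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside the closed `polygon`."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal ray going right from the point?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


points = [(1, 3), (3, 1), (2, 2.5)]

# Diagonal analysis line y = x, drawn from (0, 0) to (4, 4).
above_diagonal = [p for p in points if side_of_line(p, (0, 0), (4, 4)) > 0]

# Free-form shape approximated by a triangle drawn on the scatter plot.
shape = [(0, 0), (4, 0), (0, 4)]
in_shape = [in_polygon(p, shape) for p in [(1, 1), (3, 3)]]
```

Points lying exactly on the line or on the polygon boundary are not treated specially in this sketch.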