KR101963821B1

KR101963821B1 - Method and apparatus for calculating similarity of program

Info

Publication number: KR101963821B1
Application number: KR1020170025717A
Authority: KR
Inventors: 김형식; 윤정무; 김기수; 이경준
Original assignee: 충남대학교산학협력단
Priority date: 2017-02-27
Filing date: 2017-02-27
Publication date: 2019-03-29
Also published as: KR20180098925A

Abstract

The present invention relates to a method of calculating program similarity, and an apparatus therefor, the method comprising: generating, by an identification number generation unit, an identification number corresponding to each function or each basic block by converting every function or every basic block included in each of two programs according to a preset condition; generating, by a characteristic bit vector generation unit, two characteristic bit vectors by setting, on two bit vectors that correspond to the two programs respectively and each contain a preset number of bits, the bit value of the bit corresponding to each identification number; and calculating, by a program similarity calculation unit, the similarity between the two programs by comparing the two characteristic bit vectors bit by bit at corresponding positions.

Description

METHOD AND APPARATUS FOR CALCULATING SIMILARITY OF PROGRAM

The present invention relates to a method and an apparatus for calculating the degree of similarity between two programs.

As computer programming and mobile communication technologies have advanced, users have become able to access a wide variety of computer programs easily over the Internet, and to download the programs they need and store them on terminals such as computers, notebooks, and smartphones.

While this Internet environment lets users download whatever programs they need, as described above, it has in return exposed them to forged or tampered programs, malicious code, and the like.

Conventionally, determining whether a particular program has been forged or tampered with, or whether a particular program is malicious code, has required directly comparing the functions or basic blocks contained in two programs one-to-one.

However, when the functions or basic blocks contained in two programs are compared directly one-to-one, comparing the two programs takes an excessive amount of time if the programs being compared are large.

In addition, in the above case the time taken to compare the programs function by function or basic block by basic block varies with the size of the two programs, so the total time required for the comparison cannot be predicted.

Korean Registered Patent Publication No. 10-0572660 (April 24, 2006)

An object of the present invention, devised to solve the above problems, is to generate an identification number corresponding to each function or each basic block included in each of two programs, to generate two characteristic bit vectors corresponding to the two programs by setting, on bit vectors of a preset size, the bit value of the bit corresponding to each identification number, and then to calculate the similarity between the two programs by comparing the two characteristic bit vectors with each other.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned here will be clearly understood by those skilled in the art from the description below.

To achieve the above object, a program similarity calculation method according to an embodiment of the present invention includes: generating, by an identification number generation unit, an identification number corresponding to each function or each basic block by converting every function or every basic block included in each of two programs according to a preset condition; generating, by a characteristic bit vector generation unit, two characteristic bit vectors by setting, on two bit vectors that correspond to the two programs respectively and each contain a preset number of bits, the bit value of the bit corresponding to each identification number; and calculating, by a program similarity calculation unit, the similarity between the two programs by comparing the two characteristic bit vectors bit by bit at corresponding positions.

For example, when each of the two programs is an executable program composed of machine code, the program similarity calculation method according to an embodiment of the present invention further includes, before the step of generating the identification numbers, converting, by a program conversion unit, each of the two executable programs into an assembly-language-based program or a high-level-language-based program composed of functions or basic blocks.

For example, the step of generating the identification numbers includes: applying a hash function to each function or each basic block to convert it into an arbitrary bit string; extracting, from the arbitrary bit string corresponding to each function or each basic block, a consecutive bit string consisting of a preset number of consecutive bits; and converting each extracted consecutive bit string into a decimal number to generate the identification number.
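The three sub-steps above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function name `identification_number`, the choice of SHA-256, the default of 16 extracted bits, and the `offset` parameter are all assumptions introduced here, since the text leaves the hash function and the number of extracted bits open.

```python
import hashlib


def identification_number(unit_text: str, num_bits: int = 16, offset: int = 0) -> int:
    """Map one function or basic block (given as text) to an identification number.

    Hypothetical helper: SHA-256, num_bits and offset are illustrative choices.
    """
    digest = hashlib.sha256(unit_text.encode("utf-8")).digest()
    # Sub-step 1: express the hash value as a bit string ('0'/'1' characters).
    bit_string = "".join(f"{byte:08b}" for byte in digest)
    if num_bits <= 0 or offset < 0 or offset + num_bits > len(bit_string):
        raise ValueError("requested bit range lies outside the hash value")
    # Sub-step 2: extract a preset number of consecutive bits.
    consecutive_bits = bit_string[offset:offset + num_bits]
    # Sub-step 3: read the extracted bits as a (decimal) integer.
    return int(consecutive_bits, 2)


if __name__ == "__main__":
    print(identification_number("mov ebp, esp"))
```

With `num_bits` extracted bits, every identification number falls in the range 0 to 2**num_bits - 1, which bounds the number of bit positions a bit vector indexed by these numbers would need.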

For example, the step of generating the two characteristic bit vectors includes setting the bit value of the bit corresponding to each identification number to '1'.

For example, the step of calculating the similarity between the two programs includes sequentially comparing the two characteristic bit vectors bit by bit at corresponding positions, and calculating the similarity between the two programs based on the number of bits whose bit values match.

According to one embodiment, the step of calculating the similarity between the two programs includes dividing the number of bit positions at which both corresponding bits are '1' by the number of bit positions at which at least one of the corresponding bits is '1'.
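The ratio described above (positions where both bits are '1', divided by positions where at least one bit is '1') is the Jaccard index of the two sets of set bits. A minimal sketch follows, assuming the bit vectors are given as equal-length Python lists of 0/1 values; the function name and the convention of returning 0.0 when neither vector has any bit set are assumptions made here, not taken from the text.

```python
def jaccard_similarity(vec_a, vec_b):
    """Similarity = (#positions where both are 1) / (#positions where at least one is 1)."""
    if len(vec_a) != len(vec_b):
        raise ValueError("bit vectors must have the same preset length")
    both_set = sum(1 for a, b in zip(vec_a, vec_b) if a == 1 and b == 1)
    either_set = sum(1 for a, b in zip(vec_a, vec_b) if a == 1 or b == 1)
    if either_set == 0:
        # Assumption: two all-zero vectors are treated as having similarity 0.0.
        return 0.0
    return both_set / either_set


if __name__ == "__main__":
    first = [1, 0, 1, 1, 0, 0, 1, 0]
    second = [1, 1, 0, 1, 0, 0, 1, 0]
    # Both are 1 at positions 0, 3, 6 (3 positions);
    # at least one is 1 at positions 0, 1, 2, 3, 6 (5 positions).
    print(jaccard_similarity(first, second))  # 0.6
```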

To achieve the above object, a program similarity calculation apparatus according to an embodiment of the present invention includes: an identification number generation unit that generates an identification number corresponding to each function or each basic block by converting every function or every basic block included in each of two programs according to a preset condition; a characteristic bit vector generation unit that generates two characteristic bit vectors by setting, on two bit vectors that correspond to the two programs respectively and each contain a preset number of bits, the bit value of the bit corresponding to each identification number; and a program similarity calculation unit that calculates the similarity between the two programs by comparing the two characteristic bit vectors bit by bit at corresponding positions.

For example, when each of the two programs is an executable program composed of machine code, the program similarity calculation apparatus according to an embodiment of the present invention further includes a program conversion unit that converts each of the two executable programs into an assembly-language-based program or a high-level-language-based program composed of functions or basic blocks.

According to one embodiment, the identification number generation unit applies a hash function to each function or each basic block to convert it into an arbitrary bit string, extracts from the arbitrary bit string corresponding to each function or each basic block a consecutive bit string consisting of a preset number of consecutive bits, and converts each extracted consecutive bit string into a decimal number to generate the identification number.

According to one embodiment, the characteristic bit vector generation unit sets the bit value of the bit corresponding to each identification number to '1'.

For example, the program similarity calculation unit sequentially compares the two characteristic bit vectors bit by bit at corresponding positions, and calculates the similarity between the two programs based on the number of bits whose bit values match.

For example, the program similarity calculation unit calculates the similarity between the two programs by dividing the number of bit positions at which both corresponding bits are '1' by the number of bit positions at which at least one of the corresponding bits is '1'.

According to an embodiment of the present invention, the similarity is calculated by representing all functions or all basic blocks included in each of the two programs as two characteristic bit vectors and comparing those vectors, so the similarity between the two programs can be calculated more quickly than by directly comparing all the functions or all the basic blocks.

Furthermore, according to an embodiment of the present invention, all functions or all basic blocks included in each of the two programs are represented as two characteristic bit vectors of a preset size, so the time required to calculate the similarity between the two programs can be predicted more accurately.

FIG. 1 is a block diagram for explaining a program similarity calculation apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining a program similarity calculation method according to an embodiment of the present invention.
FIG. 3 is a flowchart for explaining the step of generating an identification number corresponding to each function or each basic block in the program similarity calculation method according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining a method of generating an identification number corresponding to each function or each basic block in the program similarity calculation method and apparatus according to an embodiment of the present invention.
FIG. 5 is a diagram for explaining a method of generating two characteristic bit vectors in the program similarity calculation method and apparatus according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining a method of calculating the similarity between two programs based on two characteristic bit vectors in the program similarity calculation method and apparatus according to an embodiment of the present invention.

Hereinafter, the most preferred embodiments of the present invention will be described with reference to the accompanying drawings, in sufficient detail that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. First, in assigning reference numerals to the components of each drawing, it should be noted that the same components are given the same numerals wherever possible, even when they appear in different drawings. In addition, in describing the present invention, detailed descriptions of related known configurations or functions are omitted where they are judged likely to obscure the gist of the invention.

Hereinafter, a program similarity calculation method and apparatus according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

A program similarity calculation apparatus according to an embodiment of the present invention will now be described with reference to FIG. 1.

FIG. 1 is a block diagram for explaining a program similarity calculation apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the program similarity calculation apparatus (100) according to an embodiment of the present invention includes an identification number generation unit (110), a characteristic bit vector generation unit (120), and a program similarity calculation unit (130).

For example, although not shown in the drawings, the program similarity calculation apparatus (100) according to an embodiment of the present invention may further include a program conversion unit (not shown), but the present invention is not limited thereto.

Hereinafter, the first program (1) and the second program (2), whose mutual similarity the program similarity calculation apparatus (100) according to an embodiment of the present invention calculates, are defined and described as the two programs (1, 2).

For example, when each of the two programs (1, 2) is an executable program composed of machine code, the program conversion unit (not shown) converts each of the two executable programs into an assembly-language-based program or a high-level-language-based program composed of functions or basic blocks.

For example, when each of the two programs (1, 2) is an executable program composed of machine code expressed as "0"s and "1"s, the program conversion unit (not shown) may disassemble the machine code to convert each of the two executable programs into an assembly-language-based program, that is, a language in which functions or basic blocks can be distinguished.

For example, because the program similarity calculation apparatus (100) according to an embodiment of the present invention can calculate the similarity between the two programs (1, 2) on the basis of an assembly-language-based program or a high-level-language-based program, the operation of the program conversion unit (not shown) may be omitted when each of the two programs (1, 2) is already an assembly-language-based program.

For example, even when each of the two programs (1, 2) is already an assembly-language-based program, the program conversion unit (not shown) may further convert each of them into a high-level-language-based program, such as a C or JAVA program, but the present invention is not limited thereto.

According to one embodiment, even when each of the two programs (1, 2) is already a high-level-language-based program, the program conversion unit (not shown) may further convert each of them into an assembly-language-based program, but the present invention is not limited thereto.

That is, when each of the two programs (1, 2) is a machine-code-based program, the program conversion unit (not shown) may convert each of the two programs (1, 2) into an assembly-language-based program or a high-level-language-based program.

On the other hand, when each of the two programs (1, 2) is already an assembly-language-based program or a high-level-language-based program, the program conversion unit (not shown) may be omitted.

The identification number generation unit (110) generates an identification number corresponding to each function or each basic block by converting every function or every basic block included in each of the two programs according to a preset condition.

For example, each of the two programs (1, 2) may be an assembly-language-based program or a high-level-language-based program, that is, a program that can be divided into units of functions or basic blocks.

For example, when each of the two programs (1, 2) is written in C, a high-level language, each function included in the two programs (1, 2) may refer to a function together with both its name and its argument values, such as "printf("Hello, World!\n")".

For example, each basic block included in the two programs (1, 2) is an identification unit smaller than a function, obtained by dividing a function along its execution flow; it may refer to a straight-line code sequence with no incoming branch other than at its entry and no outgoing branch other than at its exit.

For example, when each of the two programs (1, 2) is written in C, and a particular function included in them contains the statement "if(a == 1)" where the variable a is defined to take the value 1 or 2, the basic blocks of that function may be the two basic blocks corresponding respectively to the case where a is 1 and the case where a is not 1 (that is, where a is 2).

For example, the identification number generation unit (110) applies a hash function to each function or each basic block to convert it into an arbitrary bit string, extracts from the arbitrary bit string corresponding to each function or each basic block a consecutive bit string consisting of a preset number of consecutive bits, and converts each extracted consecutive bit string into a decimal number to generate the identification number.

The characteristic bit vector generation unit (120) generates two characteristic bit vectors by setting, on two bit vectors that correspond to the two programs (1, 2) respectively and each contain a preset number of bits, the bit value of the bit corresponding to each identification number.

For example, each of the two bit vectors described above includes a plurality of bits, each occupying one position in the bit vector; a bit may have the bit value '1' if the bit corresponding to a particular identification number is used, and the bit value '0' if it is not used.

For example, each of the two bit vectors described above may be initialized so that every bit has the bit value '0'.

For example, the characteristic bit vector generation unit (120) may set the bit value of the bit corresponding to each identification number to '1'.

That is, the characteristic bit vector generation unit (120) may generate a characteristic bit vector by changing to '1' the bit value of each bit that corresponds to an identification number, on a bit vector of a preset size in which every bit has been initialized to '0'.
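The construction just described can be sketched as follows, assuming (as an illustration only) that an identification number is used directly as the index of its corresponding bit and that the vector is held as a Python list. The function name `characteristic_bit_vector` and its parameters are hypothetical.

```python
def characteristic_bit_vector(identification_numbers, size):
    """Build a characteristic bit vector of a preset size from identification numbers."""
    # Initialise every bit to 0.
    vector = [0] * size
    for number in identification_numbers:
        if not 0 <= number < size:
            raise ValueError(f"identification number {number} does not fit in {size} bits")
        # Mark the bit corresponding to this identification number as used.
        vector[number] = 1
    return vector


if __name__ == "__main__":
    # Duplicate identification numbers simply set the same bit again.
    print(characteristic_bit_vector([1, 4, 4, 6], size=8))  # [0, 1, 0, 0, 1, 0, 1, 0]
```

Because the vector length is fixed in advance, its size does not grow with the number of functions or basic blocks in the program; two different units that happen to receive the same identification number share one bit.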

The program similarity calculation unit (130) calculates the similarity between the two programs (1, 2) by comparing the two characteristic bit vectors bit by bit at corresponding positions.

For example, the program similarity calculation unit (130) may sequentially compare the two characteristic bit vectors bit by bit at corresponding positions, and calculate the similarity between the two programs (1, 2) based on the number of bits whose bit values match.
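One way to read "based on the number of bits whose bit values match" is sketched below. The text does not state how that count is turned into a similarity value, so normalising by the vector length is an assumption made here for illustration, as is the function name.

```python
def matching_bit_similarity(vec_a, vec_b):
    """Fraction of positions at which the two bit vectors hold the same bit value.

    Assumption: the match count is divided by the vector length; the source
    only says the similarity is based on the number of matching bits.
    """
    if len(vec_a) != len(vec_b):
        raise ValueError("bit vectors must have the same preset length")
    if not vec_a:
        raise ValueError("bit vectors must not be empty")
    # Sequentially compare corresponding bits and count the matches
    # (both 0 or both 1).
    matches = sum(1 for a, b in zip(vec_a, vec_b) if a == b)
    return matches / len(vec_a)


if __name__ == "__main__":
    first = [1, 0, 1, 1, 0, 0, 1, 0]
    second = [1, 1, 0, 1, 0, 0, 1, 0]
    # The vectors differ only at positions 1 and 2, so 6 of 8 positions match.
    print(matching_bit_similarity(first, second))  # 0.75
```

Note that this count also rewards positions where both bits are '0', so sparse vectors would score high under this reading even when they share few set bits.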

According to one embodiment, the program similarity calculation unit (130) may calculate the similarity between the two programs (1, 2) by dividing the number of bit positions at which both corresponding bits are '1' by the number of bit positions at which at least one of the corresponding bits is '1'.

That is, to calculate the similarity between the two programs (1, 2), the program similarity calculation unit (130) may compare with each other the two characteristic bit vectors that indicate the presence or absence of each function or each basic block included in the two programs (1, 2).

Each component of the program similarity calculation apparatus (100) according to an embodiment of the present invention is described in more detail below with reference to FIGS. 2 to 6, and redundant descriptions are omitted.

A program similarity calculation method according to an embodiment of the present invention will now be described with reference to FIG. 2.

FIG. 2 is a flowchart for explaining a program similarity calculation method according to an embodiment of the present invention.

As shown in FIG. 2, the program similarity calculation method according to an embodiment of the present invention includes a step of generating an identification number corresponding to each function or each basic block (S210), a step of generating two characteristic bit vectors by setting the bit value of the bit corresponding to each identification number (S220), and a step of calculating the similarity between the two programs (S230).

According to one embodiment, although not shown in the drawings, the program similarity calculation method according to an embodiment of the present invention may further include a program conversion step (not shown).

For example, the program conversion step (not shown) may be performed before step S210 when each of the two programs (1, 2) whose similarity is to be calculated is an executable program composed of machine code.

This is because machine code consists of "0"s and "1"s, whereas step S210, which generates an identification number corresponding to each function or each basic block included in the two programs (1, 2), can be performed only when each of the two programs (1, 2) is expressed in terms of functions or basic blocks.

For example, when each of the two programs (1, 2) is based on assembly language or on a high-level language, the program conversion step (not shown) may be omitted.

More specifically, the program conversion step (not shown) may refer to a step in which the program conversion unit (not shown) converts each of the two executable programs into an assembly-language-based program or a high-level-language-based program composed of functions or basic blocks.

Step S210 may refer to a step in which the identification number generation unit (110) generates an identification number corresponding to each function or each basic block by converting every function or every basic block included in each of the two programs (1, 2) according to a preset condition.

For example, as a result of step S210, every function or every basic block included in each of the two programs (1, 2) is converted into its corresponding identification number, so that each of the two programs (1, 2) can be represented as a set of identification numbers.

Step S210 will now be described with reference to FIGS. 3 and 4 together.

FIG. 3 is a flowchart for explaining the step of generating an identification number corresponding to each function or each basic block in the program similarity calculation method according to an embodiment of the present invention.

FIG. 4 is a diagram for explaining a method of generating an identification number corresponding to each function or each basic block in the program similarity calculation method and apparatus according to an embodiment of the present invention.

As shown in FIG. 3, step S210 includes a step of converting each function or each basic block into an arbitrary bit string (S211), a step of extracting a preset number of consecutive bits from each arbitrary bit string (S212), and a step of generating an identification number by converting each consecutive bit string into a decimal number (S213).

Step S211 may refer to a step of applying a hash function to each function or each basic block to convert it into an arbitrary bit string.

예컨대, 해시함수의 출력값은 그 크기가 일정하되 임의의 값을 가지며, 해시함수에 동일한 입력값을 넣는 경우, 항상 같은 출력값이 나오지만 출력값만을 이용해서 입력값을 역산할 수는 없게 된다.For example, the output value of the hash function is constant in size but has an arbitrary value. When the same input value is input to the hash function, the same output value always appears, but the input value can not be inversely calculated using only the output value.

이때, 해시함수는 메시지 다이제스트(Message Digest), 시큐어 해시 알고리즘(Secure Hash Algorithm), HAS-160 및 SHA-256을 비롯한 각종 해시함수가 사용될 수 있으며, 본 발명은 특정 해시함수의 종류에 한정되지 않는다.At this time, various hash functions including a message digest, a secure hash algorithm, HAS-160, and SHA-256 may be used as the hash function, and the present invention is not limited to a specific hash function type .

As shown in FIG. 4, when a hash function is applied in step S211 to a specific function or a specific basic block included in the first program 1 or the second program 2, a hash value corresponding to that function or basic block may be generated, and the generated hash value is defined as an arbitrary bit string.

For example, suppose that the first program 1 is an assembly-language-based program and that the hash function is applied in step S211 to the specific function "mov ebp, esp", which comprises a function name and argument values. A hash value such as "0101010110101101101011010101010110110100110101010101010101011010" may then be generated as the result of step S211, and this hash value may be referred to as an arbitrary bit string.

If the hash function is applied in this way to all functions or all basic blocks included in each of the two programs 1 and 2, each of those functions or basic blocks can be expressed as an arbitrary bit string.
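Step S211 can be sketched as follows. This is a minimal illustration only, assuming SHA-256 as the hash function and the UTF-8 text of a function or basic block as its input; the name `to_bit_string` is hypothetical and does not appear in the specification. Because a real digest is used, the resulting bits will differ from the illustrative 64-bit string given above.

```python
import hashlib


def to_bit_string(unit_text: str) -> str:
    """Hash one function or basic block and return the digest as '0'/'1' characters.

    Assumption: SHA-256 over the UTF-8 text. The specification allows any hash function.
    """
    digest = hashlib.sha256(unit_text.encode("utf-8")).digest()
    # Render each digest byte as exactly eight binary digits.
    return "".join(f"{byte:08b}" for byte in digest)


bits = to_bit_string("mov ebp, esp")
print(len(bits))  # 256, because SHA-256 produces a 256-bit digest
```

The same input always produces the same bit string, which is the property the later steps rely on.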

Step S212 may refer to a step of extracting, from each arbitrary bit string corresponding to each of all functions or all basic blocks, a consecutive bit string consisting of a predetermined number of consecutive bits.

For example, as shown in FIG. 4, when the first bit "0" of the arbitrary bit string "0101010110101101101011010101010110110100110101010101010101011010" is defined as the 0th bit, extracting the 16 consecutive bits from the 24th bit to the 39th bit in step S212 yields the consecutive bit string "0101010110110100".

The predetermined number of consecutive bits extracted in step S212 may be set in advance and stored in the identification number generation unit 110, and the present invention is not limited to extracting a particular number of bits.

According to one embodiment, in step S212 the identification number generation unit 110 may extract a predetermined number of non-consecutive bits instead of a predetermined number of consecutive bits; the present invention is not limited to extracting consecutive bits.

Step S213 may refer to a step of converting each extracted consecutive bit string into a decimal number to generate an identification number.

For example, as shown in FIG. 4, the decimal value of the consecutive bit string "0101010110110100" is 21,940, so the identification number generated for the specific function "mov ebp, esp" through steps S211, S212, and S213 is 21,940.
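The worked example for steps S212 and S213 can be checked with a short sketch. The function name `extract_identification_number` and its default window (start index 24, length 16) are assumptions taken from this example only; the specification leaves the window position and size as predetermined values.

```python
def extract_identification_number(bit_string: str, start: int = 24, length: int = 16) -> int:
    """Take `length` consecutive bits starting at 0-based index `start` (S212)
    and read them as an unsigned binary number (S213)."""
    chunk = bit_string[start:start + length]
    if len(chunk) != length:
        raise ValueError("bit string is too short for the requested window")
    return int(chunk, 2)


# The illustrative 64-bit string from the description.
example_bits = "0101010110101101101011010101010110110100110101010101010101011010"
example_id = extract_identification_number(example_bits)
print(example_id)  # 21940, matching the value given in the text
```

With a 16-bit window, every identification number falls in the range 0 to 65,535.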

In other words, applying steps S211, S212, and S213 to each of all functions or all basic blocks included in each of the two programs 1 and 2 allows each of the two programs to be expressed as a plurality of identification numbers.

Step S220 will now be described with reference to FIGS. 2 and 5 together.

FIG. 5 is a diagram illustrating a method of generating two characteristic bit vectors in the program similarity calculation method and apparatus according to an embodiment of the present invention.

Step S220 may refer to a step in which the characteristic bit vector generation unit 120 generates two characteristic bit vectors by setting, on two bit vectors that correspond respectively to the two programs 1 and 2 and each contain a predetermined number of bits, the bit value of each bit corresponding to an identification number.

For example, the two bit vectors may initially have all of their bits set to '0'.

As shown in FIG. 5, in the example above the identification number 21,940 was generated through steps S211, S212, and S213 for the function "mov ebp, esp" included in the first program 1. In step S220, the characteristic bit vector generation unit 120 may therefore set the 21,940th bit of the bit vector corresponding to the first program 1 to '1'.

When, in step S220, the characteristic bit vector generation unit 120 sets the bit corresponding to the identification number of each of all functions or all basic blocks included in the first program 1, a first characteristic bit vector (not shown) corresponding to the first program 1 may be generated.

Likewise, when the characteristic bit vector generation unit 120 sets, in step S220, the bit corresponding to the identification number of each of all functions or all basic blocks included in the second program 2, a second characteristic bit vector (not shown) corresponding to the second program 2 may be generated.

As a result of step S220, the first program 1 may be converted, based on the identification numbers of its functions or basic blocks, into a first characteristic bit vector (not shown) indicating whether each function or basic block is present in the first program 1. The second program 2 may likewise be converted into a second characteristic bit vector (not shown).

In other words, step S220 may include setting the bit value of the bit corresponding to each identification number to '1'.
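A minimal sketch of step S220 is given below. The function name `build_characteristic_vector` is hypothetical, and the vector sizes used (10 bits for a toy case, 2**16 bits for 16-bit identification numbers) are assumptions; the specification only states that the number of bits is predetermined.

```python
def build_characteristic_vector(identification_numbers, size):
    """Return a list of `size` bits with a 1 at every index named by an identification number."""
    vector = [0] * size  # all bits are initialized to 0
    for number in identification_numbers:
        if not 0 <= number < size:
            raise ValueError(f"identification number {number} does not fit in {size} bits")
        vector[number] = 1  # repeated identification numbers simply set the same bit again
    return vector


# Toy case: a program whose units map to identification numbers 1, 2, 5 and 7.
vector_a = build_characteristic_vector([1, 2, 5, 7], size=10)
print("".join(map(str, vector_a)))  # 0110010100

# The "mov ebp, esp" example: bit 21,940 of a 65,536-bit vector.
vector_mov = build_characteristic_vector([21940], size=2 ** 16)
```

Note that two different functions whose identification numbers coincide set the same bit, so the vector records presence rather than a count.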

Step S230 will now be described with reference to FIGS. 2 and 6 together.

Step S230 may refer to a step in which the program similarity calculation unit 130 calculates the similarity between the two programs by comparing the two characteristic bit vectors bit by bit at corresponding positions.

As shown in FIG. 6, to calculate the similarity between the first program (1, program A) and the second program (2, program B), the program similarity calculation unit 130 may, in step S230, compare each bit of the first characteristic bit vector (not shown) corresponding to the first program (1, program A) with the corresponding bit of the second characteristic bit vector (not shown) corresponding to the second program (2, program B).

For example, the first characteristic bit vector (not shown) may refer to the characteristic bit vector drawn in the region indicated by the arrow to the right of the first program (1, program A).

Likewise, the second characteristic bit vector (not shown) may refer to the characteristic bit vector drawn in the region indicated by the arrow to the right of the second program (2, program B).

Before step S230 is described in more detail, the meaning of the first characteristic bit vector (not shown) and the second characteristic bit vector (not shown) is explained with reference to FIG. 6.

For example, assume that the first characteristic bit vector (not shown), drawn in the region indicated by the arrow to the right of the first program (1, program A), has a total of 10 bits, from the 0th bit to the 9th bit.

In this case, the bit values of the first characteristic bit vector (not shown) are '0', '1', '1', '0', '0', '1', '0', '1', '0', and '0' for the 0th through 9th bits respectively, so the vector can be written as '0110010100'.

This first characteristic bit vector (not shown) means that the first program (1, program A) does not include a function or basic block corresponding to any of the identification numbers 0, 3, 4, 6, 8, and 9, and does include a function or basic block corresponding to each of the identification numbers 1, 2, 5, and 7.

Similarly, assume that the second characteristic bit vector (not shown), drawn in the region indicated by the arrow to the right of the second program (2, program B), has a total of 10 bits, from the 0th bit to the 9th bit.

In this case, the bit values of the second characteristic bit vector (not shown) are '0', '1', '1', '0', '1', '1', '0', '0', '1', and '0' for the 0th through 9th bits respectively, so the vector can be written as '0110110010'.

This second characteristic bit vector (not shown) means that the second program (2, program B) does not include a function or basic block corresponding to any of the identification numbers 0, 3, 6, 7, and 9, and does include a function or basic block corresponding to each of the identification numbers 1, 2, 4, 5, and 8.

For example, step S230 may include sequentially comparing the two characteristic bit vectors bit by bit at corresponding positions and calculating the similarity between the two programs 1 and 2 based on the number of bits whose values match.

A first embodiment of step S230 will now be described with reference to the example of FIG. 6 above.

In that example, the first characteristic bit vector (not shown) is '0110010100' and the second characteristic bit vector (not shown) is '0110110010'. In step S230, the program similarity calculation unit 130 may calculate the similarity between the two programs 1 and 2 by comparing the bits at the same positions in the two vectors.

Comparing the first characteristic bit vector (not shown) and the second characteristic bit vector (not shown) in this example shows that the 0th, 1st, 2nd, 3rd, 5th, 6th, and 9th bits are identical and that the 4th, 7th, and 8th bits differ.

In this case, in step S230 the program similarity calculation unit 130 may calculate the similarity between the first characteristic bit vector (not shown) and the second characteristic bit vector (not shown) based on the ratio of the number of identical bits, 7, to the total number of bits, 10. The similarity between the first program (1, program A) and the second program (2, program B) is obtained as a result.

In this embodiment, the similarity between the first program (1, program A) and the second program (2, program B) may be 7/10, that is, 70%, but the present invention is not limited thereto.
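The first embodiment can be reproduced with a few lines of code. This is a sketch only: the function name `matching_bit_similarity` is hypothetical, and the vectors are passed as strings of '0' and '1' for readability.

```python
def matching_bit_similarity(a_bits: str, b_bits: str) -> float:
    """Fraction of positions at which the two characteristic bit vectors hold the same value."""
    if len(a_bits) != len(b_bits):
        raise ValueError("characteristic bit vectors must have the same length")
    same = sum(1 for x, y in zip(a_bits, b_bits) if x == y)
    return same / len(a_bits)


similarity = matching_bit_similarity("0110010100", "0110110010")
print(similarity)  # 0.7, i.e. 7 matching bits out of 10
```

Positions where both bits are '0' count as matches under this measure, so two sparse vectors can score highly even when they share few set bits.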

Alternatively, step S230 may include calculating the similarity between the two programs 1 and 2 by dividing the number of bit positions at which both corresponding bits are '1' by the number of bit positions at which at least one of the corresponding bits is '1'.

A second embodiment of step S230 will now be described with reference again to the example of FIG. 6.

In the example above, the bits that are '1' in both the first characteristic bit vector (not shown) and the second characteristic bit vector (not shown) are the 1st, 2nd, and 5th bits, so the number of positions at which both corresponding bits are '1' is 3.

In the same example, the bits that are '1' in at least one of the first characteristic bit vector (not shown) and the second characteristic bit vector (not shown) are the 1st, 2nd, 4th, 5th, 7th, and 8th bits, so the number of positions at which at least one of the corresponding bits is '1' is 6.

As a result, the similarity between the first characteristic bit vector (not shown) and the second characteristic bit vector (not shown) may be 3/6, and the similarity between the first program 1 and the second program 2 may likewise be 3/6.

The similarity calculation method according to the second embodiment of the present invention may be regarded as a kind of modified Jaccard index, in which the number of bits set to '1' in both the first and second characteristic bit vectors (the intersection) is divided by the number of bits set to '1' in at least one of them (the union); however, the present invention is not limited thereto.
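The second embodiment can be sketched in the same way. The function name `jaccard_similarity` is hypothetical, and returning 0.0 when neither vector has any bit set is an assumption of this sketch; the specification does not address that case.

```python
def jaccard_similarity(a_bits: str, b_bits: str) -> float:
    """Bits set in both vectors divided by bits set in at least one vector."""
    if len(a_bits) != len(b_bits):
        raise ValueError("characteristic bit vectors must have the same length")
    both = sum(1 for x, y in zip(a_bits, b_bits) if x == "1" and y == "1")
    either = sum(1 for x, y in zip(a_bits, b_bits) if x == "1" or y == "1")
    if either == 0:
        return 0.0  # assumption: no set bits at all is treated as zero similarity
    return both / either


similarity = jaccard_similarity("0110010100", "0110110010")
print(similarity)  # 0.5, i.e. 3 shared set bits out of 6 set in either vector
```

Unlike the matching-bit ratio, this measure ignores positions where both bits are '0', so it is not inflated by functions that neither program contains.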

For example, in step S230 the program similarity calculation unit 130 may calculate the similarity between the first program 1 and the second program 2 using various similarity calculation methods other than the two described above, and the present invention is not limited to a particular similarity calculation method.

According to the program similarity calculation method and apparatus of the embodiments of the present invention, the first program 1 and the second program 2 are represented by characteristic bit vectors of the same size regardless of the actual sizes of the programs, so the time required to calculate the similarity can be predicted accurately. Furthermore, the similarity between the first program 1 and the second program 2 is calculated not by directly comparing the functions or basic blocks included in each program but by comparing the first characteristic bit vector (not shown) and the second characteristic bit vector (not shown), which indicate whether specific functions or basic blocks are present, so the similarity between the two programs 1 and 2 can be calculated more quickly.
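One way to see why a fixed-size vector makes the comparison cost predictable is to pack each vector into a single integer and compare with bitwise operations. The sketch below is an illustration under that assumption, not the implementation described in the specification; the function name `count_matching_bits` is hypothetical.

```python
def count_matching_bits(a_bits: str, b_bits: str) -> int:
    """Count equal positions using one XOR over integer-packed bit vectors."""
    if len(a_bits) != len(b_bits):
        raise ValueError("characteristic bit vectors must have the same length")
    a = int(a_bits, 2)
    b = int(b_bits, 2)
    # XOR leaves a 1 exactly where the two vectors differ.
    differing = bin(a ^ b).count("1")
    return len(a_bits) - differing


print(count_matching_bits("0110010100", "0110110010"))  # 7
```

The amount of work depends only on the vector length, not on how many functions or basic blocks the original programs contain.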

Although preferred embodiments of the present invention have been described above, various modifications are possible, and it is understood that those of ordinary skill in the art can practice various variations and modifications without departing from the scope of the claims of the present invention.

1: first program
2: second program
100: program similarity calculation apparatus
110: identification number generation unit
120: characteristic bit vector generation unit
130: program similarity calculation unit

Claims

A program similarity calculation method comprising:
generating, by an identification number generation unit, an identification number corresponding to each of all functions or all basic blocks included in each of two programs by converting each of the functions or basic blocks according to a predetermined condition;
generating, by a characteristic bit vector generation unit, two characteristic bit vectors by setting to '1', on two bit vectors that correspond respectively to the two programs and each include a predetermined number of bits, the bit value of each bit corresponding to one of the identification numbers, and setting the bit values of the remaining bits to '0'; and
calculating, by a program similarity calculation unit, a similarity between the two programs by comparing the two characteristic bit vectors bit by bit at corresponding positions,
wherein generating the identification number comprises:
converting each of the functions or basic blocks into an arbitrary bit string by applying a hash function to each of the functions or basic blocks;
extracting, from each arbitrary bit string corresponding to each of the functions or basic blocks, a consecutive bit string consisting of a predetermined number of consecutive bits; and
generating the identification number by converting each extracted consecutive bit string into a decimal number, and
wherein calculating the similarity between the two programs comprises sequentially comparing the two characteristic bit vectors bit by bit at corresponding positions and calculating the similarity between the two programs based on the number of bits having matching bit values.

The method of claim 1, further comprising,
when each of the two programs is an executable program composed of machine language,
prior to the step of assigning the identification number,
converting, by a program conversion unit, each of the two executable programs into an assembly-language-based program or a high-level-language-based program composed of functions or basic blocks.

(Deleted)

The method of claim 1,
wherein calculating the similarity between the two programs comprises
calculating the similarity between the two programs by dividing the number of bit positions at which the corresponding bits both have a bit value of '1' by the number of bit positions at which at least one of the corresponding bits has a bit value of '1'.

A program similarity calculation apparatus comprising:
an identification number generation unit that generates an identification number corresponding to each of all functions or all basic blocks included in each of two programs by converting each of the functions or basic blocks according to a predetermined condition;
a characteristic bit vector generation unit that generates two characteristic bit vectors by setting to '1', on two bit vectors that correspond respectively to the two programs and each include a predetermined number of bits, the bit value of each bit corresponding to one of the identification numbers, and setting the bit values of the remaining bits to '0'; and
a program similarity calculation unit that calculates a similarity between the two programs by comparing the two characteristic bit vectors bit by bit at corresponding positions,
wherein the identification number generation unit
converts each of the functions or basic blocks into an arbitrary bit string by applying a hash function to each of the functions or basic blocks, extracts from each arbitrary bit string a consecutive bit string consisting of a predetermined number of consecutive bits, and generates the identification number by converting each extracted consecutive bit string into a decimal number, and
wherein the program similarity calculation unit
sequentially compares the two characteristic bit vectors bit by bit at corresponding positions and calculates the similarity between the two programs based on the number of bits having matching bit values.

The apparatus of claim 7, further comprising
a program conversion unit that, when each of the two programs is an executable program composed of machine language,
converts each of the two executable programs into an assembly-language-based program or a high-level-language-based program composed of functions or basic blocks.

(Deleted)

The apparatus of claim 7,
wherein the program similarity calculation unit
calculates the similarity between the two programs by dividing the number of bit positions at which the corresponding bits both have a bit value of '1' by the number of bit positions at which at least one of the corresponding bits has a bit value of '1'.