KR102316671B1

KR102316671B1 - Method for treating sound using CNN

Info

Publication number: KR102316671B1
Application number: KR1020190160505A
Authority: KR
Inventors: 최성규
Original assignee: 주식회사 포스코건설
Priority date: 2019-12-05
Filing date: 2019-12-05
Publication date: 2021-10-22
Also published as: KR20210070586A

Abstract

The present invention relates to a sound processing method that uses a convolutional neural network (CNN). The method comprises the steps of: generating sound source data and sound field data using an acoustic analysis simulation; acquiring a large amount of time difference data; extracting characteristics for each frequency band through a GCC-PHAT process; training a CNN algorithm on the resulting images; inputting time difference data acquired from real microphones to the CNN algorithm; and deriving sound source location prediction and noise mapping outputs from the microphone time difference data. By using the CNN algorithm to derive a noise mapping result at the noise generation point from time difference data measured by two microphones, the invention makes it possible to predict and confirm the transmission path of noise and, at the same time, to establish soundproofing measures.

Description

Sound processing method using CNN {METHOD FOR TREATING SOUND USING CNN}

The present invention relates to a sound processing method using a CNN, and more particularly to a sound processing method that uses a convolutional neural network (CNN) to predict the location of a sound source and to perform noise mapping.

Conventional speech separation or speech enhancement methods include statistical model-based approaches and template-based approaches. A speech enhancement method based on a statistical model builds separate statistical models for the target speech and for the noise to be removed from the input signal, and performs speech enhancement by combining techniques such as voice activity detection and noise power tracking in every time frame.

Such a method assumes that the noise changes slowly over time (stationary). Because the statistical approach decides whether speech is active on the basis of statistical models of speech and noise, relatively accurate speech detection is possible for signals with a high signal-to-noise ratio. In a relatively poor noise environment, that is, a non-stationary environment in which the ambient noise changes rapidly, speech enhancement performance degrades sharply.

The template-based approach, by contrast, uses previously trained patterns or statistical data of speech and noise. That is, it is a speech enhancement method that removes noise from the input signal using speech characteristics and noise information learned in advance through training.

With this approach, good speech enhancement results can be obtained even in a rapidly changing noise environment. However, the model's assumptions hold only when the frequency characteristics of the sound sources differ greatly; the more similar the sources are, the harder it becomes to estimate each source accurately.

For sound source localization, much research has been carried out in the time and frequency domains, including beamforming, HRSE, and TDOA (Time Difference of Arrival). The TDOA method is the most widely used because of its simple computation and high accuracy, and GCC-PHAT (Generalized Cross Correlation with Phase Transform) in particular performs well in noisy or reverberant environments.

For acoustic analysis simulation, a variety of commercial programs exist, based on the finite element method (FEM), the boundary element method (BEM), ray tracing, and other techniques depending on the frequency range. The more detailed the modeling, the higher the analysis accuracy, but the longer the analysis takes.

Acoustic cameras have also been commercialized for visualizing the location of a sound source, but their configuration is complex and the equipment is expensive. In indoor modeling in particular, the number of factors to consider, such as sound absorption, transmission loss, and reflection coefficient, increases with the shape and material of the building walls, so that even a minor change requires a long analysis time.

Republic of Korea Registered Patent No. 10-1620866 (May 13, 2016)
Republic of Korea Patent Publication No. 10-2019-0042203 (April 24, 2019)
Republic of Korea Registered Patent No. 10-1992970 (June 26, 2019)

The present invention was devised to solve the conventional problems described above. Its object is to provide a sound processing method using a CNN that derives a noise mapping result at the noise generation point from time difference data measured by two microphones using a CNN algorithm, so that the transmission path of noise can be predicted and confirmed while soundproofing measures are established.

Another object of the present invention is to provide a sound processing method using a CNN that learns the relationship between sound sources according to a pre-trained, CNN-based sound source separation method and uses the learned relationship to derive sound source location prediction and noise mapping, so that the location and size of a defect can be determined accurately and the method can be applied to identifying the cause of defects and diagnosing failures.

Yet another object of the present invention is to provide a sound processing method using a CNN that generates sound source data and sound field data from a virtual sound source and virtual microphones, trains the CNN algorithm on them, and derives sound source location prediction and noise mapping outputs from real microphone time difference data, so that product quality under given production operating conditions can be judged from the derived results, production conditions can be optimized, and defects at the production site can be minimized.

To achieve the above objects, the present invention provides a sound processing method that processes sound using a convolutional neural network (CNN), comprising: an acoustic analysis simulation step of generating sound source data and sound field data from a virtual sound source and a plurality of virtual microphones using an acoustic analysis simulation; a time difference of arrival (TDOA) data generation step of acquiring a large amount of time difference data between the plurality of virtual microphones while changing the horizontal angle between the virtual sound source and the plurality of virtual microphones; a GCC-PHAT image generation step of extracting characteristics for each frequency band through a GCC-PHAT (Generalized Cross Correlation - Phase Transform) process; a CNN algorithm training step of training a CNN algorithm on a plurality of images refined for feature extraction; a microphone measurement data input step of inputting time difference data acquired from a plurality of real microphones as the input of the trained CNN algorithm; and a result derivation step of deriving sound source location prediction and noise mapping outputs from the input microphone time difference data.

The present invention also provides a sound processing method that processes sound using a convolutional neural network (CNN), comprising: an acoustic analysis simulation step of generating sound source data and sound field data from a virtual sound source and a plurality of virtual microphones using an acoustic analysis simulation; a boundary-condition-specific sound field analysis step of performing a plurality of analyses by setting the spatial conditions in which the virtual sound source is located and changing the position of the virtual sound source; a sound field analysis image generation step of collecting sound field analysis images according to the characteristics of the virtual sound source; a CNN algorithm training step of training a CNN algorithm on a plurality of images refined for feature extraction; a microphone measurement data input step of inputting time difference data acquired from a plurality of real microphones as the input of the trained CNN algorithm; and a result derivation step of deriving sound source location prediction and noise mapping outputs from the input microphone time difference data.

The present invention further provides a sound processing method that processes sound using a convolutional neural network (CNN), comprising: an acoustic analysis simulation step of generating sound source data and sound field data from a virtual sound source and a plurality of virtual microphones using an acoustic analysis simulation; a time difference of arrival (TDOA) data generation step of acquiring a plurality of time difference data between the plurality of virtual microphones while changing the horizontal angle between the virtual sound source and the plurality of virtual microphones; a GCC-PHAT image generation step of extracting characteristics for each frequency band through a GCC-PHAT (Generalized Cross Correlation - Phase Transform) process; a boundary-condition-specific sound field analysis step of performing a plurality of analyses by setting the spatial conditions in which the virtual sound source is located and changing the position of the virtual sound source; a sound field analysis image generation step of collecting sound field analysis images according to the characteristics of the virtual sound source; a CNN algorithm training step of training a CNN algorithm on a plurality of images refined for feature extraction; a microphone measurement data input step of inputting time difference data acquired from a plurality of real microphones as the input of the trained CNN algorithm; and a result derivation step of deriving sound source location prediction and noise mapping outputs from the input microphone time difference data.

In the acoustic analysis simulation step, the acoustic analysis simulation is used to generate sound source data, namely the time and amplitude with which sound travels from the virtual sound source to the plurality of virtual microphones, and sound field data, namely the noise mapping in the virtual space.

The GCC-PHAT image generation step comprises: computing the cross-spectrum of the virtual sound source signals arriving at the plurality of virtual microphones by calculating their phase difference; whitening the cross-spectrum by setting its magnitude at each frequency to 1 so that the cross-spectrum is weighted uniformly; a band-pass filter step of dividing the time difference of the sound source signal into a plurality of bands by frequency and band-pass filtering by setting the components outside each band to 0; and extracting characteristics by an IFFT (Inverse Fast Fourier Transform) calculation, the inverse process that converts from the frequency/amplitude domain to the time/amplitude domain after a predetermined time.

In the band-pass filter step, the time difference of the sound source signal is divided into bands at 63 to 125 Hz, 125 to 250 Hz, 250 to 500 Hz, 500 to 1,000 Hz, 1,000 to 2,000 Hz, 2,000 to 4,000 Hz, and 4,000 to 8,000 Hz.

As described above, the present invention uses a CNN algorithm to derive a noise mapping result at the noise generation point from time difference data measured by two microphones, which makes it possible to predict and confirm the transmission path of noise and, at the same time, to establish soundproofing measures.

In addition, the relationship between sound sources is learned according to a pre-trained, CNN-based sound source separation method, and the learned relationship is used to derive sound source location prediction and noise mapping. The location and size of a defect can therefore be determined accurately, and the method can be applied to identifying the cause of defects and diagnosing failures.

Furthermore, sound source data and sound field data are generated from a virtual sound source and virtual microphones and used to train the CNN algorithm, and sound source location prediction and noise mapping outputs are derived from real microphone time difference data. The derived results make it possible to judge product quality under given production operating conditions, to optimize production conditions on that basis, and to minimize defects at the production site.

FIG. 1 is a block diagram showing a sound processing method using a CNN according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the GCC-PHAT image generation step of the sound processing method using a CNN according to an embodiment of the present invention.
FIG. 3 is a detailed view showing how the CNN algorithm is applied, through convolution layers, to a GCC-PHAT image that renders the frequency band characteristics over time in the sound processing method using a CNN according to an embodiment of the present invention.
FIG. 4 is a detailed view showing images from the sound field analysis image generation step of the sound processing method using a CNN according to an embodiment of the present invention.
FIG. 5 is a block diagram showing the restored state of an image from the sound field analysis image generation step of the sound processing method using a CNN according to an embodiment of the present invention.
FIG. 6 is an exemplary view showing a processing result of the sound processing method using a CNN according to an embodiment of the present invention.

Hereinafter, a preferred embodiment of the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing a sound processing method using a CNN according to an embodiment of the present invention; FIG. 2 is a block diagram showing its GCC-PHAT image generation step; FIG. 3 is a detailed view showing how the CNN algorithm is applied, through convolution layers, to a GCC-PHAT image that renders the frequency band characteristics over time; FIG. 4 is a detailed view showing images from its sound field analysis image generation step; FIG. 5 is a block diagram showing the restored state of an image from the sound field analysis image generation step; and FIG. 6 is an exemplary view showing a processing result of the method.

As shown in FIG. 1, one example of the sound processing method using a CNN according to this embodiment comprises an acoustic analysis simulation step (S110), a time difference of arrival (TDOA) data generation step (S120), a GCC-PHAT image generation step (S130), a CNN algorithm training step (S160), a microphone measurement data input step (S170), and a result derivation step (S180), and processes sound using a convolutional neural network (CNN).

The acoustic analysis simulation step (S110) generates sound source data and sound field data from a virtual sound source and a plurality of virtual microphones using an acoustic analysis simulation. Before sound is processed with the CNN, the sound source data and sound field data are generated from a virtual sound source and two virtual microphones.

In the acoustic analysis simulation step (S110), the acoustic analysis simulation is preferably used to generate sound source data, namely the time and amplitude with which sound travels from the virtual sound source to the two virtual microphones, and sound field data, namely the noise mapping in the virtual space.

The time difference of arrival (TDOA) data generation step (S120) acquires a large amount of time difference data between the plurality of virtual microphones while changing the horizontal angle between the virtual sound source and the plurality of virtual microphones. Here, the horizontal angle between the virtual sound source and the two virtual microphones is varied, and the time difference between the two virtual microphones is recorded in bulk.
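The relationship between horizontal angle and inter-microphone time difference can be illustrated with a simple geometric model. The sketch below is not the acoustic analysis simulation described in the text; it assumes a far-field (plane-wave) source, an angle measured from the broadside direction, a speed of sound of 343 m/s, and a hypothetical microphone spacing of 0.2 m. The function names are illustrative only.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def tdoa_far_field(angle_deg, mic_spacing_m, c=SPEED_OF_SOUND):
    """Time difference of arrival (seconds) between two microphones.

    Assumes a far-field source (plane wave). The horizontal angle is
    measured from broadside, i.e. from the perpendicular bisector of
    the line joining the two microphones.
    """
    return mic_spacing_m * math.sin(math.radians(angle_deg)) / c


def sweep_tdoa(mic_spacing_m, step_deg=5):
    """Return (angle_deg, tdoa_s) pairs for angles from -90 to +90 degrees."""
    return [(a, tdoa_far_field(a, mic_spacing_m))
            for a in range(-90, 91, step_deg)]


# Hypothetical spacing of 0.2 m between the two microphones.
pairs = sweep_tdoa(mic_spacing_m=0.2)
```

Under this model the time difference is zero at broadside and reaches its maximum magnitude, spacing divided by the speed of sound, at ±90 degrees. A room simulation would add reflections and absorption that this sketch ignores.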

The GCC-PHAT image generation step (S130) extracts characteristics for each frequency band through the GCC-PHAT (Generalized Cross Correlation - Phase Transform) process. As shown in FIG. 3, the frequency band characteristics over time are rendered as a GCC-PHAT image, to which the CNN algorithm is applied through convolution layers.

As shown in FIG. 2, the GCC-PHAT image generation step (S130) comprises a first sound source signal collection step (S131), a second sound source signal collection step (S132), a cross-spectrum step (S133), a whitening step (S134), a band-pass filter step (S135), and a feature extraction step (S136).

The first sound source signal collection step (S131) collects the virtual first sound source signal that has reached a virtual microphone, and the second sound source signal collection step (S132) collects the virtual second sound source signal that has reached a virtual microphone.

The cross-spectrum step (S133) computes the cross-spectrum of the virtual sound source signals arriving at the plurality of virtual microphones by calculating their phase difference. Here, the phase difference between the virtual sound source signals arriving at the two virtual microphones is calculated.

In the cross-spectrum step (S133), the signals received from the two microphones pass through their respective preamplifiers and are digitized by A/D converters, and the cross-spectrum is calculated from the digitized signals. Because the noise of the measurement system and the noise of the noise source are uncorrelated with each other, the noise level can be measured accurately.

The whitening step (S134) whitens the cross-spectrum by setting its magnitude to 1 at every frequency so that the cross-spectrum is weighted uniformly. That is, the magnitude of the cross-spectrum obtained in the cross-spectrum step (S133) is held equal to 1 across frequencies, leaving only the phase information.
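In the standard formulation of GCC-PHAT, which this text does not write out, the cross-spectrum and whitening steps can be expressed as follows. The symbols $X_1(f)$ and $X_2(f)$ (the Fourier transforms of the two microphone signals), $G_{12}$, $\hat{G}_{12}$, and $R_{12}$ are notation assumed here for illustration.

```latex
G_{12}(f) = X_1(f)\,X_2^{*}(f), \qquad
\hat{G}_{12}(f) = \frac{G_{12}(f)}{\lvert G_{12}(f) \rvert}, \qquad
R_{12}(\tau) = \int_{-\infty}^{\infty} \hat{G}_{12}(f)\, e^{j 2 \pi f \tau}\, df, \qquad
\hat{\tau} = \arg\max_{\tau} R_{12}(\tau)
```

Dividing by $\lvert G_{12}(f) \rvert$ gives every frequency unit magnitude, so the correlation peak at $\hat{\tau}$ depends only on the phase difference between the two signals.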

The band-pass filter step (S135) divides the time difference of the sound source signal into a plurality of bands by frequency and band-pass filters by setting the components outside each band to 0. The signal is preferably divided into seven bands: 63 to 125 Hz, 125 to 250 Hz, 250 to 500 Hz, 500 to 1,000 Hz, 1,000 to 2,000 Hz, 2,000 to 4,000 Hz, and 4,000 to 8,000 Hz.

The feature extraction step (S136) extracts characteristics by an IFFT (Inverse Fast Fourier Transform) calculation, the inverse process that converts from the frequency/amplitude domain to the time/amplitude domain after a predetermined time. The IFFT converts the sound source signal from the frequency domain back to the time domain.
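Steps S133 to S136 can be sketched with NumPy as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `gcc_phat_band_image`, the zero-padding length, the `max_lag` window, and the 16 kHz sample rate in the usage lines are all choices made here. Only the seven band edges come from the text.

```python
import numpy as np

# Octave band edges in Hz, as listed in the text.
BANDS_HZ = [(63, 125), (125, 250), (250, 500), (500, 1000),
            (1000, 2000), (2000, 4000), (4000, 8000)]


def gcc_phat_band_image(x1, x2, sample_rate_hz, max_lag, bands=BANDS_HZ):
    """Return an array of shape (len(bands), 2 * max_lag + 1).

    Row b holds the GCC-PHAT correlation restricted to band b for lags
    -max_lag .. +max_lag samples (max_lag >= 1). A peak at a positive
    lag means x1 is delayed relative to x2.
    """
    n = len(x1) + len(x2)                      # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)                   # cross-spectrum (S133)
    cross /= np.maximum(np.abs(cross), 1e-12)  # whitening to unit magnitude (S134)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    rows = []
    for lo, hi in bands:
        in_band = (freqs >= lo) & (freqs < hi)
        banded = np.where(in_band, cross, 0.0)  # zero everything outside the band (S135)
        r = np.fft.irfft(banded, n=n)           # back to the lag (time) domain (S136)
        rows.append(np.concatenate((r[-max_lag:], r[:max_lag + 1])))
    return np.stack(rows)


# Synthetic check: x1 is white noise delayed by 5 samples relative to x2.
rng = np.random.default_rng(0)
s = rng.standard_normal(4096)
delay = 5
x2 = s
x1 = np.concatenate((np.zeros(delay), s[:-delay]))
image = gcc_phat_band_image(x1, x2, sample_rate_hz=16000, max_lag=16)
estimated_delay = int(np.argmax(image.sum(axis=0))) - 16  # lag of the summed peak
```

Each row of `image` is one frequency band and each column one lag, which is one plausible way to obtain a two-dimensional array that can be treated as an image. Summing the rows recovers an ordinary band-limited GCC-PHAT curve, whose peak gives the delay estimate.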

The CNN algorithm training step (S160) trains a CNN algorithm on a plurality of images refined for extracting sound source signal characteristics. A convolutional neural network (CNN) algorithm is trained on the large number of images generated from the acoustic analysis simulation.
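The text does not specify the network architecture. As a hedged illustration of what a single convolution layer computes on a band-by-lag image, the sketch below implements a "valid" 2-D cross-correlation followed by ReLU in NumPy. The function name, kernel sizes, and input shape are assumptions; a real training setup would use a deep learning framework with learned kernels.

```python
import numpy as np


def conv2d_relu(image, kernels, bias):
    """Valid 2-D cross-correlation followed by ReLU.

    image:   (H, W) array, e.g. frequency bands x lags
    kernels: (K, kh, kw) array holding K filters
    bias:    (K,) array
    returns: (K, H - kh + 1, W - kw + 1) feature maps
    """
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.empty((k, h - kh + 1, w - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            patch = image[i:i + kh, j:j + kw]
            # Dot product of every kernel with the current patch.
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


# Hypothetical 7 x 33 input (seven bands, 33 lags) and four random 3 x 3 filters.
rng = np.random.default_rng(0)
features = conv2d_relu(rng.standard_normal((7, 33)),
                       rng.standard_normal((4, 3, 3)),
                       np.zeros(4))
```

Training would adjust the kernel and bias values so that the feature maps become useful for predicting the source location; that optimization loop is outside the scope of this sketch.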

The microphone measurement data input step (S170) inputs time difference data acquired from a plurality of real microphones as the input of the CNN algorithm trained in the CNN algorithm training step (S160). Here, time difference data acquired from two real microphones is supplied as the input of the CNN algorithm.

The result derivation step (S180) derives sound source location prediction and noise mapping outputs from the microphone time difference data input in the microphone measurement data input step (S170). The time difference data acquired from the two real microphones is fed to the CNN algorithm, and the computed output yields the sound source location prediction and the noise mapping.

As shown in FIG. 1, another example of the sound processing method using a CNN according to this embodiment comprises an acoustic analysis simulation step (S110), a boundary-condition-specific sound field analysis step (S140), a sound field analysis image generation step (S150), a CNN algorithm training step (S160), a microphone measurement data input step (S170), and a result derivation step (S180), and processes sound using a convolutional neural network (CNN).

The boundary-condition-specific sound field analysis step (S140) performs a plurality of analyses by setting the spatial conditions in which the virtual sound source is located and changing the position of the virtual sound source. A large number of analyses are carried out by setting spatial conditions, such as inside or outside a building, and by moving the virtual sound source.
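One way to organize such a batch is to enumerate every combination of spatial condition and source position before handing each case to the simulator. The sketch below only builds the case list; the labels `"indoor"` and `"outdoor"`, the 2 m grid, and the dictionary keys are hypothetical, and the simulation call itself is not shown because the text does not name a tool.

```python
from itertools import product


def enumerate_cases(spaces, source_positions):
    """Yield one simulation case per (space, source position) pair."""
    for case_id, (space, pos) in enumerate(product(spaces, source_positions)):
        yield {"case_id": case_id, "space": space, "source_xy_m": pos}


# Hypothetical spatial conditions and a hypothetical 2 m grid of source positions.
spaces = ["indoor", "outdoor"]
grid = [(x, y) for x in range(0, 10, 2) for y in range(0, 10, 2)]
cases = list(enumerate_cases(spaces, grid))
```

With two spatial conditions and 25 grid positions this produces 50 cases, each of which would correspond to one sound field analysis run.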

The sound field analysis image generation step (S150) collects sound field analysis images according to the characteristics of the virtual sound source analyzed in the boundary-condition-specific sound field analysis step (S140). As shown in FIGS. 4 and 5, sound field analysis result images are collected according to the 1/1 octave band characteristics of the virtual sound source.

As shown in FIG. 1, yet another example of the sound processing method using a CNN according to this embodiment comprises an acoustic analysis simulation step (S110), a time difference of arrival (TDOA) data generation step (S120), a GCC-PHAT image generation step (S130), a boundary-condition-specific sound field analysis step (S140), a sound field analysis image generation step (S150), a CNN algorithm training step (S160), a microphone measurement data input step (S170), and a result derivation step (S180), and processes sound using a convolutional neural network (CNN).

Accordingly, as shown in FIG. 6, the CNN-based sound processing method of this embodiment links the trained GCC-PHAT with noise mapping to produce sound source localization and noise mapping results based on measured microphone data.

As described above, according to the present invention, a noise mapping result at the point where noise is generated is derived, using a CNN algorithm, from time difference data measured by two microphones. This makes it possible to predict and confirm the transmission path of the noise and, at the same time, to establish soundproofing measures.
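For orientation, the time difference between two microphones can be related to the horizontal angle of the source by the standard far-field relation below. This is an illustrative assumption, not a formula given in the patent: the symbols $d$ (microphone spacing), $c$ (speed of sound), and $\theta$ (angle measured from the broadside direction of the microphone pair) are introduced here, and the relation holds only when the source is far from the microphones compared with their spacing.

```latex
\tau(\theta) = \frac{d \sin\theta}{c}
% Example: d = 0.2\,\mathrm{m},\ \theta = 30^\circ,\ c = 343\,\mathrm{m/s}
% \tau = \frac{0.2 \times 0.5}{343} \approx 2.9 \times 10^{-4}\,\mathrm{s}
```

Sweeping $\theta$ in simulation, as the time-difference-of-arrival data generation step does with the virtual microphones, yields one time difference per angle.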

The present invention described above may be embodied in various other forms without departing from its technical spirit or essential characteristics. The above embodiments are therefore merely illustrative in all respects and should not be construed as limiting.

Claims

A sound processing method for processing sound using a convolutional neural network (CNN), the method comprising:
an acoustic analysis simulation step of generating sound source data and sound field data from a virtual sound source and a plurality of virtual microphones using an acoustic analysis simulation;
a time-difference-of-arrival data generation step of acquiring a large quantity of time difference data between the plurality of virtual microphones while varying the horizontal angle between the virtual sound source and the plurality of virtual microphones;
a GCC-PHAT image generation step of extracting characteristics for each frequency band through a GCC-PHAT (Generalized Cross Correlation - Phase Transform) process;
a CNN algorithm training step of training a CNN algorithm on a plurality of images refined for extraction of the characteristics;
a measured microphone data input step of inputting time difference data acquired from a plurality of real microphones as input values of the trained CNN algorithm; and
a result derivation step of deriving output values of sound source localization and noise mapping from the input microphone time difference data,
wherein the GCC-PHAT image generation step comprises:
computing a cross-spectrum, by calculation of the phase difference, of the virtual sound source signals that have reached the plurality of virtual microphones;
whitening the frequency magnitude to 1 so that the weighting of the cross-spectrum is constant;
a band-pass filter step of dividing the time difference of the sound source signal into a plurality of bands according to frequency and band-pass filtering by setting the characteristics outside each band to 0; and
extracting characteristics, after a predetermined time, by an inverse fast Fourier transform (IFFT) calculation, which is the inverse process from the frequency/amplitude domain to the time/amplitude domain.

(Deleted)

A sound processing method for processing sound using a convolutional neural network (CNN), the method comprising:
an acoustic analysis simulation step of generating sound source data and sound field data from a virtual sound source and a plurality of virtual microphones using an acoustic analysis simulation;
a time-difference-of-arrival data generation step of acquiring a plurality of time difference data between the plurality of virtual microphones while varying the horizontal angle between the virtual sound source and the plurality of virtual microphones;
a GCC-PHAT image generation step of extracting characteristics for each frequency band through a GCC-PHAT (Generalized Cross Correlation - Phase Transform) process;
a per-boundary-condition sound field analysis step of performing a plurality of analyses by setting the spatial condition in which the virtual sound source is located and changing the position of the virtual sound source;
a sound field analysis image generation step of collecting sound field analysis images according to the characteristics of the virtual sound source;
a CNN algorithm training step of training a CNN algorithm on a plurality of images refined for extraction of the characteristics;
a measured microphone data input step of inputting time difference data acquired from a plurality of real microphones as input values of the trained CNN algorithm; and
a result derivation step of deriving output values of sound source localization and noise mapping from the input microphone time difference data,
wherein the GCC-PHAT image generation step comprises:
computing a cross-spectrum, by calculation of the phase difference, of the virtual sound source signals that have reached the plurality of virtual microphones;
whitening the frequency magnitude to 1 so that the weighting of the cross-spectrum is constant;
a band-pass filter step of dividing the time difference of the sound source signal into a plurality of bands according to frequency and band-pass filtering by setting the characteristics outside each band to 0; and
extracting characteristics, after a predetermined time, by an inverse fast Fourier transform (IFFT) calculation, which is the inverse process from the frequency/amplitude domain to the time/amplitude domain.

The method of claim 1 or claim 3,
wherein, in the acoustic analysis simulation step, the acoustic analysis simulation is used to generate sound source data of the time and amplitude transmitted from the virtual sound source to the plurality of virtual microphones, and sound field data of noise mapping in a virtual space.

(Deleted)

The method of claim 1 or claim 3,
wherein the band-pass filter step divides the time difference of the sound source signal into bands according to the frequencies 63 to 125 Hz, 125 to 250 Hz, 250 to 500 Hz, 500 to 1,000 Hz, 1,000 to 2,000 Hz, 2,000 to 4,000 Hz, and 4,000 to 8,000 Hz.