KR101864925B1 - Global Model-based Audio Object Separation method and system - Google Patents


Info

Publication number
KR101864925B1
Authority
KR
South Korea
Prior art keywords
model
nmf
matrix
sound source
audio
Prior art date
Application number
KR1020160014914A
Other languages
Korean (ko)
Other versions
KR20170093474A (en)
Inventor
조충상
김제우
이영한
이혜인
Original Assignee
전자부품연구원
Priority date
Filing date
Publication date
Application filed by 전자부품연구원 filed Critical 전자부품연구원
Priority to KR1020160014914A priority Critical patent/KR101864925B1/en
Priority to PCT/KR2016/001393 priority patent/WO2017135487A1/en
Publication of KR20170093474A publication Critical patent/KR20170093474A/en
Application granted granted Critical
Publication of KR101864925B1 publication Critical patent/KR101864925B1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

A global model-based audio object separation method and system are provided. The audio separation method according to an embodiment of the present invention automatically expands a first model used for sound source separation to generate a second model, and separates the sound source into a plurality of audio objects using the generated second model. A long NMF model can thus be generated automatically by extending a short global NMF model and then used for audio object separation, enabling audio objects to be separated from any sound source in a shorter time.

Description

[0001] The present invention relates to a global model-based audio object separation method and system.

The present invention relates to audio processing techniques and, more particularly, to a method and system for separating a sound source into a plurality of audio objects.

A sound source comprises a plurality of audio objects, such as vocals, drums, guitars, and pianos, and it is possible to separate the sound source into these audio objects.

Currently, one of the most popular techniques for separating audio objects is audio object separation based on the Non-negative Matrix Factorization (NMF) model.

In order to separate audio objects based on an NMF model, an NMF model having the same length as the sound source to be separated is required. Since sound sources differ in length, a different NMF model is needed for each sound source.

In addition, to increase the degree of separation, audio engineers design NMF models by hand, taking into account not only the attributes but also the length of the sound source to be separated, which is a very difficult and time-consuming task.

It is an object of the present invention to provide a global model-based audio object separation method and system that automatically and easily generate the model used for sound source separation.

According to an aspect of the present invention, there is provided an audio separation method including: automatically expanding a first model used for sound source separation to generate a second model; and separating the sound source into a plurality of audio objects using the generated second model.

Further, the audio separation method according to an embodiment of the present invention may further include determining the length of the sound source, and the first model may be extended with reference to that length.

Also, the first model may be a first NMF model (Non-negative Matrix Factorization model) and the second model a second NMF model, and the generating step may repeatedly arrange the H matrix of the first NMF model while using its W matrix as it is.

The generating step may repeatedly arrange all or part of the H matrix.

The generating step may extend the model by selecting and arranging part of the H matrix.

The generating step may select the part of the H matrix randomly.

Also, the generating step may select the part of the H matrix based on an analysis result of the sound source.

According to another embodiment of the present invention, there is provided an audio separation system including: a generation unit that automatically expands a first model used for sound source separation to generate a second model; and a separation unit that separates the sound source into a plurality of audio objects using the generated second model.

As described above, according to embodiments of the present invention, a long NMF model can be generated automatically by extending a short global NMF model and then used for audio object separation, so that audio objects can be separated from any sound source in a shorter time.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an audio object separation system according to an embodiment of the present invention;
FIG. 2 is a diagram provided in the description of the NMF model;
FIG. 3 is a diagram provided in the description of the extension of the global NMF model;
FIG. 4 is a diagram provided in the detailed description of the NMF model extension engine shown in FIG. 1;
FIGS. 5 to 8 are diagrams provided in the description of methods for extending/converting H to H'; and
FIG. 9 is a diagram provided in the detailed description of the index determination method of the audio analysis module.

Hereinafter, the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating an audio object separation system according to an exemplary embodiment of the present invention. The audio object separation system according to the embodiment of the present invention extends the NMF models required for separating a sound source of input length T into audio objects, and then performs the separation.

As shown in FIG. 1, the audio object separation system according to an embodiment of the present invention includes an NMF model extension engine 110 and an NMF model based object separation engine 120.

The NMF model extension engine 110 automatically extends the global NMF models 10-1, ..., 10-n used for sound source separation according to the length of the sound source, generating the NMF models 20-1, ..., 20-n.

The NMF model-based object separation engine 120 separates a sound source into a plurality of audio objects using the NMF models 20-1, ..., 20-n generated by the NMF model extension engine 110.

The global NMF models 10-1, ..., 10-n are short NMF models provided for each audio object (vocal, drum, guitar, piano), and they are commonly used for all input sound sources.

As shown in FIG. 2, an NMF model consists of a W (F by k) matrix and an H (k by N) matrix obtained from the STFT (Short-Time Fourier Transform) result of a sound source in which various audio objects are mixed.

Here, N is the number of time frames produced by the windowing of the STFT, F is the number of frequency bins of the STFT, and k is the dimension applied to the factorization. The parameter affected by the length T of the mixed sound source is N.
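To make the W/H structure concrete, the following sketch factorizes a toy magnitude spectrogram with standard NMF multiplicative updates. This is an illustrative NumPy implementation, not the patent's code; the update rule, iteration count, and toy matrix sizes are all assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (F x N) into W (F x k) and H (k x N) with
    standard multiplicative updates minimizing Euclidean distance."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral bases W
    return W, H

# Toy "magnitude spectrogram": F=6 frequency bins, N=8 frames, rank k=2.
V = np.random.default_rng(1).random((6, 8))
W, H = nmf(V, k=2)
print(W.shape, H.shape)   # (6, 2) (2, 8)
```

The columns of W act as spectral templates and the rows of H as their time activations, which is why extending a model to a longer sound source only requires lengthening H.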

In order to generate the extended NMF models 20-1, ..., 20-n from the global NMF models 10-1, ..., 10-n, as shown in FIG. 3, the H (k by N1) matrix of the global models 10-1, ..., 10-n must be extended to the H' (k by N2) matrix required for a mixed sound source of length T.

The W matrices of the global NMF models 10-1, ..., 10-n are used as they are in generating the NMF models 20-1, ..., 20-n. That is, the W of each global NMF model 10-1, ..., 10-n and the W of the corresponding NMF model 20-1, ..., 20-n are identical.
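The patent does not spell out the arithmetic of the separation step performed by the object separation engine 120. A common way to use per-object NMF models for separation, shown here purely as an assumption, is Wiener-style soft masking: reconstruct each object from its model and split the mixture spectrogram in proportion to the reconstructions.

```python
import numpy as np

def separate(V_mix, models, eps=1e-9):
    """Wiener-style soft masking (an assumption, not the patent's stated
    method): reconstruct each object's magnitude spectrogram from its NMF
    model (W_i, H_i) and split the mixture in proportion to the estimates."""
    recon = [W @ H for W, H in models]            # per-object estimates
    total = sum(recon) + eps                      # denominator of the mask
    return [V_mix * (R / total) for R in recon]   # masked object spectrograms

rng = np.random.default_rng(0)
F, N, k = 5, 7, 2
models = [(rng.random((F, k)), rng.random((k, N))) for _ in range(3)]
V_mix = sum(W @ H for W, H in models)             # toy mixture = sum of objects
objs = separate(V_mix, models)
print(np.allclose(sum(objs), V_mix))              # → True
```

Because the masks sum to (almost exactly) one at every time-frequency bin, the separated object spectrograms add back up to the mixture.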

FIG. 4 is a diagram provided in the detailed description of the NMF model extension engine 110 shown in FIG. 1. It shows the process by which the NMF model extension engine 110 converts the global NMF models 10-1, ..., 10-n into the NMF models 20-1, ..., 20-n.

Referring to FIG. 4, the NMF model extension engine 110 determines the length T of the mixed sound source and, based on the recognized length T, converts the small H matrices of the global NMF models 10-1, ..., 10-n into the long H' matrices, generating the NMF models 20-1, ..., 20-n.

There are many ways to extend and convert H into H', and they are described in detail below. In converting H into H', only H itself may be used, or the analysis result of the mixed sound source produced by the audio analysis module 115 may be used as well.

FIG. 5 shows one method of extending/converting H to H'. In this method, H, whose column length is N1, is repeatedly arranged, and the portion exceeding the required N2 is deleted to generate H'.
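Assuming H is a NumPy array, the FIG. 5 method (tile, then truncate) can be sketched as follows; the function name extend_tile is illustrative.

```python
import numpy as np

def extend_tile(H, N2):
    """FIG. 5 style extension: repeat H along the time axis and delete the
    portion exceeding the required column length N2."""
    N1 = H.shape[1]
    reps = -(-N2 // N1)             # ceiling division: enough repetitions
    return np.tile(H, reps)[:, :N2]

H = np.arange(6).reshape(2, 3)      # toy H: k=2, N1=3
H2 = extend_tile(H, 7)              # extend to N2=7
print(H2[0])                        # [0 1 2 0 1 2 0]
```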

FIG. 6 shows another method of extending/converting H to H'. In this method, the columns of H (column length N1) are each repeated in place, column by column, to generate H'. The number of repetitions can be set differently for each column so as to match the required N2.
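The FIG. 6 method can be sketched with np.repeat; the rule for distributing the extra repetitions across columns is an illustrative assumption, since the patent only requires that the counts sum to N2.

```python
import numpy as np

def extend_repeat_columns(H, N2):
    """FIG. 6 style extension: repeat each column of H in place, with
    per-column repetition counts chosen to total exactly N2 columns."""
    N1 = H.shape[1]
    base, extra = divmod(N2, N1)
    counts = [base + 1 if i < extra else base for i in range(N1)]
    return np.repeat(H, counts, axis=1)

H = np.arange(6).reshape(2, 3)      # toy H: k=2, N1=3
H2 = extend_repeat_columns(H, 7)    # counts become [3, 2, 2]
print(H2[0])                        # [0 0 0 1 1 2 2]
```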

FIG. 7 shows another method of extending/converting H to H'. In this method, one of the columns of H (column length N1) is selected at random and appended, repeatedly, to generate H'.
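The FIG. 7 method amounts to random column sampling with replacement; a minimal sketch (the seed and function name are illustrative):

```python
import numpy as np

def extend_random(H, N2, seed=0):
    """FIG. 7 style extension: build H' by repeatedly selecting one of the
    columns of H at random and appending it, until N2 columns exist."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, H.shape[1], size=N2)
    return H[:, idx]

H = np.arange(6).reshape(2, 3)      # toy H: k=2, N1=3
H2 = extend_random(H, 7)
print(H2.shape)                     # (2, 7); every column of H' is a column of H
```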

FIG. 8 shows another method of extending/converting H to H'. Like the method of FIG. 7, it repeatedly selects one of the columns of H and appends it to generate H'.

However, in the method shown in FIG. 8, the column of H to be appended to H' is not selected at random; instead, it is selected according to an index that the audio analysis module 115 determines from the analysis result of the mixed sound source. This is the difference from the method shown in FIG. 7.

FIG. 9 is a diagram provided in the detailed description of the index determination method of the audio analysis module 115. As shown in FIG. 9, the audio analysis module 115 computes the absolute value of the STFT result of the mixed sound source, slides a window over the computed result, and repeatedly generates indexes by selecting the most similar column of H through similarity analysis.
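The index determination of FIGS. 8 and 9 can be sketched as follows. The patent does not specify the similarity measure or the windowing details, so cosine similarity between each mixture frame and the per-column reconstruction W @ h_j is used here as an assumption; the function names and toy matrices are illustrative.

```python
import numpy as np

def cosine(a, b, eps=1e-9):
    # cosine similarity between two non-negative spectral frames
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def extend_by_similarity(W, H, V_mix):
    """FIG. 8/9 style extension (sketch): for each frame of the mixture's
    magnitude spectrogram V_mix (F x N2), pick the index of the column of H
    whose reconstruction W @ h_j is most similar, then build H' from the
    selected columns."""
    recon = W @ H                       # F x N1: per-column reconstructions
    idx = [int(np.argmax([cosine(V_mix[:, t], recon[:, j])
                          for j in range(H.shape[1])]))
           for t in range(V_mix.shape[1])]
    return H[:, idx], idx

W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.3, 0.7]])
H = np.array([[1.0, 0.1, 0.5, 0.9],
              [0.1, 1.0, 0.5, 0.2]])    # toy global model: k=2, N1=4
V_mix = (W @ H)[:, [0, 2, 2, 1, 3, 0]]  # toy mixture: frames copied from columns
H2, idx = extend_by_similarity(W, H, V_mix)
print(idx)                              # [0, 2, 2, 1, 3, 0]
```

Because each toy mixture frame is an exact copy of one reconstruction column, the similarity analysis recovers the originating indexes.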

So far, preferred embodiments of the global model-based audio object separation method and system have been described in detail.

The above embodiments assume audio object separation using NMF models. Needless to say, the technical idea of the present invention can also be applied when the NMF model is modified or when a model other than the NMF model is used.

Also, the vocals, drums, guitars, and pianos referred to as audio objects in the above embodiments are merely illustrative. The technical idea of the present invention is equally applicable to separating a sound source into other kinds of audio objects.

The audio object separation method and system proposed in the embodiments of the present invention can be applied to fields such as audio effects, content production, and surveillance systems, as well as to fields requiring voice separation or other types of source separation.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments. It will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

10-1, ... 10-n: global NMF model
20-1, ..., 20-n: NMF model
110: NMF model extension engine
120: NMF model-based object separation engine

Claims (8)

1. Generating a second NMF model by using, as it is, the W matrix of a first NMF model (Non-negative Matrix Factorization model) commonly used for sound source separation, and by repeatedly arranging its H matrix; and
separating the sound source into a plurality of audio objects using the generated second NMF model.

2. (Deleted)

3. (Deleted)

4. The method according to claim 1, wherein the generating comprises repeatedly arranging all or a part of the H matrix.

5. The method according to claim 1, wherein the generating comprises selecting and arranging a part of the H matrix.

6. The method according to claim 5, wherein the generating comprises selecting the part of the H matrix randomly.

7. The method according to claim 5, wherein the generating comprises calculating an absolute value of an STFT (Short-Time Fourier Transform) result of the sound source and selecting, through similarity analysis against the calculated result, the most similar part of the H matrix.

8. An audio separation system comprising: a generation unit that generates a second NMF model by using, as it is, the W matrix of a first NMF model (Non-negative Matrix Factorization model) commonly used for sound source separation, and by repeatedly arranging its H matrix; and
a separation unit that separates the sound source into a plurality of audio objects using the generated second NMF model.
KR1020160014914A 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system KR101864925B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020160014914A KR101864925B1 (en) 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system
PCT/KR2016/001393 WO2017135487A1 (en) 2016-02-05 2016-02-11 Method and system for separating audio objects on basis of global model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020160014914A KR101864925B1 (en) 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system

Publications (2)

Publication Number Publication Date
KR20170093474A KR20170093474A (en) 2017-08-16
KR101864925B1 true KR101864925B1 (en) 2018-06-05

Family

ID=59501027

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020160014914A KR101864925B1 (en) 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system

Country Status (2)

Country Link
KR (1) KR101864925B1 (en)
WO (1) WO2017135487A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101511553B1 (en) 2014-02-14 2015-04-13 전자부품연구원 Multi Step Audio Separation Method and Audio Device using the same
US20150205575A1 (en) * 2014-01-20 2015-07-23 Canon Kabushiki Kaisha Audio signal processing apparatus and method thereof

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
KR101043114B1 (en) * 2009-07-31 2011-06-20 포항공과대학교 산학협력단 Method of Restoration of Sound, Recording Media of the same and Apparatus of the same
KR101225932B1 (en) * 2009-08-28 2013-01-24 포항공과대학교 산학협력단 Method and system for separating music sound source
US9721202B2 (en) * 2014-02-21 2017-08-01 Adobe Systems Incorporated Non-negative matrix factorization regularized by recurrent neural networks for audio processing
KR101641645B1 (en) * 2014-06-11 2016-07-22 전자부품연구원 Audio Source Seperation Method and Audio System using the same

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
US20150205575A1 (en) * 2014-01-20 2015-07-23 Canon Kabushiki Kaisha Audio signal processing apparatus and method thereof
KR101511553B1 (en) 2014-02-14 2015-04-13 전자부품연구원 Multi Step Audio Separation Method and Audio Device using the same

Non-Patent Citations (1)

Title
Brian King, et al. Optimal cost function and magnitude power for NMF-based speech separation and music interpolation. IEEE International Workshop on Machine Learning for Signal Processing. 2012.09.26.*

Also Published As

Publication number Publication date
WO2017135487A1 (en) 2017-08-10
KR20170093474A (en) 2017-08-16

Similar Documents

Publication Publication Date Title
CN106486128B (en) Method and device for processing double-sound-source audio data
JP6617783B2 (en) Information processing method, electronic device, and program
WO2006135806A3 (en) Forensic integrated search technology
CA2763312A1 (en) Audio signal processing device, audio signal processing method, and program
KR20080030922A (en) Information processing apparatus, method, program and recording medium
KR20160076775A (en) Composition program based on input music data and system thereof
KR101648931B1 (en) Apparatus and method for producing a rhythm game, and computer program for executing the method
KR100512143B1 (en) Method and apparatus for searching of musical data based on melody
Krause et al. Classifying Leitmotifs in Recordings of Operas by Richard Wagner.
JP6747447B2 (en) Signal detection device, signal detection method, and signal detection program
JPWO2017217412A1 (en) Signal processing apparatus, signal processing method and signal processing program
KR102128153B1 (en) Apparatus and method for searching music source using machine learning
KR101864925B1 (en) Global Model-based Audio Object Separation method and system
KR101493006B1 (en) Apparatus for editing of multimedia contents and method thereof
KR20170128070A (en) Chord composition method based on recurrent neural network
Lederle et al. Combining high-level features of raw audio waves and mel-spectrograms for audio tagging
JP2014112190A (en) Signal section classifying apparatus, signal section classifying method, and program
JP2012181307A (en) Voice processing device, voice processing method and voice processing program
JP6633753B2 (en) Music selection device for lighting control data generation, music selection method for lighting control data generation, and music selection program for lighting control data generation
Anderson Musical instrument classification utilizing a neural network
JP2008134298A (en) Signal processing device, signal processing method and program
JP6565529B2 (en) Automatic arrangement device and program
Li et al. Knowledge based fundamental and harmonic frequency detection in polyphonic music analysis
Bagul et al. Recognition of similar patterns in popular Hindi Jazz songs by music data mining
Mukherjee et al. Instrumentals/songs separation for background music removal

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant