KR101864925B1 - Global Model-based Audio Object Separation method and system - Google Patents


Info

Publication number
KR101864925B1
Authority
KR
South Korea
Prior art keywords
model
nmf
matrix
sound source
audio
Prior art date
Application number
KR1020160014914A
Other languages
Korean (ko)
Other versions
KR20170093474A (en)
Inventor
조충상
김제우
이영한
이혜인
Original Assignee
전자부품연구원
Priority date
Filing date
Publication date
Application filed by 전자부품연구원 filed Critical 전자부품연구원
Priority to KR1020160014914A priority Critical patent/KR101864925B1/en
Priority to PCT/KR2016/001393 priority patent/WO2017135487A1/en
Publication of KR20170093474A publication Critical patent/KR20170093474A/en
Application granted granted Critical
Publication of KR101864925B1 publication Critical patent/KR101864925B1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

A global model-based audio object separation method and system are provided. The audio separation method according to an embodiment of the present invention automatically expands a first model used for sound source separation to generate a second model, and separates the sound source into a plurality of audio objects using the generated second model. A long NMF model can thus be generated automatically by extending a short global NMF model and then used for audio object separation, enabling audio objects to be separated from any sound source in a shorter time.

Description

[0001] The present invention relates to a global model-based audio object separation method and system.

The present invention relates to audio processing techniques and, more particularly, to a method and system for separating a sound source into a plurality of audio objects.

A sound source comprises a plurality of audio objects, such as vocals, drums, guitars, and pianos, and it is possible to separate the sound source into these audio objects.

Currently, one of the most popular techniques for separating audio objects is audio object separation based on the Non-negative Matrix Factorization (NMF) model.

In order to separate audio objects based on an NMF model, an NMF model having the same length as the sound source to be separated is required. Since sound sources differ in length, a different NMF model is needed for each sound source.

In addition, to increase the degree of separation, audio engineers design NMF models by hand, taking into account not only the attributes but also the length of the sound source to be separated, which is a very difficult and time-consuming task.

It is an object of the present invention to provide a global model-based audio object separation method and system that automatically and easily generate the model used for sound source separation.

According to an aspect of the present invention, there is provided an audio separation method including: automatically expanding a first model used for sound source separation to generate a second model; and separating the sound source into a plurality of audio objects using the generated second model.

Further, the audio separation method according to an embodiment of the present invention may further include determining the length of the sound source, and the first model may be extended with reference to that length.

Also, the first model may be a first NMF model (Non-negative Matrix Factorization model) and the second model a second NMF model, and the generating step may repeatedly arrange the H matrix of the first NMF model while using its W matrix as it is.

The generating step may repeatedly arrange all or part of the H matrix.

The generating step may extend the model by selecting and arranging part of the H matrix.

The generating step may select the part of the H matrix randomly.

Also, the generating step may select the part of the H matrix based on an analysis result of the sound source.

According to another embodiment of the present invention, there is provided an audio separation system including: a generation unit that automatically expands a first model used for sound source separation to generate a second model; and a separation unit that separates the sound source into a plurality of audio objects using the generated second model.

As described above, according to embodiments of the present invention, a long NMF model can be generated automatically by extending a short global NMF model and then used for audio object separation, so that audio objects can be separated from any sound source in a shorter time.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of an audio object separation system according to an embodiment of the present invention;
FIG. 2 is a diagram provided in the description of the NMF model;
FIG. 3 is a diagram provided in the description of the extension of the global NMF model;
FIG. 4 is a diagram provided in the detailed description of the NMF model extension engine shown in FIG. 1;
FIGS. 5 to 8 are diagrams provided in the description of methods for extending/converting H to H'; and
FIG. 9 is a diagram provided in the detailed description of the index determination method of the audio analysis module.

Hereinafter, the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram illustrating an audio object separation system according to an exemplary embodiment of the present invention. The audio object separation system according to the embodiment of the present invention extends the NMF models required for separating a sound source of input length T into audio objects, and then performs the separation.

As shown in FIG. 1, the audio object separation system according to an embodiment of the present invention includes an NMF model extension engine 110 and an NMF model based object separation engine 120.

The NMF model extension engine 110 automatically extends the global NMF models 10-1, ..., 10-n used for sound source separation according to the length of the sound source, generating the NMF models 20-1, ..., 20-n.

The NMF model-based object separation engine 120 separates a sound source into a plurality of audio objects using the NMF models 20-1, ..., 20-n generated by the NMF model extension engine 110.

The global NMF models 10-1, ..., 10-n are short NMF models provided for each audio object (vocal, drum, guitar, piano), and they are commonly used for all input sound sources.

As shown in FIG. 2, an NMF model consists of a W (F by k) matrix and an H (k by N) matrix obtained from the STFT (Short-Time Fourier Transform) result of a sound source in which various audio objects are mixed.

Here, N is the number of time frames produced by the windowing of the STFT, F is the number of frequency bins of the STFT, and k is the dimension applied to the factorization. The parameter affected by the length T of the mixed sound source is N.
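To make the W/H structure concrete, the following sketch factorizes a toy magnitude spectrogram with standard NMF multiplicative updates. This is an illustrative NumPy implementation, not the patent's code; the update rule, iteration count, and toy matrix sizes are all assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factor non-negative V (F x N) into W (F x k) and H (k x N) with
    standard multiplicative updates minimizing Euclidean distance."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, k)) + eps
    H = rng.random((k, N)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations H
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update spectral bases W
    return W, H

# Toy "magnitude spectrogram": F=6 frequency bins, N=8 frames, rank k=2.
V = np.random.default_rng(1).random((6, 8))
W, H = nmf(V, k=2)
print(W.shape, H.shape)   # (6, 2) (2, 8)
```

The columns of W act as spectral templates and the rows of H as their time activations, which is why extending a model to a longer sound source only requires lengthening H.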

In order to generate the extended NMF models 20-1, ..., 20-n from the global NMF models 10-1, ..., 10-n, as shown in FIG. 3, the H (k by N1) matrix of the global models 10-1, ..., 10-n must be extended to the H' (k by N2) matrix required for a mixed sound source of length T.

The W matrices of the global NMF models 10-1, ..., 10-n are used as they are in generating the NMF models 20-1, ..., 20-n. That is, the W of each global NMF model 10-1, ..., 10-n and the W of the corresponding NMF model 20-1, ..., 20-n are identical.
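The patent does not spell out the arithmetic of the separation step performed by the object separation engine 120. A common way to use per-object NMF models for separation, shown here purely as an assumption, is Wiener-style soft masking: reconstruct each object from its model and split the mixture spectrogram in proportion to the reconstructions.

```python
import numpy as np

def separate(V_mix, models, eps=1e-9):
    """Wiener-style soft masking (an assumption, not the patent's stated
    method): reconstruct each object's magnitude spectrogram from its NMF
    model (W_i, H_i) and split the mixture in proportion to the estimates."""
    recon = [W @ H for W, H in models]            # per-object estimates
    total = sum(recon) + eps                      # denominator of the mask
    return [V_mix * (R / total) for R in recon]   # masked object spectrograms

rng = np.random.default_rng(0)
F, N, k = 5, 7, 2
models = [(rng.random((F, k)), rng.random((k, N))) for _ in range(3)]
V_mix = sum(W @ H for W, H in models)             # toy mixture = sum of objects
objs = separate(V_mix, models)
print(np.allclose(sum(objs), V_mix))              # → True
```

Because the masks sum to (almost exactly) one at every time-frequency bin, the separated object spectrograms add back up to the mixture.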

FIG. 4 is a diagram provided in the detailed description of the NMF model extension engine 110 shown in FIG. 1. It shows the process by which the NMF model extension engine 110 converts the global NMF models 10-1, ..., 10-n into the NMF models 20-1, ..., 20-n.

Referring to FIG. 4, the NMF model extension engine 110 determines the length T of the mixed sound source and, based on the recognized length T, converts the small H matrices of the global NMF models 10-1, ..., 10-n into the long H' matrices, generating the NMF models 20-1, ..., 20-n.

There are many ways to extend and convert H into H', and they are described in detail below. In converting H into H', only H itself may be used, or the analysis result of the mixed sound source produced by the audio analysis module 115 may be used as well.

FIG. 5 shows one method of extending/converting H to H'. In this method, H, whose column length is N1, is repeatedly arranged, and the portion exceeding the required N2 is deleted to generate H'.
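Assuming H is a NumPy array, the FIG. 5 method (tile, then truncate) can be sketched as follows; the function name extend_tile is illustrative.

```python
import numpy as np

def extend_tile(H, N2):
    """FIG. 5 style extension: repeat H along the time axis and delete the
    portion exceeding the required column length N2."""
    N1 = H.shape[1]
    reps = -(-N2 // N1)             # ceiling division: enough repetitions
    return np.tile(H, reps)[:, :N2]

H = np.arange(6).reshape(2, 3)      # toy H: k=2, N1=3
H2 = extend_tile(H, 7)              # extend to N2=7
print(H2[0])                        # [0 1 2 0 1 2 0]
```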

FIG. 6 shows another method of extending/converting H to H'. In this method, the columns of H (column length N1) are each repeated in place, column by column, to generate H'. The number of repetitions can be set differently for each column so as to match the required N2.
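The FIG. 6 method can be sketched with np.repeat; the rule for distributing the extra repetitions across columns is an illustrative assumption, since the patent only requires that the counts sum to N2.

```python
import numpy as np

def extend_repeat_columns(H, N2):
    """FIG. 6 style extension: repeat each column of H in place, with
    per-column repetition counts chosen to total exactly N2 columns."""
    N1 = H.shape[1]
    base, extra = divmod(N2, N1)
    counts = [base + 1 if i < extra else base for i in range(N1)]
    return np.repeat(H, counts, axis=1)

H = np.arange(6).reshape(2, 3)      # toy H: k=2, N1=3
H2 = extend_repeat_columns(H, 7)    # counts become [3, 2, 2]
print(H2[0])                        # [0 0 0 1 1 2 2]
```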

FIG. 7 shows another method of extending/converting H to H'. In this method, one of the columns of H (column length N1) is selected at random and appended, repeatedly, to generate H'.
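The FIG. 7 method amounts to random column sampling with replacement; a minimal sketch (the seed and function name are illustrative):

```python
import numpy as np

def extend_random(H, N2, seed=0):
    """FIG. 7 style extension: build H' by repeatedly selecting one of the
    columns of H at random and appending it, until N2 columns exist."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, H.shape[1], size=N2)
    return H[:, idx]

H = np.arange(6).reshape(2, 3)      # toy H: k=2, N1=3
H2 = extend_random(H, 7)
print(H2.shape)                     # (2, 7); every column of H' is a column of H
```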

FIG. 8 shows another method of extending/converting H to H'. Like the method of FIG. 7, it repeatedly selects one of the columns of H and appends it to generate H'.

However, in the method shown in FIG. 8, the column of H to be appended to H' is not selected at random; instead, it is selected according to an index that the audio analysis module 115 determines from the analysis result of the mixed sound source. This is the difference from the method shown in FIG. 7.

FIG. 9 is a diagram provided in the detailed description of the index determination method of the audio analysis module 115. As shown in FIG. 9, the audio analysis module 115 computes the absolute value of the STFT result of the mixed sound source, slides a window over the computed result, and repeatedly generates indexes by selecting the most similar column of H through similarity analysis.
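The index determination of FIGS. 8 and 9 can be sketched as follows. The patent does not specify the similarity measure or the windowing details, so cosine similarity between each mixture frame and the per-column reconstruction W @ h_j is used here as an assumption; the function names and toy matrices are illustrative.

```python
import numpy as np

def cosine(a, b, eps=1e-9):
    # cosine similarity between two non-negative spectral frames
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def extend_by_similarity(W, H, V_mix):
    """FIG. 8/9 style extension (sketch): for each frame of the mixture's
    magnitude spectrogram V_mix (F x N2), pick the index of the column of H
    whose reconstruction W @ h_j is most similar, then build H' from the
    selected columns."""
    recon = W @ H                       # F x N1: per-column reconstructions
    idx = [int(np.argmax([cosine(V_mix[:, t], recon[:, j])
                          for j in range(H.shape[1])]))
           for t in range(V_mix.shape[1])]
    return H[:, idx], idx

W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2], [0.3, 0.7]])
H = np.array([[1.0, 0.1, 0.5, 0.9],
              [0.1, 1.0, 0.5, 0.2]])    # toy global model: k=2, N1=4
V_mix = (W @ H)[:, [0, 2, 2, 1, 3, 0]]  # toy mixture: frames copied from columns
H2, idx = extend_by_similarity(W, H, V_mix)
print(idx)                              # [0, 2, 2, 1, 3, 0]
```

Because each toy mixture frame is an exact copy of one reconstruction column, the similarity analysis recovers the originating indexes.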

So far, preferred embodiments of the global model-based audio object separation method and system have been described in detail.

The above embodiments assume audio object separation using NMF models. Needless to say, the technical idea of the present invention can also be applied when the NMF model is modified or when a model other than the NMF model is used.

Also, the vocals, drums, guitars, and pianos referred to as audio objects in the above embodiments are merely illustrative. The technical idea of the present invention is equally applicable to separating a sound source into other kinds of audio objects.

The audio object separation method and system proposed in the embodiments of the present invention can be applied to fields such as audio effects, content production, and surveillance systems, as well as to fields requiring voice separation or other types of source separation.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments. It will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the present invention.

10-1, ... 10-n: global NMF model
20-1, ..., 20-n: NMF model
110: NMF model extension engine
120: NMF model-based object separation engine

Claims (8)

1. Generating a second NMF model by using, as it is, the W matrix of a first NMF model (Non-negative Matrix Factorization model) commonly used for sound source separation, and by repeatedly arranging its H matrix; and
separating the sound source into a plurality of audio objects using the generated second NMF model.

2. (Deleted)

3. (Deleted)

4. The method according to claim 1, wherein the generating comprises repeatedly arranging all or a part of the H matrix.

5. The method according to claim 1, wherein the generating comprises selecting and arranging a part of the H matrix.

6. The method according to claim 5, wherein the generating comprises selecting the part of the H matrix randomly.

7. The method according to claim 5, wherein the generating comprises calculating an absolute value of an STFT (Short-Time Fourier Transform) result of the sound source and selecting, through similarity analysis against the calculated result, the most similar part of the H matrix.

8. An audio separation system comprising: a generation unit that generates a second NMF model by using, as it is, the W matrix of a first NMF model (Non-negative Matrix Factorization model) commonly used for sound source separation, and by repeatedly arranging its H matrix; and
a separation unit that separates the sound source into a plurality of audio objects using the generated second NMF model.
KR1020160014914A 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system KR101864925B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020160014914A KR101864925B1 (en) 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system
PCT/KR2016/001393 WO2017135487A1 (en) 2016-02-05 2016-02-11 Method and system for separating audio objects on basis of global model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020160014914A KR101864925B1 (en) 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system

Publications (2)

Publication Number Publication Date
KR20170093474A KR20170093474A (en) 2017-08-16
KR101864925B1 true KR101864925B1 (en) 2018-06-05

Family

ID=59501027

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020160014914A KR101864925B1 (en) 2016-02-05 2016-02-05 Global Model-based Audio Object Separation method and system

Country Status (2)

Country Link
KR (1) KR101864925B1 (en)
WO (1) WO2017135487A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101511553B1 (en) 2014-02-14 2015-04-13 전자부품연구원 Multi Step Audio Separation Method and Audio Device using the same
US20150205575A1 (en) * 2014-01-20 2015-07-23 Canon Kabushiki Kaisha Audio signal processing apparatus and method thereof

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
KR101043114B1 (en) * 2009-07-31 2011-06-20 포항공과대학교 산학협력단 Method of Restoration of Sound, Recording Media of the same and Apparatus of the same
KR101225932B1 (en) * 2009-08-28 2013-01-24 포항공과대학교 산학협력단 Method and system for separating music sound source
US9721202B2 (en) * 2014-02-21 2017-08-01 Adobe Systems Incorporated Non-negative matrix factorization regularized by recurrent neural networks for audio processing
KR101641645B1 (en) * 2014-06-11 2016-07-22 전자부품연구원 Audio Source Seperation Method and Audio System using the same

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
US20150205575A1 (en) * 2014-01-20 2015-07-23 Canon Kabushiki Kaisha Audio signal processing apparatus and method thereof
KR101511553B1 (en) 2014-02-14 2015-04-13 전자부품연구원 Multi Step Audio Separation Method and Audio Device using the same

Non-Patent Citations (1)

Title
Brian King, et al. Optimal cost function and magnitude power for NMF-based speech separation and music interpolation. IEEE International Workshop on Machine Learning for Signal Processing. 2012.09.26.*

Also Published As

Publication number Publication date
WO2017135487A1 (en) 2017-08-10
KR20170093474A (en) 2017-08-16

Similar Documents

Publication Publication Date Title
CN106486128B (en) Method and device for processing double-sound-source audio data
JP6617783B2 (en) Information processing method, electronic device, and program
WO2006135806A3 (en) Forensic integrated search technology
CA2763312A1 (en) Audio signal processing device, audio signal processing method, and program
KR20080030922A (en) Information processing apparatus, method, program and recording medium
KR20160076775A (en) Composition program based on input music data and system thereof
KR101648931B1 (en) Apparatus and method for producing a rhythm game, and computer program for executing the method
KR100512143B1 (en) Method and apparatus for searching of musical data based on melody
Krause et al. Classifying Leitmotifs in Recordings of Operas by Richard Wagner.
JP6747447B2 (en) Signal detection device, signal detection method, and signal detection program
JPWO2017217412A1 (en) Signal processing apparatus, signal processing method and signal processing program
KR102128153B1 (en) Apparatus and method for searching music source using machine learning
KR101864925B1 (en) Global Model-based Audio Object Separation method and system
KR101493006B1 (en) Apparatus for editing of multimedia contents and method thereof
KR20170128070A (en) Chord composition method based on recurrent neural network
Lederle et al. Combining high-level features of raw audio waves and mel-spectrograms for audio tagging
JP2014112190A (en) Signal section classifying apparatus, signal section classifying method, and program
JP2012181307A (en) Voice processing device, voice processing method and voice processing program
JP6633753B2 (en) Music selection device for lighting control data generation, music selection method for lighting control data generation, and music selection program for lighting control data generation
Anderson Musical instrument classification utilizing a neural network
JP2008134298A (en) Signal processing device, signal processing method and program
JP6565529B2 (en) Automatic arrangement device and program
Li et al. Knowledge based fundamental and harmonic frequency detection in polyphonic music analysis
Bagul et al. Recognition of similar patterns in popular Hindi Jazz songs by music data mining
Mukherjee et al. Instrumentals/songs separation for background music removal

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant