JP3508981B2

JP3508981B2 - Methods for separating, separating and extracting, and separating and removing a melody contained in a music performance

Info

Publication number: JP3508981B2
Application number: JP32707697A
Authority: JP
Inventors: 邦夫柏野; 洋村瀬
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1997-11-12
Filing date: 1997-11-12
Publication date: 2004-03-22
Anticipated expiration: 2017-11-12
Also published as: JPH11143460A

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a method for separating and extracting, and a method for separating and removing, a melody contained in a music performance. Starting from the audio signal of a music performance that contains a plurality of melodies (parts), the method either extracts only a given melody or removes only a given melody. Typical applications are extracting only the singing voice or creating a karaoke track from an original recording.

[0002]

2. Description of the Related Art
One known method for separating and extracting, or separating and removing, a melody contained in a music performance works as follows. The frequency components are displayed on a screen, the user selects them manually, and the method outputs a single-note audio signal derived from the selected components. This method has a problem when many notes sound at the same time, as is common in ordinary instrumental performance. In that situation the power spectra of the individual notes are intricately intermixed, and deciding which frequency component belongs to which note is extremely difficult even by hand (this is illustrated in FIG. 4). Frequency components can therefore only be selected by an expert through repeated trial and error, and general users cannot easily use the method.

[0003] So-called additive synthesis is a known method of generating an audio signal from frequency-component information. It uses the instantaneous frequency values and amplitude values obtained by short-time Fourier analysis to generate time-segmented sine waves, and sums these to obtain the original waveform. However, this method computes the instantaneous frequency only from the information of adjacent analysis sections. As a result, short-time Fourier analysis has difficulty yielding instantaneous frequency values of sufficient accuracy. The resulting estimation errors in the instantaneous frequency cause audible artifacts and often prevent high-quality (high-fidelity) processing.

[0004] Consequently, none of the above methods can be expected to separate and extract, or separate and remove, a melody contained in a music performance adequately in two situations: when the method is to be used by general users who are not necessarily experts, and when high-quality sound processing is required.

[0005]

SUMMARY OF THE INVENTION
The object of the present invention is to provide a method for separating and extracting, and a method for separating and removing, a melody contained in a music performance. The methods should remain applicable when they are used by general users who are not necessarily experts and when high-quality sound processing is required. Compared with known methods, they should also be easier for general users to use and should achieve higher quality.

[0006]

Means for Solving the Problem
To achieve the above object, the present invention comprises the following three steps.
- A frequency component extraction step extracts the frequency components contained in the audio signal of a music performance.
- A frequency component selection step selects, from the components extracted in the frequency component extraction step, the frequency components to be used for output and their time ranges. The selection uses information, supplied by separate means, on the pitch, start time, and end time of each note in the melody to be separated and extracted or separated and removed.
- A waveform additive synthesis step generates the output audio signal waveform by synthesizing and summing waveforms based on the time, amplitude, and phase information of the frequency components selected in the frequency component selection step.

[0007]

BEST MODE FOR CARRYING OUT THE INVENTION
Next, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the functional configuration of one embodiment of an apparatus, to which the method of the present invention is applied, for separating and extracting and separating and removing a melody contained in a music performance. The apparatus of this embodiment comprises frequency component extraction means 1, frequency component selection means 2, and waveform additive synthesis means 3. A music audio signal is input to the frequency component extraction means 1, and musical score information is input to the frequency component selection means 2. The apparatus outputs one of two audio signals derived from the input audio signal: one containing only the melody that corresponds to the input score information, or one from which only that melody has been removed.

[0008] The frequency component extraction means 1 performs frequency analysis on the audio signal of the music performance. It detects local peaks along the frequency axis and connects them in the time direction to extract frequency components, and it obtains the instantaneous frequency of each frequency component at each time. The frequency component selection means 2 then selects the frequency components that make up the notes to be extracted or removed. To do so, it refers to two kinds of information: the instantaneous frequency of each component extracted by the frequency component extraction means 1 at each time, and the time and frequency of each note supplied by separate means.

[0009] The waveform additive synthesis means 3 generates the output audio signal waveform from piecewise sine waves. Each sine wave is synthesized from the time, amplitude, and phase information of one frequency component selected by the frequency component selection means 2. At each time, the means sums as many of these sine waves as there are frequency components present at that time. Next, the processing flow in each of the means 1, 2, and 3 will be described in detail with reference to the flowcharts of FIGS. 6 to 8.

[0010] The frequency component extraction means 1 first reads the audio signal waveform of the music performance that is input to the apparatus (step 101). FIG. 2 shows an example of such a waveform: part of the audio signal of a popular song by a female singer, recorded on a commercial CD released by Company T. The upper trace shows the left-channel waveform and the lower trace the right-channel waveform.

Next, frequency analysis is performed on the waveform that was read, to obtain a spectrogram (step 102). A spectrogram represents the power contained in the audio signal on a plane with time on the horizontal axis and frequency on the vertical axis. FIG. 3 shows an example, obtained with the fast Fourier transform.

Frequency components are then extracted from the spectrogram obtained in step 102. A frequency component is a series of local peaks on the spectrogram. Extraction proceeds in two stages. First, the spectrogram is scanned in the frequency direction to detect local power peaks (step 103). Second, the continuity of these local peaks in the time direction is detected and continuous local peaks are connected (step 104). FIG. 4 shows an example of the extracted frequency components. Each component is drawn as a line segment, and the thickness of the segment indicates its power.
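Steps 103 and 104 can be illustrated with the following minimal Python sketch. It is not the patent's implementation and rests on several assumptions of our own:
- The spectrogram is a list of per-frame power lists.
- A local peak is a bin that is strictly larger than its lower neighbour and at least as large as its upper neighbour.
- Peaks are linked greedily, each to the nearest peak in the next frame within a hypothetical `max_jump` bins.
- A peak with no continuation starts a new frequency component.

The names `Track`, `local_peaks` and `link_peaks` are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    """One frequency component: a chain of local peaks linked across frames."""
    start_frame: int
    bins: list = field(default_factory=list)
    powers: list = field(default_factory=list)

def local_peaks(frame_power, floor=0.0):
    """Step 103 sketch: indices of local power maxima along the frequency axis."""
    return [i for i in range(1, len(frame_power) - 1)
            if frame_power[i] > floor
            and frame_power[i] > frame_power[i - 1]
            and frame_power[i] >= frame_power[i + 1]]

def link_peaks(spectrogram, max_jump=1, floor=0.0):
    """Step 104 sketch: extend each active track with the nearest unused peak
    of the next frame (at most max_jump bins away); leftover peaks start tracks."""
    finished, active = [], []
    for t, frame in enumerate(spectrogram):
        unused = set(local_peaks(frame, floor))
        still_active = []
        for tr in active:
            last = tr.bins[-1]
            cands = [p for p in unused if abs(p - last) <= max_jump]
            if cands:
                p = min(cands, key=lambda q: abs(q - last))
                unused.remove(p)
                tr.bins.append(p)
                tr.powers.append(frame[p])
                still_active.append(tr)
            else:
                finished.append(tr)      # continuity broken: close the track
        for p in sorted(unused):
            still_active.append(Track(t, [p], [frame[p]]))
        active = still_active
    return finished + active
```

In a real system, `max_jump` would need to reflect the hop size and the expected vibrato rate. The greedy matching is the simplest choice; a globally optimal assignment between consecutive frames is a common refinement.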

[0011] In general, with the fast Fourier transform, the resolution in the frequency direction is determined by the length (number of points) of the analysis window, that is, by the time resolution. For music performances, this means that sufficient time resolution and sufficient frequency resolution cannot be obtained at the same time.

For example, at a sampling frequency of 48 kHz, a time resolution of about 40 ms requires an analysis frame of 2048 samples or shorter. The frequency resolution is then no finer than about 23 Hz (48000/2048 ≈ 23.4 Hz). In music, however, adjacent semitones differ in frequency by about 6%. Distinguishing the A at 220 Hz from the A# at 233 Hz, both in a typical singing range, therefore requires a frequency resolution of about 13 Hz, which the above analysis cannot provide.

A more precise frequency value can nevertheless be estimated by using the continuity of the frequency components detected in step 104 (step 105).
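The figures quoted in [0011] can be checked with a few lines of arithmetic. Like the paragraph itself, this sketch treats the FFT bin spacing fs/N as the frequency resolution. With a tapered analysis window the effective resolution would be coarser still.

```python
fs = 48_000      # sampling frequency [Hz]
n_fft = 2048     # analysis frame length [samples]

frame_ms = 1000 * n_fft / fs     # window duration, about 42.7 ms
bin_hz = fs / n_fft              # bin spacing, about 23.4 Hz

semitone = 2 ** (1 / 12)         # equal-tempered semitone ratio, about 1.0595 (~6 %)
a3 = 220.0                       # A3 in scientific pitch notation
a_sharp3 = a3 * semitone         # about 233.08 Hz
gap_hz = a_sharp3 - a3           # about 13.1 Hz

print(f"window {frame_ms:.1f} ms, bin spacing {bin_hz:.1f} Hz")
print(f"A3-A#3 gap {gap_hz:.1f} Hz, resolvable by bin spacing: {bin_hz < gap_hz}")
```

The second line prints `False`: the 23.4 Hz bin spacing is wider than the 13.1 Hz gap between the two notes. This is the shortfall that step 105 addresses.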

[0012] This can be done, for example, as follows. The phase at each time is first approximated using the continuity of the phase, and the frequency is then obtained as its time derivative. The frequency value obtained in this way is called the instantaneous frequency. Consider the $k$-th analysis frame and the temporally adjacent $(k+1)$-th analysis frame. For the local peak components detected in these two frames, let $\omega_k$, $\omega_{k+1}$ be the frequency values and $\theta_k$, $\theta_{k+1}$ the phase values obtained by the Fourier transform. Because the phase should vary continuously, its change over time can be expressed by a low-order polynomial. Approximating the phase $\theta(t)$ at time $t$ (measured from the $k$-th frame) by a cubic gives the following expression.

[0013]
$$\theta(t) = \theta_k + \omega_k t + \alpha t^2 + \beta t^3 \qquad (1)$$
Here, $\alpha$ and $\beta$ are given by the following equations, where $T$ is the time interval between adjacent frames and $M$ is the integer closest to $x'$ defined below. These are the values for which $\theta(T) = \theta_{k+1} + 2\pi M$ and $\dot\theta(T) = \omega_{k+1}$.
$$\alpha = \frac{3}{T^2}\left(\theta_{k+1} - \theta_k - \omega_k T + 2\pi M\right) - \frac{1}{T}\left(\omega_{k+1} - \omega_k\right)$$
$$\beta = -\frac{2}{T^3}\left(\theta_{k+1} - \theta_k - \omega_k T + 2\pi M\right) + \frac{1}{T^2}\left(\omega_{k+1} - \omega_k\right)$$

[0014]
$$x' = \frac{1}{2\pi}\left(\theta_k - \theta_{k+1} + \frac{(\omega_{k+1} + \omega_k)\,T}{2}\right)$$
The instantaneous frequency value $\omega(t) = \dot\theta(t)$ is obtained by differentiating the $\theta(t)$ given by this calculation with respect to time. This calculation is not directly affected by the frequency resolution, which depends on the frame length of the Fourier transform. It therefore resolves the problem of insufficient resolution that arises from the trade-off between time resolution and frequency resolution described above.
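Equation (1) and the coefficient formulas can be sketched in Python as follows. The formulas are the standard cubic phase interpolation associated with McAulay and Quatieri's sinusoidal model, and they are consistent with the definition of $x'$ above. The function names are hypothetical. The units (radians, rad/s, seconds) and the use of Python's `round` as "nearest integer" are our assumptions.

```python
import math

def cubic_phase(theta_k, omega_k, theta_k1, omega_k1, T):
    """Return (alpha, beta, M) of eq. (1). Phases are in rad, frequencies
    in rad/s, and T is the frame interval in s."""
    x = (theta_k - theta_k1 + (omega_k1 + omega_k) * T / 2) / (2 * math.pi)
    M = round(x)                                   # integer closest to x'
    d_theta = theta_k1 - theta_k - omega_k * T + 2 * math.pi * M
    d_omega = omega_k1 - omega_k
    alpha = 3 * d_theta / T**2 - d_omega / T
    beta = -2 * d_theta / T**3 + d_omega / T**2
    return alpha, beta, M

def inst_freq(theta_k, omega_k, theta_k1, omega_k1, T, t):
    """Instantaneous frequency: the time derivative of eq. (1), for 0 <= t <= T."""
    alpha, beta, _ = cubic_phase(theta_k, omega_k, theta_k1, omega_k1, T)
    return omega_k + 2 * alpha * t + 3 * beta * t**2
```

For a steady sinusoid, the wrapped phase reported at frame $k+1$ is unwrapped by $2\pi M$, so $\alpha$ and $\beta$ come out (numerically) zero. For a linear frequency glide, the cubic reduces to a quadratic, and the instantaneous frequency at mid-segment is the mean of $\omega_k$ and $\omega_{k+1}$.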

[0015] In practice, however, the above calculation does not always yield a reliable instantaneous frequency value $\omega(t)$, for example because of limited numerical precision. In such cases, a low-pass filter or a median filter (which outputs the median over each fixed interval) can be applied to the time series of $\omega(t)$. As an example of the median filter, consider the following settings:
- sampling frequency 48 kHz;
- frame length 2048 samples;
- hop of 32 samples between adjacent frames, so that each frame shares 2016 samples with the preceding one.

With these settings, take as the instantaneous frequency of the target frame the median of the $\omega(t)$ values computed for the 10 frames before and after it. This gives good results in subsequent processing such as resynthesis.
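The median filtering of [0015] can be sketched as below. We read "the 10 frames before and after" as 10 frames on each side, giving a 21-frame window; this reading, and the truncation of the window at the ends of a component, are our assumptions. With a 32-sample hop at 48 kHz, a 21-frame window spans 20 hops, that is 640 samples or about 13.3 ms.

```python
from statistics import median

def median_smooth(freqs, half_width=10):
    """Replace each instantaneous-frequency value by the median over up to
    half_width frames on each side (the window is truncated at the ends)."""
    out = []
    for i in range(len(freqs)):
        lo = max(0, i - half_width)
        hi = min(len(freqs), i + half_width + 1)
        out.append(median(freqs[lo:hi]))
    return out
```

A median, unlike a moving average, discards an isolated estimation error entirely rather than smearing it over neighbouring frames. This property suits the occasional unreliable $\omega(t)$ values described above.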

[0016] The instantaneous frequency values $\omega(t)$ and amplitude values obtained by the above calculation are stored in a buffer (step 106). The process then returns to step 101, and the frames are processed one after another until all of the input audio signal has been processed.

The frequency component selection means 2 first reads the frequency component extraction result obtained by the frequency component extraction means 1 (step 201), and then processes the frequency components one at a time.

First, it checks whether the selected frequency component corresponds to a note of the melody to be processed, as given by the note information (the corresponding score information) supplied separately to the apparatus (step 202). That is, it checks whether the time and instantaneous frequency of the component match the time interval and frequency of the note to be processed. If they match, the component can be judged to be a fundamental frequency component, and the process proceeds to step 203. If they do not match, the process returns to step 201.

For a component judged to be a fundamental frequency component, the component itself is selected first (step 203). Its harmonic components are then selected (step 204); these are the components whose instantaneous frequency values are approximately integer multiples of that component's instantaneous frequency value. The selected components are stored in a frequency component buffer (step 205), and the process returns to step 201.
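Steps 202 to 204 can be sketched as follows, under simplifying assumptions of our own. Each extracted component is summarised by its time span and a single representative frequency, whereas the text above checks the instantaneous frequency at each time. A "match" means that the time spans overlap and the frequency lies within a hypothetical relative tolerance `tol`. We set `tol` to 3 %, about half a semitone. Pitch numbers follow the convention of [0020] (middle C = 60, one step per semitone), and the A4 = 69 = 440 Hz reference is our assumption. `Component`, `pitch_to_hz` and `select_components` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Component:
    """Hypothetical summary of one extracted frequency component."""
    start: float   # seconds
    end: float     # seconds
    freq: float    # representative instantaneous frequency [Hz]

def pitch_to_hz(n):
    """Pitch number (middle C = 60, one step per semitone) to Hz, assuming A4 = 440 Hz."""
    return 440.0 * 2 ** ((n - 69) / 12)

def overlaps(c, start, end):
    return c.start < end and c.end > start

def select_components(components, notes, tol=0.03, max_harmonic=16):
    """notes: iterable of (start_s, end_s, pitch_number). Returns selected indices."""
    selected = set()
    for start, end, pitch in notes:
        f_note = pitch_to_hz(pitch)
        for i, c in enumerate(components):
            # Step 202: both time and frequency must match the note.
            if not overlaps(c, start, end) or abs(c.freq / f_note - 1) > tol:
                continue
            selected.add(i)                       # step 203: the fundamental itself
            for j, h in enumerate(components):    # step 204: near-integer multiples
                k = round(h.freq / c.freq)
                if (2 <= k <= max_harmonic and overlaps(h, c.start, c.end)
                        and abs(h.freq / (k * c.freq) - 1) <= tol):
                    selected.add(j)
    return selected
```

In this design, a component from another part that lies a semitone away (for example 233 Hz against a 220 Hz note) is rejected, because a 6 % deviation exceeds the 3 % tolerance.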

[0017] For the check in step 202, the frequency and duration of the note sounding at each time are obtained in advance from the note symbols of the musical score for the input music. It then suffices to input this information in synchronization with the input audio signal. This is described further in the experiment presented in the "Effects of the Invention" section.

The waveform additive synthesis means 3 first searches for an unprocessed frequency component among those generated by the frequency component extraction means 1 (step 301), and ends the processing if none remains. If an unprocessed frequency component exists, the means refers to the output of the frequency component selection means 2 and determines whether that component should be used for synthesis (step 302). The outcome depends on the mode:
- To separate and extract the given notes, only the components contained in the output of the frequency component selection means 2 are used for synthesis. Such a component proceeds to step 303.
- To separate and remove the given notes, the components contained in that output are not used for synthesis. For such a component the process returns to step 301, and the remaining components proceed to step 303.

For each component judged in step 302 to be used for synthesis, a sine waveform is calculated from the time, phase, and amplitude information of that component (step 303). Step 302 need not make a binary decision about whether to use a component. Instead, a coefficient between 0 and 1 may be calculated according to the degree of extraction or exclusion, and multiplied into the amplitude value used in step 303. The result of step 303 is added into a waveform buffer (step 304), and the process returns to step 301.
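Steps 301 to 304 can be sketched as below. The track format is hypothetical: each track is `(first_frame, [(amp, phase_rad, omega_rad_per_s), ...])`, one entry per frame at a fixed `hop`. Within each segment between two frames, we assume linear amplitude interpolation and use the cubic phase of eq. (1); a cosine carrier is used because STFT phase is conventionally referenced to the cosine. The per-track `gains` implement the 0-to-1 coefficient variant, of which the binary decision is the special case 0 or 1. `synthesize`, `gains_for` and `_cubic` are illustrative names, not the patent's.

```python
import math

def _cubic(theta_k, omega_k, theta_k1, omega_k1, T):
    """Cubic phase coefficients (alpha, beta) of eq. (1)."""
    x = (theta_k - theta_k1 + (omega_k1 + omega_k) * T / 2) / (2 * math.pi)
    d_theta = theta_k1 - theta_k - omega_k * T + 2 * math.pi * round(x)
    d_omega = omega_k1 - omega_k
    return 3 * d_theta / T**2 - d_omega / T, -2 * d_theta / T**3 + d_omega / T**2

def gains_for(selected, n_tracks, mode):
    """'extract' keeps only the selected tracks; 'remove' keeps only the others."""
    if mode == "extract":
        return [1.0 if i in selected else 0.0 for i in range(n_tracks)]
    return [0.0 if i in selected else 1.0 for i in range(n_tracks)]

def synthesize(tracks, gains, fs, hop, n_samples):
    out = [0.0] * n_samples                        # waveform buffer
    T = hop / fs
    for (first, frames), g in zip(tracks, gains):  # step 301: next component
        if g == 0:
            continue                               # step 302: not used
        for k in range(len(frames) - 1):           # one segment per frame pair
            (a0, th0, w0), (a1, th1, w1) = frames[k], frames[k + 1]
            alpha, beta = _cubic(th0, w0, th1, w1, T)
            base = (first + k) * hop
            for n in range(hop):                   # step 303: piecewise sinusoid
                i = base + n
                if i >= n_samples:
                    break
                t = n / fs
                amp = a0 + (a1 - a0) * n / hop
                phase = th0 + w0 * t + alpha * t**2 + beta * t**3
                out[i] += g * amp * math.cos(phase)  # step 304: add into buffer
    return out
```

Extraction and removal share the same synthesis routine and differ only in the gain vector. This mirrors the structure described in [0018].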

[0018] As is clear from the above description, the present invention is characterized by two methods: the extraction method of the frequency component extraction means 1 in FIG. 1, and the method by which the frequency component selection means 2 selects frequency components from the extracted components. These features give rise to three inventions:
- A separation method that uses these two methods to separate the desired melody from the audio signal of a music performance.
- A separation-and-extraction method that outputs the separated frequency components as an audio signal, using the waveform generation method of the waveform additive synthesis means 3.
- A separation-and-removal method that separates the melody to be removed from the music performance audio signal by the above separation method, then synthesizes and sums the remaining frequency components by the method of the waveform additive synthesis means 3 and outputs them as an audio signal.

[0019] The frequency component selection step may also use the pitch, start time, and end time of each note of chords, not only of the individual notes that make up the melody to be separated.

[0020]

Effects of the Invention
Next, an example of an operating experiment with an apparatus to which the present invention is applied is described. In this experiment, the input was the audio signal of a commercial CD (a popular song with a female vocalist) released by Company T. FIG. 2 shows the input audio signal waveform. For this input signal, data in the following format were prepared separately as information on the melody to be extracted. Each line contains three numbers:
- the start time of a note [s];
- the end time of the note [s];
- a pitch number, in which middle C is 60 and the number changes by 1 per semitone.

These data were created with a simple program: a time-marking program that displays the audio waveform on screen and can play back a specified time interval.
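A reader for the melody data format described in [0020] might look like the sketch below. The patent does not specify the field separator, so whitespace separation is our assumption. The validation rules and the name `parse_melody` are also hypothetical. The sample lines in the usage note are made up for illustration and are not the data used in the experiment.

```python
def parse_melody(text):
    """Parse lines of 'start_s end_s pitch_number' (middle C = 60) into tuples."""
    notes = []
    for line_no, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue                               # skip blank lines
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(fields)}")
        start, end, pitch = float(fields[0]), float(fields[1]), int(fields[2])
        if end <= start:
            raise ValueError(f"line {line_no}: end time must follow start time")
        notes.append((start, end, pitch))
    return notes
```

For example, `parse_melody("0.50 0.92 69\n0.92 1.40 67\n")` returns `[(0.5, 0.92, 69), (0.92, 1.4, 67)]`. These tuples can be passed directly as the note list of a selection step such as step 202.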

[0021] FIG. 4 shows the frequency components output by the frequency component extraction step 1, each represented by a line segment. Part of the prepared melody data is as follows. FIG. 5 shows the frequency components selected by the frequency component selection step 2. In this example, only the female vocal has been separated and extracted. An audio signal containing only the female vocal can be reproduced from these selected frequency components.

[0022] As in this example, an apparatus to which the present invention is applied was operated on a commercial CD. This confirmed that an output audio signal of high perceived quality can be obtained more easily than with known methods. As described above, the present invention uses musical score information and performs high-accuracy instantaneous frequency extraction. Its advantage is that a melody contained in a music performance can be separated and extracted, or separated and removed, with high quality and in a form that general users can use more easily than known methods. This holds even when the method is used by general users who are not necessarily experts, or when high-quality sound processing is required.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the functional configuration of one embodiment of an apparatus, to which the method of the present invention is applied, for separating and extracting and separating and removing a melody contained in a music performance.

FIG. 2 is a diagram showing part of an example audio signal waveform.

FIG. 3 is a diagram showing the result of frequency analysis performed on the audio signal of FIG. 2.

FIG. 4 is a diagram showing the result of frequency component extraction performed on the frequency analysis result of FIG. 3.

FIG. 5 is a diagram showing only the frequency components (corresponding to the singing voice in this example) selected by the frequency component selection means from the frequency component extraction result of FIG. 4.

FIG. 6 is a flowchart showing the processing procedure of the frequency component extraction means.

FIG. 7 is a flowchart showing the processing procedure of the frequency component selection means.

FIG. 8 is a flowchart showing the processing procedure of the waveform additive synthesis means.

Continuation of the front page
(56) References cited: JP-A-8-286689 (JP, A); JP-A-8-227296 (JP, A); JP-A-7-319488 (JP, A); JP-U-5-60100 (JP, U)
(58) Fields searched (Int.Cl.⁷, DB name): G10H 1/00 - 1/00 102; G10K 15/04 302; G10G 1/00 - 3/04

Claims

(57) [Claims]

[Claim 1] A method for separating a melody contained in a music performance, the method comprising:
a frequency component extraction step of extracting frequency components contained in an audio signal of the music performance; and
a frequency component selection step of selecting, from the frequency components extracted in the frequency component extraction step, the frequency components to be separated and their times, using information on the pitch, start time, and end time of each note making up the melody to be separated;
the method synthesizing an audio signal waveform using either the components selected in the frequency component selection step or the components of the audio signal other than those selected;
wherein the frequency component extraction step comprises:
a step of obtaining phases by short-time Fourier analysis of the audio signal for each analysis frame;
a step of approximating the change over time of the obtained phases by a low-order polynomial based on the continuity of the phase;
a step of obtaining a time series of frequency values by differentiating the approximated phase with respect to time; and
a step of obtaining instantaneous frequency values by performing a filtering operation on the time series of frequency values.

[Claim 2] The method for separating a melody contained in a music performance according to claim 1, wherein the frequency component selection step checks whether the time and instantaneous frequency value of an extracted frequency component correspond to (a) the time interval of a note, determined by the start time and end time information, and (b) the frequency determined by the pitch information. If they correspond, the step selects, as components making up the note, the component having that instantaneous frequency value and the components having instantaneous frequency values close to integer multiples of that frequency.

[Claim 3] A method for separating and extracting a melody contained in a music performance, comprising a waveform additive synthesis step of generating an audio signal waveform by synthesizing and summing waveforms. The waveforms are based on the times of the frequency components selected from the audio signal of the music performance by the melody separation method according to claim 1 or 2, and on the instantaneous frequency value, amplitude, and phase information of each such frequency component at those times.

[Claim 4] A method for separating and removing a melody contained in a music performance, comprising a waveform additive synthesis step of generating an audio signal waveform by synthesizing and summing waveforms. The waveforms are based on the times of the frequency components other than those selected from the audio signal of the music performance by the melody separation method according to claim 1 or 2, and on the instantaneous frequency value, amplitude, and phase information of each such frequency component at those times.