JP2012002858A

JP2012002858A - Time scaling method, pitch shift method, audio data processing apparatus and program

Info

Publication number: JP2012002858A
Application number: JP2010135068A
Authority: JP
Inventors: Yoshihisa Furukawa; 善久古川; Yasuki Sekine; 泰樹関根
Original assignee: Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 2010-06-14
Filing date: 2010-06-14
Publication date: 2012-01-05

Abstract

PROBLEM TO BE SOLVED: To provide a time scaling method and the like capable of converting a musical sound into a high-quality sound using an FFT method regardless of the type of musical sound. SOLUTION: An audio data processing apparatus comprises: an FFT unit 21 that converts digital audio data into an amplitude and a phase for each frequency component; a phase operation unit 22 that, according to a calculation of the time rate of change of the amplitude and/or the phase, performs either first phase operation processing, in which the phase is reset by taking the phase of each frequency component to be the result of the frequency conversion step itself, or second phase operation processing, in which the phase is made continuous by taking the phase of each frequency component to have changed continuously, allowing for time warping, from the previous result of the frequency conversion step; an inverse FFT unit 23 that converts each frequency component after the phase operation processing by the phase operation unit 22 back into digital audio data; and a time warping operation unit 24 that increases or reduces the number of data samples in proportion to the time warping ratio when the inverse FFT unit 23 performs the inverse frequency transformation.

Description

The present invention relates to a time scaling method that expands and compresses the length of digital audio data on the time axis without changing its pitch, and to a pitch shift method, an audio data processing apparatus, and a program.

Conventionally, the crossfade method is known as a time scaling technique that expands and compresses the length of digital audio data on the time axis without changing its pitch. For example, the [Prior Art] section of Patent Document 1 describes the PICOLA method as one example of a crossfade method. SRC (Sampling Rate Convert) processing, which changes the sampling frequency of digital audio data without changing its playback speed, is also known. For example, Patent Document 2 describes a method of converting pitch by SRC processing. If this SRC processing is performed before or after crossfade time scaling, with a sampling frequency change that cancels the amount of time scaling, and the result is played back at the sampling frequency of the original digital audio data, the pitch changes at the same time. This makes it possible to realize a pitch shift (key control) that changes only the pitch of the original digital audio data and leaves its length on the time axis unchanged.

Meanwhile, the FFT (Fast Fourier Transform) method is also known as a way of realizing time scaling and pitch shifting. For example, Patent Document 3 describes a method of performing time scaling by using the FFT method and changing the number of input samples and the number of output samples. Patent Document 4 describes a method of correcting the shift in the onset of a transient (percussive sound) that arises when the number of input overlap samples and the number of output overlap samples are changed during time scaling with the FFT method.

Comparing these two approaches to time scaling and pitch shifting, the FFT method generally has the advantage that it does not suffer from the sound quality degradations that are a problem with the crossfade method, such as doubled sounds, dropouts, and beating (tremolo), and can therefore produce higher-quality sound than the crossfade method.

Patent Document 1: Japanese Patent No. 3395560 (paragraph [0003], etc.)
Patent Document 2: Japanese Patent Laid-Open No. 08-139570
Patent Document 3: Japanese Translation of PCT Application No. 2007-519967
Patent Document 4: US Pat. No. 7,565,289

However, with the FFT method, long-tone sounds with a gentle attack (melody sounds) can be converted with high quality, but percussion sounds with a steep attack (rhythm sounds) suffer a degradation in which the attack portion is smeared along the time axis and the sense of attack is lost. The reason is as follows. When time scaling is realized by changing the input and output sample counts using the FFT method as described above, leaving the phase of the original sound unchanged makes the phase discontinuous with the next FFT operation, so processing that makes the phase continuous is required. In addition, when pitch shifting is performed, the frequency shift is carried out in the frequency domain, so the phase calculated by the FFT cannot be used as the phase after the frequency shift, and continuous-change phase processing is needed for each frequency component after the shift. As a result, the phase becomes completely different from that of the original sound. In sounds with a strong attack (percussion sounds and the like), however, the phases across a wide frequency band are thought to stand in a specific relationship (the phase is 0 at the moment the sound begins).

In pitch shifting by the conventional FFT method, continuous-change phase processing computes the phase so that it is continuous for each frequency component individually. The phase relationship among the frequency components that should hold a specific phase relationship in the attack portion therefore differs from that of the original sound, and the sense of attack is lost. In other words, because the conventional FFT method must convert the phase calculated by the FFT into a different value after the frequency shift, the phase relationship between frequency components that must be preserved to keep the attack is lost, and the loss of attack cannot be prevented. Furthermore, because the conventional FFT method shifts the pitch in the frequency domain, the frequency components do not correspond one-to-one and calculation errors arise. The frequency error and the phase error of each frequency component also cannot be separated, which introduces errors into the pitch shift calculation. As a result, the sound is degraded even outside the attack portion.

In view of the above problems, an object of the present invention is to provide a time scaling method, a pitch shift method, an audio data processing apparatus, and a program that can convert sound with high quality using the FFT method regardless of the type of musical sound, such as melody sounds or rhythm sounds, and even for complex musical sounds in which melody sounds and rhythm sounds are mixed.

The time scaling method of the present invention is characterized in that an audio data processing apparatus executes: a frequency conversion step of converting digital audio data into an amplitude and a phase for each frequency component; a phase calculation step of performing, according to the results of a plurality of phase switching determination processes that make different phase switching determinations using the calculated time rate of change of the amplitude and/or the phase, at least one of a first phase calculation process, which performs phase reset processing by taking the phase of each frequency component to be the calculation result of the frequency conversion step itself, and a second phase calculation process, which performs phase continuation processing by taking the phase of each frequency component to have changed continuously, allowing for time expansion/contraction, from the previous calculation result of the frequency conversion step; an inverse frequency conversion step of converting each frequency component after the phase calculation process of the phase calculation step into digital audio data; and a time expansion/contraction calculation step of increasing or decreasing, during the inverse frequency conversion processing of the inverse frequency conversion step, the number of data of the digital audio data after inverse frequency conversion in proportion to the time expansion/contraction ratio.

The audio data processing apparatus of the present invention is characterized by comprising: frequency conversion means for converting digital audio data into an amplitude and a phase for each frequency component; phase calculation means for performing, according to the results of a plurality of phase switching determination processes that make different phase switching determinations using the calculated time rate of change of the amplitude and/or the phase, at least one of a first phase calculation process, which performs phase reset processing by taking the phase of each frequency component to be the calculation result of the frequency conversion means itself, and a second phase calculation process, which performs phase continuation processing by taking the phase of each frequency component to have changed continuously, allowing for time expansion/contraction, from the previous calculation result of the frequency conversion means; inverse frequency conversion means for converting each frequency component after the phase calculation process of the phase calculation means into digital audio data; and time expansion/contraction calculation means for increasing or decreasing, during the inverse frequency conversion processing of the inverse frequency conversion means, the number of data of the digital audio data after inverse frequency conversion in proportion to the time expansion/contraction ratio.

According to these configurations, a steep rise in sound and the like can be detected by performing a plurality of phase switching determination processes that make different phase switching determinations using the calculated time rate of change of the amplitude and/or the phase. In addition, because the appropriate phase calculation process (either the first or the second phase calculation process) is performed according to the results of those determination processes, the loss of attack can be prevented. That is, when a steep rise in sound is detected from the calculated time rate of change of the amplitude and/or the phase, the phase calculation uses the FFT phase itself (the first phase calculation process) instead of phase continuation processing (the second phase calculation process), so the sense of attack can be reproduced. As a result, high-quality time scaling using the FFT method becomes possible even for musical sounds with a steep attack, such as rhythm sounds.
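The switch between the two phase calculation processes can be sketched for a single frequency bin as follows. This is a minimal illustration, not the patented implementation: the continuation branch uses the textbook phase-vocoder phase propagation as an assumed stand-in for "continuous change allowing for time expansion/contraction", and the names `next_output_phase`, `hop_in`, `stretch`, and `attack` are hypothetical.

```python
import math

def wrap_phase(phi):
    """Wrap an angle in radians into [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def next_output_phase(analysis_phase, prev_analysis_phase, prev_output_phase,
                      bin_index, fft_size, hop_in, stretch, attack):
    """Return the synthesis phase of one bin for the current frame."""
    if attack:
        # First phase calculation process (phase reset):
        # use the FFT phase of the current frame as it is.
        return analysis_phase
    # Second phase calculation process (phase continuation), assumed here to be
    # standard phase-vocoder propagation scaled by the stretch ratio.
    expected = 2.0 * math.pi * bin_index * hop_in / fft_size
    deviation = wrap_phase(analysis_phase - prev_analysis_phase - expected)
    advance = (expected + deviation) * stretch
    return wrap_phase(prev_output_phase + advance)
```

With `stretch` equal to 1 and matching previous phases, the continuation branch reproduces the analysis phase, so the two branches differ only when the time axis is actually expanded or compressed.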

In the time scaling method described above, the plurality of phase switching determination processes determine the presence or absence of an attack portion for each of different frequency bands, and the phase calculation step performs the first phase calculation process when the determination finds that an attack portion is present and performs the second phase calculation process when it finds that no attack portion is present.

In the audio data processing apparatus described above, the plurality of phase switching determination processes determine the presence or absence of an attack portion for each of different frequency bands, and the phase calculation means performs the first phase calculation process when the determination finds that an attack portion is present and performs the second phase calculation process when it finds that no attack portion is present.

According to these configurations, the presence or absence of an attack portion is determined for each of different frequency bands, so the attack portion can be detected accurately. In addition, when an attack portion is detected, phase continuation processing (the second phase calculation process) is not performed, so the loss of attack can be prevented.

In the time scaling method described above, the phase calculation step performs the plurality of phase switching determination processes using a normalized amplitude difference value obtained by dividing the time rate of change of the amplitude by the amplitude.

In the audio data processing apparatus described above, the phase calculation means performs the plurality of phase switching determination processes using a normalized amplitude difference value obtained by dividing the time rate of change of the amplitude by the amplitude.

According to these configurations, the time rate of change of the amplitude is used, so the attack portion can be detected accurately. In addition, because the normalized amplitude difference value, obtained by dividing the time rate of change of the amplitude by the amplitude, is used, the attack portion can be detected reliably even when the volume of the original sound is low.
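A small sketch shows why the normalization makes detection independent of volume. It assumes the time rate of change is the frame-to-frame amplitude difference and that the divisor is the current-frame amplitude; the text does not state which amplitude is used, and the function name and the `eps` guard are hypothetical.

```python
def normalized_amplitude_difference(prev_amp, cur_amp, eps=1e-12):
    """Per-bin (current - previous) / current amplitude.

    eps only guards against division by zero in silent bins.
    """
    return [(c - p) / (c + eps) for p, c in zip(prev_amp, cur_amp)]

# The same tenfold rise gives the same value at a quiet and a loud level.
quiet = normalized_amplitude_difference([0.001], [0.01])
loud = normalized_amplitude_difference([0.1], [1.0])
```

Both calls return a value close to 0.9, whereas the raw differences (0.009 and 0.9) differ by a factor of 100, so a single threshold can serve both quiet and loud passages.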

In the time scaling method described above, the phase calculation step determines, as a phase switching determination process, whether the total of the normalized amplitude difference values is equal to or greater than a predetermined threshold; when it is, the step determines the presence or absence of an attack portion, and when an attack portion is detected, it performs the first phase calculation process on the entire frequency band.

According to this configuration, a large total of the normalized amplitude difference values together with a detected attack portion means that the frequency components whose phase should be reset are spread over a wide range, so performing the first phase calculation process on the entire frequency band realizes high-quality sound conversion.

In the time scaling method described above, the phase calculation step determines, as a phase switching determination process, whether the total of the normalized amplitude difference values is equal to or greater than a predetermined threshold; when it is below the threshold, the step further determines, as a phase switching determination process, the presence or absence of an attack portion for each frequency band, performs the first phase calculation process for each frequency band in which an attack portion is detected, and performs the second phase calculation process when no attack portion is detected.

According to this configuration, when the total of the normalized amplitude difference values is small, the presence or absence of an attack portion is checked for each frequency band, so even a faint attack can be detected reliably. In addition, when an attack portion is detected in a frequency band, the first phase calculation process is performed for that band, which prevents the loss of faint attacks and realizes high-quality sound conversion.
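The two-stage decision described above (whole-band reset when the total is large, per-band reset otherwise) can be sketched as follows. How an attack is judged inside each band is not specified in this passage, so the sketch assumes a simple per-band sum of positive normalized amplitude differences compared with a second threshold; `choose_reset_bins`, `bands`, and both thresholds are hypothetical.

```python
def choose_reset_bins(nad, bands, total_threshold, band_threshold):
    """Return one flag per bin: True = phase reset, False = phase continuation.

    nad   : normalized amplitude difference value of each bin
    bands : list of (low, high) bin ranges, high exclusive
    """
    rise = [max(v, 0.0) for v in nad]  # assumption: only rising amplitude counts
    band_attack = [sum(rise[lo:hi]) >= band_threshold for lo, hi in bands]
    if sum(rise) >= total_threshold and any(band_attack):
        # Large total with an attack: reset the entire frequency band.
        return [True] * len(nad)
    # Small total: reset only the bands in which an attack was found.
    reset = [False] * len(nad)
    for (lo, hi), hit in zip(bands, band_attack):
        if hit:
            for k in range(lo, hi):
                reset[k] = True
    return reset
```

For example, with bands `[(0, 2), (2, 4)]`, a frame that rises only in the upper two bins resets just those bins, while a frame that rises strongly everywhere resets all four.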

In the time scaling method described above, the phase calculation step performs the plurality of phase switching determination processes using a phase discontinuity degree, which is the time rate of change of the phase.

In the time scaling method described above, the phase calculation step performs the plurality of phase switching determination processes using both the normalized amplitude difference value, obtained by dividing the time rate of change of the amplitude by the amplitude, and the phase discontinuity degree, which is the time rate of change of the phase.

According to these configurations, the attack portion can be detected using the phase discontinuity degree, which is the time rate of change of the phase. In addition, combining the normalized amplitude difference value with the phase discontinuity degree allows the attack portion to be detected more accurately.

In the time scaling method described above, the phase calculation step performs the second phase calculation process on continuing components, whose frequency peaks persist over time, even when an attack portion is detected.

According to this configuration, sound quality can be improved by excluding from phase reset processing the continuing components whose frequency peaks persist over time. Specifically, by detecting a continuing component as a continuing sound and applying phase continuation processing to it, sounds that keep sounding before and after the attack portion become less likely to be interrupted.
Here, a "frequency peak" refers to a frequency at which the spectrum obtained by the FFT has a local maximum.
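Detection of continuing components can be sketched as a comparison of local maxima in two consecutive magnitude spectra. The criterion used here (a peak counts as continuing if the previous frame had a peak within `tolerance` bins) is an assumption for illustration; the passage does not say how many frames a peak must persist, and both function names are hypothetical.

```python
def spectral_peaks(mag):
    """Indices of local maxima of a magnitude spectrum (edge bins excluded)."""
    return {k for k in range(1, len(mag) - 1)
            if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]}

def continuing_peaks(prev_mag, cur_mag, tolerance=1):
    """Peaks of the current frame that also existed in the previous frame."""
    prev = spectral_peaks(prev_mag)
    return {k for k in spectral_peaks(cur_mag)
            if any(abs(k - j) <= tolerance for j in prev)}

prev_frame = [0, 1, 5, 1, 0, 0, 0, 0]
cur_frame = [0, 1, 5, 1, 0, 2, 6, 2]   # bin 2 persists, bin 6 is new
kept = continuing_peaks(prev_frame, cur_frame)
```

Here `kept` contains only bin 2, so that bin would stay on phase continuation while the newly appearing peak at bin 6 remains a candidate for phase reset.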

In the time scaling method described above, when the phase calculation step does not perform phase reset processing on a continuing component of a frequency peak, it also performs the second phase calculation process on the sidelobe components that occur near the frequency peak of that continuing component.

According to this configuration, the sidelobe components that occur near the frequency peak of a continuing component are also detected as continuing sound and are given phase continuation processing (excluded from phase reset processing), so sounds that keep sounding before and after the attack portion become even less likely to be interrupted.
Here, a "sidelobe component" refers to the frequency bands on both sides of a frequency peak in which the magnitude falls off gradually.
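One way to collect the sidelobe bins around a peak is to walk outward from the peak for as long as the magnitude keeps decreasing. This is only an assumed reading of "gradually decreases in size"; the function name `with_sidelobes` is hypothetical and a real implementation might bound the width by the analysis window instead.

```python
def with_sidelobes(mag, peak):
    """Return the peak bin plus the neighbouring bins on each side
    over which the magnitude keeps falling away from the peak."""
    bins = {peak}
    k = peak
    while k > 0 and mag[k - 1] < mag[k]:              # walk down the left slope
        k -= 1
        bins.add(k)
    k = peak
    while k < len(mag) - 1 and mag[k + 1] < mag[k]:   # walk down the right slope
        k += 1
        bins.add(k)
    return bins

spectrum = [3, 1, 2, 5, 4, 2, 3]
group = with_sidelobes(spectrum, 3)
```

For this spectrum `group` is `{1, 2, 3, 4, 5}`: the walk stops at bins 0 and 6, where the magnitude starts rising again. The whole group would then receive the same phase calculation process as its peak.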

In the time scaling method described above, even when no attack portion is detected, if the phase calculation step performs phase reset processing on a non-continuing component, whose frequency peak does not persist over time, it also performs the first phase calculation process on the sidelobe components that occur near that frequency peak.

According to this configuration, when phase reset processing is performed on a non-continuing component whose frequency peak does not persist over time (that is, when the frequency component judged in the preceding processing to require a phase reset is a frequency peak), the sidelobe components of that frequency peak are reset together with it, so the attack portion can be reproduced more sharply.

In the time scaling method described above, the phase calculation step performs the plurality of phase switching determination processes using the sum of the normalized amplitude difference values of the left and right stereo channels.

When the left and right stereo channels differ in volume, sound from the same source must be reset in both channels at the same time, or the phases of the two channels drift apart. According to this configuration, the plurality of phase switching determination processes use the sum of the normalized amplitude difference values of the left and right channels, so the phase reset timing of the two channels is synchronized and disturbance of the sound image (localization) can be prevented.
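The synchronization can be sketched by making one shared decision from the summed left and right values instead of two independent decisions. The thresholding rule and the name `stereo_attack` are assumptions for illustration only.

```python
def stereo_attack(nad_left, nad_right, threshold):
    """One shared attack decision for both channels, based on the
    per-bin sum of the left and right normalized amplitude differences."""
    combined = [l + r for l, r in zip(nad_left, nad_right)]
    return sum(max(v, 0.0) for v in combined) >= threshold

# A source panned to the left: strong rise on the left, weak on the right.
left = [0.8, 0.7]
right = [0.1, 0.1]
shared = stereo_attack(left, right, threshold=1.0)
```

Judged separately against the same threshold, the left channel (total 1.5) would be reset and the right channel (total 0.2) would not; the shared decision resets both channels in the same frame.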

In the time scaling method described above, the phase calculation step performs the plurality of phase switching determination processes using both the sum of the normalized amplitude difference values of the left and right stereo channels and the normalized amplitude difference value of each channel individually.

According to this configuration, not only the sum of the normalized amplitude difference values of the left and right stereo channels but also the normalized amplitude difference value of each channel is used, so differences in volume between the channels are also taken into account and disturbance of the sound image can be prevented more reliably.

In the time scaling method described above, when the total of the normalized amplitude difference values is equal to or greater than the predetermined threshold and an attack portion is detected, the phase calculation step performs the first phase calculation process on the low-frequency components only after delaying the timing by a predetermined time.

Because low-frequency sounds have a long period, their phase has not yet stabilized at the phase reset timing detected in the preceding processing, and phase reset processing has little effect. For this reason, performing phase reset processing after a predetermined delay from the detected phase reset timing restores the sense of attack in the low range. This improves the sound quality of the body resonance of a bass drum or the like (the portion where a low frequency follows a high frequency).
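The delayed reset can be sketched as a per-bin schedule in units of analysis frames. The cutoff bin and the delay length are hypothetical parameters; the passage only says that low-frequency components are reset a predetermined time later.

```python
def reset_schedule(attack_frame, n_bins, low_cutoff_bin, delay_frames):
    """Map each bin to the frame index at which its phase is reset.

    Bins below low_cutoff_bin are reset delay_frames later than the rest.
    """
    return {k: attack_frame + (delay_frames if k < low_cutoff_bin else 0)
            for k in range(n_bins)}

schedule = reset_schedule(attack_frame=10, n_bins=6, low_cutoff_bin=2, delay_frames=1)
```

In this example bins 0 and 1 are reset at frame 11 and bins 2 to 5 at frame 10, so the low band is reset once more of its long period has entered the analysis window.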

The pitch shift method of the present invention is characterized in that an audio data processing apparatus executes each step of the FFT-based time scaling method described above together with a sampling rate conversion calculation step that performs time expansion/contraction and pitch change by changing the sampling frequency of the digital audio data, such that the change in time length caused by the steps of the time scaling method and the change in time length caused by the sampling rate conversion calculation step cancel out and only the pitch is changed.

The audio data processing apparatus described above further comprises sampling rate conversion calculation means that performs time expansion/contraction and pitch change by changing the sampling frequency of the digital audio data, and the sampling rate conversion calculation means and/or the time expansion/contraction calculation means cancel the change in time length produced by the respective calculation processing.

In conventional pitch shifting, which uses the FFT and shifts frequencies in the frequency domain, the phase calculated by the FFT must be converted into a different value after the frequency shift. The phase relationship between frequency components that must be preserved to keep the attack is therefore lost, the first phase calculation process cannot be performed correctly, and the loss of attack cannot be prevented. In contrast, the pitch shift method configured as above uses sampling rate conversion and performs no frequency shift in the frequency domain, so in the attack portion the phase calculated by the FFT can be used directly as the phase of the pitch-shifted sound, and the first phase calculation process prevents the loss of attack. In addition, because there are few sources of error from frequency shift processing, degradation of sound quality outside the attack portion is also prevented compared with a conventional FFT method that does not use sampling rate conversion, and high-quality pitch shifting becomes possible.
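The cancellation of the two time-length changes can be checked with simple arithmetic. The sketch assumes the pitch change is specified in equal-tempered semitones, which is not stated in the text, and `pitch_shift_plan` is a hypothetical helper.

```python
def pitch_shift_plan(semitones):
    """Return (pitch_ratio, time_scale, src_length_ratio).

    time_scale       : factor by which FFT time scaling changes the length
    src_length_ratio : factor by which sampling rate conversion changes the
                       length when played back at the original sampling rate
    """
    pitch_ratio = 2.0 ** (semitones / 12.0)
    time_scale = pitch_ratio              # stretch first, pitch unchanged
    src_length_ratio = 1.0 / pitch_ratio  # resample: shorter and higher
    return pitch_ratio, time_scale, src_length_ratio

ratio, stretch, shrink = pitch_shift_plan(12)   # one octave up
net_length = stretch * shrink
```

For one octave up, the audio is stretched to twice its length and then resampled to half as many samples, so `net_length` is 1.0 and only the pitch, raised by a factor of 2, has changed.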

The program of the present invention causes a computer to execute each step of the time scaling method described above.

Another program of the present invention causes a computer to execute each step of the pitch shift method described above.

Using these programs, a time scaling method and a pitch shift method can be realized that convert sound with high quality using the FFT method regardless of the type of musical sound, such as melody sounds or rhythm sounds.

FIG. 1 is a simplified block diagram of a playback device according to the first embodiment and of the audio data processing unit that forms part of it.
A block diagram of the audio data processing unit.
A flowchart showing pitch shift processing by the audio data processing unit.
A flowchart showing phase calculation processing according to the first embodiment.
A flowchart showing phase calculation processing according to the second embodiment.
A flowchart showing phase calculation processing according to the third embodiment.

Hereinafter, a time scaling method, a pitch shift method, an audio data processing apparatus, and a program according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. The present embodiment illustrates the case where the audio data processing apparatus of the present invention is applied to a playback device such as a CD player.

[First Embodiment]
FIG. 1A is a simplified block diagram of the playback device 1. As shown in the figure, the playback device 1 includes a playback unit 2, an audio data processing unit 3 (audio data processing apparatus), a buffer memory 4, and an audio data output unit 5. The playback unit 2 reads music and musical sounds from a medium such as a CD and plays them back. The main part of the audio data processing unit 3 is constituted by a CPU (Central Processing Unit) or a DSP (Digital Signal Processor). It stores the digital audio data played back by the playback unit 2 (hereinafter simply "audio data") in the buffer memory 4 and applies digital signal processing to the audio data read from the buffer memory 4. The buffer memory 4 consists of a buffer memory for input (hereinafter "input buffer 4a") and a buffer memory for output (hereinafter "output buffer 4b"). The audio data output unit 5 outputs the audio data processed by the audio data processing unit 3 (the audio data read from the output buffer 4b) to an external destination, such as an output device having an amplifier and speakers.

FIG. 1B is a block diagram showing an example of the audio data processing unit 3. The audio data processing unit 3 in FIG. 1B includes a time scaling unit 11 as its main functional component. The time scaling unit 11 acquires the audio data to be processed from the buffer memory 4 (input buffer 4a) and performs time scaling (time expansion/contraction conversion processing). In the present embodiment, time scaling is performed using the FFT method.

FIG. 1C, on the other hand, is a block diagram showing another example of the audio data processing unit 3. The audio data processing unit 3 in FIG. 1C includes an SRC unit 12 (sampling rate conversion calculation unit) and the time scaling unit 11 as its main functional components. That is, it has a configuration in which the SRC unit 12 is added to the audio data processing unit 3 of FIG. 1B.

The SRC unit 12 performs SRC processing, which changes the sampling frequency of the audio data, before or after the time scaling by the time scaling unit 11 (sampling rate conversion calculation step). SRC processing is originally a technique for changing the sampling period of digital audio data. However, when the sample data newly obtained by SRC processing is played back at the original sampling frequency, both the time length and the pitch change. The audio data processing unit 3 in FIG. 1C therefore makes the change in time length caused by the SRC unit 12 and the change caused by the time scaling unit 11 cancel each other, which realizes a pitch shift that changes only the pitch without changing the length on the time axis. The following describes a method of performing a pitch shift with the audio data processing unit 3 shown in FIG. 1C.
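The cancellation between the SRC step and the time scaling step can be illustrated with a small calculation. This is only a sketch: the function name `pitch_shift_ratios` and the equal-temperament relation between semitones and pitch ratio are assumptions introduced for illustration and do not appear in the source.

```python
def pitch_shift_ratios(semitones):
    """Return (src_length_ratio, time_stretch_ratio) for a pitch shift.

    Assumption: the pitch ratio follows equal temperament, 2 ** (semitones / 12).
    """
    pitch_ratio = 2.0 ** (semitones / 12.0)
    # Resampling to 1/pitch_ratio of the samples and playing back at the
    # original rate raises the pitch by pitch_ratio and scales the
    # duration by 1/pitch_ratio.
    src_length_ratio = 1.0 / pitch_ratio
    # Time scaling must stretch by pitch_ratio to restore the duration.
    time_stretch_ratio = pitch_ratio
    return src_length_ratio, time_stretch_ratio


src_ratio, stretch_ratio = pitch_shift_ratios(12)  # one octave up
net_length_ratio = src_ratio * stretch_ratio       # overall duration change
```

For a shift of one octave up, the SRC step halves the length and the time scaling step doubles it, so the net length ratio is 1.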

FIG. 2 is a block diagram showing the detailed functional configuration of the audio data processing unit 3. As described above, the audio data processing unit 3 consists of the SRC unit 12 and the time scaling unit 11. In the present embodiment, SRC processing is performed first, followed by time scaling. The SRC unit 12 performs SRC processing on the audio data that serves as the original sound.

The time scaling unit 11, in turn, consists of an FFT unit 21, a phase calculation unit 22, an inverse FFT unit 23, and a time expansion/contraction calculation unit 24. The FFT unit 21 converts the audio data into an amplitude and a phase for each frequency component (frequency conversion step). That is, it converts the time-domain sound into the frequency domain and obtains the amplitude and phase.

The phase calculation unit 22 performs phase calculation processing according to the calculated time change rate of the amplitude (phase calculation step). Specifically, using a normalized amplitude difference value obtained by dividing the time change rate of the amplitude by the amplitude, it performs a plurality of phase switching determination processes, each applying a different criterion, and performs the phase calculation processing that corresponds to the determination result. These phase switching determination processes serve to detect an attack portion. As shown in the figure, the phase calculation unit 22 includes an attack detection unit 31, a phase reset unit 32, and a phase continuation processing unit 33. The attack detection unit 31 further consists of an all frequency band detection unit 31a, a frequency band detection unit (A) 31b, and a frequency band detection unit (B) 31c.

The all frequency band detection unit 31a performs a first phase switching determination process, one of the plurality of phase switching determination processes described above, to determine whether the total of the normalized amplitude difference values is equal to or greater than a predetermined threshold L1 (hereinafter referred to as the "high threshold"). If the total was less than the high threshold in the previous calculation and is equal to or greater than the high threshold in the current calculation, the unit determines that it has detected an attack portion that requires reset processing for all frequency bands. A specific example is the detection of the striking sound of a low-pitched percussion instrument such as a bass drum. The attack portion of a low-pitched percussion instrument contains frequency components ranging from the low fundamental that characterizes the pitch of the instrument up to a fairly high range, so a phase reset covering almost the entire frequency band is required.

The frequency band detection unit (A) 31b performs a second phase switching determination process when the first phase switching determination process finds that the total of the normalized amplitude difference values is less than the high threshold and equal to or greater than a predetermined threshold L2 (hereinafter referred to as the "low threshold"), where L1 > L2. The second phase switching determination process binarizes the normalized amplitude difference value of each frequency component with a low threshold and, restricted to the high range, detects an attack portion for each frequency component. This makes it possible to detect striking sounds in the middle to high range.

Furthermore, the frequency band detection unit (B) 31c performs a third phase switching determination process when the first phase switching determination process finds that the total of the normalized amplitude difference values is less than the low threshold. The third phase switching determination process binarizes the normalized amplitude difference value of each frequency component with a high threshold and detects the presence or absence of an attack portion for each frequency component. This makes it possible to detect attacks produced by vocals, stringed instruments, and the like.

When an attack portion is detected by the all frequency band detection unit 31a, the frequency band detection unit (A) 31b, or the frequency band detection unit (B) 31c, the phase reset unit 32 performs phase reset processing (first phase calculation processing), in which the phase of each frequency component is taken to be the calculation result of the FFT unit 21 itself. When no attack portion is detected by these detection units, the phase continuation processing unit 33 performs phase continuation processing (second phase calculation processing), in which the phase of each frequency component (frequency grid point) is taken to have changed continuously, allowing for the time expansion/contraction, from the previous calculation result of the FFT unit 21. In other words, the phase continuation processing unit 33 corrects the phase by the amount of the time expansion/contraction and calculates each frequency component so that its phase is continuous with the previous calculation result. In this way, the phase calculation unit 22 of the present embodiment selectively executes either the phase reset processing or the phase continuation processing according to the total of the normalized amplitude difference values and the value for each individual frequency component.
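The choice between the two phase calculations can be sketched per frequency bin as follows. The source does not give a formula for the continuation step, so this sketch substitutes the conventional phase-vocoder phase propagation as an assumption. The names `princarg`, `update_phase`, `hop_in`, and `stretch` are hypothetical.

```python
import math


def princarg(p):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (p + math.pi) % (2.0 * math.pi) - math.pi


def update_phase(j, n_fft, hop_in, stretch,
                 phase_now, phase_prev_in, phase_prev_out, attack):
    """Return the output phase of bin j for the current frame.

    attack=True  -> phase reset: use the FFT phase as it is.
    attack=False -> phase continuation (assumed phase-vocoder update):
                    advance the previous output phase by the measured
                    phase advance scaled by the stretch ratio.
    """
    if attack:
        return phase_now
    expected = 2.0 * math.pi * j / n_fft * hop_in   # nominal advance per hop
    deviation = princarg(phase_now - phase_prev_in - expected)
    true_advance = expected + deviation             # unwrapped advance
    return princarg(phase_prev_out + stretch * true_advance)
```

With `stretch` equal to 1 and identical input and output histories, the continuation branch reproduces the FFT phase, so the two branches differ only when the time axis is actually being stretched or compressed.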

The inverse FFT unit 23 converts each frequency component, after the phase calculation processing by the phase calculation unit 22, back into audio data (inverse frequency conversion step). That is, it converts the frequency-domain amplitude and phase into time-domain sound.

The time expansion/contraction calculation unit 24 increases or decreases the number of data items in proportion to the time expansion/contraction rate during the inverse frequency conversion by the inverse FFT unit 23 (time expansion/contraction calculation step). Specifically, it expands or contracts the time axis so as to cancel the change in the time length of the audio data caused by the SRC unit 12. The expansion or contraction is realized by shifting the time-domain audio data calculated by the inverse FFT unit 23 by a number of samples obtained by scaling the number of samples shifted at FFT time in proportion to the amount of time expansion/contraction. The audio data processed by the time expansion/contraction calculation unit 24 is output as the converted sound.

In the case of stereo playback, each unit of the present embodiment (the SRC unit 12, FFT unit 21, phase calculation unit 22, inverse FFT unit 23, and time expansion/contraction calculation unit 24) processes the left and right channels independently.

Next, the flow of the pitch shift processing according to the first embodiment will be described with reference to the flowcharts of FIGS. 3 and 4. First, the audio data processing unit 3 performs initialization (setting the FFT calculation count i to 1; S01) and acquires audio data from the input buffer 4a (S02). The SRC unit 12 then performs SRC processing (S03), after which the time scaling from S04 onward begins.

In the time scaling, the data is first multiplied by an input window function (Hanning window function) (S04), and the i-th FFT is performed (S05). The frequency component index, that is, the FFT frequency grid number j, is then set to j = 0 (S06), and the phase and amplitude are calculated (S07). S03 to S07 above are the processing steps of the FFT unit 21.
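Steps S04 to S07 can be sketched as follows. To stay self-contained, this sketch uses a direct DFT from the Python standard library instead of a fast FFT routine, so it is only suitable for small frames. The function names `hann` and `analyze_frame` are hypothetical, and the periodic form of the Hann window is an assumption.

```python
import cmath
import math


def hann(n_fft):
    """Periodic Hann (Hanning) window of length n_fft (assumed form)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n_fft)
            for k in range(n_fft)]


def analyze_frame(frame):
    """Window one frame and return (amplitudes, phases) for bins 0..n/2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]          # S04
    amps, phases = [], []
    for j in range(n // 2 + 1):                                 # S06, S10, S11
        c = sum(windowed[k] * cmath.exp(-2j * math.pi * j * k / n)
                for k in range(n))                              # S05 (direct DFT)
        amps.append(abs(c))                                     # S07: amplitude
        phases.append(cmath.phase(c))                           # S07: phase
    return amps, phases


# A cosine that completes exactly 4 cycles in a 32-sample frame.
frame = [math.cos(2.0 * math.pi * 4 * k / 32) for k in range(32)]
amps, phases = analyze_frame(frame)
peak_bin = amps.index(max(amps))
```

Only bins 0 through n/2 are computed, matching the loop bound `j = n_FFT / 2` in the flowchart; the remaining bins of a real signal are redundant.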

Next, the audio data processing unit 3 performs phase calculation processing with the phase calculation unit 22 (S08). This phase calculation processing is described later with reference to FIG. 4. When the phase calculation processing is finished, the audio data processing unit 3 converts the amplitude and phase into a complex number (S09) and determines whether the FFT frequency grid number j has reached half of the FFT sample count n_FFT, that is, whether j = n_FFT/2 (S10). If j = n_FFT/2 has not been reached (S10: No), the FFT frequency grid number j is incremented (S11), and the process returns to S07. If j = n_FFT/2 has been reached (S10: Yes), the IFFT is performed using the complex conjugates of the complex data as the complex data of the remaining half, the negative frequency components (S12). S09 to S12 above are the processing steps of the inverse FFT unit 23.
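Steps S09 and S12 can be illustrated by rebuilding a full spectrum from the bins 0 to n_FFT/2 and inverting it. As a sketch under stated assumptions, a direct inverse DFT from the standard library stands in for the IFFT, and the function name `synthesize_frame` is hypothetical.

```python
import cmath
import math


def synthesize_frame(amps, phases):
    """Rebuild a real time-domain frame from bins 0..n/2.

    amps and phases each hold n/2 + 1 values; the result has n samples.
    """
    half = len(amps) - 1
    n = 2 * half
    # S09: combine amplitude and phase into complex numbers.
    spec = [cmath.rect(a, p) for a, p in zip(amps, phases)]
    # S12: bins n/2+1 .. n-1 (negative frequencies) are the complex
    # conjugates of bins n/2-1 .. 1.
    full = spec + [spec[n - j].conjugate() for j in range(half + 1, n)]
    # Direct inverse DFT; the imaginary part vanishes up to rounding.
    return [sum(full[j] * cmath.exp(2j * math.pi * j * k / n)
                for j in range(n)).real / n
            for k in range(n)]
```

Filling the negative-frequency half with conjugates is what guarantees that the inverse transform yields real-valued audio samples.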

Next, the audio data processing unit 3 multiplies the data by an output window function (Hanning window function) (S13) and, in order to cancel the SRC ratio, moves the output pointer by the input overlap count multiplied by the time stretch ratio (S14). It then writes the result into the output buffer 4b (adding it to the existing contents of the output buffer 4b; S15) and outputs it as the converted sound. S13 to S15 above are the processing steps of the time expansion/contraction calculation unit 24. In this example the output window function is the same Hanning window used before the FFT, but the two need not be the same: a different window function may be chosen, and the output window function may even be omitted if the window function applied before the FFT is suitable.
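The pointer movement and accumulation in S14 and S15 amount to an overlap-add with an output hop scaled by the stretch ratio. A minimal sketch follows, assuming the frames have already been windowed (S13) and that "input overlap count" means the input hop size in samples. The names `overlap_add`, `hop_in`, and `stretch` are hypothetical.

```python
def overlap_add(frames, hop_in, stretch):
    """Overlap-add equally sized frames with an output hop of hop_in * stretch."""
    hop_out = int(round(hop_in * stretch))       # S14: scaled pointer step
    n_fft = len(frames[0])
    out = [0.0] * (hop_out * (len(frames) - 1) + n_fft)
    pos = 0
    for frame in frames:
        for k, sample in enumerate(frame):
            out[pos + k] += sample               # S15: add into output buffer
        pos += hop_out                           # S14: move output pointer
    return out


frames = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
same_length = overlap_add(frames, hop_in=2, stretch=1.0)
stretched = overlap_add(frames, hop_in=2, stretch=1.5)
```

With a stretch ratio of 1.5 the second frame lands 3 samples after the first instead of 2, so the same two frames produce 7 output samples instead of 6.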

After that, the audio data processing unit 3 moves the input pointer by the input overlap count (S16) and determines whether audio data remains in the input buffer 4a (S17). If audio data remains (S17: data present), the FFT calculation count i is incremented (S18), and the process returns to S02. If no audio data remains (S17: no data), the pitch shift processing ends.

Next, the phase calculation processing corresponding to S08 in FIG. 3 will be described with reference to FIG. 4. The audio data processing unit 3 (phase calculation unit 22) first calculates the amplitude difference (S21) and obtains the normalized amplitude difference value (S22). That is, the normalized amplitude difference value is obtained by further dividing the time change rate of the amplitude by the amplitude. However, when the amplitude is 0 or extremely small, the division is either impossible or may give an inappropriate result, so as exception handling the normalized amplitude difference value is also set to 0. It is then determined whether the total of the i-th normalized amplitude difference values (shown as "Σi" in FIG. 4) is equal to or greater than the high threshold, equal to or greater than the low threshold and less than the high threshold, or less than the low threshold (S23, first phase switching determination process).
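Steps S21 and S22, including the exception handling for very small amplitudes, can be sketched as follows. The source does not state whether the divisor is the current or the previous amplitude, nor the size of the "very small" cutoff. Dividing by the current amplitude and the default `eps` value are assumptions, and `normalized_amplitude_diff` is a hypothetical name.

```python
def normalized_amplitude_diff(amps_now, amps_prev, eps=1e-9):
    """Per-bin amplitude difference divided by the current amplitude.

    Bins whose amplitude is zero or below eps yield 0.0 (exception handling).
    """
    result = []
    for a_now, a_prev in zip(amps_now, amps_prev):
        if a_now < eps:
            result.append(0.0)                       # avoid dividing by ~0
        else:
            result.append((a_now - a_prev) / a_now)  # S21 then S22
    return result


nad = normalized_amplitude_diff([2.0, 0.0, 1.0], [1.0, 1.0, 1.0])
total = sum(nad)  # the value compared against the thresholds in S23
```

Because the difference is divided by the amplitude, a doubling of amplitude gives the same value whether the passage is loud or quiet, which is why the measure does not depend on the playback level.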

When the total Σi of the i-th normalized amplitude difference values is equal to or greater than the high threshold (S23: high threshold or more), it is determined whether the total Σi-1 of the (i-1)-th normalized amplitude difference values was equal to or greater than the high threshold (S24). If it was not (S24: No), phase reset processing is performed for all frequency bands (S30). If the total Σi-1 was equal to or greater than the high threshold (S24: Yes), phase continuation processing is performed (S31). In other words, the all frequency band detection unit 31a determines that an attack portion has been detected when the binarized result of the (i-1)-th calculation is 0 and the binarized result of the i-th calculation is 1, and the phase reset unit 32 resets the phase by taking the phase of each frequency component to be the calculation result of the FFT unit 21 itself. If no attack portion is detected, the phase continuation processing unit 33 performs phase continuation processing, in which the phase of each frequency component is taken to have changed continuously, allowing for the time expansion/contraction, from the previous calculation result of the FFT unit 21.

When the total of the normalized amplitude difference values is equal to or greater than the low threshold and less than the high threshold (S23: low threshold or more and less than high threshold), the normalized amplitude difference value of each frequency component is binarized with the low threshold (S25), the result is restricted to the high range (S26), and it is determined whether a per-frequency reset (A) is required (S27, second phase switching determination process). If the per-frequency reset (A) is determined to be required (S27: Yes), phase reset processing is performed for each frequency component (S30). If it is determined not to be required (S27: No), phase continuation processing is performed (S31). In other words, the frequency band detection unit (A) 31b determines that an attack portion has been detected when the binarized result of the (i-1)-th calculation is 0 and the binarized result of the i-th calculation is 1, and the phase reset unit 32 performs phase reset processing. If no attack portion is detected, the phase continuation processing unit 33 performs phase continuation processing.

Furthermore, when the total of the normalized amplitude difference values is less than the low threshold (S23: less than low threshold), the normalized amplitude difference value of each frequency component is binarized with the high threshold (S28), and it is determined whether a per-frequency reset (B) is required (S29, third phase switching determination process). If the per-frequency reset (B) is determined to be required (S29: Yes), phase reset processing is performed (S30). If it is determined not to be required (S29: No), phase continuation processing is performed (S31). In other words, the frequency band detection unit (B) 31c determines that an attack portion has been detected when the binarized result of the (i-1)-th calculation is 0 and the binarized result of the i-th calculation is 1, and the phase reset unit 32 performs phase reset processing. If no attack portion is detected, the phase continuation processing unit 33 performs phase continuation processing. The phase calculation processing ends when the phase reset processing (S30) or the phase continuation processing (S31) is completed.
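The three-way branching of S23 to S31 can be summarized in a single function that returns, per frequency bin, whether the phase should be reset. This is a sketch under several assumptions: the source does not say whether the per-bin binarization thresholds equal the thresholds applied to the total, so they are separate parameters here (`bin_low`, `bin_high`). The boundary of the "high range" (`high_band_start`) and the function name `select_reset_bins` are also hypothetical.

```python
def select_reset_bins(nad_now, nad_prev, total_prev,
                      high, low, bin_low, bin_high, high_band_start):
    """Return a list of booleans: True = phase reset, False = continuation."""
    n = len(nad_now)
    total = sum(nad_now)
    if total >= high:
        # S24: reset every bin only on the rising edge of the total.
        return [total_prev < high] * n
    if total >= low:
        # S25 to S27: per-bin rising edge against the low threshold,
        # restricted to the high range.
        return [j >= high_band_start
                and nad_now[j] >= bin_low and nad_prev[j] < bin_low
                for j in range(n)]
    # S28 and S29: per-bin rising edge against the high threshold.
    return [nad_now[j] >= bin_high and nad_prev[j] < bin_high
            for j in range(n)]
```

For instance, with `high=1.0` and `low=0.5`, a frame whose total is 0.6 takes the middle branch, and only high-range bins that newly exceed `bin_low` are reset while all other bins continue their phase.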

As described above, according to the present embodiment, the presence or absence of an attack portion is detected from the normalized amplitude difference value, and depending on the detection result, either the first phase calculation processing, which resets the phase using the FFT calculation result itself, or the second phase calculation processing, which makes the phase continuous with the previous FFT calculation result while allowing for the time expansion/contraction, is performed. This enables high-quality sound conversion. In particular, when an attack portion is detected, the first phase calculation processing reproduces the attack, which prevents the sense of attack from being lost. As a result, even with the FFT method, high-quality time scaling and pitch shifting can be realized for musical sounds with steep attacks, such as rhythm sounds.

In addition, because the attack portion is detected using the normalized amplitude difference value, obtained by dividing the time change rate of the amplitude by the amplitude, the attack portion can be detected accurately and reliably even when the volume of the original sound is low.

Furthermore, because three phase switching determination processes covering different frequency bands are performed using the normalized amplitude difference value, the attack portion can be detected more reliably. For example, a total of the normalized amplitude difference values that is equal to or greater than the high threshold means that the frequency components requiring a phase reset are spread over a wide range, so performing the first phase calculation processing on all frequency bands handles the attack portion reliably. When the total is less than the high threshold, the presence or absence of an attack portion is detected for each frequency band, so even a subtle attack can be detected reliably.

In addition, because the SRC unit 12 uses the sampling rate conversion method, no frequency shift is performed in the frequency domain, and the phase calculated by the FFT can be used as it is as the phase of the pitch-shifted sound in the attack portion. The first phase calculation processing can therefore prevent the sense of attack from being lost. Moreover, because there are fewer sources of error from frequency shift processing, degradation of sound quality outside the attack portion is also prevented compared with a conventional FFT method that does not use the sampling rate conversion method, and a high-quality pitch shift becomes possible.

[Second Embodiment]
Next, a second embodiment of the present invention will be described with reference to FIG. 5. In the first embodiment, the attack portion is detected based on the normalized amplitude difference value obtained from the time change rate of the amplitude. The present embodiment differs in that the attack portion is detected based on a phase fault degree obtained from the time change rate of the phase. Only the differences from the first embodiment are described below. In the present embodiment, components that are the same as in the first embodiment are given the same reference numerals, and their detailed description is omitted. Modifications applicable to components shared with the first embodiment apply equally to the present embodiment.

The audio data processing unit 3 of the present embodiment has the functional configuration of the first embodiment shown in FIG. 2 with the all frequency band detection unit 31a, the frequency band detection unit (A) 31b, and the frequency band detection unit (B) 31c omitted (not illustrated). With this configuration, the phase calculation unit 22 of the present embodiment performs phase calculation processing according to the calculated time change rate of the phase (phase calculation step). Specifically, the attack detection unit 31 uses the phase fault degree, which is the time change rate of the phase, to determine whether a phase fault has occurred, and phase calculation processing is performed according to the result. That is, if a phase fault has occurred, an attack portion is judged to be present, and the phase reset unit 32 performs phase reset processing. If no phase fault has occurred, an attack portion is judged to be absent, and the phase continuation processing unit 33 performs phase continuation processing.

FIG. 5 is a flowchart showing the phase calculation processing according to the second embodiment. The main flow of the pitch shift processing shown in FIG. 3 is the same as in the first embodiment and is therefore not illustrated again. The audio data processing unit 3 (phase calculation unit 22) of the present embodiment first calculates the second-order difference of the phase (S41) and calculates the phase fault degree (S42). The presence or absence of a phase fault (that is, of an attack portion) is then determined according to whether the phase fault degree is equal to or greater than a predetermined threshold (S43). If the phase fault degree is equal to or greater than the threshold (S43: present), the phase reset unit 32 performs phase reset processing, taking the phase of each frequency component to be the calculation result of the FFT unit 21 itself (S44). If the phase fault degree is less than the threshold (S43: absent), the phase continuation processing unit 33 performs phase continuation processing, in which the phase of each frequency component is taken to have changed continuously, allowing for the time expansion/contraction, from the previous calculation result of the FFT unit 21 (S45). The phase calculation processing ends when the phase reset processing (S44) or the phase continuation processing (S45) is completed.
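Steps S41 to S43 can be sketched for a single frequency bin as follows. The source does not define how the phase fault degree is derived from the second-order difference. Taking the absolute value of the wrapped second-order difference is an assumption, and the names `princarg`, `phase_fault_degree`, and `has_phase_fault` are hypothetical.

```python
import math


def princarg(p):
    """Wrap a phase value into the interval [-pi, pi)."""
    return (p + math.pi) % (2.0 * math.pi) - math.pi


def phase_fault_degree(phase_now, phase_prev, phase_prev2):
    """Absolute wrapped second-order phase difference over three frames."""
    second_diff = phase_now - 2.0 * phase_prev + phase_prev2   # S41
    return abs(princarg(second_diff))                          # S42 (assumed)


def has_phase_fault(phase_now, phase_prev, phase_prev2, threshold):
    """S43: True selects phase reset, False selects phase continuation."""
    return phase_fault_degree(phase_now, phase_prev, phase_prev2) >= threshold


steady = has_phase_fault(1.0, 0.5, 0.0, threshold=1.0)   # constant advance
jumped = has_phase_fault(2.5, 0.5, 0.0, threshold=1.0)   # sudden extra advance
```

A stationary sinusoid advances its phase by the same amount every frame, so its second-order difference is zero. A sudden onset breaks that regularity regardless of how loud the sound is.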

As described above, according to the present embodiment, using the time change rate of the phase allows the attack portion to be detected accurately regardless of the volume of the original sound. In addition, because the processing only needs to branch into two paths depending on the presence or absence of a phase fault, high-quality time scaling and pitch shifting can be realized with a small amount of computation.

In the above example, the audio data processing unit 3 according to the second embodiment was described as having the functional configuration of the first embodiment shown in FIG. 2 with the all frequency band detection unit 31a, the frequency band detection unit (A) 31b, and the frequency band detection unit (B) 31c omitted. However, it may instead have the same functional configuration as the first embodiment. In that case, the all frequency band detection unit 31a, the frequency band detection unit (A) 31b, and the frequency band detection unit (B) 31c use the phase fault degree to perform a plurality of phase switching determination processes, each applying a different criterion. These processes determine whether a phase fault has occurred, and phase calculation processing is performed according to the result. The flow of the phase calculation processing in this variation is the same as that of the first embodiment shown in FIG. 4, with the "amplitude difference" in S21 replaced by "phase difference" and the "normalized amplitude difference value" in S22 replaced by "phase fault degree", and is therefore not illustrated.

The attack portion may also be detected by combining the detection method of the first embodiment, which uses the normalized amplitude difference value, with the detection method of the second embodiment, which uses the phase discontinuity degree. This configuration further improves the accuracy of attack detection.

[Third Embodiment]
Next, a third embodiment of the present invention will be described with reference to FIG. 6. This embodiment is characterized by a finer classification of components into those that should undergo phase reset processing and those that should undergo phase continuation processing. In this embodiment, the attack portion is detected using the normalized amplitude difference value. As before, components identical to those of the preceding embodiments are given the same reference numerals and their detailed description is omitted. Modifications applicable to those components in the preceding embodiments apply equally to this embodiment.

FIG. 6 is a flowchart showing the phase calculation processing according to the third embodiment. The main flow of the pitch shift processing shown in FIG. 3 is substantially the same as in the first embodiment, so only the differences are described. The audio data processing unit 3 (phase calculation unit 22) of this embodiment first sets the FFT frequency grid number j to j = 1 (S51) and determines whether the delay reset counter has reached 0 (S52). The delay reset counter is provided in the audio data processing unit 3 to measure timing: when the total value Σi of the normalized amplitude difference values is equal to or greater than the high threshold (S53: Yes) and an attack portion is detected (S54: Yes), phase reset processing is performed for the low-frequency components only, delayed by a predetermined time (S75). The delay reset counter is set in S55.

If S52: No, it is determined whether the total value Σi of the normalized amplitude difference values is equal to or greater than the high threshold (S53). S53 and S64, described later, correspond to S23 (the first phase switching determination process) in FIG. 4. In this embodiment, the high threshold used in this determination is denoted P0. If Σi is equal to or greater than the high threshold (S53: Yes), it is determined whether the total value Σi-1 of the normalized amplitude difference values in the (i-1)-th calculation was equal to or greater than the high threshold (S54). If Σi-1 was less than the high threshold (S54: No; that is, the binarized result of the (i-1)-th calculation is 0 and that of the i-th calculation is 1), the all-frequency-band detection unit 31a is deemed to have detected an attack portion. If Σi-1 was equal to or greater than the high threshold (S54: Yes), the process proceeds to S69.

If the total value Σi-1 of the normalized amplitude difference values in the (i-1)-th calculation was less than the high threshold (S54: No), the delay reset counter is set to N, where N is any integer satisfying N ≥ 1 (S55). Then, for the high-frequency range only (S56), the continuing sound determination of S57 is made, and phase reset processing is performed as necessary (S58).
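The rising-edge test of S53 and S54 and the arming of the counter in S55 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `detect_attack` and `on_frame`, and the choice to return `None` when no new attack is found, are assumptions introduced here.

```python
def detect_attack(total_now, total_prev, p0):
    """Return True on a rising edge of the binarized total value.

    total_now  : total of normalized amplitude difference values, frame i
    total_prev : the same total for frame i-1
    p0         : high threshold P0 (its value is chosen by the caller)
    """
    now_bit = 1 if total_now >= p0 else 0    # S53: binarize frame i
    prev_bit = 1 if total_prev >= p0 else 0  # S54: binarize frame i-1
    return now_bit == 1 and prev_bit == 0


def on_frame(total_now, total_prev, p0, n_delay):
    """Return the value to load into the delay reset counter, or None.

    n_delay corresponds to N (N >= 1) in S55. Returning None when no new
    attack is detected is an assumption of this sketch.
    """
    if n_delay < 1:
        raise ValueError("N must be an integer >= 1")
    if detect_attack(total_now, total_prev, p0):
        return n_delay
    return None
```

A total that stays above the threshold for several frames triggers only once, on the first frame that crosses it.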

The continuing sound determination in S57 is as follows. This step determines whether the target frequency component is a continuing component (continuing sound), meaning a frequency peak that persists over time, or a side lobe component of such a frequency peak. A component is judged to be a "continuing sound" when it has been identified as a frequency peak in at least a predetermined number of processing iterations (for example, two or more). A "frequency peak" is a frequency at which the spectrum obtained by the FFT has a local maximum, and a "side lobe component" is the frequency band on either side of a frequency peak in which the magnitude tapers off gradually. Thus, in this embodiment, even when an attack portion is detected, components that are continuing sounds or their side lobes are excluded from phase reset processing, which makes sounds that continue across the attack portion less likely to be interrupted. Accordingly, phase reset processing is performed if S57: No (S58), and phase continuation processing is performed if S57: Yes (S70).
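One way to picture the continuing sound determination is to keep, for each FFT bin, a count of consecutive frames in which that bin was a local spectral maximum. The sketch below assumes a fixed side lobe width in bins (`sidelobe_width`) and a strict three-point peak test; the description specifies neither, so both are illustrative choices, as are all function names.

```python
def local_peaks(mag):
    """Indices of bins whose magnitude exceeds both neighbours."""
    return {j for j in range(1, len(mag) - 1)
            if mag[j] > mag[j - 1] and mag[j] > mag[j + 1]}


def update_peak_counts(counts, mag):
    """Per-bin count of consecutive frames in which the bin was a peak."""
    peaks = local_peaks(mag)
    return [counts[j] + 1 if j in peaks else 0 for j in range(len(mag))]


def is_continuing(j, counts, min_frames=2, sidelobe_width=1):
    """True if bin j is a continuing peak or lies within its side lobes.

    min_frames follows the "two or more" example in the description;
    sidelobe_width is an assumption of this sketch.
    """
    lo = max(0, j - sidelobe_width)
    hi = min(len(counts) - 1, j + sidelobe_width)
    return any(counts[k] >= min_frames for k in range(lo, hi + 1))
```

Bins for which `is_continuing` returns True would take the phase continuation path (S70) even in an attack frame; all others would be phase reset (S58).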

On the other hand, if S53: No (the total value Σi of the normalized amplitude difference values is less than the high threshold), two stereo determinations are made, in S59 and S60. A stereo determination makes a phase switching determination (corresponding to part of the plurality of phase switching determination processes) using both the sum of the normalized amplitude difference values for the left and right stereo channels (hereinafter "L, R") and the normalized amplitude difference value of each channel individually. This synchronizes the phase reset timing of the left and right channels and prevents disturbance of the sound image (localization).

Specifically, S59 yields Yes when the sum (ΣiL + ΣiR) of the L and R total values of the normalized amplitude difference values is greater than a predetermined threshold P1, the L total value ΣiL is greater than a predetermined threshold P1L, and the R total value ΣiR is greater than a predetermined threshold P1R. Preferably, P1 > P0 and P1L = P1R < P0. In other words, even when the total value Σi of neither L nor R satisfies Σi > P0, the all-frequency-band detection unit 31a detects an attack portion for both L and R if the combined sum (ΣiL + ΣiR) is large and each individual total (ΣiL, ΣiR) exceeds a minimum threshold (P1L = P1R). In this way, S59 brings L and R into step for the determination in S54.

Similarly, S60 yields Yes when the sum (ΣiL + ΣiR) of the L and R total values of the normalized amplitude difference values is greater than a predetermined threshold P2, the L total value ΣiL is greater than a predetermined threshold P2L, and the R total value ΣiR is greater than a predetermined threshold P2R. Preferably, P2 < P0 and P2L = P2R > P1L = P1R. In other words, when the combined sum (ΣiL + ΣiR) exceeds a certain threshold (P2) and there is no large difference between the L and R totals, the normalized amplitude difference value of each frequency component of L or R is binarized with the low threshold (S61), and, for the high-frequency range only (S62), it is determined whether frequency-specific reset (A) is required (S63, the second phase switching determination process). If S60: No, it is determined whether the total value Σi of the normalized amplitude difference values of each of L and R satisfies Σi ≥ P3 (S64). Here, the predetermined threshold P3 satisfies P3 < P0; S64 thus determines whether the total value of the normalized amplitude difference values is equal to or greater than the low threshold (P3). In this way, S60 brings L and R into step for the determination in S64.
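S59 and S60 share one form: a combined-sum test plus a per-channel test, applied with different threshold sets. The following sketch shows that shared form. The numeric thresholds are hypothetical values chosen only to satisfy the preferred orderings (P1 > P0, P1L = P1R < P0, P2 < P0, P2L = P2R > P1L = P1R); the description gives no concrete values.

```python
def stereo_check(sum_l, sum_r, p_total, p_l, p_r):
    """Common form of S59 and S60: all three comparisons must hold."""
    return (sum_l + sum_r) > p_total and sum_l > p_l and sum_r > p_r


# Hypothetical thresholds, for illustration only.
P0 = 10.0
S59 = dict(p_total=12.0, p_l=3.0, p_r=3.0)  # P1 > P0, P1L = P1R < P0
S60 = dict(p_total=8.0, p_l=4.0, p_r=4.0)   # P2 < P0, P2L = P2R > P1L


def stereo_decision(sum_l, sum_r):
    """Return which stereo determination, if any, yields Yes."""
    if stereo_check(sum_l, sum_r, **S59):
        return "S59"
    if stereo_check(sum_l, sum_r, **S60):
        return "S60"
    return None
```

With these values, a loud left channel paired with a moderate right channel (9 and 4) passes S59 although neither channel alone reaches P0, while a strongly unbalanced pair (7 and 2) fails both tests because the right channel stays under its per-channel floor.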

Next, if S64: Yes (the total value of the normalized amplitude difference values is equal to or greater than the low threshold), the normalized amplitude difference value of each frequency component is binarized with the low threshold (S65), and, for the high-frequency range only (S62), it is determined whether frequency-specific reset (A) is required (S63). If S63 determines that frequency-specific reset (A) is required (the binarized result of the (i-1)-th calculation is 0 and that of the i-th calculation is 1) (S63: Yes), the process proceeds to S57 and phase reset processing is performed as necessary. If frequency-specific reset (A) is determined to be unnecessary (S63: No), the process proceeds to S69 and phase continuation processing is performed as necessary.

Further, if S64: No (the total value of the normalized amplitude difference values is less than the low threshold), the normalized amplitude difference value of each frequency component is binarized with the high threshold (S66), and, for the high-frequency range only (S67), it is determined whether frequency-specific reset (B) is required (S68, the third phase switching determination process). Narrowing the frequency band to the high range before determining whether frequency-specific reset (B) is required and performing the phase calculation processing improves sound quality. If S68 determines that frequency-specific reset (B) is required (the binarized result of the (i-1)-th calculation is 0 and that of the i-th calculation is 1) (S68: Yes), the process proceeds to S57. If frequency-specific reset (B) is determined to be unnecessary (S68: No), the process proceeds to S69, where the reset sound side lobe determination is made.
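The two frequency-specific branches differ only in the threshold used for binarization: the low threshold when the frame total is at least P3, the high threshold otherwise. A compact sketch follows, under the assumptions that binarization uses ">=" per bin, that the previous frame is binarized with the same threshold, and that the high-frequency range is every bin at or above an index `first_high_bin`. None of these details is fixed by the description, and the function name is hypothetical.

```python
def per_bin_reset_flags(diff_now, diff_prev, total_now, p3,
                        low_thr, high_thr, first_high_bin):
    """Per-bin rising-edge flags for frequency-specific reset (A) or (B).

    diff_now / diff_prev : per-bin normalized amplitude difference values
                           for frames i and i-1
    total_now            : frame total compared with the low threshold P3
    """
    # S64: choose the per-bin threshold (S65 uses low, S66 uses high).
    thr = low_thr if total_now >= p3 else high_thr
    flags = []
    for j, (now, prev) in enumerate(zip(diff_now, diff_prev)):
        in_high_band = j >= first_high_bin   # S62 / S67: high range only
        rising = now >= thr and prev < thr   # S63 / S68: 0 -> 1 transition
        flags.append(in_high_band and rising)
    return flags
```

Bins flagged True would go on to the continuing sound determination (S57); the rest would go to the reset sound side lobe determination (S69).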

The reset sound side lobe determination is as follows. This step determines whether the target frequency component is a side lobe component arising near the frequency peak of a non-continuing component, that is, a component whose frequency peak does not persist over time and which is therefore phase reset. In other words, when a frequency component judged to require phase reset in the i-th processing is a non-continuing frequency peak, the reset sound side lobe determination allows its side lobe components to be phase reset together with it. Thus, in this embodiment, even a frequency component for which no attack portion was detected undergoes phase reset processing if it is a side lobe of a non-continuing frequency peak that is being phase reset, so that the attack portion is reproduced more crisply. Accordingly, phase reset processing is performed if S69: Yes (S58), and phase continuation processing is performed if S69: No (S70).
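The effect of the reset sound side lobe determination can be illustrated as spreading a reset decision from a non-continuing peak to its neighbouring bins. In the sketch below the side lobe is modelled as a fixed number of bins (`width`) on each side of the peak; that fixed width, the flag-list representation, and the function name are assumptions made for illustration.

```python
def extend_reset_to_sidelobes(reset_flags, peak_flags, continuing_flags,
                              width=1):
    """Return reset flags extended to side lobes of non-continuing peaks.

    reset_flags      : bins already chosen for phase reset in this frame
    peak_flags       : bins that are local spectral maxima
    continuing_flags : bins judged to be continuing sounds
    """
    n = len(reset_flags)
    out = list(reset_flags)
    for j in range(n):
        if reset_flags[j] and peak_flags[j] and not continuing_flags[j]:
            # Bins within `width` of a reset, non-continuing peak are
            # treated as its side lobes and reset together with it.
            for k in range(max(0, j - width), min(n, j + width + 1)):
                out[k] = True
    return out
```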

After the phase reset processing of S58 or the phase continuation processing of S70, the amplitude and phase are converted to a complex number (S71), as in the processing of the first embodiment shown in FIG. 3. It is then determined whether the FFT frequency grid number j has reached half the number of FFT samples n_FFT (S72). If not (S72: No), j is incremented (S73) and the process returns to S52. If so (S72: Yes), the delay reset counter is decremented by 1 (S74) and the process proceeds to S12 in FIG. 13. On the other hand, if S52: Yes (the delay reset counter has reached 0), the continuing sound determination of S57 is made for the low-frequency range only (S75), and either phase reset processing (S58) or phase continuation processing (S70) is performed according to the result.
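The life cycle of the delay reset counter (set in S55, tested in S52, decremented in S74) can be sketched as a small state holder. The description does not say what value the counter holds while no delayed reset is pending, so this sketch assumes an explicit idle state (`None`) and assumes the counter returns to idle after the frame in which it reads 0; the class and method names are likewise hypothetical.

```python
class DelayedResetCounter:
    """Frame counter that schedules the delayed low-range phase reset."""

    def __init__(self):
        self.value = None  # None = idle (assumption of this sketch)

    def arm(self, n):
        """S55: load N (N >= 1) when an attack portion is detected."""
        if n < 1:
            raise ValueError("N must be an integer >= 1")
        self.value = n

    def due(self):
        """S52: True when the low-range reset (S75) should run."""
        return self.value == 0

    def end_of_frame(self):
        """S74: decrement once per frame, after all bins are processed."""
        if self.value is None:
            return
        if self.value == 0:
            self.value = None  # the delayed reset ran this frame; go idle
        else:
            self.value -= 1
```

With N = 1, the counter is armed in the attack frame, reaches 0 at the end of that frame, and reports `due()` in the following frame, so the low-frequency reset lags the high-frequency reset by N frames.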

As described above, according to this embodiment, even when an attack portion is detected, the phase calculation unit 22 excludes continuing components, whose frequency peaks persist over time, from phase reset processing, thereby improving sound quality. That is, by detecting such components as continuing sounds and applying phase continuation processing to them, sounds that continue across the attack portion are less likely to be interrupted. Excluding the side lobe components of continuing sounds from phase reset processing as well further improves the quality of sounds that continue across the attack portion.

In addition, even when no attack portion is detected, if the phase calculation unit 22 performs phase reset processing on a non-continuing component, whose frequency peak does not persist over time, it excludes the side lobe components arising near that frequency peak from phase continuation processing, further improving sound quality. That is, when a frequency component judged in the preceding processing to require phase reset is a frequency peak, its side lobe components are phase reset together with it, so that the attack portion is reproduced more crisply.

Furthermore, because the phase calculation unit 22 performs the phase switching determination processes using the sum of the L and R normalized amplitude difference values, the phase reset timing of L and R is synchronized and disturbance of the sound image is prevented even when the L and R sounds differ in volume. Because the individual L and R normalized amplitude difference values are also used, the balance between L and R is taken into account, and disturbance of the sound image is prevented more appropriately.

In addition, when the total value of the normalized amplitude difference values is equal to or greater than the high threshold and an attack portion is detected, the phase calculation unit 22 performs phase reset processing on the low-frequency components only, delayed by a predetermined time, which restores the sense of attack in the low range. Because low-frequency sounds have long periods, their phase has not yet stabilized at the phase reset timing detected in the preceding processing, and a reset at that moment has little effect; delaying the timing makes the phase reset processing more effective. This improves the quality of low-frequency sound that continues after the strike of a low-pitched percussion instrument, such as the shell resonance of a bass drum.

In each of the above embodiments, the audio data processing unit 3 performs pitch shifting (time scaling) while analyzing the audio data written to the buffer memory 4 as the playback unit 2 plays it back, but it may instead read out data analyzed in advance and use that. That is, pitch shifting (time scaling) may be performed in real time during playback, or the whole or part of a piece of music may be pitch shifted (time scaled) using data analyzed in advance.

Each component of the audio data processing unit 3 described above can also be provided as a program, and the program can be provided stored on various recording media (CD-ROM, flash memory, and the like). That is, a program for causing a computer to function as each component of the audio data processing unit 3, and a recording medium on which that program is recorded, also fall within the scope of the present invention.

In each of the above embodiments, the audio data processing unit 3 is applied to the playback device 1, but it may also be applied to DJ equipment such as mixers, to various electronic musical instruments, and to computers (PC applications). It is also useful in audio processing devices that change pitch, such as karaoke machines, voice changers, and speech synthesizers. For example, in DJ equipment that plays different pieces of music in succession, when the keys of consecutive pieces are dissonant with each other, applying the present invention improves the sound quality of harmonic mix processing, which pitch shifts a piece into a more compatible key. Karaoke machines offer a function that changes the key to suit the user's vocal range, and many use sequenced MIDI as the sound source so that the key can be changed without loss of sound quality; applying the present invention enables high-quality key changes even when recorded live sound is used as the source.

Time scaling alone may also be applied, for example to change only the time-axis length of audio without changing its key. For example, in DJ equipment that plays different pieces of music in succession, applying the present invention improves the sound quality of time scaling that changes only the tempo of consecutive pieces and leaves the key (pitch) unchanged. Likewise, in a device that records and plays back audio, it improves the sound quality of a fast-listening function that keeps the key unchanged during high-speed playback. Other modifications may be made as appropriate without departing from the gist of the present invention.

DESCRIPTION OF SYMBOLS
1 ... playback device
2 ... playback unit
3 ... audio data processing unit
4 ... buffer memory
4a ... input buffer
4b ... output buffer
5 ... audio data output unit
11 ... time scaling unit
12 ... SRC unit
21 ... FFT unit
22 ... phase calculation unit
23 ... inverse FFT unit
24 ... time expansion/contraction calculation unit
31 ... attack detection unit
31a ... all-frequency-band detection unit
31b ... per-frequency-band detection unit (A)
31c ... per-frequency-band detection unit (B)
32 ... phase reset unit
33 ... phase continuation processing unit

Claims

A time scaling method characterized in that an audio data processing device executes:
a frequency conversion step of converting digital audio data into an amplitude and a phase for each frequency component;
a phase calculation step of performing, in accordance with the results of a plurality of phase switching determination processes that make different phase switching determinations using a calculated time rate of change of the amplitude and/or the phase, at least one of first phase calculation processing, which performs phase reset processing by taking the phase of each frequency component to be the calculation result of the frequency conversion step itself, and second phase calculation processing, which performs phase continuation processing by taking the phase of each frequency component to have changed continuously, with time expansion/contraction taken into account, from the previous calculation result of the frequency conversion step;
a frequency inverse conversion step of converting each frequency component after the phase calculation processing of the phase calculation step into digital audio data; and
a time expansion/contraction calculation step of increasing or decreasing, during the frequency inverse conversion processing of the frequency inverse conversion step, the number of data items of the digital audio data after frequency inverse conversion in proportion to a time expansion/contraction ratio.

The time scaling method according to claim 1, characterized in that the plurality of phase switching determination processes determine the presence or absence of an attack portion for each of different frequency bands, and
in the phase calculation step, the first phase calculation processing is performed when the plurality of phase switching determination processes determine that the attack portion is present, and the second phase calculation processing is performed when they determine that the attack portion is absent.

The time scaling method according to claim 1, characterized in that, in the phase calculation step, the plurality of phase switching determination processes are performed using a normalized amplitude difference value obtained by dividing the time rate of change of the amplitude by the amplitude.

The time scaling method according to claim 3, characterized in that, in the phase calculation step, as the phase switching determination process, it is determined whether the total value of the normalized amplitude difference values is equal to or greater than a predetermined threshold; if it is, the presence or absence of an attack portion is determined; and if the attack portion is detected, the first phase calculation processing is performed on the entire frequency band.

The time scaling method according to claim 4, characterized in that, in the phase calculation step, as the phase switching determination process, it is determined whether the total value of the normalized amplitude difference values is equal to or greater than a predetermined threshold; if it is less than the predetermined threshold, the presence or absence of an attack portion is further determined for each frequency component as the phase switching determination process; if the attack portion is detected, the first phase calculation processing is performed for that frequency component; and if the attack portion is not detected, the second phase calculation processing is performed.

The time scaling method according to claim 1, characterized in that, in the phase calculation step, the plurality of phase switching determination processes are performed using a phase discontinuity degree, which is the time rate of change of the phase.

The time scaling method according to claim 1, characterized in that, in the phase calculation step, the plurality of phase switching determination processes are performed using a normalized amplitude difference value obtained by dividing the time rate of change of the amplitude by the amplitude, and a phase discontinuity degree, which is the time rate of change of the phase.

The time scaling method according to claim 5, characterized in that, in the phase calculation step, even when the attack portion is detected, the second phase calculation processing is performed on a continuing component whose frequency peak persists over time.

The time scaling method according to claim 8, characterized in that, in the phase calculation step, when phase reset processing is not performed on the continuing component of the frequency peak, the second phase calculation processing is also performed on side lobe components arising near the frequency peak of that continuing component.

The time scaling method according to claim 5, characterized in that, in the phase calculation step, even when the attack portion is not detected, if phase reset processing is performed on a non-continuing component whose frequency peak does not persist over time, the first phase calculation processing is performed on side lobe components arising near that frequency peak.

The time scaling method according to claim 3, characterized in that, in the phase calculation step, the plurality of phase switching determination processes are performed using the sum of the normalized amplitude difference values for the left and right stereo channels.

The time scaling method according to claim 11, characterized in that, in the phase calculation step, the plurality of phase switching determination processes are performed using the sum of the normalized amplitude difference values for the left and right stereo channels and the normalized amplitude difference value of each of the left and right stereo channels.

The time scaling method according to claim 4, wherein, in the phase calculation step, when the total of the normalized amplitude difference values is equal to or greater than a predetermined threshold and the attack portion is detected, the first phase calculation process is performed with its timing delayed by a predetermined time for the low-frequency components only.
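The delayed reset for low-frequency components can be illustrated as a simple schedule: when an attack is detected at some frame, higher bins are reset at that frame and lower bins are reset a fixed number of frames later. This is a sketch under stated assumptions. Measuring the "predetermined time" in whole frames, the `low_cutoff_bin` boundary, and the function name are all hypothetical.

```python
def schedule_resets(attack_frames, n_bins, low_cutoff_bin, delay_frames):
    """Map frame index -> set of bins whose phase is reset at that frame.

    attack_frames  -- frame indices at which an attack was detected
    n_bins         -- total number of frequency bins
    low_cutoff_bin -- bins below this index count as low-frequency (assumed)
    delay_frames   -- assumed delay, in frames, applied to low-frequency bins
    """
    schedule = {}
    for t in attack_frames:
        # Higher-frequency bins are reset at the detection frame itself.
        schedule.setdefault(t, set()).update(range(low_cutoff_bin, n_bins))
        # Low-frequency bins are reset later by the assumed delay.
        schedule.setdefault(t + delay_frames, set()).update(range(0, low_cutoff_bin))
    return schedule
```

With `attack_frames=[3]`, six bins, `low_cutoff_bin=2` and `delay_frames=1`, bins 2 to 5 are reset at frame 3 and bins 0 and 1 at frame 4.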

A pitch shift method, wherein the audio data processing device executes:
each step of the time scaling method according to any one of claims 1 to 13; and
a sampling rate conversion calculation step of performing time expansion/contraction and pitch change by changing the sampling frequency of the digital audio data,
wherein the change in time length caused by the steps of the time scaling method and the change in time length caused by the sampling rate conversion calculation step cancel each other out, so that only the pitch is changed.
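The cancellation described in the pitch shift claim above can be checked with simple arithmetic: if time scaling changes the length by some factor and sampling rate conversion changes it by the reciprocal, the overall duration is unchanged while the pitch moves. The sketch below assumes the shift is specified in equal-tempered semitones, which the claim does not mention; the function name and return layout are likewise hypothetical.

```python
def pitch_shift_plan(semitones):
    """Return (pitch_ratio, time_stretch, resample_length_ratio).

    pitch_ratio           -- frequency multiplier for the requested shift
                             (assumes 12-tone equal temperament)
    time_stretch          -- length factor applied by time scaling (pitch kept)
    resample_length_ratio -- length factor applied by sampling rate conversion,
                             which also multiplies pitch by its reciprocal
    """
    pitch_ratio = 2.0 ** (semitones / 12.0)
    time_stretch = pitch_ratio
    resample_length_ratio = 1.0 / pitch_ratio
    return pitch_ratio, time_stretch, resample_length_ratio
```

For a shift of one octave up, the audio is first stretched to twice its length at the original pitch and then resampled to half as many samples, so the two length changes multiply to 1 and only the pitch doubles.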

An audio data processing device comprising:
frequency conversion means for converting digital audio data into an amplitude and a phase for each frequency component;
phase calculation means for performing, in accordance with the results of a plurality of phase switching determination processes that make different phase switching determinations using the calculated time rate of change of the amplitude and/or the phase, at least one of: a first phase calculation process of performing a phase reset process in which the phase of each frequency component is taken to be the calculation result of the frequency conversion means itself; and a second phase calculation process of performing a phase continuation process in which the phase of each frequency component is taken to have changed continuously, allowing for the time expansion/contraction, from the previous calculation result of the frequency conversion means;
inverse frequency conversion means for converting each frequency component, after the phase calculation processing by the phase calculation means, into digital audio data; and
time expansion/contraction calculation means for increasing or decreasing, during the inverse frequency conversion processing by the inverse frequency conversion means, the number of samples of the inverse-converted digital audio data in proportion to the time expansion/contraction ratio.
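The two phase calculation processes named in the device claim above can be illustrated per frequency bin. The sketch below assumes a conventional phase-vocoder formulation for the continuation branch (expected phase advance plus wrapped deviation, scaled by the stretch ratio); the claim itself does not spell out a formula. The names `hop`, `n_fft`, `stretch`, the mode labels and both functions are hypothetical.

```python
import math


def princarg(x):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (x + math.pi) % (2.0 * math.pi) - math.pi


def phase_operation(curr_phase, prev_phase, prev_out_phase, modes, hop, n_fft, stretch):
    """Compute output phases for one frame.

    curr_phase / prev_phase -- analysed phases of this and the previous frame
    prev_out_phase          -- output phases produced for the previous frame
    modes                   -- per-bin "reset" or "continue" (assumed labels)
    hop, n_fft              -- assumed analysis hop size and FFT length
    stretch                 -- time expansion/contraction ratio
    """
    out = []
    for k, mode in enumerate(modes):
        if mode == "reset":
            # First process: use the analysed phase as it is.
            out.append(curr_phase[k])
        else:
            # Second process: continue from the previous output phase,
            # scaling the measured phase advance by the stretch ratio.
            expected = 2.0 * math.pi * k * hop / n_fft
            deviation = princarg(curr_phase[k] - prev_phase[k] - expected)
            out.append(princarg(prev_out_phase[k] + stretch * (expected + deviation)))
    return out
```

A reset bin reproduces the analysed waveform shape exactly at that frame, which suits attacks, whereas a continued bin keeps sustained partials coherent from frame to frame.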

The audio data processing device according to claim 15, wherein the plurality of phase switching determination processes determine the presence or absence of an attack portion for each of different frequency bands, and
the phase calculation means performs the first phase calculation process when the plurality of phase switching determination processes determine that the attack portion is present, and performs the second phase calculation process when they determine that the attack portion is absent.
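A per-band attack decision of the kind described above might look like the following sketch. It assumes that each band is judged by comparing the total of a per-bin attack measure against a single threshold, and that bands are given as half-open bin ranges; the band layout, the threshold rule, the mode labels and the function name are hypothetical.

```python
def select_phase_mode(attack_measure, bands, threshold):
    """Choose "reset" or "continue" per bin from band-wise attack detection.

    attack_measure -- per-bin values (for example normalized amplitude differences)
    bands          -- list of (low_bin, high_bin) half-open ranges (assumed layout)
    threshold      -- assumed threshold on the band total

    Bins in a band judged to contain an attack get "reset" (first phase
    calculation process); all other bins get "continue" (second process).
    """
    modes = ["continue"] * len(attack_measure)
    for low, high in bands:
        if sum(attack_measure[low:high]) >= threshold:
            for k in range(low, high):
                modes[k] = "reset"
    return modes
```

Judging each band separately lets a transient confined to one band reset only that band's phases while sustained content elsewhere continues undisturbed.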

The audio data processing device according to claim 15, wherein the phase calculation means performs the plurality of phase switching determination processes using a normalized amplitude difference value obtained by dividing the time rate of change of the amplitude by the amplitude.
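The normalized amplitude difference value can be sketched as a per-bin frame-to-frame amplitude change divided by amplitude. The claim does not say which frame's amplitude serves as the divisor, so dividing by the current frame's amplitude is an assumption here, as are the function name and the small `eps` guard against division by zero.

```python
def normalized_amplitude_difference(prev_amp, curr_amp, eps=1e-12):
    """Return (curr - prev) / curr for each bin.

    prev_amp / curr_amp -- non-negative amplitudes of the previous and
                           current frames, one value per frequency bin
    eps                 -- assumed floor for the divisor to avoid dividing by zero
    """
    return [(c - p) / max(c, eps) for p, c in zip(prev_amp, curr_amp)]
```

Because of the division, a rise from 1.0 to 2.0 yields the same value as a rise from 0.01 to 0.02, which makes the measure independent of the overall signal level.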

The audio data processing device according to claim 15, further comprising sampling rate conversion calculation means for performing time expansion/contraction and pitch change by changing the sampling frequency of the digital audio data,
wherein the sampling rate conversion calculation means and/or the time expansion/contraction calculation means cancel out the changes in time length produced by their respective calculation processes.

A program for causing a computer to execute each step of the time scaling method according to any one of claims 1 to 13.

A program for causing a computer to execute each step of the pitch shift method according to claim 14.