JP2001005450A

JP2001005450A - Method of encoding acoustic signal

Info

Publication number: JP2001005450A
Application number: JP11177875A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1999-06-24
Filing date: 1999-06-24
Publication date: 2001-01-12

Abstract

PROBLEM TO BE SOLVED: To encode a general acoustic signal with high quality using MIDI data. SOLUTION: The acoustic signal to be encoded is captured as digital acoustic data. A plurality of unit sections d1-d5 are set on the time axis, and the section signal in each unit section is encoded. Sine and cosine functions are prepared in advance for a plurality of frequencies, the correlation values A and B between each function and the section signal are obtained, and their root-sum-square values are calculated. Three frequencies are selected as representative frequencies in descending order of root-sum-square value, and a phase angle θ is obtained for each from the ratio of the correlation values A and B. Notes with pitches corresponding to the three representative frequencies are stored in tracks T1-T3. Each note carries information on the amplitude intensity e, which corresponds to the root-sum-square value, and on the phase angle θ.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a method of encoding an acoustic signal, and more particularly to a technique for encoding an acoustic signal given as a time-series intensity signal and for decoding and reproducing it. The invention is particularly suited to efficiently converting general acoustic signals into MIDI-format code data, and is expected to find application in the various industries that produce audio content for broadcast media (radio, television), communication media (CS video/audio distribution, Internet distribution), and packaged media (CD, MD, cassette, video, LD, CD-ROM, game cartridges).

[0002]

2. Description of the Related Art

Among techniques for encoding acoustic signals, PCM (Pulse Code Modulation) is the most widespread and is currently used widely as the recording format for audio CDs, DAT, and the like. The basic principle of PCM is to sample an analog acoustic signal at a predetermined sampling frequency, quantize the signal intensity at each sampling instant, and represent it as digital data. The higher the sampling frequency and the number of quantization bits, the more faithfully the original sound can be reproduced; however, the amount of information required increases accordingly. ADPCM (Adaptive Differential Pulse Code Modulation), which encodes only the differences between successive signal values, is therefore also used as a way to reduce the amount of information as far as possible.

[0003] Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sounds of electronic musical instruments, has come into wide use with the spread of personal computers. Code data conforming to the MIDI standard (hereinafter, MIDI data) is essentially data describing the operations of a musical performance, that is, which key of an instrument was played and how hard; the MIDI data itself contains no actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores the waveforms of instrument sounds. On the other hand, MIDI data requires far less information than sound recorded by the PCM method described above, and this high encoding efficiency has attracted attention. Encoding and decoding based on the MIDI standard are now widely adopted in software for performing, practicing, and composing music on personal computers, and are also widely used in fields such as karaoke and game sound effects.

[0004]

Problems to Be Solved by the Invention

As described above, when an acoustic signal is encoded by the PCM method, securing sufficient sound quality makes the amount of information enormous and inevitably increases the data-processing load. In practice, therefore, some sound quality has to be sacrificed to keep the amount of information within bounds. Encoding based on the MIDI standard can, of course, reproduce sound of sufficient quality with a very small amount of information; but, as noted above, the MIDI standard was originally intended for encoding the operations of a musical performance and cannot be applied to general sounds at large. In other words, creating MIDI data requires either actually playing an instrument or preparing score information.

[0005] Thus both the conventional PCM method and the MIDI method have their advantages and disadvantages as methods of encoding acoustic signals, and neither can secure sufficient sound quality for general sounds with a small amount of information. Yet the demand for efficient encoding of general sounds keeps growing. In fields that handle human speech and singing, so-called vocal sound, this demand has long been strong. In language education, vocal-music education, and criminal investigation, for example, a technique for efficiently encoding vocal acoustic signals is eagerly awaited. To meet this demand, Japanese Patent Application Laid-Open Nos. 10-247099, 11-73199, 11-73200, and 11-95753, and the specifications of Japanese Patent Application Nos. 10-283453, 10-283454, and 11-58431, propose novel encoding methods that can make use of MIDI data.

[0006] In these methods, a plurality of unit sections are set along the time axis of the acoustic signal, a predetermined periodic function having a high correlation is associated with each unit section, and MIDI data corresponding to that periodic function is created. However, because MIDI data has no concept of the phase of an acoustic waveform in the first place, the periodic function associated with each unit section carries no phase information, and the MIDI data finally obtained has lost the information about the phase of the original acoustic signal. Consequently, a sound faithful to the original acoustic signal cannot be reproduced on playback, and quality is degraded, for example by distortion in the reproduced sound.

[0007] SUMMARY OF THE INVENTION

Accordingly, an object of the present invention is to provide a method of encoding an acoustic signal that can perform conversion into code data such as MIDI data with high quality.

[0008]

Means for Solving the Problems

(1) A first aspect of the present invention is a method of encoding an acoustic signal given as a time-series intensity signal, comprising: an input stage of capturing the acoustic signal to be encoded as digital acoustic data; a section setting stage of setting a plurality of unit sections on the time axis of the acoustic data; a periodic function defining stage of setting a plurality of frequencies and defining, for each set frequency, a pair of periodic functions that differ from each other in phase; a representative frequency selection stage of calculating correlation values between the acoustic data in each unit section and each periodic function, and selecting as representative frequencies one or more set frequencies for which the overall correlation with the pair of periodic functions is at or above a predetermined criterion; a phase angle calculation stage of calculating, for each representative frequency of each unit section, a phase angle based on the ratio of the pair of correlation values obtained for the pair of periodic functions having that representative frequency; and an encoding stage of generating, for each unit section, one or more items of code data containing information indicating the representative frequency, a value indicating the correlation with the periodic functions having that representative frequency, the phase angle calculated for that representative frequency, and the position of the unit section on the time axis, so that the acoustic data of each unit section is represented by the individual items of code data.

[0009] (2) A second aspect of the present invention is the acoustic signal encoding method according to the first aspect, wherein, when the correlation values for the individual unit sections are calculated, common periodic functions that always have the same phase with respect to a predetermined reference point on the time axis are used.

[0010] (3) A third aspect of the present invention is the acoustic signal encoding method according to the first or second aspect, wherein a plurality of functions whose waveforms are a sine wave, a triangular wave, a rectangular wave, and a sawtooth wave are defined as the periodic functions, and a function having a particular waveform is used selectively on the basis of the captured acoustic data.

[0011] (4) A fourth aspect of the present invention is the acoustic signal encoding method according to any of the first to third aspects, wherein frequencies whose values form a geometric progression are set as the plurality of frequencies.

[0012] (5) A fifth aspect of the present invention is the acoustic signal encoding method according to any of the first to fourth aspects, wherein periodic functions whose phases differ from each other by π/2 are defined as the pair of periodic functions, and the root-sum-square of the correlation values for the two periodic functions is used as the overall correlation with the pair.

[0013] (6) A sixth aspect of the present invention is the acoustic signal encoding method according to any of the first to fifth aspects, wherein, in the encoding stage, the position of the unit section on the time axis is corrected by a time corresponding to the phase angle, and code data is generated in which the information indicating the phase angle is subsumed in the information indicating the position of the unit section on the time axis.

[0014] (7) A seventh aspect of the present invention is the acoustic signal encoding method according to any of the first to fifth aspects, wherein, in the encoding stage, encoding is performed with MIDI data in which the representative frequency is expressed by a note number, the value indicating the correlation by a velocity, the position of the unit section on the time axis by a delta time, and the phase angle by a channel number.

[0015] (8) An eighth aspect of the present invention is the acoustic signal encoding method according to any of the first to seventh aspects, wherein, in the encoding stage, when a plurality of items of code data are generated whose three elements, namely representative frequency, phase angle, and position of the unit section on the time axis, are each close to one another within predetermined tolerances, these items are merged into a single item of code data.

[0016] (9) A ninth aspect of the present invention is the acoustic signal encoding method according to any of the first to eighth aspects, wherein, in the representative frequency selection stage, one or more representative frequencies are selected on the basis of the spectral intensity values in the Fourier spectrum of the acoustic data.

[0017] (10) A tenth aspect of the present invention is the acoustic signal encoding method according to any of the first to eighth aspects, wherein, in the representative frequency selection stage, the following processing is repeated for j = 1 to P (P being an arbitrary integer) so as to select P representative frequencies: the frequency of the pair of periodic functions having the greatest correlation with the j-th target acoustic data is selected as the j-th representative frequency, and the acoustic data obtained by subtracting from the j-th target acoustic data a signal component consisting of a periodic function that has the j-th representative frequency and an amplitude corresponding to the obtained correlation is taken as the (j+1)-th target acoustic data.

[0018] (11) An eleventh aspect of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic signal encoding method according to any of the first to tenth aspects.
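The correlation, root-sum-square, and phase-angle steps named in aspects (1) and (5) can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names, the 2/n scaling, and the convention that the phase angle is atan2(B, A) with A the sine correlation and B the cosine correlation are all assumptions made for illustration.

```python
import math

def paired_correlation(section, freq_hz, sample_rate):
    """Correlate a section signal with a sine/cosine pair (phases differ by pi/2).

    Returns (a, b): a is the correlation with the sine, b with the cosine.
    The 2/n scaling is an assumption chosen so that a unit sinusoid gives
    a root-sum-square value of 1.
    """
    n = len(section)
    w = 2.0 * math.pi * freq_hz / sample_rate
    a = 2.0 / n * sum(x * math.sin(w * k) for k, x in enumerate(section))
    b = 2.0 / n * sum(x * math.cos(w * k) for k, x in enumerate(section))
    return a, b

def amplitude_and_phase(a, b):
    """Overall correlation (root-sum-square) and phase angle from the ratio b/a."""
    amplitude = math.hypot(a, b)   # sqrt(a^2 + b^2)
    theta = math.atan2(b, a)       # assumed convention: x(t) ~ sin(w t + theta)
    return amplitude, theta
```

Under these assumptions, a section that contains a whole number of cycles of `0.8 * sin(w t + 0.7)` yields an amplitude of about 0.8 and a phase angle of about 0.7 rad.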

[0019]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention will now be described on the basis of the illustrated embodiments.

[0020] §1. Basic principle of the acoustic signal encoding method according to the present invention

First, the basic principle of the acoustic signal encoding method according to the present invention is described. Because this basic principle is disclosed in the publications and specifications cited above, only an outline is given here.

[0021] Suppose that an analog acoustic signal is given as a time-series intensity signal, as shown in FIG. 1(a). In the illustrated example, the signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. The first step is to capture this analog acoustic signal as digital acoustic data. This can be done with the ordinary PCM method: the analog acoustic signal is sampled at a predetermined sampling period, and the amplitude is converted into digital data using a predetermined number of quantization bits. For convenience of explanation, the waveform of the acoustic data digitized by PCM is here also represented by the same waveform as the analog acoustic signal of FIG. 1(a).

[0022] Next, a plurality of unit sections are set on the time axis of the acoustic signal to be encoded. In the example of FIG. 1(a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5 are set, each starting and ending at adjacent times (a more practical way of setting the sections is described later).
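The equal-interval section setting of FIG. 1(a) can be sketched as follows. This is only an illustration of the simple case described here, not the more practical method mentioned for later; the function name and the choice to drop a trailing partial section are assumptions.

```python
def split_into_sections(samples, section_length):
    """Cut PCM samples into consecutive, equal-length unit sections.

    Each entry is (start_index, section_samples); the start index records
    the position of the section on the time axis. A trailing partial
    section is dropped (an assumption for this sketch).
    """
    sections = []
    for start in range(0, len(samples) - section_length + 1, section_length):
        sections.append((start, samples[start:start + section_length]))
    return sections
```

With five sections d1 to d5, the six boundary times t1 to t6 correspond to the start indices of the sections plus the end of the last one.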

[0023] Once the unit sections have been set, representative frequencies are selected for the acoustic signal of each unit section (here called a section signal). A section signal usually contains a variety of frequency components, and those with large amplitudes are selected as representative frequencies. A single representative frequency may be selected, but selecting several allows more accurate encoding. One way to select representative frequencies is to use the Fourier transform: a Fourier transform is applied to each section signal to produce a spectrum. Before the transform, the extracted section signal is weighted with a weighting function such as a Hanning window. The Fourier transform generally assumes that the same signal repeats indefinitely before and after the extracted section, so without a weighting function the resulting spectrum often contains high-frequency noise. A weighting function whose weights fall to 0 at both ends of the section, such as the Hanning window function, suppresses this problem to some extent. For a unit section of length L, the Hanning window function H(k) is given, for k = 1 ... L, by $H(k) = 0.5 - 0.5\cos(2\pi k / L)$.
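The window formula above translates directly into code. The sketch below assumes the section signal is a plain Python list of samples; the function names are illustrative and not from the specification.

```python
import math

def hanning_window(L):
    """Return [H(1), ..., H(L)] with H(k) = 0.5 - 0.5*cos(2*pi*k/L)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / L) for k in range(1, L + 1)]

def apply_window(section, window):
    """Weight each sample of the section signal by the matching window value."""
    return [x * w for x, w in zip(section, window)]
```

The weights rise from near 0 at the start of the section to 1 at its midpoint and return to 0 at k = L, which is what suppresses the discontinuity at the section edges.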

[0024] FIG. 1(b) shows an example of the spectrum produced for the unit section d1. In this spectrum, the frequency f on the horizontal axis indicates the frequency components contained in the section signal of unit section d1 (0 to F, where F is the sampling frequency), and the complex intensity A on the vertical axis indicates the complex intensity of each frequency component.

[0025] Next, X codes are defined discretely along the frequency axis f of this spectrum; in other words, X frequencies are set on the frequency axis f. In this example, the note number n used in MIDI data serves as the code, and 128 codes, n = 0 to 127, are defined. The note number n is a parameter indicating the pitch of a note; for example, note number n = 69 denotes the A ("la", A3) at the center of the piano keyboard and corresponds to a 440 Hz tone. Because each of the 128 note numbers is associated with a predetermined set frequency, the 128 note numbers n are defined discretely at the corresponding positions on the frequency axis f of the spectrum.
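The set frequency for each note number can be tabulated as in the sketch below. The anchor n = 69 at 440 Hz comes from the text; the division of each octave into 12 equal steps is the usual equal-temperament MIDI tuning and is an assumption here, since this passage does not state the formula.

```python
def note_to_frequency(n):
    """Frequency in Hz of MIDI note number n, assuming 12-tone equal temperament."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

# The X = 128 set frequencies, one per note number n = 0..127.
SET_FREQUENCIES = [note_to_frequency(n) for n in range(128)]
```

Adjacent entries differ by the constant ratio 2^(1/12), so the table forms a geometric progression and doubles every 12 note numbers.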

[0026] Because the note number n denotes a pitch on a logarithmic scale, in which the frequency doubles with each octave, it does not correspond linearly to the frequency axis f. That is, the set frequencies corresponding to the note numbers defined discretely on the frequency axis f form a geometric progression. Here, therefore, the frequency axis f is expressed on a logarithmic scale, and an intensity graph is drawn with the note numbers n defined on that logarithmic axis. FIG. 1(c) shows the intensity graph produced in this way for unit section d1. Its horizontal axis is the horizontal axis of the spectrum in FIG. 1(b) converted to a logarithmic scale, with note numbers n = 0 to 127 plotted at equal intervals. Its vertical axis is the complex intensity A of the spectrum in FIG. 1(b) converted to the effective intensity E, and shows the intensity at the position of each note number n. In general, the complex intensity A obtained by the Fourier transform is expressed by a real part R (indicating the correlation with a cosine function) and an imaginary part I (indicating the correlation with a sine function), and the effective intensity E can be computed as the root-sum-square value $E = (R^2 + I^2)^{1/2}$.
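As a rough illustration of the effective intensity, the sketch below evaluates R and I at a single frequency by direct correlation rather than through a full Fourier transform. The function name, the direct-summation approach, and the 2/n scaling (chosen so that a unit sinusoid gives E = 1) are assumptions for this example.

```python
import math

def effective_intensity(section, freq_hz, sample_rate):
    """E = sqrt(R^2 + I^2) at one frequency.

    R is the correlation of the section signal with a cosine and I the
    correlation with a sine, both at freq_hz.
    """
    n = len(section)
    w = 2.0 * math.pi * freq_hz / sample_rate
    r = 2.0 / n * sum(x * math.cos(w * k) for k, x in enumerate(section))
    i = 2.0 / n * sum(x * math.sin(w * k) for k, x in enumerate(section))
    return math.sqrt(r * r + i * i)
```

Evaluating this function at the 128 note-number frequencies would give one column of the intensity graph of FIG. 1(c) for a section.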

[0027] The intensity graph obtained in this way for unit section d1 shows, as effective intensities, the proportions of the vibration components corresponding to note numbers n = 0 to 127 among the vibration components contained in the section signal of unit section d1. On the basis of the effective intensities shown in this graph, P note numbers are selected from all X note numbers (X = 128 in this example), and these P note numbers n are extracted as representative codes for unit section d1. This amounts to selecting P frequencies as representative frequencies from the X set frequencies. For convenience of explanation, the case P = 3 is described here, in which three note numbers are extracted as representative codes from the 128 candidates. If, for example, extraction follows the criterion "extract P codes from the candidates in descending order of intensity", then in the example of FIG. 1(c) note number n(d1,1) is extracted as the first representative code, note number n(d1,2) as the second, and note number n(d1,3) as the third.
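The "descending order of intensity" criterion can be sketched as below, assuming the intensity graph of one section is held as a list indexed by note number. The function name and the tuple representation are illustrative.

```python
def select_representatives(intensities, p=3):
    """Pick the P note numbers with the largest effective intensity.

    intensities[n] is the effective intensity at note number n.
    Returns [(note_number, intensity), ...] in descending order of intensity.
    """
    ranked = sorted(range(len(intensities)),
                    key=lambda n: intensities[n], reverse=True)
    return [(n, intensities[n]) for n in ranked[:p]]
```

For a section d1 the three returned pairs play the role of n(d1,1), n(d1,2), n(d1,3) and their intensities.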

[0028] Once the P representative codes have been extracted, the section signal of unit section d1 can be represented by these representative codes and their effective intensities. In the above example, if the effective intensities of note numbers n(d1,1), n(d1,2), and n(d1,3) in the intensity graph of FIG. 1(c) are e(d1,1), e(d1,2), and e(d1,3), respectively, the acoustic signal of unit section d1 can be represented by the following three data pairs:

n(d1,1), e(d1,1)
n(d1,2), e(d1,2)
n(d1,3), e(d1,3)

The processing for unit section d1 has been described above; the same processing is carried out separately for each of unit sections d2 to d5, yielding data indicating the representative codes and their intensities. For unit section d2, for example, the following three data pairs are obtained:

n(d2,1), e(d2,1)
n(d2,2), e(d2,2)
n(d2,3), e(d2,3)

The original acoustic signal can be encoded by the data obtained in this way for each unit section.

[0029] FIG. 2 is a conceptual diagram of encoding by the above method. FIG. 2(a) shows, as in FIG. 1(a), five unit sections d1 to d5 set for the original acoustic signal, and FIG. 2(b) shows the code data obtained for each unit section in the form of musical notes. In this example, three representative codes are extracted for each unit section (P = 3), and the data for these representative codes is stored in three separate tracks T1 to T3. For example, the representative codes n(d1,1), n(d1,2), and n(d1,3) extracted for unit section d1 are stored in tracks T1, T2, and T3, respectively. FIG. 2(b) is, however, only a conceptual diagram showing the code data in note form; in practice, intensity data is attached to each note. Track T1, for example, stores data indicating the pitches, namely note numbers n(d1,1), n(d2,1), n(d3,1), ..., together with data indicating the intensities e(d1,1), e(d2,1), e(d3,1), .... Also, in the conceptual diagram of FIG. 2(b) the horizontal position of each note indicates the position of the unit section on the time axis, but in practice data giving this position as an exact numerical value is attached to each note.
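The distribution of representative codes over tracks T1 to T3 can be pictured with the following sketch. The dictionary layout and the names are assumptions; the text only requires that each note carry its pitch, its intensity, and its position on the time axis.

```python
def build_tracks(sections, p=3):
    """Store the j-th representative code of every unit section in track j.

    sections is a list of (start_time, [(note, intensity), ...]) entries,
    one per unit section. Returns p lists; index 0 corresponds to track T1.
    """
    tracks = [[] for _ in range(p)]
    for start_time, pairs in sections:
        for j, (note, intensity) in enumerate(pairs[:p]):
            tracks[j].append(
                {"time": start_time, "note": note, "intensity": intensity}
            )
    return tracks
```

Track T1 then holds n(d1,1), n(d2,1), n(d3,1), ... with e(d1,1), e(d2,1), e(d3,1), ..., matching the layout of FIG. 2(b).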

[0030] The encoding format adopted here need not be the MIDI format, but because MIDI is the most widespread format of this kind, it is in practice most preferable to use MIDI-format code data. In the MIDI format, "note on" data and "note off" data are present with "delta time" data interposed between them. "Note on" data specifies a particular note number N and velocity V and instructs that performance of a particular note begin; "note off" data specifies a particular note number N and velocity V and instructs that performance of a particular note end. "Delta time" data indicates a predetermined time interval. The velocity V is a parameter indicating, for example, the speed at which a piano key is pressed (note-on velocity) or the speed at which the finger is released from the key (note-off velocity), and thus indicates the strength of the operation that starts or ends the performance of a particular note.

【００３１】 In the method described above, P note numbers n(di, 1), n(di, 2), ..., n(di, P) are obtained as representative codes for the i-th unit section di, and an effective intensity e(di, 1), e(di, 2), ..., e(di, P) is obtained for each of them. MIDI-format code data can therefore be created as follows. The note number N written in the "note-on" or "note-off" data is simply the obtained note number n(di, 1), n(di, 2), ..., n(di, P). The velocity V written in the "note-on" or "note-off" data is the obtained effective intensity e(di, 1), e(di, 2), ..., e(di, P), normalized by a predetermined method. The "delta time" data is set according to the length of each unit section.
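The mapping just described can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `section_to_midi_events`, the tuple layout, the linear scaling of intensity into 1..127, and the use of a single event stream (rather than one track per note) are not taken from the specification, which only says the intensity is normalized "by a predetermined method".

```python
def section_to_midi_events(notes, intensities, max_intensity, section_ticks):
    """Build (delta_time, kind, note, velocity) tuples for one unit section.

    notes         -- note numbers n(di, 1..P)
    intensities   -- effective intensities e(di, 1..P)
    max_intensity -- intensity mapped to velocity 127 (assumed linear scaling)
    section_ticks -- delta time representing the section length
    """
    velocities = []
    for e in intensities:
        v = round(127 * e / max_intensity)
        velocities.append(max(1, min(127, v)))  # keep inside the MIDI range

    events = []
    # All notes of the section start together.
    for note, v in zip(notes, velocities):
        events.append((0, "note_on", note, v))
    # The first note-off carries the section length; the rest follow at once.
    for idx, (note, v) in enumerate(zip(notes, velocities)):
        delta = section_ticks if idx == 0 else 0
        events.append((delta, "note_off", note, v))
    return events
```

A real encoder would write these tuples into a standard MIDI file; that step is omitted here.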

【００３２】 As a result, the embodiment described above yields MIDI code data consisting of three tracks. Playing this MIDI code data back on three MIDI sound sources reproduces the acoustic signal as six-channel stereo sound.

【００３３】 In the example above, the Fourier spectrum of the section signal is obtained and P frequencies (note numbers) are selected as representative frequencies in descending order of intensity, but other selection methods may be used. For example, the specification of Japanese Patent Application No. Hei 11-58431 shows an example that selects representative frequencies by generalized harmonic analysis. The basic principle is as follows. Suppose a section signal Xj is given for a unit section d as shown in FIG. 3(a). The Fourier spectrum of Xj is obtained, and the frequency at its peak is selected as a representative frequency. Next, as shown in FIG. 3(b), a periodic function Gj having the selected representative frequency is defined, with its amplitude set according to the spectral intensity at the representative frequency. The difference signal Xj+1 between the section signal Xj and the periodic function Gj is then obtained (for example, as in FIG. 3(c)). Treating this difference signal Xj+1 as the new section signal and repeating the same processing P times, incrementing j by 1 from j = 1 to P, selects P representative frequencies.
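A minimal Python sketch of this peak-and-subtract loop is given below. It is a simplification, not the procedure of the cited application: the candidate frequency list `freqs`, the scaling factor 2/w, and the choice to subtract a sinusoid whose amplitude and phase come from sine and cosine correlations are all assumptions made for illustration.

```python
import math

def correlate(x, f, F):
    """Sine and cosine correlations of x with frequency f (assumed scale 2/w)."""
    w = len(x)
    a = 2.0 / w * sum(x[k] * math.sin(2 * math.pi * f * k / F) for k in range(w))
    b = 2.0 / w * sum(x[k] * math.cos(2 * math.pi * f * k / F) for k in range(w))
    return a, b

def select_by_residual(x, freqs, F, P):
    """Pick P representative frequencies, subtracting each peak in turn."""
    residual = list(x)
    chosen = []
    for _ in range(P):
        # Peak of the current spectrum over the candidate frequencies.
        best = max(freqs, key=lambda f: math.hypot(*correlate(residual, f, F)))
        a, b = correlate(residual, best, F)
        chosen.append(best)
        # Subtract the periodic function G_j to obtain X_{j+1}.
        for k in range(len(residual)):
            t = 2 * math.pi * best * k / F
            residual[k] -= a * math.sin(t) + b * math.cos(t)
    return chosen
```

Because each selected component is removed before the next search, a strong component cannot mask a weaker one at a nearby frequency as easily as in a single-pass spectrum ranking.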

【００３４】 The example above used a very simple section setting; in practice a more practical setting is preferable. In the example of FIG. 2(a), five unit sections d1 to d5 are set with six times t1 to t6, defined at equal intervals on the time axis t, as boundaries. When encoding is based on such a setting, sound discontinuities tend to occur at the boundary times during reproduction. In practice, therefore, it is preferable to set the sections so that adjacent unit sections partially overlap on the time axis.

【００３５】 FIG. 4(a) shows an example of such partially overlapping sections. The illustrated unit sections d1 to d4 all partially overlap, and applying the processing described above to this section setting yields the encoding shown in the conceptual diagram of FIG. 4(b). In this example each note is placed at a reference position taken at the center of its unit section, but the reference position relative to the unit section need not be the center. Comparing the conceptual diagram of FIG. 4(b) with that of FIG. 2(b) shows that the note density has increased. Overlapping sections increase the amount of code data produced, but they enable natural encoding without sound discontinuities during reproduction.

【００３６】 FIG. 5 shows a specific method of setting sections that partially overlap on the time axis. In this example, the acoustic signal is sampled at a sampling frequency of 22 kHz and captured as digital acoustic data, the section length L of each unit section is set to 1024 samples (about 47 msec), and the offset length ΔL, which gives the shift between successive unit sections, is set to 20 samples (about 0.9 msec). That is, for any i, the distance on the time axis between the start of the i-th unit section and the start of the (i+1)-th unit section is the offset length ΔL. For example, the first unit section d1 contains samples 1 to 1024, and the second unit section d2 contains samples 21 to 1044, shifted by 20 samples. If a reference point t0 is set at the start of the first unit section d1 on the time axis, the second unit section d2 starts at t0 + ΔL and the third unit section d3 starts at t0 + 2·ΔL.
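The section layout of FIG. 5 can be enumerated with a few lines of Python. The function name `unit_sections` is hypothetical, and the sketch uses 0-based half-open index ranges, so the text's "samples 1 to 1024" corresponds to `(0, 1024)` here.

```python
def unit_sections(total_samples, L=1024, dL=20):
    """Return (start, end) index pairs of overlapping unit sections.

    L  -- section length in samples
    dL -- offset between the starts of consecutive sections
    Ranges are 0-based and half-open: samples start .. end-1.
    """
    starts = range(0, total_samples - L + 1, dL)
    return [(s, s + L) for s in starts]
```

For example, `unit_sections(2048)[1]` is `(20, 1044)`, which is the text's second section d2 (samples 21 to 1044 in 1-based numbering).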

【００３７】 §2. Method of encoding an acoustic signal according to the present invention

The present invention adds the concept of phase to the encoding method whose basic principle was described in §1, achieving encoding of higher quality. In the method based on that basic principle, an acoustic signal given as a time-series intensity signal is captured as digital acoustic data, and a plurality of unit sections are set on the time axis of this data. For the acoustic data in each unit section (the section signal), one or more representative frequencies are selected, and the section signal is represented by periodic signals having those representative frequencies. A representative frequency is, literally, a frequency that represents the signal components in the unit section. §1 described two ways to select it: from the Fourier spectrum of the section signal, and by generalized harmonic analysis. Both methods share the same concept: several periodic functions of different frequencies are prepared, those that correlate highly with the acoustic data in the unit section are found, and their frequencies are selected as representative frequencies.

【００３８】 The way this correlation is obtained is now described more concretely. Suppose a section signal x is given for a unit section d, as shown in FIG. 6. The unit section d, of section length L, is sampled at sampling frequency F, giving w sample values in total, with sample numbers 0, 1, 2, 3, ..., k, ..., w-2, w-1 as illustrated. For any sample number k, an amplitude value x(k) is given as digital data.

【００３９】 As the periodic functions, suppose 256 trigonometric functions are prepared as shown in FIG. 7. They form pairs of a sine wave and a cosine wave of the same frequency; one pair is defined for each of the 128 frequencies f(0) to f(127). The variables F and k in each function are the sampling frequency F and the sample number k of the section signal x shown in FIG. 6. For example, the sine wave for frequency f(0) is sin(2πf(0)k/F); given any sample number k, it yields the amplitude of the periodic function at the same time position as the k-th sample.
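The table of FIG. 7 can be sketched in Python as a pair of small functions. The frequency mapping used here (equal temperament with note number 69 at 440 Hz) is the usual MIDI convention and is an assumption; this passage does not state how f(n) is defined. The names `note_frequency` and `reference_pair` are hypothetical.

```python
import math

def note_frequency(n):
    """Assumed f(n): equal temperament, note number 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((n - 69) / 12.0)

def reference_pair(n, k, F):
    """Amplitudes of the sine and cosine waves of frequency f(n) at sample k."""
    angle = 2.0 * math.pi * note_frequency(n) * k / F
    return math.sin(angle), math.cos(angle)
```

Evaluating `reference_pair(n, k, F)` for n = 0 to 127 gives the 256 function values (128 sine, 128 cosine) at one sample position.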

【００４０】 FIG. 8 shows the principle of obtaining the correlation between the section signal x of a unit section d and the sine wave Rn having the n-th frequency f(n). The correlation value A(n) is defined by the first expression in FIG. 9. Here x(k) is the amplitude of the section signal x at sample number k, as shown in FIG. 8, and sin(2πf(n)·k/F) is the amplitude of the sine wave Rn at the same position on the time axis. The first expression takes, for every sample number k = 0 to w-1 in the unit section d, the product of the amplitude of x and the amplitude of Rn, and sums the products. Because the amplitudes are signed, so are the products. If x and Rn were entirely uncorrelated, the sign of the product would be positive or negative at random, and the sum would come to 0. Conversely, if they are correlated, the absolute value of the sum grows with the degree of correlation. For example, with a positive correlation, in which the amplitude of Rn is always positive when that of x is positive and always negative when that of x is negative (x and Rn have the same frequency and the same phase), the sum takes its maximum positive value. With a negative correlation, in which the amplitude of Rn is always negative when that of x is positive and always positive when that of x is negative (same frequency, opposite phase), the sum takes its maximum negative value.

【００４１】 Similarly, the second expression in FIG. 9 gives the correlation value B(n) between the section signal x and the cosine wave having the n-th frequency f(n). Both the first expression, for A(n), and the second expression, for B(n), are finally multiplied by a coefficient Q, which scales A(n) and B(n) into the range -1 to +1.

【００４２】 The overall correlation between the section signal x and the periodic function of frequency f(n) can be expressed, for example, by the root sum of squares E(n) of the sine-wave correlation A(n) and the cosine-wave correlation B(n), as in the third expression of FIG. 9. Using the root sum of squares gives an overall correlation that reflects both positive and negative correlation. For example, when the signal correlates positively with the sine wave and negatively with the cosine wave, A(n) is positive and B(n) is negative, but E(n) reflects the absolute values of both.
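The three expressions of FIG. 9 are not reproduced in this text, so the following Python sketch reconstructs them from the description only. The coefficient Q = 2/w is an assumption chosen so that a unit-amplitude sinusoid at the analysis frequency gives a correlation of magnitude 1; the function name `correlation` is hypothetical.

```python
import math

def correlation(x, f, F):
    """Return (A, B, E) for section signal x against frequency f.

    A -- correlation with sin(2*pi*f*k/F)
    B -- correlation with cos(2*pi*f*k/F)
    E -- root sum of squares of A and B
    Q = 2/w is an assumed normalisation, not taken from FIG. 9.
    """
    w = len(x)
    Q = 2.0 / w
    A = Q * sum(x[k] * math.sin(2.0 * math.pi * f * k / F) for k in range(w))
    B = Q * sum(x[k] * math.cos(2.0 * math.pi * f * k / F) for k in range(w))
    return A, B, math.hypot(A, B)
```

Negating the input flips the signs of A and B but leaves E unchanged, which is the property the paragraph above relies on.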

【００４３】 The expressions of FIG. 9 use trigonometric functions as the periodic functions (in other words, functions whose waveform is sinusoidal), but the waveform of the periodic function used in carrying out the invention is not limited to a sine wave; periodic functions with triangular, rectangular, or sawtooth waveforms may also be used. In practice it is preferable to define several functions, with sine, triangular, rectangular, and sawtooth waveforms, and to let an operator manually select the function with the appropriate waveform based on the characteristics of the captured acoustic data. A function that analyzes the characteristics of the captured acoustic data and automatically selects the most suitable periodic function, without waiting for the operator's selection, can of course also be provided.

【００４４】 The expressions in FIG. 10 define the correlation when a general periodic function Rn of frequency f(n) is used in place of the trigonometric functions. The expression for A(n) uses Rn(k), whereas the expression for B(n) uses Rn(k + F/(4f(n))), because the two periodic functions have the same frequency f(n) but differ in phase by π/2. As stated above, F is the sampling frequency of the section signal x, so F/f(n) is the number of samples in one period. F/(4f(n)) is therefore the number of samples in a quarter period, that is, the phase difference π/2 expressed in units of sample numbers. When A(n) and B(n) are obtained in this way for a pair of periodic functions of frequency f(n) that differ in phase by π/2, their root sum of squares E(n) is a parameter indicating the overall correlation with the periodic function of frequency f(n).
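As an illustration of the quarter-period shift, the sketch below correlates a signal against a square wave and against the same square wave advanced by F/(4f) samples. The helper names, the choice of a square wave, and the 1/w normalisation are assumptions for the example; FIG. 10 itself is not reproduced in this text.

```python
import math

def square_wave(f, F):
    """Return R(k): a unit square wave of frequency f at sampling rate F.

    k may be fractional, so the quarter-period shift need not be an integer.
    """
    def R(k):
        phase = (f * k / F) % 1.0
        return 1.0 if phase < 0.5 else -1.0
    return R

def general_correlation(x, R, f, F):
    """Return (A, B, E) using R(k) and R(k + F/(4f)) (assumed scale 1/w)."""
    w = len(x)
    shift = F / (4.0 * f)  # quarter period, in samples
    A = sum(x[k] * R(k) for k in range(w)) / w
    B = sum(x[k] * R(k + shift) for k in range(w)) / w
    return A, B, math.hypot(A, B)
```

With F = 8000 and f = 100 the shift is 20 samples, one quarter of the 80-sample period.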

【００４５】 The effective intensity E of the Fourier spectrum is nothing other than the root sum of squares E(n) given by the expressions of FIG. 9. For example, the graph of FIG. 1(c) shows the effective intensity E for each note number n = 0 to 127; the effective intensity for the n-th note number is the value obtained as E(n) in FIG. 9. In the MIDI-based encoding method described in §1, this E(n) is used as the velocity of the MIDI data. Such encoding discards the phase information of the section signal x. MIDI data was never intended to express phase, so as long as the code representation relies on conventional MIDI data, phase information cannot be expressed.

【００４６】 However, encoding that reproduces the original acoustic signal more faithfully requires codes that include information on the relative phase of the individual periodic signals. The main point of the present invention is to generate codes that include phase information. As noted above, the effective intensity E of the Fourier spectrum contains no phase information, but the phase information has already been obtained in the course of computing E. Specifically, the phase of the section signal x can be defined by the ratio of the correlation values A(n) and B(n) of FIG. 9 or FIG. 10. In other words, if a pair of periodic functions with a given frequency f(n) and a phase difference of π/2 is prepared, and the correlation values A(n) and B(n) of the section signal x with that pair are obtained, the ratio A(n)/B(n) (taking signs into account) is a parameter indicating the phase of the periodic component of frequency f(n) contained in the section signal x.

【００４７】 In practice it is convenient to express the phase information as a phase angle. Here the phase angle θ(n) for the periodic function of the n-th frequency f(n) is defined from the ratio A(n)/B(n) by the expression in FIG. 11. With this definition, θ(n) always lies in the range 0 ≤ θ(n) < 2π. For example, when the expressions of FIG. 9 give A(n) = 1, B(n) = 0 (maximum positive correlation with the sine wave and no correlation with the cosine wave), θ(n) = π/2 (matching the phase angle of the sine wave). Conversely, when A(n) = 0, B(n) = 1 (maximum positive correlation with the cosine wave and no correlation with the sine wave), θ(n) = 0 (matching the phase angle of the cosine wave). When A(n) = -1, B(n) = 0 (maximum negative correlation with the sine wave and no correlation with the cosine wave), θ(n) = 3π/2, and when A(n) = 0, B(n) = -1 (maximum negative correlation with the cosine wave and no correlation with the sine wave), θ(n) = π.
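The expression of FIG. 11 is not reproduced in this text. One formula that agrees with all four cases listed above and with the stated range is the two-argument arctangent of A(n) and B(n), wrapped into [0, 2π); the following Python sketch uses it as an assumption. The function name `phase_angle` is hypothetical.

```python
import math

def phase_angle(A, B):
    """Phase angle in [0, 2*pi) from sine correlation A and cosine correlation B.

    Assumed form: atan2(A, B), which uses the signs of both values to pick
    the quadrant, then wrapped to the non-negative range.
    """
    theta = math.atan2(A, B) % (2.0 * math.pi)
    # Guard against floating-point wrap-around landing exactly on 2*pi.
    if theta >= 2.0 * math.pi:
        theta = 0.0
    return theta
```

Using the signed pair rather than the bare quotient A/B avoids division by zero when B(n) = 0 and keeps the quadrant information that the text calls "the ratio taking signs into account".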

【００４８】 In summary, the section signal x in a unit section d is encoded as follows. First, a pair of periodic functions is prepared for each of the 128 set frequencies shown in FIG. 7 (for each frequency, a pair differing in phase by π/2). Then, using the expressions of FIG. 9, the correlation values A(n) and B(n) with the periodic functions of frequency f(n) are obtained for each of n = 0 to 127, and the root sum of squares E(n) is obtained for each. One or more set frequencies whose E(n) is at or above a predetermined criterion are then selected as representative frequencies. Up to this point the procedure is exactly the same as in §1. The selection condition "E(n) is at or above a predetermined criterion" may be absolute, for example a threshold such that every frequency f(n) whose E(n) exceeds it is selected, or relative, for example selecting the three frequencies with the largest E(n).
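The two kinds of selection condition can be sketched in Python as follows, assuming `E` is a list of root-sum-square values indexed by n. The function names are hypothetical.

```python
def select_by_threshold(E, threshold):
    """Absolute condition: every n whose E(n) exceeds the threshold."""
    return [n for n, e in enumerate(E) if e > threshold]

def select_top(E, count=3):
    """Relative condition: the `count` indices with the largest E(n),
    in descending order of E(n)."""
    order = sorted(range(len(E)), key=lambda n: E[n], reverse=True)
    return order[:count]
```

The absolute condition yields a variable number of representative frequencies per section, while the relative one always yields a fixed number, which maps naturally onto a fixed number of tracks.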

【００４９】 In the present invention, a phase angle is additionally obtained for each representative frequency. Suppose the above procedure selected three representative frequencies f(n1), f(n2), f(n3). The root sums of squares E(n1), E(n2), E(n3) have then been obtained for them, and the invention further obtains the phase angles θ(n1), θ(n2), θ(n3). The section signal x in the unit section d is then represented by code data that combines information indicating the representative frequency (the note number, when MIDI data is used), the root sum of squares for the periodic function of that frequency (the velocity, when MIDI data is used), and the position of the unit section d on the time axis (the delta time, when MIDI data is used), with added information indicating the phase angle calculated for that representative frequency.

【００５０】 FIG. 4 showed an example in which the acoustic data in unit section d1 is represented by three notes n(d1, 1), n(d1, 2), n(d1, 3). The vertical position of each note on the staff indicates its pitch, that is, its frequency (these frequencies correspond to the three selected representative frequencies), and its horizontal position on the staff together with the note type indicates the position of unit section d1 on the time axis (start position and length, or start position and end position). Each note also carries amplitude information e(d1, 1), e(d1, 2), e(d1, 3). With the encoding method of the present invention, each note additionally carries phase information. An ordinary MIDI note conveys only which pitch to play, over what period, and how strongly. A MIDI note generated by the method of the present invention can also convey a performance instruction about phase, namely the phase of the acoustic waveform to be played, so the original acoustic signal can be reproduced more faithfully.

【００５１】 The description so far covered the encoding of the section signal x of a single unit section d. In practice, as shown in FIG. 4(a), many unit sections are defined, each shifted slightly along the time axis, and each is encoded separately. It is therefore preferable that the phase information in the individual codes be consistent across unit sections. In the example of FIG. 5, the first unit section d1 and the second unit section d2 are separated by the offset length ΔL. The codes obtained for d1 and for d2 both contain phase information, but because the two sections overlap over most of their length on the time axis, the phase information loses its meaning unless the phases of the two sections are consistent.

【００５２】 When computing the correlation values for each unit section, it is therefore advisable to use common periodic functions that always have the same phase at a predetermined reference point on the time axis. For example, when trigonometric functions such as those of FIG. 7 are used for the correlation, periodic functions that always have the same phase at the reference point t0 of FIG. 5 (for a sine wave, for instance, a phase such that the amplitude is 0 at t0) should be used in common for every unit section. In this example t0 is set at the start of the first unit section d1, but it may be set anywhere on the time axis. To perform the correlation with common periodic functions for every unit section, the expressions of FIG. 12 are used in place of the first and second expressions of FIG. 10. The expressions of FIG. 12 replace the variable k in the expressions of FIG. 10 with (k + K). Here K is the number of samples between the reference point t0 and the start of the unit section concerned (for example, K = 20 for the unit section d2 of FIG. 5), so (k + K) is the cumulative sample number counted with the sample at t0 as the first sample. The expressions of FIG. 12 thus use sample numbers referenced to t0, so results for different unit sections share a common time axis and the phase information is consistent.
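The effect of replacing k with (k + K) can be checked with a short Python sketch. It uses the sine and cosine form rather than the general Rn of FIG. 10 and FIG. 12, and the normalisation Q = 2/w and the function name `common_phase_correlation` are assumptions.

```python
import math

def common_phase_correlation(x, f, F, K):
    """Return (A, B) for a unit section x that starts K samples after t0.

    The reference waves are evaluated at the cumulative sample number k + K,
    so every section is measured against the same sinusoid anchored at t0.
    Q = 2/w is an assumed normalisation.
    """
    w = len(x)
    Q = 2.0 / w
    A = Q * sum(x[k] * math.sin(2.0 * math.pi * f * (k + K) / F) for k in range(w))
    B = Q * sum(x[k] * math.cos(2.0 * math.pi * f * (k + K) / F) for k in range(w))
    return A, B
```

For a steady tone, two overlapping sections analysed this way return the same (A, B), and hence the same phase angle. With K fixed at 0 for every section, the second section's result would instead be rotated by the phase advance over ΔL samples.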

【００５３】 §3. Integration of code data

In the encoding example of FIG. 4(b), every code is shown as an eighth note, because the unit sections d1, d2, d3, d4, ... all have the same section length. An actual acoustic signal, however, contains many sound components that persist over several unit sections, and encoding them piecemeal, section by section, greatly reduces encoding efficiency. It is therefore preferable to integrate codes that can be judged to represent a single continuous sound component into one code. For example, the notes n(d2, 1) and n(d3, 1) on track T1 of FIG. 4(b) are eighth notes of the same pitch, so they can be integrated into a single quarter note without any problem.

【００５４】ただ、本願発明により生成された符号デー
タには位相に関する情報が含まれているため、符号の統
合を行うか否かを判断する際には、この位相に関する情
報を考慮するのが好ましい。たとえば、同一音程の２つ
の符号データが連続している場合であっても、両者の位
相が不連続であったとすると、両符号データはそれぞれ
別個の音成分を表現している可能性が高く、１つの符号
に統合するべきではない。本願発明者は、符号の統合化
を行う基準として、代表周波数、位相角、単位区間の時
間軸上での位置、なる３つの要素がそれぞれ所定の許容
範囲内で近似している複数の符号データが生成された場
合に、これら複数の符号データを１つの符号データに統
合する処理を行うようにすると、理想的な符号統合化が
可能になると考えている。所定の許容範囲は、実情に合
わせて適宜設定することが可能である。たとえば、代表
周波数に関しては、「同一の周波数」というような厳格
な範囲を定めることもできるし、「ノートナンバーの差
が１以内」というような範囲を定めることもできる。位
相角に関しても同様に、たとえば、「差がπ／１０以
内」というような範囲を定めることができる。また、単
位区間の時間軸上での位置に関しては、たとえば、「単
位区間の始端位置の差が５ｍｓ以内」というような範囲
を定めることができる。However, since the code data generated according to the present invention contains phase information, it is preferable to take this phase information into account when deciding whether to integrate codes. For example, even when two pieces of code data of the same pitch are consecutive, if their phases are discontinuous, the two pieces most likely represent separate sound components and should not be merged into one code. The present inventor considers that ideal code integration becomes possible if, when a plurality of pieces of code data have been generated whose three elements (representative frequency, phase angle, and position of the unit section on the time axis) each agree within a predetermined tolerance, those pieces are integrated into a single piece of code data. The tolerances can be set as appropriate for the circumstances. For the representative frequency, a strict condition such as “the same frequency” can be used, or a looser one such as “note numbers differing by at most 1”. Likewise, for the phase angle, a condition such as “a difference of at most π/10” can be used, and for the position of the unit section on the time axis, a condition such as “start positions of the unit sections differing by at most 5 ms” can be used.

【００５５】図１３は、このような方針に基く符号デー
タの統合処理の具体的な処理手順を示す図である。ここ
では、たとえば、図１３(a) に示すような符号データが
生成されたものとしよう。ここに示すＮ１〜Ｎ４は、そ
れぞれ所定の代表周波数、所定の位相角、所定の振幅強
度を有する符号データであり、同一名称の符号データは
代表周波数が同一であることを示しており、横軸上の位
置は、時間軸上での位置を示している。このような符号
データ群が得られたら、図１３(b) に示すように、各符
号データを代表周波数ごとにそれぞれ分離して配置して
みる。この例では、Ｎ１〜Ｎ４という４種類の代表周波
数をもつ符号データが存在するので、これらをそれぞれ
４行に分けて配置する。すると、同一行に隣接配置され
た符号データは、同一代表周波数を有し、時間軸上の近
似範囲に配置された符号データということになるので、
もし、位相角の差が所定の許容範囲内であったとすれ
ば、前述した統合化の基準を満たすことになる。そこ
で、同一行に隣接配置された符号データのうち、位相角
の差が所定の許容範囲内にあるものを統合化し、１つの
符号データに置き換える。なお、置換後の位相角は、た
とえば、置換前の各符号データの位相角の平均となるよ
うに設定すればよい。FIG. 13 shows a specific procedure for integrating code data according to this policy. Suppose, for example, that code data such as that shown in FIG. 13(a) has been generated. Each of N1 to N4 shown here is a piece of code data having a given representative frequency, phase angle, and amplitude intensity; pieces with the same name have the same representative frequency, and their position along the horizontal axis indicates their position on the time axis. Once such a group of code data has been obtained, the pieces are separated and arranged by representative frequency, as shown in FIG. 13(b). In this example there are pieces of code data with four representative frequencies, N1 to N4, so they are arranged in four rows. Pieces placed next to each other in the same row then have the same representative frequency and lie close together on the time axis, so if their phase angles also differ by no more than the predetermined tolerance, they satisfy the integration criterion described above. Accordingly, adjacent pieces in the same row whose phase-angle difference is within the tolerance are integrated and replaced with a single piece of code data. The phase angle after replacement may be set, for example, to the average of the phase angles of the pieces before replacement.
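The row-by-row integration described above might be sketched as follows. The `Code` record, the function names, the use of the gap between the end of one piece and the start of the next as the time criterion, the plain (unwrapped) phase difference, and the averaging of amplitudes are all assumptions of this sketch; the text itself specifies only the three criteria and the averaging of phase angles.

```python
import math
from dataclasses import dataclass

@dataclass
class Code:
    note: int      # note number standing in for the representative frequency
    start: float   # start of the (possibly merged) section, in seconds
    end: float     # end of the section, in seconds
    phase: float   # phase angle, in radians
    amp: float     # amplitude intensity

def _collapse(run):
    """Replace a run of adjacent codes with one code (phases are averaged)."""
    n = len(run)
    return Code(run[0].note, run[0].start, run[-1].end,
                sum(c.phase for c in run) / n,
                sum(c.amp for c in run) / n)   # amplitude averaging is assumed

def merge_codes(codes, phase_tol=math.pi / 10, gap_tol=0.005):
    """Merge time-adjacent codes of the same note whose phases agree."""
    merged = []
    for note in sorted({c.note for c in codes}):        # one row per frequency
        row = sorted((c for c in codes if c.note == note),
                     key=lambda c: c.start)
        run = [row[0]]
        for c in row[1:]:
            prev = run[-1]
            close_in_time = c.start - prev.end <= gap_tol
            close_in_phase = abs(c.phase - prev.phase) <= phase_tol
            if close_in_time and close_in_phase:
                run.append(c)
            else:
                merged.append(_collapse(run))
                run = [c]
        merged.append(_collapse(run))
    return sorted(merged, key=lambda c: (c.start, c.note))

# Four consecutive codes of one pitch; only the 2nd/3rd phase gap is too large.
row = [Code(60, 0.00, 0.01, 0.10, 1.0), Code(60, 0.01, 0.02, 0.20, 1.0),
       Code(60, 0.02, 0.03, 1.50, 1.0), Code(60, 0.03, 0.04, 1.60, 1.0)]
result = merge_codes(row)
```

In this example the four input codes collapse into two, one covering the first two sections and one covering the last two.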

【００５６】図１３(c) は、このような統合化後の符号
データを示す図である。矩形で囲われた符号データが、
統合化後の１つの符号データを示している。たとえば、
図１３(c) の１行目に示されている２つの符号データ
「Ｎ１，Ｎ１」は、図１３(b)に示されている４つの符
号データ「Ｎ１，Ｎ１，Ｎ１，Ｎ１」を統合して得られ
たものである。この例では、図１３(b) に示す４つの符
号データのうち、１番目および２番目の符号データの位
相角の差は許容範囲内であり、３番目および４番目の符
号データの位相角の差も許容範囲内であるが、２番目お
よび３番目の符号データの位相角の差が許容範囲を越え
ていた場合の統合結果が示されている。図１３(d) は、
この図１３(c) に示す符号データをＭＩＤＩ符号で示す
場合のノートオンおよびノートオフの符号列を示す図で
ある。このように、符号の統合化を行うことにより、本
来、図１３(a) に示すような形態であった符号データ
を、図１３(d) に示すように、合計１２組のノートオン
またはノートオフデータによって表現することが可能に
なる。FIG. 13(c) shows the code data after this integration. Each piece of code data enclosed in a rectangle is one piece of code data after integration. For example, the two pieces of code data “N1, N1” shown in the first row of FIG. 13(c) were obtained by integrating the four pieces “N1, N1, N1, N1” shown in FIG. 13(b). This example shows the result when, among the four pieces shown in FIG. 13(b), the phase-angle difference between the first and second pieces is within the tolerance and that between the third and fourth pieces is also within the tolerance, but that between the second and third pieces exceeds it. FIG. 13(d) shows the sequence of note-on and note-off codes obtained when the code data shown in FIG. 13(c) is expressed as MIDI codes. By integrating codes in this way, code data that originally had the form shown in FIG. 13(a) can be expressed by a total of 12 sets of note-on or note-off data, as shown in FIG. 13(d).

【００５７】§４．ＭＩＤＩデータによる位相角の表現
手法本発明に係る符号化方法で生成された符号データには、
位相に関する情報が含まれているが、一般のＭＩＤＩデ
ータの規格には、位相に関する情報を表現する方法が用
意されていない。したがって、本発明に係る方法で生成
された符号データをＭＩＤＩデータの形式で出力する場
合、位相に関する情報を表現するための何らかの方策を
採る必要がある。そこで、ここでは位相角の情報をＭＩ
ＤＩデータに盛り込むための手法の一例を挙げておく。 §4. Method of Expressing the Phase Angle with MIDI Data
The code data generated by the encoding method according to the present invention contains phase information, but the general MIDI data standard provides no way of expressing phase information. Therefore, when code data generated by the method according to the present invention is output in the form of MIDI data, some measure must be taken to express the phase information. An example of a technique for incorporating phase-angle information into MIDI data is given here.

【００５８】図１４は、現在、最も標準的に利用されて
いるＳＭＦ（Standard MIDI File）フォーマットによる
ＭＩＤＩデータの形式を示す図である。図示のとおり、
このＭＩＤＩデータは、「ノートオン」データもしくは
「ノートオフ」データが、「デルタタイム」データを介
在させながら存在する。「デルタタイム」データは、１
〜４バイトのデータで構成され、所定の時間間隔を示す
データである。一方、「ノートオン」データは、全部で
３バイトから構成されるデータであり、１バイト目はノ
ートオン符号「９０ H」に固定されており（後述するよ
うに、チャンネル番号０の場合。 Hは１６進数を示
す）、２バイト目にノートナンバーＮを示すコードが、
３バイト目にベロシティーＶonを示すコードが、それぞ
れ配置される。ノートナンバーＮは、音階（一般の音楽
でいう全音７音階の音階ではなく、ここでは半音１２音
階の音階をさす）の番号を示す数値であり、このノート
ナンバーＮが定まると、たとえば、ピアノの特定の鍵盤
キーが指定されることになる（Ｃ−２の音階がノートナ
ンバーＮ＝０に対応づけられ、以下、Ｎ＝１２７までの
１２８通りの音階が対応づけられる。ピアノの鍵盤中央
のラの音（Ａ３音）は、ノートナンバーＮ＝６９にな
る）。ベロシティーＶonは、音の強さを示すパラメータ
であり（もともとは、ピアノの鍵盤などを弾く速度を意
味する）、Ｖon＝０〜１２７までの１２８段階の強さが
定義される。FIG. 14 shows the format of MIDI data in the SMF (Standard MIDI File) format, which is currently the most widely used. As illustrated, this MIDI data consists of “note-on” data or “note-off” data with “delta time” data interposed between them. “Delta time” data consists of 1 to 4 bytes and indicates a time interval. “Note-on” data consists of 3 bytes in total: the first byte is fixed to the note-on code “90 H” (for channel number 0, as described later; H denotes hexadecimal), the second byte holds a code indicating the note number N, and the third byte holds a code indicating the velocity Von. The note number N is a numerical value identifying a pitch on the scale (here, the chromatic scale of twelve semitones, not the seven-note diatonic scale of ordinary music); once the note number N is fixed, a specific key on, for example, a piano keyboard is designated (the pitch C-2 is assigned to note number N = 0, and 128 pitches up to N = 127 are assigned in order; the A at the center of the piano keyboard (A3) has note number N = 69). The velocity Von is a parameter indicating the strength of the sound (originally it meant the speed at which a piano key or the like is struck), and 128 levels of strength, Von = 0 to 127, are defined.

【００５９】同様に、「ノートオフ」データも、全部で
３バイトから構成されるデータであり、１バイト目は常
にノートオフ符号「８０ H」に固定されており（チャン
ネル番号０の場合）、２バイト目にノートナンバーＮを
示すコードが、３バイト目にベロシティーＶoff を示す
コードが、それぞれ配置される。「ノートオン」データ
と「ノートオフ」データとは対になって用いられ、この
一対のデータにより１つのノート（音符）についての発
音開始操作および発音終了操作が表現されることにな
る。たとえば、「９０ H，６９，８０」なる３バイトの
「ノートオン」データは、ノートナンバーＮ＝６９に対
応する鍵盤中央のラのキーを押し下げる操作（ラの音符
の発音開始操作）を表現し、以後、同じノートナンバー
Ｎ＝６９を指定した「ノートオフ」データが与えられる
まで、そのキーを押し下げた状態が維持される（実際に
は、ピアノなどのＭＩＤＩ音源波形を用いた場合、有限
の時間内に、ラの音の波形は減衰してしまう）。そし
て、ノートナンバーＮ＝６９を指定した「ノートオフ」
データは、たとえば、「８０ H，６９，５０」のような
３バイトのデータとして与えられ、このような「ノート
オフ」データは、鍵盤中央のラのキーから指を離す操作
（ラの音符の発音終了操作）を表現する。なお、「ノー
トオフ」データにおけるベロシティーＶoff の値は、た
とえばピアノの場合、鍵盤キーから指を離す速度を示す
パラメータになる。Similarly, “note-off” data also consists of 3 bytes in total: the first byte is always fixed to the note-off code “80 H” (for channel number 0), the second byte holds a code indicating the note number N, and the third byte holds a code indicating the velocity Voff. “Note-on” data and “note-off” data are used in pairs, and each pair expresses the sounding start operation and the sounding end operation for one note. For example, the 3-byte “note-on” data “90 H, 69, 80” expresses the operation of pressing the A key at the center of the keyboard, which corresponds to note number N = 69 (the sounding start operation for the note A), and the key is treated as held down until “note-off” data specifying the same note number N = 69 is given (in practice, when a MIDI sound source waveform such as a piano is used, the waveform of the A decays within a finite time). The “note-off” data specifying note number N = 69 is given as 3-byte data such as “80 H, 69, 50”, and expresses the operation of releasing the A key at the center of the keyboard (the sounding end operation for the note A). In the case of a piano, for example, the velocity Voff in the “note-off” data is a parameter indicating the speed at which the finger is released from the key.
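The 3-byte layout of “note-on” and “note-off” data can be made concrete with a small sketch. The function names `note_on`, `note_off`, and `_check` are hypothetical, and the `channel` argument merely anticipates the use of the lower 4 bits of the first byte.

```python
def _check(value, maximum, name):
    """Reject values that do not fit the field (4 bits or 7 bits)."""
    if not 0 <= value <= maximum:
        raise ValueError(f"{name} must be in 0..{maximum}, got {value}")
    return value

def note_on(note, velocity, channel=0):
    """Three bytes: 1001cccc, 0nnnnnnn, 0vvvvvvv."""
    return bytes([0x90 | _check(channel, 15, "channel"),
                  _check(note, 127, "note"),
                  _check(velocity, 127, "velocity")])

def note_off(note, velocity, channel=0):
    """Three bytes: 1000cccc, 0nnnnnnn, 0vvvvvvv."""
    return bytes([0x80 | _check(channel, 15, "channel"),
                  _check(note, 127, "note"),
                  _check(velocity, 127, "velocity")])

on = note_on(69, 80)     # press the A key at the center of the keyboard
off = note_off(69, 50)   # release the same key
```

Here `on` corresponds to the data “90 H, 69, 80” and `off` to “80 H, 69, 50” in the example above.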

【００６０】別言すれば、特定のノート（音符）に関す
る情報が、同一ノートナンバーＮを引用した「ノートオ
ン」データと「ノートオフ」データとのデータ対によっ
て表現されることになる。すなわち、特定のノート（音
符）に関して、「ノートオン」データにより発音開始操
作（たとえば、ピアノの鍵盤キーを押し下げる操作）が
記述され、「ノートオフ」データにより発音終了操作
（たとえば、鍵盤キーから指を離す操作）が記述され
る。また、この特定のノートの発音時間（発音開始操作
から発音終了操作に至るまでの時間：実際に楽器の音が
鳴り始めてから鳴り終わるまでの時間とは必ずしも一致
しない）は、「ノートオン」データと、これと対になる
「ノートオフ」データとの間に介在するデルタタイムに
よって定まる。In other words, information about a specific note is expressed by a pair consisting of “note-on” data and “note-off” data that cite the same note number N. That is, for a specific note, the “note-on” data describes the sounding start operation (for example, pressing a piano key) and the “note-off” data describes the sounding end operation (for example, releasing the key). The sounding time of this note (the time from the sounding start operation to the sounding end operation, which does not necessarily coincide with the time from when the instrument actually starts to sound until it stops) is determined by the delta times lying between the “note-on” data and its paired “note-off” data.

【００６１】図１５に、具体的なＭＩＤＩデータの構成
例を示す。図１５(a) に示す例は、ノートナンバーＮ１
で示されるノート（音符）の演奏情報を、ＭＩＤＩデー
タで記述したものである。データｄ１は、１〜４バイト
からなるデルタタイムＴ１を示すデータ（必要なバイト
数は、デルタタイムの長さによって異なる）である。こ
のデルタタイムのデジタル値は、たとえば、Ｔ１＝１／
７６８秒のようなスケーリングを予め定義しておくこと
により、時間を示す値となる。データｄ２は、ノートナ
ンバーＮ１で示されるノート（以下、単にノートＮ１と
いう）の発音開始操作を記述した「ノートオン」データ
であり、１バイト目にノートオン符号：９０H 、２バイ
ト目にノートナンバー：Ｎ１、３バイト目にベロシティ
ー：Ｖonの各コードが配置されている。たとえば、ピア
ノの場合、ノートＮ１に対応する鍵盤キーを、ベロシテ
ィーＶonで示される強さ（速度）で押し下げるという発
音開始操作を示すことになる。続くデータｄ３は、１〜
４バイトからなるデルタタイムＴ２を示すデータであ
り、やはり具体的な時間を示す値となる。最後のデータ
ｄ４は、ノートＮ１の発音終了操作を記述した「ノート
オフ」データであり、１バイト目にノートオフ符号：８
０H 、２バイト目にノートナンバー：Ｎ１、３バイト目
にベロシティー：Ｖoff の各コードが配置されている。
たとえば、ピアノの場合、ノートＮ１に対応する鍵盤キ
ーから、ベロシティーＶoff で示される強さ（速度）で
指を離すという発音終了操作を示すことになる。FIG. 15 shows specific examples of the structure of MIDI data. The example in FIG. 15(a) describes, as MIDI data, the performance information for the note indicated by note number N1. Data d1 is data indicating a delta time T1 and consists of 1 to 4 bytes (the number of bytes required depends on the length of the delta time). The digital value of this delta time becomes a value indicating time once a scaling such as T1 = 1/768 second has been defined in advance. Data d2 is “note-on” data describing the sounding start operation for the note indicated by note number N1 (hereinafter simply note N1): the first byte holds the note-on code 90H, the second byte the note number N1, and the third byte the velocity Von. In the case of a piano, for example, it indicates the sounding start operation of pressing the key corresponding to note N1 with the strength (speed) indicated by the velocity Von. The following data d3 is data indicating a delta time T2, consists of 1 to 4 bytes, and likewise represents a specific time. The last data d4 is “note-off” data describing the sounding end operation for note N1: the first byte holds the note-off code 80H, the second byte the note number N1, and the third byte the velocity Voff. In the case of a piano, for example, it indicates the sounding end operation of releasing the finger from the key corresponding to note N1 with the strength (speed) indicated by the velocity Voff.

【００６２】こうして、図１５(a) に示すデータｄ１〜
ｄ４によって、ノートＮ１に関する演奏情報が記述され
ることになる。このように、ＭＩＤＩデータでは、同一
のノートナンバーＮを引用した一対のデータ（「ノート
オン」データおよび「ノートオフ」データ）によって、
特定のノートに関する演奏情報が示される。また、「ノ
ートオン」データや「ノートオフ」データで示される発
音開始操作や発音終了操作を実行するタイミングは、先
行する「デルタタイム」データに基づいて定まる。たと
えば、このＭＩＤＩデータを再生する際の基準時刻をｔ
０とすれば、データｄ２で示される発音開始操作の時刻
は、これに先行する「デルタタイム」データｄ１に基づ
いて定まり、具体的には、時刻（ｔ０＋Ｔ１）の時点で
発音開始操作が実行される。同様に、データｄ４で示さ
れる発音終了操作の時刻は、これに先行する「デルタタ
イム」データｄ１，ｄ３に基づいて定まり、具体的に
は、時刻（ｔ０＋Ｔ１＋Ｔ２）の時点で発音終了操作が
実行される。したがって、この例の場合のノートＮ１の
発音時間は、デルタタイムＴ２に一致する。Thus, the data d1 to d4 shown in FIG. 15(a) describe the performance information for note N1. In MIDI data, the performance information for a specific note is expressed in this way by a pair of data (“note-on” data and “note-off” data) citing the same note number N. The timing at which the sounding start or end operation indicated by “note-on” or “note-off” data is executed is determined by the preceding “delta time” data. For example, if the reference time for reproducing this MIDI data is t0, the time of the sounding start operation indicated by data d2 is determined by the preceding “delta time” data d1; specifically, the sounding start operation is executed at time (t0 + T1). Similarly, the time of the sounding end operation indicated by data d4 is determined by the preceding “delta time” data d1 and d3; specifically, the sounding end operation is executed at time (t0 + T1 + T2). The sounding time of note N1 in this example therefore equals the delta time T2.

【００６３】図１５(b) に示す例は、ノートＮ１の演奏
時間とノートＮ２の演奏時間とが一部重なり、和音が発
生する例である。まず、最初のデータｄ１によって、デ
ルタタイムＴ１が示され、続くデータｄ２によって、ノ
ートＮ１についての発音開始操作が示される。次のデー
タｄ３によって、再びデルタタイムＴ２が示され、続く
データｄ４によって、ノートＮ２についての発音開始操
作が示される。すなわち、この時点では、２つのノート
Ｎ１，Ｎ２が同時に発音している状態になり、和音とし
ての再生が行われることになる。続くデータｄ５によっ
て、デルタタイムＴ３が示され、データｄ６によって、
ノートＮ２についての発音終了操作が示される。更に、
データｄ７によって、デルタタイムＴ４が示され、最後
のデータｄ８によって、ノートＮ１についての発音終了
操作が示される。The example in FIG. 15(b) is one in which the performance time of note N1 and that of note N2 partly overlap, producing a chord. First, the initial data d1 indicates a delta time T1, and the following data d2 indicates the sounding start operation for note N1. The next data d3 indicates a delta time T2, and the following data d4 indicates the sounding start operation for note N2. At this point the two notes N1 and N2 sound simultaneously and are reproduced as a chord. The following data d5 indicates a delta time T3, and data d6 indicates the sounding end operation for note N2. Finally, data d7 indicates a delta time T4, and the last data d8 indicates the sounding end operation for note N1.

【００６４】結局、この図１５(b) に示すデータｄ１〜
ｄ８のうち、一対のデータｄ２，ｄ８は、同一のノート
ナンバーＮ１を引用してノートＮ１に関する演奏情報を
記述したデータであり、一対のデータｄ４，ｄ６は、同
一のノートナンバーＮ２を引用してノートＮ２に関する
演奏情報を記述したデータということになる。ここで、
個々の操作を行うべき時刻は、やはり先行するデルタタ
イムに基づいて定まることになる。すなわち、このＭＩ
ＤＩデータを再生する際の基準時刻をｔ０とすれば、デ
ータｄ２で示されるノートＮ１の発音開始操作は時刻
（ｔ０＋Ｔ１）、データｄ４で示されるノートＮ２の発
音開始操作は時刻（ｔ０＋Ｔ１＋Ｔ２）、データｄ６で
示されるノートＮ２の発音終了操作は時刻（ｔ０＋Ｔ１
＋Ｔ２＋Ｔ３）、データｄ８で示されるノートＮ１の発
音終了操作は時刻（ｔ０＋Ｔ１＋Ｔ２＋Ｔ３＋Ｔ４）と
なり、ノートＮ１の発音時間は、Ｔ２＋Ｔ３＋Ｔ４とな
り、ノートＮ２の発音時間は、Ｔ３となる。In short, among the data d1 to d8 shown in FIG. 15(b), the pair d2, d8 cites the same note number N1 and describes the performance information for note N1, and the pair d4, d6 cites the same note number N2 and describes the performance information for note N2. Here too, the time at which each operation is to be performed is determined by the preceding delta times. That is, if the reference time for reproducing this MIDI data is t0, the sounding start operation for note N1 indicated by data d2 occurs at time (t0 + T1), the sounding start operation for note N2 indicated by data d4 at time (t0 + T1 + T2), the sounding end operation for note N2 indicated by data d6 at time (t0 + T1 + T2 + T3), and the sounding end operation for note N1 indicated by data d8 at time (t0 + T1 + T2 + T3 + T4). The sounding time of note N1 is therefore T2 + T3 + T4, and that of note N2 is T3.
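The way delta times accumulate into absolute times, and the way paired data determine sounding times, can be traced with the following sketch. The event representation (a tuple of delta time, operation, and note name) and the function name `note_durations` are assumptions chosen for readability, and the delta-time values are arbitrary.

```python
def note_durations(events, t0=0.0):
    """Return (note, start, end, sounding time) for each note-on/note-off pair.

    events is a list of (delta_time, kind, note), where kind is "on" or "off".
    Each operation occurs at t0 plus the sum of all delta times up to it.
    """
    now = t0
    started = {}   # note -> time of its sounding start operation
    result = []
    for delta, kind, note in events:
        now += delta
        if kind == "on":
            started[note] = now
        else:
            begin = started.pop(note)
            result.append((note, begin, now, now - begin))
    return result

# The chord of FIG. 15(b), with arbitrary values for T1 to T4.
T1, T2, T3, T4 = 1.0, 0.5, 0.25, 0.75
events = [(T1, "on", "N1"), (T2, "on", "N2"),
          (T3, "off", "N2"), (T4, "off", "N1")]
durations = note_durations(events)
```

The result lists note N2 with a sounding time equal to T3 and note N1 with a sounding time equal to T2 + T3 + T4, in agreement with the description above.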

【００６５】なお、上述の説明では、チャンネル番号が
０という前提の下で、ノートオン符号「９０ H」および
ノートオフ符号「８０ H」が固定であると述べたが、こ
れらの符号の下位４ビットは必ずしも０に固定されてい
るわけではなく、チャンネル番号０〜１５のいずれかを
特定するコードとして利用できる。このような複数のチ
ャンネルを利用すれば、各チャンネルごとにそれぞれ別
々の楽器の音色についてのオン・オフを指定することが
できる。In the above description, the note-on code “90 H” and the note-off code “80 H” were said to be fixed on the assumption that the channel number is 0, but the lower 4 bits of these codes are not necessarily fixed to 0 and can be used as a code specifying any of channel numbers 0 to 15. With multiple channels used in this way, the on/off state of a different instrument tone can be specified for each channel.

【００６６】図１６は、図１４に示す「ノートオン」デ
ータのビット構成を示す図である。図示のとおり、「ノ
ートオン」データは、ノートオン符号を示す第１バイト
ＢＹ１，ノートナンバーＮを示す第２バイトＢＹ２，ベ
ロシティーＶonを示す第３バイトＢＹ３という３バイト
のデータから構成されているが、細かなビット構成は次
のとおりである。まず、第１バイトＢＹ１では、実際の
ノートオン符号が上位４ビットの「１００１」で示さ
れ、この上位４ビットの値は固定であるが、図に「＊＊
＊＊」と示された下位４ビットはチャンネル番号０〜１
５のいずれかを特定するコードとして利用される。チャ
ンネル番号が０の場合は、下位４ビットが「００００」
になるため、第１バイトＢＹ１は上述したように「９０
H」になる。続く、第２バイトＢＹ２では、ＭＳＢは必
ず「０」に固定されており、図に「＊＊＊＊＊＊＊」と
示された下位７ビットによりノートナンバーＮ（０〜１
２７の１２８通り）が示される。同様に、第３バイトＢ
Ｙ３では、ＭＳＢは必ず「０」に固定されており、図に
「＊＊＊＊＊＊＊」と示された下位７ビットによりベロ
シティーＶon（０〜１２７の１２８通り）が示される。FIG. 16 shows the bit structure of the “note-on” data shown in FIG. 14. As illustrated, “note-on” data consists of 3 bytes: a first byte BY1 indicating the note-on code, a second byte BY2 indicating the note number N, and a third byte BY3 indicating the velocity Von. The detailed bit structure is as follows. In the first byte BY1, the actual note-on code is given by the upper 4 bits “1001”, whose value is fixed, while the lower 4 bits, shown as “****” in the figure, are used as a code specifying one of channel numbers 0 to 15. When the channel number is 0, the lower 4 bits are “0000”, so the first byte BY1 is “90 H” as stated above. In the second byte BY2, the MSB is always fixed to “0”, and the lower 7 bits, shown as “*******” in the figure, indicate the note number N (128 values, 0 to 127). Similarly, in the third byte BY3, the MSB is always fixed to “0”, and the lower 7 bits, shown as “*******” in the figure, indicate the velocity Von (128 values, 0 to 127).

【００６７】一方、図１７は、図１４に示す「ノートオ
フ」データのビット構成を示す図である。図示のとお
り、「ノートオフ」データは、ノートオフ符号を示す第
１バイトＢＹ１，ノートナンバーＮを示す第２バイトＢ
Ｙ２，ベロシティーＶoff を示す第３バイトＢＹ３とい
う３バイトのデータから構成されており、細かなビット
構成は次のとおりである。まず、第１バイトＢＹ１で
は、実際のノートオフ符号が上位４ビットの「１００
０」で示され、この上位４ビットの値は固定であるが、
図に「＊＊＊＊」と示された下位４ビットはチャンネル
番号０〜１５のいずれかを特定するコードとして利用さ
れる。チャンネル番号が０の場合は、下位４ビットが
「００００」になるため、第１バイトＢＹ１は上述した
ように「８０ H」になる。続く、第２バイトＢＹ２で
は、ＭＳＢは必ず「０」に固定されており、図に「＊＊
＊＊＊＊＊」と示された下位７ビットによりノートナン
バーＮ（０〜１２７の１２８通り）が示される。同様
に、第３バイトＢＹ３では、ＭＳＢは必ず「０」に固定
されており、図に「＊＊＊＊＊＊＊」と示された下位７
ビットによりベロシティーＶoff （０〜１２７の１２８
通り）が示される。FIG. 17 shows the bit structure of the “note-off” data shown in FIG. 14. As illustrated, “note-off” data consists of 3 bytes: a first byte BY1 indicating the note-off code, a second byte BY2 indicating the note number N, and a third byte BY3 indicating the velocity Voff. The detailed bit structure is as follows. In the first byte BY1, the actual note-off code is given by the upper 4 bits “1000”, whose value is fixed, while the lower 4 bits, shown as “****” in the figure, are used as a code specifying one of channel numbers 0 to 15. When the channel number is 0, the lower 4 bits are “0000”, so the first byte BY1 is “80 H” as stated above. In the second byte BY2, the MSB is always fixed to “0”, and the lower 7 bits, shown as “*******” in the figure, indicate the note number N (128 values, 0 to 127). Similarly, in the third byte BY3, the MSB is always fixed to “0”, and the lower 7 bits, shown as “*******” in the figure, indicate the velocity Voff (128 values, 0 to 127).

【００６８】このように、複数のチャンネルを利用する
場合には、「ノートオン」データも、「ノートオフ」デ
ータも、第１バイトＢＹ１の下位４ビットにチャンネル
番号が格納されることになる。この場合、同一のノート
ナンバーを引用している「ノートオン」データと「ノー
トオフ」データとが存在しても、それだけでは、これら
のデータが同一のノートを表現する一対のデータ（当該
ノートの発音開始操作と発音終了操作とを表現するデー
タ）ということにはならない。たとえば、「９０Ａ８
６Ｃ H」なる３バイトからなる「ノートオン」データ
に後続して、「８１Ａ８６Ｃ H」なる３バイトから
なる「ノートオフ」データが存在していた場合、両デー
タはいずれも同一のノートナンバー「Ａ８」を引用して
いるが、前者はチャンネル番号０に関する「ノートオ
ン」データであり、後者はチャンネル番号１に関する
「ノートオフ」データであるため、それぞれ異なるノー
トを表現するためのデータということになる。When multiple channels are used in this way, both “note-on” data and “note-off” data carry the channel number in the lower 4 bits of the first byte BY1. In this case, the mere existence of “note-on” data and “note-off” data citing the same note number does not make them a pair of data expressing the same note (data expressing the sounding start operation and sounding end operation of that note). For example, if the 3-byte “note-on” data “90 A8 6C H” is followed by the 3-byte “note-off” data “81 A8 6C H”, both cite the same note number “A8”, but the former is “note-on” data for channel number 0 and the latter is “note-off” data for channel number 1, so they are data expressing different notes.

【００６９】結局、複数のチャンネルを利用する場合に
は、同一のノートナンバーを引用しており、かつ、同一
のチャンネル番号を引用している「ノートオン」データ
と「ノートオフ」データとが存在する場合にのみ、両デ
ータは同一のノートを表現する一対のデータと認識され
ることになる。たとえば、「９３Ａ８６Ｃ H」なる
３バイトからなる「ノートオン」データに後続して、
「８３Ａ８６Ｃ H」なる３バイトからなる「ノート
オフ」データが存在していた場合であれば、両データは
いずれも同一のノートナンバー「Ａ８」を引用し、かつ
同一のチャンネル番号３を引用しているので、両データ
は同一のノートを表現する一対のデータとして認識され
ることになる。In short, when multiple channels are used, “note-on” data and “note-off” data are recognized as a pair expressing the same note only when they cite the same note number and also the same channel number. For example, if the 3-byte “note-on” data “93 A8 6C H” is followed by the 3-byte “note-off” data “83 A8 6C H”, both cite the same note number “A8” and the same channel number 3, so they are recognized as a pair of data expressing the same note.
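The pairing rule can be expressed as a short check on the first two bytes of each message. The function names `split_status` and `is_pair` are hypothetical; the byte strings are taken as written in the examples above, and the sketch deliberately does not validate the second and third bytes.

```python
def split_status(status):
    """Split the first byte into its upper 4 bits (code) and lower 4 bits (channel)."""
    return status >> 4, status & 0x0F

def is_pair(on_msg, off_msg):
    """True if the two 3-byte messages describe the start and end of one note."""
    on_code, on_channel = split_status(on_msg[0])
    off_code, off_channel = split_status(off_msg[0])
    return (on_code == 0x9 and off_code == 0x8      # note-on, then note-off
            and on_channel == off_channel           # same lower 4 bits of BY1
            and on_msg[1] == off_msg[1])            # same second byte BY2

different_channels = is_pair(bytes.fromhex("90A86C"), bytes.fromhex("81A86C"))
same_channel = is_pair(bytes.fromhex("93A86C"), bytes.fromhex("83A86C"))
```

The first call compares channel 0 with channel 1 and reports no pair; the second compares two messages on channel 3 and reports a pair.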

【００７０】本願発明者は、本来、位相の情報をもたな
いＭＩＤＩデータに、位相の情報を付加するために、チ
ャンネルを示すデータを利用することができることに気
が付いた。ＳＭＦフォーマットによるＭＩＤＩデータの
「ノートオン」データおよび「ノートオフ」データのビ
ット構成は、既に述べたように、図１６および図１７に
示すようなものになる。ここで、第１バイトＢＹ１の下
位４ビットは、チャンネル番号を示す情報として利用さ
れているが、これを位相を示す情報として利用すること
も可能である。たとえば、この４ビットで示される０〜
１２７の範囲のデジタル値を、０〜＋２πの範囲の位相
角に割り当てるようにすれば、位相角θ自身を表現する
ことができる。この場合、同一のノートを示す「ノート
オン」データおよび「ノートオフ」データでは、位相角
θの値として同一のデジタル値を設定するようにする。
これは、既に述べたように、第１バイトＢＹ１の下位４
ビットおよび第２バイトＢＹ２の全８ビットが完全に一
致するような「ノートオン」データと「ノートオフ」デ
ータとの対が、同一のノートの発音開始操作および発音
終了操作を示す１組のデータとして認識されるためであ
る。The present inventor realized that the data indicating the channel can be used to add phase information to MIDI data, which inherently carries no phase information. As already described, the bit structures of “note-on” data and “note-off” data in MIDI data in the SMF format are as shown in FIGS. 16 and 17. The lower 4 bits of the first byte BY1 are used as information indicating the channel number, but they can also be used as information indicating the phase. For example, if the digital values in the range 0 to 15 expressed by these 4 bits are assigned to phase angles in the range 0 to +2π, the phase angle θ itself can be expressed. In this case, the “note-on” data and “note-off” data representing the same note are given the same digital value for the phase angle θ. This is because, as already described, a “note-on”/“note-off” pair is recognized as one set of data indicating the sounding start and end operations of the same note only when the lower 4 bits of the first byte BY1 and all 8 bits of the second byte BY2 match exactly.

【００７１】たとえば、「９４Ａ８６Ｃ H」なる３
バイトからなる「ノートオン」データに後続して、「８
４Ａ８６Ｃ H」なる３バイトからなる「ノートオ
フ」データが存在していた場合、両データの第１バイト
ＢＹ１の下位４ビットはいずれも「４ H」であり、第２
バイトＢＹ２はいずれも「Ａ８」であるから、両データ
は同一のノートを示す１組のデータとして認識されるこ
とになる。このノートは、所定のＭＩＤＩ音源のノート
ナンバー「Ａ８」に対応する音響波形によって再生され
ることになるが、第１バイトＢＹ１の下位４ビット（再
生時の位相を指示する情報）はいずれも「４ H」である
から、この音響波形は、たとえば、＋π／２だけ（「４
H」に対応した位相角だけ）位相をずらした状態で再生
されることになる。For example, if the 3-byte “note-on” data “94 A8 6C H” is followed by the 3-byte “note-off” data “84 A8 6C H”, the lower 4 bits of the first byte BY1 of both are “4 H” and the second byte BY2 of both is “A8”, so the two are recognized as one set of data representing the same note. This note is reproduced with the acoustic waveform corresponding to note number “A8” of a given MIDI sound source, and since the lower 4 bits of the first byte BY1 (the information specifying the phase at reproduction) are “4 H” in both, the waveform is reproduced with its phase shifted by, for example, +π/2 (the phase angle corresponding to “4 H”).
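One possible mapping between a phase angle and the lower 4 bits of the first byte is sketched below. Four bits can hold only the values 0 to 15, so the sketch assumes a uniform 16-step quantization of the range 0 to 2π, which agrees with the example in which “4 H” corresponds to +π/2. The function names and the rounding rule are assumptions, not part of the described method.

```python
import math

def phase_to_nibble(theta):
    """Quantize a phase angle in radians to one of 16 steps (0..15)."""
    fraction = (theta % (2 * math.pi)) / (2 * math.pi)   # position within one turn
    return round(fraction * 16) % 16                     # wrap 16 back to 0

def nibble_to_phase(nibble):
    """Recover the quantized phase angle represented by the lower 4 bits."""
    return (nibble & 0x0F) * 2 * math.pi / 16

def note_on_with_phase(note, velocity, theta):
    """Note-on data whose lower 4 bits of BY1 carry the phase instead of a channel."""
    return bytes([0x90 | phase_to_nibble(theta), note, velocity])

def note_off_with_phase(note, velocity, theta):
    """Matching note-off data; it must carry the same 4-bit phase value."""
    return bytes([0x80 | phase_to_nibble(theta), note, velocity])

on = note_on_with_phase(69, 80, math.pi / 2)
off = note_off_with_phase(69, 50, math.pi / 2)
```

A phase angle of π/2 yields first bytes 94 H and 84 H, so the two messages remain recognizable as one pair. The quantization step of this assumed mapping is 2π/16, which is coarser than the π/10 tolerance mentioned for code integration.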

【００７２】§５．発音開始時刻の修正による対処本発明により符号化された符号データは、それぞれ位相
に関する情報を有しているので、これを再生する場合、
この位相を考慮した再生が行われないと意味がない。こ
のような位相を考慮した再生は、再生用音響波形（たと
えば、ＭＩＤＩ音源の音響波形）を所定の位相角だけず
らして提示すればよい。より具体的には、再生用音響波
形の再生開始時点を当該位相角に相当する時間だけ早め
るような処理を行うと簡単である。たとえば、あるＭＩ
ＤＩ音符について、本来の発音開始時刻（ノートオン時
刻）がｔｓであった場合、位相を考慮しなければ、この
時刻ｔｓから再生音が鳴り始まることになるが、位相角
θを考慮した再生を行うのであれば、位相角θに相当す
る時間δだけ早め、時刻（ｔｓ−δ）から再生音を鳴ら
すような修正を行えばよい。 §5. Handling by Correcting the Sounding Start Time
Each piece of code data encoded according to the present invention carries phase information, so reproducing it is meaningless unless the reproduction takes this phase into account. Such phase-aware reproduction can be achieved by presenting the reproduction waveform (for example, the acoustic waveform of a MIDI sound source) shifted by the given phase angle. More specifically, a simple way is to advance the reproduction start point of the waveform by the time corresponding to that phase angle. For example, if the original sounding start time (note-on time) of a certain MIDI note is ts, then without regard to phase the reproduced sound starts at time ts; for reproduction that takes the phase angle θ into account, the start is advanced by the time δ corresponding to θ, so that the reproduced sound starts at time (ts − δ).

【００７３】このように、位相角θを示すデータに基づ
いて、再生音の発音開始タイミングを修正する処理を、
再生時に行うことも可能であるが、符号化の段階で、こ
のタイミング修正を先に行っておくことも可能である。
たとえば、図１８上段に示すように、所定の代表周波数
ｆ（ｎ）および位相角θ（ｎ）を有する符号データＮｉ
が、本発明に係る符号化方法により生成されたとしよ
う。この符号データＮｉは、発音開始時刻ｔｓから発音
終了時刻ｔｅに至るまでの期間Δに渡っての演奏を指示
するデータということになる。もちろん、このような符
号データＮｉと、位相角θ（ｎ）とを、再生装置に対し
て与えれば、上述したように、再生時に発音開始時刻ｔ
ｓを位相角θに相当する時間δだけ早めるような修正を
行うことは可能であるが、符号化の段階で、予め発音開
始時刻ｔｓを早める修正を行っておくと便利である。The processing that corrects the sounding start timing of the reproduced sound on the basis of the data indicating the phase angle θ can be performed at reproduction time in this way, but the timing correction can also be made in advance, at the encoding stage. For example, suppose that code data Ni having a representative frequency f(n) and a phase angle θ(n) has been generated by the encoding method according to the present invention, as shown in the upper part of FIG. 18. This code data Ni instructs a performance over the period Δ from the sounding start time ts to the sounding end time te. Of course, if such code data Ni and the phase angle θ(n) are supplied to the reproduction device, the correction that advances the sounding start time ts by the time δ corresponding to the phase angle θ can be made at reproduction time as described above, but it is convenient to make the correction that advances the sounding start time ts beforehand, at the encoding stage.

【００７４】具体的には、図１８上段に示す符号データ
Ｎｉを、図１８下段に示す符号データＮｉ^＊に修正する
処理を行うのである。符号データＮｉ^＊は、発音開始時
刻ｔｓ^＊から発音終了時刻ｔｅ^＊に至るまでの期間Δ^＊
に渡っての演奏を指示するデータである。ここで、ｔｅ
^＊＝ｔｅであるが、ｔｓ^＊＝ｔｓ−δとなり発音開始時
刻がδだけ早まっており、Δ^＊＝Δ＋δとなり、図にハ
ッチングを施した時間δだけ演奏期間が伸びている。こ
こで、時間δは、位相角θ（ｎ）に対応する時間であ
り、δ＝θ（ｎ）／２πｆ（ｎ）で与えられる。なお、
演奏期間については、何ら修正を加えずに、Δ^＊＝Δと
しても大きな支障はない（演奏終了時刻が本来よりδだ
け早くなってしまうが、通常、再生音の末尾部分はあま
り重要ではないため、実用上は大きな支障はない）。Specifically, the code data Ni shown in the upper part of FIG. 18 is corrected to the code data Ni* shown in the lower part of FIG. 18. The code data Ni* instructs a performance over the period Δ* from the sounding start time ts* to the sounding end time te*. Here te* = te, whereas ts* = ts − δ, so the sounding start time is advanced by δ and Δ* = Δ + δ; the performance period is extended by the hatched time δ in the figure. The time δ is the time corresponding to the phase angle θ(n) and is given by δ = θ(n) / (2πf(n)). Alternatively, the performance period may be left unmodified, with Δ* = Δ, without serious trouble (the performance then ends δ earlier than it should, but since the tail of a reproduced sound is usually not very important, this causes no great problem in practice).
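The correction of the sounding start time follows directly from δ = θ(n) / (2πf(n)) and can be sketched as follows. The function name `advance_onset` and the `keep_end` switch, which selects between Δ* = Δ + δ and Δ* = Δ, are hypothetical.

```python
import math

def advance_onset(ts, te, theta, f, keep_end=True):
    """Fold a phase angle into the sounding start time.

    ts, te: original sounding start and end times in seconds
    theta:  phase angle in radians; f: representative frequency in Hz
    keep_end=True  gives te* = te, so the period grows to Δ + δ
    keep_end=False gives te* = te − δ, so the period stays Δ
    """
    delta = theta / (2.0 * math.pi * f)   # time corresponding to the phase angle
    ts_new = ts - delta
    te_new = te if keep_end else te - delta
    return ts_new, te_new, delta

# A 440 Hz note from 1.0 s to 1.5 s with a phase angle of π/2.
ts_star, te_star, delta = advance_onset(1.0, 1.5, math.pi / 2, 440.0)
```

For this example δ is a quarter of one period of 440 Hz, that is, 1/1760 second, so the sounding start time moves from 1.0 s to about 0.99943 s while the end time stays at 1.5 s.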

【００７５】図１９(a) に示す符号データ（これは、図
１３(c) に示す統合処理後の符号データである）に対し
て、上述した発音開始時刻の修正を行った一例を図１９
(b)に示す。図にハッチングを施した部分が、上述の時
間δに相当する部分であり、それぞれ位相角に相当する
時間だけ発音開始時刻が早まっている。図１９(c) は、
この図１９(b) に示す符号データをＭＩＤＩ符号で示す
場合のノートオンおよびノートオフの符号列を示す図で
ある。図１３(d) に示す符号列と比べると、ノートオン
の時刻が修正されていることになる。FIG. 19(b) shows an example in which the sounding start times of the code data shown in FIG. 19(a) (the code data after the integration processing shown in FIG. 13(c)) have been corrected as described above. The hatched portions in the figure correspond to the time δ described above, and each sounding start time has been advanced by the time corresponding to its phase angle. FIG. 19(c) shows the sequence of note-on and note-off codes obtained when the code data shown in FIG. 19(b) is expressed as MIDI codes. Compared with the code sequence shown in FIG. 13(d), the note-on times have been corrected.

【００７６】このように、位相角に相当する時間だけ発
音開始時刻（単位区間の始端位置）を修正する処理を行
えば、位相に関する情報が発音開始時刻に内包された状
態での符号化が行われることになる。こうして、位相に
関する情報を内包させるような符号化を行うと、位相に
関する情報は外見上は出てこなくなるため、§４で述べ
たような手法（チャンネル番号として位相角を示す手
法）を採ることなしに、従来の一般的なＭＩＤＩデータ
としての取り扱いが可能になる。As described above, correcting the sound generation start time (the start position of the unit section) by the time corresponding to the phase angle yields an encoding in which the phase information is embedded in the sound generation start time. Because phase information encoded in this way no longer appears explicitly, the data can be handled as conventional general MIDI data without resorting to the technique described in §4 (expressing the phase angle as a channel number).

【００７７】§６．ＭＩＤＩ音源を用いた再生時の留意最後に、ＭＩＤＩ音源を用いて再生を行う場合に有用な
手法をいくつか述べておく。前述の§５では、符号デー
タの発音開始時刻（ノートオン時刻）を早める修正を行
うことにより位相情報を内包させる手法を述べたが、必
要に応じて、発音終了時刻（ノートオフ時刻）を修正す
ることも可能である。この発音終了時刻の修正を行う
と、後述するように、リリースタイムによる悪影響を低
減させるというメリットが得られる。§6. Notes on Playback Using a MIDI Sound Source: Finally, several techniques that are useful when playing back with a MIDI sound source are described. §5 above described a technique for embedding phase information by advancing the sound generation start time (note-on time) of the code data; if necessary, the sound generation end time (note-off time) can also be corrected. Correcting the sound generation end time has the advantage of reducing the adverse effects of the release time, as described later.

【００７８】一般的なＭＩＤＩ音源を用いて再生を行う
場合、図２０(a) に示すようなＭＩＤＩイベントデータ
（ノートオンデータとノートオフデータ）に同期して、
音源の演奏開始および終了が制御されることになるた
め、音源の出力レベルは図２０(b) に示すようなものに
なる。ここで、時間軸上におけるノートオン後の出力レ
ベルの立上がり部分はアタックタイムと呼ばれ、ノート
オフから出力レベルが０になるまでの部分はリリースタ
イムと呼ばれている。ピアノなどの音源では、ノートオ
フ時にダンパーが作用するためリリースタイムはほとん
どないが、管楽器などの音源では、演奏者が息を吹き込
む操作を停止した後も、管内に音が残るため、ある程度
のリリースタイムが現れる。このようなリリースタイム
が存在すると、シーケンサソフトによっては、後続する
符号データのノートオン処理が遅延したり、不必要なエ
コーが発生したり、種々の悪影響が生じる場合がある。When playback is performed using a general MIDI sound source, the start and end of the sound source's performance are controlled in synchronization with MIDI event data (note-on data and note-off data) such as that shown in FIG. 20(a), so the output level of the sound source becomes as shown in FIG. 20(b). On the time axis, the rising portion of the output level after note-on is called the attack time, and the portion from note-off until the output level reaches 0 is called the release time. With a sound source such as a piano, there is almost no release time because the damper acts at note-off; with a sound source such as a wind instrument, however, sound remains in the tube even after the player stops blowing, so a certain amount of release time appears. When such a release time exists, depending on the sequencer software, the note-on processing of subsequent code data may be delayed, unnecessary echoes may occur, or various other adverse effects may arise.

【００７９】そこで、符号化の段階において、予めリリ
ースタイムを考慮して、発音終了時刻（ノートオフ時
刻）を早めるような修正を行っておくことにより、この
ような悪影響を抑えることが可能になる。楽器音を再生
する場合、リリースタイムが存在する方が自然である
が、人間の声などを再生する場合は、リリースタイムが
ない方が好ましい。そこで、原音響信号が人間の声のよ
うな場合には、上述したように発音終了時刻を早める修
正を行い、リリースタイムが発生しないような処理をす
るのが好ましい。Therefore, these adverse effects can be suppressed by making a correction at the encoding stage that advances the sound generation end time (note-off time) in anticipation of the release time. When reproducing instrument sounds, it is more natural for a release time to be present, but when reproducing a human voice or the like, it is preferable that there be none. Accordingly, when the original acoustic signal is something like a human voice, it is preferable to advance the sound generation end time as described above so that no release time arises.
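
The end-time correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and the `release_time` parameter (an estimate of the sound source's release time, in the same time unit as the note times) are hypothetical, and clamping the result so that the note-off never precedes the note-on is a safeguard added here, not something stated in the text.

```python
def advance_note_off(ts, te, release_time):
    """Move the note-off time te earlier by release_time.

    The result is clamped to ts so that a very short note does not
    end before it starts (an assumption of this sketch).
    """
    return max(ts, te - release_time)
```

With this correction the sound source begins its release before the nominal end of the unit section, so the decaying tail overlaps less with the following note-on.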

【００８０】また、一般的なＭＩＤＩ音源では、出力振
幅が周波数ごとに変動する場合が少なくない。図２１
は、現在市販されている具体的なＭＩＤＩ音源について
の音源固有の出力振幅比の測定値を示す表である。ここ
では、ノートナンバー２１〜１０４を６つのグループに
分け、各グループごとに、音源固有の出力振幅を測定し
た結果が示されている。この表の「音源固有の出力振幅
比」の欄には、第１グループ（ノートナンバー２１〜５
１）の出力振幅の平均値を基準値１．０にとったとき
の、各グループの出力振幅の平均値の値が比として示さ
れている。たとえば、第１グループに属するノートナン
バー２１を演奏した場合に比べ、第３グループに属する
ノートナンバー５９を演奏した場合は、ＭＩＤＩデータ
のベロシティーの値が同一であったとしても、４倍もの
出力振幅が得られることになる。In a general MIDI sound source, the output amplitude often varies with frequency. FIG. 21 is a table showing measured values of the source-specific output amplitude ratio for a particular MIDI sound source currently on the market. Here, note numbers 21 to 104 are divided into six groups, and the source-specific output amplitude measured for each group is shown. The column "output amplitude ratio specific to sound source" gives the average output amplitude of each group as a ratio, taking the average output amplitude of the first group (note numbers 21 to 51) as the reference value 1.0. For example, compared with playing note number 21, which belongs to the first group, playing note number 59, which belongs to the third group, yields four times the output amplitude even when the velocity value in the MIDI data is the same.

【００８１】人間の声などを符号化し、これを再生する
際、このように出力振幅が周波数ごとに変動しているＭ
ＩＤＩ音源を用いると、原音に忠実な再生音が得られな
くなる。そこで、実用上は、実際に利用するＭＩＤＩ音
源についての出力振幅特性を測定し、この出力振幅特性
に基づいて、ＭＩＤＩデータのベロシティーを補正する
処理を行うのが好ましい。図示されている表の「ベロシ
ティ補正量」の欄に記載された数値は、出力振幅比に基
づいて設定したベロシティの補正量を示している。たと
えば、第３グループについての補正量は０％であるのに
対し、第１グループについての補正量は＋１００％とな
っている。第１グループのＭＩＤＩデータに対して、＋
１００％の補正を行うと、補正後のベロシティ値はもと
の２００％となるが、振幅はベロシティの２乗に比例す
るので、振幅値としては４００％となる補正が行われる
ことになる。結局、第１グループについての「音源固有
の出力振幅比」の１．０に対して、４００％になるよう
な補正が行われ、補正後の振幅比は４．０になり、第３
グループの振幅比と等しくなる。図２１の「補正後振幅
比」の欄には、このような補正が行われた後の振幅比が
記載されており、いずれのグループもほぼ４．０に近い
値となっている。When a human voice or the like is encoded and then reproduced, using a MIDI sound source whose output amplitude varies with frequency in this way prevents a reproduced sound faithful to the original from being obtained. In practice, therefore, it is preferable to measure the output amplitude characteristics of the MIDI sound source actually used and to correct the velocity of the MIDI data based on those characteristics. The values in the "velocity correction amount" column of the table are the velocity correction amounts set based on the output amplitude ratio. For example, the correction amount for the third group is 0%, whereas the correction amount for the first group is +100%. Applying a +100% correction to the MIDI data of the first group makes the corrected velocity 200% of the original value; since the amplitude is proportional to the square of the velocity, this corresponds to correcting the amplitude to 400%. As a result, the "output amplitude ratio specific to sound source" of 1.0 for the first group is corrected to 400%, giving a corrected amplitude ratio of 4.0, equal to the amplitude ratio of the third group. The "amplitude ratio after correction" column in FIG. 21 lists the amplitude ratios after this correction, and every group has a value close to 4.0.
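
The arithmetic behind the correction amount can be sketched as follows, assuming only the relationship stated above (amplitude proportional to the square of the velocity). The function names are hypothetical, and clamping the result to the range 1 to 127 (the valid range of a MIDI note-on velocity) is an assumption of this sketch rather than part of the table in FIG. 21.

```python
import math

def velocity_scale(native_ratio, target_ratio):
    """Velocity multiplier that brings a group's amplitude ratio to target_ratio,
    assuming amplitude is proportional to velocity squared."""
    return math.sqrt(target_ratio / native_ratio)

def correction_percent(native_ratio, target_ratio):
    """Correction amount in percent, e.g. +100.0 means the velocity is doubled."""
    return (velocity_scale(native_ratio, target_ratio) - 1.0) * 100.0

def correct_velocity(velocity, native_ratio, target_ratio):
    """Apply the correction and clamp to the MIDI velocity range 1..127."""
    scaled = round(velocity * velocity_scale(native_ratio, target_ratio))
    return max(1, min(127, scaled))
```

For the first group (ratio 1.0) and a target ratio of 4.0, the multiplier is 2, i.e. a +100% correction; for the third group (ratio 4.0) it is 1, i.e. 0%. Note that doubling saturates for original velocities above 63, so in practice the target ratio may need to be chosen with the available headroom in mind.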

【００８２】[0082]

【発明の効果】以上のとおり本発明に係る音響信号の符
号化方法によれば、位相を考慮した符号化が行なわれる
ため、原音響波形を高い品質をもって符号化することが
できるようになる。As described above, the acoustic signal encoding method according to the present invention performs encoding that takes phase into account, so that the original acoustic waveform can be encoded with high quality.

【図面の簡単な説明】[Brief description of the drawings]

【図１】先願発明に係る音響信号の符号化方法の基本原
理を示す図である。FIG. 1 is a diagram showing the basic principle of an audio signal encoding method according to the invention of the prior application.

【図２】図１(c) に示す強度グラフに基づいて作成され
た符号コードを示す図である。FIG. 2 is a diagram showing a code generated based on the intensity graph shown in FIG. 1 (c).

【図３】単位区間ｄ内の区間信号Ｘｊと周期関数Ｇｊと
の差分を求める原理図である。FIG. 3 is a principle diagram for calculating a difference between an interval signal Xj in a unit interval d and a periodic function Gj.

【図４】時間軸上に部分的に重複するように単位区間設
定を行うことにより作成された符号コードを示す図であ
る。FIG. 4 is a diagram showing codes created by setting unit sections so that they partially overlap on the time axis.

【図５】時間軸上に部分的に重複するような単位区間設
定の具体例を示す図である。FIG. 5 is a diagram showing a specific example of unit section setting that partially overlaps on a time axis.

【図６】符号化の対象となる区間信号ｘとサンプル番号
との関係を示す図である。FIG. 6 is a diagram illustrating a relationship between a section signal x to be encoded and a sample number.

【図７】符号化のために用意された１２８対の周期信号
の一例を示す図である。FIG. 7 is a diagram showing an example of 128 pairs of periodic signals prepared for encoding.

【図８】図６に示す区間信号ｘと図７に示す周期信号と
の相関を求める演算を示す図である。FIG. 8 is a diagram showing an operation for calculating the correlation between the section signal x shown in FIG. 6 and the periodic signals shown in FIG. 7.

【図９】図８に示す相関演算に用いられる演算式（三角
関数を用いた式）を示す図である。FIG. 9 is a diagram showing an arithmetic expression (an expression using trigonometric functions) used in the correlation operation shown in FIG. 8.

【図１０】図８に示す相関演算に用いられる演算式（一
般の周期関数を用いた式）を示す図である。FIG. 10 is a diagram showing an arithmetic expression (an expression using a general periodic function) used in the correlation operation shown in FIG. 8;

【図１１】相関演算の結果から、位相角を求める演算式
を示す図である。FIG. 11 is a diagram showing an arithmetic expression for obtaining a phase angle from a result of a correlation operation.

【図１２】図８に示す相関演算を、各単位区間について
共通の周期関数を用いて行うための演算式を示す図であ
る。FIG. 12 is a diagram showing an arithmetic expression for performing the correlation operation shown in FIG. 8 using a periodic function common to all unit sections.

【図１３】符号統合処理の具体的な手順を示す図であ
る。FIG. 13 is a diagram showing a specific procedure of a code integration process.

【図１４】現在、最も標準的に利用されているＳＭＦ
（Standard MIDI File）フォーマットによるＭＩＤＩデ
ータの形式を示す図である。FIG. 14 is a diagram showing the format of MIDI data in the SMF (Standard MIDI File) format, which is currently the most commonly used standard.

【図１５】具体的なＭＩＤＩデータの構成例を示す図で
ある。FIG. 15 is a diagram showing a specific configuration example of MIDI data.

【図１６】図１４に示すＭＩＤＩデータにおける「ノー
トオン」データのビット構成を示す図である。FIG. 16 is a diagram showing the bit configuration of "note-on" data in the MIDI data shown in FIG. 14.

【図１７】図１４に示すＭＩＤＩデータにおける「ノー
トオフ」データのビット構成を示す図である。FIG. 17 is a diagram showing a bit configuration of “note-off” data in the MIDI data shown in FIG. 14;

【図１８】発音開始時刻を修正することにより、位相に
関する情報を内包させる原理を示す図である。FIG. 18 is a diagram illustrating a principle of including information on a phase by correcting a sound generation start time.

【図１９】発音開始時刻を修正することにより、位相に
関する情報を内包させた具体例を示す図である。FIG. 19 is a diagram showing a specific example in which information on a phase is included by correcting a sound generation start time.

【図２０】ＭＩＤＩ音源再生時におけるリリースタイム
の影響を抑制する手法を示す図である。FIG. 20 is a diagram illustrating a technique for suppressing the influence of a release time during playback of a MIDI sound source.

【図２１】ＭＩＤＩ音源再生時における音源固有の出力
振幅の変動を調整する手法を示す図である。FIG. 21 is a diagram illustrating a method of adjusting a variation in output amplitude specific to a sound source during reproduction of a MIDI sound source.

【符号の説明】[Explanation of symbols]

Ａ…複素強度Ａ（ｎ），Ｂ（ｎ）…相関値ｄ，ｄ１〜ｄ８…単位区間／ＭＩＤＩデータＥ，Ｅ（ｎ）…実効強度（二乗和平方根値）ｅ…振幅強度Ｆ…サンプリング周波数ｆ，ｆ（ｎ）…周波数Ｇｊ…周期信号Ｋ…単位区間に先行する累積サンプル番号ｋ…１単位区間内のサンプル番号Ｌ…単位区間の区間長 ΔＬ…オフセット長Ｎ１〜Ｎ４，Ｎｉ，Ｎｉ^＊…符号データｎ，ｎ１，ｎ２，ｎ３…ノートナンバーＱ…係数Ｒｎ…周期信号Ｔ１〜Ｔ３…トラックＸｊ，Ｘｊ＋１…区間信号ｘ…区間信号ｗ…単位区間内のサンプル数 θ…位相角 Δ，Δ^＊…演奏時間 δ…位相角θに対応する時間
A: complex intensity
A(n), B(n): correlation values
d, d1 to d8: unit section / MIDI data
E, E(n): effective intensity (root-sum-square value)
e: amplitude intensity
F: sampling frequency
f, f(n): frequency
Gj: periodic signal
K: cumulative sample number preceding the unit section
k: sample number within one unit section
L: section length of the unit section
ΔL: offset length
N1 to N4, Ni, Ni^*: code data
n, n1, n2, n3: note number
Q: coefficient
Rn: periodic signal
T1 to T3: track
Xj, Xj+1: section signal
x: section signal
w: number of samples in a unit section
θ: phase angle
Δ, Δ^*: performance time
δ: time corresponding to phase angle θ

Claims

【特許請求の範囲】[Claims]

【請求項１】時系列の強度信号として与えられる音響
信号を符号化するための符号化方法であって、符号化対象となる音響信号を、デジタルの音響データと
して取り込む入力段階と、前記音響データの時間軸上に複数の単位区間を設定する
区間設定段階と、複数通りの周波数を設定し、各設定周波数のそれぞれに
ついて、互いに位相が異なる一対の周期関数を定義する
周期関数定義段階と、個々の単位区間内の音響データと前記各周期関数との相
関値を計算し、各設定周波数のそれぞれについて、一対
の周期関数に対する総合的な相関が所定の基準以上の大
きさとなる１つまたは複数の設定周波数を代表周波数と
して選出する代表周波数選出段階と、個々の単位区間についての各代表周波数ごとに、当該代
表周波数をもった一対の周期関数について得られた一対
の相関値の比率に基いて位相角を計算する位相角計算段
階と、個々の単位区間について、代表周波数、当該代表周波数
をもった周期関数についての相関を示す値、当該代表周
波数について計算された位相角、個々の単位区間の時間
軸上での位置、を示す情報を含む符号データを１つまた
は複数生成し、個々の単位区間の音響データを個々の符
号データによって表現する符号化段階と、を有することを特徴とする音響信号の符号化方法。1. An encoding method for encoding an acoustic signal given as a time-series intensity signal, comprising: an input step of capturing the acoustic signal to be encoded as digital acoustic data; a section setting step of setting a plurality of unit sections on the time axis of the acoustic data; a periodic function defining step of setting a plurality of frequencies and defining, for each set frequency, a pair of periodic functions whose phases differ from each other; a representative frequency selecting step of calculating correlation values between the acoustic data in each unit section and each of the periodic functions, and selecting as representative frequencies one or more set frequencies for which the overall correlation with the pair of periodic functions is equal to or greater than a predetermined reference; a phase angle calculating step of calculating, for each representative frequency of each unit section, a phase angle based on the ratio of the pair of correlation values obtained for the pair of periodic functions having that representative frequency; and an encoding step of generating, for each unit section, one or more pieces of code data containing information indicating the representative frequency, a value indicating the correlation for the periodic functions having that representative frequency, the phase angle calculated for that representative frequency, and the position of the unit section on the time axis, so that the acoustic data of each unit section is represented by the individual code data.

【請求項２】請求項１に記載の音響信号の符号化方法
において、個々の単位区間に関する相関値を演算する際に、時間軸
上の所定の基準点について常に同一の位相をもった共通
の周期関数を用いるようにすることを特徴とする音響信
号の符号化方法。2. The acoustic signal encoding method according to claim 1, wherein, when the correlation values for the individual unit sections are calculated, a common periodic function that always has the same phase with respect to a predetermined reference point on the time axis is used.

【請求項３】請求項１または２に記載の音響信号の符
号化方法において、周期関数として、その波形形状が正弦波、三角波、矩形
波、鋸歯状波となる複数通りの関数を定義しておき、取
り込んだ音響データに基いて所定の波形形状をもった関
数を選択的に用いるようにすることを特徴とする音響信
号の符号化方法。3. The acoustic signal encoding method according to claim 1 or 2, wherein a plurality of functions whose waveform shapes are a sine wave, a triangular wave, a rectangular wave, and a sawtooth wave are defined as the periodic functions, and a function having a given waveform shape is selectively used based on the captured acoustic data.

【請求項４】請求項１〜３のいずれかに記載の音響信
号の符号化方法において、複数通りの周波数として、周波数値が等比級数配列をな
す周波数を設定することを特徴とする音響信号の符号化
方法。4. The acoustic signal encoding method according to any one of claims 1 to 3, wherein frequencies whose values form a geometric series are set as the plurality of frequencies.

【請求項５】請求項１〜４のいずれかに記載の音響信
号の符号化方法において、一対の周期関数として、互いに位相がπ／２だけ異なる
周期関数を定義し、前記一対の周期関数に対する総合的な相関として、各周
期関数に対する各相関値の二乗和平方根値を用いること
を特徴とする音響信号の符号化方法。5. The acoustic signal encoding method according to any one of claims 1 to 4, wherein periodic functions whose phases differ from each other by π/2 are defined as the pair of periodic functions, and the root-sum-square value of the correlation values for the respective periodic functions is used as the overall correlation with the pair of periodic functions.

【請求項６】請求項１〜５のいずれかに記載の音響信
号の符号化方法において、符号化段階において、位相角に相当する時間だけ単位区
間の時間軸上での位置を修正する処理を行い、位相角を
示す情報を、単位区間の時間軸上での位置を示す情報に
内包した符号データを生成することを特徴とする音響信
号の符号化方法。6. The acoustic signal encoding method according to any one of claims 1 to 5, wherein, in the encoding step, the position of the unit section on the time axis is corrected by the time corresponding to the phase angle, thereby generating code data in which the information indicating the phase angle is embedded in the information indicating the position of the unit section on the time axis.

【請求項７】請求項１〜５のいずれかに記載の音響信
号の符号化方法において、符号化段階において、代表周波数をノートナンバー、相
関を示す値をベロシティー、単位区間の時間軸上での位
置をデルタタイム、位相角をチャンネル番号、によって
それぞれ表現したＭＩＤＩデータにより、符号化を行う
ことを特徴とする音響信号の符号化方法。7. The acoustic signal encoding method according to any one of claims 1 to 5, wherein, in the encoding step, encoding is performed using MIDI data in which the representative frequency is expressed by a note number, the value indicating the correlation by a velocity, the position of the unit section on the time axis by a delta time, and the phase angle by a channel number.

【請求項８】請求項１〜７のいずれかに記載の音響信
号の符号化方法において、符号化段階において、代表周波数、位相角、単位区間の
時間軸上での位置、なる３つの要素がそれぞれ所定の許
容範囲内で近似している複数の符号データが生成された
場合、これら複数の符号データを１つの符号データに統
合する処理を行うことを特徴とする音響信号の符号化方
法。8. The acoustic signal encoding method according to any one of claims 1 to 7, wherein, in the encoding step, when a plurality of pieces of code data are generated whose three elements, namely the representative frequency, the phase angle, and the position of the unit section on the time axis, are each approximately equal within a predetermined tolerance, the plurality of pieces of code data are integrated into a single piece of code data.

【請求項９】請求項１〜８のいずれかに記載の音響信
号の符号化方法において、代表周波数選出段階において、音響データのフーリエス
ペクトルにおけるスペクトル強度値に基いて、１つまた
は複数の代表周波数を選出することを特徴とする音響信
号の符号化方法。9. The acoustic signal encoding method according to any one of claims 1 to 8, wherein, in the representative frequency selecting step, one or more representative frequencies are selected based on spectral intensity values in the Fourier spectrum of the acoustic data.

【請求項１０】請求項１〜８のいずれかに記載の音響
信号の符号化方法において、代表周波数選出段階において、第ｊ番目の対象音響デー
タに対する相関が最も大きくなる一対の周期関数の周波
数を第ｊ番目の代表周波数として選出し、この第ｊ番目
の代表周波数を有し前記相関に応じた振幅をもった周期
関数からなる信号成分を前記第ｊ番目の対象音響データ
から減じることにより得られる音響データを、第（ｊ＋
１）番目の対象音響データとする処理を、ｊ＝１〜Ｐ
（Ｐは任意の整数）まで繰り返し実行し、Ｐ個の代表周
波数を選出することを特徴とする音響信号の符号化方
法。10. The acoustic signal encoding method according to any one of claims 1 to 8, wherein, in the representative frequency selecting step, the frequency of the pair of periodic functions having the largest correlation with the j-th target acoustic data is selected as the j-th representative frequency, and the acoustic data obtained by subtracting from the j-th target acoustic data a signal component consisting of periodic functions having the j-th representative frequency and an amplitude corresponding to the correlation is taken as the (j+1)-th target acoustic data, this processing being repeated for j = 1 to P (P being an arbitrary integer) so as to select P representative frequencies.
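
The iterative selection recited in claim 10 can be illustrated with the following Python sketch. It is not the patent's own implementation: the function names are hypothetical, a sine/cosine pair is assumed as the pair of periodic functions, and the 2/w normalization of the correlation values (which makes the correlation equal to the component amplitude when the section contains a whole number of periods) is an assumption of this sketch.

```python
import math

def correlate(x, f, F):
    """Correlation of samples x with a sine and a cosine of frequency f (sampling rate F)."""
    w = len(x)
    A = 2.0 / w * sum(x[k] * math.sin(2 * math.pi * f * k / F) for k in range(w))
    B = 2.0 / w * sum(x[k] * math.cos(2 * math.pi * f * k / F) for k in range(w))
    return A, B

def select_representatives(x, freqs, F, P):
    """Select P representative frequencies: pick the strongest, subtract it, repeat."""
    residual = list(x)          # j-th target acoustic data
    chosen = []
    for _ in range(P):
        best = None
        for f in freqs:
            A, B = correlate(residual, f, F)
            E = math.hypot(A, B)  # root-sum-square of the pair of correlation values
            if best is None or E > best[1]:
                best = (f, E, A, B)
        f, E, A, B = best
        chosen.append((f, E))
        # Subtract the selected component to obtain the (j+1)-th target data.
        for k in range(len(residual)):
            arg = 2 * math.pi * f * k / F
            residual[k] -= A * math.sin(arg) + B * math.cos(arg)
    return chosen
```

Subtracting each selected component before searching again keeps a strong component from masking weaker ones at nearby frequencies, which is the apparent motivation for selecting the frequencies one at a time rather than taking the P largest values from a single pass.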

【請求項１１】請求項１〜１０のいずれかに記載の音
響信号の符号化方法をコンピュータに実行させるための
プログラムを記録したコンピュータ読み取り可能な記録
媒体。11. A computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic signal encoding method according to any one of claims 1 to 10.