JP4473979B2

JP4473979B2 - Acoustic signal encoding method and decoding method, and recording medium storing a program for executing the method

Info

Publication number: JP4473979B2
Application number: JP17787699A
Authority: JP
Inventors: 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1999-06-24
Filing date: 1999-06-24
Publication date: 2010-06-02
Anticipated expiration: 2019-06-24
Also published as: JP2001005451A

Abstract

PROBLEM TO BE SOLVED: To encode a general acoustic signal with high quality using MIDI data, and to reproduce it using a general MIDI sound source. SOLUTION: For each note number provided by a MIDI sound source, a fundamental frequency, an intermediate frequency 1/3 semitone higher, and an intermediate frequency 1/3 semitone lower are defined on the frequency axis. A plurality of unit sections are set on the time axis, and the original acoustic signal is encoded in each unit section. Encoding expresses the original acoustic signal as code data representing a set of periodic signals having the defined fundamental or intermediate frequencies. At decoding time, a MIDI sound source is prepared; code data indicating a fundamental frequency is reproduced from the MIDI sound source as it is, and code data indicating an intermediate frequency is reproduced after correcting the pitch of the MIDI sound source by 1/3 semitone.

Description

【０００１】
【発明の属する技術分野】
本発明は音響信号の符号化方法および復号化方法に関し、時系列の強度信号として与えられる音響信号を符号化し、これを復号化して再生する技術に関する。特に、本発明は一般の音響信号を、ＭＩＤＩ形式の符号データに効率良く変換する処理に適しており、放送メディア（ラジオ、テレビ）、通信メディア（ＣＳ映像・音声配信、インターネット配信）、パッケージメディア（ＣＤ、ＭＤ、カセット、ビデオ、ＬＤ、ＣＤ−ＲＯＭ、ゲームカセット）などで提供する各種オーディオコンテンツを制作する種々の産業分野への応用が期待される。
【０００２】
【従来の技術】
音響信号を符号化する技術として、ＰＣＭ（Pulse Code Modulation ）の手法は最も普及している手法であり、現在、オーディオＣＤやＤＡＴなどの記録方式として広く利用されている。このＰＣＭの手法の基本原理は、アナログ音響信号を所定のサンプリング周波数でサンプリングし、各サンプリング時の信号強度を量子化してデジタルデータとして表現する点にあり、サンプリング周波数や量子化ビット数を高くすればするほど、原音を忠実に再生することが可能になる。ただ、サンプリング周波数や量子化ビット数を高くすればするほど、必要な情報量も増えることになる。そこで、できるだけ情報量を低減するための手法として、信号の変化差分のみを符号化するＡＤＰＣＭ（Adaptive Differential Pulse Code Modulation ）の手法も用いられている。
【０００３】
一方、電子楽器による楽器音を符号化しようという発想から生まれたＭＩＤＩ（Musical Instrument Digital Interface）規格も、パーソナルコンピュータの普及とともに盛んに利用されるようになってきている。このＭＩＤＩ規格による符号データ（以下、ＭＩＤＩデータという）は、基本的には、楽器のどの鍵盤キーを、どの程度の強さで弾いたか、という楽器演奏の操作を記述したデータであり、このＭＩＤＩデータ自身には、実際の音の波形は含まれていない。そのため、実際の音を再生する場合には、楽器音の波形を記憶したＭＩＤＩ音源が別途必要になる。しかしながら、上述したＰＣＭの手法で音を記録する場合に比べて、情報量が極めて少なくてすむという特徴を有し、その符号化効率の高さが注目を集めている。このＭＩＤＩ規格による符号化および復号化の技術は、現在、パーソナルコンピュータを用いて楽器演奏、楽器練習、作曲などを行うソフトウエアに広く採り入れられており、カラオケ、ゲームの効果音といった分野でも広く利用されている。
【０００４】
【発明が解決しようとする課題】
上述したように、ＰＣＭの手法により音響信号を符号化する場合、十分な音質を確保しようとすれば情報量が膨大になり、データ処理の負担が重くならざるを得ない。したがって、通常は、ある程度の情報量に抑えるため、ある程度の音質に妥協せざるを得ない。もちろん、ＭＩＤＩ規格による符号化の手法を採れば、非常に少ない情報量で十分な音質をもった音の再生が可能であるが、上述したように、ＭＩＤＩ規格そのものが、もともと楽器演奏の操作を符号化するためのものであるため、広く一般音響への適用を行うことはできない。別言すれば、ＭＩＤＩデータを作成するためには、実際に楽器を演奏するか、あるいは、楽譜の情報を用意する必要がある。
【０００５】
このように、従来用いられているＰＣＭの手法にしても、ＭＩＤＩの手法にしても、それぞれ音響信号の符号化方法としては一長一短があり、一般の音響について、少ない情報量で十分な音質を確保することはできない。ところが、一般の音響についても効率的な符号化を行いたいという要望は、益々強くなってきている。いわゆるヴォーカル音響と呼ばれる人間の話声や歌声を取り扱う分野では、かねてからこのような要望が強く出されている。たとえば、語学教育、声楽教育、犯罪捜査などの分野では、ヴォーカル音響信号を効率的に符号化する技術が切望されている。このような要求に応えるために、特開平１０−２４７０９９号公報、特開平１１−７３１９９号公報、特開平１１−７３２００号公報、特開平１１−９５７５３号公報、特開２０００−９９００９号公報、特開２０００−９９０９３号公報、特開２０００−２６１３２２号公報には、ＭＩＤＩデータを利用することが可能な新規な符号化方法が提案されている。
【０００６】
これらの方法では、音響信号の時間軸に沿って複数の単位区間を設定し、各単位区間内の音響信号を、予め用意した何種類かの周期関数によって表現する、という手法により符号化が行われる。しかしながら、この方法で用いられる周期関数は、周波数軸上に離散的に定義されたとびとびの周波数値をもった周期関数になるため、必ずしも原音響信号に忠実な符号化が行われるわけではない。特に、ＭＩＤＩデータを用いて符号化した場合、一般的なＭＩＤＩ音源を用いた復号化（原音響信号の再生）ができるというメリットが得られるものの、用いる周期関数としては、ＭＩＤＩ規格のノートナンバーに相当する周波数（半音単位の周波数）をもった周期関数に限定されてしまうことになる。もともとＭＩＤＩ規格は、楽器の演奏操作を記述する目的で制定された規格であり、半音単位の離散的な周波数のみを用いて再生を行ったとしても、楽器音を表現する上では何ら問題は生じない。しかしながら、ヴォーカル音響など、多彩な周波数成分を有する一般的な音響信号を表現する上では、半音単位の離散的な周波数のみを用いた符号化を行うと、十分な品質をもった再生音を得ることはできない。
【０００７】
そこで本発明は、ＭＩＤＩデータのような符号データへの変換を高い品質をもって行うことが可能な音響信号の符号化方法を提供することを目的とし、また、そのような符号化に応じた復号化方法を提供することを目的とする。
【０００８】
【課題を解決するための手段】
(1) 本発明の第１の態様は、時系列の強度信号として与えられる音響信号を符号化するための音響信号の符号化方法において、
符号化対象となる音響信号を、デジタルの音響データとして取り込む入力段階と、
音響データの時間軸上に複数の単位区間を設定する区間設定段階と、
Ｎ通りの基本周波数（Ｎ＞１）を定義するとともに、各隣接基本周波数の中間にそれぞれ（Ｍ−１）通りの中間周波数（Ｍ＞１）を定義する周波数定義段階と、
個々の単位区間内の音響データについて、各周波数をもった周期関数との相関値を演算し、この相関値に基づいて音響データに含まれる代表的な周波数成分を示す代表周波数を選出する代表周波数選出段階と、
個々の単位区間内の音響データを、選出した代表周波数、演算した相関値、個々の単位区間の時間軸上での位置、を示す情報を含む符号データによって表現する符号化段階と、
符号データを、代表周波数の基本周波数に対するずれ量に応じて、複数Ｍ個のグループに分類し、各グループごとに分離して出力する符号出力段階と、
を行うようにしたものである。
【０００９】
(2) 本発明の第２の態様は、上述の第１の態様に係る音響信号の符号化方法において、
Ｎ通りの基本周波数のうちの第ｉ番目（ただし、ｉ＝１，２，…，Ｎ）の基本周波数をｆ（ｉ，０）とし、この第ｉ番目の基本周波数と第（ｉ＋１）番目の基本周波数との中間に定義される（Ｍ−１）通りの中間周波数のうちの第ｊ番目（ただし、ｊ＝１，２，…，Ｍ−１）の中間周波数をｆ（ｉ，ｊ）としたときに、１以上の実数で与えられる比例定数Ｋを用いて、
ｆ（ｉ＋１，０）＝Ｋ・ｆ（ｉ，０）
ｆ（ｉ，ｊ）＝Ｋ^ｊ／Ｍ・ｆ（ｉ，０）
なる式が成り立つように、周波数定義段階における周波数定義を行うようにしたものである。
【００１０】
(3) 本発明の第３の態様は、上述の第２の態様に係る音響信号の符号化方法において、
符号化段階において、代表周波数をノートナンバー、相関値をベロシティー、単位区間の時間軸上での位置をデルタタイム、によってそれぞれ表現したＭＩＤＩデータにより符号化を行い、Ｍ個のグループにそれぞれＭ個のチャンネルを対応させ、各グループに分類された符号データをそれぞれ対応するチャンネルに出力するようにしたものである。
【００１１】
(4) 本発明の第４の態様は、上述の第１の態様に係る音響信号の符号化方法によって符号化された符号データを復号化するための音響信号の復号化方法において、
復号化方法を実行する装置が、Ｎ通りの基本周波数の音源信号を発生することができる音源を利用して、基本周波数をもった符号データのグループについては、この音源信号を用いて再生を行い、中間周波数をもった符号データのグループについては、この音源信号に、基本周波数に対するずれ量に相当する周波数補正を施した補正信号を用いて再生を行うようにしたものである。
【００１２】
(5) 本発明の第５の態様は、上述の第２の態様に係る音響信号の符号化方法によって符号化された符号データを復号化するための音響信号の復号化方法において、
復号化方法を実行する装置が、基本周波数に相当するＮ通りの音源信号を発生することができる音源を利用して、
第ｉ番目の基本周波数ｆ（ｉ，０）をもった符号データについては、第ｉ番目の音源信号を用いて再生を行い、
第ｉ番目の基本周波数と第（ｉ＋１）番目の基本周波数との中間に定義される（Ｍ−１）通りの中間周波数のうちの第ｊ番目の中間周波数ｆ（ｉ，ｊ）をもった符号データについては、第ｉ番目の音源信号もしくは第（ｉ＋１）番目の音源信号に、基本周波数に対するずれ量に相当する周波数補正を施した補正信号を用いて再生を行うようにしたものである。
【００１３】
(6) 本発明の第６の態様は、上述の第３の態様に係る音響信号の符号化方法によって符号化された符号データを復号化するための音響信号の復号化方法において、
復号化方法を実行する装置が、基本周波数に相当するＮ通りの音源信号を発生することができるＭＩＤＩ音源を利用して、各チャンネルごとに、音源信号に対する固有の周波数補正量を設定し、基本周波数をもった符号データを含むチャンネルについては音源信号がそのまま再生され、中間周波数をもった符号データを含むチャンネルについては固有の周波数補正量に基づく周波数補正が行われた音源信号が再生されるようにしたものである。
【００１４】
(7) 本発明の第７の態様は、上述の第１〜第３の態様に係る音響信号の符号化方法をコンピュータに実行させるためのプログラムを、コンピュータ読み取り可能な記録媒体に記録するようにしたものである。
【００１５】
(8) 本発明の第８の態様は、上述の第４〜第６の態様に係る音響信号の復号化方法をコンピュータに実行させるためのプログラムを、コンピュータ読み取り可能な記録媒体に記録するようにしたものである。
【００１６】
【発明の実施の形態】
以下、本発明を図示する実施形態に基づいて説明する。
【００１７】
§１．本発明に係る音響信号の符号化方法の基本原理
はじめに、本発明に係る音響信号の符号化方法の基本原理を述べておく。この基本原理は、前掲の各公報あるいは明細書に開示されているので、ここではその概要のみを簡単に述べることにする。
【００１８】
いま、図１(a) に示すように、時系列の強度信号としてアナログ音響信号が与えられたものとしよう。図示の例では、横軸に時間ｔ、縦軸に振幅（強度）をとってこの音響信号を示している。ここでは、まずこのアナログ音響信号を、デジタルの音響データとして取り込む処理を行う。これは、従来の一般的なＰＣＭの手法を用い、所定のサンプリング周期でこのアナログ音響信号をサンプリングし、振幅を所定の量子化ビット数を用いてデジタルデータに変換する処理を行えばよい。ここでは、説明の便宜上、ＰＣＭの手法でデジタル化した音響データの波形も、図１(a) のアナログ音響信号と同一の波形で示すことにする。
【００１９】
続いて、この符号化対象となる音響信号の時間軸上に、複数の単位区間を設定する。図１(a) に示す例では、時間軸ｔ上に等間隔に６つの時刻ｔ１〜ｔ６が定義され、これら各時刻を始点および終点とする５つの単位区間ｄ１〜ｄ５が設定されている（実際には、後述するように、各区間は部分的に重複するように設定するのが好ましい）。
【００２０】
こうして単位区間が設定されたら、各単位区間ごとの音響信号（ここでは、区間信号と呼ぶことにする）について、それぞれ代表周波数を選出する。各区間信号には、通常、様々な周波数成分が含まれているが、その中でも振幅の大きな周波数成分を代表周波数として選出すればよい。代表周波数は１つだけ選出してもよいが、複数の代表周波数を選出した方が、より精度の高い符号化が可能になる。代表周波数を選出する方法のひとつは、フーリエ変換を利用する方法である。すなわち、各区間信号ごとに、それぞれフーリエ変換を行い、スペクトルを作成する。このとき、ハニング窓（Hanning Window )などの重み関数で、切り出した区間信号にフィルタをかけてフーリエ変換を施す。一般にフーリエ変換は、切り出した区間前後に同様な信号が無限に存在することが想定されているため、重み関数を用いない場合、作成したスペクトルに高周波ノイズがのることが多い。ハニング窓関数など区間の両端の重みが０になるような重み関数を用いると、このような弊害をある程度抑制できる。
【００２１】
図１(b) には、単位区間ｄ１について作成されたスペクトルの一例が示されている。このスペクトルでは、横軸上に定義された周波数ｆによって、単位区間ｄ１についての区間信号に含まれる周波数成分（０〜Ｆ：ここでＦはサンプリング周波数）が示されており、縦軸上に定義された複素強度Ａによって、各周波数成分ごとの複素強度が示されている。
【００２２】
次に、このスペクトルの周波数軸ｆに対応させて、離散的に複数Ｎ個の符号コードを定義する。別言すれば、周波数軸ｆ上に、複数Ｎ通りの周波数を定義することになる。この例では、符号コードとしてＭＩＤＩデータで利用されるノートナンバーｎを用いており、ｎ＝０〜１２７までの１２８個の符号コードを定義している。ノートナンバーｎは、音符の音階を示すパラメータであり、たとえば、ノートナンバーｎ＝６９は、ピアノの鍵盤中央の「ラ音（Ａ３音）」を示しており、４４０Ｈｚの音に相当する。このように、１２８個のノートナンバーには、いずれも所定の周波数が対応づけられるので、スペクトルの周波数軸ｆ上の所定位置に、それぞれ１２８個のノートナンバーｎが離散的に定義されることになる。ここでは、この１２８個のノートナンバーｎに対応する周波数を基本周波数と呼ぶことにする。
【００２３】
ノートナンバーｎは、１オクターブ上がると、周波数が２倍になる対数尺度の音階を示すため、周波数軸ｆに対して線形には対応しない。すなわち、周波数軸ｆ上に離散的に定義された各ノートナンバーに対応する基本周波数は、個々の周波数値が等比級数配列をなす周波数ということになる。そこで、ここでは周波数軸ｆを対数尺度で表し、この対数尺度軸上にノートナンバーｎを定義した強度グラフを作成してみる。図１(c) は、このようにして作成された単位区間ｄ１についての強度グラフを示す。この強度グラフの横軸は、図１(b) に示すスペクトルの横軸を対数尺度に変換したものであり、ノートナンバーｎ＝０〜１２７が等間隔にプロットされている。一方、この強度グラフの縦軸は、図１(b) に示すスペクトルの複素強度Ａを実効強度Ｅに変換したものであり、各ノートナンバーｎの位置における強度を示している。一般に、フーリエ変換によって得られる複素強度Ａは、実数部Ｒ（余弦関数との相関を示す部分）と虚数部Ｉ（正弦関数との相関を示す部分）とによって表されるが、実効強度Ｅは、Ｅ＝（Ｒ^２＋Ｉ^２）^１／２なる二乗和平方根値として演算によって求めることができる。
【００２４】
こうして求められた単位区間ｄ１の強度グラフは、単位区間ｄ１についての区間信号に含まれる振動成分について、ノートナンバーｎ＝０〜１２７に相当する各振動成分の割合を実効強度として示すグラフということができる。そこで、この強度グラフに示されている各実効強度に基いて、全Ｎ個（この例ではＮ＝１２８）のノートナンバーの中からＰ個のノートナンバーを選択し、このＰ個のノートナンバーｎを、単位区間ｄ１を代表する代表符号コードとして抽出する。これは、全Ｎ通りの基本周波数の中から、Ｐ個の周波数を代表周波数として選出することに他ならない。ここでは、説明の便宜上、Ｐ＝３として、全１２８個の候補の中から３個のノートナンバーを代表符号コードとして抽出する場合を示すことにする。たとえば、「候補の中から強度の大きい順にＰ個の符号コードを抽出する」という基準に基いて抽出を行えば、図１(c) に示す例では、第１番目の代表符号コードとしてノートナンバーｎ（ｄ１，１）が、第２番目の代表符号コードとしてノートナンバーｎ（ｄ１，２）が、第３番目の代表符号コードとしてノートナンバーｎ（ｄ１，３）が、それぞれ抽出されることになる。
【００２５】
このようにして、Ｐ個の代表符号コードが抽出されたら、これらの代表符号コードとその実効強度によって、単位区間ｄ１についての区間信号を表現することができる。たとえば、上述の例の場合、図１(c) に示す強度グラフにおいて、ノートナンバーｎ（ｄ１，１）、ｎ（ｄ１，２）、ｎ（ｄ１，３）の実効強度がそれぞれｅ（ｄ１，１）、ｅ（ｄ１，２）、ｅ（ｄ１，３）であったとすれば、以下に示す３組のデータ対によって、単位区間ｄ１の音響信号を表現することができる。
ｎ（ｄ１，１），ｅ（ｄ１，１）
ｎ（ｄ１，２），ｅ（ｄ１，２）
ｎ（ｄ１，３），ｅ（ｄ１，３）
以上、単位区間ｄ１についての処理について説明したが、単位区間ｄ２〜ｄ５についても、それぞれ別個に同様の処理が行われ、代表符号コードおよびその強度を示すデータが得られることになる。たとえば、単位区間ｄ２については、
ｎ（ｄ２，１），ｅ（ｄ２，１）
ｎ（ｄ２，２），ｅ（ｄ２，２）
ｎ（ｄ２，３），ｅ（ｄ２，３）
なる３組のデータ対が得られる。このようにして各単位区間ごとに得られたデータによって、原音響信号を符号化することができる。
【００２６】
図２は、上述の方法による符号化の概念図である。上述の例では、非常に単純な区間設定例を述べたが、図２に示す例では、より実用的な区間設定が行われている。すなわち、図２(a) に示すように、単位区間ｄ１，ｄ２，ｄ３，…は、いずれも部分的に重なっており、このような区間設定に基いて前述の処理を行うと、図２(b) の概念図に示されているような符号化が行われることになる。この例においても、個々の単位区間ごとにそれぞれ３個の代表符号コードを抽出しており（Ｐ＝３）、これら代表符号コードに関するデータを３つのサブトラックＴ１〜Ｔ３に分けて収容するようにしている。たとえば、単位区間ｄ１について抽出された代表符号コードｎ（ｄ１，１），ｎ（ｄ１，２），ｎ（ｄ１，３）は、それぞれサブトラックＴ１，Ｔ２，Ｔ３に収容されている。もっとも、図２(b) は、上述の方法によって得られる符号データを音符の形式で示した概念図であり、実際には、各音符にはそれぞれ強度に関するデータが付加されている。たとえば、サブトラックＴ１には、ノートナンバーｎ（ｄ１，１），ｎ（ｄ２，１），ｎ（ｄ３，１）…なる音階を示すデータとともに、ｅ（ｄ１，１），ｅ（ｄ２，１），ｅ（ｄ３，１）…なる強度を示すデータが収容されることになる。また、図２(b) に示す概念図では、音符の水平方向に関する位置によって、個々の単位区間の時間軸上での位置が示されているが、実際には、この時間軸上での位置を正確に数値として示すデータが各音符に付加されていることになる。
【００２７】
なお、ここで採用する符号化の形式としては、必ずしもＭＩＤＩ形式を採用する必要はないが、この種の符号化形式としてはＭＩＤＩ形式が最も普及しているため、実用上はＭＩＤＩ形式の符号データを用いるのが最も好ましい。ＭＩＤＩ形式では、「ノートオン」データもしくは「ノートオフ」データが、「デルタタイム」データを介在させながら存在する。「ノートオン」データは、特定のノートナンバーＮとベロシティーＶとを指定して特定の音の演奏開始を指示するデータであり、「ノートオフ」データは、特定のノートナンバーＮとベロシティーＶとを指定して特定の音の演奏終了を指示するデータである。また、「デルタタイム」データは、所定の時間間隔を示すデータである。ベロシティーＶは、たとえば、ピアノの鍵盤などを押し下げる速度（ノートオン時のベロシティー）および鍵盤から指を離す速度（ノートオフ時のベロシティー）を示すパラメータであり、特定の音の演奏開始操作もしくは演奏終了操作の強さを示すことになる。
【００２８】
前述の方法では、第ｋ番目の単位区間ｄｋについて、代表符号コードとしてＰ個のノートナンバーｎ（ｄｋ，１），ｎ（ｄｋ，２），…，ｎ（ｄｋ，Ｐ）が得られ、このそれぞれについて実効強度ｅ（ｄｋ，１），ｅ（ｄｋ，２），…，ｅ（ｄｋ，Ｐ）が得られる。そこで、次のような手法により、ＭＩＤＩ形式の符号データを作成することができる。まず、「ノートオン」データもしくは「ノートオフ」データの中で記述するノートナンバーＮとしては、得られたノートナンバーｎ（ｄｋ，１），ｎ（ｄｋ，２），…，ｎ（ｄｋ，Ｐ）をそのまま用いればよい。一方、「ノートオン」データもしくは「ノートオフ」データの中で記述するベロシティーＶとしては、得られた実効強度ｅ（ｄｋ，１），ｅ（ｄｋ，２），…，ｅ（ｄｋ，Ｐ）を所定の方法で規格化した値を用いればよい。また、「デルタタイム」データは、各単位区間の長さに応じて設定すればよい。
【００２９】
なお、上述の例では、区間信号のフーリエスペクトルを求め、その強度値の大きい順にＰ個の周波数（ノートナンバー）を選出して代表周波数とする処理を行っているが、代表周波数の選出には、その他の方法を用いてもかまわない。たとえば、特開２０００−２６１３２２号公報には、一般化調和解析の手法を用いて代表周波数の選出を行う例が示されており、本発明に係る符号化方法を実施する上では、このような一般化調和解析の手法を用いて代表周波数の選出を行ってもかまわない。フーリエスペクトルのピークに基づいて代表周波数を選出する方法も、一般化調和解析の手法を用いて代表周波数を選出する方法も、結局は、個々の単位区間内の音響データについて、予め用意された各基本周波数（上述の例では、ＭＩＤＩノートナンバーに対応する１２８通りの周波数）をもった周期関数との相関値を演算し、この相関値に基づいて代表周波数を選出する（具体的には、相関の大きい周期関数の周波数を代表周波数として選出する）という点では変わりはない。
【００３０】
§２．本発明に係る音響信号の符号化および復号化方法
上述したように、本発明に係る音響信号の符号化方法の基本原理は、時系列の強度信号として与えられる音響信号を、デジタルの音響データとして取り込み、この音響データの時間軸上に複数の単位区間を設定し、各単位区間ごとの音響データ（区間信号）を、それぞれ所定の代表周波数をもった周期信号で表現する、という手法にある。ここで、代表周波数は、区間信号に含まれる代表的な周波数成分を示すものであり、予め用意されたＮ通りの基本周波数の中から、区間信号に対する相関の大きなものが代表周波数として選出されることになる。前述の例では、ＭＩＤＩノートナンバーに対応する１２８通りの基本周波数の中から、３つの代表周波数を選出し、１つの区間信号を、３つの符号データで表現したことになる。ここで、各符号データは、選出した代表周波数、演算した相関値（当該代表周波数をもった周期関数と区間信号との相関を示す値）、個々の単位区間の時間軸上での位置、を示す情報から構成され、ＭＩＤＩデータを利用した場合、ノートナンバー（代表周波数を示す）、ベロシティー（相関値を示す）、デルタタイム（時間軸上での位置を示す）から構成される。
【００３１】
このようにＭＩＤＩデータの形式で符号化された符号データは、一般のＭＩＤＩ音源を利用した再生（復号化）が可能であり、１２８通りの基本周波数（ノートナンバー）に対応する音源波形によって再生されることになる。図３には、この１２８通りのノートナンバーのうち、ノートナンバー６０〜６５に相当する６通りの基本周波数が周波数軸上にプロットされている。図示のとおり、ノートナンバー６０〜６５は、音階では、Ｃ３〜Ｆ３に相当し、周波数では２６２〜３５０Ｈｚに相当する。図示の周波数軸は対数尺度となっているため、各基本周波数は等間隔にプロットされているが、実際には、各基本周波数の周波数値は等比級数をなし、半音単位で音階が上がってゆく。このように、周波数軸上に半音単位で離散的に設定された１２８通りの基本周波数のみを用いた場合、楽器音は十分に表現できたとしても、ヴォーカル音などの一般の音響信号を十分に表現することはできない。
【００３２】
本発明の着眼点は、復号化の際に音源として用意される基本周波数だけでなく、その間の中間周波数も用いて原音響信号をできるだけ忠実に符号化しておき、復号化の際には、基本周波数については音源信号をそのまま用いた再生を行い、中間周波数については音源信号に対して周波数補正を施して再生を行うようにする点にある。
【００３３】
たとえば、図４に示す例のように、隣接する基本周波数の間を対数尺度上で３等分し、それぞれ２つずつの中間周波数を定義する。ここでは、これら各中間周波数のノートナンバーとして、基本周波数のノートナンバーに「＋」または「−」を付加して示すことにする。たとえば、基本周波数のノートナンバー６０と６１との間に定義される２つの中間周波数は、ノートナンバー「６０＋」および「６１−」として示される。別言すれば、ノートナンバー「ｎ」で示される基本周波数について、「１／３半音」だけ高い「ｎ＋」なるノートナンバーで示される中間周波数と、「１／３半音」だけ低い「ｎ−」なるノートナンバーで示される中間周波数と、が定義されることになる。その結果、１２８通りの基本周波数と、２５６通りの中間周波数とが定義されることになり、周波数軸上には、合計で３８４通りの周波数が定義されることになる。これらの周波数は、対数尺度の周波数軸上で等間隔に並んだ等比級数配列をなすことになる。
【００３４】
§１で述べた例では、１２８通りの基本周波数をもった周期信号について、それぞれ音響データとの相関を求め、ある程度以上の相関をもった基本周波数を代表周波数として選出する処理を行い、当該音響データを、選出された代表周波数をもつ音符データとして表現した。ここで述べる方法では、３８４通りの周波数（基本周波数と中間周波数）をもった周期信号について、それぞれ音響データとの相関を求め、ある程度以上の相関をもった基本周波数もしくは中間周波数を代表周波数として選出する処理を行い、当該音響データを、選出された代表周波数をもつ音符データとして表現することになる。このような方法によれば、代表周波数の選出の自由度が３倍に増加するため、原音響波形に忠実な符号化が可能になる。
【００３５】
ただし、一般のＭＩＤＩ音源には、基本周波数に相当する音源信号しか用意されていないため、上述の方法で符号化された符号データを復号化（再生）する際には、工夫が必要になる。すなわち、ノートナンバー「ｎ」で示される基本周波数をもった符号データについては、ＭＩＤＩ音源に用意された基本周波数の音源信号をそのまま用いて再生し、ノートナンバー「ｎ＋」あるいは「ｎ−」で示される中間周波数をもった符号データについては、ＭＩＤＩ音源に用意された基本周波数の音源信号に対して周波数補正を施して再生を行うようにする。具体的には、基本周波数より「１／３半音」だけ高い「ｎ＋」なるノートナンバーで示される中間周波数をもった符号データの場合は、音源信号を「１／３半音」だけ高くするチューニングを行って再生を行い、基本周波数より「１／３半音」だけ低い「ｎ−」なるノートナンバーで示される中間周波数をもった符号データの場合は、音源信号を「１／３半音」だけ低くするチューニングを行って再生を行えばよい。
【００３６】
このように、個々の符号データごとに、それぞれ所定のチューニングを行った再生音が提示されるようにするためには、符号化の時点で、各符号データのもつ周波数の基本周波数に対するずれ量に応じて、グループ分けした出力を行うようにすればよい。たとえば、上述の例の場合、図５に示すような分類を行えばよく、各グループごとに独立したトラックに出力するようにすればよい。すなわち、ノートナンバー「ｎ」で示される基本周波数をもった符号データをグループ１、ノートナンバー「ｎ＋」で示される中間周波数をもった符号データをグループ２、ノートナンバー「ｎ−」で示される中間周波数をもった符号データをグループ３に分類すればよい。このような分類を行っておけば、グループ１に所属する符号データを復号化する際には、ＭＩＤＩ音源に用意された基本周波数の音源信号をそのまま用いた再生を行い、グループ２に所属する符号データを復号化する際には、ＭＩＤＩ音源に用意された基本周波数に対して「１／３半音」だけ高くするチューニングを行って再生を行い、グループ３に所属する符号データを復号化する際には、ＭＩＤＩ音源に用意された基本周波数に対して「１／３半音」だけ低くするチューニングを行って再生を行うことにより、符号化に応じた正しい復号化が可能になる。
【００３７】
最も標準的に利用されているＳＭＦ（Standard MIDI File）フォーマットでは、ＭＩＤＩデータを１６チャンネル（オーディオの分野では一般にトラックと呼ばれるが、ＭＩＤＩ規格ではチャンネルと呼ばれる）に分けて収録することができ、各チャンネルごとに、音源信号のチューニングを行うことが可能である。そこで、実用上は、このチャンネルを利用して、ＭＩＤＩデータのグループ分けを行うのが好ましい。たとえば、図５に示す例の場合、グループ１に属する符号データをチャンネル０に、グループ２に属する符号データをチャンネル１に、グループ３に属する符号データをチャンネル２に、それぞれ分けて収容しておき、再生時には、チャンネル１については「１／３半音」だけ高くするチューニングを指示し、チャンネル２については「１／３半音」だけ低くするチューニングを指示すればよい。
【００３８】
なお、上述の例では、隣接する基本周波数間を対数尺度の周波数軸上で３等分し、２つの中間周波数を定義したが、隣接する基本周波数間をより多数に分割し、より多数の中間周波数を定義してもよい。一般的には、Ｎ通りの基本周波数（Ｎ＞１）を定義するとともに、各隣接基本周波数の中間にそれぞれ（Ｍ−１）通りの中間周波数（Ｍ＞１）を定義すれば、Ｎ×Ｍ通りの周波数の中から代表周波数を選出できることになり、選出の自由度はそれだけ高まることになる。図６は、このようなＮ×Ｍ通りの一般的な周波数定義を行った例を示す図である。周波数軸上にプロットされた周波数ｆ（ｉ，０）は、Ｎ通りの基本周波数のうちの第ｉ番目の基本周波数を示し、周波数ｆ（ｉ＋１，０）は、第（ｉ＋１）番目の基本周波数を示している（０は基本周波数であることを示す）。この隣接する２つの周波数ｆ（ｉ，０）とｆ（ｉ＋１，０）との間は、対数尺度の周波数軸上でＭ等分され、（Ｍ−１）通りの中間周波数が定義されている。すなわち、第１番目の中間周波数がｆ（ｉ，１）、第２番目の中間周波数がｆ（ｉ，２）、第ｊ番目の中間周波数がｆ（ｉ，ｊ）、第（Ｍ−１）番目の中間周波数がｆ（ｉ，Ｍ−１）である。このような周波数定義は、１以上の実数で与えられる比例定数Ｋを用いて、
ｆ（ｉ＋１，０）＝Ｋ・ｆ（ｉ，０）
ｆ（ｉ，ｊ）＝Ｋ^ｊ／Ｍ・ｆ（ｉ，０）
なる式で示すことができる。
【００３９】
このようなＮ×Ｍ通りの周波数を用いて符号化された符号データを復号化する際には、基本周波数に相当するＮ通りの音源信号を発生することができる音源を用意し、第ｉ番目の基本周波数ｆ（ｉ，０）をもった符号データについては、第ｉ番目の音源信号を用いて再生を行い、第ｉ番目の基本周波数と第（ｉ＋１）番目の基本周波数との中間に定義される（Ｍ−１）通りの中間周波数のうちの第ｊ番目の中間周波数ｆ（ｉ，ｊ）をもった符号データについては、第ｉ番目の音源信号もしくは第（ｉ＋１）番目の音源信号に、基本周波数に対するずれ量に相当する周波数補正を施した補正信号を用いて再生を行うようにすればよい。
【００４０】
なお、パラメータＮ，Ｍの値を増加させればさせるほど、原音響信号の符号化品質は向上するが、代表周波数の選出を行う際には、Ｎ×Ｍ通りの各周波数をもった周期関数との相関を求める演算が必要になるため、パラメータＮおよびＭの値を増加させればさせるほど、演算負担も増加することになる。また、一般的なＭＩＤＩ音源には、１２８通りの基本周波数しか用意されていないので、汎用ＭＩＤＩ音源での再生が可能な符号化を行うのであれば、Ｎ＝１２８に設定するのが好ましい。また、上述したように、ＳＭＦフォーマットでのチャンネル数は１６に定められているため、グループ分けをチャンネルを利用して行う場合には、Ｍ≦１６に設定する必要がある。
【００４１】
§３．具体的な符号化および復号化の実施例
最後に、本発明に係る音響信号の符号化方法および復号化方法の具体的な実施例を述べておく。ここでは、何らかの原音響信号について、これまで述べてきた符号化手法を適用することにより、図７に示すような符号データが得られたものとしよう。これら各符号データにおいて、文字「Ｎ」および「１〜４」の数字からなる部分は、ＭＩＤＩデータの所定のノートナンバーを示しており、「＋」および「−」の部分は、このノートナンバーで示される基本周波数に対するずれ量を示している。すなわち、符号化の際には、１２８通りの基本周波数と、２５６通りの中間周波数が用いられており、「＋」が付された符号データは、基本周波数より「１／３半音」高い中間周波数の符号データ、「−」が付された符号データは、基本周波数より「１／３半音」低い中間周波数の符号データ、「＋」も「−」も付されていない符号データは、基本周波数の符号データ、ということになる。なお、図７では、各符号データが２行にわたって配置されているが、これは、１つの単位区間内の音響データに対して、２つの代表周波数が選出され、２つの符号データが生成されたことを示しており、水平方向が時間軸に相当することになる。
【００４２】
さて、このような符号データが生成されたら、これらを図８に示すように、３つのグループ（トラック）に分類する。ここで、グループ１は基本周波数の符号データ、グループ２は「＋」が付された中間周波数の符号データ、グループ３は「−」が付された中間周波数の符号データである。なお、ここでは、各グループごとに更にサブトラックを設け、同一周波数の符号データを同一サブトラックに収容するようにしてある。たとえば、グループ１では、符号データ「Ｎ１」，「Ｎ２」，「Ｎ４」がそれぞれ独立したサブトラックに収容されている。この例では、グループ２，３には、同一周波数の符号データしか含まれていないため、１つのサブトラックしか示されていないが、異なる周波数の符号データが含まれていた場合には、グループ１と同様に、それぞれ異なるサブトラックに収容される。
【００４３】
続いて、図９に示すように、グループ２，３に収容された符号データ、すなわち、中間周波数をもった符号データについて、「＋」，「−」の符号を削除する。そもそも、この「＋」，「−」の符号は、基本周波数よりも「１／３半音」高いまたは低いことを示す符号であり、一般的なＭＩＤＩ規格には定義されていない符号である。この「＋」，「−」の符号は、各グループ（トラック）への分類を行うために付された符号であり、分類が完了した時点でその役目を終えることになる。こうして、これらの符号を削除することにより、図９に示す符号データは、ＭＩＤＩ規格に適合したデータとなる。もちろん、「１／３半音」高いまたは低いという情報は、グループごとに保持されていることになる。
【００４４】
次に、各サブトラックごとに、符号データの統合化を行う。この統合化は、時間軸上で隣接配置された複数の符号データが、同一または類似の特性をもった符号データであった場合に、１つに統合してまとめる処理であり、この統合化処理により、符号データの総数を減少させることができる。たとえば、図９に示す例において、「同一ノートナンバーをもった符号データが隣接配置されていた場合には、これを統合する」という方針で統合化を行うと、図１０に示すような結果が得られる。それぞれ矩形で囲った複数の符号データが、１つの符号データに統合されることになる。また、この図１０では、グループ１〜３が、ＭＩＤＩ規格におけるチャンネル０〜２に割り当てられて収容された例が示されている。
【００４５】
図１１は、図１０に示す各符号データを、ＭＩＤＩデータにおける「ノートオン（演奏操作開始）」および「ノートオフ（演奏操作終了）」を示すデータに置き換えて示した例である。先頭の番号１〜１２は、このデータのシリアル番号である。たとえば、「１．Ｃｈ０Ｎ１Ｏｎ」は、チャンネル０のノートナンバーＮ１を「ノートオン」にせよ、というコマンドを示しており、「２．Ｃｈ０Ｎ４Ｏｎ」は、チャンネル０のノートナンバーＮ４を「ノートオン」にせよ、というコマンドを示している。実際のＭＩＤＩデータでは、これらのコマンドが、所定のデルタタイムとともにコード化されて記述されることになる。
【００４６】
参考のために、図１０に示す各符号データを、ＳＭＦフォーマットによりコード化することにより得られる具体的なＭＩＤＩデータ（１６進数表記）の構成を図１２に示す。この図１２に示された各行が、図１１に示す１つのコマンドに対応する。図１２の各行の行末には、このコマンドが記載されている。行頭の「Delta Time」欄の数値は、各コマンドを実行するまでの待ち時間を示す数値であり、各符号データの時間軸上での位置を示す情報ということになる。続く「Status」欄は、コマンドの種類を示すコードであり、１桁目の「９」は「ノートオン」、「８」は「ノートオフ」を示し、２桁目の数値はチャンネル番号を示している。次の「Note Number 」欄には、ノートナンバーが記述される（図では、便宜上、Ｎ１〜Ｎ４なる符号が記載されているが、実際には、０〜１２７（１６進数での００〜７Ｆ）までのうちのいずれかのノートナンバーが記載されることになる）。最後の「Velocity」欄には、ベロシティーの数値が記載される。この例では、「ノートオン」の際のベロシティーのみを設定し（前述したように、ノートナンバーに対応する周波数をもった周期関数との相関値に基づいて設定される）、「ノートオフ」の際のベロシティー値はすべて０としてある。
【００４７】
一方、図１３には、各チャンネルごとのチューニングコマンドを、ＳＭＦフォーマットによりコード化した具体的なＭＩＤＩデータの構成例が示されている。ここでは細かな説明は省略するが、ここに示すコードの１〜６行目は、チャンネル０を通常のチューニング（周波数補正を行わず、基本周波数どおりの再生）を設定するコマンドであり、７〜１２行目は、チャンネル１の音程を高めるチューニング（基本周波数に対する＋３３セントの周波数補正）を設定するコマンドであり、１３〜１８行目は、チャンネル２の音程を低めるチューニング（基本周波数に対する−３３セントの周波数補正）を設定するコマンドである。１００セントの周波数補正が半音に相当するので、＋３３セントの周波数補正は、「１／３半音」高める補正となり、−３３セントの周波数補正は、「１／３半音」低める補正となる。
【００４８】
実際には、図１３に示すチューニングコマンドからなるＭＩＤＩデータを、図１２に示す演奏情報からなるＭＩＤＩデータに先行して実行させておくようにし、音源信号のチューニングを完了した状態で、再生が行われるようにすることになる。ＳＭＦフォーマットのＭＩＤＩデータでは、このように、各チャンネルごとのチューニングをソフトウエア的に容易に行うことができるので、通常のＭＩＤＩ規格に準じたままの符号データを用いて、本発明に係る復号化方法を実行することが可能である。また、チューニングに必要な情報は、すべてＭＩＤＩデータ内に含ませることができるので、再生を行う際には、何ら特別な設定は不要である。
【００４９】
以上、本発明に係る音響信号の符号化方法および復号化方法を図示する実施形態に基づいて説明したが、本発明はこれらの実施形態に限定されるものではなく、この他にも種々の態様で実施可能である。特に、上述の実施形態では、ＭＩＤＩデータとして符号化を行う例を示したが、本発明は、ＭＩＤＩデータへの符号化に限定されるものではなく、任意の規格に基づく符号化に適用可能である。また、本発明に係る音響信号の符号化方法および復号化方法は、パソコンなどの汎用コンピュータに専用のソフトウエアを組み込むことにより実施可能であり、そのような専用のソフトウエアは、コンピュータ読み取り可能な記録媒体に記録して配付することができる。
【００５０】
【発明の効果】
以上のとおり本発明に係る音響信号の符号化方法および復号化方法によれば、音源信号に含まれている基本周波数の他に、中間周波数を用いて符号化を行い、復号化の際には、周波数補正した音源信号により中間周波数を再生するようにしたため、原音響波形を高い品質をもって符号化し、これを復号化することができるようになる。
【図面の簡単な説明】
【図１】本発明に係る音響信号の符号化方法の基本原理を示す図である。
【図２】図１に示す原理に基づいて作成された符号データの概念図である。
【図３】本発明に係る音響信号の符号化方法を実施する上で定義される基本周波数の一例を示す図である。
【図４】図３に示す基本周波数の間に、それぞれ２つの中間周波数を定義した例を示す図である。
【図５】本発明に係る音響信号の符号化方法における符号データのグループ分けの一例を示す図である。
【図６】本発明に係る音響信号の符号化方法を実施する上で定義される基本周波数および中間周波数の一般例を示す図である。
【図７】本発明に係る音響信号の符号化方法により作成された符号データの一実施例を示す図である。
【図８】図７に示す実施例に係る符号データをグループ分けした状態を示す図である。
【図９】図８に示すグループ分け後の符号データの状態を示す図である。
【図１０】図９に示す符号データに対して統合化処理を行い、これを各チャンネルに収容した状態を示す図である。
【図１１】図１０に示す各符号データを、ＭＩＤＩデータにおける「ノートオン（演奏操作開始）」および「ノートオフ（演奏操作終了）」を示すデータに置き換えて示した図である。
【図１２】図１０に示す各符号データを、ＳＭＦフォーマットによりコード化した具体的なＭＩＤＩデータを示す図である。
【図１３】各チャンネルごとのチューニングコマンドを、ＳＭＦフォーマットによりコード化した具体的なＭＩＤＩデータを示す図である。
【符号の説明】
ｄ１〜ｄ５…単位区間
ｆ（ｉ，０），ｆ（ｉ＋１，０）…基本周波数
ｆ（ｉ，１），ｆ（ｉ，２），ｆ（ｉ，ｊ），ｆ（ｉ，Ｍ−１）…中間周波数
ｎ…ノートナンバー
Ｎ１〜Ｎ４…ノートナンバー
ｔ１〜ｔ６…時刻
Ｔ１〜Ｔ３…サブトラック

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a method for encoding and a method for decoding an acoustic signal, and more specifically to a technique for encoding an acoustic signal given as a time-series intensity signal and then decoding and reproducing it. In particular, the present invention is suited to efficiently converting general acoustic signals into MIDI-format code data, and is expected to find application in the various industries that produce audio content for broadcast media (radio, television), communication media (CS video/audio distribution, Internet distribution), and packaged media (CD, MD, cassette, video, LD, CD-ROM, game cartridges).
[0002]
[Prior art]
Among techniques for encoding acoustic signals, PCM (Pulse Code Modulation) is the most widespread and is currently in broad use as the recording format for audio CDs, DAT, and the like. The basic principle of PCM is to sample an analog acoustic signal at a predetermined sampling frequency and to quantize the signal intensity at each sampling instant into digital data; the higher the sampling frequency and the number of quantization bits, the more faithfully the original sound can be reproduced. However, the higher the sampling frequency and the number of quantization bits, the greater the amount of information required. ADPCM (Adaptive Differential Pulse Code Modulation), which encodes only the differences between successive signal values, is therefore also used to reduce the amount of information as far as possible.
[0003]
Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sounds of electronic musical instruments, has come into wide use with the spread of personal computers. Code data conforming to the MIDI standard (hereinafter, MIDI data) is essentially data describing performance operations, namely which key of an instrument was played and how hard; MIDI data itself contains no actual sound waveform. Reproducing the actual sound therefore requires a separate MIDI sound source that stores the waveforms of instrument sounds. Nevertheless, MIDI requires far less information than recording sound by the PCM method described above, and its high coding efficiency has attracted attention. Encoding and decoding based on the MIDI standard are now widely adopted in personal-computer software for instrument performance, instrument practice, and composition, and are also widely used in fields such as karaoke and game sound effects.
[0004]
[Problems to be solved by the invention]
As described above, when an acoustic signal is encoded by the PCM method, securing sufficient sound quality makes the amount of information enormous and the data-processing burden inevitably heavy. In practice, therefore, sound quality must be compromised to some degree in order to keep the amount of information within bounds. Encoding based on the MIDI standard can, of course, reproduce sound of sufficient quality from a very small amount of information; but, as noted above, the MIDI standard was originally designed to encode the operations of a musical performance and cannot be applied broadly to general sound. In other words, creating MIDI data requires either actually playing an instrument or preparing score information.
[0005]
Thus the conventional PCM and MIDI methods each have advantages and disadvantages as ways of encoding acoustic signals, and neither can secure sufficient sound quality with a small amount of information for general sound. Yet the demand for efficient encoding of general sound keeps growing. This demand has long been strong in fields that handle human speech and singing, so-called vocal sound. For example, in language education, vocal music education, and criminal investigation, a technique for efficiently encoding vocal acoustic signals is eagerly awaited. To meet this demand, JP-A-10-247099, JP-A-11-73199, JP-A-11-73200, JP-A-11-95753, JP 2000-99009 A, JP 2000-99093 A, and JP 2000-261322 A have proposed novel encoding methods that can make use of MIDI data.
[0006]
In these methods, a plurality of unit sections are set along the time axis of the acoustic signal, and the acoustic signal within each unit section is expressed by several kinds of periodic functions prepared in advance. However, the periodic functions used in these methods have discrete frequency values defined at intervals on the frequency axis, so the encoding is not necessarily faithful to the original acoustic signal. In particular, encoding with MIDI data has the advantage that decoding (reproduction of the original acoustic signal) can be done with a general MIDI sound source, but the periodic functions that can be used are limited to those having frequencies corresponding to MIDI note numbers (frequencies in semitone steps). The MIDI standard was originally established to describe the performance operations of musical instruments, and reproduction using only discrete frequencies in semitone steps poses no problem for expressing instrument sounds. However, for general acoustic signals with rich frequency content, such as vocal sound, encoding with only discrete semitone-step frequencies cannot yield reproduced sound of sufficient quality.
[0007]
Accordingly, an object of the present invention is to provide an acoustic signal encoding method capable of high-quality conversion into code data such as MIDI data, and to provide a decoding method corresponding to such encoding.
[0008]
[Means for Solving the Problems]
(1) A first aspect of the present invention is an acoustic signal encoding method for encoding an acoustic signal given as a time-series intensity signal, comprising:
an input stage of capturing the acoustic signal to be encoded as digital acoustic data;
a section setting stage of setting a plurality of unit sections on the time axis of the acoustic data;
a frequency definition stage of defining N fundamental frequencies (N > 1) and defining (M−1) intermediate frequencies (M > 1) between each pair of adjacent fundamental frequencies;
a representative frequency selection stage of calculating, for the acoustic data in each unit section, a correlation value with a periodic function having each of the defined frequencies, and selecting, based on these correlation values, representative frequencies indicating the representative frequency components contained in the acoustic data;
an encoding stage of expressing the acoustic data in each unit section by code data containing information indicating the selected representative frequency, the calculated correlation value, and the position of the unit section on the time axis; and
a code output stage of classifying the code data into M groups according to the deviation of the representative frequency from the fundamental frequency, and outputting the code data separately for each group.
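The representative frequency selection stage can be illustrated with a minimal Python sketch. It assumes that the "correlation value" is the effective strength sqrt(R^2 + I^2) of the correlations with a cosine (R) and a sine (I) of the candidate frequency, and that the P strongest candidates are kept; the function names, the 2/n normalization, and the parameter `p` are illustrative assumptions, not taken from the specification.

```python
import math

def correlation(samples, sample_rate, freq):
    """Effective strength of `freq` in `samples`: sqrt(R^2 + I^2), where R and I
    are the correlations with a cosine and a sine of that frequency.
    The 2/n factor (an assumption) makes a unit-amplitude sinusoid score 1.0."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * k / sample_rate)
             for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * k / sample_rate)
             for k, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def select_representatives(samples, sample_rate, candidates, p):
    """Return the p candidate frequencies with the largest correlation,
    as (frequency, correlation) pairs, strongest first."""
    scored = [(f, correlation(samples, sample_rate, f)) for f in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:p]
```

In this sketch the candidate list would hold all N×M defined frequencies, so the cost of the selection grows in proportion to N×M.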
[0009]
(2) A second aspect of the present invention is the acoustic signal encoding method according to the first aspect described above, wherein,
letting f(i, 0) denote the i-th of the N fundamental frequencies (i = 1, 2, ..., N) and f(i, j) denote the j-th (j = 1, 2, ..., M−1) of the (M−1) intermediate frequencies defined between the i-th and the (i+1)-th fundamental frequencies, the frequencies are defined in the frequency definition stage so that the following equations hold, where K is a proportionality constant given as a real number of 1 or more:
f(i+1, 0) = K · f(i, 0)
f(i, j) = K^{j/M} · f(i, 0)
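As a worked illustration of this frequency definition, the following Python sketch evaluates f(i, j) for the MIDI case. It assumes K = 2^(1/12) (one semitone), M = 3, and that index i is the MIDI note number anchored at note 69 = 440 Hz; the function name and default arguments are hypothetical.

```python
def frequency(i, j=0, m=3, k=2 ** (1 / 12), f_ref=440.0, i_ref=69):
    """f(i, j) = K**(j/M) * f(i, 0), with f(i+1, 0) = K * f(i, 0).

    Assumptions: i is a MIDI note number, and the scale is anchored so that
    f(i_ref, 0) = f_ref (note 69 = 440 Hz)."""
    if not 0 <= j < m:
        raise ValueError("j must satisfy 0 <= j < M")
    base = f_ref * k ** (i - i_ref)   # fundamental frequency f(i, 0)
    return base * k ** (j / m)        # intermediate frequency f(i, j)
```

With these assumptions, `frequency(60)` is about 261.6 Hz and `frequency(65)` about 349.2 Hz, and each step in j raises the pitch by 1/3 semitone.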
[0010]
(3) A third aspect of the present invention is the acoustic signal encoding method according to the second aspect described above, wherein,
in the encoding stage, encoding is performed as MIDI data in which the representative frequency is expressed by a note number, the correlation value by a velocity, and the position of the unit section on the time axis by a delta time; M channels are associated with the M groups, and the code data classified into each group is output to the corresponding channel.
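The mapping onto MIDI quantities can be sketched in Python as follows. This is only an illustration under stated assumptions: each code datum carries a fundamental index `i` (used as the note number), an offset index `j` (used as the group and channel number, with j = 0 for fundamental frequencies), a correlation value, and an absolute start tick. The record type `CodeDatum`, the linear velocity scaling, and the per-channel delta computation are hypothetical choices, not the method prescribed by the specification.

```python
from collections import namedtuple

# Hypothetical record for one piece of code data.
CodeDatum = namedtuple("CodeDatum", "i j correlation start_tick")

def to_velocity(correlation, max_correlation):
    """Scale a correlation value linearly to a MIDI velocity in 1..127."""
    if max_correlation <= 0:
        return 1
    scaled = round(127 * correlation / max_correlation)
    return max(1, min(127, scaled))

def group_by_channel(code_data, m):
    """Split code data into M groups keyed by channel number j.

    Each entry is (delta_time, note_number, velocity), where delta_time is the
    tick distance from the previous entry on the same channel."""
    max_corr = max((d.correlation for d in code_data), default=0.0)
    channels = {j: [] for j in range(m)}
    last_tick = {j: 0 for j in range(m)}
    for d in sorted(code_data, key=lambda d: d.start_tick):
        delta = d.start_tick - last_tick[d.j]
        last_tick[d.j] = d.start_tick
        channels[d.j].append((delta, d.i, to_velocity(d.correlation, max_corr)))
    return channels
```

Because the note number alone no longer identifies the frequency, the channel number is what carries the 1/3-semitone offset information in this arrangement.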
[0011]
(4) A fourth aspect of the present invention is an acoustic signal decoding method for decoding code data encoded by the acoustic signal encoding method according to the first aspect described above, wherein
an apparatus that executes the decoding method uses a sound source capable of generating sound source signals of the N fundamental frequencies; a group of code data having fundamental frequencies is reproduced using these sound source signals, and a group of code data having intermediate frequencies is reproduced using corrected signals obtained by applying to the sound source signals a frequency correction corresponding to the deviation from the fundamental frequency.
[0012]
(5) A fifth aspect of the present invention is an acoustic signal decoding method for decoding code data encoded by the acoustic signal encoding method according to the second aspect described above, wherein
an apparatus that executes the decoding method uses a sound source capable of generating N sound source signals corresponding to the fundamental frequencies,
code data having the i-th fundamental frequency f(i, 0) is reproduced using the i-th sound source signal, and
code data having the j-th intermediate frequency f(i, j) among the (M−1) intermediate frequencies defined between the i-th and the (i+1)-th fundamental frequencies is reproduced using a corrected signal obtained by applying to the i-th or the (i+1)-th sound source signal a frequency correction corresponding to the deviation from the fundamental frequency.
[0013]
(6) A sixth aspect of the present invention is an acoustic signal decoding method for decoding code data encoded by the acoustic signal encoding method according to the third aspect described above, wherein
an apparatus that executes the decoding method uses a MIDI sound source capable of generating N sound source signals corresponding to the fundamental frequencies and sets, for each channel, its own frequency correction amount for the sound source signals, so that the sound source signals are reproduced as they are on the channel containing code data having fundamental frequencies, and sound source signals corrected by the channel's own frequency correction amount are reproduced on each channel containing code data having intermediate frequencies.
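One way to set a per-channel frequency correction on a MIDI sound source is sketched below in Python. It assumes the correction is sent as the standard Channel Fine Tuning registered parameter (RPN 0, 1), whose 14-bit value is centred on 8192 and spans roughly ±100 cents; the specification does not say in this passage which tuning message is used, so this choice, the function name, and the tuple layout are assumptions, and a given sound source may or may not honour the message.

```python
def fine_tune_messages(channel, cents):
    """Control-change messages that set Channel Fine Tuning (RPN 0, 1)
    on `channel` to `cents`, as (status, controller, value) tuples."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be in 0..15")
    value = 8192 + round(cents / 100 * 8192)   # 14-bit value, 8192 = no correction
    value = max(0, min(16383, value))          # clamp to the 14-bit range
    status = 0xB0 | channel                    # control change on `channel`
    return [
        (status, 101, 0),             # RPN MSB = 0
        (status, 100, 1),             # RPN LSB = 1 (channel fine tuning)
        (status, 6, value >> 7),      # data entry MSB
        (status, 38, value & 0x7F),   # data entry LSB
    ]
```

For a 1/3-semitone split, calling this with 0 cents for the fundamental-frequency channel and about +33 or −33 cents for the intermediate-frequency channels would precede the note data on each channel.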
[0014]
(7) A seventh aspect of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic signal encoding method according to any of the first to third aspects described above.
[0015]
(8) An eighth aspect of the present invention is a computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic signal decoding method according to any of the fourth to sixth aspects described above.
[0016]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described based on the illustrated embodiments.
[0017]
§1. Basic principle of encoding method of acoustic signal according to the present invention
First, the basic principle of the audio signal encoding method according to the present invention will be described. Since this basic principle is disclosed in the above-mentioned publications or specifications, only the outline will be briefly described here.
[0018]
Assume that an analog acoustic signal is given as a time-series intensity signal, as shown in FIG. In the illustrated example, this acoustic signal is shown with time t on the horizontal axis and amplitude (intensity) on the vertical axis. Here, first, the analog sound signal is processed as digital sound data. This may be performed by using a conventional general PCM method, sampling the analog acoustic signal at a predetermined sampling period, and converting the amplitude into digital data using a predetermined number of quantization bits. Here, for convenience of explanation, the waveform of the acoustic data digitized by the PCM method is also shown by the same waveform as the analog acoustic signal of FIG.
[0019]
Subsequently, a plurality of unit sections are set on the time axis of the acoustic signal to be encoded. In the example shown in FIG. 1 (a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5 having these times as start points and end points are set (in practice, as will be described later, the sections are preferably set so as to partially overlap).
[0020]
Once the unit sections have been set in this way, representative frequencies are selected for the acoustic signal in each unit section (hereinafter referred to as a section signal). Each section signal usually contains various frequency components, and a frequency component with a large amplitude among them may be selected as a representative frequency. Although a single representative frequency may be selected, selecting a plurality of representative frequencies enables more accurate encoding. One method of selecting representative frequencies uses the Fourier transform: each section signal is Fourier-transformed to create a spectrum. At this time, the extracted section signal is preferably filtered with a weight function such as a Hanning window before the Fourier transform is applied. In general, the Fourier transform assumes that the same signal repeats indefinitely before and after the extracted section, so when no weight function is used, high-frequency noise often appears in the resulting spectrum. Using a weight function whose weight is 0 at both ends of the section, such as the Hanning window function, suppresses this adverse effect to some extent.
[0021]
FIG. 1 (b) shows an example of a spectrum created for the unit section d1. In this spectrum, the frequency components (0 to F, where F is the sampling frequency) contained in the section signal for the unit section d1 are indicated by the frequency f defined on the horizontal axis, and the complex intensity A of each frequency component is indicated on the vertical axis.
[0022]
Next, N codes are discretely defined along the frequency axis f of this spectrum. In other words, N frequencies are defined on the frequency axis f. In this example, the note numbers n used in MIDI data serve as the codes, and 128 codes from n = 0 to 127 are defined. The note number n is a parameter indicating the pitch of a note on the musical scale. For example, note number n = 69 indicates the “la” sound (A3) at the center of the piano keyboard and corresponds to a sound of 440 Hz. Since a predetermined frequency is associated with each of the 128 note numbers in this way, the 128 note numbers n are discretely defined at predetermined positions on the frequency axis f of the spectrum. Here, the frequencies corresponding to the 128 note numbers n are referred to as fundamental frequencies.
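The correspondence between note numbers and fundamental frequencies can be sketched as follows. This is a minimal illustration that assumes 12-tone equal temperament anchored at note number 69 = 440 Hz; the function name `note_frequency` and the constants are illustrative and do not appear in the specification.

```python
A_NOTE = 69          # note number that the text associates with 440 Hz
A_FREQ_HZ = 440.0

def note_frequency(n: int) -> float:
    """Fundamental frequency in Hz for MIDI note number n (equal temperament assumed)."""
    if not 0 <= n <= 127:
        raise ValueError("note number must be in 0..127")
    # Twelve note numbers make one octave, and one octave doubles the frequency.
    return A_FREQ_HZ * 2.0 ** ((n - A_NOTE) / 12.0)

# The 128 fundamental frequencies, one per note number.
fundamentals = [note_frequency(n) for n in range(128)]
```

Under this assumption, note number 60 comes out at roughly 262 Hz, in line with the value quoted later for FIG. 3.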
[0023]
The note number n represents a logarithmic scale on which the frequency doubles with each octave, and therefore does not correspond linearly to the frequency axis f. That is, the fundamental frequencies corresponding to the note numbers discretely defined on the frequency axis f form a geometric series. Therefore, an intensity graph is created here in which the frequency axis f is expressed on a logarithmic scale and the note numbers n are defined on this logarithmic axis. FIG. 1 (c) shows the intensity graph created in this way for the unit section d1. The horizontal axis of the intensity graph is the horizontal axis of the spectrum shown in FIG. 1 (b) converted to a logarithmic scale, and note numbers n = 0 to 127 are plotted on it at equal intervals. The vertical axis of the intensity graph is the complex intensity A of the spectrum shown in FIG. 1 (b) converted to the effective intensity E, and indicates the intensity at the position of each note number n. In general, the complex intensity A obtained by the Fourier transform is represented by a real part R (the part showing the correlation with a cosine function) and an imaginary part I (the part showing the correlation with a sine function); the effective intensity E is obtained as the square root of the sum of their squares, E = (R² + I²)^{1/2}.
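As a hedged illustration of the effective intensity, the sketch below correlates one windowed section signal with a cosine and a sine at a chosen frequency and combines the two parts as E = (R² + I²)^{1/2}. The function names, the normalization by the section length, and the test signal are assumptions made for this example only.

```python
import math

def hann(k: int, length: int) -> float:
    """Hann weight that falls to 0 at both ends of the section."""
    return 0.5 - 0.5 * math.cos(2.0 * math.pi * k / (length - 1))

def effective_intensity(section, freq_hz, sample_rate):
    """Return E = sqrt(R^2 + I^2) for one frequency of a windowed section signal."""
    length = len(section)
    real = imag = 0.0
    for k, x in enumerate(section):
        weighted = hann(k, length) * x
        phase = 2.0 * math.pi * freq_hz * k / sample_rate
        real += weighted * math.cos(phase)   # correlation with the cosine function
        imag += weighted * math.sin(phase)   # correlation with the sine function
    return math.hypot(real, imag) / length   # dividing by length is an assumed normalization

# Toy section signal: a pure 440 Hz tone, 0.1 s long at 8 kHz.
sample_rate = 8000
section = [math.sin(2.0 * math.pi * 440.0 * k / sample_rate) for k in range(800)]
e_a = effective_intensity(section, 440.0, sample_rate)      # at note number 69
e_next = effective_intensity(section, 466.16, sample_rate)  # near note number 70
```

For this toy signal the intensity at 440 Hz is far larger than at the neighbouring semitone, which is what makes a peak in the intensity graph usable as a representative frequency.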
[0024]
The intensity graph for the unit section d1 obtained in this way can be regarded as a graph showing, as an effective intensity, the proportion of each vibration component corresponding to note numbers n = 0 to 127 among the vibration components contained in the section signal for the unit section d1. Accordingly, P note numbers are selected from all N note numbers (N = 128 in this example) on the basis of the effective intensities shown in the intensity graph, and these P note numbers n are extracted as the representative codes representing the unit section d1. This is nothing other than selecting P representative frequencies from among all N fundamental frequencies. Here, for convenience of explanation, assume that P = 3, so that three note numbers are extracted as representative codes from the 128 candidates. For example, if extraction follows the criterion “extract P codes from the candidates in descending order of intensity”, then in the illustrated example note number n (d1,1) is extracted as the first representative code, note number n (d1,2) as the second representative code, and note number n (d1,3) as the third representative code.
[0025]
Once P representative codes have been extracted in this way, the section signal for the unit section d1 can be expressed by these representative codes and their effective intensities. In the above example, if the effective intensities of note numbers n (d1,1), n (d1,2), and n (d1,3) in the intensity graph shown in FIG. 1 (c) are e (d1,1), e (d1,2), and e (d1,3), respectively, the acoustic signal of the unit section d1 can be expressed by the following three data pairs.
n (d1,1), e (d1,1)
n (d1,2), e (d1,2)
n (d1,3), e (d1,3)
Although the processing has been described above for the unit section d1, the same processing is performed separately for each of the unit sections d2 to d5, and data representing the representative codes and their intensities are obtained. For example, the following three data pairs are obtained for the unit section d2.
n (d2,1), e (d2,1)
n (d2,2), e (d2,2)
n (d2,3), e (d2,3)
In this way, the original acoustic signal can be encoded by the data obtained for each unit section.
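The extraction of P data pairs per unit section in descending order of intensity can be sketched as below. The helper name `select_representatives` and the toy intensity values are hypothetical; only the selection criterion itself is taken from the text.

```python
def select_representatives(intensities, p=3):
    """Return the p (note_number, effective_intensity) pairs with the largest intensity."""
    ranked = sorted(enumerate(intensities), key=lambda pair: pair[1], reverse=True)
    return ranked[:p]

# Toy intensity graph for one unit section: 128 values with three clear peaks.
graph = [0.0] * 128
graph[60], graph[64], graph[67] = 0.9, 0.5, 0.7

pairs_d1 = select_representatives(graph, p=3)
for rank, (note, intensity) in enumerate(pairs_d1, start=1):
    print(f"n(d1,{rank}) = {note}, e(d1,{rank}) = {intensity}")
```

Running the same function on the intensity graph of each unit section yields the sequence of data pairs that makes up the encoded signal.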
[0026]
FIG. 2 is a conceptual diagram of encoding by the above-described method. The example above used a very simple section setting, whereas the example shown in FIG. 2 uses a more practical one. That is, as shown in FIG. 2 (a), the unit sections d1, d2, d3, ... partially overlap one another, and when the above-described processing is performed on the basis of this section setting, encoding as shown in the conceptual diagram of FIG. 2 (b) is performed. In this example as well, three representative codes are extracted for each unit section (P = 3), and the data relating to these representative codes are accommodated in three subtracks T1 to T3. For example, the representative codes n (d1,1), n (d1,2), and n (d1,3) extracted for the unit section d1 are accommodated in subtracks T1, T2, and T3, respectively. Note that FIG. 2 (b) is a conceptual diagram showing the code data obtained by the above-described method in the form of musical notes; in practice, data relating to intensity are added to each note. For example, subtrack T1 accommodates the data e (d1,1), e (d2,1), e (d3,1), ... together with the data indicating the pitches of note numbers n (d1,1), n (d2,1), n (d3,1), .... In the conceptual diagram of FIG. 2 (b), the position of each unit section on the time axis is indicated by the horizontal position of each note; in practice, this position on the time axis is added to each note as an exact numerical value.
[0027]
The encoding format adopted here need not be the MIDI format, but since the MIDI format is the most widespread format of this type, it is most preferable in practice to use code data in the MIDI format. In the MIDI format, “note-on” data and “note-off” data appear with “delta time” data interposed between them. “Note-on” data designates a specific note number N and velocity V and instructs the start of performance of a specific sound, and “note-off” data designates a specific note number N and velocity V and instructs the end of performance of a specific sound. “Delta time” data indicates a predetermined time interval. The velocity V is a parameter indicating, for example, the speed at which a piano key is pressed down (velocity at note-on) or the speed at which the finger is released from the key (velocity at note-off), and thus indicates the strength of the operation that starts or ends the performance of a specific sound.
[0028]
In the above-described method, P note numbers n (dk, 1), n (dk, 2), ..., n (dk, P) are obtained as representative codes for the k-th unit section dk, and effective intensities e (dk, 1), e (dk, 2), ..., e (dk, P) are obtained for each of them. MIDI format code data can therefore be created as follows. First, as the note number N described in the “note-on” data or “note-off” data, the obtained note numbers n (dk, 1), n (dk, 2), ..., n (dk, P) can be used as they are. As the velocity V described in the “note-on” data or “note-off” data, values obtained by normalizing the effective intensities e (dk, 1), e (dk, 2), ..., e (dk, P) by a predetermined method may be used. The “delta time” data may be set according to the length of each unit section.
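A minimal sketch of this conversion is given below. The text leaves the normalization "by a predetermined method" open, so linear scaling to the range 1 to 127 is an assumption here, as are the function names, the tuple layout of an event, and the simplification that unit sections do not overlap.

```python
def to_velocity(intensity, max_intensity):
    """Normalize an effective intensity to a MIDI velocity in 1..127 (linear scaling assumed)."""
    if max_intensity <= 0:
        return 1
    return max(1, min(127, round(127 * intensity / max_intensity)))

def section_to_events(pairs, section_ticks, max_intensity):
    """Turn one section's (note, intensity) pairs into (delta_time, kind, note, velocity) events."""
    events = []
    for note, intensity in pairs:
        # All notes of a section start together, so every note-on has delta time 0.
        events.append((0, "note_on", note, to_velocity(intensity, max_intensity)))
    for index, (note, _) in enumerate(pairs):
        # Only the first note-off waits for the section length; the others follow at once.
        delta = section_ticks if index == 0 else 0
        events.append((delta, "note_off", note, 0))
    return events

events = section_to_events([(60, 0.9), (67, 0.7), (64, 0.5)],
                           section_ticks=120, max_intensity=0.9)
```

With overlapping unit sections the note-off of one section would interleave with the note-on of the next, which this sketch does not attempt to handle.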
[0029]
  In the above example, the Fourier spectrum of the section signal is obtained and P frequencies (note numbers) are selected as representative frequencies in descending order of intensity, but other methods may be used to select the representative frequencies. For example, JP 2000-261322 A shows an example of selecting representative frequencies by generalized harmonic analysis, and such a technique may also be used in implementing the encoding method according to the present invention. Whether the representative frequencies are selected from the peaks of the Fourier spectrum or by generalized harmonic analysis, the procedure ultimately amounts to calculating, for the acoustic data in each unit section, correlation values with periodic functions having the fundamental frequencies (in the above example, the 128 frequencies corresponding to the MIDI note numbers) and selecting representative frequencies on the basis of these correlation values (specifically, the frequency of a periodic function with a large correlation value is selected as a representative frequency).
[0030]
§2. Acoustic signal encoding and decoding method according to the present invention
As described above, the basic principle of the acoustic signal encoding method according to the present invention lies in capturing an acoustic signal given as a time-series intensity signal as digital acoustic data, setting a plurality of unit sections on the time axis of the acoustic data, and expressing the acoustic data (section signal) of each unit section by periodic signals having predetermined representative frequencies. Here, a representative frequency indicates a representative frequency component contained in the section signal, and a frequency having a large correlation with the section signal is selected as the representative frequency from among N fundamental frequencies prepared in advance. In the above example, three representative frequencies are selected from the 128 fundamental frequencies corresponding to the MIDI note numbers, and one section signal is expressed by three pieces of code data. Each piece of code data contains the selected representative frequency, the calculated correlation value (a value indicating the correlation between the periodic function having the representative frequency and the section signal), and the position of the unit section on the time axis. When MIDI data is used, each piece of code data consists of a note number (representing the representative frequency), a velocity (representing the correlation value), and a delta time (representing the position on the time axis).
[0031]
Code data encoded in the MIDI data format as described above can be reproduced (decoded) using a general MIDI sound source, in which case it is reproduced with sound source waveforms corresponding to the 128 fundamental frequencies (note numbers). In FIG. 3, six of the 128 fundamental frequencies, those corresponding to note numbers 60 to 65, are plotted on the frequency axis. As shown in the figure, note numbers 60 to 65 correspond to C3 to F3 on the musical scale and to frequencies of 262 to 350 Hz. Since the frequency axis in the figure is logarithmic, the fundamental frequencies are plotted at equal intervals, but their actual frequency values form a geometric series, and the pitch rises in semitone steps. When only the 128 fundamental frequencies set discretely in semitone steps on the frequency axis are used in this way, instrument sounds may be expressed adequately, but general acoustic signals such as vocal sounds cannot.
[0032]
The point of the present invention is that the original acoustic signal is encoded as faithfully as possible by using not only the fundamental frequencies prepared in the sound source used for decoding but also intermediate frequencies between them, and that, at the time of decoding, a fundamental frequency is reproduced using the sound source signal as it is, while an intermediate frequency is reproduced by correcting the frequency of the sound source signal.
[0033]
For example, as in the example shown in FIG. 4, adjacent fundamental frequencies are divided into three equal parts on a logarithmic scale, and two intermediate frequencies are defined for each. Here, as a note number of each of these intermediate frequencies, “+” or “−” is added to the note number of the basic frequency. For example, two intermediate frequencies defined between the fundamental frequency note numbers 60 and 61 are denoted as note numbers “60+” and “61−”. In other words, for the fundamental frequency indicated by the note number “n”, the intermediate frequency indicated by the note number “n +”, which is higher by “1/3 semitone”, and “n−”, which is lower by “1/3 semitone”. The intermediate frequency indicated by the note number is defined. As a result, 128 basic frequencies and 256 intermediate frequencies are defined, and a total of 384 frequencies are defined on the frequency axis. These frequencies form a geometric series array arranged at equal intervals on the logarithmic scale frequency axis.
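The 384-point frequency grid can be illustrated numerically as follows. This sketch assumes equal temperament with note number 69 = 440 Hz, so that one semitone is a factor of 2^(1/12) and "1/3 semitone" is a factor of 2^(1/36); the function name and the use of ASCII "+" and "-" suffixes are illustrative.

```python
def third_tone_frequency(n: int, suffix: str = "") -> float:
    """Frequency in Hz of note 'n', 'n+' or 'n-', where +/- shifts by 1/3 semitone."""
    steps = {"": 0, "+": 1, "-": -1}[suffix]
    # (n - 69) / 12 octaves from 440 Hz, plus or minus 1/36 octave for the suffix.
    return 440.0 * 2.0 ** ((n - 69) / 12.0 + steps / 36.0)

# All 128 fundamental and 256 intermediate frequencies, in ascending order.
grid = sorted(third_tone_frequency(n, s) for n in range(128) for s in ("-", "", "+"))

# Ratio between neighbouring grid points; constant for a geometric series.
ratios = [b / a for a, b in zip(grid, grid[1:])]
```

Every entry of `ratios` equals 2^(1/36) up to rounding, which confirms that "60+" and "61-" split the interval between note numbers 60 and 61 into three equal parts on the logarithmic axis.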
[0034]
In the example described in §1, the correlation with the acoustic data was obtained for each of the periodic signals having the 128 fundamental frequencies, fundamental frequencies having at least a certain degree of correlation were selected as representative frequencies, and the acoustic data were expressed as note data having the selected representative frequencies. In the method described here, the correlation with the acoustic data is obtained for each of the periodic signals having the 384 frequencies (fundamental and intermediate frequencies), fundamental or intermediate frequencies having at least a certain degree of correlation are selected as representative frequencies, and the acoustic data are expressed as note data having the selected representative frequencies. With this method, the freedom in selecting representative frequencies increases threefold, so that encoding more faithful to the original acoustic waveform becomes possible.
[0035]
However, since a general MIDI sound source has sound source signals only for the fundamental frequencies, some ingenuity is required when decoding (reproducing) code data encoded by the above method. That is, code data having the fundamental frequency indicated by note number “n” is reproduced using the fundamental-frequency sound source signal prepared in the MIDI sound source as it is, whereas code data having the intermediate frequency indicated by note number “n +” or “n−” is reproduced by applying frequency correction to the fundamental-frequency sound source signal prepared in the MIDI sound source. Specifically, for code data having the intermediate frequency indicated by note number “n +”, which is “1/3 semitone” higher than the fundamental frequency, the sound source signal is tuned “1/3 semitone” higher for reproduction, and for code data having the intermediate frequency indicated by note number “n−”, which is “1/3 semitone” lower than the fundamental frequency, the sound source signal is tuned “1/3 semitone” lower for reproduction.
[0036]
In order to reproduce each piece of code data with its own predetermined tuning as described above, the code data may be grouped at the time of encoding according to the amount by which their frequencies deviate from the fundamental frequency, and output group by group. In the above example, the classification shown in FIG. 5 may be performed and each group output to an independent track. That is, code data having the fundamental frequency indicated by note number “n” is classified into group 1, code data having the intermediate frequency indicated by note number “n +” into group 2, and code data having the intermediate frequency indicated by note number “n−” into group 3. With this classification, code data belonging to group 1 is decoded by reproducing the fundamental-frequency sound source signal prepared in the MIDI sound source as it is, code data belonging to group 2 is decoded by tuning that sound source signal “1/3 semitone” higher, and code data belonging to group 3 is decoded by tuning it “1/3 semitone” lower, so that decoding that correctly corresponds to the encoding becomes possible.
[0037]
In the SMF (Standard MIDI File) format, which is most commonly used, MIDI data can be divided into 16 channels (generally called tracks in the audio field, but called channels in the MIDI standard). It is possible to tune the sound source signal for each channel. Therefore, in practice, it is preferable to perform grouping of MIDI data using this channel. For example, in the example shown in FIG. 5, code data belonging to group 1 is stored separately in channel 0, code data belonging to group 2 is stored in channel 1, and code data belonging to group 3 is stored separately in channel 2. At the time of reproduction, the channel 1 may be instructed to be increased by “1/3 semitone”, and the channel 2 may be instructed to be decreased by “1/3 semitone”.
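The channel assignment described above can be sketched as follows. The label format ("60", "60+", "61-" with ASCII signs), the dictionary names, and the function `split_into_channels` are hypothetical; the mapping of groups 1 to 3 onto channels 0 to 2 follows the example in the text, and the value of 33 cents is an approximation of "1/3 semitone".

```python
CHANNEL_FOR_SUFFIX = {"": 0, "+": 1, "-": 2}   # groups 1, 2, 3 -> channels 0, 1, 2
CENTS_FOR_CHANNEL = {0: 0, 1: +33, 2: -33}     # per-channel tuning, about 1/3 semitone

def split_into_channels(code_data):
    """Group (label, velocity) items such as ('60+', 90) by their deviation suffix."""
    channels = {0: [], 1: [], 2: []}
    for label, velocity in code_data:
        suffix = label[-1] if label[-1] in "+-" else ""
        note = int(label.rstrip("+-"))   # plain MIDI note number without the suffix
        channels[CHANNEL_FOR_SUFFIX[suffix]].append((note, velocity))
    return channels

channels = split_into_channels([("60", 100), ("60+", 90), ("61-", 80), ("64", 70)])
```

After the split, every channel holds only standard note numbers, and the deviation is carried entirely by the tuning assigned to the channel.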
[0038]
In the above example, the interval between adjacent fundamental frequencies is divided into three equal parts on the logarithmic frequency axis and two intermediate frequencies are defined, but the interval may be divided into more parts so that more intermediate frequencies are defined. In general, if N fundamental frequencies (N > 1) are defined and (M−1) intermediate frequencies (M > 1) are defined between each pair of adjacent fundamental frequencies, the representative frequency can be selected from among N × M frequencies, and the freedom of selection increases accordingly. FIG. 6 illustrates such a general definition of N × M frequencies. The frequency f (i, 0) plotted on the frequency axis is the i-th of the N fundamental frequencies, and the frequency f (i + 1, 0) is the (i + 1)-th fundamental frequency (the 0 indicates a fundamental frequency). The interval between the two adjacent frequencies f (i, 0) and f (i + 1, 0) is divided into M equal parts on the logarithmic frequency axis, and (M−1) intermediate frequencies are defined. That is, the first intermediate frequency is f (i, 1), the second is f (i, 2), the j-th is f (i, j), and the (M−1)-th is f (i, M−1). Using a proportionality constant K given as a real number of 1 or more, this frequency definition can be expressed by the following formulas.
f (i + 1, 0) = K · f (i, 0)
f (i, j) = K^{j / M} · f (i, 0)
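The two relations above can be combined into a single expression, f (i, j) = K^(i + j/M) · f (0, 0), if the fundamental frequencies are counted from i = 0. The sketch below evaluates it; the counting origin, the function name `grid_frequency`, and the parameter `f_base` are assumptions made for the illustration.

```python
def grid_frequency(f_base: float, k: float, i: int, j: int, m: int) -> float:
    """f(i, j) = K**(i + j/M) * f(0, 0), where f_base stands for f(0, 0)."""
    if not 0 <= j < m:
        raise ValueError("j must satisfy 0 <= j < M")
    return f_base * k ** (i + j / m)

# Semitone spacing (K = 2**(1/12)) with M = 3 reproduces the "n", "n+", "(n+1)-" grid.
K = 2.0 ** (1.0 / 12.0)
f0 = 440.0 * 2.0 ** (-69 / 12.0)      # frequency assumed for note number 0
f_60 = grid_frequency(f0, K, 60, 0, 3)
f_60_plus = grid_frequency(f0, K, 60, 1, 3)
f_61_minus = grid_frequency(f0, K, 60, 2, 3)
```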
[0039]
When code data encoded using such N × M frequencies is decoded, a sound source capable of generating N sound source signals corresponding to the fundamental frequencies is prepared. Code data having the i-th fundamental frequency f (i, 0) is reproduced using the i-th sound source signal. Code data having the j-th intermediate frequency f (i, j) among the (M−1) intermediate frequencies defined between the i-th fundamental frequency and the (i + 1)-th fundamental frequency is reproduced using a correction signal obtained by applying, to the i-th sound source signal or the (i + 1)-th sound source signal, a frequency correction corresponding to the amount of deviation from the fundamental frequency.
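The text allows either the i-th or the (i + 1)-th sound source signal to be corrected. One possible rule, shown here purely as an assumption, is to use whichever neighbour needs the smaller correction; this reproduces the "n +" and "n−" convention of the M = 3 example. The function name and the expression of the correction in cents are illustrative.

```python
import math

def correction_cents(k: float, j: int, m: int):
    """Return (source_offset, cents) for f(i, j): offset 0 uses source i, offset 1 uses source i+1."""
    step_cents = 1200.0 * math.log2(k)        # spacing of adjacent fundamentals in cents
    up_from_i = step_cents * j / m            # amount by which source i would be raised
    down_from_next = up_from_i - step_cents   # amount by which source i+1 would be lowered
    # Assumption: prefer the smaller correction; a tie goes to the lower source.
    if abs(up_from_i) <= abs(down_from_next):
        return 0, up_from_i
    return 1, down_from_next

semitone = 2.0 ** (1.0 / 12.0)
plus = correction_cents(semitone, 1, 3)    # about (0, +33.3): raise source i
minus = correction_cents(semitone, 2, 3)   # about (1, -33.3): lower source i+1
```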
[0040]
The larger the values of the parameters N and M, the higher the quality with which the original acoustic signal is encoded. However, selecting a representative frequency requires calculating correlations with periodic functions having N × M frequencies, so the calculation load also increases as N and M are increased. Further, since a general MIDI sound source provides only 128 fundamental frequencies, N = 128 is preferable for encoding that is to be reproducible on a general-purpose MIDI sound source. In addition, since the number of channels in the SMF format is 16 as described above, M ≦ 16 must be satisfied when grouping is performed using channels.
[0041]
§3. Specific encoding and decoding examples
Finally, specific examples of the acoustic signal encoding method and decoding method according to the present invention will be described. Assume that the code data shown in FIG. 7 has been obtained by applying the encoding method described so far to some original acoustic signal. In each piece of code data, the portion consisting of “N” and a numeral from 1 to 4 indicates a predetermined MIDI note number, and the “+” or “−” portion indicates the amount of deviation from the fundamental frequency indicated by that note number. That is, 128 fundamental frequencies and 256 intermediate frequencies are used for encoding: code data marked “+” has an intermediate frequency “1/3 semitone” higher than the fundamental frequency, code data marked “−” has an intermediate frequency “1/3 semitone” lower than the fundamental frequency, and code data with neither “+” nor “−” has the fundamental frequency. In FIG. 7, the code data is arranged in two rows. This is because two representative frequencies are selected for the acoustic data in each unit section, so that two pieces of code data are generated. The horizontal direction corresponds to the time axis.
[0042]
Once such code data has been generated, it is classified into three groups (tracks) as illustrated. Here, group 1 contains code data of the fundamental frequencies, group 2 contains code data of the intermediate frequencies marked “+”, and group 3 contains code data of the intermediate frequencies marked “−”. Subtracks are further provided within each group, and code data of the same frequency is accommodated in the same subtrack. For example, in group 1, code data “N1”, “N2”, and “N4” are accommodated in separate subtracks. In this example, groups 2 and 3 each contain code data of only one frequency, so only one subtrack is shown for each; when code data of different frequencies is included, it is accommodated in separate subtracks in the same manner as for group 1.
[0043]
Subsequently, as shown in FIG. 9, the symbols “+” and “−” are deleted from the code data accommodated in groups 2 and 3, that is, from the code data having intermediate frequencies. These symbols indicate that a frequency is “1/3 semitone” higher or lower than the fundamental frequency and are not defined in the general MIDI standard. They were assigned in order to classify the code data into groups (tracks), and their role ends once the classification is complete. Deleting them therefore makes the code data shown in FIG. 9 conform to the MIDI standard. Of course, the information that a group is “1/3 semitone” higher or lower is retained for each group.
[0044]
Next, the code data is integrated subtrack by subtrack. Integration is a process that merges a plurality of pieces of code data which are adjacent on the time axis and have the same or similar characteristics into a single piece of code data, thereby reducing the total number of pieces of code data. For example, when the code data shown in FIG. 9 is integrated under the policy “when pieces of code data having the same note number are adjacent, merge them”, the result shown in FIG. 10 is obtained. Each set of code data surrounded by a rectangle is merged into one piece of code data. FIG. 10 also shows an example in which groups 1 to 3 are assigned to and accommodated in channels 0 to 2 of the MIDI standard.
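The integration policy quoted above can be sketched as follows. The tuple layout (start, end, note, velocity), the function name `integrate`, and the choice of keeping the larger velocity when two items are merged are assumptions; the text specifies only that adjacent code data with the same note number are merged.

```python
def integrate(subtrack):
    """Merge time-adjacent code data that share a note number.

    Each item is (start, end, note, velocity); items are assumed sorted by start time.
    """
    merged = []
    for start, end, note, velocity in subtrack:
        if merged and merged[-1][2] == note and merged[-1][1] >= start:
            prev_start, prev_end, _, prev_velocity = merged[-1]
            # Keeping the larger velocity is an assumption, not taken from the text.
            merged[-1] = (prev_start, max(prev_end, end), note, max(prev_velocity, velocity))
        else:
            merged.append((start, end, note, velocity))
    return merged

# Two touching items are merged; the third starts after a gap and stays separate.
merged = integrate([(0, 10, 60, 80), (10, 20, 60, 90), (30, 40, 60, 70)])
```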
[0045]
FIG. 11 shows an example in which each piece of code data shown in FIG. 10 is replaced with data indicating “note on (start of performance operation)” and “note off (end of performance operation)” in MIDI data. The leading numbers 1 to 12 are serial numbers of the data. For example, “1. Ch0 N1 On” is a command that sets note number N1 of channel 0 to “note on”, and “2. Ch0 N4 On” is a command that sets note number N4 of channel 0 to “note on”. In actual MIDI data, these commands are coded and described together with a predetermined delta time.
[0046]
For reference, FIG. 12 shows a specific configuration of MIDI data (in hexadecimal notation) obtained by encoding each piece of code data shown in FIG. 10 in the SMF format. Each row in FIG. 12 corresponds to one command shown in FIG. 11, and the command is written at the end of each row of the figure. The value in the “Delta Time” column at the beginning of a row indicates the waiting time until the command is executed, and thus represents the position of each piece of code data on the time axis. The following “Status” column is a code indicating the type of command: a first digit of “9” indicates “note on”, a first digit of “8” indicates “note off”, and the second digit indicates the channel number. The next “Note Number” column contains the note number (the figure shows the symbols N1 to N4 for convenience, but in practice one of the note numbers 0 to 127 (00 to 7F in hexadecimal) is written). The last “Velocity” column contains the velocity value. In this example, a velocity is set only for “note on” (based, as described above, on the correlation value with the periodic function having the frequency corresponding to the note number), and the velocity values for “note off” are all 0.
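The row layout described above (delta time, status, note number, velocity) can be reproduced with a short sketch. The delta time is written as an SMF variable-length quantity, and the status byte is 0x9n for note on and 0x8n for note off on channel n. The function names are illustrative, and the byte values in the example are not taken from FIG. 12.

```python
def variable_length(value: int) -> bytes:
    """Encode a delta time as an SMF variable-length quantity (7 data bits per byte)."""
    if value < 0:
        raise ValueError("delta time must be non-negative")
    out = [value & 0x7F]                      # last byte has its top bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)     # earlier bytes have the top bit set
        value >>= 7
    return bytes(reversed(out))

def note_event(delta: int, on: bool, channel: int, note: int, velocity: int) -> bytes:
    """Build one row: delta time, status, note number, velocity."""
    status = (0x90 if on else 0x80) | (channel & 0x0F)
    return variable_length(delta) + bytes([status, note & 0x7F, velocity & 0x7F])

# Note number 60 on channel 1: start at once, stop 120 ticks later with velocity 0.
row_on = note_event(0, True, 1, 60, 100)
row_off = note_event(120, False, 1, 60, 0)
print(row_on.hex(" "), "/", row_off.hex(" "))
```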
[0047]
FIG. 13, on the other hand, shows a specific example of MIDI data in which the tuning commands for each channel are encoded in the SMF format. Although a detailed explanation is omitted here, the 1st to 6th lines of the code shown are commands that set channel 0 to normal tuning (reproduction at the fundamental frequency without frequency correction), the 7th to 12th lines are commands that set channel 1 to a raised tuning (a frequency correction of +33 cents relative to the fundamental frequency), and the 13th to 18th lines are commands that set channel 2 to a lowered tuning (a frequency correction of −33 cents relative to the fundamental frequency). Since a frequency correction of 100 cents corresponds to one semitone, a correction of +33 cents raises the pitch by approximately “1/3 semitone”, and a correction of −33 cents lowers it by approximately “1/3 semitone”.
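The text does not state which MIDI messages FIG. 13 uses for tuning. One standard way to set a per-channel correction in cents is the registered parameter "channel fine tuning" (RPN 00 01), selected with controllers 101 and 100 and set with data entry controllers 6 and 38. The sketch below assumes that mechanism; it is not claimed to be the message sequence of FIG. 13, and whether a given sound source honours it depends on the device.

```python
def fine_tuning_messages(channel: int, cents: float):
    """Control-change messages that select RPN 1 (fine tuning) and set it to `cents`."""
    if not -100.0 <= cents < 100.0:
        raise ValueError("fine tuning covers -100 to just under +100 cents")
    value = 8192 + round(cents * 8192 / 100.0)   # 14-bit value, 8192 means no correction
    value = max(0, min(16383, value))
    status = 0xB0 | (channel & 0x0F)             # control change on this channel
    return [
        bytes([status, 101, 0]),                 # RPN MSB = 0
        bytes([status, 100, 1]),                 # RPN LSB = 1 (channel fine tuning)
        bytes([status, 6, value >> 7]),          # data entry MSB
        bytes([status, 38, value & 0x7F]),       # data entry LSB
    ]

# Channel 0 untouched, channel 1 raised and channel 2 lowered by about 1/3 semitone.
setup = [fine_tuning_messages(ch, cents) for ch, cents in ((0, 0), (1, 33), (2, -33))]
```

In an SMF file each of these messages would additionally be preceded by a delta time and placed before the first note event of its channel.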
[0048]
In practice, the MIDI data consisting of the tuning commands shown in FIG. 13 is executed before the MIDI data consisting of the performance information shown in FIG. 12, so that reproduction takes place with the tuning of the sound source signals already completed. Since SMF format MIDI data can easily be tuned channel by channel in this way, the decoding method according to the present invention can be carried out using code data that conforms to the ordinary MIDI standard. Further, since all the information necessary for tuning can be included in the MIDI data, no special setting is required at the time of reproduction.
[0049]
The acoustic signal encoding method and decoding method according to the present invention have been described above on the basis of the illustrated embodiments. However, the present invention is not limited to these embodiments and can be implemented in various other forms. In particular, although the above embodiments show an example in which encoding is performed as MIDI data, the present invention is not limited to encoding into MIDI data and can be applied to encoding based on any standard. The acoustic signal encoding method and decoding method according to the present invention can be implemented by installing dedicated software on a general-purpose computer such as a personal computer, and such dedicated software can be recorded on a computer-readable recording medium and distributed.
[0050]
[Effect of the invention]
As described above, according to the encoding method and decoding method of an acoustic signal according to the present invention, encoding is performed using an intermediate frequency in addition to the fundamental frequency included in the sound source signal, and at the time of decoding, Since the intermediate frequency is reproduced by the frequency-corrected sound source signal, the original sound waveform can be encoded with high quality and can be decoded.
[Brief description of the drawings]
FIG. 1 is a diagram showing a basic principle of an audio signal encoding method according to the present invention.
FIG. 2 is a conceptual diagram of code data created based on the principle shown in FIG.
FIG. 3 is a diagram illustrating an example of a fundamental frequency defined when performing an audio signal encoding method according to the present invention.
FIG. 4 is a diagram showing an example in which two intermediate frequencies are defined between the fundamental frequencies shown in FIG. 3.
FIG. 5 is a diagram showing an example of grouping of code data in the audio signal encoding method according to the present invention.
FIG. 6 is a diagram showing a general example of a fundamental frequency and an intermediate frequency that are defined when the audio signal encoding method according to the present invention is implemented.
FIG. 7 is a diagram showing an example of code data created by the audio signal encoding method according to the present invention.
FIG. 8 is a diagram showing a state in which code data according to the embodiment shown in FIG. 7 is grouped.
FIG. 9 is a diagram showing a state of the code data after the grouping shown in FIG. 8.
FIG. 10 is a diagram illustrating a state in which integration processing has been performed on the code data illustrated in FIG. 9 and the result is stored in each channel.
FIG. 11 is a diagram in which each code data shown in FIG. 10 is replaced with data indicating “note on (performance operation start)” and “note off (performance operation end)” in the MIDI data.
FIG. 12 is a diagram showing specific MIDI data obtained by encoding each code data shown in FIG. 10 in the SMF format.
FIG. 13 is a diagram showing specific MIDI data obtained by encoding a tuning command for each channel in an SMF format.
[Explanation of symbols]
d1 to d5 ... unit section
f (i, 0), f (i + 1, 0) ... fundamental frequency
f (i, 1), f (i, 2), f (i, j), f (i, M-1) ... intermediate frequency
n ... Note number
N1-N4 ... Note number
t1-t6 ... Time
T1 to T3 ... sub track

Claims

An encoding method for encoding an acoustic signal given as a time-series intensity signal, comprising:
an input stage of capturing the acoustic signal to be encoded as digital acoustic data;
a section setting stage of setting a plurality of unit sections on the time axis of the acoustic data;
a frequency definition stage of defining N fundamental frequencies (N > 1) and defining (M−1) intermediate frequencies (M > 1) between each pair of adjacent fundamental frequencies;
a representative frequency selection stage of calculating, for the acoustic data in each unit section, a correlation value with a periodic function having each of the defined frequencies, and selecting, based on the correlation values, a representative frequency indicating a representative frequency component contained in the acoustic data;
an encoding stage of expressing the acoustic data in each unit section by code data including information indicating the selected representative frequency, the calculated correlation value, and the position of the unit section on the time axis; and
a code output stage of classifying the code data into M groups according to the amount of deviation of the representative frequency from the fundamental frequency, and outputting the code data separately for each group.
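The representative frequency selection stage can be illustrated with a minimal Python sketch. It assumes that the periodic functions are a sine and cosine pair and that the correlation value is the amplitude obtained from both; the claim only requires "a periodic function", so this choice, like the names `correlation` and `select_representative`, is an illustrative assumption and not the claimed implementation.

```python
import math

def correlation(samples, freq, sample_rate):
    """Correlate one unit section with a sine/cosine pair at freq (assumed form).

    Returns an amplitude-like value: about A for a pure tone A*sin(2*pi*freq*t)
    when the section holds a whole number of periods.
    """
    n = len(samples)
    a = sum(x * math.sin(2 * math.pi * freq * k / sample_rate)
            for k, x in enumerate(samples))
    b = sum(x * math.cos(2 * math.pi * freq * k / sample_rate)
            for k, x in enumerate(samples))
    return 2.0 * math.hypot(a, b) / n

def select_representative(samples, freqs, sample_rate, count=1):
    """Return the `count` candidate frequencies with the largest correlation,
    as (frequency, correlation) pairs, strongest first."""
    scored = [(correlation(samples, f, sample_rate), f) for f in freqs]
    scored.sort(reverse=True)
    return [(f, c) for c, f in scored[:count]]

# Hypothetical unit section: 0.1 s of a 440 Hz tone with amplitude 0.5.
rate = 8000
section = [0.5 * math.sin(2 * math.pi * 440.0 * k / rate) for k in range(800)]
best = select_representative(section, [415.30, 440.0, 466.16], rate)
```

In this sketch `best[0]` holds the selected frequency together with its correlation value, which are two of the three items the encoding stage stores in the code data.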

The acoustic signal encoding method according to claim 1, wherein,
where f(i, 0) denotes the i-th fundamental frequency (i = 1, 2, ..., N) among the N fundamental frequencies, and f(i, j) denotes the j-th intermediate frequency (j = 1, 2, ..., M−1) among the (M−1) intermediate frequencies defined between the i-th fundamental frequency and the (i+1)-th fundamental frequency, the frequencies are defined in the frequency definition stage, using a proportionality constant K given as a real number of 1 or more, so that the equations
f(i+1, 0) = K · f(i, 0)
f(i, j) = K^(j/M) · f(i, 0)
hold.
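As a worked illustration of the two equations, the following Python sketch builds the table of frequencies f(i, j). The function name `define_frequencies`, the zero-based index i, and the default K = 2^(1/12) (one equal-tempered semitone, as for MIDI note numbers) are assumptions made for the example; the claim itself only requires K to be a real number of 1 or more.

```python
def define_frequencies(f0, n_fundamentals, m, k=2 ** (1 / 12)):
    """Return table[i][j] = f(i, j) for zero-based i.

    f(i+1, 0) = k * f(i, 0) and f(i, j) = k**(j/m) * f(i, 0).
    Intermediate frequencies lie between adjacent fundamentals, so the
    last fundamental gets no intermediate entries.
    """
    table = []
    for i in range(n_fundamentals):
        base = f0 * k ** i
        steps = range(m) if i < n_fundamentals - 1 else range(1)
        table.append([base * k ** (j / m) for j in steps])
    return table

# Hypothetical example: two fundamentals starting at 440 Hz, M = 3.
table = define_frequencies(440.0, 2, 3)
```

With M = 3 and K equal to a semitone, `table[0]` holds 440 Hz and the two intermediate frequencies one third and two thirds of a semitone above it, and `table[1][0]` is the next semitone.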

The acoustic signal encoding method according to claim 2, wherein,
in the encoding stage, encoding is performed with MIDI data in which the representative frequency is expressed by a note number, the correlation value by a velocity, and the position of the unit section on the time axis by a delta time; M channels are associated with the M groups, respectively; and the code data classified into each group is output to the corresponding channel.
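The mapping onto channels can be sketched as follows. The `Code` record, the scaling of the correlation value to a velocity of 1 to 127, and the tick-based positions are hypothetical choices for illustration only; the sketch produces plain tuples and not actual SMF bytes.

```python
from dataclasses import dataclass

@dataclass
class Code:
    note: int        # note number of the fundamental f(i, 0)
    j: int           # deviation step, 0 .. M-1 (selects the group/channel)
    strength: float  # correlation value, assumed normalised to 0..1
    start: int       # start of the unit section, in ticks
    end: int         # end of the unit section, in ticks

def to_channel_events(codes, m):
    """Sort code data into M channels and express positions as delta times.

    Returns one list per channel of (delta, kind, note, velocity) tuples.
    """
    channels = [[] for _ in range(m)]
    for c in codes:
        velocity = max(1, min(127, round(c.strength * 127)))
        channels[c.j].append((c.start, "note_on", c.note, velocity))
        channels[c.j].append((c.end, "note_off", c.note, 0))
    tracks = []
    for events in channels:
        # At equal ticks, emit note_off before note_on.
        events.sort(key=lambda e: (e[0], e[1] != "note_off"))
        previous, track = 0, []
        for tick, kind, note, velocity in events:
            track.append((tick - previous, kind, note, velocity))
            previous = tick
        tracks.append(track)
    return tracks

# Hypothetical code data with M = 3 groups.
codes = [Code(69, 0, 1.0, 0, 480), Code(69, 1, 0.5, 480, 960),
         Code(71, 0, 0.25, 480, 960)]
tracks = to_channel_events(codes, 3)
```

Keeping one channel per deviation step means that every note in a channel needs the same frequency correction, so a decoder can apply that correction once per channel.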

A decoding method for decoding code data encoded by the acoustic signal encoding method according to claim 1, wherein
a device that executes the decoding method uses a sound source capable of generating sound source signals at the N fundamental frequencies; a group of code data having the fundamental frequencies is reproduced using the sound source signals; and a group of code data having the intermediate frequencies is reproduced using corrected signals obtained by applying, to the sound source signals, a frequency correction corresponding to the amount of deviation from the fundamental frequency.

A decoding method for decoding code data encoded by the acoustic signal encoding method according to claim 2, wherein
a device that executes the decoding method uses a sound source capable of generating N sound source signals corresponding to the fundamental frequencies;
code data having the i-th fundamental frequency f(i, 0) is reproduced using the i-th sound source signal; and
code data having the j-th intermediate frequency f(i, j), among the (M−1) intermediate frequencies defined between the i-th fundamental frequency and the (i+1)-th fundamental frequency, is reproduced using a corrected signal obtained by applying, to the i-th sound source signal or the (i+1)-th sound source signal, a frequency correction corresponding to the amount of deviation from the fundamental frequency.

A decoding method for decoding code data encoded by the acoustic signal encoding method according to claim 3, wherein
a device that executes the decoding method uses a MIDI sound source capable of generating N sound source signals corresponding to the fundamental frequencies and sets, for each channel, a specific frequency correction amount for the sound source signals, so that the sound source signals are reproduced as they are for a channel containing code data having the fundamental frequencies, and sound source signals frequency-corrected on the basis of the specific frequency correction amount are reproduced for a channel containing code data having the intermediate frequencies.
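The per-channel correction amounts can be illustrated with a short Python sketch. It assumes that K is one equal-tempered semitone and that the correction is applied as a MIDI pitch bend with a bend range of ±200 cents; the tuning command actually used in the embodiment may differ, and the function names are hypothetical.

```python
import math

def channel_corrections_cents(m, k=2 ** (1 / 12)):
    """Upward correction in cents for each of the M channels.

    Channel j carries code data with frequency k**(j/m) * f(i, 0),
    so its offset from the fundamental is j/m of the interval k.
    """
    interval_cents = 1200 * math.log2(k)
    return [interval_cents * j / m for j in range(m)]

def pitch_bend_value(cents, bend_range_cents=200.0):
    """14-bit pitch bend value (0..16383, centre 8192) for an offset in cents.

    Assumes the sound source's bend range is +/- bend_range_cents.
    """
    value = 8192 + round(cents / bend_range_cents * 8192)
    return max(0, min(16383, value))

# Hypothetical setup with M = 3: offsets of 0, 1/3 and 2/3 of a semitone.
corrections = channel_corrections_cents(3)
bends = [pitch_bend_value(c) for c in corrections]
```

A correction of two thirds of a semitone upward from the i-th sound source signal gives the same frequency as one third of a semitone downward from the (i+1)-th, so either signal can serve as the basis for the correction.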

A computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic signal encoding method according to any one of claims 1 to 3.

A computer-readable recording medium on which is recorded a program for causing a computer to execute the acoustic signal decoding method according to any one of claims 4 to 6.