JP4502246B2 - Pitch determination device

Info

Publication number: JP4502246B2
Application number: JP2003119776A
Authority: JP
Inventors: 雅章岡; 毅久郎山内
Original assignee: Kawai Musical Instrument Manufacturing Co Ltd
Current assignee: Kawai Musical Instrument Manufacturing Co Ltd
Priority date: 2003-04-24
Filing date: 2003-04-24
Publication date: 2010-07-14
Anticipated expiration: 2023-04-24
Also published as: JP2004325744A

Description

【０００１】
【発明の属する技術分野】
本発明は、音程判定装置に関し、特に、正確なハーモニーつまり和声を発生させるように歌唱もしくは楽器演奏するための練習に適した音程判定装置に関する。
【０００２】
【従来の技術】
自分で唱った歌の正確さを自分で判断しつつ歌唱練習できるようにするため、歌の正確さを表示装置に表示することができる歌唱自習器が知られる（特公昭６０−３８７８号公報）。この自習器では、手本となる歌信号とマイクロフォンから入力された練習者の歌信号とに関して歌のメロディや音量などの特徴を抽出し、両歌信号の前記特徴を表示装置に同時に表示させる。歌唱者はこの表示によって自分の歌と手本の歌とのずれを視覚的に認識できる。
【０００３】
【発明が解決しようとする課題】
上記歌唱自習器は、手本となる歌信号に対して練習者の歌信号を一致させるようにする練習では効果が期待できる。しかし、この歌唱自習器の表示では、複数の歌唱者の歌声がきれいな和声になっているか、つまり正確なハーモニーを生じているかを判断することはできない。合唱など、多数の人が歌唱するような楽曲においては、一人一人の歌唱者が和音の一音を分担して正確に歌唱しなければならない。そこで、単に手本となる音との一致を判断できるような表示にとどまらず、複数の歌声の和声的音程の正確さを判断できる表示を伴う練習装置が望まれる。
【０００４】
本発明は、上記課題に鑑み、複数の歌唱音もしくは演奏音が正確な和声を生じているかどうかを表示することができる音程練習装置を提供することを目的とする。
【０００５】
【課題を解決するための手段】
上記の課題を解決し、目的を達成するための本発明は、マイクロフォンからの入力音の周波数スペクトルに現れたピークのうちからレベルの高い順に、予定数のピークの周波数を基本周波数として決定する基本周波数抽出手段と、決定された前記基本周波数のうち周波数が隣接するもの同士の比の値によって音程を算出する音程算出手段と、算出音程に近い所定の音程に対応する音程名を表示する音程名表示手段とを具備した点に第１の特徴がある。
【０００６】
第１の特徴によれば、例えば、マイクロフォンから同時に二つの音源で発生された音が入力されたときに、その合成音の周波数スペクトルから二つの基本周波数が抽出される。そして、複数の基本周波数の音程に近い音程名が選ばれて表示される。この表示された音程名をみて、正確な和声の音が入力されたか否かを判断することができる。
【０００９】
また、本発明は、前記音程算出手段で算出された音程と該音程に近い所定の音程との違いを音程の正確さとして表示する正確さ表示手段を具備した点に第２の特徴がある。第２の特徴によれば、表示により所定の音程との違いを詳細に判断することができる。
【００１０】
また、本発明は、前記予定数の基本周波数に対応する音高を譜面として表示する音高表示手段を具備した点に第３の特徴がある。第３の特徴によれば、入力された音の音高を譜面上で確認できる。
【００１１】
また、本発明は、前記基本周波数抽出手段で基本周波数の抽出対象となる周波数範囲を設定し、周波数分析手段で分解された入力音の各周波数成分を前記周波数範囲に制限するようにした点に第４の特徴がある。
【００１２】
さらに、本発明は、前記基本周波数抽出手段が、検出された前記予定数のピークの周波数を仮の基本周波数とし、該仮の基本周波数と該基本周波数にそれぞれ隣接する周波数成分の周波数との補間により真の基本周波数を決定するように構成された点に第５の特徴がある。
【００１３】
第５の特徴によれば、周波数分析の周波数分解能より小さい音程を正確に検出することができる。
【００１４】
また、本発明は、前記マイクロフォンから入力された音が歌唱音または楽器演奏音である点に第６の特徴がある。第６の特徴によれば、例えば、複数の歌唱者による音声や複数人による楽器演奏音が同時に入力された場合に各歌唱者等が発生した音の音程が表示されるので、歌唱者等はその音程名の表示をみて、正確な和声で発声もしくは演奏できたか否かを判断することができる。
【００１５】
【発明の実施の形態】
以下、図面を参照して本発明を詳細に説明する。図１は本発明の一実施形態に係る歌唱練習装置のシステム構成を示すブロック図である。同図において、歌唱練習装置１は、マイクロフォン２、Ａ／Ｄ変換器３、音声信号処理装置４、および表示装置５からなる。音声信号処理装置４は、フーリエ変換部４０、基本周波数抽出部４１、周波数比算出部４２、音程検出部４３、および表示情報生成部４４を含む。音声信号処理装置４は、ＣＰＵ、ＲＯＭ、ＲＡＭ、およびマイクロフォン２から入力される音声信号を受け入れるインタフェース等を備えるパーソナルコンピュータによって実現でき、表示装置５としては液晶ディスプレイやブラウン管等、周知の表示装置を使用することができる。なお、パーソナルコンピュータは、後述する発音のための音源装置およびサウンドシステムをさらに備えるのが望ましい。
【００１６】
Ａ／Ｄ変換器３は、マイクロフォン２から入力されたアナログ音声信号をデジタル音声信号に変換する。ここでは、マイクロフォン２に複数の音声信号が同時入力される場合を想定するので、Ａ／Ｄ変換器３には、複数の音声信号が合成されて入力される。
【００１７】
フーリエ変換部４０は、高速フーリエ変換機能を備え、Ａ／Ｄ変換器３から入力された合成音声信号を基本周波と高調波とに分解する。基本周波数抽出部４１は、高速フーリエ変換によって分解された各周波数ごとの音声信号レベルを比較して、レベルすなわちパワーが最も大きい二つのピークを検出する。ここでは、二つの音声信号（例えば、二人の歌唱者がハーモニーを生じさせるために同時に発音したときの音声信号）に対応して二つのピークを検出する。したがって、さらに多くの歌唱者による練習では、その歌唱者の人数分もしくはパート分のピークが検出される。本明細書では、ここで検出されたピークのそれぞれの周波数を音声信号の「仮の基本周波数」と呼ぶ。
【００１８】
基本周波数抽出部４１は、さらに、二つの仮の基本周波数に隣接する周波数をそれぞれ検出し、仮の基本周波数と該隣接する周波数とを補間する。一般に、高速フーリエ変換において周波数方向の分解能は分析時間窓の長さつまりポイント数によって決まる。例えば、サンプリング周波数が４８ＫＨｚのときポイント数２０４８で高速フーリエ変換すると周波数分解能は２３Ｈｚになる。この分解能では半音間の周波数差、例えば、Ｃ３とＣ３＃の周波数差１５．５Ｈｚを判別することができない。そこで、二つの周波数を補間することによってフーリエ変換の低い周波数分解能を補い、正確な基本周波数（仮の基本周波数に対して「真の基本周波数」と呼ぶ）を抽出する。補間手法として例えば、複素スペクトル内挿法を用いることができる。複素スペクトル内挿法はさらに後述する。
【００１９】
周波数比算出部４２は、マイクロフォン２から同時に入力される複数の音声信号の真の基本周波数の比率を算出する。音程検出部４３は、和音の音程に対応する所定の周波数比と周波数比算出部４２で算出された周波数比とを比較して音程名を検出する。すなわち、算出された周波数比と最も近い所定の周波数比の音程名を検出音程名として決定する。また、この検出音程名の周波数比に対する算出された周波数比のずれの大きさによって音程の正確さも検出される。音程の正確さは予め設定されたずれ量と対応して決定される。
【００２０】
表示情報生成部４４は、前記音程検出部４３で検出された音程名と正確さとを、予め設定された表示画面上に合成できる表示情報に変換する。
【００２１】
図２は、フーリエ変換部４０から出力される音声信号の周波数スペクトルの例を示す図であり、横軸が周波数（対数表示）、縦軸がレベル（デシベル）である。この例では、二人が同時に音声を入力しているので、仮の基本周波数は二つ検出される。周波数ｆ１，ｆ２が仮の基本周波数である。そして、仮の基本周波数ｆ１，ｆ２ならびにこれらにそれぞれ隣接する、次に高い周波数ｆ１ａおよびｆ２ａが抽出されて補間に使用される。
【００２２】
補間手法の一例として、複素スペクトル内挿法について説明する。図３は、仮の基本周波数およびその近傍の周波数成分の一例をベクトル表示した図である。このように周波数成分は虚軸、実軸、および周波数軸上に三次元表示される。この例では、各周波数成分のベクトル方向は仮の基本周波数（周波数成分がＺｍの周波数）以下の周波数と、仮の基本周波数より高い周波数とで異なる（前者はφ、後者はφ＋π、つまり反転している）。この各周波数成分の逆数表示を図４に示す。同図において、横軸は周波数、縦軸は（１／Ｚｍ・Ｚｍ／│Ｚｍ│）である。この図のように周波数成分は直線上に乗る。したがって、真の基本周波数（ピーク周波数）ｆは、この直線のゼロ交点を算出することにより求められる。
【００２３】
つまり、真の基本周波数ｆは、次式で算出される。ｆ＝ｍ＋（ｕ，Ｚｍ＋１）／（（ｕ，Ｚｍ＋１）−（ｕ，Ｚｍ））。この式において、ｕはｍ番目およびｍ＋１番目の成分ＺｍとＺｍ＋１との差のベクトルの方向を持つ単位ベクトルであり、（・，・）は、内積を表す。
【００２４】
なお、ここでは真の基本周波数が仮の基本周波数より高周波数側にある例を示したが、真の基本周波数が仮の基本周波数より低周波数側にあることもある。その場合は、周波数ｍと周波数ｍ−１とで補間する。なお、図３のベクトル表示例および図４の逆数表示例ならびに複素スペクトル内挿法に関しては、共立出版株式会社発行のbit（ビット）別冊（昭和６２年９月９日発行）の第３２頁〜４１頁、および原、井口：複素スペクトルを用いた周波数同定、計測自動制御学会論文集、Ｖｏｌ．１９，Ｎｏ．９（１９８３）に詳述されている。
【００２５】
図５は、表示装置５の表示例を示す図であり、表示情報生成部４４で生成された表示情報に基づく表示例である。同図において、表示事項には、入力音声に基づいて決定された音程名、入力音声の正確さ、および譜面形式による入力音声の音高が含まれる。音程名は「長３度」等、具体的な名称で表示される。また、正確さを示すインジケータ５０は、入力音声の音程と上記決定された音程（この例では長３度）との差を示す。つまり、決定された音程より入力音声の音程が大きい場合は、インジケータ５０は右寄りに表示され、入力音声の音程が小さい場合はインジケータ５０は左寄りに表示される。表示例では、決定された音程と入力音声の音程とがほぼ一致している。
【００２６】
図５の表示例では、上記音程名などの表示に加えて、各種制御のための指示部の表示を含む。レベル操作子５１は録音レベルつまりマイクロフォン２から入力される音声のゲインを調整するためのものである。表示例下部のレベルメータ５２によってマイクロフォン２による入力音声のレベルが表示されるので、歌唱者はこのレベルを認識して、レベル操作子５１を操作することにより、録音レベルを適正にできる。
【００２７】
また、表示例の右側に表示される音域操作子５３は、前記基本周波数抽出部４１で音域を制限するために使用される。図２のスペクトル表示例では、明らかなピークが二つだけであり、仮の基本周波数を正しく判別できる。しかし、常にこのように所望のピークが顕著に現れるとは限らない。例えば、倍音構成が似た１オクターブ外れた音のスペクトルが強く現れることがあるので、このような場合、正確な仮の基本周波数が検出されず、誤った音程名が決定される。その場合、歌唱者は入力音声表示から異常に気づくことができるので、周波数制限手段としての音域操作子５３を操作して、音域を狭めることができる。音域はライン５４と５５とによって上下を規定された領域で表示される。ライン５４および５５は、音域操作子５３を操作すると個々に上下される。フーリエ変換部４０から出力される周波数成分のうち、音域操作子５３で指示された周波数範囲で設定される音域以外の周波数成分は、前記基本周波数抽出部４１では処理対象外にして、精度の高い判断をできるようにする。なお、操作子５１，５３は、パーソナルコンピュータのマウス等、周知の指示手段によって操作できる。
【００２８】
さらに、図５の表示例では、独習のための音声を選択するためのスイッチ（発音選択スイッチ）５６が設けられる。上記実施形態では、マイクロフォン２から複数人が同時に音声を入力した例を説明した。しかし、一人で合唱の練習する場合にも本発明は使用できる。そのような独習に使用される音声または楽音（例えば管楽器音のように安定して持続する楽音や合成音）を比較音として発生するため発音選択スイッチ５６が使用される。発音選択スイッチ５６を操作すると、鍵盤５７が表示される。そして、この鍵盤５７の各鍵を指示することにより、指示された鍵に対応する音高の音が比較音として発生される。発音選択スイッチ５６と鍵盤５７が比較音の指定手段である。鍵盤５７の鍵を指示することによる比較音の発音機能は、指示された音高に応じて音声や楽音を合成する音声合成装置、またはＰＣＭ音源装置、ならびにアンプやスピーカ等のサウンドシステムによって実現できる。
【００２９】
鍵盤５７の鍵を指示した場合、前記基本周波数抽出部４１では、マイクロフォン２から入力された音声の基本周波数を検出するとともに、該基本周波数と比較音の基本周波数との比を算出する。つまり、鍵盤５７の鍵を指示することによる比較音と入力音声との音程を検出してハーモニーの正確さを判定する。なお、この判定において、比較音を入力音声と同様にマイクロフォン２で拾って周波数比算出部４２で入力音声と比較してもよいし、前記音声合成装置で生成される基本周波数やＰＣＭ音源装置で発生される音の基本周波数をマイクロフォン２の入力音声と比較しても良い。
【００３０】
なお、比較音の指定手段は発音選択スイッチ５６と鍵盤５７とに限らず、任意のスイッチ手段で代替できる。例えば、パーソナルコンピュータのキーボードの入力キーに予め音高を割り当てておき、このキーを使って比較音の指定をしてもよい。
【００３１】
上述の実施形態では仮の基本周波数を補間して精度を抽出上げているが、簡易的には、補間を省略して仮の基本周波数同士の周波数比で音程名を決定しても良い。
【００３２】
上述のように、本実施形態によれば、複数の練習者の音声のハーモニーの判定や、練習者の音声と比較音との音声のハーモニーの判定・表示を行うことができる。なお、基本周波数の抽出のために、複素スペクトル内挿法を用いる例を示したが、位相差計測法など、他の公知手法を用いることができる。また、音程名は、平均律音程に限らず、純正調律音程等による音程名で表示するようにしてもよい。
【００３３】
また、３人の歌唱者が同時に音声を入力するときには、検出された三つの基本周波数の隣接するもの同士の比の値により音程名を決定して表示するようにする。この場合、音程は二つ検出されるので、音程名および正確さは２種類表示する。
【００３４】
さらに、前記サウンドシステムの出力をヘッドフォンに接続する手段を備えるのが望ましい。練習者がヘッドフォンから比較音を聴きながら、この比較音に自分の音声を重ねるように発声できるからである。
【００３５】
また、さらに、本発明は、歌唱練習だけでなく、楽器の合奏練習としてマイクロフォンから楽器音を入力して、この入力音と他の楽器音もしくは比較音との音程を判定し、音程名等を表示する場合にも適用できる。
【００３６】
【発明の効果】
以上の説明から明らかなように、請求項１〜８の発明によれば、複数の入力音同士の音程または入力音と比較音との音程を判断してその音程に近い所定の音程の音程名を表示することができる。したがって、歌唱練習や楽器練習において、複数の練習者同士の演奏音や音声または練習者と練習装置が発生する音同士の音程を視覚で認識することができるので、ハーモニー発生の練習効果が上がる。
【００３７】
特に、請求項３の発明によれば、音程の正確さを視覚で判断でき、請求項４の発明によれば、発音の音高を視覚で判断できる。
【００３８】
また、請求項５の発明によれば、基本周波数抽出手段に入力される周波数成分の周波数範囲が制限され、例えば倍音構成が同様である１オクターブ上の音の基本周波数として誤って抽出される等の不具合を防止できる。
【００３９】
さらに、請求項６の発明によれば、周波数分析手段の分解能よりも高い分解能で基本周波数を抽出することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る歌唱練習装置の要部機能を示すブロック図である。
【図２】フーリエ変換部から出力される音声信号の周波数スペクトルの例を示す図である。
【図３】仮の基本周波数およびその近傍の周波数成分をベクトル表示した図である。
【図４】周波数分析された各周波数成分を逆数表示した図である。
【図５】表示装置による音声等の表示例を示す図である。
【符号の説明】
１…歌唱練習装置、２…マイクロフォン、４…音声信号処理装置、５…表示装置、４０…フーリエ変換部、４１…基本周波数抽出部、４２…周波数比算出部、４３…音程検出部
[0001]
[Technical Field of the Invention]
The present invention relates to a pitch determination device, that is, a device that determines the musical interval between sounds, and more particularly to a pitch determination device suitable for practicing singing or instrument playing so as to produce accurate harmony.
[0002]
[Prior art]
A singing self-study device is known that displays the accuracy of a song on a display device so that a person can practice singing while judging the accuracy of his or her own singing (Japanese Patent Publication No. 60-3878). This self-study device extracts features such as melody and volume from both a model song signal and the practitioner's song signal input from a microphone, and displays the features of both song signals simultaneously on the display device. From this display, the singer can visually recognize the difference between his or her own singing and the model song.
[0003]
[Problems to be solved by the invention]
The singing self-study device described above can be expected to be effective for practice in which the practitioner's song signal is matched to the model song signal. However, its display cannot show whether the voices of a plurality of singers form a clean harmony, that is, whether an accurate harmony is produced. In music sung by many people, such as choral music, each singer must take one note of a chord and sing it accurately. There is therefore a demand for a practice device whose display does not merely show agreement with a model sound but also makes it possible to judge the accuracy of the harmonic pitch among a plurality of singing voices.
[0004]
In view of the above-described problems, an object of the present invention is to provide a pitch practice device capable of displaying whether or not a plurality of singing sounds or performance sounds produce an accurate harmony.
[0005]
[Means for Solving the Problems]
In order to solve the above-described problems and achieve the object, the present invention has a first feature in that it comprises: fundamental frequency extraction means for determining, as fundamental frequencies, the frequencies of a predetermined number of peaks taken in descending order of level from among the peaks appearing in the frequency spectrum of the input sound from a microphone; pitch calculation means for calculating a pitch from the ratio between those of the determined fundamental frequencies that are adjacent in frequency; and pitch name display means for displaying a pitch name corresponding to a predetermined pitch close to the calculated pitch.
[0006]
According to the first feature, for example, when sounds generated by two sound sources are input simultaneously from a microphone, two fundamental frequencies are extracted from the frequency spectrum of the combined sound. A pitch name close to the pitch between the plurality of fundamental frequencies is then selected and displayed. By looking at the displayed pitch name, it can be determined whether or not sounds in accurate harmony have been input.
[0009]
In addition, the present invention has a second feature in that it includes accuracy display means for displaying the difference between the pitch calculated by the pitch calculation means and a predetermined pitch close to the pitch as the accuracy of the pitch. According to the second feature, the difference from the predetermined pitch can be determined in detail by display.
[0010]
Further, the present invention has a third feature in that a pitch display means for displaying a pitch corresponding to the predetermined number of fundamental frequencies as a musical score is provided. According to the third feature, the pitch of the input sound can be confirmed on the score.
[0011]
Further, the present invention has a fourth feature in that a frequency range from which the fundamental frequency extraction means extracts fundamental frequencies is set, and the frequency components of the input sound decomposed by the frequency analysis means are limited to that frequency range.
[0012]
Further, the present invention has a fifth feature in that the fundamental frequency extraction means is configured to take the frequencies of the detected predetermined number of peaks as provisional fundamental frequencies, and to determine each true fundamental frequency by interpolation between the provisional fundamental frequency and the frequency of a frequency component adjacent to it.
[0013]
According to the fifth feature, it is possible to accurately detect a pitch smaller than the frequency resolution of frequency analysis.
[0014]
The present invention has a sixth feature in that the sound input from the microphone is a singing sound or a musical instrument performance sound. According to the sixth feature, for example, when the voices of a plurality of singers or the instrument sounds played by a plurality of people are input simultaneously, the pitch of the sounds they produce is displayed. By looking at the displayed pitch name, the singers or players can determine whether or not they have sung or played in accurate harmony.
[0015]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, the present invention will be described in detail with reference to the drawings. FIG. 1 is a block diagram showing a system configuration of a singing practice device according to an embodiment of the present invention. In FIG. 1, the singing practice device 1 includes a microphone 2, an A/D converter 3, an audio signal processing device 4, and a display device 5. The audio signal processing device 4 includes a Fourier transform unit 40, a fundamental frequency extraction unit 41, a frequency ratio calculation unit 42, a pitch detection unit 43, and a display information generation unit 44. The audio signal processing device 4 can be realized by a personal computer including a CPU, ROM, RAM, and an interface that receives the audio signal input from the microphone 2, and a known display device such as a liquid crystal display or a cathode ray tube can be used as the display device 5. The personal computer preferably further includes a sound source device and a sound system for sound generation, which will be described later.
[0016]
The A/D converter 3 converts the analog audio signal input from the microphone 2 into a digital audio signal. Because it is assumed here that a plurality of audio signals are input to the microphone 2 simultaneously, the plurality of audio signals reach the A/D converter 3 as a single combined signal.
[0017]
The Fourier transform unit 40 has a fast Fourier transform function and decomposes the combined audio signal input from the A/D converter 3 into fundamental and harmonic components. The fundamental frequency extraction unit 41 compares the audio signal levels of the individual frequencies obtained by the fast Fourier transform and detects the two peaks with the highest level, that is, power. Here, two peaks are detected, corresponding to two audio signals (for example, the audio signals produced when two singers vocalize simultaneously to produce harmony). Accordingly, when more singers practice together, as many peaks are detected as there are singers or parts. In this specification, the frequency of each peak detected here is referred to as a “provisional fundamental frequency” of the audio signal.
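The peak search described in this paragraph can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the function name `strongest_peaks`, the local-maximum test, and the sample spectrum are all introduced here for illustration.

```python
def strongest_peaks(levels, count=2):
    """Return the bin indices of the `count` highest local maxima, sorted by bin."""
    # Assumed peak test: a bin is a peak when it exceeds its left neighbour
    # and is not below its right neighbour.
    peaks = [
        k for k in range(1, len(levels) - 1)
        if levels[k] > levels[k - 1] and levels[k] >= levels[k + 1]
    ]
    # Rank the peaks by level, highest first, and keep the requested number.
    top = sorted(peaks, key=lambda k: levels[k], reverse=True)[:count]
    return sorted(top)


# Hypothetical power spectrum with local maxima at bins 2, 5, and 8.
spectrum = [0.1, 0.3, 4.0, 0.5, 0.2, 6.0, 0.4, 0.1, 1.5, 0.2]
provisional_bins = strongest_peaks(spectrum, count=2)
```

Each returned bin index corresponds to a provisional fundamental frequency of index times the bin spacing (sampling frequency divided by the number of FFT points).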
[0018]
The fundamental frequency extraction unit 41 further detects the frequency adjacent to each of the two provisional fundamental frequencies and interpolates between each provisional fundamental frequency and its adjacent frequency. In general, the frequency resolution of a fast Fourier transform is determined by the length of the analysis time window, that is, the number of points. For example, at a sampling frequency of 48 kHz, a 2048-point fast Fourier transform gives a frequency resolution of about 23 Hz. This resolution cannot distinguish the frequency difference between semitones, for example the 15.5 Hz difference between C3 and C#3. Interpolating between the two frequencies therefore compensates for the low frequency resolution of the Fourier transform, so that an accurate fundamental frequency (called the “true fundamental frequency” in contrast to the provisional fundamental frequency) is extracted. A complex spectrum interpolation method, for example, can be used for the interpolation; it is described later.
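The arithmetic behind this example can be checked with a short calculation. The sketch below rests on two assumptions that the text does not state: equal temperament with A = 440 Hz, and a note-naming scheme in which "C3" denotes middle C (about 261.63 Hz), which is the reading under which the 15.5 Hz figure comes out.

```python
sample_rate_hz = 48_000
fft_points = 2048

# Spacing between adjacent FFT bins.
bin_spacing_hz = sample_rate_hz / fft_points

# Assumption: "C3" is middle C, nine semitones below A = 440 Hz.
middle_c_hz = 440.0 * 2 ** (-9 / 12)

# Frequency gap between middle C and the semitone above it.
semitone_gap_hz = middle_c_hz * (2 ** (1 / 12) - 1)

# The semitone gap is smaller than one bin, so the raw FFT cannot resolve it.
resolvable = semitone_gap_hz > bin_spacing_hz
```

Under these assumptions the bin spacing is 23.4375 Hz and the semitone gap is roughly 15.56 Hz, which is why interpolation between bins is needed.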
[0019]
The frequency ratio calculation unit 42 calculates the ratio between the true fundamental frequencies of the plurality of audio signals input simultaneously from the microphone 2. The pitch detection unit 43 detects a pitch name by comparing predetermined frequency ratios, each corresponding to the pitch between notes of a chord, with the frequency ratio calculated by the frequency ratio calculation unit 42. That is, the pitch name whose predetermined frequency ratio is closest to the calculated frequency ratio is determined as the detected pitch name. The accuracy of the pitch is also detected from the size of the deviation of the calculated frequency ratio from the frequency ratio of the detected pitch name. The accuracy of the pitch is determined in correspondence with preset deviation amounts.
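One way to picture the comparison against predetermined frequency ratios is the sketch below. It makes several assumptions not found in the text: the table holds equal-tempered intervals within one octave, the deviation is measured in cents, and the names `INTERVALS` and `classify_interval` are hypothetical.

```python
import math

# Assumed table: equal-tempered intervals within one octave, in cents.
INTERVALS = [
    ("unison", 0), ("minor second", 100), ("major second", 200),
    ("minor third", 300), ("major third", 400), ("perfect fourth", 500),
    ("tritone", 600), ("perfect fifth", 700), ("minor sixth", 800),
    ("major sixth", 900), ("minor seventh", 1000), ("major seventh", 1100),
    ("octave", 1200),
]


def classify_interval(f_low, f_high):
    """Return (name, deviation_in_cents) for the nearest tabulated interval."""
    # Convert the frequency ratio to cents (1200 cents per octave).
    cents = 1200.0 * math.log2(f_high / f_low)
    name, target = min(INTERVALS, key=lambda item: abs(item[1] - cents))
    return name, cents - target


# Two fundamentals in the ratio 5:4.
name, deviation = classify_interval(400.0, 500.0)
```

Here the nearest entry is the major third, and the deviation is negative (about 13.7 cents narrow), meaning the sounded pitch is smaller than the tabulated one.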
[0020]
The display information generation unit 44 converts the pitch name and accuracy detected by the pitch detection unit 43 into display information that can be synthesized on a preset display screen.
[0021]
FIG. 2 is a diagram illustrating an example of a frequency spectrum of an audio signal output from the Fourier transform unit 40, where the horizontal axis represents frequency (logarithmic display) and the vertical axis represents level (decibel). In this example, since two people are inputting voice at the same time, two provisional fundamental frequencies are detected. The frequencies f1 and f2 are provisional fundamental frequencies. Then, the provisional fundamental frequencies f1 and f2 and the next higher frequencies f1a and f2a adjacent to these are extracted and used for interpolation.
[0022]
A complex spectrum interpolation method will be described as an example of the interpolation method. FIG. 3 is a vector diagram showing an example of the provisional fundamental frequency and the frequency components in its vicinity. As shown, the frequency components are displayed three-dimensionally on the imaginary axis, the real axis, and the frequency axis. In this example, the vector direction of each frequency component differs between frequencies at or below the provisional fundamental frequency (the frequency whose component is Zm) and frequencies above the provisional fundamental frequency (the former is φ and the latter is φ + π, that is, the direction is inverted). The reciprocal display of these frequency components is shown in FIG. 4. In the figure, the horizontal axis represents frequency, and the vertical axis represents (1 / Zm · Zm / | Zm |). As shown in this figure, the frequency components lie on a straight line. Therefore, the true fundamental frequency (peak frequency) f is obtained by calculating the zero crossing of this straight line.
[0023]
That is, the true fundamental frequency f is calculated by the following equation: f = m + (u, Z_{m+1}) / ((u, Z_{m+1}) − (u, Z_m)). In this equation, u is a unit vector having the direction of the difference vector between the m-th and (m+1)-th components Z_m and Z_{m+1}, and (·, ·) represents an inner product.
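The equation above can be tried out on idealized data. In the sketch below, the function names and the test components are assumptions made for illustration: the components are modeled as Z_k = C / (f − k), which is consistent with the straight-line reciprocal behavior described for FIG. 4 but is not taken from the patent, and the inner product treats each complex number as a two-dimensional vector.

```python
def inner(a, b):
    """Inner product of two complex numbers viewed as 2-D vectors."""
    return a.real * b.real + a.imag * b.imag


def interpolate_peak(m, z_m, z_m1):
    """Return the fractional bin position f between bins m and m + 1."""
    diff = z_m1 - z_m
    u = diff / abs(diff)  # unit vector along Z_{m+1} - Z_m
    return m + inner(u, z_m1) / (inner(u, z_m1) - inner(u, z_m))


# Idealized components around a peak at bin 10.3 (assumed model Z_k = C / (f - k)).
c = 2 + 1j
true_f = 10.3
z10 = c / (true_f - 10)
z11 = c / (true_f - 11)
estimate = interpolate_peak(10, z10, z11)
```

The result is expressed in bins; multiplying it by the bin spacing (sampling frequency divided by the number of FFT points) gives the frequency in hertz. The sketch does not guard against identical components, for which the difference vector has zero length.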
[0024]
Although an example in which the true fundamental frequency is on the higher frequency side of the provisional fundamental frequency is shown here, the true fundamental frequency may also be on the lower frequency side of the provisional fundamental frequency. In that case, interpolation is performed between the frequency m and the frequency m-1. The vector display example of FIG. 3, the reciprocal display example of FIG. 4, and the complex spectrum interpolation method are described in detail in pages 32 to 41 of the supplementary volume of bit (issued September 9, 1987), published by Kyoritsu Shuppan Co., Ltd., and in Hara and Iguchi, "Frequency Identification Using Complex Spectrum," Transactions of the Society of Instrument and Control Engineers, Vol. 19, No. 9 (1983).
[0025]
FIG. 5 is a diagram illustrating a display example of the display device 5, based on the display information generated by the display information generation unit 44. In the figure, the displayed items include the pitch name determined from the input sound, the accuracy of the input sound, and the pitches of the input sound in musical score form. The pitch name is displayed as a specific name such as “major third”. The accuracy indicator 50 indicates the difference between the pitch of the input sound and the determined pitch (a major third in this example). That is, when the pitch of the input sound is larger than the determined pitch, the indicator 50 is displayed toward the right, and when the pitch of the input sound is smaller, the indicator 50 is displayed toward the left. In the display example, the determined pitch and the pitch of the input sound almost coincide.
[0026]
The display example of FIG. 5 includes, in addition to the pitch name and related items, controls for various operations. The level operator 51 adjusts the recording level, that is, the gain applied to the sound input from the microphone 2. Because the level meter 52 at the bottom of the display example shows the level of the sound input through the microphone 2, the singer can check this level and operate the level operator 51 to set an appropriate recording level.
[0027]
The range controller 53 displayed on the right side of the display example is used to limit the sound range handled by the fundamental frequency extraction unit 41. In the spectrum example of FIG. 2, there are only two clear peaks, so the provisional fundamental frequencies can be identified correctly. However, the desired peaks do not always appear so prominently. For example, the spectrum of a sound one octave away with a similar overtone structure may appear strongly; in such a case the correct provisional fundamental frequency is not detected and an incorrect pitch name is determined. The singer can notice the abnormality from the display of the input sound and can then narrow the sound range by operating the range controller 53, which serves as frequency limiting means. The sound range is displayed as the area bounded above and below by lines 54 and 55. The lines 54 and 55 are moved up and down individually when the range controller 53 is operated. Of the frequency components output from the Fourier transform unit 40, those outside the sound range set by the frequency range specified with the range controller 53 are excluded from processing by the fundamental frequency extraction unit 41, so that a more accurate determination can be made. The operators 51 and 53 can be operated by well-known pointing means such as the mouse of a personal computer.
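The effect of restricting the sound range before the peak search can be illustrated with a small sketch. The function name `limit_to_range`, the choice to zero out excluded bins, and the sample values are assumptions for illustration only.

```python
def limit_to_range(levels, bin_spacing_hz, low_hz, high_hz):
    """Zero out every bin whose frequency lies outside [low_hz, high_hz]."""
    return [
        level if low_hz <= k * bin_spacing_hz <= high_hz else 0.0
        for k, level in enumerate(levels)
    ]


# Hypothetical spectrum with 100 Hz bin spacing: voices at 200 Hz and 400 Hz,
# plus a strong overtone at 800 Hz that would otherwise win the peak search.
levels = [0.0, 0.1, 5.0, 0.2, 4.0, 0.1, 0.3, 0.1, 6.0, 0.2]
limited = limit_to_range(levels, 100.0, 150.0, 500.0)
strongest_bin = max(range(len(limited)), key=lambda k: limited[k])
```

With the range limited to 150 to 500 Hz, the 800 Hz component is ignored and the strongest remaining bin is the one at 200 Hz.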
[0028]
Further, the display example of FIG. 5 is provided with a switch (sound selection switch) 56 for selecting a sound for self-study. The embodiment above describes an example in which a plurality of people input their voices simultaneously through the microphone 2. However, the present invention can also be used by one person practicing choral singing alone. The sound selection switch 56 is used to generate, as a comparison sound, a voice or musical tone for such self-study (for example, a stably sustained musical tone such as a wind instrument sound, or a synthesized sound). When the sound selection switch 56 is operated, a keyboard 57 is displayed. When a key of the keyboard 57 is then indicated, a sound of the pitch corresponding to that key is generated as the comparison sound. The sound selection switch 56 and the keyboard 57 constitute the means for designating the comparison sound. The function of sounding the comparison sound when a key of the keyboard 57 is indicated can be realized by a speech synthesizer that synthesizes voice or musical tones at the indicated pitch, or by a PCM sound source device, together with a sound system such as an amplifier and a speaker.
[0029]
When a key of the keyboard 57 is indicated, the fundamental frequency extraction unit 41 detects the fundamental frequency of the voice input from the microphone 2 and calculates the ratio between that fundamental frequency and the fundamental frequency of the comparison sound. That is, the pitch between the comparison sound designated with the key of the keyboard 57 and the input voice is detected, and the accuracy of the harmony is determined. For this determination, the comparison sound may be picked up by the microphone 2 in the same way as the input voice and compared with the input voice by the frequency ratio calculation unit 42, or the fundamental frequency generated by the speech synthesizer or the fundamental frequency of the sound produced by the PCM sound source device may be compared with the input voice from the microphone 2.
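The second option, comparing the sung fundamental with a known reference frequency instead of picking the comparison sound up through the microphone, might look like the sketch below. The mapping from keys to frequencies is not given in the text; equal temperament with A4 = 440 Hz and MIDI note numbers are assumptions, as are the function names.

```python
def key_frequency(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a key, using MIDI note numbers (A4 = 69)."""
    return a4_hz * 2 ** ((midi_note - 69) / 12)


def ratio_to_reference(sung_hz, midi_note):
    """Ratio of the higher to the lower of the sung and reference fundamentals."""
    ref_hz = key_frequency(midi_note)
    return max(sung_hz, ref_hz) / min(sung_hz, ref_hz)


# Reference key: middle C (MIDI note 60, about 261.63 Hz).
reference_hz = key_frequency(60)

# Hypothetical sung fundamental close to 5/4 of the reference.
ratio = ratio_to_reference(327.03, 60)
```

The resulting ratio can then be matched against the predetermined frequency ratios in the same way as a ratio obtained from two sung voices.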
[0030]
The means for designating the comparison sound is not limited to the sound selection switch 56 and the keyboard 57, and can be replaced by any switch means. For example, pitches may be assigned in advance to input keys of the keyboard of a personal computer, and the comparison sound may be designated using these keys.
[0031]
In the above-described embodiment, accuracy is improved by interpolating the provisional fundamental frequencies, but as a simplification the interpolation may be omitted and the pitch name may be determined from the frequency ratio between the provisional fundamental frequencies themselves.
[0032]
As described above, according to the present embodiment, the harmony among the voices of a plurality of practitioners, or between a practitioner's voice and the comparison sound, can be determined and displayed. Although an example using the complex spectrum interpolation method to extract the fundamental frequency has been shown, other known methods such as a phase difference measurement method can also be used. The pitch names are not limited to equal-temperament pitches and may be displayed as pitch names based on just intonation or the like.
[0033]
In addition, when three singers input their voices simultaneously, pitch names are determined and displayed from the ratios between those of the three detected fundamental frequencies that are adjacent to each other. In this case, two pitches are detected, so two pitch names and two accuracy values are displayed.
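The pairing of adjacent fundamentals for three voices can be sketched as follows. The function name `adjacent_ratios` and the sample frequencies are hypothetical; the ratios are taken as higher over lower, which the text does not specify.

```python
def adjacent_ratios(fundamentals_hz):
    """Sort the fundamentals and return the ratio of each one to the next lower one."""
    ordered = sorted(fundamentals_hz)
    return [high / low for low, high in zip(ordered, ordered[1:])]


# Hypothetical fundamentals of three singers in hertz, in arbitrary order.
ratios = adjacent_ratios([300.0, 200.0, 250.0])
```

Three fundamentals yield two ratios, one for the lower pair and one for the upper pair, matching the two pitch names and two accuracy values mentioned above.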
[0034]
Furthermore, it is desirable to provide means for connecting the output of the sound system to headphones. This allows the practitioner to listen to the comparison sound through the headphones while vocalizing so as to layer his or her own voice over it.
[0035]
Furthermore, the present invention is not limited to singing practice. It can also be applied to ensemble practice with musical instruments, in which an instrument sound is input from the microphone, the pitch between this input sound and another instrument sound or the comparison sound is determined, and the pitch name and related information are displayed.
[0036]
[Effects of the Invention]
As is apparent from the above description, according to the inventions of claims 1 to 8, the pitch between a plurality of input sounds, or between an input sound and a comparison sound, can be determined and the pitch name of a predetermined pitch close to that pitch can be displayed. Therefore, in singing practice or instrument practice, the pitch between the performance sounds or voices of a plurality of practitioners, or between the sounds produced by a practitioner and by the practice device, can be recognized visually, which improves the effectiveness of harmony practice.
[0037]
In particular, according to the invention of claim 3, the accuracy of the pitch can be judged visually, and according to the invention of claim 4, the pitch of each produced sound can be judged visually.
[0038]
According to the invention of claim 5, the frequency range of the frequency components input to the fundamental frequency extraction means is limited, which prevents problems such as the frequency of a sound one octave higher with a similar harmonic structure being erroneously extracted as the fundamental frequency.
[0039]
Furthermore, according to the invention of claim 6, the fundamental frequency can be extracted with a resolution higher than the resolution of the frequency analysis means.
[Brief description of the drawings]
FIG. 1 is a block diagram showing main functions of a singing practice apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating an example of a frequency spectrum of an audio signal output from a Fourier transform unit.
FIG. 3 is a diagram in which a provisional fundamental frequency and frequency components in the vicinity thereof are displayed as vectors.
FIG. 4 is a diagram in which the frequency components obtained by frequency analysis are displayed as reciprocals.
FIG. 5 is a diagram illustrating a display example of sound and the like by the display device.
[Explanation of symbols]
1 ... Singing practice device, 2 ... Microphone, 4 ... Audio signal processing device, 5 ... Display device, 40 ... Fourier transform unit, 41 ... Fundamental frequency extraction unit, 42 ... Frequency ratio calculation unit, 43 ... Pitch detection unit

Claims

マイクロフォンから入力された音を各周波数成分に分解する周波数分析手段と、
前記周波数分析手段によって得られた周波数スペクトルに現れたピークのうちからレベルの高い順に予定数のピークを検出し、該予定数のピークの周波数を入力音の基本周波数として決定する基本周波数抽出手段と、
決定された前記基本周波数のうち周波数が隣接するもの同士の比の値によって音程を算出する音程算出手段と、
前記音程算出手段で算出された音程に近い所定の音程に対応する音程名を表示する音程名表示手段とを具備したことを特徴とする音程判定装置。
A pitch determination apparatus comprising:
frequency analysis means for decomposing sound input from a microphone into frequency components;
fundamental frequency extraction means for detecting a predetermined number of peaks, in descending order of level, from among the peaks appearing in the frequency spectrum obtained by the frequency analysis means, and determining the frequencies of the predetermined number of peaks as fundamental frequencies of the input sound;
pitch calculation means for calculating a pitch from the ratio between those of the determined fundamental frequencies that are adjacent in frequency; and
pitch name display means for displaying a pitch name corresponding to a predetermined pitch close to the pitch calculated by the pitch calculation means.

前記音程算出手段で算出された音程と該音程に近い所定の音程との違いを音程の正確さとして表示する正確さ表示手段を具備したことを特徴とする請求項１記載の音程判定装置。
The pitch determination apparatus according to claim 1, further comprising accuracy display means for displaying the difference between the pitch calculated by the pitch calculation means and a predetermined pitch close to that pitch as the accuracy of the pitch.

前記予定数の基本周波数に対応する音高を譜面として表示する音高表示手段を具備したことを特徴とする請求項１記載の音程判定装置。
The pitch determination apparatus according to claim 1, further comprising pitch display means for displaying the pitches corresponding to the predetermined number of fundamental frequencies as a musical score.

前記基本周波数抽出手段で基本周波数の抽出対象となる周波数範囲を設定する手段と、
前記周波数分析手段から出力される各周波数成分を前記周波数範囲に制限する手段とを具備したことを特徴とする請求項１記載の音程判定装置。
The pitch determination apparatus according to claim 1, further comprising: means for setting a frequency range from which the fundamental frequency extraction means extracts fundamental frequencies; and means for limiting the frequency components output from the frequency analysis means to that frequency range.

前記基本周波数抽出手段が、検出された前記予定数のピークの周波数を仮の基本周波数とし、該仮の基本周波数と該基本周波数にそれぞれ隣接する周波数成分の周波数との補間により真の基本周波数を決定するように構成されたことを特徴とする請求項１記載の音程判定装置。
The pitch determination apparatus according to claim 1, wherein the fundamental frequency extraction means is configured to take the frequencies of the detected predetermined number of peaks as provisional fundamental frequencies, and to determine each true fundamental frequency by interpolation between the provisional fundamental frequency and the frequency of a frequency component adjacent to it.

前記マイクロフォンから入力される音が、歌唱音であることを特徴とする請求項１記載の音程判定装置。
The pitch determination apparatus according to claim 1, wherein the sound input from the microphone is a singing sound.

前記マイクロフォンから入力される音が、楽器演奏音であることを特徴とする請求項１記載の音程判定装置。
The pitch determination apparatus according to claim 1, wherein the sound input from the microphone is a musical instrument performance sound.