JP3001353B2

JP3001353B2 - Automatic transcription device

Info

Publication number: JP3001353B2
Application number: JP5184820A
Authority: JP
Inventors: 和広望月; 扶佐子平林
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1993-07-27
Filing date: 1993-07-27
Publication date: 2000-01-24
Anticipated expiration: 2015-01-24
Also published as: JPH0744163A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an automatic transcription apparatus that generates musical score data from acoustic signals such as singing voices, humming voices, and instrument sounds, and in particular to a pitch identification process that determines one pitch for a given section of the acoustic signal.

[0002]

[Prior Art] An automatic transcription method that converts acoustic signals such as singing voices, humming voices, and instrument sounds into musical score data must detect, from the acoustic signal, the basic information of a musical score: note length, pitch, key, time signature, and tempo.

[0003] A conventional automatic transcription apparatus first extracts pitch information and power information from the acoustic signal at every analysis period. The analysis period here is a time interval sufficiently shorter than the smallest unit of a musical tone (for example, a sixteenth note). Pitch information is the height of the sound at a single point in time, that is, the frame-level pitch (the same meaning applies below). Power information is the loudness at a single point in time. Next, based on the extracted pitch information and power information, the acoustic signal is divided into sections that can each be regarded as one note (hereinafter called segments); this is called segmentation processing. Then, for each segment, one pitch is determined from an arbitrary number of pitch information values extracted from within that segment; this is called pitch identification processing. The pitch determined here is the height of a sound as it can be written in the final score, that is, a height in semitone steps (the same meaning applies below wherever the pitch of a segment is mentioned). Finally, the key of the whole acoustic signal is determined from the distribution of the pitch information, and the time signature and tempo are determined from the distribution of the segments, the frequency of segment lengths, and the like. The information is obtained in this order.

[0004] Specific methods conventionally used for the pitch identification process include: identifying the pitch that differs least from the pitch information values in the segment; identifying the pitch from the mean of the pitch information; identifying it from the median of the pitch information; identifying it from the most frequent value of the pitch information; and identifying it from the pitch information at the time when the power information reaches its peak.
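
The conventional methods listed above can be sketched roughly as follows. This is only an illustration: the function names, the A4 = 440 Hz reference, and the use of MIDI-style note numbers (69 = A4) as the semitone scale are assumptions and do not come from the patent.

```python
import math
from collections import Counter
from statistics import mean, median

A4_HZ = 440.0  # assumed reference tuning, not specified in the patent


def hz_to_semitone(freq_hz):
    """Continuous MIDI-style note number for a frequency (69.0 = A4)."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def identify_by_mean(pitches_hz):
    """Round the mean of the frame-level pitches to the nearest semitone."""
    return round(mean(hz_to_semitone(f) for f in pitches_hz))


def identify_by_median(pitches_hz):
    """Round the median of the frame-level pitches to the nearest semitone."""
    return round(median(hz_to_semitone(f) for f in pitches_hz))


def identify_by_mode(pitches_hz):
    """Pick the semitone that occurs most often after per-frame rounding."""
    counts = Counter(round(hz_to_semitone(f)) for f in pitches_hz)
    return counts.most_common(1)[0][0]


def identify_by_power_peak(pitches_hz, powers):
    """Use the pitch of the frame where the power is largest."""
    peak = max(range(len(powers)), key=powers.__getitem__)
    return round(hz_to_semitone(pitches_hz[peak]))
```

For a segment that glides up into A4, such as `[392.0, 415.3, 440.0, 440.0, 440.0]`, the mean is pulled down by the onset frames while the median and the most frequent value are not.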

[0005]

[Problems to Be Solved by the Invention] In acoustic signals, especially those produced by the human voice, the pitch is not stable and often wavers even when a constant pitch is intended. In particular, at the onset of a note or during the transition to the next note, the intended pitch is often not reached promptly and the pitch fluctuates. As a singing or performance technique, the pitch at the onset of a note is sometimes varied deliberately. Moreover, with some instruments the pitch changes at the beginning or end of a note because of the instrument's structure. These factors make the pitch identification process very difficult.

[0006] Together with note length, pitch is an important element of musical score data, so it must be identified accurately; if it cannot be, the accuracy of the musical score data is low.

[0007] In view of this, the present invention proposes a new pitch identification method capable of identifying pitch more accurately, and aims to provide an automatic transcription apparatus that can further improve the accuracy of the final musical score data.

[0008]

[Means for Solving the Problems] To solve the above problems, the present invention provides an automatic transcription apparatus that includes, as part of its configuration: a pitch/power extraction unit that extracts pitch information and power information from an input acoustic signal at every analysis period; a segmentation unit that divides the acoustic signal into sections that can each be regarded as one note, based on the pitch information and the power information; a pitch identification unit that determines one pitch for each section using an arbitrary number of the pitch information values extracted from within that section; and a musical score generation unit that estimates the key, time signature, and tempo of the acoustic signal from the result of the pitch identification and outputs the acoustic signal in musical score form. The apparatus is characterized in that the pitch identification unit comprises: distance calculation means for calculating the distance between each pitch information value and a candidate pitch; weighting coefficient determination means for determining a weighting coefficient for each pitch information value according to its position within the section; product-sum calculation means for calculating the sum, over the pitch information values in the section, of the products of the distance and the weighting coefficient; and pitch determination means for identifying the pitch of the section as the candidate pitch for which the calculated product-sum value is smallest.

[0009]

[Operation] With the pitch identification unit of the present invention, the pitch of each section is identified as follows. For each pitch information value in the section, the distance to a candidate pitch and a weighting coefficient determined by the position within the section are obtained, and the section is assigned the candidate pitch for which the sum of their products is smallest. If the weighting coefficient is set small near the start and end of the section, the pitch information there has little influence on the product-sum value. This makes it less likely that unstable pitch in those parts causes the whole section to be identified as an unintended pitch, so more accurate pitch identification becomes possible.

[0010]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0011] FIG. 1 is a block diagram showing one embodiment of the present invention. This embodiment comprises a pitch/power extraction unit 11, a segmentation unit 12, a pitch identification unit 13, and a musical score generation unit 14. The pitch identification unit 13 in turn comprises distance calculation means 131, weighting coefficient determination means 132, product-sum calculation means 133, and pitch determination means 134.

[0012] The pitch/power extraction unit 11 extracts pitch information and power information from the input acoustic signal. The segmentation unit 12 divides the input acoustic signal into sections that can each be regarded as one note, based on the pitch information and power information obtained by the pitch/power extraction unit 11. The pitch identification unit 13 determines one pitch for each section. The musical score generation unit 14 estimates the key, time signature, and tempo of the input acoustic signal from the result of the pitch identification unit 13, converts the result into musical score form, and outputs it.

[0013] FIG. 2 is a configuration diagram of a system that carries out the processing of the units described above. The central processing unit (CPU 21) controls the entire apparatus. The main storage device 23, connected to the CPU 21 via the bus 22, stores the transcription processing program shown in FIGS. 3 and 4. In addition to the CPU 21 and the main storage device 23, the bus 22 is connected to a keyboard 24 as an input device, a display device 25 as an output device, an auxiliary storage device 26 used as working memory, and an analog/digital converter 27.

[0014] An acoustic signal input device 28 such as a microphone is connected to the analog/digital converter 27. The acoustic signal input device 28 captures acoustic signals such as singing or humming by the user and musical tones produced by instruments, converts them into electrical signals, and outputs the electrical signals to the analog/digital converter 27.

[0015] When processing is commanded from the keyboard 24, the CPU 21 executes the program stored in the main storage device 23. It first stores the signal converted into digital form by the analog/digital converter 27 in the auxiliary storage device 26, then converts this acoustic signal into musical score data by executing the program, and outputs the result to the display device 25 as required.

[0016] Next, the transcription processing that the CPU 21 executes after storing the acoustic signal in the auxiliary storage device 26 is described following the processing flow shown in FIG. 3.

[0017] First, the CPU 21 extracts pitch information from the acoustic signal at every analysis period by autocorrelation analysis, extracts power information at every analysis period by sum-of-squares processing, and then performs processing such as noise removal and smoothing (steps 301 and 302). For the pitch information, the CPU 21 then calculates the difference between the reference pitch of the acoustic signal, obtained from the distribution of the pitch information, and the absolute pitch, and performs tuning processing that shifts the pitch information according to the size of that difference (step 303).
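
As a rough illustration of the extraction steps above, the sketch below estimates the pitch of one analysis frame from the peak of its autocorrelation and the power from the sum of squared samples. The patent does not give the details of either computation, so the search range (`fmin`, `fmax`), the plain unnormalized autocorrelation, and all function names are assumptions; noise removal, smoothing, and the tuning step are omitted.

```python
import math


def frame_power(frame):
    """Power of one analysis frame as the sum of squared samples."""
    return sum(x * x for x in frame)


def frame_pitch_hz(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate pitch as the lag with the largest autocorrelation.

    fmin and fmax bound the search and are assumed values.
    Returns None when no lag has positive autocorrelation.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None


def sine_frame(freq_hz, sample_rate, length):
    """Test signal only: one frame of a pure sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(length)]
```

Calling `frame_pitch_hz(sine_frame(200.0, 8000, 400), 8000)` should return a value close to 200 Hz. A practical extractor would likely normalize the autocorrelation and reject unvoiced frames.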

[0018] Next, the CPU 21 performs segmentation processing that divides the signal into segments each regarded as one note, based on the continuity of the obtained pitch information (step 304), and also performs segmentation processing that divides the signal into segments each regarded as one note, based on changes in the obtained power information (step 305). Based on both sets of segment information, the CPU 21 calculates a reference length corresponding to the duration of a quarter note, an eighth note, or the like, and performs segmentation processing again based on this reference length (step 306).
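
The patent does not say how pitch continuity is tested in step 304. One simple possibility, shown here purely as an assumption, is to start a new segment whenever two consecutive frame-level pitches differ by more than a threshold. The function name and the 0.5-semitone threshold are hypothetical, and the power-based pass (step 305) and the re-segmentation by reference length (step 306) are not covered.

```python
def segment_by_pitch(semitones, max_jump=0.5):
    """Split frames into runs of continuous pitch.

    semitones: frame-level pitches on a semitone scale.
    max_jump: assumed threshold, in semitones, above which two
              neighbouring frames are treated as different notes.
    Returns a list of (start, end) index pairs, end exclusive.
    """
    segments = []
    start = 0
    for i in range(1, len(semitones)):
        if abs(semitones[i] - semitones[i - 1]) > max_jump:
            segments.append((start, i))
            start = i
    if semitones:
        segments.append((start, len(semitones)))
    return segments
```

For example, `segment_by_pitch([60.0, 60.1, 59.9, 62.0, 62.1, 64.0])` yields three segments covering frames 0 to 2, 3 to 4, and 5.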

[0019] The CPU 21 performs pitch identification processing on each of the one-note sections obtained by this segmentation (step 307).

[0020] The CPU 21 then determines the key of the input acoustic signal (step 308). For example, it examines how often each pitch occurs in the tuned pitch information and compares these frequencies of occurrence with preset characteristics of each key to determine the key (in C major, for instance, pitches around do, re, mi, sol, and la occur most often, fa and ti next most often, and do#, re#, fa#, sol#, and la# rarely). If necessary, among the pitches identified in step 307, those that are rarely used in the key determined in step 308 are reviewed and corrected (step 309).
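
The key determination described above can be sketched as a comparison between a pitch-class histogram and a per-key profile. The patent gives the C major profile only qualitatively, so the numeric weights below (2 for frequent, 1 for fairly frequent, 0 for rare), the restriction to major keys, and the function name are all assumptions.

```python
# Assumed weights for C major, indexed by pitch class (0 = C):
# do, re, mi, sol, la -> 2; fa, ti -> 1; the five sharps -> 0.
MAJOR_PROFILE = [2, 0, 2, 0, 2, 1, 0, 2, 0, 2, 0, 1]


def estimate_major_key(pitch_classes):
    """Return the tonic (0 = C, ..., 11 = B) whose rotated profile
    best matches how often each pitch class occurs."""
    histogram = [0] * 12
    for pc in pitch_classes:
        histogram[pc % 12] += 1

    def score(tonic):
        return sum(
            histogram[pc] * MAJOR_PROFILE[(pc - tonic) % 12] for pc in range(12)
        )

    return max(range(12), key=score)
```

With input drawn from the C major scale, such as `[0, 2, 4, 5, 7, 9, 11, 0, 4, 7]`, the function returns 0; the same notes transposed up seven semitones give 7 (G major).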

[0021] Once the segments and pitches have been determined in this way, the CPU 21 determines the beat positions and the positions of bar starts from the distribution of the segments, the frequency of segment lengths, and the like (step 310), and determines the tempo from the determined beat and bar information (step 311).
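
The patent does not describe how the tempo is derived from the beat information in step 311. As one plausible reading, and only as an assumption, the tempo in beats per minute can be taken from the median interval between successive beat positions. The function name is hypothetical.

```python
from statistics import median


def tempo_bpm(beat_times_sec):
    """Tempo in beats per minute from beat positions given in seconds.

    Uses the median inter-beat interval so that a few irregular
    beats do not dominate the estimate (an assumed design choice).
    Requires at least two beat positions.
    """
    intervals = [b - a for a, b in zip(beat_times_sec, beat_times_sec[1:])]
    return 60.0 / median(intervals)
```

Beats at roughly half-second spacing, for example `[0.0, 0.5, 1.0, 1.52, 2.0]`, give a tempo of about 120 beats per minute.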

[0022] Finally, the CPU 21 generates the musical score data from the determined pitch, note length, key, beat, and tempo (step 312).

[0023] Next, the pitch identification processing for one segment in this embodiment (step 307) is described in detail using the flowchart of FIG. 4.

[0024] The CPU 21 first enumerates the candidate pitches $\{n_0, n_1, \ldots, n_m\}$ (step 400). The pitch to be identified must be one of the following: the highest pitch that does not exceed the lowest pitch information value in the segment, the lowest pitch that exceeds the highest pitch information value in the segment, or a pitch between these two. It is therefore sufficient to list those pitches.
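
The candidate enumeration of step 400 might look like the following sketch. It assumes that the pitch information has already been converted to a continuous semitone scale on which whole numbers correspond to score pitches (for instance MIDI-style note numbers); that representation and the function name are not from the patent.

```python
import math


def pitch_candidates(semitones):
    """List candidate score pitches for one segment.

    The range runs from the highest whole semitone that does not exceed
    the lowest frame-level pitch up to the lowest whole semitone that
    exceeds the highest frame-level pitch.
    """
    lowest = math.floor(min(semitones))
    highest = math.floor(max(semitones)) + 1
    return list(range(lowest, highest + 1))
```

For frame-level pitches `[66.7, 68.2, 69.1]` this returns `[66, 67, 68, 69, 70]`.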

[0025] The first candidate pitch $n_i$ ($i = 0$) is then selected (step 401), the variable $T(n_i)$ that accumulates the product-sum value is initialized to 0 (step 402), and the time $t$ is set to the first pitch analysis point in the segment (step 403).

[0026] Next, the distance $\varepsilon(n_i, p_t)$ between the pitch information $p_t$ at point $t$ and the pitch $n_i$ is calculated (step 404). The distance $\varepsilon$ takes a larger value the farther apart the two are, and may be defined, for example, as $\varepsilon(n, p) = \lvert \log(\text{frequency of } n) - \log(\text{frequency of } p) \rvert$.
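
The example distance above translates directly into code. In this sketch the candidate pitch is a MIDI-style note number and the pitch information is a frequency in Hz; the note numbering, the 440 Hz reference, and the function names are assumptions made for illustration.

```python
import math


def note_frequency_hz(note, a4_hz=440.0):
    """Frequency of a MIDI-style note number (69 = A4, assumed tuning)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def distance(note, pitch_hz):
    """Absolute difference of the log frequencies of the candidate
    pitch and the frame-level pitch."""
    return abs(math.log(note_frequency_hz(note)) - math.log(pitch_hz))
```

Because the distance is measured on a logarithmic frequency scale, it is proportional to the number of semitones between the two pitches regardless of register.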

[0027] Next, the weighting coefficient $\omega(t)$, which is determined by the position within the segment, is obtained (step 405). For this purpose, the relationship between position within the segment and coefficient value, such as that shown in FIG. 5, may be written in advance into the program stored in the main storage device 23.
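
FIG. 5 is not reproduced in this text, so the actual shape of the weighting curve is unknown. The sketch below assumes a trapezoid that rises linearly over the first part of the segment, stays at 1 in the middle, and falls linearly at the end, which matches the stated aim of giving the start and end less weight. The function name and the `edge_fraction` and `edge_weight` parameters are hypothetical.

```python
def trapezoid_weight(index, length, edge_fraction=0.2, edge_weight=0.0):
    """Weight for the analysis point at `index` in a segment of `length` points.

    edge_fraction: assumed share of the segment (0 < value <= 0.5) over
                   which the weight ramps at each end.
    edge_weight:   assumed weight at the very first and last point.
    """
    if length <= 1:
        return 1.0
    position = index / (length - 1)  # 0.0 at the start, 1.0 at the end
    ramp = min(position, 1.0 - position) / edge_fraction
    return edge_weight + (1.0 - edge_weight) * min(1.0, ramp)
```

With the default parameters and an 11-point segment, the first and last points get weight 0, the second point gets 0.5, and the middle points get 1.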

[0028] The product of the distance $\varepsilon(n_i, p_t)$ and the coefficient $\omega(t)$ obtained in this way is added to the variable $T(n_i)$ (step 406).

[0029] Steps 404, 405, and 406 are repeated up to the last pitch analysis point in the segment (steps 407 and 408), so that $T(n_i) = \sum_t \omega(t)\,\varepsilon(n_i, p_t)$. Once the products have been added up to the last analysis point, the resulting product-sum value is stored (step 409).

[0030] If $i < m$, that is, if other candidate pitches remain, the processing from step 402 is repeated for the next candidate pitch (steps 410 and 411).

[0031] Once the product-sum value has been obtained for the last candidate pitch, the pitch of the segment is identified as the candidate pitch with the smallest product-sum value (step 412), and the pitch identification processing for one segment ends.
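
Steps 400 to 412 can be put together in a short sketch. The loop structure follows the flowchart description, but several details are assumptions: pitches are handled as MIDI-style note numbers with A4 = 440 Hz, the distance is the example log-frequency difference, and the default weighting is a trapezoid standing in for the curve of FIG. 5, which is not available here. All names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def note_to_hz(note):
    return A4_HZ * 2.0 ** ((note - 69) / 12.0)


def hz_to_note(freq_hz):
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def edge_weight(index, length, edge_fraction=0.2):
    """Assumed trapezoid: small weight near the segment start and end."""
    if length <= 1:
        return 1.0
    position = index / (length - 1)
    return min(1.0, min(position, 1.0 - position) / edge_fraction)


def identify_pitch(pitches_hz, weight=edge_weight):
    """Return the candidate note with the smallest weighted distance sum."""
    notes = [hz_to_note(f) for f in pitches_hz]
    low = math.floor(min(notes))
    high = math.floor(max(notes)) + 1
    best_note, best_total = None, math.inf
    for note in range(low, high + 1):                  # steps 400, 401, 410, 411
        total = 0.0                                    # step 402
        log_note = math.log(note_to_hz(note))
        for t, pitch_hz in enumerate(pitches_hz):      # steps 403, 407, 408
            eps = abs(log_note - math.log(pitch_hz))   # step 404
            total += eps * weight(t, len(pitches_hz))  # steps 405, 406
        if total < best_total:                         # steps 409, 412
            best_note, best_total = note, total
    return best_note
```

For a segment that glides up to A4 and falls away at the end, such as `[370.0, 380.0, 392.0, 440.0, 440.0, 440.0, 400.0, 380.0]`, this sketch returns 69 (A4), whereas passing a uniform weight `lambda t, n: 1.0` returns 67, because the unstable frames then count as much as the steady ones.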

[0032] FIG. 6 shows other examples of settings for the relationship between position within the segment and coefficient value shown in FIG. 5. Beyond the examples given here, this relationship may be set according to the singer's habits when transcribing singing, or according to the characteristics of the instrument when transcribing instrument sounds.

[0033] The pitch information used in the pitch identification processing may be expressed in Hz, the unit of frequency, or in cents, the unit used in the field of music.
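
For reference, the two units mentioned above are related by a logarithm: 1200 cents correspond to one octave, so 100 cents correspond to one equal-tempered semitone. The sketch below converts between them relative to a reference frequency; the 440 Hz default and the function names are assumptions.

```python
import math


def hz_to_cents(freq_hz, reference_hz=440.0):
    """Pitch in cents above (positive) or below (negative) the reference."""
    return 1200.0 * math.log2(freq_hz / reference_hz)


def cents_to_hz(cents, reference_hz=440.0):
    """Inverse of hz_to_cents."""
    return reference_hz * 2.0 ** (cents / 1200.0)
```

When the pitch information is already in cents, a log-frequency distance reduces to a plain absolute difference between two cent values.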

[0034]

[Effect of the Invention] As described above, according to the present invention, the portion of each segment where the pitch is relatively stable can be given greater weight when identifying the pitch of the segment. The pitch can therefore be determined well, and the accuracy of the musical score data can be further improved.

[Brief Description of the Drawings]

[FIG. 1] Block diagram showing one embodiment of the present invention.

[FIG. 2] System configuration diagram of an automatic transcription apparatus embodying the present invention.

[FIG. 3] Diagram explaining the processing flow of the embodiment.

[FIG. 4] Flowchart showing the pitch identification processing in one embodiment of the present invention.

[FIG. 5] Diagram explaining an example definition of the weighting coefficient used in the present invention.

[FIG. 6] Diagram showing example definitions of the weighting coefficient other than that of FIG. 5.

[Explanation of Symbols]

11 Pitch/power extraction unit
12 Segmentation unit
13 Pitch identification unit
14 Musical score generation unit
131 Distance calculation means
132 Weighting coefficient determination means
133 Product-sum calculation means
134 Pitch determination means
21 CPU
22 Bus
23 Main storage device
24 Keyboard
25 Display device
26 Auxiliary storage device
27 Analog/digital converter
28 Acoustic signal input device

Continuation of front page
(72) Inventor: Kazuhiro Mochizuki, c/o 日本電気技術情報システム開発株式会社, 3-2-1 Sakado, Takatsu-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Fusako Hirabayashi, c/o NEC Corporation, 5-7-1 Shiba, Minato-ku, Tokyo
(56) References: JP-A-2-120893 (JP, A); JP-A-60-90376 (JP, A); JP-A-59-24895 (JP, A); JP-A-61-32098 (JP, A)

Claims

(57) [Claims]

[Claim 1] An automatic transcription apparatus that includes, as part of its configuration: a pitch/power extraction unit that extracts pitch information and power information from an input acoustic signal at every analysis period; a segmentation unit that divides the acoustic signal into sections that can each be regarded as one note, based on the pitch information and the power information; a pitch identification unit that determines one pitch for each section using an arbitrary number of the pitch information values extracted from within that section; and a musical score generation unit that estimates the key, time signature, and tempo of the acoustic signal from the result of the pitch identification and outputs the acoustic signal in musical score form; characterized in that the pitch identification unit comprises: distance calculation means for calculating the distance between each pitch information value and a candidate pitch; weighting coefficient determination means for determining a weighting coefficient for each pitch information value according to its position within the section; product-sum calculation means for calculating the sum, over the pitch information values in the section, of the products of the distance and the weighting coefficient; and pitch determination means for identifying the pitch of the section as the candidate pitch for which the calculated product-sum value is smallest.