JP2004118010A

JP2004118010A - Automatic imparting apparatus for musical piece impression value

Info

Publication number: JP2004118010A
Application number: JP2002283389A
Authority: JP
Inventors: Tadahiko Kumamoto; 熊本　忠彦; Kimiko Uchimoto; 内元　公子
Original assignee: Communications Research Laboratory
Current assignee: Communications Research Laboratory
Priority date: 2002-09-27
Filing date: 2002-09-27
Publication date: 2004-04-15
Anticipated expiration: 2022-09-27
Also published as: JP3697515B2

Abstract

PROBLEM TO BE SOLVED: To provide an automatic music impression value assigning apparatus that automatically assigns a music impression value by using music data, and in particular a high-precision technique for assigning music impression values.
SOLUTION: Processing is carried out by a music basic feature quantity extraction part 10, which receives computer-processable music data from a music data input part 9 and extracts physical feature quantities, an N-gram generation part 11, which generates N-grams from them, and an N-gram feature quantity generation part 12, which generates N-gram feature quantities. The music feature quantities are extracted from the N-gram feature quantities, and a music impression value arithmetic part 13 calculates a music impression value.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、楽曲データの処理装置に関するものであり、特に楽曲データから該楽曲の印象値を自動的に付与する装置に関わる。
【０００２】
【従来の技術】
音楽などの芸術作品に対する評価、例えばその作品に対する印象を決定することは、従来コンピュータなどの処理にはなじまないと考えられていた。そのため、例えば楽曲の印象によって作品の印象を分類するとしても、分類作業自体は人間が行うものであった。従って、まったく新しい楽曲に対して新しい印象値をコンピュータによって付与することが課題となっている。
【０００３】
本件出願人らをはじめとして、従来までの研究によると、コンピュータにおいて楽曲印象値を自動的に付与するということは、コンピュータが処理可能な楽曲データから、どのような楽曲特徴量を抽出し、どのような計算式を用いて、どのような楽曲印象値を出力するのか、という問題を中心に議論が進められている。
ここで、楽曲印象値とは、楽曲印象を数値化したものであり、楽曲特徴量とは、楽曲データから抽出し、楽曲印象値を計算するために用いられる物理的特徴量を指している。
【０００４】
従って、上記の課題は楽曲印象値の設計、楽曲特徴量の設計、楽曲印象値計算式の設計についての技術的課題と言うことができ、いくつかの研究が行われてきたが、いずれも断片的なものにとどまり、未だに全体的な設計が行われて的確な楽曲印象値を自動的に付与する装置は提供されていない。
【０００５】
例えば、楽曲印象値の設計において、非特許文献１によれば、ＳＤ（Ｓｅｍａｎｔｉｃ　Ｄｉｆｆｅｒｅｎｔｉａｌ）法に基づく主観評価実験データに対する因子分析の結果から、音楽感性空間と呼ばれる５次元の因子空間を構成し、ユーザが入力する印象と楽曲が有する印象とをこの空間内の座標値として表している。
【０００６】
しかしながら、因子軸の意味の解釈は人手によるので個人差があり、楽曲に付与された座標値が実際にどのような印象を表しているのかを端的に示すことは難しい。また、楽曲の印象を１つの点で表しているため、すべての印象尺度（非特許文献１のシステムでは８個）に何らかの値を入力しなければならず、印象尺度に対する評価として「どちらでもない（楽曲印象値が不定な状態）」を認めていない。
そのため、明るい楽曲を検索するつもりで、明るさに関する印象尺度の評価を「明るい」にしても、実際には明るさ以外の印象尺度に対して「どちらとも言えない」に相当する値（１点〜７点の７段階評価では４点）を持つ楽曲が検索されることになる。
【０００７】
【非特許文献１】池添剛、梶川嘉延、野村康雄：「音楽感性空間を用いた感性語による音楽データベース検索システム」　情処学論，４２，１２，ｐｐ．３２０１−３２１２（２００１）
【０００８】
また、楽曲特徴量の設計においては、非特許文献１〜３に発表された研究などがある。これらの研究をはじめとする従来の楽曲データを対象とする楽曲検索研究では、楽曲特徴量として、音の高さや強さ、長さ、リズムやテンポ、拍子、調性（短調／長調）等の音楽構成要素に対する平均や分散、時間的割合といった静的な特徴量を用いていることが多い。
しかしながら、本来時系列データである楽曲を静的な特徴量だけで表現するのは本質的に限界があるものと考えられる。
【０００９】
【非特許文献２】佐藤聡、菊地幸平、北上始：「音楽データを対象としたイメージ検索のための感情価の自動生成」、情処研報，データベースシステム１１８−８，情報学基礎５４−８，ｐｐ．５７−６４（１９９９）
【非特許文献３】佐藤聡、小川潤、堀野義博、北上始：「感情に基づく音楽作品検索システムの実現に向けての検討」、信学技報（音声），ＳＰ２０００−１３７，ｐｐ．５１−５６（２００１）
【００１０】
従来研究（非特許文献２〜４）でも、このような音の時間的推移を考慮した特徴量として、連続する３音の音の高さや長さの推移をパターン化したものなどが提案されているが、連続する音の数が一定であり、限定的な時間推移しか取り扱えなかった。
【００１１】
【非特許文献４】辻康博、星守、大森匡：「曲の局所パターン特徴量を用いた類似曲検索・感性語による検索」、信学技報（音声），ＳＰ９６−１２４，ｐｐ．１７−２４（１９９７）
【００１２】
【発明が解決しようとする課題】
本発明は上記従来の技術が有する問題に鑑みて創出されたものであって、楽曲データを用いて楽曲印象値を自動的に付与する楽曲印象値自動付与装置を提供することを課題とし、特に高精度な楽曲印象値の付与技術の提供を目的とする。
【００１３】
【課題を解決するための手段】
本発明は上記課題の解決を図るため、次のような手段を創出した。
すなわち、少なくともコンピュータ処理が可能な所定のデータ規格に基づく楽曲データに対して、当該楽曲が有する印象を自動的に数値化し、付与する楽曲印象値自動付与装置を提供する。
本装置は、楽曲データを入力する入力手段と、楽曲データにおける、楽曲印象に係る物理的特徴量である楽曲基本特徴量を抽出する楽曲基本特徴量抽出手段を備える。そして、楽曲基本特徴量から、Ｎグラムを生成するＮグラム生成手段と、異なりＮグラムを用いてＮグラム特徴量を生成するＮグラム特徴量生成手段と、Ｎグラム特徴量を用い、所定の楽曲印象値計算式による演算を行う楽曲印象値演算手段と、楽曲印象値を出力する出力手段とを備える。
【００１４】
ここで、前記Ｎグラム特徴量生成手段が、前記異なりＮグラムの相対出現頻度と、所定の重み値を乗じてＮグラム特徴量を生成してもよい。
【００１５】
前記楽曲印象値計算式が、重回帰式であってもよい。
【００１６】
楽曲印象値自動付与装置が、複数の印象尺度についての印象値を付与する構成において、Ｎグラム特徴量生成手段が、印象尺度毎にＮグラム特徴量を生成すると共に、楽曲印象値演算手段とが、該印象尺度毎に、該Ｎグラム特徴量を用いて演算を行う構成でもよい。
【００１７】
前記データ規格が、ＭＩＤＩ（ｍｕｓｉｃａｌ　ｉｎｓｔｒｕｍｅｎｔ　ｄｉｇｉｔａｌ　ｉｎｔｅｒｆａｃｅ）規格であってもよい。
【００１８】
本発明の楽曲印象値自動付与装置は、入力手段から楽曲データを入力し、楽曲データが含む複数のトラックチャンク及び／又はチャネルを分割し、各トラックチャンク及び／又はチャネル毎に楽曲基本特徴量抽出手段に出力するストリーム分割手段を備えてもよい。
【００１９】
楽曲基本特徴量が、音の高さ、音の強さ、音の長さ、音色情報とすることができる。
【００２０】
前記Ｎグラム特徴量生成手段において、複数のＮ値についてＮグラム特徴量を生成する構成でもよい。
【００２１】
印象尺度に、「静かな」・「落ち着いた」・「爽やかな」・「明るい」・「荘厳な」・「ゆったりとした」・「綺麗な」・「楽しい」・「気持ちが落ち着く」・「心が癒される」の少なくともいずれかの文言、又はその同意語、又はその反意語としてもよい。
【００２２】
【発明の実施の形態】
本発明の実施形態を図面に示した実施例に基づいて説明する。なお、実施形態は、本発明の主旨から逸脱しないかぎり適宜変更可能なものである。
図１には本発明による楽曲印象値自動付与装置（以下、本装置と呼ぶ。）の構成図を示すと共に、図２に該装置における処理のフローチャートを示す。
【００２３】
本装置（１）は、主に演算等の処理を司る中核であるＣＰＵ（２）と、ユーザーに対して処理内容や結果を示す表示装置であるモニタ（３）、ユーザーが本装置（１）の操作を行うキーボード（４）、及びＣＰＵと連動して作用するメモリ（５）や、データを記憶可能な外部記憶装置（６）から構成される。
このような構成の装置として公知のパーソナルコンピュータがあり、本装置（１）はパーソナルコンピュータ上に実装することが可能である。
【００２４】
このような本装置（１）を用い、本発明では標準ＭＩＤＩファイルを入力し、自動的に楽曲印象値を付与し、それを出力する技術を創出した。各処理は図２に示す通りであり、標準ＭＩＤＩファイル（２０）から楽曲の印象に係る物理的特徴量である楽曲基本特徴量を抽出（２１）し、それを用いて連続する楽曲基本特徴量の組み合わせからＮグラムを生成した後、必要に応じ、重みや出現頻度を用いてＮグラム特徴量を生成する。（２２）
楽曲の特徴を表すのに有効なものを選択して楽曲特徴量の抽出（２３）を行い、楽曲印象値計算式に用いる。この演算処理を行うことで、本発明が目的とする楽曲印象値（２５）が算出される。
本実施例において、実数値で各印象尺度毎に１個の楽曲印象値が出力される。
次に各処理について詳述する。
【００２５】
標準ＭＩＤＩファイル（２０）は、本装置（１）に備えた外部記憶手段に楽曲ＭＩＤＩデータ（７）として記録されている。図１においては別体としているが、同じく外部記憶手段であるハードディスク（６）内に記録してもよいし、ネットワーク接続された別のコンピュータにおける外部記憶手段に記録してもよい。
ＣＰＵ（２）は楽曲データ入力部（９）の処理によって楽曲ＭＩＤＩデータ（７）を読み出し、楽曲基本特徴量抽出部（１０）に送る。
【００２６】
楽曲基本特徴量抽出部（１０）において、標準ＭＩＤＩファイル形式（フォーマット０または１）のデータ（７）から各トラックチャンク及び各チャネル毎に楽曲基本特徴量を抽出するストリーム分割機能を有する。標準ＭＩＤＩデータ（７）の場合には、トラックチャンク及びチャネルが並列的に記載されているため、各ストリームを別個に切り分けて抽出し、それぞれを１つのストリームデータとする。
【００２７】
例えば、１トラックチャンク・３チャネルの楽曲からは３つのストリームデータが生成される。本実施例において、抽出される楽曲基本特徴量は、音の高さ、音の強さ、音の長さ、音色情報の４種類であり、それぞれノートナンバー値、オンベロシティ値、ノートオンメッセージからノートオフメッセージが到着するまでの時間（ミリ秒）、ＧＭ（Ｇｅｎｅｒａｌ　ＭＩＤＩ）規格に基づく音色番号に対応している。
【００２８】
ここで、楽曲基本特徴量の抽出例としてストリームデータの一例を図３に示す。ストリームデータ（３０）において、各行の第１列が音の長さ（３１）、第２列が音の高さ（３２）、第３列が音の強さ（３３）、第４列が音色情報（３４）に対応している。
また、同一トラックチャンク同一チャネルにおいて、２音以上が同時に発音している場合を「和音」と定義し、和音がある場合は、２音目以降の楽曲基本特徴量（音の長さを除く）を第５列以降（３５）（３６）（３７）に繰り返し記述する。
各チャネルにおいて、そのチャネル（例えば３８）の無音状態を休符と定義し、音の長さを０、音の長さ以外を記号「ｓ」で表す。
このように楽曲基本特徴量抽出部（１０）で抽出されたデータは、ハードディスク（６）に記録される。
【００２９】
Ｎグラム特徴量の生成（２２）は、Ｎグラム生成部（１１）及びＮグラム特徴量生成部（１２）において処理する。Ｎグラム特徴量は、後処理で用いる楽曲特徴量の候補となる特徴量であり、以下の手順で楽曲基本特徴量から生成される。
まず、Ｎグラム生成部（１１）では、ハードディスク（６）上の楽曲基本特徴量データを用い、各ストリ−ムデータから４種類の楽曲基本特徴量を分離し、音色情報からはｕｎｉｇｒａｍ（１グラム、Ｎ＝１）を、それ以外の特徴量からはＮグラム（Ｎ＝１，２，３，４，５）を生成する。
【００３０】
例えば、図３に示されたストリームデータ（３０）の音の高さからは図４のようなＮグラム（４０）（４１）（４２）（４３）（４４）が生成される。なお、和音（３９ａ）（３９ｂ）（３９ｃ）は、値の大きい順に並べ替えられ、リスト形式の入れ子（４５）として記述される。
生成された結果はハードディスク（６）などに記録する。
【００３１】
次に、Ｎグラム特徴量生成部（１２）において、音色情報以外の楽曲基本特徴量から生成されたＮグラムの各要素（ｘ_１ｘ_２・・ｘ_Ｎ）を表１、表２の抽象化ルールに基づいて置換する。
【００３２】
【表１】

【００３３】
【表２】

【００３４】
表１のルールは、各Ｎグラムの第１要素ｘ_１に適用され、楽曲基本特徴量の種類に応じてその要素を置換する。このとき、リスト形式の入れ子を１つの記号（例えば７９−７１−６２（４５））で記述するとともに、楽曲基本特徴量の種類を示すためのタグとして、音の高さならｈ、音の強さならｖ、音の長さならｄ）　を付加する（例えばｈ７９−７１−６２）。
【００３５】
一方、表２のルールは、各Ｎグラムの第２要素以降ｘ_ｉ（ｉ＝２，３，・・・，Ｎ）に適用され、その直前の要素ｘ_ｉ−１との比較結果に応じてｘ_ｉを対応する記号で置換する。
このとき、ｘ_ｉ−１とｘ_ｉの比較は、それぞれの最大値同士、最小値同士で行われるが、和音以外では最大値＝最小値として扱われる。
以上の処理の結果、例えば、図４のＮグラムは抽象化され、図５のようになる。
【００３６】
以上のようにして抽象化されたＮグラムの異なりＮグラムを、本稿では「Ｎグラム特徴量」と呼ぶ。そして、それぞれのＮグラム特徴量は、その相対出現頻度に重みｗを掛けたものを値として持つ。
但し、相対出現頻度は、楽曲基本特徴量の種類毎、Ｎグラム統計量のＮ値毎に計算され、小数点第４位で四捨五入される。例えば、図５のｂｉｇｒａｍ（５０）からは４つのＮグラム特徴量が生成され、（ｈｓ　ｓｘ）（５１）（５２）の相対出現頻度は０．４００、それ以外（５３）（５４）（５５）の相対出現頻度は０．２００となる。
【００３７】
一方、重みｗには表３に示すような３種類の重み付け方法を用意した。
本発明では以上のＮグラム生成部（１１）及びＮグラム特徴量生成部（１２）における処理によって、Ｎグラム特徴量を生成し、ハードディスク（６）に記録する。もっとも、本発明のＮグラム特徴量生成プロセスは、上記の構成による相対出現頻度や重みを用いることに限定されるものではなく、公知のＮグラム統計量の算出方法から逸脱しない範囲で任意に設定することができる。
【００３８】
【表３】

【００３９】
ここで、本発明の楽曲印象値自動付与装置（１）は、前記した楽曲特徴量及び楽曲印象値計算式を決めるため、具体的には、図６に示した設計手順に従って設計している。図に明らかなように、本設計手順は、本装置（１）を使用する際と極めて近い工程を含んでいる。以下、この流れに沿って、各手順を説明する。
楽曲が有する印象を数値化する際の基準となるデータを得るために、ＳＤ法に基づく主観評価実験（６５）として、以下のような印象評価実験を行った。
【００４０】
被験者は、男性３９名、女性６１名の計１００名であり、プロレベル（演奏家としての収入があるような人）１名、セミプロレベル（音楽大学などで専門的に勉強したような人）７名、アマチュアレベル（バンドやオーケストラ、合唱団などに入っているような人）２０名、趣味レベル（以上の条件には該当しないけれども一応演奏できるような人）４６名、未経験者（ほとんど演奏できないような人）２６名と音楽経験が豊かでない人も多数含まれている。
【００４１】
印象に基づく楽曲検索は、音楽経験の豊富な人というよりも、そうでない人に対して特に有効な検索手段であり、そういう人の音楽感性を反映したデータを利用することは本装置（１）を設計する上で重要なことと言える。
また、実験で用いた楽曲（６０）は標準ＭＩＤＩファイル形式のクラシック８０曲であり、インターネット上で公開されていたものを採用している。但し、実験時間の都合により、楽曲聴取に要する平均試聴時間が１分前後となるよう楽曲の長さを調整する。被験者は、各楽曲を２回まで試聴することができ、その間にすべての印象尺度に対し７段階評価もしくは「どちらでもない」の評価を行うことが求められる。
【００４２】
本装置（１）で用いる印象尺度は、任意に設定することができるが、例えば本件出願人が特願２００２−２０３６９４号において開示した印象尺度の設計方法に基づいて設計することができ、表４に示す１０個の印象尺度を用いる。
【００４３】
【表４】

【００４４】
ここで、各印象尺度の７段階評価結果に対し点数を割り振った。例えば、明るさに関する印象尺度では、「とても明るい」を７点、「明るい」を６点、「少し明るい」を５点、「どちらとも言えない」を４点、「少し暗い」を３点、「暗い」を２点、「とても暗い」を１点とし、「どちらでもない」は無得点とした。
これにより、各印象尺度において楽曲印象値がどのような印象を表現しているのか明確になるし、ユーザが入力する「どちらでもない」という評価結果をその印象尺度に関しては点がない状態だと考えれば、「どちらでもない（無得点）」と「どちらとも言えない（４点）」の区別が可能となる。
以上の結果得られた８００００個（１００人×８０曲×１０印象尺度）のデータから各楽曲毎の平均を求め、印象値データ（８００個＝８０曲×１０印象尺度）（６６）とした。但し、無得点のデータは事前に除外し、計算には用いなかった。
【００４５】
一方、８０曲の楽曲データ（６０）は本装置（１）の楽曲データ入力部（９）から入力され、上記の処理により楽曲基本特徴量抽出部（１０）において、楽曲基本特徴量の抽出（６１）が行われる。
同様に、上記処理によりＮグラム生成部（１１）及びＮグラム特徴量生成部（１２）において、Ｎグラム特徴量の生成（６２）を行う。
【００４６】
ここで、Ｎグラム特徴量生成部（１２）において、上記のように表１、表２の抽象化ルールに基づいて置換するが、表５には抽象化処理による異なりＮグラム数の変化を音の高さの場合を例に示す。
【００４７】
【表５】

【００４８】
表５に示したように、抽象化により異なりＮグラム（すなわちＮグラム特徴量）の数は約半分に減少しているが、それでもまだ１，０００のオーダーである。
本発明の設計で用いる重回帰分析の性質上、説明変数となるＮグラム特徴量の数は、目的変数である印象値データのサンプル数（ここでは楽曲データ数８０である。）よりも２個以上（３個以上が推奨されている）少なくなければならない。（非特許文献５参照。）
【００４９】
【非特許文献５】菅民郎：「多変量統計分析」、現代数学社、京都（２０００）
【００５０】
そこで本実施例においてはＮグラム特徴量生成部（１２）で、Ｎグラム特徴量の数が多くても７７個を超えないよう、以下のような方法でＮグラム特徴量の選択処理（６３）を行う。
まず、各楽曲におけるＮグラム特徴量の相対出現頻度がいずれの楽曲においても０．０１０未満であったＮグラム特徴量を除外した。この操作により、Ｎグラム特徴量の数は表６のように変化した。但し、この操作は音色情報に対しては行っていない。
【００５１】
【表６】

【００５２】
次に、Ｎグラム特徴量と印象値データとの相関係数を求め、その絶対値が大きかった特徴量（最大７７個）を重回帰分析のための説明変数として選択（６４）した。このとき、Ｎグラム特徴量のＮ値の組み合わせとして、ｕｎｉｇｒａｍのみ、ｂｉｇｒａｍのみ、ｂｉｇｒａｍとｔｒｉｇｒａｍ、ｂｉｇｒａｍから４−ｇｒａｍまで、ｂｉｇｒａｍから５−ｇｒａｍまでの５通りを用意したので、この５グループのそれぞれにおいてＮグラム特徴量の選択（６４）を行った。
【００５３】
楽曲特徴量及び楽曲印象値計算式を決定するために、上記で選択されたＮグラム特徴量（６４）を説明変数、印象尺度ｍ（ｍ＝１，２，・・・、１０）における楽曲印象値データ（ＳＤ法に基づく印象評価実験の結果）（６６）を目的変数とする重回帰分析（変数増加法）（６７）を行う。
このとき、説明変数に用いるＮグラム特徴量のＮ値の組み合わせは、５通りあり、重みタイプには上記のｗ_１，ｗ_２，ｗ_３の３種類を用いるので、結局、各印象尺度毎に１５回の重回帰分析（６７）を行う。
【００５４】
ここで、各印象尺度毎に１５回の重回帰分析を行うが、その中で自由度修正済み決定係数Ｒ^２‘が最も大きかった重回帰式を楽曲印象値計算式として採用し（６８）、その重回帰式を構成する説明変数（Ｎグラム特徴量）を楽曲特徴量（６９）と定義する。
【００５５】
自由度修正済み決定係数について簡単に説明すると、サンプル数と説明変数の数との差が小さい（すなわち自由度が低い）と、決定係数が大きくなる傾向がある。この不具合を修正したのが自由度修正済み決定係数であり、次の式で計算される。
【数式１】

ただし、Ｓ_ｅ：残差平方和、Ｓ_ｙｙ：偏差平方和、ｎ：サンプル数、ｑ：説明変数の数
なお、自由度修正済み決定係数については、非特許文献５に記載されている。
【００５６】
本設計方法において、各印象尺度において　　Ｒ^２‘が最大となるＮ値の組み合わせ及び重みタイプを、そのときのＲ^２‘とともに表７に示す。なお、表７は、Ｎ＝５のＮグラム特徴量（５−ｇｒａｍ）が用いられなかったことを示しており、Ｎグラム特徴量におけるＮ値としては４までで十分なことを示唆している。
【００５７】
【表７】

【００５８】
ここで、印象尺度１の場合を例に、設計された楽曲特徴量と楽曲印象値計算式（６９）の偏回帰係数及び定数項を表８に示す。印象尺度１の場合の重みタイプは表７よりｗ_１なので、楽曲から抽出される楽曲特徴量の相対出現頻度に重み１（表３参照）を掛けた値が楽曲印象値計算式（重回帰式）に代入され、その楽曲の印象尺度１における楽曲印象値が算出される。
【００５９】
【表８】

【００６０】
以上の繰り返しにより、各印象尺度毎の楽曲特徴量、楽曲印象値計算式（６９）が定義され、本装置（１）の設計が完了する。定義された印象尺度ごとの楽曲特徴量、楽曲印象値計算式は、外部記憶手段である印象値データベース（８）に記録され、本装置（１）の楽曲印象値演算部（１３）から随時呼び出し可能とする。
印象値データベースは、ハードディスク（６）上に設けてもよい。
【００６１】
以下、再び本装置（１）のフローチャート（図２）に基づいて説述する。
Ｎグラム特徴量生成部（１２）において生成（２２）され、ハードディスク（６）に記録されたＮグラム特徴量を用いて、次の楽曲印象値演算部（１３）において、楽曲印象値の演算を行う。
【００６２】
楽曲印象値演算部（１３）においては、まずＮグラム特徴量から各印象尺度毎の楽曲特徴量を印象値データベース（８）を参照して抽出（２３）し、同データベース（８）内の楽曲印象値計算式に代入し演算処理（２４）する。
該演算の結果は、実数値で各印象尺度毎に１個の楽曲印象値（２５）が楽曲印象値出力部（１４）から出力される。
【００６３】
図７には本発明で開発した楽曲印象値自動付与装置（１）のモニタ（３）に表示される画面（７０）の一例を示す。
楽曲ＭＩＤＩデータ（７）は楽曲１曲分のファイルを指定するときにはボタン（７１）を、複数の楽曲を収容したフォルダごと指定するときはボタン（７２）をキーボード（４）やマウス（図示しない）などで指示する。
【００６４】
「印象値の自動付与」ボタン（７３）を指示することにより、上記で指定されていれば当該楽曲ＭＩＤＩデータ（７）を、指定されていなければ、デフォルトで定義されたフォルダ内の楽曲ＭＩＤＩデータ（７）を、以上に説述したＣＰＵ（２）における各処理により処理し、最終的に楽曲印象値出力部（１４）が、規定のファイルｍｉｄｉ．ｉｗｔとしてハードディスク（６）に保存する。
【００６５】
ここで、ｍｉｄｉ．ｉｗｔは、ｃｓｖ（カンマ区切り）形式のファイルであり、１行１楽曲で、各行の第１要素に標準ＭＩＤＩファイル名（拡張子は含まない）、第ｍ＋１要素に印象尺度ｍに対する楽曲印象値という並びで登録される。
なお、本装置（１）の出力は、ハードディスク（６）への記録に限らず、任意の外部記憶装置、モニタ（３）などへの表示により行うこともできる。
また、本装置（１）は単独で用いるだけでなく、他の任意の装置、例えばジュークボックスや楽曲を検索する装置などに付属させてもよい。また、本装置にネットワークアダプタを備えてネットワーク上に設け、他の端末からアクセスできるようにしてもよい。
【００６６】
【発明の効果】
本発明は上記の構成を備えるので、次の効果を奏する。
本発明によれば、標準ＭＩＤＩデータなど、コンピュータで処理可能な楽曲データから楽曲基本特徴量を抽出し、Ｎグラムを生成すると共に、Ｎグラムのうち、異なりＮグラムを用いてＮグラム特徴量を生成することにより、コンピュータ処理に適した形態で当該楽曲の楽曲特徴を抽出することができる。
そして、該楽曲特徴量から所定の楽曲印象値計算式による演算を行うため、高精度な楽曲印象値の算出を行うことができる。
これにより、簡便・高速な処理が可能な楽曲印象値自動付与装置を提供することができる。
【図面の簡単な説明】
【図１】本発明による楽曲印象値自動付与装置の一実施例の構成図である。
【図２】本発明における一実施例の処理のフローチャートである。
【図３】楽曲基本特徴量の抽出例である。
【図４】生成されたＮグラムの一例である。
【図５】抽象化されたＮグラムの一例である。
【図６】本発明による楽曲印象値自動付与装置の設計方法のフローチャートである。
【図７】本発明による楽曲印象値自動付与装置の表示画面の一例である。
【符号の説明】
１　　　楽曲印象値自動付与装置
２　　　ＣＰＵ
３　　　モニタ
４　　　キーボード
５　　　メモリ
６　　　外部記憶手段（ハードディスク）
７　　　楽曲ＭＩＤＩデータ
８　　　印象値データベース
９　　　楽曲データ入力部
１０　　　楽曲基本特徴量抽出部
１１　　　Ｎグラム生成部
１２　　　Ｎグラム特徴量生成部
１３　　　楽曲印象値演算部
１４　　　楽曲印象値出力部
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an apparatus for processing music data, and more particularly to an apparatus for automatically giving an impression value of a music piece from the music data.
[0002]
[Prior art]
Conventionally, evaluating a work of art such as music, for example determining the impression that a work gives, has been considered unsuited to processing by a computer. For this reason, even when works are classified by the impression the music gives, the classification itself has been performed by humans. Having a computer assign an impression value to an entirely new piece of music has therefore remained a challenge.
[0003]
In prior research, including that of the present applicants, automatically assigning music impression values by computer has been discussed mainly as the question of what music feature amounts to extract from computer-processable music data, what calculation formula to use, and what music impression values to output.
Here, the music impression value is a numerical value of the music impression, and the music feature amount indicates a physical feature amount extracted from the music data and used for calculating the music impression value.
[0004]
The above problem can therefore be described as a set of technical problems concerning the design of music impression values, the design of music feature amounts, and the design of music impression value calculation formulas. Several studies have been conducted, but all remain fragmentary, and no device has yet been provided that is designed as a whole and automatically assigns accurate music impression values.
[0005]
For example, in designing a music impression value, according to Non-Patent Document 1, a five-dimensional factor space called a music sensitivity space is configured from a result of a factor analysis on subjective evaluation experiment data based on the SD (Semantic Differential) method. The impression input by the user and the impression of the music are represented as coordinate values in this space.
[0006]
However, since the meaning of each factor axis is interpreted manually, there are individual differences, and it is difficult to indicate plainly what impression the coordinate values assigned to a piece of music actually represent. In addition, since the impression of a piece is represented by a single point, some value must be input for every impression scale (eight in the system of Non-Patent Document 1), and "neither applies" (a state in which the music impression value is undefined) is not permitted as an evaluation on an impression scale.
Therefore, even if the brightness scale is rated "bright" with the intention of searching for bright pieces, the search actually retrieves pieces whose values on the scales other than brightness correspond to "cannot say either way" (4 points on a seven-level scale from 1 to 7 points).
[0007]
[Non-Patent Document 1] Tsuyoshi Ikezoe, Yoshinobu Kajikawa, Yasuo Nomura: "Music database search system using sensibility words using music sensibility space" Journal of Information Processing, 42,12, pp. 3201-3212 (2001)
[0008]
As for the design of music feature amounts, there are the studies published in Non-Patent Documents 1 to 3, among others. Conventional music retrieval research on music data, including these studies, often uses static feature amounts such as the mean, variance, and temporal proportion of musical components such as pitch, intensity, duration, rhythm, tempo, meter, and tonality (minor/major).
However, it is considered that there is an inherent limitation in expressing music that is originally time-series data using only static features.
[0009]
[Non-Patent Document 2] Satoshi Sato, Kohei Kikuchi, Hajime Kitakami: "Automatic generation of emotional valence for image search of music data", IPSJ SIG Technical Report, Database Systems 118-8, Fundamental Informatics 54-8, pp. 57-64 (1999)
[Non-Patent Document 3] Satoshi Sato, Jun Ogawa, Yoshihiro Horino, Hajime Kitakami: "Study for Realization of Music Work Search System Based on Emotion", IEICE Technical Report (Voice), SP2000-137, pp. 51-56 (2001)
[0010]
Prior studies (Non-Patent Documents 2 to 4) have also proposed feature amounts that take such temporal transitions of sound into account, for example by patterning the transitions in pitch and duration over three consecutive notes. However, the number of consecutive notes was fixed, so only limited temporal transitions could be handled.
[0011]
[Non-Patent Document 4] Yasuhiro Tsuji, Mamoru Hoshi, Tadashi Ohmori: "Similar music retrieval and retrieval by kansei words using local pattern features of music", IEICE Technical Report (Speech), SP96-124, pp. 17-24 (1997)
[0012]
[Problems to be solved by the invention]
The present invention has been created in view of the above-described problems of the related art, and has an object to provide a music impression value automatic assigning device that automatically assigns a song impression value using song data. An object of the present invention is to provide a technique for giving a high-accuracy music impression value.
[0013]
[Means for Solving the Problems]
The present invention has created the following means in order to solve the above problems.
In other words, the present invention provides an automatic music impression value assigning device for automatically digitizing the impression of a song and assigning it to song data based on a predetermined data standard that can be processed at least by a computer.
The present apparatus includes input means for inputting music data, and music basic feature amount extracting means for extracting, from the music data, music basic feature amounts, which are physical feature amounts related to the impression of the music. It further includes N-gram generating means for generating N-grams from the music basic feature amounts, N-gram feature amount generating means for generating N-gram feature amounts using the distinct N-grams, music impression value calculating means for performing a calculation with a predetermined music impression value calculation formula using the N-gram feature amounts, and output means for outputting the music impression value.
[0014]
Here, the N-gram feature amount generating means may generate each N-gram feature amount by multiplying the relative appearance frequency of a distinct N-gram by a predetermined weight value.
[0015]
The music impression value calculation formula may be a multiple regression formula.
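As a minimal sketch of what evaluating such a multiple regression formula might look like, the following assumes the formula is stored as a constant term plus one partial regression coefficient per selected feature. The function name, the dictionary layout, and the sample feature names are illustrative assumptions, not taken from the specification.

```python
def impression_value(features, coefficients, intercept):
    """Evaluate a multiple regression formula for one impression scale.

    features:     dict mapping N-gram feature name -> weighted relative frequency
    coefficients: dict mapping N-gram feature name -> partial regression coefficient
    intercept:    constant term of the regression formula
    Features absent from the piece contribute 0 (an assumption of this sketch).
    """
    return intercept + sum(
        coef * features.get(name, 0.0) for name, coef in coefficients.items()
    )


# Hypothetical feature names and coefficients, for illustration only.
example = impression_value(
    {"hs sx": 0.4, "h60 u": 0.2},
    {"hs sx": 2.0, "missing": 5.0},
    3.0,
)
```

One formula of this kind would be held per impression scale, so a piece receives one real-valued impression value for each scale.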
[0016]
In a configuration in which the music impression value automatic assigning device assigns impression values for a plurality of impression scales, the N-gram feature amount generating means may generate N-gram feature amounts for each impression scale, and the music impression value calculating means may perform the calculation for each impression scale using those N-gram feature amounts.
[0017]
The data standard may be a MIDI (musical instrument digital interface) standard.
[0018]
The music impression value automatic assigning device of the present invention may include stream dividing means that receives music data from the input means, separates the plurality of track chunks and/or channels contained in the music data, and outputs each track chunk and/or channel to the music basic feature amount extracting means.
[0019]
The music basic feature amount can be a pitch, a sound intensity, a sound length, and timbre information.
[0020]
The N-gram feature generation unit may generate an N-gram feature for a plurality of N values.
[0021]
The impression scales may use at least one of the words "quiet", "calm", "refreshing", "bright", "solemn", "relaxed", "beautiful", "fun", "soothing", and "healing", or a synonym or antonym thereof.
[0022]
BEST MODE FOR CARRYING OUT THE INVENTION
An embodiment of the present invention will be described based on an example shown in the drawings. The embodiments can be appropriately modified without departing from the gist of the present invention.
FIG. 1 shows a configuration diagram of a music impression value automatic giving device (hereinafter, this device) according to the present invention, and FIG. 2 shows a flowchart of processing in the device.
[0023]
The present device (1) consists of a CPU (2), which is the core that mainly handles processing such as calculations, a monitor (3), which is a display device that shows processing contents and results to the user, a keyboard (4), with which the user operates the device (1), a memory (5) that works in conjunction with the CPU, and an external storage device (6) that can store data.
There is a known personal computer as an apparatus having such a configuration, and the apparatus (1) can be mounted on the personal computer.
[0024]
Using such a device (1), the present invention provides a technique for inputting a standard MIDI file, automatically assigning music impression values, and outputting them. Each process is as shown in FIG. 2. Music basic feature amounts, which are physical feature amounts related to the impression of the music, are extracted (21) from a standard MIDI file (20). N-grams are then generated from combinations of consecutive music basic feature amounts, and N-gram feature amounts are generated, using weights and appearance frequencies as necessary (22).
Those N-gram feature amounts that are effective for representing the characteristics of the music are then selected to extract the music feature amounts (23), which are used in the music impression value calculation formula. By performing this arithmetic processing, the music impression value (25) targeted by the present invention is calculated.
In the present embodiment, one music impression value is output for each impression scale as a real value.
Next, each process will be described in detail.
[0025]
The standard MIDI file (20) is recorded as music MIDI data (7) in external storage means provided in the apparatus (1). Although it is separate in FIG. 1, it may be recorded in the hard disk (6) which is also an external storage means, or may be recorded in the external storage means of another computer connected to the network.
The CPU (2) reads out the music MIDI data (7) by the processing of the music data input section (9) and sends it to the music basic feature quantity extraction section (10).
[0026]
The music basic feature extraction unit (10) has a stream division function for extracting the music basic feature for each track chunk and each channel from the data (7) in the standard MIDI file format (format 0 or 1). In the case of the standard MIDI data (7), since track chunks and channels are described in parallel, each stream is separately cut and extracted, and each stream is defined as one stream data.
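The stream division described above can be sketched as grouping events by their track chunk and channel. This is only an illustration under stated assumptions: the MIDI file is taken to be already parsed into a list of event dictionaries, and the `track` and `channel` keys are hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def split_streams(events):
    """Group already-parsed MIDI events into one stream per (track, channel).

    events: list of dicts, each with at least 'track' and 'channel' keys
            (hypothetical layout; real parsing of the file is out of scope).
    Returns a dict mapping (track, channel) -> list of events in input order.
    """
    streams = defaultdict(list)
    for event in events:
        streams[(event["track"], event["channel"])].append(event)
    return dict(streams)


# One track chunk with three channels yields three streams.
demo_events = [
    {"track": 0, "channel": 0, "note": 60},
    {"track": 0, "channel": 1, "note": 64},
    {"track": 0, "channel": 2, "note": 67},
    {"track": 0, "channel": 0, "note": 62},
]
demo_streams = split_streams(demo_events)
```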
[0027]
For example, three pieces of stream data are generated from a piece of music with one track chunk and three channels. In the present embodiment, four types of music basic feature amounts are extracted: pitch, intensity, duration, and timbre information. These correspond, respectively, to the note number value, the note-on velocity value, the time (in milliseconds) from a note-on message until the corresponding note-off message arrives, and the timbre number based on the GM (General MIDI) standard.
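The mapping from MIDI messages to the four basic feature amounts might be sketched as follows. The event tuple layout and function name are assumptions made for illustration, timestamps are assumed to be already converted to milliseconds, and chords and rests are not handled here. A note-on message with velocity 0 is treated as a note-off, as is standard MIDI practice.

```python
def extract_notes(events, program):
    """Pair note-on and note-off messages in one stream into note records.

    events:  list of (time_ms, kind, note, velocity) with kind 'on' or 'off'
             (hypothetical layout for this sketch)
    program: GM timbre (program) number in effect for the stream
    Returns a list of (duration_ms, pitch, velocity, program) tuples,
    in the column order used for the stream data of FIG. 3.
    """
    active = {}  # note number -> (start time, note-on velocity)
    notes = []
    for time_ms, kind, note, velocity in events:
        if kind == "on" and velocity > 0:
            active[note] = (time_ms, velocity)
        elif note in active:  # 'off', or 'on' with velocity 0
            start, on_velocity = active.pop(note)
            notes.append((time_ms - start, note, on_velocity, program))
    return notes


demo_notes = extract_notes(
    [(0, "on", 60, 100), (500, "off", 60, 0),
     (500, "on", 64, 90), (1000, "on", 64, 0)],
    program=0,
)
```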
[0028]
Here, an example of stream data is shown in FIG. 3 as an example of extracting the music basic feature amount. In the stream data (30), the first column of each row is a tone length (31), the second column is a tone pitch (32), the third column is a tone intensity (33), and the fourth column is a timbre. It corresponds to information (34).
In addition, the case where two or more notes sound simultaneously in the same track chunk and the same channel is defined as a "chord". When there is a chord, the music basic feature amounts (excluding duration) of the second and subsequent notes are written repeatedly in the fifth and subsequent columns (35), (36), and (37).
In each channel, a silent state of that channel (for example, 38) is defined as a rest, for which the duration is set to 0 and the feature amounts other than duration are represented by the symbol "s".
The data extracted by the music basic feature amount extraction unit (10) is recorded on the hard disk (6).
[0029]
The generation (22) of the N-gram feature is processed in the N-gram generator (11) and the N-gram feature generator (12). The N-gram feature amount is a feature amount that is a candidate for a song feature amount used in post-processing, and is generated from the song basic feature amount in the following procedure.
First, the N-gram generation unit (11) uses the music basic feature amount data on the hard disk (6), separates the four types of music basic feature amounts from each piece of stream data, and generates unigrams (1-grams, N = 1) from the timbre information and N-grams (N = 1, 2, 3, 4, 5) from the other feature amounts.
[0030]
For example, N-grams (40) (41) (42) (43) (44) as shown in FIG. 4 are generated from the pitches of the stream data (30) shown in FIG. The chords (39a) (39b) (39c) are rearranged in descending order of value and described as nesting (45) in a list format.
The generated result is recorded on a hard disk (6) or the like.
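A minimal sketch of this N-gram generation step, assuming one basic feature (for example pitch) has already been separated into a sequence. Chords are assumed to arrive as lists and are normalized into tuples sorted in descending order, mirroring the nested-list description above. The function names are illustrative.

```python
def normalize(element):
    """Sort a chord (list or tuple of values) in descending order; pass others through."""
    if isinstance(element, (list, tuple)):
        return tuple(sorted(element, reverse=True))
    return element


def generate_ngrams(sequence, n_values=(1, 2, 3, 4, 5)):
    """Return a dict mapping each N to the list of N-grams (as tuples) in the sequence."""
    items = [normalize(e) for e in sequence]
    return {
        n: [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]
        for n in n_values
    }


# A single note, a three-note chord, then a rest ("s").
demo_ngrams = generate_ngrams([60, [62, 79, 71], "s"], n_values=(1, 2))
```

A sequence shorter than N simply yields no N-grams for that N.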
[0031]
Next, in the N-gram feature amount generation unit (12), each element of the N-grams (x_1 x_2 ... x_N) generated from the music basic feature amounts other than timbre information is replaced according to the abstraction rules of Tables 1 and 2.
[0032]
[Table 1]

[0033]
[Table 2]

[0034]
The rule in Table 1 is applied to the first element x_1 of each N-gram and replaces that element according to the type of music basic feature amount. At this time, a nested list is written as a single symbol (for example, 79-71-62 (45)), and a tag indicating the type of music basic feature amount is attached: h for pitch, v for intensity, and d for duration (for example, h79-71-62).
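The tagging of the first element can be sketched as below. Table 1 itself is not reproduced in this text, so the sketch covers only the behavior stated in the paragraph above (joining a chord into one symbol and prefixing a type tag); any further substitutions that Table 1 may define are not modeled, and the function and key names are assumptions.

```python
# Tags named in the description: h = pitch, v = intensity, d = duration.
TAGS = {"pitch": "h", "intensity": "v", "duration": "d"}


def tag_first_element(element, feature):
    """Write the first N-gram element as one tagged symbol.

    element: a single value, a rest symbol "s", or a chord given as a
             tuple already sorted in descending order
    feature: "pitch", "intensity", or "duration"
    """
    tag = TAGS[feature]
    if isinstance(element, (list, tuple)):
        # A chord becomes one hyphen-joined symbol, e.g. 79-71-62.
        return tag + "-".join(str(value) for value in element)
    return tag + str(element)


demo_symbol = tag_first_element((79, 71, 62), "pitch")
```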
[0035]
On the other hand, the rule in Table 2 is applied to the second and subsequent elements x_i (i = 2, 3, ..., N) of each N-gram, and replaces x_i with the symbol corresponding to the result of comparing it with the immediately preceding element x_{i-1}.
The comparison between x_{i-1} and x_i is made between their maximum values and between their minimum values; for anything other than a chord, the maximum and minimum values are treated as equal.
As a result of the above processing, the N-grams in FIG. 4, for example, are abstracted as shown in FIG. 5.
[0036]
The distinct N-grams among the N-grams abstracted as described above are referred to here as "N-gram feature amounts". The value of each N-gram feature amount is its relative appearance frequency multiplied by a weight w.
The relative appearance frequency is calculated for each type of music basic feature amount and for each N value of the N-gram statistics, and is rounded at the fourth decimal place. For example, four N-gram feature amounts are generated from the bigrams (50) in FIG. 5: the relative appearance frequency of (hs sx) (51) (52) is 0.400, and that of each of the others (53) (54) (55) is 0.200.
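The relative-frequency calculation can be illustrated with a short sketch. Only the symbol (hs sx) and the resulting frequencies come from the example above; the other bigrams are placeholders, and the weight is left as a parameter because Table 3 is not reproduced here. `Decimal` with `ROUND_HALF_UP` is used so that rounding at the fourth decimal place behaves as conventional rounding.

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def ngram_feature_values(ngrams, weight=1.0):
    """Map each distinct N-gram to (relative frequency rounded to 3 decimals) * weight."""
    counts = Counter(ngrams)
    total = len(ngrams)
    values = {}
    for gram, count in counts.items():
        freq = (Decimal(count) / Decimal(total)).quantize(
            Decimal("0.001"), rounding=ROUND_HALF_UP
        )
        values[gram] = float(freq) * weight
    return values


# Five bigrams, four of them distinct; all but ("hs", "sx") are placeholders.
demo_bigrams = [("hs", "sx"), ("p1", "q1"), ("hs", "sx"), ("p2", "q2"), ("p3", "q3")]
demo_values = ngram_feature_values(demo_bigrams)
```

With five bigrams in total, the bigram that occurs twice gets 2/5 = 0.400 and each of the three that occur once gets 1/5 = 0.200, matching the figures in the text.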
[0037]
On the other hand, three types of weighting methods as shown in Table 3 were prepared for the weight w.
In the present invention, N-gram feature amounts are generated by the above processing in the N-gram generation unit (11) and the N-gram feature amount generation unit (12), and are recorded on the hard disk (6). However, the N-gram feature amount generation process of the present invention is not limited to the use of the relative appearance frequency and weights described above, and may be set arbitrarily within a range that does not depart from known methods of calculating N-gram statistics.
[0038]
[Table 3]

[0039]
Here, in order to determine the above-mentioned music feature value and music impression value calculation formula, the music impression value automatic giving device (1) of the present invention is specifically designed according to the design procedure shown in FIG. As is clear from the figure, the present design procedure includes steps very similar to those when using the present apparatus (1). Hereinafter, each procedure will be described along this flow.
The following impression evaluation experiment was performed as a subjective evaluation experiment (65) based on the SD method in order to obtain data serving as a reference when quantifying the impression of the music.
[0040]
The subjects were 100 people in total, 39 men and 61 women: 1 at professional level (earning income as a performer), 7 at semi-professional level (having studied music professionally, for example at a music college), 20 at amateur level (belonging to a band, orchestra, choir, or the like), 46 at hobby level (not meeting the above conditions but able to play to some extent), and 26 with no experience (hardly able to play at all). Many people without extensive musical experience were thus included.
[0041]
Impression-based music retrieval is a particularly effective means of retrieval for people without extensive musical experience, rather than for those with it, and using data that reflects the musical sensibility of such people is important in designing the present device (1).
The music (60) used in the experiment consisted of 80 classical pieces in standard MIDI file format that had been published on the Internet. Because of limits on experiment time, the length of each piece was adjusted so that the average listening time was about one minute. Subjects could listen to each piece up to twice, and during that time were required to rate every impression scale either on the seven-level scale or as "neither applies".
[0042]
The impression scales used in the present device (1) can be set arbitrarily. For example, they can be designed based on the impression scale design method disclosed by the present applicant in Japanese Patent Application No. 2002-203694; here, the 10 impression scales shown in Table 4 are used.
[0043]
[Table 4]

[0044]
Here, scores were assigned to the seven-level evaluation results of each impression scale. For example, on the impression scale for brightness, "very bright" was given 7 points, "bright" 6 points, "slightly bright" 5 points, "cannot say either way" 4 points, "slightly dark" 3 points, "dark" 2 points, and "very dark" 1 point, while "neither applies" was given no score.
This makes clear what impression a music impression value expresses on each impression scale. Moreover, if a "neither applies" rating input by the user is regarded as a state in which that impression scale has no score, "neither applies (no score)" can be distinguished from "cannot say either way (4 points)".
The mean for each piece was calculated from the resulting 80,000 data points (100 subjects x 80 pieces x 10 impression scales) and used as the impression value data (800 values = 80 pieces x 10 impression scales) (66). Data with no score were excluded beforehand and not used in the calculation.
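The averaging step, with unscored ratings excluded, can be sketched as follows. Representing a rating given no score as `None` is an assumption of this sketch, as is returning `None` when no subject gave a scored rating.

```python
def mean_impression(ratings):
    """Average the 1-7 scores for one piece on one impression scale.

    ratings: list of ints in 1..7, with None standing for a rating that
             received no score (excluded before averaging).
    Returns None when no scored rating remains.
    """
    scored = [r for r in ratings if r is not None]
    if not scored:
        return None
    return sum(scored) / len(scored)


# Three scored ratings and one unscored rating: (7 + 4 + 5) / 3.
demo_mean = mean_impression([7, 4, None, 5])
```

Note that a 4-point rating still counts toward the mean, whereas an unscored rating does not, which is exactly the distinction the scoring scheme is meant to preserve.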
[0045]
Meanwhile, the 80 pieces of music data (60) are input from the music data input unit (9) of the present device (1), and the music basic feature amount extraction unit (10) extracts the music basic feature amounts (61) by the processing described above.
Similarly, the N-gram generation unit (11) and the N-gram feature amount generation unit (12) generate the N-gram feature amounts (62) by the processing described above.
[0046]
Here, the N-gram feature amount generation unit (12) performs replacement based on the abstraction rules of Tables 1 and 2 as described above. Table 5 shows how the number of distinct N-grams changes as a result of the abstraction, taking pitch as an example.
[0047]
[Table 5]

[0048]
As shown in Table 5, abstraction reduces the number of N-grams (that is, N-gram feature amounts) by about half, but the number is still on the order of 1,000.
Because of the nature of the multiple regression analysis used in the design of the present invention, the number of N-gram feature amounts used as explanatory variables must be at least two fewer than the number of samples of impression value data used as the objective variable (here, the number of pieces of music data, 80); at least three fewer is recommended (see Non-Patent Document 5).
[0049]
[Non-Patent Document 5] Tamio Suga: "Multivariate Statistical Analysis", Contemporary Mathematics, Kyoto (2000)
[0050]
Therefore, in the present embodiment, the N-gram feature amount generation unit (12) selects (63) the N-gram feature amounts by the following method so that their number does not exceed 77.
First, N-gram feature amounts whose relative appearance frequency was below 0.010 in every piece of music were excluded. Table 6 shows how this operation changed the number of N-gram feature amounts. This operation was not applied to the timbre information.
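The frequency-based exclusion can be sketched as below. The sketch reads the criterion as "below 0.010 in every piece", so a feature is kept if it reaches the threshold in at least one piece. The function name `drop_rare_features`, the feature labels, and the frequencies are hypothetical, and each feature is assumed to have at least one frequency value.

```python
from typing import Dict, List

MIN_RELATIVE_FREQUENCY = 0.010  # threshold stated in the text


def drop_rare_features(freqs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep features whose relative frequency reaches the threshold in some piece.

    freqs maps an N-gram feature label to its relative appearance
    frequency in each piece of music (one list entry per piece).
    """
    return {
        feature: per_piece
        for feature, per_piece in freqs.items()
        if max(per_piece) >= MIN_RELATIVE_FREQUENCY
    }


# Hypothetical relative frequencies of two bigram features in three pieces.
freqs = {
    "+2 +2": [0.020, 0.004, 0.000],   # reaches 0.010 in the first piece
    "-7 +12": [0.003, 0.009, 0.001],  # stays below 0.010 everywhere
}
print(list(drop_rare_features(freqs)))  # ['+2 +2']
```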
[0051]
[Table 6]

[0052]
Next, the correlation coefficient between each N-gram feature amount and the impression value data was obtained, and the feature amounts with the largest absolute correlation coefficients (at most 77) were selected (64) as explanatory variables for the multiple regression analysis. Five combinations of N values were prepared for the N-gram feature amounts (unigram only, bigram only, bigram and trigram, bigram to 4-gram, and bigram to 5-gram), and the selection (64) of the N-gram feature amounts was performed for each combination.
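A minimal sketch of this correlation-based selection follows. It assumes the Pearson correlation coefficient is used and that a feature with no variation across pieces is treated as having zero correlation; neither detail is stated in the text. The names `pearson` and `select_features` and the sample data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

N_PIECES = 80
MAX_FEATURES = N_PIECES - 3  # 77, the recommended upper limit in the text


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input has no variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(
    features: Dict[str, List[float]],
    impression_values: Sequence[float],
    max_features: int = MAX_FEATURES,
) -> List[str]:
    """Return up to max_features labels, largest |correlation| first."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], impression_values)),
        reverse=True,
    )
    return ranked[:max_features]


# Hypothetical data for five pieces; only the two best features are kept.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "flat": [2.0, 2.0, 2.0, 2.0, 2.0],
    "falling": [5.0, 4.0, 3.0, 2.0, 1.0],
    "noisy": [1.0, 3.0, 2.0, 5.0, 4.0],
}
print(select_features(features, y, max_features=2))  # ['falling', 'noisy']
```

Ranking by the absolute value keeps strongly negative correlations, such as the "falling" feature here, alongside positive ones.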
[0053]
To determine the music feature amounts and the music impression value calculation formula, multiple regression analysis (forward selection method) (67) is performed for each impression scale m (m = 1, 2, ..., 10), with the N-gram feature amounts selected above (64) as explanatory variables and the music impression value data (66) (the results of the impression evaluation experiment based on the SD method) as the objective variable.
Since there are five combinations of N values for the explanatory variables and three weight types (w₁, w₂, and w₃ described above), 5 x 3 = 15 multiple regression analyses (67) are performed for each impression scale.
[0054]
Of the 15 multiple regression analyses performed for each impression scale, the multiple regression equation with the largest degree-of-freedom-adjusted coefficient of determination R²′ is adopted as the music impression value calculation formula (68). The explanatory variables (N-gram feature amounts) that make up this multiple regression equation are defined as the music feature amounts (69).
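The search over the 15 candidate analyses for one impression scale can be sketched as follows. The regression itself is left out: the sketch assumes a caller-supplied function `score` that returns R²′ for a given combination of N values and weight type. The labels, the function name `best_configuration`, and the toy scores are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

# The five combinations of N values and the three weight types from the text.
N_COMBINATIONS = ("1", "2", "2-3", "2-4", "2-5")
WEIGHT_TYPES = ("w1", "w2", "w3")


def best_configuration(score: Callable[[str, str], float]) -> Tuple[str, str]:
    """Return the (N combination, weight type) pair with the largest score."""
    candidates = list(product(N_COMBINATIONS, WEIGHT_TYPES))  # 5 x 3 = 15 pairs
    return max(candidates, key=lambda pair: score(*pair))


# Toy scores standing in for R2' values; not results from the document.
toy_scores = {("2-4", "w1"): 0.90, ("2", "w3"): 0.70}
best = best_configuration(lambda n, w: toy_scores.get((n, w), 0.0))
print(best)  # ('2-4', 'w1')
```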
[0055]
To explain the degree-of-freedom-adjusted coefficient of determination briefly: the smaller the difference between the number of samples and the number of explanatory variables (that is, the fewer the degrees of freedom), the larger the ordinary coefficient of determination tends to be. The degree-of-freedom-adjusted coefficient of determination corrects this defect and is calculated by the following equation.
[Formula 1]
$$R^{2\prime} = 1 - \frac{S_e / (n - q - 1)}{S_{yy} / (n - 1)}$$
where S_e is the residual sum of squares, S_yy is the sum of squared deviations, n is the number of samples, and q is the number of explanatory variables.
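As a rough numerical illustration, the sketch below computes the adjusted coefficient from the four quantities just defined. It assumes the standard definition of the degree-of-freedom-adjusted coefficient of determination; the function name `adjusted_r2` and the sample numbers are hypothetical.

```python
def adjusted_r2(s_e: float, s_yy: float, n: int, q: int) -> float:
    """Degree-of-freedom-adjusted coefficient of determination.

    s_e: residual sum of squares, s_yy: sum of squared deviations,
    n: number of samples, q: number of explanatory variables.
    Assumes the standard form 1 - (s_e / (n - q - 1)) / (s_yy / (n - 1)).
    """
    if n - q - 1 <= 0:
        raise ValueError("need n - q - 1 > 0 residual degrees of freedom")
    return 1.0 - (s_e / (n - q - 1)) / (s_yy / (n - 1))


# Same fit quality (ordinary R2 = 1 - 10/100 = 0.9) with 80 samples,
# once with 10 explanatory variables and once with 77.
few = adjusted_r2(s_e=10.0, s_yy=100.0, n=80, q=10)
many = adjusted_r2(s_e=10.0, s_yy=100.0, n=80, q=77)
print(few > many)  # True: more explanatory variables are penalized
```

The guard on n - q - 1 reflects the constraint stated earlier that the number of explanatory variables must be at least two fewer than the number of samples.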
[0056]
Table 7 shows, for each impression scale, the combination of N values and the weight type that maximized R²′ under this design method, together with the value of R²′ obtained. Table 7 shows that N-gram feature amounts with N = 5 (5-grams) were not used, which suggests that N values up to 4 are sufficient for the N-gram feature amounts.
[0057]
[Table 7]

[0058]
Here, taking impression scale 1 as an example, Table 8 shows the designed music feature amounts together with the partial regression coefficients and constant term of the music impression value calculation formula (69). Since Table 7 gives w₁ as the weight type for impression scale 1, the relative appearance frequency of each music feature amount extracted from a piece of music is multiplied by the weight w₁ (see Table 3), the resulting values are substituted into the music impression value calculation formula (multiple regression equation), and the music impression value of the piece on impression scale 1 is calculated.
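The calculation for a single impression scale can be sketched as below. For simplicity the sketch assumes the weight is one number applied to every feature, which may not match the weight definitions in Table 3. The function name `impression_value`, the feature labels, and all numbers are hypothetical and are not the values of Table 8.

```python
from typing import Dict


def impression_value(
    constant: float,
    coefficients: Dict[str, float],
    relative_freq: Dict[str, float],
    weight: float,
) -> float:
    """Evaluate a multiple regression equation for one impression scale.

    Each music feature contributes coefficient * weight * relative frequency;
    a feature that does not occur in the piece contributes zero.
    """
    total = constant
    for feature, coefficient in coefficients.items():
        total += coefficient * weight * relative_freq.get(feature, 0.0)
    return total


# Hypothetical constant term, partial regression coefficients and frequencies.
coefficients = {"+2 +2": 20.0, "0 0 0": -10.0}
relative_freq = {"+2 +2": 0.05}  # "0 0 0" does not occur in this piece
value = impression_value(3.5, coefficients, relative_freq, weight=1.0)
print(round(value, 6))  # 4.5
```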
[0059]
[Table 8]

[0060]
By repeating the above, the music feature amounts and the music impression value calculation formula (69) are defined for each impression scale, and the design of the present apparatus (1) is completed. The music feature amounts and music impression value calculation formula defined for each impression scale are recorded in the impression value database (8), which serves as external storage means, so that they can be called from the music impression value calculation unit (13) of the present apparatus (1) as needed.
The impression value database may be provided on the hard disk (6).
[0061]
Hereinafter, description will be made again based on the flowchart (FIG. 2) of the present apparatus (1).
Using the N-gram feature amounts generated (22) by the N-gram feature amount generation unit (12) and recorded on the hard disk (6), the music impression value calculation unit (13) then calculates the music impression values.
[0062]
The music impression value calculation unit (13) first refers to the impression value database (8) and extracts (23) the music feature amounts for each impression scale from the N-gram feature amounts, then substitutes them into the music impression value calculation formula stored in the database (8) and performs the calculation (24).
As a result of the calculation, one music impression value (25) is output from the music impression value output unit (14) as a real value for each impression scale.
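The extraction (23) and calculation (24) steps across all impression scales can be sketched as follows. The sketch assumes the impression value database can be represented as a mapping from scale number to a constant term and partial regression coefficients, and that one set of already weighted N-gram feature amounts serves every scale. Both are simplifications, and the names `Database` and `assign_impression_values` are hypothetical.

```python
from typing import Dict, List, Tuple

# scale number -> (constant term, {music feature label: partial regression coefficient})
Database = Dict[int, Tuple[float, Dict[str, float]]]


def assign_impression_values(
    ngram_features: Dict[str, float], database: Database
) -> List[float]:
    """Return one real-valued impression value per impression scale."""
    values = []
    for scale in sorted(database):
        constant, coefficients = database[scale]
        # Step (23): pick out only the features this scale's formula uses.
        extracted = {f: ngram_features.get(f, 0.0) for f in coefficients}
        # Step (24): substitute them into the regression equation.
        values.append(
            constant + sum(coefficients[f] * extracted[f] for f in coefficients)
        )
    return values


# Hypothetical database with two scales and a piece with two known features.
database = {1: (4.0, {"a": 2.0}), 2: (3.0, {"b": -1.0, "c": 0.5})}
print(assign_impression_values({"a": 0.5, "b": 1.0}, database))  # [5.0, 2.0]
```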
[0063]
FIG. 7 shows an example of a screen (70) displayed on the monitor (3) of the automatic music impression value giving device (1) developed in the present invention.
For the music MIDI data (7), the button (71) is used to designate a file for a single piece and the button (72) to designate a folder containing a plurality of pieces, using the keyboard (4), a mouse (not shown), or the like.
[0064]
When the "automatically assign impression value" button (73) is pressed, the designated music MIDI data (7) or, if nothing has been designated, the music MIDI data (7) in the folder defined by default is processed by the CPU (2) as described above, and the music impression value output unit (14) finally stores the result on the hard disk (6) as the file midi.iwt.
[0065]
Here, midi.iwt is a file in CSV (comma-separated values) format with one piece of music per line: the first element of each line is the standard MIDI file name (without the extension), and the (m + 1)th element is the music impression value for impression scale m.
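A file with this layout could be written as in the sketch below. The function name `write_iwt`, the sample file name, and the way the real values are formatted are assumptions; the text only fixes the column order.

```python
import csv
import io
from pathlib import Path
from typing import Dict, List, TextIO


def write_iwt(values_by_file: Dict[str, List[float]], out: TextIO) -> None:
    """Write one line per piece: file name without extension, then the
    impression values in order of impression scale (scale 1 first)."""
    writer = csv.writer(out, lineterminator="\n")
    for midi_path, values in values_by_file.items():
        writer.writerow([Path(midi_path).stem] + list(values))


# Hypothetical piece with values for two impression scales, written to memory.
buffer = io.StringIO()
write_iwt({"song01.mid": [5.5, 3.25]}, buffer)
print(buffer.getvalue(), end="")  # song01,5.5,3.25
```

To write to disk instead, the same function can be given a file object opened with `open("midi.iwt", "w", newline="")`.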
The output of the present apparatus (1) is not limited to recording on the hard disk (6); it can also be written to any external storage device, displayed on the monitor (3), or the like.
Further, the present apparatus (1) may be used not only alone but also attached to any other apparatus, for example, a jukebox or a music retrieval apparatus. Further, the present apparatus may be provided with a network adapter and provided on a network so as to be accessible from another terminal.
[0066]
[Effects of the Invention]
The present invention has the above configuration, and has the following effects.
According to the present invention, music basic feature amounts are extracted from music data that can be processed by a computer, such as standard MIDI data, and N-grams and then N-gram feature amounts are generated from them, so that the music feature amounts of the piece can be extracted in a form suitable for computer processing.
Then, since a calculation based on the predetermined music impression value calculation formula is performed from the music feature amount, highly accurate calculation of the music impression value can be performed.
This makes it possible to provide a music impression value automatic assigning device capable of simple and high-speed processing.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of one embodiment of a music impression value automatic providing device according to the present invention.
FIG. 2 is a flowchart of a process according to an embodiment of the present invention.
FIG. 3 is an example of extracting a music basic feature amount.
FIG. 4 is an example of a generated N-gram.
FIG. 5 is an example of an abstracted N-gram.
FIG. 6 is a flowchart of a method for designing a music impression value automatic giving device according to the present invention.
FIG. 7 is an example of a display screen of the music impression value automatic giving device according to the present invention.
[Explanation of symbols]
1 Music impression value automatic giving device
2 CPU
3 Monitor
4 Keyboard
5 Memory
6 External storage means (hard disk)
7 Music MIDI data
8 Impression value database
9 Music data input unit
10 Music basic feature amount extraction unit
11 N-gram generation unit
12 N-gram feature amount generation unit
13 Music impression value calculation unit
14 Music impression value output unit

Claims

1. A music impression value automatic assigning device that automatically quantifies and assigns the impression of a piece of music for music data based on a predetermined data standard that can at least be processed by a computer, the device comprising:
input means for inputting music data;
music basic feature amount extraction means for extracting, from the music data, a music basic feature amount that is a physical feature amount related to the music impression;
N-gram generation means for generating N-grams from the music basic feature amount;
N-gram feature amount generation means for generating N-gram feature amounts using the distinct N-grams among the N-grams;
music impression value calculation means for performing a calculation according to a predetermined music impression value calculation formula using the N-gram feature amounts; and
output means for outputting a music impression value.

2. The music impression value automatic assigning device according to claim 1, wherein the N-gram feature amount generation means generates the N-gram feature amount by multiplying the relative appearance frequency of each distinct N-gram by a predetermined weight value.

3. The music impression value automatic assigning device according to claim 1 or 2, wherein the music impression value calculation formula is a multiple regression equation.

4. The music impression value automatic assigning device according to any one of claims 1 to 3, wherein, in a configuration in which the device assigns impression values for a plurality of impression scales,
the N-gram feature amount generation means generates N-gram feature amounts for each impression scale, and
the music impression value calculation means performs the calculation for each impression scale using those N-gram feature amounts.

5. The music impression value automatic assigning device according to any one of claims 1 to 4, wherein the data standard is the MIDI (musical instrument digital interface) standard.

6. The music impression value automatic assigning device according to any one of claims 1 to 5, further comprising stream dividing means that receives the music data from the input means, divides the plurality of track chunks and/or channels contained in the music data, and outputs them to the music basic feature amount extraction means for each track chunk and/or channel.

7. The music impression value automatic assigning device according to any one of claims 1 to 6, wherein the music basic feature amounts are sound pitch, sound intensity, sound length, and timbre information.

8. The music impression value automatic assigning device according to any one of claims 1 to 7, wherein the N-gram generation means generates N-grams for a plurality of N values.

9. The music impression value automatic assigning device according to any one of claims 1 to 8, wherein the impression scale is at least one of the expressions "quiet", "calm", "refreshing", "bright", "solemn", "relaxed", "beautiful", "fun", "calms the mind", and "heals the heart", or a synonym or an antonym thereof.