JP6197569B2 - Acoustic analyzer

Info

Publication number: JP6197569B2
Application number: JP2013216008A
Authority: JP
Inventors: 暖篠井; 陽前澤
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2013-10-17
Filing date: 2013-10-17
Publication date: 2017-09-20
Anticipated expiration: 2033-10-17
Also published as: JP2015079110A

Description

The present invention relates to a technique for analyzing acoustic signals.

Various techniques have been proposed for analyzing the features of an acoustic signal that represents the performance sound of a piece of music. For example, Non-Patent Document 1 discloses a technique for estimating the genre of a piece of music by using the result of performing nonnegative matrix factorization (NMF) on the acoustic signals of a large number of pieces of music.

Konstantin Markov, Tomoko Matsui, "NONNEGATIVE MATRIX FACTORIZATION BASED SELF-TAUGHT LEARNING WITH APPLICATION TO MUSIC GENRE CLASSIFICATION", IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2012

However, a configuration that simply applies existing nonnegative matrix factorization to the analysis of an acoustic signal, as in Non-Patent Document 1, makes it difficult in practice to analyze the acoustic signal with high accuracy. In view of these circumstances, an object of the present invention is to analyze categories of music, such as genre and style, with high accuracy.

To solve the above problem, the acoustic analysis device of the present invention comprises matrix analysis means that, for a plurality of categories into which a plurality of reference sounds are classified, calculates a first category weight value (for example, a category weight value wA[g]) and an analysis coefficient matrix (for example, an analysis coefficient matrix Y[g]) for each category, such that the result of weighting and adding, under the first category weight value of each category, the matrix products of a per-category reference basis matrix (for example, a reference basis matrix B[g]) and the analysis coefficient matrix approximates an analysis characteristic matrix (for example, an analysis characteristic matrix X) representing a time series of the frequency characteristics of an analysis target sound. Here, the reference basis matrix of a category includes a plurality of basis vectors representing the frequency characteristics of the reference sounds in that category, and the analysis coefficient matrix includes a plurality of coefficient vectors representing the temporal variation of the weight of each basis vector of the reference basis matrix. In this configuration, the first category weight value and the analysis coefficient matrix are calculated individually for each category of reference sounds such that the weighted sum of the matrix products of the reference basis matrices and the analysis coefficient matrices approximates the analysis characteristic matrix of the analysis target sound. Consequently, even when the analysis target sound contains an acoustic component whose acoustic characteristics resemble reference sounds of several different categories, that component is reflected predominantly in the analysis coefficient matrix of a single category. In other words, the analysis coefficient matrix of the analysis target sound is estimated with high accuracy, and categories such as the genre and style of the analysis target sound can therefore be analyzed with high accuracy. Each category typically contains a plurality of reference sounds, but a category may also contain only one reference sound.

An acoustic analysis device according to a preferred aspect of the present invention comprises characteristic comparison means that compares the analysis coefficient matrix calculated by the matrix analysis means with a reference coefficient matrix (for example, a reference coefficient matrix Z[g,s]). The reference coefficient matrix is obtained by decomposing each of a plurality of reference characteristic matrices (for example, reference characteristic matrices R[g,s]), which represent time series of the frequency characteristics of the reference sounds, into the reference basis matrix and a reference coefficient matrix that includes a plurality of coefficient vectors representing the temporal variation of the weight of each basis vector of the reference basis matrix. With this configuration, the reference coefficient matrix obtained when a reference characteristic matrix is decomposed (by nonnegative matrix factorization) into the reference basis matrix and the reference coefficient matrix is compared with the analysis coefficient matrix of the analysis target sound, so the degree of similarity between the temporal patterns of each acoustic component in the analysis target sound and in the reference sound can be evaluated.

In a preferred aspect of the present invention, the characteristic comparison means calculates a second category weight value (for example, a category weight value wB[g,s]) and a reference coefficient matrix for each category such that the result of weighting and adding the matrix products of the reference basis matrix and the reference coefficient matrix, under the second category weight value of each category, approximates the reference characteristic matrix, and compares each reference coefficient matrix thus calculated with the analysis coefficient matrix calculated by the matrix analysis means. In this aspect, the second category weight value and the reference coefficient matrix are calculated individually for each category such that the weighted sum of the matrix products approximates the reference characteristic matrix of the reference sound, so the reference coefficient matrix of the reference sound is estimated with high accuracy. The aforementioned effect of analyzing categories such as the genre and style of the analysis target sound with high accuracy is therefore especially pronounced.

In a preferred aspect of the present invention, a specific category is selected from the plurality of categories according to the first category weight values calculated for each category by the matrix analysis means, and the characteristic comparison means compares, for each of the plurality of reference sounds in the specific category, the reference coefficient matrix of that reference sound with the analysis coefficient matrix calculated by the matrix analysis means for the specific category. In this aspect, the reference coefficient matrices are compared with the analysis coefficient matrix of the analysis target sound only for the reference sounds in the specific category selected according to the first category weight values (for example, the category with the largest first category weight value). This reduces the processing load compared with a configuration that calculates reference coefficient matrices and compares them with the analysis coefficient matrix for all of the categories.

In a preferred aspect of the present invention, the plurality of reference sounds are accompaniment sounds of pieces of music with different musical styles and are classified into the plurality of categories by genre of music, and the device comprises display control means that causes a display device to display the name of the genre of the specific category and the name of the style of the reference sound selected according to the result of the comparison by the characteristic comparison means. In this aspect, since the genre name of the specific category and the style name of the reference sound selected according to the comparison result are displayed on the display device, useful information can be provided to a user who wishes to identify the genre and style of the analysis target sound.

The acoustic analysis device according to each of the above aspects is realized by hardware (electronic circuitry) such as a DSP (Digital Signal Processor) dedicated to the analysis of the analysis target sound, or by cooperation between a general-purpose arithmetic processing device such as a CPU (Central Processing Unit) and a program. The program of the present invention may be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, may be used. The program of the present invention may also be provided, for example, by distribution over a communication network and installed on a computer. The present invention is also specified as a method of operating the acoustic analysis device according to each of the aspects described above (an acoustic analysis method).

FIG. 1 is a configuration diagram of an acoustic analysis device according to a first embodiment of the present invention.
FIG. 2 is an explanatory diagram of the basis learning process.
FIG. 3 is a flowchart of the basis learning process.
FIG. 4 is a flowchart of the matrix analysis process.
FIG. 5 is an explanatory diagram of the matrix analysis process (hierarchical NMF).
FIG. 6 is an explanatory diagram of the processing performed by the coefficient calculation unit.
FIG. 7 is an explanatory diagram of the hierarchical NMF executed by the coefficient calculation unit.
FIG. 8 is a schematic diagram of the analysis result screen.
FIG. 9 is an explanatory diagram of a comparative example.
FIG. 10 is a configuration diagram of an electronic musical instrument according to a second embodiment of the present invention.
FIG. 11 is an explanatory diagram of the operation of the characteristic comparison unit according to a modification.

<First Embodiment>
FIG. 1 is a configuration diagram of an acoustic analysis device 100 according to the first embodiment of the present invention. As illustrated in FIG. 1, a signal supply device 12 and a display device 14 are connected to the acoustic analysis device 100. The signal supply device 12 supplies an acoustic signal AX to the acoustic analysis device 100. The acoustic signal AX is a signal representing the waveform of the sound to be analyzed by the acoustic analysis device 100 (hereinafter "analysis target sound"). The first embodiment assumes that the analysis target sound is a mixture of the performance sounds (singing voices and instrument tones) of the multiple performance parts that make up a piece of music. Suitable examples of the signal supply device 12 are a playback device that acquires the acoustic signal AX from a portable or built-in recording medium and supplies it to the acoustic analysis device 100, and a communication device that receives from a communication network the acoustic signal AX of a piece of music distributed (for example, by streaming) from a distribution server device and supplies it to the acoustic analysis device 100. The signal supply device 12 may also be integrated with the acoustic analysis device 100.

The acoustic analysis device 100 is a signal processing device that analyzes the acoustic signal AX supplied from the signal supply device 12. Specifically, the acoustic analysis device 100 of the first embodiment estimates the genre and style of the piece of music represented by the acoustic signal AX. A genre is a category (kind) into which music is classified from a musical viewpoint, and a style is a category (manner) that classifies music in finer detail than a genre. For example, categories such as rock, pop, and classical correspond to genres, and categories such as 1960s and 1980s correspond to styles. The first embodiment assumes that the genre of the acoustic signal AX is estimated from G candidates (G is a natural number of 2 or more) and that the style of the acoustic signal AX within one genre is estimated from S candidates (S is a natural number of 2 or more). For convenience, the following description assumes that each of the G genres includes the same number (S) of styles, although in practice the kinds and total number S of styles differ from genre to genre. The display device 14 in FIG. 1 (for example, a liquid crystal display panel) displays images as instructed by the acoustic analysis device 100. Specifically, the result of the analysis of the acoustic signal AX by the acoustic analysis device 100 (the genre and style of the piece of music) is displayed on the display device 14.

As illustrated in FIG. 1, the acoustic analysis device 100 is realized by a computer system comprising an arithmetic processing device 22 and a storage device 24. The storage device 24 stores the program executed by the arithmetic processing device 22 and the various data used by the arithmetic processing device 22. A known recording medium such as a semiconductor recording medium or a magnetic recording medium, or a combination of several kinds of recording media, is used as the storage device 24. A configuration in which the acoustic signal AX is stored in the storage device 24 (so that the signal supply device 12 can be omitted) is also suitable.

The storage device 24 of the first embodiment stores a plurality of reference data DR[g,s] (g = 1 to G, s = 1 to S) used in the analysis of the acoustic signal AX. As illustrated in FIG. 1, each reference data DR[g,s] includes attribute information d and a reference signal AR. The reference signal AR is a signal representing the waveform of a sound used in the analysis of the acoustic signal AX (hereinafter "reference sound"). The reference sound represented by the reference signal AR of the reference data DR[g,s] is a performance sound suitable for the accompaniment part of music corresponding to the combination of the g-th genre and the s-th style (for example, an accompaniment pattern of a rhythm instrument such as a percussion instrument that tends to be used frequently in existing music matching that combination). Each reference signal AR represents a reference sound spanning a section of predetermined length (for example, four bars) of a piece of music.

The attribute information d specifies attributes of the music corresponding to the reference sound (for example, music for which the reference sound is suitable as the performance sound of the accompaniment part). Specifically, the attribute information d of the reference data DR[g,s] specifies the name of the g-th genre (a name such as rock or pop) and the name of the s-th style (a name such as 1960s or 1980s). Reference data DR[g,s] is prepared in advance and stored in the storage device 24 for each of a large number of reference sounds that differ in genre or style. As understood from the above description, the plurality of reference sounds are classified into G genres and S styles. A configuration may also be adopted in which performance data in MIDI (Musical Instrument Digital Interface) format, which specifies the sounding and muting of the reference sound in time series, is stored in the storage device 24 as the reference data DR[g,s] and the reference signal AR is generated from the performance data.
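The reference data described above can be pictured as a small record type. The following is a minimal sketch, not the patent's data format: the class name `ReferenceData`, its field names, and the helper `group_by_genre` are hypothetical, and the waveform values are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ReferenceData:
    """One reference data item DR[g,s] (hypothetical layout)."""
    genre: str               # genre name taken from attribute information d
    style: str               # style name taken from attribute information d
    signal: Sequence[float]  # reference signal AR (waveform samples)


def group_by_genre(items):
    """Collect the reference data of each genre, as the basis learning needs."""
    groups = defaultdict(list)
    for item in items:
        groups[item.genre].append(item)
    return dict(groups)


# Placeholder library with G = 2 genres; the sample values carry no meaning.
library = [
    ReferenceData("rock", "1960s", [0.0, 0.1]),
    ReferenceData("rock", "1980s", [0.0, 0.2]),
    ReferenceData("pop", "1980s", [0.0, 0.3]),
]
by_genre = group_by_genre(library)
```

Grouping by genre first mirrors the later processing, in which one basis matrix is learned per genre from all styles of that genre.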

By executing the program stored in the storage device 24, the arithmetic processing device 22 realizes a plurality of functions for analyzing the acoustic signal AX (a basis learning unit 32, a matrix analysis unit 34, a coefficient calculation unit 36, a characteristic comparison unit 38, and a display control unit 40). A configuration in which the functions of the arithmetic processing device 22 are distributed over a plurality of integrated circuits, or a configuration in which a dedicated electronic circuit (for example, a DSP) realizes some of the functions of the arithmetic processing device 22, may also be adopted.

The basis learning unit 32 generates G reference basis matrices B[1] to B[G], corresponding to different genres, from the reference data DR[g,s] stored in the storage device 24. As illustrated in FIG. 2, any one reference basis matrix B[g] is a non-negative matrix (basis matrix) of M rows and K columns in which K basis vectors b[1] to b[K], corresponding to acoustic components that typically appear in accompaniment parts of music classified into the g-th genre, are arranged in the horizontal direction. The basis vector b[k] in the k-th column (k = 1 to K) of B[g] represents the frequency characteristic (amplitude spectrum or power spectrum) of the k-th of the K kinds of acoustic components that typically appear in the reference sounds of accompaniment parts of music of the g-th genre. The number of rows M of B[g] (the number of elements of each basis vector b[k]) corresponds to the number of frequencies set discretely on the frequency axis. For convenience, the following description uses the same number of columns K for all G reference basis matrices B[1] to B[G], but K may also differ for each reference basis matrix B[g] (that is, for each genre).

FIG. 3 is a flowchart of the process by which the basis learning unit 32 calculates each reference basis matrix B[g] from the reference data DR[g,s] (hereinafter "basis learning process"). When the basis learning process starts, the basis learning unit 32 generates a reference characteristic matrix R[g,s] for each of the plurality of reference data DR[g,s] stored in the storage device 24 (SA1). As illustrated in FIG. 2, the reference characteristic matrix R[g,s] is a non-negative matrix of M rows and N columns (N is a natural number of 2 or more) representing the time series (spectrogram) of the frequency characteristics of the reference signal AR of the reference data DR[g,s]. That is, the n-th column (n = 1 to N) of R[g,s] corresponds to the frequency characteristic (amplitude spectrum or power spectrum) of the reference signal AR in the n-th of the N frames into which the reference signal AR is divided on the time axis. Any known frequency analysis, such as the short-time Fourier transform, may be used to generate the reference characteristic matrix R[g,s]. In the drawings, the symbol t denotes time and the symbol f denotes frequency.
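As an illustration of step SA1, the following sketch computes an M-row, N-column amplitude spectrogram with a short-time Fourier transform. It is one possible realization under stated assumptions: NumPy is available, and the function name `spectrogram`, the Hann window, the frame length, and the hop size are choices made here, not values given in the patent. A synthetic tone stands in for a reference signal AR.

```python
import numpy as np


def spectrogram(signal, frame_len=1024, hop=512):
    """Return a non-negative M x N matrix of amplitude spectra.

    M = frame_len // 2 + 1 frequency bins, N = number of frames.
    Window type, frame length and hop size are assumptions of this sketch.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Each column holds one windowed frame of the signal.
    frames = np.stack(
        [signal[n * hop:n * hop + frame_len] * window for n in range(n_frames)],
        axis=1,
    )
    return np.abs(np.fft.rfft(frames, axis=0))


# Synthetic stand-in for a reference signal AR: 1 s of a 440 Hz tone at 8 kHz.
sr = 8000
t = np.arange(sr) / sr
R = spectrogram(np.sin(2 * np.pi * 440 * t), frame_len=256, hop=128)
```

Taking the absolute value of the transform guarantees the non-negativity that the later factorization requires; squaring it instead would give the power-spectrum variant the text also allows.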

The basis learning unit 32 sorts the (G×S) reference characteristic matrices R[g,s] (R[1,1] to R[G,S]) calculated from the reference data DR[g,s] by genre, and generates for each genre a combined characteristic matrix R[g] from the S reference characteristic matrices R[g,1] to R[g,S] of that genre (SA2). Specifically, as illustrated in FIG. 2, the combined characteristic matrix R[g] is a non-negative matrix of M rows and (N×S) columns in which the S reference characteristic matrices R[g,1] to R[g,S] corresponding to the g-th genre are arranged in the horizontal direction (the time axis direction).

The basis learning unit 32 calculates the reference basis matrix B[g] of the g-th genre by nonnegative matrix factorization of the combined characteristic matrix R[g] (SA3). Specifically, the basis learning unit 32 decomposes the combined characteristic matrix R[g] into the reference basis matrix B[g] and the coefficient matrix H[g] of FIG. 2. The coefficient matrix H[g] is a non-negative matrix (activation) of K rows and (N×S) columns in which K coefficient vectors h[1] to h[K], corresponding to the basis vectors b[k] of the reference basis matrix B[g], are arranged in the vertical direction. The coefficient vector h[k] in the k-th row of H[g] corresponds to the temporal variation of the weight (activation) for the basis vector b[k] of B[g]. The basis learning unit 32 calculates B[g] and H[g] by a learning process that iteratively updates both matrices so that the matrix product B[g]H[g] approaches the combined characteristic matrix R[g]. Any known technique may be used for the nonnegative matrix factorization of the combined characteristic matrix R[g] (the calculation of the reference basis matrix B[g]).
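Steps SA2 and SA3 can be sketched as follows. The patent leaves the factorization algorithm open ("any known technique"), so this example assumes the widely used multiplicative updates for the squared Euclidean distance; the function name `learn_basis`, the iteration count, the random initialization, and the toy data are all assumptions of the sketch, and NumPy is assumed to be available.

```python
import numpy as np


def learn_basis(R_list, K, n_iter=200, eps=1e-12, seed=0):
    """Concatenate one genre's spectrograms and factorize R[g] ~ B[g] H[g]."""
    # SA2: arrange R[g,1] ... R[g,S] along the time axis -> M x (N*S).
    R = np.concatenate(R_list, axis=1)
    rng = np.random.default_rng(seed)
    M, T = R.shape
    B = rng.random((M, K)) + eps
    H = rng.random((K, T)) + eps
    # SA3: multiplicative updates (Euclidean cost, assumed here).
    for _ in range(n_iter):
        H *= (B.T @ R) / (B.T @ B @ H + eps)
        B *= (R @ H.T) / (B @ H @ H.T + eps)
    return B, H


# Toy data: S = 4 "styles" that share K = 3 hidden spectral shapes.
rng = np.random.default_rng(1)
true_B = rng.random((16, 3))
R_list = [true_B @ rng.random((3, 20)) for _ in range(4)]
B_g, H_g = learn_basis(R_list, K=3)

R_g = np.concatenate(R_list, axis=1)
rel_err = np.linalg.norm(R_g - B_g @ H_g) / np.linalg.norm(R_g)
```

Because the updates only multiply by non-negative ratios, `B_g` and `H_g` stay non-negative throughout; `H_g` is computed here only as a by-product of learning `B_g`.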

By executing the nonnegative matrix factorization of the combined characteristic matrix R[g] (SA3) for each genre, G reference basis matrices B[1] to B[G] corresponding to different genres are generated. As understood from the above description, the reference basis matrix B[g] represents the frequency characteristics of the acoustic components that appear predominantly in the S reference sounds corresponding to the different styles of the g-th genre (the reference sounds represented by the reference signals AR of the reference data DR[g,1] to DR[g,S]). The coefficient matrix H[g] calculated together with the reference basis matrix B[g] is discarded and is not used in the analysis of the acoustic signal AX.

The matrix analysis unit 34 of FIG. 1 analyzes the acoustic signal AX of the analysis target sound using the G reference basis matrices B[1] to B[G] calculated by the basis learning unit 32. As detailed below, the matrix analysis unit 34 of the first embodiment performs supervised nonnegative matrix factorization on an analysis characteristic matrix X, which represents the time series of the frequency characteristics of the acoustic signal AX, using the reference basis matrices B[g] calculated by the basis learning unit 32 as supervisory information (prior information).

FIG. 4 is a flowchart of the process by which the matrix analysis unit 34 analyzes the analysis characteristic matrix X of the acoustic signal AX (hereinafter "matrix analysis process"), and FIG. 5 is an explanatory diagram of the matrix analysis process. When the matrix analysis process starts, the matrix analysis unit 34 extracts, as illustrated in FIG. 5, a section to be analyzed (hereinafter "analysis section") from the acoustic signal AX supplied by the signal supply device 12 (SB1, SB2). Specifically, the matrix analysis unit 34 identifies the beat points of the acoustic signal AX on the time axis (SB1) and extracts the analysis section from the acoustic signal AX using beat points as boundaries (SB2). The analysis section is a section of the acoustic signal AX with the same time length as each reference signal AR (for example, four bars). Any known technique (beat detection) may be used to identify the beat points of the acoustic signal AX. For example, the matrix analysis unit 34 identifies, as beat points, approximately equally spaced time points at which the volume of the acoustic signal AX reaches a local maximum on the time axis.
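Step SB2 can be illustrated with a small slicing helper. This sketch assumes the beat points have already been found by some beat detector and are given as sample indices; the function name `extract_section`, the 4/4 meter (so that four bars span 16 beats), and the synthetic beat grid are assumptions made here.

```python
def extract_section(signal, beat_samples, start_beat=0, n_beats=16):
    """Cut the analysis section between two beat points.

    `beat_samples` holds the sample index of each detected beat (SB1 is
    assumed to have been done elsewhere). With a 4/4 meter, n_beats=16
    corresponds to the four-bar length used as an example in the text.
    """
    if start_beat + n_beats >= len(beat_samples):
        raise ValueError("not enough beats for the requested section")
    begin = beat_samples[start_beat]
    end = beat_samples[start_beat + n_beats]
    return signal[begin:end]


# Synthetic example: 10 s of silence at 8 kHz with a beat every 0.5 s (120 BPM).
sr = 8000
signal = [0.0] * (sr * 10)
beats = list(range(0, sr * 10, sr // 2))
section = extract_section(signal, beats, start_beat=2, n_beats=16)
```

Aligning both ends of the section to beat points keeps the analysis section comparable in length and phase to the four-bar reference signals.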

The matrix analysis unit 34 generates the analysis characteristic matrix X for the analysis section of the acoustic signal AX (SB3). As illustrated in FIG. 5, the analysis characteristic matrix X is a non-negative matrix of M rows and N columns representing the time series (spectrogram) of the frequency characteristics of the acoustic signal AX within the analysis section. That is, the n-th column of X corresponds to the frequency characteristic (amplitude spectrum or power spectrum) of the acoustic signal AX in the n-th of the N frames into which the analysis section is divided on the time axis. Any known frequency analysis, such as the short-time Fourier transform, may be used to generate the analysis characteristic matrix X.

行列解析部３４は、解析特性行列Ｘに対して非負値行列因子分解を実行する（ＳB4）。第１実施形態の行列解析部３４が実行する非負値行列因子分解は、図５からも理解される通り、相異なるジャンルに対応するＧ個の係数（以下「区分加重値」という）ｗA[1]〜ｗA[G]を適用した以下の数式(1)で表現される。 The matrix analysis unit 34 performs non-negative matrix factorization on the analysis characteristic matrix X (SB4). As can be understood from FIG. 5, the non-negative matrix factorization executed by the matrix analysis unit 34 of the first embodiment is expressed by the following formula (1), to which G coefficients wA[1] to wA[G] (hereinafter referred to as “category weights”) corresponding to different genres are applied.

数式(1)および図５から理解される通り、第１実施形態の行列解析部３４は、参照基底行列Ｂ[g]と解析係数行列Ｙ[g]との行列積Ｂ[g]Ｙ[g]をジャンル毎の区分加重値ｗA[g]のもとで加重加算した結果（数式(1)の右辺）が音響信号ＡXの解析特性行列Ｘに近似するように、区分加重値ｗA[g]（ｗA[1]〜ｗA[G]）と解析係数行列Ｙ[g]（Ｙ[1]〜Ｙ[G]）とをジャンル毎に算定する。具体的には、行列解析部３４は、区分加重値ｗA[1]〜ｗA[G]を適用したＧ個の行列積Ｂ[1]Ｙ[1]〜Ｂ[G]Ｙ[G]の加重和が解析特性行列Ｘに近付くように各区分加重値ｗA[g]と各解析係数行列Ｙ[g]とを反復的に更新する学習処理で、各ジャンルの区分加重値ｗA[g]と解析係数行列Ｙ[g]とを一括的に算定する。区分加重値ｗA[g]および解析係数行列Ｙ[g]の更新式は、非負値行列因子分解に適用される既存の更新式の導出と同様に、例えば区分加重値ｗA[1]〜ｗA[G]を適用したＧ個の行列積Ｂ[1]Ｙ[1]〜Ｂ[G]Ｙ[G]の加重和と音響信号ＡXの解析特性行列Ｘとの差分に相当する評価関数が最小化される（評価関数の微分値がゼロになる）という条件から導出される。 As understood from formula (1) and FIG. 5, the matrix analysis unit 34 of the first embodiment calculates, for each genre, the category weight wA[g] (wA[1] to wA[G]) and the analysis coefficient matrix Y[g] (Y[1] to Y[G]) so that the result of weighting and adding the matrix product B[g]Y[g] of the reference basis matrix B[g] and the analysis coefficient matrix Y[g] by the per-genre category weight wA[g] (the right side of formula (1)) approximates the analysis characteristic matrix X of the acoustic signal AX. Specifically, the matrix analysis unit 34 calculates the category weight wA[g] and the analysis coefficient matrix Y[g] of every genre together in a learning process that iteratively updates each category weight wA[g] and each analysis coefficient matrix Y[g] so that the weighted sum of the G matrix products B[1]Y[1] to B[G]Y[G], weighted by wA[1] to wA[G], approaches the analysis characteristic matrix X. As in the derivation of the existing update formulas for non-negative matrix factorization, the update formulas for wA[g] and Y[g] are derived, for example, from the condition that an evaluation function corresponding to the difference between that weighted sum and the analysis characteristic matrix X of the acoustic signal AX is minimized (i.e., the derivative of the evaluation function becomes zero).
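Read from the description above, formula (1) takes the following form. This is a reconstruction from the surrounding text rather than a reproduction of the published formula; the dimension annotations are assumptions that follow from the stated sizes of X, B[g], and Y[g].

```latex
X \;\approx\; \sum_{g=1}^{G} w_A[g]\, B[g]\, Y[g],
\qquad
X \in \mathbb{R}_{\ge 0}^{M \times N},\quad
B[g] \in \mathbb{R}_{\ge 0}^{M \times K},\quad
Y[g] \in \mathbb{R}_{\ge 0}^{K \times N},\quad
w_A[g] \ge 0
```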

解析係数行列Ｙ[g]は、図５に例示される通り、参照基底行列Ｂ[g]の各基底ベクトルｂ[k]に対応するＫ個の係数ベクトルｙ[1]〜ｙ[K]を縦方向に配列したＫ行Ｎ列の非負値行列である。解析係数行列Ｙ[g]の第ｋ行の係数ベクトルｙ[k]は、参照基底行列Ｂ[g]の基底ベクトルｂ[k]に対する加重値（活性度）の時間変動（すなわち、基底ベクトルｂ[k]の音響成分が解析対象音の音響信号ＡXに出現する時間的なパターン）に相当する。したがって、参照基底行列Ｂ[g]と解析係数行列Ｙ[g]との行列積Ｂ[g]Ｙ[g]は、音響信号ＡXのうち第ｇ番目のジャンルの各参照音に優勢に出現する音響成分の周波数特性の時系列（スペクトログラム）に相当する。以上の説明から理解される通り、各区分加重値ｗA[g]は、第ｇ番目のジャンルの楽曲の伴奏パートに多用される音響成分を解析対象音の音響信号ＡXが含有する度合（優勢度）の指標に相当する。すなわち、行列解析部３４が算定する区分加重値ｗA[g]が大きいほど、第ｇ番目のジャンルに多用される音響成分が音響信号ＡXにて優勢である（解析対象音が第ｇ番目のジャンルに該当する確度が高い）と評価できる。 As illustrated in FIG. 5, the analysis coefficient matrix Y[g] is a non-negative matrix of K rows and N columns in which K coefficient vectors y[1] to y[K], corresponding to the basis vectors b[k] of the reference basis matrix B[g], are arranged vertically. The coefficient vector y[k] in the k-th row of the analysis coefficient matrix Y[g] corresponds to the time variation of the weight (activation) for the basis vector b[k] of the reference basis matrix B[g], that is, the temporal pattern in which the acoustic component of the basis vector b[k] appears in the acoustic signal AX of the analysis target sound. The matrix product B[g]Y[g] therefore corresponds to the time series (spectrogram) of the frequency characteristics of those acoustic components of the acoustic signal AX that appear predominantly in the reference sounds of the g-th genre. As understood from the above, each category weight wA[g] is an index of the degree (dominance) to which the acoustic signal AX of the analysis target sound contains acoustic components frequently used in the accompaniment part of music of the g-th genre. In other words, the larger the category weight wA[g] calculated by the matrix analysis unit 34, the more dominant the acoustic components frequently used in the g-th genre are in the acoustic signal AX (i.e., the more likely it is that the analysis target sound belongs to the g-th genre).
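The learning process that yields the weights wA[g] and the matrices Y[g] can be sketched as below, reading the decomposition as X ≈ Σ_g wA[g]·B[g]·Y[g] with the reference basis matrices held fixed. The description does not fix the evaluation function, so this sketch assumes a squared Euclidean error and the corresponding multiplicative updates. The rescaling step that keeps the mean of each Y[g] at 1, so that the scale is carried by the weight, is likewise an assumption introduced here to make the weights comparable across genres.

```python
import numpy as np

def hierarchical_nmf(X, bases, n_iter=200, eps=1e-12, seed=0):
    """Fit X ~ sum_g w[g] * bases[g] @ Y[g] with fixed bases (illustrative sketch).

    X      : M x N non-negative matrix
    bases  : list of G non-negative M x K matrices (reference basis matrices)
    Returns (w, Y, errors), where errors[i] is the Frobenius error after iteration i.
    """
    rng = np.random.default_rng(seed)
    G, N = len(bases), X.shape[1]
    w = np.ones(G)
    Y = [rng.random((B.shape[1], N)) + 0.1 for B in bases]
    errors = []
    for _ in range(n_iter):
        # Update every Y[g] against the same current approximation.
        X_hat = sum(w[g] * bases[g] @ Y[g] for g in range(G))
        for g in range(G):
            Y[g] *= (bases[g].T @ X) / (bases[g].T @ X_hat + eps)
        # Update all weights against the refreshed approximation.
        P = [bases[g] @ Y[g] for g in range(G)]
        X_hat = sum(w[g] * P[g] for g in range(G))
        num = np.array([np.sum(P[g] * X) for g in range(G)])
        den = np.array([np.sum(P[g] * X_hat) for g in range(G)]) + eps
        w *= num / den
        # Assumed convention: move the scale of Y[g] into w[g].
        for g in range(G):
            s = Y[g].mean() + eps
            Y[g] /= s
            w[g] *= s
        X_hat = sum(w[g] * bases[g] @ Y[g] for g in range(G))
        errors.append(float(np.linalg.norm(X - X_hat)))
    return w, Y, errors
```

Because the product w[g]·Y[g] is unchanged by the rescaling, the approximation itself is unaffected; only the split between weight and coefficients is fixed.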

以上の傾向を考慮して、第１実施形態の行列解析部３４は、解析特性行列Ｘの非負値行列因子分解で算定したＧ個の区分加重値ｗA[1]〜ｗA[G]に応じて解析対象音のジャンル（以下「特定ジャンル」という）を推定する（ＳB5）。具体的には、行列解析部３４は、Ｇ個の区分加重値ｗA[1]〜ｗA[G]のうち最大の区分加重値ｗA[γ]（γ＝argmax_g（ｗA[g]））に対応するジャンル（第γ番目のジャンル）を特定ジャンルとして特定する。そして、行列解析部３４は、解析特性行列Ｘに対する非負値行列因子分解で相異なるジャンルについて算定したＧ個の解析係数行列Ｙ[1]〜Ｙ[G]のうち特定ジャンルに対応する解析係数行列Ｙ[γ]を選択する（ＳB6）。以上の説明から理解される通り、解析係数行列Ｙ[γ]の各係数ベクトルｙ[k]は、特定ジャンルの楽曲の伴奏パートに多用される各音響成分が解析対象音の音響信号ＡXに出現する時間的なパターン（当該音響成分のリズムパターン）に相当する。なお、参照基底行列Ｂ[g]の基底ベクトルｂ[k]の加重値が、当該基底ベクトルｂ[k]に対応する係数ベクトルｙ[k]と各ジャンルの区分加重値ｗA[g]とに階層化されるという観点から、数式(1)で例示されるように基底行列と係数行列との加重和で分解対象の行列（数式(1)の例示では解析特性行列Ｘ）を近似する非負値行列因子分解を、以下の説明では便宜的に「階層化ＮＭＦ」と表記する。 In view of this tendency, the matrix analysis unit 34 of the first embodiment estimates the genre of the analysis target sound (hereinafter referred to as the “specific genre”) according to the G category weights wA[1] to wA[G] calculated by the non-negative matrix factorization of the analysis characteristic matrix X (SB5). Specifically, the matrix analysis unit 34 identifies as the specific genre the genre (the γ-th genre) corresponding to the largest category weight wA[γ] (γ = argmax_g(wA[g])) among the G category weights wA[1] to wA[G]. The matrix analysis unit 34 then selects the analysis coefficient matrix Y[γ] corresponding to the specific genre from among the G analysis coefficient matrices Y[1] to Y[G] calculated for the different genres by the non-negative matrix factorization of X (SB6). As understood from the above, each coefficient vector y[k] of Y[γ] corresponds to the temporal pattern (the rhythm pattern of the acoustic component) in which an acoustic component frequently used in the accompaniment part of music of the specific genre appears in the acoustic signal AX of the analysis target sound. Note that, because the weight applied to each basis vector b[k] of the reference basis matrix B[g] is split hierarchically into the coefficient vector y[k] for that basis vector and the category weight wA[g] of each genre, the non-negative matrix factorization that approximates the matrix to be decomposed (the analysis characteristic matrix X in the example of formula (1)) by a weighted sum of products of basis matrices and coefficient matrices, as in formula (1), is referred to below for convenience as “hierarchical NMF”.
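Steps SB5 and SB6 amount to an argmax over the genre weights followed by a lookup. A minimal sketch, with a hypothetical function name and 0-based indices (so the first genre is index 0):

```python
import numpy as np

def select_specific_genre(weights, coeff_matrices):
    """Return (gamma, Y[gamma]) for the genre with the largest weight.

    weights        : sequence of G non-negative genre weights
    coeff_matrices : sequence of G coefficient matrices, one per genre
    """
    gamma = int(np.argmax(weights))  # SB5: genre with the largest weight
    return gamma, coeff_matrices[gamma]  # SB6: its coefficient matrix
```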

図１の係数算定部３６は、特定ジャンルのＳ個の参照データＤR[γ,1]〜ＤR[γ,S]の各々の参照信号ＡRの参照特性行列Ｒ[γ,s]について参照基底行列Ｂ[1]〜Ｂ[G]を教師情報とする教師あり非負値行列因子分解を実行することで図６の基礎データＱ[γ]を生成する。図６に例示される通り、基礎データＱ[γ]は、特定ジャンルの相異なるスタイルに対応するＳ個の単位データｑ[γ,1]〜ｑ[γ,S]を含んで構成される。 The coefficient calculation unit 36 in FIG. 1 generates the basic data Q[γ] of FIG. 6 by executing, on the reference characteristic matrix R[γ,s] of the reference signal AR of each of the S reference data DR[γ,1] to DR[γ,S] of the specific genre, supervised non-negative matrix factorization that uses the reference basis matrices B[1] to B[G] as teacher information. As illustrated in FIG. 6, the basic data Q[γ] includes S unit data q[γ,1] to q[γ,S] corresponding to different styles of the specific genre.

第１実施形態の係数算定部３６が参照特性行列Ｒ[γ,s]に対して実行する非負値行列因子分解は、前述の解析特性行列Ｘの非負値行列因子分解（数式(1)）と同様に、相異なるジャンルに対応するＧ個の区分加重値ｗB[1,s]〜ｗB[G,s]を適用した以下の数式(2)で表現される階層化ＮＭＦである。 The non-negative matrix factorization that the coefficient calculation unit 36 of the first embodiment executes on the reference characteristic matrix R[γ,s] is, like the non-negative matrix factorization of the analysis characteristic matrix X described above (formula (1)), a hierarchical NMF expressed by the following formula (2), to which G category weights wB[1,s] to wB[G,s] corresponding to different genres are applied.

数式(2)および図７から理解される通り、第１実施形態の係数算定部３６は、参照基底行列Ｂ[g]と参照係数行列Ｚ[g,s]との行列積Ｂ[g]Ｚ[g,s]をジャンル毎の区分加重値ｗB[g,s]のもとで加重加算した結果（数式(2)の右辺）が特定ジャンルの参照信号ＡRの参照特性行列Ｒ[γ,s]に近似するように、Ｇ個の区分加重値ｗB[1,s]〜ｗB[G,s]とＧ個の参照係数行列Ｚ[1,s]〜Ｚ[G,s]とを算定する。図６に例示される通り、基礎データＱ[γ]のうち１個の参照特性行列Ｒ[γ,s]に対応する単位データｑ[γ,s]は、参照特性行列Ｒ[γ,s]から算定されたＧ個の区分加重値ｗB[1,s]〜ｗB[G,s]とＧ個の参照係数行列Ｚ[1,s]〜Ｚ[G,s]とを包含する。 As understood from formula (2) and FIG. 7, the coefficient calculation unit 36 of the first embodiment calculates G category weights wB[1,s] to wB[G,s] and G reference coefficient matrices Z[1,s] to Z[G,s] so that the result of weighting and adding the matrix product B[g]Z[g,s] of the reference basis matrix B[g] and the reference coefficient matrix Z[g,s] by the per-genre category weight wB[g,s] (the right side of formula (2)) approximates the reference characteristic matrix R[γ,s] of the reference signal AR of the specific genre. As illustrated in FIG. 6, the unit data q[γ,s] of the basic data Q[γ] that corresponds to one reference characteristic matrix R[γ,s] contains the G category weights wB[1,s] to wB[G,s] and the G reference coefficient matrices Z[1,s] to Z[G,s] calculated from that reference characteristic matrix.
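Read from the description above, formula (2) has the same structure as the decomposition of X, applied to one reference characteristic matrix. The following is a reconstruction from the surrounding text, not a reproduction of the published formula.

```latex
R[\gamma, s] \;\approx\; \sum_{g=1}^{G} w_B[g, s]\, B[g]\, Z[g, s],
\qquad s = 1, \dots, S
```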

参照係数行列Ｚ[g,s]は、図７に例示される通り、参照基底行列Ｂ[g]の各基底ベクトルｂ[k]に対応するＫ個の係数ベクトルｚ[1]〜ｚ[K]を縦方向に配列したＫ行Ｎ列の非負値行列である。参照係数行列Ｚ[g,s]の第ｋ行の係数ベクトルｚ[k]は、参照基底行列Ｂ[g]の基底ベクトルｂ[k]に対する加重値の時間変動（すなわち、基底ベクトルｂ[k]の音響成分が参照音の参照信号ＡRに出現する時間的なパターン）に相当する。以上の説明から理解される通り、参照基底行列Ｂ[g]と参照係数行列Ｚ[g,s]との行列積Ｂ[g]Ｚ[g,s]は、第ｇ番目のジャンルの参照音に優勢に出現する音響成分の周波数特性の時系列（スペクトログラム）に相当する。したがって、係数算定部３６がスタイル毎に算定するＧ個の区分加重値ｗB[1,s]〜ｗB[G,s]のうち特定ジャンルに対応する１個の区分加重値ｗB[γ,s]は他の(Ｇ−１)個と比較して大きい数値（１に近い数値）となる。 As illustrated in FIG. 7, the reference coefficient matrix Z[g,s] is a non-negative matrix of K rows and N columns in which K coefficient vectors z[1] to z[K], corresponding to the basis vectors b[k] of the reference basis matrix B[g], are arranged vertically. The coefficient vector z[k] in the k-th row of Z[g,s] corresponds to the time variation of the weight for the basis vector b[k] of the reference basis matrix B[g], that is, the temporal pattern in which the acoustic component of the basis vector b[k] appears in the reference signal AR of the reference sound. As understood from the above, the matrix product B[g]Z[g,s] corresponds to the time series (spectrogram) of the frequency characteristics of the acoustic components that appear predominantly in the reference sounds of the g-th genre. Consequently, among the G category weights wB[1,s] to wB[G,s] that the coefficient calculation unit 36 calculates for each style, the one category weight wB[γ,s] corresponding to the specific genre takes a larger value (a value close to 1) than the other (G−1) weights.

係数算定部３６は、図６から理解される通り、単位データｑ[γ,s]に包含されるＧ個の参照係数行列Ｚ[1,s]〜Ｚ[G,s]のうち特定ジャンルに対応する参照係数行列Ｚ[γ,s]を、相異なるスタイルに対応するＳ個の単位データｑ[γ,1]〜ｑ[γ,S]の各々について選択する。すなわち、特定ジャンルの相異なるスタイルに対応するＳ個の参照係数行列Ｚ[γ,1]〜Ｚ[γ,S]が選択される。以上の説明から理解される通り、任意の１個の参照係数行列Ｚ[γ,s]は、特定ジャンルの楽曲の伴奏パートに多用される各音響成分が参照データＤR[γ,s]の参照信号ＡRに出現する時間的なパターン（当該音響成分のリズムパターン）に相当する。 As understood from FIG. 6, the coefficient calculation unit 36 selects, for each of the S unit data q[γ,1] to q[γ,S] corresponding to different styles, the reference coefficient matrix Z[γ,s] corresponding to the specific genre from among the G reference coefficient matrices Z[1,s] to Z[G,s] contained in the unit data q[γ,s]. That is, S reference coefficient matrices Z[γ,1] to Z[γ,S] corresponding to different styles of the specific genre are selected. As understood from the above, any one reference coefficient matrix Z[γ,s] corresponds to the temporal pattern (the rhythm pattern of the acoustic component) in which each acoustic component frequently used in the accompaniment part of music of the specific genre appears in the reference signal AR of the reference data DR[γ,s].

図１の特性比較部３８は、行列解析部３４が特定ジャンルについて算定した解析係数行列Ｙ[γ]と、係数算定部３６が特定ジャンルのスタイル毎に算定した参照係数行列Ｚ[γ,1]〜Ｚ[γ,S]の各々とを比較する。具体的には、特性比較部３８は、解析係数行列Ｙ[γ]と参照係数行列Ｚ[γ,s]との類似度σ[s]をスタイル毎に算定する。すなわち、特定ジャンルの相異なるスタイル（相異なるＳ個の参照係数行列Ｚ[γ,1]〜Ｚ[γ,S]の各々）に対応するＳ個の類似度σ[1]〜σ[S]が算定される。類似度σ[s]は、解析係数行列Ｙ[γ]と参照係数行列Ｚ[γ,s]との類否の度合の指標であり、例えば距離（ユークリッド距離）や相関が好適例である。第１実施形態では、解析係数行列Ｙ[γ]と参照係数行列Ｚ[γ,s]との相関を類似度σ[s]として算定する。したがって、解析係数行列Ｙ[γ]と参照係数行列Ｚ[γ,s]とが類似するほど類似度σ[s]は増加する。以上の説明から理解される通り、特性比較部３８が算定する類似度σ[s]が大きいほど、特定ジャンルの第ｓ番目のスタイルの楽曲の伴奏パートに多用される音響成分の時間的なパターンに音響信号ＡXが類似する（解析対象音が特定ジャンルの第ｓ番目のスタイルに該当する確度が高い）と評価できる。 The characteristic comparison unit 38 in FIG. 1 compares the analysis coefficient matrix Y[γ] that the matrix analysis unit 34 calculated for the specific genre with each of the reference coefficient matrices Z[γ,1] to Z[γ,S] that the coefficient calculation unit 36 calculated for the styles of the specific genre. Specifically, the characteristic comparison unit 38 calculates the similarity σ[s] between Y[γ] and Z[γ,s] for each style. That is, S similarities σ[1] to σ[S] are calculated, one for each style of the specific genre (each of the S different reference coefficient matrices Z[γ,1] to Z[γ,S]). The similarity σ[s] is an index of the degree of similarity between Y[γ] and Z[γ,s]; a distance (Euclidean distance) or a correlation is a suitable example. In the first embodiment, the correlation between Y[γ] and Z[γ,s] is calculated as the similarity σ[s], so σ[s] increases as Y[γ] and Z[γ,s] become more similar. As understood from the above, the larger the similarity σ[s] calculated by the characteristic comparison unit 38, the more closely the acoustic signal AX resembles the temporal pattern of the acoustic components frequently used in the accompaniment part of music of the s-th style of the specific genre (i.e., the more likely it is that the analysis target sound belongs to the s-th style of the specific genre).
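The per-style comparison can be sketched as follows. The description names correlation as the similarity but does not define it further, so this sketch assumes the Pearson correlation of the two flattened matrices and assumes that both matrices have the same K × N shape; the function names are hypothetical.

```python
import numpy as np

def matrix_correlation(Y, Z):
    """Pearson correlation between two equally shaped matrices (assumed measure)."""
    y = np.asarray(Y, dtype=float).ravel()
    z = np.asarray(Z, dtype=float).ravel()
    if y.shape != z.shape:
        raise ValueError("Y and Z must have the same shape")
    y = y - y.mean()
    z = z - z.mean()
    denom = np.linalg.norm(y) * np.linalg.norm(z)
    # A constant matrix has no variation, so its correlation is reported as 0.
    return float(y @ z / denom) if denom > 0 else 0.0

def style_similarities(Y_gamma, Z_gamma_list):
    """Return sigma[s] for each style s of the specific genre."""
    return np.array([matrix_correlation(Y_gamma, Z) for Z in Z_gamma_list])
```

If a distance such as the Euclidean distance were used instead, a smaller value would indicate a closer match, so the ordering of styles would have to be reversed.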

表示制御部４０は、特性比較部３８が算定した類似度σ[1]〜σ[S]に応じた解析結果を表示装置１４に表示させる。第１実施形態の表示制御部４０は、図８に例示される解析結果画面５０を表示装置１４に表示させる。解析結果画面５０は、特定ジャンルの名称（ロックやポップス等のジャンル名）と、類似度σ[s]に応じて選択されたスタイルの名称とを含むリストである。具体的には、特定ジャンルのＳ個のスタイルのうち類似度σ[s]の降順で上位に位置する所定個のスタイル（すなわち音響信号ＡXが該当する確度が高いスタイル）の名称が類似度σ[s]の降順で配列される。ジャンルおよびスタイルの名称は、各参照データＤR[g,s]の属性情報ｄから特定される。利用者は、表示装置１４に表示された解析結果を確認することで、音響信号ＡXのジャンルおよびスタイルを認識することが可能である。なお、以上の例示では、類似度σ[s]の降順で上位に位置する所定個のスタイルの名称を表示したが、例えば類似度σ[s]が所定の閾値を上回る１個以上（類似度σ[s]と閾値とに応じた可変の個数）のスタイルの名称を表示させることも可能である。 The display control unit 40 causes the display device 14 to display an analysis result corresponding to the degrees of similarity σ [1] to σ [S] calculated by the characteristic comparison unit 38. The display control unit 40 according to the first embodiment causes the display device 14 to display the analysis result screen 50 illustrated in FIG. The analysis result screen 50 is a list including the name of a specific genre (genre name such as rock or pop) and the name of the style selected according to the similarity σ [s]. Specifically, among the S styles of a specific genre, the name of a predetermined number of styles that are positioned higher in descending order of the similarity σ [s] (that is, the style with a high degree of accuracy corresponding to the acoustic signal AX) is the similarity σ. Arranged in descending order of [s]. The name of the genre and style is specified from the attribute information d of each reference data DR [g, s]. The user can recognize the genre and style of the acoustic signal AX by confirming the analysis result displayed on the display device 14. In the above example, the names of a predetermined number of styles positioned higher in the descending order of the similarity σ [s] are displayed. For example, one or more names whose similarity σ [s] exceeds a predetermined threshold (similarity It is also possible to display the names of styles (variable number according to σ [s] and threshold).
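The two selection rules described for the analysis result screen, a fixed number of top-ranked styles or all styles above a threshold, can be sketched as one helper. The function name and parameters are hypothetical; the style names would come from the attribute information of the reference data.

```python
def rank_styles(similarities, names, top_n=None, threshold=None):
    """Return (name, similarity) pairs in descending order of similarity.

    top_n     : keep only the first top_n styles (fixed-count rule)
    threshold : keep only styles whose similarity exceeds it (variable-count rule)
    """
    order = sorted(range(len(names)), key=lambda s: similarities[s], reverse=True)
    if threshold is not None:
        order = [s for s in order if similarities[s] > threshold]
    if top_n is not None:
        order = order[:top_n]
    return [(names[s], float(similarities[s])) for s in order]
```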

以上に説明した第１実施形態では、参照基底行列Ｂ[g]と各解析係数行列Ｙ[g]との行列積Ｂ[g]Ｙ[g]をジャンル毎の区分加重値ｗA[g]のもとでＧ個のジャンルについて加重加算した結果が音響信号ＡXの解析特性行列Ｘに近似するように、区分加重値ｗA[g]と解析係数行列Ｙ[g]とがジャンル毎に個別に算定される。したがって、以下に詳述する通り、音響信号ＡXのジャンルやスタイルを高精度に解析できるという利点がある。 In the first embodiment described above, the category weight wA[g] and the analysis coefficient matrix Y[g] are calculated individually for each genre so that the result of weighting and adding, over the G genres, the matrix product B[g]Y[g] of the reference basis matrix B[g] and each analysis coefficient matrix Y[g] by the per-genre category weight wA[g] approximates the analysis characteristic matrix X of the acoustic signal AX. As detailed below, this has the advantage that the genre and style of the acoustic signal AX can be analyzed with high accuracy.

各ジャンルの参照音に優勢に出現する音響成分（基底ベクトルｂ[k]）の時間的なパターン（各音響成分の加重値の時間変動）を算定する方法としては、例えば図９に例示される通り、相異なるジャンルに対応するＧ個の参照基底行列Ｂ[1]〜Ｂ[G]を連結したＭ行(Ｋ×Ｇ)列の大行列（以下「統合基底行列」という）Ｂ0を音響信号ＡXの解析特性行列Ｘの非負値行列因子分解に適用する方法（以下「対比例」という）が想定される。対比例では、解析特性行列Ｘが、統合基底行列Ｂ0と統合係数行列Ｙとに分解される。統合基底行列Ｂ0は、Ｇ個の参照基底行列Ｂ[1]〜Ｂ[G]の各々に包含される複数（(Ｋ×Ｇ)個）の基底ベクトルｂ[k]を包含し、統合係数行列Ｙは、各基底ベクトルｂ[k]に対応する複数（(Ｋ×Ｇ)個）の係数ベクトルｙ[k]を包含する。対比例では、相異なるジャンルに属する各基底ベクトルｂ[k]がジャンル毎に区別されることなく相互に対等に取扱われるから、相異なる２個以上のジャンルの参照音に音響特性が類似する解析対象音の音響成分が、各ジャンルに対応する複数の係数ベクトルｙ[k]に分配される（複数の係数ベクトルｙ[k]にて同時に励起される）可能性がある。すなわち、例えば「ダンス」のジャンルの演奏音（例えばキックドラムの演奏音）と「アコースティック」のジャンルの演奏音（例えばスネアドラムの演奏音）とに類似する解析対象音の音響成分は、本来的には１個のジャンルの係数ベクトルｙ[k]のみに反映されるべきであるが、「ダンス」のジャンルの基底ベクトルｂ[k1]に対応する係数ベクトルｙ[k1]と「アコースティック」のジャンルの基底ベクトルｂ[k2]（ｋ2≠ｋ1）に対応する係数ベクトルｙ[k2]との双方に分配され得る。 One conceivable method of calculating the temporal pattern (the time variation of the weight of each acoustic component) of the acoustic components (basis vectors b[k]) that appear predominantly in the reference sounds of each genre is, as illustrated in FIG. 9, to apply to the non-negative matrix factorization of the analysis characteristic matrix X of the acoustic signal AX a large matrix B0 of M rows and (K×G) columns (hereinafter referred to as the “integrated basis matrix”) obtained by concatenating the G reference basis matrices B[1] to B[G] corresponding to different genres (this method is hereinafter referred to as the “comparative example”). In the comparative example, the analysis characteristic matrix X is decomposed into the integrated basis matrix B0 and an integrated coefficient matrix Y. The integrated basis matrix B0 contains the (K×G) basis vectors b[k] contained in the G reference basis matrices B[1] to B[G], and the integrated coefficient matrix Y contains (K×G) coefficient vectors y[k] corresponding to those basis vectors. In the comparative example, the basis vectors b[k] belonging to different genres are treated on an equal footing without distinction by genre, so an acoustic component of the analysis target sound whose acoustic characteristics resemble reference sounds of two or more different genres may be distributed among plural coefficient vectors y[k] corresponding to those genres (i.e., excited simultaneously in plural coefficient vectors y[k]). For example, an acoustic component of the analysis target sound that resembles both a performance sound of the “dance” genre (e.g., a kick drum) and a performance sound of the “acoustic” genre (e.g., a snare drum) should properly be reflected only in the coefficient vector y[k] of one genre, but may be distributed to both the coefficient vector y[k1] corresponding to the basis vector b[k1] of the “dance” genre and the coefficient vector y[k2] corresponding to the basis vector b[k2] (k2≠k1) of the “acoustic” genre.
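The splitting behaviour described above can be reproduced in a deliberately extreme toy case. The sketch below assumes a squared Euclidean error with the standard multiplicative update for the coefficients and a fixed concatenated basis; the function name is hypothetical. When two genres contain an identical basis vector and the coefficients start from the same values, the activation is divided equally between them, so neither genre is favoured.

```python
import numpy as np

def comparative_nmf(X, bases, n_iter=200, eps=1e-12):
    """Decompose X with the concatenated basis B0 = [B[1] ... B[G]] held fixed."""
    B0 = np.hstack(bases)                      # M x (K*G) integrated basis matrix
    Y0 = np.ones((B0.shape[1], X.shape[1]))    # (K*G) x N integrated coefficients
    for _ in range(n_iter):
        # Multiplicative update for the coefficients only.
        Y0 *= (B0.T @ X) / (B0.T @ B0 @ Y0 + eps)
    return B0, Y0

# Toy case: two "genres" that share one identical basis vector.
b = np.array([[1.0], [2.0], [0.5]])
X_demo = b @ np.array([[1.0, 0.0, 2.0, 1.0]])
B0_demo, Y0_demo = comparative_nmf(X_demo, [b, b.copy()])
# Rows 0 and 1 of Y0_demo are identical: the component is split across genres.
```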

以上に例示した対比例とは対照的に、第１実施形態にて解析特性行列Ｘに実行される階層化ＮＭＦでは、参照基底行列Ｂ[g]と解析係数行列Ｙ[g]とが区分加重値ｗA[g]によりジャンル毎に区分されるから、解析対象音の音響成分の音響特性が２個以上のジャンルの参照音に類似する場合でも、当該音響成分は１個のジャンルの係数ベクトルｙ[g]に適切に分配される。すなわち、解析対象音の解析係数行列Ｙ[g]が高精度に推定される。したがって、前述の通り、音響信号ＡXのジャンルやスタイルを高精度に推定することが可能である。 In contrast to the comparative example illustrated above, in the hierarchical NMF executed on the analysis characteristic matrix X in the first embodiment, the reference basis matrix B[g] and the analysis coefficient matrix Y[g] are separated by genre through the category weight wA[g]. Therefore, even when the acoustic characteristics of an acoustic component of the analysis target sound resemble reference sounds of two or more genres, that acoustic component is appropriately assigned to the coefficient vector y[g] of a single genre. That is, the analysis coefficient matrix Y[g] of the analysis target sound is estimated with high accuracy. Accordingly, as described above, the genre and style of the acoustic signal AX can be estimated with high accuracy.

以上の説明では解析特性行列Ｘの階層化ＮＭＦに着目したが、第１実施形態では、参照信号ＡRの参照特性行列Ｒ[g,s]についても同様に、参照基底行列Ｂ[g]と各参照係数行列Ｚ[g,s]との行列積Ｂ[g]Ｚ[g,s]をジャンル毎の区分加重値ｗB[g,s]のもとでＧ個のジャンルについて加重加算した結果を参照信号ＡRの参照特性行列Ｒ[g,s]に近似させる階層化ＮＭＦが実行される。以上の構成によれば、Ｇ個の参照基底行列Ｂ[1]〜Ｂ[G]を包含する統合基底行列Ｂ0を利用して参照特性行列Ｒ[g,s]を分解する構成と比較して、参照音の参照係数行列Ｚ[g,s]が高精度に推定される。したがって、音響信号ＡXのジャンルやスタイルを高精度に推定できるという効果は格別に顕著である。 The above description focuses on the hierarchical NMF of the analysis characteristic matrix X. In the first embodiment, however, a hierarchical NMF is likewise executed on the reference characteristic matrix R[g,s] of the reference signal AR: the result of weighting and adding, over the G genres, the matrix product B[g]Z[g,s] of the reference basis matrix B[g] and each reference coefficient matrix Z[g,s] by the per-genre category weight wB[g,s] is made to approximate the reference characteristic matrix R[g,s]. With this configuration, the reference coefficient matrix Z[g,s] of the reference sound is estimated with higher accuracy than in a configuration that decomposes the reference characteristic matrix R[g,s] using the integrated basis matrix B0 containing the G reference basis matrices B[1] to B[G]. The effect that the genre and style of the acoustic signal AX can be estimated with high accuracy is therefore especially pronounced.

また、第１実施形態では、Ｇ個のジャンルのうち区分加重値ｗA[g]に応じて選択された特定ジャンルのＳ個の参照音の参照特性行列Ｒ[γ,1]〜Ｒ[γ,S]について基礎データＱ[γ]の算定や解析係数行列Ｙ[γ]との比較が実行される。したがって、Ｇ個のジャンルの全部について基礎データＱ[γ]の算定や解析係数行列Ｙ[γ]との比較を実行する構成と比較して、演算処理装置２２の処理量が削減されるという利点がある。 In the first embodiment, moreover, the calculation of the basic data Q[γ] and the comparison with the analysis coefficient matrix Y[γ] are executed for the reference characteristic matrices R[γ,1] to R[γ,S] of the S reference sounds of the specific genre, which is selected from the G genres according to the category weights wA[g]. This has the advantage that the processing load of the arithmetic processing device 22 is reduced compared with a configuration that calculates the basic data Q[γ] and performs the comparison with Y[γ] for all G genres.

＜第２実施形態＞ <Second Embodiment>
本発明の第２実施形態について説明する。第２実施形態は、第１実施形態の音響解析装置１００を利用した電子楽器である。なお、以下に例示する各形態において作用や機能が第１実施形態と同様である要素については、第１実施形態の説明で参照した符号を流用して各々の詳細な説明を適宜に省略する。 A second embodiment of the present invention will be described. The second embodiment is an electronic musical instrument that uses the acoustic analysis apparatus 100 of the first embodiment. For elements in the embodiments illustrated below whose operation and function are the same as in the first embodiment, the reference signs used in the description of the first embodiment are reused and detailed description of each is omitted as appropriate.

図１０は、第２実施形態の電子楽器２００の構成図である。電子楽器２００は、鍵盤楽器型の演奏機器（例えばMIDI楽器）であり、演算処理装置２２と記憶装置２４と表示装置１４とに加えて操作機器１６と放音装置１８とを具備する。操作機器１６は、利用者が操作する入力機器である。具体的には、操作機器１６は、鍵盤楽器と同様に複数の鍵（白鍵および黒鍵）が配列された鍵盤と、利用者が操作する操作子とを含んで構成される。利用者は、操作機器１６（典型的には鍵盤以外の操作子）を適宜に操作することで、音響信号ＡXの解析結果として表示装置１４に表示された図８の解析結果画面５０から、所望のジャンルおよびスタイルの組合せを選択することが可能である。放音装置１８（例えばスピーカやヘッドホン）は、演算処理装置２２から供給される音響信号Ｖに応じた音響を放射する。 FIG. 10 is a configuration diagram of the electronic musical instrument 200 of the second embodiment. The electronic musical instrument 200 is a keyboard musical instrument-type performance device (for example, a MIDI musical instrument), and includes an operation device 16 and a sound emitting device 18 in addition to the arithmetic processing device 22, the storage device 24, and the display device 14. The operation device 16 is an input device operated by a user. Specifically, the operating device 16 includes a keyboard on which a plurality of keys (white key and black key) are arranged in the same manner as a keyboard instrument, and an operator operated by a user. The user appropriately operates the operation device 16 (typically, an operator other than the keyboard) to appropriately select the desired result from the analysis result screen 50 of FIG. 8 displayed on the display device 14 as the analysis result of the acoustic signal AX. A combination of genres and styles can be selected. The sound emitting device 18 (for example, a speaker or headphones) radiates sound corresponding to the acoustic signal V supplied from the arithmetic processing device 22.

図１０に例示される通り、第２実施形態の電子楽器２００の演算処理装置２２は、電子楽器２００に接続された信号供給装置１２から供給される解析対象音の音響信号ＡXを第１実施形態と同様に解析して解析結果を利用者に提示する要素（基底学習部３２，行列解析部３４，係数算定部３６，特性比較部３８，表示制御部４０）として機能する。したがって、第２実施形態においても第１実施形態と同様の効果が実現される。第１実施形態と同様の要素に加えて、第２実施形態の演算処理装置２２は、指示受付部６２および再生処理部６４としても機能する。指示受付部６２は、操作機器１６に対する利用者からの操作を受付ける。具体的には、指示受付部６２は、操作機器１６の鍵盤に対する演奏操作と、操作機器１６に対するジャンルおよびスタイルの選択操作とを受付ける。 As illustrated in FIG. 10, the arithmetic processing device 22 of the electronic musical instrument 200 according to the second embodiment uses the acoustic signal AX of the analysis target sound supplied from the signal supply device 12 connected to the electronic musical instrument 200 as the first embodiment. It functions as an element (base learning unit 32, matrix analysis unit 34, coefficient calculation unit 36, characteristic comparison unit 38, display control unit 40) that analyzes and presents the analysis result to the user. Therefore, the same effects as those of the first embodiment are realized in the second embodiment. In addition to the same elements as in the first embodiment, the arithmetic processing device 22 of the second embodiment also functions as an instruction receiving unit 62 and a reproduction processing unit 64. The instruction receiving unit 62 receives an operation from the user for the operating device 16. Specifically, the instruction receiving unit 62 receives a performance operation on the keyboard of the operation device 16 and a genre and style selection operation on the operation device 16.

再生処理部６４は、記憶装置２４に記憶された複数の参照データＤR[g,s]のうち指示受付部６２が受付けた選択操作で指定されたジャンルおよびスタイルの参照データＤR[g,s]の参照信号ＡRと、指示受付部６２が受付けた演奏操作で順次に指定される音高の時系列を表す演奏信号とを混合することで音響信号Ｖを生成して放音装置１８に供給する。なお、参照信号ＡRがMIDI形式等の演奏データで記憶装置２４に記憶された構成では、再生処理部６４が演奏データから参照信号ＡRを生成する。以上の説明から理解される通り、第２実施形態では、音響信号ＡXの楽曲のジャンルおよびスタイルに好適な伴奏パートの演奏音（参照信号ＡR）のもとで、例えば当該楽曲の旋律パートを、利用者が操作機器１６に対する演奏操作で演奏することが可能である。 The reproduction processing unit 64 uses the genre and style reference data DR [g, s] specified by the selection operation received by the instruction receiving unit 62 among the plurality of reference data DR [g, s] stored in the storage device 24. The reference signal AR and the performance signal representing the time series of the pitches sequentially designated by the performance operation received by the instruction receiving unit 62 are mixed to generate the acoustic signal V and supply it to the sound emitting device 18. . In the configuration in which the reference signal AR is stored in the storage device 24 as performance data in the MIDI format or the like, the reproduction processing unit 64 generates the reference signal AR from the performance data. As will be understood from the above description, in the second embodiment, for example, the melody part of the music piece is selected based on the performance sound (reference signal AR) of the accompaniment part suitable for the genre and style of the music of the acoustic signal AX. A user can perform by performing a performance operation on the operation device 16.
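The mixing performed by the reproduction processing unit can be sketched as a weighted sum of two sample sequences. The description only states that the two signals are mixed; the gains, the zero-padding of the shorter signal, and the clipping to the range [-1, 1] are assumptions made for this illustration, and the function name is hypothetical.

```python
import numpy as np

def mix_signals(reference, performance, ref_gain=0.5, perf_gain=0.5):
    """Mix an accompaniment signal with a performance signal (illustrative only)."""
    n = max(len(reference), len(performance))
    out = np.zeros(n)
    # The shorter signal is treated as silent after its last sample.
    out[: len(reference)] += ref_gain * np.asarray(reference, dtype=float)
    out[: len(performance)] += perf_gain * np.asarray(performance, dtype=float)
    # Assumed output range for floating-point audio samples.
    return np.clip(out, -1.0, 1.0)
```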

＜第３実施形態＞ <Third Embodiment>
第１実施形態では、区分加重値ｗA[1]〜ｗA[G]を適用したＧ個の行列積Ｂ[1]Ｙ[1]〜Ｂ[G]Ｙ[G]の加重和と音響信号ＡXの解析特性行列Ｘとの差分に相当する評価関数が最小化されるという条件から導出された更新式の演算で区分加重値ｗA[g]と解析係数行列Ｙ[g]とをジャンル毎に算定したが、階層化ＮＭＦの解法は以上の例示に限定されない。第３実施形態は、階層化ＮＭＦの処理に変分ベイズ法を適用した形態である。 In the first embodiment, the category weight wA[g] and the analysis coefficient matrix Y[g] are calculated for each genre by evaluating update formulas derived from the condition that an evaluation function corresponding to the difference between the weighted sum of the G matrix products B[1]Y[1] to B[G]Y[G], weighted by wA[1] to wA[G], and the analysis characteristic matrix X of the acoustic signal AX is minimized. The method of solving the hierarchical NMF is not, however, limited to this example. The third embodiment applies the variational Bayes method to the hierarchical NMF processing.

観測対象音の音響信号ＡXの解析特性行列Ｘの観測尤度は、ポアソン分布（Pois()）を適用した以下の数式(3)の確率モデルで近似的に表現される。数式(3)の添字ｔは時間を意味し、添字ｆは周波数を意味する。また、数式(3)の記号ｂ_f[k,g]は、第ｇ番目のジャンルの参照基底行列Ｂ[g]における第ｋ列の基底ベクトルｂ[k]に相当する。

The observation likelihood of the analysis characteristic matrix X of the acoustic signal AX of the observation target sound is approximately expressed by the probability model of the following formula (3), to which a Poisson distribution (Pois()) is applied. In formula (3), the subscript t denotes time and the subscript f denotes frequency. The symbol b_f[k,g] in formula (3) corresponds to the basis vector b[k] in the k-th column of the reference basis matrix B[g] of the g-th genre.

数式(3)の係数ベクトルｙ_t[k]および基底ベクトルｂ_f[g,k]の各々の事前分布は、ガンマ分布（Gam()）を適用した以下の数式(4A)および数式(4B)で表現される。

The prior distributions of the coefficient vector y_t[k] and the basis vector b_f[g,k] in formula (3) are expressed by the following formulas (4A) and (4B), to which a gamma distribution (Gam()) is applied.

ジャンルの総数Ｇを不定値として好適な数値に設定する観点から、以下の数式(5A)のようにガンマ過程を仮定する。また、基底ベクトルｂ_f[k,g]の総数Ｋを不定値として好適な数値に設定する観点から、前述の数式(3)では、ガンマ過程を適用した数式(5B)で表現される変数θ_g[k]を導入した。

From the viewpoint of treating the total number G of genres as indefinite and setting it to a suitable value, a gamma process is assumed as in the following formula (5A). Likewise, from the viewpoint of treating the total number K of basis vectors b_f[k,g] as indefinite and setting it to a suitable value, formula (3) above introduces the variable θ_g[k], which is expressed by formula (5B) with a gamma process applied.

以上のように定義された確率モデルの各変数を推定する。対数同時分布logｐ(wA,b,y,θ)は、定数項を無視すると以下の数式(6)で表現される。

Each variable of the probability model defined above is estimated. Ignoring constant terms, the log joint distribution log p(wA,b,y,θ) is expressed by the following formula (6).

数式(6)の変数Λ_f,t[g,k]は、以下の数式(7)の条件を充足する変数である。

The variable Λ_f,t[g,k] in formula (6) is a variable that satisfies the condition of formula (7) below.

数式(6)の確率モデルの各変数の推定に公知の変分ベイズ法を適用する。まず、変数Λ_f,t[g,k]を以下の数式(8)の演算で更新する。

A known variational Bayes method is applied to estimate each variable of the probability model of formula (6). First, the variable Λ_f,t[g,k] is updated by the operation of the following formula (8).

確率モデルの他の変数の事後分布も以下の数式(9)から数式(12)のように設定できる。

The posterior distribution of other variables in the probabilistic model can also be set as in the following formula (9) to formula (12).

行列解析部３４は、数式(9)から数式(12)の演算で数式(3)の確率モデルの各変数（ｂ_f[k,g]，ｙ_t[k]，ｗA[g]，θ_g[k]）を算定する。第３実施形態においても第１実施形態と同様の効果が実現される。また、第３実施形態では、ガンマ過程の導入により基底ベクトルｂ_f[k,g]の総数（スタイルの総数）Ｋを不定値として取扱う確率モデルで音響信号ＡXの解析特性行列Ｘを表現するから、基底ベクトルｂ_f[k,g]の総数Ｋを適切に設定しながら階層化ＮＭＦを実現できるという利点がある。 The matrix analysis unit 34 calculates each variable of the probability model of formula (3) (b_f[k,g], y_t[k], wA[g], θ_g[k]) by the operations of formulas (9) to (12). The third embodiment achieves the same effects as the first embodiment. In addition, because the third embodiment introduces a gamma process and thereby expresses the analysis characteristic matrix X of the acoustic signal AX with a probability model that treats the total number K of basis vectors b_f[k,g] (the total number of styles) as indefinite, it has the advantage that the hierarchical NMF can be realized while the total number K of basis vectors b_f[k,g] is set appropriately.

＜変形例＞ <Modifications>
以上の各形態は多様に変形され得る。具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２以上の態様は適宜に併合され得る。 Each of the above embodiments can be modified in various ways. Specific modifications are illustrated below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate.

(1) In each of the above embodiments, as illustrated in part (A) of FIG. 11, the characteristic comparison unit 38 compares each of the S reference coefficient matrices Z[γ,1] to Z[γ,S], which correspond to the styles of the specific genre, with the analysis coefficient matrix Y[γ] of the specific genre generated from the acoustic signal AX. The objects compared by the characteristic comparison unit 38 are not, however, limited to this example. For example, as illustrated in part (B) of FIG. 11, the characteristic comparison unit 38 may compare, for each style of the specific genre, the product wA[γ]Y[γ] of the section weight value wA[γ] of the specific genre calculated by the hierarchical NMF of Equation (1) and the analysis coefficient matrix Y[γ] with the product wB[γ,s]Z[γ,s] of the section weight value wB[γ,s] of the specific genre calculated by the hierarchical NMF of Equation (2) and the reference coefficient matrix Z[γ,s] (that is, calculate the similarities σ[1] to σ[S] from these products). As understood from the above examples, the characteristic comparison unit 38 is comprehensively expressed as an element that compares the analysis coefficient matrix Y[γ] with the reference coefficient matrix Z[γ,s], regardless of whether the analysis coefficient matrix Y[γ] is multiplied by the section weight value wA[γ] and whether the reference coefficient matrix Z[γ,s] is multiplied by the section weight value wB[γ,s].
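
The two comparison variants of modification (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiments: this passage does not say how the similarity σ[s] is computed, so the sketch substitutes a hypothetical scale-sensitive measure, 1 / (1 + Frobenius distance). The names `scale`, `similarity`, and `style_similarities` and the toy matrices are likewise invented for illustration.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]

def scale(m: Matrix, w: float) -> Matrix:
    """Multiply every element of matrix m by the scalar weight w."""
    return [[w * v for v in row] for row in m]

def similarity(a: Matrix, b: Matrix) -> float:
    """Hypothetical similarity (an assumption): 1 / (1 + Frobenius distance)."""
    dist = math.sqrt(sum((x - y) ** 2
                         for ra, rb in zip(a, b)
                         for x, y in zip(ra, rb)))
    return 1.0 / (1.0 + dist)

def style_similarities(y: Matrix, z_list: List[Matrix],
                       w_a: Optional[float] = None,
                       w_b: Optional[List[float]] = None) -> List[float]:
    """Return sigma[s] for each style s of the specific genre.

    y      : analysis coefficient matrix Y[gamma]
    z_list : reference coefficient matrices Z[gamma, 1..S]
    w_a    : optional section weight value wA[gamma]
    w_b    : optional section weight values wB[gamma, 1..S]
    """
    lhs = y if w_a is None else scale(y, w_a)
    sims = []
    for s, z in enumerate(z_list):
        rhs = z if w_b is None else scale(z, w_b[s])
        sims.append(similarity(lhs, rhs))
    return sims

# Toy data: the second reference matrix matches Y only after weighting.
Y = [[1.0, 2.0], [3.0, 4.0]]
Z = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 4.0], [6.0, 8.0]]]
plain = style_similarities(Y, Z)                              # part (A) of FIG. 11
weighted = style_similarities(Y, Z, w_a=2.0, w_b=[1.0, 1.0])  # part (B) of FIG. 11
```

A scale-sensitive measure is used here on purpose: with a scale-invariant measure such as cosine similarity, multiplying by the scalar section weight values would not change the result.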

(2) In each of the above embodiments, the basis learning unit 32 calculates the G reference basis matrices B[1] to B[G], which correspond to different genres, from the reference data DR[g,s]. A configuration in which the G reference basis matrices B[1] to B[G] are calculated in advance and stored in the storage device 24 may also be adopted. For example, the storage device 24 stores reference basis matrices B[1] to B[G] generated in advance by the basis learning unit 32 of the acoustic analysis apparatus 100, or generated in advance by a device separate from the acoustic analysis apparatus 100 using the same method as in the first embodiment. As understood from the above description, the basis learning unit 32 can be omitted from the acoustic analysis apparatus 100.
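
When the reference basis matrices are supplied from storage, the analysis step only has to estimate the section weight values and the analysis coefficient matrices. The sketch below illustrates that idea under assumptions that are not taken from the embodiments: it minimizes the squared error between X and the weighted sum of B[g]Y[g] using Lee-Seung style multiplicative updates, whereas the cost function and update rules of the actual embodiments are defined elsewhere in the description. The function names and the toy data are hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
EPS = 1e-12  # guards against division by zero

def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def inner(a: Matrix, b: Matrix) -> float:
    """Sum of element-wise products of two equally sized matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def model(w: List[float], bases: List[Matrix], coefs: List[Matrix]) -> Matrix:
    """Weighted sum over sections g of w[g] * B[g] Y[g]."""
    prods = [matmul(b, y) for b, y in zip(bases, coefs)]
    rows, cols = len(prods[0]), len(prods[0][0])
    return [[sum(wg * p[i][j] for wg, p in zip(w, prods)) for j in range(cols)]
            for i in range(rows)]

def sq_error(x: Matrix, w: List[float], bases: List[Matrix],
             coefs: List[Matrix]) -> float:
    xh = model(w, bases, coefs)
    return sum((a - b) ** 2 for rx, rh in zip(x, xh) for a, b in zip(rx, rh))

def analyze(x: Matrix, bases: List[Matrix], n_iter: int = 50,
            seed: int = 0) -> Tuple[List[float], List[Matrix]]:
    """Estimate w[g] and Y[g] with the precomputed bases B[g] held fixed."""
    rng = random.Random(seed)
    n_frames = len(x[0])
    w = [1.0] * len(bases)
    coefs = [[[rng.random() + 0.1 for _ in range(n_frames)]
              for _ in range(len(b[0]))] for b in bases]
    for _ in range(n_iter):
        for g, b in enumerate(bases):
            # Multiplicative update of Y[g] (the weight w[g] cancels out).
            xh = model(w, bases, coefs)
            bt = transpose(b)
            num, den = matmul(bt, x), matmul(bt, xh)
            coefs[g] = [[y * n / (d + EPS) for y, n, d in zip(ry, rn, rd)]
                        for ry, rn, rd in zip(coefs[g], num, den)]
            # Multiplicative update of the scalar section weight w[g].
            xh = model(w, bases, coefs)
            m = matmul(b, coefs[g])
            w[g] *= inner(m, x) / (inner(m, xh) + EPS)
    return w, coefs

# Toy data: two sections with disjoint frequency support; X uses section 0 only.
BASES = [[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]]
X = matmul(BASES[0], [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0]])
w_est, y_est = analyze(X, BASES)
```

Because each w[g] and Y[g] enter the model only as a product, their individual scales are not unique in this sketch; a practical implementation would normally add a normalization or a prior to resolve that ambiguity.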

(3) In each of the above embodiments, the coefficient calculation unit 36 calculates the basic data Q[γ] of the specific genre. A configuration in which G pieces of basic data Q[1] to Q[G] corresponding to different genres are calculated in advance and stored in the storage device 24 may also be adopted. As described above with reference to FIG. 8, the basic data Q[g] includes S pieces of unit data q[g,1] to q[g,S] corresponding to different styles of the g-th genre. Each piece of unit data q[g,s] contains the G section weight values wB[1,s] to wB[G,s] and the G reference coefficient matrices Z[1,s] to Z[G,s] calculated by the hierarchical NMF of Equation (2) for the reference characteristic matrix R[g,s]. The characteristic comparison unit 38 selects the basic data Q[γ] of the specific genre from the G pieces of basic data Q[1] to Q[G] stored in the storage device 24, extracts the S reference coefficient matrices Z[γ,1] to Z[γ,S] from the unit data q[γ,s] of the basic data Q[γ], and calculates the similarity σ[s] for each style. For example, the storage device 24 stores basic data Q[1] to Q[G] generated in advance by the coefficient calculation unit 36 of the acoustic analysis apparatus 100, or generated in advance by a device separate from the acoustic analysis apparatus 100 using the same method as in the first embodiment. As understood from the above description, the coefficient calculation unit 36 can be omitted from the acoustic analysis apparatus 100.
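
One possible in-memory layout for the stored basic data, and the selection step performed by the characteristic comparison unit 38, might look like the sketch below. The class name `UnitData`, the function `reference_matrices`, the dictionary-based store, and the 1×1 toy matrices are all assumptions made for illustration; the description does not prescribe a storage format. Genre and style indices are 0-based here.

```python
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]

@dataclass
class UnitData:
    """Unit data q[g, s]: hierarchical NMF results for one reference sound."""
    weights: List[float]  # G section weight values wB[1, s] .. wB[G, s]
    coefs: List[Matrix]   # G reference coefficient matrices Z[1, s] .. Z[G, s]

# Basic data Q[g] is modelled as a list of S unit-data entries, one per style.
BasicData = List[UnitData]

def reference_matrices(store: Dict[int, BasicData], gamma: int) -> List[Matrix]:
    """Select Q[gamma] and extract Z[gamma, s] from each unit data q[gamma, s]."""
    return [unit.coefs[gamma] for unit in store[gamma]]

# Toy store with G = 2 genres and S = 2 styles per genre.
store = {
    0: [UnitData([0.9, 0.1], [[[1.0]], [[0.0]]]),
        UnitData([0.8, 0.2], [[[2.0]], [[0.0]]])],
    1: [UnitData([0.3, 0.7], [[[0.0]], [[3.0]]]),
        UnitData([0.4, 0.6], [[[0.0]], [[4.0]]])],
}
z_refs = reference_matrices(store, 1)  # matrices to compare with Y[gamma]
```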

(4) In each of the above embodiments, a list in which the genre and style names of the plurality of reference sounds are arranged in descending order of the similarity σ[s] is displayed on the display device 14. The method of presenting the analysis result to the user is not, however, limited to this example. For example, only the name of the one style with the maximum similarity σ[s] among the S styles of the specific genre may be displayed on the display device 14. Furthermore, use of the analysis result is not limited to presentation to the user (typically, image display). For example, a configuration in which the reference signal AR of the style with the maximum similarity σ[s] among the S styles of the specific genre is supplied to the sound emitting device 18 and reproduced, or a configuration in which the reference signal AR of the style with the maximum similarity is stored in association with the acoustic signal AX of the analysis target sound, may also be adopted. As understood from the above description, the display control unit 40 that causes the display device 14 to display the analysis result may be omitted.
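
The two presentation options in modification (4), a list sorted by similarity and a single best style, can be sketched as follows. The function names, style names, and similarity values are hypothetical and serve only to illustrate the ordering.

```python
from typing import List, Tuple

def rank_styles(names: List[str], sims: List[float]) -> List[Tuple[str, float]]:
    """Return (style name, sigma[s]) pairs in descending order of similarity."""
    return sorted(zip(names, sims), key=lambda pair: pair[1], reverse=True)

def best_style(names: List[str], sims: List[float]) -> str:
    """Return the name of the style whose similarity sigma[s] is largest."""
    return max(zip(names, sims), key=lambda pair: pair[1])[0]

# Hypothetical styles of one specific genre and their similarities.
style_names = ["Bossa Nova", "Samba", "Salsa"]
sigma = [0.2, 0.9, 0.5]
ranking = rank_styles(style_names, sigma)  # full list for the display device
top = best_style(style_names, sigma)       # single name, or key for playback
```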

(5) In each of the above embodiments, the similarity σ[s] to the reference coefficient matrix Z[γ,s] of each reference sound is calculated for the analysis coefficient matrix Y[γ] of the specific genre, which is estimated according to the section weight values wA[1] to wA[G] from among the G analysis coefficient matrices Y[1] to Y[G] calculated by the matrix analysis unit 34. The method of using the analysis result of the matrix analysis unit 34 is not, however, limited to this example. For example, a configuration in which the name of the genre corresponding to the largest section weight value wA[γ] among the G section weight values wA[1] to wA[G] calculated by the matrix analysis unit 34 is presented to the user as the analysis result may also be adopted. In that case, the acoustic analysis apparatus 100 is used as an apparatus that estimates the genre of the music represented by the acoustic signal AX. As understood from the above description, the characteristic comparison unit 38, which calculates the similarity σ[s] using the analysis result of the matrix analysis unit 34, can be omitted.
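
Genre estimation as described in modification (5) reduces to picking the section with the largest section weight value. A minimal sketch, with a hypothetical function name and made-up genre names and weights:

```python
from typing import List, Tuple

def estimate_genre(genre_names: List[str], w_a: List[float]) -> Tuple[int, str]:
    """Return (gamma, name) for the genre with the largest weight wA[gamma]."""
    gamma = max(range(len(w_a)), key=lambda g: w_a[g])
    return gamma, genre_names[gamma]

# Hypothetical section weight values wA[1..G] from the matrix analysis step.
genres = ["Rock", "Jazz", "Latin"]
weights = [0.1, 0.2, 0.7]
gamma, genre = estimate_genre(genres, weights)
```

If several genres share the maximum weight, this sketch returns the first of them; how ties should be resolved is not addressed in the description.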

(6) Each of the above embodiments exemplifies estimation of the musical genre and style of the music represented by the acoustic signal AX, but the purpose of analysis by the acoustic analysis apparatus 100 is not limited to estimating the genre or style of the analysis target sound. For example, the present invention can also be applied to processing that estimates, from a large number of reference sounds extracted from a plurality of musical pieces and used as material for music composition (loop material), a reference sound similar to the acoustic signal AX.

(7) The acoustic analysis apparatus 100 can also be realized by a server device that communicates with a terminal device such as a mobile phone. Specifically, the acoustic analysis apparatus 100 analyzes the acoustic signal AX received from the terminal device via a communication network in the same manner as in each of the above embodiments, and transmits the analysis result (for example, the similarities σ[1] to σ[S] or the image data of the analysis result screen 50) to the terminal device.

DESCRIPTION OF SYMBOLS
100: acoustic analysis apparatus, 200: electronic musical instrument, 12: signal supply device, 14: display device, 16: operating device, 18: sound emitting device, 22: arithmetic processing device, 24: storage device, 32: basis learning unit, 34: matrix analysis unit, 36: coefficient calculation unit, 38: characteristic comparison unit, 40: display control unit, 50: analysis result screen, 62: instruction receiving unit, 64: reproduction processing unit.

Claims

An acoustic analysis apparatus comprising matrix analysis means that, for a plurality of sections into which a plurality of reference sounds are classified, calculates a first section weight value and an analysis coefficient matrix for each section such that the result of weighted addition, under the first section weight value of each section, of the matrix product of a reference basis matrix for the section, which includes a plurality of basis vectors representing frequency characteristics of the reference sounds in the section, and the analysis coefficient matrix, which includes a plurality of coefficient vectors representing the temporal variation of the weight value of each basis vector of the reference basis matrix, approximates an analysis characteristic matrix representing a time series of frequency characteristics of an analysis target sound.

The acoustic analysis apparatus according to claim 1, further comprising characteristic comparison means that compares the analysis coefficient matrix calculated by the matrix analysis means with a reference coefficient matrix obtained when a plurality of reference characteristic matrices, each representing a time series of frequency characteristics of a reference sound, are decomposed into the reference basis matrix and the reference coefficient matrix, the reference coefficient matrix including a plurality of coefficient vectors representing the temporal variation of the weight value of each basis vector of the reference basis matrix.

The acoustic analysis apparatus according to claim 2, wherein the characteristic comparison means compares, with the analysis coefficient matrix calculated by the matrix analysis means, each reference coefficient matrix obtained when a second section weight value and the reference coefficient matrix are calculated for each section such that the result of weighted addition, under the second section weight value of each section, of the matrix product of the reference basis matrix and the reference coefficient matrix approximates the reference characteristic matrix.

The acoustic analysis apparatus according to claim 2 or claim 3, wherein, for each of a plurality of reference sounds in a specific section selected from the plurality of sections according to the first section weight values calculated for each section by the matrix analysis means, the characteristic comparison means compares the reference coefficient matrix of that reference sound with the analysis coefficient matrix calculated by the matrix analysis means for the specific section.

The acoustic analysis apparatus according to claim 4, wherein the plurality of reference sounds are accompaniment sounds of musical pieces that differ in musical style and are classified into the plurality of sections by genre of the musical pieces, the apparatus further comprising display control means that causes a display device to display the name of the genre of the specific section and the name of the style of the reference sound selected according to the result of comparison by the characteristic comparison means.