WO2022074754A1

WO2022074754A1 - Information processing method, information processing system, and program

Info

Publication number: WO2022074754A1
Application number: PCT/JP2020/037966
Authority: WO
Inventors: 竜之介大道; 慶二郎才野; 正宏清水
Original assignee: Yamaha Corporation (ヤマハ株式会社)
Priority date: 2020-10-07
Filing date: 2020-10-07
Publication date: 2022-04-14
Also published as: JPWO2022074754A1; CN116324965A

Abstract

This invention includes: editing, in response to a first instruction from a user, first time-series data representing a time series of feature values of a sound in which a symbol string is pronounced in a first pronunciation style; each time the first time-series data is edited, saving first history data corresponding to the edited first time-series data as a new version; editing, in response to a second instruction from the user, second time-series data representing a time series of feature values of a sound in which the symbol string is pronounced in a second pronunciation style different from the first; each time the second time-series data is edited, saving second history data corresponding to the edited second time-series data as a new version; and acquiring either the first time-series data corresponding to the first history data that the user selects from among the saved first history data of different versions, or the second time-series data corresponding to the second history data that the user selects from among the saved second history data of different versions.

Description

Information processing method, information processing system, and program

This disclosure relates to the processing of time-series data.

Various speech synthesis techniques for synthesizing speech with arbitrary phonemes have been proposed. For example, Patent Document 1 discloses a technique for synthesizing a singing voice that sings a note sequence the user enters on an editing screen. The editing screen is a piano-roll screen with a time axis and a pitch axis. For each note of a piece of music, the user specifies a phoneme (lyric character), a pitch, and a sounding period.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2016-90916

To synthesize a voice that accurately reflects the user's intention, the user must work by trial and error. This means repeatedly editing the synthesis conditions (for example, various parameters) and listening to the resulting voice. One could let the user cancel the most recent edits in reverse order (undo) or re-execute cancelled edits (redo). With simple undo and redo alone, however, it is difficult in practice to direct edits by trial and error while comparing the results of various edits with one another. Speech synthesis is used as the example above, but similar problems arise wherever time-series data is generated. In view of these circumstances, an object of the present disclosure is to make it easier to generate time-series data that matches the user's intention.

To solve the above problem, an information processing method according to one aspect of the present disclosure performs the following steps:
edits, in response to a first instruction from a user, first time-series data representing a time series of feature values of a sound in which a symbol string is pronounced in a first pronunciation style;
each time the first time-series data is edited, saves first history data corresponding to the edited first time-series data as a new version;
edits, in response to a second instruction from the user, second time-series data representing a time series of feature values of a sound in which the symbol string is pronounced in a second pronunciation style different from the first pronunciation style;
each time the second time-series data is edited, saves second history data corresponding to the edited second time-series data as a new version; and
acquires either the first time-series data corresponding to the first history data selected by an instruction from the user from among the saved first history data of different versions, or the second time-series data corresponding to the second history data selected by an instruction from the user from among the saved second history data of different versions.
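A minimal Python sketch may help make this per-style versioning concrete. It rests on several assumptions. Each saved history entry is taken to be a full snapshot of the edited series; the text only says the history data "corresponds to" the edited data, so a diff would serve equally well. Each edit is reduced to overwriting a single frame. The class name `StyleHistories`, the style labels `"normal"` and `"breathy"`, and the F0 values in Hz are all hypothetical.

```python
class StyleHistories:
    """Per-style version histories for time-series data (hypothetical sketch).

    Assumption: each saved history entry is a full snapshot of the edited
    series; versions are numbered from 1, and version 1 is the unedited data.
    """

    def __init__(self, initial_by_style):
        # Working copy and saved versions are kept separately for each style.
        self.current = {s: list(v) for s, v in initial_by_style.items()}
        self.versions = {s: [list(v)] for s, v in initial_by_style.items()}

    def edit(self, style, index, value):
        # Apply one user instruction to a single style; other styles are untouched.
        self.current[style][index] = value
        # Save the edited series as a new version and return its number.
        self.versions[style].append(list(self.current[style]))
        return len(self.versions[style])

    def acquire(self, style, version):
        # Return the time series that corresponds to the requested version.
        return list(self.versions[style][version - 1])


histories = StyleHistories({"normal": [220.0, 220.0, 247.0],
                            "breathy": [220.0, 221.0, 246.0]})
histories.edit("normal", 1, 233.0)    # "normal" now has versions 1 and 2
histories.edit("breathy", 2, 262.0)   # "breathy" now has versions 1 and 2
print(histories.acquire("normal", 1))   # [220.0, 220.0, 247.0]
print(histories.acquire("breathy", 2))  # [220.0, 221.0, 262.0]
```

Keeping a separate version list per style means the user can step through, and compare, the edit histories of the two pronunciation styles independently.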

An information processing system according to one aspect of the present disclosure includes an editing processing unit and an information management unit.

The editing processing unit edits, in response to a first instruction from a user, first time-series data representing a time series of feature values of a sound in which a symbol string is pronounced in a first pronunciation style. It also edits, in response to a second instruction from the user, second time-series data representing a time series of feature values of a sound in which the symbol string is pronounced in a second pronunciation style different from the first pronunciation style.

Each time the first time-series data is edited, the information management unit saves first history data corresponding to the edited first time-series data as a new version. Likewise, each time the second time-series data is edited, it saves second history data corresponding to the edited second time-series data as a new version. The information management unit then acquires either of the following: the first time-series data corresponding to the first history data selected by an instruction from the user from among the saved first history data of different versions, or the second time-series data corresponding to the second history data selected by an instruction from the user from among the saved second history data of different versions.

A program according to one aspect of the present disclosure causes a computer system to function as an editing processing unit and an information management unit.

The editing processing unit edits, in response to a first instruction from a user, first time-series data representing a time series of feature values of a sound in which a symbol string is pronounced in a first pronunciation style. It also edits, in response to a second instruction from the user, second time-series data representing a time series of feature values of a sound in which the symbol string is pronounced in a second pronunciation style different from the first pronunciation style.

Each time the first time-series data is edited, the information management unit saves first history data corresponding to the edited first time-series data as a new version. Likewise, each time the second time-series data is edited, it saves second history data corresponding to the edited second time-series data as a new version. The information management unit then acquires either of the following: the first time-series data corresponding to the first history data selected by an instruction from the user from among the saved first history data of different versions, or the second time-series data corresponding to the second history data selected by an instruction from the user from among the saved second history data of different versions.

FIG. 1 is a block diagram illustrating the configuration of an information processing system according to a first embodiment.
FIG. 2 is a schematic diagram of an editing screen.
FIG. 3 is a block diagram illustrating the functional configuration of the information processing system.
FIG. 4 is a flowchart illustrating the procedure of a first editing process.
FIG. 5 is a flowchart illustrating the procedure of a second editing process.
FIG. 6 is a flowchart illustrating the procedure of a third editing process.
FIG. 7 is an explanatory diagram of the data structure in a history area.
FIG. 8 is a flowchart illustrating the procedure of a first management process.
FIG. 9 is a flowchart illustrating the procedure of a second management process.
FIG. 10 is a flowchart illustrating the procedure of a third management process.
FIG. 11 is a schematic diagram of the editing screen in a second embodiment.
FIG. 12 is a block diagram illustrating the functional configuration of the information processing system in the second embodiment.
FIG. 13 is an explanatory diagram of the data structure in the history area in the second embodiment.
FIG. 14 is a schematic diagram of a comparison screen.
FIG. 15 is an explanatory diagram of the synthetic sound in a third embodiment.
FIG. 16 is a schematic diagram of the editing screen in the third embodiment.
FIG. 17 is a schematic diagram of an editing screen in a modification.

A: First Embodiment
FIG. 1 is a block diagram illustrating the configuration of an information processing system 100 according to the first embodiment of the present disclosure. The information processing system 100 is an audio processing system that generates an acoustic signal Z. The acoustic signal Z is a time-domain signal representing the waveform of a synthetic sound. The synthetic sound is, for example, an instrument sound produced by a virtual performer playing a musical instrument, or a singing sound produced by a virtual singer singing a song.

The information processing system 100 is implemented by a computer system that includes a control device 11, a storage device 12, a sound emitting device 13, a display device 14, and an operation device 15. It is, for example, an information device such as a smartphone, a tablet terminal, or a personal computer. The information processing system 100 may be implemented as a single device or as multiple separate devices (for example, a client-server system).

The control device 11 consists of one or more processors that control each element of the information processing system 100. Specifically, the control device 11 is composed of one or more types of processor, such as a CPU (Central Processing Unit), an SPU (Sound Processing Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit). The control device 11 executes the various processes that generate the acoustic signal Z.

The storage device 12 consists of one or more memories that store the program executed by the control device 11 and the various data it uses. The storage device 12 is a known recording medium, such as a magnetic or semiconductor recording medium, or a combination of several types of recording media. A portable recording medium that can be attached to and detached from the information processing system 100 may also serve as the storage device 12. So may a recording medium that can be written and read over a communication network (for example, cloud storage).

The sound emitting device 13, for example a speaker or headphones, reproduces the synthetic sound represented by the acoustic signal Z generated by the control device 11. For convenience, the figure omits a D/A converter that converts the acoustic signal Z from digital to analog and an amplifier that amplifies it. FIG. 1 shows the sound emitting device 13 built into the information processing system 100. A separate sound emitting device 13 may instead be connected to the information processing system 100 by wire or wirelessly.

The display device 14 displays images under the control of the control device 11. It is a display panel such as a liquid crystal panel or an organic EL (electroluminescence) panel. The operation device 15 is an input device that receives instructions from the user, for example a set of controls operated by the user or a touch panel that detects the user's touch. By operating the operation device 15, the user can specify the conditions of the synthetic sound. The display device 14 displays an image G (hereinafter the "editing screen") that the user refers to when specifying those conditions.

FIG. 2 is a schematic diagram of the editing screen G. The editing screen G includes a plurality of editing areas E (En, Ef, and Ew) that share a common time axis (horizontal axis). The section of the synthetic sound shown on the editing screen G changes in response to the user's instructions entered through the operation device 15.

The editing area En displays a time series N of the notes that make up the score of the synthetic sound (hereinafter the "note sequence"). The editing area En is a coordinate plane defined by a time axis and a pitch axis (vertical axis), on which an image representing each note of the note sequence N is placed. Each note of the note sequence N has a pitch (for example, a note number) and a sounding period. When the synthetic sound is a singing sound, each note also has a phoneme. The editing area En also displays performance markings such as crescendo, forte, or decrescendo.

By operating the operation device 15, the user can give an editing instruction Qn to the editing area En. The editing instruction Qn is an instruction to edit the note sequence N. Specifically, it is one of the following: an instruction to add or delete a note of the note sequence N, an instruction to change a note's conditions (pitch, sounding period, or phoneme), or an instruction to change a performance marking.
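For illustration, the note sequence data Dn and the three kinds of editing instruction Qn could be modelled as below. This is only a sketch, and several choices in it are assumptions: the `Note` fields, MIDI note numbers for pitch, and seconds as the time unit. Performance markings are omitted. Each operation returns a new list instead of modifying the old one, which makes it easy to keep earlier versions around.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Note:
    pitch: int                     # MIDI note number, e.g. 60 = C4 (assumption)
    start: float                   # onset time in seconds
    duration: float                # sounding period in seconds
    phoneme: Optional[str] = None  # only needed for singing sounds


def add_note(notes: List[Note], note: Note) -> List[Note]:
    # Insert a note and keep the sequence ordered by onset time.
    return sorted(notes + [note], key=lambda n: n.start)


def delete_note(notes: List[Note], index: int) -> List[Note]:
    return notes[:index] + notes[index + 1:]


def change_note(notes: List[Note], index: int, **changes) -> List[Note]:
    # e.g. change_note(seq, 0, pitch=62) changes only the pitch of note 0.
    updated = list(notes)
    updated[index] = replace(updated[index], **changes)
    return updated


seq = [Note(60, 0.0, 0.5, "a"), Note(64, 0.5, 0.5, "i")]
seq = change_note(seq, 1, pitch=65)
seq = add_note(seq, Note(67, 1.0, 1.0, "u"))
seq = delete_note(seq, 0)
print([n.pitch for n in seq])  # [65, 67]
```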

The editing area Ef displays a time series F of feature values of the synthetic sound (hereinafter the "feature sequence"). A feature value is an acoustic feature of the synthetic sound. Specifically, the fundamental frequency (pitch) of the synthetic sound serves as the feature value, so the editing area Ef shows the feature sequence F as the temporal transition of the fundamental frequency. By operating the operation device 15, the user can give an editing instruction Qf to the editing area Ef. The editing instruction Qf is an instruction to edit the feature sequence F. For example, it changes how the feature value varies over time within a section of the displayed feature sequence F that the user chooses.
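The text does not say how a section of the feature sequence is reshaped. The sketch below shows two plausible operations on an F0 contour stored as one value in Hz per analysis frame: transposing a frame range by a number of semitones, and redrawing a range as a straight line. Both function names, and the convention that 0 Hz marks an unvoiced frame, are assumptions.

```python
from typing import List


def shift_section(f0: List[float], start: int, end: int,
                  semitones: float) -> List[float]:
    """Return a copy of the contour with frames [start, end) transposed.

    Unvoiced frames (0 Hz, an assumed convention) are left at 0.
    """
    ratio = 2.0 ** (semitones / 12.0)  # equal-tempered frequency ratio
    return [hz * ratio if start <= i < end and hz > 0 else hz
            for i, hz in enumerate(f0)]


def draw_line(f0: List[float], start: int, end: int,
              hz_from: float, hz_to: float) -> List[float]:
    """Return a copy with frames [start, end) replaced by a linear ramp."""
    out = list(f0)
    n = end - start
    for k in range(n):
        t = k / (n - 1) if n > 1 else 0.0  # position within the section, 0..1
        out[start + k] = hz_from + (hz_to - hz_from) * t
    return out


print(shift_section([220.0, 220.0, 0.0, 220.0], 1, 4, 12))  # [220.0, 440.0, 0.0, 440.0]
```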

The editing area Ew displays the waveform W of the synthetic sound along the time axis. By operating the operation device 15, the user can give an editing instruction Qw to the editing area Ew. The editing instruction Qw is an instruction to edit the waveform W. Specifically, it changes the waveform within a section of the displayed waveform W that the user chooses.
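The kind of waveform change behind an editing instruction Qw is likewise left open. As one hypothetical example, the sketch below scales the amplitude of a user-selected time range of the waveform W. The waveform is represented as a list of samples at an assumed sample rate, and the function name is illustrative only.

```python
from typing import List


def scale_section(samples: List[float], sample_rate: int,
                  t_start: float, t_end: float, gain: float) -> List[float]:
    """Return a copy of `samples` with [t_start, t_end) seconds multiplied by `gain`."""
    i0 = max(0, int(round(t_start * sample_rate)))             # first sample index
    i1 = min(len(samples), int(round(t_end * sample_rate)))    # one past the last
    out = list(samples)
    for i in range(i0, i1):
        out[i] *= gain
    return out


print(scale_section([1.0] * 8, 4, 0.5, 1.5, 0.5))
# [1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
```

A practical editor would probably also fade the gain in and out at the section edges to avoid clicks; that refinement is left out here.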

In addition to the editing areas E described above, the editing screen G includes an operation image B1 (Play) and a plurality of operation areas (Gn, Gf, and Gw), each corresponding to a different editing area E. The operation image B1 is a software button that the user operates with the operation device 15 to instruct playback of the synthetic sound. When the user operates the operation image B1, the sound emitting device 13 reproduces the synthetic sound of the waveform W displayed in the editing area Ew.

The operation area Gn relates to the note sequence N and displays a note sequence version number Vn, an operation image Gn1, and an operation image Gn2. The note sequence version number Vn indicates the version of the note sequence N displayed in the editing area En. Vn increases by 1 each time the note sequence N is edited in response to an editing instruction Qn. By operating the operation device 15, the user can also change Vn in the operation area Gn to any value. The editing area En then displays the version of the note sequence N that corresponds to the changed Vn, chosen from among the versions produced during past editing.

The operation images Gn1 and Gn2 are software buttons that the user can operate with the operation device 15. The operation image Gn1 lets the user return the note sequence N to its state before the most recent edit (Undo). When the user operates Gn1, the note sequence version number Vn changes to the immediately preceding value, and the editing area En displays the note sequence N of the version corresponding to the new Vn. Gn1 can therefore also be described as a control for stepping Vn back to its preceding value, that is, for cancelling the most recent edit to the note sequence N. The operation image Gn2, in contrast, lets the user re-execute an edit that was cancelled with Gn1 (Redo).
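Under this reading, the Undo and Redo buttons only move a version number, and no stored version is ever discarded. A small sketch of one editable layer, such as the note sequence N, follows. The class name is hypothetical, and so is the rule for an edit made after an undo: here it is appended after the newest version, so every earlier result stays selectable by number. Undo simply decrements the number, as the text describes.

```python
class VersionedLayer:
    """One editable layer (e.g. the note sequence N) with numbered versions."""

    def __init__(self, initial):
        self.versions = [initial]  # versions[v - 1] holds version v
        self.current = 1           # version number shown on screen (e.g. Vn)

    @property
    def data(self):
        """Data of the currently selected version."""
        return self.versions[self.current - 1]

    def edit(self, new_data):
        # Assumption: a new edit is stored after the newest version, so no
        # earlier result is ever discarded, even after an undo.
        self.versions.append(new_data)
        self.current = len(self.versions)
        return self.current

    def undo(self):
        # Like operation image Gn1: step back to the preceding version number.
        if self.current > 1:
            self.current -= 1
        return self.current

    def redo(self):
        # Like operation image Gn2: step forward again if a later version exists.
        if self.current < len(self.versions):
            self.current += 1
        return self.current

    def select(self, version):
        # The user types an arbitrary version number into the operation area.
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"no version {version}")
        self.current = version
        return self.current


layer = VersionedLayer("draft")
layer.edit("draft + crescendo")  # version 2
layer.undo()                     # back to version 1
print(layer.data)                # draft
```

The same class could be instantiated once each for the note sequence, the feature sequence, and the waveform, so that each layer keeps its own independent version number.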

The operation area Gf relates to the feature sequence F and displays a feature sequence version number Vf, an operation image Gf1, and an operation image Gf2. The feature sequence version number Vf indicates the version of the feature sequence F displayed in the editing area Ef. Vf increases by 1 each time the feature sequence F is edited in response to an editing instruction Qf. By operating the operation device 15, the user can also change Vf in the operation area Gf to any value. The editing area Ef then displays the version of the feature sequence F that corresponds to the changed Vf, chosen from among the versions produced during past editing.

The operation images Gf1 and Gf2 are software buttons that the user can operate with the operation device 15. The operation image Gf1 lets the user return the feature sequence F to its state before the most recent edit (Undo). When the user operates Gf1, the feature sequence version number Vf changes to the immediately preceding value, and the editing area Ef displays the feature sequence F of the version corresponding to the new Vf. Gf1 can therefore also be described as a control for stepping Vf back to its preceding value, that is, for cancelling the most recent edit to the feature sequence F. The operation image Gf2, in contrast, lets the user re-execute an edit that was cancelled with Gf1 (Redo).

The operation area Gw relates to the waveform W and displays a waveform version number Vw, an operation image Gw1, and an operation image Gw2. The waveform version number Vw indicates the version of the waveform W displayed in the editing area Ew. Vw increases by 1 each time the waveform W is edited in response to an editing instruction Qw. By operating the operation device 15, the user can also change Vw in the operation area Gw to any value. The editing area Ew then displays the version of the waveform W that corresponds to the changed Vw, chosen from among the versions produced during past editing.

The operation images Gw1 and Gw2 are software buttons that the user can operate with the operation device 15. The operation image Gw1 lets the user return the waveform W to its state before the most recent edit (Undo). When the user operates Gw1, the waveform version number Vw changes to the immediately preceding value, and the editing area Ew displays the waveform W of the version corresponding to the new Vw. Gw1 can therefore also be described as a control for stepping Vw back to its preceding value, that is, for cancelling the most recent edit to the waveform W. The operation image Gw2, in contrast, lets the user re-execute an edit that was cancelled with Gw1 (Redo).

As described above, the first embodiment uses a plurality of version numbers V (Vn, Vf, and Vw). Incrementing a version number means that editing moves forward, and decrementing it means that editing steps back.

FIG. 3 is a block diagram illustrating the functional configuration of the information processing system 100. By executing the program stored in the storage device 12, the control device 11 implements a plurality of functions for editing the conditions of the synthetic sound and generating the acoustic signal Z. These are a display control unit 20, an editing processing unit 30, and an information management unit 40. The display control unit 20 causes the display device 14 to display images, for example the editing screen G illustrated in FIG. 2, and updates the editing screen G in response to instructions (Qn, Qf, or Qw) from the user.

The editing processing unit 30 in FIG. 3 edits the conditions of the synthetic sound (the note sequence N, the feature sequence F, and the waveform W) in response to instructions (Qn, Qf, or Qw) from the user. The editing processing unit 30 includes a first editing unit 31, a first generation unit 32, a second editing unit 33, a second generation unit 34, and a third editing unit 35.

The first editing unit 31 edits note sequence data Dn, which is time-series data representing the note sequence N of the synthetic sound. Specifically, the first editing unit 31 edits the note sequence data Dn in response to an editing instruction Qn that the user gives to the editing area En. The display control unit 20 displays, in the editing area En, the note sequence N represented by the note sequence data Dn as edited by the first editing unit 31.

The first generation unit 32 generates feature sequence data Df from the note sequence data Dn edited by the first editing unit 31. The feature sequence data Df is time-series data representing the feature sequence F of the synthetic sound. The feature value at each point on the time axis is generated from the data of the note at that point together with the data of at least one of the preceding and following notes. In other words, the feature sequence data Df is generated according to the context of the note sequence N represented by the note sequence data Dn.
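The context-dependent input described here can be sketched as follows. For every analysis frame, the pitch of the note sounding at that frame is paired with the pitches of its neighbouring notes. Several details are assumptions: exactly one neighbour is used on each side, the frame period is fixed, and `None` marks rests and sequence boundaries. The function name is also hypothetical, and the actual input encoding of M1 is not specified.

```python
from typing import List, Optional, Tuple

NoteTuple = Tuple[int, float, float]  # (pitch, start seconds, duration seconds)
Context = Tuple[Optional[int], Optional[int], Optional[int]]


def frame_contexts(notes: List[NoteTuple], frame_period: float,
                   num_frames: int) -> List[Context]:
    """Return (previous, current, next) note pitch for every frame.

    `notes` must be sorted by start time. Frames inside a rest get
    (None, None, None); a missing neighbour at either end is None.
    """
    contexts = []
    for i in range(num_frames):
        t = i * frame_period
        cur = None
        for k, (_, start, dur) in enumerate(notes):
            if start <= t < start + dur:
                cur = k
                break
        if cur is None:
            contexts.append((None, None, None))
            continue
        prev_p = notes[cur - 1][0] if cur > 0 else None
        next_p = notes[cur + 1][0] if cur + 1 < len(notes) else None
        contexts.append((prev_p, notes[cur][0], next_p))
    return contexts


notes = [(60, 0.0, 0.5), (64, 0.5, 0.5), (67, 1.0, 0.5)]
for ctx in frame_contexts(notes, 0.25, 7):
    print(ctx)
```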

Specifically, the first generation unit 32 generates the feature sequence data Df using a first generative model M1. M1 is a statistical estimation model that takes the note sequence data Dn as input and outputs the feature sequence data Df. It is a trained model that has learned the relationship between note sequences N and feature sequences F, and it is composed of, for example, a deep neural network (DNN). Any type of deep neural network may be used as the first generative model M1, such as a convolutional neural network (CNN) or a recurrent neural network (RNN). Additional elements such as long short-term memory (LSTM) or self-attention may also be incorporated into it.
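To make the recurrent option concrete, here is the forward pass of a minimal Elman-style RNN in plain Python. It maps per-frame input vectors to one output value per frame. The inputs could be, for example, the normalized pitches of the previous, current, and next notes, and the output could be a normalized F0 value. The network is untrained and its weights are random, and its size and input encoding are assumptions. It illustrates only the recurrent principle, not the structure of M1, which the text leaves open.

```python
import math
import random
from typing import List


class TinyRNN:
    """Minimal Elman RNN: h_t = tanh(Wx x_t + Wh h_{t-1} + bh), y_t = wy . h_t + by."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[List[float]]:
            return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

        # These weights and biases are the "variables" that training would set.
        self.Wx = mat(n_hidden, n_in)
        self.Wh = mat(n_hidden, n_hidden)
        self.bh = [0.0] * n_hidden
        self.wy = [rng.gauss(0.0, 0.3) for _ in range(n_hidden)]
        self.by = 0.0

    def forward(self, frames: List[List[float]]) -> List[float]:
        h = [0.0] * len(self.bh)  # hidden state carries context across frames
        outputs = []
        for x in frames:
            h = [math.tanh(sum(w * xi for w, xi in zip(self.Wx[j], x))
                           + sum(w * hk for w, hk in zip(self.Wh[j], h))
                           + self.bh[j])
                 for j in range(len(h))]
            outputs.append(sum(w * hj for w, hj in zip(self.wy, h)) + self.by)
        return outputs


rnn = TinyRNN(n_in=3, n_hidden=4, seed=1)
print(rnn.forward([[0.60, 0.62, 0.64], [0.62, 0.64, 0.65]]))
```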

The first generative model M1 is realized by combining a program with a plurality of variables (specifically, weights and biases). The program causes the control device 11 to execute an operation that generates the feature sequence data Df from the note sequence data Dn, and the variables are applied to that operation. The variables defining the first generative model M1 are set in advance by machine learning using a plurality of pieces of first training data and are stored in the storage device 12. Each piece of first training data includes note sequence data Dn and feature sequence data Df (the correct value).

During machine learning, a provisional first generative model M1 outputs feature sequence data Df for the note sequence data Dn of each piece of first training data. The variables of the first generative model M1 are updated iteratively so as to reduce the error between this output and the feature sequence data Df of that piece of first training data. The first training data contains latent tendencies relating the note sequence N to the feature sequence F. In accordance with those tendencies, the trained first generative model M1 outputs feature sequence data Df that is statistically valid for unseen note sequence data Dn.
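The iterative update described above is ordinary supervised training. The following sketch shows the same loop on a deliberately tiny stand-in. The model is a one-input linear model with a single weight and a single bias. It is fitted by plain gradient descent on the mean squared error against the correct values. This is not the DNN of the first generative model M1. The function name, learning rate, and epoch count are illustrative assumptions.

```python
from typing import List, Tuple


def train_linear(
    xs: List[float], ys: List[float], lr: float = 0.05, epochs: int = 2000
) -> Tuple[float, float, List[float]]:
    """Toy stand-in (hypothetical, not M1): fit y = w * x + b by gradient descent.

    Returns the learned weight, the learned bias, and the loss recorded
    before each update, so the decreasing error can be inspected.
    """
    w, b = 0.0, 0.0  # the toy model's "variables": one weight, one bias
    n = len(xs)
    losses: List[float] = []
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]  # output minus correct value
        losses.append(sum(e * e for e in errors) / n)  # mean squared error
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w  # step against the gradient to reduce the error
        b -= lr * grad_b
    return w, b, losses
```

For example, `train_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])` recovers a weight close to 2 and a bias close to 1.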

The second editing unit 33 edits the feature sequence data Df generated by the first generation unit 32. Specifically, the user gives an editing instruction Qf in the editing area Ef, and the second editing unit 33 edits the feature sequence data Df in accordance with that instruction. The display control unit 20 displays the feature sequence F in the editing area Ef. The displayed feature sequence F is represented either by the feature sequence data Df generated by the first generation unit 32 or by the feature sequence data Df edited by the second editing unit 33.

The second generation unit 34 generates waveform data Dw from the note sequence data Dn and the feature sequence data Df. The waveform data Dw is time-series data representing the waveform W of the synthesized sound; that is, it consists of a time series of samples representing the acoustic signal Z. The acoustic signal Z is generated by D/A conversion and amplification of the waveform data Dw. The feature sequence data Df as output by the first generation unit 32 may also be used to generate the waveform data Dw, that is, feature sequence data Df that the second editing unit 33 has not edited.

The second generation unit 34 generates the waveform data Dw using a second generative model M2. The input to the second generative model M2 is a pair of note sequence data Dn and feature sequence data Df (hereinafter "input data Din"). The second generative model M2 is a statistical estimation model that receives the input data Din and outputs waveform data Dw. More specifically, it is a trained model that has learned the relationship between the pair of note sequence N and feature sequence F on the one hand and the waveform W on the other. The second generative model M2 is composed of, for example, a deep neural network. Any form of deep neural network may be used as the second generative model M2, such as a convolutional neural network or a recurrent neural network. Additional elements such as long short-term memory or self-attention may also be incorporated into the second generative model M2.

The second generative model M2 is realized by combining a program with a plurality of variables (specifically, weights and biases). The program causes the control device 11 to execute an operation that generates the waveform data Dw from the input data Din, which contains the note sequence data Dn and the feature sequence data Df. The variables are applied to that operation. The variables defining the second generative model M2 are set in advance by machine learning using a plurality of pieces of second training data and are stored in the storage device 12. Each piece of second training data includes input data Din and waveform data Dw (the correct value).

During machine learning, a provisional second generative model M2 outputs waveform data Dw for the input data Din of each piece of second training data. The variables of the second generative model M2 are updated iteratively so as to reduce the error between this output and the waveform data Dw of that piece of second training data. The second training data contains latent tendencies relating the pair of note sequence N and feature sequence F to the waveform W. In accordance with those tendencies, the trained second generative model M2 outputs waveform data Dw that is statistically valid for unseen input data Din.

The third editing unit 35 edits the waveform data Dw generated by the second generation unit 34. Specifically, the user gives an editing instruction Qw in the editing area Ew, and the third editing unit 35 edits the waveform data Dw in accordance with that instruction. The display control unit 20 displays the waveform W in the editing area Ew. The displayed waveform W is represented either by the waveform data Dw generated by the second generation unit 34 or by the waveform data Dw edited by the third editing unit 35.

When the user operates the operation image B1 (playback), the acoustic signal Z is supplied to the sound emitting device 13, and the synthesized sound is reproduced. The acoustic signal Z corresponds to the waveform data Dw generated by the second generation unit 34 or to the waveform data Dw edited by the third editing unit 35.

The information management unit 40 manages versions of each of the note sequence data Dn, the feature sequence data Df, and the waveform data Dw. Specifically, the information management unit 40 manages a note sequence version number Vn, a feature sequence version number Vf, and a waveform version number Vw.

The information management unit 40 also saves different versions of the note sequence data Dn, the feature sequence data Df, and the waveform data Dw in the storage device 12 (hereinafter "history data"). A history area and a work area are set in the storage device 12. The history area stores the history of edits to the conditions of the synthesized sound. The work area, by contrast, temporarily holds the note sequence data Dn, the feature sequence data Df, and the waveform data Dw while the user is editing on the editing screen G.

Specifically, each time the note sequence N is edited in response to an editing instruction Qn, the information management unit 40 saves the edited note sequence data Dn in the history area as first history data Hn[Vn, Vf, Vw]. That is, each new version of the note sequence data Dn is stored in the storage device 12 as first history data Hn[Vn, Vf, Vw].

The information management unit 40 likewise saves second history data Hf[Vn, Vf, Vw] in the history area as a new version of data. This second history data corresponds to the feature sequence data Df edited in response to an editing instruction Qf. In the first embodiment, the second history data Hf[Vn, Vf, Vw] indicates how the feature sequence data Df was edited in response to the editing instructions Qf, that is, the time series of editing instructions Qf. In other words, the second history data Hf[Vn, Vf, Vw] represents the difference in the feature sequence data Df before and after the edit.

Similarly, the information management unit 40 saves third history data Hw[Vn, Vf, Vw] in the history area as a new version of data. This third history data corresponds to the waveform data Dw edited in response to an editing instruction Qw. In the first embodiment, the third history data Hw[Vn, Vf, Vw] indicates how the waveform data Dw was edited in response to the editing instructions Qw, that is, the time series of editing instructions Qw. In other words, the third history data Hw[Vn, Vf, Vw] represents the difference in the waveform data Dw before and after the edit.
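Storing edit instructions rather than edited data means that a version can be rebuilt by replaying the recorded instructions on freshly generated data. The following is a minimal sketch of that idea. It assumes a hypothetical editing instruction that adds a pitch offset to a range of feature frames. `PitchOffsetEdit` and `replay` are illustrative names, and real instructions Qf could take other forms.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PitchOffsetEdit:
    """Hypothetical editing instruction Qf: add `offset` to frames [start, end)."""

    start: int
    end: int
    offset: float


def replay(base: List[float], edits: List[PitchOffsetEdit]) -> List[float]:
    """Rebuild edited feature data from regenerated base data plus stored edits.

    Edits are applied in the order they were recorded. The base list is not
    modified, so the same generated data can be reused for other versions.
    """
    out = list(base)
    for edit in edits:
        for i in range(max(edit.start, 0), min(edit.end, len(out))):
            out[i] += edit.offset
    return out
```

Only the small `PitchOffsetEdit` records would need to be kept in the history area. The full frame list is regenerated on demand.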

FIGS. 4 to 6 are flowcharts illustrating specific procedures of the editing process Sa (Sa1, Sa2, and Sa3). The editing process Sa edits the conditions of the synthesized sound in response to an editing instruction Q (Qn, Qf, or Qw) from the user. FIG. 4 is a flowchart of the first editing process Sa1, which relates to editing the note sequence N. The first editing process Sa1 is triggered by an editing instruction Qn for the note sequence N. When the first editing process Sa1 starts, the first editing unit 31 edits the current note sequence data Dn in accordance with the editing instruction Qn (Sa101).

The information management unit 40 increments the note sequence version number Vn by 1 (Sa102). When the editing instruction Qn is given for the first time, the note sequence data Dn is newly created (Sa101) and the note sequence version number Vn is initialized to 0 (Sa102). The information management unit 40 also initializes the feature sequence version number Vf to 0 (Sa103) and the waveform version number Vw to 0 (Sa104). The information management unit 40 then saves the note sequence data Dn edited by the first editing unit 31 in the history area of the storage device 12 as first history data Hn[Vn, Vf=0, Vw=0] of the note sequence N (Sa105).

As the above description shows, each time the note sequence data Dn is edited in response to an editing instruction Qn, three things happen:

- the edited version of the note sequence data Dn is saved in the history area as first history data Hn[Vn, Vf=0, Vw=0] (Sa105);
- the note sequence version number Vn is incremented (Sa102);
- the feature sequence version number Vf and the waveform version number Vw are initialized (Sa103 and Sa104).
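The version bookkeeping of steps Sa102 to Sa105 can be sketched as follows. As an assumption, the history area is modeled by a dict keyed by (Vn, Vf, Vw) tuples. The helper `save_note_edit` is hypothetical and does not come from the disclosure.

```python
from typing import Dict, List, Optional, Tuple

Versions = Tuple[int, int, int]  # (Vn, Vf, Vw)


def save_note_edit(
    history: Dict[Versions, object],
    current: Optional[Versions],
    note_data: List[str],
) -> Versions:
    """Hypothetical helper: record a note-sequence edit (Sa102 to Sa105).

    On the first edit (`current` is None) Vn starts at 0; afterwards Vn is
    incremented. Vf and Vw are always reset to 0, because earlier feature
    and waveform edits belong to the previous note sequence.
    """
    vn = 0 if current is None else current[0] + 1
    key = (vn, 0, 0)
    history[key] = list(note_data)  # Hn keeps a full copy of the note data
    return key
```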

The first generation unit 32 generates feature sequence data Df by supplying the note sequence data Dn edited by the first editing unit 31 to the first generative model M1 (Sa106). The feature sequence data Df generated by the first generation unit 32 is saved in the work area of the storage device 12. The second generation unit 34 then builds input data Din from two parts: the note sequence data Dn edited by the first editing unit 31, and the feature sequence data Df generated by the first generation unit 32. It generates waveform data Dw by supplying this input data Din to the second generative model M2 (Sa107). The waveform data Dw generated by the second generation unit 34 is saved in the work area of the storage device 12.

The note sequence data Dn requires one data item per note. The feature sequence data Df represents pitch changes within each note, so it consists of one sample every few milliseconds to a few tens of milliseconds. The waveform data Dw represents the waveform of each note, so it consists of one sample per sampling period (for example, 1/50 kHz, i.e. 20 μs). As these examples show, the feature sequence data Df created from one set of note sequence data Dn is several hundred to several thousand times larger than that note sequence data Dn. Likewise, the waveform data Dw generated from one set of feature sequence data Df is several hundred to several thousand times larger than that feature sequence data Df.

In view of this, the first embodiment saves the upper-layer data (note sequence data Dn) as is, as first history data Hn[Vn, Vf=0, Vw=0]. The lower-layer data (feature sequence data Df and waveform data Dw), on the other hand, is large as described above, so only its difference from the upper-layer data is saved as history data. Consider a configuration that also saves the lower-layer data itself. Compared with that configuration, this approach greatly reduces the amount of data stored in the storage device 12.
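A quick calculation makes these ratios concrete. The figures below are assumptions chosen for illustration, not values from the disclosure: a 60-second passage with 120 notes, a 5 ms feature frame period, and a 50 kHz sampling rate. The hypothetical helper `data_counts` counts items (notes, frames, samples). Actual byte sizes also depend on how many values each item carries.

```python
def data_counts(duration_ms: int, num_notes: int, frame_period_ms: int, sample_rate_hz: int) -> dict:
    """Hypothetical helper: item counts per layer for a passage of given length.

    Integer milliseconds are used to avoid floating-point rounding in the counts.
    """
    frames = duration_ms // frame_period_ms            # feature sequence samples (Df)
    samples = duration_ms * sample_rate_hz // 1000     # waveform samples (Dw)
    return {
        "notes": num_notes,
        "frames": frames,
        "samples": samples,
        "frames_per_note": frames / num_notes,
        "samples_per_frame": samples / frames,
    }


print(data_counts(60_000, 120, 5, 50_000))
# -> {'notes': 120, 'frames': 12000, 'samples': 3000000, 'frames_per_note': 100.0, 'samples_per_frame': 250.0}
```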

The display control unit 20 updates the editing screen G (Sa108 to Sa110). Specifically, the display control unit 20 displays, in the editing area En, the note sequence N represented by the note sequence data Dn edited by the first editing unit 31 (Sa108). It also displays, in the editing area Ef, the feature sequence F represented by the current feature sequence data Df saved in the work area (Sa109). Similarly, it displays, in the editing area Ew, the waveform W represented by the current waveform data Dw saved in the work area (Sa110).

FIG. 5 is a flowchart of the second editing process Sa2, which relates to editing the feature sequence F. The second editing process Sa2 is triggered by an editing instruction Qf for the feature sequence F. When the second editing process Sa2 starts, the second editing unit 33 edits the current feature sequence data Df in accordance with the editing instruction Qf (Sa201).

The information management unit 40 increments the feature sequence version number Vf by 1 (Sa202). It also keeps the note sequence version number Vn at its current value Cn (Sa203) and initializes the waveform version number Vw to 0 (Sa204). The information management unit 40 then saves second history data Hf[Vn, Vf, Vw=0], representing the current editing instruction Qf, in the history area as a new version of data (Sa205).

As the above description shows, each time the feature sequence data Df is edited in response to an editing instruction Qf, the following occurs:

- second history data Hf[Vn, Vf, Vw=0] corresponding to the edited feature sequence data Df is saved in the history area (Sa205);
- the note sequence version number Vn is kept unchanged (Sa203);
- the feature sequence version number Vf is incremented (Sa202);
- the waveform version number Vw is initialized (Sa204).

Step Sa203 may be omitted.
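The rule for a feature sequence edit differs from the note sequence case in two ways. Vn is kept, and only the edit instruction is stored, not the edited data. A minimal sketch follows, again modeling the history area as a dict keyed by (Vn, Vf, Vw). The helper name is hypothetical, and the instruction is treated as an opaque object (a string in the example).

```python
from typing import Dict, Tuple

Versions = Tuple[int, int, int]  # (Vn, Vf, Vw)


def save_feature_edit(
    history: Dict[Versions, object], current: Versions, edit_instruction: object
) -> Versions:
    """Hypothetical helper: record a feature-sequence edit (Sa202 to Sa205).

    Vn is kept, Vf is incremented, and Vw is reset to 0. Only the edit
    instruction (the difference), not the edited feature data, is stored.
    """
    vn, vf, _ = current
    key = (vn, vf + 1, 0)
    history[key] = edit_instruction
    return key
```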

The second generation unit 34 builds input data Din from the current note sequence data Dn and the feature sequence data Df edited by the second editing unit 33. It generates waveform data Dw by supplying this input data Din to the second generative model M2 (Sa206). The waveform data Dw generated by the second generation unit 34 is saved in the work area of the storage device 12.

The display control unit 20 updates the editing screen G (Sa207 and Sa208). Specifically, the display control unit 20 displays, in the editing area Ef, the feature sequence F represented by the feature sequence data Df edited by the second editing unit 33 (Sa207). It also displays, in the editing area Ew, the waveform W represented by the current waveform data Dw saved in the work area (Sa208). In the second editing process Sa2, the note sequence N in the editing area En is not updated.

FIG. 6 is a flowchart of the third editing process Sa3, which relates to editing the waveform W. The third editing process Sa3 is triggered by an editing instruction Qw for the waveform W. When the third editing process Sa3 starts, the third editing unit 35 edits the current waveform data Dw in accordance with the editing instruction Qw (Sa301).

The information management unit 40 increments the waveform version number Vw by 1 (Sa302). It also keeps the note sequence version number Vn at its current value Cn (Sa303) and the feature sequence version number Vf at its current value Cf (Sa304). The information management unit 40 then saves third history data Hw[Vn, Vf, Vw], representing the current editing instruction Qw, in the history area as a new version of data (Sa305).

As the above description shows, each time the waveform data Dw is edited in response to an editing instruction Qw, the following occurs:

- third history data Hw[Vn, Vf, Vw] corresponding to the edited waveform data Dw is saved in the history area (Sa305);
- the waveform version number Vw is incremented (Sa302);
- the note sequence version number Vn and the feature sequence version number Vf are kept unchanged (Sa303 and Sa304).

Steps Sa303 and Sa304 may be omitted.
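Successive waveform edits therefore produce a run of Vw values under a fixed (Vn, Vf) pair. The following sketch records several waveform edits in turn. It assumes the same dict-based history area used above for illustration. The helper name and the instruction strings are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Versions = Tuple[int, int, int]  # (Vn, Vf, Vw)


def save_waveform_edits(
    history: Dict[Versions, object], current: Versions, instructions: Iterable[object]
) -> List[Versions]:
    """Hypothetical helper: record successive waveform edits (Sa302 to Sa305).

    Vn and Vf stay fixed; Vw is incremented once per instruction.
    Returns the keys written, in order.
    """
    vn, vf, vw = current
    keys: List[Versions] = []
    for instruction in instructions:
        vw += 1
        history[(vn, vf, vw)] = instruction  # Hw stores the instruction only
        keys.append((vn, vf, vw))
    return keys
```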

The display control unit 20 displays, in the editing area Ew, the waveform W represented by the waveform data Dw edited by the third editing unit 35 (Sa306). In the third editing process Sa3, neither the note sequence N in the editing area En nor the feature sequence F in the editing area Ef is updated.

FIG. 7 is an explanatory diagram of the data structure in the history area of the storage device 12. The history area stores a plurality of pieces of first history data Hn[Vn, Vf=0, Vw=0] (note sequence data Dn), corresponding to different versions of the note sequence N. Under each piece of first history data Hn[Vn, Vf=0, Vw=0], the history area stores a plurality of pieces of second history data Hf[Vn, Vf, Vw=0]. These correspond to different versions of the feature sequence F under a common note sequence N. Under each piece of second history data Hf[Vn, Vf, Vw=0], the history area in turn stores a plurality of pieces of third history data Hw[Vn, Vf, Vw]. These correspond to different versions of the waveform W under a common feature sequence F.

As this shows, the data forms a hierarchy: the note sequence N sits above the feature sequence F, and the feature sequence F sits above the waveform W. When the feature sequence F is edited, the feature sequence version number Vf is incremented. The waveform version number Vw of the lower layer is initialized to 0, while the note sequence version number Vn of the upper layer is kept unchanged.
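The three-level hierarchy of FIG. 7 can be recovered from flat (Vn, Vf, Vw) keys. The following minimal sketch groups the keys into nested dicts, note version, then feature version, then a list of waveform versions. It assumes the key scheme used in the text. The function name `build_tree` is illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Versions = Tuple[int, int, int]  # (Vn, Vf, Vw)


def build_tree(keys: Iterable[Versions]) -> Dict[int, Dict[int, List[int]]]:
    """Hypothetical helper: group history keys as note -> feature -> waveform versions.

    Vf == 0 stands for the unedited feature sequence under a note version,
    and Vw == 0 entries only create the node without adding a waveform version.
    """
    tree: Dict[int, Dict[int, List[int]]] = {}
    for vn, vf, vw in sorted(keys):
        waveforms = tree.setdefault(vn, {}).setdefault(vf, [])
        if vw > 0:
            waveforms.append(vw)
    return tree
```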

FIGS. 8 to 10 are flowcharts illustrating specific procedures of the management process Sb (Sb1, Sb2, and Sb3), which manages versions in response to instructions from the user. FIG. 8 is a flowchart of the first management process Sb1, which relates to versions of the note sequence N. The first management process Sb1 is triggered by an instruction to change the note sequence version number Vn.

Hereinafter, the new value of the note sequence version number Vn, after a change made in response to a user instruction, is referred to as the "set value Xn". The set value Xn depends on how the user makes the change:

- When the user directly changes the note sequence version number Vn in the operation area Gn, the set value Xn is the changed value, that is, the value the user specified.
- When the user operates the operation image Gn1, the set value Xn is the value immediately before the current value Cn of the note sequence version number Vn (Cn - 1).
- When the user operates the operation image Gn2, the set value Xn is the value immediately after the current value Cn (Cn + 1).
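The three ways of choosing the set value Xn can be expressed as a small function. This is a sketch under stated assumptions. The action names "direct", "prev", and "next" are hypothetical labels for direct entry, image Gn1, and image Gn2. The clamping to the range [0, latest] is also an assumption, because the description does not say what happens at either end of the version range.

```python
from typing import Optional


def resolve_set_value(
    current: int, action: str, latest: int, typed: Optional[int] = None
) -> int:
    """Hypothetical helper: return the set value Xn for a user action.

    action: "direct" (value typed into Gn), "prev" (image Gn1), "next" (image Gn2).
    Clamping to [0, latest] is an assumption, not part of the description.
    """
    if action == "direct":
        if typed is None:
            raise ValueError("a typed value is required for a direct change")
        target = typed
    elif action == "prev":
        target = current - 1  # Cn - 1
    elif action == "next":
        target = current + 1  # Cn + 1
    else:
        raise ValueError(f"unknown action: {action!r}")
    return max(0, min(target, latest))
```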

When the first management process Sb1 starts, the information management unit 40 changes the note sequence version number Vn from the current value Cn to the set value Xn (Sb101).

The information management unit 40 sets the feature sequence version number Vf to the latest value Yf corresponding to the set value Xn of the note sequence N (Sb102). Each editing instruction Qf given under a version of the note sequence N produces a new version of the feature sequence F. The latest value Yf is the number of the most recent such version under the note sequence version corresponding to the set value Xn.

The information management unit 40 sets the waveform version number Vw to the latest value Yw corresponding to the set value Xn of the note sequence N (Sb103). Each editing instruction Qw given under a version of the note sequence N produces a new version of the waveform W. The latest value Yw is the number of the most recent such version under the note sequence version corresponding to the set value Xn.
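Given the stored history keys, both latest values can be found with simple maximum searches. This is a minimal sketch that assumes the (Vn, Vf, Vw) key scheme used in the text. The helper name is hypothetical. The sketch takes Yw among the waveform versions under (Xn, Yf), because step Sb104 uses Yw in that combination.

```python
from typing import Iterable, Tuple

Versions = Tuple[int, int, int]  # (Vn, Vf, Vw)


def latest_versions(keys: Iterable[Versions], xn: int) -> Tuple[int, int]:
    """Hypothetical helper: return (Yf, Yw) for note version Xn.

    Yf is the highest Vf recorded under Xn (0 if none); Yw is the highest
    Vw recorded under (Xn, Yf) (0 if none).
    """
    all_keys = list(keys)
    yf = max((vf for vn, vf, _ in all_keys if vn == xn), default=0)
    yw = max((vw for vn, vf, vw in all_keys if vn == xn and vf == yf), default=0)
    return yf, yw
```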

The information management unit 40 acquires the following data from the history area of the storage device 12 (Sb104):

- the first history data Hn[Vn=Xn, Vf=0, Vw=0] of the note sequence N;
- the second history data Hf[Vn=Xn, Vf=1, Vw=0] to Hf[Vn=Xn, Vf=Yf, Vw=0] of the feature sequence F;
- the third history data Hw[Vn=Xn, Vf=Yf, Vw=1] to Hw[Vn=Xn, Vf=Yf, Vw=Yw] of the waveform W.

The second history data is acquired only if the feature sequence F has actually been edited; it is not acquired if the feature sequence F has not been edited.

The first history data Hn[Vn=Xn, Vf=0, Vw=0] is the note sequence data Dn representing the version of the note sequence N whose version number Vn is the set value Xn.

The second history data Hf[Vn=Xn, Vf=1, Vw=0] to Hf[Vn=Xn, Vf=Yf, Vw=0] covers the editing instructions Qf that the user gave successively under the note sequence N whose version number Vn is the set value Xn. It represents the time series of the first through Yf-th of those instructions.

The third history data Hw[Vn=Xn, Vf=Yf, Vw=1] to Hw[Vn=Xn, Vf=Yf, Vw=Yw] covers the editing instructions Qw that the user gave successively under a specific pair of versions: the note sequence N whose version number Vn is the set value Xn, and the feature sequence F whose version number Vf is the latest value Yf. It represents the time series of the first through Yw-th of those instructions.

The first generation unit 32 generates feature sequence data Df by supplying the first history data Hn[Vn=Xn, Vf=0, Vw=0] (note sequence data Dn) acquired by the information management unit 40 to the first generative model M1 (Sb105). The information management unit 40 has also acquired one or more pieces of second history data Hf[Vn=Xn, Vf=1, Vw=0] to Hf[Vn=Xn, Vf=Yf, Vw=0]. The second editing unit 33 successively edits the feature sequence data Df in accordance with the editing instructions Qf that this second history data represents (Sb106). The result is feature sequence data Df that has been edited by the editing instructions Qf up to the Yf-th, under the note sequence N corresponding to the set value Xn.

The edits made by the second editing unit 33 usually touch only a small part of feature sequence data Df that spans many notes. Only a very small portion of the whole piece is typically edited, for example the attack portion of a particular note, or the first two notes of the third phrase of the piece.

The second generation unit 34 generates waveform data Dw by supplying the second generative model M2 with input data Din. The input data Din contains the first history data Hn[Vn=Xn, Vf=0, Vw=0] (note sequence data Dn) acquired by the information management unit 40 and the edited feature sequence data Df (Sb107). The third editing unit 35 then edits the waveform data Dw step by step, following the edit instructions Qw represented by the one or more pieces of third history data Hw[Vn=Xn, Vf=Yf, Vw=1] to Hw[Vn=Xn, Vf=Yf, Vw=Yw] acquired by the information management unit 40 (Sb108). The result is waveform data Dw for the note sequence N corresponding to the set value Xn and the feature sequence F corresponding to the latest value Yf, edited by the first through the Yw-th edit instructions Qw. If no second history data Hf[Vn=Xn, Vf=1, Vw=0] to Hf[Vn=Xn, Vf=Yf, Vw=0] exists, the third history data Hw[Vn=Xn, Vf=Yf, Vw=1] to Hw[Vn=Xn, Vf=Yf, Vw=Yw] is not acquired. In that case the waveform data Dw is not edited in step Sb108 and is fixed as the final data.

When the user instructs an edit that moves the waveform W along the time axis, only the edit instruction Qw itself, for example "move the section from time point 1 to time point 2 by X milliseconds", is saved as the third history data Hw[Vn=Xn, Vf=Yf, Vw=1] to Hw[Vn=Xn, Vf=Yf, Vw=Yw]. Saving the sample data of the moved waveform W over the entire piece would take far more space, so this approach greatly reduces the amount of data stored in the storage device 12. The same applies to volume edits and filter edits on the waveform W. For a volume edit, the transition of the volume change within the edited section is saved. For a filter edit, the filter parameters within the edited section are saved.
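
The storage saving described above comes from keeping the edit instruction rather than the edited samples. The following Python sketch is one possible reading, not the source's implementation. A hypothetical `ShiftEdit` record stores a time-shift instruction as three integers, and `replay_waveform_edits` reconstructs the edited waveform by reapplying the first `upto` instructions. Positions measured in samples, zero-filling of the vacated section, and dropping samples shifted out of range are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ShiftEdit:
    """Hypothetical stored form of a time-shift edit instruction Qw.

    Three integers replace a full copy of the edited samples. Positions are in
    samples here; converting from milliseconds via the sample rate is assumed.
    """
    start: int  # first sample of the section to move
    end: int    # one past the last sample of the section
    delta: int  # shift amount in samples (positive = later)


def apply_shift(wave: List[float], edit: ShiftEdit) -> List[float]:
    """Return a copy of `wave` with samples [start, end) moved by `delta`.

    Vacated samples become silence (0.0) and samples pushed outside the
    buffer are dropped; both choices are assumptions of this sketch.
    """
    out = list(wave)
    start, end = max(edit.start, 0), min(edit.end, len(wave))
    segment = wave[start:end]
    for i in range(start, end):
        out[i] = 0.0
    for offset, sample in enumerate(segment):
        j = start + edit.delta + offset
        if 0 <= j < len(out):
            out[j] = sample
    return out


def replay_waveform_edits(wave: List[float], edits: List[ShiftEdit], upto: int) -> List[float]:
    """Rebuild the edited waveform by applying the first `upto` stored instructions."""
    for edit in edits[:upto]:
        wave = apply_shift(wave, edit)
    return wave


base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
history = [ShiftEdit(1, 3, 2)]
print(replay_waveform_edits(base, history, upto=1))
# [1.0, 0.0, 0.0, 2.0, 3.0, 6.0]
```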

The display control unit 20 updates the editing screen G (Sb109 to Sb111). Specifically, the display control unit 20 displays in the editing area En the note sequence N represented by the first history data Hn[Vn=Xn, Vf=0, Vw=0] (note sequence data Dn) acquired by the information management unit 40. It also updates the note sequence version number Vn shown in the operation area Gn to the set value Xn (Sb109). As a result, the editing area En shows the note sequence N as edited by the Xn-th edit instruction Qn.

The display control unit 20 also displays in the editing area Ef the feature sequence F represented by the feature sequence data Df edited by the second editing unit 33, and updates the feature sequence version number Vf shown in the operation area Gf to the latest value Yf (Sb110). As a result, the editing area Ef shows the feature sequence F corresponding to the set value Xn and the latest value Yf. Similarly, the display control unit 20 displays in the editing area Ew the waveform W represented by the waveform data Dw edited by the third editing unit 35, and updates the waveform version number Vw shown in the operation area Gw to the latest value Yw (Sb111). As a result, the editing area Ew shows the waveform W corresponding to the set value Xn, the latest value Yf, and the latest value Yw. In this state, the user can issue edit instructions (Qn, Qf, or Qw) for each of the note sequence N, the feature sequence F, and the waveform W.

FIG. 9 is a flowchart of the second management process Sb2, which handles versions of the feature sequence F. The second management process Sb2 starts when the user instructs a change of the feature sequence version number Vf.

Hereinafter, the value that the feature sequence version number Vf takes after a change made at the user's instruction is called the "set value Xf". If the user directly changes the feature sequence version number Vf in the operation area Gf, the new value (that is, the value the user specified) is the set value Xf. If the user operates the operation image Gf1, the set value Xf is the value immediately before the current value Cf of the feature sequence version number Vf (Xf = Cf - 1). If the user operates the operation image Gf2, the set value Xf is the value immediately after the current value Cf (Xf = Cf + 1).
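
The mapping from a user action to the set value Xf is simple enough to state as a function. In this Python sketch the action labels `"direct"`, `"prev"` (operation image Gf1), and `"next"` (operation image Gf2) are hypothetical names. The source does not say how out-of-range values are handled (for example Cf - 1 falling below zero), so the sketch performs no bounds check.

```python
from typing import Optional


def resolve_set_value(current: int, action: str, typed: Optional[int] = None) -> int:
    """Return the set value Xf for a user action on the operation area Gf.

    current: the current value Cf of the feature sequence version number Vf.
    action:  "direct" (number typed into Gf), "prev" (Gf1) or "next" (Gf2);
             these labels are hypothetical.
    """
    if action == "direct":
        if typed is None:
            raise ValueError("direct entry needs the typed value")
        return typed
    if action == "prev":
        return current - 1  # Xf = Cf - 1
    if action == "next":
        return current + 1  # Xf = Cf + 1
    raise ValueError(f"unknown action: {action}")


print(resolve_set_value(4, "prev"), resolve_set_value(4, "next"), resolve_set_value(4, "direct", 1))
# 3 5 1
```

The same function applies to the waveform version number Vw in the third management process, with Gw1 and Gw2 in place of Gf1 and Gf2.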

When the second management process Sb2 starts, the information management unit 40 changes the feature sequence version number Vf from the current value Cf to the set value Xf (Sb201). The information management unit 40 also keeps the note sequence version number Vn at the current value Cn (Sb202) and changes the waveform version number Vw from the current value Cw to the latest value Yw (Sb203). Each edit instruction Qw issued under the version of the feature sequence F corresponding to the set value Xf produces a new version of the waveform W. The latest value Yw is the number of the newest of these versions.

The information management unit 40 acquires the following from the history area of the storage device 12 (Sb204): the first history data Hn[Vn=Cn, Vf=0, Vw=0] of the note sequence N, the second history data Hf[Vn=Cn, Vf=1, Vw=0] to Hf[Vn=Cn, Vf=Xf, Vw=0] of the feature sequence F, and the third history data Hw[Vn=Cn, Vf=Xf, Vw=1] to Hw[Vn=Cn, Vf=Xf, Vw=Yw] of the waveform W. The first history data Hn[Vn=Cn, Vf=0, Vw=0] is the note sequence data Dn representing the current version of the note sequence N. The second history data Hf[Vn=Cn, Vf=1, Vw=0] to Hf[Vn=Cn, Vf=Xf, Vw=0] represents a time series of edit instructions Qf. The user issued one or more edit instructions Qf in sequence under the current version of the note sequence N, and this series covers them up to and including the Xf-th. The third history data Hw[Vn=Cn, Vf=Xf, Vw=1] to Hw[Vn=Cn, Vf=Xf, Vw=Yw] represents a time series of edit instructions Qw. These are the instructions the user issued in sequence under the version of the note sequence N whose note sequence version number Vn is the current value Cn and the version of the feature sequence F whose feature sequence version number Vf is the set value Xf, up to and including the Yw-th.
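
Steps Sb201 to Sb204 can be summarized as a choice of history keys. The sketch below, in Python, assumes the history area is addressed by the `(Vn, Vf, Vw)` triple used in the notation Hn/Hf/Hw[Vn, Vf, Vw]. The function name `sb204_keys` is hypothetical, and the latest value Yw is passed in rather than looked up.

```python
from typing import List, Tuple

Key = Tuple[int, int, int]  # (Vn, Vf, Vw), mirroring Hn/Hf/Hw[Vn, Vf, Vw]


def sb204_keys(cn: int, xf: int, yw: int) -> Tuple[List[Key], List[Key], List[Key]]:
    """History keys read in step Sb204 after Vf changes to the set value Xf.

    cn: current note sequence version Cn (kept as-is in Sb202)
    xf: set value Xf for the feature sequence version (Sb201)
    yw: latest waveform version Yw under (Cn, Xf) (Sb203)
    """
    first = [(cn, 0, 0)]                               # Hn: note sequence data Dn
    second = [(cn, vf, 0) for vf in range(1, xf + 1)]  # Hf: edits Qf 1..Xf
    third = [(cn, xf, vw) for vw in range(1, yw + 1)]  # Hw: edits Qw 1..Yw
    return first, second, third


print(sb204_keys(cn=2, xf=3, yw=1))
# ([(2, 0, 0)], [(2, 1, 0), (2, 2, 0), (2, 3, 0)], [(2, 3, 1)])
```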

The first generation unit 32 generates feature sequence data Df by supplying the first history data Hn[Vn=Cn, Vf=0, Vw=0] (note sequence data Dn) acquired by the information management unit 40 to the first generative model M1 (Sb205). The second editing unit 33 then edits this feature sequence data Df step by step, following the edit instructions Qf represented by the one or more pieces of second history data Hf[Vn=Cn, Vf=1, Vw=0] to Hf[Vn=Cn, Vf=Xf, Vw=0] acquired by the information management unit 40 (Sb206). The result is feature sequence data Df for the note sequence N corresponding to the current value Cn, edited by the first through the Xf-th edit instructions Qf.

The second generation unit 34 generates waveform data Dw by supplying the second generative model M2 with input data Din. The input data Din contains the first history data Hn[Vn=Cn, Vf=0, Vw=0] (note sequence data Dn) acquired by the information management unit 40 and the edited feature sequence data Df (Sb207). The third editing unit 35 then edits the waveform data Dw step by step, following the edit instructions Qw represented by the one or more pieces of third history data Hw[Vn=Cn, Vf=Xf, Vw=1] to Hw[Vn=Cn, Vf=Xf, Vw=Yw] acquired by the information management unit 40 (Sb208). The result is waveform data Dw for the note sequence N corresponding to the current value Cn and the feature sequence F corresponding to the set value Xf, edited by the first through the Yw-th edit instructions Qw.

The display control unit 20 updates the editing screen G (Sb209 and Sb210). Specifically, the display control unit 20 displays in the editing area Ef the feature sequence F represented by the feature sequence data Df edited by the second editing unit 33, and updates the feature sequence version number Vf shown in the operation area Gf to the set value Xf (Sb209). As a result, the editing area Ef shows the feature sequence F corresponding to the current value Cn and the set value Xf. The display control unit 20 also displays in the editing area Ew the waveform W represented by the waveform data Dw edited by the third editing unit 35, and updates the waveform version number Vw shown in the operation area Gw to the latest value Yw (Sb210). As a result, the editing area Ew shows the waveform W corresponding to the current value Cn, the set value Xf, and the latest value Yw. In this state, the user can issue edit instructions (Qn, Qf, or Qw) for each of the note sequence N, the feature sequence F, and the waveform W.

FIG. 10 is a flowchart of the third management process Sb3, which handles versions of the waveform W. The third management process Sb3 starts when the user instructs a change of the waveform version number Vw.

Hereinafter, the value that the waveform version number Vw takes after a change made at the user's instruction is called the "set value Xw". If the user directly changes the waveform version number Vw in the operation area Gw, the new value (that is, the value the user specified) is the set value Xw. If the user operates the operation image Gw1, the set value Xw is the value immediately before the current value Cw of the waveform version number Vw (Xw = Cw - 1). If the user operates the operation image Gw2, the set value Xw is the value immediately after the current value Cw (Xw = Cw + 1).

When the third management process Sb3 starts, the information management unit 40 changes the waveform version number Vw from the current value Cw to the set value Xw (Sb301). The information management unit 40 also keeps the note sequence version number Vn at the current value Cn (Sb302) and the feature sequence version number Vf at the current value Cf (Sb303).
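
The three management processes differ only in which version numbers are kept and which jump to a latest value. The Python sketch below contrasts them. It assumes, hypothetically, that the history area is a dictionary keyed by `(Vn, Vf, Vw)`, where Hf entries have Vw = 0 and Hw entries have Vw of 1 or more. The `which` labels and helper names are illustrative only. The Sb1 branch follows the earlier statement that third history data is not acquired when no second history data exists.

```python
from typing import Dict, Tuple

Key = Tuple[int, int, int]  # (Vn, Vf, Vw); Hf entries have Vw == 0, Hw entries Vw >= 1


def latest_vf(history: Dict[Key, str], vn: int) -> int:
    """Newest feature sequence version saved under note version vn (0 if none)."""
    return max((f for (n, f, w) in history if n == vn and f > 0 and w == 0), default=0)


def latest_vw(history: Dict[Key, str], vn: int, vf: int) -> int:
    """Newest waveform version saved under (vn, vf) (0 if none)."""
    return max((w for (n, f, w) in history if n == vn and f == vf and w > 0), default=0)


def switch_version(history: Dict[Key, str], current: Key, which: str, x: int) -> Key:
    """Return the new (Vn, Vf, Vw) after the user sets one version number to x."""
    cn, cf, _ = current
    if which == "note":      # Sb1: Vn -> Xn, Vf -> latest Yf, Vw -> latest Yw
        yf = latest_vf(history, x)
        yw = latest_vw(history, x, yf) if yf > 0 else 0  # no Hf means no Hw is fetched
        return (x, yf, yw)
    if which == "feature":   # Sb2: Vn kept at Cn, Vf -> Xf, Vw -> latest Yw
        return (cn, x, latest_vw(history, cn, x))
    if which == "waveform":  # Sb3: Vn and Vf kept, Vw -> Xw
        return (cn, cf, x)
    raise ValueError(f"unknown version kind: {which}")


history = {(1, 0, 0): "Dn v1", (2, 0, 0): "Dn v2", (2, 1, 0): "Qf1", (2, 2, 0): "Qf2",
           (2, 1, 1): "Qw1", (2, 2, 1): "Qw1", (2, 2, 2): "Qw2"}
print(switch_version(history, (1, 0, 0), "note", 2))      # (2, 2, 2)
print(switch_version(history, (2, 2, 2), "feature", 1))   # (2, 1, 1)
print(switch_version(history, (2, 2, 2), "waveform", 1))  # (2, 2, 1)
```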

The information management unit 40 acquires the following from the history area of the storage device 12 (Sb304): the first history data Hn[Vn=Cn, Vf=0, Vw=0] of the note sequence N, the second history data Hf[Vn=Cn, Vf=1, Vw=0] to Hf[Vn=Cn, Vf=Cf, Vw=0] of the feature sequence F, and the third history data Hw[Vn=Cn, Vf=Cf, Vw=1] to Hw[Vn=Cn, Vf=Cf, Vw=Xw] of the waveform W. The first history data Hn[Vn=Cn, Vf=0, Vw=0] is the note sequence data Dn representing the current version of the note sequence N. The second history data Hf[Vn=Cn, Vf=1, Vw=0] to Hf[Vn=Cn, Vf=Cf, Vw=0] represents a time series of edit instructions Qf. The user issued one or more edit instructions Qf in sequence under the note sequence N whose note sequence version number Vn is the current value Cn, and this series covers them up to and including the Cf-th. The third history data Hw[Vn=Cn, Vf=Cf, Vw=1] to Hw[Vn=Cn, Vf=Cf, Vw=Xw] represents a time series of edit instructions Qw. These are the instructions the user issued in sequence under the current version of the note sequence N and the current version of the feature sequence F, up to and including the Xw-th.

The first generation unit 32 generates feature sequence data Df by supplying the first history data Hn[Vn=Cn, Vf=0, Vw=0] (note sequence data Dn) acquired by the information management unit 40 to the first generative model M1 (Sb305). The second editing unit 33 then edits this feature sequence data Df step by step, following the edit instructions Qf represented by the one or more pieces of second history data Hf[Vn=Cn, Vf=1, Vw=0] to Hf[Vn=Cn, Vf=Cf, Vw=0] acquired by the information management unit 40 (Sb306). The result is feature sequence data Df for the note sequence N corresponding to the current value Cn, edited by the first through the Cf-th edit instructions Qf.

The second generation unit 34 generates waveform data Dw by supplying the second generative model M2 with input data Din. The input data Din contains the first history data Hn[Vn=Cn, Vf=0, Vw=0] (note sequence data Dn) acquired by the information management unit 40 and the edited feature sequence data Df (Sb307). The third editing unit 35 then edits the waveform data Dw step by step, following the edit instructions Qw represented by the one or more pieces of third history data Hw[Vn=Cn, Vf=Cf, Vw=1] to Hw[Vn=Cn, Vf=Cf, Vw=Xw] acquired by the information management unit 40 (Sb308). The result is waveform data Dw for the note sequence N corresponding to the current value Cn and the feature sequence F corresponding to the current value Cf, edited by the first through the Xw-th edit instructions Qw.

The display control unit 20 updates the editing screen G (Sb309). Specifically, the display control unit 20 displays in the editing area Ew the waveform W represented by the waveform data Dw edited by the third editing unit 35, and updates the waveform version number Vw shown in the operation area Gw to the set value Xw. As a result, the editing area Ew shows the waveform W corresponding to the current value Cn, the current value Cf, and the set value Xw.

As described above, the first embodiment edits both the note sequence data Dn and the feature sequence data Df according to user instructions (edit instructions Qn and Qf). A configuration that edits only the note sequence data Dn according to user instructions cannot follow those instructions as closely. By contrast, the first embodiment can generate waveform data Dw that reflects the user's instructions precisely.

Furthermore, when the note sequence data Dn is edited, the note sequence version number Vn is incremented and the feature sequence version number Vf is initialized. When the feature sequence data Df is edited, the feature sequence version number Vf is incremented and the note sequence version number Vn stays unchanged. The waveform data Dw is then generated from the first history data Hn[Vn, Vf, Vw], the second history data Hf[Vn, Vf, Vw], or both. The first history data used is the one for the set value Xn that the user chose among the values of the note sequence version number Vn. The second history data used is the one for the set value Xf that the user chose among the values of the feature sequence version number Vf. The user can therefore try different combinations of the note sequence version number Vn and the feature sequence version number Vf, generating waveform data Dw for each by trial and error, while continuing to edit the note sequence data Dn and the feature sequence data Df.
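
The numbering rule can be sketched as a small state object. This Python sketch is an illustration with assumptions. The text above states that editing Dn increments Vn and initializes Vf, and that editing Df increments Vf with Vn unchanged. Resetting Vw whenever an upstream number changes, and incrementing Vw on a waveform edit, are inferred from the key patterns Hn[Vn, 0, 0], Hf[Vn, Vf, 0], and Hw[Vn, Vf, 1..Yw], not stated here. The sketch does not cover how a new edit is numbered after the user rolls back to an older version.

```python
from dataclasses import dataclass


@dataclass
class VersionState:
    """Current version numbers; field names are illustrative."""
    vn: int = 0  # note sequence version number Vn
    vf: int = 0  # feature sequence version number Vf
    vw: int = 0  # waveform version number Vw

    def on_note_edit(self) -> None:
        # Stated in the text: Vn increases and Vf is initialized.
        self.vn += 1
        self.vf = 0
        self.vw = 0  # assumption inferred from the Hn[Vn, 0, 0] key pattern

    def on_feature_edit(self) -> None:
        # Stated in the text: Vf increases while Vn is kept.
        self.vf += 1
        self.vw = 0  # assumption inferred from the Hf[Vn, Vf, 0] key pattern

    def on_waveform_edit(self) -> None:
        self.vw += 1  # assumption inferred from Hw[Vn, Vf, 1..Yw]


state = VersionState()
state.on_note_edit()
state.on_feature_edit()
state.on_feature_edit()
state.on_waveform_edit()
print(state)
# VersionState(vn=1, vf=2, vw=1)
```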

B: Second Embodiment
This section describes the second embodiment. In each embodiment illustrated below, elements that function as in the first embodiment keep the reference signs used in the description of the first embodiment, and their detailed descriptions are omitted where appropriate.

FIG. 11 is a schematic diagram of the editing screen G in the second embodiment. The editing screen G of the second embodiment contains the same elements as in the first embodiment plus an operation image B2. The operation image B2 is an image (specifically, a pull-down menu) that lets the user select the pronunciation style of the synthesized sound. By operating the operation device 15, the user can select the desired pronunciation style from among a plurality of pronunciation styles.

A pronunciation style is a characteristic of the manner in which a sound is produced. For example, when the synthesized sound is an instrument sound, the pronunciation style is a characteristic of how the instrument is played. When the synthesized sound is a singing voice, the pronunciation style is a characteristic of how the piece is sung (the singer's phrasing). Examples of pronunciation styles include the manner of production suited to a musical genre, such as pop, rock, or rap. Another example is the musical expression of a performance or song, such as bright, quiet, or intense.

FIG. 12 is a block diagram illustrating the functional configuration of the control device 11 in the second embodiment. In the second embodiment, the first generation unit 32 and the second generation unit 34 receive the pronunciation style s that the user selected through the operation image B2.

The first generation unit 32 generates feature sequence data Df from the note sequence data Dn and the pronunciation style s. The feature sequence data Df is time-series data that tracks a feature amount (for example, the fundamental frequency) of a synthesized sound over time. The synthesized sound is the one produced by pronouncing, in the pronunciation style s, the note sequence N represented by the note sequence data Dn.

Specifically, the first generation unit 32 uses the first generative model M1 to generate the feature sequence data Df. The first generative model M1 is a statistical estimation model that takes the note sequence data Dn and the pronunciation style s as input and outputs the feature sequence data Df. As in the first embodiment, the first generative model M1 is a deep neural network of any architecture, such as a convolutional neural network or a recurrent neural network. Concretely, the first generative model M1 consists of two parts. One is a program that makes the control device 11 compute the feature sequence data Df from the note sequence data Dn and the pronunciation style s. The other is a set of variables applied in that computation.
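
The source does not specify how the pronunciation style s is presented to the first generative model M1. One common conditioning technique is to append a one-hot style code to every input frame. The Python sketch below shows only that input construction. The style vocabulary and the per-frame note encoding are both hypothetical.

```python
from typing import List, Sequence

STYLES: Sequence[str] = ("pop", "rock", "rap")  # hypothetical style vocabulary


def one_hot(style: str, vocabulary: Sequence[str] = STYLES) -> List[float]:
    """Encode a pronunciation style s as a one-hot vector."""
    code = [0.0] * len(vocabulary)
    code[vocabulary.index(style)] = 1.0  # raises ValueError for an unknown style
    return code


def conditioned_input(note_frames: List[List[float]], style: str) -> List[List[float]]:
    """Append the same style code to every frame of the (assumed) Dn encoding."""
    code = one_hot(style)
    return [frame + code for frame in note_frames]


frames = [[60.0, 1.0], [62.0, 0.0]]  # e.g. (pitch, note-onset flag); an assumption
print(conditioned_input(frames, "rock"))
# [[60.0, 1.0, 0.0, 1.0, 0.0], [62.0, 0.0, 0.0, 1.0, 0.0]]
```

A shared code per frame keeps the model input shape fixed regardless of which style is selected. The same pattern could extend to the second generative model M2 by appending the code to frames that also carry Df.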

The variables that define the first generative model M1 are set in advance by machine learning on a plurality of pieces of first training data and are stored in the storage device 12. Each piece of first training data pairs a combination of note sequence data Dn and pronunciation style s with feature sequence data Df (the ground truth). The machine learning of the first generative model M1 feeds each piece's note sequence data Dn and pronunciation style s to the provisional model and compares the resulting feature sequence data Df with the piece's ground-truth Df. The model's variables are updated iteratively to reduce this error. The trained model therefore outputs statistically plausible feature sequence data Df even for combinations of note sequence data Dn and pronunciation style s that it has not seen, following the tendencies latent in the first training data.
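
The training described above is ordinary error minimization by iterative parameter updates. As a toy illustration only, the Python sketch below fits a linear model by gradient descent on the mean squared error. The real first generative model M1 is a deep neural network, and the three-sample data set is synthetic, chosen so that an exact fit exists.

```python
from typing import List, Tuple

Sample = Tuple[List[float], float]  # (input vector, target value)


def predict(w: List[float], x: List[float]) -> float:
    # The last entry of w is the bias; zip stops at len(x).
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def mse(w: List[float], data: List[Sample]) -> float:
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(data: List[Sample], lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Iteratively update the variables w to reduce the mean squared error."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / len(data)
            grad[-1] += 2.0 * err / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Synthetic data whose targets follow y = x0 - x1 exactly.
data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 0.0)]
w = train(data)
print(round(mse([0.0, 0.0, 0.0], data), 3), "->", mse(w, data) < 1e-3)
# 0.667 -> True
```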

The second generation unit 34 generates waveform data Dw from the note sequence data Dn, the feature sequence data Df, and the pronunciation style s. The waveform data Dw is time-series data representing the waveform of the synthesized sound produced by pronouncing, in the pronunciation style s, the note sequence N represented by the note sequence data Dn.

Specifically, the second generation unit 34 uses the second generative model M2 to generate the waveform data Dw. The second generative model M2 is a statistical estimation model that takes the note sequence data Dn, the feature sequence data Df, and the pronunciation style s as input and outputs the waveform data Dw. As in the first embodiment, the second generative model M2 is a deep neural network of any architecture, such as a convolutional neural network or a recurrent neural network. Concretely, the second generative model M2 consists of two parts. One is a program that makes the control device 11 compute the waveform data Dw from the note sequence data Dn, the feature sequence data Df, and the pronunciation style s. The other is a set of variables applied in that computation.

The variables that define the second generative model M2 are set in advance by machine learning on a plurality of pieces of second training data and are stored in the storage device 12. Each piece of second training data pairs a combination of note sequence data Dn, feature sequence data Df, and pronunciation style s with waveform data Dw (the ground truth). The machine learning of the second generative model M2 feeds each piece's note sequence data Dn, feature sequence data Df, and pronunciation style s to the provisional model and compares the resulting waveform data Dw with the piece's ground-truth Dw. The model's variables are updated iteratively to reduce this error. The trained model therefore outputs statistically plausible waveform data Dw even for combinations of note sequence data Dn, feature sequence data Df, and pronunciation style s that it has not seen, following the tendencies latent in the second training data.

In step Sa201 of the second editing process Sa2, the second editing unit 33 edits feature sequence data Df according to the user's edit instruction Qf. This feature sequence data Df represents the feature sequence F of the synthesized sound produced by pronouncing the note sequence N in the pronunciation style s selected by the user. In step Sa205 of the second editing process Sa2, the information management unit 40 saves the second history data Hf[Vn, Vf, Vw] corresponding to the edited feature sequence data Df in the history area of the storage device 12, one entry per version of the feature sequence data Df.

As the above description shows, for a given note sequence N, the feature sequence data Df and the waveform data Dw are both generated according to the pronunciation style s. The note sequence N itself, on the other hand, does not depend on the pronunciation style s. Consequently, as illustrated in FIG. 13, the history area of the storage device 12 holds one set of first history data Hn[Vn, Vf, Vw] (note sequence data Dn) for one note sequence N. For each pronunciation style s, it also holds a plurality of pieces of second history data Hf[Vn, Vf, Vw] for different feature sequences F and a plurality of pieces of third history data Hw[Vn, Vf, Vw] for different waveforms W.
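
One way to picture the history layout of FIG. 13 is a single note sequence chain shared by all styles plus separate feature and waveform chains for each style. The Python class below is a hypothetical sketch of that layout. The class and method names, and the use of string placeholders for the stored data, are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, int, int]  # (Vn, Vf, Vw)


class StyledHistory:
    """Hypothetical history area: Hn shared by all styles, Hf/Hw kept per style s."""

    def __init__(self) -> None:
        self.notes: Dict[Key, str] = {}                         # Hn[Vn, 0, 0]
        self.by_style: Dict[str, Dict[Key, str]] = defaultdict(dict)

    def save_notes(self, vn: int, dn: str) -> None:
        self.notes[(vn, 0, 0)] = dn

    def save_edit(self, style: str, key: Key, edit: str) -> None:
        # Hf entries use Vw = 0; Hw entries use Vw >= 1.
        self.by_style[style][key] = edit

    def feature_edits(self, style: str, vn: int, upto_vf: int) -> List[str]:
        """Edit instructions Qf 1..upto_vf recorded for this style under Vn."""
        chain = self.by_style.get(style, {})
        return [chain[(vn, vf, 0)] for vf in range(1, upto_vf + 1) if (vn, vf, 0) in chain]


h = StyledHistory()
h.save_notes(1, "Dn v1")
h.save_edit("pop", (1, 1, 0), "pop Qf1")
h.save_edit("pop", (1, 2, 0), "pop Qf2")
h.save_edit("rap", (1, 1, 0), "rap Qf1")
print(h.feature_edits("pop", 1, 2), h.feature_edits("rap", 1, 2))
# ['pop Qf1', 'pop Qf2'] ['rap Qf1']
```

Switching the selected style then amounts to reading a different per-style chain while the note sequence history stays the same.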

Next, a specific example of the operation of the second embodiment is described. In the first editing process Sa1, the first processing unit generates feature sequence data Df representing the feature sequence F of a synthesized sound that pronounces the note sequence N in the pronunciation style s (Sa106). The second processing unit generates waveform data Dw representing the waveform W of that synthesized sound (Sa107).

In the second editing process Sa2, the second editing unit 33 edits the feature sequence data Df for the pronunciation style s according to the user's edit instruction Qf. Each time the feature sequence data Df is edited (that is, for each version of the feature sequence data Df), the information management unit 40 saves in the history area the second history data Hf[Vn, Vf, Vw] corresponding to the edited feature sequence data Df.

Similarly, in the third editing process Sa3, the third editing unit 35 edits the waveform data Dw for the pronunciation style s according to the user's edit instruction Qw. Each time the waveform data Dw is edited (that is, for each version of the waveform data Dw), the information management unit 40 saves in the history area the third history data Hw[Vn, Vf, Vw] corresponding to the edited waveform data Dw.

In the second embodiment, while a pronunciation style s is selected, the first management process Sb1 is started in response to an instruction to change the note sequence version number Vn. In step Sb104 of the first management process Sb1, the information management unit 40 acquires from the history area the first history data Hn[Vn=Xn,Vf=0,Vw=0] of the note sequence N, the second history data Hf[Vn=Xn,Vf=1,Vw=0] to Hf[Vn=Xn,Vf=Yf,Vw=0] of the feature sequence F corresponding to the pronunciation style s, and the third history data Hw[Vn=Xn,Vf=Yf,Vw=1] to Hw[Vn=Xn,Vf=Yf,Vw=Yw] of the waveform W corresponding to the pronunciation style s. In steps Sb105 to Sb108 of the first management process Sb1, the feature sequence data Df of the feature sequence F corresponding to the pronunciation style s and the waveform data Dw of the waveform W corresponding to the pronunciation style s are generated. Specifically, the "feature sequence F corresponding to the pronunciation style s" is the feature sequence F corresponding to the note sequence version number Vn (set value Xn), the pronunciation style s, and the feature sequence version number Vf (latest value Yf). Likewise, the "waveform W corresponding to the pronunciation style s" is the waveform W corresponding to the note sequence version number Vn (set value Xn), the pronunciation style s, the feature sequence version number Vf (latest value Yf), and the waveform version number Vw (latest value Yw).

In the second embodiment, while a pronunciation style s is selected, the second management process Sb2 is started in response to an instruction to change the feature sequence version number Vf. In step Sb204 of the second management process Sb2, the information management unit 40 acquires from the history area the first history data Hn[Vn=Cn,Vf=0,Vw=0] of the note sequence N, the second history data Hf[Vn=Cn,Vf=1,Vw=0] to Hf[Vn=Cn,Vf=Xf,Vw=0] of the feature sequence F corresponding to the pronunciation style s, and the third history data Hw[Vn=Cn,Vf=Xf,Vw=1] to Hw[Vn=Cn,Vf=Xf,Vw=Yw] of the waveform W corresponding to the pronunciation style s. In steps Sb205 to Sb208 of the second management process Sb2, the feature sequence data Df of the feature sequence F corresponding to the pronunciation style s and the waveform data Dw of the waveform W corresponding to the pronunciation style s are generated. Specifically, the "feature sequence F corresponding to the pronunciation style s" is the feature sequence F corresponding to the note sequence version number Vn (current value Cn), the pronunciation style s, and the feature sequence version number Vf (set value Xf). Likewise, the "waveform W corresponding to the pronunciation style s" is the waveform W corresponding to the note sequence version number Vn (current value Cn), the pronunciation style s, the feature sequence version number Vf (set value Xf), and the waveform version number Vw (latest value Yw).

In the second embodiment, while a pronunciation style s is selected, the third management process Sb3 is started in response to an instruction to change the waveform version number Vw. In step Sb304 of the third management process Sb3, the information management unit 40 acquires from the history area the first history data Hn[Vn=Cn,Vf=0,Vw=0] of the note sequence N, the second history data Hf[Vn=Cn,Vf=1,Vw=0] to Hf[Vn=Cn,Vf=Cf,Vw=0] of the feature sequence F corresponding to the pronunciation style s, and the third history data Hw[Vn=Cn,Vf=Cf,Vw=1] to Hw[Vn=Cn,Vf=Cf,Vw=Xw] of the waveform W corresponding to the pronunciation style s. In steps Sb305 to Sb308 of the third management process Sb3, the feature sequence data Df of the feature sequence F corresponding to the pronunciation style s and the waveform data Dw of the waveform W corresponding to the pronunciation style s are generated. Specifically, the "feature sequence F corresponding to the pronunciation style s" is the feature sequence F corresponding to the note sequence version number Vn (current value Cn), the pronunciation style s, and the feature sequence version number Vf (current value Cf). Likewise, the "waveform W corresponding to the pronunciation style s" is the waveform W corresponding to the note sequence version number Vn (current value Cn), the pronunciation style s, the feature sequence version number Vf (current value Cf), and the waveform version number Vw (set value Xw).
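The three management processes differ only in which version numbers come from the user's setting, which are kept at their current values, and which default to the latest saved version. The following Python sketch summarizes that rule; the function name `target_versions` and the `latest_vf`/`latest_vw` callbacks are hypothetical stand-ins for queries against the history area.

```python
from typing import Callable, Tuple


def target_versions(changed: str, value: int, cn: int, cf: int,
                    latest_vf: Callable[[int], int],
                    latest_vw: Callable[[int, int], int]) -> Tuple[int, int, int]:
    """Return the (Vn, Vf, Vw) to restore after the user changes one version number.

    changed: which number the user changed ("Vn", "Vf" or "Vw").
    value:   the set value (Xn, Xf or Xw) entered by the user.
    cn, cf:  the current values Cn and Cf.
    """
    if changed == "Vn":   # Sb1: set value Xn, latest Yf, latest Yw
        vf = latest_vf(value)
        return value, vf, latest_vw(value, vf)
    if changed == "Vf":   # Sb2: current Cn, set value Xf, latest Yw
        return cn, value, latest_vw(cn, value)
    if changed == "Vw":   # Sb3: current Cn, current Cf, set value Xw
        return cn, cf, value
    raise ValueError(f"unknown version number: {changed}")
```

For example, with current values Cn = 1 and Cf = 3, setting Vw to 4 yields (1, 3, 4), while setting Vn to 2 follows the latest feature and waveform versions saved under Vn = 2.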

Here, consider two pronunciation styles s1 and s2 that the user can select from among the plurality of pronunciation styles s. The pronunciation styles s1 and s2 differ from each other. The pronunciation style s1 is an example of the "first pronunciation style", and the pronunciation style s2 is an example of the "second pronunciation style".

First, suppose that the pronunciation style s1 is selected. In the second editing process Sa2, the second editing unit 33 edits the feature sequence data Df corresponding to the pronunciation style s1 in accordance with an editing instruction Qf from the user, and for each edit of the feature sequence data Df, the information management unit 40 saves second history data Hf[Vn,Vf,Vw] corresponding to the edited feature sequence data Df in the history area. Similarly, in the third editing process Sa3, the third editing unit 35 edits the waveform data Dw corresponding to the pronunciation style s1 in accordance with an editing instruction Qw from the user, and for each edit of the waveform data Dw, the information management unit 40 saves third history data Hw[Vn,Vf,Vw] corresponding to the edited waveform data Dw in the history area. The feature sequence data Df or waveform data Dw generated while the pronunciation style s1 is selected is an example of "first time-series data", and the editing instruction Qf or Qw given by the user while the pronunciation style s1 is selected is an example of a "first instruction".

When the pronunciation style s1 is selected, the history data corresponding to the pronunciation style s1 are acquired in step Sb104 of the first management process Sb1, step Sb204 of the second management process Sb2, and step Sb304 of the third management process Sb3, and the feature sequence data Df of the feature sequence F and the waveform data Dw of the waveform W corresponding to the pronunciation style s1 are generated. That is, feature sequence data Df and waveform data Dw are generated that correspond to the history data H specified by the user's instruction (Xn, Xf, or Xw) from among the plurality of history data H (Hn, Hf, and Hw) corresponding to the pronunciation style s1.

Next, suppose that the pronunciation style s2 is selected. In the second editing process Sa2, the second editing unit 33 edits the feature sequence data Df corresponding to the pronunciation style s2 in accordance with an editing instruction Qf from the user, and for each edit of the feature sequence data Df, the information management unit 40 saves second history data Hf[Vn,Vf,Vw] corresponding to the edited feature sequence data Df in the history area. Similarly, in the third editing process Sa3, the third editing unit 35 edits the waveform data Dw corresponding to the pronunciation style s2 in accordance with an editing instruction Qw from the user, and for each edit of the waveform data Dw, the information management unit 40 saves third history data Hw[Vn,Vf,Vw] corresponding to the edited waveform data Dw in the history area. The feature sequence data Df or waveform data Dw generated while the pronunciation style s2 is selected is an example of "second time-series data", and the editing instruction Qf or Qw given by the user while the pronunciation style s2 is selected is an example of a "second instruction".

When the pronunciation style s2 is selected, the history data corresponding to the pronunciation style s2 are acquired in step Sb104 of the first management process Sb1, step Sb204 of the second management process Sb2, and step Sb304 of the third management process Sb3, and the feature sequence data Df of the feature sequence F and the waveform data Dw of the waveform W corresponding to the pronunciation style s2 are generated. That is, feature sequence data Df and waveform data Dw are generated that correspond to the history data H specified by the user's instruction (Xn, Xf, or Xw) from among the plurality of history data H (Hn, Hf, and Hw) corresponding to the pronunciation style s2.

As can be understood from the above examples, the editing processing unit 30 in the second embodiment acquires either the feature sequence data Df and waveform data Dw corresponding to the pronunciation style s1, or the feature sequence data Df and waveform data Dw corresponding to the pronunciation style s2, on the basis of a common version of the note sequence data Dn.

As illustrated above, in the second embodiment the editing history of the feature sequence data Df and waveform data Dw corresponding to the pronunciation style s1 and the editing history of the feature sequence data Df and waveform data Dw corresponding to the pronunciation style s2 are both saved in the storage device 12. The user can therefore edit the feature sequence data Df or waveform data Dw for the pronunciation style s1, and the feature sequence data Df or waveform data Dw for the pronunciation style s2, by trial and error in accordance with the user's own instructions.

For example, when the user instructs a comparison between pronunciation styles s by operating the operation device 15, the display control unit 20 causes the display device 14 to display the comparison screen U of FIG. 14. The comparison screen U includes a first region U1, an operation image U1a (recall), an operation image U1b (playback), a second region U2, an operation image U2a (recall), and an operation image U2b (playback).

Each of the first region U1 and the second region U2 displays the hierarchical relationship among the first history data Hn[Vn,Vf,Vw], the second history data Hf[Vn,Vf,Vw], and the third history data Hw[Vn,Vf,Vw]. By operating the operation device 15, the user can select the desired history data H in each of the first region U1 and the second region U2. Specifically, the user selects the desired history data H in each region by specifying a pronunciation style s and the version numbers (Vn, Vf, Vw).
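The hierarchy shown in each region (one note sequence version branching into feature sequence versions, each of which branches into waveform versions) can be derived from the saved (Vn, Vf, Vw) triples alone. The sketch below does this for a single pronunciation style; the function names and the indented text rendering are illustrative assumptions and do not reproduce the layout of FIG. 14.

```python
from typing import Dict, Iterable, List, Tuple

Tree = Dict[int, Dict[int, List[int]]]


def history_tree(keys: Iterable[Tuple[int, int, int]]) -> Tree:
    """Nest saved (Vn, Vf, Vw) triples as {Vn: {Vf: [Vw, ...]}}."""
    tree: Tree = {}
    for vn, vf, vw in sorted(keys):
        branch = tree.setdefault(vn, {})
        if vf == 0 and vw == 0:
            continue                     # Hn[Vn, 0, 0] is the root of this branch
        waves = branch.setdefault(vf, [])
        if vw > 0:
            waves.append(vw)             # Hw[Vn, Vf, Vw] hangs under Hf[Vn, Vf, 0]
    return tree


def render(tree: Tree) -> List[str]:
    """Indented text view: Hn at the top, then Hf, then Hw."""
    lines: List[str] = []
    for vn, features in tree.items():
        lines.append(f"Hn[{vn},0,0]")
        for vf, waves in features.items():
            if vf > 0:
                lines.append(f"  Hf[{vn},{vf},0]")
            lines.extend(f"    Hw[{vn},{vf},{vw}]" for vw in waves)
    return lines


keys = [(1, 0, 0), (1, 1, 0), (1, 2, 0), (1, 2, 1), (1, 2, 2), (2, 0, 0), (2, 1, 0)]
print("\n".join(render(history_tree(keys))))
```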

When the user selects the operation image U1a (recall), the control device 11 acquires the history data H selected in the first region U1 from the storage device 12 and causes the display device 14 to display the editing screen G corresponding to that history data H. Specifically, in accordance with the pronunciation style s and the version numbers (Vn, Vf, Vw) of the history data H selected in the first region U1, the control device 11 acquires from the history area the first history data Hn[Vn=Xn,Vf=0,Vw=0] of the note sequence N, the second history data Hf[Vn=Xn,Vf=1,Vw=0] to Hf[Vn=Xn,Vf=Xf,Vw=0] of the feature sequence F corresponding to the pronunciation style s, and the third history data Hw[Vn=Xn,Vf=Xf,Vw=1] to Hw[Vn=Xn,Vf=Xf,Vw=Xw] of the waveform W corresponding to the pronunciation style s. Using the history data H acquired from the history area, the control device 11 generates the feature sequence data Df of the feature sequence F and the waveform data Dw of the waveform W corresponding to the pronunciation style s and the version numbers (Vn, Vf, Vw). The control device 11 then causes the display device 14 to display the editing screen G containing the note sequence indicated by the first history data Hn[Vn=Xn,Vf=0,Vw=0], the feature sequence F indicated by the feature sequence data Df, and the waveform W indicated by the waveform data Dw. When the user selects the operation image U1b (playback), the control device 11 supplies the sound emitting device 13 with the acoustic signal Z corresponding to the waveform data Dw generated by the above procedure for the first region U1, thereby reproducing the synthetic sound.

Similarly, when the user selects the operation image U2a (recall), the control device 11 acquires the history data H selected in the second region U2 from the storage device 12 and causes the display device 14 to display the editing screen G corresponding to that history data H. Specifically, by the same procedure as described above for the first region U1, the control device 11 generates the feature sequence data Df and waveform data Dw corresponding to the pronunciation style s and version numbers (Vn, Vf, Vw) that the user specified for the second region U2. The control device 11 then causes the display device 14 to display the editing screen G containing the note sequence indicated by the first history data Hn[Vn=Xn,Vf=0,Vw=0], the feature sequence F indicated by the feature sequence data Df, and the waveform W indicated by the waveform data Dw. When the user selects the operation image U2b (playback), the control device 11 supplies the sound emitting device 13 with the acoustic signal Z corresponding to the waveform data Dw generated by the above procedure for the second region U2, thereby reproducing the synthetic sound.
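In the embodiments, the second and third history data record how the data were edited (the time series of editing instructions Qf and Qw, as modifications (2) and (3) state), so the Df and Dw for a selected region can be rebuilt by replay. The Python sketch below assumes that Df is first generated from Dn and the recorded feature edits are then applied in order, and likewise for Dw. `rebuild`, `gen_f`, and `gen_w` are hypothetical names; `gen_f` and `gen_w` merely stand in for the generation models M1 and M2, and modeling each record as a callable is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Series = List[float]
Key = Tuple[str, int, int, int]          # (style s, Vn, Vf, Vw)
Edit = Callable[[Series], Series]        # one recorded edit, replayable


def rebuild(records: Dict[Key, Edit], notes: Dict[int, Series],
            style: str, vn: int, vf: int, vw: int,
            gen_f: Callable[[Series, str], Series],
            gen_w: Callable[[Series, Series, str], Series]) -> Tuple[Series, Series]:
    """Regenerate (Df, Dw) for the selected style and versions (Vn, Vf, Vw)."""
    dn = notes[vn]                           # Hn[Vn, 0, 0]: note sequence data Dn
    df = gen_f(dn, style)                    # stand-in for model M1
    for k in range(1, vf + 1):               # replay Hf[Vn, 1, 0] .. Hf[Vn, Vf, 0]
        df = records[(style, vn, k, 0)](df)
    dw = gen_w(dn, df, style)                # stand-in for model M2
    for k in range(1, vw + 1):               # replay Hw[Vn, Vf, 1] .. Hw[Vn, Vf, Vw]
        dw = records[(style, vn, vf, k)](dw)
    return df, dw
```

Calling `rebuild` once per region with that region's (style, Vn, Vf, Vw) selection gives the two results to display or play side by side.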

As can be understood from the above examples, the user can adjust the note sequence N, the feature sequence F, the waveform W, and the pronunciation style s while comparing the combination of version and pronunciation style s selected in the first region U1 with the combination of version and pronunciation style s selected in the second region U2.

C: Third Embodiment
FIG. 15 is an explanatory diagram of the synthetic sound in the third embodiment. The synthetic sound of the third embodiment consists of a plurality of tracks T (T1, T2, ...) that run in parallel with one another on the time axis. For example, when the synthetic sound is an instrument sound composed of a plurality of performance parts, each performance part corresponds to a track T. Likewise, when the synthetic sound is a singing sound composed of a plurality of vocal parts, each vocal part corresponds to a track T.

Each track T includes a plurality of sections R (hereinafter "unit intervals") that do not overlap one another on the time axis. Each unit interval R is an interval (region) on the time axis that contains a note sequence N. That is, a set of notes lying close to one another on the time axis is treated as a note sequence N, and one unit interval R is set for each note sequence N. The duration of each unit interval R is variable and depends on, for example, the total number of notes in the note sequence N and the duration of each note.
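The text does not say how "close" notes must be to form one note sequence N. The sketch below assumes a simple rule, a hypothetical maximum silent gap `max_gap`, and shows how non-overlapping, variable-length unit intervals R fall out of it.

```python
from typing import List, Tuple

Note = Tuple[float, float]                   # (onset, duration) in seconds
Region = Tuple[float, float, List[Note]]     # (start, end, notes of sequence N)


def split_into_regions(notes: List[Note], max_gap: float = 0.5) -> List[Region]:
    """Group notes into non-overlapping unit intervals R.

    A note joins the current region if it starts no more than max_gap seconds
    after the region's end; otherwise it opens a new region.
    """
    regions: List[Region] = []
    for onset, duration in sorted(notes):
        if regions and onset - regions[-1][1] <= max_gap:
            start, end, members = regions[-1]
            members.append((onset, duration))
            regions[-1] = (start, max(end, onset + duration), members)
        else:
            regions.append((onset, onset + duration, [(onset, duration)]))
    return regions
```

Each region's length grows with the number and duration of its notes, and a new region only begins after a gap longer than `max_gap`, so consecutive regions cannot overlap.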

FIG. 16 is a schematic diagram of the editing screen G in the third embodiment. The editing screen G displays information (the note sequence N, the feature sequence F, or the waveform W) on one unit interval R that the user has selected from among the unit intervals R of one track T, which the user has in turn selected from among the tracks T of the synthetic sound. The editing screen G of the third embodiment includes an operation area Gt and an operation area Gr in addition to the same elements as in the first embodiment.

The operation area Gt relates to the track T of the synthetic sound. Specifically, a track version number Vt, an operation image Gt1, and an operation image Gt2 are displayed in the operation area Gt. The track version number Vt indicates the version of the track T displayed on the editing screen G, and is incremented by 1 each time information on that track T (the note sequence N, the feature sequence F, or the waveform W) is edited. The user can also change the track version number Vt in the operation area Gt to any value by operating the operation device 15.

The operation images Gt1 and Gt2 are software buttons that the user can operate with the operation device 15. The operation image Gt1 is a control with which the user instructs that the information on the track T (the note sequence N, the feature sequence F, or the waveform W) be returned to its state before the most recent edit (undo). The operation image Gt2 is a control with which the user instructs that an edit canceled via the operation image Gt1 be executed again (redo).

The operation area Gr relates to the unit interval R of the synthetic sound. Specifically, a section version number Vr, an operation image Gr1, and an operation image Gr2 are displayed in the operation area Gr. The section version number Vr indicates the version of the unit interval R displayed on the editing screen G, and is incremented by 1 each time information on that unit interval R (the note sequence N, the feature sequence F, or the waveform W) is edited. The user can also change the section version number Vr in the operation area Gr to any value by operating the operation device 15.

The operation images Gr1 and Gr2 are software buttons that the user can operate with the operation device 15. The operation image Gr1 is a control with which the user instructs that the information on the unit interval R (the note sequence N, the feature sequence F, or the waveform W) be returned to its state before the most recent edit (undo). The operation image Gr2 is a control with which the user instructs that an edit canceled via the operation image Gr1 be executed again (redo).

The editing process Sa (Sa1 to Sa3) or the management process Sb (Sb1 to Sb3) is executed for each of the unit intervals R in the one track T displayed on the editing screen G. In the editing process Sa, each time any of the note sequence N, the feature sequence F, and the waveform W is edited, the information management unit 40 increments both the track version number Vt and the section version number Vr by 1. Likewise, when the user operates any of the operation images Gn1, Gf1, Gw1, Gn2, Gf2, and Gw2, the information management unit 40 increments both the track version number Vt and the section version number Vr by 1.
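The version bookkeeping of the third embodiment can be sketched as follows. The classes `Track` and `UnitInterval` are hypothetical, each edit is recorded as a text label standing in for a change to N, F, or W, and the sketch assumes that an undo or redo within a unit interval also counts as a new version, in the same way the paragraph above describes for the operation images Gn1 to Gw2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    vt: int = 0  # track version number Vt, shared by all unit intervals of the track


@dataclass
class UnitInterval:
    track: Track
    vr: int = 0                                        # section version number Vr
    applied: List[str] = field(default_factory=list)   # edits currently in effect
    undone: List[str] = field(default_factory=list)    # edits available for redo

    def _bump(self) -> None:
        # Every change increments both Vr and the owning track's Vt by 1.
        self.vr += 1
        self.track.vt += 1

    def edit(self, op: str) -> None:
        self.applied.append(op)
        self.undone.clear()  # a fresh edit discards the redo history
        self._bump()

    def undo(self) -> None:  # operation image Gr1
        if self.applied:
            self.undone.append(self.applied.pop())
            self._bump()

    def redo(self) -> None:  # operation image Gr2
        if self.undone:
            self.applied.append(self.undone.pop())
            self._bump()
```

Because every unit interval of a track shares one `Track` object, Vt counts changes across the whole track while each Vr counts only the changes made within its own interval.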

The third embodiment achieves the same effects as the first embodiment. In addition, in the third embodiment the user can instruct edits to each of the note sequence data Dn, the feature sequence data Df, and the waveform data Dw while generating waveform data Dw by trial and error for each of the unit intervals R on the time axis.

D: Modifications
Specific modifications that may be added to each of the embodiments exemplified above are described below. Two or more modes selected arbitrarily from the following examples may be combined as appropriate, provided they do not contradict each other.

(1) In each of the embodiments described above, each version of the note sequence data Dn is saved in the history area as the first history data Hn[Vn,Vf,Vw]; however, what the first history data Hn[Vn,Vf,Vw] represents and its format are not limited to this example. For instance, first history data Hn[Vn,Vf,Vw] representing how the note sequence data Dn was edited (that is, the time series of editing instructions Qn) may be saved instead. As understood from the above, the first history data Hn[Vn,Vf,Vw] is comprehensively expressed as data corresponding to the edited note sequence N.

(2) In each of the embodiments described above, second history data Hf[Vn,Vf,Vw] representing how the feature sequence data Df was edited (that is, the time series of editing instructions Qf) is saved in the history area; however, what the second history data Hf[Vn,Vf,Vw] represents and its format are not limited to this example. For instance, the feature sequence data Df as edited in accordance with the editing instruction Qf may itself be saved in the history area as the second history data Hf[Vn,Vf,Vw]. As understood from the above examples, the second history data Hf[Vn,Vf,Vw] is comprehensively expressed as data corresponding to the edited feature sequence data Df.

(3) In each of the embodiments described above, third history data Hw[Vn,Vf,Vw] representing how the waveform data Dw was edited (that is, the time series of editing instructions Qw) is saved in the history area; however, what the third history data Hw[Vn,Vf,Vw] represents and its format are not limited to this example. For instance, the waveform data Dw as edited in accordance with the editing instruction Qw may itself be saved in the history area as the third history data Hw[Vn,Vf,Vw]. As understood from the above examples, the third history data Hw[Vn,Vf,Vw] is comprehensively expressed as data corresponding to the edited waveform data Dw.
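Modifications (1) to (3) allow each history record to hold either the edited data itself (a snapshot) or the edit that produced it (an edit log). The hypothetical Python classes below contrast the two representations for one data stream; both return the same data for every version, and they differ only in what is stored.

```python
from typing import Callable, List

Series = List[float]
Edit = Callable[[Series], Series]


class SnapshotHistory:
    """Each version stores the edited data itself."""

    def __init__(self, base: Series) -> None:
        self.versions: List[Series] = [list(base)]  # version 0 = unedited data

    def save(self, data: Series) -> int:
        self.versions.append(list(data))
        return len(self.versions) - 1

    def get(self, version: int) -> Series:
        return list(self.versions[version])


class EditLogHistory:
    """Each version stores only the edit; data are rebuilt by replaying edits."""

    def __init__(self, base: Series) -> None:
        self.base = list(base)
        self.edits: List[Edit] = []

    def save(self, edit: Edit) -> int:
        self.edits.append(edit)
        return len(self.edits)

    def get(self, version: int) -> Series:
        data = list(self.base)
        for edit in self.edits[:version]:
            data = edit(data)
        return data
```

As a rough trade-off, snapshots return any version immediately at the cost of storing full copies, whereas an edit log is compact but must replay every earlier edit to reach a given version.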

(4) Each of the embodiments described above exemplifies a feature sequence F whose feature quantity is the fundamental frequency of the synthetic sound, but the feature quantity represented by the feature sequence data Df is not limited to the fundamental frequency. For example, the feature quantity may be the frequency spectrum of the synthetic sound in the frequency domain (for example, an intensity spectrum) or the sound pressure level on the time axis, and time-series data representing the time series (feature sequence F) of that feature quantity may be used as the feature sequence data Df. The feature sequence data Df is comprehensively expressed as time-series data representing a time series (feature sequence F) of a feature quantity corresponding to the note sequence data Dn.
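As one illustration of a feature quantity other than the fundamental frequency, the sketch below computes a frame-wise level sequence from waveform samples. It reports RMS level in dB relative to full scale (1.0), not a calibrated sound pressure level in dB SPL, and the frame length and the function name `frame_levels_db` are arbitrary assumptions.

```python
import math
from typing import List


def frame_levels_db(samples: List[float], frame: int = 4,
                    eps: float = 1e-12) -> List[float]:
    """Frame-wise RMS level in dB relative to full scale (one value per frame).

    Frames do not overlap; a trailing partial frame is ignored.
    eps avoids log10(0) for silent frames.
    """
    levels: List[float] = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / frame)
        levels.append(20.0 * math.log10(max(rms, eps)))
    return levels
```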

(5) In each of the embodiments described above, the second generation unit 34 generates the waveform data Dw from the note sequence data Dn and the feature sequence data Df, but configurations in which the second generation unit 34 generates the waveform data Dw from the note sequence data Dn alone, or from the feature sequence data Df alone, are also conceivable. That is, the second generation unit 34 is specified as an element that generates the waveform data Dw from at least one of the note sequence data Dn and the feature sequence data Df.

(6) The second embodiment exemplifies a first generation model M1 that outputs feature sequence data Df in response to an input that includes the pronunciation style s, but the configuration by which the first generation unit 32 generates feature sequence data Df according to the pronunciation style s is not limited to this example. For example, the feature sequence data Df may be generated by selectively using one of a plurality of first generation models M1 corresponding to different pronunciation styles s. The first generation model M1 for each pronunciation style s is built by machine learning using a plurality of first training data prepared for that pronunciation style s. The first generation unit 32 generates the feature sequence data Df by inputting the note sequence data Dn into the first generation model M1, among the plurality of first generation models M1, that corresponds to the pronunciation style s selected by the user.

Likewise, the second embodiment exemplifies a second generation model M2 that outputs waveform data Dw in response to an input that includes the pronunciation style s, but the configuration by which the second generation unit 34 generates waveform data Dw according to the pronunciation style s is not limited to this example. For example, the waveform data Dw may be generated by selectively using one of a plurality of second generation models M2 corresponding to different pronunciation styles s. The second generation model M2 for each pronunciation style s is built by machine learning using a plurality of second training data prepared for that pronunciation style s. The second generation unit 34 generates the waveform data Dw by inputting the note sequence data Dn and the feature sequence data Df (input data Din) into the second generation model M2, among the plurality of second generation models M2, that corresponds to the pronunciation style s selected by the user.
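Selecting one model per pronunciation style amounts to a lookup keyed by the style. In the hypothetical sketch below, the same `StyleModelBank` class serves both stages: for M1 it is called with Dn, and for M2 with Dn and Df. The lambdas are toy stand-ins for trained models; the actual model architectures and input formats are not specified here.

```python
from typing import Callable, Dict, List

Series = List[float]


class StyleModelBank:
    """Holds one generation model per pronunciation style (illustrative only)."""

    def __init__(self, models: Dict[str, Callable[..., Series]]) -> None:
        self.models = models

    def generate(self, style: str, *inputs: Series) -> Series:
        if style not in self.models:
            raise KeyError(f"no model trained for style {style!r}")
        return self.models[style](*inputs)


# Toy stand-ins: M1 maps Dn -> Df, M2 maps (Dn, Df) -> Dw.
m1 = StyleModelBank({"s1": lambda dn: [n + 0.5 for n in dn],
                     "s2": lambda dn: [n - 0.5 for n in dn]})
m2 = StyleModelBank({"s1": lambda dn, df: [a * b for a, b in zip(dn, df)]})

dn = [60.0, 62.0]
df = m1.generate("s1", dn)
dw = m2.generate("s1", dn, df)
```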

(7) In each of the above embodiments, the waveform W of the acoustic signal Z is displayed in the editing area Ew of the editing screen G. However, a time series of frequency spectra of the acoustic signal Z (that is, a spectrogram) may be displayed on the editing screen G together with the waveform W. For example, the editing screen G illustrated in FIG. 17 includes an editing area Ew1 and an editing area Ew2. The editing area Ew1 displays the waveform W in the same manner as the editing area Ew of the above embodiments, whereas the editing area Ew2 displays the time series of frequency spectra of the acoustic signal Z. By operating the operation device 15, the user can give an editing instruction Qw for the frequency spectra in the editing area Ew2 in addition to an editing instruction Qw for the waveform in the editing area Ew1.
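A spectrogram such as the one shown in the editing area Ew2 is commonly computed as a short-time Fourier transform (STFT): the signal is cut into overlapping windowed frames and each frame is transformed into a magnitude spectrum. The sketch below assumes a Hann window, a 64-sample frame, and a 32-sample hop, none of which the patent specifies, and uses a direct DFT so that it needs only the standard library.

```python
import cmath
import math
from typing import List, Sequence


def spectrogram(z: Sequence[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Returns one magnitude spectrum per Hann-windowed frame of z (a plain STFT)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames: List[List[float]] = []
    for start in range(0, len(z) - frame_len + 1, hop):
        segment = [z[start + n] * window[n] for n in range(frame_len)]
        # Direct DFT of the frame; bins 0..frame_len/2 span 0 Hz up to the Nyquist frequency.
        spectrum = [
            abs(sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(spectrum)
    return frames


# A sine wave that completes exactly 8 cycles per 64-sample frame.
z = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(z)
print(len(spec), len(spec[0]))                          # 7 33
print(max(range(len(spec[0])), key=spec[0].__getitem__))  # 8
```

In practice an FFT library would replace the O(N²) DFT loop; the frame structure (one spectrum per time step) is what maps onto the time axis of the editing area.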

(8) The note sequence data Dn is time-series data representing a note sequence N whose elements are a plurality of notes on the time axis. The feature sequence data Df is time-series data representing a feature sequence F whose elements are a plurality of feature amounts on the time axis. The waveform data Dw is time-series data representing a waveform W whose elements are a plurality of samples on the time axis. As these examples show, the note sequence data Dn, the feature sequence data Df, and the waveform data Dw can all be expressed generically as time-series data representing a time series of a plurality of elements.
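The generalization in modification (8) suggests a single container type parameterized by its element type. The following sketch is one possible representation, not the patent's data format: the `TimeSeriesData` and `Note` classes, the 5 ms feature frame, and the 16 kHz sample rate are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Generic, List, Tuple, TypeVar

E = TypeVar("E")


@dataclass(frozen=True)
class TimeSeriesData(Generic[E]):
    """A time series of elements, each paired with its position on the time axis (seconds)."""
    elements: Tuple[Tuple[float, E], ...]

    def between(self, t0: float, t1: float) -> List[E]:
        """Elements whose time t satisfies t0 <= t < t1."""
        return [e for t, e in self.elements if t0 <= t < t1]


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI note number (hypothetical encoding)
    duration: float  # seconds


# Dn: notes; Df: one feature amount (here F0 in Hz) per 5 ms frame; Dw: audio samples at 16 kHz.
dn = TimeSeriesData(((0.0, Note(60, 0.5)), (0.5, Note(62, 0.5))))
df = TimeSeriesData(tuple((i * 0.005, 261.6) for i in range(200)))
dw = TimeSeriesData(tuple((i / 16000, 0.0) for i in range(16000)))

print(dn.between(0.0, 0.5))  # [Note(pitch=60, duration=0.5)]
```

Because all three layers share one shape, operations such as range selection or version tracking can be written once and reused for notes, features, and samples.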

(9) In each of the above embodiments, a deep neural network is exemplified as the first generation model M1 and the second generation model M2, but the configurations of these models are arbitrary. For example, a statistical estimation model with a different structure, such as a hidden Markov model (HMM), may be used as the first generation model M1 or the second generation model M2.

(10) Each of the above embodiments illustrates the synthesis of a synthetic sound corresponding to the note sequence N, but the above embodiments can be used in any situation in which time-series data representing a time series of a plurality of elements is processed. For example, the above embodiments illustrate a form in which the upper layer corresponds to the note sequence N, the middle layer to the feature sequence F, and the lower layer to the waveform W. In situations other than the synthesis of a synthetic sound, the layers correspond to the combinations illustrated below.

For example, in automatic composition that generates a melody, the note sequence constituting the melody corresponds to the upper layer, the time series of chords in the melody corresponds to the middle layer, and the note sequence of an accompaniment that harmonizes with the melody corresponds to the lower layer. In speech synthesis, which synthesizes speech corresponding to a character string, the character string corresponds to the upper layer, the pronunciation style of the speech corresponds to the middle layer, and the waveform of the speech corresponds to the lower layer. In signal processing of various signals, the waveform of the signal corresponds to the upper layer, the time series of feature amounts of the signal corresponds to the middle layer, and the time series of parameters for processing the signal corresponds to the lower layer. In every form illustrated above, upper-layer data is referred to as "upper data", middle-layer data as "middle data", and lower-layer data as "lower data". The lower data represents the content that the user actually uses (for example, the waveform W in the above embodiments).
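The three-layer structure can be expressed generically as upper data plus rules that derive the middle and lower data from it. The sketch below assumes that structure; the `LayeredData` class and the toy composition rules (first note of each three-note bar as the chord root, a bass note one octave below) are hypothetical and only illustrate how the automatic composition example maps onto the layers.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

U = TypeVar("U")
M = TypeVar("M")
L = TypeVar("L")


@dataclass
class LayeredData(Generic[U, M, L]):
    """Upper data plus the (hypothetical) rules that derive the middle and lower data from it."""
    upper: U
    derive_middle: Callable[[U], M]
    derive_lower: Callable[[U, M], L]

    def middle(self) -> M:
        return self.derive_middle(self.upper)

    def lower(self) -> L:
        # Like the second generation model, the lower layer sees both upper and middle data.
        return self.derive_lower(self.upper, self.middle())


# Toy automatic-composition example: melody -> one chord root per 3-note bar -> bass line.
melody = [60, 64, 67, 65, 69, 72]
song: LayeredData[List[int], List[int], List[int]] = LayeredData(
    upper=melody,
    derive_middle=lambda mel: [mel[i] for i in range(0, len(mel), 3)],  # first note of each bar as root
    derive_lower=lambda mel, roots: [r - 12 for r in roots],            # bass an octave below each root
)
print(song.middle(), song.lower())  # [60, 65] [48, 53]
```

Recomputing the lower layer on demand mirrors how, in the embodiments, the waveform is regenerated from the note and feature sequences rather than stored independently.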

Each note constituting the note sequence N in the above embodiments and each character constituting the character string in speech synthesis are generically expressed as symbols representing sounds. Likewise, the note sequence N and the character string are generically expressed as a symbol string in which a plurality of symbols are arranged in time series.

(11) As described above, the functions of the acoustic processing system exemplified above are realized by cooperation between one or more processors constituting the control device 11 and a program stored in the storage device 12. The program according to the present disclosure may be provided in a form stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, is also included. Here, a non-transitory recording medium includes any recording medium other than a transitory, propagating signal, and volatile recording media are not excluded. In a configuration in which a distribution device distributes the program via a communication network, the storage device 12 that stores the program in the distribution device corresponds to the non-transitory recording medium described above.

E: Appendix
For example, the following configurations can be derived from the forms exemplified above.

An information processing method according to one aspect (Aspect 1) of the present disclosure includes: editing, in accordance with a first instruction from a user, first time-series data representing a time series of a feature amount of a sound produced by pronouncing a symbol string in a first pronunciation style; saving, each time the first time-series data is edited, first history data corresponding to the edited first time-series data as a new version of data; editing, in accordance with a second instruction from the user, second time-series data representing a time series of a feature amount of a sound produced by pronouncing the symbol string in a second pronunciation style different from the first pronunciation style; saving, each time the second time-series data is edited, second history data corresponding to the edited second time-series data as a new version of data; and acquiring either first time-series data corresponding to the first history data that corresponds to an instruction from the user among the plurality of saved first history data of different versions, or second time-series data corresponding to the second history data that corresponds to an instruction from the user among the plurality of saved second history data of different versions.
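A minimal sketch of Aspect 1 keeps an independent list of versions for each pronunciation style, appends one version per edit, and looks up any saved version on request. It assumes that each piece of history data is a full snapshot of the edited time-series data (the patent does not state the storage format), and the class name, style names, and edit lambdas are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Series = Tuple[float, ...]  # e.g. one pitch value per frame (hypothetical feature amount)


class StyleVersionedEditor:
    """Keeps an independent version history of time-series data for each pronunciation style."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Series]] = {}

    def load(self, style: str, series: Sequence[float]) -> int:
        """Registers the unedited time-series data of a style as version 0."""
        self._history[style] = [tuple(series)]
        return 0

    def edit(self, style: str, instruction: Callable[[Series], Sequence[float]]) -> int:
        """Applies a user instruction to the latest version and saves the result as a new version."""
        versions = self._history[style]
        versions.append(tuple(instruction(versions[-1])))
        return len(versions) - 1

    def acquire(self, style: str, version: int) -> Series:
        """Returns the time-series data corresponding to the requested version of one style."""
        return self._history[style][version]


editor = StyleVersionedEditor()
editor.load("soft", [60.0, 62.0, 64.0])      # first time-series data
editor.load("powerful", [60.0, 62.0, 64.0])  # second time-series data
editor.edit("soft", lambda s: [x - 0.5 for x in s])      # first instruction
editor.edit("powerful", lambda s: [x + 1.0 for x in s])  # second instruction
print(editor.acquire("soft", 1), editor.acquire("powerful", 0))  # (59.5, 61.5, 63.5) (60.0, 62.0, 64.0)
```

Because the histories are kept per style, edits made to one style never shift the version numbers of the other, which is what allows the two styles to be explored independently.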

According to this aspect, the editing history of the first time-series data corresponding to the first pronunciation style is saved, and the editing history of the second time-series data corresponding to the second pronunciation style is saved. The user can therefore edit the first time-series data for the first pronunciation style and the second time-series data for the second pronunciation style by trial and error, returning to earlier versions of either as instructed. The "symbol string" is, for example, a note sequence or a character string.

In a specific example of Aspect 1 (Aspect 2), the symbol string is a note sequence including a plurality of notes arranged in time series. In a specific example of Aspect 2 (Aspect 3), note sequence data representing the note sequence is edited in accordance with an instruction from the user, and the first time-series data and the second time-series data are generated from a common version of the note sequence data.
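Aspect 3 can be pictured as a version list for the note sequence data, with both styles generated from the same note version and each result recording that version. In this sketch, `generate_features` is a hypothetical stand-in for the first generation model M1 (a stepwise pitch curve with an assumed per-style offset), notes are assumed to follow one another without gaps, and the 0.5 s frame is chosen only to keep the output small.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Note:
    pitch: int       # MIDI note number
    duration: float  # seconds; notes are assumed to follow one another without gaps


NoteSeq = Tuple[Note, ...]
STYLE_OFFSET = {"soft": -0.2, "powerful": 0.3}  # hypothetical per-style pitch offsets


def generate_features(notes: NoteSeq, style: str, frame: float = 0.5) -> List[float]:
    """Hypothetical stand-in for M1: a stepwise pitch curve with one value per frame."""
    curve: List[float] = []
    for note in notes:
        curve.extend([note.pitch + STYLE_OFFSET[style]] * round(note.duration / frame))
    return curve


# Versions of the note sequence data; each user edit appends a new version.
note_versions: List[NoteSeq] = [(Note(60, 1.0), Note(62, 0.5))]
note_versions.append(note_versions[-1] + (Note(64, 0.5),))  # the user adds a note

# Both styles are generated from the same (here the latest) note version,
# and each result records which version it came from.
v = len(note_versions) - 1
derived: Dict[str, Tuple[int, List[float]]] = {
    style: (v, generate_features(note_versions[v], style)) for style in ("soft", "powerful")
}
print({s: (ver, len(curve)) for s, (ver, curve) in derived.items()})  # {'soft': (1, 4), 'powerful': (1, 4)}
```

Recording the source note version alongside each feature sequence makes it easy to check that the two styles being compared really share the same notes.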

In a specific example of any of Aspects 1 to 3 (Aspect 4), the acquiring acquires either the first history data saved after the previous edit among the plurality of first history data, or the second history data saved after the previous edit among the plurality of second history data. With this configuration, the first history data or the second history data as it was before the most recent edit was executed (that is, with that edit undone) can be acquired.

In a specific example of any of Aspects 1 to 3 (Aspect 5), the acquiring acquires either the first history data of a version designated by the user among the plurality of first history data, or the second history data of a version designated by the user among the plurality of second history data. With this configuration, the first history data or the second history data corresponding to any version the user designates can be acquired.
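Aspects 4 and 5 differ only in how the version to acquire is chosen: the one before the latest edit (an undo) or one the user names. The sketch below assumes that version numbers are list indices and that acquisition is non-destructive, so the latest version stays in the history; a real editor might additionally move a "current version" pointer. The `HistoryStore` class and its method names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Series = Tuple[float, ...]


class HistoryStore:
    """Per-style version lists; a version number is an index into the list (assumption)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Series]] = {}

    def save(self, style: str, series: Sequence[float]) -> int:
        versions = self._versions.setdefault(style, [])
        versions.append(tuple(series))
        return len(versions) - 1

    def previous(self, style: str) -> Series:
        """Aspect 4: data saved after the previous edit, i.e. the latest edit undone."""
        versions = self._versions[style]
        if len(versions) < 2:
            raise ValueError(f"no earlier version for style {style!r}")
        return versions[-2]

    def specified(self, style: str, version: int) -> Series:
        """Aspect 5: data of the version the user designates."""
        return self._versions[style][version]


store = HistoryStore()
for series in ([1.0, 1.0], [1.0, 2.0], [3.0, 2.0]):
    store.save("soft", series)
store.save("powerful", [5.0, 5.0])
print(store.previous("soft"), store.specified("soft", 0))  # (1.0, 2.0) (1.0, 1.0)
```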

An information processing system according to one aspect of the present disclosure includes: an editing processing unit that edits, in accordance with a first instruction from a user, first time-series data representing a time series of a feature amount of a sound produced by pronouncing a symbol string in a first pronunciation style, and edits, in accordance with a second instruction from the user, second time-series data representing a time series of a feature amount of a sound produced by pronouncing the symbol string in a second pronunciation style different from the first pronunciation style; and an information management unit that saves, each time the first time-series data is edited, first history data corresponding to the edited first time-series data as a new version of data, and saves, each time the second time-series data is edited, second history data corresponding to the edited second time-series data as a new version of data. The information management unit acquires either first time-series data corresponding to the first history data that corresponds to an instruction from the user among the plurality of saved first history data of different versions, or second time-series data corresponding to the second history data that corresponds to an instruction from the user among the plurality of saved second history data of different versions.
A program according to one aspect of the present disclosure causes a computer system to function as the information processing system described above.

100: information processing system; 11: control device; 12: storage device; 13: sound emitting device; 14: display device; 15: operation device; 20: display control unit; 30: editing processing unit; 31: first editing unit; 32: first generation unit; 33: second editing unit; 34: second generation unit; 35: third editing unit; M1: first generation model; M2: second generation model.

Claims

An information processing method implemented by a computer system, the method comprising:
editing, in accordance with a first instruction from a user, first time-series data representing a time series of a feature amount of a sound produced by pronouncing a symbol string in a first pronunciation style;
saving, each time the first time-series data is edited, first history data corresponding to the edited first time-series data as a new version of data;
editing, in accordance with a second instruction from the user, second time-series data representing a time series of a feature amount of a sound produced by pronouncing the symbol string in a second pronunciation style different from the first pronunciation style;
saving, each time the second time-series data is edited, second history data corresponding to the edited second time-series data as a new version of data; and
acquiring either first time-series data corresponding to the first history data that corresponds to an instruction from the user among the plurality of saved first history data of different versions, or second time-series data corresponding to the second history data that corresponds to an instruction from the user among the plurality of saved second history data of different versions.
The information processing method according to claim 1, wherein the symbol string is a note sequence including a plurality of notes arranged in time series.
The information processing method according to claim 2, further comprising editing note sequence data representing the note sequence in accordance with an instruction from the user,
wherein the first time-series data and the second time-series data are generated from a common version of the note sequence data.
The information processing method according to any one of claims 1 to 3, wherein the acquiring acquires either the first history data saved after the previous edit among the plurality of first history data, or the second history data saved after the previous edit among the plurality of second history data.
The information processing method according to any one of claims 1 to 3, wherein the acquiring acquires either the first history data of a version designated by the user among the plurality of first history data, or the second history data of a version designated by the user among the plurality of second history data.
An information processing system comprising:
an editing processing unit that edits, in accordance with a first instruction from a user, first time-series data representing a time series of a feature amount of a sound produced by pronouncing a symbol string in a first pronunciation style, and edits, in accordance with a second instruction from the user, second time-series data representing a time series of a feature amount of a sound produced by pronouncing the symbol string in a second pronunciation style different from the first pronunciation style; and
an information management unit that saves, each time the first time-series data is edited, first history data corresponding to the edited first time-series data as a new version of data, and saves, each time the second time-series data is edited, second history data corresponding to the edited second time-series data as a new version of data,
wherein the information management unit acquires either first time-series data corresponding to the first history data that corresponds to an instruction from the user among the plurality of saved first history data of different versions, or second time-series data corresponding to the second history data that corresponds to an instruction from the user among the plurality of saved second history data of different versions.
A program that causes a computer system to function as:
an editing processing unit that edits, in accordance with a first instruction from a user, first time-series data representing a time series of a feature amount of a sound produced by pronouncing a symbol string in a first pronunciation style, and edits, in accordance with a second instruction from the user, second time-series data representing a time series of a feature amount of a sound produced by pronouncing the symbol string in a second pronunciation style different from the first pronunciation style; and
an information management unit that saves, each time the first time-series data is edited, first history data corresponding to the edited first time-series data as a new version of data, and saves, each time the second time-series data is edited, second history data corresponding to the edited second time-series data as a new version of data,
wherein the information management unit acquires either first time-series data corresponding to the first history data that corresponds to an instruction from the user among the plurality of saved first history data of different versions, or second time-series data corresponding to the second history data that corresponds to an instruction from the user among the plurality of saved second history data of different versions.