JP5589741B2

JP5589741B2 - Music editing apparatus and program

Info

Publication number: JP5589741B2
Application number: JP2010229845A
Authority: JP
Inventors: Eiji Akazawa (赤澤 英治); Hideki Kenmochi (劔持 秀紀); Kenichi Yamauchi (山内 健一)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-10-12
Filing date: 2010-10-12
Publication date: 2014-09-17
Anticipated expiration: 2030-10-12
Also published as: JP2012083564A

Description

The present invention relates to a technique for editing data that represents a piece of music.

Various techniques for editing music data are known. Patent Document 1 discloses a technique that analyzes text information and changes pitch accordingly. Patent Document 2 discloses adding special effects such as vibrato to a timbre when the mouse is dragged over an image of a musical instrument. Patent Document 3 discloses multiplying a normalized vibrato waveform by an envelope waveform and using the product as the vibrato waveform. Patent Document 4 discloses assigning a parameter value in response to a drag operation of a pointing device.

Patent Document 1: JP 2005-250264 A
Patent Document 2: JP H10-143155 A (Japanese Patent Laid-Open No. 10-143155)
Patent Document 3: JP 2001-318675 A
Patent Document 4: JP 2002-372972 A

Even with the techniques of Patent Documents 1 to 4, it has been difficult to input musical modifications to notes by an intuitive operation.
The present invention provides a technique that enables musical modifications to notes to be input by an intuitive operation.

The present invention provides a music editing apparatus comprising: attribute storage means that stores, for each of a plurality of notes, attributes including the start of the note's sounding period, its pitch, its duration, and additional attributes indicating musical modifications to the note; display control means that causes a screen of display means to display, according to a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sounding period, the pitch, and the duration of each of the plurality of notes; position detection means that detects a position on the screen; attribute changing means that, when the start point of the trajectory of positions detected by the position detection means does not lie within any of the regions corresponding to the plurality of notes, treats a new note whose sounding period starts at a time corresponding to the start point as a processing target note, and adds additional attributes of the processing target note to the attribute storage means according to the trajectory; and data generation means that generates music data corresponding to the plurality of notes using the attributes stored in the attribute storage means.

The present invention further provides a music editing apparatus comprising: attribute storage means that stores, for each of a plurality of notes, attributes including the start of the note's sounding period, its pitch, its duration, and additional attributes indicating musical modifications to the note; display control means that causes a screen of display means to display, according to a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sounding period, the pitch, and the duration of each of the plurality of notes; position detection means that detects a position on the screen; attribute changing means that, when the trajectory of positions detected by the position detection means satisfies a predetermined condition with respect to the region corresponding to one of the plurality of notes, changes the additional attributes of that note stored in the attribute storage means according to the trajectory; and data generation means that generates music data corresponding to the plurality of notes using the attributes stored in the attribute storage means.

In a preferred aspect, the additional attributes include a plurality of parameters each indicating a different musical modification, and the attribute changing means may, for the one note, select one of the plurality of parameters according to the trajectory and change the value of the selected parameter according to the trajectory.

The present invention further provides a program that causes a computer to function as: attribute storage means that stores, for each of a plurality of notes, attributes including the start of the note's sounding period, its pitch, its duration, and additional attributes indicating musical modifications to the note; display control means that causes a screen of display means to display, according to a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sounding period, the pitch, and the duration of each of the plurality of notes; position detection means that detects a position on the screen; attribute changing means that, when the start point of the trajectory of positions detected by the position detection means does not lie within any of the regions corresponding to the plurality of notes, treats a new note whose sounding period starts at a time corresponding to the start point as a processing target note, and adds additional attributes of the processing target note to the attribute storage means according to the trajectory; and data generation means that generates music data corresponding to the plurality of notes using the attributes stored in the attribute storage means.

The present invention further provides a program that causes a computer to function as: attribute storage means that stores, for each of a plurality of notes, attributes including the start of the note's sounding period, its pitch, its duration, and additional attributes indicating musical modifications to the note; display control means that causes a screen of display means to display, according to a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sounding period, the pitch, and the duration of each of the plurality of notes; position detection means that detects a position on the screen; attribute changing means that treats, as a processing target note, one note selected from the plurality of notes according to the position detected by the position detection means, and changes the additional attributes of the processing target note stored in the attribute storage means according to the trajectory of the positions detected by the position detection means; and data generation means that generates music data corresponding to the plurality of notes using the attributes stored in the attribute storage means.

According to the present invention, a musical modification to a note can be input with a single intuitive operation.

FIG. 1 is a block diagram showing the functional configuration of a speech synthesizer 1 according to an embodiment.
FIG. 2 is a diagram illustrating a segment library.
FIG. 3 is a diagram illustrating score data.
FIG. 4 is a diagram showing the hardware configuration of the speech synthesizer 1.
FIG. 5 is a diagram illustrating the appearance of the speech synthesizer 1.
FIG. 6 is a diagram illustrating the screen while the speech synthesis application is running.
FIG. 7 is a flowchart showing the operation of the speech synthesizer 1.
FIG. 8 is a diagram illustrating the screen after a new note has been added.
FIG. 9 illustrates the correspondence among trajectories, feature parameters, and the additional attributes to be processed.
FIG. 10 is a diagram illustrating the screen after updating.
FIG. 11 is a diagram illustrating a screen according to Modification 1.
FIG. 12 is a flowchart showing the operation according to Modification 1.
FIG. 13 is a diagram illustrating changes in the image displayed in a window 213.

1. Configuration
FIG. 1 is a block diagram showing the functional configuration of a speech synthesizer 1 according to an embodiment. The speech synthesizer 1 synthesizes a singing voice from score data containing a character string and a note sequence and outputs the synthesized voice; it is an example of a music editing apparatus. Broadly, the speech synthesizer 1 has a function for editing score data and a function for synthesizing voice from the score data. More specifically, it includes storage means 11, display control means 12, display means 13, position detection means 14, attribute changing means 15, speech synthesis means 16, and audio output means 17. The storage means 11 stores a segment library, lyrics (a character string), and score data. For each of a plurality of notes, the score data holds attributes including the start of the note's sounding period, its pitch, its duration, and the character of the string that is assigned to the note. The display control means 12 causes the screen of the display means 13 to display, according to a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sounding period, the pitch, and the duration of each note. The display means 13 displays images under the control of the display control means 12. The position detection means 14 detects a position on the screen. When the start point of the trajectory of positions detected by the position detection means 14 does not lie within any of the regions corresponding to the notes of the note sequence, the attribute changing means 15 treats a new note whose sounding period starts at a time corresponding to the start point as the processing target note and adds additional attributes of that note to the storage means 11 according to the trajectory. The speech synthesis means 16 (an example of data generation means) synthesizes voice according to the score data, using data contained in the segment library. The audio output means 17 outputs the synthesized voice. In this example, the additional attributes include a plurality of parameters each indicating a different musical modification. For the processing target note, the attribute changing means 15 selects one of these parameters according to the trajectory and changes its value according to the trajectory.

FIG. 2 is a diagram illustrating the segment library. The segment library is a database of speech segments (fragments of singing voice) sampled from, for example, human voices. It is divided into individual databases, one per singer. In the example shown in FIG. 2, the segment library includes individual databases 303a to 303c corresponding to three singers. The individual database 303 for each singer contains multiple pieces of segment data collected from that singer's singing voice waveforms. Segment data is audio data obtained by cutting out and encoding a phonetically characteristic portion of a singing voice waveform.

Segment data is explained here using the example of singing the lyric 「さいた」, which is written in phonetic symbols as "saita". Analyzed by its features, the waveform of the sound represented by "saita" proceeds from the rising (onset) portion of "s", to the "s" sound, to the transition from "s" to "a", to the "a" sound, and so on, ending with the decaying portion of the final "a". Each piece of segment data is audio data corresponding to one of these phonetic feature portions.

In the following description, segment data corresponding to the rising portion of the sound represented by a phonetic symbol is written with "#" before the symbol, as in "#s". Segment data corresponding to the decaying portion of a sound is written with "#" after the symbol, as in "a#". Segment data corresponding to the transition from the sound of one phonetic symbol to that of another is written with "-" between the two symbols, as in "s-a". The segment data group 3030 of the segment library stores segment data for every sound and combination of sounds, collected from singing voice waveforms recorded when the singer sings normally. The segment data groups 3031H to 3031L store segment data for every sound and combination of sounds collected from singing voice waveforms recorded when the singer sings with a strong, medium, and weak accent, respectively. Likewise, the segment data groups 3032H to 3032L store segment data for every sound and combination of sounds collected when the singer sings with strong, medium, and weak legato, respectively.
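The labeling convention above can be sketched as a small function that turns a phoneme sequence into segment labels. This is an illustrative sketch, not the patent's implementation: it assumes the phonetic string has already been split into one symbol per sound (so "saita" becomes `["s", "a", "i", "t", "a"]`), and the function name `segment_labels` is hypothetical.

```python
from typing import List


def segment_labels(phonemes: List[str]) -> List[str]:
    """Split a phoneme sequence into segment-data labels.

    '#x' = rising (onset) portion, 'x' = steady sound,
    'x-y' = transition from x to y, 'x#' = decaying portion.
    """
    if not phonemes:
        return []
    labels = ["#" + phonemes[0]]                  # onset of the first sound
    for i, p in enumerate(phonemes):
        labels.append(p)                          # steady part of this sound
        if i + 1 < len(phonemes):
            labels.append(f"{p}-{phonemes[i + 1]}")  # transition to the next sound
    labels.append(phonemes[-1] + "#")             # decay of the last sound
    return labels


print(segment_labels(["s", "a", "i", "t", "a"]))
# ['#s', 's', 's-a', 'a', 'a-i', 'i', 'i-t', 't', 't-a', 'a', 'a#']
```

Applied to "sakura" (`["s", "a", "k", "u", "r", "a"]`), the same function yields the thirteen units "#s", "s", "s-a", ..., "r-a", "a", "a#".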

FIG. 3 is a diagram illustrating score data. Score data contains one or more pieces of part data, each representing a sung performance. In addition to the part data, the score data contains data indicating the time signature and tempo used in the performance and data indicating the resolution. Part data contains note data for each of a plurality of notes; note data is a set of data indicating that note's basic attributes and additional attributes. Basic attribute data indicates the attributes indispensable for instructing a sound to be produced: the pitch, the sounding period (its start and end), and the phonetic symbols. Additional attribute data gives the sound instructions such as expression, that is, musical modifications; in this example it includes the correspondence between notes and lyrics, the strength of the sound, the accent strength, the legato strength, the vibrato strength (depth), and the vibrato period.
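One way to picture this hierarchy (score data, part data, note data, then basic and additional attributes) is as nested Python dataclasses. The field names, the tick-based time unit, the MIDI-style pitch numbers, and the default values below are all assumptions for illustration; the description does not specify an encoding.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BasicAttributes:
    # Attributes indispensable for producing a sound.
    pitch: int       # assumed: MIDI-style note number
    start: int       # start of the sounding period, in ticks (assumed unit)
    end: int         # end of the sounding period, in ticks
    phonetic: str    # phonetic symbols, e.g. "sa"


@dataclass
class AdditionalAttributes:
    # Musical modifications (expression) applied to the sound.
    lyric_index: Optional[int] = None  # which lyric character this note carries
    intensity: int = 64                # strength of the sound
    accent: int = 0                    # accent strength
    legato: int = 0                    # legato strength
    vibrato_depth: int = 0             # vibrato strength (depth)
    vibrato_length: int = 0            # vibrato period, in ticks


@dataclass
class NoteData:
    basic: BasicAttributes
    additional: AdditionalAttributes = field(default_factory=AdditionalAttributes)


@dataclass
class PartData:
    notes: List[NoteData] = field(default_factory=list)


@dataclass
class ScoreData:
    time_signature: Tuple[int, int] = (4, 4)
    tempo_bpm: float = 120.0
    resolution: int = 480  # assumed meaning: ticks per quarter note
    parts: List[PartData] = field(default_factory=list)


# Usage: one part holding a single note "sa" with an accent set.
score = ScoreData(parts=[PartData(notes=[
    NoteData(BasicAttributes(pitch=62, start=480, end=960, phonetic="sa"),
             AdditionalAttributes(lyric_index=1, accent=50)),
])])
```

Keeping basic and additional attributes in separate records mirrors the split described above: synthesis can always rely on the basic fields, while the additional fields default to "no modification".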

Next, the speech synthesis processing performed by the speech synthesis means 16 is outlined, taking the phonetic symbol string "sakura" in the score data as an example. The speech synthesis means 16 decomposes the phonetic symbol string into segment-data units; for example, "sakura" is decomposed into "#s", "s", "s-a", "a", "a-k", "k", "k-u", "u", "u-r", "r", "r-a", "a", and "a#". It reads the segment data corresponding to each of these units from the segment data group 3030 and adjusts the pitch of the read segment data based on the pitch indicated by each note. It further processes the segment data according to the additional attribute data. It then adjusts the duration of the pitch-adjusted segments based on the sounding periods indicated by the note sequence, and adjusts the volume of the duration-adjusted segment data. Finally, it joins the volume-adjusted segment data in order to generate synthesized voice data and stores that data in the storage means 11.
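The description does not say how the duration adjustment distributes a note's sounding period across its segments. One plausible scheme, shown here purely as an assumption, keeps onset ("#x"), decay ("x#"), and transition ("x-y") segments at their natural lengths and stretches only the steady-sound segments so that the total matches the note length. The function names and millisecond units are hypothetical.

```python
from typing import List


def is_transient(label: str) -> bool:
    """Onset ('#s'), decay ('a#') and transition ('s-a') segments."""
    return "#" in label or "-" in label


def fit_durations(labels: List[str], natural_ms: List[float], note_ms: float) -> List[float]:
    """Stretch or shrink steady segments so all segments together fill note_ms.

    Transient segments keep their natural length; if they alone exceed
    note_ms, the steady segments are shrunk to zero length.
    """
    fixed = sum(d for lab, d in zip(labels, natural_ms) if is_transient(lab))
    steady = sum(d for lab, d in zip(labels, natural_ms) if not is_transient(lab))
    if steady == 0:
        return list(natural_ms)  # nothing stretchable; leave lengths unchanged
    scale = max(note_ms - fixed, 0.0) / steady
    return [d if is_transient(lab) else d * scale for lab, d in zip(labels, natural_ms)]


# A single note sung as "sa", lasting 500 ms.
labels = ["#s", "s", "s-a", "a", "a#"]
print(fit_durations(labels, [20, 60, 40, 100, 50], 500))  # [20, 146.25, 40, 243.75, 50]
```

Here the transients take 110 ms, so the two steady segments are scaled by 390/160 = 2.4375 to fill the remaining 390 ms. Stretching steady sounds rather than transitions is a common choice in concatenative singing synthesis because consonant onsets and transitions sound unnatural when time-scaled.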

When the user inputs an instruction to play back the music and the position detection means 14 acquires it, the audio output means 17 reads the synthesized voice data stored in the storage means 11 and outputs the corresponding sound. The user can thus listen to the sung performance represented by the score data.

FIG. 4 is a diagram showing the hardware configuration of the speech synthesizer 1. In this example, the speech synthesizer 1 is a computer having a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104, an input unit 105, a display unit 106, a DAC (Digital-to-Analog Converter) 107, an amplifier 108, and a speaker 109. The CPU 101 is a microprocessor that performs general-purpose data processing. The ROM 102 is a non-volatile memory that stores control programs such as the BIOS (Basic Input/Output System). The RAM 103 is a volatile memory that stores data. The storage unit 104 is a non-volatile storage device such as an HDD (Hard Disk Drive) or flash memory, and stores an OS (Operating System), application programs, and various data. The CPU 101 controls the other components of the speech synthesizer 1 according to the BIOS, the OS, or an application program.

The input unit 105 is a device for inputting instructions or data. The display unit 106 has a display device, such as a liquid crystal display or an organic EL (Electro-Luminescence) display, and a drive circuit that drives it, and displays characters, graphics, and the like. In this example, a touch panel (touch screen) integrating the input unit 105 and the display unit 106 is used. The DAC 107 acquires audio data such as synthesized voice data, converts it into an analog audio signal, and outputs the signal to the amplifier 108. The amplifier 108 amplifies the analog audio signal and outputs it to the speaker 109, which emits sound waves corresponding to the signal.

In this example, the storage unit 104 stores an application program that causes the computer to function as a speech synthesizer (hereinafter the "speech synthesis application"). When the CPU 101 executes the speech synthesis application, the functions shown in FIG. 1 are implemented in the speech synthesizer 1. The CPU 101 executing the speech synthesis application is an example of the display control means 12, the position detection means 14, the attribute changing means 15, and the speech synthesis means 16. The ROM 102, the RAM 103, or the storage unit 104 is an example of the storage means 11. The display unit 106 under the control of the CPU 101 is an example of the display means 13. The DAC 107, the amplifier 108, and the speaker 109 under the control of the CPU 101 are an example of the audio output means 17.

FIG. 5 is a diagram illustrating the appearance of the speech synthesizer 1. In this example, the speech synthesizer 1 is a touch-panel information display device having a housing 110 and a touch panel 111. The speaker 109 and the touch panel 111 are provided in the housing 110. The touch panel 111 has a structure in which a light-transmissive touch sensor is laminated on the screen of a display device. The user inputs instructions to the speech synthesizer 1 by touching the touch panel 111 with a finger, or tracing a finger across it, while viewing the displayed image. That is, the touch panel 111 integrates the input unit 105 and the display unit 106.

2. Operation
FIG. 6 is a diagram illustrating the screen while the speech synthesis application is running. The screen includes an input box 201, a window 202, a guide graphic 203, note graphics 204 to 208, a play button 209, and a stop button 210. The input box 201 is an area for inputting and displaying lyrics; in this example, the character string 「あさがくるひるがくるよるがくる」 ("asa ga kuru, hiru ga kuru, yoru ga kuru") has been entered as the lyrics. The window 202 is an area for inputting and displaying a note sequence according to a coordinate system having a first axis representing pitch (the vertical axis in this example) and a second axis representing time (the horizontal axis). On the pitch axis, upward in FIG. 6 is the positive direction (higher pitch); on the time axis, rightward is the positive direction (later time). The guide graphic 203 indicates pitch and is displayed along the pitch axis of the window 202. In this example, a graphic imitating a piano keyboard is used as the guide graphic 203, which is why displaying a note sequence in the window 202 is called a "piano roll display". The guide graphic 203 includes images identifying pitches (in this example, the labels "C3" and "C4"). The note graphics 204 to 208 represent the individual notes of the note sequence. In this example, they are rectangles whose left edge indicates the start of the sounding period and whose right edge indicates its end, and whose vertical position indicates the pitch. Inside each note graphic, the character (part of the lyrics) assigned to that note is displayed; in this example, the characters 「あ」 (a), 「さ」 (sa), 「が」 (ga), 「く」 (ku), and 「る」 (ru) are assigned to the notes represented by the note graphics 204 to 208, respectively. The play button 209 instructs playback of the music shown in the piano roll display, and the stop button 210 stops playback.
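The piano-roll coordinate system can be sketched as a pair of mappings between (time, pitch) and screen pixels. Because screen y grows downward while pitch grows upward in the window 202, the pitch term is inverted. The class name, the tick and MIDI-style note-number units, and the zoom parameters are hypothetical; the description only fixes the axis directions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PianoRollView:
    left: float          # x of the window's left edge, in pixels
    top: float           # y of the window's top edge, in pixels
    px_per_tick: float   # horizontal zoom: pixels per tick
    row_height: float    # height of one semitone row, in pixels
    top_pitch: int       # note number shown in the topmost row
    first_tick: int = 0  # time shown at the window's left edge

    def note_rect(self, start: int, end: int, pitch: int) -> Tuple[float, float, float, float]:
        """Return (x0, y0, x1, y1) of a note graphic: left edge = start, right edge = end."""
        x0 = self.left + (start - self.first_tick) * self.px_per_tick
        x1 = self.left + (end - self.first_tick) * self.px_per_tick
        # Higher pitch -> smaller y, because screen y grows downward.
        y0 = self.top + (self.top_pitch - pitch) * self.row_height
        return (x0, y0, x1, y0 + self.row_height)

    def point_to_time_pitch(self, x: float, y: float) -> Tuple[int, int]:
        """Inverse mapping: screen point -> (tick, pitch row) it falls in."""
        tick = self.first_tick + int((x - self.left) // self.px_per_tick)
        pitch = self.top_pitch - int((y - self.top) // self.row_height)
        return tick, pitch


view = PianoRollView(left=0.0, top=0.0, px_per_tick=0.125, row_height=10.0, top_pitch=72)
print(view.note_rect(480, 960, 62))           # (60.0, 100.0, 120.0, 110.0)
print(view.point_to_time_pitch(60.0, 105.0))  # (480, 62)
```

The inverse mapping is what turns a touch on the window into the time and pitch used for hit-testing and for placing a new note.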

FIG. 7 is a flowchart showing the operation of the speech synthesizer 1. The flow in FIG. 7 starts, for example, when the user instructs the speech synthesis application to start. When the application starts, the screen illustrated in FIG. 6 is displayed. In step S100, the CPU 101 determines whether a touch has been detected within the window 202, as follows. The touch panel 111 outputs a signal indicating the position touched by the user to the CPU 101, and the CPU 101 determines whether the position indicated by that signal lies within the window 202. If no touch is detected (S100: NO), the CPU 101 waits until a touch is detected. If a touch is detected (S100: YES), the CPU 101 proceeds to step S110.

In step S110, the CPU 101 identifies a note to be processed (hereinafter the "processing target note"). The processing target note is either one note selected from the note sequence in the score data (an already-entered note) or a new note not included in the score data. It is identified, for example, as follows. If the touched position corresponds to one of the note graphics displayed in the window 202 (that is, it overlaps a note graphic), the CPU 101 identifies the note represented by that graphic as the processing target note. For example, when the user touches the note graphic 205 on the screen of FIG. 6, the CPU 101 identifies the note represented by the note graphic 205 (to which the lyric 「さ」 is assigned) as the processing target note. If the touched position corresponds to none of the note graphics displayed in the window 202, the CPU 101 identifies a new note as the processing target note. In this case, the CPU 101 generates data indicating the attributes of the new note and stores it in the RAM 103; the new note's attributes are set to predetermined initial values.
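Step S110 amounts to a hit test followed, on a miss, by creation of a new note with initial attributes. The sketch below performs the hit test in (tick, pitch) space, assuming the touch position has already been converted from pixels; the `Note` type, the 480-tick quarter-note default, and the function name are assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

DEFAULT_LENGTH = 480  # assumed initial length: one quarter note at 480 ticks per quarter


@dataclass
class Note:
    start: int       # start of the sounding period, in ticks
    end: int         # end of the sounding period, in ticks
    pitch: int       # MIDI-style note number (assumed)
    lyric: str = ""


def find_target_note(notes: List[Note], tick: int, pitch: int) -> Tuple[Note, bool]:
    """Return (processing target note, is_new) for a touch at (tick, pitch)."""
    for note in notes:
        # The touch overlaps this note's rectangle: same pitch row, inside its period.
        if note.pitch == pitch and note.start <= tick < note.end:
            return note, False
    # No note graphic was hit: create a new note starting at the touched time.
    return Note(start=tick, end=tick + DEFAULT_LENGTH, pitch=pitch), True


notes = [Note(0, 480, 60, "あ"), Note(480, 960, 62, "さ")]
target, is_new = find_target_note(notes, 600, 62)   # hits the existing "さ" note
new_note, created = find_target_note(notes, 240, 62)  # empty spot: a new note
```

The caller would store a newly created note alongside the existing ones and then reassign lyric characters, as the next paragraph describes.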

FIG. 8 illustrates a screen to which a new note has been added. FIG. 8 shows an example in which the user touches, with a finger F, a position that lies between the note graphic 204 and the note graphic 205 on the time axis and corresponds to "D3" on the pitch axis. The CPU 101 then displays a note graphic 211 representing the new note at a position in the window 202 corresponding to the position touched by the user. The CPU 101 sets the touched position as the start of the sound generation period of the new note. The width of the new note's graphic along the time axis is set to a predetermined initial value (for example, a quarter note). When a new note is added as the processing target note, the CPU 101 assigns a lyric to it. Specifically, the CPU 101 first determines the ordinal position of the processing target note from its position on the time axis, in particular from its order relative to the other notes. In the example of FIG. 8, the processing target note is placed between the note graphic 204 and the note graphic 205, so the CPU 101 determines that it is the second note. Next, the CPU 101 determines the character to be assigned to the processing target note based on this ordinal position. In this example, because the processing target note is the second note, the CPU 101 assigns the second character, "sa", of the lyrics "あさがくるひるがくるよるがくる" (asa ga kuru, hiru ga kuru, yoru ga kuru) to the processing target note. The CPU 101 also reassigns characters to the other notes to account for the newly assigned character. Because the processing target note is now the second note, the former second note becomes the third note, and each subsequent note moves down by one. In this example, the character assigned to the note represented by the note graphic 205 changes from "sa" to "ga". The same applies to the remaining notes. The CPU 101 stores data indicating the updated correspondence between notes and lyrics in the RAM 103.
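The lyric reassignment described above amounts to sorting the notes by start time and handing out one character per note in that order. A sketch under that reading follows (the function name `assign_lyrics` and the tick values are hypothetical):

```python
def assign_lyrics(starts, lyrics):
    """Give each note one lyric character according to its order in time.

    starts: start times of all notes, in any order (e.g. with a newly added
    note appended at the end). Returns characters aligned with `starts`;
    notes beyond the end of the lyric string get "".
    """
    order = sorted(range(len(starts)), key=lambda i: starts[i])
    assigned = [""] * len(starts)
    for rank, i in enumerate(order):
        if rank < len(lyrics):
            assigned[i] = lyrics[rank]
    return assigned


lyrics = "あさがくるひるがくるよるがくる"
print(assign_lyrics([0, 960, 1920], lyrics))       # ['あ', 'さ', 'が']
print(assign_lyrics([0, 960, 1920, 480], lyrics))  # ['あ', 'が', 'く', 'さ']
```

Inserting a note at tick 480 gives it "さ" and moves "が" to the note that previously carried "さ", as in the FIG. 8 example.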

Refer again to FIG. 7. In step S120, the CPU 101 records the trajectory of the position touched by the user, as follows. The CPU 101 acquires coordinates indicating the touch position from the touch panel 111 at predetermined time intervals and writes the acquired coordinates, in order, to a storage area in the RAM 103.

In step S130, the CPU 101 determines whether the touch in the window 202 is no longer detected (that is, whether a no-touch state has been entered). When the user is not touching the touch panel 111, the touch panel 111 outputs a signal to the CPU 101 indicating that it is not being touched. When the touch is still detected (that is, the touch has been detected continuously since it was first detected in step S100) (S130: NO), the CPU 101 returns to step S120. In other words, the CPU 101 keeps recording the trajectory for as long as the touch remains continuously detected after step S100. When the touch is no longer detected (S130: YES), the CPU 101 proceeds to step S140.
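Steps S120 and S130 form a simple polling loop. The sketch below assumes a hypothetical `poll` callable standing in for the touch panel 111. The fixed sampling interval is left to whatever calls `poll`.

```python
def record_trajectory(poll):
    """Record touch coordinates until the touch is released (steps S120-S130).

    `poll` is a hypothetical callable returning the current (x, y) touch
    coordinate, or None when the panel reports that it is not touched.
    It is assumed to be invoked once per sampling interval.
    """
    trajectory = []
    while True:
        point = poll()
        if point is None:          # S130: YES, the touch is no longer detected
            return trajectory
        trajectory.append(point)   # S120: store coordinates in order
```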

While the touch is detected, the CPU 101 changes the note graphic of the processing target note according to the trajectory. In the example of FIG. 8, when the user moves the finger F in the positive direction of the time axis while keeping it on the touch panel 111 (a so-called "drag"), the CPU 101 moves the end of the processing target note's sound generation period later while keeping its start fixed. That is, the left edge of the note graphic 211 stays fixed while its right edge moves in the positive direction of the time axis (the note graphic 211 is stretched along the time axis). Meanwhile, the CPU 101 keeps writing data indicating the end of the processing target note's sound generation period to the RAM 103, updating it continuously. When the processing target note is a note that has already been input, the CPU 101 does not change the note graphic when it first detects a touch after a state in which no touch is detected (hereinafter the "non-touch state"); it changes the end of the sound generation period according to the trajectory only once a drag is detected after the touch.
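The drag behaviour reduces to one rule: the start stays put and the end follows the finger. The helper below is a hypothetical sketch. `min_length` is an assumed guard, and letting a leftward drag shorten the note is an extrapolation that the source does not describe.

```python
def end_after_drag(start, end, drag_t, moved, min_length=120):
    """Return the new end of the sound generation period, in ticks.

    start:  fixed start of the note (the left edge of its graphic)
    end:    current end of the note
    drag_t: time-axis coordinate of the current touch position
    moved:  False until a drag has been detected after the initial touch
    min_length is a hypothetical lower bound so the note never collapses.
    """
    if not moved:
        return end  # a touch without a drag leaves the note unchanged
    return max(drag_t, start + min_length)
```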

In step S140, the CPU 101 determines, based on the recorded trajectory, the parameter to be processed, that is, which additional attribute of the score data is to be changed. In this example, the additional attribute to be processed is either the sound intensity or the vibrato strength. The CPU 101 extracts parameters characterizing the trajectory recorded in the RAM 103 (hereinafter "feature parameters"). In this example, the coordinates of the start point, the coordinates of the end point, the number of extrema, and the amplitude (the difference between the largest and smallest extrema) are extracted as feature parameters. The storage unit 104 stores information associating conditions on the feature parameters with additional attributes. This information specifies, for example, the following:
(1) If the number of extrema is greater than or equal to a threshold (for example, four), the additional attribute to be processed is the vibrato strength.
(2) If the number of extrema is less than that threshold and the difference between the pitch at the start point and the pitch at the end point is greater than or equal to a threshold, the additional attribute to be processed is the sound intensity.
The CPU 101 applies these conditions to the trajectory recorded in the RAM 103 to determine the additional attribute to be processed. If the trajectory satisfies none of the conditions, the CPU 101 determines that there is no additional attribute to be processed. Alternatively, if the trajectory satisfies none of the conditions, the CPU 101 may select a predetermined (default) additional attribute for processing. The default additional attribute is set, for example, by the user. In this case, predetermined values are used when the value of the additional attribute is determined, as described later. For example, if the trajectory satisfies none of the conditions, the CPU 101 sets the vibrato strength and speed to predetermined values.
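One way to compute the feature parameters and apply the two conditions of step S140 is sketched below. Trajectory points are assumed to be (time, pitch) pairs already mapped from screen coordinates, and extrema are counted as changes in the sign of the slope. `pitch_thresh` and the use of an absolute pitch difference are assumptions; only the extrema threshold of four comes from the text.

```python
def extrema(points):
    """Interior local extrema of a sampled (t, pitch) trajectory.

    A point counts as an extremum when the sign of the pitch slope changes
    across it; flat segments are skipped so a plateau counts once.
    """
    found, prev_dir = [], 0
    for i in range(1, len(points)):
        dy = points[i][1] - points[i - 1][1]
        d = (dy > 0) - (dy < 0)
        if d == 0:
            continue
        if prev_dir and d != prev_dir:
            found.append(points[i - 1])
        prev_dir = d
    return found


def features(points):
    """Start point, end point, number of extrema, and amplitude."""
    ys = [y for _, y in extrema(points)]
    return {
        "start": points[0],
        "end": points[-1],
        "n_extrema": len(ys),
        "amplitude": max(ys) - min(ys) if ys else 0,
    }


def classify(f, n_thresh=4, pitch_thresh=2):
    """Apply conditions (1) and (2); None means no attribute is selected."""
    if f["n_extrema"] >= n_thresh:
        return "vibrato_strength"
    if abs(f["end"][1] - f["start"][1]) >= pitch_thresh:
        return "intensity"
    return None
```

The wave in the upper part of FIG. 9 corresponds to an input such as `[(0, 0), (1, 2), (2, 0), (3, 2), (4, 0), (5, 2), (6, 0)]`, which yields five extrema and selects the vibrato strength.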

FIG. 9 illustrates the correspondence among trajectories, feature parameters, and the additional attribute to be processed. The data actually recorded as a trajectory is a set of points, but it is drawn here as a line for clarity. The upper part of FIG. 9 shows a wave-like trajectory that rises and falls several times. From this trajectory, five extrema and the amplitude are extracted as feature parameters. Because the number of extrema is at least the threshold, the vibrato strength is selected as the additional attribute to be processed. The lower part of FIG. 9 shows a trajectory rising to the right. No extrema are extracted from this trajectory; instead, the coordinates of the start point and the end point are extracted as feature parameters. If the difference between the pitch at the start point and the pitch at the end point is at least the threshold, the sound intensity is selected as the additional attribute to be processed.

Refer again to FIG. 7. In step S150, the CPU 101 determines, based on the recorded trajectory, the value of the parameter to be processed, that is, the value of the additional attribute. In this example, the storage unit 104 stores information associating each additional attribute with a method of determining its value. This information specifies, for example, the following:
(1) If the vibrato strength is to be processed, the vibrato strength is set to L, M, or H according to the amplitude extracted from the trajectory as a feature parameter (L when the amplitude is small, H when it is large). Predetermined initial values are used for the vibrato speed and start.
(2) If the sound intensity is to be processed, the sound intensity is determined according to the difference between the pitch at the start point and the pitch at the end point. The example of FIG. 3 shows data in which the sound intensity is constant within a single note, but the intensity may also vary within a single note. Although no such note is shown in FIG. 3, its sound intensity is then described in a predetermined format.
The CPU 101 applies these rules to the trajectory being processed to determine the value of the additional attribute to be processed.
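A sketch of the step S150 value rules follows. The L/M/H cut-offs, the 0 to 127 intensity scale, and the linear gain per pitch step are all hypothetical choices made for illustration.

```python
def vibrato_level(amplitude, low=2, high=5):
    """Map the trajectory amplitude to 'L', 'M' or 'H' (cut-offs hypothetical)."""
    if amplitude < low:
        return "L"
    if amplitude < high:
        return "M"
    return "H"


def intensity_from_pitch_diff(start_pitch, end_pitch, base=64, gain=8):
    """Sound intensity on an assumed 0-127 scale.

    A rising trajectory (end above start) raises the intensity above `base`,
    a falling one lowers it; the result is clamped to the scale.
    """
    value = base + gain * (end_pitch - start_pitch)
    return max(0, min(127, value))
```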

In step S160, the CPU 101 rewrites the score data stored in the storage unit 104 according to the results of steps S140 and S150. The CPU 101 also writes the end of the sound generation period stored in the RAM 103 into the score data stored in the storage unit 104. Finally, the CPU 101 updates the piano-roll display according to the rewritten score data.

FIG. 10 illustrates the updated screen. FIG. 10 shows an example in which vibrato has been added to the new note added in FIG. 8. In this example, a symbol indicating that vibrato has been added is displayed inside the note graphic 211. In this example, when a touch is first detected in the window 202 after the non-touch state, the CPU 101 determines the processing target note according to the touched location. Specifically, when a note graphic is touched, the note represented by that note graphic becomes the processing target note; when a location with no note graphic is touched, a new note becomes the processing target note. In either case, the additional attribute to be changed and its value are determined by the trajectory drawn by dragging from the moment the touch is first detected, without passing through the non-touch state. Thus, with the speech synthesizer 1, the user can input the length of a note and a musical modification to it through the intuitive operation of touching the touch panel 111 and drawing a single continuous trajectory.

3. Other Embodiments
The present invention is not limited to the embodiment described above, and various modifications are possible. Several modifications are described below. Two or more of the following modifications may be used in combination.

3-1. Modification 1
FIG. 11 illustrates a screen according to Modification 1. In addition to the elements of the screen shown in FIG. 6, this screen has a window 212 and a window 213. The window 212 is an area that displays an image indicating the current operation mode. In this example, the speech synthesizer 1 operates in a plurality of operation modes, including an input mode and an edit mode. The input mode is an operation mode for inputting new notes. The edit mode is an operation mode for changing the attributes of notes that have already been input. The edit mode is further divided into a vibrato edit mode and a dynamics edit mode. The vibrato edit mode is for changing additional attributes related to vibrato. The dynamics edit mode is for changing additional attributes related to sound intensity.

The window 213 is an area that displays an image indicating the additional attribute to be processed, according to the operation mode. For example, when the operation mode is the vibrato edit mode, the window 213 displays an image representing the vibrato strength and speed and the start and end of the vibrato. This image is, for example, a bar graph drawn in a coordinate system with pitch on the vertical axis and time on the horizontal axis. As another example, when the operation mode is the dynamics edit mode, the window 213 displays an image representing the change in sound intensity over time. This image is, for example, a bar graph drawn in a coordinate system with sound intensity on the vertical axis and time on the horizontal axis.

FIG. 12 is a flowchart showing the operation of the speech synthesizer 1 according to Modification 1. In step S200, the CPU 101 determines whether a mode change instruction has been input. If it determines that a mode change instruction has been input (S200: YES), the CPU 101 proceeds to step S210. If it determines that no mode change instruction has been input (S200: NO), the CPU 101 waits until one is input. The mode change instruction is input by the user operating the touch panel 111. In this example, the CPU 101 determines that a mode change instruction has been input when the state changes from one in which no touch is detected at the position corresponding to the window 212 to one in which a touch is detected there.

In step S210, the CPU 101 switches the operation mode. In this example, each time the user touches the window 212, the operation mode cycles through the input mode, the vibrato edit mode, and the dynamics edit mode in that order, returning to the input mode after the dynamics edit mode. For example, when a mode change instruction is input in the input mode, the CPU 101 switches the operation mode from the input mode to the vibrato edit mode.
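The mode handling in steps S200 to S220 can be modelled as a small state machine that advances only on the rising edge of a touch in the window 212, so that holding a finger down does not cycle through the modes repeatedly. The class and enum names below are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    INPUT = "input"
    VIBRATO_EDIT = "vibrato edit"
    DYNAMICS_EDIT = "dynamics edit"


CYCLE = [Mode.INPUT, Mode.VIBRATO_EDIT, Mode.DYNAMICS_EDIT]


def next_mode(mode):
    """Input -> vibrato edit -> dynamics edit -> input (step S210)."""
    return CYCLE[(CYCLE.index(mode) + 1) % len(CYCLE)]


def is_edit(mode):
    """The check made in step S220."""
    return mode in (Mode.VIBRATO_EDIT, Mode.DYNAMICS_EDIT)


class ModeSwitch:
    """Switch modes when a touch in window 212 begins (step S200)."""

    def __init__(self):
        self.mode = Mode.INPUT
        self._touched = False

    def update(self, touching_window_212):
        # Act only on the transition from "not touched" to "touched".
        if touching_window_212 and not self._touched:
            self.mode = next_mode(self.mode)
        self._touched = touching_window_212
        return self.mode
```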

In step S220, the CPU 101 determines whether the current operation mode is an edit mode. The RAM 103 stores an identifier indicating the current operation mode, and the CPU 101 rewrites this identifier each time the operation mode is switched. The CPU 101 refers to the RAM 103 to determine whether the current operation mode is an edit mode. If the current operation mode is determined to be an edit mode (S220: YES), the CPU 101 proceeds to step S230. If it is determined not to be an edit mode (S220: NO), the CPU 101 returns to step S200.

The processing in step S230 is the same as that in step S100 of FIG. 7. In step S240, the CPU 101 identifies the processing target note. When the touched position corresponds to one of the note graphics displayed in the window 202, the CPU 101 identifies the note represented by that note graphic as the processing target note. For example, when the user touches the note graphic 205 on the screen of FIG. 11, the CPU 101 identifies the note represented by the note graphic 205 (to which the lyric "sa" is assigned) as the processing target note.

In step S250, the CPU 101 determines whether a touch has been detected at a position corresponding to the window 213. If such a touch has been detected (S250: YES), the CPU 101 proceeds to step S260. If not (S250: NO), the CPU 101 waits until a touch is detected.

In step S260, the CPU 101 records the trajectory of the position touched by the user. While the touch is detected, the CPU 101 changes the image displayed in the window 213 according to the trajectory.

FIG. 13 illustrates how the image displayed in the window 213 changes. Before a touch is detected at a position corresponding to the window 213, the window 213 displays an image indicating the current state of the additional attribute to be processed (its predetermined initial value if it has not been changed). In the window 213, the horizontal axis represents normalized time. For example, when the processing target note is a quarter note, the left edge of the window 213 represents time zero and the right edge represents the duration of a quarter note. As another example, when the processing target note is a half note, the left edge represents time zero and the right edge represents the duration of a half note. When a touch is detected in the window 213, the CPU 101 changes the heights of the bars according to the trajectory. FIG. 13(A) illustrates a case in which a trajectory following the arrow has been detected from near the left edge of the window 213 to near its center. In this case, at the positions where the touch was detected, the bar heights change to follow the trajectory. FIG. 13(B) illustrates a case in which the trajectory following the arrow has continued from the state of FIG. 13(A) to near the right edge of the window 213.
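Editing in the window 213 can be sketched as writing bar heights at normalized time positions. The bar count, the top-left pixel origin, and the 0 to `max_value` scale are assumptions for illustration. `bar_time` shows how the same bar index maps to different absolute times for a quarter note and for a half note.

```python
def update_bars(bars, x, y, window_width, window_height, max_value):
    """Set the bar under a touch at pixel (x, y) to the height it implies.

    Bar i covers normalized time [i/n, (i+1)/n) of the note, where n is
    len(bars). y is measured downward from the top of the window (assumed
    screen convention), so touching the top edge gives max_value.
    Touches outside the window leave the bars unchanged.
    """
    n = len(bars)
    if not (0 <= x < window_width):
        return bars
    i = int(x / window_width * n)
    y = min(max(y, 0), window_height)
    bars[i] = round((1 - y / window_height) * max_value)
    return bars


def bar_time(i, n, note_length):
    """Absolute start time of bar i for a note lasting note_length ticks."""
    return note_length * i / n
```

Calling `update_bars` for every recorded point of the trajectory reproduces the behaviour of FIG. 13(A) and (B), where only the bars under the finger change.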

Refer again to FIG. 12. In step S270, the CPU 101 determines whether the touch in the window 213 is no longer detected (that is, whether a no-touch state has been entered). If the touch is still detected (that is, the touch has been detected continuously since it was detected in step S250) (S270: NO), the CPU 101 returns to step S260. In other words, the CPU 101 keeps recording the trajectory for as long as the touch remains continuously detected after step S250. If the touch is no longer detected (S270: YES), the CPU 101 proceeds to step S280.

In step S280, the CPU 101 determines the value of the additional attribute (parameter). The type of additional attribute is determined by the operation mode. For example, when a trajectory such as that in FIG. 13(B) is detected in the vibrato edit mode, the CPU 101 determines the vibrato strength, speed, and start according to the trajectory.
The processing in step S290 is the same as that in step S160.

As described above, in Modification 1, the attribute changing means 15 treats a note selected from the note string according to the position detected by the position detecting means 14 as the processing target note, and changes the additional attribute of the processing target note stored in the storage means 11 according to the trajectory of the positions detected by the position detecting means 14. In Modification 1, three kinds of instructions are input by touching or dragging. The first is an instruction to change the mode (steps S200 and S210). The second is an instruction specifying the processing target note (steps S230 and S240). The third is an instruction specifying parameter values (steps S250 to S280). The non-touch state occurs between the first and second instructions and between the second and third instructions. Compared with the embodiment described above, this requires more operations, but because the instructions are separated from one another, erroneous operations can be reduced. In addition, the user can intuitively input changes to the additional attribute in a window larger than the note graphic, that is, with greater precision.

3-2. Modification 2
The parameters changed according to the trajectory are not limited to those described in the embodiment. For example, the following parameters may be changed (those already described in the embodiment are listed again).
(1) Vibrato
For a trajectory that repeatedly moves up and down, an additional attribute related to vibrato is changed. As another example, if the touch panel 111 can obtain the touched area, the vibrato is made stronger when the panel is touched firmly (when the touched area is large) and weaker when it is touched lightly (when the touched area is small). As yet another example, the vibrato is made faster when the panel is touched firmly and slower when it is touched lightly.
(2) Sound intensity
For a trajectory rising to the right, the sound is made gradually stronger, and for a trajectory falling to the right, gradually weaker (the intensity varies over time within a single note). Alternatively, for a trajectory rising to the right, the sound is made stronger than before, and for a trajectory falling to the right, weaker than before (the intensity does not vary over time within a single note). As yet another example, if the touch panel 111 can obtain the touched area, the sound is made stronger when the panel is touched firmly and weaker when it is touched lightly.
(3) Pitch
For a trajectory rising to the right, the pitch is raised gradually, and for a trajectory falling to the right, lowered gradually (the pitch varies over time within a single note). Alternatively, for a trajectory rising to the right, the pitch is made higher than before, and for a trajectory falling to the right, lower than before (the pitch does not vary over time within a single note). As yet another example, if the touch panel 111 can obtain the touched area, the pitch is raised when the panel is touched firmly and lowered when it is touched lightly.
(4) Transition time between sounds
The allocation of transition time in the segment data is changed according to the drag speed. When the drag is fast (the time between the start point and the end point of the trajectory is short), the crossfade with the preceding and following sounds is lengthened. When the drag is slow, the crossfade with the preceding and following sounds is shortened.
(5) Breath
Whether a breath is inserted is switched according to the drag speed. When the drag is fast, no breath is inserted. When the drag is slow, a breath is inserted between the note and the preceding sound. In this case, the score data includes an additional attribute indicating whether a breath is present.
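Items (4) and (5) both depend only on the drag speed. The sketch below measures speed as the duration of the trajectory and maps it to a crossfade length and a breath flag. Every threshold and length here is a hypothetical value, not one given in the source, and the linear interpolation between the fast and slow cases is an assumption.

```python
def drag_duration(sample_times):
    """Time between the start point and the end point of the trajectory (s)."""
    return sample_times[-1] - sample_times[0]


def crossfade_ms(duration_s, fast_s=0.3, slow_s=1.5, long_ms=120.0, short_ms=20.0):
    """Fast drag -> long crossfade, slow drag -> short crossfade (item (4))."""
    if duration_s <= fast_s:
        return long_ms
    if duration_s >= slow_s:
        return short_ms
    frac = (duration_s - fast_s) / (slow_s - fast_s)
    return long_ms + frac * (short_ms - long_ms)


def insert_breath(duration_s, slow_s=1.0):
    """A slow drag inserts a breath before the note; a fast one does not (item (5))."""
    return duration_s >= slow_s
```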

3-3. Modification 3
The way a parameter is changed according to the trajectory is not limited to that described in the embodiment. For example, when a trajectory that repeatedly moves up and down is input and an additional attribute related to vibrato is changed, the additional attribute may be changed by any one of the following methods, or by a combination of two or more of them.
(1) Only the decision to apply vibrato is made; predetermined initial values are used for the vibrato strength, speed, and start.
(2) The vibrato strength is determined according to the amplitude extracted from the trajectory as a feature parameter (the larger the amplitude, the stronger the vibrato).
(3) The vibrato speed is determined according to the number of extrema extracted from the trajectory as a feature parameter (the more extrema, the faster the vibrato).
(4) The vibrato start is determined according to the time-axis coordinate of the start point extracted from the trajectory as a feature parameter (the later the start point, the later the vibrato starts).
(5) The recorded trajectory is quantized at a predetermined resolution, and the vibrato waveform is determined according to the quantized trajectory.
(6) When the coordinates of the extrema are extracted from the trajectory as feature parameters, the vibrato speed is determined according to the change in the time intervals between extrema (as the intervals become progressively shorter, that is, as the extrema become denser, the vibrato becomes progressively faster within a single note).
(7) When the coordinates of the extrema are extracted from the trajectory as feature parameters, the vibrato strength is determined according to the change over time in amplitude (the pitch difference between two extrema adjacent on the time axis) (as the amplitude grows, the vibrato becomes progressively stronger within a single note).
Any of methods (1) to (7) that are applicable may also be applied to additional attributes other than vibrato.
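Methods (5) to (7) can be read as quantizing the trajectory and computing a local vibrato rate and depth from each pair of adjacent extrema. The sketch below assumes that extrema are (seconds, pitch) pairs and treats two adjacent extrema as half a vibrato cycle. The function names are hypothetical.

```python
def vibrato_curves(extrema_pts):
    """Local vibrato rate and depth from consecutive extrema.

    extrema_pts: (t_seconds, pitch) extrema in time order. Two adjacent
    extrema span half a cycle, so the local rate is 1 / (2 * dt) Hz and the
    local depth is the pitch difference between them (method (7)).
    Returns a list of (t_mid, rate_hz, depth).
    """
    curves = []
    for (t0, p0), (t1, p1) in zip(extrema_pts, extrema_pts[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # ignore duplicate or out-of-order samples
        curves.append(((t0 + t1) / 2, 1.0 / (2 * dt), abs(p1 - p0)))
    return curves


def quantize(points, dt, dp):
    """Snap (t, pitch) samples to a grid of dt by dp (method (5))."""
    return [(round(t / dt) * dt, round(p / dp) * dp) for t, p in points]
```

With extrema at 0, 0.25, 0.375 and 0.4375 s the intervals halve each time, so the rate rises from 2 Hz to 8 Hz within the note, which is the behaviour described in method (6).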

3-4. Modification 4
After the change to the additional attribute has been finalized according to the trajectory, a confirmation sound may be output. The confirmation sound is a voice synthesized for the processing target note from its assigned lyric and attributes (basic and additional). For example, when vibrato is added to the processing target note, the CPU 101 may synthesize a voice with the vibrato applied and output it from the speaker 109. With this example, the user can easily check the changed additional attribute.

3-5. Modification 5
The value of an additional attribute may be reset to a predetermined value ("no value" or an initial value) according to the trajectory. For example, when the time-axis coordinate of the end point is smaller than that of the start point, that is, when the user drags in the negative (reverse) direction of the time axis, the CPU 101 may clear the value of the additional attribute (set it to "no value"). In this case, the additional attribute to be cleared may be identified according to the shape of the trajectory. For example, when a trajectory that oscillates up and down while moving in the negative direction of the time axis is obtained, the CPU 101 may rewrite the values of the vibrato-related parameters as "no value". In this case, the basic attributes of the processing target note may be deleted as well as the additional attributes. In another example, when the processing target note is double-clicked (that is, when, after a touch is detected on the touch panel 111 and the processing target note is determined, the touch is released and a touch is detected again at the position corresponding to the processing target note within a predetermined time), the CPU 101 may delete the additional attributes or the basic attributes of the processing target note. In yet another example, when the touch panel 111 supports multi-touch (that is, it can detect a plurality of positions simultaneously), the CPU 101 may delete the additional attributes or the basic attributes of the processing target note when a first finger touches the processing target note and a second finger touches a place other than the processing target note (that is, when a first touch is detected at the position corresponding to one note, the processing target note, and a second touch is detected at a place other than the processing target note while the first touch is still detected).
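The reverse-drag and double-click rules of Modification 5 can be sketched in Python under stated assumptions: positions are (time-axis, pitch-axis) coordinates, "no value" is represented by None, the attribute key "vibrato" is hypothetical, a trajectory counts as oscillating when its pitch direction reverses at least twice (an assumed threshold), and DOUBLE_TAP_WINDOW_S is a hypothetical stand-in for the predetermined time. The multi-touch variant is not covered.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (time-axis coordinate, pitch-axis coordinate) of one detected position
Point = Tuple[float, float]

# Hypothetical stand-in for the "predetermined time" of the double-click rule.
DOUBLE_TAP_WINDOW_S = 0.3


@dataclass
class Note:
    start: float
    pitch: int
    length: float
    # Additional attributes; None stands for "no value".
    extra: Dict[str, Optional[float]] = field(default_factory=dict)


def count_pitch_reversals(traj: List[Point]) -> int:
    """Count how often the pitch-axis movement changes direction (up/down)."""
    diffs = [b[1] - a[1] for a, b in zip(traj, traj[1:]) if b[1] != a[1]]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)


def apply_reverse_drag(note: Note, traj: List[Point]) -> bool:
    """Clear additional attributes if the drag ran backwards along the time axis.

    Returns True if the gesture was treated as a clear command.
    """
    if traj[-1][0] >= traj[0][0]:
        return False  # end point is not earlier than the start point: not a reverse drag
    if count_pitch_reversals(traj) >= 2:
        # Up-and-down oscillation: clear only the vibrato-related parameter.
        note.extra["vibrato"] = None
    else:
        # Plain reverse drag: clear every additional attribute (keys are kept).
        for key in note.extra:
            note.extra[key] = None
    return True


def is_double_tap(release_time: float, retouch_time: float, retouch_hits_note: bool) -> bool:
    """True if the same note is touched again within the window after release."""
    return retouch_hits_note and 0.0 <= retouch_time - release_time <= DOUBLE_TAP_WINDOW_S
```

Keeping the keys and setting their values to None, rather than deleting them, mirrors the "no value" wording; resetting to an initial value instead would only change the assignment.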

3-6. Other Modifications
The speech synthesizer 1 need not have the touch panel 111. For example, the speech synthesizer 1 may have a mouse, a keypad, or a pen tablet as the input unit 105. The speech synthesizer 1 is also not limited to a touch-panel information display device; it may be a personal computer, a mobile phone, a portable game machine, a portable music player, or an electronic book reader.

The hardware configuration of the speech synthesizer 1 is not limited to that described with reference to FIG. 4. The speech synthesizer 1 may have any hardware configuration as long as it can implement the functions shown in FIG. 1. For example, the speech synthesizer 1 may have dedicated hardware (circuits) corresponding to each of the functional elements shown in FIG. 1. In another example, some of the hardware components of the speech synthesizer 1 illustrated in FIG. 4 may be external devices; for example, the display unit 106 or the speaker 109 may be an external device.

The character string is not limited to hiragana. Alphabet letters, phonetic symbols, or the like may be used as the character string indicating the lyrics.
The characters displayed in the note graphics in the piano roll display are not limited to those described in the embodiment. For example, the corresponding phonetic symbols may be displayed together with the hiragana that form part of the lyrics.
The structure of the score data is not limited to that illustrated in FIG. 3. Data of any structure may be used as long as the correspondence between the notes and the lyrics and the attributes of the notes can be identified. Although the embodiment describes an example in which the lyrics (character string) and the score data are separate data sets, the lyrics may be part of the score data. In another example, the score data need not include phonetic symbols; that is, the present invention may be applied to a music editing apparatus that has no speech synthesis function.
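A hypothetical score-data layout meeting the conditions above (the note-to-lyric correspondence and the note attributes are identifiable, the lyrics live inside the score data, and phonetic symbols are optional) might look like the following Python sketch. It is not the structure of FIG. 3; all class names, field names, and units are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NoteEvent:
    start: float                    # start of the sound generation period (beats, assumed unit)
    pitch: int                      # MIDI-style note number (assumption)
    length: float                   # note length in beats
    lyric: str = ""                 # e.g. one hiragana character or an alphabet syllable
    phonetic: Optional[str] = None  # may be absent when no speech synthesis is needed
    additional: Dict[str, float] = field(default_factory=dict)  # musical modifications


@dataclass
class Score:
    notes: List[NoteEvent] = field(default_factory=list)

    def assign_lyrics(self, syllables: List[str]) -> None:
        """Attach lyrics one syllable per note, in time order of the notes."""
        for note, syllable in zip(sorted(self.notes, key=lambda n: n.start), syllables):
            note.lyric = syllable

    def can_synthesize(self) -> bool:
        """Speech synthesis in this sketch requires a phonetic symbol on every note."""
        return bool(self.notes) and all(n.phonetic for n in self.notes)
```

A score whose notes carry no phonetic symbols is still a valid score here; it simply reports that it cannot be synthesized, which corresponds to using the editor without a speech synthesis function.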

The details of the speech synthesis process are not limited to those described in the embodiment. Any process may be used as long as, given a note and a phonetic symbol (character), it synthesizes a voice corresponding to that note and phonetic symbol.
The feature parameters extracted from the trajectory are not limited to those described in the embodiment. Any parameter may be used as long as it represents a feature of the trajectory.
The method of associating a trajectory with the additional attribute to be processed is not limited to extracting feature parameters. For example, the trajectory may be normalized and its degree of match with a reference waveform calculated.
The method of displaying the changed additional attribute is not limited to that illustrated in FIG. 10. For example, the color or the shape of the note graphic may be changed.
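The alternative matching approach mentioned above, normalizing the trajectory and computing its degree of match with reference waveforms, can be sketched in Python as follows. The resampling length, the use of Pearson correlation as the degree of match, and the three reference waveforms and their names are assumptions for illustration. A practical system would likely need several vibrato templates at different rates, since one fixed-rate sine matches poorly when the drawn rate differs.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (time-axis coordinate, pitch-axis coordinate)


def resample(traj: List[Point], n: int) -> List[float]:
    """Linearly interpolate the pitch coordinate at n evenly spaced times (needs n >= 2)."""
    t0, t1 = traj[0][0], traj[-1][0]
    out, j = [], 0
    for i in range(n):
        t = t0 + (t1 - t0) * i / (n - 1)
        # Advance to the segment [traj[j], traj[j + 1]] that contains t.
        while j < len(traj) - 2 and traj[j + 1][0] < t:
            j += 1
        (ta, pa), (tb, pb) = traj[j], traj[j + 1]
        w = 0.0 if tb == ta else (t - ta) / (tb - ta)
        out.append(pa + w * (pb - pa))
    return out


def normalize(values: List[float]) -> List[float]:
    """Remove the mean and scale to unit length (a flat input stays all zeros)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm > 0 else centered


# Hypothetical reference waveforms over x in [0, 1]; the names are illustrative only.
REFERENCES: Dict[str, Callable[[float], float]] = {
    "vibrato": lambda x: math.sin(2 * math.pi * 3 * x),
    "ramp_up": lambda x: x,
    "ramp_down": lambda x: -x,
}


def best_match(traj: List[Point], n: int = 64) -> Tuple[str, float]:
    """Return the reference with the highest degree of match and its score."""
    probe = normalize(resample(traj, n))
    scores = {}
    for name, f in REFERENCES.items():
        ref = normalize([f(i / (n - 1)) for i in range(n)])
        # Dot product of two normalized vectors = Pearson correlation, in [-1, 1].
        scores[name] = sum(a * b for a, b in zip(probe, ref))
    name = max(scores, key=scores.get)
    return name, scores[name]
```

Normalizing both signals removes the absolute pitch and the drawn size, so only the shape of the gesture decides which additional attribute is selected.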

The speech synthesis program described in the above embodiment may be provided stored on a computer-readable recording medium such as a magnetic recording medium (magnetic tape, magnetic disk (HDD, FD (Flexible Disk)), etc.), an optical recording medium (optical disc (CD (Compact Disk), DVD (Digital Versatile Disk)), etc.), a magneto-optical recording medium, or a semiconductor memory (flash ROM, etc.). The program may also be downloaded via a network such as the Internet.

DESCRIPTION OF SYMBOLS 1 ... speech synthesizer, 11 ... storage means, 12 ... display control means, 13 ... display means, 14 ... position detection means, 15 ... attribute changing means, 16 ... speech synthesis means, 17 ... voice output means, 101 ... CPU, 102 ... ROM, 103 ... RAM, 104 ... storage unit, 105 ... input unit, 106 ... display unit, 107 ... DAC, 108 ... amplifier, 109 ... speaker, 111 ... touch panel, 201 ... input box, 202 ... window, 203 ... guide graphic, 204 ... note graphic, 205 ... note graphic, 206 ... note graphic, 207 ... note graphic, 208 ... note graphic, 209 ... play button, 210 ... stop button, 211 ... window, 212 ... window, 213 ... window, 303 ... individual database, 3030 ... segment data group, 3031 ... segment data group, 3032 ... segment data group

Claims

A music editing apparatus comprising:
attribute storage means for storing, for each of a plurality of notes, attributes including the start of the sound generation period of the note, the pitch, the note length, and an additional attribute indicating a musical modification applied to the note;
display control means for causing a screen of display means to display, in accordance with a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sound generation period, the pitch, and the note length of each of the plurality of notes;
position detection means for detecting a position designated on the screen;
attribute changing means for, when the start point of a trajectory of positions continuously detected by the position detection means is not within the region corresponding to any of the plurality of notes, generating, as a processing target note, a new note whose sound generation period starts at a time corresponding to the start point, determining the pitch, note length, and additional attribute of the processing target note according to the trajectory, and writing the determined pitch, note length, and additional attribute to the attribute storage means; and
data generation means for generating music data corresponding to the plurality of notes using the attributes stored in the attribute storage means.
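As an illustration only, the note-creation branch of the attribute changing means in the claim above might be sketched in Python as follows. The snap grid, the rule that pitch follows the start point and length follows the end point, and the reading of up-and-down motion as vibrato are assumptions, not limitations taken from the claim; the trajectory is assumed to be already converted from screen coordinates to (beats, semitones).

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (time in beats, pitch in semitones), assumed units

GRID = 0.25  # hypothetical snap grid in beats


@dataclass
class Note:
    start: float
    pitch: int
    length: float
    extra: Dict[str, float] = field(default_factory=dict)  # additional attributes


def row_of(p: float) -> int:
    """Piano-roll row (semitone) containing pitch coordinate p."""
    return math.floor(p + 0.5)


def hit_test(notes: List[Note], t: float, p: float) -> Optional[Note]:
    """Return the note whose graphic region contains (t, p), if any."""
    for note in notes:
        if note.start <= t < note.start + note.length and note.pitch == row_of(p):
            return note
    return None


def create_note_from_trajectory(notes: List[Note], traj: List[Point]) -> Optional[Note]:
    """If the trajectory starts outside every note, add a new note shaped by it."""
    t0, p0 = traj[0]
    if hit_test(notes, t0, p0) is not None:
        return None  # started on an existing note: handled by the editing path instead
    start = math.floor(t0 / GRID) * GRID  # start of the sound generation period follows the start point
    end = traj[-1][0]
    length = max(GRID, round((end - start) / GRID) * GRID)  # length follows the end point
    note = Note(start=start, pitch=row_of(p0), length=length)
    # Count up/down direction changes of the pitch coordinate.
    pitches = [p for _, p in traj]
    diffs = [b - a for a, b in zip(pitches, pitches[1:]) if b != a]
    reversals = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    if reversals >= 2:  # up-and-down motion read as vibrato (assumption)
        note.extra["vibrato_depth"] = (max(pitches) - min(pitches)) / 2.0
    notes.append(note)
    return note
```

Returning None when the start point hits an existing note keeps the two cases apart: a trajectory starting on a note edits that note, while one starting on empty space creates a note.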

A music editing apparatus comprising:
attribute storage means for storing, for each of a plurality of notes, attributes including the start of the sound generation period of the note, the pitch, the note length, and an additional attribute indicating a musical modification applied to the note;
display control means for causing a first region of a screen of display means to display, in accordance with a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sound generation period, the pitch, and the note length of each of the plurality of notes;
position detection means for detecting a position designated on the screen;
attribute changing means for taking, as a processing target note, one note selected from the plurality of notes according to the position detected by the position detection means, and changing the additional attribute of the processing target note stored in the attribute storage means according to a trajectory of positions continuously detected by the position detection means; and
data generation means for generating music data corresponding to the plurality of notes using the attributes stored in the attribute storage means,
wherein the display control means causes a second region of the screen to display an image indicating the additional attribute of the processing target note, and
when the attribute changing means changes the additional attribute of the processing target note, the display control means changes the image displayed in the second region in accordance with the change of the additional attribute.

The music editing apparatus according to claim 1 or 2, wherein
the additional attribute includes a plurality of parameters each indicating a different musical modification, and
the attribute changing means changes, for the processing target note, the value of one parameter selected from the plurality of parameters according to the trajectory, the value being changed according to the trajectory.

The music editing apparatus according to any one of claims 1 to 3, wherein the attribute changing means changes the additional attribute according to the speed at which the trajectory is drawn.
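Claim 4 leaves open how drawing speed maps to the additional attribute. One hypothetical mapping, sketched in Python below, computes the average speed along the drawn path from timestamped touch samples and scales an attribute value by it within a clamped range. The sample format and the parameters ref_speed, lo, and hi are illustrative assumptions, as is the choice that faster drawing enlarges the value.

```python
import math
from typing import List, Tuple

# One touch sample: (wall-clock time in seconds, x in pixels, y in pixels).
Sample = Tuple[float, float, float]


def drawing_speed(samples: List[Sample]) -> float:
    """Average speed along the drawn path, in pixels per second."""
    path = sum(math.hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:]))
    elapsed = samples[-1][0] - samples[0][0]
    return path / elapsed if elapsed > 0 else 0.0


def scale_by_speed(base: float, speed: float,
                   ref_speed: float = 500.0, lo: float = 0.5, hi: float = 2.0) -> float:
    """Scale an additional-attribute value by how fast the trajectory was drawn.

    Drawing faster than ref_speed enlarges the value and drawing slower shrinks it;
    the scale factor is clamped to [lo, hi].
    """
    factor = min(hi, max(lo, speed / ref_speed))
    return base * factor
```

Using wall-clock time rather than the time-axis coordinate matters here: the same shape drawn quickly or slowly covers the same musical span, and only the drawing speed distinguishes the two.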

A program for causing a computer to function as:
attribute storage means for storing, for each of a plurality of notes, attributes including the start of the sound generation period of the note, the pitch, the note length, and an additional attribute indicating a musical modification applied to the note;
display control means for causing a screen of display means to display, in accordance with a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sound generation period, the pitch, and the note length of each of the plurality of notes;
position detection means for detecting a position designated on the screen;
attribute changing means for, when the start point of a trajectory of positions continuously detected by the position detection means is not within the region corresponding to any of the plurality of notes, generating, as a processing target note, a new note whose sound generation period starts at a time corresponding to the start point, determining the pitch, note length, and additional attribute of the processing target note according to the trajectory, and writing the determined pitch, note length, and additional attribute to the attribute storage means; and
data generation means for generating music data corresponding to the plurality of notes using the attributes stored in the attribute storage means.

A program for causing a computer to function as:
attribute storage means for storing, for each of a plurality of notes, attributes including the start of the sound generation period of the note, the pitch, the note length, and an additional attribute indicating a musical modification applied to the note;
display control means for causing a first region of a screen of display means to display, in accordance with a coordinate system having a first axis representing pitch and a second axis representing time, graphics representing the start of the sound generation period, the pitch, and the note length of each of the plurality of notes;
position detection means for detecting a position designated on the screen;
attribute changing means for taking, as a processing target note, one note selected from the plurality of notes according to the position detected by the position detection means, and changing the additional attribute of the processing target note stored in the attribute storage means according to a trajectory of positions continuously detected by the position detection means; and
data generation means for generating music data corresponding to the plurality of notes using the attributes stored in the attribute storage means,
wherein the display control means causes a second region of the screen to display an image indicating the additional attribute of the processing target note, and
when the attribute changing means changes the additional attribute of the processing target note, the display control means changes the image displayed in the second region in accordance with the change of the additional attribute.