JP2010008938A

JP2010008938A - Voice recorder and speech recording method

Info

Publication number: JP2010008938A
Application number: JP2008171263A
Authority: JP
Inventors: Shinji Torigoe; 真志鳥越
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2008-06-30
Filing date: 2008-06-30
Publication date: 2010-01-14

Abstract

PROBLEM TO BE SOLVED: To provide a technique that makes it easy to reproduce a desired recorded part within a recording range, without requiring user operation during recording.

SOLUTION: When a person speaks, the subject often changes as the talk proceeds. When the subject changes, connecting words such as "Now then" and "By the way" are often used. With this in mind, when speech is recorded, speech input of such connecting words is detected by speech recognition. The speech data obtained by the speech input is stored in a memory so that it can be divided according to the detection result. Each recorded part delimited by the detection of a connecting word is treated as one reproduction unit. The user selects a reproduction unit from a display of the character string obtained by performing speech recognition on the head part of each reproduction unit.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a technique for converting voice into digital data, inputting it, and storing it.

In recent years, voice recorders, which convert voice into digital data for input and storage, have become smaller and lighter with advances in semiconductor technology (Patent Document 1). As a result, they have come into wide use as voice recording devices.

Voice recording, that is, storage of the digital data obtained by converting voice, is usually performed using semiconductor memory. Thanks to advances in semiconductor technology, voice recorders either have large-capacity semiconductor memory built in or can accept it as an attachment. As a result, many voice recorders capable of long recordings have been commercialized. Audio compression is normally used when storing the digital data.

The digital data obtained within the recording time (range) from the start to the end of recording is normally handled as one file. After a long recording, it is troublesome to search the recording time for the portion considered important. For this reason, some voice recorders allow cue points to be registered during recording.

In conventional voice recorders that can register cue points, the user registers each point by a predetermined operation. To perform that operation, the user must constantly judge whether it is needed. This keeps the user from concentrating on what actually deserves attention. For example, it makes the content of the talk being recorded harder to understand and remember. When a conversation is being recorded, it can therefore make it harder to respond appropriately to the other party. For these reasons, assuming long recordings, it is considered important to make it easier to find and reproduce the desired recorded portion within the recording range without requiring any operation by the user.
Japanese Patent Laid-Open No. 2002-221996

An object of the present invention is to provide a technique that makes it easier to reproduce a desired recorded portion within a recording range without requiring the user to perform any operation during recording.

The voice recorder according to the first aspect of the present invention presupposes voice input means for converting voice into digital data and inputting it, and storage means for storing the digital data output by the voice input means, and comprises: voice recognition means for performing voice recognition using the digital data output by the voice input means; delimiter specifying means for specifying, using the result of voice recognition by the voice recognition means, a break in the content spoken by the person whose voice the voice input means inputs; and control means for storing the digital data continuously output by the voice input means in the storage means so that it can be divided into a plurality of digital data groups according to the break specified by the delimiter specifying means.

The digital data stored in the storage means may be the data obtained by voice input as is, or may be that data after compression. In other words, the type of digital data is not particularly limited. The division into a plurality of digital data groups may be performed physically, or it may be made possible logically. The invention therefore does not restrict how the digital data is stored in the storage means.

In addition to the configuration of the first aspect, the voice recorder of the second aspect further comprises display means capable of displaying various information and reproducing means for reproducing the digital data stored in the storage means. For each digital data group stored in the storage means so as to be divisible at the breaks specified by the delimiter specifying means, the control means causes the display means to display the result of voice recognition performed by the voice recognition means on that digital data group, and lets the user select the digital data group to be reproduced by the reproducing means.

The result of voice recognition displayed on the display means is desirably at least one of the head character string indicated by the leading portion of the digital data group and a pre-registered character string that appears in the entire character string indicated by the digital data group.

The delimiter specifying means desirably specifies a break by using the result of voice recognition by the voice recognition means to detect a connecting word used when changing the topic of the talk input through the voice input means.

The voice recording method of the present invention is used when voice is converted into digital data, input, and stored, and comprises: a voice recognition step of performing voice recognition using the digital data obtained by the input; a delimiter specifying step of specifying, using the result of voice recognition obtained in the voice recognition step, a break in the content spoken by the person inputting the voice; and a storage step of storing the digital data continuously obtained by converting the voice in storage means so that it can be divided into a plurality of digital data groups according to the break specified in the delimiter specifying step.

In the present invention, voice recognition is performed using the digital data obtained by voice input, the result of that voice recognition is used to specify a break in the content spoken by the person inputting the voice, and, according to the specified break, the digital data continuously obtained by voice input is stored so that it can be divided into a plurality of digital data groups.

People usually speak in a way that is easy for listeners to understand. For someone who was listening to the recorded voice, a change of topic, for example, is therefore perceived as a break in the content of the talk and tends to stay in memory. Since what is said usually differs from topic to topic, the talk on one topic often corresponds to the range that the user wants to reproduce. If the digital data is stored so that it can be divided at breaks in the content of the talk, reproduction can be performed separately for each recording range in which a different topic was discussed. The recording range to be reproduced can therefore be extracted, and reproduced, more easily and quickly.

A change of topic is often marked by connecting words such as "Now then" and "By the way". Focusing on connecting words therefore makes it possible to specify breaks in the content of the talk with high accuracy.

If, for each recording range, that is, each digital data group, a result of voice recognition is displayed, for example the character string obtained by recognizing the head portion (the opening words) or those pre-registered character strings that appear in it, and the user is asked to select the recording range to reproduce, the user can select the desired recording range even more easily. This is because both the opening words and the registered character strings that appear are information that helps the user recall or guess the content of the talk in the corresponding recording range.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing the configuration of the voice recorder according to the present embodiment. The voice recorder (hereinafter abbreviated as "recorder") comprises: a recording unit 101 that converts voice into digital data and inputs it; a microcomputer 102 that controls the entire recorder; a playback unit 103 for reproducing recorded voice (digital data); a continuous recording memory (hereinafter abbreviated as "recording memory") 104 that stores the digital data output by the recording unit 101; a memory 105 for saving voice; a voice recognition unit 106; a dictionary unit 107 storing a dictionary for recognizing the meaning of character strings recognized by the voice recognition unit 106; an operation unit 108 with which the user performs various operations; a display unit 109, for example a liquid crystal display device; and an internal bus 110 connecting the units 101 to 109 to one another.

The recording unit 101 comprises a microphone 121 for inputting voice, an amplifier circuit 122 that amplifies the analog signal output by the microphone 121, and an A/D conversion circuit 123 that converts the analog signal output by the amplifier circuit 122 into digital data. The playback unit 103 comprises a D/A conversion circuit 131 that converts digital data into an analog signal, an amplifier circuit 132 that amplifies the analog signal output by the D/A conversion circuit 131, and a speaker 133 that converts the analog signal output from the amplifier circuit 132 into sound.

The digital data output from the recording unit 101 is, for example, subjected to predetermined compression by the microcomputer 102 and stored in the recording memory 104. The memory 105 stores either the compressed digital data held in the recording memory 104 or digital data to which a different compression has been applied. Compressed digital data read from the recording memory 104 or the memory 105 is therefore reproduced by having the microcomputer 102 decompress it and output it to the playback unit 103. For convenience, however, the description here omits compression and decompression; that is, it assumes that the digital data output from the recording unit 101 is stored as is in the recording memory 104 or the memory 105. This digital data is hereinafter called "audio data", and the analog voice signal is hereinafter called the "audio signal".

When recording voice, the A/D conversion circuit 123 of the recording unit 101 samples the audio signal output from the amplifier circuit 122 at a predetermined sampling period, converts it into audio data, and outputs it. The microcomputer 102 stores the audio data in the recording memory 104. The audio data saved in the memory 105 is then taken from the recording memory 104. Audio data is routed through the recording memory 104 in this way because that memory holds the audio data of the most recent predetermined period, which allows recording to begin from audio data a fixed time before the user instructed the start of recording. This design assumes that a certain amount of time often passes between the user recognizing the need to record and actually giving the instruction to start.
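
The pre-recording behavior described above can be illustrated with a ring buffer. The following is a minimal sketch, not the implementation disclosed in this document: the class name `PreRollRecorder`, its parameters, and the use of a Python `deque` in place of the recording memory 104 are all assumptions made for illustration.

```python
from collections import deque


class PreRollRecorder:
    """Hypothetical model: keeps recent samples so recording can start in the past."""

    def __init__(self, sample_rate_hz: int, pre_roll_seconds: float) -> None:
        # Stands in for the recording memory 104: a fixed-size ring buffer
        # that always holds the most recent pre_roll_seconds of audio data.
        self.ring = deque(maxlen=int(sample_rate_hz * pre_roll_seconds))
        # Stands in for the memory 105: None until the user starts recording.
        self.saved = None

    def on_sample(self, sample: int) -> None:
        """Called once per sampling period with the A/D converter output."""
        self.ring.append(sample)
        if self.saved is not None:
            self.saved.append(sample)

    def start_recording(self) -> None:
        """Seed the saved recording with the buffered past samples."""
        self.saved = list(self.ring)


recorder = PreRollRecorder(sample_rate_hz=4, pre_roll_seconds=1.0)
for value in range(10):
    recorder.on_sample(value)      # samples 0..9 arrive before the user reacts
recorder.start_recording()         # the last 4 samples (6..9) are kept
recorder.on_sample(10)
print(recorder.saved)              # [6, 7, 8, 9, 10]
```

Using a bounded `deque` means old samples are discarded automatically, which matches the idea that only the most recent predetermined period is retained.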

Audio data saved in the memory 105 via the recording memory 104 is reproduced by having the microcomputer 102 read the audio data from the memory 105 sequentially and output it to the D/A conversion circuit 131 of the playback unit 103 at the sampling period. The conversion circuit 131 converts the audio data into an audio signal. The audio signal is input to the speaker 133 via the amplifier circuit 132, and sound is emitted.

FIG. 2 is a diagram explaining the audio data saved in the memory 105.
In the memory 105, a record area, which is an area for saving audio data, is secured, for example logically, and the audio data for one recording, from start to end, is saved separately within that record area. The audio data for one recording is hereinafter also called "recording data". In the figure, "recording data (1)", "recording data (2)", and so on each represent a file (audio file) in which recording data is saved. The area in which an audio file is stored is called a "recorded audio area".

At the head of an audio file, the date and time at which recording started is stored. The elapsed time that follows is the recording time from the start to the end of recording; the date and time and the elapsed time together constitute the header. The recording data is stored after the header.

The audio data is stored divided by delimiter data. Delimiter data indicates the range of a group of audio data that serves as a reproduction unit within one audio file; for example, it contains unique data marking the end of the audio data of a reproduction unit together with the recording time (elapsed time) of that reproduction unit. The position (timing) at which delimiter data is inserted is determined using the voice recognition result from the voice recognition unit 106.
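
To make the role of the delimiter data concrete, the sketch below splits the body of an audio file into reproduction units. It is only an illustration under stated assumptions: the document does not specify a byte layout, so the body is modeled as a Python list, and the names `Delimiter` and `split_reproduction_units` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Delimiter:
    """Hypothetical marker for the end of one reproduction unit."""
    elapsed_seconds: float  # recording time of the unit that just ended


Item = Union[int, Delimiter]


def split_reproduction_units(body: List[Item]) -> List[Tuple[List[int], float]]:
    """Return (samples, recording time) for each delimiter-terminated unit.

    Assumption: every unit ends with a Delimiter, so samples after the
    last delimiter are treated as unterminated and are not returned.
    """
    units: List[Tuple[List[int], float]] = []
    current: List[int] = []
    for item in body:
        if isinstance(item, Delimiter):
            units.append((current, item.elapsed_seconds))
            current = []
        else:
            current.append(item)
    return units


body = [1, 2, Delimiter(0.5), 3, Delimiter(0.25)]
print(split_reproduction_units(body))  # [([1, 2], 0.5), ([3], 0.25)]
```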

When people talk, they often change the topic as they go. When changing the topic, they often use connecting words (conjunctions) such as "さて" ("Now then") and "ところで" ("By the way"). When making an important point, or a point they want to emphasize, they often pause. For these reasons, in the present embodiment, when a connecting word is detected by voice recognition, or when a period is detected in which the value of the audio data stays at or below a threshold for a certain time or longer (hereinafter called a "silent period" for convenience), the connecting word or silent period is taken to represent a break in the content being spoken, and delimiter data is inserted automatically. This makes it possible to reproduce the audio data using, as the reproduction unit, ranges of talk divided according to changes in the content such as topic or importance, so that the user can more easily reproduce the desired recorded portion.
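
The two break conditions described above, a connecting word or a silent period, can be sketched as follows. This is an illustrative sketch only: the threshold, the minimum length, the function names, and the representation of recognition results as `(sample_index, word)` pairs are assumptions, and the speech recognizer itself is outside the scope of the example.

```python
from typing import Iterable, List, Sequence, Tuple

# The two connecting words named in the text; a real list would be longer.
CONNECTING_WORDS = {"さて", "ところで"}


def find_silent_periods(
    samples: Sequence[int], threshold: int, min_length: int
) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where |sample| <= threshold
    for at least min_length consecutive samples."""
    periods: List[Tuple[int, int]] = []
    run_start = None
    for i, sample in enumerate(samples):
        if abs(sample) <= threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                periods.append((run_start, i))
            run_start = None
    # A quiet run that lasts until the end of the data also counts.
    if run_start is not None and len(samples) - run_start >= min_length:
        periods.append((run_start, len(samples)))
    return periods


def find_connecting_words(words: Iterable[Tuple[int, str]]) -> List[int]:
    """Return the sample indices at which a connecting word was recognized."""
    return [index for index, word in words if word in CONNECTING_WORDS]


print(find_silent_periods([5, 0, 0, 0, 7, 0, 9, 0, 0], threshold=1, min_length=2))
# [(1, 4), (7, 9)]
print(find_connecting_words([(0, "今日"), (10, "さて"), (20, "ところで")]))
# [10, 20]
```

Either kind of hit would then be a candidate position for inserting delimiter data.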

In the present embodiment, a change in the content of the talk is detected and delimiter data is inserted; alternatively, time information indicating when the change occurred may be saved in the header or in another storage area of the memory 105. Even with the information saved in that way, the audio data can be divided and reproduced by reproduction unit (audio data group).
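
The alternative of saving only time information can be sketched as a logical split: the audio data stays contiguous and the ranges are derived from the saved times. The function name `unit_ranges` and the use of seconds as the time unit are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def unit_ranges(
    boundary_times: Sequence[float], total_seconds: float
) -> List[Tuple[float, float]]:
    """Turn saved boundary times into (start, end) ranges, one per reproduction unit."""
    edges = [0.0] + sorted(boundary_times) + [total_seconds]
    return list(zip(edges[:-1], edges[1:]))


print(unit_ranges([30.0, 12.5], total_seconds=60.0))
# [(0.0, 12.5), (12.5, 30.0), (30.0, 60.0)]
```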

In the present embodiment, the voice recognition result of the opening words is displayed for each reproduction unit so that the user can find the desired recorded portion even more easily. FIG. 6 shows an example of this display screen. As shown in FIG. 6, the present embodiment displays, together with the voice recognition result of the opening words, the date of recording, the keyword, the frequency, which is the number of times the keyword appeared (was recognized), and the recording time of the reproduction unit. The list of reproduction units is displayed keyword by keyword, with units in which the keyword appears more frequently placed higher.
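
The ordering of the list display described above can be sketched as a grouped sort. The record type `ReproductionUnit`, its field names, and the sample values are hypothetical; only the ordering rule (grouped by keyword, higher frequency first) comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ReproductionUnit:
    opening: str                    # recognized opening words of the unit
    seconds: int                    # recording time of the unit
    keyword_counts: Dict[str, int]  # keyword -> number of times recognized


def keyword_rows(
    units: List[ReproductionUnit], keywords: List[str]
) -> List[Tuple[str, int, str]]:
    """Build (keyword, frequency, opening) rows, keyword by keyword,
    with the most frequent units first within each keyword."""
    rows: List[Tuple[str, int, str]] = []
    for keyword in keywords:
        hits = [
            (keyword, unit.keyword_counts[keyword], unit.opening)
            for unit in units
            if unit.keyword_counts.get(keyword, 0) > 0
        ]
        hits.sort(key=lambda row: row[1], reverse=True)
        rows.extend(hits)
    return rows


units = [
    ReproductionUnit("Now then, about the budget", 95, {"camera": 1}),
    ReproductionUnit("By the way, the new model", 240, {"camera": 4, "company": 2}),
]
for row in keyword_rows(units, ["camera", "company"]):
    print(row)
# ('camera', 4, 'By the way, the new model')
# ('camera', 1, 'Now then, about the budget')
# ('company', 2, 'By the way, the new model')
```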

The opening words are information that can be used to locate the recorded portion the user wants to reproduce. Taking the talk on the topic that begins with those opening words as a reference, the user can more easily work out the relative position of the portion of talk to be reproduced. Compared with simply dividing the recording into reproduction units, the user can therefore find the reproduction unit containing the desired recorded portion more easily and reliably.

In the present embodiment, as described above, the reproduction target can be selected for each reproduction unit constituting an audio file. In addition, the reproduction target can be selected in units of audio files. FIG. 5 shows a display screen on which the reproduction target can be selected in units of audio files. On that screen, as shown in FIG. 5, the opening words, recording time, and date are listed for each audio file. Hereinafter, the display screens shown in FIG. 5 and FIG. 6 are called the "normal playback screen" and the "keyword playback screen", respectively.

FIG. 3 is a diagram explaining how keywords are saved using the memory 105.
In the memory 105, a keyword registration area, which is an area for saving keywords, is secured, for example logically. Keywords are saved separately within this registration area. In the figure, the area labeled "registered keyword (1)" is an area for one keyword (a registered keyword area), and that area stores the keyword and the number of times the keyword has been recognized (detected).

Keywords are entered by voice input. The microcomputer 102 has the voice recognition unit 106 perform voice recognition on the audio data output from the recording unit 101 and stored in the recording memory 104, and identifies the character string that was spoken by receiving the recognition result. The character string is identified on the assumption that, for example, the user is silent or speaks quietly before and after speaking a keyword, so the range over which audio data at or above a certain value is continuously output from the recording unit 101 is taken as the target. When the value of the audio data then stays at or below the threshold for a predetermined time or longer, the recognized character string is treated as the spoken keyword, and a keyword registration screen such as the one shown in FIG. 4, on which that keyword is placed, is displayed. The user is thereby asked to confirm the recognized keyword, which is registered in accordance with the user's instruction.

The frequency with which a keyword registered by the user appears is useful information for guessing the content of the recorded talk. Therefore, when the keyword and its frequency, that is, its number of appearances, are displayed for each reproduction unit as shown in FIG. 6, the user can more easily identify the reproduction unit in which the desired portion is recorded.

Hereinafter, the operation of the microcomputer 102 will be described in detail with reference to the flowcharts of the various processes shown in FIGS. 7 to 15. The processes shown in FIGS. 7 to 15 are realized by the microcomputer 102 executing a program stored in, for example, the memory 105.

FIG. 7 is a flowchart of the overall process. First, the overall process will be described in detail with reference to FIG. 7.
First, in step S101, initialization is performed and the recorder is set to a predetermined state. In the subsequent step S102, the recording unit 101 is activated. In the next step S103, the audio data output from the recording unit 101 is stored in the recording memory 104 (labeled "always-on recording memory" in the figure). The process then proceeds to step S104.

In step S104, it is determined whether the keyword registration mode, in which keywords can be registered, is set. If that mode is set, the determination is YES, the keyword registration process for that mode is executed in step S105, and the process proceeds to step S110. Otherwise, the determination is NO and the process proceeds to step S106.

In step S106, it is determined whether the recorder mode, in which voice can be recorded, is set. If the recorder mode is set, the determination is YES, the record process for that mode is executed in step S107, and the process proceeds to step S110. Otherwise, the determination is NO and the process proceeds to step S108.

In step S108, it is determined whether the reproduction mode, in which audio data can be reproduced, is set. If the reproduction mode is set, the determination is YES, the reproduction process for that mode is executed in step S109, and the process proceeds to step S110. Otherwise, the determination is NO and the process proceeds directly to step S110.

In step S110, it is determined whether a mode change has been instructed. When the user operates the operation unit 108 to instruct a change to another mode, the content of that operation is reported to the microcomputer 102, so the determination is YES; the mode is set according to the user's instruction in step S111, the display of the display unit 109 is cleared in the next step S112, and the process returns to step S103. Otherwise, the determination is NO and the process returns to step S103.
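
One pass through the loop of FIG. 7 (steps S103 to S112) can be summarized as a small mode dispatcher. The sketch below is an illustration under assumptions: the `Mode` enum, the `run_cycle` function, the handler table, and the log strings are hypothetical stand-ins for the flowchart, not code from this document.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class Mode(Enum):
    KEYWORD_REGISTRATION = auto()
    RECORDER = auto()
    REPRODUCTION = auto()


def run_cycle(
    mode: Mode,
    handlers: Dict[Mode, Callable[[], None]],
    requested_mode: Optional[Mode],
    log: List[str],
) -> Mode:
    """Execute one pass of the main loop and return the mode for the next pass."""
    log.append("S103: store audio data in recording memory")
    handler = handlers.get(mode)     # S104 / S106 / S108: which mode is set?
    if handler is not None:
        handler()                    # S105 / S107 / S109: run that mode's process
    if requested_mode is not None:   # S110: was a mode change instructed?
        mode = requested_mode        # S111: set the mode as instructed
        log.append("S112: clear display")
    return mode                      # the caller loops back to S103


log: List[str] = []
handlers = {Mode.RECORDER: lambda: log.append("S107: record process")}
mode = run_cycle(Mode.RECORDER, handlers, None, log)
mode = run_cycle(mode, handlers, Mode.REPRODUCTION, log)
print(mode.name)  # REPRODUCTION
```

Keeping the per-mode processes in a table mirrors the chain of mode checks in the flowchart while making it easy to add the other modes mentioned in the text.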

In this way, in the present embodiment, the user sets the desired mode and has the recorder perform the operations available in that mode. Other settable modes exist, such as a mode in which audio files can be deleted and a mode in which registered keywords can be deleted, but their detailed description is omitted here. Most of the switches on the operation unit 108 have their assigned function changed according to the mode and other conditions. For example, the "confirmation SW" shown in FIG. 4 is a switch to which, while the keyword registration screen is displayed, the function of instructing keyword registration is assigned. Because the function assigned to a switch often differs with the situation in this way, detailed description of operation methods is also omitted to avoid confusion.

FIG. 8 is a flowchart of the keyword registration process executed as step S105. Next, this registration process will be described in detail with reference to FIG. 8.
First, in step S201, it is determined whether voice input with audio data values at or above a threshold has continued for a certain time or longer. If such voice input has occurred, the determination is YES and the process proceeds to step S202. That voice input is then treated as input for keyword registration. Otherwise, the determination is NO and the registration process ends here.

In step S202, the audio data stored in the recording memory 104 is stored in a keyword area secured in the memory 105. In the subsequent step S203, the voice recognition unit 106 is made to perform voice recognition using the audio data stored in the keyword area, and the recognition result is acquired. In the next step S204, a keyword registration screen such as the one shown in FIG. 4, which presents the recognition result as a keyword, is generated and output to the display unit 109 for display. The process then proceeds to step S205.

The result of voice recognition by the voice recognition unit 106 is passed to the microcomputer 102 as a character string. The microcomputer 102 uses that character string to look up the dictionary of the dictionary unit 107, estimates the term corresponding to the character string, and determines the notation for the character string according to the estimate. This corresponds to converting one piece of text data into another. Thus, if the recognized character string is "かめら" (kamera), the katakana notation "カメラ" ("camera") is recognized as the keyword. If the character string is "かいしゃ" (kaisha), the kanji notation "会社" ("company") is recognized as the keyword. This recognition, like the voice recognition by the voice recognition unit 106, is performed using well-known techniques, so a detailed description is omitted.
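
The conversion from a recognized reading to its written notation can be pictured as a dictionary lookup. This is a deliberately simplified sketch: the two entries are the examples given in the text, while the table name, the function name, and the fallback of returning the reading unchanged are assumptions; a real dictionary unit would also have to resolve ambiguous readings.

```python
# Reading (hiragana) -> notation, using the two examples from the text.
NOTATION_DICTIONARY = {
    "かめら": "カメラ",   # kamera -> katakana notation ("camera")
    "かいしゃ": "会社",   # kaisha -> kanji notation ("company")
}


def to_notation(reading: str) -> str:
    """Return the written notation for a reading, or the reading itself if unknown."""
    return NOTATION_DICTIONARY.get(reading, reading)


print(to_notation("かめら"))    # カメラ
print(to_notation("かいしゃ"))  # 会社
```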

In step S205, the process waits for the user to operate the operation unit 108 and determines whether the operation is on the confirmation switch (SW). If the user operated the confirmation switch, the determination is YES. In step S206, a registered keyword area for saving the recognized keyword is secured in the keyword registration area and the keyword is stored there. In step S207, the keyword area secured in the memory 105 for speech recognition is cleared, and the registration process ends. Otherwise, that is, if the user instructed that the keyword not be registered, the determination is NO, the process proceeds to step S207, and the keyword area is cleared.
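
The registration flow of steps S201 to S207 can be sketched as follows. This is a minimal sketch under stated assumptions: `KeywordStore`, `longest_loud_run`, and `register_keyword` are hypothetical names, the recognizer is passed in as a callable, the user's confirmation is reduced to a boolean, and the screen display of step S204 is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordStore:
    """Hypothetical model of the keyword-related areas of the memory 105."""
    registered: dict = field(default_factory=dict)    # keyword -> occurrence count
    keyword_area: list = field(default_factory=list)  # temporary buffer for recognition


def longest_loud_run(samples, threshold):
    """Length of the longest run of consecutive samples at or above the threshold."""
    best = run = 0
    for s in samples:
        run = run + 1 if abs(s) >= threshold else 0
        best = max(best, run)
    return best


def register_keyword(store, samples, threshold, min_run, recognize, confirmed):
    """Return the registered keyword, or None if nothing was registered."""
    # S201: require loud input that lasts long enough.
    if longest_loud_run(samples, threshold) < min_run:
        return None
    # S202: copy the voice data into the keyword area.
    store.keyword_area = list(samples)
    # S203: speech recognition (the recognizer is supplied by the caller).
    keyword = recognize(store.keyword_area)
    # S205/S206: register only when the user confirmed the keyword.
    if confirmed:
        store.registered.setdefault(keyword, 0)
    # S207: the keyword area is cleared in both branches.
    store.keyword_area = []
    return keyword if confirmed else None
```

Clearing the temporary area on both branches mirrors the description, where step S207 is reached whether or not the confirmation switch was operated.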

FIGS. 9 and 10 are flowcharts of the record process executed as step S107 in the overall process shown in FIG. 7. Next, the record process will be described in detail with reference to FIGS. 9 and 10.

先ず、ステップＳ３０１では、スタートフラグの値が０か否か判定する。このスタートフラグは、録音を管理するための変数であり、０の値は録音中でないことを、１の値は録音中であることをそれぞれ示している。このことから、現在、録音を開始していない場合、判定はＹＥＳとなってステップＳ３０２に移行する。そうでない場合には、判定はＮＯとなって図１０のステップＳ３１５に移行する。 First, in step S301, it is determined whether or not the value of the start flag is zero. This start flag is a variable for managing recording. A value of 0 indicates that recording is not being performed, and a value of 1 indicates that recording is being performed. For this reason, when recording is not currently started, the determination is YES, and the process proceeds to step S302. Otherwise, the determination is no and the process moves to step S315 in FIG.

In step S302, the speech recognition unit 106 is caused to perform speech recognition on the voice data newly stored in the recording memory 104, and the recognition result is acquired. In the subsequent step S303, it is determined whether a keyword exists in the acquired recognition result together with the recognition results obtained before it. For example, suppose that “カメラ” (camera) is registered as a keyword, the recognition result acquired this time is “ら” (ra), and “か” (ka) and “め” (me) were acquired in that order as the immediately preceding recognition results. The recognition results then contain “かめら” (kamera), which corresponds to the keyword, so the determination is YES and the process proceeds to step S304. Otherwise, the determination is NO and the process proceeds to step S313.
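
The check in step S303 works on recognition results that arrive piece by piece. One way to sketch it is to keep a short history of recognized characters and test whether a keyword reading has just been completed. The class `KeywordSpotter` and its `history` parameter are hypothetical, and matching only at the end of the history is a simplification chosen for this sketch.

```python
from collections import deque


class KeywordSpotter:
    """Detects a registered keyword across successive recognition results."""

    def __init__(self, keyword_readings, history=16):
        # keyword_readings maps a reading to its notation, e.g. {"かめら": "カメラ"}.
        self.keyword_readings = keyword_readings
        # Only the most recent characters are kept (history length is an assumption).
        self.recent = deque(maxlen=history)

    def feed(self, fragment):
        """Add one recognition result; return the keyword it completes, if any."""
        self.recent.extend(fragment)  # appended character by character
        text = "".join(self.recent)
        for reading, notation in self.keyword_readings.items():
            if text.endswith(reading):
                return notation
        return None
```

Feeding “か”, “め”, and “ら” in that order returns the keyword only on the third call, which corresponds to the example given above.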

In step S304, the count stored in the registered keyword area of the keyword confirmed in step S303 is incremented. In the next step S305, from the voice data stored in the recording memory 104, the voice data starting a certain time before the present is extracted. In the following step S306, a free recorded voice area for saving voice data is secured in the record area of the memory 105. In step S307, to which the process proceeds after the area is secured, the date and time are acquired from, for example, a hardware timer mounted on the microcomputer 102. After the acquisition, the process proceeds to step S308.

ステップＳ３０８では、確保した空き録音音声エリアに音声ファイルのヘッダを構成する日時をストアする。次のステップＳ３０９では、ステップＳ３０５で抽出した音声データをヘッダに続く位置にストアする。その後は、ステップＳ３１０でスタートフラグの値を１にし、ステップＳ３１１でタイマの値をクリアしてスタートさせ、更にステップＳ３１２で経過時間カウンタをスタートさせてから、図１０のステップＳ３１５に移行する。このようにして本実施形態では、ユーザーが登録したキーワードが音声入力された場合、自動的に録音を開始するようにしている。 In step S308, the date and time constituting the header of the audio file is stored in the reserved recording audio area. In the next step S309, the audio data extracted in step S305 is stored at a position following the header. After that, the value of the start flag is set to 1 in step S310, the timer value is cleared and started in step S311, and the elapsed time counter is started in step S312, and then the process proceeds to step S315 in FIG. In this way, in this embodiment, when a keyword registered by the user is inputted by voice, recording is automatically started.

上記タイマは、再生単位毎の経過時間（録音時間）を計時するための例えばハードタイマである。経過時間カウンタは録音開始からの時間を計時するための変数であり、その値は例えば所定時間が経過する度にインクリメントされる。経過時間カウンタのスタートとは、そのインクリメントを行わない状況から行う状況への変更に相当する。図２に示す経過時間は、経過時間カウンタの最終値が示す時間であり、タイマの値は、区切りデータを構成する録音時間として扱われる。 The timer is, for example, a hard timer for measuring the elapsed time (recording time) for each reproduction unit. The elapsed time counter is a variable for measuring the time from the start of recording, and the value is incremented each time a predetermined time elapses, for example. The start of the elapsed time counter corresponds to a change from a situation where the increment is not performed to a situation where the increment is performed. The elapsed time shown in FIG. 2 is the time indicated by the final value of the elapsed time counter, and the timer value is treated as the recording time that constitutes the delimiter data.

In step S313, it is determined whether the start switch (SW) for instructing the start of recording has been turned on. If the user operated the switch, the determination is YES and the process proceeds to step S314, where, from the voice data stored in the recording memory 104, the voice data starting a certain time before the switch operation is extracted. After the extraction, the process proceeds to step S306. Otherwise, that is, if the user has not operated the start switch, the determination is NO and the record process ends here.
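
Steps S305 and S314 both extract voice data from a certain time before the trigger, which implies that recent input is always being buffered. A bounded ring buffer is one common way to do this. The sketch below is an assumption about how the recording memory 104 might be used, and `PreRollBuffer` is a hypothetical name.

```python
from collections import deque


class PreRollBuffer:
    """Keeps only the most recent `pre_roll_seconds` of samples."""

    def __init__(self, sample_rate, pre_roll_seconds):
        # Older samples are discarded automatically once the buffer is full.
        self.frames = deque(maxlen=int(sample_rate * pre_roll_seconds))

    def push(self, sample):
        """Store one newly captured sample."""
        self.frames.append(sample)

    def snapshot(self):
        """Return the buffered samples, oldest first (the pre-roll to save)."""
        return list(self.frames)
```

When a keyword is detected or the start switch is operated, `snapshot()` yields the audio that preceded the trigger, so the beginning of the utterance is not lost.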

上記ステップＳ３０１の判定がＮＯとなることにより、或いは上記ステップＳ３１２の処理を実行することにより移行する図１０のステップＳ３１５では、スタートフラグの値が１か否か判定する。その値が１でない場合、判定はＮＯとなり、ここでレコード処理を終了する。そうでない場合には、判定はＹＥＳとなってステップＳ３１６に移行する。 In step S315 of FIG. 10 which is shifted when the determination in step S301 is NO or when the process of step S312 is executed, it is determined whether or not the value of the start flag is 1. If the value is not 1, the determination is no and the record processing is terminated here. Otherwise, the determination is yes and the process moves to step S316.

In step S316, the voice data newly stored in the recording memory 104 is stored in the recorded voice area secured in the record area of the memory 105. In the next step S317, the speech recognition unit 106 is caused to perform speech recognition on the voice data newly stored in the recorded voice area, and the recognition result is acquired. In the following step S318, it is determined whether a keyword is included in the character string represented by the recognition result acquired this time together with the one or more recognition results acquired immediately before it. If that character string includes any of the registered keywords, the determination is YES, the count in the registered keyword area corresponding to that keyword is incremented in step S319, and the process proceeds to step S320. Otherwise, the determination is NO and the process proceeds directly to step S320. Because the character string represented by the acquired recognition result and the one or more recognition results acquired immediately before it is the target in which keywords and connection words are recognized, it is hereinafter called the “recognition target character string”.

In step S320, it is determined whether the recognition target character string contains a connection word, or whether input of voice data with a value greater than a threshold has started after input of voice data with a value at or below that threshold continued for a certain time or longer. If a connection word is included, or if there was such a silent period, that is, a period of at least the certain time with values at or below the threshold followed by the start of input above the threshold, the determination is YES and the process proceeds to step S321. Otherwise, that is, if no connection word is included and there was no silent period, the determination is NO and the process proceeds to step S323.
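
The two conditions tested in step S320 can be written as a single predicate. This is a sketch under assumptions: `is_topic_boundary` is a hypothetical name, the silent period is measured in samples rather than seconds, and `levels` is assumed to hold recent amplitude values with the newest last.

```python
def is_topic_boundary(text, connection_words, levels, threshold, min_silent):
    """Return True when step S320 would be determined as YES.

    text             -- the recognition target character string
    connection_words -- words such as "ところで" that signal a topic change
    levels           -- recent amplitude values, newest last
    min_silent       -- minimum number of quiet values that count as a silent period
    """
    # Condition 1: a connection word appears in the recognized text.
    if any(word in text for word in connection_words):
        return True
    # Condition 2: the newest value is loud and is preceded by a long enough
    # run of values at or below the threshold.
    if len(levels) < min_silent + 1 or levels[-1] <= threshold:
        return False
    return all(v <= threshold for v in levels[-1 - min_silent:-1])
```

Silence alone does not trigger the boundary. The predicate fires only when voice input resumes, which matches the wording of the step.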

上述したように接続語は、話題を変更する際に良く用いられる。重要な点、或いは強調したい点を話す際には、間を空けることが多い。このようなことから、ステップＳ３２０でのＹＥＳの判定は、音声入力をしている人が話す内容を変更する可能性が高いことを示している。このことから、ステップＳ３２１では、メモリ１０５のレコードエリア内で音声データのストアを行っている録音音声エリア内の接続語、或いは無音期間に対応する音声データの前に区切りデータを挿入する。その後は、ステップＳ３２２でタイマをクリアしスタートさせてからステップＳ３２５に移行する。レコードエリア内で音声データのストアを行っている録音音声エリアについては以降「対象録音音声エリア」と呼ぶことにする。 As described above, connected words are often used when changing topics. When talking about important or emphasized points, there is often a gap. For this reason, the determination of YES in step S320 indicates that there is a high possibility of changing the content spoken by the person who is inputting voice. For this reason, in step S321, the delimiter data is inserted before the connection word in the recording voice area where the voice data is stored in the record area of the memory 105 or the voice data corresponding to the silent period. Thereafter, the timer is cleared and started in step S322, and then the process proceeds to step S325. The recorded voice area where the voice data is stored in the record area will be referred to as “target recorded voice area” hereinafter.

On the other hand, in step S323, the timer value is referenced to determine whether the timer has counted up. In the present embodiment, when neither a connection word nor a silent period is detected within a predetermined time, delimiter data is inserted once that time has elapsed, so that the recording time of a playback unit does not become too long. The timer is used both to measure the recording time and to check whether the predetermined time has elapsed. “Counting up” refers to the state in which that time has elapsed. Accordingly, if that time has elapsed, the determination is YES and the process proceeds to step S324, where delimiter data is inserted after the last voice data in the target recorded voice area. After the insertion, the process of step S322 is executed. Otherwise, the determination is NO and the process proceeds to step S325.
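
The interplay between boundary detection (step S321) and the timeout (step S324) can be illustrated by splitting a sequence of chunks into playback units. This is a simplified sketch: `split_into_units` is a hypothetical name, each chunk is assumed to carry its duration and a flag saying whether step S320 fired on it, and the 300-second default is taken from the FIG. 6 example.

```python
def split_into_units(chunks, max_unit_seconds=300.0):
    """Split (label, seconds, is_boundary) chunks into playback units.

    Returns a list of units, each a list of chunk labels.
    """
    units, current, elapsed = [], [], 0.0
    for label, seconds, is_boundary in chunks:
        if is_boundary and current:
            # S321: the delimiter goes BEFORE the chunk that triggered detection.
            units.append(current)
            current, elapsed = [], 0.0  # S322: restart the timer
        current.append(label)
        elapsed += seconds
        if elapsed >= max_unit_seconds:
            # S324: the delimiter goes AFTER the last stored chunk.
            units.append(current)
            current, elapsed = [], 0.0  # S322: restart the timer
    if current:
        units.append(current)
    return units
```

The asymmetry is the point of the sketch: a detected boundary starts the new unit with the triggering speech, whereas a timeout simply cuts after whatever was stored last, which is why a unit following a timeout can begin in mid-sentence.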

On the keyword playback screen shown in FIG. 6, there is a playback unit whose opening spoken portion is “で決定でよろしいでしょうか？ではこの” (roughly, “...so is it all right to decide on that? Then this”). A playback unit whose opening portion begins in mid-speech like this is, except in cases such as speech recognition not having been performed properly, positioned immediately after a playback unit that was recorded for the full predetermined time. It can therefore be inferred that the playback unit in this example follows the playback unit whose opening portion is “ところで、このデジタルカメラの型番は” (“By the way, the model number of this digital camera is”). Since the recording time of that playback unit is exactly 5 minutes, the predetermined time is exactly 5 minutes.

ステップＳ３２５では、録音終了を指示するためのストップスイッチ（ＳＷ）がオンしたか否か判定する。そのスイッチをユーザーが操作した場合、判定はＹＥＳとなってステップＳ３２６に移行する。そうでない場合には、判定はＮＯとなり、ここでレコード処理を終了する。 In step S325, it is determined whether or not a stop switch (SW) for instructing the end of recording is turned on. If the switch is operated by the user, the determination is yes and the process moves to step S326. Otherwise, the determination is no and the record processing is terminated here.

ステップＳ３２６では、対象録音音声エリアのヘッダの経過時間として、経過時間カウンタの値が示す時間をストアする。続くステップＳ３２７では、この経過時間カウンタを停止させ、その値をクリアする。その後は、ステップＳ３２８でスタートフラグの値を０に変更してから、レコード処理を終了する。経過時間カウンタの停止とは、その値をインクリメントする状況からそれを行わない状況への変更に相当する。 In step S326, the time indicated by the value of the elapsed time counter is stored as the elapsed time of the header of the target recording voice area. In the subsequent step S327, the elapsed time counter is stopped and the value is cleared. Thereafter, the value of the start flag is changed to 0 in step S328, and then the record processing is ended. Stopping the elapsed time counter corresponds to a change from a situation where the value is incremented to a situation where the value is not incremented.

In this way, in the present embodiment, delimiter data is inserted automatically either when a connection word or a silent period is detected or when recording has continued for a fixed time (FIG. 6 is an example in which this time is exactly 5 minutes), so that the voice data as a whole (the recorded data) can be divided into multiple parts. This enables partial playback of the recorded data as well as cue playback, which starts playback from a position partway through.

FIG. 11 is a flowchart of the reproduction process executed as step S109 in the overall process shown in FIG. 7. Next, the reproduction process will be described in detail with reference to FIG. 11.

As shown in FIGS. 5 and 6, in the playback mode, two types of playback screens for selecting a playback target can be displayed. The screen to be displayed is selected by setting the sub mode. Here, the sub mode in which the normal playback screen of FIG. 5 is displayed is referred to as the “normal playback mode”, and the sub mode in which the keyword playback screen of FIG. 6 is displayed is referred to as the “keyword playback mode”.

先ず、ステップＳ４０１では、通常再生モードが設定されているか否か判定する。そのモードが設定されている場合、判定はＹＥＳとなり、そのモードでの再生を可能とさせる通常再生処理をステップＳ４０２で実行した後、この再生処理を終了する。そうでない場合には、判定はＮＯとなってステップＳ４０３に移行する。 First, in step S401, it is determined whether the normal playback mode is set. If the mode is set, the determination is yes, and after executing the normal playback process that enables playback in that mode in step S402, the playback process ends. Otherwise, the determination is no and the process moves to step S403.

ステップＳ４０３では、キーワード再生モードが設定されているか否か判定する。そのモードが設定されている場合、判定はＹＥＳとなり、そのモードでの再生を可能とさせるキーワード再生処理をステップＳ４０４で実行した後、この再生処理を終了する。そうでない場合には、判定はＮＯとなり、ここでこの再生処理を終了する。 In step S403, it is determined whether the keyword playback mode is set. If the mode is set, the determination is yes, and after executing the keyword playback process that enables playback in that mode in step S404, the playback process ends. Otherwise, the determination is no and the reproduction process ends here.

Hereinafter, the subroutine processes executed within the reproduction process will be described in detail with reference to FIGS. 12 to 14.

FIG. 12 is a flowchart of the normal reproduction process executed as step S402. First, the normal reproduction process will be described in detail with reference to FIG. 12.

First, in step S501, it is determined whether display of the normal playback screen as shown in FIG. 5 has been completed. If the display has not been completed, the determination is NO and the process proceeds to step S502. Otherwise, the determination is YES and the process proceeds to step S507.

通常再生画面では、図５に示すように、音声ファイル毎に話し出し部分を配置するようになっている。ステップＳ５０２〜Ｓ５０６では、音声ファイル毎に話し出し部分に対応するテキストデータを取得し、通常再生画面に配置するための処理が実行される。 On the normal playback screen, as shown in FIG. 5, a spoken portion is arranged for each audio file. In steps S502 to S506, processing for obtaining text data corresponding to the spoken portion for each audio file and placing it on the normal playback screen is executed.

先ず、ステップＳ５０２では、メモリ１０５のレコードエリアに格納された音声ファイルの一つ、つまりそのレコードエリアに確保された録音音声エリアの一つから先頭部分（録音開始部分）の音声データを取り出す。続くステップＳ５０３では、取り出した音声データを用いた音声認識を音声認識部１０６に行わせ、その認識結果を取得する。その次のステップＳ５０４では、辞書部１０７の辞書を参照して、認識結果をテキストデータに変換する。その後はステップＳ５０５に移行する。 First, in step S502, the audio data of the head portion (recording start portion) is extracted from one of the audio files stored in the record area of the memory 105, that is, one of the recording audio areas secured in the record area. In subsequent step S503, the voice recognition unit 106 is caused to perform voice recognition using the extracted voice data, and the recognition result is acquired. In the next step S504, the recognition result is converted into text data with reference to the dictionary of the dictionary unit 107. Thereafter, the process proceeds to step S505.

ステップＳ５０５では、対象とする音声ファイルのヘッダから日時、及び経過時間を取り出し、それらをテキストデータとともにリスト表示させる。それにより、１音声ファイル分の画像表示を行った後のステップＳ５０６では、レコードエリア内の音声ファイル全てをリスト表示したか否か判定する。レコードエリア内にリスト表示させるべき音声ファイルが存在しない場合、判定はＹＥＳとなってステップＳ５０７に移行する。そうでない場合には、判定はＮＯとなって上記ステップＳ５０２に戻る。それにより、全ての音声ファイルをリスト表示するまで、ステップＳ５０２〜Ｓ５０６で形成される処理ループを繰り返し実行する。 In step S505, the date and time and the elapsed time are extracted from the header of the target audio file, and are displayed in a list together with the text data. Accordingly, in step S506 after the image display for one audio file is performed, it is determined whether all the audio files in the record area have been displayed as a list. If there is no audio file to be displayed in a list in the record area, the determination is yes and the process moves to step S507. Otherwise, the determination is no and the process returns to step S502. Thereby, the processing loop formed in steps S502 to S506 is repeatedly executed until all audio files are displayed as a list.
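
The loop of steps S502 to S506 builds one list entry per audio file. A minimal sketch is shown below. The names `AudioFile` and `build_list_rows` are hypothetical, the header fields are assumed to be plain strings, and both the recognizer and the dictionary conversion are passed in as callables because the description treats them as well-known techniques.

```python
from dataclasses import dataclass


@dataclass
class AudioFile:
    """Hypothetical model of one recorded voice area."""
    recorded_at: str  # date and time from the header
    elapsed: str      # elapsed time from the header
    samples: list     # voice data following the header


def build_list_rows(files, recognize, to_text, head_len):
    """Return one (date, elapsed, opening text) row per audio file."""
    rows = []
    for f in files:                   # loop closed by the check in S506
        head = f.samples[:head_len]   # S502: take the head portion
        reading = recognize(head)     # S503: speech recognition
        text = to_text(reading)       # S504: dictionary conversion
        rows.append((f.recorded_at, f.elapsed, text))  # S505: list entry
    return rows
```

Only the head portion of each file is recognized, which keeps the cost of building the screen independent of the total recording length.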

The image data for displaying the normal display screen is generated by the microcomputer 102 using, for example, its on-board RAM. Image data for character display or screen display is read from, for example, the memory 105 and used to generate the image data for the screen. The image data generated in this way is output to the display unit 109, for example each time the image data for the list entry of one audio file is added. The displayed normal display screen is thereby updated one audio file at a time. The list entry placed in the selected state is highlighted by displaying it differently from the others. The entry highlighted automatically is, for example, the one at the top of the list.

ステップＳ５０７では、操作部１０８に対してユーザーがリストを選択するための操作を行ったか否か判定する。その操作をユーザーが行った場合、判定はＹＥＳとなってステップＳ５０８に移行し、その操作内容に応じて強調表示させるリストを変更する。その変更後はステップＳ５０９に移行する。一方、その操作をユーザーが行っていない場合には、判定はＮＯとなってそのステップＳ５０９に移行する。 In step S507, it is determined whether the user has performed an operation for selecting a list on the operation unit 108. If the user performs the operation, the determination is yes, the process proceeds to step S508, and the list to be highlighted is changed according to the operation content. After the change, the process proceeds to step S509. On the other hand, if the user has not performed the operation, the determination is no and the process moves to step S509.

ステップＳ５０９では、再生開始を指示するためのスタートスイッチがオンされたか否か判定する。そのスイッチをユーザーが操作した場合、判定はＹＥＳとなってステップＳ５１０に移行する。そうでない場合には、判定はＮＯとなってステップＳ５１５に移行する。ここでのスタートスイッチは、上述したスタートスイッチと同じスイッチであっても良いが、異なるスイッチであっても良い。これはストップスイッチでも同様である。 In step S509, it is determined whether or not the start switch for instructing the start of reproduction has been turned on. If the switch is operated by the user, the determination is yes and the process moves to step S510. Otherwise, the determination is no and the process moves to step S515. The start switch here may be the same switch as the start switch described above, or may be a different switch. The same applies to the stop switch.

ステップＳ５１０では、再生を管理するためのスタートフラグに１を代入する。１を代入した後に移行するステップＳ５１１では、再生部１０３の動作を開始させる。次のステップＳ５１２では、スタートフラグの値が１か否か判定する。その値が１であった場合、判定はＹＥＳとなってステップＳ５１３に移行する。そうでない場合には、判定はＮＯとなり、ここで通常再生処理を終了する。 In step S510, 1 is substituted into a start flag for managing reproduction. In step S511, which is shifted to after substituting 1, the operation of the playback unit 103 is started. In the next step S512, it is determined whether or not the value of the start flag is 1. If the value is 1, the determination is yes and the process moves to step S513. Otherwise, the determination is no and the normal playback process is terminated here.

スタートフラグに代入される１は再生中を示し、０は再生中でないことを示している。スタートスイッチへの操作によって再生する音声ファイルは、その操作時に強調表示させているリストに対応するものである。ここではその音声ファイルは「指定音声ファイル」と呼ぶことにする。 1 assigned to the start flag indicates that playback is in progress, and 0 indicates that playback is not in progress. The audio file that is reproduced by operating the start switch corresponds to the list that is highlighted at the time of the operation. Here, the audio file is referred to as a “designated audio file”.

In step S513, audio data is read from the designated audio file as necessary and supplied to the playback unit 103. In the next step S514, it is determined whether the last audio data of the designated audio file has been read, that is, whether reproduction of the designated audio file has been completed. If the reproduction has been completed, the determination is YES and the process proceeds to step S516. Otherwise, the determination is NO and the normal reproduction process ends here.

In step S515, to which the process proceeds when the determination in step S509 is NO, it is determined whether the stop switch for instructing the end of reproduction has been turned on. If the user operated the switch, the determination is YES, 0 is assigned to the start flag in step S516, the playback unit 103 is stopped in step S517, and the process proceeds to step S512. Otherwise, the determination is NO and the process proceeds directly to step S512.

この通常再生処理では、上述したように、通常再生画面を表示させた後は、再生する音声ファイルの選択、或いは選択された音声ファイルの再生を可能とさせる。そのために、ステップＳ５０１では、音声ファイルを再生中であれば、ＹＥＳと判定するようにしている。 In this normal reproduction process, as described above, after the normal reproduction screen is displayed, the audio file to be reproduced can be selected or the selected audio file can be reproduced. Therefore, in step S501, if the audio file is being reproduced, it is determined as YES.

図１３及び図１４は、図１１に示す再生処理内でステップＳ４０４として実行されるキーワード再生処理のフローチャートである。次に図１３及び図１４を参照して、その再生処理について詳細に説明する。 FIGS. 13 and 14 are flowcharts of the keyword reproduction process executed as step S404 in the reproduction process shown in FIG. Next, the reproduction processing will be described in detail with reference to FIGS.

First, in step S601, it is determined whether display of the keyword playback screen as shown in FIG. 6 has been completed. If the display has not been completed, the determination is NO and the process proceeds to step S602. Otherwise, the determination is YES and the process proceeds to step S608 in FIG. 14. In step S601, as in the normal reproduction process, the determination is also YES when audio data is being reproduced.

キーワード再生画面では、図６に示すように、音声ファイルの再生単位毎にリストを配置する。ステップＳ６０２〜Ｓ６０７では、そのようなキーワード再生画面を表示するための処理が実行される。図中、再生単位は「区間データ」と表記している。 On the keyword playback screen, as shown in FIG. 6, a list is arranged for each playback unit of the audio file. In steps S602 to S607, a process for displaying such a keyword reproduction screen is executed. In the figure, the reproduction unit is described as “section data”.

先ず、ステップＳ６０２では、メモリ１０５のキーワード登録エリアに確保された登録キーワードエリアにそれぞれ記憶させている回数を検索し、最大の回数を抽出する。次のステップＳ６０３では、抽出した回数に対応するキーワードを抽出、つまりその回数が格納されている登録キーワードエリアのキーワードを抽出する。その次に移行するステップＳ６０４では、保存している音声ファイルを構成する再生単位（区間データ）のなかから、抽出したキーワードが含まれているものを抽出する。 First, in step S602, the number of times stored in each registered keyword area secured in the keyword registration area of the memory 105 is searched, and the maximum number is extracted. In the next step S603, a keyword corresponding to the extracted number of times is extracted, that is, a keyword in a registered keyword area in which the number of times is stored is extracted. In the next step S604, the one containing the extracted keyword is extracted from the reproduction units (section data) constituting the stored audio file.

キーワード抽出後に移行するステップＳ６０５では、抽出した再生単位毎に、抽出したキーワードが出現する回数を検出してリスト表示するための表示処理を実行する。その実行後に移行するステップＳ６０６では、キーワード登録エリアに登録した全てのキーワードでリスト表示を終了したか否か判定する。キーワード登録エリアに対象となるキーワードが残っていない場合、判定はＹＥＳとなって図１４のステップＳ６０８に移行する。そうでない場合には、判定はＮＯとなってステップＳ６０７に移行し、現在、対象としていたキーワードの次に回数の多いキーワードの抽出を行う。上記ステップＳ６０３には、その抽出後に戻る。 In step S605, which proceeds after keyword extraction, display processing for detecting the number of times the extracted keyword appears for each extracted playback unit and displaying the list is executed. In step S606 to which the process proceeds after the execution, it is determined whether or not the list display has been completed for all the keywords registered in the keyword registration area. If the target keyword does not remain in the keyword registration area, the determination is yes and the process proceeds to step S608 in FIG. Otherwise, the determination is no, the process moves to step S607, and the keyword having the next highest number of times is extracted after the keyword currently targeted. The process returns to step S603 after the extraction.

In this way, for each keyword registered in the keyword registration area, the processing loop formed by steps S603 to S607 is executed repeatedly until no target keyword remains and the determination in step S606 is YES. As a result, list entries are arranged on the keyword playback screen starting from the audio file in which the most frequent keyword appears and, within that audio file, starting from the playback unit (section data) in which that keyword appears most often. A keyword playback screen such as the one shown in FIG. 6 is thereby displayed on the display unit 109. The list entry highlighted automatically is, for example, the one at the top of the list.
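
The ordering produced by this loop can be sketched as a nested sort: keywords by descending registered count, then playback units by descending occurrences of the keyword. The sketch simplifies the description in two labeled ways: it works on text transcripts of the playback units instead of running speech recognition, and it ignores the grouping by audio file. The name `order_playback_list` is hypothetical.

```python
def order_playback_list(keyword_counts, units):
    """Return (keyword, unit_id, occurrences) rows in display order.

    keyword_counts -- keyword -> count accumulated during recording
    units          -- list of (unit_id, transcript) pairs (transcripts are an
                      assumption of this sketch)
    """
    rows = []
    # Most frequent keyword first (S602, then S607 for the next keyword).
    for keyword in sorted(keyword_counts, key=keyword_counts.get, reverse=True):
        # S604: keep only the units in which the keyword appears.
        hits = [(uid, text.count(keyword)) for uid, text in units]
        hits = [(uid, n) for uid, n in hits if n > 0]
        # S605: units with more occurrences are listed first.
        hits.sort(key=lambda hit: hit[1], reverse=True)
        rows.extend((keyword, uid, n) for uid, n in hits)
    return rows
```

A unit that contains several keywords appears once per keyword in this sketch. Whether the device lists such a unit more than once is not stated in the description.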

On the keyword playback screen shown in FIG. 6, only one keyword is displayed for each audio file. This indicates that “digital camera” was not spoken in the recording of March 30, 2008, and that “company” was not spoken in the recording of February 16, 2008. Some playback units may contain none of the keywords. Such playback units have not been listed by the time the determination in step S606 becomes YES. Therefore, although not specifically illustrated, when the determination in step S606 becomes YES, the playback units not yet listed are identified, and any such playback units found are then listed. In the list entries for those playback units, no keyword is shown.

図１４のステップＳ６０８では、操作部１０８に対してユーザーがリストを選択するための操作を行ったか否か判定する。その操作をユーザーが行った場合、判定はＹＥＳとなってステップＳ６０９に移行し、その操作内容に応じて強調表示させるリストを変更する。その変更後はステップＳ６１０に移行する。一方、その操作をユーザーが行っていない場合には、判定はＮＯとなってそのステップＳ６１０に移行する。 In step S608 in FIG. 14, it is determined whether or not the user has performed an operation for selecting a list on the operation unit 108. When the operation is performed by the user, the determination is yes, the process proceeds to step S609, and the list to be highlighted is changed according to the operation content. After the change, the process proceeds to step S610. On the other hand, if the user has not performed the operation, the determination is no and the process moves to step S610.

ステップＳ６１０では、再生開始を指示するためのスタートスイッチがオンされたか否か判定する。そのスイッチをユーザーが操作した場合、判定はＹＥＳとなってステップＳ６１１に移行する。そうでない場合には、判定はＮＯとなってステップＳ６１４に移行する。 In step S610, it is determined whether or not the start switch for instructing the start of reproduction has been turned on. If the switch is operated by the user, the determination is yes and the process moves to step S611. Otherwise, the determination is no and the process moves to step S614.

ステップＳ６１１では、再生を管理するためのスタートフラグに１を代入する。その後に移行するステップＳ６１２では、スタートスイッチの操作時に強調表示させていたリストに対応する再生単位（区間データ）を再生対象として選択する。その次のステップＳ６１３では、再生部１０３の動作を開始させる。ステップＳ６１７にはその後に移行する。 In step S611, 1 is substituted into a start flag for managing reproduction. In step S612, the playback unit (section data) corresponding to the list highlighted when the start switch is operated is selected as a playback target. In the next step S613, the operation of the reproducing unit 103 is started. Then, the process proceeds to step S617.

On the other hand, in step S614, it is determined whether the stop switch for instructing the end of reproduction has been turned on. If the user operated the switch, the determination is YES, the playback unit 103 is stopped in step S615, 0 is assigned to the start flag in step S616, and the process proceeds to step S617. Otherwise, the determination is NO and the process proceeds directly to step S617.

ステップＳ６１７では、スタートフラグの値が１か否か判定する。その値が１であった場合、判定はＹＥＳとなってステップＳ６１８に移行する。そうでない場合には、判定はＮＯとなり、ここでキーワード再生処理を終了する。 In step S617, it is determined whether the value of the start flag is 1. If the value is 1, the determination is yes and the process moves to step S618. Otherwise, the determination is no and the keyword reproduction process is terminated here.

In step S618, audio data is read as needed from the playback unit (section data) selected as the playback target and supplied to the reproducing unit 103. In the next step S619, it is determined whether the data just read is delimiter data or the last audio data of the audio file. When playback of the target playback unit has finished, either delimiter data has been read or no further audio data remains to be read. In that case the determination is YES and the process proceeds to step S615. Otherwise, the determination is NO and the keyword playback process ends here.
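The flow of steps S610 to S619 can be sketched as a small state machine. This is only an illustrative sketch and not the implementation in the specification: the class `KeywordPlayback`, the `DELIMITER` marker, and the flat list used for the audio file are all assumptions made for the example, and the audio file is assumed to be non-empty.

```python
DELIMITER = object()  # hypothetical marker standing in for delimiter data


class KeywordPlayback:
    """Illustrative state for the loop; all names here are assumptions."""

    def __init__(self, data):
        self.data = data            # audio items with DELIMITER between units
        # A playback unit starts at index 0 and after every delimiter.
        self.unit_starts = [0] + [
            i + 1 for i, item in enumerate(data)
            if item is DELIMITER and i + 1 < len(data)
        ]
        self.highlighted = 0        # list entry currently highlighted
        self.start_flag = 0         # start flag set in S611, cleared in S616
        self.pos = 0                # index of the next item to read
        self.output = []            # items handed to reproducing unit 103

    def _stop(self):
        # S615 (stop reproducing unit 103) and S616 (start flag = 0)
        self.start_flag = 0

    def run_pass(self, start_pressed=False, stop_pressed=False):
        """One pass through steps S610 to S619."""
        if start_pressed:                                    # S610: YES
            self.start_flag = 1                              # S611
            self.pos = self.unit_starts[self.highlighted]    # S612
            # S613 would start reproducing unit 103 here.
        elif stop_pressed:                                   # S614: YES
            self._stop()
        if self.start_flag != 1:                             # S617: NO
            return
        item = self.data[self.pos]                           # S618
        self.pos += 1
        if item is not DELIMITER:
            self.output.append(item)
        if item is DELIMITER or self.pos == len(self.data):  # S619: YES
            self._stop()                                     # to S615, S616
```

Calling `run_pass(start_pressed=True)` once and then `run_pass()` repeatedly plays the highlighted playback unit and stops by itself at the next delimiter or at the end of the file, which mirrors the YES branch of step S619.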

Finally, the display process executed as step S605 is described in detail with reference to its flowchart, shown in FIG. 15. The keyword playback process passes to this display process the target keyword and information indicating the playback units in which that keyword appears.

First, in step S701, the number of occurrences of the keyword in each playback unit is detected using voice recognition. In the next step S702, the playback unit (section data) with the largest number of detected keywords is identified for each audio file. In step S703, voice recognition is performed on the initial part (head portion) of the identified playback unit and the recognition result is acquired. In step S704, the acquired recognition result is converted into text data by referring to the dictionary of the dictionary unit 107. The process then proceeds to step S705.

In step S705, the playback time of the identified playback unit is obtained. This is done, for example, by extracting the recording time from the delimiter data located at its end. In the following step S706, the date and time are acquired from the header of the audio file (recorded data) that contains the identified playback unit. In step S707, list-display image data in which the date and time, the keyword, the keyword count, the playback time (recording time), and the text data are arranged is generated and added to the image data for the keyword playback screen, and the updated image data is output to the display unit 109. After the display content of the display unit 109 has been updated in this way, the process proceeds to step S708.

In step S708, it is determined whether list display has been completed for all target playback units. If no candidate playback unit remains, the determination is YES and the display process ends here. Otherwise, the determination is NO and the process proceeds to step S709, where the playback unit with the next largest keyword count in the audio file currently being processed is identified. If no target playback unit remains in that audio file, the playback unit with the largest keyword count in another audio file is identified. After the playback unit has been identified in this way, the process returns to step S703. As a result, the playback units are listed grouped by audio file, as shown in FIG. 6.
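The ordering produced by steps S701 to S709 can be illustrated with a short sketch. It is written under stated assumptions: each playback unit is represented by its already recognized text, counting keyword occurrences with `str.count` stands in for the voice recognition of step S701, and the function name `list_rows` and the 20-character head length are hypothetical.

```python
def list_rows(files, keyword, head_len=20):
    """Return list rows grouped per audio file, most keyword hits first.

    files: dict mapping an audio file name to a list of playback units,
           each unit given as its recognized text (an assumption).
    """
    rows = []
    for file_name, units in files.items():          # one group per audio file
        counted = [
            (text.count(keyword), index, text)       # S701: count per unit
            for index, text in enumerate(units)
        ]
        hits = [c for c in counted if c[0] > 0]      # units containing keyword
        # S702 / S709: largest count first, then the next largest, and so on.
        hits.sort(key=lambda c: (-c[0], c[1]))
        for count, index, text in hits:
            head = text[:head_len]                   # S703-S704: head text
            rows.append((file_name, index, count, head))
    return rows


files = {
    "rec1": ["company news company", "weather", "company"],
    "rec2": ["the company the company the company"],
}
rows = list_rows(files, "company")
```

Here the rows for `rec1` come out before those for `rec2` even though `rec2` holds the unit with the most hits, because the listing is grouped per audio file first and sorted by keyword count only within each file.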

On the keyword playback screen shown in FIG. 6, the keyword "company" appears 13 times in total, and the keyword "digital camera" appears 19 times in total, which is more than 13. Nevertheless, the audio file with the smaller total is listed first. This happens because the count of a registered keyword (FIG. 3) is the count obtained over all recordings made since the keyword was registered, and it does not necessarily correspond to the audio files currently stored. In other words, when audio files have been deleted, the total number of appearances of a keyword is less than or equal to the count stored in the registered keyword area, and the difference between the two varies with which audio files were deleted and how many.
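The relation between the registered count and the total in the stored files can be shown with a small arithmetic sketch. The per-recording counts below are invented purely for illustration, and `keyword_totals` is a hypothetical helper, not something defined in the specification.

```python
def keyword_totals(per_recording, deleted):
    """Return (registered count, total in files that are still stored)."""
    registered = sum(per_recording.values())          # all recordings so far
    stored = sum(
        count for name, count in per_recording.items()
        if name not in deleted                        # skip deleted files
    )
    return registered, stored


# Invented example: three recordings, one of which was later deleted.
registered, stored = keyword_totals({"a": 6, "b": 7, "c": 9}, deleted={"c"})
```

With these invented numbers the registered count is 22 while only 13 occurrences remain in the stored files, so ordering by the registered count need not match ordering by the occurrences that can actually be played back.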

The per-audio-file list display may be ordered not by keyword count but by, for example, the order in which the keywords were registered or the character order of the keywords (gojūon order). It may also be ordered by the date and time each audio file was created rather than by keyword. The list display shown in FIG. 6 is therefore only one example, and various modifications are possible.

In this embodiment, connection words and silent periods are regarded as representing breaks (changes) in the content of the talk, and delimiter data is inserted when a connection word or a silent period is detected; however, other conditions may also be adopted. For example, people often speak louder when making an important point or a point they want to emphasize. Accordingly, as with silent-period detection, changes in the value of the input audio data may be monitored to detect a period of at least a certain length during which the value is on average larger than usual, and a range including that period may be distinguished as one playback unit.
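The loudness-based condition described above could be sketched as follows. This is a minimal sketch under assumptions that the specification does not fix: the input is a list of per-frame average levels, "larger than usual" is taken to mean above `ratio` times a positive `baseline` level, and the names `loud_periods`, `ratio`, and `min_frames` are hypothetical.

```python
def loud_periods(levels, baseline, ratio=1.5, min_frames=3):
    """Return (start, end) frame ranges that stay loud for min_frames or more.

    levels:   per-frame average levels of the input audio (assumed input)
    baseline: the usual level, assumed to be positive
    """
    threshold = baseline * ratio
    periods = []
    start = None
    # The trailing 0.0 is a sentinel that closes a run reaching the end.
    for i, level in enumerate(list(levels) + [0.0]):
        if level > threshold:
            if start is None:
                start = i                 # a loud run begins
        else:
            if start is not None and i - start >= min_frames:
                periods.append((start, i))    # end index is exclusive
            start = None
    return periods


periods = loud_periods([1, 1, 3, 3, 3, 1, 3, 1, 3, 3, 3, 3], baseline=1)
```

Each returned range could then be widened to a surrounding span and marked with delimiter data so that it becomes one playback unit; how far to widen it is left open here, as it is in the text.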

In this embodiment, delimiter data is inserted unconditionally whenever a connection word or a silent period is detected, but the insertion need not be unconditional. For example, after a connection word or a silent period has been detected, delimiter data may be withheld until a certain time has elapsed, even if another connection word or silent period is detected in the meantime. Providing such a condition avoids the problem of the recorded content becoming difficult to check because of a large number of playback units (section data) with short recording times.
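A hold-off condition of this kind might look like the sketch below. It assumes that detections are given as timestamps in seconds and that the hold-off interval is measured from the last delimiter actually inserted; the text does not say whether a suppressed detection restarts the interval, so that choice, like the names `accepted_delimiters` and `min_gap`, is an assumption.

```python
def accepted_delimiters(detections, min_gap):
    """Keep only detections at least min_gap after the last accepted one.

    detections: times (in seconds) at which a connection word or a silent
                period was detected (assumed input format)
    min_gap:    hold-off time during which further detections are ignored
    """
    accepted = []
    last = None
    for t in sorted(detections):
        if last is None or t - last >= min_gap:
            accepted.append(t)   # delimiter data would be inserted here
            last = t
        # Otherwise the detection falls inside the hold-off and is ignored.
    return accepted


kept = accepted_delimiters([0, 2, 5, 11, 12, 30], min_gap=10)
```

With a 10-second hold-off, six detections collapse to three delimiters, so no playback unit in this example is shorter than 10 seconds apart from possibly the last one.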

In this embodiment, the present invention is applied to a voice recorder equipped with a voice recognition function (voice recognition unit 106), but whether a voice recognition function is installed does not affect whether the invention can be applied. This is because the voice recognition function can be realized by a program executed by the microcomputer 102. For a voice recorder equipped with a voice recognition function, it is therefore sufficient to install a program that causes the microcomputer 102 to realize the operations or modifications described above. For a voice recorder without a voice recognition function, it is sufficient to provide, in addition to that program, a program or subprogram that realizes the voice recognition function. Such a program may be recorded on and distributed with a recording medium such as an optical disc or a flash memory, or may be made distributable via a communication network such as a LAN or the Internet. In the latter case, the program may be made accessible to a device that distributes it via the communication network.

FIG. 1 is a diagram showing the configuration of the voice recorder according to this embodiment.
FIG. 2 is a diagram explaining the audio data stored in the memory 105.
FIG. 3 is a diagram explaining how keywords are stored using the memory 105.
FIG. 4 is a diagram showing an example of the keyword registration screen.
FIG. 5 is a diagram showing an example of the normal playback screen.
FIG. 6 is a diagram showing the keyword playback screen.
FIG. 7 is a flowchart of the overall process.
FIG. 8 is a flowchart of the keyword registration process.
FIG. 9 is a flowchart of the record process.
FIG. 10 is a flowchart of the record process (continued).
FIG. 11 is a flowchart of the playback process.
FIG. 12 is a flowchart of the normal playback process.
FIG. 13 is a flowchart of the keyword playback process.
FIG. 14 is a flowchart of the keyword playback process (continued).
FIG. 15 is a flowchart of the display process.

Explanation of symbols

101 Recording unit
102 Microcomputer
103 Reproducing unit
104 Continuous recording memory
105 Memory
106 Voice recognition unit
107 Dictionary unit
108 Operation unit
109 Display unit
110 Internal bus
121 Microphone
122, 132 Amplifier circuit
123 A/D conversion circuit
131 D/A conversion circuit
133 Speaker

Claims

A voice recorder comprising voice input means for converting voice into digital data and inputting it, and storage means for storing the digital data output by the voice input means, the voice recorder being characterized by comprising:
voice recognition means for performing voice recognition using the digital data output by the voice input means;
break specifying means for specifying, using the result of voice recognition by the voice recognition means, a break in the content spoken by the person whose voice is input by the voice input means; and
control means for storing, in the storage means, the digital data continuously output by the voice input means such that it can be divided into a plurality of digital data groups in accordance with the break specified by the break specifying means.

The voice recorder according to claim 1, further comprising:
display means capable of displaying various information; and
reproducing means for reproducing the digital data stored in the storage means,
wherein, for each digital data group stored in the storage means so as to be divisible at the breaks specified by the break specifying means, the control means causes the display means to display the result of voice recognition performed by the voice recognition means using that digital data group, and lets the user select the digital data group to be reproduced by the reproducing means.

The voice recorder according to claim 2, wherein the result of the voice recognition displayed on the display means is at least one of a leading character string indicated by the leading portion of the digital data group and a pre-registered character string that appears in the entire character string indicated by the digital data group.

The voice recorder according to claim 1, wherein the break specifying means specifies the break by using the result of voice recognition by the voice recognition means to detect a connection word that is used when changing the topic of the talk input by voice via the voice input means.

A voice recording method for converting voice into digital data, inputting it, and storing it, the method being characterized by comprising:
a voice recognition step of performing voice recognition using the digital data obtained by the input;
a break specifying step of specifying, using the result of voice recognition obtained in the voice recognition step, a break in the content spoken by the person whose voice is input; and
a storage step of storing, in storage means, the digital data continuously obtained by converting the voice such that it can be divided into a plurality of digital data groups in accordance with the break specified in the break specifying step.