JP4587916B2

JP4587916B2 - Audio signal discrimination device, sound quality adjustment device, content display device, program, and recording medium

Info

Publication number: JP4587916B2
Application number: JP2005260618A
Authority: JP
Inventors: 智也中村; 直大西; 修藤井
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2005-09-08
Filing date: 2005-09-08
Publication date: 2010-11-24
Anticipated expiration: 2025-09-08
Also published as: JP2007072273A

Abstract

PROBLEM TO BE SOLVED: To provide an audio signal discrimination device capable of accurately discriminating between speech and non-speech in an input audio signal.

SOLUTION: The device comprises a music characteristic detection means 11a that detects the degree of music characteristic of the input audio signal, a speech characteristic detection means 11b that detects the degree of speech characteristic of the input audio signal, and a speech/non-speech determination means 12 that determines whether the input audio signal corresponds to speech or to non-speech. Based on the detection results of the music characteristic detection means 11a and the speech characteristic detection means 11b, the speech/non-speech determination means 12 makes the speech/non-speech determination using calculation formulas that differ according to the degree of speech characteristic and the degree of music characteristic.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an audio signal discrimination device, a sound quality adjustment device, a content display device, a program, and a recording medium. More particularly, it relates to an audio signal discrimination device that makes a speech/non-speech determination on an audio signal, a sound quality adjustment device including that audio signal discrimination device, a content display device including that sound quality adjustment device, programs for these devices, and a computer-readable recording medium on which such a program is recorded.

Conventional audio equipment generally provides various sound quality adjustment functions, such as bass adjustment, which adjusts the output frequency characteristics in the low range; treble adjustment, which adjusts the output frequency characteristics in the high range; and loudness adjustment, which emphasizes both the low and high ranges.

As one such sound quality adjustment device, a device has been proposed that detects whether the input audio signal itself exhibits periodicity, thereby judges whether the input signal is music information or other information, and controls acoustic parameters according to the result (see, for example, Patent Document 1).

Patent Document 1: JP-A-61-93712

However, particularly in equipment that receives television or radio broadcasts, judging whether a signal is music from the audio information alone can lead to unexpected misjudgments.

For example, when an a cappella performance is played in a music program, its style prevents a sense of rhythm from being detected, so the signal is judged not to be music and the equalizer or the like fails to select the acoustic parameters best suited to music. The equalizer may instead select, for example, the acoustic parameters best suited to speech. The a cappella music, for which the natural resonance of the live voices matters, is then output with acoustic characteristics that emphasize word clarity (with the midrange relatively emphasized), and the listener does not get the sound setting actually desired.

Conversely, while a news program is being watched, it is preferable to select parameters best suited to speech, emphasizing the clarity of the language. Depending on the content of the news, however, audio collected at the reporting site is sometimes output as-is alongside the announcer's speech. If that collected audio contains music, then depending on the volume balance between the two, the music from the collected audio may dominate the speech of the news program. This problem can readily arise as the opposite case of the a cappella example described above.

Moreover, even in a device that solves the above problems and can make accurate speech/non-speech determinations on the input audio signal, the determination and the sound quality adjustment based on it are carried out inside the device, so the user cannot understand why the sound quality has changed. In particular, if the sound output as a result of such determination-based adjustment is not to the user's liking, the user, not knowing what caused the adjustment, cannot change the setting either, and is inevitably left dissatisfied.

The present invention has been made in view of the above circumstances, and an object thereof is to provide an audio signal discrimination device capable of accurately discriminating speech/non-speech in an input audio signal, a sound quality adjustment device including the audio signal discrimination device, a content display device including the sound quality adjustment device, programs therefor, and a computer-readable recording medium on which such a program is recorded.

Another object of the present invention is to provide a sound quality adjustment device that, when it determines speech/non-speech for an input audio signal and adjusts the sound quality based on the result, allows the user to see that determination result; a content display device including the sound quality adjustment device; programs therefor; and a computer-readable recording medium on which such a program is recorded.

To solve the above problems, the present invention is constituted by each of the following technical means.

A first technical means is an audio signal discrimination device comprising: a music characteristic detection means that detects the degree of music characteristic of an input audio signal; a speech characteristic detection means that detects the degree of speech characteristic of the input audio signal; and a speech/non-speech determination means that makes a determination for discriminating whether the input audio signal corresponds to speech or to non-speech. The speech/non-speech determination means classifies the detection result of the music characteristic detection means into a predetermined number of stages, classifies the detection result of the speech characteristic detection means into a predetermined number of stages that is the same as or different from the former number, and makes the speech/non-speech determination using a different calculation formula for each combination of classes corresponding to the degree of speech characteristic and the degree of music characteristic.

A second technical means is the first technical means further comprising a monaural/stereo determination means that determines whether the input audio signal is a monaural signal or a stereo signal, wherein the speech/non-speech determination means adjusts a correction component of the calculation formula based on the determination result of the monaural/stereo determination means.

A third technical means is a sound quality adjustment device comprising the audio signal discrimination device of the first or second technical means, and further comprising a sound quality adjustment means that adjusts an audio signal discriminated as speech or non-speech by the audio signal discrimination device to different sound qualities for speech and for non-speech.

A fourth technical means is the third technical means further comprising a determination result display means that displays the determination result of the speech/non-speech determination means, wherein the determination result display means displays the determination result to the user in stages according to the degree of speech or non-speech.

A fifth technical means is the fourth technical means, wherein the sound quality adjustment means has an adjustment setting means for setting whether to execute the sound quality adjustment based on the determination result of the speech/non-speech determination means, and the determination result display means displays the determination result only when the adjustment setting means is set to execute the sound quality adjustment.

A sixth technical means is the fourth or fifth technical means, wherein the determination result display means has a display setting means for setting whether to display the determination result, and displays the determination result only when the display setting means is set to display it.

A seventh technical means is a content display device comprising the sound quality adjustment device of any one of the fourth to sixth technical means and a content input device, wherein the audio signal included in the content input through the content input device is input to the sound quality adjustment device, adjusted in sound quality, and output as audio; the video signal included in the content is displayed; and, as necessary, the determination result is displayed by the determination result display means.

An eighth technical means is a program for causing a computer to execute: a music characteristic detection step in which a music characteristic detection means detects the degree of music characteristic of an input audio signal; a speech characteristic detection step in which a speech characteristic detection means detects the degree of speech characteristic of the input audio signal; and a speech/non-speech determination step in which a speech/non-speech determination means makes a determination for discriminating whether the input audio signal corresponds to speech or to non-speech. In the speech/non-speech determination step, the detection result of the music characteristic detection step is classified into a predetermined number of stages, the detection result of the speech characteristic detection step is classified into a predetermined number of stages that is the same as or different from the former number, and the speech/non-speech determination is made using a different calculation formula for each combination of classes corresponding to the degree of speech characteristic and the degree of music characteristic.

A ninth technical means is the eighth technical means, wherein the program includes an adjustment program for causing the computer to execute a sound quality adjustment step in which a sound quality adjustment means adjusts an audio signal discriminated as speech or non-speech in the speech/non-speech determination step to different sound qualities for speech and for non-speech.

A tenth technical means is the eighth or ninth technical means, wherein the program includes a display program for causing the computer to execute a determination result display step in which a determination result display means displays the determination result of the speech/non-speech determination step on a display unit, and the determination result display step displays the determination result to the user in stages according to the degree of speech or non-speech.

An eleventh technical means is a computer-readable recording medium on which the program of any one of the eighth to tenth technical means is recorded.

According to the present invention, speech/non-speech can be accurately discriminated in an input audio signal. Further, according to the present invention, when speech/non-speech is determined for an input audio signal and the sound quality is adjusted based on the result, the determination result can be shown to the user.

The audio signal discrimination device according to the present invention comprises a music characteristic detection means, a speech characteristic detection means, and a speech/non-speech determination means. The following description concerns a sound quality adjustment device that includes such an audio signal discrimination device together with a sound quality adjustment means that adjusts the sound quality based on the discrimination. However, the audio signal discrimination device according to the present invention is also applicable to uses other than sound quality adjustment, for example, sorting and recording (video recording) of content (content containing the audio signal) based on the discrimination.

The sound quality adjustment device according to the present invention comprises, in addition to such an audio signal discrimination device, a sound quality adjustment means and, preferably, a determination result display means. The description below takes as a preferred example a configuration in which, for the speech/non-speech determination, a monaural/stereo determination is made and the criteria of the speech/non-speech determination are optimized based on its result; naturally, the present invention may also adopt a configuration that does not perform such monaural/stereo determination and optimization. As another embodiment, a configuration that performs a sound/silence determination instead of the monaural/stereo determination and optimization is also described; naturally, a configuration combining the monaural/stereo determination and optimization with the sound/silence determination may also be adopted.

FIG. 1 is a block diagram showing an example configuration of a sound quality adjustment device according to an embodiment of the present invention. In the figure, 1 denotes the sound quality adjustment device, 10 an audio signal input means, 11a a music characteristic detection means, 11b a speech characteristic detection means, 12 a speech/non-speech determination means, 13 a monaural/stereo determination means, 14 a criterion optimization means, 14a a switch, 14b a means for setting the threshold V_SL1, 14c a means for setting the threshold V_SL2, 15 a sound quality adjustment means, 16 an audio signal output means, and 17 a determination result display means.

The music characteristic detection means 11a detects the degree of music characteristic of the input audio signal and can also be regarded as a non-speech characteristic determination means. The speech characteristic detection means 11b detects the degree of speech characteristic of the input audio signal and can also be regarded as a speech characteristic determination means. The music characteristic indicates the likelihood that the audio signal is a music signal, and the speech characteristic indicates the likelihood that the audio signal contains conversation or the like. The music characteristic detection means 11a and the speech characteristic detection means 11b may each be implemented, wholly or partly, in hardware or in software.

The speech/non-speech determination means 12 makes a determination for discriminating whether the audio signal input through the audio signal input means 10 corresponds to speech or to non-speech. The input source and input method of the audio signal input means 10 are not limited. The speech/non-speech determination means 12 may likewise be implemented, wholly or partly, in hardware or in software.

Based on the detection results of the music characteristic detection means 11a and the speech characteristic detection means 11b, the speech/non-speech determination means 12 of the present invention makes the speech/non-speech determination using calculation formulas that differ according to the degree of speech characteristic and the degree of music characteristic. Thus, for example, if the degree of speech characteristic and the degree of music characteristic are each detected on a scale of 0 to 100, the speech/non-speech determination is performed by applying threshold processing or the like to 101 × 101 possible combinations of detection results.

Since such a determination is cumbersome, more preferably the speech/non-speech determination means 12 first determines which of a predetermined number of predefined stages the detection result of the music characteristic detection means 11a falls into, and which of a predetermined number of predefined stages (the same as or different from the former number) the detection result of the speech characteristic detection means 11b falls into. The speech/non-speech determination means 12 then makes the speech/non-speech determination using a different calculation formula for each combination of classes corresponding to the degree of music characteristic and the degree of speech characteristic. For example, if both the music characteristic and the speech characteristic are classified into three stages, nine (3 × 3) calculation formulas are used, and one of them is selected according to the music and speech detection results and evaluated.
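The staged selection described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the 0-100 scale follows the example in the text, but the stage boundaries (33 and 66), the linear shape of each formula, and all weights and biases are hypothetical placeholders chosen only to show how one of nine (3 × 3) formulas is picked from the pair of stages.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stage boundaries on the 0-100 detector scale used in the text.
STAGE_BOUNDS: Tuple[int, ...] = (33, 66)  # <33 -> 0, 33..65 -> 1, >=66 -> 2

Formula = Callable[[float, float], float]


def to_stage(degree: float, bounds: Tuple[int, ...] = STAGE_BOUNDS) -> int:
    """Classify a 0-100 detector output into stage 0 (low), 1 (mid) or 2 (high)."""
    if not 0 <= degree <= 100:
        raise ValueError("degree must lie in 0..100")
    return sum(degree >= b for b in bounds)


def make_formula(w_speech: float, w_music: float, bias: float) -> Formula:
    """Build one calculation formula; the linear form is an assumption."""
    def formula(speech: float, music: float) -> float:
        return w_speech * speech - w_music * music + bias
    return formula


# One formula per (music stage, speech stage) pair: 3 x 3 = 9 formulas.
FORMULAS: Dict[Tuple[int, int], Formula] = {
    (m, s): make_formula(w_speech=1.0 - 0.25 * m,  # trust speech less as music rises
                         w_music=0.5 + 0.25 * m,   # penalize music more as it rises
                         bias=10.0 * (s - 1))      # shift by the speech stage
    for m in range(3)
    for s in range(3)
}


def speech_score(music_degree: float, speech_degree: float) -> float:
    """Select the formula for the stage pair and evaluate it (>0 leans speech)."""
    key = (to_stage(music_degree), to_stage(speech_degree))
    return FORMULAS[key](speech_degree, music_degree)


def is_speech(music_degree: float, speech_degree: float, threshold: float = 0.0) -> bool:
    """Threshold the selected formula's output; True means speech."""
    return speech_score(music_degree, speech_degree) > threshold
```

Compared with thresholding all 101 × 101 raw combinations, only nine formulas need to be designed and tuned, which is the simplification the paragraph above motivates.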

It is also preferable for the speech/non-speech determination means 12 to exploit the rule of thumb that "news programs and the like are generally broadcast in monaural, whereas commercials with music and music programs are often broadcast in stereo," detecting the monaural/stereo signal superimposed on the audio signal to judge whether the program currently being broadcast is better suited to speech or to non-speech (music) processing. For this reason, the sound quality adjustment device described here includes a monaural/stereo determination means 13 and a criterion optimization means 14, uses them to optimize the speech/non-speech determination, and, based on that determination, controls the parameters of the above calculation formulas or of other calculation formulas such as those for acoustic parameters.

The monaural/stereo determination means 13 determines whether the input audio signal is a monaural signal or a stereo signal. It may also be implemented, wholly or partly, in hardware or in software, or it may simply make the determination from information such as the monaural/stereo switching indication received with the audio signal. Furthermore, when the content from which the audio signal originates is listed in an electronic program guide (EPG) and can be scheduled for recording, the EPG also carries monaural/stereo information, so the monaural/stereo determination can be made by acquiring that information.
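A possible priority order for the monaural/stereo determination is sketched below, assuming the broadcast audio-mode flag and the EPG audio mode are available as simple strings ("mono"/"stereo", a hypothetical encoding). The final fallback, which treats the input as monaural when the left and right channels are nearly identical, is a swapped-in signal-based heuristic that the text does not specify; its tolerance value is likewise an assumption.

```python
from typing import Optional, Sequence


def channels_look_mono(left: Sequence[float], right: Sequence[float],
                       tolerance: float = 1e-3) -> bool:
    """Heuristic (not from the patent): mono if L and R are nearly identical."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    energy = sum(l * l + r * r for l, r in zip(left, right))
    if energy == 0.0:
        return True  # silence gives no evidence of a stereo image
    diff = sum((l - r) ** 2 for l, r in zip(left, right))
    return diff / energy < tolerance


def is_mono(broadcast_mode: Optional[str], epg_mode: Optional[str],
            left: Sequence[float], right: Sequence[float]) -> bool:
    """Prefer explicit mode information, then fall back to the signal heuristic."""
    for mode in (broadcast_mode, epg_mode):  # hypothetical "mono"/"stereo" strings
        if mode is not None:
            return mode == "mono"
    return channels_look_mono(left, right)
```

Explicit mode information is checked first because, as the text notes, it is simply read from the broadcast or the EPG rather than estimated from the audio.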

The criterion optimization means 14 optimizes the determination criteria of the speech/non-speech determination means 12 based on the determination result of the monaural/stereo determination means 13. This optimization may be performed by changing the parameters of the correction term (correction component) of the above calculation formula, by changing threshold parameters such as those used in threshold processing after the calculation (for example, V_SL1 and V_SL2 described later), or by changing both. Optimizing the criteria of the automatic speech detection function through the monaural/stereo determination in this way improves the accuracy of the detection. Speech/non-speech can therefore be discriminated accurately in the input audio signal; that is, speech/non-speech detection suited to whether the audio signal is monaural or stereo becomes possible.
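As a sketch of the criterion optimization, assuming a speech score where larger values mean more speech-like, the monaural/stereo result can switch both the threshold (V_SL1 for monaural, V_SL2 for stereo) and the correction component added to the formula output. All numeric values here are hypothetical; they are chosen only so that monaural input leans toward speech and stereo input toward non-speech, as the text describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    threshold: float   # corrected score >= threshold -> speech
    correction: float  # correction component added to the formula output


# Hypothetical values: V_SL1 < V_SL2 so monaural input is judged speech more readily.
V_SL1 = 40.0  # threshold used for monaural input
V_SL2 = 60.0  # threshold used for stereo input


def optimize_criteria(mono: bool) -> Criteria:
    """Switch threshold and correction component on the mono/stereo result."""
    if mono:
        return Criteria(threshold=V_SL1, correction=5.0)
    return Criteria(threshold=V_SL2, correction=-5.0)


def judge(raw_score: float, mono: bool) -> bool:
    """Apply the optimized criteria to a raw formula output; True means speech."""
    c = optimize_criteria(mono)
    return raw_score + c.correction >= c.threshold
```

With these values, the same raw score of 50 is judged as speech for monaural input (55 ≥ 40) but as non-speech for stereo input (45 < 60).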

For example, the optimization can be controlled so that a monaural signal, such as news, is more readily judged to be speech, and a stereo signal, which often contains music including background music (BGM), is more readily judged to be non-speech. This example assumes that the monaural/stereo determination and criterion optimization have already been performed on the audio signal before its speech/non-speech determination, so that the latter is accurate. A delay or the like may be used for this, or the monaural/stereo determination, criterion optimization, and speech/non-speech determination may simply be performed in sequence each time an audio signal is input.

The detection by the music characteristic detection means 11a and the speech characteristic detection means 11b is preferably performed by applying a plurality of signal analyses to the input audio signal, for example analysis of the signal's energy change over time, analysis of syllable uniformity, and analysis of sound intensity versus frequency. Such analyses yield, for example, (I) the signal's energy change over time, (II) sound intensity versus frequency, (III) the order of vowels and consonants, (IV) syllable length, and (V) the energy of consonants and vowels. The music characteristic detection means 11a and the speech characteristic detection means 11b may then be differentiated by using different parameters for some or all of these signal analyses.
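Analysis (II), sound intensity versus frequency, can be illustrated by the fraction of a frame's spectral energy that falls in the 100 Hz to 3 kHz midrange that the patent associates with speech. The sketch uses a naive O(N²) discrete Fourier transform from the Python standard library so that it is self-contained; a practical implementation would use an FFT and windowing, and the frame length and sample rate in the usage note are assumptions.

```python
import cmath
import math
from typing import Sequence


def band_energy_ratio(frame: Sequence[float], sample_rate: float,
                      lo_hz: float = 100.0, hi_hz: float = 3000.0) -> float:
    """Share of one-sided spectral energy between lo_hz and hi_hz (naive DFT)."""
    n = len(frame)
    if n == 0:
        return 0.0
    total = 0.0
    in_band = 0.0
    for k in range(n // 2 + 1):  # bins from DC up to Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        total += power
        if lo_hz <= k * sample_rate / n <= hi_hz:  # bin centre frequency in Hz
            in_band += power
    return in_band / total if total > 0.0 else 0.0
```

For example, a 1 kHz tone sampled at 16 kHz over 256 samples lands almost entirely inside the band (ratio close to 1), while a 5 kHz tone lands outside it (ratio close to 0).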

Based on these detection results, the final speech/non-speech determination may take into account, for example, the following points. (I) In speech, segments of low audio energy exist between syllables (which have high audio energy); in non-speech, such segments are often absent. (II) Speech is strong in the midrange of 100 Hz to 3 kHz, whereas non-speech is strong in the low and high ranges. (III) In speech, the order within a syllable is often consonant followed by vowel. (IV) In speech, syllable lengths are often uniform. (V) In speech, the energy of vowels is often greater than that of consonants. Items (I) to (V) are then weighted and summed, subjected to statistical processing, and so on, to obtain a final signal analysis result; the speech/non-speech determination (for example, a determination of the degree of likelihood of speech) is made by comparing this value with the threshold V_SL1 for monaural input or with the threshold V_SL2 for stereo input. Alternatively, the criterion optimization means 14 may switch, based on the monaural/stereo determination, the set of thresholds applied to each signal analysis as the speech/non-speech criteria.
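The weighted combination of cues (I) to (V) followed by a monaural/stereo-dependent threshold might look like this. The cue names, the assumption that each cue has already been reduced to a score between 0 and 1, the weights, and the default values for V_SL1 and V_SL2 are all hypothetical; the patent does not give them, and the statistical processing step is omitted.

```python
from typing import Dict

# Hypothetical weights for cues (I)-(V); each cue score is assumed to be in [0, 1],
# where 1 means "typical of speech".
WEIGHTS: Dict[str, float] = {
    "low_energy_gaps": 0.25,         # (I) low-energy gaps between syllables
    "midband_dominance": 0.25,       # (II) strong 100 Hz-3 kHz band
    "consonant_then_vowel": 0.15,    # (III) consonant-to-vowel order
    "uniform_syllables": 0.15,       # (IV) uniform syllable lengths
    "vowel_energy_dominance": 0.20,  # (V) vowels louder than consonants
}


def speech_likelihood(cues: Dict[str, float]) -> float:
    """Weighted sum of clipped cue scores, scaled to 0-100."""
    missing = WEIGHTS.keys() - cues.keys()
    if missing:
        raise KeyError(f"missing cues: {sorted(missing)}")
    total = sum(w * min(max(cues[name], 0.0), 1.0) for name, w in WEIGHTS.items())
    return 100.0 * total / sum(WEIGHTS.values())


def classify(cues: Dict[str, float], mono: bool,
             v_sl1: float = 40.0, v_sl2: float = 60.0) -> str:
    """Compare the likelihood with V_SL1 (monaural) or V_SL2 (stereo)."""
    threshold = v_sl1 if mono else v_sl2
    return "speech" if speech_likelihood(cues) >= threshold else "non-speech"
```

An ambiguous input whose cues all score 0.5 yields a likelihood of 50, which is classified as speech for monaural input and as non-speech for stereo input.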

The sound quality adjustment means 15 adjusts an audio signal discriminated as speech or non-speech by the above configuration so that at least speech and non-speech receive different sound qualities. Any method of setting the sound quality may be used, as long as the set values, the increase/decrease amounts, the per-band set values, or the like differ according to the degree of likelihood of speech/non-speech. For example, the setting may be one in which the center frequencies of the equalizer and the filter Q values (the sharpness of the peaks and valleys of the curve for one band of a graphic equalizer) are fixed, as in a graphic equalizer, or one in which these can also be changed, as in a parametric equalizer. The audio signal output means 16 then outputs the audio signal adjusted by the sound quality adjustment means 15.
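As one illustration of the graphic-equalizer style of setting, where center frequencies and Q stay fixed and only band gains change, the sketch below blends a speech preset and a music preset according to a speech degree between 0 and 1. The band frequencies, the gain values, and the linear blending rule are assumptions made for this example; the text only requires that the settings differ with the degree of speech/non-speech.

```python
from typing import Dict, List

# Hypothetical 5-band graphic equalizer: centre frequencies and Q are fixed,
# only the per-band gains (dB) depend on the speech/non-speech result.
CENTER_HZ: List[int] = [100, 300, 1000, 3000, 10000]
SPEECH_GAINS_DB: List[float] = [-2.0, 1.0, 3.0, 2.0, -1.0]  # favour the midrange
MUSIC_GAINS_DB: List[float] = [3.0, 0.0, -1.0, 0.0, 3.0]    # favour lows and highs


def eq_gains(speech_degree: float) -> Dict[int, float]:
    """Blend the presets linearly by a speech degree clipped to [0, 1]."""
    d = min(max(speech_degree, 0.0), 1.0)
    return {hz: round(d * s + (1.0 - d) * m, 2)
            for hz, s, m in zip(CENTER_HZ, SPEECH_GAINS_DB, MUSIC_GAINS_DB)}
```

A device that distinguishes only two cases would call this with 0.0 or 1.0; intermediate values give a graded adjustment.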

The determination result display means 17, a feature of the present invention, displays the determination result of the speech/non-speech determination means 12 to the user in stages according to the degree of speech or non-speech (for example, the proportion of speech portions or the likelihood of speech). In practice, the speech/non-speech determination means 12 detects the speech characteristic and the non-speech (music) characteristic as described above, selects a calculation formula according to the detection results, applies threshold processing with a predetermined threshold to the result of that formula, and decides whether the signal is speech or non-speech. The determination result display means 17 may display this speech/non-speech determination result in stages according to its level (for example, the degree of speech). When such a staged display is used, it becomes more effective if multi-stage threshold processing is also performed (preferably with at least two threshold sets prepared according to the degree of monaural/stereo) and the sound quality is adjusted according to each stage.
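A staged display with one threshold group per audio mode could be sketched as follows, using the standard-library `bisect` module to count how many thresholds a score has reached. The threshold values, the four-step meter, and the text rendering are hypothetical stand-ins for whatever on-screen indicator a device actually uses.

```python
import bisect
from typing import Tuple

# Hypothetical threshold groups on a 0-100 speech score, one per audio mode.
MONO_THRESHOLDS: Tuple[float, ...] = (20.0, 40.0, 60.0, 80.0)
STEREO_THRESHOLDS: Tuple[float, ...] = (30.0, 50.0, 70.0, 90.0)


def display_level(score: float, mono: bool) -> int:
    """Return 0 (clearly non-speech) to 4 (clearly speech)."""
    thresholds = MONO_THRESHOLDS if mono else STEREO_THRESHOLDS
    return bisect.bisect_right(thresholds, score)  # count thresholds reached


def render_meter(level: int, steps: int = 4) -> str:
    """Text stand-in for an on-screen indicator, e.g. '[##--]'."""
    return "[" + "#" * level + "-" * (steps - level) + "]"
```

For instance, a score of 45 reaches level 2 for monaural input but only level 1 for stereo input, so the displayed stage tracks the same mode-dependent criteria used for the sound quality adjustment.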

Alternatively, the determination result display means 17 may display the underlying detection results in stages according to their detection level (for example, the degree of speech). These are the speech-likeness detection result or the music-likeness (music signal) detection result from which the speech/non-speech determination is derived. In that case, both detection results may be used only for displaying the determination result. For sound quality adjustment, the speech-likeness detection result is then adopted directly as the speech/non-speech determination result. The data driving the sound quality adjustment and the data behind the displayed result will then differ, for example in a music program. Care must be taken to keep this discrepancy small enough that viewers do not notice it (for example, so that the display remains consistent with the broadcast content).

The sound quality adjusting means 15 may also include an adjustment setting means. This means sets whether the sound quality adjusting means 15 performs sound quality adjustment based on the result of the speech/non-speech determining means 12. Sound quality adjustments that do not arise from the speech/non-speech determination can be configured separately. The adjustment setting means is set by a user operation, in which the user selects from options such as:
(a) adjusting sound quality automatically based on the speech/non-speech determination;
(b) fixing the sound quality adjustment (for example, to the adjustment used for speech);
(c) not performing the sound quality adjustment (only the adjustment based on the speech/non-speech determination is disabled).
Based on this user setting, the sound quality adjusting means 15 performs the adjustment corresponding to (a), (b), or (c). The determination result display means 17 displays the determination result (detection result) in case (a) and hides it in cases (b) and (c). In other words, the result need be displayed only when the adjustment setting means is set to execute the determination-based sound quality adjustment. For example, when a speech-oriented adjustment is merely applied as in (b), the determination result is not displayed.

Furthermore, the determination result display means 17 may include a display setting means that sets whether to display the determination result. In that case the result is displayed only when the display setting means enables it. The display setting means may be provided whether or not the adjustment setting means described above is present. When both are provided, the determination result display means 17 displays the result only when both of the following hold:
- the adjustment setting means enables sound quality adjustment based on the determination result;
- the display setting means enables the result display.

FIG. 2 is a flowchart of an example of the sound quality adjustment process and the determination result display process in the sound quality adjusting device of FIG. 1. FIG. 3 shows an example of the equalizer settings used in that sound quality adjustment process. FIG. 4 shows an example screen display produced by the determination result display process of FIG. 2.

For simplicity, the following description assumes that the speech/non-speech criterion is a single threshold. Where multi-level thresholding is used, read "threshold" as "set of thresholds". First, when an audio signal is input, the mono/stereo determining means 13 determines whether it is mono or stereo (step S1). For this determination, let L be the left input signal and R the right input signal. The quantity (L − R)/(L + R) may then be computed on the input and used as a phase-difference check.
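The patent only names the quantity (L − R)/(L + R). The following minimal Python sketch shows one way to evaluate it over a frame. Two details are assumptions, not values from the patent: summing absolute values over the whole frame (to avoid dividing by near-zero individual samples), and the 0.1 decision threshold.

```python
def is_mono(left, right, ratio_threshold=0.1, eps=1e-12):
    """Classify one frame of stereo samples as mono (True) or stereo (False).

    Assumption: the (L - R) / (L + R) check is evaluated over the whole frame
    as sum|L - R| / sum|L + R|; ratio_threshold is a hypothetical tuning value.
    """
    if len(left) != len(right):
        raise ValueError("left and right frames must have the same length")
    diff = sum(abs(l - r) for l, r in zip(left, right))
    total = sum(abs(l + r) for l, r in zip(left, right))
    if total < eps:
        # Silent or purely anti-phase frame: mono only if the channels also match.
        return diff < eps
    return diff / total < ratio_threshold


print(is_mono([0.5, -0.2, 0.1], [0.5, -0.2, 0.1]))  # identical channels -> True
print(is_mono([1.0, 0.0, 1.0], [0.0, 1.0, 0.0]))    # unrelated channels -> False
```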

If step S1 finds a mono signal, the reference optimizing means 14 connects switch 14a to the setting means 14b for threshold V_SL1. This sets the decision threshold of the speech/non-speech determining means 12 to V_SL1 (step S2). If step S1 finds a stereo signal, the reference optimizing means 14 instead connects switch 14a to the setting means 14c for threshold V_SL2, which sets the decision threshold to V_SL2 (step S3). Optimizing the threshold in this way has two effects:
- mono signals, such as news, are easier to classify as speech;
- stereo signals, which often contain music including background music, are easier to classify as non-speech.
The configuration of the reference optimizing means 14 is not limited to the one illustrated.

Next, the music-likeness detecting means 11a detects music-likeness (step S4) and the speech-likeness detecting means 11b detects speech-likeness (step S5). Steps S4 and S5 may run in either order. The speech/non-speech determining means 12 then selects a calculation formula based on the results of steps S4 and S5 and evaluates it. It makes the speech/non-speech decision using the threshold V_SL1 or V_SL2 set in step S2 or S3 (step S6). If the signal is judged to be speech, sound quality setting A is selected (step S7). If it is judged to be non-speech, sound quality setting B is selected (step S8).
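Steps S2, S3, and S6 to S8 reduce to a threshold lookup and a comparison. The following Python sketch is hedged on two assumptions. The numeric values of V_SL1 and V_SL2 are hypothetical, chosen with V_SL1 < V_SL2 so that mono input is easier to classify as speech, as the text describes. The convention that a score at or above the threshold means speech is also assumed.

```python
# Hypothetical threshold values; the patent gives no numbers for V_SL1 / V_SL2.
V_SL1 = 10  # threshold used for mono input (step S2)
V_SL2 = 30  # threshold used for stereo input (step S3)


def choose_sound_setting(score, mono):
    """Return 'A' (speech EQ) or 'B' (non-speech EQ) for one decision.

    `score` is the value produced by the selected calculation formula;
    assumption: higher scores are more speech-like.
    """
    threshold = V_SL1 if mono else V_SL2  # steps S2 / S3
    is_speech = score >= threshold        # step S6
    return "A" if is_speech else "B"      # steps S7 / S8


print(choose_sound_setting(20, mono=True))   # A: passes the lower mono threshold
print(choose_sound_setting(20, mono=False))  # B: fails the higher stereo threshold
```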

An example of the difference between sound quality settings A and B is described with reference to FIG. 3. For setting A (speech), the equalizer frequency response is set as shown by graph 21. For setting B (non-speech), it is set as shown by graph 22. Compared with the speech curve, the non-speech curve emphasizes the region around a predetermined low frequency 22a and the region around a predetermined high frequency 22b.

The determination result is displayed before or after step S7/S8, and at least after the speech/non-speech determination of step S6 (step S9). The result may be displayed with an LED on the sound quality adjusting device. When the audio signal is input together with a video signal, it may instead be shown as an on-screen display (OSD) on the screen 31 that shows the video, as illustrated in FIG. 4.

In step S9, the result is displayed in stages so that the degree of speech (or non-speech) given by the speech/non-speech determination is visible. This degree generally differs from the degrees of music-likeness and speech-likeness detected by the music-likeness detecting means 11a and the speech-likeness detecting means 11b. At minimum, the staged display has two levels, speech and non-speech. This minimum corresponds to the case where the speech/non-speech determination uses a single threshold to drive the sound quality adjustment.

The following example shows the degree of speech to the user. As illustrated in FIG. 4, text 32 indicating the "degree of speech" may be displayed on screen 31 together with marks 33. The number of marks 33 corresponds to the degree of speech (the speech detection level), and the marks may be called speech sensor marks. Their count therefore shows how far the sound quality adjustment is leaning toward speech. For example, each mark 33 may be a green speech mark depicting the face of a person with an open mouth.

User settings may also allow selection of:
- color, for example green for Japanese and orange for English;
- shape, for example a speaker mark, a sine or cosine mark, or flashing.

In the example of FIG. 4, the text 32 shows the name of the sound quality adjustment based on the speech/non-speech determination, here named いきいきボイス ("Lively Voice"). The certainty of the speech/non-speech determination may also be shown as a percentage next to the speech sensor marks. This certainty can be set low when the music-likeness and speech-likeness detection results strongly contradict each other.

When displaying the result, the face images may be arranged horizontally at the bottom of screen 31, like marks 33. The display position may also be movable to any position, automatically or manually, and switching between vertical and horizontal layouts may be allowed.

When text appears at the top or bottom of the screen, the marks may be moved to a position where they do not overlap that text. Examples of such text are Japanese dubbing captions in audio multiplex broadcasts and data broadcast news text at the bottom of the screen. Another possible application uses program genre information obtained from the EPG, for example whether the program is a song program. For a song program, the display can be reduced or enlarged and placed where it does not overlap the on-screen lyrics.

The present invention is also applicable to a content display device that includes the above sound quality adjusting device and a content input device. An example is a broadcast receiver for digital or analog television or radio broadcasts. In this content display device:
1. The audio signal contained in the content from the content input device is fed to the sound quality adjusting device.
2. The sound quality adjusting device adjusts its sound quality and outputs it as audio.
3. The video signal contained in the content is displayed.
4. The determination result display means 17 displays the determination result as needed.

As described later, the content display device can be applied to television receivers. It can also be applied to general-purpose personal computers (PCs) equipped with modules such as a content playback program or a video card (also called a video adapter). The content distribution and broadcast format are essentially unrestricted. A television receiver is described next in more detail as an example of a content display device incorporating the sound quality adjusting device.

FIG. 5 is a block diagram of an example configuration of a television receiver, one application of the sound quality adjusting device of FIG. 1. FIG. 6 shows an example of the calculation formula table stored in the microcomputer of FIG. 5. FIG. 7 shows an example of the mark display target table stored in that microcomputer.

The reference numerals in FIG. 5 are:
- 4: television receiver body
- 40: tuner unit
- 41: external input unit
- 42: main-body operation unit
- 43: video processing IC (integrated circuit)
- 44: main-body microcomputer
- 45: audio processing IC
- 46: display
- 47L, 47R: left and right speakers
- 48: light receiving unit
- 49: remote control unit (hereinafter, remote control)

In FIGS. 6 and 7, 51 is the calculation formula table and 52 is the speech sensor mark display target table. Both are stored in a ROM (read-only memory) or the like in the microcomputer 44.

FIG. 8 is a flowchart of the speech/non-speech determination and determination result display processing in the television receiver of FIG. 5. FIG. 9 is a flowchart of the determination result display processing in that receiver, detailing the display step of the flowchart of FIG. 2.

FIGS. 10 to 12 show example setting screens for the determination result display in the sound quality adjusting device of FIG. 1:
- FIG. 10 shows example audio adjustment setting items.
- FIG. 11 shows example operation setting items for the sound quality adjustment according to the present invention, taken from the items of FIG. 10.
- FIG. 12 shows example display setting items for that adjustment.

In FIGS. 10 to 12, 6 is an example audio adjustment setting screen, 61 a settings menu list, 62 an audio adjustment item list, 63 the operation setting items, and 64 the display setting items.

The television receiver body 4 illustrated here mainly comprises:
- the main-body microcomputer 44, as an example of control means;
- video/audio input units such as the antenna and tuner unit 40 and the external input unit 41;
- the video processing IC 43, which applies various video processing to the input video signal;
- the audio processing IC 45, which applies various audio processing to the input audio signal;
- the main-body operation unit 42, which accepts user operations;
- a display (display device) 46, such as an LCD, PDP, or organic EL panel, that shows the processed video;
- left and right speakers 47L and 47R, which output the processed audio;
- the light receiving unit 48, which receives light from the remote control 49.

The calculation formula table 51 and the speech sensor mark display target table 52 are assumed to be stored in the ROM or the like of the microcomputer 44. The microcomputer 44 and the audio processing IC 45 (and the video processing IC 43) may also be integrated as a system LSI (large-scale integrated circuit).

A periodic processing interval is also set during the factory adjustment process of the television receiver 4. This interval determines how often the microcomputer 44 reads the speech/non-speech determination result produced by the audio processing IC 45 during the determination result display processing. It may, for example, be set to read in units of 100 ms. The interval may be variable, for example between 100 ms and 2000 ms, and may be changeable by user setting as well as in the adjustment process. If the read interval is not kept reasonably fixed, the smoothness of the result display suffers.

The data read at this interval is the speech/non-speech determination result. It is held in a register with a range of, for example, −100 through 0 to +100 (FFFF9C through 000000 to 000064 in hexadecimal), initialized to 000000. The sound quality adjustment itself is controlled so that positive register values select the speech sound quality setting and negative values the non-speech setting. In the mode in which no sound quality adjustment is performed, the microcomputer 44 may, for example, force the speech sound quality setting. In that mode, speech/non-speech determination results are not written to the register.
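The hexadecimal limits FFFF9C and 000064 are −100 and +100 written as 24-bit two's-complement values. The Python sketch below shows this encoding. Two points are assumptions: the 24-bit register width is inferred from the six-digit values, and the text does not say which setting a value of exactly 0 selects, so the sketch returns None for it.

```python
REG_BITS = 24
REG_MASK = (1 << REG_BITS) - 1   # 0xFFFFFF
SIGN_BIT = 1 << (REG_BITS - 1)   # 0x800000


def encode_result(value):
    """Encode a determination result in -100..+100 as a 24-bit register word."""
    if not -100 <= value <= 100:
        raise ValueError("result outside the -100..+100 register range")
    return value & REG_MASK


def decode_result(word):
    """Decode a 24-bit two's-complement register word back to a signed int."""
    word &= REG_MASK
    return word - (1 << REG_BITS) if word & SIGN_BIT else word


def sound_setting(word):
    """Positive -> speech setting, negative -> non-speech setting.

    Assumption: the text does not say which setting 0 selects, so None is returned.
    """
    value = decode_result(word)
    if value > 0:
        return "speech"
    if value < 0:
        return "non-speech"
    return None


print(f"{encode_result(-100):06X}")  # FFFF9C
print(f"{encode_result(100):06X}")   # 000064
```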

The calculation formulas for the sound quality setting are set in advance, for example as follows (illustrated in FIG. 6). Let SP (the "SP result") be the integer part of (speech-likeness detection result)/83886. Let MU (the "MU result") be the integer part of (music-likeness detection result)/83886. With raw detection results ranging over 000000h to 7FFFFFh, SP and MU then lie in the range 0 to 100.

SP is classified into three ranges:
(I) 0 ≤ SP ≤ SPEECH_LP
(II) SPEECH_LP < SP < SPEECH_HP
(III) SPEECH_HP ≤ SP

MU is classified into three ranges:
(i) 0 ≤ MU ≤ MUSIC_LP
(ii) MUSIC_LP < MU < MUSIC_HP
(iii) MUSIC_HP ≤ MU
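The scaling and three-way classification can be written as follows. This Python sketch is illustrative only. The function names and the boundary values in the usage lines are hypothetical, while the divisor 83886 and the range limits follow the text. Note that 0x7FFFFF // 83886 is exactly 100, which is why this divisor maps the raw range onto 0 to 100.

```python
DIVISOR = 83886  # 0x7FFFFF (8388607) // 83886 == 100


def scale(raw):
    """Map a raw detection result (0x000000..0x7FFFFF) to an integer 0..100."""
    if not 0 <= raw <= 0x7FFFFF:
        raise ValueError("raw detection result out of range")
    return raw // DIVISOR  # integer part of raw / 83886


def classify(value, low, high):
    """Return 1, 2, or 3 for ranges (I)/(i), (II)/(ii), (III)/(iii).

    1: 0 <= value <= low
    2: low < value < high
    3: high <= value
    """
    if value <= low:
        return 1
    if value < high:
        return 2
    return 3


# Hypothetical boundaries: SPEECH_LP = 30, SPEECH_HP = 70.
sp = scale(0x600000)  # 6291456 // 83886 == 75
print(sp, classify(sp, 30, 70))  # 75 3
```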

Depending on the combination of ranges, the following formulas are used:
(I) and (i): |SP − MU| + α
(I) and (ii): |SP − MU|
(I) and (iii): −MU
(II) and (i): SP − MU
(II) and (ii): |SP − MU| + α
(II) and (iii): SP − MU
(III) and (i): SP
(III) and (ii): SP − MU + α
(III) and (iii): |SP − MU| + α

Here SPEECH_LP, SPEECH_HP, MUSIC_LP, and MUSIC_HP are values between 0 and 100 that mark the boundaries between ranges. MONO and STE are values between 0 and 100. According to the mono/stereo determination, the value of STE (when stereo is detected) or MONO (when mono is detected) is added to the calculation result as +α. The values SPEECH_LP, SPEECH_HP, MUSIC_LP, MUSIC_HP, MONO, and STE may be prepared in the adjustment process. For example, α may be set to +5 for STE and +10 for MONO; α may also be negative.
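Table 51 can be pictured as a lookup keyed by the two range indices. In the hedged Python sketch below:
- the boundary values 30 and 70 are hypothetical;
- α = +10 for mono and +5 for stereo follow the example in the text;
- clamping the result to the −100..+100 register range is an assumption.

```python
# Hypothetical boundary values (set in the factory adjustment process).
SPEECH_LP, SPEECH_HP = 30, 70
MUSIC_LP, MUSIC_HP = 30, 70
ALPHA_MONO, ALPHA_STE = 10, 5  # example values from the text

# (speech range, music range) -> formula f(SP, MU, alpha)
FORMULAS = {
    (1, 1): lambda sp, mu, a: abs(sp - mu) + a,
    (1, 2): lambda sp, mu, a: abs(sp - mu),
    (1, 3): lambda sp, mu, a: -mu,
    (2, 1): lambda sp, mu, a: sp - mu,
    (2, 2): lambda sp, mu, a: abs(sp - mu) + a,
    (2, 3): lambda sp, mu, a: sp - mu,
    (3, 1): lambda sp, mu, a: sp,
    (3, 2): lambda sp, mu, a: sp - mu + a,
    (3, 3): lambda sp, mu, a: abs(sp - mu) + a,
}


def classify(value, low, high):
    """1: value <= low, 2: low < value < high, 3: value >= high."""
    if value <= low:
        return 1
    return 2 if value < high else 3


def speech_score(sp, mu, mono):
    """Select and evaluate the formula for scaled results SP, MU (0..100)."""
    key = (classify(sp, SPEECH_LP, SPEECH_HP), classify(mu, MUSIC_LP, MUSIC_HP))
    alpha = ALPHA_MONO if mono else ALPHA_STE
    result = FORMULAS[key](sp, mu, alpha)
    return max(-100, min(100, result))  # assumed clamp to the register range


print(speech_score(80, 10, mono=True))   # (III, i): SP -> 80
print(speech_score(10, 80, mono=False))  # (I, iii): -MU -> -80
print(speech_score(50, 50, mono=True))   # (II, ii): |SP - MU| + 10 -> 10
```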

In addition to the formulas for the sound quality setting, the display target counts are set in advance using the formula below and the values of MIN and MAX in it. Each display-count range includes its lower bound and excludes its upper bound. The formula may be stored as, for example, the speech sensor mark display target table 52.

MIN + (MAX − MIN) × k ÷ 9, for k = 1 to 9

In this formula, MAX and MIN are preset maximum and minimum values within the −100 to +100 register range described above; for example, MIN may be set to −80 and MAX to 90. The formula assumes that the determination result is displayed in 10 steps, so that between 0 and 10 of the marks 33 of FIG. 4 can be shown. The number of steps is not limited to this.
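One reading of the formula is sketched below in Python. It treats MIN and the nine values MIN + (MAX − MIN) × k ÷ 9 (k = 1 to 9) as ten ascending thresholds. It then counts how many of them the register value reaches, which yields 0 to 10 marks under the "includes lower bound, excludes upper bound" convention. This interpretation and the function names are assumptions; MIN = −80 and MAX = 90 are the example values from the text.

```python
MIN, MAX = -80, 90  # example values from the text
STEPS = 9           # k runs over 1..9 in the formula


def thresholds(lo=MIN, hi=MAX, steps=STEPS):
    """MIN followed by MIN + (MAX - MIN) * k / 9 for k = 1..9."""
    return [lo + (hi - lo) * k / steps for k in range(steps + 1)]


def target_mark_count(register_value, lo=MIN, hi=MAX):
    """Number of thresholds reached (value >= threshold): 0..10 marks."""
    return sum(register_value >= t for t in thresholds(lo, hi))


for v in (-100, -80, 0, 90):
    print(v, target_mark_count(v))
# prints one pair per line: -100 0, -80 1, 0 5, 90 10
```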

Referring to FIG. 8, the microcomputer 44 of the television receiver 4 first runs periodically at the interval set as described above, for example every 100 ms (step S11). Each time the interval elapses, steps S12 to S16 below are executed. Step S12 checks whether the operation setting is "automatic". If it is, steps S13 to S16 are executed to perform sound quality adjustment based on the speech/non-speech determination. If it is not (for example, "fixed"), the subsequent steps are skipped and, for example, the speech sound quality setting may be forced.

In step S13, the microcomputer 44 instructs the audio processing IC 45 to detect speech-likeness and music-likeness, then reads the detection results. Next, or before step S13, it instructs the audio processing IC 45 to perform the mono/stereo determination and reads that result (step S14).

The microcomputer 44 then selects a calculation formula by comparing the detection results read from the audio processing IC 45 against table 51 (step S15). That is, it compares the speech-likeness and music-likeness results with SPEECH_LP, SPEECH_HP, MUSIC_LP, and MUSIC_HP to decide which formula applies. Using that formula from table 51 and the mono/stereo result, it computes the speech/non-speech determination result (the sound quality setting calculation result) and writes it to the register (step S16). This register value is used to set the display target value in step S22 of FIG. 9.

The display processing in the microcomputer 44 likewise first runs periodically at the interval set as described above, for example every 100 ms (step S21). Each time the interval elapses, steps S22 to S32 below are executed. In step S22, the register value obtained by the processing of FIG. 8 is substituted into the formula above (table 52). This value is the result calculated for the sound quality setting. The substitution sets the display target value, that is, the number of marks to display.

When there is no sync or no sound, the display is immediately set to 0 (steps S23 and S24). Step S23 checks whether the input signal has sync and whether it is silent. If there is no sync or the signal is silent, step S24 forces the value to 0 and the flow proceeds to step S30. Silence detection is described later in another embodiment.

Steps S23 and S24 are useful, for example, when the user is watching a news program and then switches to a channel showing only snow (static). In that case, the previous result, such as a speech result with register value +100, decays gradually toward 0 but remains in the register. Because the periodic display reads and shows that residual value, the user could mistakenly believe that speech/non-speech determination is being performed on the static, where no such determination is possible. To prevent this misunderstanding, the register value must be forced to 0.

If the answer in step S23 is NO, the flow checks whether the display count of the previous cycle equals the display target value set in step S22 (step S25). If YES, that count is kept (step S26) and the flow proceeds to step S30. If NO, the flow checks whether the previous count is less than the target (step S27). If it is, "previous count + 1" is computed (step S28); otherwise, "previous count − 1" is computed (step S29). Either way, the flow then proceeds to step S30.

Then, after step S24, S26, S28, or S29, the resulting display count is stored as the previous-cycle display count (step S30), and whether to display it is determined (step S31). If display is determined, the mark is shown on the screen (step S32); otherwise, processing for this cycle ends and the next cycle is awaited. In this way, the microcomputer 44 performs the periodic processing and calculation described above based on the table 52 stored in the ROM.
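The per-cycle update in steps S23 to S30 can be sketched as follows. This is a minimal illustration, not the firmware of the microcomputer 44: the function names are hypothetical, the display target is assumed to be an integer already looked up from the mark display target table (step S22), and the synchronization and silence flags are assumed to be supplied by the caller.

```python
def update_display_count(prev_count: int, target: int, synced: bool, silent: bool) -> int:
    """One periodic step of the speech sensor mark update (steps S23 to S29)."""
    # S23/S24: no sync or silence -> force the display to 0 immediately
    if not synced or silent:
        return 0
    # S25/S26: already at the target -> hold the current count
    if prev_count == target:
        return prev_count
    # S27/S28: below the target -> step up by one
    if prev_count < target:
        return prev_count + 1
    # S29: above the target -> step down by one
    return prev_count - 1


def run_cycles(targets, synced_flags, silent_flags, start=0):
    """Apply the update once per cycle; each result becomes the next 'previous' count (S30)."""
    count = start
    history = []
    for target, synced, silent in zip(targets, synced_flags, silent_flags):
        count = update_display_count(count, target, synced, silent)
        history.append(count)
    return history


print(run_cycles([3] * 5, [True] * 5, [False, False, False, True, False]))  # [1, 2, 3, 0, 1]
```

The count climbs one step per cycle toward the target and drops straight to 0 on the silent cycle, which is the smoothing-plus-reset behavior that steps S24 and S28 describe.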

Next, the determination in step S31 is described. This determination is made by reading a default value or a user setting. The user setting corresponds to the settings made in the adjustment setting means and the display setting means described above, and is made by the following procedure. First, as shown in FIG. 10, a user menu list 61 (video adjustment, audio adjustment, main unit settings, function switching) is displayed; when the user selects audio adjustment, an item list 62 for audio adjustment (treble, bass, balance, surround, lively voice, reset) is displayed. When the user selects the sound quality adjustment according to the present invention ("lively voice" 62a) from that list, the operation setting items 63 (setting items of the adjustment setting means) and the display setting items 64 (setting items of the display setting means) are displayed, as shown in FIG. 11 or FIG. 12.

The operation setting items 63 include, for example: "OFF" 63a, a setting in which the sound quality adjustment according to the present invention is not performed; "Fixed" 63b, a setting that adjusts the sound quality toward speech (or non-speech) without, or independently of, any speech/non-speech determination; and "Auto" 63c, a setting that automatically performs the speech/non-speech determination and adjusts the sound quality based on the result. The speech sensor mark is displayed when the operation setting is "Auto" 63c, and is not displayed when it is "Fixed" 63b or "OFF" 63a. As in the flowchart, the data may still be read even when the setting is "OFF". The display setting items 64 include "No display" 64a and "Display" 64b, and the speech sensor mark is displayed only when the display setting is "Display" 64b. Of course, reading the data at every set cycle (for example, in units of 100 ms) and displaying the speech sensor mark at the bottom of the screen may itself be executed only when "Display" 64b is set.
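The mark-visibility rule above reduces to a simple predicate. The sketch below is illustrative only: the enum and function names are hypothetical, and it models only whether the speech sensor mark is drawn in step S31, not the data reading that continues in the background.

```python
from enum import Enum


class OperationSetting(Enum):
    OFF = "off"      # 63a: no speech-based sound quality adjustment
    FIXED = "fixed"  # 63b: fixed speech (or non-speech) tuning, no detection
    AUTO = "auto"    # 63c: automatic detection and adjustment


class DisplaySetting(Enum):
    HIDDEN = "hidden"  # 64a: "No display"
    SHOWN = "shown"    # 64b: "Display"


def should_show_speech_mark(op: OperationSetting, disp: DisplaySetting) -> bool:
    """Draw the speech sensor mark only in Auto mode with the display enabled."""
    return op is OperationSetting.AUTO and disp is DisplaySetting.SHOWN
```

Keeping the two settings as separate enums mirrors the two independent menu groups 63 and 64 in the settings screen.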

With the configuration and processing described above, this embodiment lets the user see the result when speech/non-speech is determined for an input audio signal. Seeing this result also helps the user understand the actual reason for the sound quality adjustment performed on its basis, and makes further user settings possible. In addition, by performing a mono/stereo determination when determining speech/non-speech, the speech/non-speech judgment is made not only from the audio content of the signal but also in line with the nature of the program containing it. This minimizes erroneous control of acoustic parameters, such as an equalizer, caused by the characteristics of the input audio signal, and enables accurate acoustic parameter control and sound quality adjustment. For example, by determining the nature of a program from the mono/stereo signal superimposed on the audio signal together with the audio content, and optimizing accordingly the criterion used to judge whether the input audio signal is speech or non-speech (music), speech/non-speech detection can be controlled flexibly according to the content and characteristics of the broadcast program, and devices can be controlled on that basis (for example, sound quality adjustment or selective recording).
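A minimal sketch of how a mono/stereo flag might shift the speech/non-speech criterion is shown below. It is a hedged illustration only: the actual calculation formulas (calculation formula table 51) are not reproduced in this section, so combining the two detector outputs into a single margin, the threshold values, and the assumption that mono programs lean toward speech are all hypothetical.

```python
# Hypothetical decision thresholds on (speech score - music score).
# Assumption: mono broadcasts (e.g. news, talk) lean toward speech, so the bar
# for "speech" is set lower for mono than for stereo.
THRESHOLDS = {"mono": -0.1, "stereo": 0.1}


def is_speech(speech_score: float, music_score: float, audio_mode: str) -> bool:
    """Classify one analysis frame; both scores are assumed to lie in [0, 1]."""
    margin = speech_score - music_score
    return margin > THRESHOLDS[audio_mode]
```

With these values, a borderline frame such as `is_speech(0.5, 0.55, ...)` is classified as speech in a mono broadcast but as non-speech in a stereo one, which is the kind of program-dependent bias the paragraph above describes.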

The content display device according to this embodiment also has a display function, based on the automatic speech detection function, that lets the user see whether the audio of a TV program, video/DVD, or the like is speech or non-speech. This allows the user to recognize visually, in real time, whether the currently displayed content is speech or non-speech. The speech/non-speech determination described above may also be applied to recording content (including re-recording); in that case, the content display device should be given a function not only to acquire content via broadcast, a network, a recording medium, or the like, but also to record, or schedule recording of, the acquired content. For example, various recorders can use the speech/non-speech determination for commercial (CM) detection or other selective recording, and at the same time display whether the content corresponds to speech or to non-speech so that the user can see it.

The content display devices described with reference to FIGS. 1 to 12, such as the sound quality adjusting device 1 and the television receiver 4, and the individual means that constitute them, may be implemented in hardware as described above, or partly in software. For example, they may be implemented by incorporating a program into a computer such as the microcomputer shown in FIG. 5, or into a general-purpose computer such as a PC. The processing in that case is described with reference to the configuration example of a general information processing apparatus shown in FIG. 13. FIG. 13 is a block diagram showing a configuration example of a general information processing apparatus. In the figure, 7 is the information processing apparatus, 71 is a CPU (Central Processing Unit), 72 is a RAM (Random Access Memory), 73 is a rewritable ROM, 74 is an input device, 75 is a display device, 76 is an output device, and 77 is a bus.

A program that causes a computer to function as the apparatus or the individual means according to the present invention, or to execute the individual processing steps, is stored in the ROM 73 and is executed by being read by the CPU 71. When installed in a computer or the like, this program controls the CPU 71 and other components of the computer so that they function as the means described above. Information handled by the apparatus and means according to the present invention is temporarily held in the RAM 72 during processing, then stored in the ROM 73, and read, modified, and written by the CPU 71 as necessary. Information relevant to the present invention includes the items selected by the user, threshold values, and the audio signal that is input through the audio signal input means (one of the input devices 74) and analyzed. In addition, a set value among the setting options stored in the ROM 73 may be read into the RAM 72 so that the setting is maintained during that period.

Intermediate progress and results of the processing are presented to the user through the display device 75, such as an LCD, PDP, organic EL display, or CRT. When user settings are required, the user can specify or select the parameters needed for processing through the input device 74, such as a keyboard or mouse (pointing device) (for example, specifying the audio signal to be input or the content containing it, or selecting various user setting items). The program should also provide a graphical user interface (GUI) on the display device 75 for ease of use; examples of the GUI are shown in FIGS. 10 to 12. The output device 76 includes a speaker for outputting the audio signal, communication equipment such as a network board for connecting to and communicating over a network, and output interfaces for other devices such as a printer. The CPU 71, RAM 72, ROM 73, input device 74, display device 75, and output device 76 need only be interconnected by the bus 77 or the like.

Specific recording media for the program described above include a CD-ROM, magneto-optical disk, DVD-ROM, FD, flash memory, and various other ROMs (including rewritable ROMs) and RAMs. Recording a program that causes a computer to execute the functions of the embodiments described above on such media and distributing it makes these functions easy to realize. The functions according to the present invention can then be executed either by loading such a recording medium into an information processing apparatus such as a computer and having the apparatus read the program, or by storing the program on a recording medium provided in the apparatus and reading it as needed.

FIG. 14 is a block diagram showing a configuration example of a sound quality adjusting device according to another embodiment of the present invention. In the figure, 8 is the sound quality adjusting device, 80 is an audio signal input means, 81a is a music property detection means, 81b is a speech property detection means, 82 is a speech/non-speech determination means, 83 is a sound/silence determination means, 85 is a sound quality adjusting means, 86 is an audio signal output means, and 87 is a determination result display means.

The sound quality adjusting device 8 according to this embodiment includes the music property detection means 81a, the speech property detection means 81b, the speech/non-speech determination means 82, the sound/silence determination means 83, the sound quality adjusting means 85, the audio signal output means 86, and the determination result display means 87. The sound/silence determination means 83 determines whether the audio signal input through the audio signal input means 80 is in a sound-present state or a silent state. The audio signal input means 80 may use any input source and input method. The sound/silence determination means 83 may make this determination by, for example, detecting the signal level of the input audio signal (treating a level at or above a predetermined value as sound present). The sound/silence determination means 83 may be implemented wholly or partly in hardware or in software.
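The level-based sound/silence test can be sketched as an RMS measurement compared against a threshold. This example rests on assumptions: samples are taken to be floats normalized to [-1.0, 1.0], the block-wise RMS is one possible choice of "signal level", and the -50 dBFS threshold is a hypothetical value, not one given in the source.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level of a block of samples in dBFS (full scale = 1.0)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    # 10*log10(mean square) is the same as 20*log10(RMS)
    return 10.0 * math.log10(mean_square)


def is_sound_present(samples: list[float], threshold_dbfs: float = -50.0) -> bool:
    """True if the block is at or above the (hypothetical) sound threshold."""
    return rms_dbfs(samples) >= threshold_dbfs
```

A block of `[0.5, -0.5] * 50` measures about -6 dBFS and counts as sound, while an all-zero block or a constant 0.001 block (-60 dBFS) counts as silence.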

Based on the determination result of the speech/non-speech determination means 82 (as described with reference to FIG. 1 and elsewhere) and the determination result of the sound/silence determination means 83, the sound quality adjusting means 85 sets different sound qualities for the audio signal in the sound-present and silent states, and adjusts the sound quality according to that setting. The sound quality adjusting means 85 may be implemented wholly or partly in hardware or in software. The silent-state sound quality setting of the sound quality adjusting means 85 is derived from the sound-present setting in effect immediately before the sound/silence determination means 83 judged the signal silent, by changing only part of it. For example, in the silent state, the output levels of a predetermined low band and a predetermined high band may be lowered by 1 to 2 dB relative to the sound-present state. Because only part of the setting changes, the silent-state values stay close to the preceding sound-present values. When the signal returns from silence to sound, the new sound is assumed to have a signal level close to that of the preceding sound, so only part of the setting has to be changed back, which allows a quick recovery. This effect is more pronounced when the sound quality setting of the sound quality adjusting means 85 is implemented in hardware. The audio signal output means 86 outputs the audio signal adjusted by the sound quality adjusting means 85.

The music property detection means 81a, the speech property detection means 81b, and the speech/non-speech determination means 82 are as described with reference to FIG. 1, but this example does not optimize the threshold based on the mono/stereo determination. Optimizing the criterion of the automatic speech detection function by mono/stereo determination does, however, improve detection accuracy. The parameter corresponding to α in the calculation formula table 51 may also be varied depending on whether sound is present or absent. Instead of the speech/non-speech determination means 82, the device may be configured to acquire detailed time-series information about the content from EPG information, in which case the determination result is also displayed based on the acquired information. The placement of the speech/non-speech determination means 82 is not limited to that shown in FIG. 14. The sound quality adjusting means 85 in this embodiment may use different values for the partial change for speech and for non-speech, based on the determination result of the speech/non-speech determination means 82.

Any sound quality setting method may be used here, as long as the set values, the increase/decrease amounts, or the per-band values differ between speech and non-speech. For example, the setting may be that of a graphic equalizer, in which the equalizer center frequencies and filter Q values are fixed, or that of a parametric equalizer, in which they can also be changed. As described above, however, the setting used when the signal goes from sound to silence is basically a partially modified version of the setting in effect immediately before. Furthermore, the partial change is preferably one that locally reduces the output level in some frequency bands, as in the example above of lowering the output levels of a predetermined low band and a predetermined high band by 1 to 2 dB in the silent state.
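The partial change described above, a local cut in the low and high bands derived from the preceding sound-present setting, could look like the following. The band frequencies, gain values, and per-mode cut amounts are hypothetical; only the 1 to 2 dB range and the idea of using different values for speech and non-speech come from the text.

```python
# Hypothetical edge bands (Hz) standing in for the "predetermined low band"
# and "predetermined high band"; a real device would use its own EQ bands.
LOW_BAND_HZ = 100
HIGH_BAND_HZ = 10_000

# Hypothetical per-mode cut amounts, both within the 1 to 2 dB range in the text.
CUT_DB = {"speech": 1.0, "non_speech": 2.0}


def silent_preset(voiced_preset: dict[int, float], mode: str) -> dict[int, float]:
    """Derive the silent-state EQ (e.g. A' or B') from the preceding sound-present EQ.

    Only the edge bands are lowered; every other band keeps its gain,
    so restoring the sound-present preset later touches just two values.
    """
    cut = CUT_DB[mode]
    return {
        freq: gain - cut if freq in (LOW_BAND_HZ, HIGH_BAND_HZ) else gain
        for freq, gain in voiced_preset.items()
    }


preset_a = {100: 2.0, 1_000: 0.0, 10_000: 1.0}  # hypothetical speech EQ gains (dB)
print(silent_preset(preset_a, "speech"))  # {100: 1.0, 1000: 0.0, 10000: 0.0}
```

Representing a preset as a band-to-gain mapping fits the graphic-equalizer case; a parametric equalizer would also carry center frequency and Q per band, but the same "copy, then cut the edge bands" idea applies.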

The determination result display means 87 lets the user see the result of the speech/non-speech determination; it may likewise let the user see the result of the sound/silence determination.

FIG. 15 is a flowchart for explaining an example of the sound quality adjustment processing in the sound quality adjusting device of FIG. 14, and FIG. 16 shows examples of the equalizer curves used as sound quality settings in that processing. FIG. 16(A) shows an example for speech, and FIG. 16(B) an example for non-speech.

The following description assumes that the sound quality is initially set to a basic sound quality. It focuses on an example in which speech/non-speech is determined from the audio signal, and sound quality A is set when the signal is judged to be speech and sound quality B when it is judged to be non-speech.

First, the sound/silence determination means 83 checks the input level (step S41). If sound is present, the process proceeds to step S45; if the signal is silent, the basic sound quality is modified (step S42) and the input level is checked again in step S41. In step S42, when the silent state is determined in step S41 for the second or a subsequent time, the basic sound quality need not be modified again; whether or not it is modified again, the modified setting is maintained. Steps S41 and S42 are performed after the audio signal is input but before the sound quality is first set to A or B; thereafter, the setting is changed and maintained by the processing from step S43 onward.

Next, the music property detection means 11a and the speech property detection means 11b detect the music property and the speech property (steps S43 and S44); steps S43 and S44 may be performed in either order. Speech/non-speech is then determined (step S45). The speech/non-speech criterion may be a single threshold test or threshold tests on multiple parameters. Based on the determination in step S45, the sound quality is set and adjusted (steps S46 and S47): when the signal is judged to be speech, sound quality A is selected and applied (step S46), and when it is judged to be non-speech, sound quality B is selected and applied (step S47).

An example of the difference between sound quality setting A and sound quality setting B is described with reference to FIG. 16. For sound quality setting A (speech), the equalizer frequency response is set as shown by graph 91; for sound quality setting B (non-speech), it is set as shown by graph 93. Graph 93 differs from graph 91 in that, for non-speech, the output levels near a predetermined low frequency 93a and a predetermined high frequency 93b are emphasized compared with the output levels near the predetermined low frequency 91a and the predetermined high frequency 91b for speech.

In steps S46 and S47, the selected sound quality is retained, and in step S48 the speech/non-speech determination result on which it is based is displayed. The sound/silence determination means 83 then checks the input level (step S49). If sound is present, the process ends; if the signal is silent, the sound quality is adjusted by modifying it according to the preceding state (step S50). If the retained sound quality (the one in effect before the silence) is A, it is modified to sound quality A′, shown by graph 92 in FIG. 16(A); if it is B, it is modified to sound quality B′, shown by graph 94 in FIG. 16(B). For speech, graph 91 differs from graph 92 in that it emphasizes the regions near the predetermined low frequency 91a and the predetermined high frequency 91b; likewise, for non-speech, graph 93 differs from graph 94 in that it emphasizes the regions near the predetermined low frequency 93a and the predetermined high frequency 93b. In other words, the silent-state curves A′ and B′ have these bands lowered. Thus, in this embodiment, when the automatic speech detection function is used, silent-state sound quality settings such as A′ and B′ are provided in addition to the sound-present settings A and B, for use when there is no audio input signal or when the input signal is small (background noise only).

Next, it is determined whether the signal has returned from the silent state to the sound-present state (step S51). If it has not and remains silent, the current settings (sound quality parameters and so on) are kept unchanged while waiting for sound to return. If it has returned, sound quality A′ or B′ is restored to the corresponding sound-present setting A or B (step S52), and the process ends.
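The flow of FIG. 15 (steps S41 to S52) can be summarized as a small per-block controller. This is a sketch under stated assumptions, not the device's implementation: the class and preset names are hypothetical, the three-band gains are invented for illustration (only the fact that B emphasizes the low and high bands relative to A comes from the text), the "modified basic quality" of step S42 is assumed to use the same edge-band cut as A′ and B′, and the speech/non-speech decision is assumed to be supplied by the caller and re-evaluated on every sound-present block.

```python
# Hypothetical 3-band EQ gains in dB, keyed by band center frequency in Hz.
BASIC = {100: 0.0, 1_000: 0.0, 10_000: 0.0}
PRESET_A = {100: 0.0, 1_000: 2.0, 10_000: 0.0}  # speech (assumed values)
PRESET_B = {100: 3.0, 1_000: 0.0, 10_000: 3.0}  # non-speech: low/high emphasized vs A
EDGE_BANDS = (100, 10_000)


def lowered(preset: dict[int, float], cut_db: float) -> dict[int, float]:
    """Return a copy of `preset` with only the edge bands cut by `cut_db`."""
    return {f: g - cut_db if f in EDGE_BANDS else g for f, g in preset.items()}


class SoundQualityController:
    def __init__(self, cut_db: float = 1.5) -> None:
        self.cut_db = cut_db
        self.held = None           # last sound-present preset (A or B); None before S46/S47
        self.active = dict(BASIC)  # preset currently applied to the output

    def step(self, voiced: bool, is_speech: bool) -> dict[int, float]:
        if voiced:
            # S43-S47, and S52 on return from silence: A for speech, B otherwise
            self.held = PRESET_A if is_speech else PRESET_B
            self.active = dict(self.held)
        elif self.held is None:
            # S41-S42: silence before any A/B decision -> modified basic quality
            self.active = lowered(BASIC, self.cut_db)
        else:
            # S49-S50: silence after a decision -> A' or B' derived from the held preset
            self.active = lowered(self.held, self.cut_db)
        return self.active


ctrl = SoundQualityController(cut_db=1.0)
ctrl.step(voiced=True, is_speech=True)    # applies A
ctrl.step(voiced=False, is_speech=True)   # applies A': edge bands cut by 1 dB
```

Because A′ and B′ differ from A and B only in the edge bands, the return to sound at step S52 changes just those bands, which is the quick-recovery property the text attributes to the partial change.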

Performing the sound/silence determination as in this embodiment solves the following problems of the prior art. In the prior art, judging whether a signal is music from its audio content alone causes misjudgments that make accurate sound quality adjustment difficult; moreover, when the audio signal is silent or has a low input level, low- and high-frequency noise is output from the speakers. Even if a device is configured to avoid this by adjusting the sound quality so as to shut out the input signal when the signal level is zero or low, it cannot set the sound quality accurately and quickly when the signal level rises and audio returns. This is a particular problem for audio signals whose level changes abruptly, such as when a recording medium is loaded, when switching to an external input, when the viewed content switches from speech to non-speech, when the received channel is changed, or at the transition from a commercial to the main program.

That is, the sound quality adjusting device according to this embodiment reduces the low- and high-frequency noise output from the speakers during silence and, by keeping the sound quality setting close to the preceding state, can respond quickly (set the sound quality) when audio returns. In other words, even for audio signals whose input level changes abruptly, this device can set the sound quality so that noise output during silence is effectively reduced and the sound-present state is restored quickly.

In addition to these effects, this embodiment makes the speech/non-speech judgment not only from the audio content of the signal but also in line with the nature of the program containing it. This minimizes erroneous control of acoustic parameters, such as an equalizer, caused by the characteristics of the input audio signal, and enables accurate acoustic parameter control and sound quality adjustment. Furthermore, as the main effect of the present invention, the result of the speech/non-speech determination for the input audio signal can be shown to the user. For example, by determining the nature of a program from the mono/stereo signal superimposed on the audio signal together with the audio content, and optimizing accordingly the criterion for judging whether the input audio signal is speech or non-speech (music), speech/non-speech detection can be controlled flexibly according to the content and characteristics of the broadcast program, the sound quality can be adjusted on that basis, and the detection result can be presented to the user.

The sound quality adjustment device 8 described above with reference to FIGS. 14 to 16 can also be incorporated into a content display device, in the same way as the sound quality adjustment device shown in FIG. 1 and elsewhere. Each means constituting the sound quality adjustment device 8 or the content display device may be implemented in hardware, or partly in software. The device can also be built by installing a program on a general-purpose computer such as a PC (personal computer), and the program can be stored on a computer-readable recording medium. Both cases are as described with reference to FIG. 13, except that the program stored in the ROM is different. This program causes a computer to execute processing steps corresponding to the means described above:

- a sound/silence determination step;
- a speech/non-speech determination step;
- a sound quality adjustment step;
- a determination result display step based on the speech/non-speech determination.

In the sound quality adjustment step, the setting used during silence is derived from the setting used during the sound period just before the sound/silence determination step judged the signal silent. Only part of that earlier setting is changed. When the sound quality adjustment is carried out by a sound quality adjuster (hardware), the sound quality adjustment step instead controls that adjuster so that it adjusts the sound quality of the audio signal according to the sound quality setting.
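The silence handling described above can be sketched as follows. This is a minimal sketch, and everything specific in it is hypothetical:

- The five-band presets and the RMS silence threshold are assumptions. The actual settings of FIGS. 3 and 16 are not reproduced.
- The source does not say which part of the setting changes during silence. Flattening only the lowest band is an assumption made for illustration.

```python
from typing import Sequence, Tuple

EqSetting = Tuple[float, ...]  # per-band gains in dB

# Hypothetical 5-band presets; the settings of FIGS. 3 and 16 are not reproduced.
SPEECH_EQ: EqSetting = (-2.0, 0.0, 4.0, 3.0, -1.0)
MUSIC_EQ: EqSetting = (3.0, 1.0, 0.0, 1.0, 3.0)
SILENCE_RMS = 1e-3  # assumed sound/silence threshold


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame; an empty frame counts as silent."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5 if frame else 0.0


class SoundQualityAdjuster:
    def __init__(self) -> None:
        self.current: EqSetting = MUSIC_EQ  # arbitrary initial setting

    def step(self, frame: Sequence[float], speech: bool) -> EqSetting:
        if rms(frame) >= SILENCE_RMS:
            # Sound: the speech/non-speech result selects the preset.
            self.current = SPEECH_EQ if speech else MUSIC_EQ
        else:
            # Silence: start from the preceding sound-period setting and
            # change only part of it (assumed here: flatten the lowest band).
            self.current = (0.0,) + self.current[1:]
        return self.current
```

Because the silent-period setting is derived from the previous one, most bands stay where they were during a short pause. This avoids an abrupt jump back to a default curve.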

FIG. 1 is a block diagram showing a configuration example of a sound quality adjustment device according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining an example of the sound quality adjustment process and the determination result display process in the sound quality adjustment device of FIG. 1.
FIG. 3 is a diagram showing an example of the equalizer sound quality settings used in the sound quality adjustment process of the sound quality adjustment device of FIG. 1.
FIG. 4 is a diagram showing an example of the screen display in the determination result display process of FIG. 2.
FIG. 5 is a block diagram showing a configuration example of a television receiver, one application example of the sound quality adjustment device of FIG. 1.
FIG. 6 is a diagram showing an example of the calculation formula table stored in the microcomputer of FIG. 5.
FIG. 7 is a diagram showing an example of the mark display target table stored in the microcomputer of FIG. 5.
FIG. 8 is a flowchart for explaining the speech/non-speech determination and determination result display processing in the television receiver of FIG. 5.
FIG. 9 is a flowchart for explaining the determination result display processing in the television receiver of FIG. 5.
FIG. 10 is a diagram showing an example of a setting screen for the determination result display in the sound quality adjustment device of FIG. 1.
FIG. 11 is a diagram showing another example of a setting screen for the determination result display in the sound quality adjustment device of FIG. 1.
FIG. 12 is a diagram showing a further example of a setting screen for the determination result display in the sound quality adjustment device of FIG. 1.
FIG. 13 is a block diagram showing a configuration example of a general information processing device.
FIG. 14 is a block diagram showing a configuration example of a sound quality adjustment device according to another embodiment of the present invention.
FIG. 15 is a flowchart for explaining an example of the sound quality adjustment process in the sound quality adjustment device of FIG. 14.
FIG. 16 is a diagram showing an example of the equalizer sound quality settings used in the sound quality adjustment process of the sound quality adjustment device of FIG. 14.

Explanation of symbols

1, 8: Sound quality adjustment device
4: Television receiver
7: Information processing device
10, 80: Audio signal input means
11a, 81a: Music property detection means
11b, 81b: Speech property detection means
12, 82: Speech/non-speech determination means
13: Monaural/stereo determination means
14: Criterion optimization means
14a: Switch
14b: Means for setting threshold V_SL1
14c: Means for setting threshold V_SL2
15, 85: Sound quality adjustment means
16, 86: Audio signal output means
17, 87: Determination result display means
40: Tuner unit
41: External input unit
42: Main body operation unit
43: Video processing IC
44: Microcomputer
45: Audio processing IC
46: Display
47L, 47R: Speakers
48: Light receiving unit
49: Remote control
71: CPU
72: RAM
73: Rewritable ROM
74: Input device
75: Display device
76: Output device
77: Bus
83: Sound/silence determination means

Claims

1. An audio signal discrimination device comprising: music property detection means for detecting the degree of music property of an input audio signal; speech property detection means for detecting the degree of speech property of the input audio signal; and speech/non-speech determination means for making a determination to discriminate whether the input audio signal corresponds to speech or to non-speech. The speech/non-speech determination means classifies the detection result of the music property detection means into a predetermined number of levels. It classifies the detection result of the speech property detection means into a predetermined number of levels that is the same as or different from the former predetermined number. It then performs the speech/non-speech determination using a different calculation formula for each combination of classifications corresponding to the degree of speech property and the degree of music property.

2. The audio signal discrimination device according to claim 1, further comprising monaural/stereo determination means for determining whether the input audio signal is a monaural signal or a stereo signal, wherein the speech/non-speech determination means adjusts a correction component of the calculation formula based on the determination result of the monaural/stereo determination means.

3. A sound quality adjustment device comprising the audio signal discrimination device according to claim 1 or 2, and further comprising sound quality adjustment means. The sound quality adjustment means adjusts the audio signal discriminated as speech or non-speech by the audio signal discrimination device to a sound quality that differs between speech and non-speech.

4. The sound quality adjustment device according to claim 3, further comprising determination result display means for displaying the determination result of the speech/non-speech determination means, wherein the determination result display means displays the determination result to a user in stages according to the degree of speech or non-speech.

5. The sound quality adjustment device according to claim 4, wherein the sound quality adjustment means has adjustment setting means for setting whether to execute the sound quality adjustment based on the determination result of the speech/non-speech determination means. The determination result display means displays the determination result only when the adjustment setting means is set to execute the sound quality adjustment.

6. The sound quality adjustment device according to claim 4 or 5, wherein the determination result display means has display setting means for setting whether to display the determination result. The determination result display means displays the determination result only when the display setting means is set to display it.

7. A content display device comprising the sound quality adjustment device according to any one of claims 4 to 6 and a content input device. An audio signal included in content input through the content input device is fed to the sound quality adjustment device, adjusted in sound quality, and output as sound. A video signal included in the content is displayed. The determination result display means displays the determination result as necessary.

8. A program for causing a computer to execute: a music property detection step in which music property detection means detects the degree of music property of an input audio signal; a speech property detection step in which speech property detection means detects the degree of speech property of the input audio signal; and a speech/non-speech determination step in which speech/non-speech determination means makes a determination to discriminate whether the input audio signal corresponds to speech or to non-speech. The speech/non-speech determination step classifies the detection result of the music property detection step into a predetermined number of levels. It classifies the detection result of the speech property detection step into a predetermined number of levels that is the same as or different from the former predetermined number. It then performs the speech/non-speech determination using a different calculation formula for each combination of classifications corresponding to the degree of speech property and the degree of music property.

9. The program according to claim 8, comprising an adjustment program for causing the computer to execute a sound quality adjustment step. In that step, sound quality adjustment means adjusts the audio signal discriminated as speech or non-speech in the speech/non-speech determination step to a sound quality that differs between speech and non-speech.

10. The program according to claim 8 or 9, comprising a display program for causing the computer to execute a determination result display step. In that step, determination result display means displays the determination result of the speech/non-speech determination step on a display unit, in stages according to the degree of speech or non-speech.

11. A computer-readable recording medium on which the program according to any one of claims 8 to 10 is recorded.
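The determination of claims 1 and 8 and the staged display of claims 4 to 6 can be sketched together as follows. This is an illustrative sketch only, and its specifics are assumptions:

- Both detector scores are assumed to be normalized to [0, 1].
- The level counts (3 music levels, 2 speech levels, 5 display stages) are hypothetical.
- The formula table is hypothetical and does not reproduce the calculation formula table of FIG. 6.

The structure is what the claims describe: each score is classified into levels, and a separate formula is looked up for each (music level, speech level) combination.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumption: both detectors output scores normalised to [0, 1].
MUSIC_LEVELS = 3   # hypothetical "predetermined number" of music levels
SPEECH_LEVELS = 2  # may differ from MUSIC_LEVELS, as claim 1 allows
DISPLAY_STEPS = 5  # hypothetical number of display stages (claims 4 and 10)


def classify(score: float, levels: int) -> int:
    """Map a score in [0, 1] to a level index 0..levels-1 using equal-width bins."""
    clamped = min(max(score, 0.0), 1.0)
    return min(int(clamped * levels), levels - 1)


# Hypothetical formula table keyed by (music level, speech level). Each entry
# returns a speech degree; the actual formulas (cf. FIG. 6) are not given here.
Formula = Callable[[float, float], float]
FORMULAS: Dict[Tuple[int, int], Formula] = {
    (0, 0): lambda m, s: s,
    (0, 1): lambda m, s: 1.2 * s,
    (1, 0): lambda m, s: s - 0.5 * m,
    (1, 1): lambda m, s: s - 0.3 * m,
    (2, 0): lambda m, s: s - m,
    (2, 1): lambda m, s: 0.8 * s - 0.5 * m,
}


def speech_degree(music: float, speech: float) -> float:
    """Select the formula for this combination of levels; clamp the result to [0, 1]."""
    key = (classify(music, MUSIC_LEVELS), classify(speech, SPEECH_LEVELS))
    return min(max(FORMULAS[key](music, speech), 0.0), 1.0)


def is_speech(music: float, speech: float, threshold: float = 0.5) -> bool:
    """Speech/non-speech decision against an assumed fixed threshold."""
    return speech_degree(music, speech) >= threshold


def render_indicator(degree: float, adjustment_enabled: bool,
                     display_enabled: bool) -> Optional[str]:
    """Staged display; shown only if both the adjustment and display settings are on."""
    if not (adjustment_enabled and display_enabled):
        return None
    level = round(degree * DISPLAY_STEPS)
    return "[" + "#" * level + "-" * (DISPLAY_STEPS - level) + "]"
```

For example, a low music score (0.1) with a high speech score (0.9) selects formula (0, 1), and the device judges the signal to be speech. Calling `render_indicator(0.6, True, True)` draws a three-of-five bar. If either setting is off, the function returns `None`, matching the display conditions of claims 5 and 6.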