JP2024023163A

JP2024023163A - Audio signal processing device and program

Info

Publication number: JP2024023163A
Application number: JP2023128849A
Authority: JP
Inventors: 岳大杉本; Takehiro Sugimoto; 弘樹久保; Hiroki Kubo; 泰士岩崎; Yasushi Iwasaki; 訓史大出; Norifumi Oide; 靖茂中山; Yasushige Nakayama; 洋幸大久保; Hiroyuki Okubo
Original assignee: Nippon Hoso Kyokai NHK
Current assignee: Japan Broadcasting Corp
Priority date: 2022-08-08
Filing date: 2023-08-07
Publication date: 2024-02-21

Abstract

PROBLEM TO BE SOLVED: To improve the audibility of a specific audio object while suppressing deterioration of the overall impression of program audio composed of a plurality of audio objects.

SOLUTION: An audio signal processing device 10 comprises an adjustment value determination unit 14 and an audio signal synthesis unit 13. The adjustment value determination unit 14 determines a first adjustment value b, which adjusts the signal level of a first audio object, and a second adjustment value c, which adjusts the signal level of a second audio object. The audio signal synthesis unit 13 combines the audio signal of the first audio object, after its signal level has been adjusted based on b, with the audio signal of the second audio object, after its signal level has been adjusted based on c, and outputs the result. The adjustment value determination unit 14 determines b and c, in accordance with an enhancement amount d, such that the sum of the squares of the antilog of b and the antilog of c remains constant.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an audio signal processing device and a program.

In recent years, object-based audio systems that combine audio signals with acoustic metadata (see Non-Patent Documents 1 and 2) have been put into practical use (see Non-Patent Documents 3 to 5). A characteristic of object-based audio systems is that viewers can customize the reproduced sound to suit their viewing environment or preferences.

Non-Patent Document 1: Rec. ITU-R BS.2076-1, “Audio Definition Model” (2017)
Non-Patent Document 2: Rec. ITU-R BS.2125-0, “A serial representation of the Audio Definition Model” (2019)
Non-Patent Document 3: ISO/IEC 23008-3:2019, “Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio, Second edition” (2019)
Non-Patent Document 4: ETSI TS 103 190-2 V1.2.1, “AC-4 Part 2” (2018)
Non-Patent Document 5: ATSC Standard: A/342 Part 3 (2017)

One representative function of object-based audio systems is dialogue enhancement. As shown in FIG. 8, when the program audio consists of a dialogue audio object (lines or narration) and a background sound audio object (BGM or sound effects), the dialogue enhancement function improves the audibility of the dialogue by emphasizing it (raising its signal level) relative to the dialogue and background sound levels set during program production.

Object-based audio systems therefore call for a more effective dialogue enhancement function, one that improves the audibility of dialogue while suppressing deterioration of the overall impression of the program audio.

An object of the present invention is to solve the above problem by providing an audio signal processing device and a program that improve the audibility of a specific audio object while suppressing deterioration of the overall impression of program audio composed of a plurality of audio objects.

(1) An audio signal processing device according to the present disclosure processes the audio signal of a program composed of a plurality of audio objects and comprises: an adjustment value determination unit that, when an increase or decrease in a ratio a between the signal level of a first audio object among the plurality of audio objects and the signal level of a second audio object reproduced superimposed on the first audio object is requested as an enhancement amount d, determines a first adjustment value b, which is an adjustment value for the signal level of the first audio object, and a second adjustment value c, which is an adjustment value for the signal level of the second audio object; a first audio signal adjustment unit that adjusts the signal level of the first audio object based on the first adjustment value b; a second audio signal adjustment unit that adjusts the signal level of the second audio object based on the second adjustment value c; and an audio signal synthesis unit that combines the audio signal of the first audio object whose signal level has been adjusted by the first audio signal adjustment unit with the audio signal of the second audio object whose signal level has been adjusted by the second audio signal adjustment unit, and outputs the result. The adjustment value determination unit determines the first adjustment value b and the second adjustment value c, in accordance with the enhancement amount d, such that the sum of the squares of the antilog of the first adjustment value b and the antilog of the second adjustment value c is constant.

(2) The audio signal processing device according to (1), wherein the adjustment value determination unit determines the first adjustment value b and the second adjustment value c such that the enhancement amount d does not exceed an upper limit e.

(3) The audio signal processing device according to (2), wherein, when the adjustment value determination unit acquires an upper limit h of the enhancement amount d contained in acoustic metadata, which is information on the reproduction of the plurality of audio objects, it determines the first adjustment value b and the second adjustment value c such that the enhancement amount d does not exceed the upper limit h.

(4) The audio signal processing device according to any one of (1) to (3), further comprising an adjustment value storage unit that stores an adjustment value list associating the enhancement amount d with the first adjustment value b and the second adjustment value c, wherein the adjustment value determination unit determines the first adjustment value b and the second adjustment value c based on the adjustment value list stored in the adjustment value storage unit.

(5) The audio signal processing device according to (2), wherein the adjustment value determination unit sets the enhancement amount d to +6 dB or more when the upper limit e is +6 dB or more.

(6) The audio signal processing device according to (3), wherein the adjustment value determination unit sets the enhancement amount d to +6 dB or more when the upper limit h is +6 dB or more.

(7) The audio signal processing device according to (2), wherein the upper limit e is +12 dB.

(8) A program according to the present disclosure causes a computer to operate as the audio signal processing device according to any one of (1) to (7).

The audio signal processing device and program according to the present invention can improve the audibility of a specific audio object while suppressing deterioration of the overall impression of program audio composed of a plurality of audio objects.

FIG. 1 is a diagram illustrating a configuration example of an audio signal processing device according to an embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a configuration example of the adjustment value determination unit shown in FIG. 1.
FIG. 3A is a diagram illustrating an example of an adjustment value list stored in the adjustment value storage unit shown in FIG. 2.
FIG. 3B is a diagram illustrating another example of an adjustment value list stored in the adjustment value storage unit shown in FIG. 2.
FIG. 4A is a diagram showing, as antilogs, how γ and δ change when the total energy of the dialogue and background sound is constant.
FIG. 4B is a diagram showing, on a logarithmic scale, how γ and δ change when the total energy of the dialogue and background sound is constant.
FIG. 5 is a flowchart illustrating an example of the operation of the audio signal processing device shown in FIG. 1.
FIG. 6 is a diagram illustrating an example of a UI for enhancement operations.
FIG. 7A is a diagram showing evaluation results for the audibility of sound materials and the overall impression under loudspeaker reproduction.
FIG. 7B is a diagram showing evaluation results for the audibility of sound materials and the overall impression under earphone/headphone reproduction.
FIG. 8 is a diagram schematically illustrating dialogue enhancement.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a diagram illustrating a configuration example of an audio signal processing device 10 according to an embodiment of the present disclosure. The audio signal processing device 10 according to this embodiment reproduces the audio signal of a program composed of a plurality of audio objects, such as dialogue and background sound. Specifically, like the dialogue enhancement described above, the audio signal processing device 10 performs signal processing that increases or decreases the ratio a between the signal level of a specific audio object (the first audio object) among the plurality of audio objects and the signal level of another audio object (the second audio object) reproduced superimposed on it. The following description uses the example of increasing or decreasing the ratio a between the signal level of the dialogue and the signal level of the background sound reproduced superimposed on the dialogue. The audio objects making up the program audio may include, for example, a dialogue audio object for each of several languages together with a background sound audio object. In this case, the audio signal processing device 10 increases or decreases, for example, the ratio a between the signal level of the dialogue in the one language selected by the viewer and the signal level of the background sound.

As shown in FIG. 1, the audio signal processing device 10 according to this embodiment comprises audio signal adjustment units 11 and 12, an audio signal synthesis unit 13, and an adjustment value determination unit 14.

The audio signal adjustment unit 11, serving as the first audio signal adjustment unit, receives the audio signal of the dialogue (first audio object) among the plurality of audio objects constituting the program. It adjusts the signal level of the input dialogue based on the first adjustment value b, the adjustment value for the dialogue signal level determined by the adjustment value determination unit 14 described later, and outputs the result to the audio signal synthesis unit 13.

The audio signal adjustment unit 12, serving as the second audio signal adjustment unit, receives the audio signal of the background sound (second audio object) reproduced superimposed on the dialogue. It adjusts the signal level of the input background sound based on the second adjustment value c, the adjustment value for the background sound signal level determined by the adjustment value determination unit 14 described later, and outputs the result to the audio signal synthesis unit 13.

The audio signal synthesis unit 13 outputs reproduced sound obtained by combining the dialogue audio signal whose signal level has been adjusted by the audio signal adjustment unit 11 with the background sound audio signal whose signal level has been adjusted by the audio signal adjustment unit 12.
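As an illustration of what the adjustment units 11 and 12 and the synthesis unit 13 do together, the following Python sketch applies the two adjustment values as dB gains and sums the results. It assumes both objects have already been rendered to the same channel layout and length (rendering m- and n-channel objects to the output is not covered), and the function names are hypothetical.

```python
from typing import List

Signal = List[List[float]]  # one list of samples per output channel


def db_to_gain(level_db: float) -> float:
    """Antilog of a level adjustment in dB, as a linear amplitude coefficient."""
    return 10.0 ** (level_db / 20.0)


def adjust_and_mix(dialogue: Signal, background: Signal,
                   b_db: float, c_db: float) -> Signal:
    """Apply gain b to the dialogue and gain c to the background, then sum.

    Assumes both objects share one channel layout and sample count already.
    """
    if len(dialogue) != len(background):
        raise ValueError("dialogue and background need the same channel count")
    gain_d, gain_b = db_to_gain(b_db), db_to_gain(c_db)
    mixed: Signal = []
    for d_ch, b_ch in zip(dialogue, background):
        if len(d_ch) != len(b_ch):
            raise ValueError("channels must have the same number of samples")
        # Level-adjusted dialogue plus level-adjusted background, sample by sample.
        mixed.append([gain_d * x + gain_b * y for x, y in zip(d_ch, b_ch)])
    return mixed
```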

The adjustment value determination unit 14 receives an enhancement request asking for the ratio a between the dialogue signal level and the background sound signal level to be increased or decreased. The enhancement request may specify (request) the amount by which the ratio a is to increase or decrease. Alternatively, it may simply request adjustment of the ratio a without specifying an enhancement amount d; in that case, the adjustment value determination unit 14 proceeds as if an enhancement amount d of a predetermined value had been requested. The adjustment value determination unit 14 may also receive acoustic metadata, which is information on the reproduction of the plurality of objects constituting the program. The acoustic metadata includes, for example, the upper limit h of the enhancement amount d.

When an enhancement request is input (that is, when an increase or decrease in the ratio a between the dialogue and background sound signal levels is requested as the enhancement amount d), the adjustment value determination unit 14 determines the first adjustment value b for the dialogue signal level and the second adjustment value c for the background sound signal level. As described in detail later, the adjustment value determination unit 14 determines b and c, in accordance with the enhancement amount d, such that the sum of the squares of the antilog of b and the antilog of c is constant. The adjustment value determination unit 14 may also determine b and c such that the enhancement amount d does not exceed the upper limit e.
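The constraint can be solved in closed form. Writing γ and δ for the antilogs of b and c (γ = 10^(b/20), δ = 10^(c/20)), the sketch below assumes the constant is 2, i.e. both coefficients start at 1 (0 dB). With g = 10^(d/10) as the requested energy ratio γ²/δ², the constraint gives δ² = 2/(1 + g) and γ² = 2 − δ². The function name `adjustment_values` is hypothetical.

```python
import math
from typing import Tuple


def adjustment_values(d_db: float, total: float = 2.0) -> Tuple[float, float]:
    """Return (b_db, c_db) such that b_db - c_db == d_db and
    gamma**2 + delta**2 == total, where gamma = 10**(b_db/20) and
    delta = 10**(c_db/20) are the antilogs of b and c.
    """
    g = 10.0 ** (d_db / 10.0)      # requested energy ratio gamma**2 / delta**2
    delta_sq = total / (1.0 + g)   # from gamma**2 = g * delta**2 and the sum constraint
    gamma_sq = total - delta_sq
    # 10*log10 of a squared coefficient equals 20*log10 of the coefficient.
    return 10.0 * math.log10(gamma_sq), 10.0 * math.log10(delta_sq)
```

For d = +6 dB this returns roughly b ≈ +2.04 dB and c ≈ −3.96 dB: the dialogue is raised by only about 2 dB while the background drops by about 4 dB, so most of the 6 dB change in the ratio a comes from attenuating the background.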

The adjustment value determination unit 14 outputs the determined first adjustment value b to the audio signal adjustment unit 11 and the determined second adjustment value c to the audio signal adjustment unit 12.

FIG. 2 is a diagram illustrating a configuration example of the adjustment value determination unit 14.

As shown in FIG. 2, the adjustment value determination unit 14 comprises an adjustment value storage unit 141 and an adjustment value selection unit 142.

The adjustment value storage unit 141 stores an adjustment value list that associates each enhancement amount d with the corresponding first adjustment value b (the adjustment value for the dialogue signal level) and second adjustment value c (the adjustment value for the background sound signal level).

FIG. 3A is a diagram illustrating an example of the adjustment value list stored in the adjustment value storage unit 141, for enhancement amounts d (levels) of 0 dB, +6 dB, and +12 dB.

As shown in FIG. 3A, the adjustment value storage unit 141 stores an adjustment value list that associates the enhancement amount d with the first adjustment value b (the dialogue signal level adjustment, given both as a coefficient and as a level) and the second adjustment value c (the background sound signal level adjustment, given both as a coefficient and as a level).

Although FIG. 3A shows enhancement amounts d (levels) of 0 dB, +6 dB, and +12 dB, the present disclosure is not limited to this. As shown in FIG. 3B, the adjustment value storage unit 141 may store an adjustment value list that associates a wider range of enhancement amounts d with first adjustment values b and second adjustment values c. However, as described in detail later, viewers are unlikely to perceive improved dialogue audibility unless dialogue enhancement increases the ratio a between the dialogue and background sound signal levels by at least a certain amount, while too large an increase in the ratio a degrades the overall impression of the program audio. For this reason, as in FIG. 3A, the enhancement amount d is preferably set in steps of about +6 dB, with an upper limit e of about +12 dB.

Adjustment value lists such as those shown in FIGS. 3A and 3B can be computed in advance. A method of calculating the first adjustment value b and the second adjustment value c for a given enhancement amount d is described below.

The dialogue audio object D is defined by equation (1) as a function of time t with m channels, and the background sound audio object B is defined by equation (2) as a function of time t with n channels.
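Equations (1) and (2) did not survive extraction. A reconstruction consistent with the text writes each object as a vector of per-channel time signals; the component names D_i and B_j are assumed notation, not necessarily the patent's:

```latex
D(t) = \bigl(D_1(t),\, D_2(t),\, \ldots,\, D_m(t)\bigr) \qquad (1)

B(t) = \bigl(B_1(t),\, B_2(t),\, \ldots,\, B_n(t)\bigr) \qquad (2)
```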

The average energies E_D and E_B of the respective audio objects over the interval of time t from 0 to T are given by equations (3) and (4) below.
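Equations (3) and (4) are missing from the extracted text. Assuming they take the usual form E_D = (1/T)∫₀ᵀ Σᵢ D_i(t)² dt (and likewise for E_B over the n background channels), the following sketch computes the discrete-time equivalent from sampled channels; the sample rate cancels out. The function name is hypothetical.

```python
from typing import Sequence


def average_energy(channels: Sequence[Sequence[float]]) -> float:
    """Discrete-time average energy of a multichannel audio object.

    Approximates E = (1/T) * integral_0^T sum_i x_i(t)**2 dt with N samples
    at rate fs: T = N / fs and dt = 1 / fs, so fs cancels and
    E = (1/N) * sum_i sum_k x_i[k]**2.
    """
    if not channels or not channels[0]:
        raise ValueError("need at least one channel with at least one sample")
    num_samples = len(channels[0])
    if any(len(ch) != num_samples for ch in channels):
        raise ValueError("all channels must have the same length")
    return sum(x * x for ch in channels for x in ch) / num_samples
```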

Energy level and loudness level generally differ, but when computing quantities expressed as ratios between audio objects, such as the enhancement amount d and the dialogue-to-background ratio (hereinafter the "D/B ratio"), the energy level and the loudness level are treated as equivalent.

Using a coefficient γ (γ ≥ 0) for the dialogue and a coefficient δ (δ ≥ 0) for the background sound, the dialogue and the background sound after dialogue enhancement are expressed by equations (5) and (6) below, respectively. The dialogue coefficient γ corresponds to the antilog of the first adjustment value b, and the background sound coefficient δ corresponds to the antilog of the second adjustment value c.
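Equations (5) and (6) are also missing from the extracted text. Since γ and δ act as linear gains on the signals, a plausible reconstruction, writing the post-enhancement dialogue and background sound as \tilde{D} and \tilde{B} (assumed notation), is:

```latex
\tilde{D}(t) = \gamma\, D(t) \qquad (5)

\tilde{B}(t) = \delta\, B(t) \qquad (6)
```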

Similarly, the average energy of the dialogue and the average energy of the background sound after dialogue enhancement are expressed by equations (7) and (8) below, respectively.
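Because average energy scales with the square of a gain, the post-enhancement average energies (written \tilde{E}_D and \tilde{E}_B here, an assumed notation) follow directly from scaling the dialogue by γ and the background by δ. A reconstruction of (7) and (8):

```latex
\tilde{E}_D = \gamma^2 E_D \qquad (7)

\tilde{E}_B = \delta^2 E_B \qquad (8)
```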

Next, the enhancement amount g and the D/B ratio r, the parameters governing dialogue enhancement, are defined by equations (9) and (10) below, respectively.
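Equations (9) and (10) are missing from the extracted text. One reading, assumed here and consistent with the later statement that 20 log γ − 20 log δ corresponds to the enhancement amount, defines g as the factor by which enhancement changes the dialogue-to-background energy ratio and r as that ratio after enhancement, with \tilde{E}_D = γ²E_D and \tilde{E}_B = δ²E_B denoting the post-enhancement average energies:

```latex
g = \frac{\tilde{E}_D / \tilde{E}_B}{E_D / E_B} = \frac{\gamma^2}{\delta^2} \qquad (9)

r = \frac{\tilde{E}_D}{\tilde{E}_B} = \frac{\gamma^2 E_D}{\delta^2 E_B} \qquad (10)
```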

Denoting the level of the enhancement amount g by G and the level of the D/B ratio r by R, they are given by equations (11) and (12) below, respectively.
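Treating g and r as energy ratios, their levels in decibels would be as follows; the right-hand form of (11) additionally assumes g = γ²/δ², which is not shown in the extracted text:

```latex
G = 10 \log_{10} g = 20 \log_{10} \gamma - 20 \log_{10} \delta \qquad (11)

R = 10 \log_{10} r \qquad (12)
```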

If the coefficients γ and δ both start at 1, the condition for emphasizing the dialogue is γ > 1, δ = 1, and the condition for suppressing it is γ < 1, δ = 1. Under these conditions, however, the energy of the whole program audio changes along with dialogue enhancement, which conflicts with current operational rules that regulate the average loudness level of the entire program. It is therefore desirable to implement dialogue enhancement so that the total energy of all audio objects making up the program audio stays constant, preventing a large change in the perceived loudness of the program audio. The condition that the total energy of all audio objects (here, the dialogue and the background sound) is constant is expressed by equation (13) below.
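A reconstruction of equation (13), assuming the post-enhancement average energies are γ²E_D and δ²E_B (each object's energy scales with the square of its gain). Setting E_D = E_B in it reduces it to γ² + δ² = 2.

```latex
\gamma^2 E_D + \delta^2 E_B = E_D + E_B \qquad (13)
```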

To actually apply dialogue enhancement, the values of the coefficients γ and δ corresponding to the enhancement amount g are needed, and these in turn require the values of the average energies E_D and E_B. If E_D and E_B can be obtained, for example from acoustic metadata, γ and δ can be derived from the actual values of E_D and E_B and dialogue enhancement can be performed.
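When E_D and E_B are known, the coefficients follow from two conditions: the requested energy ratio γ²/δ² = g and the constant-total-energy condition γ²E_D + δ²E_B = E_D + E_B. The sketch below assumes exactly those two forms (the extracted text does not show the equations themselves) and takes the enhancement level in dB; the function name is hypothetical.

```python
import math
from typing import Tuple


def coefficients_from_energies(g_db: float, e_d: float, e_b: float) -> Tuple[float, float]:
    """Return (gamma, delta) with 10*log10(gamma**2 / delta**2) == g_db and
    gamma**2 * e_d + delta**2 * e_b == e_d + e_b (total energy preserved).
    """
    if e_d <= 0.0 or e_b <= 0.0:
        raise ValueError("average energies must be positive")
    g = 10.0 ** (g_db / 10.0)                 # energy ratio gamma**2 / delta**2
    delta_sq = (e_d + e_b) / (g * e_d + e_b)  # substitute gamma**2 = g * delta**2
    gamma_sq = g * delta_sq
    return math.sqrt(gamma_sq), math.sqrt(delta_sq)
```

Passing equal energies (e_d == e_b) returns coefficients with γ² + δ² = 2, the simplification the text adopts for live broadcasts.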

In live broadcasting, however, the audio signal changes continuously, and the receiver often cannot know the exact values of the average energies E_D and E_B at each moment. Assuming E_D = E_B as the condition under which the dialogue enhancement function is used therefore yields equation (14):

$$\gamma^2 + \delta^2 = 2 \qquad (14)$$

Introducing a parameter θ (0 ≤ θ ≤ π/2), the coefficients γ and δ can be expressed by equations (15) and (16) below.
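Any point on the quarter circle γ² + δ² = 2 with γ, δ ≥ 0 can be written with a single angle. One such parameterization is shown below; the assignment of sine to γ and cosine to δ is an assumption the extracted text cannot confirm. With it, θ = π/4 gives γ = δ = 1 (no enhancement), and the enhancement level becomes 20 log₁₀ tan θ.

```latex
\gamma = \sqrt{2}\,\sin\theta \qquad (15)

\delta = \sqrt{2}\,\cos\theta \qquad (16)

20 \log_{10} \gamma - 20 \log_{10} \delta = 20 \log_{10} \tan\theta
```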

FIG. 4A shows, as antilogs, how the coefficients γ and δ obtained from equations (15) and (16) change when the total energy of the dialogue and background sound is constant. FIG. 4B shows the same changes on a logarithmic scale. In FIG. 4B, 20log₁₀γ − 20log₁₀δ corresponds to the enhancement amount d.
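Assuming equations (15) and (16) take the form γ = √2 sin θ and δ = √2 cos θ (the extracted text does not show them), the enhancement level is 20 log₁₀ γ − 20 log₁₀ δ = 20 log₁₀ tan θ, so a requested d maps to θ = arctan(10^(d/20)). The sketch below shows the mapping in both directions; the function names are hypothetical.

```python
import math
from typing import Tuple


def theta_for_enhancement(d_db: float) -> float:
    """Angle theta in (0, pi/2) satisfying d = 20*log10(tan(theta))."""
    return math.atan(10.0 ** (d_db / 20.0))


def coefficients_from_theta(theta: float) -> Tuple[float, float]:
    """Assumed parameterization: gamma = sqrt(2) sin(theta), delta = sqrt(2) cos(theta)."""
    return math.sqrt(2.0) * math.sin(theta), math.sqrt(2.0) * math.cos(theta)
```

Because sin²θ + cos²θ = 1, every θ automatically satisfies γ² + δ² = 2, so a UI could expose θ (or d) directly without any separate normalization step.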

Based on equations (15) and (16) and FIGS. 4A and 4B, the dialogue and background sound signal level adjustment values corresponding to each enhancement amount d can be determined, and an adjustment value list such as those shown in FIGS. 3A and 3B can be created.
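A precomputed list in the spirit of FIG. 3A can be generated in a few lines. The sketch assumes the sine/cosine parameterization γ = √2 sin θ and δ = √2 cos θ with tan θ = 10^(d/20), and it is not a reproduction of the figure's actual values; the row and function names are hypothetical.

```python
import math
from typing import List, NamedTuple


class AdjustmentRow(NamedTuple):
    d_db: float    # enhancement amount d
    gamma: float   # dialogue coefficient (antilog of b)
    b_db: float    # first adjustment value b
    delta: float   # background coefficient (antilog of c)
    c_db: float    # second adjustment value c


def build_adjustment_list(levels_db: List[float]) -> List[AdjustmentRow]:
    rows = []
    for d_db in levels_db:
        theta = math.atan(10.0 ** (d_db / 20.0))  # tan(theta) = gamma / delta
        gamma = math.sqrt(2.0) * math.sin(theta)
        delta = math.sqrt(2.0) * math.cos(theta)
        rows.append(AdjustmentRow(d_db, gamma, 20.0 * math.log10(gamma),
                                  delta, 20.0 * math.log10(delta)))
    return rows


if __name__ == "__main__":
    for row in build_adjustment_list([0.0, 6.0, 12.0]):
        print(f"d = {row.d_db:+5.1f} dB   b = {row.b_db:+6.2f} dB (x{row.gamma:.3f})"
              f"   c = {row.c_db:+6.2f} dB (x{row.delta:.3f})")
```

Run as a script, it prints one row per enhancement amount; for +12 dB the dialogue adjustment b comes out near +2.74 dB and the background adjustment c near −9.26 dB.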

Referring again to FIG. 2, the adjustment value selection unit 142 receives the enhancement request and may also receive acoustic metadata. When an enhancement request is input (that is, when an increase or decrease in the ratio a between the dialogue and background sound signal levels is requested as the enhancement amount d), the adjustment value selection unit 142 refers to the adjustment value list stored in the adjustment value storage unit 141 and selects the first adjustment value b and the second adjustment value c. As equation (14) shows, the values of b and c in the adjustment value list are determined so that the sum of the squares of the dialogue coefficient γ (the antilog of b) and the background sound coefficient δ (the antilog of c) is constant. The adjustment value selection unit 142 therefore determines b and c such that the sum of the squares of their antilogs is constant.
When an enhancement amount d is input as the enhancement request, the adjustment value selection unit 142 selects, as the first adjustment value b and the second adjustment value c, the adjustment values associated with that enhancement amount d in the adjustment value list. For example, if the adjustment value storage unit 141 stores the adjustment value list shown in FIG. 3A and the enhancement request specifies an enhancement amount d of +6 dB, the adjustment values associated with +6 dB in the list are selected as b and c.

The enhancement amount d in an enhancement request may not match any enhancement amount d in the adjustment value list. In that case, the adjustment value selection unit 142 selects, for example, the adjustment values associated with the listed enhancement amount d closest to the requested one as the first adjustment value b and the second adjustment value c. For example, if the adjustment value storage unit 141 stores the list in FIG. 3A and the requested enhancement amount d is +5 dB, the adjustment values associated with +6 dB, the listed value closest to +5 dB, are selected. If two listed enhancement amounts are equally close to the requested one, either may be selected as appropriate.
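The nearest-entry rule can be sketched as a simple search over the listed enhancement amounts. The example list and the function name are hypothetical; the list's values are rounded figures computed from γ² + δ² = 2, not values taken from FIG. 3A. Ties go to the smaller listed amount, which is just one of the choices the text allows.

```python
from typing import Dict, Tuple

# Hypothetical list: d (dB) -> (b_db, c_db), rounded values computed from
# gamma**2 + delta**2 == 2; not taken from FIG. 3A.
EXAMPLE_LIST: Dict[float, Tuple[float, float]] = {
    0.0: (0.00, 0.00),
    6.0: (2.04, -3.96),
    12.0: (2.74, -9.26),
}


def select_adjustment(adjustment_list: Dict[float, Tuple[float, float]],
                      requested_d_db: float) -> Tuple[float, Tuple[float, float]]:
    """Return (listed_d, (b_db, c_db)) for the listed d closest to the request.

    min() keeps the first of equally close candidates, and the keys are
    sorted ascending, so ties resolve to the smaller listed d.
    """
    if not adjustment_list:
        raise ValueError("adjustment list is empty")
    nearest = min(sorted(adjustment_list), key=lambda d: abs(d - requested_d_db))
    return nearest, adjustment_list[nearest]
```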

An enhancement request may contain no enhancement amount and simply ask for louder dialog. In that case, the adjustment value selection unit 142 may, for example, select the first adjustment value b and the second adjustment value c as if a predetermined enhancement amount d had been requested. Each time an enhancement request is made, the unit then increases the enhancement amount d without letting it exceed the upper limit e:

- If the upper limit e is +12 dB, the adjustment value selection unit 142 may set d to +6 dB on the first request, to +12 dB on the second, and back to 0 dB on the third.
- If e is greater than +6 dB and less than +12 dB, it may set d to +6 dB on the first request and to the upper limit e on the second.
- If e is less than +6 dB, it may set d to the upper limit e on the first request and back to 0 dB on the second.
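The stepping behavior above can be sketched as a small state update. Reading the three cases as a single rule ("raise d by a fixed step, clamp at e, and wrap to 0 dB once e has been reached") is an assumption. The source does not say what the third request does when 6 dB < e < 12 dB, and this sketch wraps to 0 dB in that case. Function names are hypothetical.

```python
def next_enhancement(current_d: float, upper_e: float, step: float = 6.0) -> float:
    """Return the enhancement amount d (dB) after one request that carries no amount.

    Raises d by `step` without exceeding the upper limit e; a request made
    once d has reached e wraps back to 0 dB (no enhancement).
    """
    if current_d >= upper_e:
        return 0.0
    return min(current_d + step, upper_e)


def request_sequence(upper_e: float, presses: int, step: float = 6.0) -> list[float]:
    """Enhancement amounts after each of `presses` successive requests, starting at 0 dB."""
    d, history = 0.0, []
    for _ in range(presses):
        d = next_enhancement(d, upper_e, step)
        history.append(d)
    return history


if __name__ == "__main__":
    print(request_sequence(12.0, 3))  # the e = +12 dB case described above
```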

The adjustment value selection unit 142 may also select the first adjustment value b and the second adjustment value c so that the enhancement amount d increases or decreases by a predetermined step. It may do so even when the enhancement request specifies an enhancement amount d, and regardless of that value. For example, when the adjustment value list shown in FIG. 3A is stored in the adjustment value storage unit 141, the unit may select b and c so that the enhancement amount d is at least a predetermined value (for example, +6 dB), whether or not the enhancement request specifies an enhancement amount d.

The adjustment value selection unit 142 outputs the selected first adjustment value b to the audio signal adjustment unit 11 and the selected second adjustment value c to the audio signal adjustment unit 12.

Next, the operation of the audio signal processing device 10 according to this embodiment is described. FIG. 5 is a flowchart showing an example of this operation. It illustrates the case in which each enhancement request raises the dialog volume by a predetermined step (+6 dB), without exceeding the upper limit e (+12 dB).

The audio signal adjustment unit 11 acquires the dialog audio object (step S101), and the audio signal adjustment unit 12 acquires the background sound audio object (step S102). The adjustment value determination unit 14 acquires the acoustic metadata (step S103).

The adjustment value determination unit 14 determines whether an upper limit h of the enhancement amount d has been obtained from the acquired acoustic metadata (step S104).

If the acoustic metadata contains no upper limit h, so that none has been obtained (step S104: No), the adjustment value determination unit 14 sets the upper limit e of the enhancement amount d to +12 dB (step S105). It then proceeds to step S107, described below.

If the upper limit h has been obtained (step S104: Yes), the adjustment value determination unit 14 determines whether that upper limit h is greater than +6 dB (step S106).

In either of the following cases, the adjustment value determination unit 14 sets the enhancement amount d to +6 dB (step S107):

- The obtained upper limit h is greater than +6 dB (step S106: Yes).
- The upper limit e has been set to +12 dB, which is at least +6 dB (step S105).

In this way, the adjustment value determination unit 14 sets the enhancement amount d to at least +6 dB when the upper limit h is at least +6 dB (step S106: Yes) or when the upper limit e is at least +6 dB (step S105).

If the obtained upper limit h is not greater than +6 dB (step S106: No), the adjustment value determination unit 14 sets the enhancement amount d to that upper limit h (step S108). Thus, whenever the unit obtains an upper limit h from the acoustic metadata, it sets the enhancement amount d so that d does not exceed h (steps S107 and S108). The unit then determines the first adjustment value b and the second adjustment value c from the set enhancement amount d. In other words, when an upper limit h is present in the acoustic metadata, b and c are determined so that the enhancement amount d does not exceed h.
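Steps S104 to S108 of FIG. 5 amount to choosing an initial enhancement amount from an optional metadata limit. The sketch below mirrors those branches with step numbers in the comments. It assumes the metadata limit h arrives as a float in dB, or None when absent. The function and parameter names are hypothetical, and the +6 dB step and +12 dB default are taken from the FIG. 5 example.

```python
from typing import Optional


def initial_enhancement(upper_h: Optional[float],
                        default_e: float = 12.0,
                        step: float = 6.0) -> float:
    """Initial enhancement amount d in dB, following steps S104-S108 of FIG. 5.

    upper_h is the upper limit h read from the acoustic metadata, or None when
    the metadata carries no such limit.
    """
    if upper_h is None:              # S104: No
        upper_e = default_e          # S105: e = +12 dB
        return min(step, upper_e)    # S107: e >= +6 dB, so d = +6 dB
    if upper_h > step:               # S106: Yes
        return step                  # S107: d = +6 dB
    return upper_h                   # S106: No -> S108: d = h


if __name__ == "__main__":
    for h in (None, 12.0, 6.0, 4.0):
        print(h, initial_enhancement(h))
```

With this branch structure, d never exceeds h when a limit is present, which matches the property stated for steps S107 and S108.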

When an enhancement request is acquired after the enhancement amount d has been set (step S109), the adjustment value determination unit 14 refers to the adjustment value list and determines the first adjustment value b and the second adjustment value c. In the example shown in FIG. 5, the unit sets the enhancement amount d to at least +6 dB when the upper limit e is at least +6 dB. Dialog enhancement is then performed using the determined adjustment values (step S110):

- The audio signal adjustment unit 11 adjusts the dialog signal level based on the first adjustment value b and outputs the adjusted dialog audio signal to the audio signal synthesis unit 13.
- The audio signal adjustment unit 12 adjusts the background sound signal level based on the second adjustment value c and outputs the adjusted background sound audio signal to the audio signal synthesis unit 13.

The audio signal synthesis unit 13 mixes the dialog carried by the audio signal from the audio signal adjustment unit 11 with the background sound carried by the audio signal from the audio signal adjustment unit 12, and outputs the mix as the reproduced sound (step S111). When another enhancement request is input, the adjustment value determination unit 14 returns to step S109.
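Steps S110 and S111 scale each audio object by its adjustment value and sum the results. The sketch below assumes mono, sample-aligned objects held in plain Python lists, with b and c given in dB. Real-time block processing and multichannel rendering are left out, and the function names are hypothetical.

```python
import math


def db_to_gain(db: float) -> float:
    """Convert a level adjustment in dB to a linear amplitude gain (-inf dB -> 0)."""
    return 0.0 if db == -math.inf else 10 ** (db / 20)


def enhance_and_mix(dialog: list[float], background: list[float],
                    b_db: float, c_db: float) -> list[float]:
    """Scale each object by its adjustment value and sum them (S110-S111).

    dialog and background are time-aligned mono sample sequences.
    """
    if len(dialog) != len(background):
        raise ValueError("audio objects must have the same number of samples")
    g_b = db_to_gain(b_db)  # first adjustment value b -> dialog gain
    g_c = db_to_gain(c_db)  # second adjustment value c -> background gain
    return [g_b * x + g_c * y for x, y in zip(dialog, background)]


if __name__ == "__main__":
    print(enhance_and_mix([1.0, 0.5], [0.2, -0.4], 2.0, -4.0))
```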

FIG. 6 shows an example of the UI (user interface) through which a user inputs an enhancement request, as displayed on a display device such as a television.

For example, while no dialog enhancement is applied, an icon 21 is displayed indicating that the enhancement amount d is 0 dB, as shown in FIG. 6. If a predetermined dialog enhancement operation is performed while icon 21 is displayed (for example, via a remote controller), an icon 22 indicating an enhancement amount d of +6 dB is displayed.

If the same operation is performed while icon 22 is displayed, an icon 23 indicating an enhancement amount d of +12 dB is displayed, as shown in FIG. 6.

If the operation is performed while icon 23 is displayed, an icon 24 is displayed indicating that the dialog output is removed (the enhancement amount d is set to -∞), as shown in FIG. 6.

If the operation is performed while icon 24 is displayed, icon 21 is displayed again, as shown in FIG. 6. An enhancement request corresponding to the displayed icon (21 to 24) is then output, for example.
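The FIG. 6 UI cycles through four states on repeated presses of the same operation. A minimal sketch of that cycle follows. The state table pairs each icon number with the enhancement amount given in the text; the function name and the returned tuple layout are hypothetical.

```python
import math

# (icon number, enhancement amount d in dB) in the order shown in FIG. 6.
UI_STATES = [
    (21, 0.0),        # no dialog enhancement
    (22, 6.0),        # d = +6 dB
    (23, 12.0),       # d = +12 dB
    (24, -math.inf),  # dialog output removed
]


def on_enhance_button(state_index: int) -> tuple[int, int, float]:
    """Advance to the next UI state after one enhancement operation.

    Returns (new_state_index, icon_number, enhancement_amount_db); the caller
    would then issue an enhancement request carrying enhancement_amount_db.
    """
    new_index = (state_index + 1) % len(UI_STATES)
    icon, d_db = UI_STATES[new_index]
    return new_index, icon, d_db


if __name__ == "__main__":
    idx = 0
    for _ in range(4):
        idx, icon, d = on_enhance_button(idx)
        print(icon, d)
```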

Next, the results of a subjective evaluation of the effect of dialog enhancement, conducted by the present inventors, are described. The evaluation used the comparison category rating method (CMOS) specified in Rec. ITU-T P.800 Annex E.

In each trial, a pair of stimuli with different D/B ratios was presented to the evaluator in random order. Evaluators listened to the material on the playback equipment they normally use for content (loudspeakers, headphones, or earphones). One stimulus of each pair was designated at random as the reference. The other stimulus was rated against it on two criteria: "ease of hearing the dialog" (hereinafter "ease of hearing") and "overall impression of the program" (hereinafter "overall impression"). Ratings used a seven-point scale: much better (+3), better (+2), slightly better (+1), about the same (0), slightly worse (-1), worse (-2), and much worse (-3).

Audio from 10 different programs was prepared as evaluation material, and each excerpt was about 20 seconds long. The average loudness levels of the dialog and of the background sound were each adjusted to -27 LKFS (loudness, K-weighted, relative to full scale). The average loudness level of each complete stimulus (dialog plus background sound) was held constant while the D/B ratio was varied from -6 dB to +12 dB in 3 dB steps.
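Both stems start at the same loudness. One way to vary the D/B ratio while holding the mixture loudness constant is therefore to keep the sum of the squared linear stem gains fixed. This sketch relies on three assumptions:

- The dialog and background are uncorrelated, so their K-weighted mean-square powers add.
- Gating in the loudness measurement is ignored.
- The fixed sum of squared gains is 2.

The source does not describe the inventors' actual procedure, and the function names are hypothetical.

```python
import math

STEM_LOUDNESS_LKFS = -27.0  # both stems normalized to this level (from the text)


def stimulus_gains(db_ratio: float) -> tuple[float, float]:
    """Gains (dialog_db, background_db) that give the requested D/B ratio in dB.

    The squared linear gains sum to 2, so the mixture power stays constant.
    """
    background_db = 10 * math.log10(2 / (1 + 10 ** (db_ratio / 10)))
    return background_db + db_ratio, background_db


def estimated_mix_loudness(dialog_db: float, background_db: float) -> float:
    """Approximate mixture loudness, assuming uncorrelated stems add in power."""
    d = STEM_LOUDNESS_LKFS + dialog_db
    b = STEM_LOUDNESS_LKFS + background_db
    return 10 * math.log10(10 ** (d / 10) + 10 ** (b / 10))


D_B_RATIOS = list(range(-6, 13, 3))  # -6 to +12 dB in 3 dB steps

if __name__ == "__main__":
    for r in D_B_RATIOS:
        g_d, g_b = stimulus_gains(r)
        print(f"D/B {r:+3d} dB: dialog {g_d:+6.2f} dB, background {g_b:+6.2f} dB, "
              f"mix ~{estimated_mix_loudness(g_d, g_b):.2f} LKFS")
```

Under these assumptions, every stimulus lands at about -24 LKFS (-27 LKFS plus 10·log10 2), independent of the D/B ratio.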

FIG. 7A shows, for loudspeaker playback, the mean scores and 95% confidence intervals for ease of hearing and overall impression over the material from all 10 programs. The results are plotted against the enhancement amount d, measured relative to the D/B ratio before dialog enhancement (hereinafter "initial D/B ratio"). FIG. 7B shows the same results for headphone/earphone playback. There were 25 evaluators in each case.
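The figures report, for each condition, a mean score with a 95% confidence interval. A sketch of that aggregation for one condition (for example, one enhancement amount d and one playback type) follows. The source does not state how the intervals were computed, so the normal approximation used here is an assumption. The function name is hypothetical, and scores are assumed to be already on the -3 to +3 scale.

```python
import math
import statistics


def mean_and_ci95(scores: list[float]) -> tuple[float, float, float]:
    """Mean CMOS score with a 95% confidence interval (normal approximation).

    Returns (mean, lower, upper). A t-based interval (t about 2.06 for 24
    degrees of freedom) would be slightly wider for 25 evaluators.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    m = statistics.mean(scores)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.975)               # about 1.96
    return m, m - z * sem, m + z * sem


if __name__ == "__main__":
    print(mean_and_ci95([1, 2, 1, 0, 2, 1, 1]))
```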

As FIGS. 7A and 7B show, dialog enhancement improved ease of hearing. However, the improvement saturated at enhancement amounts d of +12 dB and above, and the score did not reach +2 for any initial D/B ratio. At an enhancement amount d of +3 dB, the effect was too small to change the rating category for ease of hearing at any initial D/B ratio. This is presumably because a small enhancement amount d makes the effect of dialog enhancement hard to perceive.

FIGS. 7A and 7B also show that dialog enhancement can lower the overall impression. This is presumably because an excessively large D/B ratio can impair the program's sense of presence. The audio signal processing device 10 of this embodiment therefore sets the upper limit e to +12 dB. It determines the first adjustment value b and the second adjustment value c so that the enhancement amount d increases or decreases in predetermined steps (for example, +6 dB) without exceeding e. This makes a specific audio object easier to hear while limiting the deterioration in the overall impression of program audio composed of multiple audio objects.

Although not specifically mentioned in the embodiment, a program that causes a computer to operate as the audio signal processing device 10 may be provided. The program may be recorded on a computer-readable medium, from which it can be installed on a computer. That medium may be a non-transitory recording medium. It is not particularly limited and may be, for example, a CD-ROM or DVD-ROM.

Alternatively, a chip to be mounted in the audio signal processing device 10 may be provided. The chip consists of a memory that stores a program for executing each process performed by the audio signal processing device 10 and a processor that executes the stored program.

The above embodiment has been described as a representative example, but it will be apparent to those skilled in the art that many modifications and substitutions are possible within the spirit and scope of the invention. The present invention should therefore not be construed as limited by the above embodiment, and various modifications and changes are possible without departing from the scope of the claims. For example, several functional blocks in the configuration diagram of the embodiment may be combined into one, or a single block may be divided.

10 Audio signal processing device
11 Audio signal adjustment unit (first audio signal adjustment unit)
12 Audio signal adjustment unit (second audio signal adjustment unit)
13 Audio signal synthesis unit
14 Adjustment value determination unit
141 Adjustment value storage unit
142 Adjustment value selection unit

Claims

An audio signal processing device that processes an audio signal of a program composed of a plurality of audio objects, comprising:
an adjustment value determination unit that, when an increase or decrease in a ratio a is requested as an enhancement amount d, determines a first adjustment value b and a second adjustment value c, wherein the ratio a is the ratio between the signal level of a first audio object among the plurality of audio objects and the signal level of a second audio object reproduced superimposed on the first audio object, the first adjustment value b is an adjustment value for the signal level of the first audio object, and the second adjustment value c is an adjustment value for the signal level of the second audio object;
a first audio signal adjustment unit that adjusts the signal level of the first audio object based on the first adjustment value b;
a second audio signal adjustment unit that adjusts the signal level of the second audio object based on the second adjustment value c; and
an audio signal synthesis unit that synthesizes and outputs the audio signal of the first audio object after signal level adjustment by the first audio signal adjustment unit and the audio signal of the second audio object after signal level adjustment by the second audio signal adjustment unit,
wherein the adjustment value determination unit determines the first adjustment value b and the second adjustment value c in accordance with the enhancement amount d such that the sum of the squares of the antilogarithm of the first adjustment value b and the antilogarithm of the second adjustment value c is constant.

The audio signal processing device according to claim 1, wherein the adjustment value determination unit determines the first adjustment value b and the second adjustment value c such that the enhancement amount d does not exceed an upper limit e.

The audio signal processing device according to claim 2, wherein, when the adjustment value determination unit obtains an upper limit h of the enhancement amount d from acoustic metadata, it determines the first adjustment value b and the second adjustment value c such that the enhancement amount d does not exceed the upper limit h. The acoustic metadata is information on the reproduction of the plurality of audio objects.

The audio signal processing device according to any one of claims 1 to 3, further comprising an adjustment value storage unit that stores an adjustment value list associating the enhancement amount d with the first adjustment value b and the second adjustment value c,
wherein the adjustment value determination unit determines the first adjustment value b and the second adjustment value c based on the adjustment value list stored in the adjustment value storage unit.

The audio signal processing device according to claim 2, wherein the adjustment value determination unit sets the enhancement amount d to +6 dB or more when the upper limit e is +6 dB or more.

The audio signal processing device according to claim 3, wherein the adjustment value determination unit sets the enhancement amount d to +6 dB or more when the upper limit h is +6 dB or more.

The audio signal processing device according to claim 2, wherein the upper limit e is +12 dB.

A program that causes a computer to operate as the audio signal processing device according to claim 1.