JP5692255B2

JP5692255B2 - Content reproduction apparatus and content processing method

Info

Publication number: JP5692255B2
Application number: JP2013034471A
Authority: JP
Inventors: 青木良太郎 (Ryotaro Aoki)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2010-12-03
Filing date: 2013-02-25
Publication date: 2015-04-01
Anticipated expiration: 2031-08-31
Also published as: JP2013174882A

Description

The present invention relates to a content reproduction apparatus and a content processing method that process audio into sound that is easy to hear, according to the quality of a compressed audio signal.

Recent television receivers include not only a tuner for receiving and reproducing television broadcasts but also many input terminals, such as HDMI and analog (NTSC) inputs (see, for example, Patent Document 1). Some also have a memory card slot. Devices such as game consoles, personal computers connected to the Internet, and home video equipment are connected to these input terminals.

Patent Document 1: JP 2006-019947 A

Television broadcast moving images (video and audio) received by the tuner are optimized for good reproduction on a television receiver, so simply demodulating and outputting them reproduces the video and audio at optimized quality. However, most moving images input from personal computers, home video equipment, and the like connected to the various input terminals were either shot by amateurs or compressed at a high compression ratio for distribution over the Internet and similar networks. Moving images shot by amateurs have usually received little adjustment after shooting, so their volume levels vary widely, being too high or too low, and when played back as-is the audio is often extremely loud or extremely quiet. In addition, many moving images compressed at a high compression ratio have had the high-frequency range of their audio removed to reach that ratio. Therefore, when such moving images are reproduced on a television receiver, it is desirable to process the audio according to the quality of the moving image so that it is easy to hear.

A recorded video file contains attribute information such as the model of the camera used for shooting, the video compression algorithm, resolution, color bit depth, and frame rate, and the audio compression algorithm, sample rate, bits per sample, and bit rate. However, a moving image input through an HDMI, analog, or similar input terminal is a stream that an external device has already decoded and played back, so this information may be lost. As a result, the television receiver could not determine how to process the audio of such a moving image to make it easier to hear.

Some AV amplifiers also have HDMI, analog, or similar video input terminals, supply video to a television, and emit audio from speakers. Such AV amplifiers face the same problem.

An object of the present invention is to provide a content reproduction apparatus and a content processing method that can process audio so that it is easy to hear, according to the type of content.

The content reproduction apparatus of the present invention receives and reproduces audio at a given bit rate. It includes a bit rate detection unit that detects the bit rate of the audio, and an audio processing unit that compresses the dynamic range of the audio and complements its high-frequency components to a degree that depends on the detected bit rate. The higher the audio bit rate, the less the audio processing unit compresses the dynamic range and the more it complements the high-frequency components. The lower the audio bit rate, the more it compresses the dynamic range and the less it complements the high-frequency components.
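The claimed relationship is monotonic: as the detected bit rate rises, dynamic range compression weakens and high-frequency complementation strengthens. The Python sketch below illustrates only that relationship. It makes two assumptions. First, the bit rate is detected by dividing the encoded audio size by its duration. Second, the result is mapped to normalized degrees between 0 and 1 on a logarithmic scale running from 32 kbps to 256 kbps, the lowest and highest lossy bit rates used as examples later in this description. The function names and the log-scale interpolation are hypothetical and do not represent the patented implementation.

```python
import math


def detect_bitrate_kbps(num_bytes: int, duration_s: float) -> float:
    """Hypothetical bit rate detection: encoded audio size divided by duration."""
    return num_bytes * 8 / duration_s / 1000


def processing_degrees(bitrate_kbps: float,
                       low_kbps: float = 32.0,
                       high_kbps: float = 256.0) -> tuple[float, float]:
    """Return (drc_degree, hf_degree), each in [0, 1].

    Higher bit rate -> weaker dynamic range compression and stronger
    high-frequency complementation; lower bit rate -> the opposite.
    The log-scale interpolation between low_kbps and high_kbps is an assumption.
    """
    span = math.log2(high_kbps) - math.log2(low_kbps)
    x = (math.log2(bitrate_kbps) - math.log2(low_kbps)) / span
    x = min(1.0, max(0.0, x))  # clamp outside the [low_kbps, high_kbps] range
    return 1.0 - x, x


kbps = detect_bitrate_kbps(num_bytes=16_000, duration_s=1.0)
drc, hf = processing_degrees(kbps)
print(f"{kbps:.0f} kbps -> DRC degree {drc:.2f}, HF complement degree {hf:.2f}")
```

Because both degrees are driven by a single clamped value, the two outputs always move in opposite directions, matching the claim.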

In the above invention, when the detected bit rate is high enough that the audio suffers no loss, the audio processing unit may skip both dynamic range compression and high-frequency complementation for that audio.

The content processing method of the present invention includes the following steps. A reproduction step receives and reproduces audio at a given bit rate. A bit rate detection step detects the bit rate of the audio. An audio processing step compresses the dynamic range of the audio and complements its high-frequency components to a degree that depends on the detected bit rate. In the audio processing step, the higher the audio bit rate, the less the dynamic range is compressed and the more the high-frequency components are complemented. The lower the audio bit rate, the more the dynamic range is compressed and the less the high-frequency components are complemented.

In the above invention, when the detected bit rate is high enough that the audio suffers no loss, the audio processing step may be skipped.

According to the present invention, the frequency characteristics of the audio can be estimated from its bit rate, which makes reliable audio processing possible.

FIG. 1 is a block diagram of a television receiver according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the operations of the video processing unit and the audio processing unit of the television receiver.
FIG. 3 shows examples of the frequency characteristics of the audio of moving images input to the television receiver.
FIG. 4 is a flowchart showing the operation of the decoder of the television receiver.
FIG. 5 shows examples of the frequency characteristics of the audio of moving images input to the television receiver.
FIG. 6 is a block diagram of an AV amplifier according to another embodiment of the present invention.

FIG. 1 is a block diagram of a television receiver according to an embodiment of the present invention. The television receiver 1 is a device that mainly receives and reproduces television broadcasts and moving images in similar formats. In this embodiment, content consisting of video and audio synchronized with it is called a moving image. To receive broadcast signals from broadcasting stations, the receiver includes a terrestrial digital broadcast tuner 21, a BS (broadcasting satellite) tuner 22, and a CS (communications satellite) tuner 23. To receive moving images from external devices, it includes an HDMI input unit 24 and an analog input unit 25. The analog input unit 25 has composite or S-video terminals together with stereo audio terminals, and contains an A/D converter that digitizes the signals input through these terminals. A decoder (CODEC) 26 is also provided to decode compressed moving images. A network cable, a memory card slot, a USB connector, and the like are connected to the decoder 26. The decoder 26 decodes video files streamed from the Internet over the network cable. It also reads video files from a recording medium, such as a memory card inserted in the memory card slot, and decodes and reproduces them.

These moving image sources (the terrestrial digital tuner 21, BS tuner 22, CS tuner 23, HDMI input unit 24, analog input unit 25, and decoder 26) are connected to the input side of a selector 20. A video processing unit 11 and an audio processing unit 12 are connected to the output side of the selector 20. The selector 20 is switched by a controller 10, which consists of a microcomputer. Of the moving images supplied by the sources, the one selected by the selector 20 has its video input to the video processing unit 11 and its audio input to the audio processing unit 12.

A display processing unit 13 is connected to the video processing unit 11, and a display 15 is connected to the display processing unit 13. A sound emission processing unit 14 is connected to the audio processing unit 12, and a speaker 16 is connected to the sound emission processing unit 14.

Television broadcast video input from the terrestrial digital tuner 21, BS tuner 22, or CS tuner 23 is already optimized for display on the display 15 at good image quality, so the video processing unit 11 outputs it unchanged to the display processing unit 13. Likewise, television broadcast audio input from these tuners is already optimized for emission from the speaker 16 at good sound quality, so the audio processing unit 12 outputs it unchanged to the sound emission processing unit 14.

For moving images input from the HDMI input unit 24, the analog input unit 25, or the decoder 26, the video processing unit 11 processes the video according to its image quality so that it is easy for the user to view on the display 15. This video processing includes, for example, adjusting the resolution and sharpness of the video. The processed video is input to the display processing unit 13, which expands the video signal into a matrix to form frame data and displays it on the display 15.

Within the video processing unit 11, image quality is analyzed by a built-in video analysis unit 11A. The video analysis unit 11A analyzes the resolution, the degree of compression distortion, and similar properties of video input from the HDMI input unit 24 or the analog input unit 25 through the selector 20. The video processing unit 11 uses the analysis result for its own video processing and also sends it to the controller 10.

The controller 10 estimates the audio quality from the video analysis result obtained from the video analysis unit 11A of the video processing unit 11, and determines the audio processing that suits the estimated quality. The controller 10 sets the determined processing in the audio processing unit 12, which then carries it out. This processing is applied to audio whose quality is inferior to that of television broadcasts. It includes, for example, compressing or expanding the dynamic range, and adding components that emphasize or complement the high-frequency range. The processed audio is input to the sound emission processing unit 14, which converts it to an analog signal, amplifies it, and emits it from the speaker 16.

When the decoder 26 decodes a streaming moving image received over the network, or decodes and reproduces a video file on a recording medium, it acquires the attributes of the moving image. These attributes are the video compression algorithm, resolution, color bit depth, and frame rate, and the audio compression algorithm, sample rate, bits per sample, bit rate, and so on.

When the selector 20 selects the decoder 26, the controller 10 acquires the attributes of the moving image from the decoder 26. The controller 10 sends the video attributes to the video processing unit 11. From the audio attributes, it determines the processing the audio processing unit 12 should apply to compensate for audio quality inferior to that of television broadcasts, and sets that processing in the audio processing unit 12.

FIG. 2 is a flowchart showing the operations of the video processing unit 11 and the controller 10. FIG. 2(A) shows the video analysis operation of the video analysis unit 11A in the video processing unit 11. A video is input (S1) and its resolution is analyzed (S2). Resolution conversion coefficients for full-screen display on the display 15 are then set (S3). Next, the amount of compression distortion in the video is analyzed (S4). The presence and degree of compression distortion can be determined as follows, taking block noise as one example of compression distortion. The luminance change is computed along an arbitrary vertical or horizontal line of pixels in the video. If discontinuities appear at regular intervals (for example, every 16 pixels), block noise is present, and the larger the jumps at those discontinuities, the stronger the block noise. Depending on the strength of the detected block noise, processing such as softening the image to make block edges less visible is set (S5). Another example of compression distortion and its processing is detecting the degree of mosquito noise and removing it.
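The block-noise test in S4 can be sketched as follows. This is a minimal illustration that assumes the following: luminance values are stored as a list of pixel rows, and the 16-pixel block grid is aligned with the image origin. The scoring rule, mean step at block boundaries minus mean step elsewhere, is a hypothetical choice. The description does not give a formula.

```python
def block_noise_strength(line: list[float], block: int = 16) -> float:
    """Score one pixel line: how much larger the luminance steps are at
    block boundaries than elsewhere (0.0 means no block structure)."""
    # steps[i] is the luminance change between pixel i and pixel i + 1
    steps = [abs(b - a) for a, b in zip(line, line[1:])]
    # a block boundary lies between pixel k*block - 1 and pixel k*block
    boundary = [s for i, s in enumerate(steps) if (i + 1) % block == 0]
    interior = [s for i, s in enumerate(steps) if (i + 1) % block != 0]
    if not boundary or not interior:
        return 0.0
    excess = sum(boundary) / len(boundary) - sum(interior) / len(interior)
    return max(0.0, excess)


def image_block_noise(img: list[list[float]], block: int = 16) -> float:
    """Average the score over all horizontal rows and vertical columns."""
    rows = [block_noise_strength(r, block) for r in img]
    cols = [block_noise_strength(list(c), block) for c in zip(*img)]
    scores = rows + cols
    return sum(scores) / len(scores)


# Example: flat 16-pixel blocks with a 20-level jump between them
blocky = [[10.0] * 16 + [30.0] * 16 + [10.0] * 16 for _ in range(48)]
print(image_block_noise(blocky))
```

A smooth gradient gives equal steps everywhere and therefore scores zero. A blocky image scores in proportion to the jump height, which matches the rule that larger discontinuities mean stronger block noise.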

Next, the actual frame rate is analyzed (S6). For a moving image input from the HDMI input unit 24, the frame rate has been adjusted to a value such as 60 fps or 120 fps. However, the original moving image may have had a lower frame rate that was converted to this rate when the HDMI signal was generated. The motion of the images across several frames is therefore checked. If the motion is stepwise, the frame rate was presumably raised during HDMI conversion. For example, if a 60 fps video actually changes only every three frames, it can be estimated to have originally been a 20 fps video. The actual frame rate is estimated in this way. Because the video has already been converted for HDMI, this analysis result is not used for video processing.
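The frame-repetition reasoning in S6 (a 60 fps signal that changes only every three frames was probably 20 fps originally) can be sketched as follows. This is a minimal illustration under these assumptions: each frame is a flat list of luminance values; two frames count as "the same" when their mean absolute difference is below a hypothetical threshold; and the median run length of repeated frames is used as the repetition factor. The sketch also assumes the clip contains motion, because a completely static clip would be misread as a very low frame rate.

```python
from statistics import median


def frame_changed(a: list[float], b: list[float], threshold: float) -> bool:
    """True if the mean absolute pixel difference exceeds the threshold."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a) > threshold


def estimate_actual_fps(frames: list[list[float]], nominal_fps: float,
                        threshold: float = 0.5) -> float:
    """Estimate the source frame rate from runs of repeated frames."""
    runs: list[int] = []
    run = 1
    for prev, cur in zip(frames, frames[1:]):
        if frame_changed(prev, cur, threshold):
            runs.append(run)  # a new picture starts; close the current run
            run = 1
        else:
            run += 1          # same picture repeated by frame-rate conversion
    runs.append(run)
    return nominal_fps / median(runs)


# Example: a 60 fps signal in which each picture is repeated three times
frames = [[float(i // 3)] * 4 for i in range(60)]
print(estimate_actual_fps(frames, nominal_fps=60.0))
```

The median is used instead of the mean so that an occasional static shot does not dominate the estimate.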

The analyzed resolution, compression distortion strength, and actual frame rate are then sent to the controller 10 (S7). After this, the video processing unit 11 performs video processing according to the settings made in S3 and S5.

FIG. 2(B) is a flowchart showing the audio processing setting operation performed by the controller. When the controller obtains the resolution, compression distortion strength, and actual frame rate from the video processing unit 11 (S11), it uses them to determine the degree of dynamic range conversion (S12) and the degree of extension of the audio frequency components (S13).

Several methods of dynamic range conversion are possible. One method rests on the observation that poorer moving images with lower bit rates have a more compressed dynamic range, so the dynamic range should be expanded. Another method rests on the observation that moving images shot by so-called amateurs, who lack professional skills and use consumer video cameras rather than dedicated broadcast equipment, have inconsistent audio signal levels. In that case, the dynamic range is compressed to make quiet sounds easier to hear, or excessively loud sounds are limited with a limiter, to optimize the volume for playback on a television. In S12, the dynamic range is converted using any one of these methods or a combination of them. As for frequency component extension, lower-bit-rate moving images often have their high and low frequencies cut to raise the compression ratio. In S13, the high and low frequencies are therefore reinforced according to the bit rate estimated from the resolution, actual frame rate, and compression distortion strength.
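The "compress the dynamic range, then limit excessive sounds" method can be sketched as follows. This is a minimal illustration of a generic downward compressor with makeup gain followed by a limiter on the static curve. It is not the circuit the description uses. The threshold, ratio, makeup gain, and limit values are hypothetical. The gain is computed once per block from that block's RMS level, with no attack or release smoothing, which is a deliberate simplification.

```python
import math


def drc_gain_db(level_db: float, threshold_db: float = -20.0, ratio: float = 4.0,
                makeup_db: float = 6.0, limit_db: float = -1.0) -> float:
    """Static gain curve: compress above the threshold, add makeup gain so
    quiet passages get louder, then cap the output level (limiter)."""
    if level_db > threshold_db:
        out_db = threshold_db + (level_db - threshold_db) / ratio
    else:
        out_db = level_db
    out_db = min(out_db + makeup_db, limit_db)  # limiter caps the output level
    return out_db - level_db


def apply_drc(samples: list[float], block: int = 256, **params: float) -> list[float]:
    """Apply the static curve block by block using each block's RMS level."""
    out: list[float] = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        level_db = 20 * math.log10(rms) if rms > 0 else -120.0
        gain = 10 ** (drc_gain_db(level_db, **params) / 20)
        out.extend(s * gain for s in chunk)
    return out


# Example: a quiet block (-40 dBFS) followed by a full-scale block (0 dBFS)
quiet, loud = [0.01] * 256, [1.0] * 256
out = apply_drc(quiet + loud)
print(max(out[:256]), max(out[256:]))
```

With these defaults, a 40 dB input range (-40 dB to 0 dB) maps to about a 25 dB output range (-34 dB to -9 dB). The quiet passage rises and the loud one falls, which is the intended effect on unevenly recorded amateur audio. A "strong" DRC setting would use a higher ratio and a "weak" one a lower ratio.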

The audio processing unit 12 is configured to execute the determined processing (S14), and the audio processing setting operation ends. Through this operation, the audio processing of the audio processing unit 12 is set using the results of the video analysis that the video processing unit 11 performs to determine its own video processing.

In the video analysis, the camera work, that is, the skill of the panning and zooming, may also be used to estimate whether the footage was shot by a professional or by an amateur. Whether to apply dynamic range compression may then be decided from this estimate.

The skill of the panning and zooming can be estimated as follows. The analysis looks for changes in the video that would not occur in footage shot by a professional camera operator. Examples are panning and zooming that are overused, panning and zooming that are too fast or uneven in speed, and panning that moves back and forth or shakes (wavers). If such changes are detected, the footage is judged to have been shot with a home video camera. When the connected device is a video camera, the audio processing can be determined from the input resolution information. However, audio from video cameras often has too wide a dynamic range, so dynamic range compression (DRC) is set on the strong side.
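The camera-work heuristics listed above can be sketched as a simple rule check. The sketch assumes a per-frame horizontal pan velocity (in pixels per frame) has already been estimated, for example by global motion estimation, which is not shown. Every threshold is a hypothetical placeholder, not a value from the description. The function returns True when the footage looks like home-video shooting, which would call for stronger DRC.

```python
from statistics import mean, pstdev


def looks_amateur(pan_velocity: list[float],
                  still_speed: float = 1.0,      # below this, treat as not panning
                  max_pan_fraction: float = 0.5,  # "panning is overused"
                  max_speed: float = 40.0,        # "panning is too fast"
                  max_cv: float = 1.0,            # "panning speed is uneven"
                  max_reversals_per_100: float = 5.0) -> bool:  # back-and-forth, shake
    """Hypothetical rule check on per-frame pan velocities."""
    if not pan_velocity:
        return False
    moving = [v for v in pan_velocity if abs(v) > still_speed]
    if len(moving) / len(pan_velocity) > max_pan_fraction:
        return True
    speeds = [abs(v) for v in moving]
    if any(s > max_speed for s in speeds):
        return True
    if len(speeds) >= 2 and pstdev(speeds) / mean(speeds) > max_cv:
        return True
    # direction reversals between consecutive panning frames
    reversals = sum(1 for a, b in zip(moving, moving[1:]) if a * b < 0)
    return reversals * 100 / len(pan_velocity) > max_reversals_per_100


smooth_pan = [5.0] * 20 + [0.0] * 80                                  # one steady pan
shaky_pan = [8.0 if i % 2 == 0 else -8.0 for i in range(30)] + [0.0] * 70
drc = "strong" if looks_amateur(shaky_pan) else "per resolution"
print(looks_amateur(smooth_pan), drc)
```

In practice the thresholds would need tuning against real footage. The sketch shows only how each symptom in the list above can become a measurable test.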

An example follows of how the frequency component extension (S13) and the dynamic range conversion (S12) are determined from the video resolution obtained from the video analysis unit 11A. For compressed moving image data and analog moving images, the video resolution and the audio quality (bit rate) are considered to be roughly correlated. The audio processing is therefore determined from the video resolution as follows.

When the video resolution is 240p (240 vertical scanning lines), the audio frequency response extends only to about 6 kHz, as shown in FIG. 3(A), and almost no high-frequency band that could be emphasized is detected. Extension is therefore desirable on the low-frequency side only. The extension component is added at a restrained level so that the coarseness of the original components (quantization noise and the like) does not stand out. Because the band is narrow and the sound pressure balance is also disturbed, dynamic range compression (DRC) is applied strongly. That is:
High-frequency component addition: none
Low-frequency component addition: -6 dB
DRC: strong

When the video resolution is 360p, the audio frequency response extends up to about 10 kHz, as shown in FIG. 3(B). It is therefore desirable to extend components above a few kHz and add them as a high-frequency extension component. This component is added at a restrained level so that the coarseness of the original components (quantization noise and the like) does not stand out. Because the band is narrow and the sound pressure balance is also disturbed, DRC is applied strongly. That is:
High-frequency component addition: -6 dB
Low-frequency component addition: -6 dB
DRC: strong

When the video resolution is 480p, the audio frequency response extends up to about 16 kHz, as shown in FIG. 3(C). It is therefore desirable to extend components above a few kHz and add them as a high-frequency extension component. This component is added at a relatively high level so that its effect is noticeable and is not buried in the original components. Because the band is somewhat narrow, DRC is applied at a medium level. That is:
High-frequency component addition: -3 dB
Low-frequency component addition: -3 dB
DRC: medium

When the video resolution is 720p, the audio frequency response again extends only up to about 16 kHz, as shown in FIG. 3(D). Even at higher resolutions, the audio bandwidth of Internet content is often limited. Therefore, as with 480p, it is desirable to extend components above a few kHz and add them as a high-frequency extension component. This component is added at a relatively high level so that its effect is noticeable and is not buried in the original components. Because the band is somewhat narrow, DRC is applied at a medium level. That is:
High-frequency component addition: -3 dB
Low-frequency component addition: -3 dB
DRC: medium
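The four resolution cases above amount to a lookup table. The following sketch encodes them, assuming a resolution between two listed tiers uses the highest tier not exceeding it. It also clamps resolutions below 240p to the 240p settings and resolutions above 720p to the 720p settings. Both rules are assumptions, because the description gives only these four cases. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioSettings:
    hf_add_db: Optional[float]  # None = no high-frequency component added
    lf_add_db: Optional[float]  # None = no low-frequency component added
    drc: str                    # "strong" or "medium" in this table


# (minimum vertical resolution in lines, settings), from the 240p-720p cases
RESOLUTION_TABLE = [
    (240, AudioSettings(None, -6.0, "strong")),
    (360, AudioSettings(-6.0, -6.0, "strong")),
    (480, AudioSettings(-3.0, -3.0, "medium")),
    (720, AudioSettings(-3.0, -3.0, "medium")),
]


def settings_for_resolution(lines: int) -> AudioSettings:
    """Pick the highest listed tier not exceeding `lines` (clamped at both ends)."""
    chosen = RESOLUTION_TABLE[0][1]
    for min_lines, settings in RESOLUTION_TABLE:
        if lines >= min_lines:
            chosen = settings
    return chosen


print(settings_for_resolution(360))
```

The settings for 240p, 360p, and 480p/720p are identical to those given later for 32, 64, and 128 kbps audio. This is consistent with the stated assumption that resolution and audio bit rate are roughly correlated.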

Attribute information (properties) is written in streamed moving images and in video files stored on recording media. The decoder 26 therefore decides how to decode based on this attribute information and performs the decoding. At the same time, the attribute information is sent to the controller 10 and used to determine the processing of the video processing unit 11 and the audio processing unit 12.

FIG. 4 is a flowchart showing the operation of the decoder 26 when a streaming moving image is input over the network. When a streaming moving image is input (S20), its video and audio are analyzed (S21, S22). The analysis consists of reading the attribute information of the moving image. Video analysis reads attributes such as the compression algorithm, resolution, color bit depth, and frame rate. Audio analysis reads attributes such as the compression algorithm, sample rate, bit rate, and bits per sample. The resulting attribute information is sent to the controller 10. The controller 10 forwards the video analysis result to the video processing unit 11, determines the audio processing from the audio analysis result, and configures the audio processing unit 12 accordingly. The decoder 26 then starts decoding the moving image based on the acquired attribute information (S24).

In this case, the video processing unit 11 may determine the video processing using both the attribute information forwarded from the controller 10 and the analysis result of the video analysis unit 11A. Likewise, the controller 10 may determine the audio processing using both the attribute information obtained from the decoder 26 and the video analysis result obtained from the video processing unit 11.

The controller 10 may be unable to obtain the attribute information of the moving image from the decoder 26, that is, the decoder 26 may not be designed to output attribute information externally. In that case, even when the decoder 26 decodes and reproduces a moving image, the video processing unit 11 can analyze its video, and the video and audio processing can be determined from that analysis.

When the audio bit rate is known from the attribute information acquired by the decoder 26, the following processing may be performed.

When the audio bit rate is 32 kbps, the audio frequency response extends only to about 6 kHz, as shown in FIG. 5(A), and almost no high-frequency band that could be emphasized is detected. Extension is therefore desirable on the low-frequency side only. The extension component is added at a restrained level so that the coarseness of the original components (quantization noise and the like) does not stand out. Because the band is narrow and the sound pressure balance is also disturbed, dynamic range compression (DRC) is applied strongly. That is:
High-frequency component addition: none
Low-frequency component addition: -6 dB
DRC: strong

When the audio bit rate is 64 kbps, the audio frequency response extends up to about 10 kHz, as shown in FIG. 5(B). It is therefore desirable to extend components above a few kHz and add them as a high-frequency extension component. This component is added at a restrained level so that the coarseness of the original components (quantization noise and the like) does not stand out. Because the band is narrow and the sound pressure balance is also disturbed, DRC is applied strongly. That is:
High-frequency component addition: -6 dB
Low-frequency component addition: -6 dB
DRC: strong

When the audio bit rate is 128 kbps, the audio frequency response extends into the treble range, to about 16 kHz, as shown in FIG. 5(C). It is therefore desirable to extend components above a few kHz and add them as a high-frequency extension component. They are added at a relatively high level so that the effect is audible and not buried in the original components. Because the band is somewhat narrow, DRC is applied at a medium level. That is:
High-frequency component added: -3 dB
Low-frequency component added: -3 dB
DRC: medium
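The text names the DRC levels only as strong, medium, weak, or none. As a minimal sketch of what those levels could mean, the static gain curve of a simple downward compressor is shown below. The threshold and ratio assigned to each level are hypothetical, and a practical DRC would also smooth the gain over time with attack and release constants, which this sketch omits.

```python
# Hypothetical (threshold_db, ratio) presets for the DRC levels named in the text.
DRC_PRESETS = {
    "strong": (-30.0, 4.0),
    "medium": (-24.0, 2.5),
    "weak": (-18.0, 1.5),
    "none": (0.0, 1.0),  # a 1:1 ratio never changes the level
}


def drc_gain_db(level_db: float, preset: str) -> float:
    """Static gain (dB) of a simple downward compressor for one input level.

    Below the threshold the signal passes unchanged; above it, every
    `ratio` dB of input rise yields only 1 dB of output rise.
    """
    threshold_db, ratio = DRC_PRESETS[preset]
    if level_db <= threshold_db:
        return 0.0
    output_db = threshold_db + (level_db - threshold_db) / ratio
    return output_db - level_db
```

For a -10 dB input level, these presets give -15 dB of gain for "strong" and about -8.4 dB for "medium", so louder passages are pulled down harder as the bit rate falls.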

When the audio bit rate is 256 kbps, the audio frequency response extends into the treble range, to about 18 kHz, as shown in FIG. 5(D). It is therefore desirable to extend components above a few kHz and add them as a high-frequency extension component. They are added at a relatively high level so that the effect is audible and not buried in the original components. Because the band is wide, DRC is applied only weakly. That is:
High-frequency component added: 0 dB
Low-frequency component added: 0 dB
DRC: weak

When the audio is lossless (for example, 1500 kbps), the audio frequency response extends to near the Nyquist frequency (for example, about 22 kHz), as shown in FIG. 5(E). Since there is no loss in the frequency domain, high-frequency extension is unnecessary, and since the sound-pressure balance is intact, DRC is also unnecessary. That is:
High-frequency component added: none
Low-frequency component added: none
DRC: none
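The five cases above amount to a lookup from bit rate to three settings. A minimal Python sketch of that lookup follows. The names (`AudioSettings`, `select_settings`, `db_to_amplitude`), the rule of rounding a rate down to the nearest listed tier, and the `lossless` flag (assumed to come from the decoder's attribute information) are all hypothetical; the text gives only the five example rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudioSettings:
    high_band_db: Optional[float]  # None: no high-frequency component added
    low_band_db: Optional[float]   # None: no low-frequency component added
    drc: str                       # "strong", "medium", "weak" or "none"


# (minimum bit rate in kbps, settings), highest tier first.
# The tier boundaries are hypothetical; the text only lists example rates.
TIERS = [
    (256, AudioSettings(0.0, 0.0, "weak")),
    (128, AudioSettings(-3.0, -3.0, "medium")),
    (64, AudioSettings(-6.0, -6.0, "strong")),
    (0, AudioSettings(None, -6.0, "strong")),
]
LOSSLESS = AudioSettings(None, None, "none")


def select_settings(bitrate_kbps: float, lossless: bool = False) -> AudioSettings:
    """Pick processing settings from the detected bit rate."""
    if lossless:
        return LOSSLESS  # no loss: skip extension and DRC entirely
    for min_kbps, settings in TIERS:
        if bitrate_kbps >= min_kbps:
            return settings
    return TIERS[-1][1]  # negative or unknown rates fall back to the lowest tier


def db_to_amplitude(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor (-6 dB is about 0.5)."""
    return 10.0 ** (db / 20.0)
```

Under these hypothetical boundaries, `select_settings(96)` falls into the 64 kbps tier, and `db_to_amplitude(-3.0)` is about 0.708. Listing tiers from highest to lowest keeps the trend in the claims explicit: DRC weakens and high-band addition strengthens as the rate rises, until the lossless case switches both off.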

The following inference is also possible from the attribute information obtained from the decoder 26. A moving image whose video has an unusual resolution (for example, an aspect ratio other than 16:9 or 4:3) or an unusual frame rate such as 15 fps is probably not footage taken straight from a recording device but something processed on a personal computer or the like, so its audio bit rate may be assumed to be low.
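This inference can be sketched as a small predicate over the video attributes. The function name and the list of "ordinary" frame rates are assumptions; the text cites only 16:9 and 4:3 as normal aspect ratios and 15 fps as an unusual frame rate.

```python
from fractions import Fraction

STANDARD_ASPECTS = {Fraction(16, 9), Fraction(4, 3)}
# Assumed list of ordinary frame rates; the text only cites 15 fps as unusual.
STANDARD_FPS = (23.976, 24.0, 25.0, 29.97, 30.0, 50.0, 59.94, 60.0)


def probably_low_bitrate_audio(width: int, height: int, fps: float) -> bool:
    """Guess that the audio is low bit rate when the video format looks edited.

    Uses stored pixel dimensions only, so anamorphic formats with
    non-square pixels (e.g. 720x480 displayed as 4:3) would be misjudged.
    """
    unusual_aspect = Fraction(width, height) not in STANDARD_ASPECTS
    unusual_fps = not any(abs(fps - f) < 0.01 for f in STANDARD_FPS)
    return unusual_aspect or unusual_fps
```

`Fraction` reduces the ratio exactly, so 1920x1080 and 1280x720 both compare equal to 16:9 without floating-point tolerance.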

Note that even when the audio bit rate in the attribute information is high, the audio may have been compressed and later decompressed. The sound quality may therefore be judged from the frequency characteristics regardless of the bit rate, and the optimal audio processing determined on that basis.
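One way to judge quality from the frequency characteristics is to measure the effective bandwidth, the highest frequency that still carries energy, and compare it with the FIG. 5 examples (about 6, 10, 16, and 18 kHz, and near Nyquist). The sketch below assumes a -60 dB floor relative to the spectral peak and uses cut points midway between those examples; the function names, the floor, and the cut points are hypothetical.

```python
import numpy as np


def estimate_bandwidth_hz(x, fs, floor_db=-60.0, frame=4096):
    """Highest frequency whose frame-averaged power is within floor_db of the peak."""
    if len(x) < frame:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(frame)
    spectra = [
        np.abs(np.fft.rfft(x[i:i + frame] * window)) ** 2
        for i in range(0, len(x) - frame + 1, frame)
    ]
    power = np.mean(spectra, axis=0)
    peak = power.max()
    if peak == 0:
        return 0.0  # silence: no measurable bandwidth
    level_db = 10.0 * np.log10(power / peak + 1e-20)
    freqs = np.fft.rfftfreq(frame, d=1.0 / fs)
    return float(freqs[np.nonzero(level_db > floor_db)[0][-1]])


def equivalent_tier(bandwidth_hz, fs):
    """Map a measured bandwidth to the nearest FIG. 5 example (hypothetical cut points)."""
    if bandwidth_hz >= 0.95 * fs / 2:
        return "lossless"
    if bandwidth_hz >= 17000:
        return "256kbps"
    if bandwidth_hz >= 13000:
        return "128kbps"
    if bandwidth_hz >= 8000:
        return "64kbps"
    return "32kbps"
```

A file labeled 320 kbps whose measured bandwidth is near 6 kHz would then be treated like the 32 kbps case. The Hann window keeps spectral leakage low enough that the measured edge stays within a few bins of the true cutoff.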

The television receiver 1 has been described above as an embodiment, but the devices to which the present invention applies are not limited to television receivers. For example, the invention can also be applied to an AV amplifier having a moving-image processing function.

FIG. 6 is a block diagram of an AV amplifier 2 according to another embodiment of the present invention. In this figure, components identical to those of the television receiver 1 shown in FIG. 1 are denoted by the same reference numerals, and their description is omitted. The AV amplifier 2 also has terminals for audio-only sources, but only the terminals for moving images, that is, AV sources, are shown. In addition to the HDMI input unit 24 and the analog input unit 25, the selector 20 is provided with a USB playback unit 27 and a LAN communication unit 28.

An HDMI output unit 17 is connected to the video processing unit 11, and an external television receiver 3 is connected to the HDMI output unit 17. The video output by the video processing unit 11 is sent to the television receiver 3 through the HDMI output unit 17 and displayed on its screen. A speaker terminal 18 is connected to the sound emission processing unit (amplifier) 14, and an external speaker 4 is connected to the speaker terminal 18. The audio output by the sound emission processing unit 14 is sent to the speaker 4 through the speaker terminal 18 and emitted from the speaker 4.

The USB playback unit 27 has a USB interface to which a recording medium storing moving-image files is connected. The USB playback unit 27 reads a moving-image file from this medium, decodes it, and plays it back; that is, the USB playback unit 27 includes a decoder. The USB playback unit 27 transfers the attribute information of the moving-image file read by the decoder to the controller 10. The LAN communication unit 28 receives a moving image streamed over a network and decodes it to reproduce uncompressed video and audio; that is, the LAN communication unit 28 also includes a decoder. The LAN communication unit 28 transfers the attribute information of the moving image read by the decoder to the controller 10. The USB playback unit 27 and the LAN communication unit 28 may share a single decoder (CODEC), as in FIG. 1.

DESCRIPTION OF SYMBOLS
1 Television receiver
10 Controller
11 Video processing unit
11A Video analysis unit
12 Audio processing unit
26 Decoder
2 AV amplifier

Claims

A content playback apparatus that receives and plays back audio at a predetermined bit rate, comprising:
a bit rate detection unit that detects the bit rate of the audio; and
an audio processing unit that compresses the dynamic range of the audio and complements high-frequency components to a degree corresponding to the detected bit rate,
wherein the audio processing unit reduces the degree of dynamic range compression and increases the degree of high-frequency complementation as the bit rate of the audio becomes higher, and increases the degree of dynamic range compression and reduces the degree of high-frequency complementation as the bit rate of the audio becomes lower.

The content playback apparatus according to claim 1, wherein, when the detected bit rate is high enough that the audio suffers no loss, the audio processing unit performs neither the dynamic range compression nor the high-frequency complementation on the audio.

A content processing method comprising:
a step of receiving and playing back audio at a predetermined bit rate;
a bit rate detection step of detecting the bit rate of the audio; and
an audio processing step of compressing the dynamic range of the audio and complementing high-frequency components to a degree corresponding to the detected bit rate,
wherein, in the audio processing step, the higher the bit rate of the audio, the smaller the degree of dynamic range compression and the greater the degree of high-frequency complementation, and the lower the bit rate of the audio, the greater the degree of dynamic range compression and the smaller the degree of high-frequency complementation.

The content processing method according to claim 3, wherein the audio processing step is not performed when the bit rate detected in the bit rate detection step is high enough that the audio suffers no loss.